Tutorial

PostgreSQL Full-Text Search for Arabic and English: A Practical Guide

4 min read

Learn how to build a functional multilingual FTS system in PostgreSQL for Arabic and English

PostgreSQL · Full-Text Search · Arabic Language · Database Optimization · UAE Development

Last year, I spent entirely too long fighting PostgreSQL's full-text search (FTS) for a client. They run a real estate platform in Abu Dhabi—let's call them "Reach Home Properties" because that's literally the name on their contract. They wanted listings searchable in both languages: Arabic for locals, English for expats. The first version we shipped had zero support for this. I knew basic FTS, but Arabic threw me for a loop. Let me save you those 8 hours.


Basics: How Not to Mess Up

I'll assume you've used PostgreSQL's FTS before. You've probably written queries like WHERE to_tsvector('english', title) @@ to_tsquery('english', 'search'). But that's English-only. What if your table has mixed-language content?

PostgreSQL's simple configuration just lowercases and splits words. That won't relate Arabic forms like "كتب" (he wrote), "كتاب" (book), and "مكتبة" (library), even though they all share the root "كتب". The built-in Arabic stemmer? Let's say it's underwhelming: it doesn't handle these forms correctly out of the box.
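You can see the problem directly in psql. This is purely illustrative, nothing project-specific:

```sql
-- All three words share the root كتب, but 'simple' just lowercases and
-- splits, so each becomes its own unrelated lexeme
SELECT to_tsvector('simple', 'كتب كتاب مكتبة');
-- three distinct lexemes; a search for one will never match the others
```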


Configuring for Arabic

First thing I did: create a new text search configuration that actually respects Arabic morphology. We used:

-- arabic_ispell must exist before the mapping can reference it;
-- ar.dict / ar.affix are the hunspell Arabic files placed in
-- the server's tsearch_data directory
CREATE TEXT SEARCH DICTIONARY arabic_ispell (
    TEMPLATE = ispell, DictFile = ar, AffFile = ar
);

CREATE TEXT SEARCH CONFIGURATION arabic (PARSER = default);

ALTER TEXT SEARCH CONFIGURATION arabic
    ADD MAPPING FOR word, hword, hword_part
    WITH arabic_ispell, simple;

Yes, that's the arabic_ispell dictionary. Wait, does that work? Short answer: sometimes. Longer answer: you need the hunspell Arabic dictionary and affix files, converted to PostgreSQL's ispell format and dropped into the server's tsearch_data directory (new sessions pick them up; no server restart needed, despite what I believed at the time). Took forever. I'll skip the OS-specific steps unless you want trauma.
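Before wiring the dictionary into a configuration, it's worth probing it on its own. A quick sanity check, assuming the arabic_ispell dictionary from above exists:

```sql
-- Returns an array of stemmed lexemes if the dictionary loaded and knows
-- the word, NULL if the word is unknown to it, and errors out if the
-- dictionary files are missing from tsearch_data
SELECT ts_lexize('arabic_ispell', 'مكتبة');
```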


Mixed Input? Handle Both Languages

Now about clients blending languages. One listing's title might say "3 Bedroom فيلا for Sale". We needed both languages indexed together. I ended up with a GIN index like:

CREATE INDEX idx_fts_all_langs ON properties
USING GIN ((setweight(to_tsvector('english'::regconfig, coalesce(title_en, '')), 'A') ||
            setweight(to_tsvector('arabic'::regconfig, coalesce(title_ar, '')), 'A')));

Notice the setweight and coalesce calls? They make zero sense until you've broken a query trying to combine two to_tsvector calls: coalesce keeps a NULL title from nulling out the whole concatenated vector, and the || operator merges the two vectors into one. We weighted both 'A' since the client couldn't decide which language was more important.
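One gotcha the snippet hides: PostgreSQL only uses an expression index when the query repeats the exact same expression. A sketch of the matching search query, with 'villa'/'فيلا' as stand-in search terms:

```sql
SELECT id, title_en, title_ar
FROM properties
WHERE (setweight(to_tsvector('english'::regconfig, coalesce(title_en, '')), 'A') ||
       setweight(to_tsvector('arabic'::regconfig, coalesce(title_ar, '')), 'A'))
      -- || on tsquery means OR: match the term in either language
      @@ (plainto_tsquery('english', 'villa') || plainto_tsquery('arabic', 'فيلا'));
```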


Performance Headache

Full-text queries without indexes are a nightmare. For a database with 400k+ properties, we saw response times jump from 50ms to 6 seconds after enabling Arabic parsing. The fix was obvious in hindsight: always use GIN. Always.

CREATE INDEX idx_fts_fast ON properties
USING GIN ((to_tsvector('english'::regconfig, body_text)));

Wait no—that didn't handle Arabic. Had to create separate indexes for each language and then combine them in queries. Maybe not 100% optimal, but at least it didn't time out clients.
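Roughly what the per-language setup looked like; the index and column names follow the snippets above, and the OR-combined query is my reconstruction of the approach:

```sql
CREATE INDEX idx_fts_en ON properties
    USING GIN ((to_tsvector('english'::regconfig, coalesce(title_en, ''))));
CREATE INDEX idx_fts_ar ON properties
    USING GIN ((to_tsvector('arabic'::regconfig, coalesce(title_ar, ''))));

-- Each OR arm matches one index, so the planner can combine the
-- two index scans (e.g. with a BitmapOr) instead of a seq scan
SELECT id
FROM properties
WHERE to_tsvector('english'::regconfig, coalesce(title_en, '')) @@ plainto_tsquery('english', 'villa')
   OR to_tsvector('arabic'::regconfig, coalesce(title_ar, '')) @@ plainto_tsquery('arabic', 'فيلا');
```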


The Day Arabic Queries Returned Empty Results

Mid-deployment something broke. English searches worked, but Arabic terms like "شقة" (apartment) returned nothing. I debugged for hours until I checked lexeme generation:

SELECT * FROM ts_debug('arabic', 'شقة دبي');

Turns out some Arabic particles weren't stripped correctly. A search for "شقة" got stemmed to "شقت": the ta marbuta (ة) was rewritten as if the input were a derived form the dictionary didn't recognize. Solution? I wrote a wrapper that runs normalize_arabic() on incoming search strings: a custom function that strips diacritics and standardizes variant letters. Brutal, but effective.
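The actual normalize_arabic() isn't shown here; a minimal sketch of the idea, with illustrative (not the client's exact) character substitutions:

```sql
CREATE OR REPLACE FUNCTION normalize_arabic(input text) RETURNS text AS $$
    SELECT translate(
        -- strip harakat (U+064B..U+0652) and the superscript alef (U+0670)
        regexp_replace(input, '[\u064B-\u0652\u0670]', '', 'g'),
        'أإآةى',   -- unify hamza-carrying alefs, ta marbuta, alef maqsura
        'اااهي'
    );
$$ LANGUAGE sql IMMUTABLE;

-- Then build the query with:
--   plainto_tsquery('arabic', normalize_arabic(:search_term))
```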


Why Care About This in the UAE?

Clients here expect dual-language support, period. Whether it's an Abu Dhabi startup or a regional holding company, you'll get asked "will people in Al Ain and Dubai both find this?" all the time. Arabic isn't an afterthought: they demand proper handling of تشديد (shadda, the doubling mark) and the letter ة (ta marbuta) at minimum. PostgreSQL can do it, but you'll burn some patience.

A similar problem came up in a plant care app called Greeny Corner (yes, on UAE App Stores)—searching plant names in both languages had similar gotchas. Same workaround: pre-normalize search terms before querying.


TL;DR

PostgreSQL's full-text search isn't magic. You can index Arabic content, but it takes elbow grease. Compile the right dictionaries. Use ts_debug to check lexemes. Don't assume arabic_ispell works out of the box. And for God's sake, always test with sample text from real clients.

If you're stuck fighting this on a tight deadline, ping me via sarahprofile.com/contact. I've burned too many hours on this to watch anyone else suffer.



Sarah

Senior Full-Stack Developer & PMP-Certified Project Lead — Abu Dhabi, UAE

7+ years building web applications for UAE & GCC businesses. Specialising in Laravel, Next.js, and Arabic RTL development.
