Last year, I spent entirely too long fighting PostgreSQL's full-text search (FTS) for a client. They run a real estate platform in Abu Dhabi—let's call them "Reach Home Properties" because that's literally the name on their contract. They wanted listings searchable in both languages: Arabic for locals, English for expats. The first version we shipped had zero support for this. I knew basic FTS, but Arabic threw me for a loop. Let me save you those 8 hours.
Basics: How Not to Mess Up
I'll assume you've used PostgreSQL's FTS before. You've probably written queries like WHERE to_tsvector('english', title) @@ to_tsquery('english', 'search'). But that's English-only. What happens when your table has mixed-language content?
PostgreSQL defaults to the simple configuration, which just splits on whitespace and lowercases. That won't handle Arabic roots: "كتب" (he wrote), "كتاب" (book), and "مكتبة" (library) all share the root "كتب", but simple treats them as three unrelated tokens. The built-in Arabic dictionary? Let's say it's underwhelming; it doesn't stem these down to a common root out of the box.
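You can probe the built-in stemmer directly with ts_lexize to see how far it gets. The snowball dictionary is named arabic_stem in PostgreSQL 10 and later (if yours differs, check pg_ts_dict); I'm not promising what it returns for these words, which is rather the point:

```sql
-- Ask the built-in snowball dictionary for the lexeme of each word.
-- Ideally all three would reduce to the shared root كتب; in practice
-- the snowball stemmer only strips affixes, so compare the outputs yourself.
SELECT ts_lexize('arabic_stem', 'كتب');
SELECT ts_lexize('arabic_stem', 'كتاب');
SELECT ts_lexize('arabic_stem', 'مكتبة');
```

If the three results don't match, words sharing a root won't match each other at query time either.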
Configuring for Arabic
First thing I did: create a new text search configuration that actually respects Arabic morphology. We used:
-- The hunspell-backed dictionary has to exist before you can map it.
CREATE TEXT SEARCH DICTIONARY arabic_ispell (
    TEMPLATE = ispell,
    DictFile = ar,    -- looks for ar.dict in the server's tsearch_data directory
    AffFile = ar      -- looks for ar.affix
);

-- Note: PostgreSQL 10+ ships a built-in "arabic" (snowball) configuration;
-- this creates public.arabic, which shadows it on the default search_path.
CREATE TEXT SEARCH CONFIGURATION arabic (PARSER = default);

ALTER TEXT SEARCH CONFIGURATION arabic
    ADD MAPPING FOR word, hword, hword_part
    WITH arabic_ispell, simple;
Yes, that's the arabic_ispell dictionary from the snippet above. Wait, does that work? Short answer: sometimes. Longer answer: you need to find a hunspell Arabic dictionary, rename the .aff/.dic pair to .affix/.dict (and make sure they're UTF-8), and drop them into the server's tsearch_data directory. No restart needed, despite what I believed at 2 a.m.; dictionary files are loaded per session, so just reconnect. It took forever anyway. I'll skip the OS-specific steps unless you want trauma.
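Before wiring the dictionary into queries, it's worth a quick sanity check with ts_lexize (arabic_ispell here is whatever name you gave your hunspell-backed dictionary):

```sql
-- Returns an array of lexemes if the word is recognized,
-- NULL if it isn't, and raises an error if the .affix/.dict
-- files failed to load at all.
SELECT ts_lexize('arabic_ispell', 'مكتبة');
```

A hard error at this step almost always means the files are in the wrong directory, wrongly named, or not UTF-8.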
Mixed Input? Handle Both Languages
Now about clients blending languages. One listing's title might say "3 Bedroom فيلا for Sale". We needed both languages indexed together. I ended up with a GIN index like:
CREATE INDEX idx_fts_all_langs ON properties
USING GIN( (setweight(to_tsvector('english'::regconfig, coalesce(title_en, '')), 'A') ||
setweight(to_tsvector('arabic'::regconfig, coalesce(title_ar, '')), 'A')) );
Notice the setweight and coalesce calls? They make zero sense until you've broken a query trying to combine two to_tsvector calls: || concatenates the vectors, coalesce keeps a NULL column from nulling out the whole expression, and setweight labels each vector so you can rank matches later. We weighted both 'A' since the client couldn't decide which language mattered more.
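One gotcha with expression indexes: the planner only uses them when the query repeats the indexed expression verbatim. A query against the index above might look like this (column names as in the index definition; plainto_tsquery is the conservative choice, websearch_to_tsquery exists from PostgreSQL 11):

```sql
-- The tsvector expression must match the index definition exactly,
-- otherwise the planner falls back to a sequential scan.
-- For tsquery values, || means OR, so either language can match.
SELECT id, title_en, title_ar
FROM properties
WHERE (setweight(to_tsvector('english'::regconfig, coalesce(title_en, '')), 'A') ||
       setweight(to_tsvector('arabic'::regconfig, coalesce(title_ar, '')), 'A'))
      @@ (plainto_tsquery('english', 'villa for sale') ||
          plainto_tsquery('arabic', 'فيلا'));
```

Run it under EXPLAIN once; if you don't see a Bitmap Index Scan on idx_fts_all_langs, the expressions have drifted apart.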
Performance Headache
Full-text queries without indexes are a nightmare. On a table with 400k+ properties, response times jumped from 50 ms to 6 seconds after we enabled Arabic parsing. The fix was obvious in hindsight: always use a GIN index. Always.
CREATE INDEX idx_fts_fast ON properties
USING GIN( (to_tsvector('english'::regconfig, body_text) ) );
Wait, no: that didn't handle Arabic. We had to create a separate index per language and combine them in queries. Maybe not 100% optimal, but at least clients stopped timing out.
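A sketch of the two-index layout we ended up with might look like this (title_en/title_ar assumed from the earlier index; adapt to your columns). Each arm of the OR matches its own index expression, so the planner can combine both indexes with a BitmapOr:

```sql
-- One GIN index per language.
CREATE INDEX idx_fts_en ON properties
USING GIN (to_tsvector('english'::regconfig, coalesce(title_en, '')));

CREATE INDEX idx_fts_ar ON properties
USING GIN (to_tsvector('arabic'::regconfig, coalesce(title_ar, '')));

-- Query both sides; each @@ clause repeats its index expression verbatim.
SELECT id
FROM properties
WHERE to_tsvector('english'::regconfig, coalesce(title_en, ''))
        @@ plainto_tsquery('english', 'apartment')
   OR to_tsvector('arabic'::regconfig, coalesce(title_ar, ''))
        @@ plainto_tsquery('arabic', 'شقة');
```

Two smaller indexes also means an update to one title column only touches one index.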
The Day Arabic Queries Returned Empty Results
Mid-deployment something broke. English searches worked, but Arabic terms like "شقة" (apartment) returned nothing. I debugged for hours until I checked lexeme generation:
SELECT * FROM ts_debug('arabic', 'شقة دبي');
Turns out some attached Arabic prefixes weren't being stripped correctly, and "شقة" got stemmed into "شقت", the taa marbuta (ة) folded into a plain taa, so the stored lexeme never matched the query. The parser treated the input like a derived form it didn't recognize. Solution? I wrote a wrapper that runs normalize_arabic() on incoming search strings; it's a custom function that strips diacritics and standardizes characters before anything hits to_tsquery. Brutal, but effective.
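normalize_arabic() is our own code, not anything PostgreSQL ships. A minimal sketch of the idea, covering only diacritic stripping and alef normalization, could look like this; a real version needs more cases, and whether to fold ة itself is a product decision, not a technical one:

```sql
-- Hypothetical sketch of a normalizer; the ranges \u064B-\u0652 are the
-- common harakat (fathatan..sukun), \u0670 is the superscript alef.
CREATE OR REPLACE FUNCTION normalize_arabic(input text)
RETURNS text LANGUAGE sql IMMUTABLE AS $$
  SELECT translate(
           regexp_replace(input, '[\u064B-\u0652\u0670]', '', 'g'),
           'أإآى',   -- alef variants and alef maqsura...
           'اااي'    -- ...collapsed to their plain forms
         );
$$;
```

The crucial part is applying the same function on both sides: to the text you index and to every incoming search string, or the lexemes drift apart again.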
Why Care About This in the UAE?
Clients here expect dual-language support, period. Whether it's an Abu Dhabi startup or a regional holding company, you'll get asked "will people in Al Ain and Dubai both find this?" all the time. Arabic isn't an afterthought: they demand proper handling of تشديد (the shadda, consonant doubling) and ة (the taa marbuta) at minimum. PostgreSQL can do it, but you'll burn some patience.
A similar problem came up in a plant care app called Greeny Corner (yes, on UAE App Stores)—searching plant names in both languages had similar gotchas. Same workaround: pre-normalize search terms before querying.
TL;DR
PostgreSQL's full-text search isn't magic. You can index Arabic content, but it takes elbow grease. Compile the right dictionaries. Use ts_debug to check lexemes. Don't assume arabic_ispell works out of the box. And for God's sake, always test with sample text from real clients.
If you're stuck fighting this on a tight deadline, ping me via sarahprofile.com/contact. I've burned too many hours on this to watch anyone else suffer.