I was debugging a Laravel app for a construction client in Abu Dhabi when I hit a wall. Their document management system needed full-text search across Arabic and English contracts. PostgreSQL's default configuration choked on Arabic text, and the search results were garbage. After three wasted hours, I remembered the unaccent extension and some dictionary configuration tweaks I’d read about. Fifteen minutes later, it worked perfectly. That’s the problem with documentation — the official stuff makes this feel more complex than it is. Let me walk you through how to solve this pain point without repeating my mistakes.
Basic Setup: Beyond the Default
Most tutorials start with to_tsvector and to_tsquery, but in the UAE market, you'll run into language requirements fast. Here's where Postgres falls short out of the box:
- •Arabic requires different tokenization rules
- •Diacritics (tashdeed, fatha) mess up matches
- •English and Arabic should be weighted differently in results
First, enable required extensions:
CREATE EXTENSION IF NOT EXISTS unaccent;
CREATE EXTENSION IF NOT EXISTS pg_trgm;I'll be real — pg_trgm wasn't obvious to me until I saw it mentioned in a 2019 case study from a Riyadh startup. It helps with partial matches in Arabic text, which matters when users type search queries with missing diacritics.
Configuration: Don't Touch the Default Dictionaries
Here's the trap: most guides tell you to modify simple, english, or arabic text search configurations directly. Don't. Instead, create custom ones. One of my past clients (a news archive app in Jeddah) had me rollback changes multiple times when we broke default configurations.
Make your own:
CREATE TEXT SEARCH CONFIGURATION arabic_custom (PARSER = default);
ALTER TEXT SEARCH CONFIGURATION arabic_custom
ADD MAPPING FOR word WITH arabic_stem;
-- Add these for modern dialect handling
ALTER TEXT SEARCH CONFIGURATION arabic_custom
ADD MAPPING FOR hword_hyphen, hyphenated_word WITH simple;This configuration handles modern MSA and Gulf dialects, but don't stop here. Add custom dictionaries if your users swear in mixed English/Arabic (looking at you, Abu Dhabi social media devs).
Data Cleaning: Pre-Indexing Matters
The unaccent extension is your friend. Arabic speakers type inconsistently with diacritics. One search test I ran for a DAS Holding subsidiary showed: queries without diacritics returned 60% fewer documents.
Normalize text before indexing:
// In a Laravel model observer
public function creating(Document $doc) {
$doc->searchable_content = DB::raw("unaccent('{$doc->content}')");
}Yes, this duplicates data. No, you don't want to run unaccent() on every query. Disk is cheaper than annoyed users.
Indexing Strategy: The Good, the Bad, and the Slow
A UAE real estate client (not Reach Home, but close) had me benchmark multiple approaches. We ended up with a GIN index combining both languages:
CREATE INDEX idx_fts_both
ON documents
USING GIN(
setweight(to_tsvector('arabic_custom', coalesce(content_ar, ''))) ||
setweight(to_tsvector('english', coalesce(content_en, '')), 'B')
);Note the weights: 'A' for Arabic gives priority in relevance scores. We changed this twice during the project before finding the right balance for Abu Dhabi business users who often mix languages.
Query Time: Don't Forget the Ranking
This is where Postgres tutorials fail you. The default @@ operator works for binary matches but doesn't help with ranking results. Try:
SELECT ts_rank_cd(
setweight(to_tsvector('arabic_custom', coalesce(content_ar, ''))) ||
setweight(to_tsvector('english', coalesce(content_en, '')), 'B'),
phraseto_tsquery('fat car')
) as rank
FROM documents
ORDER BY rank DESC;I wasted an afternoon on this — phraseto_tsquery handles multi-word searches better than to_tsquery with manual AND/OR. Your Arabic speakers won’t care, but the expats typing "VIP service ممتاز" will notice.
Gotcha: The Language Detection Shortfall
One project I did for a food delivery app (think Greeny Corner but for shawarma) needed automatic language detection. Postgres can’t do that natively — we had to add a Python extension. Don’t bother reinventing the wheel. Use a library to tag text language before inserting/updating.
Real-World Example: Tawasul Limo Booking
The client wanted drivers to search for customer notes in either language. We built a simple UI with React Native that sent search terms to a Postgres view. The key tweak? We excluded certain common words like "al" (ال) from stop words, because Arabic articles actually matter in context sometimes.
Final metrics: ~4 million records, 150ms average query time with 35 concurrent users. Not bad for a server with 8GB RAM, honestly.
Closing Thoughts (But Not That Kind)
This setup isn't perfect. I had to manually tweak the Arabic stemmer dict to handle UAE-specific terms for one client — turns out "burj" as a keyword needs special case handling. But the beauty of Postgres is its flexibility if you don't let the complexity intimidate you.
Need help with a search implementation in Dubai or beyond? I've done 40+ of these. Contact page is always open.