Last year, I spent three weeks fighting with PostgreSQL’s full-text search while building a plant care app for a client in Abu Dhabi. We needed to search plant names and care tips in both Arabic and English, and the database kept giving me garbage results for Arabic queries. Turns out, PG’s default settings treat Arabic like it’s just another Latin-language cousin — big mistake. Eventually, I got it working, and I’m writing this post to save you the 2 a.m. sessions I endured.
Basic Setup (The Easy Part)
Let’s start with a table of plant care tips — something like this:
CREATE TABLE plant_tips (
id SERIAL PRIMARY KEY,
title_en TEXT,
title_ar TEXT,
content_en TEXT,
content_ar TEXT
);For English-only search, you’d do:
SELECT * FROM plant_tips
WHERE to_tsvector('english', content_en) @@ to_tsquery('prune');Simple enough. But Arabic? Let’s talk about that elephant in the room.
Why Arabic Breaks Everything
PostgreSQL treats Arabic like it’s a slightly weird English cousin. In reality:
- •Arabic characters connect.
- •Many diacritics exist (like تَ or شُرْبَةٌ).
- •Stopwords are different (“ال” isn’t just “the” — sometimes).
- •Stemming works differently.
I’ll be real — my first attempt tried to_tsvector('arabic', content_ar) and returned zero useful results. Arabic text got split into random chunks, and searching for “نخيل” (palm tree) wouldn’t find “النخيل” because of the article “ال.”
Fixing Arabic Tokenization
Here’s what worked after trial and error:
- Normalize text before indexing — Strip diacritics using
unaccentextension:
CREATE EXTENSION IF NOT EXISTS unaccent;
UPDATE plant_tips SET content_ar_unaccented = unaccent('arabic', content_ar);- Create custom text search configuration:
CREATE TEXT SEARCH CONFIGURATION arabic_custom (COPY = simple);
-- Then add mappings for Arabic language...
-- (Full config omitted for brevity — see below)- Use a dedicated indexed column:
ALTER TABLE plant_tips ADD COLUMN search_vector_ar tsvector;
CREATE INDEX idx_search_vector_ar ON plant_tips USING GIN (search_vector_ar);Combining Both Languages
For a unified search box, I created a materialized view that concatenates normalized English and Arabic text:
CREATE MATERIALIZED VIEW combined_search AS
SELECT id,
to_tsvector('english', coalesce(content_en, '')) ||
to_tsvector('arabic', coalesce(content_ar_unaccented, '')) AS search_vector
FROM plant_tips;This way, users in Dubai can search “ palm tree” or “النخيل” interchangeably.
Annoying Pitfall: The || operator for tsvector only works if both sides have the same configuration. Took me two days to realize I’d mixed 'english' and 'arabic' in the same vector. Not that I want to admit it.
Real-World Test Case: Greeny Corner
The plant app that started this misery — Greeny Corner — went live on the UAE App Store after reworking the search like this. Now their users in Al Ain search for “طماطم” and get tomato care tips in Arabic, while expats type “tomato” and get the same.
Performance Gotchas
- •Index size: A GIN index on a 50k-row table with mixed text took 300MB. Expect that.
- •Update triggers: Add a trigger to refresh the search vector on insert/update:
CREATE OR REPLACE FUNCTION update_search_vector_ar()
RETURNS TRIGGER AS $$
BEGIN
NEW.search_vector_ar = to_tsvector('arabic_custom', coalesce(NEW.content_ar_unaccented, ''));
RETURN NEW;
END
$$ LANGUAGE plpgsql;
CREATE TRIGGER tsvector_update
BEFORE INSERT OR UPDATE ON plant_tips
FOR EACH ROW EXECUTE FUNCTION update_search_vector_ar();- •Query patterns: Wildcard searches (
LIKE) on big tables will destroy performance. Use full-text search for Arabic — notILIKE.
A UAE Developer’s Perspective
Clients here expect bi-lingual UX. I once had a construction company in Riyadh ask, “Why can’t my team search equipment in Arabic?” They didn’t want an afterthought — they wanted Arabic treated as first-class. PG’s default setup doesn’t do that, so you’ll need to customize.
Also, remember: Arabic is right-to-left. Store the text correctly, but that’s a frontend concern. The backend just needs clean, normalized vectors.
Final Thoughts
It’s been 18 months since I shipped this search logic for Greeny Corner. Since then, I’ve reused the same approach for a real estate listings platform (Reach Home Properties) where property titles switch between languages. The pattern works, but it’s finicky — especially when dealing with old PG versions that lacked proper Arabic dictionaries.
If you’re trying to do this and it’s not working, check your normalization first. Are you stripping accents? Using the right config? Is your GIN index actually being used? (Check EXPLAIN ANALYZE.)
Need help? Hit me up on sarahprofile.com/contact — I’ve been there, and I’ll probably have a sarcastic but helpful response ready.