Skip to main content
Tutorial

PostgreSQL Full-Text Search for Arabic and English: A Practical Guide

4 min read

How I fixed Arabic full-text searches in PostgreSQL for a plant care app while dealing with diacritics, stemming, and configuration nightmares.

PostgreSQLsearchArabicLaravelNext.js

Last year, I spent three weeks fighting with PostgreSQL’s full-text search while building a plant care app for a client in Abu Dhabi. We needed to search plant names and care tips in both Arabic and English, and the database kept giving me garbage results for Arabic queries. Turns out, PG’s default settings treat Arabic like it’s just another Latin-language cousin — big mistake. Eventually, I got it working, and I’m writing this post to save you the 2 a.m. sessions I endured.

Basic Setup (The Easy Part)

Let’s start with a table of plant care tips — something like this:

sql
CREATE TABLE plant_tips (
  id SERIAL PRIMARY KEY,
  title_en TEXT,
  title_ar TEXT,
  content_en TEXT,
  content_ar TEXT
);

For English-only search, you’d do:

sql
SELECT * FROM plant_tips
WHERE to_tsvector('english', content_en) @@ to_tsquery('prune');

Simple enough. But Arabic? Let’s talk about that elephant in the room.

Why Arabic Breaks Everything

PostgreSQL treats Arabic like it’s a slightly weird English cousin. In reality:

  • Arabic characters connect.
  • Many diacritics exist (like تَ or شُرْبَةٌ).
  • Stopwords are different (“ال” isn’t just “the” — sometimes).
  • Stemming works differently.

I’ll be real — my first attempt tried to_tsvector('arabic', content_ar) and returned zero useful results. Arabic text got split into random chunks, and searching for “نخيل” (palm tree) wouldn’t find “النخيل” because of the article “ال.”

Fixing Arabic Tokenization

Here’s what worked after trial and error:

  1. Normalize text before indexing — Strip diacritics using unaccent extension:
sql
CREATE EXTENSION IF NOT EXISTS unaccent;

UPDATE plant_tips SET content_ar_unaccented = unaccent('arabic', content_ar);
  1. Create custom text search configuration:
sql
CREATE TEXT SEARCH CONFIGURATION arabic_custom (COPY = simple);
-- Then add mappings for Arabic language...
-- (Full config omitted for brevity — see below)
  1. Use a dedicated indexed column:
sql
ALTER TABLE plant_tips ADD COLUMN search_vector_ar tsvector;
CREATE INDEX idx_search_vector_ar ON plant_tips USING GIN (search_vector_ar);

Combining Both Languages

For a unified search box, I created a materialized view that concatenates normalized English and Arabic text:

sql
CREATE MATERIALIZED VIEW combined_search AS
SELECT id,
  to_tsvector('english', coalesce(content_en, '')) ||
  to_tsvector('arabic', coalesce(content_ar_unaccented, '')) AS search_vector
FROM plant_tips;

This way, users in Dubai can search “ palm tree” or “النخيل” interchangeably.

Annoying Pitfall: The || operator for tsvector only works if both sides have the same configuration. Took me two days to realize I’d mixed 'english' and 'arabic' in the same vector. Not that I want to admit it.

Real-World Test Case: Greeny Corner

The plant app that started this misery — Greeny Corner — went live on the UAE App Store after reworking the search like this. Now their users in Al Ain search for “طماطم” and get tomato care tips in Arabic, while expats type “tomato” and get the same.

Performance Gotchas

  • Index size: A GIN index on a 50k-row table with mixed text took 300MB. Expect that.
  • Update triggers: Add a trigger to refresh the search vector on insert/update:
sql
CREATE OR REPLACE FUNCTION update_search_vector_ar()
RETURNS TRIGGER AS $$
BEGIN
  NEW.search_vector_ar = to_tsvector('arabic_custom', coalesce(NEW.content_ar_unaccented, ''));
  RETURN NEW;
END
$$ LANGUAGE plpgsql;

CREATE TRIGGER tsvector_update
BEFORE INSERT OR UPDATE ON plant_tips
FOR EACH ROW EXECUTE FUNCTION update_search_vector_ar();
  • Query patterns: Wildcard searches (LIKE) on big tables will destroy performance. Use full-text search for Arabic — not ILIKE.

A UAE Developer’s Perspective

Clients here expect bi-lingual UX. I once had a construction company in Riyadh ask, “Why can’t my team search equipment in Arabic?” They didn’t want an afterthought — they wanted Arabic treated as first-class. PG’s default setup doesn’t do that, so you’ll need to customize.

Also, remember: Arabic is right-to-left. Store the text correctly, but that’s a frontend concern. The backend just needs clean, normalized vectors.

Final Thoughts

It’s been 18 months since I shipped this search logic for Greeny Corner. Since then, I’ve reused the same approach for a real estate listings platform (Reach Home Properties) where property titles switch between languages. The pattern works, but it’s finicky — especially when dealing with old PG versions that lacked proper Arabic dictionaries.

If you’re trying to do this and it’s not working, check your normalization first. Are you stripping accents? Using the right config? Is your GIN index actually being used? (Check EXPLAIN ANALYZE.)

Need help? Hit me up on sarahprofile.com/contact — I’ve been there, and I’ll probably have a sarcastic but helpful response ready.

S

Sarah

Senior Full-Stack Developer & PMP-Certified Project Lead — Abu Dhabi, UAE

7+ years building web applications for UAE & GCC businesses. Specialising in Laravel, Next.js, and Arabic RTL development.

Work with Sarah