Tutorial

How to Stream AI Responses in Next.js Without Losing Your Mind

5 min read

A developer-friendly guide to streaming AI outputs in Next.js, with lessons from real UAE projects.

Next.js · AI · Streaming · TypeScript · Firebase · Tutorial · UAE developer

I was once stuck trying to hook up an AI-powered plant identifier in Greeny Corner, where users could upload photos and get real-time care instructions. The AI model took 5 seconds to process each request — a lifetime in user-experience terms. I needed to stream the response chunks as they came in instead of making visitors stare at a loading spinner until the full output arrived.

Spoiler alert: getting Next.js to handle real-time streaming from a provider like Google Gemini or OpenAI isn't as simple as res.send() followed by res.end().

Setting Up the Next.js API Route

Next.js API routes default to buffering responses. To stream, you need to write to the raw http.ServerResponse object and keep the handler alive until the stream finishes. Here's what I learned the hard way (full sketch after the list):

  1. Return a promise from your API handler to prevent Next.js from auto-ending the response.
  2. Raise or disable responseLimit in your API route's config export if you're proxying large AI outputs:

```js
// pages/api/ai-endpoint.js: responseLimit belongs in the route's
// config export, not next.config.js
export const config = {
  api: {
    responseLimit: '10mb'
  }
};
```

  3. Set headers manually before sending the first chunk:

```js
res.writeHead(200, {
  'Content-Type': 'text/plain',
  'X-Content-Type-Options': 'nosniff'
});
```
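Putting those pieces together, here's a minimal sketch of the whole route. streamFromProvider is a hypothetical helper standing in for whatever your AI SDK exposes; the Gemini and OpenAI SDKs both return async-iterable streams:

```js
// pages/api/ai-endpoint.js: a minimal sketch, not a drop-in implementation.
// streamFromProvider is a hypothetical helper that yields text chunks.
export default async function handler(req, res) {
  res.writeHead(200, {
    'Content-Type': 'text/plain',
    'X-Content-Type-Options': 'nosniff'
  });

  // The async handler returns a promise, so Next.js waits for the
  // stream to finish instead of auto-ending the response
  for await (const chunk of streamFromProvider(req.body)) {
    res.write(chunk); // flush each chunk to the client as it arrives
  }

  res.end();
}
```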

I once forgot to raise responseLimit while helping an Abu Dhabi logistics client — their AI-generated shipping reports kept crashing with "Response size exceeded 4MB". That took three coffees to debug.

Handling Streaming on the Client

Most React tutorials suggest fetch() with response.text(), but that only gives you the full response after it finishes. For real streaming, you need to read response.body as a ReadableStream. Here's how I structured the component:

```tsx
useEffect(() => {
  const ctrl = new AbortController();
  const decoder = new TextDecoder();

  fetch('/api/ai-endpoint', {
    method: 'POST',
    body: JSON.stringify(input),
    signal: ctrl.signal
  }).then(async (response) => {
    const reader = response.body?.getReader();
    if (!reader) return;
    // Pull chunks in a read() loop, appending text as it arrives.
    // setOutput is the useState<string> setter from this component.
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      setOutput((prev) => prev + decoder.decode(value, { stream: true }));
    }
  }).catch((err) => {
    if (err.name !== 'AbortError') console.error(err);
  });

  return () => ctrl.abort();
}, [input]);
```

Important gotcha: If you're using React Query or SWR, this breaks their caching assumptions. For AI streaming, I usually go plain fetch() and manually handle state.

Dealing with Edge Cases

  • Don't assume all clients can handle streaming. Add fallback logic for older browsers.
  • Always implement timeout handling — I've seen AI APIs hang indefinitely when users cancel requests mid-stream. A watchdog sketch follows this list.
  • Use text/event-stream content type for Server-Sent Events (SSE), but know that SSE has limitations compared to WebSockets.
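For the timeout case, a watchdog around AbortController has served me well. A minimal sketch, assuming a 15-second stall budget and a hypothetical handleChunk callback:

```ts
const ctrl = new AbortController();
let watchdog = setTimeout(() => ctrl.abort(), 15_000);

const response = await fetch('/api/ai-endpoint', {
  method: 'POST',
  body: JSON.stringify(input),
  signal: ctrl.signal
});

const reader = response.body!.getReader();
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  // Reset the watchdog on every chunk so only genuine stalls abort
  clearTimeout(watchdog);
  watchdog = setTimeout(() => ctrl.abort(), 15_000);
  handleChunk(value); // hypothetical: decode and render the chunk
}
clearTimeout(watchdog);
```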

I once wasted two hours trying to stream from a Laravel backend to a Next.js frontend because Apache was buffering the response. The fix? Adding SetEnvIfNoCase X-Requested-With "XMLHttpRequest" nokeepalive to .htaccess.

Frequently Asked Questions

How do I fix CORS issues when streaming from Next.js?

Add your client's domain to the API response headers before sending the first chunk. Example:

```js
res.setHeader('Access-Control-Allow-Origin', 'https://my-app.com');
res.setHeader('Access-Control-Allow-Methods', 'POST, OPTIONS');
```

If you're working with UAE companies that need Arabic language support, also set Access-Control-Allow-Headers: Content-Type so preflighted requests declaring a UTF-8 charset go through.
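One more CORS detail: browsers preflight the POST with an OPTIONS request, so answer it (with the headers above already set) before any streaming logic runs. A minimal sketch:

```js
// Answer the CORS preflight before touching the AI provider
if (req.method === 'OPTIONS') {
  res.status(204).end();
  return;
}
```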

What's the difference between streaming and Server-Sent Events (SSE)?

Streaming sends raw text/data in chunks over a single connection. SSE wraps each chunk in a data:-prefixed payload (optionally tagged with event:) and gets you built-in reconnect logic via EventSource. Use SSE for chatbots but plain streaming for one-off AI outputs — I do this in Tawasul Limo.
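The difference on the wire is small. A sketch of both formats, assuming the same res object from the API route above:

```js
// Plain chunked streaming: write raw text as it arrives
res.write('The ficus prefers indirect light. ');

// SSE: Content-Type is text/event-stream, and each message is a
// data: line terminated by a blank line
res.write('data: The ficus prefers indirect light.\n\n');
```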

How can I improve streaming performance on Vercel?

Stick with the Edge runtime if you're using Vercel; the Node.js runtime has higher latency for streaming endpoints. Also compress responses with gzip:

```js
import { createGzip } from 'zlib';
import { pipeline } from 'stream';

// aiStream is whatever readable stream your AI provider returns
pipeline(aiStream, createGzip(), res, (err) => {
  if (err) console.error('Streaming pipeline failed:', err);
});
```

Does streaming AI responses cost extra token fees?

Yes. Providers charge for both input and output tokens, and streaming doesn't reduce costs — it just makes long responses feel faster. With UAE clients, I track token usage via middleware and show users an estimated charge before processing (rough sketch below).
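The middleware itself isn't shown here, but the estimate can be as crude as the common four-characters-per-token heuristic. A rough sketch, not the exact production code:

```ts
// Rough pre-request cost estimate. ~4 characters per token is a common
// heuristic; use your provider's tokenizer for exact counts.
function estimateCost(prompt: string, ratePerToken: number): number {
  const tokens = Math.ceil(prompt.length / 4);
  return tokens * ratePerToken;
}
```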


Want to skip the debugging nightmares? Book a free consultation and I’ll help you integrate AI streaming with TypeScript and Firebase. I’ve done 40+ projects in the GCC region — including a luxury car booking platform that handles 12,000+ monthly requests.


Sarah

Senior Full-Stack Developer & PMP-Certified Project Lead — Abu Dhabi, UAE

7+ years building web applications for UAE & GCC businesses. Specialising in Laravel, Next.js, and Arabic RTL development.

Work with Sarah