Optimizing a Streaming API: A Technical Journey Through FastAPI, Gunicorn, and Nginx

Today, I focused on optimizing a streaming API built with FastAPI, hosted on an EC2 instance, and served through Nginx. What began as a performance tuning task evolved into an in-depth analysis of server configuration, proxy settings, and API optimization. This post details the technical challenges I encountered and the solutions I implemented.

Initial Setup

The API was built using FastAPI, a high-performance Python web framework. The stack included:

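Gunicorn as the process manager, running ASGI workers (typically Uvicorn workers, since FastAPI is an ASGI framework rather than a WSGI one)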
Nginx as the reverse proxy
EC2 instance for hosting
OpenAI’s GPT model for generating streaming responses
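For context, a streaming endpoint in a stack like this usually wraps an async generator that emits one chunk per model token. The sketch below is not the original code: `fake_token_stream`, `format_sse`, and `sse_events` are hypothetical names, and the token source is a stand-in for the model's streamed output. It only shows how chunks could be framed as SSE events using the standard library.

```python
import asyncio
from typing import AsyncIterator, List


async def fake_token_stream() -> AsyncIterator[str]:
    """Hypothetical stand-in for the model's streamed output."""
    for token in ["Hello", " ", "world"]:
        await asyncio.sleep(0)  # yield control, as a real network read would
        yield token


def format_sse(data: str) -> str:
    """Frame one payload as a Server-Sent Events message."""
    # Each payload line gets its own "data:" field; a blank line ends the event.
    lines = data.split("\n")
    return "".join(f"data: {line}\n" for line in lines) + "\n"


async def sse_events(tokens: AsyncIterator[str]) -> AsyncIterator[str]:
    """Emit one SSE event per token so each can be sent as soon as it exists."""
    async for token in tokens:
        yield format_sse(token)


async def collect() -> List[str]:
    """Drain the stream into a list (only useful for inspecting the framing)."""
    return [event async for event in sse_events(fake_token_stream())]


if __name__ == "__main__":
    for event in asyncio.run(collect()):
        print(repr(event))
```

In FastAPI, a generator like `sse_events(...)` would typically be returned as `StreamingResponse(sse_events(...), media_type="text/event-stream")`. Whether the chunks reach the client one by one then depends on every proxy in between, which is what the rest of this post is about.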

Challenge 1: Performance Issues with Cloudflare

Initially, I observed significant latency in API response chunking when accessed through our Cloudflare-proxied domain. To isolate the issue, I temporarily disabled Cloudflare proxying. This test revealed that the problem wasn’t Cloudflare-specific, but rather stemmed from our server configuration.

Challenge 2: Persistent Streaming Latency

Even with Cloudflare disabled, the API exhibited slow chunking when accessed through our domain. To narrow it down further, I bypassed Nginx and called the API directly on port 8000 of the EC2 instance. Streaming performance improved in this test, indicating that Nginx was likely the bottleneck.
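One way to make this comparison less subjective is to record when each chunk arrives and compare the gaps between chunks for each path. The following is a minimal sketch under assumptions: the function names are hypothetical, it uses a plain GET via the standard library, and the real endpoint may need a different method, headers, or payload.

```python
import time
import urllib.request
from statistics import mean
from typing import Dict, Iterable, List


def chunk_gaps(timestamps: Iterable[float]) -> List[float]:
    """Return the delay between each pair of consecutive chunk arrivals."""
    ts = list(timestamps)
    return [later - earlier for earlier, later in zip(ts, ts[1:])]


def summarize(gaps: List[float]) -> Dict[str, float]:
    """Summarize gaps; a few large gaps suggest something is buffering chunks."""
    if not gaps:
        return {"count": 0, "mean": 0.0, "max": 0.0}
    return {"count": len(gaps), "mean": mean(gaps), "max": max(gaps)}


def time_stream(url: str, read_size: int = 1024, timeout: float = 300.0) -> List[float]:
    """Record the arrival time of each chunk of a streaming response."""
    arrivals: List[float] = []
    with urllib.request.urlopen(url, timeout=timeout) as response:
        while True:
            # read1() returns as soon as some data is available instead of
            # waiting to fill the whole buffer.
            chunk = response.read1(read_size)
            if not chunk:
                break
            arrivals.append(time.monotonic())
    return arrivals


# Example (placeholder URLs, not the real endpoints):
#   direct = summarize(chunk_gaps(time_stream("http://<ec2-host>:8000/api/...")))
#   proxied = summarize(chunk_gaps(time_stream("https://<domain>/api/...")))
```

If the proxied path shows a much larger maximum gap and far fewer chunks than the direct path, the responses are probably being buffered somewhere between the client and the application.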

Challenge 3: Nginx Configuration Optimization

Analysis of the Nginx configuration revealed that it wasn’t optimized for handling streaming responses and Server-Sent Events (SSE). I implemented the following key changes to the Nginx configuration:

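Disabled response buffering (proxy_buffering off)
Disabled caching for the API location (proxy_cache off)
Switched proxying to HTTP/1.1 and cleared the Connection header, which upstream keepalive connections require
Disabled chunked transfer encoding (chunked_transfer_encoding off)
Increased read and send timeouts to 300 seconds
Disabled gzip compression for the API location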

Here’s the optimized Nginx configuration:

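location /api/ {
    # Pass the original host and client details through to the app
    proxy_set_header Host $http_host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;

    # Upstream app server, reached over a Unix socket
    proxy_pass http://unix:/home/ubuntu/catalyst-api/catalyst.sock;

    # Forward each chunk as it arrives instead of buffering the response
    proxy_buffering off;
    proxy_cache off;

    # HTTP/1.1 to the upstream, with the Connection header cleared
    proxy_http_version 1.1;
    proxy_set_header Connection '';

    chunked_transfer_encoding off;

    # Allow long-lived streams before Nginx gives up on the upstream
    proxy_read_timeout 300s;
    proxy_send_timeout 300s;

    # Compression can delay small chunks
    gzip off;
}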
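To keep these settings from silently regressing, a small check could scan a location block for the directives that matter most for streaming. This is a rough sketch, not a real Nginx parser: the function names are hypothetical, it only understands simple one-line `name value;` directives, and the expected values are taken from the configuration above.

```python
import re
from typing import Dict, List

# Directive -> value expected for a streaming-friendly location block,
# taken from the configuration shown in this post.
EXPECTED: Dict[str, str] = {
    "proxy_buffering": "off",
    "proxy_cache": "off",
    "proxy_http_version": "1.1",
    "gzip": "off",
}


def parse_directives(config_text: str) -> Dict[str, str]:
    """Map each simple `name value;` directive to its value (last one wins)."""
    directives: Dict[str, str] = {}
    for line in config_text.splitlines():
        # Drop trailing comments; values containing '#' are not supported here.
        line = line.split("#", 1)[0].strip()
        match = re.match(r"^(\w+)\s+(.+?);$", line)
        if match:
            directives[match.group(1)] = match.group(2)
    return directives


def missing_streaming_settings(config_text: str) -> List[str]:
    """List the expected directives that are absent or set to another value."""
    found = parse_directives(config_text)
    return [
        f"{name} {value};"
        for name, value in EXPECTED.items()
        if found.get(name) != value
    ]
```

Calling `missing_streaming_settings(...)` on the text of the location block returns an empty list when all four settings are present, and otherwise the lines that would need to be added or changed.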

Results

After these configuration changes, streaming through Nginx improved significantly: response chunks arrived as quickly as they did when calling the API directly on port 8000 of the EC2 instance.

Key Learnings

Systematic Isolation: Testing the API at different levels (with Cloudflare, without Cloudflare, directly on EC2) was crucial in identifying the source of performance issues.
Nginx Configuration for Streaming: Standard Nginx configurations are often insufficient for streaming responses and SSE. Specific optimizations are necessary for these use cases.
Full Stack Understanding: Issues can manifest at any level of the application stack. A comprehensive understanding of each component (FastAPI, Gunicorn, Nginx, Cloudflare) is essential for effective troubleshooting.
Continuous Performance Monitoring: As the API scales, ongoing monitoring and optimization will be necessary. This experience provided valuable insights for future performance tuning efforts.

Conclusion

Optimizing a streaming API requires more than efficient code; it demands careful configuration of the entire server stack. Through methodical isolation of issues and a deep understanding of each system component, I was able to significantly enhance the API’s performance.

This experience underscores the importance of thorough server configuration analysis when addressing performance issues in complex API setups.