Selecting a Python App Server for a REST API

September 14, 2016

At Insight Stash we use a Python-based stack, and choosing the right application server for serving surveys is of high importance. Unfortunately, our research showed that there are no up-to-date benchmarks of Python application servers to help us decide on the fastest one. So we decided to carry out our own testing, the results of which you can find in this article.

Server testing infrastructure

[Diagram: server infrastructure]

Each server in the infrastructure is a separate virtual machine in the same network, for a total of 4 servers: Web, App, Database, and a separate Test Server. All test requests originate from the Test Server and target the Web or App servers.

Testing methodology

We tested a number of different App Server implementations: Gunicorn with 4 different worker types, uWSGI, the Flask dev server, Waitress, and NGINX on its own to establish a baseline for the virtual machine. Each test was carried out 10 times, and the average result across the 10 runs is what you will see in the charts.

Routes tested

We tested 3 different routes in the implementation (a sketch of the two app-server routes follows this list):

  1. NGINX serving a static HTML file, to measure the performance of the virtual server and establish a performance and networking baseline. NGINX should show much higher performance serving a static file than an app server rendering a template or an API response.
  2. Template generation and response by the App Server. This request goes directly to the App server, bypassing the NGINX proxy. The template is a simple Jinja2 template with no variables or logic.
  3. API request. This request also goes directly to the App server, bypassing the NGINX proxy. The API request is more computationally intensive: it returns a JSON response and performs one DB write and one DB read, after which it further filters the results before returning them.
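
A minimal sketch of what the two app-server routes might have looked like, assuming Flask with Flask-SQLAlchemy (the Survey model, route paths, and database URI are hypothetical stand-ins for our real code):

    from flask import Flask, jsonify, render_template
    from flask_sqlalchemy import SQLAlchemy

    app = Flask(__name__)
    app.config["SQLALCHEMY_DATABASE_URI"] = "postgresql://app@db/surveys"  # placeholder
    db = SQLAlchemy(app)

    class Survey(db.Model):
        # Hypothetical model standing in for our real schema.
        id = db.Column(db.Integer, primary_key=True)
        active = db.Column(db.Boolean, default=True)

    @app.route("/template")
    def template_route():
        # Route 2: render a simple Jinja2 template with no variables or logic.
        return render_template("survey.html")

    @app.route("/api")
    def api_route():
        # Route 3: one DB write and one DB read, then filter the results
        # in Python before returning JSON.
        db.session.add(Survey())
        db.session.commit()
        surveys = Survey.query.all()
        return jsonify(surveys=[s.id for s in surveys if s.active])
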
Server configuration

All machines in the diagram are virtual machines. 2 different configurations were tested:

  • 1 CPU, 2 GB RAM, and an SSD
  • 2 CPUs, 4 GB RAM, and an SSD

App servers were started with the number of workers set to #CPUs * 3; example launch commands are shown below.
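
On the 1-CPU machine that works out to 3 workers per server. Illustrative launch commands (the app:app module path and port numbers are assumptions, not our exact setup):

    gunicorn -w 3 app:app
    gunicorn -w 3 -k gevent app:app
    gunicorn -w 3 -k eventlet app:app
    gunicorn -w 3 -k meinheld.gmeinheld.MeinheldWorker app:app
    uwsgi --http :8000 --module app:app --master --processes 3
    waitress-serve --threads=3 --port=8000 app:app

Note that Waitress is single-process and scales with threads rather than worker processes, so a thread count stands in for the worker count.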

Test command

Benchmarks were performed from the Test Server using ab from the Apache utils suite.

ab -c 100 -n 10000 "URL"

Note: for the API route, the number of requests was lowered to 1,000, since it was a slow route and took a while to complete.
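
To average the 10 runs, a small wrapper around ab does the job. A sketch (not our exact tooling; the regex targets ab's "Requests per second" summary line):

    import re
    import statistics
    import subprocess

    def bench(url, runs=10, concurrency=100, requests=10000):
        # Run ab repeatedly and average the requests/second figure,
        # mirroring the 10-run methodology described above.
        rates = []
        for _ in range(runs):
            out = subprocess.run(
                ["ab", "-c", str(concurrency), "-n", str(requests), url],
                capture_output=True, text=True, check=True,
            ).stdout
            rates.append(float(
                re.search(r"Requests per second:\s+([\d.]+)", out).group(1)))
        return statistics.mean(rates)
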

Tests

We focused primarily on 2 performance indicators for each test:

  1. Number of requests served per second.
  2. Time under which 95% of requests are served.

Note: there was a 0-error allowance for all tests, meaning each test run returned 100% valid results (200 responses with correct data). Both indicators, and the error check, come straight from ab's summary, as in the excerpt below.
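
An illustrative excerpt of the relevant ab output lines (the numbers are placeholders, not measurements):

    Failed requests:        0
    Requests per second:    123.45 [#/sec] (mean)

    Percentage of the requests served within a certain time (ms)
      50%     35
      95%     92
     100%    140 (longest request)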

Test results - Server configuration #1

Requests per second (Template Generation)

NGINX is a very fast server: even on very low-end hardware it is capable of serving 1.2k requests per second. So how many requests per second can a Python server achieve while performing computation logic and template generation? It seems around 20%-40% of simple static-file serving. The Meinheld worker class for Gunicorn is the clear winner, capable of outputting 530 requests per second in this particular test.

95% of requests served under this time in milliseconds (Template Generation)

This test is important for understanding the expected response time per request under load. Some servers are better at handling many simultaneous requests, and some are better at handling a single request fast.

Requests per second (API Request)

Once we switch to a real-world example of a request that is a bit more computationally intensive, the performance drops dramatically. We are basically down to 4% of the throughput of NGINX serving a static file on the same hardware, or 10% of the throughput of simple template generation.

There is still a noticeable difference between servers, and selecting even a slightly faster server will always be a better choice. In this test Gunicorn-Eventlet takes the lead, followed by Waitress.

95% of requests served under this time in milliseconds (API Request)

Under heavy load it takes much longer to process each individual request, but overall all requests are served much faster thanks to concurrency. For example, with a load of 1,000 requests and a concurrency of 100, Gunicorn takes around 2 seconds to respond to each request; requested one at a time, each would take around 100 ms to serve.
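
A quick sanity check on those numbers: with 100 requests in flight at roughly 2 s each, throughput is about 100 / 2 s = 50 requests per second, whereas serving one request at a time at 100 ms each would yield only 1 / 0.1 s = 10 requests per second. Concurrency buys roughly 5x the throughput at the cost of much higher per-request latency.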

Test results - Server configuration #2

Requests per second (Template Generation)

We also wanted to understand how well different servers scale with an increase in RAM and, most importantly, CPU count. Looking at NGINX performance, we can see it scales quite nicely: with 2 vCPUs it now serves 2.8k responses per second, which is more than a 100% throughput increase. We believe the more-than-2x performance boost is due to better scheduling opportunities with 2 vCPUs - fewer requests sit in the queue waiting to be served.

How did the Python app servers scale? Mostly nicely. Almost all of the servers scaled close to linearly and were able to serve nearly 2x as many requests per second. There are two exceptions: the Flask dev server, which is expected not to scale at all, and Waitress. We are not sure what happened to Waitress, but it served fewer requests than on the single-vCPU machine. We re-ran the tests 10 times for each configuration and still got the same results, so we will blame the low Waitress performance on our configuration settings (as a single-process, threaded server, it may simply need different thread tuning to benefit from the second CPU).

95% of requests served under this time in milliseconds (Template Generation)

It seems that Gunicorn-Gevent and Gunicorn-Eventlet take more time on average than other servers to serve a single request, yet their overall requests/second performance is very comparable. Does this mean these 2 servers accept more requests simultaneously, but then spend more time actually generating responses?

Requests per second (API Request)

The picture is similar for API request processing: all servers scaled almost linearly. Gunicorn-Meinheld and Gunicorn-Eventlet lead the pack.

95% of requests served under this time in milliseconds (API Request)

Meinheld again serves most responses in under 810 milliseconds, which provides a low-latency interface for the application.

Conclusion

When it comes to the performance of Python app servers, the first thing to remember when your app is slow is to optimize the code first. The gain from switching to a different app server will be negligible by comparison.

We have a good example of the above statement. When we started this article, we selected the 2 routes we were going to review; the API route originally had a throughput of around 15 requests per second on hardware configuration #1, which would have taken forever to benchmark. Simple changes to our code brought its performance to 55 requests per second - a 3.6x performance boost without changing the app server.

Update: After writing this article we wanted to get even more performance out of our API route. With simple SQL query baking on this route we gained another 100%+ in throughput: the API route now serves around 125 requests per second on hardware configuration #1. A sketch of the technique follows.
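
"SQL baking" here refers to SQLAlchemy's baked-query extension, which caches the compiled SQL so repeated calls skip query construction and compilation. A minimal sketch (the Survey model and the filter are hypothetical, carried over from the route sketch earlier):

    from sqlalchemy import bindparam
    from sqlalchemy.ext import baked

    bakery = baked.bakery()

    def active_surveys(session):
        # The lambdas run once to build the query; afterwards the compiled
        # SQL is cached and later calls only rebind the parameters.
        query = bakery(lambda s: s.query(Survey))
        query += lambda q: q.filter(Survey.active == bindparam("active"))
        return query(session).params(active=True).all()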

Remember to do your own testing too; your mileage may vary greatly.

Future

We might update this page in the future with 2 more app servers we wanted to test: CherryPy and Bjoern. We are also interested to see whether Chaussette, which can wrap several of these backends, would perform better.