Micro services are fun and easy to set up, and the servers behind them are an important part of any micro service architecture.
One consideration in choosing which language to use for a given micro service/REST API is the speed of the servers that are readily available for that language.
Given this, I have decided to test out a variety of minimal web servers that could serve as a solid base for a REST API / micro service.
All tests are performed on an (obviously weak) machine with the following stats:
To start with testing, we will serve a simple "Hello, World" response from each server and run the following benchmark test against them:
ab -c10 -n1000 'http://127.0.0.1:<port>/'
This is Apache Bench, with 10 concurrent users performing 1000 total requests (so, in the output, you will see progress reported in batches of 100 requests).
<port> is going to be the port the server is running on.
It should go without saying, but for all these benchmarking tests, please disable any special debugging options that may have dramatic performance hits in each given language (such as xdebug in PHP).
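Incidentally, the headline numbers in ab's report are linked by simple arithmetic: requests per second is total requests divided by total elapsed time, the "across all concurrent requests" time is the reciprocal of that, and the mean time per request multiplies it by the concurrency level. Here is a quick shell sanity check using the figures from the first PHP bench below (ab's own printed 1620.24 differs slightly only because it works from an unrounded elapsed time):

```shell
# Example figures: 1000 requests completed in 0.617 seconds at concurrency 10.
total=0.617; n=1000; c=10
rps=$(awk -v t="$total" -v n="$n" 'BEGIN { printf "%.0f", n / t }')
ms_each=$(awk -v t="$total" -v n="$n" 'BEGIN { printf "%.3f", t * 1000 / n }')
ms_mean=$(awk -v t="$total" -v n="$n" -v c="$c" 'BEGIN { printf "%.3f", c * t * 1000 / n }')
echo "Requests per second: ~$rps"                      # ~1621
echo "Time per request (across all concurrent): $ms_each ms"  # 0.617
echo "Time per request (mean): $ms_mean ms"            # 6.170
```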
To test PHP, we will begin by benchmarking a standard "Hello, World" script served from Apache 2 with the default configuration, followed by a "Hello, World" served from a persistent daemon (React PHP).
After that, we'll compare how much difference a persistent daemon makes when using a framework versus not.
Add the following script into a file named 'hw.php' in a directory accessible via Apache:
<?php echo 'Hello, World';
Then (assuming your Apache is already set up) test with CURL to verify the response, test with ab to generate the bench:
curl http://127.0.0.1/hw.php   # Should produce 'Hello, World'
ab -c10 -n1000 'http://127.0.0.1/hw.php'
We end up with the following (truncated) output (obviously this will differ on your machine):
Concurrency Level:      10
Time taken for tests:   0.617 seconds
Complete requests:      1000
Failed requests:        0
Total transferred:      154000 bytes
HTML transferred:       12000 bytes
Requests per second:    1620.24 [#/sec] (mean)
Time per request:       6.172 [ms] (mean)
Time per request:       0.617 [ms] (mean, across all concurrent requests)
Transfer rate:          243.67 [Kbytes/sec] received
Not bad, but it could probably be faster if we weren't having Apache handle loading the PHP interpreter on each request.
So, let's try this out. For our purposes, the single page of setup here will work to get us started:
Once you have it set up and the server started, benchmark with the following:
ab -c10 -n1000 'http://127.0.0.1:1337/'
Concurrency Level:      10
Time taken for tests:   0.423 seconds
Complete requests:      1000
Failed requests:        0
Total transferred:      122000 bytes
HTML transferred:       22000 bytes
Requests per second:    2361.98 [#/sec] (mean)
Time per request:       4.234 [ms] (mean)
Time per request:       0.423 [ms] (mean, across all concurrent requests)
Transfer rate:          281.41 [Kbytes/sec] received
Well, that isn't terrible: over 700 more requests per second, roughly a 45% performance boost. Apache could probably be tuned to perform better in this use case, but this does illustrate how removing the interpreter start-up cost from each request can have a significant impact on performance.
Now, let's see if running as a daemon makes more of a difference under a different use case.
Symfony is a great framework, and I love it, but for our micro service setup it may not be a good choice: it is a pretty heavy framework, and we're testing on a pretty weak machine.
Regardless, let's see how it performs (if we wanted to use it to gain the code structure/cleanliness/libraries it provides):
Set up a copy (refer to http://symfony.com/doc/current/setup.html), then update the DefaultController.php file in the codebase as follows:
public function indexAction(Request $request)
{
    return new Response('Hello, World');
}
Don't forget to disable the debug features in app_dev.php (they will tear up your performance, and really make sure xdebug isn't turned on!), and then bench as with our other projects:
ab -c10 -n1000 'http://local.symfony/app_dev.php'
and the results:
Concurrency Level:      10
Time taken for tests:   174.178 seconds
Complete requests:      1000
Failed requests:        0
Total transferred:      380000 bytes
HTML transferred:       12000 bytes
Requests per second:    5.74 [#/sec] (mean)
Time per request:       1741.775 [ms] (mean)
Time per request:       174.178 [ms] (mean, across all concurrent requests)
Transfer rate:          2.13 [Kbytes/sec] received
Pretty terrible (about 280x slower than plain PHP; quite a heavy tax to pay for what a framework offers).
Could it be better if we didn't have to spend all that time bootstrapping the framework on every request? Let's give it a shot with PHP-PM (React PHP for frameworks).
Follow the instructions at https://github.com/php-pm/php-pm to set it up with Symfony, and onward to the benchmark:
Concurrency Level:      10
Time taken for tests:   7.371 seconds
Complete requests:      1000
Failed requests:        0
Total transferred:      244000 bytes
HTML transferred:       22000 bytes
Requests per second:    135.67 [#/sec] (mean)
Time per request:       73.710 [ms] (mean)
Time per request:       7.371 [ms] (mean, across all concurrent requests)
Transfer rate:          32.33 [Kbytes/sec] received
Hmm, not bad! About 24x faster than plain old Symfony over Apache; unfortunately, it is still about 12x slower than a plain PHP script (but that is to be expected, as we're loading an immense number of libraries/classes/scripts).
So, Symfony had some issues; how does a very minimal framework fare in a language touted for its non-blocking I/O?
Let's test out Express with node/npm (https://www.npmjs.com/package/express), using the Hello, World sample listed on that page:
Concurrency Level:      10
Time taken for tests:   1.019 seconds
Complete requests:      1000
Failed requests:        0
Total transferred:      205000 bytes
HTML transferred:       11000 bytes
Requests per second:    981.60 [#/sec] (mean)
Time per request:       10.187 [ms] (mean)
Time per request:       1.019 [ms] (mean, across all concurrent requests)
Transfer rate:          196.51 [Kbytes/sec] received
Not too shabby, but are we failing to break 1k RPS due to the framework, or is that just as fast as node is going to go for us?
Let's try without a framework. Create the following node.js program:
var app = require('http').createServer(handler);
app.listen(3000);

function handler(req, res) {
  res.writeHead(200);
  res.end('Hello, World');
}
Concurrency Level:      10
Time taken for tests:   0.660 seconds
Complete requests:      1000
Failed requests:        0
Total transferred:      87000 bytes
HTML transferred:       12000 bytes
Requests per second:    1515.78 [#/sec] (mean)
Time per request:       6.597 [ms] (mean)
Time per request:       0.660 [ms] (mean, across all concurrent requests)
Transfer rate:          128.78 [Kbytes/sec] received
Not bad! About 50% faster than going through Express.
If you're reading this, you're interested in programming, so you've likely heard (or come to understand) that C++ is pretty fast; after all, it's used for lots of important things requiring high performance, manual memory management, and fine tuning.
Unfortunately, C++ doesn't have nearly the same footprint in the web/REST world that PHP does, so options for a C++ based web server are a bit more limited.
Fortunately, there is a project here:
https://github.com/eidheim/Simple-Web-Server
That can quickly get us up and running for the purpose of this test.
Go ahead and clone the repository, but before building it, make the following edit to the http_examples.cpp file to add a plain old 'Hello, World' route:
//GET-example for the path /hw
//Responds with Hello, World
server.resource["^/hw"]["GET"] = [](shared_ptr<HttpServer::Response> response,
                                    shared_ptr<HttpServer::Request> /*request*/) {
    *response << "HTTP/1.1 200 OK\r\nContent-Length: " << "12"
              << "\r\n\r\n" << "Hello, World";
};
Cool: they have a routing mechanism/URL pattern matcher similar to Symfony's, and they make use of C++11 lambdas to do it!
Well, let's see how this one performs. Build it with:
cmake .
make
and run it with:
./http_examples
and bench it with:
ab -c10 -n1000 'http://127.0.0.1:8080/hw'
which gives us the results:
Concurrency Level:      10
Time taken for tests:   0.395 seconds
Complete requests:      1000
Failed requests:        0
Total transferred:      51000 bytes
HTML transferred:       12000 bytes
Requests per second:    2530.52 [#/sec] (mean)
Time per request:       3.952 [ms] (mean)
Time per request:       0.395 [ms] (mean, across all concurrent requests)
Transfer rate:          126.03 [Kbytes/sec] received
Not bad, not bad at all (if you just got done with the Symfony tests, this is like getting into a jet after walking with weights strapped to your back).
However, it's just a little faster than the RPS we had running PHP as a daemon under React PHP (roughly 2530 vs 2360).
To make use of this, though, we'll be writing C++, which some people may find more difficult than PHP and a little too low level (but hey, the Simple-Web-Server project already includes Boost, which has many high level functions/classes built right in).
As always, I've got to bring in one of my personal favorite languages (Common Lisp) and see how it performs in the context of these other languages.
Pop open a Common Lisp REPL (I assume you have a working copy of SBCL and Quicklisp set up, otherwise go do that first) and type:
(ql:quickload :woo)

(woo:run
  (lambda (env)
    (declare (ignore env))
    '(200 (:content-type "text/plain") ("Hello, World"))))
Voila, a working web server to serve "Hello, World" requiring nothing more than 10 seconds of time in the Common Lisp REPL.
Let's see how it performs with the following:
ab -c10 -n1000 'http://127.0.0.1:5000/'
and we see the output:
Concurrency Level:      10
Time taken for tests:   0.332 seconds
Complete requests:      1000
Failed requests:        0
Total transferred:      114000 bytes
HTML transferred:       12000 bytes
Requests per second:    3015.20 [#/sec] (mean)
Time per request:       3.317 [ms] (mean)
Time per request:       0.332 [ms] (mean, across all concurrent requests)
Transfer rate:          335.68 [Kbytes/sec] received
Wow! We finally broke the 3000 RPS mark on this old rig.
The author of this package is amazing, and I suggest you check out its GitHub page (scroll down a little to see how it compares to servers in other languages):
https://github.com/fukamachi/woo
On a good machine, such as the one he benches on, he hits 40,000/RPS (with the next highest being a server written in Go reaching 30,000).
Is Common Lisp faster than C++? Not necessarily in every context, but if you look at general language benchmarks, it actually does surpass C++ and Java in some (and gets beaten in others), leaving only pure, minimal C as a language that can consistently come in at number one. So, let's go find and test out a C based web server that can serve minimal responses.
First, though, let's try some Common Lisp with an actual framework (Caveman2 in this case, using the bare skeleton produced by the following):
(ql:quickload :caveman2)
(caveman2:make-project (pathname "~/src/lisp/cm2-bench"))
(cm2-bench:start :server :woo :port 5000)
Concurrency Level:      10
Time taken for tests:   1.166 seconds
Complete requests:      1000
Failed requests:        0
Total transferred:      307000 bytes
HTML transferred:       12000 bytes
Requests per second:    857.86 [#/sec] (mean)
Time per request:       11.657 [ms] (mean)
Time per request:       1.166 [ms] (mean, across all concurrent requests)
Transfer rate:          257.19 [Kbytes/sec] received
Hmm, alright: that is in line with the speed of node + Express. Definitely not as impressive as the plain Woo bench of a simple request, but still decent.
OK, this one is a little trickier to find: if you thought people were reluctant to write REST based services in C++, pure C servers are even rarer.
Ultimately, I didn't see a good one posted out there, but to give you an idea, a very amateurish one I put together (using plain old forking, not threading) was able to get the following Hello, World bench:
Concurrency Level:      10
Time taken for tests:   0.618 seconds
Complete requests:      1000
Failed requests:        0
Total transferred:      71000 bytes
HTML transferred:       12000 bytes
Requests per second:    1617.66 [#/sec] (mean)
Time per request:       6.182 [ms] (mean)
Time per request:       0.618 [ms] (mean, across all concurrent requests)
Transfer rate:          112.16 [Kbytes/sec] received
If and when I get time, I'll attempt a similar test with a C server that uses actual threading and thread scheduling, which will hopefully show some good results (although I do not expect it to pass the benchmarks set by Woo).
If you anticipate a micro service being a bottleneck in your architecture (a central point many other components depend on), try to avoid cumbersome frameworks in any language, as all the extra routing/features they provide will have a dramatic impact on your RPS throughput.
The big takeaway is this: Common Lisp wins once again!