Optimizing performance of PHP app that sends an external request
I’m running a web services API written in PHP, and one of its endpoints pings an external web service on every request, specifically Firebase Realtime DB, which in turn can deliver a notification to the client through the WebSockets API. Below is the image with a rough architecture of this process:
As you can see, any client app (browser) and any server app (some 3rd party) can send a request to the PHP endpoint; PHP then pings the Firebase endpoint, and finally Firebase notifies the client app (browser) about the intercepted request.
Web service short story
The web service I am running is not very complicated, and we can divide the operations it does into the following steps (sketched in code right after the list):
- Processing the request
- Doing a few SQL queries, including insert queries
- Performing a POST request to the external Firebase Realtime DB API
- Providing a response to the client
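In simplified form, the endpoint flow might look like the sketch below. To be clear, this is not the actual production code; the table and function names are hypothetical, for illustration only:

// Rough sketch of the endpoint flow; all names are illustrative
$input = json_decode(file_get_contents('php://input'), true); // 1. process the request

// 2. a few SQL queries, including inserts (hypothetical table)
$stmt = $pdo->prepare('INSERT INTO events (payload) VALUES (:payload)');
$stmt->execute([':payload' => json_encode($input)]);

// 3. POST to the Firebase Realtime DB API (covered in detail below)
postToFirebase($input);

// 4. respond to the client
header('Content-Type: application/json');
echo json_encode(['status' => 'ok']);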
After it was completed, I ran load tests and profiling. It turned out that posting data to Firebase was taking around 1.6–2 seconds. Therefore, the number of requests that could be performed simultaneously against my own PHP endpoint was around 300–400 per minute, with a response time of more than 2 seconds (PHP app time + Firebase request time). This was a very poor result, so I started looking into improving the performance, particularly the request time to Firebase.
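The profiling itself can be as simple as wrapping the Firebase call with timestamps, something like the sketch below (postToFirebase() being the hypothetical helper from the sketch above):

$start = microtime(true);            // timestamp before the Firebase call
postToFirebase($input);              // blocking POST to Firebase
$elapsed = microtime(true) - $start;
error_log(sprintf('Firebase call took %.3f s', $elapsed)); // ~1.6-2 s at this stage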
A bit of side talk
You might know that any VM is generally limited by two main factors: CPU and memory. When it comes to these limitations, a very important aspect is the time our application needs to run in a specific environment, relative to the VM’s capacity. The faster our app runs, or in other words the more optimized it is, the more simultaneous instances of it can be executed. This is especially true for PHP scripts.
Optimization Iteration #0
First of all, it turned out that the Firebase PHP SDK was making two requests every time: the first to get an access token and the second to actually post the data. I found out that there is a way to use a “database secret”, a one-time generated token that you can use to access your Firebase DB. So I just dropped the SDK and made a direct request using the cURL interface provided inside PHP. The time to post data to Firebase decreased by approximately 1 second. So now I could perform 700–800 requests per minute, and the response time was around 1–1.2 seconds. This is better than before, but still not good enough.
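That direct request via PHP’s cURL extension looked roughly like this (a minimal sketch; the DB name, endpoint, and secret are placeholders):

// One direct POST instead of the SDK's two requests
$url = 'https://<my-db>.firebaseio.com/<my-endpoint>.json?auth=<database_secret>';
$payload = json_encode(['hello' => 'world']);

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $payload);
curl_setopt($ch, CURLOPT_HTTPHEADER, ['Content-Type: application/json']);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

$response = curl_exec($ch); // still blocks until Firebase responds
curl_close($ch);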
The reason the Firebase request was taking so long is that PHP waits for the response from the remote server due to its synchronous code execution: all the subsequent code is blocked until the response is received. This piece of functionality with Firebase notifications is not critical to my application; if anything goes wrong during the request to Firebase, there is no need to perform a rollback, and I do not actually need to know about it immediately. Thus, I decided to speed things up by omitting the part where PHP waits for the response from the remote server. PHP should just send the request and not care about what happens afterwards.
Optimization Iteration #1
To solve this task I used a simple approach: executing external CLI commands from PHP. And yes, cURL does have a CLI interface (tool).
We can present the updated architecture in the diagram below:
The PHP code combined with the cURL command looks as follows:
$cliCommand = <<<CODE
curl -k -H "Content-Type: application/json" -d '{"hello": "world"}' -X POST https://<my-db>.firebaseio.com/<my-endpoint>.json?auth=<database_secret> >> /tmp/firebase.log 2>&1 &
CODE;
exec($cliCommand);
The trailing part, “>> /tmp/firebase.log 2>&1 &”, is what skips waiting for the response (and thus the code blocking): the redirection writes both stdout and stderr into the firebase.log file in case I need to check for any possible errors later (which I eventually did using a cron job, sketched below), and the final “&” runs the command in the background so PHP does not wait for it.
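The cron task itself can be a trivial PHP script that scans the log for failures. Here is a hypothetical sketch; what exactly to look for depends on what curl and Firebase write to the log on errors:

// Hypothetical log checker, run periodically from cron
$log = @file_get_contents('/tmp/firebase.log');
if ($log !== false && stripos($log, 'error') !== false) {
    // a good place to alert, or archive the log for inspection
    error_log('firebase.log contains errors, please investigate');
}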
This simple solution made the endpoint respond almost instantly. The response time of the PHP script dropped from 1–1.2 seconds to 150–250 ms, and now I could perform around 1,200–1,300 requests per minute… Really? I was expecting a bit more. Something was definitely wrong here.
When I checked htop (a CPU and memory monitoring tool), I found out that during the load test the curl processes were literally eating all the CPU: 100% of the CPU power was consumed by curl tasks. I’m not really sure why this little command-line application is so hungry for computation power. If you know, please drop a message in the comments below.
Optimization Iteration #2
Anyway, I started to search for alternatives. Among CLI tools there is nothing better than curl, so I decided to try another interface, particularly an HTTP interface, by experimenting with NodeJS (Express). Node runs JavaScript on an asynchronous, non-blocking model, and it does so very efficiently along with Express. I created a small JS script using Express plus Node’s built-in https module. It is basically an asynchronous proxy that listens for HTTP requests and forwards the data to the Firebase Realtime DB endpoint.

Because we access the NodeJS script through the HTTP interface, instead of using the exec() method on the PHP side I had to switch to sockets, particularly fsockopen(), fwrite() and fclose(). Sockets allow PHP to send a request without waiting for the response. You might ask why on earth I need NodeJS at all then. Well, using fsockopen() to send a request directly to the remote web server (Firebase), which lives in a different network and region, and using it to send a request to a local web server (NodeJS) that sits on the same machine are two totally different things in terms of timing. Also, I can run my local Express server without SSL, but Firebase accepts only HTTPS, so fsockopen() would have to spend additional time on the SSL handshake. Thus, yes, there is a great benefit in using fsockopen() simply to hand the request off to a separate local process.
Anyway, this is the new architecture diagram I ended up with:
And here are the performance optimization results. CPU load went down from 100% to 40–50% max. Memory stayed at almost the same level, roughly 50–60%. The response time was 150–250 ms. And finally, the number of requests per minute I could execute against the endpoint skyrocketed to 5,500.
Testing Environment
Finally, here is the environment I used for those tests: an EC2 t2.micro instance (1 vCPU and 1 GB of memory). By the way, the MySQL DB is on a separate VM instance, which greatly saves the VM resources. For load tests I was using Apache JMeter with the default thread group properties, which are:
- number of threads (users): 10
- ramp-up period: 1 second
Code snippets
PHP script that sends a request to the NodeJS script using fsockopen()
$url = 'http://localhost:3000/prod'; // URL to the NodeJS script
$urlParts = parse_url($url);
$jsonData = '{"hello": "world"}';try {
$fp = fsockopen($urlParts['host'], $urlParts['port'], $errno, $errstr, 0.1);
if (!$fp) {
// good place to log your error
} else {
$out = "POST " . $urlParts['path'] . " HTTP/1.1\r\n";
$out .= "Host: localhost\r\n";
$out .= "Content-Type: application/json\r\n";
$out .= "Content-Length: " . strlen($jsonData) . "\r\n";
$out .= "Connection: Close\r\n\r\n";
$out .= $jsonData;
fwrite($fp, $out);
fclose($fp);
}
} catch (Exception $e) {
// good place to log your error
}
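One caveat worth noting: fsockopen() does not throw an exception when the connection fails; it raises a PHP warning and returns false. So the if (!$fp) branch is what actually handles connection problems, and the try/catch is just a defensive wrapper.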
NodeJS script that forwards the request asynchronously to Firebase
"use strict";const express = require("express");
const https = require("https");const environments = {
dev: {
host: "<my-db-dev>.firebaseio.com",
token: "<my-dev-token>",
},
prod: {
host: "<my-db-prod>.firebaseio.com",
token: "<my-prod-token>",
},
};function postFirebase(envName, data) {
if (!environments[envName]) {
console.log(`${envName} not found`);
return;
}
const env = environments[envName]; const options = {
hostname: env.host,
port: 443,
path: `/<my-endpoint>.json?auth=${env.token}`,
method: "POST",
timeout: 2000,
headers: {
"Content-Type": "application/json",
"Content-Length": data.length,
},
}; const req = https.request(options); req.on("error", (error) => {
console.error(error);
}); req.write(data);
req.end();
}const app = express();
app.use(express.json());app.post("*", function (req, res) {
postFirebase(req.originalUrl.substr(1), JSON.stringify(req.body));
res.set("Content-Type", "application/json");
res.json();
});// Listen on port 3000
app.listen(3000, function (err) {
if (err) {
throw err;
} console.log("Server started on port 3000");
});
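Note that the proxy responds to PHP immediately with an empty JSON body, while the https.request() to Firebase keeps running on Node’s event loop. This is exactly what makes the whole chain non-blocking from the PHP point of view.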
Summary
To summarize, there is always room for improving and optimizing the code and its efficiency. I managed to reduce the time required to run the API endpoint from 2.2 seconds to 0.2 (an 11x improvement). As for the number of simultaneous requests, the improvement is more than 13x (from 300–400 requests per minute up to 5,500). NodeJS performed much better than the curl CLI tool in terms of CPU and memory consumption. Therefore, the pair “fsockopen() / NodeJS” works much better than “exec() / curl” if you want to fire off a request to some external resource or web service from within PHP without blocking.
Thanks for reading. Please let me know if you have an idea why curl requires so much CPU to send requests compared to NodeJS. It would also be interesting to hear whether there is any other good option besides NodeJS for building a small HTTP proxy that sends requests asynchronously (Python, perhaps?), and whether you think it could perform better. Thanks ahead for your thoughts!