A queueing system for asynchronous task execution in PHP web apps

I recently volunteered on a project about detecting deforestation from satellite imagery. The goal was to have an MVP in two months, and the envisioned web app included a queueing system to run the detection tasks asynchronously on the server side. I wasn't in the team working on that part and didn't intervene, but I was a bit puzzled by the proposed solution, which looked to me like a labyrinthine patchwork. In my usual old-fashioned way I started thinking: what could be done from scratch to fit the bill with a minimum of time and effort, while avoiding all that fancy stuff? So I developed my own version in parallel with the official project. This article describes what I've ended up with.

First, for comparison, the proposed solution used FastAPI, Celery, Redis, Flower, a PostgreSQL database, and MinIO, running inside a Docker container. The details are available here. That's as many tools as you need to install, configure, learn, and figure out how to use together. Several volunteers worked on it for the whole length of the project, but at the deadline there was still no MVP available. Also, the final report mentioned that 10GB of disk space was insufficient for that stack, and that 1GB of RAM was recommended.

The requirements were: take coordinates and dates as input, run the detection task asynchronously, and store the detected areas in an unspecified way. But the exact inputs and outputs don't really matter here, so the example code below is generic and straightforward to adapt to any application. As for functionality, I personally think the following is already more than enough for an MVP: adding/removing tasks (with support for multiple users); listing the tasks, their results, and their statuses; running the tasks in the background (on the server side) with a limited number of workers in parallel.

I'm reusing the same template as for my little "one single PHP file" web apps (cf. previous articles 1, 2, 3), and give here only the details related to the task queueing system. Refer to the other articles if you want to know more about my template (which is just ~400 lines of PHP/HTML/CSS/JS with no dependencies, by the way).

I used a SQLite database to store the tasks' data and what's needed for the queueing system. SQLite is natively included in PHP, so if you already have a PHP server (like Apache) running (it needs around 100-200MB of RAM and less than 100MB of disk space), which is the case on many rental servers, and which even if not is a matter of seconds to set up, you need absolutely nothing to install or configure. Moreover, SQLite is supported by a plethora of programming languages, leaving you free to choose the implementation language of the worker. The database is one single file (the data for 20,000 tasks takes around 1MB with two single integers as input and output) and is created automatically by CheckDb in my template.
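The model looks something like the following sketch (the exact column names and types are my reconstruction from the rest of this article; only NbMaxWorker, Status, and Key appear verbatim below):

    CREATE TABLE TaskConfig (
      -- maximum number of workers allowed in parallel
      NbMaxWorker INTEGER NOT NULL
    );
    CREATE TABLE Task (
      -- task reference (assumed to be the primary key)
      Ref INTEGER PRIMARY KEY,
      -- user owning the task
      Owner TEXT NOT NULL,
      -- 0: pending, 1: taken by a worker (a completed status is assumed too)
      Status INTEGER NOT NULL DEFAULT 0,
      -- key identifying the worker that took the task
      Key TEXT,
      -- task input and result (application specific)
      Input TEXT,
      Output TEXT
    );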

TaskConfig.NbMaxWorker stores the maximum number of workers allowed in parallel. In the automatic database creation method, its value can be set as in the code below, where $this->nbMaxWorker is previously set in the constructor. Storing it in the database allows resizing the worker pool on the fly if needed.
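A minimal sketch of that line, assuming PHP's SQLite3 class and a $this->db connection (both names are assumptions):

    // Record the maximum number of parallel workers in the database
    $this->db->exec(
      "INSERT INTO TaskConfig (NbMaxWorker) " .
      "VALUES (" . $this->nbMaxWorker . ")");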

In the table Task, each record holds the task's reference, its owner, its input and output, its status, and the key of the worker processing it (cf. the sketch above).

In the PHP API there are three entry points: getTasks, addTask, deleteTask.

deleteTask deletes the record identified by its reference and owner. getTasks gets all the tasks' data for a given owner. addTask adds a new record with the input info from the UI, with Status set to "0" to mark the task as pending. Then it tries to spawn a new worker.
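A minimal sketch of what that spawning could look like (the method name and the way of counting active workers are my assumptions; $this->dbObfuscation and the worker executable are described just below):

    // Try to spawn a new worker if fewer than NbMaxWorker are active.
    private function TrySpawnWorker(): void {
      // Approximate the number of active workers by the number of tasks
      // currently being processed
      $nbActive = $this->db->querySingle(
        "SELECT COUNT(*) FROM Task WHERE Status=1");
      $nbMax = $this->db->querySingle(
        "SELECT NbMaxWorker FROM TaskConfig");
      if ($nbActive < $nbMax) {
        // Redirect the output to /dev/null to make the call non-blocking
        exec(__DIR__ . "/worker " . $this->dbObfuscation .
          " > /dev/null 2>&1 &");
      }
    }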

worker is the worker executable, stored in the same folder as the PHP file of the app. The redirection of the output to /dev/null is necessary to prevent the call from blocking (& alone is not enough). $this->dbObfuscation is the database obfuscation parameter, cf. my previous articles.

This method creates a new worker process when the user creates tasks, if the maximum number of workers isn't already running. The workers terminate themselves when there are no more tasks to process, or keep working on another task if they find one after completing their current one. There's no need for a separate mechanism to start/keep alive/terminate workers; the server and the workers take care of themselves automatically.

Trying to spawn new workers at the end of getTasks (which is repeatedly called as long as a user is logged in) could also be done if you're afraid that a worker may die and want to be 100% sure that there are always as many workers as possible even in such a case. However, spawning only when adding tasks is probably sufficient in most cases.

One point to be careful about in order to withstand heavy load is to check whether each SQL command has failed due to a busy database. If so, it is retried until it succeeds, with a small delay between attempts to avoid drowning the server and the DB in an avalanche of requests.
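As a sketch, such a retry could look like this (the helper name and the delay value are my own choices):

    // Execute an SQL command, retrying as long as the database is busy.
    function ExecWithRetry(SQLite3 $db, string $sql): void {
      while (@$db->exec($sql) === false) {
        // 5 is SQLITE_BUSY: another connection holds the lock
        if ($db->lastErrorCode() !== 5) {
          throw new Exception($db->lastErrorMsg());
        }
        // Small delay to avoid an avalanche of retries
        usleep(10000);
      }
    }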

The worker can be written in any programming language (as long as it supports SQLite, a very low bar), so I'll use pseudocode to describe its algorithm. In plain words, a main loop runs until the worker can't find a task to process or detects too many workers in parallel, and then terminates. If it finds a task, it tries to update the task's status, checks that it was actually able to do so (to avoid race conditions), and then processes the task.
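Reconstructed from that description as a sketch (status values and column names follow the earlier sketches; the completed status is an assumption):

    key = unique key identifying this worker
    loop forever:
      if (number of workers in parallel > TaskConfig.NbMaxWorker):
        terminate
      ref = SELECT Ref FROM Task WHERE Status=0 LIMIT 1
      if no pending task found:
        terminate
      // Atomically try to take responsibility for the task
      UPDATE Task SET Status=1, Key=key WHERE Ref=ref AND Status=0
      taskActuallyTaken = (number of updated rows equals 1)
      if taskActuallyTaken:
        process the task
        UPDATE Task SET Status=<completed>, Output=<result> WHERE Ref=ref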

The most important part here is taking care of concurrency problems: two workers trying to take responsibility for the same task at the same time. The WHERE Status=0 in the UPDATE Task SET Status=1, Key=... statement, followed by the test on taskActuallyTaken, plus the atomicity of the transactions, ensure such problems are avoided.

And that's it. Over a weekend, in ~1000 lines on the PHP side (almost half of it coming from the template) and ~200 lines on the worker side including the actual detection task (in Python, yuck, but it was a project requirement), I had an MVP fulfilling all the requirements except for an interactive Google map in the interface. It can be seen in the demo video below. Copy two files into the appropriate folder of an Apache server, set the permissions to allow execution of the worker and automatic creation of the database file, and it's up and running with a ridiculously low footprint on memory and disk. Why make it difficult when it can be easy?

The envisioned web app had very low requirements (from the perspective of the task queueing system at least); I recall the project leader expecting a few requests per minute at most. The demo above, and actually using the app myself (on my hometown, on recent wildfires here in Japan, ...), convinced me that the solution above is already plenty useful.

But how would my solution perform at high frequency and with many workers? To check, I set up a test environment where tasks are automatically created at a given pace. I can then monitor how the system performs for various frequencies, loads, and numbers of workers. The video below shows the test running with 10 workers in parallel, 5 tasks added per second, 2 seconds to complete one task, and 100 tasks in total.

The final and most demanding test I did used 20 workers in parallel, 10 tasks added per second, 1.5-2.5 seconds (randomly) to complete one task, and 10,000 tasks in total. The graph below shows the evolution of the number of pending tasks, running tasks, and worker processes.

One can see that the number of worker processes stays capped at the configured maximum, and that the backlog of pending tasks is eventually fully absorbed once task creation stops.

Watching the list of tasks in the UI, I could verify that the app stays responsive despite the load during the ~15 minutes of the test. Finally, by logging the workers' actions, I checked that no task had been left unprocessed or mistakenly processed twice. In conclusion, it looks pretty good and perfectly capable of supporting a heavy load.

The only problem I had during the tests was Chrome throwing net::ERR_INSUFFICIENT_RESOURCES during the most demanding runs. That was actually on the automatic task creation side, and I solved the problem by switching to Firefox, which was fine with the same load. Thumbs up, Google! 🙃

If you want to use my code, you're welcome. A tar is available here. It contains the PHP file for the app and a C implementation of the worker. One could adapt it and integrate it into a different app by modifying the input/output, adding the actual task in the worker, and refactoring the interface according to one's needs. The C code compiles as shown below (including the command to install libsqlite if you need it).
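This assumes a Debian/Ubuntu system and a worker source file named worker.c; adapt the names to the actual files in the tar:

    # Install the SQLite development library if needed
    sudo apt-get install libsqlite3-dev
    # Compile the worker, linking against SQLite
    gcc worker.c -o worker -lsqlite3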

So, what's the conclusion? Sure enough, my tool doesn't include the ton of bells and whistles provided by Celery and co. Do you need them? Are they worth the bloated gigabytes and the pain? You're the only one who knows. However, be aware that you can very probably do what you want with a few hundred lines of code written from scratch over a weekend, as simple as can be to set up and use. Choose freely, choose wisely, my friends.

2025-04-01