Baillehache Pascal's personal website

A queueing system for asynchronous task execution in PHP web apps

I've recently volunteered on a project about detection of deforestation using satellite imagery. The goal was to have a MVP in two months, and the envisionned web app included a queueing system to run the detection tasks on server side asynchronously. I wasn't in the team working on that part and didn't intervene, but I was a bit puzzled by the proposed solution which looked to me like a labyrinthic patchwork. In my usual old fashion way I started thinking: what could be done from scratch to fit the bill with a minimum amount of time and effort while avoiding all that fancy stuff. Then I developped my own version in parallel of the official project. This article describes what I've ended up with.

First, for comparison, the solution proposed was using FastAPI, Celery, Redis, Flower, PostgreSQL database, MinIO, running inside a Docker container. The details are available here. That's as many tools you need to install, configure, learn how to use, and find your way on how to use them all together. Several volunteers worked on it for the whole length of the project, but at the deadline there was still no MVP available. Also, it was mentionned in the final report that 10Gb of disk space was insufficient for that stack, and 1Gb of RAM was recommended.

The requirements were: take coordinates and dates as input, run the detection task asynchronously, and store the detected areas in an unspecified way. But the exact inputs and outputs really don't matter here, then the example code below is a generic example straightforward to adapt to any application. About functionalities, I personnally think the following ones are already more than enough for a MVP: adding/removing tasks (with support for multiple users); listing the tasks, their result, and their status; running the tasks in the background (on server side) with a limited number of workers in parallel.

I'm reusing the same template as for my little "one single PHP file" web apps (cf previous articles 1, 2, 3), and gives only here the details related to the task queueing system. Refer to the other articles if you want to know more about my template (which is just ~400 lines of PHP/HTML/CSS/JS and no dependencies by the way).

I used a SQLite database to store the tasks data and what's needed for the queueing system. SQLite is natively included in PHP, so if you have already a PHP server (like Apache) running (needs around 100-200Mb of RAM and less than 100Mb of disk space), which is the case in many rental servers and even if not is a matter of seconds to setup, you need absolutely nothing to install or configure. Moreover SQLite is supported by a plethora of programming languages, leaving you the choice for the implementation of the worker. The database (one single file, 20000 tasks data takes around 1Mb for two single integers as input and output) is created automatically by CheckDb in my template given the following model:

    $this->dbDef = array(
      ...
      'TaskConfig' => array(
        'NbMaxWorker' => 'INTEGER'),
      'Task' => array(
        'Status' => 'INTEGER',
        'Key' => 'INTEGER',
        'Label' => 'TEXT',
        'Owner' => 'TEXT',
        'WorkerMsg' => 'TEXT',
        'Input' => 'INTEGER',
        'Output' => 'INTEGER'),
      ...
    );

TaskConfig.NbMaxWorker stores the maximum number of allowed workers in parallel. In the database automatic creation method, its value can be set as in the code below, where $this->nbMaxWorker is previously set in the constructor. Storing it in the database allows to resize the worker pool on the fly if needed.

  function CheckDb() {
    ...
    // Set the maximum number of workers in the configuration
    $sql =
      'INSERT OR REPLACE INTO TaskConfig (Reference, NbMaxWorker) VALUES (1, ' .
      $this->nbMaxWorker . ')';
    $this->db->exec($sql);
    ...
  }

In the table Task:

Status is the task's status, 0: pending, 1: running, 2: complete, 3: failed
Key is used in the worker, cf below
Owner is used to associate tasks with users in a multi-user context
WorkerMsg is used to send info from the worker to the UI during the task execution (e.g. the percentage of completion)
Label is the label of the task, to identify the task in a human readable format (not necessary for our purpose, given more as an example of possible data associated with a task)
Input, Output are the inputs/outputs of the task, two integers as simple example here, in the detection app the inputs were coordinates and dates, and the output were images whose file's name was created based on the key

In the PHP API there are three entry points: getTasks, addTask, deleteTask.

  // Process an API request for add a task
  function AddTask() {
    try {
      // Ensure there is no double quote in the labels
      $_POST['label'] = str_replace('"', ' ', $_POST["label"]);
      // Prepare the statement for addition of a new task
      $sql =
        "INSERT INTO Task " .
        "(Key, Status, Label, Owner, Input) " .
        "VALUES (:key, :status, :label, :owner, :input)";
      while(($stmt = $this->db->prepare($sql)) == false) usleep(10000);
      // Bind and execute the statement
      $stmt->bindValue(":key", "", SQLITE3_TEXT);
      $stmt->bindValue(":status", 0, SQLITE3_INTEGER);
      $stmt->bindValue(":label", $_POST["label"], SQLITE3_TEXT);
      $stmt->bindValue(":owner", $_POST["owner"], SQLITE3_TEXT);
      $stmt->bindValue(":input", $_POST["input"], SQLITE3_INTEGER);
      while($stmt->execute() == false) usleep(10000);
      // Spawn a worker (if possible)
      $log = $this->SpawnWorker();
      // Create the response
      echo '{"actionResult":"ok", "msg":"added task ' .
        $_POST["label"] . '", "log":"' . $log . '"}';
    } catch (Exception $e) {
      echo '{"actionResult":"error", "msg":"' . $e . '"}';
    }
  }
  // Process an API request for removal of a task
  function DeleteTask() {
    try {
      // Prepare the statement for removing
      $sql = "DELETE FROM Task WHERE Reference=:ref AND Owner=:owner";
      while(($stmt = $this->db->prepare($sql)) == false) usleep(10000);
      // Bind and execute the statement
      $stmt->bindValue(":ref", $_POST['ref'], SQLITE3_INTEGER);
      $stmt->bindValue(":owner", $_POST['owner'], SQLITE3_TEXT);
      while($stmt->execute() == false) usleep(10000);
      // Create the response
      echo '{"actionResult":"ok", "msg":"removed task ' .
        $_POST["label"] . '"}';
    } catch (Exception $e) {
      echo '{"actionResult":"error", "msg":"' . $e . '"}';
    }
  }
  // Process an API request to get the tasks of a user
  function GetTasks() {
    try {
      // Create the statement
      $sql = 
        "SELECT Reference, Key, Status, Label, Owner, WorkerMsg, " .
        "Input, Output FROM Task WHERE Owner=:owner " .
        "ORDER BY Label";
      while(($stmt = $this->db->prepare($sql)) == false) usleep(10000);
      // Bind and execute the statement
      $stmt->bindValue(":owner", $_POST["owner"], SQLITE3_TEXT);
      while(($rows = $stmt->execute()) == false) usleep(10000);
      // Convert the results into a JSON string
      $sep = ' ';
      $tasks = '';
      if($rows) while($row = $rows->fetchArray()) {
        $tasks .=
          $sep . '{' . 
          '"Reference":"' . $row['Reference'] . '",' .
          '"Key":"' . $row['Key'] . '",' .
          '"Status":"' . $row['Status'] . '",' .
          '"Label":"' . $row['Label'] . '",' .
          '"Owner":"' . $row['Owner'] . '",' .
          '"WorkerMsg":"' . $row['WorkerMsg'] . '",' .
          '"Input":"' . $row['Input'] . '",' .
          '"Output":"' . $row['Output'] . '"}';
        $sep = ', ';
      }
      // Create the response
      echo '{"actionResult":"ok", "msg":"", "tasks":[' . $tasks . ']}';
    } catch (Exception $e) {
      echo '{"actionResult":"error", "msg":"' . $e . '"}';
    }
  }

deleteTask deletes the record identified by its reference and owner. getTasks gets all the tasks' data for a given owner. addTask adds a new record with the input info from the UI, and Status is set to "0" to mark the task as pending. Then it tries to spawn a new worker as follow:

  // Spawn a new worker if possible
  function SpawnWorker() {
    try {
      // If there are not too many workers
      $rows = $this->db->query("SELECT COUNT(*) FROM Task WHERE Status=1");
      $nbRunning = $rows->fetchArray(SQLITE3_NUM)[0];
      if($nbRunning < $this->nbMaxWorker) {
        // Spawn a new worker
        $cmd = dirname(__FILE__) . '/worker ' .
          $this->dbObfuscation . ' > /dev/null &';
        $ret = exec($cmd);
      }
    } catch (Exception $e) {
      return str($e);
    } 
    return "";
  }

worker is the worker executable, stored in the same folder as the PHP file of the app. The redirection of the output to /dev/null is necessary to avoid the call to be blocking (& is not enough). $this->dbObfuscation is the database obfuscation parameter, cf my previous articles.

This method creates a new worker process when the user creates tasks, if there are not already the maximum number of workers. The workers terminate themselves when there are no more task to process, or keep working on another task if they find one after completing their current one. No need for a seperate way to start/keep alive/terminate workers, the server and the workers takes care of themselves automatically.

Trying to spawn new workers at the end of GetTasks (which is repeatedly called as long as a user is logged in) could also be done if you're afraid that a worker may die and want to be 100% sure that there are always as many workers as possible even in such a case. However, spawning only when adding is probably sufficient in most case.

One point to be careful about to be able to withstand heavy load is to check that each SQL command hasn't failed due to a busy database. If so, it is retried until it succeeds, with a small delay between attempts to avoid drowning the server and DB with an avalanche of requests.

The worker can be written in any programming language (as long as it supports SQLite, a very low requirement), so I'll use pseudocode to describe its algorithm. In plain words, a main loop run until the worker can't find a task to process or detect too many workers in parallel, then terminates. If it finds a task, it tries to update the task status and check it was actually able to do so (to avoid race condition), and then processes the task.

Build the path to the SQLite database based on the current directory
  and obfuscation parameter given in argument
Connect to the database
flagContinue = TRUE
nbMaxWorker = SQL("SELECT NbMaxWorker FROM TaskConfig WHERE Reference=1"
WHILE flagContinue == TRUE:
  nbWorker = SQL("SELECT COUNT(*) FROM Task WHERE Status=1"
  IF nbWorker >= nbMaxWorker:
    flagContinue = FALSE
  ELSE:
    flagContinue = ProcessTask()
Disconnect from the database

FUNCTION ProcessTask:
  refTask = SQL("SELECT Reference FROM Task WHERE Status=0 LIMIT 1")
  IF refTask IS NOT NULL:
    key = PID() + "_" + refTask
    SQL(
      "UPDATE Task SET Status=1, Key=" + key +
      " WHERE Status=0 AND Reference=" + refTask
    taskActuallyTaken = SQL(
      "SELECT COUNT(*) FROM Task WHERE Status=1 AND Key=" + key +
      " AND Reference=" + refTask)
    IF taskActuallyTaken == 1:
      input = SQL("SELECT Input FROM Task WHERE Ref=" + refTask)
      outputs = RunTask(input, refTask)
      SQL(
        "UPDATE Task SET Status=2, Output=" + output +
        " WHERE Reference=" + refTask)
    RETURN TRUE
  ELSE:
    RETURN FALSE

FUNCTION RunTask(input, refTask):
  ... do the actual work on input ...
  optionally, update the task with info about progress with something like:
  SQL("UPDATE Task SET WorkerMsg=" + info + " WHERE Reference=" + refTask)
  RETURN output

The most important part here is to take care of concurrency problems: two workers trying to take responsibility of the same task at the same time. WHERE Status=0 in the UPDATE Task SET Status=1, Key=... statement, followed by the test on taskActuallyTaken, plus the atomicity of the transactions, ensure such problem is avoided.

And that's it. Over a week-end, in ~1000 lines on PHP side (almost half of it coming from the template) and ~200 lines on worker side including the actual detection task (in python, yuck but it was the project requirement), I had a MVP fulfilling the whole requirements except for an interactive Google map in the interface. It can be seen in the demo video below. Copy 2 files in the appropriate folder of an Apache server, set the permission to allow execution of the worker and automatic creation of the database file, and that's up and running with a ridiculously low footprint on memory and disk. Why make it difficult when it can be easy?

The envisionned web app had super low requirements (from the perspective of the task queueing system at least). I recall the project leader expecting a few requests per minute at most. The demo above and me actually using it (on my hometown, on recent wildfires here in Japan, ...) convinced me that the solution above is already enough to be very much useful.

But how would my solution perform with high frequency and many workers? To check it I've setup a test environment where tasks are automatically created at a given pace. I can then monitor how it performs for various frequency/load/number of workers. The video below shows the test running for 10 workers in parallel, 5 tasks added per second, 2 seconds to complete one task, and 100 total tasks.

The final and most stressing test I did was 20 workers in parallel, 10 tasks added per second, 1.5-2.5 seconds (randomly) to complete one task, and 10000 total tasks. The graph below shows the evolution of the number of pending tasks, running tasks and worker processes.

One can see that

the maximum number of worker (in term of running task) is respected
the workers catch up with the accumulation of pending tasks even after lagging behind
the number of workers decreases when the number of pending tasks gets near zero and come back to zero once all the tasks are processed
the number of processes stays just above the maximum number of worker. It overshoots a bit because the test in the PHP code can "slip through" (it's a test on the number of running task, not number of process) before getting caught by the test in the worker.

Watching the list of tasks in the UI I could verify that the app stays responsive despite the load during the ~15mn of the test. And finally I checked that no task had been left unprocessed, or had been mistakenly processed twice by logging the workers' actions. In conclusion, that looks pretty good and perfectly capable of supporting heavy load.

The only problem I had during the tests was Chrome throwing net::ERR_INSUFFICIENT_RESOURCES during the most stressing tests. That was actually on the automatic task creation side, and I solved the problem by switching to Firefox which was fine with the same load. Thumbs up Google ! 🙃

If you want to use my code, you're welcome. A tar is available here. It contains the php file for the app and a C implementation of the worker. One could adapt it and integrate it to a different app by modifying the input/output, adding the actual task in the worker, and refactoring the interface according to its need. The C code compiles as follow (including the command to install libsqlite if you need):

SQLITE_VERSION=3490100
SQLITE_URL=https://sqlite.org/2025/sqlite-amalgamation-$(SQLITE_VERSION).zip
COMPILER=gcc
/usr/local/lib/libsqlite3.a:
	wget $(SQLITE_URL)
	unzip sqlite-amalgamation-$(SQLITE_VERSION).zip
	rm -rf sqlite3
	mv sqlite-amalgamation-$(SQLITE_VERSION) sqlite3
	rm sqlite-amalgamation-$(SQLITE_VERSION).zip
	cd sqlite3 &&	$(COMPILER) -O3 -c sqlite3.c
	sudo rm -rf /usr/local/include/SQLite3
	sudo mkdir /usr/local/include/SQLite3
	cd sqlite3 && sudo cp sqlite3.h /usr/local/include/SQLite3/sqlite3.h
	sudo ar -r /usr/local/lib/libsqlite3.a sqlite3/sqlite3.o
	rm -rf sqlite3

worker: worker.c Makefile
	gcc worker.c -lsqlite3 -o worker

So, what's the conclusion? Sure enough my tool doesn't include the ton of whistles and blowers provided by Celery and co. Do you need them ? Are they worse the bloated Gb and the pain ? You're the only one to know. However, be aware that very probably you can actually do what you want with a few hundred lines of code written over a week end from scratch, simple as it can be to setup and use. Choose freely, choose wisely my friends.

2025-04-01
in All, C programming, Web app,
435 views
A comment, question, correction ? A project we could work together on ? Email me!
Learn more about me in my profile.