Bull Library – manage your queues graciously
Job queues are an essential piece of many application architectures. They can be applied as a solution to a wide variety of technical problems:
- Avoiding overload on heavily used services.
- Controlling the concurrency of processes accessing shared (usually limited) resources and connections.
- Handling communication between microservices or nodes of a network.
As a typical example, we could think of an online image processing platform where users upload their images in order to convert them into a new format and, subsequently, receive the output via email. Image processing can be demanding in terms of CPU, but the service is mainly requested during working hours, with long periods of idle time. A job queue would be able to hold all the active conversion requests and submit them to the conversion service, making sure there are no more than 10 images being processed at the same time. It would allow us to keep the CPU/memory use of our service instance under control, saving some of the cost of scaling and preventing other derived problems like unresponsiveness if the system were not able to handle the demand.
Redis as a queue service
Redis is a widely used in-memory data storage system that was primarily designed to work as an application cache layer, but it also provides the tools needed to build a queue handling system.
An important point to take into account when choosing Redis to handle your queues: you’ll need a traditional server to run Redis.
This may or may not be a problem depending on your application infrastructure, but it's something to account for. You might have the capacity to spin up and maintain a new server, or to use one of your existing application servers for this purpose, probably applying some horizontal scaling to balance the machine resources. Besides, the cache capabilities of Redis can prove useful for your application.
If your application is based on a serverless architecture, the previous point could work against the main principles of the paradigm and you’ll probably have to consider other alternatives, such as Amazon SQS, Cloud Tasks or Azure queues.
Ok, we'll use Redis. How?
Although you can implement a job queue making use of the native Redis commands, your solution will quickly grow in complexity as soon as you need it to cover concepts like:
- Concurrency management
- Scheduling and recurrency
Then, as usual, you’ll end up researching the existing options to avoid reinventing the wheel. There is a good number of JS libraries for handling technology-agnostic queues, and a few alternatives that are based on Redis. Depending on your requirements, the choice could vary. In our case, the following were essential:
- Setting up of scheduled jobs. Some tasks needed to happen at a specific time, every day.
- Errorless concurrency management. The queue would be responsible for controlling the consumption of shared resources; an error could leave components required for the correctness of the business logic inoperative.
- Priority queues. The source of a job should be taken into account when deciding which element in the queue is executed next.
- Auto-recovery from system failure. The platform should be able to detect an unexpectedly exited execution and retry it.
Bull is a JS library created to do the hard work for you, wrapping the complex logic of managing queues and providing an easy-to-use API. And best of all, Bull offers all the features we expected, plus some additions out of the box:
- Jobs can be categorised (named) differently and still be ruled by the same queue/configuration. This is great for controlling access to shared resources using different handlers.
- It is actively maintained.
- Support for delayed jobs.
- Global and local events to notify about the progress of a task.
- Very well documented.
- Support for LIFO queues - last in first out.
Running your queues with Bull
Bull is based on 3 principal concepts to manage a queue. Let’s take as an example the queue used in the scenario described at the beginning of the article, an image processor, to run through them.
Events and listeners
For each relevant event in the job life cycle (creation, start, completion, etc.), Bull will trigger an event. Listeners can hook into these events to perform actions, e.g. inform a user about an error when processing their image due to an incorrect format.
A neat feature of the library is the existence of global events, which are emitted at queue level, e.g. when all the jobs have been completed and the queue is idle.
Producers
Responsible for adding jobs to the queue. They need to provide all the information the consumers require to correctly process the job. In its simplest form, the payload can be an object with a single property, like the id of the image in our DB. In general, it is advisable to pass as little data as possible and to make sure it is immutable; otherwise, the data could be out of date by the time it is processed (unless we have a locking mechanism).
A producer would add an image to the queue after receiving a request to convert it into a different format.
Consumers
Responsible for processing jobs waiting in the queue. They take the data provided by the producer and run a handler function to carry out the work (like transforming the image to SVG).
Consumers and producers can (and in most cases should) be separated into different microservices. In fact, new jobs can be added to the queue when there are no online workers (consumers). As soon as a worker becomes available, it will start processing the piled-up jobs.
This approach opens the door to a range of different architectural solutions, and you can build models that save infrastructure resources and reduce costs, like:
1. Begin with a stopped consumer service: if there are no jobs to run, there is no need to keep a processing instance up.
2. When new image processing requests are received, “produce” the appropriate jobs and add them to the queue.
3. A local listener detects there are jobs “waiting” to be processed and triggers the start of the consumer instance.
4. New jobs keep stacking up in the queue.
5. When the consumer is ready, it starts handling the images.
6. Once all the tasks have been completed, a global listener detects this and stops the consumer service until it is needed again.
No doubt, Bull is an excellent product, and the only issue we’ve found so far relates to the queue concurrency configuration when making use of named jobs.
Naming is a way of categorising jobs. The name is given by the producer when adding the job to the queue:
Then, a consumer can be configured to only handle specific jobs by stating their name:
This functionality is really interesting when we want to process jobs differently but make use of a single queue, either because the configuration is the same or because they need access to a shared resource and must, therefore, be controlled together.
However, when setting several named processors to work each with a specific concurrency, the concurrency values are added up into a single queue-wide total. This is mentioned in the documentation as a quick note, but you could easily overlook it and end up with queues behaving in unexpected ways, sometimes with pretty bad consequences.
Nevertheless, with a bit of imagination, we can work around this side effect by:
- Following the author’s advice: using a different queue per named processor. Not ideal if you are aiming to reuse code.
- Including the job type as part of the job data when it is added to the queue. The job processor checks this property to route the work to the appropriate handler function.
- Creating a custom wrapper library (we went for this option) that provides a higher-level abstraction layer to control named jobs and relies on Bull behind the scenes for everything else. Although it involved a bit more work, it proved to be a more robust option, consistent with the expected behaviour.
I hope you enjoyed the article and that, in the future, you will consider queues as part of your architectural puzzle, with Redis and Bull as the glue to put all the pieces together.