How the task system works
This page explains the DiracX task system from an operational perspective. For the design rationale and architecture, see the developer explanation and the ADR.
Task lifecycle
- A task is submitted — either by application code, the scheduler (for periodic tasks), or the CLI (
diracx-task-run call) - The broker places it on one of nine Redis Streams, selected by the task's priority (realtime, normal, background) and size (small, medium, large)
- A worker picks up the message from the stream, consuming in strict priority order (realtime first)
- The worker acquires locks required by the task (mutex, RW, limiters). If a lock cannot be acquired, the task is rescheduled with a short delay
- The worker resolves dependencies (database connections, settings) via the dependency injection system
- The task's
execute()method runs. A watchdog thread periodically extends lock TTLs for long-running tasks - On success, the message is acknowledged and removed from the stream. If the task belongs to a callback group, the worker checks whether all siblings have completed
- On failure, the retry policy is consulted. If retries remain, the task is placed in the delayed sorted set for later promotion. If retries are exhausted and the task is dead-letter-queue-eligible, it is persisted to SQL
Streams
The broker uses nine Redis Streams named diracx:tasks:{priority}:{size}:
| Small | Medium | Large | |
|---|---|---|---|
| Realtime | diracx:tasks:realtime:small |
diracx:tasks:realtime:medium |
diracx:tasks:realtime:large |
| Normal | diracx:tasks:normal:small |
diracx:tasks:normal:medium |
diracx:tasks:normal:large |
| Background | diracx:tasks:background:small |
diracx:tasks:background:medium |
diracx:tasks:background:large |
Workers are configured for a specific size class and consume from the three priority streams for that size. This allows different worker pools to be scaled independently based on resource requirements.
Locking
Two categories of lock protect tasks:
- Structural locks prevent data corruption.
MutexLockprovides exclusive access;ExclusiveRWLock/SharedRWLockallow concurrent readers or a single writer. These are always enforced, including in interactive CLI mode. - Limiters control throughput.
RateLimitercaps operations per time window;ConcurrencyLimitercaps simultaneous executions. These are skipped in interactive mode.
All locks have TTLs. A watchdog thread extends lock TTLs during execution so that long-running tasks don't lose their locks. If a worker crashes, locks auto-expire and are released.
Scheduler
The scheduler is a singleton process (enforced by a Redis mutex with a 30-second TTL). It runs four concurrent loops:
- Periodic loop — checks if any periodic task is due (based on
IntervalSeconds,CronSchedule, orRRuleSchedule) and submits it to the broker - Delayed poll loop — promotes tasks from the delayed sorted set (
diracx:tasks:delayed) to their target stream when their scheduled time arrives. Uses an atomic Lua script to prevent race conditions - Lock extend loop — periodically refreshes the scheduler's own Redis mutex
- Config watch loop — detects changes to the VO list and adds/removes periodic task schedules for new/removed VOs
Dead-letter queue
Tasks marked with dlq_eligible = True that exhaust their retries are persisted to the dlq_tasks table in the SQL database. This provides a durable safety net — critical tasks are never silently dropped, even if Redis is restarted. Dead letter queue tasks can be inspected and resubmitted.
Ephemeral broker state
The broker's Redis state is designed to be ephemeral. On restart:
- The scheduler repopulates periodic task schedules from entry points and configuration
- Consumer groups are recreated for all nine streams
- Dead-letter-queue-eligible tasks that were persisted in SQL can be resubmitted
- Delayed one-off tasks that were in flight are lost (periodic tasks are automatically rescheduled)
See the broker lifecycle explanation for details on how this affects application design.