Distributed Counting Semaphore with MySQL
Notes on building a distributed counting semaphore from scratch using MySQL.
- Starts with the simplest possible thing, breaks it, fixes it, and repeats until we arrive at a working design.
Part 1: The Problem
In a single process, limiting concurrency is trivial:
sem = threading.Semaphore(10) # max 10 concurrent
sem.acquire()
do_work()
sem.release()
But when services run across multiple machines, they don’t share memory. A process on Machine A has no idea what Machine B is doing. We need something external that all machines can agree on.
Options:
- Redis — fast, in-memory,
SET NX EXfor simple locks. But no transactional multi-resource acquire and safety depends on timing assumptions (see Kleppmann’s critique below). - ZooKeeper/etcd — true consensus, ephemeral nodes handle crash recovery automatically. But requires dedicated infrastructure.
- Your existing MySQL — already there, ACID transactions, row-level locking. No new infrastructure. That’s what we’ll use.
Part 2: Start Simple — A Distributed Mutex
Before counting semaphores, let’s build the simplest thing: a binary mutex (only 1 holder at a time).
Attempt 1: Naive INSERT
CREATE TABLE lock_table (
name VARCHAR(255) PRIMARY KEY,
holder VARCHAR(255) NOT NULL
);
To acquire: INSERT INTO lock_table (name, holder) VALUES ('my-lock', 'client-A');
To release: DELETE FROM lock_table WHERE name = 'my-lock' AND holder = 'client-A';
If the INSERT fails with duplicate key, the lock is held by someone else.
What breaks:
- Client crashes → row stays forever → lock never released. Permanent resource leak — no other client can ever acquire, even though nobody holds the lock.
- No way to wait — you just get an error and have to poll.
- Race on release: A deletes its row, B and C both try to INSERT simultaneously — one wins, the other gets a duplicate key error. The loser has no way to distinguish “someone else acquired” from a transient failure.
sequenceDiagram
participant A as Client A (holder)
participant DB as MySQL
participant B as Client B
participant C as Client C
A->>DB: DELETE (releases lock)
B->>DB: INSERT (succeeds — acquires)
C->>DB: INSERT (duplicate key error)
Note over C: Lost the race. Retry? When?
Attempt 2: Add TTL
CREATE TABLE lock_table (
name VARCHAR(255) PRIMARY KEY,
holder VARCHAR(255) NOT NULL,
expires_at TIMESTAMP NOT NULL
);
Acquire: insert with expires_at = NOW() + interval. A background sweeper deletes expired rows.
Better, but still breaks:
- Client A’s lock expires while it’s still doing work (GC pause, slow network). Client B acquires. Both think they hold the lock. This is exactly the problem Kleppmann describes — you cannot rely on timing for correctness.
- The check-then-act race still exists.
sequenceDiagram
participant A as Client A
participant DB as MySQL
participant S as TTL Sweeper
participant B as Client B
A->>DB: INSERT lock (expires_at = now+30s)
Note over A: Working... GC pause
S->>DB: DELETE expired locks
Note over DB: A's row deleted
B->>DB: INSERT lock (succeeds)
Note over A: Resumes — thinks it holds lock
Note over A,B: Both think they hold the lock
Ref: https://martin.kleppmann.com/2016/02/08/how-to-do-distributed-locking.html
The storage server remembers that it has already processed a write with a higher token number (34), and so it rejects the request with token 33.
- Note this requires the storage server to take an active role in checking tokens, and rejecting any writes on which the token has gone backwards.
- But this is not particularly hard, once you know the trick.
- And provided that the lock service generates strictly monotonically increasing tokens, this makes the lock safe.
- For example, if you are using ZooKeeper as lock service, you can use the zxid or the znode version number as fencing token, and you’re in good shape
Rebuttal: https://antirez.com/news/101
Attempt 3: SELECT FOR UPDATE
The root problem with Attempts 1-2: they use row presence/absence as the locking mechanism. INSERT and DELETE aren’t designed for mutual exclusion — they’re just data operations that happen to fail on duplicates. What if the row always exists, and we use the database’s actual locking primitive to protect it?
CREATE TABLE mutex (
name VARCHAR(255) PRIMARY KEY,
holder VARCHAR(255),
state ENUM('FREE', 'HELD') NOT NULL DEFAULT 'FREE'
);
-- Pre-create the mutex row
INSERT INTO mutex (name, state) VALUES ('my-lock', 'FREE');
Acquire:
START TRANSACTION;
SELECT * FROM mutex WHERE name = 'my-lock' FOR UPDATE; -- blocks other acquirers
-- Application checks: if state = 'FREE', then:
UPDATE mutex SET state = 'HELD', holder = 'client-A'
WHERE name = 'my-lock' AND state = 'FREE';
-- Check rows affected: 0 = lock was HELD → ROLLBACK, 1 = acquired
COMMIT;
Release:
UPDATE mutex SET state = 'FREE', holder = NULL WHERE name = 'my-lock';
What this fixes vs Attempt 2:
Check-then-act race → fixed. In Attempts 1-2, “check if row exists, then INSERT” is two separate operations — another client can slip in between. With SELECT FOR UPDATE, the check (is state FREE?) and the update (set to HELD) happen inside one transaction, and the exclusive row lock prevents anyone else from even reading the row until you commit.
Blocking instead of failing. In Attempts 1-2, a contending client gets a duplicate key error and has to poll. With SELECT FOR UPDATE, the second client blocks until the first commits — the database acts as a natural queue.
sequenceDiagram
participant A as Client A
participant DB as MySQL
participant B as Client B
A->>DB: BEGIN
A->>DB: SELECT FOR UPDATE (locks row)
A->>DB: sees state=FREE → UPDATE to HELD
B->>DB: BEGIN
B->>DB: SELECT FOR UPDATE
Note over B,DB: Blocks (A holds exclusive lock)
A->>DB: COMMIT
Note over B,DB: Unblocks, reads row
B->>DB: sees state=HELD → ROLLBACK
What this does NOT fix:
TTL misfire is still a safety problem. We still need TTL for crash recovery (client crashes → state stays HELD forever). But TTL can still fire too early:
- Client A holds the lock but is slow (GC pause, network delay)
- TTL sweep sets state back to
FREE - Client B does
SELECT FOR UPDATE, seesFREE, acquires - Client A resumes — both think they hold the lock
SELECT FOR UPDATE prevents the race (two clients checking simultaneously), but can’t prevent TTL from pulling the rug out from under a slow holder. This is the same fundamental problem as Attempt 2 — timing can’t guarantee correctness.
This requires a fencing token to solve completely — a monotonically increasing ID passed to the downstream resource, which rejects writes from stale holders. We’ll address this in Part 6.
Also still limited: only supports 1 holder. We want N concurrent holders (counting semaphore).
Key insight: SELECT FOR UPDATE is the fundamental building block. It fixes the race condition and the check-then-act problem. Everything below builds on it.
Part 3: From Mutex to Counting Semaphore
A mutex allows 1 holder. A counting semaphore allows N. Instead of checking state = FREE, we count how many permits are currently held and compare against capacity.
Schema
CREATE TABLE semaphore (
id BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(255) NOT NULL UNIQUE,
capacity INT NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
);
CREATE TABLE permit_request (
id BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
external_id VARCHAR(255) NOT NULL UNIQUE, -- client-provided idempotency key
owner VARCHAR(255),
state ENUM('ACQUIRED', 'RELEASED') NOT NULL, -- no PENDING/WAITING: see note below
ttl_seconds INT,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
);
CREATE TABLE permit (
id BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
semaphore_id BIGINT UNSIGNED NOT NULL REFERENCES semaphore(id),
permit_request_id BIGINT UNSIGNED NOT NULL REFERENCES permit_request(id),
count INT NOT NULL DEFAULT 1,
state ENUM('ACQUIRED', 'RELEASED') NOT NULL,
INDEX idx_semaphore_state (semaphore_id, state),
INDEX idx_request (permit_request_id)
);
No PENDING or WAITING state: This is intentional. The design doesn’t support queued/blocking acquisition — if there’s no capacity, the acquire fails immediately (or blocks briefly at the DB lock level). If you need a waiting queue, you’d add a PENDING state and a notification mechanism, but that adds significant complexity. Failing fast and letting the caller retry is simpler and sufficient for most use cases.
The idx_semaphore_state index is critical — without it, SELECT SUM(count) FROM permit WHERE semaphore_id = ? AND state = 'ACQUIRED' does a full table scan on every acquire.
erDiagram
semaphore ||--o{ permit : "has"
permit_request ||--o{ permit : "has"
semaphore {
bigint id PK
varchar name UK
int capacity
}
permit_request {
bigint id PK
varchar external_id UK
varchar owner
enum state
int ttl_seconds
}
permit {
bigint id PK
bigint semaphore_id FK
bigint permit_request_id FK
int count
enum state
}
Why 3 tables instead of 1?
A single acquire can span multiple semaphores atomically — “I need 1 backup slot AND 1 network slot.” The permit table is the many-to-many join between requests and semaphores. permit_request tracks the overall lifecycle. semaphore defines capacity.
This also means you can query utilization per semaphore independently.
The external_id (Idempotency Key)
The external_id on permit_request is client-provided, not server-generated. The caller decides what to pass based on their operation:
- A background job uses its job ID:
"job-abc-123" - A job holding permits for multiple snapshots uses a composite key:
"job-abc-123::snapshot-456" - Any caller can use a UUID they generate:
"550e8400-e29b-41d4-a716-446655440000"
The service doesn’t care what the string is — it just enforces uniqueness. This is what makes retries safe: the caller knows their own operation ID and can call acquire again with the same key without risk of double-granting.
Note: Do not confuse this idempotency key with a fencing token. The
external_idprotects the lock database from double-granting on retries. It does not protect downstream resources from stale writes if a TTL expires while the client is still working. For that, you’d use the auto-incrementingpermit.idas a monotonically increasing fencing token passed to the storage layer (see Part 6).
Part 4: Acquire — Building It Up
The naive version (broken)
SELECT SUM(count) AS used FROM permit
WHERE semaphore_id = 1 AND state = 'ACQUIRED';
-- Application checks: if used + requested <= capacity, then:
INSERT INTO permit_request (...) VALUES (...);
INSERT INTO permit (...) VALUES (...);
Race condition: Two clients read used = 9 simultaneously, both see room for 1 more (capacity = 10), both insert. Now 11 permits are held. The read and write are not atomic.
Fix 1: SELECT FOR UPDATE
Lock the semaphore row before reading:
START TRANSACTION;
SELECT id, capacity FROM semaphore WHERE name = 'backup-slots' FOR UPDATE;
-- Now this row is exclusively locked. Other transactions block here.
SELECT SUM(count) AS used FROM permit
WHERE semaphore_id = 1 AND state = 'ACQUIRED';
-- Check: used + requested <= capacity
-- If yes:
INSERT INTO permit_request ...;
INSERT INTO permit ...;
COMMIT;
-- If no:
ROLLBACK;
Now the second client blocks on SELECT FOR UPDATE until the first commits. When it unblocks, it reads the updated count and correctly sees no room.
sequenceDiagram
participant A as Client A
participant DB as MySQL
participant B as Client B
A->>DB: BEGIN + SELECT FOR UPDATE (locks semaphore row)
B->>DB: BEGIN + SELECT FOR UPDATE
Note over B,DB: Blocks (A holds lock)
A->>DB: SELECT SUM(count) → used=9, cap=10
A->>DB: INSERT permit (used+1 ≤ cap ✓)
A->>DB: COMMIT
Note over B,DB: Unblocks
B->>DB: SELECT SUM(count) → used=10, cap=10
Note over B: used+1 > cap ✗
B->>DB: ROLLBACK
Fix 2: Multi-semaphore acquire — the deadlock problem
What if we need to acquire from multiple semaphores atomically?
-- Request A wants [X, Y]
SELECT ... FROM semaphore WHERE name = 'X' FOR UPDATE; -- locks X
SELECT ... FROM semaphore WHERE name = 'Y' FOR UPDATE; -- locks Y
-- Request B wants [Y, X]
SELECT ... FROM semaphore WHERE name = 'Y' FOR UPDATE; -- locks Y
SELECT ... FROM semaphore WHERE name = 'X' FOR UPDATE; -- blocks forever (A holds X)
A holds X, waits for Y. B holds Y, waits for X. Deadlock.
sequenceDiagram
participant A as Client A
participant DB as MySQL
participant B as Client B
A->>DB: FOR UPDATE name='X' (locks X)
B->>DB: FOR UPDATE name='Y' (locks Y)
A->>DB: FOR UPDATE name='Y'
Note over A,DB: Blocks (B holds Y)
B->>DB: FOR UPDATE name='X'
Note over B,DB: Blocks (A holds X)
Note over A,B: Circular wait = DEADLOCK
Fix: sort semaphore names before locking. Both requests lock X first, then Y. One waits, the other proceeds. No circular dependency. This is the classic lock ordering technique.
sequenceDiagram
participant A as Client A [X, Y]
participant DB as MySQL
participant B as Client B [X, Y]
A->>DB: FOR UPDATE name='X' (locks X)
B->>DB: FOR UPDATE name='X'
Note over B,DB: Blocks (A holds X)
A->>DB: FOR UPDATE name='Y' (locks Y)
A->>DB: COMMIT (releases X, Y)
B->>DB: FOR UPDATE name='X' (locks X)
B->>DB: FOR UPDATE name='Y' (locks Y)
B->>DB: COMMIT
Note over A,B: Both sorted [X,Y] → no circular wait
InnoDB deadlock detection: Even with sorted locking, InnoDB actively detects deadlocks (e.g., from gap locks or other edge cases) and automatically aborts one of the transactions. This makes the retry loop in Fix 4 not just a good idea, but mandatory — your code must handle ER_LOCK_DEADLOCK gracefully.
Also set a short lock wait timeout to fail fast under contention:
SET SESSION innodb_lock_wait_timeout = 5; -- vs default 50 seconds
Isolation level note: MySQL/InnoDB defaults to REPEATABLE READ. A subtle but critical detail: SELECT FOR UPDATE does not use the MVCC snapshot — it reads the latest committed version of the row, even under REPEATABLE READ. This is how it can see permits inserted by a transaction that committed while we were waiting for the lock. The stability guarantee comes from the exclusive lock itself (no concurrent writer can modify the row while we hold it), not from the snapshot. This trips people up — a regular SELECT under REPEATABLE READ would see the stale snapshot, but SELECT FOR UPDATE always reads current data.
Fix 3: Retries — the duplicate problem
Network blip. Client sends acquire, server processes it, response gets lost. Client retries.
Without protection: two permits granted for the same operation. Double-acquire.
Fix: idempotency check before doing anything.
SELECT id, state FROM permit_request WHERE external_id = 'req-abc-123';
-- If found → return existing state, done.
The UNIQUE constraint on external_id is the last line of defense. Even if two concurrent requests slip past the initial check (read NULL, both try to INSERT), the second INSERT fails with a duplicate key error. The service catches this, re-fetches the existing row, and returns its state.
Fix 4: Wrap it in retry-able transaction
Transient DB errors (deadlock detected, connection reset) should retry automatically. Up to 3 attempts:
- Failed before commit → nothing written, retry is fresh start
- Committed but response lost → retry hits idempotency check, returns existing result
The complete acquire
Putting all fixes together:
1. Idempotency check (SELECT by external_id)
2. Sort semaphore names
3. [Optional] Acquire in-memory pre-locks
4. BEGIN TRANSACTION (with retry, max 3)
a. SET innodb_lock_wait_timeout = 5
b. SELECT FOR UPDATE on each semaphore (sorted order)
c. SUM currently acquired permits
d. Check availability — if any exhausted → ROLLBACK
e. INSERT permit_request + INSERT permits
f. Reset lock wait timeout to default
5. COMMIT
Part 5: Release — Why It’s Simpler
The release flow
-- 1. Load the request
SELECT id, state FROM permit_request WHERE external_id = 'req-abc-123';
-- 2. If ACQUIRED:
UPDATE permit SET state = 'RELEASED'
WHERE permit_request_id = @request_id AND state != 'RELEASED';
UPDATE permit_request SET state = 'RELEASED'
WHERE id = @request_id;
-- 3. If already RELEASED: return ALREADY_RELEASED, no writes.
Why release doesn’t need SELECT FOR UPDATE
Acquire needs SELECT FOR UPDATE because multiple writers contend over a shared count. Release has no contention — it only touches rows belonging to this specific request.
Why release doesn’t need a transaction
The two UPDATEs are deliberately NOT wrapped in a transaction:
- One-way state machine.
ACQUIRED → RELEASEDonly. Never back. No race over direction. - Idempotent WHERE clause.
AND state != 'RELEASED'— running the same release twice matches zero rows the second time. - Crash between the two UPDATEs. Permits say RELEASED, request still says ACQUIRED. The orphan cleanup mechanism (Part 6) detects this inconsistency during its periodic sweep — it calls release again, the permit UPDATE is a no-op (already RELEASED), and the request UPDATE succeeds. Consistency restored eventually.
No locking. No transaction. The one-way state machine + idempotent updates + eventual cleanup make this safe.
stateDiagram-v2
[*] --> ACQUIRED : acquire()
ACQUIRED --> RELEASED : release()
RELEASED --> RELEASED : release() again (no-op)
note right of RELEASED : One-way. Never back to ACQUIRED.
Compare this to ZooKeeper’s approach: ephemeral nodes auto-delete when the session dies. Here, we get the same effect through the one-way state machine + background cleanup — it’s just not instant.
Part 6: The Leaked Permit Problem
The hardest problem in distributed locking: client acquires, then crashes. Nobody calls release.
Redis solves this with TTL auto-expiry on keys. ZooKeeper solves it with ephemeral nodes tied to sessions. Both are immediate (or near-immediate). Our MySQL approach uses passive cleanup — it’s not instant, but it’s correct.
Three cleanup mechanisms
a) TTL expiry — client sets a TTL when acquiring. Background sweep every ~5 minutes:
SELECT id, external_id, created_at, ttl_seconds
FROM permit_request
WHERE state = 'ACQUIRED'
AND ttl_seconds IS NOT NULL;
For each: if now() - created_at > ttl_seconds → release it.
Not real-time (max ~5 min delay). Fine for long-running operations (backups, exports). Not suitable if you need sub-second expiry — use Redis for that.
b) Owner liveness check — for permits owned by background jobs, query the job orchestrator: “Is this job still running?” Dead job → release its permits.
c) Staleness sweep — any ACQUIRED request older than a threshold (e.g. 24h) → release.
stateDiagram-v2
[*] --> ACQUIRED : acquire()
ACQUIRED --> RELEASED : release() (normal)
ACQUIRED --> RELEASED : TTL expiry (passive cleanup)
ACQUIRED --> RELEASED : Owner dead (liveness check)
ACQUIRED --> RELEASED : Stale > 24h (staleness sweep)
RELEASED --> [*] : Cleaned up after 7 days
Why passive TTL is actually fine
Kleppmann’s core argument against Redlock: you can’t use timing for correctness. A lock that expires before work completes leads to two holders.
Our design treats TTL as a cleanup mechanism, not a correctness mechanism. SELECT FOR UPDATE provides mutual exclusion during acquisition. TTL just prevents permanent leaks from crashed clients.
But what if TTL fires while the client is still working? Two clients can genuinely hold permits simultaneously. The semaphore service alone cannot prevent this — it’s a fundamental limitation of any TTL-based system.
To fully solve this, the downstream resource must participate. The auto-incrementing permit.id can serve as a fencing token: the client passes it to the storage layer on every write, and the storage layer rejects writes with an ID lower than the highest it has seen. This way, even if a stale client resumes after TTL expiry, its writes are rejected.
sequenceDiagram
participant A as Client A (permit.id=42)
participant S as Semaphore DB
participant D as Downstream Storage
participant B as Client B (permit.id=43)
A->>S: acquire → permit.id=42
Note over A: Working... GC pause
S->>S: TTL sweep → releases permit 42
B->>S: acquire → permit.id=43
B->>D: write(data, fence=43)
D->>D: max_fence=43 ✓
A->>D: write(data, fence=42)
D->>D: 42 < 43 → REJECT (stale)
Without fencing, the semaphore provides efficiency (limiting concurrency) but not safety (preventing stale writes). With fencing, you get both.
Part 7: Performance — In-Memory Pre-Lock
Under high contention, SELECT FOR UPDATE queues transactions at the DB level. Each holds a DB connection while blocked. This wastes connection pool.
Optimization: before hitting the DB, acquire an in-memory named lock per semaphore within the service process:
Without: 10 concurrent requests → 10 DB transactions queuing on row lock
With: 10 concurrent requests → 1 hits DB, 9 wait in-memory (no connection held)
Properties:
- Per-semaphore granularity — different semaphores proceed in parallel
- 5s timeout — fail fast
- Purely optional — correctness from DB layer only. Multi-instance deployments rely on the DB for cross-instance serialization.
- Auto-released via deferred cleanup on transaction completion
Part 8: Maintenance
| Task | Frequency | Action |
|---|---|---|
| Orphan release | Every 5 mins | Release dead owner and TTL-expired permits |
| Request cleanup | Every 4 hours | Delete released requests older than 7 days |
| History cleanup | Every 24 hours | Delete capacity-change audit records older than 30 days |
Old request cleanup uses batched deletes (~800 rows/batch) to avoid long transactions that block production traffic. Fetch batch → delete permits → delete requests → repeat.
Summary: How Each Problem Gets Solved
| Problem | Naive approach | What breaks | Fix |
|---|---|---|---|
| Concurrent access | No locking | Race condition on count | SELECT FOR UPDATE |
| Multi-resource acquire | Lock in arbitrary order | Deadlock | Sort names → lock ordering |
| Network retry | No dedup | Double-acquire | external_id UNIQUE constraint |
| Partial failure | Separate statements | Some permits granted, some not | Single transaction |
| Client crash | Nothing | Permit leaked forever | TTL + owner liveness + staleness sweep |
| Timing-based expiry | TTL as lock mechanism | Two holders after expiry | Fencing token on downstream resource |
| DB contention | All requests hit DB | Connection pool exhaustion | In-memory pre-lock |
| DB transient failure | No retry | Operation lost | Retry + idempotency makes it safe |
Design Patterns
| Pattern | Purpose |
|---|---|
Idempotency key (external_id) |
Safe retries, no double-acquire |
Fencing token (permit.id) |
Prevent stale writes to downstream resources after TTL expiry |
| Sorted lock ordering | Deadlock prevention |
SELECT FOR UPDATE |
Mutual exclusion across distributed clients |
| Single transaction for acquire | All-or-nothing multi-semaphore acquire |
| One-way state machine | Idempotent release without locking |
| Passive TTL sweep | Leaked permit recovery without per-permit timers |
| In-memory pre-lock | Reduced DB contention under high load |
| Transaction retry + idempotency | Tolerance for transient DB failures |
| Batched deletes | Maintenance without blocking production traffic |
When to Use
Good fit:
- Counting semaphore (not just binary mutex)
- Multi-process/multi-machine coordination
- Already have MySQL/PostgreSQL — no new infrastructure
- Need atomic multi-semaphore acquire
- Need durability (permits survive restarts)
- Can tolerate passive TTL (minutes, not milliseconds)
Not a good fit:
- Sub-second lock expiry → Redis
SET NX EX - Leader election → ZooKeeper/etcd
- Microsecond latency on lock path
- No existing relational DB
Core Primitives
All correctness comes from five things:
SELECT FOR UPDATE— exclusive row locks for mutual exclusionUNIQUEconstraint — duplicate detection for idempotency- Transactions — atomic multi-table writes for all-or-nothing
- Sorted lock acquisition — deadlock prevention
- One-way state machine — safe concurrent releases without locking
References & Related Work
Distributed Locking Theory
- How to do distributed locking — Martin Kleppmann — Canonical critique of distributed locking algorithms. Key insight: timing-based locks (TTL expiry) cannot guarantee correctness. Introduces fencing tokens — monotonically increasing IDs that the downstream storage uses to reject stale writes. Note: our
external_idis an idempotency key (prevents double-grants on retries), not a fencing token. To implement fencing, you’d pass the auto-incrementingpermit.idto the downstream resource, which must reject writes with an ID lower than the highest it has seen. - Distributed Locking: A Practical Guide — Architecture Weekly — Practical comparison of distributed locking approaches: Redis
SET NX EX, ZooKeeper ephemeral nodes, DB-basedSELECT FOR UPDATE, PostgreSQL advisory locks, and Kubernetes single-instance pattern. Good overview of trade-offs between the approaches discussed here.
SQL-Based Locking Primitives
- MySQL InnoDB Locking Reads (
SELECT FOR UPDATE) — Official docs on the core primitive this design uses. Covers exclusive vs shared locks and interaction with transaction isolation levels. - PostgreSQL Advisory Locks — PostgreSQL’s alternative: lightweight application-defined locks that don’t require a dedicated table. Simpler but less durable (session-scoped).
Libraries & Tools
- ShedLock — Java library for distributed task locking using a DB table. Same pattern (lock table + TTL-based expiry) but limited to binary mutex for scheduled tasks. Good reference for a minimal implementation.
Alternative Approaches
- Redis Distributed Locks (Redlock) — Redis’s approach:
SET NX EXwith majority quorum across N instances. Better for sub-second TTLs. Limitations: depends on clock synchronization, no transactional multi-resource acquire, no consensus algorithm. Kleppmann argues it’s unsafe for correctness-critical locks. - ZooKeeper Recipes: Locks — Uses sequential ephemeral znodes. Elegant: ephemeral nodes auto-delete on session death (instant crash recovery, unlike our passive cleanup). Sequential ordering avoids herd effect (each waiter watches only the next-lower node). Downside: requires dedicated ZK infrastructure.