In the bustling data center of the e-commerce platform, there lived a tired but loyal piece of infrastructure: a PostgreSQL database named KQR (Key-Query-Resolver).
def get(key): if key in cache: return cache[key] else: // Only one thread goes to DB; others wait for its result return cache.load_or_wait(key) Within 30 seconds, the contention ratio dropped from 1.00 to 0.001. kqr row cache contention check gets
def get(key): if key in cache: return cache[key] else: value = db.query("SELECT * FROM items WHERE id = ?", key) // slow cache[key] = value return value Because the cache was empty, all 10,000 threads saw a at the exact same moment. They all rushed to the database. In the bustling data center of the e-commerce
, the on-call engineer, saw the alert: kqr row cache contention check gets = CRITICAL She’d seen this before. It wasn’t a database problem — it was a thundering herd problem. They all rushed to the database
At 9:00:00 AM, a surge of traffic hit. Every user, in every time zone, suddenly demanded the same piece of data: the flash sale metadata for item ID #42.
KQR’s cache logic looked like this (pseudocode):
She hot-patched KQR’s logic to use :