Chapter · Feb 2025 — Present

Data Engineer
Zepto

Bangalore, India

At Zepto, the clock is the antagonist. Ten-minute delivery only works if every shelf, in every one of eleven hundred dark stores, knows the truth about itself in the same heartbeat. I came in to build that heartbeat — a planning source-of-truth and a real-time inventory spine that the entire merchandising and fulfilment org could lean on.

The streaming spine is a long, deliberate sentence: Debezium reads Postgres write-ahead logs the moment a SKU moves, Kafka carries the event, PyFlink keeps state in flight and reconciles it, ClickHouse fans it out for sub-second lookups. The thirty-minute batch we replaced wasn't slow because the cluster was small — it was slow because the architecture asked the wrong question. Now the question is asked continuously, and the answer is always less than a minute old.
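The first hop of that sentence is small enough to sketch. What follows is a minimal PyFlink outline, not the production job: the topic, schema, and connection strings are illustrative, and the real pipeline carries far more in-flight state and reconciliation logic. The ClickHouse leg is shown through a generic JDBC sink for brevity; a dedicated ClickHouse connector is another route.

```python
# A hedged sketch of the CDC hop: Debezium events on Kafka in,
# latest per-SKU state out. Names and endpoints are illustrative.
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

# Debezium publishes Postgres WAL changes to Kafka as debezium-json,
# which Flink reads as a changelog stream rather than plain inserts.
t_env.execute_sql("""
    CREATE TABLE sku_changes (
        sku_id STRING,
        store_id STRING,
        quantity INT,
        updated_at TIMESTAMP(3)
    ) WITH (
        'connector' = 'kafka',
        'topic' = 'inventory.public.stock',
        'properties.bootstrap.servers' = 'kafka:9092',
        'format' = 'debezium-json',
        'scan.startup.mode' = 'latest-offset'
    )
""")

# Upsert sink: the primary key makes each (sku, store) row converge
# on its latest state, which is what the lookup layer needs.
t_env.execute_sql("""
    CREATE TABLE inventory_truth (
        sku_id STRING,
        store_id STRING,
        quantity INT,
        updated_at TIMESTAMP(3),
        PRIMARY KEY (sku_id, store_id) NOT ENFORCED
    ) WITH (
        'connector' = 'jdbc',
        'url' = 'jdbc:clickhouse://clickhouse:8123/default',
        'table-name' = 'inventory_truth'
    )
""")

# The changelog flows through continuously; no batch window to wait on.
t_env.execute_sql("""
    INSERT INTO inventory_truth
    SELECT sku_id, store_id, quantity, updated_at
    FROM sku_changes
""")
```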

The planning layer sits on PySpark, Delta Lake on S3, and Airflow, but the interesting work was less about frameworks and more about cost discipline. By rethinking partitioning, broadcast joins, and predicate pushdown on the Transfer Order pipeline, we picked up a 4.5× speedup, trimmed thirty to thirty-five percent off infra spend, and pulled mean-time-to-recovery from ninety minutes down to thirty.
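The shape of those moves, sketched in PySpark with hypothetical table paths, column names, and dates standing in for the real Transfer Order schema:

```python
# A sketch of the three tuning levers: partition pruning via predicate
# pushdown, a broadcast join for the small dimension, and date-partitioned
# writes. Paths and columns are illustrative, not the production schema.
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast, col

spark = SparkSession.builder.appName("transfer_orders").getOrCreate()

orders = spark.read.format("delta").load("s3://lake/transfer_orders")

# Predicate pushdown: this filter travels into the Delta scan, so only
# the matching order_date partitions are ever read from S3.
recent = orders.where(col("order_date") >= "2025-02-01")

# The stores dimension is small; broadcasting it to every executor
# avoids shuffling the large fact table across the cluster.
stores = spark.read.format("delta").load("s3://lake/dim_stores")
enriched = recent.join(broadcast(stores), "store_id")

# Partitioned write, replacing only the dates recomputed in this run.
(enriched.write.format("delta")
    .mode("overwrite")
    .partitionBy("order_date")
    .option("replaceWhere", "order_date >= '2025-02-01'")
    .save("s3://lake/transfer_orders_enriched"))
```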

The other half of the story is governance. Two hundred jobs, dozens of engineers, no appetite for snowflake clusters. I built an IaC framework around GitHub Actions, Delta-table audits, and Databricks RBAC so that onboarding a new pipeline takes two hours instead of two days. The anomaly detector — a small Z-score service over Databricks telemetry, backed by Redis for sub-millisecond thresholds and MongoDB for alert history — pre-empts about half of what would otherwise become incidents.
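The detector's core check is small enough to sketch. The metric names, Redis keys, and Mongo collection below are hypothetical stand-ins for the Databricks telemetry the real service watches:

```python
# A minimal sketch of the Z-score check, assuming rolling stats and
# thresholds are kept warm in Redis and alerts land in MongoDB.
import redis
from pymongo import MongoClient

r = redis.Redis(host="redis", port=6379, decode_responses=True)
alerts = MongoClient("mongodb://mongo:27017")["ops"]["alert_history"]

def check_metric(job_id: str, value: float) -> bool:
    """Flag a job metric whose Z-score crosses its stored threshold."""
    # Rolling mean/std per job live in Redis for sub-millisecond reads.
    mean = float(r.get(f"stats:{job_id}:mean") or 0.0)
    std = float(r.get(f"stats:{job_id}:std") or 1.0)
    threshold = float(r.get(f"stats:{job_id}:z_threshold") or 3.0)

    z = (value - mean) / std if std else 0.0
    if abs(z) > threshold:
        # Persist the alert so on-call can review history and tune thresholds.
        alerts.insert_one({"job_id": job_id, "value": value, "z": z})
        return True
    return False
```

The Redis-first read path is the point: the check sits on the hot path of job telemetry, so the thresholds have to be cheaper to fetch than the metrics are to produce.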