Ashish Kumar portrait
Ashish Kumar

Hello, I'm Ashish Kumar.

I help businesses stop guessing. I clean their messy data, connect it across systems, and turn it into the dashboards, alerts and AI that their teams actually trust.

4+ years building petabyte-scale data · 50% fewer production incidents · 35% infra cost cut · 30 min → <1 min real-time inventory · 1100+ dark stores on one source of truth · 200+ Spark jobs governed by IaC · 2M+ records/day TV measurement ETL · 12B records / 15TB PySpark on EMR · Kafka · Flink · Debezium · ClickHouse · Databricks

What I actually do

I turn messy data into business value.

Every business sits on a pile of scattered, half-trusted information. My job is to clean it, connect it, and turn it into the fuel that decisions, savings, products and AI run on.

● The reality of any business: every source arrives unclean

  • Sales & POS: stores, online, retail
  • Apps & web: clicks, sessions, carts
  • CRM & support: customers, tickets
  • Ads & marketing: spend, campaigns
  • Spreadsheets & ops: the messy reality

I clean it. Connect it. Make it trustworthy.

pipelines · quality · governance

● What the business gets

  • Faster decisions: minutes, not days
  • Lower costs: cloud + people time
  • New revenue: unlocked use-cases
  • AI-ready data: trustworthy fuel for ML

About

A note about me.

Ashish Kumar portrait

Senior Data Engineer with 4+ years building petabyte-scale ETL/ELT pipelines, real-time streaming, and distributed warehousing at Zepto, Nielsen, and Tredence. Core stack: PySpark, Kafka, Flink, Databricks, AWS & Azure. Reduced incidents by 50%, cut infra cost 30–35%, and accelerated delivery 30–40%.

  • 4+
    years building data
  • 50%
    fewer incidents
  • 35%
    infra cost cut
  • curiosity

Milestones

The ladder so far.

Hover a stone for a one-line read. Click to step into the full story.

  1. Zepto logo

    Feb 2025 — Present

    Data Engineer

    Zepto · Bangalore, India

    Read story
Pioneered a big-data SOT for the Planning platform across 1100+ dark stores. Architected real-time Inventory Consolidation: Debezium → Kafka → PyFlink → ClickHouse. Built Anomaly Detection over Databricks telemetry with Redis + MongoDB. Delivered an IaC framework governing 200+ jobs. Optimized Spark Transfer Order pipelines for a 4.5× speedup.
  2. Nielsen logo

    Sep 2024 — Jan 2025

    Data Engineer

    Nielsen · Bangalore, India

    Read story
Built big-data ETL on Databricks for TV audience measurement, processing 2M+ records/day via Kafka + Airflow on AWS. Resolved a processing bottleneck with broadcast joins, key salting, and multi-threading. Accelerated ETL 30% through predicate pushdown and cluster rightsizing.
  3. Tredence Inc. logo

    Jun 2022 — Aug 2024

    Data Engineer

    Tredence Inc. · Bangalore, India

    Read story
Built an XML metadata parser that contributed $200K in revenue. Migrated DataStage workloads to ADF, ADLS Gen2, and Databricks. Tuned PySpark on EMR (15TB / 12B records). Integrated 6 multi-source systems across AWS, Azure, and GCP.
  4. NIT Jalandhar logo

    2018 — 2022

    B.Tech, Chemical Engineering

    NIT Jalandhar · Jalandhar, India

    Read story
    B.Tech in Chemical Engineering — graduated 2022.
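The key-salting fix called out in the Nielsen milestone can be shown in miniature. This is a hypothetical plain-Python sketch of the idea, not the production Spark job: a random salt suffix spreads one hot join key across many sub-keys, while the small side of the join replicates its key once per salt so every salted row still finds a match.

```python
import random
from collections import Counter

random.seed(42)  # deterministic for the demo
SALT_BUCKETS = 8

def salted_key(key: str, salt_buckets: int = SALT_BUCKETS) -> str:
    """Append a random salt suffix to the key on the large (skewed) side."""
    return f"{key}#{random.randrange(salt_buckets)}"

def explode_small_side(key: str, salt_buckets: int = SALT_BUCKETS) -> list[str]:
    """Replicate the small side's key once per salt so every salted row matches."""
    return [f"{key}#{i}" for i in range(salt_buckets)]

# A skewed workload: 10,000 rows all share one hot key.
rows = ["hot_key"] * 10_000
buckets = Counter(salted_key(k) for k in rows)

# The hot key now spans SALT_BUCKETS partitions instead of landing on one,
# and the exploded small side covers every salted variant.
assert len(buckets) == SALT_BUCKETS
assert set(explode_small_side("hot_key")) == set(buckets)
```

In Spark the same trick is a `concat(key, '#', floor(rand() * N))` on the skewed side and a cross join against a salt range on the other.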

Stack

The toolkit.

A curated set across cloud, streaming, batch and orchestration.

AWS
Azure
GCP

Selected Work

Things I've built.

Platforms and pipelines designed for throughput, cost, and operability at real production scale.

30 min → <1 min

Real-time Inventory Consolidation

Debezium CDC from PostgreSQL → Kafka → PyFlink stateful streaming → ClickHouse OLAP. Sub-minute SKU inventory across 1100+ Zepto dark stores.

#kafka #pyflink #debezium #clickhouse
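The stateful step at the heart of a CDC pipeline like this can be sketched in a few lines. The event shape and field names below are illustrative only, not Zepto's actual schema: Debezium-style change events (create/update/delete) fold into a live per-store, per-SKU view.

```python
# Toy sketch: fold Debezium-style change events into a current inventory view.
# Field names ("op", "after", "qty") mimic Debezium's envelope but are illustrative.

def apply_cdc_event(state: dict, event: dict) -> dict:
    """Apply one change event to the in-memory inventory state."""
    key = (event["store_id"], event["sku"])
    if event["op"] == "d":          # delete: the upstream row was removed
        state.pop(key, None)
    else:                           # "c" (create) or "u" (update)
        state[key] = event["after"]["qty"]
    return state

events = [
    {"op": "c", "store_id": "BLR-001", "sku": "MILK-1L", "after": {"qty": 40}},
    {"op": "u", "store_id": "BLR-001", "sku": "MILK-1L", "after": {"qty": 37}},
    {"op": "d", "store_id": "BLR-001", "sku": "MILK-1L"},
    {"op": "c", "store_id": "BLR-002", "sku": "MILK-1L", "after": {"qty": 12}},
]

inventory: dict = {}
for e in events:
    inventory = apply_cdc_event(inventory, e)

# Only the BLR-002 row survives: BLR-001's row was deleted after two updates.
assert inventory == {("BLR-002", "MILK-1L"): 12}
```

In production this fold runs as keyed state inside PyFlink, with ClickHouse serving the consolidated view.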
1100+ dark stores

Planning Platform SOT (Petabyte-scale)

PySpark + Spark SQL + Delta Lake on S3, Airflow DAGs — single source of truth for FMCG, SuperStore, Cafe & Milk planning.

#pyspark #delta-lake #airflow #s3
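The "single source of truth" idea boils down to latest-record-wins deduplication across sources. Here is a pure-Python analogue of a Delta Lake MERGE keyed on (store, sku); field names are made up for illustration.

```python
# Collapse multi-source planning records to one row per (store, sku),
# keeping the newest timestamp — the in-memory analogue of a Delta MERGE.

def latest_wins(records: list[dict]) -> dict:
    """Return one record per (store, sku), newest timestamp winning."""
    sot: dict = {}
    for r in records:
        key = (r["store"], r["sku"])
        if key not in sot or r["ts"] > sot[key]["ts"]:
            sot[key] = r
    return sot

records = [
    {"store": "S1", "sku": "A", "ts": 1, "plan_qty": 100},
    {"store": "S1", "sku": "A", "ts": 3, "plan_qty": 120},  # newest, wins
    {"store": "S1", "sku": "A", "ts": 2, "plan_qty": 110},  # stale, dropped
    {"store": "S2", "sku": "A", "ts": 1, "plan_qty": 50},
]

sot = latest_wins(records)
assert sot[("S1", "A")]["plan_qty"] == 120
assert len(sot) == 2
```

At petabyte scale the same rule is expressed as `MERGE INTO` on a Delta table, with Airflow sequencing the upstream loads.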
10+ eng-hrs / week

IaC Governance Framework

Diff engine + Delta audit + Databricks RBAC governing 200+ jobs. Multi-stage Docker builds for PyFlink/Debezium cut onboarding to 2 hrs.

#github-actions #docker #databricks
50% incidents prevented

Anomaly Detection Service

OOP Python service over Databricks telemetry with Redis sub-ms threshold lookup and MongoDB alert history.

#python #redis #mongodb #databricks
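The core check in a service like this is a fast threshold lookup per incoming reading. In this toy version a plain dict stands in for Redis, and the metric names and limits are invented for illustration.

```python
# Toy anomaly check: compare telemetry readings against per-metric thresholds
# held in a fast key-value store (a dict stands in for Redis here).

thresholds = {
    "job.runtime_sec": 3600,
    "job.shuffle_gb": 500,
}

def detect(metric: str, value: float, limits: dict = thresholds) -> bool:
    """True if the reading breaches its threshold — an anomaly worth alerting on."""
    limit = limits.get(metric)
    return limit is not None and value > limit

alerts = [
    (m, v) for m, v in [
        ("job.runtime_sec", 5400),   # breach
        ("job.shuffle_gb", 120),     # fine
        ("job.runtime_sec", 1800),   # fine
    ] if detect(m, v)
]

assert alerts == [("job.runtime_sec", 5400)]
```

In the real service the lookup is a sub-millisecond Redis GET, and each fired alert is appended to MongoDB for history and dedup.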

Contact

Let's build something.

Open to senior data engineering roles at top product companies — building real-time data platforms, lakehouses, and streaming systems at scale.

asisnitj@gmail.com