Ashish Kumar portrait
Ashish Kumar

Hello, I'm Ashish Kumar.

I help businesses stop guessing. I clean their messy data, connect it across systems, and turn it into the dashboards, alerts and AI that their teams actually trust.

4+ years building petabyte-scale data · 50% fewer production incidents · 35% infra cost cut · 30 min → <1 min real-time inventory · 1100+ dark stores on one source of truth · 200+ Spark jobs governed by IaC · 2M+ records/day TV measurement ETL · 12B records / 15TB PySpark on EMR · Kafka · Flink · Debezium · ClickHouse · Databricks

What I actually do

I turn messy data into business value.

Every business sits on a pile of scattered, half-trusted information. My job is to clean it, connect it, and turn it into the fuel that decisions, savings, products and AI run on.

● The reality of any business: every source arrives unclean

  • Sales & POS: stores, online, retail
  • Apps & web: clicks, sessions, carts
  • CRM & support: customers, tickets
  • Ads & marketing: spend, campaigns
  • Spreadsheets & ops: the messy reality

I clean it. Connect it. Make it trustworthy.

pipelines · quality · governance

● What the business gets

  • Faster decisions: minutes, not days
  • Lower costs: cloud + people time
  • New revenue: unlocked use-cases
  • AI-ready data: trustworthy fuel for ML

About

A note about me.

Ashish Kumar portrait

Senior Data Engineer with 4+ years building petabyte-scale ETL/ELT pipelines, real-time streaming, and distributed warehousing at Zepto, Nielsen, and Tredence. Core stack: PySpark, Kafka, Flink, Databricks, AWS & Azure. Reduced incidents by 50%, cut infra cost 30–35%, and accelerated delivery 30–40%.

  • 4+
    years building data
  • 50%
    fewer incidents
  • 35%
    infra cost cut
  • curiosity

Milestones

The ladder so far.

Hover a stone for a one-line read. Click to step into the full story.

  1. Zepto logo

    Feb 2025 — Present

    Data Engineer

    Zepto · Bangalore, India

    Read story
Pioneered a big-data SOT for the Planning platform across 1100+ dark stores. Architected real-time Inventory Consolidation: Debezium → Kafka → PyFlink → ClickHouse. Built Anomaly Detection over Databricks telemetry with Redis + MongoDB. Delivered an IaC framework governing 200+ jobs. Optimized Spark Transfer Order pipelines for a 4.5× speedup.
  2. Nielsen logo

    Sep 2024 — Jan 2025

    Data Engineer

    Nielsen · Bangalore, India

    Read story
Built big-data ETL on Databricks for TV audience measurement, processing 2M+ records/day via Kafka + Airflow on AWS. Resolved a processing bottleneck with broadcast joins, key salting, and multi-threading. Accelerated ETL 30% through predicate pushdown and cluster rightsizing.
  3. Tredence Inc. logo

    Jun 2022 — Aug 2024

    Data Engineer

    Tredence Inc. · Bangalore, India

    Read story
Built an XML metadata parser that contributed $200K in revenue. Migrated DataStage workloads to ADF, ADLS Gen2, and Databricks. Tuned PySpark on EMR (15TB / 12B records). Integrated 6 multi-source systems across AWS, Azure, and GCP.
  4. NIT Jalandhar logo

    2018 — 2022

    B.Tech, Chemical Engineering

    NIT Jalandhar · Jalandhar, India

    Read story
    B.Tech in Chemical Engineering — graduated 2022.
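The key-salting fix called out in the Nielsen milestone can be shown in miniature. This is a hypothetical plain-Python sketch of the idea, not the production Spark job: a random salt suffix spreads one hot join key across many sub-keys, while the small side of the join replicates its key once per salt so every salted row still finds a match.

```python
import random
from collections import Counter

random.seed(42)  # deterministic for the demo
SALT_BUCKETS = 8

def salted_key(key: str, salt_buckets: int = SALT_BUCKETS) -> str:
    """Append a random salt suffix to the key on the large (skewed) side."""
    return f"{key}#{random.randrange(salt_buckets)}"

def explode_small_side(key: str, salt_buckets: int = SALT_BUCKETS) -> list[str]:
    """Replicate the small side's key once per salt so every salted row matches."""
    return [f"{key}#{i}" for i in range(salt_buckets)]

# A skewed workload: 10,000 rows all share one hot key.
rows = ["hot_key"] * 10_000
buckets = Counter(salted_key(k) for k in rows)

# The hot key now spans SALT_BUCKETS partitions instead of landing on one,
# and the exploded small side covers every salted variant.
assert len(buckets) == SALT_BUCKETS
assert set(explode_small_side("hot_key")) == set(buckets)
```

In Spark the same trick is a `concat(key, '#', floor(rand() * N))` on the skewed side and a cross join against a salt range on the other.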

Stack

The toolkit.

A curated set across cloud, streaming, batch and orchestration.

AWS
Azure
GCP

Selected Work

Things I've built.

Platforms and pipelines designed for throughput, cost, and operability at real production scale.

30 min → <1 min

Real-time Inventory Consolidation

Debezium CDC from PostgreSQL → Kafka → PyFlink stateful streaming → ClickHouse OLAP. Sub-minute SKU inventory across 1100+ Zepto dark stores.

#kafka #pyflink #debezium #clickhouse
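The stateful step at the heart of a CDC pipeline like this can be sketched in a few lines. The event shape and field names below are illustrative only, not Zepto's actual schema: Debezium-style change events (create/update/delete) fold into a live per-store, per-SKU view.

```python
# Toy sketch: fold Debezium-style change events into a current inventory view.
# Field names ("op", "after", "qty") mimic Debezium's envelope but are illustrative.

def apply_cdc_event(state: dict, event: dict) -> dict:
    """Apply one change event to the in-memory inventory state."""
    key = (event["store_id"], event["sku"])
    if event["op"] == "d":          # delete: the upstream row was removed
        state.pop(key, None)
    else:                           # "c" (create) or "u" (update)
        state[key] = event["after"]["qty"]
    return state

events = [
    {"op": "c", "store_id": "BLR-001", "sku": "MILK-1L", "after": {"qty": 40}},
    {"op": "u", "store_id": "BLR-001", "sku": "MILK-1L", "after": {"qty": 37}},
    {"op": "d", "store_id": "BLR-001", "sku": "MILK-1L"},
    {"op": "c", "store_id": "BLR-002", "sku": "MILK-1L", "after": {"qty": 12}},
]

inventory: dict = {}
for e in events:
    inventory = apply_cdc_event(inventory, e)

# Only the BLR-002 row survives: BLR-001's row was deleted after two updates.
assert inventory == {("BLR-002", "MILK-1L"): 12}
```

In production this fold runs as keyed state inside PyFlink, with ClickHouse serving the consolidated view.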
1100+ dark stores

Planning Platform SOT (Petabyte-scale)

PySpark + Spark SQL + Delta Lake on S3, Airflow DAGs — single source of truth for FMCG, SuperStore, Cafe & Milk planning.

#pyspark #delta-lake #airflow #s3
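The "single source of truth" idea boils down to latest-record-wins deduplication across sources. Here is a pure-Python analogue of a Delta Lake MERGE keyed on (store, sku); field names are made up for illustration.

```python
# Collapse multi-source planning records to one row per (store, sku),
# keeping the newest timestamp — the in-memory analogue of a Delta MERGE.

def latest_wins(records: list[dict]) -> dict:
    """Return one record per (store, sku), newest timestamp winning."""
    sot: dict = {}
    for r in records:
        key = (r["store"], r["sku"])
        if key not in sot or r["ts"] > sot[key]["ts"]:
            sot[key] = r
    return sot

records = [
    {"store": "S1", "sku": "A", "ts": 1, "plan_qty": 100},
    {"store": "S1", "sku": "A", "ts": 3, "plan_qty": 120},  # newest, wins
    {"store": "S1", "sku": "A", "ts": 2, "plan_qty": 110},  # stale, dropped
    {"store": "S2", "sku": "A", "ts": 1, "plan_qty": 50},
]

sot = latest_wins(records)
assert sot[("S1", "A")]["plan_qty"] == 120
assert len(sot) == 2
```

At petabyte scale the same rule is expressed as `MERGE INTO` on a Delta table, with Airflow sequencing the upstream loads.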
10+ eng-hrs / week

IaC Governance Framework

Diff engine + Delta audit + Databricks RBAC governing 200+ jobs. Multi-stage Docker builds for PyFlink/Debezium cut onboarding to 2 hrs.

#github-actions #docker #databricks
50% incidents prevented

Anomaly Detection Service

OOP Python service over Databricks telemetry with Redis sub-ms threshold lookup and MongoDB alert history.

#python #redis #mongodb #databricks
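The core check in a service like this is a fast threshold lookup per incoming reading. In this toy version a plain dict stands in for Redis, and the metric names and limits are invented for illustration.

```python
# Toy anomaly check: compare telemetry readings against per-metric thresholds
# held in a fast key-value store (a dict stands in for Redis here).

thresholds = {
    "job.runtime_sec": 3600,
    "job.shuffle_gb": 500,
}

def detect(metric: str, value: float, limits: dict = thresholds) -> bool:
    """True if the reading breaches its threshold — an anomaly worth alerting on."""
    limit = limits.get(metric)
    return limit is not None and value > limit

alerts = [
    (m, v) for m, v in [
        ("job.runtime_sec", 5400),   # breach
        ("job.shuffle_gb", 120),     # fine
        ("job.runtime_sec", 1800),   # fine
    ] if detect(m, v)
]

assert alerts == [("job.runtime_sec", 5400)]
```

In the real service the lookup is a sub-millisecond Redis GET, and each fired alert is appended to MongoDB for history and dedup.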

Contact

Let's build something.

Open to senior data engineering roles at top product companies — building real-time data platforms, lakehouses, and streaming systems at scale.

asisnitj@gmail.com