Tredence is where I learned that consulting data engineering is half archaeology, half cartography. Every engagement begins with a legacy estate that grew faster than its documentation, and ends with a map clear enough that the next team can navigate it alone.
The first artefact was an XML metadata parser, written in Python with lxml and a stubborn commitment to clean OOP: small in lines, large in consequence. It cut client onboarding time by a quarter and quietly contributed two hundred thousand dollars in revenue across the year.
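The client's schema stays with the client, but the shape of the thing is easy to sketch. A minimal, hypothetical version, with invented `<table>`/`<column>` elements and class names standing in for the real metadata export:

```python
# Minimal sketch of the parser's shape. The XML layout (<table>/<column>
# elements and their attributes) is hypothetical, not the client schema.
from dataclasses import dataclass, field
from lxml import etree


@dataclass
class ColumnMeta:
    name: str
    dtype: str
    nullable: bool = True


@dataclass
class TableMeta:
    name: str
    columns: list = field(default_factory=list)


class MetadataParser:
    """Walks an XML metadata export and yields typed table definitions."""

    def parse(self, path: str) -> list:
        tree = etree.parse(path)
        tables = []
        for t in tree.iter("table"):
            table = TableMeta(name=t.get("name", ""))
            for c in t.iter("column"):
                table.columns.append(
                    ColumnMeta(
                        name=c.get("name", ""),
                        dtype=c.get("type", "string"),
                        nullable=c.get("nullable", "true") == "true",
                    )
                )
            tables.append(table)
        return tables
```

The point of the dataclasses is the onboarding win: once the metadata is typed objects rather than raw XML, everything downstream stops caring which tool exported it.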
The headline project was a DataStage to Azure migration: ADF as the orchestrator, ADLS Gen2 as the lake, Databricks as the compute, all arranged in a medallion architecture that finally let bronze be bronze and gold be gold. Processing time fell by twenty percent, and — more importantly — the lineage became something you could explain to a stakeholder in one breath.
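To make the medallion idea concrete, here is a hypothetical bronze-to-silver hop of the kind that pipeline ran. The paths, column names, and Delta format are assumptions for illustration, not the client's actual layout; in the real estate, ADF did the triggering:

```python
# Sketch of one bronze-to-silver hop in the medallion layout.
# Container/account names and columns are placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("bronze_to_silver").getOrCreate()

BRONZE = "abfss://bronze@lakeaccount.dfs.core.windows.net/orders/"
SILVER = "abfss://silver@lakeaccount.dfs.core.windows.net/orders/"

# Bronze holds raw landings as-is; silver gets typed, deduplicated rows.
raw = spark.read.format("delta").load(BRONZE)

cleaned = (
    raw.dropDuplicates(["order_id"])
       .withColumn("order_ts", F.to_timestamp("order_ts"))
       .filter(F.col("order_id").isNotNull())
)

cleaned.write.format("delta").mode("overwrite").save(SILVER)
```

Letting bronze be bronze means the raw landing is never "fixed" in place; every correction lives in a hop like this one, which is exactly what makes the lineage explainable in one breath.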
Underneath all of it ran the PySpark work on EMR, fifteen terabytes and twelve billion records at a time. Star-schema models in Hive and Snowflake, six source systems wired across AWS, Azure, and GCP, and a release cycle that got shorter by an entire sprint after we cleared thirty-odd defects out of SIT and UAT. The lesson I took with me: tuning a query is technical, but tuning a delivery cadence is engineering.
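For flavour, a sketch of what one star-schema shaping step looks like in PySpark, with invented table and column names; the broadcast joins stand in for the kind of tuning that keeps a twelve-billion-row fact side from shuffling:

```python
# Sketch of a star-schema load: join a fact stream to conformed
# dimensions, keeping only surrogate keys and measures. Table and
# column names are illustrative, not the client's model.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("star_schema_load").getOrCreate()

transactions = spark.table("staging.transactions")
dim_customer = spark.table("warehouse.dim_customer")
dim_product = spark.table("warehouse.dim_product")

# Broadcast the small dimensions so the large fact side avoids a shuffle.
fact_sales = (
    transactions
    .join(F.broadcast(dim_customer), "customer_id")
    .join(F.broadcast(dim_product), "product_id")
    .select(
        "customer_sk",
        "product_sk",
        F.col("amount").cast("decimal(18,2)").alias("sale_amount"),
        F.to_date("txn_ts").alias("sale_date"),
    )
)

fact_sales.write.mode("append").partitionBy("sale_date") \
    .saveAsTable("warehouse.fact_sales")
```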