Welcome to the real world of data engineering

Data Victims

We've all been victims of

This is where we fight back.

terminal
$ spark-submit --master yarn pipeline_main.py
[ERROR] Pipeline FAILED at 02:47 AM
[FATAL] CEO dashboard showing yesterday's data.
01 The Curriculum

What You'll Learn

Real Pipelines

End-to-end builds from raw sources to insights — orchestrated, idempotent, fail-fast, with actual production code you can use at work tomorrow.

$ run pipeline --env prod

✓ bronze  1.9M rows landed

✓ silver  dedup + SCD2

✓ gold    star schema OK

✓ 0 rows dropped silently

Real Architecture

Medallion lakehouses, layered zones, scheduled rebuilds — the decisions companies actually make, and why.

Real Problems

Schema drift, duplicate customers, failed payments, midnight failures. The ugly parts nobody talks about.

Real Modeling

Star schemas with hash-keyed dimensions, SCD Type 2 history, PII masking — modeled the way analysts need it.

Real Dashboards

Fully designed BI reports on every build — measures, semantic models, and glossaries for true self-serve.

Interview Prep

Technical, behavioral, and coding rounds — plus real interview problems from data engineering loops, solved and explained step by step.

Real Costs

Cost checkpoints at every stage and honest trade-off discussions, with every build sized for a low-cost or student budget.

monthly cloud spend: $11.42 / $30.00

full lakehouse · student subscription · headroom to spare

02 The Projects

Not Tutorials. Production Builds.

Every Data Victims project is a complete, production-style system: a realistic business generating millions of rows of deliberately messy data, walked end to end through a real cloud lakehouse — from raw chaos to dashboards executives actually trust. Every decision is made the way a real team would make it. Nothing skipped.

pipeline/medallion_build · live · HEALTHY

Messy Sources

websites · marketplaces · APIs

Bronze

raw, landed as-is

Silver

cleansed · conformed · historized

Gold

star schema, rebuilt on schedule

Insights

dashboards people actually use

Deliberately messy data

Duplicate customers, returns, failed payments, inconsistent feeds — millions of rows of realistic chaos, because real data is never clean.
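To make "duplicate customers" concrete, here is a minimal dedup sketch in plain Python. It assumes a hypothetical matching rule (one customer per normalized email, keeping the most recently updated record); real builds usually match on more than one key.

```python
def dedupe_customers(rows: list[dict]) -> list[dict]:
    """Keep one record per customer, chosen by a hypothetical matching rule."""
    latest: dict[str, dict] = {}
    for row in rows:
        # Normalize the match key so " Ana@X.com " and "ana@x.com" collide.
        key = row["email"].strip().lower()
        # ISO-8601 date strings compare correctly as plain strings.
        if key not in latest or row["updated_at"] > latest[key]["updated_at"]:
            latest[key] = row
    return list(latest.values())
```

The same "normalize, then keep the latest per key" idea carries over to a DataFrame window or a warehouse `QUALIFY` clause.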

Idempotent, fail-fast orchestration

Pipelines you can safely rerun at 3 AM, and that fail loudly instead of silently corrupting downstream tables.
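Here is a minimal plain-Python sketch of what "rerun safely" and "fail loudly" can look like: the batch is validated before anything is written, and the target partition is overwritten rather than appended to. The dict-based table, the column names, and `DataQualityError` are hypothetical stand-ins for a real warehouse table and framework.

```python
from datetime import date

REQUIRED = {"order_id", "amount"}  # hypothetical required columns


class DataQualityError(Exception):
    """Raised to stop the run loudly instead of silently dropping rows."""


def validate(rows: list[dict]) -> None:
    bad = [r for r in rows if not REQUIRED.issubset(r) or r["amount"] is None]
    if bad:
        raise DataQualityError(f"{len(bad)} invalid row(s); refusing to load the batch")


def load_partition(table: dict, run_date: date, rows: list[dict]) -> None:
    validate(rows)  # fail fast: nothing is written unless the whole batch passes
    # Idempotent: replace the run_date partition wholesale instead of appending,
    # so rerunning the same batch leaves the table exactly as it was.
    table[run_date] = list(rows)
```

Loading the same day twice leaves one copy of the data, and a bad batch raises before touching the table.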

History, privacy & identity done right

SCD Type 2 history, PII masking, and cross-channel identity stitching — the unglamorous work that makes data trustworthy.
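A minimal SCD Type 2 sketch over an in-memory dimension, assuming a hypothetical customer schema in which a change to `email` or `city` creates a new version. Instead of overwriting a changed row, the current version is closed out (`valid_to`, `is_current`) and a new one is appended.

```python
from datetime import date

TRACKED = ("email", "city")  # hypothetical attributes whose changes are versioned


def apply_scd2(dim: list[dict], incoming: dict, as_of: date) -> None:
    """Type 2 upsert of one customer record into an in-memory dimension (a sketch)."""
    current = next(
        (r for r in dim if r["customer_id"] == incoming["customer_id"] and r["is_current"]),
        None,
    )
    if current is not None and all(current[c] == incoming[c] for c in TRACKED):
        return  # no tracked attribute changed, so no new version
    if current is not None:
        # Expire the old version instead of overwriting it, so history survives.
        current["valid_to"] = as_of
        current["is_current"] = False
    dim.append({
        "customer_id": incoming["customer_id"],
        **{c: incoming[c] for c in TRACKED},
        "valid_from": as_of,
        "valid_to": None,  # open-ended: this is the current version
        "is_current": True,
    })
```

In a lakehouse the same logic is typically expressed as a MERGE against the dimension table, but the close-out-then-insert shape is the same.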

Modeled for the business

Hash-keyed dimensions and facts in a proper star schema, with measures and glossaries so business users can self-serve.
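One common way to build hash keys is sketched below, assuming SHA-256 over normalized, delimiter-joined business-key parts. The `||` separator and `<NULL>` placeholder are illustrative conventions, not a fixed standard.

```python
import hashlib


def hash_key(*parts: object, sep: str = "||") -> str:
    """Deterministic surrogate key from business-key parts (illustrative convention)."""
    # Normalize each part so "shop-1 " and "SHOP-1" produce the same key;
    # None gets an explicit placeholder so NULL differs from an empty string.
    normalized = sep.join(
        "<NULL>" if p is None else str(p).strip().upper() for p in parts
    )
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()
```

Because the key depends only on the business key, facts and dimensions can compute it independently and still join, with no lookup against a sequence table.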

Secured & cost-checked at every stage

Secrets in a vault, cost checkpoints before every scale-up, and honest trade-off discussions a real team would have.
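A cost checkpoint can be as simple as projecting spend before a scale-up and refusing to proceed if the budget would be exceeded. This sketch assumes flat per-node-hour pricing with a hypothetical rate; real cloud billing has more dimensions.

```python
def cost_checkpoint(spent: float, budget: float, nodes: int, hours: float,
                    rate_per_node_hour: float) -> float:
    """Return projected spend, or raise if the planned scale-up would break the budget."""
    projected = spent + nodes * hours * rate_per_node_hour
    if projected > budget:
        raise RuntimeError(
            f"scale-up blocked: projected ${projected:.2f} exceeds budget ${budget:.2f}"
        )
    return projected
```

For example, with the $11.42 / $30.00 figures from the cost meter, `cost_checkpoint(11.42, 30.00, nodes=2, hours=3, rate_per_node_hour=0.50)` projects about $14.42 and lets the run continue.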

Sized for a low-cost or student budget

Every build runs end to end at low cost, even on a student subscription. Production patterns shouldn't require a production wallet.

0.0M+

rows of messy, realistic data in every build

0+

source systems stitched into one view

0+

measures behind every dashboard

0/7

scheduled, self-rebuilding pipelines

Real architecture. Real problems. Real fixes. Follow every build, step by step.

03 The Stack

Tools I Work With Daily

Snowflake · Databricks · Spark · Python · AWS · Azure · GCP · Power BI · APIs · Databases

04 The Channel

Latest from the Channel

Coming Soon · 22:14

Building a Production Lakehouse on the Cloud — Start to Finish

Coming Soon · 18:47

Turning 2M Rows of Raw Chaos into a Star Schema

Coming Soon · 15:02

Cracking the Data Engineering Interview — Real Questions, Real Answers

pipeline_prod.log · live · REC
$

Every data engineer has lived the first half of that log. This channel teaches you to write the second half.

05 The Engineer

Who's Behind This?

I'm Haseeb Shaikh — a Senior Data Engineer who builds and ships production systems for real companies across the UK, South Africa, and the Middle East. Everything I teach comes from real projects, not textbooks.

Haseeb Shaikh

Senior Data Engineer · Architect · Founder of Data Victims

I don't just move data — I make it talk, perform, and drive decisions.

I'm a Senior Data Engineer who builds and ships production systems for real companies across the UK, South Africa, the Middle East, and Pakistan. Smart pipelines, powerful models, clean medallion architectures — data that lands where it should, fast, secure, and ready for action.

From SQL wizardry to Python spells, cloud platforms to APIs — I play with it all. Data Victims is where I take everything I've learned shipping real systems and turn it into builds you can follow end to end: real architecture, real problems, real fixes, nothing skipped.

Tech keeps changing. I keep learning. And no matter the tool, I get the job done.

Major Expertise

SQL · Python · PySpark · Azure · Snowflake · Databricks · AWS · APIs · Data Architecture · Transformation · Modeling · Analytics

Certified

SnowPro Associate · Microsoft Fabric DE · Azure DP-203

Recognition

Impact Leader · Exceptional Performer · Gold Medalist

06 Work With Me

Need More Than Videos?

The same engineering you see on the channel is available for your business — from a one-hour consulting call to a fully delivered data platform.

Consulting & Second Opinions

Architecture reviews, pipeline audits, and “is this design right?” calls — get an experienced eye on your data platform before you commit to it.

Freelance Builds

End-to-end pipelines, lakehouses, data models, and BI dashboards — scoped, built, documented, and delivered production-ready.

Client Engagements

Longer-term work embedded with your team — owning your data platform, mentoring engineers, and shipping alongside you.

Mock Technical Interviews

A realistic data engineering interview based on a job description you share (or one I suggest), with honest, detailed feedback on every answer.

Tell me about your data problem — I'll tell you honestly if I can fix it.

response within 24h · honest scoping · no lock-in