Data Victims
We've all been victims of
This is where we fight back.
$ spark-submit --master yarn pipeline_main.py
[ERROR] Pipeline FAILED at 02:47 AM
[FATAL] CEO dashboard showing yesterday's data.
What You'll Learn
Real Pipelines
End-to-end builds from raw sources to insights — orchestrated, idempotent, fail-fast, with actual production code you can use at work tomorrow.
$ run pipeline --env prod
✓ bronze 1.9M rows landed
✓ silver dedup + SCD2
✓ gold star schema OK
✓ 0 rows dropped silently
Real Architecture
Medallion lakehouses, layered zones, scheduled rebuilds — the decisions companies actually make, and why.
Real Problems
Schema drift, duplicate customers, failed payments, midnight failures. The ugly parts nobody talks about.
Real Modeling
Star schemas with hash-keyed dimensions, SCD Type 2 history, PII masking — modeled the way analysts need it.
Real Dashboards
Fully designed BI reports on every build — measures, semantic models, and glossaries for true self-serve.
Interview Prep
Technical, behavioral, and coding rounds — plus real interview problems from data engineering loops, solved and explained step by step.
Real Costs
Cost checkpoints at every stage and honest trade-off discussions, with every build sized for a low-cost or student budget.
full lakehouse · student subscription · headroom to spare
Not Tutorials. Production Builds.
Every Data Victims project is a complete, production-style system: a realistic business generating millions of rows of deliberately messy data, walked end to end through a real cloud lakehouse — from raw chaos to dashboards executives actually trust. Every decision is made the way a real team would make it. Nothing skipped.
Messy Sources (websites · marketplaces · APIs)
→ Bronze (raw, landed as-is)
→ Silver (cleansed · conformed · historized)
→ Gold (star schema, rebuilt on schedule)
→ Insights (dashboards people actually use)
Deliberately messy data
Duplicate customers, returns, failed payments, inconsistent feeds — millions of rows of realistic chaos, because real data is never clean.
Idempotent, fail-fast orchestration
Pipelines you can safely rerun at 3 AM, and that fail loudly instead of silently corrupting downstream tables.
History, privacy & identity done right
SCD Type 2 history, PII masking, and cross-channel identity stitching — the unglamorous work that makes data trustworthy.
Modeled for the business
Hash-keyed dimensions and facts in a proper star schema, with measures and glossaries so business users can self-serve.
Secured & cost-checked at every stage
Secrets in a vault, cost checkpoints before every scale-up, and honest trade-off discussions a real team would have.
Sized for a low-cost or student budget
Every build runs end to end at low cost — even on a student subscription. Production patterns shouldn't require a production wallet.
Real architecture. Real problems. Real fixes. Follow every build, step by step.
Latest from the Channel
Building a Production Lakehouse on the Cloud — Start to Finish
Turning 2M Rows of Raw Chaos into a Star Schema
Cracking the Data Engineering Interview — Real Questions, Real Answers
Every data engineer has lived the first half of that log. This channel teaches you to write the second half.
Who's Behind This?
I'm Haseeb Shaikh, a Senior Data Engineer who builds and ships production systems for real companies across the UK, South Africa, the Middle East, and Pakistan. Everything I teach comes from real projects, not textbooks.
“I don't just move data — I make it talk, perform, and drive decisions.”
Smart pipelines, powerful models, clean medallion architectures: data that lands where it should, fast, secure, and ready for action.
From SQL wizardry to Python spells, cloud platforms to APIs — I play with it all. Data Victims is where I take everything I've learned shipping real systems and turn it into builds you can follow end to end: real architecture, real problems, real fixes, nothing skipped.
Tech keeps changing. I keep learning. And no matter the tool, I get the job done.
Need More Than Videos?
The same engineering you see on the channel is available for your business — from a one-hour consulting call to a fully delivered data platform.
Consulting & Second Opinions
Architecture reviews, pipeline audits, and “is this design right?” calls — get an experienced eye on your data platform before you commit to it.
Freelance Builds
End-to-end pipelines, lakehouses, data models, and BI dashboards — scoped, built, documented, and delivered production-ready.
Client Engagements
Longer-term work embedded with your team — owning your data platform, mentoring engineers, and shipping alongside you.
Mock Technical Interviews
A realistic data engineering interview based on the job description you share (or one I suggest), with honest, detailed feedback on every answer.
Tell me about your data problem — I'll tell you honestly if I can fix it.
response within 24h · honest scoping · no lock-in
