Home | Aurélien Darracq

Full Load vs Incremental Load: How to Choose the Right Data Ingestion Strategy

March 10, 2026 · 📝 Blog

Your pipeline runs perfectly on day one.

PySpark vs Scala Spark: Performance Comparison and Syntax Guide for Data Engineers

February 17, 2026 · 📝 Blog

You’ve heard Scala is “faster” for Spark, but is rewriting your entire pipeline worth a 15% speedup?

Real-Time Banking Fraud Detection — Serverless Streaming Pipeline

February 8, 2026 · 🚀 Project

A real-time streaming ETL pipeline that detects fraudulent banking transactions with < 1 second latency, processing data through a medallion architecture on Google Cloud Platform.

Building Efficient Data Pipelines with SQL Window Functions

December 28, 2025 · 📝 Blog

Window functions let you perform complex aggregations and rankings across ordered datasets without expensive joins or subqueries, making your data pipelines faster and more readable.

When to Migrate from Pandas to PySpark: Configuration, Hybrid Patterns, and ML Integration

December 10, 2025 · 📝 Blog

Your churn model crashes at 15GB.

Best Practices for Well-Designed SQL Tables

December 4, 2025 · 📝 Blog

Your analytics dashboard freezes every time you filter by date range.
