Window functions let you perform complex aggregations and rankings across ordered datasets without expensive joins or subqueries, making your data pipelines faster and more readable.
❌ The Problem: Complex Subqueries and Joins
You’re calculating month-over-month growth rates using three nested subqueries and a self-join. There’s a better way.
Consider this SQL to compute month-over-month sales growth:
SELECT
curr.month,
curr.total_sales,
prev.total_sales AS prev_month_sales,
((curr.total_sales - prev.total_sales) / prev.total_sales) * 100 AS mom_growth
FROM
(SELECT month, SUM(sales) AS total_sales FROM sales_data GROUP BY month) AS curr
JOIN
(SELECT month, SUM(sales) AS total_sales FROM sales_data GROUP BY month) AS prev
ON curr.month = DATEADD(MONTH, 1, prev.month);
This query is hard to read and inefficient due to multiple aggregations and joins.
✅ The Solution: SQL Window Functions
Using window functions, you can simplify the above query significantly:
SELECT
month,
SUM(sales) AS total_sales,
LAG(SUM(sales)) OVER (ORDER BY month) AS prev_month_sales,
((SUM(sales) - LAG(SUM(sales)) OVER (ORDER BY month)) / LAG(SUM(sales)) OVER (ORDER BY month)) * 100 AS mom_growth
FROM sales_data
GROUP BY month;
You can ever improve readability by using a CTE:
WITH monthly_sales AS (
SELECT
month,
SUM(sales) AS total_sales
FROM sales_data
GROUP BY month
)
SELECT
month,
total_sales,
LAG(total_sales) OVER (ORDER BY month) AS prev_month_sales,
((total_sales - LAG(total_sales) OVER (ORDER BY month)) / LAG(total_sales) OVER (ORDER BY month)) * 100 AS mom_growth
FROM monthly_sales;
For this precise example it can be over kill to use CTE, but in more complex queries it can help with readability and maintainability.
Key Window Functions
LAG()andLEAD(): Access data from previous or next rows without self-joins.ROW_NUMBER(),RANK(),DENSE_RANK(): Assign unique or ranked numbers to rows within partitions.SUM() OVER(),AVG() OVER(): Perform cumulative or moving aggregates.
Improving Performance
- Always Define Frame Bounds for Aggregations
-- ❌ SLOW: Unbounded (recalculates from start every row) SUM(revenue) OVER (ORDER BY date)-- ✅ FAST: Bounded window SUM(revenue) OVER ( ORDER BY date ROWS BETWEEN 29 PRECEDING AND CURRENT ROW) - Partition Large Datasets
-- Splits dataset into smaller chunks for parallel processing SELECT date, product_category, revenue, SUM(revenue) OVER ( PARTITION BY product_category ORDER BY date ) as category_running_total FROM daily_sales; -- Processes each category independently ✅ - Use ROWS vs RANGE Carefully
-- ROWS: Physical row offset (faster) SUM(revenue) OVER ( ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW )-- RANGE: Logical value range (slower, handles ties) SUM(revenue) OVER ( ORDER BY date RANGE BETWEEN INTERVAL '6 days' PRECEDING AND CURRENT ROW ) - Reuse Window Definitions
-- ❌ INEFFICIENT: Recalculates ordering multiple times SELECT date, LAG(revenue) OVER (ORDER BY date), LEAD(revenue) OVER (ORDER BY date), SUM(revenue) OVER (ORDER BY date) FROM daily_sales;-- ✅ EFFICIENT: Define window once SELECT date, LAG(revenue) OVER w, LEAD(revenue) OVER w, SUM(revenue) OVER w FROM daily_sales WINDOW w AS (ORDER BY date);
Real World Performance Comparisons
Scenario: Calculate 30-day metrics for 5 years of daily data (~1,825 rows)
- Query A: Using LAG() only
SELECT date, revenue, LAG(revenue, 30) OVER (ORDER BY date) as revenue_30d_ago, revenue - LAG(revenue, 30) OVER (ORDER BY date) as change_30d FROM daily_sales; -- Execution time: ~50ms ✅ - Query B: Using SUM OVER (unbounded)
SELECT date, revenue, SUM(revenue) OVER (ORDER BY date) as running_total FROM daily_sales; -- Execution time: ~180ms ⚠️ - Query C: Using SUM OVER (bounded 30-day window)
SELECT date, revenue, SUM(revenue) OVER ( ORDER BY date ROWS BETWEEN 29 PRECEDING AND CURRENT ROW ) as rolling_30d_total FROM daily_sales; -- Execution time: ~75ms ✅
When to use each approach
Use LAG/LEAD When:
✅ Comparing specific rows (yesterday, last week, last month)
✅ Calculating period-over-period changes
✅ Finding next/previous non-null values
✅ Performance is critical ```sql -- Perfect use cases LAG(revenue, 1) OVER (ORDER BY date) as yesterday LAG(revenue, 7) OVER (ORDER BY date) as last_week LAG(revenue, 30) OVER (ORDER BY date) as last_month ```
Use SUM/AVG OVER When:
✅ Calculating running totals
✅ Computing moving averages
✅ Aggregating within sliding windows
✅ Need cumulative metrics
-- Perfect use cases
SUM(revenue) OVER (ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) -- running total
AVG(revenue) OVER (ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) -- 7-day MA
SUM(orders) OVER (PARTITION BY region ORDER BY date) -- regional running totals
ADVANCED: Hybrid Approach (Best Performance)
-- Combine techniques for optimal performance
WITH daily_metrics AS (
SELECT
date,
revenue,
-- Fast LAG operations
LAG(revenue, 1) OVER w as prev_day,
LAG(revenue, 7) OVER w as prev_week,
-- Bounded aggregations
AVG(revenue) OVER (
ORDER BY date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) as ma_7day,
-- Row number (very fast)
ROW_NUMBER() OVER w as day_number
FROM daily_sales
WINDOW w AS (ORDER BY date)
)
SELECT
*,
revenue - prev_day as daily_change,
revenue - prev_week as weekly_change,
revenue - ma_7day as vs_7day_avg
FROM daily_metrics;
-- Execution time: ~60ms ✅
Bottom Line (Conclusion)
Performance Ranking (Fast → Slow):
- 🥇 ROW_NUMBER / RANK - Simple enumeration
- 🥈 LAG / LEAD - Direct row access (O(1))
- 🥉 SUM/AVG with bounded ROWS - Fixed window size
- ⚠️ SUM/AVG with RANGE - Value-based frames
- 🐌 SUM/AVG unbounded - Recalculates everything (O(N²))
Choose based on:
- Need point comparisons? → LAG/LEAD
- Need cumulative values? → SUM OVER with bounds
- Need both? → Combine them efficiently with WINDOW clause