How to Use Self Join to Calculate Running Total in SQL Server
In this post, we'll go over an example of calculating running total of each employee's daily sales using Self Join technique.
You can use this DB Fiddle to follow along and calculate Running Total with
Self Join.
Sample Data for Running Total Calculation
Let’s create some sample data to demonstrate how to calculate running total using self join technique.
We will create a EmployeeSales
table which has a EmployeeID
, SaleDate
, and SaleAmount
columns. SaleAmount
stores each employee's daily sales amount.
CREATE TABLE EmployeeSales(
[EmployeeID] INT,
[SaleDate] DATE,
[SaleAmount] INT
)
INSERT INTO EmployeeSales VALUES
(1,'2022-06-01',10),
(1,'2022-06-02',20),
(1,'2022-06-03',30),
(2,'2022-06-01',10),
(2,'2022-06-02',10),
(2,'2022-06-03',10)
EmployeeID | SaleDate | SaleAmount |
---|---|---|
1 | 2022-06-01 | 10 |
1 | 2022-06-02 | 20 |
1 | 2022-06-03 | 30 |
2 | 2022-06-01 | 10 |
2 | 2022-06-02 | 10 |
2 | 2022-06-03 | 10 |
Calculate Running Total with Self Join
Calculating Running Total with Self Join is a more efficient way when comparing to the Correlated Subquery approach of computing running total.
Since we want to compute running total for each EmployeeID
, we'll begin by performing a self join of EmployeeSales
table on the EmployeeID
column.
We alias the left table as L and the right table as R.
SELECT *
FROM EmployeeSales AS L
JOIN EmployeeSales AS R ON L.EmployeeID = R.EmployeeID
ORDER BY L.EmployeeID
,L.SaleDate
Looking at the Self Join results so far, we realize that we need to filter out some rows before we can calculate running total of each EmployeeID.
For examples, we only need row 1 (highlighted in green) to calculate running total for EmployeeID
1 and SaleDate
2022-06-01.
We only need rows 4 and 5 (highlighted in green) to calculate running total for EmployeeID
1 and SaleDate
2022-06-02.
This can be accomplished by adding a WHERE clause to ensure that only rows satisfying the condition L.SaleDate >= R.SaleDate are returned.
SELECT *
FROM EmployeeSales AS L
JOIN EmployeeSales AS R ON L.EmployeeID = R.EmployeeID
WHERE L.SaleDate >= R.SaleDate
ORDER BY L.EmployeeID
,L.SaleDate
EmployeeID | SaleDate | SaleAmount | EmployeeID | SaleDate | SaleAmount |
---|---|---|---|---|---|
1 | 2022-06-01 | 10 | 1 | 2022-06-01 | 10 |
1 | 2022-06-02 | 20 | 1 | 2022-06-01 | 10 |
1 | 2022-06-02 | 20 | 1 | 2022-06-02 | 20 |
1 | 2022-06-03 | 30 | 1 | 2022-06-01 | 10 |
1 | 2022-06-03 | 30 | 1 | 2022-06-02 | 20 |
1 | 2022-06-03 | 30 | 1 | 2022-06-03 | 30 |
2 | 2022-06-01 | 10 | 2 | 2022-06-01 | 10 |
2 | 2022-06-02 | 10 | 2 | 2022-06-01 | 10 |
2 | 2022-06-02 | 10 | 2 | 2022-06-02 | 10 |
2 | 2022-06-03 | 10 | 2 | 2022-06-01 | 10 |
2 | 2022-06-03 | 10 | 2 | 2022-06-02 | 10 |
2 | 2022-06-03 | 10 | 2 | 2022-06-03 | 10 |
Now we finally have the correct self-joined dataset to compute running total for each employee.
We just group by left table's EmployeeID
and SaleDate
columns,
apply MAX() aggregate function on left table's SaleAmount
column to get each EmployeeID's daily SaleAmount,
and apply SUM() aggregate function on right table's SaleAmount
column to compute running total sales amount so far.
SELECT L.EmployeeID
,L.SaleDate
,MAX(L.SaleAmount) AS SaleAmount
,SUM(R.SaleAmount) AS RunningTotal
FROM EmployeeSales AS L
JOIN EmployeeSales AS R ON L.EmployeeID = R.EmployeeID
AND L.SaleDate >= R.SaleDate
GROUP BY L.EmployeeID
,L.SaleDate
ORDER BY L.EmployeeID
,L.SaleDate
EmployeeID | SaleDate | SaleAmount | RunningTotal |
---|---|---|---|
1 | 2022-06-01 | 10 | 10 |
1 | 2022-06-02 | 20 | 30 |
1 | 2022-06-03 | 30 | 60 |
2 | 2022-06-01 | 10 | 10 |
2 | 2022-06-02 | 10 | 20 |
2 | 2022-06-03 | 10 | 30 |
As you can see, it's a bit trickier to compute cumulative running total with self join technique.
If you are running SQL Server 2012 or later, I would recommend utilizing Window Function to calculate running total.
Your code will be more concise and your query will perform well as Window Function is designed and optimized for analytic scenarios like this.