SQL Find Missing Values in a Sequence
Finding missing values in a sequence is crucial for maintaining data integrity, ensuring analytical accuracy, supporting informed decision-making, and contributing to the overall reliability and quality of data-driven systems.
In this post, we will demonstrate how to use Recursive CTE to construct a virtual RecursiveSequence
table and then use LEFT JOIN technique to find missing values within a sequence.
You can use this SQL Server Fiddle to follow along and practice finding missing values in a sequence
Orders Data
We create a table to store orders data, inserts orders, and then verifies the inserted data by retrieving all rows from the table.
-- Create a sample Orders table
CREATE TABLE Orders (
OrderID INT PRIMARY KEY
);
-- Insert some sample records with missing values
INSERT INTO Orders (OrderID) VALUES (1001), (1002), (1004), (1006), (1008);
SELECT * FROM Orders;
OrderID |
---|
1001 |
1002 |
1004 |
1006 |
1008 |
Find Missing Values in a Sequence
-- Use a recursive common table expression (CTE) to find missing values in the sequence
WITH RecursiveSequence
AS (SELECT MIN(OrderID) AS OrderID,
MIN(OrderID) AS StartRange,
MAX(OrderID) AS EndRange
FROM Orders
UNION ALL
SELECT OrderID + 1,
StartRange,
EndRange
FROM RecursiveSequence
WHERE OrderID < EndRange)
-- Select the missing values
SELECT rs.OrderID AS MissingOrderID
FROM RecursiveSequence rs
LEFT JOIN Orders o
ON rs.OrderID = o.OrderID
WHERE o.OrderID IS NULL
-- Set MAXRECURSION to 0 to avoid recursion limit
OPTION (MAXRECURSION 0);
The SQL query detects there are 3 missing OrderID values in the sequence.
MissingOrderID |
---|
1003 |
1005 |
1007 |
Let's walk through the provided T-SQL code step by step:
-
Create a recursive common table expression (CTE):
WITH RecursiveSequence
AS (SELECT MIN(OrderID) AS OrderID,
MIN(OrderID) AS StartRange,
MAX(OrderID) AS EndRange
FROM Orders
UNION ALL
SELECT OrderID + 1,
StartRange,
EndRange
FROM RecursiveSequence
WHERE OrderID < EndRange)- The
RecursiveSequence
CTE is created using a recursive structure. - The first part (before UNION ALL) selects the minimum and maximum order IDs from the
Orders
table. This serves as the initial range. - The second part (after UNION ALL) recursively selects the next order ID (
OrderID + 1
) while maintaining the same start and end range. The recursion continues until theOrderID
reaches theEndRange
.
- The
The RecursiveSequence CTE contains all OrderID values between 1001
and 1008
.
OrderID | StartRange | EndRange |
---|---|---|
1001 | 1001 | 1008 |
1002 | 1001 | 1008 |
1003 | 1001 | 1008 |
1004 | 1001 | 1008 |
1005 | 1001 | 1008 |
1006 | 1001 | 1008 |
1007 | 1001 | 1008 |
1008 | 1001 | 1008 |
-
Select the missing values:
SELECT rs.OrderID AS MissingOrderID
FROM RecursiveSequence rs
LEFT JOIN Orders o
ON rs.OrderID = o.OrderID
WHERE o.OrderID IS NULL
OPTION (MAXRECURSION 0);- The main query selects the
OrderID
from theRecursiveSequence
CTE asMissingOrderID
. - It performs a
LEFT JOIN
with theOrders
table on theOrderID
column. TheLEFT JOIN
is used to include all values from theRecursiveSequence
even if there is no match in theOrders
table. - The
WHERE
clause filters out the rows where there is a match (o.OrderID IS NULL
), indicating that the order ID is missing in the originalOrders
table.
- The main query selects the
-
OPTION (MAXRECURSION 0):
- This option is used to set the maximum recursion level to 0, effectively removing any limit on the number of recursions. It is added to avoid reaching the default recursion limit in case the sequence is large. Adjust this value based on your specific needs and the range of order numbers in your dataset.
Conclusion
In summary, the code recursively generates a sequence of order IDs and then identifies the missing order IDs by comparing them with the actual data in the Orders
table. The final result includes the order IDs that are in the generated sequence but not in the original Orders
table.