Skip to main content

MySQL - Gap Analysis - Find Gaps in Number Sequence

Understanding how to identify gaps is useful in data analysis. In this post, we will go over different techniques of using Self Join and Window Functions Lead() and Lag() to identify gaps in a number sequence. Instead of simply outputting a list of missing values, gaps will be presented as a range so it is more useful.

Problem


The sample Test numbers table contains 3 gaps. We need to create a SQL script to identify them.

number
1
2
5
6
7
8
9
10
12
13
14
15
20

Execution of the SQL script should output lower and upper bound of each gap.

gap_startgap_end
34
1111
1619

Example Data


create table Test (
number INT
);

insert into Test (number) values (1);
insert into Test (number) values (2);
-- Gap 1
insert into Test (number) values (5);
insert into Test (number) values (6);
insert into Test (number) values (7);
insert into Test (number) values (8);
insert into Test (number) values (9);
insert into Test (number) values (10);
-- Gap 2
insert into Test (number) values (12);
insert into Test (number) values (13);
insert into Test (number) values (14);
insert into Test (number) values (15);
-- Gap 3
insert into Test (number) values (20);

Use

DB-Fiddle to execute SQL scripts on sample data.

Solution - Find Gaps Using Self Join


Our solution should identify 3 gaps in the test data for rows where a row value does not sequentially follow another.

1
2
5
6
7
8
9
10
12
13
14
15
20

We can left join the table with itself on a join key offset by 1 to check if there is a gap between the current number and the next number.

SELECT l.number
,r.number
FROM Test l
LEFT JOIN Test r ON l.number = r.number - 1 -- Do we have a next number in the sequence that's 1 greater than the current number?
WHERE l.number != (
SELECT max(number) -- Gap will never start after the maximum sequence number
FROM Test
)
AND r.number IS NULL -- Gap identified

Use Self Join to Find Island Gap Starts

We can left join the table with itself again on a join key offset by 1 to check if there is a gap between the current number and the previous number.

SELECT l.number
,r.number
FROM Test l
LEFT JOIN Test r ON l.number = r.number + 1 -- Do we have a previous number in the sequence that's 1 less than the current number?
WHERE l.number != (
SELECT min(number) -- Gap will never end before the minimum sequence number
FROM Test
)
AND r.number IS NULL -- Gap identified

Use Self Join to Find Island Gap Ends

After getting gap start points and end points, we can use a subquery to put together the upper bound and the lower bound of each gap in the same row.

WITH gap_starts
AS (
-- The end of consecutive numbers + 1 is the start of the gap
SELECT l.number + 1 AS gap_start
FROM Test l
LEFT JOIN Test r ON l.number = r.number - 1
WHERE l.number != (
SELECT max(number)
FROM Test
)
AND r.number IS NULL
)
,gap_ends
AS (
-- The start of consecutive numbers - 1 is the end of the gap
SELECT l.number - 1 AS gap_end
FROM Test l
LEFT JOIN Test r ON l.number = r.number + 1
WHERE l.number != (
SELECT min(number)
FROM Test
)
AND r.number IS NULL
)
SELECT *
,(
SELECT e.gap_end
FROM gap_ends e
WHERE e.gap_end >= s.gap_start
ORDER BY e.gap_end limit 1
) AS gap_end
FROM gap_starts s;

Solution - Find Gaps Using Lead Window Function


Using Window function to identify gaps can make your codes more intuitive, succint, and readable to other developers.

By appying Lead() function we can have a pair of the current number and the next number in the sequence for each row.

SELECT t.number
,lead(t.number, 1) OVER (
ORDER BY t.number
) AS next_number
FROM Test t

A gap is identified when the difference between the current and the next number is greater than 1.

Use Lead Window Function to Find Gaps

By filtering only pairs where the difference between the two is greater than 1, add 1 to the current number and substract 1 from the next number to generate gap information.

WITH gap_analysis
AS (
SELECT t.number
,lead(t.number, 1) OVER (
ORDER BY t.number
) AS next_number
FROM Test t
)
SELECT g.number + 1 AS gap_start
,g.next_number - 1 AS gap_end
FROM gap_analysis g
WHERE g.next_number - g.number > 1;

Solution - Find Gaps Using Lag Window Function


We could modify the above query slightly to use Lag() function to produce a pair of the current number and the previous number in the sequence for each row.

SELECT t.number
,lag(t.number, 1) OVER (
ORDER BY t.number
) AS prev_number
FROM Test t

A gap is identified when the difference between the current and the previous number is greater than 1.

Use Lag Window Function to Find Gaps

By filtering only pairs where the difference between the two is greater than 1, add 1 to the previous number and substract 1 from the current number to generate gap information.

WITH gap_analysis
AS (
SELECT t.number
,lag(t.number, 1) OVER (
ORDER BY t.number
) AS prev_number
FROM Test t
)
SELECT g.prev_number + 1 AS gap_start
,g.number - 1 AS gap_end
FROM gap_analysis g
WHERE g.number - g.prev_number > 1;