Skip to main content

MySQL - Select Top N Rows Per Group

It is a very common task to select top N rows from each group in MySQL when doing data analysis and reporting. You might use this to get the top 3 selling products for each product category, get the highest paid employee in each department, find out the top 10 popular posts in your blog, and so on. We will go over modern SQL techniques of answering top N rows per group questions using Row_Number, Rank, and Dense_Rank window functions.

Example Data


A school has collected student score data for Math and English classes. We would like to perform analysis of top student scores in each class.

create table student_score (
class_name VARCHAR(9),
student_login VARCHAR(50),
score INT
);

insert into student_score (class_name, student_login, score) values ('math', 'gacreman0', 100);
insert into student_score (class_name, student_login, score) values ('english', 'ewoodington1', 87);
insert into student_score (class_name, student_login, score) values ('math', 'ctilliard2', 72);
insert into student_score (class_name, student_login, score) values ('english', 'gzorer3', 81);
insert into student_score (class_name, student_login, score) values ('english', 'swyre4', 75);
insert into student_score (class_name, student_login, score) values ('math', 'ppadgett8', 82);
insert into student_score (class_name, student_login, score) values ('math', 'mreynalds9', 100);
insert into student_score (class_name, student_login, score) values ('math', 'dlettena', 100);
insert into student_score (class_name, student_login, score) values ('math', 'dbartellib', 86);
insert into student_score (class_name, student_login, score) values ('english', 'tlorinezc', 87);
insert into student_score (class_name, student_login, score) values ('english', 'ddeftied', 81);

Use

DB-Fiddle to execute SQL scripts on sample data.

Select Top N Rows Per Group with Row_Number


If the business requirements are to always return exactly 3 top students in each class, Row_Number window function should be used. When there are tied scores, sort student_login ascendingly to determine which student_login to pick. Since the PARTITION BY clause is present, the rankings are reset for rows in each class.

SELECT *
,row_number() OVER (
PARTITION BY class_name ORDER BY score DESC
,student_login ASC
) AS row_num
FROM student_score

Find top 3 students per class using Row_Number window function

Here is the completed script to get exactly 3 top students in each class with Row_Number.

WITH score_analysis
AS (
SELECT *,
row_number() OVER (
PARTITION BY class_name ORDER BY score DESC,
student_login ASC
) AS row_num
FROM student_score
)
SELECT class_name,
student_login,
score
FROM score_analysis
WHERE row_num <= 3;

Here is the output of the query.

class_namestudent_loginscore
englishewoodington187
englishtlorinezc87
englishddeftied81
mathdlettena100
mathgacreman0100
mathmreynalds9100

Select Top N Rows Per Group with Dense_Rank


If the business requirements are to always report a first, second, and third place for each class, regardless of score ties for each place, Dense_Rank window function is required to achieve this. Each place could have 1 or multiple students.

If two or more rows tie, there will be no gap in the sequence of ranked values. Since the PARTITION BY clause is present, the rankings are reset for rows in each class.

SELECT *
,dense_rank() OVER (
PARTITION BY class_name ORDER BY score DESC
) AS dense_rank_num
FROM student_score

Find students in top 3 places per class using Dense_Rank window function Here is the completed script to always have first, second, and third place students in each class with Dense_Rank.

WITH score_analysis
AS (
SELECT *,
dense_rank() OVER (
PARTITION BY class_name ORDER BY score DESC
) AS dense_rank_num
FROM student_score
)
SELECT class_name,
student_login,
score
FROM score_analysis
WHERE dense_rank_num <= 3;

Here is the output of the query.

class_namestudent_loginscore
englishewoodington187
englishtlorinezc87
englishgzorer381
englishddeftied81
englishswyre475
mathgacreman0100
mathmreynalds9100
mathdlettena100
mathdbartellib86
mathppadgett882

Select Top N Rows Per Group with Rank


If the business requirements are to allow the possibility of no second or third place, Rank function can be used to achieve this. Each place could have 0, 1, or multiple students.

Rows with equal scores for the ranking criteria receive the same rank. The next rank is calculated by adding the number of tied rows to the tied rank. Therefore, Rank function will leave gaps in the ranking values. Since the PARTITION BY clause is present, the rankings are reset for rows in each class.

SELECT *
,rank() OVER (
PARTITION BY class_name ORDER BY score DESC
) AS rank_num
FROM student_score

Example 1: English class has 2 scores tied at first place so there is no second place. But there is a third place with 2 scores tied.

Example 2: Math class has 3 scores tied at first place, there will be no second or third place.

Find students in top 3 places per class using Dense_Rank window function Here is the completed script to always have first, second, and third place students in each class with Dense_Rank.

WITH score_analysis
AS (
SELECT *,
rank() OVER (
PARTITION BY class_name ORDER BY score DESC
) AS rank_num
FROM student_score
)
SELECT class_name,
student_login,
score
FROM score_analysis
WHERE rank_num <= 3;

Here is the output of the query.

class_namestudent_loginscore
englishewoodington187
englishtlorinezc87
englishgzorer381
englishddeftied81
mathgacreman0100
mathmreynalds9100
mathdlettena100

Conclusion


Hopefully through the examples discussed above, you can see that when a tie of two or more scores occurs, both Rank and Dense_Rank assign the same rank number to all records with the same value.

Where they differ is that Rank function continues the rank value in line with the Row_Number series so it skips the next available ranking value. This results in gaps in the ranking values.

Dense_Rank will instead use the next chronological ranking value so it leaves no gaps in the ranking values. Find top students with Row_Number, Rank, and Dense_Rank window functions

As you can see that Row_Number, Rank, and Dense_Rank window functions are very useful in analyzing data and answering Top N Rows Per Group types of questions. It's good to have a good understanding on each of them so you can determine which one to use based on the business requirements.