MySQL - Select Latest Record for Each Group
Very often we need to select the most recent record or get the latest record for each date, user, id or any other group. There are different ways to approach this problem, we'll review MySQL queries for 3 different solutions.
Example Data
CREATE TABLE UserLogin
(
user_id varchar(300),
login_date varchar(300)
);
INSERT INTO UserLogin (user_id,login_date) VALUES ('100', '2021-12-14');
INSERT INTO UserLogin (user_id,login_date) VALUES ('100', '2021-12-18');
INSERT INTO UserLogin (user_id,login_date) VALUES ('100', '2021-12-15');
INSERT INTO UserLogin (user_id,login_date) VALUES ('100', '2022-12-19');
INSERT INTO UserLogin (user_id,login_date) VALUES ('102', '2021-12-31');
INSERT INTO UserLogin (user_id,login_date) VALUES ('102', '2022-01-01');
INSERT INTO UserLogin (user_id,login_date) VALUES ('102', '2022-01-15');
INSERT INTO UserLogin (user_id,login_date) VALUES ('102', '2023-01-15');
INSERT INTO UserLogin (user_id,login_date) VALUES ('102', '2022-01-15');
INSERT INTO UserLogin (user_id,login_date) VALUES ('102', '2022-01-15');
Use
DB-Fiddle to execute SQL scripts on sample data.
user_id | login_date |
---|---|
101 | 2021-12-14 |
101 | 2021-12-18 |
101 | 2021-12-15 |
101 | 2021-12-19 |
102 | 2021-12-31 |
102 | 2022-01-01 |
102 | 2022-01-15 |
102 | 2022-01-15 |
Solution 1: Group By and MAX() Aggregate Function
We perform a GROUP BY user_id, we will get distinct user_ids, and using MAX(login_date) we will get the maximum date (latest login date) for that user_id.
SELECT UL.user_id
,MAX(UL.login_date) AS latest_login_date
FROM UserLogin AS UL
GROUP BY UL.user_id;
user_id | latest_login_date |
---|---|
100 | 2022-12-19 |
102 | 2023-01-15 |
Solution 2: Subquery
For each row, we use a subquery to find the latest login date for each user_id. We then use DISTINCT keyword in the SELECT statement so we don't return duplicate user_ids.
SELECT DISTINCT UL.user_id
,(
SELECT MAX(ULS.login_date)
FROM UserLogin AS ULS
WHERE ULS.user_id = UL.user_id
) AS latest_login_date
FROM UserLogin AS UL;
user_id | latest_login_date |
---|---|
100 | 2022-12-19 |
102 | 2023-01-15 |
Solution 3: ROW_NUMBER() Window Function
We use ROW_NUMBER() Window Function to partition data into window frames by user_id and order by login_date in descending order. The latest login date for each user_id will have a row_num value of 1. The intermediate analysis result is stored in user_login_analysis CTE. We then query the CTE to get the latest record for each user_id group by retrieving records that have a row_num value of 1.
WITH user_login_analysis
AS (
SELECT UL.user_id
,UL.login_date
,ROW_NUMBER() OVER (
PARTITION BY UL.user_id ORDER BY UL.login_date DESC
) AS row_num
FROM UserLogin AS UL
)
SELECT user_id
,login_date AS latest_login_date
FROM user_login_analysis
WHERE row_num = 1;
user_id | latest_login_date |
---|---|
100 | 2022-12-19 |
102 | 2023-01-15 |
Conclusion
Even though we can solve this problem in many different ways, with a simple scenario such as this, I would probably go with solution 1 to select the latest login date for each user_id group. The code is simpler, clean, and super readable.