LeetCode 2199. Finding the Topic of Each Post SQL Solution
Problem
LeetCode SQL Problem
- Finding the Topic of Each Post
Posts table
post_id | content |
---|---|
1 | We call it soccer They call it football hahaha |
2 | Americans prefer basketball while Europeans love handball and football |
3 | stop the war and play handball |
4 | warning I planted some flowers this morning and then got vaccinated |
Keywords table
topic_id | word |
---|---|
1 | handball |
1 | football |
3 | WAR |
2 | Vaccine |
Firstable, we will split each post record into multiple rows to find out each post's topics. We will then combine each post's distinct topics to a comma-delimited string.
Solution
MySQL
- Use CHAR_LENGTH() and REPLACE() functions to find out the maximum number of words we need to split.
- Create a
Numbers
table dynamically in the memory using recursive CTE which contains numbers from 1 to the maximum number of words to we need to split - INNER JOIN
Numbers
andPosts
tables and utilize nested SUBSTRING_INDEX() calls to split each post into multiple keyword rows. - LEFT JOIN
PostKeywords
andKeywords
tables to output all posts including ones that do not belong to any topic. - Utilize GROUP_CONCAT() function to generated comma-delimited distinct topic string for each post.
- For posts that does not have any topic, GROUP_CONCAT() result will be NULL. In this case, we'll output 'Ambiguous!'.
TSQL
- Utilize Cross Apply and STRING_SPLIT() function to split each post into multiple keyword rows.
- LEFT JOIN
PostKeywords
andKeywords
tables to output distinctpost_id
andtopic_id
pairs including ones that do not belong to any topic. - Utilize STRING_AGG() function to generated comma-delimited distinct topic string for each post.
- For posts that does not have any topic, STRING_AGG() result will be NULL. In this case, we'll output 'Ambiguous!'.
- MySQL
- TSQL
SET @maxWords := (
SELECT max(CHAR_LENGTH(content) - CHAR_LENGTH(REPLACE(content, ' ', '')))
FROM Posts
);
WITH recursive Numbers (n)
AS (
SELECT 1
UNION
SELECT n + 1
FROM Numbers
WHERE n <= @maxWords
),
PostKeywords
AS (
SELECT *,
SUBSTRING_INDEX(SUBSTRING_INDEX(p.content, ' ', Numbers.n), ' ', - 1) AS keyword
FROM Numbers
INNER JOIN Posts p ON CHAR_LENGTH(p.content) - CHAR_LENGTH(REPLACE(p.content, ' ', '')) + 1 >= Numbers.n
)
SELECT pk.post_id,
IFNULL(GROUP_CONCAT(DISTINCT k.topic_id ORDER BY topic_id SEPARATOR ','), 'Ambiguous!') AS topic
FROM PostKeywords pk
LEFT JOIN Keywords k ON pk.keyword = k.word
GROUP BY pk.post_id;
WITH PostKeywords
AS (
SELECT P.*
,value AS keyword
FROM Posts AS P
CROSS APPLY string_split(P.content, ' ')
)
,DistinctPostTopicPairs
AS (
SELECT DISTINCT PK.post_id
,K.topic_id
FROM PostKeywords AS PK
LEFT JOIN Keywords AS K ON PK.keyword = K.word
)
SELECT DPT.post_id
,ISNULL(STRING_AGG(DPT.topic_id, ',') WITHIN GROUP (
ORDER BY DPT.topic_id ASC
), 'Ambiguous!') AS topic
FROM DistinctPostTopicPairs AS DPT
GROUP BY DPT.post_id;
Query Output
post_id | topic |
---|---|
1 | 1 |
2 | 1 |
3 | 1,3 |
4 | Ambiguous! |