Skip to main content

LeetCode 2199. Finding the Topic of Each Post SQL Solution

Problem

LeetCode SQL Problem

  1. Finding the Topic of Each Post

Posts table

post_idcontent
1We call it soccer They call it football hahaha
2Americans prefer basketball while Europeans love handball and football
3stop the war and play handball
4warning I planted some flowers this morning and then got vaccinated

Keywords table

topic_idword
1handball
1football
3WAR
2Vaccine

Firstable, we will split each post record into multiple rows to find out each post's topics. We will then combine each post's distinct topics to a comma-delimited string.

Solution

MySQL

  • Use CHAR_LENGTH() and REPLACE() functions to find out the maximum number of words we need to split.
  • Create a Numbers table dynamically in the memory using recursive CTE which contains numbers from 1 to the maximum number of words to we need to split
  • INNER JOIN Numbers and Posts tables and utilize nested SUBSTRING_INDEX() calls to split each post into multiple keyword rows.
  • LEFT JOIN PostKeywords and Keywords tables to output all posts including ones that do not belong to any topic.
  • Utilize GROUP_CONCAT() function to generated comma-delimited distinct topic string for each post.
  • For posts that does not have any topic, GROUP_CONCAT() result will be NULL. In this case, we'll output 'Ambiguous!'.

TSQL

  • Utilize Cross Apply and STRING_SPLIT() function to split each post into multiple keyword rows.
  • LEFT JOIN PostKeywords and Keywords tables to output distinct post_id and topic_id pairs including ones that do not belong to any topic.
  • Utilize STRING_AGG() function to generated comma-delimited distinct topic string for each post.
  • For posts that does not have any topic, STRING_AGG() result will be NULL. In this case, we'll output 'Ambiguous!'.
SET @maxWords := (
SELECT max(CHAR_LENGTH(content) - CHAR_LENGTH(REPLACE(content, ' ', '')))
FROM Posts
);

WITH recursive Numbers (n)
AS (
SELECT 1

UNION

SELECT n + 1
FROM Numbers
WHERE n <= @maxWords
),
PostKeywords
AS (
SELECT *,
SUBSTRING_INDEX(SUBSTRING_INDEX(p.content, ' ', Numbers.n), ' ', - 1) AS keyword
FROM Numbers
INNER JOIN Posts p ON CHAR_LENGTH(p.content) - CHAR_LENGTH(REPLACE(p.content, ' ', '')) + 1 >= Numbers.n
)
SELECT pk.post_id,
IFNULL(GROUP_CONCAT(DISTINCT k.topic_id ORDER BY topic_id SEPARATOR ','), 'Ambiguous!') AS topic
FROM PostKeywords pk
LEFT JOIN Keywords k ON pk.keyword = k.word
GROUP BY pk.post_id;

Query Output

post_idtopic
11
21
31,3
4Ambiguous!