Notes on new things I learned while practising the LeetCode SQL 50 challenge.
Select
[ Easy] 584. Find Customer Referee
In PostgreSQL, when you want rows where a column does not equal a specific non-NULL value, != and <> share one limitation: they never match NULL values.
With a condition like WHERE "column" != 'searchstring', rows where column is NULL are silently missed.
You need "column" <> 'searchstring' OR "column" IS NULL, or "column" IS DISTINCT FROM 'searchstring'.
Reference:
https://stackoverflow.com/questions/36508815/not-equal-and-null-in-postgres
Basic Joins
[ Easy] 197. Rising Temperature
In PostgreSQL, the date type supports arithmetic directly: date + 1 adds one day, and the result is still a date.
To add or subtract other units such as years, months, or hours, use INTERVAL, e.g. + INTERVAL '1 hour'; the result is then a timestamp.
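A sketch of how this date arithmetic is used in the problem, assuming the Weather(id, recordDate, temperature) schema:
-- join each day to the previous day by adding 1 (one day) to the earlier date
SELECT w1.id
FROM Weather w1
JOIN Weather w2 ON w1.recordDate = w2.recordDate + 1
WHERE w1.temperature > w2.temperature;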
[ Easy] 1280. Students and Examinations
select * from A, B;
returns the same result as
select * from A a cross join B b;
If you later need to compare columns coming specifically from A or from B, use the aliased CROSS JOIN form, because the bare comma form above gives you no way to say which table a column comes from.
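A sketch of how the cross join fits this problem, assuming the Students, Subjects, and Examinations tables from the problem statement:
-- cross join builds every (student, subject) pair; the left join then counts exam rows per pair
select s.student_id, s.student_name, sub.subject_name,
       count(e.student_id) as attended_exams
from Students s
cross join Subjects sub
left join Examinations e
  on e.student_id = s.student_id and e.subject_name = sub.subject_name
group by s.student_id, s.student_name, sub.subject_name
order by s.student_id, sub.subject_name;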
[ Medium] 1934. Confirmation Rate
Parent table: Signups; child table: Confirmations; relationship: one-to-many.
Since the relationship is one-to-many and the child table can be empty, a LEFT JOIN lets you count how many times Confirmations.action = 'confirmed' appears, which is the numerator.
The blind spot: if the child table is empty, using the number of child rows as the denominator is wrong, because count(c.*) = 0 and dividing by 0 raises an error.
Looking at it the other way, a rate of 0/1 is still 0, so the denominator should be the number of parent rows after the LEFT JOIN: for a user like user_id = 6 with no confirmations, there is still exactly one row, with the Confirmations columns all NULL.
So the denominator becomes the count of Signups rows, which gives the correct rate.
select s.user_id,
round(
sum(case when c.action='confirmed' then 1 else 0 end)/count(s.*)::decimal
, 2) as confirmation_rate
from Signups s
left join Confirmations c on s.user_id = c.user_id
group by s.user_id
Below is an answer shared by someone else; it looks like AVG already takes care of the cases I was worrying about.
SELECT
s.user_id,
ROUND(
AVG(CASE WHEN c.action = 'confirmed' THEN 1 ELSE 0 END)
, 2) AS confirmation_rate
FROM Signups s
LEFT JOIN Confirmations c
ON s.user_id = c.user_id
GROUP BY s.user_id
Basic Aggregate Functions
[ Easy] 1251. Average Selling Price
Remember that a JOIN has to handle an empty child table. The average here cannot be computed with a plain AVG either (it is a weighted average), so my first thought was a CASE WHEN to avoid a zero denominator.
select p.product_id,
case when sum(units) <> 0 then round(sum(p.price*u.units)/sum(u.units)::decimal,2)
else 0 end as average_price
from Prices p
left join UnitsSold u on p.product_id = u.product_id
and (u.purchase_date between p.start_date and p.end_date)
group by p.product_id
Besides CASE WHEN there is another construct, COALESCE: it takes any number of arguments, evaluates them in order, and returns the first one that is not NULL.
https://docs.postgresql.tw/the-sql-language/functions-and-operators/conditional-expressions
SELECT COALESCE(description, short_description, '(none)', ...)
Using COALESCE, the result looks like this:
-- when u.units has no rows -> sum = null -> division = null -> round = null -> COALESCE falls back to the second argument, the non-NULL 0
select p.product_id,
COALESCE(round(sum(p.price*u.units)/sum(u.units)::decimal,2), 0) as average_price
from Prices p
left join UnitsSold u on p.product_id = u.product_id
and (u.purchase_date between p.start_date and p.end_date)
group by p.product_id
The key point: an aggregate function returns NULL, not 0, when it only sees NULLs, so the sum(u.units) denominator never triggers a division-by-zero error here.
*Note: aggregate functions (SUM, AVG, MAX, ...) are in fact the theme of this group of problems; they are normally used together with GROUP BY.
[ Easy] 1211. Queries Quality and Percentage
This one taught me the WITH xxx AS (...) syntax (common table expressions); multiple CTEs are separated by commas.
with poor as (
select query_name, count(*) as count from Queries
where rating < 3
group by query_name
),
quality as (
select q.query_name, round(avg(rating/position::numeric) , 2) as quality, count(q.*) as count
from Queries q
group by q.query_name
)
select q.query_name, q.quality, coalesce(round(p.count*100/q.count::numeric, 2), 0) as poor_query_percentage
from quality q
left join poor p on q.query_name = p.query_name
But someone else has a much more compact solution:
- Doing the arithmetic with 100.0 avoids the explicit cast
- An aggregate function can contain a simple condition (CASE WHEN)
- Which means there is no need to build a separate temp table just to get a filtered count
SELECT
query_name,
ROUND(AVG(CAST(rating AS NUMERIC) / position), 2) AS quality,
ROUND(
100.0 * SUM(CASE WHEN rating < 3 THEN 1 ELSE 0 END) / COUNT(*),
2
) AS poor_query_percentage
FROM
Queries
GROUP BY
query_name;
[ Medium] 1193. Monthly Transactions I
with extract as (
select substr(trans_date::varchar, 0 , 8) as month,
country, state, amount from Transactions t
)
select month, country, count(*) as trans_count,
sum(case when state = 'approved' then 1 else 0 end) as approved_count,
sum(amount) as trans_total_amount,
sum(case when state = 'approved' then amount else 0 end) as approved_total_amount
from extract
group by month, country
If a query needs a transformed column in the WHERE condition and you also want to SELECT that transformed value, writing the expression in both places means it is evaluated twice, as in the SQL below. Wrapping the expression in a WITH ... AS (CTE) avoids this problem.
select substr(trans_date::varchar, 0 , 8) as month
from Transactions t
where substr(trans_date::varchar, 0 , 8) = '2025-03'
Advanced Select and Joins
[ Medium] 180. Consecutive Numbers
My intuitive first attempt:
select distinct(l1.num) as ConsecutiveNums from Logs l1
join Logs l2 on l1.id+1 = l2.id
join Logs l3 on l2.id+1 = l3.id
where l1.num = l2.num and l2.num = l3.num
Someone else's solution:
SELECT DISTINCT(num) AS ConsecutiveNums
FROM (
SELECT num,
LAG(num, 1) OVER (ORDER BY id) AS prev1,
-- LAG fetches num from the previous row, ordered by id ASC
LAG(num, 2) OVER (ORDER BY id) AS prev2
-- LAG with offset 2 fetches num from two rows back, ordered by id ASC
FROM Logs
) AS subquery
WHERE num = prev1 AND num = prev2
The counterpart of LAG is LEAD. Compared with the JOIN approach: if the ids are not consecutive the JOIN version clearly breaks, and the JOIN also tends to perform worse.
[ Medium] 1164. Product Price at a Given Date
From the problem statement, the table records price changes of certain products on given dates, and a product's price before any change is assumed to be 10. That made me suspect the Products table could be missing rows for some products, but the problem never says whether another table holds the full product list. Here is my answer:
-- Write your PostgreSQL query statement below
-- for each product, find the last change date on or before the target date
with lastChangeDate as (
select product_id, max(change_date) as change_date from Products
where change_date <= '2019-08-16'
group by product_id
),
-- collect all product ids. As noted above, if rows really were missing, my SQL would still work as long as a complete product list exists
allProductIds as (
select distinct(product_id) from Products
)
select a.product_id,
case when l.change_date is null then 10 else p.new_price end as price
from allProductIds a
left join lastChangeDate l on a.product_id = l.product_id
left join Products p on l.product_id = p.product_id and l.change_date = p.change_date;
Below is an excerpt from someone else's answer; the difference from mine is this single line, which finds the latest change date.
It seems this section (Advanced Select & Joins) expects window functions for this kind of problem.
-- I used GROUP BY + MAX to get each product's latest change date
-- they used PARTITION BY to rank the change dates per product, then kept the rows with ranking = 1
rank() over(partition by product_id order by change_date desc) as ranking
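For context, a sketch of what the full rank-based query could look like; this is my own reconstruction (not the original author's complete answer), assuming the Products(product_id, new_price, change_date) schema:
-- rank each product's changes up to the target date, newest first, and keep ranking = 1
select product_id, new_price as price
from (
    select product_id, new_price,
           rank() over (partition by product_id order by change_date desc) as ranking
    from Products
    where change_date <= '2019-08-16'
) t
where ranking = 1
union
-- products with no change on or before the target date default to price 10
select distinct product_id, 10 as price
from Products
where product_id not in (
    select product_id from Products where change_date <= '2019-08-16'
);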
[ Medium] 1204. Last Person to Fit in the Bus
Seeing that a running total was needed, I didn't immediately know how to write it. Then I realized I could JOIN each passenger to everyone who boarded before them, then GROUP BY turn and SUM the weights.
-- Write your PostgreSQL query statement below
-- join each turn to all earlier turns and sum the weights to get the cumulative weight at boarding time
with weights as (
select q.turn, sum(q2.weight) as totalWeight from Queue q
join Queue q2 on q.turn >= q2.turn
group by q.turn
order by q.turn
)
select q.person_name from weights w
join Queue q on q.turn = w.turn
where w.totalweight <= 1000
order by w.totalweight desc
limit 1;
Looking at someone else's answer: OVER (ORDER BY ...) gives the running-total effect directly.
select person_name from
-- the derived table needs an alias (t) in PostgreSQL
(select person_name, sum(weight) over(order by turn) as total_weight from queue) t
where total_weight<=1000
order by total_weight desc
limit 1;
Subqueries
[ Easy ] 1978. Employees Whose Manager Left the Company
Comparing JOIN, WHERE ... IN, and EXISTS (see the sketch after this list):
- JOIN: when indexes are available and you need columns from several tables.
- WHERE ... IN: when the subquery returns only a small amount of data.
- EXISTS: when you only need to check existence; usually more efficient than WHERE ... IN.
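A minimal sketch of the EXISTS flavour for this problem, assuming the Employees(employee_id, name, manager_id, salary) schema from the problem statement:
-- employees earning under 30000 whose manager no longer exists in Employees
select e.employee_id
from Employees e
where e.salary < 30000
  and e.manager_id is not null
  and not exists (
      select 1 from Employees m where m.employee_id = e.manager_id
  )
order by e.employee_id;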
[ Medium] 626. Exchange Seats
-- use the window functions LAG and LEAD to fetch the previous / next row's id; being window functions, they return NULL when no such row exists
with lagAndLead as (
select id,
LAG(id) OVER (ORDER BY id) AS previous_value,
LEAD(id) OVER (ORDER BY id) AS next_value,
student
from Seat
)
-- then use CASE WHEN to implement the swap logic
select
case
when id%2=1 and next_value is not null then next_value
when id%2=0 and previous_value is not null then previous_value
else id end as id,
student
from lagAndLead
order by id;
-- as an improvement, the two queries can be merged, fetching only the value that is actually needed instead of both neighbouring ids
SELECT id,
CASE
WHEN id % 2 != 0 THEN LEAD(student, 1, student) OVER (ORDER BY id)
ELSE LAG(student, 1) OVER (ORDER BY id)
END AS student
FROM Seat;
LEAD and LAG are used as: lead(column, offset: how many rows ahead/behind, default: the value returned when there is no such row).
Details: https://www.postgresql.org/docs/8.4/functions-window.html
[ Medium] 1321. Restaurant Growth
Assuming the dates are consecutive, the calculation can only start from the seventh day, so first grab the distinct dates starting from day seven.
Then use a range JOIN so that each date pulls in the rows of its seven-day window for aggregation.
-- Write your PostgreSQL query statement below
with dates as (
select distinct(c.visited_on) from Customer c
order by c.visited_on
offset 6
)
select c1.visited_on, sum(c2.amount) as amount,
round(sum(c2.amount)/7.0, 2) as average_amount from dates c1
join Customer c2 on (c2.visited_on between c1.visited_on-6 and c1.visited_on)
group by c1.visited_on;
Of course, this can also be done with window functions:
-- Write your PostgreSQL query statement below
SELECT visited_on
, SUM(SUM(amount))
OVER (ORDER BY visited_on
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)
AS amount
, round(AVG(SUM(amount))
OVER (ORDER BY visited_on
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)
,2)
AS average_amount
FROM customer c
GROUP BY visited_on
ORDER BY visited_on
OFFSET 6 ROWS;
[ Medium] 602. Friend Requests II: Who Has the Most Friends
Two things learned here (a sketch follows the list):
- The difference between UNION and UNION ALL: UNION removes duplicate rows, UNION ALL does not.
- SQL logical execution order: FROM > WHERE > GROUP BY > HAVING > SELECT > ORDER BY > LIMIT/OFFSET.
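A sketch of how UNION ALL applies to this problem, assuming the RequestAccepted(requester_id, accepter_id, accept_date) schema:
-- each accepted request gives both sides one friend; UNION ALL keeps every occurrence
select id, count(*) as num
from (
    select requester_id as id from RequestAccepted
    union all
    select accepter_id as id from RequestAccepted
) t
group by id
order by num desc
limit 1;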
[ Hard] 185. Department Top Three Salaries
The only Hard problem among the 50. The task is to return, for each department, the employees whose salary is among the top three; the tricky part is that salaries can tie, so you cannot simply take the three highest rows.
My approach: use a CTE with rank() OVER (PARTITION BY ...) to find each department's top three salary values, then match the corresponding people by department and salary.
-- Write your PostgreSQL query statement below
with top3SalaryByDepart as (
select departmentid, salary, rank from (
select *, rank() over (PARTITION BY t.departmentid order by t.salary desc) from
(select e.departmentId, e.salary from Employee e
join Department d on e.departmentId = d.id
group by e.departmentId, e.salary
order by e.departmentId, e.salary desc) t
) t2
where rank < 4
)
select d.name as "Department", e.name as "Employee", e.salary as "Salary"
from top3SalaryByDepart t
join Department d on t.departmentid = d.id
join Employee e on t.departmentid = e.departmentid and t.salary = e.salary;
One problem I hit while implementing this: with rank() ... ORDER BY column, tied values produce non-consecutive rank numbers, e.g. 1, 2, 2, 4.
Digging further, there are handier window functions. PostgreSQL supports four ranking functions:
- ROW_NUMBER: assigns each row a unique, consecutive number.
- RANK: assigns ranks according to the ordering; tied rows share a rank and the following ranks skip numbers, as described above.
- DENSE_RANK: the right fit here; like RANK but without gaps, so the same tie gives 1, 2, 2, 3.
- NTILE(n): splits the result set into n roughly equal groups.
With that, the complicated SQL above can be simplified:
SELECT d.name AS "Department", e.name AS "Employee", e.salary AS "Salary"
FROM (
SELECT *,
DENSE_RANK() OVER (PARTITION BY departmentId ORDER BY salary DESC) AS rnk
FROM Employee
) e
JOIN Department d ON e.departmentId = d.id
WHERE rnk <= 3;
Advanced String Functions / Regex / Clause
[ Easy ] 1667. Fix Names in a Table
Strings can be concatenated with CONCAT(string1, string2, ...) or with string1 || string2.
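A sketch of applying this to the problem, assuming the Users(user_id, name) schema; it uses || together with UPPER/LOWER/SUBSTRING:
-- capitalize the first letter and lowercase the rest, joined with ||
select user_id,
       upper(left(name, 1)) || lower(substring(name from 2)) as name
from Users
order by user_id;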
[ Easy ] 196. Delete Duplicate Emails
WHERE (NOT) EXISTS (SELECT ……)
DELETE FROM Person P2
USING Person P1
WHERE P1.email = P2.email
AND P1.id < P2.id;
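The WHERE (NOT) EXISTS note above could be applied to the same Person(id, email) table like this; a sketch of an equivalent EXISTS-based delete:
-- delete every row for which an older row with the same email exists
DELETE FROM Person p2
WHERE EXISTS (
    SELECT 1 FROM Person p1
    WHERE p1.email = p2.email
      AND p1.id < p2.id
);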
[ Easy ] 1484. Group Sold Products By The Date
count(DISTINCT a.column)
STRING_AGG(DISTINCT a.column, ',' ORDER BY a.column) -- with DISTINCT, the ORDER BY expression must match the aggregated expression, and STRING_AGG needs a delimiter argument
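A sketch of the full query for this problem, assuming the Activities(sell_date, product) schema:
-- one row per sell_date: distinct product count plus a sorted, comma-separated product list
select sell_date,
       count(distinct product) as num_sold,
       string_agg(distinct product, ',' order by product) as products
from Activities
group by sell_date
order by sell_date;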
[ Easy ] 1327. List the Products Ordered in a Period
'BETWEEN a AND b' is the same as 'column >= a AND column <= b'.
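For example, the date filter for this problem can be written either way (a sketch assuming the Orders(product_id, order_date, unit) schema):
-- BETWEEN form
select product_id, unit from Orders
where order_date between '2020-02-01' and '2020-02-29';
-- comparison form, equivalent to the above
select product_id, unit from Orders
where order_date >= '2020-02-01' and order_date <= '2020-02-29';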