Notes on new things I learned while practising the LeetCode SQL 50 challenge.
Select
[ Easy] 584. Find Customer Referee
In PostgreSQL, when you want rows where a column does not equal a specific non-NULL value, != and <> share one limitation: they never match NULL values.
With a condition like WHERE "column" != 'searchstring', rows where column is NULL are silently missed.
You need "column" <> 'searchstring' OR "column" IS NULL, or "column" IS DISTINCT FROM 'searchstring'.
Reference:
https://stackoverflow.com/questions/36508815/not-equal-and-null-in-postgres
Basic Joins
[ Easy] 197. Rising Temperature
In PostgreSQL, the date type supports arithmetic directly: date + 1 adds one day, and the result is still a date.
To add or subtract other units such as years, months, or hours, use INTERVAL, e.g. + INTERVAL '1 hour'; the result is then a timestamp.
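A sketch of how this date arithmetic is used in the problem, assuming the Weather(id, recordDate, temperature) schema:
-- join each day to the previous day by adding 1 (one day) to the earlier date
SELECT w1.id
FROM Weather w1
JOIN Weather w2 ON w1.recordDate = w2.recordDate + 1
WHERE w1.temperature > w2.temperature;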
[ Easy] 1280. Students and Examinations
select * from A, B;
returns the same result as
select * from A a cross join B b;
If you later need to compare columns coming specifically from A or from B, use the aliased CROSS JOIN form, because the bare comma form above gives you no way to say which table a column comes from.
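A sketch of how the cross join fits this problem, assuming the Students, Subjects, and Examinations tables from the problem statement:
-- cross join builds every (student, subject) pair; the left join then counts exam rows per pair
select s.student_id, s.student_name, sub.subject_name,
       count(e.student_id) as attended_exams
from Students s
cross join Subjects sub
left join Examinations e
  on e.student_id = s.student_id and e.subject_name = sub.subject_name
group by s.student_id, s.student_name, sub.subject_name
order by s.student_id, sub.subject_name;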
[ Medium] 1934. Confirmation Rate
Parent table: Signups; child table: Confirmations; relationship: one-to-many.
Since the relationship is one-to-many and the child table can be empty, a LEFT JOIN lets you count how many times Confirmations.action = 'confirmed' appears, which is the numerator.
The blind spot: if the child table is empty, using the number of child rows as the denominator is wrong, because count(c.*) = 0 and dividing by 0 raises an error.
Looking at it the other way, a rate of 0/1 is still 0, so the denominator should be the number of parent rows after the LEFT JOIN: for a user like user_id = 6 with no confirmations, there is still exactly one row, with the Confirmations columns all NULL.
So the denominator becomes the count of Signups rows, which gives the correct rate.
select s.user_id,
round(
sum(case when c.action='confirmed' then 1 else 0 end)/count(s.*)::decimal
, 2) as confirmation_rate
from Signups s
left join Confirmations c on s.user_id = c.user_id
group by s.user_id
Below is an answer shared by someone else; it looks like AVG already takes care of the cases I was worrying about.
SELECT
s.user_id,
ROUND(
AVG(CASE WHEN c.action = 'confirmed' THEN 1 ELSE 0 END)
, 2) AS confirmation_rate
FROM Signups s
LEFT JOIN Confirmations c
ON s.user_id = c.user_id
GROUP BY s.user_id
Basic Aggregate Functions
[ Easy] 1251. Average Selling Price
Remember that a JOIN has to handle an empty child table. The average here cannot be computed with a plain AVG either (it is a weighted average), so my first thought was a CASE WHEN to avoid a zero denominator.
select p.product_id,
case when sum(units) <> 0 then round(sum(p.price*u.units)/sum(u.units)::decimal,2)
else 0 end as average_price
from Prices p
left join UnitsSold u on p.product_id = u.product_id
and (u.purchase_date between p.start_date and p.end_date)
group by p.product_id
Besides CASE WHEN there is another construct, COALESCE: it takes any number of arguments, evaluates them in order, and returns the first one that is not NULL.
https://docs.postgresql.tw/the-sql-language/functions-and-operators/conditional-expressions
SELECT COALESCE(description, short_description, '(none)', ...)
Using COALESCE, the result looks like this:
-- when u.units has no rows -> sum = null -> division = null -> round = null -> COALESCE falls back to the second argument, the non-NULL 0
select p.product_id,
COALESCE(round(sum(p.price*u.units)/sum(u.units)::decimal,2), 0) as average_price
from Prices p
left join UnitsSold u on p.product_id = u.product_id
and (u.purchase_date between p.start_date and p.end_date)
group by p.product_id
The key point: an aggregate function returns NULL, not 0, when it only sees NULLs, so the sum(u.units) denominator never triggers a division-by-zero error here.
*Note: aggregate functions (SUM, AVG, MAX, ...) are in fact the theme of this group of problems; they are normally used together with GROUP BY.
[ Easy] 1211. Queries Quality and Percentage
This one taught me the WITH xxx AS (...) syntax (common table expressions); multiple CTEs are separated by commas.
with poor as (
select query_name, count(*) as count from Queries
where rating < 3
group by query_name
),
quality as (
select q.query_name, round(avg(rating/position::numeric) , 2) as quality, count(q.*) as count
from Queries q
group by q.query_name
)
select q.query_name, q.quality, coalesce(round(p.count*100/q.count::numeric, 2), 0) as poor_query_percentage
from quality q
left join poor p on q.query_name = p.query_name
But someone else has a much more compact solution:
- Doing the arithmetic with 100.0 avoids the explicit cast
- An aggregate function can contain a simple condition (CASE WHEN)
- Which means there is no need to build a separate temp table just to get a filtered count
SELECT
query_name,
ROUND(AVG(CAST(rating AS NUMERIC) / position), 2) AS quality,
ROUND(
100.0 * SUM(CASE WHEN rating < 3 THEN 1 ELSE 0 END) / COUNT(*),
2
) AS poor_query_percentage
FROM
Queries
GROUP BY
query_name;
[ Medium] 1193. Monthly Transactions I
with extract as (
select substr(trans_date::varchar, 0 , 8) as month,
country, state, amount from Transactions t
)
select month, country, count(*) as trans_count,
sum(case when state = 'approved' then 1 else 0 end) as approved_count,
sum(amount) as trans_total_amount,
sum(case when state = 'approved' then amount else 0 end) as approved_total_amount
from extract
group by month, country
If a query needs a transformed column in the WHERE condition and you also want to SELECT that transformed value, writing the expression in both places means it is evaluated twice, as in the SQL below. Wrapping the expression in a WITH ... AS (CTE) avoids this problem.
select substr(trans_date::varchar, 0 , 8) as month
from Transactions t
where substr(trans_date::varchar, 0 , 8) = '2025-03'
Advanced Select and Joins
[ Medium] 180. Consecutive Numbers
My intuitive first attempt:
select distinct(l1.num) as ConsecutiveNums from Logs l1
join Logs l2 on l1.id+1 = l2.id
join Logs l3 on l2.id+1 = l3.id
where l1.num = l2.num and l2.num = l3.num
Someone else's solution:
SELECT DISTINCT(num) AS ConsecutiveNums
FROM (
SELECT num,
LAG(num, 1) OVER (ORDER BY id) AS prev1,
-- LAG fetches num from the previous row, ordered by id ASC
LAG(num, 2) OVER (ORDER BY id) AS prev2
-- LAG with offset 2 fetches num from two rows back, ordered by id ASC
FROM Logs
) AS subquery
WHERE num = prev1 AND num = prev2
The counterpart of LAG is LEAD. Compared with the JOIN approach: if the ids are not consecutive the JOIN version clearly breaks, and the JOIN also tends to perform worse.
[ Medium] 1164. Product Price at a Given Date
From the problem statement, the table records price changes of certain products on given dates, and a product's price before any change is assumed to be 10. That made me suspect the Products table could be missing rows for some products, but the problem never says whether another table holds the full product list. Here is my answer:
-- Write your PostgreSQL query statement below
-- for each product, find the last change date on or before the target date
with lastChangeDate as (
select product_id, max(change_date) as change_date from Products
where change_date <= '2019-08-16'
group by product_id
),
-- collect all product ids. As noted above, if rows really were missing, my SQL would still work as long as a complete product list exists
allProductIds as (
select distinct(product_id) from Products
)
select a.product_id,
case when l.change_date is null then 10 else p.new_price end as price
from allProductIds a
left join lastChangeDate l on a.product_id = l.product_id
left join Products p on l.product_id = p.product_id and l.change_date = p.change_date;
Below is an excerpt from someone else's answer; the difference from mine is this single line, which finds the latest change date.
It seems this section (Advanced Select & Joins) expects window functions for this kind of problem.
-- I used GROUP BY + MAX to get each product's latest change date
-- they used PARTITION BY to rank the change dates per product, then kept the rows with ranking = 1
rank() over(partition by product_id order by change_date desc) as ranking
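For context, a sketch of what the full rank-based query could look like; this is my own reconstruction (not the original author's complete answer), assuming the Products(product_id, new_price, change_date) schema:
-- rank each product's changes up to the target date, newest first, and keep ranking = 1
select product_id, new_price as price
from (
    select product_id, new_price,
           rank() over (partition by product_id order by change_date desc) as ranking
    from Products
    where change_date <= '2019-08-16'
) t
where ranking = 1
union
-- products with no change on or before the target date default to price 10
select distinct product_id, 10 as price
from Products
where product_id not in (
    select product_id from Products where change_date <= '2019-08-16'
);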
[ Medium] 1204. Last Person to Fit in the Bus
Seeing that a running total was needed, I didn't immediately know how to write it. Then I realized I could JOIN each passenger to everyone who boarded before them, then GROUP BY turn and SUM the weights.
-- Write your PostgreSQL query statement below
-- join each turn to all earlier turns and sum the weights to get the cumulative weight at boarding time
with weights as (
select q.turn, sum(q2.weight) as totalWeight from Queue q
join Queue q2 on q.turn >= q2.turn
group by q.turn
order by q.turn
)
select q.person_name from weights w
join Queue q on q.turn = w.turn
where w.totalweight <= 1000
order by w.totalweight desc
limit 1;
Looking at someone else's answer: OVER (ORDER BY ...) gives the running-total effect directly.
select person_name from
-- the derived table needs an alias (t) in PostgreSQL
(select person_name, sum(weight) over(order by turn) as total_weight from queue) t
where total_weight<=1000
order by total_weight desc
limit 1;
Subqueries
[ Easy ] 1978. Employees Whose Manager Left the Company
Comparing JOIN, WHERE ... IN, and EXISTS (see the sketch after this list):
- JOIN: when indexes are available and you need columns from several tables.
- WHERE ... IN: when the subquery returns only a small amount of data.
- EXISTS: when you only need to check existence; usually more efficient than WHERE ... IN.
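A minimal sketch of the EXISTS flavour for this problem, assuming the Employees(employee_id, name, manager_id, salary) schema from the problem statement:
-- employees earning under 30000 whose manager no longer exists in Employees
select e.employee_id
from Employees e
where e.salary < 30000
  and e.manager_id is not null
  and not exists (
      select 1 from Employees m where m.employee_id = e.manager_id
  )
order by e.employee_id;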
[ Medium] 626. Exchange Seats
-- use the window functions LAG and LEAD to fetch the previous / next row's id; being window functions, they return NULL when no such row exists
with lagAndLead as (
select id,
LAG(id) OVER (ORDER BY id) AS previous_value,
LEAD(id) OVER (ORDER BY id) AS next_value,
student
from Seat
)
-- then use CASE WHEN to implement the swap logic
select
case
when id%2=1 and next_value is not null then next_value
when id%2=0 and previous_value is not null then previous_value
else id end as id,
student
from lagAndLead
order by id;
-- as an improvement, the two queries can be merged, fetching only the value that is actually needed instead of both neighbouring ids
SELECT id,
CASE
WHEN id % 2 != 0 THEN LEAD(student, 1, student) OVER (ORDER BY id)
ELSE LAG(student, 1) OVER (ORDER BY id)
END AS student
FROM Seat;
LEAD and LAG are used as: lead(column, offset: how many rows ahead/behind, default: the value returned when there is no such row).
Details: https://www.postgresql.org/docs/8.4/functions-window.html
[ Medium] 1321. Restaurant Growth
Assuming the dates are consecutive, the calculation can only start from the seventh day, so first grab the distinct dates starting from day seven.
Then use a range JOIN so that each date pulls in the rows of its seven-day window for aggregation.
-- Write your PostgreSQL query statement below
with dates as (
select distinct(c.visited_on) from Customer c
order by c.visited_on
offset 6
)
select c1.visited_on, sum(c2.amount) as amount,
round(sum(c2.amount)/7.0, 2) as average_amount from dates c1
join Customer c2 on (c2.visited_on between c1.visited_on-6 and c1.visited_on)
group by c1.visited_on;
Of course, this can also be done with window functions:
-- Write your PostgreSQL query statement below
SELECT visited_on
, SUM(SUM(amount))
OVER (ORDER BY visited_on
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)
AS amount
, round(AVG(SUM(amount))
OVER (ORDER BY visited_on
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)
,2)
AS average_amount
FROM customer c
GROUP BY visited_on
ORDER BY visited_on
OFFSET 6 ROWS;
[ Medium] 602. Friend Requests II: Who Has the Most Friends
Two things learned here (a sketch follows the list):
- The difference between UNION and UNION ALL: UNION removes duplicate rows, UNION ALL does not.
- SQL logical execution order: FROM > WHERE > GROUP BY > HAVING > SELECT > ORDER BY > LIMIT/OFFSET.
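A sketch of how UNION ALL applies to this problem, assuming the RequestAccepted(requester_id, accepter_id, accept_date) schema:
-- each accepted request gives both sides one friend; UNION ALL keeps every occurrence
select id, count(*) as num
from (
    select requester_id as id from RequestAccepted
    union all
    select accepter_id as id from RequestAccepted
) t
group by id
order by num desc
limit 1;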
[ Hard] 185. Department Top Three Salaries
The only Hard problem among the 50. The task is to return, for each department, the employees whose salary is among the top three; the tricky part is that salaries can tie, so you cannot simply take the three highest rows.
My approach: use a CTE with rank() OVER (PARTITION BY ...) to find each department's top three salary values, then match the corresponding people by department and salary.
-- Write your PostgreSQL query statement below
with top3SalaryByDepart as (
select departmentid, salary, rank from (
select *, rank() over (PARTITION BY t.departmentid order by t.salary desc) from
(select e.departmentId, e.salary from Employee e
join Department d on e.departmentId = d.id
group by e.departmentId, e.salary
order by e.departmentId, e.salary desc) t
) t2
where rank < 4
)
select d.name as "Department", e.name as "Employee", e.salary as "Salary"
from top3SalaryByDepart t
join Department d on t.departmentid = d.id
join Employee e on t.departmentid = e.departmentid and t.salary = e.salary;
One problem I hit while implementing this: with rank() ... ORDER BY column, tied values produce non-consecutive rank numbers, e.g. 1, 2, 2, 4.
Digging further, there are handier window functions. PostgreSQL supports four ranking functions:
- ROW_NUMBER: assigns each row a unique, consecutive number.
- RANK: assigns ranks according to the ordering; tied rows share a rank and the following ranks skip numbers, as described above.
- DENSE_RANK: the right fit here; like RANK but without gaps, so the same tie gives 1, 2, 2, 3.
- NTILE(n): splits the result set into n roughly equal groups.
With that, the complicated SQL above can be simplified:
SELECT d.name AS "Department", e.name AS "Employee", e.salary AS "Salary"
FROM (
SELECT *,
DENSE_RANK() OVER (PARTITION BY departmentId ORDER BY salary DESC) AS rnk
FROM Employee
) e
JOIN Department d ON e.departmentId = d.id
WHERE rnk <= 3;
Advanced String Functions / Regex / Clause
[ Easy ] 1667. Fix Names in a Table
Strings can be concatenated with CONCAT(string1, string2, ...) or with string1 || string2.
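A sketch of applying this to the problem, assuming the Users(user_id, name) schema; it uses || together with UPPER/LOWER/SUBSTRING:
-- capitalize the first letter and lowercase the rest, joined with ||
select user_id,
       upper(left(name, 1)) || lower(substring(name from 2)) as name
from Users
order by user_id;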
[ Easy ] 196. Delete Duplicate Emails
WHERE (NOT) EXISTS (SELECT ……)
DELETE FROM Person P2
USING Person P1
WHERE P1.email = P2.email
AND P1.id < P2.id;
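The WHERE (NOT) EXISTS note above could be applied to the same Person(id, email) table like this; a sketch of an equivalent EXISTS-based delete:
-- delete every row for which an older row with the same email exists
DELETE FROM Person p2
WHERE EXISTS (
    SELECT 1 FROM Person p1
    WHERE p1.email = p2.email
      AND p1.id < p2.id
);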
[ Easy ] 1484. Group Sold Products By The Date
count(DISTINCT a.column)
STRING_AGG(DISTINCT a.column, ',' ORDER BY a.column) -- with DISTINCT, the ORDER BY expression must match the aggregated expression, and STRING_AGG needs a delimiter argument
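A sketch of the full query for this problem, assuming the Activities(sell_date, product) schema:
-- one row per sell_date: distinct product count plus a sorted, comma-separated product list
select sell_date,
       count(distinct product) as num_sold,
       string_agg(distinct product, ',' order by product) as products
from Activities
group by sell_date
order by sell_date;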
[ Easy ] 1327. List the Products Ordered in a Period
'BETWEEN a AND b' is the same as 'column >= a AND column <= b'.
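For example, the date filter for this problem can be written either way (a sketch assuming the Orders(product_id, order_date, unit) schema):
-- BETWEEN form
select product_id, unit from Orders
where order_date between '2020-02-01' and '2020-02-29';
-- comparison form, equivalent to the above
select product_id, unit from Orders
where order_date >= '2020-02-01' and order_date <= '2020-02-29';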