[SQL]列出每個年齡層員工人數-2
以10歲為間隔列出每個年齡層員工人數,要用員工的生日去算出年齡,還是用間隔算出間隔的日期後,再透過生日去對應到日期區間呢?
在前一篇「列出每個年齡層員工人數」中,Scar及Kenny大大提到可使用間隔去算出日期區間,再去Map到每個員工的生日,這樣就不需要針對每個員工的生日去算出年齡。
以下為了測試,所以先準備大量的資料(1千萬筆資料,會需要Run一段時間哦!),如下,
USE tempdb
go
CREATE TABLE [dbo].[EmployeesAge](
[empid] [int] IDENTITY(1,1) NOT NULL,
[lastname] [nvarchar](20) NOT NULL,
[firstname] [nvarchar](10) NOT NULL,
[title] [nvarchar](30) NOT NULL,
[titleofcourtesy] [nvarchar](25) NOT NULL,
[birthdate] [datetime] NOT NULL,
[hiredate] [datetime] NOT NULL,
[address] [nvarchar](60) NOT NULL,
[city] [nvarchar](15) NOT NULL,
[region] [nvarchar](15) NULL,
[postalcode] [nvarchar](10) NULL,
[country] [nvarchar](15) NOT NULL,
[phone] [nvarchar](24) NOT NULL,
[mgrid] [int] NULL
) ON [PRIMARY]
GO
--新增 10000001 筆資料
DECLARE @i INT, @age DATETIME
SET @i = 0;
WHILE @i <= 10000000
BEGIN
SET @i = @i + 1;
SET @age = DATEADD(dd, @i % 22222, '1951/1/1');
INSERT [dbo].[EmployeesAge] ( [lastname], [firstname], [title], [titleofcourtesy], [birthdate], [hiredate], [address], [city], [region], [postalcode], [country], [phone], [mgrid])
VALUES ( N'Rainmaker', STR(@i), N'Developer', N'Mr.', @age
, CAST(N'2004-11-15 00:00:00.000' AS DateTime), N'1234 Houndstooth Rd.'
, N'London', NULL, N'10008'
, N'UK', N'(71) 456-7890', 5)
end
SELECT COUNT(*) FROM EmployeesAge;
--10,000,001
-- birthdate 設定成cluster index
CREATE CLUSTERED INDEX IX_EmployeesAge_TestCol1
ON EmployeesAge (birthdate);
GO
--用empid做pk NONCLUSTERED
ALTER TABLE EmployeesAge
ADD CONSTRAINT PK_EmployeesAge_empid PRIMARY KEY NONCLUSTERED (empid);
方法1:
1.1開始執行之前,可以先將Cache清掉,並設定將相關的統計資訊一併顯示出來,如下,
DBCC DROPCLEANBUFFERS
GO
SET STATISTICS IO ON
GO
SET STATISTICS TIME ON
GO
1.2算出每個員工年齡,然後再算出每個年齡層的員工人數
;WITH SEQ_RESULT
AS(
SELECT 99 AS SEQ_NO, 100 - 10 AS RANGE_START, 10 AS NUM_CNT
UNION ALL
SELECT Y.SEQ_NO - 10, RANGE_START - 10, Y.NUM_CNT -1
FROM SEQ_RESULT Y
WHERE Y.NUM_CNT > 1
)
,
empAge
AS
(
SELECT DATEDIFF(YY, birthdate, GETDATE())
- CASE WHEN DATEADD(YY, DATEDIFF(YY, birthdate, GETDATE()), birthdate) > GETDATE()
THEN 1
ELSE 0
END AS age
FROM EmployeesAge
)
SELECT rag.RANGE_START, rag.SEQ_NO, COUNT(age) AS levelCount
FROM empAge age RIGHT JOIN SEQ_RESULT rag
ON age.age between rag.RANGE_START AND rag.SEQ_NO
GROUP BY rag.RANGE_START, rag.SEQ_NO
ORDER BY rag.RANGE_START;
執行時間約30秒,統計資訊如下,
SQL Server parse and compile time:
CPU time = 15 ms, elapsed time = 61 ms.
(10 row(s) affected)
Table 'Worktable'. Scan count 2, logical reads 61, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'EmployeesAge'. Scan count 2, logical reads 321890, physical reads 1, read-ahead reads 32190, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 55131 ms, elapsed time = 29993 ms.
方法2:算出年齡層的日期區間,再計算出每個年齡層的員工人數
2.1. 先算出年齡層的日期區間,顯出來看看區間是否OK
;WITH SEQ_RESULT
AS(
SELECT 99 AS SEQ_NO, 100 - 10 AS RANGE_START, 10 AS NUM_CNT
UNION ALL
SELECT Y.SEQ_NO - 10, RANGE_START - 10, Y.NUM_CNT -1
FROM SEQ_RESULT Y
WHERE Y.NUM_CNT > 1
)
SELECT rag.RANGE_START, rag.SEQ_NO, DATEADD(YY, 0- RANGE_START,GETDATE()) AS date_start, DATEADD(YY, 0- SEQ_NO -1 ,GETDATE()) AS date_end
FROM SEQ_RESULT rag
ORDER BY rag.RANGE_START;
2.2.開始執行之前,可以先將Cache清掉,並設定將相關的統計資訊一併顯示出來,如下,
DBCC DROPCLEANBUFFERS;
GO
SET STATISTICS IO ON
GO
SET STATISTICS TIME ON
GO
2.3. 間隔算出間隔的日期後,再透過生日去> AND <= 在日期區間之中
;WITH SEQ_RESULT
AS(
SELECT 99 AS SEQ_NO, 100 - 10 AS RANGE_START, 10 AS NUM_CNT
UNION ALL
SELECT Y.SEQ_NO - 10, RANGE_START - 10, Y.NUM_CNT -1
FROM SEQ_RESULT Y
WHERE Y.NUM_CNT > 1
)
SELECT rag.RANGE_START, rag.SEQ_NO, COUNT(age.birthdate) AS levelCount
FROM EmployeesAge age RIGHT JOIN SEQ_RESULT rag
ON age.birthdate > DATEADD(YY, 0- SEQ_NO -1,GETDATE()) AND age.birthdate <= DATEADD(YY, 0- RANGE_START ,GETDATE())
GROUP BY rag.RANGE_START, rag.SEQ_NO
ORDER BY rag.RANGE_START;
執行時間約30秒,統計資訊如下,
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 67 ms.
(10 row(s) affected)
Table 'Worktable'. Scan count 2, logical reads 61, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'EmployeesAge'. Scan count 10, logical reads 271173, physical reads 11, read-ahead reads 271075, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 3401 ms, elapsed time = 27129 ms.
2個方式比較之下,使用員工的生日去Map到年齡層的區間日期中速度快了一些,read也比較小。
如果是用Inner Join的方式,會差的更多哦!
接下來我們將2個SQL一起執行,來看看這2個SQL的執行計畫,如下,
開始執行之前,可以先將Cache清掉,並設定將相關的統計資訊一併顯示出來,如下,
DBCC DROPCLEANBUFFERS
GO
SET STATISTICS IO ON
GO
SET STATISTICS TIME ON
GO
開啟包含實際的執行計畫功能,
然後執行方法1、2的SQL,如下,
--方法1
;WITH SEQ_RESULT
AS(
SELECT 99 AS SEQ_NO, 100 - 10 AS RANGE_START, 10 AS NUM_CNT
UNION ALL
SELECT Y.SEQ_NO - 10, RANGE_START - 10, Y.NUM_CNT -1
FROM SEQ_RESULT Y
WHERE Y.NUM_CNT > 1
)
,
empAge
AS
(
SELECT DATEDIFF(YY, birthdate, GETDATE())
- CASE WHEN DATEADD(YY, DATEDIFF(YY, birthdate, GETDATE()), birthdate) > GETDATE()
THEN 1
ELSE 0
END AS age
FROM EmployeesAge
)
SELECT rag.RANGE_START, rag.SEQ_NO, COUNT(age) AS levelCount
FROM empAge age RIGHT JOIN SEQ_RESULT rag
ON age.age between rag.RANGE_START AND rag.SEQ_NO
GROUP BY rag.RANGE_START, rag.SEQ_NO
ORDER BY rag.RANGE_START;
--方法2
;WITH SEQ_RESULT
AS(
SELECT 99 AS SEQ_NO, 100 - 10 AS RANGE_START, 10 AS NUM_CNT
UNION ALL
SELECT Y.SEQ_NO - 10, RANGE_START - 10, Y.NUM_CNT -1
FROM SEQ_RESULT Y
WHERE Y.NUM_CNT > 1
)
SELECT rag.RANGE_START, rag.SEQ_NO, COUNT(age.birthdate) AS levelCount
FROM EmployeesAge age RIGHT JOIN SEQ_RESULT rag
ON age.birthdate > DATEADD(YY, 0- SEQ_NO -1,GETDATE()) AND age.birthdate <= DATEADD(YY, 0- RANGE_START ,GETDATE())
GROUP BY rag.RANGE_START, rag.SEQ_NO
ORDER BY rag.RANGE_START;
執行完成後,可按下執行計畫的Tab,查看2個方法的相對成本,如下圖,
另外,可以注意到一點的是,如果方法1使用Right Join跟使用Inner Join的速度(約要花1分多)會差蠻多的哦!
附件是方法1,使用Right Outer Join與Inner Join的比較的執行計畫,大家可參考看看哦!
感謝Scar大大及Kenny大大的分享。
Hi,
亂馬客Blog已移到了 「亂馬客 : Re:從零開始的軟體開發生活」
請大家繼續支持 ^_^