[SQL]列出每個年齡層員工人數-2

[SQL]列出每個年齡層員工人數-2

以10歲為間隔列出每個年齡層員工人數,要用員工的生日去算出年齡,還是用間隔算出間隔的日期後,再透過生日去對應到日期區間呢?

在前一篇「列出每個年齡層員工人數」中,Scar及Kenny大大提到可使用間隔去算出日期區間,再去Map到每個員工的生日,這樣就不需要針對每個員工的生日去算出年齡。

以下為了測試,所以先準備大量的資料(1千萬筆資料,會需要Run一段時間哦!),如下,


USE tempdb
go


CREATE TABLE [dbo].[EmployeesAge](
	[empid] [int] IDENTITY(1,1) NOT NULL,
	[lastname] [nvarchar](20) NOT NULL,
	[firstname] [nvarchar](10) NOT NULL,
	[title] [nvarchar](30) NOT NULL,
	[titleofcourtesy] [nvarchar](25) NOT NULL,
	[birthdate] [datetime] NOT NULL,
	[hiredate] [datetime] NOT NULL,
	[address] [nvarchar](60) NOT NULL,
	[city] [nvarchar](15) NOT NULL,
	[region] [nvarchar](15) NULL,
	[postalcode] [nvarchar](10) NULL,
	[country] [nvarchar](15) NOT NULL,
	[phone] [nvarchar](24) NOT NULL,
	[mgrid] [int] NULL
) ON [PRIMARY]

GO
 
--新增 10000001 筆資料
DECLARE @i INT, @age DATETIME
SET @i = 0;
WHILE @i <= 10000000
BEGIN
	SET @i = @i + 1;
	SET @age = DATEADD(dd, @i % 22222,  '1951/1/1');
	INSERT [dbo].[EmployeesAge] ( [lastname], [firstname], [title], [titleofcourtesy], [birthdate], [hiredate], [address], [city], [region], [postalcode], [country], [phone], [mgrid]) 
	VALUES ( N'Rainmaker', STR(@i), N'Developer', N'Mr.', @age
	, CAST(N'2004-11-15 00:00:00.000' AS DateTime), N'1234 Houndstooth Rd.'
	, N'London', NULL, N'10008'
	, N'UK', N'(71) 456-7890', 5)
end
 

SELECT COUNT(*) FROM EmployeesAge;
--10,000,001

-- birthdate 設定成cluster index
CREATE CLUSTERED INDEX IX_EmployeesAge_TestCol1 
    ON EmployeesAge (birthdate); 
GO

--用empid做pk NONCLUSTERED
ALTER TABLE EmployeesAge
ADD CONSTRAINT PK_EmployeesAge_empid PRIMARY KEY NONCLUSTERED (empid);

 

方法1:

1.1開始執行之前,可以先將Cache清掉,並設定將相關的統計資訊一併顯示出來,如下,


DBCC DROPCLEANBUFFERS
GO
SET STATISTICS IO ON
GO
SET STATISTICS TIME ON
GO

 

1.2算出每個員工年齡,然後再算出每個年齡層的員工人數


;WITH SEQ_RESULT 
AS(
	SELECT  99 AS SEQ_NO, 100 - 10 AS RANGE_START, 10 AS NUM_CNT
		UNION ALL
		SELECT Y.SEQ_NO - 10, RANGE_START - 10, Y.NUM_CNT -1
		FROM SEQ_RESULT Y
		WHERE Y.NUM_CNT > 1
)
,
empAge
AS
(
SELECT DATEDIFF(YY, birthdate, GETDATE())
        - CASE WHEN DATEADD(YY, DATEDIFF(YY, birthdate, GETDATE()), birthdate) > GETDATE()
               THEN 1
               ELSE 0
          END AS age
FROM EmployeesAge
)
SELECT rag.RANGE_START, rag.SEQ_NO, COUNT(age) AS levelCount
FROM empAge age RIGHT JOIN SEQ_RESULT rag
ON age.age between rag.RANGE_START  AND rag.SEQ_NO
GROUP BY rag.RANGE_START, rag.SEQ_NO
ORDER BY  rag.RANGE_START;

image

執行時間約30秒,統計資訊如下,

SQL Server parse and compile time:
   CPU time = 15 ms, elapsed time = 61 ms.

(10 row(s) affected)
Table 'Worktable'. Scan count 2, logical reads 61, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'EmployeesAge'. Scan count 2, logical reads 321890, physical reads 1, read-ahead reads 32190, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
   CPU time = 55131 ms,  elapsed time = 29993 ms.

 

方法2:算出年齡層的日期區間,再計算出每個年齡層的員工人數

2.1. 先算出年齡層的日期區間,顯出來看看區間是否OK


;WITH SEQ_RESULT 
AS(
	SELECT  99 AS SEQ_NO, 100 - 10 AS RANGE_START, 10 AS NUM_CNT
		UNION ALL
		SELECT Y.SEQ_NO - 10, RANGE_START - 10, Y.NUM_CNT -1
		FROM SEQ_RESULT Y
		WHERE Y.NUM_CNT > 1
)
SELECT rag.RANGE_START, rag.SEQ_NO, DATEADD(YY, 0- RANGE_START,GETDATE()) AS date_start, DATEADD(YY, 0- SEQ_NO -1 ,GETDATE()) AS date_end
FROM	 SEQ_RESULT rag
ORDER BY  rag.RANGE_START;

image

 

2.2.開始執行之前,可以先將Cache清掉,並設定將相關的統計資訊一併顯示出來,如下,


DBCC DROPCLEANBUFFERS;
GO
SET STATISTICS IO ON
GO
SET STATISTICS TIME ON
GO

2.3. 間隔算出間隔的日期後,再透過生日去> AND <= 在日期區間之中


;WITH SEQ_RESULT 
AS(
	SELECT  99 AS SEQ_NO, 100 - 10 AS RANGE_START, 10 AS NUM_CNT
		UNION ALL
		SELECT Y.SEQ_NO - 10, RANGE_START - 10, Y.NUM_CNT -1
		FROM SEQ_RESULT Y
		WHERE Y.NUM_CNT > 1
)
SELECT rag.RANGE_START, rag.SEQ_NO, COUNT(age.birthdate) AS levelCount
FROM EmployeesAge age RIGHT JOIN SEQ_RESULT rag
ON age.birthdate > DATEADD(YY, 0- SEQ_NO -1,GETDATE())  AND  age.birthdate <= DATEADD(YY, 0- RANGE_START ,GETDATE())
GROUP BY rag.RANGE_START, rag.SEQ_NO 
ORDER BY  rag.RANGE_START;

image

執行時間約30秒,統計資訊如下,

SQL Server parse and compile time:
   CPU time = 0 ms, elapsed time = 67 ms.

(10 row(s) affected)
Table 'Worktable'. Scan count 2, logical reads 61, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'EmployeesAge'. Scan count 10, logical reads 271173, physical reads 11, read-ahead reads 271075, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
   CPU time = 3401 ms,  elapsed time = 27129 ms.

 

2個方式比較之下,使用員工的生日去Map到年齡層的區間日期中速度快了一些,read也比較小。

如果是用Inner Join的方式,會差的更多哦!

接下來我們將2個SQL一起執行,來看看這2個SQL的執行計畫,如下,

開始執行之前,可以先將Cache清掉,並設定將相關的統計資訊一併顯示出來,如下,


DBCC DROPCLEANBUFFERS
GO
SET STATISTICS IO ON
GO
SET STATISTICS TIME ON
GO

開啟包含實際的執行計畫功能,

image

然後執行方法1、2的SQL,如下,


--方法1
;WITH SEQ_RESULT 
AS(
	SELECT  99 AS SEQ_NO, 100 - 10 AS RANGE_START, 10 AS NUM_CNT
		UNION ALL
		SELECT Y.SEQ_NO - 10, RANGE_START - 10, Y.NUM_CNT -1
		FROM SEQ_RESULT Y
		WHERE Y.NUM_CNT > 1
)
,
empAge
AS
(
SELECT DATEDIFF(YY, birthdate, GETDATE())
        - CASE WHEN DATEADD(YY, DATEDIFF(YY, birthdate, GETDATE()), birthdate) > GETDATE()
               THEN 1
               ELSE 0
          END AS age
FROM EmployeesAge
)
SELECT rag.RANGE_START, rag.SEQ_NO, COUNT(age) AS levelCount
FROM empAge age RIGHT JOIN SEQ_RESULT rag
ON age.age between rag.RANGE_START  AND rag.SEQ_NO
GROUP BY rag.RANGE_START, rag.SEQ_NO
ORDER BY  rag.RANGE_START;

--方法2 
;WITH SEQ_RESULT 
AS(
	SELECT  99 AS SEQ_NO, 100 - 10 AS RANGE_START, 10 AS NUM_CNT
		UNION ALL
		SELECT Y.SEQ_NO - 10, RANGE_START - 10, Y.NUM_CNT -1
		FROM SEQ_RESULT Y
		WHERE Y.NUM_CNT > 1
)
SELECT rag.RANGE_START, rag.SEQ_NO, COUNT(age.birthdate) AS levelCount
FROM EmployeesAge age RIGHT JOIN SEQ_RESULT rag
ON age.birthdate > DATEADD(YY, 0- SEQ_NO -1,GETDATE())  AND  age.birthdate <= DATEADD(YY, 0- RANGE_START ,GETDATE())
GROUP BY rag.RANGE_START, rag.SEQ_NO 
ORDER BY  rag.RANGE_START;

執行完成後,可按下執行計畫的Tab,查看2個方法的相對成本,如下圖,

image

 

另外,可以注意到一點的是,如果方法1使用Right Join跟使用Inner Join的速度(約要花1分多)會差蠻多的哦!

image

image

 

附件是方法1,使用Right Outer Join與Inner Join的比較的執行計畫,大家可參考看看哦!

感謝Scar大大及Kenny大大的分享。

Hi, 

亂馬客Blog已移到了 「亂馬客​ : Re:從零開始的軟體開發生活

請大家繼續支持 ^_^