Common Statistical Charts and Probability Distributions in Practice, by Tony
Common statistical charts
Bar chart
Use R's built-in dataset mtcars. The first column holds the row names (car models).

| | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Mazda RX4 | 21.0 | 6 | 160.0 | 110 | 3.90 | 2.620 | 16.46 | 0 | 1 | 4 | 4 |
| Mazda RX4 Wag | 21.0 | 6 | 160.0 | 110 | 3.90 | 2.875 | 17.02 | 0 | 1 | 4 | 4 |
| Datsun 710 | 22.8 | 4 | 108.0 | 93 | 3.85 | 2.320 | 18.61 | 1 | 1 | 4 | 1 |
| Hornet 4 Drive | 21.4 | 6 | 258.0 | 110 | 3.08 | 3.215 | 19.44 | 1 | 0 | 3 | 1 |
| Hornet Sportabout | 18.7 | 8 | 360.0 | 175 | 3.15 | 3.440 | 17.02 | 0 | 0 | 3 | 2 |
| Valiant | 18.1 | 6 | 225.0 | 105 | 2.76 | 3.460 | 20.22 | 1 | 0 | 3 | 1 |
| Duster 360 | 14.3 | 8 | 360.0 | 245 | 3.21 | 3.570 | 15.84 | 0 | 0 | 3 | 4 |
| Merc 240D | 24.4 | 4 | 146.7 | 62 | 3.69 | 3.190 | 20.00 | 1 | 0 | 4 | 2 |
| Merc 230 | 22.8 | 4 | 140.8 | 95 | 3.92 | 3.150 | 22.90 | 1 | 0 | 4 | 2 |
| Merc 280 | 19.2 | 6 | 167.6 | 123 | 3.92 | 3.440 | 18.30 | 1 | 0 | 4 | 4 |
| Merc 280C | 17.8 | 6 | 167.6 | 123 | 3.92 | 3.440 | 18.90 | 1 | 0 | 4 | 4 |
| Merc 450SE | 16.4 | 8 | 275.8 | 180 | 3.07 | 4.070 | 17.40 | 0 | 0 | 3 | 3 |
| Merc 450SL | 17.3 | 8 | 275.8 | 180 | 3.07 | 3.730 | 17.60 | 0 | 0 | 3 | 3 |
| Merc 450SLC | 15.2 | 8 | 275.8 | 180 | 3.07 | 3.780 | 18.00 | 0 | 0 | 3 | 3 |
| Cadillac Fleetwood | 10.4 | 8 | 472.0 | 205 | 2.93 | 5.250 | 17.98 | 0 | 0 | 3 | 4 |
| Lincoln Continental | 10.4 | 8 | 460.0 | 215 | 3.00 | 5.424 | 17.82 | 0 | 0 | 3 | 4 |
| Chrysler Imperial | 14.7 | 8 | 440.0 | 230 | 3.23 | 5.345 | 17.42 | 0 | 0 | 3 | 4 |
| Fiat 128 | 32.4 | 4 | 78.7 | 66 | 4.08 | 2.200 | 19.47 | 1 | 1 | 4 | 1 |
| Honda Civic | 30.4 | 4 | 75.7 | 52 | 4.93 | 1.615 | 18.52 | 1 | 1 | 4 | 2 |
| Toyota Corolla | 33.9 | 4 | 71.1 | 65 | 4.22 | 1.835 | 19.90 | 1 | 1 | 4 | 1 |
| Toyota Corona | 21.5 | 4 | 120.1 | 97 | 3.70 | 2.465 | 20.01 | 1 | 0 | 3 | 1 |
| Dodge Challenger | 15.5 | 8 | 318.0 | 150 | 2.76 | 3.520 | 16.87 | 0 | 0 | 3 | 2 |
| AMC Javelin | 15.2 | 8 | 304.0 | 150 | 3.15 | 3.435 | 17.30 | 0 | 0 | 3 | 2 |
| Camaro Z28 | 13.3 | 8 | 350.0 | 245 | 3.73 | 3.840 | 15.41 | 0 | 0 | 3 | 4 |
| Pontiac Firebird | 19.2 | 8 | 400.0 | 175 | 3.08 | 3.845 | 17.05 | 0 | 0 | 3 | 2 |
| Fiat X1-9 | 27.3 | 4 | 79.0 | 66 | 4.08 | 1.935 | 18.90 | 1 | 1 | 4 | 1 |
| Porsche 914-2 | 26.0 | 4 | 120.3 | 91 | 4.43 | 2.140 | 16.70 | 0 | 1 | 5 | 2 |
| Lotus Europa | 30.4 | 4 | 95.1 | 113 | 3.77 | 1.513 | 16.90 | 1 | 1 | 5 | 2 |
| Ford Pantera L | 15.8 | 8 | 351.0 | 264 | 4.22 | 3.170 | 14.50 | 0 | 1 | 5 | 4 |
| Ferrari Dino | 19.7 | 6 | 145.0 | 175 | 3.62 | 2.770 | 15.50 | 0 | 1 | 5 | 6 |
| Maserati Bora | 15.0 | 8 | 301.0 | 335 | 3.54 | 3.570 | 14.60 | 0 | 1 | 5 | 8 |
| Volvo 142E | 21.4 | 4 | 121.0 | 109 | 4.11 | 2.780 | 18.60 | 1 | 1 | 4 | 2 |
Use table() to count how many cars have each value of the carb variable.
table(mtcars$carb)
##
## 1 2 3 4 6 8
## 7 10 3 10 1 1
A bar chart is mainly used to compare two or more categories, time points, or conditions.
barplot(table(mtcars$carb),
main="Bar chart",
xlab="Carb",
ylab="Frequency",
col="green")
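Raw counts can be hard to compare when datasets differ in size, so plotting proportions is sometimes preferable. The following is a minimal sketch using the same mtcars data; the variable names carb_counts and carb_props are illustrative only and do not come from the original material.

```r
# Count how many cars have each number of carburetors
carb_counts <- table(mtcars$carb)

# Convert the counts to proportions that sum to 1
carb_props <- prop.table(carb_counts)

# Draw the bar chart with proportions on the vertical axis
barplot(carb_props,
        main = "Bar chart (proportions)",
        xlab = "Carb",
        ylab = "Proportion",
        col = "green")
```

The shape of the chart is unchanged; only the vertical scale differs.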
Pie chart
A pie chart is mainly used to show how quantities, frequencies, or percentages relate to one another as parts of a whole.
pie(x = c(10,20,30),
labels = c("A","B","C") )
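Because a pie chart is about shares of a whole, it can help to print the percentage on each slice. The sketch below reuses the values from the pie() call above; the names categories, pct, and slice_labels are assumptions made for this example.

```r
# Example values and category names (same as the pie() call above)
x <- c(10, 20, 30)
categories <- c("A", "B", "C")

# Percentage share of each slice, rounded to one decimal place
pct <- round(100 * x / sum(x), 1)

# Build labels such as "A (16.7%)"
slice_labels <- paste0(categories, " (", pct, "%)")

pie(x = x, labels = slice_labels)
```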
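Histogram
Use the score data introduced in week 1, stored in the data frame midterm. The columns 班級A, 班級B, and 班級C are Class A, Class B, and Class C.

| 班級A | 班級B | 班級C |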
|---|---|---|
| 71 | 21 | 82 |
| 52 | 90 | 89 |
| 35 | 57 | 100 |
| 74 | 56 | 89 |
| 41 | 68 | 86 |
| 71 | 59 | 85 |
| 48 | 72 | 88 |
| 68 | 58 | 90 |
| 38 | 50 | 87 |
| 55 | 49 | 99 |
| 56 | 54 | 95 |
| 84 | 50 | 85 |
| 43 | 48 | 84 |
| 64 | 40 | 88 |
| 65 | 38 | 88 |
| 66 | 44 | 90 |
| 73 | 39 | 90 |
| 33 | 74 | 84 |
| 55 | 56 | 85 |
| 26 | 71 | 85 |
| 64 | 35 | 84 |
| 75 | 40 | 96 |
| 73 | 64 | 85 |
| 81 | 90 | 90 |
| 48 | 39 | 79 |
| 52 | 31 | 100 |
| 33 | 49 | 94 |
| 61 | 47 | 93 |
| 99 | 66 | 94 |
| 36 | 50 | 70 |
| 46 | 68 | 82 |
| 69 | 67 | 89 |
| 71 | 49 | 86 |
| 52 | 82 | 84 |
| 75 | 99 | 99 |
| 54 | 58 | 83 |
| 53 | 38 | 90 |
| 64 | 53 | 87 |
| 43 | 65 | 75 |
| 53 | 51 | 82 |
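A histogram shows how data are distributed. The horizontal axis splits the range of values into intervals (bins), and the vertical axis shows how many observations fall into each bin.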
hist(midterm$`班級A`, breaks = 10, xlim = c(0, 100),
col="green", xlab = "score", main = "班級 A")
hist(midterm$`班級B`, breaks = 10, xlim = c(0, 100),
col="green", xlab = "score", main = "班級 B")
hist(midterm$`班級C`, breaks = 10, xlim = c(0, 100),
col="green", xlab = "score", main = "班級 C")
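In hist(), a single number passed to breaks is only a suggestion, because R rounds the bin edges to "pretty" values. If exact bins are needed, a vector of edges can be supplied instead. The following sketch assumes the Class A scores from the table above, typed in as a plain vector named class_a (a name chosen for this example), and inspects the bin counts without drawing.

```r
# Class A scores, copied from the table above
class_a <- c(71, 52, 35, 74, 41, 71, 48, 68, 38, 55,
             56, 84, 43, 64, 65, 66, 73, 33, 55, 26,
             64, 75, 73, 81, 48, 52, 33, 61, 99, 36,
             46, 69, 71, 52, 75, 54, 53, 64, 43, 53)

# Explicit bin edges at 0, 10, ..., 100 instead of a suggested bin count
edges <- seq(0, 100, by = 10)

# plot = FALSE returns the bin information without drawing
h <- hist(class_a, breaks = edges, plot = FALSE)

# One row per bin: lower edge, upper edge, number of scores
print(data.frame(lower = head(edges, -1), upper = edges[-1], count = h$counts))
```

By default each bin includes its upper edge and excludes its lower edge, so a score of exactly 60 would be counted in the 50 to 60 bin.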
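Stem-and-leaf plot
A stem-and-leaf plot presents quantitative data in a compact text layout. Like a histogram, it helps visualize the shape of a distribution, but it also keeps the individual values readable.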
stem(midterm$`班級A`)
##
## The decimal point is 1 digit(s) to the right of the |
##
## 2 | 6
## 3 | 33568
## 4 | 133688
## 5 | 222334556
## 6 | 14445689
## 7 | 11133455
## 8 | 14
## 9 | 9
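To see where the rows of the stem() output come from, the stems and leaves can be computed by hand: for two-digit scores the stem is the tens digit and the leaf is the ones digit. This is only an illustrative sketch, not how stem() is implemented; class_a, stems, leaves, and stem_rows are names introduced for this example.

```r
# Class A scores, copied from the table above
class_a <- c(71, 52, 35, 74, 41, 71, 48, 68, 38, 55,
             56, 84, 43, 64, 65, 66, 73, 33, 55, 26,
             64, 75, 73, 81, 48, 52, 33, 61, 99, 36,
             46, 69, 71, 52, 75, 54, 53, 64, 43, 53)

stems  <- class_a %/% 10   # integer division gives the tens digit
leaves <- class_a %% 10    # remainder gives the ones digit

# For each stem, sort its leaves and paste them into one string
stem_rows <- tapply(leaves, stems, function(l) paste(sort(l), collapse = ""))
print(stem_rows)
```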
Scatter plot
Use R's built-in dataset cars, where speed is the car's speed and dist is the distance required to brake to a stop. The first 10 rows are shown below.

| speed | dist |
|---|---|
| 4 | 2 |
| 4 | 10 |
| 7 | 4 |
| 7 | 22 |
| 8 | 16 |
| 9 | 10 |
| 10 | 18 |
| 10 | 26 |
| 10 | 34 |
| 11 | 17 |
A scatter plot is commonly used to show the relationship between two quantitative variables.
plot(cars, col="red", pch = 20)
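One way to summarize the relationship visible in the scatter plot is to overlay a fitted straight line and compute the correlation coefficient. The sketch below goes slightly beyond the plot above and assumes a simple linear model is a reasonable first look; fit and r are names chosen for this example.

```r
# Scatter plot of stopping distance against speed
plot(cars, col = "red", pch = 20)

# Fit a simple linear regression: dist = a + b * speed
fit <- lm(dist ~ speed, data = cars)

# Overlay the fitted line on the existing plot
abline(fit, col = "blue")

# Pearson correlation between the two variables
r <- cor(cars$speed, cars$dist)
print(r)
```

A correlation close to 1 indicates that stopping distance tends to increase with speed.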
Introduction to probability distributions
Binomial distribution
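| Function | Command | Description |
|---|---|---|
| Probability mass function (PMF) | dbinom() | P(X = x) |
| Cumulative distribution function (CDF) | pbinom() | P(X <= x) |
| Random sampling function | rbinom() | Returns n samples from a binomial distribution |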
dbinom
dbinom(0, 2, 0.5)
## [1] 0.25
dbinom(1, 2, 0.5)
## [1] 0.5
For n = 4 trials with success probability 0.5, P(X = 2) = \(\frac{4!}{2!2!}\times(0.5)^2\times(0.5)^2 = 0.375\):
dbinom(2, 4, 0.5)
## [1] 0.375
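The hand calculation above is an instance of the general binomial formula P(X = k) = C(n, k) p^k (1 - p)^(n - k). A minimal sketch comparing that formula with dbinom() follows; manual, builtin, and total are names introduced for this example.

```r
n <- 4    # number of trials
p <- 0.5  # success probability per trial
k <- 2    # number of successes

# Binomial PMF written out: C(n, k) * p^k * (1 - p)^(n - k)
manual <- choose(n, k) * p^k * (1 - p)^(n - k)

# The same probability from the built-in function
builtin <- dbinom(k, n, p)
print(c(manual = manual, builtin = builtin))

# The PMF over all possible outcomes 0..n sums to 1
total <- sum(dbinom(0:n, n, p))
print(total)
```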
pbinom
pbinom(50, 100, 0.5) - pbinom(49, 100, 0.5)
## [1] 0.07958924
dbinom(50, 100, 0.5)
## [1] 0.07958924
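The comparison above works because the CDF is a running sum of the PMF, so the difference of two neighboring CDF values is a single PMF value. The sketch below checks that relationship directly and also shows rbinom(), which is listed in the table but not demonstrated. The seed value and the names cdf_by_sum, cdf_builtin, and draws are arbitrary choices for this example.

```r
# CDF as a running sum of the PMF: P(X <= 50) for n = 100, p = 0.5
cdf_by_sum  <- sum(dbinom(0:50, 100, 0.5))
cdf_builtin <- pbinom(50, 100, 0.5)
print(c(by_sum = cdf_by_sum, builtin = cdf_builtin))

# Draw 1000 binomial samples; the seed value 1 is an arbitrary choice
set.seed(1)
draws <- rbinom(1000, size = 100, prob = 0.5)

# The sample mean should be close to the theoretical mean n * p = 50
print(mean(draws))
```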
Normal distribution
μ and σ are the two parameters of the normal distribution: μ is the mean and σ is the standard deviation.
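| Function | Command | Description |
|---|---|---|
| Probability density function (PDF) | dnorm() | Density f(x) |
| Cumulative distribution function (CDF) | pnorm() | P(X <= x) |
| Random sampling function | rnorm() | Returns n samples from a normal distribution |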
plot
curve(dnorm(x, mean = 0, sd = 1), -8, 8, ylab = "", col = "red")
curve(dnorm(x, mean = 3, sd = 1), -8, 8, add = TRUE, col = "green")
curve(dnorm(x, mean = -1, sd = 2), -8, 8, add = TRUE, col = "blue")
pnorm
pnorm(Inf)-pnorm(0)
## [1] 0.5
pnorm(0)-pnorm(-Inf)
## [1] 0.5
pnorm(1)-pnorm(-1)
## [1] 0.6826895
pnorm(2)-pnorm(-2)
## [1] 0.9544997
pnorm(3)-pnorm(-3)
## [1] 0.9973002
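The calls above use the standard normal distribution (mean 0, standard deviation 1). For other values of μ and σ, pnorm() accepts mean and sd arguments, and the result matches what standardizing with z = (x - μ) / σ gives. The values of mu, sigma, and x below are hypothetical and chosen only for illustration.

```r
mu    <- 70   # hypothetical mean
sigma <- 10   # hypothetical standard deviation
x     <- 85   # value of interest

# Standardize x to a z-score
z <- (x - mu) / sigma

# P(X <= x) computed two equivalent ways
p_direct   <- pnorm(x, mean = mu, sd = sigma)
p_standard <- pnorm(z)
print(c(direct = p_direct, standardized = p_standard))

# Probability of falling within one standard deviation of the mean
within_one_sd <- pnorm(mu + sigma, mu, sigma) - pnorm(mu - sigma, mu, sigma)
print(within_one_sd)
```

The last value agrees with pnorm(1) - pnorm(-1), since the proportion within one standard deviation does not depend on μ or σ.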
rnorm
rnorm(10)
## [1] 0.7385227 -0.5147605 -1.6401813 0.9160368 -1.2674820 0.7382478
## [7] -0.7826228 0.5092959 -1.4899391 -0.3191793
hist(rnorm(10), xlim = c(-3, 3))
hist(rnorm(100), xlim = c(-3, 3))
hist(rnorm(1000), xlim = c(-3, 3))
hist(rnorm(10000), xlim = c(-3, 3))
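The histograms look more bell-shaped as the sample size grows, and the same trend can be checked numerically: the sample mean and standard deviation tend toward 0 and 1. Because rnorm() is random, results differ on every run unless a seed is set. The sketch below assumes an arbitrary seed of 123; sizes and summary_table are names made up for this example.

```r
set.seed(123)  # arbitrary seed, chosen only so the run is repeatable

sizes <- c(10, 100, 1000, 10000)

# For each sample size, draw standard normal samples and summarize them
summary_table <- t(sapply(sizes, function(n) {
  x <- rnorm(n)
  c(n = n, mean = mean(x), sd = sd(x))
}))

print(summary_table)
```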
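Student's t distribution
| Function | Command | Description |
|---|---|---|
| Probability density function (PDF) | dt() | Density f(x) |
| Cumulative distribution function (CDF) | pt() | P(X <= x) |
| Random sampling function | rt() | Returns n samples from a t distribution |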
curve(dnorm(x), -5, 5, col="black", ylab="f(x)", ylim=c(0,0.5))
curve(dt(x, df = 1), -5, 5, col="blue", add=TRUE)
curve(dt(x, df = 5), -5, 5, col="red", add=TRUE)
curve(dt(x, df = 30), -5, 5, col="green", add=TRUE)
legend("topright",
c("df=1","df=5","df=30","standard normal distribution"),
col=c("blue","red","green","black"),
lty=1)
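The curves suggest that the t distribution has heavier tails than the standard normal and approaches it as the degrees of freedom increase. One way to check this numerically is to compare the lower-tail probability P(X <= -2) for each curve. This is a small sketch; the cutoff -2 and the names dfs, t_tail, and normal_tail are arbitrary choices for this example.

```r
# Degrees of freedom used in the plot above
dfs <- c(1, 5, 30)

# Lower-tail probability P(T <= -2) for each df (pt is vectorized over df)
t_tail <- pt(-2, df = dfs)
names(t_tail) <- paste0("df=", dfs)

# The same tail probability under the standard normal distribution
normal_tail <- pnorm(-2)

print(round(c(t_tail, normal = normal_tail), 4))
```

The tail probabilities shrink toward the normal value as df grows.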
References
Common statistical charts
Probability distributions
Binomial distribution: https://ppt.cc/fUS72x
Normal distribution: https://ppt.cc/fSklmx
Student's t distribution: https://ppt.cc/fygiDx