### New algorithms in MicrosoftML:

scikit-learn's algorithm selection flowchart

## Test steps

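1. First, load two libraries: MicrosoftML and ggplot2.

2. Training data: draw 500 random values with `rnorm`, min-max normalize them into the 0-1 range, then multiply by 100. The values end up between 0 and 100 while keeping the shape of the normal distribution they were drawn from.

3. Test data: draw 10 random values, normalize them into the 0-1 range the same way and multiply by 100, then deliberately overwrite the 1st and 10th records so that they sit far away from everything else.

4. Train on the normal data and predict on the test data.

5. Pick out the outliers.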
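Steps 2 and 3 apply the same min-max scaling four times, so the data preparation can be sketched with one small helper. This is only a minimal base-R sketch: the helper name `scale_0_100` and the fixed seed are assumptions added for illustration and are not part of the original script.

```r
# Hypothetical helper (not in the original script):
# min-max scale a numeric vector to the range 0-100, rounded to 2 digits.
scale_0_100 <- function(x, digits = 2) {
  rng <- range(x)
  if (rng[1] == rng[2]) stop("x must contain at least two distinct values")
  round(100 * (x - rng[1]) / (rng[2] - rng[1]), digits = digits)
}

set.seed(42)  # assumption: fixed seed for reproducibility; the post does not set one

# Training data: 500 normally distributed values per feature
train_count <- 500
traindata <- data.frame(
  CardHolderFeatures  = scale_0_100(rnorm(train_count)),
  TransactionFeatures = scale_0_100(rnorm(train_count))
)

# Test data: 10 values per feature, scaled the same way
test_count <- 10
testdata <- data.frame(
  CardHolderFeatures  = scale_0_100(rnorm(test_count)),
  TransactionFeatures = scale_0_100(rnorm(test_count))
)

# Overwrite rows 1 and 10 so they land in opposite corners of the 0-100 square
testdata$CardHolderFeatures[c(1, 10)]  <- c(100, 0)
testdata$TransactionFeatures[c(1, 10)] <- c(0, 100)
testdata$seq <- seq_len(test_count)
```

Because each vector is scaled by its own minimum and maximum, every feature always contains exactly one 0 and one 100, in the training data as well as the test data.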

###########################
#library
###########################
library(ggplot2)
library(MicrosoftML)

###########################
# train data with normal data
###########################
train_count <- 500
ndivall <- rnorm(train_count)
ndivnorm <- (ndivall - min(ndivall))/(max(ndivall) - min(ndivall))
traindata <- data.frame(CardHolderFeatures = round(100 * ndivnorm, digits = 2))

ndivall <- rnorm(train_count)
ndivnorm <- (ndivall - min(ndivall))/(max(ndivall) - min(ndivall))
traindata$TransactionFeatures <- round(100 * ndivnorm, digits = 2)

###########################
# test data with some anomaly data
###########################
test_count <- 10
ndivall <- rnorm(test_count)
ndivnorm <- (ndivall - min(ndivall))/(max(ndivall) - min(ndivall))
testdata <- data.frame(CardHolderFeatures = round(100 * ndivnorm, digits = 2))

ndivall <- rnorm(test_count)
ndivnorm <- (ndivall - min(ndivall))/(max(ndivall) - min(ndivall))
testdata$TransactionFeatures <- round(100 * ndivnorm, digits = 2)

# overwrite rows 1 and 10 so that they become outliers
testdata$CardHolderFeatures[c(1,10)] <- c(100, 0)
testdata$TransactionFeatures[c(1,10)] <- c(0, 100)
testdata$seq <- seq_len(test_count)

###########################
# train by ONE CLASS SVM with normal data
###########################
model <- rxOneClassSvm(
  formula = ~CardHolderFeatures + TransactionFeatures,
  data = traindata)

# predict
result <- rxPredict(
  model,
  data = testdata,
  extraVarsToWrite = c("CardHolderFeatures", "TransactionFeatures", "seq"))

result

###########################
# Outliers
###########################
anormal <- subset(result, Score >= 5)
anormal

ggplot(traindata, aes(x = CardHolderFeatures, y = TransactionFeatures)) +
  geom_point(colour = "blue", size = 1) +
  stat_density2d() +
  geom_point(colour = "red", aes(x = CardHolderFeatures, y = TransactionFeatures), data = anormal, alpha = .7) +
  geom_label(aes(x = CardHolderFeatures - 5, y = TransactionFeatures, label = seq), data = anormal)


## Summary

• Binding R Server 9.0.1 to SQL Server 2016 SP1 kept failing, so I ended up installing SQL Server 2017 instead.
• Next time I want to try PCA-Based Anomaly Detection.

## References

What is the MicrosoftML package?

Anomaly Detection (One Class SVM) in R with MicrosoftML

Running MicrosoftML in SQL Server 2016

Cheat Sheet: How to choose a MicrosoftML algorithm

Choosing the right estimator