Notes on Google Machine Learning
Just some quick notes.
AutoML training for classification/regression is priced at $21.252 USD per node hour.
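To get a feel for what that rate means before pressing the train button, here is a minimal sketch of the arithmetic. It assumes the charge is simply node hours multiplied by the quoted rate; `estimate_training_cost` is a name made up for this example, and billed node hours are not necessarily the same as wall-clock time.

```python
# Rate quoted in these notes for AutoML classification/regression training.
PRICE_PER_NODE_HOUR_USD = 21.252


def estimate_training_cost(node_hours: float,
                           price_per_node_hour: float = PRICE_PER_NODE_HOUR_USD) -> float:
    """Return an estimated training charge in USD, rounded to cents.

    Assumption: cost = node hours x rate, with no discounts or free tier.
    """
    if node_hours < 0:
        raise ValueError("node_hours must be non-negative")
    return round(node_hours * price_per_node_hour, 2)


if __name__ == "__main__":
    # Print the estimate for a few hypothetical training budgets.
    for budget in (1, 2, 8):
        print(f"{budget} node hours -> ${estimate_training_cost(budget):.2f}")
```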
Workflow
https://console.cloud.google.com/
Enable the API
Create a new dataset
Here you set the dataset name and type, and you can upload a local file straight to Storage from this page.
※ Alternatively, upload the data to Storage first, then come back here and point the dataset source at Storage.
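For the "upload to Storage first" route, something like the following sketch could do the upload from a script instead of the console. It assumes the `google-cloud-storage` package is installed and Application Default Credentials are configured; the bucket and object names are placeholders, and `gcs_uri` / `upload_csv` are helper names invented for this example.

```python
def gcs_uri(bucket_name: str, blob_name: str) -> str:
    """Build the gs:// URI to paste in as the dataset source."""
    return f"gs://{bucket_name}/{blob_name}"


def upload_csv(local_path: str, bucket_name: str, blob_name: str) -> str:
    """Upload a local CSV to Cloud Storage and return its gs:// URI."""
    # Imported lazily so gcs_uri() still works without the SDK installed.
    from google.cloud import storage  # pip install google-cloud-storage

    client = storage.Client()  # picks up Application Default Credentials
    blob = client.bucket(bucket_name).blob(blob_name)
    blob.upload_from_filename(local_path)
    return gcs_uri(bucket_name, blob_name)


# Example call (placeholder names, needs real credentials and an existing bucket):
# upload_csv("train.csv", "my-bucket", "titanic/train.csv")
```

The returned URI is what you would then select as the source when creating the dataset.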
Dataset
Upload data
https://github.com/dsindy/kaggle-titanic/blob/master/data/train.csv
※ This file has only a little over 800 rows. Azure accepted it, but GCP says it needs at least 1,000 rows to run.
Storage
Create a new Storage bucket
The main screen lists recent data
Inside the dataset you can train a new model
AutoML
Change the objective here to regression
Select columns
Submit
Result
Success screen
Pipelines
After the run finishes, go to Model Garden
Under My Models you can find the model that just finished training (it took a little over two hours)
Click into it to see its parameters
Deploy
Endpoint
Spec
Model monitoring
Monitoring objectives
You can also turn monitoring off and deploy directly
REST
Batch
Finally, the endpoint shows up under Online prediction.
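Once the endpoint exists, calling it over REST is roughly a single POST to the endpoint's `:predict` URL. The sketch below only builds the request so it can be inspected without sending anything. The project, region, endpoint ID, and feature names are placeholders, `build_predict_request` is a name made up here, and the exact shape of each instance depends on your model's columns.

```python
import json
import urllib.request


def build_predict_request(project: str, region: str, endpoint_id: str,
                          instances: list, access_token: str) -> urllib.request.Request:
    """Build (but do not send) an online prediction request for an endpoint."""
    url = (f"https://{region}-aiplatform.googleapis.com/v1/projects/{project}"
           f"/locations/{region}/endpoints/{endpoint_id}:predict")
    # The request body wraps the rows to score in an "instances" list.
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
    )


# Example (placeholders; the token could come from `gcloud auth print-access-token`):
# req = build_predict_request("my-project", "us-central1", "1234567890",
#                             [{"A": "1", "B": "2", "C": "3"}], token)
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```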
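Note
An online prediction endpoint starts incurring charges as soon as it is deployed (a machine has to stay up to serve the REST API at all times).
When you are done practicing, be sure to delete the endpoint (you must first go into the endpoint and undeploy the model before it can be deleted).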
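The same cleanup order (undeploy first, then delete) can be scripted so it is harder to forget. This is a sketch, assuming an endpoint object like the one the `google-cloud-aiplatform` SDK provides, which has `undeploy_all()` and `delete()` methods; `teardown_endpoint` is a helper name invented here, and the IDs in the comment are placeholders.

```python
def teardown_endpoint(endpoint) -> None:
    """Undeploy every model from the endpoint, then delete the endpoint itself."""
    # Order matters: the endpoint cannot be deleted while a model is still deployed.
    endpoint.undeploy_all()
    endpoint.delete()


# Example with the Vertex AI SDK (placeholders, needs real credentials):
# from google.cloud import aiplatform  # pip install google-cloud-aiplatform
# aiplatform.init(project="my-project", location="us-central1")
# teardown_endpoint(aiplatform.Endpoint("1234567890"))
```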
Error
Training fails with fewer than 1,000 rows
Invalid CSV file: 'utf-8' codec can't decode byte 0xad in position 129: invalid start byte
I hit this error while uploading data. My file turned out to be Big5. After converting it to UTF-8 the statistics analysis worked, but training still failed; in the end I switched to UTF-8 with BOM.
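The re-encoding can be done in a few lines instead of through an editor. A minimal sketch, assuming the source file really is Big5 (files saved on Windows may need `cp950` instead); the function name and file paths are placeholders for this example.

```python
def convert_big5_to_utf8_bom(src_path: str, dst_path: str,
                             src_encoding: str = "big5") -> None:
    """Re-encode a CSV from Big5 to UTF-8 with a byte order mark."""
    # "utf-8-sig" makes Python write the BOM bytes at the start of the file.
    # newline="" keeps the original line endings untouched.
    with open(src_path, "r", encoding=src_encoding, newline="") as src, \
         open(dst_path, "w", encoding="utf-8-sig", newline="") as dst:
        for line in src:
            dst.write(line)


# Example (placeholder paths):
# convert_big5_to_utf8_bom("train_big5.csv", "train_utf8bom.csv")
```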
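- The data must have at least 1,000 rows.
- The weight column cannot have more than 10,000 distinct values.
- Column names that cannot be parsed (every column name must start with an English letter; '-', '?', ':' and Chinese characters seem to cause problems).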
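The three conditions above can be checked locally before uploading, which is cheaper than waiting for a training job to fail. This is a sketch based only on the observations in these notes, not on an official rule list. The name pattern (a letter first, then letters, digits, or underscores) is a deliberately conservative assumption, and `check_csv` is a helper name made up here.

```python
import csv
import re

MIN_ROWS = 1000               # fewer rows than this failed in these notes
MAX_WEIGHT_DISTINCT = 10000   # limit observed for the weight column
# Conservative assumption: a letter first, then letters, digits, or underscores.
SAFE_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def check_csv(path, weight_column=None, encoding="utf-8-sig"):
    """Return a list of human-readable problems found in the CSV (empty if none)."""
    problems = []
    with open(path, newline="", encoding=encoding) as f:
        reader = csv.DictReader(f)
        header = list(reader.fieldnames or [])
        rows = list(reader)

    for name in header:
        if not SAFE_NAME.fullmatch(name):
            problems.append(f"column name may not parse: {name!r}")

    if len(rows) < MIN_ROWS:
        problems.append(f"only {len(rows)} rows, need at least {MIN_ROWS}")

    if weight_column is not None:
        distinct = {row.get(weight_column) for row in rows}
        if len(distinct) > MAX_WEIGHT_DISTINCT:
            problems.append(
                f"{weight_column!r} has {len(distinct)} distinct values, "
                f"limit is {MAX_WEIGHT_DISTINCT}")

    return problems
```

An empty list means none of the three known problems were detected; it does not guarantee that training will succeed.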
Required column(s) not included in the provided schema: ['A', 'B', 'C']
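In the end I recreated the Dataset and reran AutoML, and it succeeded.
My guess is that the columns are fixed when the Dataset is first created, so changing the CSV columns afterwards causes this error?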
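If that guess is right, comparing the current CSV header with the columns the dataset was created with should reveal the mismatch before a run is submitted. A small sketch under that assumption; `missing_columns` is a name made up here, and the required list would be whatever columns the original dataset had.

```python
import csv


def missing_columns(csv_path, required, encoding="utf-8-sig"):
    """Return the required column names that are absent from the CSV header."""
    with open(csv_path, newline="", encoding=encoding) as f:
        # Only the first row (the header) is needed.
        header = next(csv.reader(f), [])
    present = {name.strip() for name in header}
    return [name for name in required if name not in present]


# Example (placeholder path; column names taken from the error message above):
# missing_columns("train.csv", ["A", "B", "C"])
```

A non-empty result suggests the dataset should be recreated from the edited file rather than reused.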
Reference
介紹Vertex(1) | ML#Day18 - iT 邦幫忙::一起幫忙解決難題,拯救 IT 人的一天 (ithome.com.tw)