Azure Machine Learning

2023-07-17

102
0
Machine Learning
2023-07-24

Azure Machine Learning workspace

前言

此篇為補充方式呈現

其實主要出入是這篇舊文章是 V1 API

DAY03 建立 Datastore 和 Dataset （上） - 10 上傳資料檔

https://github.com/dsindy/kaggle-titanic/blob/master/data/train.csv

DAY04 建立 Datastore 和 Dataset （下）

Data types vs. dataset types

Data asset types

The Azure Machine Learning CLI v2 and Python SDK v2 now include the following data types:

uri_file (display name: File)
uri_folder (display name: Folder)
mltable (display name: Table)

Learn more about data asset types 

Dataset types

Dataset types from Azure Machine Learning CLI v1 and Python SDK v1 can still be used, but will be mapped to the appropriate data type in the v2 systems:

file dataset type: maps to Folder type
tabular dataset type: maps to Table type

Learn more about dataset types 

Compute

NV6 (M60) 將於 2023/7/31 棄用

還有 serverless 可以選，但第一次按了要等三到五分鐘

看起來是閒置 20 min 釋放

Expandable diagram that shows scenarios for Apache Spark session inactivity period and cluster teardown.

Lib 是用 pyspark，懂 Spark 可以研究一下

在 Azure Machine Learning 中使用 Apache Spark 進行互動式資料整頓 - Azure Machine Learning | Microsoft Learn

NoteBook Sample

import pandas as pd

df = pd.read_csv("azureml://subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/resourcegroups/MLRG/workspaces/TestWorkSpace/datastores/testdata/paths/test.csv")
df.head()

azureml 可以從設定好的 DataSet 中取得

Designer

6. select columns: Survived, Pclass, Sex, Age, SibSp, Parch, Ticket, Fare, Cabin, Embarked

13. 在右邊 asset library 輸入 train model，拖曳到 Canvas 中，用 filter based feature selection 左邊的點點連接到它。 Train model 的右邊方框，在 Label Column 選擇 Survived。(原文為 Survivde)