Advent of 2022, Day 4 – Getting data to Azure Machine Learning workspace
This article is originally published at https://tomaztsql.wordpress.com
In the series of Azure Machine Learning posts:
- Dec 01: What is Azure Machine Learning?
- Dec 02: Creating Azure Machine Learning Workspace
- Dec 03: Understanding Azure Machine Learning Studio
Yesterday, we looked at the general layout of the Studio. In this blog post, we will focus primarily on getting data into the workspace and reading data from other data sources.
Data Assets
Click “Data” under Assets and then click the “+ CREATE” button.
A new dialog window for creating a data asset will appear. In this case, I am selecting the type “File (uri_file)”.
Confirming the type will get you to the next step, where you will define the data source.
From this point onward, you can get the data from a local file, from a URI, or from Azure storage.
Getting data from local file
Selecting “From Local files” will get you to the next step. Selecting the already created Azure Blob Storage will use the storage that was created when we created the workspace (on day 2).
In the next step, you will upload the file. The file is available in the GitHub repository: https://github.com/tomaztk/Azure-Machine-Learning (folder Data -> iris.csv).
You will then get the review page, where you confirm the settings and click “Create”. This will upload the file and create a data asset with workspaceblobstore as its data source.
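The same registration can also be scripted. Below is a minimal sketch, assuming the azure-ai-ml package (Python SDK v2) is installed and that you already have an authenticated MLClient for the workspace. The asset name iris_file and the description are hypothetical; only the path Data/iris.csv comes from the repository mentioned above.

```python
def uri_file_spec(local_path: str, name: str, description: str = "") -> dict:
    """Collect the choices made in the Studio dialog as keyword arguments."""
    return {
        "name": name,              # hypothetical asset name, pick your own
        "path": local_path,        # local file to register
        "type": "uri_file",        # same type as "File (uri_file)" in the dialog
        "description": description,
    }


def register_data_asset(ml_client, spec: dict):
    """Register the data asset in the workspace (needs the azure-ai-ml package)."""
    # Imported here so the rest of the sketch runs without the SDK installed.
    from azure.ai.ml.entities import Data

    return ml_client.data.create_or_update(Data(**spec))


# Assumption: the repository has been cloned and we run from its root folder.
spec = uri_file_spec("Data/iris.csv", name="iris_file", description="Iris CSV file")
print(spec["type"], spec["path"])
```

To actually register the file, you would pass a client created with something like MLClient(DefaultAzureCredential(), subscription_id, resource_group_name, workspace_name) to register_data_asset. To my understanding, a local path is uploaded to the default datastore during registration, which should match what the Studio does here.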
Getting data from Azure Blob Storage folder
Let’s check the Storage explorer. Navigate to the Storage account (that we created on day 2) and click Storage browser.
Now you can access the blob container and create a folder structure for your needs.
Within the “Azureml-blobstore-{guid}” container, I will create a new folder ML_iris and upload two files: iris.csv and a copy of it, iris-_duplicate.csv. This will let us read multiple CSV files in Azure Machine Learning.
Now we can go back to Studio and add data assets, but this time choose “From Azure Storage”.
Then select the Azure Blob Storage datastore with the name “workspaceblobstore”. Finally, select the complete folder (storage path). All data in this path will be used as the data asset.
Review the complete path and click “Create”.
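The storage path selected in the Studio can also be written as an azureml:// datastore URI, which is handy when you refer to the same folder from code. The sketch below only builds the strings; the helper names are my own and not part of any SDK.

```python
def datastore_uri(datastore: str, path: str = "") -> str:
    """Return the short-form azureml:// URI for a path inside a datastore."""
    return f"azureml://datastores/{datastore}/paths/{path.lstrip('/')}"


def full_datastore_uri(subscription: str, resource_group: str, workspace: str,
                       datastore: str, path: str = "") -> str:
    """Return the long form, which also pins the subscription and workspace."""
    return (
        f"azureml://subscriptions/{subscription}"
        f"/resourcegroups/{resource_group}"
        f"/workspaces/{workspace}"
        f"/datastores/{datastore}/paths/{path.lstrip('/')}"
    )


# The folder created above inside the default blob datastore.
folder_uri = datastore_uri("workspaceblobstore", "ML_iris/")
print(folder_uri)  # azureml://datastores/workspaceblobstore/paths/ML_iris/
```

A URI that points to a folder goes with the “Folder (uri_folder)” asset type, while a URI that ends in a file name, such as ML_iris/iris.csv, goes with “File (uri_file)”.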
Under Data assets, you will see that we have created two data assets from the blob store: one is just a single file and the other is a folder.
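Once the folder asset is available on a local path (for example, downloaded or mounted on a compute), all CSV files in it can be read in one go. This is a minimal sketch using only the Python standard library; the function name and the extra source_file column are my own additions, and the demo uses throwaway files rather than the real iris data.

```python
import csv
import tempfile
from pathlib import Path


def read_csv_folder(folder) -> list[dict]:
    """Read every *.csv file in a folder into one list of row dictionaries."""
    rows = []
    for csv_path in sorted(Path(folder).glob("*.csv")):
        with csv_path.open(newline="", encoding="utf-8") as handle:
            for row in csv.DictReader(handle):
                row["source_file"] = csv_path.name  # remember where the row came from
                rows.append(row)
    return rows


# Demo with two small throwaway files standing in for the files in ML_iris.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.csv").write_text("x,y\n1,2\n", encoding="utf-8")
    Path(tmp, "b.csv").write_text("x,y\n3,4\n5,6\n", encoding="utf-8")
    demo_rows = read_csv_folder(tmp)

print(len(demo_rows))  # 3
```

The files are expected to share the same header. If they do not, the rows will simply carry different keys, so it is worth checking the columns before any analysis.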
Getting data from Azure SQL database
Assuming you have created an Azure SQL Database, you can also add it as a data source for your machine learning work.
Navigate to Datastores and click “+ Create” to create a datastore. Under the Datastore type, select “Azure SQL Database”.
Once you fill in all the relevant information with the correct values, the datastore will be created.
Uploading data to files in Notebook
Another way – especially when exploring or doing some minor analysis – to get the data to Azure Machine Learning is to click “Notebook” from the navigation bar and simply upload the data.
After the file is uploaded, a preview of the data is created.
When you upload a file that is visible in the notebook, you are uploading it to the Azure file share. You can check the file share within the Storage account browser to see the files and folders that are directly visible in Notebooks. Under “File Shares”, you will find the “code-{guid}” folder with the “Users/tomaz.kastrun” folder structure, which corresponds to the Notebook directory structure.
Another way to get to these files is to go to Data assets and, under Datastores, open the “workspaceworkingdirectory” Azure file share (created automatically). Exploring and viewing the data will reveal the same folder structure and all the files that are used or created in the Azure file share with notebooks.
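A file uploaded this way can be opened from a notebook like any other file. Here is a minimal sketch of a quick preview, assuming the notebook's working directory is the folder the file was uploaded to, so that a relative path such as iris.csv resolves. The function name is hypothetical and the demo uses a throwaway file.

```python
import csv
import tempfile
from itertools import islice
from pathlib import Path


def preview_csv(path, n: int = 5) -> list[list[str]]:
    """Return the header plus the first n data rows of a CSV file."""
    with Path(path).open(newline="", encoding="utf-8") as handle:
        # islice stops early, so large files are not read completely.
        return list(islice(csv.reader(handle), n + 1))


# Demo with a throwaway file; in a notebook you would call preview_csv("iris.csv").
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp, "sample.csv")
    sample.write_text("a,b\n1,2\n3,4\n5,6\n", encoding="utf-8")
    preview = preview_csv(sample, n=2)

print(preview)  # [['a', 'b'], ['1', '2'], ['3', '4']]
```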
Now that we have imported, uploaded, or stored the data in our machine learning workspace, we need another ingredient: compute power. Tomorrow, we will look into provisioning and managing compute assets.
The complete set of code, documents, notebooks, and all of the materials will be available at the GitHub repository: https://github.com/tomaztk/Azure-Machine-Learning
Happy Advent of 2022!