Advent of 2022, Day 9 – Python SDK namespaces for environment, and pipelines

by tomaztsql · December 9, 2022

This article is originally published at https://tomaztsql.wordpress.com

In the series of Azure Machine Learning posts:

Dec 01: What is Azure Machine Learning?
Dec 02: Creating Azure Machine Learning Workspace
Dec 03: Understanding Azure Machine Learning Studio
Dec 04: Getting data to Azure Machine Learning workspace
Dec 05: Creating compute and cluster instances in Azure Machine Learning
Dec 06: Environments in Azure Machine Learning
Dec 07: Introduction to Azure CLI and Python SDK
Dec 08: Python SDK namespaces for workspace, experiments and models

Let’s continue to explore the power of the SDK and its namespaces.

Environment

The Python SDK namespace is azureml.core.environment. Environments specify the set of Python packages, environment variables, and software settings around your training and scoring scripts. In addition to Python, you can also configure PySpark, Docker and R for environments.

You can use the Environment class (or an environment object/asset you have already created) to make deployment and training code reusable across given Docker images, configurations and compute types.

The general script is:

from azureml.core.environment import Environment

env = Environment(name="MyDevEnvironment")

You can also build the environment from a Conda specification on top of a Docker base image, and scale faster, with:

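from azureml.core import Environment

# environment settings
environment_file = "4_environment_settings.yaml"
environment_name = "MyDevEnvironmentDocker"

# build the environment from the Conda specification file
env = Environment.from_conda_specification(environment_name, environment_file)

# run inside Docker; newer azureml-core releases deprecate this flag in favour of
# azureml.core.runconfig.DockerConfiguration(use_docker=True)
env.docker.enabled = True
env.docker.base_image = (
    "mcr.microsoft.com/azureml/curated/minimal-ubuntu18.04-py37-cuda11.0.3-gpu-inference:10"
)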

We also have a YAML file that stores the dependencies for the Conda specification. It holds the following instructions:

channels:
  - conda-forge
dependencies:
  - python=3.9
  - pip
  - pip:
    - azureml-defaults
    - torch==1.8.1
    - torchvision==0.9.1
    - pytorch-lightning==1.1.8
    - mlflow
    - azureml-mlflow
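If you prefer to keep the package list in Python and generate the YAML file from it, a minimal sketch could look like the following. The helper names (render_conda_spec, write_conda_spec, load_environment) are made up for this illustration and are not part of the SDK; only Environment.from_conda_specification is an SDK call, and it is imported lazily so the rendering part runs without azureml-core installed.

```python
from pathlib import Path


def render_conda_spec(python_version, pip_packages, channels=("conda-forge",)):
    """Return the text of a Conda specification like the one shown above."""
    lines = ["channels:"]
    lines += [f"  - {channel}" for channel in channels]
    lines += ["dependencies:", f"  - python={python_version}", "  - pip", "  - pip:"]
    lines += [f"    - {package}" for package in pip_packages]
    return "\n".join(lines) + "\n"


def write_conda_spec(path, text):
    """Write the rendered specification to disk and return the path."""
    Path(path).write_text(text, encoding="utf-8")
    return path


def load_environment(name, conda_file):
    """Create an Azure ML environment from the file (requires azureml-core)."""
    from azureml.core import Environment  # lazy import, see the note above

    return Environment.from_conda_specification(name, conda_file)


# Same packages and pins as in the YAML file from this post.
spec_text = render_conda_spec(
    "3.9",
    [
        "azureml-defaults",
        "torch==1.8.1",
        "torchvision==0.9.1",
        "pytorch-lightning==1.1.8",
        "mlflow",
        "azureml-mlflow",
    ],
)
```

Calling write_conda_spec("4_environment_settings.yaml", spec_text) and then load_environment("MyDevEnvironmentDocker", "4_environment_settings.yaml") would reproduce the environment from the earlier script, assuming the SDK is installed.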

Pipelines

The namespace for pipelines in the SDK is azureml.pipeline.core, with Pipeline as its main class (azureml.pipeline.core.pipeline.Pipeline). An Azure Machine Learning pipeline is an automated workflow of a complete machine learning task. Subtasks are encapsulated as a series of steps within the pipeline. A pipeline can be as simple as one step that calls a single Python script (a .py file), or it can chain steps for data preparation, logging and training configuration, efficient and repeatable training and validation, and deployment.
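To make the idea of "a series of steps" concrete, here is a small conceptual sketch in plain Python. It is not the Azure ML API: the Step class, run_pipeline function and the three toy steps are invented for illustration, and they only show how each step's output becomes the next step's input.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class Step:
    """One encapsulated subtask: a name plus a function from input to output."""

    name: str
    run: Callable[[Any], Any]


def run_pipeline(steps: List[Step], data: Any) -> Any:
    """Run the steps in order, feeding each step's output into the next one."""
    for step in steps:
        data = step.run(data)
    return data


# Three toy steps standing in for data preparation, training and validation.
prepare = Step("prepare", lambda rows: [float(value) for value in rows if value != ""])
train = Step("train", lambda values: {"mean": sum(values) / len(values), "n": len(values)})
validate = Step("validate", lambda model: {**model, "valid": model["n"] > 0})

result = run_pipeline([prepare, train, validate], ["5.1", "", "4.9", "6.0"])
```

In Azure Machine Learning the same ordering is expressed with step objects and the Pipeline class, and each step runs on a compute target instead of in the local process.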

The namespace for steps in the SDK is azureml.pipeline.steps. A step is a single encapsulated instruction, and one or more steps make up a pipeline. Here we will use the PythonScriptStep class from this namespace to run a single script.

PythonScriptStep takes a script name and other optional parameters, such as arguments for the script, compute target, inputs and outputs. The following code is a simple example of a PythonScriptStep.

# settings

blob_input_data = "iris.csv"
output_data1 = "output.csv"
compute_target = "AMLBlog2022-ds12-v2"
project_folder = "/Users/tomaz.kastrun/outputs"

from azureml.pipeline.steps import PythonScriptStep

# The file names are passed to train.py as plain command-line arguments.
# The inputs and outputs parameters expect pipeline data objects
# (for example PipelineData or a dataset), not plain strings, so they are left out here.
train_step = PythonScriptStep(
    script_name="train.py",
    arguments=["--input", blob_input_data, "--output", output_data1],
    compute_target=compute_target,
    source_directory=project_folder
)
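The step passes --input and --output to train.py, but the script itself is not shown in this post. As a hedged sketch, a train.py that accepts those two arguments could look like this. What it does (counting rows per label) is purely hypothetical, and it assumes the CSV file has a header row with the label in the last column.

```python
import argparse
import csv
import sys
from collections import Counter


def parse_args(argv=None):
    """Parse the two arguments that the pipeline step passes to the script."""
    parser = argparse.ArgumentParser(description="Hypothetical train.py")
    parser.add_argument("--input", required=True, help="path to the input CSV file")
    parser.add_argument("--output", required=True, help="path for the output CSV file")
    return parser.parse_args(argv)


def count_labels(input_path, output_path, label_column=-1):
    """Count rows per label (last column by default) and write the counts as CSV."""
    with open(input_path, newline="") as src:
        rows = list(csv.reader(src))
    header, data = rows[0], rows[1:]
    counts = Counter(row[label_column] for row in data if row)
    with open(output_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        writer.writerow([header[label_column], "count"])
        for label, count in sorted(counts.items()):
            writer.writerow([label, count])
    return dict(counts)


def main(argv=None):
    args = parse_args(argv)
    return count_labels(args.input, args.output)


# Run only when arguments are supplied, as the pipeline step does.
if __name__ == "__main__" and len(sys.argv) > 1:
    main()
```

Run locally, this would be python train.py --input iris.csv --output output.csv, which mirrors the arguments list given to the step.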

And once we have at least one step, we can create a pipeline.

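from azureml.core import Experiment, Workspace
from azureml.pipeline.core import Pipeline

# Pipeline expects a Workspace object, not the workspace name.
# from_config() assumes a config.json for the workspace (AML_BLogspost2022) is available.
ws = Workspace.from_config()

# 'Submit' is the experiment name under which the run is recorded
experiment = Experiment(ws, 'Submit')

pipeline = Pipeline(workspace=ws, steps=[train_step])
pipeline_run = experiment.submit(pipeline)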

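The pipeline needs to be submitted at the end:

from azureml.core import Experiment, Workspace

# Experiment expects a Workspace object; from_config() assumes a config.json
# for the workspace (AML_BLogspost2022) is available
ws = Workspace.from_config()

pipeline_run1 = Experiment(ws, 'Submit').submit(pipeline, regenerate_outputs=False)
print("Pipeline has been submitted")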

Once the pipeline is submitted, you can also find it under “Assets” in the navigation bar.

Each pipeline must have a compute target attached!
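Submitting runs the pipeline once, while publishing is a separate call that makes the pipeline available for later runs. A minimal sketch of both together is shown here. The function name submit_and_publish is made up for this example; it assumes experiment is an azureml.core.Experiment and pipeline is an azureml.pipeline.core.Pipeline, and uses only their submit, wait_for_completion and publish methods.

```python
def submit_and_publish(experiment, pipeline, published_name, description=""):
    """Submit a pipeline run, wait for it to finish, then publish the pipeline.

    Assumes `experiment` is an azureml.core.Experiment and `pipeline` an
    azureml.pipeline.core.Pipeline; the function name itself is hypothetical.
    """
    # Submit the pipeline; regenerate_outputs=False allows reuse of earlier step outputs.
    run = experiment.submit(pipeline, regenerate_outputs=False)

    # Block until the run has finished, streaming its log output.
    run.wait_for_completion(show_output=True)

    # Publish the pipeline so that it can be triggered again later.
    published = pipeline.publish(name=published_name, description=description)
    return run, published
```

With the objects from this post, a call could be submit_and_publish(Experiment(ws, 'Submit'), pipeline, "iris-training-pipeline"), where the published name is only a placeholder.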

Tomorrow, we will look into using Python SDK to connect from the client’s machine to Azure ML.

The complete set of code, documents, notebooks, and all of the materials will be available at the Github repository: https://github.com/tomaztk/Azure-Machine-Learning

Happy Advent of 2022!
