A Guide To Azure ML Pipelines and EndPoints

Hashedin

Deepanshu Singh

30 Mar 2021

A machine learning (ML) pipeline is an Azure tool for combining the phases of a machine learning cycle. The cycle can cover training, data cleaning or pre-processing, and prediction. Let’s look at the main aspects of an ML cycle and how a pipeline makes these tasks easier.

 

What is the machine learning cycle?

 

An ML cycle is a combination of steps integrated to produce a final output, which can be a trained model or inferences from a trained model. A training cycle, for example, is made up of several such steps.

These steps can be dependent or independent, depending on whether they need input from other steps.
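The difference between dependent and independent steps can be sketched in plain Python. This is only an illustration and does not use the Azure SDK; the step names and the `execution_waves` helper are hypothetical. Steps that land in the same wave do not need each other's output and could run side by side.

```python
from graphlib import TopologicalSorter

# Hypothetical training-cycle steps. Each step maps to the set of
# steps whose output it needs before it can start.
dependencies = {
    "clean_data": set(),
    "load_config": set(),
    "train_model": {"clean_data", "load_config"},
    "evaluate_model": {"train_model"},
}


def execution_waves(deps):
    """Group steps into waves; steps in one wave are independent of each other."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every step whose inputs are available
        waves.append(ready)
        sorter.done(*ready)
    return waves


waves = execution_waves(dependencies)
for number, wave in enumerate(waves, start=1):
    print(f"wave {number}: {wave}")
```

Here `clean_data` and `load_config` are independent, while `train_model` depends on both and `evaluate_model` depends on `train_model`.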

Similarly, there is an inference cycle in which a trained model and real data are loaded to produce an inference.

The Machine Learning Cycle [1, 2]

 

ML cycle to ML pipeline

 

ML cycle scripts that run on a single workstation in a production environment may run out of resources when handling large amounts of data or parallel processing. An ML pipeline, on the other hand, replicates the structure of the ML cycle inside the Microsoft Azure Machine Learning Workspace. It can be connected to other Azure services, such as storage and compute resources, with the associated data security and reliability compliance. The cycle’s steps become steps in the pipeline, where each step is a process that takes some inputs and produces an output. Splitting the complete process into steps creates a modular system in which steps can be reused as required. Each step of the pipeline runs in a separate container and is therefore independent of system-specific configuration, which the Azure Machine Learning service abstracts away.

 

The intermediate data generated can be passed into and out of steps using PipelineData configurations, and it persists across all steps for the duration of the pipeline.
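As a rough mental model of how named intermediate data moves between steps, consider the sketch below. It is plain Python, not the Azure SDK: the `Step` class, the `run_pipeline` helper, and the example steps are all hypothetical, and the `store` dictionary merely stands in for the role that PipelineData plays in a real pipeline.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Step:
    name: str
    inputs: List[str]           # names of the data items this step reads
    output: str                 # name of the data item this step writes
    run: Callable[..., object]  # the step's logic


def run_pipeline(steps: List[Step], initial: Dict[str, object]) -> Dict[str, object]:
    """Run steps in order, keeping every intermediate output available by name."""
    store = dict(initial)
    for step in steps:
        args = [store[key] for key in step.inputs]
        store[step.output] = step.run(*args)
    return store


steps = [
    Step("clean", ["raw"], "cleaned", lambda rows: [r for r in rows if r is not None]),
    Step("scale", ["cleaned"], "scaled", lambda rows: [r / max(rows) for r in rows]),
    Step("summarise", ["cleaned", "scaled"], "summary",
         lambda cleaned, scaled: {"count": len(cleaned), "max_scaled": max(scaled)}),
]

store = run_pipeline(steps, {"raw": [2, None, 4, 8]})
print(store["summary"])
```

The last step reads both `cleaned` and `scaled`, which shows why intermediate outputs need to stay available after the step that produced them has finished.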

Azure Machine Learning Workspace [3]

The ML pipeline can be created with a simple drag-and-drop method in the Designer, or it can be custom-developed using the SDK. The Azure SDK provides functions that handle most of the how-to-do-it details, so users can focus on what to do.

 

Azure ML pipelines are designed to reuse output. The runtime environment decides which steps must run and which can reuse the results of a previous run. This capability speeds up execution and saves compute resources and, thus, overall cost.
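One way to picture output reuse is a cache keyed on a fingerprint of the step and its inputs. The sketch below is an assumption about the general idea, not a description of how Azure decides internally; `ReusingRunner`, the `version` argument, and the `total` step are hypothetical names.

```python
import hashlib
import json


class ReusingRunner:
    """Re-run a step only when its name, version, or inputs have changed."""

    def __init__(self):
        self._cache = {}
        self.executed = []  # names of the steps that actually ran

    def run(self, name, func, version, inputs):
        # Fingerprint the step definition together with its inputs.
        payload = json.dumps([name, version, inputs], sort_keys=True)
        key = hashlib.sha256(payload.encode("utf-8")).hexdigest()
        if key not in self._cache:
            self._cache[key] = func(**inputs)
            self.executed.append(name)
        return self._cache[key]


def total(values):
    return sum(values)


runner = ReusingRunner()
first = runner.run("total", total, "v1", {"values": [1, 2, 3]})
second = runner.run("total", total, "v1", {"values": [1, 2, 3]})    # reused
third = runner.run("total", total, "v1", {"values": [1, 2, 3, 4]})  # inputs changed
print(first, second, third, runner.executed)
```

The second call returns the cached result without executing the step again, which is where the time and compute savings come from.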

 

Handling large amounts of data using ML pipeline

 

ML pipelines can handle large amounts of data through parallel process steps. Any step of the pipeline can be defined as a single process step or a parallel process step. In a single process step, the complete data is passed to a single node. In a parallel process step, the data is broken into mini-batches that are distributed to the nodes of the compute cluster as needed.

This is an added advantage of the ML pipeline, as a high-end compute cluster can be spun up when a parallel process step is run and shut down automatically when the step is completed.
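The mini-batch idea described above can be sketched with the standard library. A thread pool stands in for the nodes of a compute cluster here, which is a simplification; `make_mini_batches`, `score_batch`, and `run_parallel_step` are hypothetical names, and the doubling logic is a placeholder for real per-batch work.

```python
from concurrent.futures import ThreadPoolExecutor


def make_mini_batches(items, batch_size):
    """Split the full data set into consecutive mini-batches."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def score_batch(batch):
    """Placeholder per-batch work; a real step might run model inference here."""
    return [value * 2 for value in batch]


def run_parallel_step(items, batch_size, node_count):
    """Process mini-batches concurrently and merge the results in input order."""
    batches = make_mini_batches(items, batch_size)
    # The pool is created for this step only and shut down when the step ends.
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        results = list(pool.map(score_batch, batches))
    return [value for batch in results for value in batch]


scored = run_parallel_step(list(range(10)), batch_size=4, node_count=3)
print(scored)
```

The `with` block loosely mirrors the spin-up and automatic shutdown of the cluster around a parallel process step.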

 

The structured approach of the parallel process step lets the ML pipeline abstract the underlying requirements of running a parallel process and handle data independently. It also takes failure scenarios into account.

 

Endpoints and pipeline deployment 

 

An endpoint is a programmatic service provided by the Azure Machine Learning Workspace. After an ML pipeline is created, it can be published with an endpoint. Publishing a pipeline means registering and deploying it and providing a trigger for it. Pipeline endpoints automate pipeline workflows by acting as that trigger.

 

The endpoint can also be used to pass parameters into the pipeline at run time; these parameters are defined during pipeline development.
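Triggering a published pipeline with run-time parameters might look roughly like the sketch below, which only builds the HTTP request and does not send it. The URL, token, experiment name, and parameter are placeholders. The JSON field names `ExperimentName` and `ParameterAssignments` are an assumption based on the REST body commonly shown for published pipelines and should be checked against the current Azure documentation.

```python
import json
import urllib.request


def build_trigger_request(endpoint_url, token, experiment_name, parameters):
    """Build a POST request that would trigger a pipeline run with parameters."""
    body = {
        "ExperimentName": experiment_name,   # assumed field name
        "ParameterAssignments": parameters,  # assumed field name
    }
    return urllib.request.Request(
        endpoint_url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_trigger_request(
    "https://example.invalid/pipelines/placeholder-id",  # placeholder URL
    token="placeholder-token",                           # placeholder credential
    experiment_name="demo-experiment",
    parameters={"learning_rate": "0.01"},
)
# urllib.request.urlopen(request) would send it; omitted so the sketch runs offline.
print(request.get_method(), request.full_url)
```

Keeping the parameters in the request body means the same published pipeline can be run with different settings without being republished.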

 

The Azure ML pipeline is an easy way to build machine learning projects and take them live. It follows professional standards and provides what is needed to scale the pipeline up and down when required. It therefore reduces cost and removes the problem of maintaining workstations.

References:

  1. https://docs.microsoft.com/en-us/azure/machine-learning/service/concept-ml-pipelines
  2. https://towardsdatascience.com/i-had-no-idea-how-to-build-a-machine-learning-pipeline-but-heres-what-i-figured-f3a7773513a
  3. https://docs.microsoft.com/en-us/azure/machine-learning/concept-ml-pipelines
