Notes about Azure ML, Part 5 - Azureml AutoML

January 6, 2022
machine-learning azure ml automl

Automated machine learning (AutoML) automates the creation of machine learning models. Typically, the process of creating a model can be long and tedious. AutoML makes it possible for people who do not have coding experience to develop and use ML models.

In a typical machine learning application, we start with raw training data. The data might have missing fields, contain outliers, and require cleaning. The following steps might be required:

Each of these steps can be challenging and tedious, which creates significant hurdles to using machine learning.

AutoML makes training, running, and deploying models a no-code experience. It searches through combinations of algorithms and hyperparameters until it finds the best model for the data, according to a metric defined by the user.
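The search described above can be pictured as a loop over candidate algorithms and hyperparameter values, each scored on held-out data with a user-chosen metric. The following is a minimal pure-Python sketch of that idea, not Azure's actual search strategy. The toy data, the three candidate model families, and the names `fit_mean`, `fit_line`, `fit_knn`, and `search` are all assumptions made for illustration.

```python
def mean_absolute_error(y_true, y_pred):
    """Example of a user-chosen metric; lower is better."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def fit_mean(xs, ys):
    """Baseline: always predict the training mean."""
    mean = sum(ys) / len(ys)
    return lambda x: mean

def fit_line(xs, ys):
    """Ordinary least squares on a single feature."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return lambda x: my + slope * (x - mx)

def fit_knn(k):
    """Return a trainer for k-nearest-neighbours; k is the hyperparameter."""
    def train(xs, ys):
        pairs = list(zip(xs, ys))
        def predict(x):
            nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
            return sum(y for _, y in nearest) / k
        return predict
    return train

def search(candidates, train, valid, metric):
    """Fit every candidate and return (name, score) pairs, best score first."""
    (x_tr, y_tr), (x_va, y_va) = train, valid
    results = []
    for name, trainer in candidates.items():
        model = trainer(x_tr, y_tr)
        results.append((name, metric(y_va, [model(x) for x in x_va])))
    return sorted(results, key=lambda r: r[1])

# Hypothetical noise-free toy data following y = 2x + 1.
x_train, y_train = [0, 1, 2, 3, 4, 5], [1, 3, 5, 7, 9, 11]
x_valid, y_valid = [6, 7], [13, 15]

candidates = {
    "mean": fit_mean,
    "linear": fit_line,
    "knn_k1": fit_knn(1),
    "knn_k3": fit_knn(3),
}
leaderboard = search(candidates, (x_train, y_train), (x_valid, y_valid),
                     mean_absolute_error)
best_name, best_score = leaderboard[0]
```

On this toy data the linear model comes out on top, since the data is exactly linear. A real AutoML run explores far more candidates and also applies time limits and early stopping.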

In this post, we will go through creating a model using AutoML.

The data

For this example, we will use the UCI Concrete dataset. This dataset contains the compressive strength of concrete, with the mixture components and curing time as features. The dataset is available from the UCI Machine Learning Repository.

A dataset pointing to this data was created in a previous post and is reused in this example.

Starting AutoML

AutoML is started by selecting the AutoML tab in the Azure ML workspace and then creating a new AutoML job. The job consists of three steps:

AutoML Job Start

Configuring the run

The first step is to select the dataset. Since we have already created the dataset, we only need to choose it from the list. It is also possible to create a new dataset at this stage.

The next step is to configure the run. There are several options available:

We then select the ML task for this experiment. AutoML tries to infer from the dataset whether the task is classification, regression, or time series forecasting, but the task can also be selected manually. In our case, regression is the most suitable task, as the label is numeric.
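How AutoML infers the task is not described here, so the snippet below is only a guess at the kind of rule involved: a numeric label with fractional or many distinct values suggests regression, anything else suggests classification. The function `guess_task`, its `max_classes` threshold, and the sample label values are hypothetical, and time series detection is left out entirely.

```python
def guess_task(labels, max_classes=10):
    """Guess 'classification' or 'regression' from a label column.

    Hypothetical heuristic for illustration; not Azure ML's actual logic.
    """
    values = [v for v in labels if v is not None]
    # Strings, booleans and other non-numeric labels can only be class names.
    if any(isinstance(v, bool) or not isinstance(v, (int, float)) for v in values):
        return "classification"
    # Fractional values or many distinct values look like a continuous target.
    has_fraction = any(v != int(v) for v in values)
    if has_fraction or len(set(values)) > max_classes:
        return "regression"
    return "classification"

# Illustrative label columns (values invented for the example).
strength_task = guess_task([79.99, 61.89, 40.27, 41.05])
grade_task = guess_task(["low", "high", "low"])
```

With a numeric, fractional label such as compressive strength, this rule lands on regression, which matches the choice made in the post.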

AutoML Job Config

Additional Run Configuration

It is possible to further configure the run by

In the Additional Settings section, we can specify

AutoML Additional Settings

Featurization settings allow us to confirm the data types of the features. We can also select an imputation method for the missing data in each feature. In this case, the data does not have any missing values.
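To make the idea of imputation concrete, here is a small sketch of one common method, mean imputation, where each missing entry is replaced by the mean of the observed values in that feature. It is written in plain Python for illustration; the function name `impute_mean` and the sample column are assumptions, and this is not the code AutoML runs internally.

```python
def impute_mean(column):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]

# Hypothetical feature column with one missing entry.
water = [162.0, None, 228.0, 192.0]
filled = impute_mean(water)
```

Other strategies, such as using the median or a constant, follow the same pattern with a different fill value.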

Validation Type and Starting the Run

The last step is to select the validation type. We select a train-validation split, which divides the data into a training set and a held-out set used for scoring the models. We hold out 20% of the data. We also notice that it is possible to provide an external test set.
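The 20% hold-out described above amounts to shuffling the rows and setting one fifth of them aside. A minimal sketch, assuming a simple random split (the function name `train_validation_split`, the seed, and the use of row indices in place of real rows are illustrative choices; the UCI Concrete dataset has 1030 rows):

```python
import random

def train_validation_split(rows, validation_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and hold out a fraction of them."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_valid = round(len(shuffled) * validation_fraction)
    return shuffled[n_valid:], shuffled[:n_valid]

# Row indices stand in for the 1030 rows of the dataset.
rows = list(range(1030))
train_rows, valid_rows = train_validation_split(rows)
```

This leaves 824 rows for training and 206 rows for scoring the candidate models.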

The AutoML run is then initialized and started. The run page shows the settings that we specified, such as the dataset and the compute, together with the status of the run.

Results

Upon completion of the run, we can see the results. We notice the following:

The run took about 40 minutes, which is in line with the preset time of 30 minutes per algorithm. We also notice that one of the algorithms was stopped early as it converged.

The best model section gives us the following information about the selected model:

results

Other sections give us information about the run.

Data Guardrails tells us the featurization steps performed during the run. No actions were performed on our dataset.

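Models shows us the models created during the run and their scores. The list is sorted in ascending order of the primary metric, because the metric is NRMSE (normalized root mean squared error), where lower is better.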
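For reference, NRMSE is the root mean squared error divided by the range of the true values, so a lower score means a better fit. The sketch below computes it for three made-up sets of predictions and ranks them in ascending order, as the Models list does. The model names and all numbers are invented for illustration and are not results from the run.

```python
import math

def nrmse(y_true, y_pred):
    """Root mean squared error divided by the range of the true values."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse) / (max(y_true) - min(y_true))

# Invented targets and predictions, purely for illustration.
y_true = [10.0, 20.0, 30.0, 40.0, 50.0]
predictions = {
    "model_a": [12.0, 18.0, 33.0, 41.0, 47.0],
    "model_b": [10.0, 20.0, 30.0, 40.0, 50.0],
    "model_c": [30.0, 30.0, 30.0, 30.0, 30.0],
}

scores = {name: nrmse(y_true, preds) for name, preds in predictions.items()}
ranking = sorted(scores, key=scores.get)  # ascending: best model first
```

Because the error is divided by the range of the label, scores are comparable across datasets whose labels have different scales.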

Results Models

It is also possible to get more details on the selected model by viewing its explanation.

Model Explanation

Model explanation displays the top n features that affect the model. In our case, we notice that the curing time and cement are the top two features that affect the strength of concrete.
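One simple way to rank features by their effect on a model is permutation importance: shuffle one feature at a time and measure how much the error grows. This is a swapped-in technique chosen because it is easy to sketch, and the explainer used by Azure ML may work differently. The stand-in model, its coefficients, the random data, and the feature names `f0`, `f1`, `f2` are all hypothetical.

```python
import random

def permutation_importance(predict, rows, targets, n_features, seed=0):
    """Increase in mean absolute error when each feature column is shuffled."""
    def mae(data):
        return sum(abs(predict(r) - t) for r, t in zip(data, targets)) / len(targets)

    baseline = mae(rows)
    rng = random.Random(seed)
    importances = []
    for j in range(n_features):
        column = [r[j] for r in rows]
        rng.shuffle(column)
        # Rebuild every row with feature j replaced by its shuffled value.
        shuffled_rows = [r[:j] + (v,) + r[j + 1:] for r, v in zip(rows, column)]
        importances.append(mae(shuffled_rows) - baseline)
    return importances

def toy_model(row):
    """Stand-in model: strong effect of f0, weak effect of f1, none of f2."""
    return 3.0 * row[0] + 0.5 * row[1]

data_rng = random.Random(42)
rows = [(data_rng.random(), data_rng.random(), data_rng.random()) for _ in range(200)]
targets = [toy_model(r) for r in rows]

scores = permutation_importance(toy_model, rows, targets, n_features=3)
feature_names = ["f0", "f1", "f2"]
top = sorted(zip(feature_names, scores), key=lambda p: p[1], reverse=True)
```

Here `top` lists the features from most to least influential. A feature the model ignores gets an importance of zero, because shuffling it does not change any prediction.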

Deployment

We also notice that it is possible to deploy the model as an API that we can call. Azure allows us to deploy the model as an Azure Container Instance (ACI) or as a Kubernetes service. A REST endpoint URL is provided to consume the model.
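Calling the deployed model comes down to an HTTP POST with a JSON body. The sketch below builds such a request with the standard library only. The payload shape (`{"data": [...]}`), the column names, the sample values, the placeholder URL, and the bearer-key header are assumptions, since the exact input schema depends on the scoring script generated for the deployment.

```python
import json
import urllib.request

def build_scoring_request(scoring_uri, rows, api_key=None):
    """Build a POST request carrying the rows to score as JSON."""
    body = json.dumps({"data": rows}).encode("utf-8")  # assumed payload shape
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Assumes key-based authentication is enabled on the service.
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(scoring_uri, data=body, headers=headers,
                                  method="POST")

def score(request, timeout=30):
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))

# Hypothetical column names and values for one concrete mixture.
sample = {
    "Cement": 540.0, "BlastFurnaceSlag": 0.0, "FlyAsh": 0.0, "Water": 162.0,
    "Superplasticizer": 2.5, "CoarseAggregate": 1040.0,
    "FineAggregate": 676.0, "Age": 28,
}

# Placeholder URL; replace it with the REST URL shown for the deployment.
request = build_scoring_request("http://example.invalid/score", [sample])
# prediction = score(request)  # uncomment once a real endpoint is available
```

Keeping request construction separate from sending makes it easy to inspect the payload before any call is made.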



