Auto-ML with Amazon SageMaker Autopilot

Abstract:

Machine learning is commonly defined as the ability of computers to learn without being explicitly programmed. The goal is to find an algorithm that can extract patterns from an existing dataset and use those patterns to build a predictive model that generalizes well to new, incoming data.

A machine learning model is a file that has been trained to recognize certain types of patterns. You train a model on a set of data, providing it with an algorithm that it can use to reason over and learn from that data. Many algorithms are available today, and choosing the best-suited one is a hard task. ML practitioners use many open-source tools, libraries, and frameworks to build a model for their dataset. In addition, automation, orchestration, and the underlying infrastructure also have to be designed, which is time-consuming and error-prone.

Amazon SageMaker Autopilot automatically builds, trains, and tunes the best machine learning models based on your data, while allowing you to maintain full control and visibility. With SageMaker Autopilot, you simply provide a tabular dataset and select the target column to predict, which can be a number or a category. SageMaker Autopilot then automatically explores different solutions to find the best model. You can then deploy the model directly to production with one click, or iterate on the recommended solutions in Amazon SageMaker Studio to further improve model quality.

SageMaker Autopilot currently supports problem types such as regression, binary classification, and multiclass classification. It performs automatic hyperparameter optimization and automatic instance and cluster size selection. A SageMaker Autopilot job consists of the following processes:

  1. Pre-Processing
  2. Candidate Definition Generation
  3. Feature Engineering
  4. Model Tuning
  5. Explainability Report Generation
  6. Model Deployment

In this blog, an AutoML job will be created and the resulting model will be exposed through Amazon API Gateway. Note that these services will incur charges.

Architecture Diagram:
Steps involved in the Process

Step 1: Download the dataset and upload it to an S3 bucket
Step 2: Run SageMaker Autopilot
Step 3: Create a Lambda function to integrate SageMaker with Amazon API Gateway
Step 4: Configure API Gateway to invoke the Lambda function
Step 5: Test the REST API using Postman

Step 1: Download the dataset and upload it to an S3 bucket
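
The upload can also be scripted. The snippet below is a minimal sketch, assuming the dataset is already downloaded as a local CSV file and that boto3 is available; the helper names, the bucket name, and the prefix are hypothetical and not part of the original walkthrough.

```python
from pathlib import Path


def s3_uri(bucket: str, key: str) -> str:
    """Return the s3:// URI that identifies the uploaded object."""
    return f"s3://{bucket}/{key}"


def upload_dataset(s3_client, local_path: str, bucket: str, prefix: str) -> str:
    """Upload a local CSV file under the given prefix and return its S3 URI.

    s3_client is expected to be boto3.client("s3"); it is passed in so the
    helper can be exercised without AWS credentials.
    """
    file_name = Path(local_path).name
    # Join prefix and file name, tolerating leading/trailing slashes or an empty prefix
    key = "/".join(part for part in (prefix.strip("/"), file_name) if part)
    # upload_file(Filename, Bucket, Key) is the standard boto3 S3 client call
    s3_client.upload_file(local_path, bucket, key)
    return s3_uri(bucket, key)


# Usage with real AWS credentials (hypothetical bucket and prefix):
#   import boto3
#   upload_dataset(boto3.client("s3"), "dataset.csv",
#                  "my-autopilot-bucket", "autopilot/input")
```

The returned URI is the value to enter as the input data location in Step 2.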

Step 2: Run SageMaker Autopilot

  • Open SageMaker Studio. In the search bar, search for “Experiment” and select it. A Create experiment tab will open; fill in the necessary details.
  • In Connect Your Data, enter the bucket name and file path.
  • Select ‘Y’ as the target column.
  • Similarly, select a bucket for the output data location.
  • Set the machine learning problem type to “Auto” so that SageMaker Autopilot decides the appropriate approach for the problem.
  • Turn auto-deploy on. Autopilot will deploy the best model once it has been found and will automatically generate an endpoint for it.
  • Start the Autopilot job. It takes several hours to complete (3 hours in our case). By the time the job completes, 250 candidate models will have been created. The steps of the AutoML process can be followed in real time.
  • After the AutoML job completes, the best model is selected out of the 250 candidates based on the objective metric. The best model’s endpoint is also created.
  • Click Open candidate generation notebook to view a Jupyter notebook that explains the candidates and lets you customize the underlying code.
  • Go to SageMaker Console > Inference > Endpoints. Select the endpoint created in the previous step and note down its name.
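
The same job can be started without the Studio UI. The following is a sketch under stated assumptions rather than the exact configuration used above: it builds a request for the boto3 SageMaker `create_auto_ml_job` call, leaves out `ProblemType` so Autopilot infers it, and requests deployment of the best candidate. The function names, job name, role ARN, and endpoint name are hypothetical.

```python
def build_autopilot_request(job_name, input_s3_uri, output_s3_uri, target_column,
                            role_arn, max_candidates=250, endpoint_name=None):
    """Build keyword arguments for the SageMaker CreateAutoMLJob API."""
    request = {
        "AutoMLJobName": job_name,
        "InputDataConfig": [{
            "DataSource": {"S3DataSource": {"S3DataType": "S3Prefix",
                                            "S3Uri": input_s3_uri}},
            "TargetAttributeName": target_column,
        }],
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        # ProblemType is omitted on purpose so that Autopilot infers it ("Auto")
        "AutoMLJobConfig": {"CompletionCriteria": {"MaxCandidates": max_candidates}},
        "RoleArn": role_arn,
    }
    if endpoint_name:
        # Counterpart of the auto-deploy toggle: deploy the best candidate
        request["ModelDeployConfig"] = {"AutoGenerateEndpointName": False,
                                        "EndpointName": endpoint_name}
    return request


def start_autopilot_job(sm_client, request):
    """Submit the job; sm_client is expected to be boto3.client("sagemaker")."""
    return sm_client.create_auto_ml_job(**request)["AutoMLJobArn"]


def job_status(sm_client, job_name):
    """Return the (status, secondary status) pair reported for the job."""
    desc = sm_client.describe_auto_ml_job(AutoMLJobName=job_name)
    return desc["AutoMLJobStatus"], desc["AutoMLJobSecondaryStatus"]
```

Calling `job_status` periodically gives roughly the same view as the real-time progress display in Studio.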

Step 3: Create a Lambda function to integrate SageMaker with Amazon API Gateway

  • Create a Lambda function with a Python 3.x runtime and the following code.
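import os
import json
import boto3

# Name of the SageMaker endpoint, read from the Lambda environment variables
ENDPOINT_NAME = os.environ['ENDPOINT_NAME']
# Client for the SageMaker runtime API, which serves deployed endpoints
runtime = boto3.client('sagemaker-runtime')

def lambda_handler(event, context):
    print("Received event: " + json.dumps(event, indent=2))

    # The request body carries one CSV row under the "data" key
    payload = event['data']
    print(payload)
    response = runtime.invoke_endpoint(EndpointName=ENDPOINT_NAME,
                                       ContentType='text/csv',
                                       Body=payload)
    print(response)
    # The endpoint replies with plain CSV text (for example a class label),
    # which is not necessarily valid JSON, so return it as a stripped string
    result = response['Body'].read().decode('utf-8').strip()

    return result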
  • Go to Configuration > Environment variables > Edit and add an environment variable with the key ‘ENDPOINT_NAME’ and, as its value, the endpoint name of the SageMaker model.
  • Go to Configuration > Permissions > Execution role and click the role name. Open the policy, switch to the JSON view, and add the following statement to it. Review and save the policy.
{
    "Sid": "VisualEditor0",
    "Effect": "Allow",
    "Action": "sagemaker:InvokeEndpoint",
    "Resource": "*"
}
  • Save the Lambda function, then deploy it.
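
The statement above allows the function to invoke any endpoint in the account. A tighter variant, shown here only as a sketch, limits the permission to the one endpoint created in Step 2 and attaches it as an inline policy through the IAM `put_role_policy` call. The helper names, policy name, and the ARN in the example are hypothetical; the real endpoint ARN is shown on the endpoint's details page.

```python
import json


def invoke_endpoint_policy(endpoint_arn):
    """Return a policy document that allows invoking a single endpoint."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "InvokeAutopilotEndpoint",
            "Effect": "Allow",
            "Action": "sagemaker:InvokeEndpoint",
            # Restrict the permission to one endpoint instead of "*"
            "Resource": endpoint_arn,
        }],
    }


def attach_inline_policy(iam_client, role_name, policy_name, policy):
    """Attach the policy to the Lambda execution role.

    iam_client is expected to be boto3.client("iam").
    """
    iam_client.put_role_policy(RoleName=role_name,
                               PolicyName=policy_name,
                               PolicyDocument=json.dumps(policy))


# Placeholder ARN for illustration only
example_arn = "arn:aws:sagemaker:us-east-1:123456789012:endpoint/my-autopilot-endpoint"
print(json.dumps(invoke_endpoint_policy(example_arn), indent=2))
```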

Step 4: Configure API Gateway to invoke the Lambda function

  • Go to API Gateway Console > Create API. Choose REST API > Build.
  • In the settings, enter an API name. Choose Regional as the endpoint type.
  • Click Actions > Create Resource. Enter a resource name, then click Create Resource.
  • Click Actions > Create Method and choose POST. Set the integration type to Lambda Function, then choose the Lambda region and the name of the function created in the previous section.
  • Test the API by clicking the Test icon (lightning icon). Enter the following data in the request body.
{
"data":"56,housemaid,married,basic.4y,no,no,no,telephone,may,mon,261,1,999,0,nonexistent,1.1,93.994,-36.4,4.857,5191.0"
}
  • The response body shows the model's prediction for this record.
  • Then click Actions > Deploy API. Choose [New Stage] as the deployment stage and give it a stage name, then click Deploy. Go to Stages and note the Invoke URL.
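
The request path configured above can also be tried locally before calling the deployed API. The sketch below assumes a non-proxy Lambda integration, where API Gateway passes the JSON request body to the function as the event, and assumes the endpoint replies with plain text. `predict`, `StubRuntime`, the canned label, and the endpoint name are hypothetical stand-ins, so no AWS call is made.

```python
import io


def predict(event, runtime, endpoint_name):
    """Variant of the handler logic with the runtime client passed in."""
    response = runtime.invoke_endpoint(EndpointName=endpoint_name,
                                       ContentType="text/csv",
                                       Body=event["data"])
    # Assumption: the endpoint replies with plain CSV text such as a label
    return response["Body"].read().decode("utf-8").strip()


class StubRuntime:
    """Stand-in for boto3.client("sagemaker-runtime"); returns a canned label."""

    def __init__(self, label):
        self.label = label
        self.last_call = None

    def invoke_endpoint(self, **kwargs):
        # Record the arguments so they can be inspected afterwards
        self.last_call = kwargs
        return {"Body": io.BytesIO((self.label + "\n").encode("utf-8"))}


# Same request body as in the console test above
event = {"data": "56,housemaid,married,basic.4y,no,no,no,telephone,may,mon,"
                 "261,1,999,0,nonexistent,1.1,93.994,-36.4,4.857,5191.0"}
stub = StubRuntime("no")  # canned reply, not a real model prediction
print(predict(event, stub, "my-autopilot-endpoint"))
```

Swapping `StubRuntime` for a real `sagemaker-runtime` client sends the same row to the deployed endpoint.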

Step 5: Test the REST API using Postman

Postman is an application used for API testing. It is an HTTP client with a graphical user interface for sending HTTP requests. Here, Postman is used to send a POST request to the machine learning model and receive its response.

  • Create a new collection in Postman. Click New Request and choose POST as the request type.
  • Copy and paste the Invoke URL of the API Gateway stage from Step 4.
  • Go to Body > raw. Choose JSON as the format and paste the following data. Other sample data can be found in the sample zip file.
{
"data":"56,housemaid,married,basic.4y,no,no,no,telephone,may,mon,261,1,999,0,nonexistent,1.1,93.994,-36.4,4.857,5191.0"
}
  • The response will be “No”, which suggests that a customer with the given demographics is likely to reject the marketing offer.
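
The same request can be sent from a script instead of Postman. This is a minimal sketch using only the Python standard library; `build_request` and `call_api` are hypothetical helper names, the URL in the usage comment is a placeholder for your own Invoke URL and resource path, and the reply is assumed to be JSON, as API Gateway returns for a Lambda result.

```python
import json
import urllib.request


def build_request(invoke_url, csv_row):
    """Build the POST request that carries one CSV row under the "data" key."""
    body = json.dumps({"data": csv_row}).encode("utf-8")
    return urllib.request.Request(invoke_url, data=body,
                                  headers={"Content-Type": "application/json"},
                                  method="POST")


def call_api(invoke_url, csv_row, timeout=30):
    """Send the request and decode the JSON reply from API Gateway."""
    request = build_request(invoke_url, csv_row)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage (placeholder URL, replace with the Invoke URL noted in Step 4):
#   row = "56,housemaid,married,basic.4y,no,no,no,telephone,may,mon,261,1,999,0,nonexistent,1.1,93.994,-36.4,4.857,5191.0"
#   print(call_api("https://<api-id>.execute-api.<region>.amazonaws.com/<stage>/<resource>", row))
```
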
Conclusion

Amazon SageMaker Autopilot is helpful whether you are a beginner or an expert in machine learning. It makes machine learning simpler and more accessible. Autopilot handles everything from data pre-processing to endpoint deployment, so users need not worry about the underlying infrastructure and scaling. With AWS Lambda and API Gateway, exposing the ML model to applications through a REST API becomes easy.

Ganesh Muni