Deploy pre-trained models on AWS Wavelength with 5G edge using Amazon SageMaker JumpStart

AWS Machine Learning Blog

With the advent of high-speed 5G mobile networks, enterprises are better positioned than ever to harness the convergence of telecommunications networks and the cloud. As one of the most prominent use cases to date, machine learning (ML) at the edge has allowed enterprises to deploy ML models closer to their end customers to reduce latency and increase the responsiveness of their applications. For example, smart venue solutions can use near-real-time computer vision for crowd analytics over 5G networks, all while minimizing investment in on-premises hardware and networking equipment. Retailers can deliver more frictionless experiences on the go with natural language processing (NLP), real-time recommendation systems, and fraud detection. Even ground and aerial robotics can use ML to unlock safer, more autonomous operations.
To reduce the barrier to entry of ML at the edge, we wanted to demonstrate an example of deploying a pre-trained model from Amazon SageMaker to AWS Wavelength, all in less than 100 lines of code. In this post, we demonstrate how to deploy a SageMaker model to AWS Wavelength to reduce model inference latency for 5G network-based applications.
Solution overview
Across AWS’s rapidly expanding global infrastructure, AWS Wavelength brings the power of cloud compute and storage to the edge of 5G networks, unlocking more performant mobile experiences. With AWS Wavelength, you can extend your virtual private cloud (VPC) to Wavelength Zones corresponding to the telecommunications carrier’s network edge in 29 cities across the globe. The following diagram shows an example of this architecture.

You can opt in to the Wavelength Zones within a given Region via the AWS Management Console or the AWS Command Line Interface (AWS CLI). To learn more about deploying geo-distributed applications on AWS Wavelength, refer to Deploy geo-distributed Amazon EKS clusters on AWS Wavelength.
Building on the fundamentals discussed in that post, we use ML at the edge as a sample workload to deploy to AWS Wavelength. For this workload, we deploy a pre-trained model from Amazon SageMaker JumpStart.
SageMaker is a fully managed ML service that allows developers to easily deploy ML models into their AWS environments. Although AWS offers a number of options for model training, from AWS Marketplace models to SageMaker built-in algorithms, there are also a number of techniques to deploy open-source ML models.
JumpStart provides access to hundreds of built-in algorithms with pre-trained models that can be seamlessly deployed to SageMaker endpoints. From predictive maintenance and computer vision to autonomous driving and fraud detection, JumpStart supports a variety of popular use cases with one-click deployment on the console.
Because SageMaker is not natively supported in Wavelength Zones, we demonstrate how to extract the model artifacts from the Region and re-deploy to the edge. To do so, you use Amazon Elastic Kubernetes Service (Amazon EKS) clusters and node groups in Wavelength Zones, followed by creating a deployment manifest with the container image generated by JumpStart. The following diagram illustrates this architecture.

Prerequisites
To make this as easy as possible, ensure that your AWS account has Wavelength Zones enabled. Note that this integration is only available in us-east-1 and us-west-2, and you will be using us-east-1 for the duration of the demo.
To opt in to AWS Wavelength, complete the following steps:
On the Amazon VPC console, choose Zones under Settings and choose US East (Verizon) / us-east-1-wl1.
Choose Manage.
Select Opted in.
Choose Update zones.
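If you prefer to script the opt-in instead of using the console, the following is a minimal boto3 sketch of the same change for the us-east-1-wl1 zone group used in the console steps above:

# opt_in_wavelength.py: opt in to the us-east-1-wl1 Wavelength Zone group with boto3
# (a minimal sketch; the console steps above achieve the same result)
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Opt in to the Wavelength Zone group for US East (Verizon)
ec2.modify_availability_zone_group(
    GroupName="us-east-1-wl1",
    OptInStatus="opted-in",
)

# Confirm the opt-in status of the Wavelength Zones in this Region
zones = ec2.describe_availability_zones(
    AllAvailabilityZones=True,
    Filters=[{"Name": "zone-type", "Values": ["wavelength-zone"]}],
)
for zone in zones["AvailabilityZones"]:
    print(zone["ZoneName"], zone["OptInStatus"])
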
Create AWS Wavelength infrastructure
Before we convert the local SageMaker model inference endpoint to a Kubernetes deployment, create an EKS cluster in a Wavelength Zone by deploying an Amazon EKS cluster with an AWS Wavelength node group. To learn more, refer to this guide on the AWS Containers Blog or Verizon’s 5GEdgeTutorials repository for one such example.
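Cluster creation takes several minutes. Once it completes, you can confirm that the control plane and node group are ready before proceeding; the following is a minimal boto3 sketch, assuming a hypothetical cluster name of eks-wavelength (replace it with the name you chose when creating the cluster):

# check_cluster.py: confirm the EKS control plane is active before deploying workloads
# (eks-wavelength is a hypothetical cluster name)
import boto3

eks = boto3.client("eks", region_name="us-east-1")

cluster = eks.describe_cluster(name="eks-wavelength")["cluster"]
print(cluster["status"])    # Expect ACTIVE once provisioning finishes
print(cluster["endpoint"])  # API server endpoint used by kubectl

# List the node groups attached to the cluster, including the Wavelength node group
nodegroups = eks.list_nodegroups(clusterName="eks-wavelength")["nodegroups"]
print(nodegroups)
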
Next, using an AWS Cloud9 environment or interactive development environment (IDE) of choice, download the requisite SageMaker packages and Docker Compose, a key dependency of JumpStart.

pip install sagemaker
pip install 'sagemaker[local]' --upgrade
sudo curl -L "https://github.com/docker/compose/releases/download/1.23.1/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
sudo chmod +x /usr/local/bin/docker-compose
docker-compose --version

Create model artifacts using JumpStart
First, make sure that you have an AWS Identity and Access Management (IAM) execution role for SageMaker. To learn more, visit SageMaker Roles.
Using this example, create a file called train_model.py that uses the SageMaker Software Development Kit (SDK) to retrieve a pre-built model (set the aws_role variable to the Amazon Resource Name (ARN) of your SageMaker execution role). In this file, you deploy the model locally by setting instance_type to local in the model.deploy() function, which starts a Docker container within your IDE using all the requisite model artifacts you defined:

# train_model.py
from sagemaker import image_uris, model_uris, script_uris
from sagemaker.model import Model
from sagemaker.predictor import Predictor
from sagemaker.utils import name_from_base
import sagemaker, boto3, json
from sagemaker import get_execution_role

aws_role = ""
aws_region = boto3.Session().region_name
sess = sagemaker.Session()

# model_version="*" fetches the latest version of the model.
infer_model_id = "tensorflow-tc-bert-en-uncased-L-12-H-768-A-12-2"
infer_model_version = "*"
endpoint_name = name_from_base(f"jumpstart-example-{infer_model_id}")

# Retrieve the inference docker container uri.
deploy_image_uri = image_uris.retrieve(
    region=None,
    framework=None,
    image_scope="inference",
    model_id=infer_model_id,
    model_version=infer_model_version,
    instance_type="local",
)
# Retrieve the inference script uri.
deploy_source_uri = script_uris.retrieve(
    model_id=infer_model_id, model_version=infer_model_version, script_scope="inference"
)
# Retrieve the base model uri.
base_model_uri = model_uris.retrieve(
    model_id=infer_model_id, model_version=infer_model_version, model_scope="inference"
)
model = Model(
    image_uri=deploy_image_uri,
    source_dir=deploy_source_uri,
    model_data=base_model_uri,
    entry_point="inference.py",
    role=aws_role,
    predictor_cls=Predictor,
    name=endpoint_name,
)
print(deploy_image_uri, deploy_source_uri, base_model_uri)
# Deploy the model to a local endpoint.
base_model_predictor = model.deploy(
    initial_instance_count=1,
    instance_type="local",
    endpoint_name=endpoint_name,
)

Next, set infer_model_id to the ID of the SageMaker model that you would like to use.
For a complete list, refer to Built-in Algorithms with pre-trained Model Table. In our example, we use the Bidirectional Encoder Representations from Transformers (BERT) model, commonly used for natural language processing.
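If you prefer to browse the catalog programmatically, recent versions of the SageMaker SDK include a notebook utility that lists JumpStart model IDs. The following is a short sketch; the task filter value tc (text classification) is an assumption based on the model ID used in this post, and the helper's module path may vary by SDK version:

# list_models.py: list JumpStart model IDs for a given task
# (requires a recent sagemaker SDK; "task == tc" assumes the text classification task code)
from sagemaker.jumpstart.notebook_utils import list_jumpstart_models

# Filter to text classification models such as the BERT model used in this post
for model_id in list_jumpstart_models(filter="task == tc"):
    if model_id.startswith("tensorflow-tc-"):
        print(model_id)
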
Run the train_model.py script to retrieve the JumpStart model artifacts and deploy the pre-trained model to your local machine:

python train_model.py

Should this step succeed, your output may resemble the following:

763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-inference:2.8-cpu
s3://jumpstart-cache-prod-us-east-1/source-directory-tarballs/tensorflow/inference/tc/v2.0.0/sourcedir.tar.gz
s3://jumpstart-cache-prod-us-east-1/tensorflow-infer/v2.0.0/infer-tensorflow-tc-bert-en-uncased-L-12-H-768-A-12-2.tar.gz

In the output, you will see three artifacts in order: the base image for TensorFlow inference, the inference script that serves the model, and the artifacts containing the trained model. Although you could create a custom Docker image with these artifacts, another approach is to let SageMaker local mode create the Docker image for you. In the subsequent steps, we extract the container image running locally and push it to Amazon Elastic Container Registry (Amazon ECR), and push the model artifacts separately to Amazon Simple Storage Service (Amazon S3).
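Before extracting anything, you can optionally confirm that the local endpoint answers requests. The following is a minimal sketch that could be appended to train_model.py; the raw-text payload and the application/x-text content type follow the pattern used in the JumpStart text classification notebooks, so treat them as assumptions if you deploy a different model:

# Optional smoke test: query the local SageMaker endpoint before extracting artifacts
# (append to train_model.py so base_model_predictor is in scope)
import json

sample_text = "a quietly moving film about family and memory"

# The JumpStart BERT text classification container accepts raw text;
# the verbose Accept header asks for labels and probabilities in the response.
response = base_model_predictor.predict(
    sample_text.encode("utf-8"),
    {"ContentType": "application/x-text", "Accept": "application/json;verbose"},
)
print(json.loads(response))
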
Convert local mode artifacts to remote Kubernetes deployment
Now that you have confirmed that SageMaker is working locally, let’s extract the deployment manifest from the running container. Complete the following steps:
Identify the location of the SageMaker local mode deployment manifest: To do so, search the /tmp directory for the most recently created file named docker-compose.yaml.
docker_manifest=$( find /tmp/tmp* -name "docker-compose.yaml" -printf '%T+ %p\n' | sort | tail -n 1 | cut -d' ' -f2- )
echo $docker_manifest
Identify the location of the SageMaker local mode model artifacts: Next, find the underlying volume mounted to the local SageMaker inference container, which will be used by each EKS worker node after we upload the artifact to Amazon S3.
model_local_volume=$(grep -A1 -w "volumes:" $docker_manifest | tail -n 1 | tr -d ' ' | awk -F: '{print $1}' | cut -c 2-)
# Returns something like: /tmp/tmpcr4bu_a7
Create a local copy of the running SageMaker inference container: Next, we find the currently running container that serves our ML inference model and make a copy of its contents locally. This ensures that we have our own copy of the container image to pull from Amazon ECR.
# Find container ID of running SageMaker local container
mkdir sagemaker-container
container_id=$(docker ps --format "{{.ID}} {{.Image}}" | grep "tensorflow" | awk '{print $1}')
# Retrieve the files of the container locally
docker cp $container_id:/ sagemaker-container/
Before acting on model_local_volume, which we’ll push to Amazon S3, push a copy of the running Docker image, now in the sagemaker-container directory, to Amazon ECR. Be sure to replace region, aws_account_id, docker_image_id, and my-repository:tag, or follow the Amazon ECR user guide. Also, be sure to take note of the final ECR image URL (aws_account_id.dkr.ecr.region.amazonaws.com/my-repository:tag), which we will use in our EKS deployment.

aws ecr get-login-password --region region | docker login --username AWS --password-stdin aws_account_id.dkr.ecr.region.amazonaws.com
docker build .
docker tag docker_image_id aws_account_id.dkr.ecr.region.amazonaws.com/my-repository:tag
docker push aws_account_id.dkr.ecr.region.amazonaws.com/my-repository:tag
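
If the target repository doesn’t exist yet, you can create it before pushing. The following is a minimal boto3 sketch that uses the hypothetical repository name my-repository from the commands above:

# create_repo.py: create the ECR repository that will hold the extracted inference image
# (my-repository is a hypothetical name; reuse whatever you substituted in the docker commands)
import boto3

ecr = boto3.client("ecr", region_name="us-east-1")

repo = ecr.create_repository(
    repositoryName="my-repository",
    imageScanningConfiguration={"scanOnPush": True},
)
# The repositoryUri is the prefix used in the docker tag and docker push commands above
print(repo["repository"]["repositoryUri"])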

Now that we have an ECR image corresponding to the inference endpoint, create a new Amazon S3 bucket and copy the SageMaker local mode artifacts (model_local_volume) to this bucket. In parallel, create an IAM policy that provides Amazon EC2 instances access to read objects within the bucket. Be sure to replace the bucket name placeholder with a globally unique name for your Amazon S3 bucket.

# Step 1: Create S3 bucket for model artifacts
aws s3api create-bucket --bucket
aws s3api put-public-access-block --bucket --public-access-block-configuration "BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true"
# Step 2: Create IAM policy to attach to the node group
cat > ec2_iam_policy.json
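
The copy of the local model artifacts into the new bucket can also be scripted. The following is a minimal boto3 sketch; the bucket name is a hypothetical placeholder, and model_local_volume should be the path echoed by the earlier shell step:

# upload_artifacts.py: copy the SageMaker local mode model artifacts to the new S3 bucket
# (the bucket name is hypothetical; model_local_volume is the path found earlier,
#  e.g. something like /tmp/tmpcr4bu_a7)
import os
import boto3

bucket = "my-wavelength-model-artifacts"  # replace with your globally unique bucket name
model_local_volume = "/tmp/tmpcr4bu_a7"   # replace with the path echoed by the shell step

s3 = boto3.client("s3")

# Walk the local volume and mirror its contents under the model/ prefix in the bucket
for root, _, files in os.walk(model_local_volume):
    for name in files:
        local_path = os.path.join(root, name)
        key = os.path.join("model", os.path.relpath(local_path, model_local_volume))
        s3.upload_file(local_path, bucket, key)
        print(f"uploaded s3://{bucket}/{key}")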