AWS Machine Learning Blog

Recently, we’ve been witnessing the rapid development and evolution of generative AI applications, with observability and evaluation emerging as critical aspects for developers, data scientists, and stakeholders. Observability refers to the ability to understand the internal state and behavior of a system by analyzing its outputs, logs, and metrics. Evaluation, on the other hand, involves assessing the quality and relevance of the generated outputs, enabling continual improvement.
Comprehensive observability and evaluation are essential for troubleshooting, identifying bottlenecks, optimizing applications, and providing relevant, high-quality responses. Observability empowers you to proactively monitor and analyze your generative AI applications, and evaluation helps you collect feedback, refine models, and enhance output quality.
In the context of Amazon Bedrock, observability and evaluation become even more crucial. Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies such as AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon through a single API, along with a broad set of capabilities you need to build generative AI applications with security, privacy, and responsible AI. As the complexity and scale of these applications grow, providing comprehensive observability and robust evaluation mechanisms are essential for maintaining high performance, quality, and user satisfaction.
We have built a custom observability solution that Amazon Bedrock users can quickly implement using just a few key building blocks and existing logs using FMs, Amazon Bedrock Knowledge Bases, Amazon Bedrock Guardrails, and Amazon Bedrock Agents. This solution uses decorators in your application code to capture and log metadata such as input prompts, output results, run time, and custom metadata, offering enhanced security, ease of use, flexibility, and integration with native AWS services.
Notably, the solution supports comprehensive Retrieval Augmented Generation (RAG) evaluation so you can assess the quality and relevance of generated responses, identify areas for improvement, and refine the knowledge base or model accordingly.
In this post, we set up the custom solution for observability and evaluation of Amazon Bedrock applications. Through code examples and step-by-step guidance, we demonstrate how you can seamlessly integrate this solution into your Amazon Bedrock application, unlocking a new level of visibility, control, and continual improvement for your generative AI applications.
By the end of this post, you will:

Understand the importance of observability and evaluation in generative AI applications
Learn about the key features and benefits of this solution
Gain hands-on experience in implementing the solution through step-by-step demonstrations
Explore best practices for integrating observability and evaluation into your Amazon Bedrock workflows

Prerequisites
To implement the observability solution discussed in this post, you need the following prerequisites:

An active Amazon Web Services (AWS) account and AWS Identity and Access Management (IAM) role with Amazon Bedrock access
Access to the FMs you plan to use
Basic understanding of decorators in your preferred programming language (Python or Node.js)
A clone of the amazon-bedrock-samples GitHub repository
Basic familiarity with AWS services such as Amazon Data Firehose, Amazon Athena, and AWS Glue crawlers (optional, depending on the specific components used in the solution)

Solution overview
The observability solution for Amazon Bedrock empowers users to track and analyze interactions with FMs, knowledge bases, guardrails, and agents using decorators in their source code. Key highlights of the solution include:

Decorator – Decorators are applied to functions invoking Amazon Bedrock APIs, capturing input prompt, output results, custom metadata, custom metrics, and latency related metrics.
Flexible logging –You can use this solution to store logs either locally or in Amazon Simple Storage Service (Amazon S3) using Amazon Data Firehose, enabling integration with existing monitoring infrastructure. Additionally, you can choose what gets logged.
Dynamic data partitioning – The solution enables dynamic partitioning of observability data based on different workflows or components of your application, such as prompt preparation, data preprocessing, feedback collection, and inference. This feature allows you to separate data into logical partitions, making it easier to analyze and process data later.
Security – The solution uses AWS services and adheres to AWS Cloud Security best practices so your data remains within your AWS account.
Cost optimization – This solution uses serverless technologies, making it cost-effective for the observability infrastructure. However, some components may incur additional usage-based costs.
Multiple programming language support – The GitHub repository provides the observability solution in both Python and Node.js versions, catering to different programming preferences.

Here’s a high-level overview of the observability solution architecture:

The following steps explain how the solution works:

Application code using Amazon Bedrock is decorated with @bedrock_logs.watch to save the log
Logged data streams through Amazon Data Firehose
AWS Lambda transforms the data and applies dynamic partitioning based on call_type variable
Amazon S3 stores the data securely
Optional components for advanced analytics
AWS Glue creates tables from S3 data
Amazon Athena enables data querying
Visualize logs and insights in your favorite dashboard tool

This architecture provides comprehensive logging, efficient data processing, and powerful analytics capabilities for your Amazon Bedrock applications.
Getting started
To help you get started with the observability solution, we have provided example notebooks in the attached GitHub repository, covering knowledge bases, evaluation, and agents for Amazon Bedrock. These notebooks demonstrate how to integrate the solution into your Amazon Bedrock application and showcase various use cases and features including feedback collected from users or quality assurance (QA) teams.
The repository contains well-documented notebooks that cover topics such as:

Setting up the observability infrastructure
Integrating the decorator pattern into your application code
Logging model inputs, outputs, and custom metadata
Collecting and analyzing feedback data
Evaluating model responses and knowledge base performance
Example visualization for observability data using AWS services

To get started with the example notebooks, follow these steps:

Clone the GitHub repository

git clone https://github.com/aws-samples/amazon-bedrock-samples.git

Navigate to the observability solution directory

cd amazon-bedrock-samples/evaluation-observe/Custom-Observability-Solution

Follow the instructions in the README file to set up the required AWS resources and configure the solution
Open the provided Jupyter notebooks and follow along with the examples and demonstrations

These notebooks provide a hands-on learning experience and serve as a starting point for integrating our solution into your generative AI applications. Feel free to explore, modify, and adapt the code examples to suit your specific requirements.
Key features
The solution offers a range of powerful features to streamline observability and evaluation for your generative AI applications on Amazon Bedrock:

Decorator-based implementation – Use decorators to seamlessly integrate observability logging into your application functions, capturing inputs, outputs, and metadata without modifying the core logic
Selective logging – Choose what to log by selectively capturing function inputs, outputs, or excluding sensitive information or large data structures that might not be relevant for observability
Logical data partitioning – Create logical partitions in the observability data based on different workflows or application components, enabling easier analysis and processing of specific data subsets
Human-in-the-loop evaluation – Collect and associate human feedback with specific model responses or sessions, facilitating comprehensive evaluation and continual improvement of your application’s performance and output quality
Multi-component support – Support observability and evaluation for various Amazon Bedrock components, including InvokeModel, batch inference, knowledge bases, agents, and guardrails, providing a unified solution for your generative AI applications
Comprehensive evaluation – Evaluate the quality and relevance of generated responses, including RAG evaluation for knowledge base applications, using the open source RAGAS library to compute evaluation metrics

This concise list highlights the key features you can use to gain insights, optimize performance, and drive continual improvement for your generative AI applications on Amazon Bedrock. For a detailed breakdown of the features and implementation specifics, refer to the comprehensive documentation in the GitHub repository.
Implementation and best practices
The solution is designed to be modular and flexible so you can customize it according to your specific requirements. Although the implementation is straightforward, following best practices is crucial for the scalability, security, and maintainability of your observability infrastructure.
Solution deployment
This solution includes an AWS CloudFormation template that streamlines the deployment of required AWS resources, providing consistent and repeatable deployments across environments. The CloudFormation template provisions resources such as Amazon Data Firehose delivery streams, AWS Lambda functions, Amazon S3 buckets, and AWS Glue crawlers and databases.
Decorator pattern
The solution uses the decorator pattern to integrate observability logging into your application functions seamlessly. The @bedrock_logs.watch decorator wraps your functions, automatically logging inputs, outputs, and metadata to Amazon Kinesis Firehose. Here’s an example of how to use the decorator:

# import observability
from observability import BedrockLogs

# instantiate BedrockLogs in Firehose mode
bedrock_logs = BedrockLogs(delivery_stream_name=’your-firehose-delivery-stream’, feedback_variables=True)

# decorate your function
@bedrock_logs.watch(capture_input=True, capture_output=True, call_type=”)
def your_function(arg1, arg2):
# Your function code here along with any custom metric of your choosing
return output

Human-in-the-loop evaluation
The solution supports human-in-the-loop evaluation so you can incorporate human feedback into the performance evaluation of your generative AI application. You can involve end users, experts, or QA teams in the evaluation process, providing insights to enhance output quality and relevance. Here’s an example of how you can implement human-in-the-loop evaluation:

@bedrock_logs.watch(call_type=’Retrieve-and-Generate-with-KB’)
def main(input_arguments):
# Your code to interact with Amazon Bedrock Knowledge Base or Agent
return response, custom_metric, etc.

@bedrock_logs.watch(call_type=’observation-feedback’)
def observation_level_feedback(feedback):
pass

# Invoke main function with user input and get run_id and observation_id
tuple_of_function_outputs, run_id, observation_id = main(input_arguments)

# Collect human feedback on model response in your application
user_feedback = ‘thumbs-up’

observation_feedback_from_front_end = {
‘user_id’: ‘User-1’,
‘f_run_id’: run_id,
‘f_observation_id’: observation_id,
‘actual_feedback’: user_feedback
}

# Log the human-in-loop feedback using observation_level_feedback function
observation_level_feedback(observation_feedback_from_front_end)

By using the run_id and observation_id generated, you can associate human feedback with specific model responses or sessions. This feedback can then be analyzed and used to refine the knowledge base, fine-tune models, or identify areas for improvement.
Best practices
It’s recommended to follow these best practices:

Plan call types in advance – Determine the logical partitions (call_type) for your observability data based on different workflows or application components. This enables easier analysis and processing of specific data subsets.
Use feedback variables – Configure feedback_variables=True when initializing BedrockLogs to generate run_id and observation_id. These IDs can be used to join logically partitioned datasets, associating feedback data with corresponding model responses.
Extend for general steps – Although the solution is designed for Amazon Bedrock, you can use the decorator pattern to log observability data for general steps such as prompt preparation, postprocessing, or other custom workflows.
Log custom metrics – If you need to calculate custom metrics such as latency, context relevance, faithfulness, or any other metric, you can pass these values in the response of your decorated function, and the solution will log them alongside the observability data.
Selective logging – Use the capture_input and capture_output parameters to selectively log function inputs or outputs or exclude sensitive information or large data structures that might not be relevant for observability.
Comprehensive evaluation – Evaluate the quality and relevance of generated responses, including RAG evaluation for knowledge base applications, using the KnowledgeBasesEvaluations

By following these best practices and using the features of the solution, you can set up comprehensive observability and evaluation for your generative AI applications to gain valuable insights, identify areas for improvement, and enhance the overall user experience.
In the next post in this three-part series, we dive deeper into observability and evaluation for RAG and agent-based generative AI applications, providing in-depth insights and guidance.
Clean up
To avoid incurring costs and maintain a clean AWS account, you can remove the associated resources by deleting the AWS CloudFormation stack you created for this walkthrough. You can follow the steps provided in the Deleting a stack on the AWS CloudFormation console documentation to delete the resources created for this solution.
Conclusion and next steps
This comprehensive solution empowers you to seamlessly integrate comprehensive observability into your generative AI applications in Amazon Bedrock. Key benefits include streamlined integration, selective logging, custom metadata tracking, and comprehensive evaluation capabilities, including RAG evaluation. Use AWS services such as Athena to analyze observability data, drive continual improvement, and connect with your favorite dashboard tool to visualize the data.
This post focused is on Amazon Bedrock, but it can be extended to broader machine learning operations (MLOps) workflows or integrated with other AWS services such as AWS Lambda or Amazon SageMaker. We encourage you to explore this solution and integrate it into your workflows. Access the source code and documentation in our GitHub repository  and start your integration journey. Embrace the power of observability and unlock new heights for your generative AI applications.

About the authors
Ishan Singh is a Generative AI Data Scientist at Amazon Web Services, where he helps customers build innovative and responsible generative AI solutions and products. With a strong background in AI/ML, Ishan specializes in building Generative AI solutions that drive business value. Outside of work, he enjoys playing volleyball, exploring local bike trails, and spending time with his wife and dog, Beau.
Chris Pecora is a Generative AI Data Scientist at Amazon Web Services. He is passionate about building innovative products and solutions while also focused on customer-obsessed science. When not running experiments and keeping up with the latest developments in generative AI, he loves spending time with his kids.
Yanyan Zhang is a Senior Generative AI Data Scientist at Amazon Web Services, where she has been working on cutting-edge AI/ML technologies as a Generative AI Specialist, helping customers use generative AI to achieve their desired outcomes. Yanyan graduated from Texas A&M University with a PhD in Electrical Engineering. Outside of work, she loves traveling, working out, and exploring new things.
Mani Khanuja is a Tech Lead – Generative AI Specialists, author of the book Applied Machine Learning and High Performance Computing on AWS, and a member of the Board of Directors for Women in Manufacturing Education Foundation Board. She leads machine learning projects in various domains such as computer vision, natural language processing, and generative AI. She speaks at internal and external conferences such AWS re:Invent, Women in Manufacturing West, YouTube webinars, and GHC 23. In her free time, she likes to go for long runs along the beach.
Go to Source
30/10/2024 – 16:56 /Ishan Singh
Twitter: @hoffeldtcom

AWS Machine Learning Blog

Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies such as AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities you need to build generative AI applications with security, privacy, and responsible AI.
Batch inference in Amazon Bedrock efficiently processes large volumes of data using foundation models (FMs) when real-time results aren’t necessary. It’s ideal for workloads that aren’t latency sensitive, such as obtaining embeddings, entity extraction, FM-as-judge evaluations, and text categorization and summarization for business reporting tasks. A key advantage is its cost-effectiveness, with batch inference workloads charged at a 50% discount compared to On-Demand pricing. Refer to Supported Regions and models for batch inference for current supporting AWS Regions and models.
Although batch inference offers numerous benefits, it’s limited to 10 batch inference jobs submitted per model per Region. To address this consideration and enhance your use of batch inference, we’ve developed a scalable solution using AWS Lambda and Amazon DynamoDB. This post guides you through implementing a queue management system that automatically monitors available job slots and submits new jobs as slots become available.
We walk you through our solution, detailing the core logic of the Lambda functions. By the end, you’ll understand how to implement this solution so you can maximize the efficiency of your batch inference workflows on Amazon Bedrock. For instructions on how to start your Amazon Bedrock batch inference job, refer to Enhance call center efficiency using batch inference for transcript summarization with Amazon Bedrock.
The power of batch inference
Organizations can use batch inference to process large volumes of data asynchronously, making it ideal for scenarios where real-time results are not critical. This capability is particularly useful for tasks such as asynchronous embedding generation, large-scale text classification, and bulk content analysis. For instance, businesses can use batch inference to generate embeddings for vast document collections, classify extensive datasets, or analyze substantial amounts of user-generated content efficiently.
One of the key advantages of batch inference is its cost-effectiveness. Amazon Bedrock offers select FMs for batch inference at 50% of the On-Demand inference price. Organizations can process large datasets more economically because of this significant cost reduction, making it an attractive option for businesses looking to optimize their generative AI processing expenses while maintaining the ability to handle substantial data volumes.
Solution overview
The solution presented in this post uses batch inference in Amazon Bedrock to process many requests efficiently using the following solution architecture.

This architecture workflow includes the following steps:

A user uploads files to be processed to an Amazon Simple Storage Service (Amazon S3) bucket br-batch-inference-{Account_Id}-{AWS-Region} in the to-process folder. Amazon S3 invokes the {stack_name}-create-batch-queue-{AWS-Region} Lambda function.
The invoked Lambda function creates new job entries in a DynamoDB table with the status as Pending. The DynamoDB table is crucial for tracking and managing the batch inference jobs throughout their lifecycle. It stores information such as job ID, status, creation time, and other metadata.
The Amazon EventBridge rule scheduled to run every 15 minutes invokes the {stack_name}-process-batch-jobs-{AWS-Region} Lambda function.
The {stack_name}-process-batch-jobs-{AWS-Region} Lambda function performs several key tasks:

Scans the DynamoDB table for jobs in InProgress, Submitted, Validation and Scheduled status
Updates job status in DynamoDB based on the latest information from Amazon Bedrock
Calculates available job slots and submits new jobs from the Pending queue if slots are available
Handles error scenarios by updating job status to Failed and logging error details for troubleshooting

The Lambda function makes the GetModelInvocationJob API call to get the latest status of the batch inference jobs from Amazon Bedrock
The Lambda function then updates the status of the jobs in DynamoDB using the UpdateItem API call, making sure that the table always reflects the most current state of each job
The Lambda function calculates the number of available slots before the Service Quota Limit for batch inference jobs is reached. Based on this, it queries for jobs in the Pending state that can be submitted
If there is a slot available, the Lambda function will make CreateModelInvocationJob API calls to create new batch inference jobs for the pending jobs
It updates the DynamoDB table with the status of the batch inference jobs created in the previous step
After one batch job is complete, its output files will be available in the S3 bucket br-batch-inference-{Account_Id}-{AWS-Region} processed folder

Prerequisites
To perform the solution, you need the following prerequisites:

An active AWS account.
An AWS Region from the list of batch inference supported Regions for Amazon Bedrock.
Access to your selected models hosted on Amazon Bedrock. Make sure the selected model has been enabled in Amazon Bedrock.
If you plan to use your own AWS Identity and Access Management (IAM) role for batch inference, create it with a trust policy and Amazon S3 access (read access to the folder containing input data and write access to the folder storing output data).

Deployment guide
To deploy the pipeline, complete the following steps:

Choose the Launch Stack button:
Choose Next, as shown in the following screenshot
Specify the pipeline details with the options fitting your use case:

Stack name (Required) – The name you specified for this AWS CloudFormation. The name must be unique in the region in which you’re creating it.
ModelId (Required) – Provide the model ID that you need your batch job to run with.
RoleArn (Optional) – By default, the CloudFormation stack will deploy a new IAM role with the required permissions. If you have a role you want to use instead of creating a new role, provide the IAM role Amazon Resource Name (ARN) that has sufficient permission to create a batch inference job in Amazon Bedrock and read/write in the created S3 bucket br-batch-inference-{Account_Id}-{AWS-Region}. Follow the instructions in the prerequisites section to create this role.

In the Amazon Configure stack options section, add optional tags, permissions, and other advanced settings if needed. Or you can just leave it blank and choose Next, as shown in the following screenshot.
Review the stack details and select I acknowledge that AWS CloudFormation might create AWS IAM resources, as shown in the following screenshot.
Choose Submit. This initiates the pipeline deployment in your AWS account.
After the stack is deployed successfully, you can start using the pipeline. First, create a /to-process folder under the created Amazon S3 location for input. A .jsonl uploaded to this folder will have a batch job created with the selected model. The following is a screenshot of the DynamoDB table where you can track the job status and other types of metadata related to the job.
After your first batch job from the pipeline is complete, the pipeline will create a /processed folder under the same bucket, as shown in the following screenshot. Outputs from the batch jobs created by this pipeline will be stored in this folder.
To start using this pipeline, upload the .jsonl files you’ve prepared for batch inference in Amazon Bedrock

You’re done! You’ve successfully deployed your pipeline and you can check the batch job status in the Amazon Bedrock console. If you want to have more insights about each .jsonl file’s status, navigate to the created DynamoDB table {StackName}-DynamoDBTable-{UniqueString} and check the status there. You may need to wait up to 15 minutes to observe the batch jobs created because EventBridge is scheduled to scan DynamoDB every 15 minutes.
Clean up
If you no longer need this automated pipeline, follow these steps to delete the resources it created to avoid additional cost:

On the Amazon S3 console, manually delete the contents inside buckets. Make sure the bucket is empty before moving to step 2.
On the AWS CloudFormation console, choose Stacks in the navigation pane.
Select the created stack and choose Delete, as shown in the following screenshot.

This automatically deletes the deployed stack.
Conclusion
In this post, we’ve introduced a scalable and efficient solution for automating batch inference jobs in Amazon Bedrock. By using AWS Lambda, Amazon DynamoDB, and Amazon EventBridge, we’ve addressed key challenges in managing large-scale batch processing workflows.
This solution offers several significant benefits:

Automated queue management – Maximizes throughput by dynamically managing job slots and submissions
Cost optimization – Uses the 50% discount on batch inference pricing for economical large-scale processing

This automated pipeline significantly enhances your ability to process large amounts of data using batch inference for Amazon Bedrock. Whether you’re generating embeddings, classifying text, or analyzing content in bulk, this solution offers a scalable, efficient, and cost-effective approach to batch inference.
As you implement this solution, remember to regularly review and optimize your configuration based on your specific workload patterns and requirements. With this automated pipeline and the power of Amazon Bedrock, you’re well-equipped to tackle large-scale AI inference tasks efficiently and effectively. We encourage you to try it out and share your feedback to help us continually improve this solution.
For additional resources, refer to the following:

User guide – Process multiple prompts with batch inference
Code sample – Sample for building your batch inference job
Blog post – Enhance call center efficiency using batch inference for transcript summarization with Amazon Bedrock

About the authors
Yanyan Zhang is a Senior Generative AI Data Scientist at Amazon Web Services, where she has been working on cutting-edge AI/ML technologies as a Generative AI Specialist, helping customers use generative AI to achieve their desired outcomes. Yanyan graduated from Texas A&M University with a PhD in Electrical Engineering. Outside of work, she loves traveling, working out, and exploring new things.

Ishan Singh is a Generative AI Data Scientist at Amazon Web Services, where he helps customers build innovative and responsible generative AI solutions and products. With a strong background in AI/ML, Ishan specializes in building Generative AI solutions that drive business value. Outside of work, he enjoys playing volleyball, exploring local bike trails, and spending time with his wife and dog, Beau.
Neeraj Lamba is a Cloud Infrastructure Architect with Amazon Web Services (AWS) Worldwide Public Sector Professional Services. He helps customers transform their business by helping design their cloud solutions and offering technical guidance. Outside of work, he likes to travel, play Tennis and experimenting with new technologies.
Go to Source
30/10/2024 – 16:55 /Yanyan Zhang
Twitter: @hoffeldtcom

AWS Machine Learning Blog

Professionals in a wide variety of industries have adopted digital video conferencing tools as part of their regular meetings with suppliers, colleagues, and customers. These meetings often involve exchanging information and discussing actions that one or more parties must take after the session. The traditional way to make sure information and actions aren’t forgotten is to take notes during the session; a manual and tedious process that can be error-prone, particularly in a high-activity or high-pressure scenario. Furthermore, these notes are usually personal and not stored in a central location, which is a lost opportunity for businesses to learn what does and doesn’t work, as well as how to improve their sales, purchasing, and communication processes.
This post presents a solution where you can upload a recording of your meeting (a feature available in most modern digital communication services such as Amazon Chime) to a centralized video insights and summarization engine. This engine uses artificial intelligence (AI) and machine learning (ML) services and generative AI on AWS to extract transcripts, produce a summary, and provide a sentiment for the call. The solution notes the logged actions per individual and provides suggested actions for the uploader. All of this data is centralized and can be used to improve metrics in scenarios such as sales or call centers. Many commercial generative AI solutions available are expensive and require user-based licenses. In contrast, our solution is an open-source project powered by Amazon Bedrock, offering a cost-effective alternative without those limitations.
This solution can help your organizations’ sales, sales engineering, and support functions become more efficient and customer-focused by reducing the need to take notes during customer calls.
Use case overview
The organization in this scenario has noticed that during customer calls, some actions often get skipped due to the complexity of the discussions, and that there might be potential to centralize customer data to better understand how to improve customer interactions in the long run. The organization already records sessions in video format, but these videos are often kept in individual repositories, and a review of the access logs has shown that employees rarely use them in their day-to-day activities.
To increase efficiency, reduce the load, and gain better insights, this solution looks at how to use generative AI to analyze recorded videos and provide employees with valuable insights relating to their calls. It also supports audio files so you have flexibility around the type of call recordings you use. Generated call transcripts and insights include conversation summary, sentiment, a list of logged actions, and a set of suggested next best actions. These insights are stored in a central repository, unlocking the ability for analytics teams to have a single view of interactions and use the data to formulate better sales and support strategies.
Organizations typically can’t predict their call patterns, so the solution relies on AWS serverless services to scale during busy times. This enables you to keep up with peak demands, but also scale down to reduce costs during times such as seasonal holidays when the sales, engineering, and support teams are away.
This post provides guidance on how you can create a video insights and summarization engine using AWS AI/ML services. We walk through the key components and services needed to build the end-to-end architecture, offering example code snippets and explanations for each critical element that help achieve the core functionality. This approach should enable you to understand the underlying architectural concepts and provides flexibility for you to either integrate these into existing workloads or use them as a foundation to build a new workload.
Solution overview
The following diagram illustrates the pipeline for the video insights and summarization engine.

To enable the video insights solution, the architecture uses a combination of AWS services, including the following:

Amazon API Gateway is a fully managed service that makes it straightforward for developers to create, publish, maintain, monitor, and secure APIs at scale.
Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI.
Amazon DynamoDB is a fully managed NoSQL database service that provides fast and predictable performance with seamless scalability.
AWS Lambda is an event-driven compute service that lets you run code for virtually any type of application or backend service without provisioning or managing servers. You can invoke Lambda functions from over 200 AWS services and software-as-a-service (SaaS) applications.
Amazon Simple Storage Service (Amazon S3) is an object storage service offering industry-leading scalability, data availability, security, and performance. You can use Amazon S3 to securely store objects and also serve static websites.
Amazon Transcribe is an automatic speech recognition (ASR) service that makes it straightforward for developers to add speech-to-text capability to their applications.

For integration between services, we use API Gateway as an event trigger for our Lambda function, and DynamoDB as a highly scalable database to store our customer details. Finally, video or audio files uploaded are stored securely in an S3 bucket.
The end-to-end solution for the video insights and summarization engine starts with the UI. We build a simple static web application hosted in Amazon S3 and deploy an Amazon CloudFront distribution to serve the static website for low latency and high transfer speeds. We use CloudFront origin access control (OAC) to secure Amazon S3 origins and permit access to the designated CloudFront distributions only. With Amazon Cognito, we are able to protect the web application from unauthenticated users.
We use API Gateway as the entry point for real-time communications between the frontend and backend of the video insights and summarization engine, while controlling access using Amazon Cognito as the authorizer. With Lambda integration, we can create a web API with an endpoint to the Lambda function.
To start the workflow, upload a raw video file directly into an S3 bucket with the pre-signed URL given through API Gateway and a Lambda function. The updated video is fed into Amazon Transcribe, which converts the speech of the video into a video transcript in text format. Finally, we use large language models (LLMs) available through Amazon Bedrock to summarize the video transcript and extract insights from the video content.
The solution stores uploaded videos and video transcripts in Amazon S3, which offers durable, highly available, and scalable data storage at a low cost. We also store the video summaries, sentiments, insights, and other workflow metadata in DynamoDB, a NoSQL database service that allows you to quickly keep track of the workflow status and retrieve relevant information from the original video.
We also use Amazon CloudWatch and Amazon EventBridge to monitor every component of the workflow in real time and respond as necessary.
AI/ML workflow
In this post, we focus on the workflow using AWS AI/ML services to generate the summarized content and extract insights from the video transcript.
Starting with the Amazon Transcribe StartTranscriptionJob API, we transcribe the original video stored in Amazon S3 into a JSON file. The following code shows an example of this using Python:

job_args = {
‘TranscriptionJobName’: jobId,
‘Media’: {‘MediaFileUri’: media_uri},
‘MediaFormat’: media_format,
‘LanguageCode’: language_code,
‘Subtitles’: {‘Formats’: [‘srt’]},
‘OutputBucketName’: output_bucket_name,
‘OutputKey’: jobId + “.json”
}
if vocabulary_name is not None:
job_args[‘Settings’] = {‘VocabularyName’: vocabulary_name}
response = transcribe_client.start_transcription_job(**job_args)

The following is an example of our workload’s Amazon Transcribe output in JSON format:

{
“jobName”: “a37f0f27-0908-45eb-8d98-8efc3a9d4590-1698392975”,
“accountId”: “8469761*****”,
“results”: {
“transcripts”: [{
“transcript”: “Thank you for calling, my name is Ivy. Can I have your name?…”}],
“items”: [{
“start_time”: “7.809”,”end_time”: “8.21”,
“alternatives”: [{
“confidence”: “0.998”,”content”: “Thank”}],
“type”: “pronunciation”
},

]
},
“status”: “COMPLETED”
}

As the output from Amazon Transcribe is created and stored in Amazon S3, we use Amazon S3 Event Notifications to invoke an event to a Lambda function when the transcription job is finished and a video transcript file object has been created.
In the next step of the workflow, we use LLMs available through Amazon Bedrock. LLMs are neural network-based language models containing hundreds of millions to over a trillion parameters. The ability to generate content has resulted in LLMs being widely utilized for use cases such as text generation, summarization, translation, sentiment analysis, conversational chatbots, and more. For this solution, we use Anthropic’s Claude 3 on Amazon Bedrock to summarize the original text, get the sentiment of the conversation, extract logged actions, and suggest further actions for the sales team. In Amazon Bedrock, you can also use other LLMs for text summarization such as Amazon Titan, Meta Llama 3, and others, which can be invoked using the Amazon Bedrock API.
As shown in the following Python code to summarize the video transcript, you can call the InvokeEndpoint API to invoke the specified Amazon Bedrock model to run inference using the input provided in the request body:

modelId = ‘anthropic.claude-3-sonnet-20240229-v1:0’
accept = ‘application/json’
contentType = ‘application/json’

prompt_template = “””
The following is the transcript from one of our sales representatives and our customer.
The AI is a tool that the sales representative uses to obtain a brief summary of what the conversation was about. The AI based this summary on the contents of the conversation and does not make up events that did not happen.
The transcript is:

{}

What is the 2 paragraphs summary of the conversation?
“””

PROMPT = prompt_template.format(raw_text)

body = json.dumps(
{
“messages”: [
{
“role”: “user”,
“content”: [
{“type”: “text”, “text”: PROMPT}
],
}
],
“anthropic_version”: “bedrock-2023-05-31”,
“max_tokens”: 512,
“temperature”: 0.1,
“top_p”: 0.9
}
)
response = bedrock.invoke_model(body=body, modelId=modelId, accept=accept, contentType=contentType)
response_body = json.loads(response[“body”].read())
summary = response_body[“content”][0][“text”]

You can invoke the endpoint with different parameters defined in the payload to impact the text summarization:

temperature – temperature is used in text generation to control the level of randomness of the output. A lower temperature value results in a more conservative and deterministic output; a higher temperature value encourages more diverse and creative outputs.
top_p – top_p, also known as nucleus sampling, is another parameter to control the diversity of the summaries text. It indicates the cumulative probability threshold to select the next token during the text generation process. Lower values of top_p result in a narrower selection of tokens with high probabilities, leading to more deterministic outputs. Conversely, higher values of top_p introduce more randomness and diversity into the generated summaries.

Although there’s no universal optimal combination of top_p and temperature for all scenarios, in the preceding code, we demonstrate sample values with high top_p and low temperature in order to generate summaries focused on key information, maintaining fidelity to the original video transcript while still introducing some degree of wording variation.
The following is another example of using the Anthropic’s Claude 3 model through the Amazon Bedrock API to provide suggested actions to sales representatives based on the video transcript:

prompt_template = “””
The following is the transcript from one of our sales representatives and our customer.
The AI is a tool that the sales representative uses to look into what additional actions they can use to increase sales after the session. The AI bases the suggested actions on the contents of the conversation and what it thinks might help increase the customers satisfaction and loyalty.

The transcript is:

{}

Using the transcript above, provide a bullet point format for suggested actions the sales representative could do to increase follow on sales.
“””

PROMPT = prompt_template.format(raw_text)

body = json.dumps(
{
“messages”: [
{
“role”: “user”,
“content”: [
{“type”: “text”, “text”: PROMPT}
],
}
],
“anthropic_version”: “bedrock-2023-05-31”,
“max_tokens”: 1024,
“temperature”: 0.1,
“top_p”: 0.9
}
)

response = bedrock.invoke_model(body=body, modelId=modelId, accept=accept, contentType=contentType)
response_body = json.loads(response[“body”].read())
suggested_actions = response_body[“content”][0][“text”]

After we successfully generate video summaries, sentiments, logged actions, and suggested actions from the original video transcript, we store these insights in a DynamoDB table, which is then updated in the UI through API Gateway.
The following screenshot shows a simple UI for the video insights and summarization engine. The frontend is built on Cloudscape, an open source design system for the cloud. On average, it takes less than 5 minutes and costs no more than $2 to process 1 hour of video, assuming the video’s transcript contains approximately 8,000 words.

Future improvements
The solution in this post shows how you can use AWS services with Amazon Bedrock to build a cost-effective and powerful generative AI application that allows you to analyze video content and extract insights to help teams become more efficient. This solution is just the beginning of the value you can unlock with AWS generative AI and broader ML services.
One example of how this solution could be taken further is to expand the scope to help tackle some of the logged actions from calls. The addition of services such as Amazon Bedrock Agents could help automate some of the responses, such as forwarding relevant documentation like product specifications, price lists, or even a simple recap email. All of these could save effort and time, enabling you to focus more on value-added activities.
Similarly, the centralization of all this data could allow you to create an analytics layer on top of a centralized database to help formulate more effective sales and support strategies. This data is usually lost or misplaced within organizations because people prefer different methods for note collection. The proposed solution gives you the freedom to centralize data but also augment organization data with the voice of the customer. For example, the analytics team could analyze what employees did well in calls that have a positive sentiment and offer training or guidance to help everyone achieve more positive customer interactions.
Conclusion
In this post, we described how to create a solution that ingests video and audio files to create powerful, actionable, and accurate insights that an organization can use through the power of Amazon Bedrock generative AI capabilities on AWS. The insights provided can help reduce the undifferentiated heavy lifting that customer-facing teams encounter, and also provide a centralized dataset of customer conversations that an organization can use to further improve performance.
For further information on how you can use Amazon Bedrock for your workloads, see Amazon Bedrock.

About the Authors
Simone Zucchet is a Solutions Architect Manager at AWS. With over 6 years of experience as a Cloud Architect, Simone enjoys working on innovative projects that help transform the way organizations approach business problems. He helps support large enterprise customers at AWS and is part of the Machine Learning TFC. Outside of his professional life, he enjoys working on cars and photography.
Vu San Ha Huynh is a Solutions Architect at AWS. He has a PhD in computer science and enjoys working on different innovative projects to help support large enterprise customers.
Adam Raffe is a Principal Solutions Architect at AWS. With over 8 years of experience in cloud architecture, Adam helps large enterprise customers solve their business problems using AWS.
Ahmed Raafat is a Principal Solutions Architect at AWS, with 20 years of field experience and a dedicated focus of 6 years within the AWS ecosystem. He specializes in AI/ML solutions. His extensive experience spans various industry verticals, making him a trusted advisor for numerous enterprise customers, helping them seamlessly navigate and accelerate their cloud journey.
Go to Source
30/10/2024 – 16:55 /Simone Zucchet
Twitter: @hoffeldtcom

AWS Machine Learning Blog

Enterprises in industries like manufacturing, finance, and healthcare are inundated with a constant flow of documents—from financial reports and contracts to patient records and supply chain documents. Historically, processing and extracting insights from these unstructured data sources has been a manual, time-consuming, and error-prone task. However, the rise of intelligent document processing (IDP), which uses the power of artificial intelligence and machine learning (AI/ML) to automate the extraction, classification, and analysis of data from various document types is transforming the game. For manufacturers, this means streamlining processes like purchase order management, invoice processing, and supply chain documentation. Financial services firms can accelerate workflows around loan applications, account openings, and regulatory reporting. And in healthcare, IDP revolutionizes patient onboarding, claims processing, and medical record keeping.
By integrating IDP into their operations, organizations across these key industries experience transformative benefits: increased efficiency and productivity through the reduction of manual data entry, improved accuracy and compliance by reducing human errors, enhanced customer experiences due to faster document processing, greater scalability to handle growing volumes of documents, and lower operational costs associated with document management.
This post demonstrates how to build an IDP pipeline for automatically extracting and processing data from documents using Amazon Bedrock Prompt Flows, a fully managed service that enables you to build generative AI workflow using Amazon Bedrock and other services in an intuitive visual builder. Amazon Bedrock Prompt Flows allows you to quickly update your pipelines as your business changes, scaling your document processing workflows to help meet evolving demands.
Solution overview
To be scalable and cost-effective, this solution uses serverless technologies and managed services. In addition to Amazon Bedrock Prompt Flows, the solution uses the following services:

Amazon Textract – Automatically extracts printed text, handwriting, and data from
Amazon Simple Storage Service (Amazon S3) – Object storage built to retrieve data from anywhere.
Amazon Simple Notification Service (Amazon SNS) – A highly available, durable, secure, and fully managed publish-subscribe (pub/sub) messaging service to decouple microservices, distributed systems, and serverless applications.
AWS Lambda – A compute service that runs code in response to triggers such as changes in data, changes in application state, or user actions. Because services such as Amazon S3 and Amazon SNS can directly trigger an AWS Lambda function, you can build a variety of real-time serverless data-processing systems.
Amazon DynamoDB – a serverless, NoSQL, fully-managed database with single-digit millisecond performance at

Solution architecture
The solution proposed contains the following steps:

Users upload a PDF for analysis to Amazon S3.
The Amazon S3 upload triggers an AWS Lambda function execution.
The function invokes Amazon Textract to extract text from the PDF in batch mode.
Amazon Textract sends an SNS notification when the job is complete.
An AWS Lambda function reads the Amazon Textract response and calls an Amazon Bedrock prompt flow to classify the document.
Results of the classification are stored in Amazon S3 and sent to a destination AWS Lambda function.
The destination AWS Lambda function calls an Amazon Bedrock prompt flow to extract and analyze data based on the document class provided.
Results of the extraction and analysis are stored in Amazon S3.

This workflow is shown in the following diagram.

In the following sections, we dive deep into how to build your IDP pipeline with Amazon Bedrock Prompt Flows.
Prerequisites
To complete the activities described in this post, ensure that you complete the following prerequisites in your local environment:

An AWS account with sufficient permissions to access the console and execute CLI commands.
Install and configure the AWS Command Line Interface (AWS CLI).
Install the AWS Serverless Application Model Command Line Interface (AWS SAM CLI).
Access to an AWS Region that supports Amazon Bedrock Prompt Flows.
To gain model access to Anthropic Claude 3 Sonnet on Amazon Bedrock, follow the instructions at Access Amazon Bedrock foundation models.

Implementation time and cost estimation

Time to complete
~ 60 minutes

Cost to run 1000 pages
Under $25

Time to cleanup
~20 minutes

Learning level
Advanced (300)

Deploy the solution
To deploy the solution, follow these steps:

Clone the GitHub repository
Use the shell script to build and deploy the solution by running the following commands from your project root directory:

chmod +x deploy.sh
./deploy.sh

This will trigger the AWS CloudFormation template in your AWS account.

Test the solution
Once the template is deployed successfully, follow these steps to test the solution:

On the AWS CloudFormation console, select the stack that was deployed
Select the Resources tab
Locate the resources labeled SourceS3Bucket and DestinationS3Bucket, as shown in the following screenshot. Select the link to open the SourceS3Bucket in a new tab

Select Upload and then Add folder
Under sample_files, select the folder customer123, then choose Upload

Alternatively, you can upload the folder using the following AWS CLI command from the root of the project:

aws s3 sync ./sample_files/customer123 s3://[SourceS3Bucket_NAME]/customer123

After a few minutes the uploaded files will be processed. To view the results, follow these steps:

Open the DestinationS3Bucket
Under customer123, you should see a folder for documents for the processing jobs. Download and review the files locally using the console or with the following AWS CLI command

aws s3 sync s3://[DestinationS3Bucket_NAME]/customer123 ./result_files/customer123

Inside the folder for customer123 you will see several subfolders, as shown in the following diagram:

customer123
└── [Long Textract Job ID]
├── classify_response.txt
├── input_doc.txt
└── FOR_REVIEW
├── pages_0.txt
└── report.txt
└── [Long Textract Job ID]
├── classify_response.txt
├── input_doc.txt
└── URLA_1003
├── pages_0.json
├── pages_0.txt
└── report.txt
└── [Long Textract Job ID]
├── classify_response.txt
├── input_doc.txt
└── BANK_STATEMENT
├── pages_0.json
├── pages_0.txt
└── report.txt
└── [Long Textract Job ID]
├── classify_response.txt
├── input_doc.txt
└── DRIVERS_LICENSE
├── pages_0.json
├── pages_0.txt
└── report.txt

How it works
After the document text is extracted, it is sent to a classify prompt flow along with a list of classes, as shown in the following screenshot:

The list of classes is generated in the AWS Lambda function by using the API to identify existing prompt flows that contain class definitions in their description. This approach allows us to expand the solution to new document types by adding a new prompt flow supporting the new document class, as shown in the following screenshot:

For each document type, you can implement an extract and analyze flow that is appropriate to this document type. The following screenshot shows an example flow from the URLA_1003 flow. In this case, a prompt is used to convert the text to a standardized JSON format, and a second prompt then analyzes that JSON document to generate a report to the processing agent.

Expand the solution using Amazon Bedrock Prompt Flows
To adapt to new use cases without changing the underlying code, use Amazon Bedrock Prompt Flows as described in the following steps.
Create a new prompt
From the files you downloaded, look for a folder named FOR_REVIEW. This folder contains documents that were processed and did not fit into an existing class. Open report.txt and review the suggested document class and proposed JSON template.

In the navigation pane in Amazon Bedrock, open Prompt management and select Create prompt, as shown in the following screenshot:

Name the new prompt IDP_PAYSTUB_JSON and then choose Create
In the Prompt box, enter the following text. Replace COPY YOUR JSON HERE with the JSON template from your txt file

Analyze the provided paystub

{{doc_text}}

Provide a structured JSON object containing the following information:

[COPY YOUR JSON HERE]

The following screenshot demonstrates this step.

Choose Select model and choose Anthropic Claude 3 Sonnet
Save your changes by choosing Save draft
To test your prompt, open the pages_[n].txt file FOR_REVIEW folder and copy the content into the doc_text input box. Choose Run and the model should return a response

The following screenshot demonstrates this step.

When you are satisfied with the results, choose Create Version. Note the version number because you will need it in the next section

Create a prompt flow
Now we will create a prompt flow using the prompt you created in the previous section.

In the navigation menu, choose Prompt flows and then choose Create prompt flow, as shown in the following screenshot:

Name the new flow IDP_PAYSTUB
Choose Create and use a new service role and then choose Save

Next, create the flow using the following steps. When you are done, the flow should resemble the following screenshot.

Configure the Flow input node:

Choose the Flow input node and select the Configure
Select Object as the Type. This means that flow invocation will expect to receive a JSON object.

Add the S3 Retrieval node:

In the Prompt flow builder navigation pane, select the Nodes tab
Drag an S3 Retrieval node into your flow in the center pane
In the Prompt flow builder pane, select the Configure tab
Enter get_doc_text as the Node name
Expand the Inputs Set the input express for objectKey to $.data.doc_text_s3key
Drag a connection from the output of the Flow input node to the objectKey input of this node

Add the Prompt node:

Drag a Prompt node into your flow in the center pane
In the Prompt flow builder pane, select the Configure tab
Enter map_to_json as the Node name
Choose Use a prompt from your Prompt Management
Select IDP_PAYSTUB_JSON from the dropdown
Choose the version you noted previously
Drag a connection from the output of the get_doc_text node to the doc_text input of this node

Add the S3 Storage node:

In the Prompt flow builder navigation pane, select the Nodes tab
Drag an S3 Storage node into your flow in the center pane
In the Prompt flow builder pane, select the Configure tab in
Enter save_json as the Node name
Expand the Inputs Set the input express for objectKey to $.data.JSON_s3key
Drag a connection from the output of the Flow input node to the objectKey input of this node
Drag a connection from the output of the map_to_json node to the content input of this node

Configure the Flow output node:

Drag a connection from the output of the save_json node to the input of this node

Choose Save to save your flow. Your flow should now be prepared for testing

To test your flow, in the Test prompt flow pane on the right, enter the following JSON object. Choose Run and the flow should return a model response
When you are satisfied with the result, choose Save and exit

{
“doc_text_s3key”: “[PATH TO YOUR TEXT FILE IN S3].txt”,
“JSON_s3key”: “[PATH TO YOUR TEXT FILE IN S3].json”
}

To get the path to your file, follow these steps:

Navigate to FOR_REVIEW in S3 and choose the pages_[n].txt file
Choose the Properties tab
Copy the key path by selecting the copy icon to the left of the key value, as shown in the following screenshot. Be sure to replace .txt with .json in the second line of input as noted previously.

Publish a version and alias

On the flow management screen, choose Publish version. A success banner appears at the top
At the top of the screen, choose Create alias
Enter latest for the Alias name
Choose Use an existing version to associate this alias. From the dropdown menu, choose the version that you just published
Select Create alias. A success banner appears at the top.
Get the FlowId and AliasId to use in the step below

Choose the Alias you just created
From the ARN, copy the FlowId and AliasId

Add your new class to DynamoDB

Open the AWS Management Console and navigate to the DynamoDB service.
Select the table document-processing-bedrock-prompt-flows-IDP_CLASS_LIST
Choose Actions then Create item
Choose JSON view for entering the item data.
Paste the following JSON into the editor:

{
“class_name”: {
“S”: “PAYSTUB”
},
“expected_inputs”: {
“S”: “Should contain Gross Pay, Net Pay, Pay Date ”
},
“flow_alias_id”: {
“S”: “[Your flow Alias ID]”
},
“flow_id”: {
“S”: “[Your flow ID]”
},
“flow_name”: {
“S”: “[The name of your flow]”
}
}

Review the JSON to ensure all details are correct.
Choose Create item to add the new class to your DynamoDB table.

Test by repeating the upload of the test file
Use the console to repeat the upload of the paystub.jpg file from your customer123 folder into Amazon S3. Alternatively, you can enter the following command into the command line:

aws s3 cp ./sample_files/customer123/paystub.jpeg s3://[INPUT_BUCKET_NAME]/customer123/

In a few minutes, check the report in the output location to see that you successfully added support for the new document type.
Clean up
Use these steps to delete the resources you created to avoid incurring charges on your AWS account:

Empty the SourceS3Bucket and DestinationS3Bucket buckets including all versions
Use the following shell script to delete the CloudFormation stack and test resources from your account:

chmod +x cleanup.sh
./cleanup.sh

Return to the Expand the solution using Amazon Bedrock Prompt Flows section and follow these steps:

In the Create a prompt flow section:

Choose the flow idp_paystub that you created and choose Delete
Follow the instructions to permanently delete the flow

In the Create a new prompt section:

Choose the prompt paystub_json that you created and choose Delete
Follow the instructions to permanently delete the prompt

Conclusion
This solution demonstrates how customers can use Amazon Bedrock Prompt Flows to deploy and expand a scalable, low-code IDP pipeline. By taking advantage of the flexibility of Amazon Bedrock Prompt Flows, organizations can rapidly implement and adapt their document processing workflows to help meet evolving business needs. The low-code nature of Amazon Bedrock Prompt Flows makes it possible for business users and developers alike to create, modify, and extend IDP pipelines without extensive programming knowledge. This significantly reduces the time and resources required to deploy new document processing capabilities or adjust existing ones.
By adopting this integrated IDP solution, businesses across industries can accelerate their digital transformation initiatives, improve operational efficiency, and enhance their ability to extract valuable insights from document-based processes, driving significant competitive advantages.
Review your current manual document processing processes and identify where Amazon Bedrock Prompt Flows can help you automate these workflows for your business.
For further exploration and learning, we recommend checking out the following resources:

AWS Prompt engineering guidelines
Implementing advanced prompt engineering with Amazon Bedrock
Intelligent Document Processing with AWS AI Services

About the Authors
Erik Cordsen is a Solutions Architect at AWS serving customers in Georgia. He is passionate about applying cloud technologies and ML to solve real life problems. When he is not designing cloud solutions, Erik enjoys travel, cooking, and cycling.
Vivek Mittal is a Solution Architect at Amazon Web Services. He is passionate about serverless and machine learning technologies. Vivek takes great joy in assisting customers with building innovative solutions on the AWS cloud.
Brijesh Pati is an Enterprise Solutions Architect at AWS. His primary focus is helping enterprise customers adopt cloud technologies for their workloads. He has a background in application development and enterprise architecture and has worked with customers from various industries such as sports, finance, energy, and professional services. His interests include serverless architectures and AI/ML.
Go to Source
30/10/2024 – 16:55 /Erik Cordsen
Twitter: @hoffeldtcom

AWS Machine Learning Blog

This post is part of an ongoing series on governing the machine learning (ML) lifecycle at scale. To start from the beginning, refer to Governing the ML lifecycle at scale, Part 1: A framework for architecting ML workloads using Amazon SageMaker.
A multi-account strategy is essential not only for improving governance but also for enhancing security and control over the resources that support your organization’s business. This approach enables various teams within your organization to experiment, innovate, and integrate more rapidly while keeping the production environment secure and available for your customers. However, because multiple teams might use your ML platform in the cloud, monitoring large ML workloads across a scaling multi-account environment presents challenges in setting up and monitoring telemetry data that is scattered across multiple accounts. In this post, we dive into setting up observability in a multi-account environment with Amazon SageMaker.
Amazon SageMaker Model Monitor allows you to automatically monitor ML models in production, and alerts you when data and model quality issues appear. SageMaker Model Monitor emits per-feature metrics to Amazon CloudWatch, which you can use to set up dashboards and alerts. You can use cross-account observability in CloudWatch to search, analyze, and correlate cross-account telemetry data stored in CloudWatch such as metrics, logs, and traces from one centralized account. You can now set up a central observability AWS account and connect your other accounts as sources. Then you can search, audit, and analyze logs across your applications to drill down into operational issues in a matter of seconds. You can discover and visualize operational and model metrics from many accounts in a single place and create alarms that evaluate metrics belonging to other accounts.
AWS CloudTrail is also essential for maintaining security and compliance in your AWS environment by providing a comprehensive log of all API calls and actions taken across your AWS account, enabling you to track changes, monitor user activities, and detect suspicious behavior. This post also dives into how you can centralize CloudTrail logging so that you have visibility into user activities within all of your SageMaker environments.
Solution overview
Customers often struggle with monitoring their ML workloads across multiple AWS accounts, because each account manages its own metrics, resulting in data silos and limited visibility. ML models across different accounts need real-time monitoring for performance and drift detection, with key metrics like accuracy, CPU utilization, and AUC scores tracked to maintain model reliability.
To solve this, we implement a solution that uses SageMaker Model Monitor and CloudWatch cross-account observability. This approach enables centralized monitoring and governance, allowing your ML team to gain comprehensive insights into logs and performance metrics across all accounts. With this unified view, your team can effectively monitor and manage their ML workloads, improving operational efficiency.
Implementing the solution consists of the following steps:

Deploy the model and set up SageMaker Model Monitor.
Enable CloudWatch cross-account observability.
Consolidate metrics across source accounts and build unified dashboards.
Configure centralized logging to API calls across multiple accounts using CloudTrail.

The following architecture diagram showcases the centralized observability solution in a multi-account setup. We deploy ML models across two AWS environments, production and test, which serve as our source accounts. We use SageMaker Model Monitor to assess these models’ performance. Additionally, we enhance centralized management and oversight by using cross-account observability in CloudWatch to aggregate metrics from the ML workloads in these source accounts into the observability account.

Deploy the model and set up SageMaker Model Monitor
We deploy an XGBoost classifier model, trained on publicly available banking marketing data, to identify potential customers likely to subscribe to term deposits. This model is deployed in both production and test source accounts, where its real-time performance is continually validated against baseline metrics using SageMaker Model Monitor to detect deviations in model performance. Additionally, we use CloudWatch to centralize and share the data and performance metrics of these ML workloads in the observability account, providing a comprehensive view across different accounts. You can find the full source code for this post in the accompanying GitHub repo.
The first step is to deploy the model to an SageMaker endpoint with data capture enabled:
endpoint_name = f”BankMarketingTarget-endpoint-{datetime.utcnow():%Y-%m-%d-%H%M}”
print(“EndpointName =”, endpoint_name)

data_capture_config = DataCaptureConfig(
enable_capture=True, sampling_percentage=100, destination_s3_uri=s3_capture_upload_path)

model.deploy(
initial_instance_count=1,
instance_type=”ml.m4.xlarge”,
endpoint_name=endpoint_name,
data_capture_config=data_capture_config,)
For real-time model performance evaluation, it’s essential to establish a baseline. This baseline is created by invoking the endpoint with validation data. We use SageMaker Model Monitor to perform baseline analysis, compute performance metrics, and propose quality constraints for effective real-time performance evaluation.
Next, we define the model quality monitoring object and run the model quality monitoring baseline job. The model monitor automatically generates baseline statistics and constraints based on the provided validation data. The monitoring job evaluates the model’s predictions against ground truth labels to make sure the model maintains its performance over time.

Banking_Quality_Monitor = ModelQualityMonitor(
role=role,
instance_count=1,
instance_type=”ml.m5.xlarge”,
volume_size_in_gb=20,
max_runtime_in_seconds=1800,
sagemaker_session=session,
)
job = Banking_Quality_Monitor.suggest_baseline(
job_name=baseline_job_name,
baseline_dataset=baseline_dataset_uri,
dataset_format=DatasetFormat.csv(header=True),
output_s3_uri=baseline_results_uri,
problem_type=”BinaryClassification”,
inference_attribute=”prediction”,
probability_attribute=”probability”,
ground_truth_attribute=”label”,
)
job.wait(logs=False)

In addition to the generated baseline, SageMaker Model Monitor requires two additional inputs: predictions from the deployed model endpoint and ground truth data provided by the model-consuming application. Because data capture is enabled on the endpoint, we first generate traffic to make sure prediction data is captured. When listing the data capture files stored, you should expect to see various files from different time periods, organized based on the hour in which the invocation occurred. When viewing the contents of a single file, you will notice the following details. The inferenceId attribute is set as part of the invoke_endpoint call. When ingesting ground truth labels and merging them with predictions for performance metrics, SageMaker Model Monitor uses inferenceId, which is included in captured data records. It’s used to merge these captured records with ground truth records, making sure the inferenceId in both datasets matches. If inferenceId is absent, it uses the eventId from captured data to correlate with the ground truth record.

{
“captureData”: {
“endpointInput”: {
“observedContentType”: “text/csv”,
“mode”: “INPUT”,
“data”: “162,1,0.1,25,1.4,94.465,-41.8,4.961,0.2,0.3,0.4,0.5,0.6,0.7,0.8,1.1,0.9,0.10,0.11,0.12,0.13,0.14,0.15,1.2,0.16,0.17,0.18,0.19,0.20,1.3”,
“encoding”: “CSV”
},
“endpointOutput”: {
“observedContentType”: “text/csv; charset=utf-8”,
“mode”: “OUTPUT”,
“data”: “0.000508524535689503”,
“encoding”: “CSV”
}
},
“eventMetadata”: {
“eventId”: “527cfbb1-d945-4de8-8155-a570894493ca”,
“inferenceId”: “0”,
“inferenceTime”: “2024-08-18T20:25:54Z”
},
“eventVersion”: “0”
}

SageMaker Model Monitor ingests ground truth data collected periodically and merges it with prediction data to calculate performance metrics. This monitoring process uses baseline constraints from the initial setup to continuously assess the model’s performance. By enabling enable_cloudwatch_metrics=True, SageMaker Model Monitor uses CloudWatch to monitor the quality and performance of our ML models, thereby emitting these performance metrics to CloudWatch for comprehensive tracking.

from sagemaker.model_monitor import CronExpressionGenerator

response = Banking_Quality_Monitor.create_monitoring_schedule(
monitor_schedule_name=Banking_monitor_schedule_name,
endpoint_input=endpointInput,
output_s3_uri=baseline_results_uri,
problem_type=”BinaryClassification”,
ground_truth_input=ground_truth_upload_path,
constraints=baseline_job.suggested_constraints(),
schedule_cron_expression=CronExpressionGenerator.hourly(),
enable_cloudwatch_metrics=True,
)

Each time the model quality monitoring job runs, it begins with a merge job that combines two datasets: the inference data captured at the endpoint and the ground truth data provided by the application. This is followed by a monitoring job that assesses the data for insights into model performance using the baseline setup.

Waiting for execution to finish………………………………………………!
groundtruth-merge-202408182100-7460007b77e6223a3f739740 job status: Completed
groundtruth-merge-202408182100-7460007b77e6223a3f739740 job exit message, if any: None
groundtruth-merge-202408182100-7460007b77e6223a3f739740 job failure reason, if any: None
Waiting for execution to finish………………………………………………!
model-quality-monitoring-202408182100-7460007b77e6223a3f739740 job status: Completed
model-quality-monitoring-202408182100-7460007b77e6223a3f739740 job exit message, if any: CompletedWithViolations: Job completed successfully with 8 violations.
model-quality-monitoring-202408182100-7460007b77e6223a3f739740 job failure reason, if any: None
Execution status is: CompletedWithViolations
{‘MonitoringScheduleName’: ‘BankMarketingTarget-monitoring-schedule-2024-08-18-2029’, ‘ScheduledTime’: datetime.datetime(2024, 8, 18, 21, 0, tzinfo=tzlocal()), ‘CreationTime’: datetime.datetime(2024, 8, 18, 21, 2, 21, 198000, tzinfo=tzlocal()), ‘LastModifiedTime’: datetime.datetime(2024, 8, 18, 21, 12, 53, 253000, tzinfo=tzlocal()), ‘MonitoringExecutionStatus’: ‘CompletedWithViolations’, ‘ProcessingJobArn’: ‘arn:aws:sagemaker:us-west-2:730335512115:processing-job/model-quality-monitoring-202408182100-7460007b77e6223a3f739740’, ‘EndpointName’: ‘BankMarketingTarget-endpoint-2024-08-18-1958’}
====STOP====
No completed executions to inspect further. Please wait till an execution completes or investigate previously reported failures

Check for deviations from the baseline constraints to effectively set appropriate thresholds in your monitoring process. As you can see in the following the screenshot, various metrics such as AUC, accuracy, recall, and F2 score are closely monitored, each subject to specific threshold checks like LessThanThreshold or GreaterThanThreshold. By actively monitoring these metrics, you can detect significant deviations and make informed decisions promptly, making sure your ML models perform optimally within established parameters.

Enable CloudWatch cross-account observability
With CloudWatch integrated into SageMaker Model Monitor to track the metrics of ML workloads running in the source accounts (production and test), the next step involves enabling CloudWatch cross-account observability. CloudWatch cross-account observability allows you to monitor and troubleshoot applications spanning multiple AWS accounts within an AWS Region. This feature enables seamless searching, visualization, and analysis of metrics, logs, traces, and Application Insights across linked accounts, eliminating account boundaries. You can use this feature to consolidate CloudWatch metrics from these source accounts into the observability account.
To achieve this centralized governance and monitoring, we establish two types of accounts:

Observability account – This central AWS account aggregates and interacts with ML workload metrics from the source accounts
Source accounts (production and test) – These individual AWS accounts share their ML workload metrics and logging resources with the central observability account, enabling centralized oversight and analysis

Configure the observability account
Complete the following steps to configure the observability account:

On the CloudWatch console of the observability account, choose Settings in the navigation pane.
In the Monitoring account configuration section, choose Configure.

Select which telemetry data can be shared with the observability account.

Under List source accounts, enter the source accounts that will share data with the observability account.

To link the source accounts, you can use account IDs, organization IDs, or organization paths. You can use an organization ID to include all accounts within the organization, or an organization path can target all accounts within a specific department or business unit. In this case, because we have two source accounts to link, we enter the account IDs of those two accounts.

Choose Configure.

After the setup is complete, the message “Monitoring account enabled” appears in the CloudWatch settings.

Additionally, your source accounts are listed on the Configuration policy tab.

Link source accounts
Now that the observability account has been enabled with source accounts, you can link these source accounts within an AWS organization. You can choose from two methods:

For organizations using AWS CloudFormation, you can download a CloudFormation template and deploy it in a CloudFormation delegated administration account. This method facilitates the bulk addition of source accounts.
For linking individual accounts, two options are available:

Download a CloudFormation template that can be deployed directly within each source account.
Copy a provided URL, which simplifies the setup process using the AWS Management Console.

Complete the following steps to use the provided URL:

Copy the URL and open it in a new browser window where you’re logged in as the source account.

Configure the telemetry data you want to share. This can include logs, metrics, traces, Application Insights, or Internet Monitor.

During this process, you’ll notice that the Amazon Resource Name (ARN) of the observability account configuration is automatically filled in. This convenience is due to copying and pasting the URL provided in the earlier step. If, however, you choose not to use the URL, you can manually enter the ARN. Copy the ARN from the observability account settings and enter it into the designated field in the source account configuration page.

Define the label that identifies your source accounts. This label is crucial for organizing and distinguishing your accounts within the monitoring system.

Choose Link to finalize the connection between your source accounts and the observability account.

Repeat these steps for both source accounts.

You should see those accounts listed on the Linked source accounts tab within the observability account CloudWatch settings configuration.

Consolidate metrics across source accounts and build unified dashboards
In the observability account, you can access and monitor detailed metrics related to your ML workloads and endpoints deployed across the source accounts. This centralized view allows you to track a variety of metrics, including those from SageMaker endpoints and processing jobs, all within a single interface.

The following screenshot displays CloudWatch model metrics for endpoints in your source accounts. Because you linked the production and test source accounts using the label as the account name, CloudWatch categorizes metrics by account label, effectively distinguishing between the production and test environments. It organizes key details into columns, including account labels, metric names, endpoints, and performance metrics like accuracy and AUC, all captured by scheduled monitoring jobs. These metrics offer valuable insights into the performance of your models across these environments.

The observability account allows you to monitor key metrics of ML workloads and endpoints. The following screenshots display CPU utilization metrics associated with the BankMarketingTarget model and BankMarketing model endpoints you deployed in the source accounts. This view provides detailed insights into critical performance indicators, including:

CPU utilization
Memory utilization
Disk utilization

Furthermore, you can create dashboards that offer a consolidated view of key metrics related to your ML workloads running across the linked source accounts. These centralized dashboards are pivotal for overseeing the performance, reliability, and quality of your ML models on a large scale.

Let’s look at a consolidated view of the ML workload metrics running in our production and test source accounts. This dashboard provides us with immediate access to critical information:

AUC scores – Indicating model performance, giving insights into the trade-off between true positives and false positives
Accuracy rates – Showing prediction correctness, which helps in assessing the overall reliability of the model
F2 scores – Offering a balance between precision and recall, particularly valuable when false negatives are more critical to minimize
Total number of violations – Highlighting any breaches in predefined thresholds or constraints, making sure the model adheres to expected behavior
CPU usage levels – Helping you manage resource allocation by monitoring the processing power utilized by the ML workloads
Disk utilization percentages – Providing efficient storage management by keeping track of how much disk space is being consumed

This following screenshots show CloudWatch dashboards for the models deployed in our production and test source accounts. We track metrics for accuracy, AUC, CPU and disk utilization, and violation counts, providing insights into model performance and resource usage.

You can configure CloudWatch alarms to proactively monitor and receive notifications on critical ML workload metrics from your source accounts. The following screenshot shows an alarm configured to track the accuracy of our bank marketing prediction model in the production account. This alarm is set to trigger if the model’s accuracy falls below a specified threshold, so any significant degradation in performance is promptly detected and addressed. By using such alarms, you can maintain high standards of model performance and quickly respond to potential issues within your ML infrastructure.

You can also create a comprehensive CloudWatch dashboard for monitoring various aspects of Amazon SageMaker Studio, including the number of domains, apps, and user profiles across different AWS accounts. The following screenshot illustrates a dashboard that centralizes key metrics from the production and test source accounts.

Configure centralized logging of API calls across multiple accounts with CloudTrail
If AWS Control Tower has been configured to automatically create an organization-wide trail, each account will send a copy of its CloudTrail event trail to a centralized Amazon Simple Storage Service (Amazon S3) bucket. This bucket is typically created in the log archive account and is configured with limited access, where it serves as a single source of truth for security personnel. If you want to set up a separate account to allow the ML admin team to have access, you can configure replication from the log archive account. You can create the destination bucket in the observability account.
After you create the bucket for replicated logs, you can configure Amazon S3 replication by defining the source and destination bucket, and attaching the required AWS Identity and Access Management (IAM) permissions. Then you update the destination bucket policy to allow replication.
Complete the following steps:

Create an S3 bucket in the observability account.
Log in to the log archive account.
On the Amazon S3 console, open the Control Tower logs bucket, which will have the format aws-controltower-logs-{ACCOUNT-ID}-{REGION}.

You should see an existing key that corresponds to your organization ID. The trail logs are stored under /{ORG-ID}/AWSLogs/{ACCOUNT-ID}/CloudTrail/{REGION}/YYYY/MM/DD.

On the Management tab, choose Create replication rule.
For Replication rule name, enter a name, such as replicate-ml-workloads-to-observability.
Under Source bucket, select Limit the scope of the rule using one or more filters, and enter a path the corresponds to the account you want to enable querying against.

Select Specify a bucket in another account and enter the observability account ID and the bucket name.
Select Change object ownership to destination bucket owner.
For IAM role, choose Create new role.

After you set the cross-account replication, the logs being stored in the S3 bucket in the log archive account will be replicated in the observability account. You can now use Amazon Athena to query and analyze the data being stored in Amazon S3. If you don’t have Control Tower configured, you have to manually configure CloudTrail in each account to write to the S3 bucket in the centralized observability account for analysis. If your organization has more stringent security and compliance requirements, you can configure replication of just the SageMaker logs from the log archive account to the bucket in the observability account by integrating Amazon S3 Event Notifications with AWS Lambda functions.
The following is a sample query run against the logs stored in the observability account bucket and the associated result in Athena:
SELECT useridentity.arn, useridentity.sessioncontext.sourceidentity, requestparametersFROM observability_replicated_logs
WHERE eventname = ‘CreateEndpoint’
AND eventsource = ‘sagemaker.amazonaws.com’

Conclusion
Centralized observability in a multi-account setup empowers organizations to manage ML workloads at scale. By integrating SageMaker Model Monitor with cross-account observability in CloudWatch, you can build a robust framework for real-time monitoring and governance across multiple environments.
This architecture not only provides continuous oversight of model performance, but also significantly enhances your ability to quickly identify and resolve potential issues, thereby improving governance and security throughout our ML ecosystem.
In this post, we outlined the essential steps for implementing centralized observability within your AWS environment, from setting up SageMaker Model Monitor to using cross-account features in CloudWatch. We also demonstrated centralizing CloudTrail logs by replicating them from the log archive account and querying them using Athena to get insights into user activity within SageMaker environments across the organization.
As you implement this solution, remember that achieving optimal observability is an ongoing process. Continually refining and expanding your monitoring capabilities is crucial to making sure your ML models remain reliable, efficient, and aligned with business objectives. As ML practices evolve, blending cutting-edge technology with sound governance principles is key. Run the code yourself using the following notebook or try out the observability module in the following workshop.

About the Authors
Abhishek Doppalapudi is a Solutions Architect at Amazon Web Services (AWS), where he assists startups in building and scaling their products using AWS services. Currently, he is focused on helping AWS customers adopt Generative AI solutions. In his free time, Abhishek enjoys playing soccer, watching Premier League matches, and reading.
Venu Kanamatareddy is a Startup Solutions Architect at AWS. He brings 16 years of extensive IT experience working with both Fortune 100 companies and startups. Currently, Venu is helping guide and assist Machine Learning and Artificial Intelligence-based startups to innovate, scale, and succeed.
Vivek Gangasani is a Senior GenAI Specialist Solutions Architect at AWS. He helps emerging GenAI companies build innovative solutions using AWS services and accelerated compute. Currently, he is focused on developing strategies for fine-tuning and optimizing the inference performance of Large Language Models. In his free time, Vivek enjoys hiking, watching movies and trying different cuisines.
Ram Vittal is a Principal ML Solutions Architect at AWS. He has over 3 decades of experience architecting and building distributed, hybrid, and cloud applications. He is passionate about building secure, scalable, reliable AI/ML and big data solutions to help enterprise customers with their cloud adoption and optimization journey to improve their business outcomes. In his spare time, he rides motorcycle and walks with his three-year old sheep-a-doodle!
Go to Source
30/10/2024 – 16:55 /Abhishek Doppalapudi
Twitter: @hoffeldtcom

NIMH News Feed


A new study, funded in part by the National Institute of Mental Health, showed that a new medication derived from ketamine is safe and acceptable for use in humans, setting the stage for clinical trials testing it for hard-to-treat mental disorders like severe depression.
Go to Source
30/10/2024 – 16:55 /National Institute of Mental Health
Twitter: @hoffeldtcom

AWS Machine Learning Blog

Today, we’re pleased to announce the general availability (GA) of Amazon Bedrock Custom Model Import. This feature empowers customers to import and use their customized models alongside existing foundation models (FMs) through a single, unified API. Whether leveraging fine-tuned models like Meta Llama, Mistral Mixtral, and IBM Granite, or developing proprietary models based on popular open-source architectures, customers can now bring their custom models into Amazon Bedrock without the overhead of managing infrastructure or model lifecycle tasks.
Amazon Bedrock is a fully managed service that offers a choice of high-performing FMs from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI. Amazon Bedrock offers a serverless experience, so you can get started quickly, privately customize FMs with your own data, and integrate and deploy them into your applications using AWS tools without having to manage infrastructure.
With Amazon Bedrock Custom Model Import, customers can access their imported custom models on demand in a serverless manner, freeing them from the complexities of deploying and scaling models themselves. They’re able to accelerate generative AI application development by using native Amazon Bedrock tools and features such as Knowledge Bases, Guardrails, Agents, and more—all through a unified and consistent developer experience.
Benefits of Amazon Bedrock Custom Model Import include:

Flexibility to use existing fine-tuned models:Customers can use their prior investments in model customization by importing existing customized models into Amazon Bedrock without the need to recreate or retrain them. This flexibility maximizes the value of previous efforts and accelerates application development.
Integration with Amazon Bedrock Features: Imported custom models can be seamlessly integrated with the native tools and features of Amazon Bedrock, such as Knowledge Bases, Guardrails, Agents, and Model Evaluation. This unified experience enables developers to use the same tooling and workflows across both base FMs and imported custom models.
Serverless: Customers can access their imported custom models in an on-demand and serverless manner. This eliminates the need to manage or scale underlying infrastructure, as Amazon Bedrock handles all those aspects. Customers can focus on developing generative AI applications without worrying about infrastructure management or scalability issues.
Support for popular model architectures: Amazon Bedrock Custom Model Import supports a variety of popular model architectures, including Meta Llama 3.2, Mistral 7B, Mixtral 8x7B, and more. Customers can import custom weights in formats like Hugging Face Safetensors from Amazon SageMaker and Amazon S3. This broad compatibility allows customers to work with models that best suit their specific needs and use cases, allowing for greater flexibility and choice in model selection.
Leverage Amazon Bedrock converse API: Amazon Custom Model Import allows our customers to use their supported fine-tuned models with Amazon Bedrock Converse API which simplifies and unifies the access to the models.

Getting started with Custom Model Import
One of the critical requirements from our customers is the ability to customize models with their proprietary data while retaining complete ownership and control over the tuned model artifact and its deployment. Customization could be in form of domain adaptation or instruction fine-tuning. Customers have a wide degree of options for fine-tuning models efficiently and cost effectively. However, hosting models presents its own unique set of challenges. Customers are looking for some key aspects, namely:

Using the existing customization investment and fine-grained control over customization.
Having a unified developer experience when accessing custom models or base models through Amazon Bedrock’s API.
Ease of deployment through a fully managed, serverless, service.
Using pay-as-you-go inference to minimize the costs of their generative AI workloads.
Be backed by enterprise grade security and privacy tooling.

Amazon Bedrock Custom Model Import feature seeks to address these concerns. To bring your custom model into the Amazon Bedrock ecosystem, you need to run an import job. The import job can be invoked using the AWS Management Console or through APIs. In this post, we demonstrate the code for running the import model process through APIs. After the model is imported, you can invoke the model by using the model’s Amazon Resource Name (ARN).
As of this writing, supported model architectures today include Meta Llama (v.2, 3, 3.1, and 3.2), Mistral 7B, Mixtral 8x7B, Flan and IBM Granite models like Granite 3B-Code, 8B-Code, 20B-Code and 34B-Code.
A few points to be aware of when importing your model:

Models must be serialized in Safetensors format.
If you have a different format, you can potentially use Llama convert scripts or Mistral convert scripts to convert your model to a supported format.
The import process expects at least the following files:.safetensors, json, tokenizer_config.json, tokenizer.json, and tokenizer.model.
The precision for the model weights supported is FP32, FP16, and BF16.
For fine-tuning jobs that create adapters like LoRA-PEFT adapters, the import process expects the adapters to be merged into the main base model weight as described in Model merging.

Importing a model using the Amazon Bedrock console

Go to the Amazon Bedrock console and choose Foundational models and then Imported models from the navigation pane on the left hand side to get to the Models
Click on Import Model to configure the import process.
Configure the model.

Enter the location of your model weights. These can be in Amazon S3 or point to a SageMaker Model ARN object.
Enter a Job name. We recommend this be suffixed with the version of the model. As of now, you need to manage the generative AI operations aspects outside of this feature.
Configure your AWS Key Management Service (AWS KMS) key for encryption. By default, this will default to a key owned and managed by AWS.
Service access role. You can create a new role or use an existing role which will have the necessary permissions to run the import process. The permissions must include access to your Amazon S3 if you’re specifying model weights through S3.

After the Import Model job is complete, you will see the model and the model ARN. Make a note of the ARN to use later.
Test the model using the on-demand feature in the Text playground as you would for any base foundations model.

The import process validates that the model configuration complies with the specified architecture for that model by reading the config.json file and validates the model architecture values such as the maximum sequence length and other relevant details. It also checks that the model weights are in the Safetensors format. This validation verifies that the imported model meets the necessary requirements and is compatible with the system.
Fine tuning a Meta Llama Model on SageMaker
Meta Llama 3.2 offers multi-modal vision and lightweight models, representing Meta’s latest advances in large language models (LLMs). These new models provide enhanced capabilities and broader applicability across various use cases. With a focus on responsible innovation and system-level safety, the Llama 3.2 models demonstrate state-of-the-art performance on a wide range of industry benchmarks and introduce features to help you build a new generation of AI experiences.
SageMaker JumpStart provides FMs through two primary interfaces: SageMaker Studio and the SageMaker Python SDK. This gives you multiple options to discover and use hundreds of models for your use case.
In this section, we’ll show you how to fine-tune the Llama 3.2 3B Instruct model using SageMaker JumpStart. We’ll also share the supported instance types and context for the Llama 3.2 models available in SageMaker JumpStart. Although not highlighted in this post, you can also find other Llama 3.2 Model variants that can be fine-tuned using SageMaker JumpStart.
Instruction fine-tuning
The text generation model can be instruction fine-tuned on any text data, provided that the data is in the expected format. The instruction fine-tuned model can be further deployed for inference. The training data must be formatted in a JSON Lines (.jsonl) format, where each line is a dictionary representing a single data sample. All training data must be in a single folder, but can be saved in multiple JSON Lines files. The training folder can also contain a template.json file describing the input and output formats.
Synthetic dataset
For this use case, we’ll use a synthetically generated dataset named amazon10Ksynth.jsonl in an instruction-tuning format. This dataset contains approximately 200 entries designed for training and fine-tuning LLMs in the finance domain.
The following is an example of the data format:

instruction_sample = {
“question”: “What is Amazon’s plan for expanding their physical store footprint and how will that impact their overall revenue?”,
“context”: “The 10-K report mentions that Amazon is continuing to expand their physical store network, including 611 North America stores and 32 International stores as of the end of 2022. This physical store expansion is expected to contribute to increased product sales and overall revenue growth.”,
“answer”: “Amazon is expanding their physical store footprint, with 611 North America stores and 32 International stores as of the end of 2022. This physical store expansion is expected to contribute to increased product sales and overall revenue growth.”
}

print(instruction_sample)

Prompt template
Next, we create a prompt template for using the data in an instruction input format for the training job (because we are instruction fine-tuning the model in this example), and for inferencing the deployed endpoint.

import json

prompt_template = {
“prompt”: “question: {question} context: {context}”,
“completion”: “{answer}”
}

with open(“prompt_template.json”, “w”) as f:
json.dump(prompt_template, f)

After the prompt template is created, upload the prepared dataset that will be used for fine-tuning to Amazon S3.

from sagemaker.s3 import S3Uploader
import sagemaker
output_bucket = sagemaker.Session().default_bucket()
local_data_file = “amazon10Ksynth.jsonl”
train_data_location = f”s3://{output_bucket}/amazon10Ksynth_dataset”
S3Uploader.upload(local_data_file, train_data_location)
S3Uploader.upload(“prompt_template.json”, train_data_location)
print(f”Training data: {train_data_location}”)

Fine-tuning the Meta Llama 3.2 3B model
Now, we’ll fine-tune the Llama 3.2 3B model on the financial dataset. The fine-tuning scripts are based on the scripts provided by the Llama fine-tuning repository.

from sagemaker.jumpstart.estimator import JumpStartEstimator

estimator = JumpStartEstimator(
model_id=model_id,
model_version=model_version,
environment={“accept_eula”: “true”},
disable_output_compression=True,
instance_type=”ml.g5.12xlarge”,
)

# Set the hyperparameters for instruction tuning
estimator.set_hyperparameters(
instruction_tuned=”True”, epoch=”5″, max_input_length=”1024″
)

# Fit the model on the training data
estimator.fit({“training”: train_data_location})

Importing a custom model from SageMaker to Amazon Bedrock
In this section, we will use a Python SDK to create a model import job, get the imported model ID and finally generate inferences. You can refer to the console screenshots in the earlier section  for how to import a model using the Amazon Bedrock console.
Parameter and helper function set up
First, we’ll create a few helper functions and set up our parameters to create the import job. The import job is responsible for collecting and deploying the model from SageMaker to Amazon Bedrock. This is done by using the create_model_import_job function.
Stored safetensors need to be formatted so that the Amazon S3 location is the top-level folder. The configuration files and safetensors will be stored as shown in the following figure.

import json
import boto3
from botocore.exceptions import ClientError
bedrock = boto3.client(‘bedrock’, region_name=’us-east-1′)
job_name = ‘fine-tuned-model-import-demo’
sagemaker_model_name = ‘meta-textgeneration-llama-3-2-3b-2024-10-12-23-29-57-373’
model_url = {‘s3DataSource’:
{‘s3Uri’:
“s3://sagemaker-{REGION}-{AWS_ACCOUNT}/meta-textgeneration-llama-3-2-3b-2024-10-12-23-19-53-906/output/model/”
}
}

Check the status and get job ARN from the response:
After a few minutes, the model will be imported, and the status of the job can be checked using get_model_import_job. The job ARN is then used to get the imported model ARN, which we will use to generate inferences.

def get_import_model_from_job(job_name):
response = bedrock.get_model_import_job(jobIdentifier=job_name)
return response[‘importedModelArn’]

job_arn = response[‘jobArn’]
import_model_arn = get_import_model_from_job(job_arn)

Generating inferences using the imported custom model
The model can be invoked by using the invoke_model and converse APIs. The following is a support function that will be used to invoke and extract the generated text from the overall output.

from botocore.exceptions import ClientError

client = boto3.client(‘bedrock-runtime’, region_name=’us-east-1′)

def generate_conversation_with_imported_model(native_request, model_id):
request = json.dumps(native_request)
try:
# Invoke the model with the request.
response = client.invoke_model(modelId=model_id, body=request)
model_response = json.loads(response[“body”].read())

response_text = model_response[“outputs”][0][“text”]
print(response_text)
except (ClientError, Exception) as e:
print(f”ERROR: Can’t invoke ‘{model_id}’. Reason: {e}”)
exit(1)

Context set up and model response
Finally, we can use the custom model. First, we format our inquiry to match the fined-tuned prompt structure. This will make sure that the responses generated closely resemble the format used in the fine-tuning phase and are more aligned to our needs. To do this we use the template that we used to format the data used for fine-tuning. The context will be coming from your RAG solutions like Amazon Bedrock Knowledgebases. For this example, we take a sample context and add to demo the concept:

input_output_demarkation_key = “nn### Response:n”
question = “Tell me what was the improved inflow value of cash?”

context = “Amazons free cash flow less principal repayments of finance leases and financing obligations improved to an inflow of $46.1 billion for the trailing twelve months, compared with an outflow of $10.1 billion for the trailing twelve months ended March 31, 2023.”

payload = {
“prompt”: template[0][“prompt”].format(
question=question, # user query
context=context
+ input_output_demarkation_key # rag context
),
“max_tokens”: 100,
“temperature”: 0.01
}
generate_conversation_with_imported_model(payload, import_model_arn)

The output will look similar to:

After the model has been fine-tuned and imported into Amazon Bedrock, you can experiment by sending different sets of input questions and context to the model to generate a response, as shown in the following example:

question: “””How did Amazon’s international segment operating income change
in Q4 2022 compared to the prior year?”””
context: “””Amazon’s international segment reported an operating loss of
$1.1 billion in Q4 2022, an improvement from a $1.7 billion
operating loss in Q4 2021.”””
response:

Some points to note
This examples in this post are to demonstrate Custom Model Import and aren’t designed to be used in production. Because the model has been trained on only 200 samples of synthetically generated data, it’s only useful for testing purposes. You would ideally have more diverse datasets and additional samples with continuous experimentation conducted using hyperparameter tuning for your respective use case, thereby steering the model to create a more desirable output. For this post, ensure that the model temperature parameter is set to 0 and max_tokens run time parameter is set to a lower values such as 100–150 tokens so that a succinct response is generated. You can experiment with other parameters to generate a desirable outcome. See Amazon Bedrock Recipes and GitHub for more examples.
Best practices to consider:
This feature brings significant advantages for hosting your fine-tuned models efficiently. As we continue to develop this feature to meet our customers’ needs, there are a few points to be aware of:

Define your test suite and acceptance metrics before starting the journey. Automating this will help to save time and effort.
Currently, the model weights need to be all-inclusive, including the adapter weights. There are multiple methods for merging the models and we recommend experimenting to determine the right methodology. The Custom Model Import feature lets you test your model on demand.
When creating your import jobs, add versioning to the job name to help quickly track your models. Currently, we’re not offering model versioning, and each import is a unique job and creates a unique model.
The precision supported for the model weights is FP32, FP16, and BF16. Run tests to validate that these will work for your use case.
The maximum concurrency that you can expect for each model will be 16 per account. Higher concurrency requests will cause the service to scale and increase the number of model copies.
The number of model copies active at any point in time will be available through Amazon CloudWatch See Import a customized model to Amazon Bedrock for more information.
As of the writing this post, we are releasing this feature in the US-EAST-1 and US-WEST-2 AWS Regions only. We will continue to release to other Regions. Follow Model support by AWS Region for updates.
The default import quota for each account is three models. If you need more for your use cases, work with your account teams to increase your account quota.
The default throttling limits for this feature for each account will be 100 invocations per second.
You can use this sample notebook to performance test your models imported via this feature. This notebook is mere reference and not designed to be an exhaustive testing. We will always recommend you to run your own full performance testing along with your end to end testing including functional and evaluation testing.

Now available
Amazon Bedrock Custom Model Import is generally available today in Amazon Bedrock in the US-East-1 (N. Virginia) and US-West-2 (Oregon) AWS Regions. See the full Region list for future updates. To learn more, see the Custom Model Import product page and pricing page.
Give Custom Model Import a try in the Amazon Bedrock console today and send feedback to AWS re:Post for Amazon Bedrock or through your usual AWS Support contacts.

About the authors
Paras Mehra is a Senior Product Manager at AWS. He is focused on helping build Amazon SageMaker Training and Processing. In his spare time, Paras enjoys spending time with his family and road biking around the Bay Area.
Jay Pillai is a Principal Solutions Architect at Amazon Web Services. In this role, he functions as the Lead Architect, helping partners ideate, build, and launch Partner Solutions. As an Information Technology Leader, Jay specializes in artificial intelligence, generative AI, data integration, business intelligence, and user interface domains. He holds 23 years of extensive experience working with several clients across supply chain, legal technologies, real estate, financial services, insurance, payments, and market research business domains.
Shikhar Kwatra is a Sr. Partner Solutions Architect at Amazon Web Services, working with leading Global System Integrators. He has earned the title of one of the Youngest Indian Master Inventors with over 500 patents in the AI/ML and IoT domains. Shikhar aids in architecting, building, and maintaining cost-efficient, scalable cloud environments for the organization, and support the GSI partners in building strategic industry solutions on AWS.
Claudio Mazzoni is a Sr GenAI Specialist Solutions Architect at AWS working on world class applications guiding costumers through their implementation of GenAI to reach their goals and improve their business outcomes. Outside of work Claudio enjoys spending time with family, working in his garden and cooking Uruguayan food.
Yanyan Zhang is a Senior Generative AI Data Scientist at Amazon Web Services, where she has been working on cutting-edge AI/ML technologies as a Generative AI Specialist, helping customers leverage GenAI to achieve their desired outcomes. Yanyan graduated from Texas A&M University with a Ph.D. degree in Electrical Engineering. Outside of work, she loves traveling, working out and exploring new things.
Simon Zamarin is an AI/ML Solutions Architect whose main focus is helping customers extract value from their data assets. In his spare time, Simon enjoys spending time with family, reading sci-fi, and working on various DIY house projects.
Rupinder Grewal is a Senior AI/ML Specialist Solutions Architect with AWS. He currently focuses on serving of models and MLOps on Amazon SageMaker. Prior to this role, he worked as a Machine Learning Engineer building and hosting models. Outside of work, he enjoys playing tennis and biking on mountain trails.
Go to Source
22/10/2024 – 10:02 /Paras Mehra
Twitter: @hoffeldtcom

error: Content is protected !!