Create an HCLS document summarization application with Falcon using Amazon SageMaker JumpStart
AWS Machine Learning Blog
Healthcare and life sciences (HCLS) customers are adopting generative AI as a tool to get more from their data. Use cases include document summarization to help readers focus on key points of a document and transforming unstructured text into standardized formats to highlight important attributes. With unique data formats and strict regulatory requirements, customers want a choice of models so they can select the most performant and cost-effective one, as well as the ability to customize (fine-tune) it to fit their business use case. In this post, we walk you through deploying a Falcon large language model (LLM) using Amazon SageMaker JumpStart and using the model to summarize long documents with LangChain and Python.
Solution overview
Amazon SageMaker is built on Amazon’s two decades of experience developing real-world ML applications, including product recommendations, personalization, intelligent shopping, robotics, and voice-assisted devices. SageMaker is a HIPAA-eligible managed service that provides tools that enable data scientists, ML engineers, and business analysts to innovate with ML. Within SageMaker is Amazon SageMaker Studio, an integrated development environment (IDE) purpose-built for collaborative ML workflows. SageMaker Studio in turn contains a wide variety of quickstart solutions and pre-trained ML models in an integrated hub called SageMaker JumpStart. With SageMaker JumpStart, you can use pre-trained models, such as the Falcon LLM, with pre-built sample notebooks and SDK support to experiment with and deploy these powerful transformer models. You can use SageMaker Studio and SageMaker JumpStart to deploy and query your own generative model in your AWS account.
You can also ensure that the inference payload data doesn’t leave your VPC. You can provision models as single-tenant endpoints and deploy them with network isolation. Furthermore, you can curate and manage the set of models that satisfy your own security requirements by using the private model hub capability within SageMaker JumpStart and storing the approved models there. SageMaker is in scope for HIPAA BAA, SOC 1, 2, and 3, and HITRUST CSF.
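As a rough illustration of the network isolation settings mentioned above, the following sketch builds the keyword arguments for the boto3 SageMaker `create_model` call with `EnableNetworkIsolation` and a `VpcConfig`. The helper name `build_isolated_model_request` and every value passed to it (image URI, model artifact location, role ARN, subnet and security group IDs) are hypothetical placeholders. SageMaker JumpStart normally fills in these details for you, so treat this as a sketch under those assumptions and not as the deployment path this post uses.

```python
def build_isolated_model_request(
    model_name,
    image_uri,
    model_data_url,
    role_arn,
    subnet_ids,
    security_group_ids,
):
    """Build keyword arguments for the SageMaker create_model API.

    The returned dict asks SageMaker to run the model container inside
    the given VPC subnets with network isolation enabled, so the
    container cannot make outbound network calls.
    All argument values are supplied by the caller (placeholders here).
    """
    if not subnet_ids or not security_group_ids:
        raise ValueError("At least one subnet and one security group are required")

    return {
        "ModelName": model_name,
        "PrimaryContainer": {
            "Image": image_uri,
            "ModelDataUrl": model_data_url,
        },
        "ExecutionRoleArn": role_arn,
        "VpcConfig": {
            "Subnets": list(subnet_ids),
            "SecurityGroupIds": list(security_group_ids),
        },
        "EnableNetworkIsolation": True,
    }
```

With real values in place, the request could be passed to `boto3.client("sagemaker").create_model(**request)`. Keeping the request construction separate from the API call makes the settings easy to review before anything is created in your account.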
The Falcon LLM is a large language model trained by researchers at Technology Innovation Institute (TII) on over 1 trillion tokens using AWS. Falcon comes in several variations. The two main ones, Falcon 40B and Falcon 7B, have 40 billion and 7 billion parameters, respectively, and fine-tuned versions are available for specific tasks, such as following instructions. Falcon performs well on a variety of tasks, including text summarization, sentiment analysis, question answering, and conversing. This post provides a walkthrough that you can follow to deploy the Falcon LLM into your AWS account, using a managed notebook instance through SageMaker JumpStart to experiment with text summarization.
The SageMaker JumpStart model hub includes complete notebooks to deploy and query each model. As of this writing, there are six versions of Falcon available in the SageMaker JumpStart model hub: Falcon 40B Instruct BF16, Falcon 40B BF16, Falcon 180B BF16, Falcon 180B Chat BF16, Falcon 7B Instruct BF16, and Falcon 7B BF16. This post uses the Falcon 7B Instruct model.
In the following sections, we show how to get started with document summarization by deploying Falcon 7B on SageMaker JumpStart.
Prerequisites
For this tutorial, you’ll need an AWS account with a SageMaker domain. If you don’t already have a SageMaker domain, refer to Onboard to Amazon SageMaker Domain to create one.
Deploy Falcon 7B using SageMaker JumpStart
To deploy your model, complete the following steps:
Navigate to your SageMaker Studio environment from the SageMaker console.
Within the IDE, under SageMaker JumpStart in the navigation pane, choose Models, notebooks, solutions.
Choose the Falcon 7B Instruct model, which you will deploy to an endpoint for inference.
This will open the model card for the Falcon 7B Instruct BF16 model. On this page, you can find the Deploy or Train options as well as links to open the sample notebooks in SageMaker Studio. This post will use the sample notebook from SageMaker JumpStart to deploy the model.
Choose Open notebook.
Run the first four cells of the notebook to deploy the Falcon 7B Instruct endpoint.
You can see your deployed JumpStart models on the Launched JumpStart assets page.
In the navigation pane, under SageMaker JumpStart, choose Launched JumpStart assets.
Choose the Model endpoints tab to view the status of your endpoint.
With the Falcon LLM endpoint deployed, you are ready to query the model.
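The endpoint status shown on the Model endpoints tab can also be checked from code. The following is a minimal sketch, assuming a SageMaker client such as the one returned by `boto3.client("sagemaker")`. The helper name `wait_until_in_service`, the polling defaults, and the endpoint name in the usage note are illustrative and are not part of the sample notebook.

```python
import time


def wait_until_in_service(sm_client, endpoint_name, poll_seconds=30, max_polls=40):
    """Poll describe_endpoint until the endpoint reports InService.

    sm_client is expected to behave like boto3.client("sagemaker").
    Returns the final status string, raises RuntimeError if the endpoint
    reports Failed, and raises TimeoutError if max_polls is exhausted.
    """
    for _ in range(max_polls):
        response = sm_client.describe_endpoint(EndpointName=endpoint_name)
        status = response["EndpointStatus"]
        if status == "InService":
            return status
        if status == "Failed":
            reason = response.get("FailureReason", "unknown reason")
            raise RuntimeError(f"Endpoint {endpoint_name} failed: {reason}")
        # Still Creating or Updating: wait before asking again.
        time.sleep(poll_seconds)
    raise TimeoutError(f"Endpoint {endpoint_name} not InService after {max_polls} polls")
```

For example, `wait_until_in_service(boto3.client("sagemaker"), "my-falcon-endpoint")` would block until the endpoint is ready, where `my-falcon-endpoint` stands in for the name shown on the Launched JumpStart assets page. Passing the client in as an argument keeps the helper easy to exercise with a stub before pointing it at a real account.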
Run your first query
To run a query, complete the following steps:
On the File menu, choose New and Notebook to open a new notebook.
You can also download the completed notebook here.
Select the image, kernel, and instance type when prompted. For this post, we choose the Data Science 3.0 image, Python 3 kernel, and ml.t3.medium instance.
Import the Boto3 and JSON modules by entering the following two lines into the first cell:
import json
import boto3
Press Shift + Enter to run the cell.
Next, you can define a function that will call your endpoint. This function takes a dictionary payload and uses it to invoke the SageMaker runtime client. Then it deserializes the response and prints the input and generated text.
newline, bold, unbold = '\n', '