GPT-NeoXT-Chat-Base-20B foundation model for chatbot applications is now available on Amazon SageMaker

AWS Machine Learning Blog

Today we are excited to announce that Together Computer’s GPT-NeoXT-Chat-Base-20B language foundation model is available for customers using Amazon SageMaker JumpStart. GPT-NeoXT-Chat-Base-20B is an open-source model for building conversational bots. You can easily try out this model and use it with JumpStart. JumpStart is the machine learning (ML) hub of Amazon SageMaker that provides access to foundation models in addition to built-in algorithms and end-to-end solution templates to help you quickly get started with ML.
In this post, we walk through how to deploy the GPT-NeoXT-Chat-Base-20B model and invoke the model within an OpenChatKit interactive shell. This demonstration provides an open-source foundation model chatbot for use within your application.
JumpStart models use DJL Serving, which uses the Deep Java Library (DJL) with DeepSpeed libraries to optimize models and minimize inference latency. The underlying implementation in JumpStart follows an implementation that is similar to the following notebook. As a JumpStart model hub customer, you get improved performance without having to maintain the model script outside of the SageMaker SDK. JumpStart models also achieve an improved security posture with endpoints that enable network isolation.
Foundation models in SageMaker
JumpStart provides access to a range of models from popular model hubs, including Hugging Face, PyTorch Hub, and TensorFlow Hub, which you can use within your ML development workflow in SageMaker. Recent advances in ML have given rise to a new class of models known as foundation models, which are typically trained on billions of parameters and are adaptable to a wide category of use cases, such as text summarization, generating digital art, and language translation. Because these models are expensive to train, customers want to use existing pre-trained foundation models and fine-tune them as needed, rather than train these models themselves. SageMaker provides a curated list of models that you can choose from on the SageMaker console.
You can now find foundation models from different model providers within JumpStart, enabling you to get started with foundation models quickly. You can find foundation models based on different tasks or model providers, and easily review model characteristics and usage terms. You can also try out these models using a test UI widget. When you want to use a foundation model at scale, you can do so easily without leaving SageMaker by using pre-built notebooks from model providers. Because the models are hosted and deployed on AWS, you can rest assured that your data, whether used for evaluating or using the model at scale, is never shared with third parties.
GPT-NeoXT-Chat-Base-20B foundation model
Together Computer developed GPT-NeoXT-Chat-Base-20B, a 20-billion-parameter language model fine-tuned from EleutherAI’s GPT-NeoX model with over 40 million instructions, focusing on dialog-style interactions. Additionally, the model is tuned on several tasks, such as question answering, classification, extraction, and summarization. The model is based on the OIG-43M dataset, which was created in collaboration with LAION and Ontocord.
In addition to the aforementioned fine-tuning, GPT-NeoXT-Chat-Base-20B-v0.16 has also undergone further fine-tuning via a small amount of feedback data. This allows the model to better adapt to human preferences in conversations. GPT-NeoXT-Chat-Base-20B is designed for use in chatbot applications and may not perform well for use cases outside its intended scope. Together Computer, Ontocord, and LAION collaborated to release OpenChatKit, an open-source alternative to ChatGPT with a comparable set of capabilities. OpenChatKit was launched under an Apache-2.0 license, granting complete access to the source code, model weights, and training datasets. OpenChatKit excels at several tasks out of the box, including summarization, extraction of structured information from unstructured documents, and classification of a sentence or paragraph into different categories.
Let’s explore how we can use the GPT-NeoXT-Chat-Base-20B model in JumpStart.
Solution overview
You can find the code showing the deployment of GPT-NeoXT-Chat-Base-20B on SageMaker and an example of how to use the deployed model in a conversational manner using the command shell in the following GitHub notebook.
In the following sections, we expand each step in detail to deploy the model and then use it to solve different tasks:
Set up prerequisites.
Select a pre-trained model.
Retrieve artifacts and deploy an endpoint.
Query the endpoint and parse a response.
Use an OpenChatKit shell to interact with your deployed endpoint.
Set up prerequisites
This notebook was tested on an ml.t3.medium instance in Amazon SageMaker Studio with the Python 3 (Data Science) kernel and in a SageMaker notebook instance with the conda_python3 kernel.
Before you run the notebook, use the following command to complete some initial steps required for setup:

%pip install --upgrade sagemaker --quiet

Select a pre-trained model
We set up a SageMaker session like usual using Boto3 and then select the model ID that we want to deploy:

model_id, model_version = "huggingface-textgeneration2-gpt-neoxt-chat-base-20b-fp16", "*"

Retrieve artifacts and deploy an endpoint
With SageMaker, we can perform inference on the pre-trained model, even without fine-tuning it first on a new dataset. We start by retrieving the instance_type, image_uri, and model_uri for the pre-trained model. To host the pre-trained model, we create an instance of sagemaker.model.Model and deploy it. The following code uses ml.g5.24xlarge for the inference endpoint. The deploy method may take a few minutes.

from sagemaker import image_uris, instance_types, model_uris
from sagemaker.model import Model
from sagemaker.predictor import Predictor
from sagemaker.serializers import JSONSerializer
from sagemaker.deserializers import JSONDeserializer
from sagemaker.session import Session
from sagemaker.utils import name_from_base

aws_role = Session().get_caller_identity_arn()

endpoint_name = name_from_base(f"jumpstart-example-{model_id}")

# Retrieve the inference instance type for the specified model.
instance_type = instance_types.retrieve_default(
    model_id=model_id, model_version=model_version, scope="inference"
)

# Retrieve the inference docker container uri.
image_uri = image_uris.retrieve(
    region=None,
    framework=None,
    image_scope="inference",
    model_id=model_id,
    model_version=model_version,
    instance_type=instance_type,
)

# Retrieve the model uri.
model_uri = model_uris.retrieve(
    model_id=model_id, model_version=model_version, model_scope="inference"
)

# Create the SageMaker model instance. The inference script is prepacked with the model artifact.
model = Model(
    image_uri=image_uri,
    model_data=model_uri,
    role=aws_role,
    predictor_cls=Predictor,
    name=endpoint_name,
)

# Set the serializer/deserializer used to run inference through the sagemaker API.
serializer = JSONSerializer()
deserializer = JSONDeserializer()

# Deploy the model.
predictor = model.deploy(
    initial_instance_count=1,
    instance_type=instance_type,
    predictor_cls=Predictor,
    endpoint_name=endpoint_name,
    serializer=serializer,
    deserializer=deserializer,
)

Query the endpoint and parse the response
Next, we show you an example of how to invoke an endpoint with a subset of the hyperparameters:

payload = {
    "text_inputs": "<human>: Tell me the steps to make a pizza\n<bot>:",
    "max_length": 500,
    "max_time": 50,
    "top_k": 50,
    "top_p": 0.95,
    "do_sample": True,
    "stopping_criteria": ["<human>"],
}
response = predictor.predict(payload)
print(response[0][0]["generated_text"])

The following is the response that we get:

<human>: Tell me the steps to make a pizza
<bot>: 1. Choose your desired crust, such as thin-crust or deep-dish.
2. Preheat the oven to the desired temperature.
3. Spread sauce, such as tomato or garlic, over the crust.
4. Add your desired topping, such as pepperoni, mushrooms, or olives.
5. Add your favorite cheese, such as mozzarella, Parmesan, or Asiago.
6. Bake the pizza according to the recipe instructions.
7. Allow the pizza to cool slightly before slicing and serving.
<human>:

Here, we have provided the payload argument "stopping_criteria": ["<human>"], which has resulted in the model response ending with the generation of the word sequence <human>. The JumpStart model script will accept any list of strings as desired stop words, convert this list to a valid stopping_criteria keyword argument for the transformers generate API, and stop text generation when the output sequence contains any of the specified stop words. This is useful for two reasons: first, inference time is reduced because the endpoint doesn’t continue to generate undesired text beyond the stop words; second, this prevents the OpenChatKit model from hallucinating additional human and bot responses until other stop criteria are met.
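The effect of stop words can be sketched with a toy generation loop in plain Python. This is an illustration of the idea only, not the actual JumpStart model script or the transformers stopping-criteria API; the function name and token list are hypothetical:

```python
def generate_with_stop_words(tokens, stop_words):
    """Toy generation loop: append tokens one at a time and halt as soon as
    the accumulated output contains any stop word, mirroring how a
    stop-word stopping criterion ends generation early."""
    output = ""
    for token in tokens:
        output += token
        if any(word in output for word in stop_words):
            break
    return output

# The model answers, then starts hallucinating the next human turn;
# generation halts once "<human>" appears in the output.
tokens = ["1. Preheat the oven.", "\n", "<human>", ": And then?"]
print(generate_with_stop_words(tokens, ["<human>"]))
# -> 1. Preheat the oven.
# -> <human>
```

Without the stop word, the loop would emit the imagined follow-up question as well, which is exactly the hallucinated turn the real stopping criterion suppresses.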
Use an OpenChatKit shell to interact with your deployed endpoint
OpenChatKit provides a command line shell to interact with the chatbot. In this step, you create a version of this shell that can interact with your deployed endpoint. We provide a bare-bones simplification of the inference scripts in this OpenChatKit repository that can interact with our deployed SageMaker endpoint.
There are two main components to this:
A shell interpreter (JumpStartOpenChatKitShell) that allows for iterative inference invocations of the model endpoint
A conversation object (Conversation) that stores previous human/chatbot interactions locally within the interactive shell and appropriately formats past conversations for future inference context
The Conversation object is imported as is from the OpenChatKit repository. The following code creates a custom shell interpreter that can interact with your endpoint. This is a simplified version of the OpenChatKit implementation. We encourage you to explore the OpenChatKit repository to see how you can use more in-depth features, such as token streaming, moderation models, and retrieval augmented generation, within this context. The context of this notebook focuses on demonstrating a minimal viable chatbot with a JumpStart endpoint; you can add complexity as needed from here.
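As a rough sketch of what such a conversation object does, consider the following hypothetical minimal stand-in. It is not the actual OpenChatKit Conversation class; the class name and formatting details are assumptions for illustration:

```python
class MiniConversation:
    """Hypothetical stand-in for OpenChatKit's Conversation object: it
    stores alternating turns and renders them as one prompt string."""

    def __init__(self, human_id, bot_id):
        self.human_id = human_id
        self.bot_id = bot_id
        self.turns = []

    def push_human_turn(self, text):
        self.turns.append(f"{self.human_id}: {text}")

    def push_model_response(self, text):
        self.turns.append(f"{self.bot_id}: {text.strip()}")

    def get_raw_prompt(self):
        # A trailing bot marker cues the model to produce the next response.
        return "\n".join(self.turns) + f"\n{self.bot_id}:"

    def get_last_turn(self):
        return self.turns[-1]

conv = MiniConversation("<human>", "<bot>")
conv.push_human_turn("What is the capital of US?")
print(conv.get_raw_prompt())
# -> <human>: What is the capital of US?
# -> <bot>:
```

Because every past turn is replayed in the raw prompt, the model sees the full dialog history on each call, which is how follow-up questions stay in context.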
A short demo to showcase the JumpStartOpenChatKitShell is shown in the following video.

The following snippet shows how the code works:

import cmd
from typing import List, Optional

from sagemaker.predictor import Predictor

# The Conversation class is imported as is from the OpenChatKit repository.


class JumpStartOpenChatKitShell(cmd.Cmd):
    intro = (
        "Welcome to the OpenChatKit chatbot shell, modified to use a SageMaker JumpStart endpoint! Type /help or /? to "
        "list commands. For example, type /quit to exit shell.\n"
    )
    prompt = ">>> "
    human_id = "<human>"
    bot_id = "<bot>"

    def __init__(self, predictor: Predictor, cmd_queue: Optional[List[str]] = None, **kwargs):
        super().__init__()
        self.predictor = predictor
        self.payload_kwargs = kwargs
        self.payload_kwargs["stopping_criteria"] = [self.human_id]
        if cmd_queue is not None:
            self.cmdqueue = cmd_queue

    def preloop(self):
        self.conversation = Conversation(self.human_id, self.bot_id)

    def precmd(self, line):
        command = line[1:] if line.startswith("/") else "say " + line
        return command

    def do_say(self, arg):
        self.conversation.push_human_turn(arg)
        prompt = self.conversation.get_raw_prompt()
        payload = {"text_inputs": prompt, **self.payload_kwargs}
        response = self.predictor.predict(payload)
        output = response[0][0]["generated_text"][len(prompt):]
        self.conversation.push_model_response(output)
        print(self.conversation.get_last_turn())

    def do_reset(self, arg):
        self.conversation = Conversation(self.human_id, self.bot_id)

    def do_hyperparameters(self, arg):
        print(f"Hyperparameters: {self.payload_kwargs}\n")

    def do_quit(self, arg):
        return True

You can now launch this shell as a command loop. It will repeatedly issue a prompt, accept input, parse the input command, and dispatch actions. Because the resulting shell would otherwise run in an infinite loop, this notebook provides a default command queue (cmdqueue) as a queued list of input lines. Because the last input is the command /quit, the shell exits when the queue is exhausted. To dynamically interact with this chatbot, remove the cmdqueue.

cmd_queue = [
    "Hello!",
    "/quit",
]
JumpStartOpenChatKitShell(
    predictor=predictor,
    cmd_queue=cmd_queue,
    max_new_tokens=128,
    do_sample=True,
    temperature=0.6,
    top_k=40,
).cmdloop()

Example 1: Conversation context is retained
The following prompt shows that the chatbot is able to retain the context of the conversation to answer follow-up questions:

Welcome to the OpenChatKit chatbot shell, modified to use a SageMaker JumpStart endpoint! Type /help or /? to list commands. For example, type /quit to exit shell.

>>> What is the capital of US?

>>> How far it is from PA ?

>>> What is the sentiment of this sentence "The news this morning was tragic and it created lot of fear and concerns in city"

16/05/2023 – 21:02 /Rachna Chadha