Code generation using Code Llama 70B and Mixtral 8x7B on Amazon SageMaker
AWS Machine Learning Blog
In the ever-evolving landscape of machine learning and artificial intelligence (AI), large language models (LLMs) have emerged as powerful tools for a wide range of natural language processing (NLP) tasks, including code generation. Among these cutting-edge models, Code Llama 70B stands out as a true heavyweight, boasting an impressive 70 billion parameters. Developed by Meta and now available on Amazon SageMaker, this state-of-the-art LLM promises to revolutionize the way developers and data scientists approach coding tasks.
What is Code Llama 70B and Mixtral 8x7B?
Code Llama 70B is a variant of the Code Llama foundation model (FM), a fine-tuned version of Meta’s renowned Llama 2 model. This massive language model is specifically designed for code generation and understanding, capable of generating code from natural language prompts or existing code snippets. With its 70 billion parameters, Code Llama 70B offers unparalleled performance and versatility, making it a game-changer in the world of AI-assisted coding.
Mixtral 8x7B is a state-of-the-art sparse mixture of experts (MoE) foundation model released by Mistral AI. It supports multiple use cases such as text summarization, classification, text generation, and code generation. It is an 8x model, which means it contains eight distinct groups of parameters. The model has about 45 billion total parameters and supports a context length of 32,000 tokens. MoE is a type of neural network architecture that consists of multiple “experts,” where each expert is a neural network. In the context of transformer models, MoE replaces some feed-forward layers with sparse MoE layers. These layers have a certain number of experts, and a router network selects which experts process each token at each layer. MoE models enable more compute-efficient and faster inference compared to dense models.
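To make the routing idea concrete, the following is a minimal, illustrative sketch of a sparse MoE layer with top-k routing in plain Python. Real MoE layers operate on learned weight tensors inside a transformer; the expert functions, router weights, and vector sizes here are stand-ins for illustration only:

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def moe_layer(token, experts, router_weights, top_k=2):
    """Route one token vector through the top_k highest-scoring experts."""
    # Router: a linear layer scoring each expert for this token.
    scores = [sum(w * x for w, x in zip(row, token)) for row in router_weights]
    # Keep only the top_k experts -- this is what makes the layer sparse.
    top = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:top_k]
    gates = softmax([scores[i] for i in top])
    # Run only the selected experts and combine their gate-weighted outputs.
    outputs = [experts[i](token) for i in top]
    return [sum(g * out[d] for g, out in zip(gates, outputs))
            for d in range(len(token))]
```

Because only `top_k` of the eight expert groups run per token, the compute per token is a fraction of what a dense 45-billion-parameter model would require, which is where the inference efficiency comes from.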
Key features and capabilities of Code Llama 70B and Mixtral 8x7B include:
Code generation: These LLMs excel at generating high-quality code across a wide range of programming languages, including Python, Java, C++, and more. They can translate natural language instructions into functional code, streamlining the development process and accelerating project timelines.
Code infilling: In addition to generating new code, they can seamlessly infill missing sections of existing code by providing the prefix and suffix. This feature is particularly valuable for enhancing productivity and reducing the time spent on repetitive coding tasks.
Natural language interaction: The instruct variants of Code Llama 70B and Mixtral 8x7B support natural language interaction, allowing developers to engage in conversational exchanges to develop code-based solutions. This intuitive interface fosters collaboration and enhances the overall coding experience.
Long context support: With the ability to handle context lengths of up to 48,000 tokens, Code Llama 70B can maintain coherence and consistency over extended code segments or conversations, ensuring relevant and accurate responses. Mixtral 8x7B has a context window of 32,000 tokens.
Multi-language support: While both of these models excel at generating code, their capabilities extend beyond programming languages. They can also assist with natural language tasks, such as text generation, summarization, and question answering, making them versatile tools for various applications.
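For the code infilling feature above, Code Llama's fill-in-the-middle checkpoints expect the prefix and suffix to be wrapped in sentinel tokens. A minimal helper might look like the following; the `<PRE>`/`<SUF>`/`<MID>` format follows the Code Llama release, but treat the exact spacing as an assumption and verify it against the model card for the variant you deploy:

```python
def build_infill_prompt(prefix: str, suffix: str) -> str:
    """Assemble a fill-in-the-middle prompt: the model generates the
    code that belongs between the given prefix and suffix."""
    return f"<PRE> {prefix} <SUF>{suffix} <MID>"

# Example: ask the model to fill in the body of a function.
prompt = build_infill_prompt(
    "def remove_non_ascii(s: str) -> str:\n    ",
    "\n    return result",
)
```

The string returned by `build_infill_prompt` is what you would send as the `inputs` field of the inference payload when using an infilling-capable variant.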
Harnessing the power of Code Llama 70B and Mistral models on SageMaker
Amazon SageMaker, a fully managed machine learning service, provides seamless integration with Code Llama 70B and Mixtral 8x7B, enabling developers and data scientists to use their capabilities with just a few clicks. Here’s how you can get started:
One-click deployment: Code Llama 70B and Mixtral 8x7B are available in Amazon SageMaker JumpStart, a hub that provides access to pre-trained models and solutions. With a few clicks, you can deploy them and create a private inference endpoint for your coding tasks.
Scalable infrastructure: The SageMaker scalable infrastructure ensures that foundation models can handle even the most demanding workloads, allowing you to generate code efficiently and without delays.
Integrated development environment: SageMaker provides a seamless integrated development environment (IDE) that you can use to interact with these models directly from your coding environment. This integration streamlines the workflow and enhances productivity.
Customization and fine-tuning: While Code Llama 70B and Mixtral 8x7B are powerful out-of-the-box models, you can use SageMaker to fine-tune and customize a model to suit your specific needs, further enhancing its performance and accuracy.
Security and compliance: SageMaker JumpStart employs multiple layers of security, including data encryption, network isolation, VPC deployment, and customizable inference, to help ensure the privacy and confidentiality of your data when working with LLMs.
Solution overview
The following figure showcases how code generation can be done with the Code Llama and Mixtral models on SageMaker presented in this post.
You first deploy a SageMaker endpoint using an LLM from SageMaker JumpStart. For the examples presented in this article, you either deploy a Code Llama 70B or a Mixtral 8x7B endpoint. After the endpoint has been deployed, you can use it to generate code with the prompts provided in this article and the associated notebook, or with your own prompts. After the code has been generated with the endpoint, you can use a notebook to test the code and its functionality.
Prerequisites
In this section, you sign up for an AWS account and create an AWS Identity and Access Management (IAM) admin user.
If you’re new to SageMaker, we recommend that you read What is Amazon SageMaker?.
Use the following hyperlinks to finish setting up the prerequisites for an AWS account and SageMaker:
Create an AWS Account: This walks you through setting up an AWS account
When you create an AWS account, you get a single sign-in identity that has complete access to all of the AWS services and resources in the account. This identity is called the AWS account root user.
Signing in to the AWS Management Console using the email address and password that you used to create the account gives you complete access to all of the AWS resources in your account. We strongly recommend that you not use the root user for everyday tasks, even the administrative ones.
Adhere to the security best practices in IAM, and Create an Administrative User and Group. Then securely lock away the root user credentials and use them to perform only a few account and service management tasks.
In the console, go to the SageMaker console and open the left navigation pane.
Under Admin configurations, choose Domains.
Choose Create domain.
Choose Set up for single user (Quick setup). Your domain and user profile are created automatically.
Alternatively, follow the steps in Custom setup to Amazon SageMaker to set up SageMaker for your organization.
With the prerequisites complete, you’re ready to continue.
Code generation scenarios
The Mixtral 8x7B and Code Llama 70B models require an ml.g5.48xlarge instance. SageMaker JumpStart provides a simplified way to access and deploy over 100 different open source and third-party foundation models. To deploy an endpoint using SageMaker JumpStart, you might need to request a service quota increase to access an ml.g5.48xlarge instance for endpoint use. You can request service quota increases through the AWS Management Console, AWS Command Line Interface (AWS CLI), or API to allow access to those additional resources.
Code Llama use cases with SageMaker
While Code Llama excels at generating simple functions and scripts, its capabilities extend far beyond that. The models can generate complex code for advanced applications, such as building neural networks for machine learning tasks. Let’s explore an example of using Code Llama to create a neural network on SageMaker, starting with deploying the Code Llama model through SageMaker JumpStart.
Launch SageMaker JumpStart: Sign in to the console, navigate to SageMaker, and launch the SageMaker domain to open SageMaker Studio. Within SageMaker Studio, select JumpStart in the left-hand navigation menu.
Search for Code Llama 70B: In the JumpStart model hub, search for Code Llama 70B in the search bar. You should see the Code Llama 70B model listed under the Models category.
Deploy the model: Select the Code Llama 70B model, and then choose Deploy. Enter an endpoint name (or keep the default value) and select the target instance type (for example, ml.g5.48xlarge). Choose Deploy to start the deployment process. You can leave the rest of the options as default.
Additional details on deployment can be found in Code Llama 70B is now available in Amazon SageMaker JumpStart.
Create an inference endpoint: After the deployment is complete, SageMaker will provide you with an inference endpoint URL. Copy this URL to use later.
Set up your development environment: You can interact with the deployed Code Llama 70B model using Python and the AWS SDK for Python (Boto3). First, make sure you have the required dependencies installed: pip install boto3
Note: This blog post section contains code that was generated with the assistance of Code Llama 70B powered by Amazon SageMaker.
Generating a transformer model for natural language processing
Let’s walk through a code generation example with Code Llama 70B where you generate a transformer model in Python using the Amazon SageMaker SDK.
Prompt:
[INST]
You are an expert code assistant that can teach a junior developer how to code. Your language of choice is Python. Don’t explain the code, just generate the code block itself. Always use Amazon SageMaker SDK for python code generation. Add test case to test the code
Generate a Python code that defines and trains a Transformer model for text classification on movie dataset. The python code should use Amazon SageMaker’s TensorFlow estimator and be ready for deployment on SageMaker.
[/INST]
Response:
Code Llama generates a Python script for training a Transformer model on the sample dataset using TensorFlow and Amazon SageMaker.
Code example: Create a new Python script (for example, code_llama_inference.py) and add the following code. Replace the endpoint_name value with the actual inference endpoint name provided by SageMaker JumpStart:
import boto3
import json

# Set up the SageMaker runtime client
session = boto3.Session()
sagemaker_client = session.client("sagemaker-runtime")

# Set the inference endpoint name
endpoint_name = ""

def query_endpoint(payload):
    # Invoke the deployed endpoint with a JSON payload
    response = sagemaker_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps(payload).encode("utf-8"),
    )
    response = response["Body"].read().decode("utf8")
    response = json.loads(response)
    return response
def print_completion(prompt: str, response: str) -> None:
    # ANSI escape codes to highlight the prompt and the model output
    bold, unbold = "\033[1m", "\033[0m"
    print(f"{bold}> Input{unbold}\n{prompt}\n\n{bold}> Output{unbold}\n{response}")
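With query_endpoint defined, a request payload for a prompt in the [INST] instruct format can be assembled as follows. The inputs/parameters schema follows the JumpStart text-generation convention, and the prompt text and parameter values here are illustrative assumptions, not output from the model:

```python
# Illustrative prompt in the Code Llama instruct format
prompt = "[INST] Generate a Python function that reverses a string. [/INST]"

payload = {
    "inputs": prompt,
    "parameters": {
        "max_new_tokens": 512,  # upper bound on generated tokens (illustrative)
        "temperature": 0.2,     # low temperature for more deterministic code
        "top_p": 0.9,
    },
}
# response = query_endpoint(payload)  # requires the deployed endpoint
```

Lower temperatures generally make code generation more deterministic, which is usually what you want when the output must compile and run.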