Mission Chai

🤗 HuggingFace: Deploying an accelerated inference AI 🚀

HuggingFace is home to hundreds of state-of-the-art NLP models. In just a few lines of code, you can deploy one to Chai!

In this example I'll show you how to deploy Facebook's blenderbot with CPU-accelerated responses. The bot sends the conversation history along with each new message, so its replies take into account what you have already said!

First, head over to huggingface.co and create an account. Open your settings from the top right corner, then select API Tokens in the left-hand panel. Copy your token!
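If you would rather not paste the token straight into your source file, one option is to read it from an environment variable. This is only a sketch: the variable name HF_API_TOKEN and the helper build_headers are hypothetical names chosen for illustration, and whether environment variables are available where your bot runs is something you would need to check.

```python
import os


def build_headers(env_var: str = "HF_API_TOKEN") -> dict:
    """Build the Authorization header from an environment variable.

    HF_API_TOKEN is a hypothetical name used for this sketch; it is not
    required by Chai or HuggingFace.
    """
    token = os.environ.get(env_var)
    if not token:
        raise RuntimeError(f"Set {env_var} to your HuggingFace API token first.")
    return {"Authorization": f"Bearer {token}"}
```

The returned dictionary has the same shape as the headers used in the bot code, so it could be assigned to self.headers in setup.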

import json
import requests
import time

from chai_py import ChaiBot, Update

class Blenderbot(ChaiBot):
    def setup(self):
        self.ENDPOINT = "https://api-inference.huggingface.co/models/facebook/blenderbot-400M-distill" # You can change this URL to a different huggingface model!
        self.headers = {"Authorization": "Bearer <YOUR API_TOKEN GOES HERE>"}
        self.first_response = "Hello, I'm your friendly and helpful AI."

    async def on_message(self, update: Update) -> str:
        if update.latest_message.text == self.FIRST_MESSAGE_STRING:
            return self.first_response
        payload = await self.get_payload(update)
        return self.query(payload)

    def query(self, payload):
        data = json.dumps(payload)
        response = requests.post(self.ENDPOINT, headers=self.headers, data=data)
        if response.status_code == 503: # This means we need to wait for the model to load 😴.
            estimated_time = response.json()['estimated_time']
            self.logger.info(f"Sleeping for model to load: {estimated_time}")
            data = json.loads(data)
            data["options"] = {"use_cache": False, "wait_for_model": True}
            data = json.dumps(data)
            response = requests.post(self.ENDPOINT, headers=self.headers, data=data)
        return json.loads(response.content.decode("utf-8"))["generated_text"]

    async def get_payload(self, update):
        messages = await self.get_messages(update.conversation_id)
        past_user_inputs = ["Hey"] # You can change this!
        generated_responses = [self.first_response]
        for message in messages:
            content = message.content
            if content == self.FIRST_MESSAGE_STRING:
                continue # We're not trying to keep track of our FIRST_MESSAGE_STRING (i.e. "__first")
            if message.sender_uid == self.uid:
                generated_responses.append(content) # Put the model generated messages into here
            else:
                past_user_inputs.append(content) # Put the user's messages into past_user_inputs
        return {
            "inputs": {
                "past_user_inputs": past_user_inputs,
                "generated_responses": generated_responses,
                "text": update.latest_message.text,
            },
        }

And that's it! Copy & paste the code snippet above, and you have your accelerated chat AI!
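The payload sent to the model keeps what the user said separate from what the bot said. Here is a minimal standalone sketch of that split, so you can try it without deploying anything. It assumes the conversation is given as plain (sender_uid, content) tuples rather than Chai message objects, and build_payload is a hypothetical helper name, not part of chai_py.

```python
from typing import Dict, List, Tuple

# Placeholder first message, as named in the bot code's comments ("__first").
FIRST_MESSAGE_STRING = "__first"


def build_payload(
    history: List[Tuple[str, str]],
    bot_uid: str,
    latest_text: str,
    first_response: str = "Hello, I'm your friendly and helpful AI.",
) -> Dict:
    """Split a conversation into user inputs and bot responses."""
    past_user_inputs = ["Hey"]  # Seed input paired with the first response
    generated_responses = [first_response]
    for sender_uid, content in history:
        if content == FIRST_MESSAGE_STRING:
            continue  # Skip the placeholder that opens the conversation
        if sender_uid == bot_uid:
            generated_responses.append(content)  # Sent by the bot
        else:
            past_user_inputs.append(content)  # Sent by the user
    return {
        "inputs": {
            "past_user_inputs": past_user_inputs,
            "generated_responses": generated_responses,
            "text": latest_text,
        },
    }
```

For example, build_payload([("user", "__first"), ("bot", "Hi there"), ("user", "How are you?")], "bot", "Tell me a joke") puts "How are you?" under past_user_inputs and "Hi there" under generated_responses.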

Next Steps:

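Try changing a few things, such as the model URL in self.ENDPOINT, the greeting in self.first_response, or the seed message in past_user_inputs!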
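One easy change is the model itself. The endpoint in the bot code follows the pattern of a base URL plus a model id, so a small helper can build it for you. This is a sketch under the assumption that the model you pick accepts the same conversational payload; model_endpoint is a hypothetical helper name.

```python
API_BASE = "https://api-inference.huggingface.co/models"


def model_endpoint(model_id: str) -> str:
    """Return the inference URL for a HuggingFace model id such as 'owner/name'."""
    return f"{API_BASE}/{model_id.strip('/')}"
```

Calling model_endpoint("facebook/blenderbot-400M-distill") gives back the same URL used in setup above.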

Here's an excerpt from a conversation I had with the AI:

>>> Hello, I'm mean and sarcastic.
Enter your message: hi

>>>  What do you like to do for fun? I like to play video games and watch movies.
Enter your message: I like to knit sweaters for rescue puppies!

>>>  That's so nice of you. What kind of dog do you have? I have a pitbull.
Enter your message: I have a whippet and a great dane. I love them both! What's your pitbull called?

>>>  Great Danes are so cute! My dog's name is Oreo. Do you have any other pets?
Enter your message: Awww, that's adorable. I also have a pet turtle named Neo :)

>>>  Neo is a cool name for a turtle. What kind of turtle is it?
Enter your message: To be honest, I'm not sure. My friend gave him to me when he was leaving town! Do you know much about turtles?

>>>  No, I don't. I'm more of a dog person. What kind of turtle is it?

5 months ago

Christie-Carol Beauchamp