🤗 HuggingFace: Deploying an accelerated inference AI 🚀
HuggingFace is home to hundreds of state-of-the-art NLP models. In just a few lines of code, you can deploy one to Chai!
In this example I'll show you how to deploy Facebook's BlenderBot with CPU-accelerated responses. It will learn from your users' inputs and get smarter as it talks to you!
First, head over to huggingface.co and create an account. Go to the top-right corner to access your settings, then find your API Tokens in the left-hand panel. Copy your token!
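Before wiring the token into a bot, you can sanity-check it with a direct call to the Inference API. Here's a minimal sketch; `API_TOKEN` is a placeholder for your own token, and `build_request` is just a helper name I've chosen for illustration:

```python
import json
# import requests  # only needed for the real call at the bottom

API_TOKEN = "<YOUR API_TOKEN GOES HERE>"  # placeholder -- paste your token here
ENDPOINT = "https://api-inference.huggingface.co/models/facebook/blenderbot-400M-distill"

def build_request(text, token):
    """Build the auth headers and JSON body for a single-turn query."""
    headers = {"Authorization": f"Bearer {token}"}
    body = json.dumps({"inputs": text})
    return headers, body

headers, body = build_request("Hello there!", API_TOKEN)
# Uncomment to actually hit the API (requires a valid token):
# response = requests.post(ENDPOINT, headers=headers, data=body)
# print(response.json())
```

If the token is valid you should get a JSON response back rather than a 401.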
import json
import requests
import time

from chai_py import ChaiBot, Update


class Blenderbot(ChaiBot):
    def setup(self):
        # You can change this URL to a different huggingface model!
        self.ENDPOINT = "https://api-inference.huggingface.co/models/facebook/blenderbot-400M-distill"
        self.headers = {"Authorization": "Bearer <YOUR API_TOKEN GOES HERE>"}
        self.first_response = "Hello, I'm your friendly and helpful AI."

    async def on_message(self, update: Update) -> str:
        if update.latest_message.text == self.FIRST_MESSAGE_STRING:
            return self.first_response
        payload = await self.get_payload(update)
        return self.query(payload)

    def query(self, payload):
        data = json.dumps(payload)
        response = requests.post(self.ENDPOINT, headers=self.headers, data=data)
        if response.status_code == 503:  # This means we need to wait for the model to load 😴.
            estimated_time = response.json()["estimated_time"]
            self.logger.info(f"Sleeping for model to load: {estimated_time}")
            time.sleep(estimated_time)
            data = json.loads(data)
            data["options"] = {"use_cache": False, "wait_for_model": True}
            data = json.dumps(data)
            response = requests.post(
                self.ENDPOINT,
                headers=self.headers,
                data=data
            )
        return json.loads(response.content.decode("utf-8"))["generated_text"]

    async def get_payload(self, update):
        messages = await self.get_messages(update.conversation_id)
        past_user_inputs = ["Hey"]  # You can change this!
        generated_responses = [self.first_response]
        for message in messages:
            content = message.content
            if content == self.FIRST_MESSAGE_STRING:
                continue  # We're not trying to keep track of our FIRST_MESSAGE_STRING (i.e. "__first")
            if message.sender_uid == self.uid:
                past_user_inputs.append(content)  # Put the user's messages into past_user_inputs
            else:
                generated_responses.append(content)  # Put the model-generated messages into here
        return {
            "inputs": {
                "past_user_inputs": past_user_inputs,
                "generated_responses": generated_responses,
                "text": update.latest_message.text,
            },
        }
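The history-splitting loop in get_payload can be exercised on its own. Here's a standalone sketch of the same idea, with message objects mocked as simple (sender_uid, content) tuples rather than the real chai_py Message objects, and with the function name being my own:

```python
FIRST_MESSAGE_STRING = "__first"

def build_payload(messages, bot_uid, latest_text, first_response):
    """Split conversation history into the two lists the Inference API expects."""
    past_user_inputs = ["Hey"]                 # same seed input as the bot above
    generated_responses = [first_response]
    for sender_uid, content in messages:
        if content == FIRST_MESSAGE_STRING:
            continue                           # skip the "__first" sentinel message
        if sender_uid == bot_uid:
            past_user_inputs.append(content)
        else:
            generated_responses.append(content)
    return {
        "inputs": {
            "past_user_inputs": past_user_inputs,
            "generated_responses": generated_responses,
            "text": latest_text,
        },
    }
```

This makes it easy to see exactly what payload a given conversation history produces before the bot ever hits the network.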
And that's it! Copy & paste the code snippet above, and you have your accelerated chat AI!
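The 503 retry in query is worth understanding on its own: the first request may arrive while the model is still loading, so the bot sleeps for the estimated load time and retries with wait_for_model set. Here's a sketch of that logic factored into a standalone helper; the post callable is injected so it can be tested without a network, and the names are my own rather than part of any API:

```python
import json
import time

def query_with_retry(endpoint, headers, payload, post, sleep=time.sleep):
    """POST the payload; on a 503 (model still loading), wait and retry once."""
    data = json.dumps(payload)
    response = post(endpoint, headers=headers, data=data)
    if response.status_code == 503:
        sleep(response.json()["estimated_time"])   # give the model time to load
        retry = dict(payload)
        retry["options"] = {"use_cache": False, "wait_for_model": True}
        response = post(endpoint, headers=headers, data=json.dumps(retry))
    return json.loads(response.content.decode("utf-8"))["generated_text"]
```

In production you'd pass requests.post as post; in a test you can pass a stub that returns a canned 503 followed by a real response.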
Next Steps:
Try changing a few things!
- the model URL (e.g. try setting it to "https://api-inference.huggingface.co/models/microsoft/DialoGPT-medium")
- self.first_response - the model will learn from this! For example, you could change this to "Hello, I'm mean and sarcastic."
Here's an excerpt from a conversation I had with the AI:
>>> Hello, I'm mean and sarcastic.
Enter your message: hi
>>> What do you like to do for fun? I like to play video games and watch movies.
Enter your message: I like to knit sweaters for rescue puppies!
>>> That's so nice of you. What kind of dog do you have? I have a pitbull.
Enter your message: I have a whippet and a great dane. I love them both! What's your pitbull called?
>>> Great Danes are so cute! My dog's name is Oreo. Do you have any other pets?
Enter your message: Awww, that's adorable. I also have a pet turtle named Neo :)
>>> Neo is a cool name for a turtle. What kind of turtle is it?
Enter your message: To be honest, I'm not sure. My friend gave him to me when he was leaving town! Do you know much about turtles?
>>> No, I don't. I'm more of a dog person. What kind of turtle is it?