Building a Multi-Modal Twitch Bot in Python using GPT-4o
Introduction
Twitch chat bots offer a fun way to enhance your stream's interactivity. They can engage with viewers, answer questions, and create an engaging experience. However, their capabilities have been limited – unless implemented directly in the game, bots could only send messages to the chat without interacting directly with the streamer or game. Using Conduit, bots can now interact with the streamer through their audio and video with just a few lines of code. In this blog post, we'll walk through building a multi-modal Twitch bot in Python using Conduit. This bot will listen to and watch any Twitch livestream, responding to the video as if it were a real viewer.
If you're interested in the code for this project, you can find it on our examples repo.
Setup
Getting an API Key
Before we begin writing any code, we need to generate an API key for the Conduit service.
To do this, navigate to your Conduit Settings, sign in, and click on the "+ API Key" button in the top right.
API keys can not be recovered from the dashboard. Make sure to securely save your key in a password manager once you've generated it!
Installing the Python Packages
Before we begin, we need to make sure twitchio and socketio are installed. We will use TwitchIO to handle the interaction with Twitch and SocketIO to connect to the Conduit service. To install these packages, we can use pip:
pip install twitchio python-socketio
Be sure to install python-socketio and not socketio. The socketio package is not the same as python-socketio and will cause errors when running the bot.
Coding the bot
Using the OpenAI Vision API
The first step in creating a Twitch bot that can see and listen to a stream is writing the code to allow us to process images / transcriptions and respond to them in a way that makes sense. To do this, we will leverage the OpenAI vision API, passing it a base64 encoded image and the transcription from the stream.
First, we should create a OpenAIClient class that sets up OpenAI. We'll create this new class in the file open_ai_client.py
.
Next, we can create a method to generate a response to the streamer. This method will use what the streamer said and the image associated with it to prompt OpenAI for a response that sounds like it came from a Twitch chatter.
Here we reduce the max number of tokens in the response since Twitch chats are usually fairly short. Calling generate_response
, should now generate a one sentence response to what the streamer said. You can and should play with the max_tokens
and system_prompt
to customize the "feel" of the bot.
Connecting to Twitch
Now that the bot has a way to respond to it's incoming data, we will write the code to connect our bot to Twitch so that it can send messages in any given Twitch chat. To do this we first need to create a new Python file called twitch_bot.py
and add the following constructor:
This will spin up a new Twitch bot that will process commands prefixed with the '!' character (e.g. !hello). It will also join your Twitch chat automatically so that we don't need to manually join it later.
To get your Twitch OAuth token, you can head to https://twitchapps.com/tmi/.
To help with debugging, we can add an event handler to log our username when we are done connecting to Twitch. To do that, we'll create the event_ready
method:
Finally, we can write the method to actually send a message to the Twitch chat. We'll call this method respond_to_streamer
since it will take the transcription and image data from the stream and use OpenAI to come up with a believable response.
The powerful part of this code is await self.get_channel(channel).send(response)
. This will get try to get the channel name from our cache and send a message in the chat. If we aren't connected to the stream get_channel
will return None
and this will fail.
Believe it or not, that's all the code we actually need to have a functioning Twitch bot. Now that we have the ability to interact in chat, let's move on to using Conduit to pull in transcriptions and images.
Getting Real-Time Transcriptions / Images from Twitch
Conduit makes it easy to get both transcriptions and images from Twitch livestreams in real-time. We can start by creating the ConduitClient
class in a new python file called conduit_client.py
. Since Conduit uses SocketIO to interact with clients / send events, we will spin up a new SocketIO AsyncClient
in our constructor. We can then register the event handlers to process incoming messages from Conduit, and create a new TwitchBot
so that we can respond to the transcriptions that we receive.
Event Handlers
Conduit emits multiple event types that we should listen to in our client. We'll first create a private method __register_event_handlers
that will allow us to listen / process these events in our code, then we'll jump into the implementations of each event handler.
You can see the full list of events Conduit will emit in our websocket documentation.
The event being emitted from Conduit that we will be most interested in is the livestream_data_event
. This event will contain the stream url the transcription comes from, the transcription message, and an image from the stream. We can pull out all three of these data points from the event and pass it directly to our TwitchBot
which will use OpenAI to process a response and send a message in the Twitch chat.
The next event we need to listen for is the error_event
. While not quite a fun as the livestream_data_event
, this event is equally important as it will notify you when something is wrong with your request or when there was an issue on Conduit's side. Each error_event
will contain the error_code
and the error_message
for the error. In this example, we'll simply print out when we see an error.
You can see all possible errors Conduit can emit on our websocket documentation.
Finally, there are the subscribe_event
and unsubscribe_event
. These events will fire any time you join or leave a livestream and contain a list of all streams you're currently subscribed to. To keep things simple, in these event handlers we'll just print the entire list out.
Connecting / Disconnecting from Conduit
Now that we have a way to process all the data we get from Conduit, we can start working on a way to connect / disconnect from Conduit. Since Conduit uses SocketIO, connecting to the client is fairly straight forward. All we need to do is call connect
on our SocketIO client, passing the Conduit data url and the API key we generated earlier. After connecting to Conduit, we can go ahead and connect to Twitch as well.
Disconnecting from these clients is even easier. All we need to do is call disconnect
/ close
.
Joining / Leaving Twitch Streams
With the rest of the Conduit client wired up, we can finally write the code to join Twitch livestreams and begin listening for the transcriptions / images. To do this, we can emit a subscribe
event with the stream_url
we want to join and the content_type
we're interested in.
At the time of writing this, Conduit only supports subscribing to "all" content. We are currently working to allow subscribing to separate data streams. Please check our websocket documentation for the most up-to-date information.
Similarly, we can unsubscribe from streams by emitting an unsubscribe_event
.
Running the Bot
With the ConduitClient
finished, we can now write the main driver code for the bot in main.py
. This code will run an asyncio coroutine that connects to Conduit and joins our livestream. Once we join the stream, we should start receiving transcriptions and which will be forward to our TwitchBot
. Pressing ctrl + c
will stop the bot and disconnect us from Conduit and Twitch.
Conclusion
And that's all folks! In this blog post, we saw just how easy it is to leverage real-time transcriptions and images from Twitch livestreams to build an AI powered Twitch chat bot. Using this technology, you can build highly intelligent chat bots that interact directly with the streamer and that build engagement with the Twitch audience.