# LLMs and Chat
Pegasus comes with an optional Chat UI for interacting with LLMs. This section covers how it works and the various supported options.
## Choosing an LLM model

You can choose between two options for your LLM chat: OpenAI and LLM (generic). The OpenAI option limits you to OpenAI models, but supports streaming and asynchronous API access. The generic "LLM" option uses the litellm library and can be used with many different models, including local ones.
We recommend choosing “OpenAI” unless you know you want to use a different model.
## Configuring OpenAI

If you're using OpenAI, you need to set `OPENAI_API_KEY` in your environment or settings file (`.env` in development). You can also change the model used by setting `OPENAI_MODEL`, which defaults to `"gpt-4o"`.
See OpenAI's help center for help finding your OpenAI API key.
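For example, a minimal `.env` entry for development might look like the following (the key shown is a placeholder, and `OPENAI_MODEL` only needs to be set if you want something other than the default):

```bash
OPENAI_API_KEY="sk-your-key-here"
# optional: override the default model
OPENAI_MODEL="gpt-4o"
```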
## Configuring LLM

If you built with generic LLM support, you can configure it by setting the `LLM_MODELS` and `DEFAULT_LLM_MODEL` values in your `settings.py`. For example:
```python
LLM_MODELS = {
    "gpt-3.5-turbo": {"api_key": env("OPENAI_API_KEY", default="")},
    "gpt-4o": {"api_key": env("OPENAI_API_KEY", default="")},
    "claude-3-opus-20240229": {"api_key": env("ANTHROPIC_API_KEY", default="")},
    # requires a running ollama instance
    "ollama_chat/llama3": {"api_base": env("OLLAMA_API_BASE", default="http://localhost:11434")},
}
DEFAULT_LLM_MODEL = env("DEFAULT_LLM_MODEL", default="gpt-4o")
```
The chat UI will use whatever is set in `DEFAULT_LLM_MODEL` out of the box, but you can quickly change it to another model to try different options.
For further reading, see the documentation for the litellm Python API and litellm providers.
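To make the mapping concrete, here is a minimal sketch of how an entry in `LLM_MODELS` can be passed through to litellm's `completion` API. This is illustrative only, not Pegasus's internal chat implementation; the `get_chat_response` helper is hypothetical.

```python
# Illustrative sketch -- not Pegasus's internal chat code.
from django.conf import settings
from litellm import completion

def get_chat_response(user_message: str, model_name: str | None = None) -> str:
    """Send a single user message to the configured LLM and return its reply."""
    model_name = model_name or settings.DEFAULT_LLM_MODEL
    # Each LLM_MODELS entry holds the provider-specific kwargs
    # (api_key, api_base, etc.) that litellm needs for that model.
    model_kwargs = settings.LLM_MODELS[model_name]
    response = completion(
        model=model_name,
        messages=[{"role": "user", "content": user_message}],
        **model_kwargs,
    )
    return response.choices[0].message.content
```

Because the per-model kwargs live in settings, switching the chat to a different provider is just a matter of changing `DEFAULT_LLM_MODEL`.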
## Running open source LLMs

To run models like Mixtral or Llama3, you will need to run an Ollama server in a separate process.
- Download and run Ollama, or use the Docker image.
- Download the model you want to run. See the Ollama documentation for the list of supported models.

  ```bash
  ollama pull llama3
  # or with docker
  docker exec -it ollama ollama pull llama3
  ```

- Update your Django settings to point to the Ollama server. For example:

  ```python
  LLM_MODELS = {
      "ollama_chat/llama3": {"api_base": "http://localhost:11434"},
  }
  DEFAULT_LLM_MODEL = "ollama_chat/llama3"
  ```
- Restart your Django server.
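If chat responses aren't coming back, it can help to confirm that the Ollama server is reachable and the model has been downloaded. This check isn't Pegasus-specific; it simply exercises Ollama's REST API directly (assuming the default port):

```bash
# List the models the local Ollama server has available.
curl http://localhost:11434/api/tags

# Send a quick test prompt directly to the server.
curl http://localhost:11434/api/generate \
  -d '{"model": "llama3", "prompt": "Say hello", "stream": false}'
```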
## The Chat UI

The Chat UI has multiple implementations, and the one used in your project is determined by your build configuration.
If you build with asynchronous functionality and htmx enabled, your project will use a websocket-based Chat UI. This Chat UI supports streaming responses for OpenAI models and is the recommended option.
If you build without asynchronous functionality enabled, the chat UI will instead use Celery and polling. The React version of the chat UI also uses Celery and polling. In both of these cases, Celery must be running to get responses from the LLM.
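If you're on one of the Celery-based setups, make sure a worker is running alongside your dev server, or chat messages will appear to hang. A typical invocation looks like the following, where `myproject` is a placeholder for your project's Celery app module:

```bash
# Start a Celery worker (replace "myproject" with your project's module).
celery -A myproject worker -l INFO
```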