# Models & Backends
ALUE is designed to be backend-agnostic, supporting multiple inference engines and embedding providers.
All configuration is controlled through environment variables (see Configuration).
## Inference Engines
ALUE supports the following inference engines for running LLMs:
| Engine | Alias (ALUE_ENDPOINT_TYPE) | Connection Mode | Description |
|---|---|---|---|
| OpenAI | openai | API | Direct connection to the OpenAI Chat Completions API. Requires ALUE_OPENAI_API_KEY. |
| vLLM (online) | vllm | API (OpenAI-compatible) | Connects to a running vLLM server via REST using the OpenAI-compatible API. |
| TGI | tgi | API (OpenAI-compatible) | Connects to Hugging Face Text Generation Inference (TGI) endpoints. |
| Ollama | ollama | API (OpenAI-compatible) | Connects to Ollama locally via its OpenAI-compatible API. |
| vLLM (offline) | vllm-offline | Local (no API) | Runs vLLM entirely offline inside the Python process (no server needed). |
| Transformers | transformers | Local (no API) | Runs models directly using Hugging Face transformers. Requires local model weights. |
Key distinction:
- API (OpenAI-compatible): vLLM, TGI, and Ollama integrate with ALUE by exposing an OpenAI-style endpoint.
- Offline/local: vLLM in offline mode (vllm-offline) and Hugging Face transformers run fully within your Python process — no external endpoint required.
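The API-vs-local distinction above can be sketched as a small dispatch helper. This is an illustrative sketch based solely on the table in this section, not ALUE's actual internals; the function name and structure are assumptions.

```python
# Illustrative sketch of API-vs-local backend selection.
# The sets below mirror the table above; the helper itself is hypothetical.
OPENAI_COMPATIBLE_APIS = {"openai", "vllm", "tgi", "ollama"}
LOCAL_BACKENDS = {"vllm-offline", "transformers"}

def needs_api_endpoint(endpoint_type: str) -> bool:
    """Return True if this ALUE_ENDPOINT_TYPE talks to an external
    OpenAI-style endpoint, False if it runs inside the Python process."""
    if endpoint_type in OPENAI_COMPATIBLE_APIS:
        return True
    if endpoint_type in LOCAL_BACKENDS:
        return False
    raise ValueError(f"Unknown ALUE_ENDPOINT_TYPE: {endpoint_type}")
```

For example, `needs_api_endpoint("vllm")` is true (a running vLLM server is required), while `needs_api_endpoint("vllm-offline")` is false.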
Example setting in .env:
ALUE_ENDPOINT_TYPE=openai
ALUE_OPENAI_API_KEY=sk-xxxx
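For a self-hosted OpenAI-compatible server such as vLLM, the configuration would point at the server instead. The endpoint-URL variable name below is an assumption (by analogy with ALUE_LLM_JUDGE_ENDPOINT_URL); check the Configuration page for the exact name.

```
ALUE_ENDPOINT_TYPE=vllm
# Assumed variable name; verify against the Configuration page:
ALUE_ENDPOINT_URL=http://localhost:8000/v1
```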
## LLM Judge Engines
Some tasks (e.g., RAG and Summarization) require an additional LLM as a judge for evaluation. Configuration mirrors the main inference engine, with the following variables:
- ALUE_LLM_JUDGE_ENDPOINT_TYPE
- ALUE_LLM_JUDGE_ENDPOINT_URL
- ALUE_LLM_JUDGE_OPENAI_API_KEY
By default, the judge may use the same engine as the main inference model, but a separate judge model is recommended to reduce evaluation bias.
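A possible judge configuration in .env, using a separate OpenAI judge alongside a local inference engine (the specific values here are placeholders):

```
# Main inference runs locally; a separate OpenAI model acts as judge
ALUE_ENDPOINT_TYPE=ollama
ALUE_LLM_JUDGE_ENDPOINT_TYPE=openai
ALUE_LLM_JUDGE_OPENAI_API_KEY=sk-xxxx
```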
## Embedding Engines
For retrieval-based tasks (e.g., RAG), ALUE requires embeddings. Supported embedding backends:
| Engine | Alias (EMBEDDING_ENDPOINT_TYPE) | Notes |
|---|---|---|
| OpenAI | openai | Embeddings via the OpenAI API (requires EMBEDDING_API_KEY). |
| Ollama | ollama | Uses local Ollama models for embeddings. |
| Hugging Face | hf | Uses Hugging Face models (e.g., sentence-transformers). Requires HF_TOKEN. |
| Local | local (default) | Uses bundled local embedding models. |
| OpenAI-compatible | openai-compatible | Custom OpenAI-style embedding endpoints. |
Example setting in .env:
EMBEDDING_ENDPOINT_TYPE=hf
HF_TOKEN=hf_xxxx
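If no embedding backend is configured, the local default applies and no credentials are needed:

```
# Default: bundled local embedding models, no token or API key required
EMBEDDING_ENDPOINT_TYPE=local
```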