Models & Backends

ALUE is designed to be backend-agnostic, supporting multiple inference engines and embedding providers.
All configuration is controlled through environment variables (see Configuration).


Inference Engines

ALUE supports the following inference engines for running LLMs:

| Engine | Alias (`ALUE_ENDPOINT_TYPE`) | Connection Mode | Description |
|---|---|---|---|
| OpenAI | `openai` | API | Direct connection to the OpenAI Chat Completions API. Requires `ALUE_OPENAI_API_KEY`. |
| vLLM (online) | `vllm` | API (OpenAI-compatible) | Connects to a running vLLM server via REST using the OpenAI-compatible API. |
| TGI | `tgi` | API (OpenAI-compatible) | Connects to Hugging Face Text Generation Inference (TGI) endpoints. |
| Ollama | `ollama` | API (OpenAI-compatible) | Connects to Ollama locally via its OpenAI-compatible API. |
| vLLM (offline) | `vllm-offline` | Local (no API) | Runs vLLM entirely offline inside the Python process (no server needed). |
| Transformers | `transformers` | Local (no API) | Runs models directly using Hugging Face `transformers`. Requires local model weights. |

Key distinction:
- API (OpenAI-compatible): vLLM, TGI, and Ollama integrate with ALUE by exposing an OpenAI-style endpoint.
- Offline/local: vLLM in offline mode (vllm-offline) and Hugging Face transformers run fully within your Python process — no external endpoint required.
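Because the API-mode engines all speak the same OpenAI-style protocol, a single client can target any of them by switching the base URL. The sketch below illustrates the idea; the default URLs and the helper function are illustrative assumptions, not part of ALUE's actual API (check the Configuration page for the real variable handling):

```python
import os

# Illustrative default base URLs for common OpenAI-compatible servers.
# ALUE may use different defaults; these are assumptions.
DEFAULT_BASE_URLS = {
    "openai": "https://api.openai.com/v1",
    "vllm": "http://localhost:8000/v1",
    "tgi": "http://localhost:8080/v1",
    "ollama": "http://localhost:11434/v1",
}

def resolve_base_url() -> str:
    """Pick an OpenAI-compatible base URL from ALUE_ENDPOINT_TYPE."""
    engine = os.environ.get("ALUE_ENDPOINT_TYPE", "openai")
    try:
        return DEFAULT_BASE_URLS[engine]
    except KeyError:
        raise ValueError(f"not an API-mode engine: {engine!r}")

# Any OpenAI-style client (e.g. openai.OpenAI(base_url=...)) can then
# be pointed at the resolved URL.
```

This is why `vllm-offline` and `transformers` are listed separately: they bypass the HTTP layer entirely, so no base URL applies.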

Example setting in .env:

ALUE_ENDPOINT_TYPE=openai
ALUE_OPENAI_API_KEY=sk-xxxx
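For a locally running OpenAI-compatible server (e.g. vLLM), you would instead point ALUE at its URL. Note that the `ALUE_ENDPOINT_URL` name here is assumed by analogy with the judge variables in the next section; verify the exact variable name in Configuration:

```shell
ALUE_ENDPOINT_TYPE=vllm
# Endpoint URL variable name assumed by analogy with
# ALUE_LLM_JUDGE_ENDPOINT_URL; verify in Configuration.
ALUE_ENDPOINT_URL=http://localhost:8000/v1
```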

LLM Judge Engines

Some tasks (e.g., RAG and Summarization) require an additional LLM as a judge for evaluation. Configuration mirrors the main inference engine, with the following variables:

  • ALUE_LLM_JUDGE_ENDPOINT_TYPE
  • ALUE_LLM_JUDGE_ENDPOINT_URL
  • ALUE_LLM_JUDGE_OPENAI_API_KEY

By default, the judge may use the same engine as the model under evaluation, but this is not recommended: a separate judge model reduces evaluation bias, since models tend to rate their own outputs favorably.


Embedding Engines

For retrieval-based tasks (e.g., RAG), ALUE requires embeddings. Supported embedding backends:

| Engine | Alias (`EMBEDDING_ENDPOINT_TYPE`) | Notes |
|---|---|---|
| OpenAI | `openai` | Embeddings via OpenAI API (requires `EMBEDDING_API_KEY`). |
| Ollama | `ollama` | Uses local Ollama models for embeddings. |
| Hugging Face | `hf` | Uses Hugging Face models (e.g., sentence-transformers). Requires `HF_TOKEN`. |
| Local | `local` (default) | Uses bundled local embedding models. |
| OpenAI-compatible | `openai-compatible` | Custom OpenAI-style embedding endpoints. |

Example setting in .env:

EMBEDDING_ENDPOINT_TYPE=hf
HF_TOKEN=hf_xxxx
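Whichever backend produces the vectors, retrieval ultimately ranks documents by similarity between the query embedding and each document embedding. A minimal cosine-similarity sketch in plain Python, using toy vectors rather than any specific backend:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings"; real backends return
# hundreds or thousands of dimensions.
query = [0.1, 0.9, 0.0]
doc_relevant = [0.2, 0.8, 0.1]
doc_unrelated = [0.9, 0.0, 0.4]

# The relevant document scores higher than the unrelated one,
# which is what drives retrieval ranking in RAG tasks.
```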