# ALUE: Aerospace Language Understanding and Evaluation
The Aerospace Language Understanding and Evaluation (ALUE) framework is an open-source system for benchmarking and evaluating large language models (LLMs) on tasks relevant to aerospace, safety-critical domains, and general language understanding.
ALUE provides:
- A consistent interface for multiple task types, including multiple-choice question answering (MCQA), summarization, and retrieval-augmented generation (RAG).
- Evaluation methods that combine traditional metrics (e.g., recall@k, token-level F1) with LLM-based evaluation metrics (e.g., context relevancy, composite correctness, claim decomposition).
- Extensible templates and configuration utilities to support additional domains and tasks.
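As a concrete illustration of the traditional metrics listed above, the following sketch computes a token-level F1 score between a prediction and a reference. This is a generic implementation of the standard metric, not ALUE's exact code:

```python
from collections import Counter

def token_f1(prediction: str, reference: str) -> float:
    """Token-level F1: harmonic mean of precision and recall over
    whitespace-split tokens, counting shared tokens with multiplicity."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        # Both empty -> perfect match; one empty -> no overlap.
        return float(pred_tokens == ref_tokens)
    common = Counter(pred_tokens) & Counter(ref_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

print(token_f1("the aircraft stalled", "the aircraft stalled on approach"))  # -> 0.75
```

LLM-judge metrics complement this kind of surface overlap: two answers can share few tokens yet make the same claim, which is exactly the case token-level F1 misses.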
## Key Features

- **Backend Flexibility**: ALUE integrates with a variety of inference and embedding providers.
    - Inference backends: `openai`, `vllm`, `tgi`, `ollama`, `transformers`
    - Embedding providers: `openai`, `ollama`, `hf`, `local`, `openai-compatible`
- **Evaluation Beyond Token Overlap**: Incorporates LLM-judge metrics that provide a more nuanced and robust assessment of correctness and factual grounding, particularly for long-form and generative responses.
- **Structured Prompting**: All tasks use message templates with defined variables, enabling transparent, reproducible, and customizable prompt construction.
- **Task-Specific Evaluation**: Each task is accompanied by its own evaluation methodology and metrics tailored to the problem type.
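To make the structured-prompting idea concrete, here is a minimal sketch of a message template with named variables. The template format and variable names are hypothetical illustrations, not ALUE's actual schema:

```python
from string import Template

# Hypothetical MCQA template; ALUE's real template format may differ.
MCQA_TEMPLATE = [
    {"role": "system", "content": Template("You are an expert in $domain.")},
    {"role": "user", "content": Template(
        "Question: $question\nChoices:\n$choices\nAnswer with a single letter."
    )},
]

def render(template, **variables):
    """Substitute the given variables into every message of a template."""
    return [
        {"role": m["role"], "content": m["content"].substitute(variables)}
        for m in template
    ]

messages = render(
    MCQA_TEMPLATE,
    domain="aviation",
    question="Which control surface primarily controls roll?",
    choices="A) Rudder\nB) Ailerons\nC) Elevator",
)
print(messages[0]["content"])  # -> You are an expert in aviation.
```

Because every variable is declared up front, the same template can be re-rendered for any example, which is what makes prompt construction reproducible and auditable.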
## Documentation Structure
- Setup: Installation and configuration of ALUE, including inference and embedding backends.
- Tasks: Task-specific documentation and examples:
    - MCQA
    - Summarization
    - RAG
- Contributing: Guidelines for extensions and contributions.
- API Reference: Generated reference documentation for ALUE modules.
## Quickstart
### 1. Install dependencies

```bash
# Recommended: using uv
uv sync

# Alternative: using pip
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
pip install -r requirements.txt
```
### 2. Configure environment

```bash
# Copy the example environment file
cp .env.example .env

# Edit .env to add your API keys and endpoints.
# At minimum, set:
# ALUE_ENDPOINT_TYPE=openai
# ALUE_OPENAI_API_KEY=sk-...
```
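The two variables above are the minimum the framework needs. A sketch of how such settings can be read and validated from the environment (the validation logic here is illustrative, not ALUE's actual configuration loader):

```python
import os

def load_alue_config(env=os.environ):
    """Read the minimal ALUE settings from an environment mapping.

    Uses the variable names from .env above; defaults and error
    handling are illustrative assumptions.
    """
    endpoint_type = env.get("ALUE_ENDPOINT_TYPE", "openai")
    api_key = env.get("ALUE_OPENAI_API_KEY")
    if endpoint_type == "openai" and not api_key:
        raise RuntimeError("ALUE_OPENAI_API_KEY must be set for the openai backend")
    return {"endpoint_type": endpoint_type, "api_key": api_key}

cfg = load_alue_config({"ALUE_ENDPOINT_TYPE": "openai",
                        "ALUE_OPENAI_API_KEY": "sk-demo"})
print(cfg["endpoint_type"])  # -> openai
```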
### 3. Run a simple example

```bash
# Example: Multiple Choice QA with OpenAI
python -m scripts.mcqa inference \
    -i data/aviation_knowledge_exam/3_1_aviation_test.json \
    -o runs/mcqa \
    -m gpt-4o-mini \
    --task_type aviation_exam \
    --num_examples 3

# Results will be saved to runs/mcqa_<timestamp>/predictions.json
```
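Once the run completes, the predictions file can be inspected programmatically. The record schema of `predictions.json` is task-dependent, so the fields in the demo below are placeholders; only generic JSON handling is assumed:

```python
import json
import tempfile
from pathlib import Path

def load_predictions(run_dir):
    """Load the predictions.json written by an inference run.

    Only generic JSON handling is assumed here; the per-record
    fields vary by task.
    """
    path = Path(run_dir) / "predictions.json"
    with path.open() as f:
        return json.load(f)

# Demo with a synthetic run directory and placeholder record fields.
with tempfile.TemporaryDirectory() as run_dir:
    (Path(run_dir) / "predictions.json").write_text(
        json.dumps([{"id": "q1", "prediction": "B"}])
    )
    preds = load_predictions(run_dir)
    print(len(preds))  # -> 1
```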
### 4. Verify installation

```bash
# Run the test suite
pytest tests
```
## Next Steps
- Getting Started — Detailed installation and configuration
- Tasks — Task-specific guides:
    - MCQA — Multiple choice question answering
    - RAG — Retrieval-augmented generation
    - Summarization — Narrative summarization
    - Extractive QA — Span extraction
- Models & Backends — Supported inference engines and embedding providers
- Configuration — Complete environment variables reference
## Citation
If you use ALUE in academic or applied work, please cite:
```bibtex
@inproceedings{alue2025,
  title     = {ALUE: Aerospace Language Understanding and Evaluation},
  author    = {…},
  booktitle = {AIAA SciTech Forum},
  year      = {2025},
  doi       = {10.2514/6.2025-3247}
}
```