
ALUE: Aerospace Language Understanding and Evaluation

The Aerospace Language Understanding and Evaluation (ALUE) framework is an open-source system for benchmarking and evaluating large language models (LLMs) on tasks relevant to aerospace, safety-critical domains, and general language understanding.

ALUE provides:
- A consistent interface for multiple task types, including multiple-choice question answering (MCQA), summarization, and retrieval-augmented generation (RAG).
- Evaluation methods that combine traditional metrics (e.g., recall@k, token-level F1) with LLM-based evaluation metrics (e.g., context relevancy, composite correctness, claim decomposition).
- Extensible templates and configuration utilities to support additional domains and tasks.
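The traditional metrics named above are standard; for concreteness, here is a minimal, self-contained sketch of token-level F1 and recall@k (an illustration, not ALUE's actual implementation):

```python
def token_f1(prediction: str, reference: str) -> float:
    """Token-level F1 between a predicted and a reference answer."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    # Count overlapping tokens, respecting multiplicity.
    ref_counts: dict[str, int] = {}
    for t in ref_tokens:
        ref_counts[t] = ref_counts.get(t, 0) + 1
    common = 0
    for t in pred_tokens:
        if ref_counts.get(t, 0) > 0:
            common += 1
            ref_counts[t] -= 1
    if common == 0:
        return 0.0
    precision = common / len(pred_tokens)
    recall = common / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def recall_at_k(retrieved_ids: list, relevant_ids: list, k: int) -> float:
    """Fraction of relevant documents that appear in the top-k retrieved."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    return len(set(retrieved_ids[:k]) & relevant) / len(relevant)
```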


Key Features

  • Backend Flexibility
    ALUE integrates with a variety of inference and embedding providers:
    • Inference backends: openai, vllm, tgi, ollama, transformers
    • Embedding providers: openai, ollama, hf, local, openai-compatible

  • Evaluation Beyond Token Overlap
    Incorporates LLM-judge metrics that provide a more nuanced and robust assessment of correctness and factual grounding, particularly for long-form and generative responses.

  • Structured Prompting
    All tasks use message templates with defined variables. This enables transparent, reproducible, and customizable prompt construction.

  • Task-Specific Evaluation
    Each task is accompanied by its own evaluation methodology and metrics tailored to the problem type.
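LLM-judge metrics like those above ultimately reduce to prompting a grader model and parsing its verdict. The following is a hedged sketch of that pattern; the prompt wording, score scale, and helper names are illustrative, not ALUE's actual implementation:

```python
import re

# Hypothetical judge prompt; ALUE's real templates will differ.
JUDGE_TEMPLATE = (
    "You are grading an answer for factual correctness.\n"
    "Question: {question}\n"
    "Reference answer: {reference}\n"
    "Candidate answer: {candidate}\n"
    "Respond with a single integer from 1 (wrong) to 5 (fully correct)."
)


def build_judge_prompt(question: str, reference: str, candidate: str) -> str:
    """Fill the judge template with the items to grade."""
    return JUDGE_TEMPLATE.format(
        question=question, reference=reference, candidate=candidate
    )


def parse_judge_score(response_text: str, lo: int = 1, hi: int = 5):
    """Extract the first integer from the judge's reply, clamped to [lo, hi].

    Returns None if the reply contains no integer at all.
    """
    match = re.search(r"-?\d+", response_text)
    if match is None:
        return None
    return max(lo, min(hi, int(match.group())))
```

Clamping and a None fallback make the parser robust to chatty judge replies, which is the usual failure mode of LLM-based grading.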

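The structured-prompting idea can be made concrete with a small sketch. The template content and the render_messages helper below are hypothetical, shown only to illustrate message templates with defined variables; they are not ALUE's actual API:

```python
# Hypothetical MCQA message template in the spirit of ALUE's
# structured prompting: each message has a role and a content
# string with named variables.
MCQA_TEMPLATE = [
    {
        "role": "system",
        "content": "You are an aerospace domain expert. "
                   "Answer with the letter of the best option.",
    },
    {
        "role": "user",
        "content": "Question: {question}\nOptions:\n{options}",
    },
]


def render_messages(template: list[dict], **variables) -> list[dict]:
    """Fill each message's content with the supplied variables."""
    return [
        {"role": m["role"], "content": m["content"].format(**variables)}
        for m in template
    ]
```

Because the variables are explicit, the same template can be logged, versioned, and re-rendered, which is what makes prompt construction transparent and reproducible.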

Documentation Structure

  • Setup: Installation and configuration of ALUE, including inference and embedding backends.
  • Tasks: Task-specific documentation and examples:
    • MCQA
    • Summarization
    • RAG
  • Contributing: Guidelines for extensions and contributions.
  • API Reference: Generated reference documentation for ALUE modules.

Quickstart

1. Install dependencies

# Recommended: using uv
uv sync

# Alternative: using pip
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
pip install -r requirements.txt

2. Configure environment

# Copy example environment file
cp .env.example .env

# Edit .env to add your API keys and endpoints
# At minimum, set:
# ALUE_ENDPOINT_TYPE=openai
# ALUE_OPENAI_API_KEY=sk-...

3. Run a simple example

# Example: Multiple Choice QA with OpenAI
python -m scripts.mcqa inference \
  -i data/aviation_knowledge_exam/3_1_aviation_test.json \
  -o runs/mcqa \
  -m gpt-4o-mini \
  --task_type aviation_exam \
  --num_examples 3

# Results will be saved to runs/mcqa_<timestamp>/predictions.json

4. Verify installation

# Run the test suite
pytest tests

Citation

If you use ALUE in academic or applied work, please cite:

@inproceedings{alue2025,
  title     = {ALUE: Aerospace Language Understanding and Evaluation},
  author    = {…},
  booktitle = {AIAA Scitech Forum},
  year      = {2025},
  doi       = {10.2514/6.2025-3247}
}