Architecture¶

For detailed pipeline: See Data Flow Pipeline
For MappingEngine: See MappingEngine User Guide
For practical examples: See Workflows

System Overview¶

graph TD
A[User] -->|CLI Commands| B[Click CLI]
B --> C[Download Command]
B --> D[Export Command]
B --> E[Catalog Command]

C --> F[WorkbenchScraper]
F --> G[CIS WorkBench]
F --> H[Pydantic Models]
H --> I[JSON Storage]

I --> D
D --> J[ExporterFactory]
J --> K[Unified XCCDF Exporter]
J --> L[CSV/JSON/YAML Exporters]

K --> M[MappingEngine]
M --> N[YAML Config]
N --> O[disa_style.yaml]
N --> P[cis_style.yaml]

E --> Q[Catalog Database]
E --> R[Catalog Scraper]
R --> G
Q --> S[SQLite + FTS5]

Core Components¶

CLI Layer¶

Framework: Click
Entry Point: src/cis_bench/cli/app.py

graph LR
A[cis-bench] --> B[download]
A --> C[export]
A --> D[list]
A --> E[info]
A --> F[catalog]

F --> F1[refresh]
F --> F2[search]
F --> F3[download]
F --> F4[update]

Commands:

download - Fetch benchmarks from WorkBench
export - Convert to formats (JSON, YAML, CSV, MD, XCCDF)
list - Show downloaded benchmarks
info - Display benchmark details
catalog - Manage catalog (8 subcommands)

Fetcher Layer¶

Purpose: Download and parse benchmarks from CIS WorkBench

graph TD
A[WorkbenchScraper] --> B[AuthManager]
B --> C[Browser Cookies]
B --> D[Cookie File]

A --> E[Strategy Pattern]
E --> F[v1_current Strategy]
E --> G[Future Strategies]

A --> H[HTMLParser]
H --> I[BeautifulSoup]

A --> J[Pydantic Models]
J --> K[Benchmark]
J --> L[Recommendation]

Components:

WorkbenchScraper - Main scraper class
AuthManager - Cookie extraction
Strategy Pattern - Adapts to HTML changes
HTMLParser - Extract data from HTML

Output: Validated Pydantic models (19 fields per recommendation)

Export Layer - Config-Driven Architecture¶

Key Innovation: ONE exporter class, multiple styles via YAML config

graph TD
A[ExporterFactory] -->|style=disa| B[XCCDFExporter]
A -->|style=cis| B
A --> C[JSONExporter]
A --> D[YAMLExporter]
A --> E[CSVExporter]
A --> F[MarkdownExporter]

B --> G[MappingEngine]
G --> H[disa_style.yaml]
G --> I[cis_style.yaml]

G --> J[Loop-Driven Mapping]
J --> K[map_benchmark]
J --> L[map_group]
J --> M[map_rule]

M --> N[xsdata Models]
N --> O[XCCDF XML]

Factory Pattern:

# Create exporter with style parameter
exporter = ExporterFactory.create("xccdf", style="cis")

# Adding new style: just create YAML config
# pci_dss_style.yaml -> ExporterFactory.create("xccdf", style="pci-dss")

MappingEngine - The Heart of XCCDF Export¶

Config-Driven Transformation:

graph LR
A[Pydantic Model] --> B[MappingEngine]
B --> C[Load YAML Config]
C --> D[Field Mappings]
C --> E[Transformations]
C --> F[Type Specs]

B --> G[Loop Through Config]
G --> H[Apply Transforms]
H --> I[Build xsdata Objects]

I --> J[XCCDF XML]

Key Principle: Changes happen in YAML, not code

Example - Adding a new field:

# In cis_style.yaml
field_mappings:
  new_field:
    target_element: "custom-element"
    source_field: "my_data"
    transform: "html_to_markdown"

No Python code changes needed!

Loop-Driven Methods:

map_rule() - Loops through field_mappings config
map_group() - Loops through group_elements config
map_benchmark() - Loops through benchmark config
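
The loop-driven idea can be illustrated with a minimal sketch. This is not the project's MappingEngine: the `TRANSFORMS` registry, the `map_rule` signature shown here, and the plain-dict output are assumptions made for illustration, whereas the real engine builds xsdata objects.

```python
import re

# Hypothetical transform registry. The real project resolves names such as
# "strip_html" to HTMLCleaner methods via the YAML `transformations` section.
TRANSFORMS = {
    "strip_html": lambda text: re.sub(r"<[^>]+>", "", text).strip(),
    "none": lambda text: text,
}

# Stand-in for the parsed `field_mappings` section of a style config.
FIELD_MAPPINGS = {
    "title": {
        "target_element": "title",
        "source_field": "title",
        "transform": "strip_html",
    },
    "rationale": {
        "target_element": "rationale",
        "source_field": "rationale",
    },
}


def map_rule(recommendation: dict, field_mappings: dict) -> dict:
    """Build {target_element: value} by looping over the config."""
    rule = {}
    for spec in field_mappings.values():
        value = recommendation.get(spec["source_field"])
        if value is None:
            continue  # Skip fields the source record does not provide.
        transform = TRANSFORMS[spec.get("transform", "none")]
        rule[spec["target_element"]] = transform(value)
    return rule


rule = map_rule(
    {"title": "<p>Ensure SSH root login is disabled</p>", "rationale": "Limits exposure."},
    FIELD_MAPPINGS,
)
```

Because `map_rule` only iterates over the config, adding a mapping entry changes the output without touching the function.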

Catalog System¶

Purpose: Browse and discover 1300+ CIS benchmarks

graph TD
A[Catalog CLI] --> B[CatalogSearch]
A --> C[CatalogScraper]
A --> D[CatalogDownloader]

C --> E[WorkBench HTML]
C --> F[CatalogParser]
F --> G[Extract Metadata]

G --> H[CatalogDatabase]
H --> I[SQLite]
I --> J[8 Tables - 3NF]
I --> K[FTS5 Search]

B --> H
D --> H
D --> L[WorkbenchScraper]
L --> M[Download Benchmark]
M --> H

Database Schema (3NF):

platforms - Operating System, Cloud, Database
benchmark_statuses - Published, Draft, Archived
communities - Development communities
collections - Categories/tags
owners - Authors
catalog_benchmarks - Main table with FKs
downloaded_benchmarks - Cached content
scrape_metadata - Tracking

FTS5 Search:

Partial-word matching via prefix queries ("ubunt" finds "ubuntu"), plus Porter stemming
Multi-word search
Ranked results (BM25)
Fast (<1ms for 1000+ records)

Data Flow¶

Download Workflow¶

sequenceDiagram
User->>CLI: cis-bench download 23598
CLI->>AuthManager: Get cookies
AuthManager->>Browser: Extract cookies
Browser-->>AuthManager: Session cookies
CLI->>WorkbenchScraper: Fetch benchmark
WorkbenchScraper->>WorkBench: HTTP GET (authenticated)
WorkBench-->>WorkbenchScraper: HTML
WorkbenchScraper->>HTMLParser: Parse HTML
HTMLParser->>Strategy: Extract fields
Strategy-->>WorkbenchScraper: Data dict
WorkbenchScraper->>Pydantic: Validate
Pydantic-->>CLI: Benchmark object
CLI->>File: Save JSON

Export Workflow¶

sequenceDiagram
User->>CLI: cis-bench export --format xccdf --style cis
CLI->>ExporterFactory: create("xccdf", style="cis")
ExporterFactory-->>CLI: XCCDFExporter(cis)
CLI->>XCCDFExporter: export(benchmark)
XCCDFExporter->>MappingEngine: Load cis_style.yaml
MappingEngine->>Config: Read field_mappings
MappingEngine->>FieldLoop: For each field in config
FieldLoop->>Transform: Apply transformation
Transform->>xsdata: Build typed objects
xsdata->>XML: Serialize
XCCDFExporter->>PostProcessor: Inject metadata
PostProcessor-->>XCCDFExporter: Final XML
XCCDFExporter->>File: Write XCCDF

Catalog Workflow¶

sequenceDiagram
User->>CLI: cis-bench catalog refresh
CLI->>CatalogScraper: scrape_full_catalog()
CatalogScraper->>WorkBench: GET page 1
WorkBench-->>CatalogScraper: HTML
CatalogScraper->>Parser: parse_catalog_page()
Parser-->>CatalogScraper: List[benchmark_data]
CatalogScraper->>Database: insert_benchmark()
Database->>SQLite: INSERT + FTS5 update
Note over CatalogScraper,SQLite: Repeat for 68 pages
CatalogScraper-->>CLI: Stats

User->>CLI: cis-bench catalog search "ubuntu"
CLI->>CatalogSearch: search()
CatalogSearch->>Database: FTS5 query
Database->>SQLite: SELECT with MATCH
SQLite-->>Database: Ranked results
Database-->>CLI: Benchmark list

Design Patterns¶

Strategy Pattern (Fetcher)¶

Problem: CIS WorkBench HTML changes over time

Solution: Version-specific strategies

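class StrategyDetector:
    # Ordered newest first, so the current format is tried before older ones
    strategies = [
        v1_2025_10,  # Current
        v1_2024_06,  # Older
    ]

    @classmethod
    def detect(cls, html):
        """Return the first strategy that recognizes this HTML, or None."""
        for strategy in cls.strategies:
            if strategy.can_handle(html):
                return strategy
        return None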

Adding new HTML format:

1. Create new strategy class
2. Register with detector
3. Old benchmarks still work
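
The steps above can be sketched as follows. The two strategy classes, their HTML markers, and the `register` method are hypothetical; only the `can_handle` check and the detector loop come from the description above.

```python
class V1Strategy:
    """Hypothetical strategy for an older page layout."""

    name = "v1"

    @staticmethod
    def can_handle(html: str) -> bool:
        return 'class="recommendation-v1"' in html


class V2Strategy:
    """Hypothetical strategy for a newer page layout."""

    name = "v2"

    @staticmethod
    def can_handle(html: str) -> bool:
        return "data-recommendation-id" in html


class StrategyDetector:
    def __init__(self):
        self._strategies = []

    def register(self, strategy) -> None:
        # Insert at the front so the most recently added format is tried first.
        self._strategies.insert(0, strategy)

    def detect(self, html: str):
        for strategy in self._strategies:
            if strategy.can_handle(html):
                return strategy
        raise LookupError("no registered strategy recognizes this HTML")


detector = StrategyDetector()
detector.register(V1Strategy)
detector.register(V2Strategy)  # Step 2: the new format is registered.

old_page = '<div class="recommendation-v1">...</div>'
new_page = '<div data-recommendation-id="1">...</div>'
```

Pages in the older layout still resolve to `V1Strategy`, because registering a new strategy does not remove the existing ones.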

Factory Pattern (Exporters)¶

Problem: Multiple export formats

Solution: Pluggable exporters

ExporterFactory.register("xccdf", XCCDFExporter)
ExporterFactory.register("csv", CSVExporter)

# Create exporter
exporter = ExporterFactory.create("xccdf", style="cis")

Adding new format:

1. Create exporter class
2. Register with factory
3. Available in CLI
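
A minimal sketch of how a registry-based factory of this kind can work. The internals are assumptions: `_registry`, the `BaseExporter` stand-in, the string-returning `export`, and the `TitleListExporter` format are invented for illustration and are not the project's code.

```python
class BaseExporter:
    """Minimal stand-in for the project's base class (assumed interface)."""

    def __init__(self, **options):
        self.options = options  # e.g. style="cis"

    def export(self, benchmark: dict) -> str:
        raise NotImplementedError


class TitleListExporter(BaseExporter):
    """Hypothetical format: one recommendation title per line."""

    def export(self, benchmark: dict) -> str:
        return "\n".join(rec["title"] for rec in benchmark["recommendations"])


class ExporterFactory:
    # Maps a format name to the class that implements it.
    _registry: dict[str, type[BaseExporter]] = {}

    @classmethod
    def register(cls, format_name: str, exporter_cls: type[BaseExporter]) -> None:
        cls._registry[format_name] = exporter_cls

    @classmethod
    def create(cls, format_name: str, **options) -> BaseExporter:
        try:
            exporter_cls = cls._registry[format_name]
        except KeyError:
            raise ValueError(f"unknown export format: {format_name!r}") from None
        return exporter_cls(**options)


ExporterFactory.register("titles", TitleListExporter)
exporter = ExporterFactory.create("titles", style="cis")
output = exporter.export({"recommendations": [{"title": "A"}, {"title": "B"}]})
```

Callers only ever name a format string, so a new exporter becomes reachable as soon as it is registered.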

Config-Driven Mapping (XCCDF)¶

Problem: Multiple XCCDF styles with different structures

Solution: YAML-based configuration

# cis_style.yaml
field_mappings:
  title:
    target_element: "title"
    source_field: "title"
    transform: "strip_html"

  cis_controls_metadata:
    structure: "metadata_from_config"
    requires_post_processing: true

Adding new XCCDF style:

1. Create style_name_style.yaml
2. Define field mappings
3. Use: --format xccdf --style style_name

No code changes!

Technology Stack¶

Core:

Python 3.12+
Click (CLI framework)
Pydantic (data validation)
SQLModel (database ORM)

Scraping:

requests (HTTP)
BeautifulSoup4 (HTML parsing)
browser-cookie3 (authentication)

Export:

xsdata (XCCDF models from XSD)
lxml (XML processing)
PyYAML (config files)

Database:

SQLite (catalog storage)
SQLAlchemy (via SQLModel)
FTS5 (full-text search)

CLI/UX:

Rich (formatting, progress bars)
questionary (interactive prompts)

Development:

pytest (testing)
ruff (linting/formatting)
bandit (security)
pre-commit (hooks)
mkdocs-material (documentation)

File Organization¶

src/cis_bench/
├── cli/ # Click commands
│ ├── app.py # Main CLI
│ └── commands/ # Command modules
├── fetcher/ # WorkBench scraper
│ ├── workbench.py # Main scraper
│ ├── auth.py # Authentication
│ └── strategies/ # HTML strategies
├── exporters/ # Format exporters
│ ├── base.py # Factory
│ ├── xccdf_unified_exporter.py # Config-driven XCCDF
│ ├── mapping_engine.py # YAML-driven XCCDF mapping
│ ├── configs/ # YAML configs
│ └── [format]_exporter.py
├── catalog/ # Catalog system
│ ├── database.py # SQLModel database
│ ├── models.py # Database models
│ ├── scraper.py # Multi-page scraper
│ ├── parser.py # HTML parser
│ ├── search.py # Search/filter
│ └── downloader.py # Smart download
├── models/ # Data models
│ ├── benchmark.py # Pydantic models
│ ├── cis_controls_official.py
│ ├── enhanced_metadata.py
│ └── xccdf/ # xsdata generated
├── utils/ # Utilities
│ ├── xml_utils.py # XCCDF processing
│ ├── html_parser.py # HTML cleaning
│ ├── logging_config.py # Logging setup
│ └── ...
├── validators/ # XCCDF validators
└── config.py # Environment config

Configuration System¶

Environment-Based Config¶

graph TD
A[Config.py] --> B{Environment?}
B -->|test| C[/tmp/cis-bench-test/]
B -->|dev| D[~/.cis-bench-dev/]
B -->|production| E[~/.cis-bench/]

F[pytest] --> G[Set CIS_BENCH_ENV=test]
H[Developer] --> I[export CIS_BENCH_ENV=dev]
J[User] --> K[Default production]

Environment variable: CIS_BENCH_ENV

test - Pytest (automatic)
dev - Development work
production - Default
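
As a rough sketch of how the environment could be resolved to a data directory: the function name `resolve_data_dir` and the error handling are assumptions, while the variable name and the three paths are taken from the diagram above.

```python
import os
from pathlib import Path

# Paths as listed in the diagram above.
_DATA_DIRS = {
    "test": Path("/tmp/cis-bench-test"),
    "dev": Path.home() / ".cis-bench-dev",
    "production": Path.home() / ".cis-bench",
}


def resolve_data_dir(environ=None) -> Path:
    """Pick the data directory for the active environment (hypothetical helper)."""
    environ = os.environ if environ is None else environ
    env = environ.get("CIS_BENCH_ENV", "production")  # Production is the default.
    try:
        return _DATA_DIRS[env]
    except KeyError:
        raise ValueError(f"unknown CIS_BENCH_ENV: {env!r}") from None
```

Passing the environment mapping as a parameter keeps the helper easy to test without mutating `os.environ`.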

Optional .env file:

# ~/.cis-bench/.env
CIS_BENCH_ENV=dev
CIS_BENCH_SSL_VERIFY=false

Loaded automatically via python-dotenv.

YAML Mapping System¶

How Styles Work¶

graph LR
A[Benchmark JSON] --> B[MappingEngine]
B --> C{Load Config}
C --> D[disa_style.yaml]
C --> E[cis_style.yaml]

D --> F[Field Mappings]
E --> F

F --> G[Loop-Driven]
G --> H[For each field in config]
H --> I[Apply transform]
I --> J[Build xsdata object]

J --> K[XCCDF XML]

YAML Structure¶

# Style metadata
metadata:
  style_name: "cis"
  xccdf_version: "1.2"

# Field mappings (loop-driven)
field_mappings:

  # Simple field
  title:
    target_element: "title"
    source_field: "title"
    transform: "strip_html"

  # Complex nested structure
  cis_controls_metadata:
    target_element: "metadata"
    structure: "metadata_from_config"
    requires_post_processing: true

  # Multiple values
  reference:
    target_element: "reference"
    multiple: true
    structure: "dublin_core"
    source_field: "nist_controls"

# Transformations
transformations:
  strip_html:
    function: "HTMLCleaner.strip_html"

  html_to_markdown:
    function: "HTMLCleaner.html_to_markdown"

Structure Types¶

Supported structures:

simple - Direct field mapping
nested - Parent with child elements
dublin_core - Reference with DC metadata
metadata_from_config - Generic nested XML (CIS Controls, hierarchies)
ident_from_list - Generic ident generation (CCIs, MITRE, PCI-DSS)
embedded_xml_tags - VulnDiscussion style

Adding new structure:

1. Define in YAML with structure: "new_type"
2. Add handler in MappingEngine

Alternatively, reuse one of the existing structure types and skip the code change.
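
One way to picture the handler step is a dispatch table keyed by the `structure` value. The sketch below is an assumption about the shape of such a dispatcher, not the MappingEngine's actual code: the handler names, the `system` key, the dict output, and the fallback to `simple` when no structure is given are all hypothetical.

```python
def handle_simple(spec: dict, value):
    """Direct mapping: one source value becomes one target element."""
    return {spec["target_element"]: value}


def handle_ident_from_list(spec: dict, values):
    """Emit one entry per item in the source list."""
    return {
        spec["target_element"]: [
            {"system": spec["system"], "value": item} for item in values
        ]
    }


# Hypothetical registry keyed by the YAML `structure` value.
STRUCTURE_HANDLERS = {
    "simple": handle_simple,
    "ident_from_list": handle_ident_from_list,
}


def apply_structure(spec: dict, value):
    structure = spec.get("structure", "simple")  # Assumed default.
    try:
        handler = STRUCTURE_HANDLERS[structure]
    except KeyError:
        raise ValueError(f"unsupported structure: {structure!r}") from None
    return handler(spec, value)


idents = apply_structure(
    {
        "target_element": "ident",
        "structure": "ident_from_list",
        "system": "https://example.com/cci",  # Placeholder URI.
    },
    ["CCI-000366", "CCI-000213"],
)
```

With this shape, supporting a new structure type means adding one function and one registry entry.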

Database Schema¶

Catalog Database (SQLite)¶

erDiagram
catalog_benchmarks ||--o{ downloaded_benchmarks : has
catalog_benchmarks }o--|| platforms : belongs_to
catalog_benchmarks }o--|| benchmark_statuses : has
catalog_benchmarks }o--|| communities : belongs_to
catalog_benchmarks }o--|| owners : owned_by
catalog_benchmarks }o--o{ collections : has

catalog_benchmarks {
string benchmark_id PK
string title
string version
int status_id FK
int platform_id FK
int community_id FK
int owner_id FK
date published_date
bool is_latest
}

downloaded_benchmarks {
string benchmark_id PK
text content_json
string content_hash
int recommendation_count
datetime downloaded_at
}

platforms {
int platform_id PK
string name
}

benchmark_statuses {
int status_id PK
string name
bool is_active
}

3NF Normalization:

No data duplication
Referential integrity via FKs
Fast queries with indexes

FTS5 Virtual Table:

CREATE VIRTUAL TABLE benchmarks_fts USING fts5(
    benchmark_id UNINDEXED,
    title,
    platform,
    community,
    description,
    tokenize='porter unicode61'
);

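Provides stemmed full-text search: the porter tokenizer matches word variants such as "benchmark" and "benchmarks", and prefix queries cover partial words.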
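
A small sketch of how such a table can be queried from Python's built-in `sqlite3` module, assuming the SQLite build includes FTS5. The rows and the `search` helper are illustrative only and do not reflect the project's CatalogSearch API.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE VIRTUAL TABLE benchmarks_fts USING fts5(
        benchmark_id UNINDEXED,
        title,
        platform,
        community,
        description,
        tokenize='porter unicode61'
    )
    """
)
# Illustrative rows, not real catalog data.
conn.executemany(
    "INSERT INTO benchmarks_fts VALUES (?, ?, ?, ?, ?)",
    [
        ("1", "CIS Ubuntu Linux 22.04 LTS Benchmark", "Operating System",
         "Linux", "Hardening guidance for Ubuntu servers"),
        ("2", "CIS Microsoft Windows Server 2022 Benchmark", "Operating System",
         "Windows", "Hardening guidance for Windows servers"),
        ("3", "CIS Amazon Web Services Foundations Benchmark", "Cloud",
         "AWS", "Baseline for AWS accounts"),
    ],
)


def search(query: str) -> list[str]:
    """Return matching benchmark IDs, best BM25 score first."""
    rows = conn.execute(
        "SELECT benchmark_id FROM benchmarks_fts "
        "WHERE benchmarks_fts MATCH ? ORDER BY bm25(benchmarks_fts)",
        (query,),
    ).fetchall()
    return [row[0] for row in rows]
```

In this sketch `search("ubunt*")` finds the Ubuntu row, while a bare `search("ubunt")` finds nothing, so a caller that wants partial-word matching has to append the `*` itself.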

Extensibility Points¶

Adding New Export Format¶

Create exporter class:

class NewFormatExporter(BaseExporter):
    def export(self, benchmark, output_path):
        # Convert and write
        pass

Register:

ExporterFactory.register("newformat", NewFormatExporter)

Use:

cis-bench export benchmark.json --format newformat

Adding New XCCDF Style¶

Create YAML config:

# custom_style.yaml
metadata:
  style_name: "custom"
  xccdf_version: "1.2"

field_mappings:
  # Define mappings