nlp2cmd

Schema System Architecture

Overview

The schema system in NLP2CMD extracts, stores, and uses command metadata to generate precise commands from natural language.

Core Concepts

Schema

A schema describes the structure of a command:

Command name
Parameters (required/optional)
Usage templates
Examples

{
  "command": "docker",
  "version": "1.0",
  "description": "Docker container management",
  "category": "container",
  "parameters": [
    {
      "name": "image",
      "type": "string",
      "required": true,
      "description": "Container image"
    }
  ],
  "templates": [
    "docker run {image}",
    "docker run -d {image}"
  ]
}
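As a rough illustration of how the parameters and templates in such a schema fit together, the sketch below fills a template with user-supplied values after checking required parameters. The `render_template` helper is hypothetical and not part of nlp2cmd; it only assumes the JSON layout shown above.

```python
# Subset of the docker schema shown above, as a Python dict.
schema = {
    "command": "docker",
    "parameters": [
        {"name": "image", "type": "string", "required": True},
    ],
    "templates": ["docker run {image}", "docker run -d {image}"],
}


def render_template(schema: dict, template_index: int, values: dict) -> str:
    """Fill one schema template (hypothetical helper, not an nlp2cmd API)."""
    # Collect required parameters that the caller did not supply.
    missing = [
        p["name"]
        for p in schema["parameters"]
        if p.get("required") and p["name"] not in values
    ]
    if missing:
        raise ValueError(f"missing required parameters: {missing}")
    # Templates use str.format-style placeholders such as {image}.
    return schema["templates"][template_index].format(**values)


command = render_template(schema, 1, {"image": "nginx:latest"})
print(command)  # docker run -d nginx:latest
```

Checking required parameters before formatting gives a readable error instead of a bare `KeyError` from `str.format`.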

Components

1. Schema Extraction (`src/nlp2cmd/schema_extraction/`)

Schemas are extracted from several sources:

man pages
--help output
AppSpec files
Existing commands

Key Classes:

from nlp2cmd.schema_extraction import SchemaRegistry

registry = SchemaRegistry(
    use_per_command_storage=True,
    storage_dir="./command_schemas"
)

# Extract from command help
schema = registry.register_shell_help("docker")

# Extract from OpenAPI
schema = registry.register_openapi_schema("https://api.example.com/openapi.json")

2. Schema Storage (`src/nlp2cmd/storage/`)

Schemas are stored in command_schemas/:

Per-command storage: each schema lives in its own JSON file
Categorization in categories/
Index in index.json

Structure:

command_schemas/
├── commands/           # Individual command schemas (per-command storage)
├── categories/         # Schema categories index
├── index.json          # Master index of all schemas
└── *.json              # Schema files (docker.json, nginx.json, etc.)
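The layout above could be produced by something like the following sketch, which writes one JSON file per command and keeps a master index. The `save_schema` helper and the shape of the index entries are assumptions made for illustration; the actual on-disk format used by nlp2cmd may differ.

```python
import json
import tempfile
from pathlib import Path


def save_schema(storage_dir: Path, schema: dict) -> Path:
    """Write a schema to commands/<name>.json and record it in index.json.

    Hypothetical helper; the index entry format is an assumption.
    """
    commands_dir = storage_dir / "commands"
    commands_dir.mkdir(parents=True, exist_ok=True)

    schema_path = commands_dir / f"{schema['command']}.json"
    schema_path.write_text(json.dumps(schema, indent=2), encoding="utf-8")

    # Load the existing index if present, then add or overwrite this entry.
    index_path = storage_dir / "index.json"
    index = (
        json.loads(index_path.read_text(encoding="utf-8"))
        if index_path.exists()
        else {}
    )
    index[schema["command"]] = {
        "category": schema.get("category", "uncategorized"),
        "path": schema_path.relative_to(storage_dir).as_posix(),
    }
    index_path.write_text(json.dumps(index, indent=2), encoding="utf-8")
    return schema_path


# Use a temporary directory so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "command_schemas"
    save_schema(root, {"command": "docker", "version": "1.0", "category": "container"})
    save_schema(root, {"command": "nginx", "version": "1.0", "category": "web"})
    index = json.loads((root / "index.json").read_text(encoding="utf-8"))
    print(sorted(index))  # ['docker', 'nginx']
```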

3. Schema-Based Generation (`src/nlp2cmd/generation/schema/`)

Note: This module was moved from schema_based/ to generation/schema/ and is provided as shims.

Commands are generated from schemas using:

Pattern matching
Context-aware generation
Learning from feedback

# Historical API (deprecated shim)
from nlp2cmd.generation.schema import SchemaBasedGenerator

# llm_config is the dict defined in the Configuration section
generator = SchemaBasedGenerator(llm_config)
command = generator.generate_command('find', {'path': '/home', 'pattern': '*.py'})

Usage Flow

1. User Input (NL)
      ↓
2. Schema Registry Lookup
      ↓
3. Schema-Based Generation
      ↓
4. Command Output

Versioning

Note: Schema versioning is currently limited to version "1.0".

Planned versioning support:

Major: breaking changes
Minor: new features
Patch: bug fixes

{
  "command": "docker",
  "version": "1.0",
  "migration_notes": "v1→v2: --rm moved to run options (planned)"
}
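Since versioning support is only planned, the following is just a sketch of how the major/minor/patch distinction might be checked when comparing two schema versions. Both `parse_version` and `change_kind` are hypothetical names, and padding short strings such as "1.0" with zeros is an assumption.

```python
def parse_version(version: str) -> tuple[int, int, int]:
    """Parse "1", "1.0" or "1.0.2" into (major, minor, patch), padding with zeros."""
    parts = [int(p) for p in version.split(".")]
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"unsupported version string: {version!r}")
    parts += [0] * (3 - len(parts))
    return (parts[0], parts[1], parts[2])


def change_kind(old: str, new: str) -> str:
    """Classify the step from old to new as major, minor, patch, or none."""
    o, n = parse_version(old), parse_version(new)
    if n[0] != o[0]:
        return "major"
    if n[1] != o[1]:
        return "minor"
    if n[2] != o[2]:
        return "patch"
    return "none"


print(change_kind("1.0", "2.0"))    # major (breaking changes)
print(change_kind("1.0", "1.1"))    # minor (new features)
print(change_kind("1.0", "1.0.1"))  # patch (bug fixes)
```

A loader could use a check like this to refuse, or to migrate, a stored schema whose major version differs from the one it expects.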

Integration with Pipeline

Current State: Schemas are partially integrated with the NLP2CMD pipeline:

Schema Extraction - SchemaRegistry extracts schemas from various sources
Schema Storage - PerCommandSchemaStore stores the schemas
Limited Integration - SchemaBasedGenerator is used in only a few places

Planned Integration:

Schema Match - check whether a schema exists
Template Match - use the stored templates
LLM Fallback - generate via the LLM if no schema exists
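The three planned stages can be read as a fallback chain. The sketch below shows one possible shape of that chain using an in-memory dict in place of the registry and a stub in place of the LLM. All names here (`SCHEMAS`, `template_match`, `generate`, `fake_llm`) are hypothetical, and falling back to the LLM when a schema exists but no template fits is an added assumption.

```python
from typing import Callable, Optional

# Hypothetical in-memory stand-in for the schema registry.
SCHEMAS = {
    "docker": {
        "command": "docker",
        "templates": ["docker run {image}", "docker run -d {image}"],
    }
}


def template_match(schema: dict, params: dict) -> Optional[str]:
    """Return the first template whose placeholders are all covered by params."""
    for template in schema.get("templates", []):
        try:
            return template.format(**params)
        except KeyError:
            continue  # a placeholder is missing, try the next template
    return None


def generate(
    command: str, params: dict, llm: Callable[[str, dict], str]
) -> tuple[str, str]:
    """Return (generated command, name of the stage that produced it)."""
    schema = SCHEMAS.get(command)  # 1. schema match
    if schema is not None:
        rendered = template_match(schema, params)  # 2. template match
        if rendered is not None:
            return rendered, "template"
    return llm(command, params), "llm"  # 3. LLM fallback


def fake_llm(command: str, params: dict) -> str:
    """Stand-in for a real LLM call so the sketch runs offline."""
    return f"{command} " + " ".join(str(v) for v in params.values())


print(generate("docker", {"image": "nginx"}, fake_llm))  # ('docker run nginx', 'template')
print(generate("find", {"path": "/home"}, fake_llm))     # ('find /home', 'llm')
```

Returning the stage name alongside the command makes it easy to log how often the LLM fallback is actually reached.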

Best Practices

Start with good schemas - Provide quality initial schemas
Use per-command storage - Better organization
Collect feedback - Enable user feedback for improvement
Validate generations - Check generated commands
Version schemas - Track schema evolution
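For the "Validate generations" practice, a lightweight check might look like the sketch below. It is a heuristic only, and `validate_command` is a hypothetical helper: it flags leftover `{name}` placeholders, unparseable shell quoting, and commands that do not start with the schema's command name.

```python
import re
import shlex


def validate_command(generated: str, schema: dict) -> list[str]:
    """Return a list of problems; an empty list means the command passed."""
    problems = []
    # A leftover {name} usually means a template parameter was never filled.
    if re.search(r"\{\w+\}", generated):
        problems.append("unfilled placeholder")
    try:
        tokens = shlex.split(generated)
    except ValueError as exc:  # for example, unbalanced quotes
        return problems + [f"not parseable: {exc}"]
    if not tokens or tokens[0] != schema["command"]:
        problems.append(f"does not start with {schema['command']!r}")
    return problems


schema = {"command": "docker"}
print(validate_command("docker run -d nginx", schema))  # []
print(validate_command("docker run {image}", schema))   # ['unfilled placeholder']
print(validate_command("rm -rf /tmp/x", schema))        # ["does not start with 'docker'"]
```

Commands that legitimately contain brace expressions would need a less strict placeholder check.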

Configuration

# Schema extraction config
llm_config = {
    "model": "ollama/qwen2.5-coder:7b",
    "api_base": "http://localhost:11434",
    "temperature": 0.1,
    "max_tokens": 512,
}

# Registry config
registry_config = {
    "use_per_command_storage": True,
    "storage_dir": "./command_schemas",
    "use_llm": False  # Optional LLM extraction
}

Migration History

v0.2.0: Schema extraction introduced
v0.3.0: Per-command storage added
v0.4.0: schema_based/ → generation/schema/ (shims)
v0.5.0: Added support for OpenAPI, AppSpec exports, dynamic schemas
v1.0.0: Schema extraction integration with pipeline

API Reference - Detailed API
Examples Guide - Practical examples
Versioned Schemas - Schema versioning
Thermodynamic Integration - Energy-based optimization
