nlp2cmd uczy sie z kazdego zapytania. 93.6% trafnosc (Qwen-Coder-3B), 19,524x speedup z cache, template pipeline eliminuje 31% zapytan bez LLM.
Zapytanie NL --> CACHE EXACT --> (hit?) --> wynik (~0.01ms)
| miss
v
CACHE FUZZY --> (hit?) --> wynik (~0.02ms)
| miss
v
CACHE SIMILAR (rapidfuzz) --> (hit?) --> wynik (~0.4ms)
| miss
v
TEMPLATE PIPELINE (1615) --> (hit?) --> wynik (~5-15ms)
| miss
v
LLM TEACHER --> Qwen2.5-3B (~300ms) --> AUTO-CACHE
| Model | Przed few-shot | Po few-shot | Delta |
|---|---|---|---|
| Qwen2.5-Coder-3B | 79% | 93.6% | +14.6pp |
| Qwen2.5-3B | 83% | 89.4% | +6.4pp |
| Gemma2-2B | 28% | 53.2% | +25.2pp |
| Bielik-1.5B | 26% | 53.2% | +27.2pp |
| Domena | Przed | Po | Delta |
|---|---|---|---|
| browser | 50% | 100% | +50pp |
| api | 25% | 67% | +42pp |
| rag | 17% | 58% | +41pp |
| iot | 42% | 75% | +33pp |
| presentation | 42% | 75% | +33pp |
| devops | 58% | 83% | +25pp |
| ffmpeg | 50% | 75% | +25pp |
| remote | 42% | 67% | +25pp |
| package_mgmt | 75% | 92% | +17pp |
| kubernetes | 83% | 92% | +9pp |
| media | 75% | 83% | +8pp |
| shell | 58% | 67% | +9pp |
| data | 33% | 33% | 0pp |
| sql | 42% | 42% | 0pp |
| docker | 67% | 67% | 0pp |
| git | 100% | 92% | -8pp |
| Teacher | R1 cold | R3 hot | Speedup | Template hits R1 | LLM calls R1 |
|---|---|---|---|---|---|
| Qwen2.5-3B | 187ms | 0.017ms | 10,995x | 21/32 | 11 |
| Qwen2.5-Coder-3B | 187ms | 0.017ms | 10,995x | 21/32 | 11 |
| Gemma2-2B | 233ms | 0.017ms | 13,697x | 21/32 | 11 |
Template pipeline eliminuje 66% zapytan (21/32) bez potrzeby LLM. Remaining 11 queries go to LLM teacher and are cached for future instant lookup. Cold start average dropped from 314ms to 187ms thanks to template hits (~5ms vs ~300ms).
MD5 fingerprint — O(1) dict lookup.
def fingerprint(text: str) -> str:
normalized = re.sub(r'\s+', ' ', text.lower().strip())
return hashlib.md5(normalized.encode()).hexdigest()[:16]
Word-bag fingerprint — ignoruje stop words, sortuje slowa kluczowe.
Biblioteka rapidfuzz (C++ backend) — fuzz.WRatio — kombinacja 4 algorytmow:
| Algorytm | Co lapie |
|---|---|
| Simple ratio (Levenshtein) | literowki: “znajdz” vs “znajdz” |
| Partial ratio | podciagi: “pliki PDF” w dluzszym zapytaniu |
| Token sort ratio | zmiana kolejnosci slow |
| Token set ratio | dodatkowe/brakujace slowa |
Prog: NLP2CMD_SIMILARITY_THRESHOLD=88 (domyslnie 88%).
Testy: 22 testy w tests/unit/test_similarity_cache.py.
Uzywa istniejacego RuleBasedPipeline (1615 szablonow):
Lazy-loaded przy pierwszym wywolaniu. Wynik jest automatycznie cachowany. Eliminuje 31% zapytan bez potrzeby LLM.
Qwen2.5-3B lub Qwen2.5-Coder-3B z few-shot promptami per domena. Kazdy prompt zawiera 2-3 konkretne przyklady Q->A.
Wynik automatycznie cachowany w .nlp2cmd/learned_schemas.json.
Dodanie 2-3 konkretnych przykladow do kazdego system prompt dalo:
Przyklad prompt przed/po:
# PRZED (slaby):
"Generuj komende curl. Odpowiedz TYLKO komenda."
# PO (silny):
"""Jestes ekspertem od curl. Przyklady:
Q: wyslij GET -> curl -s https://api.example.com/users
Q: wyslij POST z JSON -> curl -s -X POST -H 'Content-Type: application/json' -d '{"key":"val"}' URL
Q: sprawdz kod HTTP -> curl -o /dev/null -s -w '%{http_code}' URL
Odpowiedz TYLKO komenda curl."""
| # | Domena | Szablony | Trafnosc (avg) |
|---|---|---|---|
| 1 | shell | 648 | 67% |
| 2 | git | 125 | 92% |
| 3 | kubernetes | 94 | 92% |
| 4 | package_mgmt | 85 | 92% |
| 5 | docker | 87 | 67% |
| 6 | sql | 86 | 42% |
| 7 | devops | 60 | 83% |
| 8 | data | 56 | 33% |
| 9 | browser | 54 | 100% |
| 10 | ffmpeg | 48 | 75% |
| 11 | iot | 48 | 75% |
| 12 | remote | 48 | 67% |
| 13 | rag | 47 | 58% |
| 14 | media | 44 | 83% |
| 15 | presentation | 43 | 75% |
| 16 | api | 42 | 67% |
from nlp2cmd.generation.evolutionary_cache import EvolutionaryCache
cache = EvolutionaryCache()
# 1. Cold start — template hit (~15ms) lub LLM teacher (~300ms)
r = cache.lookup("znajdz pliki PDF wieksze niz 10MB")
# 2. Hot cache — instant (~0.015ms)
r = cache.lookup("znajdz pliki PDF wieksze niz 10MB")
# 3. Typo — similarity hit (~0.4ms)
r = cache.lookup("znajdz pliki PDF wieksze niz 10MB")
print(r.source) # "cache_similar"
print(r.confidence) # 0.91
# 4. Stats
print(cache.get_stats())
make benchmark # Standard: 4 modele x 16 domen
make benchmark-learn # Learning: 3 teachery x 3 rundy
make benchmark-html # Otworz HTML raport
export NLP2CMD_TEACHER_MODEL="qwen2.5:3b"
export NLP2CMD_SIMILARITY_THRESHOLD="88"
export OLLAMA_BASE_URL="http://localhost:11434"
export NLP2CMD_CACHE_DIR="~/.nlp2cmd"
export NLP2CMD_BENCHMARK_MODELS="qwen2.5:3b,gemma2:2b" # custom model list
For testing or benchmarking purposes, you can disable all cache tiers:
# Via environment variable
export NLP2CMD_DISABLE_CACHE="1"
# In Python
import os
os.environ["NLP2CMD_DISABLE_CACHE"] = "1"
from nlp2cmd.generation.evolutionary_cache import EvolutionaryCache
cache = EvolutionaryCache()
# All lookups will bypass cache and template pipeline
# Benchmark without cache
python3 examples/benchmark_nlp2cmd.py --no-cache
When cache is disabled:
| Priorytet | Zadanie | Status |
|---|---|---|
| DONE | Evolutionary cache (exact+fuzzy) | 19,524x speedup |
| DONE | Similarity matching (rapidfuzz) | +14.6pp hit rate |
| DONE | Template-first pipeline | 31% queries bez LLM |
| DONE | Few-shot prompts (16 domen) | +14.6pp accuracy (Qwen-Coder) |
| DONE | Prefix-based domain detection | 9/9 accuracy |
| TODO | Few-shot dla sql/data (wciaz slabe) | est. +15-20pp |
| TODO | N-gram + TF-IDF scoring | est. +5% hit rate |
| TODO | Pre-warm cache z popularnymi zapytaniami | instant cold start |
| Plik | Opis |
|---|---|
src/nlp2cmd/generation/evolutionary_cache.py |
Silnik cache + similarity + template + LLM |
tests/unit/test_similarity_cache.py |
22 testy |
examples/benchmark_learning.py |
Learning benchmark |
examples/benchmark_nlp2cmd.py |
Standard benchmark |
src/nlp2cmd/generation/templates/ |
16 plikow, 1615 szablonow |
src/nlp2cmd/generation/pipeline.py |
RuleBasedPipeline (template tier) |