Each task is broken down into specialized tools, organized as follows:
curllm_core/tools/
├── __init__.py
├── registry.py # Tool registry + autodiscovery
├── base.py # BaseTool abstract class
├── orchestrator.py # LLM-driven tool selection & execution
│
├── extraction/ # Data extraction tools
│ ├── __init__.py
│ ├── products_ceneo.py # Specialized for Ceneo.pl products
│ ├── products_amazon.py # Amazon product extraction
│ ├── products_generic.py # Generic e-commerce
│ ├── articles_hn.py # HackerNews articles
│ ├── articles_reddit.py # Reddit posts
│ ├── links_by_pattern.py # Links matching pattern
│ └── tables_to_json.py # HTML tables → JSON
│
├── forms/ # Form manipulation tools
│ ├── __init__.py
│ ├── price_filter.py # Set price min/max filters
│ ├── search_query.py # Fill search boxes
│ ├── date_picker.py # Date range selection
│ └── dropdown_select.py # Dropdown/select manipulation
│
├── navigation/ # Page navigation tools
│ ├── __init__.py
│ ├── scroll_load.py # Infinite scroll handler
│ ├── pagination.py # Multi-page navigation
│ ├── category_select.py # Category tree navigation
│ └── click_by_text.py # Click element by text content
│
└── validation/ # Data validation tools
  ├── __init__.py
  ├── price_range_check.py # Validate prices in range
  ├── url_format_check.py # Validate URL formats
  └── required_fields.py # Check required fields present
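The layout above names a `BaseTool` abstract class (`base.py`) and a tool registry (`registry.py`) without showing their interfaces. The following is a minimal sketch of how the two could fit together. The `run` signature, the `register` decorator, the `get_tool` helper, and the `fields`/`items` keys are assumptions made for illustration, not the project's actual API.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Type


class BaseTool(ABC):
    """Hypothetical base class; the real base.py may look different."""

    name: str = ""      # e.g. "required_fields"
    category: str = ""  # e.g. "validation"

    @abstractmethod
    def run(self, parameters: Dict[str, Any], context: Dict[str, Any]) -> Dict[str, Any]:
        """Execute the tool and return a JSON-serializable result."""


# Maps "<category>.<name>" to the tool class (assumed key format,
# chosen to match names such as "extraction.products_ceneo").
_REGISTRY: Dict[str, Type[BaseTool]] = {}


def register(cls: Type[BaseTool]) -> Type[BaseTool]:
    """Class decorator that adds a tool to the registry."""
    key = f"{cls.category}.{cls.name}"
    if key in _REGISTRY:
        raise ValueError(f"duplicate tool: {key}")
    _REGISTRY[key] = cls
    return cls


def get_tool(key: str) -> BaseTool:
    """Instantiate a registered tool by its dotted key."""
    return _REGISTRY[key]()


@register
class RequiredFields(BaseTool):
    """Toy stand-in for validation/required_fields.py."""

    name = "required_fields"
    category = "validation"

    def run(self, parameters, context):
        fields = parameters.get("fields", [])
        items = context.get("items", [])
        # Collect indexes of items that lack at least one required field.
        missing = [i for i, item in enumerate(items)
                   if any(f not in item for f in fields)]
        return {"valid": not missing, "missing_indexes": missing}
```

A decorator-based registry keeps each tool module self-registering on import, which is one simple way to get the autodiscovery mentioned for `registry.py`.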
Each tool has a {tool_name}.json file:
{
"name": "products_ceneo",
"version": "1.0.0",
"description": "Extract product listings from Ceneo.pl with price filtering",
"category": "extraction",
"triggers": [
"product.*ceneo",
"ceneo.*product",
"product.*polish.*site"
],
"parameters": {
"type": "object",
"properties": {
"max_price": {
"type": "number",
"description": "Maximum price threshold in PLN",
"default": 999999
},
"min_price": {
"type": "number",
"description": "Minimum price threshold in PLN",
"default": 0
},
"category": {
"type": "string",
"description": "Product category filter",
"optional": true
}
},
"required": ["max_price"]
},
"output_schema": {
"type": "object",
"properties": {
"products": {
"type": "array",
"items": {
"type": "object",
"properties": {
"name": {"type": "string"},
"price": {"type": "number"},
"url": {"type": "string", "format": "uri"}
}
}
}
}
},
"examples": [
{
"instruction": "Find products under 150zł on Ceneo",
"parameters": {"max_price": 150},
"expected_output": {"products": [{"name": "Example", "price": 149.99, "url": "..."}]}
}
],
"dependencies": ["playwright", "page"],
"timeout_ms": 30000
}
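To illustrate how the `triggers` and `parameters` fields of such a manifest might be used, here is a small sketch. It assumes that triggers are regular expressions matched case-insensitively against the user instruction and that defaults are filled in before the tool runs. Neither behavior is confirmed by the manifest itself, and the helper names `matches` and `resolve_parameters` are hypothetical.

```python
import json
import re
from typing import Any, Dict

# Trimmed copy of the manifest shown above.
MANIFEST_JSON = """
{
  "name": "products_ceneo",
  "category": "extraction",
  "triggers": ["product.*ceneo", "ceneo.*product", "product.*polish.*site"],
  "parameters": {
    "type": "object",
    "properties": {
      "max_price": {"type": "number", "default": 999999},
      "min_price": {"type": "number", "default": 0}
    },
    "required": ["max_price"]
  }
}
"""


def matches(manifest: Dict[str, Any], instruction: str) -> bool:
    """Return True if any trigger pattern occurs in the instruction.

    Case-insensitive matching is an assumption of this sketch.
    """
    return any(re.search(pattern, instruction, re.IGNORECASE)
               for pattern in manifest["triggers"])


def resolve_parameters(manifest: Dict[str, Any],
                       supplied: Dict[str, Any]) -> Dict[str, Any]:
    """Check required parameters, then merge supplied values over defaults."""
    spec = manifest["parameters"]
    missing = [key for key in spec.get("required", []) if key not in supplied]
    if missing:
        raise ValueError(f"missing required parameters: {missing}")
    resolved = {key: prop["default"]
                for key, prop in spec["properties"].items()
                if "default" in prop}
    resolved.update(supplied)
    return resolved


manifest = json.loads(MANIFEST_JSON)
```

With this sketch, `matches(manifest, "Find products under 150zł on Ceneo")` selects the tool, and `resolve_parameters(manifest, {"max_price": 150})` adds `min_price` from its default.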
The LLM generates an execution plan as JSON:
{
"plan": [
{
"tool": "navigation.scroll_load",
"parameters": {"times": 6, "wait_ms": 600},
"description": "Load more products by scrolling"
},
{
"tool": "forms.price_filter",
"parameters": {"max": 150, "submit": true},
"description": "Apply price filter ≤150 PLN"
},
{
"tool": "extraction.products_ceneo",
"parameters": {"max_price": 150},
"description": "Extract filtered products"
},
{
"tool": "validation.price_range_check",
"parameters": {"min": 0, "max": 150},
"description": "Validate all prices ≤150"
}
],
"expected_output": "products",
"fallback": "extraction.products_generic"
}
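Since the plan comes from an LLM, it is worth checking before anything is executed. The sketch below parses a plan of the shape shown above and rejects tool names that are not in the registry. The `Step` and `Plan` dataclasses and the `parse_plan` function are hypothetical names introduced for this example.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Set


@dataclass
class Step:
    tool: str
    parameters: Dict[str, Any] = field(default_factory=dict)
    description: str = ""


@dataclass
class Plan:
    steps: List[Step]
    expected_output: Optional[str] = None
    fallback: Optional[str] = None


def parse_plan(raw: str, known_tools: Set[str]) -> Plan:
    """Parse LLM output into a Plan, raising ValueError on unknown tools."""
    data = json.loads(raw)
    steps: List[Step] = []
    for index, item in enumerate(data.get("plan", [])):
        tool = item.get("tool")
        if tool not in known_tools:
            raise ValueError(f"step {index}: unknown tool {tool!r}")
        steps.append(Step(tool,
                          item.get("parameters", {}),
                          item.get("description", "")))
    if not steps:
        raise ValueError("plan has no steps")
    fallback = data.get("fallback")
    if fallback is not None and fallback not in known_tools:
        raise ValueError(f"unknown fallback tool {fallback!r}")
    return Plan(steps, data.get("expected_output"), fallback)
```

Failing early on an unknown tool name means a hallucinated step is caught before any browser action is taken.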
┌─────────────────────────────────────┐
│ Instruction from User │
└──────────────┬──────────────────────┘
│
v
┌─────────────────────────────────────┐
│ Tool Orchestrator (LLM) │
│ - Analyzes instruction │
│ - Queries tool registry │
│ - Generates execution plan (JSON) │
└──────────────┬──────────────────────┘
│
v
┌─────────────────────────────────────┐
│ Tool Executor │
│ - Validates parameters │
│ - Executes tools sequentially │
│ - Passes output as input to next │
│ - Handles errors & fallbacks │
└──────────────┬──────────────────────┘
│
v
┌─────────────────────────────────────┐
│ Result Validator │
│ - Checks output schema │
│ - Runs validation tools │
│ - Returns final result │
└─────────────────────────────────────┘
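The Tool Executor stage of the diagram can be sketched as a simple loop. This version assumes that each tool is a callable taking `(parameters, previous_output)` and that a failing step is replaced by the plan's fallback tool. Both the callable signature and the fallback semantics are assumptions of this sketch, as the diagram does not specify them.

```python
from typing import Any, Callable, Dict, List, Optional

# Assumed tool signature: (parameters, previous_output) -> output
Tool = Callable[[Dict[str, Any], Any], Any]


def execute_plan(steps: List[Dict[str, Any]],
                 tools: Dict[str, Tool],
                 fallback: Optional[str] = None) -> Any:
    """Run steps in order, feeding each output to the next step."""
    previous: Any = None
    for index, step in enumerate(steps):
        name = step["tool"]
        params = step.get("parameters", {})
        # Minimal parameter validation before execution.
        if name not in tools:
            raise KeyError(f"step {index}: unknown tool {name!r}")
        if not isinstance(params, dict):
            raise TypeError(f"step {index}: parameters must be an object")
        try:
            previous = tools[name](params, previous)
        except Exception:
            # Re-raise when no usable fallback is configured.
            if fallback is None or fallback not in tools:
                raise
            # Otherwise run the fallback in place of the failed step.
            previous = tools[fallback](params, previous)
    return previous
```

The return value of the last step is what the Result Validator stage would then check against the expected output schema.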
User instruction:
Find all products under 150zł on Ceneo special offers page
LLM generates plan:
{
"plan": [
{
"tool": "forms.price_filter",
"parameters": {"max": 150, "submit": true}
},
{
"tool": "navigation.scroll_load",
"parameters": {"times": 8}
},
{
"tool": "extraction.products_ceneo",
"parameters": {"max_price": 150}
}
]
}
Tools execute:
forms.price_filter → fills price form, clicks submit
navigation.scroll_load → scrolls 8 times to load products
extraction.products_ceneo → specialized Ceneo extraction
Output:
{
"products": [
{"name": "Odkurzacz XYZ", "price": 149.99, "url": "https://..."},
...
]
}
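A result like the one above could then be passed through a validation tool such as `validation.price_range_check`. The function below is a minimal sketch of that check. Its name mirrors the tool, but the signature and the shape of the return value are assumptions, and the sample data in the usage note is made up.

```python
from typing import Any, Dict, List


def price_range_check(products: List[Dict[str, Any]],
                      min_price: float = 0.0,
                      max_price: float = float("inf")) -> Dict[str, Any]:
    """Report products whose price is missing, non-numeric, or out of range."""
    violations = [
        product for product in products
        if not isinstance(product.get("price"), (int, float))
        or not (min_price <= product["price"] <= max_price)
    ]
    return {"valid": not violations, "violations": violations}
```

For example, `price_range_check([{"name": "A", "price": 149.99}, {"name": "B", "price": 199.0}], max_price=150)` reports the second product as a violation, while a list containing only the first product is reported as valid.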