Status: IN PROGRESS | Last Updated: 2025-12-08
This document describes the LLM-DSL architecture for dynamic URL resolution and element finding. The system replaces hardcoded keywords with LLM-driven semantic analysis.
┌─────────────────────────────────────────────────────────────────────────┐
│                         LLM-DSL URL RESOLUTION                          │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│  User Instruction: "Znajdź formularz kontaktowy"                        │
│                    (Polish for "Find the contact form")                 │
│         │                                                               │
│         ▼                                                               │
│  ┌──────────────────────────────────────────────────────────────────┐   │
│  │ 1. GoalDetector (LLM-first)                                      │   │
│  │    ├── LLM semantic analysis → TaskGoal.FIND_CONTACT_FORM        │   │
│  │    └── Statistical fallback (NO hardcoded keywords)              │   │
│  └──────────────────────────────────────────────────────────────────┘   │
│         │                                                               │
│         ▼                                                               │
│  ┌──────────────────────────────────────────────────────────────────┐   │
│  │ 2. UrlResolver.find_url_for_goal()                               │   │
│  │    ├── _find_url_with_llm()    → LLM semantic                    │   │
│  │    ├── dom_helpers.find_link() → Statistical word-overlap        │   │
│  │    └── _legacy_fallback()      → Pattern matching (deprecated)   │   │
│  └──────────────────────────────────────────────────────────────────┘   │
│         │                                                               │
│         ▼                                                               │
│  ┌──────────────────────────────────────────────────────────────────┐   │
│  │ 3. LLMElementFinder / LLMSelectorGenerator                       │   │
│  │    ├── find_element(purpose="contact form")                      │   │
│  │    ├── generate_field_selector(purpose="email input")            │   │
│  │    └── generate_consent_selector()                               │   │
│  └──────────────────────────────────────────────────────────────────┘   │
│         │                                                               │
│         ▼                                                               │
│  Result: ResolvedUrl(url="/kontakt", method="llm", confidence=0.9)      │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
GoalDetector (goal_detector_llm/)
BEFORE (Hardcoded):
# ❌ Hardcoded keyword lists
if 'kontakt' in instruction or 'contact' in instruction:
    return TaskGoal.FIND_CONTACT_FORM
AFTER (LLM-DSL):
# ✅ LLM semantic analysis
result = await llm.aquery(f"""
Analyze this instruction and determine user's goal:
"{instruction}"
Goals: FIND_CART, FIND_LOGIN, FIND_CONTACT_FORM, FIND_RETURNS, ...
Return: goal
""")
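The LLM returns free-form text, so the reply still has to be mapped back onto the enum. The following is a minimal sketch of that step, not the project's implementation: `parse_goal`, `detect_goal`, `FakeLLM`, and the `OTHER` member are hypothetical names introduced for illustration, and the enum lists only the goals named in the prompt above.

```python
import asyncio
from enum import Enum


class TaskGoal(Enum):
    # Subset of goals named in the prompt; OTHER is a hypothetical catch-all.
    FIND_CART = "find_cart"
    FIND_LOGIN = "find_login"
    FIND_CONTACT_FORM = "find_contact_form"
    FIND_RETURNS = "find_returns"
    OTHER = "other"


def parse_goal(reply: str) -> TaskGoal:
    """Map a free-form LLM reply onto a TaskGoal, defaulting to OTHER."""
    cleaned = reply.strip().strip('"`.').upper()
    # Accept both "FIND_CART" and "TaskGoal.FIND_CART".
    name = cleaned.rsplit(".", 1)[-1]
    return TaskGoal.__members__.get(name, TaskGoal.OTHER)


class FakeLLM:
    """Stand-in for the real client; it only mimics an async `aquery` method."""

    async def aquery(self, prompt: str) -> str:
        return "FIND_CONTACT_FORM"


async def detect_goal(llm, instruction: str) -> TaskGoal:
    # Build the goal list from the enum so the prompt and the parser stay in sync.
    goal_names = ", ".join(goal.name for goal in TaskGoal)
    prompt = (
        "Analyze this instruction and determine user's goal:\n"
        f'"{instruction}"\n'
        f"Goals: {goal_names}\n"
        "Return: goal"
    )
    return parse_goal(await llm.aquery(prompt))


if __name__ == "__main__":
    print(asyncio.run(detect_goal(FakeLLM(), "Znajdź formularz kontaktowy")))
```

Validating the reply against the enum means a hallucinated or malformed goal name degrades to `OTHER` instead of raising deep inside the resolver.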
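UrlResolver (url_resolution/resolver.py)
Strategy Hierarchy: as shown in the diagram above, UrlResolver.find_url_for_goal() first tries LLM semantic matching (_find_url_with_llm()), then statistical word-overlap scoring (dom_helpers.find_link()), and finally the deprecated pattern matching (_legacy_fallback()).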
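The strategy hierarchy can be sketched as a simple fallback chain. This is an illustrative sketch under assumptions, not the actual `UrlResolver` code: `resolve_with_fallback`, the three stub strategies, and the `min_confidence` threshold are hypothetical, and `ResolvedUrl` is modeled only on the fields shown in the diagram.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional, Sequence


@dataclass
class ResolvedUrl:
    url: str
    method: str
    confidence: float


# A strategy takes a goal name and returns a ResolvedUrl, or None if it finds nothing.
Strategy = Callable[[str], Awaitable[Optional[ResolvedUrl]]]


async def resolve_with_fallback(
    goal: str,
    strategies: Sequence[Strategy],
    min_confidence: float = 0.5,  # assumed threshold, not taken from the source
) -> Optional[ResolvedUrl]:
    """Try each strategy in order and return the first confident result."""
    for strategy in strategies:
        try:
            result = await strategy(goal)
        except Exception:
            # A failing strategy (e.g. LLM unavailable) must not abort the chain.
            continue
        if result is not None and result.confidence >= min_confidence:
            return result
    return None


# Stub strategies that stand in for the three levels of the hierarchy.
async def llm_strategy(goal: str) -> Optional[ResolvedUrl]:
    raise ConnectionError("LLM unavailable")


async def statistical_strategy(goal: str) -> Optional[ResolvedUrl]:
    return ResolvedUrl(url="/kontakt", method="statistical", confidence=0.7)


async def legacy_strategy(goal: str) -> Optional[ResolvedUrl]:
    return ResolvedUrl(url="/contact", method="legacy", confidence=0.4)


if __name__ == "__main__":
    chain = [llm_strategy, statistical_strategy, legacy_strategy]
    print(asyncio.run(resolve_with_fallback("find_contact_form", chain)))
```

Here the LLM stub fails, so the chain falls through to the statistical stub. The `method` field of the result records which level produced the URL.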
LLMElementFinder (llm_dsl/element_finder.py)
# Find element by PURPOSE, not selector
finder = LLMElementFinder(page=page, llm=llm)
result = await finder.find_element(purpose="contact form submit button")
# Result:
# ElementMatch(
#     found=True,
#     selector="form.contact button[type='submit']",
#     confidence=0.9,
#     method="llm"
# )
LLMSelectorGenerator (llm_dsl/selector_generator.py)
# Generate selector dynamically
generator = LLMSelectorGenerator(llm=llm)
result = await generator.generate_field_selector(
    page=page,
    purpose="email input field"
)
# Result:
# GeneratedSelector(
#     selector="input[type='email'], input[name*='mail']",
#     confidence=0.85,
#     method="llm"
# )
All semantic concepts are defined in ONE location:
curllm_core/
├── url_types.py                  # TaskGoal enum (goals only)
├── llm_dsl/
│   ├── concepts.py               # FIELD_CONCEPTS (NEW - single source)
│   ├── selector_generator.py     # Uses concepts from above
│   └── element_finder.py         # Uses concepts from above
└── form_fill/
    └── js_scripts.py             # generate_field_concepts_with_llm()
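To illustrate the single-source idea, the sketch below shows two consumers building their prompts from one shared mapping. It is a sketch under assumptions: the contents of `FIELD_CONCEPTS` and the helpers `describe_concept`, `selector_prompt`, and `finder_prompt` are hypothetical, and the real `llm_dsl/concepts.py` may be structured differently.

```python
from types import MappingProxyType

# Hypothetical contents; the real FIELD_CONCEPTS in llm_dsl/concepts.py may differ.
# MappingProxyType makes the mapping read-only, so consumers cannot mutate it.
FIELD_CONCEPTS = MappingProxyType({
    "email": "input where the user types an e-mail address",
    "name": "input for the user's first and/or last name",
    "message": "free-text area for the body of the message",
})


def describe_concept(concept: str) -> str:
    """Look up a concept description, failing loudly on unknown names."""
    try:
        return FIELD_CONCEPTS[concept]
    except KeyError:
        raise ValueError(f"Unknown field concept: {concept!r}") from None


def selector_prompt(concept: str) -> str:
    # What a selector generator might send to the LLM.
    return f"Return a CSS selector for the {describe_concept(concept)}."


def finder_prompt(concept: str) -> str:
    # What an element finder might send to the LLM.
    return f"Which element on the page is the {describe_concept(concept)}?"


if __name__ == "__main__":
    print(selector_prompt("email"))
    print(finder_prompt("email"))
```

Because both prompt builders read the same mapping, changing a concept description in one place updates every consumer.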
| Component | Before | After |
|---|---|---|
| Goal detection | `if 'kontakt' in text` | `llm.analyze_intent(text)` |
| URL finding | `url_patterns = ['/kontakt']` | `llm.find_url_for_purpose(purpose)` |
| Selector | `'#email, .email-input'` | `generator.generate_field_selector('email')` |
| Form fields | `['email', 'mail', 'adres']` | `generate_field_concepts_with_llm(page)` |
When the LLM is unavailable, the resolver falls back to statistical analysis:
from typing import Optional

async def _find_link_statistical(page, goal: str) -> Optional[LinkInfo]:
    """
    Statistical word-overlap scoring.
    NO HARDCODED KEYWORDS - derives keywords from goal description.
    """
    # Derive keywords by splitting the goal name into lowercase words
    goal_words = set(goal.replace('_', ' ').lower().split())
    links = await page.evaluate("() => [...document.querySelectorAll('a')]...")
    for link in links:
        # Score = fraction of goal words found in the link text or URL path.
        # Assumes the elided JS maps each anchor to {text, href}; results
        # of page.evaluate arrive as plain dicts, hence the key access.
        link_words = set(link['text'].lower().split() + link['href'].lower().split('/'))
        score = len(goal_words & link_words) / max(len(goal_words), 1)
        ...
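The scoring step can be tried in isolation on plain data, without a browser. The sketch below is a self-contained version of the word-overlap idea under stated assumptions: `tokenize`, `score_link`, `best_link`, and the `min_score` threshold are hypothetical names and values, and each link is assumed to be a dict with `text` and `href` keys.

```python
import re
from typing import Dict, List, Optional, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase and split on anything that is not a letter or digit."""
    return set(re.findall(r"[^\W_]+", text.lower()))


def score_link(goal: str, link: Dict[str, str]) -> float:
    """Fraction of goal words that also appear in the link text or href."""
    goal_words = tokenize(goal)
    if not goal_words:
        return 0.0  # avoid division by zero for an empty goal
    link_words = tokenize(link.get("text", "")) | tokenize(link.get("href", ""))
    return len(goal_words & link_words) / len(goal_words)


def best_link(
    goal: str,
    links: List[Dict[str, str]],
    min_score: float = 0.3,  # assumed cut-off, not taken from the source
) -> Optional[Dict[str, str]]:
    """Return the highest-scoring link, or None if nothing clears min_score."""
    if not links:
        return None
    top = max(links, key=lambda link: score_link(goal, link))
    return top if score_link(goal, top) >= min_score else None


if __name__ == "__main__":
    links = [
        {"text": "Cart", "href": "/cart"},
        {"text": "Contact us", "href": "/en/contact-us"},
        {"text": "Contact form", "href": "/contact"},
    ]
    print(best_link("find_contact_form", links))
```

With the goal `find_contact_form`, the "Contact form" link matches two of three goal words and wins over "Contact us", which matches one. Exact word overlap does not bridge languages: a link labeled "Kontakt" shares no token with `contact`, which is one reason this path serves as a fallback rather than the primary strategy.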
| File | Issue | Solution |
|---|---|---|
| `goal_detector_hybrid.py` | `GOAL_KEYWORDS` dict | Use LLM analysis, remove dict |
| `resolver.py::_extract_keywords` | `FILTER_WORDS` set | Use LLM to extract search terms |
| `_find_link_keyword_fallback.py` | `goal_config` dict | Use statistical scoring |
# Run URL resolver examples
cd examples/url_resolver
python run_all.py
# Expected results after migration:
# - Goal detection: 90%+ (LLM semantic)
# - URL finding: 80%+ (LLM + statistical)
# - Form filling: 85%+ (LLM selectors)