curllm

LLM-DSL Migration Plan

Last Updated: 2025-12-08
Status: ✅ MIGRATION COMPLETE

Overview

This plan outlines the migration from hardcoded selectors and keyword lists to an LLM-DSL architecture, in which the LLM locates elements and detects field purposes at runtime.

Architecture Changes

Before (Hardcoded)

# Hardcoded selector
element = document.querySelector('input[name="email"]')

# Hardcoded keyword list
for field in ["name", "email", "phone"]:
    ...

After (LLM-DSL)

# LLM-driven element finding
element = await dsl.execute("find_element", {
    "purpose": "email_input",
    "context": page_context
})

# LLM-driven field detection
fields = await dsl.execute("analyze_form", {
    "form_context": form_html,
    "detect_purposes": True
})
for field in fields.data:
    ...
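To try the call shape above without a model or a browser, a stub executor can stand in for dsl. This is only a sketch: DSLResult, StubDSL, and the canned return values are hypothetical and are not taken from the curllm codebase. The only things carried over from the snippet above are the awaitable execute(action, params) call and a result object exposing .data.

```python
import asyncio
from dataclasses import dataclass
from typing import Any


@dataclass
class DSLResult:
    """Hypothetical result wrapper; the real curllm type may differ."""
    success: bool
    data: Any = None


class StubDSL:
    """Stand-in executor that returns canned answers instead of calling an LLM."""

    async def execute(self, action: str, params: dict) -> DSLResult:
        if action == "find_element":
            # A real executor would derive the selector from params["context"].
            return DSLResult(True, {"purpose": params["purpose"], "selector": "#email"})
        if action == "analyze_form":
            # Canned field list; a real executor would inspect params["form_context"].
            return DSLResult(True, [
                {"name": "user_email", "purpose": "email"},
                {"name": "tel", "purpose": "phone"},
            ])
        # Unknown actions are reported as failures rather than raising.
        return DSLResult(False)


async def main() -> list[str]:
    dsl = StubDSL()
    fields = await dsl.execute("analyze_form", {
        "form_context": "<form>...</form>",
        "detect_purposes": True,
    })
    return [field["purpose"] for field in fields.data]


purposes = asyncio.run(main())
print(purposes)
```

Swapping StubDSL for the real executor should leave the calling code unchanged, provided the real result type also exposes .data.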

Progress Tracking

| File | Status | Changes |
|---|---|---|
| curllm_core/dsl/executor.py | ✅ Refactored | _parse_instruction() → semantic concept detection |
| curllm_core/extraction/extractor.py | ✅ Refactored | _filter_only() uses concept groups |
| curllm_core/iterative_extractor.py | ⏳ No Changes | Already uses statistical/dynamic detection |
| curllm_core/form_fill.py | ✅ Refactored | field_concepts dict for semantic matching |
| curllm_core/form_fill/js_scripts.py | ✅ Refactored | LLM generates keywords via generate_field_concepts_with_llm() |
| curllm_core/hierarchical/planner.py | ✅ Refactored | field_concepts dict for canonical names |
| curllm_core/field_filling/filler.py | ✅ Refactored | consentConcepts for GDPR detection |
| curllm_core/dom/helpers.py | ✅ Refactored | LLM-first link finding strategy |
| curllm_core/llm_dsl/element_finder.py | ✅ Created | New LLM-driven element finder |
| curllm_core/llm_dsl/selector_generator.py | ✅ Created | LLM generates selectors dynamically |
| curllm_core/orchestrators/social.py | ✅ Refactored | _find_element_with_llm() |
| curllm_core/orchestrators/auth.py | ✅ Refactored | _find_auth_element() with LLM |
| curllm_core/streamware/.../smart_orchestrator.py | ✅ Refactored | Semantic concept groups |
| curllm_core/streamware/.../dom_fix.py | ✅ Refactored | phone_concepts, message_concepts |
| curllm_core/streamware/.../orchestrator.py | ✅ Refactored | Semantic concept groups |
| curllm_core/element_finder/finder.py | ⏳ No Changes | Already LLM-driven |
| curllm_core/orchestrators/form.py | ⏳ No Changes | No hardcoded keywords |
| curllm_core/proxy.py | ⏳ No Changes | Proxy management |
| curllm_core/streamware/patterns.py | ⏳ No Changes | Data patterns |
| curllm_core/vision_form_analysis.py | ✅ Refactored | Semantic concept groups |
| curllm_core/orchestrators/ecommerce.py | ✅ Refactored | LLM-first product click |
| curllm_core/form_fill/filler.py | ✅ Refactored | field_concepts dict |
| scripts/refactor_to_llm_dsl.py | ✅ Created | Migration analysis script |
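Several rows above mention a field_concepts dict used for semantic matching. The sketch below shows one way such a mapping could work, assuming a dict from canonical field name to a set of signal terms. The terms, the tokenizer, and the scoring rule are illustrative assumptions, not the contents of the actual curllm modules.

```python
import re
from typing import Optional

# Hypothetical concept groups: canonical field name -> terms that signal it.
FIELD_CONCEPTS = {
    "email": {"email", "mail"},
    "phone": {"phone", "tel", "telefon", "mobile"},
    "name": {"name", "fullname"},
}


def tokenize(*attributes: str) -> set[str]:
    """Split attribute values (name, id, label text) into lowercase tokens."""
    tokens: set[str] = set()
    for attr in attributes:
        tokens.update(t for t in re.split(r"[^a-z0-9]+", attr.lower()) if t)
    return tokens


def match_concept(*attributes: str) -> Optional[str]:
    """Return the concept sharing the most tokens with the attributes, or None."""
    tokens = tokenize(*attributes)
    best, best_score = None, 0
    for concept, terms in FIELD_CONCEPTS.items():
        score = len(tokens & terms)
        if score > best_score:
            best, best_score = concept, score
    return best


print(match_concept("contact-email", "Your e-mail"))
print(match_concept("tel_number"))
```

Keeping the terms in one dict per concept means a new language or synonym is a data change, and the same structure can be filled in by an LLM instead of by hand.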

Files to Migrate (Priority Order)

1. curllm_core/dsl/executor.py ✅ DONE

2. curllm_core/extraction/extractor.py ✅ DONE

3. curllm_core/iterative_extractor.py ⏳ NO CHANGES NEEDED

4. curllm_core/form_fill.py ✅ DONE

5. curllm_core/form_fill/js_scripts.py ✅ DONE

6. curllm_core/hierarchical/planner.py ✅ DONE

7. curllm_core/field_filling/filler.py ✅ DONE

8. curllm_core/dom/helpers.py ✅ DONE

9. curllm_core/orchestrators/social.py ✅ DONE

10. curllm_core/orchestrators/auth.py ✅ DONE

11. curllm_core/streamware/components/form/smart_orchestrator.py ✅ DONE

12. curllm_core/streamware/components/dom_fix.py ✅ DONE

13. curllm_core/element_finder/finder.py ⏳ NO CHANGES NEEDED

14. curllm_core/streamware/components/form/orchestrator.py ✅ DONE

15. curllm_core/orchestrators/form.py ⏳ NO CHANGES NEEDED

16. curllm_core/proxy.py ⏳ NO CHANGES NEEDED

17. curllm_core/streamware/patterns.py ⏳ NO CHANGES NEEDED

18. curllm_core/vision_form_analysis.py ✅ DONE

19. curllm_core/orchestrators/ecommerce.py ✅ DONE

20. curllm_core/form_fill/filler.py ✅ DONE

21. curllm_core/orchestrators/social.py ⏳ CAPTCHA PATTERNS

22. curllm_core/orchestrators/auth.py ⏳ CAPTCHA PATTERNS

23. curllm_core/streamware/components/captcha/ ⏳ CAPTCHA PATTERNS


Migration Complete

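All high-priority files have been refactored. The entries still open are those marked CAPTCHA PATTERNS in the list above. The work is grouped into the following phases: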

  1. Phase 1: Core Modules (atoms.py, executor.py)
    • Ensure all atomic functions are LLM-queryable
    • Add fallback strategies for each function
  2. Phase 2: Form Handling (form_fill.py, field_filler.py)
    • Replace hardcoded field keywords with LLM detection
    • Use semantic analysis for form understanding
  3. Phase 3: URL Resolution (url_resolver.py, dom_helpers.py)
    • Replace hardcoded URL patterns with LLM analysis
    • Use page structure analysis for navigation
  4. Phase 4: Orchestrators (orchestrator.py, steps.py)
    • Integrate LLM-DSL for all element interactions
    • Add context-aware fallbacks
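The phases above call for fallback strategies around LLM-driven lookups. A minimal sketch of an LLM-first chain follows, assuming each strategy is an async callable that returns a selector string or None. The names find_with_fallbacks, llm_finder, and heuristic_finder are hypothetical, and the LLM call is simulated by a failure so that the fallback path is exercised.

```python
import asyncio
from typing import Awaitable, Callable, Optional

# A strategy takes a purpose string and resolves to a selector or None.
Finder = Callable[[str], Awaitable[Optional[str]]]


async def find_with_fallbacks(
    purpose: str,
    strategies: list[tuple[str, Finder]],
) -> tuple[Optional[str], Optional[str]]:
    """Try each strategy in order; return (selector, strategy_name) for the first hit."""
    for name, finder in strategies:
        try:
            selector = await finder(purpose)
        except Exception:
            # A failing strategy is treated like a miss so the chain continues.
            continue
        if selector:
            return selector, name
    return None, None


async def llm_finder(purpose: str) -> Optional[str]:
    # Stand-in for an LLM call; it always fails here to demonstrate the fallback.
    raise TimeoutError("LLM unavailable")


async def heuristic_finder(purpose: str) -> Optional[str]:
    # Illustrative last-resort lookup, not a recommendation to keep hardcoded selectors.
    known = {"email_input": 'input[type="email"]'}
    return known.get(purpose)


STRATEGIES = [("llm", llm_finder), ("heuristic", heuristic_finder)]
result = asyncio.run(find_with_fallbacks("email_input", STRATEGIES))
print(result)
```

Returning the strategy name alongside the selector makes it possible to log how often the LLM path succeeds versus how often a fallback was needed.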

Testing

After each phase:

  1. Run make test to verify no regressions
  2. Run example scripts to verify functionality
  3. Compare success rates before/after
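For step 3, the comparison can be as simple as the share of successful runs before and after a phase. The sketch below assumes each run is recorded as a boolean. The helper names are hypothetical and the sample outcomes are placeholders, not measured results.

```python
def success_rate(outcomes: list[bool]) -> float:
    """Fraction of runs that succeeded."""
    if not outcomes:
        raise ValueError("no outcomes recorded")
    return sum(outcomes) / len(outcomes)


def compare_runs(before: list[bool], after: list[bool]) -> dict[str, float]:
    """Success rates before and after a change, plus the difference."""
    rate_before = success_rate(before)
    rate_after = success_rate(after)
    return {
        "before": rate_before,
        "after": rate_after,
        "delta": rate_after - rate_before,
    }


# Placeholder outcomes for illustration only.
report = compare_runs(
    before=[True, False, True, False],
    after=[True, True, True, False],
)
print(report)
```

A negative delta after a phase is a signal to inspect that phase's changes before moving on to the next one.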