Last Updated: 2025-12-08

Status: ✅ MIGRATION COMPLETE
- `curllm_core.cli_orchestrator` working
- `human_delay`, `human_type`, `human_scroll` helpers

This plan outlines the migration from hardcoded selectors and keywords to an LLM-DSL architecture.

Before (hardcoded):

    # Hardcoded selector
    element = document.querySelector('input[name="email"]')

    # Hardcoded keyword list
    for field in ["name", "email", "phone"]:
        ...

After (LLM-DSL):

    # LLM-driven element finding
    element = await dsl.execute("find_element", {
        "purpose": "email_input",
        "context": page_context,
    })

    # LLM-driven field detection
    fields = await dsl.execute("analyze_form", {
        "form_context": form_html,
        "detect_purposes": True,
    })
    for field in fields.data:
        ...

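
The call pattern above (an awaited `dsl.execute(command, params)` returning an object with a `.data` attribute) can be tried offline with a small stand-in. This is a minimal sketch under stated assumptions: `StubDSL`, `DSLResult`, `register`, and `fake_analyze_form` are hypothetical names invented for illustration, and the canned handler replaces the real LLM call. It does not reflect how the actual executor is implemented.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Awaitable, Callable, Dict, List


@dataclass
class DSLResult:
    """Hypothetical result wrapper; the plan only shows that results expose `.data`."""
    success: bool
    data: Any


class StubDSL:
    """Hypothetical stand-in for the executor: maps command names to async handlers."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Callable[[Dict[str, Any]], Awaitable[Any]]] = {}

    def register(self, command: str, handler: Callable[[Dict[str, Any]], Awaitable[Any]]) -> None:
        self._handlers[command] = handler

    async def execute(self, command: str, params: Dict[str, Any]) -> DSLResult:
        handler = self._handlers.get(command)
        if handler is None:
            # Unknown command: report failure instead of raising.
            return DSLResult(success=False, data=None)
        return DSLResult(success=True, data=await handler(params))


async def fake_analyze_form(params: Dict[str, Any]) -> List[Dict[str, str]]:
    # A real handler would send params["form_context"] to an LLM.
    # This one returns a canned answer so the example runs without a model.
    return [
        {"selector": "#mail", "purpose": "email"},
        {"selector": "#tel", "purpose": "phone"},
    ]


async def main() -> List[str]:
    dsl = StubDSL()
    dsl.register("analyze_form", fake_analyze_form)
    fields = await dsl.execute("analyze_form", {
        "form_context": "<form>...</form>",
        "detect_purposes": True,
    })
    return [field["purpose"] for field in fields.data]


purposes = asyncio.run(main())
print(purposes)
```

Keeping the command name and parameters as plain data is what lets the caller stay free of selectors: the handler behind `"analyze_form"` can change without touching call sites.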
| File | Status | Changes |
|---|---|---|
| `curllm_core/dsl/executor.py` | ✅ Refactored | `_parse_instruction()` → semantic concept detection |
| `curllm_core/extraction/extractor.py` | ✅ Refactored | `_filter_only()` uses concept groups |
| `curllm_core/iterative_extractor.py` | ⏳ No Changes | Already uses statistical/dynamic detection |
| `curllm_core/form_fill.py` | ✅ Refactored | `field_concepts` dict for semantic matching |
| `curllm_core/form_fill/js_scripts.py` | ✅ Refactored | LLM generates keywords via `generate_field_concepts_with_llm()` |
| `curllm_core/hierarchical/planner.py` | ✅ Refactored | `field_concepts` dict for canonical names |
| `curllm_core/field_filling/filler.py` | ✅ Refactored | `consentConcepts` for GDPR detection |
| `curllm_core/dom/helpers.py` | ✅ Refactored | LLM-first link finding strategy |
| `curllm_core/llm_dsl/element_finder.py` | ✅ Created | New LLM-driven element finder |
| `curllm_core/llm_dsl/selector_generator.py` | ✅ Created | LLM generates selectors dynamically |
| `curllm_core/orchestrators/social.py` | ✅ Refactored | `_find_element_with_llm()` |
| `curllm_core/orchestrators/auth.py` | ✅ Refactored | `_find_auth_element()` with LLM |
| `curllm_core/streamware/.../smart_orchestrator.py` | ✅ Refactored | Semantic concept groups |
| `curllm_core/streamware/.../dom_fix.py` | ✅ Refactored | `phone_concepts`, `message_concepts` |
| `curllm_core/streamware/.../orchestrator.py` | ✅ Refactored | Semantic concept groups |
| `curllm_core/element_finder/finder.py` | ⏳ No Changes | Already LLM-driven |
| `curllm_core/orchestrators/form.py` | ⏳ No Changes | No hardcoded keywords |
| `curllm_core/proxy.py` | ⏳ No Changes | Proxy management |
| `curllm_core/streamware/patterns.py` | ⏳ No Changes | Data patterns |
| `curllm_core/vision_form_analysis.py` | ✅ Refactored | Semantic concept groups |
| `curllm_core/orchestrators/ecommerce.py` | ✅ Refactored | LLM-first product click |
| `curllm_core/form_fill/filler.py` | ✅ Refactored | `field_concepts` dict |
| `scripts/refactor_to_llm_dsl.py` | ✅ Created | Migration analysis script |
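
Several rows in the table refer to a `field_concepts` dict and "semantic concept groups". The plan does not show their contents, so the following is only a sketch of the general idea: each canonical field name owns a set of related keywords, and a raw label is mapped to the first group it shares a token with. The keyword sets and the `canonical_field` helper are assumptions made for illustration, not the project's actual data or API.

```python
import re
from typing import Dict, Optional, Set

# Hypothetical concept groups; the real field_concepts contents are not shown in this plan.
FIELD_CONCEPTS: Dict[str, Set[str]] = {
    "email": {"email", "mail"},
    "phone": {"phone", "tel", "telephone", "mobile"},
    "name": {"name", "fullname"},
    "message": {"message", "comment", "inquiry"},
}


def canonical_field(raw_label: str,
                    concepts: Dict[str, Set[str]] = FIELD_CONCEPTS) -> Optional[str]:
    """Return the canonical name of the first concept group sharing a token with raw_label."""
    # Split on anything that is not a letter or digit, so "user_email" and
    # "E-Mail" both yield a token that can match a keyword.
    tokens = set(re.findall(r"[a-z0-9]+", raw_label.lower()))
    for canonical, keywords in concepts.items():
        if tokens & keywords:
            return canonical
    return None


print(canonical_field("Your E-Mail address"))
print(canonical_field("user_phone"))
print(canonical_field("hotel"))
```

Matching whole tokens rather than substrings avoids false positives such as "tel" inside "hotel", at the cost of missing camelCase labels like `userEmail` unless they are split first.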
`curllm_core/dsl/executor.py` ✅ DONE

- `_get_default_fields()` instead of hardcoded list
- `_detect_fields_semantic()` for concept-based detection
- `_detect_filter_semantic()` for filter expression parsing

`curllm_core/extraction/extractor.py` ✅ DONE

- `_filter_only()` now uses semantic concept groups
- `direct_fastpath()` now uses semantic concept groups
- `'a'` selector kept (semantic, finds anchors)
- `'a[href]'` selector kept (semantic, finds links)

`curllm_core/iterative_extractor.py` ⏳ NO CHANGES NEEDED

- `*` selector is used for statistical DOM analysis (not hardcoded targeting)
- `a[href]`, `img` are semantic element type selectors (correct for purpose)

`curllm_core/form_fill.py` ✅ DONE

- `field_concepts` dict for semantic matching

`curllm_core/form_fill/js_scripts.py` ✅ DONE

- `generate_field_concepts_with_llm()`: LLM generates keywords
- `FIND_FORM_FIELDS_PARAMETRIZED_JS`: accepts concepts as parameter
- `find_form_fields_with_llm()`: main entry point for LLM-driven form detection
- `FIND_FORM_FIELDS_JS` preserved for backward compatibility

`curllm_core/hierarchical/planner.py` ✅ DONE

- `field_concepts` dict for semantic field detection
- `field_concepts` dict for canonical name mapping

`curllm_core/field_filling/filler.py` ✅ DONE

- `LLMSelectorGenerator`

`curllm_core/dom/helpers.py` ✅ DONE

- `find_link_for_goal()` now uses LLM-first strategy
- `_find_link_with_llm()` for LLM-based link finding
- `_find_link_statistical()` for word-overlap scoring
- `_find_link_keyword_fallback()` preserved as legacy fallback
- `a[href]`, `input`, `form` selectors kept (semantic for purpose)

`curllm_core/orchestrators/social.py` ✅ DONE

- `PLATFORM_CONFIG` → `PLATFORM_HINTS` (purposes, not selectors)
- `_find_element_with_llm()` for dynamic element finding

`curllm_core/orchestrators/auth.py` ✅ DONE

- `ELEMENT_PURPOSES` for semantic descriptions
- `PLATFORM_SELECTORS` → `SELECTOR_HINTS` (fallback only)
- `_find_auth_element()` with LLM-first approach

`curllm_core/streamware/components/form/smart_orchestrator.py` ✅ DONE

- `_are_semantically_related()` uses semantic concept groups
- `form`, `input`, `textarea`, `select` kept (semantic element types)

`curllm_core/streamware/components/dom_fix.py` ✅ DONE

- `_fields_match()` uses `phone_concepts` and `message_concepts` sets

`curllm_core/element_finder/finder.py` ⏳ NO CHANGES NEEDED

`curllm_core/streamware/components/form/orchestrator.py` ✅ DONE

`curllm_core/orchestrators/form.py` ⏳ NO CHANGES NEEDED

`curllm_core/proxy.py` ⏳ NO CHANGES NEEDED

`curllm_core/streamware/patterns.py` ⏳ NO CHANGES NEEDED

`curllm_core/vision_form_analysis.py` ✅ DONE

`curllm_core/orchestrators/ecommerce.py` ✅ DONE

- `_click_first_product()` now uses LLM-first approach

`curllm_core/orchestrators/social.py` ⏳ CAPTCHA PATTERNS

- CAPTCHA selectors (`.g-recaptcha`, `.h-captcha`) are acceptable

`curllm_core/form_fill/filler.py` ✅ DONE

- `field_concepts` dict for semantic matching

`curllm_core/orchestrators/auth.py` ⏳ CAPTCHA PATTERNS

- CAPTCHA selectors (`.slider-captcha`) are acceptable service patterns

`curllm_core/streamware/components/captcha/` ⏳ CAPTCHA PATTERNS

All high-priority files have been refactored. Remaining items are:

- `input`, `form`, `a[href]` (semantic, kept)

After each phase:

- Run `make test` to verify no regressions
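
The notes for `curllm_core/dom/helpers.py` describe an LLM-first link finder with a word-overlap scorer behind it. The sketch below illustrates that ordering only. The function names echo the plan, but the bodies, the `llm_finder` callback, and the `{"text", "href"}` link shape are assumptions made for illustration and are not the project's implementation.

```python
import asyncio
import re
from typing import Awaitable, Callable, Dict, List, Optional, Set

Link = Dict[str, str]  # assumed shape: {"text": ..., "href": ...}
LLMFinder = Callable[[str, List[Link]], Awaitable[Optional[Link]]]


def _tokens(text: str) -> Set[str]:
    """Lowercased alphanumeric words in text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def find_link_statistical(goal: str, links: List[Link]) -> Optional[Link]:
    """Pick the link whose text shares the most words with the goal, or None if no overlap."""
    goal_words = _tokens(goal)
    best, best_score = None, 0
    for link in links:
        score = len(goal_words & _tokens(link["text"]))
        if score > best_score:
            best, best_score = link, score
    return best


async def find_link_for_goal(goal: str, links: List[Link],
                             llm_finder: Optional[LLMFinder] = None) -> Optional[Link]:
    """Try the LLM first; fall back to word overlap if it fails or returns nothing."""
    if llm_finder is not None:
        try:
            choice = await llm_finder(goal, links)
            if choice is not None:
                return choice
        except Exception:
            pass  # LLM unavailable or unusable output: fall through to the scorer
    return find_link_statistical(goal, links)


links = [
    {"text": "About us", "href": "/about"},
    {"text": "Contact form", "href": "/contact"},
]


async def broken_llm(goal: str, candidates: List[Link]) -> Optional[Link]:
    # Simulates an unreachable model so the fallback path is exercised.
    raise RuntimeError("LLM backend unreachable")


result = asyncio.run(find_link_for_goal("open the contact form", links, broken_llm))
print(result)
```

Putting the deterministic scorer last means navigation still works when the model is down, which is also why a regression run with `make test` can cover this path without network access.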