curllm

curllm = curl + LLM

Intelligent Browser Automation with Local LLMs


🎯 What is curllm?

curllm is a CLI tool that combines browser automation with local LLMs (Ollama-served models such as Qwen, Llama, or Mistral) to extract data, fill forms, and automate web workflows. Everything runs locally on your machine, so your data stays private.

🆕 v2 LLM-DSL Architecture! Dynamic element detection, semantic goal understanding, no hardcoded selectors. 388 tests passing.

# Extract products with prices from any e-commerce site
# (single quotes stop the shell from expanding $1 inside $100)
curllm "https://shop.example.com" -d 'Find all products under $100'

# Fill contact forms automatically
curllm --stealth "https://example.com/contact" -d "Fill form: name=John, email=john@example.com"

# Extract all emails from a page
curllm "https://example.com" -d "extract all email addresses"

✨ Features

| Feature | Description |
|---|---|
| 🧠 Local LLM | Works with 8GB GPUs (Qwen 2.5, Llama 3, Mistral) |
| 🎯 Smart Extraction | LLM-guided DOM analysis - no hardcoded selectors |
| 📝 Form Automation | Auto-fill forms with intelligent field mapping |
| 🥷 Stealth Mode | Bypass anti-bot detection |
| 👁️ Visual Mode | See browser actions in real-time |
| 🔍 BQL Support | Browser Query Language for structured queries |
| 📊 Export Formats | JSON, CSV, HTML, XLS output |
| 🔒 Privacy-First | Everything runs locally - no cloud APIs needed |

🧠 LLM-DSL Architecture

curllm v2 uses LLM-DSL (LLM Domain Specific Language) - a dynamic approach that eliminates hardcoded selectors:

┌─────────────────────────────────────────────────────────────┐
│                     LLM-DSL Flow                            │
├─────────────────────────────────────────────────────────────┤
│  1. Goal Detection (semantic)                               │
│     "Find RAM DDR5" → FIND_PRODUCTS                         │
│                                                             │
│  2. Strategy Selection                                      │
│     FIND_PRODUCTS → use search flow                         │
│     FIND_CART → find link by semantic scoring               │
│                                                             │
│  3. Element Finding (LLM-first)                             │
│     LLM analysis → Statistical scoring → Fallback           │
│                                                             │
│  4. Dynamic Selector Generation                             │
│     Analyze DOM → Score elements → Generate selector        │
└─────────────────────────────────────────────────────────────┘
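
The first three steps of the flow can be sketched in a few lines of Python. This is an illustration only, not curllm's implementation: the keyword sets stand in for the LLM's semantic analysis, and `GOAL_HINTS`, `detect_goal`, `select_strategy`, and `find_element` are hypothetical names. Only the goal labels and strategy descriptions come from the diagram.

```python
from typing import Callable, Optional, Sequence

# Hypothetical keyword hints standing in for the LLM's semantic analysis.
GOAL_HINTS = {
    "FIND_PRODUCTS": {"find", "product", "products", "price", "ram"},
    "FIND_CART": {"cart", "basket", "koszyk"},
}

# Strategy descriptions copied from the flow diagram.
STRATEGIES = {
    "FIND_PRODUCTS": "use search flow",
    "FIND_CART": "find link by semantic scoring",
}

# A finder takes a goal label and returns a selector, or None if it has no answer.
Finder = Callable[[str], Optional[str]]


def detect_goal(instruction: str) -> str:
    """Step 1: map a free-text instruction to a goal label."""
    words = set(instruction.lower().split())
    scores = {goal: len(words & hints) for goal, hints in GOAL_HINTS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "UNKNOWN"


def select_strategy(goal: str) -> Optional[str]:
    """Step 2: pick the strategy registered for the detected goal."""
    return STRATEGIES.get(goal)


def find_element(goal: str, finders: Sequence[Finder]) -> Optional[str]:
    """Step 3: try finders in order (LLM, statistical, fallback) and
    return the first selector any of them produces."""
    for finder in finders:
        selector = finder(goal)
        if selector is not None:
            return selector
    return None
```

Calling `detect_goal("Find RAM DDR5")` yields `FIND_PRODUCTS`, and passing an ordered list of finders to `find_element` mirrors the "LLM analysis → Statistical scoring → Fallback" chain: a later finder is consulted only when the earlier ones return `None`.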

Key Benefits

| Feature | Traditional | LLM-DSL |
|---|---|---|
| Selectors | Hardcoded CSS/XPath | Dynamic generation |
| Keywords | Static lists | Semantic analysis |
| Language | English only | Multi-language (PL, EN) |
| Maintenance | Manual updates | Self-adapting |

🚀 Quick Start

Installation

pip install -U curllm
curllm-setup      # One-time setup (installs Playwright browsers)
curllm-doctor     # Verify installation

Requirements

# Install Ollama (if not installed)
curl -fsSL https://ollama.ai/install.sh | sh
ollama pull qwen2.5:7b

📖 Examples

Extract Data

# Extract all links
curllm "https://example.com" -d "extract all links"

# Extract emails
curllm "https://example.com/contact" -d "extract all email addresses"
# Output: {"emails": ["info@example.com", "sales@example.com"]}

# Extract products with price filter
curllm --stealth "https://shop.example.com" -d "Find all products under 500zł"

Form Automation

# Fill contact form
curllm --visual --stealth "https://example.com/contact" \
  -d "Fill form: name=John Doe, email=john@example.com, message=Hello"

# Login automation
curllm --visual "https://app.example.com/login" \
  -d '{"instruction":"Login", "credentials":{"user":"admin", "pass":"secret"}}'

Export Results

# Export to CSV
curllm "https://example.com" -d "extract all products" --csv -o products.csv

# Export to HTML
curllm "https://example.com" -d "extract all links" --html -o links.html

# Export to Excel
curllm "https://example.com" -d "extract all data" --xls -o data.xlsx

Screenshots

# Take screenshot
curllm "https://example.com" -d "screenshot"

# Visual mode (watch browser)
curllm --visual "https://example.com" -d "extract all links"

BQL Queries

curllm --bql -d 'query {
  page(url: "https://news.ycombinator.com") {
    title
    links: select(css: "a.titlelink") { text url: attr(name: "href") }
  }
}'

🌐 Web Interface

curllm-web start   # Start web UI at http://localhost:5000
curllm-web status  # Check status
curllm-web stop    # Stop server

🔧 Configuration

Environment variables (.env):

CURLLM_MODEL=qwen2.5:7b          # LLM model
CURLLM_OLLAMA_HOST=http://localhost:11434
CURLLM_HEADLESS=true             # Run browser headlessly
CURLLM_STEALTH_MODE=false        # Anti-detection
CURLLM_LOCALE=en-US              # Browser locale
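
If you need the same settings from your own Python script, a small loader is enough. The sketch below is not curllm's configuration code: `Settings` and `load_settings` are hypothetical names, and using the example values above as fallbacks is an assumption rather than a documented default. It reads process environment variables only; loading the `.env` file itself would need an extra step.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    model: str
    ollama_host: str
    headless: bool
    stealth_mode: bool
    locale: str


def _as_bool(value: str) -> bool:
    # Accept the common spellings of "true"; everything else counts as False.
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read the CURLLM_* variables, falling back to the example values
    shown above (assumed fallbacks, not confirmed defaults)."""
    env = os.environ if env is None else env
    return Settings(
        model=env.get("CURLLM_MODEL", "qwen2.5:7b"),
        ollama_host=env.get("CURLLM_OLLAMA_HOST", "http://localhost:11434"),
        headless=_as_bool(env.get("CURLLM_HEADLESS", "true")),
        stealth_mode=_as_bool(env.get("CURLLM_STEALTH_MODE", "false")),
        locale=env.get("CURLLM_LOCALE", "en-US"),
    )
```

Passing a plain dict instead of `os.environ` keeps the loader easy to test, for example `load_settings({"CURLLM_HEADLESS": "false"})`.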

🏗️ Architecture

┌─────────────────────────────────────────────────────────────────┐
│                         curllm CLI                              │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌────────────────┐    ┌────────────────┐    ┌───────────────┐  │
│  │  DSL Executor  │───▶│ Knowledge Base │───▶│ Strategy YAML │  │
│  │  (Orchestrator)│    │   (SQLite)     │    │    Files      │  │
│  └────────────────┘    └────────────────┘    └───────────────┘  │
│          │                                                      │
│          ▼                                                      │
│  ┌────────────────────────────────────────────────────────────┐ │
│  │                    DOM Toolkit (Pure JS)                   │ │
│  │  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌────────────┐  │ │
│  │  │Structure │  │ Patterns │  │Selectors │  │   Prices   │  │ │
│  │  │ Analyzer │  │ Detector │  │Generator │  │  Detector  │  │ │
│  │  └──────────┘  └──────────┘  └──────────┘  └────────────┘  │ │
│  └────────────────────────────────────────────────────────────┘ │
│          │                                                      │
│          ▼                                                      │
│  ┌────────────────────────────────────────────────────────────┐ │
│  │              Playwright Browser Engine                     │ │
│  │         (Chromium with Stealth & Anti-Detection)           │ │
│  └────────────────────────────────────────────────────────────┘ │
│          │                                                      │
│          ▼                                                      │
│  ┌────────────────────────────────────────────────────────────┐ │
│  │                 Ollama / LiteLLM                           │ │
│  │      (Local LLM: Qwen 2.5, Llama 3, Mistral, GPT, etc)     │ │
│  └────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘

Key Components

| Component | Description | LLM Calls |
|---|---|---|
| URL Resolver | Smart navigation with goal detection | 0-1 |
| Goal Detector | Semantic intent understanding | 0-1 |
| Element Finder | Dynamic selector generation | 0-1 |
| DOM Toolkit | Pure JavaScript atomic queries | 0 |
| SPA Hydration | Wait for CSR/SPA content | 0 |

📖 Full Architecture Documentation →

🧬 DSL System (Strategy-Based Extraction)

Note: The YAML DSL system works alongside the newer LLM-DSL. YAML strategies are used for known sites with proven extraction patterns, while LLM-DSL handles unknown sites dynamically.

curllm automatically learns and saves successful extraction strategies as YAML files:

# dsl/ceneo_products.yaml - Auto-generated from successful extraction
url_pattern: "*.ceneo.pl/*"
task: extract_products
algorithm: statistical_containers

selector: div.product-card
fields:
  name: h3.title
  price: span.price
  url: a[href]

metadata:
  success_rate: 0.95
  use_count: 42
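
To see how a `url_pattern` such as `*.ceneo.pl/*` might select a strategy, here is a minimal sketch using glob matching from the Python standard library. It assumes the pattern is matched against host plus path with the scheme stripped, which the YAML does not state. `matches` and `find_strategy` are hypothetical helpers, and the strategy is written as a dict so no YAML parser is needed.

```python
from fnmatch import fnmatchcase
from typing import Optional, Sequence
from urllib.parse import urlsplit

# The strategy from the YAML above, expressed as a plain dict.
CENEO = {
    "url_pattern": "*.ceneo.pl/*",
    "task": "extract_products",
    "algorithm": "statistical_containers",
    "selector": "div.product-card",
    "fields": {"name": "h3.title", "price": "span.price", "url": "a[href]"},
    "metadata": {"success_rate": 0.95, "use_count": 42},
}


def matches(url: str, pattern: str) -> bool:
    """Glob-match host + path against the pattern (assumed semantics)."""
    parts = urlsplit(url)
    target = parts.netloc + (parts.path or "/")
    return fnmatchcase(target, pattern)


def find_strategy(url: str, strategies: Sequence[dict]) -> Optional[dict]:
    """Return the matching strategy with the highest success_rate, if any."""
    candidates = [s for s in strategies if matches(url, s["url_pattern"])]
    if not candidates:
        return None
    return max(candidates, key=lambda s: s["metadata"]["success_rate"])
```

Under these assumptions `https://www.ceneo.pl/Laptopy` matches, while the bare domain `https://ceneo.pl/` does not, because the pattern requires a dot before `ceneo.pl`.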

How It Works

  1. First visit - LLM-DSL dynamically analyzes page, extracts data
  2. Successful - Strategy saved to dsl/*.yaml, recorded in Knowledge Base
  3. Next visit - Knowledge Base loads saved strategy (fast path)
  4. Unknown site - Falls back to LLM-DSL dynamic discovery

┌─────────────────────────────────────────────────────────┐
│                   Request Flow                          │
├─────────────────────────────────────────────────────────┤
│  URL → Knowledge Base lookup                            │
│        │                                                │
│        ├─ Found? → Load YAML strategy (fast)            │
│        │                                                │
│        └─ Not found? → LLM-DSL dynamic (flexible)       │
│                        │                                │
│                        └─ Success? → Save to YAML       │
└─────────────────────────────────────────────────────────┘
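
The request flow is a cache-or-discover pattern, which can be sketched with the standard-library `sqlite3` module. The table layout, the function names, and the `discover` callback are all assumptions made for illustration; curllm's actual Knowledge Base schema is not shown in this README.

```python
import sqlite3
from typing import Callable, Optional


def open_kb(path: str = ":memory:") -> sqlite3.Connection:
    """Open a knowledge base with a hypothetical one-table schema."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS strategies ("
        "host TEXT PRIMARY KEY, yaml_path TEXT NOT NULL)"
    )
    return conn


def lookup(conn: sqlite3.Connection, host: str) -> Optional[str]:
    """Return the saved strategy file for a host, or None if unknown."""
    row = conn.execute(
        "SELECT yaml_path FROM strategies WHERE host = ?", (host,)
    ).fetchone()
    return row[0] if row else None


def handle(
    conn: sqlite3.Connection,
    host: str,
    discover: Callable[[str], Optional[str]],
) -> str:
    """Fast path if a strategy is known; otherwise discover and save it."""
    saved = lookup(conn, host)
    if saved is not None:
        return f"fast path: {saved}"
    yaml_path = discover(host)  # stands in for LLM-DSL dynamic discovery
    if yaml_path is None:
        return "dynamic discovery failed"
    conn.execute(
        "INSERT OR REPLACE INTO strategies VALUES (?, ?)", (host, yaml_path)
    )
    conn.commit()
    return f"dynamic path: saved {yaml_path}"
```

The first call for a host takes the dynamic path and records the result; a second call for the same host returns the saved strategy without invoking `discover` at all.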

Algorithms

| Algorithm | Best For | Speed |
|---|---|---|
| statistical_containers | Product grids | ⚡ Fast |
| pattern_detection | Lists, tables | ⚡ Fast |
| llm_guided | Complex layouts | 🐢 Slower |
| form_fill | Contact forms | ⚡ Fast |
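
The README does not describe how `statistical_containers` works internally. One plausible reading, shown here purely as an illustration, is that product grids repeat the same tag-and-class combination many times, so the most frequent combination is a good container candidate. The function name and the `min_repeats` threshold are hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser
from typing import Optional


class _SignatureCounter(HTMLParser):
    """Count how often each tag.class combination opens in the document."""

    def __init__(self) -> None:
        super().__init__()
        self.counts: Counter = Counter()

    def handle_starttag(self, tag, attrs):
        classes = dict(attrs).get("class")
        if classes:
            # Build a CSS-like selector such as "div.product-card".
            self.counts[f"{tag}.{'.'.join(classes.split())}"] += 1


def guess_container_selector(html: str, min_repeats: int = 3) -> Optional[str]:
    """Return the most repeated tag.class selector, or None if nothing
    repeats at least min_repeats times."""
    parser = _SignatureCounter()
    parser.feed(html)
    if not parser.counts:
        return None
    selector, count = parser.counts.most_common(1)[0]
    return selector if count >= min_repeats else None
```

No LLM call is involved in this kind of counting, which is consistent with the "Fast" rating in the table. A real implementation would likely also weigh structure and price patterns.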

📖 DSL System Documentation →

🀝 Multi-Provider LLM Support

curllm supports multiple LLM providers via LiteLLM:

from curllm_core import LLMConfig

# OpenAI
config = LLMConfig(provider="openai/gpt-4o-mini")

# Anthropic
config = LLMConfig(provider="anthropic/claude-3-haiku-20240307")

# Google Gemini
config = LLMConfig(provider="gemini/gemini-2.0-flash")

# Local Ollama (default)
config = LLMConfig(provider="ollama/qwen2.5:7b")

📚 Documentation

Getting Started

Architecture

Reference

🧪 Development

# Clone and install
git clone https://github.com/wronai/curllm.git
cd curllm
make install

# Run tests (388 tests passing)
make test

# Run URL resolver examples
cd examples/url_resolver && python run_all.py

# Run with Docker
docker compose up -d

📄 License

Apache License 2.0 - see LICENSE

🙏 Acknowledgments

Built with:


⭐ Star this repo if you find it useful!

Made with ❤️ by wronai