rag1

Bolmo RAG Toolkit

A toolkit for experimenting with Retrieval-Augmented Generation (RAG), built around the rag.py script. The repository includes a ready-made installation script, install.sh, and a matching Makefile, which set up the runtime environment in a few seconds and provide convenient targets (install, run, clean).

Prerequisites

Linux / macOS with python3 and venv
Internet access to download the Python dependencies

Environment setup (install.sh)

Make the script executable:
```
chmod +x install.sh
```
Run the script:
```
./install.sh
```

The script:

creates a virtualenv (.venv),
activates the environment,
upgrades pip,
installs the full set of dependencies pinned for Python 3.13/3.11 to avoid compiling Rust/C code (transformers==4.46.2, tokenizers==0.20.3, sentence-transformers==2.6.1, faiss-cpu==1.13.2, torch==2.10.0, PyPDF2==3.0.1, python-docx==1.2.0, html2text==2025.4.15, python-dotenv==1.0.1).

When it finishes, activate the environment with source .venv/bin/activate.

Note: if you are on an older distribution, make sure build-essential, rustc and cargo are installed; recent packages should nevertheless install from prebuilt wheels and require no compilation.
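
To confirm that the active environment matches the pins listed above, a small check along these lines could be run after activation. This is only a sketch and not part of install.sh; the names `PINS`, `installed_version` and `check_pins` are assumptions for illustration, and the pin list is abbreviated.

```python
from importlib import metadata

# A few of the pins from the list above; extend as needed.
PINS = {
    "transformers": "4.46.2",
    "tokenizers": "0.20.3",
    "sentence-transformers": "2.6.1",
    "python-dotenv": "1.0.1",
}


def installed_version(package):
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_pins(pins, lookup=installed_version):
    """Return {package: (expected, found)} for every pin that does not match."""
    mismatches = {}
    for package, expected in pins.items():
        found = lookup(package)
        if found != expected:
            mismatches[package] = (expected, found)
    return mismatches
```

Calling `check_pins(PINS)` inside the activated .venv should return an empty dictionary when every listed package is installed at its pinned version.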

Sample `docs/` collection

The repository contains a docs/ directory with files in several formats that you can use to quickly test the pipeline:

readme.txt – plain-text introduction to the corpus.
faq.md – Markdown file with frequently asked questions.
support.html – simple HTML document for checking extraction from web content.
schedule.csv – table with a task schedule.
config.json – sample indexing configuration.
meeting_notes.docx – meeting notes in Word format.
overview.pdf – short PDF describing the Bolmo RAG demo.
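
One way such a mixed-format folder could be loaded is to dispatch on the file extension. The sketch below is an assumption about the approach, not the actual code in rag.py. It handles only the text-based formats with the standard library and simply skips PDF and DOCX, which the pinned PyPDF2 and python-docx packages would presumably cover. The names `LOADERS` and `load_folder` are hypothetical.

```python
import csv
import json
from html.parser import HTMLParser
from pathlib import Path


class _TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML document, ignoring the tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        if data.strip():
            self.parts.append(data.strip())


def load_text(path):
    return path.read_text(encoding="utf-8")


def load_html(path):
    parser = _TextExtractor()
    parser.feed(path.read_text(encoding="utf-8"))
    parser.close()
    return "\n".join(parser.parts)


def load_csv(path):
    with path.open(newline="", encoding="utf-8") as handle:
        return "\n".join(", ".join(row) for row in csv.reader(handle))


def load_json(path):
    # Re-serialize so the text is normalized regardless of the original layout.
    data = json.loads(path.read_text(encoding="utf-8"))
    return json.dumps(data, ensure_ascii=False, indent=2)


LOADERS = {
    ".txt": load_text,
    ".md": load_text,
    ".html": load_html,
    ".csv": load_csv,
    ".json": load_json,
}


def load_folder(folder):
    """Return {filename: text} for every file that has a known loader."""
    documents = {}
    for path in sorted(Path(folder).iterdir()):
        loader = LOADERS.get(path.suffix.lower())
        if path.is_file() and loader is not None:
            documents[path.name] = loader(path)
    return documents
```

A dictionary of loaders keeps the format list in one place, so adding a PDF or DOCX reader would mean registering one more function rather than changing `load_folder`.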

To run the demo on this data:

make run FOLDER=./docs QUERY="Co zawiera pakiet demo?"

Results, with source citations:

$ make run
CUDA_HOME=/usr/local/cuda .venv/bin/python rag.py --folder ./docs --query "Co to jest Bolmo?" --k 5
[rag] Ładowanie dokumentów z ./docs...
[rag] Załadowano 7 plików.
[rag] Chunkowanie dokumentów...
[rag] Chunkowanie zakończone – 7 fragmentów.
[rag] Ładowanie embeddera all-MiniLM-L6-v2...
[rag] Ładowanie istniejącego indexu FAISS...
[rag] Retrieval – wyszukiwanie 5 fragmentów...
[rag] Pobrano 5 fragmentów kontekstu.
[rag] Ładowanie modelu allenai/Bolmo-1B na urządzeniu cuda...
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 90.62it/s]
[rag] Generowanie odpowiedzi...
[1] support.html
# Kontakt zespołem Bolmo

Napisz do nas na [help@bolmo.ai](mailto:help@bolmo.ai), aby uzyskać wsparcie.

* Aktualizacje produktu w każdy wtorek
  * Wsparcie 24/7 dla klientów enterprise

[2] faq.md
# Najczęstze pytania

## Czym jest Bolmo?
Bolmo to przykład systemu wspomagającego analizę dokumentów w modelu RAG.

## Jak korzystać z próbek?
Ustaw `FOLDER=./docs` podczas wywołania `make run`, aby przetestować pipeline na zróżnich plikach.

[3] overview.pdf
Bolmo RAG DemoTen PDF ilustruje obsÅugÄ plikÃ³w binarnych w pipeline.MoÅ¼esz zastÄpiÄ go wÅasnymi materiaÅami referencyjnymi.

[4] readme.txt
Bolmo RAG Sample Corpus
===============================

To demo the pipeline, point the --folder flag to this directory.
It contains multiple document formats that mirror real-world inputs.

[5] meeting_notes.docx
Spotkanie: Harmonogram wdrożeń
Ustalenia: Przygotować demo dla zespołu sprzedaży.
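
The numbered source blocks in the output above follow a simple pattern: an index in square brackets, the file name, then the retrieved text. A formatter producing that layout might look like the following sketch. `format_sources` is a hypothetical name, and the actual rag.py may build its output differently.

```python
def format_sources(chunks):
    """Render (filename, text) pairs as numbered blocks: "[n] filename", then the text."""
    blocks = []
    for number, (filename, text) in enumerate(chunks, start=1):
        blocks.append(f"[{number}] {filename}\n{text.strip()}")
    return "\n\n".join(blocks)
```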

Configuring models via `.env`

Copy the template file:
```
cp .env.example .env
```
Open .env and set:
- BOLMO_MODEL – generative model (default: allenai/Bolmo-1B). If you have the GPU resources, you can switch to allenai/Bolmo-7B.
- BOLMO_EMBEDDER – sentence-transformer used for the retrieval step (default: all-MiniLM-L6-v2). You can point it at, for example, sentence-transformers/all-mpnet-base-v2.

The rag.py script loads .env automatically via python-dotenv, so any change takes effect the next time you run make run.
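
For readers unfamiliar with python-dotenv, the sketch below approximates what loading a `.env` file amounts to, using only the standard library. It is a simplified stand-in (it strips one pair of surrounding quotes and does no variable expansion), and the helper names `parse_env_file` and `get_models` are assumptions rather than functions from rag.py. Like python-dotenv's default behaviour, it lets variables already set in the process environment take precedence over the file.

```python
import os
from pathlib import Path


def parse_env_file(path):
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    values = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def get_models(env_path=".env", environ=None):
    """Return (generator, embedder); existing environment values win over the file."""
    environ = os.environ if environ is None else environ
    file_values = parse_env_file(env_path) if Path(env_path).exists() else {}
    merged = {**file_values, **environ}
    model = merged.get("BOLMO_MODEL", "allenai/Bolmo-1B")
    embedder = merged.get("BOLMO_EMBEDDER", "all-MiniLM-L6-v2")
    return model, embedder
```

With no .env file and neither variable set, `get_models()` returns the two defaults named above.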

Automation with the Makefile

The Makefile provides the following targets:

make install                      # creates .venv via install.sh
make run FOLDER=./docs QUERY="?"   # runs rag.py with the given parameters
make test                         # runs pytest
make clean                        # removes the .venv directory
make docker-build                 # builds the bolmo-rag Docker image
make docker-run FOLDER=... QUERY=... # builds and runs the image with docs/ and .env mounted
make docker-shell                 # opens a python:3.11 container with the repo under /app
make docker-up / docker-logs / docker-stop # manage a background container

Target descriptions

install – depends on the file .venv/bin/activate; if the environment does not exist, it runs install.sh and creates a marker file in the .venv directory.
run – ensures the environment is ready (invokes make install), then runs rag.py with the supplied --folder and --query arguments.
test – runs pytest to check basic document loading.
clean – removes the .venv directory so that installation can start from scratch.
docker-build – packages the application into a Docker image using requirements.txt.
docker-run – runs the previously built image with the docs/ directory and the .env file mounted, so corpora can be swapped without rebuilding the image.
docker-shell – opens a temporary python:3.11 container with the repository mounted at /app; ideal for manually testing pip install -r requirements.txt && make run ... in a controlled environment.
docker-up – starts the container in the background (named bolmo-rag-run),
docker-logs – tails the logs of the running container,
docker-stop – stops and removes the container started by docker-up.

Scenario: Docker + logs

Build the image and start the pipeline in the background:

make docker-up FOLDER=./docs QUERY="Co to jest Bolmo?"

Follow the logs live (works like docker logs -f):
```
make docker-logs
```
When you are done, stop the container:
```
make docker-stop
```

If you only want an interactive bash session in the reference image, run make docker-shell, then inside the container execute python -m venv .venv && source .venv/bin/activate && pip install -r requirements.txt && make run ....

Example use of the `run` target

make run FOLDER=./docs QUERY="Co to jest Bolmo?"

Change the FOLDER and QUERY values to point to your own document set and the question you want to ask the RAG module.
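
The FOLDER and QUERY variables end up as the --folder and --query flags of rag.py, as the command echoed in the sample run shows. A command-line parser accepting those flags could be sketched as follows. The flag names --folder, --query, --k and --backend all appear in this README; the default of 5 for --k, the help strings and the function names are assumptions.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="rag.py", description="Query a folder of documents.")
    parser.add_argument("--folder", required=True, help="directory with the documents to index")
    parser.add_argument("--query", required=True, help="question to answer from the documents")
    parser.add_argument("--k", type=int, default=5, help="number of chunks to retrieve (assumed default)")
    parser.add_argument(
        "--backend",
        choices=["faiss", "faiss_nocache", "qdrant"],
        default="faiss",
        help="retrieval backend",
    )
    return parser


def parse_args(argv=None):
    """Parse argv (defaults to sys.argv[1:]) into a namespace."""
    return build_parser().parse_args(argv)
```

For instance, `parse_args(["--folder", "./docs", "--query", "Co to jest Bolmo?"])` yields a namespace with `folder`, `query`, `k` and `backend` attributes.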

Resetting and reinstalling the environment

To make sure you are working with a fresh configuration, run the following in order:

make clean                              # removes the previous virtualenv
make install                            # recreates the environment and installs dependencies
make run FOLDER=./docs QUERY="Co to jest Bolmo?"  # runs RAG on the samples in docs/

The first two steps guarantee a clean environment, and the third demonstrates a full pipeline run on the sample collection.

Next steps

Extend rag.py (or an alternative script, e.g. bolmo_rag.py) with your own indexing logic or CLI. The Makefile and environment layout are ready for further extensions.

Good luck working with Bolmo RAG!

Bolmo RAG

A simple Retrieval-Augmented Generation (RAG) system for processing documents and answering queries based on their content.

Overview

Bolmo RAG allows you to index documents in a specified folder and query them using natural language. It supports three retrieval backends:

faiss (default): FAISS with persistent cache in FOLDER/.rag_cache (fast repeated runs)
faiss_nocache: baseline, rebuilds chunks/embeddings/index on every run
qdrant: Qdrant vector database (ingest + query)

Installation

To set up the environment and install dependencies:

make install

Expected Outcome: This will create a virtual environment and install all necessary packages. You should see the installation process complete with a message indicating the environment is ready.

Usage

Quickstart (recommended)

Build cache once, then query:

make index BACKEND=faiss FOLDER=./docs
make run   BACKEND=faiss FOLDER=./docs QUERY="Czym jest Bolmo?" K=5

Indexing Documents

Indexing requirements depend on the backend:

faiss: indexing is persisted in FOLDER/.rag_cache (use make index or make reindex)
faiss_nocache: no separate indexing step (everything is built on the fly)
qdrant: requires ingest into a Qdrant collection (use make reindex BACKEND=qdrant or start Qdrant manually + make index BACKEND=qdrant)
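
A persistent cache is only useful if it is rebuilt when the documents change. This README does not describe how FOLDER/.rag_cache is validated, so the following is purely an illustrative assumption: fingerprint the folder contents and treat the cache as stale when the stored fingerprint differs. The marker file name `fingerprint.txt` and all three function names are hypothetical.

```python
import hashlib
from pathlib import Path

CACHE_DIR_NAME = ".rag_cache"


def folder_fingerprint(folder):
    """Hash the relative path and content of every file outside the cache directory."""
    root = Path(folder)
    digest = hashlib.sha256()
    for path in sorted(root.rglob("*")):
        relative = path.relative_to(root)
        if not path.is_file() or CACHE_DIR_NAME in relative.parts:
            continue
        digest.update(relative.as_posix().encode("utf-8"))
        digest.update(b"\0")
        digest.update(path.read_bytes())
        digest.update(b"\0")
    return digest.hexdigest()


def cache_is_fresh(folder):
    """True if the stored fingerprint matches the current folder contents."""
    marker = Path(folder) / CACHE_DIR_NAME / "fingerprint.txt"
    if not marker.exists():
        return False
    return marker.read_text(encoding="utf-8").strip() == folder_fingerprint(folder)


def mark_cache_fresh(folder):
    """Record the current fingerprint after a successful (re)index."""
    cache_dir = Path(folder) / CACHE_DIR_NAME
    cache_dir.mkdir(exist_ok=True)
    (cache_dir / "fingerprint.txt").write_text(folder_fingerprint(folder), encoding="utf-8")
```

Hashing file contents is slower than comparing modification times but is not fooled by files that are touched without being changed; for a large corpus the trade-off may go the other way.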

With FAISS Backend (Default)

make index BACKEND=faiss FOLDER=./docs

Expected Outcome: The system will process all supported document types in the specified folder, create chunks, and build a FAISS index. You should see a message like `Cache/index gotowy.` ("Cache/index ready.") indicating the index is ready.

To force rebuild:

make reindex BACKEND=faiss FOLDER=./docs

With FAISS Baseline (No Cache)

No separate indexing step. Just run:

make run BACKEND=faiss_nocache FOLDER=./docs QUERY="Czym jest Bolmo?" K=5

With Qdrant Backend

make reindex BACKEND=qdrant FOLDER=./docs

Expected Outcome: Similar to FAISS, but make reindex will first attempt to start a local Qdrant container named qdrant-server and then ingest the document chunks into a Qdrant collection. You should see `Ingest do Qdrant zakończony.` ("Qdrant ingest finished.") upon completion.

Note: Ensure Docker is installed and running before using the Qdrant backend.
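
Before ingesting, it can help to confirm that the Qdrant server actually answers. The sketch below sends a GET request to the collections listing endpoint of Qdrant's REST API and treats any connection failure as "not reachable". The function name and the injectable `opener` parameter (there so the check can be exercised without a live server) are assumptions; the default URL is the QDRANT_URL default given in this README.

```python
import urllib.error
import urllib.request


def qdrant_is_reachable(base_url="http://localhost:6333", timeout=2.0,
                        opener=urllib.request.urlopen):
    """Return True if GET <base_url>/collections answers with HTTP 200."""
    url = base_url.rstrip("/") + "/collections"
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Covers refused connections, timeouts and HTTP error statuses.
        return False
```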

Querying

After indexing (if required), you can query the documents:

.venv/bin/python rag.py --backend faiss --folder ./docs --query "Czym jest Bolmo?"

Expected Outcome: The system will retrieve relevant document chunks and generate an answer based on the context using the Bolmo model. The response will be in Markdown format with citations to the source documents.
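
Conceptually, the retrieval step ranks chunk embeddings by similarity to the query embedding and keeps the top k. The sketch below does this by brute force with cosine similarity in plain Python. It stands in for what FAISS or Qdrant do far more efficiently, and toy vectors replace real sentence-transformer embeddings. `cosine` and `top_k` are hypothetical helpers, not part of rag.py.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, chunk_vecs, k=5):
    """Return up to k (index, score) pairs, best match first."""
    scored = [(index, cosine(query_vec, vec)) for index, vec in enumerate(chunk_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

The returned indices point back into the chunk list, which is how each retrieved passage can be tied to its source file for the citations.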

Specify Backend for Query

.venv/bin/python rag.py --backend qdrant --folder ./docs --query "Czym jest Bolmo?" --k 5

Expected Outcome: Same as above, but uses Qdrant for retrieval if the collection exists.

Reindexing

If documents change or you want to force a rebuild of the index:

make reindex BACKEND=faiss FOLDER=./docs

or for Qdrant:

make reindex BACKEND=qdrant FOLDER=./docs

Expected Outcome: The cache and index will be rebuilt from scratch, reflecting any changes in the documents.

Make targets

The default interface is via make:

make install
make run BACKEND=faiss FOLDER=./docs QUERY="..." K=5
make index BACKEND=faiss FOLDER=./docs
make query BACKEND=faiss FOLDER=./docs QUERY="..." K=5
make reindex BACKEND=faiss FOLDER=./docs

Backend Comparison

FAISS:
- Pros: Lightweight, local, no external dependencies beyond Python packages. Fast for small to medium datasets.
- Cons: Not designed for distributed systems or very large datasets. Limited scalability.
- Use Case: Ideal for personal projects or when simplicity and speed are priorities.

FAISS (no cache):
- Pros: Simplest baseline, no persistent artifacts.
- Cons: Slow for repeated runs (re-chunking + re-embedding + rebuilding index every time).
- Use Case: Debugging / one-off runs.

Qdrant:
- Pros: Scalable, supports distributed setups, designed for vector similarity search with advanced features like filtering.
- Cons: Requires Docker and more setup (server must be running). Slightly slower for small datasets due to network overhead.
- Use Case: Better for larger datasets or when you need advanced search capabilities and plan to scale.

Environment Variables

You can customize settings via environment variables (see .env.example for defaults):

BOLMO_BACKEND: Default backend (faiss, faiss_nocache, or qdrant)
BOLMO_EMBEDDER: Embedding model (default: all-MiniLM-L6-v2)
BOLMO_MODEL: Language model for generation (default: allenai/Bolmo-1B)
QDRANT_URL: URL for Qdrant server (default: http://localhost:6333)
QDRANT_COLLECTION: Collection name in Qdrant (default: bolmo_docs)
BOLMO_CHUNK_SIZE: Size of text chunks (default: 800)
BOLMO_CHUNK_OVERLAP: Overlap between chunks (default: 200)
BOLMO_FORCE_CPU: Force CPU usage even if CUDA is available (1 or true)
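
To make the two chunking variables concrete: with a chunk size of 800 and an overlap of 200, each new chunk starts 600 units after the previous one. The sketch below applies that sliding window to characters. Whether rag.py counts characters or tokens is not stated here, so that choice, along with the function names `chunk_settings` and `chunk_text`, is an assumption.

```python
import os


def chunk_settings(environ=None):
    """Read chunk size and overlap from the environment, using the documented defaults."""
    environ = os.environ if environ is None else environ
    size = int(environ.get("BOLMO_CHUNK_SIZE", "800"))
    overlap = int(environ.get("BOLMO_CHUNK_OVERLAP", "200"))
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    return size, overlap


def chunk_text(text, size=800, overlap=200):
    """Split text into windows of `size` characters that share `overlap` characters."""
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the window already reaches the end of the text
    return chunks
```

The overlap means a sentence cut at one chunk boundary still appears whole in the neighbouring chunk, at the cost of embedding some text twice.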

Benchmarks

There is a benchmark script that measures retrieval speed and a simple quality proxy (hit@k) for all three backends:

python benchmarks/benchmark_rag.py --folder ./docs --k 5 --reindex

More examples are in BENCHMARKS.md.
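
For reference, hit@k is commonly defined as the fraction of queries for which at least one expected source appears among the top k retrieved results. The sketch below computes it under that definition. How benchmarks/benchmark_rag.py actually defines and computes the metric is not shown in this README, so treat the function name and its input shapes as assumptions.

```python
def hit_at_k(retrieved, expected, k=5):
    """Fraction of queries whose top-k retrieved sources include an expected one.

    retrieved: list of ranked source lists, one per query.
    expected:  list of sets of acceptable sources, one per query.
    """
    if len(retrieved) != len(expected):
        raise ValueError("retrieved and expected must have the same length")
    if not retrieved:
        return 0.0
    hits = sum(
        1 for ranked, wanted in zip(retrieved, expected)
        if set(ranked[:k]) & set(wanted)
    )
    return hits / len(retrieved)
```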

Tests

make test

Optional Qdrant smoke tests (require a running Qdrant instance):

RUN_QDRANT_TESTS=1 QDRANT_URL=http://localhost:6333 make test

Troubleshooting

Qdrant Connection Issues: Ensure Docker is installed and the Qdrant container is running (docker ps to check). Restart with docker restart qdrant-server if needed.
Model Loading Errors: If you are using a GPU, check that CUDA is set up correctly. Set BOLMO_FORCE_CPU=1 to fall back to CPU.
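
The CPU fallback mentioned above can be pictured as a small decision function. In this sketch the accepted values (1 or true) come from the Environment Variables list, while case-insensitive matching and the function names are assumptions. CUDA availability is passed in as a plain boolean (in a real script it would typically come from `torch.cuda.is_available()`) so the logic can be read and tested without PyTorch installed.

```python
import os


def force_cpu_requested(environ=None):
    """True when BOLMO_FORCE_CPU is set to 1 or true (case-insensitive, an assumption)."""
    environ = os.environ if environ is None else environ
    return environ.get("BOLMO_FORCE_CPU", "").strip().lower() in {"1", "true"}


def pick_device(cuda_available, environ=None):
    """Return "cuda" only if it is available and CPU has not been forced."""
    if cuda_available and not force_cpu_requested(environ):
        return "cuda"
    return "cpu"
```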
