RLAMA is a powerful AI-driven question-answering tool for your documents, seamlessly integrating with local Ollama models. It lets you create, manage, and interact with Retrieval-Augmented Generation (RAG) systems tailored to your documentation needs. Note: the project is temporarily paused due to work and university commitments. Vision: to become the definitive tool for creating local RAG systems for everyone.

Completed features:
- Basic RAG system creation via the CLI
- Document processing for multiple formats (.txt, .md, .pdf, etc.)
- Document chunking with advanced semantic strategies (fixed, semantic, hierarchical, hybrid)
- Vector storage of document embeddings
- Context retrieval with basic semantic search and a configurable context size
- Seamless Ollama integration
- Cross-platform support (Linux/macOS/Windows)
- Easy one-line installation
- API server with HTTP endpoints for integrating RAG capabilities
- Web crawling for creating RAGs from websites
- Guided, interactive RAG setup wizard
- Hugging Face integration with access to 45,000+ GGUF models from the Hugging Face Hub

Roadmap:
- Small LLM Optimization (Q2 2025): prompt compression, adaptive chunking, minimal context retrieval, parameter optimization
- Advanced Embedding Pipeline (Q2-Q3 2025): multi-model embedding support, hybrid retrieval techniques, embedding evaluation tools, automated embedding cache
- User Experience Enhancements (Q3 2025): lightweight web interface, knowledge graph visualization, domain-specific templates
- Enterprise Features (Q4 2025): multi-user access control, enterprise system integration, knowledge quality monitoring, system integration API, AI agent creation framework
- Next-Gen Retrieval Innovations (Q1 2026): multi-step retrieval, cross-modal retrieval, feedback-based optimization, knowledge graphs and symbolic reasoning

Tech stack: Go (core language), Cobra (CLI framework), the Ollama API (embeddings and completions), local filesystem-based storage (JSON files), and custom cosine similarity for embedding retrieval.

Architecture: cmd/ (CLI commands), internal/ (client, domain, repository, service), pkg/ (shared utilities).

Data flow: Documents → Document Processing → Embedding Generation → Storage (~/.rlama) → Query (embedding comparison) → Response Generation.

Prerequisites: Ollama installed and running. Install with:

curl -fsSL https://raw.githubusercontent.com/dontizi/rlama/main/install.sh | sh

Available commands: rag, crawl-rag, wizard, watch/watch-off, check-watched, web-watch/web-watch-off, check-web-watched, run, api, list, delete, list-docs, list-chunks, view-chunk, add-docs, crawl-add-docs, update-model, update, version, hf-browse, run-hf.

License: Apache 2.0.
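The chunking strategies listed above (fixed, semantic, hierarchical, hybrid) all start from the same idea: splitting a document into overlapping pieces before embedding. As a minimal illustration, here is a sketch of the simplest case, fixed-size chunking with overlap; the function name, chunk size, and overlap below are illustrative assumptions, not RLAMA's actual implementation or defaults.

```go
package main

import "fmt"

// chunkFixed splits text into fixed-size chunks of `size` runes,
// where consecutive chunks overlap by `overlap` runes. Overlap helps
// preserve context that would otherwise be cut at a chunk boundary.
func chunkFixed(text string, size, overlap int) []string {
	if size <= 0 || overlap >= size {
		return nil
	}
	runes := []rune(text)
	step := size - overlap
	var chunks []string
	for start := 0; start < len(runes); start += step {
		end := start + size
		if end > len(runes) {
			end = len(runes)
		}
		chunks = append(chunks, string(runes[start:end]))
		if end == len(runes) {
			break
		}
	}
	return chunks
}

func main() {
	// Chunks of 4 runes, overlapping by 1 rune.
	fmt.Println(chunkFixed("abcdefghij", 4, 1)) // [abcd defg ghij]
}
```

Semantic, hierarchical, and hybrid strategies refine this by choosing split points from document structure or embedding similarity rather than a fixed character count.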
⚠️ Project Temporarily Paused
This project is currently on pause due to my work and university commitments, which take up a lot of my time. I am not able to actively maintain it at the moment; development will resume when my situation allows.
RLAMA is a powerful AI-driven question-answering tool for your documents, seamlessly integrating with your local Ollama models. It enables you to create, manage, and interact with Retrieval-Augmented Generation (RAG) systems tailored to your documentation needs.
RLAMA Demonstration: https://www.youtube.com/watch?v=EIsQnBqeQxQ (thumbnail: https://img.youtube.com/vi/EIsQnBqeQxQ/0.jpg)
RLAMA aims to become the definitive tool for creating local RAG systems that work seamlessly for everyone—from individual developers to large enterprises. Here's our strategic roadmap:
RLAMA's core philosophy remains unchanged: to provide a simple, powerful, local RAG solution that respects privacy, minimizes resource requirements, and works seamlessly across platforms.
curl -fsSL https://raw.githubusercontent.com/dontizi/rlama/main/install.sh | sh
RLAMA is built with:
- Go: core language
- Cobra: CLI framework
- Ollama API: embeddings and completions
- Local filesystem-based storage (JSON files)
- Custom cosine similarity for embedding retrieval
RLAMA follows a clean architecture pattern with clear separation of concerns:
rlama/
├── cmd/ # CLI commands (using Cobra)
│ ├── root.go # Base command
│ ├── rag.go # Create RAG systems
│ ├── run.go # Query RAG systems
│ └── ...
├── internal/
│ ├── client/ # External API clients
│ │ └── ollama_client.go # Ollama API integration
│ ├── domain/ # Core domain models
│ │ ├── rag.go # RAG system entity
│ │ └── document.go # Document entity
│ ├── repository/ # Data persistence
│ │ └── rag_repository.go # Handles saving/loading RAGs
│ └── service/ # Business logic
│ ├── rag_service.go # RAG operations
│ ├── document_loader.go # Document processing
│ └── embedding_service.go # Vector embeddings
└── pkg/ # Shared utilities
└── vector/ # Vector operations
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│  Documents  │────>│  Document   │────>│  Embedding  │
│   (Input)   │     │ Processing  │     │ Generation  │
└─────────────┘     └─────────────┘     └──────┬──────┘
                                               │
                          ┌────────────────────┘
                          ▼
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│   Query     │────>│   Vector    │────>│  Response   │
│  (Input)    │     │   Search    │     │ Generation  │
└─────────────┘     └─────────────┘     └─────────────┘

Storage location precedence: environment variable > default location (~/.rlama).