Rlama Download

RLAMA is a document question-answering CLI tool built in Go with Cobra. It connects to local Ollama models for RAG systems and features web crawling, an interactive setup wizard, directory/website watching, an API server, advanced chunking strategies, vector storage, Hugging Face integration with 45,000+ GGUF models, and an Apache 2.0 license.

⭐ 1,089 stars on GitHub
Latest Release: v0.1.39

About Software

RLAMA is a powerful AI-driven question-answering tool for documents that integrates seamlessly with local Ollama models. It lets you create, manage, and interact with Retrieval-Augmented Generation (RAG) systems tailored to your documentation needs. Note: the project is temporarily paused due to the author's work and university commitments. Its vision is to become the definitive tool for creating local RAG systems for everyone. Completed features include: basic RAG system creation via the CLI; document processing for multiple formats (.txt, .md, .pdf, etc.); document chunking with advanced semantic strategies (fixed, semantic, hierarchical, hybrid); vector storage of document embeddings; context retrieval with basic semantic search and configurable context size; seamless Ollama integration; cross-platform support (Linux/macOS/Windows); easy one-line installation; an API server with HTTP endpoints for integrating RAG capabilities; web crawling for creating RAGs from websites; a guided interactive RAG setup wizard; and Hugging Face integration with access to 45,000+ GGUF models from the Hugging Face Hub.

Roadmap: Small LLM Optimization (Q2 2025): prompt compression, adaptive chunking, minimal context retrieval, parameter optimization. Advanced Embedding Pipeline (Q2-Q3 2025): multi-model embedding support, hybrid retrieval techniques, embedding evaluation tools, automated embedding cache. User Experience Enhancements (Q3 2025): lightweight web interface, knowledge graph visualization, domain-specific templates. Enterprise Features (Q4 2025): multi-user access control, enterprise system integration, knowledge quality monitoring, system integration API, AI agent creation framework. Next-Gen Retrieval Innovations (Q1 2026): multi-step retrieval, cross-modal retrieval, feedback-based optimization, knowledge graphs and symbolic reasoning.

Tech stack: Go as the core language, Cobra for the CLI, the Ollama API for embeddings and completions, local filesystem-based storage (JSON files), and a custom cosine-similarity implementation for embedding retrieval. Architecture: cmd/ (CLI commands), internal/ (client, domain, repository, service), pkg/ (shared utilities). Data flow: documents → document processing → embedding generation → storage (~/.rlama) → query (embedding comparison) → response generation.

Prerequisites: Ollama installed and running. Install with 'curl -fsSL https://raw.githubusercontent.com/dontizi/rlama/main/install.sh | sh'. Available commands: rag, crawl-rag, wizard, watch/watch-off, check-watched, web-watch/web-watch-off, check-web-watched, run, api, list, delete, list-docs, list-chunks, view-chunk, add-docs, crawl-add-docs, update-model, update, version, hf-browse, run-hf. License: Apache 2.0.

Use Cases:

  • Document AI question-answering tool connecting to local Ollama models for creating, managing, and interacting with RAG systems
  • CLI tool with Go/Cobra featuring web crawling (crawl-rag), interactive wizard setup, directory/website watching, API server, document management
  • Advanced chunking strategies (fixed, semantic, hierarchical, hybrid), vector storage with cosine similarity, multi-format support (.txt, .md, .pdf)
  • Hugging Face integration accessing 45,000+ GGUF models, adaptive chunking for small LLMs, minimal context retrieval, embedding pipeline
  • Installation via curl one-liner, architecture with cmd/internal/pkg, data flow: document processing → embedding generation → storage → query/response

Downloads

v0.1.39 May 24, 2025
rlama_windows_amd64.exe
v0.1.38 May 23, 2025
rlama_windows_amd64.exe
v0.1.37 May 23, 2025
rlama_windows_amd64.exe
v0.1.36 April 03, 2025
rlama_windows_amd64.exe
v0.1.35 April 01, 2025
rlama_windows_amd64.exe
v0.1.34 March 22, 2025
rlama_windows_amd64.exe
v0.1.33 March 21, 2025
rlama_windows_amd64.exe
v0.1.32 March 16, 2025
rlama_windows_amd64.exe
v0.1.31 March 15, 2025
rlama_windows_amd64.exe
v0.1.30 March 15, 2025
rlama_windows_amd64.exe
v0.1.29 March 13, 2025
rlama_windows_amd64.exe
v0.1.28 March 12, 2025
rlama_windows_amd64.exe
v0.1.27 March 12, 2025
rlama_windows_amd64.exe
v0.1.26 March 11, 2025
rlama_windows_amd64.exe
v0.1.25 March 10, 2025
rlama_windows_amd64.exe
v0.1.24 March 10, 2025
rlama_windows_amd64.exe
v0.1.23 March 08, 2025
rlama_windows_amd64.exe
v0.1.22 March 08, 2025
rlama_windows_amd64.exe
v0.1.21 March 08, 2025
rlama_windows_amd64.exe
v0.1.2 March 07, 2025
rlama_windows_amd64.exe
v0.1.1 March 07, 2025
rlama_windows_amd64.exe
v0.1.0 March 07, 2025
rlama_windows_amd64.exe

Package Info

Last Updated
May 24, 2025
Latest Version
v0.1.39
License
Apache-2.0
Total Versions
22

README

RLAMA - User Guide

⚠️ Project Temporarily Paused
This project is currently on pause due to my work and university commitments that take up a lot of my time. I am not able to actively maintain this project at the moment. Development will resume when my situation allows it.

RLAMA is a powerful AI-driven question-answering tool for your documents, seamlessly integrating with your local Ollama models. It enables you to create, manage, and interact with Retrieval-Augmented Generation (RAG) systems tailored to your documentation needs.

RLAMA Demonstration (https://img.youtube.com/vi/EIsQnBqeQxQ/0.jpg)

Table of Contents

  • Vision & Roadmap
  • Installation
  • Available Commands
    • rag - Create a RAG system
    • crawl-rag - Create a RAG system from a website
    • wizard - Create a RAG system with interactive setup
    • watch - Set up directory watching for a RAG system
    • watch-off - Disable directory watching for a RAG system
    • check-watched - Check a RAG's watched directory for new files
    • web-watch - Set up website monitoring for a RAG system
    • web-watch-off - Disable website monitoring for a RAG system
    • check-web-watched - Check a RAG's monitored website for updates
    • run - Use a RAG system
    • api - Start API server
    • list - List RAG systems
    • delete - Delete a RAG system
    • list-docs - List documents in a RAG
    • list-chunks - Inspect document chunks
    • view-chunk - View chunk details
    • add-docs - Add documents to RAG
    • crawl-add-docs - Add website content to RAG
    • update-model - Change LLM model
    • update - Update RLAMA
    • version - Display version
    • hf-browse - Browse GGUF models on Hugging Face
    • run-hf - Run a Hugging Face GGUF model
  • Uninstallation
  • Supported Document Formats
  • Troubleshooting
  • Using OpenAI Models

Vision & Roadmap

RLAMA aims to become the definitive tool for creating local RAG systems that work seamlessly for everyone—from individual developers to large enterprises. Here's our strategic roadmap:

Completed Features ✅

  • Basic RAG System Creation: CLI tool for creating and managing RAG systems
  • Document Processing: Support for multiple document formats (.txt, .md, .pdf, etc.)
  • Document Chunking: Advanced semantic chunking with multiple strategies (fixed, semantic, hierarchical, hybrid)
  • Vector Storage: Local storage of document embeddings
  • Context Retrieval: Basic semantic search with configurable context size
  • Ollama Integration: Seamless connection to Ollama models
  • Cross-Platform Support: Works on Linux, macOS, and Windows
  • Easy Installation: One-line installation script
  • API Server: HTTP endpoints for integrating RAG capabilities in other applications
  • Web Crawling: Create RAGs directly from websites
  • Guided RAG Setup Wizard: Interactive interface for easy RAG creation
  • Hugging Face Integration: Access to 45,000+ GGUF models from Hugging Face Hub

Small LLM Optimization (Q2 2025)

  • Prompt Compression: Smart context summarization for limited context windows
  • Adaptive Chunking: Dynamic content segmentation based on semantic boundaries and document structure
  • Minimal Context Retrieval: Intelligent filtering to eliminate redundant content
  • Parameter Optimization: Fine-tuned settings for different model sizes

Advanced Embedding Pipeline (Q2-Q3 2025)

  • Multi-Model Embedding Support: Integration with various embedding models
  • Hybrid Retrieval Techniques: Combining sparse and dense retrievers for better accuracy
  • Embedding Evaluation Tools: Built-in metrics to measure retrieval quality
  • Automated Embedding Cache: Smart caching to reduce computation for similar queries

User Experience Enhancements (Q3 2025)

  • Lightweight Web Interface: Simple browser-based UI for the existing CLI backend
  • Knowledge Graph Visualization: Interactive exploration of document connections
  • Domain-Specific Templates: Pre-configured settings for different domains

Enterprise Features (Q4 2025)

  • Multi-User Access Control: Role-based permissions for team environments
  • Integration with Enterprise Systems: Connectors for SharePoint, Confluence, Google Workspace
  • Knowledge Quality Monitoring: Detection of outdated or contradictory information
  • System Integration API: Webhooks and APIs for embedding RLAMA in existing workflows
  • AI Agent Creation Framework: Simplified system for building custom AI agents with RAG capabilities

Next-Gen Retrieval Innovations (Q1 2026)

  • Multi-Step Retrieval: Using the LLM to refine search queries for complex questions
  • Cross-Modal Retrieval: Support for image content understanding and retrieval
  • Feedback-Based Optimization: Learning from user interactions to improve retrieval
  • Knowledge Graphs & Symbolic Reasoning: Combining vector search with structured knowledge

RLAMA's core philosophy remains unchanged: to provide a simple, powerful, local RAG solution that respects privacy, minimizes resource requirements, and works seamlessly across platforms.

Installation

Prerequisites

  • Ollama (https://ollama.ai/) installed and running

Installation from terminal

curl -fsSL https://raw.githubusercontent.com/dontizi/rlama/main/install.sh | sh

Tech Stack

RLAMA is built with:

  • Core Language: Go (chosen for performance, cross-platform compatibility, and single binary distribution)
  • CLI Framework: Cobra (for command-line interface structure)
  • LLM Integration: Ollama API (for embeddings and completions)
  • Storage: Local filesystem-based storage (JSON files for simplicity and portability)
  • Vector Search: Custom implementation of cosine similarity for embedding retrieval
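The stack's vector search relies on cosine similarity between embedding vectors. As a rough illustration (not RLAMA's actual implementation), a self-contained Go version looks like this:

```go
package main

import (
	"fmt"
	"math"
)

// cosineSimilarity returns dot(a,b) / (|a| * |b|) for two equal-length
// embedding vectors; it returns 0 for mismatched or zero-magnitude inputs.
func cosineSimilarity(a, b []float64) float64 {
	if len(a) != len(b) || len(a) == 0 {
		return 0
	}
	var dot, normA, normB float64
	for i := range a {
		dot += a[i] * b[i]
		normA += a[i] * a[i]
		normB += b[i] * b[i]
	}
	if normA == 0 || normB == 0 {
		return 0
	}
	return dot / (math.Sqrt(normA) * math.Sqrt(normB))
}

func main() {
	q := []float64{0.1, 0.3, 0.5}
	d := []float64{0.2, 0.6, 1.0} // exactly 2*q, so the vectors are parallel
	fmt.Printf("%.4f\n", cosineSimilarity(q, d)) // prints 1.0000
}
```

Parallel vectors score 1, orthogonal vectors score 0, which is why the measure works well for ranking embeddings regardless of their magnitude.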

Architecture

RLAMA follows a clean architecture pattern with clear separation of concerns:

rlama/
├── cmd/                  # CLI commands (using Cobra)
│   ├── root.go           # Base command
│   ├── rag.go            # Create RAG systems
│   ├── run.go            # Query RAG systems
│   └── ...
├── internal/
│   ├── client/           # External API clients
│   │   └── ollama_client.go # Ollama API integration
│   ├── domain/           # Core domain models
│   │   ├── rag.go        # RAG system entity
│   │   └── document.go   # Document entity
│   ├── repository/       # Data persistence
│   │   └── rag_repository.go # Handles saving/loading RAGs
│   └── service/          # Business logic
│       ├── rag_service.go      # RAG operations
│       ├── document_loader.go  # Document processing
│       └── embedding_service.go # Vector embeddings
└── pkg/                  # Shared utilities
    └── vector/           # Vector operations

Data Flow

  1. Document Processing: Documents are loaded from the file system, parsed based on their type, and converted to plain text.
  2. Embedding Generation: Document text is sent to Ollama to generate vector embeddings.
  3. Storage: The RAG system (documents + embeddings) is stored in the user's home directory (~/.rlama).
  4. Query Process: When a user asks a question, it's converted to an embedding, compared against stored document embeddings, and relevant content is retrieved.
  5. Response Generation: Retrieved content and the question are sent to Ollama to generate a contextually-informed response.
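Steps 4 and 5 amount to a top-k similarity search over the stored embeddings. The Go sketch below illustrates that ranking step; the `Chunk` type and `topK` helper are hypothetical names for illustration, not RLAMA's internal API:

```go
package main

import (
	"fmt"
	"sort"
)

// Chunk is a hypothetical stored document chunk with its embedding.
type Chunk struct {
	Text      string
	Embedding []float64
}

// dot computes the dot product; for L2-normalized embeddings this
// equals cosine similarity.
func dot(a, b []float64) float64 {
	var s float64
	for i := range a {
		s += a[i] * b[i]
	}
	return s
}

// topK ranks stored chunks by similarity to the query embedding and
// returns the k best — the "relevant content" passed to the LLM.
func topK(query []float64, chunks []Chunk, k int) []Chunk {
	sort.SliceStable(chunks, func(i, j int) bool {
		return dot(query, chunks[i].Embedding) > dot(query, chunks[j].Embedding)
	})
	if k > len(chunks) {
		k = len(chunks)
	}
	return chunks[:k]
}

func main() {
	store := []Chunk{
		{"installation notes", []float64{0.9, 0.1}},
		{"license text", []float64{0.1, 0.9}},
	}
	best := topK([]float64{1, 0}, store, 1)
	fmt.Println(best[0].Text) // prints "installation notes"
}
```

In the real tool, the query embedding would come from Ollama and the retrieved chunks would be stitched into the prompt for response generation.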

Visual Representation

┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│  Documents  │────>│  Document   │────>│  Embedding  │
│  (Input)    │     │  Processing │     │  Generation │
└─────────────┘     └─────────────┘     └─────────────┘
                                              │
                                              ▼
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│   Query     │────>│   Vector    │<────│   Storage   │
│  (Input)    │     │   Search    │     │  (~/.rlama) │
└─────────────┘     └─────────────┘     └─────────────┘
                           │
                           ▼
                    ┌─────────────┐
                    │  Response   │
                    │  Generation │
                    └─────────────┘

<!-- truncated for length -->

Related Software

Tuboshu

Convert websites into desktop apps with Electron. Features multi-account support, global hotkey switching, custom JavaScript injection and portable packaging for Windows, macOS and Linux.

⭐ 1,291

Pluely

Open-source AI meeting assistant built with Tauri at 10MB. Features real-time transcription with OpenAI Whisper, GPT-4, Claude, Gemini and Grok support, translucent overlay, and undetectable in video calls.

⭐ 1,274 · ai-assistant, claude, cluely-alternative

Fluent M3U8

Cross-platform M3U8/MPD video downloader built with PySide6 and QFluentWidgets featuring multi-threaded downloads, task management, fluent design GUI, FFmpeg and N_m3u8DL-RE integration, Python 3.11 conda environment, and deployment support for Windows/macOS/Linux with GPL-3.0 license.

⭐ 1,267 · fluent, m3u8, m3u8-downloader

Xiaozhi Android Client

Flutter AI voice assistant for Android and iOS with real-time conversation, Live2D characters, echo cancellation, multi-service support for Xiaozhi, Dify and OpenAI, and image messaging.

⭐ 1,252 · ai, chat, chatgpt

Github Stars Manager

GitHub starred repository manager with AI-powered auto-sync, semantic search, automatic categorization, release tracking, one-click downloads, smart asset filters, bilingual wiki integration, and cross-platform Electron client for Windows/macOS/Linux with 100% local data storage and MIT license.

⭐ 1,224

Observer

Build local AI agents that observe your screen, microphone and clipboard, process with local LLMs, and react with notifications, screen recording and memory. All data stays private. Works with Ollama and OpenAI.

⭐ 1,216