MiniCPM Download

Ultra-efficient large language model achieving a 3x generation speedup on reasoning tasks on end devices, with hybrid sparse attention and broad hardware acceleration support.

⭐ 8,464 stars on GitHub
Latest Release: 2.4.2

About Software

MiniCPM4 and MiniCPM4.1 are ultra-efficient large language models designed for end devices, achieving a 3x generation speedup on reasoning tasks compared to standard models. The MiniCPM4.1 series introduces a hybrid reasoning model with trainable sparse attention that can operate in both a deep reasoning mode and a non-reasoning mode.
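As a minimal sketch of switching between the two modes, assuming the chat template exposes an `enable_thinking` flag (a common convention for hybrid-reasoning models; the actual switch may differ, so check the MiniCPM4.1 model card):

```python
# Minimal sketch: toggling MiniCPM4.1 between reasoning modes.
# ASSUMPTION: the chat template accepts `enable_thinking`; verify the
# actual flag name on the model card before relying on this.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openbmb/MiniCPM4.1-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto", trust_remote_code=True
)

messages = [{"role": "user", "content": "How many primes are below 100?"}]
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,  # True: deep reasoning mode; False: direct answers
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=1024)
# Decode only the newly generated tokens
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```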

These models deliver large efficiency gains while maintaining strong performance at their scale, with hardware acceleration across Apple Silicon (Metal, CoreML), NVIDIA CUDA, and AMD Vulkan backends. With built-in support for speculative decoding via EAGLE3, quantization through BitCPM4, and deployment across vLLM, SGLang, llama.cpp, and Ollama, MiniCPM brings capable language models to devices that previously could not run such models effectively.
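As a quick illustration of that deployment path, here is a hedged sketch of querying the model through vLLM's OpenAI-compatible endpoint; the port, model ID, and launch command are assumptions based on standard vLLM usage, not MiniCPM-specific instructions:

```python
# Sketch: chat completion against a local vLLM server, assuming it was
# started with e.g.:  vllm serve openbmb/MiniCPM4.1-8B --trust-remote-code
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
resp = client.chat.completions.create(
    model="openbmb/MiniCPM4.1-8B",
    messages=[{"role": "user", "content": "Explain sparse attention in one sentence."}],
    max_tokens=128,
    temperature=0.7,
)
print(resp.choices[0].message.content)
```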

Use Cases:

  • Deploy ultra-efficient LLMs on end devices with 3x faster reasoning generation
  • Run powerful language models on resource-constrained hardware efficiently
  • Achieve 5x generation acceleration on typical end-side chips
  • Train sparse attention models with only 5B long-text tokens
  • Execute hybrid reasoning with trainable sparse attention patterns

Downloads

2.4.2 June 30, 2025
minicpm-2.4.2-setup.exe

Package Info

Last Updated
Jun 30, 2025
Latest Version
2.4.2
License
Apache-2.0
Total Versions
1

README


MiniCPM Paper | MiniCPM Wiki (in Chinese) | MiniCPM-V Repo | Join our Discord and WeChat | Join Us

Changelog🔥

  • [2025.09.29] InfLLM-V2 paper (https://arxiv.org/abs/2509.24663) is released! We can train a sparse attention model with only 5B long-text tokens. 🔥🔥🔥
  • [2025.09.05] The MiniCPM4.1 series is released! It is a hybrid reasoning model with trainable sparse attention that can be used in both deep reasoning mode and non-reasoning mode. 🔥🔥🔥
  • [2025.06.06] Released MiniCPM4 (https://huggingface.co/collections/openbmb/minicpm-4-6841ab29d180257e940baa9b)! This model achieves ultimate efficiency improvements while maintaining optimal performance at the same scale! It can achieve over 5x generation acceleration on typical end-side chips!
  • [2024.09.05] Released MiniCPM3-4B (https://huggingface.co/openbmb/MiniCPM3-4B)! This model outperforms Phi-3.5-mini-instruct and GPT-3.5-Turbo-0125 and is comparable to several 7B-9B models such as Llama3.1-8B-Instruct, Qwen2-7B-Instruct, and GLM-4-9B-Chat.
  • [2024.07.05] Released MiniCPM-S-1B (https://huggingface.co/openbmb/MiniCPM-S-1B-sft)! This model achieves an average sparsity of 87.89% in the FFN layer, reducing FFN FLOPs by 84%, while maintaining downstream task performance.
  • [2024.04.11] Released MiniCPM-2B-128k (https://huggingface.co/openbmb/MiniCPM-2B-128k), MiniCPM-MoE-8x2B (https://huggingface.co/openbmb/MiniCPM-MoE-8x2B) and MiniCPM-1B (https://huggingface.co/openbmb/MiniCPM-1B-sft-bf16)! Click here (https://openbmb.vercel.app/) to read our technical blog.
  • [2024.02.01] Released MiniCPM-2B (https://huggingface.co/openbmb/MiniCPM-2B-sft-bf16)! This model performs similarly to Mistral-7B on public benchmarks (with better performance in Chinese, math, and code abilities) and overall outperforms models like Llama2-13B, MPT-30B, and Falcon-40B.

Quick Links

  • Changelog🔥
  • Quick Links
  • Model Downloads
  • MiniCPM4 and MiniCPM4.1 Series
    • Highlights
    • Introduction
    • Evaluation Results
      • Efficiency Evaluation
      • Comprehensive Evaluation
      • Long Text Evaluation
    • Inference
      • Hybrid Reasoning Mode
      • HuggingFace
      • vLLM
        • Speculative Decoding
            1. Download MiniCPM4.1 Draft Model
            2. Install EAGLE3-Compatible vLLM
            3. Launch vLLM Server with Speculative Decoding
            4. Client Usage Example
          • vLLM Configuration Parameters
        • Standard Inference (Without Speculative Decoding)
      • SGLang
        • Speculative Decoding
            1. Download MiniCPM4.1 Draft Model
            2. Install EAGLE3-Compatible SGLang
            3. Launch SGLang Server with Speculative Decoding
            4. Client Usage
          • Configuration Parameters
        • Standard Inference (Without Speculative Decoding)
      • CPM.cu
      • llama.cpp and Ollama
        • llama.cpp
        • Ollama
    • BitCPM4: Quantization
      • BitCPM4 Evaluation
      • BitCPM4 Inference
    • MiniCPM4 Application
      • MiniCPM4-Survey: Trustworthy Survey Generation
        • Demo and Quick Start
        • Performance Evaluation
      • MiniCPM4-MCP: Tool Use with Model Context Protocol
        • Demo
        • Performance Evaluation
      • MiniCPM Intel AIPC Client: A New Edge Large Model Powerhouse
        • Key Features
        • System Requirements
        • Download
  • LICENSE
    • Model LICENSE
    • Statement
  • Institutions
  • Citation

Model Downloads

  • MiniCPM4.1-8B: HuggingFace (https://huggingface.co/openbmb/MiniCPM4.1-8B) | ModelScope (https://www.modelscope.cn/models/OpenBMB/MiniCPM4.1-8B)
  • MiniCPM4.1-8B-GPTQ: HuggingFace (https://huggingface.co/openbmb/MiniCPM4.1-8B-GPTQ) | ModelScope (https://www.modelscope.cn/openbmb/MiniCPM4.1-8B-GPTQ)
  • MiniCPM4.1-8B-AutoAWQ: HuggingFace (https://huggingface.co/openbmb/MiniCPM4.1-8B-AutoAWQ) | ModelScope (https://www.modelscope.cn/openbmb/MiniCPM4.1-8B-AutoAWQ)
  • MiniCPM-4.1-8B-Marlin: HuggingFace (https://huggingface.co/openbmb/MiniCPM-4.1-8B-Marlin) | ModelScope (https://www.modelscope.cn/openbmb/MiniCPM-4.1-8B-Marlin)
  • MiniCPM4.1-8B-GGUF: HuggingFace (https://huggingface.co/openbmb/MiniCPM4.1-8B-GGUF) | ModelScope (https://www.modelscope.cn/openbmb/MiniCPM4.1-8B-GGUF)
  • MiniCPM4.1-8B-MLX: HuggingFace (https://huggingface.co/openbmb/MiniCPM4.1-8B-MLX) | ModelScope (https://www.modelscope.cn/openbmb/MiniCPM4.1-8B-MLX)
  • MiniCPM4.1-8B-Eagle3: HuggingFace (https://huggingface.co/openbmb/MiniCPM4.1-8B-Eagle3) | ModelScope (https://www.modelscope.cn/openbmb/MiniCPM4.1-8B-Eagle3)
  • MiniCPM4-8B: HuggingFace (https://huggingface.co/openbmb/MiniCPM4-8B) | ModelScope (https://www.modelscope.cn/models/OpenBMB/MiniCPM4-8B)
  • MiniCPM4-0.5B: HuggingFace (https://huggingface.co/openbmb/MiniCPM4-0.5B) | ModelScope (https://www.modelscope.cn/models/OpenBMB/MiniCPM4-0.5B)
  • BitCPM4-1B: HuggingFace (https://huggingface.co/openbmb/BitCPM4-1B) | ModelScope (https://www.modelscope.cn/models/OpenBMB/BitCPM4-1B)
  • BitCPM4-0.5B: HuggingFace (https://huggingface.co/openbmb/BitCPM4-0.5B) | ModelScope (https://www.modelscope.cn/models/OpenBMB/BitCPM4-0.5B)
  • MiniCPM4-Survey: HuggingFace (https://huggingface.co/openbmb/MiniCPM4-Survey) | ModelScope (https://www.modelscope.cn/models/OpenBMB/MiniCPM4-Survey)
  • MiniCPM4-MCP: HuggingFace (https://huggingface.co/openbmb/MiniCPM4-MCP) | ModelScope (https://www.modelscope.cn/models/OpenBMB/MiniCPM4-MCP)
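To pull one of the GGUF checkpoints above for llama.cpp or Ollama, a minimal sketch using huggingface_hub; the quantization filename below is a hypothetical placeholder, so list the repository files first:

```python
# Sketch: download a GGUF file from the MiniCPM4.1-8B-GGUF repo.
from huggingface_hub import hf_hub_download, list_repo_files

repo = "openbmb/MiniCPM4.1-8B-GGUF"
print(list_repo_files(repo))  # inspect the available quantizations first
# NOTE: hypothetical filename; replace with one printed above.
path = hf_hub_download(repo_id=repo, filename="MiniCPM4.1-8B-Q4_K_M.gguf")
print("Saved to", path)
```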
See the full README in the repository.