MiniCPM4 and MiniCPM4.1 are ultra-efficient large language models designed specifically for end devices, achieving 3x generation speedup on reasoning tasks compared to standard models. The MiniCPM4.1 series introduces a hybrid reasoning model with trainable sparse attention that can operate in both deep reasoning mode and non-reasoning mode, making it extremely versatile.
These models deliver ultimate efficiency improvements while maintaining optimal performance at their scale, supporting GPU acceleration across Apple Silicon Metal, CoreML, NVIDIA CUDA, and AMD Vulkan platforms. With built-in support for speculative decoding via EAGLE3, quantization options through BitCPM4, and deployment flexibility across vLLM, SGLang, llama.cpp, and Ollama, MiniCPM enables powerful AI capabilities on devices that were previously unable to run such models effectively.
Use Cases:
中文 | English
MiniCPM Paper | MiniCPM Wiki (in Chinese) | MiniCPM-V Repo | Join our discord and WeChat | Join Us
| HuggingFace | ModelScope |
|---|---|
| MiniCPM4.1-8B (https://huggingface.co/openbmb/MiniCPM4.1-8B) | MiniCPM4.1-8B (https://www.modelscope.cn/models/OpenBMB/MiniCPM4.1-8B) |
| MiniCPM4.1-8B-GPTQ (https://huggingface.co/openbmb/MiniCPM4.1-8B-GPTQ) | MiniCPM4.1-8B-GPTQ (https://www.modelscope.cn/openbmb/MiniCPM4.1-8B-GPTQ) |
| MiniCPM4.1-8B-AutoAWQ (https://huggingface.co/openbmb/MiniCPM4.1-8B-AutoAWQ) | MiniCPM4.1-8B-AutoAWQ (https://www.modelscope.cn/openbmb/MiniCPM4.1-8B-AutoAWQ) |
| MiniCPM-4.1-8B-Marlin (https://huggingface.co/openbmb/MiniCPM-4.1-8B-Marlin) | MiniCPM-4.1-8B-Marlin (https://www.modelscope.cn/openbmb/MiniCPM-4.1-8B-Marlin) |
| MiniCPM4.1-8B-GGUF (https://huggingface.co/openbmb/MiniCPM4.1-8B-GGUF) | MiniCPM4.1-8B-GGUF (https://www.modelscope.cn/openbmb/MiniCPM4.1-8B-GGUF) |
| MiniCPM4.1-8B-MLX (https://huggingface.co/openbmb/MiniCPM4.1-8B-MLX) | MiniCPM4.1-8B-MLX (https://www.modelscope.cn/openbmb/MiniCPM4.1-8B-MLX) |
| MiniCPM4.1-8B-Eagle3 (https://huggingface.co/openbmb/MiniCPM4.1-8B-Eagle3) | MiniCPM4.1-8B-Eagle3 (https://www.modelscope.cn/openbmb/MiniCPM4.1-8B-Eagle3) |
| MiniCPM4-8B (https://huggingface.co/openbmb/MiniCPM4-8B) | MiniCPM4-8B (https://www.modelscope.cn/models/OpenBMB/MiniCPM4-8B) |
| MiniCPM4-0.5B (https://huggingface.co/openbmb/MiniCPM4-0.5B) | MiniCPM4-0.5B (https://www.modelscope.cn/models/OpenBMB/MiniCPM4-0.5B) |
| BitCPM4-1B (https://huggingface.co/openbmb/BitCPM4-1B) | BitCPM4-1B (https://www.modelscope.cn/models/OpenBMB/BitCPM4-1B) |
| BitCPM4-0.5B (https://huggingface.co/openbmb/BitCPM4-0.5B) | BitCPM4-0.5B (https://www.modelscope.cn/models/OpenBMB/BitCPM4-0.5B) |
| MiniCPM4-Survey (https://huggingface.co/openbmb/MiniCPM4-Survey) | MiniCPM4-Survey (https://www.modelscope.cn/models/OpenBMB/MiniCPM4-Survey) |
| MiniCPM4-MCP (https://huggingface.co/openbmb/MiniCPM4-MCP) | MiniCPM4-MCP (https://www.modelscope.cn/models/OpenBMB/MiniCPM4-MCP) |
Terminal-based AI coding assistant with multi-provider LLM support, session management, LSP integration, and interactive TUI for developers.
AI-powered video translation and dubbing tool supporting 100 languages with voice cloning, automated subtitle generation, and platform-optimized output for global content distribution.
All-in-one AI content marketing platform for creating, publishing, and monetizing across 14+ social channels with automation, trend tracking, and engagement tools.
Privacy-first AI meeting assistant with local transcription, speaker diarization, and automated summarization running entirely on your infrastructure without cloud dependencies.
Modern cross-platform system monitor built with Rust offering real-time CPU and memory tracking with beautiful UI, process management, and advanced search capabilities.
Feature-rich Flutter-based Bilibili third-party client supporting multiple platforms with offline playback, DLNA casting, and extensive social interaction features.