MobiAgent: A Systematic Framework for Customizable Mobile Agents
| Paper | Huggingface | App |
English | 中文
MobiAgent is a powerful and customizable mobile agent system, including:
- the MobiMind model family: vision-language models based on Qwen3-VL, with specialized reasoning and mixed variants, that understand what's on screen and perform actions like tapping, swiping, and typing on Android and HarmonyOS devices
- the AgentRR framework for recording and replaying tasks, plus experience retrieval for smarter task planning and user preference memory for personalization
- the MobiFlow benchmark for evaluating agent performance
System Architecture:
[2025.12.08] 🔥 We've released a new reasoning model (supporting both Android and HarmonyOS): MobiMind-Reasoning-4B-1208 (https://huggingface.co/IPADS-SAI/MobiMind-Reasoning-4B-1208), along with a 4-bit weight-quantized (W4A16) version, MobiMind-Reasoning-4B-1208-AWQ (https://huggingface.co/IPADS-SAI/MobiMind-Reasoning-4B-1208-AWQ). When serving the quantized model with vLLM, add the --dtype float16 flag to ensure compatibility.
[2025.11.03] ✅ Added multi-task execution module support and user preference support. For details about multi-task usage and configuration, see here.
[2025.11.03] 🧠 Introduced a user profile memory system: async preference extraction with an LLM, raw-text preference storage and retrieval, and optional GraphRAG via Neo4j. Preferences are retrieved as original texts and appended to experience prompts to personalize planning; see here.
[2025.10.31] 🔥 We've updated the MobiMind-Mixed model based on Qwen3-VL-4B-Instruct! Download it at MobiMind-Mixed-4B-1031 (https://huggingface.co/IPADS-SAI/MobiMind-Mixed-4B-1031), and add the --use_qwen3 flag when running the dataset creation and agent runner scripts.
[2025.9.30] 🚀 Added a local experience retrieval module, supporting experience lookup based on task descriptions and enhancing the intelligence and efficiency of task planning!
[2025.9.29] We've open-sourced a mixed version of MobiMind, capable of handling both Decider and Grounder tasks! Feel free to download and try it at MobiMind-Mixed-7B (https://huggingface.co/IPADS-SAI/MobiMind-Mixed-7B).
[2025.8.30] We've open-sourced MobiAgent!

Mobile App Demo:
AgentRR Demo (Left: first task; Right: subsequent task)
Multi Task Demo
task: Help me find the recommended best-selling men's jeans on Xiaohongshu, then search for the same jeans on Taobao, and send the brand, name, and price from Taobao to Xiao Zhao on WeChat.
Project Structure:
- agent_rr/ - Agent Record & Replay framework
- collect/ - Data collection, annotation, processing, and export tools
- runner/ - Agent executor that connects to the phone via ADB, executes tasks, and records execution traces
- MobiFlow/ - Agent evaluation benchmark based on a milestone DAG
- app/ - MobiAgent Android app
- deployment/ - Service deployment for the MobiAgent mobile application

If you would like to try MobiAgent directly with our app, please download it from the Download Link (https://github.com/IPADS-SAI/MobiAgent/releases/tag/v1.0) and enjoy yourself!
If you would like to try MobiAgent with Python scripts that leverage the Android Debug Bridge (ADB) to control your phone, please follow these steps:
Create a virtual environment, e.g., using conda:
conda create -n MobiMind python=3.10
conda activate MobiMind
Simplest environment setup (if you only want to run the agent runner):
# Install simplest dependencies
pip install -r requirements_simple.txt
Full environment setup (if you want to run the full pipeline):
pip install -r requirements.txt
# Download OmniParser model weights
for f in icon_detect/{train_args.yaml,model.pt,model.yaml} ; do huggingface-cli download microsoft/OmniParser-v2.0 "$f" --local-dir weights; done
# Download embedding model utils
huggingface-cli download BAAI/bge-small-zh --local-dir ./utils/experience/BAAI/bge-small-zh
# Install OCR utils (optional)
sudo apt install tesseract-ocr tesseract-ocr-chi-sim
# If you need GPU acceleration for OCR, install paddlepaddle-gpu according to your CUDA version
# For details, refer to https://www.paddlepaddle.org.cn/install/quick, CUDA 11.8 for example:
python -m pip install "paddlepaddle-gpu>=3.1.0" -i https://www.paddlepaddle.org.cn/packages/stable/cu118/
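If you install the GPU build, you can optionally confirm the installation with PaddlePaddle's built-in self-check:
# Optional: verify the PaddlePaddle install and GPU visibility
python -c "import paddle; paddle.utils.run_check()"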
After downloading the model checkpoints, use vLLM to deploy model inference services:
For MobiMind-Mixed/Reasoning Model (based on Qwen3-VL-4B):
vllm serve IPADS-SAI/MobiMind-Mixed-4B --port 8000
vllm serve Qwen/Qwen3-4B-Instruct --port 8002
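If you serve the quantized reasoning model instead, add the --dtype float16 flag noted in the news above (the port here is illustrative):
# The AWQ-quantized model requires --dtype float16 under vLLM
vllm serve IPADS-SAI/MobiMind-Reasoning-4B-1208-AWQ --dtype float16 --port 8000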
For Legacy MobiMind-Decider/Grounder Models:
vllm serve IPADS-SAI/MobiMind-Decider-7B --port 8000
vllm serve IPADS-SAI/MobiMind-Grounder-3B --port 8001
vllm serve Qwen/Qwen3-4B-Instruct --port 8002
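Once the services are running, a quick sanity check: vLLM exposes an OpenAI-compatible API, so each endpoint should list its served model (ports assume the defaults used above):
# Each endpoint should return the ID of the model it serves
curl http://localhost:8000/v1/models
curl http://localhost:8002/v1/models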
Write the list of tasks that you would like to test in runner/mobiagent/task.json.
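For illustration, a hypothetical two-task file written from the shell (the "app" and "description" field names are assumptions; see the runner sub-module README for the exact schema):
# Write two example tasks; field names are illustrative
cat > runner/mobiagent/task.json <<'EOF'
[
  {"app": "Taobao", "description": "Search for men's jeans and open the best-selling item"},
  {"app": "WeChat", "description": "Send the jeans' brand, name, and price to Xiao Zhao"}
]
EOF

Then launch the agent runner: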
python -m runner.mobiagent.mobiagent \
    --service_ip localhost \
    --decider_port 8000 \
    --grounder_port 8001 \
    --planner_port 8002 \
    --device Android
Parameters:
- --service_ip: Service IP (default: localhost)
- --decider_port: Decider service port (default: 8000)
- --grounder_port: Grounder service port (default: 8001)
- --planner_port: Planner service port (default: 8002)
- --device: Device type (default: Android)

The runner automatically controls the device and invokes the agent models to complete the predefined tasks.
Important: If you deploy the MobiMind-Mixed model for inference, set both --decider_port and --grounder_port to the port of the Mixed-model service, since one model handles both roles.
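Under the hood, the runner drives the device with standard ADB primitives; a minimal sketch of the kind of commands it issues (coordinates and text are illustrative):
# Capture the current screen for the models to inspect
adb exec-out screencap -p > screen.png
# Tap, type, and swipe actions chosen by the agent
adb shell input tap 540 1200
adb shell input text "jeans"
adb shell input swipe 540 1500 540 500 300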
For detailed usage instructions, see the README.md files in each sub-module directory.
If you find MobiAgent useful in your research, please feel free to cite our paper (https://arxiv.org/abs/2509.00531):
@misc{zhang2025mobiagentsystematicframeworkcustomizable,
title={MobiAgent: A Systematic Framework for Customizable Mobile Agents},
author={Cheng Zhang and Erhu Feng and Xi Zhao and Yisheng Zhao and Wangbo Gong and Jiahui Sun and Dong Du and Zhichao Hua and Yubin Xia and Haibo Chen},
year={2025},
eprint={2509.00531},
archivePrefix={arXiv},
primaryClass={cs.MA},
url={https://arxiv.org/abs/2509.00531},
}
We gratefully acknowledge open-source projects such as MobileAgent, UI-TARS, and Qwen-VL. We also thank the National Innovation Institute of High-end Smart Appliances for their support of this project.
Star History Chart (https://api.star-history.com/svg?repos=IPADS-SAI/MobiAgent&type=Date)