MobiAgent: A Systematic Framework for Customizable Mobile Agents
| Paper | Huggingface | App |
English | 中文
MobiAgent is a powerful and customizable mobile agent system, including:
- the MobiMind model family: vision-language models based on Qwen3-VL, with specialized reasoning and mixed variants, that understand what's on screen and perform actions like tapping, swiping, and typing on Android and HarmonyOS devices
- the AgentRR framework for recording and replaying tasks, plus experience retrieval for smarter task planning and user preference memory for personalization
- the MobiFlow benchmark for evaluating agent performance
System Architecture:
[2025.12.08] 🔥 We've released a new reasoning model (supporting both Android and HarmonyOS): MobiMind-Reasoning-4B-1208 (https://huggingface.co/IPADS-SAI/MobiMind-Reasoning-4B-1208), along with a 4-bit weight-quantized (W4A16) version, MobiMind-Reasoning-4B-1208-AWQ (https://huggingface.co/IPADS-SAI/MobiMind-Reasoning-4B-1208-AWQ). When serving the quantized model with vLLM, add the --dtype float16 flag to ensure compatibility.
[2025.11.03] ✅ Added multi-task execution module support and user preference support. For details about multi-task usage and configuration, see here.
[2025.11.03] 🧠 Introduced a user profile memory system: async preference extraction with an LLM, raw-text preference storage and retrieval, and optional GraphRAG via Neo4j. Preferences are retrieved as original texts and appended to experience prompts to personalize planning; see here.
[2025.10.31] 🔥 We've updated the MobiMind-Mixed model based on Qwen3-VL-4B-Instruct! Download it at MobiMind-Mixed-4B-1031 (https://huggingface.co/IPADS-SAI/MobiMind-Mixed-4B-1031), and add the --use_qwen3 flag when running the dataset creation and agent runner scripts.
[2025.9.30] 🚀 Added a local experience retrieval module, supporting experience lookup based on task descriptions and enhancing the intelligence and efficiency of task planning!
[2025.9.29] We've open-sourced a mixed version of MobiMind, capable of handling both Decider and Grounder tasks! Feel free to download and try it at MobiMind-Mixed-7B (https://huggingface.co/IPADS-SAI/MobiMind-Mixed-7B).
[2025.8.30] We've open-sourced MobiAgent!

Mobile App Demo:
AgentRR Demo (Left: first task; Right: subsequent task)
Multi Task Demo
task: Help me find the recommended best-selling men's jeans on Xiaohongshu, then search for the same jeans on Taobao, and send the brand, name, and price from Taobao to Xiao Zhao on WeChat.
Project Structure:
- agent_rr/ - Agent Record & Replay framework
- collect/ - Data collection, annotation, processing, and export tools
- runner/ - Agent executor that connects to the phone via ADB, executes tasks, and records execution traces
- MobiFlow/ - Agent evaluation benchmark based on a milestone DAG
- app/ - MobiAgent Android app
- deployment/ - Service deployment for the MobiAgent mobile application

If you would like to try MobiAgent directly with our app, please download it from the Download Link (https://github.com/IPADS-SAI/MobiAgent/releases/tag/v1.0) and enjoy yourself!
If you would like to try MobiAgent with Python scripts that leverage the Android Debug Bridge (ADB) to control your phone, please follow these steps:
Create a virtual environment, e.g., using conda:
conda create -n MobiMind python=3.10
conda activate MobiMind
Simplest environment setup (if you only want to run the agent runner):
# Install simplest dependencies
pip install -r requirements_simple.txt
Full environment setup (if you want to run the full pipeline):
pip install -r requirements.txt
# Download OmniParser model weights
for f in icon_detect/{train_args.yaml,model.pt,model.yaml} ; do huggingface-cli download microsoft/OmniParser-v2.0 "$f" --local-dir weights; done
# Download embedding model utils
huggingface-cli download BAAI/bge-small-zh --local-dir ./utils/experience/BAAI/bge-small-zh
# Install OCR utils (optional)
sudo apt install tesseract-ocr tesseract-ocr-chi-sim
# If you need GPU acceleration for OCR, install paddlepaddle-gpu according to your CUDA version
# For details, refer to https://www.paddlepaddle.org.cn/install/quick, CUDA 11.8 for example:
python -m pip install "paddlepaddle-gpu>=3.1.0" -i https://www.paddlepaddle.org.cn/packages/stable/cu118/
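If you install the GPU build, you can optionally confirm the installation with PaddlePaddle's built-in self-check:
# Optional: verify the PaddlePaddle install and GPU visibility
python -c "import paddle; paddle.utils.run_check()"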
After downloading the model checkpoints, use vLLM to deploy model inference services:
For MobiMind-Mixed/Reasoning Model (based on Qwen3-VL-4B):
vllm serve IPADS-SAI/MobiMind-Mixed-4B --port 8000
vllm serve Qwen/Qwen3-4B-Instruct --port 8002
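If you serve the quantized reasoning model instead, add the --dtype float16 flag noted in the news above (the port here is illustrative):
# The AWQ-quantized model requires --dtype float16 under vLLM
vllm serve IPADS-SAI/MobiMind-Reasoning-4B-1208-AWQ --dtype float16 --port 8000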
For Legacy MobiMind-Decider/Grounder Models:
vllm serve IPADS-SAI/MobiMind-Decider-7B --port 8000
vllm serve IPADS-SAI/MobiMind-Grounder-3B --port 8001
vllm serve Qwen/Qwen3-4B-Instruct --port 8002
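Once the services are running, a quick sanity check: vLLM exposes an OpenAI-compatible API, so each endpoint should list its served model (ports assume the defaults used above):
# Each endpoint should return the ID of the model it serves
curl http://localhost:8000/v1/models
curl http://localhost:8002/v1/models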
Write the list of tasks that you would like to test in runner/mobiagent/task.json.
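For illustration, a hypothetical two-task file written from the shell (the "app" and "description" field names are assumptions; see the runner sub-module README for the exact schema):
# Write two example tasks; field names are illustrative
cat > runner/mobiagent/task.json <<'EOF'
[
  {"app": "Taobao", "description": "Search for men's jeans and open the best-selling item"},
  {"app": "WeChat", "description": "Send the jeans' brand, name, and price to Xiao Zhao"}
]
EOF

Then launch the agent runner: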
python -m runner.mobiagent.mobiagent \
    --service_ip localhost \
    --decider_port 8000 \
    --grounder_port 8001 \
    --planner_port 8002 \
    --device Android
Parameters:
- --service_ip: Service IP (default: localhost)
- --decider_port: Decider service port (default: 8000)
- --grounder_port: Grounder service port (default: 8001)
- --planner_port: Planner service port (default: 8002)
- --device: Device type (default: Android)

The runner automatically controls the device and invokes the agent models to complete the predefined tasks.
Important: If you deploy the MobiMind-Mixed model for inference, set both --decider_port and --grounder_port to the port of the Mixed-model service, since one model handles both roles.
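Under the hood, the runner drives the device with standard ADB primitives; a minimal sketch of the kind of commands it issues (coordinates and text are illustrative):
# Capture the current screen for the models to inspect
adb exec-out screencap -p > screen.png
# Tap, type, and swipe actions chosen by the agent
adb shell input tap 540 1200
adb shell input text "jeans"
adb shell input swipe 540 1500 540 500 300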
For detailed usage instructions, see the README.md files in each sub-module directory.
If you find MobiAgent useful in your research, please feel free to cite our paper (https://arxiv.org/abs/2509.00531):
@misc{zhang2025mobiagentsystematicframeworkcustomizable,
title={MobiAgent: A Systematic Framework for Customizable Mobile Agents},
author={Cheng Zhang and Erhu Feng and Xi Zhao and Yisheng Zhao and Wangbo Gong and Jiahui Sun and Dong Du and Zhichao Hua and Yubin Xia and Haibo Chen},
year={2025},
eprint={2509.00531},
archivePrefix={arXiv},
primaryClass={cs.MA},
url={https://arxiv.org/abs/2509.00531},
}
We gratefully acknowledge open-source projects such as MobileAgent, UI-TARS, and Qwen-VL. We also thank the National Innovation Institute of High-end Smart Appliances for their support of this project.
Star History Chart (https://api.star-history.com/svg?repos=IPADS-SAI/MobiAgent&type=Date)