Download | Documentation | Discord
Lemonade helps users run local LLMs with the highest performance by configuring state-of-the-art inference engines for their NPUs and GPUs.
Startups such as [Styrk AI](https://styrk.ai/styrk-ai-and-amd-guardrails-for-your-on-device-ai-revolution/), research teams like [Hazy Research at Stanford](https://www.amd.com/en/developer/resources/technical-articles/2025/minions--on-device-and-cloud-language-model-collaboration-on-ryz.html), and large companies like [AMD](https://www.amd.com/en/developer/resources/technical-articles/unlocking-a-wave-of-llm-apps-on-ryzen-ai-through-lemonade-server.html) use Lemonade to run LLMs.
Want your app featured here? Discord · GitHub Issue · Email
To run and chat with Gemma 3:

```bash
lemonade-server run Gemma-3-4b-it-GGUF
```
To install models ahead of time, use the `pull` command:

```bash
lemonade-server pull Gemma-3-4b-it-GGUF
```
To see all available models, use the `list` command:

```bash
lemonade-server list
```
Tip: You can use `--llamacpp vulkan/rocm` to select a backend when running GGUF models.
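For example, to run a GGUF model on the Vulkan backend (a minimal sketch; the flag comes straight from the tip above):

```bash
# Select the Vulkan llama.cpp backend for this GGUF model
lemonade-server run Gemma-3-4b-it-GGUF --llamacpp vulkan
```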
Lemonade supports GGUF, FLM, and ONNX models across CPU, GPU, and NPU (see supported configurations).
Use `lemonade-server pull` or the built-in Model Manager to download models. You can also import custom GGUF/ONNX models from Hugging Face.
[Browse all built-in models →](https://lemonade-server.ai/docs/server/server_models/)
Lemonade supports the following configurations and makes it easy to switch between them at runtime. Find more information here.
| Hardware | Engine: OGA | Engine: llamacpp | Engine: FLM | Windows | Linux |
|---|---|---|---|---|---|
| 🧠 CPU | All platforms | All platforms | — | ✅ | ✅ |
| 🎮 GPU | — | Vulkan: All platforms<br>ROCm: Selected AMD platforms*<br>Metal: Apple Silicon | — | ✅ | ✅ |
| 🤖 NPU | AMD Ryzen™ AI 300 series | — | Ryzen™ AI 300 series | ✅ | — |
\* Supported AMD ROCm platforms:

| Architecture | Platform Support | GPU Models |
|---|---|---|
| gfx1151 (STX Halo) | Windows, Ubuntu | Ryzen AI MAX+ Pro 395 |
| gfx120X (RDNA4) | Windows, Ubuntu | Radeon AI PRO R9700, RX 9070 XT/GRE/9070, RX 9060 XT |
| gfx110X (RDNA3) | Windows, Ubuntu | Radeon PRO W7900/W7800/W7700/V710, RX 7900 XTX/XT/GRE, RX 7800 XT, RX 7700 XT |
| Under Development | Under Consideration | Recently Completed |
|---|---|---|
| Image Generation | vLLM support | General speech-to-text support (whisper.cpp) |
| Add imagegen and transcription to app | Handheld devices: Ryzen AI Z2 Extreme APUs | Multiple models loaded at the same time |
| ROCm support for Ryzen AI 360-375 (Strix) APUs | Text to speech | Lemonade desktop app |
You can use any OpenAI-compatible client library by configuring it to use `http://localhost:8000/api/v1` as the base URL. The table below lists official and popular OpenAI clients in different languages; pick whichever you prefer.
| Python | C++ | Java | C# | Node.js | Go | Ruby | Rust | PHP |
|---|---|---|---|---|---|---|---|---|
| [openai-python](https://github.com/openai/openai-python) | [openai-cpp](https://github.com/olrea/openai-cpp) | [openai-java](https://github.com/openai/openai-java) | [openai-dotnet](https://github.com/openai/openai-dotnet) | [openai-node](https://github.com/openai/openai-node) | [go-openai](https://github.com/sashabaranov/go-openai) | [ruby-openai](https://github.com/alexrudall/ruby-openai) | [async-openai](https://github.com/64bit/async-openai) | [openai-php](https://github.com/openai-php/client) |
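Because the server follows the OpenAI API convention, you can also smoke-test it without any client library. Below is a minimal sketch with `curl`, assuming the default port and the standard `/chat/completions` route under the base URL; the Python example that follows does the same through the official client.

```bash
# Send a chat completion request directly to Lemonade Server
curl http://localhost:8000/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Llama-3.2-1B-Instruct-Hybrid",
    "messages": [{"role": "user", "content": "What is the capital of France?"}]
  }'
```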
```python
from openai import OpenAI

# Initialize the client to use Lemonade Server
client = OpenAI(
    base_url="http://localhost:8000/api/v1",
    api_key="lemonade"  # required but unused
)

# Create a chat completion
completion = client.chat.completions.create(
    model="Llama-3.2-1B-Instruct-Hybrid",  # or any other available model
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)

# Print the response
print(completion.choices[0].message.content)
```
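Streaming works through the stock `openai-python` interface as well; this is a sketch of standard OpenAI-client usage rather than anything Lemonade-specific, assuming the same server and model as above:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/api/v1", api_key="lemonade")

# Request a streamed response and print tokens as they arrive
stream = client.chat.completions.create(
    model="Llama-3.2-1B-Instruct-Hybrid",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    stream=True,
)
for chunk in stream:
    # Each chunk carries a delta holding zero or more new characters
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```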
For more detailed integration instructions, see the Integration Guide.
The Lemonade Python SDK is also available, which includes the following components:

- `lemonade` CLI: lets you mix-and-match LLMs (ONNX, GGUF, SafeTensors) with prompting templates, accuracy testing, performance benchmarking, and memory profiling to characterize your models on your hardware.

To read our frequently asked questions, see our FAQ Guide.
We are actively seeking collaborators from across the industry. If you would like to contribute to this project, please check out our contribution guide.
New contributors can find beginner-friendly issues tagged with "Good First Issue" to get started.
This project is sponsored by AMD. It is maintained by @danielholanda, @jeremyfowers, @ramkrishna, and @vgodsoe in equal measure. You can reach us by [filing an issue](https://github.com/lemonade-sdk/lemonade/issues), emailing [email protected], or joining our [Discord](https://discord.gg/5xXzkMu8Zk).