Cactus is a lightweight, high-performance framework for running AI models on mobile devices, with simple, consistent APIs across C/C++, Dart/Flutter, and TypeScript/React Native. Cactus currently leverages GGML backends to support any GGUF model already compatible with llama.cpp.
- Text completion and chat completion
- Vision Language Models
- Streaming token generation
- Embedding generation
- Text-to-speech model support (early stages)
- JSON mode with schema validation
- Chat templates with Jinja2 support
- Low memory footprint
- Battery-efficient inference
- Background processing
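A chat template (mentioned above) is what turns a structured message list into the single prompt string the model was trained on. The sketch below hardcodes a ChatML-style layout purely for illustration — in practice Cactus applies the Jinja2 template stored in the GGUF metadata, so you rarely write this by hand:

```typescript
// Illustrative only: a hardcoded ChatML-style renderer, standing in for
// what a real chat template does. The delimiter tokens vary per model.
type ChatMessage = { role: string; content: string };

function renderChatML(messages: ChatMessage[]): string {
  const parts = messages.map(
    (m) => `<|im_start|>${m.role}\n${m.content}<|im_end|>`
  );
  // Leave the assistant turn open so the model continues from here.
  return parts.join("\n") + "\n<|im_start|>assistant\n";
}

const prompt = renderChatML([
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "Hello, how are you?" },
]);
```

This is why the APIs below accept `messages` rather than a raw prompt: the framework handles the model-specific formatting for you.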
- Cloud APIs are increasingly expensive, especially at scale
- Private and local; data never leaves the device
- Low-latency and fault-tolerant; users do not need an internet connection
- Small models excel at most tasks; big APIs are often only better at enterprise tasks like coding
- Freedom to use any GGUF model, unlike Apple Foundations and Google AI Core
- React-Native and Flutter APIs, no need for separate Swift and Android setups
- iOS xcframework and JNILibs if working in a native setup
- Neat and tiny C++ build for custom hardware
- Update `pubspec.yaml`: Add `cactus` to your project's dependencies. Ensure you have `flutter: sdk: flutter` (usually present by default).

  ```yaml
  dependencies:
    flutter:
      sdk: flutter
    cactus: ^0.1.0
  ```
- Install dependencies: Execute the following command in your project terminal:

  ```shell
  flutter pub get
  ```
- Basic Flutter Text Completion

  ```dart
  import 'package:cactus/cactus.dart';

  Future<String> basicCompletion() async {
    // Initialize context
    final context = await CactusContext.init(CactusInitParams(
      modelPath: '/path/to/model.gguf',
      contextSize: 2048,
      threads: 4,
    ));

    // Generate response
    final result = await context.completion(CactusCompletionParams(
      messages: [
        ChatMessage(role: 'user', content: 'Hello, how are you?')
      ],
      maxPredictedTokens: 100,
      temperature: 0.7,
    ));

    context.free();
    return result.text;
  }
  ```
To learn more, see the Flutter Docs. It covers chat design, embeddings, multimodal models, text-to-speech, and more.
- Install the `cactus-react-native` package:

  Using npm:

  ```shell
  npm install cactus-react-native
  ```

  Or using yarn:

  ```shell
  yarn add cactus-react-native
  ```
- Install iOS Pods (if not using Expo): For native iOS projects, ensure you link the native dependencies. Navigate to your `ios` directory and run:

  ```shell
  npx pod-install
  ```
- Basic React-Native Text Completion

  ```typescript
  import { initLlama } from 'cactus-react-native';

  // Initialize a context
  const context = await initLlama({
    model: '/path/to/your/model.gguf',
    n_ctx: 2048,
    n_threads: 4,
  });

  // Generate text
  const result = await context.completion({
    messages: [
      { role: 'user', content: 'Hello, how are you?' }
    ],
    n_predict: 100,
    temperature: 0.7,
  });

  console.log(result.text);
  ```
To learn more, see the React Docs. It covers chat design, embeddings, multimodal models, text-to-speech, and more.
The Cactus backend is written in C/C++ and runs directly on any ARM/x86 hardware — phones, smart TVs, watches, speakers, cameras, laptops, Raspberry Pi boards, etc.
- Setup: You need `CMake 3.14+` installed. Install it with `brew install cmake` (on macOS) or your standard package manager on Linux.

- Build from Source

  ```shell
  git clone https://github.com/your-org/cactus.git
  cd cactus
  mkdir build && cd build
  cmake .. -DCMAKE_BUILD_TYPE=Release
  make -j$(nproc)
  ```
- CMake Integration: Add to your `CMakeLists.txt`:

  ```cmake
  # Add Cactus as subdirectory
  add_subdirectory(cactus)

  # Link to your target
  target_link_libraries(your_target cactus)
  target_include_directories(your_target PRIVATE cactus)

  # Requires C++17 or higher
  ```
- Basic Text Completion

  ```cpp
  #include "cactus/cactus.h"
  #include <iostream>

  int main() {
      cactus::cactus_context context;

      // Configure parameters
      common_params params;
      params.model.path = "model.gguf";
      params.n_ctx = 2048;
      params.n_threads = 4;
      params.n_gpu_layers = 99; // Use GPU acceleration

      // Load model
      if (!context.loadModel(params)) {
          std::cerr << "Failed to load model" << std::endl;
          return 1;
      }

      // Set prompt
      context.params.prompt = "Hello, how are you?";
      context.params.n_predict = 100;

      // Initialize sampling
      if (!context.initSampling()) {
          std::cerr << "Failed to initialize sampling" << std::endl;
          return 1;
      }

      // Generate response
      context.beginCompletion();
      context.loadPrompt();

      while (context.has_next_token && !context.is_interrupted) {
          auto token_output = context.doCompletion();
          if (token_output.tok == -1) break;
      }

      std::cout << "Response: " << context.generated_text << std::endl;
      return 0;
  }
  ```
To learn more, see the C++ Docs. It covers chat design, embeddings, multimodal models, text-to-speech, and more.
First, clone the repo with `git clone https://github.com/cactus-compute/cactus.git`, cd into it, and make all scripts executable with `chmod +x scripts/*.sh`.
- Flutter
  - Build the Android JNILibs with `scripts/build-flutter-android.sh`.
  - Build the Flutter plugin with `scripts/build-flutter-android.sh`.
  - Navigate to the example app with `cd examples/flutter`.
  - Open your simulator via Xcode or Android Studio (see the walkthrough if you have not done this before).
  - Always start the app with this combo: `flutter clean && flutter pub get && flutter run`.
  - Play with the app, and make changes to either the example app or the plugin as desired.
- React Native
  - Build the Android JNILibs with `scripts/build-react-android.sh`.
  - Build the React Native package with `scripts/build-react-android.sh`.
  - Navigate to the example app with `cd examples/react`.
  - Set up your simulator via Xcode or Android Studio (see the walkthrough if you have not done this before).
  - Always start the app with this combo: `yarn && yarn ios` or `yarn && yarn android`.
  - Play with the app, and make changes to either the example app or the package as desired.
  - For now, if changes are made in the package, manually copy the files/folders into `examples/react/node_modules/cactus-react-native`.
- C/C++
  - Navigate to the example app with `cd examples/cpp`.
  - There are multiple main files: `main_vlm`, `main_llm`, `main_embed`, `main_tts`.
  - Build both the libraries and executables with `build.sh`.
  - Run one of the executables: `./cactus_vlm`, `./cactus_llm`, `./cactus_embed`, `./cactus_tts`.
  - Try different models and make changes as desired.
- Contributing
  - To contribute a bug fix, create a branch after making your changes with `git checkout -b <branch-name>` and submit a PR.
  - To contribute a feature, please raise an issue first so it can be discussed and so you avoid duplicating someone else's work.
  - Join our Discord.
| Device | Gemma-3 1B Q8 (toks/sec) | Qwen-2.5 1.5B Q8 (toks/sec) |
|---|---|---|
| iPhone 16 Pro Max | 46 | 37 |
| iPhone 16 Pro | 46 | 37 |
| iPhone 16 | 42 | 36 |
| iPhone 15 Pro Max | 39 | 31 |
| iPhone 15 Pro | 39 | 31 |
| iPhone 14 Pro Max | 38 | 29 |
| OnePlus 13 5G | 37 | - |
| Samsung Galaxy S24 Ultra | 36 | - |
| iPhone 15 | 36 | 25 |
| OnePlus Open | 33 | - |
| Samsung Galaxy S23 5G | 32 | - |
| Samsung Galaxy S24 | 31 | - |
| iPhone 13 Pro | 30 | - |
| OnePlus 12 | 30 | - |
| Galaxy S25 Ultra | 25 | - |
| OnePlus 11 | 23 | - |
| iPhone 13 mini | 22 | - |
| Redmi K70 Ultra | 21 | - |
| Xiaomi 13 | 21 | - |
| Samsung Galaxy S24+ | 19 | - |
| Samsung Galaxy Z Fold 4 | 19 | - |
| Xiaomi Poco F6 5G | 19 | - |
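The throughput numbers above map directly onto perceived latency: the time to stream a full reply is simply output length divided by tokens per second (ignoring prompt-processing time). A quick back-of-the-envelope helper, using figures from the table:

```typescript
// Seconds to stream a reply of `outputTokens` tokens at a given decode
// throughput. Prompt processing adds extra time on top of this.
function secondsForReply(outputTokens: number, toksPerSec: number): number {
  return outputTokens / toksPerSec;
}

// A 100-token reply from Gemma-3 1B Q8 on an iPhone 16 Pro Max (46 toks/sec)
const fast = secondsForReply(100, 46); // ≈ 2.2 s
// The same reply on a Xiaomi 13 (21 toks/sec)
const slow = secondsForReply(100, 21); // ≈ 4.8 s
```

Even the slowest devices in the table stay well under typical cloud round-trip-plus-queueing latency for short replies, which is what makes on-device streaming feel responsive.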
We created a demo chat app that we use for benchmarking.
You can run models up to 10B at Q4 on most devices, but we do not recommend this for production due to file size, speed, battery drain, and heat. We generally recommend the following.
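The file-size concern can be sanity-checked with simple arithmetic: a quantized model's footprint is roughly parameter count times bits per weight, divided by eight (real GGUF files run slightly larger because of per-block scales and metadata). The effective bits-per-weight figures below are rough assumptions for illustration:

```typescript
// Rough on-disk size estimate for a quantized model, in gigabytes.
// Ignores metadata and per-block quantization scales, so treat the
// result as a lower bound.
function approxModelSizeGB(params: number, bitsPerWeight: number): number {
  return (params * bitsPerWeight) / 8 / 1e9;
}

// A 10B model at Q4 (~4.5 effective bits/weight) is already several GB,
// which is a lot to ship in or download for a mobile app.
const tenB_Q4 = approxModelSizeGB(10e9, 4.5); // ≈ 5.6 GB
// A 1B model at Q6 (~6.5 effective bits/weight) stays under a gigabyte.
const oneB_Q6 = approxModelSizeGB(1e9, 6.5); // ≈ 0.8 GB
```

This is the main reason the recommendations below cluster around the 300m–2B range at Q6.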
- Language Generation: `SmolLM2-360m`, `Qwen-3-600m-Q6`, `Gemma-3-1B-Q6`, `Qwen-3-1.7B-Q6`
- Multimodal Language Generation: `Smol-VLM-500m-Q6`, `Gemma-3n-2B-Q6`
- Embeddings: `nomic-v2-moe-300m-Q6`, `jina-v3-570m-Q6`
- Text-To-Speech: `OuteTTS-0.2-500m-Q6`
Gemma-3n-2B-Q6 is a great omni model and beats GPT-4.1 across many metrics. It is multimodal (vision, audio) and, with clever prompt engineering, can also be used for embedding text, images, and audio, as well as zero-shot classification and more. We are trying hard to get the weights.