Framework for running AI locally on mobile devices and wearables. Hardware-aware C/C++ backend with wrappers for Flutter & React Native. Kotlin & Swift coming soon.

Cactus is a lightweight, high-performance framework for running AI models on mobile devices, with simple and consistent APIs across C/C++, Dart/Flutter, and TypeScript/React Native. Cactus currently leverages GGML backends to support any GGUF model already compatible with llama.cpp.

Features

  • Text completion and chat completion
  • Vision Language Models
  • Streaming token generation
  • Embedding generation
  • Text-to-speech model support (early stages)
  • JSON mode with schema validation
  • Chat templates with Jinja2 support
  • Low memory footprint
  • Battery-efficient inference
  • Background processing
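Two of these features, JSON mode and schema validation, combine naturally: parse the model's raw output as JSON, then check the parsed value against a schema before trusting it. A minimal TypeScript sketch of that idea (the `Schema` type and `parseJsonOutput` helper are illustrative stand-ins, not part of the Cactus API):

```typescript
// Minimal schema shape for illustration: strings, numbers, simple objects.
type Schema =
  | { type: "string" }
  | { type: "number" }
  | { type: "object"; properties: Record<string, Schema>; required?: string[] };

// Recursively check a parsed value against the schema.
function matches(value: unknown, schema: Schema): boolean {
  if (schema.type === "string") return typeof value === "string";
  if (schema.type === "number") return typeof value === "number";
  if (typeof value !== "object" || value === null || Array.isArray(value)) return false;
  const obj = value as Record<string, unknown>;
  for (const key of schema.required ?? []) {
    if (!(key in obj)) return false; // missing required field
  }
  for (const [key, sub] of Object.entries(schema.properties)) {
    if (key in obj && !matches(obj[key], sub)) return false;
  }
  return true;
}

// Parse raw model text and validate in one step; null means "do not trust".
function parseJsonOutput(raw: string, schema: Schema): unknown | null {
  try {
    const parsed = JSON.parse(raw);
    return matches(parsed, schema) ? parsed : null;
  } catch {
    return null;
  }
}
```

In practice the `result.text` of a completion call would be fed through a checker like this, retrying or repairing the output when validation fails.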

Why Cactus?

  • Cloud APIs are increasingly expensive, especially at scale
  • Private and local; data never leaves the device
  • Low-latency and fault-tolerant; users do not need an internet connection
  • Small models excel at most tasks; big cloud APIs are often only better at enterprise tasks like coding
  • Freedom to use any GGUF model, unlike Apple Foundations and Google AI Core
  • React-Native and Flutter APIs, no need for separate Swift and Android setups
  • iOS xcframework and JNILibs if working in a native setup
  • Neat and tiny C++ build for custom hardware

Flutter

  1. Update pubspec.yaml: Add cactus to your project's dependencies. Ensure you have flutter: sdk: flutter (usually present by default).
    dependencies:
      flutter:
        sdk: flutter
      cactus: ^0.1.0
  2. Install dependencies: Execute the following command in your project terminal:
    flutter pub get
  3. Basic Flutter Text Completion
    import 'package:cactus/cactus.dart';
    
    Future<String> basicCompletion() async {
      // Initialize the context
      final context = await CactusContext.init(CactusInitParams(
        modelPath: '/path/to/model.gguf',
        contextSize: 2048,
        threads: 4,
      ));
    
      // Generate a response
      final result = await context.completion(CactusCompletionParams(
        messages: [
          ChatMessage(role: 'user', content: 'Hello, how are you?'),
        ],
        maxPredictedTokens: 100,
        temperature: 0.7,
      ));
    
      // Free native resources when done
      context.free();
      return result.text;
    }

To learn more, see the Flutter Docs, which cover chat design, embeddings, multimodal models, text-to-speech, and more.

React Native

  1. Install the cactus-react-native package: Using npm:

    npm install cactus-react-native

    Or using yarn:

    yarn add cactus-react-native
  2. Install iOS Pods (if not using Expo): For native iOS projects, ensure you link the native dependencies. Navigate to your ios directory and run:

    npx pod-install
  3. Basic React-Native Text Completion

    import { initLlama } from 'cactus-react-native';
    
    // Initialize a context
    const context = await initLlama({
        model: '/path/to/your/model.gguf',
        n_ctx: 2048,
        n_threads: 4,
    });
    
    // Generate text
    const result = await context.completion({
        messages: [
            { role: 'user', content: 'Hello, how are you?' }
        ],
        n_predict: 100,
        temperature: 0.7,
    });
    
    console.log(result.text);

To learn more, see the React Docs, which cover chat design, embeddings, multimodal models, text-to-speech, and more.
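Streaming token generation tends to follow the same pattern regardless of the binding: tokens arrive one at a time, the UI renders each as it lands, and the caller accumulates the final text. A sketch of that consumer pattern; `TokenSource` and `fakeTokens` here are stand-ins, since the exact streaming API of cactus-react-native may differ:

```typescript
// Anything that yields tokens asynchronously, e.g. a streamed completion.
type TokenSource = AsyncIterable<string>;

// Consume a token stream: invoke the callback per token (e.g. to update the
// UI incrementally) and return the fully accumulated text at the end.
async function streamCompletion(
  source: TokenSource,
  onToken: (token: string) => void,
): Promise<string> {
  let text = "";
  for await (const token of source) {
    onToken(token);
    text += token;
  }
  return text;
}

// A fake token source, handy for wiring up UI code before the native
// context is available.
async function* fakeTokens(): AsyncGenerator<string> {
  for (const t of ["Hello", ",", " world", "!"]) yield t;
}
```

Keeping the accumulation in one helper like this makes it easy to swap the fake source for the real context once the model is loaded.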

C++

The Cactus backend is written in C/C++ and runs directly on ARM and x86 hardware, including phones, smart TVs, watches, speakers, cameras, laptops, and single-board computers such as the Raspberry Pi.

  1. Setup: You need CMake 3.14+ installed. On macOS, install it with brew install cmake; on Linux, use your standard package manager.

  2. Build from Source

    git clone https://github.com/cactus-compute/cactus.git
    cd cactus
    mkdir build && cd build
    cmake .. -DCMAKE_BUILD_TYPE=Release
    make -j$(nproc)
  3. CMake Integration Add to your CMakeLists.txt:

    # Add Cactus as subdirectory
    add_subdirectory(cactus)
    
    # Link to your target
    target_link_libraries(your_target cactus)
    target_include_directories(your_target PRIVATE cactus)
    
    # Requires C++17 or higher 
  4. Basic Text Completion

    #include "cactus/cactus.h"
    #include <iostream>
    
    int main() {
        cactus::cactus_context context;
        
        // Configure parameters
        common_params params;
        params.model.path = "model.gguf";
        params.n_ctx = 2048;
        params.n_threads = 4;
        params.n_gpu_layers = 99; // Use GPU acceleration
        
        // Load model
        if (!context.loadModel(params)) {
            std::cerr << "Failed to load model" << std::endl;
            return 1;
        }
        
        // Set prompt
        context.params.prompt = "Hello, how are you?";
        context.params.n_predict = 100;
        
        // Initialize sampling
        if (!context.initSampling()) {
            std::cerr << "Failed to initialize sampling" << std::endl;
            return 1;
        }
        
        // Generate response
        context.beginCompletion();
        context.loadPrompt();
        
        while (context.has_next_token && !context.is_interrupted) {
            auto token_output = context.doCompletion();
            if (token_output.tok == -1) break;
        }
        
        std::cout << "Response: " << context.generated_text << std::endl;
        return 0;
    }

To learn more, see the C++ Docs, which cover chat design, embeddings, multimodal models, text-to-speech, and more.

Using this Repo & Example Apps

First, clone the repo with git clone https://github.com/cactus-compute/cactus.git, cd into it, and make all scripts executable with chmod +x scripts/*.sh.

  1. Flutter

    • Build the Android JNILibs with scripts/build-flutter-android.sh.
    • Build the Flutter plugin with scripts/build-flutter-android.sh.
    • Navigate to the example app with cd examples/flutter.
    • Open your simulator via Xcode or Android Studio (see a walkthrough if you have not done this before).
    • Always start the app with flutter clean && flutter pub get && flutter run.
    • Play with the app, and make changes to the example app or the plugin as desired.
  2. React Native

    • Build the Android JNILibs with scripts/build-react-android.sh.
    • Build the React Native package with scripts/build-react-android.sh.
    • Navigate to the example app with cd examples/react.
    • Set up your simulator via Xcode or Android Studio (see a walkthrough if you have not done this before).
    • Always start the app with yarn && yarn ios or yarn && yarn android.
    • Play with the app, and make changes to the example app or the package as desired.
    • For now, if you change the package, manually copy the changed files/folders into examples/react/node_modules/cactus-react-native.
  3. C/C++

    • Navigate to the example app with cd examples/cpp.
    • There are multiple main files: main_vlm, main_llm, main_embed, and main_tts.
    • Build both the libraries and the executables with build.sh.
    • Run one of the executables: ./cactus_vlm, ./cactus_llm, ./cactus_embed, or ./cactus_tts.
    • Try different models and make changes as desired.
  4. Contributing

    • To contribute a bug fix, create a branch with git checkout -b <branch-name>, make your changes, and submit a PR.
    • To contribute a feature, please raise an issue first so it can be discussed, to avoid overlapping with someone else's work.
    • Join our Discord.

Performance

Device                      Gemma-3 1B Q8 (toks/sec)    Qwen-2.5 1.5B Q8 (toks/sec)
iPhone 16 Pro Max           46                          37
iPhone 16 Pro               46                          37
iPhone 16                   42                          36
iPhone 15 Pro Max           39                          31
iPhone 15 Pro               39                          31
iPhone 14 Pro Max           38                          29
OnePlus 13 5G               37                          -
Samsung Galaxy S24 Ultra    36                          -
iPhone 15                   36                          25
OnePlus Open                33                          -
Samsung Galaxy S23 5G       32                          -
Samsung Galaxy S24          31                          -
iPhone 13 Pro               30                          -
OnePlus 12                  30                          -
Galaxy S25 Ultra            25                          -
OnePlus 11                  23                          -
iPhone 13 mini              22                          -
Redmi K70 Ultra             21                          -
Xiaomi 13                   21                          -
Samsung Galaxy S24+         19                          -
Samsung Galaxy Z Fold 4     19                          -
Xiaomi Poco F6 5G           19                          -

Demo

We created a demo chat app we use for benchmarking:

Download App

Recommendations

You can run models up to 10B parameters at Q4 on most devices, but we do not recommend this for production because of file size, speed, battery drain, and heat. We generally recommend the following:

  • Language Generation: SmolLM2-360m, Qwen-3-600m-Q6, Gemma-3-1B-Q6, Qwen-3-1.7B-Q6
  • Multimodal Language Generation: Smol-VLM-500m-Q6, Gemma-3n-2B-Q6
  • Embeddings: nomic-v2-moe-300m-Q6, jina-v3-570m-Q6
  • Text-To-Speech: OuteTTS-0.2-500m-Q6
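The 10B-at-Q4 caveat above is mostly arithmetic: on-disk size scales with parameter count times bits per weight. A rough TypeScript estimate, assuming about 4.5 bits per weight for Q4_K-style quantization (the exact figure varies by quant variant, and metadata overhead is ignored):

```typescript
// Approximate on-disk size in GB for a model at a given bits-per-weight.
function approxSizeGB(paramCount: number, bitsPerWeight: number): number {
  return (paramCount * bitsPerWeight) / 8 / 1e9;
}

// A 10B-parameter model at ~4.5 bits/weight is roughly 5.6 GB, a poor fit
// for a mobile app bundle; a 600M model at ~6.5 bits/weight (Q6-ish) is
// closer to 0.5 GB.
```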

Gemma-3n-2B-Q6 is a great omni model and beats GPT-4.1 across many metrics. It is multimodal (vision, audio) and can be used for embedding text, images, and audio, as well as zero-shot classification and more, with clever prompt engineering. We are trying hard to get the weights.
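One common way to do zero-shot classification with the embedding models above: embed the input and a textual description of each candidate label, then pick the label whose embedding is most similar to the input's. A sketch with plain vectors; in a real app the vectors would come from the framework's embedding call:

```typescript
// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Pick the label whose embedding is most similar to the input embedding.
function classify(input: number[], labelEmbeddings: Record<string, number[]>): string {
  let best = "";
  let bestScore = -Infinity;
  for (const [label, vec] of Object.entries(labelEmbeddings)) {
    const score = cosine(input, vec);
    if (score > bestScore) { bestScore = score; best = label; }
  }
  return best;
}
```

Because cosine similarity ignores magnitude, this works directly on whatever vectors the embedding model returns, without normalizing them first.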
