LLM Worker

Run large language models on The Grid. Earn AIPG for powering text generation and chat.


What It Does

The LLM Worker connects your hardware to The Grid for text inference:

  • Chat completions - Power aipg.chat and API requests
  • Text generation - Prompts, completions, conversations
  • Multiple backends - Ollama, LM Studio, vLLM, KoboldCpp, and more

When users send messages, your worker processes them and you earn AIPG.
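
Each job is ultimately served by your local backend. With Ollama, for example, handling a chat message boils down to a request like the one below (illustrative only; the exact request the worker sends may differ, and the worker handles all of this for you):

# Illustrative: the kind of chat request a Grid job turns into against a local Ollama backend
curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2:3b",
  "messages": [{"role": "user", "content": "Hello!"}],
  "stream": false
}'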


Requirements

Hardware

| Component | Minimum | Recommended |
|---|---|---|
| GPU VRAM | 4 GB | 12 GB+ |
| System RAM | 8 GB | 16 GB+ |
| Storage | 20 GB free | 50 GB+ SSD |

CPU-only: Possible for smaller models but significantly slower.

VRAM by Model Size

| Model Size | VRAM Needed | Examples |
|---|---|---|
| 3B params | 4 GB | Llama 3.2 3B, Phi-3 Mini |
| 7-8B params | 8 GB | Llama 3 8B, Mistral 7B |
| 13B params | 16 GB | Llama 2 13B |
| 34B params | 24 GB | Code Llama 34B |
| 70B params | 48 GB+ | Llama 2 70B |
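
These figures follow a general rule of thumb rather than anything Grid-specific: weight memory is roughly parameter count times bytes per weight (about 2 bytes at FP16, about 0.5 at 4-bit quantization), plus a couple of GB of overhead for the KV cache and runtime. A quick back-of-the-envelope check:

# Rough weights-only VRAM estimate: billions of params x bytes per weight + ~2 GB overhead
echo "8 * 2 + 2" | bc    # 8B model at FP16 (~2 bytes/weight): ~18 GB
echo "8 * 0.5 + 2" | bc  # same model at 4-bit (~0.5 bytes/weight): ~6 GB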

Software

  • Python 3.9+ (if running from source)
  • A backend: Ollama (recommended), LM Studio, vLLM, SGLang, or KoboldCpp
  • Grid API key

Quick Start

1. Get an API Key

Register at api.aipowergrid.io/register or dashboard.aipowergrid.io

2. Install a Backend (Ollama Recommended)

# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
 
# Pull a model
ollama pull llama3.2:3b

That's it for the backend. One command to install, one to download a model.
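
Before moving on, you can confirm the backend is actually serving (an optional sanity check):

# Optional sanity check: Ollama listens on localhost:11434 by default
ollama list                           # installed models via the CLI
curl http://localhost:11434/api/tags  # same list as JSON from the HTTP API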

3. Download the Worker

Option A: Pre-built Binary (Easiest)

Download the build for your platform from GitHub Releases:

| Platform | Instructions |
|---|---|
| Windows | Download the .zip, extract, run the .exe |
| macOS | Download the .zip, extract, open the app |
| Linux | Download the .zip, extract, chmod +x, run |

Option B: From Source

git clone https://github.com/AIPowerGrid/grid-inference-worker
cd grid-inference-worker
pip install -e .
grid-inference-worker
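
If you prefer not to touch the system Python, the usual virtual-environment approach works too:

# Optional: install into a virtual environment instead of the system Python
python3 -m venv .venv && source .venv/bin/activate
pip install -e .
grid-inference-worker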

Option C: Docker

git clone https://github.com/AIPowerGrid/grid-inference-worker
cd grid-inference-worker
cp .env.example .env
# Edit .env with your API key
docker compose up -d
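
A minimal .env might look like the following, reusing the variables documented under Configuration Options below; treat this as a sketch and defer to .env.example for the authoritative key names:

# Sketch of a minimal .env; check .env.example for the authoritative keys
GRID_API_KEY=your-api-key
MODEL_NAME=llama3.2:3b
BACKEND_TYPE=ollama
# From inside a container, localhost will not reach Ollama on the host;
# depending on your Docker setup you may need host.docker.internal here
BACKEND_URL=http://localhost:11434
GRID_WORKER_NAME=my-worker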

4. Configure via Web Wizard

When the worker starts, it opens a web interface at:

http://localhost:7861

Here you can:

  • Enter your Grid API key
  • Select your backend (Ollama, etc.)
  • Choose which model to serve
  • Set a worker name
  • Start hosting

5. Start Earning

Once configured, click "Start" and your worker connects to The Grid. Jobs come in automatically.


Configuration Options

Via Web Wizard (Recommended)

The setup wizard at http://localhost:7861 handles everything visually.

Via Command Line

grid-inference-worker \
  --api-key YOUR_API_KEY \
  --model llama3.2:3b \
  --backend-url http://localhost:11434 \
  --worker-name "my-llm-worker"

Via Environment Variables

# Required
export GRID_API_KEY=your-api-key
 
# Optional
export MODEL_NAME=llama3.2:3b
export BACKEND_TYPE=ollama
export BACKEND_URL=http://localhost:11434
export GRID_WORKER_NAME=my-worker
export GRID_MAX_LENGTH=4096

Run as System Service

Auto-start on boot:

grid-inference-worker --install-service

Works on Windows, macOS, and Linux.


Supported Backends

Ollama (Recommended)

Easiest setup. Handles model management automatically.

# Install
curl -fsSL https://ollama.com/install.sh | sh
 
# Pull models
ollama pull llama3.2:3b
ollama pull mistral
ollama pull codellama

LM Studio

GUI-based. Run a local server and point the worker at it.
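
LM Studio's built-in server exposes an OpenAI-compatible API on port 1234 by default, so a worker invocation might look like the sketch below (the model name is whatever you have loaded in LM Studio; if in doubt, configure via the web wizard instead of flags):

# Assumes LM Studio's local server is running on its default port (1234)
grid-inference-worker \
  --api-key YOUR_API_KEY \
  --model your-loaded-model \
  --backend-url http://localhost:1234 \
  --worker-name "my-lmstudio-worker"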

vLLM

High-performance inference. Good for production deployments.

pip install vllm
vllm serve meta-llama/Meta-Llama-3-8B --port 8000
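
Then point the worker at the vLLM server using the same flags shown under Configuration Options (the model name here is just an example):

# Point the worker at the local vLLM server started above
grid-inference-worker \
  --api-key YOUR_API_KEY \
  --model meta-llama/Meta-Llama-3-8B \
  --backend-url http://localhost:8000 \
  --worker-name "my-vllm-worker"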

SGLang

Fast inference with RadixAttention.

KoboldCpp

CPU-optimized, good for older hardware.


Supported Models

Any model your backend supports. Popular choices:

General Purpose

| Model | Size | VRAM | Notes |
|---|---|---|---|
| Llama 3.2 3B | 3B | 4 GB | Fast, good for most tasks |
| Llama 3 8B | 8B | 8 GB | Great balance |
| Mistral 7B | 7B | 8 GB | Efficient, fast |
| Mixtral 8x7B | 47B | 32 GB | MoE, high quality |

Code

| Model | Size | VRAM | Notes |
|---|---|---|---|
| Code Llama | 7-34B | 8-24 GB | Programming focused |
| DeepSeek Coder | 6-33B | 8-24 GB | Strong at code |

Multilingual

| Model | Size | VRAM | Notes |
|---|---|---|---|
| Qwen | 7-72B | 8-48 GB | Good multilingual support |

Installing Models (Ollama)

# List installed models
ollama list
 
# Pull a new model
ollama pull llama3
ollama pull mistral
ollama pull codellama:13b
 
# Remove a model
ollama rm model-name

Earning Rewards

How It Works

  1. Your worker connects to The Grid
  2. User sends a chat message via aipg.chat or API
  3. The Grid routes the request to your worker
  4. Your model generates a response
  5. You earn AIPG for the completed job

Reward Factors

  • Tokens generated - More output = more reward
  • Speed - Faster responses = more jobs/hour
  • Uptime - Consistent availability = consistent work
  • Competition - Fewer workers = more jobs for you

Bonded Workers

Optionally bond AIPG for priority:

  • Higher priority job routing
  • Signal commitment to the network
  • Bonding is NOT required to earn

Best Practices

Maximize Earnings

  • Popular models - Llama 3 and Mistral get the most requests
  • Stay online 24/7 - More uptime = more jobs
  • Fast responses - Lower latency = more completed jobs
  • Reliable connection - Dropped jobs hurt reputation

Stability

  • Monitor memory - Watch for OOM errors (see the snippet after this list)
  • Check logs - Catch issues early
  • Update regularly - Pull latest worker version
  • Restart weekly - Clears memory leaks
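
For NVIDIA GPUs, a simple way to keep an eye on memory (not worker-specific):

# Refresh GPU memory/utilization every 5 seconds (NVIDIA GPUs)
watch -n 5 nvidia-smi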

Hardware Tips

  • Consumer GPUs work great - RTX 3080/3090/4080/4090
  • Cloud works too - Runpod, Vast.ai, Lambda Labs
  • More VRAM = bigger models - 24 GB opens most options
  • NVMe SSD - Faster model loading

Troubleshooting

Worker won't start

# Check Python version
python --version  # Need 3.9+
 
# Verify Ollama is running
ollama list
 
# Check API key
echo $GRID_API_KEY

Ollama errors

# Reinstall Ollama
curl -fsSL https://ollama.com/install.sh | sh
 
# Restart Ollama service
systemctl restart ollama  # Linux
brew services restart ollama  # macOS
 
# Check model is downloaded
ollama list

Out of memory

  • Use a smaller model (3B instead of 7B)
  • Close other GPU applications
  • Check for memory leaks, restart worker
  • Try quantized versions (q4_0, q4_K_M) - see the example after this list
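
With Ollama, quantized builds are just alternate tags. Exact tag names vary per model, so verify on the model's page in the Ollama library:

# Pull a 4-bit quantized variant (verify the exact tag in the Ollama library)
ollama pull llama3:8b-instruct-q4_K_M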

No jobs coming in

  • Normal during low-traffic periods
  • Verify worker shows online in dashboard
  • Check you're running a popular model
  • Ensure your internet connection is stable

Connection issues

# Test API connection
curl https://api.aipowergrid.io/health
 
# Check firewall
# Worker needs outbound HTTPS (443)
 
# Verify API key is valid
grid-inference-worker --test-connection

Links

| Resource | URL |
|---|---|
| Repository | github.com/AIPowerGrid/grid-inference-worker |
| Releases | github.com/AIPowerGrid/grid-inference-worker/releases |
| Get API Key | api.aipowergrid.io/register |
| Ollama | ollama.com |
| Discord | discord.gg/W9D8j6HCtC |