# LLM Worker
Run large language models on The Grid. Earn AIPG for powering text generation and chat.
## What It Does
The LLM Worker connects your hardware to The Grid for text inference:
- Chat completions - Power aipg.chat and API requests
- Text generation - Prompts, completions, conversations
- Multiple backends - Ollama, LM Studio, vLLM, KoboldCpp, and more
When users send messages, your worker processes them and you earn AIPG.
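Under the hood, a job is essentially a chat request. Here is a minimal sketch of the kind of call your backend ends up serving, shown against a local Ollama instance (the Grid's internal wire format may differ; this uses Ollama's own HTTP API):

```bash
# Send a chat request to a local Ollama backend on its default port
curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2:3b",
  "messages": [{ "role": "user", "content": "Hello!" }],
  "stream": false
}'
```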
## Requirements

### Hardware
| Component | Minimum | Recommended |
|---|---|---|
| GPU VRAM | 4 GB | 12 GB+ |
| System RAM | 8 GB | 16 GB+ |
| Storage | 20 GB free | 50 GB+ SSD |
CPU-only: Possible for smaller models but significantly slower.
### VRAM by Model Size
| Model Size | VRAM Needed | Examples |
|---|---|---|
| 3B params | 4 GB | Llama 3.2 3B, Phi-3 Mini |
| 7-8B params | 8 GB | Llama 3 8B, Mistral 7B |
| 13B params | 16 GB | Llama 2 13B |
| 34B params | 24 GB | Code Llama 34B |
| 70B params | 48 GB+ | Llama 2 70B |
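A rough way to estimate whether a model fits, consistent with the table above (a rule of thumb, not an official formula):

```bash
# VRAM ≈ params (billions) × bytes per param × ~1.2 for KV cache and overhead
# fp16 ≈ 2 bytes/param, q8 ≈ 1, q4 ≈ 0.5
awk 'BEGIN { params=8; bytes=0.5; printf "~%.1f GB\n", params * bytes * 1.2 }'
# → ~4.8 GB for an 8B model at 4-bit quantization
```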
### Software
- Python 3.9+ (if running from source)
- A backend: Ollama (recommended), LM Studio, vLLM, SGLang, or KoboldCpp
- Grid API key
## Quick Start

### 1. Get an API Key
Register at api.aipowergrid.io/register or dashboard.aipowergrid.io.
### 2. Install a Backend (Ollama Recommended)
```bash
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Pull a model
ollama pull llama3.2:3b
```

That's it for the backend. One command to install, one to download a model.
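If you want to sanity-check the backend before moving on, Ollama exposes a small HTTP API on port 11434:

```bash
# Should return a JSON list that includes llama3.2:3b
curl http://localhost:11434/api/tags
```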
### 3. Download the Worker

#### Option A: Pre-built Binary (Easiest)
Download for your platform from GitHub Releases:
| Platform | Instructions |
|---|---|
| Windows | Download .zip, extract, run the .exe |
| macOS | Download .zip, extract, open the app |
| Linux | Download .zip, extract, chmod +x, run |
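On Linux, that looks roughly like this (the asset name below is illustrative; use the actual file name from the release page):

```bash
# Extract the release archive, make the binary executable, and run it
unzip grid-inference-worker-linux.zip
chmod +x grid-inference-worker
./grid-inference-worker
```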
#### Option B: From Source
```bash
git clone https://github.com/AIPowerGrid/grid-inference-worker
cd grid-inference-worker
pip install -e .
grid-inference-worker
```

#### Option C: Docker
```bash
git clone https://github.com/AIPowerGrid/grid-inference-worker
cd grid-inference-worker
cp .env.example .env
# Edit .env with your API key
docker compose up -d
```
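A minimal `.env` might look like the following sketch, reusing the variable names documented under Configuration Options below; the `.env.example` shipped in the repo is the authoritative template:

```bash
# Sketch of a .env file; variable names taken from the environment-variable
# configuration section, values are placeholders
GRID_API_KEY=your-api-key
MODEL_NAME=llama3.2:3b
BACKEND_TYPE=ollama
BACKEND_URL=http://localhost:11434
GRID_WORKER_NAME=my-worker
```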
### 4. Configure via Web Wizard

When the worker starts, it opens a web interface at:

```
http://localhost:7861
```

Here you can:
- Enter your Grid API key
- Select your backend (Ollama, etc.)
- Choose which model to serve
- Set a worker name
- Start hosting
### 5. Start Earning
Once configured, click "Start" and your worker connects to The Grid. Jobs come in automatically.
## Configuration Options

### Via Web Wizard (Recommended)
The setup wizard at http://localhost:7861 handles everything visually.
### Via Command Line
```bash
grid-inference-worker \
  --api-key YOUR_API_KEY \
  --model llama3.2:3b \
  --backend-url http://localhost:11434 \
  --worker-name "my-llm-worker"
```

### Via Environment Variables
```bash
# Required
export GRID_API_KEY=your-api-key

# Optional
export MODEL_NAME=llama3.2:3b
export BACKEND_TYPE=ollama
export BACKEND_URL=http://localhost:11434
export GRID_WORKER_NAME=my-worker
export GRID_MAX_LENGTH=4096
```

### Run as System Service
Auto-start on boot:
```bash
grid-inference-worker --install-service
```

Works on Windows, macOS, and Linux.
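On Linux, if you prefer to manage the service yourself rather than use `--install-service`, a minimal systemd sketch might look like this (the binary path and unit layout are assumptions; adjust to your install):

```bash
# Write a minimal systemd unit for the worker (sketch; values are placeholders)
sudo tee /etc/systemd/system/grid-inference-worker.service >/dev/null <<'EOF'
[Unit]
Description=AIPG Grid LLM Worker
After=network-online.target

[Service]
Environment=GRID_API_KEY=your-api-key
ExecStart=/usr/local/bin/grid-inference-worker
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF

# Enable and start it immediately
sudo systemctl enable --now grid-inference-worker
```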
## Supported Backends

### Ollama (Recommended)
Easiest setup. Handles model management automatically.
```bash
# Install
curl -fsSL https://ollama.com/install.sh | sh

# Pull models
ollama pull llama3.2:3b
ollama pull mistral
ollama pull codellama
```

### LM Studio
GUI-based. Run a local server and point the worker at it.
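LM Studio's local server listens on port 1234 by default with an OpenAI-compatible API. Pointing the worker at it might look like the sketch below, reusing the CLI flags from Configuration Options (the model name is whatever you have loaded in LM Studio):

```bash
# Load a model in LM Studio, start its local server, then:
grid-inference-worker \
  --api-key YOUR_API_KEY \
  --backend-url http://localhost:1234 \
  --model your-loaded-model-name
```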
### vLLM
High-performance inference. Good for production deployments.
```bash
pip install vllm
vllm serve meta-llama/Meta-Llama-3-8B --port 8000
```

### SGLang
Fast inference with RadixAttention.
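A typical SGLang launch looks like this sketch (the model and port are examples, not Grid-specific values):

```bash
# Install SGLang and serve a model on port 30000
pip install "sglang[all]"
python -m sglang.launch_server \
  --model-path meta-llama/Meta-Llama-3-8B-Instruct \
  --port 30000
```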
### KoboldCpp
CPU-optimized, good for older hardware.
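A typical KoboldCpp launch with a GGUF model might look like this (the file name is a placeholder):

```bash
# KoboldCpp serves an HTTP API on port 5001 by default
python koboldcpp.py --model your-model.gguf --port 5001
```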
## Supported Models
Any model your backend supports. Popular choices:
### General Purpose
| Model | Size | VRAM | Notes |
|---|---|---|---|
| Llama 3.2 3B | 3B | 4 GB | Fast, good for most tasks |
| Llama 3 8B | 8B | 8 GB | Great balance |
| Mistral 7B | 7B | 8 GB | Efficient, fast |
| Mixtral 8x7B | 47B | 32 GB | MoE, high quality |
### Code
| Model | Size | VRAM | Notes |
|---|---|---|---|
| Code Llama | 7-34B | 8-24 GB | Programming focused |
| DeepSeek Coder | 6-33B | 8-24 GB | Strong at code |
### Multilingual
| Model | Size | VRAM | Notes |
|---|---|---|---|
| Qwen | 7-72B | 8-48 GB | Good multilingual support |
### Installing Models (Ollama)
```bash
# List installed models
ollama list

# Pull a new model
ollama pull llama3
ollama pull mistral
ollama pull codellama:13b

# Remove a model
ollama rm model-name
```

## Earning Rewards
### How It Works
1. Your worker connects to The Grid
2. A user sends a chat message via aipg.chat or the API
3. The Grid routes the request to your worker
4. Your model generates a response
5. You earn AIPG for the completed job
### Reward Factors
- Tokens generated - More output = more reward
- Speed - Faster responses = more jobs/hour
- Uptime - Consistent availability = consistent work
- Competition - Fewer workers = more jobs for you
### Bonded Workers
Optionally bond AIPG for priority:
- Higher priority job routing
- Signal commitment to the network
- Bonding is NOT required to earn
## Best Practices

### Maximize Earnings
- Popular models - Llama 3 and Mistral get the most requests
- Stay online 24/7 - More uptime = more jobs
- Fast responses - Lower latency = more completed jobs
- Reliable connection - Dropped jobs hurt reputation
### Stability
- Monitor memory - Watch for OOM errors (see the snippet after this list)
- Check logs - Catch issues early
- Update regularly - Pull latest worker version
- Restart weekly - Clears memory leaks
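For the memory point above, one quick way to watch GPU memory on NVIDIA hardware:

```bash
# Refresh GPU memory usage every 5 seconds
watch -n 5 nvidia-smi --query-gpu=memory.used,memory.total --format=csv
```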
### Hardware Tips
- Consumer GPUs work great - RTX 3080/3090/4080/4090
- Cloud works too - Runpod, Vast.ai, Lambda Labs
- More VRAM = bigger models - 24 GB opens most options
- NVMe SSD - Faster model loading
## Troubleshooting

### Worker won't start
```bash
# Check Python version
python --version  # Need 3.9+

# Verify Ollama is running
ollama list

# Check API key
echo $GRID_API_KEY
```

### Ollama errors
```bash
# Reinstall Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Restart the Ollama service
systemctl restart ollama       # Linux
brew services restart ollama   # macOS

# Check the model is downloaded
ollama list
```

### Out of memory
- Use a smaller model (3B instead of 7B)
- Close other GPU applications
- Check for memory leaks, restart worker
- Try quantized versions (q4_0, q4_K_M); see the example below
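With Ollama, quantized builds are published as model tags; exact tag names vary per model, so check the model's page in the Ollama library:

```bash
# Pull a 4-bit quantized variant instead of the default build
ollama pull llama3:8b-instruct-q4_K_M
```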
### No jobs coming in
- Normal during low-traffic periods
- Verify worker shows online in dashboard
- Check you're running a popular model
- Ensure your internet connection is stable
### Connection issues
```bash
# Test the API connection
curl https://api.aipowergrid.io/health

# Check your firewall: the worker only needs outbound HTTPS (443)

# Verify the API key is valid
grid-inference-worker --test-connection
```