# LLM Worker
Run large language models on The Grid. Earn AIPG for powering text generation and chat.
## What It Does
The LLM Worker connects your hardware to The Grid for text inference:
- Chat completions - Power aipg.chat and API requests
- Text generation - Prompts, completions, conversations
- Multiple backends - Ollama, LM Studio, vLLM, KoboldCpp, and more
When users send messages, your worker processes them and you earn AIPG.
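Under the hood, a job is essentially a chat request. Here is a minimal sketch of the kind of call your backend ends up serving, shown against a local Ollama instance (the Grid's internal wire format may differ; this uses Ollama's own HTTP API):

```bash
# Send a chat request to a local Ollama backend on its default port
curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2:3b",
  "messages": [{ "role": "user", "content": "Hello!" }],
  "stream": false
}'
```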
## Requirements

### Hardware
| Component | Minimum | Recommended |
|---|---|---|
| GPU VRAM | 4 GB | 12 GB+ |
| System RAM | 8 GB | 16 GB+ |
| Storage | 20 GB free | 50 GB+ SSD |
CPU-only: Possible for smaller models but significantly slower.
### VRAM by Model Size
| Model Size | VRAM Needed | Examples |
|---|---|---|
| 3B params | 4 GB | Llama 3.2 3B, Phi-3 Mini |
| 7-8B params | 8 GB | Llama 3 8B, Mistral 7B |
| 13B params | 16 GB | Llama 2 13B |
| 34B params | 24 GB | Code Llama 34B |
| 70B params | 48 GB+ | Llama 2 70B |
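A rough way to estimate whether a model fits, consistent with the table above (a rule of thumb, not an official formula):

```bash
# VRAM ≈ params (billions) × bytes per param × ~1.2 for KV cache and overhead
# fp16 ≈ 2 bytes/param, q8 ≈ 1, q4 ≈ 0.5
awk 'BEGIN { params=8; bytes=0.5; printf "~%.1f GB\n", params * bytes * 1.2 }'
# → ~4.8 GB for an 8B model at 4-bit quantization
```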
### Software
- Python 3.9+ (if running from source)
- A backend: Ollama (recommended), LM Studio, vLLM, SGLang, or KoboldCpp
- Grid API key
## Quick Start

### 1. Get an API Key
Register at api.aipowergrid.io/register or dashboard.aipowergrid.io.
### 2. Install a Backend (Ollama Recommended)
```bash
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Pull a model
ollama pull llama3.2:3b
```

That's it for the backend. One command to install, one to download a model.
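If you want to sanity-check the backend before moving on, Ollama exposes a small HTTP API on port 11434:

```bash
# Should return a JSON list that includes llama3.2:3b
curl http://localhost:11434/api/tags
```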
### 3. Download the Worker

#### Option A: Pre-built Binary (Easiest)
Download for your platform from GitHub Releases:
| Platform | Instructions |
|---|---|
| Windows | Download .zip, extract, run the .exe |
| macOS | Download .zip, extract, open the app |
| Linux | Download .zip, extract, chmod +x, run |
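On Linux, that looks roughly like this (the asset name below is illustrative; use the actual file name from the release page):

```bash
# Extract the release archive, make the binary executable, and run it
unzip grid-inference-worker-linux.zip
chmod +x grid-inference-worker
./grid-inference-worker
```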
#### Option B: From Source
```bash
git clone https://github.com/AIPowerGrid/grid-inference-worker
cd grid-inference-worker
pip install -e .
grid-inference-worker
```

#### Option C: Docker
```bash
git clone https://github.com/AIPowerGrid/grid-inference-worker
cd grid-inference-worker
cp .env.example .env
# Edit .env with your API key
docker compose up -d
```
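A minimal `.env` might look like the following sketch, reusing the variable names documented under Configuration Options below; the `.env.example` shipped in the repo is the authoritative template:

```bash
# Sketch of a .env file; variable names taken from the environment-variable
# configuration section, values are placeholders
GRID_API_KEY=your-api-key
MODEL_NAME=llama3.2:3b
BACKEND_TYPE=ollama
BACKEND_URL=http://localhost:11434
GRID_WORKER_NAME=my-worker
```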
### 4. Configure via Web Wizard

When the worker starts, it opens a web interface at:

```
http://localhost:7861
```

Here you can:
- Enter your Grid API key
- Select your backend (Ollama, etc.)
- Choose which model to serve
- Set a worker name
- Start hosting
### 5. Start Earning
Once configured, click "Start" and your worker connects to The Grid. Jobs come in automatically.
## Configuration Options

### Via Web Wizard (Recommended)
The setup wizard at http://localhost:7861 handles everything visually.
### Via Command Line
```bash
grid-inference-worker \
  --api-key YOUR_API_KEY \
  --model llama3.2:3b \
  --backend-url http://localhost:11434 \
  --worker-name "my-llm-worker"
```

### Via Environment Variables
```bash
# Required
export GRID_API_KEY=your-api-key

# Optional
export MODEL_NAME=llama3.2:3b
export BACKEND_TYPE=ollama
export BACKEND_URL=http://localhost:11434
export GRID_WORKER_NAME=my-worker
export GRID_MAX_LENGTH=4096
```

### Run as System Service
Auto-start on boot:
```bash
grid-inference-worker --install-service
```

Works on Windows, macOS, and Linux.
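On Linux, if you prefer to manage the service yourself rather than use `--install-service`, a minimal systemd sketch might look like this (the binary path and unit layout are assumptions; adjust to your install):

```bash
# Write a minimal systemd unit for the worker (sketch; values are placeholders)
sudo tee /etc/systemd/system/grid-inference-worker.service >/dev/null <<'EOF'
[Unit]
Description=AIPG Grid LLM Worker
After=network-online.target

[Service]
Environment=GRID_API_KEY=your-api-key
ExecStart=/usr/local/bin/grid-inference-worker
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF

# Enable and start it immediately
sudo systemctl enable --now grid-inference-worker
```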
## Supported Backends

### Ollama (Recommended)
Easiest setup. Handles model management automatically.
```bash
# Install
curl -fsSL https://ollama.com/install.sh | sh

# Pull models
ollama pull llama3.2:3b
ollama pull mistral
ollama pull codellama
```

### LM Studio
GUI-based. Run a local server and point the worker at it.
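LM Studio's local server listens on port 1234 by default with an OpenAI-compatible API. Pointing the worker at it might look like the sketch below, reusing the CLI flags from Configuration Options (the model name is whatever you have loaded in LM Studio):

```bash
# Load a model in LM Studio, start its local server, then:
grid-inference-worker \
  --api-key YOUR_API_KEY \
  --backend-url http://localhost:1234 \
  --model your-loaded-model-name
```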
### vLLM
High-performance inference. Good for production deployments.
```bash
pip install vllm
vllm serve meta-llama/Meta-Llama-3-8B --port 8000
```

### SGLang
Fast inference with RadixAttention.
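A typical SGLang launch looks like this sketch (the model and port are examples, not Grid-specific values):

```bash
# Install SGLang and serve a model on port 30000
pip install "sglang[all]"
python -m sglang.launch_server \
  --model-path meta-llama/Meta-Llama-3-8B-Instruct \
  --port 30000
```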
### KoboldCpp
CPU-optimized, good for older hardware.
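A typical KoboldCpp launch with a GGUF model might look like this (the file name is a placeholder):

```bash
# KoboldCpp serves an HTTP API on port 5001 by default
python koboldcpp.py --model your-model.gguf --port 5001
```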
## Supported Models
Any model your backend supports. Popular choices:
### General Purpose
| Model | Size | VRAM | Notes |
|---|---|---|---|
| Llama 3.2 3B | 3B | 4 GB | Fast, good for most tasks |
| Llama 3 8B | 8B | 8 GB | Great balance |
| Mistral 7B | 7B | 8 GB | Efficient, fast |
| Mixtral 8x7B | 47B | 32 GB | MoE, high quality |
### Code
| Model | Size | VRAM | Notes |
|---|---|---|---|
| Code Llama | 7-34B | 8-24 GB | Programming focused |
| DeepSeek Coder | 6-33B | 8-24 GB | Strong at code |
### Multilingual
| Model | Size | VRAM | Notes |
|---|---|---|---|
| Qwen | 7-72B | 8-48 GB | Good multilingual support |
### Installing Models (Ollama)
```bash
# List installed models
ollama list

# Pull a new model
ollama pull llama3
ollama pull mistral
ollama pull codellama:13b

# Remove a model
ollama rm model-name
```

## Earning Rewards
### How It Works
1. Your worker connects to The Grid
2. A user sends a chat message via aipg.chat or the API
3. The Grid routes the request to your worker
4. Your model generates a response
5. You earn AIPG for the completed job
### Reward Factors
- Tokens generated - More output = more reward
- Speed - Faster responses = more jobs/hour
- Uptime - Consistent availability = consistent work
- Competition - Fewer workers = more jobs for you
### Bonded Workers
Optionally bond AIPG for priority:
- Higher priority job routing
- Signal commitment to the network
- Bonding is NOT required to earn
## Best Practices

### Maximize Earnings
- Popular models - Llama 3 and Mistral get the most requests
- Stay online 24/7 - More uptime = more jobs
- Fast responses - Lower latency = more completed jobs
- Reliable connection - Dropped jobs hurt reputation
### Stability
- Monitor memory - Watch for OOM errors (see the snippet after this list)
- Check logs - Catch issues early
- Update regularly - Pull latest worker version
- Restart weekly - Clears memory leaks
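For the memory point above, one quick way to watch GPU memory on NVIDIA hardware:

```bash
# Refresh GPU memory usage every 5 seconds
watch -n 5 nvidia-smi --query-gpu=memory.used,memory.total --format=csv
```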
### Hardware Tips
- Consumer GPUs work great - RTX 3080/3090/4080/4090
- Cloud works too - Runpod, Vast.ai, Lambda Labs
- More VRAM = bigger models - 24 GB opens most options
- NVMe SSD - Faster model loading
## Troubleshooting

### Worker won't start
```bash
# Check Python version
python --version  # Need 3.9+

# Verify Ollama is running
ollama list

# Check API key
echo $GRID_API_KEY
```

### Ollama errors
```bash
# Reinstall Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Restart the Ollama service
systemctl restart ollama       # Linux
brew services restart ollama   # macOS

# Check the model is downloaded
ollama list
```

### Out of memory
- Use a smaller model (3B instead of 7B)
- Close other GPU applications
- Check for memory leaks, restart worker
- Try quantized versions (q4_0, q4_K_M); see the example below
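With Ollama, quantized builds are published as model tags; exact tag names vary per model, so check the model's page in the Ollama library:

```bash
# Pull a 4-bit quantized variant instead of the default build
ollama pull llama3:8b-instruct-q4_K_M
```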
### No jobs coming in
- Normal during low-traffic periods
- Verify worker shows online in dashboard
- Check you're running a popular model
- Ensure your internet connection is stable
### Connection issues
```bash
# Test the API connection
curl https://api.aipowergrid.io/health

# Check your firewall: the worker only needs outbound HTTPS (443)

# Verify the API key is valid
grid-inference-worker --test-connection
```