TrueNAS Hosted Offline LLM:
Dolphin, Ollama and AnythingLLM
Introduction
Running AI models on your TrueNAS server provides centralized access to powerful language models for your entire network. This guide walks through deploying Ollama with Dolphin models and AnythingLLM on TrueNAS, making AI assistance available to all your devices while maintaining complete data privacy.
By the end of this guide, you'll have:
- Ollama running as a Docker container on TrueNAS
- Dolphin models stored efficiently on your ZFS pool
- AnythingLLM accessible via web browser from any device on your network
- Persistent storage that survives container restarts
- Optional GPU passthrough for accelerated inference
Why Run LLMs on TrueNAS?
- Centralized Access: One server, multiple client devices accessing AI
- Efficient Storage: ZFS handles large model files with tunable record sizes and compression
- Always Available: Server runs 24/7, models ready when you need them
- Resource Pooling: Leverage server-grade hardware for better performance
- Data Protection: ZFS snapshots protect your AI configurations and chat history
- Network Integration: Access from desktops, laptops, phones, tablets
Prerequisites
TrueNAS Requirements
- TrueNAS Version: SCALE 22.12 (Bluefin) or later; 24.04 (Dragonfish) or newer recommended
- Why SCALE: Container support through the Apps system (Kubernetes-based through 24.04, native Docker from 24.10 "Electric Eel"); the shell-based docker commands in this guide assume the docker CLI is available on your release
- TrueNAS CORE: Not recommended - no native container support, so it requires a more complex VM setup
Hardware Requirements
Minimum Server Specs:
- CPU: 4+ cores with AVX support
- RAM: 16GB total (8GB for models + 8GB for TrueNAS/other services)
- Storage: 100GB free space for models and data
- Network: Gigabit Ethernet
Recommended Server Specs:
- CPU: 8+ cores, modern Intel/AMD
- RAM: 32GB+ (allows multiple simultaneous users)
- Storage: 500GB+ SSD pool for AI workloads
- GPU: NVIDIA GPU with 8GB+ VRAM (optional, significant speedup)
- Network: 2.5GbE or 10GbE for faster model loading
Storage Planning
Dataset Structure:
tank/
├── docker/            # Docker container data
│   ├── ollama/        # Ollama models and config
│   └── anythingllm/   # AnythingLLM data
└── backups/
    └── ai-stack/      # Backup snapshots
Space Requirements:
- Ollama base: 500MB
- Dolphin-Phi (2.7B): ~1.6GB
- Dolphin-Mistral (7B): ~4.1GB
- Dolphin-Mixtral (8x7B): ~26GB
- AnythingLLM: ~2GB + documents
- Recommended: 100GB+ for flexibility
Part 1: Preparing TrueNAS
Create Datasets
Navigate to Storage → Pools and create dedicated datasets:
1. Create parent dataset:
- Pool: [your-pool]
- Name: docker
- Record Size: 1M (good for large files)
- Compression: LZ4 (balanced)
- Deduplication: OFF (not needed for models)
2. Create Ollama dataset:
- Parent: docker
- Name: ollama
- Mount Point: /mnt/[pool]/docker/ollama
3. Create AnythingLLM dataset:
- Parent: docker
- Name: anythingllm
- Mount Point: /mnt/[pool]/docker/anythingllm
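If you prefer the shell, the same layout can be created with zfs create - a minimal sketch, assuming your pool is named tank (adjust names, properties, and the optional quota to your setup):
# Parent dataset tuned for large model files
zfs create -o recordsize=1M -o compression=lz4 tank/docker
zfs create tank/docker/ollama
zfs create tank/docker/anythingllm
zfs create -p tank/backups/ai-stack
# Optional: cap the AI stack so model downloads can't fill the pool
zfs set quota=150G tank/docker
# Verify
zfs list -r tank/docker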
Set Permissions
# SSH into TrueNAS as root
ssh root@truenas.local
# Set ownership for Docker access
chown -R 568:568 /mnt/tank/docker/ollama
chown -R 568:568 /mnt/tank/docker/anythingllm
# Set permissions
chmod -R 755 /mnt/tank/docker/ollama
chmod -R 755 /mnt/tank/docker/anythingllm
Enable Docker/Apps
- Navigate to Apps
- If this is the first time, configure the Apps pool:
- Click "Choose Pool"
- Select your pool
- Click "Choose"
- Wait for Apps system to initialize
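Once the Apps system has initialized, a quick sanity check from the shell confirms the container runtime is up (this assumes your SCALE release exposes the docker CLI, as the rest of this guide does):
# Confirm the container runtime is reachable
docker info --format '{{.ServerVersion}}'
docker ps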
Part 2: Installing Ollama via Docker
Method 1: TrueNAS Apps GUI (Easier)
Unfortunately, Ollama isn't in the TrueCharts catalog yet, so we'll use a Custom App:
- Navigate to Apps → Discover Apps
- Click "Custom App" (top right)
- Configure Application:
Application Name: ollama

Image Configuration:
- Image Repository: ollama/ollama
- Image Tag: latest
- Pull Policy: IfNotPresent

Container Configuration:
- Container Port: 11434
- Node Port: 11434
- Protocol: TCP

Storage:
- Type: Host Path
- Host Path: /mnt/tank/docker/ollama
- Mount Path: /root/.ollama

Advanced:
- Restart Policy: Always
- Click "Install"
Method 2: Docker Compose (Advanced, More Control)
Create a docker-compose file:
# SSH into TrueNAS
ssh root@truenas.local
# Create compose directory
mkdir -p /mnt/tank/docker/compose/ollama
cd /mnt/tank/docker/compose/ollama
# Create docker-compose.yml
cat > docker-compose.yml <<'EOF'
version: '3.8'
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    ports:
      - "11434:11434"
    volumes:
      - /mnt/tank/docker/ollama:/root/.ollama
    environment:
      - OLLAMA_MODELS=/root/.ollama/models
      - OLLAMA_HOST=0.0.0.0:11434
    # Uncomment for GPU support (NVIDIA)
    # deploy:
    #   resources:
    #     reservations:
    #       devices:
    #         - driver: nvidia
    #           count: 1
    #           capabilities: [gpu]
EOF
# Start the container
docker compose up -d
# Check status
docker compose ps
docker logs ollama
Method 3: Using TrueNAS Shell
# SSH into TrueNAS
ssh root@truenas.local
# Run Ollama container directly
docker run -d \
--name ollama \
--restart unless-stopped \
-p 11434:11434 \
-v /mnt/tank/docker/ollama:/root/.ollama \
-e OLLAMA_HOST=0.0.0.0:11434 \
ollama/ollama:latest
# Verify it's running
docker ps | grep ollama
Verify Ollama Installation
# From TrueNAS shell
curl http://localhost:11434/api/tags
# From another computer on your network
curl http://truenas-ip:11434/api/tags
# Should return: {"models":[]}
Part 3: Installing Dolphin Models
Access the Ollama Container
# SSH into TrueNAS
ssh root@truenas.local
# Enter the Ollama container
docker exec -it ollama bash
# Now you're inside the container
Download Models
# Inside the Ollama container
# Option 1: Dolphin-Mistral 7B (Recommended)
ollama pull dolphin-mistral
# Option 2: Dolphin-Mixtral (Better quality, more RAM)
ollama pull dolphin-mixtral
# Option 3: Dolphin-Phi (Faster, smaller)
ollama pull dolphin-phi
# Option 4: Latest Dolphin 2.5
ollama pull dolphin2.5-mixtral
# List installed models
ollama list
# Exit container
exit
Alternative: Pull from TrueNAS Host
# Run ollama commands directly
docker exec ollama ollama pull dolphin-mistral
docker exec ollama ollama list
Verify Model Storage
# Check dataset usage
zfs list | grep docker/ollama
# View model files
ls -lh /mnt/tank/docker/ollama/models/
Test Model
# Quick test
docker exec -it ollama ollama run dolphin-mistral "Say hello"
# Should return a response
Part 4: Installing AnythingLLM
Method 1: Docker Compose (Recommended)
# Create compose directory
mkdir -p /mnt/tank/docker/compose/anythingllm
cd /mnt/tank/docker/compose/anythingllm
# Create docker-compose.yml
cat > docker-compose.yml <<'EOF'
version: '3.8'
services:
  anythingllm:
    image: mintplexlabs/anythingllm:latest
    container_name: anythingllm
    restart: unless-stopped
    ports:
      - "3001:3001"
    volumes:
      - /mnt/tank/docker/anythingllm:/app/server/storage
    environment:
      - STORAGE_DIR=/app/server/storage
      - OLLAMA_BASE_URL=http://truenas-ip:11434
    # Note: a depends_on entry for ollama only works when both services are
    # defined in the same compose file; Ollama runs in its own stack here.
EOF
# Replace truenas-ip with your actual IP
# Example: 192.168.1.100
# Start AnythingLLM
docker compose up -d
# Check logs
docker logs -f anythingllm
Method 2: Direct Docker Run
# Replace with your TrueNAS IP
docker run -d \
--name anythingllm \
--restart unless-stopped \
-p 3001:3001 \
-v /mnt/tank/docker/anythingllm:/app/server/storage \
-e OLLAMA_BASE_URL=http://192.168.1.100:11434 \
mintplexlabs/anythingllm:latest
Method 3: TrueNAS Custom App
- Apps → Discover Apps → Custom App
Application Name: anythingllm

Image Configuration:
- Image Repository: mintplexlabs/anythingllm
- Image Tag: latest

Container Configuration:
- Container Port: 3001
- Node Port: 3001
- Protocol: TCP

Storage:
- Type: Host Path
- Host Path: /mnt/tank/docker/anythingllm
- Mount Path: /app/server/storage

Environment Variables:
- Name: OLLAMA_BASE_URL
- Value: http://[your-truenas-ip]:11434
- Click "Install"
Access AnythingLLM
Open web browser from any device on your network:
http://truenas-ip:3001
# Example: http://192.168.1.100:3001
Part 5: Configuring AnythingLLM
Initial Setup
- Access AnythingLLM: http://truenas-ip:3001
- First-time setup wizard:
  - Set admin password
  - Skip cloud services (staying offline)
Connect to Ollama
- Click Settings (⚙️ icon)
- LLM Preference:
  - Provider: Ollama
  - Base URL: http://truenas-ip:11434
  - Model: Select your Dolphin model
- Click "Update"
Troubleshooting Connection
If AnythingLLM can't connect:
# Check if both containers are running
docker ps
# Test Ollama from AnythingLLM container
docker exec anythingllm curl http://truenas-ip:11434/api/tags
# Check network connectivity
docker network ls
docker network inspect bridge
# Verify firewall (if enabled)
# TrueNAS shell:
iptables -L -n | grep 11434
Fix: Use Docker Network
# Create custom network
docker network create ai-network
# Reconnect containers
docker network connect ai-network ollama
docker network connect ai-network anythingllm
# Point AnythingLLM at Ollama by container name. Note that exporting a
# variable with "docker exec" does not persist - the environment must be set
# when the container is created. Update OLLAMA_BASE_URL in the compose file
# (or docker run command) to http://ollama:11434, re-create the container,
# and reconnect it to ai-network - or define both services in one compose
# file, as sketched below.
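For reference, a minimal sketch of that single-file approach, combining the Ollama and AnythingLLM definitions used earlier (same paths and ports; both services share the network Compose creates for the stack, so the hostname ollama resolves automatically):
version: '3.8'
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    ports:
      - "11434:11434"
    volumes:
      - /mnt/tank/docker/ollama:/root/.ollama
  anythingllm:
    image: mintplexlabs/anythingllm:latest
    container_name: anythingllm
    restart: unless-stopped
    ports:
      - "3001:3001"
    volumes:
      - /mnt/tank/docker/anythingllm:/app/server/storage
    environment:
      - STORAGE_DIR=/app/server/storage
      - OLLAMA_BASE_URL=http://ollama:11434   # container name, not host IP
    depends_on:
      - ollama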
Part 6: GPU Passthrough (NVIDIA)
Prerequisites
- NVIDIA GPU installed in the TrueNAS server
- NVIDIA drivers available on the TrueNAS host (recent SCALE releases can provide them; verify with nvidia-smi)
- NVIDIA Container Toolkit installed
- Note: the SCALE base system is read-only and apt is disabled by default, so the manual install below may require enabling developer tools and may not survive OS updates; where available, prefer assigning the GPU through the Apps UI
Install NVIDIA Container Toolkit
# SSH to TrueNAS
ssh root@truenas.local
# Install NVIDIA Container Toolkit
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
tee /etc/apt/sources.list.d/nvidia-docker.list
apt-get update
apt-get install -y nvidia-container-toolkit
# Configure Docker
nvidia-ctk runtime configure --runtime=docker
systemctl restart docker
# Test GPU access
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi
Update Ollama with GPU Support
Docker Compose Method:
# Edit: /mnt/tank/docker/compose/ollama/docker-compose.yml
version: '3.8'
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    ports:
      - "11434:11434"
    volumes:
      - /mnt/tank/docker/ollama:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
Direct Docker Method:
# Stop existing container
docker stop ollama
docker rm ollama
# Start with GPU support
docker run -d \
--name ollama \
--gpus all \
--restart unless-stopped \
-p 11434:11434 \
-v /mnt/tank/docker/ollama:/root/.ollama \
ollama/ollama:latest
Verify GPU Usage
# Check GPU in container
docker exec ollama nvidia-smi
# Monitor GPU while running inference
watch -n 1 nvidia-smi
# Run a test
docker exec ollama ollama run dolphin-mistral "Explain quantum computing"
# Watch GPU utilization spike
Part 7: Network Configuration
Static IP Assignment
Ensure TrueNAS has a static IP:
- Network → Interfaces
- Select the interface (e.g., enp0s3) and click Edit:
  - Type: Static
  - IP Address: 192.168.1.100/24 (example)
  - Gateway: 192.168.1.1
- Save and Test Changes
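A couple of quick checks after saving (replace the interface name and address with your own):
# On the TrueNAS shell: confirm the address is applied
ip -br addr show enp0s3
# From a client machine: confirm the server is reachable
ping -c 3 192.168.1.100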
Firewall Configuration
If you have a firewall enabled:
# Allow Ollama port
iptables -A INPUT -p tcp --dport 11434 -j ACCEPT
# Allow AnythingLLM port
iptables -A INPUT -p tcp --dport 3001 -j ACCEPT
# Save rules (persistence varies by system)
iptables-save > /etc/iptables/rules.v4
DNS Configuration (Optional)
Create friendly hostnames:
Option 1: Local DNS Server (Pi-hole, pfSense):
ollama.local → 192.168.1.100
ai.local → 192.168.1.100
Option 2: Hosts File (on client machines):
# Linux/Mac: /etc/hosts
# Windows: C:\Windows\System32\drivers\etc\hosts
192.168.1.100 ollama.local ai.local truenas.local
Access via:
- http://ai.local:3001 (AnythingLLM)
- http://ollama.local:11434 (Ollama API)
Reverse Proxy (Advanced)
Use nginx or Traefik for HTTPS and custom domains:
# Install nginx in a container
docker run -d \
--name nginx-proxy \
-p 80:80 \
-p 443:443 \
-v /mnt/tank/docker/nginx:/etc/nginx/conf.d \
nginx:alpine
nginx config (/mnt/tank/docker/nginx/ai.conf):
server {
    listen 80;
    server_name ai.local;

    location / {
        proxy_pass http://truenas-ip:3001;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection 'upgrade';
        proxy_set_header Host $host;
        proxy_cache_bypass $http_upgrade;
    }
}
Access: http://ai.local (no port needed)
Part 8: Data Management and Backups
ZFS Snapshots
Create automated snapshots for protection:
Via GUI:
- Storage → Pools
- Select the docker/ollama dataset → Add Snapshot
- Create snapshot task:
  - Dataset: tank/docker/ollama
  - Recursive: Yes
  - Schedule: Daily at 2 AM
  - Retention: 7 daily, 4 weekly
Via Shell:
# Manual snapshot
zfs snapshot tank/docker/ollama@pre-update-$(date +%Y%m%d)
zfs snapshot tank/docker/anythingllm@pre-update-$(date +%Y%m%d)
# List snapshots
zfs list -t snapshot | grep docker
# Restore from snapshot
zfs rollback tank/docker/ollama@pre-update-20250501
Automated Snapshot Script
# Create snapshot script
cat > /mnt/tank/scripts/backup-ai-stack.sh <<'EOF'
#!/bin/bash
DATE=$(date +%Y%m%d-%H%M%S)
# Create snapshots
zfs snapshot tank/docker/ollama@backup-$DATE
zfs snapshot tank/docker/anythingllm@backup-$DATE
# Keep only last 14 snapshots
zfs list -t snapshot -o name -s creation | grep "tank/docker/ollama@backup-" | head -n -14 | xargs -n 1 zfs destroy
zfs list -t snapshot -o name -s creation | grep "tank/docker/anythingllm@backup-" | head -n -14 | xargs -n 1 zfs destroy
echo "Backup completed: $DATE"
EOF
chmod +x /mnt/tank/scripts/backup-ai-stack.sh
Schedule with cron:
# Edit crontab
crontab -e
# Add daily backup at 2 AM
0 2 * * * /mnt/tank/scripts/backup-ai-stack.sh >> /var/log/ai-backup.log 2>&1
Export/Import Models
Export models to backup location:
# Copy entire Ollama directory
rsync -av /mnt/tank/docker/ollama/ /mnt/tank/backups/ollama-$(date +%Y%m%d)/
# Or tar/compress
tar -czf /mnt/tank/backups/ollama-backup-$(date +%Y%m%d).tar.gz \
-C /mnt/tank/docker/ollama .
Restore models:
# From tar backup
tar -xzf /mnt/tank/backups/ollama-backup-20250501.tar.gz \
-C /mnt/tank/docker/ollama/
# Restart container
docker restart ollama
Replication to Another TrueNAS
Setup replication task:
- Tasks → Replication Tasks → Add
- Source:
  - Datasets: tank/docker/ollama, tank/docker/anythingllm
- Destination:
  - Remote system: backup-nas
  - Dataset: backup-tank/ai-replicas
- Schedule: Daily
- Retention: 30 snapshots
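For a one-off transfer from the shell, the same data can be pushed with zfs send/receive - a sketch, assuming key-based SSH access to a host named backup-nas whose pool is backup-tank (the parent dataset backup-tank/ai-replicas must already exist):
# Snapshot and send the Ollama dataset to the backup system
SNAP="replica-$(date +%Y%m%d)"
zfs snapshot -r tank/docker/ollama@$SNAP
zfs send -R tank/docker/ollama@$SNAP | \
ssh root@backup-nas zfs receive -F backup-tank/ai-replicas/ollama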
Part 9: Performance Optimization
Resource Allocation
Set container resource limits:
# docker-compose.yml
services:
  ollama:
    image: ollama/ollama:latest
    deploy:
      resources:
        limits:
          cpus: '8.0'       # Max 8 CPU cores
          memory: 16G       # Max 16GB RAM
        reservations:
          cpus: '4.0'       # Guaranteed 4 cores
          memory: 8G        # Guaranteed 8GB RAM
Direct Docker:
docker run -d \
--name ollama \
--cpus="8.0" \
--memory="16g" \
--memory-reservation="8g" \
-p 11434:11434 \
-v /mnt/tank/docker/ollama:/root/.ollama \
ollama/ollama:latest
ZFS Tuning for AI Workloads
# Optimize for large sequential reads (models)
zfs set recordsize=1M tank/docker/ollama
zfs set primarycache=metadata tank/docker/ollama
zfs set secondarycache=all tank/docker/ollama
# Set ARC target for better caching
# Edit /boot/loader.conf (FreeBSD) or /etc/modprobe.d/zfs.conf (Linux)
# vfs.zfs.arc_max="17179869184" # 16GB ARC max
# Apply immediately (Linux)
echo 17179869184 > /sys/module/zfs/parameters/zfs_arc_max
SSD Special vdev for Metadata
If you have SSDs available:
# Add special vdev for metadata (improves model loading)
zpool add tank special mirror \
/dev/disk/by-id/ssd1 \
/dev/disk/by-id/ssd2
# Configure datasets to use special vdev
zfs set special_small_blocks=128K tank/docker/ollama
Network Performance
Enable jumbo frames (if your network supports it):
# Set MTU to 9000
ip link set dev enp0s3 mtu 9000
# Make persistent in TrueNAS GUI:
# Network → Interfaces → Edit → MTU: 9000
CPU Governor
# Set CPU to performance mode
for cpu in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
echo performance > $cpu
done
# Make persistent
cat > /etc/sysfs.conf <<EOF
devices/system/cpu/cpu*/cpufreq/scaling_governor = performance
EOF
Part 10: Multi-User Access
User Management in AnythingLLM
- Settings → Users
- Add New User:
- Username
- Password
- Role: Admin / User
- User Permissions:
- Create workspaces
- Upload documents
- View shared workspaces
Concurrent Usage
Ollama supports multiple simultaneous requests. The relevant settings are environment variables that must be set when the container is created - exporting them with docker exec inside a running container has no effect on the server process. Set them in docker-compose.yml (below) or with -e flags on docker run:
- OLLAMA_NUM_PARALLEL=4 (allow 4 concurrent requests)
- OLLAMA_MAX_LOADED_MODELS=2 (keep 2 models loaded in memory)
Update docker-compose.yml:
services:
  ollama:
    environment:
      - OLLAMA_NUM_PARALLEL=4
      - OLLAMA_MAX_LOADED_MODELS=2
      - OLLAMA_MAX_QUEUE=128
Load Balancing (Advanced)
For high usage, run multiple Ollama instances:
# docker-compose-multi.yml
version: '3.8'
services:
  ollama1:
    image: ollama/ollama:latest
    ports:
      - "11434:11434"
    volumes:
      - /mnt/tank/docker/ollama:/root/.ollama
  ollama2:
    image: ollama/ollama:latest
    ports:
      - "11435:11434"
    volumes:
      - /mnt/tank/docker/ollama:/root/.ollama
  nginx-lb:
    image: nginx:alpine
    ports:
      - "11430:80"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf:ro
nginx.conf:
# Mounted over the container's main nginx.conf, so it needs the events and
# http wrappers around the upstream and server blocks
events {}

http {
    upstream ollama {
        least_conn;
        server ollama1:11434;
        server ollama2:11434;
    }

    server {
        listen 80;
        location / {
            proxy_pass http://ollama;
        }
    }
}
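After starting the stack, requests to the balancer port should be spread across both instances - a quick check (11430 is the host port mapped to nginx-lb above):
curl http://truenas-ip:11430/api/tags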
Part 11: Monitoring and Maintenance
Container Health Checks
Add to docker-compose.yml:
services:
  ollama:
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:11434/api/tags"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
Monitoring Scripts
Create monitoring script:
cat > /mnt/tank/scripts/monitor-ai-stack.sh <<'EOF'
#!/bin/bash
TELEGRAM_BOT_TOKEN="your-token"
TELEGRAM_CHAT_ID="your-chat-id"
send_alert() {
    curl -s -X POST "https://api.telegram.org/bot${TELEGRAM_BOT_TOKEN}/sendMessage" \
        -d chat_id="${TELEGRAM_CHAT_ID}" \
        -d text="$1"
}

# Check Ollama
if ! docker exec ollama curl -f http://localhost:11434/api/tags > /dev/null 2>&1; then
    send_alert "⚠️ Ollama is not responding!"
    docker restart ollama
fi

# Check AnythingLLM
if ! curl -f http://localhost:3001 > /dev/null 2>&1; then
    send_alert "⚠️ AnythingLLM is not responding!"
    docker restart anythingllm
fi

# Check disk space
USAGE=$(df -h /mnt/tank/docker | tail -1 | awk '{print $5}' | sed 's/%//')
if [ "$USAGE" -gt 85 ]; then
    send_alert "⚠️ Docker storage is ${USAGE}% full!"
fi

# Check RAM usage
RAM_USAGE=$(free | grep Mem | awk '{print ($3/$2) * 100.0}' | cut -d. -f1)
if [ "$RAM_USAGE" -gt 90 ]; then
    send_alert "⚠️ RAM usage is ${RAM_USAGE}%!"
fi
EOF
chmod +x /mnt/tank/scripts/monitor-ai-stack.sh
# Schedule every 5 minutes
crontab -e
*/5 * * * * /mnt/tank/scripts/monitor-ai-stack.sh
Log Management
# View logs
docker logs ollama
docker logs anythingllm
# Follow logs in real-time
docker logs -f ollama
# Limit log size in docker-compose.yml
services:
  ollama:
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "3"
Update Procedure
# Create update script
cat > /mnt/tank/scripts/update-ai-stack.sh <<'EOF'
#!/bin/bash
DATE=$(date +%Y%m%d-%H%M%S)
echo "Starting AI stack update: $DATE"
# Create snapshots
echo "Creating snapshots..."
zfs snapshot tank/docker/ollama@pre-update-$DATE
zfs snapshot tank/docker/anythingllm@pre-update-$DATE
# Stop containers
echo "Stopping containers..."
docker stop anythingllm ollama
# Pull new images
echo "Pulling latest images..."
docker pull ollama/ollama:latest
docker pull mintplexlabs/anythingllm:latest
# Restart containers
echo "Starting containers..."
docker start ollama
sleep 10
docker start anythingllm
# Verify health
echo "Checking health..."
sleep 30
if docker exec ollama curl -f http://localhost:11434/api/tags > /dev/null 2>&1; then
    echo "✓ Ollama is healthy"
else
    echo "✗ Ollama failed health check - rolling back"
    zfs rollback tank/docker/ollama@pre-update-$DATE
    docker restart ollama
fi

if curl -f http://localhost:3001 > /dev/null 2>&1; then
    echo "✓ AnythingLLM is healthy"
else
    echo "✗ AnythingLLM failed health check - rolling back"
    zfs rollback tank/docker/anythingllm@pre-update-$DATE
    docker restart anythingllm
fi
echo "Update complete: $(date)"
EOF
chmod +x /mnt/tank/scripts/update-ai-stack.sh
Part 12: Troubleshooting
Container Won't Start
# Check container status
docker ps -a
# View container logs
docker logs ollama
docker logs anythingllm
# Check permissions
ls -la /mnt/tank/docker/ollama
ls -la /mnt/tank/docker/anythingllm
# Fix permissions if needed
chown -R 568:568 /mnt/tank/docker/ollama
chown -R 568:568 /mnt/tank/docker/anythingllm
# Restart
docker restart ollama anythingllm
Out of Memory
# Check memory usage
free -h
docker stats
# Identify memory hog
docker stats --no-stream --format "table {{.Container}}\t{{.MemUsage}}"
# Increase swap (temporary workaround; note that swap files on ZFS datasets are generally discouraged)
dd if=/dev/zero of=/swapfile bs=1G count=16
chmod 600 /swapfile
mkswap /swapfile
swapon /swapfile
# Or use smaller models
docker exec ollama ollama pull dolphin-phi
Models Not Loading
# Check model files
ls -lh /mnt/tank/docker/ollama/models/
# Re-download model
docker exec ollama ollama rm dolphin-mistral
docker exec ollama ollama pull dolphin-mistral
# Check disk space
df -h /mnt/tank
# Check ZFS quota
zfs get quota,used tank/docker/ollama
Network Connectivity Issues
# Test from TrueNAS host
curl http://localhost:11434/api/tags
# Test from client
curl http://truenas-ip:11434/api/tags
# Check firewall
iptables -L -n | grep 11434
# Check container network
docker inspect ollama | grep -A 20 NetworkSettings
docker network ls
docker network inspect bridge
# Recreate containers with host network
docker stop ollama && docker rm ollama
docker run -d \
--name ollama \
--network host \
--restart unless-stopped \
-v /mnt/tank/docker/ollama:/root/.ollama \
ollama/ollama:latest
GPU Not Detected
# Check NVIDIA drivers on host
nvidia-smi
# Check inside container
docker exec ollama nvidia-smi
# Verify NVIDIA runtime
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi
# Check docker configuration
cat /etc/docker/daemon.json
# Should contain:
{
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
# Restart docker if needed
systemctl restart docker
Slow Performance
# Check CPU usage
top
htop
# Check I/O wait
iostat -x 1
# Check network latency
ping truenas-ip
# Optimize ZFS ARC
echo 17179869184 > /sys/module/zfs/parameters/zfs_arc_max
# Use smaller model
docker exec ollama ollama pull dolphin-phi
Part 13: Security Considerations
Network Isolation
Restrict access to specific subnet:
# Allow only local network
iptables -A INPUT -p tcp --dport 11434 -s 192.168.1.0/24 -j ACCEPT
iptables -A INPUT -p tcp --dport 11434 -j DROP
iptables -A INPUT -p tcp --dport 3001 -s 192.168.1.0/24 -j ACCEPT
iptables -A INPUT -p tcp --dport 3001 -j DROP
VPN-Only Access
# Install WireGuard (note: apt is disabled by default on the read-only TrueNAS
# SCALE base system - prefer running WireGuard on your router/firewall or as an app)
apt-get install wireguard
# Configure to only allow VPN clients
iptables -A INPUT -p tcp --dport 3001 -i wg0 -j ACCEPT
iptables -A INPUT -p tcp --dport 3001 -j DROP
HTTPS/SSL (with Nginx)
# Generate a self-signed certificate (create the ssl directory first)
mkdir -p /mnt/tank/docker/nginx/ssl
openssl req -x509 -nodes -days 365 -newkey rsa:2048 \
-keyout /mnt/tank/docker/nginx/ssl/selfsigned.key \
-out /mnt/tank/docker/nginx/ssl/selfsigned.crt
# Nginx SSL config
cat > /mnt/tank/docker/nginx/ai-ssl.conf <<'EOF'
server {
    listen 443 ssl;
    server_name ai.local;

    ssl_certificate /etc/nginx/ssl/selfsigned.crt;
    ssl_certificate_key /etc/nginx/ssl/selfsigned.key;

    location / {
        proxy_pass http://truenas-ip:3001;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection 'upgrade';
        proxy_set_header Host $host;
    }
}
EOF
# Access via: https://ai.local
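Note that the nginx-proxy container from Part 7 only mounts the conf.d directory, so for SSL the certificate directory must be mounted as well - a sketch of re-creating the proxy with both mounts:
docker stop nginx-proxy && docker rm nginx-proxy
docker run -d \
--name nginx-proxy \
--restart unless-stopped \
-p 80:80 \
-p 443:443 \
-v /mnt/tank/docker/nginx:/etc/nginx/conf.d \
-v /mnt/tank/docker/nginx/ssl:/etc/nginx/ssl \
nginx:alpine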
Container Isolation
# Run containers with read-only root filesystem
docker run -d \
--name ollama \
--read-only \
--tmpfs /tmp \
-v /mnt/tank/docker/ollama:/root/.ollama \
-p 11434:11434 \
ollama/ollama:latest
# Use non-root user (if image supports it)
docker run -d \
--name anythingllm \
--user 1000:1000 \
-v /mnt/tank/docker/anythingllm:/app/server/storage \
-p 3001:3001 \
mintplexlabs/anythingllm:latest
Audit Logging
# Enable Docker logging to syslog
# (this overwrites /etc/docker/daemon.json - merge with any existing settings,
# such as the NVIDIA runtime entry from Part 6)
cat > /etc/docker/daemon.json <<'EOF'
{
  "log-driver": "syslog",
  "log-opts": {
    "syslog-address": "tcp://localhost:514",
    "tag": "{{.Name}}"
  }
}
EOF
systemctl restart docker
Part 14: Advanced Use Cases
Multiple Model Instances
Run different models for different purposes:
# docker-compose-multi-model.yml
version: '3.8'
services:
  ollama-fast:
    image: ollama/ollama:latest
    container_name: ollama-fast
    ports:
      - "11434:11434"
    volumes:
      - /mnt/tank/docker/ollama-fast:/root/.ollama
    # Load dolphin-phi for quick responses
  ollama-quality:
    image: ollama/ollama:latest
    container_name: ollama-quality
    ports:
      - "11435:11434"
    volumes:
      - /mnt/tank/docker/ollama-quality:/root/.ollama
    # Load dolphin-mixtral for best responses
Integration with Home Assistant
# Home Assistant configuration.yaml
rest_command:
  ask_ai:
    url: "http://truenas-ip:11434/api/generate"
    method: POST
    payload: '{"model":"dolphin-mistral","prompt":"{{ prompt }}","stream":false}'
    content_type: "application/json"

# Example automation
automation:
  - alias: "AI Daily Briefing"
    trigger:
      - platform: time
        at: "07:00:00"
    action:
      - service: rest_command.ask_ai
        data:
          prompt: "Summarize today's weather and provide a motivational quote"
API Integration Examples
Python client:
import requests

def query_ollama(prompt, model="dolphin-mistral"):
    url = "http://truenas-ip:11434/api/generate"
    data = {
        "model": model,
        "prompt": prompt,
        "stream": False
    }
    response = requests.post(url, json=data)
    return response.json()["response"]

# Usage
answer = query_ollama("Explain Docker networking")
print(answer)
curl examples:
# Simple query
curl http://truenas-ip:11434/api/generate -d '{
"model": "dolphin-mistral",
"prompt": "Explain ZFS snapshots",
"stream": false
}'
# Streaming response
curl http://truenas-ip:11434/api/generate -d '{
"model": "dolphin-mistral",
"prompt": "Write a poem about TrueNAS",
"stream": true
}'
Part 15: Best Practices Summary
Storage Best Practices
✅ Use dedicated datasets for Docker volumes
✅ Enable compression (LZ4 or ZSTD)
✅ Set appropriate recordsize (1M for large files)
✅ Create regular snapshots (hourly/daily)
✅ Set up replication to backup NAS
✅ Monitor disk space usage
✅ Keep 20% free space for optimal performance
Performance Best Practices
✅ Allocate sufficient RAM (2GB per concurrent user)
✅ Use SSD storage for models if possible
✅ Enable GPU acceleration when available
✅ Tune ZFS ARC for model caching
✅ Monitor CPU usage and throttling
✅ Use appropriate model size for workload
✅ Limit concurrent users based on resources
Security Best Practices
✅ Restrict network access to trusted subnets
✅ Use VPN for remote access
✅ Enable HTTPS with SSL certificates
✅ Keep Docker and images updated
✅ Use container resource limits
✅ Implement user authentication
✅ Regular security audits and log reviews
✅ Backup encryption keys securely
Maintenance Best Practices
✅ Weekly: Review logs for errors
✅ Monthly: Update container images
✅ Monthly: Check model updates
✅ Monthly: Verify backup integrity
✅ Quarterly: Review and prune old snapshots
✅ Quarterly: Performance optimization review
✅ Annually: Hardware assessment and upgrades
Conclusion
You now have a production-ready LLM system running on TrueNAS that provides:
✅ Centralized AI access for all network devices
✅ Data protection with ZFS snapshots
✅ High performance with GPU acceleration
✅ Complete privacy - all data stays local
✅ Scalable - supports multiple concurrent users
✅ Reliable - automatic restarts and health checks
✅ Maintainable - easy updates and monitoring
Quick Reference Commands
# Container management
docker ps # List running containers
docker logs -f ollama # View Ollama logs
docker restart ollama anythingllm # Restart services
docker exec -it ollama bash # Enter Ollama container
# Model management
docker exec ollama ollama list # List models
docker exec ollama ollama pull [model] # Download model
docker exec ollama ollama rm [model] # Remove model
# Maintenance
zfs snapshot tank/docker/ollama@backup # Create snapshot
zfs list -t snapshot # List snapshots
docker system prune -a # Clean unused images
# Monitoring
docker stats # Resource usage
nvidia-smi # GPU usage
zpool status # Storage health
Next Steps
- Test from multiple devices - Verify network access
- Set up automated backups - Protect your configuration
- Tune performance - Optimize for your workload
- Create monitoring alerts - Stay informed of issues
- Document your setup - Custom configs and procedures
- Experiment with models - Find the best fit for your needs
Additional Resources
- TrueNAS Documentation: https://www.truenas.com/docs/
- Ollama GitHub: https://github.com/ollama/ollama
- AnythingLLM Docs: https://docs.anythingllm.com
- Docker Compose: https://docs.docker.com/compose/
- ZFS Best Practices: https://openzfs.org/
This guide was created for Pacific Northwest Computers. For professional TrueNAS setup, configuration assistance, or enterprise AI deployment services, contact us for a consultation.
Document Version: 1.0
Last Updated: November 2025
Tested On: TrueNAS SCALE 24.04 (Dragonfish)
Frequently Asked Questions
Q: Can I run this on TrueNAS CORE?
A: Not recommended. CORE lacks native Docker support. You'd need to set up a Linux VM, which adds complexity. Migrate to SCALE for better Docker integration.
Q: How much RAM do I really need?
A: Minimum 16GB total (8GB for small models). Recommend 32GB+ for multiple users or larger models. Each concurrent user needs ~2-4GB.
Q: Will this slow down my TrueNAS?
A: With proper resource allocation, no. Set CPU/RAM limits on containers. GPU acceleration keeps CPU usage low during inference.
Q: Can I expose this to the internet?
A: Not recommended unless behind VPN. Use WireGuard or Tailscale for secure remote access. Never expose ports 11434/3001 directly.
Q: What if my TrueNAS reboots?
A: Containers restart automatically (with --restart unless-stopped flag). Models persist on ZFS. No data loss.
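To check (or retrofit) the restart policy on an existing container:
docker inspect -f '{{.HostConfig.RestartPolicy.Name}}' ollama
docker update --restart unless-stopped ollama anythingllm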
Q: Can I use this commercially?
A: Yes! Dolphin models are open-source and commercially usable. Check specific model licenses on HuggingFace.
Q: How do I upgrade models?
A: Simply pull the new version: docker exec ollama ollama pull dolphin-mistral. Old version auto-replaced.
Q: What about power consumption?
A: Idle: ~50W. Under load: 150-300W (CPU-only), 200-400W (with GPU). Use CPU governor to manage.
Q: Can multiple users use simultaneously?
A: Yes! Ollama supports concurrent requests. Set OLLAMA_NUM_PARALLEL=4 for up to 4 simultaneous users.
Q: Does this work with WiFi clients?
A: Yes, as long as they're on the same network. Performance depends on WiFi speed. Wired connection recommended for large document uploads.
Ready to revolutionize your home lab with local AI? Get started today! 🚀

