Tuesday, November 11, 2025

Offline AI LLM System (TrueNAS Hosted)

 


TrueNAS Hosted Offline LLM:
Dolphin, Ollama and AnythingLLM

Introduction

Running AI models on your TrueNAS server provides centralized access to powerful language models for your entire network. This guide walks through deploying Ollama with Dolphin models and AnythingLLM on TrueNAS, making AI assistance available to all your devices while maintaining complete data privacy.

By the end of this guide, you'll have:

  • Ollama running as a Docker container on TrueNAS
  • Dolphin models stored efficiently on your ZFS pool
  • AnythingLLM accessible via web browser from any device on your network
  • Persistent storage that survives container restarts
  • Optional GPU passthrough for accelerated inference

Why Run LLMs on TrueNAS?

  • Centralized Access: One server, multiple client devices accessing AI
  • Efficient Storage: ZFS handles large model files efficiently with compression
  • Always Available: Server runs 24/7, models ready when you need them
  • Resource Pooling: Leverage server-grade hardware for better performance
  • Data Protection: ZFS snapshots protect your AI configurations and chat history
  • Network Integration: Access from desktops, laptops, phones, tablets

Prerequisites

TrueNAS Requirements

  • TrueNAS Version: SCALE 22.12 (Bluefin) or later; 24.04 (Dragonfish) or newer recommended
  • Why SCALE: Native container support via the Apps system (Kubernetes-based through 24.04, Docker-based from 24.10)
  • TrueNAS CORE: Not recommended - requires more complex VM setup

Hardware Requirements

Minimum Server Specs:

  • CPU: 4+ cores with AVX support
  • RAM: 16GB total (8GB for models + 8GB for TrueNAS/other services)
  • Storage: 100GB free space for models and data
  • Network: Gigabit Ethernet

Recommended Server Specs:

  • CPU: 8+ cores, modern Intel/AMD
  • RAM: 32GB+ (allows multiple simultaneous users)
  • Storage: 500GB+ SSD pool for AI workloads
  • GPU: NVIDIA GPU with 8GB+ VRAM (optional, significant speedup)
  • Network: 2.5GbE or 10GbE for faster model loading

Storage Planning

Dataset Structure:

tank/
├── docker/              # Docker container data
│   ├── ollama/         # Ollama models and config
│   └── anythingllm/    # AnythingLLM data
└── backups/
    └── ai-stack/       # Backup snapshots

Space Requirements:

  • Ollama base: 500MB
  • Dolphin-Phi (2.7B): ~1.6GB
  • Dolphin-Mistral (7B): ~4.1GB
  • Dolphin-Mixtral (8x7B): ~26GB
  • AnythingLLM: ~2GB + documents
  • Recommended: 100GB+ for flexibility
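
Before downloading models, it's worth confirming the pool actually has room. A quick check from the TrueNAS shell (a sketch assuming a pool named tank, as used throughout this guide):

# Check available space on the pool before committing to larger models
zfs list -o name,used,available tank
zpool list tank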

Part 1: Preparing TrueNAS

Create Datasets

Navigate to Storage → Pools and create dedicated datasets:

1. Create parent dataset:
   - Pool: [your-pool]
   - Name: docker
   - Record Size: 1M (good for large files)
   - Compression: LZ4 (balanced)
   - Deduplication: OFF (not needed for models)

2. Create Ollama dataset:
   - Parent: docker
   - Name: ollama
   - Mount Point: /mnt/[pool]/docker/ollama
   
3. Create AnythingLLM dataset:
   - Parent: docker
   - Name: anythingllm
   - Mount Point: /mnt/[pool]/docker/anythingllm

Set Permissions

# SSH into TrueNAS as root
ssh root@truenas.local

# Set ownership for Docker access
chown -R 568:568 /mnt/tank/docker/ollama
chown -R 568:568 /mnt/tank/docker/anythingllm

# Set permissions
chmod -R 755 /mnt/tank/docker/ollama
chmod -R 755 /mnt/tank/docker/anythingllm

Enable Docker/Apps

  1. Navigate to Apps
  2. If first time, configure Apps pool:
    • Click "Choose Pool"
    • Select your pool
    • Click "Choose"
  3. Wait for Apps system to initialize
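
Once the Apps system finishes initializing, you can confirm the container runtime is ready from the TrueNAS shell (a quick sanity check, assuming the Docker CLI is available on your SCALE release, as the rest of this guide does):

# Verify the Docker engine is up and list any existing app containers
docker info --format '{{.ServerVersion}}'
docker ps -a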

Part 2: Installing Ollama via Docker

Method 1: TrueNAS Apps GUI (Easier)

Ollama isn't in the TrueCharts catalog yet, so we'll use a Custom App:

  1. Navigate to Apps → Discover Apps
  2. Click "Custom App" (top right)
  3. Configure Application:

Application Name: ollama

Image Configuration:

  • Image Repository: ollama/ollama
  • Image Tag: latest
  • Pull Policy: IfNotPresent

Container Configuration:

  • Container Port: 11434
  • Node Port: 11434
  • Protocol: TCP

Storage:

  • Type: Host Path
  • Host Path: /mnt/tank/docker/ollama
  • Mount Path: /root/.ollama

Advanced:

  • Restart Policy: Always
  1. Click "Install"

Method 2: Docker Compose (Advanced, More Control)

Create a docker-compose file:

# SSH into TrueNAS
ssh root@truenas.local

# Create compose directory
mkdir -p /mnt/tank/docker/compose/ollama
cd /mnt/tank/docker/compose/ollama

# Create docker-compose.yml
cat > docker-compose.yml <<'EOF'
version: '3.8'

services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    ports:
      - "11434:11434"
    volumes:
      - /mnt/tank/docker/ollama:/root/.ollama
    environment:
      - OLLAMA_MODELS=/root/.ollama/models
      - OLLAMA_HOST=0.0.0.0:11434
    # Uncomment for GPU support (NVIDIA)
    # deploy:
    #   resources:
    #     reservations:
    #       devices:
    #         - driver: nvidia
    #           count: 1
    #           capabilities: [gpu]
EOF

# Start the container
docker compose up -d

# Check status
docker compose ps
docker logs ollama

Method 3: Using TrueNAS Shell

# SSH into TrueNAS
ssh root@truenas.local

# Run Ollama container directly
docker run -d \
  --name ollama \
  --restart unless-stopped \
  -p 11434:11434 \
  -v /mnt/tank/docker/ollama:/root/.ollama \
  -e OLLAMA_HOST=0.0.0.0:11434 \
  ollama/ollama:latest

# Verify it's running
docker ps | grep ollama

Verify Ollama Installation

# From TrueNAS shell
curl http://localhost:11434/api/tags

# From another computer on your network
curl http://truenas-ip:11434/api/tags

# Should return: {"models":[]}

Part 3: Installing Dolphin Models

Access the Ollama Container

# SSH into TrueNAS
ssh root@truenas.local

# Enter the Ollama container
docker exec -it ollama bash

# Now you're inside the container

Download Models

# Inside the Ollama container

# Option 1: Dolphin-Mistral 7B (Recommended)
ollama pull dolphin-mistral

# Option 2: Dolphin-Mixtral (Better quality, more RAM)
ollama pull dolphin-mixtral

# Option 3: Dolphin-Phi (Faster, smaller)
ollama pull dolphin-phi

# Option 4: Latest Dolphin 2.5
ollama pull dolphin2.5-mixtral

# List installed models
ollama list

# Exit container
exit

Alternative: Pull from TrueNAS Host

# Run ollama commands directly
docker exec ollama ollama pull dolphin-mistral
docker exec ollama ollama list

Verify Model Storage

# Check dataset usage
zfs list | grep docker/ollama

# View model files
ls -lh /mnt/tank/docker/ollama/models/

Test Model

# Quick test
docker exec -it ollama ollama run dolphin-mistral "Say hello"

# Should return a response
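
You can also exercise the model over the network from any client, which verifies the model and the API endpoint in one step (replace truenas-ip with your server's address):

# One-shot generation request from a client machine
curl http://truenas-ip:11434/api/generate -d '{
  "model": "dolphin-mistral",
  "prompt": "Say hello",
  "stream": false
}'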

Part 4: Installing AnythingLLM

Method 1: Docker Compose (Recommended)

# Create compose directory
mkdir -p /mnt/tank/docker/compose/anythingllm
cd /mnt/tank/docker/compose/anythingllm

# Create docker-compose.yml
cat > docker-compose.yml <<'EOF'
version: '3.8'

services:
  anythingllm:
    image: mintplexlabs/anythingllm:latest
    container_name: anythingllm
    restart: unless-stopped
    ports:
      - "3001:3001"
    volumes:
      - /mnt/tank/docker/anythingllm:/app/server/storage
    environment:
      - STORAGE_DIR=/app/server/storage
      - OLLAMA_BASE_URL=http://truenas-ip:11434
    # Note: Ollama runs as a separate container/compose project here,
    # so there is no depends_on entry.
EOF

# Replace truenas-ip with your actual IP
# Example: 192.168.1.100

# Start AnythingLLM
docker compose up -d

# Check logs
docker logs -f anythingllm

Method 2: Direct Docker Run

# Replace with your TrueNAS IP
docker run -d \
  --name anythingllm \
  --restart unless-stopped \
  -p 3001:3001 \
  -v /mnt/tank/docker/anythingllm:/app/server/storage \
  -e OLLAMA_BASE_URL=http://192.168.1.100:11434 \
  mintplexlabs/anythingllm:latest

Method 3: TrueNAS Custom App

  1. Apps → Discover Apps → Custom App

Application Name: anythingllm

Image Configuration:

  • Image Repository: mintplexlabs/anythingllm
  • Image Tag: latest

Container Configuration:

  • Container Port: 3001
  • Node Port: 3001
  • Protocol: TCP

Storage:

  • Type: Host Path
  • Host Path: /mnt/tank/docker/anythingllm
  • Mount Path: /app/server/storage

Environment Variables:

  • Name: OLLAMA_BASE_URL
  • Value: http://[your-truenas-ip]:11434
  1. Click "Install"

Access AnythingLLM

Open web browser from any device on your network:

http://truenas-ip:3001
# Example: http://192.168.1.100:3001

Part 5: Configuring AnythingLLM

Initial Setup

  1. Access AnythingLLM: http://truenas-ip:3001
  2. First-time setup wizard:
    • Set admin password
    • Skip cloud services (staying offline)

Connect to Ollama

  1. Click Settings (⚙️ icon)
  2. LLM Preference:
    • Provider: Ollama
    • Base URL: http://truenas-ip:11434
    • Model: Select your Dolphin model
    • Click "Update"

Troubleshooting Connection

If AnythingLLM can't connect:

# Check if both containers are running
docker ps

# Test Ollama from AnythingLLM container
docker exec anythingllm curl http://truenas-ip:11434/api/tags

# Check network connectivity
docker network ls
docker network inspect bridge

# Verify firewall (if enabled)
# TrueNAS shell:
iptables -L -n | grep 11434

Fix: Use Docker Network

# Create custom network
docker network create ai-network

# Reconnect containers
docker network connect ai-network ollama
docker network connect ai-network anythingllm

# Point AnythingLLM at Ollama by container name. Environment variables can't
# be changed on a running container, so update the compose file to use:
#   OLLAMA_BASE_URL=http://ollama:11434
# then recreate the container:
cd /mnt/tank/docker/compose/anythingllm
docker compose up -d --force-recreate
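
Alternatively, the cleanest fix is to run both services in a single compose project so they share one network and AnythingLLM can reach Ollama by container name. A sketch, following the paths and pool name used above (the ai-stack directory name is just an example):

# Create a combined compose project so both containers share one network
mkdir -p /mnt/tank/docker/compose/ai-stack
cd /mnt/tank/docker/compose/ai-stack

cat > docker-compose.yml <<'EOF'
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    ports:
      - "11434:11434"
    volumes:
      - /mnt/tank/docker/ollama:/root/.ollama

  anythingllm:
    image: mintplexlabs/anythingllm:latest
    container_name: anythingllm
    restart: unless-stopped
    ports:
      - "3001:3001"
    volumes:
      - /mnt/tank/docker/anythingllm:/app/server/storage
    environment:
      - STORAGE_DIR=/app/server/storage
      - OLLAMA_BASE_URL=http://ollama:11434   # container name resolves on the project network
    depends_on:
      - ollama
EOF

# Remove the separately-created containers first to free the names and ports,
# then bring the combined stack up
docker rm -f ollama anythingllm
docker compose up -d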

Part 6: GPU Passthrough (NVIDIA)

Prerequisites

  • NVIDIA GPU installed in TrueNAS server
  • NVIDIA drivers installed on TrueNAS host
  • NVIDIA Container Toolkit installed

Install NVIDIA Container Toolkit

Note: recent TrueNAS SCALE releases bundle the NVIDIA drivers and container runtime support, and installing packages with apt on the SCALE host is generally unsupported. Prefer the built-in GPU support where available; if you do install the toolkit manually:

# SSH to TrueNAS
ssh root@truenas.local

# Install NVIDIA Container Toolkit
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
  tee /etc/apt/sources.list.d/nvidia-docker.list

apt-get update
apt-get install -y nvidia-container-toolkit

# Configure Docker
nvidia-ctk runtime configure --runtime=docker
systemctl restart docker

# Test GPU access
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi

Update Ollama with GPU Support

Docker Compose Method:

# Edit: /mnt/tank/docker/compose/ollama/docker-compose.yml
version: '3.8'

services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    ports:
      - "11434:11434"
    volumes:
      - /mnt/tank/docker/ollama:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]

Direct Docker Method:

# Stop existing container
docker stop ollama
docker rm ollama

# Start with GPU support
docker run -d \
  --name ollama \
  --gpus all \
  --restart unless-stopped \
  -p 11434:11434 \
  -v /mnt/tank/docker/ollama:/root/.ollama \
  ollama/ollama:latest

Verify GPU Usage

# Check GPU in container
docker exec ollama nvidia-smi

# Monitor GPU while running inference
watch -n 1 nvidia-smi

# Run a test
docker exec ollama ollama run dolphin-mistral "Explain quantum computing"
# Watch GPU utilization spike

Part 7: Network Configuration

Static IP Assignment

Ensure TrueNAS has a static IP:

  1. Network → Interfaces
  2. Select interface (e.g., enp0s3)
  3. Edit:
    • Type: Static
    • IP Address: 192.168.1.100/24 (example)
    • Gateway: 192.168.1.1
  4. Save and Test Changes

Firewall Configuration

If you have a firewall enabled:

# Allow Ollama port
iptables -A INPUT -p tcp --dport 11434 -j ACCEPT

# Allow AnythingLLM port
iptables -A INPUT -p tcp --dport 3001 -j ACCEPT

# Save rules (persistence varies by system)
iptables-save > /etc/iptables/rules.v4

DNS Configuration (Optional)

Create friendly hostnames:

Option 1: Local DNS Server (Pi-hole, pfSense):

ollama.local    → 192.168.1.100
ai.local        → 192.168.1.100

Option 2: Hosts File (on client machines):

# Linux/Mac: /etc/hosts
# Windows: C:\Windows\System32\drivers\etc\hosts

192.168.1.100    ollama.local ai.local truenas.local

Access via:

  • http://ai.local:3001 (AnythingLLM)
  • http://ollama.local:11434 (Ollama API)

Reverse Proxy (Advanced)

Use nginx or Traefik for HTTPS and custom domains:

# Install nginx in a container
docker run -d \
  --name nginx-proxy \
  -p 80:80 \
  -p 443:443 \
  -v /mnt/tank/docker/nginx:/etc/nginx/conf.d \
  nginx:alpine

nginx config (/mnt/tank/docker/nginx/ai.conf):

server {
    listen 80;
    server_name ai.local;
    
    location / {
        proxy_pass http://truenas-ip:3001;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection 'upgrade';
        proxy_set_header Host $host;
        proxy_cache_bypass $http_upgrade;
    }
}

Access: http://ai.local (no port needed)
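
After adding or editing a config file, validate and reload nginx inside the container (assuming the container name nginx-proxy used above):

# Test the configuration, then reload without restarting the container
docker exec nginx-proxy nginx -t
docker exec nginx-proxy nginx -s reload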

Part 8: Data Management and Backups

ZFS Snapshots

Create automated snapshots for protection:

Via GUI:

  1. Storage → Pools
  2. Select docker/ollama dataset
  3. Add Snapshot
  4. Create snapshot task:
    • Dataset: tank/docker/ollama
    • Recursive: Yes
    • Schedule: Daily at 2 AM
    • Retention: 7 daily, 4 weekly

Via Shell:

# Manual snapshot
zfs snapshot tank/docker/ollama@pre-update-$(date +%Y%m%d)
zfs snapshot tank/docker/anythingllm@pre-update-$(date +%Y%m%d)

# List snapshots
zfs list -t snapshot | grep docker

# Restore from snapshot
zfs rollback tank/docker/ollama@pre-update-20250501

Automated Snapshot Script

# Create snapshot script
cat > /mnt/tank/scripts/backup-ai-stack.sh <<'EOF'
#!/bin/bash

DATE=$(date +%Y%m%d-%H%M%S)

# Create snapshots
zfs snapshot tank/docker/ollama@backup-$DATE
zfs snapshot tank/docker/anythingllm@backup-$DATE

# Keep only last 14 snapshots
zfs list -t snapshot -o name -s creation | grep "tank/docker/ollama@backup-" | head -n -14 | xargs -n 1 zfs destroy
zfs list -t snapshot -o name -s creation | grep "tank/docker/anythingllm@backup-" | head -n -14 | xargs -n 1 zfs destroy

echo "Backup completed: $DATE"
EOF

chmod +x /mnt/tank/scripts/backup-ai-stack.sh

Schedule with cron:

# Edit crontab
crontab -e

# Add daily backup at 2 AM
0 2 * * * /mnt/tank/scripts/backup-ai-stack.sh >> /var/log/ai-backup.log 2>&1

Export/Import Models

Export models to backup location:

# Copy entire Ollama directory
rsync -av /mnt/tank/docker/ollama/ /mnt/tank/backups/ollama-$(date +%Y%m%d)/

# Or tar/compress
tar -czf /mnt/tank/backups/ollama-backup-$(date +%Y%m%d).tar.gz \
  -C /mnt/tank/docker/ollama .

Restore models:

# From tar backup
tar -xzf /mnt/tank/backups/ollama-backup-20250501.tar.gz \
  -C /mnt/tank/docker/ollama/

# Restart container
docker restart ollama

Replication to Another TrueNAS

Setup replication task:

  1. Tasks → Replication Tasks → Add
  2. Source:
    • Datasets: tank/docker/ollama, tank/docker/anythingllm
  3. Destination:
    • Remote system: backup-nas
    • Dataset: backup-tank/ai-replicas
  4. Schedule: Daily
  5. Retention: 30 snapshots
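
For a one-off manual copy instead of a scheduled task, zfs send/receive does the same job. A sketch, assuming SSH access to a host named backup-nas with a pool called backup-tank:

# Snapshot, then stream it to the backup system
# (the parent dataset backup-tank/ai-replicas must already exist)
SNAP="manual-$(date +%Y%m%d)"
zfs snapshot tank/docker/ollama@"$SNAP"
zfs send tank/docker/ollama@"$SNAP" | ssh root@backup-nas zfs receive -F backup-tank/ai-replicas/ollama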

Part 9: Performance Optimization

Resource Allocation

Set container resource limits:

# docker-compose.yml
services:
  ollama:
    image: ollama/ollama:latest
    deploy:
      resources:
        limits:
          cpus: '8.0'        # Max 8 CPU cores
          memory: 16G        # Max 16GB RAM
        reservations:
          cpus: '4.0'        # Guaranteed 4 cores
          memory: 8G         # Guaranteed 8GB RAM

Direct Docker:

docker run -d \
  --name ollama \
  --cpus="8.0" \
  --memory="16g" \
  --memory-reservation="8g" \
  -p 11434:11434 \
  -v /mnt/tank/docker/ollama:/root/.ollama \
  ollama/ollama:latest

ZFS Tuning for AI Workloads

# Optimize for large sequential reads (models)
zfs set recordsize=1M tank/docker/ollama
zfs set primarycache=metadata tank/docker/ollama
zfs set secondarycache=all tank/docker/ollama

# Set ARC target for better caching
# Edit /boot/loader.conf (FreeBSD) or /etc/modprobe.d/zfs.conf (Linux)
# vfs.zfs.arc_max="17179869184"  # 16GB ARC max

# Apply immediately (Linux)
echo 17179869184 > /sys/module/zfs/parameters/zfs_arc_max

SSD Special vdev for Metadata

If you have SSDs available:

# Add special vdev for metadata (improves model loading)
zpool add tank special mirror \
  /dev/disk/by-id/ssd1 \
  /dev/disk/by-id/ssd2

# Configure datasets to use special vdev
zfs set special_small_blocks=128K tank/docker/ollama

Network Performance

Enable jumbo frames (if your network supports it):

# Set MTU to 9000 (temporary; TrueNAS SCALE uses iproute2)
ip link set dev enp0s3 mtu 9000

# Make persistent in TrueNAS GUI:
# Network → Interfaces → Edit → MTU: 9000

CPU Governor

# Set CPU to performance mode
for cpu in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
    echo performance > $cpu
done

# Make persistent
cat > /etc/sysfs.conf <<EOF
devices/system/cpu/cpu*/cpufreq/scaling_governor = performance
EOF

Part 10: Multi-User Access

User Management in AnythingLLM

  1. Settings → Users
  2. Add New User:
    • Username
    • Password
    • Role: Admin / User
  3. User Permissions:
    • Create workspaces
    • Upload documents
    • View shared workspaces

Concurrent Usage

Ollama supports multiple simultaneous requests:

# These must be set as environment variables on the container itself;
# exporting them in a `docker exec` shell has no effect on the running server.
#   OLLAMA_NUM_PARALLEL=4       -> allow 4 concurrent requests
#   OLLAMA_MAX_LOADED_MODELS=2  -> keep 2 models loaded in memory

Update docker-compose.yml:

services:
  ollama:
    environment:
      - OLLAMA_NUM_PARALLEL=4
      - OLLAMA_MAX_LOADED_MODELS=2
      - OLLAMA_MAX_QUEUE=128
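
After recreating the container with these variables, you can confirm parallel handling by firing a few requests at once and watching them complete together rather than one after another (a rough check; adjust the model name to one you've pulled):

# Launch four generation requests in parallel, then wait for all of them
for i in 1 2 3 4; do
  curl -s http://truenas-ip:11434/api/generate \
    -d '{"model":"dolphin-mistral","prompt":"Count to five","stream":false}' &
done
wait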

Load Balancing (Advanced)

For high usage, run multiple Ollama instances:

# docker-compose-multi.yml
version: '3.8'

services:
  ollama1:
    image: ollama/ollama:latest
    ports:
      - "11434:11434"
    volumes:
      - /mnt/tank/docker/ollama:/root/.ollama

  ollama2:
    image: ollama/ollama:latest
    ports:
      - "11435:11434"
    volumes:
      - /mnt/tank/docker/ollama:/root/.ollama

  nginx-lb:
    image: nginx:alpine
    ports:
      - "11430:80"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf:ro

nginx.conf:

# Mounted as /etc/nginx/nginx.conf, so it needs the events/http wrappers
events {}

http {
    upstream ollama {
        least_conn;
        server ollama1:11434;
        server ollama2:11434;
    }

    server {
        listen 80;
        location / {
            proxy_pass http://ollama;
        }
    }
}

Part 11: Monitoring and Maintenance

Container Health Checks

Add to docker-compose.yml:

services:
  ollama:
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:11434/api/tags"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
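
Once the healthcheck is in place, the current status (starting, healthy, or unhealthy) can be read back with docker inspect:

# Query the health status reported by Docker
docker inspect --format '{{.State.Health.Status}}' ollama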

Monitoring Scripts

Create monitoring script:

cat > /mnt/tank/scripts/monitor-ai-stack.sh <<'EOF'
#!/bin/bash

TELEGRAM_BOT_TOKEN="your-token"
TELEGRAM_CHAT_ID="your-chat-id"

send_alert() {
    curl -s -X POST "https://api.telegram.org/bot${TELEGRAM_BOT_TOKEN}/sendMessage" \
        -d chat_id="${TELEGRAM_CHAT_ID}" \
        -d text="$1"
}

# Check Ollama
if ! docker exec ollama curl -f http://localhost:11434/api/tags > /dev/null 2>&1; then
    send_alert "⚠️ Ollama is not responding!"
    docker restart ollama
fi

# Check AnythingLLM
if ! curl -f http://localhost:3001 > /dev/null 2>&1; then
    send_alert "⚠️ AnythingLLM is not responding!"
    docker restart anythingllm
fi

# Check disk space
USAGE=$(df -h /mnt/tank/docker | tail -1 | awk '{print $5}' | sed 's/%//')
if [ $USAGE -gt 85 ]; then
    send_alert "⚠️ Docker storage is ${USAGE}% full!"
fi

# Check RAM usage
RAM_USAGE=$(free | grep Mem | awk '{print ($3/$2) * 100.0}' | cut -d. -f1)
if [ $RAM_USAGE -gt 90 ]; then
    send_alert "⚠️ RAM usage is ${RAM_USAGE}%!"
fi
EOF

chmod +x /mnt/tank/scripts/monitor-ai-stack.sh

# Schedule every 5 minutes
crontab -e
*/5 * * * * /mnt/tank/scripts/monitor-ai-stack.sh

Log Management

# View logs
docker logs ollama
docker logs anythingllm

# Follow logs in real-time
docker logs -f ollama

# Limit log size in docker-compose.yml
services:
  ollama:
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "3"

Update Procedure

# Create update script
cat > /mnt/tank/scripts/update-ai-stack.sh <<'EOF'
#!/bin/bash

DATE=$(date +%Y%m%d-%H%M%S)
echo "Starting AI stack update: $DATE"

# Create snapshots
echo "Creating snapshots..."
zfs snapshot tank/docker/ollama@pre-update-$DATE
zfs snapshot tank/docker/anythingllm@pre-update-$DATE

# Stop containers
echo "Stopping containers..."
docker stop anythingllm ollama

# Pull new images
echo "Pulling latest images..."
docker pull ollama/ollama:latest
docker pull mintplexlabs/anythingllm:latest

# Restart containers
echo "Starting containers..."
docker start ollama
sleep 10
docker start anythingllm

# Verify health
echo "Checking health..."
sleep 30
if docker exec ollama curl -f http://localhost:11434/api/tags > /dev/null 2>&1; then
    echo "✓ Ollama is healthy"
else
    echo "✗ Ollama failed health check - rolling back"
    zfs rollback tank/docker/ollama@pre-update-$DATE
    docker restart ollama
fi

if curl -f http://localhost:3001 > /dev/null 2>&1; then
    echo "✓ AnythingLLM is healthy"
else
    echo "✗ AnythingLLM failed health check - rolling back"
    zfs rollback tank/docker/anythingllm@pre-update-$DATE
    docker restart anythingllm
fi

echo "Update complete: $(date)"
EOF

chmod +x /mnt/tank/scripts/update-ai-stack.sh

Part 12: Troubleshooting

Container Won't Start

# Check container status
docker ps -a

# View container logs
docker logs ollama
docker logs anythingllm

# Check permissions
ls -la /mnt/tank/docker/ollama
ls -la /mnt/tank/docker/anythingllm

# Fix permissions if needed
chown -R 568:568 /mnt/tank/docker/ollama
chown -R 568:568 /mnt/tank/docker/anythingllm

# Restart
docker restart ollama anythingllm

Out of Memory

# Check memory usage
free -h
docker stats

# Identify memory hog
docker stats --no-stream --format "table {{.Container}}\t{{.MemUsage}}"

# Increase swap (temporary)
dd if=/dev/zero of=/swapfile bs=1G count=16
chmod 600 /swapfile
mkswap /swapfile
swapon /swapfile

# Or use smaller models
docker exec ollama ollama pull dolphin-phi

Models Not Loading

# Check model files
ls -lh /mnt/tank/docker/ollama/models/

# Re-download model
docker exec ollama ollama rm dolphin-mistral
docker exec ollama ollama pull dolphin-mistral

# Check disk space
df -h /mnt/tank

# Check ZFS quota
zfs get quota,used tank/docker/ollama

Network Connectivity Issues

# Test from TrueNAS host
curl http://localhost:11434/api/tags

# Test from client
curl http://truenas-ip:11434/api/tags

# Check firewall
iptables -L -n | grep 11434

# Check container network
docker inspect ollama | grep -A 20 NetworkSettings
docker network ls
docker network inspect bridge

# Recreate containers with host network
docker run -d \
  --name ollama \
  --network host \
  -v /mnt/tank/docker/ollama:/root/.ollama \
  ollama/ollama:latest

GPU Not Detected

# Check NVIDIA drivers on host
nvidia-smi

# Check inside container
docker exec ollama nvidia-smi

# Verify NVIDIA runtime
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi

# Check docker configuration
cat /etc/docker/daemon.json

# Should contain:
{
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}

# Restart docker if needed
systemctl restart docker

Slow Performance

# Check CPU usage
top
htop

# Check I/O wait
iostat -x 1

# Check network latency
ping truenas-ip

# Optimize ZFS ARC
echo 17179869184 > /sys/module/zfs/parameters/zfs_arc_max

# Use smaller model
docker exec ollama ollama pull dolphin-phi

Part 13: Security Considerations

Network Isolation

Restrict access to specific subnet:

# Allow only local network
iptables -A INPUT -p tcp --dport 11434 -s 192.168.1.0/24 -j ACCEPT
iptables -A INPUT -p tcp --dport 11434 -j DROP

iptables -A INPUT -p tcp --dport 3001 -s 192.168.1.0/24 -j ACCEPT
iptables -A INPUT -p tcp --dport 3001 -j DROP

VPN-Only Access

# Install WireGuard
apt-get install wireguard

# Configure to only allow VPN clients
iptables -A INPUT -p tcp --dport 3001 -i wg0 -j ACCEPT
iptables -A INPUT -p tcp --dport 3001 -j DROP

HTTPS/SSL (with Nginx)

# Generate self-signed certificate
mkdir -p /mnt/tank/docker/nginx/ssl
openssl req -x509 -nodes -days 365 -newkey rsa:2048 \
  -keyout /mnt/tank/docker/nginx/ssl/selfsigned.key \
  -out /mnt/tank/docker/nginx/ssl/selfsigned.crt \
  -subj "/CN=ai.local"

# Nginx SSL config
cat > /mnt/tank/docker/nginx/ai-ssl.conf <<'EOF'
server {
    listen 443 ssl;
    server_name ai.local;

    ssl_certificate /etc/nginx/ssl/selfsigned.crt;
    ssl_certificate_key /etc/nginx/ssl/selfsigned.key;

    location / {
        proxy_pass http://truenas-ip:3001;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection 'upgrade';
        proxy_set_header Host $host;
    }
}
EOF

# Mount the ssl directory into the proxy container (e.g. -v /mnt/tank/docker/nginx/ssl:/etc/nginx/ssl)
# and publish port 443, then access via: https://ai.local

Container Isolation

# Run containers with read-only root filesystem
docker run -d \
  --name ollama \
  --read-only \
  --tmpfs /tmp \
  -v /mnt/tank/docker/ollama:/root/.ollama \
  -p 11434:11434 \
  ollama/ollama:latest

# Use non-root user (if image supports it)
docker run -d \
  --name anythingllm \
  --user 1000:1000 \
  -v /mnt/tank/docker/anythingllm:/app/server/storage \
  -p 3001:3001 \
  mintplexlabs/anythingllm:latest

Audit Logging

# Enable Docker logging to syslog
cat > /etc/docker/daemon.json <<'EOF'
{
  "log-driver": "syslog",
  "log-opts": {
    "syslog-address": "tcp://localhost:514",
    "tag": "{{.Name}}"
  }
}
EOF

systemctl restart docker

Part 14: Advanced Use Cases

Multiple Model Instances

Run different models for different purposes:

# docker-compose-multi-model.yml
version: '3.8'

services:
  ollama-fast:
    image: ollama/ollama:latest
    container_name: ollama-fast
    ports:
      - "11434:11434"
    volumes:
      - /mnt/tank/docker/ollama-fast:/root/.ollama
    # Load dolphin-phi for quick responses

  ollama-quality:
    image: ollama/ollama:latest
    container_name: ollama-quality
    ports:
      - "11435:11434"
    volumes:
      - /mnt/tank/docker/ollama-quality:/root/.ollama
    # Load dolphin-mixtral for best responses

Integration with Home Assistant

# Home Assistant configuration.yaml
rest_command:
  ask_ai:
    url: "http://truenas-ip:11434/api/generate"
    method: POST
    payload: '{"model":"dolphin-mistral","prompt":"{{ prompt }}","stream":false}'
    content_type: "application/json"

# Example automation
automation:
  - alias: "AI Daily Briefing"
    trigger:
      - platform: time
        at: "07:00:00"
    action:
      - service: rest_command.ask_ai
        data:
          prompt: "Summarize today's weather and provide a motivational quote"

API Integration Examples

Python client:

import requests

def query_ollama(prompt, model="dolphin-mistral"):
    url = "http://truenas-ip:11434/api/generate"
    data = {
        "model": model,
        "prompt": prompt,
        "stream": False
    }
    response = requests.post(url, json=data)
    return response.json()["response"]

# Usage
answer = query_ollama("Explain Docker networking")
print(answer)

curl examples:

# Simple query
curl http://truenas-ip:11434/api/generate -d '{
  "model": "dolphin-mistral",
  "prompt": "Explain ZFS snapshots",
  "stream": false
}'

# Streaming response
curl http://truenas-ip:11434/api/generate -d '{
  "model": "dolphin-mistral",
  "prompt": "Write a poem about TrueNAS",
  "stream": true
}'
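
If jq is installed on the client, the generated text can be pulled out of the JSON response directly (an optional convenience; jq is not part of this stack):

# Extract just the response text from a non-streaming request
curl -s http://truenas-ip:11434/api/generate -d '{
  "model": "dolphin-mistral",
  "prompt": "Explain ZFS snapshots in one sentence",
  "stream": false
}' | jq -r '.response'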

Part 15: Best Practices Summary

Storage Best Practices

  ✅ Use dedicated datasets for Docker volumes
  ✅ Enable compression (LZ4 or ZSTD)
  ✅ Set appropriate recordsize (1M for large files)
  ✅ Create regular snapshots (hourly/daily)
  ✅ Set up replication to a backup NAS
  ✅ Monitor disk space usage
  ✅ Keep 20% free space for optimal performance

Performance Best Practices

  ✅ Allocate sufficient RAM (2GB per concurrent user)
  ✅ Use SSD storage for models if possible
  ✅ Enable GPU acceleration when available
  ✅ Tune ZFS ARC for model caching
  ✅ Monitor CPU usage and throttling
  ✅ Use the appropriate model size for the workload
  ✅ Limit concurrent users based on resources

Security Best Practices

  ✅ Restrict network access to trusted subnets
  ✅ Use a VPN for remote access
  ✅ Enable HTTPS with SSL certificates
  ✅ Keep Docker and images updated
  ✅ Use container resource limits
  ✅ Implement user authentication
  ✅ Perform regular security audits and log reviews
  ✅ Back up encryption keys securely

Maintenance Best Practices

  ✅ Weekly: Review logs for errors
  ✅ Monthly: Update container images
  ✅ Monthly: Check for model updates
  ✅ Monthly: Verify backup integrity
  ✅ Quarterly: Review and prune old snapshots
  ✅ Quarterly: Performance optimization review
  ✅ Annually: Hardware assessment and upgrades

Conclusion

You now have a production-ready LLM system running on TrueNAS that provides:

  ✅ Centralized AI access for all network devices
  ✅ Data protection with ZFS snapshots
  ✅ High performance with GPU acceleration
  ✅ Complete privacy - all data stays local
  ✅ Scalable - supports multiple concurrent users
  ✅ Reliable - automatic restarts and health checks
  ✅ Maintainable - easy updates and monitoring

Quick Reference Commands

# Container management
docker ps                              # List running containers
docker logs -f ollama                  # View Ollama logs
docker restart ollama anythingllm      # Restart services
docker exec -it ollama bash            # Enter Ollama container

# Model management
docker exec ollama ollama list         # List models
docker exec ollama ollama pull [model] # Download model
docker exec ollama ollama rm [model]   # Remove model

# Maintenance
zfs snapshot tank/docker/ollama@backup # Create snapshot
zfs list -t snapshot                   # List snapshots
docker system prune -a                 # Clean unused images

# Monitoring
docker stats                           # Resource usage
nvidia-smi                            # GPU usage
zpool status                          # Storage health

Next Steps

  1. Test from multiple devices - Verify network access
  2. Set up automated backups - Protect your configuration
  3. Tune performance - Optimize for your workload
  4. Create monitoring alerts - Stay informed of issues
  5. Document your setup - Custom configs and procedures
  6. Experiment with models - Find the best fit for your needs

Additional Resources

  • TrueNAS Documentation: https://www.truenas.com/docs/
  • Ollama GitHub: https://github.com/ollama/ollama
  • AnythingLLM Docs: https://docs.anythingllm.com
  • Docker Compose: https://docs.docker.com/compose/
  • ZFS Best Practices: https://openzfs.org/

This guide was created for Pacific Northwest Computers. For professional TrueNAS setup, configuration assistance, or enterprise AI deployment services, contact us for a consultation.

Document Version: 1.0
Last Updated: November 2025
Tested On: TrueNAS SCALE 24.04 (Dragonfish)


Frequently Asked Questions

Q: Can I run this on TrueNAS CORE?
A: Not recommended. CORE lacks native Docker support. You'd need to set up a Linux VM, which adds complexity. Migrate to SCALE for better Docker integration.

Q: How much RAM do I really need?
A: Minimum 16GB total (8GB for small models). Recommend 32GB+ for multiple users or larger models. Each concurrent user needs ~2-4GB.

Q: Will this slow down my TrueNAS?
A: With proper resource allocation, no. Set CPU/RAM limits on containers. GPU acceleration keeps CPU usage low during inference.

Q: Can I expose this to the internet?
A: Not recommended unless behind VPN. Use WireGuard or Tailscale for secure remote access. Never expose ports 11434/3001 directly.

Q: What if my TrueNAS reboots?
A: Containers restart automatically (with --restart unless-stopped flag). Models persist on ZFS. No data loss.

Q: Can I use this commercially?
A: Yes! Dolphin models are open-source and commercially usable. Check specific model licenses on HuggingFace.

Q: How do I upgrade models?
A: Simply pull the new version: docker exec ollama ollama pull dolphin-mistral. Old version auto-replaced.

Q: What about power consumption?
A: Idle: ~50W. Under load: 150-300W (CPU-only), 200-400W (with GPU). Use CPU governor to manage.

Q: Can multiple users use simultaneously?
A: Yes! Ollama supports concurrent requests. Set OLLAMA_NUM_PARALLEL=4 for up to 4 simultaneous users.

Q: Does this work with WiFi clients?
A: Yes, as long as they're on the same network. Performance depends on WiFi speed. Wired connection recommended for large document uploads.


Ready to revolutionize your home lab with local AI? Get started today! 🚀



Created & Maintained by Pacific Northwest Computers



📞 Pacific Northwest Computers offers Remote & Onsite Support Across: 
SW Washington including Vancouver WA, Battle Ground WA, Camas WA, Washougal WA, Longview WA, Kelso WA, and Portland OR
