Tuesday, November 11, 2025

Offline AI LLM System


Create a Completely Offline LLM:
Using Dolphin, Ollama, and AnythingLLM

Running a Large Language Model (LLM) completely offline gives you privacy, control, and independence from cloud services. In this comprehensive guide, I'll walk you through setting up a fully functional offline AI assistant using three powerful tools:

  • Dolphin - Uncensored, instruction-tuned language models
  • Ollama - Simple, efficient local LLM runtime
  • AnythingLLM - User-friendly web interface for interacting with local models

By the end of this guide, you'll have a ChatGPT-like experience running entirely on your own hardware, with no internet connection required!


Why Choose This Setup?

Privacy First: All your conversations and data stay on your local machine. No data is sent to external servers.

Offline Operation: Once installed, the entire system works without an internet connection.

Uncensored Models: Dolphin models are specifically trained to be helpful without unnecessary restrictions, making them ideal for technical work, creative writing, and research.

Easy to Use: AnythingLLM provides a modern, intuitive interface similar to ChatGPT.

Cost-Effective: After initial setup, there are no recurring API costs or subscription fees.


System Requirements

Minimum Requirements

  • CPU: Modern multi-core processor (Intel i5/AMD Ryzen 5 or better)
  • RAM: 16GB minimum (8GB for model + 8GB for OS/applications)
  • Storage: 50GB free space (models can be 4-20GB each)
  • OS: Windows 10/11, macOS 10.15+, or Linux

Recommended Specifications

  • CPU: Intel i7/AMD Ryzen 7 or better
  • RAM: 32GB or more for larger models
  • GPU: NVIDIA GPU with 8GB+ VRAM (optional but significantly faster)
  • Storage: 100GB+ SSD for multiple models

Performance Notes

  • Small models (~2-4GB, e.g., 2.7B parameters): Run reasonably on 16GB RAM systems
  • 7B models: Work best with 16GB+ RAM
  • 13B models: Require 32GB+ RAM for good performance
  • GPU acceleration: NVIDIA GPUs provide 5-10x faster inference
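
A quick way to sanity-check your hardware against these numbers (Linux commands shown; the nvidia-smi line only applies if you have an NVIDIA GPU):

# CPU cores, RAM, and free disk space
nproc
free -h
df -h ~

# NVIDIA GPU model and VRAM (skip if you don't have one)
nvidia-smi --query-gpu=name,memory.total --format=csv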

Part 1: Installing Ollama

Ollama is the backbone of our setup - it manages and runs the AI models locally.

Installation on Linux

# Install Ollama with the official script
curl -fsSL https://ollama.com/install.sh | sh

# Verify installation
ollama --version

Installation on macOS

# Download and install from the official website
# Visit: https://ollama.com/download

# Or use Homebrew
brew install ollama

# Verify installation
ollama --version

Installation on Windows

  1. Download the Windows installer from https://ollama.com/download
  2. Run the installer and follow the prompts
  3. Open Command Prompt or PowerShell and verify:
ollama --version

Starting the Ollama Service

Ollama typically starts automatically, but you can manage it manually:

Linux (systemd):

# Start Ollama service
sudo systemctl start ollama

# Enable on boot
sudo systemctl enable ollama

# Check status
sudo systemctl status ollama

macOS/Windows: Ollama runs as a background service automatically. If needed, restart it from your system tray.


Part 2: Installing Dolphin Models

Dolphin models are available in various sizes. Choose based on your hardware capabilities.

Understanding Model Sizes

  • dolphin-phi (2.7B): Smallest, fastest, good for simple tasks
  • dolphin-mistral (7B): Balanced performance and quality
  • dolphin-mixtral (8x7B): High quality, requires more resources
  • dolphin2.5-mixtral (8x7B): Newer version with improvements
  • dolphin3 (8B): Latest next-generation version

Downloading Dolphin Models

# Option 1: Dolphin3 (Recommended for most users)
ollama pull dolphin3

# Option 2: Dolphin-Mixtral 8x7B (Better quality, needs more RAM)
ollama pull dolphin-mixtral

# Option 3: Dolphin 2.5 Mixtral (updated Mixtral variant)
ollama pull dolphin2.5-mixtral

# Option 4: Dolphin-Phi (Fastest, smaller model)
ollama pull dolphin-phi

The download may take 10-30 minutes depending on your internet speed and model size.

Testing Your Model

Once downloaded, test the model directly from the command line (substitute whichever Dolphin model you pulled):

# Start an interactive session
ollama run dolphin-mistral

# Try a test prompt
>>> Hello! Can you introduce yourself?

# Exit with /bye
>>> /bye

Listing Installed Models

# See all installed models
ollama list

# Example output:
# NAME                    ID              SIZE    MODIFIED
# dolphin-mistral:latest  abc123def456    4.1GB   2 minutes ago

Managing Models

# Remove a model you no longer need
ollama rm dolphin-phi

# Pull a specific version
ollama pull dolphin-mistral:7b-v2.6

# Update an existing model
ollama pull dolphin-mistral
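
Recent Ollama releases also include a show command for inspecting a model's parameters and prompt template before you customize it:

# Show model details (parameters, template, license)
ollama show dolphin-mistral

# Print the underlying Modelfile
ollama show dolphin-mistral --modelfile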

Part 3: Installing AnythingLLM

AnythingLLM provides a beautiful web interface for interacting with your local models.

Installation Methods

Method 1: Desktop Application (Recommended)

Windows/macOS/Linux:

  1. Visit https://anythingllm.com/download
  2. Download the installer for your operating system
  3. Run the installer and follow the setup wizard
  4. Launch AnythingLLM from your applications

Method 2: Docker (Advanced Users)

# Pull the Docker image
docker pull mintplexlabs/anythingllm

# Run AnythingLLM container
docker run -d \
  --name anythingllm \
  -p 3001:3001 \
  -v anythingllm_data:/app/server/storage \
  mintplexlabs/anythingllm

# Access at http://localhost:3001
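
One caveat with the Docker method: inside the container, "localhost" is the container itself, not your machine, so AnythingLLM won't find an Ollama server running on the host at http://localhost:11434. A sketch of the usual workaround (Docker 20.10+ on Linux; Docker Desktop on macOS/Windows provides host.docker.internal automatically):

# Map the host into the container's DNS, then use
# http://host.docker.internal:11434 as the Ollama Base URL
docker run -d \
  --name anythingllm \
  -p 3001:3001 \
  --add-host=host.docker.internal:host-gateway \
  -v anythingllm_data:/app/server/storage \
  mintplexlabs/anythingllm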

Method 3: From Source (Developers)

# Clone the repository
git clone https://github.com/Mintplex-Labs/anything-llm.git
cd anything-llm

# Install dependencies and build
# (steps change between releases; check the repository README if these fail)
yarn install
yarn build

# Start the application
yarn start

Part 4: Configuring AnythingLLM with Ollama

Now let's connect AnythingLLM to your local Ollama instance.

Initial Setup

  1. Launch AnythingLLM - Open the application

  2. First-Time Setup Wizard:

    • Click "Get Started"
    • Choose your language preference
    • Skip cloud service options (we're going offline!)

Connecting to Ollama

  1. Navigate to Settings:

    • Click the gear icon (⚙️) in the bottom left
    • Select "LLM Preference" from the sidebar
  2. Configure Ollama Connection:

    • LLM Provider: Select "Ollama"
    • Ollama Base URL: http://localhost:11434
    • Model: Select your installed Dolphin model from the dropdown
    • Click "Save" or "Update"
  3. Verify Connection:

    • AnythingLLM will test the connection
    • You should see a green checkmark if successful

Troubleshooting Connection Issues

If AnythingLLM can't connect to Ollama:

# Check if Ollama is running
curl http://localhost:11434/api/tags

# Should return JSON with your installed models

# If not running, start Ollama
# Linux:
sudo systemctl start ollama

# macOS/Windows:
# Restart from system tray or run:
ollama serve

Part 5: Creating Your First Workspace

Workspaces in AnythingLLM organize your conversations and documents.

Setting Up a Workspace

  1. Create New Workspace:

    • Click "+ New Workspace" button
    • Name it (e.g., "General Chat", "Technical Research", "Writing")
    • Choose your Dolphin model
    • Set temperature (0.7 is balanced, higher = more creative)
  2. Configure Workspace Settings:

    • Temperature: 0.7 (recommended default)
    • Context Window: Set based on your model's capacity
    • Top P: 0.9 (recommended)
    • System Prompt: Customize the AI's behavior (optional)
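
If you want to see what these sampling knobs actually do outside AnythingLLM, Ollama's API accepts them directly in an options object. A quick experiment with the same prompt at two temperatures:

# Low temperature: focused, repeatable output
curl -s http://localhost:11434/api/generate -d '{
  "model": "dolphin-mistral", "prompt": "Name three colors.",
  "stream": false, "options": {"temperature": 0.2, "top_p": 0.9}
}'

# High temperature: more varied, more creative output
curl -s http://localhost:11434/api/generate -d '{
  "model": "dolphin-mistral", "prompt": "Name three colors.",
  "stream": false, "options": {"temperature": 1.2, "top_p": 0.9}
}'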

Example System Prompts

General Assistant:

You are a helpful, knowledgeable AI assistant. Provide clear, accurate, and concise responses. When uncertain, acknowledge limitations.

Technical Expert:

You are an expert programmer and system administrator. Provide detailed technical explanations with code examples. Focus on best practices and security.

Creative Writer:

You are a creative writing assistant. Help with storytelling, character development, and prose improvement. Be imaginative and supportive.


Part 6: Advanced Features

Document Chat (RAG)

AnythingLLM can read and answer questions about your documents:

  1. Upload Documents:

    • Click the document icon in your workspace
    • Upload PDF, TXT, DOCX, or MD files
    • AnythingLLM will process and embed the content
  2. Ask Questions:

    • "What are the main points in this document?"
    • "Summarize the technical requirements"
    • "Find information about [specific topic]"

Supported Document Types

  • PDF files
  • Text files (.txt)
  • Markdown (.md)
  • Word documents (.docx)
  • Code files (.py, .js, .java, etc.)

Embedding Models

For document chat, you'll need an embedding model:

# Install nomic-embed-text (recommended)
ollama pull nomic-embed-text

# Or use another strong option (larger download)
ollama pull mxbai-embed-large

Configure in AnythingLLM:

  • Settings → Embedding Preference
  • Provider: Ollama
  • Model: nomic-embed-text
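
To confirm the embedding model works before wiring it into AnythingLLM, you can hit Ollama's embeddings endpoint directly:

# Should return a JSON object with an "embedding" array of floats
curl http://localhost:11434/api/embeddings -d '{
  "model": "nomic-embed-text",
  "prompt": "The quick brown fox"
}'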

Web Scraping (Offline Mode)

While web scraping requires internet, you can download pages offline:

  1. Save web pages as HTML or PDF
  2. Upload them to your workspace
  3. Query the content without needing live internet
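
A sketch of step 1, assuming wget is available; this grabs a page plus the images and CSS it needs to render offline (the URL is a placeholder):

# Save a page and its assets for later upload to a workspace
wget --page-requisites --convert-links --adjust-extension \
     --no-parent https://example.com/article.html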


Part 7: Optimizing Performance

Hardware Optimization

Enable GPU Acceleration (NVIDIA):

# Ollama automatically detects CUDA-capable GPUs
# Verify GPU is being used:
ollama run dolphin-mistral --verbose

# The --verbose stats show tokens/sec; GPU offload details appear
# in the Ollama server logs (journalctl -u ollama on Linux)
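
You can also watch the GPU directly from a second terminal while a prompt is generating, and newer Ollama builds report the CPU/GPU split per loaded model (assumes an NVIDIA card):

# Live GPU load and VRAM usage (NVIDIA)
watch -n 1 nvidia-smi

# What Ollama currently has loaded, and how much sits on the GPU
ollama ps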

Adjust Model Parameters:

The Ollama CLI doesn't accept sampling flags directly; set parameters inside an interactive session, or bake them into a Modelfile (see Part 12):

# Set a larger context window and the number of GPU-offloaded layers
# from inside an interactive session:
ollama run dolphin-mistral
>>> /set parameter num_ctx 4096
>>> /set parameter num_gpu 35

Memory Management

Linux: Monitor memory usage

# Watch memory in real-time
watch -n 1 free -h

# Check Ollama's memory usage
ps aux | grep ollama

Windows: Use Task Manager

  • Press Ctrl+Shift+Esc
  • Look for "ollama" process
  • Monitor memory and GPU usage

Model Selection Tips

Use Case          Recommended Model    RAM Needed   Performance
Quick questions   dolphin-phi          8GB          Fast
General use       dolphin-mistral      16GB         Balanced
Complex tasks     dolphin-mixtral      32GB         Best quality
Technical work    dolphin2.5-mixtral   32GB         High accuracy


Part 8: Best Practices for Offline Use

Prompt Engineering Tips

Be Specific:

❌ "Tell me about networking"
✅ "Explain how TCP/IP three-way handshake works with a diagram"

Provide Context:

❌ "Fix this code"
✅ "This Python function should validate email addresses but returns False for valid emails. Please fix it: [code]"

Use System Prompts: Set a clear role for consistent behavior across sessions.

Maintaining Your System

Regular Updates:

# Update Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Update models (check for new versions)
ollama pull dolphin-mistral

# Update AnythingLLM
# Download latest version from website or use built-in updater

Disk Space Management:

# Check model storage
ollama list

# Remove unused models
ollama rm old-model-name

# Clear conversation history (in AnythingLLM)
# Settings → Privacy → Clear Chat History

Backup Your Setup

Backup Ollama Models:

# Models are stored in:
# Linux: /usr/share/ollama/.ollama/models
# macOS: ~/.ollama/models
# Windows: C:\Users\<username>\.ollama\models

# Create a backup (relative to your home dir; adjust the path per OS above)
tar -czf ollama-models-backup.tar.gz -C ~ .ollama/models

Backup AnythingLLM Data:

# Desktop app data location:
# Linux: ~/.config/anythingllm-desktop
# macOS: ~/Library/Application Support/anythingllm-desktop  
# Windows: C:\Users\<username>\AppData\Roaming\anythingllm-desktop

# Create a backup (relative to your home dir; adjust the path per OS above)
tar -czf anythingllm-backup.tar.gz -C ~ .config/anythingllm-desktop
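
Restoring on a new machine is the reverse, assuming the archives created above and default user-level paths; stop Ollama and AnythingLLM first so nothing is writing to those directories:

# Unpack the backups into your home directory
tar -xzf ollama-models-backup.tar.gz -C ~
tar -xzf anythingllm-backup.tar.gz -C ~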


Part 9: Troubleshooting Common Issues

Ollama Won't Start

Issue: Ollama service fails to start

Solution:

# Check if port 11434 is in use
sudo netstat -tulpn | grep 11434

# Kill conflicting process
sudo kill -9 <PID>

# Restart Ollama
sudo systemctl restart ollama

Model Download Fails

Issue: Model download interrupted or fails

Solution:

# Remove partial download
ollama rm dolphin-mistral

# Try the download again (interrupted pulls resume where they left off)
ollama pull dolphin-mistral

# Check disk space
df -h

Slow Performance

Issue: Responses are very slow

Solutions:

  1. Use a smaller model: Switch from mixtral to mistral
  2. Close other applications: Free up RAM
  3. Check GPU usage: Ensure GPU acceleration is working
  4. Reduce context window: Lower the context size in settings

# Monitor system resources
htop  # Linux
# Task Manager on Windows

AnythingLLM Can't Connect

Issue: "Failed to connect to Ollama"

Checklist:

  • [ ] Ollama is running: curl http://localhost:11434/api/tags
  • [ ] Correct URL in settings: http://localhost:11434
  • [ ] Firewall not blocking: Temporarily disable and test
  • [ ] Model is downloaded: ollama list

Fix:

# Restart Ollama service
sudo systemctl restart ollama

# Test connection manually
curl http://localhost:11434/api/tags

# Check Ollama logs
journalctl -u ollama -f

Out of Memory Errors

Issue: System crashes or model fails to load

Solutions:

  1. Switch to a smaller model:
ollama pull dolphin-phi  # Only 2.7B parameters
  2. Increase swap space (Linux) - see the persistence note after this list:
# Create 16GB swap file
sudo fallocate -l 16G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
  3. Close other applications: Free up system resources
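
To make the swap file from step 2 survive reboots (optional; assumes the /swapfile created above):

# Register the swap file in /etc/fstab so it's enabled at boot
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab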


Part 10: Use Cases and Examples

Programming Assistant

Prompt:

I need a Python function that reads a CSV file and returns a dictionary where keys are values from the first column and values are lists of remaining column values. Include error handling.

Workspace Setup:

  • Model: dolphin-mistral or dolphin-mixtral
  • Temperature: 0.3 (more deterministic for code)
  • System Prompt: "You are an expert programmer. Provide clean, well-commented code with error handling."

Research and Analysis

Prompt:

I'm uploading a research paper about quantum computing. Please:
1. Summarize the main findings
2. Explain the methodology
3. List practical applications mentioned
4. Identify any limitations discussed

Workspace Setup:

  • Upload research papers as PDFs
  • Model: dolphin-mixtral (better comprehension)
  • Temperature: 0.5
  • Enable document chat with nomic-embed-text

Creative Writing

Prompt:

Help me develop a character for my sci-fi novel:
- Name: Dr. Sarah Chen
- Role: Xenobiologist on Mars colony
- Conflict: Discovers evidence of ancient Martian life
- Personality: [your input]

Create a character profile with background, motivations, and potential story arcs.

Workspace Setup:

  • Model: dolphin-mixtral (better creativity)
  • Temperature: 0.9 (higher creativity)
  • System Prompt: "You are a creative writing coach specializing in science fiction."

Technical Documentation

Prompt:

Convert this technical specification into user-friendly documentation for non-technical staff. Include:
- Overview in plain language
- Step-by-step instructions
- Common troubleshooting
- FAQ section

[Paste technical spec]

Workspace Setup:

  • Model: dolphin-mistral
  • Temperature: 0.6
  • Upload existing documentation for context


Part 11: Privacy and Security

Why Offline Matters

Data Privacy:

  • No conversation logs sent to external servers
  • No user tracking or analytics
  • Complete control over your data

Compliance:

  • Supports GDPR compliance (data stays local)
  • HIPAA considerations (healthcare)
  • SOC 2 / corporate security requirements

Intellectual Property:

  • Safe to discuss proprietary information
  • No risk of your inputs being used for model training
  • Suitable for sensitive business use

Security Best Practices

  1. Keep Software Updated:

    • Regular Ollama updates
    • AnythingLLM security patches
    • OS security updates
  2. Access Control:

    • Use firewalls to block external access to port 11434
    • Consider user authentication in AnythingLLM
    • Don't expose services to the internet
  3. Data Encryption:

    • Use full disk encryption
    • Encrypt backups
    • Secure physical access to servers
  4. Network Isolation:

# Bind Ollama to localhost only (Linux)
# Run "sudo systemctl edit ollama" and add under [Service]:
Environment="OLLAMA_HOST=127.0.0.1:11434"

# Restart service
sudo systemctl daemon-reload
sudo systemctl restart ollama


Part 12: Advanced Configuration

Custom Model Parameters

Create a custom Modelfile for fine-tuned behavior:

# Create Modelfile
cat > Dolphin-Custom <<EOF
FROM dolphin-mistral

# Set parameters
PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER top_k 40
PARAMETER num_ctx 4096

# Custom system prompt
SYSTEM You are a security-focused AI assistant specializing in cybersecurity and penetration testing. Provide detailed, accurate technical guidance.
EOF

# Create custom model
ollama create dolphin-security -f Dolphin-Custom

# Use your custom model
ollama run dolphin-security

Environment Variables

Configure Ollama behavior with environment variables:

# When running "ollama serve" manually, export variables first:

# Set model storage location
export OLLAMA_MODELS=/path/to/models

# Set number of parallel requests
export OLLAMA_NUM_PARALLEL=2

# Set max loaded models
export OLLAMA_MAX_LOADED_MODELS=2

# Enable debug logging
export OLLAMA_DEBUG=1

ollama serve

# Note: shell exports do not affect the systemd service. For that,
# add Environment= lines under [Service] via "sudo systemctl edit ollama",
# then reload and restart:
sudo systemctl daemon-reload
sudo systemctl restart ollama

API Integration

Use Ollama's API for custom integrations:

# Python example
import requests
import json

def query_ollama(prompt, model="dolphin-mistral"):
    url = "http://localhost:11434/api/generate"
    data = {
        "model": model,
        "prompt": prompt,
        "stream": False
    }
    
    # Generation can take a while on CPU-only systems; allow a long timeout
    response = requests.post(url, json=data, timeout=300)
    response.raise_for_status()
    return response.json()["response"]

# Use it
result = query_ollama("Explain docker containers")
print(result)

Curl example:

curl http://localhost:11434/api/generate -d '{
  "model": "dolphin-mistral",
  "prompt": "Why is the sky blue?",
  "stream": false
}'

Multi-Model Setup

Run different models for different purposes:

# Install multiple models
ollama pull dolphin-phi        # Fast responses
ollama pull dolphin-mistral    # Balanced
ollama pull dolphin-mixtral    # Best quality
ollama pull codellama          # Code generation
ollama pull llava              # Vision tasks

# Create workspaces in AnythingLLM for each
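
With several models installed, a quick way to compare them on the same prompt is a small shell loop against the API (assumes jq is installed to pull the response field out of the JSON):

# Ask each installed Dolphin model the same question
for m in dolphin-phi dolphin-mistral; do
  echo "=== $m ==="
  curl -s http://localhost:11434/api/generate \
    -d "{\"model\":\"$m\",\"prompt\":\"Summarize TCP in one sentence.\",\"stream\":false}" \
    | jq -r .response
done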


Part 13: Performance Benchmarks

Response Time Expectations

dolphin-phi (2.7B) on 16GB RAM system:

  • Simple query: 1-3 seconds
  • Complex analysis: 5-15 seconds
  • Code generation: 3-10 seconds

dolphin-mistral (7B) on 16GB RAM system:

  • Simple query: 3-8 seconds
  • Complex analysis: 15-30 seconds
  • Code generation: 10-20 seconds

dolphin-mixtral (8x7B) on 32GB RAM + GPU:

  • Simple query: 2-5 seconds (with GPU)
  • Complex analysis: 10-25 seconds
  • Code generation: 8-18 seconds

Optimization Results

With GPU acceleration (NVIDIA RTX 3060):

  • 5-10x faster inference
  • Better for interactive conversations
  • Required for larger models

CPU-only performance:

  • Suitable for batch processing
  • Acceptable for non-real-time use
  • Consider smaller models


Part 14: Comparison with Cloud Services

Feature Comparison

Feature             Offline Setup         ChatGPT Plus           Claude Pro
Monthly Cost        $0 (after hardware)   $20/month              $20/month
Privacy             100% private          Data sent to OpenAI    Data sent to Anthropic
Internet Required   No                    Yes                    Yes
Response Speed      Varies by hardware    Fast                   Fast
Model Updates       Manual                Automatic              Automatic
Censorship          None (Dolphin)        Yes                    Yes
API Access          Free (local)          Pay per token          Pay per token
Customization       Full control          Limited                Limited

Cost Analysis

Year 1:

  • Hardware (if needed): $500-2000
  • Software: $0
  • Total: $500-2000

Cloud Alternative (Year 1):

  • Subscription: $240 ($20/mo × 12)
  • API usage: $100-500 (varies)
  • Total: $340-740

Break-even: typically one to three years; for example, a $1,000 hardware outlay against $340-740/year of cloud spend pays for itself in roughly 1.5-3 years, and immediately if you already own capable hardware.


Conclusion

You now have a fully functional offline LLM system that respects your privacy while providing powerful AI assistance. This setup gives you:

✅ Complete data privacy and control 

✅ Zero recurring costs after initial setup 

✅ Unlimited usage without API restrictions 

✅ Customizable models and behavior 

✅ Works completely offline 

✅ Professional-grade AI capabilities


Next Steps

  1. Experiment with different models - Try various Dolphin versions
  2. Create specialized workspaces - Organize by task type
  3. Upload your documents - Enable RAG for your personal knowledge base
  4. Customize system prompts - Tune the AI for your specific needs
  5. Explore the API - Integrate with your own applications


Additional Resources

  • Ollama Documentation: https://github.com/ollama/ollama
  • AnythingLLM Docs: https://docs.anythingllm.com
  • Dolphin Models: https://huggingface.co/cognitivecomputations
  • Community Support: r/LocalLLaMA, r/Ollama


Need Help?

If you run into issues during setup:

  1. Check the troubleshooting section above
  2. Consult Ollama GitHub issues
  3. Join the AnythingLLM Discord community
  4. Post questions on r/LocalLLaMA

Frequently Asked Questions

Q: Can I use this setup for commercial purposes?
A: Yes! Dolphin models are open-source and can be used commercially. Check specific model licenses on HuggingFace.

Q: How much does this cost?
A: After initial hardware investment, there are zero recurring costs. No subscriptions or API fees.

Q: Is this as good as ChatGPT?
A: For many tasks, yes. Larger Dolphin models (mixtral) perform comparably to GPT-3.5. They won't match GPT-4, but the privacy and cost benefits are significant.

Q: Can I run this on a laptop?
A: Yes! A laptop with 16GB RAM can run smaller models (phi, mistral-7b) effectively. Performance varies by CPU.

Q: Do I need a GPU?
A: No, but it helps significantly. CPU-only inference works but is slower. NVIDIA GPUs provide the best acceleration.

Q: Can I access this from multiple devices?
A: Yes! Set up Ollama on a server and configure AnythingLLM on multiple devices to connect to it. Ensure proper network security.
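
A minimal sketch of that server setup (trusted LAN only; <server-ip> is a placeholder for your machine's address):

# On the server: make Ollama listen on all interfaces.
# For the systemd service, add under [Service] via "sudo systemctl edit ollama":
#   Environment="OLLAMA_HOST=0.0.0.0:11434"
# Or, when launching manually:
OLLAMA_HOST=0.0.0.0:11434 ollama serve

# On each client: set AnythingLLM's Ollama Base URL to
#   http://<server-ip>:11434
# and restrict port 11434 to your LAN with a firewall rule.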

Q: How often should I update?
A: Check for updates monthly. Ollama and AnythingLLM release improvements regularly. Model updates are less frequent.

Q: Can I use other models besides Dolphin?
A: Absolutely! Ollama supports Llama, Mistral, CodeLlama, Vicuna, and many others. Dolphin is recommended for its uncensored nature.

Q: Is this legal?
A: Yes. Running open-source models locally is completely legal. Dolphin uses properly licensed base models.

Q: Can I train or fine-tune models?
A: While this guide focuses on inference, you can fine-tune models using tools like Axolotl or LLaMA Factory, then run them in Ollama.




Happy local AI adventures! 🚀


Created & Maintained by Pacific Northwest Computers



📞 Pacific Northwest Computers offers Remote & Onsite Support Across: 
SW Washington including Vancouver WA, Battle Ground WA, Camas WA, Washougal WA, Longview WA, Kelso WA, and Portland OR
