Create a Completely Offline LLM:
Using Dolphin, Ollama, and AnythingLLM
Running a Large Language Model (LLM) completely offline gives you privacy, control, and independence from cloud services. In this comprehensive guide, I'll walk you through setting up a fully functional offline AI assistant using three powerful tools:
- Dolphin - Uncensored, instruction-tuned language models
- Ollama - Simple, efficient local LLM runtime
- AnythingLLM - User-friendly web interface for interacting with local models
By the end of this guide, you'll have a ChatGPT-like experience running entirely on your own hardware, with no internet connection required!
Why Choose This Setup?
Privacy First: All your conversations and data stay on your local machine. No data is sent to external servers.
Offline Operation: Once installed, the entire system works without an internet connection.
Uncensored Models: Dolphin models are specifically trained to be helpful without unnecessary restrictions, making them ideal for technical work, creative writing, and research.
Easy to Use: AnythingLLM provides a modern, intuitive interface similar to ChatGPT.
Cost-Effective: After initial setup, there are no recurring API costs or subscription fees.
System Requirements
Minimum Requirements
- CPU: Modern multi-core processor (Intel i5/AMD Ryzen 5 or better)
- RAM: 16GB minimum (8GB for model + 8GB for OS/applications)
- Storage: 50GB free space (models can be 4-20GB each)
- OS: Windows 10/11, macOS 10.15+, or Linux
Recommended Specifications
- CPU: Intel i7/AMD Ryzen 7 or better
- RAM: 32GB or more for larger models
- GPU: NVIDIA GPU with 8GB+ VRAM (optional but significantly faster)
- Storage: 100GB+ SSD for multiple models
Performance Notes
- ~4GB models (file size on disk): Run reasonably on 16GB RAM systems
- 7B models: Work best with 16GB+ RAM
- 13B models: Require 32GB+ RAM for good performance
- GPU acceleration: NVIDIA GPUs provide 5-10x faster inference
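As a rough rule of thumb, a 4-bit quantized model needs on the order of 0.5-0.7 GB of RAM per billion parameters, plus a couple of gigabytes of overhead for the context window and runtime. Here's a quick back-of-the-envelope estimate in Python (the 0.6 GB/billion figure is an approximation, not an exact formula):

# Rough RAM estimate for 4-bit quantized models (approximation only)
def estimate_ram_gb(params_billion, gb_per_billion=0.6, overhead_gb=2.0):
    return params_billion * gb_per_billion + overhead_gb

for name, size_b in [("dolphin-phi", 2.7), ("dolphin-mistral", 7), ("dolphin3", 8)]:
    print(f"{name}: ~{estimate_ram_gb(size_b):.1f} GB RAM")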
Part 1: Installing Ollama
Ollama is the backbone of our setup - it manages and runs the AI models locally.
Installation on Linux
# Install Ollama with the official script
curl -fsSL https://ollama.com/install.sh | sh
# Verify installation
ollama --version
Installation on macOS
# Download and install from the official website
# Visit: https://ollama.com/download
# Or use Homebrew
brew install ollama
# Verify installation
ollama --version
Installation on Windows
- Download the Windows installer from https://ollama.com/download
- Run the installer and follow the prompts
- Open Command Prompt or PowerShell and verify:
ollama --version
Starting the Ollama Service
Ollama typically starts automatically, but you can manage it manually:
Linux (systemd):
# Start Ollama service
sudo systemctl start ollama
# Enable on boot
sudo systemctl enable ollama
# Check status
sudo systemctl status ollama
macOS/Windows: Ollama runs as a background service automatically. If needed, restart it from your system tray.
Part 2: Installing Dolphin Models
Dolphin models are available in various sizes. Choose based on your hardware capabilities.
Understanding Model Sizes
- dolphin-phi (2.7B): Smallest, fastest, good for simple tasks
- dolphin-mistral (7B): Balanced performance and quality
- dolphin-mixtral (8x7B): High quality, requires more resources
- dolphin2.5-mixtral (8x7B): Updated Mixtral variant with incremental improvements
- dolphin3 (8B): The latest generation of the Dolphin series
Downloading Dolphin Models
# Option 1: Dolphin3 (Recommended for most users)
ollama pull dolphin3
# Option 2: Dolphin-Mistral 7B (Balanced; used in the examples below)
ollama pull dolphin-mistral
# Option 3: Dolphin-Mixtral 8x7B (Better quality, needs more RAM)
ollama pull dolphin-mixtral
# Option 4: Dolphin 2.5 Mixtral (Updated Mixtral variant)
ollama pull dolphin2.5-mixtral
# Option 5: Dolphin-Phi (Fastest, smallest model)
ollama pull dolphin-phi
The download may take 10-30 minutes depending on your internet speed and model size.
Testing Your Model
Once downloaded, test the model directly from the command line:
# Start an interactive session
ollama run dolphin-mistral
# Try a test prompt
>>> Hello! Can you introduce yourself?
# Exit with /bye
>>> /bye
Listing Installed Models
# See all installed models
ollama list
# Example output:
# NAME ID SIZE MODIFIED
# dolphin-mistral:latest abc123def456 4.1GB 2 minutes ago
Managing Models
# Remove a model you no longer need
ollama rm dolphin-phi
# Pull a specific version
ollama pull dolphin-mistral:7b-v2.6
# Update an existing model
ollama pull dolphin-mistral
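Ollama also exposes a local REST API (covered further in Part 12), so you can inspect installed models programmatically. A small Python sketch, assuming the default port and the requests library:

import requests

# Query the local Ollama server for its installed models
resp = requests.get("http://localhost:11434/api/tags")
resp.raise_for_status()
for model in resp.json().get("models", []):
    print(f'{model["name"]}: {model["size"] / 1e9:.1f} GB')  # size is reported in bytes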
Part 3: Installing AnythingLLM
AnythingLLM provides a beautiful web interface for interacting with your local models.
Installation Methods
Method 1: Desktop Application (Recommended)
Windows/macOS/Linux:
- Visit https://anythingllm.com/download
- Download the installer for your operating system
- Run the installer and follow the setup wizard
- Launch AnythingLLM from your applications
Method 2: Docker (Advanced Users)
# Pull the Docker image
docker pull mintplexlabs/anythingllm
# Run AnythingLLM container
docker run -d \
--name anythingllm \
-p 3001:3001 \
-v anythingllm_data:/app/server/storage \
-v anythingllm_models:/app/server/storage/models \
mintplexlabs/anythingllm
# Access at http://localhost:3001
Method 3: From Source (Developers)
# Clone the repository
git clone https://github.com/Mintplex-Labs/anything-llm.git
cd anything-llm
# Install dependencies and build
yarn install
yarn build
# Start the application
yarn start
Part 4: Configuring AnythingLLM with Ollama
Now let's connect AnythingLLM to your local Ollama instance.
Initial Setup
1. Launch AnythingLLM: Open the application.
2. First-Time Setup Wizard:
  - Click "Get Started"
  - Choose your language preference
  - Skip the cloud service options (we're going offline!)
Connecting to Ollama
1. Navigate to Settings:
  - Click the gear icon (⚙️) in the bottom left
  - Select "LLM Preference" from the sidebar
2. Configure the Ollama Connection:
  - LLM Provider: Select "Ollama"
  - Ollama Base URL: http://localhost:11434
  - Model: Select your installed Dolphin model from the dropdown
  - Click "Save" or "Update"
3. Verify the Connection:
  - AnythingLLM will test the connection
  - You should see a green checkmark if successful
Troubleshooting Connection Issues
If AnythingLLM can't connect to Ollama:
# Check if Ollama is running
curl http://localhost:11434/api/tags
# Should return JSON with your installed models
# If not running, start Ollama
# Linux:
sudo systemctl start ollama
# macOS/Windows:
# Restart from system tray or run:
ollama serve
Part 5: Creating Your First Workspace
Workspaces in AnythingLLM organize your conversations and documents.
Setting Up a Workspace
1. Create a New Workspace:
  - Click the "+ New Workspace" button
  - Name it (e.g., "General Chat", "Technical Research", "Writing")
  - Choose your Dolphin model
  - Set the temperature (0.7 is balanced; higher = more creative)
2. Configure Workspace Settings:
  - Temperature: 0.7 (recommended default)
  - Context Window: Set based on your model's capacity
  - Top P: 0.9 (recommended)
  - System Prompt: Customize the AI's behavior (optional)
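Temperature and Top P map directly onto Ollama's per-request options, so you can see their effect outside the UI too. A minimal Python sketch (assumes dolphin-mistral is pulled and the requests library is installed):

import requests

def generate(prompt, temperature):
    # "options" passes sampling parameters to Ollama per request
    resp = requests.post("http://localhost:11434/api/generate", json={
        "model": "dolphin-mistral",
        "prompt": prompt,
        "stream": False,
        "options": {"temperature": temperature, "top_p": 0.9},
    })
    resp.raise_for_status()
    return resp.json()["response"]

prompt = "Suggest a name for a coffee shop."
print("temp 0.2:", generate(prompt, 0.2))  # conservative, repeatable
print("temp 0.9:", generate(prompt, 0.9))  # more varied, creative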
Example System Prompts
General Assistant:
You are a helpful, knowledgeable AI assistant. Provide clear, accurate, and concise responses. When uncertain, acknowledge limitations.
Technical Expert:
You are an expert programmer and system administrator. Provide detailed technical explanations with code examples. Focus on best practices and security.
Creative Writer:
You are a creative writing assistant. Help with storytelling, character development, and prose improvement. Be imaginative and supportive.
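These system prompts also work outside AnythingLLM: Ollama's chat endpoint accepts a system message directly. A short sketch (use whichever Dolphin variant you pulled):

import requests

resp = requests.post("http://localhost:11434/api/chat", json={
    "model": "dolphin-mistral",
    "messages": [
        {"role": "system", "content": "You are a creative writing assistant. Be imaginative and supportive."},
        {"role": "user", "content": "Suggest an opening line for a mystery novel."},
    ],
    "stream": False,
})
resp.raise_for_status()
print(resp.json()["message"]["content"])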
Part 6: Advanced Features
Document Chat (RAG)
AnythingLLM can read and answer questions about your documents:
1. Upload Documents:
  - Click the document icon in your workspace
  - Upload PDF, TXT, DOCX, or MD files
  - AnythingLLM will process and embed the content
2. Ask Questions:
  - "What are the main points in this document?"
  - "Summarize the technical requirements"
  - "Find information about [specific topic]"
Supported Document Types
- PDF files
- Text files (.txt)
- Markdown (.md)
- Word documents (.docx)
- Code files (.py, .js, .java, etc.)
Embedding Models
For document chat, you'll need an embedding model:
# Install nomic-embed-text (recommended)
ollama pull nomic-embed-text
# Or use another popular embedding model
ollama pull mxbai-embed-large
Configure in AnythingLLM:
- Settings → Embedding Preference
- Provider: Ollama
- Model: nomic-embed-text
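Under the hood, document chat embeds each text chunk as a vector and retrieves the chunks most similar to your question. Here's a minimal Python sketch of that idea against Ollama's embeddings endpoint (assumes nomic-embed-text is pulled; AnythingLLM's real pipeline adds chunking, a vector store, and citation handling):

import math
import requests

def embed(text):
    # Returns an embedding vector for the given text
    resp = requests.post("http://localhost:11434/api/embeddings", json={
        "model": "nomic-embed-text", "prompt": text,
    })
    resp.raise_for_status()
    return resp.json()["embedding"]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

chunks = [
    "Ollama runs large language models on local hardware.",
    "Photosynthesis converts sunlight into chemical energy.",
]
vectors = [embed(c) for c in chunks]
query = embed("How do I run an LLM on my own machine?")
best = max(range(len(chunks)), key=lambda i: cosine(query, vectors[i]))
print("Most relevant chunk:", chunks[best])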
Web Scraping (Offline Mode)
Live web scraping requires an internet connection, but you can save pages ahead of time and work with them offline:
- Save web pages as HTML or PDF
- Upload them to your workspace
- Query the content without needing live internet
Part 7: Optimizing Performance
Hardware Optimization
Enable GPU Acceleration (NVIDIA):
# Ollama automatically detects CUDA-capable GPUs
# Verify that a loaded model is running on the GPU:
ollama ps
# The PROCESSOR column shows GPU vs. CPU; nvidia-smi shows VRAM usage
Adjust Model Parameters:
# Note: ollama run has no --ctx-size or --n-gpu-layers flags.
# Set the context window from inside an interactive session:
ollama run dolphin-mistral
>>> /set parameter num_ctx 4096
# To persist parameters (num_ctx, or num_gpu for the GPU layer count), bake them into a custom Modelfile (see Part 12)
Memory Management
Linux: Monitor memory usage
# Watch memory in real-time
watch -n 1 free -h
# Check Ollama's memory usage
ps aux | grep ollama
Windows: Use Task Manager
- Press Ctrl+Shift+Esc
- Look for "ollama" process
- Monitor memory and GPU usage
Model Selection Tips
| Use Case | Recommended Model | RAM Needed | Performance |
|---|---|---|---|
| Quick questions | dolphin-phi | 8GB | Fast |
| General use | dolphin-mistral | 16GB | Balanced |
| Complex tasks | dolphin-mixtral | 32GB | Best quality |
| Technical work | dolphin2.5-mixtral | 32GB | High accuracy |
Part 8: Best Practices for Offline Use
Prompt Engineering Tips
Be Specific:
❌ "Tell me about networking"
✅ "Explain how TCP/IP three-way handshake works with a diagram"
Provide Context:
❌ "Fix this code"
✅ "This Python function should validate email addresses but returns False for valid emails. Please fix it: [code]"
Use System Prompts: Set a clear role for consistent behavior across sessions.
Maintaining Your System
Regular Updates:
# Update Ollama
curl -fsSL https://ollama.com/install.sh | sh
# Update models (check for new versions)
ollama pull dolphin-mistral
# Update AnythingLLM
# Download latest version from website or use built-in updater
Disk Space Management:
# Check model storage
ollama list
# Remove unused models
ollama rm old-model-name
# Clear conversation history (in AnythingLLM)
# Settings → Privacy → Clear Chat History
Backup Your Setup
Backup Ollama Models:
# Models are stored in:
# Linux: /usr/share/ollama/.ollama/models
# macOS: ~/.ollama/models
# Windows: C:\Users\<username>\.ollama\models
# Create a backup (adjust the path to the location for your OS, above)
tar -czf ollama-models-backup.tar.gz ~/.ollama/models
Backup AnythingLLM Data:
# Desktop app data location:
# Linux: ~/.config/anythingllm-desktop
# macOS: ~/Library/Application Support/anythingllm-desktop
# Windows: C:\Users\<username>\AppData\Roaming\anythingllm-desktop
# Create backup
tar -czf anythingllm-backup.tar.gz ~/.config/anythingllm-desktop
Part 9: Troubleshooting Common Issues
Ollama Won't Start
Issue: Ollama service fails to start
Solution:
# Check if port 11434 is in use
sudo netstat -tulpn | grep 11434
# Kill conflicting process
sudo kill -9 <PID>
# Restart Ollama
sudo systemctl restart ollama
Model Download Fails
Issue: Model download interrupted or fails
Solution:
# Remove partial download
ollama rm dolphin-mistral
# Try the download again (progress is shown by default)
ollama pull dolphin-mistral
# Check disk space
df -h
Slow Performance
Issue: Responses are very slow
Solutions:
- Use a smaller model: Switch from mixtral to mistral
- Close other applications: Free up RAM
- Check GPU usage: Ensure GPU acceleration is working
- Reduce context window: Lower the context size in settings
# Monitor system resources
htop # Linux
# Task Manager on Windows
AnythingLLM Can't Connect
Issue: "Failed to connect to Ollama"
Checklist:
- [ ] Ollama is running: curl http://localhost:11434/api/tags
- [ ] Correct URL in settings: http://localhost:11434
- [ ] Firewall not blocking: Temporarily disable and test
- [ ] Model is downloaded: ollama list
Fix:
# Restart Ollama service
sudo systemctl restart ollama
# Test connection manually
curl http://localhost:11434/api/tags
# Check Ollama logs
journalctl -u ollama -f
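If you'd rather run the whole checklist in one go, here's a small Python health check (assumes the default URL; adjust base_url if you changed OLLAMA_HOST):

import requests

def check_ollama(base_url="http://localhost:11434"):
    try:
        resp = requests.get(f"{base_url}/api/tags", timeout=5)
        resp.raise_for_status()
    except requests.RequestException as exc:
        print(f"Ollama not reachable at {base_url}: {exc}")
        return False
    models = resp.json().get("models", [])
    if not models:
        print("Ollama is running, but no models are installed (try: ollama pull dolphin-mistral)")
        return False
    print("OK - installed models:", ", ".join(m["name"] for m in models))
    return True

check_ollama()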
Out of Memory Errors
Issue: System crashes or model fails to load
Solutions:
- Switch to a smaller model:
ollama pull dolphin-phi # Only 2.7B parameters
- Increase swap space (Linux):
# Create 16GB swap file
sudo fallocate -l 16G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
- Close other applications: Free up system resources
Part 10: Use Cases and Examples
Programming Assistant
Prompt:
I need a Python function that reads a CSV file and returns a dictionary where keys are values from the first column and values are lists of remaining column values. Include error handling.
Workspace Setup:
- Model: dolphin-mistral or dolphin-mixtral
- Temperature: 0.3 (more deterministic for code)
- System Prompt: "You are an expert programmer. Provide clean, well-commented code with error handling."
Research and Analysis
Prompt:
I'm uploading a research paper about quantum computing. Please:
1. Summarize the main findings
2. Explain the methodology
3. List practical applications mentioned
4. Identify any limitations discussed
Workspace Setup:
- Upload research papers as PDFs
- Model: dolphin-mixtral (better comprehension)
- Temperature: 0.5
- Enable document chat with nomic-embed-text
Creative Writing
Prompt:
Help me develop a character for my sci-fi novel:
- Name: Dr. Sarah Chen
- Role: Xenobiologist on Mars colony
- Conflict: Discovers evidence of ancient Martian life
- Personality: [your input]
Create a character profile with background, motivations, and potential story arcs.
Workspace Setup:
- Model: dolphin-mixtral (better creativity)
- Temperature: 0.9 (higher creativity)
- System Prompt: "You are a creative writing coach specializing in science fiction."
Technical Documentation
Prompt:
Convert this technical specification into user-friendly documentation for non-technical staff. Include:
- Overview in plain language
- Step-by-step instructions
- Common troubleshooting
- FAQ section
[Paste technical spec]
Workspace Setup:
- Model: dolphin-mistral
- Temperature: 0.6
- Upload existing documentation for context
Part 11: Privacy and Security
Why Offline Matters
Data Privacy:
- No conversation logs sent to external servers
- No user tracking or analytics
- Complete control over your data
Compliance:
- GDPR compliant (data stays local)
- HIPAA considerations (healthcare)
- SOC 2 / corporate security requirements
Intellectual Property:
- Safe to discuss proprietary information
- No risk of data training on your inputs
- Suitable for sensitive business use
Security Best Practices
1. Keep Software Updated:
  - Regular Ollama updates
  - AnythingLLM security patches
  - OS security updates
2. Access Control:
  - Use firewalls to block external access to port 11434
  - Consider user authentication in AnythingLLM
  - Don't expose services to the internet
3. Data Encryption:
  - Use full disk encryption
  - Encrypt backups
  - Secure physical access to servers
4. Network Isolation:
# Bind Ollama to localhost only (Linux)
# Edit: /etc/systemd/system/ollama.service
Environment="OLLAMA_HOST=127.0.0.1:11434"
# Restart service
sudo systemctl daemon-reload
sudo systemctl restart ollama
Part 12: Advanced Configuration
Custom Model Parameters
Create a custom Modelfile for fine-tuned behavior:
# Create Modelfile
cat > Dolphin-Custom <<EOF
FROM dolphin-mistral
# Set parameters
PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER top_k 40
PARAMETER num_ctx 4096
# Custom system prompt
SYSTEM You are a security-focused AI assistant specializing in cybersecurity and penetration testing. Provide detailed, accurate technical guidance.
EOF
# Create custom model
ollama create dolphin-security -f Dolphin-Custom
# Use your custom model
ollama run dolphin-security
Environment Variables
Configure Ollama behavior with environment variables:
# Set model storage location
export OLLAMA_MODELS=/path/to/models
# Set number of parallel requests
export OLLAMA_NUM_PARALLEL=2
# Set max loaded models
export OLLAMA_MAX_LOADED_MODELS=2
# Enable debug logging
export OLLAMA_DEBUG=1
# Restart Ollama to apply
# (note: exported shell variables don't reach the systemd service;
# put them in the service file as Environment= lines, as in Part 11)
sudo systemctl restart ollama
API Integration
Use Ollama's API for custom integrations:
# Python example
import requests

def query_ollama(prompt, model="dolphin-mistral"):
    url = "http://localhost:11434/api/generate"
    data = {
        "model": model,
        "prompt": prompt,
        "stream": False
    }
    response = requests.post(url, json=data)
    response.raise_for_status()  # surface HTTP errors early
    return response.json()["response"]

# Use it
result = query_ollama("Explain Docker containers")
print(result)
# Curl example
curl http://localhost:11434/api/generate -d '{
"model": "dolphin-mistral",
"prompt": "Why is the sky blue?",
"stream": false
}'
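Setting "stream": true instead returns one JSON object per line as tokens are generated, which is what gives chat interfaces their typing effect. A sketch of consuming the stream in Python:

import json
import requests

with requests.post("http://localhost:11434/api/generate", json={
    "model": "dolphin-mistral",
    "prompt": "Explain Docker containers in two sentences.",
    "stream": True,
}, stream=True) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if not line:
            continue
        chunk = json.loads(line)  # each line is a JSON object with a "response" fragment
        print(chunk.get("response", ""), end="", flush=True)
        if chunk.get("done"):
            print()
            break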
Multi-Model Setup
Run different models for different purposes:
# Install multiple models
ollama pull dolphin-phi # Fast responses
ollama pull dolphin-mistral # Balanced
ollama pull dolphin-mixtral # Best quality
ollama pull codellama # Code generation
ollama pull llava # Vision tasks
# Create workspaces in AnythingLLM for each
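The same idea carries over to scripts: pick the model per request. A toy router, purely illustrative (the task-to-model table is an assumption for this example, not an Ollama feature):

import requests

MODEL_FOR_TASK = {  # hypothetical routing table
    "quick": "dolphin-phi",
    "general": "dolphin-mistral",
    "code": "codellama",
}

def ask(task, prompt):
    model = MODEL_FOR_TASK.get(task, "dolphin-mistral")
    resp = requests.post("http://localhost:11434/api/generate", json={
        "model": model, "prompt": prompt, "stream": False,
    })
    resp.raise_for_status()
    return resp.json()["response"]

print(ask("quick", "What is the capital of France?"))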
Part 13: Performance Benchmarks
Response Time Expectations
dolphin-phi (2.7B) on 16GB RAM system:
- Simple query: 1-3 seconds
- Complex analysis: 5-15 seconds
- Code generation: 3-10 seconds
dolphin-mistral (7B) on 16GB RAM system:
- Simple query: 3-8 seconds
- Complex analysis: 15-30 seconds
- Code generation: 10-20 seconds
dolphin-mixtral (8x7B) on 32GB RAM + GPU:
- Simple query: 2-5 seconds (with GPU)
- Complex analysis: 10-25 seconds
- Code generation: 8-18 seconds
Optimization Results
With GPU acceleration (NVIDIA RTX 3060):
- 5-10x faster inference
- Better for interactive conversations
- Required for larger models
CPU-only performance:
- Suitable for batch processing
- Acceptable for non-real-time use
- Consider smaller models
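Since these numbers depend heavily on hardware, it's worth benchmarking your own machine. Ollama's non-streaming responses include timing metadata (eval_count generated tokens over eval_duration nanoseconds), so a rough tokens-per-second measurement takes a few lines of Python:

import requests

def benchmark(model="dolphin-mistral", prompt="Explain TCP in one paragraph."):
    resp = requests.post("http://localhost:11434/api/generate", json={
        "model": model, "prompt": prompt, "stream": False,
    })
    resp.raise_for_status()
    data = resp.json()
    # eval_count = tokens generated; eval_duration is in nanoseconds
    tps = data["eval_count"] / (data["eval_duration"] / 1e9)
    print(f"{model}: {tps:.1f} tokens/sec")

benchmark()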
Part 14: Comparison with Cloud Services
Feature Comparison
| Feature | Offline Setup | ChatGPT Plus | Claude Pro |
|---|---|---|---|
| Monthly Cost | $0 (after hardware) | $20/month | $20/month |
| Privacy | 100% private | Data sent to OpenAI | Data sent to Anthropic |
| Internet Required | No | Yes | Yes |
| Response Speed | Varies by hardware | Fast | Fast |
| Model Updates | Manual | Automatic | Automatic |
| Censorship | None (Dolphin) | Yes | Yes |
| API Access | Free (local) | Pay per token | Pay per token |
| Customization | Full control | Limited | Limited |
Cost Analysis
Year 1:
- Hardware (if needed): $500-2000
- Software: $0
- Total: $500-2000
Cloud Alternative (Year 1):
- Subscription: $240 ($20/mo × 12)
- API usage: $100-500 (varies)
- Total: $340-740
Break-even: roughly one to three years, depending on usage and whether you already own capable hardware.
Conclusion
You now have a fully functional offline LLM system that respects your privacy while providing powerful AI assistance. This setup gives you:
✅ Complete data privacy and control
✅ Zero recurring costs after initial setup
✅ Unlimited usage without API restrictions
✅ Customizable models and behavior
✅ Works completely offline
✅ Professional-grade AI capabilities
Next Steps
- Experiment with different models - Try various Dolphin versions
- Create specialized workspaces - Organize by task type
- Upload your documents - Enable RAG for your personal knowledge base
- Customize system prompts - Tune the AI for your specific needs
- Explore the API - Integrate with your own applications
Additional Resources
- Ollama Documentation: https://github.com/ollama/ollama
- AnythingLLM Docs: https://docs.anythingllm.com
- Dolphin Models: https://huggingface.co/cognitivecomputations
- Community Support: r/LocalLLaMA, r/Ollama
Need Help?
If you run into issues during setup:
- Check the troubleshooting section above
- Consult Ollama GitHub issues
- Join the AnythingLLM Discord community
- Post questions on r/LocalLLaMA
Frequently Asked Questions
Q: Can I use this setup for commercial purposes?
A: Yes! Dolphin models are open-source and can be used commercially. Check specific model licenses on HuggingFace.
Q: How much does this cost?
A: After initial hardware investment, there are zero recurring costs. No subscriptions or API fees.
Q: Is this as good as ChatGPT?
A: For many tasks, yes. Larger Dolphin models (mixtral) perform comparably to GPT-3.5. They won't match GPT-4, but the privacy and cost benefits are significant.
Q: Can I run this on a laptop?
A: Yes! A laptop with 16GB RAM can run smaller models (phi, mistral-7b) effectively. Performance varies by CPU.
Q: Do I need a GPU?
A: No, but it helps significantly. CPU-only inference works but is slower. NVIDIA GPUs provide the best acceleration.
Q: Can I access this from multiple devices?
A: Yes! Set up Ollama on a server and configure AnythingLLM on multiple devices to connect to it. Ensure proper network security.
Q: How often should I update?
A: Check for updates monthly. Ollama and AnythingLLM release improvements regularly. Model updates are less frequent.
Q: Can I use other models besides Dolphin?
A: Absolutely! Ollama supports Llama, Mistral, CodeLlama, Vicuna, and many others. Dolphin is recommended for its uncensored nature.
Q: Is this legal?
A: Yes. Running open-source models locally is completely legal. Dolphin uses properly licensed base models.
Q: Can I train or fine-tune models?
A: While this guide focuses on inference, you can fine-tune models using tools like Axolotl or LLaMA Factory, then run them in Ollama.

