How to Run DeepSeek Locally with Ollama: Full Step-by-Step Deployment Guide
Introduction: The Revolution of Local AI Deployment
The landscape of Artificial Intelligence is shifting rapidly from cloud-dependent silos to decentralized, local environments. For developers, data scientists, and privacy-conscious enthusiasts, the ability to run DeepSeek locally with Ollama represents a paradigm shift in how we interact with Large Language Models (LLMs). No longer tethered by API latency, subscription fees, or data privacy concerns, local deployment empowers you to harness state-of-the-art reasoning capabilities directly on your own hardware.
DeepSeek has emerged as a formidable competitor in the open-source arena, offering models like DeepSeek-V3, DeepSeek-R1 (a reasoning model rivaling OpenAI’s o1), and the specialized DeepSeek-Coder. When paired with Ollama, an industry-standard framework for simplifying LLM management, deploying these powerful neural networks becomes surprisingly accessible.
In this comprehensive, cornerstone guide, we will walk you through everything you need to know to successfully deploy DeepSeek models on your local machine. From dissecting hardware requirements to executing advanced API integrations, this is your definitive manual for local AI mastery.
Understanding the Power Duo: DeepSeek and Ollama Explained
Before diving into the terminal, it is crucial to understand why this specific combination of software and model architecture is capturing the attention of the tech world.
What is DeepSeek?
DeepSeek is an open-source research lab that has released a series of high-performance LLMs. Unlike generic models, DeepSeek has focused heavily on specialized architectures:
- DeepSeek-R1: A reasoning-focused model that uses Chain-of-Thought (CoT) processing to solve complex logic puzzles, math problems, and coding tasks with high accuracy.
- DeepSeek-Coder: Trained on massive datasets of code, this model supports multiple programming languages and is optimized for code generation and debugging.
- DeepSeek-V3: A Mixture-of-Experts (MoE) model that balances performance and inference speed by only activating a subset of parameters for each token generated.
Why Use Ollama?
Running raw model weights (typically in .safetensors or PyTorch formats) requires complex Python environments and manual dependency management. Ollama abstracts this complexity. It acts as a backend that:
- Manages Model Weights: Automatically downloads and organizes model files (GGUF format).
- Optimizes Hardware: Auto-detects GPU acceleration (NVIDIA, AMD, Apple Silicon).
- Provides an API: Exposes a REST API compatible with OpenAI libraries, allowing for easy software integration (see the short example below).
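To illustrate that last point, here is a minimal sketch of calling a locally pulled DeepSeek model through Ollama's OpenAI-compatible endpoint. It assumes Ollama is running on its default port (11434) and that you have already pulled deepseek-r1 as shown later in this guide; the api_key value is a placeholder because Ollama ignores it:
from openai import OpenAI

# Point the OpenAI client at the local Ollama server instead of the cloud
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

completion = client.chat.completions.create(
    model="deepseek-r1",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(completion.choices[0].message.content)
Because the endpoint mirrors the OpenAI API shape, existing tooling built on that client can usually be pointed at Ollama by changing only the base URL and model name.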
Hardware Prerequisites: Can Your PC Handle DeepSeek?
Whether you can run DeepSeek locally with Ollama depends entirely on your hardware. LLMs are memory-intensive, and the amount of VRAM (Video RAM) on your GPU, or Unified Memory on a Mac, is the primary bottleneck.
Minimum Requirements by Model Size
DeepSeek models come in various sizes, typically measured in billions of parameters (B). Here is a breakdown of what you need for smooth performance using 4-bit quantization (the standard for local use):
- 1.5B to 7B Models (Distilled/Lite versions):
  - RAM/VRAM: 8GB minimum.
  - Hardware: Standard laptops, MacBook Air (M1/M2/M3), or desktops with entry-level GPUs (RTX 3060).
  - Use Case: Quick chat, basic coding assistance, low-latency tasks.
- 14B to 32B Models:
  - RAM/VRAM: 16GB to 32GB.
  - Hardware: High-end MacBook Pro, desktops with RTX 3090/4090.
  - Use Case: Complex reasoning, RAG (Retrieval-Augmented Generation), detailed content creation.
- 67B to 70B+ Models:
  - RAM/VRAM: 48GB to 64GB+.
  - Hardware: Mac Studio (M2/M3 Ultra), dual RTX 3090/4090 setups.
  - Use Case: Enterprise-grade analysis, heavy data processing.
Pro Tip: If you lack a dedicated GPU, Ollama can run models on the CPU, but inference speed (tokens per second) will be significantly slower.
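As a rough sanity check before downloading anything, you can estimate memory needs from the parameter count and quantization level. This is a back-of-the-envelope sketch, not an exact figure: real usage also depends on context length, the specific quantization variant, and runtime overhead (the ~20% padding below is an assumption):
def estimate_memory_gb(params_billions: float, bits: int = 4, overhead: float = 1.2) -> float:
    """Very rough estimate: parameters * bits per weight, plus ~20% runtime overhead."""
    weight_gb = params_billions * bits / 8  # e.g. 7B at 4-bit is roughly 3.5 GB of weights
    return round(weight_gb * overhead, 1)

for size in (7, 14, 32, 70):
    print(f"{size}B at 4-bit: ~{estimate_memory_gb(size)} GB")
The outputs (roughly 4, 8, 19, and 42 GB) line up with the tiers above and explain why a 70B model is out of reach for a typical 16GB machine.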
Step-by-Step Guide: How to Run DeepSeek Locally with Ollama
Follow these precise steps to get your local AI environment up and running.
Step 1: Installing Ollama
The installation process varies slightly by operating system but remains straightforward.
For macOS and Windows
Navigate to the official Ollama website (ollama.com) and download the installer for your platform. The Windows installer is currently in preview but works well for most users. Run the .exe or .dmg file and follow the on-screen prompts.
For Linux
Open your terminal and run the following curl command, which automatically installs the necessary dependencies and the Ollama service:
curl -fsSL https://ollama.com/install.sh | sh
Step 2: Verifying the Installation
Once installed, verify that the Ollama backend is running. Open your terminal (Command Prompt on Windows, Terminal on Mac/Linux) and type:
ollama --version
If the version number is returned (e.g., ollama version 0.5.4), the CLI tool is successfully installed.
Step 3: Pulling the DeepSeek Model
Ollama allows you to download models using the pull command. You must specify the model tag. For DeepSeek, the most popular tags currently are for the R1 (reasoning) and standard V3 models.
To install the default DeepSeek-R1 build (a distilled variant of roughly 7B-8B parameters):
ollama pull deepseek-r1
To install larger versions (e.g., 32B), specify the tag:
ollama pull deepseek-r1:32b
Note: Ensure you have sufficient disk space. A 7B model takes up roughly 4-5GB, while a 32B model can exceed 20GB.
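If you want to check what is already on disk before pulling another variant, you can query Ollama's local REST API. A minimal sketch, assuming the default /api/tags endpoint on port 11434 (field names may differ slightly between Ollama versions):
import requests

# Ask the local Ollama server which models are installed and how large they are
resp = requests.get("http://localhost:11434/api/tags")
for model in resp.json().get("models", []):
    size_gb = model.get("size", 0) / 1e9
    print(f"{model.get('name')}: {size_gb:.1f} GB")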
Step 4: Running the Model
After the download is complete, you can initialize the chat session immediately. Run the following command:
ollama run deepseek-r1
You are now inside the interactive shell. You can type prompts like "Explain quantum entanglement in simple terms" or "Write a Python script to scrape a website." The DeepSeek model will generate tokens locally on your machine, with zero data leaving your network. Type /bye to exit the session.
Optimizing Performance: Quantization and Custom Modelfiles
To truly master how to run DeepSeek locally with Ollama, you must understand optimization. Raw models are often too large for consumer hardware, so we use quantization.
Understanding Quantization
Ollama typically pulls 4-bit quantized models (Q4_K_M) by default. This reduces the precision of the model’s weights from 16-bit to 4-bit, drastically lowering memory usage with negligible loss in intelligence.
If you have ample VRAM and want higher output fidelity, you can look for higher-precision tags (such as 8-bit builds), though Ollama's default balance is ideal for the vast majority of users.
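To confirm which quantization you actually pulled, you can inspect the model's metadata. Here is a small sketch against Ollama's /api/show endpoint; the exact response fields (such as details.quantization_level) are an assumption and may vary across Ollama versions:
import requests

# Request metadata for a locally installed model
resp = requests.post("http://localhost:11434/api/show", json={"model": "deepseek-r1"})
details = resp.json().get("details", {})
print("Parameter size:", details.get("parameter_size"))
print("Quantization:", details.get("quantization_level"))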
Creating a Custom Modelfile
You can customize how DeepSeek behaves (e.g., setting a strict system prompt) by creating a Modelfile. This is similar to a Dockerfile for AI.
- Create a file named Modelfile.
- Add the following content:
FROM deepseek-r1
SYSTEM "You are a senior coding assistant who only answers in Python. Be concise."
PARAMETER temperature 0.7
- Build your custom model:
ollama create my-deepseek-coder -f Modelfile
- Run your custom model:
ollama run my-deepseek-coder
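Once built, the custom model behaves like any other Ollama model, including over the local API. A minimal sketch using the /api/chat endpoint with the my-deepseek-coder model created above (non-streaming for simplicity):
import requests

payload = {
    "model": "my-deepseek-coder",
    "messages": [{"role": "user", "content": "Write a function that reverses a string."}],
    "stream": False,
}
# The SYSTEM prompt baked into the Modelfile is applied automatically
resp = requests.post("http://localhost:11434/api/chat", json=payload)
print(resp.json()["message"]["content"])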
Beyond the Terminal: Integrating DeepSeek with Web UIs
While the terminal is efficient, most users prefer a chat interface similar to ChatGPT. Because Ollama exposes a local server, you can easily connect it to open-source Web UIs.
Option 1: Open WebUI (Recommended)
Open WebUI is the most feature-rich interface for local LLMs. It offers chat history, document upload (RAG), and user management.
To run Open WebUI via Docker:
docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui ghcr.io/open-webui/open-webui:main
Once running, navigate to http://localhost:3000 in your browser. Select "DeepSeek-R1" from the model dropdown, and enjoy a full UI experience powered locally.
Option 2: Browser Extensions
There are several Chrome and Firefox extensions, such as "Page Assist," that connect directly to your local Ollama instance (usually at http://127.0.0.1:11434), allowing you to use DeepSeek to summarize the web page you are currently viewing.
Developer’s Corner: Using the DeepSeek API via Ollama
For developers building applications, Ollama provides a local API endpoint. You do not need an internet connection to use this.
Python Example
You can use the official ollama Python library or the standard requests library.
import requests

# Payload for Ollama's local /api/generate endpoint
json_data = {
    "model": "deepseek-r1",
    "prompt": "Why is the sky blue?",
    "stream": False  # return the full response at once instead of streaming tokens
}

# Send the prompt to the local Ollama server and print the generated text
response = requests.post('http://localhost:11434/api/generate', json=json_data)
print(response.json()['response'])
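The same call can also be made with the official ollama Python package mentioned above. A brief sketch, assuming the package is installed (pip install ollama) and the deepseek-r1 model is already pulled:
import ollama

# Chat through the local Ollama server using the official client library
response = ollama.chat(
    model="deepseek-r1",
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
)
print(response["message"]["content"])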
This capability allows you to build local chatbots, analysis tools, and automation scripts that leverage DeepSeek’s intelligence without paying per-token API fees.
Troubleshooting Common Deployment Issues
Even with a streamlined tool like Ollama, issues can arise. Here are solutions to the most common roadblocks.
1. “Error: connection refused”
This usually means the Ollama background service is not running. On Mac/Windows, restart the application from the system tray. On Linux, run sudo systemctl start ollama.
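A quick way to confirm the server is reachable before digging further is to hit its root endpoint, which normally replies with a short status string. A minimal sketch, assuming the default port 11434:
import requests

try:
    # The root endpoint of a running Ollama server returns a simple status message
    resp = requests.get("http://localhost:11434", timeout=5)
    print(resp.status_code, resp.text)  # expect: 200 Ollama is running
except requests.exceptions.ConnectionError:
    print("Ollama is not running or is listening on a different port.")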
2. Slow Inference / Token Generation
If the AI is typing extremely slowly, it has likely fallen back to CPU processing because your GPU VRAM is full or undetected. Check your VRAM usage. If you are trying to run a 32B model on an 8GB card, switch to a smaller model like the 7B or 8B version.
3. Hallucinations in Reasoning
While DeepSeek-R1 is powerful, smaller quantized versions (1.5B or 7B) can struggle with highly complex logic compared to the 70B version. Ensure you are using the largest model your hardware can support for the best results.
Frequently Asked Questions
Is it free to run DeepSeek locally with Ollama?
Yes, completely free. Both Ollama and the DeepSeek models are open-source. The only cost is the electricity used by your computer hardware.
Can I run DeepSeek on a laptop without a dedicated GPU?
Yes, but performance will vary. A modern MacBook with an M-series chip (M1/M2/M3) handles it beautifully due to Unified Memory. A Windows laptop with only an integrated Intel/AMD GPU will run the model on the CPU, which will be significantly slower but functional.
Is my data private when using Ollama?
Absolutely. When you run DeepSeek locally with Ollama, the inference happens entirely on your machine. No text or data is sent to DeepSeek’s servers or Ollama’s cloud, making it ideal for processing sensitive or proprietary documents.
How does DeepSeek-R1 compare to Llama 3?
DeepSeek-R1 is specifically optimized for "reasoning" tasks, similar to OpenAI’s o1 series, often outperforming general-purpose models like Llama 3 in math and coding benchmarks. However, Llama 3 is often considered better for creative writing and general conversation.
How do I update the DeepSeek model in Ollama?
Models are improved frequently. To update your local version, simply run the pull command again: ollama pull deepseek-r1. Ollama will detect the difference in the manifest and download the newer layers.
Conclusion
Learning how to run DeepSeek locally with Ollama is more than just a technical exercise; it is a step toward AI sovereignty. By following this guide, you have set up a private, cost-effective, and high-performance AI lab right on your desktop.
Whether you are using DeepSeek-Coder to refactor legacy code, DeepSeek-R1 to solve complex logic problems, or simply exploring the frontier of open-source artificial intelligence, the combination of DeepSeek and Ollama provides a robust foundation. As hardware becomes more powerful and models become more efficient, the gap between local deployment and cloud APIs continues to narrow, placing the future of AI directly in your hands.