How to Install DeepSeek-R1 on a Linux Dedicated Server

A step-by-step guide to hosting private AI locally with Ollama on Ubuntu and AlmaLinux.

In the fast-evolving world of artificial intelligence, data privacy has become a top priority for enterprises. With data breaches and regulatory scrutiny on the rise, many businesses are moving away from cloud-based AI APIs such as OpenAI's in favor of hosting large language models (LLMs) locally. Enter DeepSeek-R1, a highly capable open-source reasoning model that has gained significant traction for its strong performance, efficiency, and deep customization potential.

Hosting this local LLM on your own Linux dedicated server gives you complete data sovereignty: prompts and responses never leave your hardware, eliminating third-party data-sharing risks.

If you are looking for a comprehensive, step-by-step guide to installing DeepSeek-R1 on a Linux dedicated server using Ollama, you are in the right place. This private AI hosting tutorial focuses on Ubuntu and AlmaLinux, two of the most robust distributions for enterprise server environments. We will walk you through the entire deployment pipeline, from initial setup to securing your AI API.

Looking for enterprise-grade hardware? KW Servers offers top-tier GPU dedicated servers heavily optimized for demanding AI workloads. Let's dive in!

Why Choose DeepSeek-R1 and Ollama for Your Private AI Hosting?

Before jumping into the installation, here is a quick look at why this specific AI software stack is the industry standard for local deployment:

  • DeepSeek-R1 (70B Distilled): While the flagship model has 671 billion parameters, DeepSeek distilled its groundbreaking reasoning capabilities into a far smaller 70-billion-parameter variant. This distilled model delivers strong inference quality at a fraction of the flagship's resource requirements, making it well suited to complex tasks like advanced NLP, code generation, and logical reasoning, all running locally.

  • Ollama: A powerful open-source tool that makes running local LLMs on your own hardware incredibly simple. It natively handles model pulling, inference, and API exposure, making it beginner-friendly yet robust enough for high-traffic production environments.

Key Benefits of Local AI Deployment:

  • Data Privacy: Keep highly sensitive company information and customer data strictly on-premises.

  • Cost Savings: Eliminate the compounding, unpredictable API fees and rate limits of cloud-based AI providers.

  • Scalability: Seamlessly integrate the AI into your existing infrastructure and scale your hardware as your user base grows.

Prerequisites for Installing DeepSeek-R1

To ensure a smooth AI deployment and bottleneck-free token generation, your server must meet specific hardware requirements. The 70B parameter model requires substantial resources, especially for fast inference.

| Component | Minimum / Recommended Specs | Notes |
|---|---|---|
| Operating System | Ubuntu 22.04 LTS+ or AlmaLinux 8/9 | Stable, widely supported enterprise server distros. |
| CPU | 16+ cores (e.g., Intel Xeon or AMD Ryzen) | Essential for heavy processing and system-level tasks. |
| RAM | 64GB minimum (128GB+ recommended) | Prevents system swapping; 64GB is the practical floor for running the 70B model with partial GPU offloading. |
| Storage | 500GB+ SSD or NVMe | The default quantized Ollama 70B model takes about 43GB, but extra space is needed for the OS and context caching. |
| GPU (crucial) | NVIDIA RTX or A-series (48GB+ total VRAM) | Crucial for speed. We recommend dual RTX 3090s/4090s (2×24GB) or an NVIDIA A6000/A100 to fit the 70B model entirely in VRAM. |
| Permissions | Root access | SSH access as root or a user with sudo privileges. |
| Basic tools | curl, git | Required for downloading necessary setup packages. |
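Before proceeding, you can sanity-check your server against the table above with a short script. This is a rough sketch: the thresholds mirror the table, and nvidia-smi is only available once the NVIDIA driver is installed.

```shell
#!/bin/sh
# Rough hardware sanity check against the prerequisites table (illustrative thresholds)

cores=$(nproc)
ram_gb=$(free -g | awk '/^Mem:/ {print $2}')
disk_gb=$(df -BG --output=avail / | tail -1 | tr -dc '0-9')

echo "CPU cores: $cores (want 16+)"
echo "RAM:       ${ram_gb}GB (want 64+)"
echo "Free disk: ${disk_gb}GB on / (want 500+)"

# nvidia-smi only exists if the NVIDIA driver is installed
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=name,memory.total --format=csv,noheader
else
    echo "No NVIDIA driver detected -- inference would fall back to CPU."
fi
```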

Pro Tip: Don't let hardware bottleneck your AI project. KW Servers provides GPU Dedicated Servers starting at just $41/month (featuring high-bandwidth networking and dual-GPU configurations) and High-RAM Ryzen Dedicated Servers starting at $64/month. All plans are pre-configured for heavy AI workloads and backed by 24/7 expert support. Check out our server plans to get started!

Step 1: Install Ollama Natively

One of Ollama's biggest advantages is how easily it installs on a bare-metal or virtualized Linux server. It runs natively as a systemd service, with no need for a complex Docker container setup.

1. Update Your System

Before installing new AI software, always ensure your server's package manager is up to date.

For Ubuntu:

sudo apt update && sudo apt upgrade -y

For AlmaLinux:

sudo dnf update -y

2. Install Ollama

Use the official one-liner script to download and install Ollama. The same script works on both Ubuntu and AlmaLinux, detecting your hardware (including NVIDIA GPUs) and configuring GPU acceleration if the drivers are already installed:

curl -fsSL https://ollama.com/install.sh | sh

3. Verify the Installation

Verify that the service is installed and running:

ollama --version

Troubleshooting: If the service isn't running, manually start and enable it on boot using: sudo systemctl enable --now ollama.
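Beyond checking the version, you can confirm that the background service and its HTTP API are alive. A quick sketch, assuming the default systemd unit name and port:

```shell
# Confirm the systemd unit is active (prints "active" on success)
systemctl is-active ollama

# The API's root endpoint replies with the plain text "Ollama is running"
curl -s http://localhost:11434/ | grep -q "Ollama is running" \
    && echo "API reachable" \
    || echo "API not responding -- check 'journalctl -u ollama'"
```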

Step 2: Pull and Run the DeepSeek-R1 Model

With the environment ready, pulling the AI model is straightforward. We will use the highly capable 70B variant for the best balance of reasoning power and efficiency.

1. Pull the Model

Run the following command to download the quantized DeepSeek-R1 files to your server:

ollama pull deepseek-r1:70b

Note: The 70B model is approximately 43GB. Depending on your network, this download can take time. (Luckily, KW Servers come equipped with high-bandwidth enterprise networking to make this lightning fast!)

2. Run the Model

Once downloaded, start an interactive terminal session:

ollama run deepseek-r1:70b

Test it out by typing: "Hello, DeepSeek!" The model will respond right in your terminal (type /bye to exit the session). Because Ollama runs as a background service, you can immediately start customizing Modelfiles or connecting it to external applications.
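As a sketch of that Modelfile customization, here is a minimal example that wraps the base model with a system prompt. The model name mydeepseek, the temperature, and the prompt text are all illustrative:

```shell
# Write a minimal Modelfile in the current directory
cat > Modelfile <<'EOF'
FROM deepseek-r1:70b
PARAMETER temperature 0.6
SYSTEM "You are a concise internal assistant. Keep answers short."
EOF

# Build the customized variant, then run it
ollama create mydeepseek -f Modelfile
ollama run mydeepseek
```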

Performance Note: Local LLMs rely heavily on GPU memory bandwidth. On KW Servers' GPU-optimized plans, inference speeds can easily hit 50+ tokens per second—vastly outperforming CPU-only setups or underpowered home labs.
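Since Ollama also serves an HTTP API on port 11434, you can query the model programmatically instead of interactively. A sketch using the /api/generate endpoint; setting "stream": false returns one complete JSON reply rather than a token stream:

```shell
# Send a one-off prompt to the local Ollama API (prompt text is illustrative)
curl -s http://localhost:11434/api/generate -d '{
  "model": "deepseek-r1:70b",
  "prompt": "Summarize the benefits of local LLM hosting in one sentence.",
  "stream": false
}'
```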

Step 3: Expose DeepSeek-R1 to the Web with Nginx

By default, the Ollama API listens only on localhost:11434. To access it remotely (for your web apps or development team), an Nginx reverse proxy is the standard approach. Keep in mind that Ollama has no built-in authentication, so for production use you should add access controls (such as basic auth or IP allow-listing) at the proxy layer.

1. Install Nginx

For Ubuntu:

sudo apt install nginx -y

For AlmaLinux:

sudo dnf install nginx -y

Start and enable the service for both:

sudo systemctl start nginx
sudo systemctl enable nginx

2. Configure the Reverse Proxy

The configuration file path depends on your operating system.

For Ubuntu: sudo nano /etc/nginx/sites-available/ollama

For AlmaLinux: sudo nano /etc/nginx/conf.d/ollama.conf

Paste the following configuration, replacing yourdomain.com with your actual domain name or your KW Server's public IP address:

server {
    listen 80;
    server_name yourdomain.com;

    location / {
        proxy_pass http://localhost:11434;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}

Save and exit the file.

3. Enable and Restart Nginx

If using Ubuntu, link the file to enable it (AlmaLinux users skip this step, as conf.d files are auto-enabled):

sudo ln -s /etc/nginx/sites-available/ollama /etc/nginx/sites-enabled/

Test your configuration for syntax errors, then restart:

sudo nginx -t
sudo systemctl restart nginx

(Optional but Highly Recommended): Secure your API with a free SSL certificate via Certbot. Run sudo apt install certbot python3-certbot-nginx -y (Ubuntu) or sudo dnf install certbot python3-certbot-nginx -y (AlmaLinux), then execute sudo certbot --nginx to configure HTTPS automatically.

You can now access your API at http://yourdomain.com (or https:// if you configured Certbot) and integrate it with powerful frameworks like LangChain or Flowise!
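As a quick smoke test, you can hit the proxied API from any client machine. For example, the /api/tags endpoint lists the models installed on your server (replace yourdomain.com with your own domain, and use https:// if you ran Certbot):

```shell
# List installed models through the Nginx proxy
curl -s http://yourdomain.com/api/tags

# Or send a prompt end to end (prompt text is illustrative)
curl -s http://yourdomain.com/api/generate -d '{
  "model": "deepseek-r1:70b",
  "prompt": "Hello from a remote client!",
  "stream": false
}'
```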

Bonus: Secure Your AI API with a Server Firewall

Security is paramount for private AI. With Nginx handling web traffic, configure your server's firewall to block all other inbound ports, so that port 11434 is never exposed directly and API traffic can only reach Ollama through your reverse proxy.

For Ubuntu (Using UFW):

# Allow SSH so you don't lock yourself out
sudo ufw allow OpenSSH

# Allow HTTP and HTTPS for your Nginx API
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp

# Enable the firewall
sudo ufw enable

Check status anytime with sudo ufw status.

For AlmaLinux (Using Firewalld):

sudo systemctl start firewalld
sudo systemctl enable firewalld

# Allow SSH, HTTP, and HTTPS
sudo firewall-cmd --permanent --add-service=ssh
sudo firewall-cmd --permanent --add-service=http
sudo firewall-cmd --permanent --add-service=https

# Reload to apply the changes
sudo firewall-cmd --reload

This ensures your DeepSeek-R1 setup blocks unwanted traffic while keeping your API securely accessible via your domain.
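To double-check that the firewall is doing its job, you can probe the raw Ollama port from a machine outside the server. Replace yourdomain.com with your server's domain or public IP; the 5-second timeout is arbitrary:

```shell
# Run this from OUTSIDE the server. The request should time out or be
# refused if the firewall is configured correctly.
curl -s --max-time 5 http://yourdomain.com:11434/ \
    && echo "WARNING: port 11434 is publicly reachable" \
    || echo "OK: the raw API port is blocked from the outside"
```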

Final Thoughts

Congratulations! You have successfully installed the DeepSeek-R1 model on your Linux dedicated server using Ollama, complete with secure web exposure and strict firewall rules. This AI deployment empowers you to run blazing-fast, private LLMs without ever compromising your company's data sovereignty.

Ready to deploy? KW Servers' dedicated machines are reliable, scalable, and heavily tailored for these exact AI workloads, backed by our expert support team. Visit KW Servers today to browse our top-tier GPU and High-RAM server options.

If you run into issues, need custom configurations, or want advice on scaling your AI infrastructure, drop a comment below or contact our sales team. Stay ahead in the local LLM revolution!

Discover KW Servers Dedicated Server Locations

KW Servers are available in locations around the world, providing diverse options for your hosting needs. Each region offers unique advantages, making it easier to choose a location that best suits your specific requirements.