DeepSeek on Kubernetes: AI-Powered Reasoning at Scale

🧠 Run advanced reasoning AI models like DeepSeek-R1 on Kubernetes and deliver a fast, scalable, and secure experience. This guide shows you how.


🤖 What Is DeepSeek-R1 and Why Should You Deploy It on Kubernetes?

DeepSeek-R1 is a state-of-the-art reasoning model optimized for concise, structured responses. When deployed on Kubernetes, it benefits from:

  • Horizontal auto-scaling

  • Service resiliency

  • Seamless cloud-native integrations

  • Simplified updates and versioning

Check out our guide on why Kubernetes is ideal for deploying AI and ML workloads.


🔧 Prerequisites for Running DeepSeek-R1 in Kubernetes

To follow along, you’ll need a Kubernetes cluster running locally or in the cloud. For a local environment, tools such as kind, minikube, or k3d can spin one up in minutes.
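
For example, a three-node kind cluster matching the node names in the example output below could be created like this (the cluster name deepseek is an assumption inferred from that output):

# kind-config.yaml: one control plane, two workers
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
  - role: worker
  - role: worker

# cluster name "deepseek" is an assumption to match the output below
kind create cluster --name deepseek --config kind-config.yaml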

Verify your cluster:

kubectl get nodes

Example output:

NAME                      STATUS   ROLES           VERSION
deepseek-control-plane    Ready    control-plane   v1.32.0
deepseek-worker           Ready    <none>          v1.32.0
deepseek-worker2          Ready    <none>          v1.32.0

Even without GPUs, DeepSeek-R1 will run, just more slowly. For real-time reasoning, GPU-backed nodes such as the NVIDIA A100 are recommended.
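
To check whether your nodes expose GPUs to the scheduler (this assumes the NVIDIA device plugin is installed, which is what publishes the nvidia.com/gpu resource), you can inspect node capacity:

# requires the NVIDIA device plugin; GPU column shows <none> otherwise
kubectl get nodes -o custom-columns='NAME:.metadata.name,GPU:.status.capacity.nvidia\.com/gpu'

If a GPU shows up, the Ollama container in Step 1 can claim one by adding a resources.limits entry of nvidia.com/gpu: 1 to its spec.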

🚀 Step-by-Step Guide to Deploy DeepSeek-R1 with Ollama and Open WebUI

📦 Step 1: Deploy DeepSeek-R1 via Ollama

Use Ollama to serve the DeepSeek-R1 model inside a Kubernetes Pod. Here is a minimal Deployment manifest (note that emptyDir storage means the model is re-downloaded every time the Pod restarts):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ollama
  template:
    metadata:
      labels:
        app: ollama
    spec:
      containers:
        - name: ollama
          image: ollama/ollama:latest
          ports:
            - containerPort: 11434
          lifecycle:
            postStart:
              exec:
                # Ollama does not read model-selection env vars; pull the
                # model once the server is up (short sleep for startup lag).
                # System prompts belong in a Modelfile or per-request, not env.
                command: ["/bin/sh", "-c", "sleep 10 && ollama pull deepseek-r1:1.5b"]
          volumeMounts:
            - mountPath: /root/.ollama
              name: ollama-storage
      volumes:
        - name: ollama-storage
          emptyDir: {}

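Apply the manifest and wait for the rollout (the file name ollama-deployment.yaml is just an assumption about how you saved it):

# file name is an assumption
kubectl apply -f ollama-deployment.yaml
kubectl rollout status deploy/ollama
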
🌐 Step 2: Expose Ollama Service (Internal or External)

Create a Service so Open WebUI can reach Ollama over the cluster network. A plain ClusterIP would be enough for in-cluster traffic; type: NodePort additionally exposes the port on every node, which is convenient for quick external testing:

apiVersion: v1
kind: Service
metadata:
  name: ollama-service
spec:
  selector:
    app: ollama
  ports:
    - port: 11434
      targetPort: 11434
  type: NodePort

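To confirm the Service resolves and Ollama answers, you can hit its model-listing endpoint (/api/tags) from a throwaway Pod; curlimages/curl is just one convenient image choice:

# image choice is an assumption; any image with curl works
kubectl run curl-test --rm -it --image=curlimages/curl --restart=Never -- \
  curl -s http://ollama-service:11434/api/tags
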
Optional: Configure an NGINX Ingress Controller with cert-manager for TLS-based secure access.
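
A minimal Ingress sketch for that setup might look like the following (assumes the NGINX Ingress Controller is installed; ollama.example.com is a placeholder host, and the cert-manager TLS annotations are omitted for brevity):

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ollama-ingress
spec:
  ingressClassName: nginx
  rules:
    # placeholder host; replace with your own DNS name
    - host: ollama.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: ollama-service
                port:
                  number: 11434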


🖥️ Step 3: Add Open WebUI for Interactive Access

Open WebUI is a browser-based front end that lets users chat with your deployed model.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: openweb-ui
spec:
  replicas: 1
  selector:
    matchLabels:
      app: openweb-ui
  template:
    metadata:
      labels:
        app: openweb-ui
    spec:
      containers:
        - name: openweb-ui
          image: ghcr.io/open-webui/open-webui:main
          ports:
            - containerPort: 8080
          env:
            - name: WEBUI_NAME
              value: "DeepSeek India - Hardware Software Gheware"
            - name: OLLAMA_BASE_URL
              value: "http://ollama-service:11434"
            # Open WebUI reads DEFAULT_MODELS; OLLAMA_DEFAULT_MODEL is not
            # a recognized setting.
            - name: DEFAULT_MODELS
              value: "deepseek-r1:1.5b"
          volumeMounts:
            - mountPath: /app/backend/data
              name: openweb-data
      volumes:
        - name: openweb-data
          persistentVolumeClaim:
            claimName: openweb-ui-pvc

Chat histories persist in the volume mounted at /app/backend/data. The Deployment above references a claim named openweb-ui-pvc, so create that claim before applying it; a minimal example follows.
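
A minimal claim (the 2Gi size and reliance on the cluster’s default StorageClass are assumptions; adjust for your environment):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: openweb-ui-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      # 2Gi and the default StorageClass are assumptions
      storage: 2Gi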


🧪 Step 4: Test DeepSeek-R1 from Inside the Container

To test the model interactively:

kubectl exec -it deploy/ollama -- bash
ollama run deepseek-r1:1.5b

The first command opens a shell inside the Ollama Pod; the second starts an interactive session with the model, pulling it first if it is not already present.
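
You can also exercise the HTTP API without an interactive shell. Ollama’s /api/generate endpoint accepts a model name, a prompt, and a stream flag; this sketch reuses the throwaway curl Pod from the earlier connectivity check:

# image choice is an assumption; any image with curl works
kubectl run api-test --rm -it --image=curlimages/curl --restart=Never -- \
  curl -s http://ollama-service:11434/api/generate \
  -d '{"model": "deepseek-r1:1.5b", "prompt": "Why is the sky blue?", "stream": false}'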


🌍 Access the Chat Interface via Open WebUI

After a successful deployment, open the chat interface in your browser (replace the domain with your own):

http://deepseek.gheware.com/auth
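
If you have not wired up a Service or Ingress for Open WebUI yet, a quick way to reach it locally is a port-forward straight to the Deployment:

kubectl port-forward deploy/openweb-ui 8080:8080

Then browse to http://localhost:8080.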

🔐 For secure access, use OAuth2 Proxy to enforce login or integrate OpenID Connect (OIDC) via your identity provider.


🔭 Extend This Deployment Further

Ready to scale and optimize? Here’s what you can do next:

  • Add a HorizontalPodAutoscaler so the Ollama Deployment scales with load (see the sketch below)

  • Move model storage from emptyDir to a PersistentVolume so restarts skip the download

  • Add observability with Prometheus and Grafana
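
A CPU-based HorizontalPodAutoscaler sketch for the Ollama Deployment (the 70% target and 3-replica cap are arbitrary starting points; it also assumes metrics-server is installed and the container declares CPU requests, and note that each extra replica loads its own copy of the model):

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ollama-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ollama
  minReplicas: 1
  # cap of 3 is an arbitrary starting point
  maxReplicas: 3
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          # 70% is an arbitrary starting point; tune for your workload
          averageUtilization: 70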

You can also read our deep-dive post: How to Monitor AI Inference with Grafana and Prometheus


🎯 Conclusion: Why DeepSeek on Kubernetes Is a Game Changer

Deploying DeepSeek-R1 on Kubernetes offers a resilient and future-ready architecture for AI-powered reasoning engines.

Whether you’re building:

  • An internal AI assistant

  • A smart customer support bot

  • A research engine

This architecture scales with your needs.


📢 Let’s Make AI Infrastructure Easy for Everyone!

🔄 Like this guide? Bookmark and share it with your team.

💡 Interested in a Helm chart, Terraform script, or a GitHub Actions workflow to automate this entire setup?

👉 Connect with me on LinkedIn or explore more tutorials on BrainUpgrade.in/blogs

Start deploying smarter AI today!
