Keywords: DeepSeek Kubernetes deployment, AI on Kubernetes, DeepSeek-R1, scalable AI inference, Kubernetes Open WebUI, Ollama deployment
🧠 Run advanced reasoning AI models like DeepSeek-R1 on Kubernetes and deliver a fast, scalable, and secure experience. This guide shows you how.
🔗 Table of Contents
What Is DeepSeek-R1 and Why Should You Deploy It on Kubernetes?
Prerequisites for Running DeepSeek-R1 in Kubernetes
Step-by-Step Guide to Deploy DeepSeek-R1 with Ollama and Open WebUI
Extend This Deployment Further
Conclusion: Why DeepSeek on Kubernetes Is a Game Changer
🤖 What Is DeepSeek-R1 and Why Should You Deploy It on Kubernetes?
DeepSeek-R1 is an open-weight reasoning model that works through problems step by step before producing a structured answer. When deployed on Kubernetes, it benefits from:
Horizontal auto-scaling
Service resiliency
Seamless cloud-native integrations
Simplified updates and versioning
Check out our guide on why Kubernetes is ideal for deploying AI and ML workloads.
🔧 Prerequisites for Running DeepSeek-R1 in Kubernetes
To follow along, you’ll need a Kubernetes cluster running locally or in the cloud. Tools such as kind, minikube, or k3d can spin up a local environment.
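For example, the node names in the output below match a kind cluster named deepseek (the name is an assumption); a three-node cluster like that can be created with a config such as this:

# kind-config.yaml: one control plane and two workers
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
  - role: worker
  - role: worker

kind create cluster --name deepseek --config kind-config.yaml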
Verify your cluster:
kubectl get nodes
Example output:
NAME                     STATUS   ROLES           VERSION
deepseek-control-plane   Ready    control-plane   v1.32.0
deepseek-worker          Ready    <none>          v1.32.0
deepseek-worker2         Ready    <none>          v1.32.0
DeepSeek-R1 will run even without GPUs, albeit more slowly. For real-time reasoning, GPU-backed nodes (such as NVIDIA A100s) are recommended.
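If your nodes do have GPUs and the NVIDIA device plugin for Kubernetes is installed (see the Extend section below), you can request one in the Ollama container spec from Step 1, for example:

# inside the ollama container spec
resources:
  limits:
    nvidia.com/gpu: 1   # schedules the Pod onto a GPU node via the device plugin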
🚀 Step-by-Step Guide to Deploy DeepSeek-R1 with Ollama and Open WebUI
📦 Step 1: Deploy DeepSeek-R1 via Ollama
Use Ollama to serve the DeepSeek model within a Kubernetes Pod. The Deployment below runs the Ollama server; the model itself is pulled when you first run it in Step 4:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ollama
  template:
    metadata:
      labels:
        app: ollama
    spec:
      containers:
        - name: ollama
          image: ollama/ollama:latest
          ports:
            - containerPort: 11434
          volumeMounts:
            - mountPath: /root/.ollama   # Ollama stores pulled models here
              name: ollama-storage
      volumes:
        - name: ollama-storage
          emptyDir: {}   # ephemeral: models are re-downloaded if the Pod restarts
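Apply the manifest and wait for the rollout to finish (the file name below is just what this example assumes you saved it as):

kubectl apply -f ollama-deployment.yaml
kubectl rollout status deploy/ollama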
🌐 Step 2: Expose Ollama Service (Internal or External)
Create a NodePort Service so Open WebUI can reach Ollama (a plain ClusterIP would suffice for in-cluster traffic; the NodePort type below also exposes Ollama on every node):
apiVersion: v1
kind: Service
metadata:
  name: ollama-service
spec:
  selector:
    app: ollama
  ports:
    - port: 11434
      targetPort: 11434
  type: NodePort
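To check that the Service resolves inside the cluster, you can hit Ollama's /api/tags endpoint (it lists installed models) from a throwaway curl Pod:

kubectl run curl-test --rm -it --restart=Never --image=curlimages/curl -- \
  curl http://ollama-service:11434/api/tags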
Optional: Configure an NGINX Ingress Controller with cert-manager for TLS-based secure access.
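A minimal sketch of that optional setup, assuming the nginx ingress class is installed and a cert-manager ClusterIssuer named letsencrypt-prod exists (the hostname is a placeholder):

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ollama-ingress
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod   # assumed issuer name
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - ollama.example.com      # placeholder hostname
      secretName: ollama-tls      # cert-manager stores the certificate here
  rules:
    - host: ollama.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: ollama-service
                port:
                  number: 11434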
🖥️ Step 3: Add Open WebUI for Interactive Access
Open WebUI is a front end that lets users chat with your deployed model from the browser.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: openweb-ui
spec:
  replicas: 1
  selector:
    matchLabels:
      app: openweb-ui
  template:
    metadata:
      labels:
        app: openweb-ui
    spec:
      containers:
        - name: openweb-ui
          image: ghcr.io/open-webui/open-webui:main
          ports:
            - containerPort: 8080
          env:
            - name: WEBUI_NAME
              value: "DeepSeek India - Hardware Software Gheware"
            - name: OLLAMA_BASE_URL
              value: "http://ollama-service:11434"
            - name: OLLAMA_DEFAULT_MODEL
              value: "deepseek-r1:1.5b"
          volumeMounts:
            - mountPath: /app/backend/data
              name: openweb-data
      volumes:
        - name: openweb-data
          persistentVolumeClaim:
            claimName: openweb-ui-pvc
Chat histories are persisted through the PersistentVolumeClaim mounted above.
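The claim referenced by the Deployment must exist before the Pod can start. A minimal sketch, assuming your cluster has a default StorageClass and that 2Gi is enough for chat data:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: openweb-ui-pvc
spec:
  accessModes:
    - ReadWriteOnce    # single-node read-write is enough for one replica
  resources:
    requests:
      storage: 2Gi     # assumed size; adjust for your usage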
🧪 Step 4: Test DeepSeek-R1 from Inside the Container
To test the model:
kubectl exec -it deploy/ollama -- bash
ollama run deepseek-r1:1.5b
This opens an interactive prompt with direct access to the model; the first run pulls the model weights if they aren’t already present.
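You can also call the HTTP API directly; /api/generate is Ollama's standard completion endpoint, tested here from a throwaway curl Pod:

kubectl run api-test --rm -it --restart=Never --image=curlimages/curl -- \
  curl http://ollama-service:11434/api/generate \
  -d '{"model": "deepseek-r1:1.5b", "prompt": "Why is the sky blue?", "stream": false}'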
🌍 Access the Chat Interface via Open WebUI
After a successful deployment, open the chat interface in your browser (replace with your own domain URL):
http://deepseek.gheware.com/auth
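If you haven’t set up a domain or Ingress yet, kubectl port-forward gives you quick local access (8080 is Open WebUI’s default port):

kubectl port-forward deploy/openweb-ui 8080:8080
# then open http://localhost:8080 in your browser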
🔐 For secure access, use OAuth2 Proxy to enforce login or integrate OpenID Connect (OIDC) via your identity provider.
🔭 Extend This Deployment Further
Ready to scale and optimize? Here’s what you can do next:
📈 Monitor model metrics with Prometheus + Grafana
⚡ Enable GPU acceleration using NVIDIA device plugin for Kubernetes
📊 Load test inference endpoints using k6
🚀 Auto-scale pods using HPA (Horizontal Pod Autoscaler; see the sketch below)
🔀 Route traffic intelligently using Istio Service Mesh
You can also read our deep-dive post: How to Monitor AI Inference with Grafana and Prometheus
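As a sketch of the HPA item above: the following autoscaler scales the Ollama Deployment on CPU utilization. It assumes metrics-server is installed and that the Ollama container declares CPU requests (the Step 1 manifest does not set any, so add them first):

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ollama-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ollama
  minReplicas: 1
  maxReplicas: 3
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale out when average CPU exceeds 70% of requests

Keep in mind that with the emptyDir volume from Step 1, each new replica downloads its own copy of the model; a shared or pre-warmed volume avoids that.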
🎯 Conclusion: Why DeepSeek on Kubernetes Is a Game Changer
Deploying DeepSeek-R1 on Kubernetes offers a resilient and future-ready architecture for AI-powered reasoning engines.
Whether you’re building:
An internal AI assistant
A smart customer support bot
A research engine
This architecture scales with your needs.
📢 Let’s Make AI Infrastructure Easy for Everyone!
🔄 Like this guide? Bookmark and share it with your team.
💡 Interested in a Helm chart, Terraform script, or a GitHub Actions workflow to automate this entire setup?
👉 Connect with me on LinkedIn or explore more tutorials on BrainUpgrade.in/blogs
Start deploying smarter AI today!