DeepSeek on Kubernetes: AI-Powered Reasoning at Scale

Introduction

As artificial intelligence continues to evolve, deploying AI-powered applications efficiently and at scale has become critical. Kubernetes, the de facto orchestration platform, plays a crucial role in managing containerized AI workloads, ensuring scalability, resilience, and ease of management. In this article, we explore DeepSeek on Kubernetes, a deployment that integrates DeepSeek-R1, a powerful reasoning AI model, with Open WebUI for seamless interaction.

Why Kubernetes for DeepSeek?

DeepSeek is an advanced reasoning model that benefits significantly from a containerized deployment within a Kubernetes cluster. Kubernetes provides:

  • Scalability: Effortlessly scale AI workloads across multiple nodes.

  • Resilience: Automatic pod rescheduling in case of failures.

  • Service Discovery: Manage microservices effectively using Kubernetes Services.

  • Persistent Storage: Use PVCs to store and manage AI model data across restarts.

  • Load Balancing: Distribute workloads efficiently across multiple replicas.

Deploying DeepSeek on Kubernetes

1. Kubernetes Cluster Setup

In our setup, we have a three-node Kubernetes cluster with the following nodes:

$ kubectl get nodes
NAME                     STATUS   ROLES           AGE    VERSION
deepseek-control-plane   Ready    control-plane   6d5h   v1.32.0
deepseek-worker          Ready    <none>          6d5h   v1.32.0
deepseek-worker2         Ready    <none>          6d5h   v1.32.0

Even if the Kubernetes nodes have no GPUs, DeepSeek-R1 will still function, although response times will be slower. GPU acceleration is recommended for optimal performance, especially for complex reasoning tasks.

Kubernetes clusters can be set up locally using tools like the following (a KIND example is shown after the list):

  • KIND (Kubernetes IN Docker)

  • Minikube

  • MicroK8s
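
For example, the three-node cluster shown above can be created with KIND from a small config file. The cluster name deepseek is what produces the node names listed earlier:

# kind-config.yaml: one control-plane node and two workers
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: worker
- role: worker

$ kind create cluster --name deepseek --config kind-config.yaml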

If deployed on a cloud provider, the setup can be exposed securely using an Ingress object that fronts the web interface with TLS and proper authentication, as sketched below.
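
The following is only a sketch: it assumes an ingress controller such as ingress-nginx is installed, a ClusterIP Service named openweb-ui-service exposing the Open WebUI (deployed in step 4) on port 8080, and a TLS secret named deepseek-tls. Those names are assumptions, not objects defined elsewhere in this article.

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: openweb-ui-ingress
spec:
  ingressClassName: nginx            # assumes the ingress-nginx controller
  tls:
  - hosts:
    - deepseek.gheware.com
    secretName: deepseek-tls         # hypothetical TLS secret
  rules:
  - host: deepseek.gheware.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: openweb-ui-service # hypothetical Service for Open WebUI
            port:
              number: 8080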

2. Deploying DeepSeek-R1 with Ollama

DeepSeek-R1 is deployed within Kubernetes using Ollama, which handles AI model inference. Below is the Kubernetes manifest for the Ollama Deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama
  labels:
    app: ollama
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ollama
  template:
    metadata:
      labels:
        app: ollama
    spec:
      containers:
      - name: ollama
        image: ollama/ollama:latest
        ports:
        - containerPort: 11434
        volumeMounts:
        - mountPath: /root/.ollama
          name: ollama-storage
        env:
        # Keep the model loaded in memory indefinitely once it has been pulled.
        # (The DeepSeek-R1 model itself is pulled interactively in step 5.)
        - name: OLLAMA_KEEP_ALIVE
          value: "-1"
      volumes:
      - name: ollama-storage
        emptyDir: {}
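
Note that emptyDir is ephemeral: the pulled model is lost whenever the pod is rescheduled and must be downloaded again. For durability, the volume can be backed by a PersistentVolumeClaim instead; the sketch below uses an illustrative claim name and size. Reference it from the Deployment by replacing emptyDir: {} with persistentVolumeClaim: claimName: ollama-pvc.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ollama-pvc            # hypothetical claim name
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi           # illustrative size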

3. Exposing Ollama as a Service

To allow other workloads in the cluster to reach Ollama by name, and optionally to expose it on each node's IP, we define a NodePort Service:

apiVersion: v1
kind: Service
metadata:
  name: ollama-service
spec:
  selector:
    app: ollama
  ports:
    - protocol: TCP
      port: 11434
      targetPort: 11434
  type: NodePort
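
To verify that the Service resolves and Ollama responds, you can query the API from inside the cluster; Ollama's /api/tags endpoint lists the models it has pulled. The throwaway curl pod below is purely illustrative:

$ kubectl run curl --rm -it --restart=Never --image=curlimages/curl -- \
    curl -s http://ollama-service:11434/api/tags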

4. Deploying Open WebUI

For an interactive experience, we integrate Open WebUI, which connects to Ollama and provides a user-friendly interface. The deployment is as follows:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: openweb-ui
  labels:
    app: openweb-ui
spec:
  replicas: 1
  selector:
    matchLabels:
      app: openweb-ui
  template:
    metadata:
      labels:
        app: openweb-ui
    spec:
      containers:
      - name: openweb-ui
        image: ghcr.io/open-webui/open-webui:main
        env:
        # Branding shown in the WebUI
        - name: WEBUI_NAME
          value: "DeepSeek India - Hardware Software Gheware"
        # Where Open WebUI reaches the Ollama API (the Service from step 3)
        - name: OLLAMA_BASE_URL
          value: "http://ollama-service:11434"
        # Pre-select DeepSeek-R1 in the model picker
        - name: DEFAULT_MODELS
          value: "deepseek-r1:1.5b"
        ports:
        - containerPort: 8080
        volumeMounts:
        - name: openweb-data
          mountPath: /app/backend/data
      volumes:
      - name: openweb-data
        persistentVolumeClaim:
          claimName: openweb-ui-pvc
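
The Deployment references a PersistentVolumeClaim named openweb-ui-pvc, which keeps chat history and settings across restarts but is not defined above. A minimal claim could look like this (the requested size is an assumption):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: openweb-ui-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi            # illustrative size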

5. Running Inference on DeepSeek-R1

To test the deployment, we can execute a command within the Ollama container:

$ kubectl exec -it deploy/ollama -- ollama run deepseek-r1:1.5b

This pulls the model on first use and starts an interactive session with it, allowing queries to be typed directly.
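
The same check works over HTTP without entering the container: Ollama's /api/generate endpoint runs a one-off prompt (again via an illustrative throwaway curl pod):

$ kubectl run curl --rm -it --restart=Never --image=curlimages/curl -- \
    curl -s http://ollama-service:11434/api/generate \
      -d '{"model": "deepseek-r1:1.5b", "prompt": "Why is the sky blue?", "stream": false}'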

Accessing Open WebUI

After deployment, and once DNS and the Ingress are in place, Open WebUI is accessible at:

http://deepseek.gheware.com/auth

This interface allows users to interact with DeepSeek-R1 through a chat-based environment.
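
On a local cluster without an Ingress, kubectl port-forward provides quick access to the same interface:

$ kubectl port-forward deploy/openweb-ui 8080:8080

Open WebUI is then reachable at http://localhost:8080.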

Conclusion

By deploying DeepSeek on Kubernetes, we achieve a scalable, resilient, and production-ready AI reasoning system. Kubernetes efficiently orchestrates DeepSeek-R1, ensuring smooth model execution and user interaction through Open WebUI. This architecture can be further extended by adding GPU acceleration, auto-scaling, and monitoring with Prometheus and Grafana.

For AI practitioners, Kubernetes offers the perfect foundation for deploying and managing reasoning models like DeepSeek-R1, paving the way for efficient, scalable AI-powered solutions.

Ready to explore AI on Kubernetes? Let’s deploy DeepSeek together!

 
