K3s Deployment Guide
Complete guide to deploying a production-ready K3s cluster on Proxmox with Longhorn storage
Overview
This guide documents the deployment of a high-availability K3s Kubernetes cluster running on Proxmox VE. The cluster consists of 7 nodes across two physical Proxmox hosts, with distributed storage provided by Longhorn.
- Control Plane: 3 nodes (HA etcd)
- Worker Nodes: 2 nodes
- Storage Nodes: 2 dedicated nodes
- Storage: ~930GB usable (Longhorn)
- Networking: MetalLB + NGINX Ingress
- Kubernetes Version: v1.35.4+k3s1
Architecture
Infrastructure Layout
| Node | Role | IP Address | Host | Specs |
|---|---|---|---|---|
| k3s-cp-1 | Control Plane | 10.0.0.20 | pve-lenova | 4 CPU, 8GB RAM, 20GB Disk |
| k3s-cp-2 | Control Plane | 10.0.0.21 | pve-lenova | 4 CPU, 8GB RAM, 20GB Disk |
| k3s-cp-3 | Control Plane | 10.0.0.22 | pve-lenova | 4 CPU, 8GB RAM, 20GB Disk |
| k3s-worker-1 | Worker | 10.0.0.23 | pve-lenova | 8 CPU, 16GB RAM, 40GB Disk |
| k3s-worker-2 | Worker | 10.0.0.24 | pve-lenova | 8 CPU, 16GB RAM, 40GB Disk |
| k3s-storage-1 | Storage | 10.0.0.25 | pve-lenova | 4 CPU, 8GB RAM, 500GB Disk |
| k3s-storage-2 | Storage | 10.0.0.26 | pve-dell | 4 CPU, 8GB RAM, 500GB Disk |
Network Topology
- Cluster Network: 10.0.0.0/24 (Proxmox vmbr0)
- Pod Network: 10.42.0.0/16 (Flannel VXLAN)
- Service Network: 10.43.0.0/16
- WireGuard Mesh: 10.250.0.0/24
- MetalLB Range: 10.0.0.100-10.0.0.110
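The pod and service ranges above are the K3s defaults. Once the cluster is up (see the installation steps below), each node's allocated slice of the pod network can be checked from the management machine, for example:
# Show the pod CIDR assigned to each node by Flannel/K3s
kubectl get nodes -o custom-columns=NAME:.metadata.name,POD_CIDR:.spec.podCIDR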
Prerequisites
Proxmox Requirements
- Proxmox VE 8.x or later
- At least 2 nodes for HA (pve-lenova, pve-dell)
- Shared network (vmbr0) accessible by all nodes
- SSH access between nodes
VM Template
Create an Ubuntu 24.04 LXC template or VM template with:
- Containerd support
- SSH key authentication
- WireGuard tools installed
- Static IP configuration
Ensure all VMs can reach each other on the 10.0.0.0/24 network. The WireGuard mesh (10.250.0.0/24) provides additional connectivity for external access.
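A quick loop from any node (or the management machine) confirms that every planned address responds; the IPs are taken from the table above:
# Check that every planned node IP answers on the cluster network
for ip in 10.0.0.{20..26}; do
  ping -c 1 -W 1 "$ip" > /dev/null && echo "$ip reachable" || echo "$ip UNREACHABLE"
done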
VM Setup
1. Create VMs on Proxmox
Use the Proxmox web interface or CLI to create VMs. Here's an example for the first control plane node:
# On pve-lenova (10.0.0.5)
qm create 200 \
--name k3s-cp-1 \
--memory 8192 \
--cores 4 \
--net0 virtio,bridge=vmbr0 \
--scsihw virtio-scsi-single \
--scsi0 local-lvm:20 \
--ostype l26 \
--agent enabled=1
# Note: --ipconfig0 only takes effect with a cloud-init enabled image plus a
# cloud-init drive (e.g. qm set 200 --ide2 local-lvm:cloudinit); otherwise set
# the static IP inside the guest.
qm set 200 --ipconfig0 ip=10.0.0.20/24,gw=10.0.0.1
qm start 200
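The remaining VMs follow the same pattern with the names, addresses, and sizes from the table above. As a sketch only (VM IDs 201-205 are arbitrary choices, and OS/cloud-init image provisioning is assumed to be handled by your template workflow), the other pve-lenova nodes could be created in a loop:
# Sketch: create the remaining pve-lenova VMs
# Fields: vmid:name:ip:cores:memory_mb:disk_gb
for spec in \
  "201:k3s-cp-2:10.0.0.21:4:8192:20" \
  "202:k3s-cp-3:10.0.0.22:4:8192:20" \
  "203:k3s-worker-1:10.0.0.23:8:16384:40" \
  "204:k3s-worker-2:10.0.0.24:8:16384:40" \
  "205:k3s-storage-1:10.0.0.25:4:8192:500"; do
  IFS=: read -r vmid name ip cores mem disk <<< "$spec"
  qm create "$vmid" --name "$name" --memory "$mem" --cores "$cores" \
    --net0 virtio,bridge=vmbr0 --scsihw virtio-scsi-single \
    --scsi0 "local-lvm:${disk}" --ostype l26 --agent enabled=1
  qm set "$vmid" --ipconfig0 "ip=${ip}/24,gw=10.0.0.1"
done
# k3s-storage-2 (10.0.0.26) is created the same way on pve-dell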
2. Base Configuration
After VM creation, SSH into each node and configure:
# Update system
apt update && apt upgrade -y
# Install required packages
apt install -y curl wget vim htop wireguard-tools
# Set hostname
hostnamectl set-hostname k3s-cp-1
# Configure hosts file
cat >> /etc/hosts << EOF
10.0.0.20 k3s-cp-1
10.0.0.21 k3s-cp-2
10.0.0.22 k3s-cp-3
10.0.0.23 k3s-worker-1
10.0.0.24 k3s-worker-2
10.0.0.25 k3s-storage-1
10.0.0.26 k3s-storage-2
EOF
# Disable swap (required for Kubernetes)
swapoff -a
sed -i '/swap/d' /etc/fstab
# Load kernel modules
cat > /etc/modules-load.d/k3s.conf << EOF
overlay
br_netfilter
EOF
modprobe overlay
modprobe br_netfilter
# Configure sysctl
cat > /etc/sysctl.d/k3s.conf << EOF
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1
EOF
sysctl --system
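A quick check confirms the modules and sysctl settings took effect before installing K3s:
# Verify kernel modules are loaded and forwarding/bridge settings are active
lsmod | grep -E 'overlay|br_netfilter'
sysctl net.ipv4.ip_forward net.bridge.bridge-nf-call-iptables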
K3s Installation
1. First Control Plane Node
On k3s-cp-1 (10.0.0.20):
# Install K3s server with embedded etcd
curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=v1.35.4+k3s1 sh -s - server \
--cluster-init \
--tls-san 10.0.0.20 \
--tls-san 10.0.0.21 \
--tls-san 10.0.0.22 \
--tls-san k3s-cp-1 \
--tls-san k3s-cp-2 \
--tls-san k3s-cp-3 \
--node-ip 10.0.0.20 \
--advertise-address 10.0.0.20 \
--disable servicelb \
--disable traefik
# Get the node token for joining other nodes
cat /var/lib/rancher/k3s/server/node-token
Save the node token output. It will look like: K10xxxxxxxx::server:xxxxxxxx. You'll need this token when joining the remaining control plane and worker nodes.
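Before joining further nodes, it's worth confirming the first server is healthy. K3s bundles kubectl, so this works directly on k3s-cp-1:
# Run on k3s-cp-1: the node should report Ready and the kube-system pods should be Running
k3s kubectl get nodes
k3s kubectl get pods -n kube-system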
2. Configure kubectl
# On your management machine (kvm-00)
mkdir -p ~/.kube
scp root@10.0.0.20:/etc/rancher/k3s/k3s.yaml ~/.kube/config
sed -i 's/127.0.0.1/10.0.0.20/g' ~/.kube/config
# Test connection
kubectl get nodes
Joining Additional Nodes
Join Additional Control Plane Nodes
On k3s-cp-2 and k3s-cp-3:
# On k3s-cp-2 (10.0.0.21)
curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=v1.35.4+k3s1 sh -s - server \
--server https://10.0.0.20:6443 \
--token <NODE_TOKEN> \
--tls-san 10.0.0.21 \
--node-ip 10.0.0.21 \
--advertise-address 10.0.0.21 \
--disable servicelb \
--disable traefik
# On k3s-cp-3 (10.0.0.22)
curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=v1.35.4+k3s1 sh -s - server \
--server https://10.0.0.20:6443 \
--token <NODE_TOKEN> \
--tls-san 10.0.0.22 \
--node-ip 10.0.0.22 \
--advertise-address 10.0.0.22 \
--disable servicelb \
--disable traefik
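Once both joins finish, all three servers should appear with the control-plane and etcd roles:
# From the management machine: expect three Ready nodes with control-plane,etcd roles
kubectl get nodes -o wide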
Join Worker Nodes
On k3s-worker-1 and k3s-worker-2:
# On k3s-worker-1 (10.0.0.23); on k3s-worker-2 use --node-ip 10.0.0.24
curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=v1.35.4+k3s1 sh -s - agent \
--server https://10.0.0.20:6443 \
--token <NODE_TOKEN> \
--node-ip 10.0.0.23
Label Storage Nodes
On k3s-storage-1 and k3s-storage-2, join as workers then label:
# Join as worker (same command as above with appropriate IP)
# Then label from management machine:
kubectl label node k3s-storage-1 node-role.kubernetes.io/storage=true
kubectl label node k3s-storage-2 node-role.kubernetes.io/storage=true
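The labels can be confirmed from the management machine:
# Both storage nodes should be listed
kubectl get nodes -l node-role.kubernetes.io/storage=true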
Longhorn Installation
1. Prerequisites
Install open-iscsi on every node that will attach Longhorn volumes (at minimum the storage nodes and both workers):
# On the storage and worker nodes
apt install -y open-iscsi
systemctl enable --now iscsid
2. Install Longhorn via Helm
# Add Longhorn Helm repository
helm repo add longhorn https://charts.longhorn.io
helm repo update
# Install Longhorn
helm install longhorn longhorn/longhorn \
--namespace longhorn-system \
--create-namespace \
--set defaultSettings.defaultDataPath="/var/lib/longhorn" \
--set persistence.defaultClassReplicaCount=1
# Wait for pods to be ready
kubectl wait --for=condition=ready pod \
-l app=longhorn-manager \
-n longhorn-system \
--timeout=300s
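Until the ingress and Caddy entries below are in place, the Longhorn UI can be reached with a temporary port-forward (longhorn-frontend is the UI service installed by the chart):
# Forward the Longhorn UI to http://localhost:8080
kubectl -n longhorn-system port-forward svc/longhorn-frontend 8080:80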
3. Create Backup Target
Longhorn requires a backup target (even if not used):
kubectl apply -f - << EOF
apiVersion: longhorn.io/v1beta2
kind: BackupTarget
metadata:
  name: default
  namespace: longhorn-system
spec:
  backupTargetURL: "nfs://localhost/backup"
  credentialSecret: ""
  pollInterval: 5m
EOF
Storage Class Configuration
Set Longhorn as Default
# The Longhorn chart marks its own StorageClass as default; remove the default
# annotation from local-path so only one default class remains
kubectl patch storageclass local-path -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"false"}}}'
# Verify
kubectl get storageclass
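If the output does not show longhorn marked as (default), it can be set explicitly:
# Mark longhorn as the default StorageClass (only needed if it isn't already)
kubectl patch storageclass longhorn -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'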
Storage Class Parameters
| Parameter | Default | Description |
|---|---|---|
| numberOfReplicas | 1 | Number of volume replicas |
| staleReplicaTimeout | 30 | Minutes to wait before cleanup |
| fromBackup | "" | Restore from backup URL |
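These parameters can also be used on additional StorageClasses. For example, a class with two replicas for data that should survive the loss of one storage node (a sketch; the name longhorn-2replica is arbitrary):
kubectl apply -f - << EOF
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-2replica
provisioner: driver.longhorn.io
allowVolumeExpansion: true
parameters:
  numberOfReplicas: "2"
  staleReplicaTimeout: "30"
EOF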
PVC Example
Create a PersistentVolumeClaim
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-pvc
  namespace: default
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn
  resources:
    requests:
      storage: 10Gi
Use in a Pod
apiVersion: v1
kind: Pod
metadata:
  name: test-pod
  namespace: default
spec:
  containers:
    - name: app
      image: nginx:alpine
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: my-pvc
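After applying both manifests, the claim should bind and the volume should be visible inside the pod:
# PVC should report Bound, and /data should be a mounted Longhorn volume
kubectl get pvc my-pvc
kubectl exec test-pod -- df -h /data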
MetalLB Setup
Install MetalLB
# Install MetalLB
kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.14.5/config/manifests/metallb-native.yaml
# Wait for deployment
kubectl wait --for=condition=ready pod \
-l app=metallb \
-n metallb-system \
--timeout=300s
Configure IP Address Pool
kubectl apply -f - << EOF
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: default-pool
  namespace: metallb-system
spec:
  addresses:
    - 10.0.0.100-10.0.0.110
  autoAssign: true
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: default
  namespace: metallb-system
spec:
  ipAddressPools:
    - default-pool
EOF
Ensure the MetalLB IP range (10.0.0.100-110) doesn't conflict with existing devices on your network. We moved from 10.0.0.50 due to a conflict with the router.
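A quick way to confirm MetalLB is handing out addresses is a throwaway LoadBalancer service (lb-test is an arbitrary name):
# Create a test deployment and expose it; EXTERNAL-IP should come from the pool
kubectl create deployment lb-test --image=nginx:alpine
kubectl expose deployment lb-test --type=LoadBalancer --port=80
kubectl get svc lb-test
# Clean up
kubectl delete svc,deployment lb-test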
NGINX Ingress Controller
Install NGINX Ingress
# Add Helm repo
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update
# Install with LoadBalancer service type
helm install ingress-nginx ingress-nginx/ingress-nginx \
--namespace ingress-nginx \
--create-namespace \
--set controller.service.type=LoadBalancer \
--set controller.service.loadBalancerIP=10.0.0.100
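The controller service should come up with the MetalLB address requested above:
# EXTERNAL-IP should show 10.0.0.100
kubectl get svc -n ingress-nginx ingress-nginx-controller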
Create an Ingress Resource
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app
  namespace: default
spec:
  ingressClassName: nginx
  rules:
    - host: myapp.devtek.uk
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-service
                port:
                  number: 80
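Before DNS or Caddy points at the cluster, the ingress can be exercised directly against the load balancer IP with a Host header (assuming my-service exists in the default namespace):
# Should return the response from my-service
curl -H "Host: myapp.devtek.uk" http://10.0.0.100/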
Caddy Reverse Proxy Configuration
Caddy runs on kvm-01 (149.5.1.252) and proxies traffic to the K3s cluster via MetalLB.
Caddyfile Example
# /etc/caddy/Caddyfile on kvm-01
# Longhorn UI
longhorn.devtek.uk {
reverse_proxy 10.0.0.100:80
}
# K3s Docs (this site)
k3s-docs.devtek.uk {
reverse_proxy 10.0.0.100:80
}
# K3s Dashboard
k3s.devtek.uk {
reverse_proxy 10.0.0.100:80
}
# Your apps
myapp.devtek.uk {
reverse_proxy 10.0.0.100:80
}
Reload Caddy
caddy reload --config /etc/caddy/Caddyfile
kubectl Commands
Essential Commands
# Get cluster info
kubectl cluster-info
# View nodes
kubectl get nodes -o wide
# View all pods
kubectl get pods -A
# View pods in a namespace
kubectl get pods -n longhorn-system
# Describe a resource
kubectl describe node k3s-cp-1
kubectl describe pod <pod-name> -n <namespace>
# View logs
kubectl logs <pod-name> -n <namespace>
kubectl logs <pod-name> -n <namespace> -f # follow
# Execute into a pod
kubectl exec -it <pod-name> -n <namespace> -- /bin/sh
# Port forward
kubectl port-forward svc/<service> 8080:80 -n <namespace>
Resource Management
# Apply a configuration
kubectl apply -f manifest.yaml
# Delete a resource
kubectl delete -f manifest.yaml
kubectl delete pod <pod-name> -n <namespace>
# Edit a resource
kubectl edit deployment <name> -n <namespace>
# Scale a deployment
kubectl scale deployment <name> --replicas=3 -n <namespace>
Monitoring
Built-in Metrics
# View node metrics (kubectl top needs metrics-server, which K3s installs by default)
kubectl top nodes
# View pod metrics
kubectl top pods -A
kubectl top pods -n longhorn-system
Longhorn Monitoring
- Access Longhorn UI: https://longhorn.devtek.uk
- Check volume status and replica health
- Monitor storage usage per node
Troubleshooting
Common Issues
Pod Stuck in Pending
# Check events
kubectl describe pod <pod-name>
# Common causes:
# - Insufficient resources
# - PVC not binding (check Longhorn)
# - Node selector mismatch
Longhorn Volume Issues
# Check Longhorn pods
kubectl get pods -n longhorn-system
# Check backup target
kubectl get backuptargets -n longhorn-system
# View Longhorn logs
kubectl logs -n longhorn-system -l app=longhorn-manager
MetalLB Not Working
# Check MetalLB pods
kubectl get pods -n metallb-system
# Check IP pool
kubectl get ipaddresspools -n metallb-system
# Verify no IP conflicts
ping 10.0.0.100
Node Not Ready
# Check node status
kubectl describe node <node-name>
# On the node, check K3s service
systemctl status k3s # server nodes
systemctl status k3s-agent # worker nodes
# Check logs
journalctl -u k3s -f
Backup & Restore
Backup etcd
# On a control plane node
k3s etcd-snapshot save --name <snapshot-name>
# List snapshots
k3s etcd-snapshot ls
# Prune old snapshots
k3s etcd-snapshot prune --snapshot-retention 5
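K3s can also take snapshots on a schedule. A minimal sketch of the relevant entries in /etc/rancher/k3s/config.yaml on the server nodes (the cron expression and retention count are example values):
# /etc/rancher/k3s/config.yaml on each server node; restart k3s after editing
etcd-snapshot-schedule-cron: "0 3 * * *"
etcd-snapshot-retention: 5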
Restore from Snapshot
# Stop K3s on all nodes
systemctl stop k3s # or k3s-agent
# Restore on first control plane node
k3s server --cluster-reset --cluster-reset-restore-path=/var/lib/rancher/k3s/server/db/snapshots/<snapshot-name>
# Restart K3s on the node that was restored
systemctl start k3s
# On the other control plane nodes, clear the old etcd data and restart so they rejoin
rm -rf /var/lib/rancher/k3s/server/db
systemctl start k3s
Longhorn Backups
Configure a proper backup target (S3/NFS) in Longhorn settings for volume backups.
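For example, the placeholder BackupTarget created earlier could be pointed at a real NFS export; the server address and export path below are hypothetical:
# Hypothetical NFS export; replace with your actual backup server and path
kubectl -n longhorn-system patch backuptarget default --type merge \
  -p '{"spec":{"backupTargetURL":"nfs://10.0.0.5:/export/longhorn-backups"}}'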
Command Cheatsheet
Quick Reference
| Task | Command |
|---|---|
| SSH to control plane | ssh -i ~/.ssh/id_ed25519_kflix root@10.250.0.1 |
| Get nodes | kubectl get nodes -o wide |
| Get all pods | kubectl get pods -A |
| Longhorn UI | https://longhorn.devtek.uk |
| Architecture diagram | https://k3s-docs.devtek.uk |
| Check storage | kubectl get storageclass |
| Check PVCs | kubectl get pvc -A |
| Reload Caddy | caddy reload --config /etc/caddy/Caddyfile |