Kubernetes
Docker
Linux

Resilient & Multi-Tenant Kubernetes Cluster

Atul Mishra
January 15, 2026

Welcome back to my tech corner. Today, we are going to take a deep dive into some fundamental Kubernetes topics, including Init Containers, Deployments, DaemonSets, and Taints & Tolerations.

Prerequisites: Since I will be running Kubernetes locally using Docker and Kind, you will need:

  • Docker Desktop installed on your system.
  • Kind (Kubernetes in Docker) installed along with the kubectl client.

The Problem


Imagine you are working for a company with two distinct departments: Production and Monitoring. You have a cluster with three worker nodes. However, Worker Node 2 is reserved exclusively for critical work.

The Rule: Production apps must not run on Worker Node 2.

The Exception: Monitoring apps (like log collectors) must run on every node, regardless of restrictions.

The Architecture

Blog post image

This is the high-level view of our Nodes:

  • Worker Nodes 1 & 2: Will host the Frontend and Backend of the production application.
  • Worker Node 3: Is the only node capable of running the SSD Cache app (simulating a hardware dependency).
  • All Nodes: Must run the Monitoring Log Collector for compliance.

Step 1: Cluster Setup


First, we will create a cluster for this project with 3 worker nodes using a kind.yaml configuration file. Since we are using Kind, we need to configure port mapping to ensure NodePort services work correctly from our local machine.

The kind.yaml configuration:

Blog post image
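The original configuration is shown only as a screenshot, so here is a minimal sketch of what kind.yaml likely contains (node roles and the 30009 port mapping are inferred from the rest of the post):

```yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  # Map NodePort 30009 to localhost so the Frontend is reachable in a browser
  extraPortMappings:
  - containerPort: 30009
    hostPort: 30009
- role: worker
- role: worker
- role: worker
```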

Command to create the cluster:

$ kind create cluster --image [image-name] --name [cluster-name] --config kind.yaml

Verify the setup: Run the following commands to check your cluster status and nodes:

kind get clusters
kubectl get nodes

Blog post image

Step 2: Namespaces & Node Labeling


Now, let's create the Production and Monitoring namespaces. We will also taint Worker 2 for critical workloads and label Worker 3 for node affinity.

1. Create Namespaces (Imperative Commands):

kubectl create namespace prod
kubectl create namespace monitoring


2. Apply the Taint: We taint Worker 2 so standard pods cannot schedule there.

# Syntax: kubectl taint nodes [node-name] key=value:effect

kubectl taint nodes cka-dual-tenant-cluster-worker2 restricted=true:NoSchedule


3. Apply the Label: We label Worker 3 to simulate a node with a specific hardware feature (SSD).

kubectl label nodes cka-dual-tenant-cluster-worker3 ssd=true
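To double-check that the taint and the label landed where we expect (node names assume the cluster is named cka-dual-tenant-cluster, as in the commands above):

```shell
# Should show restricted=true:NoSchedule under Taints
kubectl describe node cka-dual-tenant-cluster-worker2 | grep -i taint

# Should list only worker3
kubectl get nodes -l ssd=true
```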

Step 3: Deploying Production Apps


The Backend App

We will create a Deployment that runs a simple Go application. The Twist: We will add an Init Container. The main application won't start until this container finishes its job (simulating a "Wait for Database" check).

The backend.yaml file

apiVersion: apps/v1
kind: Deployment
metadata:
  name: backend-app
  namespace: prod
  labels:
    app: backend
    tier: api
spec:
  replicas: 2
  selector:
    matchLabels:
      app: backend
  template:
    metadata:
      labels:
        app: backend
    spec:
      # 1. This container runs FIRST.
      initContainers:
      - name: check-db-ready
        image: busybox:1.28
        # Simulates waiting for a database for 10 seconds
        command: ['sh', '-c', 'echo "Checking database connection..."; sleep 10; echo "DB is up!";']

      # 2. This container starts only after the Init Container finishes.
      containers:
      - name: main-app
        image: gcr.io/google-samples/hello-app:1.0
        ports:
        - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: backend-service
  namespace: prod
spec:
  # ClusterIP means it is ONLY accessible inside the cluster (secure)
  type: ClusterIP
  selector:
    app: backend
  ports:
  - port: 80         # The port other pods use to talk to this service
    targetPort: 8080 # The port the container is actually listening on
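
Apply the manifest (assuming it is saved as backend.yaml):

```shell
kubectl apply -f backend.yaml
```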

Observation: Immediately after applying, if you run kubectl get pods -n prod -w, you will see the status transition from Init:0/1 → PodInitializing → Running. This proves our resilience logic is working.

Blog post image

The Frontend App


The Frontend app runs an Nginx image with 2 replicas, exposed on port 30009.

The frontend.yaml file

apiVersion: apps/v1
kind: Deployment
metadata:
  name: frontend-app
  namespace: prod
  labels:
    app: frontend
    tier: web
spec:
  replicas: 2
  selector:
    matchLabels:
      app: frontend
  template:
    metadata:
      labels:
        app: frontend
    spec:
      containers:
      - name: nginx
        image: nginx:alpine
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: frontend-service
  namespace: prod
spec:
  # NodePort opens a port on your computer so you can access it in browser
  type: NodePort
  selector:
    app: frontend
  ports:
  - port: 80
    targetPort: 80
    nodePort: 30009 # We force a specific port for easy testing
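
Apply the manifest and check which nodes the pods landed on (assuming it is saved as frontend.yaml):

```shell
kubectl apply -f frontend.yaml
kubectl get pods -n prod -o wide
```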

Observation: After inspecting the pods, you will notice that the Frontend and Backend apps are only running on Worker 1 and Worker 3. Worker Node 2 is skipped entirely because of the taint we applied earlier.

Blog post image

Step 4: The Monitoring Agent (DaemonSet)


Now we will deploy the Log Collector. This DaemonSet utilizes a Toleration to ignore the "restricted" taint we created in Step 2.

The monitor-agent.yaml file

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: log-collector
  namespace: monitoring
  labels:
    app: logging
spec:
  selector:
    matchLabels:
      app: logging
  template:
    metadata:
      labels:
        app: logging
    spec:
      # 1. TOLERATIONS: This magic key allows this pod to land on the tainted node
      tolerations:
      - key: "restricted"
        operator: "Equal"
        value: "true"
        effect: "NoSchedule"

      containers:
      - name: fluentd-simulator
        image: busybox
        args:
        - /bin/sh
        - -c
        - >
          i=0;
          while true;
          do
            echo "$i: Collecting logs from node $(printenv MY_NODE_NAME)...";
            i=$((i+1));
            sleep 10;
          done
        env:
        # This helps us see which node the pod is actually running on in the logs
        - name: MY_NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName

This DaemonSet uses a simple BusyBox image to simulate log collection. Because of the Toleration, this pod is allowed to run on all nodes, including the restricted Worker Node 2.
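After applying, one Pod per node should appear, including on the tainted Worker 2 (file name assumed to be monitor-agent.yaml):

```shell
kubectl apply -f monitor-agent.yaml
kubectl get pods -n monitoring -o wide
```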

Step 5: Node Affinity (SSD Cache)


Finally, let's use Node Affinity. We want a specific "Database Cache" pod that only runs on nodes backed by fast SSDs (Worker Node 3, which we labeled ssd=true).

The ssd-cache.yaml file (a sketch: a Redis Deployment pinned to the ssd=true node via required Node Affinity)

ssd-cache.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ssd-cache
  namespace: monitoring
  labels:
    app: ssd-cache
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ssd-cache
  template:
    metadata:
      labels:
        app: ssd-cache
    spec:
      # NODE AFFINITY: only schedule on nodes labeled ssd=true (Worker 3)
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: ssd
                operator: In
                values:
                - "true"
      containers:
      - name: redis-cache
        image: redis:alpine
        ports:
        - containerPort: 6379

This deploys a simple Redis image that will strictly adhere to our hardware requirements.
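Apply it and confirm the scheduling decision (file name assumed to be ssd-cache.yaml); the NODE column for the cache pod should show only worker3:

```shell
kubectl apply -f ssd-cache.yaml
kubectl get pods -n monitoring -o wide
```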

Verification


1. Visualizing the Nodes


Let's look at a detailed view of how the pods are distributed across the worker nodes.

Blog post image

If you check the monitoring namespace, you will see:

  • The SSD Cache Pod is running only on Worker 3 (due to Affinity).
  • The Log Collector is running on all 3 nodes (due to DaemonSet + Tolerations).

Blog post image

2. Verifying the Log Collector (The Taint Test)

The most important part of this project is proving that our Log Collector is running on Worker Node 2 (the restricted node) and actually doing its job.

  • Find the Pod on the Restricted Node: First, list the pods with the node name to find the one running on worker2.

kubectl get pods -n monitoring -o wide


  • Check the Logs: Copy that pod's name and check its output. It should be printing the node name it is running on.

kubectl logs [log-collector-pod-name] -n monitoring


  • Expected Output:
Blog post image

This confirms that despite the "NoSchedule" taint, our infrastructure agent is successfully monitoring the critical node.

3. External Access (Frontend)


Open your browser and go to http://localhost:30009. You should see the "Welcome to nginx!" page.

Blog post image

4. Internal DNS (Frontend → Backend)


To test internal service discovery, we will log into the Frontend pod and try to reach the Backend using its Service name (backend-service).

Get the Frontend pod name:

kubectl get pods -n prod


Exec into the Frontend pod:

kubectl exec -it [frontend-pod-name] -n prod -- sh


Test connectivity via curl:

curl http://backend-service


(Note: If curl is missing, you can often verify DNS with nslookup backend-service).
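The short name works here only because the Frontend pod lives in the same namespace as the Service; Kubernetes DNS appends the namespace automatically. From any other namespace you would need the fully qualified form:

```shell
# Short name resolves within the prod namespace
curl http://backend-service

# Fully qualified name resolves from any namespace
curl http://backend-service.prod.svc.cluster.local
```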

Blog post image