Sitecore Kubernetes and Docker for beginners

A beginners guide to Kubernetes and Docker slanted towards Sitecore development and deployment.


When Docker was released for Linux I said to my fellow developers it was a game-changer. Not something I say lightly. 

Many years later with the advent of Windows containers and then Azure AKS, finally those of us in the Microsoft sphere can leverage this amazing technology.

For a beginner, though, it's super intimidating, and k8s even more so. Poor documentation, terrible installation guides, weird error messages and out-of-date information abound.

As someone who has set up k8s at home on both Windows and Linux, I can say they definitely don't make it easy.

In this article, I will try to break down in simple terms how the system works.

 

Docker

To start with, we will discuss the Dockerfile and how it is used when working with Docker.

# Indicates that the windowsservercore image will be used as the base image.
FROM mcr.microsoft.com/windows/servercore:ltsc2019

# Metadata indicating an image maintainer.
LABEL maintainer="jshelton@contoso.com"

# Uses dism.exe to install the IIS role.
RUN dism.exe /online /enable-feature /all /featurename:iis-webserver /NoRestart

# Creates an HTML file and adds content to this file.
RUN echo "Hello World - Dockerfile" > c:\inetpub\wwwroot\index.html

# Sets a command or process that will run each time a container is run from the new image.
CMD [ "cmd" ]

The above code snippet is a sample Dockerfile provided by Microsoft. Think of Docker as a system which gives you a computer pre-configured for a given task.

If I had a Dockerfile that simply said

FROM mcr.microsoft.com/windows/servercore:ltsc2019

When I started a container using this, it would give me a clean Windows Server Core (LTSC 2019) machine. I can then expand that Dockerfile to do whatever I like.

For example, I could have a Dockerfile that says

FROM mcr.microsoft.com/windows/servercore:ltsc2019
RUN ["powershell", "New-Item", "c:/test"]

and I would get an image where, as part of the build, Docker runs this PowerShell command:

powershell New-Item c:/test

Beyond this, I could have a massive docker file like this.

# escape=`

# This is a custom SDK image based on servercore that serves two purposes:
#   * Allows us to build a mixed solution (framework and netcore)
#   * Allows us to run `dotnet watch` for rendering host development
#     (see https://github.com/dotnet/dotnet-docker/issues/1984)

ARG BUILD_IMAGE
ARG NETCORE_BUILD_IMAGE

FROM ${NETCORE_BUILD_IMAGE} as netcore-sdk
FROM ${BUILD_IMAGE}

SHELL ["powershell", "-Command", "$ErrorActionPreference = 'Stop'; $ProgressPreference = 'SilentlyContinue';"]

# Ensure updated nuget. Depending on your Windows version, dotnet/framework/sdk:4.8 tag may provide an outdated client.
# See https://github.com/microsoft/dotnet-framework-docker/blob/1c3dd6638c6b827b81ffb13386b924f6dcdee533/4.8/sdk/windowsservercore-ltsc2019/Dockerfile#L7
ENV NUGET_VERSION 5.6.0
RUN Invoke-WebRequest "https://dist.nuget.org/win-x86-commandline/v$env:NUGET_VERSION/nuget.exe" -UseBasicParsing -OutFile "$env:ProgramFiles\NuGet\nuget.exe"

## Install netcore onto SDK image
## https://github.com/dotnet/dotnet-docker/blob/5e9b849a900c69edfe78f6e0f3519009de4ab471/3.1/sdk/nanoserver-1909/amd64/Dockerfile

# Retrieve .NET Core SDK
COPY --from=netcore-sdk ["/Program Files/dotnet/", "/Program Files/dotnet/"]

ENV `
    # Enable detection of running in a container
    DOTNET_RUNNING_IN_CONTAINER=true `
    # Enable correct mode for dotnet watch (only mode supported in a container)
    DOTNET_USE_POLLING_FILE_WATCHER=true `
    # Skip extraction of XML docs - generally not useful within an image/container - helps performance
    NUGET_XMLDOC_MODE=skip 

RUN $path = ${Env:PATH} + ';C:\Program Files\dotnet\;'; `
    setx /M PATH $path

# Trigger first run experience by running arbitrary cmd
RUN dotnet help | out-null

If I were to start that computer, it would give me an out-of-the-box copy of Windows with .NET Core and the .NET Core SDK installed.

Beyond this, I could have a Dockerfile which takes Windows and installs a copy of the Sitecore CM role on it.

We (or Sitecore) can take that Dockerfile and build it using a command like this

docker build -t iis .

C:\> docker build -t iis .

Sending build context to Docker daemon 2.048 kB
Step 1 : FROM mcr.microsoft.com/windows/servercore:ltsc2019
 ---> 6801d964fda5

Step 2 : RUN dism /online /enable-feature /all /featurename:iis-webserver /NoRestart
 ---> Running in ae8759fb47db

Deployment Image Servicing and Management tool
Version: 10.0.10586.0

Image Version: 10.0.10586.0

Enabling feature(s)
The operation completed successfully.

 ---> 4cd675d35444
Removing intermediate container ae8759fb47db

Step 3 : RUN echo "Hello World - Dockerfile" > c:\inetpub\wwwroot\index.html
 ---> Running in 9a26b8bcaa3a
 ---> e2aafdfbe392
Removing intermediate container 9a26b8bcaa3a

Successfully built e2aafdfbe392

Docker would take those instructions and produce a Docker image; the above is the output of that command. That image can then be uploaded to the internet or simply remain on our machines.

In short, a Dockerfile is a set of instructions for setting up a computer in a given way, which is then stored as an image for use later.

Docker Images/Containers

A company or individual uses a Dockerfile like the one we discussed to produce images. Those images can be downloaded (pulled) to your computer with this docker command.

docker pull mcr.microsoft.com/windows/nanoserver:ltsc2022

The above command would download an out-of-the-box copy of Windows Nano Server. You can then start (run) that computer on your machine using this command.

docker run -it mcr.microsoft.com/windows/nanoserver:ltsc2022 cmd.exe

This creates a Docker container. By using Docker you can be confident that every time you start your image, any changes made to a previous container will have been wiped out. If I run an image on my machine and you run it on yours, the way they run and the process that created them will have been the same. That is the ultimate power of Docker: images and containers behave in a consistent, repeatable way.

You are probably thinking that you can start a Sitecore instance in the same way. You're right, you can

docker pull scr.sitecore.com/sxp/sitecore-xp1-cd:10.1-ltsc2019

docker run scr.sitecore.com/sxp/sitecore-xp1-cd:10.1-ltsc2019

but if you try, it will error ;-) Not because you did anything wrong; Sitecore just has lots of dependencies, so it needs a fair bit more to get it going.
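To give a feel for what those dependencies look like, here is a heavily trimmed, hypothetical compose-style fragment in the spirit of Sitecore's docker-examples; the service names and variables are illustrative, not the exact ones Sitecore ships:

```yaml
# Illustrative sketch only - shows why a single "docker run" is not enough.
# The CD role will not start until its database and search containers exist.
services:
  mssql:
    image: ${SITECORE_DOCKER_REGISTRY}sitecore-xp1-mssql:${SITECORE_VERSION}
  solr:
    image: ${SITECORE_DOCKER_REGISTRY}sitecore-xp1-solr:${SITECORE_VERSION}
  cd:
    image: ${SITECORE_DOCKER_REGISTRY}sitecore-xp1-cd:${SITECORE_VERSION}
    depends_on:
      - mssql
      - solr
```

This is why Sitecore provides full compose files rather than a single image: the roles only work as a group.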

Sitecore Docker

Sitecore has their own documentation

https://doc.sitecore.com/en/developers/100/developer-tools/containers-in-sitecore-development.html

and examples for Docker

https://github.com/Sitecore/docker-examples

In simple terms, this:

  • Bakes in configuration settings so that the Sitecore images will run on a dev machine.
  • Creates a SQL Server instance for dev (Sitecore needs SQL Server) and stores the databases on your machine.
  • Creates a SOLR instance for dev (Sitecore needs SOLR) and stores the cores on your machine.
  • Uses Dockerfiles to create custom images, layering your source code on top of the out-of-the-box images.

The building blocks are the same; it's just more complex to deal with Sitecore. Instead of starting a single computer (image/container), you need several.

AKS

In the same way Docker takes an image and runs a container, Kubernetes takes your image and runs a Pod. A pod has a lot more to it, such as how much resource it gets, whether multiple pods start for the same role (the CM/CD/ID etc. Sitecore roles) and monitoring to determine whether the pod is working right. We will go into this in more detail later, but you need to understand the term before we talk about the higher-level architecture.

Azure AKS is made up of node pools; a node pool is an Azure Virtual Machine Scale Set. This is the physical hardware your Kubernetes runs on. In Sitecore land, you're going to have at least two.

One node pool will be running Linux; this is used to run the Kubernetes system services.

One node pool will be running Windows; this is where your Sitecore will go.

Your node pool might use Standard D4s v3 (4 vcpus, 16 GiB memory) Azure Virtual Machines. Its size might be three, in which case your pool will consist of three VMs sized at Standard D4s v3.

A node pool can be node scaled to add additional physical resources to your pool. You could change its size to, say, four, and now you have four VMs running your pods.

The node pool is also where you upgrade k8s to a new version.

k8s automatically moves your pods to new pools when required. This means if you scale a pool down to zero nodes, k8s will move all the pods running in that pool to another pool which can run what you need.

Node scaling works if you just want to change the number of machines. If you want to change the underlying Azure VMs k8s is running on, e.g. to more powerful VMs, you would create a new node pool with the right VM sizes and set the old pool to zero nodes.

The pool is also where you might apply taints. A taint marks a node so that only pods which tolerate it can be scheduled there; it is how you control what physical hardware your workloads can run on. For example, you might taint servers with GPUs and only allow certain workloads to run there for cost reasons.
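As a sketch (the taint key and values here are made up for illustration), you would taint the node pool and then give the permitted workload a matching toleration in its pod spec:

```yaml
# Illustrative only. If a node pool carries the taint sku=gpu:NoSchedule,
# it repels every pod except those that declare a matching toleration:
tolerations:
- key: "sku"
  operator: "Equal"
  value: "gpu"
  effect: "NoSchedule"
```

Pods without this toleration will simply never be scheduled onto the tainted nodes.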

Kubernetes

This system allows you to take the computers you run with Docker and run them at scale with deployment management.

The images created with a Dockerfile are stored in a registry; this allows those images to be pulled down to k8s or to other people's computers. These registries can be either public or private.

k8s is configured with the kubectl command; your machine connects to k8s with this command and applies changes to it.

With Azure you would connect like this.

Using the Azure CLI

az login

az account set --subscription <subscription id here>

az aks get-credentials --resource-group <resource group here> --name <AKS name here>

The system is separated into namespaces, which work much as you would imagine. A namespace contains a group of servers and prevents those servers from accessing anything outside of that namespace.

If you don't specify a namespace, your changes will go into the default namespace.
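A namespace is itself just a small piece of YAML you apply, and most kubectl commands accept a -n flag to target it (the name sitecore here is only an example):

```yaml
# Creating a namespace is a one-off apply of a tiny manifest.
apiVersion: v1
kind: Namespace
metadata:
  name: sitecore
```

You could then run kubectl get pods -n sitecore to list only the pods inside that namespace.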

The Docker images are applied to k8s as pods. You can add an image directly as a pod like this.

kubectl run <name of pod> --image=<name of the image from registry>

But in reality, you will do this using kubectl and YAML files. A YAML config file contains details about the type of server you want to deploy. Here is an example Sitecore file.

apiVersion: v1
kind: Service
metadata:
  name: solr
spec:
  selector:
    app: solr
  ports:
  - protocol: TCP
    port: 8983
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: solr
  labels:
    app: solr
spec:
  replicas: 1
  selector:
    matchLabels:
      app: solr
  template:
    metadata:
      labels:
        app: solr
    spec:
      nodeSelector:
        kubernetes.io/os: linux
      containers:
      - name: solr
        image: solr
        securityContext:
          allowPrivilegeEscalation: false
          readOnlyRootFilesystem: true
          runAsNonRoot: true
          runAsUser: 1000
        ports:
        - containerPort: 8983
        env:
        - name: SOLR_MODE
          value: solrcloud
        startupProbe:
          httpGet:
            path: /solr/admin/info/system
            port: 8983
            httpHeaders:
            - name: X-Kubernetes-Probe
              value: Startup
          timeoutSeconds: 30
          periodSeconds: 10
          failureThreshold: 10
        volumeMounts:
            - mountPath: /tmp
              name: tmp
        resources:
          requests:
            memory: 2Gi
            cpu: 500m
          limits:
            memory: 3Gi
            cpu: 1500m
      volumes:
        - emptyDir: {}
          name: tmp

This deploys a service called solr which consists of a single pod (replicas: 1); if replicas were higher, you would get more pods doing the same role. It specifies the type of node pool the pod can run on (linux), how to determine whether the pod started successfully (the startup probe), and the amount of resources to give this instance (resources: a 2 GiB memory and 500m CPU request). In short, it's a config file for what you want to run and how much of the k8s servers you want to dedicate to running it.

Config files like this are normally applied with the -f argument (kubectl apply -f <file>). In some instances, though, you will use -k.

The -k argument is for when the folder with your k8s configs has a kustomization.yaml. Kustomize takes the customisations in that file and applies them automatically over the top of the other configs. This allows the main files to remain the same but lets Sitecore (or whoever) change the kustomization and specify a new version.

For example a file like this:

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
images:
- name: sitecore-xp1-cd
  newName: scr.sitecore.com/sxp/sitecore-xp1-cd
  newTag: 10.1-ltsc2019
resources:
- cm.yaml

This will overwrite the image in the cm.yaml file with the one in the kustomization file. From Sitecore 10.1 you will find that most of the time you will run with -k, whereas in older versions it may be -f. It just depends on whether the folder needs kustomization or not. The Sitecore installation guide will explain this, but that is the difference between -f and -k.

k8s secrets

Kubernetes has a secrets system; you add secrets either directly or via YAML. This is how, in Sitecore for example, you store connection strings and the like. With Sitecore, you update the files in the secrets folder and run

kubectl apply -k ./secrets

which uploads each of the files as a separate secret into k8s. If you look at the kustomization YAML, it looks like this

secretGenerator:
- name: sitecore-admin
  files:
  - sitecore-adminpassword.txt

which essentially says: take this file's content and create a secret called sitecore-admin. Any pod in the namespace that needs it can then read this confidential piece of information from the secret.
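One thing worth knowing: k8s stores secret values base64-encoded, which is an encoding, not encryption. A quick sketch in plain shell (the password is made up) shows the round trip:

```shell
# Kubernetes stores secret values base64-encoded, not encrypted.
password='MySitecoreP@ss'

# Encode the value the way it would be stored inside the secret.
encoded=$(printf '%s' "$password" | base64)
echo "$encoded"

# Anyone who can read the secret can trivially decode it again.
decoded=$(printf '%s' "$encoded" | base64 --decode)
echo "$decoded"
```

This is why access to secrets is controlled per namespace and by RBAC, rather than by the encoding itself.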

These are made available to Sitecore itself by appearing in the Windows environment variables. Those variables are then imported into Sitecore patch files and applied to Sitecore itself.
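Under the hood this is plain Kubernetes machinery: the deployment maps a secret key to an environment variable. A trimmed sketch (the container, variable and key names here are illustrative, not Sitecore's exact ones):

```yaml
# Illustrative only - maps a key from a k8s secret into an environment
# variable that the container (and so Sitecore) can read at startup.
containers:
- name: cm
  image: my-registry/sitecore-xp1-cm:10.1
  env:
  - name: SITECORE_ADMIN_PASSWORD
    valueFrom:
      secretKeyRef:
        name: sitecore-admin
        key: sitecore-adminpassword.txt
```

If the secret changes, the pods need to be restarted to pick up the new value.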

Ingress

The ingress is responsible for taking traffic and routing it to the appropriate service. First off you create an ingress controller; you can think of this as a computer responsible for the routing. This is the PowerShell for creating it:

param (
    [Parameter(Mandatory = $true)]
    [ValidateNotNullOrEmpty()]
    [string] $ResourceGroupName,

    [Parameter(Mandatory = $true)]
    [ValidateNotNullOrEmpty()]
    [string] $AksName
)

# authenticate AKS instance
Write-Host "--- Get credentials for k8s cluster ---" -ForegroundColor Cyan

az aks get-credentials --admin `
    --resource-group $ResourceGroupName `
    --name $AksName `
    --overwrite-existing

Write-Host "--- Creating nginx (Ingress) ---" -ForegroundColor Cyan

# add nginx helm charts
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx

helm repo update

# update the charts
helm upgrade --install nginx-ingress ingress-nginx/ingress-nginx `
    --set controller.replicaCount=2 `
    --set controller.nodeSelector."beta\.kubernetes\.io/os"=linux `
    --set defaultBackend.nodeSelector."beta\.kubernetes\.io/os"=linux `
    --set controller.admissionWebhooks.patch.nodeSelector."beta\.kubernetes\.io/os"=linux `
    --set-string controller.config.proxy-body-size=10m `
    --set controller.service.externalTrafficPolicy=Local `
    --wait

Write-Host "--- Ready setting up nginx, now retrieving DNS data... ---" -ForegroundColor Green

The PowerShell follows the same instructions as the Sitecore installation guide. It uses Helm, which you can think of as a package manager for k8s. We download an NGINX ingress controller and set it up on the Kubernetes system. It will automatically be assigned a public IP address by AKS.

Since this is the server directing traffic, you must, as you would imagine, configure it. Which, as you would also imagine, happens with YAML.

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: sitecore-ingress
  annotations:
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/proxy-buffer-size: "32k"
    nginx.ingress.kubernetes.io/affinity: "cookie"
    nginx.ingress.kubernetes.io/rewrite-target: /
    nginx.ingress.kubernetes.io/proxy-connect-timeout: "600"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "600"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "600"
    nginx.ingress.kubernetes.io/proxy-body-size: "512m"
spec:
  rules:
  - host: cm.globalhost
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: cm
            port:
              number: 80
  - host: cd.globalhost
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: cd
            port:
              number: 80
  - host: id.globalhost
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: id
            port:
              number: 80
  tls:
  - secretName: global-cd-tls
    hosts:
    - cd.globalhost
  - secretName: global-cm-tls
    hosts:
    - cm.globalhost
  - secretName: global-id-tls
    hosts:
    - id.globalhost

You can see a number of hostnames; these hostnames are publicly exposed over HTTPS using the TLS certificates at the bottom of the file.

You request https://cm.globalhost

The ingress then takes this and routes your traffic internally within k8s to the CM service on port 80.

The TLS certificates at the bottom of the file are pulled from the k8s secrets.
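A TLS secret like global-cm-tls is just another secret with a well-known type and two keys. A sketch of its shape (the base64 payloads are placeholders, not real data):

```yaml
# Shape of a TLS secret referenced by the ingress above.
apiVersion: v1
kind: Secret
metadata:
  name: global-cm-tls
type: kubernetes.io/tls
data:
  tls.crt: <base64-encoded certificate>
  tls.key: <base64-encoded private key>
```

In practice you would usually generate this with kubectl create secret tls global-cm-tls --cert=tls.crt --key=tls.key rather than writing the YAML by hand.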

Sitecore k8s files

The above will hopefully help frame the k8s files provided by Sitecore. Once you understand it, the install guide and files make more sense:

  1. Create an ingress controller to direct traffic.
  2. The YAML in the external folder creates servers for the database and SOLR in non-production environments.
  3. The YAML in the init folder populates the databases and SOLR cores with the Sitecore setup.
  4. The secrets folder adds all those secrets to k8s secrets via its YAML file, making them available to all pods.
  5. The YAML files at the root level are the various Sitecore services running as pods.
  6. Create an ingress rule to direct the domains to a particular service.

 

Sitecore's installation guide can be distilled right down to the following steps, all of which are things we have already discussed.

 

Step 1: Install an Ingress Controller to handle traffic

helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx

helm install nginx-ingress ingress-nginx/ingress-nginx `
--set controller.replicaCount=2 `
--set controller.nodeSelector."beta\.kubernetes\.io/os"=linux `
--set defaultBackend.nodeSelector."beta\.kubernetes\.io/os"=linux `
--set controller.admissionWebhooks.patch.nodeSelector."beta\.kubernetes\.io/os"=linux `
--set-string controller.config.proxy-body-size=10m

 

Step 2: Apply the ootb Sitecore rules for routing hostnames to services

kubectl apply -k ./ingress-nginx/

 

Step 3: Put all Sitecore configuration into k8s secrets

kubectl apply -k ./secrets/

 

Step 4: On dev (or test environment) start database and solr

kubectl apply -k ./external/

kubectl wait --for=condition=Available deployments --all --timeout=900s
kubectl wait --for=condition=Ready pods --all

 

Step 5: Attach the databases to the database server and populate solr with its cores

kubectl apply -k ./init/

kubectl wait --for=condition=Complete job.batch/solr-init --timeout=900s
kubectl wait --for=condition=Complete job.batch/mssql-init --timeout=900s

 

Step 6: On a newer Sitecore, you need to create a disk claim, used for the submit queue. This is disk access that lives across restarts of pods and deployments.

kubectl apply -f ./volumes/azurefile

 

Step 7: Create Sitecore pods (e.g. the sitecore roles running as individual servers)

kubectl apply -k ./

kubectl wait --for=condition=Available deployments --all --timeout=1800s

 

As you can see the majority of these are applying YAML files to create the services for Sitecore. You end up with servers for database, solr, the individual Sitecore roles/services. As well as the ingress to allow routing the hostname traffic to the appropriate service.

Working with Pods

You can view the Sitecore pods that are running with kubectl

kubectl get pods

These are the pods in the default namespace. Hopefully, with an out-of-the-box installation, you will see that they are all marked as ready and running.

The deployed images are updated like this

kubectl set image deployments/id sitecore-xp1-id=YOUR-CUSTOM-REGISTRY-HERE/sitecore-id:NEW-VERSION-NUMBER-HERE

This would update the Identity Server image to the one you provided. If the new image can be started and take traffic, it will replace the existing one.

If the deployment fails you will see this

NAME                  READY   STATUS    RESTARTS   AGE
cd-5475f7456f-j2cf2   0/1     Running   433        39h
cd-67f58bb46f-xbbd8   1/1     Running   7          7d15h

The above is a good example: one CD is running; this is the stable image and has been running for several days. The second is a recent deploy which is broken due to config. Unless the replacement deployment works, k8s will continue to use the old image.

Some of the statuses are

Init:1/6 = the pod is waiting for other services to be available before it can start. For example, CM won’t start unless xconnect et al. are running. Note you also see this if SOLR stops working (see SOLR - Stops working below).

PodInitializing = it’s currently starting up (basically, wait a minute)

Running = it’s started, but if it isn’t ready it probably won’t be taking traffic

Completed = only seen on jobs; it means the job completed

 

You can diagnose the issue by connecting to the pod using PowerShell. Using the above example:

kubectl exec --stdin --tty cd-5475f7456f-j2cf2 -- powershell

Notice I’ve put the pod’s name, cd-5475f7456f-j2cf2, in the command.

This connects me to powershell on the remote machine. Be aware you can get cut off at any moment if the pod restarts. You can then navigate to the log folder as normal and see what Sitecore is complaining about. Or inspect the file system to see if it is as you expect.

Another thing that’s useful is simply making a request to the website, like this:

kubectl exec --stdin --tty cd-67f58bb46f-xbbd8 -- powershell

(invoke-webrequest -usebasicparsing "http://localhost").Content

which returns the HTML that would be displayed. If it’s a full-on error, you can see the error contents and go from there.

You might also want to use describe

kubectl describe pod cd-67f58bb46f-xbbd8

This is useful if the pod won’t start at all and you need to see why.

 

Roll back to original Sitecore images

You can tell AKS to swap back to the out-of-the-box Sitecore CM/CD images for testing purposes or to start from a blank slate. To do this, queue up a deployment like the following, swapping the version numbers to whatever you are using.

kubectl.exe set image deployments/cd sitecore-xp1-cd=scr.sitecore.com/sxp/sitecore-xp1-cd:10.0-ltsc2019
kubectl.exe set image deployments/cm sitecore-xp1-cm=scr.sitecore.com/sxp/sitecore-xp1-cm:10.0-ltsc2019
kubectl.exe set image deployments/id sitecore-xp1-id=scr.sitecore.com/sxp/sitecore-id:10.0-ltsc2019

Give it a few minutes (using get pods to see progress) and you should see it on the old images. This can be useful when dealing with Sitecore support as if the issue exists on the above versions it's more likely to be a config or Sitecore level problem.

Restart Kubernetes Pods

Ordinarily, k8s will detect if a pod is misbehaving and restart it as required. However, it is possible to restart a pod using kubectl if you need to.

There is no kubectl restart command, but you can get a similar effect if the pod is part of a Deployment, StatefulSet, ReplicaSet or Replication Controller. Replace sitecore-cm with the name of the deployment you want to affect.

 

kubectl scale deployment sitecore-cm --replicas=0

Wait for the pod to terminate by checking with kubectl get pods or the wait command. Then scale it back up using:

kubectl scale deployment sitecore-cm --replicas=1

 

The above will cause downtime as you're killing the pod(s). You can do something similar without downtime by using the rollout command:

kubectl rollout restart deployment sitecore-cm

Conclusion

Because Sitecore is an enterprise-level system it is highly complex. For people not familiar with Docker or k8s this makes understanding it even more difficult. I hope the above acts as a good introduction to the terminology and commands that you will use day to day.