Sitecore Kubernetes and Docker for beginners
When Docker was released for Linux I said to my fellow developers it was a game-changer. Not something I say lightly.
Many years later with the advent of Windows containers and then Azure AKS, finally those of us in the Microsoft sphere can leverage this amazing technology.
For a beginner, it's super intimidating though, and k8s even more so. Poor documentation, terrible installation guides, weird error messages and out of date information abound.
As someone who has set up k8s at home for both Windows and Linux, I can say they definitely don't make it easy.
In this article, I will try to break down in simple terms how the system works.
Docker
To start with, we will discuss the Dockerfile and how it is used for working with Docker.
# Indicates that the windowsservercore image will be used as the base image.
FROM mcr.microsoft.com/windows/servercore:ltsc2019

# Metadata indicating an image maintainer.
LABEL maintainer="jshelton@contoso.com"

# Uses dism.exe to install the IIS role.
RUN dism.exe /online /enable-feature /all /featurename:iis-webserver /NoRestart

# Creates an HTML file and adds content to this file.
RUN echo "Hello World - Dockerfile" > c:\inetpub\wwwroot\index.html

# Sets a command or process that will run each time a container is run from the new image.
CMD [ "cmd" ]
The above code snippet is a sample Dockerfile provided by Microsoft. Think of Docker as a system which gives you a computer pre-configured for a given task.
If I had a Dockerfile that simply said
FROM mcr.microsoft.com/windows/servercore:ltsc2019
When I started a container using this, it would give me a clean Windows Server Core machine (version LTSC 2019). I can then expand that Dockerfile to do whatever I like.
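For example, on a Windows machine with Docker switched to Windows container mode, you could try this yourself (the first pull downloads a few gigabytes):

docker run -it mcr.microsoft.com/windows/servercore:ltsc2019 cmd

That drops you into a cmd prompt inside a clean container; type exit and it's gone.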
For example, I could have a Dockerfile that says
FROM mcr.microsoft.com/windows/servercore:ltsc2019
RUN ["powershell", "New-Item", "c:/test"]
and I would get an image that has had this PowerShell command run against it during the build:
powershell New-Item c:/test
Beyond this, I could have a massive Dockerfile like this.
# escape=`

# This is a custom SDK image based on servercore that serves two purposes:
#  * Allows us to build a mixed solution (framework and netcore)
#  * Allows us to run `dotnet watch` for rendering host development
#    (see https://github.com/dotnet/dotnet-docker/issues/1984)

ARG BUILD_IMAGE
ARG NETCORE_BUILD_IMAGE

FROM ${NETCORE_BUILD_IMAGE} as netcore-sdk
FROM ${BUILD_IMAGE}

SHELL ["powershell", "-Command", "$ErrorActionPreference = 'Stop'; $ProgressPreference = 'SilentlyContinue';"]

# Ensure updated nuget. Depending on your Windows version, dotnet/framework/sdk:4.8 tag may provide an outdated client.
# See https://github.com/microsoft/dotnet-framework-docker/blob/1c3dd6638c6b827b81ffb13386b924f6dcdee533/4.8/sdk/windowsservercore-ltsc2019/Dockerfile#L7
ENV NUGET_VERSION 5.6.0
RUN Invoke-WebRequest "https://dist.nuget.org/win-x86-commandline/v$env:NUGET_VERSION/nuget.exe" -UseBasicParsing -OutFile "$env:ProgramFiles\NuGet\nuget.exe"

## Install netcore onto SDK image
## https://github.com/dotnet/dotnet-docker/blob/5e9b849a900c69edfe78f6e0f3519009de4ab471/3.1/sdk/nanoserver-1909/amd64/Dockerfile

# Retrieve .NET Core SDK
COPY --from=netcore-sdk ["/Program Files/dotnet/", "/Program Files/dotnet/"]

ENV `
    # Enable detection of running in a container
    DOTNET_RUNNING_IN_CONTAINER=true `
    # Enable correct mode for dotnet watch (only mode supported in a container)
    DOTNET_USE_POLLING_FILE_WATCHER=true `
    # Skip extraction of XML docs - generally not useful within an image/container - helps performance
    NUGET_XMLDOC_MODE=skip

RUN $path = ${Env:PATH} + ';C:\Program Files\dotnet\;'; `
    setx /M PATH $path

# Trigger first run experience by running arbitrary cmd
RUN dotnet help | out-null
If I was to start that computer, it would give me an out of the box copy of Windows with .NET Core and the .NET Core SDK installed.
Beyond this, I could have a Dockerfile which takes Windows and installs a copy of the Sitecore CM on it.
We (or Sitecore) can take that Dockerfile and build it using a command like this
docker build -t iis .
C:\> docker build -t iis .
Sending build context to Docker daemon 2.048 kB
Step 1 : FROM mcr.microsoft.com/windows/servercore:ltsc2019
 ---> 6801d964fda5
Step 2 : RUN dism /online /enable-feature /all /featurename:iis-webserver /NoRestart
 ---> Running in ae8759fb47db
Deployment Image Servicing and Management tool
Version: 10.0.10586.0
Image Version: 10.0.10586.0
Enabling feature(s)
The operation completed successfully.
 ---> 4cd675d35444
Removing intermediate container ae8759fb47db
Step 3 : RUN echo "Hello World - Dockerfile" > c:\inetpub\wwwroot\index.html
 ---> Running in 9a26b8bcaa3a
 ---> e2aafdfbe392
Removing intermediate container 9a26b8bcaa3a
Successfully built e2aafdfbe392
Docker takes those instructions and produces a Docker image; the above is the output for that command. That image can then be uploaded to the internet or simply remain on our machines.
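If you do want to share the image, you tag it with a registry's name and push it; for example (myregistry.azurecr.io is a placeholder for your own registry):

docker tag iis myregistry.azurecr.io/iis:1.0
docker push myregistry.azurecr.io/iis:1.0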
In short, a Dockerfile is a set of instructions to set up a computer in a given way, which is then stored as an image for use later.
Docker Images/Containers
A company or individual uses the Dockerfile we discussed to produce images. Those images can be downloaded (pulled) to your computer with this docker command.
docker pull mcr.microsoft.com/windows/nanoserver:ltsc2022
The above command would download an out of the box copy of Windows Nano Server. You can then start (Run) that computer on your machine using this command.
docker run -it mcr.microsoft.com/windows/nanoserver:ltsc2022 cmd.exe
This creates a Docker container. By using Docker you can be confident that every time you start your image, any changes you made last time will have been wiped out. If I run an image on my machine and you run it on yours, the way they run and the process that created them will be the same. That is the ultimate power of Docker: images and containers behave in a consistent, repeatable way.
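You can prove the "changes are wiped" behaviour to yourself with the Nano Server image from above. Each docker run creates a brand-new container from the image:

docker run mcr.microsoft.com/windows/nanoserver:ltsc2022 cmd /c "echo proof > c:\proof.txt"
docker run mcr.microsoft.com/windows/nanoserver:ltsc2022 cmd /c "type c:\proof.txt"

The second command fails to find the file, because the file only ever existed in the first (now discarded) container.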
You are probably thinking that you can start a Sitecore instance in the same way. You're right, you can
docker pull scr.sitecore.com/sxp/sitecore-xp1-cd:10.1-ltsc2019
docker run scr.sitecore.com/sxp/sitecore-xp1-cd:10.1-ltsc2019
but if you try, it will error ;-) Not because you did anything wrong; it's just that Sitecore has lots of dependencies, so it needs a fair bit more to get it going.
Sitecore Docker
Sitecore has its own documentation
https://doc.sitecore.com/en/developers/100/developer-tools/containers-in-sitecore-development.html
and examples for Docker
https://github.com/Sitecore/docker-examples
In simple terms this:
- Bakes in configuration settings so that the images for Sitecore will run on dev.
- Sitecore needs SQL Server, so for dev this creates a copy for you and stores the databases on your machine
- Sitecore needs SOLR, so for dev this creates a copy for you and stores the cores on your machine
- Uses Dockerfiles to create custom images, taking your source code and putting it on top of the out of the box images
The building blocks are the same; it's just more complex to deal with Sitecore. Instead of starting a single computer (image/container), you need multiple.
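In practice, Sitecore's docker-examples repository ties all of this together with Docker Compose, which starts the whole set of containers with one command (docker-compose up). A heavily trimmed sketch of the idea, with illustrative service names and variables rather than Sitecore's exact files:

version: "2.4"
services:
  mssql:
    image: ${SITECORE_DOCKER_REGISTRY}sitecore-xp1-mssql:${SITECORE_VERSION}
    volumes:
      - .\mssql-data:C:\data    # keeps the databases on your machine
  solr:
    image: ${SITECORE_DOCKER_REGISTRY}sitecore-xp1-solr:${SITECORE_VERSION}
    volumes:
      - .\solr-data:C:\data     # keeps the cores on your machine
  cm:
    image: my-custom-cm:latest  # built by a Dockerfile from your source code
    depends_on:
      - mssql
      - solr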
AKS
Just as Docker takes an image and runs a container, Kubernetes takes your image and runs a pod. A pod has a lot more to it, such as how much resource it gets, whether multiple pods are started for the same role (the CM/CD/ID etc. Sitecore roles) and monitoring to determine if the pod is working right. We will go into this in more detail later, but you need to understand the term before we talk about the higher-level architecture.
Azure AKS is made up of node pools; a node pool is an Azure Virtual Machine Scale Set. This is the physical hardware your Kubernetes runs on. In Sitecore land, you're going to have at least two.
One node pool will be running Linux; this is used to run the Kubernetes (k8s) system services.
One node pool will be running Windows and this is where your Sitecore will go.
Your node pool might be made of Standard D4s v3 (4 vcpus, 16 GiB memory) Azure Virtual Machines. Its size might be three, in which case your pool will consist of three VMs sized at Standard D4s v3.
A node pool can be node scaled to add additional physical resources to your pool. You could change its size to, say, four, and now you have four VMs running your pods.
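With the Azure CLI that looks something like this (the resource group, cluster and pool names are placeholders):

az aks nodepool scale --resource-group myResourceGroup --cluster-name myAKSCluster --name win1 --node-count 4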
The node pool is also where you upgrade k8s to a new version.
k8s automatically moves your pods to new pools when required. This means if you scale down a pool to zero nodes, then k8s will move all the pods running in that pool to another pool which can run what you need.
Node scaling works if you just want to change the number of machines. If you want to change the underlying Azure VMs k8s is running on (e.g. more powerful VMs), you would create a new node pool with the right VM sizes and set the old pool to zero nodes.
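A sketch of that swap with the Azure CLI (again, the names and sizes are placeholders):

az aks nodepool add --resource-group myResourceGroup --cluster-name myAKSCluster --name win2 --os-type Windows --node-vm-size Standard_D8s_v3 --node-count 3
az aks nodepool scale --resource-group myResourceGroup --cluster-name myAKSCluster --name win1 --node-count 0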
The pool is also where you might apply taints. A taint is how you tell Kubernetes what physical hardware your images can run on. For example, you might have a taint on servers with GPUs and only allow certain workloads to run there for cost reasons.
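For example, a hypothetical GPU pool could be created pre-tainted like this; only pods that declare a matching toleration will be scheduled onto it:

az aks nodepool add --resource-group myResourceGroup --cluster-name myAKSCluster --name gpupool --node-vm-size Standard_NC6 --node-count 1 --node-taints sku=gpu:NoSchedule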
Kubernetes
This system allows you to take the computers you run with Docker and run them at scale with deployment management.
The images created with a Dockerfile are stored in a registry; this allows those images to be pulled down to k8s or other people's computers. These registries can be either public or private.
k8s is configured with the kubectl command; your machine connects to k8s with this command and applies changes to it.
With Azure you would connect like this.
Using the Azure CLI:

az login
az account set --subscription <subscription id here>
az aks get-credentials --resource-group <resource group here> --name <AKS name here>
The system is separated into namespaces, which work like you'd imagine. A namespace contains a bunch of servers and prevents those servers from accessing stuff outside of that namespace.
If you don't specify a namespace then your changes will go into the default namespace.
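For example, you could create a namespace and then look at its pods like this (sitecore here is just an example name):

kubectl create namespace sitecore
kubectl get pods --namespace sitecore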
The Docker images are applied to k8s as pods. You can add an image directly as a pod like this.
kubectl run <name of pod> --image=<name of the image from registry>
But in reality, you will do this using kubectl and YAML files. A YAML config file contains details about the type of server you want to deploy. Here is an example Sitecore file.
apiVersion: v1
kind: Service
metadata:
  name: solr
spec:
  selector:
    app: solr
  ports:
    - protocol: TCP
      port: 8983
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: solr
  labels:
    app: solr
spec:
  replicas: 1
  selector:
    matchLabels:
      app: solr
  template:
    metadata:
      labels:
        app: solr
    spec:
      nodeSelector:
        kubernetes.io/os: linux
      containers:
        - name: solr
          image: solr
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            runAsNonRoot: true
            runAsUser: 1000
          ports:
            - containerPort: 8983
          env:
            - name: SOLR_MODE
              value: solrcloud
          startupProbe:
            httpGet:
              path: /solr/admin/info/system
              port: 8983
              httpHeaders:
                - name: X-Kubernetes-Probe
                  value: Startup
            timeoutSeconds: 30
            periodSeconds: 10
            failureThreshold: 10
          volumeMounts:
            - mountPath: /tmp
              name: tmp
          resources:
            requests:
              memory: 2Gi
              cpu: 500m
            limits:
              memory: 3Gi
              cpu: 1500m
      volumes:
        - emptyDir: {}
          name: tmp
This deploys a service called solr which consists of a single pod (replica); if replicas were higher, you would get more pods doing the same role. It specifies the type of node pool it can run on (linux), how to determine if the pod was able to start successfully (the startup probe), as well as the amount of resources to give this instance (resources: 2 GiB RAM, 500m CPU). In short, it's a config file for what you want to run and the amount of the k8s servers you want to dedicate to running it.
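Assuming the YAML above is saved as solr.yaml, you would apply it with the -f argument:

kubectl apply -f solr.yaml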
In the example above I used the -f argument. In some instances, though, you will use -k.
The -k argument is for when the folder with your k8s configs has a kustomization.yaml; this kustomization file will take the customisations it contains and apply them automatically over the top of the other configs. This allows the main files to remain the same but lets Sitecore (or whoever) change the kustomization and specify a new version.
For example a file like this:
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
images:
  - name: sitecore-xp1-cd
    newName: scr.sitecore.com/sxp/sitecore-xp1-cd
    newTag: 10.1-ltsc2019
resources:
  - cm.yaml
This will overwrite the image in the CM YAML file with the one in the kustomization file. From Sitecore 10.1 you will find that most of the time you will run with -k, whereas in older versions it may be -f. It just depends on whether the folder needs kustomization or not. The Sitecore installation guide will explain this, but that is the difference between -f and -k.
k8s secrets
Kubernetes has a secrets system; you add secrets either directly or via YAML. This is how, in Sitecore for example, you store connection string details and the like. With Sitecore, you update the files in the secrets folder and run
kubectl apply -k ./secrets
which uploads each of the files as a separate secret into k8s. If you look at the kustomization YAML, it looks like this
secretGenerator:
- name: sitecore-admin
files:
- sitecore-adminpassword.txt
which essentially says "take this file's content and create a secret called sitecore-admin". Any pod in the namespace that needs sitecore-admin can read it from the secret and access this confidential piece of information.
These are made available to Sitecore itself by appearing in the Windows Environment variables. Those variables are then imported into Sitecore patch files and applied to Sitecore itself.
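Under the hood this is standard Kubernetes. A deployment maps a secret into an environment variable roughly like this (the variable name is illustrative rather than the exact one Sitecore's files use):

env:
  - name: SITECORE_ADMIN_PASSWORD        # illustrative variable name
    valueFrom:
      secretKeyRef:
        name: sitecore-admin             # the secret generated above
        key: sitecore-adminpassword.txt  # the key is the original file name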
Ingress
The ingress is responsible for taking traffic and routing it to the appropriate service. First off you create an ingress controller; you can think of this as a computer responsible for the routing. This is the PowerShell for creating it:
param (
    [Parameter(Mandatory = $true)]
    [ValidateNotNullOrEmpty()]
    [string] $ResourceGroupName,

    [Parameter(Mandatory = $true)]
    [ValidateNotNullOrEmpty()]
    [string] $AksName
)

# authenticate AKS instance
Write-Host "--- Get credentials for k8s cluster ---" -ForegroundColor Cyan
az aks get-credentials --admin `
    --resource-group $ResourceGroupName `
    --name $AksName `
    --overwrite-existing

Write-Host "--- Creating nginx (Ingress) ---" -ForegroundColor Cyan

# add nginx helm charts
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update

# update the charts
helm upgrade --install nginx-ingress ingress-nginx/ingress-nginx `
    --set controller.replicaCount=2 `
    --set controller.nodeSelector."beta\.kubernetes\.io/os"=linux `
    --set defaultBackend.nodeSelector."beta\.kubernetes\.io/os"=linux `
    --set controller.admissionWebhooks.patch.nodeSelector."beta\.kubernetes\.io/os"=linux `
    --set-string controller.config.proxy-body-size=10m `
    --set controller.service.externalTrafficPolicy=Local `
    --wait

Write-Host "--- Ready setting up nginx, now retrieving DNS data... ---" -ForegroundColor Green
The PowerShell follows the same instructions as the Sitecore installation guide. It uses Helm, which you can think of as a package manager for k8s. We download an NGINX ingress controller and set it up on the Kubernetes system. It will automatically be assigned a public IP address by AKS.
Since this is the server for directing traffic, as you can imagine you must configure it. Which, as you can also imagine, happens with YAML.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: sitecore-ingress
  annotations:
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/proxy-buffer-size: "32k"
    nginx.ingress.kubernetes.io/affinity: "cookie"
    nginx.ingress.kubernetes.io/rewrite-target: /
    nginx.ingress.kubernetes.io/proxy-connect-timeout: "600"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "600"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "600"
    nginx.ingress.kubernetes.io/proxy-body-size: "512m"
spec:
  rules:
    - host: cm.globalhost
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: cm
                port:
                  number: 80
    - host: cd.globalhost
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: cd
                port:
                  number: 80
    - host: id.globalhost
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: id
                port:
                  number: 80
  tls:
    - secretName: global-cd-tls
      hosts:
        - cd.globalhost
    - secretName: global-cm-tls
      hosts:
        - cm.globalhost
    - secretName: global-id-tls
      hosts:
        - id.globalhost
You can see a number of hostnames; these hostnames are publicly exposed on HTTPS using the TLS certificates at the bottom of the file.
You request https://cm.globalhost
The ingress then takes this and sends your traffic internally within k8s to the CM service on port 80.
The TLS certificates at the bottom of the file are pulled from the k8s secrets.
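Those TLS secrets are created from a certificate and private key with a standard kubectl command, for example (the file names are placeholders):

kubectl create secret tls global-cm-tls --cert=cm.globalhost.crt --key=cm.globalhost.key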
Sitecore k8s files
The above hopefully will help frame the k8s files provided by Sitecore. Once you understand that, the install guide and files make more sense:
- create an ingress controller to direct traffic
- the YAML in external creates servers for the database and solr in non-production environments
- the YAML in init populates the databases and solr cores with the Sitecore setup
- the secrets folder simply adds all those secrets to k8s secrets, making them available to all pods, using its YAML file
- the YAML files at root level are the various Sitecore services running as pods
- create an ingress rule to direct the domains to a particular service
Sitecore's installation guide can be distilled right down to the following, which are all things we have already discussed.
Step 1: Install an Ingress Controller to handle traffic
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm install nginx-ingress ingress-nginx/ingress-nginx `
    --set controller.replicaCount=2 `
    --set controller.nodeSelector."beta\.kubernetes\.io/os"=linux `
    --set defaultBackend.nodeSelector."beta\.kubernetes\.io/os"=linux `
    --set controller.admissionWebhooks.patch.nodeSelector."beta\.kubernetes\.io/os"=linux `
    --set-string controller.config.proxy-body-size=10m
Step 2: Apply the ootb Sitecore rules for routing hostnames to services
kubectl apply -k ./ingress-nginx/
Step 3: Put all Sitecore configuration into k8s secrets
kubectl apply -k ./secrets/
Step 4: On dev (or test environment) start database and solr
kubectl apply -k ./external/
kubectl wait --for=condition=Available deployments --all --timeout=900s
kubectl wait --for=condition=Ready pods --all
Step 5: Attach the databases to the database server and populate solr with its cores
kubectl apply -k ./init/
kubectl wait --for=condition=Complete job.batch/solr-init --timeout=900s
kubectl wait --for=condition=Complete job.batch/mssql-init --timeout=900s
Step 6: On a newer Sitecore, you need to create a persistent volume claim, used for the submit queue. This is disk storage that lives across restarts of pods and deployments.
kubectl apply -f ./volumes/azurefile
Step 7: Create Sitecore pods (e.g. the sitecore roles running as individual servers)
kubectl apply -k ./
kubectl wait --for=condition=Available deployments --all --timeout=1800s
As you can see, the majority of these are applying YAML files to create the services for Sitecore. You end up with servers for the database, solr and the individual Sitecore roles/services, as well as the ingress to allow routing the hostname traffic to the appropriate service.
Working with Pods
You can view the Sitecore pods that are running with kubectl
kubectl get pods
these are the pods in the default namespace. Hopefully with an out of the box version you will see that they are all marked as ready and running.
The deployed images are updated like this
kubectl set image deployments/id sitecore-xp1-id=YOURCUSTOMREGISTRYHERE/sitecore-id:NEWVERSIONNUMBERHERE
this would update the Identity server image to the one you provided. If the image can be started and take traffic, it will replace the existing one.
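You can watch the replacement happening with the rollout command:

kubectl rollout status deployments/id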
If the deployment fails you will see this
NAME                  READY   STATUS    RESTARTS   AGE
cd-5475f7456f-j2cf2   0/1     Running   433        39h
cd-67f58bb46f-xbbd8   1/1     Running   7          7d15h
The above gives a good example: you can see one CD is running and ready (cd-67f58bb46f-xbbd8); this is the stable image and has been running for several days. The other (cd-5475f7456f-j2cf2) is a recent deploy which is broken due to config, hence 0/1 ready and the constant restarts. Unless the replacement 'deployment' works, it will continue to use the old image.
Some of the statuses are:
Init:1/6 = waiting for other services to be available before it can start. For example, CM won't start unless xconnect et al are running. Note you also see this if SOLR stops working (see SOLR - Stops working below).
PodInitializing = it's currently starting up (basically, wait a minute).
Running = it's started, but if it isn't ready it probably won't be taking traffic.
Completed = you only see this for a job; it means the job completed.
You can diagnose the issue by connecting to the pod using PowerShell. Using the above example:
kubectl exec --stdin --tty cd-5475f7456f-j2cf2 -- powershell
Notice I've put the pod name in the command, which is cd-5475f7456f-j2cf2.
This connects me to powershell on the remote machine. Be aware you can get cut off at any moment if the pod restarts. You can then navigate to the log folder as normal and see what Sitecore is complaining about. Or inspect the file system to see if it is as you expect.
Another thing that's useful is just making a request to the website like this
kubectl exec --stdin --tty cd-67f58bb46f-xbbd8 -- powershell
(invoke-webrequest -usebasicparsing "http://localhost").Content
which returns the html that would be displayed. If it’s a full on error you can see the error contents and go from there.
You might also want to use describe
kubectl describe pod cd-67f58bb46f-xbbd8
this is useful if the pod won’t start at all and you need to see why
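Another option is kubectl logs, which shows the container's console output without connecting to it; the --previous flag shows the output from before the last restart, which is handy when a pod is crash-looping:

kubectl logs cd-5475f7456f-j2cf2
kubectl logs cd-5475f7456f-j2cf2 --previous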
Roll back to original Sitecore images
You can tell AKS to swap back to the out of the box Sitecore CM/CD images for testing purposes or to start from a blank slate. To do this, queue up a deployment like this, swapping the version numbers to whatever you are using.
kubectl.exe set image deployments/cd sitecore-xp1-cd=scr.sitecore.com/sxp/sitecore-xp1-cd:10.0-ltsc2019
kubectl.exe set image deployments/cm sitecore-xp1-cm=scr.sitecore.com/sxp/sitecore-xp1-cm:10.0-ltsc2019
kubectl.exe set image deployments/id sitecore-xp1-id=scr.sitecore.com/sxp/sitecore-id:10.0-ltsc2019
Give it a few minutes (using get pods to see progress) and you should see it on the old images. This can be useful when dealing with Sitecore support, as if the issue still exists on the above versions it's more likely to be a config or Sitecore-level problem.
Restart Kubernetes Pods
Ordinarily, k8s will detect if a pod is misbehaving and restart it as required. However, it is possible to restart a pod using kubectl if you need to.
There is no kubectl restart command, but you can get a similar effect if the pod is part of a Deployment, StatefulSet, ReplicaSet or ReplicationController. Replace sitecore-cm with the name of the deployment you want to affect.
kubectl scale deployment sitecore-cm --replicas=0
Wait for the pod to terminate by checking with kubectl get pods or the wait command. Then scale it back up using:
kubectl scale deployment sitecore-cm --replicas=1
The above will cause downtime as you're killing the pod(s). You can do something similar without downtime by using the rollout command. This is done using
kubectl rollout restart deployment sitecore-cm
Conclusion
Because Sitecore is an enterprise-level system it is highly complex. For people not familiar with Docker or k8s this makes understanding it even more difficult. I hope the above acts as a good introduction to the terminology and commands that you will use day to day.
About Me
I am a Tech Lead for Sagittarius Marketing, who I have been with for the last twelve years. I oversee a team of seven working pods, including numerous developers and contractors in multiple global locations. This involves supporting the developers with coding issues, meetings and phone calls with their clients, and going out to pitches with potential new clients.
I have extensive experience building and supporting Sitecore websites from Sitecore 6 onwards, including the Helix pattern, and I scored 100% in the Sitecore 7 certification exam. I also have experience managing and maintaining SQL Server, and integrating with numerous third parties such as Salesforce, AppDynamics, New Relic, Dynamics CRM and many payment gateways.
The first Sitecore website I developed was Skiweekends, which I architected and developed. It won the Sitecore Experience Award (the main award at the Sitecore Experience Awards ceremony) and the Sitecore Best Travel & Tourism award. I was also lucky enough to perform the first Sitecore 8 upgrade within the United Kingdom, for Liberon.
Personally, I have had the honour of being recognised in several award ceremonies, including the BIMA 100 awards in 2019 in the Tech Trailblazers category and previously in the Devs and Makers category. I've been highly commended twice in the Wirehive 100 Techie of the Year awards. Due to my involvement in many aspects of Sagittarius' work, I've also been involved in many of the awards won for their clients.
About the author
Richard Brisley
I'm a multi-award-winning Sitecore developer, currently working for Sagittarius Marketing as a solutions architect, understanding customer needs and producing multi-national, high-performance websites.