How to do a Zero-Downtime Deployment Using Azure Kubernetes Service?

Overview of deployment with Azure Kubernetes Service

In this article, I will show you a solution for zero-downtime deployment in Azure Kubernetes Service (AKS). To give some context, we will first go through several deployment strategies and choose the one that fits our needs. Some of them are supported by Kubernetes natively, some are not (yet). Next, I will give a system overview by showing you the necessary Kubernetes objects in our AKS cluster. The following part presents our Azure DevOps deployment pipeline and briefly goes through the scripts and other settings that do the main thing: zero-downtime deployment. Finally, I am going to wrap things up.

Deployment strategies in Azure services

Many deployment strategies can help you deploy your application to production or any other environment. Each has its own usage scenarios, benefits, and drawbacks. As a precondition, assume that you have multiple running instances of the application that need to be deployed.

Let us see some of them:

Recreate: First, every instance of the old version is removed, and then the instances of the new version are rolled out.

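Its obvious drawback is the downtime between removing the old instances and starting the new ones, so we use this technique only during development, where downtime does not matter.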

  • Kubernetes supports this strategy out of the box (Deployment strategy type Recreate).
Figure 1: Recreate deployment strategy (original image URL: https://storage.googleapis.com/cdn.thenewstack.io/media/2017/11/c42fa239-recreate.gif)

Ramped: The central idea of this strategy is to replace the instances of the old version with instances of the new version one by one.

The main benefit of this solution is that there is no downtime. On the other hand, it has serious drawbacks: rollout and rollback are time-consuming, and since you have no control over traffic, users may be served by either version during the rollout, which can lead to version problems.

  • Natively available in Kubernetes (Deployment strategy type RollingUpdate).
Figure 2: Ramped deployment strategy (original image URL: https://storage.googleapis.com/cdn.thenewstack.io/media/2017/11/5bddc931-ramped.gif)
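The availability difference between Recreate and Ramped can be illustrated with a small Python simulation. This is a sketch, not Kubernetes code. It assumes four replicas. For the ramped case, it assumes a policy that starts one new instance before retiring one old instance, comparable to a RollingUpdate Deployment with maxSurge=1 and maxUnavailable=0. Each list records how many instances are available after every step.

```python
def recreate(replicas: int) -> list[int]:
    """Available instances after each step of a Recreate rollout."""
    history = []
    old, new = replicas, 0
    while old > 0:            # phase 1: stop every old-version instance
        old -= 1
        history.append(old + new)
    while new < replicas:     # phase 2: start the new-version instances
        new += 1
        history.append(old + new)
    return history


def ramped(replicas: int) -> list[int]:
    """Available instances after each step of a ramped rollout that starts
    one new instance before retiring one old instance."""
    history = []
    old, new = replicas, 0
    while new < replicas:
        new += 1              # surge: start one new-version instance
        history.append(old + new)
        old -= 1              # then retire one old-version instance
        history.append(old + new)
    return history


if __name__ == "__main__":
    print("recreate:", recreate(4))  # [3, 2, 1, 0, 1, 2, 3, 4]: reaches 0, i.e. downtime
    print("ramped:  ", ramped(4))    # [5, 4, 5, 4, 5, 4, 5, 4]: never below 4
```

The ramped run never drops below the desired replica count. During the rollout, however, both versions serve traffic at the same time, which is where the version problems mentioned above come from.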

Blue/Green: The instances of the new version are deployed to the destination environment while traffic is still routed to the old instances. Then the traffic is switched to the new-version instances all at once. Lastly, the old instances are deleted.

With this strategy, we gain three things: no downtime, fast rollout and rollback, and control over traffic, which means no version problems. The downside is the cost of running both the old and the new instances of the application at the same time. This approach can be used in production environments.

  • Not supported by Kubernetes out of the box.

Figure 3: Blue/Green deployment strategy (original image URL: https://storage.googleapis.com/cdn.thenewstack.io/media/2017/11/73a2824d-blue-green.gif)




Canary: With this strategy, the old and the new instances also run side by side. However, traffic is switched from the old instances to the new ones differently: only a weighted portion of the traffic is routed to the new instances. After some iterations, all traffic is sent to the new instances, and the old ones can be terminated. As a result, real users are testing the new release.

The pros are fast rollback, measurable performance and failure rates, and more control over traffic. The cons are a slow rollout, potentially high cost, and no control over traffic at the level of individual users. With an ingress controller such as NGINX, the weighted traffic can be routed much more precisely and cost-effectively. This approach can be used in production environments as well.

  • Not supported by Kubernetes out of the box.

Figure 4: Canary deployment strategy (original image URL: https://storage.googleapis.com/cdn.thenewstack.io/media/2017/11/a6324354-canary.gif)
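The weighted switching described above can be modelled in a few lines of Python. This is a simplified illustration. It assumes that each request is routed independently and reaches the canary with probability weight/100. That is roughly what NGINX's `nginx.ingress.kubernetes.io/canary-weight` annotation expresses as a percentage. The function names are hypothetical.

```python
import random
from collections import Counter


def route(weight_percent: int, rng: random.Random) -> str:
    """Send one request to the canary with probability weight_percent / 100."""
    return "canary" if rng.random() * 100 < weight_percent else "stable"


def simulate(weight_percent: int, requests: int = 10_000, seed: int = 42) -> Counter:
    """Count where `requests` independent requests end up for a given weight."""
    rng = random.Random(seed)  # fixed seed keeps the run reproducible
    return Counter(route(weight_percent, rng) for _ in range(requests))


if __name__ == "__main__":
    # A canary release raises the weight step by step until all traffic is switched.
    for weight in (10, 25, 50, 100):
        counts = simulate(weight)
        print(f"weight={weight:3d}%  canary share={counts['canary'] / 10_000:.3f}")
```

Because the split is random per request, the same user may hit different versions on consecutive requests. This is the "no control over traffic at the level of users" drawback.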

A/B testing: Pretty much the same as canary. The difference is that instead of a weight, traffic switching is based on a so-called canary cookie or header. This lets you specify very precisely the subset of users who are routed to the new instances. As you might know, A/B testing is originally a technique for making business decisions: the version that converts best is the one rolled out.

A remarkable benefit over the weighted canary is complete control over the traffic. The drawbacks are still the slow rollout and the need for a Layer-7 load balancer such as NGINX. The strategy can be very useful in production environments.

  • Not supported by Kubernetes out of the box.

Figure 5: A/B testing deployment strategy (original image URL: https://storage.googleapis.com/cdn.thenewstack.io/media/2017/11/5deeea9c-a-b.gif)
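The routing decision behind A/B testing with NGINX can be sketched as follows. This is a simplified Python model of the ingress-nginx canary annotations (`canary-by-header`, `canary-by-header-value`, `canary-by-cookie`, `canary-weight`). These are documented to be evaluated in order: header, then cookie, then weight. For simplicity, header and cookie names are compared case-sensitively here, and pattern-based variants are left out. The class and function names are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class CanaryRules:
    by_header: Optional[str] = None        # canary-by-header
    by_header_value: Optional[str] = None  # canary-by-header-value
    by_cookie: Optional[str] = None        # canary-by-cookie
    weight: int = 0                        # canary-weight (percent)


def choose_backend(rules: CanaryRules, headers: Mapping[str, str],
                   cookies: Mapping[str, str], rng: random.Random) -> str:
    """Return "canary" or "stable" for one request."""
    # 1) Header rule: with a configured value, only an exact match routes to the
    #    canary; without one, "always"/"never" decide. Other values fall through.
    if rules.by_header and rules.by_header in headers:
        value = headers[rules.by_header]
        if rules.by_header_value is not None:
            if value == rules.by_header_value:
                return "canary"
        elif value == "always":
            return "canary"
        elif value == "never":
            return "stable"
    # 2) Cookie rule: only "always" and "never" are meaningful values.
    if rules.by_cookie and rules.by_cookie in cookies:
        value = cookies[rules.by_cookie]
        if value == "always":
            return "canary"
        if value == "never":
            return "stable"
    # 3) Otherwise fall back to the weight.
    return "canary" if rng.random() * 100 < rules.weight else "stable"


if __name__ == "__main__":
    rng = random.Random(0)
    ab = CanaryRules(by_cookie="canary")
    print(choose_backend(ab, {}, {"canary": "always"}, rng))  # canary
    print(choose_backend(ab, {}, {}, rng))                    # stable
```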

Shadow: New instances are deployed alongside the old instances. After the rollout, incoming traffic is sent to both the old and the new version, but users only receive the old version's responses. Traffic can be mirrored, for example, with NGINX.

With this strategy, performance tests with full production traffic can be made quickly; furthermore, there is no impact on the users. On the other hand, it is expensive since we are doubling the required resources.

Figure 6: Shadow deployment strategy (original image URL: https://storage.googleapis.com/cdn.thenewstack.io/media/2017/11/fdd947f8-shadow.gif)
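The essence of shadowing fits in a small Python wrapper: every request reaches both versions, but users only ever see the old version's response. This is an in-process illustration, not how NGINX mirrors traffic. For simplicity, it calls the two versions one after the other. The handler names are hypothetical.

```python
from typing import Callable, List, Tuple

Handler = Callable[[str], str]


def shadowed(primary: Handler, shadow: Handler, log: List[Tuple[str, str]]) -> Handler:
    """Return a handler that serves `primary` and mirrors each request to `shadow`."""
    def handle(request: str) -> str:
        response = primary(request)          # users always get this response
        try:
            mirrored = shadow(request)       # the new version sees the same traffic
            log.append((request, "match" if mirrored == response else "mismatch"))
        except Exception as exc:             # shadow failures never reach users
            log.append((request, f"error: {exc}"))
        return response
    return handle


def old_version(request: str) -> str:
    return f"hello {request}"


def new_version(request: str) -> str:
    if request == "bob":
        raise RuntimeError("bug in the new version")
    return f"hello {request}"


if __name__ == "__main__":
    log: List[Tuple[str, str]] = []
    serve = shadowed(old_version, new_version, log)
    for user in ("alice", "bob"):
        print(serve(user))
    print(log)  # [('alice', 'match'), ('bob', 'error: bug in the new version')]
```

Comparing the logged results is what makes shadowing useful for testing with production traffic. The doubled resource usage is also visible here, since every request is processed twice.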

You can read more about these deployment strategies at https://thenewstack.io/deployment-strategies/.

Now that we have seen several interesting deployment strategies, it is time to choose the right one for our needs. We needed a deployment strategy that satisfies the following requirements:

  • Fast rollout and rollback.
  • The ability to check the newly deployed version while the users are still routed to the old version.
  • Zero-downtime switch to the new version.
  • Usable in production environments.

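The requirements above imply that we need to combine the Blue/Green and the A/B testing strategies. Blue/Green provides the fast rollout and rollback and the zero-downtime switch. Header-based A/B routing lets us check the new version while users are still routed to the old one.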

System overview for Zero-Downtime Deployment Using Kubernetes Services

Now let me show you an overview of the relevant components of our system. The figure below shows the infrastructure required by the chosen deployment strategies. Each component has a Helm chart, which describes the Kubernetes objects it uses as code and is also the unit of installation. Note that the code samples are simplified to make the central concept easier to follow.

Figure 7: System overview

Deployment objects

To implement the Blue/Green strategy, the system needs to handle so-called slots for the blue and the green versions.

Figure 8: Blue/Green deployments of the component

In the solution above, slots are represented by Kubernetes Deployment objects. In a nutshell, a Deployment object manages the pods that run one version of our component. A Helm chart template for a component's Deployment object should look something like this:

The highlighted part of the code is essential because the app and slot labels make it possible to unambiguously select the pods that belong to a given component and slot.
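The original template is not reproduced here. The labelling idea can still be sketched by generating an equivalent Deployment manifest in Python. This is a hedged illustration: the `<app>-<slot>` naming, the image reference, and the port are hypothetical. Kubernetes requires a Deployment's `selector.matchLabels` to match its pod template labels, so both carry the same `app` and `slot` pair.

```python
import json


def deployment_manifest(app: str, slot: str, image: str, replicas: int = 2) -> dict:
    """Build the Deployment object for one slot of a component."""
    if slot not in ("blue", "green"):
        raise ValueError(f"unknown slot: {slot!r}")
    labels = {"app": app, "slot": slot}  # identify component and slot unambiguously
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": f"{app}-{slot}", "labels": dict(labels)},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": dict(labels)},
            "template": {
                "metadata": {"labels": dict(labels)},
                "spec": {
                    "containers": [
                        {"name": app, "image": image, "ports": [{"containerPort": 80}]}
                    ]
                },
            },
        },
    }


if __name__ == "__main__":
    manifest = deployment_manifest("myapp", "green", "myacr.azurecr.io/myapp:1.2.0")
    print(json.dumps(manifest, indent=2))  # kubectl apply -f also accepts JSON
```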

Service objects

Furthermore, to support the A/B testing strategy, the system must be able to expose two different versions of a component at the same time. Therefore, we need to double the Kubernetes Service objects as well.

Figure 9: Service objects of the component

The primary role of a Service object is to expose a set of pods as a network service. In the figure above, the service of the component points to the pods whose slot label is set to blue; they run the current version of the component. Additionally, the canary service of the component points to the pods whose slot label is set to green; they run the next version of the component.

The only difference between the template of the service and that of the canary service object is that the canary service template additionally contains the highlighted parts.
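As a sketch of how the two Service objects relate, again in Python rather than the original Helm template: both select pods by the same `app` label and differ only in name and in the `slot` value. The `-canary` name suffix is a hypothetical convention. The `selects` helper mimics Kubernetes' equality-based label selection, where every selector label must be present on the pod with the same value.

```python
def service_manifest(app: str, slot: str, canary: bool = False, port: int = 80) -> dict:
    """Service exposing the pods of one slot; the canary variant only differs in name."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": f"{app}-canary" if canary else app},
        "spec": {
            "selector": {"app": app, "slot": slot},
            "ports": [{"port": port, "targetPort": port}],
        },
    }


def selects(service: dict, pod_labels: dict) -> bool:
    """True if every selector label appears on the pod with the same value."""
    return all(pod_labels.get(k) == v for k, v in service["spec"]["selector"].items())


if __name__ == "__main__":
    service = service_manifest("myapp", "blue")                 # current version
    canary = service_manifest("myapp", "green", canary=True)    # next version
    blue_pod = {"app": "myapp", "slot": "blue", "pod-template-hash": "abc123"}
    green_pod = {"app": "myapp", "slot": "green", "pod-template-hash": "def456"}
    print(selects(service, blue_pod), selects(service, green_pod))  # True False
    print(selects(canary, blue_pod), selects(canary, green_pod))    # False True
```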

Ingress objects

Consequently, a Kubernetes Ingress object needs to be created for each of the two services. An Ingress object exposes HTTP and HTTPS routes from outside the cluster to services within the cluster. Traffic routing is controlled by rules defined on the Ingress object.

The only difference between the templates of the ingress and the canary ingress objects is that the canary ingress template contains the highlighted parts. The service and ingress object templates are packed in the same Helm chart. The content of the default values file is the following:

We use the same host for both services; however, the canary ingress object carries some canary-related annotations (rules).

Two deployments of the same component run in parallel only right after a new version has been rolled out. In this situation, we would like to check whether the new version of the component is fully functional and/or warm it up. The canary annotations make this easy: we browse the same URL, adding a header named canary with the value stage to each request. In a browser, this can be done with a header-modifying extension.
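The two Ingress objects can be sketched the same way. This Python sketch assumes the `networking.k8s.io/v1` Ingress schema and an ingress class named `nginx`. The object names and the `-canary` service suffix are hypothetical. The canary annotations (`canary`, `canary-by-header`, `canary-by-header-value`) and `whitelist-source-range` are standard ingress-nginx annotations. They encode the rule described above: requests carrying the header `canary: stage` go to the canary service, and all others go to the regular one.

```python
import json

ANNOTATION_PREFIX = "nginx.ingress.kubernetes.io/"


def ingress_manifest(app: str, host: str, tls_secret: str, canary: bool = False,
                     whitelist: str = "") -> dict:
    """Ingress for the regular or the canary service of a component."""
    service = f"{app}-canary" if canary else app
    annotations = {}
    if whitelist:  # comma-separated CIDR list, e.g. "10.0.0.0/8,203.0.113.5/32"
        annotations[ANNOTATION_PREFIX + "whitelist-source-range"] = whitelist
    if canary:
        annotations[ANNOTATION_PREFIX + "canary"] = "true"
        annotations[ANNOTATION_PREFIX + "canary-by-header"] = "canary"
        annotations[ANNOTATION_PREFIX + "canary-by-header-value"] = "stage"
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "Ingress",
        "metadata": {"name": service, "annotations": annotations},
        "spec": {
            "ingressClassName": "nginx",
            "tls": [{"hosts": [host], "secretName": tls_secret}],
            "rules": [{
                "host": host,  # the same host for both ingress objects
                "http": {"paths": [{
                    "path": "/",
                    "pathType": "Prefix",
                    "backend": {"service": {"name": service, "port": {"number": 80}}},
                }]},
            }],
        },
    }


if __name__ == "__main__":
    print(json.dumps(ingress_manifest("myapp", "app.example.com", "app-tls", canary=True), indent=2))
```

Outside the browser, the same check works with any HTTP client that can add the header, for example `curl -H "canary: stage" https://<host>/`.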

In summary, the ingress Helm chart is responsible for installing the service and ingress objects. The chart is mostly installed or upgraded right after a new component deployment has been rolled out, and at that point the service selector labels (app and slot) must be provided.

Ingress Controller

The last but not least part of the system is an ingress controller. Ingress objects take effect only if an ingress controller is running in the cluster. As the annotations in the ingress templates suggest, we use the NGINX Ingress Controller in our solution.

Figure 10: Routing to services by the NGINX ingress controller

Deployment of the NGINX ingress controller with Helm can be done with the following few commands:
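The exact commands from the original setup are not shown here. As a hedged sketch, the commonly documented way to install the community ingress-nginx chart is wrapped below in Python, so the commands can be printed (dry run) or executed. The release name, namespace, and replica count are assumptions; `controller.replicaCount` is a value of the ingress-nginx chart.

```python
import shlex
import subprocess
from typing import List


def ingress_nginx_commands(namespace: str = "ingress-nginx", replicas: int = 2) -> List[List[str]]:
    """Helm commands that add the ingress-nginx repository and install/upgrade the chart."""
    return [
        ["helm", "repo", "add", "ingress-nginx", "https://kubernetes.github.io/ingress-nginx"],
        ["helm", "repo", "update"],
        ["helm", "upgrade", "--install", "ingress-nginx", "ingress-nginx/ingress-nginx",
         "--namespace", namespace, "--create-namespace",
         "--set", f"controller.replicaCount={replicas}"],
    ]


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    for cmd in commands:
        print(shlex.join(cmd))                 # show what would be executed
        if not dry_run:
            subprocess.run(cmd, check=True)    # stop on the first failing command


if __name__ == "__main__":
    run(ingress_nginx_commands())              # pass dry_run=False to actually install
```

`upgrade --install` makes the last command idempotent: it installs the release the first time and upgrades it on later runs.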

Azure DevOps deployment pipeline

As a next step, we wanted to automate the rollout of our component to AKS. For this purpose, we have defined a Release Pipeline in Azure DevOps. Our release pipeline has exactly one stage called Deploy to AKS, which consists of several jobs.

Figure 11: Our Azure DevOps release pipeline

The jobs of the Azure DevOps deployment pipeline do the following:

  • Deploy: pulls the charts from ACR, deploys the new version of the component to the free slot with Helm, and upgrades the ingress release so that the canary service selector points to the new set of pods.
  • User check: at this point, the user can check with the canary header whether the new component works properly and decide whether to continue or roll back. If the user rejects, this job fails.
  • Clean up deployment: this job runs only if one of the previous jobs has failed. It removes the deployment of the new component version and frees its slot.
  • Swap deployment slots and Cleanup: points both the service and the canary service to the new pods and removes the deployment of the previous version of the component.
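The control flow of the four jobs can be summarised in a short Python model. It is not Azure DevOps configuration; the real conditions are set on the jobs in the release pipeline. The job functions are hypothetical callbacks. The model assumes the clean-up runs whenever Deploy or User check fails, even if no slot was recorded yet.

```python
from typing import Callable, List, Optional


def run_release(deploy: Callable[[], str],
                user_approves: Callable[[str], bool],
                cleanup: Callable[[Optional[str]], None],
                swap: Callable[[str], None]) -> str:
    """Deploy -> User check -> Swap on success, or Clean up on any failure."""
    slot: Optional[str] = None
    try:
        slot = deploy()                        # Deploy job: returns the slot it used
        if not user_approves(slot):            # User check job (manual intervention)
            raise RuntimeError("rejected by the user")
    except Exception:
        cleanup(slot)                          # Clean up deployment job: only on failure
        return "rolled back"
    swap(slot)                                 # Swap deployment slots and Cleanup job
    return "swapped"


if __name__ == "__main__":
    calls: List[str] = []
    result = run_release(
        deploy=lambda: "green",
        user_approves=lambda slot: False,      # the user rejects the new version
        cleanup=lambda slot: calls.append(f"cleanup {slot}"),
        swap=lambda slot: calls.append(f"swap {slot}"),
    )
    print(result, calls)  # rolled back ['cleanup green']
```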

The following variables are defined with Release scope:

  • acrName: Azure Container Registry name.
  • acrPassword: Azure Container Registry password.
  • chartName: the name of the component's Helm chart.
  • firewallMode: the state of the ModSecurity firewall, which is part of the NGINX ingress controller.
  • ingressTlsSecretName: the name of the TLS secret for the routed hostname.
  • ingressUrl: the routed hostname.
  • serviceName: service name of the component without the slot name.
  • valuesFilename: the filename of the YAML file, which contains the overridden values for the helm chart.
  • whitelistFilename: the filename of the JSON file which contains the whitelist for the ingress object.

Now let us go into the details of the steps mentioned above.

Deploy job

The first task of the job is the Pull charts from the ACR PowerShell task. As its name suggests, it pulls the component chart and the ingress chart with the correct version from the ACR. Besides pulling them, it also exports them to a local folder on the build agent so that they can be installed with Helm. The version is extracted from some build artifacts. It is important to set the HELM_EXPERIMENTAL_OCI environment variable to 1, as Helm requires it for the OCI registry commands used here.
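The pull step can be sketched in Python as below. It assumes the experimental OCI commands of older Helm 3 releases (`helm registry login`, `helm chart pull`, `helm chart export`), which only work with `HELM_EXPERIMENTAL_OCI=1`. Two further details are assumptions: the `helm/<chart>` repository path inside the registry, and using the registry name as the user name (as with the ACR admin account). Newer Helm 3 releases replaced these commands with `helm pull oci://...`, so check the Helm version on your build agent.

```python
import os
import subprocess
from typing import Dict, List


def helm_env() -> Dict[str, str]:
    """Copy of the current environment with Helm's experimental OCI support enabled."""
    env = dict(os.environ)
    env["HELM_EXPERIMENTAL_OCI"] = "1"
    return env


def chart_ref(acr_name: str, chart: str, version: str) -> str:
    return f"{acr_name}.azurecr.io/helm/{chart}:{version}"  # hypothetical repository path


def pull_and_export(acr_name: str, acr_password: str, charts: List[str], version: str,
                    destination: str, dry_run: bool = True) -> List[List[str]]:
    """Log in to ACR, then pull and export each chart to `destination`."""
    commands = [["helm", "registry", "login", f"{acr_name}.azurecr.io",
                 "--username", acr_name, "--password-stdin"]]
    for chart in charts:
        ref = chart_ref(acr_name, chart, version)
        commands.append(["helm", "chart", "pull", ref])
        commands.append(["helm", "chart", "export", ref, "--destination", destination])
    if not dry_run:
        env = helm_env()
        for cmd in commands:
            # Only the login command reads stdin; the password never appears in argv.
            stdin = acr_password if cmd[1] == "registry" else None
            subprocess.run(cmd, input=stdin, text=True, env=env, check=True)
    return commands


if __name__ == "__main__":
    for cmd in pull_and_export("myacr", "***", ["component", "ingress"], "1.0.0", "./charts"):
        print(" ".join(cmd))
```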

The second task, named Helm Login, is a Package and deploy Helm charts task, one of the built-in Azure DevOps tasks. It logs in to the Kubernetes cluster.

The last one is the Deploy chart and update ingress PowerShell task with the following script:

This task deploys the component chart and the ingress chart exported to the local path earlier. As you might have noticed, two PowerShell scripts run in this task: get-whitelist.ps1 and deploy-bg.ps1. The first reads the whitelist from a JSON artifact file and converts it to the proper format. The second script is the following:

First, this script checks the Helm releases to see whether the component has already been deployed to the blue or the green slot. If there are no deployments yet, we use the blue slot for the new version; otherwise, we use the slot opposite to the one already in use. Second, we set the whitelist and the firewall mode in the values.yaml file of the ingress Helm chart. The next command upgrades or installs the ingress Helm chart with the given parameters. Lastly, we install the component chart with the proper parameters.
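Here is a hedged Python re-sketch of the logic described above, together with the whitelist conversion done by get-whitelist.ps1; the originals are PowerShell scripts. Several names are hypothetical: the `<service>-<slot>` release names, the `<service>-ingress` release name, and the value keys `service.slot`, `canaryService.slot`, `whitelist`, and `firewallMode`. In a real script, the list of existing releases could come from `helm list --short`. The whitelist is assumed to arrive as a JSON array of CIDR ranges. It is converted into the comma-separated form that NGINX's `whitelist-source-range` annotation expects. The caller is expected to write `ingress_values` to `values_file`.

```python
import ipaddress
import json
from typing import Iterable, List, Tuple

SLOTS = ("blue", "green")


def other(slot: str) -> str:
    return "green" if slot == "blue" else "blue"


def whitelist_from_json(json_text: str) -> str:
    """'["10.0.0.0/8", "203.0.113.5/32"]' -> '10.0.0.0/8,203.0.113.5/32' (validated)."""
    ranges = [r.strip() for r in json.loads(json_text) if r.strip()]
    for cidr in ranges:
        ipaddress.ip_network(cidr, strict=False)  # raises ValueError on bad input
    return ",".join(ranges)


def pick_new_slot(service: str, releases: Iterable[str]) -> Tuple[str, str]:
    """Return (new_slot, live_slot): blue on the first deployment, else the free slot."""
    existing = set(releases)
    used = [s for s in SLOTS if f"{service}-{s}" in existing]
    if not used:
        return "blue", "blue"          # nothing live yet: both services use the new slot
    if len(used) == 2:
        raise RuntimeError("both slots are in use; swap or clean up first")
    return other(used[0]), used[0]


def deploy_plan(service: str, component_chart: str, ingress_chart: str,
                releases: Iterable[str], whitelist_json: str, firewall_mode: str,
                values_file: str = "ingress-values.json") -> Tuple[str, dict, List[List[str]]]:
    new_slot, live_slot = pick_new_slot(service, releases)
    ingress_values = {
        "service": {"slot": live_slot},       # users stay on the live version
        "canaryService": {"slot": new_slot},  # "canary: stage" requests reach the new one
        "whitelist": whitelist_from_json(whitelist_json),
        "firewallMode": firewall_mode,
    }
    commands = [
        ["helm", "upgrade", "--install", f"{service}-ingress", ingress_chart, "-f", values_file],
        ["helm", "upgrade", "--install", f"{service}-{new_slot}", component_chart,
         "--set", f"slot={new_slot}"],
    ]
    return new_slot, ingress_values, commands


if __name__ == "__main__":
    slot, values, cmds = deploy_plan("myapp", "./component", "./ingress", ["myapp-blue"],
                                     '["10.0.0.0/8"]', "On")
    print(slot)                              # the article writes this to a file for later jobs
    print(json.dumps(values, indent=2))      # content for the ingress values file
    for cmd in cmds:
        print(" ".join(cmd))
```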

Now let us return to the last line of the Deploy chart and update ingress PowerShell task of the Deploy job. As you can see, we write the name of the slot used for the deployment into a file on the build agent machine. This is necessary because we need to pass this information to the other jobs.

User check job

Unlike the others, this job is an agentless job with a single Manual intervention task, added from the standard Azure DevOps task list. At this step, the deployment process is paused while the user decides whether to reject or resume it. Rejecting makes the task, and therefore the job, fail, whereas resuming makes both succeed.

Clean up deployment job

This job runs only when a previous job has failed; this condition needs to be set on the job. The first task of the job is the Pull ingress chart from ACR PowerShell task. It is very similar to the first task of the Deploy job, except that it only pulls and exports the ingress chart. The HELM_EXPERIMENTAL_OCI environment variable needs to be set here as well.

The second task, Helm Login, is also the same as in the Deploy job: it logs in to the Kubernetes cluster.

The last one is the Cleanup deployment PowerShell task with the following script:

This task cleans up the current deployment and updates the ingress release, using the ingress chart exported to the local path earlier. Correspondingly, two PowerShell scripts run in this task: get-whitelist.ps1 and delete-bg.ps1. You are already familiar with the first one from the Deploy job. The second script is the following:

First, the script determines whether the blue or the green slot needs to be restored for both the service and the canary service. Second, as in deploy-bg.ps1, we set the whitelist and the firewall mode in the values.yaml file of the ingress Helm chart. Next, we upgrade or install the ingress Helm chart with the proper parameters. The last command uninstalls the newly deployed Helm chart release, if any.
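The clean-up logic can be sketched in Python using the same hypothetical release names and value keys as before. It assumes the new slot is read from the file written by the Deploy job. Both services are pointed back to the previously live slot (if there is one) before the new release is uninstalled, so users are never routed to pods that are being removed. The whitelist and firewall values, which the original script sets again, are left out for brevity.

```python
from typing import Iterable, List


def other(slot: str) -> str:
    return "green" if slot == "blue" else "blue"


def read_slot(path: str) -> str:
    """Read the slot name written by the Deploy job (hypothetical file format)."""
    with open(path) as f:
        slot = f.read().strip()
    if slot not in ("blue", "green"):
        raise ValueError(f"unexpected slot in {path}: {slot!r}")
    return slot


def cleanup_plan(service: str, ingress_chart: str, new_slot: str,
                 releases: Iterable[str]) -> List[List[str]]:
    """Commands that restore the previous slot and remove the rejected release."""
    existing = set(releases)
    restored = other(new_slot)
    commands: List[List[str]] = []
    if f"{service}-{restored}" in existing:
        # Route both the service and the canary service back to the old version.
        commands.append(["helm", "upgrade", "--install", f"{service}-ingress", ingress_chart,
                         "--set", f"service.slot={restored}",
                         "--set", f"canaryService.slot={restored}"])
    if f"{service}-{new_slot}" in existing:
        commands.append(["helm", "uninstall", f"{service}-{new_slot}"])  # "if any"
    return commands
```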

Swap deployment slots and cleanup job

We have reached the part where the deployment process finalizes the deployment by swapping the slots and removing the old version of the component.

The first task of the job is the Pull ingress chart from ACR PowerShell task; the second is the Helm Login task. We will not go into detail on these two, since they are the same as in the Clean up deployment job.

The third is the Swap slots and remove unused slot's deployment PowerShell task with the following script:

Similarly, two PowerShell scripts run in this task: get-whitelist.ps1 and swap-bg.ps1. The swap-bg.ps1 script is the following:

As a first step, the script decides whether the blue or the green slot needs to be removed. Second, as in deploy-bg.ps1, we set the whitelist and the firewall mode in the values.yaml file of the ingress Helm chart. Next, we upgrade or install the ingress Helm chart with the proper parameters, setting the selector labels of both the service and the canary service to the new deployment's slot. The last step uninstalls the previously deployed Helm chart release, if any.
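Finally, the swap can be sketched in the same hypothetical Python form. The order matters for zero downtime. The ingress release is upgraded first so that both services select the new slot, and only then is the old slot's release uninstalled. As in the previous sketch, the whitelist and firewall values are omitted for brevity.

```python
from typing import Iterable, List


def other(slot: str) -> str:
    return "green" if slot == "blue" else "blue"


def swap_plan(service: str, ingress_chart: str, new_slot: str,
              releases: Iterable[str]) -> List[List[str]]:
    """Point both services to the new slot, then remove the old slot's release."""
    existing = set(releases)
    old_slot = other(new_slot)
    commands = [
        # 1) Switch traffic: the regular and the canary service both select the new pods.
        ["helm", "upgrade", "--install", f"{service}-ingress", ingress_chart,
         "--set", f"service.slot={new_slot}", "--set", f"canaryService.slot={new_slot}"],
    ]
    if f"{service}-{old_slot}" in existing:
        # 2) Only after the switch: uninstall the previous version, if any.
        commands.append(["helm", "uninstall", f"{service}-{old_slot}"])
    return commands


if __name__ == "__main__":
    for cmd in swap_plan("myapp", "./ingress", "green", ["myapp-blue", "myapp-green"]):
        print(" ".join(cmd))
```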

A deployment process should look like this:

Figure 12: The deployment process in action

Wrapping things up: Zero-Downtime Deployment with Kubernetes services

Zero-Downtime deployment in the Azure cloud

In this article, I showed you a solution for zero-downtime deployment with Kubernetes in the Azure cloud. It is essential to know your requirements and to think over which of the industry-standard deployment strategies you need. Remember, there is no silver bullet, only different techniques for different circumstances. You can even mix them to fit your needs, as we did.

System overview in Kubernetes services

While going through the system overview, we defined the main Kubernetes objects we intended to use in our solution: the ingress controller and the ingress, service, and deployment objects. We also prepared custom Helm charts to encapsulate these objects, making them reusable and easier to release to the cluster.

We created an Azure DevOps deployment pipeline

Lastly, we created an Azure DevOps deployment pipeline, a deployment process consisting of:

  • a Deploy job, which deploys the component Helm chart and installs or upgrades the ingress chart,
  • a User check job, which adds a manual intervention step where the user decides whether the new version is OK,
  • a Clean up deployment job, which provides rollback logic on failure,
  • and a Swap deployment slots and cleanup job, which finalizes the deployment by routing the component's public service to the new version and removing the old version.

All the source code presented here is available for download from the following link.

Reference

https://thenewstack.io/deployment-strategies/