Managing Helm-sourced Platform Components

When running applications on Kubernetes, it is common to install a set of platform components that provide supporting services. Helm is one of the most popular tools for doing this.

This example demonstrates how you can install and maintain such platform components across many environments using ConfigHub.

Scenario

This example is built on the following scenario:

The technology organization of a company is divided into line-of-business (LOB) teams and a supporting platform team.
The platform team is responsible for providing Kubernetes clusters "as a service" to LOB teams (the cluster provisioning itself is not demonstrated).
As part of providing clusters, the platform team curates and deploys a set of base platform components for logging, security, observability, ingress and more.

Setup

Make sure you have completed the prerequisites, then get the scripts and files ready with:

git clone https://github.com/confighub/examples.git
cd examples/helm-platform-components

You can then set things up step by step or using a single helper script. Executing this script is equivalent to the steps in the sub-sections below:

bin/install-all

To get a better understanding of each step, follow the sub-sections below and review the source of the scripts.

Install Helm charts as base units

The first step is to render Helm charts into base units in ConfigHub. First familiarize yourself with the source of the script bin/install-charts, then make sure you're logged into the right ConfigHub org and execute it:

bin/install-charts

The script will create a space with a unique name and use cub helm install to render Helm charts as config units into that space. It is best practice not to modify the config units installed directly via Helm. Instead, modify units cloned from these base units. This allows you to perform chart version upgrades more seamlessly later on.

Clone units to a dev space

Next, clone all the units from the base space to a "dev" space:

bin/clone base dev

The script will automatically add a prefix to "base" and "dev", create a dev space and create a clone in the dev space of all the units in the base space.
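To make the naming convention concrete, here is a minimal sketch of how a prefix and a short name could be joined into a full space name. The function space_name is hypothetical and is not part of the example repository. The prefix-name pattern is inferred from the --space $(bin/pre)-team1 arguments used later in this example, and whisker-paw is simply the prefix that appears in the sample output further down.

```shell
# Hypothetical helper (not part of the repository): join a prefix and a
# short name into a full space name, following the "<prefix>-<name>" pattern.
space_name() {
  prefix=$1
  name=$2
  printf '%s-%s\n' "$prefix" "$name"
}

# Example using the prefix seen in the sample output in this document.
space_name whisker-paw base   # prints whisker-paw-base
space_name whisker-paw dev    # prints whisker-paw-dev
```

In practice you would pass the output of bin/pre as the first argument instead of a hard-coded prefix.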

Clone units to team spaces

You can use the clone script to keep cloning spaces. In this scenario, you can now create clones for the line-of-business teams that are supported by the platform team:

bin/clone dev team1
bin/clone dev team2

Hint: You can create as deep a "clone tree" as you need. It all depends on how you want to control the flow of changes.

Scenario Tasks

Customize Helm-originated config

Configurations generated by Helm Charts are traditionally customized via values files. With ConfigHub, you can customize the generated configuration directly instead. This is more flexible, easier to use (because you don't have to understand values files), and more robust over time.

As an example, let's imagine that team1 has had some resource issues with their nginx controller. They decide to bump up the resources. This can be done by directly editing the config of ingress-nginx in the team1 space:

cub unit edit ingress-nginx --space $(bin/pre)-team1
...
          mountPath: /usr/local/certificates/
          readOnly: true
        resources:
          requests:
            cpu: 100m     # <---- edit this
            memory: 90Mi  # <---- edit this
      nodeSelector:
        kubernetes.io/os: linux
...

But a more robust (and scriptable) approach is to update the config using an appropriate function:

cub run set-container-resources --quiet --unit ingress-nginx --space $(bin/pre)-team1 \
        --container-name controller \
        --operation floor \
        --cpu 1000m \
        --memory 1Gi \
        --limit-factor 0 \
        --change-desc "Bumping resources for nginx ingress after seeing some crashes recently (#INCIDENT-1234)"

By using a function, you get the following benefits:

No typo or YAML indentation mistakes
Smart logic such as "floor", which means the resource values will only be raised if the current values are lower
Ability to make bulk changes
Robust scripting
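The "floor" idea can be illustrated with a small sketch. This is not ConfigHub's implementation of set-container-resources. It is a hypothetical helper, floor_millicores, that assumes CPU values are given as plain millicore integers (without the "m" suffix) and shows the rule of only raising a value that is below the requested minimum.

```shell
# Hypothetical illustration of "floor" semantics (an assumption, not the
# actual function): print the larger of the current and requested values.
# Both arguments are millicores as plain integers, e.g. 100 for "100m".
floor_millicores() {
  current=$1
  requested=$2
  if [ "$current" -lt "$requested" ]; then
    echo "$requested"   # current is below the floor, so raise it
  else
    echo "$current"     # current already meets the floor, leave it alone
  fi
}

floor_millicores 100 1000    # prints 1000
floor_millicores 2000 1000   # prints 2000
```

With this rule, running the same command against many units is safe: units that already have more resources than the floor are not reduced.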

Whether you edit directly or use a function, you get these benefits of ConfigHub's config-as-data approach:

Changes are made only to the environment that needs them. No other configuration (base, dev or team2 in this case) was affected by this change
It is easy for anyone to understand what has changed, who changed it, why it was changed, and what impact it will have on the live system
Changes are made first to config data, not to a live system. They can be reviewed and approved using a variety of workflows before they are applied
Rollback (restore) is easy

Upgrade Helm Charts to a new version

It is important to be able to safely upgrade Helm Charts to a new version, especially when new versions address security problems. But this can often be a dicey affair because it can be hard to understand what exact changes are deployed during a helm upgrade. It is especially risky after you make your own custom changes to installed resources.

ConfigHub helps with this. Let's say that we want to downgrade ingress-nginx to version 4.12.6 because we found a regression in the current version. We start by running cub helm upgrade against the base units, which is the same command you would use to move to a newer version:

cub helm upgrade --space $(bin/pre)-base \
        --namespace ingress-nginx ingress-nginx ingress-nginx \
        --repo https://kubernetes.github.io/ingress-nginx \
        --version 4.12.6

After this operation we can check the revision list:

cub revision list ingress-nginx --space $(bin/pre)-base

The upgrade command resulted in a new revision and one more revision may have been created after that by the auto-resolve system. You can now diff the revisions:

cub unit diff ingress-nginx --space $(bin/pre)-base --from 3

Adjust the revision number as needed. You will see several diffs. Here is one example (your output may look different depending on the latest ingress-nginx version):

...
699:           kubernetes.io/os: linux
700:         serviceAccountName: ingress-nginx
701: -      automountServiceAccountToken: true
702:         terminationGracePeriodSeconds: 300
703:         volumes:
...

This is interesting. It looks like the downgrade to version 4.12.6 includes a configuration change to the Deployment resource: the explicit automountServiceAccountToken: true setting is removed. This is clearly something you will want to know about before rolling out to production.
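When a diff is long, it can help to pull out only the removed lines for review. The following is a minimal sketch, assuming the diff output has the shape shown in the snippet above: a line number, a colon, one space, and then a "-" marker for removed lines. The helper removed_lines is hypothetical, and the real output format may differ, so treat the pattern as a starting point.

```shell
# Hypothetical helper: keep only lines that look like removals, i.e.
# "<line number>: -..." as in the sample diff in this document.
# Unchanged lines have more than one space after the colon, so they are skipped.
removed_lines() {
  grep -E '^[0-9]+: -'
}

removed_lines <<'EOF'
699:           kubernetes.io/os: linux
700:         serviceAccountName: ingress-nginx
701: -      automountServiceAccountToken: true
702:         terminationGracePeriodSeconds: 300
703:         volumes:
EOF
# prints the line numbered 701
```

You could pipe the output of the cub unit diff command shown earlier into removed_lines in the same way.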

At this point, you have only made changes to the base. We can check the status of all ingress-nginx units:

cub unit list --where "Slug = 'ingress-nginx'" --space '*' --columns Name,Space.Slug,UpgradeNeeded
NAME             SPACE                UPGRADE-NEEDED
ingress-nginx    whisker-paw-base
ingress-nginx    whisker-paw-dev      Yes
ingress-nginx    whisker-paw-team1    No
ingress-nginx    whisker-paw-team2    No

Your output will vary slightly. The UPGRADE-NEEDED column indicates whether there are changes between a downstream unit and its upstream. In this case we just upgraded the base, and therefore dev can now be upgraded.
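If you want to act on this table in a script, you can extract the spaces that are flagged for upgrade. The sketch below uses a hypothetical helper, needs_upgrade, and assumes the output is a whitespace-separated table with a header row and no spaces inside values, as in the sample above.

```shell
# Hypothetical helper: read the table on stdin and print the SPACE column
# for every row whose UPGRADE-NEEDED column is "Yes". The header is skipped.
needs_upgrade() {
  awk 'NR > 1 && $3 == "Yes" { print $2 }'
}

needs_upgrade <<'EOF'
NAME             SPACE                UPGRADE-NEEDED
ingress-nginx    whisker-paw-base
ingress-nginx    whisker-paw-dev      Yes
ingress-nginx    whisker-paw-team1    No
ingress-nginx    whisker-paw-team2    No
EOF
# prints whisker-paw-dev
```

The base row has an empty third column, so it is never reported.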

Let's upgrade dev:

cub unit update ingress-nginx --space $(bin/pre)-dev --upgrade

Run the list command again:

cub unit list --where "Slug = 'ingress-nginx'" --space '*' --columns Name,Space.Slug,UpgradeNeeded
NAME             SPACE                UPGRADE-NEEDED
ingress-nginx    whisker-paw-base
ingress-nginx    whisker-paw-dev      No
ingress-nginx    whisker-paw-team1    Yes
ingress-nginx    whisker-paw-team2    Yes

Now dev no longer needs an upgrade, but team1 and team2 do, because they are downstream from dev, which was just upgraded. We can choose to upgrade just one of them or we can bulk upgrade all at once. The former is the safest bet, but for the sake of demonstration, here is how you bulk upgrade:

cub unit update --space '*' --where "Slug = 'ingress-nginx'" --upgrade --patch
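For the more cautious route of upgrading one space at a time, a small script can generate the per-space commands for review before anything is run. This sketch uses a hypothetical helper, print_upgrade_commands, which only prints commands and does not execute them. The command it prints is the same single-unit upgrade command used for dev above, and whisker-paw stands in for the output of bin/pre.

```shell
# Hypothetical helper: print (do not run) one upgrade command per space.
# Usage: print_upgrade_commands PREFIX NAME...
print_upgrade_commands() {
  prefix=$1
  shift
  for name in "$@"; do
    echo "cub unit update ingress-nginx --space ${prefix}-${name} --upgrade"
  done
}

print_upgrade_commands whisker-paw team1 team2
```

Running each printed command separately gives you a chance to diff and verify one team's space before moving on to the next.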

We can then check the list again:

cub unit list --where "Slug = 'ingress-nginx'" --space '*' --columns Name,Space.Slug,UpgradeNeeded
NAME             SPACE                UPGRADE-NEEDED
ingress-nginx    whisker-paw-base
ingress-nginx    whisker-paw-dev      No
ingress-nginx    whisker-paw-team1    No
ingress-nginx    whisker-paw-team2    No

Now comes the big question: What happened to the resource changes we made to the team1 config? Let's diff the latest (adjust your revision number as needed):

cub unit diff ingress-nginx --space $(bin/pre)-team1 --from 2
...
692:             mountPath: /usr/local/certificates/
693:             readOnly: true
694:           resources:
695:             requests:
696:               cpu: 1000m
697:               memory: 1Gi
698:         nodeSelector:
699:           kubernetes.io/os: linux
700:         serviceAccountName: ingress-nginx
701: -      automountServiceAccountToken: true
702:         terminationGracePeriodSeconds: 300
703:         volumes:
...

You can see in this diff snippet that automountServiceAccountToken was changed as expected but also that the resource increases we made before have been left alone.

This works because when you make "local" changes to a specific unit, ConfigHub assigns ownership of those changes to the unit. During a later upgrade, the incoming changes will not clobber locally made changes.
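The general idea can be sketched as a per-field rule: if a field was changed locally relative to what the unit was cloned from, keep the local value, otherwise take the incoming value. The helper resolve_field below is hypothetical and is only a conceptual illustration under that assumption. It is not a description of how ConfigHub tracks ownership internally.

```shell
# Hypothetical per-field rule (a conceptual sketch, not ConfigHub's logic).
# Usage: resolve_field BASE LOCAL INCOMING
#   BASE     - value the unit had when it was cloned
#   LOCAL    - value currently in the unit
#   INCOMING - value arriving from upstream during the upgrade
resolve_field() {
  base=$1
  local_value=$2
  incoming=$3
  if [ "$local_value" != "$base" ]; then
    echo "$local_value"   # changed locally, so the local value is kept
  else
    echo "$incoming"      # untouched locally, so the upstream value wins
  fi
}

resolve_field 100m 1000m 100m   # prints 1000m
resolve_field old old new       # prints new
```

The first call mirrors the team1 scenario: the CPU request was raised locally from 100m to 1000m, so the upgrade leaves it at 1000m.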