
Stack 2023 - kubernetes with Rancher

#the idea

As described in the previous post of this series, I want to replace the old deployment flow and make hosting more scalable and reusable across projects, without the risk of polluting the system with leftover services when playing around with new technologies.

A friend of mine (Omer's blog) suggested using Rancher as the next thing. He also wrote a post on the topic of our conversation, which triggered me enough to replace everything I was using.

#trying out a new server provider

My last reliable providers were Linode and DigitalOcean. I wanted to go cheaper for the new playground and keep them for existing production stuff. I had some production experience with Hetzner in the past. To sum up that experience: it all works as expected till it doesn't, and at those times, cheaper solutions always left me in deep shit with clients.

This time I am going to try Contabo.

#installation of k8s (Kubernetes) and Rancher

Note: You may want to jump to #issues first, as there are many tips and tricks related to the installation. !!! WARNING !!! The content of this post was tested and used in Aug 2022, and if it's months/years from now, follow these instructions with caution so you don't repeat my mistakes! !!! WARNING !!!

I went with the idea of an HA (high availability) cluster with an embedded database. Not recommended, but it's a risk I am willing to take or solve in the future.

This link provides most of the info, but I will pin all the other links as we go.

As versions of documentation change with new releases, I am also pasting a Rancher v2.6 link here.

#operating system

I just used the latest Ubuntu LTS.

#installing Kubernetes cluster using k3s (Lightweight Kubernetes)

Following the instructions here, it should work out of the box on the first run.

These are my notes:

NOTE: replace the version in commands

Install and init the HA cluster's primary server with an embedded database:

curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION="v1.21.7+k3s1" sh -s - server --cluster-init
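
To sanity-check the first server (my addition, not part of the official instructions), the bundled kubectl should report the node as Ready:

# run on the primary server
sudo k3s kubectl get nodes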

Alternative: install and init the HA cluster's primary server with an external database (no `--cluster-init` here; that flag is only for the embedded etcd):

curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION="v1.21.7+k3s1" sh -s - server \
  --datastore-endpoint="mysql://username:password@tcp(hostname:3306)/database-name"

Add more servers to the primary server:

  • choose a good name for "K3S_NODE_NAME" as you cannot change it later (without a hassle)
  • to rename the primary server, I had to remove it and re-add it to the cluster with a new name, which worked on the first try

curl -sfL https://get.k3s.io | K3S_NODE_NAME="{server-1}" INSTALL_K3S_VERSION="v1.21.7+k3s1" K3S_URL=https://{REPLACE_WITH_PRIMARY_SERVER_IP}:6443 K3S_TOKEN={REPLACE_WITH_K3S_TOKEN} sh -s - server

For the HA cluster, it is suggested to have (at least) 3 servers.

Add workers with:

  • choose a good name for "K3S_NODE_NAME" as you cannot change it later (without a hassle)

curl -sfL https://get.k3s.io | K3S_NODE_NAME="worker-1" INSTALL_K3S_VERSION="v1.21.7+k3s1" K3S_URL=https://{REPLACE_WITH_PRIMARY_SERVER_IP}:6443 K3S_TOKEN={REPLACE_WITH_K3S_TOKEN} sh -

You can add cluster access to your local machine by running this from your local terminal:

scp root@{REPLACE_ME_WITH_SERVER_IP}:/etc/rancher/k3s/k3s.yaml ~/.kube/config
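
Note that the copied k3s.yaml points at 127.0.0.1, so you will most likely need to swap in the server's public IP first (a quick sketch; adjust the placeholder):

# replace the loopback address with the primary server's IP
sed -i.bak 's/127.0.0.1/{REPLACE_ME_WITH_SERVER_IP}/' ~/.kube/config
kubectl get nodes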

To get the server token, run this on the server.

cat /var/lib/rancher/k3s/server/node-token

#install Rancher using helm charts

The latest guide is (should be) available here (v2.6 here).

!!! WARNING !!! The guide uses the LATEST version, not the STABLE one. Change links accordingly. !!! WARNING !!!

My notes, using cert-manager 1.9.1 and the STABLE version of Rancher:

helm repo add rancher-stable https://releases.rancher.com/server-charts/stable
kubectl create namespace cattle-system
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.9.1/cert-manager.crds.yaml
helm repo add jetstack https://charts.jetstack.io
helm repo update
helm install cert-manager jetstack/cert-manager \
  --namespace cert-manager \
  --create-namespace \
  --version v1.9.1 \
  --set installCRDs=true
helm upgrade --install rancher rancher-stable/rancher \
  --namespace cattle-system \
  --create-namespace \
  --set hostname={REPLACE_ME_WITH_RANCHER_DOMAIN} \
  --set bootstrapPassword={REPLACE_ME_WITH_SOME_BOOT_PASSWORD} \
  --set ingress.tls.source=letsEncrypt \
  --set letsEncrypt.email={REPLACE_ME_WITH_EMAIL}
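
While Rancher rolls out, you can watch the deployment with plain kubectl (my addition):

kubectl -n cattle-system rollout status deploy/rancher
kubectl -n cattle-system get pods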

If you have correctly pointed the domain DNS to the Rancher server (also check the CloudFlare issue fix), you should be able to visit the Rancher login page and use the boot password to set up login credentials.

#setting up certificates manager

The next thing was to make domains use genuine certificates. I had a lot of issues setting this up, as every version had a few different instructions, and tutorials always missed a thing or two. I was stuck here for days!

!!! WARNING !!! You need to set up a `cluster-issuer`, not an `issuer`, so certificates can be requested from any namespace in the cluster, not just one.

!!! WARNING !!! Tutorials all showed examples for `nginx`, but I had to use `traefik` for the `solver` property, since k3s ships with traefik as its default ingress controller. This took me days to find out.

I ended up setting up 2 certificate issuers, one for staging and one for production.

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-staging
  namespace: cert-manager
spec:
  acme:
    # You must replace this email address with your own.
    # Let's Encrypt will use this to contact you about expiring
    # certificates and issues related to your account.
    email: {[email protected]}
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      # Secret resource that will be used to store the account's private key.
      name: letsencrypt-staging-account-key
    # Add a single challenge solver, HTTP01 using traefik
    solvers:
      - http01:
          ingress:
            class: traefik

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-production
  namespace: cert-manager
spec:
  acme:
    # You must replace this email address with your own.
    # Let's Encrypt will use this to contact you about expiring
    # certificates and issues related to your account.
    email: {[email protected]}
    server: https://acme-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      # Secret resource that will be used to store the account's private key.
      name: letsencrypt-production-account-key
    # Add a single challenge solver, HTTP01 using traefik
    solvers:
      - http01:
          ingress:
            class: traefik

!!! WARNING !!! If you are behind CloudFlare, you need to allow plain HTTP (not secure) requests (disable FORCE HTTPS); otherwise, the challenge won't complete!

You should probably go with a different challenge than http01, like a DNS01 challenge or something similar.
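
For reference, a DNS01 solver for CloudFlare would replace the http01 block above with something roughly like this (a sketch following the cert-manager docs pattern; the API token secret name is my assumption and has to be created separately):

# replaces the http01 solver in the ClusterIssuer above
solvers:
  - dns01:
      cloudflare:
        apiTokenSecretRef:
          # hypothetical secret holding a CloudFlare API token
          name: cloudflare-api-token-secret
          key: api-token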

Now you can request a new certificate by creating a new "Certificate" under the certificate manager's "Certificates" section. cert-manager checks whether a secret already exists for it; otherwise, it triggers the certificate request, challenges, etc. I had many issues here, so feel free to ping me for help.

Certificate for rancher domain:

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: dashboard-example-com-tls
  namespace: cattle-system
spec:
  # choose your own
  secretName: dashboard-example-com-tls
  dnsNames:
    - dashboard.example.com
  issuerRef:
    name: letsencrypt-production
    kind: ClusterIssuer
    group: cert-manager.io

Certificate for an application that lives in the namespace 'test-application':

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  # I use the same name for the secret, but that is not required
  name: test-application-com-tls
  namespace: test-application
spec:
  secretName: test-application-com-tls
  dnsNames:
    - test.application.com
  issuerRef:
    name: letsencrypt-production
    kind: ClusterIssuer
    group: cert-manager.io
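
If a certificate hangs in a not-ready state, these standard kubectl queries against the cert-manager resources (my addition; names match the example above) usually show where it is stuck:

kubectl -n test-application get certificate
kubectl -n test-application describe certificate test-application-com-tls
kubectl -n test-application get certificaterequest,order,challenge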

#setting up disk management with longhorn

I installed it thru the "Apps" navigation menu. Nothing special is needed there, but things won't work out of the box: some requirements are easy to miss during the installation. I had to read the Longhorn documentation line by line to find those fu***rs.

This needs to be run on every cluster node/server.

Check requirements with their script (change the version) and install dependencies till it shows green for all checks:

curl -sSfL https://raw.githubusercontent.com/longhorn/longhorn/v1.3.1/scripts/environment_check.sh | bash

Dependencies I had to install:

apt-get install jq
apt-get install open-iscsi
apt-get install nfs-common
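
From my reading of the Longhorn requirements, the iscsi daemon also needs to be running on every node; something like:

systemctl enable --now iscsid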

#testing

The idea is simple, but I won't lie: good luck trying it out, as anything can go wrong, and nothing will work.

Idea:

  • create application namespace
    • !!! everything needs to be part of that namespace from now on
  • create a new deployment configuration
    • docker image source
    • disk requests
    • load balancer
    • ...
  • add a certificate for the domain
  • add ingress rule (see the sketch after this list)
    • add domain
    • select certificate
    • connect to the load balancer port
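
Here is roughly what such an ingress ends up looking like (a sketch; the service name and port are my assumptions, matching the certificate example from earlier):

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: test-application
  namespace: test-application
spec:
  rules:
    - host: test.application.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                # hypothetical service created by the deployment
                name: test-application
                port:
                  number: 80
  tls:
    - hosts:
        - test.application.com
      # secret created by the certificate above
      secretName: test-application-com-tls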

What I did in the beginning:

  • created my own docker image FROM nginx, replacing just the index.html hello content (see the sketch after this list)
  • pushed it to Docker Hub
  • added Docker Hub credentials to Rancher (in the application-specific namespace)
  • used a subdomain on the same domain as the dashboard (testing.example.com)
  • used the letsencrypt staging issuer first
  • visited ports directly
  • ...
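
The test image itself was as small as it gets; a sketch of the build (the image name and file contents are my assumptions):

# hypothetical hello-world image on top of nginx
cat > Dockerfile <<'EOF'
FROM nginx
COPY index.html /usr/share/nginx/html/index.html
EOF
docker build -t {REPLACE_ME_WITH_DOCKERHUB_USER}/hello-nginx:latest .
docker push {REPLACE_ME_WITH_DOCKERHUB_USER}/hello-nginx:latest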

#issues

Following different tutorials, the whole k8s and Rancher journey was a mess, and it took a lot of nerves and enthusiasm to reach the goal I wanted. Being just a month behind a tutorial, nothing worked. The whole thing reminded me of the JavaScript npm world. At one point I followed a GitLab installation tutorial dated just a couple of months back, only to realize at the end it was just a copy of a years-old tutorial published elsewhere. Just FML at so many points; I was sure I was having a nightmare.

That's why I will pinpoint some issues and solutions, paste some links I followed, etc.

#issue: tutorials and docs use "latest," not "stable"

Be careful, as most documentation refers to the latest releases rather than the stable ones. Expect lots of issues there, and always check the version and carefully read the commands you are about to run.

#issue: servers from Contabo can't see each other

Before installing a k8s cluster, some things must be solved first.

I suggest you do a sanity check and try pinging servers from one another to avoid problems when installing a k8s cluster.

The Contabo issue may be present only for some servers (I did not investigate the problem), but to get ping working between them, I had to follow the instructions here.

I would sum it up with the following:

  • you have server A with IP (1.2.3.101)
  • you have server B with IP (1.2.3.102)
  • you have server C with IP (1.2.3.103)
  • the public gateway route is 1.2.3.4

Go to each server and check if they can see each other with ping. If you get a response like the one below, you need the mentioned changes.

root@servername: ping 1.2.3.102
PING 1.2.3.102 (1.2.3.102) 56(84) bytes of data.
From {1.2.3.4} icmp_seq=1 Destination Host Unreachable
From {1.2.3.4} icmp_seq=2 Destination Host Unreachable
From {1.2.3.4} icmp_seq=3 Destination Host Unreachable

Edit the network file with:

vi /etc/netplan/01-netcfg.yaml

Where you see:

- to: 0.0.0.0/0
  via: 1.2.3.4
  on-link: true

Add this record before it:

- to: 1.2.3.102/32
  via: 1.2.3.4

And do that for every server. I am not sure if a restart was needed or not. Try pinging and see for yourself.
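
If you want to skip the restart question altogether, applying the netplan config directly should do it (my addition; netplan is the standard tool here):

netplan apply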

#issue: CloudFlare domain had too strict rules

I am using a development domain I will call example.com to manage development stuff. I have set it up as a wildcard domain in CloudFlare and registered Rancher as dashboard.example.com, but I had to:

Mark the CloudFlare domain TLS mode as FULL (and not STRICT)

so the requests still get thru to the Rancher dashboard, even while the certificate is only self-signed at the beginning.

I have changed my CloudFlare rules for the domain example.com to:

  • removing Force HTTPS in the security settings
  • adding the first rule:

*example.com/.well-known/ => SSL: off

  • optional rule in between, for a www redirect that preserves the URL path:

www.example.com/* => Forwarding URL (Status Code: 301 - Permanent Redirect, Url: https://example.com/$1)

  • adding the last rule for forcing https:

*example.com/ => Always Use HTTPS

#issue: error localhost:8080 not reachable

kubectl on the server defaults to localhost:8080 when it can't find a kubeconfig; point it at the k3s one:

export KUBECONFIG=/etc/rancher/k3s/k3s.yaml

#issue: error cert-manager timeout error

Loop the install until it succeeds:

helm fetch jetstack/cert-manager --untar && cd cert-manager && \
  while ! helm install cert-manager . --namespace cert-manager --create-namespace --set installCRDs=true ; do \
    helm uninstall cert-manager -n cert-manager ; \
  done

#issue: error incompatible kube version

Error: INSTALLATION FAILED: chart requires kubeVersion: < 1.24.0-0 which is incompatible with Kubernetes v1.24.3+k3s1

Install an older version of k3s (see releases here):

curl https://get.k3s.io | INSTALL_K3S_VERSION=v1.23.9+k3s1 ...

#issue: forgotten finish installation URL

echo https://dashboard.example.com/dashboard/?setup=$(kubectl get secret --namespace cattle-system bootstrap-secret -o go-template='{{.data.bootstrapPassword|base64decode}}')

#suggestion

On your local machine, install the following:

  • helm
  • kubectl

It is much easier to work from a local terminal than thru an SSH terminal.
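
On macOS, for example, both are one Homebrew command away (assuming Homebrew; other platforms have their own package managers):

brew install helm kubectl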

#where I am today and what is the plan

My development flow could be better, but we are getting there.

I am stuck with having to make Docker registry credentials available in every application namespace. I do local builds of Docker images, with automatic upload to the registry, and deploy thru the local terminal connected to the cluster.

I will upgrade and automate things as I go. But, for now, it is enough.

#conclusion

The experience was one of the hardest I have ever had in my 20+ years in this business. And I have done Docker from scratch and used vim as my primary editor.

Even though things are not perfect, the upgraded deployment flow is blazingly fast and reliable, and it is everything I was dreaming about before starting the whole upgrade trip.

Would I do it again? Yes, but with more beer and more breaks in between.

Till next time, stay sexy and hydrated.
