Published: Dec 12, 2023 by Isaac Johnson
We’ll cover two key areas today. The first came when I discovered a physical utility machine was offline. I powered it up easily enough (it has an on button), but I decided I needed to properly monitor it with Uptime Kuma. It also gave a good opportunity to illustrate tying Uptime Kuma to Matrix rooms for monitoring.
The other improvement we’ll cover is upgrading Gitea. I upgraded it not that long ago and it got totally hosed. This time we’ll do a bugfix update, then proceed to a minor release. The unforeseen problem I had to deal with this time was a crashing Redis cluster. We’ll go over the many things I tried, then how I succeeded in bringing Redis (and thus Gitea) back online.
Utility VMs
I found that one of my utility boxes was down.
One of the reasons I never noticed was that Uptime Kuma was really only monitoring my Kubernetes clusters.
I’ll now add the host on its eth0 (.33).
I can see ping tests are now responding.
I also added my DockerHost to the group and used a label to denote its usage.
Matrix Room
Let’s first add a Room
I’ll give it a name and create the room
I needed to not only invite my ‘builder’ user but also log in (via Element) to accept the invite. Once I was able to see the room as builder, my Matrix test worked.
Lastly, I enabled the same monitoring for K8s
I changed the room icon and could see the test pushes from Uptime in the room
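To confirm the room and token work outside of Uptime Kuma, the same kind of test push can be sent by hand. This is a minimal sketch, assuming the standard Matrix client-server send endpoint. The homeserver URL, room ID, and access token are placeholders rather than values from this setup, and `matrix_send_url` is a hypothetical helper name.

```shell
# Build the Matrix client-server URL for sending one message event.
# Room IDs contain "!" and ":", which are percent-encoded for the URL path.
matrix_send_url() {
  homeserver="$1"   # placeholder, e.g. https://matrix.example.org
  room_id="$2"      # placeholder internal room ID, e.g. !abc123:example.org
  txn_id="$3"       # any string that is unique per request
  encoded_room=$(printf '%s' "$room_id" | sed -e 's/!/%21/g' -e 's/:/%3A/g')
  printf '%s/_matrix/client/v3/rooms/%s/send/m.room.message/%s\n' \
    "$homeserver" "$encoded_room" "$txn_id"
}

# Usage (MATRIX_TOKEN is a placeholder for the invited user's access token):
#   curl -sS -X PUT \
#     -H "Authorization: Bearer $MATRIX_TOKEN" \
#     -H "Content-Type: application/json" \
#     -d '{"msgtype":"m.text","body":"test from the shell"}' \
#     "$(matrix_send_url https://matrix.example.org '!abc123:example.org' "$(date +%s)")"
```

If the message shows up in the room, the token and room ID are good, and any remaining problem is likely on the Uptime Kuma side.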
Upgrading Gitea
Let’s talk about two kinds of upgrades.
The first is small: we have a Gitea instance and just want to keep it up to date with the security and bugfix updates for our release.
We can see our Gitea chart is at 9.5.1
$ helm list
NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION
argo-cd default 1 2022-10-03 14:55:02.238016 -0500 CDT deployed argo-cd-1.0.1
azure-vote default 1 2022-07-27 06:14:33.598900543 -0500 CDT deployed azure-vote-0.1.1
gitea default 11 2023-11-07 08:10:24.849544239 -0600 CST deployed gitea-9.5.1 1.20.5
We’ll want to grab our current values
$ helm get values gitea -o yaml > gitea.values.passwordsandall.yaml
There are now two approaches we can take - download the chart or specify the version to the chart repo.
In the first approach, I can download the 9.5.1 chart directly from gitea charts
builder@DESKTOP-QADGF36:~$ cd giteachart/
builder@DESKTOP-QADGF36:~/giteachart$ wget https://dl.gitea.com/charts/gitea-9.5.1.tgz
--2023-12-02 06:32:13-- https://dl.gitea.com/charts/gitea-9.5.1.tgz
Resolving dl.gitea.com (dl.gitea.com)... 18.160.249.54, 18.160.249.45, 18.160.249.44, ...
Connecting to dl.gitea.com (dl.gitea.com)|18.160.249.54|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 288892 (282K) [application/x-tar]
Saving to: ‘gitea-9.5.1.tgz’
gitea-9.5.1.tgz 100%[==========================================================================================================================>] 282.12K --.-KB/s in 0.07s
2023-12-02 06:32:13 (3.85 MB/s) - ‘gitea-9.5.1.tgz’ saved [288892/288892]
builder@DESKTOP-QADGF36:~/giteachart$ tar -xzf ./gitea-9.5.1.tgz
builder@DESKTOP-QADGF36:~/giteachart$ ls -ltra
total 304
-rw-r--r-- 1 builder builder 288892 Oct 17 12:27 gitea-9.5.1.tgz
drwxr-xr-x 59 builder builder 12288 Dec 2 06:32 ..
drwxr-xr-x 5 builder builder 4096 Dec 2 06:32 gitea
drwxr-xr-x 3 builder builder 4096 Dec 2 06:32 .
builder@DESKTOP-QADGF36:~/giteachart$ ls -ltra gitea
total 140
-rw-r--r-- 1 builder builder 20792 Oct 17 12:27 values.yaml
-rw-r--r-- 1 builder builder 981 Oct 17 12:27 renovate.json5
-rw-r--r-- 1 builder builder 63376 Oct 17 12:27 README.md
-rw-r--r-- 1 builder builder 1174 Oct 17 12:27 LICENSE
-rw-r--r-- 1 builder builder 1108 Oct 17 12:27 Chart.yaml
-rw-r--r-- 1 builder builder 422 Oct 17 12:27 Chart.lock
-rw-r--r-- 1 builder builder 293 Oct 17 12:27 .yamllint
-rw-r--r-- 1 builder builder 10 Oct 17 12:27 .prettierignore
-rw-r--r-- 1 builder builder 488 Oct 17 12:27 .helmignore
-rw-r--r-- 1 builder builder 230 Oct 17 12:27 .editorconfig
drwxr-xr-x 4 builder builder 4096 Dec 2 06:32 templates
drwxr-xr-x 2 builder builder 4096 Dec 2 06:32 docs
drwxr-xr-x 3 builder builder 4096 Dec 2 06:32 ..
drwxr-xr-x 5 builder builder 4096 Dec 2 06:32 .
drwxr-xr-x 5 builder builder 4096 Dec 2 06:32 charts
Now, with my values, I can dry-run the upgrade against the locally downloaded chart, first as-is and then with the new image tag:
builder@DESKTOP-QADGF36:~/giteachart$ helm upgrade --dry-run gitea -f ./gitea.values.passwordsandall.yaml ./gitea | grep -i image
image: "busybox:latest"
image: "gitea/gitea:1.20.5-rootless"
imagePullPolicy: Always
image: "gitea/gitea:1.20.5-rootless"
imagePullPolicy: Always
image: "gitea/gitea:1.20.5-rootless"
imagePullPolicy: Always
image: "gitea/gitea:1.20.5-rootless"
imagePullPolicy: Always
image: docker.io/bitnami/redis-cluster:7.2.1-debian-11-r26
imagePullPolicy: "IfNotPresent"
# Now with the image.tag specified
builder@DESKTOP-QADGF36:~/giteachart$ helm upgrade --dry-run gitea -f ./gitea.values.passwordsandall.yaml --set image.tag=1.20.6 ./gitea | grep -i image
image: "busybox:latest"
image: "gitea/gitea:1.20.6-rootless"
imagePullPolicy: Always
image: "gitea/gitea:1.20.6-rootless"
imagePullPolicy: Always
image: "gitea/gitea:1.20.6-rootless"
imagePullPolicy: Always
image: "gitea/gitea:1.20.6-rootless"
imagePullPolicy: Always
image: docker.io/bitnami/redis-cluster:7.2.1-debian-11-r26
imagePullPolicy: "IfNotPresent"
Had they made a chart version that referenced 1.20.6, we would be fine to use a chart version route. However, versions 9.6.0 and 9.6.1 move to Gitea 1.21.
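One way to check which Gitea release a given chart version ships, before downloading anything, is to read `appVersion` from the chart metadata. A small sketch, assuming the `gitea-charts` repo alias has been added; `chart_app_version` is a hypothetical helper name.

```shell
# Print the appVersion field from Chart.yaml content read on stdin.
chart_app_version() {
  awk '$1 == "appVersion:" { gsub(/"/, "", $2); print $2; exit }'
}

# Usage:
#   helm show chart gitea-charts/gitea --version 9.6.1 | chart_app_version
#   helm search repo gitea-charts/gitea --versions   # chart and app versions side by side
```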
Bugfix release upgrade
Let’s perform this upgrade using the image tag:
$ wget https://dl.gitea.com/charts/gitea-9.5.1.tgz
$ tar -xzf ./gitea-9.5.1.tgz
$ helm get values gitea -o yaml > gitea.values.passwordsandall.yaml
$ helm upgrade gitea -f ./gitea.values.passwordsandall.yaml --set image.tag=1.20.6 ./gitea
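The other approach mentioned earlier, pointing at the chart repo instead of a local download, would look roughly like the following. It is a sketch rather than a command taken from the run above: it assumes the `gitea-charts` repo alias is already added, and `gitea_bugfix_upgrade` is a made-up wrapper. Setting `DRY_RUN=1` only prints the command.

```shell
# Print (DRY_RUN=1) or run a Gitea upgrade that pins both the chart version
# and the image tag, pulling the chart from the repo rather than a local path.
gitea_bugfix_upgrade() {
  chart_version="$1"   # e.g. 9.5.1
  image_tag="$2"       # e.g. 1.20.6
  values_file="$3"     # values saved with `helm get values`
  set -- helm upgrade gitea gitea-charts/gitea \
    --version "$chart_version" \
    -f "$values_file" \
    --set "image.tag=$image_tag"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage:
#   DRY_RUN=1 gitea_bugfix_upgrade 9.5.1 1.20.6 ./gitea.values.passwordsandall.yaml
```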
Minor release upgrade
Going from one minor release to another means one should really read the release notes. The last time I neglected to do this, I didn’t take note they dropped MySQL and I had a bad day.
Note: MySQL/MariaDB was not dropped. However, I would find that there are some major changes between 8.x and 9.x charts - read more about that here
The charts page details the differences from a chart perspective.
We can see the only real thing to watch is the update of Redis tags to 9.1.3. This means the Redis cluster might also bounce.
To be honest, I’ve had one of the pods in a crashloop the whole time and the app has been fine.
$ kubectl get pods -l app.kubernetes.io/instance=gitea
NAME READY STATUS RESTARTS AGE
gitea-redis-cluster-0 1/1 Running 1 (16d ago) 16d
gitea-redis-cluster-3 1/1 Running 1 (16d ago) 16d
gitea-redis-cluster-2 1/1 Running 0 24d
gitea-redis-cluster-1 1/1 Running 0 24d
gitea-redis-cluster-4 1/1 Running 4 (15d ago) 24d
gitea-5898bd5c79-pgt5q 1/1 Running 0 7m7s
gitea-redis-cluster-5 0/1 CrashLoopBackOff 4764 (3m52s ago) 16d
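A pod that has been crashlooping this long is easy to overlook in a longer listing. As a small sketch, the filter below pulls out just the crashlooping pod names so their last logs can be checked; `crashlooping_pods` is a hypothetical helper name.

```shell
# Read `kubectl get pods` output on stdin and print the names of pods whose
# STATUS column is CrashLoopBackOff (the header row is skipped).
crashlooping_pods() {
  awk 'NR > 1 && $3 == "CrashLoopBackOff" { print $1 }'
}

# Usage:
#   kubectl get pods -l app.kubernetes.io/instance=gitea | crashlooping_pods
#   kubectl logs --previous gitea-redis-cluster-5   # logs from the last crashed container
```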
Download approach
First we can do this as a download.
I’ll move my 9.5.1 out of the way and get the latest 9.6.1
builder@DESKTOP-QADGF36:~/giteachart$ ls
gitea gitea-9.5.1.tgz gitea.values.passwordsandall.yaml o
builder@DESKTOP-QADGF36:~/giteachart$ mv gitea gitea_951
builder@DESKTOP-QADGF36:~/giteachart$ wget https://dl.gitea.com/charts/gitea-9.6.1.tgz
--2023-12-02 06:59:49-- https://dl.gitea.com/charts/gitea-9.6.1.tgz
Resolving dl.gitea.com (dl.gitea.com)... 18.160.181.93, 18.160.181.70, 18.160.181.23, ...
Connecting to dl.gitea.com (dl.gitea.com)|18.160.181.93|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 290634 (284K) [application/x-tar]
Saving to: ‘gitea-9.6.1.tgz’
gitea-9.6.1.tgz 100%[================================================>] 283.82K --.-KB/s in 0.06s
2023-12-02 06:59:49 (4.81 MB/s) - ‘gitea-9.6.1.tgz’ saved [290634/290634]
builder@DESKTOP-QADGF36:~/giteachart$ tar -xzf gitea-9.6.1.tgz
builder@DESKTOP-QADGF36:~/giteachart$ ls -l
total 604
drwxr-xr-x 5 builder builder 4096 Dec 2 06:59 gitea
-rw-r--r-- 1 builder builder 288892 Oct 17 12:27 gitea-9.5.1.tgz
-rw-r--r-- 1 builder builder 290634 Nov 27 15:02 gitea-9.6.1.tgz
-rw-r--r-- 1 builder builder 1700 Dec 2 06:33 gitea.values.passwordsandall.yaml
drwxr-xr-x 5 builder builder 4096 Dec 2 06:32 gitea_951
-rw-r--r-- 1 builder builder 21985 Dec 2 06:39 o
Before we continue, be aware that IF in our prior steps we set the image, it won’t override that:
# Using our value with a set image
builder@DESKTOP-QADGF36:~/giteachart$ helm get values gitea -o yaml > gitea.values.passwordsandall.yaml.NEW
builder@DESKTOP-QADGF36:~/giteachart$ helm upgrade --dry-run gitea -f ./gitea.values.passwordsandall.yaml.NEW ./gitea | grep -i Image
image: "busybox:latest"
image: "gitea/gitea:1.20.6-rootless"
imagePullPolicy: IfNotPresent
image: "gitea/gitea:1.20.6-rootless"
imagePullPolicy: IfNotPresent
image: "gitea/gitea:1.20.6-rootless"
imagePullPolicy: IfNotPresent
image: "gitea/gitea:1.20.6-rootless"
imagePullPolicy: IfNotPresent
image: docker.io/bitnami/redis-cluster:7.2.3-debian-11-r1
imagePullPolicy: "IfNotPresent"
# Using the values where we didn't set an image:
builder@DESKTOP-QADGF36:~/giteachart$ helm upgrade --dry-run gitea -f ./gitea.values.passwordsandall.yaml ./gitea | grep -i Image
image: "busybox:latest"
image: "gitea/gitea:1.21.1-rootless"
imagePullPolicy: IfNotPresent
image: "gitea/gitea:1.21.1-rootless"
imagePullPolicy: IfNotPresent
image: "gitea/gitea:1.21.1-rootless"
imagePullPolicy: IfNotPresent
image: "gitea/gitea:1.21.1-rootless"
imagePullPolicy: IfNotPresent
image: docker.io/bitnami/redis-cluster:7.2.3-debian-11-r1
imagePullPolicy: "IfNotPresent"
To correct that, just remove the ‘image.tag’ block:
OR you can force it to determine the right image by adding --set image.tag="" when invoking:
builder@DESKTOP-QADGF36:~/giteachart$ helm upgrade --dry-run gitea -f ./gitea.values.passwordsandall.yaml.NEW --set image.tag="" ./gitea | grep -i Image
image: "busybox:latest"
image: "gitea/gitea:1.21.1-rootless"
imagePullPolicy: IfNotPresent
image: "gitea/gitea:1.21.1-rootless"
imagePullPolicy: IfNotPresent
image: "gitea/gitea:1.21.1-rootless"
imagePullPolicy: IfNotPresent
image: "gitea/gitea:1.21.1-rootless"
imagePullPolicy: IfNotPresent
image: docker.io/bitnami/redis-cluster:7.2.3-debian-11-r1
imagePullPolicy: "IfNotPresent"
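To tell in advance whether a saved values file carries a pinned tag, a quick check like the one below can help. It is a rough sketch that only looks for an indented `tag:` line under a top-level `image:` key, which is an assumption about how `helm get values` lays out the YAML; `values_pin_image_tag` is a hypothetical helper name.

```shell
# Exit 0 if the YAML on stdin sets a `tag:` key under the top-level `image:` key.
values_pin_image_tag() {
  awk '
    /^[^ \t#]/ { in_image = ($0 ~ /^image:/) }   # a new top-level key starts
    in_image && /^[ \t]+tag:/ { found = 1 }      # indented tag inside image
    END { exit (found ? 0 : 1) }
  '
}

# Usage:
#   helm get values gitea -o yaml | values_pin_image_tag \
#     && echo "image.tag is pinned; pass --set image.tag=\"\" or edit the file"
```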
From Chart repo
Knowing that we want the latest chart, we can also just upgrade using their chart repo
builder@DESKTOP-QADGF36:~/giteachart$ helm repo add gitea-charts https://dl.gitea.com/charts/
"gitea-charts" already exists with the same configuration, skipping
builder@DESKTOP-QADGF36:~/giteachart$ helm repo update
Hang tight while we grab the latest from your chart repositories...
...Unable to get an update from the "freshbrewed" chart repository (https://harbor.freshbrewed.science/chartrepo/library):
failed to fetch https://harbor.freshbrewed.science/chartrepo/library/index.yaml : 404 Not Found
...Unable to get an update from the "myharbor" chart repository (https://harbor.freshbrewed.science/chartrepo/library): failed to fetch https://harbor.freshbrewed.science/chartrepo/library/index.yaml : 404 Not Found
...Successfully got an update from the "nfs" chart repository
...Successfully got an update from the "azure-samples" chart repository
...Successfully got an update from the "jfelten" chart repository
...Successfully got an update from the "ngrok" chart repository
...Successfully got an update from the "adwerx" chart repository
...Successfully got an update from the "akomljen-charts" chart repository
...Successfully got an update from the "kube-state-metrics" chart repository
...Successfully got an update from the "confluentinc" chart repository
...Successfully got an update from the "hashicorp" chart repository
...Successfully got an update from the "kuma" chart repository
...Successfully got an update from the "btungut" chart repository
...Successfully got an update from the "sonarqube" chart repository
...Successfully got an update from the "elastic" chart repository
...Successfully got an update from the "rook-release" chart repository
...Successfully got an update from the "harbor" chart repository
...Successfully got an update from the "sumologic" chart repository
...Successfully got an update from the "nginx-stable" chart repository
...Successfully got an update from the "kiwigrid" chart repository
...Successfully got an update from the "datadog" chart repository
...Successfully got an update from the "rancher-latest" chart repository
...Successfully got an update from the "argo-cd" chart repository
...Successfully got an update from the "incubator" chart repository
...Successfully got an update from the "newrelic" chart repository
...Successfully got an update from the "prometheus-community" chart repository
...Successfully got an update from the "opencost" chart repository
...Successfully got an update from the "portainer" chart repository
...Successfully got an update from the "zabbix-community" chart repository
...Successfully got an update from the "actions-runner-controller" chart repository
...Successfully got an update from the "dapr" chart repository
...Successfully got an update from the "novum-rgi-helm" chart repository
...Successfully got an update from the "gitea-charts" chart repository
...Successfully got an update from the "rhcharts" chart repository
...Successfully got an update from the "castai-helm" chart repository
...Successfully got an update from the "open-telemetry" chart repository
...Successfully got an update from the "longhorn" chart repository
...Successfully got an update from the "lifen-charts" chart repository
...Successfully got an update from the "kubecost" chart repository
...Unable to get an update from the "epsagon" chart repository (https://helm.epsagon.com):
Get "https://helm.epsagon.com/index.yaml": dial tcp: lookup helm.epsagon.com on 172.22.64.1:53: no such host
...Successfully got an update from the "signoz" chart repository
...Successfully got an update from the "openzipkin" chart repository
...Successfully got an update from the "bitnami" chart repository
...Successfully got an update from the "grafana" chart repository
...Successfully got an update from the "crossplane-stable" chart repository
...Successfully got an update from the "uptime-kuma" chart repository
...Successfully got an update from the "gitlab" chart repository
Update Complete. ⎈Happy Helming!⎈
Then we don’t use a local path, but rather their repo/chart name, to upgrade:
builder@DESKTOP-QADGF36:~/giteachart$ helm upgrade --dry-run gitea -f ./gitea.values.passwordsandall.yaml.NEW --set image.tag="" gitea-charts/gitea | grep -i Image
image: "busybox:latest"
image: "gitea/gitea:1.21.1-rootless"
imagePullPolicy: IfNotPresent
image: "gitea/gitea:1.21.1-rootless"
imagePullPolicy: IfNotPresent
image: "gitea/gitea:1.21.1-rootless"
imagePullPolicy: IfNotPresent
image: "gitea/gitea:1.21.1-rootless"
imagePullPolicy: IfNotPresent
image: docker.io/bitnami/redis-cluster:7.2.3-debian-11-r1
imagePullPolicy: "IfNotPresent"
Issues: Redis Crashes
I got stuck with Redis cluster crashes.
kubectl get pods -l app.kubernetes.io/instance=gitea DESKTOP-QADGF36: Sun Dec 3 07:08:27 2023
NAME READY STATUS RESTARTS AGE
gitea-5898bd5c79-pgt5q 1/1 Running 0 24h
gitea-redis-cluster-2 1/1 Running 0 23h
gitea-redis-cluster-0 0/1 CrashLoopBackOff 285 (2m42s ago) 23h
gitea-dfccfb879-k77qq 0/1 CrashLoopBackOff 282 (43s ago) 23h
gitea-redis-cluster-1 0/1 CrashLoopBackOff 285 (5s ago) 23h
Luckily the old (1.20.6) Gitea was still running, but the new Redis cluster seemed to be in an endless restart loop.
$ kubectl logs gitea-redis-cluster-0
redis-cluster 13:05:46.80 INFO ==>
redis-cluster 13:05:46.80 INFO ==> Welcome to the Bitnami redis-cluster container
redis-cluster 13:05:46.80 INFO ==> Subscribe to project updates by watching https://github.com/bitnami/containers
redis-cluster 13:05:46.81 INFO ==> Submit issues and feature requests at https://github.com/bitnami/containers/issues
redis-cluster 13:05:46.81 INFO ==>
redis-cluster 13:05:46.81 INFO ==> ** Starting Redis setup **
redis-cluster 13:05:46.83 WARN ==> You set the environment variable ALLOW_EMPTY_PASSWORD=yes. For safety reasons, do not use this flag in a production environment.
redis-cluster 13:05:46.84 INFO ==> Initializing Redis
redis-cluster 13:05:46.85 INFO ==> Setting Redis config file
redis-cluster 13:05:46.92 INFO ==> Changing old IP 10.42.3.30 by the new one 10.42.3.30
redis-cluster 13:05:46.97 INFO ==> Changing old IP 10.42.1.48 by the new one 10.42.1.48
redis-cluster 13:05:47.00 INFO ==> Changing old IP 10.42.0.183 by the new one 10.42.0.183
redis-cluster 13:05:47.01 INFO ==> ** Redis setup finished! **
1:C 03 Dec 2023 13:05:47.057 # WARNING: Changing databases number from 16 to 1 since we are in cluster mode
1:C 03 Dec 2023 13:05:47.057 * oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:C 03 Dec 2023 13:05:47.057 * Redis version=7.2.3, bits=64, commit=00000000, modified=0, pid=1, just started
1:C 03 Dec 2023 13:05:47.057 * Configuration loaded
1:M 03 Dec 2023 13:05:47.057 * monotonic clock: POSIX clock_gettime
_._
_.-``__ ''-._
_.-`` `. `_. ''-._ Redis 7.2.3 (00000000/0) 64 bit
.-`` .-```. ```\/ _.,_ ''-._
( ' , .-` | `, ) Running in cluster mode
|`-._`-...-` __...-.``-._|'` _.-'| Port: 6379
| `-._ `._ / _.-' | PID: 1
`-._ `-._ `-./ _.-' _.-'
|`-._`-._ `-.__.-' _.-'_.-'|
| `-._`-._ _.-'_.-' | https://redis.io
`-._ `-._`-.__.-'_.-' _.-'
|`-._`-._ `-.__.-' _.-'_.-'|
| `-._`-._ _.-'_.-' |
`-._ `-._`-.__.-'_.-' _.-'
`-._ `-.__.-' _.-'
`-._ _.-'
`-.__.-'
1:M 03 Dec 2023 13:05:47.062 * Node configuration loaded, I'm eae710660c9ba961343e4ca9f78ae5721bb4b408
1:M 03 Dec 2023 13:05:47.062 * Server initialized
1:M 03 Dec 2023 13:05:47.133 # Unable to obtain the AOF file appendonly.aof.2.base.rdb length. stat: Permission denied
Spoiler: I ended up, as you’ll see at the end of this section, needing to remove the PVCs, STS and then recreate - but I wanted to show you how I got there
I’ll create a ‘fix’ pod that mounts the same volumes as Redis node 0:
$ cat fixRedis0.yaml
apiVersion: v1
kind: Pod
metadata:
name: ubuntu-pod-redis0
spec:
containers:
- name: ubuntu-container
image: ubuntu
command: ["tail", "-f", "/dev/null"]
volumeMounts:
- mountPath: /scripts
name: scripts
- mountPath: /bitnami/redis/data
name: redis-data
- mountPath: /opt/bitnami/redis/etc/redis-default.conf
name: default-config
subPath: redis-default.conf
- mountPath: /opt/bitnami/redis/etc/
name: redis-tmp-conf
volumes:
- name: redis-data
persistentVolumeClaim:
claimName: redis-data-gitea-redis-cluster-0
- configMap:
defaultMode: 493
name: gitea-redis-cluster-scripts
name: scripts
- configMap:
defaultMode: 420
name: gitea-redis-cluster-default
name: default-config
- emptyDir: {}
name: redis-tmp-conf
$ kubectl apply -f fixRedis0.yaml
pod/ubuntu-pod-redis0 created
I looked at the content of the rdb file in the busted Redis 0
root@ubuntu-pod-redis0:/bitnami/redis/data/appendonlydir# cat /bitnami/redis/data/appendonlydir/appendonly.aof.2.base.rdb
REDIS0011� redis-ver7.2.1�
redis-bits�@�ctime��Teused-mem�`�aof-base����⸮�ԋ?commits-count-5-commit-b2abd6e99f93a5b66d9477a3c2824d31b0382c3c���^ԋ11fffeee7da0ad41�@Y@m����] string
uidint64�uname� builder�
_old_ >�1�<3x؋3e498fded5a5a418�@Y@m����] string
uname� builder�uidint64�
_old_ �1��'6520b7f1ef543fbc�@Z@m����] string
_old_uid�1�
int64�uname� builder���xԋ'system.setting.picture.disable_gravatarfalse��c�?commits-count-4-c�}root@ubuntu-pod-redis0:/bitnami/redis/data/appendonlydir#
root@ubuntu-pod-redis0:/bitnami/redis/data/appendonlydir# cp /bitnami/redis/data/appendonlydir/appendonly.aof.2.base.rdb > /bitnami/redis/data/appendonlydir/appendonly.aof.2.base.rdb.old
cp: missing destination file operand after '/bitnami/redis/data/appendonlydir/appendonly.aof.2.base.rdb'
Try 'cp --help' for more information.
Perhaps if I move that out of the way and copy a generally empty one I saw from my working Redis cluster 2, that might work?
root@ubuntu-pod-redis0:/bitnami/redis/data/appendonlydir# cp /bitnami/redis/data/appendonlydir/appendonly.aof.2.base.rdb /bitnami/redis/data/appendonlydir/appendonly.aof.2.base.rdb.old
root@ubuntu-pod-redis0:/bitnami/redis/data/appendonlydir# echo L2JpdG5hbWkvcmVkaXMvZGF0YS9hcHBlbmRvbmx5ZGlyL2FwcGVuZG9ubHkuYW9mLjEuYmFzZS5yZGIK | base64 --decode > /bitnami/redis/data/appendonlydir/appendonly.aof.2.base.rdb
root@ubuntu-pod-redis0:/bitnami/redis/data/appendonlydir# chmod 666 /bitnami/redis/data/appendonlydir/appendonly.aof.2.base.rdb
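Since the log complained about "Permission denied" rather than corruption, it may also be worth checking ownership on the volume before swapping files around. A minimal sketch, assuming the Redis container runs as UID 1001 (an assumption, not something confirmed in this post; running `id` inside the Redis container would show the real value). `files_not_owned_by` is a hypothetical helper name.

```shell
# List everything under a directory that is NOT owned by the given numeric UID.
files_not_owned_by() {
  uid="$1"
  dir="$2"
  find "$dir" ! -user "$uid"
}

# Usage inside the fix pod (1001 is an assumed UID; confirm it first):
#   files_not_owned_by 1001 /bitnami/redis/data
#   ls -ln /bitnami/redis/data/appendonlydir   # numeric owner and mode per file
```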
I saw other guides that suggested doing a bgrewriteaof. Perhaps that will sort us out?
builder@DESKTOP-QADGF36:~$ kubectl exec -it gitea-redis-cluster-0 -- /bin/bash
I have no name!@gitea-redis-cluster-0:/$ which redis
I have no name!@gitea-redis-cluster-0:/$ which redis-cli
/opt/bitnami/redis/bin/redis-cli
I have no name!@gitea-redis-cluster-0:/$ redis-cli -a bgrewriteaof
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
AUTH failed: ERR AUTH <password> called without any password configured for the default user. Are you sure your configuration is correct?
127.0.0.1:6379>
127.0.0.1:6379> bgrewriteaof
Background append only file rewriting started
Even though I could get it partially back up, the Redis cluster was in a bad way. Logging in would return a 500.
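A pod showing Running does not mean the cluster itself is healthy. As a sketch, the cluster's own view can be read from `redis-cli cluster info`; `redis_cluster_state` is a hypothetical helper that extracts the one field of interest.

```shell
# Extract the cluster_state value from `redis-cli cluster info` output on stdin.
# Carriage returns are stripped in case the reply uses CRLF line endings.
redis_cluster_state() {
  tr -d '\r' | awk -F: '$1 == "cluster_state" { print $2 }'
}

# Usage:
#   kubectl exec gitea-redis-cluster-0 -- redis-cli cluster info | redis_cluster_state
#   kubectl exec gitea-redis-cluster-0 -- redis-cli cluster nodes   # per-node view
```

Anything other than `ok` here would be consistent with the 500s on login.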
I decided to take a more nuclear approach. I would delete the StatefulSet and recreate it.
$ kubectl get sts gitea-redis-cluster -o yaml > save.sts.yaml
$ kubectl delete sts gitea-redis-cluster
statefulset.apps "gitea-redis-cluster" deleted
After recreating it from the saved YAML, the pods were just as jacked.
Every 2.0s: kubectl get pods -l app.kubernetes.io/instance=gitea DESKTOP-QADGF36: Sun Dec 3 07:57:58 2023
NAME READY STATUS RESTARTS AGE
gitea-dfccfb879-k77qq 1/1 Running 287 (27m ago) 24h
gitea-redis-cluster-2 0/1 Running 0 32s
gitea-redis-cluster-0 0/1 CrashLoopBackOff 2 (8s ago) 33s
gitea-redis-cluster-1 0/1 CrashLoopBackOff 2 (8s ago) 33s
I decided to do it again, this time adding a couple more to the replicas (3 to 5). Perhaps some new replicas would gloss past the crashing ones?
The StatefulSet came up but I still got 500 errors logging in
kubectl get pods -l app.kubernetes.io/instance=gitea DESKTOP-QADGF36: Sun Dec 3 08:03:06 2023
NAME READY STATUS RESTARTS AGE
gitea-dfccfb879-k77qq 1/1 Running 287 (32m ago) 24h
gitea-redis-cluster-2 0/1 Running 0 2m39s
gitea-redis-cluster-3 1/1 Running 0 2m39s
gitea-redis-cluster-4 1/1 Running 0 2m39s
gitea-redis-cluster-1 0/1 CrashLoopBackOff 4 (66s ago) 2m39s
gitea-redis-cluster-0 0/1 CrashLoopBackOff 4 (48s ago) 2m39s
My last idea was to kill the Redis sts, then scrub the PVCs, then create again.
I removed the sts
$ kubectl delete sts gitea-redis-cluster
statefulset.apps "gitea-redis-cluster" deleted
Checked the volumes that were there
$ kubectl get pvc | grep gitea | grep redis
redis-data-gitea-redis-cluster-0 Bound pvc-068e7a7a-0cae-413f-85d3-7b98cc15c14c 8Gi RWO managed-nfs-storage 26d
redis-data-gitea-redis-cluster-1 Bound pvc-ca977aa4-decd-44ef-9919-14aa73b7b53b 8Gi RWO managed-nfs-storage 26d
redis-data-gitea-redis-cluster-2 Bound pvc-4859c13f-b647-4faa-a3de-be1609f7b278 8Gi RWO managed-nfs-storage 26d
redis-data-gitea-redis-cluster-4 Bound pvc-8a148d27-3264-41c0-9c21-d995fb6184af 8Gi RWO managed-nfs-storage 26d
redis-data-gitea-redis-cluster-3 Bound pvc-d3926a7e-f2a9-41eb-aadd-b4d9aca6060c 8Gi RWO managed-nfs-storage 26d
redis-data-gitea-redis-cluster-5 Bound pvc-da65f09e-7fab-4a59-8d08-4406f51b192e 8Gi RWO managed-nfs-storage 26d
Then scrubbed them out
$ kubectl delete pvc redis-data-gitea-redis-cluster-0 & \
kubectl delete pvc redis-data-gitea-redis-cluster-1 & \
kubectl delete pvc redis-data-gitea-redis-cluster-2 & \
kubectl delete pvc redis-data-gitea-redis-cluster-3 & \
kubectl delete pvc redis-data-gitea-redis-cluster-4 & \
kubectl delete pvc redis-data-gitea-redis-cluster-5
Quick note: I did have one get stuck because I forgot I had created that temporary pod to scrub files and debug Redis cluster 0. Once I removed that, I could get that last PVC deleted, i.e.
$ kubectl delete pod ubuntu-pod-redis0
pod "ubuntu-pod-redis0" deleted
$ kubectl delete pvc redis-data-gitea-redis-cluster-0
persistentvolumeclaim "redis-data-gitea-redis-cluster-0" deleted
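A PVC delete that hangs usually means something still mounts the claim. As a sketch, checking for users first and then deleting the claims in one pass could look like this; `redis_pvc_names` is a hypothetical helper that just generates the claim names seen above.

```shell
# Print the PVC names for the first N Redis cluster ordinals, one per line.
redis_pvc_names() {
  count="$1"
  i=0
  while [ "$i" -lt "$count" ]; do
    echo "redis-data-gitea-redis-cluster-$i"
    i=$((i + 1))
  done
}

# Usage:
#   kubectl describe pvc redis-data-gitea-redis-cluster-0 | grep -A1 'Used By'
#   redis_pvc_names 6 | xargs kubectl delete pvc
```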
Next, I changed the save.sts.yaml back to 3 replicas, then fired it off to recreate them
$ kubectl apply -f save.sts.yaml
statefulset.apps/gitea-redis-cluster created
kubectl get pods -l app.kubernetes.io/instance=gitea DESKTOP-QADGF36: Sun Dec 3 08:11:07 2023
NAME READY STATUS RESTARTS AGE
gitea-dfccfb879-k77qq 1/1 Running 287 (40m ago) 25h
gitea-redis-cluster-2 1/1 Running 0 72s
gitea-redis-cluster-1 1/1 Running 0 73s
gitea-redis-cluster-0 1/1 Running 0 73s
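Rather than eyeballing the READY column, the same check can be scripted. This is a sketch only; `all_pods_ready` is a hypothetical helper that succeeds when every listed pod has all of its containers ready.

```shell
# Read `kubectl get pods` output on stdin; exit 0 only if every pod's READY
# column shows the same number on both sides of the slash (e.g. 1/1).
all_pods_ready() {
  awk 'NR > 1 { split($2, r, "/"); if (r[1] != r[2]) bad = 1 } END { exit (bad ? 1 : 0) }'
}

# Usage:
#   kubectl get pods -l app.kubernetes.io/instance=gitea | all_pods_ready && echo "all ready"
#   kubectl rollout status statefulset/gitea-redis-cluster
```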
This time I could sign in! Whew!
(Checking later)
Four hours later, all is still well, so I would call the issue sorted.
$ kubectl get pods -l app.kubernetes.io/instance=gitea
NAME READY STATUS RESTARTS AGE
gitea-dfccfb879-k77qq 1/1 Running 287 (4h50m ago) 29h
gitea-redis-cluster-2 1/1 Running 0 4h10m
gitea-redis-cluster-1 1/1 Running 0 4h10m
gitea-redis-cluster-0 1/1 Running 0 4h10m
$ kubectl get pvc | grep redis | grep gitea
redis-data-gitea-redis-cluster-0 Bound pvc-901e54c1-84e5-49cd-852e-5fcbd9f12ff6 8Gi RWO managed-nfs-storage 4h10m
redis-data-gitea-redis-cluster-1 Bound pvc-1b355407-d1b0-4cbd-80a9-792aeda23dbc 8Gi RWO managed-nfs-storage 4h10m
redis-data-gitea-redis-cluster-2 Bound pvc-501c8106-bd42-473b-8f3a-2efe034b6dec 8Gi RWO managed-nfs-storage 4h10m
The summary might be that Redis clusters should be mere caching servers for things like sessions and temporary form data. They really should be safe to delete and recreate.
Some time back I got into a bit of a heated discussion with a SME from an APM vendor over whether Redis was a “Database”. While it stores data, if one is using it as a database, they’re going to have a bad day.
Summary
Well, buckle up, because today was quite the rollercoaster ride in the world of home lab maintenance! We started off by playing matchmaker, introducing Uptime Kuma to a bunch of boxes for some quality monitoring time. Then, we decided to throw a party in a Matrix room and guess who we invited? That’s right, Uptime Kuma!
But wait, there’s more: We then turned our attention to Gitea, not once, but twice! It was like a double feature movie night, but with upgrades. The sequel, however, had a surprise twist - a Redis Cluster crash. It was like a popcorn spill in the middle of the movie. But fear not, we tackled it head-on by deleting the STS and PVCs, and then re-launching from a saved STS YAML. It was a classic case of “If at first you don’t succeed, delete and relaunch!”
Now, with Forgejo joining our tech family, I’m feeling much more secure when it comes to upgrading Gitea. It’s like having a tech superhero on standby! With two reasonably HA systems that can use each other for DR, it’s like we’ve built our own tech fortress. And who knows? In the future, we might just add some more bells and whistles to this setup, including PSQL backups. Stay tuned for the next episode of our tech adventures where we’ll check out “Process Compose” and more.
Full disclosure: I asked Copilot to spruce up my summary statement above as it was really drab. I think it did a good job, though I did trim a bit of the folksy yukyuks.