I’ve had some issues with my kubernetes node lately, basically a few random crashes. A bit inconvenient, as it’s summertime. As I am writing this, I am at our cabin and the kubernetes node is down.
But wait a minute: doesn’t this blog run on kubernetes? Yes, it does. But I do have a backup.
A while back, I switched to zfs-sync-based backups of my node. This means all my file systems exist on the backup node, though not necessarily with the same names and so on. So I couldn’t let the downtime pass without trying to bring up services on the backup node!
Bringing up kubernetes
I have made some bootstrap scripts in the kubernetes-bootstrap repo that I plan to use for bringing up kubernetes and just enough services to let argocd do its job. Now, I wasn’t planning on bringing up things from scratch, but the scripts that install k3s were still worth testing. I found one spelling error (testing is always good), but other than that, installing a bare, uninitialized k3s worked pretty well. Then I put the configuration in place, mounted the file system with the etcd database, and fired it up.
It failed.
2025-07-30T06:45:25.840968+00:00 remote k3s[4186]: time="2025-07-30T06:45:25Z" level=info msg="Failed to test etcd connection: this server is a not a member of the etcd cluster. Found [hassio-616dc712=https://192.168.1.153:2380], expect: hassio-616dc712=https://192.168.1.240:2380"
It seems I just need to reset the node state, which turned out to be pretty simple:
k3s server --cluster-reset
This will do its stuff, relabel things within etcd, and then I can start k3s normally.
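For completeness, the whole dance looks roughly like this; a sketch, assuming a standard systemd-based k3s install, not the exact commands from my session:
# stop the running (failing) k3s before resetting
systemctl stop k3s
# reset the embedded etcd membership to just this node, keeping the data
k3s server --cluster-reset
# start k3s normally again
systemctl start k3s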
Now, I’m not home free. I have tons of leftover pods like this:
unifi unpoller-579fff97fd-dkzjt 1/1 Terminating 1 (22d ago) 35d
These are seemingly running, but they are on my old node. My new node is added with the node name remote, and these pods run on hassio. The cluster knows about hassio, but since it’s down, there’s no way to check whether or not these pods are actually alive.
It’s possible quorums or similar mechanisms could save me, but with two nodes (my real node and my new one), any meaningful quorum is hard to achieve. But I know the pods are down, and there’s no cleanup to do with respect to mounts, containers or anything else, so I’ll just go ahead and force delete them.
kubectl delete --force -n unifi pod unpoller-579fff97fd-dkzjt
Now, they can restart on the new node.
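There were quite a few of these, so rather than going pod by pod, something like this should clean up everything the API server still places on the dead node; a sketch, using the old node name hassio:
# list every pod still bound to the unreachable node
kubectl get pods -A --field-selector spec.nodeName=hassio
# force delete them all, without waiting for the dead kubelet
kubectl get pods -A --field-selector spec.nodeName=hassio \
  -o custom-columns=NS:.metadata.namespace,NAME:.metadata.name --no-headers \
  | while read ns name; do
      kubectl delete pod --force --grace-period=0 -n "$ns" "$name"
    done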
Next issue: storage. All my PVCs were created on hassio. While I do have the file systems locally on remote, kubernetes doesn’t know that. So, I need to see what tricks I can pull. I could probably recreate things from scratch on new volumes and copy the data over, but that’s more time-consuming. I could also have synced my data onto zpools that were named the same, but the primary purpose is backup and not a DR node, so storage flexibility is more important.
So: the metadata for my storage basically has two errors: it thinks the data is on a different node, and the pool path is wrong. For a PV, this is how it looks:
apiVersion: v1
kind: PersistentVolume
metadata:
  annotations:
    pv.kubernetes.io/provisioned-by: zfs.csi.openebs.io
    volume.kubernetes.io/provisioner-deletion-secret-name: ""
    volume.kubernetes.io/provisioner-deletion-secret-namespace: ""
  creationTimestamp: "2025-07-02T12:42:02Z"
  finalizers:
  - kubernetes.io/pv-protection
  name: pvc-73f9fc83-2188-40ec-817d-c13604f3616a
  resourceVersion: "19862352"
  uid: e1aab6fa-6634-4583-83f4-b238675d7e69
spec:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 1Gi
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: syncthing-config-pvc
    namespace: syncthing
    resourceVersion: "19862300"
    uid: 73f9fc83-2188-40ec-817d-c13604f3616a
  csi:
    driver: zfs.csi.openebs.io
    fsType: zfs
    volumeAttributes:
      openebs.io/cas-type: localpv-zfs
      openebs.io/poolname: nasdisk/k3s
      storage.kubernetes.io/csiProvisionerIdentity: 1751064728233-9252-zfs.csi.openebs.io
    volumeHandle: pvc-73f9fc83-2188-40ec-817d-c13604f3616a
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: openebs.io/nodeid
          operator: In
          values:
          - hassio
  persistentVolumeReclaimPolicy: Retain
  storageClassName: zfs-storage-nas
  volumeMode: Filesystem
status:
  lastPhaseTransitionTime: "2025-07-02T12:42:02Z"
  phase: Bound
In addition, there is a ZFSVolume resource:
apiVersion: zfs.openebs.io/v1
kind: ZFSVolume
metadata:
  creationTimestamp: "2025-07-02T12:42:01Z"
  finalizers:
  - zfs.openebs.io/finalizer
  generation: 2
  labels:
    kubernetes.io/nodename: hassio
  name: pvc-73f9fc83-2188-40ec-817d-c13604f3616a
  namespace: openebs
  resourceVersion: "19862331"
  uid: d3879115-36c3-4e80-8b9e-ee2463c928f0
spec:
  capacity: "1073741824"
  fsType: zfs
  ownerNodeID: hassio
  poolName: nasdisk/k3s
  quotaType: quota
  volumeType: DATASET
status:
  state: Ready
In the first one, the fields are immutable, so I’m in a bit of trouble: I can’t update them. However, if I delete the PV, I can recreate it with the data changed, and it will happily find the existing dataset on disk. However, when I do:
kubectl delete pv pvc-73f9fc83-2188-40ec-817d-c13604f3616a
This hangs on the finalizer – I am, of course, not able to clean things up on the hassio node, as I can’t contact it. This is actually fine: I plan to scratch this DR environment once I bring hassio back up, which should have all the data intact, but I still need to be able to change things on the DR node.
There is a trick: If I do
kubectl edit pv pvc-73f9fc83-2188-40ec-817d-c13604f3616a
I can now delete the finalizer, and the delete command returns. The PV is gone. So, I recreate it with these changes:
apiVersion: v1
kind: PersistentVolume
metadata:
  annotations:
    pv.kubernetes.io/provisioned-by: zfs.csi.openebs.io
    volume.kubernetes.io/provisioner-deletion-secret-name: ""
    volume.kubernetes.io/provisioner-deletion-secret-namespace: ""
  creationTimestamp: "2025-07-02T12:42:02Z"
  finalizers:
  - kubernetes.io/pv-protection
  name: pvc-73f9fc83-2188-40ec-817d-c13604f3616a
  resourceVersion: "19862352"
  uid: e1aab6fa-6634-4583-83f4-b238675d7e69
spec:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 1Gi
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: syncthing-config-pvc
    namespace: syncthing
    resourceVersion: "19862300"
    uid: 73f9fc83-2188-40ec-817d-c13604f3616a
  csi:
    driver: zfs.csi.openebs.io
    fsType: zfs
    volumeAttributes:
      openebs.io/cas-type: localpv-zfs
      openebs.io/poolname: backup/encrypted/nasdisk/k3s
      storage.kubernetes.io/csiProvisionerIdentity: 1751064728233-9252-zfs.csi.openebs.io
    volumeHandle: pvc-73f9fc83-2188-40ec-817d-c13604f3616a
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: openebs.io/nodeid
          operator: In
          values:
          - remote
  persistentVolumeReclaimPolicy: Retain
  storageClassName: zfs-storage-nas
  volumeMode: Filesystem
status:
  lastPhaseTransitionTime: "2025-07-02T12:42:02Z"
  phase: Bound
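As an aside, the finalizer removal and the recreation can also be scripted instead of going through kubectl edit; a sketch, assuming the manifest above, with server-set fields like resourceVersion, uid and status stripped, is saved as pv-remote.yaml (a file name I just made up):
# drop the pv-protection finalizer so the pending delete completes
kubectl patch pv pvc-73f9fc83-2188-40ec-817d-c13604f3616a \
  --type merge -p '{"metadata":{"finalizers":null}}'
# recreate the PV pointing at the local pool and node
kubectl apply -f pv-remote.yaml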
The ZFSVolume I can simply edit, doing basically the same changes. And bingo – kubernetes finds my data again!
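For reference, those ZFSVolume edits can also be expressed as a single patch; a sketch with my pool and node names:
# point the ZFSVolume at the local pool and the new node
kubectl -n openebs patch zfsvolume pvc-73f9fc83-2188-40ec-817d-c13604f3616a \
  --type merge \
  -p '{"metadata":{"labels":{"kubernetes.io/nodename":"remote"}},"spec":{"ownerNodeID":"remote","poolName":"backup/encrypted/nasdisk/k3s"}}'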
Now, it’s a bit tricky – and I still haven’t figured out all the nuances – to make the pods fully use them. I might have to unmount the datasets with zfs umount, and do various tricks like deleting stuff under the container directories, but eventually, the pods find the data. I’ll update this article if I ever find out the proper steps…
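I haven’t nailed down the exact sequence, but these are the kinds of commands involved; a sketch using my pool name:
# see which of the synced datasets are mounted, and where
zfs list -r -o name,mounted,mountpoint backup/encrypted/nasdisk/k3s
# unmount a dataset so the CSI driver can mount it where kubernetes expects it
zfs umount backup/encrypted/nasdisk/k3s/pvc-73f9fc83-2188-40ec-817d-c13604f3616a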
So, at this stage, I can bring up workloads on my DR node, but there’s still no way for the world to reach them properly.
My loadbalancers with their pools are intact, but they are wrong for the environment they now run in. The IPv6 addresses belong at my home, and there is no Unifi gateway here doing port forwarding for IPv4.
Rather than redoing all my loadbalancers and networking, I decide to create a VPN connection to the Unifi gateway and run with my BGP setup as before, with minimal changes.
In my earlier blog posts My Unifi Gateway just learned to do BGP!, BGP part two – A VPN connection to the cloud and BGP part three – eBGP between a VPS and on-prem, I have all the research done already. I basically want the iBGP from part one, over a VPN connection to the cloud as in part two.
I set up a VPN connection where remote has 192.168.228.2 and the Unifi gateway has 192.168.228.1. I also decide to set the IPv4 default gateway to 192.168.228.1, while adding static routes so that the VPN endpoint, the DNS server and a few other destinations (like the API endpoint for Letsencrypt DNS) go out directly.
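On the routing side, that boils down to something like the following; a sketch where 203.0.113.10 and 203.0.113.53 are made-up placeholders for the VPN endpoint and the DNS server, and 10.0.0.1 is an assumed local uplink gateway:
# keep the VPN endpoint and DNS reachable via the local uplink
ip route add 203.0.113.10/32 via 10.0.0.1
ip route add 203.0.113.53/32 via 10.0.0.1
# send everything else through the tunnel to the Unifi gateway
ip route replace default via 192.168.228.1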
I have left out IPv6 at the time of writing, prioritizing getting a working solution up, which is probably what I’d have done in an enterprise setting at this point too: focus on core functionality. (It pains me, but IPv6 isn’t that core functionality – most people still have IPv4.)
So, I need to update the BGP peer:
apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  creationTimestamp: "2025-04-25T09:44:03Z"
  name: unifi
  resourceVersion: "27822736"
  uid: 0a771e2c-8f51-48ae-9236-bd76a2134f62
spec:
  asNumber: 64512
  peerIP: 192.168.228.1
  sourceAddress: None
I specified sourceAddress because otherwise it put the node interface’s IP as the next hop, and not the VPN endpoint.
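Once the peer is applied, the session state can be checked on the node itself; a sketch, assuming calicoctl is installed there:
# shows the BIRD BGP sessions and whether the unifi peer is Established
sudo calicoctl node status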
The Unifi end of it:
ip prefix-list LIST-REMOTE-OUTGOING seq 5 deny X.X.X.X/21 le 32
ip prefix-list LIST-REMOTE-OUTGOING seq 6 deny 192.168.228.0/24 le 32
ip prefix-list LIST-REMOTE-OUTGOING seq 8 permit 0.0.0.0/0 le 24
.......
router bgp 64512
 bgp router-id 192.168.1.5
 neighbor linode peer-group
 neighbor linode remote-as 64513
 neighbor metallb peer-group
 neighbor metallb remote-as 64512
 neighbor remote peer-group
 neighbor remote remote-as 64512
 neighbor remote update-source 192.168.228.1
 neighbor 192.168.229.2 peer-group linode
 neighbor fd46:c709:32c6:3::1 peer-group linode
 neighbor 192.168.1.153 peer-group metallb
 neighbor fd46:c709:32c6:0:1e69:7aff:fe64:12e1 peer-group metallb
 neighbor 192.168.228.2 peer-group remote
 !
 address-family ipv4 unicast
  redistribute connected
  neighbor linode next-hop-self
  neighbor linode soft-reconfiguration inbound
  neighbor linode route-map ALLOW-ALL in
  neighbor linode route-map LINODE-OUTGOING out
  neighbor metallb next-hop-self
  neighbor metallb soft-reconfiguration inbound
  neighbor metallb route-map ALLOW-ALL in
  neighbor metallb route-map ALLOW-NONE out
  neighbor remote next-hop-self
  neighbor remote soft-reconfiguration inbound
  neighbor remote route-map ALLOW-ALL in
  neighbor remote route-map REMOTE-OUTGOING out
  maximum-paths 2
 exit-address-family
 !
 address-family ipv6 unicast
  redistribute connected
  neighbor linode activate
  neighbor linode next-hop-self
  neighbor linode soft-reconfiguration inbound
  neighbor linode route-map LINODE-INCOMING-IPV6 in
  neighbor linode route-map LINODE-OUTGOING-IPV6 out
  neighbor metallb activate
  neighbor metallb next-hop-self
  neighbor metallb soft-reconfiguration inbound
  neighbor metallb route-map ALLOW-ALL in
  neighbor metallb route-map ALLOW-ALL out
 exit-address-family
exit
!
route-map REMOTE-OUTGOING permit 10
 match ip address prefix-list LIST-REMOTE-OUTGOING
exit
!
route-map ALLOW-ALL permit 10
exit
!
route-map LINODE-OUTGOING permit 10
 match ip address prefix-list LIST-LINODE-OUTGOING
exit
!
route-map LINODE-OUTGOING-IPV6 permit 10
 match ipv6 address prefix-list LINODE-OUTGOING-IPV6
exit
!
route-map LINODE-INCOMING-IPV6 permit 10
 match ipv6 address prefix-list LINODE-INCOMING-IPV6
exit
!
end
As you can see, I am still running my other existing BGP configuration, just adding to it. I am using the same BGP AS as on my Unifi gateway and my primary node, making it iBGP.
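To verify on the Unifi side, FRR’s vtysh can show what arrives from the new peer; a sketch (the received-routes view works because of soft-reconfiguration inbound):
# routes received from the DR node over the VPN
vtysh -c "show ip bgp neighbors 192.168.228.2 received-routes"
# routes actually accepted into the BGP table
vtysh -c "show ip bgp neighbors 192.168.228.2 routes"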
This was basically it, and once configured, prefixes started flowing:
*> 10.151.24.0/26       192.168.228.2       100      0 i
*> 10.151.24.0/26       192.168.228.2       100      0 i
*> 192.168.250.0/24     192.168.228.2       100      0 i
*> 192.168.250.129/32   192.168.228.2       100      0 i
*> 192.168.250.151/32   192.168.228.2       100      0 i
*> 192.168.250.153/32   192.168.228.2       100      0 i
*> 192.168.250.155/32   192.168.228.2       100      0 i
*> 192.168.251.0/24     192.168.228.2       100      0 i
*> 192.168.251.0/32     192.168.228.2       100      0 i
*> 192.168.251.1/32     192.168.228.2       100      0 i
*> 192.168.251.5/32     192.168.228.2       100      0 i
*> 192.168.251.6/32     192.168.228.2       100      0 i
*> 192.168.251.8/32     192.168.228.2       100      0 i
*> 192.168.251.9/32     192.168.228.2       100      0 i
*> 192.168.251.10/32    192.168.228.2       100      0 i
*> 192.168.251.11/32    192.168.228.2       100      0 i
*> 192.168.251.12/32    192.168.228.2       100      0 i
*> 192.168.251.13/32    192.168.228.2       100      0 i
*> 192.168.251.14/32    192.168.228.2       100      0 i
*> 192.168.251.16/32    192.168.228.2       100      0 i
*> 192.168.251.17/32    192.168.228.2       100      0 i
*> 192.168.251.18/32    192.168.228.2       100      0 i
*> 192.168.251.19/32    192.168.228.2       100      0 i
Eureka! My workloads are accessible again.
Now, there’s a whole lot of stuff still not done. Traefik decided it had lost all its certificates, probably because of the storage issues, but it happily recreates them.
But I managed to bring up this blog, and that’s something. The rest is just work and replication, which I’ll probably not do, as my other node will be up some time tomorrow.
Careful planning, better naming of things, and maybe renaming my node and file systems so they match, would probably have made this more trivial.
I did, however, achieve my goal: verify that I can bring up stuff from my backup. The rest is just work, which I’ll probably skip this time!