All homelab notes and experiments.

Deploying Kubernetes using Terraform, Proxmox, Ansible, and K3s

I’ve been meaning to try Kubernetes for a long time, but I kept finding excuses. It’s been at the back of my mind marinating for yeaaars.

One thing I absolutely did not want to do was spend hours clicking through admin panels just to get something working. I recently learned there’s a term for this: ClickOps. I developed a strong aversion to this way of doing things when I transitioned to Cloud Engineering. Clicking things can get old fast.

Terraform + Proxmox

There is barely any overlap between what I do at my full-time job and my homelab projects. We use Terraform at work to manage AWS resources. I use Proxmox for all my servers.

I knew there was a Proxmox provider for Terraform, but I hadn’t seriously considered it, seeing how vastly different its “resources” looked compared to the AWS provider’s.

I gave it a shot, and it worked beautifully. There’s something exciting about finally bridging the gap between the tools I use at work and the ones I use in my homelab.

This solved my VM provisioning problem.

This is the only Proxmox provider I recommend: https://github.com/bpg/terraform-provider-proxmox. I ran into issues with all the others.

Ansible for configuration management

Next, I needed a way to handle configuration management. While Terraform can invoke Bash scripts, it doesn’t track configuration changes well. This led me to another tool I had been meaning to try: Ansible.

Prior to this project, I had only read about it in passing, with no actual experience at all. This was a good excuse to try it out.

Wrapping up

My primary goal is reproducibility. I want to be able to reuse the config I write now later on, and to destroy and rebuild the cluster whenever I like so it’s easy to experiment.

Here’s what the process looks like for creating a new Kubernetes cluster once Proxmox is up and running:

terraform init
terraform apply
cd ansible
ansible-playbook -i hosts setup-k3s.yml
export KUBECONFIG=./k3s.yaml
kubectl get nodes

That’s it! The setup is fully reproducible, and I now have a fresh Kubernetes cluster.
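If typing the steps gets old, they could be wrapped in a small script. This is only a sketch, assuming the same directory layout and file names as the commands above; the `rebuild` function and its `dry_run` flag are hypothetical names for illustration, not part of the repo.

```python
import os
import subprocess

# (command, working directory) pairs, in the same order as the manual steps.
STEPS = [
    (["terraform", "init"], "."),
    (["terraform", "apply"], "."),
    (["ansible-playbook", "-i", "hosts", "setup-k3s.yml"], "ansible"),
    (["kubectl", "get", "nodes"], "ansible"),
]


def rebuild(dry_run=True):
    """Run each step in order and stop at the first failure.

    With dry_run=True nothing is executed; the planned steps are only returned.
    """
    # KUBECONFIG is exported for every step for simplicity. The relative path
    # resolves against the ansible directory when kubectl runs there.
    env = dict(os.environ, KUBECONFIG="./k3s.yaml")
    planned = []
    for command, cwd in STEPS:
        planned.append(f"({cwd}) {' '.join(command)}")
        if not dry_run:
            subprocess.run(command, cwd=cwd, env=env, check=True)
    return planned


if __name__ == "__main__":
    for line in rebuild(dry_run=True):
        print(line)
```

Calling `rebuild(dry_run=False)` would actually run the commands; `terraform apply` will still stop and ask for confirmation.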

https://github.com/jerico/terraform-proxmox-k3s

Next step is actually running things on the cluster 😄

Upgrading UPS Battery

My KSTAR YDC9101S RT UPS battery died a couple of months ago. I’ve been delaying fixing it because I didn’t want a hackish solution, since it involves electricity. I also didn’t want to take my servers offline for an uncertain amount of time.

The manual says my UPS battery is not user-serviceable, and I was directed to call for service. I didn’t want to.

I knew that it had to have a battery inside; I just needed to figure out how to pull it out. I also didn’t know if I could take out the battery without the UPS shutting down.

How to remove the internal battery

  1. Pull out the front panel
  2. Remove the battery connector
  3. Unscrew the 4 screws and the metal panel will be loose
  4. Pull out the battery

The battery pack is two lead-acid 12V 9Ah batteries enclosed in thick plastic and connected in series to make 24V 9Ah.

Will the UPS work without battery?

Yes, as long as 1) bypass mode is enabled in the settings and 2) the UPS is already up and running. Starting it up without a battery does not work.

First attempt: Using original battery connector

A 12V 9Ah battery costs almost the same as a 12V 25Ah one. With roughly 2.8 times the capacity on offer, I couldn’t justify buying the lower capacity.
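A rough sketch of the numbers, using nominal ratings only. It assumes two 12 V batteries wired in series in both cases (that is what a 24 V pack needs) and ignores real-world discharge behaviour; the function names are just for illustration.

```python
def series_pack(volts_each, amp_hours_each, count):
    """Batteries in series: voltages add up, amp-hour capacity stays the same."""
    return volts_each * count, amp_hours_each


def energy_wh(volts, amp_hours):
    """Nominal stored energy in watt-hours (ignores discharge losses)."""
    return volts * amp_hours


# Assumption: two 12 V batteries in series for both the old and the new pack.
old_volts, old_ah = series_pack(12, 9, 2)
new_volts, new_ah = series_pack(12, 25, 2)

capacity_ratio = new_ah / old_ah

print(f"old pack: {old_volts} V, {old_ah} Ah, {energy_wh(old_volts, old_ah)} Wh")
print(f"new pack: {new_volts} V, {new_ah} Ah, {energy_wh(new_volts, new_ah)} Wh")
print(f"capacity ratio: {capacity_ratio:.2f}x")
```

On paper that is 216 Wh against 600 Wh, or about 2.78 times the capacity at the same 24 V.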

The issue with the 12V 25Ah battery is the form factor. It does not fit inside the UPS battery compartment.

What I ended up doing was taking the battery connector from the original battery and connecting it to my new battery outside the unit.

I wasn’t satisfied with this, especially since the battery connector is in front and I could not close the front panel.

Second attempt: Using External Battery Pack Connector

I read in the manual that the UPS can support an external battery pack. It is also rack-mounted, and there’s a port at the back to connect it. This also gave me confidence that it can support higher-capacity batteries.

I’m pretty sure that it’s just a parallel connection to the internal battery. What I did not know, and what isn’t written in the manual, is what type of port it uses.

After searching Lazada for all sorts of battery connectors, it turns out it’s an Anderson connector rated at 50A.

I found one that’s pre-terminated with terminal lugs at the end.

Now it looks like this

Much better!

I can now confidently set up new servers. Next step: I need a sandbox/staging Proxmox server for all the things I’m looking to experiment with.

Cloning Tailscale VM

I wanted to create 3 Tailscale exit nodes for my 3 ISPs: Globe, PLDT, and Converge. I’m thinking of using them as a DIY VPN because some sites are sometimes slow on a particular ISP.

I mapped each VM to a different VLAN specific to the ISP it will use.

Issue: Duplicate node key

I ran into an issue: when I clone a VM that already has Tailscale running, running tailscale up on the clone results in the same node key. To reset the node key, I had to:

apt-get remove tailscale
rm -r /var/cache/tailscale
rm /var/lib/tailscale/tailscaled.state
apt-get install tailscale
tailscale up --reset
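To check that the clones really ended up with distinct identities, one option is to compare the node keys each VM reports. This is a minimal sketch that assumes the output of `tailscale status --json` has been saved from each VM, and that the node’s own key sits under `Self.PublicKey`, which is how I understand that output. The host names and keys below are made-up placeholders.

```python
import json
from collections import defaultdict


def node_key(status_json):
    """Extract the node's own public key from `tailscale status --json` text.

    Assumption: the key is stored under Self.PublicKey.
    """
    return json.loads(status_json)["Self"]["PublicKey"]


def find_duplicates(status_by_host):
    """Return {node key: [hosts]} for every key reported by more than one host."""
    hosts_by_key = defaultdict(list)
    for host, status_json in status_by_host.items():
        hosts_by_key[node_key(status_json)].append(host)
    return {
        key: sorted(hosts)
        for key, hosts in hosts_by_key.items()
        if len(hosts) > 1
    }


if __name__ == "__main__":
    # Placeholder data standing in for the saved JSON from each exit-node VM.
    sample = {
        "globe": json.dumps({"Self": {"PublicKey": "nodekey:aaa"}}),
        "pldt": json.dumps({"Self": {"PublicKey": "nodekey:aaa"}}),
        "converge": json.dumps({"Self": {"PublicKey": "nodekey:bbb"}}),
    }
    print(find_duplicates(sample))
```

An empty result means every VM reported a different key.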

Proxmox iGPU Passthrough only works on Windows guests

I wanted to pass through the UHD 630 to a DSM guest. I thought that PCIe passthrough in Proxmox had matured enough that it would be a straightforward task to assign my Intel UHD 630 iGPU to a Linux guest.

This was not the case.

I got the following errors on Linux and macOS guests.

Host error log

DMAR: [DMA Write NO_PASID] Request device [c7:00.0] fault addr 0x27af3000 [fault reason 0x05] PTE Write access is not set

DSM error log

[ 4.093363] i915 0000:01:00.0: Invalid ROM contents
[ 4.095707] [drm:gen9_set_dc_state [i915]] *ERROR* DC state mismatch (0x0 -> 0x2)
[ 4.103321] [drm] Finished loading DMC firmware i915/kbl_dmc_ver1_04.bin (v1.4)
[ 5.754708] i915 0000:01:00.0: Resetting rcs0 after gpu hang
[ 5.755644] i915 0000:01:00.0: Resetting bcs0 after gpu hang
[ 5.756533] i915 0000:01:00.0: Resetting vcs0 after gpu hang
[ 5.757376] i915 0000:01:00.0: Resetting vecs0 after gpu hang
[ 7.707272] i915 0000:01:00.0: Resetting chip after gpu hang
[ 7.708617] i915 0000:01:00.0: GPU recovery failed
[ 7.716928] [drm] Initialized i915 1.6.0 20171222 for 0000:01:00.0 on minor 0
[ 8.107294] i915 0000:01:00.0: fb0: inteldrmfb frame buffer device
[ 8.702947] i915 0000:01:00.0: HDMI-A-1: EDID is invalid:
[ 42.201199] i915 0000:01:00.0: HDMI-A-2: EDID is invalid:

Things I tried

Blacklisting iGPU on boot

I followed https://3os.org/infrastructure/proxmox/gpu-passthrough/igpu-passthrough-to-vm/#proxmox-configuration-for-igpu-full-passthrough with all the modifications to the Proxmox host. Results are the same.

  • Blacklisting iGPU so the host won’t initialize it.

Use a vbios file

I tried using a vbios romfile from https://github.com/patmagauran/i915ovmfPkg to re-initialize the GPU. Results are the same.

  • Copied the rom file to /usr/share/kvm/ and added romfile=<vbios>.rom to hostpci0 in /etc/pve/qemu-server/<VMID>.conf.

SR-IOV

I briefly considered using SR-IOV (https://github.com/strongtz/i915-sriov-dkms), but it looks like a lot of work. Both host and guest have to support it, and I have limited control over DSM.

I’ll probably revisit it in the future if I have a use case for passing the iGPU through to a guest.

Other useful resources
