PRIME+VFIO on Sway

I've been using PCI passthrough on my desktop for the past 5 years. I don't play any games that actively try to ban me for using a VM, so everything works well enough for me without much of a performance loss. I recently started using PRIME offloading, which lets you use the second GPU on the host, and it took a bit of work to get everything working smoothly.

System info

I'd like to go over my hardware and software for anyone who may be attempting this. My system is currently functional and stable with this setup, but I can only attest to that on my own hardware.

Hardware

CPU: AMD Threadripper 1950x
Mobo: ASRock Taichi X399M
GPU1: AMD 5700 XT (Reference)
GPU2: AMD 6800 (Reference)

Software

Distro: Arch Linux
Kernel: Linux LTS 6.1.50
DE/WM: Sway

How to bind VFIO/amdgpu drivers

I originally wrote a script, called from the qemu hooks file, that unbinds and rebinds the card when the VM starts and shuts down, but this doesn't seem to be the best solution. The big issue is that the card is bound to the vfio driver at boot, so you need to run the script at boot to give the card back to the host. This can be done with a systemd service, but why go through the effort when you can just...

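Don't bind the VGA device to VFIO. At boot, it will be bound to the amdgpu driver. Libvirt will automatically bind the vfio driver when the VM starts and give the card back to amdgpu when the VM shuts down, provided the device is passed through as a managed hostdev (managed='yes' in the domain XML).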
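To confirm which driver owns the card at any given moment, you can read the driver symlink in sysfs. This is a minimal sketch under a few assumptions: the function name bound_driver and the PCI address 0000:0b:00.0 are placeholders (lspci -D shows the real address), and the optional second argument exists only so the function can be pointed at a fake sysfs tree.

```shell
#!/bin/sh
# Print the kernel driver currently bound to a PCI device, or "none".
# $1: PCI address, e.g. 0000:0b:00.0 (placeholder, look yours up with lspci -D)
# $2: sysfs root, defaults to /sys (assumption: only overridden for dry runs)
bound_driver() {
    link="${2:-/sys}/bus/pci/devices/$1/driver"
    if [ -L "$link" ]; then
        # The symlink target ends in the driver name, e.g. .../drivers/amdgpu
        basename "$(readlink "$link")"
    else
        echo none
    fi
}
```

Running bound_driver 0000:0b:00.0 before starting the VM should print amdgpu, and vfio-pci while the VM is running.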

Prevent sway from touching the card

By default, if the amdgpu driver is bound to the card, sway will try to use it.

The fix for this is to tell sway to only use the primary card. This can be done by setting the WLR_DRM_DEVICES variable, as described in the wlroots documentation.

I launch sway through .zprofile, so I just export the environment variable before I run it:

export WLR_DRM_DEVICES=/dev/dri/card0 
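The card0/card1 numbering is assigned by the kernel and may not match on every system, so it is worth checking which node belongs to which GPU before hardcoding it. A minimal sketch, assuming a POSIX shell with GNU readlink: card_for_pci is a made-up helper name, the PCI address is a placeholder, and the second argument exists only so the function can be tried against a fake sysfs tree.

```shell
#!/bin/sh
# Print the /dev/dri/cardN node that belongs to a PCI address.
# $1: PCI address, e.g. 0000:0b:00.0 (placeholder)
# $2: sysfs root, defaults to /sys (assumption: only overridden for dry runs)
card_for_pci() {
    root="${2:-/sys}"
    for dev in "$root"/class/drm/card*/device; do
        [ -e "$dev" ] || continue
        # cardN/device resolves to the PCI device directory; connector entries
        # such as card0-DP-1 resolve elsewhere and never match the address.
        if [ "$(basename "$(readlink -f "$dev")")" = "$1" ]; then
            echo "/dev/dri/$(basename "$(dirname "$dev")")"
            return 0
        fi
    done
    return 1
}
```

The node printed for the primary GPU's address is the one to put in WLR_DRM_DEVICES.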

Prevent sway from touching the card again when the VM is shut down

Even though we tell sway to only use the primary card, it will automatically try to use additional cards when they're added after boot. The solution was posted here

I added this line to the release section of my hooks file:

udevadm trigger --verbose --type=devices --action=remove --subsystem-match=drm --property-match="MINOR=1"

Sway does still try to use the card for a second, but it releases it again very quickly. This shouldn't cause any issues.
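For context, here is roughly how that line can sit inside /etc/libvirt/hooks/qemu. This is a sketch, not a copy of my hooks file: libvirt passes the guest name as the first argument and the operation as the second, the guest name win10 is hypothetical, and the UDEVADM variable is an assumption added only so the script can be dry-run without touching udev.

```shell
#!/bin/sh
# /etc/libvirt/hooks/qemu (sketch)
# libvirt invokes the hook as: qemu <guest_name> <operation> <sub_operation> <extra>
GUEST="win10"                   # hypothetical guest name, replace with yours
UDEVADM="${UDEVADM:-udevadm}"   # overridable only so the sketch can be dry-run

hook() {
    # Ignore every guest except the one that gets the GPU.
    [ "${1:-}" = "$GUEST" ] || return 0
    case "${2:-}" in
        release)
            # Synthesize a "remove" event for DRM minor 1 (card1 on my system)
            # after libvirt has handed the card back to the host.
            "$UDEVADM" trigger --verbose --type=devices --action=remove \
                --subsystem-match=drm --property-match="MINOR=1"
            ;;
    esac
}

hook "$@"
```

MINOR=1 corresponds to card1 here; if the passthrough GPU shows up under a different card number, the property match needs to change with it.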

AMDGPU doesn't rebind when shutting down VM

This happens when something on the host still has the card open, whether or not you are actively using it. In my case the culprit is sway. I found a hint for this here with step 4:

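You can run fuser -s /dev/dri/renderD129 and fuser -s /dev/dri/card1 as root to see if anything is using the card. With -s, fuser prints nothing and reports the result through its exit status (0 if some process has the file open). This on its own can be used to prevent the VM from launching while the card is in use, by adding the following to the prepare section of /etc/libvirt/hooks/qemu.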

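# Abort VM startup if any host process still has the second GPU open.
if fuser -s /dev/dri/card1 || fuser -s /dev/dri/renderD129; then
    echo "GPU is in use on the host, refusing to start the VM" >&2
    exit 1
fi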

What to do on a different DE/WM?

I don't know, I use sway. Try to see if there's a way to only use the primary GPU.

Running a game

This part is very easy. All you need to do is set this environment variable when launching the game:

DRI_PRIME=1
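If you launch things from a terminal a lot, a small wrapper saves some typing. This is a minimal sketch; prime_run is a made-up name, not something shipped by Mesa or sway.

```shell
#!/bin/sh
# Run a single command with DRI_PRIME=1 so Mesa renders it on the second GPU.
# The variable is set only for that command, not for the rest of the shell.
prime_run() {
    DRI_PRIME=1 "$@"
}
```

For example, prime_run glxinfo -B should report the second GPU as the OpenGL renderer, while plain glxinfo -B reports the primary one. In Steam, the equivalent is putting DRI_PRIME=1 %command% in the game's launch options.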