Good evening,
I have a bit of a curiosity here regarding PCIe passthrough (kvm, libvirt).
When I try to start a VM that has a Broadcom / LSI MegaRAID SAS 2208 handed to it via PCI passthrough, the VM does not start. Its state goes to paused immediately and dmesg spits out errors from pcieport.
[ 4939.169052] pcieport 0000:00:1b.4: Intel SPT PCH root port ACS workaround enabled
[ 4939.169294] pcieport 0000:00:1b.4: AER: Multiple Correctable error message received from 0000:03:00.0
[ 4939.169307] pcieport 0000:00:1b.4: PCIe Bus Error: severity=Correctable, type=Data Link Layer, (Transmitter ID)
[ 4939.169308] pcieport 0000:00:1b.4: device [8086:a2eb] error status/mask=00001000/00002000
[ 4939.169310] pcieport 0000:00:1b.4: [12] Timeout
[ 4939.169324] vfio-pci 0000:03:00.0: PCIe Bus Error: severity=Correctable, type=Physical Layer, (Receiver ID)
[ 4939.169325] vfio-pci 0000:03:00.0: device [1000:005b] error status/mask=00000001/00002000
[ 4939.169327] vfio-pci 0000:03:00.0: [ 0] RxErr (First)
[ 4939.169328] vfio-pci 0000:03:00.0: AER: Error of this Agent is reported first
[ 4939.318287] pcieport 0000:00:1b.4: Intel SPT PCH root port ACS workaround enabled
[ 4939.330493] vfio-pci 0000:03:00.0: enabling device (0000 -> 0003)
[ 4941.548543] pcieport 0000:00:1b.4: AER: Uncorrectable (Non-Fatal) error message received from 0000:00:1b.4
[ 4941.548551] pcieport 0000:00:1b.4: PCIe Bus Error: severity=Uncorrectable (Non-Fatal), type=Transaction Layer, (Completer ID)
[ 4941.548554] pcieport 0000:00:1b.4: device [8086:a2eb] error status/mask=00008000/00010000
[ 4941.548556] pcieport 0000:00:1b.4: [15] CmpltAbrt (First)
[ 4941.548558] pcieport 0000:00:1b.4: AER: TLP Header: 00000000 00000000 00000000 00000000
[ 4941.548610] pcieport 0000:00:1b.4: AER: device recovery successful
1000:005b is the ID of the RAID controller.
8086:a2eb is the ID of the PCH root port on the board (00:1b.4 in the log).
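For anyone who wants to read the status/mask values in the log without looking up the registers, here is a minimal sketch that decodes them. The bit tables are my reading of the AER Correctable and Uncorrectable Error Status registers and cover only a subset of bits. The names decode_aer, CORRECTABLE_BITS and UNCORRECTABLE_BITS are made up for this example and do not come from any tool.

```python
import re

# Subset of AER Correctable Error Status bits (assumed from the PCIe spec).
CORRECTABLE_BITS = {
    0: "Receiver Error",
    6: "Bad TLP",
    7: "Bad DLLP",
    8: "REPLAY_NUM Rollover",
    12: "Replay Timer Timeout",
    13: "Advisory Non-Fatal Error",
}

# Subset of AER Uncorrectable Error Status bits (assumed from the PCIe spec).
UNCORRECTABLE_BITS = {
    4: "Data Link Protocol Error",
    12: "Poisoned TLP",
    14: "Completion Timeout",
    15: "Completer Abort",
    16: "Unexpected Completion",
    18: "Malformed TLP",
    20: "Unsupported Request",
}

STATUS_RE = re.compile(r"error status/mask=([0-9a-fA-F]{8})/([0-9a-fA-F]{8})")


def decode_aer(line, names):
    """Return the set bits of status and mask as names, or None if no match."""
    match = STATUS_RE.search(line)
    if match is None:
        return None

    def bits(value):
        # Unknown bits are reported by number instead of being dropped.
        return [names.get(b, f"bit {b}") for b in range(32) if (value >> b) & 1]

    status = int(match.group(1), 16)
    mask = int(match.group(2), 16)
    return {"status": bits(status), "mask": bits(mask)}


if __name__ == "__main__":
    sample = "pcieport 0000:00:1b.4: device [8086:a2eb] error status/mask=00008000/00010000"
    print(decode_aer(sample, UNCORRECTABLE_BITS))
```

Fed the three status lines from the log, this yields Replay Timer Timeout, Receiver Error and Completer Abort, which matches the [12] Timeout, [ 0] RxErr and [15] CmpltAbrt lines the kernel prints itself. The caller has to pick the right table based on the severity shown on the preceding log line.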
Curiously, if I wait long enough (say 5-10 minutes) and try to start the AlmaLinux VM again, it goes through 99% of the time.
Hardware:
MSI Z270 PC MATE / i7-8700 / 64 GB
The board's BIOS is modded to support Coffee Lake CPUs (one of the required patches concerns PCIe, but only the lanes from the CPU, not the ones from the chipset)
CSM is disabled, it runs UEFI-only
The host OS is Kubuntu; kernel 6.8.0-54-generic
The ACS patch for IOMMU groups is active.
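Since the IOMMU grouping matters for passthrough, here is a rough sketch of how the groups could be listed from sysfs to see what shares a group with the controller. It assumes the usual /sys/kernel/iommu_groups/<n>/devices/<address> layout. The function names iommu_groups and group_of are hypothetical helpers for this post.

```python
from pathlib import Path


def iommu_groups(root="/sys/kernel/iommu_groups"):
    """Map each IOMMU group number to the sorted PCI addresses it contains."""
    groups = {}
    base = Path(root)
    if not base.is_dir():
        # IOMMU disabled or sysfs not available: report no groups.
        return groups
    for group in base.iterdir():
        devices = group / "devices"
        if devices.is_dir():
            groups[group.name] = sorted(d.name for d in devices.iterdir())
    return groups


def group_of(address, root="/sys/kernel/iommu_groups"):
    """Return (group, members) for a PCI address, or None if it is not found."""
    for name, members in iommu_groups(root).items():
        if address in members:
            return name, members
    return None


if __name__ == "__main__":
    # 0000:03:00.0 is the RAID controller from the log above.
    print(group_of("0000:03:00.0"))
```

If the controller shares its group with anything other than itself, that would be worth mentioning alongside the logs.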
I run two VMs on it:
Windows (PCI passthrough of the GTX 1070 GPU and its audio, a dual-port NIC and the SATA controller; plus the Intel iGPU via MDEV)
AlmaLinux 8 (PCI passthrough of the Dell PERC H710 controller, i.e. the MegaRAID SAS 2208 above)
In the past both VMs worked fine on the first try.
Starting the AlmaLinux VM without the passed-through controller always works (not too surprising given the log above).
The AlmaLinux VM is arch='x86_64' machine='pc-q35-8.2' and there is nothing obscure in the XML, so I don't think it's necessary to upload it here (?).
lspci -tv
-[0000:00]-+-00.0 Intel Corporation 8th Gen Core Processor Host Bridge/DRAM Registers
           +-01.0-[01]--+-00.0 NVIDIA Corporation GP104 [GeForce GTX 1070] >>> PCI passthrough Windows
           |            \-00.1 NVIDIA Corporation GP104 High Definition Audio Controller >>> PCI passthrough Windows
           +-02.0 Intel Corporation CoffeeLake-S GT2 [UHD Graphics 630] >>> MDEV passthrough Windows
           +-08.0 Intel Corporation Xeon E3-1200 v5/v6 / E3-1500 v5 / 6th/7th/8th Gen Core Processor Gaussian Mixture Model
           +-14.0 Intel Corporation 200 Series/Z370 Chipset Family USB 3.0 xHCI Controller
           +-14.2 Intel Corporation 200 Series PCH Thermal Subsystem
           +-16.0 Intel Corporation 200 Series PCH CSME HECI #1
           +-17.0 Intel Corporation 200 Series PCH SATA controller [AHCI mode] >>> PCI passthrough Windows
           +-1b.0-[02]--
           +-1b.4-[03]----00.0 Broadcom / LSI MegaRAID SAS 2208 [Thunderbolt] >>> PCI passthrough AlmaLinux (broken)
           +-1c.0-[04]--
           +-1c.4-[05]----00.0 ASMedia Technology Inc. ASM2142/ASM3142 USB 3.1 Host Controller
           +-1c.6-[06-07]----00.0-[07]--
           +-1c.7-[08]--+-00.0 Intel Corporation 82576 Gigabit Network Connection >>> PCI passthrough Windows
           |            \-00.1 Intel Corporation 82576 Gigabit Network Connection >>> PCI passthrough Windows
           +-1d.0-[09]----00.0 Toshiba Corporation XG4 NVMe SSD Controller
           +-1f.0 Intel Corporation 200 Series PCH LPC Controller (Z270)
           +-1f.2 Intel Corporation 200 Series/Z370 Chipset Family Power Management Controller
           +-1f.3 Intel Corporation 200 Series PCH HD Audio
           +-1f.4 Intel Corporation 200 Series/Z370 Chipset Family SMBus Controller
           \-1f.6 Intel Corporation Ethernet Connection (2) I219-V
/proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-6.8.0-54-generic root=UUID=d8cf64fc-763a-4d17-8203-04b2e6d04c07 ro intel_iommu=on iommu=pt vfio-pci.ids=10de:1b81,10de:10f0,1000:005b earlymodules=vfio-pci i915.enable_guc=0 quiet splash vt.handoff=7
1000:005b is the ID of that controller, the 10de... ones belong to NVIDIA.
/etc/modprobe.d/vfio.conf
options vfio-pci ids=10de:1b81,10de:10f0,1000:005b disable_vga=1
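The vfio-pci ID list appears twice, once on the kernel command line and once in modprobe.d, so here is a small sketch for checking that the two agree. It only parses the two formats shown above and is not a general modprobe.d parser. The helper names ids_from_cmdline and ids_from_modprobe are invented for this example.

```python
def ids_from_cmdline(cmdline):
    """Extract the vfio-pci.ids=... list from a kernel command line string."""
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        # The kernel treats '-' and '_' in parameter names as equivalent.
        if sep and key.replace("_", "-") == "vfio-pci.ids":
            return set(value.split(","))
    return set()


def ids_from_modprobe(line):
    """Extract ids=... from an 'options vfio-pci ...' modprobe.d line."""
    parts = line.split()
    if len(parts) < 2 or parts[0] != "options":
        return set()
    if parts[1].replace("_", "-") != "vfio-pci":
        return set()
    for part in parts[2:]:
        if part.startswith("ids="):
            return set(part[len("ids="):].split(","))
    return set()


if __name__ == "__main__":
    with open("/proc/cmdline") as fh:
        from_cmdline = ids_from_cmdline(fh.read())
    with open("/etc/modprobe.d/vfio.conf") as fh:
        from_modprobe = set()
        for raw in fh:
            from_modprobe |= ids_from_modprobe(raw)
    # Anything listed in only one of the two places.
    print(sorted(from_cmdline ^ from_modprobe))
```

With the two lines quoted above, both functions return the same three IDs, so a mismatch between them does not look like the cause here.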
/etc/modprobe.d/blacklist.conf
...
blacklist megaraid_sas
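To confirm that the blacklist and the vfio-pci IDs actually take effect, the driver bound to the controller can be read from sysfs. This is a minimal sketch assuming the standard /sys/bus/pci/devices/<address>/driver symlink. The function name bound_driver is a hypothetical helper.

```python
import os


def bound_driver(address, root="/sys/bus/pci/devices"):
    """Return the name of the driver bound to a PCI device, or None if unbound."""
    link = os.path.join(root, address, "driver")
    if not os.path.islink(link):
        # No 'driver' symlink means no driver is currently bound.
        return None
    return os.path.basename(os.readlink(link))


if __name__ == "__main__":
    # 0000:03:00.0 is the RAID controller; with the setup above the
    # expectation is vfio-pci rather than megaraid_sas.
    print(bound_driver("0000:03:00.0"))
```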
cat /etc/modules-load.d/kvm-gvt-d.conf
kvmgt
vfio-iommu-type1
vfio-mdev
The only thing I did recently was try to get Intel iGPU passthrough working, which I eventually managed, so I'm not sure whether I broke something in the process...
Of course, when I take megaraid_sas off the blacklist, I can get to the array normally and everything works fine. The pcieport error shows up only when I try to put the controller into passthrough.
Am I doing something obviously wrong?
Thanks.