kernel-boot: Add rdma_topo tool
Modern multi-NIC servers have had very complex topologies for some time
now, often with NICs, GPUs and NVMe devices that are topologically
co-located. These systems tend to come with specialized ACS requirements
for PCI Peer to Peer, for instance ACS disabled or ACS set up specially for
translated traffic.
NVIDIA's latest systems have a novel PCI multipath system that requires
special asymmetric ACS.
Introduce a tool to help users configure the ACS on such systems. The tool
will be able to parse the PCI topology and identify the topological
features, then generate the required ACS settings.
Modern kernels support the config_acs kernel command line parameter to
allow fine grained settings so the correct ACS for the topology can be fed
into GRUB and onto the kernel command line to configure it at boot.
The tool has four functions:
 topo - Print out the topology from the RDMA perspective. Indicate what
        devices are P2P connected to the NIC.
 write-grub-acs - Emit the config_acs kernel command line parameter for
        the required ACS configuration.
 setpci-acs - Use setpci after booting to set the required ACS
        configuration. This is not recommended but provided to help
        legacy systems without config_acs.
 check - Read the live ACS settings and compare them to the required
        configuration.
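
To make the check function concrete, the following is a minimal Python
sketch of comparing a live ACS control value against a required one. It is
not taken from rdma_topo and may differ from how the tool works. The bit
positions follow the ACS Control register layout as named in the Linux
pci_regs.h header; the function names decode_acs and acs_mismatch and the
example register values are hypothetical.

```python
# Illustrative sketch only; not the rdma_topo implementation.
# ACS Control register bits (names as in Linux's pci_regs.h).
ACS_BITS = {
    "SV": 0x01,  # Source Validation
    "TB": 0x02,  # Translation Blocking
    "RR": 0x04,  # P2P Request Redirect
    "CR": 0x08,  # P2P Completion Redirect
    "UF": 0x10,  # Upstream Forwarding
    "EC": 0x20,  # P2P Egress Control
    "DT": 0x40,  # Direct Translated P2P
}


def decode_acs(value):
    """Return the set of ACS flag names enabled in a control register value."""
    return {name for name, bit in ACS_BITS.items() if value & bit}


def acs_mismatch(live, required):
    """Return (missing, unexpected) flag names, each as a sorted list."""
    live_flags, req_flags = decode_acs(live), decode_acs(required)
    return sorted(req_flags - live_flags), sorted(live_flags - req_flags)


# Hypothetical values: SV+RR+CR+UF are required, but only SV+TB are set.
missing, unexpected = acs_mismatch(live=0x03, required=0x1D)
print("missing:", missing)
print("unexpected:", unexpected)
```

A real check would read the live value per device (for example with
setpci) rather than hard-coding it as done here.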
This initial version supports two NVIDIA platforms. It is expected to grow
to support more common topologies as well.
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
These are the userspace components for the Linux kernel's drivers/infiniband subsystem. Specifically, this contains the userspace libraries for the following device nodes:
The userspace components of the libibverbs RDMA kernel drivers are included under the providers/ directory. Support for the following kernel RDMA drivers is included:
Additional service daemons are provided for:
This project uses a cmake based build system. Quick start:
$ bash build.sh
build/bin will contain the sample programs and build/lib will contain the shared libraries. The build is configured to run all the programs ‘in-place’ and cannot be installed.
$ apt-get install build-essential cmake gcc libudev-dev libnl-3-dev libnl-route-3-dev ninja-build pkg-config valgrind python3-dev cython3 python3-docutils pandoc
Supported releases:
$ dnf builddep redhat/rdma-core.spec
NOTE: Fedora Core uses the name ‘ninja-build’ for the ‘ninja’ command.
$ zypper install cmake gcc libnl3-devel libudev-devel ninja pkg-config valgrind-devel python3-devel python3-Cython python3-docutils pandoc
Install required packages:
$ yum install cmake gcc libnl3-devel libudev-devel make pkgconfig valgrind-devel
Developers on CentOS 7 or Amazon Linux 2 are advised to install more modern tooling for the best experience.
CentOS 7:
$ yum install epel-release
$ yum install cmake3 ninja-build pandoc
Amazon Linux 2:
$ amazon-linux-extras install epel
$ yum install cmake3 ninja-build pandoc
NOTE: EPEL uses the name ‘ninja-build’ for the ‘ninja’ command, and ‘cmake3’ for the ‘cmake’ command.
To set up software RDMA on an existing interface with either of the available drivers, use the following commands, substituting <DRIVER> with the name of the driver of your choice (rdma_rxe or siw) and <TYPE> with the type corresponding to the driver (rxe or siw).
# modprobe <DRIVER>
# rdma link add <NAME> type <TYPE> netdev <DEVICE>
Please note that a sufficiently recent version of iproute2 is required for the command above to work.
You can use either ibv_devices or rdma link to verify that the device was successfully added.
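As a worked example of the substitution described above, here is a minimal Python sketch that builds the two commands for a chosen driver. The helper name soft_rdma_commands, the link name rxe0, and the interface name eth0 are hypothetical; the sketch only prints the commands and does not run them.

```python
# Illustrative helper: builds the command lines described in this section.
# Driver-to-type mapping as stated in the text (rdma_rxe -> rxe, siw -> siw).
DRIVER_TYPES = {"rdma_rxe": "rxe", "siw": "siw"}


def soft_rdma_commands(driver, name, netdev):
    """Return the modprobe and 'rdma link add' commands as argument lists."""
    if driver not in DRIVER_TYPES:
        raise ValueError(f"unknown driver: {driver!r}")
    link_type = DRIVER_TYPES[driver]
    return [
        ["modprobe", driver],
        ["rdma", "link", "add", name, "type", link_type, "netdev", netdev],
    ]


# Hypothetical link name and interface; run the printed commands as root.
for cmd in soft_rdma_commands("rdma_rxe", "rxe0", "eth0"):
    print("#", " ".join(cmd))
```

Returning argument lists rather than a single string keeps the commands easy to hand to a process launcher without shell quoting concerns.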
Bugs should be reported to the linux-rdma@vger.kernel.org mailing list. In your bug report, please include:
Information about your system:
How to reproduce the bug.
If the bug is a crash, the exact output printed out when the crash occurred, including any kernel messages produced.
See Contributing to rdma-core.
Stable versions are released regularly with backported fixes (see Documentation/stable.md). The current minimum version still maintained is ‘v33.X’.