mlx5: Fix inline-scatter source address on 128B CQE REQ completions
In mlx5_parse_cqe() the inline-scatter completion path used the
function's void* buffer pointer to address the inline data:

    if (cqe64->op_own & MLX5_INLINE_SCATTER_32)
        err = mlx5_copy_to_send_wqe(mqp, wqe_ctr, cqe, ...);
    else if (cqe64->op_own & MLX5_INLINE_SCATTER_64)
        err = mlx5_copy_to_send_wqe(mqp, wqe_ctr, cqe - 1, ...);

`cqe` is the base of the CQE buffer entry, while `cqe64` points at the
64B descriptor inside that entry (offset 0 for a 64B entry, offset 64
for a 128B entry). Both call sites used the wrong base:
1. SCATTER_64: the payload occupies the first 64B of a 128B
   entry, so the correct base is `cqe64 - 1`, which equals the
   entry base. The existing `cqe - 1` relied on void* pointer
   arithmetic (a GNU extension that subtracts one byte, not one
   descriptor) and so pointed one byte before the entry base.
2. SCATTER_32: the inline_32 payload starts at offset 0 of the
   descriptor, so the correct base is `cqe64`. Passing `cqe`
   instead reads from offset 0 of the buffer entry, which on a
   128B CQE is 64 bytes before the payload (the first half of
   the same entry). For 64B CQEs cqe == cqe64 so the bug was
   masked.
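
The offsets involved can be checked with a small standalone model. This is only a sketch under stated assumptions: `struct demo_cqe64`, `demo_buf` and the `demo_*` helpers are hypothetical stand-ins, not libmlx5 code, and the byte-wise step of GNU void* arithmetic is emulated with a `uint8_t *` cast so the snippet stays portable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the 64-byte descriptor (not struct mlx5_cqe64). */
struct demo_cqe64 {
	uint8_t raw[64];
};
static_assert(sizeof(struct demo_cqe64) == 64, "descriptor must be 64 bytes");

/*
 * Two 128B CQ entries, each made of two 64B halves. The second entry
 * (halves 2 and 3) is used so that "entry base - 1" stays inside the array.
 */
static struct demo_cqe64 demo_buf[4];

/* Base of the 128B entry under test (what the commit calls `cqe`). */
static void *demo_entry_base(void)
{
	return &demo_buf[2];
}

/* Descriptor inside that entry (what the commit calls `cqe64`), at offset 64. */
static struct demo_cqe64 *demo_desc(void)
{
	return &demo_buf[3];
}

/* Byte offset of p relative to the entry base. */
static ptrdiff_t demo_offset(const void *p)
{
	return (const uint8_t *)p - (const uint8_t *)demo_entry_base();
}

/* Old SCATTER_64 source: void* "cqe - 1", emulated as a one-byte step. */
static ptrdiff_t demo_scatter64_old(void)
{
	return demo_offset((uint8_t *)demo_entry_base() - 1);
}

/* Fixed SCATTER_64 source: typed "cqe64 - 1" steps back one descriptor. */
static ptrdiff_t demo_scatter64_fixed(void)
{
	return demo_offset(demo_desc() - 1);
}

/* Old SCATTER_32 source: the entry base itself. */
static ptrdiff_t demo_scatter32_old(void)
{
	return demo_offset(demo_entry_base());
}

/* Fixed SCATTER_32 source: the descriptor itself. */
static ptrdiff_t demo_scatter32_fixed(void)
{
	return demo_offset(demo_desc());
}
```

In this model the fixed expressions land at offsets 0 (SCATTER_64) and 64 (SCATTER_32) from the entry base, while the old ones land at -1 and 0.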
Both bugs affect inline RDMA_READ / ATOMIC completions on the legacy
ibv_poll_cq path and the extended-CQ ibv_start_poll / ibv_next_poll
path. The matching responder helpers (handle_responder,
handle_responder_lazy, handle_tag_matching) already pass the typed cqe64
pointer and so were not affected.
Use cqe64 / cqe64 - 1 at the REQ-path call sites. The void* cqe
parameter was only used here, so drop it from mlx5_get_next_cqe(),
mlx5_parse_cqe(), mlx5_parse_lazy_cqe() and the locals in
mlx5_poll_one(), mlx5_start_poll() and mlx5_next_poll().
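
With only the typed pointer left, the REQ-path source selection reduces to something like the sketch below. `struct sketch_cqe64`, the `SKETCH_*` flag values and `sketch_inline_src()` are illustrative assumptions rather than the driver's definitions; the real code passes the chosen address straight to mlx5_copy_to_send_wqe().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Flag values assumed for illustration only. */
enum {
	SKETCH_INLINE_SCATTER_32 = 0x4,
	SKETCH_INLINE_SCATTER_64 = 0x8,
};

/* Hypothetical 64-byte descriptor; only the op_own byte is modelled. */
struct sketch_cqe64 {
	uint8_t payload[63];
	uint8_t op_own;
};

/*
 * Return the inline-data source for a requester completion, or NULL when
 * no data was scattered into the CQE. For SCATTER_64 the caller must pass
 * the descriptor of a 128B entry, so that cqe64 - 1 is the first half of
 * that same entry.
 */
static const void *sketch_inline_src(const struct sketch_cqe64 *cqe64)
{
	assert(cqe64 != NULL);

	if (cqe64->op_own & SKETCH_INLINE_SCATTER_32)
		return cqe64;		/* payload starts at the descriptor */
	if (cqe64->op_own & SKETCH_INLINE_SCATTER_64)
		return cqe64 - 1;	/* payload is the preceding 64 bytes */
	return NULL;
}
```

Because the argument is typed, `cqe64 - 1` always steps back by one descriptor, and no separate void* entry pointer has to be threaded through the polling functions.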
Fixes: 8c4791ae2395 ("libmlx5: First version of libmlx5")
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
These are the userspace components for the Linux kernel's drivers/infiniband subsystem. Specifically this contains the userspace libraries for the following device nodes:
The userspace components of the libibverbs RDMA kernel drivers are included under the providers/ directory. Support for the following kernel RDMA drivers is included:
Additional service daemons are provided for:
This project uses a cmake based build system. Quick start:
$ bash build.sh
build/bin will contain the sample programs and build/lib will contain the shared libraries. The build is configured to run all the programs ‘in-place’ and cannot be installed.
$ apt-get install build-essential cmake gcc libudev-dev libnl-3-dev libnl-route-3-dev ninja-build pkg-config valgrind python3-dev cython3 python3-docutils pandoc
Supported releases:
$ dnf builddep redhat/rdma-core.spec
NOTE: Fedora Core uses the name ‘ninja-build’ for the ‘ninja’ command.
$ zypper install cmake gcc libnl3-devel libudev-devel ninja pkg-config valgrind-devel python3-devel python3-Cython python3-docutils pandoc
Install required packages:
$ yum install cmake gcc libnl3-devel libudev-devel make pkgconfig valgrind-devel
Developers on CentOS 7 or Amazon Linux 2 are advised to install more modern tooling for the best experience.
CentOS 7:
$ yum install epel-release
$ yum install cmake3 ninja-build pandoc
Amazon Linux 2:
$ amazon-linux-extras install epel
$ yum install cmake3 ninja-build pandoc
NOTE: EPEL uses the name ‘ninja-build’ for the ‘ninja’ command, and ‘cmake3’ for the ‘cmake’ command.
To set up software RDMA on an existing interface with either of the available drivers, use the following commands, substituting <DRIVER> with the name of the driver of your choice (rdma_rxe or siw) and <TYPE> with the type corresponding to the driver (rxe or siw).
# modprobe <DRIVER>
# rdma link add <NAME> type <TYPE> netdev <DEVICE>
Please note that a sufficiently recent version of iproute2 is required for the command above to work.
You can use either ibv_devices or rdma link to verify that the device was successfully added.
Bugs should be reported to the linux-rdma@vger.kernel.org mailing list. In your bug report, please include:
Information about your system:
How to reproduce the bug.
If the bug is a crash, the exact output printed out when the crash occurred, including any kernel messages produced.
See Contributing to rdma-core.
Stable versions are released regularly with backported fixes (see Documentation/stable.md). The current minimum version still maintained is ‘v33.X’.