| ib_srpt Test Procedure |
| ====================== |
| |
| At least the following tests must be run before releasing a new SRPT version: |
| |
| * Make sure that SRPT compiles and installs without triggering any |
| compiler warning. Use the following command to compile and install SRPT: |
| |
| for d in scst srpt; do make -sC $d clean && make -sC $d install; done |
| |
| * Verify the output of run-regression-tests for kernel versions starting at |
| 2.6.27 up to and including the latest released kernel. |
| |
| * Verify that SRPT compiles, installs and works fine when following the |
| instructions in README.ofed for the latest released OFED distribution and |
| with the latest released CentOS, Ubuntu and openSUSE distributions. |
| |
| * Verify that module loading and unloading works fine. |
| |
| * Verify that rejecting logins does not trigger a memory leak, e.g. as follows: |
| * Run the following command on the target system: |
| ${SCST_TRUNK}/scripts/monitor-memory-usage | tee memlog.txt |
| * Run the following command on the initiator system: |
| target_id="$(/usr/sbin/ibsrpdm -c -d /dev/infiniband/umad0 | head -n1)" |
| for ((i=0;i<100000;i++)); do |
| echo "$target_id" >/sys/class/infiniband_srp/srp-mlx4_0-1/add_target |
| done |
| |
| * Verify that the following I/O stress test does not report any errors even |
| when left running for several hours: |
| |
| dev=... |
| umount /mnt |
| mkfs.ext4 -O ^has_journal $dev |
| while \ |
| mount $dev /mnt && \ |
| rm -rf /mnt/test* && \ |
| fio --verify=md5 -rw=randwrite --size=10m --bs=4k \ |
| --loops=1000000 --iodepth=64 --group_reporting --sync=1 --direct=1 \ |
| --ioengine=aio --directory=/mnt --name=test --thread --numjobs=80 \ |
| --runtime=30 && \ |
| fsck -N $dev |
| do |
| true |
| done |
| |
| * Repeat the above test with SCST_MAX_TGT_DEV_COMMANDS set to 48 and after |
| having applied the patch below on ib_srpt.c. The expected result is that no |
| data corruption occurs but that on the target error messages are logged |
| about a negative req_lim value. The lowest expected value for req_lim is -15. |
| |
| Index: srpt/src/ib_srpt.c |
| =================================================================== |
| --- srpt/src/ib_srpt.c (revision 2412) |
| +++ srpt/src/ib_srpt.c (working copy) |
| @@ -2705,7 +2705,7 @@ |
| ch->max_ti_iu_len = it_iu_len; |
| rsp->buf_fmt = cpu_to_be16(SRP_BUF_FORMAT_DIRECT | |
| SRP_BUF_FORMAT_INDIRECT); |
| - rsp->req_lim_delta = cpu_to_be32(ch->rq_size); |
| + rsp->req_lim_delta = cpu_to_be32(ch->rq_size + 16); |
| ch->req_lim = ch->rq_size; |
| ch->req_lim_delta = 0; |
| |
| * Verify that a SCSI reset works properly by running the following command |
| on an initiator system (note: with kernel version 2.6.37 and before the |
| command below triggers a bug in the Linux SRP initiator -- see also |
| https://bugzilla.kernel.org/show_bug.cgi?id=13893): |
| |
| (target) |
| echo add debug > /sys/kernel/scst_tgt/targets/ib_srpt/trace_level |
| |
| (initiator) |
| sg_reset -d ${initiator_device} |
| |
| Verify that the target logged that it has processed a SRP_TSK_LUN_RESET |
| message. |
| |
| * Run the following command on a target system: |
| |
| while true; do \ |
| /etc/init.d/scst stop; sleep 3; /etc/init.d/scst start; sleep 5; \ |
| done |
| |
| and the following commands on an initiator system: |
| |
| target_id="$(/usr/sbin/ibsrpdm -c -d /dev/infiniband/umad0)" |
| while true; do date; rmmod ib_srp; modprobe ib_srp; echo "${target_id}" \ |
| > /sys/class/infiniband_srp/srp-mlx4_0-1/add_target; sleep 2; done |
| |
| and verify that nothing unexpected happens. |
| |
| * Log in twice from an initiator system, and verify that the first session |
| receives a DREQ upon the second login: |
| |
| target_id="$(/usr/sbin/ibsrpdm -c -d /dev/infiniband/umad0 | head -n1)" |
| dmesg -c |
| echo "${target_id}" > /sys/class/infiniband_srp/srp-mlx4_0-1/add_target |
| dmesg -c |
| echo "${target_id}" > /sys/class/infiniband_srp/srp-mlx4_0-1/add_target |
| dmesg -c |
| echo "${target_id}" > /sys/class/infiniband_srp/srp-mlx4_0-1/add_target |
| |
| * Test low memory conditions: load SRPT, reduce the amount of available |
| memory by creating a large file on a tmpfs file system and run a stress test |
| on an initiator system. |
| |
| * Test the state machine for SCST commands in SRPT by using SCST's error |
| injection mechanism. Add the following to scst/src/Makefile, log in from |
| an initiator system and trigger SRP I/O: |
| ccflags-y += -DCONFIG_SCST_DEBUG -g |
| ccflags-y += -DCONFIG_SCST_DEBUG_TM -DCONFIG_SCST_TM_DBG_GO_OFFLINE |
| |
| * Test with multiple values of ib_srp_tablesize in the range 1..128. |
| |
| * Verify that the initiator does not lock up while running the command below |
| (see also http://bugzilla.kernel.org/show_bug.cgi?id=14235): |
| |
| dev=... |
| fio --bs=512 --buffered=0 --ioengine=libaio --rw=read --invalidate=1 \ |
| --thread --numjobs=8 --loops=10 --gtod_reduce=1 --group_reporting \ |
| --name=$dev --filename=$dev |
| |
| * Test whether queue overflow recovery works correctly as follows: |
| - On the target, reload ib_srpt with srpt_sq_size set to 64. Add e.g. the |
| following in /etc/modprobe.d/ib_srpt.conf: |
| options ib_srpt srpt_sq_size=64 |
| - On the initiator, run a direct I/O test with large block sizes, e.g. |
| dev=/dev/sd... |
| scripts/blockdev-perftest -f -d -j -m 12 -M 24 $dev |
| - On the initiator, run the following two commands in parallel: |
| scripts/blockdev-perftest -r -d -j -m 12 -M 24 $dev |
| fio --verify=md5 -rw=randwrite --size=10m --bs=4k \ |
| --loops=1000000 --iodepth=64 --group_reporting --sync=1 --direct=1 \ |
| --norandommap --ioengine=aio --directory=/mnt --name=test --thread \ |
| --numjobs=80 --runtime=30 |
| |
| * Test whether aborting multipart RDMA transfers works correctly as follows: |
| - On the target, reload ib_srpt with srpt_sq_size set to 64. |
| - On the initiator, run a direct I/O test with large block sizes, e.g. 128 KB. |
| - Verify that on the target kernel messages similar to the following are |
| logged frequently: |
| ib_srpt: ***ERROR***: srpt_perform_rdmas[2966]: ib_post_send() returned |
| -12 for 1/2 |
| - On the target, unload and reload the ib_srpt kernel module. |
| - Verify that no kernel crash occurs on the target. |
| - Repeat the above a few times. |
| |
| * Test whether the code for rejecting a login after the completion thread has |
| been created works fine. Do that as follows: |
| - Insert "ret = -ENOMEM;" after the srpt_ch_qp_rtr() call. |
| - Rebuild, reinstall and restart SCST and ib_srpt on a system running a |
| debug kernel. |
| - Log in a few times from another system. |
| - Verify on the initiator system that each login attempt results in |
| "write error: Connection reset by peer". |
| - Verify on the target that no error messages have been logged in the kernel |
| log, that no completion threads remain (output of "ps aux | grep srpt" must |
| not list any kernel threads) and that no sessions remain |
| (output of "ls /sys/kernel/scst_tgt/targets/ib_srpt/*/sessions" must be |
| empty) and that the amount of free memory remains the same. |
| |
| * Run this script on the target to verify that active sessions are closed |
| properly when a target is disabled: |
| e=1 |
| while true; do |
| echo $e |
| for f in /sys/kernel/scst_tgt/targets/ib_srpt/*/enabled; do |
| echo $e > $f |
| done |
| e=$((1-e)) |
| sleep 10 |
| done |
| While this script is running, repeatedly try to log in from another system. |
| |
| * Test as follows that module removal with active sessions works fine (this |
| test uses the loopback capability of an IB HCA): |
| |
| for ((i=0;i<1;i++)); do |
| /etc/init.d/scst restart |
| while [ -e /sys/module/ib_srp ]; do |
| rmmod ib_srp >/dev/null 2>&1 || sleep 1 |
| done |
| modprobe ib_srp |
| cat /sys/kernel/scst_tgt/targets/ib_srpt/*/login_info | |
| sed 's/^t//' | |
| while read target_info; do |
| echo "${target_info}" |
| for dev in /sys/class/infiniband_srp/srp-*; do |
| echo -n "${target_info}" > $dev/add_target & |
| done |
| wait |
| done |
| (cd /sys/class/srp_host && ls) |
| sleep 1 |
| done |