
 Generic SCSI target mid-level for Linux (SCST)
 ==============================================

 Version 3.5.0, 21 December 2020
 ----------------------------

 SCST is designed to provide a unified, consistent interface between SCSI
 target drivers and the Linux kernel and to simplify target driver
 development as much as possible. A detailed description of SCST's
 features and internals can be found on its web page,
 http://scst.sourceforge.net.

 SCST supports the following I/O modes:

  * Pass-through mode with one to many relationship, i.e. when multiple
    initiators can connect to the exported pass-through devices, for
    the following SCSI devices types: disks (type 0), tapes (type 1),
    processors (type 3), CDROMs (type 5), MO disks (type 7), medium
    changers (type 8) and RAID controllers (type 0xC).

  * FILEIO mode, which allows files on file systems or block devices to
    be used as virtual, remotely available SCSI disks or CDROMs with the
    benefits of the Linux page cache.

  * BLOCKIO mode, which performs direct block IO with a block device,
    bypassing page-cache for all operations. This mode works ideally with
    high-end storage HBAs and for applications that either do not need
    caching between application and disk or need the large block
    throughput.

  * User space mode using the scst_user device handler, which allows
    high-performance virtual SCSI devices to be implemented in user
    space. Compared with fully in-kernel dev handlers, this mode has
    very low overhead (a few percent).

  * "Performance" device handlers, which provide in pseudo pass-through
    mode a way for direct performance measurements without overhead of
    actual data transferring from/to underlying SCSI device.

 In addition, SCST supports advanced per-initiator access and device
 visibility management, so different initiators can see different sets
 of devices with different access permissions. See below for details.

 The full list of SCST features and a comparison with other Linux targets
 can be found at http://scst.sourceforge.net/comparison.html.


 Installation
 ------------

 Only vanilla kernels from kernel.org and RHEL/CentOS 5.2 kernels are
 supported, but SCST should work on other (vendor) kernels if you manage
 to compile it on them successfully. The main problem with vendor
 kernels is that they often contain patches that will appear only in
 the next version of the vanilla kernel, so it is quite hard to track
 such changes. Thus, if during compilation for some vendor kernel your
 compiler complains about redefinition of some symbol, you should either
 switch to a vanilla kernel or add or adjust, as necessary, the
 "#if LINUX_VERSION_CODE" statement that corresponds to that symbol.

 Kernel versions 2.6.26 and higher are supported.

 First, make sure that the link "/lib/modules/`your_kernel_version`/build"
 points to the source code for your currently running kernel.
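
 As a quick sanity check of that link, a small shell helper along the
 following lines can be used. This is only a sketch: the function name
 check_kernel_build_link and its two optional arguments are hypothetical
 and not part of SCST, and the defaults assume the usual /lib/modules
 layout.

```shell
# Check that <modules_root>/<kernel_version>/build resolves to a directory.
# Hypothetical helper, not shipped with SCST.
#   $1 - kernel version (default: the running kernel, from `uname -r`)
#   $2 - modules root   (default: /lib/modules, an assumption)
check_kernel_build_link() (
    version="${1:-$(uname -r)}"
    root="${2:-/lib/modules}"
    link="$root/$version/build"
    if [ -d "$link/" ]; then
        # pwd -P prints the physical directory the link resolves to
        echo "OK: $link -> $(cd "$link" && pwd -P)"
    else
        echo "MISSING: $link does not point to a kernel source tree" >&2
        exit 1
    fi
)
```

 Calling check_kernel_build_link without arguments checks the link for
 the currently running kernel.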

 Then consider applying the necessary kernel patches. SCST provides the
 following patches for the kernel in the "kernel" subdirectory. All of
 them are optional, so if you don't need the corresponding
 functionality, you need not apply them.

 1. readahead-2.6.X.patch. This patch fixes a problem in the Linux
 readahead subsystem and greatly improves performance for software
 RAIDs. See the thread
 http://sourceforge.net/mailarchive/forum.php?thread_name=a0272b440906030714g67eabc5k8f847fb1e538cc62%40mail.gmail.com&forum_name=scst-devel
 for more details. It is included in the mainstream kernels 2.6.33
 and 2.6.32.11.

 2. readahead-context-2.6.X.patch. This is backported from 2.6.31 version
 of the context readahead patch http://lkml.org/lkml/2009/4/12/9, big
 thanks to Wu Fengguang. This is a performance improvement patch. It is
 included in the mainstream kernel 2.6.31.

 Then, to compile SCST, type 'make scst'. It will build SCST itself and
 its device handlers. To install them, type 'make scst_install'. The
 driver modules will be installed in
 '/lib/modules/`your_kernel_version`/extra'. In addition, scst.h,
 scst_debug.h as well as Module.symvers or Modules.symvers will be copied
 to '/usr/local/include/scst'. The first file contains all of SCST's
 public data definitions, which are used by target drivers. The others
 support debug message logging and the build process.

 Then you can load any module by typing 'modprobe module_name'. The names
 are:

  - scst - SCST itself
  - scst_disk - device handler for disks (type 0)
  - scst_tape - device handler for tapes (type 1)
  - scst_processor - device handler for processors (type 3)
  - scst_cdrom - device handler for CDROMs (type 5)
  - scst_modisk - device handler for MO disks (type 7)
  - scst_changer - device handler for medium changers (type 8)
  - scst_raid - device handler for storage array controller (e.g. raid) (type C)
  - scst_vdisk - device handler for virtual disks (file, device or ISO CD image).
  - scst_user - user space device handler
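
 The mapping between SCSI device types and pass-through handler modules
 in the list above can be captured in a small lookup helper. This is an
 illustrative sketch only; the function name scst_handler_for_type is
 hypothetical and not part of SCST.

```shell
# Print the pass-through device handler module for a SCSI device type,
# following the module list in this README. Hypothetical helper.
#   $1 - SCSI device type (decimal, or C / 0xC for RAID controllers)
scst_handler_for_type() (
    case "$1" in
        0) echo scst_disk ;;
        1) echo scst_tape ;;
        3) echo scst_processor ;;
        5) echo scst_cdrom ;;
        7) echo scst_modisk ;;
        8) echo scst_changer ;;
        12|C|c|0xC|0xc) echo scst_raid ;;
        *) echo "no pass-through handler listed for type $1" >&2
           exit 1 ;;
    esac
)
```

 For example, 'modprobe "$(scst_handler_for_type 5)"' would load the
 CDROM handler.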

 Then, to see your devices remotely, you need to add a corresponding LUN
 for each of them (see below how). By default, no local devices are
 visible remotely. Each LUN set (security group) must contain LUN 0,
 i.e. LU numbering must not start from, e.g., 1. Otherwise you will see
 no devices on remote initiators and the SCST core will write the
 following message into the kernel log:
 "tgt_dev for LUN 0 not found, command to unexisting LU?"
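
 The LUN 0 requirement can be checked with a helper like the one below.
 It is a sketch under an assumption: that a "luns" directory in the SCST
 sysfs tree contains one entry per LUN, named by the LUN number. The
 function name has_lun0 is hypothetical.

```shell
# Report whether a LUN set contains LUN 0. Hypothetical helper.
#   $1 - path to a "luns" directory
#        (assumed layout: one entry per LUN, named by its number)
has_lun0() (
    luns_dir="$1"
    if [ -e "$luns_dir/0" ]; then
        echo "LUN 0 present in $luns_dir"
    else
        echo "LUN 0 missing in $luns_dir: initiators will see no devices" >&2
        exit 1
    fi
)
```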

 It is highly recommended to use the scstadmin utility for configuring
 devices and security groups.

 The flow of SCST initialization should be as follows:

 1. Load the SCST modules with the necessary module parameters, if any.

 2. Configure targets, devices, LUNs, etc. using either scstadmin
 (recommended) or the sysfs interface directly, as described below.

 If you experience problems while loading or running the modules, check
 your kernel logs (or run the dmesg command for the most recent
 messages).

 IMPORTANT: Without the appropriate device handler loaded, the corresponding
 =========  devices will be invisible to remote initiators. This can leave
            holes in the LUN addressing, so automatic device scanning by the
            remote SCSI mid-level might not notice the devices. In that case
            you will have to add them manually via
            'echo "- - -" >/sys/class/scsi_host/hostX/scan',
            where X is the host number.
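
 When the host number is not known, the same rescan can be issued for
 every SCSI host on the initiator. The helper below is a sketch; its
 name, rescan_scsi_hosts, and the optional directory argument are
 hypothetical. Writing to the real scan attributes requires root.

```shell
# Trigger a full rescan ("- - -") on every SCSI host. Hypothetical helper.
#   $1 - scsi_host class directory (default: /sys/class/scsi_host)
rescan_scsi_hosts() (
    root="${1:-/sys/class/scsi_host}"
    for scan in "$root"/host*/scan; do
        [ -e "$scan" ] || continue      # the glob matched nothing
        printf '%s\n' '- - -' > "$scan" && echo "rescanned ${scan%/scan}"
    done
)
```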

 IMPORTANT: Running target and initiator on the same host is supported,
 =========  except for the following 2 cases: swap over a target exported
            device and a writable mmap over a file from a target
            exported device. The latter means you can't mount a file
            system over a target exported device. In other words, you can
            freely use any sg, sd, st, etc. devices imported from the
            target on the same host, but you can't mount file systems or
            put swap on them. This is a limitation of the Linux
            memory/cache manager, because in this case a memory
            allocation deadlock is possible, like: system needs some
            memory -> it decides to clear some cache -> the cache needs
            to be written to a target exported device -> initiator sends
            a request to the target located on the same system -> the
            target needs memory -> the system needs even more memory ->
            deadlock.

 IMPORTANT: In the current version, simultaneous access to local SCSI
 =========  devices via standard high-level SCSI drivers (sd, st, sg,
            etc.) and SCST's target drivers is unsupported. This is
            especially important for commands executed via sg and st
            that change the state of devices and their parameters,
            because that could lead to data corruption. If any such
            command is executed, at least the related device handler(s)
            must be restarted. For block devices, READ/WRITE commands
            using the direct disk handler are generally safe.

 To uninstall, type 'make scst_uninstall'.


 Creating a kernel patch or patched kernel
 -----------------------------------------

 You can use the generate-kernel-patch or generate-patched-kernel scripts
 in the scripts/ subdirectory to convert the SCST source tree, as it
 exists in the Subversion repository, into a Linux kernel patch or to
 generate a kernel source tree with the SCST patches applied,
 respectively. This subdirectory exists only in the SVN tree.

 An example of how to use generate-kernel-patch can be found in "How To
 install SCST on Ubuntu 15.04 with in-tree kernel patches",
 https://gist.github.com/chrwei/42f8bbb687290b04b598, thanks to Chris Weiss.


 Migration from the obsolete proc interface
 ------------------------------------------

 The sysfs-enabled scstadmin supports the old procfs config file format,
 so you can migrate your proc-based configuration to the sysfs interface
 with the following steps:

 1. Load SCST modules

 2. Run "scstadmin -config old_config_file"

 3. Run "scstadmin -write_config new_config_file"

 4. Check new_config_file and make sure it has everything written
 properly.

 5. Start using "scstadmin -config new_config_file" to configure SCST.
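
 Steps 2 and 3 can be wrapped in a small script. The sketch below
 assumes the SCST modules are already loaded (step 1) and uses only the
 -config and -write_config options named above. The function name
 migrate_scst_config and the SCSTADMIN override variable are
 hypothetical conveniences, not part of scstadmin.

```shell
# Convert a procfs-format config into a sysfs-format one (steps 2 and 3).
# Hypothetical helper; assumes the SCST modules are already loaded.
#   $1 - old procfs-format config file
#   $2 - new config file to write
# SCSTADMIN (hypothetical variable) may name a replacement command.
migrate_scst_config() (
    old="$1"
    new="$2"
    admin="${SCSTADMIN:-scstadmin}"
    [ -r "$old" ] || { echo "cannot read $old" >&2; exit 1; }
    $admin -config "$old" || exit 1          # step 2: apply old config
    $admin -write_config "$new" || exit 1    # step 3: save in new format
    echo "Review $new (step 4), then use: $admin -config $new"
)
```

 Step 4, checking the generated file by hand, is deliberately left to
 the administrator.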


 Usage in failover mode
 ----------------------

 It is recommended to use the TEST UNIT READY ("tur") command to check
 whether an SCST target is alive in MPIO configurations.


 Device handlers
 ---------------

 Device specific drivers (device handlers) are plugins for SCST, which
 help SCST to analyze incoming requests and determine parameters,
 specific to various types of devices. If an appropriate device handler
 for a SCSI device type isn't loaded, SCST doesn't know how to handle
 devices of this type, so they will be invisible for remote initiators
 (more precisely, "LUN not supported" sense code will be returned).

 In addition to device handlers for real devices, there are VDISK, user
 space and "performance" device handlers.

 The VDISK device handler works over files on file systems and turns
 them into virtual, remotely available SCSI disks or CDROMs. In addition,
 it can work directly over a block device, e.g. a local IDE or SCSI disk
 or even a disk partition, where there is no file system overhead. Using
 block devices, compared to sending SCSI commands directly to the SCSI
 mid-level via scsi_do_req()/scsi_execute_async(), has the advantage
 that data are transferred via the system cache, so it is possible to
 fully benefit from caching and read-ahead performed by Linux's VM
 subsystem. The only disadvantage is that in the FILEIO mode there is
 superfluous data copying between the cache and SCST's buffers. This
 issue is going to be addressed in one of the future releases. Virtual
 CDROMs are useful for remote installation. See below for details on how
 to set up and use the VDISK device handler.

 The SCST user space device handler provides an interface between SCST
 and user space, which allows pure user space devices to be created. The
 simplest example is a VTL: with scst_user, it can be written purely in
 user space. Another use case is processing of the passed data that is
 too sophisticated for kernel space, like encrypting it or making
 snapshots.

 "Performance" device handlers for disks, MO disks and tapes in their
 exec() method skip (pretend to execute) all READ and WRITE operations
 and thus provide a way for direct link performance measurements without
 overhead of actual data transferring from/to underlying SCSI device.

 NOTE: Since "perf" device handlers on READ operations don't touch the
 ====  commands' data buffer, it is returned to remote initiators as it
       was allocated, without even being zeroed. Thus, "perf" device
       handlers impose some security risk, so use them with caution.


 Compilation options
 -------------------

 The following compilation options can be commented in or out in the
 Makefile and scst.h:

  - CONFIG_SCST_DEBUG - if defined, turns on some debugging code,
    including some logging. Makes the driver considerably bigger and slower,
    producing large amount of log data.

  - CONFIG_SCST_TRACING - if defined, turns on ability to log events. Makes the
    driver considerably bigger and leads to some performance loss.

  - CONFIG_SCST_EXTRACHECKS - if defined, adds extra validity checks in
    the various places.

  - CONFIG_SCST_USE_EXPECTED_VALUES - if not defined (default), initiator
    supplied expected data transfer length and direction will be used
    only for verification purposes to return error or warn in case if one
    of them is invalid. Instead, locally decoded from SCSI command values
    will be used. This is necessary for security reasons, because
    otherwise a faulty initiator can crash target by supplying invalid
    value in one of those parameters. This is especially important in
    case of pass-through mode. If CONFIG_SCST_USE_EXPECTED_VALUES is
    defined, initiator supplied expected data transfer length and
    direction will override the locally decoded values. This might be
    necessary if the internal SCST command translation table doesn't
    contain a SCSI command that is used in your environment. You can tell
    that this is the case if you enable the "minor" trace level and see
    messages like "Unknown opcode XX for YY. Should you update
    scst_scsi_op_table?" in your kernel log and your initiator returns an
    error. Please also report those messages to the SCST mailing list,
    scst-devel@lists.sourceforge.net. Note that not all SCSI transports
    support supplying expected values. You should try enabling this
    option if you have a pass-through device that does not work with
    SCST, for instance a SATA CDROM.

  - CONFIG_SCST_DEBUG_TM - if defined, turns on debugging of task
    management functions: on LUN 6 some of the commands will be delayed
    for about 60 sec., which makes the remote initiator send TM
    functions, e.g. ABORT TASK and TARGET RESET. Also define the
    CONFIG_SCST_TM_DBG_GO_OFFLINE symbol in the Makefile if you want the
    device to eventually become completely unresponsive; otherwise it
    will keep cycling through the ABORT and RESET code. Needs
    CONFIG_SCST_DEBUG turned on.

  - CONFIG_SCST_DEBUG_SYSFS_EAGAIN - if defined, makes three out of four
    reads of sysfs attributes fail with -EAGAIN and also makes every sysfs
    write fail with -EAGAIN. This is useful to test -EAGAIN handling in user
    space tools like e.g. scstadmin. See also the documentation of the
    last_sysfs_mgmt_res sysfs attribute for more information.

  - CONFIG_SCST_STRICT_SERIALIZING - if defined, makes SCST send all
    commands to the underlying SCSI device synchronously, one after
    another. This makes task management more reliable at the cost of
    some performance. This is mostly relevant for stateful SCSI devices
    like tapes, where the result of a command's execution depends on
    device settings defined by previous commands. Disk and RAID devices
    are stateless in most cases. The current SCSI core in Linux doesn't
    allow all commands to be aborted reliably if they were sent
    asynchronously to a stateful device. Turned off by default; turn it
    on if you use stateful device(s) and need as much error recovery
    reliability as possible. As a side effect of
    CONFIG_SCST_STRICT_SERIALIZING, on kernels below 2.6.30 no kernel
    patching is necessary for pass-through device handlers (scst_disk,
    etc.).

  - CONFIG_SCST_TEST_IO_IN_SIRQ - if defined, allows SCST to submit selected
    SCSI commands (TUR and READ/WRITE) from soft-IRQ context (tasklets).
    Enabling it will decrease amount of context switches and slightly
    improve performance. The goal of this option is to be able to measure
    overhead of the context switches. If after enabling this option you
    don't see under load in vmstat output on the target significant
    decrease of amount of context switches, then your target driver
    doesn't submit commands to SCST in IRQ context. For instance,
    iSCSI-SCST doesn't do that, but qla2x00t with
    CONFIG_QLA_TGT_DEBUG_WORK_IN_THREAD disabled - does. This option is
    designed to be used with vdisk NULLIO backend.

    WARNING! Using this option enabled with other backend than vdisk
    NULLIO is unsafe and can lead you to a kernel crash!

  - CONFIG_SCST_STRICT_SECURITY - if defined, makes SCST zero allocated data
    buffers. Undefining it (default) considerably improves performance
    and eases CPU load, but could create a security hole (information
    leakage), so enable it, if you have strict security requirements.

  - CONFIG_SCST_ABORT_CONSIDER_FINISHED_TASKS_AS_NOT_EXISTING - if defined,
    when the TASK MANAGEMENT function ABORT TASK tries to abort a
    command that has already finished, the remote initiator that sent
    the ABORT TASK request will receive a TASK NOT EXIST (or ABORT
    FAILED) response. This is the more logical response: because the
    command has finished, the attempt to abort it failed. However, some
    initiators, particularly the VMware iSCSI initiator, treat a TASK
    NOT EXIST response as a sign that the target is malfunctioning and
    try to RESET it, and then sometimes start malfunctioning themselves.
    So, this option is disabled by default.

  - CONFIG_SCST_DIF_INJECT_CORRUPTED_TAGS - if defined, allows injection
    of corrupted DIF tags according to the Oracle specification. This
    functionality is working only if dif_mode doesn't contain dev_store
    and dif_type is 1.

  - CONFIG_SCST_NO_TOTAL_MEM_CHECKS - disables checks of allocated
    memory, see scst_max_cmd_mem below. Allows to avoid 2 global
    variables on the fast path, hence get better multi-queue performance.

 HIGHMEM kernel configurations are fully supported, but not recommended
 for performance reasons, except for scst_user, where they are not
 supported, because this module deals with user supplied memory in a
 zero-copy manner. If you would otherwise need HIGHMEM enabled, consider
 changing the VMSPLIT option or using a 64-bit system configuration
 instead.

 To change the VMSPLIT option (CONFIG_VMSPLIT to be precise), set the
 following variables in "make menuconfig":

  - General setup->Configure standard kernel features (for small systems): ON

  - General setup->Prompt for development and/or incomplete code/drivers: ON

  - Processor type and features->High Memory Support: OFF

  - Processor type and features->Memory split: according to amount of
    memory you have. If it is less than 800MB, you may not touch this
    option at all.


 Module parameters
 -----------------

 Module scst supports the following parameters:

  - scst_threads - allows to set count of SCST's threads. By default it
    is CPU count.

  - scst_max_cmd_mem - sets maximum amount of memory in MB allowed to be
    consumed by the SCST commands for data buffers at any given time. By
    default it is approximately TotalMem/4.

  - scst_max_dev_cmd_mem - sets maximum amount of memory in MB allowed
    to be consumed by all SCSI commands of a device at any given time. By
    default, it is approximately 2/5 of scst_max_cmd_mem.

  - auto_cm_assignment - enables automatic registration of devices in the
    copy manager. If a device is not registered in the copy manager, it
    cannot be the source or target of EXTENDED COPY commands. Enabled by
    default. Disable it if you want to control the copy manager
    registration manually or need to change a device, e.g. a DM cache
    device, with an SCST LUN on top of it, to avoid the extra reference
    the copy manager holds on this device. In the latter case you can
    also remove this reference by manually deleting the corresponding
    copy manager LUN via the sysfs interface
    (/sys/kernel/scst_tgt/targets/copy_manager/copy_manager_tgt/luns/mgmt).
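
 To get a feeling for the default memory limits, the ratios stated above
 can be worked through in shell arithmetic. This is only an illustration
 of "approximately TotalMem/4" and "approximately 2/5 of
 scst_max_cmd_mem"; the exact values SCST computes may differ, and the
 function name scst_default_mem_limits is hypothetical.

```shell
# Estimate the default scst_max_cmd_mem and scst_max_dev_cmd_mem, in MB,
# from the ratios given in this README. Hypothetical helper; the values
# SCST actually computes may differ.
#   $1 - total memory in kB (e.g. the MemTotal value from /proc/meminfo)
scst_default_mem_limits() (
    total_kb="$1"
    max_cmd_mem=$(( $total_kb / 1024 / 4 ))       # about TotalMem/4
    max_dev_cmd_mem=$(( $max_cmd_mem * 2 / 5 ))   # about 2/5 of the above
    echo "scst_max_cmd_mem=$max_cmd_mem scst_max_dev_cmd_mem=$max_dev_cmd_mem"
)
```

 For a machine with 8 GB of RAM (8388608 kB) this yields roughly 2048 MB
 and 819 MB. Explicit values can be passed at load time in the usual
 way, e.g. 'modprobe scst scst_max_cmd_mem=1024'.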


 SCST sysfs interface
 --------------------

 Starting from version 2.0.0, SCST has a sysfs interface. It supports
 only kernels 2.6.26 and higher, because in 2.6.26 the kernel's internal
 sysfs interface had a major change, which made it heavily incompatible
 with pre-2.6.26 versions.

 The SCST sysfs interface is designed to be self-descriptive and
 self-contained. This means that a high-level management tool for it can
 be written once and automatically support any future sysfs interface
 changes (attribute additions or removals, new target drivers and dev
 handlers, etc.) without any modifications. Scstadmin is an example of
 such a management tool.

 To achieve that, a management tool should be built not around drivers
 and their attributes, but around the common rules those drivers and
 attributes follow. You can find those rules in the SysfsRules file. For
 instance, each SCST sysfs file (attribute) can contain the mark "[key]"
 in its last line. It is added automatically to allow scstadmin and
 other management tools to see which attributes they should save in the
 config file. If you are manipulating attributes manually, you can
 ignore this mark.

 Root of SCST sysfs interface is /sys/kernel/scst_tgt. It has the
 following entries:

  - devices - this is a root subdirectory for all SCST devices

  - handlers - this is a root subdirectory for all SCST dev handlers

  - max_tasklet_cmd - specifies the maximum number of commands from all
    connected initiators that can be queued in the SCST core
    simultaneously on a single CPU while still allowing commands on this
    CPU to be processed in soft-IRQ context in tasklets. If the number
    of commands exceeds this value, all of them will be processed only
    in SCST threads. This is to prevent possible starvation of processes
    on the CPUs serving soft IRQs under heavy load and, in some cases, to
    improve performance by spreading load more evenly over the available
    CPUs.

  - measure_latency - whether or not to enable latency measurements.
    Enabling latency measurements has a small impact on performance but
    makes detailed information available about how much time is needed
    to process SCSI commands. The structure of the paths to files with
    latency information is as follows:

    /sys/kernel/scst_tgt/targets/${target_driver_name}/${target_port_name}/sessions/${initiator_name}/latency/${io_type}${io_size}

    ${io_type} is n, r, w or b. 'n' means that no data buffer was
    associated with the command, 'r' stands for read, 'w' for write
    and 'b' for bidirectional. ${io_size} is a power of two between 512
    and 524288. Each file contains statistics for I/O requests with a
    size up to ${io_size} and that exceed a smaller I/O size. The files
    for ${io_size} 524288 are an exception because these also include
    data for all larger requests.

    Here is an example of the data produced by this infrastructure (edited for
    clarity):

      $ echo 1 >/sys/kernel/scst_tgt/measure_latency
      $ sleep 10 # Wait until an initiator has submitted multiple I/O requests
      $ (cd /sys/kernel/scst_tgt/targets &&
         find -name latency | xargs grep -raH .)
      state             count  min    max   avg stddev
      PARSE               219  1.3   26.6   2.2    2.5 us
      PREPARE_SPACE       219  0.9   10.3   1.1    0.6 us
      RDY_TO_XFER         219  0.7    1.7   0.7    0.2 us
      TGT_PRE_EXEC        219  0.7   11.0   0.8    0.9 us
      EXEC_CHECK_SN       219  0.7    1.7   0.8    0.2 us
      PRE_DEV_DONE        219 11.3 3445.7  39.6  276.4 us
      DEV_DONE            219  0.7   11.0   0.9    0.7 us
      PRE_XMIT_RESP1      219  1.2   58.4   1.6    3.8 us
      CSW2                219  0.7    1.6   0.8    0.1 us
      PRE_XMIT_RESP2      219  0.7    1.5   0.7    0.1 us
      XMIT_RESP           219  0.7    1.5   0.7    0.1 us
      INIT_WAIT           219  1.0   57.3   2.1    4.4 us
      INIT                219  0.9   27.4   1.6    2.4 us
      CSW1                219 15.0 3856.1  74.2  264.8 us
      EXEC_CHECK_BLOCKING 219  1.3   10.8   1.7    0.9 us
      LOCAL_EXEC          219  0.7    1.8   0.7    0.1 us
      REAL_EXEC           219  0.6    1.5   0.7    0.1 us
      EXEC_WAIT           219 40.6 1021.7  54.4   68.7 us
      XMIT_WAIT           219  6.4 1682.0  50.6  228.1 us
      total               219    -      - 236.9 2012.1 us

    PRE_DEV_DONE refers to internal checks done after execution of a command
    finished. CSW1 is the context switch that happens after the transport
    driver received a command and before processing of a command starts.
    EXEC_WAIT is the time spent in the device handler .exec() method.

  - sgv - this is a root subdirectory for all SCST SGV caches

  - targets - this is a root subdirectory for all SCST targets

  - setup_id - allows to read and write SCST setup ID. This ID can be
    used in cases, when the same SCST configuration should be installed
    on several targets, but exported from those targets devices should
    have different IDs and SNs. For instance, VDISK dev handler uses this
    ID to generate T10 vendor specific identifier and SN of the devices.

  - poll_us - if polling is desired, sets how many microseconds each
    SCST thread polls its queue after it became empty, in the hope that a
    new command arrives. In some cases polling can significantly increase
    IOPS, especially if CPU low power states are not disabled, because at
    high IOPS polling can be cheaper than spending significant time on
    entering and then exiting CPU low power states plus the
    corresponding context switches. Disabled, i.e. set to 0, by default.

  - suspend - globally suspends or releases all SCSI activities on all
    devices. Useful for mass management, like adding or deleting LUNs.
    Writing a value v to it:

        * v > 0 - suspends activities, but waits no more than v seconds

        * v = 0 - suspends activities, waits indefinitely

        * v < 0 - releases activities.

    Reading from this attribute returns number of previous suspend
    requests.

  - threads - allows reading and setting the number of global SCST I/O
    threads. Those threads are used with async. dev handlers, for
    instance, vdisk BLOCKIO or NULLIO.

  - trace_cmds - shows current SCST commands up to size of the sysfs
    buffer (4KB)

  - trace_mcmds - shows current SCST management commands up to size of
    the sysfs buffer (4KB)

  - trace_level - allows to enable and disable various tracing
    facilities. See content of this file for help how to use it. See also
    section "Dealing with massive logs" for more info how to make correct
    logs when you enabled trace levels producing a lot of logs data.

  - version - read-only attribute, which allows to see version of
    SCST and enabled optional features.

  - force_global_sgv_pool - if not set, buffers for SCSI commands are
    allocated from the per-CPU SGV pool. Otherwise, the global SGV pool
    is used.

  - last_sysfs_mgmt_res - read-only attribute returning the completion
    status of the last management command. In the sysfs implementation
    there are some problems between internal sysfs and internal SCST
    locking. To avoid them, in some cases sysfs calls can return an error
    with errno EAGAIN. This doesn't mean the operation failed. It only
    means that the operation was queued and has not yet completed. To
    wait for it to complete, a management tool should poll this file. If
    the operation hasn't yet completed, it will also return EAGAIN. But
    after it's completed, it will return the result of this operation (0
    for success or -errno for error). The following two shell functions
    show how to do this:

 # Read the SCST sysfs attribute $1. See also scst/README for more information.
 scst_sysfs_read() {
     local EAGAIN val

     EAGAIN="Resource temporarily unavailable"
     while true; do
         if val="$(LC_ALL=C cat "$1" 2>&1)"; then
             echo -n "${val%\[key\]}"
             return 0
         elif [ "${val/*: }" != "$EAGAIN" ]; then
             return 1
         fi
         sleep 1
     done
 }

 # Write $1 into the SCST sysfs attribute $2. See also scst/README for more
 # information.
 scst_sysfs_write() {
     local EAGAIN status

     EAGAIN="Resource temporarily unavailable"
     if status="$(LC_ALL=C; (echo -n "$1" > "$2") 2>&1)"; then
         return 0
     elif [ "${status/*: }" != "$EAGAIN" ]; then
         return 1
     fi
     scst_sysfs_read /sys/kernel/scst_tgt/last_sysfs_mgmt_res >/dev/null
 }
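
 Returning to the measure_latency attribute described earlier: the
 ${io_size} part of a latency file name can be derived from a request
 size as sketched below. The helper name latency_bucket is hypothetical.
 It follows the bucketing rule stated above (powers of two from 512 to
 524288, with the last bucket also covering all larger requests) and
 does not address the 'n' I/O type, which has no data buffer.

```shell
# Map an I/O size in bytes to the ${io_size} suffix of its latency file:
# the smallest power of two >= size, clamped to the range 512..524288.
# Hypothetical helper based on the bucketing rule in this README.
latency_bucket() (
    size="$1"
    bucket=512
    while [ "$bucket" -lt "$size" ] && [ "$bucket" -lt 524288 ]; do
        bucket=$(( $bucket * 2 ))
    done
    echo "$bucket"
)
```

 For instance, statistics for a 3000-byte read would be looked up in the
 file named "r$(latency_bucket 3000)", i.e. r4096, in the session's
 latency directory.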

 "Devices" subdirectory contains subdirectories for each SCST devices.

 Content of each device's subdirectory is dev handler specific. See
 documentation for your dev handlers for more info about it as well as
 SysfsRules file for more info about common to all dev handlers rules.
 SCST dev handlers can have the following common entries:

  - block - allows temporarily blocking and unblocking this device. See below.

  - exported - subdirectory containing links to all LUNs where this
    device was exported.

  - handler - if dev handler determined for this device, this link points
    to it. The handler can be not set for pass-through devices.

  - threads_num - shows and allows to set number of threads in this device's
    threads pool. If 0 - no threads will be created, and global SCST
    threads pool will be used. If <0 - creation of the threads pool is
    prohibited.

  - threads_pool_type - shows and allows setting the threads pool type.
    Possible values: "per_initiator" and "shared". When the value is
    "per_initiator" (default), each session from each initiator will use
    a separate dedicated pool of threads. When the value is "shared", all
    sessions from all initiators will share the same per-device pool of
    threads. Valid only if the threads_num attribute is >0.

  - dump_prs - allows to dump persistent reservations information in the
    kernel log.

  - type - SCSI type of this device

  - max_tgt_dev_commands - maximum number of SCSI commands any session to
    this device can have in flight.

  - numa_node_id - ID of the NUMA node this device physically belongs
    to. SCST NUMA handling assumes that the NUMA memory allocation
    policy used in the system is to always allocate from the current
    node.

 Attribute "block" allows temporarily blocking and unblocking this
 device. "Blocking" means that no new commands for this device will go
 into the execution stage, but instead will be suspended just before it.
 The blocked state is not reached until the queue of the corresponding
 device is completely drained. You can also call this state "frozen". It
 is useful in many cases, like consistent snapshots and graceful
 shutdown.

 On write, the "block" entry accepts the following 3 types of parameters:

  - 1 - block the device synchronously, i.e. don't return until this
  device becomes blocked, i.e. until its queue is completely drained.
  Can be called as many times as needed.

  - 11 params - block the device asynchronously, i.e. return immediately.
  Notification about completion is delivered using the
  SCST_EVENT_EXT_BLOCKING_DONE event. "Params" is delivered to it as is
  in the "data" payload. Can be called as many times as needed.
  Alternatively, the status of blocking can be polled by reading this
  attribute until the second number reaches 0 (see below).

  - 0 - unblock this device.

 Reading from "block" entry returns two numbers separated by space:

 1. How many times this device was blocked, i.e. how many times writing
 "0" to it is needed to unblock this device.

 2. A boolean (0 or 1) indicating whether the blocking, if any, is done
 (0) or still pending (1).
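
 The two numbers read from "block" can be turned into a readable status
 as sketched below. The function name describe_block_state is
 hypothetical; it only interprets the two values as described above and
 does not touch sysfs itself.

```shell
# Interpret the two numbers read from a device's "block" attribute.
# Hypothetical helper based on the description in this README.
#   $1 - how many times the device was blocked
#   $2 - pending flag: 1 while blocking is still in progress, else 0
describe_block_state() (
    count="$1"
    pending="$2"
    if [ "$count" -eq 0 ]; then
        echo "not blocked"
    elif [ "$pending" -eq 1 ]; then
        echo "blocking pending, $count unblock write(s) needed"
    else
        echo "blocked, $count unblock write(s) needed"
    fi
)
```

 The values read from a device's "block" attribute can be passed to it
 directly as the two arguments.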

 See below for more information about other entries of this subdirectory
 of the standard SCST dev handlers.

 "Handlers" subdirectory contains subdirectories for each SCST dev
 handler.

 Content of each handler's subdirectory is dev handler specific. See
 documentation for your dev handlers for more info about it as well as
 SysfsRules file for more info about common to all dev handlers rules.
 SCST dev handlers can have the following common entries:

  - mgmt - this entry allows to create virtual devices and their
    attributes (for virtual devices dev handlers) or assign/unassign real
    SCSI devices to/from this dev handler (for pass-through dev
    handlers).

  - trace_level - allows to enable and disable various tracing
    facilities. See content of this file for help how to use it. See also
    section "Dealing with massive logs" for more info how to make correct
    logs when you enabled trace levels producing a lot of logs data.

  - type - SCSI type of devices served by this dev handler.

 See below for more information about other entries of this subdirectory
 of the standard SCST dev handlers.

 "Sgv" subdirectory contains statistic information of SCST SGV caches. It
 has the following entries:

  - None, one or more subdirectories for each existing SGV cache.

  - global_stats - file containing global SGV caches statistics.

 Each SGV cache's subdirectory has the following item:

  - stats - file containing statistics for this SGV cache.

 "Targets" subdirectory contains subdirectories for each SCST target.

 Content of each target's subdirectory is target specific. See
 documentation for your target for more info about it as well as
 SysfsRules file for more info about common to all targets rules.
 Every target should have at least the following entries:

  - ini_groups - subdirectory, which contains and allows to define
    initiator-oriented access control information, see below.

  - luns - subdirectory, which contains list of available LUNs in the
    target-oriented access control and allows to define it, see below.

  - sessions - subdirectory containing connected to this target sessions.

  - comment - this attribute can be used to store any human readable info
    to help identify the target, for instance, the target's mapping to
    the corresponding hardware port. SCST does not use it in any way.

  - enabled - using this attribute you can enable or disable this target.
    It allows to finish configuring it before it starts accepting new
    connections. 0 by default.

  - addr_method - used LUNs addressing method. Possible values:
    "Peripheral", "Flat" or "LUN". Most initiators work well with
    Peripheral addressing method (default), but some (HP-UX, for instance)
    may require the Flat method or the LUN method (e.g. IBM systems). This
    attribute is also available in the initiators security groups, so you
    can assign the addressing method on per-initiator basis. See also the
    "Logical unit addressing (LUN)" section in SAM-5 for more information.

  - black_hole - if set, all LUNs in the corresponding initiator group,
    default target group in this case, start "swallowing" requests from
    initiators. Possible values are:

     * 0 - disable black hole mode

     * 1 - immediately abort all coming SCSI commands, i.e. all SCSI commands
       are dropped and TM requests return that they completed. It is
       supposed to simulate lost front end responses.

     * 2 - immediately abort all coming SCSI commands and drop all coming TM
       commands. It is supposed to simulate logical target hang, when the
       target stops responding, but on the HW/TCP connection level still
       appears to be online.

     * 3 - immediately abort all coming data transfer SCSI commands, i.e.
       only data transfer SCSI commands are dropped, while commands like
       INQUIRY and TEST UNIT READY pass well. It is supposed to simulate
       flaky front end connectivity, when responses for small commands
       pass well, but big data transfers fail.

     * 4 - immediately abort all coming data transfer SCSI commands and
       drop all coming TM commands. It is supposed to simulate really
       flaky front end connectivity, when TM requests or responses are
       also lost.

    Modes 3 and 4 are the most evil ones, because they are not too well
    handled by many initiator OS'es, including Linux, so they may never
    recover from it.

    Note that dropping TM commands, i.e. not sending responses to them, is
    not implemented for all target drivers. To find out whether it is
    implemented for your particular target driver, check the traces or
    the target driver's source code.

  - dif_capabilities - if this target supports T10-PI, returns which
    exact DIF capabilities this target supports.

  - dif_checks_failed - if this target supports T10-PI, returns
    statistics how many DIF errors have been detected on the
    corresponding processing stages on this target. It returns 3 rows of
    numbers with 3 numbers in each row: for target driver stage, for SCST
    stage and for dev handler stage. Numbers in each row: how many errors
    detected checking application, reference and guard tags
    correspondingly. Writing to this attribute resets the numbers.

  - cpu_mask - defines CPU affinity mask for threads serving this target.
    For threads serving LUNs it is used only for devices with
    threads_pool_type "per_initiator".

  - io_grouping_type - defines how I/O from sessions to this target is
    grouped together. This I/O grouping is very important for
    performance. By setting this attribute to the right value, you can
    considerably increase performance of your setup. This grouping is
    performed only if you use CFQ I/O scheduler on the target and for
    devices with threads_num >= 0 and, if threads_num > 0, with
    threads_pool_type "per_initiator". Possible values:
    "this_group_only", "never", "auto", or I/O group number >0. When the
    value is "this_group_only" all I/O from all sessions in this target
    will be grouped together. When the value is "never", I/O from
    different sessions will not be grouped together, i.e. all sessions in
    this target will have separate dedicated I/O groups. When the value
    is "auto" (default), all I/O from initiators with the same name
    (iSCSI initiator name, for instance) in all targets will be grouped
    together with a separate dedicated I/O group for each initiator name.
    For iSCSI this mode works well, but other transports usually use
    different initiator names for different sessions, so using such
    transports in MPIO configurations you should either use value
    "this_group_only", or an explicit I/O group number. This attribute is
    also available in the initiators security groups, so you can assign
    the I/O grouping on per-initiator basis. See below for more info how
    to use this attribute.

  - rel_tgt_id - allows to read or write SCSI Relative Target Port
    Identifier attribute. This identifier is used to identify SCSI Target
    Ports by some SCSI commands, mainly by Persistent Reservations
    commands. This identifier must be unique among all SCST targets, but
    for convenience SCST allows disabled targets to have a non-unique
    rel_tgt_id. In this case SCST will not allow this target to be enabled
    until rel_tgt_id becomes unique. By default SCST initializes this
    attribute to a unique value.

  - forward_src - if set this target port is a forwarding source. This means
    that commands like COMPARE AND WRITE, EXTENDED COPY and RECEIVE COPY
    RESULTS are submitted to the SCSI device instead of being handled inside
    the SCST core. PERSISTENT RESERVE IN and OUT commands are processed by the
    SCST core, whether or not this mode is enabled. The name 'forward_src'
    refers to the use case where SCSI passthrough is used to send SCSI commands
    to another H.A. node.

  - forward_dst - if set this target port is a forwarding destination. This means
    that it does not check any local SCSI events (reservations, etc.). Those
    events are supposed to be checked at the forwarding source side.

  - forwarding - obsolete synonym for forward_dst.

  - *count*, e.g. read_io_count_kb, - statistics about executed
    commands and transferred data. Those attributes have speaking names
    built from parts:

    1. Data transfer direction

    2. Alignment type: not specified or unaligned (on 4K boundaries)

    3. Type: IO (commands) count or amount of transferred data

    4. For transferred data: measurement units

    For instance, read_unaligned_cmd_count is the number of read commands
    not aligned on 4K boundaries.

 A target driver may have also the following entries:

  - "hw_target" - if the target driver supports both hardware and virtual
     targets (for instance, an FC adapter supporting NPIV, which has
     hardware targets for its physical ports as well as virtual NPIV
     targets), this read only attribute for all hardware targets will
     exist and contain value 1.

 Subdirectory "sessions" contains one subdirectory for each connected
 session with name equal to name of the connected initiator with the
 following entries:

  - initiator_name - contains initiator name

  - force_close - optional write-only attribute, which allows to force
    close this session.

  - active_commands - contains number of active, i.e. not yet or being
    executed, SCSI commands in this session.

  - commands - contains overall number of SCSI commands in this session.

  - dif_checks_failed - if target of this session supports T10-PI, returns
    statistics how many DIF errors have been detected on the
    corresponding processing stages on all DIF-enabled LUNs in this
    session. It returns 3 rows of numbers with 3 numbers in each row: for
    target driver stage, for SCST stage and for dev handler stage.
    Numbers in each row: how many errors detected checking application,
    reference and guard tags correspondingly. Writing to this attribute
    resets the numbers. Similar statistics are returned in the attribute
    with the same name for each LUN in this session in this LUN's
    subdirectory, if its device is configured with dif_type > 0.

  - read_cmd_count - number of READ SCSI commands received since beginning
    or last reset (writing 0 in this attribute)

  - read_io_count_kb - amount of data in KB read by the initiator since
    beginning or last reset (writing 0 in this attribute)

  - write_cmd_count - number of WRITE SCSI commands received since
    beginning or last reset (writing 0 in this attribute)

  - write_io_count_kb - amount of data in KB written by the initiator
    since beginning or last reset (writing 0 in this attribute)

  - bidi_cmd_count - number of BIDI SCSI commands received since
    beginning or last reset (writing 0 in this attribute)

  - bidi_io_count_kb - amount of data in KB transferred by the
    initiator since beginning or last reset (writing 0 in this attribute)

  - none_cmd_count - number of not transferring data SCSI commands
    (e.g. INQUIRY or TEST UNIT READY) received since beginning or last
    reset (writing 0 in this attribute)

  - unknown_cmd_count - number of unknown SCSI commands received since
    beginning or last reset (writing 0 in this attribute)

  - *count*, e.g. read_io_count_kb, - statistics about executed
    commands and transferred data. See above for more details.

  - luns - a link pointing out to the corresponding LUNs set (security
    group) where this session was attached to.

  - One or more "lunX" subdirectories, where 'X' is a number, for each LUN
    this session has (see below).

  - other target driver specific attributes and subdirectories.

 See below description of the VDISK's sysfs interface for samples.
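
 The per-session counters can be read and reset with ordinary file I/O.
 The following is a minimal sketch with hypothetical function names; the
 target's sysfs directory is passed in by the caller, and only the
 read_io_count_kb attribute described above is used.

```shell
# Sum read_io_count_kb over all sessions of one target.
# $1 is the target's sysfs directory, e.g.
# /sys/kernel/scst_tgt/targets/<target_driver>/<target_name>
scst_target_read_kb() {
    total=0
    for f in "$1"/sessions/*/read_io_count_kb; do
        # With no sessions the glob stays unexpanded and is skipped here.
        [ -r "$f" ] || continue
        read -r kb < "$f" || continue
        total=$((total + kb))
    done
    printf '%s\n' "$total"
}

# Reset the same counter in every session by writing 0 into it.
scst_target_reset_read_kb() {
    for f in "$1"/sessions/*/read_io_count_kb; do
        if [ -w "$f" ]; then
            echo 0 > "$f"
        fi
    done
    return 0
}
```

 The same loop works for the other *count* attributes by changing the
 file name.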


 Each sessions/<sess>/lun<X> subdirectory contains the following entries:

  - active_commands - contains number of active, i.e. not yet or being
    executed, SCSI commands for lun<X> in session <sess>.

  - thread_pid - contains a single line with all the process identifiers
    (PIDs) of the kernel threads that process SCSI commands intended for
    lun<X> in session <sess>.

  - thread_index - thread index assigned by scst_add_threads().
    Can be used to look up which export thread is serving which target
    since this index also appears in the export thread name. This
    information then could be used to set CPU affinity for those threads
    to improve performance. Has a value in the range 0..n-1 for
    threads_pool_type per_initiator or -1 when using a shared thread pool
    per LUN or the global thread pool.


 Access and devices visibility management (LUN masking)
 ------------------------------------------------------

 Access and devices visibility management allows for an initiator or
 group of initiators to see different devices with different LUNs
 with necessary access permissions.

 SCST supports two modes of access control:

 1. Target-oriented. In this mode you define for each target a default
 set of LUNs, which are accessible to all initiators, connected to that
 target. This is a regular access control mode, which people usually mean
 thinking about access control in general. For instance, in IET this is
 the only supported mode.

 2. Initiator-oriented. In this mode you define which LUNs are accessible
 for each initiator. For each set of one or more initiators that should
 have access to the same set of devices with the same LUNs, you create a
 separate security group, then add to it the devices and the names of the
 allowed initiator(s).

 Both modes can be used simultaneously. In this case the
 initiator-oriented mode has higher priority than the target-oriented
 one, i.e. initiators are first searched for in all security groups
 defined for this target and, if none matches, the target's default set
 of LUNs is used. This set of LUNs may be empty, in which case the
 initiator will not see any LUNs from the target.

 You can at any time find out which set of LUNs each session is assigned
 to by looking where link
 /sys/kernel/scst_tgt/targets/target_driver/target_name/sessions/initiator_name/luns
 points to.
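
 The lookup can be scripted. The following is a minimal sketch with a
 hypothetical function name. It assumes that a security group's LUN set
 resolves to a path of the form .../ini_groups/GROUP_NAME/luns, as the
 layout described in this section suggests, and it relies on "readlink -f"
 being available.

```shell
# Print the security group a session is attached to, or
# "(target default)" if it uses the target's default set of LUNs.
# $1 is the session directory, i.e. .../sessions/<initiator_name>
scst_session_group() {
    [ -L "$1/luns" ] || return 1
    resolved=$(readlink -f "$1/luns") || return 1
    case "$resolved" in
        */ini_groups/*/luns)
            group=${resolved%/luns}          # strip the trailing /luns
            printf '%s\n' "${group##*/}"     # keep the last path component
            ;;
        *)
            printf '%s\n' "(target default)"
            ;;
    esac
}
```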

 To configure the target-oriented access control SCST provides the
 following interface. Each target's sysfs subdirectory
 (/sys/kernel/scst_tgt/targets/target_driver/target_name) has "luns"
 subdirectory. This subdirectory contains the list of already defined
 target-oriented access control LUNs for this target as well as file
 "mgmt". This file has the following commands, which you can send to it,
 for instance, using "echo" shell command. You can always get a small
 help about supported commands by looking inside this file. "Parameters"
 are one or more param_name=value pairs separated by ';'.

  - "add H:C:I:L lun [parameters]" - adds a pass-through device with
    host:channel:id:lun with LUN "lun". Optionally, the device could be
    marked as read only by using parameter "read_only". The recommended
    way to find out H:C:I:L numbers is use of lsscsi utility.

  - "replace H:C:I:L lun [parameters]" - replaces the device currently
    exported as LUN "lun" with the pass-through device
    host:channel:id:lun and generates INQUIRY DATA HAS CHANGED Unit
    Attention. If the old device doesn't exist, this command acts as the
    "add" command. Optionally, the device could be marked as read only by
    using parameter "read_only". The recommended way to find out H:C:I:L
    numbers is use of lsscsi utility.

  - "add VNAME lun [parameters]" - adds a virtual device with name VNAME
    with LUN "lun". Optionally, the device could be marked as read only
    by using parameter "read_only".

  - "replace VNAME lun [parameters]" - replaces the device currently
    exported as LUN "lun" with the virtual device with name VNAME and
    generates INQUIRY DATA HAS CHANGED Unit Attention. If the old device
    doesn't exist, this command acts as the "add" command. Optionally,
    the device could be marked as read only by using parameter
    "read_only".

  - "del lun" - deletes LUN lun

  - "clear" - clears the list of devices

 To configure the initiator-oriented access control SCST provides the
 following interface. Each target's sysfs subdirectory
 (/sys/kernel/scst_tgt/targets/target_driver/target_name) has "ini_groups"
 subdirectory. This subdirectory contains the list of already defined
 security groups for this target as well as file "mgmt". This file has
 the following commands, which you can send to it, for instance, using
 "echo" shell command. You can always get a small help about supported
 commands by looking inside this file.

  - "create GROUP_NAME" - creates a new security group.

  - "del GROUP_NAME" - deletes an existing security group.

 Each security group's subdirectory contains 2 subdirectories, initiators
 and luns, as well as the following attributes: addr_method, cpu_mask,
 io_grouping_type and black_hole. See the description of them above.

 Each "initiators" subdirectory contains the list of initiators added to
 this group as well as file "mgmt". This file has the following
 commands, which you can send to it, for instance, using "echo" shell
 command. You can always get a small help about supported commands by
 looking inside this file.

  - "add INITIATOR_NAME" - adds initiator with name INITIATOR_NAME to the
    group.

  - "del INITIATOR_NAME" - deletes initiator with name INITIATOR_NAME
    from the group.

  - "move INITIATOR_NAME DEST_GROUP_NAME" moves initiator with name
    INITIATOR_NAME from the current group to group with name
    DEST_GROUP_NAME.

  - "clear" - deletes all initiators from this group.

 For "add" and "del" commands INITIATOR_NAME can be a simple DOS-type
 pattern containing '*' and '?' symbols. '*' matches any sequence of
 symbols, '?' matches any single symbol. For instance, "blah.xxx"
 matches "bl?h.*". Additionally, you can use the negation sign '!' to
 invert the result of the match. For instance, "ah.xxx" matches
 "!bl?h.*".
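
 These rules resemble shell glob patterns, so a pattern can be tried out
 locally before it is written to "mgmt". The following is a minimal sketch
 with a hypothetical function name. It only approximates the in-kernel
 matcher: shell globs also give special meaning to '[' and '\', which SCST
 may treat differently.

```shell
# Return 0 if initiator name $1 matches pattern $2.
# '*' matches any sequence of characters, '?' exactly one character;
# a leading '!' inverts the result.
ini_pattern_match() {
    name=$1
    pattern=$2
    invert=0
    case "$pattern" in
        '!'*)
            invert=1
            pattern=${pattern#?}   # drop the leading '!'
            ;;
    esac
    # $pattern is deliberately unquoted so that it acts as a glob.
    case "$name" in
        $pattern) matched=1 ;;
        *)        matched=0 ;;
    esac
    [ "$matched" -ne "$invert" ]
}
```

 For instance, ini_pattern_match "blah.xxx" "bl?h.*" succeeds, and so does
 ini_pattern_match "ah.xxx" '!bl?h.*'.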

 Each "luns" subdirectory contains the list of already defined LUNs for
 this group as well as file "mgmt". Content of this file as well as list
 of available in it commands is fully identical to the "luns"
 subdirectory of the target-oriented access control.

 Examples:

  - echo "create INI" >/sys/kernel/scst_tgt/targets/iscsi/iqn.2006-10.net.vlnb:tgt1/ini_groups/mgmt -
    creates security group INI for target iqn.2006-10.net.vlnb:tgt1.

  - echo "add 2:0:1:0 11" >/sys/kernel/scst_tgt/targets/iscsi/iqn.2006-10.net.vlnb:tgt1/ini_groups/INI/luns/mgmt -
    adds a pass-through device sitting on host 2, channel 0, ID 1, LUN 0
    to group with name INI as LUN 11.

  - echo "add disk1 0" >/sys/kernel/scst_tgt/targets/iscsi/iqn.2006-10.net.vlnb:tgt1/ini_groups/INI/luns/mgmt -
    adds a virtual disk with name disk1 to group with name INI as LUN 0.

  - echo "add 21:*:e0:?b:83:*" >/sys/kernel/scst_tgt/targets/21:00:00:a0:8c:54:52:12/ini_groups/INI/initiators/mgmt -
    adds a pattern to group with name INI to Fibre Channel target with
    WWN 21:00:00:a0:8c:54:52:12, which matches WWNs of Fibre Channel
    initiator ports.

 Consider you need to have an iSCSI target with name
 "iqn.2007-05.com.example:storage.disk1.sys1.xyz", which should export
 virtual device "dev1" with LUN 0 and virtual device "dev2" with LUN 1,
 but initiator with name
 "iqn.2007-05.com.example:storage.disk1.spec_ini.xyz" should see only
 virtual device "dev2" read only with LUN 0. To achieve that you should
 do the following commands:

 # echo "add_target iqn.2007-05.com.example:storage.disk1.sys1.xyz" >/sys/kernel/scst_tgt/targets/iscsi/mgmt
 # echo "add dev1 0" >/sys/kernel/scst_tgt/targets/iscsi/iqn.2007-05.com.example:storage.disk1.sys1.xyz/luns/mgmt
 # echo "add dev2 1" >/sys/kernel/scst_tgt/targets/iscsi/iqn.2007-05.com.example:storage.disk1.sys1.xyz/luns/mgmt
 # echo "create SPEC_INI" >/sys/kernel/scst_tgt/targets/iscsi/iqn.2007-05.com.example:storage.disk1.sys1.xyz/ini_groups/mgmt
 # echo "add dev2 0 read_only=1" \
 	>/sys/kernel/scst_tgt/targets/iscsi/iqn.2007-05.com.example:storage.disk1.sys1.xyz/ini_groups/SPEC_INI/luns/mgmt
 # echo "add iqn.2007-05.com.example:storage.disk1.spec_ini.xyz" \
 	>/sys/kernel/scst_tgt/targets/iscsi/iqn.2007-05.com.example:storage.disk1.sys1.xyz/ini_groups/SPEC_INI/initiators/mgmt

 For Fibre Channel or SAS in the above example you should use target's
 and initiator ports WWNs instead of iSCSI names.

 It is highly recommended to use scstadmin utility instead of described
 in this section low level interface.

 IMPORTANT
 =========

 There must be LUN 0 in each set of LUNs, i.e. LUs numeration must not
 start from, e.g., 1. Otherwise you will see no devices on remote
 initiators and SCST core will write into the kernel log message: "tgt_dev
 for LUN 0 not found, command to unexisting LU?"
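
 This rule can be checked before a target is enabled. The following is a
 minimal sketch with a hypothetical function name. It assumes that every
 defined LUN appears in its "luns" subdirectory as an entry named after the
 LUN number, and it skips empty sets, since an empty default set is allowed.

```shell
# Report every non-empty set of LUNs of a target that lacks LUN 0.
# $1 is the target's sysfs directory. Returns 1 if any set is reported.
scst_check_lun0() {
    status=0
    for luns in "$1"/luns "$1"/ini_groups/*/luns; do
        [ -d "$luns" ] || continue
        has_any=0
        # Entries whose names start with a digit are taken to be LUNs;
        # this skips the "mgmt" file.
        for entry in "$luns"/[0-9]*; do
            if [ -e "$entry" ]; then
                has_any=1
            fi
        done
        if [ "$has_any" -eq 1 ] && [ ! -e "$luns/0" ]; then
            printf 'missing LUN 0: %s\n' "$luns"
            status=1
        fi
    done
    return "$status"
}
```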

 IMPORTANT
 =========

 All the access control must be fully configured BEFORE the corresponding
 target is enabled. When you enable a target, it will immediately start
 accepting new connections, hence creating new sessions, and those new
 sessions will be assigned to security groups according to the
 *currently* configured access control settings. For instance, a session
 may be assigned to the target's default set of LUNs instead of the
 "HOST004" group you intended, because "HOST004" doesn't exist yet. So,
 you must configure all the security groups before new connections from
 the initiators are created, i.e. before the target is enabled.


 VDISK device handler
 --------------------

 Starting from 2.0.0 VDISK device handler uses sysfs interface.

 VDISK has 4 built-in dev handlers: vdisk_fileio, vdisk_blockio,
 vdisk_nullio and vcdrom. Roots of their sysfs interface are
 /sys/kernel/scst_tgt/handlers/handler_name, e.g. for vdisk_fileio:
 /sys/kernel/scst_tgt/handlers/vdisk_fileio. Each root has the following
 entries:

  - None, one or more links to devices with name equal to names
    of the corresponding devices.

  - trace_level - allows to enable and disable various tracing
    facilities. See content of this file for help how to use it. See also
    section "Dealing with massive logs" for more info how to make correct
    logs when you enabled trace levels producing a lot of logs data.

  - mgmt - main management entry, which allows to add/delete VDISK
    devices with the corresponding type.

 The "mgmt" file has the following commands, which you can send to it,
 for instance, using "echo" shell command. You can always get a small
 help about supported commands by looking inside this file. "Parameters"
 are one or more param_name=value pairs separated by ';'.

  - echo "add_device device_name [parameters]" - adds a virtual device
    with name device_name and specified parameters (see below)

  - echo "del_device device_name" - deletes a virtual device with name
    device_name.

 Handler vdisk_fileio provides FILEIO mode to create virtual devices.
 This mode uses files as backend and accesses them using regular
 read()/write() file calls. This allows to use full power of Linux page
 cache. The following parameters are possible for vdisk_fileio:

  - filename - specifies path and file name of the backend file. The path
    must be absolute.

  - blocksize - specifies block size used by this virtual device. The
    block size must be power of 2 and >= 512 bytes. Default is 512.

  - opt_trans_len - specifies the optimal transfer length data in the block
    limits VPD page. Value is in bytes, and must be a multiple of the block
    size. Default is 524288. Setting this parameter to a multiple of the
    optimal transfer length below 4 MB may improve performance. Setting this
    parameter to a value above 4 MB hurts performance because the SGV cache
    only supports buffers up to 4 MB.

  - write_through - disables write back caching. Note, this option
    makes sense only if you also *manually* disable write-back cache in
    *all* your backstorage devices and make sure it's actually disabled,
    since many devices are known to lie about this mode to get better
    benchmark results. Default is 0.

  - read_only - read only. Default is 0.

  - async - submit I/O asynchronously to the device handler. This mode
    allows concurrent processing of SCSI commands even when using only
    a single SCST command thread. This mode is only supported for kernel
    version 4.1 and later. RHEL 8 is the first RHEL version that supports
    in-kernel asynchronous file I/O.

  - o_direct - disables both read and write caching if asynchronous
    I/O is used. This mode bypasses the page cache and hence improves
    performance.

  - nv_cache - enables "non-volatile cache" mode. In this mode it is
    assumed that the target has a GOOD UPS with ability to cleanly
    shutdown target in case of power failure and it is software/hardware
    bugs free, i.e. all data from the target's cache are guaranteed
    sooner or later to go to the media. Hence all data synchronization
    with media operations, like SYNCHRONIZE_CACHE, are ignored in order
    to bring more performance. Also in this mode target reports to
    initiators that the corresponding device has write-through cache to
    disable all write-back cache workarounds used by initiators. Use with
    extreme caution, since in this mode after a crash of the target
    journaled file systems don't guarantee the consistency after journal
    recovery, therefore manual fsck MUST be run. Note, that since usually
    the journal barrier protection (see "IMPORTANT" note below) turned
    off, enabling NV_CACHE could change nothing from data protection
    point of view, since no data synchronization with media operations
    will go from the initiator. This option overrides "write_through"
    option. Disabled by default.

  - thin_provisioned - enables thin provisioning facility, when remote
    initiators can unmap blocks of storage, if they don't need them
    anymore. Backend storage also must support this facility.

  - tst - allows to specify TST control mode page field. It specifies
    the type of task set in the device. Possible values are: 0 - the
    device maintains one task set for all I_T nexuses and 1 - the device
    maintains separate task sets for each I_T nexus. Default - 1.

  - removable - with this flag set the device is reported to remote
    initiators as removable.

  - rotational - if set, this device is reported as rotational. Otherwise,
    it is reported as non-rotational (SSD, etc.)

  - zero_copy - ignored. For zero-copy I/O, set the async flag and
    possibly also the o_direct flag and use Linux kernel v4.10 or later.

  - dif_mode - specifies which T10-PI, or DIF, mode this device will use.
    See SCSI standards for more info about T10-PI. Available DIF modes
    (can be combined using '|'):

     * tgt - DIF tags are checked on the target hardware, if supported

     * scst - DIF tags are checked inside SCST core

     * dev_check - DIF tags are checked inside backend device. No DIF
       tags storing is required, but optionally possible.

     * dev_store - DIF tags are stored inside backend device on the WRITE
       path and read from it on the READ path. No DIF tags checking is
       required, but optionally possible.

    For instance, if only tgt DIF mode specified, then target driver,
    serving this device, will inside hardware check, then STRIP DIF tags
    from SCSI commands on the WRITE path and generate, then INSERT DIF
    tags into SCSI commands on the READ path, so neither SCST core, nor
    dev handler will see them.

    Similarly, if only scst DIF mode specified, then target driver will
    PASS DIF tags into SCST core, which then check/STRIP/generate/INSERT
    them, so dev handler will not see them.

    If only dev_check DIF mode specified, then both target driver and
    SCST core will PASS DIF tags into the dev handler, which is then
    responsible to check them in the backend hardware. If only dev_store
    specified, then DIF tags will only be stored by the dev handler in
    the backend hardware without checking at any level.

    If all "tgt|scst|dev_check|dev_store" DIF mode specified, then all
    target driver, SCST core and dev handler will check DIF tags, then
    dev handler will store them in the backend hardware.

  - dif_type - specifies which DIF SCSI type this device will use.

  - dif_static_app_tag - specifies fixed (static) DIF application tag for
    this device.

  - dif_filename - specifies full path to filename, where DIF tags will
    be stored.
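
 The constraints on blocksize and opt_trans_len stated above can be
 checked before a device is created. The following is a minimal sketch
 with hypothetical function names; it only encodes the rules from this
 list and does not consult SCST.

```shell
# Succeed if $1 is a valid block size: a power of 2 and >= 512 bytes.
vdisk_valid_blocksize() {
    bs=$1
    [ "$bs" -ge 512 ] 2>/dev/null || return 1
    # A power of two has exactly one bit set, so bs & (bs - 1) is 0.
    [ $((bs & (bs - 1))) -eq 0 ]
}

# Succeed if $2 (opt_trans_len in bytes) is a positive multiple of
# the block size $1.
vdisk_valid_opt_trans_len() {
    [ "$2" -gt 0 ] 2>/dev/null || return 1
    [ $(($2 % $1)) -eq 0 ]
}
```

 For instance, a block size of 4096 with the default opt_trans_len of
 524288 passes both checks.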

 Handler vdisk_blockio provides BLOCKIO mode to create virtual devices.
 This mode performs direct block I/O with a block device, bypassing the
 page cache for all operations. This mode works ideally with high-end
 storage HBAs and for applications that either do not need caching
 between application and disk or need the large block throughput. See
 below for more info.

 The following parameters are possible for vdisk_blockio: filename,
 blocksize, nv_cache, read_only, removable, rotational, thin_provisioned,
 tst, dif_mode, dif_type, dif_static_app_tag, dif_filename. See
 vdisk_fileio above for description of those parameters.

 vdisk_blockio devices have the following two additional attributes:

 - active - if this flag is set (the default), the backing block device
   will be opened when the SCST device is added/opened. If a SCST device
   is opened with active=0 then the backing block device will not be
   opened, allowing for an active/passive SCST configuration. In addition,
   this attribute is writable via sysfs allowing the user to open/close the
   backing block device on the fly, or via a script.

 - bind_alua_state - if this flag is set (the default), when the device is
   associated with an ALUA device group, and a target group ALUA state
   changes to the active/nonoptimized state, the active attribute will be
   set to 1 which attempts to open the backing block device. If the target
   group ALUA state changes to a value other than active/nonoptimized, the
   backing device will be closed (active=0). If bind_alua_state=0 for a
   device the ALUA state changes have NO effect on the active attribute,
   it is left up to the user to use a script, or manually set the active
   attribute to open/close the backing block device.

 Handler vdisk_nullio provides NULLIO mode to create virtual devices. In
 this mode no real I/O is done, but success returned to initiators.
 Intended to be used for performance measurements in the same way as
 "*_perf" handlers. The following parameters are possible for vdisk_nullio:
 blocksize, read_only, removable, tst. See vdisk_fileio above for
 description of those parameters.

 vdisk_nullio devices have the following two additional attributes:

  - dummy - if this flag is set, LUNs corresponding to this device will
    not appear at the initiator side. This is because SCST will set the
    PERIPHERAL QUALIFIER field to 1 (not connected) and the
    PERIPHERAL DEVICE TYPE to 0x1f (no device) in the INQUIRY response.
    See also SPC-4 for more information. It is designed to be used as a
    "dummy" placeholder on LUN 0, if LUN 0 is not desired.

  - read_zero - if this flag is set, reading from a vdisk_nullio device
    returns a buffer filled with byte 0x00. If this flag is cleared
    (which is the default behavior), the buffer returned to the
    initiator is not cleared. Although this results in slightly faster
    operation this is a security hole since any data that is present in
    kernel memory can be returned to the initiator.

 Handler vcdrom allows emulation of a virtual CDROM device using an ISO
 file as backend. It has only a single parameter: tst.

 For example:

 echo "add_device disk1 filename=/disk1; blocksize=4096; nv_cache=1" >/sys/kernel/scst_tgt/handlers/vdisk_fileio/mgmt

 will create a FILEIO virtual device disk1 with backend file /disk1
 with block size 4K and NV_CACHE enabled.
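
 When the parameter list is assembled by a script, the "; " separators can
 be generated rather than typed. The following is a minimal sketch with a
 hypothetical function name; it only builds the command string and writes
 nothing to sysfs.

```shell
# Build an "add_device" command from a device name ($1) and any number
# of param_name=value arguments, joined with "; ".
vdisk_add_device_cmd() {
    name=$1
    shift
    params=""
    for p in "$@"; do
        if [ -z "$params" ]; then
            params=$p
        else
            params="$params; $p"
        fi
    done
    # Append the parameters only if there are any.
    printf '%s\n' "add_device $name${params:+ $params}"
}
```

 The output can then be redirected to the handler's "mgmt" file, e.g.
 vdisk_add_device_cmd disk1 filename=/disk1 blocksize=4096 nv_cache=1
 >/sys/kernel/scst_tgt/handlers/vdisk_fileio/mgmt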

 Each vdisk_fileio's device has the following attributes in
 /sys/kernel/scst_tgt/devices/device_name:

  - filename - contains path and file name of the backend file.

  - blocksize - contains block size used by this virtual device.

  - opt_trans_len - contains the optimal transfer length used by this virtual
    device.

  - write_through - contains status of write back caching of this virtual
    device.

  - sync - writing into this attribute causes the page cache contents to
    be flushed to disk.

  - read_only - contains read only status of this virtual device.

  - o_direct - contains O_DIRECT status of this virtual device.

  - inq_vend_specific - Vendor specific data that will be reported via
    either bytes 36..55 or bytes 96..256 of the INQUIRY response, depending
    on whether this field is <= 20 or > 20 bytes long.

  - nv_cache - contains NV_CACHE status of this virtual device.

  - prod_id - PRODUCT IDENTIFICATION as reported via the INQUIRY response.
    The default value for this field is the SCST device name.

  - prod_rev_lvl - PRODUCT REVISION LEVEL as reported via the INQUIRY
    response. The default value for this field is " 300".

  - scsi_device_name - optional SCSI target device name to which this
    SCST device belongs (in SCSI terminology all SCST devices are called
    Logical Units). See SPC for more info.

  - tst - contains TST field of SCSI Control mode page. See SPC-4 for
    more details about this field.

  - thin_provisioned - contains thin provisioning status of this virtual
    device.

  - gen_tp_soft_threshold_reached_UA - for thin provisioned devices
    writing anything into this write-only attribute will generate a THIN
    PROVISIONING SOFT THRESHOLD REACHED Unit Attention to all initiators
    connected to this device.

  - removable - contains removable status of this virtual device.

  - rotational - contains rotational status of this virtual device.

  - size_mb - contains size of this virtual device in MB.

  - pr_file_name - Full path of the file or block device in which to store
    persistent reservation information. The default value for this attribute is
    /var/lib/scst/pr/${device_name}. Writing a new value into this sysfs
    attribute is only allowed if the device is not exported. Modifying this
    sysfs attribute causes the persistent reservation state to be reloaded.

  - t10_dev_id - contains and allows to set T10 vendor specific
    identifier for Device Identification VPD page (0x83) of INQUIRY data.
    By default VDISK handler always generates t10_dev_id for every newly
    created device at creation time based on the device name and
    scst_vdisk_ID scst_vdisk.ko module parameter for procfs (see below)
    or the SCST setup_id when using the sysfs interface (see above).
    Note: some initiators, e.g. VMware's ESXi or MS Hyper-V, only look
    at the first eight characters of t10_dev_id. You have to make sure
    that these first eight characters are unique or such initiators will
    consider these devices as identical.

  - eui64_id - allows to set the EUI-64 based device identifier in the
    SCSI device identification VPD page (83h). This identifier must be 8,
    12 or 16 bytes long and must be specified in hexadecimal format (EUI =
    Extended Unique Identifier). A leading "0x" is allowed but is not
    required. Writing a newline into this attribute discards the EUI-64
    identifier. If neither eui64_id nor naa_id have been set the first
    eight bytes of the t10_dev_id are used as the EUI-64 ID. If naa_id has
    been set but eui64_id has not been set no EUI-64 identifier is
    reported in the SCSI device identification VPD page. If eui64_id has
    been set the value of this attribute is reported as the EUI-64 ID. The
    first three bytes of an EUI-64 ID are a so-called organizationally
    unique identifier (OUI). The remaining bytes may be chosen by the
    organization that owns the OUI. For more information about OUIs, see
    also http://standards.ieee.org/develop/regauth/oui/public.html.

  - naa_id - allows to set the NAA ID in the SCSI INQUIRY response (NAA =
    Network Address Authority). This identifier must be 8 or 16 bytes long
    and must be specified in hex format. A leading "0x" is allowed but is
    not required. Writing a newline into this attribute discards the NAA
    ID. If this ID is set it is reported in the SCSI VPD device
    identification page (83h). More information about NAA identifiers can
    be found in the following documents:
    * ANSI T11 committee, Fibre Channel Framing and Signaling Interface - 4
      (FC-FS-4) rev 0.50, May 2014 (http://www.t11.org/).
    * IETF, RFC 3980 - T11 Network Address Authority (NAA) Naming Format for
      iSCSI Node Names, February 2005 (https://tools.ietf.org/html/rfc3980).

  - t10_vend_id - Contents of the T10 VENDOR IDENTIFICATION field of the
    INQUIRY response. The default value for this field is "SCST_BIO" for
    vdisk_block devices and "SCST_FIO" for vdisk_fileio devices.

  - usn - contains the virtual device's serial number of INQUIRY data. It
    is created at the device creation time based on the device name and
    scst_vdisk_ID scst_vdisk.ko module parameter for procfs (see below)
    or the SCST setup_id when using the sysfs interface (see above).

  - type - contains SCSI type of this virtual device.

  - resync_size - write only attribute, which makes vdisk_fileio rescan
    the size of the backend file. It is useful if you changed it, for
    instance, if you resized it.

  - vend_specific_id - Vendor specific ID as reported via the Device
    Identification VPD page (83h). The default value for this attribute
    is the value of the t10_dev_id attribute.

 For example:

 /sys/kernel/scst_tgt/devices/disk1
 |-- block
 |-- blocksize
 |-- opt_trans_len
 |-- exported
 |   |-- export0 -> ../../../targets/iscsi/iqn.2006-10.net.vlnb:tgt/luns/0
 |   |-- export1 -> ../../../targets/iscsi/iqn.2006-10.net.vlnb:tgt/ini_groups/INI/luns/0
 |   |-- export2 -> ../../../targets/iscsi/iqn.2006-10.net.vlnb:tgt1/luns/0
 |   |-- export3 -> ../../../targets/iscsi/iqn.2006-10.net.vlnb:tgt1/ini_groups/INI1/luns/0
 |   |-- export4 -> ../../../targets/iscsi/iqn.2006-10.net.vlnb:tgt1/ini_groups/INI2/luns/0
 |-- filename
 |-- handler -> ../../handlers/vdisk_fileio
 |-- nv_cache
 |-- o_direct
 |-- read_only
 |-- removable
 |-- resync_size
 |-- rotational
 |-- size_mb
 |-- t10_dev_id
 |-- thin_provisioned
 |-- threads_num
 |-- threads_pool_type
 |-- tst
 |-- type
 |-- usn
 `-- write_through
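
 Since some initiators only look at the first eight characters of
 t10_dev_id (see the note in the attribute list above), it can be worth
 checking the configured devices for clashing prefixes. A minimal
 sketch, assuming the sysfs layout shown above; the function name
 t10_prefix_dupes and the SCST_ROOT override are hypothetical and exist
 only to make the check testable outside a live target:

```shell
# Print every 8-character t10_dev_id prefix that is shared by two or
# more devices. The root directory defaults to the real sysfs location;
# the optional argument and SCST_ROOT are hypothetical overrides.
t10_prefix_dupes() {
    root=${1:-${SCST_ROOT:-/sys/kernel/scst_tgt}}
    for f in "$root"/devices/*/t10_dev_id; do
        [ -r "$f" ] || continue
        # Only the first line of the attribute holds the identifier.
        head -n 1 "$f" | cut -c1-8
    done | sort | uniq -d
}
```

 An empty output means that all prefixes are unique.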

 Each vdisk_blockio's device has the following attributes in
 /sys/kernel/scst_tgt/devices/device_name: blocksize, filename, nv_cache,
 read_only, removable, resync_size, rotational, size_mb, t10_dev_id,
 thin_provisioned, gen_tp_soft_threshold_reached_UA, threads_num,
 threads_pool_type, tst, type, usn. See above description of those
 parameters.

 Each vdisk_nullio's device has the following attributes in
 /sys/kernel/scst_tgt/devices/device_name: blocksize, read_only,
 removable, size_mb, t10_dev_id, threads_num, threads_pool_type, type,
 tst, usn, dummy. See above description of those parameters.

 Each vcdrom's device has the following attributes in
 /sys/kernel/scst_tgt/devices/device_name: filename, size_mb,
 t10_dev_id, threads_num, threads_pool_type, type, usn, tst. See above
 description of those parameters. The exception is the filename
 attribute, which is writable for vcdrom. Writing to it virtually inserts
 or changes the CD media in the virtual CDROM device. For example:

  - echo "/image.iso" >/sys/kernel/scst_tgt/devices/cdrom/filename - will
    insert file /image.iso as virtual media to the virtual CDROM cdrom.

  - echo "" >/sys/kernel/scst_tgt/devices/cdrom/filename - will remove
    "media" from the virtual CDROM cdrom.
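
 The two commands above can be combined into a media change helper. This
 is a sketch under assumptions: the function name vcdrom_change_media and
 the SCST_ROOT override are hypothetical, and ejecting before inserting
 is a conservative choice rather than a documented requirement.

```shell
# Replace the media of a virtual CDROM: eject the current image, then
# insert the new one. SCST_ROOT is a hypothetical override for testing.
vcdrom_change_media() {
    dev=$1
    image=$2
    attr=${SCST_ROOT:-/sys/kernel/scst_tgt}/devices/$dev/filename
    echo "" >"$attr" || return 1     # remove the current "media"
    echo "$image" >"$attr"           # insert the new image
}
```

 For instance, "vcdrom_change_media cdrom /image.iso" would leave
 /image.iso inserted in the virtual CDROM cdrom.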

 Additionally, the VDISK handler has the module parameter "num_threads",
 which specifies the count of I/O threads for each FILEIO VDISK or VCDROM
 device. If you have a workload that tends to produce rather random
 accesses (e.g. DB-like), you should increase this count to a bigger
 value, like 32. If you have a rather sequential workload, you should
 decrease it to a lower value, like the number of CPUs on the target or
 even 1. Due to some limitations of the Linux I/O subsystem, increasing
 the number of I/O threads too much leads to a sequential performance
 drop, especially with the deadline scheduler, so decreasing it can
 improve sequential performance. The default provides a good compromise
 between random and sequential accesses.
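
 One way to make such a choice persistent is an "options" line for
 modprobe. The helper below is only an illustration: its name
 vdisk_threads_option is hypothetical, and the values 32 and 1 are
 simply the ones suggested in the paragraph above.

```shell
# Print a modprobe "options" line for scst_vdisk.
#   random      -> 32 threads, as suggested above
#   sequential  -> the given count (e.g. number of CPUs), default 1
vdisk_threads_option() {
    case $1 in
        random)     n=32 ;;
        sequential) n=${2:-1} ;;
        *)
            echo "usage: vdisk_threads_option random|sequential [count]" >&2
            return 1
            ;;
    esac
    printf 'options scst_vdisk num_threads=%s\n' "$n"
}

vdisk_threads_option random
```

 The printed line can be placed in a file under /etc/modprobe.d, or the
 same parameter can be passed directly on the modprobe command line.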

 You shouldn't be afraid to have too many VDISK I/O threads if you have
 many VDISK devices. Kernel threads consume very few resources (several
 KBs each) and only the necessary threads will be used by SCST, so the
 threads will not thrash your system.

 CAUTION: If you partitioned/formatted your device with block size X, *NEVER*
 ======== ever try to export and then mount it (even accidentally) with another
          block size. Otherwise you can *instantly* damage it pretty
 	 badly as well as all your data on it. Messages on the initiator
 	 like "attempt to access beyond end of device" are a sign of
 	 such damage.

 	 Moreover, if you want to compare how well different block sizes
 	 work for you, you **MUST** EVERY TIME AFTER CHANGING BLOCK SIZE
 	 **COMPLETELY** **WIPE OFF** ALL THE DATA FROM THE DEVICE. In
 	 other words, THE **WHOLE** DEVICE **MUST** HAVE ONLY **ZEROS**
 	 AS THE DATA AFTER YOU SWITCH TO NEW BLOCK SIZE. Switching block
 	 sizes isn't like switching between FILEIO and BLOCKIO, after
 	 changing block size all previously written with another block
 	 size data MUST BE ERASED. Otherwise you will have a full set of
 	 very weird behaviors, because blocks addressing will be
 	 changed, but initiators in most cases will not have a
 	 possibility to detect that old addresses written on the device
 	 in, e.g., partition table, don't refer anymore to what they are
 	 intended to refer.
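
 The wipe demanded above can be sketched as follows. THIS DESTROYS ALL
 DATA on the given file or device. The function names zero_fill and
 is_all_zero are hypothetical, and the size in bytes has to be supplied
 by the caller.

```shell
# Overwrite the first SIZE bytes of TARGET with zeros, without
# truncating it. DESTRUCTIVE: all previous contents are lost.
zero_fill() {
    target=$1
    size=$2
    head -c "$size" /dev/zero |
        dd of="$target" bs=65536 conv=notrunc 2>/dev/null
}

# Succeed only if TARGET contains no non-zero byte: all NUL bytes are
# deleted from the stream and the remaining bytes are counted.
is_all_zero() {
    [ $(tr -d '\000' <"$1" | wc -c) -eq 0 ]
}
```

 For a block device the size could be obtained, for example, with
 "blockdev --getsize64"; for a backend file "wc -c" is sufficient.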

 IMPORTANT: Some disk and partition table management utilities don't support
 =========  block sizes >512 bytes, therefore make sure that your favorite one
            supports it. Currently cfdisk is the only utility known to work
 	   only with 512 byte blocks; other utilities like fdisk on Linux
 	   or the standard disk manager on Windows are proven to work well
 	   with non-512 byte blocks. Note, if you export a disk file or
 	   device with a block size different from the one with which it
 	   was already partitioned, you could get various weird things
 	   like utilities hanging or other unexpected behavior. Hence, to
 	   be sure, zero the exported file or device before the first
 	   access to it from the remote initiator with another block
 	   size. On a Windows initiator make sure you "Set Signature" in
 	   the disk manager on the drive imported from the target before
 	   doing any other partitioning on it. After you have
 	   successfully mounted a file system over a non-512 byte block
 	   size device, the block size stops mattering; any program will
 	   work with files on such a file system.


 Dealing with massive logs
 -------------------------

 If you want to enable "trace_level" file logging levels that produce a
 lot of events, like "debug", then to not lose logged events you should
 also:

   * Increase the CONFIG_LOG_BUF_SHIFT variable in the .config of your
     kernel to a much bigger value, then recompile the kernel. For
     example, value 25 will provide good protection from logging overflow
     even under a high volume of logging events. To use it you will need
     to modify the maximum allowed value for CONFIG_LOG_BUF_SHIFT in the
     corresponding Kconfig file to 25 as well.

   * Change your /etc/syslog.conf or other config file of your favorite
     logging program to store kernel logs in an async manner. For example,
     you can add the line "kern.info -/var/log/kernel" to rsyslog.conf and
     add "kern.none" to the line for /var/log/messages, so the resulting
     line would look like:

     "*.info;kern.none;mail.none;authpriv.none;cron.none /var/log/messages"
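
 CONFIG_LOG_BUF_SHIFT is the base-2 logarithm of the kernel log buffer
 size in bytes, so the memory cost of a given value is easy to compute.
 A small sketch (the function name log_buf_bytes is hypothetical):

```shell
# Kernel log buffer size in bytes for a given CONFIG_LOG_BUF_SHIFT.
log_buf_bytes() {
    echo $((1 << $1))
}

log_buf_bytes 25    # 33554432 bytes, i.e. 32 MiB
```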


 Persistent Reservations
 -----------------------

 SCST implements Persistent Reservations with full set of capabilities,
 including "Persistence Through Power Loss".

 The "Persistence Through Power Loss" data are saved in /var/lib/scst/pr
 in files named after the corresponding devices. This directory also
 contains backup versions of those files with suffix ".1". Those backup
 files are used in case of power or other failure to protect Persistent
 Reservation information from corruption during update. It is safe to
 assume that each of those files can be up to 1KB big.
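
 For backup or capacity planning it can help to enumerate these files.
 The following sketch only derives the names and the worst-case space
 from the rules above; the function names and the PR_DIR override are
 hypothetical.

```shell
# List the persistent reservation file and its ".1" backup for each
# given device name. PR_DIR is a hypothetical override for testing.
pr_files() {
    dir=${PR_DIR:-/var/lib/scst/pr}
    for dev in "$@"; do
        printf '%s/%s\n%s/%s.1\n' "$dir" "$dev" "$dir" "$dev"
    done
}

# Worst-case space in KB: two files of up to 1KB per device.
pr_space_kb() {
    echo $(( $# * 2 ))
}

pr_files disk1
pr_space_kb disk1 disk2 disk3
```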

 Persistent Reservations are available on all transports implementing
 the get_initiator_port_transport_id() callback. Transports not
 implementing this callback will act in one of 2 possible scenarios
 ("all or nothing"):

 1. If a device has such transport connected and doesn't have persistent
 reservations, it will refuse Persistent Reservations commands as if it
 doesn't support them.

 2. If a device has persistent reservations, all initiators newly
 connecting via such transports will not see this device. After all
 persistent reservations from this device are released, upon reconnect
 the initiators will see it.


 ALUA Support
 ------------

 SCST supports both implicit and explicit asymmetric logical unit access
 (ALUA). ALUA is a feature defined by the ANSI T10 SCSI committee. It
 allows a target to tell the initiator which path to use in a multipath
 setup plus, in the explicit case, control state of each path via SET
 TARGET PORT GROUPS SCSI command. The redundant paths between initiator
 and target can be used either for redundancy or for load sharing
 purposes. The target can either be a single target system running SCST
 with multiple communication interfaces or two target systems each
 running SCST and configured in a high availability setup.

 In the SPC-4 standard the following concepts are defined related to ALUA:
 * Relative target port ID. A number between 1 and 65535 that uniquely
   identifies a target port. These numbers must be unique over the target as
   a whole, even if that target consists of multiple systems each running SCST.
 * Target port group asymmetric access state. One of active/optimized,
   active/non-optimized, standby, unavailable, logical block dependent or
   offline. The access state of a port defines which (if any) SCSI commands
   will be processed by the target port.
 * Target port preference indicator. This indicator is additional information
   next to the asymmetric access state that is provided by the target to an
   initiator and that may impact the decision taken by the initiator about
   which path will be chosen.

 More detailed information about ALUA can be found in section 5.11.2 of the
 ANSI T10 standard called SPC-4.

 ALUA support in SCST
 ....................

 SCST allows to define ALUA settings for each unique combination of SCST
 device and SCST target. An initiator however queries ALUA settings by
 sending an appropriate SCSI command to a specific LUN of an SCST target.
 Each such LUN maps uniquely to an SCST device. For hardware SCST target
 drivers, e.g. ib_srpt, there is a one-to-one correspondence between SCST
 target and SCSI target port. With other SCST targets, e.g. iSCSI-SCST,
 by default the only relationship between SCST targets and SCSI target
 ports is that all SCST targets defined on a system are visible via all
 SCSI target ports. See also the iSCSI-SCST documentation about the
 allowed_portal attribute for information about how to associate iSCSI
 targets with a single physical interface.

 Notes:
 - In a H.A. setup it is the responsibility of the user to synchronize ALUA
   information between the individual systems running SCST. There are no
   provisions in SCST to exchange ALUA information automatically between
   individual systems.
 - In order to support H.A. setups it is possible to let one SCST system
   report information about target ports present in other SCST systems.
 - With SCST, and certainly in a H.A. setup, it is possible to configure ALUA
   such that an initiator receives information that is not standard compliant,
   e.g. setting all target ports in the offline state. It is the responsibility
   of the user to make sure that the information queried by an initiator is
   consistent independent of the LUN and the target port used by the initiator
   to query this information.
 - Before building a H.A. setup consisting of two or more SCST systems one
   should evaluate whether it's acceptable that persistent reservation commands,
   SCSI task management commands and MODE SELECT commands will only be processed
   by a single node instead of being processed by all nodes.

 Configuring ALUA in SCST
 ........................

 SCST allows to configure the following settings related to ALUA
 for each unique combination of SCST target and virtual SCST device
 (vdisk_fileio, vdisk_blockio, vcdrom, ...):
 * The target port group asymmetric access state. SCST supports all ALUA port
   states except logical block dependent.
 * The preference indicator for a target port group.
 * The relative target port ID associated with the SCST target.

 It is possible to configure the following ALUA-related information via the
 sysfs interface of SCST:
 * Device groups, where each device group has a name and contains zero or more
   SCST devices. If a device group contains only a single SCST device, the name
   of the group may be identical to the device name. See also
   /sys/kernel/scst_tgt/device_groups/mgmt.
 * Which devices are inside a device group. See also
   /sys/kernel/scst_tgt/device_groups/<device group name>/devices/mgmt.
 * Target groups, where each target group has a name and contains zero or more
   SCST target names. See also
   /sys/kernel/scst_tgt/device_groups/<device group name>/target_groups/mgmt.
 * Target port group identifier. This is a number in the range 0..65535 and is
   called the TARGET PORT GROUP in SPC-4. See also
   /sys/kernel/scst_tgt/device_groups/<device group name>/target_groups/<target
   group name>/group_id.
 * Target port group preference indicator. This is a boolean value called the
   PREF bit in SPC-4. See also /sys/kernel/scst_tgt/device_groups/<device group
   name>/target_groups/<target group name>/preferred.
 * Target port group state name. One of active, nonoptimized, standby,
   unavailable, offline or transitioning. See also
   /sys/kernel/scst_tgt/device_groups/<device group name>/target_groups/<target
   group name>/state.
 * Target group contents - zero or more target names. The target names either
   exist on the local system or on a remote system in a H.A. setup. For target
   names that refer to SCST targets on another system only the relative target
   port identifier matters, not the assigned name. See also
   /sys/kernel/scst_tgt/device_groups/<device group name>/target_groups/<target
   group name>/mgmt.
 * Relative target identifier. See also
   /sys/kernel/scst_tgt/device_groups/<device group name>/target_groups/<target
   group name>/<target name>/rel_tgt_id.

 The steps involved in configuring ALUA are:
 * Identify the SCST devices that will always share the same ALUA settings and
   state. Assign a name to each such group of SCST devices. If a device group
   only contains a single device, the group name may be identical to the device
   name.
 * Configure that device group in SCST via sysfs.
 * Identify the SCSI target ports that will always share the same ALUA settings
   and state. Assign a name, a group ID and preference indicator to each such
   SCSI target port group.
 * Configure the target port group information in SCST via sysfs.
 * Identify all SCST targets that can be accessed via a target port group.
 * Assign all these SCST target names to the target group via sysfs.
 * Assign a relative target port identifier to each target.

 As an example, in a H.A. setup with two systems each having one InfiniBand
 HCA controlled by the ib_srpt driver and where each system exports two LUNs
 the following configuration can be used in scst.conf on both systems:

 DEVICE_GROUP dgroup1 {
 	DEVICE disk01

 	TARGET_GROUP tgroup1 {
 		group_id 256
 		preferred 1
 		state active
 		TARGET fe80:0000:0000:0000:0002:c903:00fa:b7e1 {
 			rel_tgt_id 1
 		}
 	}
 	TARGET_GROUP tgroup2 {
 		group_id 257
 		state standby
 		TARGET fe80:0000:0000:0000:0002:c903:00fa:b7f2 {
 			rel_tgt_id 2
 		}
 	}
 }

 DEVICE_GROUP dgroup2 {
 	DEVICE disk02

 	TARGET_GROUP tgroup1 {
 		group_id 258
 		state standby
 		TARGET fe80:0000:0000:0000:0002:c903:00fa:b7e1 {
 			rel_tgt_id 1
 		}
 	}
 	TARGET_GROUP tgroup2 {
 		group_id 259
 		preferred 1
 		state active
 		TARGET fe80:0000:0000:0000:0002:c903:00fa:b7f2 {
 			rel_tgt_id 2
 		}
 	}
 }
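
 The same kind of configuration can also be applied through the sysfs
 files listed earlier. The sketch below only PRINTS the writes instead of
 performing them. The paths are the ones listed above, but the command
 verbs "create" and "add" are assumptions that should be checked against
 the help text of the corresponding mgmt files; the function names are
 hypothetical.

```shell
# Print the sysfs writes that create a device group with one device.
# The "create" and "add" verbs are assumptions, see the text above.
alua_dg_cmds() {
    dg=$1 dev=$2
    base=/sys/kernel/scst_tgt/device_groups
    cat <<EOF
echo "create $dg" >$base/mgmt
echo "add $dev" >$base/$dg/devices/mgmt
EOF
}

# Print the sysfs writes for one target group holding one target.
alua_tg_cmds() {
    dg=$1 tg=$2 gid=$3 pref=$4 state=$5 tgt=$6 rel=$7
    base=/sys/kernel/scst_tgt/device_groups
    tgdir=$base/$dg/target_groups/$tg
    cat <<EOF
echo "create $tg" >$base/$dg/target_groups/mgmt
echo $gid >$tgdir/group_id
echo $pref >$tgdir/preferred
echo $state >$tgdir/state
echo "add $tgt" >$tgdir/mgmt
echo $rel >$tgdir/$tgt/rel_tgt_id
EOF
}

# Device group dgroup1 from the scst.conf fragment above.
alua_dg_cmds dgroup1 disk01
alua_tg_cmds dgroup1 tgroup1 256 1 active \
    fe80:0000:0000:0000:0002:c903:00fa:b7e1 1
alua_tg_cmds dgroup1 tgroup2 257 0 standby \
    fe80:0000:0000:0000:0002:c903:00fa:b7f2 2
```

 Printing first and reviewing the result is a deliberate choice here,
 because a wrong ALUA configuration is immediately visible to
 initiators.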

 Note, if you are using "active" BLOCKIO device attribute to prevent open
 of the backend block device on the passive node, it is not recommended
 to set both active ("active", "nonoptimized") and passive ("standby",
 etc.) ALUA states for the same device if "bind_alua_state=1" is used, as
 shown above to keep internal "active" state of the BLOCKIO device consistent.

 If using the "active" BLOCKIO device attribute and multiple target groups
 exist per device on a SCST instance then "bind_alua_state=0" should be used
 and it is left up to the user to modify the "active" attribute value.

 Explicit ALUA
 .............

 To enable explicit ALUA you need, in addition to the above settings, to
 set the expl_alua device attribute to 1 (by default it is 0). You also
 need to run stpgd and supply it with the path to a script or program
 that will perform the actual path state switching on a SET TARGET PORT
 GROUPS command, for instance, by calling drbdadm. For more information
 see the stpgd README as well as the sample script scst_on_stpg.

 DRBD and other replication/failover SW compatibility
 ....................................................

 DRBD, as well as other replication/failover SW, does not allow its
 device to be opened on the secondary and does not allow a primary to
 secondary transition while this device is open.

 SCST BLOCKIO handler has necessary support for such behavior:

 1. If you need to prevent an SCST BLOCKIO device from opening its block
 device, you need to create it with parameter "active=0". In case of DRBD
 it would be done automatically, you don't have to use the "active"
 attribute.

 2. By default, if you write a new ALUA state into the "state" attribute
 and "bind_alua_state=1" is set for the device, the SCST BLOCKIO handler
 closes open handles on all affected SCST devices before the transition
 and reopens them after the transition, if the new state is active or
 nonoptimized. Alternatively, set "bind_alua_state=0" for SCST BLOCKIO
 devices; ALUA state changes will then not open/close the backing block
 device, and the user will need to handle this manually or via a cluster
 RA in an HA setup.

 Thus, the recommended implicit ALUA state change procedure for primary
 to secondary transition is:

 1. Block all involved SCST devices using "block" sysfs attribute (see
 above). Wait until the blocking finished.

 2. Change the ALUA state to "transitioning". At this moment all open
 file handles will be closed.

 3. Perform the DRBD or other replication/failover SW state transition

 4. Change the ALUA state to your desired secondary state.

 5. Unblock the blocked on step 1 devices.

 Optionally, if your initiators support the Transitioning ALUA state, for
 more responsive behavior the blocked devices can be unblocked
 immediately after step (2). However, not all initiators behave
 correctly if they receive ASYMMETRIC STATE TRANSITION sense.

 For the secondary to primary transition the procedure is similar.
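
 The five steps of the primary to secondary procedure can be sketched
 for a single device as shown here. Several things are assumptions: that
 writing 1 and 0 into the "block" attribute blocks and unblocks the
 device and returns once blocking has finished, the function name
 alua_demote, and the SCST_ROOT override used for testing. The
 replication command (for instance a drbdadm call) is passed in by the
 caller.

```shell
# Usage: alua_demote DEVICE DEVICE_GROUP TARGET_GROUP NEW_STATE CMD...
# CMD is the replication/failover state transition to run in step 3.
alua_demote() {
    dev=$1 dg=$2 tg=$3 new_state=$4
    shift 4
    root=${SCST_ROOT:-/sys/kernel/scst_tgt}
    block=$root/devices/$dev/block
    state=$root/device_groups/$dg/target_groups/$tg/state

    echo 1 >"$block" || return 1                  # 1. block the device
    rc=0
    echo transitioning >"$state" || rc=$?         # 2. handles get closed
    if [ "$rc" -eq 0 ]; then
        "$@" || rc=$?                             # 3. replication switch
    fi
    if [ "$rc" -eq 0 ]; then
        echo "$new_state" >"$state" || rc=$?      # 4. secondary state
    fi
    echo 0 >"$block"                              # 5. always unblock
    return "$rc"
}
```

 If step 3 fails, this sketch leaves the group in the "transitioning"
 state and still unblocks the device, so that the failure becomes
 visible instead of leaving I/O blocked indefinitely; a production
 script would need a real recovery policy.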

 In case of explicit ALUA, SCST automatically performs the necessary
 devices blocking around sending SCST_EVENT_STPG_USER_INVOKE event.

 Checking the Target Configuration
 .................................

 One way to verify the ALUA configuration from a Linux initiator is via
 the commands provided in the sg3_utils package. The first step is to
 verify whether for a certain LUN ALUA has been configured on the target.
 This is possible by checking whether the TPGS=1 text appears in the
 sg_inq output, where /dev/sdb is a device node created by the ib_srp
 initiator:

 # sg_inq /dev/sdb
 standard INQUIRY:
   PQual=0  Device_type=0  RMB=0  version=0x05  [SPC-3]
   [AERC=0]  [TrmTsk=0]  NormACA=0  HiSUP=1  Resp_data_format=2
   SCCS=0  ACC=0  TPGS=1  3PC=0  Protect=0  BQue=0
   EncServ=0  MultiP=0  [MChngr=0]  [ACKREQQ=0]  Addr16=1
   [RelAdr=0]  WBus16=0  Sync=0  Linked=0  [TranDis=0]  CmdQue=1
   [SPI: Clocking=0x0  QAS=0  IUS=0]
     length=66 (0x42)   Peripheral device type: disk
  Vendor identification: SCST_FIO
  Product identification: disk01
  Product revision level:  300
  Unit serial number: 27cddc71
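
 In SPC the TPGS field encodes which ALUA modes a logical unit supports:
 0 means none, 1 implicit only, 2 explicit only and 3 both. A small
 filter for sg_inq output in the format shown above could look like this
 (the function name tpgs_mode is hypothetical):

```shell
# Read sg_inq output on stdin and print the ALUA mode derived from the
# TPGS field.
tpgs_mode() {
    tpgs=$(sed -n 's/.*TPGS=\([0-9]\).*/\1/p' | head -n 1)
    case $tpgs in
        0) echo "none" ;;
        1) echo "implicit" ;;
        2) echo "explicit" ;;
        3) echo "implicit+explicit" ;;
        *) echo "unknown"; return 1 ;;
    esac
}

# Sample line taken from the output above.
printf '  SCCS=0  ACC=0  TPGS=1  3PC=0  Protect=0  BQue=0\n' | tpgs_mode
```

 On a live initiator the input would come from "sg_inq /dev/sdb".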

 The next step is to verify the target group configuration. That is possible
 by verifying whether the output of the sg_rtpg command matches the values
 configured on the target:

 # sg_rtpg /dev/sdb
 Report target port groups:
   target port group id : 0x100 , Pref=1
     target port group asymmetric access state : 0x00
     T_SUP : 0, O_SUP : 0, LBD_SUP : 0, U_SUP : 1, S_SUP : 1, AN_SUP : 1, AO_SUP : 1
     status code : 0x02
     vendor unique status : 0x00
     target port count : 01
     Relative target port ids:
       0x01
   target port group id : 0x101 , Pref=0
     target port group asymmetric access state : 0x02
     T_SUP : 0, O_SUP : 0, LBD_SUP : 0, U_SUP : 1, S_SUP : 1, AN_SUP : 1, AO_SUP : 1
     status code : 0x02
     vendor unique status : 0x00
     target port count : 01
     Relative target port ids:
       0x02
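
 To compare this output with the target configuration it can be
 condensed to one line per target port group. The sketch assumes exactly
 the sg_rtpg layout shown above (other sg3_utils versions may print it
 differently) and uses the SPC-4 state codes; the function name
 rtpg_summary is hypothetical.

```shell
# Read sg_rtpg output on stdin and print "<group id> <state> pref=<n>"
# for every target port group.
rtpg_summary() {
    awk '
    /target port group id/ {
        id = $6
        pref = $8
        sub(/^Pref=/, "", pref)
    }
    /asymmetric access state/ {
        code = $8
        if      (code == "0x00") name = "active/optimized"
        else if (code == "0x01") name = "active/non-optimized"
        else if (code == "0x02") name = "standby"
        else if (code == "0x03") name = "unavailable"
        else if (code == "0x0e") name = "offline"
        else if (code == "0x0f") name = "transitioning"
        else                     name = "unknown(" code ")"
        print id, name, "pref=" pref
    }'
}
```

 For the output above, "sg_rtpg /dev/sdb | rtpg_summary" would report
 group 0x100 as active/optimized and preferred, and group 0x101 as
 standby, which matches the group_id values 256 and 257 of dgroup1.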

 The relative target port ID and the target port group ID for a certain path
 can be queried e.g. as follows:

 # sg_vpd -p di /dev/sdb
 Device Identification VPD page:
   Addressed logical unit:
     designator type: T10 vendor identification,  code set: ASCII
       vendor id: SCST_FIO
       vendor specific: 27cddc71-disk01
     designator type: EUI-64 based,  code set: Binary
       0x3237636464633731
   Target port:
     designator type: Relative target port,  code set: Binary
       Relative target port: 0x1
     designator type: Target port group,  code set: Binary
       Target port group: 0x100


 Initiator Support
 .................

 On Linux systems ALUA support is provided by the scsi_dh_alua kernel
 driver in combination with the user space multipathd daemon. You will
 have to modify at least the following in /etc/multipath.conf to enable
 ALUA:

 * hardware_handler      "1 alua"
 * prio                  alua
 * path_grouping_policy  group_by_prio
 * path_checker          tur
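
 Put together, these settings could form a device stanza like the one
 printed by the sketch below. Matching on the vendor string SCST_FIO
 (taken from the INQUIRY output above) and on every product via ".*" is
 an assumption, as is the function name multipath_alua_stanza; adapt the
 match to the vendor and product IDs your devices actually report.

```shell
# Print a multipath.conf "devices" section with the ALUA settings listed
# above for the given T10 vendor ID (default: SCST_FIO).
multipath_alua_stanza() {
    vendor=${1:-SCST_FIO}
    cat <<EOF
devices {
    device {
        vendor               "$vendor"
        product              ".*"
        hardware_handler     "1 alua"
        prio                 alua
        path_grouping_policy group_by_prio
        path_checker         tur
    }
}
EOF
}

multipath_alua_stanza SCST_FIO
```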

 Notes:
 - Newer versions of multipathd support a parameter called
   "detect_prio". It can be more convenient to enable this parameter instead of
   setting the parameter "prio" to "alua" for only those LUNs that support ALUA.
 - Older versions of multipathd (e.g. RHEL 5 and SLES 10 SP1) need
   'prio_callout "/sbin/mpath_prio_alua /dev/%n"' instead of 'prio alua'.

 # multipath -ll
 23237636464633731 dm-3 SCST_FIO,disk01
 size=1.0G features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 alua' wp=rw
 |-+- policy='service-time 0' prio=1 status=active
 | `- 10:0:0:0   sdd 8:48  active ready running
 `-+- policy='service-time 0' prio=130 status=enabled
   `- 11:0:0:0   sde 8:64  active ready running
 23133326137346538 dm-4 SCST_FIO,disk02
 size=1.0G features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 alua' wp=rw
 |-+- policy='service-time 0' prio=130 status=active
 | `- 10:0:0:2   sdn 8:208 active ready running
 `-+- policy='service-time 0' prio=1 status=enabled
   `- 11:0:0:2   sdp 8:240 active ready running

 The following information can be derived from the above output:
 * That the hardware handler (hw_handler) has been set to "1 alua".
 * That multipathd created two priority groups - one with priority 1 and one
   with priority 130.
 * That the SRP path with SCSI host number 10 will be used for communication
   with LUN "disk01" and that the SRP path with SCSI host number 11 will be used
   for communication with LUN "disk02".

 More information about how to configure the device mapper and the scsi_dh_alua
 driver can be found in the manual of your Linux distribution ("man
 multipath.conf", "man multipath" and "man multipathd").

 Windows initiator systems support ALUA from Windows Server 2008 on. For more
 information about ALUA support in Windows Server, see also:
 * Microsoft, Windows Server 2008 R2 Multipath I/O Overview, MSDN
   (http://technet.microsoft.com/en-us/library/cc725907.aspx).
 * Microsoft, Multipathing Support in Windows Server 2008, July 2008, MSDN
   (http://blogs.msdn.com/b/san/archive/2008/07/27/multipathing-support-in-windows-server-2008.aspx).
 * Microsoft, ALUA MPIO Logo Test, MSDN
   (http://msdn.microsoft.com/en-us/library/gg607458%28v=vs.85%29.aspx).

 Active/Non-Optimized via internal redirection
 .............................................

 The Active-Standby configuration is simple to understand and to set up.
 However, it might cause serious interoperability issues because not all
 initiators handle the ALUA 'standby' state correctly. For instance,
 some versions of VMware are reported to have such issues. The same
 applies to Windows.

 It is better to use the 'nonoptimized' state on the passive node instead
 of 'standby', with internal command redirection to the active node. This
 is what the vast majority of storage vendors are doing. This is actually
 the reason why the 'standby' and 'unavailable' states have all those
 initiator interoperability issues: they have received too little
 testing because they are only marginally used.

 SCST has the necessary support for such redirection; it just needs to be
 configured correctly. It takes a little bit of effort, especially to
 understand how it is going to function, but then it works MUCH more
 reliably for the full range of initiators. Even poor initiators that
 have no idea about ALUA (boot from SAN, e.g.) will work. The following
 diagram illustrates this approach:

 ................................................................
 .                               .                              .
 .          Initiator A          .         Initiator B          .
 .               |               .              |               .
 ................................................................
 .               |               .              |               .
 .         target port C         .        target port D         .
 .               |               .              |               .
 .              SCST             .             SCST             .
 .           Instance E - target . target - Instance F          .
 .              /  \      port G . port H     /  \              .
 .             /    \           \./          /    \             .
 .            /      \          /.\         /      \            .
 . vdisk_blockio   dev_disk    / . \  dev_disk    vdisk_blockio .
 .    handler      handler    /  .  \ handler        handler    .
 .       |            |      /   .   \ |                |       .
 . block device    SCSI     /    .  SCSI          block device  .
 .       I         initiator     .  initiator           J       .
 .       |         node K        .  node L              |       .
 .       |______________________ .______________________|       .
 ................................................................
 The link between block devices I and J stands for synchronous replication.


 Such a setup can be configured as follows:

 1. Build SCST.

 2. Set up an internal redirect target on the active node, which is going
 to accept redirected commands from the passive node. It must be visible
 only to the passive node.

 3. Set "forward_dst" attribute for this target to 1. This is necessary to
 correctly handle PRs.

 4. Export through this target the SAME backend SCST device as being
 served to initiator(s) (consider for simplicity that there is only one
 served device)

 5. Connect to this SCST device through this internal target from the
 passive node, for instance, using iSCSI. Now you have a local SCSI
 device on the passive side pointing to the active node.

 6. Export this local device to the initiator(s) using SCST
 *pass-through* handler (scst_disk). Pass-through is needed to redirect
 non-block commands as well: ATS, XCOPY, etc.

 7. Set ALUA state to this target as "nonoptimized". Set the forward_src
 attribute to one.

 That's it on the normal path. Now the initiator(s) would see 2 paths:
 OPTIMIZED going to the active node and NON-OPTIMIZED going to the
 passive node, then redirected to the active node.

 On failover (i.e. switching active and passive states):

 1. Set up a similar redirect target on the new active node.

 2. Set up connectivity to that new redirect target from the new passive
 node.

 3. Start ALUA change (see above) on both nodes.

 4. !! In the sysfs security group(s) for the initiator(s), switch the
 *LUN* from the old SCST device to the new one (blockio -> pass-through
 on the new passive and pass-through -> blockio on the new active) using
 the "replace_no_ua" SCST command. You need to do it directly in the
 sysfs interface; scstadmin can't do it.

 5. Set ALUA states to "active" on the new active node and "nonoptimized"
 on the new passive node.

 6. Finish ALUA states change.

 Example using direct sysfs interface could look like:

 Active-Optimized node:

 modprobe scst
 modprobe scst_disk
 modprobe scst_vdisk

 # Main device, DRBD primary here
 echo "add_device aa filename=/dev/drbd1" >/sys/kernel/scst_tgt/handlers/vdisk_blockio/mgmt

 # Redirect device, not used here. Coming from connecting via iSCSI to the
 # corresponding redirect target on the other side.
 DEVICE=10:0:0:0
 echo add_device $DEVICE >/sys/kernel/scst_tgt/handlers/dev_disk/mgmt

 service iscsi-scst start

 # This is a regular, user-visible target
 echo "add_target iqn.2006-10.net.v:tgt" >/sys/kernel/scst_tgt/targets/iscsi/mgmt
 echo 1 >/sys/kernel/scst_tgt/targets/iscsi/iqn.2006-10.net.v:tgt/rel_tgt_id
 echo "add aa 0" >/sys/kernel/scst_tgt/targets/iscsi/iqn.2006-10.net.v:tgt/luns/mgmt

 # This is redirect target, 192.168.9.x is the redirect network
 echo "add_target iqn.2006-10.net.v:tgtR" >/sys/kernel/scst_tgt/targets/iscsi/mgmt
 echo 2 >/sys/kernel/scst_tgt/targets/iscsi/iqn.2006-10.net.v:tgtR/rel_tgt_id
 echo "add_target_attribute iqn.2006-10.net.v:tgtR allowed_portal 192.168.9.1" >/sys/kernel/scst_tgt/targets/iscsi/mgmt
 echo "1" >/sys/kernel/scst_tgt/targets/iscsi/iqn.2006-10.net.v:tgtR/forwarding
 echo "add aa 0" >/sys/kernel/scst_tgt/targets/iscsi/iqn.2006-10.net.v:tgtR/luns/mgmt

 echo 1 >/sys/kernel/scst_tgt/targets/iscsi/iqn.2006-10.net.v:tgt/enabled
 echo 1 >/sys/kernel/scst_tgt/targets/iscsi/iqn.2006-10.net.v:tgtR/enabled

 echo 1 >/sys/kernel/scst_tgt/targets/iscsi/enabled

 # ALUA config

 echo create aa >/sys/kernel/scst_tgt/device_groups/mgmt
 echo add aa >/sys/kernel/scst_tgt/device_groups/aa/devices/mgmt

 echo add tgt_a >/sys/kernel/scst_tgt/device_groups/aa/target_groups/mgmt
 echo add iqn.2006-10.net.v:tgt >/sys/kernel/scst_tgt/device_groups/aa/target_groups/tgt_a/mgmt
 echo 1 >/sys/kernel/scst_tgt/device_groups/aa/target_groups/tgt_a/group_id
 echo active >/sys/kernel/scst_tgt/device_groups/aa/target_groups/tgt_a/state

 echo add tgt_n >/sys/kernel/scst_tgt/device_groups/aa/target_groups/mgmt
 echo add iqn.2006-10.net.v:tgt1 >/sys/kernel/scst_tgt/device_groups/aa/target_groups/tgt_n/mgmt
 echo 2 >/sys/kernel/scst_tgt/device_groups/aa/target_groups/tgt_n/iqn.2006-10.net.v:tgt1/rel_tgt_id
 echo 2 >/sys/kernel/scst_tgt/device_groups/aa/target_groups/tgt_n/group_id
 echo nonoptimized >/sys/kernel/scst_tgt/device_groups/aa/target_groups/tgt_n/state

 Active-Non-Optimized node:

 modprobe scst
 modprobe scst_disk
 modprobe scst_vdisk

 # Main device, DRBD secondary, not used here
 echo "add_device aa filename=/dev/drbd1" >/sys/kernel/scst_tgt/handlers/vdisk_blockio/mgmt

 # Redirect device. Coming from connecting via iSCSI to the
 # corresponding redirect target on the other side.
 DEVICE=10:0:0:0
 echo add_device $DEVICE >/sys/kernel/scst_tgt/handlers/dev_disk/mgmt

 service iscsi-scst start

 echo "add_target iqn.2006-10.net.v:tgt1" >/sys/kernel/scst_tgt/targets/iscsi/mgmt
 echo 2 >/sys/kernel/scst_tgt/targets/iscsi/iqn.2006-10.net.v:tgt1/rel_tgt_id
 echo "add $DEVICE 0" >/sys/kernel/scst_tgt/targets/iscsi/iqn.2006-10.net.v:tgt1/luns/mgmt

 # Redirect target, 192.168.9.x is the redirect network
 echo "add_target iqn.2006-10.net.v:tgtR" >/sys/kernel/scst_tgt/targets/iscsi/mgmt
 echo 2 >/sys/kernel/scst_tgt/targets/iscsi/iqn.2006-10.net.v:tgtR/rel_tgt_id
 echo "add_target_attribute iqn.2006-10.net.v:tgtR allowed_portal 192.168.9.2" >/sys/kernel/scst_tgt/targets/iscsi/mgmt
 echo "1" >/sys/kernel/scst_tgt/targets/iscsi/iqn.2006-10.net.v:tgtR/forwarding
 echo "add aa 0" >/sys/kernel/scst_tgt/targets/iscsi/iqn.2006-10.net.v:tgtR/luns/mgmt

 echo 1 >/sys/kernel/scst_tgt/targets/iscsi/iqn.2006-10.net.v:tgt1/enabled

 echo 1 >/sys/kernel/scst_tgt/targets/iscsi/enabled
 echo 1 >/sys/kernel/scst_tgt/targets/iscsi/iqn.2006-10.net.v:tgtR/enabled

 # ALUA config

 echo create $DEVICE >/sys/kernel/scst_tgt/device_groups/mgmt
 echo add $DEVICE >/sys/kernel/scst_tgt/device_groups/$DEVICE/devices/mgmt

 echo add tgt_a >/sys/kernel/scst_tgt/device_groups/$DEVICE/target_groups/mgmt
 echo add iqn.2006-10.net.v:tgt >/sys/kernel/scst_tgt/device_groups/$DEVICE/target_groups/tgt_a/mgmt
 echo 1 >/sys/kernel/scst_tgt/device_groups/$DEVICE/target_groups/tgt_a/iqn.2006-10.net.v:tgt/rel_tgt_id
 echo 1 >/sys/kernel/scst_tgt/device_groups/$DEVICE/target_groups/tgt_a/group_id
 echo active >/sys/kernel/scst_tgt/device_groups/$DEVICE/target_groups/tgt_a/state

 echo add tgt_n >/sys/kernel/scst_tgt/device_groups/$DEVICE/target_groups/mgmt
 echo add iqn.2006-10.net.v:tgt1 >/sys/kernel/scst_tgt/device_groups/$DEVICE/target_groups/tgt_n/mgmt
 echo 1 >/sys/kernel/scst_tgt/device_groups/$DEVICE/target_groups/tgt_n/group_id
 echo nonoptimized >/sys/kernel/scst_tgt/device_groups/$DEVICE/target_groups/tgt_n/state

 ALUA state switch after DRBD primary-secondary transition:

 Ex-Optimized:

 echo "replace_no_ua $DEVICE 0" >/sys/kernel/scst_tgt/targets/iscsi/iqn.2006-10.net.v:tgt1/luns/mgmt
 echo nonoptimized >/sys/kernel/scst_tgt/device_groups/$DEVICE/target_groups/tgt_a/state
 echo active >/sys/kernel/scst_tgt/device_groups/$DEVICE/target_groups/tgt_n/state

 Ex-Non-Optimized:

 echo "replace_no_ua aa 0" >/sys/kernel/scst_tgt/targets/iscsi/iqn.2006-10.net.v:tgt1/luns/mgmt
 echo nonoptimized >/sys/kernel/scst_tgt/device_groups/$DEVICE/target_groups/tgt_a/state
 echo active >/sys/kernel/scst_tgt/device_groups/$DEVICE/target_groups/tgt_n/state

 If you have any questions, please read the text above at least 3 times
 before asking. It might be tricky to understand :-)


 VAAI
 ----

 SCST supports all 3 VAAI SCSI commands: WRITE SAME, COMPARE AND WRITE
 (ATS) and EXTENDED COPY. Additionally, it supports Thin Provisioning
 capabilities that are not directly related to VAAI, particularly the
 UNMAP SCSI command, WRITE SAME with the UNMAP bit set, as well as thin
 provisioning related sysfs attributes of devices (see above).

 In some cases dev handlers should perform some manual actions to fully
 benefit from the SCST VAAI implementation. Those actions are described
 in the implementation notes below. For vdisk and fileio_tgt handlers
 they have already been implemented.

 IMPORTANT: To use EXTENDED COPY command between LUNs (datastores) they all
 =========  MUST have the same PRODUCT IDENTIFICATION INQUIRY field. By
            default, to simplify remote devices identification, SCST uses
            vdisk names as PRODUCT IDENTIFICATION, so SCST devices look
 	   different to the initiators. However, for some reason,
 	   VMware does not use EXTENDED COPY between LUNs with different
 	   PRODUCT IDENTIFICATION. Thus, to be able to use full VAAI in
 	   your VMware setups you must manually set PRODUCT
 	   IDENTIFICATION for all your VMware LUNs to the same value,
 	   for instance, "SCST", using the "prod_id" attribute. It can
 	   be done either by adding the "prod_id" attribute to scstadmin's
 	   scst.conf, or by directly writing to the SCST sysfs attribute.
 	   For example:

 	   HANDLER vdisk_blockio {
 		DEVICE blockio1 {
 			filename /dev/sda5
 			prod_id SCST
 		}
 	   }

 	   or

 	   echo SCST >/sys/kernel/scst_tgt/devices/blockio1/prod_id
 	   correspondingly.

 	   Note, this prod_id modification must be done on all
 	   datastores BEFORE VMware connects to them.
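
 When several devices have to be changed, the sysfs write shown above
 can be wrapped in a loop. The following is only a minimal sketch, not
 part of SCST: the set_prod_id helper, the SCST_SYSFS override (used to
 make the function testable outside of a real target) and the device
 names in the usage comment are assumptions made for illustration.

```shell
# Root of the SCST sysfs tree. SCST_SYSFS is a hypothetical override;
# leave it unset on a real target.
SCST_SYSFS=${SCST_SYSFS:-/sys/kernel/scst_tgt}

# set_prod_id PROD_ID DEVICE...
# Writes PROD_ID to the prod_id attribute of every listed device.
# Returns non-zero if any device has no such attribute or a write fails.
set_prod_id() {
    prod_id=$1
    shift
    rc=0
    for dev in "$@"; do
        attr="$SCST_SYSFS/devices/$dev/prod_id"
        if [ -e "$attr" ]; then
            echo "$prod_id" >"$attr" || rc=1
        else
            echo "no prod_id attribute for device $dev" >&2
            rc=1
        fi
    done
    return $rc
}

# Example (placeholder device names), to be run before VMware connects:
#   set_prod_id SCST blockio1 blockio2
```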


 Implementation notes
 ....................

 WRITE SAME
 ~~~~~~~~~~

 WRITE SAME command supports 2 modes:

 1. Manual writing mode. In this mode WRITE SAME generates a set of
 internal WRITE(16) SCSI commands to perform requested writing.

 2. Remap mode. In this mode a dev handler, if it supports it, can remap
 the blocks being written to a single block and then tell SCST to
 manually write the parts of the requested area which for some reason
 can not be remapped.

 In both cases dev handlers should call the scst_write_same() function
 from their WRITE SAME command handler. As its second argument this
 function takes an array of descriptors saying where to write the
 requested block of data. The last element in this array must have len
 0. If this argument is NULL, the whole area will be manually written by
 SCST. NULL should be used by dev handlers that do not support remapping
 blocks.

 User space dev handlers should use SCST_EXEC_REPLY_DO_WRITE_SAME
 reply_type of SCST_USER_EXEC subcommand. See scst_user doc for more
 info.


 COMPARE AND WRITE
 ~~~~~~~~~~~~~~~~~

 SCST implements COMPARE AND WRITE as a set of read, compare and write
 actions done in an atomic manner against the affected blocks as well as
 against regular RESERVE SCSI commands. In particular, COMPARE AND WRITE
 doesn't need any queue flushing, and an unlimited number of COMPARE AND
 WRITE commands on different blocks can be executed simultaneously.

 The read and write actions are implemented as generation of internal
 READ(16) and WRITE(16) SCSI commands.

 The COMPARE AND WRITE command is completely transparent to dev handlers
 (they only see the corresponding READ(16) and WRITE(16) commands), so
 it doesn't require any manual actions from them.


 EXTENDED COPY
 ~~~~~~~~~~~~~

 SCST implements EXTENDED COPY via internal Copy Manager target. This
 target has the following specific attribute in its sysfs:

  - allow_not_connected_copy - if not set (default), an initiator can
 perform copy only between devices it has direct access to via any
 target/session. If set, any initiator can copy between any devices in
 the system.

 The Copy Manager has access only to those devices for which it has LUNs
 in /sys/kernel/scst_tgt/targets/copy_manager/copy_manager_tgt/luns/.
 Devices from the scst_vdisk dev handler are added to it automatically
 upon registration, but for other devices you need to manually add LUNs
 there the same way as for any target driver. You can also remove any
 device from the Copy Manager's visibility at any time by deleting the
 corresponding LUN from the sysfs. It might be useful during ALUA state
 switching.
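
 As a rough sketch of the manual LUN management described above, the
 helpers below write "add" and "del" commands to the Copy Manager's
 luns/mgmt file, the same way LUNs are added to other targets in this
 document. The helper names, the SCST_SYSFS override and the device and
 LUN numbers are assumptions for illustration; read the mgmt file on your
 system to confirm the exact command syntax it accepts.

```shell
# Root of the SCST sysfs tree. SCST_SYSFS is a hypothetical override;
# leave it unset on a real target.
SCST_SYSFS=${SCST_SYSFS:-/sys/kernel/scst_tgt}

# Prints the path of the Copy Manager LUN management file.
cm_luns_mgmt() {
    echo "$SCST_SYSFS/targets/copy_manager/copy_manager_tgt/luns/mgmt"
}

# cm_add_lun DEVICE LUN
# Makes DEVICE visible to the Copy Manager under the given LUN.
cm_add_lun() {
    echo "add $1 $2" >"$(cm_luns_mgmt)"
}

# cm_del_lun LUN
# Hides the device behind LUN from the Copy Manager.
cm_del_lun() {
    echo "del $1" >"$(cm_luns_mgmt)"
}

# Example (placeholder values): expose pass-through device 1:0:0:0 as
# LUN 10 and later remove it again, e.g. around an ALUA state switch:
#   cm_add_lun 1:0:0:0 10
#   cm_del_lun 10
```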

 Internally SCST implements EXTENDED COPY as generation of sets of
 internal READ(16) and WRITE(16) SCSI commands. Dev handlers don't need
 any manual actions to use it.

 SCST also gives dev handlers the possibility to remap blocks instead of
 copying them, if they support this feature. It allows them to perform
 the EXTENDED COPY command much faster by just updating the metadata of
 their backend storage, which is supposed to be nearly instantaneous.

 To use this feature, a dev handler should set up the ext_copy_remap()
 callback in its struct scst_dev_type. This callback is called by SCST
 during EXTENDED COPY command processing to let the dev handler try to
 remap affected blocks first.

 Upon finish, the dev handler should call scst_ext_copy_remap_done(). In
 case of error, the dev handler should set the corresponding sense to cmd
 and then also call scst_ext_copy_remap_done(cmd, NULL, 0).

 If the dev handler is not able to remap some part of the segment, it
 should kmalloc() an array, fill it with all leftover subsegments and
 supply it to scst_ext_copy_remap_done(). SCST will then copy the
 subsegments using its internal copy machine and kfree() the supplied
 array. If the dev handler is not able to remap any of the segment, it
 can simply supply the original segment directly to
 scst_ext_copy_remap_done().

 It is highly recommended that in normal circumstances dev handlers call
 scst_ext_copy_remap_done() from a different thread context than the one
 where the ext_copy_remap() callback was originally called, because
 otherwise there could be recursion in the segments processing.
 Fortunately, such a thread context switch is natural for a potentially
 long operation like EXTENDED COPY.


 VMware and Ceph RBD space reclaim
 ---------------------------------

 VMware with the VMFS5 filesystem ignores UNMAP alignment, so if you use
 4MB Ceph RBD objects and VMFS5, only some discards will reclaim RBD
 space, because a 1MB discard does not often hit the tail of an object.

 Thus, to have efficient ESXi space reclamation with RBD and VMFS5, you are
 recommended to use 1 MB object size in Ceph.

 See https://sourceforge.net/p/scst/mailman/message/35287598 thread for
 details.


 Caching
 -------

 By default for performance reasons VDISK FILEIO devices use write back
 caching policy.

 Generally, write back caching is safe to use and its danger is
 greatly overestimated, because most modern (especially, Enterprise
 level) applications are well prepared to work with write back cached
 storage. In particular, all transaction-based applications are.
 Those applications flush the cache to completely avoid ANY data loss on
 a crash or power failure. For instance, journaled file systems flush the
 cache on each metadata update, so they survive power/hardware/software
 failures pretty well.

 Since locally on initiators write back caching is always on, if an
 application cares about its data consistency, it does flush the cache
 when necessary, or on any write if it opens files with O_SYNC. If it
 doesn't care, it doesn't flush the cache. As soon as the cache flushes
 are propagated to the storage, write back caching on it doesn't make
 any difference. If an application doesn't flush the cache, it's doomed
 to lose data in case of a crash or power failure, no matter where this
 cache is located, locally or on the storage.

 To illustrate that, consider, for example, a user who wants to copy the
 /src directory to the /dst directory reliably, i.e. after the copy has
 finished no power failure or software/hardware crash could lead to a
 loss of the data in /dst. There are 2 ways to achieve this. Let's
 suppose for simplicity that cp opens files for writing with the O_SYNC
 flag, hence bypassing the local cache.

 1. Slow. Make the device behind /dst work in write through caching
 mode and then run "cp -a /src /dst".

 2. Fast. Let the device behind /dst work in write back caching mode
 and then run "cp -a /src /dst; sync". The reliability of the result is
 the same, but it's much faster than (1). Nobody would care if a crash
 happens during the copy, because after recovery the leftovers from
 the incomplete attempt would simply be deleted and the operation would
 be restarted from the very beginning.

 So, you can see that in (2) there is no danger of ANY data loss from the
 write back caching. Moreover, since in practice cp doesn't open files
 for writing with the O_SYNC flag, the sync command must be called after
 cp anyway to get the copy done reliably, so enabling write back caching
 wouldn't make any difference for reliability.

 Also you can consider it from another side. Modern HDDs have at least
 16MB of cache working in write back mode by default, so for a 10 drives
 RAID it is 160MB of a write back cache. How many people are happy with
 it and how many disabled write back cache of their HDDs? Almost all and
 almost nobody correspondingly? Moreover, many HDDs lie about state of
 their cache and report write through while working in write back mode.
 They are also successfully used.

 Note, the Linux I/O subsystem guarantees to propagate cache flushes to
 the storage only when using data protection barriers, which are usually
 turned off by default (see http://lwn.net/Articles/283161). Without
 barriers enabled Linux doesn't provide a guarantee that after
 sync()/fsync() all written data has really hit permanent storage. It can
 still be stored in the cache of your backstorage devices and, hence,
 lost on a power failure event. Thus, even with write-through cache mode,
 you still either need to enable barriers on your backend file system on
 the target (for direct /dev/sdX devices this is, indeed, impossible), or
 need a good UPS to protect yourself from losing not yet committed data.
 Some info about barriers from the XFS point of view can be found at
 http://xfs.org/index.php/XFS_FAQ#Write_barrier_support. On Linux
 initiators for Ext3 and ReiserFS file systems the barrier protection
 can be turned on using the "barrier=1" and "barrier=flush" mount
 options correspondingly. You can check whether the barriers are turned
 on or off by looking in /proc/mounts. Windows and, AFAIK, other UNIX'es
 don't need any special explicit options and do the necessary barrier
 actions on write-back caching devices by default.
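
 Looking in /proc/mounts can be scripted. The sketch below is not part
 of SCST; the barrier_opts helper and the mount point in the usage
 comment are assumptions for illustration. It prints only the
 barrier-related options that are explicitly listed for a mount point,
 so an empty result means the file system default applies, not
 necessarily that barriers are off.

```shell
# barrier_opts MOUNTPOINT [MOUNTS_FILE]
# Prints the barrier-related mount options of MOUNTPOINT, one per line.
# MOUNTS_FILE defaults to /proc/mounts and exists as a parameter only so
# the function can be tried on a saved copy of that file.
barrier_opts() {
    awk -v mp="$1" '$2 == mp {
        # Field 4 holds the comma-separated mount options.
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++)
            if (opts[i] ~ /^(no)?barrier/)
                print opts[i]
    }' "${2:-/proc/mounts}"
}

# Example (placeholder mount point):
#   barrier_opts /mnt/data
```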

 To limit this data loss with write back caching you can use files in
 /proc/sys/vm to limit amount of unflushed data in the system cache.
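
 The relevant files are the kernel's writeback thresholds, such as
 dirty_background_ratio and dirty_ratio (both in percent of memory). A
 minimal sketch for inspecting and lowering them follows. The helper
 names, the VM_DIR override and the example percentages are assumptions
 for illustration, and the sanity check on the two ratios is a choice
 of this sketch, not a kernel requirement. Suitable values depend on
 your memory size and load.

```shell
# Directory with the VM tunables. VM_DIR is a hypothetical override;
# leave it unset on a real system.
VM_DIR=${VM_DIR:-/proc/sys/vm}

# show_dirty_limits
# Prints the current writeback thresholds and timers.
show_dirty_limits() {
    for knob in dirty_background_ratio dirty_ratio \
                dirty_expire_centisecs dirty_writeback_centisecs; do
        printf '%s = %s\n' "$knob" "$(cat "$VM_DIR/$knob")"
    done
}

# set_dirty_limits BACKGROUND_PCT MAX_PCT
# Background writeback starts at BACKGROUND_PCT percent of dirty memory;
# writers are throttled at MAX_PCT percent.
set_dirty_limits() {
    if [ "$1" -ge "$2" ]; then
        echo "background ratio must be below the maximum ratio" >&2
        return 1
    fi
    echo "$1" >"$VM_DIR/dirty_background_ratio" &&
    echo "$2" >"$VM_DIR/dirty_ratio"
}

# Example (illustrative values only):
#   show_dirty_limits
#   set_dirty_limits 1 5
```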

 If you for some reason have to use VDISK FILEIO devices in write through
 caching mode, don't forget to disable internal caching on their backend
 devices or make sure they have an additional battery or supercapacitor
 power supply on board. Otherwise, on a power failure you would still
 lose all the not yet saved data in the devices' internal cache.

 Note, on some real-life workloads write through caching might perform
 better than write back caching with the barrier protection turned on.


 Errors caching
 ..............

 When using virtual device in FILEIO mode, the Linux page cache comes
 into picture. The negative side of it is that it's sometimes also
 caching errored pages. That is, if the underlying file experiences IO
 errors, those errors might be cached by the Linux page cache. As a
 result, even when the underlying file recovers and stops failing IOs,
 the initiator may still hit IO errors returned by the Linux page cache,
 until the cache re-reads the errored pages (usually it happens pretty
 soon, but not immediately). To make sure that cached pages are dropped,
 one of the following can be done:

 - Detach the SCSI virtual device (del_device) and re-attach it
   (add_device). This should evict all the cached pages, unless somebody
   else holds the same "filename" opened.

 - Issue a BLKFLSBUF ioctl to the same "filename" you provided for "add_device".

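 For the second option, some rudimentary C code is required:

 #include <errno.h>
 #include <fcntl.h>
 #include <sys/ioctl.h>
 #include <unistd.h>
 #include <linux/fs.h>		/* BLKFLSBUF */

 /* Drops the cached pages of filename; returns 0 or an errno value. */
 int flush_buffers(const char *filename)
 {
     int err = 0;
     int fd = open(filename, O_RDWR);

     if (fd < 0)
         return errno;		/* open() failed */
     if (ioctl(fd, BLKFLSBUF) < 0)
         err = errno;		/* the flush itself failed */
     close(fd);
     return err;
 }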
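
 When "filename" is a block device, the same BLKFLSBUF ioctl can also be
 issued from a shell with the blockdev utility from util-linux. The
 wrapper below is a sketch under that assumption; the
 flush_backend_buffers name and the device path in the usage comment
 are made up for illustration.

```shell
# flush_backend_buffers DEVICE
# Asks the kernel to flush the buffers of block device DEVICE using
# "blockdev --flushbufs". Refuses anything that is not a block device,
# because blockdev only works on those.
flush_backend_buffers() {
    if [ ! -b "$1" ]; then
        echo "$1 is not a block device" >&2
        return 1
    fi
    blockdev --flushbufs "$1"
}

# Example (placeholder device, requires root):
#   flush_backend_buffers /dev/sdb
```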


 BLOCKIO VDISK mode
 ------------------

 This module works best for these types of scenarios:

 1) Data that are not aligned to 4K sector boundaries and <4K block sizes
 are used, which is normally found in virtualization environments where
 operating systems start partitions on odd sectors (Windows and its
 sector 63).

 2) Large block data transfers normally found in database loads/dumps and
 streaming media.

 3) Advanced relational database systems that perform their own caching
 which prefer or demand direct IO access and, because of the nature of
 their data access, can actually see worse performance with
 non-discriminate caching.

 4) Multiple layers of targets where the secondary and above layers need
 to have a consistent view of the primary targets in order to preserve
 data integrity which a page cache backed IO type might not provide
 reliably.

 Also it has an advantage over FILEIO that it doesn't copy data between
 the system cache and the commands data buffers, so it saves a
 considerable amount of CPU power and memory bandwidth.

 IMPORTANT: Since data in BLOCKIO and FILEIO modes are not consistent between
 =========  each other, if you try to use a device in both those modes
 	   simultaneously, you will almost instantly corrupt your data
 	   on that device.

 IMPORTANT: Some kernels starting from 2.6.32 have a problem, which
 =========  prevents BLOCKIO from working correctly with RAID5/DM. See
 	   http://lkml.org/lkml/2010/7/28/315. That problem was fixed in
 	   2.6.32.19, 2.6.34.4, 2.6.35.2 and 2.6.36-rc1. It is strongly
 	   recommended to not use affected kernels with BLOCKIO.

 IMPORTANT: In SCST 1.x BLOCKIO worked by default in NV_CACHE mode, when
 =========  each device was reported to remote initiators as having write
            through caching. But if your backend block device has internal
 	   write back caching, it might create a possibility of losing
 	   the data cached in the internal cache in case of a power
 	   failure. Starting from SCST 2.0 BLOCKIO works by default in
 	   non-NV_CACHE mode, when each device is reported to remote
 	   initiators as having write back caching, and synchronizes the
 	   internal device's cache on each SYNCHRONIZE_CACHE command
 	   from the initiators. It might lead to some *PERFORMANCE LOSS*,
 	   so if you are sure in your power supply and want to
 	   restore the 1.x behavior, you should recreate your BLOCKIO
 	   devices in NV_CACHE mode.


 Pass-through mode
 -----------------

 In the pass-through mode (i.e. using the pass-through device handlers
 scst_disk, scst_tape, etc) SCSI commands, coming from remote initiators,
 are passed to local SCSI devices on target as is, without any
 modifications.

 SCST supports 1 to many pass-through, when several initiators can safely
 connect to a single pass-through device (a tape, for instance). For such
 cases SCST emulates all the necessary functionality.

 In the sysfs interface all real SCSI devices are listed in
 /sys/kernel/scst_tgt/devices in the form of host:channel:id:lun numbers,
 for instance 1:0:0:0. The recommended way to match those numbers to your
 devices is to use the lsscsi utility.
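
 lsscsi prints the host:channel:id:lun tuple in square brackets as the
 first column and the device node as the last one. A small sketch that
 looks up the tuple for a given node follows; the hcil_for_node helper
 and the device node in the usage comment are assumptions for
 illustration.

```shell
# hcil_for_node NODE
# Reads "lsscsi" output on stdin and prints the host:channel:id:lun
# tuple of the device whose node (last column) equals NODE.
hcil_for_node() {
    awk -v node="$1" '$NF == node {
        # The first column looks like "[1:0:0:0]"; drop the brackets.
        print substr($1, 2, length($1) - 2)
    }'
}

# Example (placeholder device node):
#   lsscsi | hcil_for_node /dev/sdb
```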

 Each pass-through dev handler has in its root subdirectory
 /sys/kernel/scst_tgt/handlers/handler_name, e.g.
 /sys/kernel/scst_tgt/handlers/dev_disk, "mgmt" file. It allows the
 following commands. They can be sent to it using, e.g., echo command.

  - "add_device" - this command assigns SCSI device with
 host:channel:id:lun numbers to this dev handler.

 echo "add_device 1:0:0:0" >/sys/kernel/scst_tgt/handlers/dev_disk/mgmt

 will assign SCSI device 1:0:0:0 to this dev handler.

  - "del_device" - this command unassigns SCSI device with
 host:channel:id:lun numbers from this dev handler.

 As usual, on read the "mgmt" file returns a short help text about the
 available commands.
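
 Both commands, and the help text, can be exercised as in the sketch
 below. The pt_mgmt helper, its format check and the SCST_SYSFS
 override are assumptions made for illustration; only the add_device
 and del_device commands themselves come from the description above.

```shell
# Root of the SCST sysfs tree. SCST_SYSFS is a hypothetical override;
# leave it unset on a real target.
SCST_SYSFS=${SCST_SYSFS:-/sys/kernel/scst_tgt}

# pt_mgmt HANDLER COMMAND H:C:I:L
# Sends COMMAND (add_device or del_device) for the given SCSI device to
# the mgmt file of pass-through dev handler HANDLER.
pt_mgmt() {
    handler=$1
    cmd=$2
    hcil=$3
    # Catch typos such as passing "sda" instead of host:channel:id:lun.
    if ! printf '%s\n' "$hcil" | grep -Eq '^[0-9]+:[0-9]+:[0-9]+:[0-9]+$'; then
        echo "expected host:channel:id:lun, got '$hcil'" >&2
        return 1
    fi
    echo "$cmd $hcil" >"$SCST_SYSFS/handlers/$handler/mgmt"
}

# Example:
#   pt_mgmt dev_disk add_device 1:0:0:0
#   pt_mgmt dev_disk del_device 1:0:0:0
#   cat /sys/kernel/scst_tgt/handlers/dev_disk/mgmt   # prints the help
```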

 You need to manually assign each of your real SCSI devices to the
 corresponding pass-through dev handler using the "add_device" command,
 otherwise the real SCSI devices will not be visible remotely. The
 assignment isn't done automatically, because it could lead to load and
 initialization problems in the pass-through dev handlers if any of the
 local real SCSI devices are malfunctioning.

 Like any other hardware, the local SCSI hardware can not handle commands
 with an amount of data and/or a segment count in the scatter-gather
 array bigger than some values. Therefore, when using the pass-through
 mode you should note that the values for the maximum number of segments
 and the maximum amount of transferred data (max_sectors) for each SCSI
 command on devices on initiators can not be bigger than the
 corresponding values of the corresponding SCSI devices on the target.
 Otherwise you will see symptoms like small transfers working well, but
 large ones stalling and messages like "Unable to complete command due
 to SG IO count limitation" being printed in the kernel logs.

 You can't control the limit of the scatter-gather segments from user
 space, but for block devices it is usually sufficient to set
 /sys/block/DEVICE_NAME/queue/max_sectors_kb on the initiators to the
 same or a lower value than /sys/block/DEVICE_NAME/queue/max_hw_sectors_kb
 of the corresponding devices on the target.
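
 A sketch of that adjustment on an initiator is shown below. It assumes
 you have already read max_hw_sectors_kb on the target and pass that
 number in by hand; the limit_max_sectors_kb helper, the device name and
 the value 512 in the usage comment are illustrative assumptions.

```shell
# limit_max_sectors_kb QUEUE_DIR TARGET_HW_KB
# Lowers QUEUE_DIR/max_sectors_kb to TARGET_HW_KB if it is currently
# larger. QUEUE_DIR is a block device queue directory on the initiator,
# e.g. /sys/block/sdb/queue. A value that is already low enough is left
# unchanged.
limit_max_sectors_kb() {
    queue_dir=$1
    target_hw_kb=$2
    current=$(cat "$queue_dir/max_sectors_kb") || return 1
    if [ "$current" -gt "$target_hw_kb" ]; then
        echo "$target_hw_kb" >"$queue_dir/max_sectors_kb"
    fi
}

# Example (placeholder values; 512 is what max_hw_sectors_kb is assumed
# to report for the corresponding device on the target):
#   limit_max_sectors_kb /sys/block/sdb/queue 512
```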

 For non-block devices SCSI commands are usually generated directly by
 applications, so, if you experience stalls on large transfers, you
 should check the documentation of your application for how to limit the
 transfer sizes.

 Another way to solve this issue is to build SG entries with more than 1
 page each. See the following patch as an example:
 http://scst.sourceforge.net/sgv_big_order_alloc.diff


 User space mode using scst_user dev handler
 -------------------------------------------

 The user space program fileio_tgt uses the interface of the scst_user
 dev handler and lets you see how it works in various modes. Fileio_tgt
 provides mostly the same functionality as the scst_vdisk handler, with
 the most noticeable difference being that it supports O_DIRECT mode.
 O_DIRECT mode is basically the same as BLOCKIO, but also supports files,
 so for some loads it could be significantly faster than the regular
 FILEIO access. Everything said about BLOCKIO above applies to O_DIRECT
 as well. See fileio_tgt's README file for more details.


 Performance
 -----------

 SCST from the very beginning has been designed and implemented to
 provide the best possible performance. Since there is no "one size fits
 all" best performance configuration for different setups and loads,
 SCST provides an extensive set of settings that allow tuning it for the
 best performance in each particular case. You don't necessarily have to
 use those settings. If you don't, SCST will do a very good job of
 autotuning for you, so the resulting performance will, on average, be
 better (sometimes, much better) than with other SCSI targets. But in
 some cases you can improve it even more by manual tuning.

 Before doing any performance measurements note that performance results
 depend very much on your type of load, so it is crucial that you choose
 the access mode (FILEIO, BLOCKIO, O_DIRECT, pass-through) which suits
 your needs best.

 In order to get the maximum performance you should:

 1. For SCST:

  - Disable in Makefile and scst.h CONFIG_SCST_STRICT_SERIALIZING,
    CONFIG_SCST_EXTRACHECKS, CONFIG_SCST_TRACING, CONFIG_SCST_DEBUG*,
    CONFIG_SCST_STRICT_SECURITY.

 2. For target drivers:

  - Disable in Makefiles CONFIG_SCST_EXTRACHECKS, CONFIG_SCST_TRACING,
    CONFIG_SCST_DEBUG*

 3. For device handlers, including VDISK:

  - Disable in Makefile CONFIG_SCST_TRACING and CONFIG_SCST_DEBUG.

 Note, by disabling CONFIG_SCST_TRACING and CONFIG_SCST_DEBUG you are
 disabling many useful SCST diagnostic messages, which can significantly
 help in many troubleshooting cases. So, you may consider keeping
 CONFIG_SCST_TRACING enabled; its performance impact is very limited.

 IMPORTANT: The development version of SCST in the SVN is optimized for
 =========  development and bug hunting, not for performance. This means
            it is MUCH slower (multiple times). To reconfigure SCST for
 	   release you should run "make 2release" command in the root of
 	   your source code (e.g. trunk/). It will set the above options
 	   as needed. The only option it doesn't set is
 	   CONFIG_SCST_TEST_IO_IN_SIRQ, so, if needed, you should change
 	   it manually. There is also a so called "performance" build
 	   mode, which you can activate by "make 2perf" command. The
 	   only difference it has compared to release build mode is
 	   the disabled CONFIG_SCST_TRACING option. Because of that, you
 	   won't be able to see many important SCST run time logging
 	   messages. This mode is intended to evaluate impact of
 	   CONFIG_SCST_TRACING on performance and not recommended for
 	   production.

 IMPORTANT: You can't use debug SCST drivers with non-debug SCST core.
 =========  So, after disabling both CONFIG_SCST_TRACING and CONFIG_SCST_DEBUG
 	   for SCST core you have to disable them for all SCST drivers
 	   you are using as well.

 4. Make sure you have io_grouping_type option set correctly, especially
 in the following cases:

  - Several initiators share your target's backstorage. It can be a
    shared LU using some cluster FS, like VMFS, as well as can be
    different LUs located on the same backstorage (RAID array). For
    instance, if you have 3 initiators and each of them using its own
    dedicated FILEIO device file from the same RAID-6 array on the
    target.

    In this case for the best performance you should have the
    io_grouping_type option set to "never" in all the LUNs' targets
    and security groups.

  - Your initiator is connected to your target in MPIO mode. In this case
    for the best performance you should:

     * Either connect all the sessions from the initiator to a single
       target or security group and have the io_grouping_type option set
       to "this_group_only" in the target or security group,

     * Or, if it isn't possible to connect all the sessions from the
       initiator to a single target or security group, assign the same
       numeric io_grouping_type value to each target/security group this
       initiator is connected to. The exact value itself doesn't matter,
       it is only important that all the targets/security groups use the
       same value.

 Don't forget, io_grouping_type makes sense only if you use CFQ I/O
 scheduler on the target and for devices with threads_num >= 0 and, if
 threads_num > 0, with threads_pool_type "per_initiator".
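
 Setting the option for a target could look like the sketch below. It
 assumes that io_grouping_type is exposed as an attribute file in the
 target's sysfs directory, next to the "enabled" and "rel_tgt_id" files
 used earlier in this document; verify the location on your system. The
 set_io_grouping helper, the SCST_SYSFS override and the target name in
 the usage comment are illustrative assumptions.

```shell
# Root of the SCST sysfs tree. SCST_SYSFS is a hypothetical override;
# leave it unset on a real target.
SCST_SYSFS=${SCST_SYSFS:-/sys/kernel/scst_tgt}

# set_io_grouping DRIVER TARGET VALUE
# VALUE is one of the values named in this section: "auto",
# "this_group_only", "never" or a number shared by several
# targets/security groups.
set_io_grouping() {
    case $3 in
        auto|this_group_only|never)
            ;;
        ''|*[!0-9]*)
            echo "unsupported io_grouping_type value: '$3'" >&2
            return 1
            ;;
    esac
    echo "$3" >"$SCST_SYSFS/targets/$1/$2/io_grouping_type"
}

# Example (placeholder target name):
#   set_io_grouping iscsi iqn.2006-10.net.v:tgt never
```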

 You can check whether io_grouping_type is set correctly in your setup,
 as well as whether the "auto" io_grouping_type value works for you, by
 tests like the following:

  - For the non-MPIO case you can run single thread sequential reading,
    e.g. using buffered dd, from one initiator, then run the same single
    thread sequential reading from the second initiator in parallel. If
    io_grouping_type is set correctly, the aggregate throughput measured
    on the target should only slightly decrease and all initiators
    should have a nearly equal share of it. If io_grouping_type is not
    set correctly, the aggregate throughput and/or throughput on any
    initiator will decrease significantly, by 2 times or even more. For
    instance, you have 80MB/s single thread sequential reading from the
    target on any initiator. When both initiators are then reading in
    parallel you should see on the target an aggregate throughput of
    something like 70-75MB/s with a correct io_grouping_type and
    something like 35-40MB/s, or 8-10MB/s on any initiator, with an
    incorrect one.

  - For the MPIO case it's much easier. With an incorrect
    io_grouping_type you simply won't see a performance increase from
    adding the second session (assuming your hardware is capable of
    transferring data through both sessions in parallel), or can even see
    a performance decrease.
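
 The single thread sequential read used in these tests can be a plain
 buffered dd run on each initiator, for example as sketched below. The
 seq_read helper, the device name and the default amount of data are
 assumptions for illustration. dd reports the achieved throughput on
 stderr when it finishes.

```shell
# seq_read SOURCE [MEGABYTES]
# Single thread buffered sequential read of SOURCE in 1MB blocks,
# discarding the data. Reads 4096MB unless MEGABYTES is given.
seq_read() {
    dd if="$1" of=/dev/null bs=1M count="${2:-4096}"
}

# Example (placeholder device): run on the first initiator alone, then
# on both initiators at the same time, and compare the reported rates:
#   seq_read /dev/sdb
```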

 5. If you are going to use your target in a VM environment, for
 instance as shared storage with VMware, make sure all your VMs are
 connected to the target via *separate* sessions. For instance, for iSCSI
 it means that each VM has its own connection to the target, rather than
 all VMs being connected using a single connection. You can check it
 using the SCST sysfs interface. For other transports you should use the
 available facilities, like NPIV for Fibre Channel, to make separate
 sessions for each VM. If you miss it, you can greatly lose performance
 of parallel access to your target from different VMs. This isn't
 related to the case when your VMs are using the same shared storage,
 like with VMFS, for instance. In this case all your VM hosts will be
 connected to the target via separate sessions, which is enough.

 6. For other target and initiator software parts:

  - Make sure you have applied all available SCST patches to your kernel.
    If a patch doesn't exist for your kernel version, it is strongly
    recommended to upgrade your kernel to a version for which the patch
    exists.

  - Don't enable debug/hacking features in the kernel, i.e. use them as
    they are by default.

  - The default kernel read-ahead and queuing settings are optimized
    for locally attached disks, therefore they are not optimal if the
    disks are attached remotely (SCSI target case), which sometimes could
    lead to unexpectedly low throughput. You should increase the
    read-ahead size to at least 512KB or even more on all initiators and
    the target.

    You should also limit on all initiators the maximum amount of sectors
    per SCSI command. This tuning is also recommended on targets with
    large read-ahead values. To do it on Linux, run:

    echo 64 > /sys/block/sdX/queue/max_sectors_kb

    where X is the letter of the device imported from the target, like
    'b', i.e. sdb.

    To increase read-ahead size on Linux, run:

    blockdev --setra N /dev/sdX

    where N is a read-ahead number in 512-byte sectors and X is a device
    letter like above.

    Note: you need to set the read-ahead value for device sdX again
    after you change the maximum amount of sectors per SCSI command for
    that device.

    Note2: you need to restart SCST after you change read-ahead settings
    on the target. This is a limitation of the Linux read-ahead
    implementation: it reads the RA values for each file only when the
    file is opened and does not update them when the global RA parameters
    change. Hence the need for vdisk to reopen all its files/devices.

  - You may need to increase the number of requests that the OS on the
    initiator sends to the target device. To do it on Linux initiators,
    run:

    echo 64 > /sys/block/sdX/queue/nr_requests

    where X is a device letter like above.

    You may also experiment with the other parameters in the
    /sys/block/sdX directory, which also affect performance. If you find
    the best values, please share them with us.

  - On the target, use the CFQ IO scheduler. In most cases it has a
    performance advantage over other IO schedulers, sometimes a huge one
    (2+ times aggregate throughput increase).

  - It is recommended to turn the kernel preemption off, i.e. set
    the kernel preemption model to "No Forced Preemption (Server)".

  - XFS appears to be the best filesystem on the target for storing
    device files, because it allows considerably better linear write
    throughput than ext3.
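
 The initiator-side queue tuning from the bullets above can be collected
 into one helper. This is only a sketch: the function names are
 illustrative, and it assumes the standard Linux queue attribute
 read_ahead_kb as an alternative to "blockdev --setra" (the README itself
 only shows blockdev). It writes max_sectors_kb first and the read-ahead
 value afterwards, because read-ahead has to be set again after
 max_sectors_kb changes.

```shell
# Apply the queue settings discussed above to one device.
# Usage: tune_queue SYSFS_ROOT DEV MAX_SECTORS_KB READ_AHEAD_KB NR_REQUESTS
# SYSFS_ROOT is normally /sys; it is a parameter only so that the
# function can be tried against a scratch directory first.
tune_queue() {
    queue_dir="$1/block/$2/queue"
    [ -d "$queue_dir" ] || { echo "no such queue directory: $queue_dir" >&2; return 1; }

    # Limit the maximum amount of data per SCSI command first ...
    echo "$3" > "$queue_dir/max_sectors_kb" || return 1
    # ... then set read-ahead, since it must be set again after
    # max_sectors_kb has been changed.
    echo "$4" > "$queue_dir/read_ahead_kb" || return 1
    # Finally set the number of requests the block layer may queue.
    echo "$5" > "$queue_dir/nr_requests" || return 1
}

# Convert a read-ahead size in KB to the number of 512-byte sectors
# that "blockdev --setra" expects.
ra_kb_to_sectors() {
    echo $(( $1 * 2 ))
}
```

 For example, "tune_queue /sys sdb 64 512 64" would apply the values
 mentioned above to sdb, and "ra_kb_to_sectors 512" gives the N to pass
 to "blockdev --setra N /dev/sdb" for a 512KB read-ahead.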

 7. For hardware on the target:

  - Make sure that your target hardware (e.g. target FC or network card)
    and underlying IO hardware (e.g. an IO card, like SATA, SCSI or RAID,
    to which your disks are connected) don't share the same PCI bus. You
    can check it using the lspci utility. They have to work in parallel,
    so it is better if they don't compete for the bus. The problem is not
    only the bandwidth, which they have to share, but also the
    interaction between the cards during that competition. This is very
    important, because in some cases, if the target and backend storage
    controllers share the same PCI bus, performance can be 5-10 times
    lower than expected. Moreover, some motherboards (by Supermicro, in
    particular) have serious stability issues if there are several high
    speed devices on the same bus working in parallel. If you have no
    choice but to share the PCI bus, set the PCI latency in the BIOS as
    low as possible.

 8. If you use the VDISK IO module in FILEIO mode, the NV_CACHE option
 will provide you the best performance. But when using it, make sure you
 use a good UPS with the ability to shut down the target on power failure.

 You can find baseline performance numbers in these measurements:
 http://lkml.org/lkml/2009/3/30/283.

 IMPORTANT: If you use some versions of Windows (at least W2K) on the
 =========  initiator, you can't get good write performance for VDISK
            FILEIO devices with the default 512 byte block size. You
            could get about 10% of the expected performance. This is
            because of the partition alignment, which is (simplifying)
            incompatible with how the Linux page cache works, so for each
            write the corresponding block must be read first. Use a 4096
            byte block size for VDISK devices and you will get the
            expected write performance. Actually, any OS on initiators,
            not only Windows, will benefit from the block size
            max(PAGE_SIZE, BLOCK_SIZE_ON_UNDERLYING_FS), where PAGE_SIZE
            is the page size and BLOCK_SIZE_ON_UNDERLYING_FS is the block
            size of the underlying FS on which the device file is
            located, or 0 if a device node is used. Both values are from
            the target. See also the important notes about setting block
            sizes >512 bytes for VDISK FILEIO devices above.
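
 The max(PAGE_SIZE, BLOCK_SIZE_ON_UNDERLYING_FS) rule above can be
 evaluated on the target with a small helper. This is a sketch under
 assumptions: the function names are illustrative, and it assumes that
 "getconf PAGESIZE" and GNU "stat -f -c %S" report the page size and the
 filesystem block size that the rule refers to.

```shell
# Print max(PAGE_SIZE, BLOCK_SIZE_ON_UNDERLYING_FS) for the given values.
# Pass 0 as the second argument when a device node is used.
recommended_block_size() {
    if [ "$1" -ge "$2" ]; then echo "$1"; else echo "$2"; fi
}

# Gather both inputs on the target for a backing file and apply the rule.
# Requires GNU coreutils stat.
block_size_for_file() {
    page_size=$(getconf PAGESIZE) || return 1
    fs_block_size=$(stat -f -c %S "$1") || return 1
    recommended_block_size "$page_size" "$fs_block_size"
}
```

 For a device node, "recommended_block_size $(getconf PAGESIZE) 0"
 reduces to the page size. For a backing file, call block_size_for_file
 with the path of the device file.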


 9. In some cases, for instance when working with SSD devices, which
 consume 100% of a single CPU for data transfers in their internal
 threads, you may need to dedicate CPUs to those threads to maximize
 IOPS. Consider using the cpu_mask attribute for devices with
 threads_pool_type "per_initiator", or the Linux CPU affinity facilities
 for other threads_pool_types. No IRQ processing should be done on those
 CPUs; check that using /proc/interrupts. See the taskset command and
 Documentation/IRQ-affinity.txt in your kernel's source tree for how to
 assign CPU affinity to tasks and IRQs.

 The reason is that incoming commands processed in SIRQ context might
 run on the same CPUs as the SSD devices' threads doing data transfers.
 As a result, those threads won't receive all the processing power of
 those CPUs and will perform worse.

 10. If your storage is capable of operating at the level of hundreds of
 thousands of IOPS, you can use the poll_us sysfs attribute to set how
 many microseconds each SCST thread polls its queue after it becomes
 empty, in the hope that a new command arrives. In some cases, polling
 can significantly increase IOPS, especially if CPU low power states are
 not disabled, because at high IOPS polling can be cheaper than spending
 significant time on entering and then exiting CPU low power states, plus
 the corresponding context switches. Polling is disabled by default. The
 recommended value to start from is 5-10 us. Then you can increase or
 decrease it to see whether your IOPS increase or decrease.
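
 For the /proc/interrupts check from item 9 above, a helper like the
 following can list which interrupts have been counted on a given CPU.
 It is a sketch with an illustrative function name, and it assumes the
 usual /proc/interrupts layout (a header line with CPU columns, then one
 row per interrupt). The counters are cumulative since boot, so compare
 two readings taken some time apart to see whether a CPU is still
 servicing interrupts.

```shell
# Print "IRQ COUNT" for every interrupt with a nonzero count on one CPU.
# Usage: irqs_on_cpu FILE CPU_INDEX   (FILE is normally /proc/interrupts)
irqs_on_cpu() {
    awk -v cpu="$2" '
        NR == 1 { ncpu = NF; next }     # header line: CPU0 CPU1 ...
        cpu >= ncpu { exit }            # no such CPU column
        {
            col = cpu + 2               # field 1 is the "IRQ:" label
            if (NF >= col && $col ~ /^[0-9]+$/ && $col + 0 > 0) {
                label = $1
                sub(/:$/, "", label)
                print label, $col
            }
        }
    ' "$1"
}
```

 For example, "irqs_on_cpu /proc/interrupts 3" lists the interrupts
 counted on CPU 3. A thread can then be pinned with
 "taskset -cp 3 PID", where PID is the ID of the thread in question.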


 Commands suspending takes too long
 ----------------------------------

 SCST suspends commands during some management activities, like
 adding/deleting LUNs or devices. This is done to have lockless LUN
 translation on the hot command processing path, which brings a
 significant performance advantage. You will see a message like "Waiting
 for X active commands to complete" when this wait starts.

 The downside is that no new commands start executing until the older
 ones, which had started before the suspending began, have finished. This
 wait cannot be longer than the worst command latency any of your
 initiators is seeing at that particular time.

 So, if this wait takes too long, in the majority of cases it means that
 you are overloading your storage. Proper storage should have a worst
 case latency below a few hundred milliseconds. In this case the SCST
 suspending will finish in a few hundred milliseconds at worst.

 Another case when suspending can take too long is a hung user space
 device (i.e. an scst_user device) not responding to any command. In this
 case you should kill the corresponding user space program to finish
 suspending.


 Work if target's backstorage or link is too slow
 ------------------------------------------------

 Under high I/O load, when your target's backstorage gets overloaded, or
 when working over a slow link between initiator and target that can't
 serve all the queued commands on time, you can experience I/O stalls or
 see abort or reset messages in the kernel log.

 First, consider the case of too slow target backstorage. On some seek
 intensive workloads even fast disks or RAIDs, which are able to serve a
 continuous data stream at 500+ MB/s, can be as slow as 0.3 MB/s.
 Another possible cause can be MD/LVM/RAID on your target, as in
 http://lkml.org/lkml/2008/2/27/96 (check the whole thread as well).

 In such situations, processing of one or more commands simply takes too
 long, hence the initiator decides that they are stuck on the target and
 tries to recover. In particular, it is known that the default number of
 simultaneously queued commands (48) is sometimes too high if you do
 intensive writes from VMware on a target disk that uses LVM in
 snapshot mode. In this case a value like 16 or even 8-10, depending on
 your backstorage speed, could be more appropriate.

 There are 6 possible actions you can take to work around or fix such
 issues:

 1. Ignore incoming task management (TM) commands. It's fine if there are
 not too many of them, so average performance isn't hurt and the
 corresponding device isn't put offline, i.e. if the backstorage isn't
 way too slow.

 2. Decrease /sys/block/sdX/device/queue_depth on the initiator, if it's
 Linux (see below how), and/or the SCST_MAX_TGT_DEV_COMMANDS constant in
 the scst_priv.h file until you stop seeing incoming TM commands. The
 iSCSI-SCST driver also has its own iSCSI specific parameter for that,
 see its README file.

 To decrease the device queue depth on Linux initiators, run:

 # echo Y >/sys/block/sdX/device/queue_depth

 where Y is the new number of simultaneously queued commands and X is
 your imported device letter, like 'a' for the sda device. There are no
 special limitations on Y, it can be any value from 1 to the possible
 maximum (usually 32), so start by dividing the current value by 2, i.e.
 set 16 if /sys/block/sdX/device/queue_depth contains 32.

 3. Increase the corresponding timeout on the initiator. For Linux it is
 located in
 /sys/devices/platform/host*/session*/target*:0:0/*:0:0:1/timeout. It can
 be set automatically by a udev rule. For instance, the following
 rule will increase it to 300 seconds:

 SUBSYSTEM=="scsi", KERNEL=="[0-9]*:[0-9]*", ACTION=="add", ATTR{type}=="0|7|14", ATTR{timeout}="300"

 By default, this timeout is 30 or 60 seconds, depending on your distribution.

 4. Try to avoid such seek intensive workloads.

 5. Increase the speed of the target's backstorage.

 6. Implement QoS in SCST, so that the queue depth on the target is
 dynamically adjusted and the worst case latencies seen by initiators
 are kept under control.
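
 The "divide the current value by 2" step from action 2 above can be
 scripted as follows. This is a sketch with an illustrative function
 name. It takes the path of the queue_depth file as its argument and
 never goes below 1.

```shell
# Halve the value stored in a queue_depth file, never going below 1,
# and print the new value.
# Usage: halve_queue_depth /sys/block/sdX/device/queue_depth
halve_queue_depth() {
    current=$(cat "$1") || return 1
    new=$(( current / 2 ))
    [ "$new" -ge 1 ] || new=1
    echo "$new" > "$1" || return 1
    echo "$new"
}
```

 Repeat the call until incoming TM commands stop appearing in the
 target's kernel log.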

 Next, consider the case of a too slow link between initiator and target,
 when the initiator tries to simultaneously push N commands to the target
 over it. In this case the time to serve those commands, i.e. to send or
 receive data for them over the link, can be longer than the timeout for
 any single command. Hence one or more commands at the tail of the queue
 cannot be served before the timeout expires, so the initiator will
 decide that they are stuck on the target and will try to recover.

 To work around or fix the issue in this case, you can use ways 1, 2, 3
 above or (7): increase the speed of the link between target and
 initiator.
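
 The slow link reasoning above can be turned into a rough estimate of a
 safe queue depth. The sketch below rests on simplifying assumptions
 that are not part of SCST: all queued commands carry the same amount of
 data, the link is fully available to them, and protocol overhead and
 backstorage latency are ignored. The function name is illustrative.

```shell
# Largest number of equally sized commands whose data can be transferred
# over the link within the command timeout (integer arithmetic, rounded
# down).
# Usage: max_queued_commands LINK_KB_PER_S COMMAND_KB TIMEOUT_S
max_queued_commands() {
    echo $(( $1 * $3 / $2 ))
}
```

 For example, with 512KB commands and a 30 second timeout, a link
 delivering 1024KB/s could serve about 60 queued commands in time, while
 a link delivering 100KB/s could serve only about 5, so a queue depth
 well below the result is the one to aim for with action 2.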

 Note that logged messages about QUEUE_FULL status are quite different
 by nature. They are part of normal operation, just SCSI flow control in
 action. Simply don't enable the "mgmt_minor" logging level or,
 alternatively, if you are confident in the worst case performance of
 your back-end storage or initiator-target link, you can increase
 SCST_MAX_TGT_DEV_COMMANDS in scst_priv.h to 64. Usually initiators
 don't try to push more commands to the target.

 IMPORTANT
 =========

 There must be LUN 0 in each security group, i.e. LU numbering must not
 start from, e.g., 1. Otherwise you will see no devices on remote
 initiators and the SCST core will write the following message into the
 kernel log: "tgt_dev for LUN 0 not found, command to unexisting LU?"

 IMPORTANT
 =========

 All access control must be fully configured BEFORE loading the
 corresponding target driver! When you load a target driver or enable
 target mode in it, as for the qla2x00t driver, it will immediately start
 accepting new connections, hence creating new sessions, and those new
 sessions will be assigned to security groups according to the
 *currently* configured access control settings. For instance, to the
 "Default" group instead of "HOST004", as you may need, because "HOST004"
 doesn't exist yet. So, you must configure all the security groups before
 new connections from the initiators are created, i.e. before the target
 drivers are loaded.

 Access controls can be altered after the target driver is loaded, as
 long as the target session doesn't exist yet. And even if the session
 already exists, changes are still possible, but they won't be reflected
 on the initiator side.

 So, the safest choice is to configure all the access control before
 loading any target driver and then only add new devices to new groups
 for new initiators or add new devices to old groups, without altering
 existing LUNs in them.


 Credits
 -------

 Thanks to:

  * Mark Buechler <mark.buechler@gmail.com> for a lot of useful
    suggestions, bug reports and help in debugging.

  * Ming Zhang <mingz@ele.uri.edu> for fixes and comments.

  * Nathaniel Clark <nate@misrule.us> for fixes and comments.

  * Calvin Morrow <calvin.morrow@comcast.net> for testing and useful
    suggestions.

  * Hu Gang <hugang@soulinfo.com> for the original version of the
    LSI target driver.

  * Erik Habbinga <erikhabbinga@inphase-tech.com> for fixes and support
    of the LSI target driver.

  * Ross S. W. Walker <rswwalker@hotmail.com> for BLOCKIO inspiration
    and Vu Pham <huongvp@yahoo.com> who implemented it for VDISK dev handler.

  * Alessandro Premoli <a.premoli@andxor.it> for fixes.

  * Terry Greeniaus <tgreeniaus@yottayotta.com> for fixes.

  * Krzysztof Blaszkowski <kb@sysmikro.com.pl> for many fixes and bug reports.

  * Jianxi Chen <pacers@users.sourceforge.net> for fixing a problem with
    devices >2TB in size.

  * Bart Van Assche <bvanassche@acm.org> for a lot of help.

  * University of New Hampshire Interoperability Labs (UNH IOL, http://www.iol.unh.edu)
    for the UNH-iSCSI project (http://www.iol.unh.edu/consortiums/iscsi/index.html),
    on which the interface between the SCST core and target drivers was based.

  * Daniel Debonzi <debonzi@linux.vnet.ibm.com> for a big part of the
    initial SCST sysfs tree implementation.


 Vladislav Bolkhovitin <vst@vlnb.net>, http://scst.sourceforge.net