| <!doctype linuxdoc system> |
| |
| <article> |
| |
| <title> |
| SCST technical description |
| </title> |
| |
| <author> |
| <name>Vladislav Bolkhovitin</name> |
| </author> |
| |
| <date> |
| Version 3.0.0 for SCST 3.0.0 and later |
| </date> |
| |
| <toc> |
| |
| <sect>Introduction |
| |
| <p> SCST is a SCSI target mid-level subsystem for Linux. It provides a |
| unified, consistent interface between SCSI target drivers, backend device |
| handlers and the Linux kernel, and simplifies target driver development |
| as much as possible. |
| |
| It has the following features: |
| |
| <itemize> |
| |
| <item> Very low overhead and fine-grained locking, which allow |
| performance and scalability close to the theoretical limit. |
| |
| <item> Complete SMP support. |
| |
| <item> Performs all required pre- and post- processing of incoming |
| requests and all necessary error recovery functionality. |
| |
| <item> Emulates necessary functionality of SCSI host adapters, because |
| from a remote initiator's point of view SCST acts as a SCSI host with |
| its own devices. Some of the emulated functions are the following: |
| |
| <itemize> |
| |
| <item> Generation of necessary UNIT ATTENTIONs, their storage and |
| delivery to all connected remote initiators (sessions). |
| |
| <item> RESERVE/RELEASE functionality, including Persistent Reservations. |
| |
| <item> All types of RESETs and other task management functions. |
| |
| <item> REPORT LUNS command as well as SCSI address space |
| management, in order to present a consistent address space to all |
| remote initiators, since local SCSI devices do not know about |
| each other and so cannot report one another via the REPORT LUNS |
| command. Additionally, SCST responds with an error to all commands |
| sent to non-existent devices and provides access control, so |
| different remote initiators can see different sets of devices. |
| |
| <item> Other necessary functionality (task attributes, etc.) as |
| specified in SAM-2, SPC-2, SAM-3, SPC-3 and other SCSI standards. |
| |
| </itemize> |
| |
| <item> Verifies all incoming requests to ensure reliable and secure |
| command execution. |
| |
| <item> The device handler architecture provides extra flexibility by |
| allowing additional request processing that is completely |
| independent of target drivers, for example, data caching or handling of |
| device-dependent exceptional conditions. |
| |
| </itemize> |
| |
| <sect>Terms and Definitions |
| |
| <p> |
| <bf/SCSI initiator device/ |
| |
| A SCSI device that originates service and task management requests to be |
| processed by a SCSI target device and receives device service and task |
| management responses from SCSI target devices. |
| |
| <bf/SCSI target device/ |
| |
| A SCSI device that receives device service and task management requests |
| for processing and sends device service and task management responses |
| to SCSI initiator devices or drivers. |
| |
| <bf/SCST session/ |
| |
| An SCST session is the object that describes the relationship between a |
| remote initiator and SCST via a target driver. All commands from the |
| remote initiator are passed to SCST in the session. For example, for |
| connection-oriented protocols like iSCSI, an SCST session could be mapped |
| to a TCP connection (as well as to an iSCSI session). An SCST session is |
| the equivalent of the SCSI I_T nexus object. |
| |
| <bf/Local SCSI initiator/ |
| |
| A SCSI initiator that is located on the same host as the SCST subsystem. |
| Examples are the sg and st drivers. |
| |
| <bf/Remote SCSI initiator/ |
| |
| A SCSI initiator that is located on a host remote from the SCST subsystem |
| and makes client connections to SCST via SCST target drivers. |
| |
| <bf/SCSI target driver/ |
| |
| A Linux hardware or logical driver that acts as a SCSI target for remote |
| SCSI initiators, i.e. accepts remote connections, passes incoming SCSI |
| requests to SCST and sends SCSI responses from SCST back to their |
| originators. |
| |
| <bf/Device (backend) handler driver/ |
| |
| Also known as a "device type specific driver" or "dev handler". An SCST |
| driver that helps SCST analyze incoming requests and determine |
| parameters specific to various types of devices, as well as perform some |
| processing. See below for more details. |
| |
| <sect>SCST Core Architecture |
| |
| <p> |
| SCST accepts commands and passes them to the SCSI mid-level in the same |
| way as SCSI high-level drivers (sg, sd, st) do. Figure 1 shows the |
| interaction between SCST, its drivers and the Linux SCSI subsystem. |
| |
| <figure> |
| <eps file="fig1.png"> |
| <img src="fig1.png"> |
| <caption> |
| <newline> Interaction between SCST, its drivers and Linux SCSI subsystem. |
| </caption> |
| </figure> |
| |
| <sect> Target drivers |
| |
| <sect1>struct scst_tgt_template |
| |
| <p> |
| To work with SCST, a target driver must register its template with SCST by |
| calling <bf/scst_register_target_template()/. The template lets SCST know the |
| target driver's entry points. It is defined as follows: |
| |
| <verb> |
| struct scst_tgt_template |
| { |
| int sg_tablesize; |
| const char name[SCST_MAX_NAME]; |
| |
| unsigned unchecked_isa_dma:1; |
| unsigned use_clustering:1; |
| unsigned no_clustering:1; |
| |
| unsigned xmit_response_atomic:1; |
| unsigned rdy_to_xfer_atomic:1; |
| |
| unsigned no_proc_entry:1; |
| |
| int max_hw_pending_time; |
| |
| int threads_num; |
| |
| int (*release)(struct scst_tgt *tgt); |
| |
| int (*xmit_response)(struct scst_cmd *cmd); |
| int (*rdy_to_xfer)(struct scst_cmd *cmd); |
| |
| void (*on_hw_pending_cmd_timeout) (struct scst_cmd *cmd); |
| |
| void (*on_free_cmd) (struct scst_cmd *cmd); |
| |
| int (*alloc_data_buf) (struct scst_cmd *cmd); |
| |
| void (*preprocessing_done) (struct scst_cmd *cmd); |
| |
| int (*pre_exec) (struct scst_cmd *cmd); |
| |
| void (*task_mgmt_affected_cmds_done) (struct scst_mgmt_cmd *mgmt_cmd); |
| void (*task_mgmt_fn_done)(struct scst_mgmt_cmd *mgmt_cmd); |
| |
| int (*report_aen) (struct scst_aen *aen); |
| |
| int (*read_proc) (struct seq_file *seq, struct scst_tgt *tgt); |
| int (*write_proc) (char *buffer, char **start, off_t offset, |
| int length, int *eof, struct scst_tgt *tgt); |
| |
| int (*get_initiator_port_transport_id) (struct scst_session *sess, |
| uint8_t **transport_id); |
| } |
| </verb> |
| |
| Where: |
| |
| <itemize> |
| |
| <item><bf/sg_tablesize/ - allows checking whether scatter/gather can be |
| used or not and, if yes, sets the maximum supported count of |
| scatter/gather entries |
| |
| <item><bf/name/ - the name of the template. Must be unique to identify |
| the template. Must be defined. |
| |
| <item><bf/unchecked_isa_dma/ - true, if this target adapter uses |
| unchecked DMA onto an ISA bus. |
| |
| <item><bf/use_clustering/ - true, if this target adapter wants to use |
| clustering (i.e. smaller number of merged segments). |
| |
| <item> <bf/no_clustering/ - true, if this target adapter doesn't support |
| SG-vector clustering |
| |
| <item><bf/xmit_response_atomic/, <bf/rdy_to_xfer_atomic/ - true, if the |
| corresponding function supports execution in the atomic (non-sleeping) |
| context. |
| |
| <item> <bf/no_proc_entry/ - true, if this template doesn't need the entry in /proc |
| |
| <item> <bf/max_hw_pending_time/ - the maximum time in seconds a cmd can |
| stay inside the target hardware, i.e. after rdy_to_xfer() and |
| xmit_response(), before on_hw_pending_cmd_timeout() is called, if |
| defined. In the current implementation a cmd will be aborted at some |
| time t, where max_hw_pending_time <= t < 2*max_hw_pending_time. |
| |
| <item> <bf/threads_num/ - number of threads to add to the pool of |
| dedicated threads. Used if xmit_response() or rdy_to_xfer() is blocking. |
| It is the target driver's duty to ensure that no more than that number |
| of threads are blocked in those functions at any time. |
| |
| <item><bf/int (*release)(struct scst_tgt *tgt)/ - this function is |
| intended to free up resources allocated to the device. The function |
| should return 0 to indicate successful release or a negative value if |
| there are some issues with the release. In the current version of SCST |
| the return value is ignored. Must be defined. |
| |
| <item><bf/int (*xmit_response)(struct scst_cmd *cmd)/ - this |
| function is equivalent to the SCSI queuecommand(). The target should |
| transmit the response data and the status in the struct scst_cmd. See |
| below for details. Must be defined. |
| |
| <item><bf/int (*rdy_to_xfer)(struct scst_cmd *cmd)/ - this function |
| informs the driver that the data buffer corresponding to the command |
| has now been allocated and it is OK to receive data for this command. |
| This function is necessary because a SCSI target does not have any |
| control over the commands it receives. Most lower-level protocols have |
| a corresponding function which informs the initiator that buffers have |
| been allocated, e.g. XFER_RDY in Fibre Channel. After the data is actually |
| received, the low-level driver should call <it/scst_rx_data()/ in order |
| to continue processing this command. Returns one of the |
| <it/SCST_TGT_RES_*/ constants, described below. Pay attention to the |
| "atomic" attribute of the command, which can be obtained via |
| scst_cmd_atomic(). It is true if the function is called in the atomic |
| (non-sleeping) context. Must be defined. |
| |
| <item> <bf/void (*on_hw_pending_cmd_timeout) (struct scst_cmd *cmd)/ - |
| called if cmd stays inside the target hardware, i.e. after rdy_to_xfer() |
| and xmit_response(), for more than max_hw_pending_time seconds. The target |
| driver is supposed to clean up this command and resume cmd's processing. |
| |
| <item><bf/void (*on_free_cmd)(struct scst_cmd *cmd)/ - this function is |
| called to notify the driver that the command is about to be freed. |
| Necessary, because for aborted commands xmit_response() might not be |
| called. Can be called in IRQ context. Must be defined. |
| |
| <item> <bf/int (*alloc_data_buf) (struct scst_cmd *cmd)/ - this function |
| allows the target driver to handle data buffer allocations on its own. |
| The target driver doesn't have to allocate the buffer in this function |
| every time, but if it decides to, it must first check that |
| scst_cmd_get_data_buff_alloced() returns 0; otherwise alloc_data_buf() |
| shall fail, to avoid double buffer allocation and memory leaks. Returns |
| 0 in case of success, < 0 (preferably -ENOMEM) in case of error, or > |
| 0 if the regular SCST allocation should be done. On successful return, |
| scst_cmd->tgt_data_buf_alloced will be set by SCST. It is |
| possible that both the target driver and the dev handler request their own |
| memory allocation. If the allocation is attempted in atomic context, i.e. |
| scst_cmd_atomic() is true, and < 0 is returned, this function will be |
| recalled in thread context. Note that the driver has to handle all |
| relevant details itself, such as scatterlist setup, highmem, freeing the |
| allocated memory, etc. |
| |
| <item> <bf/void (*preprocessing_done) (struct scst_cmd *cmd)/ - this |
| function informs the driver that the data buffer corresponding to the |
| command has now been allocated and other preprocessing tasks have been |
| done. A target driver might need to perform some actions at this stage. |
| After the target driver has done the needed actions, it shall call |
| <it/scst_restart_cmd()/ in order to continue processing this command. In case |
| of preliminary command completion, this function will also be called |
| before xmit_response(). Called only for commands queued using |
| scst_cmd_init_stage1_done() instead of scst_cmd_init_done(). Returns |
| void; the result is expected to be returned using scst_restart_cmd(). |
| This function is expected to be NON-BLOCKING. If it is blocking, consider |
| setting threads_num to a non-zero number. Pay attention to the "atomic" |
| attribute of the cmd, which can be obtained via scst_cmd_atomic(). It is |
| true if the function is called in the atomic (non-sleeping) context. |
| |
| <item> <bf/int (*pre_exec) (struct scst_cmd *cmd)/ - this function |
| informs the driver that the command is about to be executed. |
| Returns one of the <it/SCST_PREPROCESS_*/ constants. This function is |
| expected to be NON-BLOCKING. If it is blocking, consider setting |
| threads_num to a non-zero number. |
| |
| <item> <bf/void (*task_mgmt_affected_cmds_done) (struct scst_mgmt_cmd |
| *mgmt_cmd)/ - this function informs the driver that all commands |
| affected by the corresponding task management function have been |
| completed. No return value expected. This function is expected to be |
| NON-BLOCKING. Called without any locks held from a thread context. |
| |
| <item><bf/void (*task_mgmt_fn_done)(struct scst_mgmt_cmd *mgmt_cmd)/ - |
| this function informs the driver that a received task management |
| function has been completed. The completion status can be obtained via |
| <it/scst_mgmt_cmd_get_status()/. No return value expected. Must be |
| defined if the target supports task management functionality. |
| |
| <item><bf/int (*report_aen) (struct scst_aen *aen)/ - this function is |
| used for Asynchronous Event Notifications. Returns one of the |
| <it/SCST_AEN_RES_*/ constants. After the AEN is sent, the target driver |
| must call <it/scst_aen_done()/ and, optionally, |
| <it/scst_set_aen_delivery_status()/. This function is expected to be |
| NON-BLOCKING, but can sleep. For a given session, this function must be |
| prepared to handle AENs that arrive between the call to |
| scst_unregister_session() and the invocation of the unreg_done_fn() |
| callback, or before scst_unregister_session() returns if it is called in |
| the blocking mode. AENs for such sessions should be ignored. Must be |
| defined if the low-level protocol supports AENs. |
| |
| <item> <bf/int (*read_proc) (struct seq_file *seq, struct scst_tgt |
| *tgt), int (*write_proc) (char *buffer, char **start, off_t offset, |
| int length, int *eof, struct scst_tgt *tgt)/ - these functions can be |
| used to export the driver's statistics and other information to the world |
| outside the kernel, as well as to receive management commands from it. |
| If the driver needs to create additional files in its /proc |
| subdirectory, it can use the <it/scst_proc_get_tgt_root()/ function to get |
| the root proc_dir_entry. |
| |
| <item> <bf/int (*get_initiator_port_transport_id) (struct scst_session |
| *sess, uint8_t **transport_id)/ - this function returns in transport_id |
| the initiator port TransportID corresponding to sess, in the form used |
| by PR commands, see "Transport Identifiers" in SPC. Space for the |
| initiator port TransportID must be allocated via kmalloc(). The caller is |
| supposed to kfree() it when it isn't needed anymore. If sess is NULL, |
| this function must return the TransportID PROTOCOL IDENTIFIER of this |
| transport. Returns 0 on success or a negative error code otherwise. Should |
| be defined, because it's required for Persistent Reservations. |
| |
| </itemize> |
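| |
| The return-value convention of alloc_data_buf() described above can be |
| sketched as follows. This is a userspace model, not SCST code: the reduced |
| struct scst_cmd, the mock of scst_cmd_get_data_buff_alloced(), the |
| DEMO_MIN_OWN_ALLOC threshold and the choice of -EBUSY for the "already |
| allocated" case are assumptions made for illustration only. |
| |
```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical threshold: below it the driver leaves allocation to SCST. */
#define DEMO_MIN_OWN_ALLOC 4096

/* Reduced stand-in for the real struct scst_cmd. */
struct scst_cmd {
    size_t bufflen;        /* expected data length */
    int data_buf_alloced;  /* non-zero if a buffer already exists */
    void *buf;             /* buffer owned by the driver, if any */
};

/* Mock of the accessor named in the text. */
static int scst_cmd_get_data_buff_alloced(const struct scst_cmd *cmd)
{
    return cmd->data_buf_alloced;
}

static int demo_alloc_data_buf(struct scst_cmd *cmd)
{
    assert(cmd != NULL);

    /* A buffer already exists: fail rather than allocate twice. */
    if (scst_cmd_get_data_buff_alloced(cmd) != 0)
        return -EBUSY;

    /* A value > 0 asks SCST to perform its regular allocation. */
    if (cmd->bufflen < DEMO_MIN_OWN_ALLOC)
        return 1;

    cmd->buf = malloc(cmd->bufflen);
    if (cmd->buf == NULL)
        return -ENOMEM;

    /* Success: the driver owns cmd->buf and must free it later. */
    return 0;
}
```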
| |
| Functions <bf/xmit_response()/ and <bf/rdy_to_xfer()/ are expected to be |
| non-blocking, i.e. to return immediately without waiting for the actual |
| data transfer to finish. Blocking in these functions can degrade |
| overall system performance. If blocking is necessary, consider creating |
| dedicated thread(s) in the target driver, to which the |
| commands would be passed and which would perform the blocking operations |
| instead of SCST. Whether the function is allowed to sleep is determined by |
| the "atomic" attribute of the cmd, which can be obtained via |
| <it/scst_cmd_atomic()/ and is true if sleeping is not allowed. In |
| this case, if the function requires sleeping, it can return |
| <it/SCST_TGT_RES_NEED_THREAD_CTX/ in order to be recalled in the thread |
| context, where sleeping is allowed. |
| |
| Functions <bf/task_mgmt_fn_done()/ and <bf/report_aen()/ are recommended |
| to be non-blocking as well. Blocking there will stop all management |
| processing for all target drivers in the system (there is only one |
| management thread in the system). |
| |
| Functions <bf/xmit_response()/ and <bf/rdy_to_xfer()/ can return the |
| following result codes: |
| |
| <itemize> |
| |
| <item><bf/SCST_TGT_RES_SUCCESS/ - success. |
| |
| <item><bf/SCST_TGT_RES_QUEUE_FULL/ - internal device queue is full, retry |
| later. |
| |
| <item><bf/SCST_TGT_RES_NEED_THREAD_CTX/ - it is impossible to complete |
| requested task in atomic context. The command should be restarted in the |
| thread context as described above. |
| |
| <item><bf/SCST_TGT_RES_FATAL_ERROR/ - fatal error, i.e. the driver is |
| unable to perform the requested operation. If returned by |
| <bf/xmit_response()/, the command will be destroyed; if by |
| <bf/rdy_to_xfer()/, <bf/xmit_response()/ will be called with |
| <bf/HARDWARE ERROR/ sense data. |
| |
| </itemize> |
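| |
| How a callback might choose among these codes can be sketched as follows. |
| This is a userspace model under stated assumptions, not SCST code: the |
| enum values, the reduced struct scst_cmd with its needs_sleep flag, the |
| mock of scst_cmd_atomic() and the hw_free_slots counter are all |
| hypothetical stand-ins. |
| |
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in values; the real constants are defined by the SCST headers. */
enum {
    SCST_TGT_RES_SUCCESS,
    SCST_TGT_RES_QUEUE_FULL,
    SCST_TGT_RES_NEED_THREAD_CTX,
    SCST_TGT_RES_FATAL_ERROR,
};

/* Reduced stand-in for the real struct scst_cmd. */
struct scst_cmd {
    bool atomic;       /* what scst_cmd_atomic() would report */
    bool needs_sleep;  /* hypothetical: starting the transfer may sleep */
};

/* Mock of the accessor named in the text. */
static bool scst_cmd_atomic(const struct scst_cmd *cmd)
{
    return cmd->atomic;
}

/* Hypothetical hardware state: free slots in the adapter's queue. */
static int hw_free_slots = 1;

static int demo_rdy_to_xfer(struct scst_cmd *cmd)
{
    assert(cmd != NULL);

    /* Sleeping is forbidden in atomic context: ask to be recalled. */
    if (scst_cmd_atomic(cmd) && cmd->needs_sleep)
        return SCST_TGT_RES_NEED_THREAD_CTX;

    /* No room in the hardware queue: let SCST retry later. */
    if (hw_free_slots == 0)
        return SCST_TGT_RES_QUEUE_FULL;

    /* Start the transfer and return without waiting for it to finish. */
    hw_free_slots--;
    return SCST_TGT_RES_SUCCESS;
}
```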
| |
| <sect2>More about xmit_response() |
| |
| <p> |
| As described above, function xmit_response() should transmit |
| the response data and the status from the cmd parameter. |
| |
| Sense data, if any, is contained in the buffer returned by |
| <it/scst_cmd_get_sense_buffer()/, with the length returned by |
| <it/scst_cmd_get_sense_buffer_len()/. SCST always works in autosense |
| mode. If a low-level SCSI driver/device doesn't support autosense mode, |
| SCST will issue a REQUEST SENSE command, if necessary. Thus, if CHECK |
| CONDITION is established, the target driver will always see sense data in |
| the sense buffer and isn't required to request the sense manually. |
| |
| After the response is completely sent, the target should call |
| <it/scst_tgt_cmd_done()/ function in order to allow SCST to free the |
| command. |
| |
| Function xmit_response() returns one of the <it/SCST_TGT_RES_*/ |
| constants, described above. Pay attention to the "atomic" attribute of the |
| cmd, which can be obtained via <it/scst_cmd_atomic()/: it is true if the |
| function is called in the atomic (non-sleeping) context. |
| |
| To detect aborted commands, xmit_response() must first check the |
| return value of function <bf/scst_cmd_aborted_on_xmit()/. If it's true, |
| xmit_response() must call <bf/scst_set_delivery_status(cmd, |
| SCST_CMD_DELIVERY_ABORTED)/ and terminate further processing by calling |
| <bf/scst_tgt_cmd_done(cmd, SCST_CONTEXT_SAME)/. |
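| |
| The abort check can be sketched as follows. This is a userspace model, not |
| SCST code: the three SCST functions are mocked with reduced behaviour, the |
| constant values and the fields of struct scst_cmd are hypothetical, and |
| returning SCST_TGT_RES_SUCCESS after handling an aborted command is an |
| assumption. |
| |
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in values; the real constants are defined by the SCST headers. */
enum { SCST_TGT_RES_SUCCESS = 0 };
enum { SCST_CMD_DELIVERY_SUCCESS = 0, SCST_CMD_DELIVERY_ABORTED = -1 };
enum { SCST_CONTEXT_SAME = 0 };

/* Reduced stand-in for the real struct scst_cmd. */
struct scst_cmd {
    bool aborted;         /* set when the command was aborted */
    int delivery_status;  /* recorded delivery status */
    bool done;            /* scst_tgt_cmd_done() was called */
    bool sent;            /* hypothetical: response handed to hardware */
};

/* Mocks of the functions named in the text. */
static bool scst_cmd_aborted_on_xmit(struct scst_cmd *cmd)
{
    return cmd->aborted;
}

static void scst_set_delivery_status(struct scst_cmd *cmd, int status)
{
    cmd->delivery_status = status;
}

static void scst_tgt_cmd_done(struct scst_cmd *cmd, int context)
{
    (void)context;
    cmd->done = true;
}

static int demo_xmit_response(struct scst_cmd *cmd)
{
    assert(cmd != NULL);

    /* The abort check comes first, before anything is sent. */
    if (scst_cmd_aborted_on_xmit(cmd)) {
        scst_set_delivery_status(cmd, SCST_CMD_DELIVERY_ABORTED);
        scst_tgt_cmd_done(cmd, SCST_CONTEXT_SAME);
        return SCST_TGT_RES_SUCCESS;  /* assumed result code */
    }

    /*
     * Normal path: start sending data and status. A real driver would
     * call scst_tgt_cmd_done() once the hardware reports completion.
     */
    cmd->sent = true;
    return SCST_TGT_RES_SUCCESS;
}
```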
| |
| <sect1>Target driver registration functions |
| |
| <sect2>scst_register_target_template() |
| |
| <p> |
| Function <bf/scst_register_target_template()/ is defined as follows: |
| |
| <verb> |
| int scst_register_target_template( |
| struct scst_tgt_template *vtt) |
| </verb> |
| |
| Where: |
| |
| <itemize> |
| <item><bf/vtt/ - pointer to the target driver template |
| </itemize> |
| |
| Returns 0 on success or an appropriate error code otherwise. |
| |
| <sect2>scst_register_target() |
| |
| <p> |
| Function <bf/scst_register_target()/ is defined as follows: |
| |
| <verb> |
| struct scst_tgt *scst_register_target( |
| struct scst_tgt_template *vtt) |
| </verb> |
| |
| Where: |
| |
| <itemize> |
| <item><bf/vtt/ - pointer to the target driver template |
| </itemize> |
| |
| Returns target structure based on template vtt or NULL in case of error. |
| |
| <sect1>Target driver unregistration functions |
| |
| <p> |
| To unregister itself, a target driver should first call |
| <bf/scst_unregister_target()/ for all its adapters and then call |
| <bf/scst_unregister_target_template()/ for its template. |
| |
| <sect2>scst_unregister_target() |
| |
| <p> |
| Function <bf/scst_unregister_target()/ is defined as follows: |
| |
| <verb> |
| void scst_unregister_target( |
| struct scst_tgt *tgt) |
| </verb> |
| |
| Where: |
| |
| <itemize> |
| <item><bf/tgt/ - pointer to the target driver structure |
| </itemize> |
| |
| <sect2>scst_unregister_target_template() |
| |
| <p> |
| Function <bf/scst_unregister_target_template()/ is defined as follows: |
| |
| <verb> |
| void scst_unregister_target_template( |
| struct scst_tgt_template *vtt) |
| </verb> |
| |
| Where: |
| |
| <itemize> |
| <item><bf/vtt/ - pointer to the target driver template |
| </itemize> |
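| |
| The registration and unregistration order described in this section can |
| be sketched as follows. This is a userspace model, not SCST code: the four |
| functions are mocked with the signatures shown above, the fields of the |
| two structures and the name "demo_tgt" are hypothetical, and two adapters |
| are assumed. |
| |
```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Reduced stand-ins for the real SCST structures. */
struct scst_tgt_template {
    const char *name;
    int registered;  /* hypothetical bookkeeping */
    int targets;     /* hypothetical count of live targets */
};

struct scst_tgt {
    struct scst_tgt_template *tmpl;
};

/* Mocks with the signatures given in the text. */
static int scst_register_target_template(struct scst_tgt_template *vtt)
{
    if (vtt->registered)
        return -1;
    vtt->registered = 1;
    return 0;
}

static struct scst_tgt *scst_register_target(struct scst_tgt_template *vtt)
{
    struct scst_tgt *tgt;

    if (!vtt->registered)
        return NULL;
    tgt = malloc(sizeof(*tgt));
    if (tgt == NULL)
        return NULL;
    tgt->tmpl = vtt;
    vtt->targets++;
    return tgt;
}

static void scst_unregister_target(struct scst_tgt *tgt)
{
    tgt->tmpl->targets--;
    free(tgt);
}

static void scst_unregister_target_template(struct scst_tgt_template *vtt)
{
    /* All adapters must already be unregistered at this point. */
    assert(vtt->targets == 0);
    vtt->registered = 0;
}

/* Returns 0 if the whole lifecycle succeeded. */
static int demo_driver_lifecycle(void)
{
    struct scst_tgt_template tmpl = { .name = "demo_tgt" };
    struct scst_tgt *adapters[2] = { NULL, NULL };
    int i, res;

    /* Load: the template first, then one target per adapter. */
    res = scst_register_target_template(&tmpl);
    if (res != 0)
        return res;
    for (i = 0; i < 2; i++) {
        adapters[i] = scst_register_target(&tmpl);
        if (adapters[i] == NULL) {
            res = -1;
            break;
        }
    }

    /* Unload: all adapters first, the template last. */
    for (i = 0; i < 2; i++)
        if (adapters[i] != NULL)
            scst_unregister_target(adapters[i]);
    scst_unregister_target_template(&tmpl);
    return res;
}
```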
| |
| <sect>Device specific drivers (backend device handlers) |
| |
| <p> Device specific drivers are add-ons for SCST, which help SCST to |
| analyze incoming requests and determine parameters, specific to various |
| types of devices as well as actually execute specified SCSI commands. |
| Device handlers are intended for the following: |
| |
| <itemize> |
| |
| <item>To get data transfer length and direction directly from CDB and |
| current device's configuration exactly as an end-target SCSI device |
| does. This serves two purposes: |
| |
| <itemize> |
| |
| <item> Improves security and reliability by not trusting the data |
| supplied by remote initiator via SCSI low-level protocol. |
| |
| <item> Some low-level SCSI protocols don't provide data transfer |
| length and direction, so that information can be obtained only |
| directly from the CDB and the current device's configuration. For |
| example, for tape devices, getting the data transfer size might |
| require knowing the block size setting. |
| |
| </itemize> |
| |
| <item>To execute commands. |
| |
| <item>To process some exceptional conditions, like ILI on tape devices. |
| |
| <item>To initialize incoming commands with some device-specific |
| parameters, like timeout value. |
| |
| <item>To allow additional device-specific pre- and post-processing of |
| commands or alternative execution, like copying data from the system |
| cache, completely independently of target drivers. |
| |
| </itemize> |
| |
| Device handlers are considered to be part of SCST, so they can directly |
| access any fields in SCST's structures as well as use the corresponding |
| functions. |
| |
| Without an appropriate device handler, SCST hides devices of this type |
| from remote initiators and returns <bf/HARDWARE ERROR/ sense data to any |
| requests to them. |
| |
| <sect1>Structure <bf/scst_dev_type/ |
| |
| <p> |
| Structure <bf/scst_dev_type/ is defined as follows: |
| |
| <verb> |
| struct scst_dev_type |
| { |
| char name[]; |
| int type; |
| |
| unsigned parse_atomic:1; |
| unsigned alloc_data_buf_atomic:1; |
| unsigned dev_done_atomic:1; |
| |
| unsigned no_proc:1; |
| |
| unsigned pr_cmds_notifications:1; |
| |
| int threads_num; |
| enum scst_dev_type_threads_pool_type threads_pool_type; |
| |
| int (*attach) (struct scst_device *dev); |
| void (*detach) (struct scst_device *dev); |
| |
| int (*attach_tgt) (struct scst_tgt_device *tgt_dev); |
| void (*detach_tgt) (struct scst_tgt_device *tgt_dev); |
| |
| int (*parse) (struct scst_cmd *cmd); |
| int (*alloc_data_buf) (struct scst_cmd *cmd); |
| int (*exec) (struct scst_cmd *cmd); |
| int (*dev_done) (struct scst_cmd *cmd); |
| void (*on_free_cmd) (struct scst_cmd *cmd); |
| |
| int (*task_mgmt_fn) (struct scst_mgmt_cmd *mgmt_cmd, |
| struct scst_tgt_dev *tgt_dev); |
| |
| int (*read_proc) (struct seq_file *seq, struct scst_dev_type *dev_type); |
| int (*write_proc) (char *buffer, char **start, off_t offset, |
| int length, int *eof, struct scst_dev_type *dev_type); |
| } |
| </verb> |
| |
| Where: |
| |
| <itemize> |
| |
| <item><bf/name/ - the name of the device handler. Must be defined and |
| unique. |
| |
| <item><bf/type/ - SCSI type of the supported device. Must be defined. |
| |
| <item><bf/parse_atomic/, <bf/alloc_data_buf_atomic/, |
| <bf/dev_done_atomic/ - true, if the corresponding callback supports |
| execution in the atomic (non-sleeping) context. |
| |
| <item> <bf/no_proc/ - true, if no /proc files should be automatically |
| created by SCST for this dev handler |
| |
| <item> <bf/pr_cmds_notifications/ - should be set if the device wants to |
| receive notifications of Persistent Reservation commands (PR OUT only). |
| Note: the notifications will not be sent if the command failed. |
| |
| <item> <bf/threads_num/ - sets the number of threads in the threads pools |
| of this handler's devices. If 0, no threads will be created; if <0, |
| creation of the threads pools is prohibited. Also pay attention to |
| <it/threads_pool_type/ below. |
| |
| <item> <bf/threads_pool_type/ - threads pool type. Valid only if |
| threads_num > 0. Possible values: |
| |
| <itemize> |
| |
| <item> <bf/SCST_THREADS_POOL_PER_INITIATOR/ - each initiator |
| will have dedicated threads pool |
| |
| <item> <bf/SCST_THREADS_POOL_SHARED/ - all connected initiators will use |
| shared threads pool |
| |
| </itemize> |
| |
| <item><bf/int (*attach) (struct scst_device *dev)/ - called when a new |
| device is being attached to the device handler |
| |
| <item><bf/void (*detach) (struct scst_device *dev)/ - called when a |
| device is being detached from the device handler |
| |
| <item><bf/int (*attach_tgt) (struct scst_tgt_device *tgt_dev)/ - called |
| when a new tgt_dev (session) is being attached to the device handler |
| |
| <item><bf/void (*detach_tgt) (struct scst_tgt_device *tgt_dev)/ - called |
| when a tgt_dev (session) is being detached from the device handler |
| |
| <item><bf/int (*parse) (struct scst_cmd *cmd)/ - called to parse the |
| CDB from the cmd and initialize <it/cmd->bufflen/ and |
| <it/cmd->data_direction/ (both are REQUIRED). Returns the |
| command's <it/next state/, or <it/SCST_CMD_STATE_DEFAULT/ if the default next |
| state should be used, or <it/SCST_CMD_STATE_NEED_THREAD_CTX/ if the function |
| is called in atomic context but requires sleeping, or <it/SCST_CMD_STATE_STOP/ |
| if the command should not be processed further for now. In the |
| SCST_CMD_STATE_NEED_THREAD_CTX case the function will be recalled in the |
| thread context, where sleeping is allowed. Pay attention to the "atomic" |
| attribute of the cmd, which can be obtained via scst_cmd_atomic(). It is |
| true if the function is called in the atomic (non-sleeping) context. Must be |
| defined. |
| |
| <item><bf/int (*alloc_data_buf) (struct scst_cmd *cmd)/ - this function |
| allows the dev handler to handle data buffer allocations on its own. |
| Returns the command's <it/next state/, or <it/SCST_CMD_STATE_DEFAULT/ if |
| the default next state should be used, or |
| <it/SCST_CMD_STATE_NEED_THREAD_CTX/ if the function is called in atomic |
| context but requires sleeping, or <it/SCST_CMD_STATE_STOP/ if the |
| command should not be processed further for now. In the |
| SCST_CMD_STATE_NEED_THREAD_CTX case the function will be recalled in the |
| thread context, where sleeping is allowed. Pay attention to the "atomic" |
| attribute of the cmd, which can be obtained via scst_cmd_atomic(). It is |
| true if the function is called in the atomic (non-sleeping) context. |
| |
| <item> <bf/int (*exec) (struct scst_cmd *cmd)/ - called to execute CDB. |
| Useful, for instance, to implement data caching. The result of CDB |
| execution is reported via <it/cmd->scst_cmd_done()/ callback. |
| |
| Returns: |
| <itemize> |
| |
| <item> <bf/SCST_EXEC_COMPLETED/ - the cmd is done, go to other ones |
| |
| <item> <bf/SCST_EXEC_NOT_COMPLETED/ - the cmd should be sent to SCSI |
| mid-level. |
| </itemize> |
| |
| If this function provides synchronous execution, consider setting up |
| dedicated threads by setting <it/threads_num/ > 0. |
| |
| Optional; if not set, the commands will be sent directly to the SCSI |
| device. |
| |
| <bf/If this function is implemented, scst_check_local_events() shall be |
| called inside it just before the actual command's execution./ |
| |
| <item><bf/int (*dev_done) (struct scst_cmd *cmd)/ - called to notify the |
| device handler about the result of the command's execution and to perform |
| some post-processing. If the <it/parse()/ function is called, dev_done() is |
| <it/guaranteed/ to be called as well. The command's fields |
| <it/tgt_resp_flags/ and <it/resp_data_len/ should be set by this |
| function, but SCST offers good defaults. Pay attention to the "atomic" |
| attribute of the command, which can be obtained via scst_cmd_atomic(). It |
| is true if the function is called in the atomic (non-sleeping) context. |
| Returns the command's <it/next state/, or <it/SCST_CMD_STATE_DEFAULT/ if |
| the default next state should be used, or |
| <it/SCST_CMD_STATE_NEED_THREAD_CTX/ if the function is called in atomic |
| context but requires sleeping. In the last case, the function will be |
| recalled in the thread context, where sleeping is allowed. |
| |
| <item><bf/void (*on_free_cmd) (struct scst_cmd *cmd)/ - called to notify |
| the device handler that the command is about to be freed. Can be called |
| in IRQ context. |
| |
| <item><bf/int (*task_mgmt_fn) (struct scst_mgmt_cmd *mgmt_cmd, struct |
| scst_tgt_dev *tgt_dev)/ - called to execute a task management command. |
| Returns: |
| |
| <itemize> |
| |
| <item><bf/SCST_MGMT_STATUS_SUCCESS/ - the command completed |
| successfully, no further actions required |
| |
| <item><bf/SCST_MGMT_STATUS_*/ - the command failed, |
| no further actions required |
| |
| <item><bf/SCST_DEV_TM_NOT_COMPLETED/ - regular standard actions |
| for the command should be done |
| |
| </itemize> |
| |
| <bf/NOTE/: for <bf/SCST_ABORT_TASK/ it is called under spinlock! |
| |
| <item> <bf/int (*read_proc) (struct seq_file *seq, struct scst_dev_type |
| *dev_type), int (*write_proc) (char *buffer, char **start, off_t offset, |
| int length, int *eof, struct scst_dev_type *dev_type)/ - these functions |
| can be used to export the driver's statistics and other information to |
| the world outside the kernel, as well as to receive management commands |
| from it. If the driver needs to create additional files in its /proc |
| subdirectory, it can use the <it/scst_proc_get_dev_type_root()/ function |
| to get the root proc_dir_entry. |
| |
| |
| </itemize> |
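| |
| The parse() contract described above, i.e. deriving the transfer length and |
| direction from the CDB and the device's configuration, can be sketched |
| as follows for a disk-like device. This is a userspace model, not SCST |
| code: the reduced struct scst_cmd, its block_size field and the constant |
| values are hypothetical, and only READ(10) and WRITE(10) are handled. |
| |
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in values; the real constants are defined by the SCST headers. */
enum { SCST_DATA_NONE, SCST_DATA_WRITE, SCST_DATA_READ };
enum { SCST_CMD_STATE_DEFAULT = 0 };

/* SCSI operation codes for the two commands handled here. */
#define DEMO_OP_READ_10   0x28
#define DEMO_OP_WRITE_10  0x2A

/* Reduced stand-in for the real struct scst_cmd. */
struct scst_cmd {
    const uint8_t *cdb;   /* command descriptor block */
    uint32_t block_size;  /* hypothetical: current device block size */
    uint32_t bufflen;     /* set by parse() */
    int data_direction;   /* set by parse() */
};

/* READ(10)/WRITE(10) keep the block count in bytes 7..8, big-endian. */
static uint32_t demo_xfer_blocks_10(const uint8_t *cdb)
{
    return ((uint32_t)cdb[7] << 8) | cdb[8];
}

static int demo_disk_parse(struct scst_cmd *cmd)
{
    assert(cmd != NULL && cmd->cdb != NULL);

    switch (cmd->cdb[0]) {
    case DEMO_OP_READ_10:
        cmd->data_direction = SCST_DATA_READ;
        cmd->bufflen = demo_xfer_blocks_10(cmd->cdb) * cmd->block_size;
        break;
    case DEMO_OP_WRITE_10:
        cmd->data_direction = SCST_DATA_WRITE;
        cmd->bufflen = demo_xfer_blocks_10(cmd->cdb) * cmd->block_size;
        break;
    default:
        /* Everything else is treated as carrying no data in this sketch. */
        cmd->data_direction = SCST_DATA_NONE;
        cmd->bufflen = 0;
        break;
    }

    /* Let SCST continue with the default next state. */
    return SCST_CMD_STATE_DEFAULT;
}
```
| |
| The length comes from the CDB and the handler's own block size, not from |
| anything the transport reported, which is the point made earlier about |
| not trusting data supplied by the remote initiator. |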
| |
| <sect1>Device specific drivers registration |
| |
| <sect2> scst_register_dev_driver() |
| |
| <p> |
| To work with SCST, a device specific driver must register itself with SCST |
| by calling <bf/scst_register_dev_driver()/. It is defined as follows: |
| |
| <verb> |
| int scst_register_dev_driver( |
| struct scst_dev_type *dev_type) |
| </verb> |
| |
| Where: |
| |
| <itemize> |
| <item><bf/dev_type/ - device specific driver's description structure |
| </itemize> |
| |
| The function returns 0 on success or appropriate error code otherwise. |
| |
| <sect2> scst_register_virtual_device() |
| |
| <p> |
| To create a virtual device, a device handler must register it with SCST by |
| calling <bf/scst_register_virtual_device()/. It is defined as follows: |
| |
| <verb> |
| int scst_register_virtual_device( |
| struct scst_dev_type *dev_handler, |
| const char *dev_name) |
| </verb> |
| |
| Where: |
| |
| <itemize> |
| |
| <item><bf/dev_handler/ - device specific driver's description structure |
| |
| <item> <bf/dev_name/ - the new device name, NULL-terminated string. Must be unique |
| among all virtual devices in the system. |
| |
| </itemize> |
| |
| The function returns ID assigned to the device on success, or negative |
| value otherwise. |
| |
| All local real SCSI devices will be registered and unregistered by the |
| SCST core automatically, so pass-through dev handlers don't have to |
| worry about it. |
| |
| |
| <sect1>Device specific drivers unregistration |
| |
| <sect2> scst_unregister_virtual_device() |
| |
| <p> |
| Virtual devices are unregistered by calling |
| <bf/scst_unregister_virtual_device()/. It is defined as follows: |
| |
| <verb> |
| void scst_unregister_virtual_device( |
| int id) |
| </verb> |
| |
| Where: |
| |
| <itemize> |
| <item><bf/id/ - the device's ID, returned by the registration function. |
| </itemize> |
| |
| <sect2> scst_unregister_dev_driver() |
| |
| <p> |
| A device specific driver is unregistered by calling |
| <bf/scst_unregister_dev_driver()/. It is defined as follows: |
| |
| <verb> |
| void scst_unregister_dev_driver( |
| struct scst_dev_type *dev_type) |
| </verb> |
| |
| Where: |
| |
| <itemize> |
| <item><bf/dev_type/ - device specific driver's description structure |
| </itemize> |
| |
| <sect>SCST sessions |
| |
| <sect1>SCST sessions registration |
| |
| <p> |
| When a target driver determines that it needs to create a new SCST session |
| (for example, on receiving a new TCP connection), it should call |
| <bf/scst_register_session()/, which is defined as follows: |
| |
| <verb> |
| struct scst_session *scst_register_session( |
| struct scst_tgt *tgt, |
| int atomic, |
| const char *initiator_name, |
| void *tgt_priv, |
| void *result_fn_data, |
| void (*result_fn) ( |
| struct scst_session *sess, |
| void *data, |
| int result)) |
| </verb> |
| |
| Where: |
| |
| <itemize> |
| |
| <item><bf/tgt/ - target |
| |
| <item><bf/atomic/ - true, if the function is called in the atomic context |
| |
| <item><bf/initiator_name/ - remote initiator's name, any NULL-terminated |
| string, e.g. iSCSI name, which is used as the key to find the appropriate |
| access control group. Could be NULL, then the "default" group is used. The |
| groups are set up via /proc interface. |
| |
| <item> <bf/tgt_priv/ - pointer to target driver's private data |
| |
| <item><bf/result_fn_data/ - data that will be used as the second |
| parameter for the <bf/result_fn()/ function |
| |
| <item><bf/result_fn/ - pointer to the function that will be |
| asynchronously called when session initialization finishes. Can be NULL. |
| Parameters: |
| |
| <itemize> |
| |
| <item><bf/sess/ - session |
| |
| <item><bf/data/ - data supplied by the target driver to scst_register_session() |
| |
| <item><bf/result/ - session initialization result, 0 on success or |
| appropriate error code otherwise |
| |
| </itemize> |
| |
| </itemize> |
| |
| Session creation and initialization is a complex task that requires |
| sleeping, so it can't be fully done in interrupt context. Therefore, if |
| scst_register_session() is called from atomic context, the "bottom half" |
| of the work will be done in SCST thread context. In this case |
| scst_register_session() returns a not completely initialized session, but |
| the target driver can already supply commands to this session via |
| scst_rx_cmd(). Processing of those commands will be delayed inside SCST |
| until the session initialization is finished, then it will be restarted. |
| The target driver will be notified about the end of the session |
| initialization by function <it/result_fn()/. On success the target driver |
| does not have to do anything, but if the initialization fails, the target |
| driver must ensure that no new commands are being sent or will be sent to |
| SCST after result_fn() returns. All commands already sent to SCST for the |
| failed session will be returned in <it/xmit_response()/ with BUSY status. |
| In case of failure the driver shall call <it/scst_unregister_session()/ |
| inside result_fn(); it will NOT be called automatically. |
| |
| Thus, scst_register_session() can be safely called from IRQ context. |
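| |
| The failure handling rule above can be sketched as follows. This is only an |
| illustration that compiles in user space: the two structures and the |
| scst_unregister_session() stub are stand-ins for the real SCST definitions, |
| and <it/struct my_conn/, <it/accept_cmds/ and <it/my_sess_init_done()/ are |
| hypothetical names of an imaginary target driver. |
| |

```c
#include <stddef.h>

/* User space stand-ins; a real driver gets these from the SCST headers. */
struct scst_session { int unused; };

static int unregister_calls; /* how many times the stub below was called */

static void scst_unregister_session(struct scst_session *sess, int wait,
	void (*unreg_done_fn)(struct scst_session *sess))
{
	(void)sess; (void)wait; (void)unreg_done_fn;
	unregister_calls++;
}

/* Hypothetical per-connection state of a target driver. */
struct my_conn {
	int accept_cmds; /* non-zero while scst_rx_cmd() may be called */
};

/* Plays the role of result_fn(): called when session init finishes. */
static void my_sess_init_done(struct scst_session *sess, void *data,
	int result)
{
	struct my_conn *conn = data;

	if (result == 0)
		return; /* success: nothing has to be done */

	/*
	 * Failure: first stop feeding new commands to SCST, then
	 * unregister explicitly, because SCST will not do it itself.
	 */
	conn->accept_cmds = 0;
	scst_unregister_session(sess, 0, NULL);
}
```

| |
| In a real driver a pointer to the connection state would be passed as |
| <it/result_fn_data/, so that it comes back as the <it/data/ argument. |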
| |
| <sect1>SCST sessions unregistration |
| |
| <p> |
| SCST session unregistration is basically the same, except that instead of |
| the atomic parameter there is a <bf/wait/ one. |
| |
| <verb> |
| void scst_unregister_session( |
| struct scst_session *sess, |
| int wait, |
| void (*unreg_done_fn)( |
| struct scst_session *sess)) |
| </verb> |
| |
| Where: |
| |
| <itemize> |
| |
| <item><bf/sess/ - session to be unregistered |
| |
| <item><bf/wait/ - if true, instructs SCST to wait until all commands that |
| are currently being executed in the session have finished. Otherwise, the |
| target driver should be prepared to receive <it/xmit_response()/ for the |
| session after scst_unregister_session() returns. |
| |
| <item><bf/unreg_done_fn/ - pointer to the function that will be |
| asynchronously called when the last session's command finishes and the |
| session is about to be completely freed. Can be NULL. Parameter: |
| |
| <itemize> |
| |
| <item><bf/sess/ - session |
| |
| </itemize> |
| |
| </itemize> |
| |
| All outstanding commands will be finished regularly. After |
| scst_unregister_session() has returned, no new commands must be sent to SCST |
| via scst_rx_cmd(). Also, the caller must ensure that no scst_rx_cmd() or |
| scst_rx_mgmt_fn_*() is called in parallel with |
| scst_unregister_session(). |
| |
| Function scst_unregister_session() can be called before result_fn() of |
| scst_register_session() is called, i.e. during the session |
| registration/initialization. |
| |
| |
| <sect>Commands processing and interaction between SCST core and its drivers |
| |
| <p> |
| Consider a simplified command processing example. It assumes that the target |
| driver doesn't need its own memory allocation, i.e. it does not define the |
| alloc_data_buf() callback. An example of such a target driver is qla2x00t. |
| |
| Command processing by SCST starts when the target driver calls |
| <bf/scst_rx_cmd()/. This function returns SCST's command. Then the |
| target driver finishes the command's initialization, for example, by |
| storing necessary target driver specific data there, and calls |
| <bf/scst_cmd_init_done()/, telling SCST that it can start the command processing. |
| Then SCST translates the command's LUN to a local device and determines the |
| command's data direction and required data buffer size by calling the |
| appropriate device handler's <bf/parse()/ callback function. Then: |
| |
| <itemize> |
| |
| <item>If the command requires no data transfer, it will be passed to |
| SCSI mid-level directly or via device handler's <bf/exec()/ callback. |
| |
| <item>If the command is a <it/READ/ command (data to the remote/local initiator), |
| the necessary space will be allocated and then the command will be passed |
| to SCSI mid-level directly or via device handler's <bf/exec()/ callback. |
| |
| <item>If the command is a <it/WRITE/ command (data from the remote/local initiator), |
| the necessary space will be allocated, then the target's <bf/rdy_to_xfer()/ |
| callback will be called, telling the target that the space is ready and |
| it can start transferring data. When the target has received all the data, |
| it will call <bf/scst_rx_data()/, and the command will be passed |
| to SCSI mid-level directly or via device handler's <bf/exec()/ callback. |
| |
| </itemize> |
| |
| When the command is finished by SCSI mid-level, device handler's |
| <bf/dev_done()/ callback is called to notify it about the command's |
| completion. Then in order to send its response the target's |
| <bf/xmit_response()/ callback is called. When the response, including |
| data, if any, is transmitted, the target will call |
| <bf/scst_tgt_cmd_done()/ to tell SCST that it can free the command and |
| its data buffer. |
| |
| Then during the command's deallocation device handler's and the target's |
| <bf/on_free_cmd()/ callback will be called in this order, if set. |
| |
| This sequence is illustrated in Figure 2. To simplify the picture, the sign |
| "..." means SCST's waiting state for the corresponding command to |
| complete. During this state SCST and its drivers continue processing |
| other commands, if there are any. A one-way arrow, for example to |
| xmit_response(), means that after this function returns, nothing |
| valuable for the current command will be done and SCST goes to sleep or |
| to the next command's processing until the corresponding event happens. |
| |
| <figure> |
| <eps file="fig2.png"> |
| <img src="fig2.png"> |
| <caption> |
| <newline> The commands processing flow |
| </caption> |
| </figure> |
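| |
| The order of calls described in this section can also be written down as a |
| small user space model. It is only a sketch: nothing here calls SCST, the |
| names <it/enum data_dir/, <it/step()/ and <it/trace_cmd()/ are hypothetical, |
| and the "exec" step stands for passing the command to SCSI mid-level |
| directly or via the device handler's exec() callback. |
| |

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical data direction, as determined by parse(). */
enum data_dir { DIR_NONE, DIR_READ, DIR_WRITE };

/* Appends one step name to the trace, separated by a space. */
static void step(char *trace, size_t len, const char *name)
{
	size_t used = strlen(trace);

	snprintf(trace + used, len - used, "%s%s", used ? " " : "", name);
}

/*
 * Writes the sequence of calls for one command into trace.
 * READ and no-data commands differ only in buffer allocation,
 * which is internal to SCST, so their traces are the same.
 */
static void trace_cmd(enum data_dir dir, char *trace, size_t len)
{
	trace[0] = '\0';
	step(trace, len, "scst_rx_cmd");
	step(trace, len, "scst_cmd_init_done");
	step(trace, len, "parse");
	if (dir == DIR_WRITE) {
		/* buffer is ready: the target receives the data */
		step(trace, len, "rdy_to_xfer");
		step(trace, len, "scst_rx_data");
	}
	step(trace, len, "exec");
	step(trace, len, "dev_done");
	step(trace, len, "xmit_response");
	step(trace, len, "scst_tgt_cmd_done");
	/* deallocation: device handler first, then the target */
	step(trace, len, "on_free_cmd(dev)");
	step(trace, len, "on_free_cmd(tgt)");
}
```

| |
| Calling trace_cmd() with DIR_WRITE and a buffer of a few hundred bytes |
| yields the full sequence, including the rdy_to_xfer and scst_rx_data steps. |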
| |
| <sect1>The commands processing functions |
| |
| <sect2>scst_rx_cmd() |
| |
| <p> |
| Function <bf/scst_rx_cmd()/ creates a new command and sends it to SCST. It |
| returns the command on success or NULL otherwise. It is defined as |
| follows: |
| |
| <verb> |
| struct scst_cmd *scst_rx_cmd( |
| struct scst_session *sess, |
| const uint8_t *lun, |
| int lun_len, |
| const uint8_t *cdb, |
| int cdb_len, |
| int atomic) |
| </verb> |
| |
| Where: |
| |
| <itemize> |
| |
| <item><bf/sess/ - SCST's session |
| |
| <item><bf/lun/ - pointer to device's LUN as specified by SAM, without |
| any byte order translation. Extended addressing method is not supported. |
| |
| <item><bf/lun_len/ - LUN's length |
| |
| <item><bf/cdb/ - SCSI CDB |
| |
| <item><bf/cdb_len/ - CDB's length. Can be up to 64KB long. |
| |
| <item><bf/atomic/ - if true, the command will be allocated with |
| GFP_ATOMIC flag, otherwise GFP_KERNEL will be used |
| |
| </itemize> |
| |
| <sect2>scst_cmd_init_done() |
| |
| <p> |
| Function <bf/scst_cmd_init_done()/ notifies SCST that the driver finished |
| its part of the command initialization, and the command is ready for |
| execution. It is defined as the following: |
| |
| <verb> |
| void scst_cmd_init_done( |
| struct scst_cmd *cmd, |
| enum scst_exec_context pref_context) |
| </verb> |
| |
| Where: |
| |
| <itemize> |
| |
| <item><bf/cmd/ - the command |
| |
| <item><bf/pref_context/ - preferred command execution context. See |
| <it/SCST_CONTEXT_*/ constants below for details. |
| |
| </itemize> |
| |
| <sect2>scst_rx_data() |
| |
| <p> |
| Function <bf/scst_rx_data()/ notifies SCST that the driver received all |
| the necessary data and the command is ready for further processing. It |
| is defined as the following: |
| |
| <verb> |
| void scst_rx_data( |
| struct scst_cmd *cmd, |
| int status, |
| enum scst_exec_context pref_context) |
| </verb> |
| |
| Where: |
| |
| <itemize> |
| |
| <item><bf/cmd/ - the command |
| |
| <item><bf/status/ - completion status, see below. |
| |
| <item><bf/pref_context/ - preferred command execution context. See |
| <it/SCST_CONTEXT_*/ constants below for details. |
| |
| </itemize> |
| |
| Parameter <bf/status/ can have one of the following values: |
| |
| <itemize> |
| |
| <item><bf/SCST_RX_STATUS_SUCCESS/ - success |
| |
| <item><bf/SCST_RX_STATUS_ERROR/ - data receiving finished with error, so |
| SCST should set the sense and finish the command by calling |
| xmit_response() |
| |
| <item><bf/SCST_RX_STATUS_ERROR_SENSE_SET/ - data receiving finished with |
| error and the sense is set, so SCST should finish the command by calling |
| xmit_response() |
| |
| <item><bf/SCST_RX_STATUS_ERROR_FATAL/ - data receiving finished with a |
| fatal error, so SCST should finish the command without calling |
| xmit_response(). In this case the driver must free all data associated |
| with the command before calling scst_rx_data(). |
| |
| </itemize> |
| |
| <sect2>scst_tgt_cmd_done() |
| |
| <p> |
| Function <bf/scst_tgt_cmd_done()/ notifies SCST that the driver has sent |
| the data and/or response. It must not be called if there was an error |
| and xmit_response() returned something other than SCST_TGT_RES_SUCCESS. |
| It is defined as follows: |
| |
| <verb> |
| void scst_tgt_cmd_done( |
| struct scst_cmd *cmd, |
| enum scst_exec_context pref_context) |
| </verb> |
| |
| Where: |
| <itemize> |
| |
| <item><bf/cmd/ - the command |
| |
| <item><bf/pref_context/ - preferred command execution context. See |
| <it/SCST_CONTEXT_*/ constants below for details. |
| |
| </itemize> |
| |
| <sect1>The commands processing context |
| |
| <p> |
| Execution context is often a major problem in kernel driver |
| development, because many contexts, like IRQ context, greatly limit |
| the available functionality and therefore require additional complex code |
| to pass processing to a less restrictive context. SCST does its best to |
| undertake most of the context handling. |
| |
| At initialization time SCST creates for internal command processing |
| as many threads as there are processors in the system, or as many as |
| specified by the user via the <bf/scst_threads/ module parameter. Similarly, |
| as many tasklets are created as there are processors in the system. |
| |
| Each command can be processed in one of four contexts: |
| |
| <enum> |
| <item>Directly, i.e. in the caller's context, without limitations |
| <item>Directly atomically, i.e. with sleeping forbidden |
| <item>In the SCST's internal threads |
| <item>In the SCST's per processor tasklets |
| </enum> |
| |
| The target driver sets this context as pref_context parameter for SCST |
| functions. Additionally, target's template's <it/xmit_response_atomic/ |
| and <it/rdy_to_xfer_atomic/ flags have direct influence on the context. |
| If one of them is false, the corresponding function will never be called |
| in the atomic context and, if necessary, the command will be rescheduled |
| to one of the SCST's threads. |
| |
| SCST in some circumstances can change preferred context to less |
| restrictive one, for example, for large data buffer allocation, if |
| there is not enough GFP_ATOMIC memory. |
| |
| <sect2>Preferred context constants |
| |
| <p> |
| There are the following preferred context constants: |
| |
| <itemize> |
| |
| <item><bf/SCST_CONTEXT_DIRECT/ - sets direct command processing (i.e. |
| regular function calls in the current context), sleeping is allowed, no |
| context restrictions. Supposed to be used when calling from thread |
| context where no locks are held and the driver's architecture allows |
| sleeping without performance degradation or anything like that. |
| |
| <item><bf/SCST_CONTEXT_DIRECT_ATOMIC/ - sets direct command processing |
| (i.e. regular function calls in the current context), sleeping is not |
| allowed. Supposed to be used when calling from thread context where |
| locks are held, when calling from softirq context, or when the driver's |
| architecture does not allow sleeping without performance degradation or |
| anything like that. |
| |
| <item><bf/SCST_CONTEXT_TASKLET/ - tasklet or thread context required for |
| the command processing. Supposed to be used when calling from IRQ |
| context. |
| |
| <item><bf/SCST_CONTEXT_THREAD/ - thread context required for the |
| command processing. Supposed to be used if the driver's architecture |
| does not allow using any of above. |
| |
| <item> <bf/SCST_CONTEXT_SAME/ - context is the same as it was in the |
| previous call of the corresponding callback. For example, if a dev |
| handler's exec() reads data synchronously, this value should be used for |
| scst_cmd_done(). The same is true if scst_tgt_cmd_done() is called directly |
| from the target driver's xmit_response(). Not allowed in |
| scst_cmd_init_done() and scst_cmd_init_stage1_done(). |
| |
| </itemize> |
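| |
| The recommendations above can be condensed into a small selection helper. |
| This is a sketch under stated assumptions: the enum is a stand-in that only |
| reuses the constant names (the real definitions, with their real values, come |
| from the SCST headers), and <it/struct caller_state/ and |
| <it/pick_pref_context()/ are hypothetical. A driver whose architecture allows |
| none of the returned contexts would use SCST_CONTEXT_THREAD instead. |
| |

```c
/* Stand-in enum; numeric values are not those of the SCST headers. */
enum scst_exec_context {
	SCST_CONTEXT_DIRECT,
	SCST_CONTEXT_DIRECT_ATOMIC,
	SCST_CONTEXT_TASKLET,
	SCST_CONTEXT_THREAD
};

/* Hypothetical description of the situation at the call site. */
struct caller_state {
	int in_irq;     /* called from a hard IRQ handler */
	int in_softirq; /* called from softirq context */
	int locks_held; /* thread context, but with locks held */
};

/* Picks the preferred context following the guidance in the list above. */
static enum scst_exec_context pick_pref_context(const struct caller_state *s)
{
	if (s->in_irq)
		return SCST_CONTEXT_TASKLET;
	if (s->in_softirq || s->locks_held)
		return SCST_CONTEXT_DIRECT_ATOMIC;
	/* plain thread context, sleeping is fine */
	return SCST_CONTEXT_DIRECT;
}
```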
| |
| <sect1>SCST commands' processing states |
| |
| <p> |
| There are the following processing states, which an SCST command passes |
| through during execution and which could be returned by device handler's |
| <bf/parse()/ and <bf/dev_done()/ (but not all states are allowed to be |
| returned): |
| |
| <itemize> |
| |
| <item><bf/SCST_CMD_STATE_INIT_WAIT/ - the command is created, but |
| <it/scst_cmd_init_done()/ has not been called yet |
| |
| <item><bf/SCST_CMD_STATE_INIT/ - LUN translation (i.e. <it/cmd->tgt_dev/ |
| assignment) state |
| |
| <item><bf/SCST_CMD_STATE_PARSE/ - device handler's <it/parse()/ is going |
| to be called |
| |
| <item><bf/SCST_CMD_STATE_PREPARE_SPACE/ - allocation of the command's |
| data buffer |
| |
| <item> <bf/SCST_CMD_STATE_PREPROCESSING_DONE_CALLED/ - waiting for scst_restart_cmd() |
| |
| <item><bf/SCST_CMD_STATE_RDY_TO_XFER/ - target driver's |
| <it/rdy_to_xfer()/ is going to be called |
| |
| <item><bf/SCST_CMD_STATE_DATA_WAIT/ - waiting for data from the initiator |
| (until <it/scst_rx_data()/ called) |
| |
| <item> <bf/SCST_CMD_STATE_TGT_PRE_EXEC/ - target driver's |
| <it/pre_exec()/ is going to be called |
| |
| <item><bf/SCST_CMD_STATE_SEND_FOR_EXEC/ - the command is going to be |
| sent for execution |
| |
| <item><bf/SCST_CMD_STATE_EXECUTING/ - waiting for the command's execution |
| finish |
| |
| <item> <bf/SCST_CMD_STATE_LOCAL_EXEC/ - the command is being checked if |
| it should be executed locally |
| |
| <item> <bf/SCST_CMD_STATE_REAL_EXEC/ - the command is ready for execution |
| |
| <item> <bf/SCST_CMD_STATE_REAL_EXECUTING/ - waiting for CDB's execution |
| finish |
| |
| <item> <bf/SCST_CMD_STATE_PRE_DEV_DONE/ - internal post-exec checks |
| |
| <item> <bf/SCST_CMD_STATE_MODE_SELECT_CHECKS/ - internal MODE SELECT |
| pages related checks |
| |
| <item><bf/SCST_CMD_STATE_DEV_DONE/ - device handler's <it/dev_done()/ is |
| going to be called |
| |
| <item> <bf/SCST_CMD_STATE_PRE_XMIT_RESP/ - checks before target driver's |
| <it/xmit_response()/ is called |
| |
| <item><bf/SCST_CMD_STATE_XMIT_RESP/ - target driver's |
| <it/xmit_response()/ is going to be called |
| |
| <item><bf/SCST_CMD_STATE_XMIT_WAIT/ - waiting for data/response's |
| transmission finish (until <it/scst_tgt_cmd_done()/ called) |
| |
| <item><bf/SCST_CMD_STATE_FINISHED/ - the command finished and going to be |
| freed |
| |
| </itemize> |
| |
| |
| <sect>Task management functions |
| |
| <p> |
| There are the following task management functions supported: |
| |
| <itemize> |
| |
| <item> <bf/SCST_ABORT_TASK/ - this is <it/ABORT_TASK/ SAM task |
| management function. Aborts the specified task (command). |
| |
| <item> <bf/SCST_ABORT_TASK_SET/ - this is <it/ABORT_TASK_SET/ SAM task |
| management function. Aborts all tasks (commands) in the specified |
| session. |
| |
| <item> <bf/SCST_CLEAR_ACA/ - this is <it/CLEAR_ACA/ SAM task management |
| function. Currently does nothing. |
| |
| <item> <bf/SCST_CLEAR_TASK_SET/ - this is <it/CLEAR_TASK_SET/ SAM task |
| management function. Clears task set of commands on the specified |
| device or session. |
| |
| <item> <bf/SCST_LUN_RESET/ - this is <it/LUN_RESET/ SAM task management |
| function. Resets specified device. |
| |
| <item> <bf/SCST_TARGET_RESET/ - this is <it/TARGET_RESET/ SAM task management |
| function. Resets all devices visible in this session. |
| |
| <item> <bf/SCST_NEXUS_LOSS_SESS/ - SCST extension. Notifies about I_T |
| nexus loss event in the corresponding session. Aborts all tasks there, |
| resets the reservation, if any, and sets up the I_T Nexus loss UA. |
| |
| <item> <bf/SCST_ABORT_ALL_TASKS_SESS/ - SCST extension. Aborts all |
| tasks in the corresponding session. |
| |
| <item> <bf/SCST_NEXUS_LOSS/ - SCST extension. Notifies about I_T nexus |
| loss event. Aborts all tasks in all sessions of the tgt, resets the |
| reservations, if any, and sets up the I_T Nexus loss UA. |
| |
| <item> <bf/SCST_ABORT_ALL_TASKS/ - SCST extension. Aborts all tasks in |
| all sessions of the tgt. |
| |
| </itemize> |
| |
| All task management functions return completion status via |
| <it/task_mgmt_fn_done()/ when the affected SCSI commands (tasks) are |
| actually aborted, i.e. guaranteed never to be executed at any later time. |
| |
| <sect1>scst_rx_mgmt_fn_tag() |
| |
| <p> |
| Function <bf/scst_rx_mgmt_fn_tag()/ tells SCST to perform the specified |
| task management function, based on the command's tag. Can be used only |
| for <it/SCST_ABORT_TASK/. |
| |
| It is defined as the following: |
| |
| <verb> |
| int scst_rx_mgmt_fn_tag( |
| struct scst_session *sess, |
| int fn, |
| uint32_t tag, |
| int atomic, |
| void *tgt_priv) |
| </verb> |
| |
| Where: |
| |
| <itemize> |
| |
| <item> <bf/sess/ - the session, on which the command should be performed. |
| |
| <item> <bf/fn/ - task management function, one of the constants above. |
| |
| <item> <bf/tag/ - the command's tag. |
| |
| <item> <bf/atomic/ - true, if the function called in the atomic context. |
| |
| <item> <bf/tgt_priv/ - pointer to the target driver specific data, can |
| be retrieved in task_mgmt_fn_done() via <it/scst_mgmt_cmd_get_tgt_priv()/ |
| function. |
| |
| </itemize> |
| |
| Returns 0 if the command was successfully created and scheduled for |
| execution, error code otherwise. On success, the completion status of |
| the command will be reported asynchronously via task_mgmt_fn_done() |
| driver's callback. |
| |
| <sect1>scst_rx_mgmt_fn_lun() |
| |
| <p> |
| Function <bf/scst_rx_mgmt_fn_lun()/ tells SCST to perform the specified |
| task management function, based on the LUN. Currently it can be used for |
| any function, except <it/SCST_ABORT_TASK/. |
| |
| It is defined as the following: |
| |
| <verb> |
| int scst_rx_mgmt_fn_lun( |
| struct scst_session *sess, |
| int fn, |
| const uint8_t *lun, |
| int lun_len, |
| int atomic, |
| void *tgt_priv); |
| </verb> |
| |
| Where: |
| |
| <itemize> |
| |
| <item> <bf/sess/ - the session, on which the command should be performed. |
| |
| <item> <bf/fn/ - task management function, one of the constants above. |
| |
| <item> <bf/lun/ - LUN, the format is the same as for <bf/scst_rx_cmd()/. |
| |
| <item> <bf/lun_len/ - LUN's length. |
| |
| <item> <bf/atomic/ - true, if the function called in the atomic context. |
| |
| <item> <bf/tgt_priv/ - pointer to the target driver specific data, can |
| be retrieved in task_mgmt_fn_done() via <it/scst_mgmt_cmd_get_tgt_priv()/ |
| function. |
| |
| </itemize> |
| |
| Returns 0 if the command was successfully created and scheduled for |
| execution, error code otherwise. On success, the completion status of |
| the command will be reported asynchronously via task_mgmt_fn_done() |
| driver's callback. |
| |
| Possible status constants which can be returned by |
| <bf/scst_mgmt_cmd_get_status()/: |
| |
| <itemize> |
| |
| <item> <bf/SCST_MGMT_STATUS_SUCCESS/ - success |
| |
| <item> <bf/SCST_MGMT_STATUS_TASK_NOT_EXIST/ - requested task does not exist |
| |
| <item> <bf/SCST_MGMT_STATUS_LUN_NOT_EXIST/ - requested LUN does not exist |
| |
| <item> <bf/SCST_MGMT_STATUS_FN_NOT_SUPPORTED/ - requested TM function |
| is not supported. |
| |
| <item> <bf/SCST_MGMT_STATUS_REJECTED/ - TM function rejected. |
| |
| <item> <bf/SCST_MGMT_STATUS_FAILED/ - TM function failed. |
| |
| </itemize> |
| |
| <sect>SGV cache<label id="sgv_cache"> |
| |
| <p> |
| SCST SGV cache is a memory management subsystem in SCST. One could call it |
| a "memory pool", but the Linux kernel already has a mempool interface, |
| which serves different purposes. The SGV cache provides the SCST core, target |
| drivers and backend dev handlers with facilities to allocate, build and cache |
| SG vectors for data buffers. Its main advantage is the caching |
| facility: it doesn't free each vector to the system as soon as it is no |
| longer used, but keeps it for a while (possibly indefinitely) to let it |
| be reused by the next consecutive command. This makes it possible to: |
| |
| <itemize> |
| |
| <item> Reduce commands processing latencies and, hence, improve performance; |
| |
| <item> Make commands processing latencies predictable, which is essential |
| for RT applications. |
| |
| </itemize> |
| |
| The freed SG vectors are kept by the SGV cache either for some (possibly |
| indefinite) time, or, optionally, until the system needs more memory and |
| asks to free some using the set_shrinker() interface. Also the SGV cache |
| allows to: |
| |
| <itemize> |
| |
| <item> Cluster pages together. "Cluster" means merging adjacent pages into a |
| single SG entry. This yields fewer SG entries in the resulting SG |
| vector, which improves the performance of handling it and allows |
| working with bigger buffers on hardware with limited SG capabilities. |
| |
| <item> Set custom page allocator functions. For instance, scst_user device |
| handler uses this facility to eliminate unneeded mapping/unmapping of |
| user space pages and avoid unneeded IOCTL calls for buffers allocations. |
| In fileio_tgt application, which uses a regular malloc() function to |
| allocate data buffers, this facility allows ~30% less CPU load and |
| considerable performance increase. |
| |
| <item> Prevent each initiator, or all initiators together, from allocating too |
| much memory and DoS-ing the target. Consider 10 initiators, each of which has |
| access to 10 devices. Each of them can queue up to 64 commands per device, and |
| each command can transfer up to 1MB of data. So, at peak all of them together |
| can allocate up to 10*10*64*1MB = 6400MB (more than 6GB) of memory for data |
| buffers. This amount must be limited somehow and the SGV cache performs this |
| function. |
| |
| </itemize> |
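| |
| The worst case estimate from the last item is plain multiplication. A |
| minimal sketch, where <it/worst_case_bytes()/ is a hypothetical helper and |
| not a part of SCST: |
| |

```c
#include <stdint.h>

/*
 * Peak data buffer memory if every initiator fills its queue on
 * every device with maximum size commands at the same time.
 */
static uint64_t worst_case_bytes(uint64_t initiators, uint64_t devices,
	uint64_t queued_cmds, uint64_t bytes_per_cmd)
{
	return initiators * devices * queued_cmds * bytes_per_cmd;
}
```

| |
| For 10 initiators, 10 devices, 64 queued commands and 1MB per command this |
| gives 6400MB, which is the amount the SGV cache limits are meant to bound. |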
| |
| <sect1> Implementation |
| |
| <p> |
| From implementation POV the SGV cache is a simple extension of the kmem |
| cache. It can work in 2 modes: |
| |
| <enum> |
| |
| <item> With fixed size buffers. |
| |
| <item> With a set of power of 2 size buffers. In this mode each SGV cache |
| (struct sgv_pool) has SGV_POOL_ELEMENTS (11 currently) kmem caches. |
| Each of those kmem caches keeps SGV cache objects (struct sgv_pool_obj) |
| corresponding to SG vectors with size of order X pages. For instance, a |
| request to allocate 4 pages will be served from kmem cache[2&rsqb;, since the |
| order of the number of requested pages is 2. If a later request to |
| allocate 11KB comes, the same SG vector with 4 pages will be reused (see |
| below). On average this mode has less memory overhead compared |
| with the fixed size buffers mode. |
| |
| </enum> |
| |
| Consider how the SGV cache works in the set of buffers mode. When a |
| request to allocate a new SG vector comes, sgv_pool_alloc() via |
| sgv_get_obj() checks if there is already a cached vector with that |
| order. If yes, then that vector will be reused and its length, if |
| necessary, will be modified to match the requested size. In the above |
| example of a request for an 11KB buffer, the 4-page vector will be reused and |
| modified using trans_tbl to contain 3 pages, and the last entry will be |
| modified to contain the requested length - 2*PAGE_SIZE. If there is no |
| cached object, then a new sgv_pool_obj will be allocated from the |
| corresponding kmem cache, chosen by the order of the number of requested |
| pages. Then that vector will be filled with pages and returned. |
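| |
| The arithmetic behind this example can be sketched in user space as follows. |
| The helper names and the PAGE_SIZE_BYTES constant (4096 is an assumed page |
| size) are hypothetical; the SGV cache itself uses the kernel's own page size |
| and order helpers. One page per SG entry is assumed, i.e. no clustering. |
| |

```c
#include <stddef.h>

#define PAGE_SIZE_BYTES 4096u /* assumed page size */

/* Number of pages needed for size bytes; size must be > 0. */
static unsigned int pages_for(size_t size)
{
	return (unsigned int)((size + PAGE_SIZE_BYTES - 1) / PAGE_SIZE_BYTES);
}

/* Smallest order such that (1 << order) >= pages: the kmem cache index. */
static unsigned int order_for(unsigned int pages)
{
	unsigned int order = 0;

	while ((1u << order) < pages)
		order++;
	return order;
}

/* Length of the last used SG entry after trimming; size must be > 0. */
static size_t last_entry_len(size_t size)
{
	return size - (size_t)(pages_for(size) - 1) * PAGE_SIZE_BYTES;
}
```

| |
| With these helpers a 4 page request and an 11KB request both map to order 2, |
| and for 11KB the last of the 3 used entries holds 11KB - 2*4096 = 3072 bytes. |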
| |
| In the fixed size buffers mode the SGV cache works similarly, except |
| that it always allocates a buffer with the predefined fixed size. I.e. |
| even for a 4K request the whole buffer with the predefined size, say, 1MB, |
| will be used. |
| |
| In both modes, if size of a request exceeds the maximum allowed for |
| caching buffer size, the requested buffer will be allocated, but not |
| cached. |
| |
| Freed cached sgv_pool_obj objects are actually freed to the system |
| either by the purge work, which is scheduled once in 60 seconds, or in |
| sgv_shrink() called by system, when it's asking for memory. |
| |
| <sect1> Interface |
| |
| <sect2> struct sgv_pool *sgv_pool_create() |
| |
| <p> |
| <verb> |
| struct sgv_pool *sgv_pool_create( |
| const char *name, |
| enum sgv_clustering_types clustered, int single_alloc_pages, |
| bool shared, int purge_interval) |
| </verb> |
| |
| This function creates and initializes an SGV cache. It has the following |
| arguments: |
| |
| <itemize> |
| |
| <item> <bf/name/ - the name of the SGV cache |
| |
| <item> <bf/clustered/ - sets type of the pages clustering. The type can be: |
| |
| <itemize> |
| |
| <item> <bf/sgv_no_clustering/ - no clustering performed. |
| |
| <item> <bf/sgv_tail_clustering/ - a page will only be merged with the latest |
| previously allocated page, so the order of pages in the SG will be |
| preserved |
| |
| <item> <bf/sgv_full_clustering/ - free merging of pages at any place in |
| the SG is allowed. This mode usually provides the best merging |
| rate. |
| |
| </itemize> |
| |
| <item> <bf/single_alloc_pages/ - if 0, then the SGV cache will work in the set of |
| power 2 size buffers mode. If >0, then the SGV cache will work in the |
| fixed size buffers mode. In this case single_alloc_pages sets the |
| size of each buffer in pages. |
| |
| <item> <bf/shared/ - sets if the SGV cache can be shared between devices or not. |
| Cache sharing is allowed only between devices created inside the same |
| address space. If an SGV cache is shared, each subsequent call of |
| sgv_pool_create() with the same cache name will not create a new cache, |
| but instead return a reference to the existing one. |
| |
| <item> <bf/purge_interval/ - sets the cache purging interval. I.e. an SG buffer |
| will be freed once it has been unused for a time t, where purge_interval <= t < |
| 2*purge_interval. If purge_interval is 0, then the default interval |
| will be used (60 seconds). If purge_interval <0, then automatic |
| purging will be disabled. Shrinking on the system's demand will also |
| be disabled. |
| |
| </itemize> |
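| |
| The window given for purge_interval is what one gets from a periodic purge. |
| The following user space model is only an assumption about the mechanism, |
| not the actual SGV cache code: purge runs are taken to fire at multiples of |
| the interval, and a run frees every buffer that has been unused for at least |
| one interval. Both function names are hypothetical; times are in seconds, |
| unused_since must be >= 0 and interval > 0. |
| |

```c
/* Time of the first purge run that frees a buffer unused since unused_since. */
static long purge_run_time(long unused_since, long interval)
{
	long first_eligible = unused_since + interval;
	/* round up to the next multiple of interval */
	long runs = (first_eligible + interval - 1) / interval;

	return runs * interval;
}

/* How long the buffer stayed unused in the cache before being freed. */
static long idle_time_when_freed(long unused_since, long interval)
{
	return purge_run_time(unused_since, interval) - unused_since;
}
```

| |
| In this model the idle time always falls between one interval (inclusive) and |
| two intervals (exclusive), whatever the moment the buffer was released. |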
| |
| Returns the resulting SGV cache or NULL in case of any error. |
| |
| <sect2> void sgv_pool_del() |
| |
| <p> |
| <verb> |
| void sgv_pool_del( |
| struct sgv_pool *pool) |
| </verb> |
| |
| This function deletes the corresponding SGV cache. If the cache is |
| shared, it will decrease its reference counter. If the reference counter |
| reaches 0, the cache will be destroyed. |
| |
| <sect2> void sgv_pool_flush() |
| |
| <p> |
| <verb> |
| void sgv_pool_flush( |
| struct sgv_pool *pool) |
| </verb> |
| |
| This function flushes, i.e. frees, all the cached entries in the SGV |
| cache. |
| |
| <sect2> void sgv_pool_set_allocator() |
| |
| <p> |
| <verb> |
| void sgv_pool_set_allocator( |
| struct sgv_pool *pool, |
| struct page *(*alloc_pages_fn)(struct scatterlist *sg, gfp_t gfp, void *priv), |
| void (*free_pages_fn)(struct scatterlist *sg, int sg_count, void *priv)); |
| </verb> |
| |
| This function sets a custom page allocator for the SGV cache. For |
| instance, scst_user uses such a function to supply the cache with pages |
| mapped from user space. |
| |
| <bf/alloc_pages_fn()/ has the following parameters: |
| |
| <itemize> |
| |
| <item> <bf/sg/ - SG entry, to which the allocated page should be added. |
| |
| <item> <bf/gfp/ - the allocation GFP flags |
| |
| <item> <bf/priv/ - pointer to a private data supplied to sgv_pool_alloc() |
| |
| </itemize> |
| |
| This function should return the allocated page or NULL, if no page was |
| allocated. |
| |
| |
| <bf/free_pages_fn()/ has the following parameters: |
| |
| <itemize> |
| |
| <item> <bf/sg/ - SG vector to free |
| |
| <item> <bf/sg_count/ - number of SG entries in the sg |
| |
| <item> <bf/priv/ - pointer to a private data supplied to the |
| corresponding sgv_pool_alloc() |
| |
| </itemize> |
| |
| <sect2> struct scatterlist *sgv_pool_alloc() |
| |
| <p> |
| <verb> |
| struct scatterlist *sgv_pool_alloc( |
| struct sgv_pool *pool, |
| unsigned int size, |
| gfp_t gfp_mask, |
| int flags, |
| int *count, |
| struct sgv_pool_obj **sgv, |
| struct scst_mem_lim *mem_lim, |
| void *priv) |
| </verb> |
| |
| This function allocates an SG vector from the SGV cache. It has the |
| following parameters: |
| |
| <itemize> |
| |
| <item> <bf/pool/ - the cache to alloc from |
| |
| <item> <bf/size/ - size of the resulting SG vector in bytes |
| |
| <item> <bf/gfp_mask/ - the allocation mask |
| |
| <item> <bf/flags/ - the allocation flags. The following flags are possible and |
| can be set using OR operation: |
| |
| <enum> |
| |
| <item> <bf/SGV_POOL_ALLOC_NO_CACHED/ - the SG vector must not be cached. |
| |
| <item> <bf/SGV_POOL_NO_ALLOC_ON_CACHE_MISS/ - don't do an allocation on a |
| cache miss. |
| |
| <item> <bf/SGV_POOL_RETURN_OBJ_ON_ALLOC_FAIL/ - return an empty SGV object, |
| i.e. without the SG vector, if the allocation can't be completed. |
| For instance, because SGV_POOL_NO_ALLOC_ON_CACHE_MISS flag set. |
| |
| </enum> |
| |
| <item> <bf/count/ - the resulting count of SG entries in the resulting SG vector. |
| |
| <item> <bf/sgv/ - the resulting SGV object. It should be used to free the |
| resulting SG vector. |
| |
| <item> <bf/mem_lim/ - memory limits, see below. |
| |
| <item> <bf/priv/ - pointer to data private to this allocation. This pointer will |
| be supplied to alloc_pages_fn() and free_pages_fn() and can be |
| retrieved by sgv_get_priv(). |
| |
| </itemize> |
| |
| This function returns pointer to the resulting SG vector or NULL in case |
| of any error. |
| |
| <sect2> void sgv_pool_free() |
| |
| <p> |
| <verb> |
| void sgv_pool_free( |
| struct sgv_pool_obj *sgv, |
| struct scst_mem_lim *mem_lim) |
| </verb> |
| |
| This function frees previously allocated SG vector, referenced by SGV |
| cache object sgv. |
| |
| <sect2> void *sgv_get_priv() |
| |
| <p> |
| <verb> |
| void *sgv_get_priv( |
| struct sgv_pool_obj *sgv) |
| </verb> |
| |
| This function returns the allocation private data of the SGV cache |
| object sgv. The private data is set by sgv_pool_alloc(). |
| |
| <sect2> void scst_init_mem_lim() |
| |
| <p> |
| <verb> |
| void scst_init_mem_lim( |
| struct scst_mem_lim *mem_lim) |
| </verb> |
| |
| This function initializes the memory limits structure mem_lim according to |
| the current system configuration. This structure should later be used |
| to track and limit the memory allocated by one or more SGV caches. |
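
The functions described above are meant to be used together. The following user-space sketch illustrates only the calling order. The function signatures follow the prototypes given above, while the structure layouts, the stub bodies and the helper alloc_use_free() are hypothetical stand-ins so that the example compiles outside the kernel. In a real driver gfp_mask would be a kernel allocation mask such as GFP_KERNEL, and passing the same mem_lim to both the allocation and the free call is an assumption of this sketch.

```c
#include <stddef.h>
#include <stdlib.h>

/*
 * User-space stand-ins. Only the signatures follow the prototypes
 * documented above; layouts and bodies are placeholders, not SCST code.
 */
typedef unsigned int gfp_t;
struct scatterlist { unsigned int length; };
struct sgv_pool { int unused; };
struct scst_mem_lim { int initialized; };
struct sgv_pool_obj { struct scatterlist *sg; void *priv; };

void scst_init_mem_lim(struct scst_mem_lim *mem_lim)
{
	mem_lim->initialized = 1;
}

struct scatterlist *sgv_pool_alloc(struct sgv_pool *pool, unsigned int size,
	gfp_t gfp_mask, int flags, int *count, struct sgv_pool_obj **sgv,
	struct scst_mem_lim *mem_lim, void *priv)
{
	struct sgv_pool_obj *obj;

	(void)pool; (void)gfp_mask; (void)flags; (void)mem_lim;
	obj = malloc(sizeof(*obj));
	if (obj == NULL)
		return NULL;
	obj->sg = malloc(sizeof(*obj->sg));
	if (obj->sg == NULL) {
		free(obj);
		return NULL;
	}
	obj->sg->length = size;	/* stub: one entry covers the whole size */
	obj->priv = priv;
	*count = 1;
	*sgv = obj;
	return obj->sg;
}

void *sgv_get_priv(struct sgv_pool_obj *sgv)
{
	return sgv->priv;
}

void sgv_pool_free(struct sgv_pool_obj *sgv, struct scst_mem_lim *mem_lim)
{
	(void)mem_lim;
	free(sgv->sg);
	free(sgv);
}

/*
 * Hypothetical helper showing the calling pattern: allocate, use the
 * vector, then free it through the SGV object (not through the SG
 * pointer). Returns the SG entry count, or -1 on failure.
 */
int alloc_use_free(struct sgv_pool *pool, struct scst_mem_lim *mem_lim,
	unsigned int size, void *priv)
{
	struct sgv_pool_obj *sgv = NULL;
	struct scatterlist *sg;
	int count = 0;

	sg = sgv_pool_alloc(pool, size, 0, 0, &count, &sgv, mem_lim, priv);
	if (sg == NULL)
		return -1;	/* NULL means the allocation failed */

	/* The private pointer passed at allocation can be read back. */
	if (sgv_get_priv(sgv) != priv)
		count = -1;

	sgv_pool_free(sgv, mem_lim);	/* assumed: same mem_lim as on alloc */
	return count;
}
```

The point to note is that the caller keeps the SGV object returned through the sgv parameter, because that object, not the SG vector pointer, is what sgv_pool_free() takes.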
| |
| |
| <sect1> Runtime information and statistics |
| |
| <p> |
| SGV cache runtime information and statistics are available in |
| <it>/proc/scsi_tgt/sgv</it>. |
| |
| |
| <sect> Target driver qla2x00t |
| |
| <p> |
| Target driver qla2x00t allows QLogic 2xxx based adapters to be used in |
| target (server) mode. |
| |
| It consists of two parts: |
| |
| <itemize> |
| |
| <item> <bf/qla2xxx/ - the patched initiator driver from the Linux kernel, |
| which is, among other things, responsible for all initialization and |
| shutdown tasks. |
| |
| <item> <bf/qla2x00tgt/ - the target mode add-on for the changed qla2xxx. |
| |
| </itemize> |
| |
| The initiator driver qla2xxx was changed to: |
| |
| <itemize> |
| |
| <item> Provide support for the target mode add-on via a set of |
| exported callbacks. |
| |
| <item> Provide extra information and a management interface in the driver's |
| sysfs interface (attributes target_mode_enabled, ports_database, etc.). |
| |
| <item> Fix some problems uncovered during target mode development and |
| usage. |
| |
| </itemize> |
| |
| The changes are relatively small (a patch of a few thousand lines) and local. |
| |
| The changed qla2xxx can still work as an initiator only. A mode in which |
| a host acts as initiator and target simultaneously is supported as |
| well. |
| |
| Since the firmware interface for 24xx+ chips is fundamentally different from |
| that of earlier chips, qla2x00t generally contains two separate drivers |
| that share some common processing. |
| |
| <sect1> Driver initialization |
| |
| <p> |
| On initialization, qla2x00tgt registers its SCST template tgt2x_template |
| with the SCST core. During template registration the SCST core calls the |
| detect() callback, which is the function q2t_target_detect(). |
| |
| In this function qla2x00tgt registers its callbacks in qla2xxx by |
| calling qla2xxx_tgt_register_driver(), which stores a pointer to the |
| callbacks being registered in the variable qla_target. |
| |
| Then q2t_target_detect() calls qla2xxx_add_targets(), which calls the |
| qla_target.tgt_host_action() callback with the ADD_TARGET action for each |
| known local FC port (HBA instance). The callback, q2t_host_action(), then |
| calls q2t_add_target(), which registers an SCST target for this FC port. |
| |
| If a new FC port is hot added later, qla2x00_probe_one() also calls |
| qla_target.tgt_host_action() with the ADD_TARGET action for each new local |
| port. |
| |
| |
| <sect1> Driver unload |
| |
| <p> |
| When a local FC port is being removed, the Linux kernel calls |
| qla2x00_remove_one(), which then calls qla_target.tgt_host_action() with |
| the REMOVE_TARGET action. |
| |
| Then q2t_host_action() calls q2t_remove_target(), which unregisters the |
| corresponding SCST target in SCST. During unregistration the SCST core calls |
| the release() callback of tgt2x_template, which is q2t_target_release(). |
| |
| Then q2t_target_release() calls q2t_target_stop(), which marks this |
| target as stopped by setting the flag tgt_stop. While this flag is set, all |
| commands incoming from initiators are refused. |
| |
| Then q2t_target_stop() schedules deletion of all sessions of the target. |
| |
| Then q2t_target_stop() waits until all outstanding commands have finished |
| and all sessions have been deleted. |
| |
| Then q2t_target_stop(), if necessary, calls qla2x00_disable_tgt_mode(), |
| which disables target mode of the corresponding HBA and resets it. Then |
| qla2x00_disable_tgt_mode() waits until the reset has finished. |
| |
| Then q2t_target_stop() returns and q2t_target_release() frees the |
| target. |
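
The ordering of the stop sequence can be illustrated with a small user-space model. This is only a sketch of the steps described above, not the driver code: struct tgt_model, its fields other than tgt_stop, and all function names are hypothetical, and the real driver blocks while waiting instead of polling a predicate.

```c
#include <stdbool.h>

/* Hypothetical model of one target; only tgt_stop is named in the text. */
struct tgt_model {
	bool tgt_stop;		/* set once stopping has begun */
	bool tgt_mode_enabled;	/* HBA target mode state */
	int outstanding_cmds;	/* commands not yet finished */
	int sessions;		/* sessions not yet deleted */
};

/* A new command is refused as soon as tgt_stop is set. */
bool tgt_accept_cmd(struct tgt_model *t)
{
	if (t->tgt_stop)
		return false;
	t->outstanding_cmds++;
	return true;
}

/* Step 1: mark the target as stopped (session deletion is scheduled here). */
void tgt_begin_stop(struct tgt_model *t)
{
	t->tgt_stop = true;
}

/* Step 2: the condition the stop sequence waits for. */
bool tgt_is_quiesced(const struct tgt_model *t)
{
	return t->tgt_stop && t->outstanding_cmds == 0 && t->sessions == 0;
}

/* Step 3: only a quiesced target gets its target mode disabled. */
bool tgt_finish_stop(struct tgt_model *t)
{
	if (!tgt_is_quiesced(t))
		return false;
	t->tgt_mode_enabled = false;
	return true;
}
```

Setting the flag before waiting is what makes the wait terminate: no new commands can arrive while the outstanding ones drain.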
| |
| |
| If module qla2x00tgt is being unloaded, q2t_exit() first takes |
| q2t_unreg_rwsem for writing. Taking it is necessary to make sure that |
| q2t_host_action() is not active during qla2x00tgt unload. |
| |
| Then q2t_exit() calls scst_unregister_target_template() for |
| tgt2x_template, which in a loop unregisters all QLA SCST targets |
| from SCST as described above. |
| |
| |
| <sect1> Enabling target mode |
| |
| <p> |
| When a command to enable target mode is received, |
| qla_target.tgt_host_action() is called with the ENABLE_TARGET_MODE action. |
| Then q2t_host_action() goes over all discovered remote FC ports of the |
| target being enabled and adds SCST sessions for all of them. |
| |
| Then it calls qla2x00_enable_tgt_mode(), which enables target mode of |
| the corresponding HBA and resets it. Then qla2x00_enable_tgt_mode() |
| waits until the reset has finished. |
| |
| During the reset the firmware initialization functions detect that target |
| mode is enabled and initialize the firmware accordingly. |
| |
| |
| <sect1> Disabling target mode |
| |
| <p> |
| When a command to disable target mode is received, |
| qla_target.tgt_host_action() is called with the DISABLE_TARGET_MODE action. |
| Then q2t_host_action() calls q2t_target_stop(), which proceeds as described |
| above. |
| |
| |
| <sect1> SCST sessions management |
| |
| <p> |
| As required by the SCSI and FC standards, each remote initiator FC port |
| has a corresponding SCST session. |
| |
| Since qla2xxx is not designed to maintain the database of remote |
| initiator FC ports as strictly as target mode needs, qla2x00t uses a mixed |
| approach to SCST session management, in which both qla2xxx and the QLogic |
| firmware generate events and information about currently active remote |
| FC ports. |
| |
| Remote FC port management also has to handle FC and loop IDs changing |
| after fabric events, so it needs to constantly monitor the FC and loop IDs |
| of the registered FC ports. This is implemented by a check in |
| q2t_create_sess() of whether the FC port being registered already has an |
| SCST session, and by q2t_check_fcport_exist() in q2t_del_sess_work_fn(). |
| See below for more information. |
| |
| Interaction with qla2xxx is implemented using the tgt_fc_port_added() and |
| tgt_fc_port_deleted() callbacks of qla_target. |
| |
| Callback tgt_fc_port_added() is called by qla2xxx when the target driver |
| detects a new remote FC port. The function assigned to it, |
| q2t_fc_port_added(), checks if an SCST session already exists for this |
| remote FC port and, if not, creates it. |
| |
| Callback tgt_fc_port_deleted() is called by qla2xxx when it deletes a |
| remote FC port from its database. The function assigned to it, |
| q2t_fc_port_deleted(), checks if an SCST session exists for this remote |
| FC port and, if yes, schedules it for deletion. |
| |
| Driver qla2x00tgt has two types of SCST sessions: local and not local. |
| Sessions created by q2t_fc_port_added() are not local. Local sessions are |
| created if qla2x00tgt receives a command from a remote initiator for which |
| there is no known remote FC port and, hence, no SCST session. Local sessions |
| are created in tgt->sess_work (q2t_sess_work_fn()) by calling |
| q2t_make_local_sess(). All commands received from remote initiators for |
| local sessions are delayed until the sessions are created. |
| |
| To minimize the effect of FC fabric events on initiators, qla2x00tgt doesn't |
| immediately delete SCST sessions scheduled for deletion, but instead |
| delays their deletion for some time. If during this time a command from an |
| unknown remote initiator is received, q2t_make_local_sess()/q2t_create_sess() |
| first check if a session for this initiator already exists and, if yes, |
| undelete it and reuse it after updating its s_id and loop_id to the new |
| values. |
| |
| If a session is not reused during the deletion delay, then |
| q2t_del_sess_work_fn() asks the firmware's internal database whether it |
| knows the corresponding remote FC port. If yes, the session is undeleted |
| and its s_id and loop_id are updated to the new values. If no, the session |
| is deleted. |
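
The delayed deletion logic can be summarized as a small state machine. The following user-space sketch models only the transitions described above. The names struct sess_model, enum sess_state and the three functions are hypothetical, the field widths are assumptions, and the timer and the firmware query are reduced to plain function arguments.

```c
#include <stdbool.h>
#include <stdint.h>

enum sess_state { SESS_ACTIVE, SESS_DELETE_PENDING, SESS_DELETED };

/* Hypothetical session model; s_id and loop_id are named in the text. */
struct sess_model {
	enum sess_state state;
	uint32_t s_id;
	uint16_t loop_id;
};

/* Deletion is only scheduled; the session stays usable during the delay. */
void sess_schedule_deletion(struct sess_model *s)
{
	if (s->state == SESS_ACTIVE)
		s->state = SESS_DELETE_PENDING;
}

/*
 * A command arrived from the initiator owning this session. If the
 * session still exists it is undeleted and reused with the new IDs.
 * Returns false if a new session has to be created instead.
 */
bool sess_reuse_on_command(struct sess_model *s, uint32_t s_id,
	uint16_t loop_id)
{
	if (s->state == SESS_DELETED)
		return false;
	s->state = SESS_ACTIVE;
	s->s_id = s_id;
	s->loop_id = loop_id;
	return true;
}

/*
 * The deletion delay expired. fw_knows_port stands for the answer of
 * the firmware's internal database; the IDs are the values it reports.
 */
void sess_delay_expired(struct sess_model *s, bool fw_knows_port,
	uint32_t s_id, uint16_t loop_id)
{
	if (s->state != SESS_DELETE_PENDING)
		return;		/* reused in the meantime, nothing to do */
	if (fw_knows_port) {
		s->state = SESS_ACTIVE;
		s->s_id = s_id;
		s->loop_id = loop_id;
	} else {
		s->state = SESS_DELETED;
	}
}
```

In this model a session can leave the pending state in exactly three ways: reuse by an incoming command, revival because the firmware still knows the port, or final deletion.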
| |
| |
| <sect1> Handling stuck commands |
| |
| <p> |
| Driver qla2x00tgt defines in tgt2x_template the callback |
| on_hw_pending_cmd_timeout for handling stuck commands, implemented by the |
| q2t_on_hw_pending_cmd_timeout() function, with the max_hw_pending_time |
| timeout set to Q2T_MAX_HW_PENDING_TIME (60 seconds). If the firmware |
| doesn't return a reply for one or more IOCBs of the corresponding SCST |
| command within this time, the SCST core calls this callback. |
| |
| In this callback all the stuck commands are forcibly finished. |
| |
| <appendix> |
| |
| <sect> Debugging and troubleshooting |
| |
| <p> |
| SCST core and its drivers provide extensive debugging and logging |
| facilities suitable for catching and analyzing problems of virtually any |
| level of complexity. |
| |
| Depending on the amount of debugging and logging facilities available, |
| there are 3 types of builds: |
| |
| <itemize> |
| |
| <item> <bf/release/ - has a basic amount of logging, suitable for basic |
| tracing. Extra checking is disabled in this mode. This is the default |
| mode. |
| |
| <item> <bf/debug/ - has the full amount of logging and extra checks enabled. |
| The binary code is slower and much bigger, but suitable for advanced |
| tracing and debugging. Also, more logging is enabled by default in this |
| mode. |
| |
| <item> <bf/perf/ - has all logging and extra checks disabled. Intended for |
| performance measurements, including measurements of the overhead introduced |
| by the logging and extra check facilities. |
| |
| </itemize> |
| |
| Switching between build modes is done by calling "make x2y", where "x" is |
| the current build mode and "y" is the desired build mode. For instance, to |
| switch from release to debug mode you should run "make release2debug". |
| |
| <sect1> Logging levels management |
| |
| <p> |
| Logging level management is done using the "trace_level" file located in the |
| driver's proc interface subdirectory. Each SCST driver has it, except in |
| the perf build mode. For instance, for the SCST core it's located in |
| /proc/scsi_tgt/. For qla2x00t it's located in /proc/scsi_tgt/qla2x00tgt/. |
| |
| By reading it you can find the currently enabled logging levels. |
| |
| You can change them by writing to this file, like: |
| |
| <verb> |
| # echo "add scsi" >/proc/scsi_tgt/trace_level |
| </verb> |
| |
| The following commands are available: |
| |
| <itemize> |
| |
| <item> <bf/add trace_level/ - adds (enables) the corresponding trace level |
| |
| <item> <bf/del trace_level/ - deletes (disables) the corresponding trace level |
| |
| <item> <bf/set mask/ - sets all trace levels at once using a mask, e.g. |
| 0x1538 |
| |
| <item> <bf/all/ - enables all trace levels |
| |
| <item> <bf/none/ - disables all trace levels |
| |
| <item> <bf/default/ - sets all trace levels to their default values |
| |
| <item> <bf/dump_prs dev_name/ - dumps Persistent Reservations states for |
| device "dev_name" |
| |
| </itemize> |
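
The effect of these commands on the set of enabled levels can be modelled as operations on a bit mask. The following user-space sketch shows the idea only. The bit values, the default mask and the function names are hypothetical (the real values are defined by SCST), only four level names are included, and the dump_prs command is left out because it doesn't change the mask.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical bit assignments; the real values are defined by SCST. */
#define LVL_SCSI	0x1ul
#define LVL_MGMT	0x2ul
#define LVL_MINOR	0x4ul
#define LVL_OUT_OF_MEM	0x8ul
#define LVL_ALL		(LVL_SCSI | LVL_MGMT | LVL_MINOR | LVL_OUT_OF_MEM)
#define LVL_DEFAULT	(LVL_MGMT | LVL_OUT_OF_MEM)	/* assumed default */

/* Maps a level name to its bit, or returns 0 for an unknown name. */
static unsigned long level_bit(const char *name)
{
	if (strcmp(name, "scsi") == 0)
		return LVL_SCSI;
	if (strcmp(name, "mgmt") == 0)
		return LVL_MGMT;
	if (strcmp(name, "minor") == 0)
		return LVL_MINOR;
	if (strcmp(name, "out_of_mem") == 0)
		return LVL_OUT_OF_MEM;
	return 0;
}

/* Applies one command line to *mask. Returns 0 on success, -1 on error. */
int apply_trace_cmd(unsigned long *mask, const char *cmd)
{
	char verb[16] = "", arg[32] = "";
	unsigned long bit;
	int n = sscanf(cmd, "%15s %31s", verb, arg);

	if (n < 1)
		return -1;
	if (n == 1) {		/* commands without an argument */
		if (strcmp(verb, "all") == 0)
			*mask = LVL_ALL;
		else if (strcmp(verb, "none") == 0)
			*mask = 0;
		else if (strcmp(verb, "default") == 0)
			*mask = LVL_DEFAULT;
		else
			return -1;
		return 0;
	}
	if (strcmp(verb, "set") == 0) {	/* "set 0x1538" replaces the mask */
		char *end;
		unsigned long v = strtoul(arg, &end, 0);

		if (end == arg || *end != '\0')
			return -1;
		*mask = v;
		return 0;
	}
	bit = level_bit(arg);
	if (bit == 0)
		return -1;
	if (strcmp(verb, "add") == 0)
		*mask |= bit;		/* enable one level */
	else if (strcmp(verb, "del") == 0)
		*mask &= ~bit;		/* disable one level */
	else
		return -1;
	return 0;
}
```

In this model "add" and "del" change a single bit and leave the other levels untouched, while "set", "all", "none" and "default" replace the whole mask.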
| |
| The following trace levels are common to all drivers: |
| |
| <itemize> |
| |
| <item> <bf/function/ - enables printing the corresponding function name |
| for each logged message |
| |
| <item> <bf/line/ - enables printing the corresponding numbers of line of |
| code for each logged message |
| |
| <item> <bf/pid/ - enables printing PIDs of the corresponding processes |
| or threads for each logged message |
| |
| <item> <bf/scsi/ - enables logging of processed SCSI commands and their |
| processing results |
| |
| <item> <bf/mgmt/ - enables logging of processed Task Management functions |
| |
| <item> <bf/minor/ - enables logging of minor events, like unknown SCSI |
| commands or differences between buffer lengths encoded in CDBs and |
| expected transfer values |
| |
| <item> <bf/out_of_mem/ - enables logging of out of memory events |
| |
| <item> <bf/entryexit/ - enables logging of function entry and exit. Not |
| available in the release build. |
| |
| <item> <bf/mem/ - enables logging of memory allocation and freeing. Not |
| available in the release build. |
| |
| <item> <bf/debug/ - enables various debug logging messages. Not |
| available in the release build. |
| |
| <item> <bf/buff/ - enables logging of the contents of various buffers. Not |
| available in the release build. |
| |
| <item> <bf/sg/ - enables logging of SG vector manipulations. Not |
| available in the release build. |
| |
| <item> <bf/mgmt_dbg/ - enables debug logging of Task Management |
| functions processing. Not available in the release build. |
| |
| <item> <bf/special/ - enables logging of "special" events. Intended to |
| temporarily enable logging of some debug messages without enabling the |
| whole "debug" level. Not available in the release build. |
| |
| </itemize> |
| |
| The following trace levels are additionally available for SCST core: |
| |
| <itemize> |
| |
| <item> <bf/scsi_serializing/ - enables logging of the processing of SCSI |
| command task attributes (SIMPLE, ORDERED, etc.). Not available in the |
| release build. |
| |
| <item> <bf/retry/ - enables logging of retries of rdy_to_xfer() and |
| xmit_response() target drivers callbacks. Not available in the release |
| build. |
| |
| <item> <bf/recv_bot/, <bf/send_bot/, <bf/recv_top/, <bf/send_top/ - |
| enable logging of command buffers at various processing stages. Not |
| available in the release build. |
| |
| </itemize> |
| |
| <sect1> Preparing a debug kernel |
| |
| <p> |
| SCST logging can produce a huge amount of output, which the default kernel |
| configuration can't cope with, so some extra adjustments are needed. |
| |
| For that you should change the range of LOG_BUF_SHIFT in lib/Kconfig.debug |
| or init/Kconfig, depending on your kernel version, from "12 21" to "12 25". |
| |
| Then you should set CONFIG_LOG_BUF_SHIFT to 25 in your .config. |
| |
| Also, the Linux kernel has a lot of helpful debug facilities, like lockdep, |
| which allows catching various deadlocks, or memory allocation debugging. |
| It is recommended to enable them during SCST debugging. |
| |
| The following options are recommended to be enabled (availability depends |
| on your kernel version): CONFIG_SLUB_DEBUG, CONFIG_PRINTK_TIME, |
| CONFIG_MAGIC_SYSRQ, CONFIG_DEBUG_FS, CONFIG_DEBUG_KERNEL, |
| CONFIG_DEBUG_SHIRQ, CONFIG_DETECT_SOFTLOCKUP, CONFIG_DETECT_HUNG_TASK, |
| CONFIG_SLUB_DEBUG_ON, CONFIG_SLUB_STATS, CONFIG_DEBUG_PREEMPT, |
| CONFIG_DEBUG_RT_MUTEXES, CONFIG_DEBUG_PI_LIST, CONFIG_DEBUG_SPINLOCK, |
| CONFIG_DEBUG_MUTEXES, CONFIG_DEBUG_LOCK_ALLOC, CONFIG_PROVE_LOCKING, |
| CONFIG_LOCKDEP, CONFIG_LOCK_STAT, CONFIG_DEBUG_SPINLOCK_SLEEP, |
| CONFIG_STACKTRACE, CONFIG_DEBUG_BUGVERBOSE, CONFIG_DEBUG_VM, |
| CONFIG_DEBUG_VIRTUAL, CONFIG_DEBUG_WRITECOUNT, CONFIG_DEBUG_MEMORY_INIT, |
| CONFIG_DEBUG_LIST, CONFIG_DEBUG_SG, CONFIG_DEBUG_NOTIFIERS, |
| CONFIG_FRAME_POINTER, CONFIG_FAULT_INJECTION, CONFIG_FAILSLAB, |
| CONFIG_FAIL_PAGE_ALLOC, CONFIG_FAIL_MAKE_REQUEST, |
| CONFIG_FAIL_IO_TIMEOUT, CONFIG_FAULT_INJECTION_DEBUG_FS, |
| CONFIG_FAULT_INJECTION_STACKTRACE_FILTER. |
| |
| <sect1> Preparing logging subsystem |
| |
| <p> |
| It is recommended that the system logger daemon on the target be configured: |
| |
| <itemize> |
| |
| <item> To store kernel logs in separate files on the fastest disk you |
| have. It is better if this disk is dedicated to logging or, at |
| least, doesn't contain your LUNs' data. |
| |
| <item> To write the kernel logs to the disk asynchronously, i.e. |
| without calling fsync() after each written message. Usually, you can |
| achieve this by adding a '-' sign before the corresponding file path in |
| your syslog daemon configuration file, like: |
| |
| <verb> |
| kern.* -/var/log/kern.log |
| </verb> |
| |
| </itemize> |
| |
| <sect1> Decoding OOPS messages |
| |
| <p> |
| You can map an OOPS message to the corresponding line in a C file |
| using the gdb "l" command. For example, suppose an OOPS message has the line: |
| |
| <verb> |
| [<ffffffff88646174>&rsqb :iscsi_scst:iscsi_extracheck_is_rd_thread+0x94/0xb0 |
| </verb> |
| |
| You can decode it by: |
| |
| <verb> |
| $ gdb iscsi-scst.ko |
| (gdb) l *iscsi_extracheck_is_rd_thread+0x94 |
| </verb> |
| |
| For that the corresponding module (iscsi-scst.ko) should be built with |
| debug info. But modules don't always have debug info built in. To |
| work around it you can add the "-g" flag in the corresponding Makefile |
| (without changing anything else!) or use "make menuconfig" to enable |
| building the kernel with debug info in .config. Then rebuild only the .o |
| file you need. |
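
When many frames have to be decoded, the gdb command can be derived from the OOPS line mechanically. The following sketch assumes the frame format shown in the example above (address in brackets, then module, symbol, offset and size) and builds a gdb "l" command from the symbol and offset only. The function name oops_frame_to_gdb() is hypothetical, and other kernel versions may print frames in a different format.

```c
#include <stdio.h>
#include <string.h>

/*
 * Builds a gdb "l" command from one OOPS frame line of the form
 *   [<address>] :module:symbol+0xOFFSET/0xSIZE
 * Returns 0 on success, -1 if the line doesn't match this form or
 * the output buffer is too small.
 */
int oops_frame_to_gdb(const char *line, char *out, size_t out_size)
{
	char module[64], symbol[128];
	unsigned long offset, size;
	const char *p = strchr(line, ':');	/* first ':' starts ":module:" */
	int n;

	if (p == NULL)
		return -1;
	/* %lx accepts the "0x" prefix, so the offsets parse directly. */
	if (sscanf(p, ":%63[^:]:%127[^+]+%lx/%lx",
		   module, symbol, &offset, &size) != 4)
		return -1;
	n = snprintf(out, out_size, "l *%s+0x%lx", symbol, offset);
	return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

The module name is parsed but not used in the command, since it only tells you which .ko file to open in gdb.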
| |
| For instance, to decode an OOPS in mm/filemap.c in the kernel you need to |
| enable building the kernel with debug info in .config and then run: |
| |
| <verb> |
| $ make mm/filemap.o |
| ... |
| $ gdb mm/filemap.o |
| </verb> |
| |
| </article> |