| Before asking any questions directly or in scst-devel mailing list make |
| sure that you read *ALL* relevant documentation files (at least, 2 |
| README files: one for SCST and one for the target driver you are using) |
| and *understood* *ALL* written there. I personally very much like |
| working with people who understand what they are doing and hate when |
| somebody tries to use me as a replacement for his brain to save his time |
| on expense of mine. So, in such cases it shouldn't be a surprise if your |
| question will not be answered or will be answered in the RTFM style. |
| |
| See a very good guide "How To Ask Questions The Smart Way" in |
| http://www.catb.org/~esr/faqs/smart-questions.html. |
| |
| Sorry, if the above might sound too harsh. Unfortunately, we, SCST |
| developers, have limited abilities and can't waste them keeping |
| explaining basic concepts and answering on the same questions again and |
| again. |
| |
| Examples of too FAQ areas are "What are those aborts and resets, which |
| your target from time to time logging, mean and what to do with them?", |
| "Do they relate to I/O stalls I sometimes experience" and "Why after |
| them my device was put offline?". |
| |
| So, as a bottom line, don't ask questions answers on which you can find |
| out yourself by a simple documentation reading and minimal thinking |
| effort. |
| |
| If you experience kernel crash, hang, etc., you should follow |
| REPORTING-BUGS file from your kernel source tree. |
| |
| For most questions it is very desirable if you attach to your message |
| full kernel log from the target since it's booted. Note, *SINCE IT |
| BOOTED*. Please don't try to be smart and filter out what's you |
| think isn't needed. What's usually removed could allow us to see the |
| target and SCST configurations. |
| |
| Please, NEVER send dmesg output without timestamps, because timestamps |
| are very important to see the whole picture. You should either enable |
| CONFIG_PRINTK_TIME kernel compile option, or use kernel logs your system |
| logger stored for you in /var/log. In case if you enabled a trace option |
| producing a lot of log data, you should make NOT CORRUPTED logs as |
| described in section "Dealing with massive logs" of the SCST README. |
| |
| ****************************************************** |
| ****************************************************** |
| **!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!** |
| **!! ALWAYS COMPRESS YOUR LOGS USING "bzip2 -9" !!** |
| **!! OR, IF THEY ARE SMALL (<10K), MAKE SURE YOUR !!** |
| **!! EDITOR OR MAILER NOT WORD-WRAP LONG LINES !!** |
| **!! (TO BE SURE ALWAYS SEND LOGS AS ATTACHMENTS) !!** |
| **!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!** |
| ****************************************************** |
| ****************************************************** |
| |
| Example of a really bad question: |
| |
| ====================================================================== |
| |
| In our user space driver , i use epoll_wait to wait on multiple file |
| descriptors for multiple devices. Apparently when i wait on the ioctl in |
| blocking mode , everything works well , but when i wait on epoll , and |
| try to attach a target device , i get immediately a "Bad address" error |
| value from the epoll. |
| |
| What is the reason ? |
| |
| ---------------------------------------------------------------------- |
| |
| This question is bad, because, apparently, the author was doing |
| something wrong with epoll, but instead of checking the scst_user source |
| code to find out when "Bad address" error can be returned and understand |
| possible reasons for it, he expected others to do that for him. He even |
| didn't bothered to look in the kernel log, where, very probably, the |
| reason of the error was logged. |
| |
| |
| Here are three examples of good questions: |
| |
| ====================================================================== |
| |
| I'm looking for a help in understanding of SCST internal architecture |
| and operation. The problem I'm experiencing now is that SCST seems to |
| process deferred commands incorrectly in some cases. More specifically, |
| I'm confused with the 'while' loop in scst_send_to_midlev function. |
| |
| As far as I understand, the basic execution path consists of a call to |
| scst_do_send_midlev followed by taking of a decision on this command |
| (continue with this command, reschedule it, or move to the next one), |
| the decision is stored in 'int res', which is then returned from the |
| function. |
| |
| However, if there are deferred commands on the device, the function does |
| not return but makes another call to scst_do_send_to_midlev, analyzes |
| the return code again and stores the decision in 'int res' thereby |
| erasing the decision for the previous command. If scst_send_to_midlev |
| exits now, it will return the _new_ decision (for the deferred command) |
| whereas the scst_process_active_cmd will think that it is the decision |
| for the command that was originally passed to scst_send_to_midlev. |
| |
| For example, this will cause problems in the following situation: |
| 1. scst_send_to_midlev is called with cmd == 0x80000100 |
| 2. scst_do_send_to_midlev is called with cmd == 0x8000100 |
| 3. scst_do_send_to_midlev returns with SCST_EXEC_COMPLETED |
| (in certain scenarios the command is already destroyed at this point) |
| 4. scst_check_deferred_commands finds the deferred cmd == 0x80000200 |
| 5. scst_do_send_to_midlev is called with cmd == 0x80000200 |
| 6. scst_do_send_to_midlev returns with SCST_EXEC_NEED_THREAD |
| 7. scst_send_to_midlev returns with SCST_CMD_STATE_RES_NEED_THREAD |
| 8. Now, the scst_process_active_cmd will try to reschedule command 0x8000100 |
| which is already destroyed at this point ! |
| |
| Can anyone on the list confirm my guess? Or, this situation should never |
| happen because of some other condition which I may have missed? Right |
| now I can't think of any of simple methods to work around the issue, |
| i.e. any of my ideas require rewriting significant part of the code. |
| |
| ====================================================================== |
| |
| Hello, |
| |
| I have two machines (SCST targets) with the following parameters: |
| - two dual core Xeon CPUs |
| - QLA2342 FC HBA |
| - Areca SATA RAID HBA |
| - Linux 2.6.21.3, running in 64 bit mode with 16G RAM |
| - SCST trunk version |
| |
| On the client side there is a Solaris 10 U3 machine, with the same (chip |
| wise) Qlogic controller. |
| |
| There is an FC switch between the three machines, and each of the |
| targets are zoned to the client's port in a one-by-one manner, so HBA |
| port 1 sees only target 1 and port 2 sees only target 2. |
| |
| The targets are configured with two large sparse files on XFS (8 TB |
| each, with dd if=/dev/zero of=file bs=1M count=0 seek=8388608). |
| |
| In Solaris I do various tests with SVM (Sun's built in volume manager) |
| and multiterabyte UFS. Occasionally, there are some strange write |
| errors, where the volume manager drops its volumes and without a VM, a |
| simple UFS fs write can fail too. |
| |
| I see various errors logged by the kernel (Solaris'), these are some |
| examples, both with and without SVM: |
| Jun 21 10:42:14 solaris fctl: [ID 517869 kern.warning] WARNING: |
| fp(1)::GPN_ID for D_ID=621200 failed |
| Jun 21 10:42:14 solaris fctl: [ID 517869 kern.warning] WARNING: |
| fp(1)::N_x Port with D_ID=621200, PWWN=210000e08b944419 disappeared from |
| fabric |
| Jun 21 10:42:53 solaris scsi: [ID 107833 kern.warning] WARNING: |
| /pci@1,0/pci1022,7450@a/pcie11,105@1,1/fp@0,0/disk@w210000e08b944419,0 |
| (sd1): |
| Jun 21 10:42:53 solaris SCSI transport failed: reason |
| 'tran_err': retrying command |
| Jun 21 10:43:06 solaris scsi: [ID 107833 kern.warning] WARNING: |
| /pci@1,0/pci1022,7450@a/pcie11,105@1,1/fp@0,0/disk@w210000e08b944419,0 |
| (sd1): |
| Jun 21 10:43:06 solaris SCSI transport failed: reason 'timeout': |
| retrying command |
| Jun 21 10:43:13 solaris scsi: [ID 107833 kern.notice] Device is gone |
| Jun 21 10:43:13 solaris scsi: [ID 107833 kern.warning] WARNING: |
| /pci@1,0/pci1022,7450@a/pcie11,105@1,1/fp@0,0/disk@w210000e08b944419,0 |
| (sd1): |
| Jun 21 10:43:13 solaris transport rejected fatal error |
| Jun 21 10:43:13 solaris md_stripe: [ID 641072 kern.warning] WARNING: md: |
| d10: write error on /dev/dsk/c2t210000E08B944419d0s6 |
| Jun 21 10:43:13 solaris last message repeated 9 times |
| Jun 21 10:43:13 solaris scsi: [ID 243001 kern.info] |
| /pci@1,0/pci1022,7450@a/pcie11,105@1,1/fp@0,0 (fcp1): |
| Jun 21 10:43:13 solaris offlining lun=0 (trace=0), target=621200 |
| (trace=2800004) |
| Jun 21 10:43:13 solaris ufs: [ID 702911 kern.warning] WARNING: Error |
| writing master during ufs log roll |
| Jun 21 10:43:13 solaris ufs: [ID 127457 kern.warning] WARNING: ufs log |
| for /mnt changed state to Error |
| Jun 21 10:43:13 solaris ufs: [ID 616219 kern.warning] WARNING: Please |
| umount(1M) /mnt and run fsck(1M) |
| Jun 21 11:08:55 solaris scsi: [ID 107833 kern.warning] WARNING: |
| /pci@1,0/pci1022,7450@a/pcie11,105@1,1/fp@0,0/disk@w210000e08b944419,0 |
| (sd1): |
| Jun 21 11:08:55 solaris offline or reservation conflict |
| Jun 21 11:09:41 solaris scsi: [ID 107833 kern.warning] WARNING: |
| /pci@1,0/pci1022,7450@a/pcie11,105@1,1/fp@0,0/disk@w210000e08b944419,0 |
| (sd1): |
| Jun 21 11:09:41 solaris offline or reservation conflict |
| Jun 21 11:09:41 solaris scsi: [ID 107833 kern.warning] WARNING: |
| /pci@1,0/pci1022,7450@a/pcie11,105@1,1/fp@0,0/disk@w210000e08b944419,0 |
| (sd1): |
| Jun 21 11:09:41 solaris offline or reservation conflict |
| Jun 21 11:09:41 solaris scsi: [ID 107833 kern.warning] WARNING: |
| /pci@1,0/pci1022,7450@a/pcie11,105@1,1/fp@0,0/disk@w210000e08b944419,0 |
| (sd1): |
| Jun 21 11:09:41 solaris i/o to invalid geometry |
| Jun 21 11:09:41 solaris scsi: [ID 107833 kern.warning] WARNING: |
| /pci@1,0/pci1022,7450@a/pcie11,105@1,1/fp@0,0/disk@w210000e08b944419,0 |
| (sd1): |
| Jun 21 11:09:41 solaris offline or reservation conflict |
| Jun 21 11:09:41 solaris scsi: [ID 107833 kern.warning] WARNING: |
| /pci@1,0/pci1022,7450@a/pcie11,105@1,1/fp@0,0/disk@w210000e08b944419,0 |
| (sd1): |
| Jun 21 11:09:41 solaris i/o to invalid geometry |
| Jun 21 11:09:41 solaris scsi: [ID 107833 kern.warning] WARNING: |
| /pci@1,0/pci1022,7450@a/pcie11,105@1,1/fp@0,0/disk@w210000e08b944419,0 |
| (sd1): |
| Jun 21 11:09:41 solaris offline or reservation conflict |
| Jun 21 11:09:41 solaris scsi: [ID 107833 kern.warning] WARNING: |
| /pci@1,0/pci1022,7450@a/pcie11,105@1,1/fp@0,0/disk@w210000e08b944419,0 |
| (sd1): |
| Jun 21 11:09:41 solaris i/o to invalid geometry |
| Jun 21 11:09:43 solaris scsi: [ID 107833 kern.warning] WARNING: |
| /pci@1,0/pci1022,7450@a/pcie11,105@1,1/fp@0,0/disk@w210000e08b944419,0 |
| (sd1): |
| Jun 21 11:09:43 solaris offline or reservation conflict |
| Jun 21 11:09:43 solaris scsi: [ID 107833 kern.warning] WARNING: |
| /pci@1,0/pci1022,7450@a/pcie11,105@1,1/fp@0,0/disk@w210000e08b944419,0 |
| (sd1): |
| Jun 21 11:09:43 solaris SYNCHRONIZE CACHE command failed (5) |
| |
| I don't see anything in the dmesg on the target side. |
| |
| After these errors SCST seems to be dead. I can't unload its modules and |
| can't communicate it via /proc. |
| A simple cat vdisk just waits and waits. |
| |
| Could you please help? What should I set/collect/send in this case to |
| help resolving this issue? |
| |
| ====================================================================== |
| |
| Hello, |
| |
| I am trying to get scst working on an Opteron machine. |
| |
| After some hours, playing with different kernel versions and different |
| missing functions, I've sticked with a 2.6.15 and a |
| drivers/scsi/scsi_lib.c hack from 2.6.14, which contains the |
| scsi_wait_req. (Linux is a mess, each point release changes something. |
| How can developers keep up with this?) |
| |
| Now everything seems to be OK, I could load the modules and such. |
| |
| I have a setup of two machines connected to each other in an FC-P2P |
| manner. The two machines has two 2G links between them. On the initiator |
| side I have FreeBSD, because I know that better and this is what I did |
| some target mode tests. |
| |
| The strange thing is that the loop seems to be only running at 1 Gbps: |
| [ 61.731265] QLogic Fibre Channel HBA Driver |
| [ 61.731454] GSI 21 sharing vector 0xD1 and IRQ 21 |
| [ 61.731563] ACPI: PCI Interrupt 0000:06:01.0[A] -> GSI 36 (level, low) -> IRQ 21 |
| [ 61.731821] qla2300 0000:06:01.0: Found an ISP2312, irq 21, iobase 0xffffc200 |
| 00014000 |
| [ 61.732194] qla2300 0000:06:01.0: Configuring PCI space... |
| [ 61.732441] qla2300 0000:06:01.0: Configure NVRAM parameters... |
| [ 61.816885] qla2300 0000:06:01.0: Verifying loaded RISC code... |
| [ 61.852177] qla2300 0000:06:01.0: Extended memory detected (512 KB)... |
| [ 61.852294] qla2300 0000:06:01.0: Resizing request queue depth (2048 -> 4096) |
| ... |
| [ 61.852604] qla2300 0000:06:01.0: LIP reset occurred (f8e8). |
| [ 61.852740] qla2300 0000:06:01.0: Waiting for LIP to complete... |
| [ 62.865911] qla2300 0000:06:01.0: LIP occurred (f7f7). |
| [ 62.866042] qla2300 0000:06:01.0: LOOP UP detected (1 Gbps). |
| [ 62.866269] qla2300 0000:06:01.0: Topology - (Loop), Host Loop address 0x0 |
| [ 62.868285] scsi0 : qla2xxx |
| [ 62.868507] qla2300 0000:06:01.0: |
| [ 62.868507] QLogic Fibre Channel HBA Driver: 8.01.03-k |
| [ 62.868508] QLogic QLA2312 - |
| [ 62.868509] ISP2312: PCI-X (100 MHz) @ 0000:06:01.0 hdma+, host#=0, fw=3.03.18 IPX |
| |
| |
| I did the following: |
| modprobe qla2x00tgt: |
| |
| [ 104.988170] qla2x00tgt: no version for "scst_unregister" found: kernel tainted. |
| |
| echo "open lun0 /data/lun0" >/proc/scsi_tgt/disk_fileio/disk_fileio" |
| [ 169.102877] scst: Device handler disk_fileio for type 0 loaded successfully |
| [ 169.103002] scst: Device handler cdrom_fileio for type 5 loaded successfully |
| [ 191.261000] dev_fileio: Attached SCSI target virtual disk lun0 (file="/data/l |
| un0", fs=1000001MB, bs=512, nblocks=2048002048, cyln=1000001) |
| [ 191.261191] scst: Attached SCSI target mid-level to virtual device lun0 (id 1 |
| ) |
| |
| and |
| echo "add lun0 0" > /proc/scsi_tgt/groups/Default/devices |
| |
| On the other side a camcontrol rescan all (SCSI rescan) gives me the following with a verbose logging kernel: |
| Mar 29 18:09:17 blade2 kernel: pass1: <SCST_FIO lun0 093> Fixed Direct Access SCSI-4 device |
| Mar 29 18:09:17 blade2 kernel: pass1: Serial Number 383 |
| Mar 29 18:09:17 blade2 kernel: pass1: 100.000MB/s transfers |
| Mar 29 18:09:17 blade2 kernel: da1 at isp0 bus 0 target 0 lun 0 |
| Mar 29 18:09:17 blade2 kernel: da1: <SCST_FIO lun0 093> Fixed Direct Access SCSI-4 device |
| Mar 29 18:09:17 blade2 kernel: da1: Serial Number 383 |
| Mar 29 18:09:17 blade2 kernel: da1: 100.000MB/s transfers |
| Mar 29 18:09:17 blade2 kernel: da1: 1024MB (2097152 512 byte sectors: 64H 32S/T 1024C) |
| Mar 29 18:09:17 blade2 kernel: (probe0:isp0:0:0:1): error 6 |
| Mar 29 18:09:17 blade2 kernel: (probe0:isp0:0:0:1): Unretryable Error |
| Mar 29 18:09:17 blade2 kernel: isp0: data overrun for command on 0.0.0 |
| Mar 29 18:09:17 blade2 kernel: (da1:isp0:0:0:0): Data Overrun |
| Mar 29 18:09:17 blade2 kernel: (da1:isp0:0:0:0): Retrying Command |
| Mar 29 18:09:18 blade2 kernel: (probe0:isp0:0:0:2): error 6 |
| Mar 29 18:09:18 blade2 kernel: (probe0:isp0:0:0:2): Unretryable Error |
| Mar 29 18:09:18 blade2 kernel: isp0: data overrun for command on 0.0.0 |
| Mar 29 18:09:18 blade2 kernel: (da1:isp0:0:0:0): Data Overrun |
| Mar 29 18:09:18 blade2 kernel: (da1:isp0:0:0:0): Retrying Command |
| Mar 29 18:09:18 blade2 kernel: (probe0:isp0:0:0:3): error 6 |
| Mar 29 18:09:18 blade2 kernel: (probe0:isp0:0:0:3): Unretryable Error |
| Mar 29 18:09:18 blade2 kernel: isp0: data overrun for command on 0.0.0 |
| Mar 29 18:09:18 blade2 kernel: (da1:isp0:0:0:0): Data Overrun |
| Mar 29 18:09:18 blade2 kernel: (da1:isp0:0:0:0): Retrying Command |
| Mar 29 18:09:18 blade2 kernel: (probe0:isp0:0:0:4): error 6 |
| Mar 29 18:09:18 blade2 kernel: (probe0:isp0:0:0:4): Unretryable Error |
| Mar 29 18:09:18 blade2 kernel: isp0: data overrun for command on 0.0.0 |
| Mar 29 18:09:18 blade2 kernel: (da1:isp0:0:0:0): Data Overrun |
| Mar 29 18:09:18 blade2 kernel: (da1:isp0:0:0:0): Retrying Command |
| Mar 29 18:09:18 blade2 kernel: (probe0:isp0:0:0:5): error 6 |
| Mar 29 18:09:18 blade2 kernel: (probe0:isp0:0:0:5): Unretryable Error |
| Mar 29 18:09:18 blade2 kernel: isp0: data overrun for command on 0.0.0 |
| Mar 29 18:09:18 blade2 kernel: (da1:isp0:0:0:0): Data Overrun |
| Mar 29 18:09:18 blade2 kernel: (da1:isp0:0:0:0): error 5 |
| Mar 29 18:09:18 blade2 kernel: (da1:isp0:0:0:0): Retries Exhausted |
| Mar 29 18:09:18 blade2 kernel: (probe0:isp0:0:0:6): error 6 |
| Mar 29 18:09:18 blade2 kernel: (probe0:isp0:0:0:6): Unretryable Error |
| Mar 29 18:09:19 blade2 kernel: (probe0:isp0:0:0:7): error 6 |
| Mar 29 18:09:19 blade2 kernel: (probe0:isp0:0:0:7): Unretryable Error |
| |
| |
| The device is there, but I cannot use it. |
| |
| BTW, the target mode machine (Linux) runs on a dual Opteron in 64 bit |
| mode, with 8GB of RAM. I've lowered it with mem=800M, but the effect is |
| the same. |
| |
| Assuming that mixed 2.6.14-.15 kernel is the fault, could you please |
| tell me what version should I use, for which all of the patches will |
| work? |
| |
| ====================================================================== |
| |
| Vladislav Bolkhovitin <vst@vlnb.net>, http://scst.sourceforge.net |