OVERVIEW.scsi
-------------

This file describes the way the iscsi_initiator utilizes the
interface provided by the Linux SCSI mid-level subsystem.

Error handling:
--------------

use_new_eh_code should be set to 1 in the Scsi_Host_Template.


Addressing:
----------

SCSI has a 4-level addressing scheme:
	Linux		SCSI				/proc/scsi/scsi
	name		name				name
	----		----				----
	host		Adaptor Number		Host
	bus			Channel Number		Channel
	target		Id Number			Id
	lun			Logical Unit Number	Lun

The "host" is a number assigned by the Linux SCSI subsystem to Host Bus
Adaptors as they register by calling scsi_register().  These numbers
are sequential, starting with 0.  If iSCSI is the only SCSI adaptor
on this system, then it will register as number 0.

The "bus" is always 0 for us -- it is supposed to identify which of the
hardware buses accessible to an HBA is being addressed.

The "target" is the number of the target device on the SCSI bus.
This corresponds to a single session in iSCSI, so that the first
session is target number 0, the second 1, etc.

The "lun" is always 0 in Linux.


queuecommand handling:
---------------------

The SCSI mid-level always does a:
	spin_lock_irqsave(&io_request_lock, flags);
prior to every call to the driver's queuecommand function in the
Scsi_Host_Template, and a
	spin_unlock_irqrestore(&io_request_lock, flags);
when that function returns back to the mid-level.  Therefore, while in
the iscsi_initiator_queuecommand() function, the "io_request_lock" is held
by the running process.

The value returned by the driver's queuecommand should be:
	==0	Success
	!=0	Failure


The SCSI mid-level always passes the function scsi_done() as the second
parameter on every call to the driver's queuecommand function in the
Scsi_Host_Template.

According to comments in "/usr/src/linux/drivers/scsi/hosts.h":

	The done() function must only be called after QueueCommand() has returned!!

However, a review of many drivers shows that they often call done() before
returning from the QueueCommand() function when the command can be
completed immediately (usually due to an error).

The done() function must be called with the "io_request_lock" held by the
caller.

The result passed in the result field of the command when the driver calls
the mid-level's done() function is encoded as:
			byte 0	SCSI status code
			byte 1	SCSI 1-byte message
			byte 2	host error byte (i.e., DID_xxx << 16)
			byte 3	mid-level error return

DID_xxx host error codes are defined in "/usr/src/linux/drivers/scsi/scsi.h".
These are always shifted left by 16 bits to get them into byte 2!!
Some common codes are:
	DID_OK			0x00	No error
	DID_NO_CONNECT	0x01	No connect before timeout
	DID_BAD_TARGET	0x04	Bad target
	DID_ABORT		0x05	Command aborted by mid-level
	DID_ERROR		0x07	Internal error
	DID_RESET		0x08	Command reset by mid-level
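
The byte layout of the result field described above can be sketched in
user-space C as follows.  This is only an illustration: make_result(),
result_status() and result_host() are hypothetical helper names invented
for this sketch, not functions provided by the SCSI mid-level, and only a
few of the DID_xxx values listed above are repeated here.

```c
#include <stdint.h>

/* A few host error codes, with the values given in the list above. */
#define DID_OK         0x00
#define DID_NO_CONNECT 0x01
#define DID_ERROR      0x07

/* Hypothetical helper (not a kernel function): pack the 4 result bytes. */
static uint32_t make_result(uint8_t midlevel, uint8_t host,
                            uint8_t msg, uint8_t status)
{
	return ((uint32_t)midlevel << 24) |	/* byte 3: mid-level error return */
	       ((uint32_t)host << 16) |		/* byte 2: DID_xxx << 16 */
	       ((uint32_t)msg << 8) |		/* byte 1: SCSI 1-byte message */
	       (uint32_t)status;		/* byte 0: SCSI status code */
}

/* Hypothetical helpers: pull individual bytes back out of a result. */
static uint8_t result_status(uint32_t result)
{
	return (uint8_t)(result & 0xff);
}

static uint8_t result_host(uint32_t result)
{
	return (uint8_t)((result >> 16) & 0xff);
}
```

For example, make_result(0, DID_ERROR, 0, 0x02) gives 0x00070002, i.e. an
internal host error combined with SCSI status 0x02 (CHECK CONDITION).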


For purposes of synchronization, we depend on the fundamental premise of the
Linux kernel, which is that a process or thread cannot be preempted once it
is running in kernel mode.

This means that once a process or thread gets the cpu in kernel mode, it runs
until it voluntarily gives up the cpu, and while it is running no other process
or thread will run in kernel mode on that cpu.  In an SMP system, there can be
other processes or threads in kernel mode on other cpus, which is why we need
spinlocks.

Since the scsi mid-level always locks/unlocks the "io_request_lock" around
calls to the queuecommand(), eh_abort_handler(), eh_device_reset_handler(),
etc., and because the done() call-back function expects this same lock to
be held around a call to it, we will also use this same lock for our
functions in order to prevent race conditions in SMP systems.

Beware -- these spinlocks have not been tested yet on a multiprocessor system.


Processing PDUs sent by the target
----------------------------------

The headers for all PDUs sent by the target will be received by the rx_thread
for a connection (each connection has its own rx_thread).

The only opcodes that should be received from a target are:

													ITT			related command
	0x20	Nop In								F=1	optional	-none- or NopOut
	0x21	SCSI Response						F=1	required	SCSI Command
	0x22	Task Management Response	DSL=0	F=1	required	T.M. Request
	0x23	Login Response							required	Login Request
	0x24	Text Response							required	Text Request
	0x25	Data In									required	SCSI Command
	0x26	Logout Response				DSL=0	F=1	required	Logout Request
	0x31	R2T							DSL=0	F=1	required	SCSI Command
	0x32	Asynchronous Message				F=1	reserved	-none-
	0x3f	Reject								F=1	reserved	-none-
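
One possible way to hold the table above in code is a small lookup array.
The sketch below is an assumption about representation only; the struct,
enum and function names are hypothetical and are not taken from the driver
source.  It reads a "DSL=0" entry as "DSL must be zero" and an "F=1" entry
as "F must be one"; rows without such an entry are treated as unconstrained.

```c
#include <stddef.h>
#include <stdint.h>

enum itt_rule { ITT_RESERVED, ITT_OPTIONAL, ITT_REQUIRED };

/* Hypothetical per-opcode attributes, one row per line of the table. */
struct rx_opcode_info {
	uint8_t		opcode;
	const char	*name;
	int		dsl_must_be_zero;	/* "DSL=0" column */
	int		f_must_be_one;		/* "F=1" column */
	enum itt_rule	itt;			/* "ITT" column */
};

static const struct rx_opcode_info rx_opcodes[] = {
	{ 0x20, "Nop In",                   0, 1, ITT_OPTIONAL },
	{ 0x21, "SCSI Response",            0, 1, ITT_REQUIRED },
	{ 0x22, "Task Management Response", 1, 1, ITT_REQUIRED },
	{ 0x23, "Login Response",           0, 0, ITT_REQUIRED },
	{ 0x24, "Text Response",            0, 0, ITT_REQUIRED },
	{ 0x25, "Data In",                  0, 0, ITT_REQUIRED },
	{ 0x26, "Logout Response",          1, 1, ITT_REQUIRED },
	{ 0x31, "R2T",                      1, 1, ITT_REQUIRED },
	{ 0x32, "Asynchronous Message",     0, 1, ITT_RESERVED },
	{ 0x3f, "Reject",                   0, 1, ITT_RESERVED },
};

/* Return the table row for byte 0 of a received PDU, or NULL if the
 * opcode (low-order 6 bits) is not one a target may send. */
static const struct rx_opcode_info *lookup_rx_opcode(uint8_t byte0)
{
	uint8_t opcode = byte0 & 0x3f;
	size_t i;

	for (i = 0; i < sizeof(rx_opcodes) / sizeof(rx_opcodes[0]); i++)
		if (rx_opcodes[i].opcode == opcode)
			return &rx_opcodes[i];
	return NULL;
}
```

A NULL return corresponds to an illegal opcode; the other fields can drive
the F bit, ITT and Data Segment Length checks.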


The rx_thread does some standard processing on all received PDUs
before dispatching to a pdu-specific routine based on the opcode
field of the received PDU.  This standard processing includes:

1.	A check is made that the TotalAHSLength field (byte at offset 4 in
	each pdu) is zero (because we do not yet support Additional Header
	Segments of any kind).  This error is fatal.

2.	If Header Digests are in use, the digest is computed for the 48-byte
	pdu header.  If the computed CRC does not equal the CRC contained at
	the end of the header (4 bytes at offset 48 in each pdu) then an error
	message is printed.  At this point the PDU should be discarded.
	However, for now, we just keep going as if the CRC had been correct.
	This will change when we implement error recovery properly.

3.	A check is made that the old X bit (bit 7 in the byte at offset 0 in
	each pdu) is 0.  A warning is given if it is not.

4.	For drafts 9 and earlier, a check is made that the I bit (bit 6 in the
	byte at offset 0 in each pdu) is set to 1.  For drafts later than 9, the
	opposite check is made -- that the I bit (bit 6 in the byte at offset 0
	in each pdu) is set to 0 (reserved).

5.	A check is made that the opcode (the low-order 6 bits in the byte at
	offset 0 in each pdu) is legal (see the table above).  This error is
	fatal.

6.	A check is made that the F bit (bit 7 in the byte at offset 1 in each
	pdu) is 1.  If it is 0, a further check is made that the attributes
	of the reply opcode allow F = 0 and if not, a warning message is
	printed.

7.	A check is made that the ITT (the 4 bytes at offset 16 in each pdu) is
	legal (see the table above).  There are 3 cases:

	1.	ITT is reserved	- for drafts 9 and earlier, its value should be 0.
						- for drafts after 9, its value should be 0xffffffff.
	2.	ITT is optional	- if value is 0xffffffff then it is reserved, else
						  search pending commands list for related command
						  (which must be found and match the expected opcode).
	3.	ITT is required	- its value should NOT be 0xffffffff.  search pending
						  commands list for related command (which must be
						  found and match the expected opcode).

8.	A check is made that if the LUN (the 8 bytes at offset 8 in each pdu)
	is not 0 then the opcode allows this.  A warning is given if not.

9.	A check is made that if the Data Segment Length (the 3 bytes at offset
	5 in each pdu) is not 0 then the opcode allows this.  This error is fatal.

10.	When the DSL is legally > 0, a check is made that it does not exceed
	MaxRecvPDULength (for drafts 9 and later) or DataPDULength (for drafts
	before 9).

11.	For all response PDUs except DataIn, if the DSL is > 0 then read the
	attached data into a newly allocated buffer, and if Data Digests are in
	use, compute the digest and check it against the received digest value.
	This is all done in the function recv_locally().  Note that this is
	not done for DataIn PDUs because the data sent in them will be read
	directly into the buffers reserved for the related SCSI READ operation
	by the SCSI Mid-Level.  This read is done in the DataIn specific routine
	rx_data().

12.	For all response PDUs except DataIn with the S-bit NOT set to 1, the
	StatSN field received in the response PDU's header is checked against
	the exp_stat_sn value for the connection on which the PDU was received.
	If they are not equal, an error message is given.  For all those PDUs
	in which the StatSN value is checked, except for R2T or NopIn with
	a valid Target Transfer Tag (i.e., TTT != 0xffffffff), the
	exp_stat_sn value for the connection is then incremented by 1.

13.	For all response PDUs, the ExpCmdSN field received in the response PDU's
	header is checked against the cur_cmd_sn value for the session.
	If they are not equal, an error message is given.
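
The header offsets used in the steps above, and the StatSN rule of step 12,
can be sketched in user-space C as shown below.  This is a sketch under
stated assumptions, not the driver's code: all names are hypothetical,
multi-byte fields are assumed to be in network (big-endian) byte order, and
"bit 7" is taken to mean the most significant bit of a byte.  Step 12 is
read as "exp_stat_sn is not incremented for R2T, nor for NopIn with a TTT
other than 0xffffffff".  The offsets of TTT and StatSN are not given above,
so those values are passed in by the caller.

```c
#include <stdint.h>

#define ISCSI_HDR_LEN	48
#define OP_NOP_IN	0x20
#define OP_DATA_IN	0x25
#define OP_R2T		0x31
#define TAG_RESERVED	0xffffffffu

/* Hypothetical container for the fields examined by the common checks. */
struct rx_hdr_fields {
	uint8_t		opcode;			/* low-order 6 bits of byte 0 */
	int		x_bit;			/* bit 7 of byte 0 */
	int		i_bit;			/* bit 6 of byte 0 */
	int		f_bit;			/* bit 7 of byte 1 */
	uint8_t		total_ahs_length;	/* byte 4 */
	uint32_t	dsl;			/* 3 bytes at offset 5 */
	int		lun_nonzero;		/* any of 8 bytes at offset 8 set */
	uint32_t	itt;			/* 4 bytes at offset 16 */
};

/* Read an nbytes-wide big-endian unsigned value (nbytes <= 4). */
static uint32_t get_be(const uint8_t *p, int nbytes)
{
	uint32_t v = 0;

	while (nbytes-- > 0)
		v = (v << 8) | *p++;
	return v;
}

/* Extract the fields from a 48-byte header; hdr must hold ISCSI_HDR_LEN bytes. */
static void parse_rx_hdr(const uint8_t *hdr, struct rx_hdr_fields *out)
{
	int i;

	out->opcode = hdr[0] & 0x3f;
	out->x_bit = (hdr[0] >> 7) & 1;
	out->i_bit = (hdr[0] >> 6) & 1;
	out->f_bit = (hdr[1] >> 7) & 1;
	out->total_ahs_length = hdr[4];
	out->dsl = get_be(hdr + 5, 3);
	out->lun_nonzero = 0;
	for (i = 8; i < 16; i++)
		if (hdr[i] != 0)
			out->lun_nonzero = 1;
	out->itt = get_be(hdr + 16, 4);
}

/* Step 12: returns 0 on a StatSN mismatch, 1 otherwise (including the
 * case where StatSN is not checked at all).  exp_stat_sn is advanced
 * for every checked PDU except R2T and NopIn with a valid TTT. */
static int check_stat_sn(uint8_t opcode, int s_bit, uint32_t ttt,
			 uint32_t stat_sn, uint32_t *exp_stat_sn)
{
	int ok;

	if (opcode == OP_DATA_IN && !s_bit)
		return 1;			/* StatSN not checked */
	ok = (stat_sn == *exp_stat_sn);
	if (!(opcode == OP_R2T ||
	      (opcode == OP_NOP_IN && ttt != TAG_RESERVED)))
		(*exp_stat_sn)++;
	return ok;
}
```

Keeping the field extraction separate from the checks lets each check report
its own warning or fatal error, as the individual steps require.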

After all this common processing on a response PDU received from a target
has been completed, the rx_thread dispatches (via a switch statement on the
response PDU's opcode) to an opcode-specific routine with a name of the form
rx_xxx, where xxx indicates the specific opcode (in abbreviated form).

Once the opcode-specific rx_xxx routine returns, any data buffer allocated
in step 11 above is freed unless the opcode-specific routine indicated that
it is still in use.  At present, the only time it would be still in use is
if it is being transmitted back to the target in a NopOut ping operation.
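
The dispatch and buffer-release behaviour described in the last two
paragraphs might look roughly like the sketch below.  The handler functions
here are hypothetical stubs standing in for the real rx_xxx routines, and
the rule used by the NopIn stub (keep the buffer whenever ping data is
attached) is an assumption made only so the example has something to show.

```c
#include <stdint.h>
#include <stdlib.h>

enum buf_state { BUF_DONE, BUF_STILL_IN_USE };

/* Hypothetical stand-in for the NopIn routine: pretends the attached
 * data will be sent back in a NopOut, so the buffer stays in use. */
static enum buf_state rx_nopin_stub(uint8_t *data, uint32_t dsl)
{
	(void)data;
	return dsl > 0 ? BUF_STILL_IN_USE : BUF_DONE;
}

/* Hypothetical stand-in for every other opcode-specific routine. */
static enum buf_state rx_generic_stub(uint8_t *data, uint32_t dsl)
{
	(void)data;
	(void)dsl;
	return BUF_DONE;
}

/* Dispatch on the opcode, then free the buffer from step 11 unless the
 * handler says it is still in use.  Returns 1 if the buffer was freed
 * here, 0 if it was kept by the handler or there was no buffer. */
static int dispatch_rx(uint8_t opcode, uint8_t *data, uint32_t dsl)
{
	enum buf_state st;

	switch (opcode) {
	case 0x20:	/* Nop In */
		st = rx_nopin_stub(data, dsl);
		break;
	default:	/* all other legal opcodes, collapsed for brevity */
		st = rx_generic_stub(data, dsl);
		break;
	}

	if (data != NULL && st != BUF_STILL_IN_USE) {
		free(data);
		return 1;
	}
	return 0;
}
```

When a handler keeps the buffer, responsibility for freeing it passes to
whatever code finishes using it.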
