OVERVIEW.txt
------------

	vi: set autoindent tabstop=4 shiftwidth=4 :
	this document is written in ordinary ASCII text with tabs set
		to a width of 4.


The UNH iSCSI reference initiator is implemented as a loadable module
for the Linux kernel.  This document is a first attempt to explain the
internals of this module.

The login phase will be described in a separate document.  This
document explains the general structure and the internal organization
of the Full Feature Phase.


Threads
-------

Once the login phase is complete and the connection to a target has been
established, there are the following threads running:

	1 tx_thread for each session
	1 rx_thread for each connection

Multiple connections per session, and multiple sessions per initiator
are both supported.

The tx_thread does all the sending on all connections belonging to a session.
The rx_thread does all the receiving on one connection.

Most of the time the rx_thread is blocked on a call to sock_recvmsg(),
waiting for a PDU to arrive from the target.  Likewise, the tx_thread
spends most of its time blocked on a semaphore, waiting for an "up" signal
that a new PDU has been enqueued for transmission.  There are 3 sources
for a new PDU:

	1.	the SCSI midlevel calls the iscsi_initiator_queuecommand() function
		with a SCSI CDB to be delivered to the target.  This function will
		create a SCSI Command PDU and enqueue it for the tx_thread.

	2.	the SCSI midlevel calls the iscsi_initiator_abort() function in
		order to abort a previously issued command.  This function will
		create a Task Management PDU and enqueue it for the tx_thread.

	3.	the rx_thread can generate a reply to a PDU it has received.
		At present there are two types of PDUs that can be generated in this
		manner -- a DataOut PDU is generated as a reply to an R2T sent by
		the target, and a NopOut PDU is generated as a reply to a NopIn sent
		by the target (that was not a reply to a previously issued NopOut).

Session-specific signaling semaphores
-------------------------------------

Each session has three signaling semaphores defined in the session structure:

	1.	tx_sem
	2.	tx_done_sem
	3.	task_mgt_sem

tx_sem is initialized to locked when a session is created, and the tx_thread
normally blocks on a call to down_interruptible() on this semaphore.  Whenever
the rx_thread or the functions called by the SCSI midlevel enqueue a new PDU
to be transmitted to the target associated with this session, they do an
up() on the session's tx_sem to wake up the tx_thread (which will cause
the tx_thread to actually transmit the PDU on the proper connection in
the proper order).
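
The wake-up protocol on tx_sem can be illustrated with a small userland
model.  This is only a sketch and not the module's code: the semaphore is
reduced to a bare counter, nothing ever blocks, and all names beginning
with "model_" are hypothetical.  It shows that each enqueue is paired with
exactly one up(), and that a wake-up with an empty queue (for example one
caused by a timer) is harmless.

```c
#include <stddef.h>

/* Userland model only: a counting semaphore reduced to its counter. */
struct model_sem {
	int count;
};

/* Hypothetical stand-in for a PDU waiting to be transmitted. */
struct model_pdu {
	int opcode;
	struct model_pdu *next;
};

struct model_session {
	struct model_sem tx_sem;	/* starts at 0, i.e. "locked" */
	struct model_pdu *head;		/* FIFO of PDUs awaiting transmission */
	struct model_pdu *tail;
	int sent;			/* number of PDUs "transmitted" so far */
};

void model_up(struct model_sem *s)
{
	s->count++;
}

/* Returns 1 if the down succeeded, 0 if a real down() would have blocked. */
int model_try_down(struct model_sem *s)
{
	if (s->count == 0)
		return 0;
	s->count--;
	return 1;
}

/* What any of the three PDU sources does: enqueue, then wake the tx side. */
void model_enqueue(struct model_session *sess, struct model_pdu *pdu)
{
	pdu->next = NULL;
	if (sess->tail != NULL)
		sess->tail->next = pdu;
	else
		sess->head = pdu;
	sess->tail = pdu;
	model_up(&sess->tx_sem);
}

/* One pass of the tx side: consume wake-ups and send whatever is queued. */
int model_tx_drain(struct model_session *sess)
{
	int n = 0;

	while (model_try_down(&sess->tx_sem)) {
		struct model_pdu *pdu = sess->head;

		if (pdu == NULL)
			continue;	/* woken with nothing to send */
		sess->head = pdu->next;
		if (sess->head == NULL)
			sess->tail = NULL;
		sess->sent++;
		n++;
	}
	return n;
}
```

In the real module the tx_thread blocks in down_interruptible() instead of
polling, but the pairing of one up() per enqueued PDU is the same idea.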

tx_done_sem is also initialized to locked when a session is created.  A new
session is created via the /proc interface, so the thread that creates a
session is the one that is processing the write to the /proc interface, which
is handled by our iscsi_initiator_proc_info() function.  After all the
data structures for the new session have been correctly created and
initialized, and the login phase has successfully concluded, the tx_thread
is started.  The /proc thread does a down() on tx_done_sem to wait until
the tx_thread has actually started and initialized itself, at which time
the tx_thread does the corresponding up() on tx_done_sem.

Once going, the tx_thread does not do another up() on the tx_done_sem until
it terminates itself for any reason.  The function close_session() is called
to terminate a session.  It is triggered by two different sources, each
running in its own thread created by the user-level command:

	1.	the module is unloaded using /sbin/rmmod, at which time our
		iscsi_initiator_release() function is called.

	2.	the session is terminated by a write to the /proc interface, as
		handled by our iscsi_initiator_proc_info() function.

The close_session() function sends a kill signal to the tx_thread and then
blocks on a down_interruptible() on the tx_done_sem to wait until the
tx_thread shuts itself down cleanly.

task_mgt_sem is also initialized to locked when a session is created.
When the iscsi_initiator_abort() function is called by the SCSI midlevel,
a task management command is queued for the tx_thread, a local timer is
activated to wait for at most 3 seconds for a response from the target,
and then iscsi_initiator_abort() does a down() on the task_mgt_sem for the
session.  If the timer expires before a response is received, the interrupt
service routine for this timer, abort_timer_function(), will do an up()
on the task_mgt_sem.  If a response is received, it will be processed by
rx_task_mgt_rsp(), which also does an up() on the task_mgt_sem for the
session.


Connection-specific signaling semaphores
----------------------------------------

Each connection has one signaling semaphore defined in the connection
structure:

	1.	rx_done_sem

rx_done_sem is initialized to locked when a connection is created.  A new
connection is created via the /proc interface, so the thread that creates a
connection is the one that is processing the write to the /proc interface,
which is handled by our iscsi_initiator_proc_info() function.  After all the
data structures for the new connection have been correctly created and
initialized, and the login phase has successfully concluded, the rx_thread
is started.  The /proc thread does a down() on rx_done_sem to wait until
the rx_thread has actually started and initialized itself, at which time
the rx_thread does the corresponding up() on rx_done_sem.

Once going, the rx_thread does not do another up() on the rx_done_sem until
it terminates itself for any reason.  The function close_connection() is
called to terminate a connection.  It is triggered by two different sources,
each running in its own thread created by the user-level command:

	1.	the module is unloaded using /sbin/rmmod, at which time our
		iscsi_initiator_release() function is called.

	2.	the connection is terminated by a write to the /proc interface, as
		handled by our iscsi_initiator_proc_info() function.

The close_connection() function sends a kill signal to the rx_thread and then
blocks on a down_interruptible() on the rx_done_sem to wait until the
rx_thread shuts itself down cleanly.


Timers
------

There is one local timer created and destroyed for each task management
command, as explained above under the description of the task_mgt_sem.

There is also one session-specific timer called tx_timer.  This is created
and initialized in init_session(), which is called by create_session()
whenever a new session is created.  When it starts up, the tx_thread
calls restart_tx_timer() with the "nop_period" converted to jiffies.
The nop_period is a session-specific variable that can be "forced" by
the iscsi_manage program to a non-zero value which is the nop period in
seconds.  What this means is if there is no activity received on a
connection during one nop period, the initiator will send a NopOut PDU
to the target to "ping" it to see if it is still alive.  This NopOut PDU
has data attached, and a NopIn response from the target with the same
data attached is expected.


restart_tx_timer() activates the session-specific tx_timer with the
period supplied as a parameter, assuming certain other sanity conditions
are met.  When this timer expires, the kernel calls the deal_with_tx_timer()
function, which does an up() on the session-specific tx_sem (thereby
waking the tx_thread if it is blocked) and then calls restart_tx_timer()
again to keep the timer running.  If the tx_thread itself receives an
EAGAIN error from a call to sock_sendmsg(), meaning that data could not
be written to a socket because it was backed up, then the tx_thread
also calls restart_tx_timer() with a value of TX_RESTART_PERIOD (10)
and then blocks again on the tx_sem for that session.  This gives the
TCP/IP sublayers 10 jiffies (100 milliseconds when HZ is 100) to clear the
buffers before the tx_thread tries again to send data.
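
The timer arithmetic above can be checked with a few lines of userland C.
This is a sketch under stated assumptions, not code from the module: the
tick rate is passed in explicitly (the kernel's HZ is a build-time
constant), all function names are hypothetical, and treating a nop_period
of zero as "pings disabled" is an assumption based on the description of
the value being forced to non-zero.

```c
/* Value given in the text: the retry delay after EAGAIN, in jiffies. */
#define TX_RESTART_PERIOD	10UL

/* Convert a period in seconds to jiffies for a given tick rate. */
unsigned long model_seconds_to_jiffies(unsigned long seconds, unsigned long hz)
{
	return seconds * hz;
}

/* Convert jiffies to milliseconds for a given tick rate. */
unsigned long model_jiffies_to_msecs(unsigned long jiffies, unsigned long hz)
{
	return jiffies * 1000UL / hz;
}

/*
 * Returns 1 if a NopOut "ping" is due because nothing has been received
 * for at least one nop period.  Unsigned subtraction keeps the comparison
 * correct when the jiffies counter wraps around.
 */
int model_nop_due(unsigned long now, unsigned long last_rx,
		  unsigned long nop_period_secs, unsigned long hz)
{
	if (nop_period_secs == 0)
		return 0;	/* assumption: zero means pings are disabled */
	return (now - last_rx) >= model_seconds_to_jiffies(nop_period_secs, hz);
}
```

With a tick rate of 100, TX_RESTART_PERIOD works out to 100 milliseconds;
with a tick rate of 1000 the same constant would give only 10 milliseconds.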


recv_iovector()
---------------

After login has been completed, there is only a single place where
sock_recvmsg() is called by the iscsi_initiator, and this call is in the
main input routine called recv_iovector().  There are 3 parameters to
recv_iovector():

	1.	A pointer to the connection structure on which to read.
	2.	The number of bytes to read, which MUST be positive.
	3.	The number of io vectors which have been set up in the connection's
		rx_iov array.

recv_iovector() uses the rx_msg area of the connection structure
to set up the message header required by sock_recvmsg(), but the i/o vector
must have been previously set up in the connection's rx_iov array by the
caller of recv_iovector().

The caller to recv_iovector() MUST hold the session->sess_lock spinlock prior
to the call, with the flags saved in the lock_flags field for the current
session.  Just before calling sock_recvmsg(), recv_iovector() releases
this lock, thereby releasing the global exclusive access in the
(highly likely) case that the sock_recvmsg() call will block until data
arrives from the target.  When sock_recvmsg() returns, successfully or not,
recv_iovector() gets the session->sess_lock spinlock again to regain global
exclusive access to all the session structures.  If sock_recvmsg() fails or is
interrupted by a signal, recv_iovector() releases the lock and returns a
negative error code.  If sock_recvmsg() succeeds, recv_iovector() returns the
number of bytes received (which will always be equal to the number of bytes
expected, because recv_iovector() will loop until this number has been received
or an error occurs).  It is for this reason that the parameter specifying
the number of bytes to read MUST be positive when recv_iovector() is called.

To summarize, there are 2 possible return values from recv_iovector():

	1.	> 0	Success, session->sess_lock is held
	2.	< 0	Failure, session->sess_lock is NOT held
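
The "loop until every expected byte has arrived" behaviour can be sketched
in userland C.  This is an illustration only, not the module's code: it
uses readv() on an ordinary file descriptor in place of the kernel's
sock_recvmsg(), it takes no locks, and the name read_iovector_fully() is
hypothetical.  It assumes the iovec lengths add up to exactly the requested
byte count.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/uio.h>

/*
 * Read exactly `total` bytes into the buffers described by iov[0..iovcnt-1].
 * Returns `total` on success, or a negative errno value on error, on
 * interruption by a signal, or if the peer closes before all bytes arrive.
 * The iovec entries are adjusted in place as data arrives.
 */
ssize_t read_iovector_fully(int fd, struct iovec *iov, int iovcnt, size_t total)
{
	size_t received = 0;

	if (total == 0)
		return -EINVAL;		/* the byte count MUST be positive */

	while (received < total) {
		ssize_t n = readv(fd, iov, iovcnt);

		if (n < 0)
			return -errno;	/* includes EINTR: no retry on a signal */
		if (n == 0)
			return -ECONNRESET;	/* end of stream came too early */

		received += (size_t)n;

		/* Skip vectors that are now full, then trim a partly filled one. */
		while (n > 0 && iovcnt > 0) {
			if ((size_t)n >= iov->iov_len) {
				n -= (ssize_t)iov->iov_len;
				iov++;
				iovcnt--;
			} else {
				iov->iov_base = (char *)iov->iov_base + n;
				iov->iov_len -= (size_t)n;
				n = 0;
			}
		}
	}
	return (ssize_t)received;
}
```

Because the function never returns a short count, a caller only has to
distinguish a positive result from a negative one, which matches the two
return cases listed above.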

There are only 2 calls to recv_iovector() in the iscsi initiator:

	1.	From recv_pdu_header() to read the next PDU header.
	2.	From recv_data_in_data() to read data into a SCSI scatter-gather list.


recv_pdu_header()
-----------------

This routine is called in order to set up the connection's rx_iov
to read a 48-byte header (52 bytes if header digests are in use)
into the connection's rx_buf area.

recv_data_in_data()
-------------------

This routine is called in order to set up the connection's rx_iov
to read data into scsi buffers.  This routine maps the scsi midlevel's
scatter-gather list into the necessary iovector slots, and takes care
of adding padding and/or the data digest and/or overflow data.
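
The byte counts that these two routines have to request can be sketched as
follows.  The function names are hypothetical and this is not the module's
code.  The 48 and 52 byte header sizes come from the text above; the 4-byte
data digest, the padding of the data segment to a 4-byte boundary, and the
absence of padding and digest for an empty data segment are stated here as
my understanding of the iSCSI specification and should be checked against
it.  Overflow data is not modelled.

```c
#include <stddef.h>

#define MODEL_HDR_LEN		48u	/* basic header size, from the text */
#define MODEL_DIGEST_LEN	4u	/* 52 - 48; assumed equal for the data digest */

/* Number of bytes to read for the next PDU header. */
size_t model_pdu_header_len(int header_digest)
{
	return MODEL_HDR_LEN + (header_digest ? MODEL_DIGEST_LEN : 0u);
}

/* Pad bytes needed to round data_len up to a multiple of 4. */
size_t model_pad_len(size_t data_len)
{
	return (4u - (data_len & 3u)) & 3u;
}

/* Number of bytes on the wire for a data segment of data_len bytes. */
size_t model_data_segment_len(size_t data_len, int data_digest)
{
	if (data_len == 0)
		return 0;	/* assumption: no padding or digest without data */
	return data_len + model_pad_len(data_len) +
	       (data_digest ? MODEL_DIGEST_LEN : 0u);
}
```

For example, 5 bytes of data with the data digest enabled occupy 5 + 3 + 4
bytes, so the iovector needs slots for the data itself, the pad bytes and
the digest.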



Key Parameters
--------------

1.	The number of key parameters recognized in this implementation of Draft 20
	is given in the file "common/text_param.h" as:

		#define	MAX_CONFIG_PARAMS	28


2.	The struct parameter_type is defined in the file "common/text_param.h" as:

		struct parameter_type	{
								unsigned int		type;
								unsigned int		int_value;
								char				*str_value;
								char				*value_list;
								unsigned int		neg_info;
								unsigned long long	special_key_flag;
								};

	type				- bit-set that defines type and attributes of each param
	int_value			- valid only if type is NUMBER or NUMBER_RANGE; else 0
	str_value			- valid only if type is not NUMBER; else NULL
	value_list			- valid only if type is not NUMBER; else NULL
	neg_info			- bit-set that defines negotiation options
	special_key_flag	- bit-set that defines "special" processing


3.	The possible values for each field are given as symbolic constants in
	the file "common/text_param.h".  Each parameter must have a "primary type"
	indication that must be chosen from 1 of the following bit constants:

		NUMBER			- a numeric value
		STRING			- an arbitrary (non-enumerated) string
		ENUMERATED		- an enumerated string
		BOOL_AND		- a boolean
		BOOL_OR			- a boolean
		NUMBER_RANGE	- a range of 2 numeric values


4.	The "value selection" bit in the "type" field of struct parameter_type
	further qualifies the "primary type", again by setting a single bit.

	If the "primary type" is NUMBER, then one of the following bits defines
	the range of numbers that are acceptable:

		ONE_TO_65535		- range [1..65535]
		N512_TO_16777215	- range [512..16777215]
		ZERO_TO_3600		- range [0..3600]
		ZERO_TO_2			- range [0..2]
		ZERO_TO_65535		- range [0..65535]

	If the "primary type" is STRING, then the following bits define what
	characters are legal in the string:

		UTF_8				- UTF-8 string
		ISCSI_NAME			- iSCSI name
		TARGET_ADDRESS_TYPE	- domainname:port,portal

	If the "primary type" is ENUMERATED, then the following bits define what
	list of key values are allowed in the string:

		DIGEST_PARAM		- key is HeaderDigest or DataDigest
		AUTH_PARAM			- key is an Authorization Key
		DISCOVERY_NORMAL	- key accepts <normal|discovery>


5.	In addition to the "primary type" and "value selection" bits, the following
	bits can also be set for certain parameters:

	If the "primary type" is NUMBER:

		MIN_NUMBER		- an integer whose selection function is MIN
		MAX_NUMBER		- an integer whose selection function is MAX


6.	The table "config_params" contains one entry for each key parameter
	defined in iscsi.  This table is defined and initialized at compile-
	time in the file "common/text_param.h" as:

		struct parameter_type config_params[MAX_CONFIG_PARAMS] = {
						{}, {}, ... {} };

	It remains constant during the lifetime of the iscsi_initiator module.


7.	When the iscsi_initiator module is loaded, the function
	"iscsi_initiator_detect" is called by the SCSI Mid-level to detect
	all occurrences of "iscsi adaptor" cards, of which there is only 1.
	This function registers this adaptor card with the Mid-level,
	and gets back a pointer to the "struct Scsi_Host" structure allocated
	by the Mid-level for this card, which is stored in the global variable
	global_host.  The end of this structure is itself the structure
	struct iscsi_hostdata hostdata, which is the "private" data for this
	adaptor and is defined in the file "iscsi_initiator.h".

		struct iscsi_hostdata	{
						unsigned int	force;
						unsigned int	nop_period;
						unsigned int	version_number;
						struct session	*session_head;
						struct parameter_type	(*param_tbl)[MAX_CONFIG_PARAMS];
								};

	The global variable global_hostdata is set to point to this structure.
	As part of initializing the newly-loaded module, the "param_tbl" field
	of this structure is dynamically allocated, and an exact copy of the
	global "config_params" array is copied into it.

	From this point onward, it is this table (global_hostdata->param_tbl)
	that is modified when configuration changes are triggered via a write
	by the user to the /proc/scsi/iscsi_initiator interface of a line of the
	form:

		iscsi_initiator manage n key=value 

	where "n" is an integer number indicating what to do.  Note that if the
	value of "n" indicates a "reset" function, then the entire table
	"global_hostdata->param_tbl" will be reset to a new copy of the
	unmodified "config_params" table.
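
The copy-then-reset handling of the parameter table can be sketched in
userland C.  This is an illustration, not the module's code: malloc()
stands in for whatever kernel allocator the module uses, the typedef and
function names are hypothetical, and the copy is a plain memcpy(), so the
str_value and value_list pointers are shared between the tables.  Whether
the real module duplicates the strings is not stated in this document.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_CONFIG_PARAMS	28

struct parameter_type {
	unsigned int		type;
	unsigned int		int_value;
	char			*str_value;
	char			*value_list;
	unsigned int		neg_info;
	unsigned long long	special_key_flag;
};

/* Hypothetical shorthand for "array of MAX_CONFIG_PARAMS parameters". */
typedef struct parameter_type param_table_t[MAX_CONFIG_PARAMS];

/* Allocate a new table holding a copy of src; returns NULL on failure. */
param_table_t *copy_param_table(param_table_t *src)
{
	param_table_t *dst = malloc(sizeof *dst);

	if (dst != NULL)
		memcpy(dst, src, sizeof *dst);
	return dst;
}

/* The "reset" case: overwrite the live table with the pristine one. */
void reset_param_table(param_table_t *live, param_table_t *pristine)
{
	memcpy(live, pristine, sizeof *live);
}
```

Keeping the compile-time table untouched and only ever modifying a copy is
what makes the reset a single block copy.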


8.	New sessions are triggered via a write by the user to the
	/proc/scsi/iscsi_initiator interface of a line of the form:

		iscsi-initiator ip 1234abcd port 4000 target 6 lun 0

	This causes the function "create_session" to be called in order to create
	a new "struct session" structure, initialize it, use it to login to a
	target, and if that is successful, to then start up the read and write
	threads for the new session.  As part of this initialization, the table

		session->session_params

	of type

		struct parameter_type (*session_params)[MAX_CONFIG_PARAMS];

	is dynamically allocated and filled with an exact copy of the

		global_hostdata->param_tbl

	From this point onward, these are the default parameters for this session,
	and this table is used as the basis for all negotiations during the
	login phase.  It will be modified during the login phase to reflect the
	results of the negotiations.


9.	If the login is successful, a final (fourth) table is filled in with
	the final results of the login negotiations (the table is also
	dynamically allocated, but this is actually done earlier, during session
	initialization).  This is the table:

		session->oper_param

	of type

		struct operational_parameters	*oper_param;

	From this point onward, the values in this table are the ones used
	to control operation during the Full Feature Phase.

	This table has type

		struct operational_parameters {};

	which is defined in the file "common/iscsi_common.h".

	Note that this final (fourth) table does not have the same format as the
	previous 3 tables.  That is because this table is never used for
	negotiations, and therefore does not contain key names or attributes
	necessary for negotiations.  Rather, it simply contains the final values
	arrived at by the negotiations, in a form in which they can be easily
	used during Full Feature Phase operation.  The function
	"set_session_parameters()", defined in the file "common/text_param.h",
	does the work of converting the format from the "struct parameter_type"
	format to the "struct operational_parameters" format.

Connection States
-----------------

A connection is always in one of 7 possible states:
	CONNECTION_NOT_PRESENT
	CONNECTION_CONNECTED
	CONNECTION_LOGGED_IN
	CONNECTION_FULL_FEATURE_PHASE
	CONNECTION_RECOVERY
	CONNECTION_LOGGED_OUT
	CONNECTION_DISCONNECTED

CONNECTION_NOT_PRESENT
	Transitions into this state:
		1.	Initial value set in build_session_skeleton() when connection
			structure is first created and added to session->connection_head.
		2.	from CONNECTION_CONNECTED in init_connection() when an error
			occurs after a TCP connection was established
		3.	from CONNECTION_FULL_FEATURE_PHASE in close_connection() when TCP
			connection has been disconnected, rx thread has terminated,
			rx buffers have been freed, and /proc entry for this connection
			has been removed.
	Transitions out of this state:
		1.	to CONNECTION_CONNECTED in init_connection() when a TCP connection
			is successfully established

CONNECTION_CONNECTED
	Transitions into this state:
		1.	from CONNECTION_NOT_PRESENT in init_connection() after a TCP
			connection to a target has been successfully established, all
			the rx buffers have been created, and the /proc entry for this
			connection has been established.
	Transitions out of this state:
		1.	to CONNECTION_LOGGED_IN in iscsi_initiator_login() after an
			iscsi login phase has completed successfully.

CONNECTION_LOGGED_IN
	Transitions into this state:
		1.	from CONNECTION_CONNECTED in iscsi_initiator_login() after an
			iscsi login phase has completed successfully.
	Transitions out of this state:
		1.	to CONNECTION_FULL_FEATURE_PHASE in create_session() after the
			rx_thread for this connection has started successfully.

CONNECTION_FULL_FEATURE_PHASE
	Transitions into this state:
		1.	from CONNECTION_LOGGED_IN in create_session() after the
			rx_thread for this connection has started successfully.
	Transitions out of this state:
		1.	to CONNECTION_NOT_PRESENT in close_connection() when the TCP
			connection has been disconnected, rx thread has terminated,
			rx buffers have been freed, and /proc entry for this connection
			has been removed.
		2.	to CONNECTION_LOGGED_OUT in drive_logout() when a
			Logout Request PDU has been attached as a pending command.
			This prevents the tx thread from sending pdus other than
			those with opcode = Logout Request on this connection.

CONNECTION_LOGGED_OUT
	Transitions into this state:
		1.	from CONNECTION_FULL_FEATURE_PHASE in drive_logout() when a
			Logout Request PDU has been attached as a pending command.
			This prevents the tx thread from sending pdus other than
			those with opcode = Logout Request on this connection.

CONNECTION_DISCONNECTED
	Transitions into this state:
		1.	from CONNECTION_FULL_FEATURE_PHASE in iscsi_initiator_rx_thread()
			when the rx thread terminates after freeing any pending commands.
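
The transitions listed above can be collected into a single lookup
function.  This is only a sketch and not the module's code: the numeric
values of the enum and the function name transition_documented() are
hypothetical, and the function encodes nothing beyond the transitions
documented in this section.  No transitions are documented here for
CONNECTION_RECOVERY, and none out of CONNECTION_LOGGED_OUT or
CONNECTION_DISCONNECTED, so the function reports none for them.

```c
/* State names from the text; the numeric values are an assumption. */
enum connection_state {
	CONNECTION_NOT_PRESENT,
	CONNECTION_CONNECTED,
	CONNECTION_LOGGED_IN,
	CONNECTION_FULL_FEATURE_PHASE,
	CONNECTION_RECOVERY,
	CONNECTION_LOGGED_OUT,
	CONNECTION_DISCONNECTED
};

/* Returns 1 if the transition from -> to is one of those listed above. */
int transition_documented(enum connection_state from,
			  enum connection_state to)
{
	switch (from) {
	case CONNECTION_NOT_PRESENT:
		return to == CONNECTION_CONNECTED;
	case CONNECTION_CONNECTED:
		return to == CONNECTION_LOGGED_IN ||
		       to == CONNECTION_NOT_PRESENT;
	case CONNECTION_LOGGED_IN:
		return to == CONNECTION_FULL_FEATURE_PHASE;
	case CONNECTION_FULL_FEATURE_PHASE:
		return to == CONNECTION_NOT_PRESENT ||
		       to == CONNECTION_LOGGED_OUT ||
		       to == CONNECTION_DISCONNECTED;
	default:
		return 0;	/* no outgoing transitions are documented */
	}
}
```

A check of this kind could be used in a debugging assertion wherever the
connection state is changed.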
