The Linux SCSI Generic (sg) HOWTO

Douglas Gilbert

2002-05-03

Revision History
Revision 1.2    2002-05-03    Revised by: dpg
    ENOMEM, EPERM; DRIVER_SENSE->CHECK_CONDITION
Revision 1.1    2002-01-26    Revised by: dpg
    corrections, host_status, odd dxfer_len
Revision 1.0    2001-12-21    Revised by: dpg
    original, displace SCSI-PROGRAMMING-HOWTO

Table of Contents
1. Introduction
2. What the sg driver does
3. Identifying the version of the SG driver
4. Interface
5. Theory of operation
6. The sg_io_hdr_t structure in detail
6.1. interface_id
6.2. dxfer_direction
6.3. cmd_len
6.4. mx_sb_len
6.5. iovec_count
6.6. dxfer_len
6.7. dxferp
6.8. cmdp
6.9. sbp
6.10. timeout
6.11. flags
6.12. pack_id
6.13. usr_ptr
6.14. status
6.15. masked_status
6.16. msg_status
6.17. sb_len_wr
6.18. host_status
6.19. driver_status
6.20. resid
6.21. duration
6.22. info
7. System calls
7.1. open()
7.2. write()
7.3. read()
7.4. poll()
7.5. close()
7.6. mmap()
7.7. fcntl(sg_fd, F_SETFL, oflags | FASYNC)
7.8. Errors reported in errno
8. Ioctl()s
8.1. SG_IO
8.2. SG_GET_ACCESS_COUNT
8.3. SG_SET_COMMAND_Q (and _GET_)
8.4. SG_SET_DEBUG
8.5. SG_EMULATED_HOST
8.6. SG_SET_KEEP_ORPHAN (and _GET_)
8.7. SG_SET_FORCE_LOW_DMA
8.8. SG_GET_LOW_DMA
8.9. SG_NEXT_CMD_LEN
8.10. SG_GET_NUM_WAITING
8.11. SG_SET_FORCE_PACK_ID
8.12. SG_GET_PACK_ID
8.13. SG_GET_REQUEST_TABLE
8.14. SG_SET_RESERVED_SIZE (and _GET_ )
8.15. SG_SCSI_RESET
8.16. SG_GET_SCSI_ID
8.17. SG_GET_SG_TABLESIZE
8.18. SG_GET_TIMEOUT
8.19. SG_SET_TIMEOUT
8.20. SG_SET_TRANSFORM
8.21. SG_GET_TRANSFORM
8.22. Sg ioctls removed in version 3
8.23. SCSI_IOCTL_GET_IDLUN
8.24. SCSI_IOCTL_GET_PCI
8.25. SCSI_IOCTL_PROBE_HOST
8.26. SCSI_IOCTL_SEND_COMMAND
9. Direct and Mmap-ed IO
9.1. Direct IO
9.2. Mmap-ed IO
10. Driver and module initialization
11. Sg and the "proc" file system
11.1. /proc/scsi/sg/debug
12. Asynchronous usage of sg
A. Sg3_utils package
B. sg_header, the original sg control structure
C. Programming example
D. Debugging
E. Other references

Chapter 1. Introduction

This document outlines the Linux SCSI Generic (sg) driver interface as found in the 2.4 series kernels. The driver's purpose is to allow SCSI commands to be sent directly to SCSI devices. The responses to those commands can then be obtained. This type of driver is sometimes termed a "pass through". In the case of SCSI disks, the block subsystem, which is normally used to mount and access a disk, is bypassed, permitting low level operations such as formatting to be performed. Various specialized applications for writing CD-Rs and document scanning use the sg driver.

Many devices that use other physical buses (e.g. ATAPI cdroms, USB mass storage devices and IEEE 1394 sbp2 devices) utilize the SCSI command set. By using Linux pseudo SCSI device drivers which bridge between the native protocol stack and the SCSI subsystem, the upper level SCSI device drivers, including sg, can be used to control "non-SCSI" devices.

This is the third major version of the sg driver. A summary of the sg driver history is as follows:

This document can be found at the Linux Documentation Project's site at www.linuxdoc.org/HOWTO/SCSI-Generic-HOWTO/ . It is available in plain text and pdf renderings at that site. A (possibly later) version of this document can be found at www.torque.net/sg/p/sg_v3_ho.html. That is a single html page; drop the ".html" extension for multi-page html. There are also postscript, pdf and rtf renderings from the original SGML (docbook) file at the same location.

A more general description of the Linux SCSI subsystem of which sg is a part can be found in the SCSI-2.4-HOWTO.

This document was last modified on 3rd May 2002.


Chapter 2. What the sg driver does

The sg driver permits user applications to send SCSI commands to devices that understand them. SCSI commands are 6, 10, 12 or 16 bytes long [1]. The SCSI disk driver (sd), once device initialization is complete, only sends SCSI READ and WRITE commands. There are several other interesting things one might want to do, for example, perform a low level format or turn on write caching.

Associated with some SCSI commands there is data to be written to the device. A SCSI WRITE command is one obvious example. When instructed, the sg driver arranges for data to be transferred to the device along with the SCSI command. It is possible that the lower level driver (often known as the "Host Bus Adapter" [HBA] or simply "adapter" driver) is unable to send the command to the device. An example of this occurs when the device does not respond, in which case a 'host_status' or 'driver_status' error will be conveyed back to the user application.

All going well, the SCSI command (and optionally some data) is conveyed to the device. The device will respond with a single byte value called the 'scsi_status'. GOOD is the SCSI status indicating everything has gone well. The most common other status is CHECK CONDITION. In this latter case, the SCSI mid level issues a REQUEST SENSE SCSI command. The response of the REQUEST SENSE is 18 bytes or more in length and is called the "sense buffer". It will indicate why the original command may not have been executed. It is important to realize that a CHECK CONDITION may vary in severity from informative (e.g. command needed to be retried before succeeding) to fatal (e.g. "medium error" which often indicates it is time to replace the disk).

So in all cases a user application should check the various status values. If necessary the "sense buffer" will be copied back to the user application. SCSI commands like READ convey data back to the user application (if they succeed). The sg driver arranges for this data transfer from the device to the user space, if necessary.
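As an illustration of inspecting the "sense buffer", the following is a minimal sketch that pulls the sense key and the additional sense code out of a buffer. It assumes the fixed sense data format (response code 0x70 or 0x71) as laid out in the SCSI standards at www.t10.org, which this document does not itself describe. The names sense_summary and sense_decode are made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Fields picked out of a fixed-format sense buffer (hypothetical type). */
struct sense_summary {
    int valid;               /* 1 if the buffer could be decoded */
    unsigned char sense_key; /* e.g. 0x3 is MEDIUM ERROR */
    unsigned char asc;       /* additional sense code */
    unsigned char ascq;      /* additional sense code qualifier */
};

/* Decode the first 'len' bytes of a sense buffer. Assumes the fixed
 * format: byte 0 holds the response code, byte 2 the sense key and
 * bytes 12 and 13 the additional sense code and its qualifier. */
struct sense_summary sense_decode(const unsigned char *sb, size_t len)
{
    struct sense_summary s = {0, 0, 0, 0};

    assert(sb != NULL || len == 0);
    /* Need at least byte 2; response codes 0x70/0x71 mean fixed format */
    if (len < 3 || ((sb[0] & 0x7f) != 0x70 && (sb[0] & 0x7f) != 0x71))
        return s;
    s.valid = 1;
    s.sense_key = sb[2] & 0x0f;
    if (len >= 14) {
        s.asc = sb[12];
        s.ascq = sb[13];
    }
    return s;
}
```

An application would typically call sense_decode() only after seeing a CHECK CONDITION status, passing the number of sense bytes that the driver actually wrote.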

The description so far has concentrated on a disk device, but in reality the sg driver is not needed very often for disks because there already is a purpose built device driver for that: sd. The same is true of reading audio and data CDs (sr [scd]) and tapes (st). However scanners that understand the SCSI command set and CDR "burning" programs tend to use the sg driver. Other applications include tape "robots" and music CD "ripping".

To find out more about SCSI (draft) standards and resources visit www.t10.org. To use the sg device driver you should be familiar with the SCSI commands supported by the device that you wish to control. Getting hold of such information for devices like scanners can be quite challenging (if the vendor does not provide it).

The first SCSI command sent to a SCSI device when it is initialized is an INQUIRY. All SCSI devices should respond promptly to an INQUIRY supplying information such as the vendor, product designation and revision. Appendix C shows the sg driver being used to send an INQUIRY and print out some of the information in the response.


Chapter 3. Identifying the version of the SG driver

Earlier versions of the sg device driver either have no version number (e.g. the original driver) or a version number starting with "2". The drivers that support this new interface have a major version number of "3". The sg version numbers are of the form "x.y.z" and the single number given by the SG_GET_VERSION_NUM ioctl() is calculated by (x * 10000 + y * 100 + z). The sg driver discussed here will yield a number greater than or equal to 30000 from SG_GET_VERSION_NUM. The version number can also be seen using cat /proc/scsi/sg/version in the new driver. This document describes sg version 3.1.24 for the lk 2.4 series. Where some facility has been added during the lk 2.4 series (e.g. mmap-ed IO) and hence is not available in all versions of the lk 2.4 series, this is noted. [2]
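The version arithmetic described above can be sketched as follows. This is only an illustration: the struct sg_version type and both function names are invented for this example, and the code assumes that <scsi/sg.h> on the build system defines SG_GET_VERSION_NUM.

```c
#include <assert.h>
#include <sys/ioctl.h>
#include <scsi/sg.h>

/* Hypothetical helper type: the three parts of an "x.y.z" sg version. */
struct sg_version {
    int x;
    int y;
    int z;
};

/* Split the single number given by SG_GET_VERSION_NUM
 * (x * 10000 + y * 100 + z) back into its parts. */
struct sg_version sg_version_decode(int num)
{
    struct sg_version v;

    assert(num >= 0);
    v.x = num / 10000;
    v.y = (num / 100) % 100;
    v.z = num % 100;
    return v;
}

/* Returns 1 if the driver behind sg_fd reports version 3 or later,
 * 0 if it is older, and -1 if the ioctl() itself failed (errno set). */
int sg_is_v3(int sg_fd)
{
    int num = 0;

    if (ioctl(sg_fd, SG_GET_VERSION_NUM, &num) < 0)
        return -1;
    return num >= 30000;
}
```

For the version described in this document, sg_version_decode(30124) gives x=3, y=1 and z=24.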

Here is a list of sg versions that have appeared to date during the lk 2.4 series.


Chapter 4. Interface

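This driver supports the following system calls, most of which are typical for a character device driver in Linux. They are:

  • open()

  • close()

  • write()

  • read()

  • ioctl()

  • poll()

  • fcntl(sg_fd, F_SETFL, oflags | FASYNC)

  • mmap()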

The interface to these calls as seen from Linux applications is well documented in the "man" pages (in section 2).

A user application accesses the sg driver by using the open() system call on an sg device file name. Each sg device file name corresponds to one (potentially) attached SCSI device. These are usually found in the /dev directory. Here are some sg device file names:
$ ls -l /dev/sg[01]
crw-rw----    1 root     disk      21,   0 Aug 30 16:30 /dev/sg0
crw-rw----    1 root     disk      21,   1 Aug 30 16:30 /dev/sg1
The leading "c" at the front of the permissions indicates a character device. The absence of read or write permissions for "others" is prudent security. The major number of all sg device names is 21 while the minor number is the same as the number following "sg" in the device file name. When the device file system (devfs) is active on a system then the primary sg device file names are found at the bottom of an informative subtree:
$ cd /dev/scsi/host1/bus0/target0/lun0
$ ls -l generic
crw-r-----    1 root     root      21,   1 Dec 31  1969 generic
Under devfs (when its daemon [devfsd] is running) there would usually be a symbolic link from /dev/sg1 to /dev/scsi/host1/bus0/target0/lun0/generic. This is so existing applications looking for the abridged device file name will not be surprised. One advantage of devfs is that only attached SCSI devices appear in the /dev/scsi subtree.

A significant addition in sg v3 is an ioctl() called SG_IO which is functionally equivalent to a write() followed by a blocking read(). In certain contexts the write()/read() combination has advantages over SG_IO (e.g. command queuing) and continues to be supported.

The existing (and original) sg interface based on the sg_header structure is still available using a write()/read() sequence as before. The SG_IO ioctl will only accept the new interface based on the sg_io_hdr_t structure.

The sg v3 driver thus has a write() call that can accept either the older sg_header structure or the new sg_io_hdr_t structure. The write() call decides which interface is being used based on the second integer position of the passed header (i.e. sg_header::reply_len or sg_io_hdr_t::dxfer_direction). If it is a positive number then the old interface is assumed. If it is a negative number then the new interface is assumed. The direction constants placed in 'dxfer_direction' in the new interface have been chosen to have negative values.
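The test on the second integer can be mimicked in user space. The following sketch is not the driver's code; it only illustrates the rule, and the function name is invented. It assumes <scsi/sg.h> provides both sg_io_hdr_t and struct sg_header.

```c
#include <assert.h>
#include <string.h>
#include <scsi/sg.h>

/* Look at the second int of a header that would be handed to write().
 * Returns 1 if it is negative (sg_io_hdr_t, the new interface) and
 * 0 otherwise (sg_header, the old interface). */
int sg_buffer_is_new_interface(const void *buf)
{
    int second;

    assert(buf != NULL);
    /* memcpy() avoids any alignment assumptions about 'buf' */
    memcpy(&second, (const unsigned char *)buf + sizeof(int), sizeof(second));
    return second < 0;
}
```

A header with dxfer_direction set to SG_DXFER_FROM_DEV is classified as the new interface, while an sg_header whose reply_len is a positive byte count is classified as the old one.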

If a request is sent to a write() with the sg_io_hdr_t interface then the corresponding read() that fetches the response must also use the sg_io_hdr_t interface. The same rule applies to the sg_header interface.

This document concentrates on the sg_io_hdr_t interface introduced in the sg version 3 driver. For the definition of the older sg_header interface see the sg version 2 documentation. A brief description is given in Appendix B.


Chapter 5. Theory of operation

The path of a request through the sg driver can be broken into 3 distinct stages:

  1. The request is received from the user, resources are reserved as required (e.g. kernel buffer for indirect IO). If necessary, data in the user space is transferred into kernel buffers. Then the request is submitted to the SCSI mid level (and then onto the adapter) for execution. The SCSI mid level maintains a queue so the request may have to wait. If a SCSI device supports command queuing then it may be able to accommodate multiple outstanding requests.

  2. Assuming the SCSI adapter supports interrupts, then an interrupt is received when the request is completed. When this interrupt arrives the data transfer is complete. This means that if the SCSI command was a READ then the data is in kernel buffers (indirect IO) or in user buffers (direct or mmap-ed IO). The sg driver is informed of this interrupt via a kernel mechanism called a "bottom half" handler. Some kernel resources are freed up.

  3. The user makes a call to fetch the result of the request. If necessary, data in kernel buffers is transferred to the user space. If necessary, the sense buffer is written out to the user space. The remaining kernel resources associated with this request are freed up.

The write() call performs stage 1 while the read() call performs stage 3. If the read() call is made before stage 2 is complete then it will either wait or yield EAGAIN (depending on whether the file descriptor is blocking or not). If asynchronous notification is being used then stage 2 will send a SIGPOLL signal to the user process. The poll() system call will show this file descriptor is now readable (unless it was sent by the SG_IO ioctl()).

The SG_IO ioctl() performs stage 1, waits for stage 2 and then performs stage 3. If the file descriptor in question is set O_NONBLOCK then SG_IO will ignore this and still block! Also a SG_IO call will not affect the poll() state nor cause a SIGPOLL signal to be sent. If you really want non-blocking operation (e.g. for command queuing) then don't use SG_IO; use the write() read() sequence instead.

For more information about normal (or indirect), direct and mmap-ed IO see Chapter 9.

Currently the sg driver uses one Linux major device number (char 21) which in the lk 2.4 series limits it to handling 256 SCSI devices. Any attempt to attach more than this number will be rejected with a message being sent to the console and the log file. [3]


Chapter 6. The sg_io_hdr_t structure in detail

The main control structure for the version 3 SCSI generic driver has a struct tag name of "sg_io_hdr" and a typedef name of "sg_io_hdr_t". The structure is shown in abridged form below. The "[i]" notation indicates an input value while "[o]" indicates a value that is output. The "[i->o]" indicates a value that is conveyed from input to output and apart from one special case, is not used by the driver. The "[i->o]" members are meant to aid an application matching the request sent to a write() to the corresponding response received by a read(). For pointers the "[*i]" indicates a pointer that is used for reading from user memory into the driver, "[*o]" is a pointer used for writing, and "[*io]" indicates a pointer used for either reading or writing.
typedef struct sg_io_hdr
{
    int interface_id;           /* [i] 'S' (required) */
    int dxfer_direction;        /* [i] */
    unsigned char cmd_len;      /* [i] */
    unsigned char mx_sb_len;    /* [i] */
    unsigned short iovec_count; /* [i] */
    unsigned int dxfer_len;     /* [i] */
    void * dxferp;              /* [i], [*io] */
    unsigned char * cmdp;       /* [i], [*i]  */
    unsigned char * sbp;        /* [i], [*o]  */
    unsigned int timeout;       /* [i] unit: millisecs */
    unsigned int flags;         /* [i] */
    int pack_id;                /* [i->o] */
    void * usr_ptr;             /* [i->o] */
    unsigned char status;       /* [o] */
    unsigned char masked_status;/* [o] */
    unsigned char msg_status;   /* [o] */
    unsigned char sb_len_wr;    /* [o] */
    unsigned short host_status; /* [o] */
    unsigned short driver_status;/* [o] */
    int resid;                  /* [o] */
    unsigned int duration;      /* [o] */
    unsigned int info;          /* [o] */
} sg_io_hdr_t;  /* 64 bytes long (on i386) */
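To show how the input members fit together, here is a minimal sketch that prepares a header for a standard INQUIRY and sends it with the SG_IO ioctl(). The buffer sizes, the 20 second timeout and the function names are arbitrary choices made for this example, and the INQUIRY operation code (0x12) comes from the SCSI standards rather than from this document.

```c
#include <assert.h>
#include <string.h>
#include <sys/ioctl.h>
#include <scsi/sg.h>

#define INQ_CMD_LEN   6
#define INQ_REPLY_LEN 96   /* arbitrary size chosen for this sketch */
#define SENSE_LEN     32   /* arbitrary size chosen for this sketch */

/* Fill in the [i] members of *hdr for a standard INQUIRY. The caller
 * owns cdb, reply and sense; hdr only stores pointers to them. */
void sg_prep_inquiry(sg_io_hdr_t *hdr, unsigned char cdb[INQ_CMD_LEN],
                     unsigned char *reply, unsigned char *sense)
{
    assert(hdr && cdb && reply && sense);
    memset(cdb, 0, INQ_CMD_LEN);
    cdb[0] = 0x12;              /* INQUIRY operation code */
    cdb[4] = INQ_REPLY_LEN;     /* allocation length */

    memset(hdr, 0, sizeof(*hdr));
    hdr->interface_id = 'S';
    hdr->dxfer_direction = SG_DXFER_FROM_DEV;
    hdr->cmd_len = INQ_CMD_LEN;
    hdr->mx_sb_len = SENSE_LEN;
    hdr->dxfer_len = INQ_REPLY_LEN;
    hdr->dxferp = reply;
    hdr->cmdp = cdb;
    hdr->sbp = sense;
    hdr->timeout = 20000;       /* unit: milliseconds */
}

/* Send the prepared request with ioctl(SG_IO) and check the three
 * status members. Returns 0 if all are zero, -1 otherwise. */
int sg_do_inquiry(int sg_fd, sg_io_hdr_t *hdr)
{
    assert(hdr != NULL);
    if (ioctl(sg_fd, SG_IO, hdr) < 0)
        return -1;
    if (hdr->status != 0 || hdr->host_status != 0 || hdr->driver_status != 0)
        return -1;
    return 0;
}
```

The memset() of the whole header means that members not mentioned (iovec_count, flags, pack_id, usr_ptr) start out as zero.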


6.2. dxfer_direction

The type of dxfer_direction is int. This is required to be one of the following:

  • SG_DXFER_NONE

  • SG_DXFER_TO_DEV

  • SG_DXFER_FROM_DEV

  • SG_DXFER_TO_FROM_DEV

  • SG_DXFER_UNKNOWN

The value SG_DXFER_NONE should be used when there is no data transfer associated with a command (e.g. TEST UNIT READY). The value SG_DXFER_TO_DEV should be used when data is being moved from user memory towards the device (e.g. WRITE). The value SG_DXFER_FROM_DEV should be used when data is being moved from the device towards user memory (e.g. READ).

The value SG_DXFER_TO_FROM_DEV is only relevant to indirect IO (otherwise it is treated like SG_DXFER_FROM_DEV). Data is moved from the user space to the kernel buffers. The command is then performed and most likely a READ-like command transfers data from the device into the kernel buffers. Finally the kernel buffers are copied back into the user space. This technique allows application writers to initialize the buffer and perhaps deduce the number of bytes actually read from the device (i.e. detect underrun). This is better done by using 'resid' if it is supported.
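The buffer initialization technique mentioned above might look like the following sketch. The fill byte and the function names are invented for this example, and the result is only an estimate: it under-counts when the data read from the device happens to end in the fill byte.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FILL_BYTE 0xA5  /* arbitrary pattern chosen for this sketch */

/* Call before the request: pre-fill the user buffer with a known byte.
 * With SG_DXFER_TO_FROM_DEV and indirect IO this pattern is copied
 * into the kernel buffers before the command is performed. */
void underrun_prefill(unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    if (len > 0)
        memset(buf, FILL_BYTE, len);
}

/* Call after the request: guess how many bytes the device supplied by
 * trimming the untouched fill bytes from the end of the buffer. */
size_t underrun_estimate(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    while (len > 0 && buf[len - 1] == FILL_BYTE)
        len--;
    return len;
}
```

If underrun_estimate() returns less than 'dxfer_len' then the device probably transferred fewer bytes than were requested.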

The value SG_DXFER_UNKNOWN is for those (rare) situations where the data direction is not known. It may be useful for backward compatibility of existing applications when the relevant direction information is not available in the sg interface layer. There is a (minor) performance "hit" associated with choosing this option (e.g. on the PCI bus). Some recent pseudo device drivers (e.g. USB mass storage) may have problems handling this value (especially on vendor-specific SCSI commands).

N.B. 'dxfer_direction' must have one of the five indicated values and cannot be uninitialized or zero.

If 'dxfer_len' is zero then all values are treated like SG_DXFER_NONE.


6.6. dxfer_len

This is the number of bytes to be moved in the data transfer associated with the command. The direction of the transfer is indicated by 'dxfer_direction'. If 'dxfer_len' is zero then no data transfer takes place. [5]

If iovec_count is non-zero then 'dxfer_len' should be equal to the sum of iov_len lengths. If not, the minimum of the two is the transfer length. The type of dxfer_len is unsigned int.
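The relationship between 'dxfer_len' and the iov_len lengths can be expressed as a small calculation. This sketch assumes the sg_iovec_t type from <scsi/sg.h> (with iov_base and iov_len members); the two function names are invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <scsi/sg.h>

/* Sum the iov_len members of a scatter gather list. */
size_t sg_iovec_total(const sg_iovec_t *iov, unsigned short iovec_count)
{
    size_t total = 0;
    unsigned short k;

    assert(iov != NULL || iovec_count == 0);
    for (k = 0; k < iovec_count; k++)
        total += iov[k].iov_len;
    return total;
}

/* The transfer length that is actually used: the smaller of dxfer_len
 * and the summed iov_len values. */
size_t sg_effective_xfer_len(const sg_iovec_t *iov,
                             unsigned short iovec_count,
                             unsigned int dxfer_len)
{
    size_t total = sg_iovec_total(iov, iovec_count);

    return total < dxfer_len ? total : dxfer_len;
}
```

An application can use sg_iovec_total() to set 'dxfer_len' so that the two values always agree.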


6.10. timeout

This value is used to time out the given command. The units of this value are milliseconds. The time being measured is from when a command is sent until when sg is informed the request has been completed. A following read() can take as long as the user likes. Timeouts are best avoided, especially if SCSI bus resets will adversely affect other devices on that SCSI bus. When the timeout expires, the SCSI mid level attempts error recovery. Error recovery completes when the first action in the following list is successful. Note that a more extreme measure is being taken at each step.

  • the SCSI command that has timed out is aborted [6]

  • a SCSI device reset is attempted

  • a SCSI bus reset is attempted. Note this may have an adverse effect on other devices sharing that SCSI bus.

  • a SCSI host (bus adapter) reset is attempted. This is an attempt to re-initialize the adapter card associated with the SCSI device that has the timed out command.

If all these fail then the device may be set "offline" which means that it is no longer accessible (except by this driver when open()-ed O_NONBLOCK) until the machine is rebooted. Offline devices still appear in the cat /proc/scsi/scsi listing. The last column of the cat /proc/scsi/sg/devices listing shows the online/offline status of a device ("1" means online while "0" is offline). The exact status returned depends on which level of error recovery succeeded. Most likely the 'host_status' will be set to DID_ABORT or DID_RESET.

The two error statuses containing the word "TIME(_)OUT" are typically _not_ related to a command timing out. DID_TIME_OUT in the 'host_status' usually means an (unexpected) device selection timeout. DRIVER_TIMEOUT in the 'driver_status' byte means the SCSI adapter is unable to control the devices on its SCSI bus (and has given up).

The type of timeout is unsigned int (and it represents milliseconds).


6.11. flags

These are single or multi-bit values that can be "or-ed" together:

The type of flags is unsigned int.


6.18. host_status

These codes potentially come from the firmware on a host adapter or from one of several hosts that an adapter driver controls. The 'host_status' field has the following values whose #defines mimic those which are only visible within the kernel (with the "SG_ERR_" removed from the front of each define). A copy of these defines can be found in sg_err.h (see Appendix A):

  • SG_ERR_DID_OK [0x00] NO error

  • SG_ERR_DID_NO_CONNECT [0x01] Couldn't connect before timeout period

  • SG_ERR_DID_BUS_BUSY [0x02] BUS stayed busy through time out period

  • SG_ERR_DID_TIME_OUT [0x03] TIMED OUT for other reason (often this is an unexpected device selection timeout)

  • SG_ERR_DID_BAD_TARGET [0x04] BAD target, device not responding?

  • SG_ERR_DID_ABORT [0x05] Told to abort for some other reason. From lk 2.4.15 the SCSI subsystem supports 16 byte commands however few adapter drivers do. Those HBA drivers that don't support 16 byte commands will yield this error code if a 16 byte command is passed to a SCSI device they control.

  • SG_ERR_DID_PARITY [0x06] Parity error. Older SCSI parallel buses have a parity bit for error detection. This probably indicates a cable or termination problem.

  • SG_ERR_DID_ERROR [0x07] Internal error detected in the host adapter. This may not be fatal (and the command may have succeeded). The aic7xxx and sym53c8xx adapter drivers sometimes report this for data underruns or overruns. [9]

  • SG_ERR_DID_RESET [0x08] The SCSI bus (or this device) has been reset. Any SCSI device on a SCSI bus is capable of instigating a reset.

  • SG_ERR_DID_BAD_INTR [0x09] Got an interrupt we weren't expecting

  • SG_ERR_DID_PASSTHROUGH [0x0a] Force command past mid-layer

  • SG_ERR_DID_SOFT_ERROR [0x0b] The low level driver wants a retry

The type of host_status is unsigned short.
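When reporting errors it can be handy to turn a 'host_status' value into text. The sketch below uses a local table built from the list above instead of including sg_err.h, since that header may not be installed. The table and function names are invented for this example.

```c
#include <stddef.h>

/* Descriptions indexed by host_status value, taken from the list above. */
static const char *const host_status_text[] = {
    "DID_OK: no error",                                   /* 0x00 */
    "DID_NO_CONNECT: couldn't connect before timeout",    /* 0x01 */
    "DID_BUS_BUSY: bus stayed busy through timeout",      /* 0x02 */
    "DID_TIME_OUT: timed out for other reason",           /* 0x03 */
    "DID_BAD_TARGET: bad target, device not responding?", /* 0x04 */
    "DID_ABORT: told to abort for some other reason",     /* 0x05 */
    "DID_PARITY: parity error",                           /* 0x06 */
    "DID_ERROR: internal error in the host adapter",      /* 0x07 */
    "DID_RESET: SCSI bus (or this device) was reset",     /* 0x08 */
    "DID_BAD_INTR: got an unexpected interrupt",          /* 0x09 */
    "DID_PASSTHROUGH: force command past mid-layer",      /* 0x0a */
    "DID_SOFT_ERROR: low level driver wants a retry"      /* 0x0b */
};

/* Returns a description of the given host_status, or a fixed string
 * for values outside the range 0x00 to 0x0b. */
const char *sg_host_status_str(unsigned short host_status)
{
    size_t n = sizeof(host_status_text) / sizeof(host_status_text[0]);

    if (host_status >= n)
        return "unknown host_status";
    return host_status_text[host_status];
}
```

A caller would typically print sg_host_status_str(hdr.host_status) whenever 'host_status' is non-zero after a request completes.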


Chapter 7. System calls

System calls that can be used on sg devices are discussed in this chapter. The ioctl() system call is discussed in the following chapter [see Chapter 8].

Successfully opening a sg device file name (e.g. /dev/sg0) establishes a link between a file descriptor and an attached SCSI device. The sg driver maintains state information and resources at both the SCSI device (e.g. exclusive lock) and the file descriptor (e.g. reserved buffer) levels.

A SCSI device can be detached while an application has a sg file descriptor open. An example of this is a "hotplug" device such as a USB mass storage device that has just been unplugged. Most subsequent system calls that attempt to access the detached SCSI device will yield ENODEV. The close() call will complete silently while the poll() call will "or" in POLLHUP to its result. A subsequent attempt to open() that device name will yield ENODEV.


7.1. open()

open(const char * filename, int flags). The filename should be a sg device file name as discussed in Chapter 4. Flags can be a number of the following or-ed together:

  • O_RDONLY restricts operations to read()s and ioctl()s (i.e. can't use write() ).

  • O_RDWR permits all system calls to be executed.

  • O_EXCL waits for other opens on the associated SCSI device to be closed before proceeding. If O_NONBLOCK is set then yields EBUSY when someone else has the SCSI device open. The combination of O_RDONLY and O_EXCL is disallowed.

  • O_NONBLOCK Sets non-blocking mode. Calls that would otherwise block yield EAGAIN (e.g. read() ) or EBUSY (e.g. open() ). This flag is ignored by ioctl(SG_IO) .

Either O_RDONLY or O_RDWR must be set in flags. Either of the other 2 flags (but not both) can be or-ed in.

Note that multiple file descriptors may be open to the same SCSI device. [This is a way of side stepping the SG_MAX_QUEUE limit.] At the sg level separate state information is maintained. This means that even if multiple file descriptors are open to a single SCSI device their write() read() sequences are essentially independent.

Open() calls may be blocked due to exclusive locks (i.e. O_EXCL). An exclusive lock applies to a single SCSI device and only to sg's use of that device (i.e. it has no effect on access via sd, sr or st to that device). If the O_NONBLOCK flag is used then open() calls that would have otherwise blocked, yield EBUSY. Applications that scan sg devices trying to determine their identity (e.g. whether one is a scanner) should use the O_NONBLOCK flag otherwise they run the risk of blocking.

The driver will attempt to reserve SG_DEF_RESERVED_SIZE bytes (32KBytes in the current sg.h) on open(). The size of this reserved buffer can subsequently be modified with the SG_SET_RESERVED_SIZE ioctl(). In both cases these are requests subject to various dynamic constraints. The actual amount of memory obtained can be found by the SG_GET_RESERVED_SIZE ioctl(). The reserved buffer will be used if:

  • it is not already in use (e.g. when command queuing is in use)

  • a write() or ioctl(SG_IO) requests a data transfer size that is less than or equal to the reserved buffer size.

Returns a file descriptor if >= 0 , otherwise -1 implies an error.
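A scanning application following the advice above might open each candidate device as in the sketch below. The function names are invented for this example and the convention of returning a negated errno value is simply a choice made here. SG_GET_RESERVED_SIZE is assumed to be defined by <scsi/sg.h>.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <scsi/sg.h>

/* Open an sg device for scanning without risking a block on somebody
 * else's O_EXCL lock. Returns a file descriptor, or -errno on failure
 * (-EBUSY means another process holds an exclusive lock). */
int sg_open_for_scan(const char *path)
{
    int fd = open(path, O_RDWR | O_NONBLOCK);

    return fd >= 0 ? fd : -errno;
}

/* Ask how large the reserved buffer on this file descriptor really is.
 * Returns the size in bytes, or -errno if the ioctl() failed. */
int sg_reserved_size(int sg_fd)
{
    int size = 0;

    if (ioctl(sg_fd, SG_GET_RESERVED_SIZE, &size) < 0)
        return -errno;
    return size;
}
```

Since the reserved buffer size is only a request, calling sg_reserved_size() after open() is the way to find out what was actually obtained.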


7.2. write()

write(int sg_fd, const void * buffer, size_t count). The action of write() with a control block based on struct sg_header is discussed in the earlier document: www.torque.net/sg/p/scsi-generic.txt (i.e. the sg version 2 documentation). This section describes the action of write() when it is given a control block based on struct sg_io_hdr.

The 'buffer' should point to an object of type sg_io_hdr_t and 'count' should be sizeof(sg_io_hdr_t) [it can be larger but the excess is ignored]. If the write() call succeeds then the 'count' is returned as the result.

Up to SG_MAX_QUEUE (16) write()s can be queued up before any finished requests are completed by read(). An attempt to queue more than that will result in an EDOM error. [11] The write() command should return more or less immediately. [12]

The version 2 sg driver defaulted the maximum queue length to 1 (and made available the SG_SET_COMMAND_Q ioctl() to switch it to SG_MAX_QUEUE). So for backward compatibility a file descriptor that only receives sg_header structures in its write() will have a default "max" queue length of 1. As soon as a sg_io_hdr_t structure is seen by a write() then the maximum queue length is switched to SG_MAX_QUEUE on that file descriptor.

The "const" on the 'buffer' pointer is respected by the sg driver. Data is read in from the sg_io_hdr object that is pointed to. Significantly this is when the 'sbp' and the 'dxferp' are recorded internally (i.e. not from the sg_io_hdr object given to the corresponding read() ).


7.3. read()

read(int sg_fd, void * buffer, size_t count). The action of read() with a control block based on struct sg_header is discussed in the earlier document: www.torque.net/sg/p/scsi-generic.txt (i.e. the sg version 2 documentation). This section describes the action of read() when it is given a control block based on struct sg_io_hdr.

The 'buffer' should point to an object of type sg_io_hdr_t and 'count' should be sizeof(sg_io_hdr_t) [it can be larger but the excess is ignored]. If the read() call succeeds then the 'count' is returned as the result.

By default, read() will return the oldest completed request that is queued up. A read() will not interfere with any request associated with the SG_IO ioctl() on this file descriptor except in a special case when a SG_IO ioctl() is interrupted by a signal.

If the SG_SET_FORCE_PACK_ID,1 ioctl() is active then read() will attempt to fetch the packet whose pack_id (given earlier to write()) matches the sg_io_hdr_t::pack_id given to this read(). If not available it will either wait or yield EAGAIN. As a special case, -1 in sg_io_hdr_t::pack_id given to read() will match the request whose response has been waiting for the longest time. Take care to also set 'dxfer_direction' to any valid value (e.g. SG_DXFER_NONE) when in this mode. The 'interface_id' member should also be set appropriately.

Apart from the SG_SET_FORCE_PACK_ID case (and then only for the 3 indicated fields), the sg_io_hdr_t object given to read() can be uninitialized. Note that the 'sbp' pointer value for optionally outputting a sense buffer was recorded from the earlier, corresponding write().
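The write() and read() sequence with pack_id matching can be sketched as follows. The function names are invented for this example and each returns a negated errno value on failure, which is a convention chosen here rather than anything the driver defines. The request passed to sg_submit() is assumed to have been fully prepared beforehand.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <scsi/sg.h>

/* Queue a prepared request with write(). Returns 0, or -errno on
 * failure (-EDOM means the queue for this file descriptor is full). */
int sg_submit(int sg_fd, const sg_io_hdr_t *req)
{
    assert(req != NULL);
    if (write(sg_fd, req, sizeof(*req)) < 0)
        return -errno;
    return 0;
}

/* Turn on pack_id matching for read() on this file descriptor. */
int sg_force_pack_id(int sg_fd)
{
    int on = 1;

    return ioctl(sg_fd, SG_SET_FORCE_PACK_ID, &on) < 0 ? -errno : 0;
}

/* Fetch the response whose pack_id matches (-1 matches the response
 * that has been waiting longest). Only the three members mentioned in
 * the text are set before read(). Returns 0, or -errno on failure
 * (-EAGAIN: non-blocking descriptor and the response is not ready). */
int sg_fetch(int sg_fd, int pack_id, sg_io_hdr_t *resp)
{
    assert(resp != NULL);
    memset(resp, 0, sizeof(*resp));
    resp->interface_id = 'S';
    resp->dxfer_direction = SG_DXFER_NONE;
    resp->pack_id = pack_id;
    if (read(sg_fd, resp, sizeof(*resp)) < 0)
        return -errno;
    return 0;
}
```

Because 'sbp' and 'dxferp' were recorded at write() time, the sense buffer and data still arrive in the buffers named in the original request, not in anything set in the object given to sg_fetch().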


7.5. close()

When close() leaves outstanding SCSI commands still awaiting responses, the sg driver maintains its internal structures for the now defunct file descriptor. These internal structures are maintained until all outstanding responses (some might be timeouts) are received. When the sg driver is loaded as a module and has any open file descriptors or "defunct" file descriptors then it cannot be unloaded. An attempt to call rmmod sg will report the driver is busy. Defunct file descriptors that remain for some time, perhaps awaiting a timeout, can be observed with the cat /proc/scsi/sg/debug command. In this case "closed=1" will be set on the defunct file descriptor [see Section 11.1]. Defunct file descriptors do not impede attempts by applications to open() new file descriptors on the same SCSI device.

The kernel arranges for only the last close() on a file descriptor to be seen by a driver (and to emphasize this, the corresponding sg driver call is named sg_release() rather than sg_close()). This is only significant when an application uses fork() or dup().

Returns 0 if successful, otherwise -1 implies an error.


7.6. mmap()

The mmap() system call can be made multiple times on the same sg_fd. The munmap() system call is not required if close() is called on sg_fd. Mmap-ed IO is well-behaved when a process is fork()-ed (or the equivalent finer grained clone() system call is made). In the case of a fork(), 2 processes will be sharing the same memory mapped area together with the sg driver for a sg_fd and the last one to close the sg_fd (or exit) will cause the shared memory to be freed.

It is assumed that if the default reserved buffer size of 32 KB is not sufficient then an ioctl(SG_SET_RESERVED_SIZE) call is made prior to any calls to mmap(). If the required size is not a multiple of the kernel's page size (returned by the getpagesize() system call) then the size passed to ioctl(SG_SET_RESERVED_SIZE) should be rounded up to the next page size multiple.
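The rounding step can be sketched as below. The function names are invented for this example, and sysconf(_SC_PAGESIZE) is used on the assumption that it reports the same value as getpagesize().

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Round size up to the next multiple of page_size. */
size_t round_up_to_page(size_t size, size_t page_size)
{
    assert(page_size > 0);
    return ((size + page_size - 1) / page_size) * page_size;
}

/* The size to request with ioctl(SG_SET_RESERVED_SIZE) before calling
 * mmap() when 'wanted' bytes are needed. */
size_t sg_mmap_reserve_size(size_t wanted)
{
    return round_up_to_page(wanted, (size_t)sysconf(_SC_PAGESIZE));
}
```

With a 4096 byte page, a request for 40000 bytes is rounded up to 40960 bytes.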

Mmap-ed IO is requested by setting (or or-ing in) the SG_FLAG_MMAP_IO constant into the flags member of the sg_io_hdr structure prior to a call to write() or ioctl(SG_IO). The logic to do mmap-ed IO _assumes_ that an appropriate mmap() call has been made by the application. In other words it does not check. [13]


7.8. Errors reported in errno

With the original interface almost any string could be accidentally given to write() and potentially (but rarely) something nasty could happen. If some error was detected then more than likely EIO was placed in errno.

Unfortunately this can still happen with write() since it can accept both the original struct sg_header or the newer sg_io_hdr_t described in this note. However since the SG_IO ioctl() will only accept the sg_io_hdr_t structure there is less chance of a random string being interpreted as a command. Since the sg_io_hdr_t interface does a lot more error checking, it attempts to give out more precise errno values to help the user pinpoint the problem. [Admittedly some of these errno values are picked in an arbitrary way from the large set of available values.]

In most cases when a system call on a sg file descriptor fails, the call in question will return -1. After an application detects that a system call has failed it should read the value in the "errno" variable (prior to making any more system calls). Applications should include the <errno.h> header.

Below is a table of errno values indicating which calls to sg will generate them and the meaning of the error. A write() call is indicated by "w", a read() call by "r" and an open() call by "o".

errno    which_calls    Meaning
-----    -----------    ----------------------------------------------
EACCES    <some ioctls> Root permission (more precisely CAP_SYS_ADMIN
                        or CAP_SYS_RAWIO) required. Also may occur during
                        an attempted write to /proc/scsi/sg files.
EAGAIN    r             The file descriptor is non-blocking and the request
                        has not been completed yet.
EAGAIN    w,SG_IO       SCSI sub-system has (temporarily) run out of 
                        command blocks.
EBADF     w             File descriptor was not open()ed O_RDWR.
EBUSY     o             Someone else has an O_EXCL lock on this device.
EBUSY     w             With mmap-ed IO, the reserved buffer already in use.
EBUSY     <some ioctls> Attempt to change something (e.g. reserved buffer
                        size) when the resource was in use.
EDOM      w,SG_IO       Too many requests queued against this file
                        descriptor. Limit is SG_MAX_QUEUE active requests.
                        If sg_header interface is being used then the
                        default queue depth is 1. Use SG_SET_COMMAND_Q
                        ioctl() to increase it.
EFAULT    w,r,SG_IO     Pointer to user space invalid.
          <most ioctls> 
EINVAL    w,r           Size given as 3rd argument not large enough for the
                        sg_io_hdr_t structure. Both direct and mmap-ed IO
                        selected.
EIO       w             Size given as 3rd argument less than size of old
                        header structure (sg_header). Additionally a write()
                        with the old header will yield this error for most
                        detected malformed requests.
EIO       r             A read() with the older sg_header structure yields
                        this value for some errors that it detects.
EINTR     o             While waiting for the O_EXCL lock to clear this call
                        was interrupted by a signal.
EINTR     r,SG_IO       While waiting for the request to finish this call
                        was interrupted by a signal.
EINTR     w             [Very unlikely] While waiting for an internal SCSI
                        resource this call was interrupted by a signal.
EMSGSIZE  w,SG_IO       SCSI command size ('cmd_len') was too small 
                        (i.e. < 6) or too large
ENODEV    o             Tried to open() a file with no associated device.
                        [Perhaps sg has not been built into the kernel or
                        is not available as a module?]
ENODEV    o,w,r,SG_IO   SCSI device has detached, awaiting cleanup.
                        User should close fd. Poll() will yield POLLHUP.
ENOENT    o             Given filename not found.
ENOMEM    o             [Very unlikely] Kernel was not even able to find
                        enough memory for this file descriptor's context.
ENOMEM    w,SG_IO       Kernel unable to find memory for internal buffers.
                        This is usually associated with indirect IO.
                        For mmap-ed IO 'dxfer_len' greater than reserved
                        buffer size.
                        Lower level (adapter) driver does not support enough
                        scatter gather elements for requested data transfer.
ENOSYS    w,SG_IO       'interface_id' of a sg_io_hdr_t object was _not_ 'S'.
ENXIO     o             "remove-single-device" may have removed this device.
ENXIO     o,w,r,SG_IO   Internal error (including SCSI sub-system busy doing
                        error processing - e.g. SCSI bus reset). When a
                        SCSI device is offline, this is the response. This
                        can be bypassed by opening O_NONBLOCK.
EPERM     o             Can't use O_EXCL when open()ing with O_RDONLY
EPERM     w,SG_IO       File descriptor open()-ed O_RDONLY but O_RDWR
          <some ioctls> access mode needed for this operation.
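
As an illustration of how an application might act on the table above, here is a minimal sketch that turns the errno value left by a failed write() on a sg file descriptor into a short hint. The function name sg_write_errno_hint() is hypothetical (it is not part of sg or sg3_utils) and the strings are simply condensed from the "w" rows of the table.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical helper: one-line hint for an errno value seen after a
 * failed write() on an sg file descriptor. The text is condensed from
 * the "w" rows of the table; other values fall back to strerror(). */
const char *sg_write_errno_hint(int err)
{
    switch (err) {
    case EAGAIN:   return "SCSI sub-system temporarily out of command blocks";
    case EBADF:    return "file descriptor not open()ed O_RDWR";
    case EBUSY:    return "mmap-ed IO: reserved buffer already in use";
    case EDOM:     return "too many requests queued on this file descriptor";
    case EFAULT:   return "pointer to user space invalid";
    case EINVAL:   return "size too small for sg_io_hdr_t, or direct and "
                          "mmap-ed IO both selected";
    case EIO:      return "size too small for sg_header, or malformed "
                          "old-style request";
    case EINTR:    return "interrupted by a signal";
    case EMSGSIZE: return "cmd_len too small (< 6) or too large";
    case ENODEV:   return "SCSI device has detached; close the fd";
    case ENOMEM:   return "kernel unable to find memory for buffers";
    case ENOSYS:   return "interface_id of sg_io_hdr_t was not 'S'";
    case ENXIO:    return "internal error or device offline";
    case EPERM:    return "fd opened O_RDONLY but O_RDWR needed";
    default:       return strerror(err);
    }
}
```

An application would save errno straight after the failing write() (before any other system call) and pass that saved value to the helper.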


Chapter 8. Ioctl()s

The Linux SCSI upper level drivers, including sg, have a "trickle down" ioctl() architecture. This means that ioctl()s whose request value (i.e. the second argument) is not understood by the upper level driver, are passed down to the SCSI mid-level. Those ioctl()s that are not understood by the mid level driver are passed down to the lower level (adapter) driver. If none of the 3 levels understands the ioctl() request value then -1 is returned and EINVAL is placed in errno. By convention the beginning of the request value's symbolic name indicates which level will respond to the ioctl(). For example, request values starting with "SG_" are processed by the sg driver while those starting with "SCSI_" are processed by the mid level.

Most of the sg ioctl()s read or write information via a pointer given as the third argument to the ioctl() call and return 0 on success. A few of the older ioctl()s that get a value from the driver return that value as the result of the ioctl() call (e.g. ioctl(SG_GET_TIMEOUT) ).
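
The two calling conventions can be sketched as follows. The wrapper names are hypothetical and the "0 or negated errno" return style is a choice made for this sketch, not something the sg driver prescribes. SG_GET_VERSION_NUM stands in for the pointer style and SG_GET_TIMEOUT for the older return-value style.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <scsi/sg.h>

/* Pointer style: the value comes back through the third argument.
 * Returns 0 on success or a negated errno value on failure. */
int sg_get_version_num(int sg_fd, int *version)
{
    assert(version != NULL);
    if (ioctl(sg_fd, SG_GET_VERSION_NUM, version) < 0)
        return -errno;
    return 0;
}

/* Return-value style: the result of ioctl() is the value itself.
 * Returns the timeout as reported by the driver, or a negated errno. */
int sg_get_timeout(int sg_fd)
{
    int res = ioctl(sg_fd, SG_GET_TIMEOUT, NULL);

    return (res < 0) ? -errno : res;
}
```

Note that with the return-value style a legitimate result can never be negative, which is one reason the pointer style is used for most of the sg ioctl()s.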

All sg driver ioctl()s are listed below. They all start with "SG_". They are followed by several interesting SCSI mid level ioctl()s which start with "SCSI_IOCTL_". The sg ioctl()s are roughly in alphabetical order (with _SET_, _GET_ and _FORCE_ ignored). Since ioctl(SG_IO) is a complete SCSI command request/response sequence then it is listed first.


8.1. SG_IO

The same file descriptor can be used both for SG_IO synchronous calls and the write() read() sequences at the same time. The sg driver makes sure that the response to a SG_IO call will never accidentally be fetched by a read(). Even though a single file descriptor can be shared in this manner, it is probably more sensible (and results in cleaner code) if separate file descriptors to the same SCSI device are used in this case.

It is possible that the wait for the command completion is interrupted by a signal. In this case the SG_IO call will yield an EINTR error. This is reasonably complex to handle and is discussed in the ioctl(SG_SET_KEEP_ORPHAN) description below. The following SCSI commands will be permitted by SG_IO when the sg file descriptor was opened O_RDONLY:

All commands to SCSI device type SCANNER are accepted. Other cases yield an EPERM error. Note that the write() read() interface must have the sg file descriptor open()-ed with O_RDWR as write permission is required by Linux to execute a write() system call.

The ability of the SG_IO ioctl() to issue certain SCSI commands has led to some relaxation on file descriptors open()ed "read-only" compared with the version 2 sg driver. The open() call will now attempt to allocate a reserved buffer for all newly opened file descriptors. The ioctl(SG_SET_RESERVED_SIZE) will now work on "read-only" file descriptors.


8.15. SG_SCSI_RESET

Unfortunately this ioctl() doesn't currently do much (but may in the future after other issues are resolved). Yields an EBUSY error if the SCSI bus or the associated device is being reset when this ioctl() is called, otherwise returns 0. N.B. In some recent distributions there is a patch to the SCSI mid level code that activates this ioctl. Check your distribution.


8.26. SCSI_IOCTL_SEND_COMMAND

The structure that we are passed should look like:
   struct sdata {
    unsigned int inlen;     [i] Length of data written to device
    unsigned int outlen;    [i] Length of data read from device
    unsigned char cmd[x];   [i] SCSI command (6 <= x <= 16)
                            [o] Data read from device starts here
                            [o] On error, sense buffer starts here
    unsigned char wdata[y]; [i] Data written to device starts here
   };
Notes:

  • The SCSI command length is determined by examining the 1st byte of the given command [15] . There is no way to override this.

  • Data transfers are limited to PAGE_SIZE (4K on i386, 8K on alpha).

  • The length (x + y) must be at least OMAX_SB_LEN bytes long to accommodate the sense buffer when an error occurs. The sense buffer is truncated to OMAX_SB_LEN (16) bytes so that old code will not be surprised.

  • If a Unix error occurs (e.g. ENOMEM) then the user will receive a negative return and the Unix error code in 'errno'. If the SCSI command succeeds then 0 is returned. Positive numbers returned are the compacted SCSI error codes (4 bytes in one int) where the lowest byte is the SCSI status. See the drivers/scsi/scsi.h file for more information on this.
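
A minimal sketch of laying out such a buffer for a standard INQUIRY follows. The helper names are hypothetical, and OMAX_SB_LEN is defined locally with the value 16 quoted in the notes above. The buffer is sized so that whichever is largest (command plus written data, data read back, or the sense buffer) fits after the two length fields.

```c
#include <assert.h>
#include <string.h>

#define OMAX_SB_LEN 16      /* sense buffer truncation, per the notes */
#define INQ_CMD_LEN 6
#define INQ_REPLY_LEN 96

/* Fill 'buf' with an INQUIRY request laid out like struct sdata:
 * inlen, outlen, then the command bytes. Returns the number of bytes
 * of 'buf' the request and its response need, or 0 if too small. */
size_t build_inquiry_sdata(unsigned char *buf, size_t buf_len)
{
    unsigned int inlen = 0;                /* nothing written to device */
    unsigned int outlen = INQ_REPLY_LEN;   /* bytes read from device */
    unsigned char cmd[INQ_CMD_LEN] = {0x12, 0, 0, 0, INQ_REPLY_LEN, 0};
    size_t payload = sizeof(cmd) + inlen;
    size_t need;

    assert(buf != NULL);
    if (payload < outlen)
        payload = outlen;                  /* reply overwrites cmd area */
    if (payload < OMAX_SB_LEN)
        payload = OMAX_SB_LEN;             /* room for sense data */
    need = 2 * sizeof(unsigned int) + payload;
    if (buf_len < need)
        return 0;
    memset(buf, 0, need);
    memcpy(buf, &inlen, sizeof(inlen));
    memcpy(buf + sizeof(inlen), &outlen, sizeof(outlen));
    memcpy(buf + 2 * sizeof(unsigned int), cmd, sizeof(cmd));
    return need;
}

/* The lowest byte of a positive ioctl() result is the SCSI status. */
unsigned int sdata_scsi_status(int ioctl_result)
{
    return (ioctl_result > 0) ? ((unsigned int)ioctl_result & 0xff) : 0;
}
```

The filled buffer would then be passed as the third argument of ioctl(fd, SCSI_IOCTL_SEND_COMMAND, buf); on success the INQUIRY response starts where the command bytes were.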


Chapter 9. Direct and Mmap-ed IO

The normal action of the sg driver for a read operation (from a device) is to request the lower level (adapter) driver to DMA [16] data into kernel buffers that the sg driver manages. The sg driver will then copy the contents of its buffers into the user space. [This sequence is reversed for a write operation (towards a device)]. While this double handling of data is obviously inefficient it does decouple some hardware issues from user applications. For these and historical reasons the "double-buffered" IO remains the default for the sg driver.

Both "direct" and "mmap-ed" IO are techniques that permit the data to be DMA-ed directly from the lower level (adapter) driver into the user application (vice versa for write operations). Both techniques result in faster speed, smaller latencies and lower CPU utilization but come at the expense of complexity (as always). For example the Linux kernel must not attempt to swap out pages in a user application that a SCSI adapter is busy DMA-ing data into.


9.1. Direct IO

Direct IO uses the kiobuf mechanism [see the Linux Device Drivers book] to manipulate memory allocated within the user space so that a lower level (adapter) driver can DMA directly to or from that user space memory. Since the user can give a different data buffer to each SCSI command passed through the sg interface, the kiobuf mechanism needs to set up its structures (and undo that setup) for each SCSI command. [17] Direct IO is available as an option in sg 3.1.18 (before that the sg driver needed to be recompiled with an altered define). Direct IO support is designed in such a way that if it is requested and cannot be performed then the command will still be performed using indirect IO. If direct IO is requested and has been performed then the SG_INFO_DIRECT_IO bit will be set in the 'info' member of the sg_io_hdr_t control structure after the request has been completed. Direct IO is not supported on ISA SCSI adapters since they can only address a 24 bit address space.

One limit on direct IO is that sg_io_hdr_t::iovec_count==0. So the user cannot (currently) use application level scatter gather and direct IO on the same request.

For direct IO to be worthwhile, a reasonable amount of data should be requested for data transfer. For transfers less than 8 KByte it is probably not worth the trouble. On the other hand "locking down" multiple 512 KB blocks of data for direct IO could adversely impact overall system performance. Remember that for the duration of a direct IO request, the data transfer buffer is mapped to a fixed memory location and locked in such a way that it won't be swapped out. This can "cramp the style" of the kernel if it is overdone.

Prior to sg 3.1.18 the direct IO code was commented out with the "SG_ALLOW_DIO" define. In sg 3.1.18 (available for lk 2.4.2 and later) the direct IO code is active but is defaulted off by a run time value. This value can be accessed via the "proc" file system at /proc/scsi/sg/allow_dio . Direct IO is enabled when a user with root permissions writes "1" to that file: echo 1 > /proc/scsi/sg/allow_dio . If SG_FLAG_DIRECT_IO is set in sg_io_hdr::flags but /proc/scsi/sg/allow_dio holds "0" then indirect IO will be performed (and this is indicated by ((sg_io_hdr::info & SG_INFO_DIRECT_IO_MASK) == SG_INFO_INDIRECT_IO) after the request is completed).
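
The request-then-check pattern described in this section can be sketched as below. The two helper names are hypothetical; the flag and mask names are those from sg.h. The first helper also enforces the iovec_count==0 restriction mentioned earlier.

```c
#include <assert.h>
#include <stddef.h>
#include <scsi/sg.h>

/* Ask for direct IO on this request. Returns 0 on success, or -1 if
 * the request uses application level scatter gather (iovec_count != 0)
 * which cannot be combined with direct IO. */
int sg_request_direct_io(sg_io_hdr_t *hdr)
{
    assert(hdr != NULL);
    if (hdr->iovec_count != 0)
        return -1;
    hdr->flags |= SG_FLAG_DIRECT_IO;
    return 0;
}

/* After the request has completed: 1 if direct IO was actually
 * performed, 0 if the driver fell back to indirect IO. */
int sg_did_direct_io(const sg_io_hdr_t *hdr)
{
    assert(hdr != NULL);
    return (hdr->info & SG_INFO_DIRECT_IO_MASK) == SG_INFO_DIRECT_IO;
}
```

An application that finds sg_did_direct_io() returning 0 on every request might check whether /proc/scsi/sg/allow_dio still holds "0".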


9.2. Mmap-ed IO

Memory-mapped IO takes a different approach from direct IO to removing the extra data copy performed by normal ("indirect") IO. With mmap-ed IO the application calls the mmap() system call to memory map sg's reserved buffer. The sg driver maintains one reserved buffer per file descriptor. The default size of the reserved buffer is 32 KB and it can be changed with the ioctl(SG_SET_RESERVED_SIZE). The mmap() system call only needs to be called once prior [18] to doing mmap-ed IO. For more details on mmap() see Section 7.6. An application indicates that it wants mmap-ed IO on a SCSI request by setting the SG_FLAG_MMAP_IO value in 'flags'.

Since there is only one reserved buffer per sg file descriptor, only one mmap-ed IO command can be active at one time. In order to perform command queuing with mmap-ed IO, an application will need to open() multiple file descriptors to the same SCSI device. With mmap-ed IO the various status values and the sense buffer (if required) are conveyed back to an application in the same fashion as normal ("indirect") IO.

Mmap-ed IO has very low per command latency since the reserved buffer mapping only needs to be done once per file descriptor. Also the reserved buffer is set up by the sg driver to aid the efficient construction of the internal scatter gather list used by the lower level (adapter) driver for DMA purposes. This tends to be more efficient than the user memory that direct IO requires the sg driver to process into an internal scatter gather list. So on both these counts, mmap-ed IO has the edge over direct IO.
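
A minimal sketch of the one-time setup and the per-request flag follows. The helper names are hypothetical. SG_FLAG_MMAP_IO may be missing from glibc's copy of sg.h, so it is defined here if absent; the value 4 is an assumption taken from the kernel's sg.h and should be checked against the header in use.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <scsi/sg.h>

#ifndef SG_FLAG_MMAP_IO
#define SG_FLAG_MMAP_IO 4   /* assumption: value from the kernel's sg.h */
#endif

/* Resize this file descriptor's reserved buffer and map it (done once).
 * On success returns the mapping and stores its length in *len;
 * returns NULL on any failure. */
void *sg_map_reserved(int sg_fd, int want, int *len)
{
    int got = 0;
    void *p;

    assert(len != NULL);
    if (ioctl(sg_fd, SG_SET_RESERVED_SIZE, &want) < 0)
        return NULL;
    /* The driver may grant less than asked for, so read it back. */
    if (ioctl(sg_fd, SG_GET_RESERVED_SIZE, &got) < 0 || got <= 0)
        return NULL;
    p = mmap(NULL, (size_t)got, PROT_READ | PROT_WRITE, MAP_SHARED,
             sg_fd, 0);
    if (p == MAP_FAILED)
        return NULL;
    *len = got;
    return p;
}

/* Mark one request as mmap-ed IO; its data travels via the mapping. */
void sg_request_mmap_io(sg_io_hdr_t *hdr, unsigned int dxfer_len)
{
    assert(hdr != NULL);
    hdr->flags |= SG_FLAG_MMAP_IO;
    hdr->dxfer_len = dxfer_len;
}
```

The length stored in *len matters: according to the errno table in Section 7.8, a 'dxfer_len' greater than the reserved buffer size yields ENOMEM, and selecting both direct and mmap-ed IO yields EINVAL.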


Chapter 10. Driver and module initialization

The size of the default reserved buffer can be specified when the sg driver is loaded. If it is built into the kernel then use:
    sg_def_reserved_size=<n>
on the boot line (only supported in 2.4 kernels).

If sg is a module, it can be loaded with modprobe in either manner:
    modprobe sg
    modprobe sg def_reserved_size=<n>
In the second case "<n>" is an integer (non negative). The default value is the value of the SG_DEF_RESERVED_SIZE defined in sg.h . This is currently 32768.

If sg is a module, it can be unloaded with rmmod like this:
    rmmod sg
However if there is a file descriptor still open with the sg driver (or there is an outstanding request awaiting a response) then the sg module is considered to be busy and can't be unloaded.


Chapter 11. Sg and the "proc" file system

The sg driver provides information about the SCSI subsystem and the current internal state of the sg driver in the /proc/scsi/sg directory. Some sg driver defaults can be changed by super user writing values to these "pseudo" files [19].

The following files are readable by all:
allow_dio       0 indicates direct IO disabled, 1 for enabled
debug           debug information including active request data
def_reserved_size  default buffer size reserved for each file descriptor
devices         one line of numeric data per device
device_hdr      single line of column names corresponding to 'devices'
device_strs     one line of vendor, product and rev info per device
hosts           one line of numeric data per host
host_hdr        single line of column names corresponding to 'hosts'
host_strs       one line of host information (string) per host
version         sg version as a number followed by a string representation

Each line in 'devices' and 'device_strs' corresponds to an sg device. For example the first line corresponds to /dev/sg0. The line number (origin 0) also corresponds to the sg minor device number. This mapping is local to sg and is normally the same as given by the cat /proc/scsi/scsi command which is reported by the SCSI mid level driver. The two mappings may diverge when 'remove-single-device' and 'add-single-device' are used (see the SCSI-2.4-HOWTO for more information).

Each line in 'hosts' and 'host_strs' corresponds to a SCSI host. For example the first line corresponds to the host normally represented as "scsi0". This mapping is invariant across the SCSI sub system. [So these entries could arguably be migrated to the mid level.]

The column headers in 'device_hdr' are given below. If the device is not present (and one is present after it) then a line of "-1" entries is output. Each entry is separated by a whitespace (currently a tab):
host            host number (indexes 'hosts' table, origin 0)
chan            channel number of device
id              SCSI id of device
lun             Logical Unit number of device
type            SCSI type (e.g. 0->disk, 5->cdrom, 6->scanner)
opens           number of opens (by sd, sr, st and sg) at this time
depth           maximum queue depth supported by device
busy            number of commands being processed by host for this device
online          1 indicates device is in normal online state, 0->offline
A SCSI device is set offline by the SCSI mid level when it decides that a device is no longer responding (e.g. the device does not respond to an SCSI INQUIRY command after it has been reset).
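
Since each line of 'devices' is just nine whitespace separated integers in the column order listed above, parsing it is straightforward. The following is a minimal sketch; the structure and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* One line of /proc/scsi/sg/devices, in 'device_hdr' column order. */
struct sg_proc_dev {
    int host, chan, id, lun, type, opens, depth, busy, online;
};

/* Parse one line of /proc/scsi/sg/devices. Returns 1 for a present
 * device, 0 for a line of "-1" entries (device not present) and
 * -1 if the line does not hold nine integers. */
int sg_parse_devices_line(const char *line, struct sg_proc_dev *d)
{
    int n;

    assert(line != NULL && d != NULL);
    n = sscanf(line, "%d %d %d %d %d %d %d %d %d",
               &d->host, &d->chan, &d->id, &d->lun, &d->type,
               &d->opens, &d->depth, &d->busy, &d->online);
    if (n != 9)
        return -1;
    return (d->host < 0) ? 0 : 1;
}
```

Reading the file with fgets() and counting lines from 0 gives the sg minor device number for each parsed entry.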

The column headers in 'host_hdr' are given below. Each entry is separated by a whitespace (currently a tab):
uid             unique id (non-zero if multiple hosts of same type)
busy            number of commands being processed for this host
cpl             maximum number of commands per lun (may be 0 if "device depth"
                is given)
sgat            maximum elements of scatter gather the adapter (pseudo)
                DMA can accommodate
isa             0 -> non-ISA adapter, 1 -> ISA adapter. ISA adapters are
                assumed to have a 24 bit address bus limit (16 MB).
emu             0 -> real SCSI adapter, 1 -> emulated SCSI adapter
                (e.g. ide-scsi device driver)

The 'def_reserved_size' is both readable and writable. It is only writable by root. It is initialized to the value of SG_DEF_RESERVED_SIZE in the "sg.h" file. Values between 0 and 1048576 (which is 2 ** 20) are accepted and can be set from the command line with the following syntax:
$ echo "262144" > /proc/scsi/sg/def_reserved_size
Note that the actual reserved buffer associated with a file descriptor could be less than 'def_reserved_size' if appropriate memory is not available. If the sg driver is compiled into the kernel (but not when it is a module) this value can also be read at /proc/sys/kernel/sg-big-buff . This latter feature is deprecated.

The 'allow_dio' is both readable and writable. It is only writable by root. When it is 0 (default) any request to do direct IO (i.e. by setting SG_FLAG_DIRECT_IO) will be ignored and indirect IO will be done instead.


11.1. /proc/scsi/sg/debug

This section explains the output from /proc/scsi/sg/debug, which is typically viewed with the command cat /proc/scsi/sg/debug. Below is the (slightly abridged) output while this command: sgp_dd if=/dev/sg0 of=/dev/null bs=512 is executing on the system. That sgp_dd command is using command queuing to read a disk (and the data is written to /dev/null which forgets it).
$ cat /proc/scsi/sg/debug
dev_max(currently)=7 max_active_device=1 (origin 1)
 scsi_dma_free_sectors=416 sg_pool_secs_aval=320 def_reserved_size=32768
 >>> device=sg0 scsi0 chan=0 id=0 lun=0   em=0 sg_tablesize=255 excl=0
   FD(1): timeout=60000ms bufflen=65536 (res)sgat=2 low_dma=0
   cmd_q=1 f_packid=1 k_orphan=0 closed=0
     fin: id=3949312 blen=65536 dur=10ms sgat=2 op=0x28
     act: id=3949440 blen=65536 t_o/elap=60000/10ms sgat=2 op=0x28
     rb>> act: id=3949568 blen=65536 t_o/elap=60000/10ms sgat=2 op=0x28
     act: id=3949696 blen=65536 t_o/elap=60000/0ms sgat=2 op=0x28
Those items output above that are significant to user applications are described below.

Broadly speaking the above output shows everything is going fine. Four SCSI READ(10) commands (SCSI opcode 0x28) for different ids are underway. Three commands are active while one is finished with its status and data read() and the request structure is pending deletion. The "id" corresponds to the pack_id given in the sg_io_hdr structure (or the sg_header structure). In the case of sgp_dd the pack_id value is the block number being given to the SCSI READ (or WRITE). You will notice the 4 ids are 128 apart, which matches each command's 65536 byte buffer ("blen") holding 128 blocks of 512 bytes.

The ">>>" line shows the sg device name followed by the linux scsi adapter, channel, scsi id and lun numbers. The "em=" argument indicates whether the driver emulates a SCSI HBA. The ide-scsi driver would set "em=1". The "sg_tablesize" is the maximum number of scatter gather elements supported by the adapter driver. The "excl=0" indicates no sg open() on this device is currently using the O_EXCL flag.

The next two lines starting with "FD(1)" supply data about the first (and only in this case) open file descriptor on /dev/sg0. The default timeout is 60 seconds however this is only significant if the sg_header interface is being used since the sg_io_hdr interface explicitly sets the timeout on a per command basis. "bufflen=65536" is the reserved buffer size for this file descriptor. The "(res)sgat=2" indicates that this reserved buffer requires 2 scatter gather elements. The "low_dma" will be set to 1 for ISA HBAs indicating only the bottom 16 MB of RAM can be used for its kernel buffers. The "cmd_q=1" indicates command queuing is being allowed. The "f_packid=1" indicates the SG_SET_FORCE_PACK_ID mode is on. The "k_orphan" value is 1 in the rare cases when a SG_IO is interrupted while a SCSI command is "in flight". The "closed" value is 1 in the rare cases the file descriptor has been closed while a SCSI command is "in flight".

Each line indented with 5 spaces represents a SCSI command. The state of the command is either:

These states can be optionally prefixed by "rb>>" which means the reserved buffer is being used, "dio>>" which means this command is using direct IO, or "mmap>>" which means that mmap-ed IO is being used by this command. The "id" is the pack_id from this command's interface structure. The "blen" is the buffer length used by the data transfer associated with this command. For commands for which a response has been received "dur" shows the duration in milliseconds. For commands still "in flight" an indication of "t_o/elap=60000/10ms" means this command has a timeout of 60000 milliseconds of which 10 milliseconds has already elapsed. The "sgat=2" argument indicates that this command's "blen" requires 2 scatter gather elements. The "op" value is the hexadecimal value of the SCSI command being executed.

If sg has lots of activity then the "debug" output may span many lines and in some cases appear to be corrupted. This occurs because procfs requests fixed buffer sizes of information and, if there is more data to output, returns later to get the remainder. The problem with this strategy is that sg's internal state may have changed. Rather than double buffering, the sg driver just continues from the same offset. While procfs is very useful, ioctl()s (such as SG_GET_REQUEST_TABLE) still have their place.


Chapter 12. Asynchronous usage of sg

It is recommended that synchronous sg-based applications use the new SG_IO ioctl() command. Existing applications (which are mainly synchronous) can continue to use the older sg_header based interface which is still supported.

Asynchronous usage allows multiple SCSI commands to be queued up to the device. If the device supports command queuing then there can be a major performance gain. Even if the device doesn't support command queuing (or is temporarily busy) then queuing up commands in the mid level or the host driver can be a minor performance win (since there will be a lower latency to transmit the next command when the device becomes free).

Asynchronous usage usually starts with setting the O_NONBLOCK flag on open() [or thereafter by using the fcntl(fd, F_SETFL, old_flags | O_NONBLOCK) system call]. A similar effect can be obtained without using O_NONBLOCK when POSIX threads are used. There are several strategies that can then be followed:

  1. set O_NONBLOCK and use a poll() loop

  2. set O_NONBLOCK and use SIGPOLL signal to alert app when readable

  3. use POSIX threads and a single sg file descriptor

  4. use POSIX threads and multiple sg file descriptors to same device

The O_NONBLOCK flag also permits open(), write() and read() [but not the ioctl(SG_IO)] to access a SCSI device even though it has been marked offline. SCSI devices are marked offline when they are detected and don't respond to the initial SCSI commands as expected, or, some SCSI error condition is detected on that device and the mid level error recovery logic is unable to "resurrect" the device. A SCSI device that is being reset (and still settling) could be accessed during this period by using the O_NONBLOCK flag; this could lead to unexpected behaviour so the sg user should take care.

In Linux SIGIO and SIGPOLL are the same signal. If POSIX real time signals are used (e.g. when SA_SIGINFO is used with sigaction() and fcntl(fd, F_SETSIG, SIGRTMIN + <n>) ) then the file descriptor with which the signal is associated is available to the signal handler. The associated file descriptor is in the si_fd member of the siginfo_t structure. The poll() system call that is often used after a signal is received can thus be bypassed.
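
The first strategy listed above (O_NONBLOCK plus a poll() loop) might be sketched as follows. The helper names are hypothetical and the return conventions are choices made for this sketch. The helpers work on any file descriptor, so they can be tried without a SCSI device.

```c
#include <fcntl.h>
#include <poll.h>

/* Add O_NONBLOCK to an already open descriptor, keeping its other
 * flags. Returns 0 on success, -1 on failure (errno is set). */
int sg_set_nonblock(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags < 0)
        return -1;
    return (fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) ? -1 : 0;
}

/* Wait up to timeout_ms for a response to become readable.
 * Returns 1 if a read() can now fetch a response, 0 on timeout and
 * -1 on error or when the device has gone away (e.g. POLLHUP). */
int sg_wait_response(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    int res = poll(&pfd, 1, timeout_ms);

    if (res < 0)
        return -1;
    if (res == 0)
        return 0;
    if (pfd.revents & (POLLHUP | POLLERR | POLLNVAL))
        return -1;
    return (pfd.revents & POLLIN) ? 1 : 0;
}
```

A typical loop would write() several requests, then call sg_wait_response() and read() one response each time it returns 1, treating EAGAIN from read() as "not finished yet".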


Appendix A. Sg3_utils package

The sg3_utils package is a collection of programs that use the sg interface. The utilities can be categorized as follows:

The "dd" family of utilities take a sg device file name as input (i.e. if=<sg_dev_file_name>), as output or both. They can also take raw device file names [20] instead of sg device file names. One important difference from the standard dd command is that the value given to the block size (bs=) argument must be the exact block size of that device and not an integral multiple as allowed by dd. These "dd" variants are suitable for SCSI Direct Access Devices such as disks and CDROMs (but are not suitable for SCSI tape devices).

The sg3_utils package is designed to be used with the sg version 3 driver found in the lk 2.4 series. There is also a sg_utils package that supports a subset of these commands for the sg version 2 driver (with some support for the original sg driver) which is found in the lk 2.2 series (from and after lk 2.2.6). There are links to the most recent sg3_utils (and sg_utils) packages at the sg website at www.torque.net/sg. There are tarballs and both source and binary rpm packages. At the time of writing the latest sg3_utils tarball is at www.torque.net/sg/p/sg3_utils-0.97.tgz. There is a README file in that tarball that should be examined for up to date information. The more important utility commands (e.g. sg_dd) have "man" pages. [21]

Almost all of the sg device driver capabilities discussed in this document appear in code in one or more of these programs. For example the recently added mmap-ed IO can be found in sgm_dd, sg_read and sg_rbuf.

The sg3_utils package also provides some functions that may be useful for applications that use sg. The functions declared in sg_err.h and defined in sg_err.c categorize SCSI subsystem errors that are returned to an application in a read() or an ioctl(SG_IO). In the case of sense buffers, they are decoded into a text message (as per SCSI 2 definitions). There is also a function to do a 64 bit seek (llseek.h).


Appendix B. sg_header, the original sg control structure

Following is the original interface structure of the sg driver that dates back to 1991. Those field elements with a "[o]+" are added by the sg version 2 driver which was first placed in lk 2.2.6 in April 1999.
struct sg_header
{
    int pack_len;    /* [o] */
    int reply_len;   /* [i] */
    int pack_id;     /* [i->o] */
    int result;      /* [o] */
    unsigned int twelve_byte:1;     /* [i] */
    unsigned int target_status:5;   /* [o]+ */
    unsigned int host_status:8;     /* [o]+ */
    unsigned int driver_status:8;   /* [o]+ */
    unsigned int other_flags:10;    /* unused */
    unsigned char sense_buffer[SG_MAX_SENSE]; /* [o] */
};      /* This structure is 36 bytes long on i386 */
SCSI commands are sent via write() calls to an sg device name (e.g. /dev/sg0). The data written to write() is of the form <a_sg_header_obj + scsi_command [ + data_to_write]>. The "data_to_write" component is only needed for SCSI commands that transfer data towards the SCSI device. The corresponding read() to the sg device name will yield data of the form <a_sg_header_obj [ + data_to_read]>.
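
As a sketch of the <a_sg_header_obj + scsi_command> form, the following builds the bytes for a write() that sends a standard INQUIRY through the original interface. The function name is hypothetical. It assumes that 'reply_len' counts the sg_header as well as the data expected back, which is how the field is commented in sg.h; check the version 2 documentation before relying on it.

```c
#include <assert.h>
#include <string.h>
#include <scsi/sg.h>

#define INQ_CMD_LEN 6
#define INQ_REPLY_LEN 96

/* Build the bytes for write(): <sg_header + scsi_command>. INQUIRY
 * moves no data towards the device so there is no data_to_write part.
 * Returns the number of bytes to write(), or 0 if 'buf' is too small. */
size_t build_old_inquiry(unsigned char *buf, size_t buf_len, int pack_id)
{
    struct sg_header hdr;
    unsigned char cmd[INQ_CMD_LEN] = {0x12, 0, 0, 0, INQ_REPLY_LEN, 0};
    size_t need = sizeof(hdr) + sizeof(cmd);

    assert(buf != NULL);
    if (buf_len < need)
        return 0;
    memset(&hdr, 0, sizeof(hdr));
    /* Assumption: reply_len includes the header itself. */
    hdr.reply_len = (int)(sizeof(hdr) + INQ_REPLY_LEN);
    hdr.pack_id = pack_id;      /* [i->o]: comes back in the read() */
    hdr.twelve_byte = 0;        /* this is a 6 byte command */
    memcpy(buf, &hdr, sizeof(hdr));
    memcpy(buf + sizeof(hdr), cmd, sizeof(cmd));
    return need;
}
```

The matching read() would be given a buffer of at least sizeof(struct sg_header) + INQ_REPLY_LEN bytes, with the INQUIRY response starting just after the header.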

This interface is fully described in the www.torque.net/sg/p/scsi-generic.txt file which documents the sg version 2 driver.

Since many Linux applications use this interface, it is still supported in this version (i.e. version 3) of the driver. Only its most perverse idiosyncrasies have been modified and no problems have been reported for major older applications running atop this newer driver.


Appendix C. Programming example

This appendix contains an example program. It is an abridged version of sg_simple2.c found in the sg3_utils package. It sends a SCSI INQUIRY command to the nominated sg device and prints out some of the response or outputs error information. Hopefully showing the error processing does not cloud what is being illustrated.

#include <unistd.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <sys/ioctl.h>
#include <scsi/sg.h> /* take care: fetches glibc's /usr/include/scsi/sg.h */

/* This is a simple program executing a SCSI INQUIRY command using the
   sg_io_hdr interface of the SCSI generic (sg) driver.

*  Copyright (C) 2001 D. Gilbert
*  This program is free software.   Version 1.01 (20020226)
*/

#define INQ_REPLY_LEN 96
#define INQ_CMD_CODE 0x12
#define INQ_CMD_LEN 6

int main(int argc, char * argv[])
{
    int sg_fd, k;
    unsigned char inqCmdBlk[INQ_CMD_LEN] =
                    {INQ_CMD_CODE, 0, 0, 0, INQ_REPLY_LEN, 0};
/* This is a "standard" SCSI INQUIRY command. It is standard because the
 * CMDDT and EVPD bits (in the second byte) are zero. All SCSI targets
 * should respond promptly to a standard INQUIRY */
    unsigned char inqBuff[INQ_REPLY_LEN];
    unsigned char sense_buffer[32];
    sg_io_hdr_t io_hdr;

    if (2 != argc) {
        printf("Usage: 'sg_simple0 <sg_device>'\n");
        return 1;
    }
    if ((sg_fd = open(argv[1], O_RDONLY)) < 0) {
        /* Note that most SCSI commands require the O_RDWR flag to be set */
        perror("error opening given file name");
        return 1;
    }
    /* It is prudent to check we have a sg device by trying an ioctl */
    if ((ioctl(sg_fd, SG_GET_VERSION_NUM, &k) < 0) || (k < 30000)) {
        printf("%s is not an sg device, or old sg driver\n", argv[1]);
        return 1;
    }
    /* Prepare INQUIRY command */
    memset(&io_hdr, 0, sizeof(sg_io_hdr_t));
    io_hdr.interface_id = 'S';
    io_hdr.cmd_len = sizeof(inqCmdBlk);
    /* io_hdr.iovec_count = 0; */  /* memset takes care of this */
    io_hdr.mx_sb_len = sizeof(sense_buffer);
    io_hdr.dxfer_direction = SG_DXFER_FROM_DEV;
    io_hdr.dxfer_len = INQ_REPLY_LEN;
    io_hdr.dxferp = inqBuff;
    io_hdr.cmdp = inqCmdBlk;
    io_hdr.sbp = sense_buffer;
    io_hdr.timeout = 20000;     /* 20000 millisecs == 20 seconds */
    /* io_hdr.flags = 0; */     /* take defaults: indirect IO, etc */
    /* io_hdr.pack_id = 0; */
    /* io_hdr.usr_ptr = NULL; */

    if (ioctl(sg_fd, SG_IO, &io_hdr) < 0) {
        perror("sg_simple0: Inquiry SG_IO ioctl error");
        return 1;
    }

    /* now for the error processing */
    if ((io_hdr.info & SG_INFO_OK_MASK) != SG_INFO_OK) {
        if (io_hdr.sb_len_wr > 0) {
            printf("INQUIRY sense data: ");
            for (k = 0; k < io_hdr.sb_len_wr; ++k) {
                if ((k > 0) && (0 == (k % 10)))
                    printf("\n  ");
                printf("0x%02x ", sense_buffer[k]);
            }
            printf("\n");
        }
        if (io_hdr.masked_status)
            printf("INQUIRY SCSI status=0x%x\n", io_hdr.status);
        if (io_hdr.host_status)
            printf("INQUIRY host_status=0x%x\n", io_hdr.host_status);
        if (io_hdr.driver_status)
            printf("INQUIRY driver_status=0x%x\n", io_hdr.driver_status);
    }
    else {  /* assume INQUIRY response is present */
        char * p = (char *)inqBuff;
        printf("Some of the INQUIRY command's response:\n");
        printf("    %.8s  %.16s  %.4s\n", p + 8, p + 16, p + 32);
        printf("INQUIRY duration=%u millisecs, resid=%d\n",
               io_hdr.duration, io_hdr.resid);
    }
    close(sg_fd);
    return 0;
}

The sg_simple4.c program is an example of using mmap-ed IO in the sg3_utils package. An example of using direct IO can be found in sg_rbuf.c in the same package.


Appendix D. Debugging

There are various ways to debug what is happening with the sg driver. The information provided in the /proc/scsi/sg directory can be useful, especially the debug pseudo file. It outputs the state of the sg driver when it is called. Invoking it at the right time can be a challenge. One approach (used in SANE) is to call the system() library function like this:
    system("cat /proc/scsi/sg/debug");
at appropriate times within an application that is using the sg driver.

Another debugging technique is to trace all system calls a program makes with the strace command (see its "man" page). This command can also be used to obtain timing information (with the "-r" and "-t" options).

To debug the sg driver itself, the kernel needs to be built with CONFIG_SCSI_LOGGING selected. The sg driver will then send copious output to the log (normally /var/log/messages) and/or the console whenever it is invoked. This debug output is turned on by:
 $ echo "scsi log timeout 7" > /proc/scsi/scsi
As the number (i.e. 7) is reduced, less output is generated. To turn off this type of debugging use:
 $ echo "scsi log timeout 0" > /proc/scsi/scsi

If you want the system to log SCSI (CHECK_CONDITION related) errors that sg detects rather than process them within the application using sg then set ioctl(SG_SET_DEBUG) to a value greater than zero. Processing SCSI errors within the application using sg is my preference.
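For those who do want the logging, a minimal sketch of turning it on follows. The wrapper name sg_set_debug() is hypothetical; the sketch assumes that 'sg_fd' is an open sg file descriptor and that the ioctl takes a pointer to an int, as the other sg ioctls that set a value do.

```c
#include <sys/ioctl.h>
#include <scsi/sg.h>

/* Hypothetical wrapper: a 'level' greater than zero asks the sg driver
 * to log SCSI errors it detects; 0 turns this off again.
 * Returns 0 on success, -1 (with errno set) on failure. */
int sg_set_debug(int sg_fd, int level)
{
    return ioctl(sg_fd, SG_SET_DEBUG, &level);
}
```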


Appendix E. Other references

The primary site for SCSI information, standards (draft and emerging) and related resources is www.t10.org.

The most recent news on the sg driver can be found at: www.torque.net/sg .

Some notes on the sg v3 driver can be found at: www.torque.net/sg/s_packet.html . For some timings (and CPU utilizations) comparisons between direct and indirect IO see: www.torque.net/sg/rbuf_tbl.html

The Linux Documentation Project's SCSI-2.4-HOWTO may help to put this driver into perspective: linuxdoc.org/HOWTO/SCSI-2.4-HOWTO . The most recent version of that document can be found at www.torque.net/scsi/SCSI-2.4-HOWTO .

To understand the inner workings of device drivers there is a fine book called "Linux Device Drivers", second edition by Alessandro Rubini and Jonathan Corbet published by O'Reilly [ISBN 0-596-00008-1]. The authors and the publisher have unselfishly made this book available under the GNU Free Documentation License (version 1.1). It can be found in html at www.oreilly.com/catalog/linuxdrive2/chapter/book .

Notes

[1]

SCSI command opcode 0x7f does allow for variable length commands but that is not supported in Linux currently.

[2]

There is an sg version 3.0.19 which is an optional driver for the lk 2.2 series. It has the following limitations:

  • maximum size of SCSI commands is 12 bytes

  • sense buffer limited to 16 bytes

  • resid (residual data transfer count) is always 0

  • direct and mmap-ed IO not supported (defaults to indirect IO)

[3]

Patches exist for sg to extend the number of SCSI devices past the 256 limit when the device file system (devfs) is being used.

[4]

Linux kernels prior to 2.4.15 limited SCSI commands to a length of 12 bytes. In lk 2.4.15 this was raised to 16 bytes. However, unless lower level drivers (e.g. aic7xxx) indicate that they can handle 16 byte commands (and few currently do), the command is aborted with a DID_ABORT host status.

[5]

Some HBA - SCSI device combinations have difficulties with an odd valued dxfer_len . In some cases the operation succeeds but a DID_ERROR host status is returned. So unless there is a good reason, applications that want maximum portability should avoid an odd valued dxfer_len .
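One way to follow this advice is to round an odd length up to the next even value before placing it in dxfer_len. This is only a sketch; the helper name even_dxfer_len() is hypothetical, and the caller is assumed to have allocated a buffer large enough for the rounded length.

```c
/* Hypothetical helper: round an odd transfer length up to the next
 * even value; even lengths are returned unchanged. */
unsigned int even_dxfer_len(unsigned int len)
{
    return (len + 1U) & ~1U;
}
```

For example, a requested length of 35 bytes would become 36, while 36 stays 36.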

[6]

Whether aborting individual commands is supported or not is left to the adapter. Many adapters are unable to abort SCSI commands "in flight" because these details are handled in silicon by embedded processors in hardware. SCSI device or bus resets are required.

[7]

Some lower level drivers (e.g. ide-scsi) clear this status field even when a CHECK_CONDITION or COMMAND_TERMINATED status has occurred. However they do set DRIVER_SENSE in the driver_status field. Also a (sb_len_wr > 0) indicates there is a sense buffer.

[8]

Some lower level drivers (e.g. ide-scsi) clear this masked_status field even when a CHECK_CONDITION or COMMAND_TERMINATED status has occurred. However they do set DRIVER_SENSE in the driver_status field. Also a (sb_len_wr > 0) indicates there is a sense buffer.
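Given these quirks, an application may prefer to test all three indicators rather than rely on the status fields alone. The sketch below shows one way to do that. The function name and the EX_ constants are hypothetical; the numeric values are assumptions that mirror the masked status codes and the DRIVER_SENSE value used by the lk 2.4 SCSI subsystem.

```c
#include <scsi/sg.h>

/* Assumed values, defined locally for illustration only */
#define EX_MASKED_CHECK_CONDITION    0x01
#define EX_MASKED_COMMAND_TERMINATED 0x11
#define EX_DRIVER_SENSE              0x08
#define EX_DRIVER_MASK               0x0f   /* low nibble: driver status */

/* Hypothetical helper: returns 1 if any indicator suggests that a
 * CHECK_CONDITION (or COMMAND_TERMINATED) occurred, else 0. */
int sg_check_condition_seen(const sg_io_hdr_t *hp)
{
    if (hp->sb_len_wr > 0)
        return 1;       /* a sense buffer was written */
    if ((hp->driver_status & EX_DRIVER_MASK) == EX_DRIVER_SENSE)
        return 1;       /* driver reports sense data */
    return (hp->masked_status == EX_MASKED_CHECK_CONDITION) ||
           (hp->masked_status == EX_MASKED_COMMAND_TERMINATED);
}
```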

[9]

In some cases the sym53c8xx driver reports a DID_ERROR when it internally rounds up an odd transfer length by 1. This is an example of a "non-error".

[10]

Unfortunately some adapter drivers report an incorrect number for 'resid'. This is due to some "fuzziness" in the internal interface definitions within the Linux scsi subsystem concerning the _exact_ number of bytes to be transferred. Therefore only applications tied to a specific adapter that is known to give the correct figure should use this feature. Hopefully this will be cleared up in the near future.

[11]

The command queuing capabilities of the SCSI device and the adapter driver should also be taken into account. To this end the sg_scsi_id::h_cmd_per_lun and sg_scsi_id::d_queue_depth values returned by ioctl(SG_GET_SCSI_ID) may be useful. Also some devices that indicate in their INQUIRY response that they can accept command queuing react badly when queuing is actually attempted.
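A minimal sketch of fetching those two values follows. The function name sg_print_queue_limits() is hypothetical, and 'sg_fd' is assumed to be an open sg file descriptor.

```c
#include <stdio.h>
#include <sys/ioctl.h>
#include <scsi/sg.h>

/* Hypothetical helper: print the queuing limits reported for the
 * device behind 'sg_fd'.
 * Returns 0 on success, -1 (with errno set) if the ioctl fails. */
int sg_print_queue_limits(int sg_fd)
{
    struct sg_scsi_id id;

    if (ioctl(sg_fd, SG_GET_SCSI_ID, &id) < 0)
        return -1;
    printf("h_cmd_per_lun=%d, d_queue_depth=%d\n",
           id.h_cmd_per_lun, id.d_queue_depth);
    return 0;
}
```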

[12]

There is a small probability it will spend some time waiting for a command block to become available. In this case the wait is interruptible. If O_NONBLOCK is active then this scenario will cause an EAGAIN.

[13]

The sg driver does record that the mmap() system call has been invoked at least once on a file descriptor. This is not sufficient because the given 'length' may be too short for the current IO. Also the driver is unaware of munmap() calls so it could easily be tricked.

[14]

If ioctl(SG_SET_KEEP_ORPHAN) is set to 1 and an ioctl(SG_IO) operation is interrupted (e.g. by control-C from the user) then, when the response arrives, the "num_waiting" will be incremented to indicate that a read() can now pick up the response.

[15]

Here is the mapping from the SCSI opcode "group" (top 3 bits of opcode) to the assumed length (in lk 2.4.15):
unsigned char scsi_command_size[8] =
{
        6, 10, 10, 12,
        16, 12, 10, 10
};
The assumed length of group 4 commands changed from 12 to 16 in lk 2.4.15 reflecting support for 16 byte SCSI commands being added to the kernel.
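The lookup can be sketched in user space as follows. The table is copied from above; the helper name assumed_cdb_len() is hypothetical and only illustrates how the opcode group selects an entry.

```c
/* Table as shown above (lk 2.4.15 values) */
static const unsigned char scsi_command_size[8] =
{
        6, 10, 10, 12,
        16, 12, 10, 10
};

/* Hypothetical helper: the top 3 bits of the opcode select the group,
 * and the group selects the assumed command length in bytes. */
int assumed_cdb_len(unsigned char opcode)
{
    return scsi_command_size[(opcode >> 5) & 7];
}
```

For example, INQUIRY (opcode 0x12) is in group 0 and yields 6, while opcode 0x88 is in group 4 and yields 16.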

[16]

Older SCSI adapters and some pseudo adapter drivers don't have DMA capability in which case the CPU is used to copy the data.

[17]

Unfortunately that setup time is large enough in some versions of the lk 2.4 series to adversely impact direct IO performance. Also memory malloc()-ed in the user space tends to be made up of discontinuous pages seen from the SCSI adapter. This requires the sg driver to build heavily splintered scatter gather lists which is less than desirable. This limits the maximum transfer size to [(max_scsi_adapter_scatter_gather_elements - 1) * PAGE_SIZE]. [This is a _different_ scatter gather mechanism to that which the user sees in the sg interface based on iovec.]
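The limit in the formula above can be worked through with a small sketch. The function name max_direct_io_bytes() is hypothetical. The number of scatter gather elements is assumed to come from ioctl(SG_GET_SG_TABLESIZE) and the page size from sysconf(_SC_PAGESIZE).

```c
/* Hypothetical helper: worst case direct IO transfer limit in bytes,
 * i.e. (max_scsi_adapter_scatter_gather_elements - 1) * PAGE_SIZE.
 * Returns 0 if fewer than one element is reported. */
long max_direct_io_bytes(int sg_tablesize, long page_size)
{
    if (sg_tablesize < 1)
        return 0;
    return (long)(sg_tablesize - 1) * page_size;
}
```

With 128 scatter gather elements and 4096 byte pages this gives 127 * 4096 = 520192 bytes.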

[18]

When a write() or ioctl(SG_IO) attempts mmap-ed IO there is no check performed that a prior mmap() system call has been performed. If no mmap() has been issued then random data is written to the device or data read from the device is inaccessible. Also once mmap() has been called on a file descriptor then all subsequent calls to ioctl(SG_SET_RESERVED_SIZE) will yield EBUSY.
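The ordering constraint suggests sizing the reserved buffer first and calling mmap() only afterwards. The sketch below follows that order. The function name sg_map_reserved() is hypothetical, and 'sg_fd' is assumed to be an sg file descriptor opened for reading and writing.

```c
#include <stddef.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <scsi/sg.h>

/* Hypothetical helper: request a reserved buffer of 'want_bytes',
 * read back the size actually granted into '*got_bytes', then map it.
 * Returns the mapped address, or NULL if any step fails. */
void *sg_map_reserved(int sg_fd, int want_bytes, int *got_bytes)
{
    void *p;

    /* must come before mmap(); afterwards this ioctl yields EBUSY */
    if (ioctl(sg_fd, SG_SET_RESERVED_SIZE, &want_bytes) < 0)
        return NULL;
    if (ioctl(sg_fd, SG_GET_RESERVED_SIZE, got_bytes) < 0)
        return NULL;
    if (*got_bytes <= 0)
        return NULL;
    p = mmap(NULL, (size_t)*got_bytes, PROT_READ | PROT_WRITE,
             MAP_SHARED, sg_fd, 0);
    return (MAP_FAILED == p) ? NULL : p;
}
```

The granted size may be smaller than the requested one, so the caller should compare '*got_bytes' against the dxfer_len it intends to use.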

[19]

One strange quirk is that the /proc/scsi/sg directory will not appear if there are no SCSI devices (or pseudo devices such as USB mass storage) attached to the system. The reason for this is that in the absence of SCSI devices, the SCSI mid level does not initialize the sg driver (even if it has been loaded as a module). When the sg driver is a module and "rmmod sg" is successfully executed then the /proc/scsi/sg directory and its contents are removed.

[20]

Raw device names are of the form /dev/raw/raw<n> and can be bound to block devices (e.g. an IDE disk partition such as /dev/hda3). The binding is done with the raw command (see "man raw").

[21]

Although the author wrote most of these programs, initially to test facilities within the sg driver, some have been contributed by others. See www.torque.net/sg/u_index.html for more information.