


 

I/O Subsystem

Most user-level I/O is done through the file subsystem. The I/O subsystem handles actual interaction with the device.

 

Device I/O

One of the virtues of UNIX operating systems is that I/O requests to devices look very much like I/O requests to regular files. Each device on the system is represented by one or more special device files that are addressed like any other file. The operating system uses the information in the special device file to locate the appropriate device driver, which is coded to handle the device-specific requests.

For a simple example of how this works, consider the banner(1) command that makes a "poster" out of the character string provided with the command. By default the output goes to stdout, or the terminal from which the banner command is issued. So, if you type banner hello at your terminal, "HELLO" prints out on your terminal in large letters. If you type banner hello > file, a file is created and "HELLO" is written into it. If you type banner hello > /dev/tty01, the string "HELLO" prints on the terminal tty01 (assuming that the special device file for the terminal has privileges that allow you to write to it).
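
The same idea can be sketched in a few lines of Python (used here purely for illustration; it is not part of the REAL/IX documentation). The helper name write_text is hypothetical. The point is that one code path serves both a regular file and a special device file.

```python
import os
import tempfile


def write_text(path: str, text: str) -> int:
    """Write text to path with ordinary file calls; return bytes written.

    The caller does not need to know whether path names a regular file
    or a special device file such as /dev/null or a terminal.
    """
    data = text.encode("ascii")
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        return os.write(fd, data)
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        regular = os.path.join(tmp, "file")
        print(write_text(regular, "HELLO\n"))   # regular file
    print(write_text(os.devnull, "HELLO\n"))    # character special file
```

Writing to a terminal device such as /dev/tty01 would use the same call, subject to the permissions on that special device file.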

 

Special Device Files

All device I/O is directed to a special device file in the /dev directory. Special device files are created by the superuser with the mknod(1M) command. Each special device file is either "block" type or "character" type, identified by a "b" or a "c" as the first character of the permissions field.

  • Block drivers, which transfer data between user address space and the device through the system buffer cache, are normally written for disk drives and other mass storage devices capable of handling data in blocks.
  • Character drivers, the typical choice for interactive terminals and process control devices, are normally written for devices that send and receive information one character at a time.

One device often has more than one special device file associated with it. For example, a disk device has a block special device file that is used for normal user I/O operations, as well as a character special device file that is used for administrative operations such as backups, file system checking and repair, and disk formatting. As another example, one device may have more than one name by which it is accessed, such as the console, which is also accessed as contty and systty. In this case, the three names are represented by three special device files that are linked together.

Special device files store the major and minor device number in the inode field that holds file size information for regular files. The major device number is an index used by the operating system to identify the driver that controls this device; the minor number is used by the driver to identify which particular device or subdevice controlled by the driver is being accessed.
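
As a rough illustration, the Python sketch below reads the type and the major and minor numbers of a special device file on a generic UNIX-like system. The function name describe_device is hypothetical, and the numbers printed depend on the system where it runs; nothing here is specific to REAL/IX.

```python
import os
import stat
from typing import Tuple


def describe_device(path: str) -> Tuple[str, int, int]:
    """Return (type, major, minor) for a special device file.

    type is "b" for block special and "c" for character special files,
    matching the first character of the permissions field.
    """
    st = os.stat(path)
    if stat.S_ISBLK(st.st_mode):
        kind = "b"
    elif stat.S_ISCHR(st.st_mode):
        kind = "c"
    else:
        raise ValueError(f"{path} is not a special device file")
    # st_rdev holds the combined device number; os.major/os.minor split it.
    return kind, os.major(st.st_rdev), os.minor(st.st_rdev)


if __name__ == "__main__":
    print(describe_device("/dev/null"))
```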

 

Device Drivers

Device drivers are kernel-level routines that isolate low-level, device-specific details from system calls. Because there are so many details for each device, it is impractical to design the kernel to handle all possible devices. Instead, a device driver is included for each configured device.

The main outline of any driver is a set of entry-point routines that are named in a specific way. Each driver is assigned a two- to four-character prefix, which is combined with prescribed suffixes (such as open(2), close(2), read(2), write(2), and so forth) to form the names of the driver entry point routines. So, if the driver's prefix is xxxx, the driver might have entry-point routines named xxxxopen, xxxxclose, xxxxread, xxxxwrite, and so forth. These routines may call subordinate routines, for which the naming conventions are less strict.

The driver routines are accessed through two device switch tables: cdevsw for character special device files, and bdevsw for block special device files. These tables are built by sysgen(1M), and are basically matrices with a row for each driver and a column for each type of entry point routine. The tables are indexed by the major device number. So, if the user-level program issues a read system call against a character device with major number 53, the operating system looks in the cdevsw table for major number 53, and finds that the appropriate routine is xxxxread.
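
The lookup described above can be modeled as a table of routines indexed by major number. The following Python sketch is a toy model only: the real cdevsw is a kernel table built by sysgen, and the routine bodies and the dispatch helper here are invented for illustration.

```python
from typing import Callable, Dict


# Hypothetical entry-point routines for a driver whose prefix is "xxxx".
def xxxxopen(minor: int) -> str:
    return f"xxxxopen(minor={minor})"


def xxxxclose(minor: int) -> str:
    return f"xxxxclose(minor={minor})"


def xxxxread(minor: int) -> str:
    return f"xxxxread(minor={minor})"


def xxxxwrite(minor: int) -> str:
    return f"xxxxwrite(minor={minor})"


# One row per driver, one column per entry point, indexed by major number.
cdevsw: Dict[int, Dict[str, Callable[[int], str]]] = {
    53: {"open": xxxxopen, "close": xxxxclose,
         "read": xxxxread, "write": xxxxwrite},
}


def dispatch(table: Dict[int, Dict[str, Callable[[int], str]]],
             major: int, entry: str, minor: int) -> str:
    """Find the driver row by major number and call the named entry point."""
    try:
        row = table[major]
    except KeyError:
        raise LookupError(f"no driver configured for major number {major}") from None
    return row[entry](minor)
```

A read against major number 53, minor number 0 would then be dispatch(cdevsw, 53, "read", 0), which reaches xxxxread.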

The switch tables are used to access the driver's base-level routines, which execute synchronously in response to user-level program requests. Most hardware device drivers also have an interrupt level, which executes in response to interrupts generated by the hardware device itself. Interrupts are the primary mechanism through which devices communicate with the operating system. They may signal successful device connections, write acknowledgments, data availability, or read/write completion. For some non-traditional computer devices such as process monitoring and control devices, they may signal that some abnormal condition (such as a device out of calibration or temperature out of range) has occurred.

Interrupt routines are accessed through an interrupt vector table created by sysgen. The interrupt vector table associates the interrupt received with the appropriate interrupt handling routine using the driver prefix and a specific name for the interrupt handling routine much as the switch tables described above do.

 

Direct I/O

Direct I/O allows a user program to control devices directly, thus saving the system overhead associated with other I/O (whether synchronous or asynchronous) such as system call entry and exit. The user process maps the device registers into its address space with the shmget(2) and shmat (documented on the shmop(2) manual page) system calls. Once mapped in, the process can read and write these registers like any other variable, thus gaining complete control over the device operations.
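
Python's standard library has no shmget or shmat, so the sketch below substitutes an anonymous mmap region for the mapped device registers. The register layout (CONTROL at offset 0, STATUS at offset 4) and the helper names are invented for illustration. It shows only the access pattern: once the region is mapped, registers are read and written as ordinary memory, with no system call per access.

```python
import mmap
import struct

CONTROL = 0   # hypothetical 32-bit control register offset
STATUS = 4    # hypothetical 32-bit status register offset


def map_registers(size: int = 16) -> mmap.mmap:
    """Stand-in for mapping device registers into the process."""
    return mmap.mmap(-1, size)  # anonymous, zero-filled memory


def write_reg(regs: mmap.mmap, offset: int, value: int) -> None:
    """Store a 32-bit little-endian value at the given register offset."""
    struct.pack_into("<I", regs, offset, value)


def read_reg(regs: mmap.mmap, offset: int) -> int:
    """Load a 32-bit little-endian value from the given register offset."""
    return struct.unpack_from("<I", regs, offset)[0]
```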

 

Configuring Devices

Configuring a device into the operating system is the process of providing information about a driver and any associated devices and tunable parameters to the system. On the REAL/IX Operating System, this is done using the administrative sysgen(1M) command, which is an interactive menu-driven program that automatically updates the appropriate system files. Refer to the Driver Development Guide for information about using sysgen to configure a driver and associated devices.

 

Priority Disk I/O Queuing

The traditional scheme for scheduling disk I/O provides fair-share accesses to all processes, but can result in disk "bottlenecks" that would subvert the benefits of the realtime scheduler. Ensuring high disk I/O throughput for high-priority processes is essential for providing deterministic response times for critical realtime programs.

The REAL/IX Operating System supports priority queuing of disk I/O associated with processes executing at realtime priorities. The priority of the process determines the priority of disk I/O operations such as:

  • asynchronous I/O operations (aread(2), awrite(2)).
  • synchronous read operations (open(2), read(2), and page faults).
  • synchronous write operations (operations to files that were opened with the O_SYNC flag set).

"Write-behind" operations (such as when the bdflush daemon executes) are not prioritized because they could interfere with I/O operations for critical realtime processes. Disk I/O operations for processes executing at time-share priorities are scheduled according to the traditional algorithm.

No system calls are associated with this feature. Realtime processes that want to change the priority of their disk I/O operations should change the priority of the process before initiating the I/O operation.

Processes that depend on predictable file access measurements should use the asynchronous I/O facility and should execute at a priority high enough so as not to compete with disk I/O operations from other realtime processes.
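
A toy model may help picture the queuing policy. The Python sketch below makes several assumptions that the text does not state: a larger number means a higher realtime priority, realtime requests are always taken before time-share requests, and the "traditional algorithm" is reduced to simple first-in, first-out order. The class and method names are hypothetical.

```python
import heapq
import itertools
from collections import deque
from typing import Optional


class DiskQueue:
    """Toy model: realtime requests by priority, time-share requests FIFO."""

    def __init__(self) -> None:
        self._realtime = []             # heap of (-priority, seq, request)
        self._timeshare = deque()
        self._seq = itertools.count()   # keeps arrival order within a priority

    def submit(self, request: str,
               realtime_priority: Optional[int] = None) -> None:
        """Queue a request; realtime_priority=None means time-share."""
        if realtime_priority is None:
            self._timeshare.append(request)
        else:
            heapq.heappush(self._realtime,
                           (-realtime_priority, next(self._seq), request))

    def next_request(self) -> Optional[str]:
        """Return the next request to issue, or None if the queue is empty."""
        if self._realtime:
            return heapq.heappop(self._realtime)[2]
        if self._timeshare:
            return self._timeshare.popleft()
        return None
```

Because the priority is captured when the request is submitted, the model also reflects the advice above: a process changes its priority first, then initiates the I/O.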

 

I/O Transfer Methods

All user I/O is done against data in buffers in user address space. This information is moved to the device itself using one of three schemes:

 

  1. lock the data in user space and transfer directly to the device.
  2. do an intermediate transfer to local driver data space in the kernel.
  3. do an intermediate transfer using a kernel buffering scheme, either the system buffer cache or a private buffering scheme defined just for the application.

Figure 1 illustrates these three schemes.


Figure 1 - Three Methods of I/O Transfer

 

Method #2 is useful for occasional transfers of small amounts of data. Method #1 is the fastest data transfer method, but you should only use it with devices that have adequate data storage on the controller and allow a restart after an error (such as network, printer, and some robotics devices). You must lock the data in user address space before beginning the I/O transfer to ensure that it is not paged out.

Most I/O operations for data files use Method #3 with the system buffer cache. Each buffer consists of two distinct parts: the buffer itself, which is a region of kernel memory allocated for data storage, and the buffer header that provides control information for the operations, such as the number of characters to transfer, whether this is a read or write operation, and status information. Tunable parameters allow you to reconfigure the number of buffers and to specify storage sizes ranging from 1K to 128K bytes.

 

Initiating I/O Operations

Most I/O operations are controlled by paired structures: a user-level control block and a kernel-level control block. The user-level process populates the user-level control block, and the operating system transfers the appropriate information to the parallel kernel-level structure, then later transfers new information from the kernel-level structure back to the user-level structure. The control blocks are paired because the interrupt level of the kernel, which must update the control structure when the I/O transfer completes (or fails because of a device error), can never access a data structure that is also accessed by user-level processes.

There are two types of I/O operations on the REAL/IX Operating System: synchronous (or blocking) and asynchronous (or non-blocking). Synchronous I/O operations are all that is supported on many UNIX operating systems: after the process issues the I/O request, it blocks and does nothing more until the I/O operation completes. The asynchronous I/O interface is provided to support realtime applications (such as a journalizing program) that can (and should) continue execution while waiting for the previous I/O request to complete.

The I/O request's interaction with file descriptors at the user level is discussed in Chapter 5. The following sections describe how the I/O transfers to the device itself are performed.

 

Synchronous (Blocking) I/O Operations

Synchronous I/O transfers are requested with the read(2) and write(2) system calls. These in turn activate the appropriate driver entry point routines: read and write for character devices, strategy for block devices.

For I/O operations that use the system buffer cache, the information in the user-level buffer is transferred to a parallel buffer in the kernel-level buffer cache. For other operations, the system may transfer the information into some area of kernel memory, or lock the user-level address space that contains the data. Regardless of whether the data itself is copied into kernel address space, the control block information is usually transferred into a kernel structure.

The driver routine then validates the request and handles the device-specific details required to transfer the data. After initiating the I/O transfer, the process blocks to await completion of the transfer. The device signals that the I/O operation has completed; the driver's interrupt handler catches this signal and unblocks the driver routine that initiated the I/O request, which in turn unblocks the associated user-level process as described below.

For data transfers that use the system buffer cache, used buffers are returned to the system's list of free buffers after the I/O transfer is completed. These buffers are reused according to a least-recently-used algorithm.
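
Least-recently-used reuse can be sketched with an ordered dictionary. This Python model is a simplification with hypothetical names: it keeps only block data, omits the buffer header, and does not claim to mirror the kernel's actual free-list handling.

```python
from collections import OrderedDict
from typing import Optional


class BufferCache:
    """Toy buffer cache: block number -> data, reused least recently used first."""

    def __init__(self, nbuf: int) -> None:
        self.nbuf = nbuf
        self._bufs: "OrderedDict[int, bytes]" = OrderedDict()

    def get(self, blkno: int) -> Optional[bytes]:
        """Return cached data for a block, or None on a miss."""
        if blkno not in self._bufs:
            return None
        self._bufs.move_to_end(blkno)   # mark as most recently used
        return self._bufs[blkno]

    def put(self, blkno: int, data: bytes) -> Optional[int]:
        """Store a block; return the block number evicted, if any."""
        evicted = None
        if blkno not in self._bufs and len(self._bufs) >= self.nbuf:
            evicted, _ = self._bufs.popitem(last=False)  # least recently used
        self._bufs[blkno] = data
        self._bufs.move_to_end(blkno)
        return evicted
```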

 

Multiple Block Transfers

The REAL/IX Operating System provides an optional I/O feature: multiple block I/O operations to mounted file systems on SCSI disk devices. This feature allows using single I/O operations to move multiple disk blocks, giving a considerable reduction in system load. The multi-block technique is likely to provide improved performance over single-block I/O.

The behavior of multi-block I/O is controlled by sysgen tunable parameters. The defaults are usually suitable for most applications, but for ultimate performance they may need changing. Refer to the System Administrator's Guide for information about sysgen tunable parameters.

Multi-block I/O is switchable on a per-device basis. Block drivers written to run on the REAL/IX Operating System can include the mbstrategy driver entry point routine in addition to strategy to provide this functionality. For various technical reasons, multi-block I/O is available only for disk devices.
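
The text does not describe how blocks are grouped into one transfer. As one plausible illustration only, the Python sketch below merges runs of contiguous block numbers into (start, count) requests, with max_blocks standing in for a tunable limit. The function name and the grouping rule are assumptions, not the REAL/IX algorithm.

```python
from typing import List, Tuple


def coalesce(blocks: List[int], max_blocks: int) -> List[Tuple[int, int]]:
    """Group block numbers into (start, count) runs of contiguous blocks.

    max_blocks caps the length of one transfer (a stand-in for a tunable).
    """
    if max_blocks < 1:
        raise ValueError("max_blocks must be at least 1")
    runs: List[Tuple[int, int]] = []
    for blk in sorted(set(blocks)):
        if runs:
            start, count = runs[-1]
            # Extend the current run if this block follows it directly.
            if blk == start + count and count < max_blocks:
                runs[-1] = (start, count + 1)
                continue
        runs.append((blk, 1))
    return runs
```

With a limit of three blocks per transfer, six single-block requests for blocks 3 through 7 and block 20 collapse into three transfers.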

 

Asynchronous (Non-Blocking) I/O Operations

Asynchronous I/O operations are initiated with the aread(2) and awrite(2) system calls, which in turn call the aio driver entry point routine. Emulated asynchronous I/O is performed directly or by daemons created by the fcntl(2) command options F_SETAIOEMUL and F_AIOTWODAEM. Asynchronous I/O operations may or may not use the buffer header associated with the system buffer cache, but always use a pair of control block structures: aiocb is the user-level control block, areq is the kernel-level control block.

When these structures are validated and populated with information on the data for transfer, the operating system begins the I/O transfer. The process does not block, and may indeed initiate other asynchronous I/O operations before the first one completes.

When the I/O transfer completes, the driver's interrupt handler catches the device's completion interrupt and issues a function call that updates the kernel-level control structure. The operating system then transfers this completion information to the user-level structure, which effectively notifies the associated user-level process.

Each active asynchronous I/O transfer has its own pair of control blocks, and many pairs may be associated with one process. In general, the interface is optimized for applications where a process repeatedly does similar transfers and can reuse control structures. Once used, the control block structures are returned to the process's pool of available structures, but they are not returned to the system-wide pool of control structures until the process exits or explicitly frees them.
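
The paired control blocks can be pictured with a small simulation. In this Python sketch a thread stands in for the device and the interrupt level, and the class fields are invented; the real aiocb and areq structures have members this text does not list. The function is named aread only to echo the system call and is not an interface to it.

```python
import threading
from dataclasses import dataclass
from typing import Optional


@dataclass
class Aiocb:
    """User-level control block (fields are illustrative)."""
    nbytes: int
    done: bool = False
    result: Optional[bytes] = None


@dataclass
class Areq:
    """Kernel-level control block paired with one Aiocb."""
    nbytes: int
    result: Optional[bytes] = None


def aread(source: bytes, cb: Aiocb) -> threading.Thread:
    """Start a simulated transfer and return without blocking the caller."""
    req = Areq(nbytes=cb.nbytes)             # request copied to the kernel block

    def transfer() -> None:
        req.result = source[: req.nbytes]    # "interrupt level" updates kernel block
        cb.result = req.result               # completion info copied back out
        cb.done = True                       # set last: tells the user it is complete

    worker = threading.Thread(target=transfer)
    worker.start()
    return worker
```

The caller can start several transfers, each with its own Aiocb, continue other work, and check each block's done member later.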

 

Device Interrupts

An interrupt is any service request that causes the CPU to stop its current execution stream and to execute an instruction stream that services the interrupt. When the CPU finishes servicing the interrupt, it returns to the original stream and resumes execution at the point it left off.

Hardware devices use interrupt requests to signal a range of conditions. For traditional computer peripherals, interrupts signal successful device connections, write acknowledgments, data availability, and read/write completion. For other devices such as process control devices, interrupts tell the CPU that something has happened on the device. Driver interrupt routines are responsible for determining the cause of the hardware interrupt and executing the instructions to respond to the interrupt appropriately. The ability to handle such interrupts in a predictable time frame is a key element of a realtime operating system environment.

To further enhance the realtime environment, the multiprocessor REAL/IX Operating System allows targeted interrupts. A targeted interrupt is one that is assigned to a particular CPU (through the intrctl(2) system call). A typical use is to direct an interrupt or group of interrupts away from a CPU so that the CPU can be dedicated to a realtime process (through the targetcpu(2) system call) and its associated interrupts.

In addition to hardware interrupts, the operating system handles two other types of interrupts:

  • Exceptions, which are conditions that interrupt the current processing of the CPU and require special fault handler processing for recovery. Examples of exceptions include floating point divide-by-zero operations, bus errors, and illegal instructions. Fault handlers are responsible for executing instructions to handle the specific fault, and for restarting the interrupted instruction sequence once the fault is handled.
  • Software interrupts (Programmed Interrupt Requests or PIRs), which are generated by writing an integer into a logical register address assigned to the interrupt vector table. These are used by the operating system and are not available for general use.

The REAL/IX computer systems use microprocessor families capable of accepting multiple levels of interrupts. The level indicates the degree of priority given the interrupt by the CPU. The higher the priority, the quicker the system will service the interrupt when multiple interrupts are pending. The Interrupt Priority Level (IPL) for the requesting device is determined by the device itself.
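
The effect of priority levels on pending interrupts can be shown with a short Python sketch. It assumes that a larger IPL number means a higher priority and that interrupts at the same level are serviced in arrival order; neither detail is stated in the text, and the function name is hypothetical.

```python
import heapq
from typing import List, Tuple


def service_order(pending: List[Tuple[int, str]]) -> List[str]:
    """Return device names in the order their interrupts are serviced.

    pending holds (ipl, device) pairs in arrival order; a larger ipl is
    treated as higher priority, and ties keep their arrival order.
    """
    heap = [(-ipl, seq, dev) for seq, (ipl, dev) in enumerate(pending)]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]
```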

 

Interrupt Vectors

An interrupt vector is an entry in the interrupt vector table that is assigned to an interrupt by the sysgen(1M) process. The interrupt vector table resides in kernel space in main memory and associates interrupts with their appropriate interrupt routines. Every hardware device has at least one interrupt vector table entry. Each entry is assigned an interrupt vector number that associates the interrupt with the text address identifying the starting address of the interrupt handler for that interrupt. When an interrupt occurs, the CPU associates the interrupt with its interrupt vector number, fetches the starting address of the interrupt handler, and executes the handler at that address to service the interrupt.

Each device has at least one interrupt vector assigned to it. sysgen creates an internal interrupt vector table and stores it in the file the system uses to identify which interrupt vector is associated with which device.
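
As a sketch of how such a table might be assembled, the Python code below builds a vector table from (vector number, driver prefix) pairs. It assumes the handler name is the driver prefix followed by "intr", a common UNIX driver convention that this text does not confirm for REAL/IX. The function names and vector numbers are hypothetical.

```python
from typing import Callable, Dict, Iterable, Tuple

Handler = Callable[[], str]


def build_vector_table(
    config: Iterable[Tuple[int, str]],
    routines: Dict[str, Handler],
) -> Dict[int, Handler]:
    """Map vector numbers to handlers named <prefix>intr.

    config lists (vector number, driver prefix) pairs, as a configuration
    tool might record them. One driver may own several vectors.
    """
    table: Dict[int, Handler] = {}
    for vector, prefix in config:
        if vector in table:
            raise ValueError(f"vector {vector} assigned twice")
        table[vector] = routines[prefix + "intr"]
    return table


def take_interrupt(table: Dict[int, Handler], vector: int) -> str:
    """Look up the handler for a vector and run it."""
    return table[vector]()
```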

 

Traditional Interrupt Handling

Interrupts are always asynchronous events; in other words, they can arrive in the driver at any time. If an interrupt occurs that is at a higher priority than whatever is executing at the time, control switches to the interrupt handler to service the interrupt, then returns to whatever was executing before. However, user-level programs are usually notified of the results of the interrupt synchronously, through return codes that the driver writes to the user structure.

When a driver's interrupt routine receives an interrupt, it determines what type of interrupt it is and then handles it, which typically involves one or more of the following:

  1. issuing a function call (vsema or wakeup) to unblock the appropriate base-level routine of the driver, which in turn resumes execution. The base-level routine will eventually issue a return code that notifies the user-level process that the I/O is complete.
  2. sending a signal or posting an event to the appropriate user process(es) associated with this device.
  3. updating the u_error member of the user structure to notify the user-level process that an error occurred.
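
The first item in the list above, where the interrupt routine unblocks a sleeping base-level routine, can be modeled with a threading.Event. This Python sketch is an analogy only: Event.wait and Event.set stand in for the kernel's sleep and wakeup (or semaphore) primitives, and the class and method names are hypothetical.

```python
import threading
from typing import Optional


class ToyDriver:
    """Base level blocks on an Event; the interrupt routine sets it."""

    def __init__(self) -> None:
        self._done = threading.Event()
        self._data: Optional[bytes] = None

    def read(self, timeout: float = 5.0) -> Optional[bytes]:
        """Base-level routine: start the transfer, then sleep until woken."""
        # A real driver would program the device here.
        if not self._done.wait(timeout):
            raise TimeoutError("no completion interrupt")
        self._done.clear()              # ready for the next request
        return self._data

    def intr(self, data: bytes) -> None:
        """Interrupt routine: record the result and wake the sleeper."""
        self._data = data
        self._done.set()
```

In a test, a timer thread can play the part of the device by calling intr a short time after read has blocked.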

These conventions for handling interrupts are used for most traditional computer peripherals, where most interrupts indicate that some hardware action requested by the base level of the driver has completed. Many devices associated with realtime applications require other sorts of interrupt handling. The REAL/IX Operating System features described in the next two sections provide this functionality.

 

Connected Interrupts

Connected interrupts provide a consistent interrupt notification interface for hardware interrupts. Realtime applications that benefit from connected interrupts are characterized by having devices that generate interrupts that are not directly associated with an I/O operation and require rapid service, even if some other processes are delayed. When the interrupt handling routine receives an interrupt, it updates data structures; the user-level process is notified of the interrupt either through the common event notification facility or by polling a process memory location. This provides the deterministic interrupt notification required by realtime applications.

Using the connected interrupt mechanism requires special coding in both the device driver and the associated user-level program, although most control of the facility is in the user-level program. The following scenario summarizes how the mechanism works:

  1. The user-level program populates a cintrio structure, which specifies whether notification is by polling, through the common event notification mechanism, or through cisema(2); whether notification is to occur for every interrupt or whether each interrupt requires acknowledgment before the user-level program is notified of subsequent interrupts from that device; and, if the notification method is polling, the memory location polled. The cintrio structure also includes one member whose use you may tailor to the specific needs of the driver and application.
  2. When the user-level program issues the CI_CONNECT ioctl(2) command, the operating system creates a cintr data structure in the kernel where it is accessible by the connected interrupt interface functions.
  3. When the driver's interrupt routine receives an interrupt, it uses the cintrnotify(D3X) kernel function to notify the associated user-level process.
  4. The operating system updates the cintrio structure appropriately, effectively notifying the associated user-level process of the interrupt. This is illustrated in Figure 2.


Figure 2 - Connected Interrupt Notification

 

User-level programs use a set of IOCTL commands (documented on the cintrio(7) manual page) to control connected interrupts. Drivers implement the connected interrupt mechanism with the cintrget(D3X), cintrnotify(D3X), cintrctl(D3X), and cintrelse(D3X) kernel functions.
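
The polling form of notification, with and without acknowledgment, can be modeled in a few lines of Python. Everything in this sketch is an assumption made for illustration: the class, its fields, and the choice to merely count interrupts that arrive while an acknowledgment is outstanding. It does not reproduce the cintrio structure or the cintrnotify(D3X) function.

```python
from dataclasses import dataclass


@dataclass
class ToyCintr:
    """Illustrative stand-in for the shared notification state."""
    require_ack: bool = False    # if True, hold notifications until acknowledged
    poll_word: int = 0           # memory location the user program polls
    awaiting_ack: bool = False
    held: int = 0                # interrupts seen while waiting for an ack

    def notify(self) -> bool:
        """Called from the driver's interrupt routine.

        Returns True if the user-level program was notified.
        """
        if self.require_ack and self.awaiting_ack:
            self.held += 1
            return False
        self.poll_word += 1
        self.awaiting_ack = self.require_ack
        return True

    def acknowledge(self) -> None:
        """Called by the user-level program after it handles an interrupt."""
        self.awaiting_ack = False
```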

Using direct I/O with the connected interrupt mechanism makes it possible to implement a device driver as a user-level process. This is useful for applications where tight control over the device is required.

 

Handling Interrupts for Asynchronous I/O Operations

Job completion interrupts for asynchronous I/O operations are handled differently than those for synchronous I/O operations, where a process is blocked awaiting the completion of the operation. When the interrupt routine receives the interrupt and determines that it is associated with an asynchronous I/O operation, it updates appropriate members of the areq structure. The operating system then updates the corresponding aiocb structure, thus notifying the associated user-level process.

 

Drivers for Realtime Applications

Technically speaking, there is no such thing as a "realtime driver." There are, however, drivers that support the performance requirements of realtime applications, and drivers for devices that are associated with realtime applications.

All MODCOMP drivers (including those bundled with the operating system and those offered as optional products) support the performance needs of realtime applications. This means they meet the following criteria:

  • allow for preemption.
  • have low interrupt latencies.
  • are written to minimize contention for resources, which may cause processes to block for long periods of time.
  • support realtime features as appropriate.

In addition, we made a number of internal enhancements to the UNIX System V kernel to improve driver performance. For example, on UNIX System V, error messages written to the console by drivers may degrade performance by increasing interrupt latencies; the REAL/IX Operating System includes a daemon that writes the error messages to the console and an internal kernel structure, so that these error messages have virtually no effect on system latencies.

Drivers for standard computer peripherals (such as printers, terminals, disks, and tapes) that are ported to the REAL/IX Operating System from other UNIX operating systems may or may not support realtime performance, depending on how they are installed. The easiest way to port a driver to the REAL/IX Operating System is to use the compatibility modes, but this may impact general system performance. For instance, drivers running under CPU affinity disable preemption when they are running, which degrades general system performance. Any call to spl (used to protect critical code regions by disabling interrupts) degrades the interrupt latency of the system. Refer to the Driver Development Guide for more information about the compatibility modes. Drivers rewritten so that they are fully semaphored (refer to Chapter 3 for a discussion of kernel-level semaphores) and that adhere to the guidelines in the driver and kernel documentation will usually support the performance needs of realtime applications.

Drivers written for devices that are used specifically for realtime applications (such as an analog or digital process control board) can meet more stringent performance requirements by never blocking and using specific realtime features such as connected interrupts. In many cases, the best approach is to use direct I/O for control of the board, and have a kernel-level driver that is written mainly to support interrupts generated by the board.

 

Internet Protocols over Ethernet

The REAL/IX Operating System supports the Internet networking protocols running over various Local Area Network (LAN) controllers. The REAL/IX Networking model provides both the Berkeley Socket System Call Interface and the AT&T Transport Level Interface (TLI) Library for network development, and a full range of user-level commands for data transfers using existing networking packages.

Figure 3 maps the components of the REAL/IX Operating System's networking capabilities onto the OSI Model.


Figure 3 - REAL/IX Networking and the OSI Model

 

Ethernet Protocol

Ethernet is a Local Area Network (LAN) protocol, meaning it is used to interconnect several machines (typically, a few hundred computers, each of which is referred to as a host) that are located within a kilometer of each other geographically. Each computer is a node on the Ethernet, and connects to an Ethernet cable. Ethernet utilizes a broadcast bus technology, where all interfaces attach to a common communication channel: a coaxial cable called the ether, and each transmission is broadcast to all machines that are connected.

Two different types of cabling are used for Ethernet networks. The first is a thick coaxial cable, which can have a segment length of up to 500 meters and up to 100 nodes attached per segment; this is the cable originally specified for Ethernet. An alternate, thinner coaxial cable is commonly referred to as "CheaperNet," because it is significantly less expensive and easier to install. CheaperNet can have a segment length of only 185 meters and up to 30 nodes attached per segment. The limits on segment length do not limit the total size of an Ethernet LAN, because segments of cable are linked together with repeaters or bridges.

Ethernet is categorized as a Carrier Sense Multiple Access with Collision Detection (CSMA/CD) protocol. When a machine on the network has data to transmit, it checks ("senses") whether the network is idle, and transmits only if it is. If two machines put data on the network at the same time (which can happen because a signal does not reach all parts of the network instantly), the two signals are scrambled; this is called a collision. Each sending machine detects the collision and aborts its transmission, then waits a short while and tries again.
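
The description above says only that the sender waits a short while. Standard Ethernet (IEEE 802.3) chooses that wait with truncated binary exponential backoff, and the Python sketch below assumes that rule; the REAL/IX text itself does not spell it out. The function name is hypothetical, and the wait is expressed in slot times rather than seconds.

```python
import random


def backoff_slots(collisions: int, rng: random.Random) -> int:
    """Slot times to wait after the given number of successive collisions.

    Truncated binary exponential backoff: choose uniformly from
    0 .. 2**min(collisions, 10) - 1. After 16 failed attempts the
    transmission is abandoned.
    """
    if collisions < 1:
        raise ValueError("collisions must be at least 1")
    if collisions >= 16:
        raise RuntimeError("excessive collisions: transmission abandoned")
    return rng.randrange(2 ** min(collisions, 10))
```

Because each sender draws its own random wait, two machines that collided once are unlikely to keep colliding on every retry.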

It is possible to link networks together; a network of networks is referred to as an Internet, and this is the significance of the "IP" portion of the TCP/IP name. Ethernet LANs may stand alone or connect to an Internet through some wide area link, such as a line to an X.25 packet switched network.

 

Internet Components

The TCP/IP system supports several communication mechanisms, all of which can exist simultaneously on the same network. An analogy is a telephone network: two people can have a connection and speak to each other in English, while two other people speak in German on another connection on the same network. As long as both connections use the same underlying switching scheme, the conversations in different languages can coexist on the network. Similarly, a number of different communication mechanisms can coexist on one Ethernet network, as long as the two endpoints of a particular connection use the same protocols. The REAL/IX Operating System supports the following Internet protocols:

  • TCP (Transmission Control Protocol)
    • A reliable, byte-stream data transfer connection with flow control between network applications. It supports both client and server connections using sockets or TLI.
  • UDP (User Datagram Protocol)
    • A simple, low-overhead method of sending datagrams to other processes. It is classified as an unreliable protocol, meaning it provides less extensive flow control and error recovery than TCP; packets are not guaranteed to arrive or to arrive in a specified order. UDP serves as an alternate transport layer to TCP, and supports connectionless communications using the Berkeley sockets model or TLI.
  • IP (Internet Protocol)
    • IP performs the routing of individual datagrams to any node on the local network or (through a gateway) on a remote network. IP is flexible enough to accommodate huge nationwide networks, yet efficient enough for a simple two-host network.
  • ICMP (Internet Control Message Protocol)
    • ICMP provides a means for handling network errors and sending routing information around the network. The ICMP "ECHO" message (ping(1M) command) is used to determine if a host is present on the network.
  • ARP (Address Resolution Protocol)
    • ARP maintains a translation table that maps Internet addresses to Ethernet addresses. Normally, entries in this table are added and deleted as various hosts are accessed over the network. In addition, the local host CPU can read and change the table by a direct connection to ARP.
  • ECHO (Xerox Echo)
    • The ECHO module handles version 2 Xerox Ethernet test packets received from the network. Such packets are used by other hosts to verify the existence of this node. The ECHO protocol is an unsupported feature of the REAL/IX Internet implementation.
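
The difference between the TCP byte stream and the UDP datagram can be seen through the Berkeley socket interface. The following Python sketch uses the standard socket module on the loopback address of a generic system; it is not REAL/IX code, and the function names are hypothetical. It runs in a single thread, which works only because the payloads are small enough to fit in the socket buffers.

```python
import socket


def udp_roundtrip(payload: bytes) -> bytes:
    """Send one datagram over loopback UDP and return what arrives."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as rx, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as tx:
        rx.bind(("127.0.0.1", 0))           # port 0: let the system choose
        rx.settimeout(2.0)
        tx.sendto(payload, rx.getsockname())  # no connection is set up
        data, _addr = rx.recvfrom(65535)
        return data


def tcp_roundtrip(payload: bytes) -> bytes:
    """Send bytes over a loopback TCP connection and return what arrives."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))
        listener.listen(1)
        with socket.create_connection(listener.getsockname(),
                                      timeout=2.0) as client:
            server, _addr = listener.accept()
            with server:
                server.settimeout(2.0)
                client.sendall(payload)
                client.shutdown(socket.SHUT_WR)   # mark end of the byte stream
                chunks = []
                while True:
                    chunk = server.recv(4096)     # may return partial data
                    if not chunk:
                        break
                    chunks.append(chunk)
                return b"".join(chunks)
```

The TCP receiver loops until end of stream because a stream has no message boundaries, while the UDP receiver gets the whole datagram in one call or not at all.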

User Commands

The standard system networking applications exist as Section 1 user commands. Table 1 lists the basic user command set. Note that both the DARPA commands and the standard Berkeley commands are supported.


Table 1 - TCP/IP Applications (User Commands)

 

Refer to the manual pages for information on using these commands.

You may supplement these with customized commands written using the Socket Library System Calls documented in Section 2 and the Transport Level Interface Library (TLI) documented in Section 3N of the reference manual.

 

Networking Daemons

The TCP/IP networking software includes user-level daemons that support the functionality of the standard networking applications. Like the system daemons discussed in Chapter 3, these networking daemons occupy a process slot and are initialized when the system goes to multi-user state. However, they are user-level daemons that you may stop with the kill(1) command, although this is not recommended.

The inetd networking daemon runs constantly; it spawns other daemons for the application as needed. The daemon streamsd runs in support of the STREAMS toolkit. In addition, the tftpd daemon listens for tftp requests, and is used mostly to download REAL/VUC terminals.

Note: The STREAMS tools are supplied as part of the NOSUPPORT package distributed with the REAL/IX Operating System; MODCOMP does not provide customer support for STREAMS applications.

By default, these daemons run at time-sharing priorities, although you may change their priorities to a fixed realtime priority by issuing the setpri(1R) command manually or from a script executed when the system is initialized.

 

UUCP and TCP/IP

You may configure the standard UNIX-to-UNIX System Copy (UUCP) utility package to run over the TCP/IP network. This provides support for standard mail and UUCP commands across the Ethernet network.

 

Networking in Realtime Applications

Ethernet does not guarantee a fixed response time, so you should not rely on it for deterministic response in a realtime environment. However, execution through STREAMS and TCP/IP will not affect the realtime response capabilities of the REAL/IX Operating System. Conversely, a high-priority realtime process that gains control of the CPU can significantly impact the network, even to the point where network connections time out.

 

Hardware Environment

The Hardware Architectures that the REAL/IX Operating System supports will be discussed in Appendices which will be added to this document at a later date.



 


Copyright © 2001 MODCOMP, Inc. | Rendered Sept. 28, 2001

MODCOMP is a subsidiary of CSP Inc