


 

Process Subsystem

The process subsystem is responsible for process scheduling, process synchronization, memory management, and interprocess communications. The REAL/IX Operating System supports all process subsystem facilities of UNIX System V, plus enhancements to provide an appropriate execution environment for realtime processes.

 

Processes

In UNIX terminology, a program is the set of instructions and data coded and compiled by the programmer, and a process is one execution of a program. Some other operating systems use the term task for what we call a process. On UNIX operating systems, many processes may execute the same program concurrently.

Processes execute at either user level or kernel level.

  • Each user-level process runs in its own address space, separated from all other processes. Other processes may communicate with it through one of the facilities discussed in the next chapter, and a process executing at a higher priority may prevent it from executing, but otherwise it is totally protected from other processes.
  • All kernel-level processes execute as functions of the kernel main() routine. While it is possible to synchronize kernel processes to prevent concurrent access to kernel resources, any kernel process can access the address space of any other executing process. For this reason, it is important that kernel-level processes (including user-installed system calls and drivers) use defined functions as much as possible, access kernel data structures appropriately, use kernel semaphores and spin locks as needed, and be tested thoroughly before installation.

 

Memory Segments for Processes

Each process is represented by two memory segments called the text (or code) segment and the data segment, and a set of data structures that are referred to as the process environment. A text segment contains code and constant data and is shared by all processes running the same program.

The data segment of an executing process is composed of two regions: the stack region and the data region. The data region contains the process's static variables, and may also contain a dynamic data structure known as a heap. The data region begins at the low address of the space and grows upward; the stack region begins at the high address and grows downward. The actual stack and heap areas are always slightly smaller than the region that is allocated for them, as illustrated in Figure 1. Note that any shared memory segments are located between the stack and the heap.


Figure 1 - Data Segment of Executing Process

 

The process environment records the information the kernel needs to manage the process, such as register contents, priority, open files, and so forth. In order to maintain system integrity, a process may not address its environment directly, but can use system calls to modify it.

 

Creation and Termination

One of the fundamental differences between UNIX operating systems and other operating systems is their reliance on processes that are spawned from other processes, called child processes. A number of standard system services exist as ordinary utility processes rather than being embedded in the operating system. When the operating system is first booted, it creates the init process, which is the parent of all shell processes created when users log in. If you issue a command from the terminal (such as cat(1), to list the contents of a file), that process is a child of your shell. On UNIX operating systems, a process often runs another process where a program on another system would call a subroutine. Of course, processes also create processes to run portions of algorithms in parallel, as they would on other operating systems, but the use of a process as something similar to a subroutine is peculiar to UNIX operating systems and the associated programming style.

 

Forking Child Processes

A new process is created with a fork(2) system call. This call creates a child process, which is effectively a duplicate of the parent process that created it. This is implemented using a copy-on-write scheme, where data pages are copied only when a write operation is requested, thus avoiding unnecessary copying.  fork copies the parent's data and stack segments (or regions) plus its environment to the child. The child shares its parent's text segment, which conserves memory space (by default, all REAL/IX Operating System programs are reentrant). Child processes initialize more quickly than the original process, since they only have to modify parts of the inherited environment rather than recreate the entire environment.

fork returns values to both the parent and child, but returns a different value to each. The parent receives the process id (PID) of the child, which can never be 0, and the child receives 0. The program uses these return values to determine which is the child process and which the parent process (since they are both executing the same program) and takes different branches in the code for each.

After checking the value returned by fork, the parent and child execute different branches of the same program in parallel. If the child needs to execute a different program, it issues an exec(2) call. exec replaces the text and data segments of its caller with those of a new program read from a file specified in the call. exec does not alter its caller's environment; a child process may execute a different program, but still have access to its parent's files, although they may have been modified by the child between the fork and the exec.

A child issues an exit(2) system call to terminate normally; this call takes a parameter whose value is returned to the child's parent. A child may also terminate abnormally because of a signal sent by the kernel, a user, or another process. When a child terminates, either normally or abnormally, the operating system sends a SIGCLD signal to the parent process. Signals are discussed in more detail in Chapter 4.

 

Waiting for a Child Process to Terminate

Meanwhile, the parent process is free to continue execution in parallel with the child process. If the parent needs to wait for the child to terminate, it issues the wait(2) system call. wait returns the process id of the terminated child (which allows a parent that has spawned several children to tell which one terminated) and the status code passed by the child when it exits.

Whether or not the parent waits for the child to terminate, it receives a SIGCLD signal when the child terminates. The parent process can catch this signal, then issue a wait to learn what happened to the child process.

 

Process States

On the REAL/IX Operating System, there are eight process states, which can be viewed with the ps -efl command. These states, with the ps descriptor shown in parentheses, are:

  1. The process is runnable: it will run when the kernel schedules it, but it is not currently running (R).
  2. The zombie state, where the process has issued the exit(2) system call and no longer exists, but it leaves a record containing an exit code and some timing statistics for its parent process to collect. The zombie state is the final state of a process (Z).
  3. The process is stopped by a signal (T).
  4. The process is newly created and is in a transition state; the process exists, but is neither blocked nor runnable. This state is the start state for all processes except process 0 (I).
  5. The process is blocked awaiting memory availability (X).
  6. The process is executing in either kernel or user mode (N), where N is the number of the processor on which the process is executing.
  7. The process is blocked awaiting some event; a signal will not unblock it (D).
  8. The process is blocked awaiting some event; a signal may unblock it (S). This is usually the most common state.

 

Memory Management

The memory management module allocates memory resources among all executing processes on the system.

 

Data Structures

Every executing process has three memory management data structures (proc, user, and pregion) associated with it:

 

  • proc structure (defined in the proc.h header file). All proc structures are listed in the kernel's process table, whose size is determined at sysgen(1M) time. The kernel's process table is always in memory (never paged out), so the proc structures contain information the kernel may need while a process is paged out, such as its priority, its process group and parent process, address of the "u" page, and addresses used for sleep functionality.
  • user block (or u area, defined in the user.h header file). The user block is never paged out on the REAL/IX Operating System, but because some other varieties of the UNIX operating system do page it out, the convention is to keep out of the user block any information that the kernel might need while the user block is paged out. The user block for the currently executing process is always located at a specific location in virtual memory; when the kernel does a context switch, the u area of the process that was running is mapped out of the fixed address, and the u area of the process that is about to run is mapped into the fixed address. Each user block has a pointer to the corresponding proc structure.
  • process region (or pregion) structure. The entries in the process's pregion table point to entries in the system's region table, each of which describes a logical segment of memory. Each pregion table has a fixed number of entries, usually three (for text, data, and stack) plus the number of shared memory segments for the process. All processes executing the same program have a pregion pointer to the same text segment in the system's region table. When sharing memory, the pregion structure of the process that initialized the shared memory segment points to the entry in the region table, and the pregion structures of all other processes that access that shared memory segment point to the same region entry.

These structures are illustrated in Figure 4.


Figure 4 - Memory Management Data Structures

 

Note that the shared memory is accessed by the two processes whose pregion tables both point to the same Kernel Region Table entry.

The fork(2) and exec(2) system calls are intimately associated with the memory management data structures. When a process forks, the kernel:

  • makes a new entry in its process table for the child process, copying most of its contents from the parent's proc structure.
  • copies the parent's pregion table to the pregion table it allocates for the child.
  • allocates new pregion entries and page descriptors for the child's data and stack segments.
  • allocates new pregion entries for the child's text and shared memory segments; these regions share the parent's corresponding physical pages as long as both processes only read the pages.
  • if either the child or the parent writes to one of the pages for data or stack, the kernel makes a separate copy of that physical page for the child.

 

Address Mapping

A running process refers to memory with virtual addresses, which essentially consist of a virtual page number and a byte offset into the page. The hardware's memory management unit (MMU) translates the virtual page number into a physical page frame number, adds the offset, and sends the resulting physical address to memory.

The key to performing the virtual-to-physical address translation, or mapping, is the page map maintained in kernel memory and defined in immu.h. As discussed earlier, each process has its own page table, containing one entry for each of its pages; when the process is running, its page table is loaded into the page map.

 

Paging

When the number of processes exceeds the number that can reside in memory at once, the kernel moves less-active pages (those not locked in memory) from main memory to disk. This maintains good overall system performance, although it may slow the response of an individual process.

The vhand daemon is responsible for paging operations. When less than 10% of the available memory is free, vhand makes "aging passes". When less than 1% of the available memory is free, vhand pages out the least recently used pages on the system until 5% of memory is free.

The free list contains page frames that are eligible for reuse. Page frames are added to the head of the free list when their contents are no longer needed, and to the tail of the free list when their contents might be needed again. For example, when the last process executing a program terminates, the process's stack, data, user page, and page table frames are of no use to another process, and so are added to the head of the free list. The frames containing the code pages are usable if another process executes the program, so they are added to the tail of the free list.

If another process executes the program while the code pages are on the free list, the kernel reclaims them rather than bringing them in from the object file or the swap device. Page frames are allocated only from the head of the free list so that pages that may be needed again stay associated with page frames as long as possible.

Note that frames containing kernel code and data are on neither the swap list nor the free list because no part of the kernel is ever paged out.

 

Page Faults

A page fault occurs when a process attempts to access a page that the pager has marked invalid; the fault invokes the kernel's page fault handler. The page fault handler takes different actions depending on where the invalid page is:

  • If the page is on the free list, the page fault handler unlinks the associated page frame from the free list, marks it valid, and resumes the process. Reclaiming a page frame from the free list in this manner does not block the faulting process and is faster than a disk read, although the overhead may still be unacceptable for critical realtime processes.
  • If the page is paged out (in other words, the frame it formerly occupied is allocated to another page), the page fault handler blocks the faulting process and schedules a disk read to retrieve the page from the swap device. Later, when the page is read, the kernel allocates a frame from the head of the free list, updates the frame address in the process's page table, marks the page valid, and unblocks the process.

The combined actions of the pager and the page fault handler tend to keep frequently accessed pages associated with page frames, while little-used pages tend to migrate to the swap device. The pager moves page frames to the free list if they have not been accessed recently, so that they are eventually reallocated. Meanwhile, page faults taken on these pages counteract the pager's efforts. If a page is accessed very frequently, the pager never sees it marked as not accessed and never adds it to the free list. If a page is accessed often, it may be added to the free list but is rapidly reclaimed by the page fault handler before it reaches the head of the list. Only the least-used pages reach the head of the free list and require reading from disk before they are used. The free list thus serves both as a source of available page frames and as a cache of recently discarded pages that can be reclaimed quickly.

 

Allocating Memory

REAL/IX memory allocation is similar to that on other UNIX operating systems, with extensions to provide explicit control over memory allocation in critical realtime application programs. These extensions provide the realtime programmer with complete control over the REAL/IX demand-paging subsystem. To guarantee response time, users are allowed to pre-page and lock all pages (instructions, data, shared memory, and stack) of a program into memory. At the programmer's option, the operating system notifies a realtime process of any attempt to grow the stack or data portions of the process's data segment.

The underlying philosophy of memory allocation is different for realtime and time-sharing processes. The time-sharing philosophy is to avoid consuming any more memory than is absolutely necessary so that all processes have equal access to memory resources. For critical realtime processes, the emphasis is on providing optimal performance for the program.

Consequently, realtime programs typically preallocate a generous amount of memory and lock all resources they might need into memory. You may use the REAL/IX Operating System to implement either philosophy.

 

Preallocating Memory

The REAL/IX Operating System uses demand paging, so the operating system does not allocate any memory for a process when it is initialized. Rather, the process is allowed to start executing at its entry point, which causes the process to page fault text and data pages as they are referenced. As the data segment (which is composed of the stack and data regions) grows beyond the memory already allocated to it, the system allocates more physical pages. This scheme conserves memory and is appropriate for many applications, but the overhead incurred is unacceptable for critical realtime programs.

Most UNIX operating systems provide the brk(2) and sbrk(2) system calls to preallocate virtual space for the data region. The end of the data segment is called the break. brk and sbrk allow you to specify a new location for the break, with brk specifying an absolute address and sbrk specifying an address relative to the current break.

The REAL/IX Operating System also provides the stkexp(2) system call, to preallocate virtual space for the stack. You can specify either the absolute size of the stack or the increment by which the stack is to grow.

For programs that need memory allocated for data space, the malloc(3C) package provides a simple, general-purpose memory allocator. Realtime programs should call malloc only during the initialization part of the program, or use brk or sbrk to preallocate data space. Note that a program should not call both malloc and brk/sbrk.

 

Locking Pages in Memory

Paging enables the operating system to provide good performance to a number of programs executing at the same time. However, the overhead of accessing code or data that has been paged out is significantly greater than the overhead of accessing code or data that is resident in memory.

The plock(2) system call allows you to lock text and data segments into memory, and the SHM_LOCK command of the shmctl(2) system call allows you to lock shared memory segments into memory. These calls lock segments into memory when they are first accessed. They can be used together with the system calls that preallocate memory, or the operating system can be left to allocate memory as needed.

Critical realtime processes can lock segments into memory during process initialization with the resident system call, so that the first attempt to access a segment does not incur the overhead of loading it into memory. The resident(2) call requires preallocation of memory as discussed above. For critical realtime processes that preallocate memory, expanding memory beyond the preallocated limits is usually considered a fault, although the action taken for such a fault is at the discretion of the programmer. If desired, the resident call may post an event (as discussed in the following chapter) to the process if the memory allocated is inadequate for the stack or data region.

 

Scheduling

Scheduling determines how a CPU is allocated to executing processes. Each executing process has a priority that determines its position on the run queue.

The multiprocessing environment uses, in effect, two run queues: the global run queue and the local run queue. The global run queue is the list of processes that can be executed by the first available CPU. The local run queue is maintained by each individual CPU and lists the processes that have been targeted for execution on that particular CPU. Targeting a process at a specific CPU can improve the realtime performance of that process; it is done with the targetcpu(2) system call. The multiprocessor run queues are organized in the same manner as the uniprocessor run queue. Note that if a local run queue is available, the global run queue is not used. The run queue organization is discussed in the following paragraphs.

On the REAL/IX Operating System, the run queue consists of 256 process priority "buckets", divided into two major parts as illustrated in Figure 3: processes executing at priorities 128 through 255 (time-slice priorities) use a time-sharing scheduler implemented in the onesec process, and processes executing at priorities 0 through 127 (realtime priorities) use a process priority scheduler implemented internally. Priorities 254 and 255, however, are fixed, non-migrating priorities. Each of these schedulers uses a different scheduling algorithm; they are discussed below.

Priority    Class            Behavior
0-127       Realtime         Changed only by explicit request of a program or administrator
128-253     Time-sharing     Dynamically adjusted by the operating system
254-255     Non-migrating    Execute only when all other processes are idle

Figure 3 - REAL/IX Process Priorities

 

Processes are scheduled according to the following rules:

  • A process runs only when no other process at a higher priority is runnable.
  • Once a process with a realtime priority (0 through 127) has control of a CPU, it retains that CPU until it is preempted by a higher-priority process, relinquishes the CPU by making a call that causes a context switch, blocks to await some event such as I/O completion, or exhausts its time slice (by default, the time slice is 11 years, but this can be changed with the setslice(2) system call).
  • Because the REAL/IX Operating System has a preemptive kernel, a running process can be preempted at any time if a higher-priority process becomes runnable.

Each process's entry in the process table includes scheduling parameters that the kernel uses to determine the order in which processes are executed. These parameters are determined differently for time-sharing and fixed-priority scheduled processes.

 

Time-Sharing Scheduling

Processes executing at priorities 128 through 253 use a time-sharing scheduler similar to that on other UNIX operating systems. The operating system varies the priorities of executing processes according to a specific algorithm. For instance, interactive processes gravitate toward higher priorities (their actual run times are relatively small compared to their wait times), and processes that recently consumed a large amount of CPU time are relegated to lower priorities. A process's only control over its priority is through the nice(1) command and the nice(2) system call, which lower the relative priority of a process or, if issued by the superuser, can grant the process a more favorable priority.

The kernel allocates a CPU to a process for a time quantum, determined by the MAXSLICE tunable kernel parameter. The process will retain possession of the CPU until preempted by a higher priority process, until it is finished and relinquishes control of the CPU, or until its time quantum expires. If the process has not released the CPU at the end of the time quantum, it is preempted and fed back into the queue at the same priority. When the kernel again allocates a CPU to the process, the process resumes execution from the point where it was suspended. Once a second, those time-sharing processes that have consistently used their whole quantum are shuffled to lower priorities.

This time-slice scheduler has the advantage of distributing the CPU equitably among all executing processes. However, it is not adequate for critical realtime processes, which need to execute in a deterministic (preferably fast) time frame. For this reason, the REAL/IX system supplements the traditional time-sharing scheduler with the fixed-priority scheduler.

 

Effect of nice

Each executing process has a nice value, which is displayed in the ps -efl output. The nice(2) system call and the nice(1) user command change this value.

When the nice value is changed, the execute priority of the process is recalculated if the process is executing at a time-share priority (128 through 253). If the process is executing at a fixed priority (0 through 127, 254, or 255), changing the nice value has no effect on its execute priority.

 

Fixed-Priority Scheduling

Processes executing at priorities 0 through 127 utilize a fixed-priority scheduler. A process establishes its own priority with the setpri(2) system call. The operating system never automatically changes the priority of a process executing at a realtime priority except during a semaphore boost operation; only the process itself or a user with realtime or superuser privileges can change the priority. A process executes only if there are no processes with higher priorities that are runnable at this time.

Runnable processes at the same priority are arranged in a circular, doubly linked list. A round-robin scheduling scheme is used for processes at the same priority, with the ability for a process to relinquish its time slice to another process at the same priority. A process can also use the setslice(2) system call to set its time quantum; the default is 6 ticks.

The fixed-priority scheduler allows a critical realtime process to "hog" the CPU as long as necessary to finish. If processes at high realtime priorities are using the CPU, low-priority realtime processes and time-share processes may never get to execute. Note that a process scheduled at priority 100 runs just as fast as a process scheduled at priority 1 if no processes are scheduled at a higher priority. Because a high-priority "runaway" process may never surrender the CPU, we recommend running the console at priority 0 or 1 and running no other processes (other than critical system processes) at that priority, so that you can always regain control of the system.

 

Run Queue Organization

To enable the operating system to search the queue efficiently, two bit mask schemes are implemented for each run queue. The rqmask2 scheme consists of eight bit masks, each 32 bits long, for a total of 256 bits. Each bit corresponds to one bucket in the queue; if a bit is set to 1, there are one or more runnable processes at that priority. The rqmask is an 8-bit mask, with one bit for each bit mask in rqmask2. If there are any runnable processes at priorities 0 through 31, the first bit is set; if there are any runnable processes at priorities 32 through 63, the second bit is set, and so forth. The run queue organization is illustrated in Figure 6.


Figure 6 - REAL/IX Run Queue Organization

 

Locks to Preserve Data Structure Integrity

The REAL/IX Operating System uses spin locks and suspend locks (or kernel semaphores) to ensure data structure integrity in the preemptive kernel. If two processes access the same global data structure, it is important that the first process completes any update of that structure before the second process accesses it. In other UNIX kernels, this is handled by disabling interrupts to prevent an interrupt handler from accessing a data structure that was being manipulated by process-level kernel code. If the preemptive kernel were implemented without locks, a higher-priority process could preempt a lower-priority process in the middle of updating a data structure, and thus corrupt the structure.

In a non-preemptive uniprocessor configuration, data structure integrity is preserved by manipulating processor execution levels to prevent interrupts when updating a structure, but this is inadequate for a multiprocessor configuration because the interrupt handler or another process may execute on a different processor than the process-level routines. The locking mechanism enables the REAL/IX Operating System to run on a multiprocessor configuration, where all processors operate on a symmetrical, peer-to-peer basis and each processor can simultaneously execute user-level code, process-level kernel code, and interrupt-level kernel code.

 

Synchronization in Compatibility Mode

Other UNIX operating systems provide kernel-level synchronization with the sleep/wakeup functions to block and unblock a process, and the spl (set priority level) function to disable interrupts. The REAL/IX system provides three compatibility modes that allow drivers to be ported from UNIX System V without rewriting the synchronization facilities. The three modes are:

  • Non-preemptible - kernel preemption is turned off when the process is running.
  • Major-device semaphoring - one semaphore is set for the major device (that is, the driver itself). This is implemented in the switch table, so that only one instance of the driver entry point can execute at a time.
  • Minor-device semaphoring - one semaphore is set for each minor device (that is, for each actual device controlled by the driver).

Drivers installed using these compatibility modes may not realize the full performance gains available from rewriting them to use kernel semaphores and spin locks, but should perform as they do on UNIX System V.

 

Kernel Level Semaphores

Suspend locks, or kernel semaphores, are used when the lock time is relatively long (implemented by switching to another runnable process while the desired resource is busy). They are used to limit the number of processes that access a kernel resource simultaneously or to block a process until a specified event occurs (in lieu of the sleep/wakeup functions of traditional UNIX operating systems).

The value of a semaphore is initialized at system initialization time by the initsema function, which sets the initial integer value of the semaphore. The valuesema function reads the current value of a semaphore; it neither locks nor alters the semaphore.

The value of a semaphore is decremented with the psema or cpsema (conditional psema) functions. The difference between them is what happens when the resource is not available: psema causes the process to block until the resource becomes available, whereas cpsema returns immediately without gaining access to the resource, and the caller may try again later. The decsema function is used by the operating system to unconditionally decrement the value of the semaphore counter.

The value of a semaphore is incremented with the vsema or cvsema (conditional vsema) functions. The difference between these two functions is that vsema increments the value of the semaphore unconditionally (thus unblocking a process that may be blocked waiting for it), whereas cvsema increments the value of the semaphore only if a process is blocked on that semaphore. The incsema function is used by the operating system to unconditionally increment the value of the semaphore counters.

The value to which a specific semaphore is initialized determines its use. Semaphores used only to block processes, such as while waiting for an I/O operation to complete, are initialized to 0, so that the first process to issue a psema will block. Semaphores used to control access to a kernel resource are initialized to the number of resources available (for instance, the number of buffers in the pool), so that processes do not block unless the resource is exhausted.

Whatever the initial value of a semaphore, its effect is determined by its value at any given time. Figure 5 summarizes the meaning of kernel semaphore values.

Value of Kernel (Suspend Lock) Semaphore

  • <0 - One or more processes are blocked waiting for access to this semaphore (and the resource it controls). The absolute value of the semaphore is the number of processes that are blocked.
  • 0 - The semaphore (and the resource it controls) may be in use, but no processes are blocked on it. A process that issues a psema on the semaphore will block; a process that issues a cpsema on the semaphore will return without accessing the resource.
  • >0 - The resource controlled by the semaphore is available. The value of the semaphore indicates the number of resources available, that is, the number of processes that can access the resource without blocking.

Figure 5 - Values of Kernel Semaphores
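The interpretation of a semaphore value can be captured in a small decoding function. This Python sketch is illustrative only; SemaState, interpret, and describe are hypothetical names, and the descriptions paraphrase the values summarized above.

```python
from typing import NamedTuple


class SemaState(NamedTuple):
    blocked: int     # processes currently waiting in psema
    available: int   # psema calls that would succeed without blocking


def interpret(value: int) -> SemaState:
    """Decode a suspend-lock semaphore value into (blocked, available)."""
    return SemaState(blocked=max(0, -value), available=max(0, value))


def describe(value: int) -> str:
    s = interpret(value)
    if s.blocked:
        return f"{s.blocked} process(es) blocked; psema blocks, cpsema fails"
    if s.available:
        return f"{s.available} resource(s) free; psema and cpsema succeed"
    return "no waiters, nothing free; psema blocks, cpsema fails"
```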

 

Spin Locks

Spin locks are used when the lock time is very small (typically no more than the time of two context switches). The locks are initialized to zero by initlock when the system is booted. When a process wants to gain exclusive access to a protected resource, it executes an spsema operation, which tests and sets the lock. Other processes that try to use the resource spin until the process owning the lock issues an svsema operation to unlock the resource.

On uniprocessors, spin locks are equivalent to disabling and enabling interrupts. On multiprocessors, spin loops are built on special hardware instructions such as "test and set (TAS) a bit in a memory location."

To better understand how a REAL/IX spin lock works with a TAS spin loop, consider the example illustrated in Figure 6. Processes MP1 and MP2 both need to execute the same region of code. MP1 enters the region and sets a spin lock on it. When MP2 attempts to enter the same region, it finds the lock set, and so loops on the lock operation until MP1 releases the protected region with the unlock operation.

The operating system also makes use of a special semaphore function, cspsema. If the semaphore is already spin locked when cspsema executes, the operating system does not wait for it to become available, but immediately returns control to the calling process.
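A test-and-set spin loop, including the conditional cspsema variant, can be sketched as follows. Python exposes no TAS instruction, so this sketch assumes a non-blocking acquire of a threading.Lock as a stand-in for the atomic hardware test-and-set; the class TASSpinLock and its method names only mirror the kernel functions and are not a REAL/IX interface.

```python
import threading


class TASSpinLock:
    """Model of a spin lock built on a test-and-set loop (hypothetical)."""

    def __init__(self):
        self._bit = threading.Lock()   # like initlock(): starts cleared

    def _test_and_set(self) -> bool:
        # Stand-in for atomic TAS: sets the bit and returns True if it was
        # clear, or returns False if it was already set.
        return self._bit.acquire(blocking=False)

    def spsema(self) -> None:
        """Spin (busy-wait) until the lock is obtained."""
        while not self._test_and_set():
            pass                       # MP2 loops here while MP1 holds the lock

    def cspsema(self) -> bool:
        """Conditional: one TAS attempt, returning immediately either way."""
        return self._test_and_set()

    def svsema(self) -> None:
        """Clear the bit so a spinning processor can take the lock."""
        self._bit.release()
```

Busy-waiting is only sensible when the hold time is very short, which is why spin locks are reserved for locks held for about two context switches or less.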

As illustrated in Figure 7, the spin lock process in the semaphore processor (SSP) differs from the TAS spin loop process, but is functionally identical. When the semaphore processor receives a semaphore request, it tests the semaphores to ensure that the requested semaphore is available for use. Again, MP1 and MP2 both need to execute the same region of code. MP1 enters the region and the semaphore processor sets a spin lock. The spin lock is actually a single read request to semaphore memory. When MP2 attempts to enter the same region, the semaphore processor indicates that the lock is set, so MP2 suspends execution (stalls) and remains stalled until the semaphore is released by the unlock operation.

The advantage of the SSP method over the TAS spin loop method is that a LOCK signal is not required in order to suspend CPU execution. For more detailed information about the semaphore processor, and its spin lock process, refer to the System Level Concepts Technical Manual.

Figure 6 - REAL/IX Spin Locks, TAS Spin Loops

 

Figure 7 - REAL/IX Spin Locks, Semaphore Processor

 

Kernel Daemons

To meet realtime constraints, some of the traditional interrupt handler functions have been moved to high-priority processes (daemons) that are triggered from an interrupt level with a vsema or cvsema operation. These daemons include:

  • onesec - maintains free page counts, calculates new process priorities for time-slice processes, and unblocks other daemons
  • vhand - processes paging operations
  • bdflush - flushes I/O buffers that have been around too long
  • hitimed - handles delays and timeouts for high priority processes
  • lotimed - handles delays and timeouts for low priority processes
  • ttyd - handles interrupts for the line discipline
  • prfd - handles kernel printf functions
  • pgrpsigd - handles process group signal delivery functions
  • idle - ensures that the system always has a process to execute, thereby maintaining the scheduler algorithms
  • streamsd - runs in support of the STREAMS toolkit
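The pattern of moving work out of interrupt handlers into a daemon can be sketched with a counting semaphore. In this hypothetical Python model, interrupt() stands for an interrupt handler that does only bounded work (queue a request and post the semaphore, as vsema would), and a daemon thread blocks on the semaphore and does the processing. Python threads have no scheduling priority, so the tunable daemon priorities are not modelled.

```python
import threading
from collections import deque


class DeferredWorkDaemon:
    """Interrupt-level code posts; a daemon thread does the real work."""

    def __init__(self, work):
        self._work = work
        self._pending = deque()
        self._sema = threading.Semaphore(0)   # starts at 0, so the daemon blocks
        self.done = []
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def interrupt(self, item):
        """Bounded work at 'interrupt level': queue the item, wake the daemon."""
        self._pending.append(item)
        self._sema.release()                  # plays the role of vsema

    def _run(self):
        while True:
            self._sema.acquire()              # plays the role of psema
            item = self._pending.popleft()
            if item is None:                  # shutdown request
                return
            self.done.append(self._work(item))

    def stop(self):
        """Ask the daemon to exit after draining earlier requests."""
        self.interrupt(None)
        self._thread.join()
```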

The execution priorities of these daemons are set with tunable parameters, enabling users to choose which daemons run at priorities higher than those of critical realtime processes.

Note that the idle daemon is fixed at the lowest running priority, 255. Priority 254 is a special case, reserved as an idle task priority for customers who want to run a custom idle task, such as one that performs background processing, instead of the standard REAL/IX idle daemon.
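The effect of the reserved idle priorities can be shown with a toy scheduler decision. This Python sketch assumes that numerically smaller values mean higher priority, consistent with 255 being the lowest running priority; pick_next and the constant names are hypothetical.

```python
IDLE_PRIORITY = 255         # standard idle daemon: lowest running priority
CUSTOM_IDLE_PRIORITY = 254  # reserved for a customer-supplied idle task


def pick_next(runnable: dict[str, int]) -> str:
    """Return the runnable process with the numerically lowest priority.

    The standard idle daemon is always runnable, so there is always an answer.
    """
    candidates = {"idle": IDLE_PRIORITY, **runnable}
    return min(candidates, key=candidates.get)
```

A custom idle task at 254 therefore runs whenever nothing else is ready, while any real work preempts it.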

In addition to these kernel daemons, the TCP/IP networking software uses a set of user-level daemons; these are discussed in Chapter 6 under Networking Daemons.

 

Timers

The operating system uses timers to schedule "housekeeping" operations that must be run periodically and to provide a mechanism through which realtime processes can schedule events.

 

System Clock

The system clock is the primary source for system synchronization. A clock interrupt is generated on every clock tick. The clock interrupt handler is activated by this interrupt, and maintains the user and system times and the current date. It also provides a triggering mechanism for process interval timers, profiling, and driver timeout functions.

By default, the system clock will tick at the rate specified by the constant HZ in the file sys/param.h, typically 60 times a second. It is possible to increase the frequency and thereby gain resolution at the expense of additional interrupt processing time.
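The trade-off between tick rate and resolution is simple arithmetic. This Python sketch takes HZ as a parameter rather than reading sys/param.h, works in integer nanoseconds to avoid floating-point rounding, and rounds delays up to whole ticks; rounding up is an assumed convention (so a timeout never fires early), not something the text specifies. The function names are hypothetical.

```python
NS_PER_SEC = 1_000_000_000


def tick_period_ns(hz: int) -> int:
    """Length of one clock tick in nanoseconds (truncated)."""
    return NS_PER_SEC // hz


def ticks_for_delay(delay_ns: int, hz: int) -> int:
    """Smallest whole number of ticks covering delay_ns (ceiling division)."""
    return -(-delay_ns * hz // NS_PER_SEC)
```

At the typical HZ of 60, a tick is about 16.7 ms; raising HZ to 240 shortens it to about 4.2 ms, at the cost of four times as many clock interrupts.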

Each CPU in a multiprocessor system has a clock. When the REAL/IX Operating System is booted, only one of the CPUs is designated to maintain the system time and the current date, to process driver timeouts, and to trigger process interval timer expirations. The other CPUs maintain CPU-dependent information that must be updated at each clock tick.

The REAL/IX clock interrupt handlers differ from those on most other UNIX operating systems in that they never block interrupts for longer than 100 microseconds. This bound has been achieved by moving some functionality to kernel daemons.

 

Timing Functions

The REAL/IX Operating System supports all the timing functionality of the UNIX System V operating system. This includes the following calls, which treat time as the number of seconds since 00:00:00 GMT, January 1, 1970:

  • time(2) returns the current time in seconds
  • stime(2) sets the system time, measured in seconds
  • alarm(2) sends SIGALRM to the calling process after a specified number of seconds have elapsed
  • pause(2) suspends the process until a signal is received
  • sleep(2) library routine that suspends execution of a process for a specified number of seconds or until a signal is received

In addition, the REAL/IX system offers some timing facilities that originated with the Berkeley variants of the UNIX system.

  • setitimer(2) sends SIGALRM to the calling process after a specified number of seconds and microseconds, can optionally repeat the SIGALRM at fixed intervals, and can be used to cancel a running timer
  • getitimer(2) returns the time remaining before a SIGALRM due to a previous setitimer is delivered
  • adjtime(2) makes small adjustments to the system clock to allow synchronization with other time sources
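The Berkeley-style interval timer can be exercised from Python's standard signal module, which wraps setitimer and getitimer on Unix systems. The sketch below runs on a modern Unix system rather than REAL/IX and must be called from the main thread; count_alarms is a hypothetical helper. It arms a periodic ITIMER_REAL timer, counts SIGALRM deliveries, and then cancels the timer by setting it to zero.

```python
import signal
import time


def count_alarms(interval_s: float, wanted: int) -> int:
    """Run a periodic ITIMER_REAL timer until `wanted` SIGALRMs arrive,
    cancel it, and return the number of signals received."""
    received = 0

    def on_alarm(signum, frame):
        nonlocal received
        received += 1

    old_handler = signal.signal(signal.SIGALRM, on_alarm)
    try:
        # First expiry after interval_s, then every interval_s thereafter.
        signal.setitimer(signal.ITIMER_REAL, interval_s, interval_s)
        while received < wanted:
            time.sleep(interval_s / 4)
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)   # a zero value cancels the timer
        signal.signal(signal.SIGALRM, old_handler)
    return received
```

After the call, signal.getitimer(signal.ITIMER_REAL) returns (0.0, 0.0), confirming that the timer was cancelled.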

Additional timing facilities based on those in the POSIX 1003.4 (later 1003.1b) documents include:

  • gettimer(2) returns the value of time in seconds and nanoseconds
  • settimer(2) sets the system clock to a value in seconds and nanoseconds
  • nanosleep(2) suspends execution of a process for a specified number of seconds and nanoseconds
  • nanosleep_getres(2) returns details of the resolution of the nanosleep function
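A nanosleep-style call takes a seconds and nanoseconds pair. This Python sketch assumes a modern Unix system, where recent CPython versions implement time.sleep with clock_nanosleep() or nanosleep(); sleep_timespec and clock_resolution_ns are hypothetical helpers. It uses time.clock_getres, which wraps POSIX clock_getres(), as a stand-in for nanosleep_getres, whose interface is not described here.

```python
import time

NS_PER_SEC = 1_000_000_000


def sleep_timespec(sec: int, nsec: int) -> int:
    """Sleep for a (seconds, nanoseconds) pair and return the elapsed
    monotonic time actually observed, in nanoseconds."""
    if not 0 <= nsec < NS_PER_SEC:
        raise ValueError("nsec must be in [0, 1e9)")  # same rule as a POSIX timespec
    start = time.monotonic_ns()
    time.sleep(sec + nsec / NS_PER_SEC)
    return time.monotonic_ns() - start


def clock_resolution_ns() -> int:
    """Resolution of the monotonic clock, in nanoseconds."""
    return round(time.clock_getres(time.CLOCK_MONOTONIC) * NS_PER_SEC)
```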

The cron(1M) facilities build on these timer services to allow users to schedule processes with the commands:

  • at(1), to execute at a specified time.
  • batch(1), to execute when system load levels permit.
  • crontab(1), to execute periodically.

 

Process Interval Timers

Process interval timers are designed for use by one or more realtime processes to schedule system events on a very fine time scale, from a few seconds down to 1/1920 of a second. Realtime processes set the interval timers to expire either after a time value relative to the current system time, or at an absolute time in the future. They can be used as one-shot or periodic timers.
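Relative and absolute expiration values, and the 1/1920 second granularity, can be illustrated with exact rational arithmetic. In this Python sketch, times are Fractions of a second; to_units, absolute_expiry, and expirations are hypothetical helpers, and rounding an expiration up to the next 1/1920 second boundary is an assumption, not documented REAL/IX behavior.

```python
import math
from fractions import Fraction

RESOLUTION = Fraction(1, 1920)  # finest interval-timer granularity, in seconds


def to_units(seconds: Fraction) -> int:
    """Whole 1/1920 s units covering `seconds`, rounded up (assumed)."""
    return math.ceil(seconds / RESOLUTION)


def absolute_expiry(now: Fraction, value: Fraction, relative: bool) -> Fraction:
    """Absolute expiry for a relative (incinterval-like) or absolute request."""
    target = now + value if relative else value
    return to_units(target) * RESOLUTION


def expirations(first: Fraction, period: Fraction, count: int) -> list[Fraction]:
    """First `count` expiry times of a periodic timer."""
    return [first + i * period for i in range(count)]
```

One unit of 1/1920 second is about 520.8 microseconds.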

There is flexibility in specifying the action taken when one of the timers expires. A timer can cause an asynchronous signal in the traditional manner of SIGALRM (although the timers are not restricted to delivering this one signal). Alternatively, an event can be delivered to a process waiting synchronously for it. Refer to Common Event Notification in Chapter 4 for more information about common events.

A list of free process interval timers is defined at system generation (sysgen) time. Realtime processes can allocate interval timers from this list during process initialization or during normal execution. Sysgen parameters allow the timer mechanism to be customized for a particular application, for example to vary the number of timers available for allocation by processes.
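A fixed pool sized at sysgen time behaves like a free list. In this hypothetical Python model, TimerPool stands in for the kernel's list of free timers, get and release mirror the roles of gettimerid and reltimerid, and the check that only the owning process may release a timer is an assumption added for illustration.

```python
class TimerPool:
    """Fixed pool of interval-timer slots sized at 'sysgen' time."""

    def __init__(self, ntimers: int):
        self._free = list(range(ntimers - 1, -1, -1))  # stack of free ids
        self._owner = {}                               # timer id -> owning pid

    def get(self, pid: int) -> int:
        """Allocate a timer id (role of gettimerid); fail if none are free."""
        if not self._free:
            raise RuntimeError("no free process interval timers")
        tid = self._free.pop()
        self._owner[tid] = pid
        return tid

    def release(self, pid: int, tid: int) -> None:
        """Return a timer id to the free pool (role of reltimerid)."""
        if self._owner.get(tid) != pid:
            raise PermissionError("timer not owned by this process")
        del self._owner[tid]
        self._free.append(tid)
```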

The REAL/IX Operating System offers process interval timers based on the POSIX 1003.1b document. A process may use as many of these timers as it wishes, subject only to configuration limits. The system calls used are:

  • gettimerid(2) gets a process interval timer identifier for subsequent use
  • reltimerid(2) releases a process interval timer identifier
  • incinterval(2) sets a process interval timer running to expire relative to the current time; also used to cancel a timer
  • absinterval(2) sets a process interval timer running to expire at a specified absolute time; also used to cancel a timer
  • resinc(2) returns the resolution details of the incinterval function
  • resabs(2) returns the resolution details of the absinterval function

To use a process interval timer, a realtime process does the following:

  1. Use evget(2) to obtain an event identifier to be used when a timer expires. Parameters to evget allow flexibility in the method of notification.
  2. Issue a gettimerid(2) system call, quoting the previously obtained event identifier, to obtain access to a process interval timer. gettimerid gets a unique timer identifier from the free pool of process interval timers.
  3. Set a timer expiration value and activate the process interval timer.
      • To set the value to an absolute time, use the absinterval(2) system call.
      • To set the value to a time relative to the current system time, use the incinterval(2) system call.
      • Both incinterval(2) and absinterval(2) allow for an optional periodic repeat.
  4. When the timer expires, the appropriate notification method is used. The process takes whatever action is necessary.
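The four steps can be traced in a toy, tick-driven simulation. All method names mirror the REAL/IX calls only for readability: the signatures, the integer tick clock, and the counting of event deliveries are hypothetical and do not reproduce the real system-call interfaces or notification methods.

```python
from dataclasses import dataclass, field


@dataclass
class SimTimerSystem:
    """Toy tick-driven model of evget -> gettimerid -> inc/absinterval."""
    now: int = 0                                # current time, in ticks
    events: dict = field(default_factory=dict)  # event id -> deliveries so far
    timers: dict = field(default_factory=dict)  # timer id -> [event, expiry, period]

    def evget(self) -> int:
        """Step 1: obtain an event identifier for notifications."""
        eid = len(self.events)
        self.events[eid] = 0
        return eid

    def gettimerid(self, eid: int) -> int:
        """Step 2: obtain a timer bound to that event (initially disarmed)."""
        tid = len(self.timers)
        self.timers[tid] = [eid, None, 0]
        return tid

    def incinterval(self, tid: int, delta: int, period: int = 0) -> None:
        """Step 3 (relative): first expiry delta ticks from now."""
        self.timers[tid][1:] = [self.now + delta, period]

    def absinterval(self, tid: int, when: int, period: int = 0) -> None:
        """Step 3 (absolute): first expiry at tick `when`."""
        self.timers[tid][1:] = [when, period]

    def tick(self) -> None:
        """Step 4: advance the clock and notify on every expired timer."""
        self.now += 1
        for t in self.timers.values():
            eid, expiry, period = t
            if expiry is not None and expiry <= self.now:
                self.events[eid] += 1                       # deliver the event
                t[1] = expiry + period if period else None  # re-arm or disarm
```

For example, a timer armed with incinterval(tid, 3, period=2) delivers its event at ticks 3, 5, 7, and so on.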

 

Implementation Details

A common mechanism is used for all time services that are driven from the system clock. These services include:

  • process interval timers (refer to gettimerid(2)).
  • Berkeley style interval timers (refer to setitimer(2)).
  • high resolution sleeps (refer to nanosleep(2)).
  • internal kernel services such as delay(D3X) and timeout(D3X).

In all cases, a control block describing the operation is placed on an appropriate queue. At each clock interrupt the interrupt handler examines all queues that may contain a block which requires servicing. If the clock interrupt handler determines that further processing is necessary it wakes up a system daemon to do the work. This style limits the amount of processing required in the interrupt handler.
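The division of labor between the clock interrupt handler and a timer daemon can be sketched with a priority queue of control blocks. This Python sketch assumes each queue is a heap ordered by expiry tick, so the check made at interrupt level only inspects the earliest entry; the class and method names are hypothetical.

```python
import heapq
import itertools


class TimeoutQueue:
    """Control blocks ordered by expiry tick (hypothetical model)."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()   # tie-breaker keeps FIFO order

    def add(self, expiry_tick: int, callback) -> None:
        """Place a control block describing the operation on the queue."""
        heapq.heappush(self._heap, (expiry_tick, next(self._seq), callback))

    def clock_interrupt(self, now: int) -> bool:
        """Cheap interrupt-level check: does the daemon need waking?"""
        return bool(self._heap) and self._heap[0][0] <= now

    def daemon_run(self, now: int) -> int:
        """Daemon-level work: run every expired callback; return how many."""
        ran = 0
        while self._heap and self._heap[0][0] <= now:
            _, _, callback = heapq.heappop(self._heap)
            callback()
            ran += 1
        return ran
```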

The clock interrupt handler also provides statistics and profiling information. These operations are performed at a fixed rate, so if the system clock is configured to deliver four times as many interrupts as normal, statistics will only be collected on every fourth tick. In a multiprocessor system, each clock tick will result in an interrupt to a single CPU. On every tick where statistics gathering is required the clock interrupt handler will cause interrupt handling code on the other CPUs to run. These "slave" clock interrupt handlers will perform statistics gathering for the processes running on their respective CPUs.
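Which ticks gather statistics follows from the ratio between the configured clock rate and the normal rate. This Python sketch assumes the normal rate is the typical HZ of 60 and that the configured rate is an exact multiple of it; collects_stats and cpus_to_interrupt are hypothetical helpers, and CPU 0 is assumed to be the CPU that receives the clock interrupt.

```python
BASE_HZ = 60  # assumed "normal" clock rate (the typical HZ value)


def collects_stats(tick: int, hz: int, base_hz: int = BASE_HZ) -> bool:
    """True on the ticks where statistics and profiling data are gathered."""
    if hz % base_hz:
        raise ValueError("sketch assumes hz is a multiple of base_hz")
    return tick % (hz // base_hz) == 0   # every 4th tick when hz = 4 * base_hz


def cpus_to_interrupt(tick: int, hz: int, ncpus: int, master: int = 0) -> list[int]:
    """CPUs whose "slave" clock handler must run on this tick."""
    if not collects_stats(tick, hz):
        return []
    return [cpu for cpu in range(ncpus) if cpu != master]
```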

There are two timer daemons, one each for high priority and low priority processes. The dividing point between high and low priority processes is a tunable parameter, as are the priorities of the two daemons. Normally the high priority timeout daemon should run at a priority higher than that of any other process that may require timer services. To prevent this high priority daemon from having to do a potentially large amount of work on behalf of low priority processes, such work is relegated to the other timeout daemon, which has a comparatively low priority.
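Routing expired timer work to one of the two daemons amounts to comparing each owner's priority with the tunable dividing point. In this Python sketch, PRIORITY_SPLIT and its value are hypothetical, and smaller numbers are assumed to mean higher priority, consistent with 255 being the lowest running priority.

```python
from dataclasses import dataclass

PRIORITY_SPLIT = 128  # hypothetical tunable dividing point (value is illustrative)


@dataclass
class TimerRequest:
    owner: str
    owner_priority: int  # assumed: smaller number = higher priority, 255 lowest


def route(expired: list[TimerRequest], split: int = PRIORITY_SPLIT) -> tuple[list[str], list[str]]:
    """Split expired timer work between the high and low priority daemons."""
    high, low = [], []
    for req in expired:
        (high if req.owner_priority < split else low).append(req.owner)
    return high, low
```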



 


Copyright © 2001 MODCOMP, Inc. | Rendered Sept. 28, 2001

MODCOMP is a subsidiary of CSP Inc