Chapter 1
Introduction

1.1 Motivation

Complementary advances in storage, processing power, network bandwidth, and data compression techniques have enabled computers to run new kinds of applications, and to run combinations of applications that were previously infeasible. For example, a modern personal computer can simultaneously decode and display a high-quality video stream, encode an audio stream in real time, and accurately recognize continuous speech; any one of these would have been impossible on an inexpensive machine just a few years ago. Also, market pressure is encouraging vendors to migrate functionality previously performed in dedicated hardware onto the main processor; this includes real-time tasks such as sound mixing [39] and modem signal processing [41]. Furthermore, home and office networks, both wired and wireless, are making personal computers into attractive storage and processing servers for resource-limited networked devices such as stereos, digital still and video cameras, and personal digital assistants. In his keynote speech at COMDEX in January 2001 [7], Intel CEO Craig Barrett said:

We have architected the latest generation of our microprocessor, the Pentium 4 processor, specifically for this. It was architected not to run [Microsoft] Word faster ... We did it to handle rich multimedia information. Whether it is for voice recognition, or animation or for gaming. Whether it is for showing video or capturing video or images.

Of course, powerful hardware alone is not enough--to reliably run combinations of real-time applications, an operating system must effectively manage system resources such as processor time, storage bandwidth, and network bandwidth. Providing each resource to each task at an appropriate rate and granularity is not easy: allocating at too high a rate or too fine a granularity is inefficient, while allocating at too low a rate or too coarse a granularity may reduce the value provided by applications. Scheduling is particularly difficult when the demand for one or more resources exceeds the supply--a situation that is all too common.

This dissertation focuses on the effective management of processor time, which is an important factor in overall system performance [66]. Traditional general-purpose operating systems (GPOSs) lack the flexibility required to support diverse workloads that include multimedia and other soft real-time applications. They provide a single scheduling policy designed to support interactive and batch applications; consequently, they can offer application developers and users no meaningful guarantees about the level of service that applications will receive, and no isolation between threads, applications, users, or other entities in the system. Furthermore, general-purpose operating systems give users only very coarse controls for selectively allocating processor time to different applications, and they discourage potential implementors of innovative scheduling algorithms because their schedulers are interwoven with other operating system internals, and are therefore difficult to understand and modify.

1.2 The Thesis

The thesis that this dissertation supports is:

Extending a general-purpose operating system with general, heterogeneous hierarchical CPU scheduling is feasible and useful.

A hierarchy of schedulers generalizes the role of CPU schedulers by allowing them to schedule other schedulers, as well as scheduling threads. A general, heterogeneous scheduling hierarchy is one that allows arbitrary (or nearly arbitrary) scheduling algorithms throughout the hierarchy. In contrast, most of the previous work on hierarchical scheduling has imposed restrictions on the schedulers used in part or all of the hierarchy. This dissertation describes the Hierarchical Loadable Scheduler (HLS) architecture, which permits schedulers to be dynamically loaded into the kernel of a general-purpose operating system.

The feasibility of general, heterogeneous hierarchical scheduling is demonstrated by (1) the design, implementation, and performance evaluation of the hierarchical scheduler infrastructure (HSI) in combination with several loadable schedulers and (2) the design of a system of guarantees for reasoning about the ongoing allocation of CPU time to soft real-time threads. The most important characteristics of HLS, and the ones that distinguish it from all previous work, are (1) that general, heterogeneous hierarchical scheduling can be efficiently supported by the kernel of a general-purpose operating system and (2) that the scheduling behavior of a general, heterogeneous hierarchy of soft real-time schedulers can be reasoned about in order to provide guaranteed scheduling behavior to application threads.

The usefulness of HLS is supported by showing that many deficiencies of the schedulers in general-purpose operating systems can be solved in a flexible way using a dynamically loaded hierarchy of schedulers. In particular, applications with diverse requirements can be supported by loading schedulers or combinations of schedulers that provide appropriate and, in some cases, guaranteed scheduling behavior. Guarantees provide the developers of real-time applications with a model of CPU allocation to which they can program. Guarantees also benefit end users by providing a mechanism for ensuring that the scheduling requirements of an important application will be met for the duration of the application’s execution, or at least until the user wants the guarantee to end. Finally, the scheduling hierarchy supports multi-level isolation between threads, applications, and other entities.

The advantage of a flexible scheduling hierarchy over a monolithic, or fixed, scheduling policy is that it allows the complexity of the scheduler to be tailored to different situations. For example, a single-user machine that runs a few undemanding multimedia applications can use a very simple scheduling hierarchy--the user is not forced to pay the administrative and run-time costs associated with complex scheduling behavior. Similarly, a powerful machine that must isolate the CPU usage of different users from each other while supporting multiple real-time applications can employ a much more sophisticated scheduling hierarchy--in this case users can expect their real-time applications to work (if the machine has sufficient capacity to run them) and they will not be forced to deal with unfairness issues that can result from lack of isolation. In short, we assert that there is no good “one size fits all” scheduler for general-purpose operating systems.

Additional benefits of HLS include a well-defined programming interface that can reduce the effort associated with implementing new schedulers. This is accomplished by providing loadable schedulers with the notifications about operating system events that they require in order to make scheduling decisions, while abstracting away operating-system-specific details that do not concern the scheduler. Finally, a rule-based resource manager can be used in conjunction with the scheduling hierarchy to map application threads to appropriate schedulers based on their requirements, and to enforce high-level user- and administrator-supplied policies about the allocation of processor time.

1.3 Reasons to Use and Understand Hierarchical Scheduling

The flexibility enabled by hierarchical CPU scheduling has a number of advantages:

Even in situations where it is undesirable to place an explicit scheduling hierarchy in an operating system kernel, it is useful to understand the properties of hierarchical schedulers because there are many real-world cases of implicit hierarchical scheduling. For example, the kernel thread scheduler in a general-purpose operating system, in combination with the scheduler for a user-level thread package, the scheduler for threads in a Java virtual machine, or the kernel thread scheduler for an operating system being run in a virtual machine such as VMware, forms a hierarchical scheduler. The kernel scheduler itself in a general-purpose operating system can be viewed as a hierarchical scheduler: a fixed-priority scheduler implemented in hardware schedules interrupts at the highest overall priority, a (usually) FIFO scheduler runs low-level kernel tasks at the middle priority, and application threads are run at the lowest priority using what is usually thought of as “the scheduler.” Furthermore, when the co-resident operating system approach to real-time is used [1193], the kernel scheduler for a general-purpose OS is no longer the root scheduler; this privilege is taken over by a small hard-real-time OS.

1.4 Overview of HLS

Conventional time-sharing scheduling algorithms are designed with a set of tradeoffs in mind; applications running under these schedulers are forced to cope with these tradeoffs. Since general-purpose operating systems are used in diverse ways, sometimes unanticipated by the designers of the OS, it is difficult to select, in advance, the right scheduler for an operating system. A premise of HLS is that scheduling algorithms should not be chosen in advance by operating system designers. Rather, scheduling policies and combinations of scheduling policies implementing desired behaviors should be dynamically loaded into the operating system to meet the scheduling requirements of a particular usage scenario.

1.4.1 Systems Architecture

Figure 1.1 depicts the HLS architecture. The scheduling hierarchy is implemented in the kernel of a general-purpose operating system, where it is supported by the hierarchical scheduler infrastructure. There is a root scheduler at the top of the hierarchy that controls the allocation of all processor time. It grants the CPU to its children, which are below it in the hierarchy. The children may further subdivide processor time to their children, until a leaf of the hierarchy is reached; leaves always represent threads.


Figure 1.1: High-level overview of the HLS architecture

In Figure 1.1, FP, a fixed-priority scheduler, is the root, and RES and TS are its children. RES is a real-time scheduler and TS is a time-sharing scheduler. FP runs RES at a higher priority than TS--this means that any time RES is able to run, it is guaranteed to have access to a processor, allowing it to make real-time guarantees. TS, in contrast, cannot always get access to a processor: since it is scheduled at a lower priority than RES, it simply does not get to schedule any threads while RES is running. RES has one thread to schedule, T1, while TS is scheduling two threads, T2 and T3.

A request for a certain type of scheduling behavior is made by contacting the scheduling hierarchy using the HLSCtl command. Applications may make requests on their own, or a middleware resource manager may make requests on their behalf. If a resource manager is being used, it can also ensure that a particular request does not violate any user- or administrator-supplied rules about the allocation of CPU time.

1.4.2 Architecture over Time

It is important to understand the three distinct time scales over which the scheduling hierarchy interacts with applications and users.

Long: The longest-term decision that needs to be made to use HLS concerns the way a particular computer will be used--this will determine the initial shape of the scheduling hierarchy. For example, basic time-sharing usage of a personal computer requires only a single scheduler. On the other hand, providing load isolation between users while supporting multiple real-time applications with different kinds of requirements may necessitate a complex hierarchy. For a given initial hierarchy, variations in usage can be accommodated by adding, deleting, or changing parts of the hierarchy. For example, a user might at first require only time-sharing behavior, but later want to isolate the CPU usage of an untrusted application. This can be accomplished by creating a new scheduler that is a child of the main time-sharing scheduler to run threads belonging to the untrusted application. The scheduler composition logic described in Chapter 5 shows how hierarchical collections of schedulers can be proven to provide correct real-time behavior.

Medium: At the medium time scale the scheduling hierarchy remains constant while applications are instantiated, exit, and change their requirements. Decisions at this level are expected to last for seconds or minutes, and possibly for much longer. New applications are initially assigned to a default scheduler; they may then request other kinds of scheduling, or other kinds of scheduling may be requested on their behalf. The decision of whether a particular scheduler can grant a particular request for scheduling is made by the scheduler itself, using a schedulability analysis routine. For example, a voice recognition application could request an ongoing share of 10% of the CPU by sending a message to the appropriate scheduler. The return value of the message would indicate either that yes, the scheduler has enough spare capacity to reserve 10% of the processor, or that no, it does not, in which case the user could manually reduce system load or try to run the application later.

Short: At the short time scale, the scheduling hierarchy and the set of applications and their requirements remain constant. This is the level at which individual scheduling decisions are made by schedulers in the hierarchy, on a granularity of milliseconds or tens of milliseconds. For example, at this level a scheduler that has guaranteed a real-time thread at least 3 ms of CPU time during every 50 ms interval would use a timer to have the Hierarchical Scheduler Infrastructure send it a notification every 50 ms so that it could schedule the thread.

These time scales are intended to be descriptive of the ways that computers running general-purpose operating systems are often used, rather than being prescriptive, or telling users how to use HLS. For example, there is nothing stopping an application from requesting a different guarantee ten times per millisecond, and in fact, the entire scheduling hierarchy on a machine can be reconfigured quite rapidly. Still, it is expected that the core scheduling hierarchy on a machine will remain stable for a considerable length of time, often at least for the length of time between reboots. This reflects the basic insight that most computers are purchased and used for some purpose, even if that purpose is a very general one like “running interactive and multimedia applications.”

General-purpose operating systems with traditional time-sharing schedulers lack flexibility over long time scales since they have a single, fixed scheduling policy. They have some flexibility at medium time scales, but in most cases it is limited to changing the priorities of threads and limiting threads to run on a subset of the processors on a multiprocessor machine.

1.5 Scope of the Dissertation

Multimedia scheduling is a broad topic. In order to limit the scope of this dissertation, a number of assumptions have been made. This section lists them and provides a brief discussion and justification of each.

1.6 Contributions

The contributions of this dissertation fall into five categories.

Programming models: The programming model is identified as an important aspect of supporting multimedia applications in general-purpose operating systems. A taxonomy of multimedia programming models is presented: the first-order distinction made by the taxonomy is between the scheduling algorithms used to multiplex processor time on short time scales (milliseconds or tens of milliseconds) and the high-level mode-change protocols used to allocate CPU time over longer time periods (seconds, minutes, or longer).

Scheduler composition: The underlying unity of the scheduling behaviors provided by a broad class of multimedia scheduling algorithms is exploited in order to develop a novel system of formal guarantees about the way that schedulers allocate CPU time. This system is useful because: it separates abstract scheduling behaviors from the algorithms that provide them, it shows which guarantees are equivalent to which other guarantees, and it allows the scheduling behavior of a hierarchy of schedulers to be reasoned about. Also, a new result about the schedulability of a task set using a rate monotonic scheduler that is given a CPU reservation is presented. Finally, it is demonstrated that a number of complex, idiomatic scheduling behaviors that have been described in the literature can be implemented using HLS schedulers as basic components.

Hierarchical scheduler infrastructure: The design, implementation, and performance evaluation of an architecture supporting a hierarchy of schedulers in the kernel of a multiprocessor operating system is presented. The scheduler infrastructure is based on a novel extension of the virtual processor abstraction that was developed for scheduler activations [3]. It is also novel in that it is the first system that permits a hierarchy of generic scheduling algorithms to be dynamically loaded into the kernel of a general-purpose operating system. The scheduler infrastructure facilitates the implementation of new schedulers by providing a simplified programming model: it isolates loadable schedulers from OS complexities such as extraneous thread states and a difficult multiprocessor programming model. Operations on the scheduling hierarchy such as creating or destroying a scheduler instance, moving a thread between schedulers, and beginning or ending a CPU reservation can be performed quickly: they all take less than 40 μs on a 500 MHz Pentium III. Finally, although HLS increases the cost of a context switch slightly, we show that the performance penalty that a context switch imparts to a thread in terms of re-establishing its working set in the processor cache can easily be two orders of magnitude greater than the cost added by HLS.

Resource manager: The high-level design (but not detailed design or implementation) of a rule-based, user-level resource manager is presented. The novel feature of the resource manager is that it makes use of reflective information about the scheduling hierarchy, as well as information about users and applications, in order to implement high-level policies about the allocation of CPU time.

Augmented CPU reservations: Stolen time is defined and shown to be a potential obstacle to the predictable execution of real-time applications on general-purpose operating systems. For example, on a 500 MHz Pentium III running Windows 2000 more than 25% of a CPU reservation’s time can be stolen by the network stack. On the same machine running Linux, close to 50% of a CPU reservation’s time can be stolen by the IDE disk driver.

Two novel scheduling algorithms, Rez-C and Rez-FB, that provide augmented CPU reservations were designed, implemented, and evaluated. Augmented CPU reservations provide applications with increased predictability in the presence of stolen time. For example, a test application scheduled by Rez (which does not provide augmented reservations) must over-reserve by almost 25% to avoid missing deadlines when the network stack steals time from it. When scheduled by Rez-C or Rez-FB, 6% over-reservation is sufficient to eliminate nearly all deadline misses.

1.7 Outline of the Dissertation

Chapter 2 provides additional motivation for HLS by examining the relationship between the programming models that are provided by important classes of soft real-time schedulers, the properties of multimedia applications, and the conflicting requirements that must be satisfied by a general-purpose operating system that also supports multimedia applications. Chapter 3 describes three application scenarios that motivate support for flexible scheduling in general-purpose operating systems. Chapter 4 presents the design of the Hierarchical Scheduler Infrastructure (HSI)--the in-kernel component of HLS that supports loadable schedulers. In Chapter 5 the guarantee system for verifying the composability of hierarchical multimedia schedulers is presented. Chapter 6 continues the subject of scheduler composition, and includes a section showing that complex and useful scheduling behaviors can be composed using simple schedulers as components. Solutions to the scheduling challenges posed by the application scenarios in Chapter 3 are presented in Chapter 7. The design of the HLS resource manager is presented in Chapter 8. Chapter 9 describes the issues involved with implementing the HSI in the Windows 2000 kernel. Chapter 10 presents data about the performance of the prototype implementation. Chapter 11 describes and evaluates two schedulers that can increase predictability for applications executing on general-purpose operating systems in the presence of stolen time. Chapter 12 compares elements of the HLS architecture with related work that has appeared in the literature. Conclusions and future work are presented in Chapter 13. Finally, Appendix A contains an alphabetical list of technical terminology used in this dissertation, Appendix B describes the programming interface for loadable schedulers, and Appendix C contains source code for a loadable scheduler.