Chapter 5 Composing Scheduling Policies

Chapter 5
Composing Scheduling Policies

A thread, as a leaf node of the scheduling hierarchy, can only be scheduled when each scheduler between it and the root of the hierarchy schedules it at the same time. So, a key question is “can a given scheduling hierarchy provide threads with the guarantees that they need in order to meet their deadlines?” Clearly, some arrangements of schedulers are flawed. For example, suppose that a hierarchy includes a real-time scheduler that is scheduled using a time-sharing scheduler. Since the time-sharing scheduler makes no particular guarantees to entities that it schedules, the real-time scheduler cannot predict when it will receive CPU time, and therefore it cannot promise threads that it schedules that they will be able to meet their deadlines.

This chapter develops a system of guarantees about the ongoing allocation of processor time. Guarantees formalize the scheduling properties provided by a large class of multimedia schedulers, and allow these properties to be reasoned about. By matching the guarantee that one scheduler provides with the guarantee that another scheduler requires, and by using conversion rules that allow a guarantee to be rewritten as a different guarantee, complete scheduling hierarchies can be constructed with the assurance that the leaves of the hierarchy will actually receive CPU time according to the distribution they were promised.

5.1 Guarantees

5.1.1 Definition and Properties

Guarantees are the basic abstraction for reasoning about the ongoing allocation of processor cycles to real-time schedulers and threads. A guarantee is a contract between a scheduler and a virtual processor (VP) regarding the distribution of CPU time that the VP will receive for as long as the guarantee remains in force. The meaning of a particular guarantee is defined in two complementary ways:

It is equivalent to a formal statement about the allocation of processor time that the guarantee promises. For example, a real-time thread might be guaranteed that during any time interval y units long, it will be scheduled for at least x time units.
It is defined as the distribution of CPU time produced by a particular implementation of a scheduler.

Both aspects of a guarantee are important: the formal statement is used to reason about scheduler composition, and the correspondence with a scheduler implementation is used to provide a thread with the kind of scheduling service that it requested or was assigned. Establishing a correspondence between the two parts of the definition--proving that a particular scheduler implementation provides some scheduling behavior--is not within the scope of this dissertation.

The distinguishing characteristics of guarantees are that they describe bounds on the ongoing allocation of CPU time, and that they are independent of any particular scheduling algorithm. The primary advantages that this independence confers are:

Guarantees abstract the behavior provided by a scheduler from the algorithm itself, permitting an application (or an entity acting on an application’s behalf) to request scheduling based on its requirements, rather than requesting scheduling from a particular scheduler.
Guarantees provide a model of CPU allocation to which developers can program. In contrast, many of the multimedia schedulers that have been proposed provide no guarantees, requiring programmers to explicitly take into account scenarios in which application threads receive amounts of CPU time other than their full requirements.
Guarantees provide a mechanism that allows users to ensure that the requirements of important applications are always met.
Schedulability analysis in a hierarchical scheduling system using guarantees can be performed using only local knowledge. In other words, each scheduler can determine whether or not it can provide a new guarantee based only on knowledge of the guarantee that it receives and the guarantees that it currently provides, rather than having to perform a global calculation over all schedulers in the hierarchy.

5.1.2 Composing Schedulers using Guarantees

From the point of view of the guarantee system, the purpose of the scheduling hierarchy is to convert the guarantee representing 100% of the CPU (or the set of guarantees representing 100% of multiple CPUs) into the set of guarantees required by users, applications, and other resource consumers.

Each scheduler written for HLS is characterized by one or more mappings from an acceptable guarantee to a set of provided guarantees. Any guarantee that can be converted into a guarantee that is acceptable to a scheduler is also acceptable. For example, the start-time fair queuing (SFQ) scheduler can accept a proportional share guarantee, in which case it can provide proportional share guarantees to its children. It can also accept a proportional share with bounded error guarantee, in which cases it can provide that guarantee to its children. The guarantee representing 100% of the CPU is acceptable to the SFQ scheduler because it can be trivially converted into either kind of proportional share guarantee.

A hierarchy of schedulers and threads composes correctly if and only if (1) each scheduler in the hierarchy receives a guarantee that is acceptable to it and (2) each application thread receives a guarantee that is acceptable to it. The set of guarantees that is acceptable to a scheduler is an inherent property of that scheduling algorithm. The set of guarantees that is acceptable to an application depends partially on inherent characteristics of the application, and partially on other factors, such as the application’s importance.

The overall correctness of a scheduling hierarchy is established by starting at the root and working towards the children. If the scheduler at the root of a hierarchy receives the guarantee it was promised, and if each scheduler in the hierarchy correctly provides the guarantees that they have agreed to provide, then it is the case that all application threads will also receive the guarantees that they were promised.

The primary goal of the work on guarantees and scheduler composition is to ensure that each thread’s guarantees can be met in a system using hierarchical scheduling. Furthermore, when translating between guarantees, as little processor time as possible should be wasted; in other words, applications’ requirements should be met as efficiently as possible.

Two fundamental insights drive scheduler composition:

The guarantee a scheduler makes to its children can be no stronger than the guarantee that it receives; guarantees must become weaker towards the bottom of the scheduling hierarchy. The guarantee language presented in this chapter formalizes this notion.
Each scheduler must receive a guarantee that is semantically compatible with the guarantee that it makes.

The compositional correctness of a scheduling hierarchy can be established off-line if the hierarchy will remain fixed. Correctness can also be established online by middleware such as the resource manager.

5.1.3 Assumptions

The guarantee system for scheduler composition can ensure that a hierarchy of schedulers and threads is correct in the sense that for each application that has received a guarantee, its scheduling requirements will be met. This correctness is contingent on the following assumptions:

Real-time threads are soft real-time and require ongoing guarantees. Guarantees are assumed to be long-lived entities relative to the millisecond-granularity scheduling decisions made by the scheduling hierarchy. The assumption that threads in multimedia applications require long-lived guarantees, such as a fixed percentage of the CPU bandwidth with a bound on scheduling latency, is a valid one for many kinds of multimedia applications such as the ones surveyed in Section 2.4.
Application requirements are known. In other words, an amount of CPU time that at least meets the requirements of threads in each real-time application is known or can be calculated, either from first principles or by measurement. A procedure for determining application requirements is proposed in Section 8.5.1.
Hierarchical schedulers are implemented correctly. This means that each scheduler meets the requirements outlined in Section 4.3.10, and that when it receives the required guarantee, it correctly provides the guarantees that it has promised to entities that it schedules.
Guarantees apply only while threads are runnable. A thread’s guarantee may not be met over time intervals during which it blocks. Reasons for blocking include waiting to enter a critical section, waiting for disk or network data, and waiting for the expiration of a timer. Threads in well-designed real-time applications can usually avoid blocking for so long that they miss deadlines.
The operating system permits the guarantee to be provided. In a general-purpose OS, device drivers and other kernel code executing in the context of an interrupt, a deferred procedure call, or a spinlock cannot be preempted by threads, and can therefore interfere with a scheduler’s efforts to provide a guarantee. It is simply not possible to give unconditional guarantees to application threads in this kind of operating system unless the kernel and device drivers can be shown not to be non-preemptible for too long or too often. However, in practice, general-purpose operating systems like Linux and Windows 2000 are usually capable of scheduling threads in real-time multimedia applications with deadlines of several milliseconds or more. Chapter 11 discusses circumstances in which this assumption does not hold, and presents some ways to deal with the resulting problems.

5.1.4 Obtaining a Guarantee

So far, guarantees have been treated as abstract properties. In practice, however, guarantees are parameterized. For example, in the abstract a proportional share scheduler can provide a percentage of the CPU to a thread. In practice, a thread requires a specific percentage such as 10%.

To acquire this guarantee, a message is sent to the proportional share scheduler asking for a guarantee of 10% of the CPU. The scheduler uses its schedulability analysis routine to decide whether the guarantee is feasible or not. If it is, the request is accepted, otherwise it is rejected and no guarantee is provided. When a request for a guarantee is rejected, there are several courses of action that may be taken:

System load can be reduced by terminating applications or by giving some applications a smaller guarantee.
The requesting application can attempt to obtain a smaller guarantee.
The guarantee given to the scheduler that rejected the request can be increased by sending a request to that scheduler’s parent.

5.2 Soft Real-Time Guarantees

5.2.1 Guarantee Syntax

Guarantees are represented by identifiers of the following form:

TYPE [params]

Where TYPE represents the name of the kind of guarantee and [params] denotes a list of numeric parameters. The number of parameters is fixed for each kind of guarantee. Guarantee parameters are often in time units; for convenience, numeric parameters in this dissertation will be taken to be integers representing milliseconds unless otherwise noted.

Except in the special case of the uniformly slower processor guarantee, utilizations that appear as guarantee parameters are absolute, rather than relative. For example, if a proportional share scheduler that controls the allocation of 40% of a processor gives equal shares to two children, the guarantees that it provides have the type PS 0.2 rather than PS 0.5. Parameters in guarantees are in absolute units because this allows the meaning of each guarantee to be independent of extraneous factors such as the fraction of a processor controlled by the parent scheduler.

5.2.2 Guarantee Types

This section describes the formal properties of types of guarantees provided by multimedia schedulers.

5.2.2.1 Root Scheduler

The scheduler at the top of the hierarchy is given a guarantee of ALL, or 100% of the CPU, by the operating system. This guarantee has no parameters. Although it is tempting to parameterize this guarantee (and others) with a value indicating how much work can be done per time unit on a particular processor, Section 6.3.3.1 will present an argument that this type of parameterization is not practical in general.

5.2.2.2 CPU Reservation

The CPU reservation is a fundamental real-time abstraction that guarantees that a thread will be scheduled for a specific amount of time during each period. Reservations are a good match for threads in real-time applications whose value does not degrade gracefully if they receive less processing time than they require. A wide variety of scheduling algorithms can be used to implement CPU reservations, and many different kinds of reservations are possible. The guarantee types for CPU reservations are:

Basic, hard: RESBH x y.
Basic, soft: RESBS x y.
Continuous, hard: RESCH x y.
Continuous, soft: RESCS x y.
Probabilistic, soft: RESPS x y z, where z is the size of the overrun partition that probabilistic guarantees have access to (see Section 6.2.2 for more details); probabilistic reservations are always soft.
Non-preemptive, hard: RESNH x y (non-preemptive reservations are implicitly continuous since their CPU time is allocated as a single block that is always at the same offset within a period).
Synchronized, hard: RESSH x y z, where z is the time that the reservation is synchronized to (synchronized reservations are implicitly continuous).

The following list describes the properties of these kinds of CPU reservations. In particular, note that every CPU reservation is either basic or continuous, and either hard or soft, and that these properties are orthogonal.

Basic CPU reservations are what would be provided by an EDF or rate monotonic scheduler that limits the execution time of each thread that it schedules using a budget. Section 9.3.2 describes Rez, a basic reservation scheduler for HLS. For a reservation with amount x and period y, a basic reservation makes a guarantee to a virtual processor that there exists a time t such that for every integer i the VP will receive x units of CPU time during the time interval [t + iy, t + (i + 1)y]. In other words, the reservation scheduler divides time into period-sized intervals, during each of which it guarantees the VP to receive the reserved amount of CPU time. The value of t is chosen by the scheduler and is not made available to the application.
Continuous CPU reservations (as defined by Jones et al. [40]) are those that make the following guarantee: given a reservation with amount x and period y, for any time t, the thread will be scheduled for x time units during the time interval [t, t + y]. A continuous CPU reservation is a stronger guarantee than a basic CPU reservation since every period-sized time interval will contain the reserved amount of CPU time, rather than only certain scheduler-chosen intervals. In other words, a continuous reservation scheduler is not free to arbitrarily rearrange CPU time within a period. Continuous reservations are provided by schedulers that utilize a (possibly dynamically computed) static schedule, such as Rialto [40] and Rialto/NT [37], where a thread with a reservation receives its CPU time at the same offset during every period. In contrast, a basic reservation scheduler such as Rez retains the freedom to schedule a task during any time interval (or combination of shorter intervals) x units long within each time interval y units long. Figure 5.1 depicts two CPU reservations, one basic and one continuous, that are guaranteed to receive 3 ms of CPU time out of every 8 ms. The continuous CPU reservation is also a basic reservation, but the basic reservation is not a continuous reservation.

Figure 5.1: Segment of a time-line showing when the CPU is allocated to a basic and a continuous reservation
Hard reservations limit the CPU usage of a virtual processor to at most the reserved amount of time, as well as guaranteeing that it will be able to run at at least that rate and granularity. Hard reservations are useful for applications that cannot opportunistically take advantage of extra CPU time; for example, those that display video frames at a particular rate. They are also useful for limiting the CPU usage of applications that were written to use the full CPU bandwidth provided by a processor slower than the one on which they are currently running. For example, older CPU-bound games and voice recognition software have been run successfully on fast machines by limiting their utilization to a fraction of the full processor bandwidth.
Soft reservations may receive extra CPU time on a best-effort basis. They are useful for applications that can use extra CPU time to provide added value. However, no extra time is guaranteed.
Probabilistic reservations (called CPU service classes by Chu and Nahrstedt [16]), are a special case of soft CPU reservations that guarantee a thread to receive a specified minimum execution rate and granularity as well as having a chance of obtaining extra CPU time. Probabilistic reservations are designed to give applications with highly variable execution times a high probability of meeting their deadlines without actually giving them a guarantee based on their worst-case execution times.
Non-preemptive reservations guarantee that the amount of reserved time will be provided to the virtual processor in a single block, at the same offset in each period. In other words, for a non-preemptive reservation with amount x and period y there exists a time t such that for every integer i, the thread will be scheduled for the time interval [t + iy, t + iy + x]. This guarantee is useful for threads in applications that must interact with sensitive hardware devices requiring the undivided attention of the CPU. Although any reservation provided by the main thread scheduler in a GPOS is always preemptible by hardware and software interrupts, these events are generally very short compared to time quanta for application threads.
Synchronized reservations are a special case of non-preemptive reservations that start at a specific time. Synchronized reservations permit the requesting thread to specify a time t such that for every integer i, the thread will be scheduled for the time interval [t + iy, t + iy + x]. Synchronized reservations are useful to synchronize a reservation to a periodic external event (such as a video card refresh--this is a real, and difficult problem), or to implement schedulers that coordinate thread execution on different processors of a multiprocessor, or on different machines.

5.2.2.3 Uniformly Slower Processors

A uniformly slower processor (USP) is a special kind of CPU reservation that was described in Section 2.3.1.4 . A USP guarantee has the form RESU r, where r is the fraction of the overall CPU bandwidth allocated to the USP. The granularity over which the reserved fraction of the CPU is to be received is not part of the guarantee, and must be specified dynamically. The guarantee provided by a uniformly slower processor is as follows. Given a virtual processor with guarantee RESU r, for any two consecutive deadlines d_n and d_n+1 that the child scheduler at the other end of the VP notifies the USP scheduler of, the VP is guaranteed to receive r(d_n+1 - d_n) units of CPU time between times d_n and d_n+1.

5.2.2.4 Proportional Share

Proportional share (PS) schedulers are quantum-based approximations of fair schedulers. Some PS schedulers can guarantee that during any time interval of length t, a thread with a share s of the total processor bandwidth will receive at least st - units of processor time, where is an error term that depends on the particular scheduling algorithm. This guarantee is called “proportional share bounded error” and has the type PSBE s . For the earliest eligible virtual deadline first (EEVDF) scheduler [34], is the length of the scheduling quantum. For the start-time fair queuing (SFQ) scheduler [28], is more complicated to compute: it is a function of the quantum size, the number of threads being scheduled by the SFQ scheduler, and the share of a particular thread. Since it is particularly well suited to hierarchical scheduling, we will return to the SFQ scheduler in Sections 5.3.1.4 and 5.3.3 .

Some proportional share schedulers, such as the lottery scheduler [90], provide no deterministic performance bound to threads that they schedule. The guarantee given by this kind of PS scheduler is PS s, where s is the asymptotic share that the thread is guaranteed to receive over an unspecified period of time.

5.2.2.5 Time Sharing

Since time-sharing schedulers make no particular guarantee to threads they schedule, they make the NULL guarantee, indicating strictly best-effort scheduling. Time-sharing schedulers usually attempt (but do not really guarantee) to avoid completely starving threads because starvation has been shown to lead to unbounded priority inversion when threads are starved while holding shared resources. Priority inversion is said to occur when a low-priority thread holds a resource, preventing a high-priority thread from running.

Most multilevel feedback queue time-sharing schedulers avoid starving threads because they are actually weak proportional share schedulers. The Windows 2000 scheduler (which is not proportional share) avoids starvation by increasing a thread’s priority to the maximum time-sharing priority if it is runnable but does not get to run for more than about three seconds. Both of these schedulers attempt to ensure that threads will not be starved as long as there are no high-priority “real-time” threads.

Time-sharing (TS) schedulers are designed to avoid starving threads that they schedule on a best-effort basis. However, TS schedulers will be unable to prevent starvation if the schedulers themselves are starved. It is therefore desirable to ensure that each time-sharing scheduler in a system receives a real (instead of best-effort) scheduling guarantee. This guarantee will ensure that (1) priority inversions caused by time-sharing threads being starved while holding resources do not occur, and (2) that the windowing system and associated administrative applications will remain responsive to user input regardless of the behavior of real-time applications in the system.

5.2.3 Completeness of the Set of Guarantees

The set of guarantees presented in this chapter is by no means complete. Rather, it covers an interesting and viable set of guarantees made by multimedia schedulers that have been presented in the literature and that, together, can be used to meet the scheduling needs of many kinds of multimedia applications.

An additional guarantee type based on the time constraint abstraction provided by Rialto or Rialto/NT could be defined--time constraints guarantee a certain amount of CPU time before an application-specified deadline. However, this guarantee would be a dead-end with respect to conversion to other kinds of guarantees using schedulers and rewrite rules. Furthermore, the one-shot constraints violate our assumption that guarantees are long-lived entities.

An interesting guarantee would be one based on Mok and Chen’s multiframe tasks [63]. Multiframe tasks exploit patterns in applications’ CPU requirements in order to avoid making reservations based on worst-case requirements. This would be useful for applications such as the MPEG decoder, which exhibit large but somewhat predictable fluctuations in requirements between frames [8].

5.3 Converting Between Guarantees

Recall that from the point of view of guarantees, the purpose of the scheduling hierarchy is to convert the ALL guarantee(s) into the set of guarantees required by users, applications, and other resource consumers. The remainder of this chapter describes the two ways that guarantees can be converted into other guarantees. First, each scheduler in the hierarchy requires a guarantee, and provides guarantees to other schedulable entities through virtual processors. Second, guarantee rewrite rules can be used to convert guarantees without using a scheduler to perform the conversion.

5.3.1 Converting Guarantees Using Schedulers

Table 5.1 shows a number of schedulers and what guarantee(s) they can provide. The top six schedulers have been implemented in HLS; the remaining schedulers have been described in the literature. The table is to be interpreted as follows:

A B+ means that a scheduler can convert a guarantee with type A into multiple guarantees of type B. A is the weakest guarantee that is acceptable to the scheduler when used to provide guarantees of type B. Implicitly, any guarantee that can be converted into guarantee A using one of the conversions in Section 5.3.2 is also acceptable.
The identifier “any” indicates a variable that may be bound to any guarantee. So, the fixed priority scheduler passes whatever guarantee it is given to its highest-priority virtual processor while providing multiple NULL guarantees to lower-priority VPs.
Whenever a scheduler makes use of the RESU guarantee, amounts of CPU time in the guarantees that it provides must be interpreted as being in the domain of the uniformly slower processor. For example, if a reservation scheduler that receives a guarantee of RESU 0.5 provides a reservation of RESBS 10 20, the thread that receives that reservation will only have 25% of the total CPU bandwidth available to it, because 0.5(10/20) = 0.25.

Sections 5.3.1.1 through 5.3.1.7 discuss and justify the guarantee conversions listed in Table 5.1.


Scheduler	Conversions

Fixed Priority	any (any, NULL+)
Join	see below
Limit	RESBS RESBH
PS	PS PS+, PSBE PSBE+, RESU PSBE+, RESCH PSBE+
Rez	ALL RESBH+, RESU RESBH+
TS	NULL NULL+

BSS-I, PShED	ALL RESU+, RESU RESU+
BVT	PS PS+, RESU PS+, RESCH PS+
CBS	ALL RESBH+, RESU RESBH+
EEVDF	ALL PSBE+, RESU PSBE+
Linux	NULL NULL+
Lottery	PS PS+, RESU PS+, RESCH PS+
Resource Kernel	ALL (RESBS+, RESBH+), RESU (RESBS+, RESBH+)
Rialto, Rialto/NT	ALL RESCS+, RESU RESCS+
SFQ	PS PS+, PSBE PSBE+, RESU PSBE+, RESCH PSBE+
SFS	PS+ PS+, RESU+ PS+, RESCH+ PS+
SMART	NULL NULL+
Spring	ALL RESBH+, RESU RESBH+
Stride	PS PS+, RESU PS+, RESCH PS+
TBS	ALL RESBS+, RESU RESBS+
Windows 2000	NULL NULL+

Table 5.1:

Guarantees required and provided by common multimedia scheduling algorithms and scheduler implementations. Notation used in this table is explained in Section 5.3.1.

5.3.1.1 Fixed Priority

A preemptive, fixed-priority scheduler that uses admission control to schedule only one virtual processor at each priority gives no guarantee of its own: rather, it passes whatever guarantee it receives to its highest-priority child. All other children receive the NULL guarantee.

This logic can be seen to be correct by observing that no other virtual processor can create scheduling contention for the highest-priority VP when a preemptive fixed-priority scheduler is in use. No guarantee can be given to lower-priority VPs because the one with the highest priority may be CPU-bound.

5.3.1.2 Join

Most uniprocessor schedulers register a single virtual processor with their parent, and multiplex CPU time received from that virtual processor among multiple children. The join scheduler performs the opposite function: it registers multiple virtual processors with its parents and schedules its child VP any time any of the parents allocates a physical processor to it. This allows the scheduling hierarchy to be generalized to a directed acyclic graph.

Join schedulers function as OR gates in the scheduling hierarchy: they schedule their child virtual processor when receiving CPU time from any parent. The synthesis of complex scheduling behaviors from collections of small, simple schedulers often requires the use of join schedulers. For example, as Section 6.2.1 illustrates, a scheduler that provides hard CPU reservations can also be used to provide soft CPU reservations using join schedulers. The purpose of a join scheduler, as in the preceding example, is usually to direct slack time in the schedule to a virtual processor that can make use of it. For this reason, a join scheduler will usually be joining a guarantee such as a CPU reservation with a NULL guarantee (since slack time in the schedule is, by definition, not guaranteed). Even so, the rest of this section will present the exact rules for the guarantees that a join scheduler can provide, which were too complicated to present in Table 5.1 .

An entity scheduled by a join scheduler may pick any one of its parent guarantees to take, with the restriction that if the guarantee that it picks is a hard guarantee, it must be converted into the corresponding soft guarantee. To see that this is correct, notice that a join scheduler cannot give a virtual processor any less CPU time than the VP would have received if it were directly given any of the join scheduler’s parent guarantees. However, the join scheduler may cause a virtual processor to receive additional CPU time, meaning that it cannot give a hard guarantee.

On a uniprocessor machine, guarantees of the same type (or guarantees that can be converted into the same type) may be added together. For example, if a join scheduler receives guarantees of RESBS 2 30 and RESBS 3 30, then the join scheduler can provide a guarantee of RESBS 5 30. Clearly this is more complicated when the reservations have different periods. We do not work out this case in detail since it is difficult to picture this situation coming up in practice--it would be simpler and more efficient to simply have a single scheduler give the thread the guarantee that it needs. Guarantees cannot be added on a multiprocessor machine because the join scheduler may be scheduled by multiple parents at the same time, and it can make use of at most one. To address this case, multiprocessor join schedulers must have a method for deciding from which parent to accept a physical processor. For example, a join scheduler could assign unique numbers to its parents and always release any processors granted from parents other than the highest-numbered parent that grants a processor.

5.3.1.3 Limit

Limit schedulers can be used to convert a soft guarantee into a hard one. A limit scheduler that is given a guarantee of a basic, soft CPU reservation would, like a reservation scheduler, keep track of the amount of CPU time that it has allocated to its (single) child virtual processor during each period. However, when its child is about to receive more than the guaranteed amount, it releases the processor and does not request it again until the start of the next period.

5.3.1.4 Proportional Share

The PS scheduler that was implemented for HLS implements the start-time fair queuing (SFQ) algorithm with a warp extension similar to the one in BVT [20]. When the warp of all virtual processors is zero, it behaves as an SFQ scheduler. Goyal et al. showed that an SFQ scheduler provides fair resource allocation in a hierarchical scheduling environment where it does not receive the full CPU bandwidth. This result was derived in the context of network scheduling [28] and has also been applied to CPU scheduling [27]. Therefore, the conversion PS PS+ is justified. They also showed that when an SFQ scheduler is scheduled by a fluctuation constrained (FC) server, then the entities scheduled by the SFQ scheduler are also FC servers.

We now show that an FC server is the same as the PSBE (proportional share bounded error) guarantee. Following the notation in Lee [47, p. 127], an FC server is characterized by two parameters (s, ). Informally, s is the average share guaranteed to a virtual processor and is the furthest behind the average share it may fall. Let C(t) denote the instantaneous service delivered by a processor. For any times a and b with a < b

integral b a C(t)dt > max(0,s(b -a) -d)

(5.1)

Therefore, the FC server constrains the deviation from the average service rate. This is precisely what the PSBE guarantee does, and consequently an FC server with parameters (s,

) is equivalent to the guarantee PSBE s

. Therefore, an SFQ scheduler that is given a PSBE guarantee can also provide this guarantee to its children. The formula for calculating the parameters of the child FC servers was derived in [28]. Here we present a simplified version of that equation since the original accounts for variable-length scheduling quanta--this is unnecessary when SFQ is used to schedule the CPU, which uses a periodic clock interrupt to provide a uniform quantum size.¹

Let q be the scheduler quantum size and T be the total number of threads being scheduled by a SFQ scheduler, with r_f being the weight (the fraction of the total number of shares) assigned to thread f. Then, if an SFQ scheduler is scheduled by an FC server with parameters (s, ), each of the threads scheduled by the SFQ scheduler is also a FC server with parameters calculated as follows:

(5.2)

The utility of this result will become clear in Section 5.3.2 where we show that it is possible to convert a proportional share bounded error guarantee into a CPU reservation and vice-versa. Table 5.1 shows that proportional share scheduler may make use of the uniformly slower processor guarantee. Despite the fact that PS schedulers do not have any notion of deadlines, this can be accomplished by providing a uniform fraction of the CPU to the PS scheduler during each scheduling quantum.

Theorem 5.1. Any proportional share scheduler can be scheduled correctly by (1) a continuous CPU reservation whose period is equal to the size of the scheduling quantum, or (2) a uniformly slower processor by treating the end of each scheduling quantum as a deadline.

Informal proof. Since the time quantum of a PS scheduler determines its minimum enforceable scheduling granularity, the scheduler will function correctly as long as it receives a guaranteed amount of CPU time during each quantum. This can be accomplished by assigning to the PS scheduler a continuous CPU reservation with period equal to the quantum size, or scheduling it using a USP and treating the end of each period as a deadline.

This result is of limited use in real systems because the subdivision of scheduling quanta will result in fine-grained time slicing among different reservations or USPs. The purpose of scheduling quanta is to limit the number of context switches in a system. Therefore, fine-grained time slicing defeats the purpose of scheduling quanta.

5.3.1.5 CPU Reservations

Rez, the CPU reservation scheduler that was implemented for HLS, must be run at the root of the scheduling hierarchy unless it receives a RESU guarantee. It is capable of making use of such a guarantee because it is deadline-based, and consequently can make deadline information available to a uniformly slower processor scheduler.

5.3.1.6 Time Sharing

Since a time-sharing scheduler does not make any guarantee to entities that it schedules, it can make use of any guarantee. More precisely, it requires a NULL guarantee, to which any other guarantee may be trivially converted.

5.3.1.7 Schedulers From the Literature

This section explains and justifies the guarantee conversions for the schedulers listed in the bottom part of Table 5.1 .

The BSS-I [51] and PShED [52] schedulers provide uniformly slower processors to other schedulers. Any scheduler that has a run-time representation of its deadlines may use a uniformly slower processor.

The borrowed virtual time (BVT) scheduler [20] has not been shown to be capable of providing any reservation-like guarantee. It gives each thread a warp parameter that allows it to borrow against its future CPU allocation; this has the useful property of decreasing average dispatch latency for threads with high warps. Warp, in combination with two additional per-thread scheduler parameters warp time limit and unwarp time requirement, defeats PS analysis techniques that could be used to prove that BVT provides bounded allocation over a specific time interval. However, despite the additional scheduling parameters, BVT (1) limits the CPU allocation of each thread to its share (over an unspecified time interval) and (2) supports hierarchical scheduling in the sense that it only increases a thread’s virtual time when it is actually running. Therefore, BVT is capable of splitting a PS guarantee into multiple PS guarantees.

The constant bandwidth server (CBS) [2] provides the same scheduling behavior as a basic, hard reservation. Since it is deadline based, it can be scheduled by a uniformly slower processor.

The earliest eligible deadline first (EEVDF) algorithm [85] was shown to provide proportional share scheduling with bounded error. It has not been shown to work when given less than the full processor bandwidth.

The Linux scheduler is a typical time-sharing scheduler that provides no guarantees.

Lottery and Stride scheduling [90] provide proportional share resource allocation but do not bound allocation error. They have been shown to work correctly in a hierarchical environment.

The scheduler in the Portable Resource Kernel [68] was designed to be run as a root scheduler, and can provide both hard and soft CPU reservations. Although it is based on rate monotonic scheduling, it must have an internal representation of task deadlines in order to replenish application budgets. Therefore, it could be adapted to be scheduled using a uniformly slower processor.

Rialto [40] and Rialto/NT [37] are reservation schedulers. They could be adapted to be scheduled using a USP because they have an internal representation of their deadlines.

SFS [15] is a multiprocessor proportional share scheduler. It does not provide bounded allocation error to entities that it schedules.

The Spring operating system [82] uses a deadline-based real-time scheduler to provide hard real-time guarantees to tasks that it schedules. Since it is deadline-based, it can make use of a USP.

The total bandwidth server (TBS) [79] provides soft CPU reservations: it guarantees that a minimum fraction of the total processor bandwidth will be available to entities that it schedules, but it can also take advantage of slack time in the schedule to provide extra CPU time. It is deadline-based, and consequently can be scheduled using a USP.

The Windows 2000 time-sharing scheduler requires and provides no guarantees.

5.3.2 Converting Guarantees Using Rewrite Rules

Rewrite rules exploit the underlying similarities between different kinds of soft real-time scheduling. For example, it is valid to convert any CPU reservation with amount x and period y to the guarantee PS x/y. This conversion means that any pattern of processor allocation that meets the requirements for being a CPU reservation also meets the requirements for being a PS guarantee. Clearly, the reverse conversion cannot be performed: a fixed fraction of the CPU over an unspecified time interval is not, in general, equivalent to any particular CPU reservation.

Table 5.2 shows which guarantees can be converted into which others using rewrite rules. Characters in the matrix indicate whether the guarantees listed on the left can be converted to the guarantees listed on the bottom. Feasible conversions are indicated by “t,” while impossible conversions are indicated by “f.” When a conversion or lack of conversion is non-trivial, the accompanying number refers to a theorem from this section.

ALL

RESU

RESBH

f, 5.3

t, 5.2

t, 5.5

t, 5.6

RESBS

f, 5.3

t, 5.2

t, 5.5

t, 5.6

RESCH

t, 5.4

t, 5.6

RESCS

t, 5.4

t, 5.6

RESPS

f, 5.3

t, 5.2

t, 5.5

t, 5.6

RESNH

t, 5.4

t, 5.6

RESSH

t, 5.4

t, 5.6

PSBE

t, 5.7

NULL

ALL

RESU

RESBH

RESBS

RESCH

RESCS

RESPS

RESNH

RESSH

PSBE

NULL

Table 5.2:

Guarantee conversion matrix

The following lemma will be used to establish an equivalence between basic and continuous CPU reservations.

Lemma 5.1. Given a thread with guarantee RESBH x y or RESBS x y, every time interval (2y -x) units long contains at least x units of CPU time. Furthermore, any time interval smaller than this is not guaranteed to contain at least x units of CPU time.

Proof. According to the definition on Section 5.2.2.2 , a basic CPU reservation contains at least x units of CPU time in each time interval [t+iy, t+(i+1)y] where t is a time chosen by the reservation scheduler and i is an integer. We refer to each of these time intervals as a period of the reservation. To show that an arbitrary time interval of length 2y - x is sufficiently long to always contain x units of CPU time, we examine two cases, depicted in Figure 5.2.In the first case, the 2y - x interval overlaps three periods of the reservation, and therefore contains a complete period. Then, by the definition of basic CPU reservation, there must be at least x units of CPU time within this interval. In the second case, the interval 2y - x overlaps two periods of the reservation, and does not contain a complete period. Without loss of generality, assume that it is contained by time interval [t, t+2y]. By the definition of basic CPU reservation, this interval must contain at least 2x units of CPU time. Observe that there is not enough room inside of this interval 2y units long to contain both 2x units of CPU time and the arbitrary interval 2y - x units long unless at least x of the units of CPU time are contained in the arbitrary interval.

Figure 5.2:

Time-lines for case analysis in the sufficient condition of the proof of Lemma 5.1

We show that an interval of length 2y - x is necessary by contradiction. Assume that there exists an 0 < < x such that every time interval of length 2y - x - contains at least x units of CPU time. Without loss of generality we examine the time interval [t, t + 2y] where t is the time chosen by the scheduler in the definition of basic CPU reservation in Section 5.2.2.2 . Furthermore, assume that the basic reservation scheduler allocates CPU time to the reservation during time intervals [t, t + x] and [t + 2y - x, t + 2y]. In other words, it allocates CPU time at the beginning of the first period and the end of the second, as Figure 5.3illustrates. Then during the time interval [t + x, t + 2y - ] the reservation will have been scheduled for x - time units, contradicting our assumption that every time interval 2y - x - units long contains at least x units of CPU time.

Figure 5.3:

Time-line for the necessary condition of the proof of Lemma 5.1

The following theorem shows that basic CPU reservations can be converted into continuous CPU reservations.

Theorem 5.2. The guarantees RESBS x y and RESBH x y can each be converted into the guarantee RESCS x (2y - x + c) for any c > 0.

Proof. Lemma 5.1 proved that an arbitrarily placed time interval 2y -x or more time units long will contain at least x units of CPU time. This meets the definition of a continuous CPU reservation.

The following theorem proves that it is not possible to convert an arbitrary basic CPU reservation into a continuous, hard CPU reservation. The only basic CPU reservation that is also a hard, continuous CPU reservation is the trivial basic reservation that is equivalent to ALL.

Theorem 5.3. Neither RESBS x y nor RESBH x y can be converted into a continuous, hard CPU reservation unless x = y.

Proof. By Theorem 5.2 the guarantees RESBS x y and RESBH x y can each be converted into a continuous, soft CPU reservation with period x and amount 2y - x + c where c > 0. The soft CPU reservation cannot also be a hard reservation unless its utilization is bounded above as well as below. The requires that the utilization of the original basic reservation, which is x/y, be equal to the utilization of the new continuous reservation, which is x/(2y -x+c). This condition is satisfied only when x = y and c = 0.

The next lemma will be used in the proof that continuous CPU reservations can be converted into proportional share guarantees with bounded error.

Lemma 5.2. The longest time interval during which a thread with guarantee RESCS x y or RESCH x y can be starved is y - x units long.

Proof. A gap is a time interval during which a particular task is not scheduled. Assume that a task experiences a gap of length g, where g > y - x, starting at time t. Then, during the time interval [t, t+y] the task will not be scheduled until time t+g, making it impossible for the task to receive x units of CPU time before time t+y. This is a contradiction according to the definition of continuous CPU reservation given in Section 5.2.2.2 .

The following theorem establishes a correspondence between continuous CPU reservations and proportional share guarantees with bounded error.

Theorem 5.4. The guarantees RESCH x y or RESCS x y may be converted to the guarantee PSBE (y - x).

Proof. The long-term average service rate of the CPU reservation is

. This is the correct value for the share part of the PS guarantee because share and long-term average service rate mean the same thing.

is, intuitively, the farthest a thread receiving PS scheduling is allowed to fall behind its average service rate. By Lemma 5.2 , the longest gap in service that a task with guarantee RESCS x y or RESCH x y can experience is y - x units long. During a gap of length y - x a reservation with service rate

falls

(y -x) units behind its average service rate. Therefore, this is the correct value for

The next lemma will be used to prove that basic CPU reservations can be converted into proportional share guarantees with bounded error.

Lemma 5.3. The longest time interval during which a thread with guarantee RESBS x y or RESBH x y can be starved is 2(y - x) units long.

Proof. A thread with a basic CPU reservation experiences the longest possible gap when the reservation scheduler schedules it at the beginning of one period and at the end of the next period. In other words, for a time t chosen by the scheduler, the application is scheduled during intervals [t, t + x] and [t + 2y - x, t + 2y]. The gap in scheduling between these intervals is 2(y - x) time units long.

The following theorem establishes a correspondence between basic CPU reservations and proportional share guarantees with bounded error.

Theorem 5.5. The guarantees RESBH x y or RESBS x y may be converted to the guarantee PSBE 2(y - x).

Proof. The average service rate of a basic CPU reservation--like the average service rate of a continuous CPU reservation--is the same thing as the share part of a PS guarantee. By Lemma 5.3 the longest gap in service for an entity with a basic reservation is 2(y-x) units long. During the time interval 2(y - x), the service provided by the reservation will fall behind by 2

(y - x). Therefore, this is the correct value for

The next theorem proves that any CPU reservation may be converted into a proportional share guarantee.

Theorem 5.6. Any CPU reservation, whether hard or soft, basic or continuous, with amount x and period y may be converted into the guarantee PS x/y.

Proof. The PS guarantee requires a specified long-term average fraction of the processor bandwidth. The long-term average fraction of the processor guaranteed by a CPU reservation is x/y.

The following theorem was motivated by an observation by Stoica et al. [84], who said it is possible to provide reservation semantics using a proportional share scheduler.

Theorem 5.7. The guarantee PSBE s can, for any y > , be converted into the guarantee RESCS (ys - ) y or RESBS (ys - ) y.

Proof. The proportional share guarantee states that during any time interval n units long, a thread with a share s of the CPU will receive at least ns-

units of CPU time where

is the allocation error bound. Let y be the period of the desired CPU reservation. A thread is then guaranteed to receive ys -

units of CPU time during every time interval y units long. This is precisely the definition of a soft, continuous CPU reservation. Trivially, it is also a soft, basic reservation. y cannot be less than

because that would make the quantity (ys -

) negative.

5.3.3 Implications of the Equivalence of Reservations and Proportional Share with Bounded Error

Theorems 5.4 , 5.5 , and 5.7 show that CPU reservations can be converted into proportional share guarantees with bounded error and vice versa. The abstraction underlying all of these guarantees--a minimum allocation of CPU time over a specified time interval--is necessary for real-time scheduling to occur.

The conversions are useful when combined with the results that were proved about hierarchical SFQ schedulers. Recall that if a SFQ scheduler receives a PSBE guarantee it also provides this type of guarantee to its children. Therefore, a reservation scheduler can be used at the root of the scheduling hierarchy to schedule applications whose threads require precise or fine-grained scheduling, as well as giving a reservation to an instance of the SFQ scheduler. The reservation can be converted into a PSBE guarantee that is useful to the SFQ scheduler, which can then give performance guarantees to entities that it schedules.

There is a good reason not to use a proportional share scheduler as the root scheduler in a system, even though PS schedulers that have bounded allocation error are perfectly capable of providing CPU reservations. The reason is that while these schedulers bound allocation error, they do not bound allocation error within a period of a thread’s choosing. To see this, let us first look at SFQ.

5.3.3.1 Providing a Reservation Using Start-Time Fair Queuing

Consider an SFQ scheduler that uses a quantum size of 10 ms (a common quantum size in a general-purpose operating system) and receives a guarantee of RESCS 10 20. It schedules 6 threads, T1-T6. T1 has weight 0.5 and T2-T6 each have weight 0.1. This scheduler is depicted in Figure 5.4 .Applying Theorem 5.4, the SFQ scheduler’s guarantee can be converted to the guarantee PSBE , or PSBE 0.5 5. Using Equation 5.2, the thread with weight 0.5 can be shown to have a guarantee of PSBE , or PSBE 0.25 75. By Theorem 5.7 , this guarantee can be converted to RESCS (400 ^. 0.25 - 75) 400, or RESCS 25 400.

Figure 5.4:

Using an SFQ scheduler to provide a CPU reservation

This reservation is highly pessimistic: instead of being guaranteed to receive 25% of the CPU (its long-term average share), T1 is only guaranteed to receive 6.25% of the CPU during any given 400 ms period. In order to bring the level of pessimism down to 10% the reservation period y would have to be 3000 ms, or 3 s--far too long a deadline for most multimedia applications. This example illustrates a serious shortcoming of SFQ: it has delay bounds that are proportional to the number of threads that it is scheduling. In a typical time-sharing system with dozens of threads the kind of performance guarantee given to threads by an SFQ scheduler would not be useful for scheduling multimedia applications.

5.3.3.2 Providing a Reservation Using EEVDF

EEVDF [34] has a tighter error bound than SFQ: allocation error is bounded to the length of a single scheduling quantum. This bound is optimal in the sense that no quantum-based scheduler can have a lower error. To see the consequences of the EEVDF error bound, assume that a thread in a video application requires a guarantee of 5 ms / 33 ms in order to display 30 frames per second, and that we want to provide this reservation using an EEVDF scheduler. Assume that the EEVDF scheduler uses a 10 ms time quantum and is at the root of the scheduling hierarchy where it receives a guarantee of ALL.

The guarantee provided by the EEVDF scheduler will have the form PSBE s , where =10 ms. To construct a CPU reservation with period 33 ms from the PSBE guarantee we use Theorem 5.7 , which states that the guarantee PSBE s can be converted into a soft, continuous CPU reservation with amount (ys - ) and period y for any y > . Combining this conversion with the PSBE guarantee, we can deduce that the reservation amount (ys - ) will be (33s - 10), which must equal 5 ms. This constrains the value of s to be , or 0.45.

Therefore, providing a CPU reservation of 5 ms / 33 ms, or 15% of the CPU, using EEVDF requires allocating 45% of the processor bandwidth when the EEVDF scheduler uses a 10 ms quantum. Clearly, this degree of over-reservation is not acceptable in general. EEVDF’s error bounds are optimal, so it is impossible for a different proportional share scheduler to do better than this. To reduce the level of pessimism, the only parameter available to adjust is the scheduler quantum size.

Assume that 10% is the largest degree of pessimism that is acceptable for the video player reservation. This means that s, the share of the CPU reserved for the thread, must be no more than 1.1(5/33), or 16.7% of the CPU. So, we assume that the EEVDF scheduler provides the guarantee PSBE 0.167 and we need to solve for a value of that allows the PSBE guarantee to be converted into the guarantee RESCS 5 33. To do this we apply Theorem 5.7 backwards: we know that x must be 5 ms and y must be 33 ms; this tells us that 5 = (33)(0.167) - . Solving for , we find that the required scheduling quantum is 0.5 ms.

Since EEVDF is an optimally fair PS scheduler, it will context switch between threads every 0.5 ms even when there are no real-time applications in the system. This will cause unacceptable inefficiency for threads that make use effective use of the cache. As we will show in Section 10.3.3 , on a 500 MHz Pentium III a thread whose working set is 512 KB, the same size as the level-two cache, can take more than 2 ms to re-establish its working set in the cache after a context switch. Clearly, if context switches occur every 0.5 ms, applications with large working sets will not benefit from the cache.

These calculations indicate that proportional share schedulers are not suitable for providing CPU reservations to applications whose periods are of the same order of magnitude as the scheduler quantum size. We conclude that EDF- or static-priority-based reservation schedulers should be used to schedule applications with precise, fine-grained real-time requirements.

5.4 Conclusion

A hierarchy of schedulers composes correctly only if each scheduler in the hierarchy receives an acceptable guarantee, or a guarantee that can be converted into an acceptable guarantee. This chapter has (1) described the guarantee mechanism and how it can be used, (2) described the sets of guarantees that are acceptable and provided by a number of multimedia schedulers, and (3) shown which guarantees can be converted into which other guarantees, including proofs of the conversions when necessary.

[next] [prev] [prev-tail] [front] [up]