File: ftp://ftp.lcs.mit.edu/pub/carl/lottery/README.MACH_LOTTO Document: Mach Lottery Scheduling Code README Author: Carl Waldspurger Last modified: 21-Dec-94 * COPYRIGHT AND DISTRIBUTION All new code, modifications to existing Mach code, and documentation in this distribution are subject to the following conditions (effectively identical to the standard Mach distribution terms): Copyright (c) 1993, 1994 Carl A. Waldspurger. All Rights Reserved. Permission to use, copy, modify and distribute this software and its documentation is hereby granted, provided that both the copyright notice and this permission notice appear in all copies of the software, derivative works or modified versions, and any portions thereof, and that both notices appear in supporting documentation. CARL WALDSPURGER ALLOWS FREE USE OF THIS SOFTWARE IN ITS "AS IS" CONDITION, BUT DISCLAIMS ANY LIABILITY OF ANY KIND FOR ANY DAMAGES WHATSOEVER RESULTING FROM THE USE OF THIS SOFTWARE. It is requested that users of this software return any improvements they make and grant the rights to redistribute these changes to: Carl Waldspurger or carl@lcs.mit.edu MIT Laboratory for Computer Science 545 Technology Square, Room 206 Cambridge, MA 02139 * OVERVIEW This distribution contains new source code files and modifications to existing Mach 3.0 microkernel sources (MK82) for a DECStation 5000/125. Instead of just providing diffs for the changes, I've included the complete modified files. I believe that everything is provided that is required to build the kernel in a shadow directory of an MK82 distribution using odemake. This distribution is virtually identical to the one that I used for the experiments in our OSDI '94 paper (Carl A. Waldspurger and William E. Weihl. "Lottery Scheduling: Flexible Proportional-Share Resource Management," in Proceedings of the First Symposium on Operating Systems Design and Implementation, Monterey, California, November 1994). This file is intended to serve as a guide to the prototype lottery scheduling implementation. The information in this file is currently incomplete and rather rough, but it should still be helpful. If you have any questions, please send e-mail to "carl@lcs.mit.edu". The new files "lotto.h" and "lotto.c" in "mk/kernel/kern" contain the bulk of the lottery scheduling definitions and code. The small files "lotto_debug.h" and "lotto_rnd.s" are also new. The new files "mach_lotto.defs", "mach_lotto_types.defs", and "mach_lotto_types.h" in "mk/kernel/mach" contain the appropriate type and routine definitions for MIG-based IPC to handle kernel requests related to lottery scheduling. A large number of modifications were also made to several of the existing Mach kernel source files in "mk/kernel/kern". Except for modifications to "sched_prim.c", most changes were relatively minor; the only tricky part was determining where to make them (this was my first exposure to the Mach kernel). Virtually all modifications are surrounded by "#ifdef MACH_LOTTO" and "#endif MACH_LOTTO". Modifications for automatic ticket transfers during synchronous RPCs via mach_msg() were also made to "mach_msg.c" and "ipc_mqueue.c" in "mk/kernel/ipc". These modifications are surrounded by "#ifdef MACH_LOTTO_IPC" and "#endif MACH_LOTTO_IPC". Several common modifications were applied to many files. Despite what appears to be an abstract scheduling policy mechanism, the vanilla Mach sources frequently assume that there are only two possible scheduling policies: POLICY_TIMESHARE and POLICY_FIXEDPRI. I modified many files to update code containing conditionals involving scheduling policy. I also changed many "#ifdef MACH_FIXPRI" occurrences to "#ifdef MACH_FIXPRI || MACH_LOTTO". The vanilla Mach scheduling code also assumes that all runnable threads are enqueued on one of the priority run queues. This isn't the case for lottery-scheduled threads, which use their own separate queue. Since lottery-scheduled threads can co-exist with priority-scheduled threads, I needed to add checks to guard against the priority-based code indexing off the end of the run queues. To do this I introduced the constant LAST_VALID_RUNQ (= 31), and inserted appropriate checks. I also define the constant LOTTO_PRIORITY to allow lottery-policy threads to co-exist with priority-policy threads. Its current value is 7, which is higher priority than typical user threads, but lower priority than fixed-priority kernel or UX server threads, like the Ethernet driver. This is a hack, and was done to avoid modifications to many parts of the Mach sources that I didn't want to change. In principle, all priority scheduling could be converted to lottery scheduling, or a better mechanism could be used for their peaceful co-existence (such as giving priority-based threads a fixed share of the machine). * DOCUMENTATION AND CODING STYLE Documentation at the function level is fairly good; I tried to follow the MIT (6.170/Liskov/Guttag) "requires", "modifies", and "effects" commenting style. I also added a "locked" comment to indicate if any locks must be held before invoking a function. Sometimes this gets a bit verbose, but I generally find that it's helpful. I also use a coding style that bloats line counts, e.g. separate line per parameter in function defintions, separate line for braces in blocks, lots of white space and comments, etc. The coding style bears some resemblance to the GNU conventions. * NORMAL AND PRIMITIVE FUNCTIONS Many functions have both a "normal" version and a "primitive" version, e.g. lotto_create_currency() and lotto_prim_create_currency(). The "normal" version is a simple wrapper that acquire a lock (usually the pset lotto lock via the LOTTO_LOCK macro), invoke the primitive function, and then releases the lock (usually via the LOTTO_UNLOCK macro). The primitive functions do the "real work", and are also invoked from within functions that have already acquired the appropriate lock. * DEBUGGING CODE A fair amount of debugging code is included. Since the code is stable, I was going to remove most of the debugging code. However, I decided that others might find the debugging code helpful, so I ended up leaving it in, although it clutters up the code somewhat. The debugging compilation flags are defined in lotto_debug.h. Really verbose or unnecessary checks are performed only when MACH_LOTTO_DEBUG_VERBOSE > 1. Many sanity checks are always included via the LOTTO_ASSERT() macro. * POTENTIAL OPTIMIZATIONS As mentioned in our OSDI paper, the prototype lottery scheduler has not been optimized. Some obvious candidates for optimization include: modifying mach_msg() to handle ticket transfers on the fast path, caching currency and/or ticket values, inlining functions, removing unnecessary assertions, metrics, and sanity checks, and perhaps performing locking at a finer granularity (e.g. lock per currency). * USER-LEVEL COMMAND-LINE INTERFACE PROGRAMS The files in "user/bin/lotto" are all new. The files "lotto_util.h" and "lotto_util.c" implement a set of useful utility routines. The remaining files implement commands to create currencies and tickets ("mkcur.c", "mktkt.c"), destroy currencies and tickets ("rmcur.c", "rmtkt.c"), list currencies and tickets ("lscur.c", "lstkt.c"), fund and unfund a currency ("fund.c", "unfund.c"), execute a program with specified funding ("fundx.c", "fundrx.c"), and obtain status information about various metrics ("lottostat.c"). Aside from their general utility, these programs also demonstrate how many of the lottery scheduling kernel calls can be used. * CHANGE SUMMARY FOR MODIFIED FILES * DIRECTORY mk/kernel/kern/ (1) ast.c - updated conditional policy code - prevent priority runq overflows (2) eventcount.c - avoid simpler_thread_setrun() when MACH_LOTTO defined (3) ipc_kobject.c - added hook to handle lotto server requests (4) priority.c - updated conditional policy code - update thread quantum usage (for compensation tickets) (5) processor.h - defined lotto policy-specific fields, including the lotto-policy state lock, primitive currencies and tickets, and various metrics - update lock ordering diagram to include lotto_lock (6) processor.c - updated conditional policy code - initialize and support lotto policy-specific pset fields (7) sched.h - updated conditional policy code - updated csw_needed() macro for lotto policy - defined lotto-policy run queue, count in struct run_queue (8) sched_prim.c - updated conditional policy code - prevent priority runq overflows - ignore priority changes for lottery-scheduled threads - defined lotto_run_queue_enqueue() - use lotto_run_queue_enqueue() instead of run_queue_enqueue() - updated rem_runq() to handle lottery-scheduled threads (9) startup.c - invoke lotto_init() in setup_main() (10) syscall_subr.c - handle "priority depression" for lottery-scheduled threads (11) task.h - defined lotto policy-specific task currency field (12) task.c - initialize, support, and destroy task currency appropriately (13) thread.h - defined lotto policy-specific fields, including thread currency field, quantum-compensation related fields, and lotto-policy IPC fields for automatic ticket transfers (14) thread.c - updated conditional policy code - initialize, support, and destroy lotto policy-specific thread fields appropriately - ignore priority changes for lottery-scheduled threads - allow lotto-policy => fixpri/timeshare policy change * DIRECTORY mk/kernel/mach/ (1) policy.h - defined POLICY_LOTTO * DIRECTORY mk/kernel/ipc/ (1) ipc_mqueue.c - modified ipc_mqueue_send() and ipc_mqueue_receive() to transfer funding from sender to receiver on a call, and return the funding on a reply (2) mach_msg.c - use simpler, unoptimized mach_msg_trap() when MACH_LOTTO_IPC defined; this was much easier than trying to modify the fast path, which does lots of inlining and is relatively complicated - enable and disable ticket transfers depending on message type