CS606: Advanced Operating Systems
Lecture 1

Overall organization:
  - Why distributed systems?
  - Unique challenges of distributed systems
  - Operating system evolution
  - Administrivia
  - Goals of distributed systems
  - Classic forms of distributed systems
  - Networking review (?)

----------------------------------------------------------------------------
Topic: Why distributed systems?

Old cost/MIPS reality: double the cost, quadruple the power
New cost/MIPS reality: double the cost, marginally improve the power
                       (but add lots of I/O capacity -> "servers")

Old networking reality: 10 Mbps Ethernet, 56 Kbps leased lines,
                        14.4K modems (or sneakernet/gradstudentnet)
New networking reality: 100-1000 Mbps Ethernet, T3 lines, ISDN or better

Old software reality: Big, integrated, specialized software packages
New software reality: Lots of small software components and services

Old PC reality: PCs (and to a lesser extent workstations) slow/unreliable
New PC reality: Ok, some things don't change (...just kidding...)

Put this all together:
  * Best way to maximize raw MIPS: lots of cheap machines & fast network
  * Best way to get economies of scale:
      - everybody has machine(s) on desktop
      - workgroup data shared
      - common applications and tools shared (focus of new "TCO" effort)
      - (emerging) realtime interactive groupware/conferencing
  * Unfortunately, the infrastructure isn't quite there to make this
    as nice as it should be

----------------------------------------------------------------------------
Topic: Unique challenges of distributed systems

1. Far more complex types of failures: network partitions (yuck!),
   corrupted data, "byzantine" failures, partial system failures, etc.

2. Lack of global information

3. Data replication:
     Why?      Fault tolerance (replicas) and performance (caching)
     Problems? Synchronization and concurrency control

4. Lack of shared memory (message passing)

Q: What else?
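A minimal sketch of challenges 1 and 4 above, in Python. Two "nodes" are
simulated as threads that share no variables and talk only through message
queues (an assumption made for brevity; real nodes would be separate
machines on a network). The names server, call, demo, and crash_after are
hypothetical. When the server stops answering, all the client sees is a
timeout: it cannot tell a crashed server from a slow one or a lost message.

```python
import queue
import threading


def server(requests, replies, crash_after):
    """Hypothetical server node: keeps a private counter and answers
    "incr" messages. It stops silently after crash_after requests to
    simulate a partial system failure."""
    counter = 0
    for _ in range(crash_after):
        msg = requests.get()
        if msg == "incr":
            counter += 1
            replies.put(counter)


def call(requests, replies, timeout=0.5):
    """Client side: send one message, then wait for a reply or give up."""
    requests.put("incr")
    try:
        return replies.get(timeout=timeout)
    except queue.Empty:
        # No reply arrived. Crashed? Slow? Message lost? We cannot know.
        return None


def demo():
    requests, replies = queue.Queue(), queue.Queue()
    node = threading.Thread(
        target=server, args=(requests, replies, 2), daemon=True
    )
    node.start()
    # Two calls succeed; the third is sent after the server has "failed".
    return [call(requests, replies) for _ in range(3)]


if __name__ == "__main__":
    print(demo())
```

The client never reads the server's counter directly; every piece of state
it learns arrives in a message, and a missing message is the only failure
signal it gets.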
----------------------------------------------------------------------------
Topic: Operating system evolution

Centralized operating system
----------------------------
Characteristics: process management, memory management, I/O, files
Goals: resource management, virtual machine (virtualization)

Network operating system
------------------------
Characteristics: remote access, information exchange, network browsing
Goals: resource sharing (interoperability)

Distributed operating system
----------------------------
Characteristics: Global view of...
  * file system
  * name space
  * memory
  * time
  * security
  * computational power
  * ...
Goals: single computer view of multiple computer system (transparency)

Reality today: somewhere between network and distributed OSes

----------------------------------------------------------------------------
Topic: Administrivia

  - Will meet in the CSL library starting Wednesday
  - Join class mailing list (cs7460) - handled by majordomo@cs
  - Am in the process of creating a class web page
  - Read Chapter 1 of Tanenbaum (skip 1.3) for Wednesday

----------------------------------------------------------------------------
Topic: Goals of distributed system designs

* Efficiency: more complex than for single-node system due to network,
  load imbalances, etc. Can we harness close to raw aggregate MIPS?

* Flexibility: Can we evolve, migrate functionality, update functionality
  on the fly, avoid unreasonable restrictions on users?

* Consistency: Difficult to achieve in distributed systems (lack of global
  information, replication, partitioning in time and space, failures,
  etc.). System must maintain INTEGRITY even under duress.

* Robustness: Greater average MTBF of each component does not by itself
  lead to high MTBF of the entire system (e.g., DNS, file system, etc.)

* Transparency: To migrate from a NOS to a DOS, we want transparency:
    - access transparency (remote/local is irrelevant)
    - location transparency (name transparency)
    - migration transparency (location independence)
    - concurrency transparency
    - replication transparency
    - failure transparency
    - performance transparency
    - size transparency (scalability)
    - revision transparency (dynamic upgradeability)

----------------------------------------------------------------------------
Topic: Classical forms of distributed systems

* Client-server:

    Idea: To provide a single-system abstraction, bundle common system
    services from many machines to run in one "central" server. Any
    client machine wishing to get access to this service contacts this
    server.

    Common forms of servers:
      - file server
      - print server
      - time server
      - directory (name) server
      - network server (DNS)
      - process server (as you get more integrated)
      - security server
      - authentication server

    Q: What are some other kinds of servers?

* Peer-to-peer (unintegrated):

    Idea: Rather than centralizing each service, let each node provide a
    local version of the service, but make it available to remote nodes.

    Example: LANtastic or Win95 shares

    In many ways, this is like client-server, but without as much
    transparency or performance.

* Clusters:

    Idea: Have a collection of nodes COOPERATE to provide a service.
    Could think of this as a cross between C/S and peer-to-peer, with
    high degrees of integration, replication, migration, fault tolerance,
    load balancing, etc.

    Example: VAXclusters, Inktomi's web search engine, or Mango Medley.

    Advantages:
      - scalability
      - high availability (fault tolerant)
      - load balancing

    Disadvantages:
      - concurrency control
      - complexity

* "True" distributed system (aka, single system image)

    Idea: Build a totally dynamic, self-healing, self-tuning, network-wide
    operating system that manages the entire collection of resources in
    the system as a single, large, powerful machine.

    Examples (partial): Apollo Domain, V, Locus, DCE, Orca.

    Unique features:
      - integrated process management (adaptive cross-machine scheduling
        and load balancing)
      - integrated resource management
      - process and service migration
      - automatic service replication
      - strong support for concurrency control

----------------------------------------------------------------------------
Topic: Optional overview of networking

Types of networks:
  - broadcast (mostly gone in wired networks now)
  - switched (pretty much taken over the LAN market)
  - point-to-point (dominant in the WAN infrastructure)

Components:
  - switches
  - bridges
  - routers

Common LANs:
  - Ethernet
  - Ethernet
  - Ethernet

Common single-site backbones:
  - ATM
  - FDDI

Common failure modes:
  - adapter failures (normally failstop, sometimes byzantine)
  - switch failures
  - router failures
  - SOFTWARE FAILURES (router tables, DNS entries, etc.)
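
A rough Python sketch of why the failure modes above add up. It assumes a
request must cross every listed component in series and that components
fail independently (both simplifying assumptions, not claims from these
notes). The function name series_availability and all the numbers are
hypothetical, chosen only for illustration.

```python
from math import prod


def series_availability(availabilities):
    """Fraction of time a chain is usable when EVERY component in it
    must be up, assuming the components fail independently."""
    return prod(availabilities)


# Hypothetical per-component availabilities (fraction of time up).
path = {
    "adapter": 0.999,
    "switch": 0.999,
    "router": 0.999,
    "software (routing tables, DNS entries)": 0.99,
}

end_to_end = series_availability(path.values())

if __name__ == "__main__":
    print(f"weakest component: {min(path.values()):.3f}")
    print(f"end-to-end path:   {end_to_end:.5f}")
```

Under these assumptions the end-to-end figure is always lower than that of
the weakest component, which is one way to read the Robustness goal: good
individual components do not by themselves give a reliable system.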