Overview of Distributed File Systems - Part 2
---------------------------------------------

============================================================================
Topic: Sharing semantics

Possible choices:

* Unix semantics:
  - every read sees the effect of every write
  - writes to open files are immediately reflected at all clients
  - clients can share file position pointers (e.g., shell programs)

  Issues:
  * Expensive to implement!
  * Rarely required in full generality
  * Generally requires centralized, stateful algorithms
  * Can work around it (?)

* Session semantics:
  - clients get a snapshot of the file when they open it
  - local writes are immediately visible to the writing client
  - after close, subsequent opens by other clients see the changes
  - no shared file pointers (e.g., the shell script example)
  - breaks some programs, but common (e.g., Andrew)

* Immutable files:
  - write-once; to modify, create a new file that replaces the old one
  - hard to use

* Transaction oriented:
  - must be serializable
  - must use locks
  - must support rollback
  - a pain to use

* Real systems:
  - NFS (delayed-write semantics)
  - Microsoft LanMan (oplocks: centralized consistency management)
  - Medley (oplocks plus DSM)

=========================================================================
Topic: Performance

Improving performance translates almost directly into efficient caching:
avoiding disk accesses and avoiding network IPC.
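Session semantics is also what makes aggressive client caching cheap. Below is a minimal Python sketch, assuming a single in-memory server and a whole-file fetch on every open; `Server` and `Session` are hypothetical names, not any real system's API. Reads and writes inside a session touch only the local snapshot, so the server is contacted only at open and at a dirty close.

```python
class Server:
    """Hypothetical file server holding the committed contents of each file."""

    def __init__(self):
        self.files = {}
        self.fetches = 0  # whole-file fetches, i.e. network round trips on open

    def fetch(self, name):
        self.fetches += 1
        return self.files.get(name, "")

    def store(self, name, data):
        self.files[name] = data


class Session:
    """One open/close session working on a private snapshot of the file."""

    def __init__(self, server, name):
        self.server = server
        self.name = name
        self.data = server.fetch(name)  # snapshot taken at open time
        self.dirty = False

    def read(self):
        return self.data  # served locally; no server contact

    def write(self, data):
        self.data = data  # visible to this session immediately
        self.dirty = True

    def close(self):
        if self.dirty:
            self.server.store(self.name, self.data)  # changes published at close
            self.dirty = False


server = Server()
server.store("notes.txt", "v1")

a = Session(server, "notes.txt")
b = Session(server, "notes.txt")
a.write("v2")
print(a.read())  # prints v2: the writer sees its own write at once
print(b.read())  # prints v1: b's snapshot is unaffected
a.close()
print(b.read())  # prints v1: b opened before a closed
c = Session(server, "notes.txt")
print(c.read())  # prints v2: opens after a's close see the change
```

Note that `b` keeps reading `v1` even after `a` closes; that is exactly the behavior that breaks programs written for Unix semantics, and it is also why no per-write network traffic is needed.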
As with any caching study, it is important to know how the cached data is
commonly used (Satya's study prior to Andrew/CODA):

* Most files are small (under 10K)
* Most data is stored in big files (the big ones are BIG)
* Reading is much more common than writing
* Sequential access dominates
* Many temporary files
* Files are only rarely actively shared
* Distinct file classes, e.g.,
  - binaries (rarely change, basically read-only) -> replicate
  - compiler and other temporary files -> handle 100% locally
  - mailboxes and other private files -> basically never shared
  - regular files -> on-demand replication might help + sharing

Issues:

* Whole-file versus block-wise caching (old Andrew versus Sprite LFS)

* Where to cache? Clients (disk or memory) or server (memory)?
  The Sprite folks make a strong argument for client-side caching, while
  file servers usually have plenty of DRAM for server-side caching.
  A combination seems in order:
  - client-side caching for unshared data and temp files
  - server-side caching for interprocess locality

* If on the client, where?
  - library code (Boo! Little sharing or reuse)
  - client kernel (disk buffer cache)
  - separate client-side cache server (pin pages?)

* If on the client, volatile or persistent?
  - most systems: only in volatile DRAM
  - Medley: both volatile and persistent

* Cache consistency model
  - disable caching
    * imposes serious load on the server
    * hope sharing is RARE (LFS)
  - write-through
    * lots of messages
    * need to revalidate at each subsequent open
    * does not help writes
    * does not help the common case of temporary files
  - write-back on close
    * session semantics
    * set a timer and flush dirty blocks every N seconds
    * delay write-back (for temporary files) (NFS + NT)

* Who guarantees validity of cache contents (client or server)?
  - check on each open
  - callbacks (requires a stateful server), e.g., LanMan oplocks
  - DSM-like protocols

* How do you deal with possible inconsistencies?
  - Can I use the cached data if the remote server is down?
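The callback option listed above can be sketched in a few lines of Python. This is a minimal, hypothetical model in the spirit of AFS-style callback promises (`CallbackServer` and `CachingClient` are illustrative names, not a real API). The server is stateful: it remembers which clients cache each file and notifies them when another client stores a new version, so a client with an intact callback can reopen a file without contacting the server at all. Under "check on each open", each of those opens would instead cost a validation exchange.

```python
class CallbackServer:
    """Hypothetical stateful server that tracks which clients cache each file."""

    def __init__(self):
        self.files = {}
        self.callbacks = {}  # file name -> set of clients holding a valid copy
        self.messages = 0    # client/server exchanges, for comparison

    def fetch(self, client, name):
        self.messages += 1
        self.callbacks.setdefault(name, set()).add(client)  # promise a callback
        return self.files.get(name, "")

    def store(self, writer, name, data):
        self.messages += 1
        self.files[name] = data
        # Break the callback of every other client caching this file.
        for client in self.callbacks.get(name, set()) - {writer}:
            self.messages += 1
            client.invalidate(name)
        self.callbacks[name] = {writer}


class CachingClient:
    """Hypothetical client whose cached copies stay valid until a callback break."""

    def __init__(self, server):
        self.server = server
        self.cache = {}  # file name -> contents, valid while the callback holds

    def open_and_read(self, name):
        if name not in self.cache:  # no server contact while the callback holds
            self.cache[name] = self.server.fetch(self, name)
        return self.cache[name]

    def write_and_close(self, name, data):
        self.cache[name] = data
        self.server.store(self, name, data)

    def invalidate(self, name):
        self.cache.pop(name, None)  # callback broken: next open refetches


server = CallbackServer()
server.files["paper.tex"] = "old"
alice, bob = CachingClient(server), CachingClient(server)
for _ in range(3):
    bob.open_and_read("paper.tex")  # only the first open contacts the server
alice.write_and_close("paper.tex", "new")  # one store plus one callback break
print(server.messages)  # prints 4
print(bob.open_and_read("paper.tex"))  # prints new (refetched after the break)
```

The price is server state: the callback table must survive (or be rebuilt after) a server crash, which is the trade-off discussed under stateless versus stateful servers.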
==========================================================================
Topic: System Structure -- stateless versus stateful

Advantages of Stateless        Advantages of Stateful
-----------------------        ----------------------
Better fault tolerance!        Shorter request messages
No OPEN/CLOSE calls            Can cache state information
No server-side tables          Prefetching
Client crashes? No problem.    File locking is easier

Which do you think is better? Why?

Real-world examples:
  Stateless: NFS
  Stateful:  Andrew, LanMan (Samba)

==========================================================================
Topic: Fault Tolerance

For most file systems, this comes down to REPLICATION.

Why replicate?
  - Increase reliability (backup copies of data)
  - Increase availability (another server can take over)
  - Load balancing (spread the workload across servers)

Issues:
* How do you decide what to replicate?
  - explicit versus on-demand
* Consistency management:
  - leader/cohort model (aka "primary copy" replication)
  - voting algorithms (talk about reader/writer quorums and ghosts)
  - atomic commit protocols

============================================================================
Topic: Designing for Scalability and Performance

* Offload work from the server(s)
* Avoid centralized algorithms and bottlenecks
* Consider using clusters
* Replication (at least read-only!)
* Multiple lightweight processes (threads) in the server
* Log-structured file systems

============================================================================
Topic: Future

* Portables (CODA)
* Wide-area file systems (e.g., WebNFS and CIFS) as a replacement for HTTP
  for wide-area sharing
* Faster networks and processors, but "slower" disks and memory
* Specialized servers: object-oriented/database file servers
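For the reader/writer quorums mentioned under Fault Tolerance, here is a minimal Python sketch of the basic voting rule with version numbers. It assumes N replicas with one vote each, no concurrent operations, and no failures in the middle of an operation; `Replica` and `QuorumFile` are hypothetical names. The two constraints do the work: r + w > N means every read quorum overlaps the latest write quorum, and 2w > N means any two write quorums overlap, so a writer can find the current version number within its own quorum.

```python
class Replica:
    """One copy of the file, tagged with a version number."""

    def __init__(self):
        self.version = 0
        self.data = None


class QuorumFile:
    """Hypothetical replicated file using read quorum r and write quorum w."""

    def __init__(self, n, r, w):
        if r + w <= n or 2 * w <= n:
            raise ValueError("quorum sizes violate r + w > n or 2w > n")
        self.replicas = [Replica() for _ in range(n)]
        self.r, self.w = r, w

    def read(self, reachable):
        """Read from r distinct reachable replicas; the newest version wins."""
        if len(reachable) < self.r:
            raise RuntimeError("read quorum not available")
        quorum = [self.replicas[i] for i in reachable[: self.r]]
        newest = max(quorum, key=lambda rep: rep.version)
        return newest.version, newest.data

    def write(self, reachable, data):
        """Write to w distinct reachable replicas with the next version number."""
        if len(reachable) < self.w:
            raise RuntimeError("write quorum not available")
        quorum = [self.replicas[i] for i in reachable[: self.w]]
        # Since 2w > n, this quorum overlaps the previous write quorum,
        # so its highest version number is the current one.
        current = max(rep.version for rep in quorum)
        for rep in quorum:
            rep.version = current + 1
            rep.data = data


f = QuorumFile(n=5, r=2, w=4)
f.write([0, 1, 2, 3], "A")  # replica 4 is down and keeps version 0
f.write([1, 2, 3, 4], "B")  # replica 0 is down and keeps version 1
print(f.read([0, 4]))  # prints (2, 'B'): replica 4 carries the newest version
```

Choosing a small r and a large w favors the common read-heavy workload noted earlier, at the cost of needing most replicas up to write.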