CS6963 Distributed Systems

Lecture 21 Final Review

  • 20 questions, multiple choice
  • 2 to 3 questions on each paper except COPS

  • RPC, threading models

  • Replication

    • Primary/backup: FDS
    • Consensus: Paxos, Raft, Spanner, Lab 3/4
    • BFT
  • Partitioning/scaling

  • Fault tolerance

    • MapReduce
    • Spark
    • Paxos, Dynamo, Memcache at Facebook
  • Consistency

    • Linearizability
    • Leases
    • Eventual consistency: Dynamo, Bayou
    • Causal consistency
  • Transactions, concurrency control, atomicity

    • OCC: Thor
    • Sinfonia
    • Argus
    • 2PC
  • Naming

    • Consistent hashing
    • Chord
  • Types of services

    • Key-value stores, transactional stores, distributed filesystems, datacenter filesystems
    • Batch processing systems

MapReduce

  • Flow of data: GFS, read local, Map, write local, shuffle, Reduce, write GFS
  • Failure, recovery
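
The data flow above can be sketched in a few lines. This is a single-process toy, not the real system: plain Python lists stand in for GFS splits and for the map workers' local disks, and the names `map_fn`, `reduce_fn`, and `run_mapreduce` are made up for illustration. Because `map_fn` is deterministic, re-running a map task after a worker failure produces the same intermediate data, which is what makes recovery by re-execution safe.

```python
from collections import defaultdict

def map_fn(line):
    # Emit (word, 1) for every word in one input record.
    return [(word, 1) for word in line.split()]

def reduce_fn(key, values):
    # Combine all values that were emitted for one key.
    return key, sum(values)

def run_mapreduce(splits, n_reduce=2):
    # Map phase: each split is processed independently and its output is
    # bucketed into n_reduce partitions (a stand-in for local disk files).
    intermediate = []
    for split in splits:
        buckets = [defaultdict(list) for _ in range(n_reduce)]
        for line in split:
            for key, value in map_fn(line):
                buckets[hash(key) % n_reduce][key].append(value)
        intermediate.append(buckets)

    # Shuffle + reduce phase: reducer r pulls bucket r from every map task,
    # groups by key, and writes its result to the output (a stand-in for GFS).
    output = {}
    for r in range(n_reduce):
        grouped = defaultdict(list)
        for buckets in intermediate:
            for key, values in buckets[r].items():
                grouped[key].extend(values)
        for key in sorted(grouped):
            out_key, total = reduce_fn(key, grouped[key])
            output[out_key] = total
    return output

word_counts = run_mapreduce([["a b a"], ["b c"]])
```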

Spark

  • RDDs
    • Each RDD records: a function to compute it; a list of parents, with each dependency marked narrow or wide; a list of partitions and where/whether each is materialized; a partitioning scheme; a computation placement hint.
  • Lineage
  • Laziness
  • persist(), reuse of RDDs through in-memory materialization
  • Recovery model
  • Wide and narrow dependencies
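
A minimal sketch of lineage, laziness, and persist(), under heavy simplifying assumptions: there are no partitions, no cluster, and no wide/narrow distinction. The class `RDD`, the helper `parallelize`, and the method `lose_cache` here are toy stand-ins, not Spark's API. The point is that a transformation only records how to compute a dataset from its parents; work happens at `collect()`, and a lost in-memory copy is rebuilt by replaying the lineage.

```python
class RDD:
    def __init__(self, compute, parents=()):
        self.compute = compute        # function: list of parent data -> data
        self.parents = list(parents)  # lineage: which RDDs this one derives from
        self.persisted = False
        self.cache = None             # in-memory materialization, if any

    def map(self, f):
        # Lazy: records the transformation, runs nothing yet.
        return RDD(lambda inputs: [f(x) for x in inputs[0]], [self])

    def filter(self, pred):
        return RDD(lambda inputs: [x for x in inputs[0] if pred(x)], [self])

    def persist(self):
        self.persisted = True
        return self

    def collect(self):
        # Action: reuse the materialized copy, or recompute from the parents.
        if self.cache is not None:
            return self.cache
        inputs = [parent.collect() for parent in self.parents]
        data = self.compute(inputs)
        if self.persisted:
            self.cache = data
        return data

    def lose_cache(self):
        # Simulates losing the in-memory copy (for example, a worker crash).
        self.cache = None

def parallelize(data):
    return RDD(lambda _inputs: list(data))

calls = []  # records every time the map function actually runs

def square(x):
    calls.append(x)
    return x * x

squares = parallelize([1, 2, 3, 4]).map(square).persist()
evens = squares.filter(lambda x: x % 2 == 0)
```

Calling `evens.collect()` twice runs `square` only four times in total, because the second call reuses the persisted `squares`. After `squares.lose_cache()`, the next `collect()` recomputes `squares` from its parent.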

Leases

  • Temporarily delegate ownership of an object to clients
  • Requires bounded clock drift
    • What goes wrong if this assumption is violated?
  • Understand tradeoffs:
    • Longer/shorter leases
    • Leases that group objects
    • False sharing
    • Impact of latency on lease length
  • Understand lease behavior under failures.
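
One way to see why bounded clock drift matters, as a sketch under stated assumptions: the server tracks real time exactly and reassigns the object `term` real seconds after granting; the client starts its timer when it sends the request (so network latency only makes it more conservative) and shortens the term by the drift bound it assumes. The function names and parameters are hypothetical.

```python
def client_hold_time_real(term, assumed_drift, client_clock_rate):
    """Real seconds for which the client believes it still holds the lease.

    term              -- lease length granted by the server, in seconds
    assumed_drift     -- drift bound the protocol assumes (0.01 means 1%)
    client_clock_rate -- actual client clock speed relative to real time
                         (0.9 means the client clock runs 10% slow)
    """
    local_hold = term * (1.0 - assumed_drift)  # client's conservative timer
    return local_hold / client_clock_rate      # convert local time to real time

def lease_is_safe(term, assumed_drift, client_clock_rate):
    # Safe only if the client gives up the lease before the server,
    # after `term` real seconds, may hand the object to someone else.
    return client_hold_time_real(term, assumed_drift, client_clock_rate) <= term

within_bound = lease_is_safe(10.0, 0.01, 0.995)  # clock 0.5% slow, bound is 1%
bound_violated = lease_is_safe(10.0, 0.01, 0.9)  # clock 10% slow, bound is 1%
```

In the second case the client acts on the lease for about 11 real seconds while the server considers it expired after 10, so two parties can believe they own the object at once.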

Spanner

  • (Note to self: pick up notes at 'Q: In Lab 3,')
  • Consider the relationship of Lab 3/4 to Spanner.
  • Linearizable semantics even with data center failures
  • Paxos groups, 2PC, TrueTime, read-from-any, external consistency across partitions...
  • Focus on:
    • Want to read from any replica
      • More efficient; allows reads from the local data center
    • Problem: reads may not be externally consistent; they may not see the same point in 'time' across different keys residing on different Paxos groups.
    • Understand why reads have to be delayed: to ensure the RSMs have 'caught up'.
    • Understand what could go wrong if clock assumptions are off.
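
A toy sketch of why a read at a timestamp may have to wait. It assumes each replica applies writes in timestamp order and tracks a `safe_time` up to which it has applied everything; the class `Replica` and its methods are invented for illustration and omit Paxos, TrueTime, and 2PC entirely. A replica can answer a read at `read_ts` only once it has caught up to that timestamp; otherwise it might miss a write that committed earlier.

```python
class Replica:
    def __init__(self):
        self.versions = {}   # key -> list of (commit_ts, value), in ts order
        self.safe_time = 0   # every write with ts <= safe_time has been applied

    def apply(self, ts, key, value):
        # Assumes writes arrive in increasing timestamp order.
        self.versions.setdefault(key, []).append((ts, value))
        self.safe_time = ts

    def can_serve(self, read_ts):
        return read_ts <= self.safe_time

    def read(self, key, read_ts):
        if not self.can_serve(read_ts):
            # A real replica would block here until it has caught up.
            raise RuntimeError("replica has not caught up to read timestamp")
        result = None
        for ts, value in self.versions.get(key, []):
            if ts <= read_ts:
                result = value  # latest version at or before read_ts
        return result

replica = Replica()
replica.apply(10, "x", "a")
replica.apply(20, "y", "b")
```

Here `replica.read("x", 15)` can be answered, but a read at timestamp 25 cannot until the replica applies a write (or otherwise learns that time has advanced) past 25. Reading every key at the same timestamp, each from a replica that has caught up, is what gives a consistent snapshot across Paxos groups.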

Chord

  • Consistent hashing
    • Understand main benefit: scale-up/down
  • Understand query routing basics
  • Successor
  • Finger tables
  • What happens if finger tables are out of date?
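
A small simulation of Chord routing, as a sketch only: the whole ring is held in one sorted list, finger tables are computed on demand rather than maintained by stabilization, and the node IDs are arbitrary. All function names are illustrative. The `use_fingers=False` mode models finger tables that are useless or stale: lookups still reach the right node as long as successor pointers are correct, they just take more hops.

```python
M = 6            # identifier bits
RING = 2 ** M    # identifier space is 0..63

def in_interval(x, a, b):
    """True if x lies in the ring interval (a, b]."""
    if a < b:
        return a < x <= b
    return x > a or x <= b  # interval wraps past zero

def successor(nodes, ident):
    """First node at or clockwise after ident (nodes must be sorted)."""
    for n in nodes:
        if n >= ident:
            return n
    return nodes[0]

def finger_table(nodes, n):
    # Finger i points at the successor of n + 2^i.
    return [successor(nodes, (n + 2 ** i) % RING) for i in range(M)]

def lookup(nodes, start, key, use_fingers=True):
    """Return (node responsible for key, list of nodes visited)."""
    n, hops = start, [start]
    while True:
        succ = successor(nodes, (n + 1) % RING)
        if in_interval(key, n, succ):
            return succ, hops
        nxt = succ  # the successor pointer always makes progress
        if use_fingers:
            # Jump to the closest finger that still precedes the key.
            for f in reversed(finger_table(nodes, n)):
                if in_interval(f, n, key) and f != key:
                    nxt = f
                    break
        n = nxt
        hops.append(n)

nodes = [1, 8, 14, 21, 32, 38, 42, 48, 51, 56]
owner, path = lookup(nodes, 8, 54)
slow_owner, slow_path = lookup(nodes, 8, 54, use_fingers=False)
```

With fingers the query from node 8 for key 54 visits three nodes before finding the owner; following successors only, it visits eight.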

Bayou

  • Eventual consistency
  • Disconnected operation
  • Log prefix property
  • What goes wrong if causality isn't obeyed
  • Lamport clocks and causality
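
A sketch of how Lamport clocks keep causally related writes in order. It assumes writes are ordered by the pair (timestamp, node ID) and that a node advances its clock whenever it sees a write from another node; `LamportClock`, `observe`, and `tick` are illustrative names. The scenario: node A adds a meeting, node B sees it and then deletes it. If B ignored A's timestamp, the delete could sort before the add.

```python
class LamportClock:
    def __init__(self, node_id):
        self.node_id = node_id
        self.time = 0

    def observe(self, remote_time):
        # On seeing another node's write, never fall behind its timestamp.
        self.time = max(self.time, remote_time)

    def tick(self):
        # Stamp a new local write; ties are broken by node ID.
        self.time += 1
        return (self.time, self.node_id)

a = LamportClock("A")
a.tick()
a.tick()
add_ts = a.tick()                      # A's third write: "add meeting"

# Without observing A's write, B's delete gets a smaller timestamp.
naive_delete_ts = LamportClock("B").tick()

# With Lamport's rule, B first observes the add, then stamps the delete.
b = LamportClock("B")
b.observe(add_ts[0])
delete_ts = b.tick()

log = sorted([delete_ts, add_ts])      # order in which replicas apply writes
```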

Dynamo

  • Sloppy quorum
  • Vector clocks
    • Go through divergence
    • Supersession.
  • Application merge semantics on write back
  • Thomas' write rule
  • Know what can go wrong here
    • Linearizable? No.
    • Read your own writes? No.
    • Eventually "converges"? Yes.
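
Divergence and supersession with vector clocks can be checked mechanically. The sketch below represents a vector clock as a dict from node name to counter; the helper names `descends`, `compare`, and `merge` are illustrative, and the node names Sx, Sy, Sz are placeholders. Two versions are concurrent (divergent) when neither clock descends from the other, and a later write that merges both supersedes each of them.

```python
def descends(a, b):
    """True if clock a has seen every event that clock b has seen."""
    return all(a.get(node, 0) >= count for node, count in b.items())

def compare(a, b):
    a_ge, b_ge = descends(a, b), descends(b, a)
    if a_ge and b_ge:
        return "equal"
    if a_ge:
        return "a supersedes b"
    if b_ge:
        return "b supersedes a"
    return "concurrent"

def merge(a, b):
    # Element-wise maximum: the clock of a version that reconciles both.
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}

d1 = {"Sx": 1}
d2 = {"Sx": 2}             # written at Sx on top of d1
d3 = {"Sx": 2, "Sy": 1}    # written at Sy on top of d2
d4 = {"Sx": 2, "Sz": 1}    # written at Sz on top of d2, unaware of d3

d5 = merge(d3, d4)         # client reconciles d3 and d4 ...
d5["Sx"] += 1              # ... and writes the result back through Sx
```

Here `d3` and `d4` are concurrent, so both must be returned to the application for merging; `d5` supersedes both, so they can be discarded once `d5` is stored.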

COPS

  • CAP theorem
  • Otherwise not covered

PBFT

  • Byzantine faults
  • Independence
  • 3f + 1 replicas -> 2f + 1 in a quorum -> at least f + 1 honest replicas in any quorum
  • Need for 2 phase protocol
    • One phase to choose an ordering within a view
    • Another phase to ensure enough servers agreed to the ordering within the view to be sure future primaries won't overwrite the operation.
      • Need f + 1 honest replicas to agree, hence 2f + 1 to agree.
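
The quorum arithmetic can be worked through for any f. This is just counting, assuming at most f of the 3f + 1 replicas are Byzantine; the function name and the dictionary keys are made up for illustration.

```python
def pbft_sizes(f):
    n = 3 * f + 1                  # total replicas
    quorum = 2 * f + 1             # replies needed before proceeding
    return {
        "replicas": n,
        "quorum": quorum,
        # Even if all f faulty replicas are in the quorum, f + 1 are honest.
        "min_honest_in_quorum": quorum - f,
        # Any two quorums share at least 2 * quorum - n = f + 1 replicas ...
        "min_quorum_overlap": 2 * quorum - n,
        # ... so they share at least one honest replica.
        "min_honest_in_overlap": 2 * quorum - n - f,
    }

sizes = pbft_sizes(1)
```

For f = 1 this gives 4 replicas, quorums of 3, at least 2 honest replicas per quorum, and at least 1 honest replica common to any two quorums. That shared honest replica is what lets a new primary learn about any operation a previous quorum agreed on.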