CS6963 Distributed Systems

Lecture 09 Consensus, Raft

  • Objectives
    • Understand how Raft works.
    • Discuss the differences and similarities of Raft and Paxos.
    • Discuss whether (and where) Raft meets its goal of understandability.

Discussion Questions

  • Q: How many failures can Raft/Paxos tolerate?

    • What about primary/backup (P/B)? -> more economical
    • f + 1, 2f + 1, 3f + 1??? Would that be useful?
    • Turns out this is really interesting.
  • Q: What are some of the key (claimed) differences to multi-Paxos?

    • P305
  • Q: What are their complaints about Paxos?

    • P306
    • Difficult to understand: what evidence do they give?
      • You know the algorithm; do you believe it?
    • Claim is that this comes from the single decree formulation.
    • Hard to understand on its own. (Hard to see the point on its own.)
    • Hard to wrap a full system around.
    • No good for implementation because no widely agreed upon full algorithm.
  • Q: What are the criteria they spell out in Section 4 for understandability?

    • Do they always adhere to it?
    • e.g. Does the configuration change protocol seem as simple as possible?
    • The multi-Paxos alpha seems much simpler, though we didn't get to it in class.
    • Raft doesn't work with alpha, though.
    • My take: they are definitely improving intuition, but I'm not sure that I understand it more clearly when it comes down to the details.
    • Most explanations sweep complex details away. Good for intuition, but it can feel uncomfortable when I think hard but not hard enough.
  • Q: Is this true at the bottom of 310?

    • A log entry is committed once the leader that created the entry has replicated it on a majority of the servers.
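
Returning to the first question above on failure tolerance, the three replica counts can be written down as a small sketch. The function and scheme names here are hypothetical, chosen for illustration. The sketch assumes the usual results: f + 1 replicas for primary/backup under crash faults (given some external service that picks the primary), 2f + 1 for majority-quorum protocols such as Raft and Paxos, and 3f + 1 for Byzantine faults.

```python
def replicas_needed(f: int, scheme: str) -> int:
    """Replicas required to tolerate f failures under the given scheme."""
    if f < 0:
        raise ValueError("f must be non-negative")
    sizes = {
        # Crash faults; assumes an external service decides who is primary.
        "primary_backup": f + 1,
        # Crash faults; any two majorities intersect (Raft, Paxos).
        "majority_quorum": 2 * f + 1,
        # Byzantine (arbitrary) faults.
        "byzantine": 3 * f + 1,
    }
    return sizes[scheme]


def majority(n: int) -> int:
    """Smallest group size that is a strict majority of n servers."""
    return n // 2 + 1


def crash_failures_tolerated(n: int) -> int:
    """Crash failures an n-server majority-quorum group can survive."""
    return (n - 1) // 2


if __name__ == "__main__":
    for f in (1, 2):
        print(f, [replicas_needed(f, s)
                  for s in ("primary_backup", "majority_quorum", "byzantine")])
```

Note that an even-sized group buys nothing extra: under this arithmetic, four servers tolerate the same single failure as three.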

Slides

Server States (6)

  • At most 1 viable leader at a time.
    • Is this true? What's the key word that makes this true?
    • Ousterhout is constantly doing this: he says true things that are nearly false; the goal is to lull the listener into thinking things are simple.
  • At most 1 viable leader at a time != 1 leader at a time unless you twist the definition of leader.

Terms (7)

  • Is time really divided into terms?
    • Yes, but not wall time. Again, depends on how you define it.
  • What role do terms play that has a counterpart in Paxos?
    • The 'abort' aspect of the prepare's proposal numbers.
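
The 'abort' role of terms can be sketched as follows. This is a minimal illustration, not the paper's full rule set: the `Server` class and `on_message` function are hypothetical names, and the sketch ignores voting state and the case where a candidate hears from a leader in its own term. It shows only that a stale term gets a message rejected, while a newer term forces the receiver to step down.

```python
from dataclasses import dataclass


@dataclass
class Server:
    current_term: int = 0
    role: str = "follower"  # "follower", "candidate", or "leader"


def on_message(server: Server, msg_term: int) -> bool:
    """Apply the term check to an incoming RPC.

    Returns True if the message should be processed further,
    False if it is rejected because the sender's term is stale.
    """
    if msg_term < server.current_term:
        # Sender is behind: reject, much like a Paxos acceptor refusing
        # a proposal numbered below one it has already promised.
        return False
    if msg_term > server.current_term:
        # We are behind: adopt the newer term and abandon any
        # leadership or candidacy from the old term.
        server.current_term = msg_term
        server.role = "follower"
    return True


if __name__ == "__main__":
    old_leader = Server(current_term=2, role="leader")
    print(on_message(old_leader, 1), old_leader)  # stale sender rejected
    print(on_message(old_leader, 3), old_leader)  # old leader steps down
```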

Committing Entry from Earlier Term (19)

  • Problem here: what if s1 goes down?
  • Q: Why isn't <3,2> committed? What's the problem/threat?
    • How can we guarantee that a log entry won't be unwound?
    • There's only one way: ensure that the last entry of the log holding it carries the highest possible term.
    • What term is that? Only the current term.
    • Why is this important? Because entries with higher terms could wipe out any entry that isn't 'protected' by a later entry with the highest term number on it.
    • So: we've got to stick a 'cap' on the end of the log that protects prior entries from being unwound.
    • Idea: commit an entry from the current leader's term (by simple majority replication).
    • Once we've done that the log can't get 'peeled back' across the older entry.
  • Q: Why not just have the leader overwrite the old entry with a new one that has a higher term number but holds the same value (à la Paxos)?
    • One answer: values/RPCs never flow up to the leader in Raft, which is nice.
    • Don't need to rewrite anything.
    • Keep the statements strong for the proofs: a leader never overwrites any entry in its log.
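
The 'cap' idea above can be sketched as the leader's rule for advancing its commit index. The function and parameter names are hypothetical, and the logs in the usage example are made-up numbers rather than the ones on the slide. The assumption is that the leader only counts replicas for entries from its own current term; older entries become committed indirectly once a current-term entry after them is committed.

```python
from typing import List


def advance_commit_index(log_terms: List[int],
                         match_index: List[int],
                         current_term: int,
                         commit_index: int) -> int:
    """Return the leader's new commit index (1-based; 0 means nothing).

    log_terms[i] is the term of the leader's entry at index i + 1.
    match_index holds, for every server including the leader, the
    highest log index known to be replicated on that server.
    """
    majority = len(match_index) // 2 + 1
    # Scan from the newest entry down to just above the old commit index.
    for n in range(len(log_terms), commit_index, -1):
        replicated = sum(1 for m in match_index if m >= n)
        # Majority replication alone is not enough: the entry must
        # also come from the leader's current term (the 'cap').
        if replicated >= majority and log_terms[n - 1] == current_term:
            return n
    return commit_index


if __name__ == "__main__":
    log_terms = [1, 2, 4]  # leader is in term 4
    # Entry 2 (term 2) is on three of five servers, but it is not capped.
    print(advance_commit_index(log_terms, [3, 2, 2, 1, 1], 4, 1))
    # Entry 3 (term 4) reaches a majority; entries 2 and 3 both commit.
    print(advance_commit_index(log_terms, [3, 3, 3, 1, 1], 4, 1))
```

In the first call the old-term entry sits on a majority yet the commit index stays put, which is exactly the situation where it could still be unwound by a server holding a higher-term entry.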

RaftScope

  • Explain notation
  • Basic operation
    • Normal
    • Lost log replication messages
  • Leader election
    • Normal
    • Split vote
    • Two leaders
  • Majority down
  • Server contains uncommitted entries that are undone