CS6963 Distributed Systems

Lecture 20: Byzantine Fault Tolerance and PBFT

Practical Byzantine Fault Tolerance
Castro and Liskov
SOSP '99

  • Why this paper?

    • Byzantine Fault Tolerance!
    • Area of a fair amount of research, but less deployment.
    • Important to know that this problem can even be solved: seems a bit hard to believe that it works!
  • What is Byzantine Fault Tolerance?

  • What is a Byzantine behavior?

    • Anything that doesn't follow our protocol.
    • Malicious code/nodes.
    • Buggy code.
    • Faulty networks that deliver corrupted packets.
    • Disks that corrupt data, duplicate data, lose data, fabricate data.
    • Nodes that impersonate others or join the cluster without permission.
    • Nodes that operate when they shouldn't (e.g., clock drift outside of tolerance).
    • Nodes that try to serve operations on a partition even after the partition was handed off to someone else.
    • Really wicked bad stuff: any arbitrary behavior.
    • Subject to one restriction - will come to this in a minute.
  • Primary/backup

    • Replicate state; with f + 1 replicas we stay alive despite up to f failures (only one replica needs to survive).
  • Paxos

    • Idea: replicate a state machine, stay 'available' so long as there are <= f failures in a cluster of 2f + 1.
    • Lose up to f, others can still make decisions.
    • When others come back, they catch up.
  • What does 3f + 1 get us then?

    • SMR that tolerates up to f malicious or arbitrarily faulty nodes.
  • Key restriction: assume independent node failures for BFT!

  • Q: Is this a big assumption?

  • Costlier than you might expect under malicious attack; requires:

    • Different implementations.
    • Different operating systems.
    • Different root passwords.
    • Different administrators!!!
  • Why? Otherwise, given one compromise, an attacker may be able to amplify it: if f + 1 nodes share the same vulnerability, then we are already toast.

  • Another key consideration: in Paxos, a pair of partitioned nodes may not be able to communicate directly.

  • We'll have the same situation here: [drawn 4 nodes in a 1, 1, 2 split].

  • Should keep operating.

  • But what if A communicates through B to C and D and B lies about A's messages!

  • Can B use this to amplify its power?

  • i.e., we could have tolerated a faulty B, but now A is faulty by proxy.

  • Solution: authenticate messages with crypto.

  • Two goals: prevent spoofing and replays.

  • Spoofing can kill us in the above situation.

  • How might a replay attack hurt us?

  • Example: an attacker could resend an old, correctly signed request (e.g., 'transfer $100') and have it execute a second time. Signatures alone don't stop this, since a replayed message still verifies; we also need freshness (timestamps or sequence numbers), sketched below.
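  • A minimal sketch of one standard defense, assuming (as PBFT does for client requests) a monotonically increasing per-client timestamp; the Replica/accept names here are invented for illustration:

```go
package main

import "fmt"

// Hypothetical sketch: a replica remembers the highest request
// timestamp accepted from each client and drops anything at or
// below it, so a replayed (re-sent) old request is ignored.
type Replica struct {
	lastT map[string]int64 // client -> highest timestamp seen
}

func (r *Replica) accept(client string, t int64) bool {
	if t <= r.lastT[client] {
		return false // replay or retransmission: drop
	}
	r.lastT[client] = t
	return true
}

func main() {
	r := &Replica{lastT: map[string]int64{}}
	fmt.Println(r.accept("alice", 100)) // true: fresh request
	fmt.Println(r.accept("alice", 100)) // false: exact replay
	fmt.Println(r.accept("alice", 99))  // false: older replay
	fmt.Println(r.accept("alice", 101)) // true: newer request
}
```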

  • Use public-key signatures, message authentication codes, and message digests.

  • Public-key crypto/signatures.

    • Each node has a public key and a private key.
    • Public key is known by all other nodes.
    • Private key is kept secret (at least if the node is non-Byzantine).
    • Exposing a private key counts as one of the f failures.
    • Node i can use its private key to sign a message m, producing ⟨m⟩σi (the paper's notation for m signed by i).
    • Any node with i's public key can verify that i generated the message.
    • Or at least that someone with access to i's private key did.
    • This fixes spoofing (see the sketch below).
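    • A small stand-in sketch of the sign/verify flow, using Go's crypto/ed25519 for concreteness (the paper's actual signature scheme differs); the message contents are invented:

```go
package main

import (
	"crypto/ed25519"
	"crypto/rand"
	"fmt"
)

func main() {
	// Each node generates a keypair; the public key is distributed
	// to all other nodes, the private key stays secret.
	pub, priv, err := ed25519.GenerateKey(rand.Reader)
	if err != nil {
		panic(err)
	}

	msg := []byte("view=7 seq=42 op=WRITE")

	// Node i signs with its private key...
	sig := ed25519.Sign(priv, msg)

	// ...and any node holding i's public key can verify the
	// signature. A forged or tampered message fails verification,
	// which defeats spoofing (e.g., B forging A's messages).
	fmt.Println("valid:", ed25519.Verify(pub, msg, sig))

	msg[0] ^= 0xff // tamper with the message
	fmt.Println("tampered:", ed25519.Verify(pub, msg, sig))
}
```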
  • XXX Can skip this if it isn't used elsewhere.

  • Message digest/hash: one-way function.

    • Easy to compute given plaintext.
    • Extremely hard to generate a plaintext for a given hash/digest.
    • Here: use it as a short summary of the message (a quick sketch follows).
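    • A quick sketch in Go, with SHA-256 standing in for the paper's digest function and an invented message:

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

func main() {
	// A digest is a fixed-size, one-way summary of a message: signing
	// or comparing the short digest stands in for the full body.
	msg := []byte("op=WRITE path=/a/b data=...")
	d := sha256.Sum256(msg)
	fmt.Printf("digest:  %x\n", d)

	// Changing even one byte yields a completely different digest,
	// and finding another message with a given digest is infeasible.
	msg[0] = 'O'
	fmt.Printf("digest': %x\n", sha256.Sum256(msg))
}
```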
  • Implements SMR just like Paxos/Raft.

    • [XXX: Diagram here...]
    • Serial replicated log that represents a form of virtual time.
    • Once entries are decided they are fed to a deterministic state machine.
    • Here the state machine will be an NFS server (a toy apply-loop sketch follows this list).
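    • A toy sketch of the SMR apply loop; all names (Op, StateMachine, Apply) are invented, and real entries would be NFS operations decided by the protocol:

```go
package main

import "fmt"

// Op is one decided log entry; the log index acts as virtual time.
type Op struct {
	Seq int    // sequence number assigned by the protocol
	Cmd string // deterministic command, e.g. an NFS operation
}

// StateMachine is any deterministic application; in the paper it is
// an NFS server, here just a toy counter.
type StateMachine struct {
	applied int // highest sequence number executed so far
	state   int
}

// Apply feeds decided entries to the state machine strictly in
// sequence order; gaps must wait until the protocol fills them.
func (sm *StateMachine) Apply(op Op) {
	if op.Seq != sm.applied+1 {
		return // out of order: buffer or wait in a real system
	}
	sm.applied = op.Seq
	sm.state++ // deterministic transition
	fmt.Printf("executed seq=%d cmd=%q state=%d\n", op.Seq, op.Cmd, sm.state)
}

func main() {
	sm := &StateMachine{}
	for _, op := range []Op{{1, "CREATE"}, {2, "WRITE"}, {3, "READ"}} {
		sm.Apply(op)
	}
}
```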
  • XXX: Spend plenty of time here and really diagram this!!!

  • [Section 3, Page 3, two paragraphs before Section 4].

  • Why 3f + 1? Why this number? What does this buy us?

  • First, must be able to proceed after contacting 2f + 1 replicas.

    • Why? Have to stay available if f have failed.
    • Imagine f = 1: then 3f + 1 = 4 and 2f + 1 = 3.
  • Of the 2f + 1 responses, must they all be from non-faulty nodes? NO!

  • What if a non-faulty node is slower than a faulty one?

  • Of the 2f + 1 we can count on being up, up to f of them could still be faulty!

  • How can we make the right decision if some of the responses are 'true' and others are 'lies'?

  • Idea: need to make sure the number of 'true' responses outnumber the 'lies'.

  • If we have 2f + 1 responses and f are lies, then we know we have f + 1 that are ok.

  • Result: with 3f + 1 servers, we get responses from 2f + 1; up to f of those could be faulty, but we are still guaranteed f + 1 matching responses, which indicate the correct outcome.

  • Another way to think about it:

    • In P/B, everyone was forced to agree, so if even 1 of the f + 1 is up, then we're ok.
    • In Paxos, up to f could have a different/no value, so we need f + 1 to be sure.
    • In PBFT, up to f could lie about the value, and the liars might be among the first responses we get back. We need f + 1 true values to outvote them. So: f possible lies, f + 1 good responses to outvote them, and another f to account for good nodes that are merely slower than the potential bad ones: 3f + 1 total. (The arithmetic is sketched in code after this list.)
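    • The arithmetic above as a tiny runnable sketch; the quorums helper is purely illustrative, not anything from the paper:

```go
package main

import "fmt"

// Quorum arithmetic for BFT with n = 3f + 1 replicas: a quorum of
// 2f + 1 responses is always reachable even with f nodes down, and
// any such quorum contains at least f + 1 non-faulty (matching)
// responses, which outvote up to f lies.
func quorums(f int) (n, quorum, matching int) {
	n = 3*f + 1      // total replicas
	quorum = 2*f + 1 // responses we wait for
	matching = f + 1 // identical responses needed to trust a result
	return
}

func main() {
	for f := 1; f <= 3; f++ {
		n, q, m := quorums(f)
		fmt.Printf("f=%d: n=%d, wait for %d responses, need %d matching\n",
			f, n, q, m)
	}
}
```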
  • The Algorithm

    • Sequence of Views (similar to Lab 2/Lab 4) numbered consecutively.
    • One replica is Primary, others are Backups.
    • Primary of view v is replica v mod |R|.
  1. Client sends request to primary.
  2. Primary sends request to all backups.
  3. Replicas execute the request and send the reply to the client.
  4. Client waits for f + 1 responses with the same result (client-side counting is sketched after the Q&A below).
  • What if Client gets f faulty responses that all agree?
    • Then it will have to wait for f + 1 more (for a total of 2f + 1 responses).
  • What if Client gets f + 1 faulty responses that all agree?
    • Cannot happen: that would violate the assumption of at most f faulty processes.
  • What if Client gets f 'faulty' responses and 1 correct one and they all agree?
    • Then the f responses weren't really faulty.
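  • A sketch of the client-side counting from step 4 and the Q&A above; Reply and await are invented names, and real replies would be authenticated so each replica gets at most one vote per request:

```go
package main

import "fmt"

// Reply is a (replica, result) pair as seen by the client.
type Reply struct {
	Replica int
	Result  string
}

// await collects replies and returns the first result backed by
// f + 1 distinct replicas: f + 1 matching replies must include at
// least one non-faulty replica, so the result is correct.
func await(replies <-chan Reply, f int) string {
	votes := map[string]map[int]bool{} // result -> set of replicas
	for r := range replies {
		if votes[r.Result] == nil {
			votes[r.Result] = map[int]bool{}
		}
		votes[r.Result][r.Replica] = true
		if len(votes[r.Result]) >= f+1 {
			return r.Result
		}
	}
	panic("not enough matching replies")
}

func main() {
	f := 1
	ch := make(chan Reply, 4)
	// One lying replica (0) and two honest ones (1, 2): the lie
	// never reaches f + 1 = 2 votes, the truth does.
	for _, r := range []Reply{{0, "evil"}, {1, "ok"}, {2, "ok"}} {
		ch <- r
	}
	close(ch)
	fmt.Println("accepted:", await(ch, f))
}
```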