Why this paper?
- Byzantine Fault Tolerance!
- Area of a fair amount of research, but less deployment.
- Important to know that this problem can even be solved: seems a bit hard to
believe that it works!
What is Byzantine Fault Tolerance?
What is a Byzantine behavior?
- Anything that doesn't follow our protocol.
- Malicious code/nodes.
- Buggy code.
- Faulty networks that deliver corrupted packets.
- Disks that corrupt data, duplicate data, lose data, fabricate data.
- Nodes that impersonate others or join the cluster without permission.
- Nodes that operate when they shouldn't (e.g. clock drift outside of
tolerance).
- Try to service operations on a partition even after the partition was
given to someone else.
- Really wicked bad stuff: any arbitrary behavior.
- Subject to one restriction - will come to this in a minute.
Primary/backup
- Replicate state on f + 1 replicas, stay alive so long as at least one is up;
tolerates f failures.
Paxos
- Idea: replicate a state machine, stay 'available' so long as <= f failures
in cluster of 2f + 1.
- Lose up to f, others can still make decisions.
- When others come back, they catch up.
What does 3f + 1 get us then?
- SMR that tolerates up to f malicious or arbitrarily failed nodes.
Key restriction: assume independent node failures for BFT!
Q: Is this a big assumption?
Costlier than you might expect under malicious attack; requires:
- Different implementations.
- Different operating systems.
- Different root passwords.
- Different administrators!!!
Why? Otherwise an attacker who compromises one node may be able to reuse the
same exploit to take over others. If f + 1 nodes share the same problem then
we are already toast.
Another key consideration: in Paxos, a pair of partitioned nodes may not be
able to communicate directly.
We'll have the same situation here: [drawn 4 nodes in a 1, 1, 2 split].
Should keep operating.
But what if A communicates through B to C and D, and B lies about A's
messages?
Can B use this to amplify its power?
i.e. we could have tolerated a faulty B, but now A is faulty by proxy.
Solution: authenticate messages with crypto.
Two goals: prevent spoofing and replays.
Spoofing can kill us in the above situation.
How might replay attack hurt us?
XXX Example here?
Use public-key signatures, message authentication codes, and message digests.
Public-key crypto/signatures.
- Each node has a public key and a private key.
- Public key is known by all other nodes.
- Private key is kept secret (at least if the node is non-Byzantine).
- Exposing a private key counts as one of the f failures.
- Node i can use its private key to generate a signed message s_i.
- Any node with i's public key can verify that i generated the message.
- Or at least someone with access to i's private key.
- This fixes spoofing.
XXX Can skip this if it isn't used elsewhere.
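A minimal sketch of how authenticated messages can reject both spoofing and
replays. It swaps in HMAC (a message authentication code with one shared
secret per sender) from the Python standard library rather than public-key
signatures, and it uses a per-sender sequence number to catch replays. The
sequence-number scheme and the names make_tag and Receiver are assumptions
for illustration, not taken from the paper.

```python
import hashlib
import hmac


def make_tag(key: bytes, sender: str, seq: int, payload: bytes) -> bytes:
    """Compute a MAC over (sender, seq, payload) so none can be changed unnoticed."""
    msg = repr((sender, seq, payload)).encode()
    return hmac.new(key, msg, hashlib.sha256).digest()


class Receiver:
    """Hypothetical receiver that holds one shared secret per known sender."""

    def __init__(self, keys: dict):
        self.keys = keys        # sender name -> shared secret (assumed pre-distributed)
        self.last_seq = {}      # sender name -> highest sequence number accepted

    def accept(self, sender: str, seq: int, payload: bytes, tag: bytes) -> bool:
        key = self.keys.get(sender)
        if key is None:
            return False        # unknown sender
        expected = make_tag(key, sender, seq, payload)
        if not hmac.compare_digest(expected, tag):
            return False        # spoofed or corrupted message
        if seq <= self.last_seq.get(sender, -1):
            return False        # replay of an old (validly tagged) message
        self.last_seq[sender] = seq
        return True
```

In the A/B/C/D picture above, B cannot forge a message "from A" without A's
key, and re-sending one of A's old messages fails the sequence check.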
Message digest/hash: one-way function.
- Easy to compute given plaintext.
- Extremely hard to generate a plaintext for a given hash/digest.
- Here: use it as a short summary of the message.
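A small sketch of a digest used as a short summary of a message. SHA-256 is
chosen here only for illustration; these notes do not say which hash function
the paper uses, and the request string is made up.

```python
import hashlib


def digest(message: bytes) -> str:
    """Return a fixed-size hex summary of an arbitrary-length message."""
    return hashlib.sha256(message).hexdigest()


request = b"write file=/a offset=0 data=hello"   # hypothetical client request
d = digest(request)

# Same input always gives the same digest; a changed input gives a different one.
same = digest(request) == d
changed = digest(request + b"!") != d
```

Nodes can then compare or sign the short digest instead of the full message.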
Implements SMR just like Paxos/Raft.
- [XXX: Diagram here...]
- Serial replicated log that represents a form of virtual time.
- Once entries are decided they are fed to a deterministic state machine.
- Here the state machine will be an NFS server.
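A minimal sketch of the SMR idea: replicas that apply the same decided log, in
the same order, to a deterministic state machine end up in the same state. A
toy key-value store stands in for the NFS server; KVStateMachine and
replay_log are hypothetical names.

```python
class KVStateMachine:
    """Deterministic state machine: output depends only on the log so far."""

    def __init__(self):
        self.state = {}

    def apply(self, entry):
        op, key, value = entry
        if op == "put":
            self.state[key] = value
        elif op == "delete":
            self.state.pop(key, None)
        return self.state.get(key)


def replay_log(log):
    """Feed every decided entry, in log order, to a fresh state machine."""
    sm = KVStateMachine()
    for entry in log:
        sm.apply(entry)
    return sm.state


log = [("put", "a", 1), ("put", "b", 2), ("delete", "a", None)]
replica_1 = replay_log(log)
replica_2 = replay_log(log)
```

The log position plays the role of virtual time: an entry's effect depends
only on the entries before it, not on when a replica happens to apply it.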
XXX: Spend plenty of time here and really diagram this!!!
[Section 3, Page 3, two paragraphs before Section 4].
Why 3f + 1? Why this number? What does this buy us?
First, must be able to proceed after contacting 2f + 1 replicas.
- Why? Have to stay available if f have failed.
- Imagine 3f + 1 = 4, 2f + 1 = 3.
Of 2f + 1 responses must they all be from non-faulty nodes? NO!
What if a non-faulty node is slower than a faulty one?
Of the 2f + 1 we can count on being up, up to f of them could still be
faulty!
How can we make the right decision if some of the responses are 'true' and
others are 'lies'?
Idea: need to make sure the 'true' responses outnumber the 'lies'.
If we have 2f + 1 responses and f are lies, then we know we have f + 1 that
are ok.
Result: with 3f + 1 servers, we get responses from 2f + 1; up to f of those
could be faulty, but we should still see f + 1 matching responses, which
indicate the correct outcome.
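The counting argument above can be sketched as a small function. It assumes
that all non-faulty replicas report the same value and that responses are
hashable; decide is a hypothetical name, not the paper's.

```python
from collections import Counter


def decide(responses, f):
    """Return the value backed by at least f + 1 of 2f + 1 responses, else None."""
    if len(responses) < 2 * f + 1:
        return None                      # keep waiting for more replies
    value, count = Counter(responses).most_common(1)[0]
    # At most f responses are lies, so f + 1 matching ones include a true one.
    return value if count >= f + 1 else None
```

With f = 1 (4 servers), three responses such as ["ok", "ok", "lie"] decide
"ok": the single liar cannot outvote the two matching replies.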
Another way to think about it:
- In P/B, everyone was forced to agree, so if 1 of the f + 1 is up then ok.
- In Paxos, up to f could have a different/no value, so need f + 1 to be
sure.
- PBFT, up to f could lie about the value and they might be among the
responses we get back first. Need f + 1 true values to offset those. So, f
could be lies, f + 1 good ones needed, and another f to account for other
good guys that are just slower than potential bad guys.
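The three cluster sizes can be tabulated directly from the formulas above.
This is just the arithmetic from these notes; the function and key names are
made up for illustration.

```python
def replicas_needed(f: int) -> dict:
    """Cluster size each scheme needs in order to tolerate f failures."""
    return {"primary_backup": f + 1, "paxos": 2 * f + 1, "pbft": 3 * f + 1}


def pbft_breakdown(f: int) -> dict:
    """Split PBFT's 3f + 1 into the three groups described above."""
    return {
        "possible_liars": f,        # faulty nodes that may answer first
        "truthful_needed": f + 1,   # enough true values to offset the lies
        "slow_but_honest": f,       # good nodes we cannot wait for
    }
```

For f = 1 this gives 2, 3, and 4 replicas respectively, and the PBFT breakdown
is 1 + 2 + 1 = 4.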
The Algorithm
- Sequence of Views (similar to Lab 2/Lab 4) numbered consecutively.
- One replica is Primary, others are Backups.
- Primary of View v is the replica with id v mod |R|.
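A tiny sketch of the primary selection rule, assuming replicas are numbered
0 to |R| - 1 and held in a list in that order (the list and names are
hypothetical).

```python
def primary_for_view(view: int, replicas: list):
    """Pick the primary for a view: replica number (view mod |R|)."""
    return replicas[view % len(replicas)]


replicas = ["r0", "r1", "r2", "r3"]   # |R| = 4, i.e. f = 1
primaries = [primary_for_view(v, replicas) for v in range(6)]
```

Because views are numbered consecutively, moving to the next view rotates the
primary role round-robin through the replicas.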