Lecture 14 Leases
Leases: An Efficient Fault-Tolerant Mechanism for Distributed File Cache
Consistency
Gray and Cheriton
SOSP 1989
Overview
- What is this about?
- Context: V was a distributed OS; distributed OSes were a popular idea in
the 80s.
- Other examples: Plan 9, Sprite.
- The idea never really materialized; distributed abstractions ended up
being built above the OS instead.
- The setting: a file server whose clients cache files for performance.
- But, how do we ensure consistent reads?
- Can issue shootdowns, but what about failures?
- Unresponsive clients effectively hold their locks forever.
- Locks won't work if participants aren't reliable.
- Leases: locks that work even when clients go away (see the sketch after
this list).
- Idea: bake revocation in at allocation time.
- Server and client agree on the duration of the lease up front.
- After that, the server is safe to assume the client won't use its cached
data.
- Tradeoff: improving read performance at the expense of writes.
- Worst case: a write blocks for a full lease term, so the write rate to
contended data is bounded by 1/(lease term).
- Not a problem because in most workloads reads dominate.
- Problem: what about clock drift?
- Bad if the server's clock runs fast (it thinks the lease expired while the
client is still using it).
- Bad if the client's clock runs slow (it thinks the lease is still valid
after it expired).
- Need to assume bounded drift.
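
A minimal sketch of the mechanism in Go; the names (Server, Grant, Write)
and the single-file scope are illustrative assumptions, not the paper's API:

package lease

import "time"

// Server tracks one file's leases. A client may serve cached reads only
// while its lease is unexpired; the server may not apply a conflicting
// write until every outstanding lease has expired or been relinquished.
type Server struct {
    term    time.Duration        // agreed lease duration
    expires map[string]time.Time // client -> lease expiry
}

// Grant hands out (or renews) a lease. The client learns only the
// duration, so bounded clock drift suffices; no synchronized clocks.
func (s *Server) Grant(client string) time.Duration {
    s.expires[client] = time.Now().Add(s.term)
    return s.term
}

// Write waits until all leases have expired, after which the server can
// assume no client will serve stale cached data, then applies the write.
func (s *Server) Write(apply func()) {
    for _, exp := range s.expires {
        if d := time.Until(exp); d > 0 {
            time.Sleep(d)
        }
    }
    apply()
}

The paper also lets the server ask leaseholders to relinquish outstanding
leases, so a write need not always wait out the full term.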
Background
Why does this matter?
- NFS performance is less of a hot topic today.
- The basic notion of leases is ubiquitous in systems today.
Example: hand out ownership of tablets, partitions, etc.
- A KVS server contacts a coordinator and gets a lease on the tablets it
owns (see the sketch after this list).
- Pings to the coordinator effectively extend leases.
- Coordinator cannot reassign the tablets until lease is revoked.
- It can contact the server to revoke the lease to move it.
- Otherwise, if the server crashes, the coordinator cannot assign ownership
of the range to another server until it is sure that the old server has
stopped serving requests.
- The coordinator waits for the lease to expire.
- For this to work, servers must not perform any operations after lease
expiry.
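
A sketch of the coordinator side in Go (Coordinator, Ping, and Reassign are
hypothetical names; real systems differ in the details):

package coord

import "time"

type Coordinator struct {
    term   time.Duration     // lease/heartbeat term
    owner  map[int]string    // tablet -> owning KVS server
    expiry map[int]time.Time // tablet -> current lease expiry
}

// Ping is the server's heartbeat; it effectively extends the lease on
// every tablet that server still owns.
func (c *Coordinator) Ping(server string, tablets []int) {
    for _, t := range tablets {
        if c.owner[t] == server {
            c.expiry[t] = time.Now().Add(c.term)
        }
    }
}

// Reassign moves a tablet to a new owner. If the old owner did not ack an
// explicit revoke (e.g., it crashed), the only safe option is to wait out
// the lease: the old owner promised to stop serving at expiry.
func (c *Coordinator) Reassign(tablet int, newOwner string, revokeAcked bool) {
    if !revokeAcked {
        if d := time.Until(c.expiry[tablet]); d > 0 {
            time.Sleep(d)
        }
    }
    c.owner[tablet] = newOwner
    c.expiry[tablet] = time.Now().Add(c.term)
}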
Leases are also used as a simple form of 'leader election' (sketch below).
- Defer the heavy HA consensus to a small lease manager.
- [Possibly diagram out full system.]
- See Vertical Paxos for a fuller description and configurations.
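
A sketch of the pattern in Go, assuming a hypothetical LeaseManager service
(which itself would be the small, consensus-replicated component):

package leader

import "time"

type LeaseManager interface {
    // TryAcquire grants the leader lease if it is free or expired; ok is
    // false if another node currently holds it.
    TryAcquire(node string) (term time.Duration, ok bool)
}

type Node struct {
    id     string
    mgr    LeaseManager
    expiry time.Time // local, conservative view of our lease expiry
}

// Campaign attempts to become leader by acquiring the lease.
func (n *Node) Campaign() {
    if term, ok := n.mgr.TryAcquire(n.id); ok {
        n.expiry = time.Now().Add(term)
    }
}

// IsLeader: act as leader only strictly before local expiry (a real
// implementation would also subtract a clock-drift margin).
func (n *Node) IsLeader() bool { return time.Now().Before(n.expiry) }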
Choosing a Lease Term
- Tradeoff?
- Short terms (fast renewals): less benefit for reads, more renewal traffic.
- Long terms (slow renewals): more false sharing.
- e.g., a lease shootdown when the previous user was done with the file
anyway.
- The lease term also sets the minimum delay before recovery after a failure:
the server must wait out outstanding leases (worked example below).
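
A quick worked example (illustrative numbers): with a 10-second term, a
conflicting write can stall for up to 10 seconds and recovery after a client
failure waits up to 10 seconds; with a 1-second term both delays drop to 1
second, but clients renew ten times as often, eroding the read-side savings.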
Issues
- A problem:
- What can go wrong here even if the clocks are working fine?
- How do we make lease checks atomic with operations?
- Or at least conservative?
client:
    # v1: racy -- the lease can expire between the check and the read,
    # e.g., if the process is descheduled in between.
    if check_lease(blockId)
        return cache_read(blockId)

or:
    # v2: conservative read -- if the lease is still valid *after* the
    # read, the whole read happened within the lease term.
    if check_lease(blockId)
        r = cache_read(blockId)
        if check_lease(blockId)
            return r
    # otherwise: fall back and refetch from the server (always safe)

or:
    # v3: writes are harder -- if the second check fails, the cached
    # write already happened outside the lease term.
    if check_lease(blockId)
        cache_write(blockId, v)
        if check_lease(blockId)
            return ok
        else
            ???  # must undo the write -- hence the need for rollback
- Likely to happen? Yes: if the system starts swapping, a process can stall
arbitrarily long between a lease check and the operation.
Basically, writes need a form of transaction so they can be rolled back.
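
A minimal Go sketch of the conservative read (v2 above); Cache, leaseValid,
and the error handling are illustrative assumptions:

package cache

import (
    "errors"
    "time"
)

// Cache holds leased blocks; expiry is our conservative local view of
// each block's lease expiry.
type Cache struct {
    expiry map[string]time.Time
    data   map[string][]byte
}

var ErrLeaseExpired = errors.New("lease expired; refetch from server")

func (c *Cache) leaseValid(blockID string) bool {
    return time.Now().Before(c.expiry[blockID])
}

// Read returns cached data only if the lease was valid both before and
// after the read, so the whole read provably fell within the lease term.
func (c *Cache) Read(blockID string) ([]byte, error) {
    if !c.leaseValid(blockID) {
        return nil, ErrLeaseExpired
    }
    v := c.data[blockID]
    if !c.leaseValid(blockID) { // did we stall mid-read?
        return nil, ErrLeaseExpired
    }
    return v, nil
}

On ErrLeaseExpired the caller falls back to the server, which is always
safe, just slower.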
First, the problems with clocks:
- Clock drift
- End of section 5: "leases only requires that clocks have a known bounded
drift, in which case the lease term can be communicated as its
duration t"
- Not too strong of an assumption once you throw in a reasonable time sync
protocol.
- Clock skew
- Section 3.1: e is the allowance for clock skew.
- Here they seem to be assuming clock skew bounded by e.
- A stronger assumption, since clocks must read the same time, not just
advance at the same rate.
- With bounded skew, it's not hard to keep drift bounded too (see the
sketch below for the drift-only case).
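
A sketch of how a client might compute a conservative local expiry when the
lease arrives as a duration; the drift bound rho and the message-delay bound
maxDelay are assumed numbers, not from the paper:

package drift

import "time"

const (
    rho      = 1e-4                  // assumed bound on clock drift rate
    maxDelay = 50 * time.Millisecond // assumed bound on grant-message delay
)

// LocalExpiry computes a conservative local expiry for a lease received
// as a duration: the term started at the server before the grant arrived,
// and our clock may run slow, so shrink the term by both allowances.
func LocalExpiry(received time.Time, term time.Duration) time.Time {
    safe := time.Duration(float64(term-maxDelay) * (1 - rho))
    if safe < 0 {
        safe = 0
    }
    return received.Add(safe)
}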
What about cases where the ack from the lease manager takes a while to
arrive?
- This is ok if we assume bounded clock skew, probably broken if we assume
only bounded clock drift.
- Paper says (page 3): t_c = max(0, t_s - (m_prop + 2*m_proc) - e)
- Reading: the client's term t_c is the server's term t_s minus allowances
for message propagation (m_prop), message processing (m_proc), and clock
skew (e).
- What if we have a bunch of retransmits due to packet loss?
- How does the renewer know how long it's been since the lease was actually
renewed from the lease manager's perspective (without synchronized clocks)?
- One idea to help with this: send renewal messages unreliably and retransmit
at application level.
- If a renewal is lost, then the client will just assume the lease is lost
early and refresh from the server.
- Still, what do we do if the lease renewer's process is swapping and the
lease renewal gets processed late?
- Can add some protection in continuous-renewal situations by adding the
extension term to the last expiry time rather than to time.now() at the
moment the renewal is received (see the sketch below).
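
A sketch of that trick in Go (Lease, OnRenewAck, and Valid are illustrative
names):

package renew

import "time"

// Lease is the renewer's conservative view of its lease.
type Lease struct {
    expiry time.Time     // when the lease lapses, by our own clock
    term   time.Duration // extension granted per renewal
}

// OnRenewAck anchors the extension at the previous expiry rather than at
// time.Now(), so an ack that is processed late (retransmits, swapping, a
// long GC pause) cannot push our view past the lease manager's.
func (l *Lease) OnRenewAck() {
    l.expiry = l.expiry.Add(l.term) // NOT time.Now().Add(l.term)
}

// Valid reports whether we may still rely on the lease, by our own clock.
func (l *Lease) Valid() bool { return time.Now().Before(l.expiry) }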
Lab 4
- Load-balancing sharded fault-tolerant KVS
- Key space sharded for capacity/performance
- Key components:
- Replica group
- A Paxos-replicated KVS that serves a configurable set of shards.
- Shards
- Fixed-size blocks of key space.
- Number chosen up front.
- Shardmaster
- Passes out each shard to at most one replica group at a time (a sketch of
its state follows at the end of these notes).
- Paxos-replicated for HA
- Shardmaster must allow shards to be redistributed across replica groups over
time.
- Balance load, scale up, scale down.
- On reconfiguration, replica groups will have to 'hand off' shard contents
to the new owner.
- An assumption that makes life easier: replica groups are always available,
since they are HA with Paxos.
- Easier to revoke a shard, easier to ensure source/dest are available for
migration.
- You will probably reuse much of both parts of Lab 3.
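
A sketch of the shardmaster's core state in Go; the field names follow
common course skeletons and are assumptions, so defer to the actual lab
handout:

package shardmaster

// NShards is the fixed number of shards, chosen up front.
const NShards = 10

// Config is one assignment of shards to replica groups. The shardmaster
// stores the sequence of Configs and is itself Paxos-replicated for HA.
type Config struct {
    Num    int                // config number; bumped on reconfiguration
    Shards [NShards]int64     // shard -> replica group id (at most one owner)
    Groups map[int64][]string // replica group id -> that group's servers
}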