CS6963 Distributed Systems

Lecture 14 Leases

Leases: An Efficient Fault-Tolerant Mechanism
Gray and Cheriton
SOSP 1989

Overview

  • What is this about?
    • Context: V was a distributed OS; popular idea in the 80s
      • Plan 9, Sprite
      • Never materialized; distributed abstractions happened above the OS instead.
    • A file system server; clients have to cache files for performance.
    • But, how do we ensure consistent reads?
      • Can issue shootdowns, but what about failures?
      • Effectively unresponsive clients hold locks.
      • Locks won't work if participants aren't reliable.
    • Leases: locks that work even when clients go away.
      • Idea: bake revocation in at allocation time.
      • Server/client agree on duration of lease.
      • After that server is safe to assume client won't use cached data.
      • Tradeoff: improving read performance at the expense of writes.
      • Worst case: a conflicting write may wait a full lease term, so the write rate on a contended file is at most 1/(lease term).
      • Not a problem because in most workloads reads dominate.
      • Problem: what about clock drift?
      • Bad if the server's clock runs fast: it thinks the lease has expired while the client still trusts its cache.
      • Bad if the client's clock runs slow: it keeps using cached data past the real expiry.
      • Need to assume bounded drift (a minimal client-side sketch follows this list).
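
  • As a concrete illustration, a minimal client-side sketch in Go of honoring a lease under a bounded-drift assumption (names such as Lease and driftBound are invented, not from the paper):

package lease

import "time"

// Client-side view of a lease: when the grant/renewal request was sent
// (by the client's own clock) and the duration the server granted.
type Lease struct {
    grantedAt time.Time
    term      time.Duration
}

// driftBound is an assumed upper bound on relative clock drift (1% here).
const driftBound = 0.01

// Valid reports whether the client may still rely on its cached data.
// The term is shrunk by the drift bound so the client gives up the lease
// no later than the server considers it expired.
func (l Lease) Valid(now time.Time) bool {
    safeTerm := time.Duration(float64(l.term) * (1 - driftBound))
    return now.Sub(l.grantedAt) < safeTerm
}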

Background

  • Why does this matter?

    • NFS performance is less of a hot topic today.
    • The basic notion of leases is ubiquitous in systems today.
  • Example: hand out ownership of tablets, partitions, etc.

    • A KVS server contacts a coordinator and gets a lease on the tablets it owns.
    • Pings to the coordinator effectively extend leases.
    • Coordinator cannot reassign the tablets until lease is revoked.
    • It can contact the server to revoke the lease to move it.
    • Otherwise, if the server crashes, the coordinator cannot assign ownership of the range to another server until it is sure that the old server has stopped serving requests.
    • The coordinator waits for the lease to expire.
    • For this to work, servers must not perform any operations on a tablet after its lease expires (a coordinator-side sketch follows this list).
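
  • A coordinator-side sketch in Go of the rules above (all identifiers invented for illustration): renewals only extend a live lease held by the same server, and reassignment refuses to proceed until the old lease has expired or been revoked:

package coord

import (
    "sync"
    "time"
)

// tabletLease records which server holds a tablet and until when
// (all identifiers invented for illustration).
type tabletLease struct {
    owner   string
    expires time.Time
}

type Coordinator struct {
    mu     sync.Mutex
    leases map[string]tabletLease // tablet name -> current lease
    term   time.Duration          // lease duration handed out
}

// Renew extends the lease, but only if the caller still holds it.
func (c *Coordinator) Renew(tablet, server string, now time.Time) bool {
    c.mu.Lock()
    defer c.mu.Unlock()
    l, ok := c.leases[tablet]
    if !ok || l.owner != server || !now.Before(l.expires) {
        return false
    }
    l.expires = now.Add(c.term)
    c.leases[tablet] = l
    return true
}

// Reassign hands the tablet to a new server, but refuses while the old
// lease is still live, so two servers never both believe they own it.
func (c *Coordinator) Reassign(tablet, newServer string, now time.Time) bool {
    c.mu.Lock()
    defer c.mu.Unlock()
    if l, ok := c.leases[tablet]; ok && now.Before(l.expires) {
        return false // wait for expiry (or revoke via the old owner) first
    }
    c.leases[tablet] = tabletLease{owner: newServer, expires: now.Add(c.term)}
    return true
}
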
  • Used as a simple 'leader election'.

    • Defer the heavy HA consensus to a small lease manager (a leader-side sketch follows this list).
    • [Possibly diagram out full system.]
    • See Vertical Paxos for a fuller description and more configurations.
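
    • A leader-side sketch in Go (LeaseManager is a hypothetical interface, not a real API): the node acts as leader only while it holds a valid lease, and all of the heavy consensus hides behind the lease manager:

package leader

import "time"

// LeaseManager is a hypothetical interface to a small, highly available
// lease service (itself replicated with Paxos or similar); not a real API.
type LeaseManager interface {
    // TryAcquire returns a lease term and true if this node is now leader.
    TryAcquire(node string) (time.Duration, bool)
    // Renew extends the lease if this node still holds it.
    Renew(node string) (time.Duration, bool)
}

// LeadWhileHeld runs work() only while this node holds a valid lease;
// the heavy consensus lives entirely inside the lease manager.
func LeadWhileHeld(lm LeaseManager, node string, work func()) {
    term, ok := lm.TryAcquire(node)
    if !ok {
        return // someone else is leader
    }
    expiry := time.Now().Add(term)
    for time.Now().Before(expiry) {
        work() // must finish well before expiry; see the Issues section
        if t, ok := lm.Renew(node); ok {
            expiry = time.Now().Add(t)
        }
    }
    // Lease lapsed (or renewal failed): stop acting as leader immediately.
}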

Choosing a Lease Term

  • Tradeoff?
    • Short terms (fast renewals): more renewal traffic, less benefit for reads.
    • Long terms (slow renewals): more false sharing.
    • Lease shootdowns even when the previous user was already done with the file.
    • The renewal term also determines the minimum delay for recovery after failures (rough worked example below).
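  • Rough worked example (illustrative numbers, not from the paper): with a 10-second term, a client renews at most every 10 seconds, but a conflicting write may stall for up to 10 seconds and a crashed holder delays recovery by up to 10 seconds; a 1-second term cuts those worst-case waits to 1 second at the cost of roughly 10x the renewal traffic.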

Issues

  • A problem:
    • What can go wrong here even if the clocks are working fine?
    • How do we make lease checks atomic with operations?
    • Or at least conservative?
client:
  if check_lease(blockId)
    return cache_read(blockId)

or:
  if check_lease(blockId)
    r = cache_read(blockId)
    if check_lease(blockId)
      return r

or:
  if check_lease(blockId)
    cache_write(blockId, v)
    if check_lease(blockId)
      return ok
    else
      ???
  • Is this likely to happen? What if the system starts swapping and the operation is delayed past expiry?
  • Basically, we need a form of transaction so we can roll back (a conservative-read sketch follows).
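
  • One conservative pattern for reads, as a sketch in Go (identifiers invented; not the paper's design): check the lease with a safety margin before the read and re-check after it, discarding the result if the lease lapsed. Writes are harder, since a cached write may already be visible, which is why some form of rollback/transaction is needed.

package cache

import "time"

// Client is a hypothetical client-side cache holding, per block, the
// cached bytes and the local lease expiry time (identifiers invented).
type Client struct {
    expiry map[int]time.Time
    data   map[int][]byte
}

// margin guards against the operation itself taking time, e.g. the
// process being swapped out between the lease check and the read.
const margin = 100 * time.Millisecond

// Read returns cached data only if the lease is valid (with margin)
// before the read and still valid after it; otherwise the caller must
// go to the server. Discarding a stale read is safe; a stale cached
// write is not, which is why writes need rollback/transactions.
func (c *Client) Read(blockId int) ([]byte, bool) {
    if time.Now().Add(margin).After(c.expiry[blockId]) {
        return nil, false // lease (nearly) expired: refetch from the server
    }
    v := c.data[blockId]
    if time.Now().After(c.expiry[blockId]) {
        return nil, false // lease lapsed during the read: discard the result
    }
    return v, true
}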

  • First, the problems with clocks:

    • Clock drift
      • End of section 5: "leases only require that clocks have a known bounded drift, in which case the lease term can be communicated as its duration t"
      • Not too strong of an assumption once you throw in a reasonable time sync protocol.
    • Clock skew
      • Section 3.1: e: the allowance for clock skew.
      • Here it seems they are assuming clock skew bounded by e.
      • A stronger assumption, since clocks must show (nearly) the same time, not just advance at (nearly) the same rate.
      • With bounded skew, not too hard to keep bounded drift.
  • What about cases where the ack from the lease manager takes a while to arrive?

    • This is ok if we assume bounded clock skew, probably broken if we assume only bounded clock drift.
    • Paper says (page 3): t_c = max(0, t_s - (m_prop + 2*m_proc) - e), i.e., the client shortens its view of the term by the message propagation and processing delays and the skew allowance e.
    • What if we have a bunch of retransmits due to packet loss?
    • How does the renewer know how long it's been since the lease was actually renewed, from the lease manager's perspective, without synchronized clocks?
    • One idea to help with this: send renewal messages unreliably and retransmit at application level.
    • If a renewal is lost, the client just assumes the lease expired early and refreshes from the server.
    • Still, what do we do if the lease renewer's process is swapping and the renewal ack gets processed late?
    • Can add some protection in continuous-renewal situations by adding the extension term to the last expiry time rather than to the current time when the ack is received (sketched below).
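
    • A sketch in Go of a conservative client-side computation in this spirit (variable names and the renewal policy are illustrative): the local term follows the shape of t_c = max(0, t_s - (m_prop + 2*m_proc) - e), and renewals extend from the previous conservative expiry rather than from the time the ack happens to arrive:

package renew

import "time"

// localTerm conservatively shortens the server-granted term by the
// message propagation/processing delays and the skew allowance e,
// following the shape of t_c = max(0, t_s - (m_prop + 2*m_proc) - e).
func localTerm(serverTerm, mProp, mProc, e time.Duration) time.Duration {
    t := serverTerm - (mProp + 2*mProc) - e
    if t < 0 {
        return 0
    }
    return t
}

// renew extends from the previous conservative expiry rather than from
// the time the ack is processed, so late delivery (retransmits, a
// swapped-out process) does not push the client's expiry later than the
// lease manager's, assuming the manager also extends from the previous expiry.
func renew(prevExpiry time.Time, serverTerm, mProp, mProc, e time.Duration) time.Time {
    return prevExpiry.Add(localTerm(serverTerm, mProp, mProc, e))
}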

Lab 4

  • Load-balancing Sharded Fault-tolerant KVS
  • Key space sharded for capacity/performance
  • Key components (a rough illustrative sketch follows this list):
    • Replica group
      • A Paxos-replicated KVS that serves a configurable set of shards.
    • Shards
      • Fixed-size blocks of key space.
      • Number chosen up front.
    • Shardmaster
      • Passes out each shard to at most one replica group at a time.
      • Paxos replicated for HA
  • Shardmaster must allow shards to be redistributed across replica groups over time.
    • Balance load, scale up, scale down.
    • On reconfiguration, replica groups will have to 'hand off' shard contents to new owner.
  • Assumption that makes life easier: can assume replica groups are always available, since they are HA with Paxos.
    • Easier to revoke a shard, easier to ensure source/dest are available for migration.
  • You can probably reuse large parts of both parts of Lab 3.
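
  • Purely as an illustration of the pieces above (the lab handout defines the real types and RPCs), the shardmaster's configuration state in Go might look roughly like:

package shardmaster

// NShards is the fixed number of shards, chosen up front.
const NShards = 10

// Config describes one assignment of shards to replica groups.
// (Illustrative only; the lab handout defines the actual types.)
type Config struct {
    Num    int                // configuration number; bumps on each change
    Shards [NShards]int64     // shard index -> replica group id (0 = unassigned)
    Groups map[int64][]string // replica group id -> server addresses
}

// key2shard maps a key to its shard with a simple hash.
func key2shard(key string) int {
    var h uint32
    for i := 0; i < len(key); i++ {
        h = 31*h + uint32(key[i])
    }
    return int(h % NShards)
}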