CS6963 Distributed Systems

Lecture 05: Optimistic Concurrency Control

Efficient Optimistic Concurrency Control using Loosely Synchronized Clocks
by Adya, Gruber, Liskov and Maheshwari.

  • Why this paper?

    • Tuesday: Argus, 2PC + 2PL for serializable transactions
    • Can be slow: must take locks even when there is no contention
    • Scalability limited: lots of locks/unlocks flying around
    • Argus also fused data and computation: people don't like that
    • Thor has a more conventional model
    • Goal: scalable protocol for enforcing serializable transactions across objects.
  • Thor overview

    • [clients, client caches, servers A-M N-Z]
    • Data sharded over servers
    • Code runs in clients (not like Argus; not an RPC system)
    • Clients read/write DB records from servers
    • Clients cache data locally for fast access
    • On client cache miss, fetch from server
  • Thor arrangement is fairly close to modern big web site habits

    • Clients, local fast cache, slower DB servers
    • Similar to Facebook/memcache paper
    • but Thor has much better semantics, strong guarantees
    • As a result, prone to unavailability during failures.
  • Thor programs use fully general transactions

    • Multi-operation
    • Serializable
    • So can do bank xfers w/o losing money...
  • Client caching makes transactions tricky

    • Writes have to invalidate cached copies
    • How to cope with reads of stale cached data?
    • How to cope with read-modify-write races?
    • Clients could lock before using each record
    • But that's slow - probably need to contact server
    • Wrecks the whole point of fast local caching in clients
    • (though caching read locks might be OK, as in paper Eval)
  • Thor uses optimistic concurrency control (OCC)

    • An idea from the early 1980s
    • Just read and write the local copy
    • Don't worry about other transactions until commit
    • When transaction wants to commit:
    • Send read/write info to server for "validation"
    • Validation decides if OK to commit -- if serializable
    • If yes, send invalidates to clients with cached copies of written records
    • If no, abort, discard writes
    • Optimistic b/c hopes for no conflict
    • If turns out to be true, fast!
    • If false, validation can detect, but slow
  • What should validation do?

    • It looks at what the executing transactions read and wrote
    • Decides if there's a serial execution order that would have produced the same results as the actual concurrent execution
    • There are many OCC validation algorithms!
    • I will outline a few, leading up to Thor's
  • Validation scheme #1

    • First, just let clients read/write as they see fit
    • A single validation server
    • Clients tell the validation server the VALUES read and written by each transaction that wants to commit
    • The "read set" and "write set"
    • Validation must decide:
    • Would the results be serializable if we let these transactions commit?
    • Scheme #1 shuffles the transactions, looking for a serial order in which each read sees the value written by the most recent write; if such an order exists, the execution was serializable (a code sketch of this check follows example 1 below)

Validation example 1:

  initially, x=0 y=0 z=0
  T1: Rx0 Wx1
  T2: Rz0 Wz9
  T3: Ry1 Rx1
  T4: Rx0 Wy1
  • Validation needs to decide if this execution (reads, writes) is equivalent to some serial order.
  • Yes: one such order is T4, T1, T3, T2; says yes to all
    • (really T2 can go anywhere)
  • Note this scheme is far more permissive than Thor's

    • e.g. it allows transactions to see uncommitted writes (non-ACR)
  • OCC is neat b/c transactions don't need to lock!

    • So they can run quickly from client caches
    • Just one msg exchange w/ validator per transaction
    • Not one locking exchange per record used
    • OCC excellent for T2 which didn't conflict with anything
    • We got lucky for T1 T3 T4, which do conflict
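
A minimal sketch of scheme #1's check, in Python (my own illustration, not from the paper): try every serial order of the committing transactions and accept if some order makes every read see the value of the most recent prior write. The transaction representation and the exhaustive permutation search are assumptions for clarity; a real validator would not search like this.

    from itertools import permutations

    # A transaction is a list of operations: ("R", var, value) or ("W", var, value).
    def find_serial_order(txns, initial):
        """Return a serial order (tuple of txn names) in which every read sees
        the most recent prior write (or the initial value), or None if none exists."""
        for order in permutations(txns):
            state = dict(initial)
            ok = True
            for name in order:
                for op, var, val in txns[name]:
                    if op == "R" and state[var] != val:
                        ok = False          # this read didn't see the latest write in this order
                        break
                    if op == "W":
                        state[var] = val
                if not ok:
                    break
            if ok:
                return order
        return None

    # Validation example 1 above:
    txns = {
        "T1": [("R", "x", 0), ("W", "x", 1)],
        "T2": [("R", "z", 0), ("W", "z", 9)],
        "T3": [("R", "y", 1), ("R", "x", 1)],
        "T4": [("R", "x", 0), ("W", "y", 1)],
    }
    print(find_serial_order(txns, {"x": 0, "y": 0, "z": 0}))  # one valid order, e.g. ('T2', 'T4', 'T1', 'T3')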

Validation example 2 -- sometimes must abort:

  initially, x=0 y=0
  T1: Rx0 Wx1
  T2: Rx0 Wy1
  T3: Ry0 Rx1
  • Values not consistent w/ any serial order!

    • T1 -> T3 (via x)
    • T3 -> T2 (via y)
    • T2 -> T1 (via x)
    • There's a cycle, so not the same as any serial execution
    • Perhaps T3 read a stale y=0 from cache, or T2 read a stale x=0 from cache
    • In this case validation can abort one of them; then the others are OK to commit
    • e.g. abort T2
    • Then T1, T3 is OK (but not T3, T1)
  • How should client handle abort?

    • Roll back the program (including writes to program variables)
    • Re-run from start of transaction
    • Hopefully there won't be conflicts the second time
    • OCC is best when conflicts are uncommon!
  • Do we need to validate read-only transactions?

Example:

    initially x=0 y=0
    T1: Wx1
    T2: Rx1 Wy2
    T3: Ry2 Rx0
  • i.e. T3 read a stale x=0 from its cache, hadn't yet seen the invalidation.
  • Need to validate in order to abort T3.
  • Other OCC schemes can avoid validating read-only transactions

    • Keep multiple versions -- but Thor and my schemes don't
  • Is OCC better than locking?

    • yes, if few conflicts
    • avoids lock msgs, clients don't have to wait for locks
    • no, if many conflicts
    • OCC aborts, must re-start, perhaps many times
    • locking waits

Example: simultaneous increment

    locking:
      T1: Rx0 Wx1
      T2: -------Rx1  Wx2
    OCC:
      T1: Rx0 Wx1
      T2: Rx0 Wx1

OCC: fast but wrong; must abort one

  • We really want distributed OCC validation

    • Split storage and validation load over servers
    • Each storage server sees only txns that use its data
    • Each storage server validates just its part of the txn
    • Two-phase commit (2PC) to check that they all say "yes"
    • Only really commit if all relevant servers say "yes" (see the coordinator sketch below)
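
The overall shape of that protocol, as a hedged sketch (the server methods holds_data_for, prepare, commit, and abort are assumed names for illustration, not Thor's actual interfaces): the coordinator asks each server holding data the txn touched to validate its part, and really commits only if every vote is "yes".

    def two_phase_commit(servers, txn):
        # Only the servers that store data this txn read or wrote participate.
        participants = [s for s in servers if s.holds_data_for(txn)]
        votes = [s.prepare(txn) for s in participants]   # phase 1: each server validates its part
        decision = all(votes)                            # commit only if every vote is "yes"
        for s in participants:
            if decision:
                s.commit(txn)    # phase 2: install writes, send invalidations to caching clients
            else:
                s.abort(txn)     # phase 2: discard this txn's writes
        return decision
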
  • Can we just distribute validation scheme #1?

  • Imagine server S1 knows about x, server S2 knows about y

Example 2 again:

    T1: Rx0 Wx1
    T2: Rx0 Wy1
    T3: Ry0 Rx1

S1 validates just x information:

    T1: Rx0 Wx1
    T2: Rx0
    T3: Rx1

Answer is "yes" (T2 T1 T3)
S2 validates just y information:

    T2: Wy1
    T3: Ry0

Answer is "yes" (T3 T2)
but we know the real answer is "no"!

  • So simple distributed validation does not work

    • The validators must choose the same serial order!
  • Validation scheme #2

    • Idea: client (or coordinator) chooses timestamp for committing txn
      • from loosely synchronized clocks, as in Thor
    • Validation checks that reads and writes are consistent with TS order
    • Solves distrib validation problem:
      • Timestamps tell the validators the order to check
      • So "yes" votes will refer to the same order

Example 2 again, with timestamps:

  T1@100: Rx0 Wx1
  T2@110: Rx0 Wy1
  T3@105: Ry0 Rx1

S1 validates just x information:

    T1@100: Rx0 Wx1
    T2@110: Rx0
    T3@105: Rx1

Timestamps say order must be T1, T3, T2
does not validate! T2 should have seen x=1

S2 validates just y information:

    T2@110: Wy1
    T3@105: Ry0

Timestamps say order must be T3, T2
validates!

S1 says no, S2 says yes, two-phase commit coordinator will abort
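
A minimal per-server sketch of the check S1 and S2 just performed (batching all the committing txns and using an assumed representation, purely for illustration): sort by timestamp, replay only this server's variables, and vote "no" if any read saw a value inconsistent with that order.

    def validate_by_timestamp(txns_with_ts, initial):
        """txns_with_ts: list of (ts, ops); ops like ("R", var, value) or ("W", var, value).
        Replays the ops in timestamp order and votes "no" if a read saw the wrong value."""
        state = dict(initial)
        for ts, ops in sorted(txns_with_ts, key=lambda t: t[0]):
            for op, var, val in ops:
                if op == "R" and state[var] != val:
                    return False              # inconsistent with TS order -> vote "no"
                if op == "W":
                    state[var] = val
        return True                           # vote "yes"

    # S1's view of example 2 (x only): T2@110 read x=0 but should have seen T1's x=1.
    s1_view = [(100, [("R", "x", 0), ("W", "x", 1)]),
               (110, [("R", "x", 0)]),
               (105, [("R", "x", 1)])]
    print(validate_by_timestamp(s1_view, {"x": 0}))   # False -> S1 votes no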

  • What have we given up by requiring timestamp order?

Example:

    T1@100: Rx0 Wx1
    T2@50: Rx1 Wx2
  • T2 follows T1 in real time, and sees T1's write

    • but T2 will fail validation: TS order puts T2 before T1, so T2 should have read x=0, not T1's x=1
      • could have committed, since T1 then T2 works
    • this will happen if client clocks are too far off
      • if T1's client clock is ahead, or T2's behind
    • so: requiring TS order can abort unnecessarily
      • Because validation no longer searching for an order that works
      • Instead merely checking that TS order consistent w/ reads, writes
      • We've given up some optimism by requiring TS order
    • Maybe not a problem if clocks closely synched
    • Maybe not a problem if conflicts are rare
  • Problem with schemes so far:

    • Commit messages contained values, which can be big
    • Could instead use version numbers to check whether later txn read earlier txn's write
    • Let's use writing txn's TS as record version number
  • Validation scheme #4

    • Tag each DB record (and cached record) with the TS of the txn that last wrote it
    • Validation requests carry TS of each record read

Our example for scheme #4:

  all values start with timestamp 0
  T1@100: Rx@0 Wx
  T2@110: Rx@0 Wy
  T3@105: Ry@0 Rx@100
  • Note:
    • Reads carry the timestamp that was in the record read, not the value
    • Writes don't include either value or timestamp

S1 validates just x information: orders the transactions by timestamp:

    T1@100: Rx@0 Wx
    T3@105: Rx@100
    T2@110: Rx@0

The question: does each read see the most recent write?
T3 is ok, but T2 is not

S2 validates just y information: again, sort by TS, check each read saw latest write:

    T3@105: Ry@0
    T2@110: Wy

This does validate
So scheme #4 aborts, correctly, reasoning only about version TSs
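
The same per-server check expressed over version timestamps only, as a hedged sketch (representation assumed, batched for illustration): reads report the version TS they saw, writes carry no value at all, and a server votes "no" if a read missed a more recent write.

    def validate_by_version(txns_with_ts, initial_versions):
        """txns_with_ts: list of (txn_ts, ops); ops are ("R", var, seen_version_ts) or ("W", var).
        No values at all -- only the TS of the write each read observed."""
        version = dict(initial_versions)          # var -> TS of the latest write so far
        for txn_ts, ops in sorted(txns_with_ts, key=lambda t: t[0]):
            for op in ops:
                if op[0] == "R":
                    _, var, seen_ts = op
                    if version[var] != seen_ts:   # read missed a more recent write -> vote "no"
                        return False
                else:
                    _, var = op
                    version[var] = txn_ts         # record is now tagged with this txn's TS
        return True

    # S1's view of the scheme #4 example (x only): T2@110 read version @0,
    # but the latest write before TS 110 is T1's @100, so S1 votes no.
    s1_view = [(100, [("R", "x", 0), ("W", "x")]),
               (105, [("R", "x", 100)]),
               (110, [("R", "x", 0)])]
    print(validate_by_version(s1_view, {"x": 0}))   # False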

  • What have we given up by thinking about version #s rather than values?
    • Maybe version numbers differ but the values are the same, e.g.
    T1@100: Wx1
    T2@110: Wx2
    T3@120: Wx1
    T4@130: Rx1@100
  • Timestamps say we should abort T4 b/c it read a stale version

    • Should have read T3's write
    • So scheme #4 will abort
    • But T4 read the correct value: x=1
    • So abort wasn't necessary
  • Problem: per-record timestamp might use too much storage space

    • Thor wants to avoid space overhead
    • Maybe important, maybe not
  • Validation scheme #5

    • Thor's validation scheme: no timestamps on records
    • How can validation detect that a transaction read stale data?
    • It read stale data b/c earlier txn's invalidation hadn't yet arrived!
    • So server can track invalidation msgs that might not have arrived yet
      • "invalid set" - one per client
      • Delete invalid set entry when client ACKs invalidation msg
      • Server aborts committing txn if it read record in client's invalid set
      • Client aborts running txn if it read record mentioned in invalidation
  • Example use of invalid set

    • [timeline]
    • Client C1:
      • T2@105 ... Rx ... 2PC commit point
      • imagine that client acts as 2PC coordinator
    • Server:
      • VQ (validation queue): T1@100 Wx
      • T1 committed, x in C1's invalid set
        • server has sent invalidation message to C1
  • Three cases:

    1. Invalidation arrives before T2 reads:
      • Rx will miss in client cache, so it reads fresh data from the server
      • Client will (probably) return ACK before T2 commits
      • Server won't abort T2
    2. Invalidation arrives after T2 reads, before commit point:
      • Client will abort T2 in response to invalidation
    3. Invalidation arrives after 2PC commit point:
      • i.e. after all servers replied to prepare
      • This means x was still in the client's invalid set when the server tried to validate the transaction
      • So the server aborted, so the client will abort too

So: Thor's validation detects stale reads w/o timestamp on each record.
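
A minimal sketch of the server-side invalid-set bookkeeping just described (class and method names are my assumptions for illustration; the real Thor server also keeps a validation queue, truncates state using loosely synchronized clocks, etc.):

    class ServerSketch:
        def __init__(self):
            self.invalid_set = {}          # client_id -> records invalidated but not yet ACKed

        def record_written(self, record, caching_clients):
            # After a txn that wrote `record` commits, remember which clients hold a
            # now-stale cached copy and send each an invalidation message.
            for c in caching_clients:
                self.invalid_set.setdefault(c, set()).add(record)
                self.send_invalidation(c, record)

        def send_invalidation(self, client, record):
            pass   # placeholder; in Thor invalidations piggyback on other messages

        def invalidation_acked(self, client, record):
            # Once the client ACKs, it can no longer be caching a stale copy.
            self.invalid_set.get(client, set()).discard(record)

        def validate_reads(self, client, read_set):
            # At prepare time, vote "no" if the txn read any record that may still
            # be stale in this client's cache (invalidation not yet ACKed).
            stale = self.invalid_set.get(client, set())
            return stale.isdisjoint(read_set)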

Performance

  • Figure 5

    • AOCC is Thor
    • Comparing to ACBL: client talks to server to get write locks and to commit non-read-only txns, but can cache read locks along with data
    • why does Thor (AOCC) have higher throughput?
      • fewer msgs; commit only, no lock msgs
    • why does Thor throughput go up for a while w/ more clients?
      • apparently a single client can't keep all resources busy
      • maybe due to network RTT?
      • maybe due to client processing time? or think time?
      • more clients -> more parallel xactions -> more completed
    • why does Thor throughput level off?
      • maybe 15 clients is enough to saturate server disk or CPU
      • roughly 100 xactions/second, about right for disk writes
    • why does Thor throughput drop with many clients?
      • more clients means more concurrent xactions at any given time
      • more concurrency means more chance of conflict
      • for OCC, more conflict means more aborts, so more wasted CPU
  • Conclusions

    • Fast client caching + transactions would be excellent
    • Distributed OCC very interesting, still an open research area
    • Avoiding per-record version #s doesn't seem compelling
    • Thor's use of time was influential, e.g. Spanner