Lecture 05: Optimistic Concurrency Control
Efficient Optimistic Concurrency Control using Loosely Synchronized Clocks
by Adya, Gruber, Liskov and Maheshwari.
Why this paper?
- Tuesday: Argus, 2PC + 2PL for serializable transactions
- Can be slow: must take locks even when there is no contention
- Scalability limited: lots of locks/unlocks flying around
- Argus also fused data and computation: people don't like that
- Thor has a more conventional model
- Goal: scalable protocol for enforcing serializable transactions across
objects.
Thor overview
- [clients, client caches, servers A-M N-Z]
- Data sharded over servers
- Code runs in clients (not like Argus; not an RPC system)
- Clients read/write DB records from servers
- Clients cache data locally for fast access
- On client cache miss, fetch from server
Thor's arrangement is fairly close to modern big web site habits
- Clients, local fast cache, slower DB servers
- Similar to Facebook/memcache paper
- but Thor has much better semantics, strong guarantees
- As a result, prone to unavailability during failures.
Thor programs use fully general transactions
- Multi-operation
- Serializable
- So can do bank xfers w/o losing money...
Client caching makes transactions tricky
- Writes have to invalidate cached copies
- How to cope with reads of stale cached data?
- How to cope with read-modify-write races?
- Clients could lock before using each record
- But that's slow - probably need to contact server
- Wrecks the whole point of fast local caching in clients
- (though caching read locks might be OK, as in the paper's evaluation)
Thor uses optimistic concurrency control (OCC)
- An idea from the early 1980s
- Just read and write the local copy
- Don't worry about other transactions until commit
- When transaction wants to commit:
- Send read/write info to server for "validation"
- Validation decides if OK to commit -- if serializable
- If yes, send invalidates to clients with cached copies of written records
- If no, abort, discard writes
- Optimistic b/c hopes for no conflict
- If turns out to be true, fast!
- If false, validation can detect, but slow
What should validation do?
- It looks at what the executing transactions read and wrote
- Decides if there's a serial execution order that would have produced
  the same results as the actual concurrent execution
- There are many OCC validation algorithms!
- I will outline a few, leading up to Thor's
Validation scheme #1
- First, just let clients read/write as they see fit
- A single validation server
- Clients tell the validation server the read and write VALUES
  seen by each transaction that wants to commit
- "read set" and "write set"
- Validation must decide:
- Would the results be serializable if we let these transactions commit?
- Scheme #1 shuffles the transactions, looking for a serial order
  in which each read sees the value written by the most recent
  preceding write; if such an order exists, the execution was serializable.
Validation example 1:
initially, x=0 y=0 z=0
T1: Rx0 Wx1
T2: Rz0 Wz9
T3: Ry1 Rx1
T4: Rx0 Wy1
These validate: e.g. serial order T2, T4, T1, T3 produces the same values.
Validation example 2 -- sometimes must abort:
initially, x=0 y=0
T1: Rx0 Wx1
T2: Rx0 Wy1
T3: Ry0 Rx1
Values not consistent w/ any serial order!
- T1 -> T3 (via x)
- T3 -> T2 (via y)
- T2 -> T1 (via x)
- There's a cycle, so not the same as any serial execution
- Perhaps T3 read a stale y=0 from cache, or T2 read a stale x=0 from cache
- In this case validation can abort one of them then others are OK to commit
- e.g. abort T2
- Then T1, T3 is OK (but not T3, T1)
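Scheme #1 can be sketched as a brute-force search over serial orders. This is my own illustration (function names and the read-set/write-set representation are invented, not from the paper): try every permutation, replaying writes and checking that each read saw the value current at that point.

```python
from itertools import permutations

# Sketch of validation scheme #1 (invented data shapes): a transaction is
# (reads, writes), where reads maps record -> value seen and writes maps
# record -> value written.

def validates(txns, initial):
    """Return True if some serial order of txns reproduces the observed
    read values, starting from the 'initial' record values."""
    for order in permutations(txns):
        state = dict(initial)
        ok = True
        for reads, writes in order:
            # every read must see the currently committed value
            if any(state[k] != v for k, v in reads.items()):
                ok = False
                break
            state.update(writes)   # apply this txn's writes
        if ok:
            return True
    return False

# Example 2 from the notes: no serial order matches all three.
t1 = ({"x": 0}, {"x": 1})
t2 = ({"x": 0}, {"y": 1})
t3 = ({"y": 0, "x": 1}, {})
print(validates([t1, t2, t3], {"x": 0, "y": 0}))   # False: must abort one
print(validates([t1, t3], {"x": 0, "y": 0}))       # True after aborting T2
```

Trying all permutations is exponential, which is one reason real systems (including Thor) validate with cheaper per-transaction checks instead.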
How should client handle abort?
- Roll back the program (including writes to program variables)
- Re-run from start of transaction
- Hopefully won't be conflicts the second time
- OCC is best when conflicts are uncommon!
Do we need to validate read-only transactions?
Example:
initially x=0 y=0
T1: Wx1
T2: Rx1 Wy2
T3: Ry2 Rx0
Yes: T3 is read-only, but its reads aren't consistent with any serial
order -- seeing y=2 implies T2 committed, which implies x=1, not x=0.
Example: simultaneous increment
locking:
T1: Rx0 Wx1
T2: -------Rx1 Wx2
OCC:
T1: Rx0 Wx1
T2: Rx0 Wx1
OCC: fast but wrong; must abort one
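The increment race can be reproduced with a toy single-server OCC store. This is entirely my own sketch (Thor's actual scheme is described later): each transaction records the version of every record it reads and buffers its writes; commit validates that those versions are still current.

```python
import threading

# Toy single-server OCC (invented design, for illustration only).
class Store:
    def __init__(self, data):
        self.data = {k: (v, 0) for k, v in data.items()}  # key -> (value, version)
        self.lock = threading.Lock()

    def begin(self):
        return Txn(self)

class Txn:
    def __init__(self, store):
        self.store, self.reads, self.writes = store, {}, {}

    def read(self, k):
        if k in self.writes:               # read your own buffered write
            return self.writes[k]
        v, ver = self.store.data[k]
        self.reads[k] = ver                # remember the version we saw
        return v

    def write(self, k, v):
        self.writes[k] = v                 # buffer until commit

    def commit(self):
        with self.store.lock:              # validate + install atomically
            for k, ver in self.reads.items():
                if self.store.data[k][1] != ver:
                    return False           # stale read: abort
            for k, v in self.writes.items():
                _, ver = self.store.data.get(k, (None, 0))
                self.store.data[k] = (v, ver + 1)
            return True

# Simultaneous increment: both read x=0; only the first commit validates.
s = Store({"x": 0})
t1, t2 = s.begin(), s.begin()
t1.write("x", t1.read("x") + 1)
t2.write("x", t2.read("x") + 1)
print(t1.commit())   # True
print(t2.commit())   # False: t2 read a stale version, must re-run
```

The aborted transaction re-runs from the start, reads x=1, and commits on the second try -- the roll-back-and-retry loop described above.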
We really want distributed OCC validation
- Split storage and validation load over servers
- Each storage server sees only txns that use its data
- Each storage server validates just its part of the txn
- Two-phase commit (2PC) to check that they all say "yes"
- Only really commit if all relevant servers say "yes"
Can we just distribute validation scheme #1?
Imagine server S1 knows about x, server S2 knows about y
Example 2 again:
T1: Rx0 Wx1
T2: Rx0 Wy1
T3: Ry0 Rx1
S1 validates just x information:
T1: Rx0 Wx1
T2: Rx0
T3: Rx1
Answer is "yes" (T2 T1 T3)
S2 validates just y information:
Answer is "yes" (T3 T2)
but we know the real answer is "no"!
Example 2 again, with timestamps:
T1@100: Rx0 Wx1
T2@110: Rx0 Wy1
T3@105: Ry0 Rx1
S1 validates just x information:
T1@100: Rx0 Wx1
T2@110: Rx0
T3@105: Rx1
Timestamps say order must be T1, T3, T2
does not validate! T2 should have seen x=1
S2 validates just y information:
Timestamps say order must be T3, T2
validates!
S1 says no, S2 says yes, two-phase commit coordinator will abort
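The per-server timestamp check can be sketched like this (the data shapes and function name are invented for illustration): each server sorts the transactions it has seen by timestamp, replays the writes to its own records, and checks that every read saw the then-current value.

```python
# Sketch of per-server validation in timestamp order (invented shapes):
# a transaction is (ts, reads, writes), restricted to this server's records.

def server_validate(txns, initial):
    """Vote yes only if, in timestamp order, every read of this server's
    records saw the most recent value."""
    state = dict(initial)
    for ts, reads, writes in sorted(txns, key=lambda t: t[0]):
        if any(state[k] != v for k, v in reads.items()):
            return False                   # stale read -> vote no
        state.update(writes)
    return True

# S1 (owns x): timestamp order is T1@100, T3@105, T2@110
s1 = [(100, {"x": 0}, {"x": 1}),
      (110, {"x": 0}, {}),
      (105, {"x": 1}, {})]
# S2 (owns y): timestamp order is T3@105, T2@110
s2 = [(110, {}, {"y": 1}),
      (105, {"y": 0}, {})]
print(server_validate(s1, {"x": 0}))   # False: T2@110 should have seen x=1
print(server_validate(s2, {"y": 0}))   # True
```

Under 2PC the coordinator commits only if every server votes yes; here S1 votes no, so the transaction aborts -- without S1 and S2 ever comparing notes.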
- What have we given up by requiring timestamp order?
Example:
T1@100: Rx0 Wx1
T2@50: Rx1 Wx2
Serializable as T1 then T2, but the timestamps force T2 first, so
validation aborts an execution a value-based scheme could commit.
Our example for scheme #4:
all values start with timestamp 0
T1@100: Rx@0 Wx
T2@110: Rx@0 Wy
T3@105: Ry@0 Rx@100
- Note:
- Reads have timestamp that was in read record, not value
- Writes don't include either value or timestamp
S1 validates just x information: orders the transactions by timestamp:
T1@100: Rx@0 Wx
T3@105: Rx@100
T2@110: Rx@0
The question: does each read see the most recent write?
T3 is ok, but T2 is not
S2 validates just y information: again, sort by TS, check each read saw latest write:
This does validate
So scheme #4 aborts, correctly, reasoning only about version TSs
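Scheme #4's check can be sketched with invented data shapes: each read carries the version timestamp it observed (not the value), each write just names the record, and the server checks that every read saw the latest version in timestamp order.

```python
# Sketch of scheme #4 (invented shapes): a transaction is
# (ts, reads, writes), where reads maps record -> version ts the txn saw,
# and writes is just the set of records written (no values, no versions).

def server_validate_v4(txns, init_ver):
    ver = dict(init_ver)                   # record -> ts of latest write
    for ts, reads, writes in sorted(txns, key=lambda t: t[0]):
        # each read must have seen the most recent version in ts order
        if any(ver[k] != seen for k, seen in reads.items()):
            return False
        for k in writes:
            ver[k] = ts                    # a write installs version @ts
    return True

# S1's view of the example: T1@100 Rx@0 Wx; T3@105 Rx@100; T2@110 Rx@0
s1 = [(100, {"x": 0}, {"x"}),
      (105, {"x": 100}, set()),
      (110, {"x": 0}, set())]
# S2's view: T3@105 Ry@0; T2@110 Wy
s2 = [(110, {}, {"y"}),
      (105, {"y": 0}, set())]
print(server_validate_v4(s1, {"x": 0}))   # False: T2 read a stale version
print(server_validate_v4(s2, {"y": 0}))   # True
```

Note the validation never looks at values, which is exactly what makes the false abort in the next example possible.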
- What have we given up by thinking about version #s rather than values?
- Maybe version numbers are different but values are the same e.g.
T1@100: Wx1
T2@110: Wx2
T3@120: Wx1
T4@130: Rx1@100
Timestamps say we should abort T4 b/c it read a stale version
- Should have read T3's write
- So scheme #4 will abort
- But T4 read the correct value: x=1
- So abort wasn't necessary
Problem: per-record timestamp might use too much storage space
- Thor wants to avoid space overhead
- Maybe important, maybe not
Validation scheme #5
- Thor's validation scheme: no timestamps on records
- How can validation detect that a transaction read stale data?
- It read stale data b/c earlier txn's invalidation hadn't yet arrived!
- So server can track invalidation msgs that might not have arrived yet
- "invalid set" - one per client
- Delete invalid set entry when client ACKs invalidation msg
- Server aborts committing txn if it read record in client's invalid set
- Client aborts running txn if it read record mentioned in invalidation
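The server-side invalid-set bookkeeping can be sketched as follows (class and method names are my own invention; the real Thor server does much more, e.g. the validation queue and 2PC):

```python
# Minimal sketch of Thor-style invalid sets (invented design).
class Server:
    def __init__(self):
        self.invalid = {}                  # client -> set of records whose
                                           # invalidation is not yet ACKed

    def committed_write(self, record, caching_clients):
        # a txn writing 'record' committed; note the outstanding
        # invalidation for each client caching it (msg sent separately)
        for c in caching_clients:
            self.invalid.setdefault(c, set()).add(record)

    def ack_invalidation(self, client, record):
        # client ACKed the invalidation: it no longer has a stale copy
        self.invalid.get(client, set()).discard(record)

    def validate(self, client, read_set):
        # abort if the txn read any record the client might hold stale
        return not (read_set & self.invalid.get(client, set()))

srv = Server()
srv.committed_write("x", ["C1"])    # T1@100 Wx commits; C1 caches x
print(srv.validate("C1", {"x"}))    # False: C1's Rx may have been stale
srv.ack_invalidation("C1", "x")     # C1 ACKs (after refetching x)
print(srv.validate("C1", {"x"}))    # True: any later read of x was fresh
```

The point is that no per-record timestamp is needed: the server only has to remember which invalidations are still in flight per client.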
Example use of invalid set
- [timeline]
- Client C1:
- T2@105 ... Rx ... 2PC commit point
- imagine that client acts as 2PC coordinator
- Server:
- VQ: T1@100 Wx
- T1 committed, x in C1's invalid set
- server has sent invalidation message to C1
Three cases:
- Invalidation arrives before T2 reads:
- Rx will miss in client cache, so it reads fresh data from the server
- Client will (probably) return ACK before T2 commits
- Server won't abort T2
- Invalidation arrives after T2 reads, before commit point:
- Client will abort T2 in response to invalidation
- Invalidation arrives after 2PC commit point:
- i.e. after all servers replied to prepare
- This means x was still in C1's invalid set when the
  server validated the transaction
- So the server aborted it, and the client will abort too
So: Thor's validation detects stale reads w/o timestamp on each record.
Performance
Figure 5
- AOCC is Thor
- Comparing to ACBL: client talks to server to get write locks
  and to commit non-read-only txns, but can cache read locks along with data
- why does Thor (AOCC) have higher throughput?
- fewer msgs; commit only, no lock msgs
- why does Thor throughput go up for a while w/ more clients?
- apparently a single client can't keep all resources busy
- maybe due to network RTT?
- maybe due to client processing time? or think time?
- more clients -> more parallel xactions -> more completed
- why does Thor throughput level off?
- maybe 15 clients is enough to saturate server disk or CPU
- about 100 xactions/second, about right for disk write rates
- why does Thor throughput drop with many clients?
- more clients means more concurrent xactions at any given time
- more concurrency means more chance of conflict
- for OCC, more conflict means more aborts, so more wasted CPU
Conclusions
- Fast client caching + transactions would be excellent
- Distributed OCC very interesting, still an open research area
- Avoiding per-record version #s doesn't seem compelling
- Thor's use of time was influential, e.g. Spanner