Lecture 02 Communication, Messaging, and RPC
Remote Procedure Call (RPC)
Client:
  z = fn(x, y)
Server:
  fn(x, y) {
    compute
    return z
  }
RPC message diagram:
  Client             Server
    request--->
       <---response
Software structure:
  client app         handlers
    stubs           dispatcher
   RPC lib           RPC lib
     net  ------------ net
A few details:
- Which server function (handler) to call?
- Marshalling: format data into packets
- Tricky for arrays, pointers, objects, etc.
- Go's RPC library is pretty powerful!
- some things you cannot pass: e.g., channels, functions
- Binding: how does client know who to talk to?
- Maybe client supplies server host name
- Maybe a name service maps service names to best server host
- Threads:
- Client often has many threads, so > 1 call outstanding, match up replies
- Handlers may be slow, so server often runs each in a thread
RPC problem: what to do about failures?
- (This will become the single most asked question in this class.)
- e.g. lost packet, broken network, slow server, crashed server
- Q: Doesn't TCP solve this?
- What if you never get an ACK? Does that guarantee it didn't happen?
- What should we do between 'broken' TCP connections?
What does a failure look like to the client RPC library?
- Client never sees a response from the server
- Client does not know if the server saw the request!
- Maybe server/net failed just before sending reply
- (diagram of lost reply)
Simplest scheme: "at least once" behavior
- RPC library waits for response for a while
- If none arrives, re-send the request
- Do this a few times
- Still no response -- return an error to the application
Q: is "at least once" easy for applications to cope with?
Simple problem w/ at least once:
- client sends "deduct $10 from bank account"
Q: what can go wrong with this client program?
- Put("k", 10) -- an RPC to set key's value in a DB server
- Put("k", 20) -- client then does a 2nd Put to same key
- [diagram, timeout, re-send, original arrives very late]
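One way to see the problem in the Put example is to replay the requests in the order the server might receive them. The sketch below assumes a hypothetical `kvServer` with no duplicate detection, and assumes the arrival order from the diagram: the re-sent Put(10), then Put(20), then the delayed original Put(10).

```go
package main

import "fmt"

// kvServer is a hypothetical stand-in for the DB server. It applies every
// request it receives and has no duplicate detection.
type kvServer struct {
	data map[string]int
}

func (s *kvServer) Put(key string, val int) {
	s.data[key] = val
}

// replay applies Put requests for key "k" in network arrival order and
// returns the value the server ends up with.
func replay(arrivals []int) int {
	s := &kvServer{data: map[string]int{}}
	for _, v := range arrivals {
		s.Put("k", v)
	}
	return s.data["k"]
}

func main() {
	// Client's view: Put("k", 10) timed out and was re-sent, then Put("k", 20).
	// Server's view: re-sent Put(10), Put(20), and finally the late original Put(10).
	fmt.Println(replay([]int{10, 20, 10})) // prints 10, though the client's last Put was 20
}
```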
Q: is at-least-once ever OK?
- yes: if it's OK to repeat operations, e.g. read-only op
- yes: if application has its own plan for coping w/ duplicates
- which you will need for Lab 1
Better RPC behavior: "at most once"
- idea: server RPC code detects duplicate requests
- returns previous reply instead of re-running handler
- Q: how to detect a duplicate request?
- client includes unique ID (XID) with each request
- uses same XID for re-send
server:
  if seen[xid]:
    r = old[xid]
  else:
    r = handler()
    old[xid] = r
    seen[xid] = true
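The server-side check can be written in Go along these lines. This is a minimal sketch under assumptions: `dedupServer`, `newDedupServer`, and `Handle` are hypothetical names, the `seen` and `old` tables are merged into one map, and saved replies are never discarded.

```go
package main

import (
	"fmt"
	"sync"
)

// dedupServer remembers the reply for every XID it has executed.
// It never discards entries, so its memory use grows without bound.
type dedupServer struct {
	mu      sync.Mutex
	old     map[int64]int32 // xid -> saved reply; presence means "seen"
	handler func(arg int32) int32
}

func newDedupServer(h func(int32) int32) *dedupServer {
	return &dedupServer{old: map[int64]int32{}, handler: h}
}

// Handle runs the handler at most once per xid and returns the saved
// reply for any duplicate.
func (s *dedupServer) Handle(xid int64, arg int32) int32 {
	s.mu.Lock()
	defer s.mu.Unlock()
	if r, seen := s.old[xid]; seen {
		return r // duplicate: do not re-run the handler
	}
	r := s.handler(arg)
	s.old[xid] = r
	return r
}

func main() {
	runs := 0
	s := newDedupServer(func(a int32) int32 { runs++; return a + 1 })
	fmt.Println(s.Handle(7, 100)) // prints 101
	fmt.Println(s.Handle(7, 100)) // prints 101 again, from the saved reply
	fmt.Println(runs)             // prints 1
}
```

Holding the mutex while the handler runs keeps the sketch short, but it serializes all requests; a real server would likely want something finer-grained.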
some at-most-once complexities
- this will come up in labs 2 and on
- how to ensure XID is unique?
- big random number?
- combine unique client ID (ip address?) with sequence #?
- server must eventually discard info about old RPCs
- when is discard safe?
- idea:
- unique client IDs
- per-client RPC sequence numbers
- client includes "seen all replies <= X" with every RPC
- much like TCP sequence #s and acks
- or only allow client one outstanding RPC at a time
- arrival of seq+1 allows server to discard all <= seq
- or client agrees to keep retrying for < 5 minutes
- server discards after 5+ minutes
- how to handle dup req while original is still executing?
- server doesn't know reply yet; don't want to run twice
- idea: "pending" flag per executing RPC; wait or ignore
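As an illustration of the "one outstanding RPC per client" idea, the sketch below keeps only the latest sequence number and reply for each client. The names (`seqServer`, `clientState`, `Handle`) are hypothetical, client IDs are assumed to be unique strings, and sequence numbers are assumed to start at 1.

```go
package main

import (
	"fmt"
	"sync"
)

// clientState is all the server keeps per client: the highest sequence
// number executed and its reply. Everything older has been discarded.
type clientState struct {
	lastSeq   int64
	lastReply int32
}

type seqServer struct {
	mu      sync.Mutex
	clients map[string]*clientState // keyed by unique client ID
	handler func(arg int32) int32
}

func newSeqServer(h func(int32) int32) *seqServer {
	return &seqServer{clients: map[string]*clientState{}, handler: h}
}

// Handle returns (reply, true), or (0, false) for a stale request whose
// saved reply has already been discarded.
func (s *seqServer) Handle(clientID string, seq int64, arg int32) (int32, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	st, ok := s.clients[clientID]
	if !ok {
		st = &clientState{}
		s.clients[clientID] = st
	}
	if seq == st.lastSeq {
		return st.lastReply, true // duplicate of the latest request
	}
	if seq < st.lastSeq {
		return 0, false // client already moved on; nothing saved
	}
	// A new seq means the client saw the reply to lastSeq, so
	// overwriting the saved reply is a safe discard.
	st.lastReply = s.handler(arg)
	st.lastSeq = seq
	return st.lastReply, true
}

func main() {
	runs := 0
	s := newSeqServer(func(a int32) int32 { runs++; return a + 1 })
	fmt.Println(s.Handle("c1", 1, 100)) // prints: 101 true
	fmt.Println(s.Handle("c1", 1, 100)) // prints: 101 true (handler not re-run)
	fmt.Println(s.Handle("c1", 2, 200)) // prints: 201 true
	fmt.Println(s.Handle("c1", 1, 100)) // prints: 0 false
	fmt.Println(runs)                   // prints: 2
}
```

Because the mutex is held while the handler runs, a duplicate that arrives during execution simply waits and then gets the saved reply, which is one crude way to get the "pending" behavior.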
What if an at-most-once server crashes and re-starts?
- if the at-most-once duplicate info is kept only in memory, the server will forget it
- and accept duplicate requests after re-start
- maybe it should write the duplicate info to disk?
- maybe replica server should also replicate duplicate info?
What about "exactly once"?
- at-most-once plus unbounded retries plus fault-tolerant service
- Lab 3
Go RPC is "at-most-once"
- open TCP connection
- write request to TCP connection
- TCP may retransmit, but server's TCP will filter out duplicates
- no retry in Go code (i.e. will NOT create 2nd TCP connection)
- Go RPC code returns an error if it doesn't get a reply
- perhaps after a timeout (from TCP)
- perhaps server didn't see request
- perhaps server processed request but server/net failed before reply came back
Go RPC's at-most-once isn't enough for Lab 1
- it only applies to a single RPC call
- if a worker doesn't respond, the master re-sends the task to another worker
- but the original worker may not have failed, and may be working on it too
- Go RPC can't detect this kind of duplicate
- No problem in Lab 1, which handles duplicates at the application level
- Lab 2 will explicitly detect duplicates
Threads
- threads are a fundamental server structuring tool
- you'll use them a lot in the labs
- they can be tricky
- useful with RPC
- Go calls them goroutines; everyone else calls them threads
Thread = "thread of control"
- threads allow one program to (logically) do many things at once
- the threads share memory
- each thread includes some per-thread state:
- program counter, registers, stack
Threading challenges:
- sharing data
- two threads modify the same variable at same time?
- one thread reads data that another thread is changing?
- these problems are often called races
- need to protect invariants on shared data
- use Go sync.Mutex
- coordination between threads
- e.g. wait for all Map threads to finish
- use Go channels
- deadlock
- thread 1 is waiting for thread 2
- thread 2 is waiting for thread 1
- easy to detect (unlike races)
- lock granularity
- coarse-grained -> simple, but little concurrency/parallelism
- fine-grained -> more concurrency, more races and deadlocks
- let's look at a toy RPC package to illustrate these problems
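Before turning to the toy RPC package, here is a separate, smaller sketch of the first two challenges: a `sync.Mutex` protecting a shared counter, and a channel used to wait for all worker threads to finish. The function name `countWithWorkers` is made up for this example.

```go
package main

import (
	"fmt"
	"sync"
)

// countWithWorkers starts n goroutines that each add 1 to a shared counter
// `per` times, then waits for all of them to finish.
func countWithWorkers(n, per int) int {
	var mu sync.Mutex
	counter := 0
	done := make(chan bool)
	for i := 0; i < n; i++ {
		go func() {
			for j := 0; j < per; j++ {
				mu.Lock() // without the lock, concurrent counter++ is a race
				counter++
				mu.Unlock()
			}
			done <- true // coordination: tell the parent this worker is finished
		}()
	}
	for i := 0; i < n; i++ {
		<-done // wait for every worker
	}
	// Each receive from done happens after that worker's last update,
	// so reading counter here is safe.
	return counter
}

func main() {
	fmt.Println(countWithWorkers(10, 1000)) // prints 10000
}
```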
look at RPC example at the bottom of this page
- it's a simplified RPC system
- illustrates threads, mutexes, channels
- it's a toy, though it does run
- assumes connection already open
- only supports an integer arg, integer reply
- omits error checks
struct ToyClient
- client RPC state
- mutex per ToyClient
- connection to server (e.g. TCP socket)
- xid -- unique ID per call, to match reply to caller
- pending[] -- chan per thread waiting in Call()
- so client knows what to do with each arriving reply
Call
- application calls reply := client.Call(procNum, arg)
- procNum indicates what function to run on server
- WriteRequest knows the format of an RPC msg
- basically just the arguments turned into bits in a packet
- Q: why the mutex in Call()? what does mu.Lock() do?
- Q: could we move "xid := tc.xid" outside the critical section?
- after all, we are not changing anything
- [diagram to illustrate]
- Q: do we need to WriteRequest inside the critical section?
- note: Go says you are responsible for preventing concurrent map ops
- that's one reason the update to pending is locked
Listener
- runs as a background thread
- what is <- doing?
- not quite right: Listener may have to wait on the chan until the caller is ready to receive
Back to Call()...
Q: what if reply comes back very quickly?
- could Listener() see reply before pending[xid] entry exists?
- or before caller is waiting for channel?
Q: should we put reply:=<-done inside the critical section?
- why is it OK outside? after all, two threads use it.
Q: why mutex per ToyClient, rather than single mutex per whole RPC pkg?
Server's Dispatcher()
- note that the Dispatcher echoes the xid back to the client
- so that Listener knows which Call to wake up
- Q: why run the handler in a separate thread?
- Q: is it a problem that the dispatcher can reply out of order?
main()
- note registering handler in handlers[]
- what will the program print?
Q: when to use channels vs shared memory + locks?
- A point of debate: one view.
- use channels when you want one thread to explicitly wait for another
- often wait for a result, or wait for the next request
- e.g. when client Call() waits for Listener()
- use shared memory and locks when the threads are not intentionally directly interacting, but just happen to r/w the same data
- e.g. when Call() uses tc.xid
- but: they are fundamentally equivalent; either can always be used.
Go's "memory model" requires explicit synchronization to communicate!
- This code is not correct:
var x int
done := false
go func() { x = f(...); done = true }()
for done == false { }
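One way to repair the snippet above is to replace the shared `done` flag with a channel, so the write to `x` is explicitly synchronized with the read. This is a sketch: `f` is a stand-in function and `compute` is a hypothetical wrapper, neither of which comes from the notes.

```go
package main

import "fmt"

// f is a placeholder for whatever computation the goroutine performs.
func f(a int) int { return a * 2 }

// compute runs f in another goroutine and waits for it with a channel
// instead of spinning on a shared boolean.
func compute(a int) int {
	var x int
	done := make(chan struct{})
	go func() {
		x = f(a)
		close(done) // the close happens after the write to x
	}()
	<-done // the receive returns after the close, so x is visible here
	return x
}

func main() {
	fmt.Println(compute(21)) // prints 42
}
```

A `sync.Mutex` or `sync.WaitGroup` would serve equally well; the point is that some explicit synchronization has to order the write before the read.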
Things to think about
- Make sure you understand why TCP doesn't solve the problem of lost
messages, and the ramifications on the semantics of operations.
- Meditate a bit on the tradeoffs of RPC.
- What are the costs?
- Where does its goal of transparency break down?
- When would one-way messages be better?
- Why are synchronous RPC calls a bad idea in many cases?
- What is the impact on the threading/concurrency model if RPCs are not
synchronous?
- Take a look at Go's async RPC facility (see the rpc.Client.Go method).
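For a first look at the async facility, the sketch below issues one call with `rpc.Client.Go` from the standard `net/rpc` package and waits on the returned call's `Done` channel. The `Arith` service, `Args` type, and `asyncAdd` helper are invented for this example, and an in-process `net.Pipe` stands in for a real network connection.

```go
package main

import (
	"fmt"
	"net"
	"net/rpc"
)

// Args and Arith are hypothetical names used only in this example.
type Args struct{ A, B int }

type Arith struct{}

// Add has the shape net/rpc requires: an exported method with two
// arguments (the second a pointer) that returns an error.
func (Arith) Add(args Args, reply *int) error {
	*reply = args.A + args.B
	return nil
}

// asyncAdd starts a server on one end of an in-memory pipe, issues an
// asynchronous call from the other end, and waits for it to complete.
func asyncAdd(a, b int) (int, error) {
	srv := rpc.NewServer()
	if err := srv.Register(Arith{}); err != nil {
		return 0, err
	}
	cconn, sconn := net.Pipe()
	go srv.ServeConn(sconn)

	client := rpc.NewClient(cconn)
	defer client.Close()

	var sum int
	// Go returns immediately; passing nil lets the library allocate the Done channel.
	call := client.Go("Arith.Add", Args{A: a, B: b}, &sum, nil)
	// ... the caller could do other work here ...
	<-call.Done // blocks until the reply (or an error) has arrived
	return sum, call.Error
}

func main() {
	fmt.Println(asyncAdd(2, 3))
}
```

With `Go`, the caller decides when to wait, so several RPCs can be outstanding from a single thread.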
Go RPC example
toy-rpc.go
package main

//
// toy RPC library
//

import (
	"encoding/binary"
	"fmt"
	"io"
	"sync"
)

type ToyClient struct {
	mu      sync.Mutex
	conn    io.ReadWriteCloser   // connection to server
	xid     int64                // next unique request #
	pending map[int64]chan int32 // waiting calls [xid]
}

func MakeToyClient(conn io.ReadWriteCloser) *ToyClient {
	tc := &ToyClient{}
	tc.conn = conn
	tc.pending = map[int64]chan int32{}
	tc.xid = 1
	go tc.Listener()
	return tc
}

func (tc *ToyClient) WriteRequest(xid int64, procNum int32, arg int32) {
	binary.Write(tc.conn, binary.LittleEndian, xid)
	binary.Write(tc.conn, binary.LittleEndian, procNum)
	binary.Write(tc.conn, binary.LittleEndian, arg)
}

func (tc *ToyClient) ReadReply() (int64, int32) {
	var xid int64
	var arg int32
	binary.Read(tc.conn, binary.LittleEndian, &xid)
	binary.Read(tc.conn, binary.LittleEndian, &arg)
	return xid, arg
}

//
// client application uses Call() to make an RPC.
// client := MakeToyClient(conn)
// reply := client.Call(procNum, arg)
//
func (tc *ToyClient) Call(procNum int32, arg int32) int32 {
	done := make(chan int32) // for tc.Listener()

	tc.mu.Lock()
	xid := tc.xid // allocate a unique xid
	tc.xid++
	tc.pending[xid] = done             // for tc.Listener()
	tc.WriteRequest(xid, procNum, arg) // send to server
	tc.mu.Unlock()

	reply := <-done // wait for reply via tc.Listener()

	tc.mu.Lock()
	delete(tc.pending, xid)
	tc.mu.Unlock()

	return reply
}

//
// listen for replies from the server,
// send each reply to the right client Call() thread.
//
func (tc *ToyClient) Listener() {
	for {
		xid, reply := tc.ReadReply()
		tc.mu.Lock()
		ch, ok := tc.pending[xid]
		tc.mu.Unlock()
		if ok {
			ch <- reply
		}
	}
}

type ToyServer struct {
	mu       sync.Mutex
	conn     io.ReadWriteCloser          // connection from client
	handlers map[int32]func(int32) int32 // procedures
}

func MakeToyServer(conn io.ReadWriteCloser) *ToyServer {
	ts := &ToyServer{}
	ts.conn = conn
	ts.handlers = map[int32](func(int32) int32){}
	go ts.Dispatcher()
	return ts
}

func (ts *ToyServer) WriteReply(xid int64, arg int32) {
	binary.Write(ts.conn, binary.LittleEndian, xid)
	binary.Write(ts.conn, binary.LittleEndian, arg)
}

func (ts *ToyServer) ReadRequest() (int64, int32, int32) {
	var xid int64
	var procNum int32
	var arg int32
	binary.Read(ts.conn, binary.LittleEndian, &xid)
	binary.Read(ts.conn, binary.LittleEndian, &procNum)
	binary.Read(ts.conn, binary.LittleEndian, &arg)
	return xid, procNum, arg
}

//
// listen for client requests,
// dispatch each to the right handler function,
// send back reply.
//
func (ts *ToyServer) Dispatcher() {
	for {
		xid, procNum, arg := ts.ReadRequest()
		ts.mu.Lock()
		fn, ok := ts.handlers[procNum]
		ts.mu.Unlock()
		go func() {
			var reply int32
			if ok {
				reply = fn(arg)
			}
			ts.mu.Lock()
			ts.WriteReply(xid, reply)
			ts.mu.Unlock()
		}()
	}
}

type Pair struct {
	r *io.PipeReader
	w *io.PipeWriter
}

func (p Pair) Read(data []byte) (int, error) {
	return p.r.Read(data)
}

func (p Pair) Write(data []byte) (int, error) {
	return p.w.Write(data)
}

func (p Pair) Close() error {
	p.r.Close()
	return p.w.Close()
}

func main() {
	r1, w1 := io.Pipe()
	r2, w2 := io.Pipe()
	cp := Pair{r: r1, w: w2}
	sp := Pair{r: r2, w: w1}
	tc := MakeToyClient(cp)
	ts := MakeToyServer(sp)
	ts.handlers[22] = func(a int32) int32 { return a + 1 }

	reply := tc.Call(22, 100)
	fmt.Printf("Call(22, 100) -> %v\n", reply)
}