Myrinet Synchronized Multichannel API
The Virtual Prototyping project requires low-latency communication between an SGI OCTANE workstation running IRIX and a PowerPC 604 single-board computer running the real-time OS VxWorks.
The OCTANE updates a visual display of a virtual world, and the PowerPC handles the interaction of a robotic manipulator with this virtual world. The PowerPC supplies the OCTANE with information about the position and orientation of each object and the manipulator in the virtual world, and the OCTANE supplies the PowerPC with the global closest point, i.e. the point out of all the object surfaces which is closest to the manipulator's end effector. (Other communication does occur, but these are the most latency sensitive.)
Both computers perform their work in cycles, and need to know only the most recent values for the data sent to them. But it is also important that the visual simulation receive a set of positions that belong together. For example, if the manipulator has grasped and is moving an object, the visual simulation must receive position information that is synchronized: the positions of the manipulator and the object should be derived from one particular cycle of the PowerPC's calculations. If they are not, the visual simulation will show them moving with respect to one another, when the PowerPC is actually moving them in unison.
To use a byte-stream protocol such as TCP/IP, we would have to wrap it with a protocol which would extract the data for the most recent complete cycle. Using conventional shared memory (say, using SBS Bit-3's broadcast memory system) would eliminate the network overhead, but would require additional hardware, as well as implementing a 'rate-matching' scheme to synchronize the channels and provide a consistent set to the reader.
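To make the byte-stream alternative concrete, here is a minimal sketch of the kind of wrapper it would need. It assumes each cycle is sent as one fixed-size record, back to back, and that the receive buffer starts on a record boundary. The helper name latest_complete_frame and the fixed-size framing are illustrative assumptions, not part of any protocol described here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: `buf` holds `len` bytes drained from a byte stream
 * that carries back-to-back records of `frame_size` bytes each, starting
 * on a record boundary. Returns a pointer to the most recent COMPLETE
 * record, or NULL if no record has fully arrived yet. Older records and
 * any trailing partial record are skipped. */
const unsigned char *latest_complete_frame(const unsigned char *buf,
                                           size_t len, size_t frame_size)
{
    assert(frame_size > 0);
    size_t complete = len / frame_size;   /* number of whole records */
    if (complete == 0)
        return NULL;
    return buf + (complete - 1) * frame_size;
}
```

Even this simple version costs a copy through the stream buffer and a scan per cycle, which is the overhead the approach below tries to avoid.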
The API I describe below is an attempt to implement low-overhead,
low-latency replicated shared memory communication which provides
for the synchronization of several channels of data, using Myrinet
as the communications fabric. I use MSM
(Myrinet Synchronized Multichannel API) to label this method.
While using Myrinet hardware in this way requires no less an
investment in special-purpose hardware than a hardware shared
memory implementation, and though the hardware shared memory approach
would involve much less software engineering, this approach
uses hardware we already have. Further, other research efforts here
involve low-latency protocols using Myrinet. This work will give
me experience badly needed before tackling these other low-latency
protocols.
The Flavor
The intent of this API is to provide a reader and writer each
with a block of memory consisting of N objects of size M
bytes (where N is the number of channels and M is the size of
each channel) which can be written by the writer and read by
the reader. The reader and writer both perform their reads/writes
of this memory inside begin_frame/end_frame function calls. The reader is
guaranteed to see a snapshot of the writer's memory as of the
writer's most recent end_frame call.
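The snapshot guarantee can be illustrated with a toy, single-process model. This is only a sketch of the semantics described above, not the implementation: the struct, the function names, and the values of N and M are all hypothetical, and the real system moves the data between hosts over the network rather than with memcpy.

```c
#include <assert.h>
#include <string.h>

#define N_CHANNELS   4   /* N: number of channels (illustrative value) */
#define CHANNEL_SIZE 8   /* M: bytes per channel (illustrative value) */

/* Toy single-process model. All names here are hypothetical. */
typedef struct {
    unsigned char writer_mem[N_CHANNELS][CHANNEL_SIZE]; /* writer's working copy   */
    unsigned char published[N_CHANNELS][CHANNEL_SIZE];  /* as of writer's last end */
    unsigned char reader_mem[N_CHANNELS][CHANNEL_SIZE]; /* what the reader sees    */
} frame_model;

/* Writer changes one channel; nothing is visible to the reader yet. */
void model_channel_write(frame_model *m, int ch, const void *src)
{
    assert(ch >= 0 && ch < N_CHANNELS);
    memcpy(m->writer_mem[ch], src, CHANNEL_SIZE);
}

/* Writer's end of frame: publish every channel as one unit. */
void model_writer_end_frame(frame_model *m)
{
    memcpy(m->published, m->writer_mem, sizeof m->published);
}

/* Reader's begin of frame: pick up the most recently published snapshot. */
void model_reader_begin_frame(frame_model *m)
{
    memcpy(m->reader_mem, m->published, sizeof m->reader_mem);
}
```

In this model the reader never sees a mix of channels from two different writer frames, because channels only become visible together at the writer's end of frame.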
The API
void msm_err(char *s, int error);
Prints a message describing the error given by the error parameter. Prepends the string s to this message. Works like perror does for standard error codes.
int
msm_connect_write(int local_unit,
int local_port,
int reader_id,
int reader_port,
int number_of_channels,
int size_of_each_channel,
int *err);
int
msm_connect_read(int local_unit,
int local_port,
int writer_id,
int writer_port,
int number_of_channels,
int size_of_each_channel,
int *err);
Connects to a remote host as a writer or reader. Since each host
can have more than one Myrinet card, you have to specify which one to
use with the local_unit parameter. Each interface in the host is numbered;
unit 0 shows up as /hw/myri0 in IRIX, unit 1 as /hw/myri1, etc. In addition,
each interface on the network has a unique id number. You specify the
target partner using this id (writer_id or reader_id). Since many things
can go wrong, an error code is returned in err, which can be examined using
msm_err(). A non-negative connection id is returned on success, and -1 is
returned on failure.
The interface ids of all Myrinet-equipped hosts will appear in a human-readable routing file placed in some standard location.
int msm_begin_frame(int cid);
int msm_end_frame(int cid);
Marks the beginning and ending of a frame. The msm_begin_frame call
will block until (for writers) all the new channels have been sent to
the partner, or (for readers) the channel memory has been updated to
reflect the most recent complete frame received.
Returns 0 on success, -1 on failure. Should fail only if cid is not valid.
Note: msm_begin_frame can block. For readers, it will only block until the DMA operations which bring new data into memory are complete. For writers, it will block until all new data in the last frame has been correctly received and acknowledged by the receiving interface card. Note that this has nothing to do with what the partner process is doing; the receiving interface card receives and acknowledges packets independent of what any user process is doing. If the receiving host has crashed (and thus the receiving interface card cannot acknowledge), this call will block for about 10 seconds while the system tries to resend. After that, the status will go to the unconnected state, and the begin_frame call will return. The connection will be reestablished when the receiving host reboots and the partner process restarts.
int
msm_channel_write(int cid,
int channel_number,
void *buffer);
int
msm_channel_read(int cid,
int channel_number,
void *buffer);
In order for two hosts on a Myrinet to communicate using this method, one must call msm_connect_read and the other must call msm_connect_write. The writer targets the reader's host and port, and vice versa. If the reader and writer agree on port, channel size and number of channels, this establishes a one-way transmission of the requested number of channels. To establish two-way communication, set up two one-way connections. You can establish as many connections in either direction as the fixed number of ports per interface (currently 8) and memory will allow. Each connection can have any number and size of channels. (Both the number of channels and the size of each channel in bytes must be nonzero and divisible by four.)
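The constraints in the paragraph above are easy to check before connecting. The following is a sketch only: the sketch_ names and the SKETCH_PORTS_PER_INTERFACE constant are hypothetical and not part of the MSM API, and the port range 0 to 7 is taken from the "currently 8" ports per interface mentioned in the text.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical constant; the text says there are currently 8 ports. */
#define SKETCH_PORTS_PER_INTERFACE 8

/* 1 if the port number is usable (0..7), 0 otherwise. */
int sketch_port_ok(int port)
{
    return port >= 0 && port < SKETCH_PORTS_PER_INTERFACE;
}

/* 1 if v is nonzero (positive) and divisible by four, 0 otherwise.
 * Applies to both the channel count and the channel size in bytes. */
int sketch_count_ok(int v)
{
    return v > 0 && v % 4 == 0;
}

/* 1 if all connection parameters satisfy the stated constraints. */
int sketch_params_ok(int local_port, int remote_port,
                     int number_of_channels, int size_of_each_channel)
{
    return sketch_port_ok(local_port) && sketch_port_ok(remote_port) &&
           sketch_count_ok(number_of_channels) &&
           sketch_count_ok(size_of_each_channel);
}

/* Total bytes of channel memory for one connection. */
size_t sketch_buffer_bytes(int number_of_channels, int size_of_each_channel)
{
    assert(sketch_count_ok(number_of_channels));
    assert(sketch_count_ok(size_of_each_channel));
    return (size_t)number_of_channels * (size_t)size_of_each_channel;
}
```

For instance, 100 channels of 48 bytes each passes the check and needs 4800 bytes of channel memory, while a channel count of 3 or a port number of 8 is rejected.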
This does not follow a client/server model; either reader or writer can be called first. The writer can proceed to write before the reader connects. Once the reader does, he will receive the most recent values written. The reader can also proceed to read before the writer connects. The result will be that none of the channels will get new data until the writer connects. The design also allows for either reader or writer to crash or exit without disturbing his partner. The connection will be reestablished when the partner rejoins.
(Annoying complication: VxWorks is not a secure OS; that is,
application crashes often crash the OS, and most of the time
an application crash means a reboot. So this implementation has to handle reboots of any host as well.)
Example
For example, the simplest pattern of operation is described below.
I assume that there are two processes running on two separate hosts connected to the Myrinet network. Since each host can have more than one Myrinet interface, I assume that each process uses the first unit (unit 0) on its host, and that Host A's unit is given the interface id 0, and Host B's unit is given the interface id 1. The port numbers were chosen to match the interface id, but we could have picked any from 0 to 7.
{
    /* Writer process */
    Transform xform[100];   /* 100 transform objects, each 48 bytes */
    int err;                /* error code from connect; examine with msm_err() */
    int cid;                /* connection id */

    cid = msm_connect_write(0,    /* unit 0; appears as /hw/myri0 in IRIX */
                            0,    /* port 0 on this unit */
                            1,    /* the remote interface id */
                            1,    /* port 1 on interface 1 */
                            100,  /* 100 channels, 0-99 */
                            48,   /* 48 bytes in a channel */
                            &err);
    while (...) {
        msm_begin_frame(cid);  /* i'm ignoring some return values for now.
                                  they will signal errors. */
        update(xform);         /* update all transforms */
        msm_channel_write(cid,
                          0,           /* channel 0 */
                          &xform[0]);  /* write new xform0 to channel 0 */
        msm_channel_write(cid,
                          1,           /* channel 1 */
                          &xform[1]);  /* write new xform1 to channel 1 */
        msm_channel_write(cid,
                          34,          /* channel 34 */
                          &xform[34]); /* write new xform34 to channel 34 */
        msm_end_frame(cid);    /* only now will the data be seen by
                                  the reader; these three xforms
                                  will be modified simultaneously on
                                  the reader's side. */
    }
    msm_disconnect(cid);
}
{
    /* Reader process */
    Transform xform[100];   /* 100 transform objects, each 48 bytes */
    int err;                /* error code from connect; examine with msm_err() */
    int cid;                /* connection id */

    cid = msm_connect_read(1,    /* unit 1 on this host; shows up as /hw/myri1 under IRIX */
                           1,    /* port 1 on this unit */
                           0,    /* expects data from interface id 0 */
                           0,    /* and port 0. */
                           100,  /* 100 channels */
                           48,   /* 48 bytes */
                           &err);
    while (...) {
        msm_begin_frame(cid);
        for (int i=0; i<100; i++) {
            /* if there is new data on a channel,
               copy it into the associated xform. */
            int rc = msm_channel_read(cid,
                                      i,
                                      &xform[i]);
            if (rc > 0) {
                printf(" xform %d updated this cycle.\n", i);
            }
            else if (rc < 0) {
                /* error: return value of -1 means the
                   cid was bad, the channel number was bad,
                   or the channel was not a reader. */
            }
        }
        msm_end_frame(cid);
        /* do other work here.
           RESTRICTION: it is important that you do not access the
           channels (using channel_read or check_read) outside of the
           begin/end, because the data is being updated by the myrinet
           interface card.
           Doing other work is important, because the end_ call triggers
           an update of all channels, which may take some time.
           On the next loop, the begin_ call will block until the interface is
           done working. If you do work here that does not
           involve communication, you maximize the parallelism.
        */
    }
    msm_disconnect(cid);
}
The begin and end functions define the region of code in which it is safe to read or write the channels. If you attempt to read or write channels outside of these 'brackets', you will corrupt your data. They also serve to enforce the boundaries for data consistency: all writer updates between a begin and an end will be seen as one update by the reader.
The channel_read and channel_write functions copy between the area pointed to by their third argument and an internal buffer: channel_write copies from your buffer into the internal one, and channel_read copies from the internal one into your buffer. You can reduce the protocol to zero-copy if you use this internal buffer directly. The mark, check, and buffer_address functions facilitate this.
The msm_buffer_address function returns a pointer to this internal buffer. It holds the current values for each channel in order, so it looks like this, where s is the size of one channel in 32-bit words:
offset (words)   data
0                channel 0, word 0
1                channel 0, word 1
2                channel 0, word 2
.
.
s-2              channel 0, word s-2
s-1              channel 0, word s-1
s                channel 1, word 0
s+1              channel 1, word 1
.
.
2s-2             channel 1, word s-2
2s-1             channel 1, word s-1
2s               channel 2, word 0
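The layout in the table reduces to one line of arithmetic. A small sketch follows; the helper names are hypothetical, and the buffer is treated as an array of 32-bit words as in the table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Word offset of word `word` of channel `channel`, where each channel
 * is `s` 32-bit words long (s = channel size in bytes / 4). */
size_t channel_word_offset(size_t channel, size_t word, size_t s)
{
    assert(word < s);
    return channel * s + word;
}

/* Pointer to the first word of a channel inside the internal buffer. */
uint32_t *channel_base(uint32_t *buffer, size_t channel, size_t s)
{
    return buffer + channel * s;
}
```

With 48-byte channels, s is 12, so channel 1 starts at word 12 and channel 2 at word 24.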
So, I've just gone overboard explaining something simple. You can
write to or read from these locations directly, bypassing the copy.
Here is the earlier writer/reader example rewritten for zero-copy operation.
{
    /* Writer process, zero-copy */
    int err;                /* error code from connect; examine with msm_err() */
    int cid;                /* connection id */

    cid = msm_connect_write(0,    /* unit 0 */
                            0,    /* port 0 on this unit */
                            1,    /* the remote interface id */
                            1,    /* port 1 on interface 1 */
                            100,  /* 100 channels, 0-99 */
                            48,   /* 48 bytes in a channel */
                            &err);
    Transform *xform = (Transform *)msm_buffer_address(cid);
    while (...) {
        msm_begin_frame(cid);     /* i'm ignoring some return values;
                                     they signal errors. */
        update(xform);            /* now the updates occur directly to
                                     the internal buffer */
        /* now inform the reader of changes on xforms 0,1,and 34 */
        msm_mark_write(cid,0);    /* mark each channel as written. */
        msm_mark_write(cid,1);    /* You have to do this because */
        msm_mark_write(cid,34);   /* the api doesn't know you fiddled with the
                                     internal buffer. */
        msm_end_frame(cid);       /* only now will the data be seen by
                                     the reader; all xforms
                                     will be modified simultaneously on
                                     the reader's side. */
    }
    msm_disconnect(cid);
}
{
    /* Reader process, zero-copy */
    int err;                /* error code from connect; examine with msm_err() */
    int cid;                /* connection id */

    cid = msm_connect_read(1,    /* unit 1 */
                           1,    /* port 1 on this unit */
                           0,    /* expects data from interface id 0 */
                           0,    /* and port 0. */
                           100,  /* 100 channels */
                           48,   /* 48 bytes */
                           &err);
    Transform *xform = (Transform *)msm_buffer_address(cid);
    while (...) {
        msm_begin_frame(cid);
        for (int i=0; i<100; i++) {
            int rc = msm_check_read(cid,i);
            if (rc > 0) {
                printf(" xform %d was updated this cycle.\n", i);
            }
            else if (rc < 0) {
                /* error: return value of -1 means the cid
                   was bad, the channel number was bad,
                   or the channel was not a reader. */
            }
        }
        /* do any work involving the buffer (or equivalently
           the xforms) here. */
        msm_end_frame(cid);
        /* do other work here.
           RESTRICTION: it is important that you do not access the
           buffer/xforms outside of the begin/end, because they are
           being updated by the myrinet interface card.
        */
    }
    msm_disconnect(cid);
}
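One way to picture what the mark and check functions do in zero-copy mode is as per-channel "updated" flags that travel with each frame. The model below is a single-process sketch under stated assumptions: all names are hypothetical, the channel count is arbitrary, and the choice to accumulate flags when the reader skips a frame is an assumption of this sketch, not something the text specifies.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_NCH 8   /* illustrative channel count */

/* Hypothetical flag model for mark/check; not the real implementation. */
typedef struct {
    unsigned char marked[SKETCH_NCH];  /* set by the writer in its current frame */
    unsigned char pending[SKETCH_NCH]; /* published, not yet seen by the reader  */
    unsigned char fresh[SKETCH_NCH];   /* visible in the reader's current frame  */
} flag_model;

/* Writer: note that a channel was modified in place. 0 ok, -1 bad channel. */
int sketch_mark_write(flag_model *f, int ch)
{
    assert(f != NULL);
    if (ch < 0 || ch >= SKETCH_NCH)
        return -1;
    f->marked[ch] = 1;
    return 0;
}

/* Writer's end of frame: publish marks; keep any the reader has not seen. */
void sketch_writer_end_frame(flag_model *f)
{
    assert(f != NULL);
    for (int i = 0; i < SKETCH_NCH; i++) {
        f->pending[i] |= f->marked[i];
        f->marked[i] = 0;
    }
}

/* Reader's begin of frame: latch the published marks for this frame. */
void sketch_reader_begin_frame(flag_model *f)
{
    assert(f != NULL);
    memcpy(f->fresh, f->pending, sizeof f->fresh);
    memset(f->pending, 0, sizeof f->pending);
}

/* Reader: 1 if the channel changed in this frame, 0 if not, -1 bad channel. */
int sketch_check_read(const flag_model *f, int ch)
{
    assert(f != NULL);
    if (ch < 0 || ch >= SKETCH_NCH)
        return -1;
    return f->fresh[ch];
}
```

In this model a channel reports as updated exactly once on the reader's side, in the first reader frame that begins after the writer's end of frame.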
Right now only robocop (the IRIX 6.4 Octane) and vxw0 (the VxWorks PPC MV2604) are set up to use the API, but Zan and Jayna (IRIX 6.5 Origin 200) and the O2K could easily be set up to use it.
To use the API under IRIX, include the msmapi.h file and link with the msmapi.o file. (msmapi.c is available as well, but you'd need my whole Myrinet distribution to compile it.) All access to the network is memory protected, and all resources are properly reclaimed if a user program crashes. Port usage is granted on a first come, first served basis, as with TCP. Connections persist across a fork, but the API does not single-thread any of the functions. A valid connection id can be used by any process in the share group. The user must ensure that only one thread or process at a time accesses the begin, end, write, or mark functions. Any number of threads/processes can access the channel_read or check_read functions at once, as long as these accesses take place (temporally) between begin and end calls. The port in use will remain in use until the last process using it calls disconnect or exits.
To use it under VxWorks, simply include the msmapi.h
file in your code; VxWorks will preload the library after
booting. A connection id can be used by any task. Exactly one
task must call the disconnect function for resources to be reclaimed
properly. The VxWorks implementation cannot provide
memory protected access to the network, since we don't have the virtual memory
extensions to VxWorks. Also, the VxWorks implementation is not guaranteed
to clean up network resources properly if the user code crashes.
You should just reboot if your VxWorks code crashes.
It is safe to do this even if other hosts are still connected and
using the API.
Routing/Hostname Issues
The system supports up to 256 hosts on one myrinet LAN. Each network interface is given an interface id (0 <= id <= 255). The robotics lab has only three interfaces:
Host/Unit          interface id
Robocop, unit 0    0
Robocop, unit 1    1
vxw0, unit 0       2
The routing is programmed into the interface cards after
the host boots. The routing table is kept in a human-readable
file (/home/robotics/msmroutes or /res/robotics/msmroutes).