IPC¶
Socket Messages¶
Sockets are used for event-driven communication.
Formats¶
Except for the VSD socket, all sockets send messages that are MessagePack-encoded objects. MessagePack is easy to use, flexible like JSON, pretty fast, and doesn’t take up that much space on the wire. Most Runtime messages are small and simple, so a serialization framework like Protobuf requiring a strongly-typed schema would be overkill.
- RPC
Follows the MessagePack-RPC specification.
Request object:
[0, <request id: int>, <procedure: String>, [<arg1>, <arg2>, ...]]
Response object:
[1, <request id: int>, <error: null or Object>, <result>]
See the service documentation for the full list of procedures.
- Log Message
{
  "event": <message: String>,        /* The message that was logged */
  "logger": <module: String>,        /* For example: runtime.<submod1>.<submod2> */
  "level": <level: String>,          /* One of: debug, info, warning, error, critical */
  "timestamp": <timestamp: String>,  /* ISO format */
  "extra": { /* More contextual data. */ ... }
}
- Gamepad Input
{
  "gamepads": {
    <gamepad index: String>: {
      "lx": <float>,
      "ly": <float>,
      "rx": <float>,
      "ry": <float>,
      "btn": <int>
    },
    ...
  }
}
The [lr][xy] parameters denote joystick positions, where l and r stand for the left and right joysticks, respectively, and x and y are Cartesian coordinates. The origin (0, 0) corresponds to the joystick in the resting position. Each joystick’s position is constrained within the unit circle.
btn is a bitmask where a 1 bit indicates the corresponding button is pressed. See the DOM Gamepad specification for the list of buttons. The first button in the Gamepad.buttons attribute corresponds to the least-significant bit of the bitmask.
- Smart Device Update
{
  "sd": {
    <uid: String>: {
      <param name: String>: <param value>,
      ...
    },
    ...
  },
  "aliases": {
    <uid: String>: <alias: String>,
    ...
  }
}
Each Smart Device UID is formatted as an integer. To reduce the message’s size, parameters that have not changed since the last update may be omitted. All device UIDs are always sent.
Each device UID may have an alias, an alternative human-readable name.
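Because unchanged parameters may be left out of a Smart Device Update, a consumer would typically merge each message into a cache rather than replace the cache outright. The following is a minimal sketch of that idea, assuming the message has already been decoded from MessagePack into a Python dict. The helper names apply_update and display_name, the UID "1234", and the parameter names are made up for illustration and are not part of Runtime.

```python
from typing import Any, Dict

# Hypothetical consumer-side cache: UID -> {parameter name -> latest value}.
DeviceState = Dict[str, Dict[str, Any]]


def apply_update(state: DeviceState, message: Dict[str, Any]) -> DeviceState:
    """Merge a decoded Smart Device Update into the cached state.

    Parameters missing from the message keep their previously cached values.
    """
    for uid, params in message["sd"].items():
        state.setdefault(uid, {}).update(params)
    return state


def display_name(uid: str, message: Dict[str, Any]) -> str:
    """Return the alias for a UID if the message provides one, else the UID."""
    return message.get("aliases", {}).get(uid, uid)


# Illustrative messages; the parameter names are invented for this example.
state: DeviceState = {}
first = {
    "sd": {"1234": {"duty_cycle": 0.5, "enabled": True}},
    "aliases": {"1234": "left-motor"},
}
second = {
    "sd": {"1234": {"duty_cycle": 0.8}},  # "enabled" did not change, so it is absent
    "aliases": {"1234": "left-motor"},
}
apply_update(state, first)
apply_update(state, second)
```

After both messages are applied, the cache still holds the enabled value from the first message alongside the newer duty_cycle.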
Buffers¶
Buffers backed by shared memory store and communicate peripheral data between processes.
In this context, peripherals include not only Smart Devices (SDs), but also gamepads, commodity sensors like cameras, and even queues of messages sent by other robots.
Each peripheral is allocated a buffer, a shared memory object under /dev/shm on Linux, formatted as a C-style structure.
Every peripheral has an owner: the process responsible for communicating with the peripheral.
For example, the device process is the owner of SDs and the server process is the owner of gamepads.
Consumers are processes that open views of the buffer.
Buffers do not explicitly notify consumers when an update occurs, as a condition variable can do. Instead, consumers should either access the buffer on-demand (as student code does) or poll the buffer at a fixed interval. Batching updates in this way is less noisy.
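The polling approach can be sketched as below. This is only an illustration under stated assumptions: the consumer detects a change by comparing a block's timestamp against the last one it saw, and read_timestamp is a hypothetical callable standing in for a locked read of the real buffer. Neither detail is confirmed by Runtime's implementation.

```python
import time
from typing import Callable, Iterator, Optional


def poll_updates(
    read_timestamp: Callable[[], float],
    interval: float,
    cycles: int,
) -> Iterator[float]:
    """Poll at a fixed interval and yield each timestamp not seen before.

    Several writes that land between two polls are observed as one update,
    which is the batching effect described above.
    """
    last_seen: Optional[float] = None
    for _ in range(cycles):
        current = read_timestamp()
        if current != last_seen:
            last_seen = current
            yield current
        time.sleep(interval)


# Simulated timestamps as they would appear on five consecutive polls.
samples = iter([1.0, 1.0, 2.0, 2.0, 3.0])
observed = list(poll_updates(lambda: next(samples), interval=0.0, cycles=5))
```

Here the five polls report only three distinct updates, since two of the polls found the buffer unchanged.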
Catalog¶
The peripheral catalog is a YAML config file detailing all available peripherals and their parameters. Some peripherals are Smart Devices with special catalog fields.
"<peripheral-name>":
  # Provided if this peripheral is a Smart Device.
  device_id: <int>
  # The delay (in ms) between subscription updates. Omit for no subscription.
  # Ignored for non-Smart Device peripherals.
  delay: <float>
  params:
    - # Required (any legal Python identifier)
      name: "<name>"
      # Type name (legal types are suffixes to ``ctypes.c_*``). You may
      # specify a length-n array by adding "[n]" as a suffix.
      type: "<type>"
      # Minimum and maximum limits used for validation. Defaults to -inf to
      # inf. If the validation check fails, the value is clamped within the
      # range and a warning is emitted. Ignored for non-numeric parameters.
      lower: <real>
      upper: <real>
      # Whether the parameter is readable or writeable by student code, which
      # emits a warning if the access constraint is violated.
      readable: <bool>
      writeable: <bool>
      # Whether this parameter should be subscribed to. Ignored for non-Smart
      # Device peripherals.
      subscribed: <bool>
    ...
...
SDs may have up to 16 parameters, per the specification. Non-SD peripherals have no such restriction.
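To make the type and lower/upper fields concrete, here is a small sketch of how a catalog loader might resolve a type string to a ctypes type and clamp an out-of-range value. The function names parse_type and clamp are hypothetical, and the real loader may behave differently (for example, in how it reports the warning).

```python
import ctypes
import math
import re
import warnings

# Matches "<base>" or "<base>[n]", e.g. "float" or "uint8[4]".
_TYPE_PATTERN = re.compile(r"^(?P<base>\w+)(?:\[(?P<length>\d+)\])?$")


def parse_type(type_name: str):
    """Resolve a catalog type string to a ctypes type (or array type)."""
    match = _TYPE_PATTERN.match(type_name)
    if match is None:
        raise ValueError(f"malformed type name: {type_name!r}")
    base = getattr(ctypes, "c_" + match.group("base"), None)
    if base is None:
        raise ValueError(f"ctypes has no type c_{match.group('base')}")
    length = match.group("length")
    # Multiplying a ctypes type by an int yields a fixed-length array type.
    return base * int(length) if length is not None else base


def clamp(value: float, lower: float = -math.inf, upper: float = math.inf) -> float:
    """Clamp a numeric value into [lower, upper], warning if it was out of range."""
    clamped = min(max(value, lower), upper)
    if clamped != value:
        warnings.warn(f"{value} is outside [{lower}, {upper}]; clamped to {clamped}")
    return clamped
```

For instance, parse_type("uint8[4]") yields a four-element array of ctypes.c_uint8, and clamp(1.5, -1.0, 1.0) returns 1.0 after emitting a warning.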
Format¶
As shown, each buffer consists of up to three substructures: a read block, write block, and possibly a device control block.
The actual sizes of these blocks may vary, depending on how ctypes chooses to align each structure’s fields.
A pthread mutex protects access to the entire buffer.
A RW lock is not useful since most buffer operations involve both reading and writing.
The valid bit indicates whether this buffer is active (see notes on the buffer lifecycle for details).
The read and write blocks are where student code reads from and writes into, respectively.
Read block parameters are currently sensed values while write block parameters are desired values.
The read block contains a parameter iff that parameter is readable, according to the catalog.
(Likewise for writeable parameters in the write block.)
The Timestamp field is a double representing the seconds since the epoch, possibly fractional, when any parameter in that block was last written to.
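The read and write blocks can be pictured as ctypes structures generated from the catalog. The sketch below is an assumption-laden illustration, not Runtime's actual layout: the field order, the make_block helper, and the parameter names are invented, and the pthread mutex and device control block are left out. It also shows why block sizes depend on alignment.

```python
import ctypes
from typing import List, Tuple, Type


def make_block(name: str, params: List[Tuple[str, type]]) -> Type[ctypes.Structure]:
    """Build a block structure: a timestamp followed by the block's parameters."""
    fields = [("timestamp", ctypes.c_double)] + list(params)
    return type(name, (ctypes.Structure,), {"_fields_": fields})


# Illustrative parameters; a real block is derived from the catalog entry.
ReadBlock = make_block("ReadBlock", [("switch", ctypes.c_bool), ("distance", ctypes.c_float)])
WriteBlock = make_block("WriteBlock", [("duty_cycle", ctypes.c_float)])


class Buffer(ctypes.Structure):
    # The mutex and the SD-only device control block are omitted from this sketch.
    _fields_ = [("valid", ctypes.c_bool), ("read", ReadBlock), ("write", WriteBlock)]


# Sum of the raw field sizes versus the size ctypes actually allocates.
packed_size = (
    ctypes.sizeof(ctypes.c_double)
    + ctypes.sizeof(ctypes.c_bool)
    + ctypes.sizeof(ctypes.c_float)
)
aligned_size = ctypes.sizeof(ReadBlock)
```

On most platforms aligned_size is larger than packed_size because ctypes inserts padding after the bool so that the float sits on its natural alignment boundary.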
The device control block contains special SD-only fields.
The device process polls each SD’s buffer at a fixed frequency and may send messages to the SD on each cycle.
Field | Description |
---|---|
| Describes the current state of the SD subscription. |
| |
| |
| A bitmap identifying parameters the |
| Similar to |
| A bitmap identifying parameters changed since the last SD update. The bits are set when a |
Operations¶
The atomic buffer operations are listed in the table below. Every operation must begin by acquiring the mutex and checking the valid bit; if the bit is not set, the operation aborts.
Operation | Description |
---|---|
| |
| |
| |
| |
| |
| |
| The peripheral owner updates parameters in the read block, sets the corresponding bits in the |
| The peripheral owner sets the valid bit. |
| |
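The common preamble of every operation (acquire the mutex, check the valid bit, abort if unset) fits naturally into a Python context manager. The sketch below is illustrative only: threading.Lock stands in for the pthread mutex that lives in shared memory, and the Buffer class, BufferInvalidError, and set_valid are hypothetical names.

```python
import contextlib
import threading
from typing import Any, Dict, Iterator


class BufferInvalidError(Exception):
    """Raised when an operation is attempted on a buffer whose valid bit is unset."""


class Buffer:
    def __init__(self) -> None:
        self.mutex = threading.Lock()  # stand-in for the shared pthread mutex
        self.valid = False
        self.read_block: Dict[str, Any] = {}

    @contextlib.contextmanager
    def operation(self) -> Iterator["Buffer"]:
        """Hold the mutex for the duration of one operation on a valid buffer."""
        with self.mutex:
            if not self.valid:
                # Abort; leaving the with-block releases the mutex.
                raise BufferInvalidError("buffer is not valid")
            yield self

    def set_valid(self, valid: bool) -> None:
        """Owner-only operation; it cannot require the bit to already be set."""
        with self.mutex:
            self.valid = valid
```

A consumer would then wrap each access in "with buffer.operation():" so that the mutex is released whether the body completes, raises, or the buffer turns out to be invalid.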
Lifecycle¶
Allocating and freeing shared memory is difficult because all consumers must coordinate to achieve consensus. Otherwise, Runtime risks accessing invalid memory or trying to close a buffer that still has outstanding references.
From the Python documentation:
Requests that the underlying shared memory block be destroyed. In order to ensure proper cleanup of resources, unlink() should be called once (and only once) across all processes which have need for the shared memory block. After requesting its destruction, a shared memory block may or may not be immediately destroyed and this behavior may differ across platforms. Attempts to access data inside the shared memory block after unlink() has been called may result in memory access errors. Note: the last process relinquishing its hold on a shared memory block may call unlink() and close() in either order.
Runtime takes a lazy approach to shared memory management, meaning consumers do not use a background task to proactively open or close views of shared memory.
When a peripheral connects for the first time, the peripheral owner creates the shared memory block and sets the buffer’s valid bit. Subsequent disconnects and reconnects involve toggling the valid bit, but the underlying buffer remains valid memory. In fact, Runtime delays closing shared memory blocks until it exits because delayed cleanup eliminates the need for out-of-band acknowledgements from consumers that they have closed their views of a block about to be unlinked. Blocks are allocated on a per-device basis, and there is a bounded number of devices, so running out of shared memory is highly unlikely.
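The create-once, toggle-afterwards behavior can be sketched with Python's multiprocessing.shared_memory module. This is a simplified illustration under loud assumptions: the valid flag is modeled as the first byte of the block, the mutex is omitted, and open_buffer, set_valid, and the block name are hypothetical. The block is unlinked at the end only so that the example cleans up after itself.

```python
import os
from multiprocessing import shared_memory
from typing import Tuple

VALID_OFFSET = 0  # Assumption: the valid flag is modeled as the first byte.


def open_buffer(name: str, size: int) -> Tuple[shared_memory.SharedMemory, bool]:
    """Create the block on first connect; otherwise attach to the existing one."""
    try:
        return shared_memory.SharedMemory(name=name, create=True, size=size), True
    except FileExistsError:
        return shared_memory.SharedMemory(name=name), False


def set_valid(block: shared_memory.SharedMemory, valid: bool) -> None:
    """Toggle the valid flag without touching the block's lifetime."""
    block.buf[VALID_OFFSET] = 1 if valid else 0


block_name = f"example-sd-{os.getpid()}"  # illustrative name only
owner, created = open_buffer(block_name, 64)
try:
    set_valid(owner, True)    # first connect
    set_valid(owner, False)   # disconnect: memory stays allocated
    reconnect, created_again = open_buffer(block_name, 64)
    set_valid(reconnect, True)  # reconnect reuses the same memory
    valid_seen_by_owner = owner.buf[VALID_OFFSET]
    reconnect.close()
finally:
    owner.close()
    owner.unlink()  # in Runtime this would be deferred until exit
```

The second call to open_buffer attaches instead of creating, and the flag it sets is visible through the owner's original view.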
Note
Delayed cleanup also avoids data races when a connection is transient (i.e., a device rapidly disconnects, then reconnects). Poor contact in a loose USB port can cause a transient connection, especially if the robot is colliding with other objects.
Consumers may acquire a list of available buffers from the children of /dev/shm.
Invalid blocks should be treated as if the shared memory does not exist on the filesystem.
A consumer should never die while holding a buffer’s mutex if the consumer uses Python’s context manager pattern to run an exit handler releasing the mutex.
pthread also has a robust mutex option to deal with a dead owner.
Rejected alternative implementations that proactively clean up shared memory:
- server is the sole owner of all shared memory blocks. Consumers poll server for a list of active blocks and request allocation/deallocation. This design is quite noisy and server must guess when there are no more outstanding views of a shared memory block before it unlinks the block.
- Add a reference count to every block and unlink the block when the number of views reaches zero. A service can die suddenly and leave the reference count too high. This design also suffers from the aforementioned transient connection issue.
- Use a special shared memory block as a directory of all active blocks, their open/close states, and reference counts. This introduces more shared state with a bad single point of failure.
- Use inotify to alert consumers when a new buffer is available or unavailable. When the peripheral disconnects, the peripheral owner unlinks the shared memory immediately, which triggers inotify to prompt consumers to close their views.
  Although this is an elegant event-driven solution, it relies on the multiprocessing.shared_memory module’s handling of what it calls “memory access errors”. It’s unclear whether these errors entail exceptions or segfaults; the latter would be very bad. POSIX shared memory, which Python’s multiprocessing.shared_memory module wraps, allows unlink() to precede all close() calls. The block remains usable, but invisible on the filesystem, after unlink(). In any case, relying on platform-specific behavior and an API like inotify is not portable.
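The POSIX behavior mentioned in the last alternative can be observed with a short experiment. This sketch assumes a POSIX platform such as Linux; as noted above, other platforms may behave differently, so it should not be read as a guarantee. The block name is illustrative.

```python
import os
from multiprocessing import shared_memory

owner = shared_memory.SharedMemory(
    name=f"example-unlink-{os.getpid()}", create=True, size=16
)
consumer = shared_memory.SharedMemory(name=owner.name)  # a second open view
try:
    owner.unlink()  # the name disappears, but existing mappings stay mapped

    # A new attach by name should now fail because the name no longer exists.
    try:
        late = shared_memory.SharedMemory(name=owner.name)
    except FileNotFoundError:
        name_still_visible = False
    else:
        name_still_visible = True
        late.close()

    # Views opened before unlink() still share the same memory.
    owner.buf[0] = 42
    value_seen_by_consumer = consumer.buf[0]
finally:
    consumer.close()
    owner.close()
```

On Linux, the write made after unlink() is still visible through the consumer's view, while a fresh attach by name fails.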