Welcome to LIBCAP!

Under construction

Temporary, old documentation:


Note

This code is under development and being moved to a separate library, as it is needed in other projects. Expect this to change in the near future.

Concepts

An LCD refers to objects managed by the microkernel - like a page of memory - using an integer identifier, similar to a file descriptor; these are called capability pointers, or cptr_t's.

The microkernel resolves these integer identifiers to the objects and the LCD's rights using a capability space, or cspace. Each LCD has a capability space.

A cspace is a radix-tree-like data structure (but the look up is a bit different). Each node in the tree contains multiple slots that contain pointers to further nodes, or object and rights data. The slots are called cnodes and the nodes in the tree are called cnode tables. So, each cnode is either empty, contains a pointer to another cnode table, or contains object and rights data.

An LCD can grant a capability to another LCD. Suppose LCD A grants a capability to LCD B. When this happens, the cnode in LCD A's cspace becomes the parent of the cnode in LCD B's cspace, in a capability derivation tree, or CDT. If LCD B grants the rights on the same object to LCD C, the cnode in LCD C's cspace becomes a child of LCD B's cnode in the cdt, and so on. If LCD A decides to revoke rights, the microkernel will recursively delete the capabilities in LCD B's and LCD C's cspaces, using the cdt.

So, cspaces are contained in an LCD, but cdts have pointers that span across cspaces.

Relevant Files

  • lcd/domains/include/lcd-domains/types.h - cptr definition, simple functions, cspace configuration macros
  • lcd-domains/microkernel/internal.h - internal cspace function defs
  • lcd-domains/microkernel/cap.c - cspace implementation in microkernel
  • lcd-domains/kliblcd/cptr_cache.c - cptr cache implementation for klcd's
  • lcd/domains/liblcd/lcd/cap.c - cptr cache implementation for regular lcd's

Operations

Cspace init/destroy

Called by the microkernel when the LCD is created/destroyed.

Insert

When an object is first created / introduced into the microkernel, it is inserted into a cnode in the calling LCD's cspace, using __lcd_cap_insert. The calling LCD should provide a cptr to indicate where the object is placed. The microkernel will initialize a cdt with the cnode at the root.

Grant

Grant can occur at two times: when an LCD is being created, and during ipc.

If LCD A is creating LCD B, A can grant capabilities to B using lcd_cap_grant. LCD A is responsible for notifying B where the capabilities are in B's cspace, by some agreed upon protocol.

During IPC, LCD A can grant rights to LCD B. This is how it works:

  1. Suppose LCD A has a capability to an object already. The capability is stored in a cnode and referenced by cptr1.
  2. Suppose LCD B has an empty cnode slot in its cspace referred to by cptr2.
  3. LCD B puts cptr2 in its IPC buffer and does an IPC receive, to indicate it will accept a granted capability into the cnode at cptr2.
  4. LCD A puts cptr1 in its IPC buffer and does an IPC send, to indicate it will grant the capability that is in the cnode at cptr1.
  5. The microkernel matches A and B up, and invokes __lcd_cap_grant. This will copy the object data from A's cspace to B's cspace, and make the cnode at cptr2 a child of the cnode at cptr1 in the cdt for the object.

This convoluted technique is used to ensure that LCD A and LCD B are agreeing to share rights - LCD A is willing to grant, and LCD B is willing to receive the grant and make room in its cspace.

Revoke

At some point later, LCD A may want to revoke the rights it granted to LCD B. It does so using a call into the microkernel, which, in turn, will invoke _lcd_cap_revoke. This will delete LCD B's cnode from the cdt for the object, clear the cnode for later use, and update the state of LCD B to reflect the change in rights (e.g., if the capability was for a page, and LCD B had mapped the page, the page will become unmapped).

Note: a revoke does not delete the capability itself, only its children. This is what seL4 does, for example.

Get/Put

The microkernel may invoke __lcd_cnode_get when LCDs invoke methods on certain objects to confirm they have rights to do so. Invoking __lcd_cnode_get will lock the cnode, but not the underlying object. It should be matched with __lcd_cnode_put. Because we are using mutexes (see "Locking" below), interrupts are not disabled while in a "cnode critical section" - so beware! We may change the lock types in the future - but mutexes are easier and more forgiving than hard spinlocks.

Warnings

Multiple capabilities to same object

It is possible for an LCD to acquire mutliple capabilities to the same object. For example, an LCD could get 3 capabilities to the same page. Each time a capability to the page is deleted, the microkernel will unmap the page if necessary. If the page is mapped 3 times to 3 different places or the same place, the microkernel will try to unmap each time, but won't crash.

The microkernel is carefully designed to handle this weird case for the objects it currently manages, but if you add new kinds of objects, beware!

The cspace/cdt data structures can also accommodate this weird case too.

Note: It's not possible for an object to be inserted mutliple times - and lead to multiple cdts - because the microkernel does the insertion.

Comparison to seL4

No cspace sharing

Unlike seL4, cspaces cannot be shared, and the microkernel manages all aspects of cspace creation. In seL4, threads can create cnodes using untype/retype, and dynamically build a cspace, using the seL4 microkernel interfaces.

To make things doable, we don't allow LCDs to manage microkernel objects at the granularity that seL4 allows - seL4 allows threads to create page tables, cnodes in cspaces, and many other kinds of microkernel objects. Why don't we allow this? Because the threads do all of the set up, so the microkernel doesn't know (or has to do a lot of work and maintain extra state to know) how to tear it all down. For example, if an LCD sets up a big guest physical paging hierarchy, and loses rights on one of the page directories, the microkernel would have to know which directories to unmap it from, etc. seL4 handles all of this using some tricks (e.g., address space pools), but the developers list caveats and limitiations, and in general it's hard.

Punchline: it's complicated. Maybe this will change in the future.

No mint or copy

In seL4, threads populate cspaces using mint and copy. Because we aren't having threads / LCDs create and set up cspaces to the extent that seL4 does, we do the copying on behalf of the LCDs when rights are granted. (As mentioned in , it is still possible for an LCD to have multiple capabilities to the same object.)

Extra grant mechanism

In our microkernel, an LCD can grant capabilities to an LCD it is creating or has created, using lcd_cap_grant.

Cptr Cache

kliblcd and liblcd contain cptr cache implementations, so that the other code inside an LCD doesn't need to track which cnodes are used. Not every integer is a valid cptr, so the allocation is a bit complicated. See the Capability Space Radix Tree below for an overview of cptr indexing.

The cache contains a bitmap for each level of the cspace tree. This makes it easy to generate valid cptr's and track allocation/free's.

Note: It may be silly to have a cptr cache since the microkernel can just find a free cnode for the LCD. (We aren't having LCDs manage cspaces to the extent seL4 does, so why use cptr caches?)

Capability Space Radix Tree

The cspace is for resolving a cptr, or index, to a capability / cnode.

A single lock in the struct cspace protects "radix tree traversal", and individual locks for each cnode (slots in the cnode table) protect each cnode.

Each node in the tree is a cnode table of slots with the following layout:

     ______ cap slots _______   ___ table id slots _____
    /                        \ /                        \
    +---+---+-- ... --+---+---+---+---+-- ... --+---+---+
    |   |   |         |   |   |   |   |         |   |   |
    +---+---+-- ... --+---+---+---+---+-- ... --+---+---+

There are LCD_CNODE_TABLE_NUM_SLOTS slots - half are slots for capabilities, the other half for further pointers to more tables (so the number of slots should be a multiple of 2). The cspace contains a root cnode table to start it off.

The cspace is built dynamically by the microkernel as slots are referenced by an LCD. (This is different from seL4 - threads in seL4 are responsible for building the cspace using the interface provided by the microkernel.)

(See notes at the end if you're wondering why this is so complicated.)

The shape of the cspace is controlled by three parameters (in @@include/lcd-domains/types.h):

      LCD_CPTR_DEPTH_BITS   -   controls number of levels in cspace 

                             0 bits :   only root table (2^0 levels)
                             1 bit  :   root and one more level (2^1 levels)
                             2 bits :   root and three more levels (2^2 levels)

      LCD_CPTR_FANOUT_BITS  -   controls how many table slots are in the
                                cnode tables - in other words, the fanout

                             0 bits :   one table slot (2^0)
                             1 bit  :   two table slots (2^1)
                             2 bits :   four table slots (2^2)

      LCD_CPTR_SLOT_BITS    -   controls how many cap slots are in the
                                cnode tables; similar to table slots above

An index, or cptr, encodes the location of a cnode in the cspace. The encoding includes the level of the table that contains the slot; the fanout "path" to get to that table; and the slot index inside the table.

The lookup is just like a radix tree lookup (starting from LSB instead of MSB), but the level bits tell us how far to go / how many fanout bits are meaningful.

If all of the parameters are 2 (2 bits each), the cptr bit layout is the following:

                                        ____ cap slot index bits
                                       /
                                      /
                         00 00 00 00 00
                         /   |  |  |
                        /    |  |  |
            level -----'     fanout path (like a radix tree path)

So, in LSB order, the slot index bits come first, then the fanout path, then the level. The level and slot index bits are always in the same position. The interpretation of the fanout path bits depends on the level.

For example, if the cptr is (in binary):

                         11 11 00 10 01

the slot index is 01 = 1, and the level of the table is 11 = 3. All three pairs of fanout bits are used to traverse from the root cnode table to the final cnode table:

  1. From the root cnode table, we look at the first pair of fanout bits (in LSB order) = 10 = 2; so we follow the 2nd table slot (zero indexed) in the root cnode table to level 1
  2. From the level 1 cnode table, we look at the next pair of fanout bits = 00 = 0; so we follow the 0th table slot in this cnode table to level 2
  3. Finally, from level 2, we look at the next pair of fanout bits = 11 = 3; so we follow the 3rd table slot to arrive at the cnode table in level 3

We now use the cap slot index bits = 01 = 1 to look up the capability.

Another example: if the cptr is (in binary):

                         01 00 00 01 11

the level is 1, and only the first pair (01) of fanout bits are meaningful. Starting in the root cnode table,

  1. We see that the level = 01 = 1 > 0, so we look at the first pair of fanout bits = 01 = 1; we follow the pointer in the 1st table slot to level 1
  2. We now see that the level of the cptr = the level we are at, 1, so we now use the slot bits to look up the slot in the table (11 = 3).

Notes

It looks more complicated than it is. It's just radix tree traversal with a depth check.

We tried the "obvious" way of using a traditional radix-tree style approach. But this lead to balancing problems since there are slots inside each node of the tree. (In a traditional radix tree, all of the "slots" are at the last level of the tree.)

We tried a similar technique, but without using the level bits. This lead to problems: since a cptr may refer to different levels, it became hard to know when to stop the radix tree-like traversal.

As an alternative, we tried taking triples of bits from LSB to MSB: If the high bit was set in a triple, this indicated to keep traversing and look at the next triple; otherwise, stop. But it's hard to convert a number like 15 to an index of this type. And this makes alloc/free of cptrs hard.

Capability Derivation Tree

When an object is created and inserted into the LCD A's cspace, a new cdt is created, and the first cnode is made the root. If LCD A grants rights to LCD B, the cnode in LCD B's cspace is made a child of the cnode in A's cspace. And so on. The cdt can look like this:

   - LCD A's cspace -         
   |                |           - LCD B's cspace -
   |     .          |           |                |
   |    .           |           |                |
   |   /            |           |                |
   | +-+            |           |     .          |
   | +-+            |           |      .         |
   `- \ ------------'           |       \        |
       `------------------------------> +-+      |
       |                        |       +-+      |
       |                         `-------|-------'
       |      - LCD C's cspace -         |
       |      |                |         |
       |      |   .            |         |    - LCD D's cspace -
       |      |    \           |         |    |                |
       `---------> +-+         |         |    |        .       |
              |    +-+         |         |    |       .        |
              `----------------'         |    |      /         |
                                         |    |     +-+        |
                                          `-------> +-+        |
                                              `----------------'

If LCD A does a revoke, it will delete the capabilities (clear the cnodes) in B, C, and D's cspaces.

Each populated cnode that contains a capability also contains a pointer to the cdt (so if the cnode is cleared, it can be removed from the cdt, etc.). The cdt is protected by a lock.

Locking

We use mutexes with locking interruptible throughout - for now - so that we don't get into nasty deadlocks that require a reboot.

There are three primary locks - one lock per cspace, one lock per cdt, and one lock per cnode.

Cspace locks

The lock on the cspace provides mutual exclusion on cspace radix tree traversal and modifications. For example, an insert and grant cannot concurrently traverse the cspace, because an insert may create cnode tables on the fly, and the grant may incorrectly traverse those new tables. (It may not be possible for a call to insert and grant that involve the same thread to happen concurrently. The grant would require an lcd to be in an endpoint queue, so unless we had asynchronous endpoints, the insert couldn't be called concurrently. But the next example shows that the lock is necssary.)

As another example, an lcd may be in an endpoint queue while it is being torn down. Suppose it gets dequeued, and capabilities are being transferred. This requires lookups in the LCD's cspace. These lookups should fail. Eventually, once the tear down process reaches the cnode that refers to the endpoint, the code will ensure the lcd is not in the queue; but until then, all of this can happen. This is why we mark the cspace as invalid and release the lock, so that other threads like this can make progress but see that the LCD is going away.

cdt locks

The lock on a cdt provides mutual exclusion on cdt traversal and updates. For example, LCD 1 may be granting rights to LCD 2, but LCD 3 might be revoking rights to LCD 1; these operations need to be serialized (we only want to see two possible outcomes: LCD 1 grant succeeds, and then LCD 3 revokes all rights; or LCD 3 revoke succeeds, and then LCD 1 grant fails).

During a revoke, when cnodes are removed from a cdt, the microkernel state is updated to reflect that change in rights. It must be possible to update the microkernel state while the cdt is locked. For the current objects, this is ok, but beware.

cnode locks

The lock on a cnode provides mutual exclusion on it, and also prevents the containing cdt and cspace from going away while it is in use (they can't go away until the cnode is locked and removed from the cspace and/or cdt).

Special cptrs / capabilities

  • cptr 0 = null, always invalid
  • cptr 1 = capability to lcd's endpoint for receiving replies
  • cptr 2 = (dynamic) capability to caller's reply endpoint, during call/reply