Introduction

The goal is to run kernel code in an isolated virtual machine container, called a Lightweight Capability Domain (LCD), and allow it to interact with the rest of the host kernel (or with other LCDs). Each LCD runs in physical and virtual address spaces that are separate from those of the host kernel.

In any use case, there are a few key players:

  • The LCD microkernel
  • The LCD(s)
  • Code in the non-isolated part of the kernel (host kernel) that sets up the LCD(s) and interacts with them

See the figure below:

   +-------------------+
   |                   |
   |                   |                LCD
   | +--------------+  |        +------------------+  
   | | non-isolated |  |        |   isolated code  |
   | | code that    |  |        |                  |
   | | interacts    |  |        |                  |
   | | with the     |  |        |                  |
   | | LCD(s)       |  |        +------------------+
   | +--------------+  |        
   |                   |       
   |                   |     
   |                   |        
   |                   |
   |                   +-------------------------------------+
   |                                                         |
   |                       +--------------------------+      |
   |                       |                          |      |
   |                       |      LCD Microkernel     |      |
   |                       |                          |      |
   |                       +--------------------------+      |
   |                                                         |
   | Host Linux Kernel                                       |
   +---------------------------------------------------------+

LCD Microkernel

This is built as a regular Linux kernel module (lcd_domains.ko) and installed in the host Linux kernel.

  • It manages LCDs much as KVM manages regular VMs
  • It provides an interface (described below) for non-isolated kernel code to create and interact with LCDs

It consists of a low-level, architecture-dependent part that sets up hardware virtual machines, and an architecture-independent part that does higher-level LCD management and provides the interface to non-isolated code:

                                LCD Microkernel
             +-------------------------+---------------------------+
             |  Architecture-dependent |  Architecture-independent |
             |       part (VT-x)       |         part              |
             +-------------------------+---------------------------+

Note that the LCD microkernel does not run at boot, and is mostly "hands off" with the rest of the Linux kernel. More on this below.

LCDs are isolated

We may not trust the kernel code running inside the LCD, or we may simply not want to give it access to everything (all memory, devices, buses, etc.) because it doesn't need it all; granting access only to what it needs helps us confine bugs. But the code inside the LCD needs at least some resources, such as memory and access to some devices, to get its job done. Furthermore, we want to explicitly track the resources the LCD has access to (which pages of memory, which devices, etc.).

Tracking resources with capabilities

The LCD microkernel tracks every resource an LCD has access to using capabilities, and an LCD invokes operations on resources indirectly, through a capability-mediated interface provided by the microkernel. Our design is similar to seL4 (see the seL4 manual).

More specifically, the microkernel maintains a CSpace for each LCD that stores references to resources the LCD has access to. When the LCD wants to invoke an operation on a resource, it uses a CPtr (cptr_t), a file-descriptor-like integer identifier, to refer to the resource in its CSpace.

For example, suppose an LCD has a capability to a page of memory, and it wants to map that page in its physical address space. It invokes the following function (from the LIBLCD interface, explained below):

          int _lcd_mmap(cptr_t page, gpa_t physical_address);

Here, page is an integer identifier that the microkernel will use to look up the page capability in the LCD's CSpace. After resolving to a page capability, the microkernel can then update the LCD's physical address space by mapping the page of memory at physical_address.
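
The lookup-then-map sequence described above can be sketched as a toy user-space model. This is only an illustration under stated assumptions: every toy_ name, the flat array standing in for a CSpace, and the page-table array are hypothetical and are not the real libcap or LCD microkernel data structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy types. The real cptr_t, gpa_t, and CSpace are defined by
 * libcap and the LCD microkernel and are laid out differently. */
typedef struct { unsigned long cptr; } toy_cptr_t;
typedef struct { uint64_t gpa; } toy_gpa_t;

enum toy_cap_type { TOY_CAP_NONE = 0, TOY_CAP_PAGE };

struct toy_cap {
    enum toy_cap_type type;
    uint64_t host_page;              /* host page this capability refers to */
};

#define TOY_CSPACE_SLOTS 8
#define TOY_GPA_PAGES    16
#define TOY_PAGE_SIZE    4096u

struct toy_lcd {
    struct toy_cap cspace[TOY_CSPACE_SLOTS]; /* per-LCD capability table */
    uint64_t gpa_map[TOY_GPA_PAGES];         /* guest page -> host page; 0 means unmapped */
};

/* Resolve a cptr to a capability in this LCD's CSpace, or NULL if the slot
 * is out of range or empty. */
static struct toy_cap *toy_cspace_lookup(struct toy_lcd *lcd, toy_cptr_t c)
{
    assert(lcd != NULL);
    if (c.cptr >= TOY_CSPACE_SLOTS)
        return NULL;
    if (lcd->cspace[c.cptr].type == TOY_CAP_NONE)
        return NULL;
    return &lcd->cspace[c.cptr];
}

/* Microkernel-side handling of a map request: look up the capability,
 * check that it names a page, then update the guest physical mapping. */
static int toy_handle_mmap(struct toy_lcd *lcd, toy_cptr_t page, toy_gpa_t where)
{
    struct toy_cap *cap = toy_cspace_lookup(lcd, page);
    uint64_t idx;

    if (cap == NULL || cap->type != TOY_CAP_PAGE)
        return -1;                   /* no such capability, or not a page */
    if (where.gpa % TOY_PAGE_SIZE != 0)
        return -1;                   /* address must be page aligned */
    idx = where.gpa / TOY_PAGE_SIZE;
    if (idx >= TOY_GPA_PAGES || lcd->gpa_map[idx] != 0)
        return -1;                   /* out of range, or already mapped */
    lcd->gpa_map[idx] = cap->host_page;
    return 0;
}
```

The point of the sketch is that the caller never names the page directly: it passes only a slot identifier, and the request fails unless the microkernel finds a matching page capability in that LCD's own table.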

See the libcap documentation for more on capabilities.

Non-isolated code comes into the picture

A large part of the host kernel doesn't even care about these LCDs, and doesn't want to interact with them or the LCD microkernel. It just wants to do its job and not be bothered. This code is virtually unaffected by the presence of the LCDs and LCD microkernel.

But some non-isolated code may want to create LCDs and interact with them. This code needs to be able to share resources with LCDs and communicate with them. Perhaps non-isolated code wants to set up some shared memory with an LCD.

Rather than have non-isolated code directly access internal microkernel data structures (e.g., share memory with an LCD by directly modifying its address space), we insist that non-isolated code that wants to interact with an LCD should use the same capability-mediated interface that isolated code uses. Of course, this is voluntary: the microkernel has no way of preventing non-isolated code from doing whatever it wants. Moreover, even as the non-isolated code is interacting with an LCD, it is free to interact with the rest of the host kernel in any way it wants (calling other kernel functions, etc.).

We insist that non-isolated code use the same capability-mediated interface so that we can track interactions that cross the isolation boundary; this also makes the interaction patterns symmetric.

LIBLCD: the capability-mediated interface

This is the common interface used by isolated and non-isolated code. There are two implementations of it:

  • Isolated implementation: inside the library kernel (also called liblcd) that runs with the isolated code. This library kernel translates function calls like _lcd_mmap into lower-level hypercalls (VMCALLs) out to the LCD microkernel.
  • Non-isolated implementation: inside the LCD microkernel module; it's called kliblcd.

While kliblcd is part of the LCD microkernel module, conceptually it's separate from it. kliblcd translates function calls like _lcd_mmap into internal function calls into the LCD microkernel. Many of these internal functions overlap with those used to handle hypercalls for isolated code, so we get a lot of code re-use.
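
The way the two implementations converge on shared microkernel code can be sketched as follows. This is a simplified user-space model, not the actual liblcd or kliblcd source: all toy_ names are hypothetical, and the hypercall is modelled as an ordinary function call because a real VMCALL only makes sense inside a hardware virtual machine.

```c
#include <assert.h>

/* Hypothetical caller IDs and operation codes, for illustration only. */
enum toy_caller { TOY_CALLER_NONE = 0, TOY_CALLER_LCD, TOY_CALLER_HOST };
enum toy_op { TOY_OP_MMAP = 1 };

static enum toy_caller toy_last_caller = TOY_CALLER_NONE;
static int toy_mmap_calls;

/* Shared internal microkernel function: both paths end up here. */
static int toy_microkernel_mmap(enum toy_caller who,
                                unsigned long cptr, unsigned long gpa)
{
    assert(who != TOY_CALLER_NONE);
    (void)cptr;                      /* a real handler would resolve cptr ... */
    (void)gpa;                       /* ... and map the page at gpa */
    toy_last_caller = who;
    toy_mmap_calls++;
    return 0;
}

/* Stand-in for the microkernel's hypercall handler, which decodes the
 * operation requested by isolated code and dispatches it. */
static int toy_handle_hypercall(enum toy_op op,
                                unsigned long a0, unsigned long a1)
{
    switch (op) {
    case TOY_OP_MMAP:
        return toy_microkernel_mmap(TOY_CALLER_LCD, a0, a1);
    default:
        return -1;                   /* unknown operation */
    }
}

/* Isolated flavour (liblcd-like): package the arguments as a hypercall. */
static int toy_liblcd_mmap(unsigned long cptr, unsigned long gpa)
{
    return toy_handle_hypercall(TOY_OP_MMAP, cptr, gpa);
}

/* Non-isolated flavour (kliblcd-like): call the internal function directly. */
static int toy_kliblcd_mmap(unsigned long cptr, unsigned long gpa)
{
    return toy_microkernel_mmap(TOY_CALLER_HOST, cptr, gpa);
}
```

Both wrappers expose the same signature to their callers, and only the transport differs. That is the property that lets the internal handler be shared between the two paths.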

This interface contains functions for

  • Allocating and mapping memory (pages)
  • Creating LCDs, and sharing resources with them
  • Communicating with LCDs using synchronous IPC
  • Working with capabilities for memory (translating memory addresses to capabilities)

Here is a more detailed picture showing the interaction patterns. It shows an LCD sending a (synchronous) IPC message to a non-isolated thread in the host kernel. Both threads synchronize by calling indirectly (through liblcd/kliblcd) into the LCD microkernel, and the LCD microkernel does the message transfer.

   ----------------------+
     Host Linux Kernel   |               LCD 
                         |      +-------------------+
                         |      |                   |
                         |      |         |         |
                         |      |         |         |
                         |      |   isolated thread |
                         |      |   sending message |
                         |      |   to non-isolated |
                         |      |   thread          |
                         |      |         |         |
                         |      |    lcd_sync_send  |
                         |      |         |         |
                         |      |         |         |
                         |      |  +-------------+  |
                         |      |  |    liblcd   |  |
                         |      |  | (libkernel) |  |
                         |      |  +------|------+  |
                         |      +---------|---------+
                         |                |
                         |                +----------+ hypercall
          |              |                           | (VMCALL)
          |              |                           | 
  non-isolated thread    +-------------------------- | -----------------+
  receiving IPC message                              |                  |
  from LCD                                           |                  |
          |                                          V                  |
     lcd_sync_recv      +---------+  internal    +-----------------+    |
          |             |         |    call      |                 |    |
          +------------>| kliblcd -------------> | LCD Microkernel |    |
                        |         |              |                 |    |
                        +---------+              +-----------------+    |
                                                                        |
                                                                        |
  ----------------------------------------------------------------------+
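
The rendezvous in the picture can be modelled in a few lines. This is a single-threaded sketch under stated assumptions: the endpoint object, the message layout, and all toy_ names are hypothetical, the real lcd_sync_send and lcd_sync_recv signatures are not shown in this document, and where a real thread would block, the toy functions return TOY_BLOCKED instead.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MSG_REGS 4

/* Hypothetical message: a handful of register-sized values. */
struct toy_msg { unsigned long regs[TOY_MSG_REGS]; };

enum toy_ep_state {
    TOY_EP_IDLE,
    TOY_EP_SENDER_WAITING,
    TOY_EP_RECEIVER_WAITING
};

/* Hypothetical rendezvous point kept by the microkernel. */
struct toy_endpoint {
    enum toy_ep_state state;
    struct toy_msg pending;          /* sender's message, parked until a receiver arrives */
    struct toy_msg *recv_buf;        /* buffer of a receiver that arrived first */
};

enum { TOY_DONE = 0, TOY_BLOCKED = 1, TOY_BUSY = -1 };

/* Send: if a receiver is already waiting, the microkernel copies the
 * message across; otherwise the sender parks its message and waits. */
static int toy_sync_send(struct toy_endpoint *ep, const struct toy_msg *m)
{
    assert(ep != NULL && m != NULL);
    switch (ep->state) {
    case TOY_EP_RECEIVER_WAITING:
        *ep->recv_buf = *m;
        ep->recv_buf = NULL;
        ep->state = TOY_EP_IDLE;
        return TOY_DONE;
    case TOY_EP_IDLE:
        ep->pending = *m;
        ep->state = TOY_EP_SENDER_WAITING;
        return TOY_BLOCKED;
    default:
        return TOY_BUSY;             /* toy model allows one waiter at a time */
    }
}

/* Receive: mirror image of send. */
static int toy_sync_recv(struct toy_endpoint *ep, struct toy_msg *out)
{
    assert(ep != NULL && out != NULL);
    switch (ep->state) {
    case TOY_EP_SENDER_WAITING:
        *out = ep->pending;
        ep->state = TOY_EP_IDLE;
        return TOY_DONE;
    case TOY_EP_IDLE:
        ep->recv_buf = out;
        ep->state = TOY_EP_RECEIVER_WAITING;
        return TOY_BLOCKED;
    default:
        return TOY_BUSY;
    }
}
```

In this model neither side touches the other's memory: the copy happens inside the endpoint code, which plays the role of the microkernel in the diagram.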

Interacting with LCDs

There are two ways to interact with an LCD:

  • Synchronous IPC (lcd_sync_send, lcd_sync_recv, etc.)
  • Shared memory (including fast, ring buffer based IPC; see libfipc)