The Performance of µ-Kernel-Based Systems

Härtig, Hohmuth, Liedtke, Schoenberg, Wolter (1997)

What kind of paper?

Refute conventional wisdom.
Performance Analysis.

Thesis Statement

The research community wrote off microkernels mainly because first-generation microkernels were slow. However, a microkernel should be tightly coupled to the hardware, and when it is, the resulting system can perform as well as one built on a monolithic kernel.

Look at this via a 4-way comparison

Native Linux
L⁴Linux
MkLinux
In-kernel MkLinux

The L4 micro-kernel

Two main abstractions:
- Address space
- Thread
Fundamental mechanism: IPC
Initial address space is physical memory (sigma-0)
Construct new address spaces by granting, mapping, and unmapping pages.
Address spaces constructed and maintained via pagers.
I/O ports are part of address space.
Interrupts are treated as IPC (send a message to the destination thread).
Small-address space optimization on the Pentium
- Use segments to protect address spaces between 4 and 512 MB.
- Actually limit the optimization to a total of 512 MB.
- Simulates a tagged TLB.
Runs on x86, Alpha, and MIPS.
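
The grant/map/unmap operations above can be sketched as a toy model. This is only an illustration, not L4's implementation: the class `AddressSpace`, its fields, and the page numbers are hypothetical names chosen for the example. Map shares a page and remembers who received it, grant moves the page, and unmap revokes it recursively from everyone who received it.

```python
class AddressSpace:
    """Toy model (hypothetical) of an L4-style address space."""

    def __init__(self, name):
        self.name = name
        self.frames = {}    # virtual page -> physical frame
        self.parent = {}    # virtual page -> (space, vpage) it was received from
        self.children = {}  # virtual page -> list of (space, vpage) mapped out of it

    def map(self, vpage, dest, dest_vpage):
        """Share a page with dest; the mapper keeps it and may revoke it later."""
        dest.frames[dest_vpage] = self.frames[vpage]
        dest.parent[dest_vpage] = (self, vpage)
        self.children.setdefault(vpage, []).append((dest, dest_vpage))

    def grant(self, vpage, dest, dest_vpage):
        """Move a page to dest; the granter loses it and dest takes its place."""
        dest.frames[dest_vpage] = self.frames.pop(vpage)
        kids = self.children.pop(vpage, [])
        dest.children[dest_vpage] = kids
        for space, vp in kids:
            space.parent[vp] = (dest, dest_vpage)
        origin = self.parent.pop(vpage, None)
        if origin is not None:
            pspace, pvpage = origin
            siblings = pspace.children[pvpage]
            siblings[siblings.index((self, vpage))] = (dest, dest_vpage)
            dest.parent[dest_vpage] = origin

    def unmap(self, vpage):
        """Revoke the page from every space that received it, recursively."""
        for space, vp in self.children.pop(vpage, []):
            space.unmap(vp)
            space.frames.pop(vp, None)
            space.parent.pop(vp, None)


# sigma0 starts out holding physical memory: virtual page n is frame n.
sigma0 = AddressSpace("sigma0")
sigma0.frames = {n: n for n in range(4)}
pager = AddressSpace("pager")
task = AddressSpace("task")

sigma0.map(0, pager, 10)    # sigma0 shares frame 0 with the pager
pager.map(10, task, 20)     # the pager passes it on to a task
sigma0.grant(1, pager, 11)  # frame 1 moves to the pager for good
sigma0.unmap(0)             # revokes frame 0 from both the pager and the task
```

After the last call, the pager holds only the granted page and the task holds nothing, which is the property a pager relies on to reclaim memory it handed out.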

L⁴Linux

Full binary compatibility.
Used single Linux server.
Map physical memory into the server's address space.
- Can map only parts of physical memory if desired.
- User processes can have address spaces larger than the server's.
Both Linux and L4 have to maintain page tables: the Linux server keeps its own logical page tables, while the hardware page tables stay inside L4 for security.
Single L4 thread in the server; Linux multiplexes the thread.
Use interrupt disabling for synchronization.
Interrupt handlers
- Implement top half handlers as threads waiting for messages (as delivered by L4).
- Bottom halves all implemented by a single thread.
- Interrupt threads execute at higher priority than the server.
User processes are L4 tasks.
Linux server is the pager for these tasks.
System calls are IPCs between user task and the server.
Signals implemented by separate thread in each user-process.
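
As a rough sketch of "system calls are IPCs between user task and the server", the toy model below turns a `getpid` call into a synchronous request/reply with a server object. All names here (`LinuxServer`, `UserTask`, `ipc_call`, the task ids and the starting pid) are hypothetical; the real system uses L4 IPC between separate address spaces rather than a Python method call.

```python
import itertools


class LinuxServer:
    """Hypothetical stand-in for the single Linux server task."""

    def __init__(self):
        self._next_pid = itertools.count(100)  # arbitrary starting pid
        self.pid_of = {}                       # L4 task id -> Linux pid
        self.handlers = {"getpid": self._getpid}

    def register(self, task_id):
        """Record a new user task and assign it a Linux pid."""
        self.pid_of[task_id] = next(self._next_pid)

    def ipc_call(self, sender, message):
        """Model of a synchronous IPC: receive a request, run it, reply."""
        name, args = message
        return self.handlers[name](sender, *args)

    def _getpid(self, sender):
        return self.pid_of[sender]


class UserTask:
    """Hypothetical user process running as its own task."""

    def __init__(self, task_id, server):
        self.task_id = task_id
        self.server = server
        server.register(task_id)

    def getpid(self):
        # The system call stub sends a message to the server and waits
        # for the reply instead of trapping into a monolithic kernel.
        return self.server.ipc_call(self.task_id, ("getpid", ()))


server = LinuxServer()
shell = UserTask("task-a", server)
editor = UserTask("task-b", server)
pids = (shell.getpid(), editor.getpid())
```

The point of the sketch is the cost structure: every system call pays for one request message and one reply, so the speed of IPC bounds the speed of the system call path.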

Performance

Nicely designed experiments to show different things.
- Native Linux to L⁴Linux
- L⁴Linux to MkLinux
- User-mode MkLinux server to co-located (in-kernel) MkLinux
Overall, the microkernel's overhead is clearly visible in microbenchmarks, but it matters less and less as the benchmarks become more macro-level. What does this say about the macrobenchmarks being used?
Compatibility
- Used getpid to measure pure system call overhead.
- Next used lmbench and hbench to look at a variety of primitives.
- Microkernel introduces anywhere from no overhead (TCP latency) to a slowdown of a factor of 2.5.
- Linux server build
  - 6% overhead of microkernel.
  - Co-location saves you nearly 50% of the Mach microkernel hit.
  - L4 has less than half the overhead of Mach.
- AIM suite looks at systems under saturation.
  - Effects of L4 microkernel are minimal (5-10%).
  - Mach microkernel shows significant hit.
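
A getpid-style microbenchmark and the overhead figures quoted above can be sketched as follows. This is only an illustration of the measurement method: `time_call` and `relative_overhead` are hypothetical helper names, and in Python the loop mostly measures interpreter overhead rather than the kernel entry itself, so the absolute number is not comparable to the paper's.

```python
import os
import time


def time_call(fn, iterations=100_000):
    """Return the mean wall-clock time of fn() in microseconds."""
    start = time.perf_counter()
    for _ in range(iterations):
        fn()
    elapsed = time.perf_counter() - start
    return elapsed / iterations * 1e6


def relative_overhead(baseline, measured):
    """Overhead of `measured` over `baseline`, as a fraction of baseline."""
    return (measured - baseline) / baseline


# getpid does almost no work in the kernel, so its cost approximates
# the cost of entering and leaving the system call path.
getpid_us = time_call(os.getpid)
print(f"getpid: {getpid_us:.3f} us per call")
```

With two such measurements, `relative_overhead(native, microkernel)` gives the kind of percentage reported for the Linux build and AIM runs; a slowdown factor is simply `measured / baseline`.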
Extensibility
- Pipes and RPC
  - 4-way comparison
    1. 5 Linux Variants
      1. Native Linux pipes
      2. L⁴Linux pipes using shared library
      3. L⁴Linux pipes using trampoline
      4. User mode MkLinux
      5. Co-located MkLinux
    2. L4 native pipes
    3. Synchronous L4 RPC
    4. Synchronous mapped RPC
  - Linux mechanism suffers from extra data copies.
  - L4 provides for efficient native implementation.
- VM
  - Look at a user-level pager.
  - Results show that user-level pagers are more efficient under L4 (a surprise?).
- Cache partitioning
  - Summarizes results showing that partitioning the cache can improve performance by a factor of 4 by improving worst-case cache behavior.
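
One common way for a user-level memory manager to partition a physically indexed cache is page coloring; the notes do not spell out the mechanism, so treat the sketch below as an assumption. The cache size, associativity, page size, and function names are all hypothetical example values, not figures from the notes.

```python
PAGE_SIZE = 4096  # bytes, assumed


def num_colors(cache_size, associativity, page_size=PAGE_SIZE):
    """Number of page-sized regions that one cache way can hold."""
    return cache_size // (associativity * page_size)


def page_color(frame_number, colors):
    """Frames with the same color compete for the same cache sets."""
    return frame_number % colors


def partition_frames(frames, colors, reserved_colors):
    """Split frames into (reserved, shared) by cache color."""
    reserved = [f for f in frames if page_color(f, colors) in reserved_colors]
    shared = [f for f in frames if page_color(f, colors) not in reserved_colors]
    return reserved, shared


# Assumed example: 256 KiB direct-mapped cache, 1 MiB of physical frames.
colors = num_colors(256 * 1024, associativity=1)
reserved, shared = partition_frames(range(256), colors, reserved_colors={0, 1})
```

A pager that backs a time-critical task only with `reserved` frames, and everything else only with `shared` frames, keeps other tasks from evicting the critical task's cache lines, which is how partitioning bounds the worst case.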

Conclusions

Not clear that co-location is a big deal (i.e., are extensible kernels necessary?).
Microkernel does not need to impose excessive overhead.
Speed of microkernel dictates speed of resulting system.
Fast IPC makes microkernel extensibility feasible.