The Performance of µ-Kernel-Based Systems
Härtig, Hohmuth, Liedtke, Schoenberg, Wolter (1997)
What kind of paper?
- Refute conventional wisdom.
- Performance Analysis.
Thesis Statement
The research community decided that microkernels were bad only
because first-generation microkernels were slow. However,
microkernels should be tightly coupled with the hardware,
and if they are, the resulting systems can perform as
well as monolithic kernels.
Evaluated via a 4-way comparison:
- Native Linux
- L4Linux
- MkLinux (Linux server in user mode on the Mach microkernel)
- In-kernel MkLinux (Linux server co-located with the Mach kernel)
The L4 micro-kernel
- Two main abstractions: threads and address spaces.
- Fundamental mechanism: IPC
- Initial address space is physical memory (sigma-0)
- Construct new address spaces by granting, mapping, and unmapping pages.
- Address spaces constructed and maintained via pagers.
- I/O ports are part of address space.
- Interrupts are treated as IPC (send a message to the destination thread).
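- Small-address-space optimization on the Pentium
- Uses segments to protect small address spaces of 4 MB to 512 MB.
- In practice, all small address spaces together are limited to 512 MB.
- Effectively simulates a tagged TLB, so switching between small
address spaces does not require a TLB flush.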
- Runs on x86, Alpha, and MIPS.
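
The grant/map/unmap operations above can be sketched as a toy model. This
is only an illustration under stated assumptions: the class AddressSpace
and its fields (pages, derived, source) are hypothetical names, pages are
plain integers, and real L4 transfers mappings as part of IPC rather than
through method calls like these.

```python
class AddressSpace:
    """Toy address space: virtual page number -> physical frame number."""

    def __init__(self, name):
        self.name = name
        self.pages = {}     # vpage -> frame
        self.derived = {}   # vpage -> [(space, vpage)] mappings handed out from it
        self.source = {}    # vpage -> (space, vpage) it was received from

    def map(self, vpage, dest, dest_vpage):
        """Share a page with dest; the mapper keeps it and can revoke it later."""
        dest.pages[dest_vpage] = self.pages[vpage]
        dest.source[dest_vpage] = (self, vpage)
        self.derived.setdefault(vpage, []).append((dest, dest_vpage))

    def grant(self, vpage, dest, dest_vpage):
        """Move a page to dest; the granter loses access to it."""
        dest.pages[dest_vpage] = self.pages.pop(vpage)
        # Mappings derived from the granted page now belong to dest.
        kids = self.derived.pop(vpage, [])
        dest.derived[dest_vpage] = kids
        for space, v in kids:
            space.source[v] = (dest, dest_vpage)
        # Whoever mapped the page to us must now track dest instead of us.
        parent = self.source.pop(vpage, None)
        if parent is not None:
            pspace, pv = parent
            entries = pspace.derived[pv]
            entries[entries.index((self, vpage))] = (dest, dest_vpage)
            dest.source[dest_vpage] = parent

    def unmap(self, vpage):
        """Recursively revoke every mapping derived from our page (we keep it)."""
        for space, v in self.derived.pop(vpage, []):
            space.unmap(v)
            space.pages.pop(v, None)
            space.source.pop(v, None)


# sigma-0 starts out owning physical memory (here: 4 frames, identity-mapped).
sigma0 = AddressSpace("sigma0")
sigma0.pages = {n: n for n in range(4)}
pager = AddressSpace("pager")
app = AddressSpace("app")

sigma0.map(0, pager, 10)   # sigma-0 maps frame 0 to the pager
pager.map(10, app, 20)     # the pager maps it on to an application
sigma0.unmap(0)            # revokes the page from pager and app alike
```

The point of the sketch is the recursion: because every mapping is derived
from an earlier one, a pager can always take back what it handed out,
including anything its clients passed on.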
L4Linux
- Full binary compatibility.
- Uses a single Linux server.
- Maps physical memory into the server's address space.
- Can map only parts of physical memory if desired.
- User processes can have address spaces larger than the server's.
- Both Linux and L4 maintain page tables: the hardware page tables live
in L4 for security, so the Linux server keeps its own logical page tables.
- Single L4 thread in the server; Linux multiplexes the thread.
- Use interrupt disabling for synchronization.
- Interrupt handlers
- Implement top half handlers as threads waiting for messages (as
delivered by L4).
- Bottom halves all implemented by a single thread.
- Interrupt threads execute at higher priority than the server.
- User processes are L4 tasks.
- Linux server is the pager for these tasks.
- System calls are IPCs between user task and the server.
- Signals are implemented by a separate thread in each user process.
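
The "system calls are IPCs" structure above can be sketched with a small
simulation. This is not L4Linux code: LinuxServer, UserTask, the queue-based
message passing, and the None shutdown message are all hypothetical
stand-ins chosen to show the shape of a single server thread receiving a
request and replying to the caller.

```python
import queue
import threading


class LinuxServer:
    """Toy stand-in for the single server thread that handles all system calls."""

    def __init__(self):
        self.inbox = queue.Queue()   # models IPC messages sent to the server
        self.handlers = {"getpid": lambda task: task.pid}

    def run(self):
        while True:
            msg = self.inbox.get()   # block until the next request arrives
            if msg is None:          # hypothetical shutdown message
                break
            task, name, args = msg
            result = self.handlers[name](task, *args)
            task.reply.put(result)   # reply message back to the calling task


class UserTask:
    """Toy user process: a system call is a send followed by a blocking receive."""

    def __init__(self, pid, server):
        self.pid = pid
        self.server = server
        self.reply = queue.Queue(maxsize=1)

    def syscall(self, name, *args):
        self.server.inbox.put((self, name, args))
        return self.reply.get()


server = LinuxServer()
server_thread = threading.Thread(target=server.run)
server_thread.start()

task = UserTask(pid=42, server=server)
pid = task.syscall("getpid")   # round trip: request to server, reply to task

server.inbox.put(None)
server_thread.join()
```

Every system call costs a full round trip through the message path, which
is why the speed of the underlying IPC mechanism matters so much.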
Performance
- Nicely designed experiments, each isolating a different question.
- Native Linux vs. L4Linux: the cost of running on a microkernel.
- L4Linux vs. MkLinux: the effect of the underlying microkernel.
- MkLinux user-mode server vs. co-located server: the benefit of co-location.
- Overall, the microkernel's overhead is clearly visible in
microbenchmarks but shrinks as the benchmarks move toward whole
applications. What does this say about the
macrobenchmarks being used?
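
One way to reason about the shrinking overhead is a simple Amdahl-style
model: only the kernel-bound share of a workload slows down. The function
below is a sketch of that reasoning, not something taken from the paper,
and the inputs are hypothetical numbers chosen only to show the shape of
the effect.

```python
def overall_slowdown(kernel_fraction, kernel_slowdown):
    """Slowdown of a whole workload when only its kernel-bound part gets slower.

    kernel_fraction: share of native run time spent in kernel-bound work (0..1).
    kernel_slowdown: factor by which that share slows down on the microkernel.
    """
    if not 0.0 <= kernel_fraction <= 1.0:
        raise ValueError("kernel_fraction must be between 0 and 1")
    return (1.0 - kernel_fraction) + kernel_fraction * kernel_slowdown


# Hypothetical inputs, not measurements from the paper.
micro = overall_slowdown(1.0, 2.5)    # microbenchmark: all kernel-bound work
macro = overall_slowdown(0.05, 2.5)   # application: 5% kernel-bound work
print(f"microbenchmark: {micro:.3f}x, application: {macro:.3f}x")
```

Under this model a macrobenchmark only exposes microkernel overhead to the
extent that it exercises the kernel, which is one reading of the question
above.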
Compatibility
- Used getpid to measure pure system call overhead.
- Then used lmbench and hbench to measure a variety of primitives.
- Microkernel overhead ranges from none (TCP latency) to a
factor of 2.5.
- Linux server build
- 6% overhead of microkernel.
- Co-location saves you nearly 50% of the Mach microkernel hit.
- L4 has less than half the overhead of Mach.
- AIM suite looks at systems under saturation.
- Effects of L4 microkernel are minimal (5-10%).
- Mach microkernel shows significant hit.
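
The getpid measurement above follows a common microbenchmark pattern: call
a trivial system call many times and divide. A minimal sketch of that
pattern follows. It is not the paper's harness: time_call and noop are
hypothetical helpers, the interpreter's own call overhead is included in
both numbers, and whether os.getpid enters the kernel on every call
depends on the platform's C library.

```python
import os
import time


def time_call(fn, iterations=100_000):
    """Return the mean wall-clock cost of one fn() call in nanoseconds."""
    start = time.perf_counter_ns()
    for _ in range(iterations):
        fn()
    elapsed = time.perf_counter_ns() - start
    return elapsed / iterations


def noop():
    """Baseline with no system call, to expose interpreter overhead."""
    return None


baseline_ns = time_call(noop)       # loop and call overhead only
getpid_ns = time_call(os.getpid)    # the same overhead plus getpid itself
print(f"empty call: {baseline_ns:.0f} ns, os.getpid: {getpid_ns:.0f} ns")
```

Comparing the two numbers gives a rough feel for the system call's share
of the cost. A C harness would get much closer to the pure kernel-entry
cost that the paper is after.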
Extensibility
- Pipes and RPC
- 4-way comparison
- 5 Linux Variants
- Native Linux pipes
- L4Linux pipes using shared library
- L4Linux pipes using trampoline
- User mode MkLinux
- Co-located MkLinux
- L4 native pipes
- Synchronous L4 RPC
- Synchronous mapped RPC
- Linux mechanism suffers from extra data copies.
- L4 provides for efficient native implementation.
- VM
- Look at a user-level pager.
- Results show that user-level pagers are more efficient under L4 (a surprise?).
- Cache partitioning
- Summarizes results showing that partitioning the cache can
improve performance by a factor of 4 by reducing worst-case
cache behavior.
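
A user-level memory manager can partition a physically indexed cache by
controlling which physical frames each task receives, a technique usually
called page coloring. The sketch below assumes that is the mechanism
meant here. The function names and the cache geometry (256 KB,
direct-mapped, 4 KB pages) are hypothetical values for illustration, not
figures taken from the paper.

```python
def page_color(phys_addr, cache_size, associativity, page_size=4096):
    """Return the cache color of the page containing phys_addr."""
    way_size = cache_size // associativity        # bytes covered by one cache way
    num_colors = max(1, way_size // page_size)    # pages that fit in one way
    return (phys_addr // page_size) % num_colors


def frames_for_task(task_colors, num_frames, cache_size, associativity,
                    page_size=4096):
    """List the physical frame numbers whose color is reserved for a task."""
    return [
        frame for frame in range(num_frames)
        if page_color(frame * page_size, cache_size, associativity,
                      page_size) in task_colors
    ]


# Hypothetical geometry: 256 KB direct-mapped cache, 4 KB pages, 64 colors.
CACHE_SIZE = 256 * 1024
realtime_frames = frames_for_task({0, 1}, 256, CACHE_SIZE, 1)
print(realtime_frames)
```

If a time-critical task only ever receives frames of its reserved colors,
and no other task receives those colors, the other tasks cannot evict its
cache lines. That is what bounds the worst case.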
Conclusions
- Not clear that co-location is a big deal (i.e., are extensible
kernels necessary?).
- Microkernel does not need to impose excessive overhead.
- Speed of microkernel dictates speed of resulting system.
- Fast IPC makes microkernel extensibility feasible.