Introduction

Welcome to the Poplar Book, which serves as the main source of documentation for Poplar. The Book aims to be both a 10,000-meter overview of Poplar for the interested observer, and a definitive reference for the inner workings of the kernel and userspace.

Please note that this book (like the rest of the OS!) is still very early in development and may lag behind the state of the code. If anything is unclear, please file an issue!

What is Poplar?

At heart, Poplar is a microkernel written in the Rust programming language. Poplar becomes an "OS" when it's combined with other packages such as drivers, filesystems and user applications.

Poplar is designed to be a modern microkernel, supporting a minimal system call interface and first-class support for message-passing-based IPC between userspace processes. Versatile message-passing allows Poplar to move much more out of the kernel than traditionally possible. For example, the kernel has no concept of a filesystem or of files - instead, the VFS and all filesystems are implemented entirely in userspace, and files are read and written to by passing messages.

Why Rust?

While Poplar's design is in theory language-agnostic, the implementation is very tied to Rust. Rust is a systems programming language with a rich type system and a novel ownership model that guarantees memory and thread safety in safe code. This qualification is important, as Poplar uses a lot of unsafe code out of necessity - it's important to understand that the use of Rust does not in any way mean that Poplar is automatically bug-free.

However, Rust makes you think a lot more about how to make your programs safe, which is exactly the sort of code we want to be writing for a kernel. This focus on safety, as well as good ergonomics features and performance, makes Rust perfect for OS-level code.

The Poplar Kernel

TODO

Platforms

A platform is a build target for the kernel. In some cases, there is only one platform for an entire architecture because the hardware is relatively standardized (e.g. x86_64). Other times, hardware is different enough between platforms that it's easier to treat them as different targets (e.g. a headless ARM server that boots using UEFI, versus a Raspberry Pi).

All supported platforms are enumerated in the table below - some have their own sections with more details, while others are just described below. The platform you want to build for is specified in your Poplar.toml configuration file, or with the -p/--platform flag to xtask. Some platforms also have custom xtask commands to, for example, flash a device with a built image.

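| Platform name | Arch   | Description                         |
|---------------|--------|-------------------------------------|
| x64           | x86_64 | Modern x86_64 platform.             |
| rv64_virt     | RV64   | A virtual RISC-V QEMU platform.     |
| mq_pro        | RV64   | The MangoPi MQ-Pro RISC-V platform. |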

Platform: x64

The vast majority of x86_64 hardware is pretty similar, and so is treated as a single platform. It uses the hal_x86_64 HAL. We assume that the platform:

  • Boots using UEFI (using seed_uefi)
  • Supports the APIC
  • Supports the xsave instruction

Platform: rv64_virt

This is a virtual RISC-V platform emulated by qemu-system-riscv64's virt machine. It features:

  • A customizable number of emulated RV64 HARTs
  • Booting via QEMU's -kernel option and OpenSBI
  • A Virtio block device with attached GPT 'disk'
  • Support for USB devices via EHCI

Devices such as the EHCI USB controller are connected to a PCIe bus, and so we use the Advanced Interrupt Architecture with MSIs to avoid the complexity of shared pin-based PCI interrupts. This is done by passing the aia=aplic-imsic machine option to QEMU.

MangoPi MQ-Pro

The MangoPi MQ-Pro is a small RISC-V development board, featuring an Allwinner D1 SoC with a single RV64 core, and either 512MiB or 1GiB of memory. Most public information about the D1 itself can be found on the Sunxi wiki.

You're probably going to want to solder a GPIO header to the board and get a USB-UART adaptor as well. The xtask contains a small serial utility for logging the output from the board, but you can also use an external program such as minicom. Adding a male-to-female jumper wire to a ground pin is also useful - you can touch it to the RST pad on the back of the board to reset it (allowing you to FEL new code onto it).

Boot procedure

The D1 can be booted from an SD card or flash, or, usefully for development, using Allwinner's FEL protocol, which allows data to be loaded into memory and code executed using a small USB stack. This procedure is best visualised with a diagram:

[Figure: Diagram of the D1's boot procedure]

The initial part of this process is done by code loaded from the BROM (Boot ROM) - it contains the FEL stack, as well as enough code to load the first-stage bootloader from either an SD card or SPI flash. Data loaded by the FEL stack, or from the bootable media, is loaded into SRAM. The DRAM has to be brought up, either by the first-stage bootloader, or by a FEL payload.

Booting via FEL

To boot with FEL, the MQ-Pro needs to be plugged into the development machine via the USB-C OTG port (not the host port), and then booted into FEL mode. The easiest way to do this is to just remove the SD card - as long as the flash hasn't been written to, this should boot into FEL. It should then enumerate as a USB device with ID 1f3a:efe8 on the host.

You then need something that can talk the FEL protocol on the host. We're currently using xfel, but this may be replaced/augmented with a more capable solution in the future. xfel should be relatively easy to compile and install on a Linux system, and should install some udev rules that allow it to be used by normal users. xfel should then automatically detect a connected device in FEL mode.

The first step is to initialize the DRAM - xfel ddr d1 does this by uploading and running a small payload with the correct procedures. After this, further code can be loaded directly into DRAM - we load OpenSBI and Seed.

We load OpenSBI's FW_JUMP firmware at the start of RAM, 0x4000_0000. It provides the SBI interface, moves from M-mode to S-mode, and then jumps into Seed, which is loaded 512KiB after it at 0x4008_0000 (this address is supplied to OpenSBI at build-time). We also bundle a device tree for the platform into OpenSBI, which it uses to bootstrap the platform and then passes onwards.

TODO: we should investigate customising the driver list to maybe get OpenSBI under 256KiB (it's just over).
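The load addresses above are related by simple arithmetic, which the following sketch checks. The constant and function names are illustrative only and are not taken from Poplar's build system.

```rust
/// Start of DRAM on the D1, where OpenSBI's FW_JUMP firmware is loaded.
pub const RAM_BASE: u64 = 0x4000_0000;

/// Space left for OpenSBI before Seed begins (512KiB).
pub const OPENSBI_RESERVED: u64 = 512 * 1024;

/// Address that OpenSBI jumps to once it has dropped into S-mode.
pub const SEED_LOAD_ADDRESS: u64 = RAM_BASE + OPENSBI_RESERVED;

/// Returns whether a firmware image of `size` bytes fits below Seed.
pub const fn opensbi_fits(size: u64) -> bool {
    size <= OPENSBI_RESERVED
}
```

A check like `opensbi_fits` would need its limit lowered to 256KiB if the reserved space is ever reduced as the TODO suggests.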

Seed

Seed is Poplar's bootloader and/or pre-kernel. What it is required to do varies by platform, but generally it is responsible for bringing up the system, loading the kernel and initial tasks into memory, and preparing the environment for executing the kernel.

x86_64

On x86_64, Seed is a UEFI executable that utilises boot services to load the kernel and initial tasks. The Seed executable, the kernel, and other files are all held in the EFI System Partition (ESP) - a FAT filesystem present in all UEFI-booted systems.

RISC-V

On RISC-V, Seed is more of a pre-kernel than a traditional bootloader. It is booted into by the system firmware, and then uses its own set of drivers to load the kernel and other files from the correct filesystem, or elsewhere.

The boot mechanism has not yet been fully designed for RISC-V, and will depend heavily on the hardware target, as booting is much less standardised across platforms than on x86_64.

Kernel Objects

Kernel Objects are how Poplar represents resources that can be interacted with from userspace. They are all allocated a unique ID.

Handles

Handles are used to refer to kernel objects from userspace, and are allocated to a single Task. A handle of value 0 acts as a sentinel value that can be used for special meanings. From userspace, handles must be treated as opaque, 32-bit integers.

Debugging the kernel

Kernels can be difficult to debug - this page tries to collect useful techniques for debugging kernels in general, and also any Poplar-specific things that might be useful.

Poplar-specific: the breakpoint exception

The breakpoint exception is useful for inspecting the contents of registers at specific points, such as in sections of assembly (where it's inconvenient to call into Rust, or to use a debugger because getting global_asm! to play nicely with GDB is a pain).

Simply use the int3 instruction:

...

mov rsp, [gs:0x10]
int3  // Is my user stack pointer correct?
sysretq

Building OVMF

Building a debug build of OVMF isn't too hard (from the base of the edk2 repo):

OvmfPkg/build.sh -a X64

By default, debug builds of OVMF will output debugging information on the ISA debugcon, which is actually probably nicer for our purposes than most builds, which pass DEBUG_ON_SERIAL_PORT during the build. To log the output to a file, you can pass -debugcon file:ovmf_debug.log -global isa-debugcon.iobase=0x402 to QEMU.

System calls

Userspace code can interact with the kernel through system calls. Poplar's system call interface is based around 'kernel objects', and so many of the system calls are to create, destroy, or modify the state of various types of kernel object. Because of Poplar's microkernel design, many traditional system calls (e.g. open) are not present, their functionality instead being provided by userspace.

Each system call has a unique number that is used to identify it. A system call can take up to five parameters, each no larger than the system's register width. It can return a single value, also the size of a register.

Overview of system calls

| Number | System call          | Description                                                                                                         |
|--------|----------------------|---------------------------------------------------------------------------------------------------------------------|
| 0      | yield                | Yield to the kernel.                                                                                                |
| 1      | early_log            | Log a message. Designed to be used from early processes.                                                            |
| 2      | get_framebuffer      | Get the framebuffer that the kernel has created, if it has.                                                         |
| 3      | create_memory_object | Create a MemoryObject kernel object.                                                                                |
| 4      | map_memory_object    | Map a MemoryObject into an AddressSpace.                                                                            |
| 5      | create_channel       | Create a channel, returning handles to the two ends.                                                                |
| 6      | send_message         | Send a message down a channel.                                                                                      |
| 7      | get_message          | Receive the next message, if there is one.                                                                          |
| 8      | wait_for_message     | Yield to the kernel until a message arrives on the given channel.                                                   |
| 9      | register_service     | Register yourself as a service.                                                                                     |
| 10     | subscribe_to_service | Create a channel to a particular service provider.                                                                  |
| 11     | pci_get_info         | Get information about the PCI devices on the platform.                                                              |
| 12     | wait_for_event       | Yield to the kernel until an event is signalled.                                                                    |
| 13     | poll_interest        | Poll a kernel object to see if changes need to be processed. TODO: this is an experiment and may not continue to exist. |

Making a system call on x86_64

To make a system call on x86_64, populate these registers:

| rdi                | rsi | rdx | r10 | r8 | r9 |
|--------------------|-----|-----|-----|----|----|
| System call number | a   | b   | c   | d  | e  |

The only way in which these registers deviate from the x86_64 Sys-V ABI is that c is passed in r10 instead of rcx, because rcx is used by the syscall instruction. You can then make the system call by executing syscall. Before the kernel returns to userspace, it will put the result of the system call (if there is one) in rax. If a system call takes fewer than five parameters, the unused parameter registers will be preserved across the system call.
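The register assignment above might be wrapped in Rust roughly as follows. This is a minimal sketch, not Poplar's actual userspace library: the name `raw_syscall` is hypothetical, the clobber list assumes only the registers that the `syscall` instruction itself overwrites (`rcx` and `r11`) need to be declared, and it assumes the kernel leaves the parameter registers it is given unchanged, which the text above only states for unused ones.

```rust
use core::arch::asm;

/// Make a system call with up to five parameters (hypothetical helper).
///
/// # Safety
/// The caller must pass a valid system call number and parameters that
/// satisfy that system call's requirements (e.g. valid pointers).
pub unsafe fn raw_syscall(number: usize, a: usize, b: usize, c: usize, d: usize, e: usize) -> usize {
    let result: usize;
    unsafe {
        asm!(
            "syscall",
            in("rdi") number,
            in("rsi") a,
            in("rdx") b,
            in("r10") c, // `c` goes in r10 because `syscall` overwrites rcx
            in("r8") d,
            in("r9") e,
            lateout("rax") result,
            out("rcx") _, // overwritten by `syscall` with the return address
            out("r11") _, // overwritten by `syscall` with the saved flags
            options(nostack),
        );
    }
    result
}
```

Under these assumptions, a call such as `raw_syscall(0, 0, 0, 0, 0, 0)` would correspond to yield when run on Poplar. Unused parameters are simply passed as zero here.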

Return values

Often, a system call will need to return a status, plus one or more handles. The first handle a system call needs to return (often the only handle returned) can be returned in the upper bits of the status value:

  • Bits 0..32 contain the status:
    • 0 means that the system call succeeded, and the rest of the return value is valid
    • >0 means that the system call errored. The meaning of the value is system-call specific.
  • Bits 32..64 contain the value of the first returned handle, if applicable

A return value of 0xffffffffffffffff (the maximum value of u64) is reserved for when a system call is made with a number that does not correspond to a system call. This is defined as a normal error code (as opposed to, for example, terminating the task that tried to make the system call) to provide a mechanism for tasks to detect kernel support for a system call (so they can use a fallback method on older kernels, for example).
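The packing described above can be unpicked with a few shifts. The sketch below assumes a 64-bit return value; the names `SyscallError` and `decode_handle_result` are illustrative and are not claimed to match Poplar's userspace library.

```rust
/// Raw value reserved for "no such system call".
pub const UNKNOWN_SYSCALL: u64 = u64::MAX;

#[derive(Debug, PartialEq, Eq)]
pub enum SyscallError {
    /// The system call number is not known to this kernel.
    UnknownSyscall,
    /// A non-zero status code, whose meaning is system-call specific.
    Status(u32),
}

/// Split a raw return value into the first returned handle, or an error.
pub fn decode_handle_result(raw: u64) -> Result<u32, SyscallError> {
    if raw == UNKNOWN_SYSCALL {
        return Err(SyscallError::UnknownSyscall);
    }
    let status = raw as u32; // bits 0..32
    let handle = (raw >> 32) as u32; // bits 32..64
    match status {
        0 => Ok(handle),
        code => Err(SyscallError::Status(code)),
    }
}
```

Checking for the reserved value first lets a task fall back to another method on kernels that lack the system call, before interpreting the status bits.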

yield

Used by a task that can't do any work at the moment, allowing the kernel to schedule other tasks.

Parameters

None.

Returns

Always 0.

Capabilities needed

None.

early_log

Used by tasks that are started early in the boot process, before reliable userspace logging support is running. Output is logged to the same place as kernel logging.

Parameters

  • a - the length of the string to log in bytes. Maximum length is 4096 bytes.
  • b - a usermode pointer to the start of the UTF-8 encoded string.

Returns

  • 0 if the system call succeeded
  • 1 if the string was too long
  • 2 if the string was not valid UTF-8
  • 3 if the task making the syscall doesn't have the EarlyLogging capability

Capabilities needed

The EarlyLogging capability is needed to make this system call.

get_framebuffer

On many architectures, the bootloader or kernel can create a naive framebuffer using a platform-specific method. This framebuffer can be used to render from userspace, if a better hardware driver is not available on the platform.

Parameters

  • a should contain a mapped, writable, user-space address, to which information about the framebuffer will be written.

Returns

This system call returns three things:

  • A status code
  • A handle to a MemoryObject containing the framebuffer, if successful
  • Information about the framebuffer, if successful, written into the address in a

The status codes used are:

  • 0 means that the system call was successful
  • 1 means that the calling task does not have the correct capability
  • 2 means that a does not contain a valid address for the kernel to write to
  • 3 means that the kernel did not create the framebuffer

The information written back to the address in a has the following structure:

#[repr(C)]
struct FramebufferInfo {
    width: u16,
    height: u16,
    stride: u16,
    /// 0 = RGB32
    /// 1 = BGR32
    pixel_format: u8,
}
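Because the structure is `#[repr(C)]`, its size and alignment are fixed, so a caller can work out how much writable memory the address in a must cover. A small sketch (the constant names and the added derives are illustrative):

```rust
use core::mem::{align_of, size_of};

#[repr(C)]
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub struct FramebufferInfo {
    pub width: u16,
    pub height: u16,
    pub stride: u16,
    /// 0 = RGB32, 1 = BGR32
    pub pixel_format: u8,
}

/// Number of bytes the kernel writes back: three u16 fields, one u8,
/// and one byte of trailing padding to keep 2-byte alignment.
pub const FRAMEBUFFER_INFO_SIZE: usize = size_of::<FramebufferInfo>();

/// Required alignment of the address passed in `a`.
pub const FRAMEBUFFER_INFO_ALIGN: usize = align_of::<FramebufferInfo>();
```

Passing a pointer to a default-initialised `FramebufferInfo` on the stack satisfies both requirements.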

Capabilities needed

Tasks need the GetKernelFramebuffer capability to use this system call.

create_memory_object

Create a MemoryObject kernel object. Userspace can only create "blank" MemoryObjects (that are allocated to free, conventional physical memory). MemoryObjects that point to special objects (e.g. framebuffer data, PCI configuration spaces) must be created by the kernel.

Parameters

  • a - the size of the MemoryObject's memory area (in bytes)
  • b - flags:
    • Bit 0: set if the memory should be writable
    • Bit 1: set if the memory should be executable
  • c - a pointer to which the kernel will write the physical address to which the MemoryObject was allocated. Ignored if null.
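The flags parameter is a plain bitfield, so it can be built with two constants. The names below are hypothetical; only the bit positions come from the list above.

```rust
/// Bit 0 of `b`: the memory should be writable.
pub const MEMORY_OBJECT_WRITABLE: usize = 1 << 0;
/// Bit 1 of `b`: the memory should be executable.
pub const MEMORY_OBJECT_EXECUTABLE: usize = 1 << 1;

/// Build the value to pass in `b` to create_memory_object.
pub fn memory_object_flags(writable: bool, executable: bool) -> usize {
    let mut flags = 0;
    if writable {
        flags |= MEMORY_OBJECT_WRITABLE;
    }
    if executable {
        flags |= MEMORY_OBJECT_EXECUTABLE;
    }
    flags
}
```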

Returns

Uses the standard representation to return a Result<Handle, MemoryObjectError>. Error status codes are:

  • 1 if the given virtual address is invalid
  • 2 if the given set of flags is invalid
  • 3 if memory of the requested size could not be allocated
  • 4 if the pointer to write the allocated physical address to was not valid

Capabilities needed

None.

map_memory_object

Map a MemoryObject into an AddressSpace.

Parameters

  • a - a handle to the MemoryObject.
  • b - a handle to the AddressSpace. A zero handle maps the memory object into the calling task's own AddressSpace.
  • c - the virtual address to map the MemoryObject at, if it should be mapped at a specific address. If null, the kernel will attempt to find a suitable address to map it at, and write that address to the pointer supplied in d.
  • d - the pointer at which the virtual address the object is mapped at will be written to, if c is null. If an address is supplied in c, this pointer does not need to be valid, and will not be accessed. If this pointer is null, the address will not be written, even if the kernel allocated memory for the object.

Returns

  • 0 if the system call succeeded
  • 1 if either of the passed handles is invalid
  • 2 if the portion of the AddressSpace that would be mapped is already occupied by another MemoryObject
  • 3 if the supplied MemoryObject handle does not point to a MemoryObject
  • 4 if the supplied AddressSpace handle does not point to an AddressSpace
  • 5 if the supplied pointer in d is invalid, and c is null

Capabilities needed

None (this may change in the future).

create_channel

Create a Channel kernel object. Channels are slightly odd kernel objects in that they must be referred to in userspace by two handles, one for each "end" of the channel. This system call therefore returns two handles, one of which is usually transferred to another task.

Parameters

  • a - the virtual address to write the second handle into (only one can be returned in the status)

Returns

Uses the standard representation to return a Result<Handle, CreateChannelError>. Error status codes are:

  • 1 if the passed virtual address is not valid

TODO: if we ditch the ability to return an error (i.e. by making this infallible, or by saying that a null handle denotes an error but not which one), we could return both handles in the status.

Capabilities needed

None.

send_message

Send a message, consisting of a number of bytes and optionally a number of handles, down a Channel. All the handles are removed from the sending Task and added to the receiving Task.

A maximum of 4 handles can be transferred by each message. The maximum number of bytes is currently 4096.

Parameters

  • a - the handle to the Channel end that is sending the message. The handle must have the SEND right.
  • b - a pointer to the array of bytes to send
  • c - the number of bytes to send
  • d - a pointer to the array of handle entries to transfer. All handles must have the TRANSFER right. This may be 0x0 if the message does not transfer any handles.
  • e - the number of handles to send

Returns

A status code:

  • 0 if the system call succeeded and the message was sent
  • 1 if the Channel handle is invalid
  • 2 if the Channel handle does not point to a Channel
  • 3 if the Channel handle does not have the correct rights to send messages
  • 4 if one or more of the handles to transfer is invalid
  • 5 if any of the handles to transfer do not have the correct rights
  • 6 if the pointer to the message bytes was not valid
  • 7 if the message's byte array is too large
  • 8 if the pointer to the handles array was not valid
  • 9 if the handles array is too large
  • 10 if the other end of the Channel has been disconnected
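The status codes above map naturally onto an enum, and the documented size limits can be checked before the system call is made. This is a sketch only: the type, variant, and function names are made up for illustration, and handles are assumed to be `u32` values.

```rust
pub const MAX_MESSAGE_BYTES: usize = 4096;
pub const MAX_MESSAGE_HANDLES: usize = 4;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum SendMessageError {
    InvalidChannelHandle = 1,
    NotAChannel = 2,
    ChannelCannotSend = 3,
    InvalidTransferredHandle = 4,
    TransferredHandleMissingRights = 5,
    BytesAddressInvalid = 6,
    BytesTooLong = 7,
    HandlesAddressInvalid = 8,
    TooManyHandles = 9,
    OtherEndDisconnected = 10,
}

impl SendMessageError {
    /// Convert a non-zero status code; returns `None` for 0 or unknown codes.
    pub fn from_status(status: u64) -> Option<Self> {
        use SendMessageError::*;
        Some(match status {
            1 => InvalidChannelHandle,
            2 => NotAChannel,
            3 => ChannelCannotSend,
            4 => InvalidTransferredHandle,
            5 => TransferredHandleMissingRights,
            6 => BytesAddressInvalid,
            7 => BytesTooLong,
            8 => HandlesAddressInvalid,
            9 => TooManyHandles,
            10 => OtherEndDisconnected,
            _ => return None,
        })
    }
}

/// Check the documented limits before making the system call.
pub fn check_limits(bytes: &[u8], handles: &[u32]) -> Result<(), SendMessageError> {
    if bytes.len() > MAX_MESSAGE_BYTES {
        return Err(SendMessageError::BytesTooLong);
    }
    if handles.len() > MAX_MESSAGE_HANDLES {
        return Err(SendMessageError::TooManyHandles);
    }
    Ok(())
}
```

Checking locally avoids a round trip into the kernel for messages that would be rejected with status 7 or 9 anyway.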

Capabilities needed

None.

get_message

Receive a message from a Channel, if one is waiting to be received.

A maximum of 4 handles can be transferred by each message. The maximum number of bytes is currently 4096.

Parameters

  • a - the handle to the Channel end that is receiving the message. The handle must have the RECEIVE right.
  • b - a pointer to the array of bytes to put the message into
  • c - the size of the bytes buffer
  • d - a pointer to the array of handle entries to transfer. This may be 0x0 if the receiver does not expect to receive any handles.
  • e - the size of the handles buffer (in handles)

Returns

Bits 0..16 are a status code:

  • 0 if the message was received successfully. The rest of the return value is valid.
  • 1 if the Channel handle is invalid.
  • 2 if the Channel handle does not point to a Channel.
  • 3 if there was no message to receive.
  • 4 if the address of the bytes buffer is invalid.
  • 5 if the bytes buffer is too small to contain the message.
  • 6 if the address of the handles buffer is invalid, or if 0x0 was passed and the message does contain handles.
  • 7 if the handles buffer is too small to contain the handles transferred with the message.

If the status code is 0 (i.e. a valid message was written into the bytes and handles buffers), the return value also contains the number of valid entries in both the byte and handle buffers:

  • Bits 16..32 contain the length of the valid byte buffer (in bytes). If the passed buffer was larger than this, the remaining bytes have not been written by the kernel.
  • Bits 32..48 contain the length of the valid handles buffer (in handles). If the passed buffer was larger than this, the remaining entries have not been written by the kernel.
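The three fields of the return value can be separated with shifts and masks. A minimal sketch, with illustrative names:

```rust
#[derive(Debug, PartialEq, Eq)]
pub struct ReceivedLengths {
    /// Number of valid bytes written into the bytes buffer.
    pub bytes: usize,
    /// Number of valid handles written into the handles buffer.
    pub handles: usize,
}

/// Decode the return value of get_message. On failure, the non-zero
/// status code from bits 0..16 is returned as the error.
pub fn decode_get_message(raw: u64) -> Result<ReceivedLengths, u16> {
    let status = (raw & 0xffff) as u16; // bits 0..16
    if status != 0 {
        return Err(status);
    }
    Ok(ReceivedLengths {
        bytes: ((raw >> 16) & 0xffff) as usize,   // bits 16..32
        handles: ((raw >> 32) & 0xffff) as usize, // bits 32..48
    })
}
```

A receiver would then only read `buffer[..lengths.bytes]`, since the kernel does not write the rest of the buffer.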

Capabilities needed

None.

register_service

Register yourself as the provider of a service. The name of the service will be {task_name}.{service_name}. This returns a channel that is used to alert the provider when another task subscribes to your service with the subscribe_to_service system call.

See the section on Services for more information about services, how to register a service, and how to subscribe to a service.

Parameters

  • a - the length of the name string in bytes. Maximum length is 256. Must be greater than 0.
  • b - a usermode pointer to the start of the UTF-8 encoded name string.

Returns

Returns the standard representation of a Result<Handle, ServiceError>. Error status codes are:

  • 1 if the task does not have the correct capability
  • 2 if the usermode pointer to the name is not valid
  • 3 if the name is too long, or has a length of 0

The returned handle is to a Channel that is used to serve channel subscriptions.

Capabilities needed

The ServiceProvider capability is needed to make this system call.

subscribe_to_service

Subscribe to a registered service by name. This will deliver a notification to the task that registered the service with one end of a newly created channel. The other end of the channel will be returned by this system call, if successful.

See the section on Services for more information about services, how to register a service, and how to subscribe to a service.

Parameters

  • a - the length of the name string in bytes. Maximum length is 256. Must be greater than 0.
  • b - a usermode pointer to the start of the UTF-8 encoded name string.

Returns

Returns the standard representation of a Result<Handle, ServiceError>. Error status codes are:

  • 1 if the task does not have the correct capability
  • 2 if the usermode pointer to the name is not valid
  • 3 if the name is too long, or has a length of 0
  • 4 if the supplied name does not correspond to a registered service.

The returned handle is to one end of a Channel, the other end of which has been given to the task that supplies the service.

Capabilities needed

The ServiceUser capability is needed to make this system call.

pci_get_info

Get information about the PCI devices on the platform. This is only meant to be used from the userspace PCI bus driver.

TODO: detail structure of PCI descriptor

Parameters

  • a - a pointer to the buffer to put the PCI descriptors in
  • b - the size of the buffer (in descriptors)

Returns

Bits 0..16 contain a status code:

  • 0 if the system call succeeded
  • 1 if the task does not have the correct capabilities
  • 2 if the given buffer can't hold all the descriptors
  • 3 if the address to the descriptor buffer is invalid
  • 4 if the platform doesn't support PCI

If the status code is 0 (i.e. the system call succeeded), bits 16..48 contain the number of descriptors written back. If the status code is 2 (i.e. the buffer was not large enough), bits 16..48 contain the number of entries that need to be written.

If a is 0x0, this system call will always fail with status code 2 and the number of descriptors in bits 16..48. This is to allow userspace to dynamically allocate a buffer of the correct size, if it desires.
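The null-buffer behaviour suggests a two-call pattern: ask for the count, allocate, then call again. The sketch below is generic over the descriptor type because its layout is not documented yet, and it takes the system call as a closure (`syscall`, a stand-in for however the call is actually made) so that the logic can be exercised without a kernel. All names are hypothetical.

```rust
/// Split the raw return value into (status, descriptor count).
pub fn decode_pci_get_info(raw: u64) -> (u16, u32) {
    let status = (raw & 0xffff) as u16; // bits 0..16
    let count = ((raw >> 16) & 0xffff_ffff) as u32; // bits 16..48
    (status, count)
}

/// Fetch all descriptors, sizing the buffer from a first call with a
/// null pointer. Returns the status code on failure.
pub fn get_all_descriptors<T, F>(mut syscall: F) -> Result<Vec<T>, u16>
where
    T: Clone + Default,
    F: FnMut(*mut T, usize) -> u64,
{
    // A null buffer is documented to fail with status 2 and the count.
    let (status, needed) = decode_pci_get_info(syscall(core::ptr::null_mut(), 0));
    if status != 2 {
        return Err(status);
    }

    let mut buffer = vec![T::default(); needed as usize];
    let (status, written) = decode_pci_get_info(syscall(buffer.as_mut_ptr(), buffer.len()));
    if status != 0 {
        return Err(status);
    }
    buffer.truncate(written as usize);
    Ok(buffer)
}
```

If the set of devices could change between the two calls, a real implementation would need to retry when the second call also reports status 2.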

Capabilities needed

Tasks need the PciBusDriver capability to use this system call.

Poplar's userspace

Poplar supports running programs in userspace on supporting architectures. This offers increased protection and separation compared to running code in kernelspace - as a microkernel, Poplar tries to run as much code in userspace as possible.

Building a program for Poplar's userspace

Currently, the only officially supported language for writing userspace programs is Rust.

Target

Poplar provides custom target files for userspace programs. These are found in the user/{arch}_poplar.toml files.

Standard library

Poplar provides a Rust crate, called std, which replaces Rust's standard library. We've done this for a few reasons:

  • We originally had targets and a std port in a fork of rustc. This proved difficult to maintain and required users to build a custom Rust fork and add it as a rustup toolchain. This is a high barrier of entry for anyone wanting to try Poplar out.
  • Poplar's ideal standard library probably won't end up looking very similar to other platforms', as there are significant ideological differences in how programs should interact with the OS. This is unfortunate from a porting point of view, but does allow us to design the platform interface from the ground up.

The name of the crate is slightly unfortunate, but is required, as rustc uses the name of the crate to decide where to import the prelude from. This significantly increases the ergonomics we can provide, so is worth the tradeoff.

The std crate does a few important things that are worth understanding to reduce the 'magic' of Poplar's userspace:

  • It provides a linker script - the linker script for the correct target is shipped as part of the crate, and then the build script copies it into the Cargo OUT_DIR. It also passes a directive to rustc such that you can simply pass -Tlink.ld to link with the correct script. This is, for example, done using RUSTFLAGS by Poplar's xtask, but you can also pass it manually or with another method, depending on your build system.

Capabilities

Capabilities describe what a task is allowed to do, and are encoded in its image. This allows users to audit the permissions of the tasks they run at a much finer granularity than user-based permissions. It also allows us to move parts of the kernel into discrete userspace tasks, by creating specialised capabilities that restrict access to sensitive resources (such as the raw framebuffer) to select tasks.

Encoding capabilities in the ELF image

Capabilities are encoded in an entry of a PT_NOTE segment of the ELF image of a task. This entry will have an owner (sometimes referred to in documentation as the 'name') of POPLAR and a type of 0. The descriptor will be an encoding of the capabilities as described by the 'Format' section. The descriptor must be padded such that the next descriptor is 4-byte aligned, and so a value of 0x00 is reserved to be used as padding.

Initial images (tasks loaded by the bootloader before filesystem drivers are working) are limited to a capabilities encoding of 32 bytes (given the variable-length encoding, this does not equate to a fixed maximum number of capabilities).

Format

The capabilities format is variable-length - simple capabilities can be encoded as a single byte, while more complex / specific ones may need multiple bytes of prefix, and can also encode fixed-length data.

Overview of capabilities

This is an overview of all the capabilities the kernel supports:

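| First byte | Next byte(s) | Data | Arch specific? | Description                                                         |
|------------|--------------|------|----------------|---------------------------------------------------------------------|
| 0x00       | -            | -    | -              | No meaning - used to pad descriptor to required length (see above) |
| 0x01       |              |      | No             | GetFramebuffer                                                      |
| 0x02       |              |      | No             | EarlyLogging                                                        |
| 0x03       |              |      | No             | ServiceProvider                                                     |
| 0x04       |              |      | No             | ServiceUser                                                         |
| 0x05       | -            | -    | No             | PciBusDriver                                                        |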
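Since every capability in the table is currently a single byte, building a descriptor amounts to listing the bytes and padding with 0x00. The sketch below assumes that padding the descriptor to a multiple of four bytes is enough to keep the following note entry aligned. The function names are illustrative, and only the descriptor is produced, not the surrounding PT_NOTE entry header.

```rust
/// Size limit on the capabilities encoding of initial images.
pub const MAX_INITIAL_IMAGE_ENCODING: usize = 32;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(u8)]
pub enum Capability {
    GetFramebuffer = 0x01,
    EarlyLogging = 0x02,
    ServiceProvider = 0x03,
    ServiceUser = 0x04,
    PciBusDriver = 0x05,
}

/// Encode capabilities as a descriptor, padded with 0x00 bytes to a
/// multiple of four bytes.
pub fn encode_capabilities(caps: &[Capability]) -> Vec<u8> {
    let mut descriptor: Vec<u8> = caps.iter().map(|&cap| cap as u8).collect();
    while descriptor.len() % 4 != 0 {
        descriptor.push(0x00);
    }
    descriptor
}

/// Whether an encoded descriptor is small enough for an initial image.
pub fn fits_initial_image(descriptor: &[u8]) -> bool {
    descriptor.len() <= MAX_INITIAL_IMAGE_ENCODING
}
```

For example, a task needing GetFramebuffer and EarlyLogging would get the four-byte descriptor 01 02 00 00.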

Userspace memory map (x86_64)

x86_64 features an enormous 256TB virtual address space, most of which is available to userspace processes under Poplar. For this reason, things are spread throughout the virtual address space to make it easy to identify what a virtual address points to.

Userspace stacks

Within the virtual address space, the userspace stacks are allocated a 4GB range. Each task has a maximum stack size of 2MB, which puts a limit of 2048 tasks per address space.
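The task limit follows directly from dividing the range by the stack size. In the sketch below the constant names are illustrative, sizes are treated as binary units, and `stack_slot_offset` assumes stack slots are laid out contiguously in index order, which the text does not state.

```rust
/// Size of the range reserved for userspace stacks (4GB).
pub const STACK_REGION_SIZE: u64 = 4 * 1024 * 1024 * 1024;
/// Maximum size of a single task's stack (2MB).
pub const MAX_STACK_SIZE: u64 = 2 * 1024 * 1024;
/// Number of stack slots that fit in the range.
pub const MAX_TASKS_PER_ADDRESS_SPACE: u64 = STACK_REGION_SIZE / MAX_STACK_SIZE;

/// Offset of a stack slot from the start of the stack range, assuming
/// contiguous slots. Returns `None` if the index is out of range.
pub fn stack_slot_offset(index: u64) -> Option<u64> {
    if index < MAX_TASKS_PER_ADDRESS_SPACE {
        Some(index * MAX_STACK_SIZE)
    } else {
        None
    }
}
```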

Message Passing

Poplar has a kernel object called a Channel for providing first-class message passing support to userspace. Channels move packets, called "messages", which contain a stream of bytes, and optionally one or more handles that are transferred from the sending task to the receiving task.

Ptah

Channels can move arbitrary bytes, but Poplar also includes a layer on top of Channels called Ptah, which consists of a data model and wire format suitable for encoding data which can be serialized and deserialized from any sensible language without too much difficulty.

Ptah is heavily inspired by Serde, and the first implementation of Ptah was actually a Serde data format. Unfortunately, it made properly handling Poplar handles very difficult - when a handle is sent over a channel, it needs to be put into a separate array, and the in-line data replaced by an index into that array. When the message travels over a task boundary, the kernel examines and replaces each handle in this array with a handle to the same kernel object in the new task. This effectively means we need to add a new Handle type to our data model, which is not easily possible with Serde (and would make it incompatible with standard Serde anyway).
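The out-of-line handling of handles can be pictured with a small writer that keeps two buffers. This is an illustration of the idea rather than Ptah's real encoder: the `MessageWriter` type is hypothetical, handles are assumed to be `u32` values, and the in-line index is assumed to be a single byte.

```rust
pub type Handle = u32;

/// A message can transfer at most four handles.
pub const MAX_HANDLES_PER_MESSAGE: usize = 4;

#[derive(Debug, Default, PartialEq, Eq)]
pub struct MessageWriter {
    /// In-line data, as sent in the message's byte array.
    pub bytes: Vec<u8>,
    /// Out-of-line handles, as sent in the message's handle array.
    pub handles: Vec<Handle>,
}

impl MessageWriter {
    /// Ordinary data is written in-line.
    pub fn write_u32(&mut self, value: u32) {
        self.bytes.extend_from_slice(&value.to_le_bytes());
    }

    /// A handle is moved into the separate array, and only its index
    /// into that array is written in-line. Gives the handle back if the
    /// message is already carrying the maximum number of handles.
    pub fn write_handle(&mut self, handle: Handle) -> Result<(), Handle> {
        if self.handles.len() >= MAX_HANDLES_PER_MESSAGE {
            return Err(handle);
        }
        let index = self.handles.len() as u8;
        self.handles.push(handle);
        self.bytes.push(index);
        Ok(())
    }
}
```

Because the byte stream only ever contains indices, the kernel can rewrite the entries of the handle array for the receiving task without needing to understand the in-line data.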

The Ptah Data Model

The Ptah data model maps pretty well to the Rust type system, and relatively closely to the Serde data model. Key differences are some stronger guarantees about the encoding of types such as enums (the data model only needs to fit a single wire format, and so can afford to be less flexible than Serde's), and the lack of a few types - unit-based types, and the statically-sized versions of seq and map (tuple and struct). Ptah is not a self-describing format (i.e. the type you're trying to deserialize must be fully known in advance), so the elements of structs and tuples can simply be serialized in the order they appear, and then deserialized in order at the other end.

  • Primitive types
    • bool
    • u8, u16, u32, u64, u128
    • i8, i16, i32, i64, i128
    • f32, f64
    • char
  • string
    • Encoded as a seq of u8, but with the additional requirement that it is valid UTF-8
    • Not null terminated, as seq includes explicit length
  • option
    • Encoded in the same way as an enum, but separately for the benefit of languages without proper enums
    • Either None or Some({value})
  • enum
    • Include a tag, and optionally some data
    • Represent a Rust enum, or a tagged union in languages without proper enums
    • The data is encoded separately to the tag, and can be of any other Ptah type:
      • Rust tuple variants (e.g. E::A(u8, u32)) are represented by tuple
      • Rust struct variants (e.g. E::B { foo: u8, bar: u32 }) are represented by struct
  • seq
    • A variable-length sequence of values, mapping to many types such as Vec<T>.
  • map
    • A variable-length series of key-value pairings, mapping to collections like BTreeMap<K, V>.
  • handle
    • This is the type that means we need our own data model in the first place
    • These are encoded out-of-line of the rest of the data, so that the Poplar kernel can introspect into them, if it needs to

The Ptah Wire Format

The wire format describes how messages can be encoded into a stream of bytes suitable for transmission over a channel, or over another transport layer such as the network or a serial port.

Primitives

Primitives are transmitted as little-endian and packed to their natural alignment. The following primitive types are recognised:

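| Name                    | Size (bytes)   | Description                                  |
|-------------------------|----------------|----------------------------------------------|
| bool                    | 1              | A boolean value                              |
| u8, u16, u32, u64, u128 | 1, 2, 4, 8, 16 | An unsigned integer                          |
| i8, i16, i32, i64, i128 | 1, 2, 4, 8, 16 | A signed integer                             |
| f32, f64                | 4, 8           | Single / double-precision IEEE-754 FP values |
| char                    | 4              | A single UTF-8 Unicode scalar value          |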
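Rust's `to_le_bytes` methods produce exactly these sizes, so individual primitives can be encoded as sketched below. Two details are assumptions rather than statements from the table: a bool is written as 0 or 1, and a char is written as its 32-bit scalar value. Alignment padding between values is not shown. The function names are illustrative.

```rust
/// Encode a bool as one byte (assumed: 0 for false, 1 for true).
pub fn encode_bool(value: bool) -> [u8; 1] {
    [value as u8]
}

pub fn encode_i16(value: i16) -> [u8; 2] {
    value.to_le_bytes()
}

pub fn encode_u32(value: u32) -> [u8; 4] {
    value.to_le_bytes()
}

pub fn encode_u128(value: u128) -> [u8; 16] {
    value.to_le_bytes()
}

pub fn encode_f64(value: f64) -> [u8; 8] {
    value.to_le_bytes()
}

/// Encode a char in four bytes (assumed: its scalar value as a u32).
pub fn encode_char(value: char) -> [u8; 4] {
    (value as u32).to_le_bytes()
}
```

For instance, `encode_u32(0x1234_5678)` yields the bytes 78 56 34 12, least-significant byte first.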

Journal

This is just a place I put notes that I make during development.

Building a rustc target for Poplar

We want a target in rustc for building userspace programs for Poplar. It would be especially cool to get it merged as an upstream Tier-3 target. This documents my progress, mainly as a reference for me to remember how it all works.

How do I actually build and use rustc?

A useful baseline invocation for normal use is:

./x.py build -i library/std

The easiest way to test the built rustc is to create a rustup toolchain (from the root of the Rust repo):

rustup toolchain link poplar build/{host triple}/stage1     # If you built a stage-1 compiler (default with invocation above)
rustup toolchain link poplar build/{host triple}/stage2     # If you built a stage-2 compiler

It's easiest to call your toolchain poplar, as this is the name we use in the Makefiles for now.

You can then use this toolchain from Cargo anywhere on the system with:

cargo +poplar build     # Or whatever command you need

Using a custom LLVM

  • Fork Rust's llvm-project
  • cd src/llvm-project
  • git remote add my_fork {url to your custom LLVM's repo}
  • git fetch my_fork
  • git checkout my_fork/{the correct branch}
  • cd ..
  • git add llvm-project
  • git commit -m "Move to custom LLVM"

Things to change in config.toml

This is as of 2020-09-29 - you need to remember to keep the config.toml up-to-date (as it's not checked in upstream), because it can cause confusing errors when it's out-of-date.

  • download-ci-llvm = true under [llvm]. This makes the build much faster, since we don't need a custom LLVM.
  • assertions = true under [llvm]
  • incremental = true under [rust]
  • lld = true under [rust]. Without this, the toolchain can't find rust-lld when linking.
  • llvm-tools = true under [rust]. This probably isn't needed, I just chucked it in in case rust-lld needs it.
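
Put together, the settings above would look something like the fragment below. The key names and sections are taken straight from the list; the set of options bootstrap accepts changes between rustc versions, so treat this as a snapshot of the same date.

```toml
# config.toml (not checked in upstream, so keep it up-to-date by hand)

[llvm]
download-ci-llvm = true   # Much faster build, usable while we don't need a custom LLVM
assertions = true

[rust]
incremental = true
lld = true                # Without this, the toolchain can't find rust-lld when linking
llvm-tools = true         # Probably not needed
```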

Adding the target

I used a slightly different layout to most targets (which have a base, which creates a TargetOptions, and then a target that modifies and uses those options).

  • Poplar targets generally need a custom linker script. I added one at compiler/rustc_target/src/spec/x86_64_poplar.ld.
  • Make a module for the target (I called mine compiler/rustc_target/src/spec/x86_64_poplar.rs). Copy from an existing one. Instead of a separate poplar_base.rs to create the TargetOptions, we do it in the target itself. We include_str! the linker script in here, so it's distributed as part of the rustc binary.
  • Add the target in the supported_targets! macro in compiler/rustc_target/src/spec/mod.rs.

Adding the target to LLVM

I don't really know my way around the LLVM code base, so this was fairly cobbled together:

  • In llvm/include/llvm/ADT/Triple.h, add a variant for the OS in the OSType enum. I called it Poplar. Don't make it the last entry, to avoid having to change the LastOSType variant.
  • In llvm/lib/Support/Triple.cpp, in the function Triple::getOSTypeName, add the OS. I added case Poplar: return "poplar";.
  • In the same file, in the parseOS function, add the OS. I added .StartsWith("poplar", Triple::Poplar).
  • This file also contains a function, getDefaultFormat, that gives the default format for a platform. The default is ELF, so no changes were needed for Poplar, but they might be for another OS.

TIP: When you make a change in the llvm-project submodule, you will need to commit these changes, and then update the submodule in the parent repo. Otherwise, the bootstrap script will check out the old version (without your changes) and build an entire compiler without the changes you are trying to test.

NOTE: to avoid people having to switch to our llvm-project fork, we don't actually use our LLVM target from rustc (yet). I'm not sure why you need per-OS targets in LLVM, as it doesn't even seem to let us do any of the things we wanted to (this totally might just be me not knowing how LLVM targets work).

Notes

  • We needed to change the entry point to _start, or it silently just doesn't emit any sections in the final image.
  • By default, the linker throws away our .caps sections. We need a way to emit them regardless - this is done by manually creating the program header and specifying that the sections should be kept with KEEP. There are two possible solutions that I can see: make rustc emit a linker script, or try and introduce these ideas into llvm/lld with our target (I'm not even sure this is possible).
  • It looks like lld has no OS-specific code at all, and the only place that specifically-kept sections are added is in the linker script parser. Looks like we might have to actually create a file-based linker script (does literally no one else need to pass a linker script by command line??).

USB

USB has a host that sends requests to devices (devices only respond when asked something). Some devices are dual role devices (DRD) (previously called On-The-Go (OTG) devices), and can dynamically negotiate whether they're the host or the device.

Each device can have one or more interfaces, which each have one or more endpoints. Each endpoint has a hardcoded direction (host-to-device or device-to-host). There are a few types of endpoint (the type is decided during interface configuration):

  • Control endpoints are for configuration and control requests
  • Bulk endpoints are for bulk transfers
  • Isochronous endpoints are for periodic transfers with a reserved bandwidth
  • Interrupt endpoints are for small, latency-sensitive transfers (despite the name, the host polls the device for them at a fixed interval)

The interfaces and endpoints a device has are described by descriptors reported by the device during configuration.

Every device has a special endpoint called ep0. It's an in+out control endpoint, and is used to configure the other endpoints.
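
To make the endpoint model a bit more concrete, here's a minimal sketch of decoding a standard endpoint descriptor into a direction and type. The field layout (direction in bit 7 of bEndpointAddress, transfer type in the low two bits of bmAttributes) is from the USB 2.0 spec; the Rust type and function names are made up for this example and aren't from any Poplar driver.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Direction {
    HostToDevice, // "OUT" in USB terms
    DeviceToHost, // "IN" in USB terms
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EndpointType {
    Control,
    Isochronous,
    Bulk,
    Interrupt,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Endpoint {
    pub number: u8,
    pub direction: Direction,
    pub typ: EndpointType,
    pub max_packet_size: u16,
}

/// Decode a standard 7-byte endpoint descriptor, or `None` if it doesn't look like one.
pub fn parse_endpoint_descriptor(bytes: &[u8]) -> Option<Endpoint> {
    const ENDPOINT_DESCRIPTOR_TYPE: u8 = 5;
    if bytes.len() < 7 || bytes[0] < 7 || bytes[1] != ENDPOINT_DESCRIPTOR_TYPE {
        return None;
    }

    let address = bytes[2]; // bEndpointAddress
    let attributes = bytes[3]; // bmAttributes
    Some(Endpoint {
        number: address & 0x0f,
        direction: if address & 0x80 != 0 {
            Direction::DeviceToHost
        } else {
            Direction::HostToDevice
        },
        typ: match attributes & 0b11 {
            0 => EndpointType::Control,
            1 => EndpointType::Isochronous,
            2 => EndpointType::Bulk,
            _ => EndpointType::Interrupt,
        },
        // wMaxPacketSize is little-endian; bits 10..0 hold the packet size.
        max_packet_size: u16::from_le_bytes([bytes[4], bytes[5]]) & 0x07ff,
    })
}
```

Note that ep0 never shows up as one of these descriptors, since every device has it implicitly.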

RISC-V

Building OpenSBI

OpenSBI is the reference implementation for the Supervisor Binary Interface (SBI). It's basically how you access M-mode functionality from your S-mode bootloader or kernel.

Firstly, we need a RISC-V C toolchain. On Arch, I installed the riscv64-unknown-elf-binutils AUR package. I also tried to install the riscv64-unknown-elf-gcc package, but this wouldn't work, so I built OpenSBI with Clang+LLVM instead (from inside lib/opensbi):

make PLATFORM=generic LLVM=1

This can be tested on QEMU with:

qemu-system-riscv64 -M virt -bios build/platform/generic/firmware/fw_jump.elf

It also seems like you can build with a platform of qemu/virt - I'm not sure what difference this makes yet, but I'm guessing it's the hardware it assumes it needs to drive? Worth exploring. Apparently the generic image does dynamic discovery (I'm assuming from the device tree), so that sounds good for now.

So the jump firmware (fw_jump.elf) jumps to a specified address in memory (apparently QEMU can load an ELF which would be fine initially). Other option would be a payload firmware, which bundles your code into the SBI image (assuming as a flat binary) and executes it like that.

We should probably make an xtask step to build OpenSBI and move it to the bundled directory, plus decide what sort of firmware / booting strategy we're going to use. Then the next step would be some Rust code that can print to the serial port, to prove it's all working.
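
A first stab at that xtask step might look something like this. It only constructs the same make invocation as above; the function name is made up, and copying the result into the bundled directory is left out because I haven't decided on the paths yet.

```rust
use std::path::Path;
use std::process::Command;

/// Where the jump firmware ends up, relative to the OpenSBI checkout.
pub const FW_JUMP_PATH: &str = "build/platform/generic/firmware/fw_jump.elf";

/// Build the `make PLATFORM=generic LLVM=1` invocation, to be run from
/// inside the OpenSBI checkout (e.g. `lib/opensbi`).
pub fn opensbi_build_command(opensbi_dir: &Path) -> Command {
    let mut command = Command::new("make");
    command
        .current_dir(opensbi_dir)
        .arg("PLATFORM=generic")
        .arg("LLVM=1");
    command
}
```

The step would then call `.status()` on the returned command and check for success before looking for the firmware at `FW_JUMP_PATH`.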

QEMU virt memory map

Seems everything is memory-mapped, which makes for a nice change coming from x86's nasty port thingy. This is the virt machine's one (from the QEMU source...):

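| Region | Address | Size |
|--------|---------|------|
| Debug | 0x0 | 0x100 |
| MROM | 0x1000 | 0x11000 |
| Test | 0x100000 | 0x1000 |
| CLINT | 0x0200_0000 | 0x10000 |
| PCIe PIO | 0x0300_0000 | 0x10000 |
| PLIC | 0x0c00_0000 | 0x4000000 |
| UART0 | 0x1000_0000 | 0x100 |
| Virtio | 0x1000_1000 | 0x1000 |
| Flash | 0x2000_0000 | 0x4000000 |
| PCIe ECAM | 0x3000_0000 | 0x10000000 |
| PCIe MMIO | 0x4000_0000 | 0x40000000 |
| DRAM | 0x8000_0000 | {mem size} |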
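
For poking around, the map above can be written down as a table of constants with a lookup helper. This is just a sketch: the `Region` type and `find_region` are made-up names, and the DRAM size is passed in because it depends on how much memory QEMU was given.

```rust
pub struct Region {
    pub name: &'static str,
    pub base: u64,
    pub size: u64,
}

/// The fixed regions of QEMU's `virt` machine, copied from the table above.
pub const VIRT_MEMORY_MAP: &[Region] = &[
    Region { name: "Debug", base: 0x0, size: 0x100 },
    Region { name: "MROM", base: 0x1000, size: 0x11000 },
    Region { name: "Test", base: 0x10_0000, size: 0x1000 },
    Region { name: "CLINT", base: 0x0200_0000, size: 0x1_0000 },
    Region { name: "PCIe PIO", base: 0x0300_0000, size: 0x1_0000 },
    Region { name: "PLIC", base: 0x0c00_0000, size: 0x400_0000 },
    Region { name: "UART0", base: 0x1000_0000, size: 0x100 },
    Region { name: "Virtio", base: 0x1000_1000, size: 0x1000 },
    Region { name: "Flash", base: 0x2000_0000, size: 0x400_0000 },
    Region { name: "PCIe ECAM", base: 0x3000_0000, size: 0x1000_0000 },
    Region { name: "PCIe MMIO", base: 0x4000_0000, size: 0x4000_0000 },
];

pub const DRAM_BASE: u64 = 0x8000_0000;

/// Name the region containing `address`, given how much DRAM the machine has.
pub fn find_region(address: u64, dram_size: u64) -> Option<&'static str> {
    if address >= DRAM_BASE && address - DRAM_BASE < dram_size {
        return Some("DRAM");
    }
    VIRT_MEMORY_MAP
        .iter()
        .find(|region| address >= region.base && address - region.base < region.size)
        .map(|region| region.name)
}
```

Something like this is handy for turning a faulting address into a name when debugging early boot code.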

Getting control from OpenSBI

On QEMU, we can get control from OpenSBI by linking a binary at 0x80200000, and then using -kernel to automatically load it at the right location. OpenSBI will then jump to this location with the HART ID in a0 and a pointer to the device tree in a1.

However, this does make setting paging up slightly icky, as has been a problem on other architectures. Basically, the first binary needs to be linked at a low address with bare translation, and then we need to construct page tables and enable translation, then jump to a higher address. I'm thinking we might as well do it in two stages: a Seed stage that loads the kernel and early tasks from the filesystem/network/whatever, builds the kernel page tables, and then enters the kernel and can be unloaded at a later date. The kernel can then be linked normally at its high address without faffing around with a bootstrap or anything.

The device tree

So the device tree seems to be a data structure passed to you that tells you about the hardware present / memory etc. Hopefully it's less gross than ACPI eh. Repnop has written a crate, fdt, so I think we're just going to use that.

So fdt seems to work fine, we can list memory regions etc. The only issue seems to be that memory_reservations doesn't return anything, which is kind of weird. There also seems to be a /reserved-memory node, but this suggests that this doesn't include stuff we want like which part of memory OpenSBI resides in.

This issue says Linux just assumes it shouldn't touch anything before it was loaded. I guess we could use the same approach, reserving the memory used by Seed via linker symbols, and then seeing where the loader device gets put to reserve the ramdisk. However, the issue was closed saying OpenSBI now does the reservations correctly, which would be cleaner but doesn't stack up with what we're seeing.

Ah so actually, /reserved-memory does seem to have some of what we need. On QEMU there is one child node, called mmode_resv@80000000, which would fit with being the memory OpenSBI is in. We would still need to handle the memory we're in, and idk what happens with the loader device yet, but it's a start. Might be worth talking to repnop about whether the crate should use this node.
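
For reference, the device tree blob also has a separate memory reservation block in its header area, which is a different thing from the /reserved-memory node. I'm assuming that block is what memory_reservations reads, which would explain it coming back empty while the node has a child. Walking it by hand is simple enough; this sketch follows the layout in the Devicetree spec (big-endian fields, off_mem_rsvmap at byte 16 of the header, entries terminated by an all-zero pair), and the function names are my own rather than the fdt crate's internals.

```rust
/// Read a big-endian unsigned integer of `len` bytes at `offset`, if in bounds.
fn be_uint(blob: &[u8], offset: usize, len: usize) -> Option<u64> {
    let bytes = blob.get(offset..offset.checked_add(len)?)?;
    Some(bytes.iter().fold(0u64, |acc, &byte| (acc << 8) | u64::from(byte)))
}

/// Return the `(address, size)` pairs from the blob's memory reservation block.
/// Returns `None` if the magic is wrong or the block runs off the end of the blob.
pub fn memory_reservations(blob: &[u8]) -> Option<Vec<(u64, u64)>> {
    const FDT_MAGIC: u64 = 0xd00d_feed;
    if be_uint(blob, 0, 4)? != FDT_MAGIC {
        return None;
    }

    // `off_mem_rsvmap` is the fifth 32-bit field of the header.
    let mut offset = be_uint(blob, 16, 4)? as usize;
    let mut entries = Vec::new();
    loop {
        let address = be_uint(blob, offset, 8)?;
        let size = be_uint(blob, offset.checked_add(8)?, 8)?;
        if address == 0 && size == 0 {
            return Some(entries);
        }
        entries.push((address, size));
        offset = offset.checked_add(16)?;
    }
}
```

If this returns an empty list on QEMU too, then the reservations really are only being described by the /reserved-memory node.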

Dumb way to load the kernel for now

So for some reason, fw_cfg doesn't seem to be working on QEMU 7.1. This is what we were gonna use for loading the kernel, command line, and user programs, etc. but obvs this is not possible atm. For now, as a workaround, we can use the loader device to load arbitrary data and files into memory.

I'm thinking we could use 0x1_0000_0000 as the base physical address for this - this gives us a maximum of 2GiB of DRAM, which seems plenty for now (famous last words). We'll need to know the size of the object we're loading on the guest-side, so we'll load that separately for now (in the future, this whole scheme could be extended to some sort of mini-filesystem).

Okay so the loader device is pretty finicky, and has no error handling. Turns out you can't define new memory with it, just load values into RAM, but it doesn't actually tell you this has failed. You then try and read this on the guest, and get super weird UB from doing so - it doesn't just fault or whatever, it seems to break code before you ever read the memory (super weird ngl, didn't stick around to work out what was going on).

Right, seems to be working much better by actually putting the values in RAM. We've extended RAM to 1GiB (0x8000_0000..0xc000_0000) and we'll use this as the new layout:

| Address | Description | Size (bytes) |
|---------|-------------|--------------|
| 0xb000_0000 | Size of Data | 4 |
| 0xb000_0004 | Data | N |
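
On the guest side, reading this layout back could look roughly like the sketch below. It works on a byte slice that starts at 0xb000_0000 so that it can be tried on the host; on the real thing the slice would come from the physical mapping. I'm assuming the size is stored as a little-endian u32 (RISC-V is little-endian), and the names are made up.

```rust
/// Physical address of the 4-byte size field.
pub const SIZE_ADDRESS: u64 = 0xb000_0000;
/// Physical address of the first byte of data.
pub const DATA_ADDRESS: u64 = SIZE_ADDRESS + 4;

/// Given the memory starting at `SIZE_ADDRESS`, return the loaded data.
/// Returns `None` if the recorded size runs past the end of `region`.
/// ASSUMPTION: the size is a little-endian `u32`.
pub fn read_loaded_object(region: &[u8]) -> Option<&[u8]> {
    let size_bytes = region.get(..4)?;
    let size = u32::from_le_bytes([
        size_bytes[0],
        size_bytes[1],
        size_bytes[2],
        size_bytes[3],
    ]) as usize;
    region.get(4..4usize.checked_add(size)?)
}
```

Checking the size against the end of the region matters here, because the loader device gives us no error if the host side got the size wrong.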