Introduction

Welcome to the Poplar Book, which serves as the main source of documentation for Poplar. The Book aims to be both a 10,000-meter overview of Poplar for the interested observer, and a definitive reference for the inner workings of the kernel and userspace.

Please note that this book (like the rest of the OS!) is still very early in development and not at all complete. If anything is unclear, please file an issue!

This book is not always up to date, and needs to document a lot more to be very useful. If you have questions not covered by the book, please don't hesitate to contact me through other channels and I'll do my best to answer them. I aim to keep this book up to date, but I do not have enough time to make that a reality at the moment - sorry.

What is Poplar?

At heart, Poplar is a microkernel written in the Rust programming language. Poplar becomes an "OS" when it's combined with other packages such as drivers, filesystems and user applications.

Poplar is designed to be a modern microkernel, supporting a minimal system call interface and first-class support for message-passing-based IPC between userspace processes. Versatile message-passing allows Poplar to move much more out of the kernel than traditionally possible. For example, the kernel has no concept of a filesystem or of files - instead, the VFS and all filesystems are implemented entirely in userspace, and files are read and written to by passing messages.

Why Rust?

While Poplar's design is in theory language-agnostic, the implementation is very tied to Rust. Rust is a systems programming language with a rich type system and a novel ownership model that guarantees memory and thread safety in safe code. This qualification is important, as Poplar uses a lot of unsafe code out of necessity - it's important to understand that the use of Rust does not in any way mean that Poplar is automatically bug-free.

However, Rust makes you think a lot more about how to make your programs safe, which is exactly the sort of code we want to be writing for a kernel. This focus on safety, as well as good ergonomics features and performance, makes Rust perfect for OS-level code.

The Poplar Kernel

TODO

Platforms

A platform is a build target for the kernel. In some cases, there is only one platform for an entire architecture because the hardware is relatively standardized (e.g. x86_64). Other times, hardware is different enough between platforms that it's easier to treat them as different targets (e.g. a headless ARM server that boots using UEFI, versus a Raspberry Pi).

Platform: x86_64

The vast majority of x86_64 hardware is pretty similar, and so is treated as a single platform. It uses the hal_x86_64 HAL. We assume that the platform:

  • Boots using UEFI
  • Supports the APIC
  • Supports the xsave instruction

Efiloader

efiloader is the bootloader for Poplar on x86_64. It uses UEFI boot services to load the kernel and any extra images needed into memory, allocate memory for the heap, configure a basic framebuffer, and enter the kernel.

Description of booting process

A rough order of the steps that efiloader performs is:

  • Parses a set of load options passed to the loader, allowing the user to instruct it on how to load the kernel
  • Finds the physical address of the RSDP, so the kernel can find the ACPI tables
  • Creates a basic framebuffer using the UEFI GOP (Graphics Output Protocol), if requested
  • Allocates and maps a heap for the kernel to use
  • Loads any additional images needed from the filesystem
  • Constructs some "boot info", including a map of physical memory, telling the kernel about the hardware
  • Jumps into the kernel

Load options

A series of load options may be supplied to efiloader to tell it how Poplar should be booted. These options consist of a string of space separated key-value pairs, of the form a.dot.separated.key=value. Supported keys, plus descriptions of their values, are:

| Key | Example value | Description |
|--------------|---------------|-------------|
| kernel | kernel.elf | The path within the ESP that the kernel should be loaded from. |
| fb.none | No value | Specify that a GOP framebuffer should not be created. |
| fb.width | 1920 | Specify that a GOP framebuffer should be created, and its width. |
| fb.height | 1080 | Specify that a GOP framebuffer should be created, and its height. |
| image.{name} | my_task.elf | Specifies a path that an additional image should be loaded from. The key is the name that is passed in the boot info. |

If no load options are supplied, a kernel will be loaded from \kernel.elf, no additional images will be loaded, and a GOP framebuffer with a width of 800 and a height of 600 will be created.
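
As a rough illustration of the format described above, here is a minimal sketch of how such a string could be parsed. This is not efiloader's actual code: the LoadOptions type, its field names, and parse_load_options are all made up for this example, and the "last option wins" behaviour when fb.none is combined with fb.width or fb.height is an assumption.

```rust
/// Hypothetical parsed form of the load options (not efiloader's real type).
#[derive(Debug, PartialEq)]
pub struct LoadOptions {
    pub kernel: String,
    pub fb_enabled: bool,
    pub fb_width: u32,
    pub fb_height: u32,
    /// (name, path) pairs from `image.{name}=path` options.
    pub images: Vec<(String, String)>,
}

pub fn parse_load_options(input: &str) -> Result<LoadOptions, String> {
    // Start from the documented defaults: \kernel.elf, no images, 800x600.
    let mut options = LoadOptions {
        kernel: String::from("\\kernel.elf"),
        fb_enabled: true,
        fb_width: 800,
        fb_height: 600,
        images: Vec::new(),
    };

    for pair in input.split_whitespace() {
        // Options are `key=value`, except `fb.none`, which has no value.
        let (key, value) = match pair.split_once('=') {
            Some((key, value)) => (key, Some(value)),
            None => (pair, None),
        };

        match (key, value) {
            ("kernel", Some(path)) => options.kernel = path.to_string(),
            ("fb.none", None) => options.fb_enabled = false,
            ("fb.width", Some(width)) => {
                options.fb_width = width
                    .parse::<u32>()
                    .map_err(|_| format!("invalid fb.width: {}", width))?;
                options.fb_enabled = true;
            }
            ("fb.height", Some(height)) => {
                options.fb_height = height
                    .parse::<u32>()
                    .map_err(|_| format!("invalid fb.height: {}", height))?;
                options.fb_enabled = true;
            }
            (key, Some(path)) if key.starts_with("image.") => {
                let name = &key["image.".len()..];
                options.images.push((name.to_string(), path.to_string()));
            }
            _ => return Err(format!("unrecognised load option: {}", pair)),
        }
    }

    Ok(options)
}
```

Passing an empty string to this sketch yields the documented defaults.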

Kernel Objects

Kernel Objects are how Poplar represents resources that can be interacted with from userspace. They are all allocated a unique ID.

Handles

Handles are used to refer to kernel objects from userspace, and are allocated to a single Task. A handle of value 0 acts as a sentinel value that can be used for special meanings. From userspace, handles must be treated as opaque, 32-bit integers.

System calls

Userspace code can interact with the kernel through system calls. Poplar's system call interface is based around 'kernel objects', and so many of the system calls are to create, destroy, or modify the state of various types of kernel object. Because of Poplar's microkernel design, many traditional system calls (e.g. open) are not present, their functionality instead being provided by userspace.

Each system call has a unique number that is used to identify it. A system call can then take up to five parameters, each no larger than the system's register width. It can return a single value, also the size of a register.

Overview of system calls

| Number | System call | Description |
|--------|----------------------|-------------|
| 0 | yield | Yield to the kernel. |
| 1 | early_log | Log a message. Designed to be used from early processes. |
| 2 | get_framebuffer | Get the framebuffer that the kernel has created, if it has. |
| 3 | create_memory_object | Create a MemoryObject kernel object. |
| 4 | map_memory_object | Map a MemoryObject into an AddressSpace. |
| 5 | create_channel | Create a channel, returning handles to the two ends. |
| 6 | send_message | Send a message down a channel. |
| 7 | get_message | Receive the next message, if there is one. |
| 8 | wait_for_message | Yield to the kernel until a message arrives on the given channel (WIP) |
| 9 | register_service | Register yourself as a service. |
| 10 | subscribe_to_service | Create a channel to a particular service provider. |
| 11 | pci_get_info | Get information about the PCI devices on the platform. |

Making a system call on x86_64

To make a system call on x86_64, populate these registers:

| rdi | rsi | rdx | r10 | r8 | r9 |
|--------------------|-----|-----|-----|----|----|
| System call number | a | b | c | d | e |

The only way in which these registers deviate from the x86_64 Sys-V ABI is that c is passed in r10 instead of rcx, because rcx is used by the syscall instruction. You can then make the system call by executing syscall. Before the kernel returns to userspace, it will put the result of the system call (if there is one) in rax. If a system call takes fewer than five parameters, the unused parameter registers will be preserved across the system call.
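
The register assignment above could be wrapped in Rust with inline assembly, roughly as follows. This is a sketch rather than Poplar's real userspace library: the name raw_syscall is made up, and it conservatively treats every parameter register as clobbered, since this page only promises that unused parameter registers are preserved. rcx and r11 are marked as clobbered because the syscall instruction itself overwrites them.

```rust
#[cfg(target_arch = "x86_64")]
use core::arch::asm;

/// Hypothetical wrapper: makes a system call with up to five parameters.
/// Unused parameters can simply be passed as 0.
///
/// # Safety
/// The arguments must be valid for the chosen system call (for example, any
/// pointers must point to memory the kernel is allowed to read or write).
#[cfg(target_arch = "x86_64")]
pub unsafe fn raw_syscall(
    number: usize,
    a: usize,
    b: usize,
    c: usize,
    d: usize,
    e: usize,
) -> usize {
    let result: usize;
    unsafe {
        asm!(
            "syscall",
            // Assumption: treat all parameter registers as clobbered.
            inlateout("rdi") number => _,
            inlateout("rsi") a => _,
            inlateout("rdx") b => _,
            inlateout("r10") c => _, // `c` goes in r10, not rcx
            inlateout("r8") d => _,
            inlateout("r9") e => _,
            // The kernel places the result in rax.
            lateout("rax") result,
            // The `syscall` instruction overwrites rcx and r11.
            lateout("rcx") _,
            lateout("r11") _,
            options(nostack),
        );
    }
    result
}
```

With a wrapper like this, yield (system call 0, no parameters) would be `unsafe { raw_syscall(0, 0, 0, 0, 0, 0) }`.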

Return values

Often, a system call will need to return a status, plus one or more handles. The first handle a system call needs to return (often the only handle returned) can be returned in the upper bits of the status value:

  • Bits 0..32 contain the status:
    • 0 means that the system call succeeded, and the rest of the return value is valid
    • >0 means that the system call errored. The meaning of the value is system-call specific.
  • Bits 32..64 contain the value of the first returned handle, if applicable

A return value of 0xffffffffffffffff (the maximum value of u64) is reserved for when a system call is made with a number that does not correspond to a system call. This is defined as a normal error code (as opposed to, for example, terminating the task that tried to make the system call) to provide a mechanism for tasks to detect kernel support for a system call (so they can use a fallback method on older kernels, for example).
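
To make the bit layout concrete, here is a minimal sketch of how userspace might unpack such a return value. The Handle and SyscallError types and the function name are illustrative only, not part of any real Poplar library.

```rust
/// Hypothetical newtype for an opaque, 32-bit handle value.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Handle(pub u32);

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum SyscallError {
    /// The reserved all-ones value: the kernel doesn't know this system call.
    NoSuchSyscall,
    /// A non-zero status. Its meaning is specific to the system call made.
    Status(u32),
}

/// Unpacks a return value that carries a status and (on success) a handle.
pub fn decode_handle_result(raw: u64) -> Result<Handle, SyscallError> {
    if raw == u64::MAX {
        return Err(SyscallError::NoSuchSyscall);
    }

    let status = raw as u32; // bits 0..32
    let handle = (raw >> 32) as u32; // bits 32..64

    if status == 0 {
        Ok(Handle(handle))
    } else {
        Err(SyscallError::Status(status))
    }
}
```

A task that wants to fall back on older kernels would match on NoSuchSyscall and take its alternative path there.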

Debugging the kernel

Kernels can be difficult to debug - this page tries to collect useful techniques for debugging kernels in general, and also any Poplar-specific things that might be useful.

Using GDB

Firstly, start GDB with (this is just an example, alter e.g. paths as needed):

tools/rust_gdb -q "build/Poplar/fat/kernel.elf" -ex "target remote :1234"

Note that the rust_gdb script is used instead of invoking GDB directly - this installs various plugins to make life easier.

A few GDB tips that are specific to, or particularly helpful for, kernel debugging:

  • QEMU will not run any code (even the firmware) until you run continue in GDB. This allows you to place breakpoints before any code runs.
  • By default, the make gdb recipe will use KVM acceleration. This means that software breakpoints (created by break) will not work. Use hardware-assisted breakpoints (created with hbreak) instead.
  • To step through assembly, you must use si instead of s.
  • Use tui enable to move to the TUI, and then layout regs to show both general registers and source

Emulate with a custom build of QEMU

For particularly tricky issues, it can sometimes be useful to insert printfs in QEMU and see if they trigger when emulating Poplar. The Makefile makes this easy - run something like:

QEMU_DIR='~/qemu/build/x86_64-softmmu/' make qemu-no-kvm

where the location pointed to by QEMU_DIR is the build destination of the correct QEMU executable. A lot of the time, the printfs you've inserted will only trigger with TCG, so it's usually best to use qemu-no-kvm.

Poplar specific: the breakpoint exception

The breakpoint exception is useful for inspecting the contents of registers at specific points, such as in sections of assembly (where it's inconvenient to call into Rust, or to use a debugger because getting global_asm! to play nicely with GDB is a pain).

Simply use the int3 instruction:

...

mov rsp, [gs:0x10]
int3  // Is my user stack pointer correct?
sysretq

Building OVMF

Building a debug build of OVMF isn't too hard (from the base of the edk2 repo):

OvmfPkg/build.sh -a X64

By default, debug builds of OVMF will output debugging information on the ISA debugcon, which is actually probably nicer for our purposes than most builds, which pass DEBUG_ON_SERIAL_PORT during the build. To log the output to a file, you can pass -debugcon file:ovmf_debug.log -global isa-debugcon.iobase=0x402 to QEMU.

yield

Used by a task that can't do any work at the moment, allowing the kernel to schedule other tasks.

Parameters

None.

Returns

Always 0.

Capabilities needed

None.

early_log

Used by tasks that are started early in the boot process, before reliable userspace logging support is running. Output is logged to the same place as kernel logging.

Parameters

  • a - the length of the string to log in bytes. Maximum length is 4096 bytes.
  • b - a usermode pointer to the start of the UTF-8 encoded string.

Returns

  • 0 if the system call succeeded
  • 1 if the string was too long
  • 2 if the string was not valid UTF-8
  • 3 if the task making the syscall doesn't have the EarlyLogging capability

Capabilities needed

The EarlyLogging capability is needed to make this system call.

get_framebuffer

On many architectures, the bootloader or kernel can create a naive framebuffer using a platform-specific method. This framebuffer can be used to render from userspace, if a better hardware driver is not available on the platform.

Parameters

  • a should contain a mapped, writable, user-space address, to which information about the framebuffer will be written.

Returns

This system call returns three things:

  • A status code
  • A handle to a MemoryObject containing the framebuffer, if successful
  • Information about the framebuffer, if successful, written into the address in a

The status codes used are:

  • 0 means that the system call was successful
  • 1 means that the calling task does not have the correct capability
  • 2 means that a does not contain a valid address for the kernel to write to
  • 3 means that the kernel did not create the framebuffer

The information written back to the address in a has the following structure:

#[repr(C)]
struct FramebufferInfo {
    width: u16,
    height: u16,
    stride: u16,
    /// 0 = RGB32
    /// 1 = BGR32
    pixel_format: u8,
}

Capabilities needed

Tasks need the GetKernelFramebuffer capability to use this system call.

create_memory_object

Create a MemoryObject kernel object. Userspace can only create "blank" MemoryObjects (that are allocated to free, conventional physical memory). MemoryObjects that point to special objects (e.g. framebuffer data, PCI configuration spaces) must be created by the kernel.

Parameters

  • a - the virtual address to map the MemoryObject at
  • b - the size of the MemoryObject's memory area (in bytes)
  • c - flags:
    • Bit 0: set if the memory should be writable
    • Bit 1: set if the memory should be executable
  • d - a pointer to which the kernel will write the physical address to which the MemoryObject was allocated. Ignored if null.

Returns

Uses the standard representation to return a Result<Handle, MemoryObjectError>. Error status codes are:

  • 1 if the given virtual address is invalid
  • 2 if the given set of flags is invalid
  • 3 if memory of the requested size could not be allocated
  • 4 if the pointer to write the allocated physical address to was not valid
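
As a sketch of how a userspace wrapper might build the flags parameter and interpret the error statuses listed above: the constant names, the helper functions, and the variant names of MemoryObjectError are assumptions made for this example. Only the bit positions and status numbers come from this page.

```rust
/// Bit 0 of the flags parameter: the memory should be writable.
pub const FLAG_WRITABLE: usize = 1 << 0;
/// Bit 1 of the flags parameter: the memory should be executable.
pub const FLAG_EXECUTABLE: usize = 1 << 1;

/// Builds the value to pass in `c`.
pub fn memory_object_flags(writable: bool, executable: bool) -> usize {
    let mut flags = 0;
    if writable {
        flags |= FLAG_WRITABLE;
    }
    if executable {
        flags |= FLAG_EXECUTABLE;
    }
    flags
}

/// Hypothetical variant names for the documented error statuses.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum MemoryObjectError {
    InvalidVirtualAddress,
    InvalidFlags,
    AllocationFailed,
    InvalidPhysicalAddressPointer,
}

/// Maps a non-zero status (bits 0..32 of the return value) to an error.
/// Returns `None` for 0 (success) and for statuses this page doesn't list.
pub fn memory_object_error(status: u32) -> Option<MemoryObjectError> {
    match status {
        1 => Some(MemoryObjectError::InvalidVirtualAddress),
        2 => Some(MemoryObjectError::InvalidFlags),
        3 => Some(MemoryObjectError::AllocationFailed),
        4 => Some(MemoryObjectError::InvalidPhysicalAddressPointer),
        _ => None,
    }
}
```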

Capabilities needed

None.

map_memory_object

Map a MemoryObject into an AddressSpace.

Parameters

  • a - a handle to the MemoryObject.
  • b - a handle to the AddressSpace. The zero handle indicates to map the memory object into the task's AddressSpace.
  • c - the virtual address to map the MemoryObject at, if it does not need to be mapped at a specific address. Should be null if the MemoryObject supplies the address.
  • d - a pointer to which the kernel will write the virtual address at which the MemoryObject was mapped. Ignored if null.

Returns

  • 0 if the system call succeeded
  • 1 if either of the passed handles is invalid
  • 2 if the portion of the AddressSpace that would be mapped is already occupied by another MemoryObject
  • 3 if the supplied MemoryObject handle does not point to a MemoryObject
  • 4 if the supplied AddressSpace handle does not point to an AddressSpace
  • 5 if the pointer to write the virtual address back to is invalid
  • 6 if a virtual address to map at was supplied, but the MemoryObject needs to be mapped at a specific address.
  • 7 if a virtual address was not supplied, but the MemoryObject does not specify the address to map it at.

Capabilities needed

None (this may change in the future).

create_channel

Create a Channel kernel object. Channels are slightly odd kernel objects in that they must be referred to in userspace by two handles, one for each "end" of the channel. This system call therefore returns two handles, one of which is usually transferred to another task.

Parameters

  • a - the virtual address to write the second handle into (only one can be returned in the status)

Returns

Uses the standard representation to return a Result<Handle, CreateChannelError>. Error status codes are:

  • 1 if the passed virtual address is not valid

TODO: if we ditch the ability to return an error (i.e. by making this infallible, or by saying that a null handle denotes an error but not which one), we could return both handles in the status.

Capabilities needed

None.

send_message

Send a message, consisting of a number of bytes and optionally a number of handles, down a Channel. All the handles are removed from the sending Task and added to the receiving Task.

A maximum of 4 handles can be transferred by each message. The maximum number of bytes is currently 4096.

Parameters

  • a - the handle to the Channel end that is sending the message. The handle must have the SEND right.
  • b - a pointer to the array of bytes to send
  • c - the number of bytes to send
  • d - a pointer to the array of handle entries to transfer. All handles must have the TRANSFER right. This may be 0x0 if the message does not transfer any handles.
  • e - the number of handles to send

Returns

A status code:

  • 0 if the system call succeeded and the message was sent
  • 1 if the Channel handle is invalid
  • 2 if the Channel handle does not point to a Channel
  • 3 if the Channel handle does not have the correct rights to send messages
  • 4 if one or more of the handles to transfer is invalid
  • 5 if any of the handles to transfer do not have the correct rights
  • 6 if the pointer to the message bytes was not valid
  • 7 if the message's byte array is too large
  • 8 if the pointer to the handles array was not valid
  • 9 if the handles array is too large
  • 10 if the other end of the Channel has been disconnected

Capabilities needed

None.

get_message

Receive a message from a Channel, if one is waiting to be received.

A maximum of 4 handles can be transferred by each message. The maximum number of bytes is currently 4096.

Parameters

  • a - the handle to the Channel end that is receiving the message. The handle must have the RECEIVE right.
  • b - a pointer to the array of bytes to put the message into
  • c - the size of the bytes buffer
  • d - a pointer to the array of handle entries to transfer. This may be 0x0 if the receiver does not expect to receive any handles.
  • e - the size of the handles buffer (in handles)

Returns

Bits 0..16 are a status code:

  • 0 if the message was received successfully. The rest of the return value is valid.
  • 1 if the Channel handle is invalid.
  • 2 if the Channel handle does not point to a Channel.
  • 3 if there was no message to receive.
  • 4 if the address of the bytes buffer is invalid.
  • 5 if the bytes buffer is too small to contain the message.
  • 6 if the address of the handles buffer is invalid, or if 0x0 was passed and the message does contain handles.
  • 7 if the handles buffer is too small to contain the handles transferred with the message.

If the status code is 0 (i.e. a valid message was written into the bytes and handles buffers), the return value also contains the number of valid entries in both the byte and handle buffers:

  • Bits 16..32 contain the length of the valid byte buffer (in bytes). If the passed buffer was larger than this, the remaining bytes have not been written by the kernel.
  • Bits 32..48 contain the length of the valid handles buffer (in handles). If the passed buffer was larger than this, the remaining entries have not been written by the kernel.
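
The layout of this return value can be unpacked with a few shifts and masks. The following is only a sketch - ReceivedLengths and decode_get_message are names invented for this example:

```rust
/// How much of each buffer the kernel filled in (hypothetical type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ReceivedLengths {
    pub num_bytes: usize,
    pub num_handles: usize,
}

/// Splits the raw return value of `get_message` into its parts.
/// On failure, the non-zero status code from bits 0..16 is returned.
pub fn decode_get_message(raw: u64) -> Result<ReceivedLengths, u16> {
    let status = (raw & 0xffff) as u16; // bits 0..16
    if status != 0 {
        return Err(status);
    }

    Ok(ReceivedLengths {
        num_bytes: ((raw >> 16) & 0xffff) as usize,   // bits 16..32
        num_handles: ((raw >> 32) & 0xffff) as usize, // bits 32..48
    })
}
```

The caller would then only read the first num_bytes bytes and the first num_handles handles of the buffers it passed in.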

Capabilities needed

None.

register_service

Register yourself as the provider of a service. The name of the service will be {task_name}.{service_name}. This returns a channel that is used to alert you when another task subscribes to your service with the subscribe_to_service system call.

See the section on Services for more information about services, how to register a service, and how to subscribe to a service.

Parameters

  • a - the length of the name string in bytes. Maximum length is 256. Must be greater than 0.
  • b - a usermode pointer to the start of the UTF-8 encoded name string.

Returns

Returns the standard representation of a Result<Handle, ServiceError>. Error status codes are:

  • 1 if the task does not have the correct capability
  • 2 if the usermode pointer to the name is not valid
  • 3 if the name is too long, or has a length of 0

The returned handle is to a Channel that is used to serve channel subscriptions.

Capabilities needed

The ServiceProvider capability is needed to make this system call.

subscribe_to_service

Subscribe to a registered service by name. This will deliver a notification to the task that registered the service with one end of a newly created channel. The other end of the channel will be returned by this system call, if successful.

See the section on Services for more information about services, how to register a service, and how to subscribe to a service.

Parameters

  • a - the length of the name string in bytes. Maximum length is 256. Must be greater than 0.
  • b - a usermode pointer to the start of the UTF-8 encoded name string.

Returns

Returns the standard representation of a Result<Handle, ServiceError>. Error status codes are:

  • 1 if the task does not have the correct capability
  • 2 if the usermode pointer to the name is not valid
  • 3 if the name is too long, or has a length of 0
  • 4 if the supplied name does not correspond to a registered service.

The returned handle is to one end of a Channel, the other end of which has been given to the task that supplies the service.

Capabilities needed

The ServiceUser capability is needed to make this system call.

pci_get_info

Get information about the PCI devices on the platform. This is only meant to be used from the userspace PCI bus driver.

TODO: detail structure of PCI descriptor

Parameters

  • a - a pointer to the buffer to put the PCI descriptors in
  • b - the size of the buffer (in descriptors)

Returns

Bits 0..16 contain a status code:

  • 0 if the system call succeeded
  • 1 if the task does not have the correct capabilities
  • 2 if the given buffer can't hold all the descriptors
  • 3 if the address to the descriptor buffer is invalid
  • 4 if the platform doesn't support PCI

If the status code is 0 (i.e. the system call succeeded), bits 16..48 contain the number of descriptors written back. If the status code is 2 (i.e. the buffer was not large enough), bits 16..48 contain the number of entries that need to be written.

If a is 0x0, this system call will always fail with status code 2 and the number of descriptors in bits 16..48. This is to allow userspace to dynamically allocate a buffer of the correct size, if it desires.
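
The "ask for the size, then allocate" pattern this enables might look something like the sketch below. The system call itself is passed in as a closure so that the logic stands alone; PciDescriptor is an opaque placeholder (its real layout is still a TODO above), and read_all_descriptors is a made-up name.

```rust
/// Placeholder for a PCI descriptor - the real structure is not documented yet.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub struct PciDescriptor {
    pub raw: [u8; 8],
}

fn status(raw: u64) -> u64 {
    raw & 0xffff // bits 0..16
}

fn count(raw: u64) -> usize {
    ((raw >> 16) & 0xffff_ffff) as usize // bits 16..48
}

/// Calls `pci_get_info` twice: once with a null buffer to learn how many
/// descriptors there are, and once with a buffer of the right size.
/// `pci_get_info` stands in for the real system call (buffer pointer, size in
/// descriptors) and returns its raw result. Errors are the raw status code.
pub fn read_all_descriptors<F>(mut pci_get_info: F) -> Result<Vec<PciDescriptor>, u64>
where
    F: FnMut(*mut PciDescriptor, usize) -> u64,
{
    // A null buffer is documented to fail with status 2, plus the count needed.
    let probe = pci_get_info(core::ptr::null_mut(), 0);
    if status(probe) != 2 {
        return Err(status(probe));
    }

    let mut buffer = vec![PciDescriptor::default(); count(probe)];
    let result = pci_get_info(buffer.as_mut_ptr(), buffer.len());
    if status(result) != 0 {
        return Err(status(result));
    }

    // Only the first `count` entries were written by the kernel.
    buffer.truncate(count(result));
    Ok(buffer)
}
```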

Capabilities needed

Tasks need the PciBusDriver capability to use this system call.

Capabilities

Capabilities describe what a task is allowed to do, and are encoded in its image. This allows users to audit the permissions of the tasks they run at a much higher granularity than user-based permissions, and also allows us to move parts of the kernel into discrete userspace tasks by creating specialised capabilities that grant access to sensitive resources (such as the raw framebuffer) to only select tasks.

Encoding capabilities in the ELF image

Capabilities are encoded in an entry of a PT_NOTE segment of the ELF image of a task. This entry will have an owner (sometimes referred to in documentation as the 'name') of PEBBLE and a type of 0. The descriptor will be an encoding of the capabilities as described by the 'Format' section. The descriptor must be padded such that the next descriptor is 4-byte aligned, and so a value of 0x00 is reserved to be used as padding.

Initial images (tasks loaded by the bootloader before filesystem drivers are working) are limited to a capabilities encoding of 32 bytes (given the variable-length encoding, this does not equate to a fixed maximum number of capabilities).

Format

The capabilities format is variable-length - simple capabilities can be encoded as a single byte, while more complex / specific ones may need multiple bytes of prefix, and can also encode fixed-length data.

Overview of capabilities

This is an overview of all the capabilities the kernel supports:

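| First byte | Next byte(s) | Data | Arch specific? | Description |
|------------|--------------|------|----------------|-------------|
| 0x00 | - | - | - | No meaning - used to pad descriptor to required length (see above) |
| 0x01 | | | No | GetFramebuffer |
| 0x02 | | | No | EarlyLogging |
| 0x03 | | | No | ServiceProvider |
| 0x04 | | | No | ServiceUser |
| 0x05 | - | - | No | PciBusDriver |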
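
Since every capability currently defined is a single byte, building and reading a descriptor is simple. The sketch below is illustrative only (the type and function names are made up); it assumes the 32-byte limit for initial images is checked against the padded length, and that padding bytes can be skipped wherever they appear.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(u8)]
pub enum Capability {
    GetFramebuffer = 0x01,
    EarlyLogging = 0x02,
    ServiceProvider = 0x03,
    ServiceUser = 0x04,
    PciBusDriver = 0x05,
}

/// Reserved byte used to pad the descriptor.
pub const PADDING: u8 = 0x00;
/// Limit on the encoding for initial images.
pub const MAX_INITIAL_IMAGE_ENCODING: usize = 32;

/// Encodes capabilities into a note descriptor, padded to a multiple of
/// 4 bytes. Returns the padded length as an error if it exceeds the limit.
pub fn encode_capabilities(caps: &[Capability]) -> Result<Vec<u8>, usize> {
    let mut descriptor: Vec<u8> = caps.iter().map(|&cap| cap as u8).collect();
    while descriptor.len() % 4 != 0 {
        descriptor.push(PADDING);
    }

    if descriptor.len() > MAX_INITIAL_IMAGE_ENCODING {
        return Err(descriptor.len());
    }
    Ok(descriptor)
}

/// Decodes a descriptor, skipping padding. Returns the offending byte if it
/// doesn't correspond to a capability in the table above.
pub fn decode_capabilities(descriptor: &[u8]) -> Result<Vec<Capability>, u8> {
    let mut caps = Vec::new();
    for &byte in descriptor {
        let cap = match byte {
            PADDING => continue,
            0x01 => Capability::GetFramebuffer,
            0x02 => Capability::EarlyLogging,
            0x03 => Capability::ServiceProvider,
            0x04 => Capability::ServiceUser,
            0x05 => Capability::PciBusDriver,
            unknown => return Err(unknown),
        };
        caps.push(cap);
    }
    Ok(caps)
}
```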

Userspace memory map (x86_64)

x86_64 features an enormous 256TB virtual address space, most of which is available to userspace processes under Poplar. For this reason, things are spread throughout the virtual address space to make it easy to identify what a virtual address points to.

Userspace stacks

Within the virtual address space, the userspace stacks are allocated a 4GB range. Each task has a maximum stack size of 2MB, which puts a limit of 2048 tasks per address space.
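
The arithmetic behind that limit, as a quick sketch. The constant names are made up, and stack_slot_offset assumes that stack slots are laid out contiguously by index within the range, which this page doesn't actually specify (nor does it give the base address of the range, so only an offset is computed).

```rust
/// Size of the range reserved for userspace stacks: 4GB.
pub const STACK_RANGE_SIZE: u64 = 4 * 1024 * 1024 * 1024;
/// Maximum size of a single task's stack: 2MB.
pub const MAX_STACK_SIZE: u64 = 2 * 1024 * 1024;
/// 4GB / 2MB = 2048 stacks, and so 2048 tasks per address space.
pub const MAX_TASKS: u64 = STACK_RANGE_SIZE / MAX_STACK_SIZE;

/// Offset of a stack slot from the start of the stack range.
/// Assumption: slots are packed contiguously, in index order.
pub fn stack_slot_offset(index: u64) -> Option<u64> {
    if index < MAX_TASKS {
        Some(index * MAX_STACK_SIZE)
    } else {
        None
    }
}
```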

Message Passing

Poplar has a kernel object called a Channel for providing first-class message passing support to userspace. Channels move packets, called "messages", which contain a stream of bytes, and optionally one or more handles that are transferred from the sending task to the receiving task.

Ptah

Channels can move arbitrary bytes, but Poplar also includes a layer on top of Channels called Ptah, which consists of a data model and wire format suitable for encoding data which can be serialized and deserialized from any sensible language without too much difficulty.

Ptah is heavily inspired by Serde, and the first implementation of Ptah was actually a Serde data format. Unfortunately, it made properly handling Poplar handles very difficult - when a handle is sent over a channel, it needs to be put into a separate array, and the in-line data replaced by an index into that array. When the message travels over a task boundary, the kernel examines and replaces each handle in this array with a handle to the same kernel object in the new task. This effectively means we need to add a new Handle type to our data model, which is not easily possible with Serde (and would make it incompatible with standard Serde anyway).
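
To illustrate the out-of-line handle idea, here is a toy message builder. It is not Ptah's real implementation: MessageWriter and its methods are invented for this example, and writing the in-line index as a single byte is an assumption (the limit of 4 handles per message comes from send_message).

```rust
/// Hypothetical newtype for an opaque, 32-bit handle value.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Handle(pub u32);

/// Maximum number of handles that a single message can transfer.
pub const MAX_HANDLES: usize = 4;

/// Builds the two parts of a message: the byte stream, and the separate
/// array of handles that the kernel translates between tasks.
#[derive(Debug, Default)]
pub struct MessageWriter {
    pub bytes: Vec<u8>,
    pub handles: Vec<Handle>,
}

impl MessageWriter {
    /// Appends ordinary in-line data.
    pub fn write_bytes(&mut self, data: &[u8]) {
        self.bytes.extend_from_slice(data);
    }

    /// Moves the handle into the out-of-line array, and writes its index
    /// into the byte stream in its place. Gives the handle back if the
    /// message is already carrying the maximum number of handles.
    pub fn write_handle(&mut self, handle: Handle) -> Result<(), Handle> {
        if self.handles.len() >= MAX_HANDLES {
            return Err(handle);
        }

        let index = self.handles.len() as u8; // assumption: 1-byte index
        self.handles.push(handle);
        self.bytes.push(index);
        Ok(())
    }
}
```

Because the byte stream only ever contains indices, the kernel can rewrite the entries of the handle array for the receiving task without needing to understand the in-line data at all.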

The Ptah Data Model

The Ptah data model maps pretty well to the Rust type system, and relatively closely to the Serde data model. Key differences are some stronger guarantees about the encoding of types such as enums (the data model only needs to fit a single wire format, and so can afford to be less flexible than Serde's), and the lack of a few types: unit-based types, and the statically-sized versions of seq and map (tuple and struct). Ptah is not a self-describing format (i.e. the type you're trying to deserialize must be fully known), so the elements of structs and tuples can simply be serialized in the order they appear, and then deserialized in order at the other end.

  • Primitive types
    • bool
    • u8, u16, u32, u64, u128
    • i8, i16, i32, i64, i128
    • f32, f64
    • char
  • string
    • Encoded as a seq of u8, but with the additional requirement that it is valid UTF-8
    • Not null terminated, as seq includes explicit length
  • option
    • Encoded in the same way as an enum, but separately for the benefit of languages without proper enums
    • Either None or Some({value})
  • enum
    • Include a tag, and optionally some data
    • Represent a Rust enum, or a tagged union in languages without proper enums
    • The data is encoded separately to the tag, and can be of any other Ptah type:
      • Rust tuple variants (e.g. E::A(u8, u32)) are represented by tuple
      • Rust struct variants (e.g. E::B { foo: u8, bar: u32 }) are represented by struct
  • seq
    • A variable-length sequence of values, mapping to many types such as Vec<T>.
  • map
    • A variable-length series of key-value pairings, mapping to collections like BTreeMap<K, V>.
  • handle
    • This is the type that means we need our own data model in the first place
    • These are encoded out-of-line of the rest of the data, so that the Poplar kernel can introspect into them, if it needs to

The Ptah Wire Format

The wire format describes how messages can be encoded into a stream of bytes suitable for transmission over a channel, or over another transport layer such as the network or a serial port.

Primitives

Primitives are transmitted as little-endian and packed to their natural alignment. The following primitive types are recognised:

| Name | Size (bytes) | Description |
|-------------------------|----------------|----------------------------------------------|
| bool | 1 | A boolean value |
| u8, u16, u32, u64, u128 | 1, 2, 4, 8, 16 | An unsigned integer |
| i8, i16, i32, i64, i128 | 1, 2, 4, 8, 16 | A signed integer |
| f32, f64 | 4, 8 | Single / double-precision IEEE-754 FP values |
| char | 4 | A single UTF-8 Unicode scalar value |
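
As a small sketch of what little-endian encoding of these primitives looks like in Rust: the WirePrimitive trait is made up for this example, encoding bool as a single 0 or 1 byte is an assumption, and the sketch doesn't attempt to handle char or any alignment between consecutive values.

```rust
/// Hypothetical trait for types that can be written to the wire.
pub trait WirePrimitive {
    fn write_to(&self, out: &mut Vec<u8>);
}

impl WirePrimitive for bool {
    fn write_to(&self, out: &mut Vec<u8>) {
        // Assumption: `false` is 0 and `true` is 1.
        out.push(u8::from(*self));
    }
}

// All the integer and floating-point types provide `to_le_bytes`, so a
// small macro can implement the trait for each of them in the same way.
macro_rules! impl_le_primitive {
    ($($ty:ty),*) => {
        $(
            impl WirePrimitive for $ty {
                fn write_to(&self, out: &mut Vec<u8>) {
                    out.extend_from_slice(&self.to_le_bytes());
                }
            }
        )*
    };
}

impl_le_primitive!(u8, u16, u32, u64, u128, i8, i16, i32, i64, i128, f32, f64);
```

For example, writing 0x12345678u32 appends the bytes 0x78, 0x56, 0x34, 0x12, in that order.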

Journal

This is just a place I put notes that I make during development.

Building a rustc target for Poplar

We want a target in rustc for building userspace programs for Poplar. It would be especially cool to get it merged as an upstream Tier-3 target. This documents my progress, mainly as a reference for me to remember how it all works.

How do I actually build and use rustc?

A useful baseline invocation for normal use is:

./x.py build -i library/std

The easiest way to test the built rustc is to create a rustup toolchain (from the root of the Rust repo):

rustup toolchain link poplar build/{host triple}/stage1     # If you built a stage-1 compiler (default with invocation above)
rustup toolchain link poplar build/{host triple}/stage2     # If you built a stage-2 compiler

It's easiest to call your toolchain poplar, as this is the name we use in the Makefiles for now.

You can then use this toolchain from Cargo anywhere on the system with:

cargo +poplar build     # Or whatever command you need

Using a custom LLVM

  • Fork Rust's llvm-project
  • cd src/llvm-project
  • git remote add my_fork {url to your custom LLVM's repo}
  • git fetch my_fork
  • git checkout my_fork/{the correct branch}
  • cd ..
  • git add llvm-project
  • git commit -m "Move to custom LLVM"

Things to change in config.toml

This is as of 2020-09-29 - you need to remember to keep config.toml up-to-date yourself (it's not checked in upstream), as it can cause confusing errors when it's out-of-date.

  • download-ci-llvm = true under [llvm]. This makes the build much faster, since we don't need a custom LLVM.
  • assertions = true under [llvm]
  • incremental = true under [rust]
  • lld = true under [rust]. Without this, the toolchain can't find rust-lld when linking.
  • llvm-tools = true under [rust]. This probably isn't needed, I just chucked it in in case rust-lld needs it.

Adding the target

I used a slightly different layout to most targets (which have a base, which creates a TargetOptions, and then a target that modifies and uses those options).

  • Poplar targets generally need a custom linker script. I added one at compiler/rustc_target/src/spec/x86_64_poplar.ld.
  • Make a module for the target (I called mine compiler/rustc_target/src/spec/x86_64_poplar.rs). Copy from an existing one. Instead of a separate poplar_base.rs to create the TargetOptions, we do it in the target itself. We include_str! the linker script in here, so it's distributed as part of the rustc binary.
  • Add the target in the supported_targets! macro in compiler/rustc_target/src/spec/mod.rs.

Adding the target to LLVM

I don't really know my way around the LLVM code base, so this was fairly cobbled together:

  • In llvm/include/llvm/ADT/Triple.h, add a variant for the OS in the OSType enum. I called it Poplar. Don't make it the last entry, to avoid having to change the LastOSType variant.
  • In llvm/lib/Support/Triple.cpp, in the function Triple::getOSTypeName, add the OS. I added case Poplar: return "poplar";.
  • In the same file, in the parseOS function, add the OS. I added .StartsWith("poplar", Triple::Poplar).
  • This file also contains a function, getDefaultFormat, that gives the default format for a platform. The default is ELF, so no changes were needed for Poplar, but they might be for another OS.

TIP: When you make a change in the llvm-project submodule, you will need to commit these changes, and then update the submodule in the parent repo, or the bootstrap script will checkout the old version (without your changes) and build an entire compiler without the changes you are trying to test.

NOTE: to save people from having to switch to our llvm-project fork, we don't actually use our LLVM target from rustc (yet). I'm not sure why you need per-OS targets in LLVM, as it doesn't even seem to let us do any of the things we wanted to (this totally might just be me not knowing how LLVM targets work).

Notes

  • We needed to change the entry point to _start, or it silently just doesn't emit any sections in the final image.
  • By default, the linker throws away our .caps sections. We need a way to emit them regardless - this is done by manually creating the program header and specifying that the sections should be kept with KEEP. There are two possible solutions that I can see: make rustc emit a linker script, or try and introduce these ideas into llvm/lld with our target (I'm not even sure this is possible).
  • It looks like lld has no OS-specific code at all, and the only place that specifically-kept sections are added is in the linker script parser. Looks like we might have to actually create a file-based linker script (does literally no one else need to pass a linker script by command line??).

USB

USB has a host that sends requests to devices (devices only respond when asked something). Some devices are dual role devices (DRD) (previously called On-The-Go (OTG) devices), and can dynamically negotiate whether they're the host or the device.

Each device can have one or more interfaces, which each have one or more endpoints. Each endpoint has a hardcoded direction (host-to-device or device-to-host). There are a few types of endpoint (the type is decided during interface configuration):

  • Control endpoints are for configuration and control requests
  • Bulk endpoints are for bulk transfers
  • Isochronous endpoints are for periodic transfers with a reserved bandwidth
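  • Interrupt endpoints are for small transfers with a bounded latency (despite the name, the device can't actually interrupt the host - the host polls the endpoint at a fixed interval)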

The interfaces and endpoints a device has are described by descriptors reported by the device during configuration.

Every device has a special endpoint called ep0. It's an in+out control endpoint, and is used to configure the other endpoints.
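As a rough sketch of how the direction and type of an endpoint can be read out of a standard endpoint descriptor: the bit layouts below (direction in bit 7 of bEndpointAddress, transfer type in the low two bits of bmAttributes) are from the USB 2.0 specification, but the type and function names are made up for illustration and are not Poplar's actual driver code.

```rust
/// Transfer type of an endpoint, taken from bits 1..0 of `bmAttributes`.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum EndpointType {
    Control,
    Isochronous,
    Bulk,
    Interrupt,
}

/// `Out` is host-to-device, `In` is device-to-host.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum Direction {
    Out,
    In,
}

/// Hypothetical decoded form of an endpoint descriptor.
#[derive(Debug, PartialEq, Eq)]
struct Endpoint {
    number: u8,
    direction: Direction,
    typ: EndpointType,
    max_packet_size: u16,
}

/// Decode a 7-byte standard endpoint descriptor. Returns `None` if the bytes
/// are too short or are not an endpoint descriptor (`bDescriptorType` 5).
fn parse_endpoint_descriptor(bytes: &[u8]) -> Option<Endpoint> {
    if bytes.len() < 7 || bytes[0] < 7 || bytes[1] != 0x05 {
        return None;
    }
    let address = bytes[2]; // bEndpointAddress
    let attributes = bytes[3]; // bmAttributes

    let typ = match attributes & 0b11 {
        0b00 => EndpointType::Control,
        0b01 => EndpointType::Isochronous,
        0b10 => EndpointType::Bulk,
        _ => EndpointType::Interrupt,
    };
    // Bit 7 is the direction. Control endpoints are bidirectional, so the bit
    // is not meaningful for them.
    let direction = if address & 0x80 != 0 { Direction::In } else { Direction::Out };

    Some(Endpoint {
        number: address & 0x0f,
        direction,
        typ,
        // wMaxPacketSize is little-endian; bits 10..0 hold the size.
        max_packet_size: u16::from_le_bytes([bytes[4], bytes[5]]) & 0x07ff,
    })
}
```

For example, the bytes `[7, 5, 0x81, 0x03, 8, 0, 10]` describe endpoint 1, device-to-host, of the interrupt type, with 8-byte packets.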

RISC-V

Building OpenSBI

OpenSBI is the reference implementation for the Supervisor Binary Interface (SBI). It's basically how you access M-mode functionality from your S-mode bootloader or kernel.

Firstly, we need a RISC-V C toolchain. On Arch, I installed the riscv64-unknown-elf-binutils AUR package. I also tried to install the riscv64-unknown-elf-gcc package, but this wouldn't work, so I built OpenSBI with Clang+LLVM instead with (from inside lib/opensbi):

make PLATFORM=generic LLVM=1

This can be tested on QEMU with:

qemu-system-riscv64 -M virt -bios build/platform/generic/firmware/fw_jump.elf
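If these two commands end up being driven from Rust (e.g. from an xtask), they could be constructed with std::process::Command along these lines. This is only a sketch: the function names are hypothetical, and the arguments are exactly the ones from the commands above.

```rust
use std::path::Path;
use std::process::Command;

/// Construct (but don't run) the `make` invocation that builds OpenSBI with
/// Clang+LLVM. `opensbi_dir` would be `lib/opensbi` in this repo.
fn opensbi_build_command(opensbi_dir: &Path) -> Command {
    let mut cmd = Command::new("make");
    cmd.current_dir(opensbi_dir)
        .arg("PLATFORM=generic")
        .arg("LLVM=1");
    cmd
}

/// Construct (but don't run) the QEMU invocation that boots the given
/// firmware image on the `virt` machine.
fn qemu_test_command(firmware: &Path) -> Command {
    let mut cmd = Command::new("qemu-system-riscv64");
    cmd.args(["-M", "virt", "-bios"]).arg(firmware);
    cmd
}
```

Calling `.status()` on the returned Command runs it and waits for it to finish; checking the exit status before launching QEMU would stop a failed OpenSBI build from silently booting a stale image.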

It also seems like you can build with a platform of qemu/virt - I'm not sure what difference this makes yet, but I'm guessing it's the hardware it assumes it needs to drive? Worth exploring. Apparently the generic image does dynamic discovery (I'm assuming from the device tree), so that sounds good for now.

So the jump firmware (fw_jump.elf) jumps to a specified address in memory (apparently QEMU can load an ELF, which would be fine initially). The other option would be a payload firmware, which bundles your code into the SBI image (I'm assuming as a flat binary) and executes it like that.

We should probably make an xtask step to build OpenSBI and move it to the bundled directory, plus decide what sort of firmware / booting strategy we're going to use. Then the next step would be some Rust code that can print to the serial port, to prove it's all working.
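A minimal sketch of what that serial-printing code might look like, assuming the UART on QEMU's virt machine is a 16550-compatible device at 0x10000000 with byte-wide registers (transmit register at offset 0, line status register at offset 5). The function names are hypothetical, and none of this has been checked against real hardware.

```rust
use core::ptr::{read_volatile, write_volatile};

/// Assumed base address of UART0 on QEMU's `virt` machine.
const UART0_BASE: usize = 0x1000_0000;
/// Offset of the transmit holding register.
const THR: usize = 0;
/// Offset of the line status register.
const LSR: usize = 5;
/// Bit in the LSR that is set when the THR can accept another byte.
const LSR_THR_EMPTY: u8 = 1 << 5;

/// Write each byte of `s` to the UART whose registers start at `base`.
///
/// # Safety
/// `base` must point to at least 6 readable and writable bytes that behave
/// like the registers of a 16550-style UART.
unsafe fn uart_write_str(base: *mut u8, s: &str) {
    for byte in s.bytes() {
        unsafe {
            // Spin until the transmitter is ready for another byte.
            while read_volatile(base.add(LSR)) & LSR_THR_EMPTY == 0 {}
            write_volatile(base.add(THR), byte);
        }
    }
}

/// Print to UART0. Only meaningful when running on the `virt` machine with
/// the UART identity-mapped (or with translation off).
fn early_print(s: &str) {
    unsafe { uart_write_str(UART0_BASE as *mut u8, s) }
}
```

Taking the base address as a parameter means the same routine can be pointed at a different address once paging is on, and lets it be exercised on the host against a plain byte array standing in for the registers.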

QEMU virt memory map

Seems everything is memory-mapped, which makes for a nice change coming from x86's nasty port thingy. This is the virt machine's one (from the QEMU source...):

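| Region    | Address    | Size       |
|-----------|------------|------------|
| Debug     | 0x0        | 0x100      |
| MROM      | 0x1000     | 0x11000    |
| Test      | 0x100000   | 0x1000     |
| CLINT     | 0x2000000  | 0x10000    |
| PLIC      | 0xc000000  | 0x4000000  |
| UART0     | 0x10000000 | 0x100      |
| Virtio    | 0x10001000 | 0x1000     |
| Flash     | 0x20000000 | 0x4000000  |
| DRAM      | 0x80000000 | {mem size} |
| PCIe MMIO | 0x40000000 | 0x40000000 |
| PCIe PIO  | 0x03000000 | 0x10000    |
| PCIe ECAM | 0x30000000 | 0x10000000 |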
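For quick sanity checks (e.g. working out which device a faulting address belongs to), the memory map above can be encoded as a constant table. This is a sketch with made-up names; DRAM is left out because its size depends on how much memory QEMU is given.

```rust
/// One fixed region of the `virt` machine's physical address space.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
struct Region {
    name: &'static str,
    base: u64,
    size: u64,
}

/// The fixed-size regions from the memory map above (DRAM is omitted).
const VIRT_MEMMAP: &[Region] = &[
    Region { name: "Debug", base: 0x0, size: 0x100 },
    Region { name: "MROM", base: 0x1000, size: 0x11000 },
    Region { name: "Test", base: 0x10_0000, size: 0x1000 },
    Region { name: "CLINT", base: 0x200_0000, size: 0x1_0000 },
    Region { name: "PLIC", base: 0xc00_0000, size: 0x400_0000 },
    Region { name: "UART0", base: 0x1000_0000, size: 0x100 },
    Region { name: "Virtio", base: 0x1000_1000, size: 0x1000 },
    Region { name: "Flash", base: 0x2000_0000, size: 0x400_0000 },
    Region { name: "PCIe MMIO", base: 0x4000_0000, size: 0x4000_0000 },
    Region { name: "PCIe PIO", base: 0x300_0000, size: 0x1_0000 },
    Region { name: "PCIe ECAM", base: 0x3000_0000, size: 0x1000_0000 },
];

/// Find the region that contains the physical address `addr`, if any.
fn region_for(addr: u64) -> Option<&'static Region> {
    VIRT_MEMMAP
        .iter()
        .find(|region| addr >= region.base && addr - region.base < region.size)
}
```

With this table, `region_for(0x1000_0000)` finds UART0, while an address in DRAM such as 0x80200000 returns `None`.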

Getting control from OpenSBI

On QEMU, we can get control from OpenSBI by linking a binary at 0x80200000, and then using -kernel to automatically load it at the right location. OpenSBI will then jump to this location with the HART ID in a0 and a pointer to the device tree in a1.

However, this does make setting paging up slightly icky, as has been a problem on other architectures. Basically, the first binary needs to be linked at a low address with bare translation, and then we need to construct page tables and enable translation, then jump to a higher address. I'm thinking we might as well do it in two stages: a Seed stage that loads the kernel and early tasks from the filesystem/network/whatever, builds the kernel page tables, and then enters the kernel and can be unloaded at a later date. The kernel can then be linked normally at its high address without faffing around with a bootstrap or anything.
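To make the "construct page tables and enable translation" step a bit more concrete, here is a sketch of the bit-twiddling involved. It assumes the Sv39 translation mode (nothing above commits to that), and the helper names are hypothetical; the bit layouts are from the RISC-V privileged specification.

```rust
/// Pages are 4KiB, so the low 12 bits of an address are the page offset.
const PAGE_SHIFT: u64 = 12;

// Permission bits in a page table entry.
const PTE_VALID: u64 = 1 << 0;
const PTE_READ: u64 = 1 << 1;
const PTE_WRITE: u64 = 1 << 2;
const PTE_EXECUTE: u64 = 1 << 3;

/// Value of the `MODE` field of `satp` that selects Sv39.
const SATP_MODE_SV39: u64 = 8;

/// Split a virtual address into its three 9-bit table indices, ordered from
/// the root table downwards: `[VPN[2], VPN[1], VPN[0]]`.
fn vpn_indices(vaddr: u64) -> [usize; 3] {
    [
        ((vaddr >> 30) & 0x1ff) as usize,
        ((vaddr >> 21) & 0x1ff) as usize,
        ((vaddr >> 12) & 0x1ff) as usize,
    ]
}

/// Build a leaf entry mapping the page at `paddr` with the given permissions.
/// The physical page number lives in the entry from bit 10 upwards.
fn leaf_pte(paddr: u64, flags: u64) -> u64 {
    ((paddr >> PAGE_SHIFT) << 10) | flags | PTE_VALID
}

/// Build the value to write to `satp` to turn on Sv39 translation with the
/// root table at `root_table_paddr` (ASID left as 0).
fn satp_sv39(root_table_paddr: u64) -> u64 {
    (SATP_MODE_SV39 << 60) | (root_table_paddr >> PAGE_SHIFT)
}
```

For instance, if the kernel were linked at 0xffffffff80000000 (an address picked purely for illustration), `vpn_indices` gives `[510, 0, 0]`, so Seed would fill in entry 510 of the root table before writing `satp` and jumping to the high address.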

The device tree

So the device tree seems to be a data structure passed to you that tells you about the hardware present / memory etc. Hopefully it's less gross than ACPI eh. Repnop has written a crate, fdt, so I think we're just going to use that.

So fdt seems to work fine, we can list memory regions etc. The only issue seems to be that memory_reservations doesn't return anything, which is kind of weird. There also seems to be a /reserved-memory node, but this suggests that this doesn't include stuff we want like which part of memory OpenSBI resides in.

This issue says Linux just assumes it shouldn't touch anything before it was loaded. I guess we could use the same approach, reserving the memory used by Seed via linker symbols, and then seeing where the loader device gets put to reserve the ramdisk. However, the issue was closed saying OpenSBI now does the reservations correctly, which would be cleaner, but doesn't stack up with what we're seeing.

Ah so actually, /reserved-memory does seem to have some of what we need. On QEMU there is one child node, called mmode_resv@80000000, which would fit with being the memory OpenSBI is in. We would still need to handle the memory we're in, and idk what happens with the loader device yet, but it's a start. Might be worth talking to repnop about whether the crate should use this node.
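For reference, as far as I understand it, memory_reservations reads the memory reservation block of the flattened device tree, which is a separate structure from the /reserved-memory node. The sketch below walks that block by hand, using the header layout from the Devicetree Specification (big-endian fields, magic at offset 0, off_mem_rsvmap at offset 16). The function names are made up and this is not how the fdt crate is implemented. An empty result just means the block holds only its terminating entry, which would be consistent with what we're seeing.

```rust
/// Magic number at the start of every flattened device tree blob.
const FDT_MAGIC: u32 = 0xd00d_feed;

/// Read a big-endian `u32` at byte offset `off`, if it is in bounds.
fn be32(blob: &[u8], off: usize) -> Option<u32> {
    let b = blob.get(off..off.checked_add(4)?)?;
    Some(u32::from_be_bytes([b[0], b[1], b[2], b[3]]))
}

/// Read a big-endian `u64` at byte offset `off`, if it is in bounds.
fn be64(blob: &[u8], off: usize) -> Option<u64> {
    let hi = be32(blob, off)? as u64;
    let lo = be32(blob, off.checked_add(4)?)? as u64;
    Some((hi << 32) | lo)
}

/// Collect the `(address, size)` pairs in the memory reservation block.
/// Returns `None` if the magic is wrong or the blob is truncated.
fn read_memory_reservations(blob: &[u8]) -> Option<Vec<(u64, u64)>> {
    if be32(blob, 0)? != FDT_MAGIC {
        return None;
    }
    // `off_mem_rsvmap` is the fifth 32-bit field of the header.
    let mut off = be32(blob, 16)? as usize;
    let mut reservations = Vec::new();
    loop {
        let address = be64(blob, off)?;
        let size = be64(blob, off.checked_add(8)?)?;
        // The block is terminated by an entry with both fields zero.
        if address == 0 && size == 0 {
            return Some(reservations);
        }
        reservations.push((address, size));
        off = off.checked_add(16)?;
    }
}
```

Anything that wants the full picture of reserved memory would need to combine this block with the children of /reserved-memory.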