Introduction

Poplar is a general-purpose operating system built around a microkernel and userspace written in Rust. Drivers and core services that would ordinarily be implemented as part of a traditional monolithic kernel are instead implemented as unprivileged userspace programs.

Poplar is not a UNIX, and does not aim for binary or source-level compatibility with existing programs. While this does slow development down significantly, it gives us the opportunity to design the interfaces we provide from scratch.

Poplar is targeted to run on small computers (think an SoC with ~1GiB of RAM and a few peripherals) and larger general-purpose machines (think a many-core x86_64 "PC"). It is specifically not designed for small embedded systems - other projects are more suited to this space. Currently, Poplar supports relatively modern x86_64 and 64-bit RISC-V (RV64GC) machines.

The Poplar Kernel

At the core of Poplar is a small Rust microkernel. The kernel's job is to multiplex access to hardware resources (such as CPU time, memory, and peripherals) between competing bits of userspace code.

Poplar is a microkernel because its drivers, pieces of code for managing the many different devices a computer may have, live in userspace, and are relatively unprivileged compared to the kernel. This provides safety benefits over a monolithic kernel because a misbehaving or malicious driver (supplied by a hardware vendor, for example) has a much more limited scope of operation. The disadvantage is that microkernels tend to be slower than monolithic kernels, due to increased overheads of communication between the kernel and userspace.

Kernel objects

The Poplar microkernel is object-based - resources managed by the kernel are represented as discrete 'objects', and are interacted with from userspace via plain integers called 'handles'. Multiple handles referring to a single object can exist, and each possesses a set of permissions that dictate how the owning task can interact with the object.

Kernel objects are used for:

  • Task creation and management (e.g. AddressSpace and Task)
  • Access to hardware resources (e.g. MemoryObject)
  • Message passing between tasks (e.g. Channel)
  • Signaling and waiting (e.g. Event)

Kernel Objects

Kernel objects represent resources that are managed by the kernel, that userspace tasks may want to interact with through system calls.

Handles

Kernel objects are referenced from a userspace task using handles. From userspace, handles are opaque 32-bit integers, and are associated within the kernel to kernel objects through a per-task mapping.

A handle of value 0 is never associated with a kernel object, and can act as a sentinel value - various system calls use this value for various meanings.

Each handle is associated with a series of permissions that dictate what the owning userspace task can do with the corresponding object. Some permissions are relevant to all types of kernel object, while others have meanings specific to the type of object the handle is associated with.

Permissions (TODO: expand on these):

  • Clone (create a new handle to the referenced kernel object)
  • Destroy (destroy the handle, destroying the object if no other handle references it)
  • Send (send the handle over a Channel to another task)
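
As a rough illustration of how userspace code might model the above, the sketch below wraps the raw 32-bit value in a newtype and treats 0 as the sentinel. The Handle and Permissions types, their method names, and the permission bit positions are assumptions made for this example, not Poplar's actual definitions.

```rust
/// A userspace view of a handle: an opaque 32-bit integer.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub struct Handle(pub u32);

impl Handle {
    /// The value 0 is never associated with a kernel object, so it can act as a sentinel.
    pub const ZERO: Handle = Handle(0);

    pub fn is_zero(self) -> bool {
        self.0 == 0
    }
}

/// Hypothetical bit assignments for the permissions listed above.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub struct Permissions(pub u32);

impl Permissions {
    pub const CLONE: Permissions = Permissions(1 << 0);
    pub const DESTROY: Permissions = Permissions(1 << 1);
    pub const SEND: Permissions = Permissions(1 << 2);

    /// Combine two sets of permissions.
    pub fn union(self, other: Permissions) -> Permissions {
        Permissions(self.0 | other.0)
    }

    /// Check that every permission in `other` is also present in `self`.
    pub fn contains(self, other: Permissions) -> bool {
        self.0 & other.0 == other.0
    }
}
```

A task would then check, for example, that a handle's permissions contain Permissions::SEND before trying to transfer it over a Channel.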

Address Space

TODO

Memory Object

TODO

Task

TODO

Channel

TODO

Event

TODO

System calls

Userspace code can interact with the kernel through system calls. Poplar's system call interface is based around 'kernel objects', and so many of the system calls are to create, destroy, or modify the state of various types of kernel object. Because of Poplar's microkernel design, many traditional system calls (e.g. open) are not present, their functionality instead being provided by userspace.

Each system call has a unique number that is used to identify it. A system call can take up to five parameters, each no larger than the system's register width. It can return a single value, also the size of a register.

Overview of system calls

Number | System call          | Description
-------|----------------------|------------------------------------------------------------------
0      | yield                | Yield to the kernel.
1      | early_log            | Log a message. Designed to be used from early processes.
3      | create_memory_object | Create a MemoryObject kernel object.
4      | map_memory_object    | Map a MemoryObject into an AddressSpace.
5      | create_channel       | Create a channel, returning handles to the two ends.
6      | send_message         | Send a message down a channel.
7      | get_message          | Receive the next message, if there is one.
8      | wait_for_message     | Yield to the kernel until a message arrives on the given channel.
12     | wait_for_event       | Yield to the kernel until an event is signalled.
13     | poll_interest        | Poll a kernel object to see if changes need to be processed.
14     | create_address_space | Create an AddressSpace kernel object.
15     | spawn_task           | Create a Task kernel object and start scheduling it.

Deprecated:

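Number | System call     | Description
-------|-----------------|-------------------------------------------------------------
2      | get_framebuffer | Get the framebuffer that the kernel has created, if it has.
11     | pci_get_info    | Get information about the PCI devices on the platform.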

Making a system call on x86_64

To make a system call on x86_64, populate these registers:

rdi    | rsi | rdx | r10 | r8 | r9
-------|-----|-----|-----|----|----
number | a   | b   | c   | d  | e

The only way in which these registers deviate from the x86_64 Sys-V ABI is that c is passed in r10 instead of rcx, because rcx is used by the syscall instruction. You can then make the system call by executing syscall. Before the kernel returns to userspace, it will put the result of the system call (if there is one) in rax. If a system call takes fewer than five parameters, the unused parameter registers will be preserved across the system call.
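
The register convention above could be wrapped from Rust with inline assembly along the following lines. This is a minimal sketch, not Poplar's actual userspace library: the raw_syscall function and the two constants are names invented for this example (the numbers come from the system call overview). It conservatively treats every parameter register as clobbered, and also marks rcx and r11 as clobbered because the syscall instruction itself overwrites them.

```rust
/// System call numbers taken from the overview table.
pub const SYSCALL_YIELD: usize = 0;
pub const SYSCALL_EARLY_LOG: usize = 1;

/// Make a raw system call with up to five parameters (hypothetical helper).
///
/// # Safety
/// The caller must pass arguments that are valid for the requested system call.
#[cfg(target_arch = "x86_64")]
pub unsafe fn raw_syscall(number: usize, a: usize, b: usize, c: usize, d: usize, e: usize) -> usize {
    let result: usize;
    unsafe {
        core::arch::asm!(
            "syscall",
            inout("rdi") number => _, // system call number
            inout("rsi") a => _,
            inout("rdx") b => _,
            inout("r10") c => _,      // r10 rather than rcx, which `syscall` overwrites
            inout("r8") d => _,
            inout("r9") e => _,
            out("rax") result,        // the kernel places the result in rax
            out("rcx") _,             // clobbered by the `syscall` instruction
            out("r11") _,             // clobbered by the `syscall` instruction
            options(nostack),
        );
    }
    result
}
```

A wrapper for yield would then be a call such as raw_syscall(SYSCALL_YIELD, 0, 0, 0, 0, 0), which only makes sense when running as a Poplar userspace task.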

Making a system call on RISC-V

TODO

Return values

Often, a system call will need to return a status, plus one or more handles. The first handle a system call needs to return (often the only handle returned) can be returned in the upper bits of the status value:

  • Bits 0..32 contain the status:
    • 0 means that the system call succeeded, and the rest of the return value is valid
    • >0 means that the system call errored. The meaning of the value is system-call specific.
  • Bits 32..64 contain the value of the first returned handle, if applicable

A return value of 0xffffffffffffffff (the maximum value of u64) is reserved for when a system call is made with a number that does not correspond to a system call. This is defined as a normal error code (as opposed to, for example, terminating the task that tried to make the system call) to provide a mechanism for tasks to detect kernel support for a system call (so they can use a fallback method on older kernels, for example).
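
The layout described above can be unpacked with a couple of shifts and masks. The following is a sketch only: the Decoded type and decode_return function are names made up for this example.

```rust
/// The possible interpretations of a raw 64-bit return value (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum Decoded {
    /// Status was 0. Bits 32..64 hold the first returned handle, if applicable.
    Success { handle: u32 },
    /// Status was non-zero. Its meaning is specific to the system call.
    Error { status: u32 },
    /// The number did not correspond to a system call.
    UnknownSyscall,
}

pub fn decode_return(raw: u64) -> Decoded {
    // The all-ones value is reserved, so it must be checked before splitting the fields.
    if raw == u64::MAX {
        return Decoded::UnknownSyscall;
    }

    let status = (raw & 0xffff_ffff) as u32; // bits 0..32
    let handle = (raw >> 32) as u32; // bits 32..64

    if status == 0 {
        Decoded::Success { handle }
    } else {
        Decoded::Error { status }
    }
}
```

Checking for the reserved value first lets a task distinguish "this kernel does not support the call" from an ordinary error and fall back accordingly.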

Syscall: yield

Yield to the kernel. Generally called when a userspace task has no more useful work to perform.

  • Parameters:
    • None
  • Returns:
    • Always 0

Syscall: early_log

Output a line to the kernel log. This is generally used by tasks early in the boot process, before reliable userspace logging is running, but could also be used by small userspaces for diagnostic logging. The output to be logged must be provided as a formatted string encoded as UTF-8.

  • Parameters:
    • a: the length of the string to log, in bytes. Max of 4096 bytes.
    • b: the address of the string to log.
  • Returns:
    • 0: success
    • 1: the length supplied is too large
    • 2: the supplied string is not valid UTF-8

Syscall: create_memory_object

Create a MemoryObject kernel object. Userspace can only create "blank" memory objects, backed by free, conventional physical memory.

  • Parameters:
    • a: the length of the memory object, in bytes
    • b: flags:
      • Bit 0: set if the memory should be writable
      • Bit 1: set if the memory should be executable
    • c: an address to which the kernel will write the physical address to which the memory object was allocated. Not written if null.
  • Returns:
    • 0: success
    • 1: the given set of flags is invalid
    • 2: a memory area of the requested size could not be allocated
    • 3: the address in c is not null, but is not valid
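
For illustration, the flags parameter and the status codes above could be modelled as follows. The constant, function, and enum names are assumptions for this sketch. Only the bit positions and status values come from the description above.

```rust
/// Bit 0: the memory should be writable.
pub const FLAG_WRITABLE: usize = 1 << 0;
/// Bit 1: the memory should be executable.
pub const FLAG_EXECUTABLE: usize = 1 << 1;

/// Build the value passed in parameter `b`.
pub fn memory_object_flags(writable: bool, executable: bool) -> usize {
    let mut flags = 0;
    if writable {
        flags |= FLAG_WRITABLE;
    }
    if executable {
        flags |= FLAG_EXECUTABLE;
    }
    flags
}

/// Hypothetical error type mirroring the documented status codes.
#[derive(Debug, PartialEq, Eq)]
pub enum CreateMemoryObjectError {
    InvalidFlags,
    AllocationFailed,
    InvalidPhysicalAddressPointer,
    /// A status this sketch does not know about.
    Other(usize),
}

/// Turn the status returned by the system call into a `Result`.
pub fn interpret_status(status: usize) -> Result<(), CreateMemoryObjectError> {
    match status {
        0 => Ok(()),
        1 => Err(CreateMemoryObjectError::InvalidFlags),
        2 => Err(CreateMemoryObjectError::AllocationFailed),
        3 => Err(CreateMemoryObjectError::InvalidPhysicalAddressPointer),
        other => Err(CreateMemoryObjectError::Other(other)),
    }
}
```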

Syscall: map_memory_object

Map a MemoryObject into an AddressSpace.

  • Parameters:
    • a: the handle of the MemoryObject
    • b: the handle of the AddressSpace. A zero handle indicates that the memory object should be mapped into the task's address space.
    • c: the virtual address to map the memory object at. Null indicates that the kernel should attempt to find a region in the address space large enough to hold the memory object and map it there.
    • d: a pointer to which the virtual address the memory object has been mapped to is written, if c is null. If d is null, this address is not written.
  • Returns:
    • 0: success
    • 1: the handle to the MemoryObject is invalid or does not point to a MemoryObject
    • 2: the handle to the AddressSpace is invalid or does not point to an AddressSpace
    • 3: the region of the address space that would be mapped is already occupied
    • 4: the supplied pointer in d is invalid

Syscall: create_channel

Create a new channel, returning handles to two Channel objects, each representing an end of the channel. Generally, one of these handles is sent to another task to facilitate IPC.

  • Parameters:
    • a: the address to write the second handle to (only one can be returned in the status)
  • Returns:
    • Status in bits 0..32:
      • 0: success
      • 1: the virtual address to write the second handle to is invalid
    • Handle to first end in bits 32..64

TODO: we could pack both handles into the return value by using a sentinel 0 handle to mark that the other handle is actually an error?

Syscall: send_message

Send a message, consisting of a number of bytes and optionally a number of handles, down a Channel. All the handles are removed from the sending Task and added to the receiving Task.

A maximum of 4 handles can be transferred by each message. The maximum number of bytes is currently 4096.

  • Parameters:
    • a: the handle to the Channel from which the message is to be sent
    • b: a pointer to the array of bytes to send
    • c: the length of the message, in bytes
    • d: a pointer to the array of handle entries to transfer. If the message does not transfer any handles, this should be 0x0
    • e: the number of handles to transfer
  • Returns:
    • 0 if the system call succeeded and the message was sent
    • 1 if the Channel handle is invalid
    • 2 if the Channel handle does not point to a Channel
    • 3 if the Channel handle does not have the correct rights to send messages
    • 4 if one or more of the handles to transfer is invalid
    • 5 if any of the handles to transfer do not have the correct rights
    • 6 if the pointer to the message bytes was not valid
    • 7 if the message's byte array is too large
    • 8 if the pointer to the handles array was not valid
    • 9 if the handles array is too large
    • 10 if the other end of the Channel has been disconnected

Syscall: get_message

Receive a message from a Channel, if one is waiting to be received.

A maximum of 4 handles can be transferred by each message. The maximum number of bytes is currently 4096.

  • Parameters:
    • a: the handle to the Channel end that is receiving the message.
    • b: a pointer to the array of bytes to write the message to
    • c: the maximum number of bytes the kernel should attempt to write to the buffer at b
    • d: a pointer to the array of handle entries to transfer. This can be 0x0 if the receiver does not expect to receive any handles.
    • e: the maximum number of handles the kernel should attempt to write to the array at d.
  • Returns:
    • Status in bits 0..16:
      • 0 if the message was received successfully. The rest of the return value is valid.
      • 1 if the Channel handle is invalid.
      • 2 if the Channel handle does not point to a Channel.
      • 3 if there was no message to receive.
      • 4 if the address of the bytes buffer is invalid.
      • 5 if the bytes buffer is too small to contain the message.
      • 6 if the address of the handles buffer is invalid, or if 0x0 was passed and the message does contain handles.
      • 7 if the handles buffer is too small to contain the handles transferred with the message.
    • The length of the message in bits 16..32
      • This is only valid for statuses of 0
    • The number of handles transferred in bits 32..48
      • This is only valid for statuses of 0
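
Unlike most system calls, get_message packs three 16-bit fields into its return value. A minimal sketch of unpacking it is shown below. The Received type and decode_get_message function are hypothetical names used only for this example.

```rust
/// The sizes reported by a successful `get_message` (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub struct Received {
    /// Length of the message in bytes (bits 16..32).
    pub num_bytes: u16,
    /// Number of handles transferred (bits 32..48).
    pub num_handles: u16,
}

/// Split the raw return value into its fields. On failure, the non-zero status is returned.
pub fn decode_get_message(raw: u64) -> Result<Received, u16> {
    let status = (raw & 0xffff) as u16; // bits 0..16
    if status != 0 {
        // The length and handle count are only valid for a status of 0.
        return Err(status);
    }

    Ok(Received {
        num_bytes: ((raw >> 16) & 0xffff) as u16,
        num_handles: ((raw >> 32) & 0xffff) as u16,
    })
}
```

Both limits mentioned above (4096 bytes and 4 handles) fit comfortably in 16 bits.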

Syscall: wait_for_message

TODO

Syscall: wait_for_event

TODO

Syscall: poll_interest

TODO

Syscall: create_address_space

TODO

Syscall: spawn_task

TODO

Platforms

A platform is a build target for the kernel. In some cases, there is only one platform for an entire architecture because the hardware is relatively standardized (e.g. x86_64). Other times, hardware is different enough between platforms that it's easier to treat them as different targets (e.g. a headless ARM server that boots using UEFI, versus a Raspberry Pi).

All supported platforms are enumerated in the table below - some have their own sections with more details, while others are just described below. The platform you want to build for is specified in your Poplar.toml configuration file, or with the -p/--platform flag to xtask. Some platforms also have custom xtask commands to, for example, flash a device with a built image.

Platform name | Arch   | Description
--------------|--------|--------------------------------------
x64           | x86_64 | Modern x86_64 platform.
rv64_virt     | RV64   | A virtual RISC-V QEMU platform.
mq_pro        | RV64   | The MangoPi MQ-Pro RISC-V platform.

Platform: x64

The vast majority of x86_64 hardware is pretty similar, and so is treated as a single platform. It uses the hal_x86_64 HAL. We assume that the platform:

  • Boots using UEFI (using seed_uefi)
  • Supports the APIC
  • Supports the xsave instruction

Platform: rv64_virt

This is a virtual RISC-V platform emulated by qemu-system-riscv64's virt machine. It features:

  • A customizable number of emulated RV64 HARTs
  • Booting via QEMU's -kernel option and OpenSBI
  • A Virtio block device with attached GPT 'disk'
  • Support for USB devices via EHCI

Devices such as the EHCI USB controller are connected to a PCIe bus, and so we use the Advanced Interrupt Architecture with MSIs to avoid the complexity of shared pin-based PCI interrupts. This is done by passing the aia=aplic-imsic machine option to QEMU.

MangoPi MQ-Pro

The MangoPi MQ-Pro is a small RISC-V development board, featuring an Allwinner D1 SoC with a single RV64 core, and either 512MiB or 1GiB of memory. Most public information about the D1 itself can be found on the Sunxi wiki.

You're probably going to want to solder a GPIO header to the board and get a USB-UART adaptor as well. The xtask contains a small serial utility for logging the output from the board, but you can also use an external program such as minicom. Adding a male-to-female jumper wire to a ground pin is also useful - you can touch it to the RST pad on the back of the board to reset it (allowing you to FEL new code onto it).

Boot procedure

The D1 can be booted from an SD card or flash, or, usefully for development, using Allwinner's FEL protocol, which allows data to be loaded into memory and code executed using a small USB stack. This procedure is best visualised with a diagram:

[Figure: Diagram of the D1's boot procedure]

The initial part of this process is done by code loaded from the BROM (Boot ROM) - it contains the FEL stack, as well as enough code to load the first-stage bootloader from either an SD card or SPI flash. Data loaded by the FEL stack, or from the bootable media, is loaded into SRAM. The DRAM has to be brought up, either by the first-stage bootloader, or by a FEL payload.

Booting via FEL

To boot with FEL, the MQ-Pro needs to be plugged into the development machine via the USB-C OTG port (not the host port), and then booted into FEL mode. The easiest way to do this is to just remove the SD card - as long as the flash hasn't been written to, this should boot into FEL. It should then enumerate as a USB device with ID 1f3a:efe8 on the host.

You then need something that can talk the FEL protocol on the host. We're currently using xfel, but this may be replaced/augmented with a more capable solution in the future. xfel should be relatively easy to compile and install on a Linux system, and should install some udev rules that allow it to be used by normal users. xfel should then automatically detect a connected device in FEL mode.

The first step is to initialize the DRAM - xfel ddr d1 does this by uploading and running a small payload with the correct procedures. After this, further code can be loaded directly into DRAM - we load OpenSBI and Seed.

OpenSBI's FW_JUMP firmware is loaded at the start of RAM, 0x4000_0000. It provides the SBI interface, moves from M-mode to S-mode, and then jumps into Seed, which is loaded 512KiB after it at 0x4008_0000 (this address is supplied to OpenSBI at build-time). We also bundle a device tree for the platform into OpenSBI, which it uses to bootstrap the platform, and then passes onwards.
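
The load addresses above follow directly from the 512KiB gap left for OpenSBI. The arithmetic can be written out as constants. The constant and function names here are illustrative only, not taken from the Poplar source.

```rust
/// Start of DRAM on the D1, where OpenSBI's FW_JUMP firmware is loaded.
pub const OPENSBI_BASE: u64 = 0x4000_0000;
/// Space reserved for OpenSBI before Seed begins.
pub const SEED_OFFSET: u64 = 512 * 1024;
/// Address that OpenSBI jumps to once it has finished.
pub const SEED_BASE: u64 = OPENSBI_BASE + SEED_OFFSET;

/// Check that an OpenSBI image of the given size does not overlap Seed.
pub fn opensbi_fits(image_size: u64) -> bool {
    image_size <= SEED_OFFSET
}
```

With these values, SEED_BASE works out to 0x4008_0000, matching the address given to OpenSBI at build-time.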

TODO: we should investigate customising the driver list to maybe get OpenSBI under 256KiB (it's just over).

Seed

Seed is Poplar's bootloader ± pre-kernel. What it is required to do varies by platform, but generally it is responsible for bringing up the system, loading the kernel and initial tasks into memory, and preparing the environment for executing the kernel.

x86_64

On x86_64, Seed is a UEFI executable that utilises boot services to load the kernel and initial tasks. The Seed executable, the kernel, and other files are all held in the EFI System Partition (ESP) - a FAT filesystem present in all UEFI-booted systems.

riscv

On RISC-V, Seed is more of a pre-kernel than a traditional bootloader. It is booted into by the system firmware, and then has its own set of drivers to load the kernel and other files from the correct filesystem, or elsewhere.

The boot mechanism has not yet been fully designed for RISC-V, and will also depend heavily on the hardware target, as booting different platforms is much less standardised than on x86_64.

Debugging the kernel

Kernels can be difficult to debug - this page tries to collect useful techniques for debugging kernels in general, and also any Poplar specific things that might be useful.

Poplar specific: the breakpoint exception

The breakpoint exception is useful for inspecting the contents of registers at specific points, such as in sections of assembly (where it's inconvenient to call into Rust, or to use a debugger because getting global_asm! to play nicely with GDB is a pain).

Simply use the int3 instruction:

...

mov rsp, [gs:0x10]
int3  // Is my user stack pointer correct?
sysretq

Building OVMF

Building a debug build of OVMF isn't too hard (from the base of the edk2 repo):

OvmfPkg/build.sh -a X64

By default, debug builds of OVMF will output debugging information on the ISA debugcon, which is actually probably nicer for our purposes than most builds, which pass DEBUG_ON_SERIAL_PORT during the build. To log the output to a file, you can pass -debugcon file:ovmf_debug.log -global isa-debugcon.iobase=0x402 to QEMU.

Message Passing

Poplar has a kernel object called a Channel for providing first-class message passing support to userspace. Channels move packets, called "messages", which contain a stream of bytes, and optionally one or more handles that are transferred from the sending task to the receiving task.

Ptah

Channels can move arbitrary bytes, but Poplar also includes a layer on top of Channels called Ptah, which consists of a data model and wire format suitable for encoding data which can be serialized and deserialized from any sensible language without too much difficulty.

Ptah is used for IPC between tasks running in userspace, and also for more complex communication between the kernel and userspace.

Ptah is heavily inspired by Serde, and the first implementation of Ptah was actually a Serde data format. Unfortunately, it made properly handling Poplar handles very difficult - when a handle is sent over a channel, it needs to be put into a separate array, and the in-line data replaced by an index into that array. When the message travels over a task boundary, the kernel examines and replaces each handle in this array with a handle to the same kernel object in the new task. This effectively means we need to add a new Handle type to our data model, which is not easily possible with Serde (and would make it incompatible with standard Serde serializers anyway).
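
The handle-splitting scheme described above can be sketched as a small message builder that keeps handles out-of-band and writes only an index into the byte stream. This is not the real Ptah implementation: the MessageBuilder type is invented for this example, and encoding the index as a single byte is an assumption (the real wire format may differ).

```rust
/// Collects the two halves of a message: in-line bytes and out-of-band handles.
pub struct MessageBuilder {
    pub bytes: Vec<u8>,
    pub handles: Vec<u32>,
}

/// Returned when a message would carry more handles than a Channel allows.
#[derive(Debug, PartialEq, Eq)]
pub struct TooManyHandles;

impl MessageBuilder {
    /// A message can transfer at most 4 handles.
    pub const MAX_HANDLES: usize = 4;

    pub fn new() -> MessageBuilder {
        MessageBuilder { bytes: Vec::new(), handles: Vec::new() }
    }

    /// Append ordinary data to the byte stream.
    pub fn push_bytes(&mut self, data: &[u8]) {
        self.bytes.extend_from_slice(data);
    }

    /// Move a handle into the separate array, leaving its index in the byte stream.
    pub fn push_handle(&mut self, handle: u32) -> Result<(), TooManyHandles> {
        if self.handles.len() >= Self::MAX_HANDLES {
            return Err(TooManyHandles);
        }
        let index = self.handles.len() as u8; // assumed one-byte index
        self.handles.push(handle);
        self.bytes.push(index);
        Ok(())
    }
}
```

Because the byte stream only ever contains indices, the kernel can rewrite the handle array for the receiving task without needing to understand the rest of the encoded data.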

The Ptah Data Model

The Ptah data model maps pretty well to the Rust type system, and relatively closely to the Serde data model. Key differences are some stronger guarantees about the encoding of types such as enums (the data model only needs to fit a single wire format, and so can afford to be less flexible than Serde's), and the lack of a few types - unit-based types, and the statically-sized version of seq and map - tuple and struct. Ptah is not a self-describing format (i.e. the type you're trying to (de)serialize must be known at both ends), so the elements of structs and tuples can simply be serialized in the order they appear, and then deserialized in order at the other end.

  • Primitive types
    • bool
    • u8, u16, u32, u64, u128
    • i8, i16, i32, i64, i128
    • f32, f64
    • char
  • string
    • Encoded as a seq of u8, but with the additional requirement that it is valid UTF-8
    • Not null terminated, as seq includes explicit length
  • option
    • Encoded in the same way as an enum, but separately for the benefit of languages without proper enums
    • Either None or Some({value})
  • enum
    • Include a tag, and optionally some data
    • Represent a Rust enum, or a tagged union in languages without proper enums
    • The data is encoded separately to the tag, and can be of any other Ptah type:
      • Rust tuple variants (e.g. E::A(u8, u32)) are represented by tuple
      • Rust struct variants (e.g. E::B { foo: u8, bar: u32 }) are represented by struct
  • seq
    • A variable-length sequence of values, mapping to types such as arrays and Vec<T>.
  • map
    • A variable-length series of key-value pairings, mapping to collections like BTreeMap<K, V>.
  • handle
    • A marker in the data stream that a handle to a kernel object is being moved across the channel. The handle itself is encoded out-of-band.
    • This allows the kernel, or something else handling Ptah-encoded data, to process the handle
    • Handles being first-class in the data model is why Poplar can't readily use something like Serde

The Ptah Wire Format

The wire format describes how messages can be encoded into a stream of bytes suitable for transmission over a channel, or over another transport layer such as the network or a serial port.

Primitives are transmitted as little-endian and packed to their natural alignment. The following primitive types are recognised:

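Name                    | Size (bytes)   | Description
------------------------|----------------|---------------------------------------------
bool                    | 1              | A boolean value
u8, u16, u32, u64, u128 | 1, 2, 4, 8, 16 | An unsigned integer
i8, i16, i32, i64, i128 | 1, 2, 4, 8, 16 | A signed integer
f32, f64                | 4, 8           | Single / double-precision IEEE-754 FP values
char                    | 4              | A single UTF-8 Unicode scalar value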
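
As a sketch of the little-endian rule for primitives, the trait below appends each value's bytes to an output buffer. The PtahPrimitive trait is a name invented for this example and is not the real Ptah API. Encoding bool as 0 or 1 is an assumption. The sketch does not insert any alignment padding, and does not cover char.

```rust
/// A value that can be written to the wire as a fixed-size primitive (hypothetical trait).
pub trait PtahPrimitive {
    fn encode(&self, out: &mut Vec<u8>);
}

/// Integers and floats are written as their little-endian byte representation.
macro_rules! impl_le_primitive {
    ($($ty:ty),*) => {
        $(
            impl PtahPrimitive for $ty {
                fn encode(&self, out: &mut Vec<u8>) {
                    out.extend_from_slice(&self.to_le_bytes());
                }
            }
        )*
    };
}

impl_le_primitive!(u8, u16, u32, u64, u128, i8, i16, i32, i64, i128, f32, f64);

impl PtahPrimitive for bool {
    fn encode(&self, out: &mut Vec<u8>) {
        // Assumption: a boolean occupies one byte, 1 for true and 0 for false.
        out.push(if *self { 1 } else { 0 });
    }
}
```

For example, encoding 0x1234u16 yields the bytes 0x34 then 0x12, and any 128-bit integer occupies 16 bytes.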

TODO: rest of the wire format

Poplar's userspace

Poplar supports running programs in userspace on supporting architectures. This offers increased protection and separation compared to running code in kernelspace - as a microkernel, Poplar tries to run as much code in userspace as possible.

Building a program for Poplar's userspace

Currently, the only officially supported language for writing userspace programs is Rust.

Target

Poplar provides custom rustc target files for userspace programs. These are found in the user/{arch}_poplar.toml files.

Standard library

Poplar provides a Rust crate, called std, which replaces Rust's standard library. We've done this for a few reasons:

  • We originally had targets and a std port in a fork of rustc. This proved difficult to maintain and required users to build a custom Rust fork and add it as a rustup toolchain. This is a high barrier of entry for anyone wanting to try Poplar out.
  • Poplar's ideal standard library probably won't end up looking very similar to other platforms', as there are significant ideological differences in how programs should interact with the OS. This is unfortunate from a porting point of view, but does allow us to design the platform interface from the ground up.

The name of the crate is slightly unfortunate, but is required, as rustc uses the name of the crate to decide where to import the prelude from. This significantly increases the ergonomics we can provide, so is worth the tradeoff.

The std crate does a few important things that are worth understanding to reduce the 'magic' of Poplar's userspace:

  • It provides a linker script - the linker script for the correct target is shipped as part of the crate, and then the build script copies it into the Cargo OUT_DIR. It also passes a directive to rustc such that you can simply pass -Tlink.ld to link with the correct script. This is, for example, done using RUSTFLAGS by Poplar's xtask, but you can also pass it manually or with another method, depending on your build system.
  • It provides a prelude that should be very similar to the official std prelude
  • It provides an entry point to the executable that does required initialisation before passing control to Rust's main function

Capabilities

Capabilities describe what a task is allowed to do, and are encoded in its image. This allows users to audit the permissions of the tasks they run at a much higher granularity than user-based permissions, and also allows us to move parts of the kernel into discrete userspace tasks by creating specialised capabilities that grant access to sensitive resources (such as the raw framebuffer) to only select tasks.

Encoding capabilities in the ELF image

Capabilities are encoded in an entry of a PT_NOTE segment of the ELF image of a task. This entry will have an owner (sometimes referred to in documentation as the 'name') of POPLAR and a type of 0. The descriptor will be an encoding of the capabilities as described by the 'Format' section. The descriptor must be padded such that the next descriptor is 4-byte aligned, and so a value of 0x00 is reserved to be used as padding.

Initial images (tasks loaded by the bootloader before filesystem drivers are working) are limited to a capabilities encoding of 32 bytes (given the variable-length encoding, this does not equate to a fixed maximum number of capabilities).
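
To make the layout concrete, the sketch below builds the bytes of a single note entry using the standard ELF note header (name size, descriptor size, and type as 32-bit words, followed by the name and descriptor, each padded to 4 bytes). The build_capability_note function is hypothetical. It assumes little-endian encoding, and it assumes the 0x00 padding bytes are counted as part of the descriptor.

```rust
/// Build one PT_NOTE entry carrying the given capability encoding (hypothetical helper).
pub fn build_capability_note(capabilities: &[u8]) -> Vec<u8> {
    const OWNER: &[u8] = b"POPLAR\0"; // the owner includes its NUL terminator
    const NOTE_TYPE: u32 = 0;

    // Pad the descriptor with the reserved 0x00 value up to a 4-byte boundary.
    let mut descriptor = capabilities.to_vec();
    while descriptor.len() % 4 != 0 {
        descriptor.push(0x00);
    }

    let mut note = Vec::new();
    note.extend_from_slice(&(OWNER.len() as u32).to_le_bytes());
    note.extend_from_slice(&(descriptor.len() as u32).to_le_bytes());
    note.extend_from_slice(&NOTE_TYPE.to_le_bytes());
    note.extend_from_slice(OWNER);
    // The owner is also padded so that the descriptor starts 4-byte aligned.
    while note.len() % 4 != 0 {
        note.push(0x00);
    }
    note.extend_from_slice(&descriptor);
    note
}
```

In practice the note would more likely be emitted by the toolchain (for example from a linker section) than built at runtime. The function form is used here only to show the byte layout.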

Format

The capabilities format is variable-length - simple capabilities can be encoded as a single byte, while more complex / specific ones may need multiple bytes of prefix, and can also encode fixed-length data.

Overview of capabilities

This is an overview of all the capabilities the kernel supports:

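First byte | Next byte(s) | Data | Arch specific? | Description
-----------|--------------|------|----------------|--------------------------------------------------------------------
0x00       | -            | -    | -              | No meaning - used to pad descriptor to required length (see above)
0x01       | -            | -    | No             | GetFramebuffer
0x02       | -            | -    | No             | EarlyLogging
0x03       | -            | -    | No             | ServiceProvider
0x04       | -            | -    | No             | ServiceUser
0x05       | -            | -    | No             | PciBusDriver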
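
Since every capability currently listed is a single byte, decoding a descriptor can be sketched as a simple loop that skips padding. The Capability enum and parse_capabilities function are illustrative names, not the kernel's actual types. A real decoder would also need to handle multi-byte prefixes and attached data once such capabilities exist.

```rust
/// The capabilities from the overview table (hypothetical enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Capability {
    GetFramebuffer,
    EarlyLogging,
    ServiceProvider,
    ServiceUser,
    PciBusDriver,
}

/// Decode a descriptor, returning the first unrecognised byte as the error.
pub fn parse_capabilities(descriptor: &[u8]) -> Result<Vec<Capability>, u8> {
    let mut capabilities = Vec::new();
    for &byte in descriptor {
        match byte {
            0x00 => continue, // padding carries no meaning
            0x01 => capabilities.push(Capability::GetFramebuffer),
            0x02 => capabilities.push(Capability::EarlyLogging),
            0x03 => capabilities.push(Capability::ServiceProvider),
            0x04 => capabilities.push(Capability::ServiceUser),
            0x05 => capabilities.push(Capability::PciBusDriver),
            other => return Err(other),
        }
    }
    Ok(capabilities)
}
```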

Platform Bus

The Platform Bus is a userspace service designed to be a core part of most Poplar systems. It manages an abstract "bus" of devices that can be added by userspace bus drivers and consumed by userspace device drivers. Drivers talk to the Platform Bus via channels obtained by subscribing to the Platform Bus's services, platform_bus.bus_driver and platform_bus.device_driver.

Platform Bus is entirely a userspace concept, and so systems can be built around the Poplar kernel without it. However, many drivers and applications will expect the Platform Bus service to be present, and systems not using it will have to handle many low-level systems, such as PCI device enumeration, themselves, and so it is expected that the vast majority of systems would use Platform Bus as a fundamental building block of their userspace.

Device representation on the Platform Bus

Devices on the Platform Bus can be quite abstract, or represent literal devices that are part of the platform, or plugged in as peripherals. Examples include a framebuffer "device" provided by a driver for a graphics-capable device, a power-management chip built into the platform, and USB devices, respectively.

Devices are described by a series of properties, which are typed pieces of data associated with a label. Device properties are used to identify devices, and are given to every device driver that claims it may be able to drive a device. Handoff properties are only transferred to a driver once it has been selected to drive a device, and can contain handles to kernel objects needed to drive the device. These handles are transferred to the task implementing the device driver, which is why they cannot be sent arbitrarily to drivers to query support.

Device registration

TODO

Device hand-off to device driver

TODO

Standard devices

The Platform Bus library defines expected properties and behaviour for a number of standard device classes, in an attempt to increase compatibility across drivers and device users. Additional properties may be added as necessary for an individual device.

PCI devices

Platform Bus will use information provided by the kernel to create devices for each enumerated PCI device. Standard properties:

Property        | Type         | Description
----------------|--------------|---------------------------------------------------------------------------------
pci.vendor_id   | Integer      | Vendor ID of the PCI device
pci.device_id   | Integer      | Device ID of the PCI device
pci.class       | Integer      | Class of the PCI device
pci.sub_class   | Integer      | Sub-class of the PCI device
pci.interface   | Integer      | Interface of the PCI device
pci.interrupt   | Event        | If configured, an Event that is triggered when the PCI device gets an IRQ
pci.barN.size   | Integer      | N is a number from 0-6. The size of the given BAR, if present.
pci.barN.handle | MemoryObject | N is a number from 0-6. A memory object mapped to the given BAR, if present.

Generally, specific devices (e.g. a specific GPU) can be detected with a combination of the vendor_id and device_id properties, while a type of device can be identified via the class, sub_class, and interface properties. Drivers should filter against the appropriate properties depending on the devices they can drive.
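
Filtering on device properties might look something like the sketch below. The Property and Filter types are invented for this example and are not the Platform Bus library's real API. Only the property names come from the table above, and the numeric values are arbitrary.

```rust
use std::collections::BTreeMap;

/// A typed device property. Only the types needed for this example are modelled.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Property {
    Integer(u64),
    Bytes(Vec<u8>),
}

/// A hypothetical filter that a driver could register to describe devices it can drive.
pub enum Filter {
    /// The named property must be present and equal to the given value.
    Matches(&'static str, Property),
    /// Every inner filter must match.
    All(Vec<Filter>),
}

impl Filter {
    pub fn matches(&self, properties: &BTreeMap<String, Property>) -> bool {
        match self {
            Filter::Matches(name, expected) => properties.get(*name) == Some(expected),
            Filter::All(filters) => filters.iter().all(|filter| filter.matches(properties)),
        }
    }
}
```

A driver for one specific device would combine pci.vendor_id and pci.device_id in a Filter::All, while a generic class driver would filter on pci.class, pci.sub_class, and pci.interface instead.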

USB devices

USB devices may be added to the Platform Bus by a USB Host Controller driver, and can be consumed by a wide array of drivers. Standard properties:

Property       | Type    | Description
---------------|---------|--------------------------------------------------------------------------
usb.vendor_id  | Integer | Vendor ID of the USB device
usb.product_id | Integer | Product ID of the USB device
usb.class      | Integer | Class of the USB device
usb.sub_class  | Integer | Sub-class of the USB device
usb.protocol   | Integer | Protocol of the USB device
usb.config0    | Bytes   | Byte-stream of the first configuration descriptor of the device
usb.channel    | Channel | Control channel to configure and control the device via the bus driver

HID devices

TODO

Journal

This is just a place I put notes that I make during development.

Building a rustc target for Poplar

We want a target in rustc for building userspace programs for Poplar. It would be especially cool to get it merged as an upstream Tier-3 target. This documents my progress, mainly as a reference for me to remember how it all works.

How do I actually build and use rustc?

A useful baseline invocation for normal use is:

./x.py build -i library/std

The easiest way to test the built rustc is to create a rustup toolchain (from the root of the Rust repo):

rustup toolchain link poplar build/{host triple}/stage1     # If you built a stage-1 compiler (default with invocation above)
rustup toolchain link poplar build/{host triple}/stage2     # If you built a stage-2 compiler

It's easiest to call your toolchain poplar, as this is the name we use in the Makefiles for now.

You can then use this toolchain from Cargo anywhere on the system with:

cargo +poplar build     # Or whatever command you need

Using a custom LLVM

  • Fork Rust's llvm-project
  • cd src/llvm-project
  • git remote add my_fork {url to your custom LLVM's repo}
  • git fetch my_fork
  • git checkout my_fork/{the correct branch}
  • cd ..
  • git add llvm-project
  • git commit -m "Move to custom LLVM"

Things to change in config.toml

This is as of 2020-09-29 - you need to remember to keep config.toml up-to-date (it's not checked in upstream), as it can cause confusing errors when it's out-of-date.

  • download-ci-llvm = true under [llvm]. This makes the build much faster, since we don't need a custom LLVM.
  • assertions = true under [llvm]
  • incremental = true under [rust]
  • lld = true under [rust]. Without this, the toolchain can't find rust-lld when linking.
  • llvm-tools = true under [rust]. This probably isn't needed, I just chucked it in in case rust-lld needs it.
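
Put together, the relevant parts of config.toml would look roughly like this. Only the keys listed above are shown, and everything else is assumed to be left at its default:

```toml
[llvm]
# Use a prebuilt LLVM from CI instead of building one locally.
download-ci-llvm = true
assertions = true

[rust]
incremental = true
# Needed so the toolchain can find rust-lld when linking.
lld = true
# Probably not required.
llvm-tools = true
```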

Adding the target

I used a slightly different layout to most targets (which have a base, which creates a TargetOptions, and then a target that modifies and uses those options).

  • Poplar targets generally need a custom linker script. I added one at compiler/rustc_target/src/spec/x86_64_poplar.ld.
  • Make a module for the target (I called mine compiler/rustc_target/src/spec/x86_64_poplar.rs). Copy from an existing one. Instead of a separate poplar_base.rs to create the TargetOptions, we do it in the target itself. We include_str! the linker script in here, so it's distributed as part of the rustc binary.
  • Add the target in the supported_targets! macro in compiler/rustc_target/src/spec/mod.rs.

Adding the target to LLVM

I don't really know my way around the LLVM code base, so this was fairly cobbled together:

  • In llvm/include/llvm/ADT/Triple.h, add a variant for the OS in the OSType enum. I called it Poplar. Don't make it the last entry, to avoid having to change the LastOSType variant.
  • In llvm/lib/Support/Triple.cpp, in the function Triple::getOSTypeName, add the OS. I added case Poplar: return "poplar";.
  • In the same file, in the parseOS function, add the OS. I added .StartsWith("poplar", Triple::Poplar).
  • This file also contains a function, getDefaultFormat, that gives the default format for a platform. The default is ELF, so no changes were needed for Poplar, but they might be for another OS.

TIP: When you make a change in the llvm-project submodule, you will need to commit the change and then update the submodule in the parent repo. Otherwise, the bootstrap script will check out the old version (without your changes) and build an entire compiler without the changes you are trying to test.

NOTE: to avoid people having to switch to our llvm-project fork, we don't actually use our LLVM target from rustc (yet). I'm not sure why you need per-OS targets in LLVM, as it doesn't even seem to let us do any of the things we wanted to (this totally might just be me not knowing how LLVM targets work).

Notes

  • We needed to change the entry point to _start, or it silently just doesn't emit any sections in the final image.
  • By default, it throws away our .caps sections. We need a way to emit them regardless - this is done by manually creating the program header and specifying that they should be kept with KEEP. There are two possible solutions that I can see: make rustc emit a linker script, or try and introduce these ideas into llvm/lld with our target (I'm not even sure this is possible).
  • It looks like lld has no OS-specific code at all, and the only place that specifically-kept sections are added is in the linker script parser. Looks like we might have to actually create a file-based linker script (does literally no one else need to pass a linker script by command line??).

USB

USB has a host that sends requests to devices (devices only respond when asked something). Some devices are dual role devices (DRD) (previously called On-The-Go (OTG) devices), and can dynamically negotiate whether they're the host or the device.

Each device can have one or more interfaces, which each have one or more endpoints. Each endpoint has a hardcoded direction (host-to-device or device-to-host). There are a few types of endpoint (the type is decided during interface configuration):

  • Control endpoints are for configuration and control requests
  • Bulk endpoints are for bulk transfers
  • Isochronous endpoints are for periodic transfers with a reserved bandwidth
  • Interrupt endpoints are for small, latency-sensitive transfers, which the host polls for at a fixed interval (devices can't actually interrupt the host)

The interfaces and endpoints a device has are described by descriptors reported by the device during configuration.
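
To make the endpoint properties concrete, here is a minimal sketch of decoding a standard 7-byte endpoint descriptor. The field layout follows the USB specification; the type and function names are made up for this example and are not taken from Poplar's USB code.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum EndpointType {
    Control,
    Isochronous,
    Bulk,
    Interrupt,
}

#[derive(Debug, PartialEq, Clone, Copy)]
pub enum Direction {
    /// Host-to-device
    Out,
    /// Device-to-host
    In,
}

#[derive(Debug, PartialEq)]
pub struct Endpoint {
    pub number: u8,
    pub direction: Direction,
    pub typ: EndpointType,
    pub max_packet_size: u16,
    pub interval: u8,
}

/// Decode an endpoint descriptor (`bDescriptorType` = 5). Returns `None` if
/// the bytes are too short or are not an endpoint descriptor.
pub fn parse_endpoint_descriptor(bytes: &[u8]) -> Option<Endpoint> {
    if bytes.len() < 7 || bytes[0] < 7 || bytes[1] != 0x05 {
        return None;
    }
    let address = bytes[2]; // bEndpointAddress
    let attributes = bytes[3]; // bmAttributes
    Some(Endpoint {
        // Bits 3:0 are the endpoint number, bit 7 is the direction.
        number: address & 0x0f,
        direction: if address & 0x80 != 0 { Direction::In } else { Direction::Out },
        // Bits 1:0 select the transfer type.
        typ: match attributes & 0b11 {
            0 => EndpointType::Control,
            1 => EndpointType::Isochronous,
            2 => EndpointType::Bulk,
            _ => EndpointType::Interrupt,
        },
        // wMaxPacketSize is little-endian; bits 10:0 hold the packet size.
        max_packet_size: u16::from_le_bytes([bytes[4], bytes[5]]) & 0x07ff,
        interval: bytes[6],
    })
}
```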

Every device has a special endpoint called ep0. It's an in+out control endpoint, and is used to configure the other endpoints.
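
Requests on ep0 start with an 8-byte SETUP packet. As a sketch, this is how the standard GET_DESCRIPTOR request could be built. The field layout and request codes are from the USB specification, while the `SetupPacket` type itself is a hypothetical helper for this example.

```rust
/// The 8-byte SETUP packet that starts every control transfer.
#[derive(Debug, PartialEq, Clone, Copy)]
pub struct SetupPacket {
    pub request_type: u8, // bmRequestType
    pub request: u8,      // bRequest
    pub value: u16,       // wValue
    pub index: u16,       // wIndex
    pub length: u16,      // wLength
}

impl SetupPacket {
    /// Standard GET_DESCRIPTOR request: device-to-host, recipient = device.
    /// `descriptor_type` is 1 for the device descriptor and 2 for a
    /// configuration descriptor.
    pub fn get_descriptor(descriptor_type: u8, descriptor_index: u8, length: u16) -> SetupPacket {
        SetupPacket {
            request_type: 0x80,
            request: 0x06,
            value: (u16::from(descriptor_type) << 8) | u16::from(descriptor_index),
            index: 0,
            length,
        }
    }

    /// Serialise to the on-the-wire form (multi-byte fields are little-endian).
    pub fn to_bytes(&self) -> [u8; 8] {
        let value = self.value.to_le_bytes();
        let index = self.index.to_le_bytes();
        let length = self.length.to_le_bytes();
        [
            self.request_type,
            self.request,
            value[0],
            value[1],
            index[0],
            index[1],
            length[0],
            length[1],
        ]
    }
}
```

Asking for the first 9 bytes of configuration 0 (`SetupPacket::get_descriptor(2, 0, 9)`) returns just the configuration descriptor header, whose total-length field then says how much to request to get the whole thing.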

RISC-V

Building OpenSBI

OpenSBI is the reference implementation for the Supervisor Binary Interface (SBI). It's basically how you access M-mode functionality from your S-mode bootloader or kernel.

Firstly, we need a RISC-V C toolchain. On Arch, I installed the riscv64-unknown-elf-binutils AUR package. I also tried to install the riscv64-unknown-elf-gcc package, but this wouldn't work, so I built OpenSBI with Clang+LLVM instead with (from inside lib/opensbi):

make PLATFORM=generic LLVM=1

This can be tested on QEMU with:

qemu-system-riscv64 -M virt -bios build/platform/generic/firmware/fw_jump.elf

It also seems like you can build with a platform of qemu/virt - I'm not sure what difference this makes yet but guessing it's the hardware it assumes it needs to drive? Worth exploring. (Apparently the generic image is doing dynamic discovery (I'm assuming from the device tree) so that sounds good for now).

So the jump firmware (fw_jump.elf) jumps to a specified address in memory (apparently QEMU can load an ELF which would be fine initially). Other option would be a payload firmware, which bundles your code into the SBI image (assuming as a flat binary) and executes it like that.

We should probably make an xtask step to build OpenSBI and move it to the bundled directory, plus decide what sort of firmware / booting strategy we're going to use. Then the next step would be some Rust code that can print to the serial port, to prove it's all working.

QEMU virt memory map

Seems everything is memory-mapped, which makes for a nice change coming from x86's nasty port thingy. This is the virt machine's one (from the QEMU source...):

| Region | Address | Size |
|---|---|---|
| Debug | 0x0 | 0x100 |
| MROM | 0x1000 | 0x11000 |
| Test | 0x100000 | 0x1000 |
| CLINT | 0x0200_0000 | 0x10000 |
| PCIe PIO | 0x0300_0000 | 0x10000 |
| PLIC | 0x0c00_0000 | 0x4000000 |
| UART0 | 0x1000_0000 | 0x100 |
| Virtio | 0x1000_1000 | 0x1000 |
| Flash | 0x2000_0000 | 0x4000000 |
| PCIe ECAM | 0x3000_0000 | 0x10000000 |
| PCIe MMIO | 0x4000_0000 | 0x40000000 |
| DRAM | 0x8000_0000 | {mem size} |
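
For debugging, it can be handy to have this map as data. The sketch below encodes the fixed-size regions from the table and looks up which one an address falls in. DRAM is left out because its size depends on how QEMU was launched. The names are made up for this example, and in practice the device tree should be preferred over hardcoded addresses.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
pub struct Region {
    pub name: &'static str,
    pub base: u64,
    pub size: u64,
}

/// Fixed regions of the QEMU `virt` machine, copied from the table above.
pub static VIRT_MEMORY_MAP: &[Region] = &[
    Region { name: "Debug", base: 0x0, size: 0x100 },
    Region { name: "MROM", base: 0x1000, size: 0x11000 },
    Region { name: "Test", base: 0x10_0000, size: 0x1000 },
    Region { name: "CLINT", base: 0x0200_0000, size: 0x1_0000 },
    Region { name: "PCIe PIO", base: 0x0300_0000, size: 0x1_0000 },
    Region { name: "PLIC", base: 0x0c00_0000, size: 0x400_0000 },
    Region { name: "UART0", base: 0x1000_0000, size: 0x100 },
    Region { name: "Virtio", base: 0x1000_1000, size: 0x1000 },
    Region { name: "Flash", base: 0x2000_0000, size: 0x400_0000 },
    Region { name: "PCIe ECAM", base: 0x3000_0000, size: 0x1000_0000 },
    Region { name: "PCIe MMIO", base: 0x4000_0000, size: 0x4000_0000 },
];

/// Start of DRAM; its size is whatever was passed to QEMU.
pub const DRAM_BASE: u64 = 0x8000_0000;

/// Find the fixed region containing `address`, if any.
pub fn region_for(address: u64) -> Option<&'static Region> {
    VIRT_MEMORY_MAP
        .iter()
        .find(|region| address >= region.base && address - region.base < region.size)
}
```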

Getting control from OpenSBI

On QEMU, we can get control from OpenSBI by linking a binary at 0x80200000, and then using -kernel to automatically load it at the right location. OpenSBI will then jump to this location with the HART ID in a0 and a pointer to the device tree in a1.

However, this does make setting paging up slightly icky, as has been a problem on other architectures. Basically, the first binary needs to be linked at a low address with bare translation, and then we need to construct page tables and enable translation, then jump to a higher address. I'm thinking we might as well do it in two stages: a Seed stage that loads the kernel and early tasks from the filesystem/network/whatever, builds the kernel page tables, and then enters the kernel and can be unloaded at a later date. The kernel can then be linked normally at its high address without faffing around with a bootstrap or anything.

The device tree

So the device tree seems to be a data structure passed to you that tells you about the hardware present / memory etc. Hopefully it's less gross than ACPI eh. Repnop has written a crate, fdt, so I think we're just going to use that.

So fdt seems to work fine, we can list memory regions etc. The only issue seems to be that memory_reservations doesn't return anything, which is kind of weird. There also seems to be a /reserved-memory node, but this suggests that this doesn't include stuff we want like which part of memory OpenSBI resides in.

This issue says Linux just assumes it shouldn't touch anything before it was loaded. I guess we could use the same approach, reserving the memory used by Seed via linker symbols, and then seeing where the loader device gets put to reserve the ramdisk. However, the issue was closed saying OpenSBI now does the reservations correctly, which would be cleaner but doesn't stack up with what we're seeing.

Ah so actually, /reserved-memory does seem to have some of what we need. On QEMU there is one child node, called mmode_resv@80000000, which would fit with being the memory OpenSBI is in. We would still need to handle the memory we're in, and idk what happens with the loader device yet, but it's a start. Might be worth talking to repnop about whether the crate should use this node.
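
For reference, the flattened device tree also has a separate memory reservation block in its header area, distinct from the /reserved-memory node, and that block can legitimately be empty. The sketch below walks it by hand, following the layout in the Devicetree specification (big-endian fields, `off_mem_rsvmap` at byte 16 of the header, 16-byte entries terminated by an all-zero entry). I'm assuming this is the block that fdt's memory_reservations reads, which would explain the empty result. The function names here are my own, and it uses `Vec` for brevity, so it would need adapting for `no_std`.

```rust
pub const FDT_MAGIC: u32 = 0xd00d_feed;

/// Read a big-endian u32 at `offset`, or `None` if out of bounds.
fn be32(blob: &[u8], offset: usize) -> Option<u32> {
    let b = blob.get(offset..offset.checked_add(4)?)?;
    Some(u32::from_be_bytes([b[0], b[1], b[2], b[3]]))
}

/// Read a big-endian u64 at `offset`, or `None` if out of bounds.
fn be64(blob: &[u8], offset: usize) -> Option<u64> {
    let high = u64::from(be32(blob, offset)?);
    let low = u64::from(be32(blob, offset.checked_add(4)?)?);
    Some((high << 32) | low)
}

/// Return the `(address, size)` pairs in the memory reservation block.
/// Returns `None` if the magic is wrong or the blob is truncated.
pub fn memory_reservations(blob: &[u8]) -> Option<Vec<(u64, u64)>> {
    if be32(blob, 0)? != FDT_MAGIC {
        return None;
    }
    // `off_mem_rsvmap` is the fifth header field, at byte offset 16.
    let mut offset = be32(blob, 16)? as usize;
    let mut entries = Vec::new();
    loop {
        let address = be64(blob, offset)?;
        let size = be64(blob, offset + 8)?;
        // An all-zero entry terminates the list.
        if address == 0 && size == 0 {
            return Some(entries);
        }
        entries.push((address, size));
        offset += 16;
    }
}
```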

Dumb way to load the kernel for now

So for some reason, fw_cfg doesn't seem to be working on QEMU 7.1. This is what we were gonna use for loading the kernel, command line, and user programs, etc. but obvs this is not possible atm. For now, as a workaround, we can use the loader device to load arbitrary data and files into memory.

I'm thinking we could use 0x1_0000_0000 as the base physical address for this - this gives us a maximum of 2GiB of DRAM, which seems plenty for now (famous last words). We'll need to know the size of the object we're loading on the guest-side, so we'll load that separately for now (in the future, this whole scheme could be extended to some sort of mini-filesystem).

Okay so the loader device is pretty finicky, and has no error handling. Turns out you can't define new memory with it, just load values into RAM, but it doesn't actually tell you this has failed. You then try and read this on the guest, and get super weird UB from doing so - it doesn't just fault or whatever, it seems to break code before you ever read the memory (super weird ngl, didn't stick around to work out what was going on).

Right, seems to be working much better by actually putting the values in RAM. We've extended RAM to 1GiB (0x8000_0000..0xc000_0000) and we'll use this as the new layout:

| Address | Description | Size (bytes) |
|---|---|---|
| 0xb000_0000 | Size of Data | 4 |
| 0xb000_0004 | Data | N |
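
On the guest side, reading this layout could look something like the sketch below. It assumes the 4-byte size is stored little-endian (the guest's native byte order) and works on a byte slice so it can be tested on the host. On the real machine, the slice would be constructed from the physical memory starting at 0xb000_0000. The function and constant names are made up for this example.

```rust
/// Physical address of the 4-byte size field.
pub const LOADED_SIZE_ADDRESS: usize = 0xb000_0000;
/// Physical address of the first byte of the loaded data.
pub const LOADED_DATA_ADDRESS: usize = 0xb000_0004;

/// Given the memory starting at the size field, return the loaded data.
/// Returns `None` if `region` is too short to hold the size field or the
/// number of bytes that field claims.
pub fn loaded_data(region: &[u8]) -> Option<&[u8]> {
    let header = region.get(0..4)?;
    // Assumption: the size was written as a little-endian u32.
    let size = u32::from_le_bytes([header[0], header[1], header[2], header[3]]) as usize;
    region.get(4..size.checked_add(4)?)
}
```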

PCI interrupt routing

PCI interrupt routing is the process of working out which platform-specific interrupt will fire when a given PCI device issues an interrupt. In general, we use message-signalled interrupts (MSIs) where available, and fall back to the legacy interrupt pins (INTA, INTB, INTC, INTD) on devices where they are not.

Legacy interrupt pins

  • Each function header contains an interrupt pin field that can be 0 (no interrupts), or 1 through 4 for each pin.
  • Interrupts from devices that share a pin cannot be differentiated from each other without querying the devices themselves. For us, this means usermode drivers will need to be awoken before knowing their device has actually received an interrupt.

The pin used by each device is not programmable - each device hardcodes it at time of manufacture. However, pins can be remapped by any bridge between the device and host, so that the interrupt signal on the upstream side of the bridge differs from the device's reported interrupt pin. This was necessitated by manufacturers defaulting to using INTA - usage of the 4 available pins was unbalanced, so firmware improves performance by rebalancing them at the bridge.

How the pins have been remapped is communicated to the operating system via a platform-specific mechanism. On modern x86 systems, this is through the _PRT method in the ACPI namespace (before ACPI, BIOS methods and later MP tables were used). On ARM and RISC-V, the device tree specifies this mapping through the interrupt-map property on the PCI host bridge's node.
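
Whichever mechanism supplies it, the result is effectively a table from (device, pin) to a platform interrupt. The sketch below shows the lookup under simplifying assumptions: the `Route` type is a hypothetical flattened form of a _PRT entry or interrupt-map row, and it only handles devices directly on the root bus (anything behind a bridge would first need its pin translated upstream).

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum Pin {
    IntA,
    IntB,
    IntC,
    IntD,
}

/// Decode the Interrupt Pin field of a function's configuration header.
/// 0 means the function does not use a legacy interrupt pin.
pub fn decode_pin(field: u8) -> Option<Pin> {
    match field {
        1 => Some(Pin::IntA),
        2 => Some(Pin::IntB),
        3 => Some(Pin::IntC),
        4 => Some(Pin::IntD),
        _ => None,
    }
}

/// Hypothetical routing entry: which platform interrupt fires when `device`
/// on the root bus asserts `pin`.
pub struct Route {
    pub device: u8,
    pub pin: Pin,
    pub platform_interrupt: u32,
}

/// Look up the platform interrupt for a device, given its raw pin field.
pub fn route(table: &[Route], device: u8, pin_field: u8) -> Option<u32> {
    let pin = decode_pin(pin_field)?;
    table
        .iter()
        .find(|entry| entry.device == device && entry.pin == pin)
        .map(|entry| entry.platform_interrupt)
}
```

Several entries can map to the same platform interrupt, which is why a driver may be woken for an interrupt that turns out to belong to another device.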