Introduction
Welcome to the Poplar Book, which serves as the main source of documentation for Poplar. The Book aims to be both a 10,000-meter overview of Poplar for the interested observer, and a definitive reference for the inner workings of the kernel and userspace.
Please note that this book (like the rest of the OS!) is still very early in development and may lag behind the state of the code. If anything is unclear, please file an issue!
What is Poplar?
At heart, Poplar is a microkernel written in the Rust programming language. Poplar becomes an "OS" when it's combined with other packages such as drivers, filesystems and user applications.
Poplar is designed to be a modern microkernel, with a minimal system call interface and first-class support for message-passing-based IPC between userspace processes. Versatile message-passing allows Poplar to move much more out of the kernel than traditionally possible. For example, the kernel has no concept of a filesystem or of files - instead, the VFS and all filesystems are implemented entirely in userspace, and files are read and written by passing messages.
Why Rust?
While Poplar's design is in theory language-agnostic, the implementation is very tied to Rust. Rust is a systems programming language with a rich type system and a novel ownership model that guarantees
memory and thread safety in safe code. This qualification is important, as Poplar uses a lot of unsafe
code out of necessity - it's important to understand that the use of Rust does not in any way
mean that Poplar is automatically bug-free.
However, Rust makes you think a lot more about how to make your programs safe, which is exactly the sort of code we want to be writing for a kernel. This focus on safety, as well as good ergonomics features and performance, makes Rust perfect for OS-level code.
The Poplar Kernel
TODO
Platforms
A platform is a build target for the kernel. In some cases, there is only one platform for an entire architecture because the hardware is relatively standardized (e.g. x86_64). Other times, hardware is different enough between platforms that it's easier to treat them as different targets (e.g. a headless ARM server that boots using UEFI, versus a Raspberry Pi).
All supported platforms are enumerated in the table below - some have their own sections with more details, while others are just described below. The platform you want to build for is specified in your Poplar.toml configuration file, or with the -p/--platform flag to xtask. Some platforms also have custom xtask commands to, for example, flash a device with a built image.
Platform name | Arch | Description |
---|---|---|
x64 | x86_64 | Modern x86_64 platform. |
rv64_virt | RV64 | A virtual RISC-V QEMU platform. |
mq_pro | RV64 | The MangoPi MQ-Pro RISC-V platform. |
Platform: x64
The vast majority of x86_64 hardware is pretty similar, and so is treated as a single platform. It uses the hal_x86_64 HAL. We assume that the platform:
- Boots using UEFI (using seed_uefi)
- Supports the APIC
- Supports the xsave instruction
Platform: rv64_virt
This is a virtual RISC-V platform emulated by qemu-system-riscv64's virt machine. It features:
- A customizable number of emulated RV64 HARTs
- Booting via QEMU's -kernel option and OpenSBI
- A Virtio block device with attached GPT 'disk'
- Support for USB devices via EHCI
Devices such as the EHCI USB controller are connected to a PCIe bus, and so we use the Advanced Interrupt Architecture with MSIs to avoid the complexity of shared pin-based PCI interrupts. This is done by passing the aia=aplic-imsic machine option to QEMU.
MangoPi MQ-Pro
The MangoPi MQ-Pro is a small RISC-V development board, featuring an Allwinner D1 SoC with a single RV64 core, and either 512MiB or 1GiB of memory. Most public information about the D1 itself can be found on the Sunxi wiki.
You're probably going to want to solder a GPIO header to the board and get a USB-UART adaptor as well. The xtask contains a small serial utility for logging the output from the board, but you can also use an external program such as minicom.
Adding a male-to-female jumper wire to a ground pin is also useful - you can touch it to the RST pad on the back of the board to reset it (allowing you to FEL new code onto it).
Boot procedure
The D1 can be booted from an SD card or flash, or, usefully for development, using Allwinner's FEL protocol, which allows data to be loaded into memory and code executed using a small USB stack. This procedure is best visualised with a diagram:
The initial part of this process is done by code loaded from the BROM
(Boot ROM) - it contains the FEL stack, as well as enough code to load the first-stage bootloader from
either an SD card or SPI flash. Data loaded by the FEL stack, or from the bootable media, is loaded into SRAM. The DRAM has to be brought up, either by the first-stage
bootloader, or by a FEL payload.
Booting via FEL
To boot with FEL, the MQ-Pro needs to be plugged into the development machine via the USB-C OTG port (not the host port), and then booted into FEL mode. The easiest way to do this is to just remove the SD card - as long as the flash hasn't been written to, this should boot into FEL. It should then enumerate as a USB device with ID 1f3a:efe8 on the host.
You then need something that can talk the FEL protocol on the host. We're currently using xfel, but this may be replaced/augmented with a more capable solution in the future. xfel should be relatively easy to compile and install on a Linux system, and should install some udev rules that allow it to be used by normal users. xfel should then automatically detect a connected device in FEL mode.
The first step is to initialize the DRAM - xfel ddr d1 does this by uploading and running a small payload with the correct procedures. After this, further code can be loaded directly into DRAM - we load OpenSBI and Seed.
First, we load OpenSBI's FW_JUMP firmware at the start of RAM, 0x4000_0000. This provides the SBI interface, moves from M-mode to S-mode, and then jumps into Seed, which is loaded 512KiB after it at 0x4008_0000 (this address is supplied to OpenSBI at build-time). We also bundle a device tree for the platform into OpenSBI, which it uses to bootstrap the platform, and then supplies it onwards.
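The relationship between the two load addresses is simple arithmetic, and can be sketched as follows. The constant and function names here are hypothetical and chosen for illustration only; they are not taken from Poplar's build configuration.

```rust
/// Start of RAM on the D1, where OpenSBI's FW_JUMP firmware is loaded.
const OPENSBI_LOAD_ADDR: u64 = 0x4000_0000;

/// Space left for OpenSBI before Seed begins (512KiB).
const OPENSBI_RESERVED_SIZE: u64 = 512 * 1024;

/// The address that OpenSBI jumps to after moving into S-mode.
/// (Hypothetical helper: the real address is supplied to OpenSBI at build-time.)
const fn seed_load_addr() -> u64 {
    OPENSBI_LOAD_ADDR + OPENSBI_RESERVED_SIZE
}
```

If OpenSBI were ever reduced below 256KiB, only the reserved size in a calculation like this would need to change.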
TODO: we should investigate customising the driver list to maybe get OpenSBI under 256KiB (it's just over).
Seed
Seed is Poplar's bootloader ± pre-kernel. What it is required to do varies by platform, but generally it is responsible for bringing up the system, loading the kernel and initial tasks into memory, and preparing the environment for executing the kernel.
x86_64
On x86_64, Seed is a UEFI executable that utilises boot services to load the kernel and initial tasks. The Seed executable, the kernel, and other files are all held in the EFI System Partition (ESP) - a FAT filesystem present in all UEFI-booted systems.
riscv
On RISC-V, Seed is more of a pre-kernel than a traditional bootloader. It is booted into by the system firmware, and then has its own set of drivers to load the kernel and other files from the correct filesystem, or elsewhere.
The boot mechanism has not yet been fully designed for RISC-V, and will also depend heavily on the hardware target, as booting different platforms is much less standardised than on x86_64.
Kernel Objects
Kernel Objects are how Poplar represents resources that can be interacted with from userspace. They are all allocated a unique ID.
Handles
Handles are used to refer to kernel objects from userspace, and are allocated to a single Task. A handle of value 0 acts as a sentinel value that can be used for special meanings. From userspace, handles must be treated as opaque, 32-bit integers.
Debugging the kernel
Kernels can be difficult to debug - this page tries to collect useful techniques for debugging kernels in general, and also any Poplar specific things that might be useful.
Poplar specific: the breakpoint exception
The breakpoint exception is useful for inspecting the contents of registers at specific points, such as in sections of assembly (where it's inconvenient to call into Rust, or to use a debugger because getting global_asm! to play nicely with GDB is a pain).
Simply use the int3 instruction:
...
mov rsp, [gs:0x10]
int3 // Is my user stack pointer correct?
sysretq
Building OVMF
Building a debug build of OVMF isn't too hard (from the base of the edk2 repo):
OvmfPkg/build.sh -a X64
By default, debug builds of OVMF will output debugging information on the ISA debugcon, which is actually probably nicer for our purposes than most builds, which pass DEBUG_ON_SERIAL_PORT during the build. To log the output to a file, you can pass -debugcon file:ovmf_debug.log -global isa-debugcon.iobase=0x402 to QEMU.
System calls
Userspace code can interact with the kernel through system calls. Poplar's system call interface is based around 'kernel objects', and so many of the system calls are to create, destroy, or modify the state of various types of kernel object. Because of Poplar's microkernel design, many traditional system calls (e.g. open) are not present, their functionality instead being provided by userspace.
Each system call has a unique number that is used to identify it. A system call can then take up to five parameters, each at most the size of the system's register width. It can return a single value, also the size of a register.
Overview of system calls
Number | System call | Description |
---|---|---|
0 | yield | Yield to the kernel. |
1 | early_log | Log a message. Designed to be used from early processes. |
2 | get_framebuffer | Get the framebuffer that the kernel has created, if it has. |
3 | create_memory_object | Create a MemoryObject kernel object. |
4 | map_memory_object | Map a MemoryObject into an AddressSpace. |
5 | create_channel | Create a channel, returning handles to the two ends. |
6 | send_message | Send a message down a channel. |
7 | get_message | Receive the next message, if there is one. |
8 | wait_for_message | Yield to the kernel until a message arrives on the given channel. |
9 | register_service | Register yourself as a service. |
10 | subscribe_to_service | Create a channel to a particular service provider. |
11 | pci_get_info | Get information about the PCI devices on the platform. |
12 | wait_for_event | Yield to the kernel until an event is signalled |
13 | poll_interest | Poll a kernel object to see if changes need to be processed. TODO: this is an experiment and may not continue to exist. |
Making a system call on x86_64
To make a system call on x86_64, populate these registers:
rdi | rsi | rdx | r10 | r8 | r9 |
---|---|---|---|---|---|
System call number | a | b | c | d | e |
The only way in which these registers deviate from the x86_64 Sys-V ABI is that c is passed in r10 instead of rcx, because rcx is used by the syscall instruction. You can then make the system call by executing syscall. Before the kernel returns to userspace, it will put the result of the system call (if there is one) in rax. If a system call takes fewer than five parameters, the unused parameter registers will be preserved across the system call.
Return values
Often, a system call will need to return a status, plus one or more handles. The first handle a system call needs to return (often the only handle returned) can be returned in the upper bits of the status value:
- Bits 0..32 contain the status:
  - 0 means that the system call succeeded, and the rest of the return value is valid
  - >0 means that the system call errored. The meaning of the value is system-call specific.
- Bits 32..64 contain the value of the first returned handle, if applicable
A return value of 0xffffffffffffffff (the maximum value of u64) is reserved for when a system call is made with a number that does not correspond to a system call. This is defined as a normal error code (as opposed to, for example, terminating the task that tried to make the system call) to provide a mechanism for tasks to detect kernel support for a system call (so they can use a fallback method on older kernels, for example).
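As an illustration of this layout, the following is a minimal sketch of how userspace might decode a raw return value into a Rust Result. The type and function names are hypothetical and are not taken from Poplar's userspace library.

```rust
/// Reserved return value for a system call number the kernel does not know.
const UNKNOWN_SYSCALL: u64 = u64::MAX;

#[derive(Debug, PartialEq, Eq)]
enum SyscallError {
    /// The kernel does not support this system call.
    UnknownSyscall,
    /// A non-zero status; its meaning is specific to the system call.
    Status(u32),
}

/// Decode a raw return value that carries a status in bits 0..32 and the
/// first returned handle in bits 32..64.
fn decode_handle_result(raw: u64) -> Result<u32, SyscallError> {
    if raw == UNKNOWN_SYSCALL {
        return Err(SyscallError::UnknownSyscall);
    }

    let status = raw as u32; // truncates to bits 0..32
    let handle = (raw >> 32) as u32; // bits 32..64

    if status == 0 {
        Ok(handle)
    } else {
        Err(SyscallError::Status(status))
    }
}
```

Checking for the reserved value before splitting the fields lets a task fall back to another method when running on an older kernel.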
yield
Used by a task that can't do any work at the moment, allowing the kernel to schedule other tasks.
Parameters
None.
Returns
Always 0.
Capabilities needed
None.
early_log
Used by tasks that are started early in the boot process, before reliable userspace logging support is running. Output is logged to the same place as kernel logging.
Parameters
- a - the length of the string to log in bytes. Maximum length is 4096 bytes.
- b - a usermode pointer to the start of the UTF-8 encoded string.
Returns
- 0 if the system call succeeded
- 1 if the string was too long
- 2 if the string was not valid UTF-8
- 3 if the task making the syscall doesn't have the EarlyLogging capability
Capabilities needed
The EarlyLogging capability is needed to make this system call.
get_framebuffer
On many architectures, the bootloader or kernel can create a naive framebuffer using a platform-specific method. This framebuffer can be used to render from userspace, if a better hardware driver is not available on the platform.
Parameters
a should contain a mapped, writable, user-space address, to which information about the framebuffer will be written.
Returns
This system call returns three things:
- A status code
- A handle to a MemoryObject containing the framebuffer, if successful
- Information about the framebuffer, if successful, written into the address in a
The status codes used are:
- 0 means that the system call was successful
- 1 means that the calling task does not have the correct capability
- 2 means that a does not contain a valid address for the kernel to write to
- 3 means that the kernel did not create the framebuffer
The information written back to the address in a has the following structure:

    #[repr(C)]
    struct FramebufferInfo {
        width: u16,
        height: u16,
        stride: u16,
        /// 0 = RGB32
        /// 1 = BGR32
        pixel_format: u8,
    }

Capabilities needed
Tasks need the GetKernelFramebuffer capability to use this system call.
create_memory_object
Create a MemoryObject kernel object. Userspace can only create "blank" MemoryObjects (that are allocated to free, conventional physical memory). MemoryObjects that point to special objects (e.g. framebuffer data, PCI configuration spaces) must be created by the kernel.
Parameters
- a - the size of the MemoryObject's memory area (in bytes)
- b - flags:
  - Bit 0: set if the memory should be writable
  - Bit 1: set if the memory should be executable
- c - a pointer to which the kernel will write the physical address to which the MemoryObject was allocated. Ignored if null.
Returns
Uses the standard representation to return a Result<Handle, MemoryObjectError>. Error status codes are:
- 1 if the given virtual address is invalid
- 2 if the given set of flags is invalid
- 3 if memory of the requested size could not be allocated
- 4 if the pointer to write the allocated physical address to was not valid
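The flags parameter is a plain bitfield, so building it can be sketched as below. The constant and function names are hypothetical; only the bit positions come from the parameter description above.

```rust
/// Bit 0: the memory should be writable.
const MEMORY_OBJECT_WRITABLE: u64 = 1 << 0;
/// Bit 1: the memory should be executable.
const MEMORY_OBJECT_EXECUTABLE: u64 = 1 << 1;

/// Build the value passed in parameter `b` of `create_memory_object`.
fn memory_object_flags(writable: bool, executable: bool) -> u64 {
    let mut flags = 0;
    if writable {
        flags |= MEMORY_OBJECT_WRITABLE;
    }
    if executable {
        flags |= MEMORY_OBJECT_EXECUTABLE;
    }
    flags
}
```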
Capabilities needed
None.
map_memory_object
Map a MemoryObject into an AddressSpace.
Parameters
- a - a handle to the MemoryObject.
- b - a handle to the AddressSpace. The zero handle indicates that the memory object should be mapped into the task's own AddressSpace.
- c - the virtual address to map the MemoryObject at, if it should be mapped at a specific address. If null, the kernel will attempt to find a suitable address to map it at, and write that address to the pointer supplied in d.
- d - the pointer at which the virtual address the object is mapped at will be written to, if c is null. If an address is supplied in c, this pointer does not need to be valid, and will not be accessed. If this pointer is null, the address will not be written, even if the kernel allocated memory for the object.
Returns
- 0 if the system call succeeded
- 1 if either of the passed handles is invalid
- 2 if the portion of the AddressSpace that would be mapped is already occupied by another MemoryObject
- 3 if the supplied MemoryObject handle does not point to a MemoryObject
- 4 if the supplied AddressSpace handle does not point to an AddressSpace
- 5 if the supplied pointer in d is invalid, and c is null
Capabilities needed
None (this may change in the future).
create_channel
Create a Channel kernel object. Channels are slightly odd kernel objects in that they must be referred to in userspace by two handles, one for each "end" of the channel. This system call therefore returns two handles, one of which is usually transferred to another task.
Parameters
- a - the virtual address to write the second handle into (only one can be returned in the status)
Returns
Uses the standard representation to return a Result<Handle, CreateChannelError>. Error status codes are:
- 1 if the passed virtual address is not valid
TODO: if we ditch the ability to return an error (i.e. by making this infallible, or by saying that a null handle denotes an error but not which one), we could return both handles in the status.
Capabilities needed
None.
send_message
Send a message, consisting of a number of bytes and optionally a number of handles, down a Channel.
All the handles are removed from the sending Task and added to the receiving Task.
A maximum of 4 handles can be transferred by each message. The maximum number of bytes is currently 4096.
Parameters
- a - the handle to the Channel end that is sending the message. The handle must have the SEND right.
- b - a pointer to the array of bytes to send
- c - the number of bytes to send
- d - a pointer to the array of handle entries to transfer. All handles must have the TRANSFER right. This may be 0x0 if the message does not transfer any handles.
- e - the number of handles to send
Returns
A status code:
- 0 if the system call succeeded and the message was sent
- 1 if the Channel handle is invalid
- 2 if the Channel handle does not point to a Channel
- 3 if the Channel handle does not have the correct rights to send messages
- 4 if one or more of the handles to transfer is invalid
- 5 if any of the handles to transfer do not have the correct rights
- 6 if the pointer to the message bytes was not valid
- 7 if the message's byte array is too large
- 8 if the pointer to the handles array was not valid
- 9 if the handles array is too large
- 10 if the other end of the Channel has been disconnected
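A sender can check the size limits itself before making the system call. The sketch below mirrors status codes 7 and 9 from the list above; the function name is hypothetical, and the order in which the kernel actually performs its checks is an assumption.

```rust
/// Maximum number of bytes in a single message (current limit).
const MAX_MESSAGE_BYTES: usize = 4096;
/// Maximum number of handles transferred by a single message.
const MAX_MESSAGE_HANDLES: usize = 4;

/// Return the status code `send_message` would be expected to report for an
/// oversized message, or `Ok(())` if the message fits within both limits.
fn check_message_size(bytes: &[u8], handles: &[u32]) -> Result<(), u64> {
    if bytes.len() > MAX_MESSAGE_BYTES {
        return Err(7); // the message's byte array is too large
    }
    if handles.len() > MAX_MESSAGE_HANDLES {
        return Err(9); // the handles array is too large
    }
    Ok(())
}
```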
Capabilities needed
None.
get_message
Receive a message from a Channel, if one is waiting to be received.
A maximum of 4 handles can be transferred by each message. The maximum number of bytes is currently 4096.
Parameters
- a - the handle to the Channel end that is receiving the message. The handle must have the RECEIVE right.
- b - a pointer to the array of bytes to put the message into
- c - the size of the bytes buffer
- d - a pointer to the array of handle entries to transfer. This may be 0x0 if the receiver does not expect to receive any handles.
- e - the size of the handles buffer (in handles)
Returns
Bits 0..16 are a status code:
- 0 if the message was received successfully. The rest of the return value is valid.
- 1 if the Channel handle is invalid.
- 2 if the Channel handle does not point to a Channel.
- 3 if there was no message to receive.
- 4 if the address of the bytes buffer is invalid.
- 5 if the bytes buffer is too small to contain the message.
- 6 if the address of the handles buffer is invalid, or if 0x0 was passed and the message does contain handles.
- 7 if the handles buffer is too small to contain the handles transferred with the message.
If the status code is 0 (i.e. a valid message was written into the bytes and handles buffers), the return value also contains the number of valid entries in both the byte and handle buffers:
- Bits 16..32 contain the length of the valid byte buffer (in bytes). If the passed buffer was larger than this, the remaining bytes have not been written by the kernel.
- Bits 32..48 contain the length of the valid handles buffer (in handles). If the passed buffer was larger than this, the remaining bytes have not been written by the kernel.
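The packed return value described above can be unpacked with a few shifts and masks. This is a minimal sketch; the struct and function names are hypothetical.

```rust
/// The lengths reported by a successful `get_message` call.
#[derive(Debug, PartialEq, Eq)]
struct ReceivedLengths {
    /// Number of valid bytes in the bytes buffer (bits 16..32).
    bytes: usize,
    /// Number of valid entries in the handles buffer (bits 32..48).
    handles: usize,
}

/// Decode the raw return value of `get_message`. On failure, the non-zero
/// status code from bits 0..16 is returned as the error.
fn decode_get_message(raw: u64) -> Result<ReceivedLengths, u16> {
    let status = (raw & 0xffff) as u16;
    if status != 0 {
        return Err(status);
    }

    Ok(ReceivedLengths {
        bytes: ((raw >> 16) & 0xffff) as usize,
        handles: ((raw >> 32) & 0xffff) as usize,
    })
}
```

The returned lengths can then be used to slice the buffers, so that bytes the kernel did not write are never read.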
Capabilities needed
None.
register_service
Register yourself as the provider of a service. The name of the service will be {task_name}.{service_name}. This returns a channel that is used to alert the provider when another task subscribes to your service with the subscribe_to_service system call.
See the section on Services for more information about services, how to register a service, and how to subscribe to a service.
Parameters
- a - the length of the name string in bytes. Maximum length is 256. Must be greater than 0.
- b - a usermode pointer to the start of the UTF-8 encoded name string.
Returns
Returns the standard representation of a Result<Handle, ServiceError>. Error status codes are:
- 1 if the task does not have the correct capability
- 2 if the usermode pointer to the name is not valid
- 3 if the name is too long, or has a length of 0
The returned handle is to a Channel that is used to serve channel subscriptions.
Capabilities needed
The ServiceProvider capability is needed to make this system call.
subscribe_to_service
Subscribe to a registered service by name. This will deliver a notification to the task that registered the service with one end of a newly created channel. The other end of the channel will be returned by this system call, if successful.
See the section on Services for more information about services, how to register a service, and how to subscribe to a service.
Parameters
- a - the length of the name string in bytes. Maximum length is 256. Must be greater than 0.
- b - a usermode pointer to the start of the UTF-8 encoded name string.
Returns
Returns the standard representation of a Result<Handle, ServiceError>. Error status codes are:
- 1 if the task does not have the correct capability
- 2 if the usermode pointer to the name is not valid
- 3 if the name is too long, or has a length of 0
- 4 if the supplied name does not correspond to a registered service.
The returned handle is to one end of a Channel, the other end of which has been given to the task that supplies the service.
Capabilities needed
The ServiceUser capability is needed to make this system call.
pci_get_info
Get information about the PCI devices on the platform. This is only meant to be used from the userspace PCI bus driver.
TODO: detail structure of PCI descriptor
Parameters
- a - a pointer to the buffer to put the PCI descriptors in
- b - the size of the buffer (in descriptors)
Returns
Bits 0..16 contain a status code:
- 0 if the system call succeeded
- 1 if the task does not have the correct capabilities
- 2 if the given buffer can't hold all the descriptors
- 3 if the address to the descriptor buffer is invalid
- 4 if the platform doesn't support PCI
If the status code is 0 (i.e. the system call succeeded), bits 16..48 contain the number of descriptors written back. If the status code is 2 (i.e. the buffer was not large enough), bits 16..48 contain the number of entries that need to be written.
If a is 0x0, this system call will always fail with status code 2 and the number of descriptors in bits 16..48. This is to allow userspace to dynamically allocate a buffer of the correct size, if it desires.
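The two-call pattern this enables (query the count with a null buffer, then allocate and fetch) might look like the sketch below. The system call itself is passed in as a closure, a hypothetical stand-in for the real wrapper, and the descriptor type is left generic because its structure is not yet documented.

```rust
/// Split a raw `pci_get_info` return value into its status (bits 0..16)
/// and descriptor count (bits 16..48).
fn decode_pci_get_info(raw: u64) -> (u16, u32) {
    ((raw & 0xffff) as u16, ((raw >> 16) & 0xffff_ffff) as u32)
}

/// Fetch all PCI descriptors by first asking the kernel how many there are.
/// `syscall` takes the buffer pointer and its size in descriptors.
fn fetch_descriptors<D: Default + Clone>(
    mut syscall: impl FnMut(*mut D, usize) -> u64,
) -> Result<Vec<D>, u16> {
    // With a null buffer, the call is documented to fail with status 2 and
    // report the number of descriptors that need to be written.
    let (status, needed) = decode_pci_get_info(syscall(std::ptr::null_mut(), 0));
    if status != 2 {
        return Err(status);
    }

    let mut buffer = vec![D::default(); needed as usize];
    let (status, written) = decode_pci_get_info(syscall(buffer.as_mut_ptr(), buffer.len()));
    if status != 0 {
        return Err(status);
    }

    buffer.truncate(written as usize);
    Ok(buffer)
}
```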
Capabilities needed
Tasks need the PciBusDriver capability to use this system call.
Poplar's userspace
Poplar supports running programs in userspace on supporting architectures. This offers increased protection and separation compared to running code in kernelspace - as a microkernel, Poplar tries to run as much code in userspace as possible.
Building a program for Poplar's userspace
Currently, the only officially supported language for writing userspace programs is Rust.
Target
Poplar provides custom target files for userspace programs. These are found in the user/{arch}_poplar.toml files.
Standard library
Poplar provides a Rust crate, called std, which replaces Rust's standard library. We've done this for a few reasons:
- We originally had targets and a std port in a fork of rustc. This proved difficult to maintain and required users to build a custom Rust fork and add it as a rustup toolchain. This is a high barrier to entry for anyone wanting to try Poplar out.
- Poplar's ideal standard library probably won't end up looking very similar to other platforms', as there are significant ideological differences in how programs should interact with the OS. This is unfortunate from a porting point of view, but does allow us to design the platform interface from the ground up.
The name of the crate is slightly unfortunate, but is required, as rustc uses the name of the crate to decide where to import the prelude from. This significantly increases the ergonomics we can provide, so is worth the tradeoff.
The std crate does a few important things that are worth understanding to reduce the 'magic' of Poplar's userspace:
- It provides a linker script - the linker script for the correct target is shipped as part of the crate, and then the build script copies it into the Cargo OUT_DIR. It also passes a directive to rustc such that you can simply pass -Tlink.ld to link with the correct script. This is, for example, done using RUSTFLAGS by Poplar's xtask, but you can also pass it manually or with another method, depending on your build system.
Capabilities
Capabilities describe what a task is allowed to do, and are encoded in its image. This allows users to audit the permissions of the tasks they run at a much finer granularity than user-based permissions. It also allows us to move parts of the kernel into discrete userspace tasks, by creating specialised capabilities that grant access to sensitive resources (such as the raw framebuffer) to only select tasks.
Encoding capabilities in the ELF image
Capabilities are encoded in an entry of a PT_NOTE segment of the ELF image of a task. This entry will have an owner (sometimes referred to in documentation as the 'name') of POPLAR and a type of 0. The descriptor will be an encoding of the capabilities as described by the 'Format' section. The descriptor must be padded such that the next descriptor is 4-byte aligned, and so a value of 0x00 is reserved to be used as padding.
Initial images (tasks loaded by the bootloader before filesystem drivers are working) are limited to a capabilities encoding of 32 bytes (given the variable-length encoding, this does not equate to a fixed maximum number of capabilities).
Format
The capabilities format is variable-length - simple capabilities can be encoded as a single byte, while more complex / specific ones may need multiple bytes of prefix, and can also encode fixed-length data.
Overview of capabilities
This is an overview of all the capabilities the kernel supports:
First byte | Next byte(s) | Data | Arch specific? | Description |
---|---|---|---|---|
0x00 | - | - | - | No meaning - used to pad descriptor to required length (see above) |
0x01 | - | - | No | GetFramebuffer |
0x02 | - | - | No | EarlyLogging |
0x03 | - | - | No | ServiceProvider |
0x04 | - | - | No | ServiceUser |
0x05 | - | - | No | PciBusDriver |
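Since every capability in the table is currently a single byte, building the note descriptor amounts to concatenating the bytes and padding with 0x00 up to a 4-byte boundary. The sketch below shows this under that assumption; the constant and function names are hypothetical, and capabilities with prefix bytes or data would need a richer encoder.

```rust
/// Reserved padding value.
const CAP_PADDING: u8 = 0x00;
const CAP_GET_FRAMEBUFFER: u8 = 0x01;
const CAP_EARLY_LOGGING: u8 = 0x02;
const CAP_SERVICE_PROVIDER: u8 = 0x03;
const CAP_SERVICE_USER: u8 = 0x04;
const CAP_PCI_BUS_DRIVER: u8 = 0x05;

/// Encode single-byte capabilities into a descriptor, padded so that the
/// next note descriptor is 4-byte aligned.
fn encode_capabilities(capabilities: &[u8]) -> Vec<u8> {
    let mut descriptor = capabilities.to_vec();
    while descriptor.len() % 4 != 0 {
        descriptor.push(CAP_PADDING);
    }
    descriptor
}
```

For example, a task needing EarlyLogging and ServiceUser would get the descriptor 0x02, 0x04, 0x00, 0x00.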
Userspace memory map (x86_64)
x86_64 features an enormous 256TB virtual address space, most of which is available to userspace processes under Poplar. For this reason, things are spread throughout the virtual address space to make it easy to identify what a virtual address points to.
Userspace stacks
Within the virtual address space, the userspace stacks are allocated a 4GB range. Each task has a maximum stack size of 2MB, which puts a limit of 2048 tasks per address space.
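The task limit follows directly from dividing the range by the per-task stack size. The sketch below works through that arithmetic, treating 4GB and 2MB as binary units; the names are hypothetical, and the assumption that stack slots are laid out contiguously by task index is ours rather than something stated here.

```rust
const MIB: u64 = 1024 * 1024;
const GIB: u64 = 1024 * MIB;

/// Size of the range reserved for userspace stacks.
const STACK_REGION_SIZE: u64 = 4 * GIB;
/// Maximum stack size of a single task.
const MAX_STACK_SIZE: u64 = 2 * MIB;
/// Number of stack slots that fit in the range.
const MAX_TASKS_PER_ADDRESS_SPACE: u64 = STACK_REGION_SIZE / MAX_STACK_SIZE;

/// Offset of a task's stack slot from the start of the stack range, assuming
/// slots are laid out contiguously. Returns `None` past the task limit.
fn stack_slot_offset(task_index: u64) -> Option<u64> {
    (task_index < MAX_TASKS_PER_ADDRESS_SPACE).then(|| task_index * MAX_STACK_SIZE)
}
```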
Platform Bus
The Platform Bus is a userspace service designed to be a core part of most Poplar systems.
It manages an abstract "bus" of devices that can be added by userspace bus drivers and consumed by userspace device drivers.
Drivers talk to the Platform Bus via channels obtained by subscribing to the Platform Bus's services, platform_bus.bus_driver and platform_bus.device_driver.
Platform Bus is entirely a userspace concept, and so systems can be built around the Poplar kernel without it. However, many drivers and applications will expect the Platform Bus service to be present, and systems not using it will have to handle many low-level systems, such as PCI device enumeration, themselves. It is therefore expected that the vast majority of systems will use Platform Bus as a fundamental building block of their userspace.
Device representation on the Platform Bus
Devices on the Platform Bus can be quite abstract, or represent literal devices that are part of the platform, or plugged in as peripherals. Examples include a framebuffer "device" provided by a driver for a graphics-capable device, a power-management chip built into the platform, and USB devices, respectively.
Devices are described by a series of properties, which are typed pieces of data associated with a label. Device properties are used to identify devices, and are given to every device driver that claims it may be able to drive a device. Handoff properties are only transferred to a driver once it has been selected to drive a device, and can contain handles to kernel objects needed to drive the device. These handles are transferred to the task implementing the device driver, which is why they cannot be sent arbitrarily to drivers to query support.
Device registration
TODO
Device hand-off to device driver
TODO
Standard devices
The Platform Bus library defines expected properties and behaviour for a number of standard device classes, in an attempt to increase compatibility across drivers and device users. Additional properties may be added as necessary for an individual device.
PCI devices
Platform Bus will use information provided by the kernel to create devices for each enumerated PCI device. Standard properties:
Property | Type | Description |
---|---|---|
pci.vendor_id | Integer | Vendor ID of the PCI device |
pci.device_id | Integer | Device ID of the PCI device |
pci.class | Integer | Class of the PCI device |
pci.sub_class | Integer | Sub-class of the PCI device |
pci.interface | Integer | Interface of the PCI device |
pci.interrupt | Event | If configured, an Event that is triggered when the PCI device gets an IRQ |
pci.barN.size | Integer | N is a number from 0-6. The size of the given BAR, if present. |
pci.barN.handle | MemoryObject | N is a number from 0-6. A memory object mapped to the given BAR, if present. |
Generally, specific devices (e.g. a specific GPU) can be detected with a combination of the vendor_id and device_id properties, while a type of device can be identified via the class, sub_class, and interface properties. Drivers should filter against the appropriate properties depending on the devices they can drive.
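To make the filtering idea concrete, here is a minimal sketch of matching a device's properties against a driver's filter. The Property enum and matches_filter function are hypothetical and do not reflect the Platform Bus library's actual types; only the property labels come from the table above.

```rust
use std::collections::HashMap;

/// A typed piece of data associated with a label (hypothetical model).
#[derive(Debug, Clone, PartialEq, Eq)]
enum Property {
    Integer(u64),
    Bytes(Vec<u8>),
}

/// Returns true if every (label, value) pair in the filter is present in
/// the device's properties as an integer with the expected value.
fn matches_filter(properties: &HashMap<String, Property>, filter: &[(&str, u64)]) -> bool {
    filter
        .iter()
        .all(|(label, expected)| properties.get(*label) == Some(&Property::Integer(*expected)))
}
```

An EHCI controller driver, for instance, would filter on the class, sub_class, and interface properties rather than on a particular vendor and device ID.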
USB devices
USB devices may be added to the Platform Bus by a USB Host Controller driver, and can be consumed by a wide array of drivers. Standard properties:
Property | Type | Description |
---|---|---|
usb.vendor_id | Integer | Vendor ID of the USB device |
usb.product_id | Integer | Product ID of the USB device |
usb.class | Integer | Class of the USB device |
usb.sub_class | Integer | Sub-class of the USB device |
usb.protocol | Integer | Protocol of the USB device |
usb.config0 | Bytes | Byte-stream of the first configuration descriptor of the device |
usb.channel | Channel | Control channel to configure and control the device via the bus driver |
HID devices
TODO
Message Passing
Poplar has a kernel object called a Channel for providing first-class message passing support to userspace. Channels move packets, called "messages", which contain a stream of bytes, and optionally one or more handles that are transferred from the sending task to the receiving task.
Ptah
Channels can move arbitrary bytes, but Poplar also includes a layer on top of Channels called Ptah, which consists of a data model and wire format suitable for encoding data which can be serialized and deserialized from any sensible language without too much difficulty.
Ptah is heavily inspired by Serde, and the first implementation of Ptah was actually a Serde data format. Unfortunately, it made properly handling Poplar handles very difficult - when a handle is sent over a channel, it needs to be put into a separate array, and the in-line data replaced by an index into that array. When the message travels over a task boundary, the kernel examines and replaces each handle in this array with a handle to the same kernel object in the new task. This effectively means we need to add a new `Handle` type to our data model, which is not easily possible with Serde (and would make it incompatible with standard Serde anyway).
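The handle substitution described above can be modelled in a few lines. This is only a sketch: `Handle`, `Message`, `push_handle` and `translate_handles` are hypothetical names, the single-byte index encoding is an assumption, and the real kernel logic is not shown in this book.

```rust
/// Hypothetical stand-in for a kernel handle; real handles are opaque values
/// that are only meaningful inside one task.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Handle(pub u32);

/// A message under construction: in-line bytes plus a separate handle array.
#[derive(Debug, Default)]
pub struct Message {
    pub bytes: Vec<u8>,
    pub handles: Vec<Handle>,
}

impl Message {
    /// Append plain data to the in-line byte stream.
    pub fn push_bytes(&mut self, data: &[u8]) {
        self.bytes.extend_from_slice(data);
    }

    /// Move a handle into the out-of-line array and write its index in-line.
    /// Encoding the index as a single byte is an assumption of this sketch.
    pub fn push_handle(&mut self, handle: Handle) -> Option<u8> {
        if self.handles.len() > u8::MAX as usize {
            return None;
        }
        let index = self.handles.len() as u8;
        self.handles.push(handle);
        self.bytes.push(index);
        Some(index)
    }
}

/// Model of the task-boundary step: every entry in the handle array is swapped
/// for a handle valid in the receiving task. The in-line bytes are untouched,
/// so the indices they contain still point at the right slots.
pub fn translate_handles(message: &mut Message, translate: impl Fn(Handle) -> Handle) {
    for handle in message.handles.iter_mut() {
        *handle = translate(*handle);
    }
}
```

Because only the array is rewritten, the kernel never has to understand the layout of the in-line data.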
The Ptah Data Model
The Ptah data model maps pretty well to the Rust type system, and relatively closely to the Serde data model. Key differences are some stronger guarantees about the encoding of types such as enums (the data model only needs to fit a single wire format, and so can afford to be less flexible than Serde's), and the lack of a few types: `unit`-based types, and the statically-sized versions of `seq` and `map` (`tuple` and `struct`). Ptah is not a self-describing format (i.e. the types you're trying to deserialize must be fully known in advance), so the elements of structs and tuples can simply be serialized in the order they appear, and then deserialized in order at the other end.
- Primitive types
    - `bool`
    - `u8`, `u16`, `u32`, `u64`, `u128`
    - `i8`, `i16`, `i32`, `i64`, `i128`
    - `f32`, `f64`
    - `char`
- `string`
    - Encoded as a `seq` of `u8`, but with the additional requirement that it is valid UTF-8
    - Not null terminated, as `seq` includes explicit length
- `option`
    - Encoded in the same way as an enum, but separately for the benefit of languages without proper enums
    - Either `None` or `Some({value})`
- `enum`
    - Include a tag, and optionally some data
    - Represent a Rust `enum`, or a tagged union in languages without proper enums
    - The data is encoded separately to the tag, and can be of any other Ptah type:
        - Rust tuple variants (e.g. `E::A(u8, u32)`) are represented by `tuple`
        - Rust struct variants (e.g. `E::B { foo: u8, bar: u32 }`) are represented by `struct`
- `seq`
    - A variable-length sequence of values, mapping to many types such as `Vec<T>`.
- `map`
    - A variable-length series of key-value pairings, mapping to collections like `BTreeMap<K, V>`.
- `handle`
    - This is the type that means we need our own data model in the first place
    - These are encoded out-of-line of the rest of the data, so that the Poplar kernel can introspect into them, if it needs to
The Ptah Wire Format
The wire format describes how messages can be encoded into a stream of bytes suitable for transmission over a channel, or over another transport layer such as the network or a serial port.
Primitives
Primitives are transmitted as little-endian and packed to their natural alignment. The following primitive types are recognised:
Name | Size (bytes) | Description |
---|---|---|
bool | 1 | A boolean value |
u8 , u16 , u32 , u64 , u128 | 1, 2, 4, 8, 16 | An unsigned integer |
i8 , i16 , i32 , i64 , i128 | 1, 2, 4, 8, 16 | A signed integer |
f32 , f64 | 4, 8 | Single / double-precision IEEE-754 FP values |
char | 4 | A single UTF-8 Unicode scalar value |
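To make the primitive encoding concrete, here is a minimal sketch of a writer and reader for a few of these types. `Writer` and `Reader` are hypothetical names and not the Ptah crate's API. The sketch also makes three assumptions the table does not settle: a `bool` is encoded as `0` or `1`, a `char` is its Unicode scalar value stored as a little-endian `u32`, and no padding is inserted between values.

```rust
/// Append-only encoder for a few primitives (little-endian, no padding).
#[derive(Debug, Default)]
pub struct Writer {
    pub bytes: Vec<u8>,
}

impl Writer {
    /// Assumption: `false` is 0 and `true` is 1.
    pub fn put_bool(&mut self, value: bool) {
        self.bytes.push(value as u8);
    }

    pub fn put_u16(&mut self, value: u16) {
        self.bytes.extend_from_slice(&value.to_le_bytes());
    }

    /// Assumption: the 4-byte `char` is the scalar value as a `u32`.
    pub fn put_char(&mut self, value: char) {
        self.bytes.extend_from_slice(&(value as u32).to_le_bytes());
    }
}

/// Decoder that consumes values from the front of a byte slice. As the format
/// is not self-describing, the caller must ask for the right types in order.
pub struct Reader<'a> {
    bytes: &'a [u8],
}

impl<'a> Reader<'a> {
    pub fn new(bytes: &'a [u8]) -> Self {
        Reader { bytes }
    }

    /// Split off the next `n` bytes, or return `None` if too few remain.
    fn take(&mut self, n: usize) -> Option<&'a [u8]> {
        let bytes: &'a [u8] = self.bytes;
        if bytes.len() < n {
            return None;
        }
        let (head, tail) = bytes.split_at(n);
        self.bytes = tail;
        Some(head)
    }

    pub fn get_bool(&mut self) -> Option<bool> {
        match self.take(1)?[0] {
            0 => Some(false),
            1 => Some(true),
            _ => None,
        }
    }

    pub fn get_u16(&mut self) -> Option<u16> {
        let b = self.take(2)?;
        Some(u16::from_le_bytes([b[0], b[1]]))
    }

    /// Rejects values that are not valid Unicode scalar values.
    pub fn get_char(&mut self) -> Option<char> {
        let b = self.take(4)?;
        char::from_u32(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
    }
}
```

Writing `true`, `0x1234` and `'A'` in that order yields the bytes `01 34 12 41 00 00 00`, and reading them back requires asking for a `bool`, a `u16` and a `char` in the same order.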
Journal
This is just a place I put notes that I make during development.
Building a `rustc` target for Poplar
We want a target in `rustc` for building userspace programs for Poplar. It would be especially cool to get it merged as an upstream Tier-3 target. This documents my progress, mainly as a reference for me to remember how it all works.
How do I actually build and use `rustc`?
A useful baseline invocation for normal use is:
./x.py build -i library/std
The easiest way to test the built `rustc` is to create a `rustup` toolchain (from the root of the Rust repo):
rustup toolchain link poplar build/{host triple}/stage1 # If you built a stage-1 compiler (default with invocation above)
rustup toolchain link poplar build/{host triple}/stage2 # If you built a stage-2 compiler
It's easiest to call your toolchain `poplar`, as this is the name we use in the Makefiles for now.
You can then use this toolchain from Cargo anywhere on the system with:
cargo +poplar build # Or whatever command you need
Using a custom LLVM
- Fork Rust's `llvm-project`
cd src/llvm-project
git remote add my_fork {url to your custom LLVM's repo}
git fetch my_fork
git checkout my_fork/{the correct branch}
cd ..
git add llvm-project
git commit -m "Move to custom LLVM"
Things to change in config.toml
This is as of 2020-09-29 - you need to remember to keep the `config.toml` up-to-date (as it's not checked in upstream), as it can cause confusing errors when it's out-of-date.
- `download-ci-llvm = true` under `[llvm]`. This makes the build much faster, since we don't need a custom LLVM.
- `assertions = true` under `[llvm]`
- `incremental = true` under `[rust]`
- `lld = true` under `[rust]`. Without this, the toolchain can't find `rust-lld` when linking.
- `llvm-tools = true` under `[rust]`. This probably isn't needed, I just chucked it in in case `rust-lld` needs it.
Adding the target
I used a slightly different layout to most targets (which have a base, which creates a `TargetOptions`, and then a target that modifies and uses those options).
- Poplar targets generally need a custom linker script. I added one at `compiler/rustc_target/src/spec/x86_64_poplar.ld`.
- Make a module for the target (I called mine `compiler/rustc_target/src/spec/x86_64_poplar.rs`). Copy from an existing one. Instead of a separate `poplar_base.rs` to create the `TargetOptions`, we do it in the target itself. We `include_str!` the linker script in here, so it's distributed as part of the `rustc` binary.
- Add the target in the `supported_targets!` macro in `compiler/rustc_target/src/spec/mod.rs`.
Adding the target to LLVM
I don't really know my way around the LLVM code base, so this was fairly cobbled together:
- In `llvm/include/llvm/ADT/Triple.h`, add a variant for the OS in the `OSType` enum. I called it `Poplar`. Don't make it the last entry, to avoid having to change the `LastOSType` variant.
- In `llvm/lib/Support/Triple.cpp`, in the function `Triple::getOSTypeName`, add the OS. I added `case Poplar: return "poplar";`.
- In the same file, in the `parseOS` function, add the OS. I added `.StartsWith("poplar", Triple::Poplar)`.
- This file also contains a function, `getDefaultFormat`, that gives the default format for a platform. The default is ELF, so no changes were needed for Poplar, but they might be for another OS.
TIP: When you make a change in the `llvm-project` submodule, you will need to commit these changes, and then update the submodule in the parent repo, or the bootstrap script will check out the old version (without your changes) and build an entire compiler without the changes you are trying to test.
NOTE: to save people from having to switch to our `llvm-project` fork, we don't actually use our LLVM target from `rustc` (yet). I'm not sure why you need per-OS targets in LLVM, as it doesn't even seem to let us do any of the things we wanted to (this totally might just be me not knowing how LLVM targets work).
Notes
- We needed to change the entry point to `_start`, or it silently just doesn't emit any sections in the final image.
- By default, it throws away our `.caps` sections. We need a way to emit them regardless - this is done by manually creating the program header and specifying that they should be kept with `KEEP`. There are two possible solutions that I can see: make `rustc` emit a linker script, or try and introduce these ideas into `llvm`/`lld` with our target (I'm not even sure this is possible).
- It looks like `lld` has no OS-specific code at all, and the only place that specifically-kept sections are added is in the linker script parser. Looks like we might have to actually create a file-based linker script (does literally no one else need to pass a linker script by command line??).
USB
USB has a host that sends requests to devices (devices only respond when asked something). Some devices are dual role devices (DRD) (previously called On-The-Go (OTG) devices), and can dynamically negotiate whether they're the host or the device.
Each device can have one or more interfaces, which each have one or more endpoints. Each endpoint has a hardcoded direction (host-to-device or device-to-host). There are a few types of endpoint (the type is decided during interface configuration):
- Control endpoints are for configuration and control requests
- Bulk endpoints are for bulk transfers
- Isochronous endpoints are for periodic transfers with a reserved bandwidth
- Interrupt endpoints are for transfers triggered by interrupts
The interfaces and endpoints a device has are described by descriptors reported by the device during configuration.
Every device has a special endpoint called `ep0`. It's an in+out control endpoint, and is used to configure the other endpoints.
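The endpoint directions and types listed above are reported in each standard endpoint descriptor: bit 7 of `bEndpointAddress` gives the direction and the low two bits of `bmAttributes` give the transfer type. The sketch below decodes those two bytes; the Rust type and function names are made up for this example and are not taken from Poplar's USB code.

```rust
/// Direction of data flow on an endpoint.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Direction {
    /// "OUT" in USB terminology.
    HostToDevice,
    /// "IN" in USB terminology.
    DeviceToHost,
}

/// The four endpoint (transfer) types.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum EndpointType {
    Control,
    Isochronous,
    Bulk,
    Interrupt,
}

/// Hypothetical decoded view of one endpoint.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Endpoint {
    pub number: u8,
    pub direction: Direction,
    pub ty: EndpointType,
}

/// Decode the `bEndpointAddress` and `bmAttributes` bytes of an endpoint descriptor.
pub fn decode_endpoint(address: u8, attributes: u8) -> Endpoint {
    // Bit 7 of the address is the direction: 1 = IN (device-to-host).
    let direction = if address & 0x80 != 0 {
        Direction::DeviceToHost
    } else {
        Direction::HostToDevice
    };
    // Bits 1..0 of the attributes are the transfer type.
    let ty = match attributes & 0b11 {
        0b00 => EndpointType::Control,
        0b01 => EndpointType::Isochronous,
        0b10 => EndpointType::Bulk,
        _ => EndpointType::Interrupt,
    };
    // Bits 3..0 of the address are the endpoint number.
    Endpoint { number: address & 0x0f, direction, ty }
}
```

For example, `decode_endpoint(0x81, 0x03)` describes endpoint 1 as a device-to-host interrupt endpoint, which is what a typical keyboard or mouse exposes.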
RISC-V
Building OpenSBI
OpenSBI is the reference implementation for the Supervisor Binary Interface (SBI). It's basically how you access M-mode functionality from your S-mode bootloader or kernel.
Firstly, we need a RISC-V C toolchain. On Arch, I installed the `riscv64-unknown-elf-binutils` AUR package. I also tried to install the `riscv64-unknown-elf-gcc` package, but this wouldn't work, so I built OpenSBI with Clang+LLVM instead (from inside `lib/opensbi`):
make PLATFORM=generic LLVM=1
This can be tested on QEMU with:
qemu-system-riscv64 -M virt -bios build/platform/generic/firmware/fw_jump.elf
It also seems like you can build with a platform of `qemu/virt` - I'm not sure what difference this makes yet, but I'm guessing it's the hardware it assumes it needs to drive? Worth exploring. (Apparently the `generic` image does dynamic discovery, I'm assuming from the device tree, so that sounds good for now.)
So the jump firmware (`fw_jump.elf`) jumps to a specified address in memory (apparently QEMU can load an ELF, which would be fine initially). The other option would be a payload firmware, which bundles your code into the SBI image (assuming as a flat binary) and executes it like that.
We should probably make an `xtask` step to build OpenSBI and move it to the `bundled` directory, plus decide what sort of firmware / booting strategy we're going to use. Then the next step would be some Rust code that can print to the serial port, to prove it's all working.
QEMU `virt` memory map
Seems everything is memory-mapped, which makes for a nice change coming from x86's nasty port thingy. This is the `virt` machine's one (from the QEMU source...):
Region | Address | Size |
---|---|---|
Debug | 0x0 | 0x100 |
MROM | 0x1000 | 0x11000 |
Test | 0x100000 | 0x1000 |
CLINT | 0x0200_0000 | 0x10000 |
PCIe PIO | 0x0300_0000 | 0x10000 |
PLIC | 0x0c00_0000 | 0x4000000 |
UART0 | 0x1000_0000 | 0x100 |
Virtio | 0x1000_1000 | 0x1000 |
Flash | 0x2000_0000 | 0x4000000 |
PCIe ECAM | 0x3000_0000 | 0x10000000 |
PCIe MMIO | 0x4000_0000 | 0x40000000 |
DRAM | 0x8000_0000 | {mem size} |
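For quick sanity checks while debugging faults, the table above can be turned into a small lookup. This is just a sketch using the values from the table; `Region`, `VIRT_MAP` and `region_for` are made-up names, and the DRAM size is passed in because it depends on how QEMU was invoked.

```rust
/// One fixed entry of the `virt` memory map.
pub struct Region {
    pub name: &'static str,
    pub base: u64,
    pub size: u64,
}

/// Fixed regions, copied from the table above (DRAM is handled separately).
pub const VIRT_MAP: &[Region] = &[
    Region { name: "Debug", base: 0x0, size: 0x100 },
    Region { name: "MROM", base: 0x1000, size: 0x11000 },
    Region { name: "Test", base: 0x10_0000, size: 0x1000 },
    Region { name: "CLINT", base: 0x0200_0000, size: 0x1_0000 },
    Region { name: "PCIe PIO", base: 0x0300_0000, size: 0x1_0000 },
    Region { name: "PLIC", base: 0x0c00_0000, size: 0x400_0000 },
    Region { name: "UART0", base: 0x1000_0000, size: 0x100 },
    Region { name: "Virtio", base: 0x1000_1000, size: 0x1000 },
    Region { name: "Flash", base: 0x2000_0000, size: 0x400_0000 },
    Region { name: "PCIe ECAM", base: 0x3000_0000, size: 0x1000_0000 },
    Region { name: "PCIe MMIO", base: 0x4000_0000, size: 0x4000_0000 },
];

/// Start of DRAM on the `virt` machine.
pub const DRAM_BASE: u64 = 0x8000_0000;

/// Name of the region containing `addr`, if any. `dram_size` is the amount of
/// RAM given to the guest, in bytes.
pub fn region_for(addr: u64, dram_size: u64) -> Option<&'static str> {
    for region in VIRT_MAP.iter() {
        if addr >= region.base && addr - region.base < region.size {
            return Some(region.name);
        }
    }
    if addr >= DRAM_BASE && addr - DRAM_BASE < dram_size {
        return Some("DRAM");
    }
    None
}
```

With 1GiB of RAM, `region_for(0x8020_0000, 0x4000_0000)` reports `DRAM`, while `0x1000_0100` falls in the gap between UART0 and the Virtio region and reports nothing.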
Getting control from OpenSBI
On QEMU, we can get control from OpenSBI by linking a binary at `0x80200000`, and then using `-kernel` to automatically load it at the right location. OpenSBI will then jump to this location with the HART ID in `a0` and a pointer to the device tree in `a1`.
However, this does make setting paging up slightly icky, as has been a problem on other architectures. Basically, the first binary needs to be linked at a low address with bare translation, and then we need to construct page tables and enable translation, then jump to a higher address. I'm thinking we might as well do it in two stages: a Seed stage that loads the kernel and early tasks from the filesystem/network/whatever, builds the kernel page tables, and then enters the kernel and can be unloaded at a later date. The kernel can then be linked normally at its high address without faffing around with a bootstrap or anything.
The device tree
So the device tree seems to be a data structure passed to you that tells you about the hardware present / memory etc. Hopefully it's less gross than ACPI eh. Repnop has written a crate, `fdt`, so I think we're just going to use that.
So `fdt` seems to work fine, we can list memory regions etc. The only issue seems to be that `memory_reservations` doesn't return anything, which is kind of weird. There also seems to be a `/reserved-memory` node, but this suggests that this doesn't include stuff we want like which part of memory OpenSBI resides in.
This issue says Linux just assumes it shouldn't touch anything before it was loaded. I guess we could use the same approach, reserving the memory used by Seed via linker symbols, and then seeing where the `loader` device gets put to reserve the ramdisk, but the issue was closed saying OpenSBI now does the reservations correctly which would be cleaner, but doesn't stack up with what we're seeing.
Ah so actually, `/reserved-memory` does seem to have some of what we need. On QEMU there is one child node, called `mmode_resv@80000000`, which would fit with being the memory OpenSBI is in. We would still need to handle the memory we're in, and idk what happens with the `loader` device yet, but it's a start. Might be worth talking to repnop about whether the crate should use this node.
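Whichever source the reservations end up coming from (`/reserved-memory`, linker symbols for Seed, or the ramdisk location), the next step is the same: subtract them from the memory regions to find what is actually free. A minimal sketch of that subtraction follows. `PhysRange` and `usable` are hypothetical names, and the sizes in the usage note are made-up example values, not what OpenSBI really reserves.

```rust
/// Half-open physical address range `[start, end)`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct PhysRange {
    pub start: u64,
    pub end: u64,
}

/// Return the parts of `memory` that are not covered by any reserved range.
/// Reserved ranges may be unsorted, may overlap, and may stick out of `memory`.
pub fn usable(memory: PhysRange, reserved: &[PhysRange]) -> Vec<PhysRange> {
    let mut reserved = reserved.to_vec();
    reserved.sort_by_key(|r| r.start);

    let mut free = Vec::new();
    // Everything below `cursor` has already been classified.
    let mut cursor = memory.start;
    for r in reserved {
        if r.end <= cursor {
            continue; // Entirely below what we've already handled.
        }
        if r.start >= memory.end {
            break; // This and all later reservations lie above the region.
        }
        if r.start > cursor {
            free.push(PhysRange { start: cursor, end: r.start });
        }
        cursor = r.end;
        if cursor >= memory.end {
            break;
        }
    }
    if cursor < memory.end {
        free.push(PhysRange { start: cursor, end: memory.end });
    }
    free
}
```

For example, with 1GiB of DRAM at `0x8000_0000`, a firmware reservation of `0x8000_0000..0x8004_0000` and a Seed image at `0x8020_0000..0x8030_0000`, the free ranges come out as `0x8004_0000..0x8020_0000` and `0x8030_0000..0xc000_0000`.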
Dumb way to load the kernel for now
So for some reason, `fw_cfg` doesn't seem to be working on QEMU 7.1. This is what we were gonna use for loading the kernel, command line, and user programs, etc. but obvs this is not possible atm. For now, as a workaround, we can use the `loader` device to load arbitrary data and files into memory.
I'm thinking we could use `0x1_0000_0000` as the base physical address for this - this gives us a maximum of 2GiB of DRAM, which seems plenty for now (famous last words). We'll need to know the size of the object we're loading on the guest-side, so we'll load that separately for now (in the future, this whole scheme could be extended to some sort of mini-filesystem).
Okay so the `loader` device is pretty finicky, and has no error handling. Turns out you can't define new memory with it, just load values into RAM, but it doesn't actually tell you this has failed. You then try and read this on the guest, and get super weird UB from doing so - it doesn't just fault or whatever, it seems to break code before you ever read the memory (super weird ngl, didn't stick around to work out what was going on).
Right, seems to be working much better by actually putting the values in RAM. We've extended RAM to 1GiB (`0x8000_0000..0xc000_0000`) and we'll use this as the new layout:
Address | Description | Size (bytes) |
---|---|---|
0xb000_0000 | Size of Data | 4 |
0xb000_0004 | Data | N |
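On the guest side, reading this layout amounts to a 4-byte size followed by that many bytes of data. The sketch below models it over a byte slice standing in for memory starting at `0xb000_0000`, so it can be tried on a host machine. `read_blob` is a hypothetical helper, and the little-endian size field is an assumption (it matches RISC-V's byte order, but depends on how the value is handed to the `loader` device).

```rust
/// Physical address where the size field is loaded (from the table above).
pub const BLOB_BASE: u64 = 0xb000_0000;
/// The data starts straight after the 4-byte size field.
pub const DATA_OFFSET: usize = 4;

/// Extract the loaded data from `window`, a slice standing in for the guest
/// memory that starts at `BLOB_BASE`. Returns `None` if the window is too
/// small to hold the size field or the amount of data it claims.
pub fn read_blob(window: &[u8]) -> Option<&[u8]> {
    if window.len() < DATA_OFFSET {
        return None;
    }
    // Assumption: the size is stored little-endian.
    let size = u32::from_le_bytes([window[0], window[1], window[2], window[3]]) as usize;
    let end = DATA_OFFSET.checked_add(size)?;
    window.get(DATA_OFFSET..end)
}
```

On real hardware the slice would be built from a pointer to `BLOB_BASE` once that range is known to be mapped; checking the claimed size against the end of RAM before doing so avoids reading past `0xc000_0000`.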