Introduction
Poplar is a general-purpose operating system built around a microkernel and userspace written in Rust. Drivers and core services that would ordinarily be implemented as part of a traditional monolithic kernel are instead implemented as unprivileged userspace programs.
Poplar is not a UNIX, and does not aim for binary or source-level compatibility with existing programs. While this does slow development down significantly, it gives us the opportunity to design the interfaces we provide from scratch.
Poplar is targeted to run on small computers (think an SoC with ~1GiB of RAM and a few peripherals) and larger general purpose machines (think a many-core x86_64 "PC"). It is specifically not designed for small embedded systems - other projects are more suited to this space. Currently, Poplar supports relatively modern x86_64 and 64-bit RISC-V (RV64GC) machines.
The Poplar Kernel
At the core of Poplar is a small Rust microkernel. The kernel's job is to multiplex access to hardware resources (such as CPU time, memory, and peripherals) between competing bits of userspace code.
Poplar is a microkernel because its drivers (pieces of code for managing the many different devices a computer may have) live in userspace, and are relatively unprivileged compared to the kernel. This provides safety benefits over a monolithic kernel because a misbehaving or malicious driver (supplied by a hardware vendor, for example) has a much more limited scope of operation. The disadvantage is that microkernels tend to be slower than monolithic kernels, due to increased overheads of communication between the kernel and userspace.
Kernel objects
The Poplar microkernel is object-based - resources managed by the kernel are represented as discrete 'objects', and are interacted with from userspace via plain integers called 'handles'. Multiple handles referring to a single object can exist, and each possesses a set of permissions that dictate how the owning task can interact with the object.
Kernel objects are used for:
- Task creation and management (e.g. AddressSpace and Task)
- Access to hardware resources (e.g. MemoryObject)
- Message passing between tasks (e.g. Channel)
- Signaling and waiting (e.g. Event)
Kernel Objects
Kernel objects represent resources that are managed by the kernel, that userspace tasks may want to interact with through system calls.
Handles
Kernel objects are referenced from a userspace task using handles. From userspace, handles are opaque 32-bit integers, and are associated within the kernel to kernel objects through a per-task mapping.
A handle of value 0 is never associated with a kernel object, and can act as a sentinel value - various system calls use this value for various meanings.
Each handle is associated with a series of permissions that dictate what the owning userspace task can do with the corresponding object. Some permissions are relevant to all types of kernel object, while others have meanings specific to the type of object the handle is associated with.
Permissions (TODO: expand on these):
- Clone (create a new handle to the referenced kernel object)
- Destroy (destroy the handle, destroying the object if no other handle references it)
- Send (send the handle over a Channel to another task)
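As a rough illustration of the per-task mapping described above, the following is a minimal userspace model of a handle table. It is a sketch, not the kernel's implementation: the `HandleTable` type, the `PERM_*` bit values, and the use of a plain `u64` object ID are all assumptions made for the example, and object lifetimes are not modelled.

```rust
use std::collections::BTreeMap;

/// A handle as seen from userspace: an opaque 32-bit integer.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct Handle(pub u32);

/// The zero handle is never associated with a kernel object.
pub const ZERO_HANDLE: Handle = Handle(0);

// Hypothetical permission bits - the real encoding is not specified here.
pub const PERM_CLONE: u8 = 1 << 0;
pub const PERM_DESTROY: u8 = 1 << 1;
pub const PERM_SEND: u8 = 1 << 2;

struct Entry {
    object_id: u64, // stand-in for a reference to a kernel object
    permissions: u8,
}

/// Model of the per-task mapping from handles to kernel objects.
pub struct HandleTable {
    next: u32,
    entries: BTreeMap<Handle, Entry>,
}

impl HandleTable {
    pub fn new() -> Self {
        // Start allocating at 1 so that 0 is never handed out.
        HandleTable { next: 1, entries: BTreeMap::new() }
    }

    pub fn insert(&mut self, object_id: u64, permissions: u8) -> Handle {
        let handle = Handle(self.next);
        self.next += 1;
        self.entries.insert(handle, Entry { object_id, permissions });
        handle
    }

    pub fn lookup(&self, handle: Handle) -> Option<u64> {
        self.entries.get(&handle).map(|entry| entry.object_id)
    }

    /// Create a second handle to the same object, if `Clone` is permitted.
    pub fn clone_handle(&mut self, handle: Handle) -> Option<Handle> {
        let entry = self.entries.get(&handle)?;
        if entry.permissions & PERM_CLONE == 0 {
            return None;
        }
        let (object_id, permissions) = (entry.object_id, entry.permissions);
        Some(self.insert(object_id, permissions))
    }

    /// Remove a handle from the table, if `Destroy` is permitted.
    pub fn destroy(&mut self, handle: Handle) -> bool {
        match self.entries.get(&handle) {
            Some(entry) if entry.permissions & PERM_DESTROY != 0 => {
                self.entries.remove(&handle);
                true
            }
            _ => false,
        }
    }
}
```

Because allocation starts at 1, looking up the zero handle always fails, which is what lets it act as a sentinel.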
Address Space
TODO
Memory Object
TODO
Task
TODO
Channel
TODO
Event
TODO
System calls
Userspace code can interact with the kernel through system calls. Poplar's system call interface is based around 'kernel objects', and so many of the system calls are to create, destroy, or modify the state of various types of kernel object. Because of Poplar's microkernel design, many traditional system calls (e.g. open) are not present, their functionality instead being provided by userspace.
Each system call has a unique number that is used to identify it. A system call can then take up to five parameters, each at most the width of a register on the system. It can return a single value, also the size of a register.
Overview of system calls
Number | System call | Description |
---|---|---|
0 | yield | Yield to the kernel. |
1 | early_log | Log a message. Designed to be used from early processes. |
3 | create_memory_object | Create a MemoryObject kernel object. |
4 | map_memory_object | Map a MemoryObject into an AddressSpace. |
5 | create_channel | Create a channel, returning handles to the two ends. |
6 | send_message | Send a message down a channel. |
7 | get_message | Receive the next message, if there is one. |
8 | wait_for_message | Yield to the kernel until a message arrives on the given channel. |
12 | wait_for_event | Yield to the kernel until an event is signalled |
13 | poll_interest | Poll a kernel object to see if changes need to be processed. |
14 | create_address_space | Create an AddressSpace kernel object. |
15 | spawn_task | Create a Task kernel object and start scheduling it. |
Deprecated:
Number | System call | Description |
---|---|---|
2 | get_framebuffer | Get the framebuffer that the kernel has created, if it has. |
9 | register_service | Register yourself as a service. |
10 | subscribe_to_service | Create a channel to a particular service provider. |
11 | pci_get_info | Get information about the PCI devices on the platform. |
Making a system call on x86_64
To make a system call on x86_64, populate these registers:
rdi | rsi | rdx | r10 | r8 | r9 |
---|---|---|---|---|---|
number | a | b | c | d | e |
The only way in which these registers deviate from the x86_64 Sys-V ABI is that c is passed in r10 instead of rcx, because rcx is used by the syscall instruction. You can then make the system call by executing syscall. Before the kernel returns to userspace, it will put the result of the system call (if there is one) in rax. If a system call takes fewer than five parameters, the unused parameter registers will be preserved across the system call.
Making a system call on RISC-V
TODO
Return values
Often, a system call will need to return a status, plus one or more handles. The first handle a system call needs to return (often the only handle returned) can be returned in the upper bits of the status value:
- Bits 0..32 contain the status:
  - 0 means that the system call succeeded, and the rest of the return value is valid
  - >0 means that the system call errored. The meaning of the value is system-call specific.
- Bits 32..64 contain the value of the first returned handle, if applicable
A return value of 0xffffffffffffffff (the maximum value of u64) is reserved for when a system call is made with a number that does not correspond to a system call. This is defined as a normal error code (as opposed to, for example, terminating the task that tried to make the system call) to provide a mechanism for tasks to detect kernel support for a system call (so they can use a fallback method on older kernels, for example).
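The layout above can be unpacked with a few shifts and masks. This is a minimal sketch assuming a 64-bit register width; the `SyscallResult` type and `decode_result` function are names invented for the example, not part of any Poplar library.

```rust
/// Decoded form of a raw system call return value.
#[derive(Debug, PartialEq, Eq)]
pub enum SyscallResult {
    /// Status was 0. `handle` holds bits 32..64 (0 if no handle was returned).
    Success { handle: u32 },
    /// Status was non-zero; its meaning is specific to the system call made.
    Error { status: u32 },
    /// The reserved all-ones value: the number did not name a system call.
    NoSuchSyscall,
}

pub fn decode_result(raw: u64) -> SyscallResult {
    // Check the reserved value first, as it would otherwise look like an error status.
    if raw == u64::MAX {
        return SyscallResult::NoSuchSyscall;
    }

    let status = (raw & 0xffff_ffff) as u32; // bits 0..32
    let handle = (raw >> 32) as u32; // bits 32..64

    if status == 0 {
        SyscallResult::Success { handle }
    } else {
        SyscallResult::Error { status }
    }
}
```

A caller probing for kernel support can match on `NoSuchSyscall` and fall back to another method, as described above.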
Syscall: yield
Yield to the kernel. Generally called when a userspace task has no more useful work to perform.
- Parameters:
  - None
- Returns:
  - Always 0
Syscall: early_log
Output a line to the kernel log. This is generally used by tasks early in the boot process, before reliable userspace logging is running, but could also be used by small userspaces for diagnostic logging. The output to be logged must be provided as a formatted string encoded as UTF-8.
- Parameters:
  - a: the length of the string to log, in bytes. Max of 4096 bytes.
  - b: the address of the string to log.
- Returns:
  - 0: success
  - 1: the length supplied is too large
  - 2: the supplied string is not valid UTF-8
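The checks implied by these return values can be modelled as below. This is only a sketch of the documented behaviour: the function name is invented, and the order in which the two checks are made is an assumption.

```rust
/// Maximum length of a logged string, in bytes, as documented above.
pub const MAX_LOG_LENGTH: usize = 4096;

/// Returns the status `early_log` would be expected to produce for the given bytes.
pub fn early_log_status(bytes: &[u8]) -> u64 {
    // Assumption: the length is checked before the encoding.
    if bytes.len() > MAX_LOG_LENGTH {
        return 1; // the length supplied is too large
    }
    match core::str::from_utf8(bytes) {
        Ok(_) => 0,  // success
        Err(_) => 2, // the supplied string is not valid UTF-8
    }
}
```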
Syscall: create_memory_object
Create a MemoryObject kernel object. Userspace can only create "blank" memory objects, backed by free, conventional physical memory.
- Parameters:
  - a: the length of the memory object, in bytes
  - b: flags:
    - Bit 0: set if the memory should be writable
    - Bit 1: set if the memory should be executable
  - c: an address to which the kernel will write the physical address to which the memory object was allocated. Not written if null.
- Returns:
  - 0: success
  - 1: the given set of flags is invalid
  - 2: a memory area of the requested size could not be allocated
  - 3: the address in c is not null, but is not valid
Syscall: map_memory_object
Map a MemoryObject into an AddressSpace.
- Parameters:
  - a: the handle of the MemoryObject
  - b: the handle of the AddressSpace. A zero handle indicates that the memory object should be mapped into the task's address space.
  - c: the virtual address to map the memory object at. Null indicates that the kernel should attempt to find a region in the address space large enough to hold the memory object and map it there.
  - d: a pointer to which the virtual address the memory object has been mapped to is written, if c is null. If d is null, this address is not written.
- Returns:
  - 0: success
  - 1: the handle to the MemoryObject is invalid or does not point to a MemoryObject
  - 2: the handle to the AddressSpace is invalid or does not point to an AddressSpace
  - 3: the region of the address space that would be mapped is already occupied
  - 4: the supplied pointer in d is invalid
Syscall: create_channel
Create a new channel, returning handles to two Channel objects, each representing an end of the channel. Generally, one of these handles is sent to another task to facilitate IPC.
- Parameters:
  - a: the address to write the second handle to (only one can be returned in the status)
- Returns:
  - Status in bits 0..32:
    - 0: success
    - 1: the virtual address to write the second handle to is invalid
  - Handle to first end in bits 32..64
TODO: we could pack both handles into the return value by using a sentinel 0 handle to mark that the other handle is actually an error?
Syscall: send_message
Send a message, consisting of a number of bytes and optionally a number of handles, down a Channel. All the handles are removed from the sending Task and added to the receiving Task.
A maximum of 4 handles can be transferred by each message. The maximum number of bytes is currently 4096.
- Parameters:
  - a: the handle to the Channel from which the message is to be sent
  - b: a pointer to the array of bytes to send
  - c: the length of the message, in bytes
  - d: a pointer to the array of handle entries to transfer. If the message does not transfer any handles, this should be 0x0
  - e: the number of handles to transfer
- Returns:
  - 0 if the system call succeeded and the message was sent
  - 1 if the Channel handle is invalid
  - 2 if the Channel handle does not point to a Channel
  - 3 if the Channel handle does not have the correct rights to send messages
  - 4 if one or more of the handles to transfer is invalid
  - 5 if any of the handles to transfer do not have the correct rights
  - 6 if the pointer to the message bytes was not valid
  - 7 if the message's byte array is too large
  - 8 if the pointer to the handles array was not valid
  - 9 if the handles array is too large
  - 10 if the other end of the Channel has been disconnected
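One way userspace might wrap these status codes is to map them onto a Rust error type. The sketch below does this for the values listed above; the enum, its variant names, and the helper functions are invented for illustration and are not taken from Poplar's libraries.

```rust
/// Limits on a single message, as documented above.
pub const MAX_MESSAGE_BYTES: usize = 4096;
pub const MAX_MESSAGE_HANDLES: usize = 4;

/// Errors that `send_message` can report (variant names are hypothetical).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum SendMessageError {
    InvalidChannelHandle,
    NotAChannel,
    ChannelCannotSend,
    InvalidTransferredHandle,
    TransferredHandleMissingRights,
    BytesAddressInvalid,
    BytesTooLarge,
    HandlesAddressInvalid,
    TooManyHandles,
    OtherEndDisconnected,
    /// A status this sketch does not know about.
    Unknown(u64),
}

/// Convert a raw `send_message` status into a `Result`.
pub fn send_message_result(status: u64) -> Result<(), SendMessageError> {
    use SendMessageError::*;
    match status {
        0 => Ok(()),
        1 => Err(InvalidChannelHandle),
        2 => Err(NotAChannel),
        3 => Err(ChannelCannotSend),
        4 => Err(InvalidTransferredHandle),
        5 => Err(TransferredHandleMissingRights),
        6 => Err(BytesAddressInvalid),
        7 => Err(BytesTooLarge),
        8 => Err(HandlesAddressInvalid),
        9 => Err(TooManyHandles),
        10 => Err(OtherEndDisconnected),
        other => Err(Unknown(other)),
    }
}

/// Cheap check a sender can make before issuing the system call.
pub fn fits_in_one_message(num_bytes: usize, num_handles: usize) -> bool {
    num_bytes <= MAX_MESSAGE_BYTES && num_handles <= MAX_MESSAGE_HANDLES
}
```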
Syscall: get_message
Receive a message from a Channel, if one is waiting to be received.
A maximum of 4 handles can be transferred by each message. The maximum number of bytes is currently 4096.
- Parameters:
  - a: the handle to the Channel end that is receiving the message.
  - b: a pointer to the array of bytes to write the message to
  - c: the maximum number of bytes the kernel should attempt to write to the buffer at b
  - d: a pointer to the array of handle entries to transfer. This can be 0x0 if the receiver does not expect to receive any handles.
  - e: the maximum number of handles the kernel should attempt to write to the array at d.
- Returns:
  - Status in bits 0..16:
    - 0 if the message was received successfully. The rest of the return value is valid.
    - 1 if the Channel handle is invalid.
    - 2 if the Channel handle does not point to a Channel.
    - 3 if there was no message to receive.
    - 4 if the address of the bytes buffer is invalid.
    - 5 if the bytes buffer is too small to contain the message.
    - 6 if the address of the handles buffer is invalid, or if 0x0 was passed and the message does contain handles.
    - 7 if the handles buffer is too small to contain the handles transferred with the message.
  - The length of the message in bits 16..32
    - This is only valid for statuses of 0
  - The number of handles transferred in bits 32..48
    - This is only valid for statuses of 0
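Note that get_message packs its return value differently from the general scheme, using three 16-bit fields. A minimal sketch of unpacking it follows; the `ReceivedMessage` type and function name are assumptions made for the example.

```rust
/// Sizes reported by a successful `get_message` call.
#[derive(Debug, PartialEq, Eq)]
pub struct ReceivedMessage {
    /// Length of the message in bytes (bits 16..32 of the return value).
    pub num_bytes: u16,
    /// Number of handles transferred (bits 32..48 of the return value).
    pub num_handles: u16,
}

/// Unpack a raw `get_message` return value, yielding the status on failure.
pub fn decode_get_message(raw: u64) -> Result<ReceivedMessage, u16> {
    let status = (raw & 0xffff) as u16; // bits 0..16
    if status != 0 {
        // The other fields are only valid for a status of 0.
        return Err(status);
    }
    Ok(ReceivedMessage {
        num_bytes: ((raw >> 16) & 0xffff) as u16,
        num_handles: ((raw >> 32) & 0xffff) as u16,
    })
}
```

The reserved "no such system call" value of all ones decodes here as an error with status 0xffff.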
Syscall: wait_for_message
TODO
Syscall: wait_for_event
TODO
Syscall: poll_interest
TODO
Syscall: create_address_space
TODO
Syscall: spawn_task
TODO
Platforms
A platform is a build target for the kernel. In some cases, there is only one platform for an entire architecture because the hardware is relatively standardized (e.g. x86_64). Other times, hardware is different enough between platforms that it's easier to treat them as different targets (e.g. a headless ARM server that boots using UEFI, versus a Raspberry Pi).
All supported platforms are enumerated in the table below - some have their own sections with more details, while others are just described below. The platform you want to build for is specified in your Poplar.toml configuration file, or with the -p/--platform flag to xtask. Some platforms also have custom xtask commands to, for example, flash a device with a built image.
Platform name | Arch | Description |
---|---|---|
x64 | x86_64 | Modern x86_64 platform. |
rv64_virt | RV64 | A virtual RISC-V QEMU platform. |
mq_pro | RV64 | The MangoPi MQ-Pro RISC-V platform. |
Platform: x64
The vast majority of x86_64 hardware is pretty similar, and so is treated as a single platform. It uses the hal_x86_64 HAL. We assume that the platform:
- Boots using UEFI (using seed_uefi)
- Supports the APIC
- Supports the xsave instruction
Platform: rv64_virt
This is a virtual RISC-V platform emulated by qemu-system-riscv64's virt machine. It features:
- A customizable number of emulated RV64 HARTs
- Booting via QEMU's -kernel option and OpenSBI
- A Virtio block device with attached GPT 'disk'
- Support for USB devices via EHCI
Devices such as the EHCI USB controller are connected to a PCIe bus, and so we use the Advanced Interrupt Architecture with MSIs to avoid the complexity of shared pin-based PCI interrupts. This is done by passing the aia=aplic-imsic machine option to QEMU.
MangoPi MQ-Pro
The MangoPi MQ-Pro is a small RISC-V development board, featuring an Allwinner D1 SoC with a single RV64 core, and either 512MiB or 1GiB of memory. Most public information about the D1 itself can be found on the Sunxi wiki.
You're probably going to want to solder a GPIO header to the board and get a USB-UART adaptor as well. The xtask contains a small serial utility for logging the output from the board, but you can also use an external program such as minicom.
Adding a male-to-female jumper wire to a ground pin is also useful - you can touch it to the RST pad on the back of the board to reset it (allowing you to FEL new code onto it).
Boot procedure
The D1 can be booted from an SD card or flash, or, usefully for development, using Allwinner's FEL protocol, which allows data to be loaded into memory and code executed using a small USB stack. This procedure is best visualised with a diagram:
The initial part of this process is done by code loaded from the BROM (Boot ROM) - it contains the FEL stack, as well as enough code to load the first-stage bootloader from either an SD card or SPI flash. Data loaded by the FEL stack, or from the bootable media, is loaded into SRAM. The DRAM has to be brought up, either by the first-stage bootloader, or by a FEL payload.
Booting via FEL
To boot with FEL, the MQ-Pro needs to be plugged into the development machine via the USB-C OTG port (not the host port), and then booted into FEL mode. The easiest way to do this is to just remove the SD card - as long as the flash hasn't been written to, this should boot into FEL. It should then enumerate as a USB device with ID 1f3a:efe8 on the host.
You then need something that can talk the FEL protocol on the host. We're currently using xfel, but this may be replaced/augmented with a more capable solution in the future. xfel should be relatively easy to compile and install on a Linux system, and should install some udev rules that allow it to be used by normal users. xfel should then automatically detect a connected device in FEL mode.
The first step is to initialize the DRAM - xfel ddr d1 does this by uploading and running a small payload with the correct procedures. After this, further code can be loaded directly into DRAM - we load OpenSBI and Seed.
Then, we load OpenSBI's FW_JUMP firmware at the start of RAM, 0x4000_0000. This provides the SBI interface, moves from M-mode to S-mode, and then jumps into Seed, which is loaded 512KiB after it, at 0x4008_0000 (this address is supplied to OpenSBI at build-time). We also bundle a device tree for the platform into OpenSBI, which it uses to bootstrap the platform, and then supplies it onwards.
TODO: we should investigate customising the driver list to maybe get OpenSBI under 256KiB (it's just over).
Seed
Seed is Poplar's bootloader ± pre-kernel. What it is required to do varies by platform, but generally it is responsible for bringing up the system, loading the kernel and initial tasks into memory, and preparing the environment for executing the kernel.
x86_64
On x86_64, Seed is a UEFI executable that utilises boot services to load the kernel and initial tasks. The Seed executable, the kernel, and other files are all held in the EFI System Partition (ESP) - a FAT filesystem present in all UEFI-booted systems.
riscv
On RISC-V, Seed is more of a pre-kernel than a traditional bootloader. It is booted into by the system firmware, and then has its own set of drivers to load the kernel and other files from the correct filesystem, or elsewhere.
The boot mechanism has not yet been fully designed for RISC-V, and will also depend heavily on the hardware target, as booting different platforms is much less standardised than on x86_64.
Debugging the kernel
Kernels can be difficult to debug - this page tries to collect useful techniques for debugging kernels in general, and also any Poplar specific things that might be useful.
Poplar specific: the breakpoint exception
The breakpoint exception is useful for inspecting the contents of registers at specific points, such as in sections of assembly (where it's inconvenient to call into Rust, or to use a debugger because getting global_asm! to play nicely with GDB is a pain).
Simply use the int3 instruction:
...
mov rsp, [gs:0x10]
int3 // Is my user stack pointer correct?
sysretq
Building OVMF
Building a debug build of OVMF isn't too hard (from the base of the edk2 repo):
OvmfPkg/build.sh -a X64
By default, debug builds of OVMF will output debugging information on the ISA debugcon, which is actually probably nicer for our purposes than most builds, which pass DEBUG_ON_SERIAL_PORT during the build. To log the output to a file, you can pass -debugcon file:ovmf_debug.log -global isa-debugcon.iobase=0x402 to QEMU.
Message Passing
Poplar has a kernel object called a Channel for providing first-class message passing support to userspace. Channels move packets, called "messages", which contain a stream of bytes, and optionally one or more handles that are transferred from the sending task to the receiving task.
Ptah
Channels can move arbitrary bytes, but Poplar also includes a layer on top of Channels called Ptah, which consists of a data model and wire format suitable for encoding data which can be serialized and deserialized from any sensible language without too much difficulty.
Ptah is used for IPC between tasks running in userspace, and also for more complex communication between the kernel and userspace.
Ptah is heavily inspired by Serde, and the first implementation of Ptah was actually a Serde data format. Unfortunately, it made properly handling Poplar handles very difficult - when a handle is sent over a channel, it needs to be put into a separate array, and the in-line data replaced by an index into that array. When the message travels over a task boundary, the kernel examines and replaces each handle in this array with a handle to the same kernel object in the new task. This effectively means we need to add a new Handle type to our data model, which is not easily possible with Serde (and would make it incompatible with standard Serde serializers anyway).
The Ptah Data Model
The Ptah data model maps pretty well to the Rust type system, and relatively closely to the Serde data model. Key differences are some stronger guarantees about the encoding of types such as enums (the data model only needs to fit a single wire format, and so can afford to be less flexible than Serde's), and the lack of a few types - unit-based types, and the statically-sized versions of seq and map (tuple and struct). Ptah is not a self-describing format (i.e. the type you're trying to (de)serialize must be known at both ends), so the elements of structs and tuples can simply be serialized in the order they appear, and then deserialized in order at the other end.
- Primitive types
  - bool
  - u8, u16, u32, u64, u128
  - i8, i16, i32, i64, i128
  - f32, f64
  - char
- string
  - Encoded as a seq of u8, but with the additional requirement that it is valid UTF-8
  - Not null terminated, as seq includes explicit length
- option
  - Encoded in the same way as an enum, but separately for the benefit of languages without proper enums
  - Either None or Some({value})
- enum
  - Include a tag, and optionally some data
  - Represent a Rust enum, or a tagged union in languages without proper enums
  - The data is encoded separately to the tag, and can be of any other Ptah type:
    - Rust tuple variants (e.g. E::A(u8, u32)) are represented by tuple
    - Rust struct variants (e.g. E::B { foo: u8, bar: u32 }) are represented by struct
- seq
  - A variable-length sequence of values, mapping to types such as arrays and Vec<T>.
- map
  - A variable-length series of key-value pairings, mapping to collections like BTreeMap<K, V>.
- handle
  - A marker in the data stream that a handle to a kernel object is being moved across the channel. The handle itself is encoded out-of-band.
  - This allows the kernel, or something else handling Ptah-encoded data, to process the handle
  - Handles being first-class in the data model is why Poplar can't readily use something like serde
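To make the out-of-band handling of handles more concrete, here is a small sketch that walks a value from a cut-down version of the data model, moves each handle into a separate array, and leaves an index in its place. The `Value` enum, the `HandleSlot` variant, and the use of a `u8` index are all assumptions made for the example; they are not Ptah's actual types or wire encoding.

```rust
/// A cut-down, hypothetical in-memory form of the Ptah data model.
#[derive(Clone, Debug, PartialEq)]
pub enum Value {
    Bool(bool),
    U32(u32),
    String(String),
    Option(Option<Box<Value>>),
    Seq(Vec<Value>),
    /// A handle owned by the sending task.
    Handle(u32),
    /// In-line marker left behind once the handle has been moved out-of-band.
    HandleSlot(u8),
}

/// Replace every `Handle` in `value` with an index into `handles`.
pub fn extract_handles(value: Value, handles: &mut Vec<u32>) -> Value {
    match value {
        Value::Handle(handle) => {
            let slot = handles.len() as u8;
            handles.push(handle);
            Value::HandleSlot(slot)
        }
        Value::Option(Some(inner)) => {
            Value::Option(Some(Box::new(extract_handles(*inner, handles))))
        }
        Value::Seq(items) => Value::Seq(
            items
                .into_iter()
                .map(|item| extract_handles(item, handles))
                .collect(),
        ),
        // Everything else carries no handles and passes through unchanged.
        other => other,
    }
}
```

After this pass, the kernel only needs to look at the separate array to translate handles into the receiving task; the in-line data does not have to be parsed.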
The Ptah Wire Format
The wire format describes how messages can be encoded into a stream of bytes suitable for transmission over a channel, or over another transport layer such as the network or a serial port.
Primitives are transmitted as little-endian and packed to their natural alignment. The following primitive types are recognised:
Name | Size (bytes) | Description |
---|---|---|
bool | 1 | A boolean value |
u8 , u16 , u32 , u64 , u128 | 1, 2, 4, 8, 16 | An unsigned integer |
i8 , i16 , i32 , i64 , i128 | 1, 2, 4, 8, 16 | A signed integer |
f32 , f64 | 4, 8 | Single / double-precision IEEE-754 FP values |
char | 4 | A single UTF-8 Unicode scalar value |
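As a small illustration of the little-endian rule, the helpers below append a few primitives to a byte buffer using the standard library's `to_le_bytes`. The function names are invented, and two details are assumptions rather than documented behaviour: `true` is written as the byte 1, and no alignment padding is inserted between values.

```rust
/// Append a bool as a single byte (assumption: 1 for true, 0 for false).
pub fn put_bool(out: &mut Vec<u8>, value: bool) {
    out.push(value as u8);
}

/// Append a u16 in little-endian byte order.
pub fn put_u16(out: &mut Vec<u8>, value: u16) {
    out.extend_from_slice(&value.to_le_bytes());
}

/// Append a u32 in little-endian byte order.
pub fn put_u32(out: &mut Vec<u8>, value: u32) {
    out.extend_from_slice(&value.to_le_bytes());
}

/// Append an i64 in little-endian byte order (two's complement).
pub fn put_i64(out: &mut Vec<u8>, value: i64) {
    out.extend_from_slice(&value.to_le_bytes());
}

/// Append an f64 as its IEEE-754 bit pattern, little-endian.
pub fn put_f64(out: &mut Vec<u8>, value: f64) {
    out.extend_from_slice(&value.to_le_bytes());
}
```

For example, writing the u32 value 0x12345678 appends the bytes 0x78, 0x56, 0x34, 0x12, least-significant byte first.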
TODO: rest of the wire format
Poplar's userspace
Poplar supports running programs in userspace on supporting architectures. This offers increased protection and separation compared to running code in kernelspace - as a microkernel, Poplar tries to run as much code in userspace as possible.
Building a program for Poplar's userspace
Currently, the only officially supported language for writing userspace programs is Rust.
Target
Poplar provides custom rustc target files for userspace programs. These are found in the user/{arch}_poplar.toml files.
Standard library
Poplar provides a Rust crate, called std, which replaces Rust's standard library. We've done this for a few reasons:
- We originally had targets and a std port in a fork of rustc. This proved difficult to maintain and required users to build a custom Rust fork and add it as a rustup toolchain. This is a high barrier of entry for anyone wanting to try Poplar out.
- Poplar's ideal standard library probably won't end up looking very similar to other platforms', as there are significant ideological differences in how programs should interact with the OS. This is unfortunate from a porting point of view, but does allow us to design the platform interface from the ground up.
The name of the crate is slightly unfortunate, but is required, as rustc uses the name of the crate to decide where to import the prelude from. This significantly increases the ergonomics we can provide, so is worth the tradeoff.
The std crate does a few important things that are worth understanding to reduce the 'magic' of Poplar's userspace:
- It provides a linker script - the linker script for the correct target is shipped as part of the crate, and then the build script copies it into the Cargo OUT_DIR. It also passes a directive to rustc such that you can simply pass -Tlink.ld to link with the correct script. This is, for example, done using RUSTFLAGS by Poplar's xtask, but you can also pass it manually or with another method, depending on your build system.
- It provides a prelude that should be very similar to the official std prelude
- It provides an entry point to the executable that does required initialisation before passing control to Rust's main function
Capabilities
Capabilities describe what a task is allowed to do, and are encoded in its image. This allows users to audit the permissions of the tasks they run at a much higher granularity than user-based permissions, and also allow us to move parts of the kernel into discrete userspace tasks by creating specialised capabilities to allow access to sensitive resources (such as the raw framebuffer) to only select tasks.
Encoding capabilities in the ELF image
Capabilities are encoded in an entry of a PT_NOTE segment of the ELF image of a task. This entry will have an owner (sometimes referred to in documentation as the 'name') of POPLAR and a type of 0. The descriptor will be an encoding of the capabilities as described by the 'Format' section. The descriptor must be padded such that the next descriptor is 4-byte aligned, and so a value of 0x00 is reserved to be used as padding.
Initial images (tasks loaded by the bootloader before filesystem drivers are working) are limited to a capabilities encoding of 32 bytes (given the variable-length encoding, this does not equate to a fixed maximum number of capabilities).
Format
The capabilities format is variable-length - simple capabilities can be encoded as a single byte, while more complex / specific ones may need multiple bytes of prefix, and can also encode fixed-length data.
Overview of capabilities
This is an overview of all the capabilities the kernel supports:
First byte | Next byte(s) | Data | Arch specific? | Description |
---|---|---|---|---|
0x00 | - | - | - | No meaning - used to pad descriptor to required length (see above) |
0x01 | - | - | No | GetFramebuffer |
0x02 | - | - | No | EarlyLogging |
0x03 | - | - | No | ServiceProvider |
0x04 | - | - | No | ServiceUser |
0x05 | - | - | No | PciBusDriver |
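The table above can be turned into a small encoder and decoder for the note descriptor. This is a sketch under stated assumptions: the `Capability` enum and function names are invented, only the single-byte capabilities currently listed are handled, and padding the descriptor to a multiple of 4 bytes is one reading of the alignment requirement described earlier.

```rust
/// The capabilities listed in the table above.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Capability {
    GetFramebuffer,
    EarlyLogging,
    ServiceProvider,
    ServiceUser,
    PciBusDriver,
}

impl Capability {
    fn first_byte(self) -> u8 {
        match self {
            Capability::GetFramebuffer => 0x01,
            Capability::EarlyLogging => 0x02,
            Capability::ServiceProvider => 0x03,
            Capability::ServiceUser => 0x04,
            Capability::PciBusDriver => 0x05,
        }
    }
}

/// Encode capabilities into a descriptor, padding with 0x00 to a multiple of 4 bytes.
pub fn encode_capabilities(caps: &[Capability]) -> Vec<u8> {
    let mut bytes: Vec<u8> = caps.iter().map(|cap| cap.first_byte()).collect();
    while bytes.len() % 4 != 0 {
        bytes.push(0x00);
    }
    bytes
}

/// Decode a descriptor, skipping padding. Returns the offending byte if one is unknown.
pub fn decode_capabilities(descriptor: &[u8]) -> Result<Vec<Capability>, u8> {
    let mut caps = Vec::new();
    for &byte in descriptor {
        match byte {
            0x00 => continue, // padding
            0x01 => caps.push(Capability::GetFramebuffer),
            0x02 => caps.push(Capability::EarlyLogging),
            0x03 => caps.push(Capability::ServiceProvider),
            0x04 => caps.push(Capability::ServiceUser),
            0x05 => caps.push(Capability::PciBusDriver),
            other => return Err(other),
        }
    }
    Ok(caps)
}
```

A decoder for multi-byte capabilities would need to consume the prefix and data bytes for each entry rather than treating every byte independently.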
Platform Bus
The Platform Bus is a userspace service designed to be a core part of most Poplar systems.
It manages an abstract "bus" of devices that can be added by userspace bus drivers and consumed by userspace device drivers.
Drivers talk to the Platform Bus via channels obtained by subscribing to the Platform Bus's services, platform_bus.bus_driver and platform_bus.device_driver.
Platform Bus is entirely a userspace concept, and so systems can be built around the Poplar kernel without it. However, many drivers and applications will expect the Platform Bus service to be present, and systems not using it will have to handle many low-level systems, such as PCI device enumeration, themselves, and so it is expected that the vast majority of systems would use Platform Bus as a fundamental building block of their userspace.
Device representation on the Platform Bus
Devices on the Platform Bus can be quite abstract, or represent literal devices that are part of the platform, or plugged in as peripherals. Examples include a framebuffer "device" provided by a driver for a graphics-capable device, a power-management chip built into the platform, and USB devices, respectively.
Devices are described by a series of properties, which are typed pieces of data associated with a label. Device properties are used to identify devices, and are given to every device driver that claims it may be able to drive a device. Handoff properties are only transferred to a driver once it has been selected to drive a device, and can contain handles to kernel objects needed to drive the device. These handles are transferred to the task implementing the device driver, which is why they cannot be sent arbitrarily to drivers to query support.
Device registration
TODO
Device hand-off to device driver
TODO
Standard devices
The Platform Bus library defines expected properties and behaviour for a number of standard device classes, in an attempt to increase compatibility across drivers and device users. Additional properties may be added as necessary for an individual device.
PCI devices
Platform Bus will use information provided by the kernel to create devices for each enumerated PCI device. Standard properties:
Property | Type | Description |
---|---|---|
pci.vendor_id | Integer | Vendor ID of the PCI device |
pci.device_id | Integer | Device ID of the PCI device |
pci.class | Integer | Class of the PCI device |
pci.sub_class | Integer | Sub-class of the PCI device |
pci.interface | Integer | Interface of the PCI device |
pci.interrupt | Event | If configured, an Event that is triggered when the PCI device gets an IRQ |
pci.barN.size | Integer | N is a number from 0-6. The size of the given BAR, if present. |
pci.barN.handle | MemoryObject | N is a number from 0-6. A memory object mapped to the given BAR, if present. |
Generally, specific devices (e.g. a specific GPU) can be detected with a combination of the vendor_id and device_id properties, while a type of device can be identified via the class, sub_class, and interface properties. Drivers should filter against the appropriate properties depending on the devices they can drive.
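A driver-side filter over these properties might look something like the sketch below. It is illustrative only: the `Property` enum, the `DeviceInfo` alias, and `matches_all` are invented for the example and do not reflect the Platform Bus library's real API.

```rust
use std::collections::BTreeMap;

/// A hypothetical, simplified property value.
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum Property {
    Integer(u64),
    Bytes(Vec<u8>),
}

/// Device properties, keyed by label (e.g. "pci.vendor_id").
pub type DeviceInfo = BTreeMap<String, Property>;

/// True if every wanted (label, value) pair is present on the device.
pub fn matches_all(device: &DeviceInfo, wanted: &[(&str, u64)]) -> bool {
    wanted
        .iter()
        .all(|(label, value)| device.get(*label) == Some(&Property::Integer(*value)))
}
```

A driver for one specific device would pass the vendor_id and device_id pair, while a class driver would pass class, sub_class, and interface instead.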
USB devices
USB devices may be added to the Platform Bus by a USB Host Controller driver, and can be consumed by a wide array of drivers. Standard properties:
Property | Type | Description |
---|---|---|
usb.vendor_id | Integer | Vendor ID of the USB device |
usb.product_id | Integer | Product ID of the USB device |
usb.class | Integer | Class of the USB device |
usb.sub_class | Integer | Sub-class of the USB device |
usb.protocol | Integer | Protocol of the USB device |
usb.config0 | Bytes | Byte-stream of the first configuration descriptor of the device |
usb.channel | Channel | Control channel to configure and control the device via the bus driver |
HID devices
TODO
Journal
This is just a place I put notes that I make during development.
Building a rustc target for Poplar
We want a target in rustc for building userspace programs for Poplar. It would be especially cool to get it merged as an upstream Tier-3 target. This documents my progress, mainly as a reference for me to remember how it all works.
How do I actually build and use rustc?
A useful baseline invocation for normal use is:
./x.py build -i library/std
The easiest way to test the built rustc is to create a rustup toolchain (from the root of the Rust repo):
rustup toolchain link poplar build/{host triple}/stage1 # If you built a stage-1 compiler (default with invocation above)
rustup toolchain link poplar build/{host triple}/stage2 # If you built a stage-2 compiler
It's easiest to call your toolchain poplar, as this is the name we use in the Makefiles for now.
You can then use this toolchain from Cargo anywhere on the system with:
cargo +poplar build # Or whatever command you need
Using a custom LLVM
- Fork Rust's llvm-project
cd src/llvm-project
git remote add my_fork {url to your custom LLVM's repo}
git fetch my_fork
git checkout my_fork/{the correct branch}
cd ..
git add llvm-project
git commit -m "Move to custom LLVM"
Things to change in config.toml
This is as of 2020-09-29. You need to remember to keep config.toml up-to-date yourself (it's not checked in upstream), as it can cause confusing errors when it's out-of-date.
- download-ci-llvm = true under [llvm]. This makes the build much faster, since we don't need a custom LLVM.
- assertions = true under [llvm]
- incremental = true under [rust]
- lld = true under [rust]. Without this, the toolchain can't find rust-lld when linking.
- llvm-tools = true under [rust]. This probably isn't needed, I just chucked it in in case rust-lld needs it.
Adding the target
I used a slightly different layout to most targets (which have a base, which creates a TargetOptions, and then a target that modifies and uses those options).
- Poplar targets generally need a custom linker script. I added one at compiler/rustc_target/src/spec/x86_64_poplar.ld.
- Make a module for the target (I called mine compiler/rustc_target/src/spec/x86_64_poplar.rs). Copy from an existing one. Instead of a separate poplar_base.rs to create the TargetOptions, we do it in the target itself. We include_str! the linker script in here, so it's distributed as part of the rustc binary.
- Add the target in the supported_targets! macro in compiler/rustc_target/src/spec/mod.rs.
Adding the target to LLVM
I don't really know my way around the LLVM code base, so this was fairly cobbled together:
- In llvm/include/llvm/ADT/Triple.h, add a variant for the OS in the OSType enum. I called it Poplar. Don't make it the last entry, to avoid having to change the LastOSType variant.
- In llvm/lib/Support/Triple.cpp, in the function Triple::getOSTypeName, add the OS. I added case Poplar: return "poplar";.
- In the same file, in the parseOS function, add the OS. I added .StartsWith("poplar", Triple::Poplar).
- This file also contains a function, getDefaultFormat, that gives the default format for a platform. The default is ELF, so no changes were needed for Poplar, but they might be for another OS.
TIP: When you make a change in the llvm-project submodule, you will need to commit these changes, and then update the submodule in the parent repo, or the bootstrap script will check out the old version (without your changes) and build an entire compiler without the changes you are trying to test.
NOTE: to save people from having to switch to our llvm-project fork, we don't actually use our LLVM target from rustc (yet). I'm not sure why you need per-OS targets in LLVM, as it doesn't even seem to let us do any of the things we wanted to (this totally might just be me not knowing how LLVM targets work).
Notes
- We needed to change the entry point to _start, or it silently just doesn't emit any sections in the final image.
- By default, it throws away our .caps sections. We need a way to emit them regardless - this is done by manually creating the program header and specifying that the sections should be kept with KEEP. There are two possible solutions that I can see: make rustc emit a linker script, or try and introduce these ideas into llvm/lld with our target (I'm not even sure this is possible).
- It looks like lld has no OS-specific code at all, and the only place that specifically-kept sections are added is in the linker script parser. Looks like we might have to actually create a file-based linker script (does literally no one else need to pass a linker script by command line??).
USB
USB has a host that sends requests to devices (devices only respond when asked something). Some devices are dual role devices (DRD) (previously called On-The-Go (OTG) devices), and can dynamically negotiate whether they're the host or the device.
Each device can have one or more interfaces, which each have one or more endpoints. Each endpoint has a hardcoded direction (host-to-device or device-to-host). There are a few types of endpoint (the type is decided during interface configuration):
- Control endpoints are for configuration and control requests
- Bulk endpoints are for bulk transfers
- Isochronous endpoints are for periodic transfers with a reserved bandwidth
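- Interrupt endpoints are for small, latency-sensitive transfers (despite the name, the device doesn't interrupt the host - the host polls the endpoint at a regular interval)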
The interfaces and endpoints a device has are described by descriptors reported by the device during configuration.
Every device has a special endpoint called ep0. It's an in+out control endpoint, and is used to configure the other endpoints.
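To make the direction and type of an endpoint a bit more concrete, here is a minimal sketch of decoding a standard 7-byte endpoint descriptor as laid out by the USB 2.0 specification (descriptor type 0x05, direction in bit 7 of bEndpointAddress, transfer type in the low two bits of bmAttributes). The `Endpoint` struct and `parse_endpoint_descriptor` function are names made up for this example, not part of any Poplar driver.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Direction {
    HostToDevice, // "OUT" in USB terms
    DeviceToHost, // "IN" in USB terms
}

#[derive(Clone, Copy, Debug, PartialEq)]
pub enum TransferType {
    Control,
    Isochronous,
    Bulk,
    Interrupt,
}

/// Hypothetical decoded form of an endpoint descriptor.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Endpoint {
    pub number: u8,
    pub direction: Direction,
    pub transfer_type: TransferType,
    pub max_packet_size: u16,
}

/// Decode one endpoint descriptor. Returns `None` if the bytes are too short
/// or are not an endpoint descriptor.
pub fn parse_endpoint_descriptor(bytes: &[u8]) -> Option<Endpoint> {
    const ENDPOINT_DESCRIPTOR_TYPE: u8 = 0x05;
    if bytes.len() < 7 || bytes[0] < 7 || bytes[1] != ENDPOINT_DESCRIPTOR_TYPE {
        return None;
    }
    let address = bytes[2]; // bEndpointAddress
    let attributes = bytes[3]; // bmAttributes
    Some(Endpoint {
        number: address & 0x0f,
        direction: if address & 0x80 != 0 {
            Direction::DeviceToHost
        } else {
            Direction::HostToDevice
        },
        transfer_type: match attributes & 0b11 {
            0 => TransferType::Control,
            1 => TransferType::Isochronous,
            2 => TransferType::Bulk,
            _ => TransferType::Interrupt,
        },
        // wMaxPacketSize is little-endian; bits 10..0 hold the packet size.
        max_packet_size: u16::from_le_bytes([bytes[4], bytes[5]]) & 0x07ff,
    })
}
```

Control endpoints such as ep0 are bidirectional, so the direction bit is not meaningful for them.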
RISC-V
Building OpenSBI
OpenSBI is the reference implementation for the Supervisor Binary Interface (SBI). It's basically how you access M-mode functionality from your S-mode bootloader or kernel.
Firstly, we need a RISC-V C toolchain. On Arch, I installed the riscv64-unknown-elf-binutils AUR package. I also tried to install the riscv64-unknown-elf-gcc package, but this wouldn't work, so I built OpenSBI with Clang+LLVM instead with (from inside lib/opensbi):
make PLATFORM=generic LLVM=1
This can be tested on QEMU with:
qemu-system-riscv64 -M virt -bios build/platform/generic/firmware/fw_jump.elf
It also seems like you can build with a platform of qemu/virt - I'm not sure what difference this makes yet but guessing it's the hardware it assumes it needs to drive? Worth exploring. (Apparently the generic image is doing dynamic discovery (I'm assuming from the device tree) so that sounds good for now).
So the jump firmware (fw_jump.elf) jumps to a specified address in memory (apparently QEMU can load an ELF which would be fine initially). Other option would be a payload firmware, which bundles your code into the SBI image (assuming as a flat binary) and executes it like that.
We should probably make an xtask step to build OpenSBI and move it to the bundled directory, plus decide what sort of firmware / booting strategy we're going to use. Then the next step would be some Rust code that can print to the serial port, to prove it's all working.
QEMU virt memory map
Seems everything is memory-mapped, which makes for a nice change coming from x86's nasty port thingy. This is the virt machine's one (from the QEMU source...):
Region | Address | Size |
---|---|---|
Debug | 0x0 | 0x100 |
MROM | 0x1000 | 0x11000 |
Test | 0x100000 | 0x1000 |
CLINT | 0x0200_0000 | 0x10000 |
PCIe PIO | 0x0300_0000 | 0x10000 |
PLIC | 0x0c00_0000 | 0x4000000 |
UART0 | 0x1000_0000 | 0x100 |
Virtio | 0x1000_1000 | 0x1000 |
Flash | 0x2000_0000 | 0x4000000 |
PCIe ECAM | 0x3000_0000 | 0x10000000 |
PCIe MMIO | 0x4000_0000 | 0x40000000 |
DRAM | 0x8000_0000 | {mem size} |
Getting control from OpenSBI
On QEMU, we can get control from OpenSBI by linking a binary at 0x80200000, and then using -kernel to automatically load it at the right location. OpenSBI will then jump to this location with the HART ID in a0 and a pointer to the device tree in a1.
However, this does make setting paging up slightly icky, as has been a problem on other architectures. Basically, the first binary needs to be linked at a low address with bare translation, and then we need to construct page tables and enable translation, then jump to a higher address. I'm thinking we might as well do it in two stages: a Seed stage that loads the kernel and early tasks from the filesystem/network/whatever, builds the kernel page tables, and then enters the kernel and can be unloaded at a later date. The kernel can then be linked normally at its high address without faffing around with a bootstrap or anything.
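As a small illustration of what "construct page tables and enable translation" involves, here is a sketch of the two bits of arithmetic needed, assuming the Sv39 paging mode from the RISC-V privileged specification (these notes don't say which mode Poplar uses, so treat that as an assumption). The function names are made up for this example.

```rust
/// Pages are 4KiB in Sv39.
pub const PAGE_SHIFT: u32 = 12;
/// Value of the MODE field of satp that selects Sv39.
const SATP_MODE_SV39: u64 = 8;

/// Build the value to write to satp to enable Sv39 translation.
/// MODE lives in bits 63..60, the ASID in bits 59..44, and the physical
/// page number of the root page table in bits 43..0.
pub fn satp_sv39(root_table_phys: u64, asid: u16) -> u64 {
    assert_eq!(root_table_phys & 0xfff, 0, "root table must be page-aligned");
    let ppn = (root_table_phys >> PAGE_SHIFT) & ((1u64 << 44) - 1);
    (SATP_MODE_SV39 << 60) | (u64::from(asid) << 44) | ppn
}

/// Split a virtual address into its three 9-bit page table indices,
/// ordered from the root table (level 2) down to the leaf table (level 0).
pub fn sv39_indices(virt: u64) -> [usize; 3] {
    [
        ((virt >> 30) & 0x1ff) as usize,
        ((virt >> 21) & 0x1ff) as usize,
        ((virt >> 12) & 0x1ff) as usize,
    ]
}
```

With these, a loader stage would fill in the entries selected by `sv39_indices` for each page of the kernel's high address range, then write the result of `satp_sv39` to the satp register before jumping to the kernel.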
The device tree
So the device tree seems to be a data structure passed to you that tells you about the hardware present / memory etc. Hopefully it's less gross than ACPI eh. Repnop has written a crate, fdt, so I think we're just going to use that.
So fdt seems to work fine, we can list memory regions etc. The only issue seems to be that memory_reservations doesn't return anything, which is kind of weird. There also seems to be a /reserved-memory node, but this suggests that this doesn't include stuff we want like which part of memory OpenSBI resides in.
This issue says Linux just assumes it shouldn't touch anything before it was loaded. I guess we could use the same approach, reserving the memory used by Seed via linker symbols, and then seeing where the loader device gets put to reserve the ramdisk, but the issue was closed saying OpenSBI now does the reservations correctly which would be cleaner, but doesn't stack up with what we're seeing.
Ah so actually, /reserved-memory does seem to have some of what we need. On QEMU there is one child node, called mmode_resv@80000000, which would fit with being the memory OpenSBI is in. We would still need to handle the memory we're in, and idk what happens with the loader device yet, but it's a start. Might be worth talking to repnop about whether the crate should use this node.
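For reference, the flattened device tree has two separate places a reservation can live: the memory reservation block pointed to by the header, and the /reserved-memory node in the tree itself. A blob whose reservation block holds only the terminator would give an empty list even when /reserved-memory has children, which might be what is happening here. The sketch below walks the reservation block by hand, following the header layout in the Devicetree Specification; it is a standalone illustration and says nothing about how the fdt crate is implemented.

```rust
/// Magic number at the start of every flattened device tree (big-endian).
const FDT_MAGIC: u64 = 0xd00d_feed;
/// Byte offset of the `off_mem_rsvmap` field within the header.
const OFF_MEM_RSVMAP: usize = 16;

/// Read a big-endian integer of `len` bytes (at most 8) at `offset`, if it is in bounds.
fn read_be(blob: &[u8], offset: usize, len: usize) -> Option<u64> {
    let bytes = blob.get(offset..offset.checked_add(len)?)?;
    Some(bytes.iter().fold(0u64, |acc, &b| (acc << 8) | u64::from(b)))
}

/// Collect the (address, size) pairs from the memory reservation block.
/// The block is a list of pairs of big-endian u64s, ended by an all-zero pair.
/// Returns `None` if the blob is not a device tree or is truncated.
pub fn memory_reservations(blob: &[u8]) -> Option<Vec<(u64, u64)>> {
    if read_be(blob, 0, 4)? != FDT_MAGIC {
        return None;
    }
    let mut offset = read_be(blob, OFF_MEM_RSVMAP, 4)? as usize;
    let mut reservations = Vec::new();
    loop {
        let address = read_be(blob, offset, 8)?;
        let size = read_be(blob, offset.checked_add(8)?, 8)?;
        if address == 0 && size == 0 {
            return Some(reservations);
        }
        reservations.push((address, size));
        offset = offset.checked_add(16)?;
    }
}
```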
Dumb way to load the kernel for now
So for some reason, fw_cfg doesn't seem to be working on QEMU 7.1. This is what we were gonna use for loading the kernel, command line, and user programs, etc. but obvs this is not possible atm. For now, as a workaround, we can use the loader device to load arbitrary data and files into memory.
I'm thinking we could use 0x1_0000_0000 as the base physical address for this - this gives us a maximum of 2GiB of DRAM, which seems plenty for now (famous last words). We'll need to know the size of the object we're loading on the guest-side, so we'll load that separately for now (in the future, this whole scheme could be extended to some sort of mini-filesystem).
Okay so the loader device is pretty finicky, and has no error handling. Turns out you can't define new memory with it, just load values into RAM, but it doesn't actually tell you this has failed. You then try and read this on the guest, and get super weird UB from doing so - it doesn't just fault or whatever, it seems to break code before you ever read the memory (super weird ngl, didn't stick around to work out what was going on).
Right, seems to be working much better by actually putting the values in RAM. We've extended RAM to 1GiB (0x8000_0000..0xc000_0000) and we'll use this as the new layout:
Address | Description | Size (bytes) |
---|---|---|
0xb000_0000 | Size of Data | 4 |
0xb000_0004 | Data | N |
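A minimal sketch of reading this layout on the guest side is below. To keep it runnable anywhere, it takes a byte slice standing in for the memory starting at 0xb000_0000 rather than dereferencing the physical address directly. It also assumes the 4-byte size is stored little-endian, which is not stated above; `read_loaded_data` is a hypothetical name.

```rust
/// Physical address the size field is loaded at (from the table above).
pub const LOADED_DATA_BASE: u64 = 0xb000_0000;
/// Width of the size field that precedes the data.
const SIZE_FIELD_LEN: usize = 4;

/// Given the bytes of memory starting at `LOADED_DATA_BASE`, return the
/// loaded data. Returns `None` if the region is too short to hold the size
/// field, or shorter than the size field claims.
/// Assumption: the size is a little-endian u32.
pub fn read_loaded_data(region: &[u8]) -> Option<&[u8]> {
    let size_bytes = region.get(..SIZE_FIELD_LEN)?;
    let size = u32::from_le_bytes([
        size_bytes[0],
        size_bytes[1],
        size_bytes[2],
        size_bytes[3],
    ]) as usize;
    region.get(SIZE_FIELD_LEN..SIZE_FIELD_LEN.checked_add(size)?)
}
```

Checking the claimed size against the region matters here because the loader device has no error handling of its own, so a size that was never actually loaded would otherwise be trusted blindly.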
PCI interrupt routing
PCI interrupt routing is the process of working out which platform-specific interrupt will fire when a given PCI device issues an interrupt. In general, we use message-signalled interrupts (MSIs) where available, and fall back to the legacy interrupt pins (INTA, INTB, INTC, INTD) on devices where they are not.
Legacy interrupt pins
- Each function header contains an interrupt pin field that can be 0 (no interrupts), or 1 through 4 for each pin.
- Interrupts from devices that share a pin cannot be differentiated from each other without querying the devices themselves. For us, this means usermode drivers will need to be awoken before knowing their device has actually received an interrupt.
The pin used by each device is not programmable - each device hardcodes it at time of manufacture. However, they can be remapped by any bridge between the device and host, so that the interrupt signal on the upstream side of the bridge differs to the device's reported interrupt pin. This was necessitated by manufacturers defaulting to using INTA - usage of the 4 available pins was unbalanced, so firmware improves performance by rebalancing them at the bridge.
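As an example of remapping at a bridge, the conventional "swizzle" used by PCI-to-PCI bridges rotates the pin by the device number of the device on the bridge's secondary bus. The sketch below shows that rotation along with decoding the interrupt pin field. The `Pin` type and `swizzle` function are names invented for this example, and on a real platform any explicit routing information supplied by firmware should be preferred over assuming this default.

```rust
/// One of the four legacy interrupt pins.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Pin {
    IntA,
    IntB,
    IntC,
    IntD,
}

const PINS: [Pin; 4] = [Pin::IntA, Pin::IntB, Pin::IntC, Pin::IntD];

impl Pin {
    /// Decode the interrupt pin field of a function's header:
    /// 0 means the function uses no pin, 1 through 4 mean INTA through INTD.
    pub fn from_header(value: u8) -> Option<Pin> {
        match value {
            1..=4 => Some(PINS[usize::from(value) - 1]),
            _ => None,
        }
    }
}

/// Work out which pin an interrupt appears on at the upstream side of a
/// bridge, given the pin the device reports and the device's number on the
/// bridge's secondary bus. Each successive device number shifts the pin by one.
pub fn swizzle(pin: Pin, device: u8) -> Pin {
    PINS[(pin as usize + usize::from(device)) % 4]
}
```

For a device behind several bridges, `swizzle` would be applied once per bridge, walking up towards the host.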
How the pins have been remapped is communicated to the operating system via a platform-specific mechanism. On modern x86 systems, this is through the _PRT method in the ACPI namespace (before ACPI, BIOS methods and later MP tables were used). On ARM and RISC-V, the device tree specifies this mapping through the interrupt-map property on the PCI host bridge's node, which maps each device and pin to an input on the platform's interrupt controller.