Basics
Let us first understand what we have in our hands.
As an example, I will use the devboard stm32mp257f-dk, the beefiest in the MP2 series.
Before the MP2 series (released in 2024), there was the MP1 (released in 2019). Both are microprocessors (MPUs) rather than microcontrollers.
Compared with the MP1, the MP2 has a 64-bit architecture instead of 32-bit, many more peripherals (for example dedicated encryption hardware, an integrated GPU, and an NPU to accelerate AI tasks), and greater speed.
In a little more detail, the MP2 is based on the Cortex-A35, while the MP1 is based on the Cortex-A7.
For those interested, I invite you to seek more information about this on your own. Let’s move on.
Architecture
The microprocessors in the STM32MP2 family are all based on the 64-bit ARM architecture. In the Linux and toolchain worlds, this architecture is generally identified by the following names:
- ARM64
- aarch64
- armv8
Since ARM was originally developed as a 32-bit architecture, by default, the term “arm” refers to 32-bit, while “arm64” refers to 64-bit.
For completeness, the 32-bit architecture is usually identified by the names:
- ARM
- ARM32
- aarch32
- armv7
This information will be useful in understanding which toolchain to use.
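To make the two families of names concrete, here is a minimal sketch that maps any of the aliases listed above to a cross-compiler prefix. The prefixes `aarch64-linux-gnu-` and `arm-linux-gnueabihf-` are the ones commonly packaged by Linux distributions; the function name `toolchain_prefix` and the alias sets are assumptions made for this example, and a vendor SDK may well use a different prefix.

```python
# Aliases copied from the two lists above, lowercased for comparison.
ARM64_ALIASES = {"arm64", "aarch64", "armv8"}
ARM32_ALIASES = {"arm", "arm32", "aarch32", "armv7"}


def toolchain_prefix(arch_name: str) -> str:
    """Return a commonly used GNU cross-toolchain prefix for an arch alias.

    Hypothetical helper for illustration; the prefixes are the usual
    distribution ones, not necessarily those of a vendor SDK.
    """
    name = arch_name.strip().lower()
    if name in ARM64_ALIASES:
        return "aarch64-linux-gnu-"
    if name in ARM32_ALIASES:
        return "arm-linux-gnueabihf-"
    raise ValueError(f"unknown architecture name: {arch_name!r}")


if __name__ == "__main__":
    for alias in ("ARM64", "aarch64", "armv7"):
        print(alias, "->", toolchain_prefix(alias) + "gcc")
```

For the MP2, any of the 64-bit names leads to the same toolchain family, which is the point to keep in mind later.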
Bootloader
The main purpose of the bootloader is to initialize all the necessary peripherals (RAM, UART, USB controller, watchdogs, MMU, etc.) and then pass the ball to the Linux kernel, which will take control of them.
We will see that, at least for this SoC, some security-critical peripherals will not be managed by Linux, but will remain under the ownership of an additional component whose sole purpose is to manage their secure use.
From this STM document we understand that the chip has a ROM code (which cannot be modified), which then requires other components to manage system initialization. These other components are responsible for:
- initializing the external RAM (DDR type)
- initializing the CPU security context (which memory areas are reserved)
- initializing various peripherals
- finding the Linux kernel on disk and starting it with the correct parameters
Therefore, there is no single bootloader, but rather a bootchain consisting of:
- TFA (Trusted Firmware-A)
- OPTEE (Open Portable Trusted Execution Environment)
- UBOOT (classic bootloader for embedded systems)
Furthermore, it should be noted that TFA and UBOOT are components that perform a specific task and then leave room for the next stage (kinda like the different stages of a rocket), while OPTEE is a component that remains active until the system dies. The latter is therefore defined as a “runtime” component.
In fact, TFA is itself divided into two stages, the last of which remains at runtime.
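The distinction between components that hand over and components that stay resident can be sketched as a small data model. This is only an illustration: the class `BootComponent`, the `resident` flag, and the labels for the two TFA stages are names invented here, and the position of TFA's last stage relative to OPTEE in the tuple is an assumption, not something stated above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BootComponent:
    name: str
    resident: bool  # True if the component stays active at runtime


# Simplified model of the bootchain described above (ordering is assumed).
BOOTCHAIN = (
    BootComponent("TFA (first stage)", resident=False),
    BootComponent("TFA (last stage)", resident=True),
    BootComponent("OPTEE", resident=True),
    BootComponent("UBOOT", resident=False),
)


def runtime_components(chain):
    """Return the names of the components that remain active at runtime."""
    return [c.name for c in chain if c.resident]


def transient_components(chain):
    """Return the names of the components that run once and hand over."""
    return [c.name for c in chain if not c.resident]


if __name__ == "__main__":
    print("runtime:  ", runtime_components(BOOTCHAIN))
    print("transient:", transient_components(BOOTCHAIN))
```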
Generally, these components are referred to by generic names depending on the stage to which they belong.
The main difference between one stage and another is where it is executed: directly in ROM (therefore without the need for RAM), from SYSRAM (the internal RAM of the SoC), or from RAM (the classic RAM external to the SoC). Furthermore, for brevity, the term “bl” is used to indicate “bootloader”, therefore:
- bl1 –> bootloader in ROM code (not editable)
- bl2 –> bootloader in SYSRAM (few KB available)
- bl3 –> bootloader in RAM (maximum freedom)
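The reason the stages must run in this order can be sketched with a toy model: a stage can only execute from memory that is already usable, and the external RAM becomes usable only after an earlier stage has initialized it. The names `Memory`, `STAGE_MEMORY` and `boot_order_is_valid` are hypothetical, and treating bl2 as the stage that brings up the external RAM is a simplifying assumption of this sketch.

```python
from enum import Enum


class Memory(Enum):
    ROM = "ROM"
    SYSRAM = "SYSRAM (internal)"
    DDR = "RAM (external)"


# Where each generic stage executes, following the list above.
STAGE_MEMORY = {
    "bl1": Memory.ROM,
    "bl2": Memory.SYSRAM,
    "bl3": Memory.DDR,
}


def boot_order_is_valid(order):
    """Return True if every stage runs from memory that is already usable."""
    # Assumption: on-chip memories are usable right after reset.
    usable = {Memory.ROM, Memory.SYSRAM}
    for stage in order:
        if STAGE_MEMORY[stage] not in usable:
            return False
        if stage == "bl2":
            # Assumption of this model: bl2 initializes the external RAM.
            usable.add(Memory.DDR)
    return True


if __name__ == "__main__":
    print(boot_order_is_valid(["bl1", "bl2", "bl3"]))
    print(boot_order_is_valid(["bl1", "bl3"]))
```

Skipping bl2 makes the check fail, because nothing has initialized the external RAM that bl3 needs to run from.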
Why all this mess? On PC boards, the ROM code is a BIOS or UEFI bootloader capable of autonomously recognizing all the necessary peripherals. In embedded systems, where resources are always scarce, the bootloader must instead configure the system as it goes along. The ROM code is very small to keep chip costs low. With BL2 we have a little more flexibility, which allows us to initialize the external RAM (which, up to that point, the processor knows NOTHING about). We can then run a larger and more capable bootloader from that RAM, with the task of initializing the rest of the system peripherals.
The latter are not fixed, but vary depending on the board on which the processor is mounted. The main task of an embedded engineer is precisely to configure the software to be able to interact with custom hardware.
We will see that one of the most convenient and widely used tools is the Device Tree, where we will describe our hardware to the processor. This is a complex task that requires extensive knowledge of the processor’s internal registers and all the peripherals connected to it.
Usually, our vendor will provide us with a device tree to start from, so there’s no need to swear too much. However, we must bear in mind that our SoC also needs its own device tree, because although we see it as a single chip, it is actually a combination of a processor (the actual core) and many, many peripherals (SYSRAM, UART, USB controller, watchdogs, etc.).
It’s enough to drive you crazy, yes.
Linux
Once we are able to get to a Linux shell, most of the work will be done and all that remains is to configure the loading of the necessary kernel modules and install the applications we need.
First break
The setup of the bootloader, kernel, and applications on Linux can also be done outside of Yocto, if we find it easier (which is also an excellent exercise to better understand the challenges of the process). Here, however, we will do almost all of this process directly in Yocto.