Embedded Kernel on MCU

December 2022

Abstract

The goal of this project was to develop a pre-emptive multi-threaded kernel for the MSP432 microcontroller. This kernel provides an abstracted view of the hardware, such that the application developer can be concerned only with writing independent threads/processes to perform system functions.

The kernel therefore acts as an API boundary layer between the kernel developers and a separate set of application developers.

Project Requirements

Implement and document a working kernel for the MSP432.
Support multiple concurrent threads of execution (a.k.a. tasks).
The kernel must be pre-emptive such that each thread may have its execution suspended at any time.
Pre-empt each thread at 2 ms intervals using the SysTick timer peripheral.
A thread may give up its 2 ms "time slice" at any time by calling a yield() function.
All threads are to run in the unprivileged thread mode of the Cortex-M4F processor and to use the process stack. The kernel must run in privileged handler mode and use the main stack. Every thread is to have its own separate stack.
All communication between threads and the kernel is to be done through C API calls (not global variables). Communication between threads may use global variables/memory (e.g. circular buffers).
All interrupt service routines (ISRs) and hardware interactions are considered to be part of the kernel. Threads are not allowed to enable or disable interrupts and must interact with the hardware through C API function calls only.
There are to be no busy-wait loops anywhere in the kernel or threads, except perhaps for 1 "idle thread" that does nothing and runs only when there is no work to be done in either user threads or the kernel.
- ANY for-loop or while-loop (or any kind of loop) in the kernel is suspicious and must be justified.
  - A loop in the kernel that iterates over active threads looking for the next thread to run is OK.
  - A loop in the kernel that samples a pushbutton 5 times, with a delay in between each sample for the purpose of debouncing is NOT OK. Debouncing can be done in a user thread or intelligently in the kernel (without loops or delays).
  - A loop in the kernel that computes the average power of the last 'N' A/D samples is not OK. This loop belongs in a user thread.
- ISRs must be FAST and SHORT: handle the hardware event, move some data, set a flag, etc. then exit ASAP. There should be NO significant work or computation in the kernel.
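
The requirement that debouncing be done "without loops or delays" can be met by taking one sample per timer tick and keeping a short history. The following is only a minimal sketch of that idea, not the project's code: the type `debounce_t` and the function `debounce_update()` are hypothetical names, and it is assumed that the function is called once per 2 ms SysTick interrupt with the raw pin level.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical debouncer state (not from the project):
 * the last 8 raw samples, newest in bit 0. */
typedef struct {
    uint8_t history;
    bool    pressed;   /* debounced state reported to callers */
} debounce_t;

/* Assumed to be called exactly once per tick with the raw pin level
 * (true = pressed). No loop and no delay: each call shifts in one
 * sample and compares the history against all-ones or all-zeros. */
static bool debounce_update(debounce_t *d, bool raw)
{
    d->history = (uint8_t)((d->history << 1) | (raw ? 1u : 0u));
    if (d->history == 0xFFu) {
        d->pressed = true;     /* 8 consecutive "pressed" samples */
    } else if (d->history == 0x00u) {
        d->pressed = false;    /* 8 consecutive "released" samples */
    }
    return d->pressed;         /* otherwise keep the previous state */
}
```

With a 2 ms tick, the state changes only after the input has been stable for 8 samples (16 ms), and the work done per interrupt stays constant and small.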

Outcomes

To properly set up the operating system, three slices of memory were allocated, one per thread, to hold a copy of the core registers. Because the core registers of the Cortex-M4 are always occupied by the “task at hand,” these memory slices serve as placeholders where each thread stores the values of the core registers before control passes to the following thread. This allows each thread to resume exactly where it left off when control was taken from it (or was willingly yielded).
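
One possible layout for such a memory slice is sketched below. This is an illustration under stated assumptions rather than the project's actual data structure: `thread_ctx_t`, `ctx`, and `thread_ctx_init()` are hypothetical names. The sketch relies on the fact that a Cortex-M processor automatically pushes R0-R3, R12, LR, PC, and xPSR onto the active stack on exception entry, so software only has to save R4-R11 and the stack pointer.

```c
#include <stdint.h>
#include <string.h>

#define NUM_THREADS  3u
#define XPSR_THUMB   0x01000000u   /* xPSR T bit; must be set on Cortex-M */

/* Hypothetical per-thread save area ("memory slice"). */
typedef struct {
    uint32_t  r4_r11[8];   /* registers the hardware does not stack */
    uint32_t *psp;         /* process stack pointer at the switch */
} thread_ctx_t;

static thread_ctx_t ctx[NUM_THREADS];

/* Prepare a thread that has never run so that the first "restore"
 * starts it at entry_addr. Builds the 8-word frame the hardware pops
 * on exception return, lowest address first:
 * R0, R1, R2, R3, R12, LR, PC, xPSR.
 * stack_top is assumed to be 8-byte aligned. */
static void thread_ctx_init(unsigned id, uint32_t *stack_top,
                            uint32_t entry_addr)
{
    uint32_t *frame = stack_top - 8;

    memset(frame, 0, 8 * sizeof(uint32_t));
    frame[6] = entry_addr & ~1u;   /* stacked PC, halfword aligned */
    frame[7] = XPSR_THUMB;         /* stacked xPSR */

    memset(ctx[id].r4_r11, 0, sizeof ctx[id].r4_r11);
    ctx[id].psp = frame;
}
```

On the target, a thread would typically be set up with a call such as `thread_ctx_init(0, &stack0[STACK_WORDS], (uint32_t)Thread0)`, where `stack0` and `STACK_WORDS` are likewise assumed names.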

This project contains three threads (Thread0, Thread1, and Thread2). Thread0 is in charge of toggling the red on-board LED each time the thread runs. Thread1 and Thread2 both use the on-board UART module to print different messages to the PC’s terminal window.

Thread0 uses bit banding to toggle this LED. Bit banding is a Cortex-M4 feature that maps each bit of a bit-band region to its own word in a separate alias region of memory. Writing a 1 or a 0 to an alias word sets or clears only the corresponding bit, so a single pin of a port register can be driven without a read-modify-write of the whole register. Once the alias address for the LED pin has been determined, a simple value assignment is all that is needed to drive that bit as desired.
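
The alias address follows the standard Cortex-M formula: alias = alias base + (byte offset x 32) + (bit number x 4). The sketch below applies it to the peripheral bit-band region. The helper names are hypothetical, and the P1OUT address and the use of P1.0 for the red LED are assumptions taken from the MSP432P401R memory map and LaunchPad layout that should be checked against the datasheet.

```c
#include <stdint.h>

#define PERIPH_BASE     0x40000000u  /* start of peripheral bit-band region */
#define PERIPH_BB_BASE  0x42000000u  /* start of its alias region */

/* Each bit of the bit-band region gets one 32-bit word in the alias
 * region: 32 alias bytes per source byte, 4 alias bytes per bit. */
static inline uint32_t bitband_alias(uint32_t reg_addr, uint32_t bit)
{
    return PERIPH_BB_BASE + ((reg_addr - PERIPH_BASE) * 32u) + (bit * 4u);
}

/* Assumed values, not taken from the project source. */
#define P1OUT_ADDR   0x40004C02u   /* Port 1 output register */
#define RED_LED_BIT  0u            /* red LED on P1.0 */

/* Target-only: touches only the LED bit through its alias word. */
static inline void red_led_toggle(void)
{
    volatile uint8_t *led =
        (volatile uint8_t *)(uintptr_t)bitband_alias(P1OUT_ADDR, RED_LED_BIT);
    *led ^= 1u;
}
```

Under these assumptions, the alias word for the LED is at 0x42098040, and writing 1 or 0 there drives the pin high or low without disturbing the other seven pins of the port.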

Because Thread1 and Thread2 both use the UART, a problem arises when both threads want to use the UART module at the same time. If Thread1 is in the middle of printing a message to the console and its time slice expires, Thread2 gains control and starts printing its own message, so the two messages become interleaved. This problem can be solved using locks.

A lock was implemented using a variable that holds the value 1 when the lock is unlocked and 0 when it is locked. This allows a thread to retain ownership of the UART module even after its time slice has expired. For example, if Thread1 has the UART module locked when it is pre-empted, Thread2 is unable to use the UART until Thread1 regains control and finishes its task. Only when Thread1 has finished printing and released the lock can Thread2 print to the UART module, so each message is printed in full.
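
A lock with this 1/0 convention could look like the sketch below. It is not the project's implementation: `lock_t`, `lock_init()`, `lock_try_acquire()`, and `lock_release()` are hypothetical names, and the use of C11 atomics is an assumption made so that the test-and-set cannot be split by a pre-emption between reading the variable and writing it.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Convention from the text: 1 = unlocked, 0 = locked. */
typedef struct {
    atomic_int free;
} lock_t;

static void lock_init(lock_t *l)
{
    atomic_init(&l->free, 1);   /* starts unlocked */
}

/* Try to take the lock without waiting. Returns true on success.
 * The compare-and-exchange changes 1 -> 0 in one atomic step, so two
 * threads can never both see the lock as free. */
static bool lock_try_acquire(lock_t *l)
{
    int expected = 1;
    return atomic_compare_exchange_strong(&l->free, &expected, 0);
}

static void lock_release(lock_t *l)
{
    atomic_store(&l->free, 1);
}
```

A thread that fails to acquire the lock would be expected to call yield() and try again in its next time slice rather than spin, which keeps the design consistent with the rule against busy-wait loops.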

Control of the time slices is handled by the SysTick timer, which is set up to trigger an interrupt every 2 ms. When the interrupt fires, the ISR, SysTick_Handler(), takes a snapshot of the current thread’s registers by storing copies of them in that thread’s memory slice, and then gives control to the next thread in line.
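
Two small calculations sit behind this mechanism: the SysTick reload value for a 2 ms period and the choice of the next thread. The sketch below shows both under stated assumptions. The function names are hypothetical, the core clock frequency is a parameter because the project's clock setting is not given, and simple round-robin ordering is assumed from the phrase "next thread in line".

```c
#include <stdint.h>

#define NUM_THREADS    3u
#define TIME_SLICE_MS  2u

/* SysTick counts down from the reload value to 0 and then wraps, so a
 * period of N clock cycles needs a reload value of N - 1. The reload
 * register is 24 bits wide, so the result must not exceed 0x00FFFFFF. */
static inline uint32_t systick_reload(uint32_t core_clock_hz)
{
    return (core_clock_hz / 1000u) * TIME_SLICE_MS - 1u;
}

/* Assumed round-robin policy: 0 -> 1 -> 2 -> 0 -> ... */
static inline unsigned next_thread(unsigned current)
{
    return (current + 1u) % NUM_THREADS;
}

/* Outline of the handler flow described in the text (target-only):
 *   1. save the current thread's registers into its memory slice
 *   2. current = next_thread(current)
 *   3. restore the registers of the new current thread and return
 */
```

For example, a 3 MHz core clock gives a reload value of 5999 and a 48 MHz clock gives 95999, both well within the 24-bit limit.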

Challenges

The main challenge encountered in this project was handling the individual stack memory of each thread: taking a snapshot of a thread's stack before giving up control to the next thread, and later restoring that thread's stack as the MCU's active stack. Although this was a challenge, the project was completed with a successful implementation of a kernel on the MCU.

Software

Check out the GitHub Repo for this project!