K.I.S.S. state machine/cooperative concurrency for embedded systems

Although my primary interest is in embedded systems, I’ll present code that runs on a Linux system for illustrative purposes. But first, a little diversion.

My thinking towards embedded systems is tending more and more towards a radical simplicity. I have been exploring Rust and Ada as suitable replacements for C/C++. I know little about either. Ada seems eminently suited to embedded systems, having the notion of tasks, and a few other niceties. I managed to get a little blinky sketch working for the Raspberry Pi Pico. Disappointingly, I couldn’t get a Rust blinky sketch that I found online to compile. There was an incompatibility between the blink code and the embedded HAL. I guess that, by rights, that shouldn’t really happen, as Rust has a reproducible build mechanism. I’m not really criticising Rust per se; it was just a niggle that I could have lived without.

Having said that, Rust and Ada libraries can be useful for reference purposes, as they are “clean” reimplementations of code that might have built up cruft in C/C++.

I’m an advocate of C++. I have been looking over some of the latest C specs, and it’s got me wondering whether to, meh, just use C. But I’ve been thinking that a standard C library for embedded systems might not be such a great idea, especially if you have to integrate it yourself. The Raspberry Pi Pico SDK has everything put in place, so it makes sense to just use it.

For other systems, the payoff might not be so obvious. Newlib seems to be a popular choice for embedded systems. But in a way, newlib wants to be a POSIX library, which might not be a good fit for embedded. I found it a little tedious to integrate into my bare metal Raspberry Pi Zero project, and I seem to have made a mistake with regard to floating point operations.

There are other standard libraries to choose from. One that looks interesting is picolibc, which blends newlib and AVR Libc, and contains no dynamic allocations. I think newlib needs to dynamically allocate memory for its print routines (??), but I could be wrong.

But I’m leaning towards an alternative approach: ditch the idea of a standard C library, and just implement the needed functions on a case-by-case basis. It might seem an insane idea, but it’s all a question of “how much do I need this stuff anyway?” There are usually some good subsets of functionality that you can find on the internet which you can use if you feel the need.

The game-plan is, therefore, to have a very lightweight, very flat, system; without being absurd about it. I’m liking libopencm3, for example, and I’ve gotten quite far with getting stuff done for my STM32 (before I fried it). The documentation on it could be a little better, admittedly, but I do get a satisfying sense that I could understand what the library does without wading through layers and layers of abstractions.

Concurrency. There’s a lot of debate raging in embedded systems about whether it’s a good idea or not. Some prefer state machines, and think that RTOSes are a bad idea.

There’s a smorgasbord of concurrency mechanisms to choose from. Do you want preemptive or cooperative scheduling? Stackful or stackless coroutines? The big player in RTOSes is FreeRTOS. There are plenty of others: Zephyr, RIOT, the list is endless.

One of the things to ask yourself, though, is: do I really understand what’s going on under the hood? Things can be a little black-boxy. I’m sure that many understand the underlying mechanisms, even if I’ve been fairly lazy at investigating what it’s really all about. But again, the more stuff you put in your project, the more difficult it is to understand.

I had been thinking that C++20’s coroutines might be a great choice for the implementation of concurrency, but after reading a few blogs, my eyes glazed over. Not sure if the C++ coroutines are intrinsically brain-damaging to use, or the bloggers are making hard work of it.

A couple of simple approaches which work in C are the async library and protothreads. The async library has been discussed on Hackaday, with some commenters liking it, and others not so much. I have used it in the past, and it works well enough. It relies on C macro trickery, which has been the bone of contention.

My own recent experiment is to consider using “computed gotos” as a mechanism for stackless cooperative coroutines using a round-robin scheduler. That’s a bit of a mouthful for something that is actually quite simple.

I’m going to define a function millis() which returns the number of milliseconds elapsed since the Unix epoch:

#include <stddef.h>   // for NULL
#include <sys/time.h> // for gettimeofday()

// get time in milliseconds
long long millis(void)
{
    struct timeval te;
    gettimeofday(&te, NULL); // get current time
    long long milliseconds = te.tv_sec*1000LL + te.tv_usec/1000; // calculate milliseconds
    // printf("milliseconds: %lld\n", milliseconds);
    return milliseconds;
}

Obviously in an embedded system you’d have to come up with your own implementation. Maybe you could even get away without one. I just wanted some kind of clock as a way of testing a non-blocking delay.
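As a rough idea of what “your own implementation” might look like on a microcontroller, here is a minimal sketch. It assumes you have some periodic timer interrupt firing once per millisecond; the names timer_tick_isr() and millis32() are made up for illustration and do not come from any particular SDK.

```c
#include <stdint.h>

/* Hypothetical tick counter. On real hardware a 1 kHz timer interrupt
   would increment it; volatile because it changes outside normal flow. */
static volatile uint32_t tick_ms = 0;

/* Assumed to be hooked up to a 1 ms periodic timer interrupt
   (hypothetical name, not a real SDK handler). */
void timer_tick_isr(void)
{
    tick_ms++;
}

/* Milliseconds since start-up. A 32-bit counter wraps after about
   49.7 days. On a 32-bit core an aligned 32-bit read is normally a
   single access; on 8/16-bit parts you would need to guard the read. */
uint32_t millis32(void)
{
    return tick_ms;
}
```

On a desktop you can exercise it by calling timer_tick_isr() by hand and checking that millis32() goes up by one each time.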

We’re going to define a task(). In my example we’ll want to quit from it, so let’s define a global var:

#include <stdbool.h> // for bool, true, false
static bool done = false;

Our main function will periodically execute the task:

int main()
{
        while(!done) {
                task();
        }
        return 0;
}

In practice you would implement many tasks, and you’d almost certainly want to loop forever on the tasks if you are using it on a microcontroller.
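To give a flavour of the “many tasks” case, here is a minimal sketch of a round-robin pass over a table of function pointers. The two tasks are empty placeholders that I have invented purely for illustration; real ones would each do a small slice of work and return promptly.

```c
#include <stddef.h>

typedef void (*task_fn)(void);

/* Hypothetical placeholder tasks: each just counts how often it ran. */
static int blink_runs = 0, button_runs = 0;
static void blink_task(void)  { blink_runs++; }
static void button_task(void) { button_runs++; }

/* The task table; add a task by adding an entry. */
static task_fn tasks[] = { blink_task, button_task };

/* One pass of the round robin: call every task once, in order. */
void run_all_tasks_once(void)
{
    for (size_t i = 0; i < sizeof tasks / sizeof tasks[0]; i++)
        tasks[i]();
}
```

On a microcontroller, main() would then boil down to for (;;) run_all_tasks_once();. There is no priority and no preemption here: a task that hogs the CPU stalls everything else.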

Our “task” will be to start, wait for 2000ms without blocking, and then stop. Here’s how I chose to implement it:

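void task(void)
{
        static long long start;       // time at which the wait began
        static void* where = &&start; // label to resume from on the next call
        goto *where;
start:
        puts("Task started");
        start = millis();
        where = &&pausing;
pausing:
        if(millis() - start < 2000) return; // not yet; try again on the next call
        puts("That's enough waiting");
        where = &&finis;
finis:
        done = true;
        puts("Task finished");
}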

OK. So it’s not reentrant, as you have stuff stored statically. If you really wanted that kind of thing, then your design would be much more complicated. You’d have to pass the state through, and all that jazz.
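For what it’s worth, here is one way that “pass the state through” might look. It is only a sketch: the struct waiter and waiter_step() are names I have made up, and the current time is passed in as a parameter so the function doesn’t depend on any particular clock. It still relies on the GCC/Clang “labels as values” extension, and the stored address is only ever used inside the function that took it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Per-instance state (hypothetical struct). */
struct waiter {
    void *where;        /* resume point; NULL means "not started yet" */
    long long start;    /* time the wait began, in ms */
    long long wait_ms;  /* how long to wait, in ms */
    bool done;
};

/* Run one step of the task for the given instance. */
void waiter_step(struct waiter *w, long long now_ms)
{
    if (w->where) goto *w->where;  /* resume where we left off */

    w->start = now_ms;             /* first call: start the wait */
    w->where = &&pausing;
pausing:
    if (now_ms - w->start < w->wait_ms) return;
    w->where = &&finis;
finis:
    w->done = true;
}
```

Two instances, say struct waiter a = { .wait_ms = 100 } and struct waiter b = { .wait_ms = 300 }, can then be stepped independently from the same loop. Whether the extra plumbing is worth it depends on whether you ever need more than one copy of a task.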

The real secret here is in the line:

static void* where = &&start;

So “where” just stores the address where execution should resume. Obviously you initialise it to the start state. You resume processing by simply jumping to the stored address:

goto *where;

The neat thing here is that you define states using C labels. It’s not the only way of doing it. You could use some kind of enumeration for the state and perform a switch on the state, but the way I’ve laid it out is better, I think. When you want to switch states, just set

where = &&whatever_new_state;
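For comparison, the enumeration-and-switch alternative mentioned above might look something like this. It is a sketch of the same start / pause / finish sequence; the enum names and switch_task() are invented for illustration, and the time is passed in as a parameter to keep the example self-contained.

```c
#include <stdbool.h>

enum wait_state { ST_START, ST_PAUSING, ST_FINIS };  /* hypothetical names */

static enum wait_state state = ST_START;
static long long wait_began;
static bool wait_done = false;

/* Same sequence as before, driven by a switch on an enum instead of
   a stored label address. */
void switch_task(long long now_ms)
{
    switch (state) {
    case ST_START:
        wait_began = now_ms;
        state = ST_PAUSING;
        /* fall through */
    case ST_PAUSING:
        if (now_ms - wait_began < 2000) return;
        state = ST_FINIS;
        /* fall through */
    case ST_FINIS:
        wait_done = true;
        break;
    }
}
```

This version is standard C, so it works on compilers that lack computed gotos. The price is that every state needs an entry in the enum as well as a case label, and the fall-throughs have to be deliberate.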

The delay mechanism is pretty damn straightforward, too.
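One thing to watch if you move from a long long clock to a 32-bit millisecond counter, as you might on a small microcontroller: the counter will eventually wrap around. A small helper along these lines (the name interval_elapsed() is my own) keeps the comparison correct across the wrap, because unsigned subtraction is done modulo 2^32.

```c
#include <stdbool.h>
#include <stdint.h>

/* True once at least interval_ms have passed since start_ms.
   Unsigned subtraction gives the right elapsed time even if the
   counter wrapped between start_ms and now_ms, provided the real
   elapsed time is less than 2^32 ms. */
bool interval_elapsed(uint32_t now_ms, uint32_t start_ms, uint32_t interval_ms)
{
    return (uint32_t)(now_ms - start_ms) >= interval_ms;
}
```

The thing to avoid is comparing now against start + interval directly, since that sum can itself overflow.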

You need to take more care when adopting this approach, but it doesn’t seem any worse than all those other mechanisms employing macro tricks.

What about the use of global state? Yeah, the use of global state isn’t perfect, but I don’t think there are too many ways around it. Just call it a “poor man’s message-passing system”.

Be careful about any data that is shared with interrupts, though. You’ll need to employ the usual safeguards for those.
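As a small illustration of one such safeguard, here is a sketch of a flag shared between an “interrupt” and a task. Since this post uses Linux for demonstration, a signal handler stands in for the ISR; fake_isr() and take_button_event() are hypothetical names. On a real microcontroller, anything bigger than a single flag would typically also need interrupts briefly disabled around the access.

```c
#include <signal.h>
#include <stdbool.h>

/* Flag written by the "interrupt" and read by a task. volatile
   sig_atomic_t is the type the C standard allows a signal handler
   to write safely. */
static volatile sig_atomic_t button_event = 0;

/* Stand-in for an ISR: on Linux a signal handler plays that role. */
void fake_isr(int signum)
{
    (void)signum;
    button_event = 1;
}

/* Task-side check: consume the event if there is one. Events that
   arrive before the flag is cleared are merged into one. */
bool take_button_event(void)
{
    if (!button_event) return false;
    button_event = 0;
    return true;
}
```

To try it out, install the handler with signal(SIGUSR1, fake_isr), fire it with raise(SIGUSR1), and have a task poll take_button_event() on each pass.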

Anyway, that’s it for now. I hope it has given you something to think about. Code is available here.

About mcturra2000

Computer programmer living in Scotland.