Kicking the tyres of Ada for the #rp2040

In my last post I said that I seemed to have accidentally set off interrupt triggers by writing to the wrong bits of a register. It got me wondering whether something like Ada might be a better bet. I guess Rust might be an interesting alternative, but I couldn’t get a successful compile for the Pico.

I’m running Debian Stable, and I suspect that its Ada compiler might be a little old. So I used a more recent version which I downloaded from AdaCore.

I think you also need a utility called “alire”, the Ada Library Repository. I’m a bit hazy on the details, and on how I got the whole thing working. Here is a simple blink sketch:

--
--  Copyright 2021 (C) Jeremy Grosser
--
--  SPDX-License-Identifier: BSD-3-Clause
--
with RP.Device;
with RP.Clock;
with RP.GPIO;
with Pico;

procedure Main is
begin
        RP.Clock.Initialize (Pico.XOSC_Frequency);
        Pico.LED.Configure (RP.GPIO.Output);
        RP.Device.Timer.Enable;
        loop
                Pico.LED.Set;
                RP.Device.Timer.Delay_Milliseconds (100);
                Pico.LED.Clear;
                RP.Device.Timer.Delay_Milliseconds (900);
        end loop;
end Main;

You can find my set-up here. It uses a Makefile, which I prefer. The GNAT Programming Studio that comes with Debian seems quite snappy, but as I already said, the project doesn’t seem to build for whatever reason.

The Pico libraries are described in further detail in Jeremy Grosser’s blog.

I’ve nothing insightful to add, as I’m completely new to Ada myself. I just thought it was worthwhile giving an interesting-looking project a bit of publicity.

Have fun.

Update 2021-07-15: I tried to get tasks working, but it appears to be an unsupported feature for the Pico. Hmmm, that’s disappointing. That removes much of the attraction of Ada for me. I have also found out today that C supports bit-fields. C++ definitely does. You’d think that feature would be heavily used in embedded systems, but I’m not aware of any library that does. That might be worth playing around with to see if it makes things a tad easier.
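As a rough sketch of what playing with bit-fields might look like, here is a made-up 32-bit control register described as a union of a raw word and a bit-field struct. The register layout and the names demo_ctrl_t and demo_set_mode() are invented for illustration and don’t come from any real chip or library. Bit-field ordering is implementation-defined, so this assumes GCC or Clang on a little-endian target, where the first field lands in bit 0:

```c
#include <stdint.h>

/* Hypothetical 32-bit control register, invented for illustration:
 *   bit  0     enable
 *   bits 1-2   mode
 *   bits 3-31  unused
 * Bit-field ordering is implementation-defined; this assumes GCC or
 * Clang on a little-endian target (first field at bit 0). */
typedef union {
    uint32_t raw;
    struct {
        uint32_t enable : 1;
        uint32_t mode   : 2;
        uint32_t        : 29; /* padding up to 32 bits */
    } bits;
} demo_ctrl_t;

/* Change only the mode field of a register value; the compiler
 * generates the masking and shifting for us. */
uint32_t demo_set_mode(uint32_t raw, unsigned mode)
{
    demo_ctrl_t reg = { .raw = raw };
    reg.bits.mode = mode & 0x3u;
    return reg.raw;
}
```

The catch is that a field write is still a read-modify-write of the whole word, and the layout isn’t portable between compilers, which may be part of why libraries tend to stick with masks and shifts.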

Posted in Uncategorized | Leave a comment

Simple memory-to-memory STM32F4 DMA example using libopencm3

Well, it took me most of the day to get it working, but I got there in the end.

The idea is that we want to copy memory from one location to another using DMA. Maybe it’s not especially useful, but it should help us get our feet under the table when it comes to learning DMA. I’ll present the code, and discuss it as I go along.

Start off with some boilerplate:

     1	/*
     2	 * Memory-to-memory must use DMA2
     3	 *
     4	 * The F4 has 2 controllers: DMA1, and DMA2.
     5	 * Each DMA has 8 streams: pathways where memory can flow
     6	 *
     7	 * Further info:
     8	 * https://adammunich.com/stm32-dma-cheat-sheet/
     9	 */
    10	
    11	#include <libopencm3/stm32/rcc.h>
    12	//#include <libopencm3/stm32/gpio.h>
    13	//#include <libopencm3/stm32/timer.h>
    14	//#include <libopencm3/stm32/spi.h>
    15	#include <libopencm3/stm32/dma.h>
    16	
    17	#include <string.h>
    18	

Include some convenience headers and other stuff that I won’t go into:

    19	#include "mal.h"
    20	
    21	typedef uint32_t u32;
    22	
    23	#define LED PC13
    24	

We’re going to need to decide on 3 things: a DMA controller, a stream, and a channel. An STM32F4 has two DMA controllers: DMA1 and DMA2. For memory-to-memory copying, you must use DMA2. I haven’t fathomed out exactly what streams and channels are, but it seems that only one set of data can pass through a stream at a time. By “set”, I mean “channel”.

Refer to table 27 for DMA1 and table 28 for DMA2, s9.3.3 of the RM0383 Reference Manual for the STM32F411xC/E. They’re on pages 166 and 167. They will tell you, for example, that if you want to use SPI3_RX DMA, then you need DMA1, channel 0, and either stream 0 or 2.

Memory-to-memory isn’t shown in the document, so I have assumed that I can use a slot that isn’t occupied in either of those two tables. So I chose DMA2 and stream 0:

    25	uint32_t dma = DMA2;
    26	//uint8_t ch = -3;
    27	uint8_t strm = DMA_STREAM0; // chosen arbitrarily
    28	

Further down I’ll select channel 4. Let me define an outputting function:

    29	static void myputs(const char *str)
    30	{
    31		mal_usart_print(str);
    32		mal_usart_print("\r\n");
    33	}
    34	
    35	

Don’t worry about how mal_usart_print() is defined, just think of myputs() as the equivalent of puts(), but for USART1.

It is possible to enable an interrupt for the DMA. The names of all the interrupt handlers are predefined in libopencm3/libopencm3/include/libopencmsis/stm32/f4/irqhandlers.h

You don’t get to choose your own. I’m going to use an ISR to experiment with. It’s not strictly necessary to use one; you could busy-wait for the transfer-complete flag to be set, for example. There’s probably not much point in just doing that, though, as you’ll have done a memcpy(), but harder. Here are my definitions:


    36	volatile bool transfer_complete = false;
    37	volatile bool loud_dma_isr = true;
    38	
    39	void dma2_stream0_isr()
    40	{
    41		if(dma_get_interrupt_flag(dma, strm, DMA_TCIF)) {
    42			dma_clear_interrupt_flags(dma, strm, DMA_TCIF); // clear transfer complete flag
    43			transfer_complete = true;
    44			if(loud_dma_isr) 
    45				myputs("dma2_stream0_isr called: transfer complete");
    46		} 
    47		else if(dma_get_interrupt_flag(dma, strm, DMA_DMEIF)) {
    48			dma_clear_interrupt_flags(dma, strm, DMA_DMEIF);
    49			myputs("dma2_stream0_isr called: Direct Mode Error Interrupt Flag");
    50		} 
    51		else if(dma_get_interrupt_flag(dma, strm, DMA_TEIF)) {
    52			dma_clear_interrupt_flags(dma, strm, DMA_TEIF);
    53			myputs("dma2_stream0_isr called: Transfer Error Interrupt Flag");
    54		} 
    55		else if(dma_get_interrupt_flag(dma, strm, DMA_FEIF)) {
    56			dma_clear_interrupt_flags(dma, strm, DMA_FEIF);
    57			myputs("dma2_stream0_isr called: FIFO Error Interrupt Flag");
    58		} 
    59		else if(dma_get_interrupt_flag(dma, strm, DMA_HTIF)) {
    60			dma_clear_interrupt_flags(dma, strm, DMA_HTIF);
    61			myputs("dma2_stream0_isr called: Half Transfer Interrupt Flag");
    62		} 
    63		else {
    64			myputs("dma2_stream0_isr called: Unhandled (should never be called)");
    65		}
    66	
    67	}
    68	

The MCU can set several flags for the interrupt, and I have enumerated all the possibilities. I did this because I introduced a bug in my code which seemed to trigger interrupt requests mysteriously. A language like Ada would probably have prevented the silly error in the first place.

So the way you’d choose to code the ISR would likely be much shorter. The only case we’re really interested in is when the DMA_TCIF flag is set. This is when the transfer is complete.

In the ISR, I clear the flag. That is important, because otherwise the interrupt will keep firing. I also set my own boolean variable “transfer_complete” to true. The variable “loud_dma_isr” is for debugging purposes. I’m going to want to turn it off when I do benchmarking.

Let’s define main(), set up the built-in LED, and initialise a USART:

    69	int main(void)
    70	{
    71		pin_out(LED);
    72		mal_usart_init();
    73	

I’m not going to go into the details of those. Just accept that they do something useful.

Let’s display some output, and set up some variables as source and destinations for our copying:

    74		myputs("");
    75		myputs("=============================");
    76		myputs("DMA example: memory to memory");
    77	
    78		char src1[] = "1234567890";
    79		char dst1[] = "abcdefghij";
    80		uint32_t len = strlen(src1) + 1;
    81	
    82		myputs(dst1);
    83	

OK, time to do some basic configuration. S9.3.17 (page 181) of the manual gives the stream configuration procedure. I think it is a little misleading, as you don’t quite want to do it exactly as they have laid out. I have tried to keep things to a minimum.

You need to enable the relevant clock:

    85		// follow instructions in s9.3.17 of RM0383a, p181
    86		rcc_periph_clock_enable(RCC_DMA2);
    87	

Disable the stream. It is possible that a stream is already being used, and so you need to block until it is finished. If you choose your streams carefully, so that there is no possible contention, then you probably won’t need to do much in the way of waiting:

    88		// step 1 : disable the stream
    89		myputs("Disabling stream");
    90		//dma_disable_stream(dma, strm);
    91		DMA_SCR(dma, strm) &= ~DMA_SxCR_EN;
    92		while(DMA_SCR(dma, strm) & DMA_SxCR_EN); // wait until it is down
    93		myputs("Stream disabled. OK to configure");
    94	

The procedure advises setting the source and destination addresses at this point. If you don’t want to change them in future, then you can set them now. Otherwise, set them as needed. The chances are that you are going to fix the addresses anyway, but I wanted to play around for this example:

    95		//DMA_SPAR(dma, strm) = (uint32_t) str1; // step 2: set peripheral port address
    96		myputs("Step 2");
    97		//dma_set_peripheral_address(dma, strm, (uint32_t) src1); // step 2: set source address
    98		//DMA_SM0AR(dma, strm) = *(uint32_t*) str2; // step 3: set the memory address
    99		myputs("Step 3");
   100		dma_set_memory_address(dma, strm, (uint32_t) dst1); // step 3 : destination address
   101		myputs("Step 4");
   102		dma_set_number_of_data(dma, strm, len); // step 4: total number of data items
   103		// dma_channel_select(dma, strm, 0); // step 5: I just made up a channel number in this case

For now I have set the destination address – what the reference manual calls memory address, in line 100. I’ve also set up the length of the transfer, in line 102.

Then I chose channel 4:

   104		myputs("Step 5");
   105		dma_channel_select(dma, strm, DMA_SxCR_CHSEL_4); // step 5: I just made up a channel number in this case

There’s a bunch of stuff mentioned in the configuration procedure that I just ignored:

   106		// step 6: something about flow controller. Omitted
   107		// step 7: configure stream priority. Omitted
   108		// step 8: configure FIFO usage. Omitted setup of FIFO
   109	

I think “flow controller” is for when you aren’t sure of the length of the transmission. There are also stream priorities you can set, which I’m not interested in. There’s a variety of transfer methods, including burst, FIFO, half-transmission, etc. You’d use half-transmission if you wanted to set up double-buffering. We’re going to be using memory-to-memory:

   110		// step 9: variety of things
   111		myputs("Step 9a");
   112		dma_set_transfer_mode(dma, strm, DMA_SxCR_DIR_MEM_TO_MEM); // step 9: data direction

and many of those methods won’t be available to us in that mode.

Set up peripheral and memory increment mode:

   113		myputs("Step 9b");
   114		dma_enable_peripheral_increment_mode(dma, strm); // step 9: we want to increment "periph" address
   115		myputs("Step 9c");
   116		dma_enable_memory_increment_mode(dma, strm); // step 9: ditto for memory

In other words, as we transfer each item we increase both the source and destination addresses as we do so. This is how you do memory copy. If you’re writing to a SPI, for example, then the peripheral address won’t change. Our data is in 8-bit format:

   117	//dma_enable_direct_mode(dma, strm); // step 9: 
   118		// step 9 : can use single or burst, but not circ, direct or double-buffer
   119		myputs("Step 9d");
   120		dma_set_memory_size(dma, strm, DMA_SxCR_MSIZE_8BIT);
   121		myputs("Step 9e");
   122		//dma_set_peripheral_size(dma, strm, len);
   123		dma_set_peripheral_size(dma, strm, DMA_SxCR_PSIZE_8BIT);

You can do wacky things like have a source which is 8 bits wide and a destination which is 16 bits wide. The controller then packs or unpacks the data through its FIFO, which may be useful. Refer to the reference manual for more info. It’s of no use to us here.
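To make the increment modes concrete, here is a plain C model of what an 8-bit transfer does with the two addresses. It is only a software illustration, on the assumption that the controller behaves as described above; dma_model_copy() is a hypothetical name, not a libopencm3 function:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Software model of an 8-bit DMA transfer (illustration only; the real
 * controller does this in hardware). "periph" is the source and "mem"
 * the destination, matching the memory-to-memory setup in this post. */
void dma_model_copy(const uint8_t *periph, uint8_t *mem, size_t count,
                    bool periph_inc, bool mem_inc)
{
    for (size_t i = 0; i < count; i++) {
        *mem = *periph;            /* move one data item */
        if (periph_inc) periph++;  /* peripheral increment: step the source */
        if (mem_inc)    mem++;     /* memory increment: step the destination */
    }
}
```

With both flags set this is just a byte copy. With periph_inc cleared, every item is read from the same address, which is the behaviour you’d want if the source were a single peripheral data register.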

Turn on the interrupts:

   124		myputs("Fiddling with interrupts");
   125		//dma_clear_interrupt_flags(dma, strm, DMA_TCIF); // clear transfer complete flag
   126		//dma_clear_interrupt_flags(dma, strm, DMA_HTIF); // clear half-transfer complete flag
   127		//dma_disable_half_transfer_interrupt(dma, strm);
   128		dma_enable_transfer_complete_interrupt(dma, strm);
   129		nvic_enable_irq(NVIC_DMA2_STREAM0_IRQ);
   130		myputs("Finished setting up interrupts");
   131	
   132	

Now let’s do a transfer:

   133		myputs("Start tfr 1");
   134		transfer_complete = false;
   135		dma_set_peripheral_address(dma, strm, (uint32_t) src1); // step 2: set source address
   136		dma_enable_stream(dma, strm); // step 10
   137	
   138		myputs(dst1); // this will likely only partially complete
   139		while(!transfer_complete);
   140		myputs("Tfr 1 completed");
   141		myputs(dst1);
   142		myputs(src1);

We wanted to play around with the source address, remember, which is why it is set in line 135. Line 134 resets our transfer-complete flag.

In line 138, we print out the contents of our target location before we know the transfer is complete. It outputs this to the console (or it did for me, at least):

a234567890

It should read:

1234567890

Line 139 does a busy-wait, after which the correct output is given. The output to the console so far is:

=============================
DMA example: memory to memory
abcdefghij
Disabling stream
Stream disabled. OK to configure
Step 2
Step 3
Step 4
Step 5
Step 9a
Step 9b
Step 9c
Step 9d
Step 9e
Fiddling with interrupts
Finished setting up interrupts
Start tfr 1
dma2_stream0_isr called: transfer complete
a234567890
Tfr 1 completed
1234567890
1234567890

Let’s try another DMA request to make sure things work as we expect them to:

   143	
   144		
   145	
   146	
   147	
   148		myputs("\r\nStart tfr 2");
   149		transfer_complete = false;
   150		char src2[] = "ABCDEFGHIJ";
   151		dma_set_peripheral_address(dma, strm, (uint32_t) src2);
   152		dma_enable_stream(dma, strm);
   153		while(!transfer_complete);
   154		myputs(dst1);
   155		myputs("Tfr 2 completed");
   156	

The output on the console reads:

Start tfr 2
dma2_stream0_isr called: transfer complete
ABCDEFGHIJ
Tfr 2 completed

Good! We have successfully copied string src2 to dst1.

That’s the basics covered. Now let’s do some timings, to see how fast DMA transfer is compared to a regular memcpy():

   157		// now do timings
   158	#define TPIN PC14
   159		pin_out(TPIN);
   160		char dst3[512], src3[512];
   161		int i;
   162		dma_set_peripheral_address(dma, strm, (uint32_t) src3);
   163		dma_set_number_of_data(dma, strm, 512);
   164		dma_set_memory_address(dma, strm, (uint32_t) dst3);
   165		loud_dma_isr = false;
   166		while(1) {
   167			// use dma
   168			pin_high(TPIN);
   169			for(i = 0; i< 100; ++i) {
   170				transfer_complete = false;
   171				dma_enable_stream(dma, strm);
   172				while(!transfer_complete);
   173			}
   174			pin_low(TPIN);
   175	
   176			mal_delayish(1);
   177	
   178			// use memcpy
   179			pin_high(TPIN);
   180			for(i = 0; i< 100; ++i) {
   181				memcpy(dst3, src3, 512);
   182			}
   183			pin_low(TPIN);
   184	
   185			mal_delayish(10);
   186		}
   187	
   188	}

I toggle pin PC14 high and low around 100 rounds of DMA transfers, and again around 100 rounds of memcpy(), and use a logic analyser to see how long each took. I didn’t want to copy 11 bytes at a time, but a more reasonable 512-byte block. I haven’t bothered setting up the buffers’ contents, as I’m happy that we’ve already established that the copying works correctly.

Using my logic analyser, the DMA transfers take about 11.796ms. That’s for 100 × 512-byte blocks. So each block takes 118us. That’s actually pretty unpleasant if we’re playing with audio at, say, 44kHz, which works out at about 23us per sample. So we may need to be a little clever about how we do this so as not to cause jitter in our audio.

Using memcpy() takes 2.703ms, which is 27us per block.

As you can see, a naive memcpy() works much faster than a DMA transfer. The difference is that memcpy() actually blocks, because it is tying up the CPU, whereas the DMA can be run asynchronously.
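The arithmetic behind those figures can be sketched in a few lines of C. The inputs are the logic analyser readings quoted above; the function names are made up for this example, and the 44kHz sample rate is the same round number used in the text:

```c
#include <stdio.h>

/* Microseconds per block, given the total time in ms for `blocks` copies. */
double us_per_block(double total_ms, int blocks)
{
    return total_ms * 1000.0 / blocks;
}

/* Bytes per microsecond, which is numerically the same as MB/s. */
double mb_per_s(double bytes, double us)
{
    return bytes / us;
}

/* Time between samples, in microseconds, at a given sample rate. */
double sample_period_us(double rate_hz)
{
    return 1e6 / rate_hz;
}

/* Print the figures derived from the logic analyser readings. */
void print_timings(void)
{
    double dma = us_per_block(11.796, 100);
    double cpy = us_per_block(2.703, 100);

    printf("DMA:    %.2f us/block, %.2f MB/s\n", dma, mb_per_s(512, dma));
    printf("memcpy: %.2f us/block, %.2f MB/s\n", cpy, mb_per_s(512, cpy));
    printf("sample period at 44kHz: %.2f us\n", sample_period_us(44000));
    printf("samples elapsed per DMA block: %.1f\n",
           dma / sample_period_us(44000));
}
```

That works out at roughly 4.3 MB/s for the DMA against roughly 18.9 MB/s for memcpy(), and a single DMA block spans about five sample periods.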

That doesn’t seem to be much of a win for DMA over the simpler memcpy(). Unless I’ve done something hideously wrong, of course. It seems that DMA will be much more useful in something like SPI transfers, which can seriously clog up CPU usage due to their relatively low speed.

So, I hope this post was useful to you. It is my first foray into DMA, so if you have any comments to make, then feel free. I probably won’t be able to answer many questions you have, though.

My plan next is to see how DMA can be used over SPI to output to a DAC. I think the I2S functionality will be relevant here. But that’s a battle for another day.

You can download the code here. It’s probably simplest to download the whole repo and issue a make in that directory. Happy hunting.


K.I.S.S. state machine/cooperative concurrency for embedded systems

Although my primary interest is in embedded systems, I’ll present code that runs on a Linux system for illustrative purposes. But first, a little diversion.

My thinking towards embedded systems is tending more and more towards a radical simplicity. I have been exploring Rust and Ada as suitable replacements for C/C++. I know little about either. Ada seems eminently suited to embedded systems, having the notion of tasks, and a few other niceties. I managed to get a little blinky sketch working for the Raspberry Pi Pico. Disappointingly, I couldn’t get a Rust blinky sketch to compile that I found online. There was an incompatibility between the blink code and the embedded HAL. I guess that, by rights, that shouldn’t really happen, as Rust has a reproducible build mechanism. I’m not really criticising Rust per se, it was just a niggle that I could have lived without.

Having said that, Rust and Ada projects can be useful for reference purposes, as they are “clean” reimplementations of code that might have built up cruft in C/C++.

I’m an advocate of C++. I have been looking over some of the latest C specs, and it’s got me to wondering whether, meh, just use C. But I’ve been thinking that a standard C library for embedded systems might not be such a great idea, especially if you have to integrate it yourself. The Raspberry Pi Pico SDK has everything put in place, so it makes sense to just use it.

For other systems, the payoff might not be so obvious. Newlib seems to be a popular choice for embedded systems. But in a way, newlib wants to be a POSIX library, which might not be a good fit for embedded. I found it a little tedious to integrate into my bare metal Raspberry Pi 0 project, and I seem to have made a mistake with regards to floating point operations.

There are other standard libraries to choose from. One that looks interesting is picolibc, which blends newlib and AVR Libc, and contains no dynamic allocations. I think newlib needs to dynamically allocate memory for its print routines (??), but I could be wrong.

But I’m leaning towards an alternative approach: ditch the idea of a standard C library, and just implement the needed functions on a case-by-case basis. It might seem an insane idea, but it’s all a question of “how much do I need this stuff anyway?” There are usually some good subsets of functionality that you can find on the internet which you can use if you feel the need.

The game-plan is, therefore, to have a very lightweight, very flat, system; without being absurd about it. I’m liking libopencm3, for example, and I’ve gotten quite far with getting stuff done for my STM32 (before I fried it). The documentation on it could be a little better, admittedly, but I do get a satisfying sense that I could understand what the library does without wading through layers and layers of abstractions.

Concurrency. There’s a lot of debate raging around in embedded systems about whether it’s a good idea, or not. Some prefer state machines, and think that RTOS’s are a bad idea.

There’s a smorgasbord of concurrency mechanisms to choose from. Do you want preemptive, cooperative, stackful, or stackless coroutines? The big player in RTOS’s is FreeRTOS. There are plenty of others: Zephyr, RIOT, the list is endless.

One of the things to ask yourself, though, is: do I really understand what’s going on under the hood? Things can be a little black-boxy. I’m sure that many understand the underlying mechanisms, although I’ve been fairly lazy about investigating what it’s really all about myself. But again, the more stuff you put in your project, the more difficult it is to understand.

I had been thinking that C++20’s coroutines might be a great choice for the implementation of concurrency, but after reading a few blogs, my eyes glazed over. Not sure if the C++ coroutines are intrinsically brain-damaging to use, or the bloggers are making hard work of it.

A couple of simple approaches which work in C are the async library and protothreads. The async library has been discussed on Hackaday, with some commenters liking it, and others not so much. I have used it in the past, and it works well enough. It relies on C macro trickery, which has been the bone of contention.

My own recent experiment is to consider using “computed gotos” as a mechanism for stackless cooperative coroutines using a round-robin scheduler. That’s a bit of a mouthful for something that is actually quite simple.

I’m going to define a function millis() which returns the number of milliseconds elapsed since the Unix epoch:

// get time in milliseconds
long long millis()
{
    struct timeval te;
    gettimeofday(&te, NULL); // get current time
    long long milliseconds = te.tv_sec*1000LL + te.tv_usec/1000; // calculate milliseconds
    // printf("milliseconds: %lld\n", milliseconds);
    return milliseconds;
}

Obviously in an embedded system you’d have to come up with your own implementation. Maybe you could even get away without one. I just wanted some kind of clock as a way of testing a non-blocking delay.
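For what it’s worth, a minimal sketch of an embedded replacement might be a counter bumped by a timer interrupt. This assumes something calls the handler once every millisecond; sys_tick_handler() is the name libopencm3 uses for the Cortex-M SysTick interrupt, the timer setup itself isn’t shown, and elapsed() is a hypothetical helper:

```c
#include <stdint.h>

/* Millisecond counter, bumped once per tick by the handler below. */
static volatile uint32_t tick_ms;

/* Assumed to be called every 1 ms by a hardware timer. libopencm3 uses
 * this name for the Cortex-M SysTick handler; configuring the timer is
 * not shown here. */
void sys_tick_handler(void)
{
    tick_ms++;
}

/* Milliseconds since start-up (wraps after about 49 days). */
uint32_t millis(void)
{
    return tick_ms;
}

/* Non-zero once at least `period` ms have passed since `start`. The
 * unsigned subtraction stays correct when the counter wraps. */
int elapsed(uint32_t start, uint32_t period)
{
    return (uint32_t)(millis() - start) >= period;
}
```

A 32-bit counter is used rather than long long so that reading it is a single load on a 32-bit microcontroller.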

We’re going to define a task(). In my example we’ll want to quit from it, so let’s define a global var:

static bool done = false;

Our main function will periodically execute the task:

int main()
{
        while(!done) {
                task();         
        }
        puts("KTHXBYE");

        return 0;
}

In practice you would implement many tasks, and you’d almost certainly want to loop forever on the tasks if you are using this on a microcontroller.
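With several tasks, the main loop might walk a table of function pointers, giving each task one turn per pass. This is only a sketch of the round-robin idea; blink_task(), poll_task() and run_all_once() are made-up names, and the tasks just count their calls:

```c
#include <stddef.h>

typedef void (*task_fn)(void);

int blink_runs, poll_runs;  /* counters, just to show activity */

/* Two stand-in tasks; each must return quickly and never block. */
void blink_task(void) { blink_runs++; }
void poll_task(void)  { poll_runs++; }

static const task_fn tasks[] = { blink_task, poll_task };

/* Give every task one turn, in table order. */
void run_all_once(void)
{
    for (size_t i = 0; i < sizeof tasks / sizeof tasks[0]; i++)
        tasks[i]();
}
```

On a microcontroller, main() would then reduce to for(;;) run_all_once();.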

Our “task” will be to start, wait for 2000ms without blocking, and then stop. Here’s how I chose to implement it:

void task(void)
{
        static long long start, now;
        static void* where = &&start;
        goto *where;
start:
        puts("Task started");
        start = millis();
        where = &&pausing;
        return;
pausing:
        if(millis() - start < 2000) return;
        puts("That's enough waiting");
        where = &&finis;
        return;
finis:
        done = true;
        puts("Task finished");
        return;
}

OK. So it’s not reentrant, as you have stuff stored statically. If you really wanted that kind of thing, then your design will be much more complicated. You’d have to pass through state, and all that jazz.
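To give a flavour of what passing state through might involve, here is a hedged sketch in which everything that was static moves into a context struct supplied by the caller. The names task_ctx and task_step() are hypothetical, the clock is passed in as an argument to keep the example self-contained, and it relies on the same GCC/Clang labels-as-values extension:

```c
#include <stdbool.h>
#include <stddef.h>

/* Per-instance state; everything that was `static` in task() moves here. */
typedef struct {
    void *where;      /* resume address; NULL means "not started yet" */
    long long start;  /* time at which the wait began, in ms */
    bool done;
} task_ctx;

/* One step of the task. `now` is the current time in ms, supplied by
 * the caller. Uses the GCC/Clang "labels as values" extension. */
void task_step(task_ctx *ctx, long long now)
{
    if (ctx->where != NULL) goto *(ctx->where);

    /* first call: record the start time and move to the waiting state */
    ctx->start = now;
    ctx->where = &&pausing;
    return;
pausing:
    if (now - ctx->start < 2000) return;  /* still waiting */
    ctx->where = &&finis;
    return;
finis:
    ctx->done = true;
}
```

Two task_ctx variables, each zero-initialised, can then be stepped independently of one another.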

The real secret here is in the line:

static void* where = &&start;

So “where” just stores the address where execution should resume. Obviously you initialise it to the start state. You resume processing by simply jumping to the stored address:

goto *where;

The neat thing here is that you define states using C labels. It’s not the only way of doing it. You could use some kind of enumeration for the state and perform a switch on the state, but the way I’ve laid it out is better, I think. When you want to switch states, just set

where = &&whatever_new_state

The delay mechanism is pretty damn straightforward, too.

You need to take more care when adopting this approach, but it doesn’t seem any worse than all those other mechanisms employing macro tricks.
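For comparison, the enumeration-and-switch alternative could look something like the sketch below. It is meant to behave like task() above, except that the time is passed in as a parameter; state_t, task_switch() and task_done are names made up for this example. It is standard C, so it doesn’t depend on the GCC extension:

```c
#include <stdbool.h>
#include <stdio.h>

typedef enum { ST_START, ST_PAUSING, ST_FINIS } state_t;

static state_t state = ST_START;
static long long start_ms;
bool task_done = false;

/* Same states as task(), but the state is an enum and the dispatch is
 * a switch. `now` is the current time in milliseconds. */
void task_switch(long long now)
{
    switch (state) {
    case ST_START:
        puts("Task started");
        start_ms = now;
        state = ST_PAUSING;
        break;
    case ST_PAUSING:
        if (now - start_ms < 2000) break;  /* keep waiting */
        puts("That's enough waiting");
        state = ST_FINIS;
        break;
    case ST_FINIS:
        task_done = true;
        puts("Task finished");
        break;
    }
}
```

The price is that each state needs an entry in the enum as well as a case, whereas with labels the name is declared where it is used.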

What about the use of global state? Yeah, the use of global state isn’t perfect, but I don’t think there are too many ways around it. Just call it a “poor man’s message-passing system”.

Be careful about any data that is shared with interrupts, though. You’ll need to employ the usual safeguards for those.
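One such safeguard, sketched here under the assumption of a 32-bit core where an aligned 32-bit load is a single instruction, is to give each shared variable exactly one writer. The ISR only ever increments one counter, the task only ever updates the other, and the difference is the number of events still to handle. button_isr() and button_task() are hypothetical names:

```c
#include <stdint.h>

/* Written only by the interrupt handler. */
static volatile uint32_t events_seen;
/* Written only by the main-loop task. */
static uint32_t events_handled;

/* Stand-in for a real ISR (hypothetical name). */
void button_isr(void)
{
    events_seen++;
}

/* Called from the main loop; returns how many new events there were.
 * Each variable has exactly one writer, so the task and the ISR never
 * perform a read-modify-write on the same variable. Assumes an aligned
 * 32-bit load is atomic on the target. */
uint32_t button_task(void)
{
    uint32_t seen = events_seen;               /* one snapshot */
    uint32_t pending = seen - events_handled;  /* wrap-safe difference */
    events_handled = seen;
    return pending;
}
```

An event arriving just after the snapshot is not lost; it simply shows up in the next call.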

Anyway, that’s it for now. I hope it has given you something to think about. Code is available here.


A fistful of code for the #RP2040

My Raspberry Pi Pico directory is now gathering pace with some useful device drivers.

The most important ones are likely to be: flash example, SSD1306 OLED, and SD Card block reader (no file system support).

Other devices that are likely to be useful: Blinkt, 8×8 Matrix display, and 7-segment displays.

There’s a bunch of other examples and stuff for you to play with, including: an RP2040 file for KiCad, a Forth, a Basic, Picol (a small Tcl-like language), a whole bunch of sound stuff (noise generators, PWM examples, filters, frequency generators, bytebeats, a simple drum machine, etc.), button debouncers, plus an assortment of code demonstrating the RP2040’s hardware.

Have at it.

Code is here.


“Barebones” (no SDK) blinky sketch for the #RP2040

This post is a follow-up to yesterday’s attempt to get a blinky sketch running for the Raspberry Pi Pico, without using the SDK.

If you haven’t read the previous post, then do so now.

What has changed is the linker script, the bootup assembly and the “pad” setup in the main file. In the end, I don’t think it was the linker or assembly that was the problem, but a lack of proper peripheral setup.

I managed to get things working because I cribbed from Jeremy Grosser’s pico_examples repo. He was writing for Ada, though.

I took his linker script and his crt0.s file. They are now “fuller fat” implementations. The linker script now includes proper subsections for zeroed data, the heap, etc. The crt0.s file now includes all the vectors for the interrupts, together with stuff for ensuring that the cores are set up properly; and more besides.

I’m not inclined to delve too deeply into it at the moment, though. The file seems to have been obtained by using the C preprocessor on the SDK. It’s fairly compact.

I did extract the boot2 code into a separate file: boot2.s. I used “word” layout instead of “byte”, which makes it easier to compare to dumps from other builds. As I said in the previous post, it is likely that the boot code is specific to the Pico. If you’re using a variant, then you should dive into Jeremy’s repo to find something suitable.

I did notice that there was a Rust repo elsewhere that actually coded a bootloader for themselves. I assume it works. The problem with bootloaders is that they have to be checksummed, so there’s some fiddling around getting that sorted.

In the end, I decided I’d just use the one generated by the SDK. I’ve used the generated bytes. I’m contented enough to leave it as “mystery meat”, because it makes things simpler. I don’t think there’s any great mileage to be had rolling your own. Maybe it’s worth having an overview of what’s going on. Maybe.

The real fix to yesterday’s code is setting up the banks. I added the following code to the start of main.c:

        // inspired by Ada. Appears to be necessary, too.
        RESETS_RESET &= ~(1ul << 5); // clear IO_BANK0
        RESETS_RESET &= ~(1ul << 8); // clear PAD_BANK0
        while(1) {
                int io_bank0_done = (RESETS_RESET_DONE & (1ul<<5)) >0;
                int pad_bank0_done = (RESETS_RESET_DONE & (1ul<<8)) >0;
                if(io_bank0_done && pad_bank0_done) break;
        }
        PADS_BANK0_GPIO25 &= ~(1<<7); // clear output disable 
        PADS_BANK0_GPIO25 &= ~(1<<6); // clear input enable

This takes the IO bank and the pads bank out of reset, and waits for them to come up. It seems to be a detail that I missed last time. Hmmm, I’m a little sceptical that it’s necessary, but it appears to be.

Rust seems to have similar code in their bootloader, and I thought that this would have already been set up by the SDK bootloader. Apparently not. I can’t help but get the sneaking suspicion that I’m still doing something wrong. It’s probably better to reset all (or most) of the banks and pad at the beginning of main(), but I’ll leave it as-is for now. Sometimes one is just grateful for getting the wretched thing working in the first place.
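If more blocks need bringing out of reset, the snippet above generalises naturally into a helper that takes a bit mask. The following is a sketch rather than tested firmware: unreset_block_wait() and the two bit-name macros are invented names, and the registers are passed in as pointers so the logic can be tried out on a PC with ordinary variables:

```c
#include <stdint.h>

/* Bit positions in the RESETS registers, as used in the snippet above. */
#define RESETS_IO_BANK0    (1ul << 5)
#define RESETS_PADS_BANK0  (1ul << 8)

/* Take the blocks selected by `mask` out of reset, then wait until the
 * hardware reports all of them ready. The registers are passed as
 * pointers so that the function can also be exercised off-target. */
void unreset_block_wait(volatile uint32_t *reset,
                        volatile uint32_t *reset_done,
                        uint32_t mask)
{
    *reset &= ~mask;                      /* deassert reset */
    while ((*reset_done & mask) != mask)  /* wait for every requested bit */
        ;
}
```

On the Pico the call would presumably be unreset_block_wait(&RESETS_RESET, &RESETS_RESET_DONE, RESETS_IO_BANK0 | RESETS_PADS_BANK0), assuming the two registers are defined as volatile accesses like the others in main.c.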

The good news is that not only does it seem to work in debug mode (fingers crossed), but it also works when I transfer the UF2 over to the Pico. Just type “make app.uf2” to generate the uf2 file. You’re going to need elf2uf2 for that. Fortunately, it is built by the SDK. If you’ve set your SDK up in a standard way, the Makefile should be able to find it.

Type “make flash” to flash the uf2 file over to the mcu, after you’ve put it in USB boot mode, of course. I’m using Debian Stable, which adds the Pico as /media/$(USER)/RPI-RP2. You may need to tweak it to suit your system.

So, good stuff. It was a lot of work and, as always, I’ve done a lot of deconstructing of other people’s work. Without them, I’d be rather lost. Hopefully what I’ve given you is a simple skeletal structure on which you can build. It’s rather compact and written in C. These are huge advantages, I think, as it allows one to see what’s going on without a huge amount of fuss.

Enjoy.


A working-esque non-SDK blinky sketch for the #raspberrypico

It’s always interesting to get as close to the metal as possible with mcus (microcontrollers), armed with little more than a makefile, compiler, and vim.

Health warnings apply: it’s my best attempt to get a blinky sketch working, and only that, for the Pico. There are likely to be subtle misconceptions in how I’ve understood things. I referred to the SDK extensively in order to get it to work. The other crucial document is the RP2040 Datasheet (PDF), which is essential for finding out the register addresses.

The functioning part is contained in the C file, main.c:

     1	#include <stdint.h>
     2	
     3	/* The gpio functions are described in the file:
     4	 * src/rp2_common/hardware_gpio/gpio.c
     5	 * Some are inlined in gpio.h (e.g. gpio_set_dir())
     6	 */
     7	
     8	#define REG(addr) *(volatile uint32_t*)(addr)
     9	
    10	
    11	#define SIO_BASE 		0xd0000000 // see s2.3.1.7
    12	#define SIO_GPIO_OUT		REG(SIO_BASE+0x010) // GPIO output value
    13	#define SIO_GPIO_OUT_SET	REG(SIO_BASE+0x014) // GPIO output value set
    14	#define SIO_GPIO_OUT_CLR	REG(SIO_BASE+0x018) // GPIO output value clear
    15	#define SIO_GPIO_OE		REG(SIO_BASE+0x020) // GPIO output enable
    16	#define SIO_GPIO_OE_SET		REG(SIO_BASE+0x024) // GPIO output enable set
    17	#define SIO_GPIO_OE_CLR 	REG(SIO_BASE+0x028) // GPIO output enable clear
    18	
    19	#define IO_BANK0_BASE 		0x40014000
    20	#define IO_BANK0_GPIO25_CTRL 	REG(IO_BANK0_BASE+0x0cc)
    21	
    22	#define PADS_BANK0_BASE 	0x4001c000 // see s2.19.6.3. Pad control register
    23	#define PADS_BANK0_GPIO25	REG(PADS_BANK0_BASE+0x68)
    24	
    25	
    26	#define GPIO_FUNC_SIO	5
    27	
    28	
    29	#define LED 25
    30	
    31	void delay(int n) // no particular timing
    32	{
    33		for(int i =0 ; i< n; i++) {
    34			for(int j = 0; j< 10000; j++) {
    35				asm volatile ("nop");
    36			}
    37		}
    38	}
    39	
    40	
    41	
    42	int main()
    43	{
    44		IO_BANK0_GPIO25_CTRL = GPIO_FUNC_SIO; // init pin
    45		SIO_GPIO_OE_SET = 1ul << LED; // allow setting of output
    46	
    47		while(1) {
    48			SIO_GPIO_OUT_SET = 1ul << LED; 
    49			delay(100);
    50			SIO_GPIO_OUT_CLR = 1ul << LED; // turn off the LED
    51			delay(900);
    52		}
    53	
    54		return 0;
    55	}

Line 8 contains a little macro that we’ve defined for ourselves that performs a standard trick in the embedded community. It allows register addresses to be treated like a variable from which we can set and get values. The “volatile” keyword tells the compiler that the value of the variable may change at any time. It prevents the compiler from optimising the accesses away, which would probably cause the program not to run correctly. There is some controversy over its use in the C++ community, as the standards committee seem to have taken the view that “it probably doesn’t mean what you think it means.” It is likely to stay in the language, with some deprecation warnings, as “volatile” is used all over the place in embedded systems.

On lines 11-26 I have set up a bunch of addresses that enable us to control the mcu. GPIO pins, along with all the other mcu peripherals, are set up by peeking and poking memory addresses.

Line 26 defines the function-select value that means a pin should be treated as just a regular GPIO pin, rather than having a special use for SPI, I2C, etc.

Line 29 defines the onboard LED, which is pin 25 (GPIO25).

Lines 31-38 define a delay function, so that the blinking is slow enough for the human eye to see. Line 35 has a “no-op” (no operation) assembler instruction to gobble up a little bit of time. Note that the “volatile” keyword has been used to prevent the compiler from optimising out the nop instruction.

In line 44, we initialise the CTRL (control) register of GPIO25 to become a standard pin.

In line 45, we set the Output Enable bit for our LED.

It might also be necessary to set the “pads” (see lines 22-23) for some types of operations, but not for general IO.

In lines 47-52 we do our standard while loop, setting the pin high in line 48, waiting a bit, setting it low again (“clearing” it) in line 50, and delaying again.
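The separate SET and CLR registers are worth a second look. As I understand the RP2040 datasheet, writing a mask to GPIO_OUT_SET or GPIO_OUT_CLR changes only the bits that are 1 in the mask, so there is no need to read the output register, modify it and write it back. Here is a host-side model of that behaviour. The model_* names are made up, and nothing here touches real hardware:

```c
#include <stdint.h>

#define LED 25

/* Stand-in for the GPIO output latch; on the chip this lives in the SIO block. */
static uint32_t gpio_out_model;

/* Model of a write to GPIO_OUT_SET: bits that are 1 in mask become 1. */
void model_out_set(uint32_t mask) { gpio_out_model |= mask; }

/* Model of a write to GPIO_OUT_CLR: bits that are 1 in mask become 0. */
void model_out_clr(uint32_t mask) { gpio_out_model &= ~mask; }

/* Turn on an unrelated pin, blink the LED once, and return what is left. */
uint32_t model_demo(void)
{
    gpio_out_model = 0;
    model_out_set(1ul << 3);    /* some other pin */
    model_out_set(1ul << LED);  /* LED on */
    model_out_clr(1ul << LED);  /* LED off */
    return gpio_out_model;      /* the other pin is untouched */
}
```

The point is that toggling the LED never disturbs the other pins, which is what makes these registers handy.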

Not as bad as you thought it might be, huh?

Now comes the “here be dragons” bits, which seem to be more a matter of luck than judgement to get working. Let’s look at the linker file first, linker.ld:

     1	/* source:
     2	 * https://github.com/rp-rs/pico-blink-rs/blob/develop/memory.x
     3	 */
     4	
     5	/*
     6	ENTRY(reset_handler)
     7	*/
     8	
     9	MEMORY
    10	{
    11		/* NOTE 1 K = 1 KiBi = 1024 bytes */
    12		/* To suit Raspberry Pi RP2040 SoC */
    13		BOOT2 : ORIGIN = 0x10000000, LENGTH = 0x100 
    14		FLASH : ORIGIN = 0x10000100, LENGTH = 2048K  - 0x100
    15		/* FLASH : ORIGIN = 0x10000000, LENGTH = 2048K */
    16	
    17		RAM : ORIGIN = 0x20000000, LENGTH = 264K
    18	}
    19	
    20	SECTIONS {
    21	
    22		
    23		.boot2 :
    24		{
    25			__boot2_start__ = .;
    26			*(.boot2*);
    27			__boot2_end__ = .;
    28		} >BOOT2
    29		ASSERT(__boot2_end__ - __boot2_start__ == 256, 
    30			"ERROR: Pico second stage bootloader must be 256 bytes in size")
    31	
    32		.text :
    33		{
    34			/*
    35			__boot2_start__ = .;
    36			*(.boot2*);
    37			__boot2_end__ = .;
    38			*/
    39	
    40			*(.vectors*)
    41			. = ALIGN(4); 
    42			*(.text*)
    43			. = ALIGN(4);
    44		} >FLASH
    45		/*
    46		ASSERT(__boot2_end__ - __boot2_start__ == 256, 
    47			"ERROR: Pico second stage bootloader must be 256 bytes in size")
    48			*/
    49			
    50	
    51		.userstack :
    52		{
    53			. = ALIGN(4);
    54			. = . + 0x0400; /* minimum stack size */
    55			. = ALIGN(4);
    56			__StackTop = .;
    57		} > RAM
    58	
    59	
    60	
    61	}
    62	
    63	

Clearly I could have tidied up the script somewhat, but let’s not worry about that right now.

Lines 9-18 tell the linker how we want our code laid out in memory. I’ve found that you don’t necessarily get what you want when it comes to mcus. Their bootloaders often move bits of code around to different addresses, which can be a little confusing.

Talking of bootloaders … the RP2040 does things slightly differently from most mcus, if I’ve understood correctly. I think it has a first-stage bootloader that is burned into the chip, and which you can’t overwrite. The advantage of this is that if you press down the BOOTSEL and RESET pins, the chip will be reset to its fresh state. You can’t, therefore, “brick” the chip like you can with an STM32, and then have to faff around with the bootloader pins in order to render it programmable.

Which brings us onto line 13: “BOOT2”. This is a second-stage bootloader that you can alter programmatically. It is at address 0x10000000, and is 256 bytes long (hex 0x100). I’m not really sure what the exact value of this is. I think it enables different vendors like Adafruit to write their own bootloaders. Second-stage bootloaders might not be compatible between vendors/chips, but I’m rather hazy on the details. I also read somewhere that the bootloaders are checksummed, too, so good luck figuring all that out.

It possibly doesn’t matter what the bootloader is, so long as you’ve got one that works. But again, I’m rather in the dark as to what’s really going on. How do we know what the bootloader should be? I’ll answer that later.

You can see in lines 23-28 that we have a special section for the bootloader, which we force to the fixed starting address. There is also a check, in lines 29-30, that the bootloader is 256 bytes long.
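On the checksum: as far as I can tell from the Pico SDK’s build scripts, the last four bytes of the 256-byte second stage hold a CRC-32 of the first 252 bytes, using the MPEG-2 flavour (polynomial 0x04C11DB7, initial value 0xFFFFFFFF, no bit reflection, no final XOR). Treat that as my assumption rather than gospel. A minimal sketch of such a check, with function names of my own invention:

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32/MPEG-2: poly 0x04C11DB7, init 0xFFFFFFFF, MSB first. */
uint32_t crc32_mpeg2(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;

    for (size_t i = 0; i < len; i++) {
        crc ^= (uint32_t)data[i] << 24;
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x80000000u)
                crc = (crc << 1) ^ 0x04C11DB7u;
            else
                crc = crc << 1;
        }
    }
    return crc;
}

/* Assumed layout: 252 bytes of code/padding, then the CRC as a
 * little-endian 32-bit word. Returns 1 if the stored CRC matches. */
int boot2_checksum_ok(const uint8_t boot2[256])
{
    uint32_t stored = (uint32_t)boot2[252]
                    | ((uint32_t)boot2[253] << 8)
                    | ((uint32_t)boot2[254] << 16)
                    | ((uint32_t)boot2[255] << 24);

    return crc32_mpeg2(boot2, 252) == stored;
}
```

If the assumption holds, a second stage whose final word doesn’t match would be rejected by the chip, so this is one thing worth checking when a flashed image refuses to start.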

Next in memory, we have a FLASH section, which is 2M long, less 256 for the bootloader.

Lines 32-44 lay out what we’d call our “regular” code in flash.

After the bootloader, we want “vectors”, starting at address 0x10000100. The first thing in vectors is, if I’ve understood correctly, a “top of stack” for the bootloader. After that come the ISRs (Interrupt Service Routines), which is a whole stack of pointers to functions for such things as timer interrupts, GPIO pin change interrupts, and all the rest of them.

In this project, I have ignored all of the interrupts. All of the interrupts except one: the reset_handler. The reset handler is the address of the function to call when the chip is reset/first powered on. The address is the second item in the vector table. It is crucial for our purposes, because it’s how we get to execute main(). Very important!

Lines 51-57 talk about RAM: how memory should be laid out in RAM. One thing it describes is the top of the stack, which is crucial for the mcu to be able to call functions. I’m not too happy with the way I’ve laid it out, simple as it is. I suspect there may be problems.
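It helps me to do the arithmetic on the numbers in the linker script. The sketch below just restates the MEMORY block and the .userstack section as C constants. The macro and function names are mine, and I’m assuming the ALIGN(4) statements don’t move anything, which should be true since RAM starts on an aligned address:

```c
#include <stdint.h>

#define KIB          1024u

#define BOOT2_ORIGIN 0x10000000u
#define BOOT2_LENGTH 0x100u
#define FLASH_ORIGIN (BOOT2_ORIGIN + BOOT2_LENGTH)
#define FLASH_LENGTH (2048u * KIB - BOOT2_LENGTH)

#define RAM_ORIGIN   0x20000000u
#define RAM_LENGTH   (264u * KIB)
#define STACK_SIZE   0x0400u   /* ". = . + 0x0400" in .userstack */

/* First address past the end of flash. */
uint32_t flash_end(void) { return FLASH_ORIGIN + FLASH_LENGTH; }

/* First address past the end of RAM. */
uint32_t ram_top(void) { return RAM_ORIGIN + RAM_LENGTH; }

/* Where the script would place __StackTop: 1 KiB above the start of RAM. */
uint32_t stack_top_from_script(void) { return RAM_ORIGIN + STACK_SIZE; }
```

So flash for our code runs from 0x10000100 up to 0x10200000, RAM ends at 0x20042000, and the script’s __StackTop works out at 0x20000400, which is nowhere near the top of RAM. That gap is part of why I’m uneasy about the stack layout.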

If you’ve looked at RAM layouts in linkers for other mcus, you will have noticed that they can be rather complicated. There are all sorts of sections for zero’d data, non-zero’d data, heap allocation space, stuff for C++ class construction and destruction, and really confusing stuff relating to stack tracing for C++ exceptions.

The Pico is not inherently easier in this respect. It’s just that we don’t need all that right now. It would only confuse the issue. Our program doesn’t use heap, for example, so we don’t have to write any malloc/free stuff. It just uses static allocation in memory, plus a bit of stack usage, which the mcu will handle for us anyway.

So now you’re probably wondering, “but what exactly does the bootloader look like, and how is the reset handler defined?” For that, we’re going to write a bit of assembly, crt0.s:

    1	/* Inspired from
     2	https://smist08.wordpress.com/2021/04/16/assembly-language-on-the-raspberry-pi-pico/
     3	*/
     4	
     5	.syntax unified
     6	.cpu cortex-m0plus
     7	.thumb
     8	/*
     9	@ .syntax unified
    10	 .fpu softvfp 
    11	@ .thumb
    12	*/
    13	
    14	
    15	.section .boot2, "ax"
    16	.word 0x4b32b500, 0x60582021, 0x21026898, 0x60984388
    17	.word 0x611860d8, 0x4b2e6158, 0x60992100, 0x61592102
    18	.word 0x22f02101, 0x492b5099, 0x21016019, 0x20356099
    19	.word 0xf844f000, 0x42902202, 0x2106d014, 0xf0006619
    20	.word 0x6e19f834, 0x66192101, 0x66182000, 0xf000661a
    21	.word 0x6e19f82c, 0x6e196e19, 0xf0002005, 0x2101f82f
    22	.word 0xd1f94208, 0x60992100, 0x6019491b, 0x60592100
    23	.word 0x481b491a, 0x21016001, 0x21eb6099, 0x21a06619
    24	.word 0xf0006619, 0x2100f812, 0x49166099, 0x60014814
    25	.word 0x60992101, 0x2800bc01, 0x4700d000, 0x49134812
    26	.word 0xc8036008, 0x8808f380, 0xb5034708, 0x20046a99
    27	.word 0xd0fb4201, 0x42012001, 0xbd03d1f8, 0x6618b502
    28	.word 0xf7ff6618, 0x6e18fff2, 0xbd026e18, 0x40020000
    29	.word 0x18000000, 0x00070000, 0x005f0300, 0x00002221
    30	.word 0x180000f4, 0xa0002022, 0x10000100, 0xe000ed08
    31	.word 0x00000000, 0x00000000, 0x00000000, 0x7a4eb274
    32	
    33	
    34	
    35	
    36	.section .vectors, "ax"
    37	.align 2 
    38	
    39	.global __vectors
    40	__vectors:
    41	/* 	.word __StackTop  */
    42		.word 0x20042000 
    43	.word _reset_handler
    44	
    45	
    46	
    47	
    48	
    49	
    50	.section .text
    51	.type _reset_handler,%function /* vital for getting the correct offset */
    52	.thumb_func
    53	_reset_handler:
    54		@ mov r0, r0 @ just for testing purposes
    55		bl main
    56	
    57	
    58	/*
    59	.thumb_func
    60	.global main_asm
    61	.align 4
    62	main_asm:
    63	BL main  
    64	*/
    65	
    66	.data
    67	.align 4

I like to keep the assembly to the minimum, as I’m not very good at it.

Lines 1-14 contain a bit of blah-blah, telling the assembler that we’re building for an ARM Cortex M0+ (because that’s what the RP2040 is), and need to use “thumb” assembly.

Lines 15-31 are our bootloader! 256 bytes. Where did I get it? I basically wrote a project elsewhere that dumped out 256 bytes starting from 0x10000000, which is where the second stage lives. If you look at the disassembled code from a working project, you’ll see that the hex codes are the same as the ones I’ve shown in those lines.

Lines 36-43 give you the vector table that we talked about. The first vector is the top of the stack we want to declare, and the second one is a pointer to the all-important reset handler. There ought to be a bunch of other handlers after that, too, but that would be messy, and we don’t need them right now. Maybe a lesson for another day.

Lines 50-55 contain our reset handler.

Line 50 tells us that the handler should go in the “text” section of memory.

Line 51 declares a “function”. This is necessary in order to get the assembler to treat the label as a Thumb function, or else the pointer in the vector table won’t have the value the processor expects. Which would be bad.

Line 53 declares the handler address. What does it do? Well, as you can see in line 55, it performs a call to main. Our main!

And that’s how the mcu boots into our main function.

It’s very simple in our case, although in general, it’s much more complicated.

What the reset handler usually does, prior to calling main, is zero out memory that should be zeroed out, and any other “preamble” that you’d generally like to do before calling main(). You could probably do all (or nearly all) of the preamble in the main() function itself, but sometimes it’s nice to do “standard” stuff before calling main().
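For reference, the usual preamble boils down to two loops: copy the initialised data from flash to RAM, and zero the uninitialised data. My project does neither. Below is a rough sketch of the idea in C, with invented function names. On a real target the pointers would come from symbols defined in the linker script, whereas here they are ordinary arrays so that it runs on a PC:

```c
#include <stdint.h>

/* Copy initialised data from its load address (flash) to where it runs (RAM). */
void copy_data(const uint32_t *src, uint32_t *dst, const uint32_t *dst_end)
{
    while (dst < dst_end)
        *dst++ = *src++;
}

/* Zero the region holding variables that must start out as 0. */
void zero_bss(uint32_t *start, const uint32_t *end)
{
    while (start < end)
        *start++ = 0;
}

/* Host-side demonstration using arrays in place of linker-defined regions. */
uint32_t startup_demo(void)
{
    static const uint32_t flash_image[3] = {11, 22, 33};
    uint32_t data[3] = {0, 0, 0};
    uint32_t bss[2] = {0xdeadbeefu, 0xdeadbeefu};

    copy_data(flash_image, data, data + 3);
    zero_bss(bss, bss + 2);

    return data[0] + data[1] + data[2] + bss[0] + bss[1];
}
```

A blinky with no global variables gets away without this, but anything relying on initialised or zeroed globals would not.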

As you can see, we’ve actually done nothing, and just cut straight to the chase by calling main(). For different processors you might do a bunch of processor-specific stuff. The SDK for the RP2040 seems to do quite a lot involving setting up which core to use. Another mcu would probably have only one core, making such a thing unnecessary. On ARM A-class chips, there are different execution levels, which can be set up.

It is possible that I am omitting crucial setup steps in the reset handler. I’m also worried about how I’ve set up the stack, and what’s really happening with the bootloader. Feel free to comment.

We now need to assemble all these pieces together to make a binary file that we can put on our mcu. Here’s the Makefile:

    1	AS = arm-none-eabi-as
     2	CC = arm-none-eabi-gcc
     3	CFLAGS = -mthumb -mcpu=cortex-m0plus -nostdlib -ggdb
     4	LD = arm-none-eabi-ld
     5	BIN = arm-none-eabi-objcopy
     6	LDFLAGS = -T linker.ld
     7	
     8	OBJS = crt0.o main.o
     9	
    10	
    11	app.bin : app.elf 
    12		arm-none-eabi-objcopy -O binary app.elf app.bin
    13		arm-none-eabi-objdump -d app.elf >app.list
    14	
    15	app.elf : $(OBJS) linker.ld
    16		$(LD) $(LDFLAGS) -o $@ $(OBJS)
    17		#$(LD) -o $@ $(OBJS) $(LDFLAGS)
    18	
    19	%.o : %.c
    20		$(CC) $(CFLAGS) -c -o $@ $^
    21	
    22	%.o : %.s
    23		$(AS) -g -o $@ $<
    24	
    25	clean :
    26		rm -f *.o app.elf app.list app.bin app.uf2
    27	
    28	flash : app.uf2
    29		cp app.uf2 /media/$(USER)/RPI-RP2
    30	
    31	app.uf2 : app.elf app.bin
    32		$(PICO_SDK_PATH)/build/elf2uf2/elf2uf2 app.elf app.uf2

As you can see, it’s waaay simpler than the onion-skinned cmake files of the SDK.

Lines 1-6 set variables so that we compile with the GCC ARM cross-compiler.

Line 3 sets the flags for the compiler, telling it to use thumb, compile for a Cortex M0+ architecture, and include debugging information. Notice also the “-nostdlib” option, meaning that we don’t link against any library. We could never use a standard library that comes with the OS, because that library would be for Linux, which we obviously wouldn’t have on the mcu. It is possible to obtain C libraries for use on mcus, like newlib, but that’s a topic for another day.

Line 8 contains the files we want to compile: just two files, the assembly file, and main.c.

Lines 11-23 contain a bunch of rules for creating the elf and bin file.

Line 13 creates an “object dump” of the elf file (called app.list). You should become familiar with these dumps. When you’re developing from scratch, you can often compare them with known good projects to see if you’ve made mistakes in the layout of memory.

The first few lines of my disassembled file look like this:

app.elf:     file format elf32-littlearm


Disassembly of section .boot2:

10000000 <__boot2_start__>:
10000000:       4b32b500        .word   0x4b32b500
10000004:       60582021        .word   0x60582021
10000008:       21026898        .word   0x21026898
1000000c:       60984388        .word   0x60984388
10000010:       611860d8        .word   0x611860d8
10000014:       4b2e6158        .word   0x4b2e6158
10000018:       60992100        .word   0x60992100
1000001c:       61592102        .word   0x61592102
10000020:       22f02101        .word   0x22f02101
...

Recognise that? That’s the bootloader. Further down:

Disassembly of section .text:

10000100 <__vectors>:
10000100:       20042000        .word   0x20042000
10000104:       10000109        .word   0x10000109

10000108 <_reset_handler>:
10000108:       f000 f820       bl      1000014c <main>

1000010c <delay>:
1000010c:       b580            push    {r7, lr}
1000010e:       b084            sub     sp, #16
10000110:       af00            add     r7, sp, #0
10000112:       6078            str     r0, [r7, #4]
...

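Oh look, our interrupt vectors begin at location 0x10000100. The second line contains the address 0x10000109, which is 0x10000108 but out-by-one. This is a feature of how the toolchain should compute the offsets: on Cortex-M, the lowest bit of a function pointer is set to mark the target as Thumb code, and the handler itself still lives at 0x10000108.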
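If you want to check a vector table entry by hand, the decoding is a one-liner each way. This is a small sketch with helper names I’ve made up:

```c
#include <stdint.h>

/* A vector table entry with bit 0 set points at Thumb code. */
int is_thumb_entry(uint32_t vector)
{
    return (int)(vector & 1u);
}

/* The instruction address is the entry with bit 0 cleared. */
uint32_t entry_code_address(uint32_t vector)
{
    return vector & ~UINT32_C(1);
}
```

Feeding in 0x10000109 gives a Thumb entry whose code sits at 0x10000108, which matches the address of _reset_handler in the dump.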

Notice that the first line of the reset handler is effectively bl main. Hey, this is what we told the compiler to do. Further down you will see:

1000014c <main>:
1000014c:       b580            push    {r7, lr}
1000014e:       af00            add     r7, sp, #0
10000150:       4b0b            ldr     r3, [pc, #44]   ; (10000180 <main+0x34>)
10000152:       2205            movs    r2, #5
10000154:       601a            str     r2, [r3, #0]
10000156:       4b0b            ldr     r3, [pc, #44]   ; (10000184 <main+0x38>)
...

This is our main routine.

That’s mostly a condensed version of the object file, anyway. The generated code may be different for you, depending on whether you’ve enabled debugging, optimisations, and so on. So don’t be too concerned if the output is a little different.

Lines 25-26 just do some project cleanup.

Lines 31-32 ostensibly create a uf2 file from the elf file. If you’ve compiled the Pico SDK, then this tool would have been created for you, so you can use it. I’m assuming that you’re using Linux.

Lines 28-29 allow you to flash the uf2 to the mcu itself, assuming you’ve done the BOOTSEL dance. Again, I’m assuming you’re using Linux, Debian Stable.

Now, I used the word “ostensibly” previously. This is because the generated uf2 file seems to have something wrong with it. I don’t know why that is. If you can help, then feel free to comment. There could be any number of problems with what’s produced.

So, unfortunately, it is not possible (yet) to just flash the mcu with the uf2 file and have it work. But it is possible to run the code in a debugger. That does work. Maybe that’s some kind of clue to a reader as to what crucial thing I’m doing wrong.

In a separate shell, run start-debug:

#!/usr/bin/env bash

sudo openocd -f interface/picoprobe.cfg -f target/rp2040.cfg -s tcl

My setup is that I have a Pico running as a debugger for another Pico. That’s probably the best way of doing it. I’ve heard of other strategies, but I haven’t been able to get the alternatives to work.

In a different shell, start the debugger by typing

gdb-multiarch

There’s a neat feature of the debugger in that it executes the script .gdbinit if it finds it. This is very convenient, as initialising the debugger can be tedious. Here is the script I use:

# quit without confirmation
define hook-quit
    set confirm off 
end

file app.elf
target remote localhost:3333
load
monitor reset init
#b main.c:64
echo Type c to continue...\n

GDB will load all that. To actually run the elf file and see the mcu blinking, just type c at the command prompt.

All of the above-mentioned files are available in the following directory on my git repo:

https://github.com/blippy/rpi/tree/master/pico/bare-blink

You might just as well clone the whole repo, though, navigate to the relevant directory, and type

make

I hope this has been useful to you. Please let me know of any fixes I need to make in order to get uf2 working. I feel it must be close (??)


Kicking the tyres of RIOT OS

RIOT is an RTOS that bills itself as “the friendly Operating System for the Internet of Things.”

There seems to be a fair amount of board support, including STM32 blue and black pills, ESP32, and even the Arduino Nano/Uno. Unsupported are the Raspberry Pis and the Pi Pico. A list of supported boards is available here.

So, I cloned their repo. Then

cd RIOT/examples

I copied one of their examples to the same directory, which I called my_project. Then

cd my_project

I edited main.c so that it looked like this

//Include Libraries
#include <stdio.h>
#include "thread.h"        // Use for Create the Thread
#include "xtimer.h"        // Use for create the delay
#include "periph/gpio.h"    // Use for GPIO operations

// Arduino-Nano On-Board LED connected with "A5" which mapped with atmega328p pin "PC5"
//#define EXTERNAL_LED GPIO_PIN(2,5)
#define EXTERNAL_LED GPIO_PIN(PORT_C, 13)


char task1_stack[THREAD_STACKSIZE_MAIN];
/**
 *
 */

void *Task1(void *arg)
{
	(void) arg;
	printf("Create Task1 Thread for Blink the External LED from the %s board.\n", RIOT_BOARD);
	gpio_init (EXTERNAL_LED, GPIO_OUT);
	while(1){
		puts("Task1\r\n");
		gpio_write (EXTERNAL_LED, 1);
		xtimer_msleep(500);
		gpio_write (EXTERNAL_LED, 0);
		xtimer_msleep(100);
	}
	return NULL;
}

/**
 *
 */
int main(void)
{
	xtimer_init();
#if 1
	printf("You are running RIOT on a(n) %s board.\n", RIOT_BOARD);
	printf("This board features a(n) %s MCU.\n", RIOT_MCU);
#endif

	thread_create(task1_stack,                         /* stack array pointer */
			sizeof(task1_stack),                            /* stack size */
			THREAD_PRIORITY_MAIN - 1,            /* thread priority */
			THREAD_CREATE_WOUT_YIELD | THREAD_CREATE_STACKTEST,        /* thread configuration flag, usually By default, the thread starts immediately */
			Task1,                                                 /* thread handler function */
			NULL,                                                 /* argument of thread_handler function */
			"task1"                                                /* thread_name */
		     );

	while(1){
		puts("--> I'm Main \n");
		xtimer_msleep(1000);
		//xtimer_msleep(200);
		//delay(100);
	}
	return 0;
}

As you can see, I created a task, set up the LED, and used xtimer_msleep() to sleep the task. That’s possibly a bit naughty inside a task, but I wanted to test the sleep functionality.

I wanted to test for the blue pill. I also needed to enable the xtimer module. So in the Makefile I did:

...
BOARD = bluepill
...
USEMODULE += xtimer

Then I built the project using make.

The resulting hex file was placed in bin/bluepill, which I flashed to the chip, and hey presto, I got some blinky lights.

I didn’t get the Uno version working because I hadn’t figured out the USEMODULE trick of enabling the xtimer. Also, creating the thread meant that the project didn’t link. I think the thread uses too much memory. Perhaps the program’s default stack size was too big, and with some trimming it could be made to work.

The repo is of a chunky size, as you might expect from an OS that supports many systems. The standard Arduino API also seems supported, at least for the Arduino boards.

The bluepill hex file created by RIOT was 14192 bytes in size. This is way smaller than a similar file created by ARM mbed. The mbed file was about 44k in size, if I recall, and that was without threading. So you’re going to have to be careful here. mbed could easily max out your MCU flash if you do stuff that is more complicated.

So the advantages I see for RIOT over mbed are smaller binary size, faster compile times, and support for non-ARM chips.

Actually, one thing that disappointed me about libopencm3 is that the API is not standard across boards. The GPIO functions are not compatible between the blue and black pill. RIOT seems to offer a consistent API across chipsets. (Alas, I seem to have fried my black pill after only a short amount of use. I’m not sure what I did wrong).

I am new to RTOSs, but my understanding is that FreeRTOS is mainly aimed at providing an RTOS scheduler, and virtually nothing else. That may be fine if that’s all you need, but the chances are that you’ll want extra goodies like consistent GPIO and Timer APIs, and suchlike.

I don’t know how RIOT compares with Zephyr. They both seem to cover the same space. Zephyr is getting a lot of buzz right now, so I can give no sage advice as to which horse to back. There are so many RTOSs to choose from, it seems to be mostly a case of picking something, and seeing how it takes you. There seems to be some interest in providing support for the RP2040 in Zephyr, whereas I’m not aware of any such effort for RIOT.


#rt-thread not ready for primetime

Hmm. There was some exciting news announced that RT-Thread now compiles for the Raspberry Pi Pico. I had never used RT-Thread before, but I was intrigued by the prospect of a real-time cross-platform interface for the Pico.

So I downloaded rt-thread from github, which was a monster download. I went to the Pico directory, and managed to compile and install a blinky sketch. Success!

I was excited to see that it had support for the STM32 blue pill. I tried compiling that. Failure! It gave a warning about “redefinition of ‘union sigval’” in signal.h. So there seems to be a conflict with the newlib library.

Then I tried compiling for the f411 nucleo board. Same problem. That is after I changed the hard-coded EXEC_PATH in rtconfig.py from r'C:\Users\XXYYZZ' to r'/usr/bin'.

Hmmm, lot of hard-coding there.

How about for the Raspberry Pi 2? This time I had to change

EXEC_PATH = r'/opt/gcc-arm-none-eabi-5_4-2016q3/bin'

to

EXEC_PATH = r'/usr/bin'

So much hard-coding! I also had the same compilation errors.

RT-Thread also has an “ENV” project. I downloaded that. It appeared to be some kind of configuration environment. It wasn’t clear what it was supposed to do or how it was supposed to work. They did have a bundled-up tag for it, so I downloaded that. That turned out to be a monster 90MB in size. When I unpacked it, it contained their IDE which was for Windows only.

According to their website, the Env scripts are for Linux/MacOS. And?

Their homepage is pretty slick, and it certainly looks encouraging, but the actual engineering of the project itself is a bit “ish”. This is despite the fact that the project was started in 2006.

Upshot: I can’t be bothered wading through all the minute details to get this thing running. If you’re looking for an RTOS for your MCU, then start somewhere else.

The only silver lining in all this is that I’ve decided to revisit my “crunky” library OS for my Raspberry Pi 0. Although hardly perfect it does do a wide range of stuff like UART, SPI, I2C, SD Cards (mostly), and even display output. I doubt I will ever get USB working, though. So I was a little disillusioned with it, but now I’m thinking of dusting it off again and seeing how it compares with microcontrollers.

I want to do a lot of audio work, maybe abandon the idea of using it as a kind of “retrocomputer”, and get some serious DSP in. So, meh, I’ll see how it goes.

I just heard about RTEMS, which seems to have support for Raspberry Pi, and a couple of STM32’s. No RP2040, though. Might be worth a little sniff later, though.


Ramblings on developing stm32 vs raspberry pi pico #rp2040

There have been lots of reviews on the capabilities of various uCs (microcontrollers), comparing them with their price, and so on. Most people are further along in their journey in understanding, so this post is written from the perspective of someone who is feeling their way around. I’d class myself as an intermediate uC programmer: someone who has climbed at least a few rungs on the microcontroller ladder, but I’d probably look hopelessly naive compared to the grizzled war-dogs.

The purpose of this article is not to weigh the pros, cons and nuances of blue pills vs black pills, Arduino this vs Arduino that, etc., but to try to give a sense of what it has felt like to be a developer for various uCs. I’ve had exposure to: Arduino Uno/Nano, ATtiny85, stm8s, stm32 blue pill, ESP8266, ESP32 and Raspberry Pi rp2040 pico. I’ve had a go at bare metal programming for the Raspberry Pi. My prompt for this blog post is my recent purchase of an stm32f411, and whether I want to side with the Pico, or the f411.

ARM for the win. I’m thinking of standardising on a processor architecture, rather than continually flitting about. The great thing about ARM is that it’s like C/C++: it’s absolutely ubiquitous, there are standard toolchains, and it’s guaranteed to work. You just can’t go wrong.

I can still see a place for the ATtiny85 when I want to make something small. I have a couple of projects involving Nanos, which I will keep going. One thing I’d say, though, is that we seem to be shifting towards 3V chips, which makes the 5V of the Nano less interesting. The Arduino Uno/Nano is still the best introduction to uCs though.

IDEs suck. The Arduino IDE is the best out there, but that’s not saying much. It’s better than the abomination that is STM32CubeIDE. What is that thing? Something based on Java Eclipse? Whatever, it’s as slow AF. And I can’t make head nor tail of the graphical configuration tool. It’s like staring at the controls in an airplane cockpit. Everything is there, assuming I know where to look. Which I don’t.

Actually, programming tools is the reason I’ve decided to skip other architectures. PIC processors seem intriguing, but I don’t want to have to install somebody’s bloated proprietary IDE just to compile a file.

My idea of a programming environment is a Makefile, vim, and a compiler that I can install from Debian. That’s the great thing about ARM. I can be sure I can get a well-supported compiler from my distro. I can use the AVR compiler and library for my Atmel chips if I have to. I liked the stm8 as a chip, but the free compiler, sdcc, has some of its own ideas about how things should be done. I’d rather it do standard things in standard ways.

Libopencm3 looks good. So, having decided that stm32’s look a good bet, we now move onto the question of what library – if any – we should use. There’s a bewildering array of choices: CMSIS, SPL, HAL, mbed, and a variety of real-time operating systems to choose from.

SPL seems deprecated. HAL just seems too difficult, too bloated, and a cure that’s worse than the disease. mbed seems a bit of a mish-mash, a skyscraper with foundations of shifting sand. Why does it keep saying that features are deprecated?

Having become disillusioned with these tools, I decided to write some code using bare registers and a datasheet. Writing out the data registers manually was perhaps not the greatest idea ever, although it did teach me a lot. A better way would have been to try to figure out how to integrate CMSIS, as it is basically a big library of register definitions.

So I think that if you really want to learn about a (ARM) microcontroller, then your best approach might be to create a blinky light program using no external libraries, and just register definitions. Once you’ve done that, then you can say “hey, I know how basic low-level register manipulation works”, and then try the same exercise again, this time integrating and using CMSIS.

You might like to try creating some SPI libraries. Have a go – it’s perhaps a little more difficult than it looks. It’s tricky to get going from the datasheets, although I did manage with a lot of help from Google.

Which brings me to libopencm3. It’s a nice little abstraction library. It seems small, unlike HAL. There are no git submodules or any fancy tricks to compiling it. It just uses a Makefile. Imagine that! An honest to goodness Makefile. I don’t have to download a multi-hundred megabyte monstrosity. It compiles all its supported architectures in a oner. It doesn’t take long, either, so I’m happy with the general setup.

It could certainly use more examples and documentation. It does document the whole of the library using doxygen, though, which is useful to browse through. You can even view the source files themselves from the documentation. You can see how the SPI functions are implemented, for example.

The code looks quite clean, too. There’s not layers upon layers upon layers of abstraction. So if you wanted to write your own implementation of SPI, for example, then libopencm3 looks a good bet. But you could just use libopencm3, and have reasonable assurance that it’s at least as good as something you could implement yourself.

So although I am new to libopencm3, I’m finding that it fits in with how my brain works. I want to dig deeper.

Don’t forget the Pico. The Pico doesn’t have any CMSIS definitions, nor support from libopencm3. They have their own SDK. There is some set-up effort required, but it is well-documented. In fact, the whole of their SDK is well-documented. This is why the Pico is eminently worthy of investigation. The Pi Foundation have thought things through, and I’ve made a lot of progress in understanding how the thing works. Their SDK is good, and I don’t really see any reason not to use it. Projects need to be built using cmake, which some might object to. However, I consider it a minor bump considering what else I get for my money.

Conclusion. My current thinking is to operate a dual strategy, switching between the Pico and the STM32 as the mood takes me; favouring libopencm3 for the latter development. I’ll occasionally dip into the Atmel chips as conditions warrant, but I’ll consider it a side issue in favour of the ARM chips. The black pill that I recently purchased has DAC and DSP capabilities, and I’m keen to explore that functionality for some audio work that I’m interested in.

Just my 2 cents. Have fun, and stay safe.


Sleepico: a pleasant noise-generator for the #raspberrypi #rp2040

On this blog I’ve described a few “sleep” noise-generators in the past. White noise is too harsh, and needs to be “softened” by attenuating the higher frequencies. I had described pseudo Brown noise generators in a previous post using an RC (resistor-capacitor) low-pass filter. RC filters do not produce “true” Brownian noise, but they are eminently sufficient for their intended purpose.

Brown noise reduces output at all frequencies, whereas an RC filter is a low-pass filter.

RC filters are usually implemented electronically. Ostensibly they use just a resistor and a capacitor, but I think things are more complicated than that. They assume that the output from the RC circuit has high impedance, i.e. that there is a large resistance in the output circuit to ground.

But this is not necessarily the case. You can’t just attach a speaker at the output of the circuit, because speakers generally have a low resistance. You therefore won’t get the output you expect. You could put an opamp configured as a buffer between the RC circuit and the speaker. This introduces extra problems, because with something like a 741 opamp you need to supply a negative voltage owing to its limitations.

What a palaver!

An alternative is to use a uC (microcontroller) to process signals digitally, and then just spit out the result. No electronics (although of course you’ll still need a speaker), just code.

Arduino Unos are unlikely to have enough grunt to handle the processing requirements. Fortunately, I found the Raspberry Pi Pico to be up to the job, at least for use as an RC filter.

The differential equation to determine the voltage across a capacitor is given by:

vc(t+dt) = vc(t) + 2*pi*fc*dt*(va(t)-vc(t))

where vc is the voltage output, fc is the cut-off frequency (a parameter you choose), dt is the time interval, t is time and va is the input voltage.

va can vary over time, if you like. In my implementation I use a random number generator to generate white noise for va.

Normally, one would choose va to be uniformly distributed over some range. What I did in my code is to make it either fully on (value of 1), or fully off (value of 0). I calculate vc from that, and set the pin high if vc>=0.5, low otherwise.
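To show how little code this needs, here is a minimal sketch of the update step and the thresholding in plain C. It is not the code running on my Pico: the xorshift32 generator is just a stand-in noise source, and the names and any values of fc and dt you pass in are assumptions for illustration. For the simple update to behave, 2*pi*fc*dt should be well below 1.

```c
#include <stdint.h>

#define RC_PI 3.14159265358979323846

/* One step of vc(t+dt) = vc(t) + 2*pi*fc*dt*(va(t) - vc(t)). */
double rc_step(double vc, double va, double fc, double dt)
{
    return vc + 2.0 * RC_PI * fc * dt * (va - vc);
}

/* Stand-in pseudo-random generator; the state must start non-zero. */
uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Produce the next output pin level (1 = high, 0 = low).
 * va is fully on or fully off, and the pin follows vc >= 0.5. */
int next_pin_level(double *vc, uint32_t *rng, double fc, double dt)
{
    double va = (xorshift32(rng) & 1u) ? 1.0 : 0.0;

    *vc = rc_step(*vc, va, fc, dt);
    return *vc >= 0.5;
}
```

You would call next_pin_level() once per sample period dt and write the result to the speaker pin. Starting vc at 0.5 avoids a long silent run-up.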

Connect a speaker to GP19. You can use an audio jack if you like, or just use a simple small speaker.

As an added bonus, GP20 acts as a switch between white noise and the filtered noise. Under normal mode of operation, the filtered noise is produced. Press down GP20 to hear the underlying white noise.

Sweet dreams.
