A working-esque non-SDK blinky sketch for the #raspberrypico

It’s always interesting to get as close to the metal as possible with mcus (microcontrollers), armed with little more than a makefile, compiler, and vim.

Health warnings apply: this is my best attempt to get a blinky sketch working for the Pico, and only that. There are likely to be subtle misconceptions in how I’ve understood things. I referred to the SDK extensively in order to get it to work. The other crucial document is the RP2040 Datasheet (PDF), which is where you find the register addresses.

The functioning part is contained in the C file, main.c:

     1	#include <stdint.h>
     2	
     3	/* The gpio functions are described in the file:
     4	 * src/rp2_common/hardware_gpio/gpio.c
     5	 * Some are inlined in gpio.h (e.g. gpio_set_dir())
     6	 */
     7	
     8	#define REG(addr) *(volatile uint32_t*)(addr)
     9	
    10	
    11	#define SIO_BASE 		0xd0000000 // see s2.3.1.7
    12	#define SIO_GPIO_OUT		REG(SIO_BASE+0x010) // GPIO output value
    13	#define SIO_GPIO_OUT_SET	REG(SIO_BASE+0x014) // GPIO output value set
    14	#define SIO_GPIO_OUT_CLR	REG(SIO_BASE+0x018) // GPIO output value clear
    15	#define SIO_GPIO_OE		REG(SIO_BASE+0x020) // GPIO output enable
    16	#define SIO_GPIO_OE_SET		REG(SIO_BASE+0x024) // GPIO output enable set
    17	#define SIO_GPIO_OE_CLR 	REG(SIO_BASE+0x028) // GPIO output enable clear
    18	
    19	#define IO_BANK0_BASE 		0x40014000
    20	#define IO_BANK0_GPIO25_CTRL 	REG(IO_BANK0_BASE+0x0cc)
    21	
    22	#define PADS_BANK0_BASE 	0x4001c000 // see s2.19.6.3. Pad control register
    23	#define PADS_BANK0_GPIO25	REG(PADS_BANK0_BASE+0x68)
    24	
    25	
    26	#define GPIO_FUNC_SIO	5
    27	
    28	
    29	#define LED 25
    30	
    31	void delay(int n) // no particular timing
    32	{
    33		for(int i =0 ; i< n; i++) {
    34			for(int j = 0; j< 10000; j++) {
    35				asm volatile ("nop");
    36			}
    37		}
    38	}
    39	
    40	
    41	
    42	int main()
    43	{
    44		IO_BANK0_GPIO25_CTRL = GPIO_FUNC_SIO; // init pin
    45		SIO_GPIO_OE_SET = 1ul << LED; // allow setting of output
    46	
    47		while(1) {
    48			SIO_GPIO_OUT_SET = 1ul << LED; 
    49			delay(100);
    50			SIO_GPIO_OUT_CLR = 1ul << LED; // turn off the LED
    51			delay(900);
    52		}
    53	
    54		return 0;
    55	}

Line 8 contains a little macro that we’ve defined for ourselves that performs a standard trick in the embedded community. It allows a register address to be treated like a variable that we can set and get values from. The “volatile” keyword tells the compiler that the value may change at any time, so every read and write in the source has to actually happen. It prevents the compiler from optimising the accesses out, which would probably cause the program not to run correctly. There is some controversy over its use in the C++ community, as the standards committee seem to have taken the view that “it probably doesn’t mean what you think it means.” It is likely to stay in the language, with some deprecation warnings, as “volatile” is used all over the place in embedded systems.
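If you want to play with the macro without a Pico attached, here is a minimal sketch that runs on a PC. The variable fake_register and the function poke_and_peek are made up for illustration; they stand in for a real hardware address. The macro also has an extra pair of parentheses around it, which is a defensive habit rather than something main.c needs.

```c
#include <stdint.h>

/* Same trick as REG() in main.c, with extra parentheses so that the
 * macro behaves as a single expression wherever it is used. */
#define REG(addr) (*(volatile uint32_t *)(addr))

/* Hypothetical stand-in for a hardware register, so this runs on a PC. */
static uint32_t fake_register;

/* Write a value "to the register" and read it back through the macro. */
uint32_t poke_and_peek(uint32_t value)
{
    REG(&fake_register) = value;   /* store: the compiler must emit it */
    return REG(&fake_register);    /* load: always re-read from memory */
}
```

On the real chip you would pass a numeric address such as SIO_BASE+0x014 instead of the address of a variable.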

On lines 11-26 I have set up a bunch of addresses that enable us to control the mcu. GPIO pins, along with all the other mcu peripherals, are set up by peeking and poking memory addresses.
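The offsets 0x0cc and 0x68 look like magic numbers, but they follow a pattern. As I read the datasheet, IO_BANK0 has a STATUS word followed by a CTRL word for each pin, and PADS_BANK0 has one word per pad after a single VOLTAGE_SELECT word. The helper names below are hypothetical; the sketch just reproduces the two addresses used in main.c.

```c
#include <stdint.h>

#define IO_BANK0_BASE   0x40014000u
#define PADS_BANK0_BASE 0x4001c000u

/* Each pin has a STATUS word then a CTRL word, so 8 bytes per pin;
 * the CTRL word is the second of the pair. */
uint32_t io_bank0_ctrl_addr(unsigned pin)
{
    return IO_BANK0_BASE + 8u * pin + 4u;
}

/* One word per pad, after a single VOLTAGE_SELECT word at offset 0. */
uint32_t pads_bank0_addr(unsigned pin)
{
    return PADS_BANK0_BASE + 4u + 4u * pin;
}
```

For pin 25 these give 0x400140cc and 0x4001c068, matching lines 20 and 23.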

Line 26 defines the value that means that a pin should be treated as just a regular GPIO pin, rather than having special usage for SPI, I2C, etc.

Line 29 defines the onboard LED, which is pin 25 (GPIO25).

Lines 31-38 define a delay function, so that the blinking is slow enough for the human eye to see. Line 35 has a “no-op” (no operation) assembler instruction to gobble up a little bit of time. Note that the “volatile” keyword has been used to prevent the compiler from optimising out the nop operation.

In line 44, we initialise the CTRL (control) register of GPIO25 to become a standard pin.

In line 45, we set the Output Enable bit for our LED.

It might also be necessary to set the “pads” (see lines 22-23) for some types of operations, but not for general IO.

In lines 47-52 we do our standard while loop, setting the pin high in line 48, waiting a bit, setting it low again (“clearing” it) in line 50, and delaying again.
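The nice thing about the separate SET and CLR registers is that you write a bit mask and only those bits change, so there is no read-modify-write of GPIO_OUT. Here is a small software model of that behaviour that runs on a PC. Everything prefixed model_ is hypothetical and exists only for this illustration; on the chip the hardware does this for you.

```c
#include <stdint.h>

#define LED 25

/* Hypothetical software model of the SIO GPIO_OUT register. */
static uint32_t model_gpio_out;

/* Model of a write to GPIO_OUT_SET: only the 1 bits in mask are raised. */
void model_out_set(uint32_t mask) { model_gpio_out |= mask; }

/* Model of a write to GPIO_OUT_CLR: only the 1 bits in mask are lowered. */
void model_out_clr(uint32_t mask) { model_gpio_out &= ~mask; }

/* Is the given pin currently driven high in the model? */
int model_pin_is_high(unsigned pin) { return (int)((model_gpio_out >> pin) & 1u); }
```

Setting and clearing the LED bit (1ul << LED, which is 0x02000000) leaves every other pin in the model untouched.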

Not as bad as you thought it might be, huh?

Now comes the “here be dragons” bits, which seem to be more a matter of luck than judgement to get working. Let’s look at the linker file first, linker.ld:

     1	/* source:
     2	 * https://github.com/rp-rs/pico-blink-rs/blob/develop/memory.x
     3	 */
     4	
     5	/*
     6	ENTRY(reset_handler)
     7	*/
     8	
     9	MEMORY
    10	{
    11		/* NOTE 1 K = 1 KiBi = 1024 bytes */
    12		/* To suit Raspberry Pi RP2040 SoC */
    13		BOOT2 : ORIGIN = 0x10000000, LENGTH = 0x100 
    14		FLASH : ORIGIN = 0x10000100, LENGTH = 2048K  - 0x100
    15		/* FLASH : ORIGIN = 0x10000000, LENGTH = 2048K */
    16	
    17		RAM : ORIGIN = 0x20000000, LENGTH = 264K
    18	}
    19	
    20	SECTIONS {
    21	
    22		
    23		.boot2 :
    24		{
    25			__boot2_start__ = .;
    26			*(.boot2*);
    27			__boot2_end__ = .;
    28		} >BOOT2
    29		ASSERT(__boot2_end__ - __boot2_start__ == 256, 
    30			"ERROR: Pico second stage bootloader must be 256 bytes in size")
    31	
    32		.text :
    33		{
    34			/*
    35			__boot2_start__ = .;
    36			*(.boot2*);
    37			__boot2_end__ = .;
    38			*/
    39	
    40			*(.vectors*)
    41			. = ALIGN(4); 
    42			*(.text*)
    43			. = ALIGN(4);
    44		} >FLASH
    45		/*
    46		ASSERT(__boot2_end__ - __boot2_start__ == 256, 
    47			"ERROR: Pico second stage bootloader must be 256 bytes in size")
    48			*/
    49			
    50	
    51		.userstack :
    52		{
    53			. = ALIGN(4);
    54			. = . + 0x0400; /* minimum stack size */
    55			. = ALIGN(4);
    56			__StackTop = .;
    57		} > RAM
    58	
    59	
    60	
    61	}
    62	
    63

Clearly I could have tidied up the script somewhat, but let’s not worry about that right now.

Lines 9-18 tell the linker how we want our code laid out in memory. I’ve found that you don’t necessarily get what you want when it comes to mcus. Their bootloaders often move bits of code around to different addresses, which can be a little confusing.

Talking of bootloaders … the RP2040 does things in a slightly different way from most mcus, if I’ve understood correctly. I think it has a first stage bootloader that is burned into the chip, and which you can’t overwrite. The advantage of this is that if you hold down BOOTSEL while resetting the chip, it comes back up in a state where it can be programmed afresh. You can’t, therefore, “brick” the chip like you can with an STM32, and then have to faff around with the bootloader pins in order to render it programmable.

Which brings us onto line 13: “BOOT2”. This is a second-stage bootloader that you can alter programmatically. It is at address 0x10000000, and is 256 bytes long (hex 0x100). I’m not really sure of the exact value of this. I think it enables different vendors like Adafruit to write their own bootloaders. Second-stage bootloaders might not be compatible between vendors/chips, but I’m rather hazy on the details. I also read somewhere that the bootloaders are checksummed, too, so good luck figuring all that out.
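On the checksum: as far as I can tell from the datasheet and the SDK’s padding script, it is a CRC-32 over the first 252 bytes of boot2, with the result stored in the last four bytes. The parameters below (polynomial 0x04c11db7, initial value 0xffffffff, no bit reflection, no final XOR) are an assumption on that basis, and I have not checked them against the dump used in this post. A minimal sketch, with a made-up function name:

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32, most significant bit first.
 * Assumed parameters: poly 0x04c11db7, init 0xffffffff, no final XOR. */
uint32_t crc32_msb_first(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xffffffffu;

    for (size_t i = 0; i < len; i++) {
        crc ^= (uint32_t)data[i] << 24;          /* feed next byte at the top */
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x80000000u)
                crc = (crc << 1) ^ 0x04c11db7u;  /* top bit set: apply poly */
            else
                crc <<= 1;
        }
    }
    return crc;
}
```

To try it against a boot2 image you would pass the first 252 bytes and compare the result with the final little-endian word.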

It possibly doesn’t matter what the bootloader is, so long as you have one that works. But again, I’m rather in the dark as to what’s really going on. How do we know what the bootloader should be? I’ll answer that later.

You can see in lines 23-28 that we have a special section for the bootloader, which we force to the fixed starting address. There is also a check, in lines 29-30, that the bootloader is 256 bytes long.

Next in memory, we have a FLASH section, which is 2M long, less 256 for the bootloader.
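The arithmetic behind line 14 of the linker script can be checked on a PC. This is only a sketch of the sums; the macro and function names are invented for the purpose.

```c
#include <stdint.h>

#define FLASH_BASE  0x10000000u
#define FLASH_SIZE  (2048u * 1024u)   /* 2048K, as in linker.ld */
#define BOOT2_SIZE  0x100u            /* 256 bytes of second-stage bootloader */

/* Where the FLASH region starts once boot2 has taken its 256 bytes. */
uint32_t flash_region_origin(void) { return FLASH_BASE + BOOT2_SIZE; }

/* How much is left for vectors and code. */
uint32_t flash_region_length(void) { return FLASH_SIZE - BOOT2_SIZE; }

/* First address past the end of the FLASH region. */
uint32_t flash_region_end(void)
{
    return flash_region_origin() + flash_region_length();
}
```

So BOOT2 and FLASH together cover exactly the 2M from 0x10000000 up to, but not including, 0x10200000.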

Lines 32-44 lay out what we’d call our “regular” code in flash.

After the bootloader, we want “vectors”, starting at address 0x10000100. The first thing in vectors is, if I’ve understood correctly, a “top of stack” for the bootloader. After that come the ISRs (Interrupt Service Routines), which is a whole table of pointers to functions for such things as timer interrupts, GPIO pin change interrupts, and all the rest of them.

In this project, I have ignored all of the interrupts. All of the interrupts except one: the reset_handler. The reset handler is the address of the function to call when the chip is reset/first powered on. The address is the second item in the vector table. It is crucial for our purposes, because it’s how we get to execute main(). Very important!

Lines 51-57 talk about RAM: how memory should be laid out in RAM. One thing it describes is the top of the stack, which is crucial for the mcu to be able to call functions. I’m not too happy with the way I’ve laid it out, simple as it is. I suspect there may be problems.

If you’ve looked at RAM layouts in linkers for other mcus, you will have noticed that they can be rather complicated. There are all sorts of sections for zero’d data, non-zero’d data, heap allocation space, stuff for C++ class construction and destruction, and really confusing stuff relating to stack tracing for C++ exceptions.

The Pico is not inherently easier in this respect. It’s just that we don’t need all that right now. It would only confuse the issue. Our program doesn’t use heap, for example, so we don’t have to write any malloc/free stuff. It just uses static allocation in memory, plus a bit of stack usage, which the mcu will handle for us anyway.

So now you’re probably wondering, “but what exactly does the bootloader look like, and how is the reset handler defined?” For that, we’re going to write a bit of assembly, crt0.s:

    1	/* Inspired from
     2	https://smist08.wordpress.com/2021/04/16/assembly-language-on-the-raspberry-pi-pico/
     3	*/
     4	
     5	.syntax unified
     6	.cpu cortex-m0plus
     7	.thumb
     8	/*
     9	@ .syntax unified
    10	 .fpu softvfp 
    11	@ .thumb
    12	*/
    13	
    14	
    15	.section .boot2, "ax"
    16	.word 0x4b32b500, 0x60582021, 0x21026898, 0x60984388
    17	.word 0x611860d8, 0x4b2e6158, 0x60992100, 0x61592102
    18	.word 0x22f02101, 0x492b5099, 0x21016019, 0x20356099
    19	.word 0xf844f000, 0x42902202, 0x2106d014, 0xf0006619
    20	.word 0x6e19f834, 0x66192101, 0x66182000, 0xf000661a
    21	.word 0x6e19f82c, 0x6e196e19, 0xf0002005, 0x2101f82f
    22	.word 0xd1f94208, 0x60992100, 0x6019491b, 0x60592100
    23	.word 0x481b491a, 0x21016001, 0x21eb6099, 0x21a06619
    24	.word 0xf0006619, 0x2100f812, 0x49166099, 0x60014814
    25	.word 0x60992101, 0x2800bc01, 0x4700d000, 0x49134812
    26	.word 0xc8036008, 0x8808f380, 0xb5034708, 0x20046a99
    27	.word 0xd0fb4201, 0x42012001, 0xbd03d1f8, 0x6618b502
    28	.word 0xf7ff6618, 0x6e18fff2, 0xbd026e18, 0x40020000
    29	.word 0x18000000, 0x00070000, 0x005f0300, 0x00002221
    30	.word 0x180000f4, 0xa0002022, 0x10000100, 0xe000ed08
    31	.word 0x00000000, 0x00000000, 0x00000000, 0x7a4eb274
    32	
    33	
    34	
    35	
    36	.section .vectors, "ax"
    37	.align 2 
    38	
    39	.global __vectors
    40	__vectors:
    41	/* 	.word __StackTop  */
    42		.word 0x20042000 
    43	.word _reset_handler
    44	
    45	
    46	
    47	
    48	
    49	
    50	.section .text
    51	.type _reset_handler,%function /* vital for getting the correct offset */
    52	.thumb_func
    53	_reset_handler:
    54		@ mov r0, r0 @ just for testing purposes
    55		bl main
    56	
    57	
    58	/*
    59	.thumb_func
    60	.global main_asm
    61	.align 4
    62	main_asm:
    63	BL main  
    64	*/
    65	
    66	.data
    67	.align 4

I like to keep the assembly to the minimum, as I’m not very good at it.

Lines 1-14 contain a bit of blah-blah, telling the assembler that we’re assembling for an ARM Cortex M0+ (because that’s what the RP2040 is), and need to use “thumb” assembly.

Lines 15-31 are our bootloader! 256 bytes. Where did I get it? I basically wrote a project elsewhere that dumped out 256 bytes starting from 0x10000000, which is where boot2 lives. If you look at the disassembled code from a working project, you’ll see that the hex codes are the same as the ones I’ve shown in those lines.

Lines 36-43 give you the vector table that we talked about. The first vector is the top of the stack we want to declare, and the second one is a pointer to the all-important reset handler. There ought to be a bunch of other handlers after that, too, but that would be messy, and we don’t need them right now. Maybe a lesson for another day.
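The hard-coded 0x20042000 in line 42 is not arbitrary: it is the first address past the end of RAM as declared in linker.ld. The commented-out __StackTop would give something quite different. The sketch below works out both, assuming that .userstack is the first and only thing the linker places in RAM; the names are invented for illustration.

```c
#include <stdint.h>

#define RAM_BASE   0x20000000u
#define RAM_SIZE   (264u * 1024u)   /* 264K, as in linker.ld */
#define MIN_STACK  0x0400u          /* line 54 of linker.ld */

/* First address past the end of RAM: the value hard-coded in crt0.s. */
uint32_t ram_end(void) { return RAM_BASE + RAM_SIZE; }

/* What __StackTop would be if .userstack starts at the bottom of RAM. */
uint32_t linker_stack_top(void) { return RAM_BASE + MIN_STACK; }
```

With the hard-coded value the stack grows down from the very top of RAM, whereas __StackTop would leave it only 1K at the bottom.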

Lines 50-55 contain our reset handler.

Line 50 tells us that the handler should go in the “text” section of memory.

Line 51 declares a “function”. This is necessary in order to get the assembler to mark the symbol as Thumb code, so that the pointer stored in the vector table has the right form. Otherwise the vector table pointer won’t be compatible with the function. Which would be bad.

Line 53 declares the handler address. What does it do? Well, as you can see in line 55, it performs a call to main. Our main!

And that’s how the mcu boots into our main function.

It’s very simple in our case, although in general, it’s much more complicated.

What the reset handler usually does, prior to calling main, is zero out memory that should be zeroed out, and any other “preamble” that you’d generally like to do before calling main(). You could probably do all (or nearly all) of the preamble in the main() function itself, but sometimes it’s nice to do “standard” stuff before calling main().
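To give a flavour of that preamble, here is a rough sketch of the two classic jobs: copying the initial values of initialised variables from flash into RAM, and zeroing the rest. It is written as plain functions so that it can be tried on a PC. The function names are made up, and in a real startup file the pointers would come from symbols defined in the linker script, which this project does not have.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy the initial values of initialised variables from their load
 * image (normally in flash) to where they live at run time (RAM). */
void copy_data(uint32_t *dst, const uint32_t *src, size_t words)
{
    for (size_t i = 0; i < words; i++)
        dst[i] = src[i];
}

/* Zero the variables that C promises will start out as zero. */
void zero_bss(uint32_t *start, size_t words)
{
    for (size_t i = 0; i < words; i++)
        start[i] = 0;
}
```

The blinky gets away without either because main.c has no global or static variables at all.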

As you can see, we’ve actually done nothing, and just cut straight to the chase by calling main(). For different processors you might do a bunch of processor-specific stuff. The SDK for the RP2040 seems to do quite a lot involving setting up which core to use. Another mcu would probably have only one core, making such a thing unnecessary. On ARM A-class chips, there are different execution levels, which can be set up.

It is possible that I am omitting crucial setup steps in the reset handler. I’m also worried about how I’ve set up the stack, and what’s really happening with the bootloader. Feel free to comment.

We now need to assemble all these pieces together to make a binary file that we can put on our mcu. Here’s the Makefile:

    1	AS = arm-none-eabi-as
     2	CC = arm-none-eabi-gcc
     3	CFLAGS = -mthumb -mcpu=cortex-m0plus -nostdlib -ggdb
     4	LD = arm-none-eabi-ld
     5	BIN = arm-none-eabi-objcopy
     6	LDFLAGS = -T linker.ld
     7	
     8	OBJS = crt0.o main.o
     9	
    10	
    11	app.bin : app.elf 
    12		arm-none-eabi-objcopy -O binary app.elf app.bin
    13		arm-none-eabi-objdump -d app.elf >app.list
    14	
    15	app.elf : $(OBJS) linker.ld
    16		$(LD) $(LDFLAGS) -o $@ $(OBJS)
    17		#$(LD) -o $@ $(OBJS) $(LDFLAGS)
    18	
    19	%.o : %.c
    20		$(CC) $(CFLAGS) -c -o $@ $^
    21	
    22	%.o : %.s
    23		$(AS) -g -o $@ $<
    24	
    25	clean :
    26		rm -f *.o app.elf app.list app.bin app.uf2
    27	
    28	flash : app.uf2
    29		cp app.uf2 /media/$(USER)/RPI-RP2
    30	
    31	app.uf2 : app.elf app.bin
    32		$(PICO_SDK_PATH)/build/elf2uf2/elf2uf2 app.elf app.uf2

As you can see, it’s waaay simpler than the onion-skinned cmake files of the SDK.

Lines 1-6 set variables so that we compile with the GCC ARM cross-compiler.

Line 3 sets the flags for the compiler, telling it to use thumb, compile for a Cortex M0+ architecture, and use debugging. Notice also the “-nostdlib” option, meaning that we don’t compile against any library. We could never use a standard library that comes with the OS, because that library would be for Linux, which we obviously wouldn’t have on the mcu. It is possible to obtain C libraries for uses on mcu, like newlib, but that’s a topic for another day.

Line 8 contains the files we want to compile: just two files, the assembly file, and main.c.

Lines 11-23 contain a bunch of rules for creating the elf and bin file.

Line 13 creates an “object dump” of the elf file (called app.list). You should become familiar with these dumps. When you’re developing from scratch, you can often compare them with known good projects to see if you’ve made mistakes in the layout of memory.

The first few lines of my disassembled file look like this:

app.elf:     file format elf32-littlearm


Disassembly of section .boot2:

10000000 <__boot2_start__>:
10000000:       4b32b500        .word   0x4b32b500
10000004:       60582021        .word   0x60582021
10000008:       21026898        .word   0x21026898
1000000c:       60984388        .word   0x60984388
10000010:       611860d8        .word   0x611860d8
10000014:       4b2e6158        .word   0x4b2e6158
10000018:       60992100        .word   0x60992100
1000001c:       61592102        .word   0x61592102
10000020:       22f02101        .word   0x22f02101
...

Recognise that? That’s the bootloader. Further down:

Disassembly of section .text:

10000100 <__vectors>:
10000100:       20042000        .word   0x20042000
10000104:       10000109        .word   0x10000109

10000108 <_reset_handler>:
10000108:       f000 f820       bl      1000014c <main>

1000010c <delay>:
1000010c:       b580            push    {r7, lr}
1000010e:       b084            sub     sp, #16
10000110:       af00            add     r7, sp, #0
10000112:       6078            str     r0, [r7, #4]
...

Oh look, our interrupt vector table begins at location 0x10000100. The second line contains the address 0x10000109, which is 0x10000108 but out-by-one. This is deliberate: the lowest bit is set to mark the target as Thumb code, and the assembler does that for us because we declared the handler as a function.
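The out-by-one can be spelled out in a couple of lines. Instructions are always at even addresses, so the bottom bit of a function pointer is free to act as a “this is Thumb code” flag. The function names here are invented for illustration.

```c
#include <stdint.h>

/* Turn the address of a Thumb function into the value that belongs
 * in the vector table: the same address with bit 0 set. */
uint32_t thumb_vector_entry(uint32_t code_addr)
{
    return code_addr | 1u;
}

/* Recover where the first instruction really lives. */
uint32_t thumb_code_address(uint32_t vector_entry)
{
    return vector_entry & ~1u;
}
```

Feeding in 0x10000108 gives the 0x10000109 seen in the dump.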

Notice that the first line of the reset handler is effectively bl main. Hey, this is what we told the compiler to do. Further down you will see:

1000014c <main>:
1000014c:       b580            push    {r7, lr}
1000014e:       af00            add     r7, sp, #0
10000150:       4b0b            ldr     r3, [pc, #44]   ; (10000180 <main+0x34>)
10000152:       2205            movs    r2, #5
10000154:       601a            str     r2, [r3, #0]
10000156:       4b0b            ldr     r3, [pc, #44]   ; (10000184 <main+0x38>)
...

This is our main routine.

That’s mostly a condensed version of the object file, anyway. The generated code may be different for you, depending on whether you’ve enabled debugging, optimisations, and so on. So don’t be too concerned if the output is a little different.

Lines 25-26 just do some project cleanup.

Lines 31-32 ostensibly create a uf2 file from the elf file. If you’ve compiled the Pico SDK, then this tool would have been created for you, so you can use it. I’m assuming that you’re using Linux.

Lines 28-29 allow you to flash the uf2 to the mcu itself, assuming you’ve done the BOOTSEL dance. Again, I’m assuming you’re using Linux, Debian Stable.

Now, I used the word “ostensibly” previously. This is because the generated uf2 file seems to have something wrong with it. I don’t know why that is. If you can help, then feel free to comment. There could be any number of problems with what’s produced.
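For anyone hunting the problem, it may help to know what a uf2 file is supposed to contain. The sketch below follows the published UF2 format as I understand it: the file is a sequence of 512-byte blocks, each with a small header, a payload area, and three magic numbers. The struct and function names are made up, and the 256-byte payload per block is an assumption about what elf2uf2 emits.

```c
#include <stddef.h>
#include <stdint.h>

#define UF2_MAGIC_START0  0x0a324655u
#define UF2_MAGIC_START1  0x9e5d5157u
#define UF2_MAGIC_END     0x0ab16f30u
#define UF2_FAMILY_RP2040 0xe48bff56u
#define UF2_PAYLOAD_BYTES 256u        /* assumed payload per block */

/* One UF2 block: 32 bytes of header, 476 of data area, 4 of trailer. */
struct uf2_block {
    uint32_t magic_start0;
    uint32_t magic_start1;
    uint32_t flags;
    uint32_t target_addr;   /* where in flash the payload should go */
    uint32_t payload_size;  /* bytes of data[] actually used */
    uint32_t block_no;
    uint32_t num_blocks;
    uint32_t family_id;     /* doubles as file size in some uses */
    uint8_t  data[476];
    uint32_t magic_end;
};

/* How many blocks an image of the given size needs, rounding up. */
size_t uf2_blocks_needed(size_t image_bytes)
{
    return (image_bytes + UF2_PAYLOAD_BYTES - 1u) / UF2_PAYLOAD_BYTES;
}
```

A hex dump of app.uf2 can be compared against this: the first block should start with the two start magics and target 0x10000000.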

So, unfortunately, it is not possible (yet) to just flash the mcu with the uf2 file and have it work. But it is possible to run the code in a debugger. That does work. Maybe that’s some kind of clue to a reader as to what crucial thing I’m doing wrong.

In a separate shell, run start-debug:

#!/usr/bin/env bash

sudo openocd -f interface/picoprobe.cfg -f target/rp2040.cfg -s tcl

My setup is that I have a Pico running as a debugger for another Pico. That’s probably the best way of doing it. I’ve heard of other strategies, but I haven’t been able to get the alternatives to work.

In a different shell, start the debugger by typing

gdb-multiarch

There’s a neat feature of the debugger in that it executes the script .gdbinit if it finds it. This is very convenient, as initialising the debugger can be tedious. Here is the script I use:

# quit without confirmation
define hook-quit
    set confirm off 
end

file app.elf
target remote localhost:3333
load
monitor reset init
#b main.c:64
echo Type c to continue...\n

GDB will load all that. To actually run the elf file and see the mcu blinking, just type c at the command prompt.

All of the above-mentioned files are available in the following directory on my git repo:

https://github.com/blippy/rpi/tree/master/pico/bare-blink

You might just as well clone the whole repo, though, navigate to the relevant directory, and type

make

I hope this has been useful to you. Please let me know of any fixes I need to make in order to get uf2 working. I feel it must be close (??)

4 Responses to A working-esque non-SDK blinky sketch for the #raspberrypico

Pingback: “Barebones” (no SDK) blinky sketch for the #RP2040 | Mark Carter's blog
BABA says:

January 4, 2022 at 4:53 pm

Thank you, it works for me.

Justin says:

February 14, 2022 at 8:18 pm

Hi, great post, glad I found it! I prefer to avoid SDKs where I can, and do direct MMIO in my projects. Did this with the Pico for a couple weeks, but it’s so poorly documented on the startup bits (clock configuration, etc) that I relented and used the SDK (a lot of documentation, but not really an “initialization” section).

As part of my work, I dug into the bootloader bits. The ‘boot2’ component sets up the “XIP” (execute in place) shim for the flash, so you can run your application out of the 2MB SPI flash instead of the more limited 256kB RAM. You can find the boot_stage2 in the SDK under src/rp2_common/boot_stage2/.

And, I must say (an aside from the SDK), CMake is quite annoying, after being used to doing even my personal projects using the BSD Make infrastructure.

- mcturra2000 says:
  
  February 14, 2022 at 9:37 pm
  
  Yes, SDKs can be a pain. The Pi’s SDK can be a good source of “inspiration”, as they have things like “memory barriers” and other stuff to ensure that the code runs optimally and correctly. So it’s worth a bit of a dig.
  
  I also hate cmake. I prefer a simple Makefile. The problem with things like cmake and SDKs is that they tend to obfuscate what’s really going on under the hood. I think that most people don’t really understand cmake, me included. It can be a real battle.
  
  I think the Rust guys, and perhaps Ada guys (IIRC) have their own boot code. Possibly tinygo does too? I think that one problem is that the boot code requires a CRC (Cyclic Redundancy Check), so you can’t “just” compile code, you’ve got to post-process it to ensure that the CRC works out. Crikey! I took the easy way out and just obtained a dump of what is in the standard bootloader on the Pico, which I incorporated in the crt0.s file. If one had a variant one could always dump out the bootloader of the custom chip and use that.
  
  And at the end of the day, we probably don’t especially want to have a custom bootloader, we just want to boot our device.
