Simple memory-to-memory STM32F4 DMA example using libopencm3

Well, it took me most of the day to get it working, but I got there in the end.

The idea is that we want to copy memory from one location to another using DMA. Maybe it’s not especially useful, but it should help us get our feet under the table when it comes to learning DMA. I’ll present the code, and discuss it as I go along.

Start off with some boilerplate:

     1	/*
     2	 * Memory-to-memory must use DMA2
     3	 *
     4	 * The F4 has 2 controllers: DMA1, and DMA2.
     5	 * Each DMA has 8 streams: pathways where memory can flow
     6	 *
     7	 * Further info:
     8	 * https://adammunich.com/stm32-dma-cheat-sheet/
     9	 */
    10	
    11	#include <libopencm3/stm32/rcc.h>
    12	//#include <libopencm3/stm32/gpio.h>
    13	//#include <libopencm3/stm32/timer.h>
    14	//#include <libopencm3/stm32/spi.h>
    15	#include <libopencm3/stm32/dma.h>
    16	
    17	#include <string.h>
    18	

Include some convenience headers and other stuff that I won’t go into:

    19	#include "mal.h"
    20	
    21	typedef uint32_t u32;
    22	
    23	#define LED PC13
    24	

We’re going to need to decide on 3 things: a DMA controller, a stream, and a channel. An STM32F4 has two DMA controllers: DMA1 and DMA2. For memory-to-memory copying, you must use DMA2. I haven’t fathomed out exactly what streams and channels are, but it seems that a stream can only service one request source at a time, and the channel setting selects which one.

Refer to table 27 for DMA1 and table 28 for DMA2, in s9.3.3 of the RM0383 Reference Manual for the STM32F411xC/E. They’re on pages 166 and 167. They will tell you, for example, that if you want to use SPI3_RX DMA, then you need DMA1, channel 0, and either stream 0 or 2.

Memory-to-memory isn’t shown in the document, so I have assumed that I can use a slot that isn’t occupied in either of those two tables. So I chose DMA2 and stream 0:

    25	uint32_t dma = DMA2;
    26	//uint8_t ch = -3;
    27	uint8_t strm = DMA_STREAM0; // chosen arbitrarily
    28	

Further down I’ll select channel 4. Let me define an outputting function:

    29	static void myputs(const char *str)
    30	{
    31		mal_usart_print(str);
    32		mal_usart_print("\r\n");
    33	}
    34	
    35	

Don’t worry about how mal_usart_print() is defined, just think of myputs() as equivalent of puts(), but for USART1.

It is possible to enable an interrupt for the DMA. The names of all the interrupts are predefined in libopencm3/libopencm3/include/libopencmsis/stm32/f4/irqhandlers.h

You don’t get to choose your own one. I’m going to use an ISR to experiment with. It’s not strictly necessary to use one; you could busy-wait for the transfer-complete flag to be set, for example. There’s probably not much point in just doing that, though, as you’ll have done a memcpy(), but harder. Here are my definitions:


    36	volatile bool transfer_complete = false;
    37	volatile bool loud_dma_isr = true;
    38	
    39	void dma2_stream0_isr()
    40	{
    41		if(dma_get_interrupt_flag(dma, strm, DMA_TCIF)) {
    42			dma_clear_interrupt_flags(dma, strm, DMA_TCIF); // clear transfer complete flag
    43			transfer_complete = true;
    44			if(loud_dma_isr) 
    45				myputs("dma2_stream0_isr called: transfer complete");
    46		} 
    47		else if(dma_get_interrupt_flag(dma, strm, DMA_DMEIF)) {
    48			dma_clear_interrupt_flags(dma, strm, DMA_DMEIF);
    49			myputs("dma2_stream0_isr called: Direct Mode Error Interrupt Flag");
    50		} 
    51		else if(dma_get_interrupt_flag(dma, strm, DMA_TEIF)) {
    52			dma_clear_interrupt_flags(dma, strm, DMA_TEIF);
    53			myputs("dma2_stream0_isr called: Transfer Error Interrupt Flag");
    54		} 
    55		else if(dma_get_interrupt_flag(dma, strm, DMA_FEIF)) {
    56			dma_clear_interrupt_flags(dma, strm, DMA_FEIF);
    57			myputs("dma2_stream0_isr called: FIFO Error Interrupt Flag");
    58		} 
    59		else if(dma_get_interrupt_flag(dma, strm, DMA_HTIF)) {
    60			dma_clear_interrupt_flags(dma, strm, DMA_HTIF);
    61			myputs("dma2_stream0_isr called: Half Transfer Interrupt Flag");
    62		} 
    63		else {
    64			myputs("dma2_stream0_isr called: Unhandled (should never be called)");
    65		}
    66	
    67	}
    68	

The MCU can set several different flags for the interrupt, and I have enumerated all the possibilities. I did this because I made a bug in my coding which seemed to trigger interrupt requests mysteriously. A language like Ada would probably have prevented the silly error in the first place.

So the way you’d choose to code the ISR would likely be much shorter. The only case we’re really interested in is when the DMA_TCIF flag is set. This is when the transfer is complete.

In the ISR, I clear the flag. That is important, because otherwise the interrupt will keep firing. I also set my own boolean variable “transfer_complete” to true. The variable “loud_dma_isr” is for debugging purposes. I’m going to want to turn it off when I do benchmarking.

Let’s define main(), set up the built-in LED, and initialise a USART:

    69	int main(void)
    70	{
    71		pin_out(LED);
    72		mal_usart_init();
    73	

I’m not going to go into the details of those. Just accept that they do something useful.

Let’s display some output, and set up some variables as source and destinations for our copying:

    74		myputs("");
    75		myputs("=============================");
    76		myputs("DMA example: memory to memory");
    77	
    78		char src1[] = "1234567890";
    79		char dst1[] = "abcdefghij";
    80		uint32_t len = strlen(src1) + 1;
    81	
    82		myputs(dst1);
    83	

OK, time to do some basic configuration. S9.3.17 (page 181) of the manual gives the stream configuration procedure. I think it is a little misleading, as you don’t quite want to do it exactly as they have laid out. I have tried to keep things to a minimum.

You need to enable the relevant clock:

    85		// follow instructions in s9.3.17 of RM0383a, p181
    86		rcc_periph_clock_enable(RCC_DMA2);
    87	

Disable the stream. It is possible that a stream is already being used, and so you need to block until it is finished. If you choose your streams carefully, so that there is no possible contention, then you probably won’t need to do much in the way of waiting:

    88		// step 1 : disable the stream
    89		myputs("Disabling stream");
    90		//dma_disable_stream(dma, strm);
    91		DMA_SCR(dma, strm) &= ~DMA_SxCR_EN;
    92		while(DMA_SCR(dma, strm) & DMA_SxCR_EN); // wait until it is down
    93		myputs("Stream disabled. OK to configure");
    94	

The procedure says to set the source and destination addresses next. If you don’t want to change them in future, then you can set them now. Or set them as needed. The chances are that you are going to fix the addresses anyway. I wanted to play around for this example:

    95		//DMA_SPAR(dma, strm) = (uint32_t) str1; // step 2: set peripheral port address
    96		myputs("Step 2");
    97		//dma_set_peripheral_address(dma, strm, (uint32_t) src1); // step 2: set source address
    98		//DMA_SM0AR(dma, strm) = *(uint32_t*) str2; // step 3: set the memory address
    99		myputs("Step 3");
   100		dma_set_memory_address(dma, strm, (uint32_t) dst1); // step 3 : destination address
   101		myputs("Step 4");
   102		dma_set_number_of_data(dma, strm, len); // step 4: total number of data items
   103		// dma_channel_select(dma, strm, 0); // step 5: I just made up a channel number in this case

For now I have set the destination address – what the reference manual calls memory address, in line 100. I’ve also set up the length of the transfer, in line 102.

Then I chose channel 4:

   104		myputs("Step 5");
   105		dma_channel_select(dma, strm, DMA_SxCR_CHSEL_4); // step 5: I just made up a channel number in this case

There’s a bunch of stuff mentioned in the configuration procedure that I just ignored:

   106		// step 6: something about flow controller. Omitted
   107		// step 7: configure stream priority. Omitted
   108		// step 8: configure FIFO usage. Omitted setup of FIFO
   109	

I think “flow controller” is for when you aren’t sure of the length of the transfer. There are also stream priorities you can set, which I’m not interested in. There’s a variety of transfer methods, including burst, FIFO, half-transfer, etc. You’d use half-transfer if you wanted to set up double-buffering. We’re going to be using memory-to-memory:

   110		// step 9: variety of things
   111		myputs("Step 9a");
   112		dma_set_transfer_mode(dma, strm, DMA_SxCR_DIR_MEM_TO_MEM); // step 9: data direction

and many of those methods won’t be available to us in that mode.

Set up peripheral and memory increment mode:

   113		myputs("Step 9b");
   114		dma_enable_peripheral_increment_mode(dma, strm); // step 9: we want to increment "periph" address
   115		myputs("Step 9c");
   116		dma_enable_memory_increment_mode(dma, strm); // step 9: ditto for memory

In other words, as we transfer each item we increase both the source and destination addresses as we do so. This is how you do memory copy. If you’re writing to a SPI, for example, then the peripheral address won’t change. Our data is in 8-bit format:

   117		//dma_enable_direct_mode(dma, strm); // step 9: 
   118		// step 9 : can use single or burst, but not circ, direct or double-buffer
   119		myputs("Step 9d");
   120		dma_set_memory_size(dma, strm, DMA_SxCR_MSIZE_8BIT);
   121		myputs("Step 9e");
   122		//dma_set_peripheral_size(dma, strm, len);
   123		dma_set_peripheral_size(dma, strm, DMA_SxCR_PSIZE_8BIT);

You can do wacky things like have a source which is 8 bit and a destination which is 16 bits. This causes padding or truncation, which may be useful. Refer to the datasheet for more info. It’s of no use to us here.

Turn on the interrupts:

   124		myputs("Fiddling with interrupts");
   125		//dma_clear_interrupt_flags(dma, strm, DMA_TCIF); // clear transfer complete flag
   126		//dma_clear_interrupt_flags(dma, strm, DMA_HTIF); // clear half-transfer complete flag
   127		//dma_disable_half_transfer_interrupt(dma, strm);
   128		dma_enable_transfer_complete_interrupt(dma, strm);
   129		nvic_enable_irq(NVIC_DMA2_STREAM0_IRQ);
   130		myputs("Finished setting up interrupts");
   131	
   132	

Now let’s do a transfer:

   133		myputs("Start tfr 1");
   134		transfer_complete = false;
   135		dma_set_peripheral_address(dma, strm, (uint32_t) src1); // step 2: set source address
   136		dma_enable_stream(dma, strm); // step 10
   137	
   138		myputs(dst1); // this will likely only partially complete
   139		while(!transfer_complete);
   140		myputs("Tfr 1 completed");
   141		myputs(dst1);
   142		myputs(src1);

We wanted to play around with the source address, remember, which is why it is set in line 135. Line 134 ensures we reset our transfer-complete flag.

In line 138, we print out the contents of our target location before we know the transfer is complete. It output this to the console (or it did for me, at least):

a234567890

It should read:

1234567890

Line 139 does a busy-wait, after which the correct output is given. The output to the console so far is:

=============================
DMA example: memory to memory
abcdefghij
Disabling stream
Stream disabled. OK to configure
Step 2
Step 3
Step 4
Step 5
Step 9a
Step 9b
Step 9c
Step 9d
Step 9e
Fiddling with interrupts
Finished setting up interrupts
Start tfr 1
dma2_stream0_isr called: transfer complete
a234567890
Tfr 1 completed
1234567890
1234567890

Let’s try another DMA request to make sure things work as we expect them to:

   143	
   144		
   145	
   146	
   147	
   148		myputs("\r\nStart tfr 2");
   149		transfer_complete = false;
   150		char src2[] = "ABCDEFGHIJ";
   151		dma_set_peripheral_address(dma, strm, (uint32_t) src2);
   152		dma_enable_stream(dma, strm);
   153		while(!transfer_complete);
   154		myputs(dst1);
   155		myputs("Tfr 2 completed");
   156	

The output on the console reads:

Start tfr 2
dma2_stream0_isr called: transfer complete
ABCDEFGHIJ
Tfr 2 completed

Good! We have successfully copied string src2 to dst1.

That’s the basics covered. Now let’s do some timings, to see how fast DMA transfer is compared to a regular memcpy():

   157		// now do timings
   158	#define TPIN PC14
   159		pin_out(TPIN);
   160		char dst3[512], src3[512];
   161		int i;
   162		dma_set_peripheral_address(dma, strm, (uint32_t) src3);
   163		dma_set_number_of_data(dma, strm, 512);
   164		dma_set_memory_address(dma, strm, (uint32_t) dst3);
   165		loud_dma_isr = false;
   166		while(1) {
   167			// use dma
   168			pin_high(TPIN);
   169			for(i = 0; i< 100; ++i) {
   170				transfer_complete = false;
   171				dma_enable_stream(dma, strm);
   172				while(!transfer_complete);
   173			}
   174			pin_low(TPIN);
   175	
   176			mal_delayish(1);
   177	
   178			// use memcpy
   179			pin_high(TPIN);
   180			for(i = 0; i< 100; ++i) {
   181				memcpy(dst3, src3, 512);
   182			}
   183			pin_low(TPIN);
   184	
   185			mal_delayish(10);
   186		}
   187	
   188	}

I set pin PC14 high while I do 100 rounds of DMA transfers, and again while I do 100 rounds of memcpy(), and low in between. I use a logic analyser to see how long each took. I didn’t want to copy 11 bytes at a time, but a more reasonable 512-byte block. I haven’t bothered setting up the contents, as I’m happy that we’ve already figured out that the copying is working correctly.

Using my logic analyser, the DMA transfers take about 11.796ms. That’s for 100 × 512-byte blocks. So each block takes 118us. That’s actually pretty unpleasant if we’re playing with audio at, say, 44kHz, which works out at about 23us per sample. So we may need to be a little clever about how we do this so as not to cause jitter in our audio.

Using memcpy() takes 2.703ms, which is 27us per block.

As you can see, a naive memcpy() works much faster than a DMA transfer. The difference is that memcpy() actually blocks, because it is tying up the CPU, whereas the DMA can be run asynchronously.

That doesn’t seem to be much of a win for DMA over the simpler memcpy. Unless I’ve done something hideously wrong, of course. It seems that DMA will be much more useful in something like SPI transfers, which can seriously clog up CPU usage due to their relatively low speed.

So, I hope this post was useful to you. It is my first foray into DMA, so if you have any comments to make, then feel free. I probably won’t be able to answer many questions you have, though.

My plan next is to see how DMA can be used over SPI to output to a DAC. I think the I2S functionality will be relevant here. But that’s a battle for another day.

You can download the code here. It’s probably simplest to download the whole repo and issue a make in that directory. Happy hunting.

About mcturra2000

Computer programmer living in Scotland.