Simple multithreaded http client in C using curl in 100 lines

I often find that simple examples never go as far as I want: they miss some important detail that would make them useful. In an attempt to bridge that gap, I’ll present a program of roughly 100 lines that uses curl to retrieve price data for various companies from Google Finance.

If you are using Ubuntu or Debian, first install the necessary dependencies:

sudo apt-get install libcurl4-openssl-dev libcurl4-doc

Then enter the following code as file curly.c:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <pthread.h>
#include <curl/curl.h>
 

struct HttpData {
  pthread_t tid;
  char *gepic;
  char response[2000];
};





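/* libcurl calls this, possibly several times per transfer, with the next
   chunk of data. userdata is h->response, as set via CURLOPT_WRITEDATA.
   Crude: assumes the whole response fits in the 2000-byte buffer. */
size_t write_callback(char *ptr, size_t size, size_t nmemb, void *userdata)
{
  char *response = (char *) userdata;
  strncat(response, ptr, size * nmemb); /* appends at most size*nmemb chars plus a NUL */
  return size * nmemb; /* tell curl we consumed everything */
}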

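static void *pull_one_url(void *td)
{
  CURL *curl;
  CURLcode res;
  char url[100];
  struct HttpData *h = (struct HttpData *)td;
  /* snprintf(), unlike sprintf(), cannot overrun url[] */
  snprintf(url, sizeof url,
           "http://finance.google.com/finance/info?client=ig&q=LON:%s", h->gepic);
  printf("URL: <%s>\n", url);
  h->response[0] = '\0'; /* start with an empty string for strncat() */

  curl = curl_easy_init(); /* one easy handle per thread; never share them */
  if(!curl) {
    fprintf(stderr, "curl_easy_init() failed for %s\n", h->gepic);
    return NULL;
  }
  curl_easy_setopt(curl, CURLOPT_URL, url);
  curl_easy_setopt(curl, CURLOPT_NOSIGNAL, 1L); /* recommended when using threads */
  curl_easy_setopt(curl, CURLOPT_WRITEDATA, h->response);
  curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, write_callback);
  res = curl_easy_perform(curl); /* transport errors only; an HTTP 404 still counts as OK */
  if(res != CURLE_OK)
    fprintf(stderr, "%s: %s\n", h->gepic, curl_easy_strerror(res));
  curl_easy_cleanup(curl);
  return NULL;
}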
 
 
 
int main(int argc, char **argv)
{
  struct HttpData https[100];
  char *epics[] = {"AZN", "BOGUS", "ULVR", "VOD", 0};
  const int max = (int)(sizeof https / sizeof https[0]);
  int i;
  int error;
  int num = 0;
  int started = 0;

  while(num < max && epics[num]) { /* never overrun https[] */
    https[num].gepic = epics[num];
    https[num].response[0] = '\0'; /* safe to print even if no thread runs */
    num++;
  }

  for(i=0; i<num; i++) { printf("Epic=<%s>\n", https[i].gepic);}

  /* Must initialize libcurl before any threads are started.
     Using CURL_GLOBAL_ALL creates a small memory leak in libcurl4;
     the problem seems to be on the SSL side.
  */
  curl_global_init(CURL_GLOBAL_NOTHING);

  for(i=0; i< num; i++) {
    error = pthread_create(&(https[i].tid),
                           NULL, /* default attributes please */
                           pull_one_url,
                           &https[i]);
    if(0 != error) { /* pthread_create returns an error number, not errno */
      fprintf(stderr, "Couldn't run thread number %d, error %d\n", i, error);
      break; /* only threads 0..started-1 exist, so only join those */
    }
    fprintf(stderr, "Thread %d, gets %s\n", i, https[i].gepic);
    started++;
  }

  /* now wait for all the started threads to terminate */
  for(i=0; i< started; i++) {
    error = pthread_join(https[i].tid, NULL);
    if(0 != error)
      fprintf(stderr, "Couldn't join thread %d, error %d\n", i, error);
    else
      fprintf(stderr, "Thread %d terminated\n", i);
  }

  curl_global_cleanup();

  puts("Results...");
  for(i=0; i<num; i++) {
    printf("%s\n%s\n\n", https[i].gepic, https[i].response);
  }

  return 0;
}

Compile it using the command:

 gcc -ggdb -o curly curly.c -lcurl -lpthread

I won’t explain the program in great detail; my main aim is to demystify some of the mechanisms involved. Most examples I see simply dump the output from the fetches to stdout, which is unlikely to be useful in practice. A more usual scenario is that you want to store the responses for later processing. So I have defined the data structure HttpData as follows:

struct HttpData {
  pthread_t tid;
  char *gepic;
  char response[2000];
};

The HttpData struct is passed to each thread through pthread_create(), so it needs to hold both the input (the epic) and the output (the response). Strictly speaking, tid did not need to be part of the struct, as it is not used by the threads themselves.

We could have created an array of tids separately from HttpData, but I chose this design solution because it meant that we don’t have to define two sets of arrays. gepic holds the “epic”, or ticker symbol, of the company information that we want to download.

We construct a URL from each epic in a simple way. I have chosen to download the epics AZN (for AstraZeneca), ULVR (for Unilever) and VOD (for Vodafone). I also opted to download “BOGUS”, which is not a real epic, so its URL refers to a non-existent stock. You will see from the output that the response for it is just blank. For my purposes, that is acceptable.

You will notice that I have hard-coded the response buffer to 2000 chars. This is plenty for holding the response from Google. In cases where the response could be of any size, you will need to concoct a more robust strategy.

When you want to handle the data that curl retrieves, you must provide two options to curl_easy_setopt(). The first is CURLOPT_WRITEDATA, which specifies a pointer to where the data is to go:

 curl_easy_setopt(curl, CURLOPT_WRITEDATA, h->response);

The second is CURLOPT_WRITEFUNCTION which specifies a callback function that will be fed that pointer:

  curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, write_callback);

So, whenever curl has received some data, it will call your function write_callback, saying “here’s some data I retrieved, go do something with it”. It may do this several times per transfer, each time with the next chunk. I have written it as follows:

size_t write_callback(char *ptr, size_t size, size_t nmemb, void *userdata)
{
  char *response = (char *) userdata;
  strncat(response, ptr, size * nmemb);
  return size * nmemb;
}

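My function is crude. It assumes that all the data received from curl fits in the 2000-byte response buffer: strncat() does not know how big the destination is, so a longer response would overflow it. You are likely to want to do something more sophisticated in production code.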

I don’t want to say too much about pthreads, as there are many good examples. Pthreads follows almost the same idea as curl: when creating a thread you specify a function to be called and a pointer to data that is passed to it. pthread_create() creates the threads and starts them running. We then want to “hang around” until all the threads have finished before we proceed with the next steps of the computation. This is what pthread_join() does.

So, I think I have demonstrated that threads and curl need not be too intimidating. Here is the output from the program:

Epic=<AZN>
Epic=<BOGUS>
Epic=<ULVR>
Epic=<VOD>
Thread 0, gets AZN
URL: <http://finance.google.com/finance/info?client=ig&q=LON:AZN>
Thread 1, gets BOGUS
URL: <http://finance.google.com/finance/info?client=ig&q=LON:BOGUS>
Thread 2, gets ULVR
URL: <http://finance.google.com/finance/info?client=ig&q=LON:ULVR>
Thread 3, gets VOD
URL: <http://finance.google.com/finance/info?client=ig&q=LON:VOD>
Thread 0 terminated
Thread 1 terminated
Thread 2 terminated
Thread 3 terminated
Results...
AZN

// [
{
"id": "1966410"
,"t" : "AZN"
,"e" : "LON"
,"l" : "4,300.00"
,"l_fix" : "4300.00"
,"l_cur" : "GBX4,300.00"
,"s": "0"
,"ltt":"5:07PM GMT+1"
,"lt" : "Oct 10, 5:07PM GMT+1"
,"lt_dts" : "2014-10-10T17:07:42Z"
,"c" : "+90.00"
,"c_fix" : "90.00"
,"cp" : "2.14"
,"cp_fix" : "2.14"
,"ccol" : "chg"
,"pcls_fix" : "4210"
}
]


BOGUS


ULVR

// [
{
"id": "11262480"
,"t" : "ULVR"
,"e" : "LON"
,"l" : "2,515.05"
,"l_fix" : "2515.05"
,"l_cur" : "GBX2,515.05"
,"s": "0"
,"ltt":"4:36PM GMT+1"
,"lt" : "Oct 10, 4:36PM GMT+1"
,"lt_dts" : "2014-10-10T16:36:45Z"
,"c" : "-25.95"
,"c_fix" : "-25.95"
,"cp" : "-1.02"
,"cp_fix" : "-1.02"
,"ccol" : "chr"
,"pcls_fix" : "2541"
}
]


VOD

// [
{
"id": "834331"
,"t" : "VOD"
,"e" : "LON"
,"l" : "195.81"
,"l_fix" : "195.81"
,"l_cur" : "GBX195.81"
,"s": "0"
,"ltt":"5:59PM GMT+1"
,"lt" : "Oct 10, 5:59PM GMT+1"
,"lt_dts" : "2014-10-10T17:59:57Z"
,"c" : "-1.54"
,"c_fix" : "-1.54"
,"cp" : "-0.78"
,"cp_fix" : "-0.78"
,"ccol" : "chr"
,"pcls_fix" : "197.35"
}
]

I like to check for memory leaks, which can be done using the command:

valgrind --leak-check=yes --show-leak-kinds=all ./curly

When I ran it, valgrind reported no leaks:

==27522== HEAP SUMMARY:
==27522==     in use at exit: 0 bytes in 0 blocks
==27522==   total heap usage: 628 allocs, 628 frees, 336,059 bytes allocated
==27522==
==27522== All heap blocks were freed -- no leaks are possible
==27522==
==27522== For counts of detected and suppressed errors, rerun with: -v
==27522== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

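Note that we have not allocated any memory dynamically ourselves, so it looks like curl (together with the C and thread libraries it relies on) allocated the roughly 336 KB itself. That figure is the total allocated over the whole run rather than the amount in use at any one time, but it is still quite a lot.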


About mcturra2000

Computer programmer living in Scotland.
This entry was posted in Computers.
