map_concurrently: simple multithreading in Python

I have been playing with multithreading in Haskell recently, and I was impressed to find that it has a function, mapConcurrently, which applies a function to every element of a list concurrently. I thought I would implement the same functionality in Python.

Multithreading is something I had played with before in Python, but my prior attempts were completely inelegant. My latest attempt takes only a few lines of code and is more loosely coupled. Here’s some generic code for map_concurrently, which requires a helper class:

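import threading

class Strand(threading.Thread):
    """Thread that applies func to arg and keeps the outcome for later collection."""

    def __init__(self, func, arg):
        super().__init__()
        self.func = func
        self.arg = arg
        self.result = None
        self.error = None

    def run(self):
        # Runs in the new thread. Keep any exception so the caller can re-raise it.
        try:
            self.result = self.func(self.arg)
        except Exception as exc:
            self.error = exc

def map_concurrently(func, arg_list):
    """Apply func to each element of arg_list, one thread per element.

    Results are returned in the order of arg_list."""
    strands = [Strand(func, el) for el in arg_list]
    for s in strands:
        s.start()
    for s in strands:
        s.join()  # wait for every thread to finish
    for s in strands:
        if s.error is not None:
            raise s.error  # surface the first failure instead of hiding it
    return [s.result for s in strands]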

This now replicates the basic functionality of Haskell’s mapConcurrently. I don’t know of any way of avoiding the helper class (Strand), because I need a place to store the result of the computation for later collection.
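One possible way to do without the helper class, offered as a sketch rather than as part of the original code, is the standard library's concurrent.futures module. Its ThreadPoolExecutor stores each result internally and hands the results back in argument order. The choice of one worker per argument is an assumption made here to mirror the one-thread-per-element design above.

```python
from concurrent.futures import ThreadPoolExecutor

def map_concurrently(func, arg_list):
    """Apply func to each element of arg_list in worker threads."""
    args = list(arg_list)
    if not args:
        return []  # ThreadPoolExecutor rejects max_workers=0
    # Assumption: one worker per argument, like one Strand per element.
    with ThreadPoolExecutor(max_workers=len(args)) as pool:
        # pool.map yields results in argument order, not completion order,
        # and re-raises an exception from func when its result is read.
        return list(pool.map(func, args))
```

For long argument lists a fixed, smaller max_workers would probably be the saner choice, since each worker is a real operating system thread.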

Stash the code above into a module somewhere, and call it using the following example:

import time

def myfunc(x):
    time.sleep(x)
    print("Hello from myfunc. Arg is", x)
    return x + 10


print(map_concurrently(myfunc, [1.5, 1.4, 1.6]))

As you can see, map_concurrently takes a function and a list of arguments, applies that function to the arguments individually, and collects the results into a list.

In the example, myfunc sleeps for the number of seconds passed in as an argument. This allows us to simulate some latency in the function. It then notifies us that it is alive, and returns the argument passed in, incremented by 10. Here is the output of running the code:

Hello from myfunc. Arg is 1.4
Hello from myfunc. Arg is 1.5
Hello from myfunc. Arg is 1.6
[11.5, 11.4, 11.6]

You can see that the messages are printed in order of completion (shortest delay first), whereas the collected results follow the original argument order.
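To check that the calls really do overlap, you can time the concurrent run against a plain sequential loop. The following is a rough sketch: the helper is repeated in its simplest form so the snippet runs on its own, and the function sleepy and the scaled-down delays are made up for illustration.

```python
import threading
import time

class Strand(threading.Thread):
    # Same helper as above, repeated so this snippet is self-contained.
    def __init__(self, func, arg):
        threading.Thread.__init__(self)
        self.func = func
        self.arg = arg

    def run(self):
        self.result = self.func(self.arg)

def map_concurrently(func, arg_list):
    strands = [Strand(func, el) for el in arg_list]
    for s in strands: s.start()
    for s in strands: s.join()
    return [s.result for s in strands]

def sleepy(x):
    # Hypothetical stand-in for myfunc, without the print.
    time.sleep(x)
    return x + 10

delays = [0.15, 0.14, 0.16]  # illustrative values, a tenth of the originals

start = time.perf_counter()
concurrent_results = map_concurrently(sleepy, delays)
concurrent_time = time.perf_counter() - start

start = time.perf_counter()
sequential_results = [sleepy(x) for x in delays]
sequential_time = time.perf_counter() - start

print("concurrent: %.2fs, sequential: %.2fs" % (concurrent_time, sequential_time))
```

The concurrent run should take roughly as long as the longest single delay, while the sequential loop takes about the sum of all three.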

You could use map_concurrently to download multiple URLs in parallel, which was the original motivation for my interest in multithreading.
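As a sketch of what that might look like, here is a small downloader built on the standard library. The names fetch and download_all, the 10-second timeout and the limit of 8 workers are all assumptions chosen for illustration. Downloads suit threads well: CPython releases its global interpreter lock while a thread waits on the network, so the waits overlap. CPU-bound functions would not see the same benefit.

```python
import urllib.request
from concurrent.futures import ThreadPoolExecutor

def fetch(url):
    """Return the body of url as bytes."""
    # Assumption: a 10-second timeout is acceptable for every URL.
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read()

def download_all(urls, max_workers=8):
    """Fetch every URL in worker threads; bodies come back in the order of urls."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # If any fetch fails, its exception is re-raised here.
        return list(pool.map(fetch, urls))
```

Calling download_all with a list of URLs returns a list of byte strings in the same order, one per URL.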

About mcturra2000

Computer programmer living in Scotland.

One Response to map_concurrently: simple multithreading in Python

  1. Joe says:

    You are aware that the GIL in python prevents simultaneous execution of threads in a multicore environment?
