Multiprocessing Idioms and Patterns in Python


Programming in Python is fun: you can get maximum done with minimum means… for the most part. Lately I’ve been playing around with Python’s multiprocessing module, so I thought I’d use some of the long weekend to write up this post summarizing some general patterns that others can (potentially) leverage.

Multithreading in Python can be smoke and mirrors because of the Global Interpreter Lock (in CPython, only one thread executes Python bytecode at any given point in time), so threads won’t speed up CPU-bound code. The multiprocessing module is different. It spawns multiple Python processes, each with its own interpreter and GIL, and these can run on multiple CPU cores simultaneously. It also provides some good built-in synchronization primitives, so you can do some real parallel programming here.
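To make the contrast concrete, here is a minimal sketch that hands CPU-bound work to a pool and records which process handled each item. The names `cpu_bound` and `run_in_processes` are hypothetical and chosen for illustration. Every reported PID differs from the parent's, because the work ran in a separate interpreter with its own GIL.

```python
import os
from multiprocessing import Pool


def cpu_bound(n):
    """Hypothetical worker: pure-Python CPU work (sum of squares below n)."""
    total = sum(i * i for i in range(n))
    # Return the worker's PID along with the result so we can see where it ran.
    return os.getpid(), total


def run_in_processes(inputs, workers=4):
    # Each worker is a separate Python process, so CPU-bound items can run
    # on different cores at the same time instead of taking turns on one GIL.
    with Pool(processes=workers) as pool:
        return pool.map(cpu_bound, inputs)


if __name__ == "__main__":
    parent = os.getpid()
    for pid, total in run_in_processes([10**6] * 4):
        print(f"worker pid={pid} (parent pid={parent}) total={total}")
```

The `if __name__ == "__main__":` guard matters here. Under the "spawn" start method, child processes re-import the main module, and without the guard they would try to create pools of their own.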

The following piece of code uses multiprocessing.Pool to create a pool of worker processes, hands them a function and the input data, parallelizes the execution and collects the output (sorta like MapReduce). It also shows how you can pass multiple arguments to a pooled function (which textbook examples don’t usually cover), and it has some easter eggs if you can find them 😉
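Here is a minimal sketch of the multiple-arguments part. It assumes Python 3.3+, where `Pool.starmap` was added. The worker `scale` and its parameters are hypothetical. `starmap` unpacks each tuple into positional arguments. `functools.partial` freezes the arguments that stay the same for every item, so plain `map` only has to supply the one that varies.

```python
from functools import partial
from multiprocessing import Pool


def scale(value, factor, offset=0):
    """Hypothetical worker that needs more than one argument."""
    return value * factor + offset


def with_starmap(pairs, workers=4):
    # starmap calls scale(*pair) for each tuple, e.g. scale(3, 4).
    with Pool(processes=workers) as pool:
        return pool.starmap(scale, pairs)


def with_partial(values, factor, offset, workers=4):
    # The partial object is picklable because scale is a module-level
    # function, so it can be shipped to the worker processes.
    worker = partial(scale, factor=factor, offset=offset)
    with Pool(processes=workers) as pool:
        return pool.map(worker, values)


if __name__ == "__main__":
    print(with_starmap([(1, 2), (3, 4), (5, 6)]))        # [2, 12, 30]
    print(with_partial([1, 2, 3], factor=10, offset=1))  # [11, 21, 31]
```

A lambda would not work in place of `partial` here, because lambdas can’t be pickled and sent to the workers.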

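A few gotchas worth keeping in mind:

  1. You can’t nest pools, i.e. have a pool of processes that uses another pool of processes underneath. Pool workers are daemonic processes, and daemonic processes are not allowed to create children of their own. This is the biggest 😔 for me with Pool
  2. Watch out for zombies: make sure your pool is properly shut down at the end (close() followed by join(), or terminate() if you need to stop right away)
  3. Bigger pools can be counterproductive. For CPU-bound work, processes beyond the number of cores mostly add context-switching overhead, so strike the right balance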
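The last two gotchas can be sketched as follows, assuming a CPU-bound worker. The names `square` and `run_with_cleanup` are hypothetical. The sketch sizes the pool from `os.cpu_count()` and shuts it down explicitly on both the success path and the error path, so no worker processes are left behind.

```python
import os
from multiprocessing import Pool


def square(x):
    """Hypothetical CPU-bound worker."""
    return x * x


def run_with_cleanup(data, workers=None):
    # Default to one process per core; os.cpu_count() may return None,
    # so fall back to a single worker in that case.
    workers = workers or os.cpu_count() or 1
    pool = Pool(processes=workers)
    try:
        results = pool.map(square, data)
        pool.close()      # normal path: accept no more tasks, let workers exit
    except BaseException:
        pool.terminate()  # error path: stop the workers right away
        raise
    finally:
        pool.join()       # wait for every worker to exit so none lingers as a zombie
    return results


if __name__ == "__main__":
    print(run_with_cleanup(range(10)))  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

If you prefer `with Pool(...) as pool:`, note that leaving the block calls `terminate()`, not `close()`. Collect all of your results inside the block.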

The multiprocessing.Process class is also worth learning, along with the techniques for sharing state across multiple processes and synchronizing them.
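As a starting point, here is a minimal sketch that combines `Process` with two of those built-ins: a shared `Value` and a `Lock`. The helper names `add_many` and `run_counter` are hypothetical. Several processes increment one shared counter. The lock makes each read-modify-write atomic, so no increments are lost.

```python
from multiprocessing import Lock, Process, Value


def add_many(counter, lock, times):
    """Hypothetical worker: bump a shared counter `times` times."""
    for _ in range(times):
        # "+=" on a shared Value is a read, an add and a write; without the
        # lock, increments from different processes can overwrite each other.
        with lock:
            counter.value += 1


def run_counter(num_procs=4, times=1000):
    counter = Value("i", 0)  # a C int allocated in shared memory
    lock = Lock()
    procs = [Process(target=add_many, args=(counter, lock, times))
             for _ in range(num_procs)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()  # wait for (and reap) each child process
    return counter.value


if __name__ == "__main__":
    print(run_counter())  # 4000
```

`Value` also carries its own lock (`counter.get_lock()`). A separate `Lock` is used here only to show the primitive explicitly.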