Programming in Python is fun: you can get the maximum done with minimal means… for the most part. Lately I’ve been playing around with Python’s multiprocessing module, so I thought I’d use some of the long weekend to write up this post summarizing some general patterns that others can (potentially) leverage.
Multithreading in Python can be smoke and mirrors because of the Global Interpreter Lock (in CPython, only one thread executes Python bytecode at any given time), so CPU-bound threads don’t actually run in parallel. The multiprocessing module, however, lets you spawn multiple Python processes that can run on multiple CPU cores simultaneously, and it provides some good built-in synchronization primitives, so you can do some real parallel programming here.
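To make the GIL point concrete, here is a minimal sketch (not from the original post) that runs a CPU-bound function in several `multiprocessing.Process` workers. Each worker is a separate Python interpreter with its own GIL, so the OS is free to schedule them on different cores. The names `count_primes` and `run_in_processes` are hypothetical, and trial division is just a stand-in for any CPU-heavy task.

```python
import os
from multiprocessing import Process, Queue


def count_primes(limit, out):
    """Hypothetical CPU-bound task: count primes below `limit` by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    # Report which OS process did the work alongside the result.
    out.put((os.getpid(), limit, count))


def run_in_processes(limits):
    out = Queue()
    procs = [Process(target=count_primes, args=(limit, out)) for limit in limits]
    for p in procs:
        p.start()
    # Drain the queue before joining, so no child blocks on a full pipe.
    results = [out.get() for _ in procs]
    for p in procs:
        p.join()
    return results


if __name__ == "__main__":
    for pid, limit, count in run_in_processes([100, 1_000, 10_000]):
        print(f"pid {pid}: {count} primes below {limit}")
```

The `if __name__ == "__main__":` guard matters: under the spawn start method (the default on Windows and macOS), each child re-imports the main module, and without the guard the children would try to start processes of their own.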
The Pool:
The code for this section uses multiprocessing.Pool to create a pool of worker processes that take a method and the input data, parallelize the execution, and return the output (sorta like MapReduce). It also shows how you can pass multiple arguments to a pooled method (which textbook examples don’t usually cover), and it has some easter eggs if you can find them 😉
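As a minimal sketch (not the post's original code) of the two usual ways to pass several arguments to a pooled function: `Pool.starmap` unpacks each tuple of arguments, while `functools.partial` freezes the constant arguments so that plain `Pool.map` can be used. `scale_and_shift`, `with_starmap` and `with_partial` are hypothetical names for illustration.

```python
from functools import partial
from multiprocessing import Pool


def scale_and_shift(x, factor, offset):
    """Hypothetical worker that needs more than one argument."""
    return x * factor + offset


def with_starmap(data, factor, offset, processes=4):
    # starmap calls scale_and_shift(*args) for each tuple in jobs.
    jobs = [(x, factor, offset) for x in data]
    with Pool(processes=processes) as pool:
        return pool.starmap(scale_and_shift, jobs)


def with_partial(data, factor, offset, processes=4):
    # partial leaves a one-argument callable that map() can use; it pickles
    # fine because the wrapped function is defined at module level.
    worker = partial(scale_and_shift, factor=factor, offset=offset)
    with Pool(processes=processes) as pool:
        return pool.map(worker, data)


if __name__ == "__main__":
    print(with_starmap(list(range(5)), 3, 1))  # → [1, 4, 7, 10, 13]
    print(with_partial(list(range(5)), 3, 1))  # → [1, 4, 7, 10, 13]
```

Both calls return results in input order. `starmap` is handy when every argument varies per item; `partial` reads better when most arguments are constant.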
Things to watch out for while using Pool:
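- You can’t nest pools: a pool worker can’t create a pool of its own underneath, because pool workers are daemonic processes and daemonic processes aren’t allowed to have children. This is the biggest 😔 for me with Pool
- Watch out for zombies: make sure your pool is properly closed and joined (or terminated) at the end, so no worker processes are left behind
- Bigger pools can be counterproductive: for CPU-bound work, running more processes than you have cores mostly buys you extra context switching, so strike the right balance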
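The cleanup and sizing advice above can be sketched like this, assuming a toy `square` task and a hypothetical `run_pool` helper. Calling `close()` and then `join()` lets the workers finish and exit cleanly. Using `with Pool(...) as pool:` is also common, but note that its exit calls `terminate()`, which stops the workers immediately instead of waiting for outstanding work.

```python
import os
from multiprocessing import Pool


def square(x):
    """Toy CPU-bound task standing in for real work."""
    return x * x


def run_pool(data, processes=None):
    # For CPU-bound work, more processes than cores mostly adds context
    # switching; the CPU count is a sensible starting point (Pool also
    # falls back to the number of CPUs when processes is None).
    size = processes or os.cpu_count() or 1
    pool = Pool(processes=size)
    try:
        return pool.map(square, data)
    finally:
        pool.close()  # no more tasks will be submitted to this pool
        pool.join()   # wait until every worker process has exited


if __name__ == "__main__":
    print(run_pool(range(8)))  # → [0, 1, 4, 9, 16, 25, 36, 49]
```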
Beyond Pool, the multiprocessing.Process class and the techniques for sharing state across multiple processes and synchronizing them are also worth learning.
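For a taste of shared state plus synchronization, here is a minimal sketch (hypothetical function names `add_many` and `count_in_parallel`) in which several `Process` workers increment one counter held in shared memory via `multiprocessing.Value`. The module also offers `Lock`, `Queue` and `Manager` for other sharing patterns; this example only uses the lock that comes bundled with the `Value`.

```python
from multiprocessing import Process, Value


def add_many(counter, n):
    """Hypothetical worker: increment a shared counter n times."""
    for _ in range(n):
        # `+=` on counter.value is a read-modify-write, so two processes
        # could read the same old value and lose an update; holding the
        # Value's own lock makes each increment atomic.
        with counter.get_lock():
            counter.value += 1


def count_in_parallel(n_procs=4, n_per_proc=10_000):
    # Value("i", 0) is a C int in shared memory, wrapped with a lock.
    counter = Value("i", 0)
    procs = [Process(target=add_many, args=(counter, n_per_proc))
             for _ in range(n_procs)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return counter.value


if __name__ == "__main__":
    print(count_in_parallel())  # → 40000
```

Taking the lock on every increment is deliberately simple; for heavy workloads it is usually cheaper to let each process accumulate locally and combine the totals once at the end.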