Python, Subprocesses, Pipes and Doubts
24 Dec 2013
An expression of the doubts and fears I experienced when I once tried to implement pipes between processes in Python, and a proposal for another approach.
1 Doubts
The official documentation explains how to implement a shell pipeline with the subprocess module. After reading the example, I asked myself a few questions:
- In this example, at no point in time does the parent process wait (with Popen.wait() or Popen.communicate()) for p1 to finish, causing dead child processes. Was the example simplified to make it easier to read, and is the developer supposed to add the necessary code to avoid dead child processes?
- I feel uncomfortable about closing p1.stdout when I've just previously set it to PIPE. It may have to do with the SIGPIPE signal, whose effect I presumably do not fully comprehend.
- I'm afraid of what could happen if p2 writes a lot of data to stdout, what with p2.communicate() buffering it all up. Trying it out, I saw that this indeed buffers up a lot of memory.
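To make the first and third doubts concrete, here is a minimal sketch of the documented pattern with an explicit p1.wait() added at the end. The two child commands are hypothetical stand-ins (small Python one-liners run via sys.executable), not the commands from the documentation, and whether the extra wait is what the documentation intends is exactly the open question above.

```python
import sys
from subprocess import Popen, PIPE

# Hypothetical stand-ins for the two pipeline stages: the producer
# prints three lines, the consumer keeps those containing "alpha".
producer = [sys.executable, "-c",
            "print('alpha'); print('beta'); print('alphabet')"]
consumer = [sys.executable, "-c",
            "import sys\n"
            "for line in sys.stdin:\n"
            "    if 'alpha' in line:\n"
            "        sys.stdout.write(line)"]

p1 = Popen(producer, stdout=PIPE)
p2 = Popen(consumer, stdin=p1.stdout, stdout=PIPE)

# Close the parent's copy of the read end, so that p1 can receive
# SIGPIPE if p2 exits early.
p1.stdout.close()

# communicate() reads all of p2's output into memory before returning.
output = p2.communicate()[0]

# Added here, not taken from the documentation: reap p1 explicitly
# so that it does not linger as a dead child.
p1.wait()
```

Note that output holds everything p2 wrote, which is the memory concern from the third point.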
2 Alternative Approach
For a tool I once wrote and use extensively with large amounts of data, I took a different approach. This is the example of a tar process piping its archive into gpg, which then encrypts it:
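from subprocess import Popen, PIPE

# tarargs and gpgargs are the argument lists for tar and gpg (defined elsewhere)
# Run tar, with its output going into a pipe
tarproc = Popen(tarargs, stdout=PIPE)
# Run gpg, reading from the other end of that pipe
gpgproc = Popen(gpgargs, stdin=tarproc.stdout)
# Manage processes: wait for gpg first, then tar
gpgproc.communicate()
tarproc.communicate()

Following the points I noted earlier on: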
- I call Popen.communicate() for both gpgproc and tarproc, to avoid dead children.
- I don't do anything about SIGPIPEs. I don't particularly care about gpgproc exiting before tarproc (which it shouldn't do anyway).
- Note how the order in which you call Popen.communicate() is important: if you call tarproc.communicate() first, it will buffer up everything that tarproc writes to stdout, which you don't want if it's a lot of data. On the other hand, if you call tarproc.communicate() last, as you should, there will be no data left to buffer up, since it has been consumed by gpgproc (which doesn't have stdout=PIPE). So you'll be fine.
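The same approach can be tried end to end without tar or gpg. The following is only a sketch under assumptions: the two children are hypothetical stand-ins (a Python one-liner that prints lines, and another that upper-cases its input), the output goes to a temporary file, and the return code check at the end is an addition that is not part of the original tool.

```python
import os
import sys
import tempfile
from subprocess import Popen, PIPE

# Hypothetical stand-ins for tar and gpg.
producer_args = [sys.executable, "-c",
                 "for i in range(1000): print('line', i)"]
consumer_args = [sys.executable, "-c",
                 "import sys\n"
                 "for line in sys.stdin:\n"
                 "    sys.stdout.write(line.upper())"]

out_path = os.path.join(tempfile.mkdtemp(), "out.txt")
with open(out_path, "wb") as out:
    producer = Popen(producer_args, stdout=PIPE)
    # The consumer writes straight to the file (no stdout=PIPE),
    # so its output is never buffered in the parent.
    consumer = Popen(consumer_args, stdin=producer.stdout, stdout=out)
    # Wait for the consumer first; once it is done, the pipe is drained.
    consumer.communicate()
    # Now the producer can be reaped with nothing left to buffer.
    producer.communicate()

# Addition for illustration: fail loudly if either stage failed.
if producer.returncode != 0 or consumer.returncode != 0:
    raise RuntimeError("pipeline failed")
```

Swapping the two communicate() calls would make the parent compete with the consumer for the producer's output, which is the ordering problem described in the last point.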
