Python, Subprocesses, Pipes and Doubts
24 Dec 2013
An expression of the doubts and fears I experienced when I once tried to implement pipes between processes in Python, and a proposal for another approach.
1 Doubts
The official documentation explains how to implement a shell pipeline with the subprocess
module. After reading the example, I asked myself a few questions:
- In this example, at no point does the parent process wait (with Popen.wait() or Popen.communicate()) for p1 to finish, which leaves dead child processes behind. Was the example simplified to make it easier to read, and is the developer supposed to add the code needed to avoid dead child processes?
- I feel uncomfortable about closing p1.stdout when I've just set it to PIPE. It may have to do with the SIGPIPE signal, whose effect I presumably do not fully comprehend.
- I'm afraid of what could happen if p2 writes a lot of data to stdout, what with p2.communicate() buffering it all up. Trying it out, I saw that this does indeed buffer a lot of data in memory.
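To make the first two doubts concrete, here is a minimal sketch that follows the shape of the documented pattern, with one addition: an explicit p1.wait() so that the parent reaps the first child. The two commands are hypothetical stand-ins (small Python one-liners run through sys.executable), not the commands from the documentation, and run_pipeline is a name made up for this illustration.

```python
import sys
from subprocess import Popen, PIPE

# Hypothetical stand-ins for the two commands of the pipeline.
PRODUCER = [sys.executable, "-c", "print('alpha'); print('beta')"]
CONSUMER = [sys.executable, "-c",
            "import sys; sys.stdout.write(sys.stdin.read().upper())"]


def run_pipeline():
    p1 = Popen(PRODUCER, stdout=PIPE)
    p2 = Popen(CONSUMER, stdin=p1.stdout, stdout=PIPE)
    # Close only the parent's copy of the read end; p2 keeps its own copy.
    p1.stdout.close()
    # Collects all of p2's output in memory and waits for p2 to exit.
    output = p2.communicate()[0]
    # Addition to the documented pattern: reap p1 as well.
    p1.wait()
    return output, p1.returncode, p2.returncode
```

Closing p1.stdout in the parent leaves p2 as the only reader of the pipe, so if p2 exits early, p1 gets SIGPIPE (or a broken-pipe error) instead of blocking on a pipe that nobody reads. The third doubt still applies: p2.communicate() holds the whole output in memory.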
2 Alternative Approach
For a tool that I once wrote and use extensively with large amounts of data, I took a different approach. This is the example of a tar process piping tarred-up data into gpg, which then encrypts it:
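from subprocess import Popen, PIPE

# tarargs and gpgargs are the argument lists for tar and gpg, defined elsewhere
# Run tar
tarproc = Popen(tarargs, stdout=PIPE)
# Run gpg, reading directly from tar's stdout
gpgproc = Popen(gpgargs, stdin=tarproc.stdout)
# Manage processes: wait for gpg first, then tar
gpgproc.communicate()
tarproc.communicate()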
Following the points I noted earlier on:
- I call Popen.communicate() for both gpgproc and tarproc, to avoid dead children.
- I don't do anything about SIGPIPE. I don't particularly care about gpgproc exiting before tarproc (which it shouldn't do anyway).
- Note that the order in which you call Popen.communicate() is important: if you call tarproc.communicate() first, it will buffer up everything that tarproc writes to stdout, which you don't want if it's a lot of data. If, on the other hand, you call tarproc.communicate() last, as you should, there will be no data left to buffer up, since it has already been consumed by gpgproc (which doesn't have stdout=PIPE). So you'll be fine.
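The ordering argument can be checked with a small self-contained sketch. It assumes hypothetical stand-ins for the real commands: a Python one-liner that writes n_bytes to stdout plays the role of tar, and another that copies stdin to a file plays the role of gpg. The names demo, PRODUCER_CODE and CONSUMER_CODE are made up for this illustration.

```python
import os
import sys
import tempfile
from subprocess import Popen, PIPE

# Stand-in for tar: writes argv[1] bytes to stdout.
PRODUCER_CODE = "import sys; sys.stdout.write('x' * int(sys.argv[1]))"
# Stand-in for gpg: copies stdin to the file named in argv[1].
CONSUMER_CODE = (
    "import shutil, sys\n"
    "with open(sys.argv[1], 'wb') as f:\n"
    "    shutil.copyfileobj(sys.stdin.buffer, f)\n"
)


def demo(n_bytes):
    fd, out_path = tempfile.mkstemp()
    os.close(fd)
    try:
        producer = Popen([sys.executable, "-c", PRODUCER_CODE, str(n_bytes)],
                         stdout=PIPE)
        consumer = Popen([sys.executable, "-c", CONSUMER_CODE, out_path],
                         stdin=producer.stdout)
        # First: wait for the consumer, which drains the pipe as it runs.
        consumer.communicate()
        # Last: reap the producer; the pipe is already empty at this point.
        leftover = producer.communicate()[0]
        return os.path.getsize(out_path), leftover
    finally:
        os.remove(out_path)
```

Calling demo(200000) should report that all 200000 bytes reached the file while the parent's final communicate() call had nothing left to buffer.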