Jérôme Belleman

Python, Subprocesses, Pipes and Doubts

24 Dec 2013

An expression of the doubts and fears I experienced when I once tried to implement pipes between processes in Python, and a proposal for another approach.

1 Doubts

The official documentation explains how to implement a shell pipeline with the subprocess module. After reading the example, I asked myself a few questions:

  1. In this example, at no point does the parent process wait (with Popen.wait() or Popen.communicate()) for p1 to finish, which leaves dead child processes (zombies) behind. Was the example simplified to make it easier to read, and is the developer supposed to add the necessary code to avoid dead child processes?
  2. I feel uncomfortable about closing p1.stdout when I've only just set it to PIPE. It may have to do with the SIGPIPE signal, whose effect I probably do not fully understand.
  3. I'm afraid of what could happen if p2 writes a lot of data to stdout, what with p2.communicate() buffering it all up. Trying it out, I saw that this does indeed hold a lot of data in memory.
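
For reference, the pattern in question can be sketched as follows. This is only an illustration under a few assumptions: the two commands are stand-ins (small Python one-liners started through sys.executable so that the sketch runs anywhere), run_pipeline is a hypothetical helper name, and the final p1.wait() is not part of the documented example. It is added here as one possible way to reap the first child.

```python
import sys
from subprocess import PIPE, Popen

# Stand-in commands (assumptions, not from the documentation): a producer
# printing three lines, and a consumer keeping the lines that contain "b".
producer = [sys.executable, "-c", "print('abc'); print('xyz'); print('bcd')"]
consumer = [
    sys.executable,
    "-c",
    "import sys\n"
    "for line in sys.stdin:\n"
    "    if 'b' in line:\n"
    "        sys.stdout.write(line)\n",
]


def run_pipeline(producer, consumer):
    """Run producer | consumer and return (output, producer rc, consumer rc)."""
    p1 = Popen(producer, stdout=PIPE)
    p2 = Popen(consumer, stdin=p1.stdout, stdout=PIPE)
    # Close the parent's copy of the pipe so that p1 can receive SIGPIPE
    # if p2 exits before p1 has finished writing.
    p1.stdout.close()
    # Collects everything p2 writes in memory before returning.
    output = p2.communicate()[0]
    # Added for this sketch: wait for p1 so that it gets reaped.
    p1.wait()
    return output, p1.returncode, p2.returncode


output, rc1, rc2 = run_pipeline(producer, consumer)
```

The sketch still holds all of the consumer's output in memory, so it does nothing about the third point.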

2 Alternative Approach

For a tool I once wrote and use extensively with large amounts of data, I took a different approach. Here is the example of a tar process piping its output into gpg, which then encrypts it:

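from subprocess import PIPE, Popen

# tarargs and gpgargs are the argument lists for tar and gpg, built elsewhere

# Run tar, with its output going into a pipe
tarproc = Popen(tarargs, stdout=PIPE)

# Run gpg, reading from the other end of that pipe
gpgproc = Popen(gpgargs, stdin=tarproc.stdout)

# Manage processes: wait for gpg first, then for tar
gpgproc.communicate()
tarproc.communicate()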

Following the points I noted earlier on:

  1. I call Popen.communicate() for both gpgproc and tarproc, to avoid dead children.
  2. I don't do anything about SIGPIPEs. I don't particularly care about gpgproc exiting before tarproc (which it shouldn't do anyway).
  3. Note that the order in which you call Popen.communicate() matters. If you call tarproc.communicate() first, it will buffer up everything tarproc writes to stdout, which you don't want if it's a lot of data. If you call tarproc.communicate() last, as you should, there will be no data left to buffer, since gpgproc (which doesn't have stdout=PIPE) has already consumed it. So you'll be fine.
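
To try this out without tar or gpg at hand, the same arrangement can be sketched with stand-in commands. Everything specific here is an assumption made for illustration: the producer plays the part of tar and writes about 1 MB, the consumer plays the part of gpg and merely copies its input to its output, stream_to_file is a hypothetical helper name, and the consumer's output goes to a temporary file rather than to a pipe.

```python
import os
import sys
import tempfile
from subprocess import PIPE, Popen

# Stand-ins (assumptions): the producer writes 1,000,000 bytes, the
# consumer copies stdin to stdout unchanged.
producer = [sys.executable, "-c", "import sys; sys.stdout.write('x' * 1000000)"]
consumer = [
    sys.executable,
    "-c",
    "import shutil, sys; "
    "shutil.copyfileobj(sys.stdin.buffer, sys.stdout.buffer)",
]


def stream_to_file(producer, consumer, path):
    """Run producer | consumer > path, calling communicate() on the consumer first."""
    with open(path, "wb") as out:
        first = Popen(producer, stdout=PIPE)
        # The consumer writes to a file, not to PIPE, so the parent
        # has nothing of the consumer's to buffer.
        second = Popen(consumer, stdin=first.stdout, stdout=out)
        # Returns once the consumer has read everything and exited.
        second.communicate()
        # Called last: the pipe is already drained, so this should read
        # nothing and simply reap the producer.
        leftover = first.communicate()[0]
    return first.returncode, second.returncode, leftover


fd, path = tempfile.mkstemp()
os.close(fd)
rc_first, rc_second, leftover = stream_to_file(producer, consumer, path)
size = os.path.getsize(path)
os.remove(path)
```

If leftover comes back empty while size matches what the producer wrote, the data went through the consumer rather than through the parent's memory.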
