Naming Paths in Code

16 Jul 2017

1 The Problem
2 Several Considerations
3 Variable Naming Rules
4 Conclusion

A proposal for conventions that clearly describe the different parts of file paths in code, while avoiding name clashes.

1 The Problem

When writing programs that shuffle files around, I often need variables that refer to files, directories, their parent directories, their absolute paths, or their paths relative to some custom root directory, ... Before I know it, I end up with a large number of variables, have to be careful to avoid name clashes, and need to think a good while before giving each one a meaningful name. The problem gets worse when I can't be bothered to organise my code properly and split it into functions of sensible length, but even when I do, I just run into the same problem with function names. Could more sophisticated IDEs help? To some extent, maybe, but how many other problems would they introduce in the process? In the end, I decided that there's nothing like a bit of diligence when writing code, and so I tried to come up with some guidelines for naming paths in code.

2 Several Considerations

While it very much depends on the context of the software you write, there are some general considerations that you may be able to identify:

Are you dealing with a file or a directory? Some say directories are a kind of file, but for our purpose here, let's keep it simple and say that files are regular files, e.g. in the sense of the UNIX find command's -type f test.
Are you dealing with an input or output file/directory?
Are you dealing with the absolute path of the file/directory, its path relative to some other directory, or only its name?
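As a minimal sketch, the questions above might be answered for a single path in Python using pathlib. The classify helper is hypothetical. Whether a path is an input or an output is a role it plays in your program and cannot be read off the path itself, so that question is left to the calling code.

```python
from pathlib import Path


def classify(p: Path) -> dict:
    """Answer the file/dir and absolute/relative/name questions for one path.

    Hypothetical helper: input vs output is a role in the program,
    so it is deliberately not reported here.
    """
    return {
        # is_file()/is_dir() follow symlinks, unlike `find -type f`,
        # and both return False for paths that do not exist.
        "is_file": p.is_file(),
        "is_dir": p.is_dir(),
        "is_absolute": p.is_absolute(),
        # Just the final component, e.g. "in.txt".
        "name": p.name,
    }


print(classify(Path("no/such/in.txt")))
# {'is_file': False, 'is_dir': False, 'is_absolute': False, 'name': 'in.txt'}
```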

This is a glimpse of the sort of considerations you might want to keep in mind when working out what kind of names your variables should bear.

3 Variable Naming Rules

Once these considerations are clearly defined, coming up with a set of rules is fairly straightforward. Following the considerations listed above:

If you're dealing with a file, your variable name should contain the word file somewhere. If you're dealing with a directory, it should contain dir instead. This can easily be taken further, of course, with symlink, hardlink, socket, pipe, ...
There's certainly the problem that, depending on the language you use, some useful words might already be taken. In Python, for instance, dir is a built-in function, and file was a built-in type in Python 2 (it no longer is in Python 3). Using them as variable names shadows the built-ins, and more often than I'd like, these are exactly the names I would gladly have christened my variables with. But applying the next rules naturally solves this by adding more particles to the names.
If you're dealing with an input file or directory, call it infile or indir. Likewise, use outfile and outdir for output files and directories. You see where I'm going with this.
If you're dealing with an absolute path, you could use the word root. If you're dealing with a relative path, you could use the word path. If you're dealing with just the name of a file or directory, you could use the word name.
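To check which candidate names actually collide in Python, a small hypothetical helper can test a name against the standard keyword and builtins modules. This is only a sketch. It covers keywords and built-ins, not names you import yourself.

```python
import builtins
import keyword


def name_problems(name: str) -> list:
    """List reasons a candidate variable name is a bad idea in Python.

    Hypothetical helper: only checks reserved keywords and built-in names.
    """
    problems = []
    if keyword.iskeyword(name):
        # Cannot be used as a variable name at all.
        problems.append("reserved keyword")
    elif hasattr(builtins, name):
        # Legal, but hides the built-in for the rest of the scope.
        problems.append("shadows a built-in")
    return problems


for candidate in ["dir", "file", "in", "infile", "indir", "outdir"]:
    print(candidate, name_problems(candidate))
```

Under Python 3, dir still shadows a built-in while file no longer does. The bare in particle, on the other hand, is a keyword, which is one more reason to always combine it with a type particle.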

In particular, it's desirable to leave no room for ambiguity. For instance, instead of saying just indir, be explicit about whether you're referring to the indir's absolute path (indirroot), the indir's name (indirname), or the indir's relative path (indirpath).
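Here is a minimal sketch of this disambiguation using os.path. It assumes POSIX paths and a custom root directory passed in as start. Both the function and its parameter names are hypothetical. The ambiguous indir goes in, and the three unambiguous names come out.

```python
import os


def indir_components(indir: str, start: str):
    """Return (indirroot, indirpath, indirname) for one input directory.

    `start` is the custom root that relative paths are measured from.
    Hypothetical helper; assumes POSIX-style paths.
    """
    # "root": absolute, normalised path. abspath() also drops a trailing
    # slash, which would otherwise make basename() return "".
    indirroot = os.path.abspath(indir)
    # "path": relative to the custom root, not to the current directory.
    indirpath = os.path.relpath(indirroot, start)
    # "name": just the last component.
    indirname = os.path.basename(indirroot)
    return indirroot, indirpath, indirname


print(indir_components("/data/project/raw/2017/", "/data/project"))
# ('/data/project/raw/2017', 'raw/2017', '2017')
```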

In that last case, you might arguably want to cast out the word path altogether. If you're like me, it invites confusion, because it's the word you naturally reach for in variable names when you don't give them much thought. In fact, why not just use inabsdir for absolute paths, inreldir for relative paths, and indirname for names? It's a bit counter-intuitive, however, that the name particle comes last while the abs and rel particles sit in the middle. Food for thought.

Of course, when you're working in a small function where only one variable refers to some kind of path, you might wonder whether troubling yourself with rules like these is at all worthwhile. But then, before you know it, small functions become large ones, or you might want to move the code into a larger block, or you might want to know what the path refers to in the context the function is called from. So maybe this sort of diligence really does apply everywhere.

4 Conclusion

To summarise the above proposal, the following particles would be in order:

Type: file, dir, symlink, hardlink, socket, pipe
Purpose: in, out
Component: root, path, name

It appears that the longest variable names in this case would be outhardlinkroot, outhardlinkpath and outhardlinkname. That's 15 characters, about 19% of an 80-column line, if you care about such things.
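One way to check the length of the longest name is to enumerate every combination of the particles. This sketch assumes that names are built in the order purpose + type + component, as in indirroot, and that only the particles listed above are used.

```python
from itertools import product

# Particles from the summary above, combined as purpose + type + component.
PURPOSES = ["in", "out"]
TYPES = ["file", "dir", "symlink", "hardlink", "socket", "pipe"]
COMPONENTS = ["root", "path", "name"]

names = ["".join(parts) for parts in product(PURPOSES, TYPES, COMPONENTS)]
longest = max(len(n) for n in names)
longest_names = [n for n in names if len(n) == longest]

print(len(names), longest, longest_names)
# 36 15 ['outhardlinkroot', 'outhardlinkpath', 'outhardlinkname']
print(f"{longest / 80:.0%} of an 80-column line")
# 19% of an 80-column line
```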

Again, this is just one example of how you could organise your path-naming rules. To come up with your own, it might be worth looking for inspiration in the terminology your programming language uses. For instance, looking at the reference for Python's os.path module, I see that it uses abspath, basename, dirname, normpath, realpath and relpath, which must have influenced me when I came up with my own rules.
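The following sketch shows how some os.path functions might map onto the particles, using hypothetical POSIX paths. Note that os.path.dirname returns the path of the parent directory rather than a directory's name, so in this scheme its result is a root, not a name. The parentdir particle and the projectdirroot variable below are hypothetical extensions and are not part of the proposal.

```python
import os

# Hypothetical POSIX paths.
projectdirroot = "/data/project"
infileroot = "/data/project/raw/2017/scan.tif"

infilename = os.path.basename(infileroot)              # 'scan.tif'
infilepath = os.path.relpath(infileroot, projectdirroot)  # 'raw/2017/scan.tif'

# Careful: os.path.dirname() returns the *path* of the parent directory,
# not a directory's name, so it yields a root here.
inparentdirroot = os.path.dirname(infileroot)          # '/data/project/raw/2017'
inparentdirname = os.path.basename(inparentdirroot)    # '2017'

# abspath() and realpath() both return absolute paths (roots);
# realpath() additionally resolves symlinks.
```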