LSF Exit Codes
Jobs run by Platform LSF return exit codes, much like UNIX processes do. Things become confusing when you have to deal with both UNIX and LSF exit codes.
You have to deal with both types because a batch scheduling system ultimately runs UNIX processes. I thought I'd describe what's what and add to the confusion even more. Note that the terminology used here comes from what we get to see in the accounting files, as recorded by the master in the lsb.acct files.
1 The Exit Info
There's the exitInfo, which is the "Job termination reason, mapped to corresponding termination keyword displayed by bacct", to quote LSF. Crucially, there's nothing about the UNIX exit code in it.
2 The Exit Status
There's the exitStatus, which is the "UNIX exit status of the job", to quote LSF. Crucially, there's nothing about the LSF exit code in it. In particular, it's wrong to think that the error return code reported by LSF consists of the LSF error code and the user job error code. But there might be a hint about the termination signal. Read on.
3 Termination by Signal
I'm not sure whether processes terminated by a signal have an exit code. Some say they don't, I think they do, but that's beside the point anyway. What matters is that if there is a termination signal, it will be encoded in the exitStatus.
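As a rough illustration of what "encoded" means on the UNIX side, here is a minimal Python sketch that kills a child process with a signal and decodes the raw status word returned by waitpid. The helper names describe_wait_status and status_of_self_killed_child are made up for this example. Whether the exitStatus field in lsb.acct uses exactly this wait status layout is an assumption, not something established above.

```python
import os
import signal


def describe_wait_status(status: int) -> str:
    """Describe a raw wait(2)-style status word (hypothetical helper)."""
    if os.WIFSIGNALED(status):
        # The process was killed by a signal; the status carries the signal number.
        signum = os.WTERMSIG(status)
        return f"terminated by signal {signum} ({signal.Signals(signum).name})"
    if os.WIFEXITED(status):
        # The process called exit(); the status carries the exit code.
        return f"exited with code {os.WEXITSTATUS(status)}"
    return f"unrecognised status {status}"


def status_of_self_killed_child(signum: int) -> int:
    """Fork a child that sends itself `signum`; return its raw wait status."""
    pid = os.fork()
    if pid == 0:
        os.kill(os.getpid(), signum)
        os._exit(1)  # only reached if the signal did not terminate the child
    _, raw_status = os.waitpid(pid, 0)
    return raw_status


if __name__ == "__main__":
    print(describe_wait_status(status_of_self_killed_child(signal.SIGTERM)))
```

If the assumption holds, the same describe_wait_status helper could be applied to an exitStatus value read from an accounting record. If it does not, the sketch still shows how a plain UNIX wait status distinguishes a normal exit from termination by signal.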