Friday, January 26, 2007

Having some 'tee' and smoking 'pipe'

Okay - I am digressing here. It is about Unix shell, not ruby.

Building a data warehouse means you are shuffling data from one depot to another. That means a lot of scripting. [ No - I am not going to use ETL - I refuse to use a product where mouse clicking is the only means of directing what needs to be done ]

These batch jobs usually run at night when I am getting some 'zzz' or 'vodka' :) So they have to log what they do and be bulletproof. You want your script to generate a log file automatically and keep it.

Wait - if I redirect my output to a file, I won't see anything on the screen while running the script from the shell. This is very annoying when you are testing.

So why don't you have some 'tee'?

Straight from our 'man', tee 'reads from standard input and writes to standard output and files'. [please google 'tee' for more info ]

Here is what I do with my script


#!/bin/bash

LOGFILE=/a/log/file

(

cmd1

cmd2

cmd3

) 2>&1 | tee -a "$LOGFILE"

Woo - now we are piping to tee. Basically, we are redirecting both STDOUT and STDERR of the commands to 'tee'. 'tee' dutifully copies its STDIN to both $LOGFILE and STDOUT.

This is useful when you run the script from a terminal. You can watch the output while it is running, and the log file is still created and saved automatically.
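
To see it for yourself, here is a minimal sketch. The temp file from mktemp and the two echo lines are my stand-ins for a real log path and real commands; both lines should show up on the terminal and in the log.

```shell
#!/bin/bash
# Hypothetical demo: LOGFILE is a throwaway temp file, not a real log path.
LOGFILE=$(mktemp)

(
  echo "to stdout"
  echo "to stderr" >&2
) 2>&1 | tee -a "$LOGFILE"

# Both lines were shown on the terminal above; count how many reached the log.
logged_lines=$(grep -c 'to std' "$LOGFILE")
echo "lines in log: $logged_lines"

rm -f "$LOGFILE"
```

Because of 2>&1, the STDERR line travels down the same pipe as the STDOUT line, so 'tee' records both.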

Alas - we think we are done - no, my friend.

There is an unfortunate side effect of using "|". STATUS of execution.

$?
You should always check $? after executing the command.

For example,

execute_some_program

if [ $? -ne 0 ]; then
echo "some error message"
exit 1
fi
However, when you use "|", you only get the exit status of the last command in the chain.

For example
cmd1 | cmd2 | cmd3
$? is that of "cmd3"
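
A quick way to convince yourself, assuming bash with its default options (in particular, pipefail is not turned on). 'false' always exits with 1 and 'true' always exits with 0, so they make handy stand-ins for a failing and a succeeding command.

```shell
#!/bin/bash
# The failing command comes first, but $? only reports the last one.
false | true
status=$?
echo "false | true -> $status"

# Swap the order and the failure becomes visible again.
true | false
status_reversed=$?
echo "true | false -> $status_reversed"
```

The first pipeline reports 0 even though 'false' failed; the second reports 1.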

So what does this have to do with our 'tee' and script?

Remember, all the hard work is done before 'tee'. If there is any error, you want your script to exit with a non-zero value to indicate there is a problem. However, our script will return the status of 'tee' instead.

Let's translate this. Even though there is a problem and the commands dutifully exit with non-zero, tee will hijack the status code and replace it with its own.

No, this is not a bug. $? is a single variable. The Unix creators had to decide which exit status gets reported, and they picked the last one.

But there is a trick. Actually, I have to confess that we said 'Unix' shell at the beginning. It is actually 'Bash'.

Bash decided that this is common enough and created another variable - $PIPESTATUS. This is an array "containing a list of exit status values from the processes in the most-recently-executed foreground pipeline (which may contain only a single command)".
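
Here is a small sketch of the array in action. The exit code 3 is an arbitrary value I picked for illustration, and 'statuses' is just my name for the copy.

```shell
#!/bin/bash
# First command exits with 3 (arbitrary), second one succeeds.
(exit 3) | true

# Copy the array right away: the very next command overwrites PIPESTATUS.
statuses=("${PIPESTATUS[@]}")

echo "first: ${statuses[0]}, second: ${statuses[1]}"
```

Index 0 is the leftmost command of the pipeline, index 1 the next one, and so on.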

So we can improve our program by doing this.

#!/bin/bash

LOGFILE=/a/log/file

(

cmd1

cmd2

cmd3

) 2>&1 | tee -a "$LOGFILE"

exit ${PIPESTATUS[0]}
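
One more thing to watch: the status of the ( ... ) group is that of the last command that ran inside it, so a failing cmd1 can still hide behind a successful cmd3. A possible way around it, sketched here under my own assumptions (a temp log file, 'false' standing in for a failing command), is 'set -e' inside the parentheses so the group stops at the first failure.

```shell
#!/bin/bash
# Hypothetical demo: temp log file instead of a real log path.
LOGFILE=$(mktemp)

(
  set -e            # leave the group at the first failing command
  echo "step 1"
  false             # stands in for a failing command
  echo "step 3"     # never runs
) 2>&1 | tee -a "$LOGFILE"

# Grab the status of the group before any other command overwrites it.
status=${PIPESTATUS[0]}
echo "group status: $status"

rm -f "$LOGFILE"
```

In a real script you would finish with exit "$status" instead of the echo.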
So now you have had some tee and smoked a pipe.

Enjoy.
