[last updated - 29 July 2003]
Unix has a file comparison utility called diff. It is extremely useful to a clinical SAS programmer who wants to compare table or listing output. Comparing one output with another using diff is simple, and you can look up how to do it. It becomes even more useful when you can compare an old set of outputs with a new set. Suppose you have downloaded new data and want to know what changed in the tables or listings as a result. It would be great if you could move all the old outputs to a subdirectory, rerun, and then compare identically named outputs in the subdirectory with those in the parent directory. Of course, each new run puts a line on every page saying who ran it and when, so every page will show a difference you are not interested in. It would be better if that difference were ignored and only the actual results were compared.
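For anyone who has not used diff on a single pair of files, here is a minimal sketch. The file names and their contents are made up for illustration, and it assumes the mktemp command is available on your system to create a scratch directory.

```shell
# Scratch directory holding two hypothetical listings (made-up content).
workdir=$(mktemp -d)
printf 'Treatment A  12\nTreatment B  15\n' > "$workdir/old.lst"
printf 'Treatment A  12\nTreatment B  17\n' > "$workdir/new.lst"

# diff exits with 0 when the files match, 1 when they differ and a
# higher value on error. Lines starting with "<" come from the first
# file and lines starting with ">" come from the second.
diff_status=0
diff_output=$(diff "$workdir/old.lst" "$workdir/new.lst") || diff_status=$?

printf '%s\n' "$diff_output"
echo "diff exit status: $diff_status"

rm -rf "$workdir"
```

Only the second line differs between the two files, so diff reports that one line from each file and nothing else.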
This is easy to do once you understand how to handle loops in scripts and "pattern matching". You can loop through the files in your subdirectory, compare each with the file of the same name in the parent directory, and use what you know about pattern matching to mask the unimportant differences. If a utility is useful on one file, then it can be made to work on many files with little effort beyond a bit of script writing. If you have studied all the examples up to this point and done the exercises, you will have concluded that script writing is not difficult. What you won't have seen yet is an example of one useful utility being made to work on many files, with the extra power that puts at your disposal. This page gives a practical application of exactly that.
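The looping idea on its own, before any masking is added, might look like the sketch below. It is only an illustration: the directory layout and file contents are invented, mktemp is assumed to be available, and cmp -s is used instead of diff so that each file gives a one-line verdict.

```shell
# Scratch layout: new outputs in the parent, old outputs in "old".
demo=$(mktemp -d)
mkdir "$demo/old"
printf 'A\n' > "$demo/old/t1.lst" ; printf 'A\n' > "$demo/t1.lst"
printf 'B\n' > "$demo/old/t2.lst" ; printf 'C\n' > "$demo/t2.lst"

# Make the subdirectory current, then visit every file matching the
# pattern and compare it with the same-named file one level up.
# cmp -s is silent and succeeds only when the two files are identical,
# so a missing parent file is also reported as "differs".
loop_report=$(
  cd "$demo/old" || exit 1
  for file in *.lst ; do
    if cmp -s "$file" "../$file" ; then
      echo "$file: identical"
    else
      echo "$file: differs"
    fi
  done
)

printf '%s\n' "$loop_report"
rm -rf "$demo"
```

The same loop shape works with any command that takes a file name, which is the point being made above.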
This utility is called ddiff, rather than diff, because it compares whole directories: it is a directory difference rather than just the difference between two files. The user makes the subdirectory the current directory and types ddiff followed by the file pattern of the outputs they want to compare with those in the parent directory. For each file, temporary copies are made in the home directory with superficial information such as the run date removed, and the like-named files are then compared. Output, as always, goes to the terminal window, and it is up to the user to redirect it.
Here we have to know pattern matching well. In the example below, I am assuming people use my pagexofy utility to add "Page x of Y" labels to output and that all the transient information, such as the date and time of the run and who ran it, is on that line. I mask that line using sed to make a substitution, and I explain the pattern I have used after the script. But first, here is the utility:
#!/bin/sh
# Script     : ddiff
# Version    : 1.0
# Author     : Roland Rashleigh-Berry
# Date       : 29 July 2003
# Contact    : roland@rashleigh-berry.fsnet.co.uk
# Purpose    : To compare all the outputs in a subdirectory with identically
#              named outputs in a parent directory.
# SubScripts : none
# Notes      : Refer to documentation on diff. Note that this utility has
#              pattern matching and substitution in it to mask lines that will
#              always change for each run. You need to amend this for your own
#              standard outputs.
# Usage      : ddiff *.lst
#
#================================================================================
# PARAMETERS:
#-pos- -------------------------------description--------------------------------
#  1   file or list of files to compare
#================================================================================
# AMENDMENT HISTORY:
# init --date-- mod-id ----------------------description-------------------------
#
#================================================================================
# This is public domain software. No guarantee as to suitability or accuracy is
# given or implied. User uses this code entirely at their own risk.
#================================================================================

if [ $# -lt 1 ] ; then
  echo "Usage: ddiff t*.lst" 1>&2
  exit 1
fi

for file in "$@" ; do
  echo ; echo ; echo
  echo '========================================================================='
  echo 'Comparing ' $file ': < = lower directory > = upper directory'
  echo '========================================================================='
  sed 's/.*Page .* of .*//' $file > $HOME/file1.tmp
  sed 's/.*Page .* of .*//' ../$file > $HOME/file2.tmp
  diff -b $HOME/file1.tmp $HOME/file2.tmp
done

rm -f $HOME/file1.tmp $HOME/file2.tmp
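If file names may contain spaces, or two people might run the comparison from the same account at once, the loop could be hardened along the lines sketched below. This is not the script above but a hypothetical variant: the function name ddiff_sketch is invented, mktemp is assumed to be available, and the check for a missing parent file is an addition. The demonstration listings are made up.

```shell
# ddiff_sketch: hypothetical variant of the ddiff loop, not the original.
ddiff_sketch() {
  # Private temporary directory so that concurrent runs do not clash.
  tmpdir=$(mktemp -d) || return 1
  for file in "$@" ; do
    # Addition: skip files that have no counterpart one level up.
    if [ ! -f "../$file" ] ; then
      echo "Skipping $file: not found in the parent directory" 1>&2
      continue
    fi
    echo "Comparing $file:  < = lower directory    > = upper directory"
    # Same masking as ddiff, with the file names quoted.
    sed 's/.*Page .* of .*//' "$file"    > "$tmpdir/file1.tmp"
    sed 's/.*Page .* of .*//' "../$file" > "$tmpdir/file2.tmp"
    # diff returns 1 when the files differ; that is not an error here.
    diff -b "$tmpdir/file1.tmp" "$tmpdir/file2.tmp" || true
  done
  rm -rf "$tmpdir"
}

# Demonstration: old output in "old", new output in the parent.
demo=$(mktemp -d)
mkdir "$demo/old"
printf 'Mean  5.1\nrun by abc 01JAN2003 Page 1 of 1\n' > "$demo/old/t1.lst"
printf 'Mean  5.3\nrun by xyz 29JUL2003 Page 1 of 1\n' > "$demo/t1.lst"

sketch_output=$(cd "$demo/old" && ddiff_sketch *.lst)
printf '%s\n' "$sketch_output"
rm -rf "$demo"
```

The "run by" lines differ between the two listings but are blanked before the comparison, so only the change in the mean is reported. With ddiff itself on your path, the equivalent call from inside the subdirectory would be something like ddiff *.lst > ../ddiff.out, where the output file name is only an example.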
Now to explain the pattern matching in the sed calls. The pattern ".*Page .* of .*" matches any number of characters (including none) up to the word "Page", then a space, then any number of characters, then a space, the word "of" and another space, and then any number of characters to the end of the line. Because ".*" is greedy and sits at both ends of the pattern, a match always covers the whole line, and sed replaces it with nothing, leaving an empty line in its place. The pattern is loose, but I am sure it will match my lines with "Page x of Y" on them and blank them out so that they play no part in the comparison. If the changeable line in your outputs is different, you will have to amend this utility to suit.
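To see the substitution at work without running a full comparison, you can feed a few lines through sed by hand. The sample lines below are invented. The second, stricter pattern is only a suggestion for outputs where ordinary text might contain "Page" followed later by "of"; it is not part of ddiff.

```shell
# Made-up sample: line 2 is a "Page x of Y" line, line 3 is ordinary
# text that happens to contain "Page " followed later by " of ".
sample='Table 1: Demographics
run by rrb on 29JUL2003 14:02   Page 1 of 12
Page layout of the CRF is described in the protocol
Mean age  34.2'

# The pattern used in ddiff: blanks any line containing "Page ",
# then " of " somewhere after it. Lines 2 and 3 both become empty.
masked=$(printf '%s\n' "$sample" | sed 's/.*Page .* of .*//')

# Hypothetical stricter pattern: requires digits on both sides of
# " of ", so only line 2 becomes empty.
masked_strict=$(printf '%s\n' "$sample" |
  sed 's/.*Page [0-9][0-9]* of [0-9][0-9]*.*//')

echo "--- ddiff pattern ---"
printf '%s\n' "$masked"
echo "--- stricter pattern ---"
printf '%s\n' "$masked_strict"
```

In both cases the matched line is still present in the output as an empty line, so line numbers reported by diff stay in step with the original files.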