dirtidy

[last updated - 29 July 2003]

A very important utility that I haven't introduced yet is the join utility. It's a bit like a merge in SAS, and it comes in useful when you are comparing lists of things. In this section I am going to develop a script that looks at the SAS programs in a directory and then checks the log and list outputs to see which of them do not belong to any SAS program (usually because a SAS program got renamed). This list will then be fed to the delsome utility, which prompts the user to decide whether to delete each file. You will need access to Unix while you read this, as you will have to try out some commands. Please make your current directory one that contains SAS programs with their associated logs and/or listings, and ideally one where some logs and listings have been left over from programs that have since been renamed.
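If you don't have a suitable directory to hand, you could make a practice one. Here is a minimal sketch, assuming a POSIX sh and a writable /tmp (or $TMPDIR). The directory and every file name in it are made up for illustration, with a few logs and listings deliberately left without a matching program. The files are empty because only their names matter here.

```sh
#!/bin/sh
# Practice directory with empty files; every name here is made up.
demo="${TMPDIR:-/tmp}/dirtidy_demo.$$"
mkdir -p "$demo" || exit 1
cd "$demo" || exit 1

# Programs and the outputs that belong to them
touch ae.sas ae.log ae.lst
touch demog.sas demog.log

# Leftovers from programs that were later renamed
touch old_ae.log old_ae.lst demog_v1.lst

echo "Practice directory: $demo"
ls
```

If you save this as a script and run it, the cd only affects the script itself, so cd to the directory it reports yourself. You can remove it with rm -r when you have finished.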

Firstly, get a list of SAS programs by typing in the command ls *.sas. If we are going to compare the programs with the logs and lists that are there, then we won't be able to join on names that still have extensions. We want the names without the .sas, .log and .lst at the end. There are different ways of doing this. We can use awk, tell it that the field separator character is a period, and ask it to print the first field. Try this:

ls *.sas | awk -F. '{print $1}'

That works just fine. But I'd like to introduce you to yet another very useful utility called sed. "sed" is short for stream editor. You'll probably never use it as a full editor, though; you are more likely to use its substitution facility. I want you to try this command, but I will warn you now that although the output looks correct, the command itself is not written correctly:

ls *.sas | sed 's/.sas//g'

The output LOOKS okay. The "s" before the first slash means "substitute": replace whatever matches the pattern between the first two slashes with what is between the last two slashes (nothing in this case). The "g" at the end means "global": replace every match on the line rather than only the first one. It appears to have worked by replacing ".sas" with nothing. But remember I told you that you had to become good at something called "pattern matching" if you are going to become a script writer? This is where we get into it a bit more deeply. There is actually something wrong with the ".sas" between the first two slashes if what you want to do is remove the ".sas" from the end of the filenames. Try copying this command and submitting it:

echo asashello worldysas | sed 's/.sas//g'

You get left with the message "hello world". What could be the matter? Well, I'll tell you straight off. These patterns are called "regular expressions". You should not, however, get them mixed up with the RX expressions in SAS. They are not the same thing (and I am somewhat disappointed that SAS did their own thing with RX). A period in a regular expression is not a literal period. It stands for "any single character". So in "asashello", ".sas" matches "asas", and in "worldysas" it matches "ysas". That is why we get "hello world" coming out. But sometimes we really do mean a period rather than any character, and to say so we put an escape character in front of it. The escape character is a backslash (a slash that leans the other way). Try this:

echo asashello worldysas | sed 's/\.sas//'

This time it works correctly: no substitution has been made, because the text contains no literal ".sas". But let's check this command:

echo .sashello world.sas | sed 's/\.sas//g'

It has matched the ".sas" at the start of ".sashello", but we only want to match it at the end of the line. To do this we add a trailing dollar sign, which anchors the pattern to the end of the line, like this:

echo .sashello world.sas | sed 's/\.sas$//g'

You will see that this works as we expected. So the correct command for dropping the ".sas" from the end of each filename in a list takes the form above. Try it again with the ls command:

ls *.sas | sed 's/\.sas$//g'
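To see why the escaped, anchored form is the one to keep, here is a small sketch that runs the earlier awk command, the unescaped sed command and the anchored sed command over the same names. It uses printf in place of ls so you don't need real files. The names t_demog.v2.sas and x.sashelp.sas are hypothetical, chosen because they contain an extra period or an extra "sas".

```sh
#!/bin/sh
# Hypothetical file names, one per line (printf stands in for ls *.sas)
names='ae.sas
t_demog.v2.sas
x.sashelp.sas'

# 1. awk -F. keeps only the text before the FIRST period
printf '%s\n' "$names" | awk -F. '{print $1}'
# -> ae, t_demog, x (one per line)

# 2. Unescaped: "." matches any character, anywhere on the line
printf '%s\n' "$names" | sed 's/.sas//g'
# -> ae, t_demog.v2, xhelp (one per line)

# 3. Escaped and anchored: only a literal ".sas" at the end of the line
printf '%s\n' "$names" | sed 's/\.sas$//g'
# -> ae, t_demog.v2, x.sashelp (one per line)
```

Only the third form leaves every name intact apart from its extension.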

Hopefully you are satisfied that this is working correctly. The next thing I want you to do is to sort this list and store it in your home directory like this:

ls *.sas | sed 's/\.sas$//g' | sort > $HOME/saslist.tmp

Use cat on the output file, if you like, to convince yourself that the list was written there.

Now we are going to do the same with the log and lst files, except that we will give the output files different names. For the log files, type in this command:

ls *.log | sed 's/\.log$//g' | sort > $HOME/loglist.tmp

and for the lst files do this:

ls *.lst | sed 's/\.lst$//g' | sort > $HOME/lstlist.tmp
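Since the three commands differ only in the extension, you could also build all three lists in a loop. This is just a sketch, assuming the same output file names in $HOME as above. The sed pattern is in double quotes so that the shell can put the extension into it.

```sh
#!/bin/sh
# Build saslist.tmp, loglist.tmp and lstlist.tmp in $HOME in one loop.
# Run it from the directory you want to tidy.
for ext in sas log lst
do
    # In double quotes "\." reaches sed unchanged (a literal period),
    # while "\$" becomes a plain $ (the end-of-line anchor).
    # 2>/dev/null hides ls's complaint when there are no files of a type.
    ls *.$ext 2>/dev/null | sed "s/\.$ext\$//g" | sort > "$HOME/${ext}list.tmp"
done
```

The 2>/dev/null is my addition: without it ls reports an error on the screen when there are no files of one type, although the list file is still created (empty).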

Now we have three files, one each for the files ending .sas, .log and .lst. We are now going to join them to find the differences. Because we have used the sort utility, we know they are in the right order for joining, since join needs its input sorted on the field it joins on (I sort of sneaked in that utility, didn't I?).
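One thing to be careful about: join expects its input to be sorted the way sort would sort it in the same locale, so the two agree as long as they run under the same locale settings. If you want to remove any doubt, you could pin both to the C locale (plain byte order), as in this sketch. The choice of LC_ALL=C and the mixed-case names are my assumptions for illustration.

```sh
#!/bin/sh
# Keep sort and join in the same locale. LC_ALL=C (plain byte order)
# is an assumption; any locale works if both utilities use it.
tmp="${TMPDIR:-/tmp}/joinorder.$$"

# Hypothetical program and log names in mixed case
printf '%s\n' Demog ae labs        | LC_ALL=C sort > "$tmp.sas"
printf '%s\n' Demog ae labs old_ae | LC_ALL=C sort > "$tmp.log"

# In byte order upper case sorts first: Demog, ae, labs
cat "$tmp.sas"

# Logs with no matching program
widowed=$(LC_ALL=C join -v2 "$tmp.sas" "$tmp.log")
echo "$widowed"    # -> old_ae

rm -f "$tmp.sas" "$tmp.log"
```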

We don't know how join works yet, so go to your home directory (you should know how) and let's create some test files. Type in the first command below and enter the values "a", "c" and "e" for file1.tmp, one per line. Use the Ctrl-d key combination to finish inputting these values. Then do the same for file2.tmp with the values "a", "b", "c" and "d":

cat > file1.tmp
a
c
e
Ctrl-d

cat > file2.tmp
a
b
c
d
Ctrl-d
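If you would rather not type the values in by hand, printf can write the same two files in one go, one value per line. This minimal sketch should give exactly the same file1.tmp and file2.tmp in your home directory as the cat commands above.

```sh
#!/bin/sh
# Write the same test files as the cat commands, one value per line.
cd "$HOME" || exit 1
printf '%s\n' a c e   > file1.tmp
printf '%s\n' a b c d > file2.tmp

# Show what was written
cat file1.tmp
cat file2.tmp
```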

Now let's do a straight join without knowing anything about how join works:

join file1.tmp file2.tmp

We see "a" and "c" listed. These were the ones common to both files. But what we are interested in are the ones that do not match. Let's say file1 contains our SAS programs and file2 contains our logs; then we want to list "b" and "d", because they are not matched with SAS programs. The way to do it is like this. Try it out yourself:

join -v2 file1.tmp file2.tmp

The -v option limits the output to unpairable lines. This is what we want. But we only want the unpairable lines from the second file, so we use the option -v2 to say that we only want the unpairable lines in file 2 (file2.tmp). In our example, these would be the log files that are not paired with SAS programs. It would be useful to put the ".log" back on the end of these items, so try this command instead:

join -v2 file1.tmp file2.tmp | awk '{print $0 ".log"}'

We have put the extension ".log" back on the mismatched items.
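As an aside, -v can also be given file 1, or both files. Here is a sketch on the same test data, recreated in a scratch directory (a made-up name) so it stands alone. In dirtidy terms, -v 1 would list programs that have no log, which we don't need here, and giving both lists every unpaired name from either file. I have written the option as -v 1 with a space, which is the form POSIX describes; the attached form -v2 used above is also widely accepted.

```sh
#!/bin/sh
# Compare the -v variants on the same test data, recreated in a
# scratch directory (a made-up name) so this stands alone.
dir="${TMPDIR:-/tmp}/joinv.$$"
mkdir -p "$dir" && cd "$dir" || exit 1
printf '%s\n' a c e   > file1.tmp
printf '%s\n' a b c d > file2.tmp

only1=$(join -v 1 file1.tmp file2.tmp)        # unpaired in file 1: e
only2=$(join -v 2 file1.tmp file2.tmp)        # unpaired in file 2: b d
either=$(join -v 1 -v 2 file1.tmp file2.tmp)  # unpaired in either: b d e

echo "$only1"
echo "$only2"
echo "$either"
```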

Now we are ready to create the real lists of mismatched log and lst entries. Let's call our output files badlog.tmp for mismatched logs and badlst.tmp for mismatched lst files. First we will write the mismatched log entries to their output file like this:

join -v2 saslist.tmp loglist.tmp | awk '{print $0 ".log"}' > badlog.tmp

Now we will do the same for the lst files, but this time adding the ".lst" extension, like this:

join -v2 saslist.tmp lstlist.tmp | awk '{print $0 ".lst"}' > badlst.tmp

Now we have two temporary files containing lists of files we might like to delete. We can combine them, sort them and then feed them into the delsome utility so the user can delete them if they choose, like this:

cat badlog.tmp > bad.tmp
cat badlst.tmp >> bad.tmp
sort bad.tmp | delsome
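Incidentally, sort will read several files and sort them together, so the two cat commands and bad.tmp are not strictly needed. This sketch, using made-up file names and contents in a scratch directory, checks that both ways give the same list. With the real files you would simply use sort badlog.tmp badlst.tmp | delsome.

```sh
#!/bin/sh
# Check that sorting two files together gives the same list as
# concatenating them first. File names and contents are made up.
dir="${TMPDIR:-/tmp}/badmerge.$$"
mkdir -p "$dir" && cd "$dir" || exit 1
printf '%s\n' old_ae.log zz.log       > badlog.tmp
printf '%s\n' demog_v1.lst old_ae.lst > badlst.tmp

# The two-step way used above
cat badlog.tmp > bad.tmp
cat badlst.tmp >> bad.tmp
twostep=$(sort bad.tmp)

# The one-step way: sort reads both files itself
onestep=$(sort badlog.tmp badlst.tmp)

[ "$twostep" = "$onestep" ] && echo "same list"
```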

That is the idea, anyway. The trouble is that we are now located in our home directory, and the files we want to tidy are not there. This really needs to be a script so we don't have to type all this in, but it is important for you to know how it works. So here is the script. You are welcome to try it out after you have copied and pasted it into your script library and called it dirtidy. If you think you do not have any unmatched log and list files in your programs directory, then create some especially so you can delete them with this utility.

#!/bin/sh
# Script     : dirtidy
# Version    : 1.0
# Author     : Roland Rashleigh-Berry
# Date       : 29 July 2003
# Contact    : roland@rashleigh-berry.fsnet.co.uk
# Purpose    : To identify widowed .lst and .log files in a directory and prompt
#              for deletion.
# SubScripts : delsome
# Notes      : This script requires no parameters. You must make the directory
#              you want to tidy the current directory before invoking this script
# Usage      : dirtidy
# 
#================================================================================
# PARAMETERS:
#-pos- -------------------------------description--------------------------------
# N/A  (none) 
#================================================================================
# AMENDMENT HISTORY:
# init --date-- mod-id ----------------------description-------------------------
# 
#================================================================================
# This is public domain software. No guarantee as to suitability or accuracy is
# given or implied. User uses this code entirely at their own risk.
#================================================================================

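# Sorted lists of program, log and listing names with the extensions
# removed. 2>/dev/null stops ls complaining when a file type is absent.
ls *.sas 2>/dev/null | sed 's/\.sas$//g' | sort > $HOME/saslist.tmp
ls *.log 2>/dev/null | sed 's/\.log$//g' | sort > $HOME/loglist.tmp
ls *.lst 2>/dev/null | sed 's/\.lst$//g' | sort > $HOME/lstlist.tmp

# Log and lst names with no matching program, extensions put back on
join -v2 $HOME/saslist.tmp $HOME/loglist.tmp | awk '{print $0 ".log"}' > $HOME/badlog.tmp
join -v2 $HOME/saslist.tmp $HOME/lstlist.tmp | awk '{print $0 ".lst"}' > $HOME/badlst.tmp

# Combine the two lists and let the user choose what to delete
cat $HOME/badlog.tmp > $HOME/bad.tmp
cat $HOME/badlst.tmp >> $HOME/bad.tmp

sort $HOME/bad.tmp | delsome

# Remove the temporary files
rm -f $HOME/saslist.tmp $HOME/loglist.tmp $HOME/lstlist.tmp \
$HOME/badlog.tmp $HOME/badlst.tmp $HOME/bad.tmp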
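If you want to see what dirtidy would offer to delete without going near delsome, here is a hypothetical dry-run version written as a shell function. The name dirtidy_list is my own and is not part of the script above. It keeps its temporary files in /tmp (or $TMPDIR) under a name based on the process id, and only prints the widowed files.

```sh
#!/bin/sh
# dirtidy_list: hypothetical dry-run helper. Prints the widowed .log and
# .lst files in the current directory instead of passing them to delsome.
dirtidy_list() {
    tmp="${TMPDIR:-/tmp}/dirtidy.$$"

    # Sorted name lists without extensions: $tmp.sas, $tmp.log, $tmp.lst
    for ext in sas log lst
    do
        ls *.$ext 2>/dev/null | sed "s/\.$ext\$//g" | sort > "$tmp.$ext"
    done

    # Unpaired log and lst names, extensions restored, in one sorted list
    {
        join -v2 "$tmp.sas" "$tmp.log" | awk '{print $0 ".log"}'
        join -v2 "$tmp.sas" "$tmp.lst" | awk '{print $0 ".lst"}'
    } | sort

    rm -f "$tmp.sas" "$tmp.log" "$tmp.lst"
}
```

For example, cd to your programs directory and type dirtidy_list. Once you are happy with the list, dirtidy_list | delsome should behave much like dirtidy itself. Remember that a saved script needs execute permission (chmod +x dirtidy) before you can run it by name.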
