[lug] Newbie Column #7 - In Unix Everything is a File
Wayde Allen
wallen at boulder.nist.gov
Fri Feb 25 15:08:16 MST 2000
At this point you should have some idea how to move around the UNIX file
system, make directories, and look at the contents of a file. What I'd
like to do now is point out one of the real strengths of the UNIX system,
namely, that everything is simply a file! You make use of the same basic
commands to read or write to a disk, keyboard or datafile. As you can
imagine, that is a pretty handy thing from a programmer's point of view.
What might not be so obvious is what this gives you, the user.
The simple answer is that this makes it possible to string specific
command operations together as needed to manipulate or combine your data
as desired. You don't necessarily have to purchase or write a specialized
program for the task. This works because, since everything is a file,
all commands read data from some input file and write their results to
some output file. Data printed on the computer screen is actually written
to a file called the standard output, or STDOUT. Similarly, the keyboard
is usually referred to as the standard input file, or STDIN.
Let's see how this works. Remember how the "cat" command can be used to
print the contents of a file to the screen? If instead of typing:
cat <filename>
where <filename> is the name of a file, you simply typed:
cat
the command will, by default, read input from STDIN and print output on
STDOUT. Try it. Once you start the "cat" command, every time you hit the
return key the "cat" command will write what you've typed to STDOUT (the
screen) and you should see both what you typed and what "cat" printed on
your monitor. You can end the "cat" command by hitting the control-D key
combination at the start of a new line. (This means pressing the "control"
key and the "d" key at the same time.)
OK, so this may not seem too useful right now. Let's do something more
reasonable. This time try typing:
cat > datafile
1 10 20
2 11 21
3 12 22
4 13 23
5 14 24
6 15 25
Then, end the session with the control-D key combination. You will notice
that this time what you typed didn't appear on the screen twice. That is
because we used the > symbol to redirect the output of the "cat" command
to the file called "datafile". If you list the contents of your current
directory ("ls" command) you should see the file "datafile", and you can
use either the "cat" or "more" command to confirm that it contains what
you typed. Of course this isn't a very forgiving way to create a datafile,
since you can't edit a line once you hit the return key, but it
serves our purpose for this example.
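If you would rather not type the data interactively, one alternative is a
"here-document", which feeds the lines between two markers to a command's
STDIN. This is only a sketch: it assumes a Bourne-style shell (sh, bash,
ksh), and the marker name EOF is an arbitrary choice of mine.

```shell
# Create "datafile" without an interactive session: the lines between the
# two EOF markers are fed to cat's STDIN, and > sends cat's STDOUT to the file.
cat > datafile <<'EOF'
1 10 20
2 11 21
3 12 22
4 13 23
5 14 24
6 15 25
EOF

# Print the file to confirm its contents.
cat datafile
```

The result should be the same six-line file you would get by typing the
data by hand and ending with control-D.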
Now suppose that you would like to extract the second column of data from
this newly created data set. All you would need to do is type:
cut -d" " -f2 datafile
The option -d" " means that the data is separated by a single space, and
-f2 means extract field (column) 2. When you hit the return key the
numbers in the second column of your datafile will be printed to the
screen. If you wanted these data in a new file called "column2.data" you
could simply type:
cut -d" " -f2 datafile > column2.data
Extracting the other columns could be done using the commands:
cut -d" " -f1 datafile > column1.data
cut -d" " -f3 datafile > column3.data
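Put together, the column extraction above might look like the following
sketch. It recreates the example data with printf (my substitution, so the
block can run on its own) and otherwise uses only the commands shown above.

```shell
# Recreate the example data so this sketch runs on its own.
printf '%s\n' "1 10 20" "2 11 21" "3 12 22" "4 13 23" "5 14 24" "6 15 25" > datafile

# Split each space-separated column into its own file.
cut -d" " -f1 datafile > column1.data
cut -d" " -f2 datafile > column2.data
cut -d" " -f3 datafile > column3.data

# Show the second column.
cat column2.data
```

The last command should print the numbers 10 through 15, one per line.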
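Now if you want to combine our original column three as the first column
and column one as the last column, you'd only need to type the following
(add "> filename" to the end of the command to save the result in a new
file instead of printing it to the screen):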
paste -d" " column3.data column1.data
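To keep the recombined columns, the output of "paste" can be redirected
just like the output of "cut". A minimal sketch follows; the output name
"swapped.data" is a name I made up for illustration, and the data is
recreated with printf so the block runs on its own.

```shell
# Recreate the example data and the two column files.
printf '%s\n' "1 10 20" "2 11 21" "3 12 22" "4 13 23" "5 14 24" "6 15 25" > datafile
cut -d" " -f1 datafile > column1.data
cut -d" " -f3 datafile > column3.data

# Join the files line by line, separated by a single space, and save the
# result ("swapped.data" is a hypothetical file name).
paste -d" " column3.data column1.data > swapped.data

# Show the result.
cat swapped.data
```

The first line of "swapped.data" should read "20 1" and the last "25 6".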
On my system I've also written a program to average data sent to it on
STDIN so I can, for example, average the data in column 2 by typing:
cut -d" " -f2 datafile | avg
This last example takes the output from the cut command and pipes it, with
the pipe symbol |, to the input of my averaging program, avg. You can
create your own averaging program using the language of your choice, or I
can send you a copy of the one I wrote.
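If you want something to experiment with right away, here is one possible
stand-in written as a shell function around awk. It is not the program
described above, just a sketch that assumes one number per line on STDIN.

```shell
# Hypothetical stand-in for "avg": reads one number per line from STDIN
# and prints the mean. Blank lines are skipped.
avg() {
    awk 'NF { sum += $1; n++ } END { if (n > 0) print sum / n }'
}

# Recreate the example data so this sketch runs on its own.
printf '%s\n' "1 10 20" "2 11 21" "3 12 22" "4 13 23" "5 14 24" "6 15 25" > datafile

# Average the second column.
cut -d" " -f2 datafile | avg
```

For the example data this should print 12.5, since the second column sums
to 75 over six values.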
As you can see, even with just two of the file redirection operators and a
few simple commands we can do some pretty useful things. The file
redirection operators I can think of are:

    >   - Redirect STDOUT to a named file
    >>  - Append STDOUT to the named file
    >!  - Redirect STDOUT to a named file and overwrite
          the file if it already exists (csh/tcsh syntax, needed when
          the "noclobber" option is set; Bourne-style shells such as
          sh and bash use >| instead)
    <   - Read STDIN from a named file
    |   - Pipe STDOUT to a specified STDIN
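A short sketch exercising four of these operators in a Bourne-style shell
follows. The file name "notes.txt" is made up for illustration, and >! is
left out because it belongs to csh/tcsh.

```shell
# >  : send STDOUT to a file, creating it or replacing its contents
echo "first line" > notes.txt

# >> : append STDOUT to the end of the same file
echo "second line" >> notes.txt

# <  : have wc read its STDIN from the file and count the lines
wc -l < notes.txt

# |  : feed the STDOUT of sort into the STDIN of head
sort -r notes.txt | head -n 1
```

The "wc" command should report 2 lines, and the final pipeline should
print "second line".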
There are likely a few I've forgotten that deal with redirecting STDIN,
but you should get the idea. This allows you to write very small simple
programs such as my averaging program, and to combine these programs
easily to solve larger, more complex tasks.