Got a favorite Unix tool? Post about it here.
22 responses total.
Early versions of the Unix system were marked by an emphasis on what came to be known as the tools philosophy: write small programs that act as filters, doing one thing and doing it well, and combine these programs into pipelines in the shell. It seems in recent years that that philosophy has fallen by the wayside. As near as I can tell, this started with the introduction of Perl, which was a Swiss Army knife of a tool, capable of doing many things (some very well, some, well, not so well).

However, the basic Unix tools still ship with almost every Unix (or derivative) system, and knowing how to use them effectively is worthwhile. Indeed, it is possible to combine tools in surprising ways, creating complex yet elegant systems entirely in the shell: often programs that would take tens or hundreds of lines of C or even Perl can be written in one or two lines of shell. Often, new filters are prototyped this way.

In this item, talk about your favorite Unix tools: what they do, a basic introduction to how to use them, and why you like them.
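To give a taste of the kind of one-liner I mean, here is a sketch of the classic word-frequency pipeline (the input file name is just a placeholder): it splits a document into words, lowercases them, and prints the ten most common ones with their counts.

    tr -cs 'A-Za-z' '\n' < document.txt |
        tr 'A-Z' 'a-z' |
        sort | uniq -c | sort -rn | head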
To start, here's a list of some of the basic tools for manipulating
files:
ls    - LiSt the files in the current directory (default),
        or as specified by command line arguments.
mv    - Move a file to a new name; this is also used for renaming.
cp    - Create a new copy of a file or files.
rm    - Remove a file or files.
cat   - ConCATenate files to the standard output. For each argument,
        print its contents on stdout. Defaults to copying the standard
        input stream to the standard output stream.
cd    - Usually built into the shell, and sometimes called 'chdir'.
        This changes the current working directory for the current
        process to the specified directory. Defaults to changing
        to the user's home directory.
chmod - CHange file MODe. Change the permissions of a file.
chown - CHange file OWNer. Change what user owns a file.
chgrp - CHange file GRouP. Change what group owns a file.
These, and a handful of others, are the basic tools for working with
files. Most of them take many and varied options, some of greater or
lesser utility. Indeed, a general proliferation of options to basic
Unix tools prompted Brian Kernighan and Rob Pike to write the paper
"Program Design in the UNIX Environment" and to give a presentation at
the 1983 Summer USENIX conference called "UNIX Style, or cat -v
Considered Harmful."
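A quick sketch of a few of these in action (the file and directory names here are made up):

    cd ~/docs                      # move into a working directory
    cp report.txt report.bak       # keep a backup copy
    mv report.txt report-2011.txt  # rename the original
    chmod 644 report-2011.txt      # owner read/write, everyone else read-only
    ls -l report*                  # see what we ended up with
    rm report.bak                  # clean up the backup when done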
vi The best editor out there. (Shut your EMACS moufs!) ;)
Hey! There's already a text editor holy war item! Today's tool of the day is: look(1). Look takes a sorted file (which defaults to the system dictionary file) and a prefix, and prints all the strings in the file that start with that prefix. It does this using a binary search algorithm (hence the requirement that the file be sorted. One of the preconditions of a binary search is that the data be ordered in some meaningful way; since in general look is looking things up lexicographically, it expects the data to be sorted that way), and can be quite speedy. It's not so much a filter, but it can be used at the beginning of a pipeline as a data source.
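For example, to list the dictionary words beginning with a given prefix (the path shown is the usual location of the word list, though it varies from system to system):

    look conce /usr/share/dict/words

which prints conceal, concede, conceit, and so on, one word per line.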
I don't know if ssh is commonly considered a "tool" in the same way
that ls and cat are, but I know that it and its relative scp can be
used in pipelines and other Unixy constructs for processing remote
data locally, without establishing a remote "session" in the usual
sense. Here are a few (probably silly) examples, off the top of my
head, of things I could do at a local shell prompt in a terminal
window on my laptop. (What the examples actually do is left as an
exercise for the reader.)
(1) ssh grex.org last | grep '\.msu\.edu ' | wc -l
(2) ssh grex.org cat '/bbs/agora50/_*'|grep '^,U.*,cross$'|wc -l
(3) for f in `ssh grex.org ls /bbs/agora33/_*`
do scp grex.org:$f .
done
Yes, I'm sure there are better ways of doing all the above (example 3
is particularly hideous), but I'm just illustrating a point here.
resp:5 Nice. Certainly, example (3) could be rewritten as:
scp 'grex.org:/bbs/agora33/_*' .
I'm pretty sure that's what John meant when he said it was
particularly hideous.
Btw: I don't know if you're using Bash or some such shell, but it may
not be a bad idea to quote the "*" in the back-tick'ed part of the for
loop in (3).
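That is, something along these lines (just a sketch of the quoting; I haven't run it against Grex):

    for f in `ssh grex.org ls '/bbs/agora33/_*'`
    do scp grex.org:"$f" .
    done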
An often handy flag for ssh is '-C', which compresses the data
stream. That can speed up things like this, at the expense of a
little more CPU and RAM usage on either end of the connection.
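For instance, reusing example (2) from above:

    ssh -C grex.org cat '/bbs/agora50/_*' | grep '^,U.*,cross$' | wc -l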
If you don't mind running grep on Grex instead of the local end of
your connection, you could substitute 'grep' for 'cat' in (2). There
used to be people on USENET who complained about "useless uses of cat"
and there was an award. However, note how this differs: the pipeline
is spread across the network, and there may be legitimate reasons that
one does not want to run grep on Grex ("cat" is a much, much simpler
program). There are legitimate reasons to use cat in pipelines; this
could be one of them.
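For instance, example (2) with the grep pushed to the remote side might look like this (an untested sketch; note the extra level of quoting so that the pattern survives the remote shell while the wildcard still gets expanded there):

    ssh grex.org "grep '^,U.*,cross$' /bbs/agora50/_*" | wc -l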
I think that John's examples are very much in the spirit of Unix tool
usage; 'ssh' certainly counts in this case, particularly for its
ability to spread computation across the network. That's an oft-
neglected capability, usually overlooked by people who simply want
to use ssh interactively. I claim that using, e.g., 'scp' is
certainly more "Unixy" than, say, the sftp or ftp commands.
Today's Unix tool o' the day is: ed. The Standard Editor.
And that's what it is: ed is a line oriented text editor. One uses it
by running 'ed file' and then typing in commands from the ed command
set. See the ed(1) manual page, or the book, "The Unix Programming
Environment" for information on how to use ed.
Ed is not a visual, full-screen editor; it does not display a
screenful of the file at a time. In fact, it won't display any part
of the file unless you tell it to by running special ed commands; ed
was designed for use on printing teletype terminals (i.e., terminals
that output to an actual typewriter-like device instead of a graphic
screen). Its diagnostics are horrible in their simplicity: usually,
if you do something ed does not understand, you get a line with the
single character '?' on it, all by itself. Some have called it the
"What You Get Is What You Get," or WYGIWYG, text editor.
Ed is probably the first Unix text editor. I say "probably" because
it's certainly been rewritten several times, and they tried several
variants out at Bell Labs. It is descended from the QED editor that
was used on the Multics system and CTSS. Ed was implemented mostly by
Ken Thompson in the early 1970s, and went on to inspire the Sam and
acme text editors written by Rob Pike. It is called "the Standard
Editor" because it was the only traditional text editor that came with
early Unix distributions.
So: ed is not a visual editor, it has horrible diagnostics and an
arcane command set, and it was designed to 1960s standards. Why on
earth would anyone want to learn it now?
Because it's tremendously useful as a tool for editing files in
place. Given that it reads commands from standard input, it can also
be used as a target in a pipeline: one can write scripts that generate
ed commands that are then piped into ed running on a file; this is
handy for batch-mode editing without using temporary files (ed might
make a temporary file, but the script-writer won't have to worry about
that).
Have you got 1000 files in a directory that all have to have the
word "UNIX" changed to "Unix Operating System" in them? Here's a
simple way:
for file in *.html
do
    (echo 'g/UNIX/s/UNIX/Unix Operating System/g'; echo w; echo q) | ed "$file"
done
Done. Note that some versions of ed(1) support chaining some commands
together, and some support combining the 'w' and 'q' commands into a
single 'wq', but the original ed does not.
Re resp:6 - I use bash, and example (3) worked without quoting the asterisk in the command substitution.

Re resp:7 - My introduction to Unix was with Version 6 in 1981 on a PDP-something-or-other. (My god, that was almost 30 years ago!) The editor choices were 'ed' and a full-screen editor from the Rand Corporation whose name I don't remember. The vi editor may have existed at that point but was not available on Version 6. The Rand editor was pretty resource-intensive and anyway didn't support the dumb terminal I used at home for dialup. So I had to get pretty familiar with ed, and even though I've scarcely used it for many years, the basic commands are still baked into my fingers. I agree that it's still useful for batch processing.

Is it true that the standalone grep and sed commands are spinoffs of facilities built into the original ed?
resp:8 Right. If bash doesn't match a wildcard pattern, it will just pass the wildcard as a string to whatever command, instead of raising an error. Csh and tcsh have the opposite behavior; Bourne shell derivatives probably vary, but I'm not sure.

It is true that grep and sed were spinoffs of ed. For the first, the idiom in ed to print all lines matching some regular expression is:

    g/regular expression/p

The 'g' says to apply the following command against all lines in the file. The regular expression is, of course, the regular expression, and 'p' is the command to print a line (if it matched the RE). This functions as a loop: for each line in the file, if it matches the regular expression, print the line (ordinarily regular expression searches are only within the current line). The etymology of grep is thus clear: g-re-p, or 'global regular expression print.'

Sed was similar; it implements much of the ed command language, but against a stream of data. "Sed" is short for "Stream EDitor", or "S Ed", which is just written as 'sed.' Sed was freed from ed when pipes were added to Unix, in particular because 'ed' only works on files, not streams, and (I'm speculating here) it was probably felt that augmenting ed to work with streams was not a good idea for a few reasons. First, it would not have fit in with the whole "software tools" philosophy. Second, I imagine that it was felt that the command sets would diverge, given the different usage patterns and demands of files versus streams. For instance, the 'g' command is gone because it is presumed (given that you're working with a stream) that you want to work with all lines; the notion of a 'current line' selected by the user as a unit of work doesn't make much sense when you are implicitly iterating over all lines, so 'g' is the default.
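To make the correspondence concrete: inside ed, 'g/pattern/p' prints every line matching the pattern, and from the shell these two commands do the same job on a file (the file and pattern here are arbitrary):

    grep 'pattern' file
    sed -n '/pattern/p' file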
i think it was 1989 when gordon good taught us umich consultants how unix got put together, with an almost identical explanation to cross's above. i was stunned at the purity of the concept ... still am
Terribly handy UNIX tool of the day: script(1). script just starts a shell (or another program); it then records the entire terminal session into a file (default 'typescript'). Very handy when you want to record some output from a program or remember what you did.
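Typical use might look like this (the log file name is just an example):

    script build.log
    make
    exit

Everything the make prints, along with whatever you type, ends up in build.log.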
Yes, that's a handy one. I just used that yesterday, while updating my FreeBSD system, to keep the log messages from the kernel and world makes. Those tend to produce many lines of output and script grabs it all very nicely.
Which reminds me of another handy Unix tool that I don't think has been mentioned yet: tee(1). It lets you capture the output of intermediate steps in a pipeline to files.
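A minimal example (the file name is made up):

    ls -l | tee listing.txt | grep '\.c$'

saves the complete listing in listing.txt while still passing it down the pipeline for grep to filter.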
Script is very handy, and tee is a wonderfully light-weight way of
getting much of the functionality (or of saving some intermediate
step in a pipeline for later use).
One of the beauties of the pipe model is that the same command can
be used multiple times in the same job. So for instance, something
I did today was extract some package dependencies for installed
packages here on Grex and translate those into URLs so I could
then download them. Suppose I wanted to save a list of both the
package names as well as the resulting URLs: a command to do that
might look like the following:
; grep '^dependencies:' file |
sed 's/^.*: //' |
tr ' ' '\012' |
tee packages |
sed 's,^,http://ftp.openbsd.org/pub/OpenBSD/4.8/packages/i386/,;s/$/.tgz/' |
tee urls |
sed 's/^/ftp /' |
sh
Now everything downloads, and I've saved not only the package names,
but also the URLs. Note that I could get rid of the "grep" command,
by wrapping it into the first 'sed', thus making it something like,
"sed -ne 's/^dependencies: //p' file" but I chose to leave it in
for expository purposes. For that matter, I could have tightened
the RE in the first sed command a little in case a package name had
a colon in it (^[^:]*:). None did in my real-life example, but I
guess it's possible. Again, I didn't want to get too far down in
the weeds.
So I've got two invocations of tee, plus three of sed, and I'm
generating output for the shell (an often under-used technique).
I think in systems that force you to operate within the confines
of what one may call a "monolithic" interface, there's less flexibility
to combine tools in this manner. Truly, if you can use the same
tool multiple times in the same job, to do different things, you
gain enormous power and flexibility.
and how different is script from tee, which i have used a buncha times?
script allocates a pseudo-terminal and starts another interactive session on it; it records everything from that session (both input and output) into a file. You could run a full-screen editor like vi or emacs under a script session and see all the escape codes, etc., in the resulting file. Tee is like a tee-joint in a water pipe: it writes whatever data it reads to a file *and* to stdout. It's much simpler.
Script reminds me of one of my favorite Unix tools, screen(1). It allows you to create multiple pseudo-terminals, and switch between them, sort of like switching through virtual consoles. The feature I really find handy is the ability to detach and reattach from the pseudo-terminals without closing them.
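A typical detach/reattach cycle looks something like this (the session name is my own choice):

    screen -S build     # start a named session
    # ... run something long, then press Ctrl-a d to detach ...
    screen -ls          # list the sessions that are still running
    screen -r build     # reattach later, even from a different login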
Screen can be wonderfully handy, particularly if you're coming in over an unreliable link of some kind.
Today's Unix Tools o' the Day are for comparing files. When you start to consider text and files as the basic unit of data exchanged between programs, then comparing that text and those files becomes an important issue. Since Unix is very much based around the idea of text as a basic representation of data, the Unix authors wisely wrote some tools to help out in this regard. The three basic tools most commonly used here are:

1. diff - Compute the differences between lines of text in two separate files.
2. comm - Show the commonalities (and differences) between two separate files.
3. cmp - Compare two files.

Diff shows you, in a variety of different formats, what the differences between two files are. Want to see how one line of code differs from previous versions? Diff is the answer (indeed, it's so useful that most revision control systems have a built-in diff command to see differences in the revision history). Early versions of diff actually output 'ed' commands, so that changes could be made to a file to match a "diff" by simply piping into ed! Now, that's power.

Comm shows you what is common between two files (or what is in one or the other but not in both). It can be extremely useful in a shell pipeline for finding differences from a baseline. In this sense, it is very much a traditional Unix filter, rather than a standalone tool.

Cmp is a general utility for detecting whether differences exist between files at the byte level; it works on binary files just as well as text files. It is less sophisticated than diff (which can take options to ignore whitespace and so on), but can be great in shell scripts where one wants to test for differences, but doesn't necessarily care what the differences are.

Finally, no discussion of diff would be complete without also mentioning patch. Patch is a tool for applying incremental updates to a file; its input can be specified in the ubiquitous diff format.
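A small sketch of how these fit together (all the file names here are made up):

    diff -u old/prog.c new/prog.c > prog.patch        # record the changes
    patch -p1 < prog.patch                            # apply them to a copy of the old tree
    cmp -s prog.c prog.c.orig || echo 'files differ'  # quiet byte-level test
    comm -13 baseline.sorted current.sorted           # lines only in the current list (both inputs sorted)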
There are also file comparison programs similar to those, such as sdiff, wdiff, and colordiff. All are interesting to check out, but diff and cmp are by far my favorites. It just depends on how much output I want to see. For those who use vim, the vi improved editor, there is a diff mode, vim -d, for comparing two files side by side (similar to sdiff). This works nicely for merging code between two versions of a file.
dd is my Unix Tool of the Day. I find it useful for wiping hard disks (I read input from /dev/zero and write output to the raw device file for the disklabel partition that covers the whole disk) and for making images of disk drives and flash cards. It can also be used the other way: to write an image file to a disk, flash card or USB flash drive. Use it with care though: type the wrong parameter as root and it's easy to accidentally overwrite the wrong disk.
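The kinds of invocations I mean look roughly like this; the device names are placeholders (a BSD-style raw whole-disk partition) and will certainly differ on other systems, so triple-check them before running anything similar:

    dd if=/dev/zero of=/dev/rsd1c bs=1m    # wipe an entire disk via the raw whole-disk device
    dd if=/dev/rsd1c of=disk.img bs=1m     # image a disk to a file
    dd if=disk.img of=/dev/rsd1c bs=1m     # write an image back out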
To add to the list of tools: xargs, which allows you to convert lines on standard input into arguments to a command. I often use it in various pipelines, for instance where I select a bunch of files to add to a tarball:

    grep '\.c$' packing-list | xargs tar cvf source.tar

Another place where I use it is as an alternative to find's -exec switch, which I never really got the hang of:

    find . -name '*.c' | xargs wc -l

The above finds all C source files in the current directory and counts the total number of lines (with a summary at the end). In the above case, using -exec IIRC runs `wc -l` on each file separately, which means you don't get a total at the end. However, do note that you might run into various system limits (particularly argument length limits) if you use xargs.

Normally, xargs splits arguments on whitespace. What if you have files with spaces in their names? xargs has a switch, -0, that (ab)uses the null character as an argument separator; when used with find's -print0 switch, file names with spaces can be handled safely in examples like the one above.
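For example, the line-counting pipeline above, made safe for file names containing spaces:

    find . -name '*.c' -print0 | xargs -0 wc -l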