cross
Unix tool o' the day.  Sep 18 06:28 UTC 2010

Got a favorite Unix tool?  Post about it here.
22 responses total.
cross
response 1 of 22:  Sep 18 06:59 UTC 2010

Early versions of the Unix system were marked by an emphasis on what 
came to be known as the tools philosophy: write small programs that 
act as filters, doing one thing and doing it well, and combine 
these programs into pipelines in the shell.

It seems in recent years that that philosophy has fallen by the 
wayside.  As near as I can tell, this started with the introduction of 
perl, which was a Swiss Army knife of a tool, capable of doing many 
things (some very well, some, well, not so well).  However, the basic 
Unix tools still ship with almost every Unix (or derivative) system, 
and knowing how to use them effectively is worthwhile.  Indeed, it is 
possible to combine tools in surprising ways, creating complex yet 
elegant systems entirely in shell: often programs that would take tens 
or hundreds of lines of C or even Perl can be written in one or two 
lines of shell.  Often, new filters are prototyped this way.
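For example, a crude word-frequency counter, the sort of thing that would 
take a page of C, can be written as one pipeline ('textfile' here is just 
a stand-in for whatever input you happen to have):

    tr -cs 'A-Za-z' '\n' < textfile |    # break the text into one word per line
        tr 'A-Z' 'a-z' |                 # fold case
        sort | uniq -c |                 # count occurrences of each word
        sort -rn | head                  # show the ten most common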

In this item, talk about your favorite Unix tools: what they do, a 
basic introduction to how to use them, and why you like them.
cross
response 2 of 22:  Sep 18 07:07 UTC 2010

To start, here's a list of some of the basic tools for manipulating 
files:

ls  - LiSt the files in the current directory (default)
      Or as specified by command line arguments.

mv  - Move a file to a new name; this is also used for renaming.

cp  - Create a new copy of a file or files.

rm  - Remove a file or files.

cat - ConCATenate files to the standard output.  For each argument,
      print its contents on stdout.  Defaults to copying the standard
      input stream to the standard output stream.

cd  - Usually built into the shell, and sometimes called 'chdir'.
      This changes the current working directory for the current
      process to the specified directory.  Defaults to changing
      to the user's home directory.

chmod - CHange file MODe.  Change the permissions of a file.

chown - CHange file OWNer.  Change what user owns a file.

chgrp - CHange file GRouP.  Change what group owns a file.

These, and a handful of others, are the basic tools for working with 
files.  Most of them take many and varied options, some of greater or 
lesser utility.  Indeed, a general proliferation of options to basic 
Unix tools prompted Brian Kernighan and Rob Pike to write 
the paper "Program Design in the UNIX Environment" and to give a 
presentation at the 1983 Summer USENIX conference called "UNIX Style, 
or cat -v Considered Harmful."
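As a quick illustration, a typical bit of housekeeping with a few of these 
might look like this (the file and user names are made up):

    cp notes.txt notes.bak     # keep a copy before editing
    mv notes.txt todo.txt      # rename the original
    chmod 644 todo.txt         # owner read/write, everyone else read-only
    chown john todo.txt        # hand the file to user 'john' (usually needs root)
    rm notes.bak               # done with the backup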
tod
response 3 of 22:  Sep 18 20:39 UTC 2010

vi
The best editor out there.  (Shut your EMACS moufs!) ;)
cross
response 4 of 22:  Sep 19 08:55 UTC 2010

Hey!  There's already a text editor holy war item!

Today's tool of the day is: look(1).

Look takes a sorted file (which defaults to the system dictionary 
file) and a prefix, and prints all the strings in the file that start 
with that prefix.  It does this using a binary search algorithm (hence 
the requirement that the file be sorted: one of the preconditions of a 
binary search is that the data be ordered in some meaningful way, and 
since look is looking things up lexicographically, it expects the data 
to be sorted that way), and can be quite speedy.  It's not so 
much of a filter, but can be used at the beginning of a pipeline as a 
data source.
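For example, on a system with the usual dictionary file, something like 
this should print every word that starts with "consider":

    look consider /usr/share/dict/words

It works on any sorted file, too; just remember that look's comparisons 
have to agree with the sort order, so plain ASCII data is the safe case.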
remmers
response 5 of 22:  Sep 21 22:15 UTC 2010

I don't know if ssh is commonly considered a "tool" in the same way
that ls and cat are, but I know that it and its relative scp can be
used in pipelines and other Unixy constructs for processing remote
data locally, without establishing a remote "session" in the usual
sense.  A few (probably silly) examples off the top of my
head of things I could do at a local shell prompt in a terminal
window on my laptop.  (What the examples actually do is left as
an exercise for the reader.)

(1)  ssh grex.org last | grep '\.msu\.edu ' | wc -l

(2)  ssh grex.org cat '/bbs/agora50/_*'|grep '^,U.*,cross$'|wc -l

(3)  for f in `ssh grex.org ls /bbs/agora33/_*`
     do scp grex.org:$f .
     done

Yes, I'm sure there are better ways of doing all the above (example 3
is particularly hideous), but I'm just illustrating a point here.
cross
response 6 of 22:  Sep 22 12:03 UTC 2010

resp:5 Nice.  Certainly, example (3) could be rewritten as:

    scp 'grex.org:/bbs/agora33/_*' .

I'm pretty sure that's what John meant when he said it was 
particularly hideous.

Btw: I don't know if you're using Bash or some such shell, but it may 
not be a bad idea to quote the "*" in the back-tick'ed part of the for 
loop in (3).

An often handy flag for ssh is '-C', which compresses the data 
stream.  That can speed up things like this, at the expense of a 
little more CPU and RAM usage on either end of the connection.
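For instance, John's example (1) with compression turned on is just:

    ssh -C grex.org last | grep '\.msu\.edu ' | wc -l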

If you don't mind running grep on Grex instead of the local end of 
your connection, you could substitute 'grep' for 'cat' in (2).  There 
used to be people on USENET who complained about "useless uses of cat" 
and there was an award.  However, note how this differs: the pipeline 
is spread across the network, and there may be legitimate reasons that 
one does not want to run grep on Grex ("cat" is a much, much simpler 
program).  There are legitimate reasons to use cat in pipelines; this 
could be one of them.
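If you did want grep on the far end, a sketch might look like this (I 
haven't tested the quoting; the outer double quotes hand the whole 
command, single-quoted pattern and all, to the remote shell, and with 
multiple files grep prefixes each match with a file name, which doesn't 
change the count):

    ssh grex.org "grep '^,U.*,cross\$' /bbs/agora50/_*" | wc -l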

I think that John's examples are very much in the spirit of Unix tool 
usage; 'ssh' certainly counts in this case, particularly for its 
ability to spread computation across the network.  That's an oft-
neglected capability, usually overlooked by people who simply 
want to use it interactively.  I claim that using, e.g., 'scp' is 
certainly more "Unixy" than, say, the sftp or ftp commands.
cross
response 7 of 22:  Sep 22 12:33 UTC 2010

Today's Unix tool o' the day is: ed.  The Standard Editor.

And that's what it is: ed is a line oriented text editor.  One uses it 
by running 'ed file' and then typing in commands from the ed command 
set.  See the ed(1) manual page, or the book, "The Unix Programming 
Environment" for information on how to use ed.

Ed is not a graphical editor; it does not display a full screen-full 
of information at a time.  In fact, it won't display any part of the 
file unless you tell it to by running special ed commands; ed was 
designed for use on printing teletype terminals (ie, terminals that 
output to an actual typewriter-like device, instead of a graphic 
screen).  Its diagnostics are horrible in their simplicity: usually, 
if you do something ed does not understand, you get a line with the 
single character '?' on it, all by itself.  Some have called it 
the, "What you get is what you get" or WYGIWYG text editor.

Ed is probably the first Unix text editor.  I say "probably" because 
it's certainly been rewritten several times, and they tried several 
variants out at Bell Labs.  It is descended from the QED editor that 
was used on the Multics system and CTSS.  Ed was implemented mostly by 
Ken Thompson in the early 1970s, and went on to inspire the Sam and 
acme text editors written by Rob Pike.  It is called "the Standard 
Editor" because it was the only traditional text editor that came with 
early Unix distributions.

So: ed is not graphical, has horrible diagnostics and an arcane command 
set, and was designed to 1960s standards.  Why on earth would anyone 
want to learn it now?

Because it's tremendously useful as a tool for editing files in 
place.  Given that it reads commands from standard input, it can also 
be used as a target in a pipeline: one can write scripts that generate 
ed commands that are then piped into ed running on a file; this is 
handy for batch-mode editing without using temporary files (ed might 
make a temporary file, but the script-writer won't have to worry about 
that).

Have you got 1000 files in a directory that all have to have the 
word "UNIX" changed to "Unix Operating System" in them?  Here's a 
simple way:

for file in *.html
do
    (echo 'g/UNIX/s/UNIX/Unix Operating System/g'; echo w; echo q) |
        ed "$file"
done

Done.  Note, some versions of ed(1) support chaining some commands 
together, and some support combining the 'w' and 'q' commands into a 
single 'wq', but the original ed does not.
remmers
response 8 of 22:  Sep 22 13:26 UTC 2010

Re resp:6 - I use bash, and example (3) worked without quoting the
asterisk in the command substitution.

Re resp:7 - My introduction to Unix was with Version 6 in 1981 on
a PDP-something-or-other.  (My god that was almost 30 years ago!)
The editor choices were 'ed' and a full-screen editor from the Rand
Corporation whose name I don't remember.  The vi editor may have
existed at that point but was not available on Version 6.  The Rand
editor was pretty resource-intensive and anyway didn't support the
dumb terminal I used at home for dialup.  So I had to get pretty
familiar with ed, and even though I've scarcely used it for many
years, the basic commands are still baked into my fingers.  I agree
that it's still useful for batch processing.

Is it true that the standalone grep and sed commands are spinoffs of
facilities built into the original ed?
cross
response 9 of 22:  Sep 22 13:56 UTC 2010

resp:8 Right.  If bash doesn't match a wildcard pattern, it will just 
pass the wildcard as a string to whatever command, instead of reporting 
an error.  Csh and tcsh have the opposite behavior; Bourne shell 
derivatives probably vary, but I'm not sure.

It is true that grep and sed were spinoffs of ed.

For the first, the idiom in ed to print all lines matching some 
regular expression is:

g/regular expression/p

The 'g' says to apply the following command against all lines in the 
file.  The regular expression is, of course, the regular expression, 
and p is the command to print a line (if it matched the RE).  This 
functions as a loop; for each line in the file, if it matches the 
regular expression, print the line (ordinarily regular expression 
searches are only within the current line).  The etymology of grep is 
thus clear: g-re-p, or 'global regular expression print.'
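So, on a modern ed (the -s flag just quiets the byte-count chatter), 
these two should print the same lines; 'logfile' is a made-up name:

    printf 'g/error/p\nq\n' | ed -s logfile
    grep error logfile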

Sed was similar; it implements much of the ed command language, but 
against a stream of data.  "Sed" is short for "Stream EDitor", or "S 
Ed", which is just written as 'sed.'  Sed was freed from ed when pipes 
were added to Unix, in particular because 'ed' only works on files, 
not streams, and (I'm speculating here) it was probably felt that 
augmenting ed to work with streams was not a good idea for a few 
reasons.  First, it would not have fit in with the whole "software 
tools" philosophy.  Second, I imagine that it was felt that the 
command sets would diverge, given the different usage patterns and 
demands of files versus streams.  For instance, the 'g' command is 
gone because it is presumed (given that you're working with a stream) 
that you want to work with all lines; the notion of a 'current line' 
selected by the user as a unit of work doesn't make much sense when 
you are implicitly iterating over all lines, so 'g' is the default.
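For example, these two are equivalent ways of doing a 'g/re/p'; sed's -n 
suppresses its default printing of every line, and the trailing p prints 
only the lines that matched ('logfile' again is just a made-up name):

    sed -n '/error/p' logfile
    grep error logfile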
tsty
response 10 of 22:  Sep 23 00:26 UTC 2010

  
i think it was 1989 when gordon good taught us umich consultants about how unix
got put together with an almost identical explanation to cross's above.
  
i was stunned at the purity of the concept ... still am
vsrinivas
response 11 of 22:  Nov 30 13:07 UTC 2010

Terribly handy UNIX tool of the day: script(1).

script just starts a shell (or another program); it then records the 
entire terminal session into a file (default 'typescript'). Very handy 
when you want to record some output from a program or remember what you 
did.
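For example (the log file name here is arbitrary):

    script build.log    # starts a new shell, recording everything to build.log
    make                # ...whatever you want captured...
    exit                # leaving the shell ends the recording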

kentn
response 12 of 22:  Nov 30 13:54 UTC 2010

Yes, that's a handy one.  I just used that yesterday, while updating
my FreeBSD system, to keep the log messages from the kernel and world
makes.  Those tend to produce many lines of output and script grabs
it all very nicely.  
remmers
response 13 of 22:  Dec 1 19:55 UTC 2010

Which reminds me of another handy Unix tool that I don't think has been
mentioned yet:  tee(1).  It lets you capture the output from
intermediate steps in a pipeline to files.
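For example (the file name is made up), something like

    du -s * | tee sizes.txt | sort -n

shows the sorted sizes on the terminal while leaving the unsorted list
in sizes.txt for later.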
cross
response 14 of 22:  Dec 1 21:21 UTC 2010

Script is very handy, and tee is a wonderfully light-weight way of
getting much of the functionality (or of saving some intermediate
step in a pipeline for later use).

One of the beauties of the pipe model is that the same command can
be used multiple times in the same job.  So for instance, something
I did today was extract some package dependencies for installed
packages here on Grex and translate those into URLs so I could
then download them.  Suppose I wanted to save a list of both the
package names as well as the resulting URLs: a command to do that
might look like the following:

; grep '^dependencies:' file |
    sed 's/^.*: //' |
    tr ' ' '\012' |
    tee packages |
    sed 's,^,http://ftp.openbsd.org/pub/OpenBSD/4.8/packages/i386/,;s/$/.tgz/' |
    tee urls |
    sed 's/^/ftp /' |
    sh

Now everything downloads, and I've saved not only the package names,
but also the URLs.  Note that I could get rid of the "grep" command,
by wrapping it into the first 'sed', thus making it something like,
"sed -ne 's/^dependencies: //p' file" but I chose to leave it in
for expository purposes.  For that matter, I could have tightened
the RE in the first sed command a little in case a package name had
a colon in it (^[^:]*:).  None did in my real-life example, but I
guess it's possible.  Again, I didn't want to get too far down in
the weeds.

So I've got two invocations of tee, plus three of sed, and I'm
generating output for the shell (an often under-used technique).
I think in systems that force you to operate within the confines
of what one may call a "monolithic" interface, there's less flexibility
to combine tools in this manner.  Truly, if you can use the same
tool multiple times in the same job, to do different things, you
gain enormous power and flexibility.
tsty
response 15 of 22:  Dec 2 08:17 UTC 2010

  
and how different is    script     from    tee,   which i have used a
buncha times.
  
cross
response 16 of 22:  Dec 2 08:56 UTC 2010

script allocates a pseudo-terminal and starts another interactive
session on it; it records everything from that session (both input
and output) into a file.  You could run a graphical editor like vi
or emacs under a script session and see all the escape codes, etc,
in the resulting file.

Tee is like a tee-joint in a water pipe: it writes whatever data
it reads to a file *and* to stdout.  It's much simpler.
nharmon
response 17 of 22:  Dec 6 16:30 UTC 2010

Script reminds me of one of my favorite Unix tools, screen(1). It allows
you to create multiple pseudo-terminals, and switch between them, sort
of like switching through virtual consoles. The feature I really find
handy is the ability to detach and reattach from the pseudo-terminals
without closing them.
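The basic dance looks something like this:

    screen         # start a new session; inside it, Ctrl-a d detaches
    screen -ls     # later, list your detached sessions
    screen -r      # reattach to the one you left behind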
cross
response 18 of 22:  Dec 6 21:32 UTC 2010

Screen can be wonderfully handy, particularly if you're coming in over an
unreliable link of some kind.
cross
response 19 of 22:  Dec 6 21:45 UTC 2010

Today's Unix Tools o' the day are for comparing files.

When you start to consider text and files as the basic units of data
exchanged between programs, then comparing that text and those files starts
to become an important issue.  Since Unix is very much based around the idea
of text as a basic representation of data, the Unix authors wisely wrote some
tools to help out in this regard.

The three basic tools most commonly used here are:

1. diff - Compute the differences between lines of text in two separate files.
2. comm - Show the commonalities (and differences) between two separate files.
3. cmp  - Compare two files.

Diff shows you in a variety of different formats what the differences between
two files are.  Want to see how one line of code differs from previous
versions?  Diff is the answer (indeed, it's so useful that most revision
control systems have a built-in diff command to see differences in the
revision history).  Early versions of diff actually output 'ed' commands,
so that changes could be made to a file to match a "diff" by simply
piping into ed!  Now, that's power.
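That trick still works: diff's -e flag emits an ed script, and you just
supply the 'w' and 'q' yourself (the file names here are placeholders;
afterwards, old.txt has new.txt's contents):

    (diff -e old.txt new.txt; echo w; echo q) | ed -s old.txt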

Comm shows you what is common between two files (or what is in one or the
other but not in both).  It can be extremely useful in a shell pipeline for
finding differences from a baseline.  In this sense, it is very much a
traditional Unix filter, rather than a standalone tool.
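For instance, to see only the lines that have shown up since some baseline
(both files must be sorted; the names are made up):

    comm -13 baseline.txt current.txt    # print just the lines unique to current.txt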

Finally, cmp is a general utility for detecting whether differences exist
between files at the byte level; it works on binary files just as well as text
files.  It is less sophisticated than diff (which can take options to ignore
whitespace and so on), but can be great in shell scripts where one wants to
test for differences, but doesn't necessarily care what the differences are.
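A typical shell-script use might look like this (file names made up; -s
keeps cmp quiet, so only the exit status matters):

    if cmp -s old.conf new.conf
    then
        echo "no changes"
    else
        echo "files differ (or one of them is missing)"
    fi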

And no discussion of diff would be complete without also mentioning patch.
Patch is a tool for applying incremental updates to a file; its input can be
specified in the ubiquitous diff format.
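The usual round trip looks something like this (the names are made up; -u
asks diff for the common "unified" format):

    diff -u original.c modified.c > changes.patch
    patch original.c < changes.patch     # apply the changes to the original copy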
kentn
response 20 of 22:  Dec 6 23:49 UTC 2010

There are also file comparison programs similar to those, such as sdiff,
wdiff, and colordiff.  All interesting to check out.  But diff and cmp
are by far my favorites.  It just depends on how much output I want to
see.

For those who use vim, the vi improved editor, there is a diff mode, vim
-d, for comparing 2 files side by side (similar to sdiff).  This works
nicely for merging code between two versions of a file.
ball
response 21 of 22:  Jul 16 16:32 UTC 2015

    dd is my Unix Tool of the Day.  I find it useful for
wiping hard disks (I read input from /dev/zero and write
output to the raw device file for the disklabel partition
that covers the whole disk) and for making images of disk
drives and flash cards.  It can also be used the other way:
to write an image file to a disk, flash card or USB flash
drive.  Use it with care though: type the wrong parameter
as root and it's easy to accidentally overwrite the wrong
disk.
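For the record, the incantations look something like this; the device name
below is only a placeholder (a BSD-style raw "whole disk" partition), and a
mistyped of= really can destroy the wrong disk:

    dd if=/dev/zero of=/dev/rsd1c bs=64k      # wipe an entire disk
    dd if=/dev/rsd1c of=backup.img bs=64k     # image a disk or flash card into a file
    dd if=backup.img of=/dev/rsd1c bs=64k     # write an image back out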
butiki
response 22 of 22:  Sep 4 23:47 UTC 2017

To add to the list of tools, xargs, which allows you to convert lines in
standard input to arguments to a command. I often use it in various pipelines,
for instance where I select a bunch of files to add to a tarball:

   grep '\.c$' packing-list | xargs tar cvf source.tar

Another place where I use it is as an alternative to using find's -exec
switch, which I never really got the hang of:

   find . -name '*.c' | xargs wc -l

The above finds all the C source files in and below the current directory
and counts the total number of lines (with a summary at the end). In the
above case, using -exec IIRC runs `wc -l` on each file separately, which
means you don't get a total at the end. Do note, though, that with a very
long argument list, xargs splits it across several invocations of the
command (that is how it stays under the system's argument length limits),
which can have a similar effect on the total.

Normally, xargs splits arguments on whitespace. What if you have files with
spaces in their names? xargs has a switch, -0, that (ab)uses the null
character as the argument separator; when combined with find's -print0
switch, file names with spaces pass through safely.
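For instance, the line-counting example above becomes:

   find . -name '*.c' -print0 | xargs -0 wc -l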