cross
response 6 of 22:
Sep 22 12:03 UTC 2010
resp:5 Nice. Certainly, example (3) could be rewritten as:
scp 'grex.org:/bbs/agora33/_*' .
I'm pretty sure that's what John meant when he said it was
particularly hideous.
Btw: I don't know if you're using Bash or some such shell, but it may
not be a bad idea to quote the "*" in the back-tick'ed part of the for
loop in (3).
An often handy flag for ssh is '-C', which compresses the data
stream. That can speed up things like this, at the expense of a
little more CPU and RAM usage on either end of the connection.
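For instance, something along these lines (the host and remote path are
borrowed from the scp example above; the local file name is made up):
ssh -C grex.org 'cat /bbs/agora33/_*' > agora33-items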
If you don't mind running grep on Grex instead of the local end of
your connection, you could substitute 'grep' for 'cat' in (1). There
used to be people on USENET who complained about "useless uses of cat"
and there was an award. However, note how this differs: the pipeline
is spread across the network, and there may be legitimate reasons that
one does not want to run grep on Grex ("cat" is a much, much simpler
program). There are legitimate reasons to use cat in pipelines; this
could be one of them.
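To make the difference concrete, here's a sketch of both arrangements
(the search word and paths are just for illustration):
ssh grex.org 'grep meeting /bbs/agora33/_*'         # only matching lines cross the network
ssh grex.org 'cat /bbs/agora33/_*' | grep meeting   # everything crosses, then gets filtered here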
I think that John's examples are very much in the spirit of Unix tool
usage; 'ssh' certainly counts in this case, particularly for its
ability to spread computation across the network. That's an oft-
neglected capability, usually overlooked by people who simply want to
use it interactively. I claim that using, e.g., 'scp' is
certainly more "Unixy" than, say, the sftp or ftp commands.

cross
response 7 of 22:
Sep 22 12:33 UTC 2010
Today's Unix tool o' the day is: ed. The Standard Editor.
And that's what it is: ed is a line oriented text editor. One uses it
by running 'ed file' and then typing in commands from the ed command
set. See the ed(1) manual page, or the book, "The Unix Programming
Environment" for information on how to use ed.
Ed is not a graphical editor; it does not display a full screen-full
of information at a time. In fact, it won't display any part of the
file unless you tell it to by running special ed commands; ed was
designed for use on printing teletype terminals (i.e., terminals that
output to an actual typewriter-like device, instead of a graphic
screen). Its diagnostics are horrible in their simplicity: usually,
if you do something ed does not understand, you get a line with the
single character '?' on it, all by itself. Some have called it
the, "What you get is what you get" or WYGIWYG text editor.
Ed is probably the first Unix text editor. I say "probably" because
it's certainly been rewritten several times, and they tried several
variants out at Bell Labs. It is descended from the QED editor that
was used on the Multics system and CTSS. Ed was implemented mostly by
Ken Thompson in the early 1970s, and went on to inspire the Sam and
acme text editors written by Rob Pike. It is called "the Standard
Editor" because it was the only traditional text editor that came with
early Unix distributions.
So: ed is not graphical, has horrible diagnostics and an arcane command
set, and was designed for 1960s standards. Why on earth would anyone
want to learn it now?
Because it's tremendously useful as a tool for editing files in place.
Given that it reads commands from standard input, it can also
be used as a target in a pipeline: one can write scripts that generate
ed commands that are then piped into ed running on a file; this is
handy for batch-mode editing without using temporary files (ed might
make a temporary file, but the script-writer won't have to worry about
that).
Have you got 1000 files in a directory that all have to have the
word "UNIX" changed to "Unix Operating System" in them? Here's a
simple way:
for file in *.html
do
    (echo 'g/UNIX/s/UNIX/Unix Operating System/g'; echo w; echo q) | ed "$file"
done
Done. Note that some versions of ed(1) support chaining some commands
together, and some support combining the 'w' and 'q' commands into a
single 'wq', but the original ed does not.

remmers
response 8 of 22:
Sep 22 13:26 UTC 2010
Re resp:6 - I use bash, and example (3) worked without quoting the
asterisk in the command substitution.
Re resp:7 - My introduction to Unix was with Version 6 in 1981 on
a PDP-something-or-other. (My god that was almost 30 years ago!)
The editor choices were 'ed' and a full-screen editor from the Rand
Corporation whose name I don't remember. The vi editor may have
existed at that point but was not available on Version 6. The Rand
editor was pretty resource-intensive and anyway didn't support the
dumb terminal I used at home for dialup. So I had to get pretty
familiar with ed, and even though I've scarcely used it for many
years, the basic commands are still baked into my fingers. I agree
that it's still useful for batch processing.
Is it true that the standalone grep and sed commands are spinoffs of
facilities built into the original ed?

cross
response 9 of 22:
Sep 22 13:56 UTC 2010
resp:8 Right. If bash doesn't match a wildcard pattern, it will just
pass the wildcard as a string to whatever command, instead of an
error. Csh and tcsh have the opposite behavior; Bourne shell
derivatives probably vary, but I'm not sure.
It is true that grep and sed were spinoffs of ed.
For the first, the idiom in ed to print all lines matching some
regular expression is:
g/regular expression/p
The 'g' says to apply the following command against all lines in the
file. The regular expression is, of course, the regular expression,
and p is the command to print a line (if it matched the RE). This
functions as a loop; for each line in the file, if it matches the
regular expression, print the line (ordinarily regular expression
searches are only within the current line). The etymology of grep is
thus clear: g-re-p, or 'global regular expression print.'
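To see the lineage, here's the same search done both ways (the file name
and pattern are made up; ed will also print a byte count when it reads
the file):
(echo 'g/October/p'; echo q) | ed notes.txt
grep 'October' notes.txt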
Sed was similar; it implements much of the ed command language, but
against a stream of data. "Sed" is short for "Stream EDitor", or "S
Ed", which is just written as 'sed.' Sed was freed from ed when pipes
were added to Unix. In particular because 'ed' only works on files,
not streams, and (I'm speculating here) it was probably felt that
augmenting ed to work with streams was not a good idea for a few
reasons. First, it would not have fit in with the whole "software
tools" philosophy. Second, I imagine that it was felt that the
command sets would diverge, given the different usage patterns and
demands of files versus streams. For instance, the 'g' command is
gone because it is presumed (given that you're working with a stream)
that you want to work with all lines; the notion of a 'current line'
selected by the user as a unit of work doesn't make much sense when
you are implicitly iterating over all lines, so 'g' is the default.
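A rough sketch of the contrast, with a made-up substitution and file
names:
(echo 'g/teh/s/teh/the/g'; echo w; echo q) | ed draft.txt   # you name the file; 'g' selects the lines
sed 's/teh/the/g' draft.txt > fixed.txt                     # every line of the stream, implicitly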

tsty
response 10 of 22:
Sep 23 00:26 UTC 2010
i think it was 1989 when gordon good taught us umich consultants about how unix
got put together with an almost identical explanation to cross's above.
i was stunned at the purity of the concept ... still am

vsrinivas
response 11 of 22:
Nov 30 13:07 UTC 2010
Terribly handy UNIX tool of the day: script(1).
script just starts a shell (or another program); it then records the
entire terminal session into a file (default 'typescript'). Very handy
when you want to record some output from a program or remember what you
did.
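Typical use looks something like this (the file name is just an example):
script install-notes.txt    # starts a new shell; everything after this is recorded
make install                # ...whatever you want a record of...
exit                        # leaving the shell ends the recording
less install-notes.txt      # the whole session, warts and all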

kentn
response 12 of 22:
Nov 30 13:54 UTC 2010
Yes, that's a handy one. I just used that yesterday, while updating
my FreeBSD system, to keep the log messages from the kernel and world
makes. Those tend to produce many lines of output and script grabs
it all very nicely.

remmers
response 13 of 22:
Dec 1 19:55 UTC 2010
Which reminds me of another handy Unix tool that I don't think has been
mentioned yet: tee(1). It lets you capture the output of intermediate
steps in a pipeline to files.

cross
response 14 of 22:
Dec 1 21:21 UTC 2010
Script is very handy, and tee is a wonderfully light-weight way of
getting much of the functionality (or of saving some intermediate
step in a pipeline for later use).
One of the beauties of the pipe model is that the same command can
be used multiple times in the same job. So for instance, something
I did today was extract some package dependencies for installed
packages here on Grex and translate those into URLs so I could
then download them. Suppose I wanted to save a list of both the
package names as well as the resulting URLs: a command to do that
might look like the following:
; grep '^dependencies:' file |
sed 's/^.*: //' |
tr ' ' '\012' |
tee packages |
sed 's,^,http://ftp.openbsd.org/pub/OpenBSD/4.8/packages/i386/,;s/$/.tgz/' |
tee urls |
sed 's/^/ftp /' |
sh
Now everything downloads, and I've saved not only the package names,
but also the URLs. Note that I could get rid of the "grep" command,
by wrapping it into the first 'sed', thus making it something like,
"sed -ne 's/^dependencies: //p' file" but I chose to leave it in
for expository purposes. For that matter, I could have tightened
the RE in the first sed command a little in case a package name had
a colon in it (^[^:]*:). None did in my real-life example, but I
guess it's possible. Again, I didn't want to get too far down in
the weeds.
So I've got two invocations of tee, plus three of sed, and I'm
generating output for the shell (an often under-used technique).
I think in systems that force you to operate within the confines
of what one may call a "monolithic" interface, there's less flexibility
to combine tools in this manner. Truly, if you can use the same
tool multiple times in the same job, to do different things, you
gain enormous power and flexibility.

tsty
response 15 of 22:
Dec 2 08:17 UTC 2010
and how much different is script from tee, which i have used a
buncha times?

cross
response 16 of 22:
Dec 2 08:56 UTC 2010
script allocates a pseudo-terminal and starts another interactive
session on it; it records everything from that session (both input
and output) into a file. You could run a full-screen editor like vi
or emacs under a script session and see all the escape codes, etc.,
in the resulting file.
Tee is like a tee-joint in a water pipe: it writes whatever data
it reads to a file *and* to stdout. It's much simpler.

nharmon
response 17 of 22:
Dec 6 16:30 UTC 2010
Script reminds me of one of my favorite Unix tools, screen(1). It allows
you to create multiple pseudo-terminals, and switch between them, sort
of like switching through virtual consoles. The feature I really find
handy is the ability to detach from and reattach to the pseudo-terminals
without closing them.
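The basic cycle looks roughly like this:
screen          # start a session and work in it as usual
                # ...press Ctrl-a d to detach, leaving everything running...
screen -ls      # from a later login, list your detached sessions
screen -r       # reattach and pick up where you left off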

cross
response 18 of 22:
Dec 6 21:32 UTC 2010
Screen can be wonderfully handy, particularly if you're coming in over an
unreliable link of some kind.

cross
response 19 of 22:
Dec 6 21:45 UTC 2010
Today's Unix Tools o' the day are for comparing files.
When you start to consider text and files as the basic units of data
exchanged between programs, comparing that text and those files becomes
an important issue. Since Unix is very much based around the idea
of text as a basic representation of data, the Unix authors wisely wrote some
tools to help out in this regard.
The three basic tools most commonly used here are:
1. diff - Compute the differences between lines of text in two separate files.
2. comm - Show the commonalities (and differences) between two separate files.
3. cmp - Compare two files.
Diff shows you in a variety of different formats what the differences between
two files are. Want to see how one line of code differs from previous
versions? Diff is the answer (indeed, it's so useful that most revision
control systems have a built-in diff command to see differences in the
revision history). Early versions of diff actually output 'ed' commands,
so that changes could be made to a file to match a "diff" by simply
piping into ed! Now, that's power.
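That behavior survives as the -e flag, so the old trick still works
(file names made up; the 'w' and 'q' get appended by hand, as in the
earlier ed examples):
diff -e old.c new.c > edits                 # ed commands that turn old.c into new.c
(cat edits; echo w; echo q) | ed old.c      # apply them; old.c now matches new.c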
Comm shows you what is common between two files (or what is in one or the
other but not in both). It can be extremely useful in a shell pipeline for
finding differences from a baseline. In this sense, it is very much a
traditional Unix filter, rather than a standalone tool.
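One thing to remember is that comm expects its inputs to be sorted.
A sketch, with made-up file names:
sort list-before > baseline
sort list-after > current
comm -12 baseline current    # lines that appear in both files
comm -13 baseline current    # lines only in 'current', i.e. new since the baseline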
Finally, cmp is a general utility for detecting whether differences exist
between files at the byte level; it works on binary files just as well as text
files. It is less sophisticated than diff (which can take options to ignore
whitespace and so on), but can be great in shell scripts where one wants to
test for differences, but doesn't necessarily care what the differences are.
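In a script that might look something like this (the file names are
placeholders):
if cmp -s new-config old-config
then
    echo 'no change'
else
    echo 'files differ (or one of them is missing)'
fi
The -s flag keeps cmp quiet; the exit status alone says whether the
files match.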
And no discussion of diff would be complete without also mentioning patch.
Patch is a tool for applying incremental updates to a file; its input can be
specified in the ubiquitous diff format.
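The usual round trip looks something like this (a unified diff via -u
is the common interchange format these days; the file names are made up):
diff -u old.c new.c > fix.patch     # record the changes
patch old.c < fix.patch             # apply them, turning old.c into new.c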

kentn
response 20 of 22:
Dec 6 23:49 UTC 2010
There are also file comparison programs similar to those, such as sdiff,
wdiff, and colordiff. All interesting to check out. But diff and cmp
are by far my favorites. It just depends on how much output I want to
see.
For those who use vim, the vi improved editor, there is a diff mode, vim
-d, for comparing 2 files side by side (similar to sdiff). This works
nicely for merging code between two versions of a file.

ball
response 21 of 22:
Jul 16 16:32 UTC 2015
dd is my Unix Tool of the Day. I find it useful for
wiping hard disks (I read input from /dev/zero and write
output to the raw device file for the disklabel partition
that covers the whole disk) and for making images of disk
drives and flash cards. It can also be used the other way:
to write an image file to a disk, flash card or USB flash
drive. Use it with care though: type the wrong parameter
as root and it's easy to accidentally overwrite the wrong
disk.
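For what it's worth, the incantations look roughly like this. The device
names are only examples (OpenBSD-style, where the 'c' partition covers the
whole disk), the block-size spelling varies (1m on the BSDs, 1M with GNU
dd), and you should triple-check the of= device, because dd will cheerfully
destroy whatever you point it at:
dd if=/dev/zero of=/dev/rsd0c bs=1m             # wipe the whole of disk sd0
dd if=/dev/rsd1c of=card-backup.img bs=1m       # image a flash card
dd if=install.img of=/dev/rsd1c bs=1m           # write an image back out to it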

butiki
response 22 of 22:
Sep 4 23:47 UTC 2017
To add to the list of tools, xargs, which allows you to convert lines in
standard input to arguments to a command. I often use it in various pipelines,
for instance where I select a bunch of files to add to a tarball:
grep '\.c$' packing-list | xargs tar cvf source.tar
Another place where I use it is as an alternative to using find's -exec
switch, which I never really got the hang of:
find . -name '*.c' | xargs wc -l
The above finds all C source files in the current directory and counts the
total number of lines (with a summary at the end). In the above case, using
-exec IIRC runs `wc -l` on each file separately, which means you don't get
a total at the end. However, do note that argument-length limits still
apply: if the list of names gets too long, xargs splits it up and runs
the command more than once, so you may end up with more than one total.
Normally, xargs splits arguments via whitespace. What if you have files with
spaces in them? xargs has a switch, -0, that (ab)uses the null character as
an argument separator; when used with find's -print0 switch as well, files
with spaces can be handled safely in pipelines like the one above.
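So the line-counting example above becomes, with those two switches:
find . -name '*.c' -print0 | xargs -0 wc -l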