Grex Systems Item 20: The sed, awk, regular expressions and other tools item
Entered by cross on Sat Sep 16 23:25:04 UTC 2006:

Sed, awk, and Unix filters are the powertools of the Unix environment.  Unix
tools have long been famous for accepting cryptic regular expressions for
describing textual input and acting on it accordingly; indeed, the first
pipeline tool was grep: "Global Regular Expression Print."  All of these tools
have gone through several iterations and they can be confusing; ask your
questions about Unix shell tools, sed, awk and nawk, and regular expressions
here.

7 responses total.



#1 of 7 by yuni on Tue Sep 30 00:35:35 2014:

Hello all. I am trying to transfer some directories from Linux to
Windows. The problem is that the files on Linux have colons in their
names, which Windows does not allow. I cannot rename the originals,
since they are needed as they are on the server, so I need to copy
them to names that Windows can use. For example, the name of a
directory on the server might be:

IAPLTR2b-ERVK-LTR_chr9:113137544-113137860_-

while I need it to be:

IAPLTR2b-ERVK-LTR_chr9-113137544-113137860_-

I have about sixty of these directories and I have collected the names
of the files with their absolute paths in a file I call directories.txt.
I need to walk through this file changing the colons to hyphens. Thus
far, my attempt is this:

#!/bin/bash

$DIRECTORIES=`cat directories.txt`
for $i in $DIRECTORIES;
do
    cp -r  "$DIRECTORIES" "`echo $DIRECTORIES | sed 's/:/-/'`"
done

However I get the error:

./my_shellscript.sh: line 10:
=/bigpartition1/JKim_Test/test_bs_1/129c-test-biq/IAPLTR1_Mm-ERVK-LTR_chr
10:104272652-104273004_+.fasta: No such file or directory
./my_shellscript.sh: line 14: `$i': not a valid identifier

Can anyone here help me identify what I am doing wrong and maybe what I
need to do?

Thanks in advance.


#2 of 7 by cross on Tue Sep 30 14:19:32 2014:

If I understand your problem statement, the colons can be inside
of filenames, not just directory names, right?

It looks like you're applying sed to the names of directories, but
directories are (in this sense) just containers for files.  Further,
your command is saying, basically, to copy a bunch of directories
into other directories on your remote host.

What you probably need to do instead is apply the command to the
names of *files*.  I would generate a simple shell script that would
execute the appropriate copy commands.  Something like:

% find `cat directories.txt` -type f -print |
      awk '{a=$0; gsub(/:/,"-",a); print "scp "$0,"remote.host:"$a}' >s.sh

Then, you can have a look at s.sh and make sure that it does
approximately what you want.

I think you will want to make sure that the directories it copies
into exist on the windows side *before* attempting to copy.  That's
relatively easy: you can just take the script generated above, and
use that to generate commands that you could give to e.g. PowerShell.
On the Unix side, you'd create a list of directories as:

% awk '{d=$NF; sub(/^[^:]*:/,"",d); sub(/[^\/]*$/,"",d); if (!a[d]++){print "mkdir "d}}' s.sh >d.psh

Then copy 'd.psh' to the windows side, change to the appropriate
directory, and run 'psh d.psh'.  You could also name it 'd.cmd' and
run it as a batch file, and it will create the necessary (empty) directories.
Now you run the copy script back on the Unix side of things and,
barring network failures, it should populate the directories on the
windows machine with the files of interest.
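
If the renamed copies can be made on the Linux side first (the cp -r
approach from #1), a minimal sketch of that loop might look like the
following. In the original script, "$DIRECTORIES=" and "for $i" both
carry a stray "$", the loop body uses $DIRECTORIES where it means $i,
and the sed expression lacks the "g" flag, so only the first colon
would change. The function name copy_renamed is made up for
illustration, and the sketch assumes one path per line with colons only
in the last path component.

```shell
#!/bin/bash
# Copy every path listed in a file to a sibling whose name has ':' replaced
# by '-'. copy_renamed is a hypothetical helper name; the list file is
# assumed to hold one path per line.
copy_renamed() {
    while IFS= read -r src; do
        [ -n "$src" ] || continue              # skip blank lines
        # The g flag replaces every colon, not just the first one.
        dest=$(printf '%s\n' "$src" | sed 's/:/-/g')
        [ "$dest" = "$src" ] && continue       # no colon, nothing to do
        cp -r -- "$src" "$dest"                # the original is left untouched
    done < "$1"
}

# Run against directories.txt only when that file is present.
if [ -f directories.txt ]; then
    copy_renamed directories.txt
fi
```

Reading the list with "while read" instead of `cat` in backquotes keeps
paths containing spaces in one piece.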


#3 of 7 by yuni on Wed Oct 1 00:31:47 2014:

Hello cross, I really appreciate your kind assistance. I am going to try
what you suggested tonight. It is something that I will have to do a bit
more of in the future, and shell scripting was never my strong point.
Again, I deeply appreciate your response. 


#4 of 7 by cross on Wed Oct 1 01:16:41 2014:

No problem!  I'm happy to help.  Btw: I made a mistake in the first command
I gave you.  It should be 'a' where it is now '$a'.  E.g.:

 % find `cat directories.txt` -type f -print |
       awk '{a=$0; gsub(/:/,"-",a); print "scp "$0,"remote.host:"a}' >s.sh

Sorry about that!
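
To see what one generated line looks like without touching any real
files, the corrected awk program can be fed a single sample path. This
is only a sketch: the path below is made up, "remote.host" is the same
placeholder as above, and gen_scp is a hypothetical wrapper name.

```shell
# Wrap the awk program so it can be tried on arbitrary input.
gen_scp() {
    awk '{a=$0; gsub(/:/,"-",a); print "scp "$0,"remote.host:"a}'
}

# One made-up path in; one scp command out.
printf '%s\n' '/data/LTR_chr9:113137544-113137860_-/seq.fasta' | gen_scp
```

The source keeps its colons and only the destination is rewritten. The
generated commands are not quoted, so file names containing spaces
would need extra handling.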


#5 of 7 by papa on Mon Jan 15 04:07:06 2018:

A regexp puzzle

(Actually, I'm working on a Lua string pattern, but they're close 
enough to regexp's I think it's worthwhile asking here.)

I want a regexp that matches file path strings iff the file has a 
given base name. That is, a regexp that matches the first two of 
the following strings but not the third:

/some/path/to/file/basename
basename
notbasename

The following regexp should work, but when I try it none of the 
strings match.

^(.*/)basename$

(select lines that exactly match "basename", or "basename" and an 
optional prefix string that ends with "/")

What am I missing?



#6 of 7 by papa on Mon Jan 15 04:10:16 2018:

Sorry, I meant to type the following for the problematic regexp:

^(.*/)?basename$


#7 of 7 by papa on Mon Jan 15 12:24:46 2018:

I'll answer my own question.

The regexp DOES work with grep, but it is necessary to specify the extended
regexp option (-E, or run as `egrep`).

However, Lua's string library doesn't support extended regexps. Lua's string
pattern syntax is intermediate between globs and extended regexps in
complexity.

One way to do it in Lua is to test against two patterns ("^basename$",
"/basename$"), but I may look for a way to do it with simpler string
operations.
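
Both approaches can be checked from the shell against the three sample
strings from #5. This is a rough sketch using grep rather than Lua; the
function names match_ere and match_two are made up for illustration.

```shell
# Sample paths from #5: the first two should match, the third should not.
paths='/some/path/to/file/basename
basename
notbasename'

# One extended regexp; -E is what makes ( ) and ? act as operators.
match_ere() { grep -E '^(.*/)?basename$'; }

# The two simpler patterns, which work as plain basic regexps.
match_two() { grep -e '^basename$' -e '/basename$'; }

printf '%s\n' "$paths" | match_ere
printf '%s\n' "$paths" | match_two
```

Each command should print the first two strings and skip
"notbasename".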
