[wellylug] more basics / sorting files
Brent Wood
pcreso at pcreso.com
Thu Dec 23 19:54:54 NZDT 2004
--- "E.Chalaron" <e.chalaron at xtra.co.nz> wrote:
> Hi there
>
> I have a rather simple question.
That's not the issue. What's more relevant is how simple is the answer :-)
> I have 2 files,
> one (A) contains on each line a word
> the second (B) contains several data per line including the word in A.
This is one of several possible approaches; hopefully it points you towards a
viable solution :-) Shell scripts provide a quick and dirty way of doing
this. For large jobs perl can be orders of magnitude faster, if you really
wanna learn perl. But for simple one-offs like this, a few lines of shell
script work fine...
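Note that depending on your shell the ">>" used to append output may refuse
to create a file that doesn't exist yet (csh/tcsh with noclobber set, for
example), in which case you'll need a different redirection. In bash, which
these scripts use, ">>" creates the file if needed, so hopefully this will be
OK...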
So:
I assume that the ref no is always in the same field (ie, always the 3rd (or
whatever) column), and that the fields are separated by single spaces (note
'man cut' might help)
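A quick way to check that assumption on a sample line before running anything
over the whole file. This is only a sketch: the helper name third_field and
the sample lines are made up, so substitute a real line from file_B.

```shell
#!/bin/bash
# Hypothetical helper: print the 3rd space-separated field of each input line.
third_field() {
    cut -f3 -d" "
}

# Made-up sample line: name, colour, reference number, weight
echo "widget blue 1042 3.5kg" | third_field     # prints 1042

# Two spaces in a row count as an empty field, so the numbering shifts
echo "widget  blue 1042 3.5kg" | third_field    # prints blue
```

If the second case looks like your data (runs of spaces or tabs), cut with a
single-space delimiter will pick the wrong column and the field number or the
tool needs adjusting.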
#!/bin/bash
#read in each line from the file (IFS= and -r keep spaces & backslashes as is)
while IFS= read -r LINE ; do
    #extract the KEY field from this line - try man cut for help with cut
    # this example sets the delimiter to a space (-d" ") & grabs the 3rd field
    KEY=$(echo "$LINE" | cut -f3 -d" ")
    #write the line to a file which uses the key field as part of the name
    echo "$LINE" >> "file_${KEY}.txt"
done < file_B
This will write all lines from B to new files with $KEY as part of the name,
so all lines sharing a given KEY value end up in the same file, one file per
KEY. A more complex version, which only writes lines where the KEY field
matches an entry in a list (file_A), could be:
#!/bin/bash
while IFS= read -r LINE ; do
    #as described in the previous example
    KEY=$(echo "$LINE" | cut -f3 -d" ")
    #see if the KEY is in the list in A, ie, record the count from wc of matching
    # lines (use man wc if you are not familiar with wc - it returns the nos of
    # lines, words & letters (chars/bytes) in a file, -l is just the no of lines)
    # I assume you have come across grep b4... otherwise man grep for info
    # -x matches whole lines only & -F treats the KEY as plain text, not a regex
    COUNT=$(grep -xF -- "$KEY" file_A | wc -l)
    #if the count != 0 then yes, it is listed in file_A, so write it out as above
    # otherwise don't write it & jump to the next line
    if [ $COUNT -ne 0 ] ; then
        #write the line using ">>" to append it to the file (> will overwrite)
        echo "$LINE" >> "file_${KEY}.txt"
    fi
done < file_B
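If file_B is large, running grep once per line gets slow. A rough sketch of
one alternative, assuming bash 4 or later (needed for associative arrays):
read file_A into an array once, then test each KEY against it. The function
name split_matching and its two arguments are made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper, not part of any standard tool:
#   split_matching KEYFILE DATAFILE
# Needs bash 4+ for associative arrays (local -A).
split_matching() {
    local keyfile=$1 datafile=$2
    local -A wanted
    local word line key

    # load every non-empty line of KEYFILE as a key to keep
    while IFS= read -r word ; do
        if [ -n "$word" ] ; then
            wanted["$word"]=1
        fi
    done < "$keyfile"

    # same loop as before, but the lookup is an array test instead of grep
    while IFS= read -r line ; do
        key=$(echo "$line" | cut -f3 -d" ")
        if [ -n "$key" ] && [ -n "${wanted[$key]}" ] ; then
            echo "$line" >> "file_${key}.txt"
        fi
    done < "$datafile"
}

# usage, with the file names from the question:
# split_matching file_A file_B
```

The output files are the same as with the grep version; the only difference
is that file_A is read once instead of once per line of file_B.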
There are other approaches for slightly different scenarios/data structures,
naming schemes, etc...
but you might try these to get a feel for wc/echo use
echo "" | wc -l
Which will return a 1 coz echo puts out a null string terminated by a LF, ie: 1
line
touch ttt
cat ttt | wc -l
Which will return 0 as the empty file ttt has no lines
or, just for interest & with bash, other shells may vary on syntax
echo -e "\c" | wc -l
which should return 0, as \c tells echo NOT to write the LF at the end...
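The echo checks above can be run together as below. This is a small sketch
that uses printf in place of echo -e, since printf never adds a LF of its
own and behaves the same across shells. count_lines is just a made-up
wrapper name; the tr strips the padding spaces some versions of wc put in
front of the number.

```shell
#!/bin/bash
# Hypothetical wrapper: count the lines on stdin, without wc's padding spaces
count_lines() {
    wc -l | tr -d ' '
}

echo "" | count_lines          # 1: an empty string plus the LF from echo
printf '' | count_lines        # 0: nothing written at all
printf 'a\nb' | count_lines    # 1: wc -l counts LFs, so the unterminated "b" is missed
```

The last case is worth knowing about: a file whose final line has no LF at
the end will show one line fewer in wc -l than you might expect.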
Hope this helps.....
Brent