[wellylug] more basics / sorting files

Thu Dec 23 19:54:54 NZDT 2004

--- "E.Chalaron" <e.chalaron at xtra.co.nz> wrote:

> Hi there
> 
> I have a rather simple question.

That's not the issue. What's more relevant is how simple is the answer :-)

> I have 2 files, 
> one (A) contains on each line a word
> the second (B) contains several data per line including the word in A.

This is one of several possible approaches; hopefully it points you towards a
viable solution :-)  Shell scripts provide a quick and dirty way of doing
this. For large jobs perl can be orders of magnitude faster, if you really
wanna learn perl, but for simple one-offs like this a few lines of shell
script work fine...

Note that, depending on your shell and its settings, using ">>" to append
output may not create the file if it doesn't already exist. In that case
you'll need a different redirection to make it work, but hopefully this will
be OK...
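
For what it's worth, in bash itself ">>" creates the file when it is missing,
so the scripts further down should run unchanged there. A quick way to check on
your own system, as a rough sketch (the function name append_demo and the file
name out.txt are made up for illustration):

```shell
#!/bin/bash
# append_demo is a hypothetical helper, not part of the scripts in this mail.
# It appends two lines to a file that does not exist yet, then prints the
# file, to show that ">>" creates the file on first use.
append_demo() {
  local dir
  dir=$(mktemp -d) || return 1
  echo "first"  >> "$dir/out.txt"   # file does not exist yet: created here
  echo "second" >> "$dir/out.txt"   # file exists now: line is appended
  cat "$dir/out.txt"
  rm -rf "$dir"
}

append_demo
```

If both lines come back, ">>" is fine on your system.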

So:

I assume that the ref no is always in the same field (ie, always the 3rd (or
whatever) column), and that the fields are separated by single spaces (note
'man cut' might help)
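
To see what cut does with those assumptions before running it over a whole
file, something like this may help. The sample line is invented, your file_B
will look different:

```shell
#!/bin/bash
# Sample line invented for illustration; the real layout of file_B may differ.
LINE="widget 2004-12-23 REF123 some more data"

# -d" " sets the delimiter to a single space, -f3 selects the 3rd field
KEY=$(echo "$LINE" | cut -f3 -d" ")
echo "$KEY"
```

This prints REF123. Note that cut treats every single space as a delimiter, so
two spaces in a row give an empty field and shift the count by one.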

#!/bin/bash
#read in each line from the file (IFS= and -r keep the line exactly as written)
while IFS= read -r LINE ; do
  #extract the KEY field from this line - try man cut for help with cut
  # this example sets the delimiter to a space (-d" ") & grabs the 3rd field
  KEY=$(echo "$LINE" | cut -f3 -d" ")

  #write the line to a file which uses the key field as part of the name
  echo "$LINE" >> "file_${KEY}.txt"
done < file_B

This will write all lines from B to new files with $KEY as part of the name,
so all lines sharing a given KEY value end up together in one file, with a
separate file per KEY. A more complex version, which only writes lines where
the KEY field matches a list (file_A) could be:

#!/bin/bash
while IFS= read -r LINE ; do
  #as described in the previous example above
  KEY=$(echo "$LINE" | cut -f3 -d" ")

  #see if the KEY is in the list in A, ie, record the count from wc of matching
  #   lines (use man wc if you are not familiar with wc - it returns the number
  #   of chars/bytes, words & lines in a file, -l is just the number of lines)
  #   I assume you have come across grep before... otherwise man grep for info
  #   -F treats the KEY as plain text & -x only matches whole lines, so a KEY
  #   of "12" will not match a line "123" in file_A; quoting "$KEY" stops an
  #   empty KEY from making grep use file_A as the pattern
  COUNT=$(grep -F -x -- "$KEY" file_A | wc -l)

  #if the count != 0 then yes, it is listed in file_A, so write it out as above
  #  otherwise don't write it & jump to the next line
  if [ $COUNT -ne 0 ] ; then
     #write the line using ">>" to append it to the file (> will overwrite)
     echo "$LINE" >> "file_${KEY}.txt"
  fi
done < file_B

There are other approaches for slightly different scenarios/data structures,
naming schemes, etc...
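
As one example of such a variation: running cut and grep once per line gets
slow on big files. A sketch that reads file_A once into a lookup table and lets
read do the field splitting is below. It assumes bash 4 or later (for
associative arrays) and fields separated by any run of spaces or tabs, which is
slightly different from the cut version. The function name split_by_key and its
three arguments are my own invention:

```shell
#!/bin/bash
# Sketch only: needs bash 4 or later for associative arrays.
# split_by_key is a hypothetical name; the arguments are the key list (file_A),
# the data file (file_B) and the directory to write the output files into.
split_by_key() {
  local keys_file=$1 data_file=$2 out_dir=$3
  local -A wanted=()
  local key line f1 f2 rest

  # load every non-empty line of the key list into the lookup table
  while IFS= read -r key || [ -n "$key" ]; do
    [ -n "$key" ] && wanted[$key]=1
  done < "$keys_file"

  # let read split off the 3rd whitespace-separated field itself
  while IFS= read -r line || [ -n "$line" ]; do
    read -r f1 f2 key rest <<< "$line"
    # only write lines whose key is present in the lookup table
    if [ -n "$key" ] && [ -n "${wanted[$key]:-}" ]; then
      echo "$line" >> "$out_dir/file_${key}.txt"
    fi
  done < "$data_file"
}
```

You would call it as: split_by_key file_A file_B .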

but you might try these to get a feel for wc/echo use

echo "" | wc -l

which will return 1, because echo writes an empty string terminated by a LF,
ie: 1 line

touch ttt
cat ttt | wc -l

Which will return 0 as the empty file ttt has no lines

or, just for interest, with bash (other shells may vary on syntax):
echo -e "\c" | wc -l

which should return 0, as \c tells echo NOT to write the LF at the end...
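
If "echo -e" behaves differently in your shell, printf is a more predictable
way to run the same experiment, since it only writes a LF when you ask for
one. A small sketch (count_lines is a made-up helper name):

```shell
#!/bin/bash
# count_lines is a hypothetical helper: it counts the lines arriving on stdin.
# The arithmetic expansion strips the padding some wc versions add.
count_lines() {
  echo $(( $(wc -l) ))
}

printf ''      | count_lines   # nothing written at all
printf 'abc'   | count_lines   # some text but no trailing LF
printf 'abc\n' | count_lines   # one complete line
```

This prints 0, 0 and 1: wc -l counts LF characters, so text without a
trailing LF does not count as a line.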

Hope this helps.....

   Brent