[wellylug] Regex, replace stuff in command line help...
Jim Cheetham
jim at gonzul.net
Thu Jun 29 10:22:42 NZST 2006
On Thu, Jun 29, 2006 at 10:14:59AM +1200, Jo Booth wrote:
> Trying to figure out how to get an id back out of a series of old
> saved web pages.... haven't had to use Regular Expressions for
> replacing stuff in wrath for a while...
>
> Bunch of links.. in a html file random scattered, but in tables etc...
> something like <a href="url or relative link/profile?u=mangee">
First question - are all these hrefs going to be completely on one line,
or might they be split onto two?
If they're on one line, you would probably get a long way by just
(inefficiently) chaining a series of greps ...
grep 'href=' * | grep '?u='
Look at the lines you get back out; if they are sufficiently uniform,
hand them over to your favourite text field extractor (I like cut; sed
and awk are good, perl is fine too). If the href is the only
interesting thing on the line, try grabbing the text between the second =
and the next " with
| cut -d= -f3 | cut -d'"' -f1
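Put together, the whole chain might look something like the sketch below.
The sample file, its name and the second link are made up for
illustration, so adjust the glob and the field numbers to whatever your
saved pages really contain. It also assumes each href sits on a single line.

```shell
# Hypothetical sample page standing in for the saved web pages.
workdir=$(mktemp -d)
cat > "$workdir/page1.html" <<'EOF'
<table><tr><td>
<a href="http://example.com/profile?u=mangee">
</td><td>
<a href="../profile?u=another_user">
</td></tr></table>
EOF

# Narrow down to lines holding a link with a ?u= parameter, then take
# the text after the second = and cut it off at the next double quote.
ids=$(grep 'href=' "$workdir"/*.html | grep '?u=' | cut -d= -f3 | cut -d'"' -f1)
printf '%s\n' "$ids"

# Tidy up the temporary sample.
rm -rf "$workdir"
```

If grep is given several files it prefixes each line with the file name,
but since that prefix normally contains no = sign, the third field is
still the id.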
Basically, I prefer homing in on the match, rather than trying an
all-or-nothing regexp first. Especially if this is a one-off process.
-jim