spam (was Re: [wellylug] Extracting IPs from spambucket)

Valient Gough vgough at pobox.com
Thu Dec 18 11:36:19 NZDT 2003


I've found the combination of Bogofilter and SpamAssassin to be very
effective.  I know SpamAssassin has added statistical checks recently,
but Bogofilter seems more advanced.  I have had the same email address
for years, and it is visible in plenty of places, so I get a fair amount
of spam (100 - 150 a day).  Usually about 4-8 get through the filters
each week.  Although I periodically check for false positives (a real
message marked as spam), I haven't run across one in many months.

Here's what my procmail spam rules look like:

## Check from address against whitelist..
# First, remove any existing X-Whitelist header..
:0fhb
* ^X-Whitelist
| $FORMAIL -IX-Whitelist

# remove spamassassin headers, so we don't confuse existing ones with
# our own
:0fhb
* ^X-Spam
| $FORMAIL -IX-Spam

# tag the message with "X-Whitelist: Yes" if the sender was found
:0
* ? $FORMAIL -x"From" -x"From:" -x"Sender:" -x"Reply-To:" \
     -x"Return-Path:" \
     | egrep -is -f $MDIR/whitelist
{
  :0fhb
  | $FORMAIL -a"X-Whitelist: Yes"
}
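
For anyone wondering what goes in the whitelist file: egrep -f reads one
pattern per line, so it is just a list of regular expressions matched
against the extracted header values.  A rough sketch for trying the
matching step outside procmail follows; the addresses and the
is_whitelisted helper are made up for illustration, and grep -E -q stands
in for the egrep call in the recipe.

```shell
#!/bin/sh
# Sketch only: the patterns below are invented examples, not a real whitelist.
WHITELIST=$(mktemp)
trap 'rm -f "$WHITELIST"' EXIT

cat > "$WHITELIST" <<'EOF'
friend@example\.org
@lists\.example\.net
EOF

# Exit status 0 if any header line on stdin matches a whitelist pattern
# (case-insensitive), non-zero otherwise.
is_whitelisted() {
    grep -E -i -q -f "$WHITELIST"
}

if printf 'From: A Friend <Friend@Example.org>\n' | is_whitelisted; then
    echo "whitelisted"
fi

if ! printf 'From: someone@spam.example\n' | is_whitelisted; then
    echo "not whitelisted"
fi
```

Escaping the dots keeps a pattern from matching lookalike addresses.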

## filter mail through bogofilter, tagging it as spam and
## updating the word lists
:0fw
| $BOGOFILTER -r -u -e -p

# if bogofilter failed, return the mail to the queue, the MTA will
# retry to deliver it later
# 75 is the value for EX_TEMPFAIL in /usr/include/sysexits.h
:0e
{ EXITCODE=75 HOST }


:0
* ^X-Whitelist: Yes
{
        # If bogofilter marked this as spam, then it got it wrong,
        # since we're processing white-list messages. Retrain BF
        :0c
        * ^X-Bogosity: Yes
        | $BOGOFILTER -Sn -r
}

# ELSE...  process non-whitelist messages
:0
* !^X-Whitelist: Yes
{
        # if bogofilter thinks it is spam, that's enough..
        :0
        * ^X-Bogosity: Yes
        $SPAMDIR

        # run spam assassin on it!
        #### Spam Assassin 2.5x
        :0fw: spamassassin.lock
        | $SPAMASSASSIN

        # if spam assassin thinks it is spam but bogofilter doesn't,
        # give preference to spam assassin and retrain BF
        :0
        * ^X-Spam-Status: Yes
        * ^X-Bogosity: No
        {
                # Retrain bogofilter
                :0c
                | $BOGOFILTER -Ns -r

                :0                
                $SPAMDIR
        }
}


That isn't a complete procmail file (I've left out the setup of various
variables and such).  I highly recommend a backup rule when playing with
procmail, such as this one, which keeps the last 100 messages in a backup
directory:

:0 c
$MDIR/backup

:0 ic
| cd $MDIR/backup && ls -t msg.* | sed "1,100d" | xargs rm -f
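
The pruning pipeline can be tried on dummy files before it is pointed at
real mail.  This is only a sketch: the scratch directory, the file names
and the prune_backup helper are invented for the test, and only the
ls/sed/xargs pipeline itself comes from the rule above.

```shell
#!/bin/sh
# Sketch: create 105 empty "messages" in a scratch directory, then prune
# with the same pipeline as the backup rule.
BACKUP=$(mktemp -d)
trap 'rm -rf "$BACKUP"' EXIT

i=1
while [ "$i" -le 105 ]; do
    : > "$BACKUP/msg.$i"
    i=$((i + 1))
done

prune_backup() {
    # ls -t lists newest first; sed drops the first 100 names from the
    # list, so only the older files are passed on to rm.
    ( cd "$1" && ls -t msg.* | sed "1,100d" | xargs rm -f )
}

prune_backup "$BACKUP"

# Print how many files are left after pruning.
ls "$BACKUP" | wc -l
```

With fewer than 100 files in the directory nothing is removed, since sed
then passes no names on.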


regards,
Valient

On Wed, 2003-12-17 at 13:22, Rob Stockley wrote:
> Today I've been messing with RBL and postfix. It all seems to work as
> advertised. Now I'm looking at a local hash table of IPs gleaned from
> messages that slip through the cracks. 
> 
> I have been using spamassassin for a while now. It's been a few months
> since I had a clean out and the spambucket consists of 9.3M of wasted
> space. I've been experimenting with ways of extracting the source IP's
> from this file.
> 
> I've googled a bit and run round in circles trying to find the right
> search terms. Either no one has done what I'm doing (unlikely) or it's
> got a common name that I've never heard of.
> 
> After a bit of playing I've got the following: Pipe the file through
> this command line:
> 
> |formail -c -s script.sh | sort | uniq > list_of_bad_ips
> 
> The file script.sh contains the following:
> 
> #!/bin/sh
> # Script to extract source IP from mail message
> formail -U "Received" | formail -x "Received" \
>          | sed "s/^.*\[//" | sed "s/\].*$//"
> # end of script.sh
> 
> It works but this approach feels like I'm using a crowbar to open a can
> of sardines. Is there an easier way? 
> 
> Eventually I'll set it up as a cron job to be run in the wee hours. I'm
> very interested in what other LUGers are doing in this regard.
> 
> Rob
> 



