spam (was Re: [wellylug] Extracting IPs from spambucket)
Valient Gough
vgough at pobox.com
Thu Dec 18 11:36:19 NZDT 2003
I've found the combination of Bogofilter and SpamAssassin to be very
effective. I know SpamAssassin has added statistical checks recently,
but Bogofilter seems more advanced. I have had the same email address
for years, and it is visible in plenty of places, so I get a fair amount
of spam (100-150 a day). Usually about 4-8 get through the filters
each week. Although I periodically check for false positives (real
messages marked as spam), I haven't run across one in many months.
Here's what my procmail spam rules look like:
## Check from address against whitelist..
# First, remove any existing X-Whitelist header..
:0fhb
* ^X-Whitelist
| $FORMAIL -IX-Whitelist
# remove spamassassin headers, so we don't confuse existing ones with
# our own
:0fhb
* ^X-Spam
| $FORMAIL -IX-Spam
# tag the message with "X-Whitelist: Yes" if the sender was found
:0
* ? $FORMAIL -x"From" -x"From:" -x"Sender:" -x"Reply-To:" -x"Return-Path:" \
| egrep -is -f $MDIR/whitelist
{
:0fhb
| $FORMAIL -a"X-Whitelist: Yes"
}
## filter mail through bogofilter, tagging it as spam and
## updating the word lists
:0fw
| $BOGOFILTER -r -u -e -p
# if bogofilter failed, return the mail to the queue, the MTA will
# retry to deliver it later
# 75 is the value for EX_TEMPFAIL in /usr/include/sysexits.h
:0e
{ EXITCODE=75 HOST }
:0
* ^X-Whitelist: Yes
{
# If bogofilter marked this as spam, then it got it wrong,
# since we're processing white-list messages. Retrain BF
:0c
* ^X-Bogosity: Yes
| $BOGOFILTER -Sn -r
}
# ELSE... process non-whitelist messages
:0
* !^X-Whitelist: Yes
{
# if bogofilter thinks it is spam, that's enough..
:0
* ^X-Bogosity: Yes
$SPAMDIR
# run spam assassin on it!
#### Spam Assassin 2.5x
:0fw: spamassassin.lock
| $SPAMASSASSIN
# if spam assassin thinks it is spam but bogofilter doesn't,
# give preference to spam assassin and retrain BF
:0
* ^X-Spam-Status: Yes
* ^X-Bogosity: No
{
# Retrain bogofilter
:0c
| $BOGOFILTER -Ns -r
:0
$SPAMDIR
}
}
That isn't a complete procmail file (I've left out the setup of
variables and such). I highly recommend a backup rule when playing with
procmail, such as this one, which keeps the last 100 messages in a backup
directory:
:0 c
$MDIR/backup
:0 ic
| cd $MDIR/backup && ls -t msg.* | sed "1,100d" | xargs rm -f
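Before trusting that pruning pipeline with real mail, it can be worth
running it in a scratch directory. The following is only a sketch: KEEP and
the msg.N file names are made up for the test, and it keeps 3 files rather
than 100 so the result is easy to eyeball.

```shell
#!/bin/sh
# Sketch: exercise the "ls -t | sed | xargs rm" pruning pipeline in a
# throwaway directory. KEEP and the msg.N names are illustrative only.
KEEP=3
dir=$(mktemp -d)
cd "$dir" || exit 1

# Create five empty fake messages with increasing modification times
# (11:01 through 11:05 on 18 Dec 2003), so msg.5 is the newest.
for i in 1 2 3 4 5; do
    : > "msg.$i"
    touch -t "20031218110$i" "msg.$i"
done

# Same pipeline as the backup recipe, with the count parameterised:
# list newest first, skip the first KEEP names, delete the rest.
ls -t msg.* | sed "1,${KEEP}d" | xargs rm -f

# Collect what survived (the glob expands in sorted order).
remaining=$(echo msg.*)
echo "$remaining"    # prints: msg.3 msg.4 msg.5

# Clean up the scratch directory.
cd / && rm -rf "$dir"
```

Note that the pipeline relies on file names without spaces, which holds for
the msg.* names in the backup directory.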
regards,
Valient
On Wed, 2003-12-17 at 13:22, Rob Stockley wrote:
> Today I've been messing with RBL and postfix. It all seems to work as
> advertised. Now I'm looking at a local hash table of IPs gleaned from
> messages that slip through the cracks.
>
> I have been using spamassassin for a while now. It's been a few months
> since I had a clean out and the spambucket consists of 9.3M of wasted
> space. I've been experimenting with ways of extracting the source IP's
> from this file.
>
> I've googled a bit and run round in circles trying to find the right
> search terms. Either no one has done what I'm doing (unlikely) or it's
> got a common name that I've never heard of.
>
> After a bit of playing I've got the following: Pipe the file through
> this command line:
>
> |formail -c -s script.sh | sort | uniq > list_of_bad_ips
>
> The file script.sh contains the following:
>
> #!/bin/sh
> # Script to extract source IP from mail message
> formail -U "Received" | formail -x "Received" \
> | sed "s/^.*\[//" | sed "s/\].*$//"
> # end of script.sh
>
> It works but this approach feels like I'm using a crowbar to open a can
> of sardines. Is there an easier way?
>
> Eventually I'll set it up as a cron job to be run in the wee hours. I'm
> very interested in what other LUGers are doing in this regard.
>
> Rob
>
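On the question of a lighter-weight extraction: one possible route is to let
grep pull out anything that looks like a bracketed IPv4 address, instead of
trimming each line with two sed passes. This is a minimal sketch, assuming a
grep that supports -o (GNU and BSD grep do); extract_ips is a name made up
here, and the addresses in the usage example are documentation-range
placeholders.

```shell
#!/bin/sh
# Sketch: read header text on stdin and print the unique bracketed IPv4
# addresses found in it, one per line. extract_ips is a hypothetical helper.
extract_ips() {
    # -E: extended regex; -o: print only the matching part of each line.
    grep -Eo '\[[0-9]{1,3}(\.[0-9]{1,3}){3}\]' |
        sed 's/^\[//; s/]$//' |   # strip the surrounding brackets
        sort -u                   # sort and drop duplicates
}

# Usage example with made-up Received lines:
printf '%s\n' \
    'Received: from foo (foo.example [192.0.2.10])' \
    'Received: from bar ([198.51.100.7]) by mx.example' \
    'Received: from foo (foo.example [192.0.2.10])' |
    extract_ips
```

To restrict the input to Received headers of each message in the spambucket,
something like "formail -s formail -c -x Received: < spambucket | extract_ips"
should do it. Unlike the -U version, this lists every hop that recorded a
bracketed address, including your own relays, so the output probably wants
filtering before it goes into a block list.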