#!/usr/bin/perl -w

######################################################################
# Copyright (C) Brian Candler 1999 <B.Candler@pobox.com>             #
# This is free software. You may use, distribute and/or modify this  #
# software at will, as long as (1) this notice remains intact, and   #
# (2) you agree that under no circumstances will the author be       #
# liable for any loss or damage whatsoever caused by it.             #
######################################################################

# Use 'pod2man' or 'pod2html' on this file to get formatted documentation

###########################################################################
# List here regexp patterns for 'your' mail machine(s).
# These are regular expressions: remeber to use '\.' to match a dot,
# and '.*' to match any sequence of (0 or more) characters.

@mymailhosts = (
#	'.*\.pobox\.com',		# [anything].pobox.com
#	'.*\.mail\.demon\.net',		# [anything].mail.demon.net
);

###########################################################################
# Select one or more RBL-style servers to check against.
# Although IMRSS catches lots of spam, it also catches mail from genuine
# people who happen to use a badly-configured system. Caveat emptor.

@rblservers = (
	'rbl.maps.vix.com',	# http://maps.vix.com/rbl/	Persistent spammers
#	'dul.maps.vix.com',	# http://maps.vix.com/dul/	Dialup netblocks
#	'relays.orbs.org',	# http://www.orbs.org/		Reported open relays
#	'mr-out.imrss.org',	# http://www.imrss.org/		Scanned open relays
);

=head1 NAME

rblfilter - extract IP address from E-mail and check against RBL lists

=head1 SYNOPSIS

B<rblfilter>
[ B<--filter> | B<--debug> ]

=head1 DESCRIPTION

This program extracts the remote IP address from the Received: headers
in an E-mail message, and checks it against one or more RBL-style lists.
Specifically, it looks for the first line of the form:

Received: from I<not-our-mail-server> by I<our-mail-server>

This allows you to use RBL-style spam filtering even if you receive
mail at some remote host (e.g. your ISP) where you cannot get them
to do RBL checking at the time the message is received.

=head2 EXAMPLE

I receive mail via the forwarding service 'pobox.com', which
then forwards to my ISP, from where I collect via POP3. Typically
a message looks like this by the time I receive it:

	Received: from pop3.demon.co.uk
		by localhost with POP3 (fetchmail-5.0.5)
		...
	Received: from punt-2.mail.demon.net by mailstore ...
	Received: from growl.pobox.com ([208.210.124.27])
		by punt-2.mail.demon.net ...
	Received: from somehost (somehost [z.z.z.z])
		by growl.pobox.com ...

In this case, it is the [z.z.z.z] address which I want to check against
the RBL. To do this, I simply give rblfilter some patterns to match
B<*pobox.com> and B<*mail.demon.net> as 'my' mail servers.

Notice that because I included B<*mail.demon.net>, this also works if
someone sends mail directly to my ISP account; in this case, the relevant
header is

	Received: from somehost (somehost [z.z.z.z])
		by punt-1.mail.demon.net ...

In general, you just list all the hosts where you have an E-mail account.

=head2 CONFIGURATION

This is a perl program. Therefore, by definition, it is a hack :-)

You configure the program by editing the source code -
look for B<@mymailservers> and B<@rblservers> at the top of the program,
and edit to suit.

=head1 USAGE

This program has three modes:

=head2 Check mode

This is the default. It accepts message headers on stdin, finds the first
suitable Received: header, looks it up in the RBL lists, and exits with 0 if
the message is clean or 1 if it is listed in any of the RBL lists. Nothing
is output on stdout.

This mode is best if you just need to detect quickly whether a single message
is RBL listed. For example, in F<.procmailrc> the following recipe will filter
matching messages into a 'junk' folder:

	:0:
	* ! ? /usr/local/bin/rblfilter
	Mail/junk

=head2 Filter mode

This is enabled by adding B<--filter> to the command
line. This copies every line from stdin to stdout, and if appropriate adds an

	X-rblfilter: <RBL domain>

header for each RBL list matched. Multiple lines will be added if the
address matches multiple RBL lists. Multiple messages separated by the
UNIX mailbox separator S<'From '> can be processed. The exit code is always
0.

This mode is useful for processing a whole mailbox file, or for processing
mail whilst leaving the decision about which messages to keep or drop to the
end-user (who can filter on the X-rblfilter: headers using their own mail
client)

=head2 Debug mode

Enabled by adding B<--debug> to the command line. This is identical to
filter mode, but adds additional messages to show rblfilter's decision
making process. This is useful to work out why rblfilter is not triggering
on a particular message when you think it should be.

=head1 WARNINGS

B<DO NOT CONFIGURE YOUR SYSTEM TO DELETE MAIL BASED ON THIS PROGRAM!>
Move it to a 'junk' folder instead. Not only does this protect you against
bugs in the software, it also gives you a chance to save important mail
from a machine which happened to be RBL-listed. Plus it gives you a file
full of spam messages useful for testing :-)

In addition, don't attempt to automatically bounce messages which match the
RBL. Spam almost always has forged return addresses, so all you will do is
cause unnecessary grief to the person who legitimately owns the forged
address.

Remember that using RBL filtering will slow down message delivery -
sometimes by several seconds per message - while the RBL data is looked up
in the DNS. This also means that you must stay connected to your ISP while
the messages are being delivered.

=head1 TESTING

Pipe an existing mailbox file through B<rblfilter --debug> to check that the
correct IP addresses are being extracted (look for "X-Matched: ..."). If you
have some stored junk mail which was relayed via an RBL-listed machine, even
better.

RBL lists usually have a test address which is 127.0.0.2 or 127.0.0.3. If
you are running rblfilter on a Unix machine, you can pass a test message
directly to sendmail to trip your filter:

	/usr/sbin/sendmail -v <your-address>
	Received: from 127.0.0.2 ([127.0.0.2]) by <your-mailserver>

	This is a test
	.

(sendmail will also add its own Received: header, but since this won't
contain an IP address for a locally-submitted message, it will be skipped by
rblfilter)

Alternatively, if you have a shell account on the machine which receives
your mail, you can login there and inject a mail message using telnet to the
SMTP port:

        telnet 127.0.0.2 25
        HELO some.domain
        MAIL FROM:<someone@somewhere>
        RCPT TO:<youraddress@yourdomain>
        DATA
        Subject: This is a test message

        Testing 1 2 3
        .
        QUIT

=head1 BUGS

This is ALPHA software. It has been developed under Red Hat Linux 6.0 i386
with perl-5.00503-2 and procmail-3.13.1-2. It may not work for you.

The format of Received: headers is variable, and unfortunately not yet
standardised; the pattern which rblfilter looks for may not match the mailer
used by your ISP.

Probably should do more error checking.

Probably should be written in a real programming language, with a real
configuration file.

=head1 AUTHOR

Brian Candler E<lt>B.CandlerB<@>pobox.comE<gt>. The latest version of this
file should be available at http://pobox.com/~b.candler/software/rblfilter/

=head1 SEE ALSO

http://www.xnet.com/~emarshal/rblcheck/

=cut

###########################################################################
# Here is the pattern we match against. You shouldn't have to change it.

# In principle, we search for lines of the form:
#   Received: ... from xxxx (ident@yyyy [z.z.z.z]...) ... by rrrr...
#                            ^^^^^^^^^^^
#                            (optional)
# where rrrr is a member of mymailhosts, and yyyy (or xxxx) isn't.

# However, there are a bunch of formats for the 'from' section in common
# use:
#      dns ([ip]) 
#      dns([ip]) (comment)
#      helo (dns [ip])
#      helo(dns[ip]) (comment)
#      dns ([ip] HELO helo...)
#      helo (ip)
#      [ip] (HELO=helo)
#      (helo) [ip]
#      dns (HELO helo) (ip)
#      helo

# The following regexp tries to capture them all :-) Note that we can
# ignore the HELO name as long as we have the DNS name. Some mailers
# just say "from xxxx ([z.z.z.z])" if the reverse DNS and HELO match.

$NAME     = '([^\s;()\[\]]*)';	# match an optional hostname/HELO name
				# (HELO names sometimes contain odd symbols)
$NAME_REQ = '([-_.a-z0-9]+)';	# match a required hostname
$IP       = '([0-9]+\.[0-9]+\.[0-9]+\.[0-9]+)';	# match an IP address

$pattern =
  "\\sfrom" .
  "\\s+$NAME" .					#   $1            -- dns/helo
  "\\s*(\\($NAME\\s*\\[$IP\\][^)]*\\))?" .	#   ($3 [$4]..)   -- dns,ip
  "\\s*([(\\[]${IP}[)\\]])?" .			#   ($6) | [$6]   -- ip
  "\\s*(\\(([^()]+)\\))?" .			#   ($8)          -- text
  "\\s*([(\\[]${IP}[)\\]])?" .			#   ($10) | [$10] -- ip
  ".*\\sby\\s+$NAME_REQ";                       #   .. by $11   -- receiver

# We obviously need the IP address of the 'from' machine, but why do we care
# about the name? Because we want to skip over Received: lines of the form
# "from <our-machine>", which is simply mail being passed internally
# between our mail servers.
# We use the logged reverse DNS name (yyyy) or the HELO name (xxxx),
# because this is cheaper than doing a fresh reverse DNS lookup. At
# least, I think it is :-)

###########################################################################
# Now the actual code

$version = "1.0.0";
$debug = $filter = 0;

# Process the command line

while (@ARGV and ($_ = $ARGV[0]) =~ /^-/) {
	if ($_ eq "--filter") { $filter = 1; next; }
	if ($_ eq "--debug")  { $debug = $filter = 1; next; }
	print STDERR "Unknown flag $_\nVersion $version\nUsage: rblfilter [--filter]\n";
	exit 100;
} continue {
	shift @ARGV;
}

die "Please configure \@mymailhosts\n" unless @mymailhosts;
die "Please configure \@rblservers\n" unless @rblservers;

# Build a single regexp to match all mailhosts
$mailhosts = '^' . (join '$|^', @mymailhosts) . '$';

$_ = <>;

RUN:
while (1) {

	HDRS:
	while (1) {
		last RUN if not defined($_);

		if (/^Received:/i) {
			print $_ if $filter;

			# Join together continuation lines
			# (leaving the next line in $_)

			chomp;
			$hdr = $_;
			while (defined($_ = <>) and /^[\t ]/) {
				print $_ if $filter;
				chomp;
				$hdr .= $_;
			}

			# Is this header the right format?
			next unless $hdr =~ /$pattern/i;

			$sender_ip = $4;
			$sender_ip = $6 if not $sender_ip;
			$sender_ip = $10 if not $sender_ip;
			# Skip if there is no IP address
			next if not $sender_ip;

			$sender_name = $3;
			$sender_name = $1 if not $sender_name or $sender_name eq "unknown";
			$sender_name = $8 if not $sender_name or $sender_name eq "unknown";
			$sender_name = $sender_ip if not $sender_name or $sender_name eq "unknown";

			$receiver_name = $11;

			if ($receiver_name !~ /$mailhosts/i) {
				print "X-Ignored: receiver '$receiver_name' is not in mymailhosts\n" if $debug;
				next;
			}

			# Remove 'helo/ehlo=' and 'ident@'
			$sender_name =~ s/^[he][he]lo[ =]+//i;
			$sender_name =~ s/^.*@//;

			if ($sender_name =~ /$mailhosts/i) {
				print "X-Ignored: sender '$sender_name' is in mymailhosts\n" if $debug;
				next;
			}

			print "X-Matched: $sender_name [$sender_ip]\n" if $debug;
			next unless $sender_ip =~ /^([0-9]+)\.([0-9]+)\.([0-9]+)\.([0-9]+)$/;
			$rev = "$4.$3.$2.$1";
			for $rbl (@rblservers) {
				print "X-Trying: $rev.$rbl.\n" if $debug;
				if (gethostbyname("$rev.$rbl.")) {
					# Gotcha, you spamhaus!
					exit 1 unless $filter;
					print "X-rblfilter: $rbl\n";
				}
			}
			# Whether or not we matched an RBL, that was the
			# line we wanted - so don't parse any more headers.
			last HDRS;
		}

		# Blank line indicates end of headers
		last HDRS if /^\s*$/;

		# Otherwise, just pass through
		print $_ if $filter;
		$_ = <>;
	}

	# In check mode, we now know the message is clean

	exit 0 unless $filter;

	# Copy remaining headers and/or the message body verbatim,
	# until we reach the start of another message. This is only
	# used in filter mode.

	BODY:
	while (1) {
		last RUN if not defined($_);
		last BODY if (/^From /);
		print $_;
		$_ = <>;
	}
}

exit 0;
