SpamAssassin sa-learn cron script for virtual users

Because I couldn’t find something to fully automate the SpamAssassin sa-learn process for virtual email users in a MySQL database, I wrote my own (borrowing liberally from Jason Schaefer’s SpamAssassin training and spam cleanup script).

This script assumes that:

  • You have a MySQL database with virtual users, with a user table called ‘virtual_users’ and the full email address stored in a field called ’email’.
  • Your email is stored in Maildir folders, with a heirarchy starting from /var/vmail/domain.com/username/…
#!/bin/bash

## Database details

USER=''
PASS=''
HOST=''
DB=''

## Where to log stuff

LOG='/var/log/sa-learn.log'

## How many days to wait before deleting spam
## Comment out to disable

CLEAN=30

echo -e "\n\nRun started `date +%c`"  >> $LOG 2>&1

## Spam and ham training for all virtual users
## Delete spam older than $CLEAN days

mysql --skip-column-names -u$USER -p$PASS -h$HOST -D$DB -e "SELECT SUBSTRING(email, 1, LOCATE('@', email) - 1) AS user, SUBSTRING(email, LOCATE('@', email) + 1) AS domain FROM virtual_users" | while read user domain;

do
  ## Spam
  echo "Spam training for $user@$domain" >> $LOG 2>&1
  /usr/bin/sa-learn --no-sync --spam /var/vmail/$domain/$user/.Junk/{cur,new} >> $LOG 2>&1
  ## Ham
  echo "Ham training for $user@$domain" >> $LOG 2>&1
  /usr/bin/sa-learn --no-sync --ham /var/vmail/$domain/$user/{cur} >> $LOG 2>&1
  ## Delete
  if [ -n $CLEAN ]; then
    echo "Deleting spam for $user@$domain older than $CLEAN days" >> $LOG 2>&1
    find /var/vmail/$domain/$user/.Junk/cur/ -type f -mtime +$CLEAN -exec rm {} \;
  fi
done

## Sync the SpamAssassin journal and print out stats

echo "Syncing the SpamAssassin journal" >> $LOG 2>&1
/usr/bin/sa-learn --sync >> $LOG 2>&1
echo "Statistics for this run:" >> $LOG 2>&1
/usr/bin/sa-learn --dump magic >> $LOG 2>&1

echo -e "Run finished `date +%c`"  >> $LOG 2>&1

exit