Because I couldn’t find anything that fully automates the SpamAssassin sa-learn process for virtual email users stored in a MySQL database, I wrote my own script (borrowing liberally from Jason Schaefer’s SpamAssassin training and spam cleanup script).

This script assumes that:

  • You have a MySQL database with virtual users, with a user table called ‘virtual_users’ and the full email address stored in a field called ‘email’.
  • Your email is stored in Maildir folders, with a hierarchy starting from /var/vmail/…
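
To make the second assumption concrete, here is a minimal sketch of how an address maps to a Maildir folder under that layout. The function name maildir_for and the sample address are hypothetical and only for illustration; the split on the first ‘@’ mirrors what the script’s SQL query does.

```shell
#!/bin/bash
# Hypothetical helper (not part of the script): map a full email address
# to the Maildir path the script assumes, /var/vmail/<domain>/<user>.
maildir_for() {
  local email="$1"
  local user="${email%%@*}"    # everything before the first '@'
  local domain="${email#*@}"   # everything after the first '@'
  printf '/var/vmail/%s/%s\n' "$domain" "$user"
}

maildir_for "alice@example.com"   # prints /var/vmail/example.com/alice
```

If your mail lives somewhere else, the paths in the script need to change to match.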

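#!/bin/bash

## Database details
## (placeholder values, not the originals; substitute your own)
USER="CHANGE_ME"
PASS="CHANGE_ME"
HOST="localhost"
DB="CHANGE_ME"

## Where to log stuff (placeholder path)
LOG="/path/to/sa-learn.log"

## How many days to wait before deleting spam (placeholder value)
## Comment out to disable
CLEAN=7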

echo -e "\n\nRun started `date +%c`"  >> $LOG 2>&1

## Spam and ham training for all virtual users
## Delete spam older than $CLEAN days

mysql --skip-column-names -u$USER -p$PASS -h$HOST -D$DB -e "SELECT SUBSTRING(email, 1, LOCATE('@', email) - 1) AS user, SUBSTRING(email, LOCATE('@', email) + 1) AS domain FROM virtual_users" | while read -r user domain;
do
  ## Spam
  echo "Spam training for $user@$domain" >> $LOG 2>&1
  /usr/bin/sa-learn --no-sync --spam /var/vmail/$domain/$user/.Junk/{cur,new} >> $LOG 2>&1
  ## Ham
  echo "Ham training for $user@$domain" >> $LOG 2>&1
  /usr/bin/sa-learn --no-sync --ham /var/vmail/$domain/$user/cur >> $LOG 2>&1
  ## Delete (skipped when CLEAN is unset or empty)
  if [ -n "$CLEAN" ]; then
    echo "Deleting spam for $user@$domain older than $CLEAN days" >> $LOG 2>&1
    find /var/vmail/$domain/$user/.Junk/cur/ -type f -mtime +$CLEAN -exec rm {} \;
  fi
done

## Sync the SpamAssassin journal and print out stats

echo "Syncing the SpamAssassin journal" >> $LOG 2>&1
/usr/bin/sa-learn --sync >> $LOG 2>&1
echo "Statistics for this run:" >> $LOG 2>&1
/usr/bin/sa-learn --dump magic >> $LOG 2>&1

echo -e "Run finished `date +%c`"  >> $LOG 2>&1
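
The cleanup step is the part most likely to misbehave, because find rejects an empty day count after -mtime. As a rough sketch, assuming you would rather validate the value before deleting anything, the step could be wrapped in a small function. The name purge_older_than is hypothetical and not part of the script above.

```shell
# Hypothetical helper: delete regular files under a directory that were last
# modified more than N days ago. Refuses to run unless N is a non-negative
# integer, so an empty or unset value never reaches find.
purge_older_than() {
  dir="$1"
  days="$2"
  case "$days" in
    ''|*[!0-9]*)
      echo "purge_older_than: invalid day count '$days'" >&2
      return 1
      ;;
  esac
  [ -d "$dir" ] || return 0   # nothing to clean if the folder does not exist
  # -mtime +N matches files older than N days; '{} +' batches the rm calls
  find "$dir" -type f -mtime +"$days" -exec rm -f {} +
}
```

Inside the loop this would be called as purge_older_than "/var/vmail/$domain/$user/.Junk/cur" "$CLEAN". Note that -mtime +N and -mtime N mean different things: the first matches files older than N days, the second only files exactly N days old.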


2 thoughts on “SpamAssassin sa-learn cron script for virtual users”

  1. Thanks for this useful script 🙂

    This script spits out the error “find: invalid argument `-exec’ to `-mtime'”. I don’t mind as I have a separate script to purge old spam so I commented out that part.

  2. Hello! I think it could be slight variances in how find works between distributions, but I may have worked it out: try updating the offending line to remove the + symbol after -mtime (so “find /var/vmail/$domain/$user/.Junk/cur/ -type f -mtime $CLEAN -exec rm {} \;”)

    If that fixes it for you, let me know!

    – zac.
