POPFile topten Utility (Enhanced Version)

The Top Ten utility is a tool for use with POPFile that lists the top ten words (or some other quantity you select) in each bucket's corpus, ranked from high to low by probability and by word count.

This version has been tested in a Windows environment with POPFile versions 0.19.x, 0.20.x, and 0.21.0; it is not compatible with earlier versions of POPFile. The author believes that the utility is platform independent and will work properly on non-Windows POPFile installs, but has not tested it on those platforms.

POPFile is an automatic email classification tool authored by John Graham-Cumming and available from SourceForge.


Instructions for use

  1. Download the correct version of the script for your POPFile version to your POPFile install directory, normally c:\Program Files\Popfile.

  2. Open a DOS command box: click the DOS icon on your desktop, or choose Start/Run, type command in the Open box, and click OK.

  3. Change to your POPFile installation directory, e.g.,

    cd  "\program files\popfile"

  4. Run topten.pl using Perl.

    perl topten.pl > topten.htm

  5. The resulting report will be in the file named 'topten.htm'. Open it with your browser to view it:

    start topten.htm

     Or browse to your POPFile install directory and open it from there with Explorer.

Note: to list more (or fewer) words, add the command-line option -topten_count followed by an integer value when you run topten.pl, e.g.,

    perl topten.pl -topten_count 50 > topten.htm

The above would list the top 50 words in each bucket.

Sample Output Report

The following is a sample of the output from topten run against the author's corpus on June 22, 2003, with the command-line option -topten_count 50 to show the top 50 words.



Running topten Automatically

You can create a batch file and schedule it in the Task Scheduler to run periodically, as described below. If you bookmark the output file in your browser's favorites, the latest report will be available to you at any time from that bookmark.

  1. Create a batch file as follows:

    perl topten.pl -topten_count 50 > topten.htm

  2. Save the batch file in your POPFile directory and name it topten.bat.

  3. Open your Task Scheduler and add a scheduled task that runs topten.bat at the time(s) you choose.
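
As a sketch of how steps 1 through 3 might fit together: a scheduled task does not necessarily start in the POPFile directory, so the batch file below changes to that directory before running the report. The install path is the default named in the instructions above and is an assumption about your system. The schtasks command shown in the comments is one way to add the task from a command box instead of the Task Scheduler window; the task name "POPFile Top Ten" and the 06:00 start time are only illustrative, and some older Windows versions expect the time as HH:MM:SS.

```batch
@echo off
rem topten.bat: a sketch of a scheduler-friendly batch file.
rem Assumption: POPFile lives in the default install directory;
rem adjust the path below if yours differs.
cd /d "C:\Program Files\Popfile"

rem Write the top 50 words per bucket to topten.htm in that directory.
perl topten.pl -topten_count 50 > topten.htm

rem To add the task from a command box, a command along these lines
rem can be run once (it is not part of this batch file):
rem   schtasks /create /tn "POPFile Top Ten" /tr "\"C:\Program Files\Popfile\topten.bat\"" /sc daily /st 06:00
```

Changing directory inside the batch file keeps both the script lookup and the output file independent of whatever directory the scheduler starts in.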

Alternatively, Windows users who have Tim Charron's Blat utility can easily set up topten to run automatically and email the results.

  1. Obtain Blat from Tim Charron's page.

  2. Install Blat in a directory in your path, or in the POPFile directory.

  3. Run Blat -install <server address> <sender's address> to configure Blat. Make sure that <server address> points to an SMTP server that you are permitted to relay mail through; usually this will be the same SMTP server you set up in your mail client.

  4. Create a batch file as follows:

    perl topten.pl -topten_count 50 | blat - -t youremail@address.here -s "POPFile Top Ten Report" -html

  5. Save the batch file in your POPFile directory and name it topten.bat.

  6. Open your Task Scheduler and add a scheduled task that runs topten.bat at the time(s) you choose.

You're done. The task scheduler will run the batch file at the time(s) you scheduled. The batch file will run the Top Ten report and email it off to you. No muss, no fuss <g>



Copyright (C) 2003 - 2004 Scott W. Leighton

Licensed under the terms of the GNU General Public License.

Contributed to the POPFile project under the terms of the POPFile License Agreement.
