The Snapshot Stats utility captures "snapshots" of POPFile's accuracy statistics. It runs periodically via the task scheduler (or a cron job) and updates an Excel-compatible CSV file containing your accuracy history.
The data items captured on each run are the columns named in the header row of the sample CSV file shown later in this document.
This version has been tested in a Windows environment with POPFile versions 0.19.x, 0.20.x, and 0.21.0. It is not compatible with earlier versions of POPFile. The author believes that the utility is platform independent and will work properly on non-Windows POPFile installs, but it has not been tested on those platforms.
POPFile is an automatic email classification tool authored by John Graham-Cumming and available from SourceForge.
Download the version of the script that matches your POPFile version to your POPFile install directory, normally c:\Program Files\Popfile.
Use the task scheduler's Wizard to browse to your POPFile installation directory, usually "c:\Program Files\Popfile", and select the wperl.exe program.
Change the name of the task to Snapshot Stats.
Select the frequency to run it (daily is recommended).
Select the time and day(s) to run it.
Check the "Open Advanced Properties for this task when I click Finish" checkbox.
Click Finish.
The advanced properties box will open. In the Run: box, add the script name "snapshot_stats.pl" one space to the right of wperl.exe, outside the closing quotation mark. If you did this correctly, the Run: box should look like this (for a normal POPFile installation):
"c:\program files\popfile\wperl.exe" snapshot_stats.pl
Click Apply, then click OK.
Close the task scheduler (or test the task by right-clicking the new entry you made and selecting Run).
You're done. The task scheduler will run the snapshot_stats script at the time(s) you scheduled, and the script will update the snapshot_stats.csv file in your POPFile folder. You can periodically open that file with Excel to view your historical stats and analyze them in various ways.
The following is a sample of the CSV file created by snapshot_stats when run against the author's POPFile installation on May 25, 2003. Note that this example shows only one set of statistics for each bucket. In the real world, an entire history of snapshots (taken at some recurring interval, such as daily) would be captured in the file.
BucketName,BucketColor,UnixTimestamp,Timestamp,BucketUniqueWords,BucketWordCount,BucketMailsClassified,BucketFalsePositives,BucketFalseNegatives,GlobalWordCount,GlobalDownloads,GlobalMessages,GlobalErrors,LastResetDate
normal,blue,1053935655,Mon May 26 00:54:15 2003,8647,45189,125,2,0,69939,34152,132,2,Sun May 25 00:45:53 2003
spam,red,1053935655,Mon May 26 00:54:15 2003,6790,24750,7,0,2,69939,34152,132,2,Sun May 25 00:45:53 2003
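As a rough illustration of the kind of analysis the file supports outside Excel, the sketch below reads the sample rows with Python's csv module and derives one overall accuracy figure per snapshot. The formula used, (GlobalMessages - GlobalErrors) / GlobalMessages, is an assumption about how those two columns relate and is not taken from snapshot_stats itself. The function name accuracy_by_snapshot is hypothetical.

```python
import csv
import io

# Sample rows copied from the example in this document.
SAMPLE = """\
BucketName,BucketColor,UnixTimestamp,Timestamp,BucketUniqueWords,BucketWordCount,BucketMailsClassified,BucketFalsePositives,BucketFalseNegatives,GlobalWordCount,GlobalDownloads,GlobalMessages,GlobalErrors,LastResetDate
normal,blue,1053935655,Mon May 26 00:54:15 2003,8647,45189,125,2,0,69939,34152,132,2,Sun May 25 00:45:53 2003
spam,red,1053935655,Mon May 26 00:54:15 2003,6790,24750,7,0,2,69939,34152,132,2,Sun May 25 00:45:53 2003
"""


def accuracy_by_snapshot(handle, delimiter=","):
    """Return {UnixTimestamp: accuracy in percent} for each snapshot.

    Every bucket row of one snapshot repeats the same Global* values,
    so later rows with the same timestamp simply overwrite earlier ones.
    Assumed formula: 100 * (GlobalMessages - GlobalErrors) / GlobalMessages.
    """
    result = {}
    for row in csv.DictReader(handle, delimiter=delimiter):
        messages = int(row["GlobalMessages"])
        errors = int(row["GlobalErrors"])
        if messages == 0:
            continue  # nothing classified yet, accuracy is undefined
        result[int(row["UnixTimestamp"])] = 100.0 * (messages - errors) / messages
    return result


history = accuracy_by_snapshot(io.StringIO(SAMPLE))
for stamp, pct in sorted(history.items()):
    print(f"{stamp}: {pct:.2f}%")  # prints "1053935655: 98.48%"
```

To run this against a real history, replace io.StringIO(SAMPLE) with an open handle on snapshot_stats.csv and pass the delimiter the file was written with.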
The script accepts command-line options to override the separator character or quote character used in producing the CSV file.
-csv_separator this option permits you to replace the default comma separator with some other character.
-csv_quote this option permits you to replace the default field quoting character (none) with some other character.
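Changing the default comma separator to a semicolon (in Unix-like shells the semicolon ends a command, so it must be escaped or quoted there, for example \;).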
perl snapshot_stats.pl -csv_separator ;
Changing the default quote character to a double-quote mark (most shell scripts will require you to escape it as shown in the example).
perl snapshot_stats.pl -csv_quote \"
Changing the default comma separator to a colon and the default quote character to a single quote.
perl snapshot_stats.pl -csv_separator : -csv_quote \'
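The colon example also suggests why a quote character can matter: the Timestamp and LastResetDate values themselves contain colons. The Python sketch below parses one illustrative row twice, once honoring the quote character and once ignoring it. The row layout is an assumption (it supposes that -csv_quote wraps every field in the chosen character), and parse_row is a hypothetical helper, so check a real snapshot_stats.csv before relying on it.

```python
import csv

# Hypothetical row: assumes -csv_quote wraps each field in the quote character.
QUOTED_ROW = "'normal':'blue':'1053935655':'Mon May 26 00:54:15 2003'"


def parse_row(line, separator, quote=None):
    """Split one line into fields, optionally honoring a quote character."""
    if quote is None:
        # Treat quote marks as ordinary text and split on every separator.
        reader = csv.reader([line], delimiter=separator, quoting=csv.QUOTE_NONE)
    else:
        reader = csv.reader([line], delimiter=separator, quotechar=quote)
    return next(reader)


with_quote = parse_row(QUOTED_ROW, ":", "'")
without_quote = parse_row(QUOTED_ROW, ":")

print(len(with_quote), with_quote)        # 4 fields, timestamp kept whole
print(len(without_quote), without_quote)  # 6 fields, timestamp split at its colons
```

When the quote character is ignored, the time of day is broken into separate fields, which is the situation a quote character is meant to prevent.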
Important Note: If you have already begun using snapshot_stats and want to change the separator character, you must either delete the snapshot_stats.csv file in your POPFile directory or manually edit it with a text editor so that the existing separator characters in the file match the new separator character. If you fail to do so, you will end up with a snapshot_stats.csv file that has mixed separators in the various rows of the file.
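The manual edit described in the note can also be scripted. The following Python sketch rewrites an unquoted snapshot file from one separator to another. It assumes the existing file was produced with the default settings (comma separator, no quote character), and the names convert_line and convert_file are hypothetical. Work on a backup copy rather than the live file.

```python
import os
import tempfile


def convert_line(line, old_sep=",", new_sep=";"):
    """Return one unquoted row with old_sep replaced by new_sep."""
    fields = line.rstrip("\r\n").split(old_sep)
    for field in fields:
        if new_sep in field:
            # Without quoting, a separator inside a value would corrupt the row.
            raise ValueError(f"field {field!r} already contains {new_sep!r}")
    return new_sep.join(fields)


def convert_file(src_path, dst_path, old_sep=",", new_sep=";"):
    """Write a converted copy of src_path to dst_path, skipping blank lines."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            if line.strip():
                dst.write(convert_line(line, old_sep, new_sep) + "\n")


# Demonstration on a throwaway file in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    old_path = os.path.join(tmp, "snapshot_stats.csv")
    new_path = os.path.join(tmp, "snapshot_stats.new.csv")
    with open(old_path, "w") as f:
        f.write("BucketName,BucketColor,UnixTimestamp\nnormal,blue,1053935655\n")
    convert_file(old_path, new_path, ",", ";")
    with open(new_path) as f:
        converted = f.read()

print(converted)
```

The sketch refuses to convert a row whose values already contain the new separator. Converting the sample data to a colon separator, for instance, raises an error because the timestamps contain colons, and in that case a quote character would be needed as well.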
I noticed a snapshot subdirectory was created in my POPFile folder, why is this?
This occurs only with versions 0.19.x and 0.20.x of POPFile. The program uses the POPFile API to gather all of the corpus data. The API calls automatically create a couple of files: popfile.pid and a popfile#.log file. To ensure that running this program does not interfere with your running POPFile installation, the copies of those files created by this program are diverted to a safe place, the snapshot subdirectory, where they are harmless. You can delete the subdirectory and its contents at will.
Copyright (C) 2003 - 2007 Scott W. Leighton
Licensed under the terms of the GNU General Public License.
Contributed to the POPFile project under the terms of the POPFile License Agreement.