POPFile corpus_diff Utility

This utility is deprecated and no longer supported. It will not function properly with versions higher than 0.20.x.

The corpus_diff utility is a POPFile tool intended for developers and advanced POPFile users who want to analyze the differences between a reference corpus and the current corpus. The diff shows all deleted, changed, and added entries.
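To make the three categories concrete, here is a minimal sketch in Python of how a deleted/changed/added comparison can be computed. It is not the corpus_diff implementation: the function name `diff_counts` and the representation of a corpus as a plain word-to-count mapping are assumptions made for illustration only.

```python
def diff_counts(reference, current):
    """Classify the differences between two word -> count mappings.

    Both arguments are hypothetical in-memory stand-ins for a corpus;
    the real on-disk corpus format is not modeled here.
    """
    # Words present in the reference but missing from the current corpus.
    deleted = {w: c for w, c in reference.items() if w not in current}
    # Words present only in the current corpus.
    added = {w: c for w, c in current.items() if w not in reference}
    # Words present in both, with a different count: (old, new).
    changed = {
        w: (reference[w], current[w])
        for w in reference
        if w in current and reference[w] != current[w]
    }
    return {"deleted": deleted, "changed": changed, "added": added}
```

Calling `diff_counts({"meeting": 3, "offer": 7}, {"meeting": 4, "lottery": 2})` would report `offer` as deleted, `meeting` as changed from 3 to 4, and `lottery` as added.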

The utility is not intended for the casual POPFile user. It requires a good understanding of command-line programs, the POPFile directory structure, and how to copy files from one directory to another.

Corpus_diff is compatible with POPFile versions 0.19.0 through 0.20.x.

POPFile is an automatic email classification tool written by John Graham-Cumming and available from SourceForge.

Instructions for use

  1. Download the script.

  2. Create a reference corpus by copying your existing corpus folder to a folder named corpus.bak (replace the word corpus with the name of your corpus folder if you changed it from the default). This reference corpus is the one that corpus_diff compares against.

  3. After reclassifying mail, you can run the corpus_diff utility at any time to see the changes since your reference corpus was created.

    • Open a DOS box and change to your POPFile directory.

    • Run corpus_diff:

      perl corpus_diff.pl > diff.htm
      

    • View the results in your browser: either browse to your POPFile directory and open diff.htm, or type

      start diff.htm
      
      at the DOS prompt to start the browser and display diff.htm.
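For those who would rather script steps 2 and 3 than type them by hand, the following Python sketch shows one possible way to do it. The function names `make_reference_corpus` and `run_corpus_diff` are hypothetical and not part of POPFile; the sketch assumes that `perl` is on the PATH and that corpus_diff.pl has been saved in the POPFile directory.

```python
import shutil
import subprocess
from pathlib import Path


def make_reference_corpus(popfile_dir, corpus_name="corpus"):
    """Copy <popfile_dir>/<corpus_name> to <corpus_name>.bak (step 2)."""
    source = Path(popfile_dir) / corpus_name
    target = Path(popfile_dir) / (corpus_name + ".bak")
    # Refuse to overwrite an existing reference corpus by accident.
    if target.exists():
        raise FileExistsError(f"{target} already exists")
    shutil.copytree(source, target)
    return target


def run_corpus_diff(popfile_dir, report_name="diff.htm"):
    """Run 'perl corpus_diff.pl > diff.htm' in the POPFile directory (step 3)."""
    report = Path(popfile_dir) / report_name
    with open(report, "w") as out:
        # Assumes perl is installed and corpus_diff.pl is in popfile_dir.
        subprocess.run(
            ["perl", "corpus_diff.pl"],
            cwd=popfile_dir,
            stdout=out,
            check=True,
        )
    return report
```

A typical session would call `make_reference_corpus` once, reclassify some mail in POPFile, and then call `run_corpus_diff` whenever a fresh report is wanted. Refusing to overwrite an existing corpus.bak is a design choice of this sketch, so that the baseline is only replaced deliberately.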

Sample Output Report

The following is a sample of the output from corpus_diff run against the author's corpus on June 25, 2003.

sample_diff.htm

 

Copying

Copyright (C) 2003 Scott W. Leighton

Licensed under the terms of the GNU General Public License.

Contributed to the POPFile project under the terms of the POPFile License Agreement.

