Inderscience Publishers
  PUBLISHERS OF DISTINGUISHED ACADEMIC, SCIENTIFIC AND PROFESSIONAL JOURNALS

Article Abstract

Title: A comparison of data preparation approaches for e-mail categorisation
  Authors: Helmut Berger, Dieter Merkl, Michael Dittenbach
  Address: E-Commerce Competence Center (EC3), Donau City Strasse 1, A-1220 Wien, Austria; Institut für Softwaretechnik und Interaktive Systeme, Technische Universität Wien, Favoritenstrasse 9-11/188, A-1040 Wien, Austria; E-Commerce Competence Center (EC3), Donau City Strasse 1, A-1220 Wien, Austria
  Journal: International Journal of Intelligent Information and Database Systems, 2007, Vol. 1, No. 2, pp. 91-121
  Abstract: This paper reports on experiments in multi-class e-mail categorisation with supervised and unsupervised machine learning techniques. To this end, Support Vector Machines, decision tree learners, instance-based classifiers, Naive Bayes classification approaches and Self-Organising Maps were applied. A word-based and a character n-gram document representation approach were employed in order to assess the categorisation performance of the various learning approaches. The results indicate a substantial increase in classification accuracy when e-mail header information is considered in the document representation. To a much lesser degree, word-based document representations are advantageous over n-gram representations.
  Keywords: e-mail categorisation; document representation; machine learning; indexing methods; information filtering; feature selection; data preparation; support vector machines; decision tree learners; instance-based classifiers; Bayes classification; self-organising maps; header information.
  DOI: 10.1504/IJIIDS.2007.014946
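
The abstract contrasts word-based and character n-gram document representations, with and without e-mail header information. The following is a minimal Python sketch of that distinction, not the authors' preprocessing pipeline, which the abstract does not describe. The function names, the tokenisation rule, the default n-gram length of 3, and the use of the subject line as the only header field are all assumptions made for illustration.

```python
import re
from collections import Counter
from typing import Optional


def word_features(text: str) -> Counter:
    """Bag-of-words counts over lowercased alphanumeric tokens (assumed tokenisation)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def char_ngram_features(text: str, n: int = 3) -> Counter:
    """Counts of overlapping character n-grams after collapsing whitespace."""
    normalised = " ".join(text.lower().split())
    return Counter(normalised[i:i + n] for i in range(len(normalised) - n + 1))


def represent(subject: str, body: str, use_header: bool,
              ngram: Optional[int] = None) -> Counter:
    """Build a feature vector for one e-mail.

    use_header: prepend the subject line (a stand-in for header information).
    ngram: if given, use character n-grams of that length; otherwise use words.
    """
    text = f"{subject} {body}" if use_header else body
    if ngram is not None:
        return char_ngram_features(text, ngram)
    return word_features(text)
```

Either kind of feature vector could then be passed to any of the learners named in the abstract; comparing the four combinations (words or n-grams, with or without the header) mirrors the comparison the paper reports.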