Title: Hybrid singular value decomposition: a model of human text classification

Authors: Amirali Noorinaeini, Mark R. Lehto

Addresses: Ph.D. Student, Purdue University, School of Industrial Engineering, West Lafayette, IN 47906, USA; Associate Professor of Industrial Engineering, Purdue University, West Lafayette, IN 47907-2023, USA

Abstract: The objective of this study was to investigate and compare the accuracy of three Singular Value Decomposition (SVD) based models in classifying injury narratives into external-cause-of-injury and poisoning (E-code) categories. Two SVD-Bayesian models and one SVD-Regression model were developed for free-text classification. The study used injury narratives, and the corresponding E-codes assigned by human experts, from the 1997 and 1998 US National Health Interview Survey (NHIS). Sensitivity, specificity and positive predictive value were measured by comparing all three models' results with the E-code categories assigned by experts. The performance of the equidistant Bayes model and the regression model improved as more SVD vectors were used as input. The regression model was also compared to the fuzzy Bayes model. It was concluded that all three models are capable of learning from human experts to accurately categorise cause-of-injury codes from injury narratives, with the regression-based model being the strongest.

Keywords: accident narratives; bayes modelling; human modelling; regression; singular value decomposition; SVD; statistical modelling; text classification; injury causes; simulation; accidents; injuries; cause-of-injury codes; coding accuracy.

DOI: 10.1504/IJHFMS.2006.011684

International Journal of Human Factors Modelling and Simulation, 2006 Vol.1 No.1, pp.95 - 118

Published online: 14 Dec 2006
