A connectionist approach for classifying accident narratives

Subrata Chatterjee, Purdue University


This thesis designs and validates a connectionist model for classifying free-text accident descriptions. Model selection for the classifier used 10-fold cross-validation with early stopping to prevent over-fitted networks. Singular value decomposition was used for feature extraction, and the network was trained using scaled conjugate gradient. The complexity of the accident corpus allowed the selection of a single-layer linear network with 130 input nodes and 10 sigmoidal output nodes. The classifier was validated by comparing it with a fuzzy Bayes model and a keyword-based model on a test bed of 3680 narratives that had previously been classified by two human subjects. On a randomly selected set of 100 narratives, Pearson's r for inter-rater reliability between the two human subjects was 0.80 (p = 0.001). Intra-rater reliability (stability) was calculated by repeating 100 narratives within each questionnaire given to the two subjects. The ratings were quite stable, with Pearson's r values of 0.84 (p = 0.001) and 0.77 (p = 0.001) for the two subjects, respectively. It was hypothesized that the connectionist and fuzzy Bayes models would (1) classify narratives containing synonyms and perform better than the keyword-based model, (2) provide correct predictions for narratives that require word-sense disambiguation, and (3) detect implicit events from incomplete accident descriptions.
The results showed that (1) both the connectionist and fuzzy Bayes models outperformed the keyword model, classifying a significant number of narratives that the keyword model had failed to classify, (2) the connectionist model outperformed both the fuzzy Bayes and keyword models on ambiguous narratives, and (3) there was no significant difference among the three models in detecting implicit events. These results can serve as a guideline for choosing task-based classifiers and for designing context-sensitive search engines.
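
For readers who want a concrete picture of the network shape described in the abstract (130 input nodes feeding 10 sigmoidal output nodes, with no hidden layer), the following is a minimal illustrative sketch in Python. It is not the thesis implementation. The function names, the random placeholder weights, the bias terms, and the rule of picking the category with the largest output are all assumptions made for illustration. The input vector stands in for the SVD-derived features of one narrative, and training by scaled conjugate gradient is not shown.

```python
import math
import random

N_INPUTS = 130   # SVD-derived features per narrative (figure from the abstract)
N_OUTPUTS = 10   # sigmoidal output nodes (figure from the abstract)


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def init_weights(n_inputs=N_INPUTS, n_outputs=N_OUTPUTS, seed=0):
    """Hypothetical initialisation: small random weights and zero biases.

    A trained model would replace these placeholders with learned values.
    """
    rng = random.Random(seed)
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_inputs)]
               for _ in range(n_outputs)]
    biases = [0.0] * n_outputs
    return weights, biases


def forward(features, weights, biases):
    """Single-layer forward pass: one weighted sum and sigmoid per output node."""
    if len(features) != len(weights[0]):
        raise ValueError("feature vector length does not match the input layer")
    return [
        sigmoid(sum(w * x for w, x in zip(row, features)) + b)
        for row, b in zip(weights, biases)
    ]


def predict(features, weights, biases):
    """Assumed decision rule: index of the output node with the largest activation."""
    outputs = forward(features, weights, biases)
    return max(range(len(outputs)), key=outputs.__getitem__)
```

A call such as `predict(feature_vector, *init_weights())` returns an index between 0 and 9. With the untrained placeholder weights above, that index carries no meaning; the sketch only shows how a 130-dimensional feature vector maps to 10 output activations.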




Major Professor: Mark R. Lehto, Purdue University.

Subject Area

Engineering, Industrial|Operations Research|Computer Science
