Learning to Classify Medical Documents According to Formal and Informal Style

Title	Learning to Classify Medical Documents According to Formal and Informal Style
Publication Type	Conference Paper
Year of Publication	2010
Authors	Abu Sheikha F, Inkpen D
Conference Name	Workshop on Intelligent Methods for Protecting Privacy and Confidentiality in Data
Publisher	University of Ottawa
Conference Location	Ottawa, Canada
ISBN Number	978-0-9866482-0-5
Abstract	This paper addresses a problem in computational linguistics: classifying medical documents as formal or informal in style. The distinction may matter for patient safety, because formal documents are more likely to have been published by medical authorities, so patients can trust them more than informal documents. We used machine learning techniques to classify documents automatically into the two styles. First, we studied the main characteristics of each style in order to train a system that can distinguish between them. Then, we built our data set by collecting documents of both styles from different sources. After that, we preprocessed the collected documents to extract features that represent the main characteristics of both styles. Finally, we tested several classification algorithms, namely Decision Trees, Naïve Bayes, and Support Vector Machines, to choose the classifier that gives the best classification results.
URL	http://ww.ehealthinformation.ca/documents/IMPPCD-2010.pdf#page=33
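
The pipeline outlined in the abstract (extract style features, then train a classifier) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not list the paper's features, data, or tooling, so the three style cues (contraction rate, first/second-person pronoun rate, average word length), the four toy documents, and the helper names extract_features, train_gaussian_nb, and predict are hypothetical. Of the three algorithms named in the abstract, only a small hand-written Gaussian Naïve Bayes is shown.

```python
import math
import re
from collections import defaultdict

# Hypothetical style cues; the abstract does not name the paper's features.
# Note: this simple pattern also counts possessives such as "patient's".
CONTRACTION = re.compile(r"\b\w+'(?:s|t|re|ve|ll|d|m)\b", re.IGNORECASE)
PRONOUNS = {"i", "you", "we", "me", "my", "your"}


def extract_features(text):
    """Map a document to [contraction rate, pronoun rate, average word length]."""
    words = re.findall(r"[A-Za-z']+", text)
    n = max(len(words), 1)
    contractions = len(CONTRACTION.findall(text))
    # "you're" counts as the pronoun "you": compare the part before the apostrophe.
    pronouns = sum(1 for w in words if w.lower().split("'")[0] in PRONOUNS)
    avg_len = sum(len(w) for w in words) / n
    return [contractions / n, pronouns / n, avg_len]


def train_gaussian_nb(docs, labels):
    """Fit a per-class mean and variance for each feature, plus a log prior."""
    by_class = defaultdict(list)
    for text, label in zip(docs, labels):
        by_class[label].append(extract_features(text))
    model = {}
    for label, rows in by_class.items():
        stats = []
        for col in zip(*rows):
            mean = sum(col) / len(col)
            # A small constant keeps the variance away from zero on tiny data sets.
            var = sum((x - mean) ** 2 for x in col) / len(col) + 1e-3
            stats.append((mean, var))
        model[label] = (math.log(len(rows) / len(docs)), stats)
    return model


def predict(model, text):
    """Return the label with the highest Gaussian log-posterior."""
    feats = extract_features(text)

    def score(label):
        log_prior, stats = model[label]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)
            for x, (mean, var) in zip(feats, stats)
        )

    return max(model, key=score)


# Toy documents invented for this sketch; they are not the paper's data set.
docs = [
    "The patient should consult a physician before administering the medication.",
    "Clinical evidence indicates that hypertension increases cardiovascular risk.",
    "I can't sleep and my head hurts, what should I do?",
    "You don't need pills, I think you're fine!",
]
labels = ["formal", "formal", "informal", "informal"]
model = train_gaussian_nb(docs, labels)

print(predict(model, "I'm sure you'll feel better, don't worry"))
print(predict(model, "Administration of antibiotics requires clinical supervision."))
```

A study like the one described would compare several classifiers on a much larger labelled collection with held-out evaluation; the toy model here only shows how surface style cues can separate the two classes.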
