Learning to Classify Medical Documents According to Formal and Informal Style

Title: Learning to Classify Medical Documents According to Formal and Informal Style
Publication Type: Conference Paper
Year of Publication: 2010
Authors: Abu Sheikha, F.; Inkpen, D.
Conference Name: Workshop on Intelligent Methods for Protecting Privacy and Confidentiality in Data
Publisher: University of Ottawa
Conference Location: Ottawa, Canada
ISBN Number: 978-0-9866482-0-5

This paper addresses a problem in computational linguistics: classifying medical documents as formal or informal in style. The distinction may matter for patient safety. Formal documents are more likely to have been published by medical authorities, so patients can place more trust in them than in informal documents. We used machine learning techniques to classify documents automatically as formal or informal.
First, we studied the main characteristics of each style in order to train a system that can distinguish between them. Then, we built our data set by collecting documents of both styles from different sources. After that, we preprocessed the collected documents to extract features that represent the main characteristics of both styles. Finally, we tested several classification algorithms, namely Decision Trees, Naïve Bayes, and Support Vector Machines, to identify the classifier that gives the best classification results.
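
The pipeline described above (extract style features, then train a classifier) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not list the features used, so the three features here (contraction rate, personal-pronoun rate, average word length) are hypothetical, the four training sentences are invented toy data, and the classifier is a depth-one decision tree (a decision stump) written by hand, not the authors' system or any of the tools they used.

```python
import re

# Hypothetical style cues; the paper's actual feature set is not given in the abstract.
CONTRACTION = re.compile(r"\b\w+'(?:t|s|re|ve|ll|d|m)\b", re.IGNORECASE)
WORD = re.compile(r"[A-Za-z']+")
PRONOUNS = {"i", "you", "we", "me", "my", "your"}


def style_features(text):
    """Map a document to a small dictionary of numeric style features."""
    words = [w.lower() for w in WORD.findall(text)]
    n = max(len(words), 1)  # avoid division by zero on empty input
    return {
        "contraction_rate": len(CONTRACTION.findall(text)) / n,
        "pronoun_rate": sum(w in PRONOUNS for w in words) / n,
        "avg_word_len": sum(len(w.strip("'")) for w in words) / n,
    }


def train_stump(samples, labels):
    """Pick the single feature/threshold split with the fewest training errors."""
    best = None
    for name in samples[0]:
        values = sorted({s[name] for s in samples})
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            for above in ("formal", "informal"):
                below = "informal" if above == "formal" else "formal"
                errors = sum(
                    (above if s[name] > threshold else below) != y
                    for s, y in zip(samples, labels)
                )
                if best is None or errors < best[0]:
                    best = (errors, name, threshold, above, below)
    if best is None:
        raise ValueError("no feature varies across the training samples")
    _, name, threshold, above, below = best
    return lambda feats: above if feats[name] > threshold else below


# Invented toy data, for illustration only.
train_texts = [
    "Patients should consult a physician before discontinuing the prescribed medication.",
    "The clinical evaluation demonstrated a significant reduction in systolic pressure.",
    "I can't sleep and my head hurts, don't you hate that?",
    "You've got to try this, it's what my doc told me!",
]
train_labels = ["formal", "formal", "informal", "informal"]

classify = train_stump([style_features(t) for t in train_texts], train_labels)

print(classify(style_features("Don't worry, you'll be fine.")))
print(classify(style_features("The treatment is administered orally twice daily.")))
```

On real data, the same feature dictionaries could be passed to library implementations of the three algorithms named in the abstract and compared with cross-validation; the stump is used here only to keep the sketch free of dependencies.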