Disease Prediction Based on Prior Knowledge

The increasing digitalization of Electronic Health Records (EHRs) creates a growing demand for effective data mining solutions. In this study we enhance the classical Support Vector Machine Recursive Feature Elimination (SVM-RFE) approach to estimate disease risk from hospital discharge records. Our approach incorporates prior knowledge in the form of human disease networks extracted from historical hospital discharge data, which lowers the burden of building classifiers from huge amounts of data. To predict the future risk of hospitalization from highly imbalanced, 11,170-dimensional hospital discharge data comprising nearly 7 million records collected in 2008, we combine a knowledge representation from complex systems (disease networks) with a feature selection technique from bioinformatics (SVM-RFE). Out-of-sample results on a 2009 dataset of similar size provide evidence that the proposed method is beneficial in cases where the classical SVM-RFE model is unstable. In particular, we show that the new method improves stability when large batches of features are removed in a single iteration.
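Classical SVM-RFE ranks features by the squared weights w_j^2 of a linear SVM, removes the lowest-ranked ones, retrains, and repeats. The number of features dropped per iteration (the step) is what "removing large batches of features in a single iteration" refers to. The following is a minimal sketch of that classical baseline only. It does not include the prior-knowledge (disease network) extension, whose details are not given here. The subgradient-descent trainer, the function names `train_linear_svm`, `svm_rfe` and `make_toy_data`, and the toy data are hypothetical, chosen so the example runs with the Python standard library alone. A real application with 11,170 features would use an optimized SVM solver.

```python
import random

# Hypothetical illustration of classical SVM-RFE; not the paper's implementation.


def train_linear_svm(X, y, features, C=1.0, lr=0.01, epochs=300):
    """Fit a linear SVM on the given feature indices by full-batch subgradient
    descent on 0.5*||w||^2 + (C/n) * sum(hinge). Labels must be +1 or -1."""
    n = len(X)
    w = {j: 0.0 for j in features}
    b = 0.0
    for _ in range(epochs):
        grad_w = {j: w[j] for j in features}  # gradient of the L2 term
        grad_b = 0.0
        for xi, yi in zip(X, y):
            score = sum(w[j] * xi[j] for j in features) + b
            if yi * score < 1:  # margin violated: hinge term contributes
                for j in features:
                    grad_w[j] -= C * yi * xi[j] / n
                grad_b -= C * yi / n
        for j in features:
            w[j] -= lr * grad_w[j]
        b -= lr * grad_b
    return w, b


def svm_rfe(X, y, n_keep, step):
    """Classical SVM-RFE: retrain, then drop the `step` features with the
    smallest squared weights, until `n_keep` features remain."""
    remaining = list(range(len(X[0])))
    eliminated = []  # features in the order they were removed
    while len(remaining) > n_keep:
        w, _ = train_linear_svm(X, y, remaining)
        remaining.sort(key=lambda j: w[j] ** 2)  # least informative first
        k = min(step, len(remaining) - n_keep)  # never remove past n_keep
        eliminated.extend(remaining[:k])
        remaining = remaining[k:]
    return sorted(remaining), eliminated


def make_toy_data(n=60, n_noise=9, seed=0):
    """Hypothetical toy data: feature 0 carries the label, the rest are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.choice([-1, 1])
        row = [label * (1.0 + rng.random())]
        row += [rng.uniform(-1, 1) for _ in range(n_noise)]
        X.append(row)
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    # step=1 removes one feature per iteration; step=4 removes a batch of four.
    for step in (1, 4):
        kept, eliminated = svm_rfe(X, y, n_keep=2, step=step)
        print(f"step={step}: kept={kept}, eliminated={eliminated}")
```

With `step=1` the model is retrained after every single removal (eight retrainings here). With `step=4` only two retrainings are needed, which is cheaper but relies on one ranking to discard several features at once. That is the regime in which the proposed method is reported to improve stability.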