Population Cost Prediction on Public Healthcare Datasets

The increasing availability of digital health records should ideally improve accountability in healthcare. In this context, predictive modeling of healthcare costs forms a foundation for accountable care at both the population level and the individual patient level. In this research we use machine learning algorithms to predict healthcare costs accurately from publicly available claims and survey data. Specifically, we investigate the use of regression trees, M5 model trees, and random forests to predict the healthcare costs of individual patients given their prior medical (and cost) history. Overall, three observations showcase the utility of our research: (a) prior healthcare cost alone can be a good indicator of future healthcare cost, (b) the M5 model tree technique led to very accurate prediction of future healthcare cost, and (c) although state-of-the-art machine learning algorithms are also limited by the skewed cost distributions in healthcare, for a large fraction (75%) of the population we were able to predict costs with higher accuracy using these algorithms. In particular, using M5 model trees we were able to predict costs with an error of less than $125 for 75% of the population, which compares favorably with prior techniques. Since models for predicting healthcare costs are often used to ascertain overall population health, our work is useful for evaluating future costs for large segments of disease populations with reasonably low error, as demonstrated by our results on real-world, publicly available datasets.
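The two ideas in observations (a) and (c), using prior cost as a predictor and reporting the error bound that holds for 75% of the population, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names, the nearest-rank percentile definition, and the four-patient cost figures are hypothetical and are not taken from the study's data or code. The sketch does not implement M5 model trees; it only shows the naive prior-cost baseline and how an error bound at a population fraction differs from a mean error under a skewed cost distribution.

```python
import math


def prior_cost_baseline(prior_costs):
    """Hypothetical baseline: predict next-period cost as the prior-period cost."""
    return list(prior_costs)


def abs_error_at_fraction(actual, predicted, fraction=0.75):
    """Smallest absolute error bound that covers `fraction` of patients.

    Uses the nearest-rank percentile definition (an assumption; the paper
    may define its 75% figure differently).
    """
    errors = sorted(abs(a - p) for a, p in zip(actual, predicted))
    rank = max(1, math.ceil(fraction * len(errors)))
    return errors[rank - 1]


def mean_abs_error(actual, predicted):
    """Mean absolute error over all patients."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Made-up costs in dollars; the last patient is a high-cost outlier,
# mimicking the skew typical of healthcare cost distributions.
prior_costs = [100.0, 200.0, 300.0, 400.0]
actual_costs = [110.0, 180.0, 300.0, 1400.0]

predicted_costs = prior_cost_baseline(prior_costs)
print(abs_error_at_fraction(actual_costs, predicted_costs))  # 20.0
print(mean_abs_error(actual_costs, predicted_costs))         # 257.5
```

In this toy example a single outlier drives the mean error to $257.50 even though three of the four patients are predicted to within $20, which is why an error bound stated for a fraction of the population can be more informative than an average when costs are skewed.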
Conference: 5th International Conference on Digital Health
Year: 2015
Authors: Shanu Sushmita, Stacey Newman, James Marquardt, Prabhu Ram, Virendra Prasad, Martine De Cock, Ankur Teredesai