Thesis Defense, Darren Hon - Sequence-based prediction takes an ordered list of events as input and predicts the next event. Most existing work on sequence-based prediction assumes that the sequences are simple: they consist either of symbols drawn from a small alphabet (such as a DNA sequence) or of numbers (such as a time series). In some applications the events are far more complex. In medical applications, for instance, data often come as a longitudinal sequence of patient records, each of which contains hundreds of features of various data types.
In this thesis we propose a new technique for sequence-based prediction that is domain-independent and takes the order of events into account when making predictions. The key idea is to dissect each sequence of k feature vectors of size m into a set of m simple sequences of length k (one per feature), train m models, one per simple sequence, using well-established machine learning techniques such as decision trees or support vector machines, and combine the m trained models into an ensemble that makes the final prediction. We evaluate the technique by measuring its accuracy in predicting the risk of 30-day readmission, cost, and length of stay from the hospital discharge records of hundreds of thousands of congestive heart failure patients.
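
The decomposition described above can be illustrated with a minimal Python sketch. It is not the thesis implementation. Several things here are assumptions made only for illustration: the names `decompose`, `NearestNeighbor`, and `PerFeatureEnsemble` are hypothetical, all features are treated as numeric, a simple nearest-neighbor learner stands in for the decision trees or support vector machines named in the abstract, and the ensemble combines its m models by majority vote, which the abstract does not specify.

```python
from collections import Counter


def decompose(sequence):
    """Split k feature vectors of size m into m simple sequences of length k."""
    return [list(column) for column in zip(*sequence)]


class NearestNeighbor:
    """Stand-in base learner (assumption); the abstract names decision
    trees or support vector machines instead."""

    def fit(self, rows, labels):
        self.rows, self.labels = list(rows), list(labels)
        return self

    def predict_one(self, row):
        # Squared Euclidean distance to every stored training sequence.
        dists = [sum((a - b) ** 2 for a, b in zip(row, other))
                 for other in self.rows]
        return self.labels[dists.index(min(dists))]


class PerFeatureEnsemble:
    """Trains one model per feature and combines them by majority vote
    (the combination rule is an assumption)."""

    def __init__(self, make_model=NearestNeighbor):
        self.make_model = make_model

    def fit(self, sequences, labels):
        per_feature = [decompose(s) for s in sequences]  # n x m x k
        m = len(per_feature[0])
        # Model j sees only the length-k sequence of feature j.
        self.models = [
            self.make_model().fit([pf[j] for pf in per_feature], labels)
            for j in range(m)
        ]
        return self

    def predict_one(self, sequence):
        votes = [model.predict_one(simple)
                 for model, simple in zip(self.models, decompose(sequence))]
        return Counter(votes).most_common(1)[0][0]


# Toy data (invented for illustration): k = 3 records, m = 2 features each.
sequences = [
    [[1, 10], [2, 10], [3, 10]],
    [[3, 10], [2, 10], [1, 10]],
    [[1, 20], [2, 20], [3, 20]],
    [[3, 20], [2, 20], [1, 20]],
]
labels = [1, 0, 1, 0]  # 1 when the first feature rises over time

model = PerFeatureEnsemble().fit(sequences, labels)
prediction = model.predict_one([[1, 10], [2, 10], [4, 10]])
```

Because each base model receives the values of one feature in their original order, the order of events is preserved within every simple sequence, which is the property the abstract emphasizes.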