Big Data Infrastructure

Our group focuses on the evolution and utility of various big data platforms and tools, and on how to evaluate the right set of tools for a particular problem domain. We help industry partners and researchers benchmark Hadoop stacks, drawing on our expertise in data transformation, parallel and distributed computing, and machine learning.

In-Database In-Memory Mining (IDIM2)

With the increasing availability of large amounts of main memory on shared-nothing architectures, there is a growing opportunity to design data-parallel in-memory operators that blend query processing with MapReduce-like constructs. Machine learning and data mining could be scaled up significantly if such operators and data structures became available; I like to term such structures in-database in-memory operators. In this project I would like to collaborate with students to port traditional memory-bound data mining algorithms to this new paradigm and develop an open-source architecture. Programming would likely be in C, R, Java, and Scala. Application datasets would range from social graphs to healthcare datasets. Joint project with any interested colleagues from the institute and industry.
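As a minimal sketch of what such an operator could look like (written in Java, one of the candidate languages; the class, record type, and data here are illustrative assumptions, not part of any existing system), the following blends a query-style filter with a MapReduce-like grouped aggregation, executed data-parallel over an in-memory table:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class InMemoryOperator {
    // Hypothetical record type standing in for an in-database row.
    record Row(String key, double value) {}

    // A data-parallel operator: a query-style predicate (filter) fused with a
    // MapReduce-like grouped aggregation (group by key, sum values), run in
    // parallel over an in-memory table.
    static Map<String, Double> filterGroupSum(List<Row> table, double threshold) {
        return table.parallelStream()                  // data parallelism over rows
                .filter(r -> r.value() > threshold)    // predicate push-down (query processing)
                .collect(Collectors.groupingBy(        // group/shuffle (map phase)
                        Row::key,
                        Collectors.summingDouble(Row::value))); // aggregate (reduce phase)
    }

    public static void main(String[] args) {
        List<Row> table = List.of(
                new Row("a", 1.0), new Row("a", 3.0),
                new Row("b", 2.0), new Row("b", 0.5));
        // Keeps only rows with value > 1.0, then sums per key.
        System.out.println(filterGroupSum(table, 1.0));
    }
}
```

The point of the sketch is the fusion: the filter and the grouped reduction run in one pass over memory-resident data, rather than materializing intermediate results as a disk-based MapReduce job would.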

ACO Risk Stratification

Accountable Care Organizations (ACOs) are a care-delivery model introduced under the Affordable Care Act, popularly known as Obamacare. One key challenge ACOs face today is identifying which patients are at high or low risk, in order to understand the cost of managing them. Once cost models are understood, patients can receive quality care that improves holistic health rather than just symptom-focused treatments. In this project we will engage in a year-long investigation of ACOs and how they stratify risk. We will build machine learning models to predict risk categories for existing patients and validate our results in partnership with Edifecs, a local claims processing startup. Joint project with industry partners and interested colleagues, including post-docs Si-Chi Chin and Archana Ramesh.
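As a hedged illustration of the kind of model this project would build (the features, coefficients, and tier thresholds below are invented for the sketch, not derived from any claims data or from Edifecs), a logistic risk score thresholded into coarse tiers might look like:

```java
public class RiskStratifier {
    // Hand-set coefficients for illustration only; a real model would be
    // learned from claims data and validated clinically.
    static final double[] WEIGHTS = {0.8, 0.05, 1.2};
    static final double BIAS = -3.0;

    // Logistic risk score in (0, 1) from a small feature vector, e.g.
    // [chronic condition count, age in decades, prior admissions].
    static double riskScore(double[] features) {
        double z = BIAS;
        for (int i = 0; i < features.length; i++) {
            z += WEIGHTS[i] * features[i];
        }
        return 1.0 / (1.0 + Math.exp(-z));
    }

    // Stratify the continuous score into coarse risk categories; the cutoffs
    // are arbitrary placeholders here.
    static String riskTier(double score) {
        if (score < 0.3) return "low";
        if (score < 0.7) return "medium";
        return "high";
    }

    public static void main(String[] args) {
        double[] patient = {2, 6.5, 1}; // 2 chronic conditions, age 65, 1 prior admission
        double s = riskScore(patient);
        System.out.println(s + " -> " + riskTier(s));
    }
}
```

In practice the stratification step is where validation with a partner like Edifecs matters: the tier boundaries determine which patients receive care-management resources, so they must be calibrated against observed costs rather than fixed a priori.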