Prediction of Hospitalization Cost for Childbirth Variables

Authors - Si-Chi Chin, James Marquardt, Rui Liu, Martine De Cock
 
As part of a workshop submission to HI-KDD on predicting monetary costs to mothers hospitalized ahead of childbirth, we include a list of the variables used in our models. The variable descriptions are publicly available from the Healthcare Cost and Utilization Project (HCUP) State Inpatient Databases (SID) website at http://www.hcup-us.ahrq.gov/db/state/siddist/sid_multivar.jsp.
 
We employ two methods of feature selection ahead of model training: frequency-based (select only variables that have at least 10% non-missing/positive values) and regression-based (select variables on the basis of regression testing). We observe that the variables selected by the frequency-based method are a subset of those selected by the regression-based method. In the lists below, all variables were selected via regression testing; those annotated with a * were also selected via frequency-based selection.
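
The frequency-based criterion can be illustrated with a minimal sketch. It is not the code used for the submission. It assumes that a value counts toward the 10% threshold when it is not missing and, if numeric, is greater than zero. The function names and the toy records are hypothetical, and the column names are illustrative rather than HCUP field names.

```python
def is_informative(value):
    """Assumed reading of 'non-missing/positive': not None, and > 0 if numeric."""
    if value is None:
        return False
    if isinstance(value, (int, float)):
        return value > 0
    return value != ""


def frequency_select(records, min_fraction=0.10):
    """Return the variables whose share of informative values is >= min_fraction."""
    if not records:
        return []
    # Collect every variable name that appears in at least one record.
    names = sorted({name for record in records for name in record})
    selected = []
    for name in names:
        count = sum(is_informative(record.get(name)) for record in records)
        if count / len(records) >= min_fraction:
            selected.append(name)
    return selected


# Hypothetical toy data: 10 discharge records with three variables.
records = [
    {"age": 25 + i, "obesity": 1 if i < 2 else 0, "paralysis": 0}
    for i in range(10)
]
records[3]["age"] = None  # simulate one missing value

print(frequency_select(records))
```

In this toy data, "age" is informative in 9 of 10 records and "obesity" in 2 of 10, so both meet the threshold, while "paralysis" is never positive and is dropped.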
 
Base (B): Age*, Race*, Length of Stay*
Comorbidities (C): Deficiency anemias*, Rheumatoid arthritis/collagen vascular diseases, Chronic blood loss anemia*, Congestive heart failure, Chronic pulmonary disease, Coagulopathy, Depression, Diabetes (uncomplicated), Diabetes with chronic complications, Drug abuse, Hypertension (combine uncomplicated and complicated), Hypothyroidism, Liver disease, Fluid and electrolyte disorders, Obesity*, Paralysis, Peripheral vascular disorders, Psychoses, Pulmonary circulation disorders, Renal failure, Valvular disease
 
Revenue code (groupings listed) (R): Room and board*, nursery, intensive care, coronary care, incremental nursing care, pharmacy drug*, IV therapy, medical/surgical supplies*, laboratory*, radiology, nuclear medicine, CT scan, operating room services*, anesthesia*, blood storage and processing, ultrasound*, respiratory services, physical therapy, occupational therapy, speech-language pathology, emergency room, pulmonary function, audiology, cardiology, outpatient services, clinic, MRI, drugs requiring specific identification*, trauma response, recovery room*, labor room/delivery*, electrocardiogram, electroencephalogram, gastrointestinal services, treatment or observation room*, preventive care services, inpatient renal dialysis, behavioral health, other diagnostic services, other therapeutic services*, patient convenience items
 
Diagnosis Related Group (D): Single variable indicating beneficiary DRG
 
Hospital information (H): Wage index*, hospital-specific all-payer inpatient cost-to-charge ratio (CCR)*, group average all-payer inpatient CCR*, hospital type*, capital cost adjustment index*