Prediction of Hospitalization Cost for Childbirth Variables

Authors - Si-Chi Chin, James Marquardt, Rui Liu, Martine De Cock
As part of a workshop submission to HI-KDD on predicting monetary costs to mothers hospitalized ahead of childbirth, we include a list of variables used in our models. These variable descriptions are publicly available from the Healthcare Cost and Utilization Project (HCUP) State Inpatient Databases (SID) website at
We employ two methods of feature selection ahead of model training: frequency-based (select only variables that have at least 10% non-missing/positive values) and regression-based (select only variables based on regression testing). We observe that the variables selected by the frequency-based method are a subset of those selected by the regression-based method. In the descriptions below, all variables listed were selected via regression testing. Those annotated with a * were also selected via frequency selection.
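The frequency-based rule can be illustrated with a minimal sketch. It is not the code used for the submission: the function names, the toy records, and the column names are hypothetical, and the reading of "non-missing/positive" (missing means None or NaN; numeric values count only when greater than zero) is an assumption.

```python
import math


def is_present(value):
    """Return True when a value counts toward a variable's frequency.

    Assumption: None and NaN are missing; numeric values count only when
    positive (e.g. a 0/1 comorbidity flag set to 1); any other non-missing
    value (e.g. a categorical code stored as a string) counts.
    """
    if value is None:
        return False
    if isinstance(value, float) and math.isnan(value):
        return False
    if isinstance(value, (int, float)):
        return value > 0
    return True


def select_by_frequency(records, threshold=0.10):
    """Keep variables whose share of present values is at least `threshold`."""
    if not records:
        return []
    # Collect variable names in first-seen order across all records.
    names = list(dict.fromkeys(name for rec in records for name in rec))
    total = len(records)
    selected = []
    for name in names:
        present = sum(1 for rec in records if is_present(rec.get(name)))
        if present / total >= threshold:
            selected.append(name)
    return selected


# Toy data with hypothetical column names: 10 discharge records.
records = [
    {
        "age": 25 + i,                      # always present
        "race": None if i == 0 else "2",    # missing in one record
        "obesity": 1 if i < 2 else 0,       # positive in 20% of records
        "paralysis": 0,                     # never positive
    }
    for i in range(10)
]

selected = select_by_frequency(records)
print(selected)
```

On this toy data the sketch keeps age, race, and obesity and drops paralysis, which is never positive. The 10% threshold is the one stated above; the regression-based selection is not sketched here because its details are not given.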
Base (B): Age*, Race*, Length of Stay*
Comorbidities (C): Deficiency anemias*, Rheumatoid arthritis/collagen vascular diseases, Chronic blood loss anemia*, Congestive heart failure, Chronic pulmonary disease, Coagulopathy, Depression, Diabetes (uncomplicated), Diabetes with chronic complications, Drug abuse, Hypertension (combined uncomplicated and complicated), Hypothyroidism, Liver disease, Fluid and electrolyte disorders, Obesity*, Paralysis, Peripheral vascular disorders, Psychoses, Pulmonary circulation disorders, Renal failure, Valvular disease
Revenue code (groupings listed) (R): Room and board*, nursery, intensive care, coronary care, incremental nursing care, pharmacy drug*, IV therapy, medical/surgical supplies*, laboratory*, radiology, nuclear medicine, CT scan, operating room services*, anesthesia*, blood storage and processing, ultrasound*, respiratory services, physical therapy, occupational therapy, speech-language pathology, emergency room, pulmonary function, audiology, cardiology, outpatient services, clinic, MRI, drugs requiring specific identification*, trauma response, recovery room*, labor room/delivery*, electrocardiogram, electroencephalogram, gastrointestinal services, treatment or observation room*, preventive care services, inpatient renal dialysis, behavioural health, other diagnostic services, other therapeutic services*, patient convenience items
Diagnosis Related Group (D): Single variable indicating beneficiary DRG
Hospital information (H): Wage index*, hospital-specific all-payer inpatient cost-to-charge ratio (CCR)*, group average all-payer inpatient CCR*, hospital type*, capital cost adjustment index*