Age and Gender Identification in Social Media

This paper describes the submission of the University of Washington’s Center for Data Science to the PAN 2014 author profiling task. We examine how well several sets of features, extracted from various genres of online social media, predict age and gender. Through comparison, we establish a feature set that maximizes the accuracy of gender and age prediction across all genres examined. We report the accuracies obtained by two approaches to the multi-label classification problem of predicting both age and gender: a model in which the multi-label problem is reduced to a single-label problem using a powerset transformation, and a chained classifier approach in which the output of a dedicated gender classifier is used as input to an age classifier.
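
The two strategies named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy token-voting classifier, the helper names (`TokenVoteClassifier`, `fit_powerset`, `fit_chain`, and so on), the `__gender=` pseudo-token, and the four training documents are all assumptions made for illustration. The chain is trained here on the true gender labels, which is one common choice; the paper may do this differently.

```python
from collections import Counter, defaultdict


class TokenVoteClassifier:
    """Toy stand-in for a real learner: each token votes for the labels
    it co-occurred with during training (hypothetical, for illustration)."""

    def fit(self, docs, labels):
        self.votes = defaultdict(Counter)
        self.prior = Counter(labels)
        for tokens, label in zip(docs, labels):
            for tok in set(tokens):
                self.votes[tok][label] += 1
        return self

    def predict_one(self, tokens):
        tally = Counter()
        for tok in set(tokens):
            tally.update(self.votes.get(tok, {}))
        if not tally:
            # No known token: fall back to the most frequent training label.
            return self.prior.most_common(1)[0][0]
        # Iterate labels in sorted order so ties are broken deterministically.
        return max(sorted(tally), key=lambda label: tally[label])


def fit_powerset(docs, genders, ages):
    """Powerset transformation: treat each (gender, age) pair as one class."""
    combined = list(zip(genders, ages))
    return TokenVoteClassifier().fit(docs, combined)


def predict_powerset(model, tokens):
    return model.predict_one(tokens)  # a (gender, age) tuple


def fit_chain(docs, genders, ages):
    """Chained classifiers: the age model sees gender as an extra feature."""
    gender_clf = TokenVoteClassifier().fit(docs, genders)
    augmented = [tokens + ["__gender=" + g] for tokens, g in zip(docs, genders)]
    age_clf = TokenVoteClassifier().fit(augmented, ages)
    return gender_clf, age_clf


def predict_chain(chain, tokens):
    gender_clf, age_clf = chain
    gender = gender_clf.predict_one(tokens)
    # The predicted gender is fed to the age classifier as a pseudo-token.
    age = age_clf.predict_one(tokens + ["__gender=" + gender])
    return gender, age


# Invented toy data; tokens and labels are placeholders, not corpus statistics.
docs = [
    ["school", "homework", "lol"],
    ["school", "exam", "party"],
    ["mortgage", "kids", "football"],
    ["mortgage", "kids", "garden"],
]
genders = ["male", "female", "male", "female"]
ages = ["18-24", "18-24", "35-49", "35-49"]

powerset_model = fit_powerset(docs, genders, ages)
chain_model = fit_chain(docs, genders, ages)

print(predict_powerset(powerset_model, ["school", "exam"]))
print(predict_chain(chain_model, ["school", "exam"]))
```

The powerset model learns one classifier over every observed gender and age combination, so it can capture interactions between the two labels but never predicts a pair absent from training. The chain keeps two smaller problems and passes information in one direction only, from gender to age.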
in: CLEF 2014 Working Notes proceedings (5th Conference and Labs of the Evaluation Forum)
J. Marquardt, G. Farnadi, G. Vasudevan, M.-F. Moens, S. Davalos, A. Teredesai, M. De Cock