We develop methods for describing users based on their posts to an online discussion forum. These methods build on existing techniques for describing other aspects of online discussion communities, but applying these techniques to describing users is novel. We demonstrate the utility of our proposed methods by showing that they outperform existing methods on three distinct classification tasks, at the thread, post and user levels, using real-world datasets. In all cases, we attain statistically significant improvements over baseline results. In post-level classification, we also see statistically significant improvements over state-of-the-art benchmark methods.
Our major contributions in this work are:
• creation of a corpus with user-level annotations
• detailed description and analysis of three relevant corpora
• implementation of a data model for accessing forum data
• implementation of feature extraction techniques
• evaluation and analysis of user-level features on classification tasks
Our work on preparing corpora and providing extensible implementations of feature extraction will be of particular value to researchers looking to work in this field.