Feature selection and data sampling methods for learning reputation dimensions: The University of Amsterdam at RepLab 2014

Authors	C. Gârbacea, M. Tsagkias, M. de Rijke
Publication date	2014
Host editors	L. Cappellato N. Ferro M. Halvey W. Kraaij
Book title	Working Notes for CLEF 2014 Conference
Book subtitle	Sheffield, UK, September 15-18, 2014
Series	CEUR Workshop Proceedings
Event	CLEF 2014 Labs and Workshop
Pages (from-to)	1479-1490
Publisher	Aachen: CEUR-WS
Organisations	Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract	We report on our participation in the reputation dimension task of the CLEF RepLab 2014 evaluation initiative, which consists of classifying social media updates into eight predefined categories. We address the task by using corpus-based methods to extract textual features from the labeled training data, which we then use to train two supervised classifiers. We explore three sampling strategies for selecting training examples and probe their effect on classification performance. We find that all our submitted runs outperform the baseline, and that elaborate feature selection methods coupled with balanced datasets help improve classification accuracy.
Document type	Conference contribution
Language	English
Published at	http://ceur-ws.org/Vol-1180/CLEF2014wn-Rep-GarbaceaEt2014.pdf (Final published version)
Other links	http://ceur-ws.org/Vol-1180