Feature selection and data sampling methods for learning reputation dimensions: The University of Amsterdam at RepLab 2014
| Authors | |
|---|---|
| Publication date | 2014 |
| Host editors | |
| Book title | Working Notes for CLEF 2014 Conference |
| Book subtitle | Sheffield, UK, September 15-18, 2014 |
| Series | CEUR Workshop Proceedings |
| Event | CLEF 2014 Labs and Workshop |
| Pages (from-to) | 1479-1490 |
| Publisher | Aachen: CEUR-WS |
| Organisations | |
| Abstract | We report on our participation in the reputation dimension task of the CLEF RepLab 2014 evaluation initiative, i.e., to classify social media updates into eight predefined categories. We address the task by using corpus-based methods to extract textual features from the labeled training data to train two classifiers in a supervised way. We explore three sampling strategies for selecting training examples, and probe their effect on classification performance. We find that all our submitted runs outperform the baseline, and that elaborate feature selection methods coupled with balanced datasets help improve classification accuracy. |
| Document type | Conference contribution |
| Language | English |
| Published at | http://ceur-ws.org/Vol-1180/CLEF2014wn-Rep-GarbaceaEt2014.pdf |
| Other links | http://ceur-ws.org/Vol-1180 |
| Downloads | CLEF2014wn-Rep-GarbaceaEt2014 (Final published version) |
