Salvaging the Internet Hate Machine: Using the discourse of extremist online subcultures to identify emergent extreme speech

doi:https://doi.org/10.5281/zenodo.3676482

Creators	Stijn Peeters, Sal Hagen, Partha Das
Publication date	20-02-2020
Description	This dataset accompanies a paper submitted to the WebSci '20 conference. The paper presents a lexicon of 'extreme speech' that can be used to detect hate speech and extreme speech on online platforms. We outline a cross-disciplinary research protocol in which this lexicon is first extracted from a corpus of 3,335,265 posts from 4chan's /pol/ sub-forum, using a hybrid method that combines word2vec modeling with subsequent snowballing of the nearest neighbours of a small initial expert seed list of extreme language. The choice of corpus is significant: 4chan is a space of rapid language innovation and obscure extreme vernacular, which complicates generalised approaches. Applied to a corpus from a more mainstream platform (Reddit), our lexicon detects significantly more extreme posts than another popular lexicon, Hatebase, with similar accuracy. The lexicon and the method of its creation thus contribute to the study of toxicity in online subcultures similar to 4chan, as well as on more mainstream platforms. As we demonstrate, the lexicon allows for more effective detection of extreme speech in these spaces. The method and the lexicon have also been made available through 4CAT, an open-source web tool for the study of online social platforms. The computational methods and lexicon offered here can thus be used by a wide academic audience, fostering interdisciplinary approaches to the study of online hate and extreme speech.
Publisher	Zenodo
Organisations	Faculty of Humanities (FGw) - Amsterdam Institute for Humanities Research (AIHR) - Amsterdam School for Cultural Analysis (ASCA); Faculty of Science (FNWI) - Informatics Institute (IVI)
Document type	Dataset
DOI	https://doi.org/10.5281/zenodo.3676482
Other links	https://zenodo.org/record/3676483
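
The description mentions snowballing the nearest neighbours of a seed list in a word2vec embedding space. The following is a minimal sketch of that general idea, not the authors' implementation: the toy two-dimensional vectors, the placeholder tokens, the function names and the `top_n`, `min_sim` and `max_rounds` parameters are all assumptions made for illustration. A real run would use vectors from a trained word2vec model, and candidate terms would presumably still need expert review.

```python
import math

# Hypothetical toy embeddings; real vectors would come from a trained word2vec model.
TOY_VECTORS = {
    "seed_term": (1.0, 0.0),
    "variant_a": (0.9, 0.3),
    "variant_b": (0.7, 0.5),
    "unrelated": (0.0, 1.0),
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_neighbours(word, vectors, top_n=2, min_sim=0.9):
    """Return up to top_n words whose similarity to `word` is at least min_sim."""
    scored = [
        (other, cosine(vectors[word], vec))
        for other, vec in vectors.items()
        if other != word
    ]
    scored = [(w, s) for w, s in scored if s >= min_sim]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))  # most similar first
    return [w for w, _ in scored[:top_n]]


def snowball(seeds, vectors, top_n=2, min_sim=0.9, max_rounds=2):
    """Grow a lexicon by repeatedly adding nearest neighbours of newly found terms."""
    lexicon = {s for s in seeds if s in vectors}
    frontier = sorted(lexicon)
    for _ in range(max_rounds):
        next_frontier = []
        for word in frontier:
            for neighbour in nearest_neighbours(word, vectors, top_n, min_sim):
                if neighbour not in lexicon:
                    lexicon.add(neighbour)
                    next_frontier.append(neighbour)
        if not next_frontier:  # nothing new was found, so stop early
            break
        frontier = next_frontier
    return lexicon


if __name__ == "__main__":
    print(sorted(snowball(["seed_term"], TOY_VECTORS)))
```

In this toy data, `variant_b` is not similar enough to the seed to be picked up directly, but it is reached in the second round through `variant_a`. That is the point of snowballing: terms can be found that are only indirectly related to the original seed list. The similarity threshold and round limit keep the expansion from drifting into unrelated vocabulary.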
UvA-DARE

Digital Academic Repository
