Topic Crawler for Social Networks Monitoring
| Authors |
|
|---|---|
| Publication date | 2013 |
| Host editors |
|
| Book title | Knowledge Engineering and the Semantic Web |
| Book subtitle | 4th International Conference, KESW 2013, St. Petersburg, Russia, October 7-9, 2013 : proceedings |
| ISBN |
|
| ISBN (electronic) |
|
| Series | Communications in Computer and Information Science |
| Event | 4th International Conference on Knowledge Engineering and Semantic Web, KESW 2013 |
| Pages (from-to) | 214-227 |
| Number of pages | 14 |
| Publisher | Heidelberg: Springer |
| Organisations |
|
| Abstract |
The paper describes a focused crawler for monitoring social networks, used for information extraction and content analysis. The crawler implements the MapReduce model for distributed computation and is oriented toward large volumes of text data. The focused crawler searches for pages classified as relevant to a specified topic. The classifier is built using a knowledge database that defines words, their classes, and the rules for joining words into phrases. A text weight indicating relevance to the topic is computed from the weights of the words and phrases. The system was used to detect the drug community in the Russian segment of the LiveJournal social network. Official and slang drug terminology was used to develop the knowledge database. Different aspects of the knowledge database and the classifier are studied. A non-homogeneous Poisson process was used to model blog changes, since it makes it possible to build a monitoring policy that accounts for blog update frequency and time-of-day effects. Evaluation on real data shows a 25% increase in the detection of new posts. |
| Document type | Conference contribution |
| Language | English |
| Published at | https://doi.org/10.1007/978-3-642-41360-5_17 |
| Other links | https://www.scopus.com/pages/publications/84884640207 |
| Permalink to this page | |
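
The abstract's non-homogeneous Poisson model can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a piecewise-constant posting rate with one value per hour of the day, repeating daily, and the names `expected_posts`, `prob_new_post`, and the sample rates are hypothetical. The only fact relied on is the standard property that the number of events in an interval is Poisson distributed with mean equal to the integral of the rate over that interval.

```python
import math


def expected_posts(hourly_rates, start_hour, end_hour):
    """Integrate a piecewise-constant rate (posts per hour) over [start_hour, end_hour).

    hourly_rates holds 24 values, one per hour of the day, and is assumed
    to repeat every day (an assumption of this sketch, not of the paper).
    """
    if len(hourly_rates) != 24:
        raise ValueError("hourly_rates must contain exactly 24 values")
    if end_hour < start_hour:
        raise ValueError("end_hour must not precede start_hour")

    total = 0.0
    t = start_hour
    while t < end_hour:
        # Advance to the next whole-hour boundary or to the end of the interval.
        next_boundary = min(math.floor(t) + 1, end_hour)
        total += hourly_rates[math.floor(t) % 24] * (next_boundary - t)
        t = next_boundary
    return total


def prob_new_post(hourly_rates, start_hour, end_hour):
    """Probability of at least one new post in the interval: 1 - exp(-mean)."""
    return 1.0 - math.exp(-expected_posts(hourly_rates, start_hour, end_hour))


# Hypothetical blog: silent from 00:00 to 08:00, then 0.5 posts per hour.
rates = [0.0] * 8 + [0.5] * 16

night_visit = prob_new_post(rates, 0, 8)    # revisit after the quiet hours
day_visit = prob_new_post(rates, 8, 16)     # revisit after eight active hours
```

Under these assumed rates, a revisit scheduled at the end of the quiet period finds nothing new, while the same eight-hour gap during the day is far more likely to yield a post. A monitoring policy in this spirit would spend its revisits where the integrated rate is highest.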
