Collection and Indexing of Tweets with a Geographical Focus


* Corresponding author. (1) OeAW - Austrian Academy of Sciences

Abstract: This paper introduces a Twitter corpus, currently focused geographically, in order to (1) test selection and collection processes for a given region and (2) find a suitable database to query, filter, and visualize the tweets. Due to access restrictions, it is not possible to retrieve all available tweets, which is why corpus construction implies a series of decisions, described below. The corpus focuses on Austrian users, as data collection is grounded in a two-tier detection process addressing corpus construction and user location issues. The emphasis lies on short messages whose sender mentions a place in Austria as his/her hometown or which are tweeted from places located in Austria. The resulting user base is then queried and enlarged using focused crawling and random sampling, so that the corpus is refined and completed in the manner of a monitor corpus. Its current volume is 21.7 million tweets from approximately 125,000 users. The tweets are indexed using Elasticsearch and queried via the Kibana frontend, which allows for queries on metadata as well as for the visualization of geolocalized tweets (currently about 3.3% of the collection).
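The abstract does not give the detection procedure in code, so the following is only a minimal Python sketch under stated assumptions. Tier one matches the free-text profile location against a hypothetical, deliberately tiny gazetteer (`AUSTRIAN_PLACES`). Tier two checks whether a tweet's coordinates fall inside an approximate bounding box of Austria. `to_es_document` shows how a tweet could be shaped for an Elasticsearch index whose `location` field is mapped as a `geo_point`, the type Kibana needs for map visualizations. All names, the gazetteer and the box coordinates are illustrative assumptions, not the author's implementation.

```python
import json
from dataclasses import dataclass
from typing import Optional

# Hypothetical gazetteer: a tiny sample of Austrian place names (lowercased).
AUSTRIAN_PLACES = {"österreich", "austria", "wien", "vienna", "graz",
                   "linz", "salzburg", "innsbruck", "klagenfurt"}

# Approximate bounding box of Austria in degrees (assumption). A rectangle is
# a coarse approximation and also covers parts of neighbouring countries.
LAT_MIN, LAT_MAX = 46.37, 49.02
LON_MIN, LON_MAX = 9.53, 17.16


@dataclass
class Tweet:
    user_id: int
    user_location: str           # free-text "location" field of the user profile
    text: str
    lat: Optional[float] = None  # set only for geolocalized tweets
    lon: Optional[float] = None


def hometown_in_austria(location: str) -> bool:
    """Tier 1: does the profile location contain a known Austrian place name?"""
    tokens = location.lower().replace(",", " ").replace("/", " ").split()
    return any(tok in AUSTRIAN_PLACES for tok in tokens)


def tweeted_from_austria(tweet: Tweet) -> bool:
    """Tier 2: do the tweet's coordinates fall inside the bounding box?"""
    if tweet.lat is None or tweet.lon is None:
        return False
    return LAT_MIN <= tweet.lat <= LAT_MAX and LON_MIN <= tweet.lon <= LON_MAX


def is_candidate_user(tweet: Tweet) -> bool:
    """A user is kept if either tier fires."""
    return hometown_in_austria(tweet.user_location) or tweeted_from_austria(tweet)


def to_es_document(tweet: Tweet) -> dict:
    """JSON-serializable document; 'location' uses the {lat, lon} geo_point form."""
    doc = {"user_id": tweet.user_id,
           "user_location": tweet.user_location,
           "text": tweet.text}
    if tweet.lat is not None and tweet.lon is not None:
        doc["location"] = {"lat": tweet.lat, "lon": tweet.lon}
    return doc


if __name__ == "__main__":
    sample = [
        Tweet(1, "Wien, Österreich", "Guten Morgen!"),               # tier 1
        Tweet(2, "Berlin", "Hallo", lat=48.2082, lon=16.3738),       # tier 2
        Tweet(3, "Berlin", "Hallo"),                                  # neither
    ]
    print([t.user_id for t in sample if is_candidate_user(t)])  # [1, 2]
    print(json.dumps(to_es_document(sample[1]), ensure_ascii=False))
```

A rectangular box over-selects border regions (for example Munich or South Tyrol), and single-token matching misses multi-word names such as "St. Pölten". A fuller gazetteer and a point-in-polygon test would be the natural refinements.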

Keywords: Computer-Mediated Communication; Web Corpus Construction; Database Solutions; Visualization

Author: Adrien Barbaresi


