Abstract
Social media generates a massive amount of data at a very fast pace. Objective information, such as news, and subjective content, such as opinions and emotions, are intertwined and readily available. This data is very appealing from both a research and a commercial point of view, for applications such as public polling or marketing. A complete understanding requires a combined view of information from different sources, which is usually enriched (e.g., with sentiment analysis) and visualized in a dashboard.
In this work, we present a toolkit that tackles these issues on different levels: (1) to extract heterogeneous information, it provides independent data extractors and web scrapers; (2) data processing is done with independent semantic analysis services that are easily deployed; (3) a configurable Big Data orchestrator controls the execution of extraction and processing tasks; (4) the end result is presented in a sensible and interactive format with a modular visualization framework based on Web Components that connects to different sources such as SPARQL and ElasticSearch endpoints. Data workflows can be defined by connecting different extractors and analysis services. The different elements of this toolkit interoperate through an approach based on linked data principles and a set of common ontologies. To illustrate the usefulness of this toolkit, this work describes several use cases in which it has been successfully applied.
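The idea of defining a workflow by connecting an extractor to analysis services can be sketched as follows. This is a minimal illustration only, not the toolkit's actual API: the names `toy_extractor`, `toy_sentiment` and `Workflow` are hypothetical, the word lists are invented, and the records merely imitate the shape of JSON-LD documents.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Iterator, List

# A record is a JSON-LD-like dictionary (illustrative shape only).
Record = Dict[str, object]

# Invented word lists for the toy analysis step.
POSITIVE = {"love", "great", "good"}
NEGATIVE = {"terrible", "bad", "hate"}


def toy_extractor() -> Iterator[Record]:
    """Hypothetical extractor: yields two hard-coded posts."""
    posts = ["I love this phone", "The service was terrible"]
    for i, text in enumerate(posts):
        yield {
            "@id": f"http://example.org/post/{i}",
            "@type": "schema:SocialMediaPosting",
            "schema:articleBody": text,
        }


def toy_sentiment(record: Record) -> Record:
    """Hypothetical analysis service: enriches a record with a polarity label."""
    words = set(str(record["schema:articleBody"]).lower().split())
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
    # Return a new dictionary so the extracted record is left untouched.
    return {**record, "sentiment": label}


@dataclass
class Workflow:
    """Connects one extractor to an ordered chain of analysis steps."""
    extractor: Callable[[], Iterable[Record]]
    analyzers: List[Callable[[Record], Record]] = field(default_factory=list)

    def run(self) -> List[Record]:
        results = []
        for record in self.extractor():
            for analyze in self.analyzers:
                record = analyze(record)
            results.append(record)
        return results


workflow = Workflow(extractor=toy_extractor, analyzers=[toy_sentiment])
results = workflow.run()
for item in results:
    print(item["@id"], item["sentiment"])
```

Because every step consumes and returns the same record shape, extractors and analysis steps can be swapped independently, which is the property the common data model is meant to provide.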
Notes
- 2. Sentiment140, MeaningCloud and IBM Watson are online sentiment analysis services available at http://www.sentiment140.com/, https://www.meaningcloud.com/ and https://www.ibm.com/watson/services/natural-language-understanding/, respectively.
- 3. GSICrawler’s documentation: https://gsicrawler.readthedocs.io.
- 6. Sefarad’s documentation: http://sefarad.readthedocs.io/.
- 8. Soneti’s documentation: https://soneti.readthedocs.io/.
Acknowledgements
The authors want to thank Roberto Bermejo, Alejandro Saura, Rubén Díaz and José Carmona for working on previous versions of the toolkit. In addition, we want to thank Marcos Torres, Jorge García-Castaño, Pablo Aramburu, Rodrigo Barbado, Jose Mª Izquierdo, Mario Hernando, Carlos Moreno, Javier Ochoa and Daniel Souto, who have applied the toolkit in different domains. Lastly, we also thank our partners at Taiger and HI-Iberia for using the toolkit and collaborating in the integration of their analysis services with the toolkit as part of project SoMeDi (ITEA3 16011).
This work is supported by the ITEA 3 EUREKA Cluster programme together with the National Spanish Funding Agencies CDTI (INNO-20161089) and MINETAD (TSI-102600-2016-1), by the Spanish Ministry of Economy and Competitiveness under the R&D project SEMOLA (TEC2015-68284-R), and by the European Union through the project Trivalent (Grant Agreement no. 740934).
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Sánchez-Rada, J.F., Pascual, A., Conde, E., Iglesias, C.A. (2018). A Big Linked Data Toolkit for Social Media Analysis and Visualization Based on W3C Web Components. In: Panetto, H., Debruyne, C., Proper, H., Ardagna, C., Roman, D., Meersman, R. (eds) On the Move to Meaningful Internet Systems. OTM 2018 Conferences. OTM 2018. Lecture Notes in Computer Science(), vol 11230. Springer, Cham. https://doi.org/10.1007/978-3-030-02671-4_30
DOI: https://doi.org/10.1007/978-3-030-02671-4_30
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-02670-7
Online ISBN: 978-3-030-02671-4
eBook Packages: Computer Science (R0)