A Big Linked Data Toolkit for Social Media Analysis and Visualization Based on W3C Web Components

  • Conference paper
On the Move to Meaningful Internet Systems. OTM 2018 Conferences (OTM 2018)

Abstract

Social media generates a massive amount of data at a very fast pace. Objective information, such as news, and subjective content, such as opinions and emotions, are intertwined and readily available. This data is very appealing from both a research and a commercial point of view, for applications such as public polling or marketing. A complete understanding requires a combined view of information from different sources, which is usually enriched (e.g. through sentiment analysis) and visualized in a dashboard.

In this work, we present a toolkit that tackles these issues on different levels: (1) to extract heterogeneous information, it provides independent data extractors and web scrapers; (2) data processing is done with independent semantic analysis services that are easily deployed; (3) a configurable Big Data orchestrator controls the execution of extraction and processing tasks; (4) the end result is presented in a sensible and interactive format with a modular visualization framework based on Web Components that connects to different sources such as SPARQL and Elasticsearch endpoints. Data workflows can be defined by connecting different extractors and analysis services. The different elements of this toolkit interoperate through an approach based on linked data principles and a set of common ontologies. To illustrate the usefulness of this toolkit, this work describes several use cases in which the toolkit has been successfully applied.
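
As a rough illustration of how a data workflow might connect an extractor to an analysis service, the following minimal sketch chains two plain Python functions and emits a JSON-LD-style document. It is not the toolkit's actual API: the function names (`extract_posts`, `analyze_sentiment`, `run_workflow`), the word lists, the `example.org` identifiers and the `ex:polarity` property are all hypothetical placeholders, and the toolkit's own ontologies are not reproduced here.

```python
import json
from typing import Callable, Dict, Iterable, List

Document = Dict[str, object]

# Hypothetical JSON-LD context: schema.org terms plus a placeholder vocabulary.
CONTEXT = {
    "schema": "http://schema.org/",
    "ex": "http://example.org/vocab#",  # assumption, not one of the paper's ontologies
}

# Toy word lists standing in for a real sentiment analysis service.
POSITIVE = {"good", "great", "love"}
NEGATIVE = {"bad", "awful", "hate"}


def extract_posts(raw_texts: Iterable[str]) -> List[Document]:
    """Toy extractor: wrap raw strings as schema:SocialMediaPosting nodes."""
    return [
        {
            "@id": f"http://example.org/post/{i}",
            "@type": "schema:SocialMediaPosting",
            "schema:articleBody": text,
        }
        for i, text in enumerate(raw_texts)
    ]


def analyze_sentiment(doc: Document) -> Document:
    """Toy analysis service: annotate a node with a word-count-based polarity."""
    words = str(doc["schema:articleBody"]).lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    polarity = "positive" if score > 0 else "negative" if score < 0 else "neutral"
    # Return a new node so each service leaves its input untouched.
    return {**doc, "ex:polarity": polarity}


def run_workflow(
    raw_texts: Iterable[str],
    services: List[Callable[[Document], Document]],
) -> Document:
    """Run the extractor, then apply each analysis service in order."""
    docs = extract_posts(raw_texts)
    for service in services:
        docs = [service(d) for d in docs]
    return {"@context": CONTEXT, "@graph": docs}


if __name__ == "__main__":
    result = run_workflow(["I love this phone", "awful battery"], [analyze_sentiment])
    print(json.dumps(result, indent=2))
```

Because every step consumes and produces the same node shape, further services could be appended to the `services` list without changing the extractor, which is the kind of composition the abstract describes.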

Notes

  1. https://www.polymer-project.org/.

  2. Sentiment140, MeaningCloud and IBM Watson are online sentiment analysis services available at http://www.sentiment140.com/, https://www.meaningcloud.com/ and https://www.ibm.com/watson/services/natural-language-understanding/, respectively.

  3. GSICrawler’s documentation: https://gsicrawler.readthedocs.io.

  4. https://github.com/gsi-upm/gsicrawler.

  5. https://d3js.org/.

  6. Sefarad’s documentation: http://sefarad.readthedocs.io/.

  7. https://github.com/gsi-upm/sefarad.

  8. Soneti’s documentation: https://soneti.readthedocs.io/.

  9. https://reactjs.org/.

Acknowledgements

The authors want to thank Roberto Bermejo, Alejandro Saura, Rubén Díaz and José Carmona for working on previous versions of the toolkit. In addition, we want to thank Marcos Torres, Jorge García-Castaño, Pablo Aramburu, Rodrigo Barbado, Jose Mª Izquierdo, Mario Hernando, Carlos Moreno, Javier Ochoa and Daniel Souto, who have applied the toolkit in different domains. Lastly, we also thank our partners at Taiger and HI-Iberia for using the toolkit and collaborating in the integration of their analysis services with the toolkit as part of project SoMeDi (ITEA3 16011).

This work is supported by the ITEA 3 EUREKA Cluster programme together with the National Spanish Funding Agencies CDTI (INNO-20161089) and MINETAD (TSI-102600-2016-1), the Spanish Ministry of Economy and Competitiveness under the R&D project SEMOLA (TEC2015-68284-R) and by the European Union through the project Trivalent (Grant Agreement no: 740934).

Author information

Corresponding author

Correspondence to J. Fernando Sánchez-Rada.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Sánchez-Rada, J.F., Pascual, A., Conde, E., Iglesias, C.A. (2018). A Big Linked Data Toolkit for Social Media Analysis and Visualization Based on W3C Web Components. In: Panetto, H., Debruyne, C., Proper, H., Ardagna, C., Roman, D., Meersman, R. (eds) On the Move to Meaningful Internet Systems. OTM 2018 Conferences. OTM 2018. Lecture Notes in Computer Science, vol 11230. Springer, Cham. https://doi.org/10.1007/978-3-030-02671-4_30

  • DOI: https://doi.org/10.1007/978-3-030-02671-4_30

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-02670-7

  • Online ISBN: 978-3-030-02671-4

  • eBook Packages: Computer Science (R0)
