Lumina: an adaptive, automated and extensible prototype for exploring, enriching and visualizing data

Abstract

Given a tabular dataset that should be graphically represented, how could the current complex visualization pipeline be improved? Could we produce a more visually enriched final representation while minimizing user intervention? Most existing approaches lack the capacity to provide a simplified end-to-end solution and leave the intricate process of setting up data connections to the user. Their results depend mainly on user actions at every step of the visualization pipeline, and they fail to consider the structural properties of the data and the constantly rising volume of open and linked data. This work is motivated by the need for a flexible framework that improves user experience and interaction by simplifying the process and enhancing the result, capitalizing on the enrichment of the final visualization through semantic analysis of linked data. We propose Lumina, a visualization framework which: (a) builds on structural data analytics and semantic analysis principles, (b) increases the explainability and expressiveness of the visualization by leveraging open data and semantic enrichment, (c) minimizes user interventions at every step of the visualization pipeline, and (d) fulfills the growing need for open-source, modular and self-hosted solutions. Using publicly available real-world datasets, we validate the adaptability of Lumina and demonstrate the effectiveness and practicality of our method in comparison to other open-source solutions.

Acknowledgements

This research has been co-financed by the European Union and Greek national funds through the Operational Program Competitiveness, Entrepreneurship and Innovation, under the call RESEARCH CREATE INNOVATE (Project Code: T1EDK-03052), as well as by the H2020 Research and Innovation Programme under Grant Agreement No. 780121.

Author information

Corresponding author

Correspondence to Ilias Dimitriadis.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A Semantic concepts

RDF (Resource Description Framework) is a W3C specification that is heavily used for knowledge management in web applications; it provides models to describe knowledge as web resources and the relationships between them. It is designed primarily to be read by computers connected to the World Wide Web and was originally serialized in XML (RDF/XML). RDF is used to give semantic meaning to the information available on the web: it describes resources (web resources identified by URIs) using properties (also identified by URIs) and property values (other resources or literal values). RDF statements take the form of triples: subject, predicate, object. Such triples are kept in special datastores called RDF stores or triple stores and form the basis for describing a larger knowledge graph of semantically connected resources.

Such knowledge graphs are maintained in online projects such as DBpedia and Wikidata, which hold open knowledge graphs of structured, semantically meaningful information drawn mainly from Wikimedia projects, chiefly Wikipedia. Both provide public API endpoints for issuing semantic queries to the knowledge graph, and both also present the structured information about each concept as HTML pages. Knowledge graphs based on RDF triples can be queried with SPARQL, a query language for the semantic web; public knowledge graph projects such as these expose HTTP APIs that accept SPARQL queries. SPARQL queries can retrieve values from structured or semi-structured data and assist exploration by querying the relationships between resources.
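To make the triple model concrete, here is a minimal sketch of an in-memory triple store in plain Python; it is an illustration, not Lumina's implementation. It assumes CURIE-style strings (such as `dbo:Person`) in place of full URIs, treats `None` as a wildcard when matching patterns, and follows `rdfs:subClassOf` links upwards. The class and function names are hypothetical, and the `dbr:Woman` triple only mirrors the DBpedia example discussed next rather than a statement copied from DBpedia.

```python
from typing import Iterator, Optional

# (subject, predicate, object); CURIE strings such as "dbo:Person" stand in for full URIs.
Triple = tuple[str, str, str]


class TripleStore:
    """Toy in-memory triple store (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        self.triples: set[Triple] = set()

    def add(self, s: str, p: str, o: str) -> None:
        self.triples.add((s, p, o))

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> Iterator[Triple]:
        """Yield triples matching the pattern in sorted order; None acts as a wildcard."""
        pattern = (s, p, o)
        for triple in sorted(self.triples):
            if all(want is None or want == got for want, got in zip(pattern, triple)):
                yield triple

    def superclasses(self, cls: str) -> list[str]:
        """Follow rdfs:subClassOf upwards (first parent only) until the root is reached."""
        chain: list[str] = []
        seen = {cls}
        current = cls
        while True:
            parents = [o for _, _, o in self.match(current, "rdfs:subClassOf")]
            if not parents or parents[0] in seen:
                return chain
            current = parents[0]
            seen.add(current)
            chain.append(current)


def build_example_store() -> TripleStore:
    store = TripleStore()
    store.add("dbr:Woman", "rdf:type", "dbo:Person")  # mirrors the example in the text
    store.add("dbo:Person", "rdfs:subClassOf", "dbo:Agent")
    store.add("dbo:Agent", "rdfs:subClassOf", "owl:Thing")
    return store


if __name__ == "__main__":
    store = build_example_store()
    print(list(store.match(p="rdf:type")))   # [('dbr:Woman', 'rdf:type', 'dbo:Person')]
    print(store.superclasses("dbo:Person"))  # ['dbo:Agent', 'owl:Thing']
```

A real triple store would index triples by each position instead of scanning them all; the linear scan keeps the pattern-matching idea visible.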

For example, in DBpedia, concepts are entities of a specific ontology type: Woman is an entity of type Person (see Fig. 15). Person is a subclass of Agent, and Agent is a subclass of owl:Thing, the root of the ontology. By issuing a specific SPARQL query to the DBpedia API, we can retrieve all top-level classes, i.e. the direct subclasses of owl:Thing, thus compiling a list of DBpedia's core concepts.

Fig. 15 DBpedia HTML page with semantic information for the concept of Woman
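The top-level-class query for DBpedia could look roughly like the sketch below. It is an assumption-laden illustration rather than the query Lumina issues: it targets DBpedia's public endpoint at `https://dbpedia.org/sparql`, restricts results to the `http://dbpedia.org/ontology/` namespace, and requests the standard SPARQL 1.1 JSON results format through a `format` parameter. The `fetch` helper performs a real HTTP request and is therefore never called here; the sample response in the `__main__` block is hand-written, not real output.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"

# Direct subclasses of owl:Thing, restricted to the DBpedia ontology namespace.
TOP_LEVEL_CLASSES_QUERY = """
PREFIX owl:  <http://www.w3.org/2002/07/owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?cls WHERE {
  ?cls rdfs:subClassOf owl:Thing .
  FILTER(STRSTARTS(STR(?cls), "http://dbpedia.org/ontology/"))
}
"""


def build_request_url(endpoint: str, query: str) -> str:
    """Encode a SPARQL query as a GET request that asks for JSON results."""
    params = {"query": query, "format": "application/sparql-results+json"}
    return f"{endpoint}?{urlencode(params)}"


def extract_values(results: dict, var: str) -> list[str]:
    """Pull one variable's values out of a SPARQL 1.1 JSON results document."""
    return [b[var]["value"] for b in results["results"]["bindings"] if var in b]


def fetch(endpoint: str, query: str) -> dict:
    """Send the query over HTTP (needs network access; not called in this sketch)."""
    req = Request(build_request_url(endpoint, query),
                  headers={"Accept": "application/sparql-results+json"})
    with urlopen(req, timeout=30) as resp:
        return json.load(resp)


if __name__ == "__main__":
    print(build_request_url(DBPEDIA_ENDPOINT, TOP_LEVEL_CLASSES_QUERY))
    # Hand-written response in the SPARQL JSON results shape, for illustration only.
    sample = {"head": {"vars": ["cls"]},
              "results": {"bindings": [
                  {"cls": {"type": "uri", "value": "http://dbpedia.org/ontology/Agent"}},
                  {"cls": {"type": "uri", "value": "http://dbpedia.org/ontology/Place"}}]}}
    print(extract_values(sample, "cls"))
    # ['http://dbpedia.org/ontology/Agent', 'http://dbpedia.org/ontology/Place']
```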

In Wikidata, each item represents a semantic topic directly identifiable by an ID prefixed with Q. For example, the item with ID Q1 maps to the semantic concept of Universe, as shown in Fig. 16.

Fig. 16 Wikidata page describing the item Q1, which maps to the semantic concept of Universe

The root class of Wikidata items is a special Entity item with the ID Q35120. By issuing a specific SPARQL query, we can retrieve all direct subclasses of the Entity item.
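A comparable sketch for Wikidata, again hypothetical: it builds a query for items whose `wdt:P279` ("subclass of") value is `wd:Q35120`, intended for the Wikidata Query Service at `https://query.wikidata.org/sparql`, where the `wd:`, `wdt:`, `wikibase:` and `bd:` prefixes are predefined. The helpers also validate Q-prefixed item IDs and extract them from entity URIs of the form `http://www.wikidata.org/entity/Q1`. No request is sent; the function names are illustrative.

```python
import re
from typing import Optional
from urllib.parse import urlencode

WIKIDATA_ENDPOINT = "https://query.wikidata.org/sparql"
ITEM_ID = re.compile(r"Q[1-9][0-9]*")  # "Q" followed by a positive integer
ENTITY_URI = re.compile(r"https?://www\.wikidata\.org/entity/(Q[1-9][0-9]*)")


def subclasses_query(item_id: str) -> str:
    """Direct subclasses of an item via wdt:P279 ("subclass of"), with English labels."""
    if not ITEM_ID.fullmatch(item_id):
        raise ValueError(f"not a Wikidata item ID: {item_id!r}")
    return (
        "SELECT ?item ?itemLabel WHERE {\n"
        f"  ?item wdt:P279 wd:{item_id} .\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }\n'
        "}"
    )


def item_id_from_uri(uri: str) -> Optional[str]:
    """Extract "Q1" from "http://www.wikidata.org/entity/Q1"; None for other URIs."""
    m = ENTITY_URI.fullmatch(uri)
    return m.group(1) if m else None


def build_request_url(query: str) -> str:
    """GET URL asking the query service for JSON results."""
    return f"{WIKIDATA_ENDPOINT}?{urlencode({'query': query, 'format': 'json'})}"


if __name__ == "__main__":
    query = subclasses_query("Q35120")  # the Entity root item
    print(query)
    print(build_request_url(query))
    print(item_id_from_uri("http://www.wikidata.org/entity/Q1"))  # Q1
```

If the request were actually sent, Wikimedia's policy asks clients to identify themselves with a descriptive User-Agent header.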

Semantic analysis also involves a large amount of text and name analysis on raw data, especially of the meanings behind words, for example using the WordNet lexical database. The Python ecosystem offers a powerful framework for natural language processing that is widely used in industry and academia: NLTK, the Natural Language Toolkit (https://www.nltk.org/). It provides easy-to-use interfaces to over 50 corpora and lexical resources such as WordNet, along with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing and semantic reasoning, and wrappers for industrial-strength NLP libraries.

WordNet is a lexical database for the English language, which groups English words into synsets (sets of synonyms) and also provides part-of-speech information for each word (noun, verb, adjective, adverb). Words are linked by semantic relationships, such as hypernymy and hyponymy, and form semantic hierarchies. A hypernym of a word is a more general term that the word belongs to: for example, worker is a hypernym of skilled worker, and person is a hypernym of worker. A hyponym of a word is a more specific term whose meaning is included in the meaning of the original word; for example, soccer ball is a hyponym of ball. The WordNet corpus is available through NLTK. Programmatically exploring the hypernym closures of arbitrary nouns shows that they all share the root hypernym entity.n.01. WordNet synsets are named with the string pattern <lemma>.<part-of-speech letter>.<sense number>; the synset entity.n.01, for instance, refers to the first sense of the word entity as a noun.
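The synset naming pattern and the hypernym closure can be sketched without NLTK, using a hand-written, single-path hypernym map modelled on WordNet (real WordNet paths include additional branches, so treat the map as illustrative data only). With NLTK installed and the WordNet corpus downloaded, `wordnet.synset(...).closure(lambda s: s.hypernyms())` and `Synset.root_hypernyms()` play the roles of the two helpers below; the helper names themselves are hypothetical.

```python
from typing import NamedTuple

POS_TAGS = {"n", "v", "a", "s", "r"}  # noun, verb, adjective, adjective satellite, adverb


class SynsetName(NamedTuple):
    lemma: str
    pos: str
    sense: int


def parse_synset_name(name: str) -> SynsetName:
    """Split '<lemma>.<pos>.<nn>'; rsplit keeps any dots that belong to the lemma."""
    lemma, pos, sense = name.rsplit(".", 2)
    if pos not in POS_TAGS:
        raise ValueError(f"unknown part of speech in {name!r}")
    return SynsetName(lemma, pos, int(sense))


# Hand-written, single-path hypernym links modelled on WordNet (illustrative only).
HYPERNYMS: dict[str, list[str]] = {
    "skilled_worker.n.01": ["worker.n.01"],
    "worker.n.01": ["person.n.01"],
    "person.n.01": ["organism.n.01"],
    "organism.n.01": ["living_thing.n.01"],
    "living_thing.n.01": ["whole.n.02"],
    "whole.n.02": ["object.n.01"],
    "object.n.01": ["physical_entity.n.01"],
    "physical_entity.n.01": ["entity.n.01"],
}


def hypernym_closure(synset: str, links: dict[str, list[str]]) -> list[str]:
    """All transitive hypernyms, breadth-first, each listed once."""
    closure: list[str] = []
    frontier = list(links.get(synset, []))
    while frontier:
        current = frontier.pop(0)
        if current not in closure:
            closure.append(current)
            frontier.extend(links.get(current, []))
    return closure


def root_hypernyms(synset: str, links: dict[str, list[str]]) -> list[str]:
    """Hypernyms with no hypernyms of their own; a root synset is its own root."""
    candidates = hypernym_closure(synset, links) or [synset]
    return [s for s in candidates if not links.get(s)]


if __name__ == "__main__":
    print(parse_synset_name("entity.n.01"))  # SynsetName(lemma='entity', pos='n', sense=1)
    print(hypernym_closure("skilled_worker.n.01", HYPERNYMS))
    print(root_hypernyms("skilled_worker.n.01", HYPERNYMS))  # ['entity.n.01']
```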

Appendix B Semantic enrichment, symbol and color retrieval

The Semantic Analysis phase produces a data model that maintains a dictionary of all the recognized semantic concepts and named entities. The named entities recognized in the semantic analysis process are forwarded to the Named Entity Analysis process, which further recognizes the individual named entity types and tries to gather linked data from external data sources such as DBpedia, Wikidata, the Twitter API and geolocation services (see Fig. 17).

Fig. 17 High-level system design view
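The hand-off from the Semantic Analysis phase to the Named Entity Analysis process can be pictured with the following sketch. Everything in it is hypothetical: the `SemanticModel` fields, the entity type names and the three lookup functions are placeholders that return stub dictionaries instead of calling DBpedia, Wikidata, the Twitter API or a geolocation service. What it illustrates is routing each named entity to the sources registered for its type, with a knowledge-graph lookup as the default.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SemanticModel:
    """Hypothetical output of the semantic analysis phase."""
    concepts: dict[str, list[str]] = field(default_factory=dict)       # column -> concepts
    named_entities: dict[str, str] = field(default_factory=dict)       # entity -> type
    linked_data: dict[str, dict[str, dict]] = field(default_factory=dict)  # entity -> source -> facts


# Stub lookups standing in for knowledge-graph, geolocation and social-media calls.
def lookup_knowledge_graph(name: str) -> dict:
    return {"source": "knowledge_graph", "label": name}


def lookup_geolocation(name: str) -> dict:
    return {"source": "geolocation", "label": name}


def lookup_social(name: str) -> dict:
    return {"source": "social", "label": name}


Lookup = Callable[[str], dict]
SOURCES_BY_TYPE: dict[str, list[Lookup]] = {
    "Place": [lookup_knowledge_graph, lookup_geolocation],
    "Person": [lookup_knowledge_graph, lookup_social],
    "Organisation": [lookup_knowledge_graph, lookup_social],
}


def analyse_named_entities(model: SemanticModel) -> SemanticModel:
    """Query the sources registered for each entity's type and store results by source."""
    for entity, entity_type in model.named_entities.items():
        lookups = SOURCES_BY_TYPE.get(entity_type, [lookup_knowledge_graph])
        model.linked_data[entity] = {r["source"]: r for r in (f(entity) for f in lookups)}
    return model


if __name__ == "__main__":
    model = SemanticModel(
        concepts={"country": ["location"]},
        named_entities={"Greece": "Place", "Unknown Co": "Thing"},
    )
    print(analyse_named_entities(model).linked_data)
```

Keeping the type-to-source table as data means a new external source can be registered without touching the routing loop.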

The next step is to feed all recognized semantic terms to the back-end library and retrieve relevant symbol assets (icons) and colors, which are stored as reference objects in the internal data model. These symbol and color references are then ready to be used in the visualization process.

A great source of generic vector representations of everyday concepts is the highly popular FontAwesome library, which includes about 1,500 icons in SVG format with complete search metadata. By analyzing the FontAwesome search metadata files, a list of semantic topics was extracted and used to compute the most popular hypernyms in WordNet, ranked by number of occurrences (top 100). These top elements were used as a set covering the basic semantic concepts to which words can be reduced through hypernym closures. This ensures that relevant visualizations can be retrieved for most terms encountered in the data.

The Visualization service maintains a semantic library of symbols represented by vector images. Each symbol is mapped to a core semantic concept and maintains metadata with other related semantic terms. For example, the specific semantic concept of hospital can be represented by a vector symbol stored in a hospital.svg file, which also carries references to the semantic terms building, medical, health and service (see Fig. 18).

Fig. 18 Symbol for the semantic concept of hospital
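Both steps in the symbol library can be sketched under stated assumptions. `top_hypernyms` ranks hypernyms by how many metadata terms they cover (the text only says occurrences were counted, so counting each hypernym once per term is an assumption), and `SymbolLibrary.resolve` looks a term up by core concept, then by related-term metadata, then by walking a supplied list of hypernyms. The `Symbol` type, this lookup order and the `person.svg` asset are hypothetical; `hospital.svg` and its related terms come from the example in the text.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Symbol:
    concept: str                      # core semantic concept, e.g. "hospital"
    asset: str                        # vector file, e.g. "hospital.svg"
    related: frozenset = frozenset()  # other semantic terms kept as metadata


def top_hypernyms(term_hypernyms: dict[str, Sequence[str]], n: int = 100) -> list[str]:
    """Hypernyms ranked by the number of metadata terms they cover."""
    counts = Counter(h for hyps in term_hypernyms.values() for h in set(hyps))
    return [h for h, _ in counts.most_common(n)]


class SymbolLibrary:
    def __init__(self, symbols: Sequence[Symbol]) -> None:
        self.symbols = list(symbols)
        self.by_concept = {s.concept: s for s in self.symbols}

    def resolve(self, term: str, hypernyms: Sequence[str] = ()) -> Optional[Symbol]:
        """Exact concept, then related-term metadata, then the term's hypernyms in order."""
        if term in self.by_concept:
            return self.by_concept[term]
        for symbol in self.symbols:
            if term in symbol.related:
                return symbol
        for hypernym in hypernyms:
            if hypernym in self.by_concept:
                return self.by_concept[hypernym]
        return None


if __name__ == "__main__":
    library = SymbolLibrary([
        Symbol("hospital", "hospital.svg",
               frozenset({"building", "medical", "health", "service"})),
        Symbol("person", "person.svg"),  # hypothetical asset
    ])
    print(library.resolve("medical").asset)                      # hospital.svg
    print(library.resolve("nurse", ["worker", "person"]).asset)  # person.svg
```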

Another set of useful vector representations added to the vector library were the flags of all countries, along with semantic metadata for country codes, telephone prefixes and coordinates. Representations of the ISO 7001 public information symbols were also added, as well as a specialized list of emoji representations (according to the Unicode standard) that were linked semantically with sentiment concepts.

Lumina includes the Visual service API, which handles color management. It supports a wide list of known color names mapped to HEX, RGB and HSL values; the list contains a couple of thousand colors drawn from specific domains such as the X11 color names and the HTML4 specification. A second list maps semantic terms to color names or color values. The Visual service API provides calls for generating color palettes from arrays of semantic terms (see Fig. 19).

Fig. 19 API blueprint for requesting colors by terms
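Color lookup and palette generation might be sketched as follows. Only a handful of HTML4 named colors are included (with their standard hex values), the semantic-term-to-color mappings are invented for illustration, and HSL is derived with Python's `colorsys` module, whose `rgb_to_hls` returns hue, lightness and saturation in that order. Names such as `palette_for_terms` are hypothetical and do not describe the Visual service API shown in Fig. 19.

```python
import colorsys

# A small subset of the HTML4 named colors; the text describes a list of thousands.
NAMED_COLORS = {
    "black": "#000000", "white": "#FFFFFF", "red": "#FF0000", "green": "#008000",
    "blue": "#0000FF", "yellow": "#FFFF00", "navy": "#000080", "gray": "#808080",
}

# Hypothetical semantic-term -> color-name mappings, for illustration only.
SEMANTIC_COLORS = {"water": "navy", "forest": "green", "danger": "red", "sun": "yellow"}


def hex_to_rgb(hex_code: str) -> tuple[int, int, int]:
    h = hex_code.lstrip("#")
    return int(h[0:2], 16), int(h[2:4], 16), int(h[4:6], 16)


def rgb_to_hsl(rgb: tuple[int, int, int]) -> tuple[int, int, int]:
    """HSL as (degrees, percent, percent); colorsys works with H, L, S in [0, 1]."""
    r, g, b = (c / 255 for c in rgb)
    hue, light, sat = colorsys.rgb_to_hls(r, g, b)
    return round(hue * 360) % 360, round(sat * 100), round(light * 100)


def palette_for_terms(terms: list[str], fallback: str = "gray") -> list[dict]:
    """One entry per term: the term as a color name, its mapped color, or the fallback."""
    palette = []
    for term in terms:
        key = term.lower()
        name = key if key in NAMED_COLORS else SEMANTIC_COLORS.get(key, fallback)
        hex_code = NAMED_COLORS[name]
        rgb = hex_to_rgb(hex_code)
        palette.append({"term": term, "name": name, "hex": hex_code,
                        "rgb": rgb, "hsl": rgb_to_hsl(rgb)})
    return palette


if __name__ == "__main__":
    for entry in palette_for_terms(["Water", "forest", "red", "galaxy"]):
        print(entry)
```

Falling back to a neutral color keeps the palette the same length as the input terms, so each term still receives a color slot even when no semantic mapping exists.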

About this article

Cite this article

Kagkelidis, K., Dimitriadis, I. & Vakali, A. Lumina: an adaptive, automated and extensible prototype for exploring, enriching and visualizing data. J Vis (2021). https://doi.org/10.1007/s12650-020-00718-y

Keywords

  • Information visualization
  • Semantics
  • Automatic visualization
  • Open source visualization software