Abstract
A wide variety of predictive analytics techniques have been developed in statistics, machine learning and data mining; however, many of these algorithms take a black-box approach in which data is input and predictions are output with no insight into the intermediate process. Such a closed-system approach often leaves little room for injecting domain expertise and can frustrate analysts when results seem spurious or confusing. To allow for more human-centric approaches, the visualization community has begun developing methods that enable users to incorporate expert knowledge into the prediction process at all stages, including data cleaning, feature selection, model building and model validation. This paper surveys current progress and trends in predictive visual analytics, identifies the common framework in which predictive visual analytics systems operate, and develops a summarization of the predictive analytics workflow.
Acknowledgements
This work was supported by the National Basic Research Program of China (973 Program) (2015CB352503), the Major Program of the National Natural Science Foundation of China (61232012), the National Natural Science Foundation of China (Grant Nos. 61303141, 61422211, u1536118, u1536119), the Zhejiang Provincial Natural Science Foundation of China (LR13F020001), the Fundamental Research Funds for the Central Universities, the Innovation Joint Research Center for Cyber-Physical-Society System, and the United States National Science Foundation (1350573).
Additional information
Junhua Lu is currently working toward the PhD degree with the State Key Laboratory of Computer Aided Design and Computer Graphics, Zhejiang University, China. His research interests include visualization and visual analytics.
Wei Chen is currently a professor at the State Key Laboratory of Computer Aided Design and Computer Graphics, Zhejiang University, China. He has published more than 60 papers in international journals and conferences. Prof. Chen has served as a steering committee member of IEEE Pacific Visualization, the conference chair of IEEE Pacific Visualization 2015, and a paper co-chair of IEEE Pacific Visualization 2014. He received an NSFC Excellent Young Scholars Program award in 2014.
Yuxin Ma is currently working toward the PhD degree with the State Key Laboratory of Computer Aided Design and Computer Graphics, Zhejiang University, China. His research interests include visual analytics and visual data mining.
Junming Ke is an undergraduate student at Zhejiang University of Technology, China. He is completing an internship at the State Key Laboratory of Computer Aided Design and Computer Graphics, Zhejiang University, China.
Zongzhuang Li is an undergraduate student at Zhejiang University (ZJU), China. He is working on his graduation proposal at the State Key Laboratory of Computer Aided Design and Computer Graphics, ZJU.
Fan Zhang is currently an associate professor at the Zhejiang University of Technology, China. His research interests include visual analytics and parallel computing.
Ross Maciejewski is an assistant professor in the School of Computing, Informatics & Decision Systems Engineering, Arizona State University, USA. His primary research interests are geographical visualization and visual analytics, with a focus on public health, dietary analysis, social media, and criminal incident reports. He has served on the organizing committee for the IEEE Conference on Visual Analytics Science and Technology (2012-2013, 2015) and the IEEE/VGTC EuroVis Conference (2014-2016) and has been involved in award-winning submissions to the IEEE Visual Analytics Contest (2010, 2013, 2015). He is also a fellow of the Global Security Initiative at ASU and the recipient of an NSF CAREER Award (2014).
Cite this article
Lu, J., Chen, W., Ma, Y. et al. Recent progress and trends in predictive visual analytics. Front. Comput. Sci. 11, 192–207 (2017). https://doi.org/10.1007/s11704-016-6028-y
DOI: https://doi.org/10.1007/s11704-016-6028-y