Journal of Visualization, Volume 22, Issue 1, pp 125–140

GBRTVis: online analysis of gradient boosting regression tree

  • Yifei Huang
  • Yuhua Liu
  • Chenhui Li
  • Changbo Wang (email author)
Regular Paper

Abstract

Visualizations of machine learning models have developed rapidly in recent years, attracting great interest from industry and researchers. However, creating visualizations from logged data is a time-consuming process. In this work, we adopt progressive visual analytics to propose a new pipeline that facilitates the visual analysis of gradient boosting regression tree (GBRT) models. Visualizations such as the tree view, instances view, and cluster view are created in real time according to different types of data. Users can explore GBRT interactively through the different visualization components of GBRTVis. Case studies demonstrate that our pipeline can improve the efficiency of the training process and of understanding the model. Furthermore, we propose a mixed structure for GBRT to improve the model itself. Two tests on different datasets show the effectiveness of the improvement.
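
To make the contrast between log-then-visualize and progressive analysis concrete, the following is a minimal sketch, not the authors' implementation. It assumes a toy gradient boosting regressor built from depth-1 trees (stumps) on one-dimensional data with squared loss, and it yields a snapshot after every boosting round so that a front end could refresh its views while training is still running. The names `fit_stump` and `boost_progressively`, the snapshot fields, and the toy data are all hypothetical.

```python
from typing import Dict, Iterator, List, Tuple

Stump = Tuple[float, float, float]  # (threshold, left_value, right_value)


def fit_stump(xs: List[float], residuals: List[float]) -> Stump:
    """Fit a depth-1 regression tree to the residuals by exhaustive search."""
    best: Stump = (xs[0], 0.0, 0.0)
    best_err = float("inf")
    for threshold in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)  # never empty: threshold is in xs
        right_mean = sum(right) / len(right) if right else 0.0
        err = sum((r - left_mean) ** 2 for r in left)
        err += sum((r - right_mean) ** 2 for r in right)
        if err < best_err:
            best_err, best = err, (threshold, left_mean, right_mean)
    return best


def boost_progressively(
    xs: List[float],
    ys: List[float],
    n_rounds: int = 20,
    learning_rate: float = 0.5,
) -> Iterator[Dict[str, object]]:
    """Yield one snapshot per boosting round instead of logging for later replay."""
    base = sum(ys) / len(ys)  # initial prediction: mean of the targets
    preds = [base] * len(ys)
    for round_index in range(1, n_rounds + 1):
        # For squared loss, the negative gradient is the plain residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        threshold, left_value, right_value = fit_stump(xs, residuals)
        preds = [
            p + learning_rate * (left_value if x <= threshold else right_value)
            for x, p in zip(xs, preds)
        ]
        mse = sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)
        yield {
            "round": round_index,
            "stump": (threshold, left_value, right_value),
            "train_mse": mse,
        }


if __name__ == "__main__":
    # Hypothetical toy data; a real front end would consume the snapshots
    # (for example over a socket) rather than print them.
    toy_xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
    toy_ys = [0.1, 0.9, 2.1, 2.9, 4.2, 5.1]
    for snapshot in boost_progressively(toy_xs, toy_ys, n_rounds=5):
        print(snapshot["round"], snapshot["stump"], snapshot["train_mse"])
```

Using a generator keeps training and presentation loosely coupled: the consumer decides how often to redraw, and training can be stopped early simply by ceasing to iterate.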

Keywords

Model analysis, Online visualization, Interaction, Mixed structure

Notes

Acknowledgements

This work was supported by the National Natural Science Foundation of China under Grants No. 61672237, 61802339, and 61802128. In addition, we thank the four anonymous reviewers for their constructive comments, which helped us improve the quality of this manuscript.

Copyright information

© The Visualization Society of Japan 2018

Authors and Affiliations

  • Yifei Huang (1)
  • Yuhua Liu (2)
  • Chenhui Li (1)
  • Changbo Wang (1, email author)

  1. School of Computer Science and Software Engineering, East China Normal University, Shanghai, China
  2. Zhejiang University of Finance and Economics, Hangzhou, China
