GBRTVis: online analysis of gradient boosting regression tree

Abstract

Visualization of machine learning models has developed rapidly in recent years and has attracted great interest from both industry and researchers. However, a pipeline in which visualizations are created from logged data is time-consuming, because analysis can begin only after the data have been logged. In this work, we adopt progressive visual analytics and propose a new pipeline that facilitates the visual analysis process for gradient boosting regression trees (GBRT). Visualizations such as the tree view, instances view, and cluster view are created in real time according to the different types of data. Through GBRTVis, users can interactively explore a GBRT model with these visualization components. Case studies demonstrate that our pipeline can improve both the efficiency of the training process and the understanding of the model. Furthermore, we propose a mixed structure of GBRT to improve the model itself. Two tests on different datasets show the effectiveness of this improvement.
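The contrast between a log-based pipeline and a progressive one can be illustrated with a minimal sketch. It is not the GBRTVis implementation: it assumes squared-error loss and depth-1 trees, and the names `Stump`, `fit_stump`, and `train_progressively` are hypothetical. The training loop is written as a generator, so a consumer (for example, a tree view or an error chart) can render each boosting round as soon as it is produced instead of waiting for a complete log.

```python
from dataclasses import dataclass
from typing import Iterator, List, Sequence, Tuple


@dataclass
class Stump:
    """Depth-1 regression tree on one feature (hypothetical helper)."""
    feature: int
    threshold: float
    left_value: float
    right_value: float

    def predict(self, row: Sequence[float]) -> float:
        if row[self.feature] <= self.threshold:
            return self.left_value
        return self.right_value


def fit_stump(X: List[Sequence[float]], residuals: List[float]) -> Stump:
    """Return the single split that minimizes squared error on the residuals."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            lm = sum(left) / len(left)
            rm = sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, Stump(j, t, lm, rm))
    if best is None:
        # Every feature is constant: fall back to predicting the mean residual.
        mean = sum(residuals) / len(residuals)
        return Stump(0, float("inf"), mean, mean)
    return best[1]


def train_progressively(
    X: List[Sequence[float]],
    y: List[float],
    n_rounds: int = 20,
    learning_rate: float = 0.1,
) -> Iterator[Tuple[int, Stump, float]]:
    """Yield (round, new stump, training MSE) after every boosting round."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    for m in range(1, n_rounds + 1):
        # For squared-error loss, the residuals are the negative gradient.
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(X, residuals)
        pred = [pi + learning_rate * stump.predict(row) for row, pi in zip(X, pred)]
        mse = sum((yi - pi) ** 2 for yi, pi in zip(y, pred)) / len(y)
        yield m, stump, mse


if __name__ == "__main__":
    # Toy data: a step function of one feature.
    X = [[float(i)] for i in range(10)]
    y = [0.0] * 5 + [1.0] * 5
    for m, stump, mse in train_progressively(X, y, n_rounds=5, learning_rate=0.5):
        # A real front end would update its views here instead of printing.
        print(f"round {m}: split x[{stump.feature}] <= {stump.threshold}, mse={mse:.6f}")
```

Because each round is yielded as soon as it finishes, the consumer can also stop iterating early, which is one way a user could interrupt an unpromising training run.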

Acknowledgements

This work was supported by the National Natural Science Foundation of China under Grants No. 61672237, 61802339, and 61802128. In addition, we thank the four anonymous reviewers for their constructive comments, which helped us improve the quality of this manuscript.

Author information

Corresponding author

Correspondence to Changbo Wang.

About this article

Cite this article

Huang, Y., Liu, Y., Li, C. et al. GBRTVis: online analysis of gradient boosting regression tree. J Vis 22, 125–140 (2019). https://doi.org/10.1007/s12650-018-0514-2

Keywords

  • Model analysis
  • Online visualization
  • Interaction
  • Mixed structure