GBRTVis: online analysis of gradient boosting regression tree

Abstract

Visualizations of machine learning models have developed rapidly in recent years, attracting great interest from industry and researchers. However, creating visualizations from logged data is a time-consuming process. In this work, we adopt progressive visual analytics to propose a new pipeline that facilitates the visual analysis of gradient boosting regression trees (GBRT). Visualizations such as the tree view, instances view, and cluster view are created in real time according to the different types of data. Users can interactively explore GBRT with different visualization components through GBRTVis. Case studies demonstrate that our pipeline can improve the efficiency of the training process and of understanding the model. Furthermore, we propose a mixed structure of GBRT to improve the model itself. Two tests on different datasets show the effectiveness of the improvement.
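The difference between visualizing from logs written after training and visualizing progressively can be sketched in a few lines. The following is only a minimal illustration under stated assumptions, not the GBRTVis implementation: the names `fit_stump` and `boost_progressively`, the one-dimensional toy data, the depth-one trees, and the learning rate are all hypothetical choices made for brevity. The generator hands back the newest tree and the current training error after every boosting round, so a front end could redraw its views while training is still running.

```python
from typing import Iterator, List, Tuple

Stump = Tuple[float, float, float]  # (threshold, left value, right value)


def fit_stump(xs: List[float], residuals: List[float]) -> Stump:
    """Fit a depth-one regression tree to the residuals by exhaustive search."""
    best = None
    # Every distinct value except the largest is a candidate threshold,
    # which guarantees that both sides of the split are non-empty.
    for t in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left)
        sse += sum((r - right_mean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, left_mean, right_mean)
    _, t, left_mean, right_mean = best
    return t, left_mean, right_mean


def boost_progressively(
    xs: List[float],
    ys: List[float],
    n_trees: int = 20,
    learning_rate: float = 0.5,
) -> Iterator[Tuple[int, Stump, float]]:
    """Yield (round, newest stump, training MSE) after each boosting round."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    for i in range(1, n_trees + 1):
        # For squared loss the negative gradient is the plain residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        t, lv, rv = fit_stump(xs, residuals)
        preds = [
            p + learning_rate * (lv if x <= t else rv)
            for x, p in zip(xs, preds)
        ]
        mse = sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)
        # Intermediate state leaves the training loop here; a progressive
        # front end would update its views at this point.
        yield i, (t, lv, rv), mse


# Toy data (an assumption for illustration): y = x^2 on ten points.
xs = [float(x) for x in range(10)]
ys = [x * x for x in xs]

for round_no, stump, mse in boost_progressively(xs, ys, n_trees=5):
    print(round_no, stump, round(mse, 3))
```

Because the consumer pulls one round at a time, it can also stop early or change parameters between rounds, which is the kind of interaction a log-based pipeline cannot offer until training has finished.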

Acknowledgements

This work was supported by the National Natural Science Foundation of China under Grants No. 61672237, 61802339, and 61802128. In addition, we thank the four anonymous reviewers for their constructive comments, which helped us improve the quality of this manuscript.

Author information

Corresponding author

Correspondence to Changbo Wang.

About this article

Cite this article

Huang, Y., Liu, Y., Li, C. et al. GBRTVis: online analysis of gradient boosting regression tree. J Vis 22, 125–140 (2019). https://doi.org/10.1007/s12650-018-0514-2

  • DOI: https://doi.org/10.1007/s12650-018-0514-2

Keywords

  • Model analysis
  • Online visualization
  • Interaction
  • Mixed structure