Panel Data Analysis Via Variable Selection and Subject Clustering

Lu, Haibing; Huang, Shengsheng; Li, Yingjiu; Yang, Yanjiang

doi:10.1007/978-3-642-45252-9_5

Part of the book series: Studies in Big Data (SBD, volume 3)

Abstract

A panel data set contains observations on multiple phenomena observed over multiple time periods for the same subjects (e.g., firms or individuals). Panel data sets frequently appear in marketing, economics, and many other social sciences. An important panel data analysis task is to analyze and predict a variable of interest. Because the number of records collected for each subject in the social sciences is usually too small to support accurate and reliable analysis, a common solution is to pool all subjects together and then run a linear regression in an attempt to discover the underlying relationship between the variable of interest and the other observed variables. However, this method suffers from two limitations. First, subjects might not be poolable due to their heterogeneous nature. Second, not all variables necessarily have a significant relationship to the variable of interest, and a regression on many irrelevant regressors can lead to wrong predictions. To address these two issues, we propose a novel approach, called Selecting and Clustering, which derives the underlying linear models by first selecting variables highly correlated with the variable of interest and then clustering subjects into homogeneous groups that share the same linear model with respect to those variables. Furthermore, we build an optimization model to formulate this problem, the solution of which enables one to select variables and cluster subjects simultaneously. Due to the combinatorial nature of the problem, an effective and efficient algorithm is proposed. Studies on real data sets validate the effectiveness of our approach, which performs significantly better than existing approaches.
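
The two stages named in the abstract (selecting variables, then clustering subjects by linear model) can be pictured with a small sketch. This is not the authors' algorithm: the abstract does not spell out their optimization model, which handles selection and clustering jointly. The code below is a plain two-stage heuristic under stated assumptions: variables are ranked by absolute Pearson correlation on the pooled data, subjects are grouped by an alternating "refit, then reassign" loop in the spirit of clusterwise linear regression, there is no intercept, and the data are synthetic. All function names, parameters, and the data-generating setup are hypothetical.

```python
import numpy as np


def select_variables(X, y, k):
    """Indices of the k columns of X with the largest |Pearson correlation| with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    corr = (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    return np.sort(np.argsort(-np.abs(corr))[:k])


def fit_ols(X, y):
    """Least-squares coefficients (no intercept, an assumption of this sketch)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def cluster_subjects(X, y, subject, n_clusters, n_iter=50, seed=0):
    """Alternate between fitting one linear model per cluster and moving
    each subject to the cluster whose model fits its records best."""
    rng = np.random.default_rng(seed)
    subjects = np.unique(subject)
    labels = rng.integers(n_clusters, size=len(subjects))  # random start
    coefs = np.zeros((n_clusters, X.shape[1]))
    for _ in range(n_iter):
        # Refit: one regression per non-empty cluster.
        for c in range(n_clusters):
            rows = np.isin(subject, subjects[labels == c])
            if rows.any():
                coefs[c] = fit_ols(X[rows], y[rows])
        # Reassign: squared error of every subject under every cluster model.
        cost = np.array([
            [np.sum((y[subject == s] - X[subject == s] @ coefs[c]) ** 2)
             for c in range(n_clusters)]
            for s in subjects
        ])
        new_labels = cost.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    sse = cost[np.arange(len(subjects)), labels].sum()
    return labels, coefs, sse


# Synthetic panel (hypothetical): 30 subjects, 10 periods, 6 observed variables.
# Only variables 0 and 1 matter; subjects fall into two groups with different slopes.
rng = np.random.default_rng(42)
n_subjects, n_periods, n_vars = 30, 10, 6
subject = np.repeat(np.arange(n_subjects), n_periods)
X = rng.normal(size=(n_subjects * n_periods, n_vars))
true_group = np.arange(n_subjects) % 2
betas = np.array([[2.0, 1.0], [1.0, 2.0]])
y = np.sum(X[:, :2] * betas[true_group[subject]], axis=1)
y = y + 0.1 * rng.normal(size=len(subject))

selected = select_variables(X, y, k=2)
Xs = X[:, selected]
pooled_sse = np.sum((y - Xs @ fit_ols(Xs, y)) ** 2)
labels, coefs, clustered_sse = cluster_subjects(Xs, y, subject, n_clusters=2)

print("selected variables:", selected.tolist())
print("pooled SSE:", pooled_sse)
print("clustered SSE:", clustered_sse)
```

Comparing the pooled and clustered sums of squared errors illustrates the poolability concern raised in the abstract: when subjects follow different linear models, a single pooled regression cannot fit both groups well. The loop can stop in a local optimum, so in practice several random starts would likely be needed.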

Author information

Authors and Affiliations

Santa Clara University, Santa Clara, United States
Haibing Lu
University of Houston—Victoria, Victoria, United States
Shengsheng Huang
Singapore Management University, Singapore, Singapore
Yingjiu Li
Institute for Infocomm Research, Singapore, Singapore
Yanjiang Yang

Corresponding author

Correspondence to Haibing Lu.

Editor information

Editors and Affiliations

Faculty of Commerce, Kansai University, Osaka, Japan
Katsutoshi Yada

About this chapter

Cite this chapter

Lu, H., Huang, S., Li, Y., Yang, Y. (2014). Panel Data Analysis Via Variable Selection and Subject Clustering. In: Yada, K. (eds) Data Mining for Service. Studies in Big Data, vol 3. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-45252-9_5

DOI: https://doi.org/10.1007/978-3-642-45252-9_5
Published: 04 January 2014
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-45251-2
Online ISBN: 978-3-642-45252-9
eBook Packages: Engineering, Engineering (R0)