Panel Data Analysis Via Variable Selection and Subject Clustering

Data Mining for Service

Part of the book series: Studies in Big Data ((SBD,volume 3))

Abstract

A panel data set contains observations on multiple phenomena over multiple time periods for the same subjects (e.g., firms or individuals). Panel data sets appear frequently in marketing, economics, and many other social sciences. An important task in panel data analysis is to analyze and predict a variable of interest. Because the number of records collected for each subject in the social sciences is usually too small to support accurate and reliable analysis, a common solution is to pool all subjects together and run a linear regression in an attempt to discover the underlying relationship between the variable of interest and the other observed variables. However, this method suffers from two limitations. First, subjects might not be poolable because of their heterogeneous nature. Second, not every variable has a significant relationship to the variable of interest, and a regression on many irrelevant regressors can lead to poor predictions. To address these two issues, we propose a novel approach, called Selecting and Clustering, which derives the underlying linear models by first selecting variables highly correlated with the variable of interest and then clustering subjects into homogeneous groups that share the same linear model with respect to those variables. Furthermore, we build an optimization model to formulate this problem, the solution of which enables one to select variables and cluster subjects simultaneously. Because of the combinatorial nature of the problem, we propose an effective and efficient algorithm. Studies on real data sets validate the effectiveness of our approach, which performs significantly better than existing approaches.
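
The two-stage idea in the abstract (select variables, then group subjects that share a linear model) can be illustrated with a minimal sketch. This is not the chapter's optimization model or algorithm, which the abstract does not spell out. It assumes a simple pooled-correlation filter for selection and a Späth-style alternating clusterwise regression for grouping. All function names, the synthetic panel, and the parameters `k`, `n_clusters`, `n_iter`, and `seed` are hypothetical and chosen for illustration only.

```python
import numpy as np


def select_variables(X, y, k):
    """Indices of the k columns of X with the largest pooled |Pearson correlation| with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    safe = np.where(denom > 0, denom, 1.0)  # avoid division by zero for constant columns
    corr = np.where(denom > 0, np.abs(Xc.T @ yc) / safe, 0.0)
    return np.sort(np.argsort(corr)[-k:])


def _design(X):
    # Prepend an intercept column.
    return np.column_stack([np.ones(len(X)), X])


def fit_ols(X, y):
    beta, *_ = np.linalg.lstsq(_design(X), y, rcond=None)
    return beta


def sse(X, y, beta):
    resid = y - _design(X) @ beta
    return float(resid @ resid)


def _fit_clusters(X, y, subject, ids, labels, n_clusters):
    betas = []
    for c in range(n_clusters):
        mask = np.isin(subject, ids[labels == c])
        if not mask.any():
            mask[:] = True  # assumption: an empty cluster falls back to the pooled fit
        betas.append(fit_ols(X[mask], y[mask]))
    return betas


def cluster_subjects(X, y, subject, n_clusters, n_iter=50, seed=0):
    """Alternate between fitting one OLS model per cluster and moving each
    subject (with all of its records) to the cluster whose model fits it best."""
    rng = np.random.default_rng(seed)
    ids = np.unique(subject)
    # Balanced random start so that no cluster is empty initially.
    labels = rng.permutation(np.arange(len(ids)) % n_clusters)
    for _ in range(n_iter):
        betas = _fit_clusters(X, y, subject, ids, labels, n_clusters)
        new_labels = np.array([
            int(np.argmin([sse(X[subject == s], y[subject == s], b) for b in betas]))
            for s in ids
        ])
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    betas = _fit_clusters(X, y, subject, ids, labels, n_clusters)
    return dict(zip(ids.tolist(), labels.tolist())), betas


def make_demo_panel(seed=1):
    """Synthetic panel (hypothetical): 6 subjects, 30 periods, 5 candidate variables.
    Only variables 0 and 2 matter; subjects 0-2 and 3-5 follow different models."""
    rng = np.random.default_rng(seed)
    n_subjects, n_periods, n_vars = 6, 30, 5
    subject = np.repeat(np.arange(n_subjects), n_periods)
    X = rng.normal(size=(n_subjects * n_periods, n_vars))
    coef_a = np.array([3.0, 0.0, 0.5, 0.0, 0.0])
    coef_b = np.array([1.0, 0.0, 2.5, 0.0, 0.0])
    coefs = np.where((subject < 3)[:, None], coef_a, coef_b)
    y = (X * coefs).sum(axis=1) + rng.normal(scale=0.1, size=len(subject))
    return X, y, subject
```

A typical call would be `cols = select_variables(X, y, k=2)` followed by `labels, betas = cluster_subjects(X[:, cols], y, subject, n_clusters=2)`. One caveat of this sequential sketch is that a pooled correlation filter can miss a variable whose effect has opposite signs in different groups, which is one motivation for selecting variables and clustering subjects jointly rather than in separate passes.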

Notes

  1. http://people.stern.nyu.edu/wgreene/Econometrics/PanelDataSets.htm

  2. http://pages.stern.nyu.edu/~wgreene/Econometrics/PanelDataSets.htm

  3. http://pages.stern.nyu.edu/~wgreene/Econometrics/PanelDataSets.htm

Corresponding author

Correspondence to Haibing Lu.

Copyright information

© 2014 Springer-Verlag Berlin Heidelberg

Cite this chapter

Lu, H., Huang, S., Li, Y., Yang, Y. (2014). Panel Data Analysis Via Variable Selection and Subject Clustering. In: Yada, K. (eds) Data Mining for Service. Studies in Big Data, vol 3. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-45252-9_5

  • DOI: https://doi.org/10.1007/978-3-642-45252-9_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-45251-2

  • Online ISBN: 978-3-642-45252-9

  • eBook Packages: Engineering, Engineering (R0)
