Ten challenges in modeling bibliographic data for bibliometric analysis

Ferrara, Alfio; Salini, Silvia

doi:10.1007/s11192-012-0810-x

Published: 15 July 2012

Volume 93, pages 765–785 (2012)

Scientometrics


Abstract

The complexity and variety of bibliographic data are growing, and efforts to define new methodologies and techniques for bibliometric analysis are intensifying. In this complex scenario, two of the most crucial issues are the quality of the data and the capability of bibliometric analysis to cope with multiple data dimensions. Although the problem of enforcing a multidimensional approach to the analysis and management of bibliographic data is not new, a reference design pattern and a specific conceptual model for multidimensional analysis of bibliographic data are still missing. In this paper, we discuss ten of the most relevant challenges for bibliometric analysis when dealing with multidimensional data, and we propose a reference data model that, according to different goals, can help analysis designers and bibliographic experts in working with large collections of bibliographic data.

Notes

A detailed description of each fact schema according to the Dimensional Fact Model (DFM) is given in the following sections.
The model has also been tested on a collection of about 8,000 publications in the research area of databases and data modeling.
A very important contribution about the statistical issues in comparing institutional performance is [22].
For a detailed overview of the international rankings, see [19]; for the quality assessment of composite indicators, see [3].
http://bulletin.imstat.org/2011/09/presidential-address-peter-hall/.

References

Agrawal, R., Gupta, A., Sarawagi, S. (1997). Modeling multidimensional databases. In: Proceedings of the Thirteenth International Conference on Data Engineering, ICDE ’97 (pp. 232–243). Washington, DC, USA: IEEE Computer Society. http://portal.acm.org/citation.cfm?id=645482.653299.
Bakkalbasi, N., Bauer, K., Glover, J., Wang, L. (2006). Three options for citation tracking: Google Scholar, Scopus and Web of Science. Biomedical Digital Libraries, 3(1), 7.
Benito, M., Romera, R. (2011). Improving quality assessment of composite indicators in university rankings: A case study of French and German universities of excellence. Scientometrics, 89, 153–176.
Blei, D., Lafferty, J. (2006). Dynamic topic models. In: Proceedings of the 23rd International Conference on Machine Learning (pp. 113–120). New York: ACM.
Blei, D., Lafferty, J. (2007). A correlated topic model of science. The Annals of Applied Statistics, 1(1), 17–35.
Blei, D., Lafferty, J. (2009). Topic models. Text mining: classification, clustering, and applications, 10, 71.
Blei, D., Ng, A., Jordan, M. (2003). Latent Dirichlet allocation. The Journal of Machine Learning Research, 3, 993–1022.
Borg, I., Groenen, P. (2005). Modern multidimensional scaling: Theory and applications. Berlin: Springer.
Brockwell, P., Davis, R. (2002). Introduction to time series and forecasting. Berlin: Springer.
Bryk, A., Raudenbush, S. (1992). Hierarchical linear models: Applications and data analysis methods. New York: Sage Publications, Inc.
Castano, S., Ferrara, A., Lorusso, D., Montanelli, S. (2008). On the ontology instance matching problem. In: Proceedings of the 7th DEXA Workshop on Web Semantics (WebS 08) (pp. 180–184). Turin, Italy.
Coates, H. (2007). Universities on the catwalk: Models for performance ranking in Australia. Higher Education Management and Policy, 19(2), 69.
Codd, E., Codd, S., Salley, C. (1993). Providing OLAP to user-analysts: An IT mandate. Tech. rep.
DeBattisti, F., Salini, S. (2010). Bibliometric indicators for statisticians: Critical assessment in the Italian context. Università di Firenze, Firenze. http://air.unimi.it/handle/2434/152106.
Deerwester, S., Dumais, S., Furnas, G., Landauer, T., Harshman, R. (1990). Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41(6), 391–407.
Falagas, M., Pitsouni, E., Malietzis, G., Pappas, G. (2008). Comparison of PubMed, Scopus, Web of Science, and Google Scholar: Strengths and weaknesses. The FASEB Journal, 22(2), 338.
Franceschet, M. (2009). A cluster analysis of scholar and journal bibliometric indicators. Journal of the American Society for Information Science and Technology, 60(10), 1950–1964.
Friedman, J., Tibshirani, R., Hastie, T. (2009). The elements of statistical learning: Data mining, inference, and prediction. New York: Springer.
Geraci, M., Degli Esposti, M. (2011). Where do Italian universities stand? An in-depth statistical analysis of national and international rankings. Scientometrics, 87(3), 667–681.
Glänzel, W., Schubert, A. (2003). A new classification scheme of science fields and subfields designed for scientometric evaluation purposes. Scientometrics, 56(3), 357–367.
Goldstein, H. (2010). Multilevel statistical models, 4th edn. New York: Wiley.
Goldstein, H., Spiegelhalter, D. (1996). League tables and their limitations: Statistical issues in comparisons of institutional performance. Journal of the Royal Statistical Society. Series A (Statistics in Society), 385–443.
Golfarelli, M., Rizzi, S. (2009). Data warehouse design: Modern principles and methodologies. Maidenhead: McGraw-Hill.
Greenacre, M., Blasius, J. (2006). Multiple correspondence analysis and related methods. Boca Raton: Chapman & Hall/CRC.
Hirsch, J. (2005). An index to quantify an individual’s scientific research output. Proceedings of the National Academy of Sciences of the United States of America, 102(46), 16569.
Hofmann, T. (1999). Probabilistic latent semantic indexing. In: Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 50–57). New York: ACM.
Hubert, J. (1977). Bibliometric models for journal productivity. Social Indicators Research, 4(1), 441–473.
Hudomalj, E., Vidmar, G. (2003). OLAP and bibliographic databases. Scientometrics, 58(3), 609–622.
Irvine, J., Martin, B. (1984). Foresight in science: Picking the winners. London.
Jensen, F. (1996). An introduction to Bayesian networks, vol. 210. London: UCL Press.
Kenett, R., Salini, S. (2011). Modern analysis of customer satisfaction surveys: Comparison of models and integrated analysis. Applied Stochastic Models in Business and Industry, 27(5), 465–475.
Kolaczyk, E. (2009). Statistical analysis of network data: Methods and models. Berlin: Springer.
Mallig, N. (2010). A relational database for bibliometric analysis. Journal of Informetrics, 4(4), 564–580.
Mann, G., Mimno, D., McCallum, A. (2006). Bibliometric impact measures leveraging topic analysis. In: Proceedings of the 6th ACM/IEEE-CS Joint Conference on Digital Libraries (pp. 65–74). New York: ACM.
Meho, L., Yang, K. (2007). Impact of data sources on citation counts and rankings of LIS faculty: Web of Science versus Scopus and Google Scholar. Journal of the American Society for Information Science and Technology, 58(13), 2105–2125.
Molinari, J., Molinari, A. (2008). A new methodology for ranking scientific institutions. Scientometrics, 75(1), 163–174.
Nigam, K., McCallum, A., Thrun, S., Mitchell, T. (2000). Text classification from labeled and unlabeled documents using EM. Machine Learning, 39(2), 103–134.
Steyvers, M., Griffiths, T. (2007). Probabilistic topic models. Handbook of latent semantic analysis, 427(7), 424–440.
Tapper, T., Filippakou, O. (2009). The world-class league tables and the sustaining of international reputations in higher education. Journal of Higher Education Policy and Management, 31(1), 55–66.
Teh, Y., Jordan, M., Beal, M., Blei, D. (2006). Hierarchical Dirichlet processes. Journal of the American Statistical Association, 101(476), 1566–1581.
Vassiliadis, P. (1998). Modeling multidimensional databases, cubes and cube operations. In: Scientific and Statistical Database Management, International Conference on (p. 53). IEEE Computer Society, Los Alamitos, CA, USA. http://doi.ieeecomputersociety.org/10.1109/SSDM.1998.688111.
Vassiliadis, P., Sellis, T. (1999). A survey of logical models for OLAP databases. SIGMOD Rec. 28, 64–69. http://doi.acm.org/10.1145/344816.344869.
Vinkler, P. (2010). The evaluation of research by scientometric indicators. London: Chandos Publishing.
Wolfram, D. (2006). Applications of SQL for informetric frequency distribution processing. Scientometrics, 67(2), 301–313.
Yu, H., Davis, M., Wilson, C., Cole, F. (2008). Object-relational data modelling for informetric databases. Journal of Informetrics, 2(3), 240–251.

Acknowledgments

We would like to thank the UNIMIVAL group of the University of Milan (http://www.unimi.it/cataloghi/nucelo_valutazione/ricercatori_in_breve.pdf). In the last year, they worked with us on the subject of bibliometrics; many of our ideas come from our common work and from our many fruitful discussions.

Author information

Authors and Affiliations

Dipartimento di Informatica e Comunicazione, Università degli Studi di Milano, Milan, Italy
Alfio Ferrara
Dipartimento di Scienze Economiche, Aziendali e Statistiche, Università degli Studi di Milano, Milan, Italy
Silvia Salini

Corresponding author

Correspondence to Silvia Salini.

About this article

Cite this article

Ferrara, A., Salini, S. Ten challenges in modeling bibliographic data for bibliometric analysis. Scientometrics 93, 765–785 (2012). https://doi.org/10.1007/s11192-012-0810-x

Received: 20 October 2011
Published: 15 July 2012
Issue Date: December 2012
DOI: https://doi.org/10.1007/s11192-012-0810-x
