
Guideline Assessment Project II: statistical calibration informed the development of an AGREE II extension for surgical guidelines



Objective

To inform the development of an AGREE II extension specifically tailored for surgical guidelines.

Summary background data

AGREE II was designed to inform the development, reporting, and appraisal of clinical practice guidelines. Previous research has suggested substantial room for improvement of the quality of surgical guidelines.


Methods

A previously published MEDLINE search for clinical practice guidelines issued between 2008 and 2017 by surgical scientific organizations with an international scope identified 67 guidelines. The quality of these guidelines was assessed using AGREE II. We performed a series of statistical analyses (reliability, correlation, factor analysis, and item response theory) with the aim of calibrating AGREE II specifically for use with surgical guidelines.
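As a point of orientation, reliability in analyses of this kind is commonly summarized with Cronbach's alpha. Assuming that is the coefficient meant here (the abstract does not name it), for a domain of k items it can be written as below, where the symbols k, \sigma_i^2 (variance of item i across guidelines) and \sigma_T^2 (variance of the summed domain score) are notation introduced for this illustration only:

```latex
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma_i^{2}}{\sigma_T^{2}}\right)
```

Values closer to 1 indicate that the items of a domain vary together across guidelines, which is the property a calibration of domain structure would try to increase.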


Results

Reliability, correlation, and factor analyses and item response theory produced similar results and suggested that a structure of 5 domains, rather than the 6 domains of the original instrument, might be more appropriate. Furthermore, excluding some items and reassigning others to different domains was found to increase the reliability of AGREE II when applied to surgical guidelines.
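The idea that excluding an item can raise a domain's reliability can be illustrated with a minimal sketch. It assumes Cronbach's alpha as the reliability coefficient and uses purely synthetic scores. The function names, the data, and the choice of 67 rows are illustrative assumptions and do not reproduce the study's data or analysis.

```python
import numpy as np


def cronbach_alpha(scores: np.ndarray) -> float:
    """Cronbach's alpha for a matrix with rows = guidelines, columns = items."""
    k = scores.shape[1]
    sum_item_var = scores.var(axis=0, ddof=1).sum()   # sum of per-item variances
    total_var = scores.sum(axis=1).var(ddof=1)        # variance of the summed score
    return k / (k - 1) * (1 - sum_item_var / total_var)


def alpha_if_item_deleted(scores: np.ndarray) -> list:
    """Alpha recomputed after dropping each item in turn."""
    return [
        cronbach_alpha(np.delete(scores, j, axis=1))
        for j in range(scores.shape[1])
    ]


# Synthetic example (hypothetical data): four items driven by one shared
# latent quality score, plus a fifth item that is unrelated noise.
rng = np.random.default_rng(0)
n_guidelines = 67  # illustrative only
latent = rng.normal(size=n_guidelines)
coherent = [latent + rng.normal(scale=0.5, size=n_guidelines) for _ in range(4)]
unrelated = rng.normal(size=n_guidelines)
scores = np.column_stack(coherent + [unrelated])

print("alpha, all items:", round(cronbach_alpha(scores), 3))
for j, a in enumerate(alpha_if_item_deleted(scores)):
    print(f"alpha without item {j}:", round(a, 3))
```

In this toy setting, removing the unrelated fifth item yields a higher alpha than keeping all five, which is the kind of signal that could motivate excluding an item or moving it to another domain.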


Conclusions

The findings of this study suggest that statistical calibration of AGREE II might improve the development, reporting, and appraisal of surgical guidelines.





The authors received no funding for this work.

Author information



Corresponding author

Correspondence to Dimitrios Mavridis.

Ethics declarations


Mrs. Sofia Tsokani, Dr. Stavros A. Antoniou, Prof. Irini Moustaki, Prof. Manuel López-Cano, Mr. George A. Antoniou, Prof. Ivan D. Flórez, Prof. Gianfranco Silecchia, Mr. Sheraz Markar, Prof. Dimitrios Stefanidis, Prof. Giovanni Zanninotto, Prof. Nader K. Francis, Prof. George H. Hanna, Prof. Salvador Morales-Conde, Prof. Hendrik Jaap Bonjer, Prof. Melissa C. Brouwers, and Prof. Dimitrios Mavridis, PhD, have no conflicts of interest or financial ties to disclose.


Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (DOCX, 2976 kB)


About this article


Cite this article

Tsokani, S., Antoniou, S.A., Moustaki, I. et al. Guideline Assessment Project II: statistical calibration informed the development of an AGREE II extension for surgical guidelines. Surg Endosc 35, 4061–4068 (2021).



Keywords

  • Clinical practice guidelines
  • Surgery
  • Methodological quality
  • Reporting quality