Programmers, Professors, and Parasites: Credit and Co-Authorship in Computer Science

Abstract

This article presents an in-depth analysis of past and present publishing practices in academic computer science to suggest the establishment of a more consistent publishing standard. Historical precedent for academic publishing in computer science is established through the study of anecdotes as well as statistics collected from databases of published computer science papers. After examining these facts alongside information about analogous publishing situations and standards in other scientific fields, the article concludes with a list of basic principles that should be adopted in any computer science publishing standard. These principles would contribute to the reliability and scientific nature of academic publications in computer science and would allow for more straightforward discourse in future publications.

Notes

  1.

    Analyses of the DBLP and NRC Research-Doctorate Programs in the United States data were carried out by the author. Programs in C++ were devised for parsing and analyzing the data; for example, the NRC data analysis program is shown in the Appendix. Figures 2–6 were produced using the output of these programs, exported to a spreadsheet application. Occasionally it was not possible to parse the data correctly (due to incorrect formatting or other inconsistencies); these situations were documented (see the Appendix for an example). Instances of this problem were relatively few and should not affect the trends observed in this study.

  2.

    Personal communication.

  3.

    The R² value is the “coefficient of determination” for a statistical fit line. R² values close to 1 indicate a near-ideal fit, while R² ≈ 0 implies little to no correlation between a fit curve and the data. To produce these values, optimal least-squares fit curves were chosen from standard models (exponential, logarithmic, linear) for statistical variation. The curves are superimposed on Figs. 5 and 6 for inspection. Here we see that primary authorship and computer science “scholarly quality” are related by a fit line with a sufficiently high R² value to indicate some type of correlation, while the relationship between secondary authorship and “scholarly quality” is insubstantial.

  4.

    Personal communication.

  5.

    Incidentally, the NRC data concerning primary versus secondary authorship appear to be somewhat flawed. Certain schools with considerable numbers of publications are identified as having no secondary-authored publications. This is unlikely: any team with multiple researchers from the same school must have one or more secondary authors, since only one person can appear first on an author list. Furthermore, not all lines in the data file have the right number of characters to agree with the description of the data format.

Acknowledgements

Special thanks to James Robert Wood, Stanford Department of English and Program in Writing and Rhetoric, for his advice throughout the writing process and revisions of the final paper. Additional thanks to Prof. Clifford Nass (Stanford Department of Communications), Prof. Andrea Lunsford (Stanford Department of English), and my family for their support.

Author information

Corresponding author

Correspondence to Justin Solomon.

Appendix: C++ Code Used to Parse NRC Study Data

The following short program was used to parse the data accompanying the National Research Council’s study, Research-Doctorate Programs in the United States. It is included to make transparent the methods used for data analysis and to enable easier analysis of the NRC data in future studies (see Note 5).

    #include <cstdlib>
    #include <fstream>
    #include <iostream>
    #include <set>
    #include <string>
    #include <vector>

    using namespace std;

    int main() {
        // Citation data
        ifstream infile("PUB_CIT.dat", ios::binary | ios::in);
        // List of faculty
        ifstream faclist("FACLIST.dat", ios::binary | ios::in);
        // Output file
        ofstream outfile("pub_cit_analysis2.txt");

        set<string> validnames;  // set of names of CS researchers
        string curline;
        int b = 0, cs = 0;  // b = number of invalid lines; cs = number of CS professors

        while (getline(faclist, curline)) {
            // according to NRC standard, all lines in FACLIST.dat should have 63 characters
            if (curline.length() < 63) {
                b++;  // invalid line
                continue;
            }
            string facname = curline.substr(0, 5);    // Name of faculty; NRC 5-character code
            string progcode = curline.substr(60, 2);  // Program/department code
            if (progcode == "26") {                   // code 26 = computer science
                validnames.insert(facname);
                cs++;
            }
        }
        cout << b << " bad lines.\n";
        cout << cs << " CS profs.\n";

        // all schools indexed by 3-digit code, so vectors have 1000 elements to cover all codes
        vector<int> primary_authorship(1000);    // number of primary-authored papers per school
        vector<int> secondary_authorship(1000);  // number of secondary-authored papers per school
        vector<int> single(1000);                // number of singly-authored papers per school
        vector<int> multiple(1000);              // number of multiple-authored papers per school
        vector<int> total(1000);                 // total number of papers per school

        // The published listing uses a school index c without declaring it.
        // ASSUMPTION: c is the 3-digit school code read from the publication line.
        // The column below is a placeholder, not taken from the source; set it
        // from the NRC description of the data format before running.
        const size_t kSchoolCodeColumn = 5;

        int numlines = 0;  // number of lines parsed
        int numbad = 0;    // number of bad lines
        while (getline(infile, curline)) {  // for each publication
            numlines++;  // update status
            if (curline.length() < 98) {  // invalid line
                numbad++;
                continue;
            }
            if (!validnames.count(curline.substr(0, 5)))  // not a CS publication
                continue;
            char t = curline[2];
            if (t == 'A' || t == 'O' || t == 'S' || t == 'Y')
                continue;  // only accept proceedings, journals
            int c = atoi(curline.substr(kSchoolCodeColumn, 3).c_str());
            if (c < 0 || c >= 1000) {  // school code outside the indexed range
                numbad++;
                continue;
            }
            int numAuthors = curline[19] - '0';  // number of authors (between 0 and 9)
            if (numAuthors == 1) single[c]++;
            else if (numAuthors > 1) multiple[c]++;
            if (curline[31] == 'P') primary_authorship[c]++;  // 'P' indicates primary authorship
            else if (curline[31] == 'S') secondary_authorship[c]++;
            total[c]++;
        }

        for (int i = 0; i < 1000; i++)
            if (total[i]) {  // if school published in CS, output data
                outfile << i << ' ';
                outfile << total[i] << ' ';
                outfile << single[i] << ' ' << multiple[i] << ' ';
                outfile << primary_authorship[i] << ' ' << secondary_authorship[i] << ' ';
                // fraction of attributed papers that are primary-authored (0 if none attributed)
                int attributed = primary_authorship[i] + secondary_authorship[i];
                outfile << (attributed ? (double)primary_authorship[i] / attributed : 0.0);
                outfile << endl;
            }

        cout << "Data processing is done.\n";
        cout << numlines << " lines processed.\n";
        cout << numbad << " bad lines.\n";
        return 0;
    }
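Note 5 observes that some schools with considerable numbers of publications are recorded as having no secondary-authored publications. A check for that pattern could be run over the program's output file. The sketch below assumes each output row holds whitespace-separated fields in the order the program writes them (school code, total, single, multiple, primary, secondary, ratio). The names `SchoolRow`, `parse_row`, and `looks_suspicious`, and the `min_total` threshold, are hypothetical and not part of the original analysis.

```cpp
#include <sstream>
#include <string>

// One row of the analysis output: counts of CS papers for a single school.
struct SchoolRow {
    int code;
    int total;
    int single;
    int multiple;
    int primary;
    int secondary;
};

// Parses the six integer fields of an output row; the trailing ratio is ignored.
// Returns false if the line does not begin with six integers.
bool parse_row(const std::string& line, SchoolRow& row) {
    std::istringstream in(line);
    return static_cast<bool>(in >> row.code >> row.total >> row.single
                                >> row.multiple >> row.primary >> row.secondary);
}

// Flags a school that has at least min_total papers but no secondary-authored
// ones, the pattern Note 5 describes as unlikely. min_total is an assumed,
// caller-chosen threshold for "considerable numbers of publications".
bool looks_suspicious(const SchoolRow& row, int min_total) {
    return row.total >= min_total && row.secondary == 0;
}
```

Rows flagged this way are candidates for manual inspection rather than proof of an error, since a small program could plausibly have no secondary-authored papers.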

About this article

Cite this article

Solomon, J. Programmers, Professors, and Parasites: Credit and Co-Authorship in Computer Science. Sci Eng Ethics 15, 467–489 (2009). https://doi.org/10.1007/s11948-009-9119-4

Keywords

  • Co-authorship
  • Computer science research
  • Publishing