Standardization problem of author affiliations in citation indexes
- Cite this article as:
- Taşkın, Z. & Al, U. Scientometrics (2014) 98: 347. doi:10.1007/s11192-013-1004-x
Academic effectiveness of universities is measured with the number of publications and citations. However, accessing all the publications of a university is challenging because of mistakes and standardization problems in citation indexes. The main aim of this study is to seek a solution for the unstandardized addresses and the resulting publication loss of universities. To achieve this, all Turkey-addressed publications published between 1928 and 2009 were analyzed in depth. The results show that the main mistakes stem from character or spelling, indexing and translation errors. These errors negatively affect the international visibility of universities, make bibliometric studies based on affiliations unreliable and produce incorrect university rankings. To counter these negative effects, an algorithm was created with the finite state technique by using the Nooj transducer. With the help of finite state grammar graphs, 47 frequently used affiliation variations for Hacettepe University, apart from “Hacettepe Univ” and “Univ Hacettepe”, were identified. In conclusion, this study presents some reasons for the inconsistencies in university rankings. It is suggested that mistakes and standardization issues should be considered by librarians, authors, editors, policy makers and managers in order to solve these problems.
Keywords: Standardization problem; Finite state technique; Data accuracy; Data unification; Address unification; Research evaluation; University rankings; Citation indexes; Nooj
Citation indexes are used not only for following the literature, but also for making citation analyses. Citation analysis studies are conducted to measure the intellectual effects of researchers and the quality of papers (Cole 2000). The content of citation indexes has been growing as their usage has spread. In time, some problems of data accuracy have emerged (Galvez and Moya-Anegón 2006a). The data accuracy issues depend on spelling, translation or abbreviation of affiliations (Galvez and Moya-Anegón 2007a). Mistakes originating from affiliations cause problems in showing and evaluating collaborations and in delimiting scientific fields, and they affect performance evaluations negatively (Galvez and Moya-Anegón 2006a).
Scientific studies include the organizational and geographical address information of the author(s) at the beginning as a footnote. Originally, address information was given to provide a connection between authors and readers. However, the usage of this information has changed gradually with the development of research evaluations, and affiliations have become vital for departments, laboratories and research units (De Bruin and Moed 1990). The process of giving affiliations begins with the choice of the author(s) and continues with the formalization of these addresses by editors and publishers. However, when it is left to people’s choices, it creates confusion about addresses. In consequence, people working in the same university or department may give addresses that differ from each other, causing hundreds of variations for a university or an organization name in citation indexes. This can lead to serious confusion (Moed 2005, pp. 183–184).
Some organizations’ budgets have been determined based on their publication counts. However, since some publications disappear because of address mistakes, organizations or research groups end up losing their budget supports. The situation in Turkey is the same as it is in the world. Organizations have been appraised by using their publication counts. The universities with higher number of publications have been approved as better universities by some authorities. Author(s) are required to publish certain number of articles in citation indexes to take academic degrees (Öğretim 2007). This actually brings the quantitative evaluations to the forefront instead of qualitative ones. On the other hand, Turkish Scientific Research Council (TÜBİTAK) has announced to give support only to articles that have “Turkey” on the address field within the context of incentive program for scientific publications (ULAKBİM 2010). In addition, the rankings of Turkish Universities are declared by The Council of Higher Education every year (The Council of Higher Education 2010). Similarly, URAP (University Ranking by Academic Performance) research laboratory publishes university rankings every year by using various criteria (URAP 2011).
Mentioned implementations have shown the usage of citation indexes in Turkey. Although the numbers do not measure the quality of publications, it is obvious that they have an importance for some communities and policy makers. However, it should be kept in mind that it is unavoidable to make mistakes when using manual indexing systems for citation indexes. Managers should take into account the quantitative analyses based on inaccurate data since access to all publications of each organization has become more of an issue.
The main aim of this study is to develop an algorithm to find mistakes for Turkish Universities in Web of Science. First of all, the types of mistakes are identified and their effects are measured. Then, the mistakes are found easily by using an algorithm created by finite state technique, which has been widely used for recognition of characters, grammar checking, pattern matching, spelling correction and many different areas in the literature. Finite state is defined in the literature as the operation of sets of strings or sequences of a word (Roche and Schabes 1996, p. 1). Detailed information about finite state technique is explained in the following part. After finding mistakes and standardization errors for affiliations by finite state technique, some suggestions to reduce the problem are given at the end of the study.
There is a standardization problem for Turkish Universities in citation indexes.
It is possible to reduce standardization problems by using finite state technique.
Although there are a few publications about data unification for citation indexes in Turkey, this study is the first to identify addresses automatically. Therefore, this study is expected to present some solutions for libraries and decision makers.
Finite state technique
Finite state algorithms accept a string by following predetermined labels if they can trace a path from the initial state to the final state (Galvez and Moya-Anegón 2007a, p. 9). These algorithms have networks of defined states and labeled links (Roche and Schabes 1995, pp. 236–237). The main operation of a finite state algorithm depends on reading labeled strings from left to right by considering the links between states. If the string matches the predetermined label, the automaton moves on to the following state. This process continues until it reaches the final state (Galvez and Moya-Anegón 2007a, p. 9).
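The acceptance procedure described above can be sketched in a few lines of Python. This is only an illustration: the function name, the state numbers and the tiny automaton for the string "univ" are assumptions made for the example and do not come from the study.

```python
def accepts(word, transitions, start, finals):
    """Read word from left to right and report whether a final state is reached."""
    state = start
    for symbol in word:
        key = (state, symbol)
        if key not in transitions:
            # No labeled link leaves the current state for this symbol.
            return False
        state = transitions[key]
    return state in finals


# Hypothetical automaton whose only accepted string is "univ".
UNIV = {(0, "u"): 1, (1, "n"): 2, (2, "i"): 3, (3, "v"): 4}
UNIV_FINALS = {4}
```

Calling accepts("univ", UNIV, 0, UNIV_FINALS) follows the four labeled links and ends in the final state, whereas a shorter or misspelled string stops before it.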
Finite state algorithms are used not only for pattern matching and recognition, speech tagging, recognition of handwriting, optical character recognition and encryption algorithms but also in wide range of scientific areas (Roche and Schabes 1997, p. 227). It is possible to draw a parallel between finite state algorithm and subway turnstile. To make an analogy, closed turnstile is the initial state for finite state algorithm. If a passenger inserts coin, it moves on second state (gate opens). The turnstile moves on the closed state after passengers pass. By the way the system opens only under the condition of inserting coins (Scholl 2008). The system logic for turnstile has resemblance with finite state algorithms from the point of following and identifying states until the process ends.
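The turnstile analogy can be written down as a small transition table. The sketch below assumes two states ("closed", "open") and two events ("coin", "push"); these names are chosen for illustration only.

```python
# Hypothetical two-state turnstile; state and event names are illustrative.
TRANSITIONS = {
    ("closed", "coin"): "open",    # inserting a coin opens the gate
    ("closed", "push"): "closed",  # pushing without a coin changes nothing
    ("open", "coin"): "open",      # an extra coin leaves the gate open
    ("open", "push"): "closed",    # the gate closes after a passenger passes
}


def run_turnstile(events, state="closed"):
    """Return the state reached after consuming the events in order."""
    for event in events:
        state = TRANSITIONS[(state, event)]
    return state
```

Starting from the closed state, the event sequence coin, push returns the turnstile to the closed state, which mirrors the description above.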
Finite state transducers are computer software for applying finite state algorithms to large amounts of text automatically. Transducers produce strings with regard to existing states to check all stems and forms of a word (Goldsmith 1993; Altıntaş 2001). If all the rules about this word are accepted by the transducer, the word is accepted as correct. Otherwise, the word is rejected or accepted partially (Altıntaş 2001). The finite state algorithm is defined as a simple and effective model for natural language processing. Phonological, morphological and syntactical analyses, symbolizations and language modeling can be made easily by using these algorithms (Roche and Schabes 1997). However, usage areas have changed and spread from linguistics to different fields in recent years.
Finite state algorithms are used in many areas in the scientific literature such as engineering, linguistics, medicine and librarianship. The main reason for such common use is the customizability of finite state algorithms. Although it was initially argued that using finite state algorithms to identify natural languages like English was impossible (e.g. Chomsky 1964, p. 21), they are now commonly used for natural language processing and for revealing the morphological structure of languages. Many researchers working on finite state techniques indicate that they can be easily used for natural language processing due to their speed and compactness (Johnson 1972; Kaplan and Kay 1994; Roche and Schabes 1995; Oflazer 1996; Mohri 1997).
Previous studies about data accuracy in citation indexes
Data accuracy has been popular in recent years for library and information science with the growth of citation indexes. Many studies have tried to find out standardization problems and solve them. According to Moed (2005), the main mistakes have been made because of the complexity of the names of authors and organizations. An author’s name can be used in many different formats and there can be many authors with the same name. In addition, translating authors’ names from different languages (such as Chinese) to English and using nicknames also create confusion with author names. On the other hand, the problem of organization names depends on flexibility of giving addresses. Namely, two scientists working for the same organization may identify their organizations in different ways causing unauthorized variations in organization names.
Changing names have been determined as one of the main problems for organizations (Hood and Wilson 2003). On the other hand, non-standardization of university names has created problems regarding information retrieval and the solution is well-structured unification (De Bruin and Moed 1990). Organizations and authors can only be evaluated correctly under the condition of using accurate data (Toutkoushian and Webber 2011, p. 130). Although the common assumption for university names is “University X”, Van Raan (2005) indicated that this was totally wrong. He also suggested address unification to specify the addresses of all universities.
Finite state algorithms are used in library and information science to make information retrieval more effective (Galvez and Moya-Anegón 2006b; Galvez et al. 2005; Kettunen 2008) and to standardize author and organization names (Galvez and Moya-Anegón 2006a, 2007a, b).
Studies about organization names have been carried out for standardizing organizational name array, which is as follows; university name, institute/faculty/research group, department, city, country. These studies did not aim to determine spelling mistakes for organization names (Galvez and Moya-Anegón 2006a, 2007a). On the other hand, the study on standardizing author names was designed to find different versions of an author name (Galvez and Moya-Anegón 2007b).
Although there are some papers about standardization in the literature (Cornell 1982; Piternick 1982; Williams and Lannom 1981; Ruiz-Pérez et al. 2002; Falahati Qadimi Fumani et al. 2012), the issue has not been popular in Turkey, yet. The only unification work for Turkey is a published book on national science indicators (ULAKBİM 2007), which presents all the possible variants of Turkish University names.
The dominant trend for international papers on this topic is finding and solving standardization problems for university names. However, they do not focus on word/spelling mistakes. This study is the first research into the identification of standardization problems in Turkey, presenting some solution proposals for Turkish literature.
Some publications in citation indexes, such as those in Middle Eastern Studies and Athenaeum-Studi Periodici Di Letteratura E Storia Dell Antichita, do not have affiliation information.
Country information for some publications was incorrect.
“Turkey” was not included in the address field for some publications.
“Istanbul Univ, Res & Applicat Ctr Biotechnol & Genet Engn, TR-34118 Istanbul, Turkey”.
Gaziosmanpasa Univ, Dept Med Biol, Fac Med, Tokat, Turkey.
Sch Med, Gazi Univ, Ankara, Turkey.
The addresses of this publication were written in the institution column as “ISTANBUL;GAZIOSMANPASA;GAZI”. Thus, all the publications can be classified under their unified affiliation information; department and faculty names are not unified within the scope of this study. The hierarchical order of the addresses was not considered during the unification process. If the organization name was indicated in the middle of the string, it was also unified.
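The unification step for this example can be sketched as follows. The study does not describe its implementation, so the function names and the rule used here (take the comma-separated part that contains the token "Univ", wherever it occurs) are assumptions made for illustration.

```python
def university_key(address):
    """Return the upper-cased university name found in one address, or None."""
    for part in address.split(","):
        tokens = part.strip().split()
        # The name may sit at the start or in the middle of the address string.
        if "Univ" in tokens and len(tokens) > 1:
            return " ".join(t for t in tokens if t != "Univ").upper()
    return None


def institution_column(addresses):
    """Join the distinct university keys of a record in order of appearance."""
    keys = []
    for address in addresses:
        key = university_key(address)
        if key and key not in keys:
            keys.append(key)
    return ";".join(keys)


EXAMPLE_ADDRESSES = [
    "Istanbul Univ, Res & Applicat Ctr Biotechnol & Genet Engn, TR-34118 Istanbul, Turkey",
    "Gaziosmanpasa Univ, Dept Med Biol, Fac Med, Tokat, Turkey",
    "Sch Med, Gazi Univ, Ankara, Turkey",
]
```

Applied to the three addresses above, institution_column(EXAMPLE_ADDRESSES) yields "ISTANBUL;GAZIOSMANPASA;GAZI". Addresses without a university token, such as the unspecified ones discussed next, return None and need manual handling.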
Some publications were excluded from this study because of unspecified addresses like “dept plast & reconstruct surg, ankara, turkey”. In such a case, the author field (AU) was evaluated to find the specific university in Ankara. If the address information could not be accessed even by using author names, the records of these publications were not evaluated in this study. Similarly, home and workplace addresses like “Fecri Ebcioglu Sokagi (street), Dilek Apt 6-8,1 Levent, TR-34340 Istanbul, Turkey” were identified and left out of scope. However, this did not cause a big problem because only 647 records (0.3 % of all records) were affected by that exclusion.
Determination of the standardization problem and measurement of its magnitude
After unification of 198,687 records, the distribution of Turkey-addressed publications among universities was identified. Then, the 20 most productive universities, each with more than 4,000 publications, were chosen for the determination of standardization problems. The rate of difference between correct addresses and errors was determined for these 20 universities. The correct address of a university was accepted as “Univ X” and “X Univ” (such as Hacettepe Univ, Univ Hacettepe). Some Turkish Universities have both a Turkish and an English name, such as Orta Doğu Teknik University and Middle East Technical University. In such a case, all possible correct variants were accepted.
To present the effect of errors on addresses, bibliometric collaboration maps were created by using CiteSpace (http://cluster.cis.drexel.edu/~cchen/citespace/). Two collaboration maps were drawn to show the effects. First map includes Web of Science affiliations (original addresses), and the second one shows unified university names. Then, the differences between these two maps were also evaluated.
Implementing finite state transducer
One of the aims of this study is to find all possible variants of a university’s addresses in a very large dataset by using a finite state transducer. To achieve this, Hacettepe University, the most productive university in Turkey, was chosen for implementation. Nooj, created by Max Silberztein in 2002, was used as the transducer. Nooj is an open-source linguistic development environment with large-coverage dictionaries, grammars and corpora (Nooj 2012).
The .txt file that included the address data was converted into the Nooj text format, which has the .not extension. Then, grammar graphs were created to be applied to the .not file. Detailed information about creating and applying the graphs is given in the findings.
Table 1 The most productive top 20 universities (columns: university, number of publications, number of mistakes, percent of mistakes (%)). Rows legible here: İstanbul Teknik University, Dokuz Eylül University, Ondokuz Mayıs University, Karadeniz Teknik University.
As seen in Table 1, ratios of mistakes differ across universities. Although the most productive institution, Hacettepe University, has fewer mistakes than others, the second most productive, İstanbul University, lost over 10 % of its publications. The lower loss of Hacettepe University’s publications can be explained by the list presented by Hacettepe University Libraries, which includes all possible address variants of the university (Hacettepe University Libraries 2012).
Although İstanbul University is announced as the top productive by The Council of Higher Education every year (The Council of Higher Education 2010), it is determined in this study that this university takes place at the second rank with its loss of 10.3 %. It is obvious that the publications produced by Istanbul University should be evaluated deeply. In this sense, a search with the keyword “Univ Istanbul” was carried out on April 2, 2012 and 4,383 of the publications did not belong to Istanbul University. Results showed that 2,000 of them were produced by İstanbul Technical University. Result list also included universities that are located in İstanbul such as Koç, Sabancı and Marmara Universities. The main reason of this problem is the addresses that were given as “Koc Univ, Istanbul”. Even though searches are conducted with quotation marks, Web of Science retrieves these records for “Univ Istanbul” search. Under these circumstances, both İstanbul University Library and their decision-makers should take this situation into account during ranking and policy making processes.
Among the 20 universities, the highest rate of mistakes was identified for GATA (Gülhane Military Medical Academy). GATA generally takes place at lower ranks in the lists that include publication counts (The Council of Higher Education 2010; ULAKBİM 2007, p. 168). In fact, GATA should be ranked 10th. It is conceivable that not all the publications of GATA could be identified in the previous studies.
Loss of the publications increased towards the end of the list. The mistake rates of Kahramanmaraş Sütçü İmam (KSU) and Yüzüncü Yıl Universities (YYU), which were not among the top 20 universities, were very high. One-fourth of KSU’s publications were missing due to the address mistakes. Likewise, YYU lost 15 % of its publications.
It is obvious that mistakes make evaluation processes for universities harder. If the reasons of these mistakes can be identified, the solutions will be found easily. In order to find the reasons, a correlation test was carried out for top 40 universities. However, the results of the correlation test showed no meaningful correlation between mistakes and character count (r = 0.064, p = 0.699), word count (r = 0.040, p = 0.810) or Turkish character count (r = 0.066, p = 0.692) on the university names. The only positive correlation was found between total publication count and mistaken publication count (r = 0.585, p < 0.001), but this kind of relationship is usually expected and it is natural. Because of these unexplainable errors on addresses, fixing the standardization problem became more problematic.
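For readers who want to repeat such a test on their own data, a correlation coefficient can be computed as sketched below. The text does not name the coefficient that was used, so Pearson's r is an assumption here, and the function name is illustrative. No values from the study are reproduced.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)
```

In the setting above, xs would hold, for example, the character counts of the 40 university names and ys the corresponding numbers of mistaken publications; significance testing (the p values) would need an additional statistical routine.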
After evaluation and unification processes, different types of mistakes were identified. Main error types are listed below.
Character or spelling errors
The main mistakes were specified as errors originating from keyboard while writing university addresses. According to Damerau (1964), over 80 % of spelling errors depend on insertion, deletion, substitution and transposition of characters. The same issues were determined for Turkish Universities’ addresses such as “Hacetteppe Univ”, “Hacattepe Univ”, “Maramara Univ”, “Egge Univ”, “Inonoii Univ” and “Dukuz Eylul Univ”. These kinds of errors were made not only by authors, but also by editors or indexers.
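The four error operations named by Damerau can be made concrete with an edit distance that counts insertions, deletions, substitutions and transpositions of adjacent characters. The sketch below uses the optimal string alignment variant; it is an illustration of the error model and not the method used in this study, which relies on finite state graphs.

```python
def osa_distance(a, b):
    """Edit distance counting insert, delete, substitute and adjacent transpose."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all characters of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all characters of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[rows - 1][cols - 1]
```

Several of the examples above are a single operation away from the intended name: "Hacetteppe Univ" (one inserted character), "Hacattepe Univ" (one substitution), "Maramara Univ" and "Egge Univ" (one inserted character each).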
Web of Science’s indexing logic is based on digitization of sources and indexing on the database manually (Thomson Reuters 2009). However, descriptive manuals of Web of Science did not explain the way of indexing clearly. An e-mail message from Thomson Reuters Technical Support Team indicated that the indexers depend on the addresses that are written on original texts. In addition to this, it is indicated that an abbreviation list is being used for some words and word groups such as university, faculty, research center etc. It seems that the natural language indexing and digitization of texts cause most of the errors on the address fields of the records in Web of Science.
Another indexing error type is the mistyping of characters. To illustrate, some addresses have “rn” instead of “m”; “m” instead of “in”; “1” instead of “i”; “i” instead of “l” such as “Parnukkale Univ” (Pamukkale University), “Dokuz Eylui Univ” (Dokuz Eylül University), “F1rat Univ” (Fırat University) and “Dumlupmar Univ” (Dumlupınar University). These types of errors may be originating from OCR process of documents. Therefore, digitized materials should be controlled effectively.
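One way to check digitized addresses for such confusions is to undo each confusion pair at each position and compare the result with a list of known names. The sketch below is a minimal illustration under stated assumptions: the confusion pairs are the four mentioned above, the known names are written in the ASCII form used in address fields, and the function names are hypothetical.

```python
# Confusion pairs from the examples in the text: (as indexed, intended).
OCR_CONFUSIONS = [("rn", "m"), ("m", "in"), ("1", "i"), ("i", "l")]

# Assumed reference list, in ASCII transliteration.
KNOWN_NAMES = {"Pamukkale Univ", "Dokuz Eylul Univ", "Firat Univ", "Dumlupinar Univ"}


def ocr_candidates(text):
    """Yield every string obtained by undoing one confusion at one position."""
    for wrong, right in OCR_CONFUSIONS:
        start = text.find(wrong)
        while start != -1:
            yield text[:start] + right + text[start + len(wrong):]
            start = text.find(wrong, start + 1)


def correct_ocr(text, known_names=KNOWN_NAMES):
    """Return the known names reachable from text by a single confusion repair."""
    return sorted({c for c in ocr_candidates(text) if c in known_names})
```

With this list, correct_ocr("Parnukkale Univ") proposes "Pamukkale Univ" and correct_ocr("Dumlupmar Univ") proposes "Dumlupinar Univ". Addresses with more than one confusion would need the repair to be applied repeatedly.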
As seen on Fig. 4, although the address of original article is “Adnan Menderes University Faculty of Medicine, Department of Dermatology”, it was indexed in Web of Science as “Uslu Univ”. These kinds of mistakes affect the visibility of the publications. In such a case, these publications can only be found by one by one evaluation of all records. This process requires more workforce, time and attention.
Translation errors made by authors
In the international arena, Turkish Universities are addressed by their Turkish names except a few of them, like Middle East Technical University and Istanbul Technical University. Web of Science does not translate affiliations into English due to its natural language indexing, yet Turkish authors sometimes prefer to write their affiliations in English. For example, one of the well-known universities, Boğaziçi University, was indexed in Web of Science as “Bosphorus Univ”, but its English name is Boğaziçi University.
Translation for Boğaziçi University does not pose a big problem for this University due to the uniqueness of the name, “Bosphorus”. However, some universities have serious problems because of the translation of their names. For example, some authors used “Aegean University”, “Mediterranean University” and “Trakia University” instead of “Ege University”, “Akdeniz University” and “Trakya University”. This causes confusion since there are other universities bearing those names in the world: there is a “University of Aegean” in Greece (http://www3.aegean.gr/), a “Mediterranean University” in France (http://www.univmed.fr/) and a “Trakia University” in Bulgaria (http://www.uni-sz.bg/engl). Consequently, if someone searches for Ege University’s publications and adds “Aegean University” to the address field, the search results cannot present the correct publication numbers. Obviously, bibliometric studies regarding the number of publications would be inaccurate because of these indexing confusions.
Standardization problem of university addresses
Besides the above mentioned errors, standardization of university names is problematic. The problem for university addresses does not only depend on spelling, translating or indexing of the names, but also depends on different usage of university names, such as “X Univ”, “Univ X”, “X Med Sch”, “X Sch Med”, etc. There is no standard array or usage for university names. Galvez and Moya-Anegón (2007a, p.8) explained the correct array of university names as “university name, faculty, department, postal code, city, country”. However, most Turkey addressed publications do not have this kind of structure in the affiliations.
Searching with the abbreviations will not retrieve correct results if the organizations do not have unique abbreviations like METU (Middle East Technical University). By searching with the “HU” keyword, one can access the documents written by Harran, Hacettepe, Haliç, Hakkari and Hitit Universities inevitably. In addition to these universities, searching with the “HU” term also brings the addresses like “ICO Badalona, HU Germans, Barcelona, Spain”, “HU Bellvitge, Lhospitalet De Llobregat, Spain”, and “HU Vaudois, Lausanne, Switzerland”. Due to the reasons listed above, the use of abbreviations for universities should be discouraged.
Effects of mistakes and non-standardization
Incorrect and non-standard addresses remarkably affect the accuracy of the search results. There are several problems along with the reduction of visibility of the organizations.
As mentioned before, support under the incentive program for scientific publications in Turkey is granted according to the visibility of the country affiliation of the publications. If the affiliation is not specified for an article, the article is not eligible for the incentive.
In order to visualize the connections between organizations, and properties of these connections, collaboration networks between organizations are created by bibliometric studies. Such studies need correct, reliable and standard data. Collaboration maps created by using non-standardized data cannot present the real connections between organizations and they are not meaningful visually, either.
The nodes that cannot be visualized in Fig. 7, can be easily seen in the map of unified affiliations (see Fig. 8). Figure 8 also shows the major collaborative partners of this university and their connections with each other. The difference between the two figures emphasizes the importance of well-structured unification process. However, working on the unification process manually is time-consuming. If the unification can be achieved by using automatic techniques, the analysis process and the results of bibliometric studies will be easier and far more effective.
Solution proposals for standardization problem
The variety of mistakes and their effects were explained in the previous parts of this study. In this part, the solution proposal for the standardization problem by using the finite state technique and the Nooj finite state transducer is introduced.
First stage: detection of erroneous addresses
The circles on the characters work to find extra characters. With this method, even “hhaacceettteeppee” term can be retrieved. Bridges between characters help to find the terms with missing characters. For instance, algorithm can find “hacttepe” term by the help of the bridge between “c” and “t”. Although the beginning state was specified as “h” at first, it is changed into “h” and “a” in the second graph to access the first-letter-missing records.
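The behavior of the loops and bridges can be approximated with a regular expression. The sketch below is not the Nooj grammar itself; it is an assumed equivalent in which every character may repeat (the loops) and at most one character, including the first, may be missing (the bridges). The function names are hypothetical.

```python
import re


def tolerant_pattern(word):
    """Regex accepting word with repeated characters and at most one omission."""
    def looped(chars):
        # Each character may occur one or more times, like a loop on a state.
        return "".join(re.escape(c) + "+" for c in chars)

    variants = [looped(word)]
    for i in range(len(word)):
        # Skipping position i plays the role of a bridge over that character.
        variants.append(looped(word[:i] + word[i + 1:]))
    return re.compile("(?:" + "|".join(variants) + ")", re.IGNORECASE)


HACETTEPE = tolerant_pattern("hacettepe")


def matches(token):
    """True if the whole token is accepted by the tolerant pattern."""
    return HACETTEPE.fullmatch(token) is not None
```

This pattern accepts "hhaacceettteeppee", "hacttepe" and the first-letter-missing form "acettepe". It does not handle substituted characters, so a form such as "Halettepe" is rejected and would have to be found by other means.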
Accessed words and their frequencies
As seen in Table 2, the total frequency (27,725) is higher than the total publication count of Hacettepe University (19,166). The main reason is that the word “Hacettepe” occurs twice in some records. In addition, although the total number of mistakes for Hacettepe University was identified as 340, only 43 mistakes are shown in Table 2. The remaining mistakes stem from standardization problems.
Although an attempt was made to anticipate all possible variations, the graph still could not retrieve some words (those with undefined errors). However, it is easy to find the unidentified words with the token link on the .not file. These words are “Halettepe”, “Hakettepe”, “Hacehepe”, “Hacette” and “HACETIEPE”.
Second stage: detection of unstandardized addresses
After creating the grammar for spelling mistakes, another grammar has been developed to identify the variety of addresses apart from “Hacettepe Univ” and “Univ Hacettepe”.
The second stage is about syntactic rules for the Hacettepe University and consequently “syntax” module was chosen for the second stage.
Detected addresses for Hacettepe University
Hacettepe Univ/Univ Hacettepe
Hacettepe Childrens Hosp
Hacettepe Med Sch
Hacettepe Sch Med
Hacettepe Med Fac
Hacettepe Fac Med
Hacettepe Oncol Inst
Hacettepe Med Ctr
Hacettepe Tip Fak
Hacettepe Children Hosp
Hacattepe Univ Hosp
Haccetepe Fac Med
Haccettepe Univ Hosp
Hacettepe Med Acad
Hacettepe Kuniv Hastaneleri
Haceteppe Childrens Hosp
Hacettepe Adult Hosp
Hacettepe Child Hosp
Hacettepe Cocuk Hastabanesi
Hakettepe Childrens Hosp
Hacettepe Cocuk Hastahanesi
Hacettepe Cocuk Hastanesi
Hacettepe Cocuk Hastenesi
Hacettepe Eriskin Hastanesi
Hecettepe Childrens Hosp
Hacettepe Inst Oncol
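The distinction drawn in the second stage, between the two accepted forms and all other uses of the university name, can be sketched with two regular expressions. This is an assumed simplification rather than the Nooj syntactic grammar: it only looks for the correctly spelled name, so misspelled forms in the list above would first have to be found by the spelling grammar of the first stage.

```python
import re

# The two forms accepted as standard in this study.
STANDARD = re.compile(r"\b(?:Hacettepe Univ|Univ Hacettepe)\b", re.IGNORECASE)
# Any other occurrence of the correctly spelled name.
NAME = re.compile(r"\bHacettepe\b", re.IGNORECASE)


def classify(address):
    """Label an address as 'standard', 'variant' or 'other'."""
    if STANDARD.search(address):
        return "standard"
    if NAME.search(address):
        return "variant"
    return "other"
```

For example, classify("Hacettepe Childrens Hosp") and classify("Hacettepe Tip Fak") both return "variant", while an address containing "Hacettepe Univ" returns "standard".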
Finite state graphs could not retrieve 10 different addresses (such as “Laacettepe Univ”, “Ibsan Dogramaci Childrens Hosp”, “HUTF Plast Cerrahi ABD”) that together were used only 10 times. In a previous study (ULAKBIM 2007, pp. 354–355), 69 different addresses for Hacettepe University were identified in Web of Science. That list covered unretrieved addresses (“Ihsan Dogramaci Childrens Hosp” and “HU Biol Dept”) and 21 of the retrieved addresses that contained “Hacettepe Univ” or “Univ Hacettepe” (like “Hacettepe Univ Hastaneleri”, “Hacettepe Univ Hastanesi”, “Hacettepe Univ Med”); however, 11 records that were accessed by the finite state technique did not appear in the ULAKBIM list. As a result, 49 out of 59 different address variations could be accessed by the graph, and these are the ones that were frequently used in the address field of Web of Science.
It is concluded that identifying and accessing address variations for universities is possible by using the methodology of this study. However, it is hard to apply this technique for the universities like Ege and Gazi because of the characteristics of their names. It is possible to say that this technique can be applied to universities that have a distinctive name.
Some quantitative analyses based on publication counts of universities and organizations have been commonly used and taken into consideration by some authorities. Therefore, the general opinion about publications has been transformed to “more publication indicates better organizations”. Moreover, the existence of publications in citation indexes is becoming more and more prominent, which indicates that citation indexes’ main aim of usage has been changing dramatically. Attaching a particular value to publication counts makes it more important to determine all publications for each organization. However, calculating publication counts has been problematic because of the manual indexing in citation indexes. Some indexing mistakes have made all evaluations depending on publication counts unreliable.
International visibility is vital for some organizations to catch collaboration opportunities. It is also important to create correct collaboration maps to represent the networks between organizations. The lack of standardization has not only been affecting quantitative analyses, but also reducing institutional visibility of universities and organizations. Quantitative analyses have been also popular for Turkey and they have been affecting public opinion about universities recently. However, many of these evaluations present different results from each other because of the inaccurate data. It is quite obvious that making standardization is quite important for reflecting correct results with bibliometric studies.
The mistaken affiliations in citation indexes for Hacettepe University have been specified with this study. Also, the finite state technique is proposed to standardize affiliations by designing some finite state algorithms. The main hypothesis which was determined as “the mistakes in citation indexes can be detected by using finite state technique” is proved at the end of the study. This technique can work for many universities which have distinctive names such as Hacettepe, Uludağ, and Atatürk Universities. However, as mentioned in findings part, it can be foreseen that this technique is not applicable for short-named organizations (like Ege).
Address variations of Hacettepe University that had been missed by previous studies can be retrieved with the designed finite state algorithm. Consequently, effective results can be obtained with the least effort by using finite state algorithms for the annual publication count reports of Turkey. Furthermore, a general algorithm can be developed as a future study to extract all possible address variants for all Turkish Universities.
This study also shows the accuracy and reliability problems of citation indexes and quantitative rankings. Policies developed on the basis of publication counts can be as unreliable as the questionable citation index data behind them. Moreover, the practice of evaluating universities’ performance with quantitative methods should be investigated. Future studies can consider alternative evaluation methods instead of counting publications.
University libraries should be aware of mistaken data when reporting statistics to authorities. Searching citation indexes with a few basic search terms and using all the retrieved records for reporting does not represent real performance accurately. Downloading data from citation indexes and cleaning it is a better way to access correct records than searching alone. Libraries can also publish recommended standard forms of the institution’s affiliation on their web sites. Hacettepe University has such guidance on its web site, and as a result its loss in publication count seems lower than that of most universities in Turkey.
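The difference between searching with basic terms and cleaning downloaded data can be sketched as follows. The address strings are made up for illustration and are not records from the study. The two search terms are the standard forms “Hacettepe Univ” and “Univ Hacettepe”, and the variant pattern is an assumed, simplified stand-in for a real cleaning step.

```python
import re

# Made-up address strings for illustration only; not data from the study.
records = [
    "Hacettepe Univ, Fac Med, Ankara, Turkey",
    "Univ Hacettepe, Dept Chem, Ankara, Turkey",
    "Hacetepe Univ, Dept Phys, Ankara, Turkey",
    "Haceteppe Univ Hosp, Ankara, Turkey",
    "Ankara Univ, Fac Sci, Ankara, Turkey",
]

# Basic search terms a library might type into a citation index.
SEARCH_TERMS = ("hacettepe univ", "univ hacettepe")

# Assumed cleaning pattern that also accepts a few misspellings of the name.
VARIANT = re.compile(r"\bhac[ae]t{1,2}ep{1,2}e\b", re.IGNORECASE)


def found_by_search(address: str) -> bool:
    """True if the address contains one of the basic search terms verbatim."""
    lowered = address.lower()
    return any(term in lowered for term in SEARCH_TERMS)


def found_after_cleaning(address: str) -> bool:
    """True if the address contains any accepted spelling of the name."""
    return VARIANT.search(address) is not None


searched = [r for r in records if found_by_search(r)]
cleaned = [r for r in records if found_after_cleaning(r)]
missed = [r for r in cleaned if r not in searched]

print(f"search only: {len(searched)}, after cleaning: {len(cleaned)}, missed: {len(missed)}")
```

In this toy list the basic search misses the two misspelled records, which is the kind of publication loss that cleaning downloaded data is meant to recover.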
Authors should be careful when they write affiliations on their studies. If a mistaken affiliation is detected for a study whose original text has the correct address, it is probably an indexing mistake, and it can be corrected by Thomson Reuters. In such a case, authors can fill in the correction form and follow the process on the web site (http://ip-science.thomsonreuters.com/techsupport/datachange/).
Editors also have a role in formalizing affiliations. They should review articles properly and correct affiliation mistakes. Presumably there will be fewer mistakes in publications that are evaluated thoroughly during the editorial process.
The providers of citation indexes have the main responsibility for address standardization, as they are the critical actors in the field. If these problems persist, the providers may lose confidence and prestige in the community. Therefore, abandoning manual indexing should be their primary task, since human-induced mistakes are harder to identify than mistakes originating from computers. One of the well-known databases, Scopus, has addressed this issue with its identifier mechanism called “affiliation identifier” (SciVerse Scopus 2012). As for Web of Science, there are efforts to unify author names (ResearcherID) and affiliations (Organization-Enhanced list). Thomson Reuters launched the “Organization-Enhanced” list feature of Web of Science in May 2012, which allows users to search preferred organization names and/or their name variants and add them to their search queries (Thomson Reuters 2012). Although these efforts offer some quick and practical remedies, it should be taken into account that the standardization problem can be minimized only with unique identifiers; all other efforts generate temporary solutions.
University rankings are one of the hot topics on the agenda of some organizations. Students choose their schools by checking their ranks, and the Turkish Research Council gives incentives to researchers according to the organizational affiliations of their publications. Therefore, managers and policy makers should consider data accuracy issues in citation indexes. In Turkey, the number of scientific works, rather than their value, is becoming more and more important. Alarmingly, if this situation continues, there will be a body of useless publications. The most important thing is to find ways to determine the quality of works.
This article is based on Taşkın’s (2012) MA thesis and was supported in part by a research grant of the Turkish Scientific and Technological Research Center (110K044). We thank Dr. İrem Soydal and Dr. Mustafa Şahiner for their meticulous reading of a draft version of this paper and for their invaluable suggestions.