Analyzing information (bibliometrics) |

Bibliometrics is the quantitative analysis of scientific output and the networks behind that output. It can be performed at different levels. At the Institut Pasteur, for example, bibliometric research can be performed on the scientific output of a scientist, a unit, a department or a project. But there are also large-scale bibliometric studies for entire institutions, countries, continents, etc.

Three people in the CeRIS library work on bibliometrics, in collaboration with scientists but also with the management team and support services. The library can help Institut Pasteur scientists calculate their indicators and carry out comprehensive bibliometric studies.

Bibliometrics: a tool for research planning

There are various methods and indicators to meet each of these needs. It is important to understand what each represents so that you can choose the most appropriate indicators for you.

Bibliometrics can also be used for assessment purposes

Bibliometrics can also be taken into account when assessing scientists. However, we would recommend:

not relying solely on bibliometric indicators to assess scientists;

being aware that scientific disciplines and subdisciplines can have different publication methods and different citation behavior;

not assessing scientists based on the impact of the journals in which they are publishing (so we would advise not using impact factor) but rather assessing the value of the publications themselves;

always combining several indicators;

understanding the limits of the databases used to calculate indicators;

being aware of the differences in coverage between different databases so that you can choose which is most appropriate;

not comparing indicators that were obtained using different databases.

For several years now, a growing number of institutions have begun to criticize the negative effects of using indicators to assess scientists. For example:

The San Francisco Declaration on Research Assessment (DORA), published in 2013 and signed by the Institut Pasteur, criticizes the distortions linked with impact factors.
The 2015 Leiden Manifesto, sets out ten principles that can serve as best practices for research evaluation, including the need to "base assessment of individual researchers on a qualitative judgment of their portfolio", which is directly aimed at the use of the h-index.

Institut Pasteur - Bibliothèque du CeRIS - Aide à l'évaluation

Following these initiatives, a number of organizations, including the French High Council for Evaluation of Research and Higher Education (Hcéres), are calling for more qualitative evaluation methods that do not use the impact factor or h-index but take into account all the activities carried out by researchers, including teaching, applied research and scientific outreach.

Impact factor: a heavily criticized indicatorThis indicator is no longer used for its intended purpose. Although it was created to measure the impact of journals and help libraries select which journals they should subscribe to, it has come to be used for researcher evaluation, with potential for manipulation. The Institut Pasteur does not use impact factor to assess its scientists.

Bibliometric indicators

All indicators have advantages and disadvantages. It is important to choose the appropriate indicators depending on the aim of the bibliometric analysis.

The most well-known indicators

h-index: this is a way of evaluating scientific impact based on the productivity of a researcher, a unit, an institution, etc. An h-index of 10, for example, indicates that of all the articles published, ten have been cited at least ten times.
Impact factor: this indicator measures the impact of a journal. The impact factor calculated by Clarivate Analytics indicates the average number of citations received by all articles published in a journal over the past two (or five) years, giving an idea of the impact that a journal has had on science over that period.
Scholarly output: this indicates the number of documents produced by a unit, a researcher or an institution as a way of measuring productivity.
Citation count: this is the total number of citations received by all the analyzed documents.

Citations per output: this is the average number of citations per document.
Field-Weighted Citation Impact (FWCI) or Category Normalized Citation Impact (CNCI): this is the ratio between the number of citations received by a publication and the global average expected rate for the same field, publication type and year of publication. For example, a Field-Weighted Citation Impact of 2 indicates that the analyzed publications have been cited twice as much as the global average.
Outputs in Top Percentile: this is the number or percentage of publications that have been cited enough times to figure in the most cited publications in the world (compared with publications of the same type, from the same year and in the same field). In general, the top 1% and the top 10% of most cited articles are used.

Use with precautionSince bibliometric indicators only give a snapshot at a specific moment in time, they always need to be placed in context and several should be used at once. An indicator is not an absolute score, and it is always important to check any values that seem strange by going back to the publications.

New indicators

The need to improve existing indicators and the use of social media has led to the creation of new indicators.

The m-index is the h-index divided by the number of years of publication under study. One of the problems of the h-index is that the longer the scientist's career, the higher his or her h-index. In dividing by the number of years, the index is brought down to a value that is independent of career length.

The Scimago Journal Rank (SJR) of a journal is the number of citations received by an article in that journal during the three years after its publication; each citation received is weighted by the importance or prestige of the citing journal. It is calculated in the Scopus database.

The Source Normalized Impact per Paper (SNIP) of a journal measures the impact of that journal based on the number of citations received by the articles published in the journal over the past three years, the total number of articles published by the same journal during the same period and the potential number of citations in the disciplinary field of the journal. It is also calculated in the Scopus database.

The CiteScore of a journal (or a series) is the ratio between the number of citations received by all the documents published in the journal (or series) over the past three years and the total number of documents published by that journal (or series). It is calculated in the Scopus database.

The Eigenfactor of a journal is the percentage of citations received by all the articles in the journal over the past five years out of the total citations received during the same period by all the articles in all the journals analyzed in Journal Citation Reports. It is calculated in the Web of Science database.

Hot Papers, is an indicator proposed by the Essential Science Indicators database (via Web of Science): it looks at the articles published over the previous two years that are in the top 0.1% of most cited articles during a two-month period before the database is updated.

Altmetrics do not indicate impact but web audience, giving an idea of which researchers are making the news in a given field. They take into account mentions in social media (likes, retweets, downloads on Mendeley, etc.).

Indicators of social impact are important to take into account in relation with the open science movement. One way of measuring the impact on society may include calculating indicators for patents, articles in the general press or the number of interviews granted (to radio or television), but this type of data is currently difficult to collect and remains patchy.

Bibliometric tools

Databases

Databases are used to compile a corpus of references for analysis. All databases have limits and drawbacks (user-friendliness, topics covered, number of journals and books indexed and time period). It is therefore important to choose a database that corresponds with the aim of the bibliometric analysis.

The two subscription databases, both available at the Institut Pasteur, are:

Created in the 1960s, this is the historical database for bibliometrics on which several international rankings are based, including the Academic Ranking of World Universities (ARWU) also known as the Shanghai ranking. The limits of its coverage are well known, meaning that its inherent biases are clear, such as the fact that humanities, social sciences and economics are not well represented.

This database is more recent (2004) and its use is becoming increasingly widespread. Some international rankings use it like the Times Higher Education (THE). Its coverage is wider than WoS in thematic terms and with regard to the number of journals and books indexed. But its temporal coverage is not precisely known.

There is no real alternative to these two databases

It is not possible to use Pubmed since this database does not retrieve the list of references cited and author affiliations only began to be stored a short time ago.
The problem of Google Scholar is that we don't have a precise idea of its coverage. This means that we cannot evaluate biases in the calculation of indicators.

Always cite the source databaseAnyone can calculate their indicators using databases. But from one database to another, the value of these indicators may change (especially depending on the database coverage), which is why you should always indicate the database and the filters used.

Once the corpus of references has been compiled from the database, the references are sent to an analysis tool to be processed, generating either figures or visual representations.

Analysis tools

Institut Pasteur - Bibliothèque du CeRIS - Outils de visualisation

Data visualization tools

These tools enable users to visualize data in a way that is often more meaningful and clear, using maps. The most well-known free programs are VOSviewer and Gephi. They can be used to:

analyze large quantities of words (full text of publications);
search for co-occurrences of words and produce word clusters;
display results in the form of a map.

The maps generated by these tools are often a useful addition to traditional bibliometric analyses, and they can be of as much interest to scientists themselves as to decision-makers, as a means of identifying competitors, potential collaboration opportunities or topics.

Questions & Answers

Why do Institut Pasteur staff ask the library to carry out their bibliometric analyses for them?

Firstly because the library staff know which indicators to use. They also take the time to correct the data. To produce a quality analysis, there are several little traps that need to be avoided (homonyms, changes of unit, etc.), requiring a great deal of care and attention. Finally, the library staff provide the scientist with a full package of results, containing the indicators and also the raw data, maps and a methodology report.