CWTS, core dataset on publications

Publications are the daily bread of researchers. Publication databases are populated every single day so that researchers can study, consult, and share information. So what about the RISIS core dataset on publications?

“The dataset is called CWTS publications; it was previously referred to as the Leiden Ranking dataset. It was initially introduced in RISIS as a dataset with key bibliometric statistics for the most prominent universities. Later on, this (aggregate) dataset became rather an entry point to the detailed information at publication level, the data underlying the Leiden Ranking”, said Ed Noyons, senior researcher at the Leiden University-based Centre for Science & Technology Studies (CWTS) and RISIS project leader within CWTS.



The CWTS publication database is a full copy of Web of Science (WoS) dedicated to bibliometric analyses, enriched with enhancements and improvements over the original version. The main harmonized elements concern organisation names and the matching of cited references to source publications: “We are continuously working on improving the quality of the dataset: cleaning the data, harmonizing affiliations of authors, etc. In parallel we create links between our data and the other RISIS datasets, when possible, e.g., linking author affiliations to partner information in EUPRO or to applicants in patents”, said Noyons.

The CWTS Leiden Ranking comprises research performance statistics on more than 900 universities. These are universities with at least 1000 publications (counted fractionally) in 2014-2017 according to data from WoS. The ranking data is updated every year in May.
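To make "counted fractionally" concrete, here is a minimal sketch in Python. It assumes a simple equal-split rule, where each publication carries one unit of credit divided evenly over its listed author addresses; this rule is an illustration, not necessarily the exact CWTS methodology. The function names, the toy university names, and the toy threshold are all hypothetical.

```python
from collections import defaultdict


def fractional_counts(publications):
    """Return the fractional publication count per university.

    `publications` is a list of publications; each publication is a list
    of university names, one entry per author address. ASSUMPTION: one
    unit of credit per publication, split equally over its addresses.
    """
    totals = defaultdict(float)
    for addresses in publications:
        if not addresses:
            continue  # skip publications without any address
        share = 1.0 / len(addresses)
        for university in addresses:
            totals[university] += share
    return dict(totals)


def eligible_universities(counts, threshold):
    """Return the sorted names of universities meeting the threshold."""
    return sorted(name for name, count in counts.items() if count >= threshold)


# Hypothetical toy data: three publications with 1, 2 and 4 addresses.
publications = [
    ["Univ A"],
    ["Univ A", "Univ B"],
    ["Univ A", "Univ B", "Univ B", "Univ C"],
]

counts = fractional_counts(publications)
# The ranking uses a threshold of 1000; 1.0 is used here only for the toy data.
selected = eligible_universities(counts, threshold=1.0)
```

Under this assumed rule the fractional counts add up to the number of publications, so a co-authored paper is not counted once in full for every participating university.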



The results in the Leiden Ranking benefit from the investments in harmonization. This concerns the accuracy of the author affiliations used to define a university, as well as the assignment of affiliated institutions. CWTS distinguishes three different types of affiliated institutions:

Component: the affiliated institution is actually part of or controlled by the university.

Joint research facility or organization: identical to a component except that it is administered by more than one organization.

Associated organization: an autonomous institution that collaborates with one or more universities based on a joint purpose but at the same time has separate missions and tasks.



“We are linking the dataset with other RISIS resources by creating a permanent link from author affiliations to OrgReg identifiers. The database enables output and (scientific) impact analyses of any set of publications covered by WoS, using state-of-the-art methods and high-quality data”, concludes Noyons.

Services currently offered by the infrastructure include a public dataset that demonstrates the potential of the database, available together with complete documentation of the data and methods used. More detailed studies can be carried out on-site at CWTS, using the underlying database, via a research visit.