RISIS Summer School materials on Data Science

All materials from the first RISIS Summer School on Data Science for Studying Science, Technology and Innovation (STI), held in Glasgow on 24-28 June 2019, are available to find and download online from the Zenodo space dedicated to the RISIS project.


During the five-day training, we covered:


The R programming language and how it can be used for data science; the different variable types that R can handle and how they can be manipulated, before moving on to more complex concepts such as loops and conditional statements; the Tidyverse as a means of data manipulation, and how it can be used to combine multiple data sets in a variety of ways;


  • Publication Data Analysis, with Rodrigo Costas Comesaña and Martijn Visser (University of Leiden):

A special focus on understanding and interpreting a wide range of publication and citation-based indicators and statistics; discussion of the major scientometric data sources (Web of Science, Scopus, Dimensions, Microsoft Academic and Google Scholar), covering their strengths, limitations and practical applications;


Demonstration of some widely used tools for cleaning and analysing publication data, including VantagePoint, OpenRefine and the R bibliometrix package;


  • Citation Analysis and VOSviewer, with Rodrigo Costas Comesaña and Martijn Visser (University of Leiden):

Presentation of analyses based on funding acknowledgments, mobility, open access and altmetrics; a short presentation of VOSviewer, a software tool for constructing and visualizing bibliometric networks;


  • Network Data Science, with Thomas Scherngell and Martina Neuländtner (AIT):

Overview of the theoretical and conceptual background of Network Data Science in the context of STI studies, drawing mainly on a social network perspective but also on an economic geography perspective; spatial interaction models as a particularly useful instrument for explaining R&D collaboration dynamics in STI studies; basic descriptive measures from Social Network Analysis (SNA) and network visualisation; specification and estimation of spatial interaction models in R, demonstrating how to estimate the factors that influence network dynamics;


Introduction to the concepts behind a text mining workflow, including formatting and preparing data for text mining; the text mining process from a practical perspective, demonstrating how to perform data preparation and topic modelling in R, followed by a practical session.



The available documents include presentations, exercises and datasets.