Methods in Action: Dealing with heteroscedasticity and other issues in regressions about scaling
Topic and objective
The course will deal with a set of common problems in regression analysis of data on science production, specifically at the organizational level, which generate methodological issues when applying OLS regressions. Most of them are generated by extremely large variations in the size of the considered organizations, such as universities, which translate into a series of known statistical problems, including:
- Non-linearity, i.e. regression coefficients depend on the size of the considered organization.
- Heteroskedasticity, i.e. the variance of the residuals depends on the organizational size.
- The presence of outliers, such as very small organizations, as well as of leverage points, for example very large organizations in the sample.
- Non-normality of the residuals.
The course will illustrate these problems based on a recent paper on scaling of publication output over university budgets for US and European universities (Lepori, Geuna and Mira 2019), where the hypothesis of non-linear scaling is tested empirically (Leitao, Miotto, Gerlach and Altmann 2016). Statistical problems will be identified and strategies for addressing them will be presented, such as the use of Feasible Generalized Least Squares (FGLS; Hansen 2007).
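As a rough illustration of the FGLS idea for heteroskedasticity, the sketch below fits a log-log scaling regression on synthetic data. It is not the specification used in the paper, and it is not the panel estimator of Hansen (2007). It assumes the textbook multiplicative variance model, in which the log of the squared OLS residuals is linear in log budget. All names (`fgls_loglog`, `budget`, `pubs`) and all simulated values are hypothetical.

```python
import numpy as np


def ols(X, y, w=None):
    """Least-squares coefficients; optional weights w give weighted least squares."""
    if w is not None:
        sw = np.sqrt(w)
        X, y = X * sw[:, None], y * sw
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def fgls_loglog(budget, pubs):
    """Two-step FGLS for log(pubs) = a + b * log(budget) + error.

    Assumption (illustrative only): Var(error) = exp(g0 + g1 * log(budget)).
    """
    x, y = np.log(budget), np.log(pubs)
    X = np.column_stack([np.ones_like(x), x])

    # Step 1: plain OLS and its residuals.
    beta_ols = ols(X, y)
    resid = y - X @ beta_ols

    # Step 2: estimate the variance function from the log squared residuals.
    gamma = ols(X, np.log(resid ** 2))
    var_hat = np.exp(X @ gamma)

    # Step 3: weighted least squares with weights 1 / estimated variance.
    beta_fgls = ols(X, y, w=1.0 / var_hat)
    return beta_ols, beta_fgls, gamma


# Synthetic data: true scaling exponent 1.2, noise that shrinks with size.
rng = np.random.default_rng(0)
n = 500
budget = np.exp(rng.uniform(np.log(1.0), np.log(1000.0), n))
noise_sd = budget ** -0.25  # smaller organizations are noisier
pubs = np.exp(0.5 + 1.2 * np.log(budget) + noise_sd * rng.standard_normal(n))

beta_ols, beta_fgls, gamma = fgls_loglog(budget, pubs)
print("OLS  (intercept, exponent):", beta_ols)
print("FGLS (intercept, exponent):", beta_fgls)
print("variance model (g0, g1):   ", gamma)
```

A negative estimate of `g1` indicates that residual variance falls with organizational size. Both estimators are consistent for the exponent under this setup; the weighting is intended to improve efficiency and to give small, noisy organizations less influence.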
In the practical part of the course, participants will be asked to test the proposed approaches on the original dataset and to discuss their advantages and disadvantages; they might also propose and test alternative approaches.
Prof. Benedetto Lepori is currently titular professor at the Faculty of Communication, Culture and Society and rector's delegate for research analysis at Università della Svizzera italiana. His research interests include: governance and organizational structures of higher education institutions; institutional theory, particularly institutional logics approaches and hybrid organizations; development of data infrastructure for S&T studies; diversity and characterization of higher education systems; comparative analysis of national research policies and funding systems; and indicators to characterize research funding systems and higher education institutions.
Monday 18.10.2021, 9-12. Presentation session (including creation of the groups and task assignment).
Thursday 21.10.2021, 16-18. Slots for interim discussion with groups.
Monday 25.10.2021, 9-12. Group presentation and discussion.
The course aims at involving participants among the following categories:
- Senior scientists, early-career researchers
- PhD students in the last phase of their training
- People from the policy-making level wishing to extend their analytical capabilities
- Research intermediaries (e.g. research associations such as Science Europe).
Maximum number of participants
Requirements for participation
- Knowledge of the basic principles of statistics, as well as of standard regression techniques (OLS).
- Good working knowledge of statistical software (Stata).
Deadline for registration
Hansen, C. B. (2007). Generalized least squares inference in panel and multilevel models with serial correlation and fixed effects. Journal of Econometrics, 140(2), 670-694.
Leitao, J. C., Miotto, J. M., Gerlach, M. & Altmann, E. G. (2016). Is this scaling nonlinear? arXiv Preprint arXiv:1604.02872.
Lepori, B., Geuna, A. & Mira, A. (2019). Scientific output scales with resources. A comparison of US and European universities. PLoS ONE, 14(10).