RISIS is expanding its family of datasets with the launch of ESID, the European Social Innovation Database, which uses advanced machine learning and natural language processing techniques. RISIS ESID is a comprehensive and authoritative source of information on social innovation projects and actors in Europe and beyond. Its main goal is to collect information about social innovation projects and actors from publicly available information on the web.


ESID also uses limited human annotation to train its machine learning models and to ensure the quality and integrity of the data. Initially developed as part of the EU-funded KNOWMAK project, ESID is now being developed as part of the EU-funded RISIS 2 project. The ESID database contains two datasets, one of which is a subset of the other. The full dataset comprises 9577 social innovation projects in total. The curated dataset, a subset of these 9577 projects, contains high-quality data that has been manually checked and annotated by multiple annotators. For these projects, ESID contains a title, type of social innovation with scores, summary, location, and topic.
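
As a rough illustration of the structure described above, the sketch below models a project record and the relationship between the full and curated datasets in Python. The class name, the field names, the shape of the type scores, and the `curated` flag are all assumptions made for illustration; they are not ESID's actual schema.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Project:
    """Hypothetical record; field names are illustrative, not ESID's schema."""
    title: str
    summary: str
    location: str
    topic: str
    # One score per social innovation type (type names and scale are assumed).
    type_scores: dict[str, float] = field(default_factory=dict)
    # True if the record was manually checked by annotators (assumed flag).
    curated: bool = False


def curated_subset(projects: list[Project]) -> list[Project]:
    """Return only the manually checked records, a subset of the full dataset."""
    return [p for p in projects if p.curated]


# Placeholder records, not real ESID data.
full_dataset = [
    Project("Example project A", "Placeholder summary.", "Placeholder location",
            "placeholder topic", {"example_type": 1.0}, curated=True),
    Project("Example project B", "Placeholder summary.", "Placeholder location",
            "placeholder topic"),
]
curated = curated_subset(full_dataset)
```

Under these assumptions, every record in `curated` also appears in `full_dataset`, mirroring the subset relationship between the two datasets.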



Social innovation is part of the solution to the various challenges that European societies face, especially in this complex period. From aging populations and the inclusion of marginalised groups to globalisation, these challenges make it necessary to build capabilities that allow societies and citizens to flourish. Many of the innovative solutions to these challenges are social innovations, which has fuelled growing interest in the subject. From a policy perspective, social innovation is becoming increasingly important in the European Union.



ESID is a valuable instrument for the research community as well as for policymakers. All of the existing social innovation databases collect their information through manual data input by project team members or by social innovation organisations themselves. ESID takes an alternative approach: it collects data through semi-automated machine learning. It was built on these existing publicly available databases, but it verifies, extends and enriches them. Consequently, ESID forms a definitive and comprehensive information source on social innovation, with much higher precision and recall than existing databases.


What are the advantages of ESID? First, it is thematically more comprehensive: thanks to its full integration with the ontologies developed in the KNOWMAK project, it covers a broad range of societal grand challenges and key enabling technologies, and therefore provides much richer information on projects and actors. Moreover, because it is based on semi-automatic and automatic information retrieval and knowledge discovery techniques, it is more sustainable than existing databases that rely on continued human coding. ESID is updated with minimal human supervision. Thanks to machine learning, the more data it includes, the more precise its data collection becomes, which in turn reduces the need for human supervision.


More information is available on Zenodo.