Kashif Zia - Towards Reproducible Open SIPHER’s Synthetic Population
Presenting author: Kashif Zia (University of Glasgow, UK)
Authors: Kashif Zia, Andreas Hoehn, Nik Lomax, Alison Heppenstall, Petra Meier
Session: C02D - Synthetic Data - Wednesday 11:00-12:30 - Erika-Weinzierl Hall
Great Britain lacks a comprehensive register-based data system, which makes it challenging to access data on individuals across multiple domains. SIPHER’s synthetic populations are being created to address this gap by providing a representative, attribute-rich dataset covering the whole population of Great Britain. They are the outcome of a spatial microsimulation process that draws on two data inputs: (1) Understanding Society survey data; (2) small-area sociodemographic census information. The two inputs are combined via the Flexible Modelling Framework (FMF), an open-source, GUI-based application written in Java that enables the efficient processing of large-scale data for spatial microsimulations.
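To illustrate the general idea of combining survey records with small-area census counts, the following is a minimal sketch of a selection-based fitting step for a single area. It is not FMF's algorithm or SIPHER's code: the survey records, census counts, variable names, and the simple hill-climbing search are all assumptions made for illustration.

```python
import random
from collections import Counter

# Hypothetical survey records: (person_id, age_band, sex)
SURVEY = [
    (1, "16-34", "F"), (2, "16-34", "M"), (3, "35-64", "F"),
    (4, "35-64", "M"), (5, "65+", "F"), (6, "65+", "M"),
]

# Hypothetical census counts for one small area
CENSUS = {
    "age_band": {"16-34": 3, "35-64": 4, "65+": 3},
    "sex": {"F": 6, "M": 4},
}


def total_absolute_error(selection, census):
    """Sum of absolute differences between selected and census counts."""
    age_counts = Counter(person[1] for person in selection)
    sex_counts = Counter(person[2] for person in selection)
    error = sum(abs(age_counts[k] - v) for k, v in census["age_band"].items())
    error += sum(abs(sex_counts[k] - v) for k, v in census["sex"].items())
    return error


def fit_area(survey, census, iterations=2000, seed=0):
    """Pick survey records (with replacement) to approximate census counts."""
    rng = random.Random(seed)
    size = sum(census["age_band"].values())
    # Start from a random selection of survey individuals
    selection = [rng.choice(survey) for _ in range(size)]
    error = total_absolute_error(selection, census)
    for _ in range(iterations):
        if error == 0:
            break
        # Propose replacing one selected individual with another record
        slot = rng.randrange(size)
        previous = selection[slot]
        selection[slot] = rng.choice(survey)
        new_error = total_absolute_error(selection, census)
        if new_error <= error:
            error = new_error  # keep the swap if the fit is no worse
        else:
            selection[slot] = previous  # otherwise undo it
    return selection, error


selection, error = fit_area(SURVEY, CENSUS)
print(f"{len(selection)} individuals selected, total absolute error {error}")
```

Repeating such a step for every small area would yield a list of survey individuals per area. A production tool would need more constraint variables, a more robust search strategy, and validation of the fit.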
A particular strength of SIPHER’s synthetic populations is the statistical power of their small-area perspective, a level of geographic detail that typically cannot be achieved through special licence data linkages. The synthetic populations enable a variety of analyses. For example, the data can be used to derive descriptive small-area estimates, or to simulate policy interventions and explore their potential impact on individuals and households across Scotland, England, and Wales. SIPHER believes in open science. Therefore, not only the synthetic populations but also the sources (data and code) needed to replicate the process and run the validation tests are being shared through official UK Data Service channels and GitHub repositories. Through this paper/presentation, we intend to share our methods and success stories with the microsimulation community.
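As a rough illustration of how a descriptive small-area estimate might be derived from a synthetic population, the sketch below averages a survey attribute over the individuals assigned to each area. The area codes, person identifiers, attribute values, and the function name `area_means` are hypothetical and do not reflect the structure of the released SIPHER files.

```python
from collections import defaultdict

# Hypothetical synthetic population: (area_code, person_id) pairs.
# A survey individual may appear several times, within or across areas.
SYNTHETIC_POPULATION = [
    ("A01", 1), ("A01", 2), ("A01", 2), ("A02", 3), ("A02", 1),
]

# Hypothetical survey attribute keyed by person_id
HEALTH_SCORE = {1: 50.0, 2: 40.0, 3: 60.0}


def area_means(population, attribute):
    """Mean of a survey attribute across the individuals in each area."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for area, person_id in population:
        totals[area] += attribute[person_id]
        counts[area] += 1
    return {area: totals[area] / counts[area] for area in totals}


estimates = area_means(SYNTHETIC_POPULATION, HEALTH_SCORE)
for area, mean in sorted(estimates.items()):
    print(f"{area}: {mean:.1f}")
```

The same pattern extends to other summaries, such as proportions or medians, by changing the aggregation step.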