Papers
arxiv:2406.14163

A Unified Statistical And Computational Framework For Ex-Post Harmonisation Of Aggregate Statistics

Published on Jun 20, 2024
Authors:

Abstract

The Crossmaps Framework provides a mathematical and computational approach to ex-post harmonisation by defining transformation logic through computational graphs and formalising data structures for tracking statistical properties and minimising errors in multi-source dataset integration.

AI-generated summary

Ex-post harmonisation is one of many data preprocessing processes used to combine the increasingly vast and diverse sources of data available for research and analysis. Documenting provenance and ensuring the quality of multi-source datasets is vital for ensuring trustworthy scientific research and encouraging reuse of existing harmonisation efforts. However, capturing and communicating statistically relevant properties of harmonised datasets is difficult without a universal standard for describing harmonisation operations. Our paper combines mathematical and computer science perspectives to address this need. The Crossmaps Framework defines a new approach for transforming existing variables collected under a specific measurement or classification standard to an imputed counterfactual variable indexed by some target standard. It uses computational graphs to separate intended transformation logic from actual data transformations, and avoid the risk of syntactically valid data manipulation scripts resulting in statistically questionable data. In this paper, we introduce the Crossmaps Framework through the example of ex-post harmonisation of aggregated statistics in the social sciences. We define a new provenance task abstraction, the crossmap transform, and formalise two associated objects, the shared mass array and the crossmap. We further define graph, matrix and list encodings of crossmaps and discuss resulting implications for understanding statistical properties of ex-post harmonisation and designing error minimising workflows.

Community

Sign up or log in to comment

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2406.14163 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2406.14163 in a dataset README.md to link it from this page.

Spaces citing this paper 1

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.