Data cleaning and management protocols for linked perinatal research data: a good practice example from the Smoking MUMS (Maternal Use of Medications and Safety) Study

BMC Medical Research Methodology, 17:97

Data collection, quality, and reporting


Background: Data cleaning is an important quality assurance step in data linkage research studies. This paper presents the data cleaning and preparation process for a large-scale, cross-jurisdictional Australian study (the Smoking MUMS Study) that evaluates the utilisation and safety of smoking cessation pharmacotherapies during pregnancy.

Methods: Perinatal records for all deliveries (2003–2012) in the States of New South Wales (NSW) and Western Australia (WA) were linked by State-based data linkage units to State-based data collections, including hospital separation, emergency department and death data (mothers and babies) and congenital defect notifications (babies in NSW). A national data linkage unit linked pharmaceutical dispensing data for the mothers. All linkages were probabilistic. Twenty-two steps assessed the uniqueness of records and the consistency of items within and across data sources, resolved discrepancies in the linkages between units, and identified women who had records in both States.
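The uniqueness and consistency checks described in the Methods can be illustrated with a minimal sketch. It is not the study's actual protocol: the field names (`ppn_mother`, `baby_dob`, `birth_order`, `parity`), the duplicate key, and the rule that parity must increase between a mother's successive deliveries are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical field names; the study's real variable names are not given here.
DUP_KEY = ("ppn_mother", "baby_dob", "birth_order")


def flag_duplicates(records, key_fields=DUP_KEY):
    """Return indices of records whose key repeats an earlier record's key."""
    seen = set()
    duplicates = []
    for i, rec in enumerate(records):
        key = tuple(rec[f] for f in key_fields)
        if key in seen:
            duplicates.append(i)
        else:
            seen.add(key)
    return duplicates


def parity_inconsistencies(records):
    """Flag deliveries where parity fails to increase after an earlier delivery.

    Assumes ISO-format date strings (YYYY-MM-DD), which sort chronologically.
    Babies from the same delivery (same date, same parity) collapse to one entry.
    """
    by_mother = defaultdict(set)
    for rec in records:
        by_mother[rec["ppn_mother"]].add((rec["baby_dob"], rec["parity"]))

    flagged = []
    for mother, deliveries in by_mother.items():
        ordered = sorted(deliveries)
        for (d0, p0), (d1, p1) in zip(ordered, ordered[1:]):
            if d1 > d0 and p1 <= p0:
                flagged.append((mother, d1))
    return flagged
```

Flagging rather than deleting keeps the decision about each suspect record with the analyst, in line with the abstract's report that duplicate records were flagged.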

Results: State-based linkages yielded a cohort of 783,471 mothers and 1,232,440 babies. Likely false positive links relating to 3703 mothers were identified. Corrections to the baby's date of birth and age, and to parity, were made for 43,578 records, while 1996 records were flagged as duplicates. Checks for the uniqueness of the matches between State and national linkages detected 3404 ID clusters, suggestive of missed links in the State linkages, and identified 1986 women who had records in both States.
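One way to check the uniqueness of matches between two linkage keys is sketched below. The abstract does not define an "ID cluster" precisely, so this sketch assumes it means any connected group of State person numbers (PPN) and national identifiers (PATID) in which the mapping is not one-to-one. The function name and input shape are hypothetical.

```python
from collections import defaultdict


def find_id_clusters(pairs):
    """Group (ppn, patid) pairs into connected components using union-find.

    Returns only the components in which the PPN-to-PATID mapping is not
    one-to-one, i.e. more than one PPN or more than one PATID.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Tag each identifier with its type so a PPN and a PATID never collide.
    for ppn, patid in pairs:
        union(("ppn", ppn), ("patid", patid))

    groups = defaultdict(lambda: {"ppn": set(), "patid": set()})
    for node in list(parent):
        kind, value = node
        groups[find(node)][kind].add(value)

    return [g for g in groups.values()
            if len(g["ppn"]) > 1 or len(g["patid"]) > 1]
```

Under this assumption, two PPNs sharing one PATID would suggest a missed link in the State linkage (one woman holding two State identifiers), whereas one PPN with several PATIDs would need review on the national side.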

Conclusions: Analysis of content data can identify inaccurate links that cannot be detected by data linkage units that have access to personal identifiers only. Perinatal researchers are encouraged to adopt the methods presented to ensure quality and consistency among studies using linked administrative data.

Keywords: Data cleaning methods; Data consistency; Perinatal; Record linkage

Abbreviations

ACT: Australian Capital Territory
APDC: Admitted Patient Data Collection
COD URF: Causes Of Death Unit Record File
DOB: Date of birth
ED: Emergency department
EDDC: Emergency Department Data Collection
HMDC: Hospital Morbidity Data Collection
MNS: Midwives Notification System
NSW: New South Wales
PATID: Project-specific Patient Identification Number (PBS linkage)
PBS: Pharmaceutical Benefits Scheme
PDC: Perinatal Data Collection
PPN: Project-specific person number (State-based linkage)
RBDM: Registry of Births, Deaths and Marriages
RoCC: Register of Congenital Conditions
WA: Western Australia
YOB: Year of birth

Electronic supplementary material: The online version of this article (doi:10.1186/s12874-017-0385-6) contains supplementary material, which is available to authorized users.

Authors: Duong Thuy Tran, Alys Havard, Louisa R. Jorm
