As part of its programme to expand capacity in and support for pathogen genomics around Africa, the Africa Centres for Disease Control and Prevention hosted its first training in data curation, a pillar of modern data science.
Addis Ababa, 21 June: Building on the launch of Africa’s first pathogen data-sharing platform in 2023, the Africa Centres for Disease Control and Prevention (Africa CDC) conducted a three-day workshop in data curation from 18 to 20 June 2024 in Addis Ababa, Ethiopia.
The workshop, conducted in collaboration with the National Center for Biotechnology Information (NCBI), National Library of Medicine, USA, drew participants from public-health laboratories in 20 Member States.
The workshop, integral to the Africa CDC’s African Pathogen Data Sharing and Archive Platform, also known as Agari, is the first offered by the Africa CDC that aims to accelerate data verification and validation before sharing for public health use. This continental platform is intended for use by national public health institutions, national reference laboratories, and research and academic institutions from around Africa to upload, manage, and share pathogen sequences and associated metadata so that Member States can respond to public health threats effectively and in a coordinated manner.
Data curation is key to the successful set-up and use of Agari as it ensures the accuracy and usability of public health data put into the system, said Professor Alan Christoffels, director of the South African National Bioinformatics Institute and a senior advisor in genomics and bioinformatics to the Africa CDC. Trained data curators, and training in data curation, are both hard to come by in Africa, noted Christoffels.
“Ideally, we should be growing a community of data curators in Africa to support reproducible data for public health use and benefit sharing,” he said.
This workshop served as a first step in building such a community and strengthening the collaboration with NCBI. The training comprised a series of lectures combined with extended hands-on sessions. The programme explored established best practices in data curation and web-based data curation tools, and familiarised participants with the step-by-step processes of cleaning and formatting data and sharing it with national, regional, and global repositories.
“It’s incredible to know that I am underutilising those tools that are available on the platform,” said Olusola Anuoluwapo, head of the genomics unit at Nigeria’s Centre for Disease Control and Prevention in Abuja.
Anuoluwapo, who attended the workshop, works on genomic sequencing of pathogens of interest in Nigeria and has used the NCBI platform for some years, including as an undergraduate.
“One take home message for us is that we get to collect a lot of samples in the field and we get those analysed and published and we forget the metadata which renders our samples or efforts futile and then cannot link our results to where it’s from,” she said.
Anuoluwapo added that the NCBI database has many tools that can help to minimise the turnaround time for analysing data and making it available.
Tholwana Pelokgosi, who handles biological data at the National Public Laboratory in Botswana, said the workshop equipped her with knowledge of NCBI databases such as GenBank and PubMed. “Understanding these resources will help me streamline the data retrieval and analysis better,” she said.
“We were taught that it is vital to ensure that the data that is submitted is of good quality or it meets the required standards and formats so that it is easily accessible and there is consistency and reliability of good value to the scientific community,” Pelokgosi said.
The workshop follows recommendations from the Public Health Alliance for Genomic Epidemiology (PHA4GE) data curation technical working group to develop the initial quality controls for data to be deposited into Agari. The team included members from the South African National Bioinformatics Institute (SANBI), PHA4GE, Mozambique’s Instituto Nacional de Saúde, Morocco’s Institut Pasteur du Maroc, the National Institute of Public Health in Uganda, and Senegal’s Institut Pasteur de Dakar.
The group also developed a standard operating procedure (SOP) for standardising metadata in the Agari platform. The SOP was based on work done by PHA4GE, a global coalition working to establish data standards.
“This workshop is part of the Africa PGI plan to train 100 data curators every year to accelerate and ensure genomic data quality in Africa,” said Dr Harris Onywera, bioinformatics data scientist at the Africa CDC.
About Africa CDC
The Africa Centres for Disease Control and Prevention (Africa CDC) is the African Union’s continental autonomous public health agency that supports member states in strengthening health systems and improving surveillance, emergency response, and prevention and control of diseases. Learn more at: http://www.africacdc.org
Media Contact: Margaret Edwin, Director of Communication & Public Information Division, Africa CDC | Tel: +251 986 632 878 | Email: EdwinM@africacdc.org