A Relational Database for Malariometric Data

ROAD-MAP: Data Engineering

Under the leadership of MAP Programme Manager Mike Thorn and Joe Harris, the ROAD-MAP team is developing a relational data model to store epidemiological data. The database is designed to be flexible and disease-agnostic, so that it can capture malariometric data alongside a wide range of other epidemiological data.

Relational Database Design

ROAD-MAP has designed a relational data model and implemented it using PostgreSQL (with the PostGIS extension for geospatial data types). The key aspects of this database are:

  • Disease agnosticism. ROAD-MAP reviewed the data collected and used in the study of a variety of vector-borne diseases to inform the database design.
  • Flexibility. The design uses data-driven types, so unforeseen kinds of data can be stored without changing the database design. Possible data items include, but are not limited to:
    • The number of cases of a given species of parasite reported for an area by routine surveillance systems
    • The parasite rate reported in a cross-sectional survey, georeferenced to a given latitude/longitude
    • The amount spent by an NGO on interventions such as mosquito nets for a given geographical area
    • The results of a blood test for an individual
    • Images of tissue samples
    • The biting habits of a given disease vector
  • Authenticated access. The database design allows permissions to be associated with roles and users. These permissions can allow or prevent various levels of access to sets of data.
  • Auditability. All data stored can be associated with bibliographic or other sources and grouped into datasets. Structures are included to log changes to records, storing the pre- and post-change record state and the user responsible for the change.
  • Modularity. The database is extensive in scope but is divided into areas of related functionality, such as survey data, systematics, geographical data, and routine surveillance data. Each area is modularised, so implementers can choose how much of the overall database design to deploy.
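
The flexibility and auditability points above can be illustrated with a minimal sketch. It is not the ROAD-MAP schema: every table, column, and function name below is hypothetical, and SQLite (from the Python standard library) stands in for PostgreSQL so the example runs without a server. It assumes that "data-driven types" means data types are held as rows in a lookup table, so a new kind of data needs an INSERT rather than a schema change, and it assumes change logging is done with a trigger that stores the pre- and post-change values and the responsible user.

```python
import sqlite3

# Hypothetical schema; SQLite is used here in place of PostgreSQL.
SCHEMA = """
CREATE TABLE data_type (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE,   -- e.g. 'parasite_rate'
    unit TEXT                    -- e.g. 'proportion', 'USD'
);
CREATE TABLE observation (
    id           INTEGER PRIMARY KEY,
    data_type_id INTEGER NOT NULL REFERENCES data_type(id),
    value        TEXT NOT NULL,  -- stored as text, interpreted via data_type
    changed_by   TEXT NOT NULL   -- user responsible for the latest write
);
CREATE TABLE audit_log (
    id             INTEGER PRIMARY KEY,
    observation_id INTEGER NOT NULL,
    old_value      TEXT,
    new_value      TEXT,
    changed_by     TEXT NOT NULL
);
-- Log the pre- and post-change state whenever a value is updated.
CREATE TRIGGER observation_audit AFTER UPDATE OF value ON observation
BEGIN
    INSERT INTO audit_log (observation_id, old_value, new_value, changed_by)
    VALUES (OLD.id, OLD.value, NEW.value, NEW.changed_by);
END;
"""


def add_data_type(conn, name, unit=None):
    """Register a new kind of data as a row, with no schema change."""
    cur = conn.execute(
        "INSERT INTO data_type (name, unit) VALUES (?, ?)", (name, unit)
    )
    return cur.lastrowid


def record(conn, type_name, value, user):
    """Store one observation of an already registered data type."""
    row = conn.execute(
        "SELECT id FROM data_type WHERE name = ?", (type_name,)
    ).fetchone()
    if row is None:
        raise ValueError(f"unknown data type: {type_name}")
    cur = conn.execute(
        "INSERT INTO observation (data_type_id, value, changed_by) "
        "VALUES (?, ?, ?)",
        (row[0], str(value), user),
    )
    return cur.lastrowid


def amend(conn, observation_id, value, user):
    """Change a stored value; the trigger writes the audit row."""
    conn.execute(
        "UPDATE observation SET value = ?, changed_by = ? WHERE id = ?",
        (str(value), user, observation_id),
    )


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

add_data_type(conn, "parasite_rate", "proportion")
obs_id = record(conn, "parasite_rate", 0.31, "alice")

# An unforeseen data item is added as data, not as a new column or table.
add_data_type(conn, "intervention_spend", "USD")
record(conn, "intervention_spend", 12000, "alice")

# Correcting a value leaves a before/after trail in audit_log.
amend(conn, obs_id, 0.29, "bob")
audit_rows = conn.execute(
    "SELECT observation_id, old_value, new_value, changed_by FROM audit_log"
).fetchall()
print(audit_rows)
```

Storing every value as text keeps the sketch short; a fuller design would more likely use typed value columns or, in PostgreSQL, a JSONB column, and PostGIS geometry types for georeferenced records.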

All our database designs will be further documented on these pages and released via the MAP GitHub account.