FAQs
What data are available in CountryData?
How does CountryData receive data from National Statistics Offices (NSOs)?
Where does the international data used for MDG comparisons come from?
Why is there a difference between national and international estimates of the same MDG indicator?
How does CountryData differ from MDGLabs?
Why are there a select number of countries presented in CountryData?
What is a Data Structure Definition (DSD)?
What is a Metadata Structure Definition (MSD)?
What Data Structure Definition (DSD) does CountryData use?
What Metadata Structure Definition (MSD) does CountryData use?
How are international and country data matched and compared?
CountryData contains data provided directly by National Statistics Offices (NSOs), each acting as the focal point of its National Statistical System (NSS). These data cover themes such as poverty, education, health and environment, and have been agreed as priority national development indicators by the NSS. If these data match an MDG indicator published by an international agency, both national and international data for that indicator are presented side by side in the 'Comparisons' section of each country’s profile.
CountryData is built around each country’s SDMX registry, which shares development indicator data and metadata packaged and published as SDMX messages based on an internationally agreed Data Structure Definition and Metadata Structure Definition.
International data presented in CountryData are extracted from the Global MDG database, which is managed and updated by the United Nations Statistics Division (UNSD). This database contains indicator estimates provided by the international agencies responsible for monitoring MDGs. The database is updated annually and is used to produce the annual Millennium Development Goals report. The accompanying reference metadata for international estimates are based on the Handbook for Monitoring MDGs.
National and international estimates for MDG indicators differ for a variety of reasons. Sometimes international agencies resort to their own internal estimates of MDG indicators because of a lack of national data or of dissemination channels for national statistical indicators. At other times, agencies adjust national MDG indicator data to facilitate valid cross-country comparisons. In many of these cases the reasons for the differences are not fully known or understood, which leads to confusion among users, undermines the credibility of statistical systems, and can have serious policy implications.
MDGLabs was the first internet platform created by the United Nations Statistics Division (UNSD) to try to tackle differences between national and international estimates for MDGs. The web application now displays the discrepancies between the data collected by over 90 countries (covering Africa, Asia, Latin America and the Caribbean) and international agencies on a specific set of 20 MDGs. CountryData incorporates much of this website’s functionality, but goes further: it streamlines the process of data exchange (i.e. through SDMX) and also concentrates on the availability of data and reference metadata. CountryData will eventually replace MDGLabs once it has been expanded to a fuller set of countries.
Often the reason for differences between national and international estimates will not be obvious from the top-level descriptions of the data (e.g. series name, sex, location, unit of measurement). More detailed textual metadata, covering the definition used, the methodology adopted and how the data were obtained, are therefore required to understand the exact nature of the data and determine the actual reason(s) for any differences. This is why CountryData makes an effort to show a complete set of reference metadata (also obtained from countries through SDMX) and to present reference metadata side by side in any comparison of national and international estimates of the same MDG indicator.
CountryData is initially working with a small group of participants on a project funded by the UK’s Department for International Development (DfID) to improve the collation, availability and dissemination of national development indicators (including the MDGs). It is envisaged that over time more countries will be included in the website. Countries with more advanced capacity can also develop their own SDMX connection with CountryData, as in the case of Mexico, a non-project country.
- Allow faster, more reliable, and simpler data and metadata processing;
- Reduce human error (i.e. data transcription/manipulation errors);
- Create a unified sharing architecture to reduce development and maintenance costs;
- Create a standardised sharing format to reduce response burden;
- Harmonise and standardise statistical metadata.
An SDMX registry is used to facilitate the dissemination of data in the form of SDMX messages. Structural metadata, such as Data Structure Definitions, Concepts and Codelists, can be published in the registry. The registry also maintains links to data and reference metadata sources, and alerts subscribers (like CountryData) when updates are available.
An SDMX message is essentially an XML document that uses a Data Structure Definition or Metadata Structure Definition to structure and code/map data or metadata exported from a database or other source.
The Data Structure Definition provides the design of how data exported from a database or other source should be structured and coded in an SDMX message. Any Data Structure Definition (DSD) is built on dimensions and attributes. Dimensions (dim) are mandatory and together identify the observation value (i.e. data point), while attributes (att), which may be optional or mandatory, give additional descriptive or qualitative features of the observation value.
The Metadata Structure Definition provides the design of how metadata exported from a database or other source should be structured and coded in an SDMX message.
The set of dimensions and attributes used to define the MDG DSD is presented in the table below:
Data Structure Definition (DSD) for MDGs: Dimensions & attributes | ||
---|---|---|
Type | Name | Type of code used |
Dimension | Frequency | i.e. Annual, Quarterly, etc. |
Dimension | Series | Indicator title |
Dimension | Units of measurement | i.e. Percent, number |
Dimension | Location | i.e. Total, Urban, Rural |
Dimension | Age group | i.e. 15–49 yr olds, 6–59 month |
Dimension | Sex | i.e. Both sexes, male, female |
Dimension | Reference Area | Country name |
Dimension | Source Type | i.e. Survey, census, admin. |
Time dimension | Time Period | i.e. 1990, 1991, etc. |
Measure | Observation Value | - |
Attribute | Unit multiplier | i.e. per 10,000, per 1,000 etc. |
Attribute | Time period details | i.e. 2001 – 2003, Q1 2010 – Q3 2011 |
Attribute | Nature of data points | i.e. Estimated, Modelled, Adjusted etc. |
Attribute | Source details | Source name & date |
Attribute | Footnotes | Details of methodology & other notes etc. |
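
As an illustration of how the dimensions and attributes in the DSD table fit together, the following minimal Python sketch models a single observation. The concept names (`frequency`, `series`, `unit` and so on) and the `validate_observation` helper are hypothetical; they are not the identifiers or code lists of the actual MDG DSD. The sketch also assumes, for simplicity, that every attribute is optional. The only points taken from the text are that every dimension is needed to identify an observation value and that attributes add descriptive detail.

```python
# Hypothetical concept names loosely following the DSD table; the real
# MDG DSD may use different identifiers and code lists.
DIMENSIONS = (
    "frequency", "series", "unit", "location", "age_group",
    "sex", "reference_area", "source_type", "time_period",
)
# Assumption: all attributes are treated as optional in this sketch.
ATTRIBUTES = (
    "unit_multiplier", "time_period_details", "nature",
    "source_details", "footnotes",
)


def validate_observation(obs: dict) -> list:
    """Return a list of problems found in one observation record."""
    problems = []
    # Every dimension is required to identify the observation value.
    for dim in DIMENSIONS:
        if not obs.get(dim):
            problems.append(f"missing dimension: {dim}")
    # The measure (the observation value itself) must be present.
    if "value" not in obs:
        problems.append("missing observation value")
    # Anything that is not a dimension, an attribute or the measure is unknown.
    known = set(DIMENSIONS) | set(ATTRIBUTES) | {"value"}
    for name in obs:
        if name not in known:
            problems.append(f"unknown concept: {name}")
    return problems


# Illustrative record with made-up values.
example = {
    "frequency": "Annual", "series": "Example indicator title",
    "unit": "Percent", "location": "Total", "age_group": "Not Applicable",
    "sex": "Both sexes", "reference_area": "Example country",
    "source_type": "Survey", "time_period": "2005",
    "value": 12.3, "nature": "Estimated",
}
print(validate_observation(example))  # prints []
```

A record that omits a dimension, for example `sex`, would come back with `missing dimension: sex`, which mirrors the rule that an observation cannot be identified without its full set of dimensions.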

Metadata Structure Definition (MSD) for MDGs: Concepts |
---|
Description |
Definition of the indicator or background series provided |
Method of computation |
Comments and limitations |
Sources of discrepancies between global and national figures |
Process of obtaining data |
Expected time of release |
A series of steps has been built into the CountryData application to automate, as far as possible, the process of matching national and international estimates for comparison. Using the same MDG DSD for both the national and international series simplifies this process: the matching is done directly by comparing key dimensions of the MDG DSD. Series, unit of measurement, location, sex and reference area all require an exact match (except where coded "Not Applicable"); frequency, age group and source type do not require an exact match.
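
The matching rule described above can be sketched in a few lines of Python. This is not CountryData's actual implementation: the dictionary keys, the function names `is_match` and `loose_differences`, and the reading that a "Not Applicable" code on either side waives the exact-match check for that dimension are all assumptions made for illustration.

```python
NOT_APPLICABLE = "Not Applicable"

# Dimensions that must match exactly (unless coded "Not Applicable").
EXACT_DIMENSIONS = ("series", "unit", "location", "sex", "reference_area")
# Dimensions that are compared but need not match for a pair to be formed.
LOOSE_DIMENSIONS = ("frequency", "age_group", "source_type")


def is_match(national: dict, international: dict) -> bool:
    """Decide whether two series keys can be paired for comparison."""
    for dim in EXACT_DIMENSIONS:
        nat, intl = national.get(dim), international.get(dim)
        # Assumption: "Not Applicable" on either side waives the check.
        if NOT_APPLICABLE in (nat, intl):
            continue
        if nat != intl:
            return False
    return True


def loose_differences(national: dict, international: dict) -> list:
    """List the non-blocking dimensions on which the two keys differ."""
    return [d for d in LOOSE_DIMENSIONS
            if national.get(d) != international.get(d)]


# Made-up series keys for illustration only.
national_key = {
    "series": "Example indicator", "unit": "Percent", "location": "Total",
    "sex": "Both sexes", "reference_area": "Example country",
    "frequency": "Annual", "age_group": "15-49", "source_type": "Survey",
}
international_key = dict(national_key, age_group="15-24", source_type="Admin")

print(is_match(national_key, international_key))
print(loose_differences(national_key, international_key))
```

In this example the two keys are paired because all five exact-match dimensions agree, while the differing age group and source type are reported separately as possible leads on why the estimates differ.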
When two time series (national and international) are paired on the above basis, a comparison of the associated metadata can commence. This should yield some information on the reasons for the differences; if it does not, further follow-up may be required with either the National Statistics Office or the international agency that provided the data. Providers are asked to feed any follow-up response back through the SDMX messages CountryData receives; otherwise the explanation is written up in a stand-alone commentary box beside the reasons-for-difference categories.
Label | Definition |
---|---|
No difference | Describes complete congruence between the two series: the observation values, and the years to which they are allocated and for which they are available, are the same. |
Discrepancy Labels | |
Different age groups | Describes when different age groups are used between the same time series. |
Different data sources | Describes the use of results from different data sources – international agencies can use multiple data sources to compute an indicator while the country will use a single survey or an administrative source, which the agency may not have access to. |
Different definitions | Describes when the international agency and the country define the indicator differently – the national definition used can be more inclusive than the specific categories included in these indicators as defined by the international agencies. |
Different methodologies | Describes a different method of computation used between the country and the international agency – international agencies can use statistical models to estimate an indicator while the country will report figures directly from the survey. |
Different source type | Describes when different source types are used between the same time series (i.e. admin vs. survey). “Different data sources” will also apply. |
Under investigation | Applied when the data are first updated; usually a placeholder until the reason has been investigated. |
Unidentified | Applied when, following investigation, there is a discrepancy but the reason remains unclear or unresolved. |
Note: these categories directly reflect the Metadata Structure Definition and the Data Structure Definition used by CountryData.
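
Because some of these labels follow directly from DSD dimensions, a first-pass suggestion could in principle be derived from a matched pair of series. The Python sketch below is a hypothetical illustration only, not a description of how CountryData assigns labels: the record layout and the `suggest_labels` function are assumptions. Labels such as "Different definitions" and "Different methodologies" depend on reading the reference metadata and are deliberately left out.

```python
def suggest_labels(national: dict, international: dict) -> list:
    """Suggest reason-for-difference labels for one matched pair of series.

    Each record is assumed to hold 'age_group', 'source_type' and
    'observations' (a mapping of year -> value); this layout is hypothetical.
    """
    # Same values for the same years: nothing to explain.
    if national["observations"] == international["observations"]:
        return ["No difference"]
    labels = []
    if national["age_group"] != international["age_group"]:
        labels.append("Different age groups")
    if national["source_type"] != international["source_type"]:
        # Per the table, "Different data sources" also applies in this case.
        labels.extend(["Different source type", "Different data sources"])
    # Nothing found in the dimensions: leave a placeholder for follow-up.
    return labels or ["Under investigation"]


# Made-up records for illustration only.
national_series = {
    "age_group": "15-49", "source_type": "Survey",
    "observations": {2000: 10.0, 2005: 12.5},
}
international_series = {
    "age_group": "15-49", "source_type": "Admin",
    "observations": {2000: 10.4, 2005: 12.1},
}
print(suggest_labels(national_series, international_series))
```

A pair whose observations differ but whose age group and source type agree falls through to "Under investigation", matching its role as a placeholder until the metadata have been reviewed.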