FAQs

What data are available in CountryData?

How does CountryData receive data from National Statistics Offices (NSOs)?

Where does the international data used for MDG comparisons come from?

Why is there a difference between national and international estimates of the same MDG indicator?

How does CountryData differ from MDGLabs?

Why is metadata so important in the comparison of national and international estimates for MDG indicators?

Why are there a select number of countries presented in CountryData?

What is SDMX anyway?

What is an SDMX registry?

What is an SDMX message?

What is a Data Structure Definition (DSD)?

What is a Metadata Structure Definition (MSD)?

What Data Structure Definition (DSD) does CountryData use?

What Metadata Structure Definition (MSD) does CountryData use?

How are international and country data matched and compared?

What are the main reasons for differences between national and international estimates for development indicators?

CountryData contains data provided directly by National Statistics Offices (NSOs), which act as the focal point of the National Statistical System (NSS). These data cover themes such as poverty, education, health and environment, which the NSS has agreed as priority national development indicators. If these data match an MDG indicator published by an international agency, both national and international data for that indicator are presented side by side in the 'Comparisons' section of each country’s profile.

CountryData is built around each country’s SDMX registry, which shares development indicator data and metadata. These are packaged and published as SDMX messages based on an internationally agreed Data Structure Definition and Metadata Structure Definition.

International data presented in CountryData are extracted from the Global MDG database, which is managed and updated by the United Nations Statistics Division (UNSD). This database contains indicator estimates provided by the international agencies responsible for monitoring MDGs. It is updated annually and is used to produce the annual Millennium Development Goals report. Accompanying reference metadata for international estimates are based on the Handbook for Monitoring MDGs.

National and international estimates for MDG indicators differ for a variety of reasons. Sometimes international agencies resort to their own internal estimates of MDG indicators because national data, or dissemination channels for national statistical indicators, are lacking. At other times, agencies adjust national MDG indicator data to facilitate valid cross-country comparisons. In many of these cases the reasons for the differences are not fully known or understood. This leads to confusion among users, undermines the credibility of statistical systems, and can have serious policy implications.

MDGLabs was the first internet platform created by the United Nations Statistics Division (UNSD) to try to tackle differences between national and international estimates for MDGs. The web application now displays the discrepancies between the data collected by over 90 countries (covering Africa, Asia, Latin America and the Caribbean) and by international agencies for a specific set of 20 MDG indicators. CountryData incorporates much of this website’s functionality, but goes further: it streamlines the process of data exchange (i.e. through SDMX) and also concentrates on the availability of data and reference metadata. CountryData will eventually replace MDGLabs once it has been expanded to a fuller set of countries.

The reason for a difference between national and international estimates is often not obvious from the top-level descriptions of the data (e.g. series name, sex, location, unit of measurement). More detailed textual metadata on the definition used, the methodology adopted and how the data were obtained are therefore needed to understand the exact nature of the data and determine the actual reason(s) for any differences. This is why CountryData aims to show a complete set of reference metadata (also obtained from countries through SDMX) and presents reference metadata side by side in any comparison of national and international estimates of the same MDG indicator.

CountryData is initially working with a small group of participants on a project funded by the UK’s Department for International Development (DfID) to improve the collation, availability and dissemination of national development indicators (including the MDGs). It is envisaged that more countries will be included in the website over time. Countries with more advanced capacity can also develop their own SDMX connection with CountryData, as Mexico, a non-project country, has done.

Statistical Data and Metadata eXchange (SDMX) is a set of technical guidelines together with statistical standards to:
  • Allow faster, more reliable, and simpler data and metadata processing;
  • Reduce human error (e.g. data transcription or manipulation errors);
  • Create a unified sharing architecture to reduce development and maintenance costs;
  • Create a standardised sharing format to reduce response burden; and
  • Harmonise and standardise statistical metadata.

An SDMX registry is used to facilitate the dissemination of data in the form of SDMX messages. Structural metadata, such as Data Structure Definitions, Concepts and Codelists, can be published in the registry. The registry also maintains links to data and reference metadata sources, and alerts subscribers (like CountryData) when updates are available.
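The three roles described above (holding structural metadata, keeping links to sources, and alerting subscribers) can be pictured with a toy in-memory class. This is only a sketch: `ToyRegistry`, its method names and the example URL are hypothetical and do not correspond to any real SDMX registry interface.

```python
class ToyRegistry:
    """In-memory stand-in for a registry; not an actual SDMX registry API."""

    def __init__(self):
        self.structures = {}    # structural metadata, e.g. DSDs and codelists
        self.sources = {}       # links to data or reference metadata sources
        self.subscribers = []   # callables to alert when a source is updated

    def publish_structure(self, structure_id, definition):
        """Store a piece of structural metadata under an identifier."""
        self.structures[structure_id] = definition

    def subscribe(self, callback):
        """Register a callable that receives (source_id, url) on each update."""
        self.subscribers.append(callback)

    def register_source(self, source_id, url):
        """Record or update a source link, then alert every subscriber."""
        self.sources[source_id] = url
        for notify in self.subscribers:
            notify(source_id, url)


# A subscriber (playing the part of CountryData) records each alert it gets.
received = []
registry = ToyRegistry()
registry.subscribe(lambda source_id, url: received.append((source_id, url)))
registry.register_source("national-data", "https://example.org/data.xml")
```

After the last call, `received` holds the single alert for the newly registered source.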

An SDMX message is essentially an XML document that uses a Data Structure Definition or Metadata Structure Definition to structure and code (or map) data or metadata exported from a database or other source.
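To give a feel for what "an XML document structured by a DSD" means, the sketch below reads a deliberately simplified XML fragment with Python's standard library. The element and attribute names (`DataSet`, `Series`, `Obs`, `SERIES`, `REF_AREA` and so on) and all values are assumptions made for illustration; real SDMX messages use namespaces, a header and a richer schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified message. Series-level XML attributes carry the
# values shared by every observation; each Obs carries its own period and value.
MESSAGE = """
<DataSet>
  <Series SERIES="EXAMPLE_SERIES" REF_AREA="XX" SEX="F" UNIT="PERCENT">
    <Obs TIME_PERIOD="2005" OBS_VALUE="12.5" NATURE="E"/>
    <Obs TIME_PERIOD="2006" OBS_VALUE="11.9" NATURE="E"/>
  </Series>
</DataSet>
"""


def read_observations(xml_text):
    """Return one dict per observation, merging series and observation fields."""
    root = ET.fromstring(xml_text)
    rows = []
    for series in root.iter("Series"):
        for obs in series.iter("Obs"):
            row = dict(series.attrib)   # fields shared by the whole series
            row.update(obs.attrib)      # period, value and observation-level fields
            row["OBS_VALUE"] = float(row["OBS_VALUE"])
            rows.append(row)
    return rows


observations = read_observations(MESSAGE)
```

Each resulting row is a flat record that combines the series description with one data point, which is roughly the shape a receiving application would load into its own database.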

The Data Structure Definition provides the design for how data exported from a database or other source should be structured and coded in an SDMX message. Every Data Structure Definition (DSD) is built on dimensions and attributes. Dimensions (dim) are mandatory and identify the observation value (i.e. the data point), while attributes (att) are additional descriptive or qualitative features of the observation value and may be optional or mandatory.

The Metadata Structure Definition provides the design for how metadata exported from a database or other source should be structured and coded in an SDMX message.

CountryData chose to implement an expanded version of the DSD for MDG indicators (MDG DSD) for the exchange of national development indicator data with National Statistics Offices (NSOs), because it is the most developed and internationally recognised standard for this subject domain. The MDG DSD was developed for the 125 diverse indicators (171 time series) collated each year by a special Inter-Agency Expert Group (IAEG) taskforce as part of the coordination process for the MDG progress report.

The set of dimensions and attributes used to define the MDG DSD is presented in the table below:

Data Structure Definition (DSD) for MDGs: Dimensions & attributes

Type            Name                   Type of code used
--------------  ---------------------  -----------------------------------
Dimension       Frequency              e.g. Annual, Quarterly
Dimension       Series                 Indicator title
Dimension       Units of measurement   e.g. Percent, number
Dimension       Location               e.g. Total, Urban, Rural
Dimension       Age group              e.g. 15–49 year olds, 6–59 months
Dimension       Sex                    e.g. Both sexes, male, female
Dimension       Reference Area         Country name
Dimension       Source Type            e.g. Survey, census, admin.
Time dimension  Time Period            e.g. 1990, 1991
Measure         Observation Value      -
Attribute       Unit multiplier        e.g. per 10,000, per 1,000
Attribute       Time period details    e.g. 2001 – 2003, Q1 2010 – Q3 2011
Attribute       Nature of data points  e.g. Estimated, Modelled, Adjusted
Attribute       Source details         Source name & date
Attribute       Footnotes              Details of methodology & other notes
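One way to picture the table above in code: the dimensions together form the key that identifies a data point, and the attributes describe it. The sketch below is a minimal illustration. The field identifiers (`FREQ`, `SERIES`, `UNIT_MULT` and so on) are hypothetical shorthand rather than the official MDG DSD concept identifiers, and every attribute is treated as optional for simplicity.

```python
# Hypothetical identifiers for the dimensions and attributes in the table.
DIMENSIONS = ("FREQ", "SERIES", "UNIT", "LOCATION", "AGE_GROUP",
              "SEX", "REF_AREA", "SOURCE_TYPE", "TIME_PERIOD")
ATTRIBUTES = ("UNIT_MULT", "TIME_DETAIL", "NATURE", "SOURCE_DETAIL", "FOOTNOTE")
MEASURE = "OBS_VALUE"


def validate_observation(record):
    """Check a record against the sketch DSD and return a list of problems."""
    problems = []
    for dim in DIMENSIONS:
        if not record.get(dim):          # dimensions are mandatory
            problems.append(f"missing dimension: {dim}")
    if MEASURE not in record:
        problems.append("missing observation value")
    allowed = set(DIMENSIONS) | set(ATTRIBUTES) | {MEASURE}
    for name in record:
        if name not in allowed:          # anything outside the structure
            problems.append(f"unknown field: {name}")
    return problems


def observation_key(record):
    """The tuple of dimension values that identifies one data point."""
    return tuple(record[dim] for dim in DIMENSIONS)


example = {
    "FREQ": "A", "SERIES": "EXAMPLE_SERIES", "UNIT": "PERCENT",
    "LOCATION": "T", "AGE_GROUP": "NA", "SEX": "F", "REF_AREA": "XX",
    "SOURCE_TYPE": "SURVEY", "TIME_PERIOD": "2005",
    "OBS_VALUE": 12.5, "NATURE": "E",
}
problems = validate_observation(example)
```

A record that carries every dimension and an observation value passes with an empty problem list; dropping a dimension such as `SEX` would be reported, because the data point could no longer be identified unambiguously.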

CountryData chose to base the exchange of textual metadata with National Statistics Offices (NSOs) on the MSD for MDG indicators (MDG MSD), for reasons similar to those behind the choice of the MDG DSD. The set of fields used to display the MDG MSD on CountryData is presented in the table below:

Metadata Structure Definition (MSD) for MDGs: Concepts

Description

Definition of the indicator or background series provided

Method of computation

Comments and limitations

Sources of discrepancies between global and national figures

Process of obtaining data

Expected time of release

A series of steps has been built into the CountryData application to automate, as far as possible, the matching of national and international estimates for comparison. Using the same MDG DSD for both the national and international series simplifies this process. Matching is done by comparing key dimensions of the MDG DSD directly: series, unit of measurement, location, sex and reference area must match exactly (except where coded "Not Applicable"), while frequency, age group and source type need not.

Once two time series (national and international) are paired on the above basis, their associated metadata can be compared. This should yield some information on the reasons for the differences; otherwise, further follow-up may be required with either the National Statistics Office or the international agency that provided the data. Providers are asked to feed any follow-up response back through the SDMX messages CountryData receives; failing that, the explanation is written up in a standalone commentary box beside the reasons-for-difference categories.
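The matching rule described above can be sketched in a few lines. This is not CountryData's actual implementation: the field names, the `"NA"` code for "Not Applicable", and the choice to skip a dimension when either side is coded "Not Applicable" are all assumptions made for illustration.

```python
# Dimensions that must agree exactly for two series to be paired.
EXACT_MATCH = ("SERIES", "UNIT", "LOCATION", "SEX", "REF_AREA")
NOT_APPLICABLE = "NA"   # assumed code for "Not Applicable"


def dimensions_match(national, international):
    """True if every exact-match dimension agrees or is Not Applicable."""
    for dim in EXACT_MATCH:
        a, b = national[dim], international[dim]
        if NOT_APPLICABLE in (a, b):
            continue            # assumed: Not Applicable on either side is skipped
        if a != b:
            return False
    return True


def pair_series(national_series, international_series):
    """Return every (national, international) pair whose key dimensions match."""
    return [(n, i)
            for n in national_series
            for i in international_series
            if dimensions_match(n, i)]


national = {"SERIES": "S1", "UNIT": "PERCENT", "LOCATION": "T",
            "SEX": "F", "REF_AREA": "XX", "AGE_GROUP": "15-49"}
same_key = dict(national, AGE_GROUP="15-24")   # age group may differ
other_sex = dict(national, SEX="M")            # sex must match exactly
pairs = pair_series([national], [same_key, other_sex])
```

Here only `same_key` is paired with the national series: frequency, age group and source type are deliberately left out of `EXACT_MATCH`, so a differing age group does not prevent a pairing.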

CountryData uses the eight categories in the table below as the main comparison labels. Where relevant, further explanation beyond these labels is provided in the commentary box. This enables users to decide which data are most appropriate for their specific purposes, and reduces the confusion surrounding different indicator estimates.

Label                    Definition
-----------------------  ----------
No difference            Describes when the two series are fully congruent: they cover the same years and report the same observation value for each year.

Discrepancy labels
Different age groups     Describes when the two paired series use different age groups.
Different data sources   Describes the use of results from different data sources. For example, an international agency may use multiple data sources to compute an indicator, while the country uses a single survey or an administrative source that the agency may not have access to.
Different definitions    Describes when the international agency and the country define the indicator differently. For example, the national definition may be more inclusive than the specific categories included in the indicator as defined by the international agency.
Different methodologies  Describes when the country and the international agency use different methods of computation. For example, an international agency may use statistical models to estimate an indicator, while the country reports figures directly from the survey.
Different source type    Describes when the two paired series use different source types (e.g. admin vs. survey). "Different data sources" will also apply.
Under investigation      Applied when the data are first updated, usually as a placeholder until the reason is investigated.
Unidentified             Describes when, following investigation, there is a discrepancy but the reason remains unclear or unresolved.


Note: these categories directly reflect the Metadata Structure Definition and the Data Structure Definition used by CountryData.
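Because the categories reflect the DSD, a few of them can be derived mechanically from a matched pair. The sketch below illustrates this under stated assumptions: the field names are hypothetical, observations are represented as year-to-value dictionaries, and falling back to "Under investigation" is a guess at the workflow rather than documented behaviour. Labels such as "Different definitions" and "Different methodologies" would presumably come from comparing the reference metadata and are not covered here.

```python
def structural_labels(national, international):
    """Labels implied by differences in the loosely matched dimensions."""
    labels = []
    if national["AGE_GROUP"] != international["AGE_GROUP"]:
        labels.append("Different age groups")
    if national["SOURCE_TYPE"] != international["SOURCE_TYPE"]:
        labels.append("Different source type")
        # Per the label definitions, "Different data sources" also applies.
        labels.append("Different data sources")
    return labels


def initial_labels(national, international, national_obs, international_obs):
    """First-pass labels for a matched pair of series.

    national_obs and international_obs map each year to its observation value.
    """
    if national_obs == international_obs:
        return ["No difference"]       # same years and same value in each year
    labels = structural_labels(national, international)
    # Assumed fallback: a placeholder until someone investigates the reason.
    return labels or ["Under investigation"]


survey = {"AGE_GROUP": "15-49", "SOURCE_TYPE": "SURVEY"}
admin = {"AGE_GROUP": "15-24", "SOURCE_TYPE": "ADMIN"}
labels = initial_labels(survey, admin, {2005: 12.5}, {2005: 14.0})
```

For this pair the age group and source type both differ, so three labels are returned; a pair with identical dimensions but different values would fall through to the placeholder label.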