Chapter 1 Acquiring household survey data
This chapter describes the data sources used for measuring poverty and inequality as well as how the data are selected and obtained.
1.1 Selection criteria and source of data
For most countries, the Poverty and Inequality Platform (PIP) relies on household surveys carried out by countries’ National Statistical Offices (NSOs). Typically, PIP uses the official surveys from which the country reports its own poverty figures. These include household budget surveys and household income and expenditure surveys, which ask a representative sample of households in the country detailed questions about their income and/or consumption, among many other topics. On some occasions, the World Bank is a co-producer of these surveys.
Other surveys on consumption or income, such as surveys carried out by academic institutions and private sector companies, are generally not included in PIP. The primary reason is that the estimates from PIP are the official vehicle for countries’ reporting on Sustainable Development Goal indicator 1.1.1 on extreme poverty, which ought to come from sources that are officially recognized by countries.
For high-income countries in the European Union (with the exception of Germany), the data are obtained from the EU Statistics on Income and Living Conditions (EU-SILC). EU-SILC data are obtained partly from surveys and partly from administrative records. For other high-income economies (Australia; Canada; Germany; Israel; Japan; South Korea; Taiwan, China; the United Kingdom; and the United States), data from the Luxembourg Income Study (LIS) Database are used. These data rely on surveys carried out by NSOs or by academic institutions. Data from the LIS Database are also used for high-income countries in the European Union for years predating EU-SILC, which naturally causes a break in the series.
The estimates for India from 2015-2019 rely on the Consumer Pyramids Household Survey conducted by the Centre for Monitoring Indian Economy, a private company. The survey data are adjusted to be representative of India and reflect the consumption basket of the National Sample Survey (see Sinha Roy and Van Der Weide (2022) for details).
Occasionally, a survey fulfilling the above criteria is not incorporated into the platform because the World Bank does not have access to the data, because of concerns about the survey design or welfare aggregate, or because the auxiliary data needed (such as consumer price indices and purchasing power parities) are unavailable. Decisions to exclude a previously available survey from the platform are discussed carefully in the Global Poverty Working Group within the World Bank, and the reason for excluding a data point is summarized in the “What’s New” documents.
The input data used in PIP from these sources are most often unit-record welfare data and occasionally grouped data, which are converted to a full distribution. PIP cannot take as input an international poverty rate without the underlying welfare distribution, such as a point estimate published on an NSO website. There are several reasons for this:
- Extrapolating or interpolating country estimates to a common reference year, which is needed to calculate global poverty, requires a full distribution.
- Without detailed information on the distribution, inequality metrics could not be calculated, including the data needed for SDG indicators 10.1.1 and 10.2.1, and users could not access poverty rates as a function of the poverty line chosen.
- The World Bank could not update poverty estimates when PPPs and CPIs are revised, which would impose a higher burden on NSOs and could become an obstacle to the timely publication of updated data.
- Unit-record data allow for quality checks and help ensure that the choices used to create the welfare aggregate are as comparable as possible across countries.
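To illustrate why a full distribution is needed to report poverty rates as a function of the poverty line, the following is a minimal sketch of a weighted headcount ratio computed from unit-record data. The function name, the welfare values, the survey weights, and the poverty lines are all hypothetical and chosen only for illustration; this is not PIP's implementation.

```python
def headcount_ratio(welfare, weights, poverty_line):
    """Weighted share of the population with welfare below the poverty line."""
    total = sum(weights)
    poor = sum(w for y, w in zip(welfare, weights) if y < poverty_line)
    return poor / total


# Hypothetical unit records: daily welfare per person and survey weights.
welfare = [1.5, 2.0, 3.0, 4.5, 8.0, 12.0]
weights = [100, 150, 200, 250, 200, 100]

# The same distribution yields a poverty rate for any line a user chooses.
for line in (2.5, 4.0, 10.0):
    print(line, headcount_ratio(welfare, weights, line))
```

A single published point estimate corresponds to only one of these lines, so it cannot be re-evaluated at another line or after prices are revised.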
1.2 How data are obtained
In many low-income countries, the Poverty and Equity Global Practice of the World Bank cooperates with the National Statistical Offices and supports their efforts to conduct household surveys and measure poverty. Data are obtained through these partnerships. This applies to most countries in East Asia & Pacific, the Middle East & North Africa, South Asia, and Sub-Saharan Africa.
For Latin America and the Caribbean, most of the data are obtained from and harmonized by the Socio-Economic Database for Latin America and the Caribbean (SEDLAC), a partnership between the Center for Distributive, Labor and Social Studies (CEDLAS) at the National University of La Plata in Argentina and the World Bank’s Poverty and Equity Global Practice for Latin America and the Caribbean.
For most high-income countries, data are obtained through the Luxembourg Income Study or EU-SILC.
All the data go through validity checks by the Poverty and Equity Global Practice of the World Bank. On certain occasions, the Poverty and Equity Global Practice is also responsible for harmonizing and constructing the welfare aggregates. The harmonized welfare aggregates are passed on to PIP.
1.3 Representativeness
In the majority of cases, the surveys used are nationally representative of all households in a country. Often, the selection of households is based on the most recent census. Yet three factors can prevent surveys from being fully representative of the entire population:
- Some households are less likely to agree to participate in national surveys on income and consumption, or, if they participate, less likely to answer the questions used for calculating poverty. This is particularly a concern for wealthy households, which means that the distributions captured might have too thin a tail at the top.
- Some individuals do not reside in households as defined in household surveys and hence are not captured by the above sampling frame. This particularly concerns homeless people and individuals residing in nursing homes, prisons, and displacement camps.
- At times, conflicts and extreme weather events prevent enumerators from going to certain parts of a country. In these cases, some geographical areas may be excluded from the survey. The 2018-19 Nigeria Living Standards Survey, for example, is not representative of Borno state, since parts of the state became inaccessible over the course of the survey.
For the purpose of calculating global or regional poverty, the poverty rates estimated in the surveys are assumed to be representative of the entire population despite the three shortcomings above. This is largely because acquiring any alternative poverty estimate for the missing populations is challenging.
PIP also contains some surveys that explicitly target only the urban or rural part of the population. These surveys are not used for calculating regional or global poverty, except in Argentina and Suriname, which only have surveys representative of their urban populations. For the urban populations of Argentina and Suriname, these surveys are used for the regional and global poverty counts, while the rural populations of Argentina and Suriname receive the same treatment as countries for which we have no data.
1.4 Grouped vs. unit record data
In some cases, countries are unwilling to share unit-record data with the World Bank but are willing to share grouped data. Grouped data are aggregated data, often to 5, 10, 20, or 100 quantiles of the distribution. Such data could reflect, for example, the mean income of each percentile or the income share of each percentile. To estimate poverty and inequality, a Lorenz curve is first estimated from these grouped data, and poverty and inequality are then derived from that curve (described here). Grouped data are used for Algeria, China, Guyana, Suriname, Turkmenistan, Trinidad & Tobago, Venezuela, and the United Arab Emirates, and for more countries in earlier years.
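As a rough illustration of how grouped data relate to a Lorenz curve, the sketch below builds Lorenz curve points from hypothetical quintile means and computes a Gini index by trapezoidal integration. The piecewise-linear curve is a deliberate simplification assumed here for illustration: it ignores inequality within each group and therefore tends to understate the Gini index. It is not the estimation method used in PIP, and the function names and input values are hypothetical.

```python
def lorenz_points(group_means, group_pop_shares=None):
    """Cumulative (population share, income share) points from grouped means."""
    n = len(group_means)
    if group_pop_shares is None:
        # Assume equally sized groups (e.g. quintiles or deciles).
        group_pop_shares = [1.0 / n] * n
    # Order groups from poorest to richest.
    pairs = sorted(zip(group_means, group_pop_shares))
    total_income = sum(m * p for m, p in pairs)
    points = [(0.0, 0.0)]
    cum_pop = 0.0
    cum_inc = 0.0
    for mean, pop_share in pairs:
        cum_pop += pop_share
        cum_inc += mean * pop_share / total_income
        points.append((cum_pop, cum_inc))
    return points


def gini_from_lorenz(points):
    """Gini index as 1 minus twice the area under a piecewise-linear Lorenz curve."""
    area = 0.0
    for (p0, l0), (p1, l1) in zip(points, points[1:]):
        area += (p1 - p0) * (l0 + l1) / 2.0
    return 1.0 - 2.0 * area


# Hypothetical mean income of each quintile.
quintile_means = [2.0, 4.0, 6.0, 8.0, 10.0]
print(gini_from_lorenz(lorenz_points(quintile_means)))
```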
1.5 Multiple imputed micro data
In some countries, a full household survey with a comprehensive consumption measure cannot be collected due to concerns over the safety of enumerators and respondents. However, shorter interviews that minimize safety risks, without a full welfare aggregate, can still be conducted. In these cases, a full welfare aggregate can be imputed using statistical models. These methods, described in greater detail in Arayavechkit et al. (2021) and Castaneda Aguilar et al. (2022), typically use multiple imputations, resulting in a dataset with many consumption values for each household. These multiple consumption values reflect the uncertainty that arises from the statistical modelling. Statistics of interest, such as poverty or inequality, are computed separately on each of these consumption vectors and then averaged to obtain a final estimate. Currently such data are used for South Sudan 2016, Zimbabwe 2017, Nigeria 2010, 2012, and 2015, and India 2015-2019.
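The averaging step described above can be sketched as follows, assuming a simple unweighted mean across imputations. The function names, the three imputed consumption vectors, the weights, and the poverty line are hypothetical and serve only to show the mechanics; the cited papers describe the actual imputation models.

```python
def headcount_ratio(welfare, weights, poverty_line):
    """Weighted share of the population with welfare below the poverty line."""
    total = sum(weights)
    poor = sum(w for y, w in zip(welfare, weights) if y < poverty_line)
    return poor / total


def multiple_imputation_estimate(imputed_vectors, weights, statistic):
    """Compute a statistic on each imputed vector, then average the results."""
    estimates = [statistic(vector, weights) for vector in imputed_vectors]
    return sum(estimates) / len(estimates)


# Hypothetical data: three imputed consumption vectors for four households.
imputed_vectors = [
    [1.0, 1.8, 2.5, 4.0],
    [1.2, 2.1, 2.4, 3.5],
    [0.9, 1.9, 1.7, 4.2],
]
weights = [1, 1, 1, 1]
poverty_line = 2.0  # hypothetical line, in the same units as consumption


def poverty_rate(vector, w):
    return headcount_ratio(vector, w, poverty_line)


final_estimate = multiple_imputation_estimate(imputed_vectors, weights, poverty_rate)
print(final_estimate)
```

The statistic is computed within each imputation before averaging, rather than on the average consumption per household, because averaging consumption first would smooth away the imputation uncertainty and distort nonlinear statistics such as the headcount ratio.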