The curious case of area measurement in surveys (II)

Published by Lakshman Nagraj Rao on Tuesday, 25 October 2016

Field enumerators surveying plots in Viet Nam.

Co-written with Pamela Lapitan and Rea Jean Tabaco

In our previous blog post, we presented three ways of estimating plot size: farmer recall, GPS measurement, and Google Earth. Let’s continue unravelling the mystery of the optimal data collection method for plot size.

We grouped plots into deciles by GPS-measured size and then calculated the relative bias for the plots in each decile, defined as the difference between the farmer estimate and the GPS estimate, divided by the GPS estimate. The results reveal a very interesting pattern.
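
The calculation described above can be sketched in a few lines of Python. This is only an illustration, not the code used in the study: the function names, the rank-based decile assignment, the use of a simple mean within each decile, and the example numbers are all assumptions made here.

```python
from statistics import mean


def relative_bias(farmer_area, gps_area):
    """Relative bias: (farmer estimate - GPS estimate) / GPS estimate."""
    return (farmer_area - gps_area) / gps_area


def bias_by_decile(plots):
    """Average relative bias per GPS-size decile.

    plots: list of (farmer_area, gps_area) pairs in the same unit.
    Returns {decile: mean relative bias}, where decile 1 holds the
    smallest plots by GPS area. Averaging with a simple mean is an
    assumption; the source does not say how plots were aggregated.
    """
    ranked = sorted(plots, key=lambda p: p[1])  # order plots by GPS area
    n = len(ranked)
    groups = {}
    for rank, (farmer, gps) in enumerate(ranked):
        decile = rank * 10 // n + 1  # rank-based decile, 1..10
        groups.setdefault(decile, []).append(relative_bias(farmer, gps))
    return {d: mean(values) for d, values in sorted(groups.items())}


# Made-up numbers for illustration only (areas in square metres).
example_plots = [
    (180, 100), (300, 200), (390, 300), (440, 400), (525, 500),
    (600, 600), (680, 700), (760, 800), (840, 900), (900, 1000),
]
example_bias = bias_by_decile(example_plots)
```

A positive value means the farmer over-estimates relative to GPS, and a negative value means the farmer under-estimates.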

Farmers with smaller plots tend to over-estimate their plot size (by close to 80% in the smallest decile), while those with larger plots tend to under-estimate it. There are several hypotheses for why the variations are so large: the characteristics of the farm manager (age, education, and gender), the non-standard units of measurement farmers use for plot size, the number of plots and total landholding of the household, and so on. We hope to explore some of these relationships in our upcoming work on land area measurement biases. What is evident, though, is that the errors associated with self-reported measures are not random and are a function of plot size.

Plot size comparison between farmer- and GPS-based estimates

Source: ADB staff estimates based on the field validation and crop cutting activities under R-CDTA 8369.

Let us now think about some policy implications of using the farmer-based estimates instead of GPS/Google Earth-derived estimates. As part of our data collection activities, we also collected yield information through crop-cutting. This technique relies on identifying a randomly selected spot on a plot (a square, circle, or triangle) of fixed dimensions and harvesting the crop within this spot to calculate the quantity harvested. Once scaled up to a sufficient number of plots, randomization helps keep the yield estimate for the province free of systematic bias.
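
As a rough sketch of the arithmetic behind crop-cutting, the harvest from the sampled spot is divided by the spot's area to give a yield, and yields are then averaged across plots. The function names, the unweighted average, the 2.5 m x 2.5 m square, and the harvest weights below are hypothetical; the study's actual spot dimensions and estimation procedure are not given in this post.

```python
from statistics import mean

M2_PER_HECTARE = 10_000


def crop_cut_yield(harvest_kg, spot_area_m2):
    """Yield in kg per square metre from one crop-cut spot."""
    if spot_area_m2 <= 0:
        raise ValueError("spot area must be positive")
    return harvest_kg / spot_area_m2


def average_yield_kg_per_ha(crop_cuts):
    """Unweighted mean yield (kg/ha) over (harvest_kg, spot_area_m2) pairs.

    A simple mean is an assumption made for illustration; a real survey
    would normally apply its sampling weights here.
    """
    return mean(crop_cut_yield(kg, area) for kg, area in crop_cuts) * M2_PER_HECTARE


# Hypothetical crop cuts: a 2.5 m x 2.5 m square spot on each of three plots.
spot_area = 2.5 * 2.5  # 6.25 square metres
example_cuts = [(3.125, spot_area), (2.5, spot_area), (3.75, spot_area)]
example_yield = average_yield_kg_per_ha(example_cuts)
```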

By calculating the percentage difference between total production estimates based on GPS- and Google Earth-derived areas and those based on the farmer’s estimate of area, we find a strong downward bias in production numbers for smaller plots and a weak upward bias in production figures for larger plots.
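
The comparison can be illustrated with a small sketch. It assumes that production is computed as crop-cut yield multiplied by plot area, and that the same yield is applied to every area measure; the function names and numbers are hypothetical.

```python
def production_kg(yield_kg_per_m2, area_m2):
    """Production implied by a yield and an area measure."""
    return yield_kg_per_m2 * area_m2


def pct_difference_vs_farmer(yield_kg_per_m2, measured_area_m2, farmer_area_m2):
    """Percentage difference in production when a measured area (GPS or
    Google Earth) replaces the farmer's estimate, with the farmer-based
    figure as the base."""
    measured = production_kg(yield_kg_per_m2, measured_area_m2)
    farmer = production_kg(yield_kg_per_m2, farmer_area_m2)
    return 100.0 * (measured - farmer) / farmer


# Hypothetical small plot: the farmer reports 180 m2, GPS measures 100 m2.
small_plot = pct_difference_vs_farmer(0.5, 100.0, 180.0)   # negative
# Hypothetical large plot: the farmer reports 900 m2, GPS measures 1000 m2.
large_plot = pct_difference_vs_farmer(0.5, 1000.0, 900.0)  # positive
```

Under these assumptions the yield cancels out, so the percentage difference in production equals the percentage difference in area. Over-estimated small plots therefore show a negative difference and under-estimated large plots a positive one.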

Percentage difference in production estimates between farmer’s estimate and GPS/Google Earth by decile

Source: ADB staff estimates based on the field validation and crop-cutting activities under R-CDTA 8369.

At this point, we probably do not need to elaborate on why we should care about data quality in sample surveys. Correcting for measurement error in plot size could significantly alter the relative ranking of provinces in terms of production, which has broader implications for targeting government programs to improve rice production and overall food security.

One must also weigh cost against data quality when comparing the three methods. Farmer recall-based area estimates are very cheap to obtain and easy to implement through an additional question in a field-based survey. Area estimates derived from Google Earth entail slightly more work than a traditional farmer recall survey, but they are not burdensome from a fieldwork perspective. GPS-based area estimates require enumerators to walk around the plot to obtain and record an area measurement, which must then be re-verified by loading the GPS boundary file into GIS software. The costs would therefore follow this ranking: Farmer recall estimates < Google Earth < GPS-based estimates.
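
To give a sense of what the verification step involves, the sketch below derives an area from the boundary points recorded while walking around a plot. It is not the method used by any particular GIS package: it assumes the boundary is a list of latitude/longitude vertices in walking order, projects them onto a local flat plane (a reasonable approximation for small plots), and applies the shoelace formula. The function name and the mean Earth radius constant are choices made for this illustration.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius in metres (assumed constant)


def polygon_area_m2(boundary):
    """Approximate area (m2) enclosed by a walked GPS boundary.

    boundary: list of (lat_deg, lon_deg) vertices in walking order.
    Uses a local equirectangular projection, so it is only suitable
    for small areas such as individual plots.
    """
    if len(boundary) < 3:
        raise ValueError("a boundary needs at least three vertices")
    lat0 = sum(lat for lat, _ in boundary) / len(boundary)
    lon0 = sum(lon for _, lon in boundary) / len(boundary)
    cos_lat0 = math.cos(math.radians(lat0))
    # Project each vertex to x/y metres relative to the boundary's centre.
    points = [
        (EARTH_RADIUS_M * math.radians(lon - lon0) * cos_lat0,
         EARTH_RADIUS_M * math.radians(lat - lat0))
        for lat, lon in boundary
    ]
    # Shoelace formula over consecutive vertex pairs, closing the polygon.
    twice_area = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0
```

Comparing a figure like this with the area logged by the GPS device is one way a boundary file could be sanity-checked; a production workflow would rely on the projection and area tools of the GIS software itself.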

Our study finds that it is more cost-effective to trace and digitize plot boundaries from high-resolution satellite data than to go through the messy and risky process of having enumerators carry GPS devices and walk around rice plots. Google Earth-based area estimates are also very close to GPS-based estimates and should therefore be considered a strong candidate for future scale-up.

The above case is just one example of how technological improvements in the survey world help improve data quality and consequently impact policy. In fact, the pace of technological advancement has been phenomenal in the last 20 years. Storing 1 GB of data, which would once have taken about 700 3.5-inch floppy disks (of 1.44 MB capacity each), now takes a half-inch memory card. Our smartphones can now do much more than make calls and send SMS messages. We can capture images, record videos, open documents, check the weather, transfer money, use navigation tools, and download a whole range of applications that have fundamentally changed how we interact with the world. Internet connections are getting faster and cheaper, allowing us to upload and download data quickly from almost anywhere.

Survey practitioners can take advantage of these technological advances by using Computer-Assisted Personal Interviewing (CAPI) techniques to collect survey data. CAPI, which ADB is piloting through a technical assistance project in the Marshall Islands, Sri Lanka, and Viet Nam, refers to the general use of computer devices, including handheld ones such as tablets and smartphones, in data collection. Modernizing statistical survey operations by integrating information and communication technology (ICT) may increase the efficiency of statistics administration and improve data quality. More specifically, if data producers see the potential of modernized data collection, this may create an avenue for producing timely and reliable statistics, which in turn would address the demands of data users more efficiently.

Pamela Lapitan is an Economics and Statistics Analyst and Rea Jean Tabaco is a Consultant in ADB's Economics Research and Regional Cooperation Department.