The Quality Of The NHS Release - A Q&A With Jan Kestle

Published May 14, 2013, 08:40 AM by Jan Kestle

President and Founder of Environics Analytics

Q1: The first results are now out from the National Household Survey. When the methodology was changed so that these data were collected on a voluntary basis, rather than through the previously compulsory long-form Census, there was a lot of concern about how this change would affect data quality. What are your impressions based on today’s release?

  1. At first look, I would say that what was released is useful. What I am more concerned about is what wasn’t released. No data were included for census tracts or dissemination areas (DAs) – the building blocks for the small area data that our users rely on.
  2. Data were released for Canada, the provinces, the CMA/CAs (census metropolitan areas/census agglomerations) and CDs (census divisions). Data were released for about 75% of the CSDs (census subdivisions) covering 97% of the population. As always, some data points were suppressed for confidentiality to protect the privacy of Canadians. But data for some small CSDs were not released because they were “suppressed for quality”. 
  3. As we understand the situation at this time, Statistics Canada used their longstanding approach to create a measure of non-response that combines non-response to the entire survey with non-response to individual questions. Where this measure of non-response for a unit of geography exceeded a threshold, the data were not released. I believe the method applied was somewhat less stringent than the one they have used in the past, in an attempt to release as much data as possible while still staying within acceptable quality limits. Based on our understanding of what was done and the known high calibre of the methodologists at StatsCan, we believe that users will be able to rely on the released data for most purposes.
  4. The real concern for us and for our many customers in the business, government and not-for-profit sectors is the impact on small area data. We understand that we can buy the data “not suppressed for quality” at the census tract and DA level as a special tabulation even though the data were not part of the official release. We expect that Statistics Canada may release census tract data at a later date, after the quality testing is complete, but these data will have missing observations. We do not have any information about any plans they may have to release DA-level data.
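
To make the suppression rule described in point 3 concrete, here is a minimal Python sketch. It is an illustration only: the way the two non-response rates are combined, the 0.50 threshold, and the area names and figures are all assumptions made for this example, not Statistics Canada's actual formula or published values.

```python
from dataclasses import dataclass

# Hypothetical cut-off; the article does not state the threshold actually used.
THRESHOLD = 0.50


@dataclass
class Area:
    name: str
    unit_nonresponse: float  # share of sampled households that did not respond at all
    item_nonresponse: float  # share of questions left blank by households that did respond


def combined_nonresponse(area: Area) -> float:
    """Assumed combination: the share of expected answers that are missing."""
    return 1.0 - (1.0 - area.unit_nonresponse) * (1.0 - area.item_nonresponse)


def releasable(area: Area, threshold: float = THRESHOLD) -> bool:
    """An area is released only if its combined measure does not exceed the threshold."""
    return combined_nonresponse(area) <= threshold


# Made-up areas for illustration.
areas = [Area("CSD A", 0.20, 0.05), Area("CSD B", 0.55, 0.10)]
released = [a.name for a in areas if releasable(a)]
print(released)
```

With these made-up figures, "CSD A" has a combined measure of 0.24 and would be released, while "CSD B" (about 0.60) would be "suppressed for quality". Lowering the stringency of the rule corresponds to raising the threshold, which releases more areas at the cost of accepting noisier data.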

Q2: Why are these small area data so important to marketers and social researchers?

  1. In the private sector these data are used to develop and deliver products and services that meet the needs of consumers. When businesses have accurate demographic data for neighbourhoods, they know where growing families need childcare services and toy stores, and whether car buyers are more likely to choose pickup trucks or minivans. And because the data help marketers better understand the tastes and preferences of specific neighbourhoods, consumers receive offers that are relevant to them, such as coupons they will actually use and events they are more likely to attend. Even not-for-profits benefit from demographic data when designing programs to combat cancer or launching fundraising campaigns in support of the arts.
  2. In the public sector these data are used to guide the development, implementation and monitoring of programs in areas such as education, housing, transportation and health. Data about subsets of the population, such as immigrants, Aboriginal peoples, and different socioeconomic or age groups, are extremely important to researchers, and all of these groups may have responded in a non-representative way. If so, the ability of researchers to understand them and help design programs to reach them will be affected negatively.

Q3: The decision to make the long-form survey responses voluntary was made out of respect for the privacy of Canadians. Isn’t that more important?

  1. Privacy is extremely important and a key aspect of our business; all of the data that we provide users are developed from many sources and are privacy compliant.
  2. There are strict rules in place at Statistics Canada to ensure that no data about any individual or household are revealed in their products.
  3. There have been very few complaints to the Privacy Commissioner in relation to the Census—ever.
  4. It is also our experience that, when consumers understand the use to which the data are put and how important they are to good public policy and a responsive business environment, they are willing to provide information as long as it is aggregated so individuals cannot be identified.
  5. Our work with consumer marketers suggests that consumers are very happy to receive advertising and offers that are relevant and that result in greater convenience and cost savings. When businesses use data effectively, they actually improve the consumer experience.

Q4: Aren’t there lots of other sources of data?

  1. There are many, but none that replace the Census: survey samples are designed and weighted based on Census data, and customer databases are enhanced with demographic overlays. Administrative data can be helpful, but there are strict regulations that limit the linking of data even to produce anonymous aggregate statistics.
  2. New sources such as “Big Data” from the web, cookie tracking, social media listening and enormous databases collected for many other purposes all have value, but they need to be normalized and compared to the whole population in order to be useful and to allow analysts to accurately interpret the behaviour that they capture.
  3. There is still one great source: the 2011 Census. Analysts still have available the data from the mandatory Census that were collected from almost all Canadians. These data are reliable down to small areas and for the special population groups that were covered.

Q5: What does Environics Analytics intend to do now that these problems in the small area data have been identified?

  1. Both Dr. Tony Lea and Dr. Doug Norris, our Chief Methodologist and Chief Demographer, respectively, have over 25 years of experience with census and survey data, as well as with modeling. They are already studying the results and have identified options to be explored. We tackle data challenges every day, though we would have preferred it if this particular challenge could have been avoided.
  2. EA will use the data from the NHS that have been quality approved by StatsCan in the update of our DemoStats product. Every year, we produce DA-level estimates for about 300 variables using a variety of data sources and demographic, econometric and geographic modeling techniques. These data are used by hundreds of analysts across Canada. Our previous approach was to incorporate DA-level Census results every five years to ensure the accuracy of our models. Rather than using DA-level data from the NHS, we have methods to use the higher-level data in updating these models. We have techniques that we use for WealthScapes and custom models that will enable us to do a good job, though not as good as with a full Census.
  3. We will also consider buying the unsuppressed data and testing it for DemoStats.
  4. Some variables that were previously available in our Adjusted Census product may need to be added to DemoStats. This will require significant R&D within our existing modeling framework. We will also be consulting with our users to identify the required variables.
  5. In all these efforts, we will use 2006 data from the long and short form at the small area level, along with the 2011 Census and whatever is usable from the NHS.

Q6: What does this mean for future years?

  1. The methods described above that rely on a previous Census will only work for the short term. We are in a time of great demographic change in Canada, and for a number of variables the past cannot effectively predict the future without new data.
  2. We hope that today’s release of the NHS initiates a national dialogue about the importance of reliable national statistics, particularly at the small-area level. With knowledge of how the NHS has actually fared, researchers, analysts and policy makers can have an informed discussion about what the best approach is for the future.
  3. Options that we think should be considered are asking more mandatory questions that could be used to better weight the survey, and privacy-compliant access to administrative data from other federal departments or even provincial governments.
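
As a rough illustration of how answers to mandatory questions could help weight a voluntary survey, the Python sketch below applies simple post-stratification: respondents are grouped by a characteristic known for the whole population, and each group's result is scaled to that group's true size. The group names, counts and responses are invented for the example, and post-stratification is only one possible weighting technique; nothing here describes a method Statistics Canada has adopted.

```python
# Known population counts per group, e.g. from a mandatory question (made-up numbers).
population = {"under_35": 400, "35_plus": 600}

# Voluntary-survey respondents as (group, answered_yes) pairs (made-up records).
respondents = [
    ("under_35", True), ("under_35", False),
    ("35_plus", True), ("35_plus", True), ("35_plus", False),
    ("35_plus", False), ("35_plus", False), ("35_plus", False),
]


def poststratified_share(population, respondents):
    """Estimate the population share answering yes, weighting each group to its known size."""
    counts, yes = {}, {}
    for group, answer in respondents:
        counts[group] = counts.get(group, 0) + 1
        yes[group] = yes.get(group, 0) + (1 if answer else 0)

    weighted_yes = 0.0
    for group, size in population.items():
        if group not in counts:
            # With no respondents in a group, its rate cannot be estimated.
            raise ValueError(f"no respondents in group {group!r}")
        weighted_yes += size * yes[group] / counts[group]
    return weighted_yes / sum(population.values())


# Naive estimate that ignores who chose to respond.
unweighted = sum(answer for _, answer in respondents) / len(respondents)
weighted = poststratified_share(population, respondents)
print(unweighted, weighted)
```

In this invented data the under-35 group is under-represented among respondents, so the unweighted share (0.375) understates the weighted one (0.4). The correction only works for characteristics that are measured reliably for everyone, which is why the number of mandatory questions matters.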

Q7: Are you available to discuss this further?

Please contact me at jan.kestle@environicsanalytics.ca or 416.969.2834.