The Square Matrix: The Adjusted Census

Published Jan 21, 2013, 03:07 PM by Sandra Albanese

Since the various releases of the 2011 Census began, my main task at Environics Analytics (EA) has been developing the Adjusted Census product. That’s our database that includes all of the popular Census variables that analysts and marketers want, plus values filled in for data that are missing as a result of Statistics Canada methodologies. Clients have been asking why the Adjusted Census continues to be developed, especially now that the Census is free of charge. So, without getting too technical, I thought I’d provide a very brief refresher on why the Adjusted Census is created and why it still makes sense for many clients.

Some of the issues that analysts and others encounter when they use the Census in their research or reporting result from the ways that Statistics Canada deals with poor-quality data and small sample sizes. Statistics Canada routinely uses random rounding and data suppression to cope with these problems.

With random rounding, the counts for smaller geographies do not necessarily add up to the counts for their higher-level parent geographies and, as a result, the sum of the percentages may be higher or lower than 100%. For example, values for Census Subdivisions (CSDs) will not necessarily add up to their parent Census Divisions (CDs), Census Tracts (CTs) will not add up to parent CSDs, and Dissemination Areas (DAs) will not add up to CTs, or to CSDs in non-tract areas.
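To make the effect concrete, here is a minimal sketch of unbiased random rounding in Python. It assumes a rounding base of 5, and the function name `random_round` and all of the counts are hypothetical, chosen only for illustration. It is not Statistics Canada's actual procedure, just a simple way to see why independently rounded children need not add up to an independently rounded parent.

```python
import random

def random_round(count, base=5, rng=random):
    """Randomly round count up or down to a multiple of base.

    The probability of rounding up is remainder/base, so the expected
    value of the rounded count equals the original count.
    """
    remainder = count % base
    if remainder == 0:
        return count
    lower = count - remainder
    return lower + base if rng.random() < remainder / base else lower

rng = random.Random(2011)            # fixed seed so the run is repeatable

children = [1203, 847, 2561, 392]    # hypothetical true counts for four CSDs
parent = sum(children)               # the true count for their parent CD

# Each published value is rounded independently of the others.
rounded_children = [random_round(c, rng=rng) for c in children]
rounded_parent = random_round(parent, rng=rng)

# The two numbers printed here can differ by a multiple of 5.
print(sum(rounded_children), rounded_parent)
```

Every value is within 5 of the truth, but because each cell is rounded on its own, the small errors do not have to cancel when the children are summed.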

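[Table image (sandrablog2): 2011 Census counts of Female Population 15 years and older by Marital Status for the CSDs in the Lunenburg, Nova Scotia CD (1206)]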

The table above shows a typical example of data for one theme released in the Census. These are actual 2011 Census counts for Female Population by Marital Status for the CSDs that comprise the parent CD of Lunenburg, Nova Scotia (1206). The cells highlighted in yellow show the differences between the sum of the components that make up the total Female Population 15 years and older by Marital Status and the total provided by Statistics Canada. For the CD, the sum of the components is 21,250, a difference of 10 from the theme total. Similarly, the differences between the sum of the CSDs and their parent CD for each component variable are shown in blue. At smaller geographic levels, such as the DA level, the differences may be more numerous, compounding the issue.
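The two kinds of difference described above, horizontal (components versus the theme total) and vertical (children versus the parent), can be checked with a few lines of Python. The geography names and counts below are hypothetical and are not the Lunenburg figures; the sketch only illustrates the checks themselves.

```python
# Hypothetical randomly rounded counts: one row per child geography,
# one column per component category of a single theme.
components = {
    "CSD A": [120, 60, 25],
    "CSD B": [300, 155, 40],
    "CSD C": [85, 30, 10],
}
theme_totals = {"CSD A": 205, "CSD B": 500, "CSD C": 125}  # published total per child
parent_components = [510, 245, 75]                          # published parent values

# Horizontal check: sum of the components minus the published theme total.
row_diffs = {
    name: sum(values) - theme_totals[name]
    for name, values in components.items()
}

# Vertical check: sum over the children minus the parent value, per component.
col_diffs = [
    sum(column) - parent_value
    for column, parent_value in zip(zip(*components.values()), parent_components)
]

print(row_diffs)   # non-zero entries are the "yellow" differences
print(col_diffs)   # non-zero entries are the "blue" differences
```

Any non-zero entry marks a cell where the published table does not add up in that direction.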

These differences are remedied within EA’s Adjusted Census: all numbers from smaller geographies add up exactly to the larger parent geographies, from dissemination areas all the way up to the national total. Similarly, the percentages add up to 100% over the children geographies within the parent geographies. Also, the component variables always add up to their theme total. The product is developed on the premise of creating a square matrix in which all values add up to the theme totals both horizontally and vertically, while following general rules that preserve the reliable counts published by Statistics Canada. If you were always taught by your schoolteachers that no percentage should be greater than 100 (as I was), then the Adjusted Census is a product that will make you feel at home. Users can be carefree when manipulating Adjusted Census data, knowing that the total percentages will never be greater than 100%.
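For readers curious about what "adding up both horizontally and vertically" can look like in practice, here is a sketch using iterative proportional fitting (often called raking), a standard technique for scaling a table to agree with fixed row and column totals. This is not EA's methodology, which follows its own rules and is described in the technical documentation. The function name `rake` and all of the numbers are hypothetical, and the output is fractional, whereas a real product would also need whole-number counts.

```python
def rake(table, row_targets, col_targets, iterations=100):
    """Alternately scale rows and columns toward their target totals.

    Assumes sum(row_targets) == sum(col_targets) and that no cell
    needing a positive value starts at zero. Returns fractional values.
    """
    cells = [[float(v) for v in row] for row in table]
    for _ in range(iterations):
        # Scale each row so that it sums to its target (horizontal step).
        for i, target in enumerate(row_targets):
            total = sum(cells[i])
            if total > 0:
                cells[i] = [v * target / total for v in cells[i]]
        # Scale each column so that it sums to its target (vertical step).
        for j, target in enumerate(col_targets):
            total = sum(row[j] for row in cells)
            if total > 0:
                for row in cells:
                    row[j] *= target / total
    return cells

# Hypothetical rounded counts: rows are child geographies, columns are components.
table = [[120, 60, 25], [300, 155, 40], [85, 30, 10]]
row_targets = [205, 500, 125]   # theme total wanted for each child
col_targets = [510, 245, 75]    # parent value wanted for each component

adjusted = rake(table, row_targets, col_targets)
row_sums = [sum(row) for row in adjusted]
col_sums = [sum(row[j] for row in adjusted) for j in range(len(col_targets))]
print(row_sums, col_sums)
```

After enough iterations, both sets of sums match their targets to within floating-point error, which is the "square" property described above.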

So what you get in the Adjusted Census is the above table revised as follows.

[Table image (sandrablog1): the same table as revised in the Adjusted Census]

You can see that all totals are equal horizontally and vertically. This table also shows that one of the CSD children was imputed, based on strong evidence that it should not have been all zeros for this theme. (Imputation methodology is another complex topic and will not be discussed here.) Note that the parent CD total has also changed from the previous table’s total by 10. Rather than keeping a separate, randomly rounded total specific to each theme and adjusting the component variables to add up to it, a ‘master’ total is created so that the same total applies across all themes. Accordingly, in this example, the female population 15+ would be equal to the sum of the female population by 5-year age cohorts for ages 15 and over.

There is a great deal to talk about when we consider the various intricacies that go into developing this dataset, but that is probably best left to the technical documentation.

–Sandra Albanese

Sandra Albanese is a Senior Research Analyst with the Custom Research team, focusing on the development of standard data products and custom projects. She holds an honours Bachelor of Science degree in environmental sciences from the University of Toronto, a certificate in geographic information systems from Mohawk College, and is now pursuing a Master of Spatial Analysis degree at Ryerson University.