What Is An Average?

Jun 22, 2011, 01:12 PM by Sean Howard
Averages are something that most of us use every day. But I am not entirely sure how deeply many of us think about what an average actually means (pun intended). So, to take us all back to Statistics 101 for a moment: an average is one attribute, among many, of a distribution of observations. Also known as the mean, it is one of three common measures of the central tendency of a distribution. The average is the value that, if every observation took that value, would produce the same sum across the distribution. The other two measures of central tendency are the mode and the median. The mode is the value that appears most often in a distribution. The median is the middle value: half of the observations fall below it and half fall above it. When there is an even number of observations, the median is the midpoint of the two middle values.

At this point, it might be useful to provide a simple example. So let’s work with the following set of observations.

Observation Value
1 10
2 25
3 25
4 30
5 32
6 35
7 38
8 40
9 42
10 50
Total 327

For this set of 10 observations, the mode is 25, the median is 33.5 (the midpoint of the two middle values, 32 and 35) and the average is 32.7. In other words, if each of the 10 observations had a value of 32.7, the same total of 327 would be obtained. But, based on the data, it is clear that many of the observations differ considerably from the average.
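As a quick check, Python's standard-library `statistics` module reproduces all three measures for these ten observations. This is a minimal sketch; the variable names are illustrative:

```python
import math
import statistics

# The ten observations from the table above
values = [10, 25, 25, 30, 32, 35, 38, 40, 42, 50]

mean = statistics.mean(values)      # sum / count = 327 / 10
median = statistics.median(values)  # even count: midpoint of the 5th and 6th sorted values
mode = statistics.mode(values)      # the most frequent value

print(f"mean={mean}, median={median}, mode={mode}")

# The defining property of the mean: replacing every observation
# with the mean leaves the total unchanged.
assert math.isclose(mean * len(values), sum(values))
```

This prints `mean=32.7, median=33.5, mode=25`, matching the figures above.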


Unweighted average vs. weighted average?

Averages are used widely in analytics. The average number of people per household, the average number of dollars per customer, the average number of minutes per visitor to a web page: all are popular metrics in the business world. In many cases, the observations that we work with in marketing analytics are already averages like those listed. At this point, it is probably useful to look at another example. Let’s consider the average household income for multiple neighbourhoods in a city, an important consideration for any restaurant or retail chain contemplating where to open new outlets.

There are two ways that an average can be derived for the data in the table below. An unweighted average is calculated by taking the average household income for each of the neighbourhoods in a city and computing the city average directly from those neighbourhood-level statistics, without considering the number of observations (households) behind each neighbourhood. This number is, in effect, the average of the averages. The unweighted average for these 10 neighbourhoods is $79,950.00.
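The unweighted calculation can be sketched in a few lines of Python, using the neighbourhood averages from the table below (variable names are illustrative):

```python
# Average household income for neighbourhoods 1-10, from the table below
avg_income = [50_000, 62_000, 75_000, 75_500, 76_000,
              80_000, 86_000, 90_000, 100_000, 105_000]

# Unweighted: a simple "average of the averages" that ignores
# how many households stand behind each neighbourhood figure
unweighted = sum(avg_income) / len(avg_income)
print(f"Unweighted average: ${unweighted:,.2f}")
```

This prints `Unweighted average: $79,950.00`.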

Alternatively, to derive the weighted, or true, average for these 10 neighbourhoods, we first have to weight each neighbourhood’s average by the number of households in that neighbourhood. This can be done in one of two ways. The easiest is to multiply each average income by the number of households in the neighbourhood (a.k.a. the aggregate income), sum the aggregate incomes for the 10 neighbourhoods and then divide that figure by the total number of households across the 10 neighbourhoods. Equivalently, each average income can be multiplied by its neighbourhood’s share of the total households and the results summed. Using either method, the true average for the collection of neighbourhoods is $84,888.16 ($187,857,500 divided by 2,213 households).

Neighbourhood   Average Household Income   Households   Aggregate Household Income
1               $50,000                    10           $500,000
2               $62,000                    50           $3,100,000
3               $75,000                    34           $2,550,000
4               $75,500                    493          $37,221,500
5               $76,000                    592          $44,992,000
6               $80,000                    49           $3,920,000
7               $86,000                    99           $8,514,000
8               $90,000                    324          $29,160,000
9               $100,000                   222          $22,200,000
10              $105,000                   340          $35,700,000
Total                                      2,213        $187,857,500
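Both weighting approaches can be sketched in Python using the figures in the table above: the aggregate-income method, and the equivalent approach of weighting each average by its neighbourhood’s share of all households. A minimal sketch; the variable names are illustrative:

```python
import math

avg_income = [50_000, 62_000, 75_000, 75_500, 76_000,
              80_000, 86_000, 90_000, 100_000, 105_000]
households = [10, 50, 34, 493, 592, 49, 99, 324, 222, 340]

# Method 1: aggregate income per neighbourhood, summed, then divided by total households
aggregate = [a * h for a, h in zip(avg_income, households)]
total_households = sum(households)
weighted = sum(aggregate) / total_households

# Method 2 (equivalent): weight each average by its neighbourhood's share of households
shares = [h / total_households for h in households]
weighted_by_share = sum(a * s for a, s in zip(avg_income, shares))

print(f"Total households: {total_households:,}")
print(f"Aggregate income: ${sum(aggregate):,}")
print(f"Weighted average: ${weighted:,.2f}")
assert math.isclose(weighted, weighted_by_share)
```

This prints a total of 2,213 households, an aggregate income of $187,857,500 and a weighted average of $84,888.16. On Python 3.11 or later, `statistics.fmean(avg_income, weights=households)` should give the same weighted average.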

From the example above, you can see that the unweighted and weighted averages can produce very different results. In this particular case, the weighted average is about 6% (nearly $5,000) higher than the unweighted version. In fact, there are two general cases in which the weighted and unweighted averages are guaranteed to produce the same result. First, if each of the neighbourhoods (or some other collection of observations) has the same number of observations, or weight, the two averages will be the same. Second, if every neighbourhood has the same average, then the city average, weighted or unweighted, will be the same. When working with real data, however, it is highly unlikely that analysts will confront either of these two cases.
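A quick numerical check of the two cases, using the neighbourhood figures from the table above. The helper names and the equal values (100 households, $80,000) are arbitrary illustrations. The last lines also check an identity that follows from a little algebra on the two definitions: the gap between the averages equals sum((w - w_mean) * (x - x_mean)) / sum(w), where w are the household counts and x the neighbourhood averages. The two cases above are the common ways that sum is guaranteed to be zero.

```python
import math

def unweighted_avg(avgs):
    """Average of the averages: every collection counts equally."""
    return sum(avgs) / len(avgs)

def weighted_avg(avgs, counts):
    """Each collection's average weighted by its number of observations."""
    return sum(a * c for a, c in zip(avgs, counts)) / sum(counts)

avg_income = [50_000, 62_000, 75_000, 75_500, 76_000,
              80_000, 86_000, 90_000, 100_000, 105_000]
households = [10, 50, 34, 493, 592, 49, 99, 324, 222, 340]

# Case 1: same number of households (100, chosen arbitrarily) in every neighbourhood
equal_counts = [100] * len(avg_income)
case1 = (weighted_avg(avg_income, equal_counts), unweighted_avg(avg_income))

# Case 2: same average income ($80,000, chosen arbitrarily) in every neighbourhood
equal_avgs = [80_000] * len(households)
case2 = (weighted_avg(equal_avgs, households), unweighted_avg(equal_avgs))

# Actual data: the two averages diverge
gap = weighted_avg(avg_income, households) - unweighted_avg(avg_income)

# General identity: gap == sum((w - w_mean) * (x - x_mean)) / sum(w),
# so the averages agree exactly when that sum is zero
w_mean = sum(households) / len(households)
x_mean = unweighted_avg(avg_income)
cross = sum((w - w_mean) * (x - x_mean)
            for w, x in zip(households, avg_income)) / sum(households)

print(f"Case 1: {case1[0]:,.2f} vs {case1[1]:,.2f}")
print(f"Case 2: {case2[0]:,.2f} vs {case2[1]:,.2f}")
print(f"Actual gap: ${gap:,.2f} ({gap / x_mean:.1%})")
assert math.isclose(gap, cross)
```

Both cases print matching pairs, while the actual data show a gap of $4,938.16, or 6.2% of the unweighted average.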

As a general rule, analysts should use the weighted average because it accounts for the individual observations used to derive the neighbourhood- or “collection”-level averages. In fact, in the vast majority of analytical problems where aggregate data are used, the weighted average is the most appropriate average to use. However, there are some peculiar cases where an unweighted average is suitable or the only possible solution: when the number of observations for each collection is unknown, when the number of observations is equal across collections, or when the average is the same for all collections. In the first case, when the weight (or number of observations) for each collection is unknown, the unweighted average should be used with caution. Other than in those instances, a weighted average is the best solution when working with aggregated data.
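The rule of thumb above can be expressed as a small helper. This is a sketch under our own assumptions: the function name `collection_average` and its fall-back-with-a-warning behaviour are hypothetical choices for illustration, not a standard API.

```python
import warnings
from typing import Optional, Sequence

def collection_average(averages: Sequence[float],
                       counts: Optional[Sequence[int]] = None) -> float:
    """Combine collection-level averages into one overall average (hypothetical helper).

    Uses the weighted average whenever observation counts are available and
    falls back to the unweighted "average of averages" only when they are not.
    """
    if not averages:
        raise ValueError("at least one collection average is required")
    if counts is None:
        # Counts unknown: the unweighted average is the only option,
        # but it can mislead if collection sizes differ, so flag it.
        warnings.warn("observation counts unknown; returning unweighted average")
        return sum(averages) / len(averages)
    if len(counts) != len(averages):
        raise ValueError("averages and counts must have the same length")
    total = sum(counts)
    if total <= 0:
        raise ValueError("total number of observations must be positive")
    return sum(a * c for a, c in zip(averages, counts)) / total

# Usage with the neighbourhood figures from the table above
avg_income = [50_000, 62_000, 75_000, 75_500, 76_000,
              80_000, 86_000, 90_000, 100_000, 105_000]
households = [10, 50, 34, 493, 592, 49, 99, 324, 222, 340]
print(f"${collection_average(avg_income, households):,.2f}")  # weighted
```

Called without `counts`, the helper returns the unweighted $79,950.00 and emits a warning, mirroring the caution advised above.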

As a Research Associate with Environics Analytics’ Custom Research team, Sean Howard develops custom products and client solutions using census demographics and geography. A trained urban geographer, he holds a master’s degree in geographic information systems from the University of Calgary and an honours bachelor’s degree in geographic information science and geography from Queen’s University.