Math and Science Solutions for Businesses/Mean and mode of unequal class size distributions

Question
1. How do you calculate the mean and mode of a frequency distribution when the class sizes are not equal? For example:

Class   Frequency
0-10    5
10-20   6
20-40   13
40-50   10
50-60   3

2. Also, while calculating the kurtosis or skewness of a distribution, can we simply find the standard deviation and raise it to the appropriate power?

Answer
1. Calculating the mean (aka expected value) of a distribution from its tabulated frequencies is done pretty much the same way whether the class sizes are equal or not. The key concept is to add together the representative values for each class, each multiplied by its "weight" (the relative frequency of the class). This agrees with the definition of the expected value, where each possible value of the random variable (RV) is multiplied by its relative frequency and the products are then summed.

In your example, you have 5 classes. A representative value for each class would just be the midpoint of the class interval. For instance, for the class from 10-20, the midpoint is 15; for 20-40, it is 30. Associated with each of these class values (which represent the possible values of the RV, as described above) is the number of samples of the RV in the interval for that class (e.g., for 10-20, represented by 15, it is how many values of the RV fall between 10 and 20, namely 6 in your example). This number of samples is a fraction, f(15), of the total number of samples, the fraction being (for your example)

f(15) = 6 / (5+6+13+10+3) = 6/37 = 0.162.

The mean is then calculated by summing, over all classes, the midpoint value times its weight, such as (15)*(0.162). The term for the 20-40 class would be (30)*(13/37), etc. For your example the sum is (5*5 + 15*6 + 30*13 + 45*10 + 55*3)/37 = 1120/37, or about 30.27.
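The weighted sum described above can be sketched in a few lines of Python. This is only an illustration using the numbers from the question; the function name `grouped_mean` and the choice of the class midpoint as the representative value are assumptions made for the sketch.

```python
# Class intervals and counts from the table in the question.
classes = [(0, 10), (10, 20), (20, 40), (40, 50), (50, 60)]
counts = [5, 6, 13, 10, 3]

def grouped_mean(classes, counts):
    """Mean of grouped data, using each class midpoint as its representative value."""
    total = sum(counts)
    midpoints = [(lo + hi) / 2 for lo, hi in classes]
    # Sum of (midpoint * relative frequency) over all classes.
    return sum(m * c / total for m, c in zip(midpoints, counts))

print(round(grouped_mean(classes, counts), 2))  # 30.27
```

Note that the class widths never enter the calculation of the mean except through the midpoints.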

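The mode needs more care when the class sizes are unequal, because a wide class collects more samples simply by being wide. Rather than comparing the raw number of samples, compare the frequency density of each class, i.e. the number of samples divided by the class width. In your example the 20-40 class has the highest count (13), but its density is only 13/20 = 0.65 per unit, whereas the 40-50 class has a density of 10/10 = 1.0 per unit. The modal class is therefore 40-50, with representative value 45. (Going by raw counts alone would give 30, which is only appropriate when all classes have the same width.)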
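Because the class widths differ here, it can be instructive to compare the class with the largest raw count against the class with the largest count per unit width (the frequency density, which is what the bar heights of a histogram with unequal classes normally show). The following is a minimal sketch; `modal_class` and its `use_density` flag are names invented for this illustration.

```python
# Class intervals and counts from the table in the question.
classes = [(0, 10), (10, 20), (20, 40), (40, 50), (50, 60)]
counts = [5, 6, 13, 10, 3]

def modal_class(classes, counts, use_density=True):
    """Return the (lo, hi) class with the largest density (or raw count)."""
    if use_density:
        # Samples per unit of class width.
        scores = [c / (hi - lo) for (lo, hi), c in zip(classes, counts)]
    else:
        scores = list(counts)
    best = max(range(len(classes)), key=lambda i: scores[i])
    return classes[best]

by_count = modal_class(classes, counts, use_density=False)
by_density = modal_class(classes, counts, use_density=True)
print(by_count, sum(by_count) / 2)      # (20, 40) 30.0
print(by_density, sum(by_density) / 2)  # (40, 50) 45.0
```

The two answers differ for this table: the 20-40 class has the most samples, but it is also twice as wide as its neighbours.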

2. It is not OK to just raise the standard deviation (SD) to the appropriate power to obtain the skewness and kurtosis. Powers of the SD do appear in the definitions, but only as normalizing factors: the skewness is the third central moment (the average cubed distance from the mean) divided by SD^3, and the kurtosis is the fourth central moment divided by SD^4. For many (most?) distributions for which a formula is available, such as the normal or binomial distributions, the SD, skewness and kurtosis can each be written in terms of the parameters of the distribution, because these distributions are completely described by their parameters and formula. Even then, the skewness and kurtosis are generally not just the SD raised to an appropriate power.
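The binomial case makes a convenient check. For a binomial distribution with n trials and success probability p (q = 1 - p), the standard closed forms are variance npq, skewness (1 - 2p)/sqrt(npq) and excess kurtosis (1 - 6pq)/(npq), where excess kurtosis is the kurtosis minus 3. The sketch below, with the illustrative helper name `binomial_shape` and arbitrarily chosen parameter values, prints these next to SD^3 so the mismatch is visible.

```python
import math

def binomial_shape(n, p):
    """SD, skewness and excess kurtosis of a binomial(n, p) distribution."""
    q = 1 - p
    var = n * p * q
    sd = math.sqrt(var)
    skew = (1 - 2 * p) / sd
    excess_kurt = (1 - 6 * p * q) / var
    return sd, skew, excess_kurt

# Parameter values chosen only for illustration.
for n, p in [(20, 0.5), (20, 0.1)]:
    sd, skew, ex_kurt = binomial_shape(n, p)
    print(f"n={n} p={p}: SD={sd:.3f} SD^3={sd**3:.3f} "
          f"skewness={skew:.3f} excess kurtosis={ex_kurt:.3f}")
```

With p = 0.5 the distribution is symmetric, so the skewness is zero no matter how large the SD is.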

This is particularly true for an empirical relative frequency distribution, as represented by a histogram, for 2 reasons. First, the SD is a measure of the symmetric spread about the mean, using the square of the distance from the mean, so any asymmetry, aka skewness, will not be captured by the SD. Second, squaring means that outliers are weighted by their squared distance, whereas the kurtosis uses the 4th power of the distance, so that, while it too is symmetric, it weights the outliers more heavily. In short, you could have a distribution that is very skewed and/or has very long tails (high kurtosis), and this would not be revealed by the SD.
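For a tabulated distribution like the one in the question, the skewness and kurtosis have to be computed from the third and fourth central moments of the data, with powers of the SD used only to normalize them. A rough sketch follows, again assuming class midpoints as representative values; `grouped_moments` is a name made up for this example, and the kurtosis returned is the plain (not excess) kurtosis, for which a normal distribution gives 3.

```python
import math

# Class intervals and counts from the table in the question.
classes = [(0, 10), (10, 20), (20, 40), (40, 50), (50, 60)]
counts = [5, 6, 13, 10, 3]

def grouped_moments(classes, counts):
    """Mean, SD, skewness and kurtosis of grouped data (midpoint approximation)."""
    total = sum(counts)
    mids = [(lo + hi) / 2 for lo, hi in classes]
    weights = [c / total for c in counts]
    mean = sum(w * m for w, m in zip(weights, mids))

    def central(k):
        # k-th central moment: weighted average of (value - mean)**k.
        return sum(w * (m - mean) ** k for w, m in zip(weights, mids))

    sd = math.sqrt(central(2))
    skewness = central(3) / sd ** 3   # SD^3 only normalizes the 3rd moment
    kurtosis = central(4) / sd ** 4   # SD^4 only normalizes the 4th moment
    return mean, sd, skewness, kurtosis

mean, sd, skewness, kurtosis = grouped_moments(classes, counts)
print(f"mean={mean:.2f} SD={sd:.2f} skewness={skewness:.2f} kurtosis={kurtosis:.2f}")
```

Two tables with the same SD can give quite different skewness and kurtosis values, which is why the SD alone is not enough.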

Hope this helps.

Randy Patton

Expertise

Questions regarding application of mathematical techniques and knowledge of physics and engineering principles to product and services design, optimization, prediction, feasibility and implementation. Examples include sales and product performance projections based on math/physics models in addition to standard regression; practical and cost effective sensor design and component configuration; optimal resource allocation using common tools (e.g., MS Office); advanced data analysis techniques and implementation; simulation and "what if" analysis; and innovative applications of remote sensing.

Experience

26 years as professional physical scientist and project manager for elite research company providing academic quality basic and applied research for government and defense industry clients (currently retired). Projects I have been involved in include:
- Notional sensor performance predictions for detecting underwater phenomena
- Designing and testing guidance algorithms for multi-component system
- Statistical analysis of ship tracking data and development of anomaly detector
- Deployed vibration sensors in Arctic ice floes; analysis of data
- Developed and tested ocean optical instrument to measure particles
- Field testing of prototype sonar system
- Analysis of synthetic aperture radar system data for ocean surface measurements
- Redesigned dust shelters for greeters at Burning Man Festival
Project management with responsibility for allocation and monitoring of staff and equipment resources.

Publications
“A Numerical Model for Low-Frequency Equatorial Dynamics” (with Mark A. Cane), J. of Phys. Oceanogr., 14, No. 12, pp. 1853-1863, December 1984.

Education/Credentials
MIT, MS Physical Oceanography, 1981
UC Berkeley, BS Applied Math, 1976

Past/Present Clients
Am also an Expert in Advanced Math and Oceanography

©2016 About.com. All rights reserved.