Basic Math/7th grade math
Expert: Josh - 9/27/2005
Question
I am 49 years old and am trying my best to help a 13-year-old boy who is staying with us with his math, since his teacher, who is paid to do this, won't take the time. We are doing the median, mode and mean of numbers. I just learned tonight how to do them and what they mean! BUT what I can't find out is what an outlier is and how it is obtained from an equation. Please HELP. Thank you, Josh, for your time. Robert
Answer
Hi Robert,
Allow me to quickly recap the definitions of these terms.
Given a data set (or observations) X,
"Mode" is the most frequently occurring score in X.
"Mean" is essentially the average. If there are N elements in X, then the mean is the sum of all elements divided by N.
"Median" is the middle score obtained from a sequence of data points sorted in increasing (or decreasing) order.
If the number of elements in X is even, we circle the two data points in the middle, and find their average.
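If you would like to check these three by computer, here is a minimal Python sketch using the standard "statistics" module. The list "scores" is made-up data for illustration only; it is not from your homework.

```python
from statistics import mean, median, multimode

# Hypothetical scores, already sorted so the middle is easy to see.
scores = [3, 5, 5, 6, 8, 9]

avg = mean(scores)         # (3+5+5+6+8+9)/6 = 36/6 = 6
mid = median(scores)       # even count, so average the two middle values: (5+6)/2 = 5.5
modes = multimode(scores)  # list of the most frequent value(s): [5]

print(avg, mid, modes)
```

Note that multimode returns a list, because a data set can have more than one mode.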
"Outliers" refer to data points which do not conform to the overall trend. That is, observed values which are way smaller or larger than the bulk of the data are deemed to be "outliers".
Visually, we can regard points A, B and K in the following example as outliers.
A__B_______________C_D_E__F__G_H___I_J_________________K
Suppose we order N observations in a data set in increasing order. The "lower fourth" and "upper fourth" are given by
Case 1: if N is even
"Lower fourth":= Median of the smallest N/2 observations.
"Upper fourth":= Median of the largest N/2 observations.
Case 2: if N is odd
"Lower fourth":= Median of the smallest (N+1)/2 observations.
"Upper fourth":= Median of the largest (N+1)/2 observations.
A measure of spread that is resistant to outliers is the so-called "fourth spread", f_s, given by
f_s = "Upper fourth" - "Lower fourth" .....[#1]
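The two cases above can be sketched in Python as follows. The function name "fourths" and the sample data are my own choices for illustration. The integer division (n + 1) // 2 gives N/2 when N is even and (N+1)/2 when N is odd, so one line covers both cases.

```python
from statistics import median

def fourths(data):
    """Return (lower fourth, upper fourth) as defined in the text."""
    xs = sorted(data)
    n = len(xs)
    half = (n + 1) // 2              # N/2 if N is even, (N+1)/2 if N is odd
    lower = median(xs[:half])        # median of the smallest half
    upper = median(xs[n - half:])    # median of the largest half
    return lower, upper

# Hypothetical data with N = 7 (odd), so the middle value 4 is in both halves.
sample = [1, 2, 3, 4, 5, 6, 7]
lower_fourth, upper_fourth = fourths(sample)
f_s = upper_fourth - lower_fourth    # fourth spread
print(lower_fourth, upper_fourth, f_s)
```

For this sample the halves are 1,2,3,4 and 4,5,6,7, so the fourths are 2.5 and 5.5 and the fourth spread is 3.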
Using this definition, data points which lie outside the interval [Lower_Fourth-1.5*f_s, Upper_Fourth+1.5*f_s] may be regarded as "outliers". Furthermore, if a point lies outside the interval from Lower_Fourth-3*f_s to Upper_Fourth+3*f_s, it is referred to as an "extreme outlier".
Picture something like this.
______X______Y___Z___M__A______Z'__Y'_____X'______>
Here, point A represents the "mean" conceptually; M represents the "median" conceptually.
Let Z=Lower_Fourth, Z'=Upper_Fourth,
Let Y=Z-1.5*f_s, Y'=Z'+1.5*f_s,
Let X=Z-3*f_s, X'=Z'+3*f_s.
Most data points will fall in the interval between Z and Z'.
If a data point is in the interval [X,Y] or [Y',X'] (meaning between X and Y, OR between Y' and X'), it is a "mild outlier".
If a data point is in the interval [-infinity,X] or [X',+infinity], it is an "extreme outlier".
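The rule can be written as a small Python function. This is only a sketch: the name "classify" is hypothetical, and I have assumed that a point sitting exactly on a 1.5*f_s or 3*f_s boundary is not counted as the more severe category, since the text speaks of points lying outside the interval.

```python
def classify(x, lower_fourth, upper_fourth):
    """Label x as an extreme outlier, a mild outlier, or neither."""
    f_s = upper_fourth - lower_fourth          # fourth spread
    # Beyond 3 fourth spreads from the nearer fourth: extreme outlier.
    if x < lower_fourth - 3 * f_s or x > upper_fourth + 3 * f_s:
        return "extreme outlier"
    # Beyond 1.5 fourth spreads (but within 3): mild outlier.
    if x < lower_fourth - 1.5 * f_s or x > upper_fourth + 1.5 * f_s:
        return "mild outlier"
    return "not an outlier"

# Hypothetical fourths 4 and 6, so f_s = 2, Y = 1, Y' = 9, X = -2, X' = 12.
print(classify(5, 4, 6))    # inside [1, 9]
print(classify(10, 4, 6))   # between Y' = 9 and X' = 12
print(classify(13, 4, 6))   # beyond X' = 12
```

The extreme check comes first because every extreme outlier also lies outside the 1.5*f_s interval.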
For example, suppose N=22 data points are observed from an experiment.
The smallest half is given by
2.68,3.06,4.31,4.71,5.71,5.99,6.06,7.04,7.17,7.46,7.5
The largest half is given by
8.27,8.42,8.73,8.84,9.14,9.19,9.21,9.39,11.28,15.19,21.06
The median is given by Median=(7.5+8.27)/2=7.885
Lower fourth =5.99
Upper fourth =9.19
f_s=9.19-5.99=3.20
1.5*f_s=4.80
3.0*f_s=9.60
Y'=Upper_Fourth+1.5*f_s=13.99
X'=Upper_Fourth+3.0*f_s=18.79
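So 15.19 is a mild outlier (it lies between Y'=13.99 and X'=18.79), and 21.06 is an extreme outlier (it lies beyond X'=18.79).
On the low side, Y=Lower_Fourth-1.5*f_s=1.19, and no data point is smaller than that, so there are no low outliers.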
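If you want to double-check this example by computer, the following Python sketch repeats the steps on the same 22 numbers. The variable names are my own, and points exactly on a boundary are assumed to belong to the milder category.

```python
from statistics import median

# The two halves of the sorted data, copied from the example.
low_half = [2.68, 3.06, 4.31, 4.71, 5.71, 5.99, 6.06, 7.04, 7.17, 7.46, 7.5]
high_half = [8.27, 8.42, 8.73, 8.84, 9.14, 9.19, 9.21, 9.39, 11.28, 15.19, 21.06]
data = low_half + high_half

lower_fourth = median(low_half)    # middle (6th) value of the smallest half
upper_fourth = median(high_half)   # middle (6th) value of the largest half
f_s = upper_fourth - lower_fourth  # fourth spread

y_lo, y_hi = lower_fourth - 1.5 * f_s, upper_fourth + 1.5 * f_s  # Y and Y'
x_lo, x_hi = lower_fourth - 3 * f_s, upper_fourth + 3 * f_s      # X and X'

# Mild: outside [Y, Y'] but still within [X, X'].  Extreme: outside [X, X'].
mild = [x for x in data if x_lo <= x < y_lo or y_hi < x <= x_hi]
extreme = [x for x in data if x < x_lo or x > x_hi]

print("mild outliers:", mild)
print("extreme outliers:", extreme)
```

This lists 15.19 as the only mild outlier and 21.06 as the only extreme outlier, in agreement with the hand calculation.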
Cheers.