
Question
QUESTION: 1. A random sample of 1000 workers from south India shows that their mean wage is Rs. 47/- per week with a standard deviation of Rs. 28/-. A random sample of 1500 workers from north India gives a mean wage of Rs. 49/- per week with a standard deviation of Rs. 40/-. Is there any significant difference between their mean levels of wages?
2. In a test given to two groups of students drawn from two normal populations, the marks obtained were as follows:
Group A   18   20   36   50   49   36   34   49   41
Group B   29   28   26   35   30   44   46   -   -
Examine at 5% level, whether the two populations have the same variance.
3.   From the following data
X   25   22   28   26   35   20   22   40   20   18
Y   18   15   20   17   22   14   16   21   15   14
a) Calculate regression equation X on Y and Y on X.
b) Estimate X when Y = 20
c) Estimate Y when X=20
d) Calculate Karl Pearson's coefficient of
correlation
e) Check whether Regression line is a good fit.
4. An index is at 100 in 1991. It rises 5% in 1992, falls 6% in 1993, falls 5% in 1994, rises 4% in 1995, and rises 7% in 1996. Calculate the index numbers for all these years with 1991 as base.

Can you please answer any two questions? That would really help to complete my assignments in MBA.

ANSWER: 1. For this you need to assume that the difference in the means is normally distributed (or close enough) and use the standard deviation of this difference to see whether it falls within an appropriate confidence interval.

Let the mean of the 1st group be m1 = 47, std dev s1 = 28, for n1 = 1000 workers;
for the 2nd group, m2 = 49, s2 = 40, n2 = 1500.

Then the test statistic for the difference of the means is

X = (m2 - m1)/S, where S = sqrt{s1^2/n1 + s2^2/n2} = 1.36, so X = 2/1.36 = 1.47. Under the null hypothesis, X is normally distributed with zero mean and unit variance.

The means are considered to be from the same distribution (i.e., not significantly different) at the 5% level of significance if |X| < 1.96. Since 1.47 < 1.96, they are not significantly different.
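As a numerical check, the same test statistic can be computed with a short Python sketch (my own illustration of the formula above, not part of the original answer):

```python
from math import sqrt

def two_sample_z(m1, s1, n1, m2, s2, n2):
    """Z statistic for the difference of two sample means
    (large samples, so the normal approximation applies)."""
    se = sqrt(s1**2 / n1 + s2**2 / n2)  # std dev of (m2 - m1)
    return (m2 - m1) / se

# Numbers from question 1 (wages in rupees per week)
z = two_sample_z(47, 28, 1000, 49, 40, 1500)
print(round(z, 2))  # → 1.47, below 1.96, so not significant at the 5% level
```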

2. To check whether the variances are different, you need to use the F distribution and the statistic

X = { [n1/(n1-1)]·s1^2 } / { [n2/(n2-1)]·s2^2 },

where s1 = std dev of the 1st group and n1 = number of samples; likewise for the 2nd group.

Plugging in the numbers you listed, I get X ≈ 2.20 (the unbiased sample variances are 141.75 for group A and 64.33 for group B). The 5% critical value of the F distribution with (8, 6) degrees of freedom is about 4.15, so the variances would not be considered significantly different at this level.
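A quick Python sketch of the same calculation (using unbiased sample variances, i.e., the n/(n-1) correction applied to the population variances):

```python
def sample_variance(data):
    """Unbiased sample variance, i.e. [n/(n-1)] times the population variance."""
    n = len(data)
    m = sum(data) / n
    return sum((x - m) ** 2 for x in data) / (n - 1)

group_a = [18, 20, 36, 50, 49, 36, 34, 49, 41]
group_b = [29, 28, 26, 35, 30, 44, 46]

# Ratio of the larger variance to the smaller, to be compared against
# the F table with (8, 6) degrees of freedom
f = sample_variance(group_a) / sample_variance(group_b)
print(round(f, 2))  # ≈ 2.2
```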

3. Using the numbers you have for X and Y, I used Excel to make the plot in the attached image. Note the equations for the fit of X vs. Y and for Y vs. X; these equations can be used to answer b) and c). The regression fit (part e)) gives R^2 ≈ 0.86, so the line is a reasonably good fit. I'll leave part d) to you.
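For anyone without Excel handy, the same least-squares fits can be reproduced in a few lines of Python (a sketch using the standard LS formulas, nothing Excel-specific):

```python
def least_squares(xs, ys):
    """Slope and intercept of the LS line y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sxy / sxx
    return a, my - a * mx

X = [25, 22, 28, 26, 35, 20, 22, 40, 20, 18]
Y = [18, 15, 20, 17, 22, 14, 16, 21, 15, 14]

a_yx, b_yx = least_squares(X, Y)   # regression of Y on X
a_xy, b_xy = least_squares(Y, X)   # regression of X on Y
print(round(a_yx * 20 + b_yx, 1))  # estimate of Y when X = 20 → 15.0
print(round(a_xy * 20 + b_xy, 1))  # estimate of X when Y = 20 → 31.8
```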

4. Starting with 100 in 1991:

+5% -> (1.05)·100 = 105 in 1992
-6% -> (100% - 6% = 94%)·105 = 98.7 in 1993
-5% -> (0.95)·98.7 ≈ 93.8 in 1994
+4% -> (1.04)·93.8 ≈ 97.5 in 1995
+7% -> (1.07)·97.5 ≈ 104.3 in 1996

(intermediate values carried at full precision before rounding: 93.765, 97.516, 104.342).
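The whole chain can be verified with a few lines of Python (a sketch of the year-by-year compounding):

```python
# Fractional changes for 1992 through 1996, with 1991 = 100 as base
changes = [0.05, -0.06, -0.05, 0.04, 0.07]

index = 100.0
for year, f in zip(range(1992, 1997), changes):
    index *= 1 + f  # each year's index compounds on the previous year's
    print(year, round(index, 1))
# prints 105.0, 98.7, 93.8, 97.5, 104.3 for 1992-1996
```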

The solutions to these problems are easily found in probability and statistics books or online. Let me know if you want more information on understanding them. Your problems look like homework problems, so the above is probably sufficient.

Randy

---------- FOLLOW-UP ----------

QUESTION: These questions are for my assignments and I would like them explained in detail.

Thank you,
Sai

OK, a little more explanation. I assume you have course materials that you can refer to so I won't try to write a textbook here. In the future, more focused questions would be appreciated.

1. This is a standard technique where a test statistic, X, is defined that has a known (or assumed) probability distribution with given parameters. The value of X is then compared with tabulated values (or values calculated using numerical techniques) of the distribution to see how far out in the tails (away from the mean) the value lies. If the value lies far away from the mean, then the hypothesis that X belongs to the distribution is rejected at some confidence level (probability).

In many cases (such as this one), the test statistic is a so-called standardized variable which is assumed to be normally distributed (Gaussian) with zero mean and unit variance. This is done by "normalizing" the random variable (RV) under consideration: subtracting from it the distribution mean and dividing by the standard deviation.

In the present case, the question is whether 2 calculated means are significantly different. The test statistic is defined as the difference in the means, m2 - m1, divided by an estimate of the std dev of this difference, S. This std dev (= sqrt(variance)) is given by a simple extension of the well-known result that the variance of the mean is

Vmean = Vpop/n

where Vpop is the familiar variance of the population from which the mean is calculated and n is the number of samples. This result for the SD of the mean is very important and should be well understood. It says that the estimate of the mean, obtained by summing the random variables of the population, becomes better and better (smaller variance) as more samples are included in the estimate (larger n).
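The 1/n shrinkage of the variance of the mean is easy to verify numerically. Here is a small Monte Carlo sketch (my illustration, not part of the original exchange), using uniform(0,1) samples, whose population variance is 1/12:

```python
import random

random.seed(1)

def sd_of_sample_mean(n, trials=20000):
    """Empirical std dev of the mean of n uniform(0,1) samples."""
    means = [sum(random.random() for _ in range(n)) / n for _ in range(trials)]
    mu = sum(means) / trials
    return (sum((m - mu) ** 2 for m in means) / trials) ** 0.5

# Theory says SD of the mean = sqrt(Vpop/n) = sqrt(1/(12*n));
# the empirical and theoretical values should agree closely.
for n in [4, 16, 64]:
    print(n, round(sd_of_sample_mean(n), 3), round((1 / (12 * n)) ** 0.5, 3))
```

Each fourfold increase in n halves the standard deviation of the mean, exactly as Vmean = Vpop/n predicts.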

Since we are dealing with means from 2 populations, the variance S^2 must combine the variances of the 2 populations, as shown in the previous answer. The test statistic becomes

X = (m2 - m1)/S.

If |X| ≤ 1.96 (i.e., the difference in the means is less than 1.96 standard deviations from 0), then the means are considered to be from the same population at the 0.05 significance level.

2. A similar approach is taken here, where now the test statistic, X, is the ratio of variances from the 2 populations. Here, X is modeled as the ratio of 2 chi-squared RVs, which is given by an F distribution (details of where the F distribution comes from are beyond the scope here, but its values can be tabulated/calculated; the RVs are chi-squared distributed because the calculated variances are sums of squared Gaussian RVs). Again, the test statistic is compared to the appropriate distribution (i.e., F) and judged as to whether it is too "rare" (too far out in the tail).

3. The image I sent you shows the cross-plots of X vs. Y and Y vs. X. When people use the term "regression" in this simple context, they usually mean fitting a line through the points in some optimal way. Almost invariably, this line is determined by the method of least squares (LS), which is a standard technique (that you should already be aware of). The MS Office application Excel has a tool for calculating this LS line, as shown in the image.

The LS lines are, of course, described by a slope and an intercept. For parts b) and c), all you need to do is choose the appropriate equation for X vs. Y or Y vs. X, multiply the independent variable (whose value will be 20) by the slope and add the intercept. This is just arithmetic.

The correlation coefficient measures how well x and y are linearly related (how good the linear LS fit is) and is defined in a standard way. For data points x and y and with the definition < > = average, we have

r = {<xy> - <x><y>} / sqrt{<(x-<x>)^2>·<(y-<y>)^2>}, that is, the covariance of the RVs divided by the product of their std devs. Thus if x and y vary together closely (pairs x and y move up and down together), the covariance approaches the product of the std devs, so that r ≈ 1 and r^2 ≈ 1.
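For completeness, here is that definition in code (a plain-Python sketch; the data are the X and Y values from question 3, which also answers part d)):

```python
def pearson_r(xs, ys):
    """Karl Pearson correlation: covariance over the product of std devs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = (sum((x - mx) ** 2 for x in xs) / n) ** 0.5
    sy = (sum((y - my) ** 2 for y in ys) / n) ** 0.5
    return cov / (sx * sy)

X = [25, 22, 28, 26, 35, 20, 22, 40, 20, 18]
Y = [18, 15, 20, 17, 22, 14, 16, 21, 15, 14]
print(round(pearson_r(X, Y), 3))  # → 0.926
```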

4. The change in the index from year to year is just

index_(n+1) = (1 + f)·index_n

where the fractional change f = (% change)/100.

Hope this helps.

Math and Science Solutions for Businesses

Volunteer

#### Randy Patton

##### Expertise

Questions regarding application of mathematical techniques and knowledge of physics and engineering principles to product and services design, optimization, prediction, feasibility and implementation. Examples include sales and product performance projections based on math/physics models in addition to standard regression; practical and cost-effective sensor design and component configuration; optimal resource allocation using common tools (e.g., MS Office); advanced data analysis techniques and implementation; simulation and "what if" analysis; and innovative applications of remote sensing.

##### Experience

26 years as a professional physical scientist and project manager for an elite research company providing academic-quality basic and applied research for government and defense industry clients (currently retired). Projects I have been involved in include:
- Notional sensor performance predictions for detecting underwater phenomena
- Designing and testing guidance algorithms for a multi-component system
- Statistical analysis of ship tracking data and development of an anomaly detector
- Deployed vibration sensors in Arctic ice floes; analysis of data
- Developed and tested an ocean optical instrument to measure particles
- Field testing of a prototype sonar system
- Analysis of synthetic aperture radar system data for ocean surface measurements
- Redesigned dust shelters for greeters at Burning Man Festival
Project management with responsibility for allocation and monitoring of staff and equipment resources.

Publications
“A Numerical Model for Low-Frequency Equatorial Dynamics” (with Mark A. Cane), J. of Phys. Oceanogr., 14, No. 12, pp. 1853-1863, December 1984.

Education/Credentials
MIT, MS Physical Oceanography, 1981
UC Berkeley, BS Applied Math, 1976

Past/Present Clients
Am also an Expert in Advanced Math and Oceanography