Question
1.   “Statistical unit is necessary not only for the collection of data, but also for the interpretation and presentation”. Explain the statement.
2.   Find the standard deviation and coefficient of skewness for the following distribution
Variable    0-5   5-10   10-15   15-20   20-25   25-30   30-35   35-40
Frequency   2   5   7   13   21   16   8   3

3.   A salesman has a 60% chance of making a sale to any one customer. The behaviour of successive customers is independent. If two customers A and B enter, what is the probability that the salesman will make a sale to A or B.

4.   To verify whether a course in Research Methodology improved performance, a similar test was given to 12 participants before and after the course. The original marks and after the course marks are given below:
Original Marks   44   40   61   52   32   44   70   41   67   72   53   72
Marks after the course   53   38   69   57   46   39   73   48   73   74   60   78
Was the course useful? Consider these 12 participants as a sample from a population.
5.   Write short notes on
a)   Bernoulli Trials
b)   Standard Normal distribution
c)   Central Limit theorem

1.   “Statistical unit is necessary not only for the collection of data, but also for the interpretation and presentation”. Explain the statement.

Solution: Statistics may be defined as the science of collection, presentation, analysis and interpretation of numerical data from the logical analysis.

The word ‘statistics’ is used to refer to:
-   Numerical facts, such as the number of people living in a particular area.
-   The study of ways of collecting, analyzing and interpreting the facts.

Statistics is the study of the collection, organization, analysis, interpretation and presentation of data. It deals with all aspects of data, including the planning of data collection in terms of the design of surveys and experiments.

The word statistics, when referring to the scientific discipline, is singular, as in "Statistics is an art." This should not be confused with the word statistic, referring to a quantity (such as mean or median) calculated from a set of data, whose plural is statistics ("this statistic seems wrong" or "these statistics are misleading").

The plural sense of statistics means some sort of statistical data. In this sense it refers to the numerical description of quantitative aspects of things. These descriptions may take the form of counts or measurements. For example, statistics of the students of a college include a count of the number of students, and separate counts of various categories, such as males and females, married and unmarried, or undergraduates and post-graduates. They may also include measurements such as their heights and weights.

The large volume of numerical information (or data) gives rise to the need for systematic methods which can be used to collect, organise or classify, present, analyse and interpret the information effectively for the purpose of making wise decisions. Statistical methods include all those devices of analysis and synthesis by means of which statistical data are systematically collected and used to explain or describe a given phenomenon.

Statistical unit is necessary not only for the collection of data, but also for the interpretation and presentation. According to this statement there are four stages:

1)   Collection of Data: This is the first step and the foundation on which the entire analysis rests. Careful planning is essential before collecting the data. There are different methods of collecting data, such as census, sampling, primary, secondary, etc., and the investigator should use the appropriate method.

2)   Presentation of data: The mass of data collected should be presented in a suitable, concise form for further analysis. The collected data may be presented in tabular, diagrammatic or graphic form.

3)   Analysis of data: The data presented should be carefully analyzed, using measures such as central tendency, dispersion, correlation and regression, so that inferences can be drawn from them.

4)   Interpretation of data: The final step is drawing conclusion from the data collected. A valid conclusion must be drawn on the basis of analysis. A high degree of skill and experience is necessary for the interpretation.

By now you may have realised that effective decisions have to be based upon realistic data. The field of statistics provides the methods for collecting, presenting and meaningfully interpreting the given data. Statistical Methods broadly fall into three categories as shown in the following chart.

Fact becomes knowledge, when it is used in the successful completion of a decision process. Once you have a massive amount of facts integrated as knowledge, then your mind will be superhuman in the same sense that mankind with writing is superhuman compared to mankind before writing. The following figure illustrates the statistical thinking process based on data in constructing statistical models for decision making under uncertainties.

The above figure depicts the fact that as the exactness of a statistical model increases, the level of improvements in decision-making increases. That's why we need statistical data analysis. Statistical data analysis arose from the need to place knowledge on a systematic evidence base. This required a study of the laws of probability, the development of measures of data properties and relationships, and so on.

Statistical inference aims at determining whether any statistical significance can be attached to results after due allowance is made for any random variation as a source of error. Intelligent and critical inferences cannot be made by those who do not understand the purpose, the conditions, and applicability of the various techniques for judging significance.

Considering the uncertain environment, the chance that "good decisions" are made increases with the availability of "good information." The chance that "good information" is available increases with the level of structuring the process of Knowledge Management. The above figure also illustrates the fact that as the exactness of a statistical model increases, the level of improvements in decision-making increases.

Statistics is a science that assists you in making decisions under uncertainty (based on some numerical and measurable scales). The decision-making process must be based on data, not on personal opinion or belief.

It is already an accepted fact that "Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write." So, let us be ahead of our time.

Statistics can also be defined as a body of techniques and procedures dealing with the collection, organization, analysis, interpretation, and presentation of information that can be stated numerically.

Perhaps an example will clarify this definition. Say, for example, we wanted to know the level of job satisfaction nurses experience working on various units within a particular hospital (i.e. psychiatric, cardiac care, obstetrics, etc.). The first thing we would need to do is collect some data. We might have all the nurses on a particular day complete a job satisfaction questionnaire. We could ask such questions as "On a scale of 1 (not satisfied) to 10 (highly satisfied), how satisfied are you with your job?". We might examine employee turnover rates for each unit during the past year. We could also examine absentee records for a two-month period, as decreased job satisfaction is correlated with higher absenteeism. Once we have collected the data, we would then organize it. In this case, we would organize it by nursing unit.

Descriptive statistics are used to organize or summarize a particular set of measurements. In other words, a descriptive statistic will describe that set of measurements. For example, in our study above, the mean described the absenteeism rates of five nurses on each unit. The U.S. census represents another example of descriptive statistics. In this case, the information that is gathered concerning gender, race, income, etc. is compiled to describe the population of the United States at a given point in time. A baseball player's batting average is another example of a descriptive statistic. It describes the baseball player's past ability to hit a baseball at any point in time. What these three examples have in common is that they organize, summarize, and describe a set of measurements.

Inferential statistics use data gathered from a sample to make inferences about the larger population from which the sample was drawn. For example, we could take the information gained from our nursing satisfaction study and make inferences to all hospital nurses. We might infer that cardiac care nurses as a group are less satisfied with their jobs as indicated by absenteeism rates. Opinion polls and television ratings systems represent other uses of inferential statistics. For example, a limited number of people are polled during an election and then this information is used to describe voters as a whole.

Traditionally, statistics was concerned with drawing inferences using a semi-standardized methodology that was "required learning" in most sciences. This has changed with use of statistics in non-inferential contexts. What was once considered a dry subject, taken in many fields as a degree-requirement, is now viewed enthusiastically. Initially derided by some mathematical purists, it is now considered essential methodology in certain areas.

•   In number theory, scatter plots of data generated by a distribution function may be transformed with familiar tools used in statistics to reveal underlying patterns, which may then lead to hypotheses.

•   Methods of statistics including predictive methods in forecasting are combined with chaos theory and fractal geometry to create video works that are considered to have great beauty.

•   The process art of Jackson Pollock relied on artistic experiments whereby underlying distributions in nature were artistically revealed. With the advent of computers, statistical methods were applied to formalize such distribution-driven natural processes to make and analyze moving video art.

•   Methods of statistics may be used predictively in performance art, as in a card trick based on a Markov process that only works some of the time, the occasion of which can be predicted using statistical methodology.

•   Statistics can be used to predictively create art, as in the statistical or stochastic music invented by Iannis Xenakis, where the music is performance-specific. Though this type of artistry does not always come out as expected, it does behave in ways that are predictable and tunable using statistics.

Thus we can say “Statistical unit is necessary not only for the collection of data, but also for the interpretation and presentation.”

2.   Find the standard deviation and coefficient of skewness for the following distribution

Variable   0-5   5-10   10-15   15-20   20-25   25-30   30-35   35-40
Frequency   2   5   7   13   21   16   8   3

Solution:-
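
One way to carry out this computation is sketched below in Python. It is only a sketch under stated assumptions: each class is represented by its mid-point, the standard deviation uses the divisor N (the usual convention for a frequency distribution), and the coefficient of skewness is taken to be Karl Pearson's mode-based measure, (mean − mode) / standard deviation. The question does not say which coefficient is intended, so that choice is an assumption.

```python
import math

# Class intervals and frequencies copied from the question.
classes = [(0, 5), (5, 10), (10, 15), (15, 20), (20, 25), (25, 30), (30, 35), (35, 40)]
freqs = [2, 5, 7, 13, 21, 16, 8, 3]

n = sum(freqs)                                  # total frequency N
mids = [(lo + hi) / 2 for lo, hi in classes]    # class mid-points

# Mean and standard deviation of the grouped data (divisor N).
mean = sum(f * x for f, x in zip(freqs, mids)) / n
variance = sum(f * (x - mean) ** 2 for f, x in zip(freqs, mids)) / n
sd = math.sqrt(variance)

# Mode by the usual interpolation formula inside the modal class:
# mode = L + (f1 - f0) / (2*f1 - f0 - f2) * h
i = freqs.index(max(freqs))
lower, upper = classes[i]
f1 = freqs[i]
f0 = freqs[i - 1] if i > 0 else 0
f2 = freqs[i + 1] if i < len(freqs) - 1 else 0
mode = lower + (f1 - f0) / (2 * f1 - f0 - f2) * (upper - lower)

# Karl Pearson's coefficient of skewness (assumed to be the intended measure).
skewness = (mean - mode) / sd

print(f"N = {n}, mean = {mean:.2f}, sd = {sd:.3f}, mode = {mode:.3f}, skewness = {skewness:.3f}")
```

Under these assumptions the mean is 21.9, the standard deviation is about 8.0, the mode is about 23.08 and the coefficient of skewness is about −0.15, which indicates a slight negative skew.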

3.   A salesman has a 60% chance of making a sale to any one customer. The behaviour of successive customers is independent. If two customers A and B enter, what is the probability that the salesman will make a sale to A or B?

Solution:-

The probability that the salesman fails to make a sale to A = 1 − 0.6 = 0.4
The probability that the salesman fails to make a sale to B = 1 − 0.6 = 0.4
Since the events are independent, the probability that the salesman fails to make a sale to both customers = 0.4 × 0.4 = 0.16

Therefore, the probability that the salesman is able to make a sale to A or B (that is, to at least one of them) = 1 − 0.16 = 0.84
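
The complement argument used here can be checked numerically. The short Python sketch below computes the answer two ways: by the complement rule, and by listing the four possible outcomes for the pair (A, B). The variable names are illustrative only.

```python
from itertools import product

p_sale = 0.6  # chance of a sale to any one customer (from the question)

# Complement rule: 1 - P(no sale to A and no sale to B).
p_either = 1 - (1 - p_sale) ** 2

# Cross-check: enumerate the four outcomes for (A, B) and add up the
# probability of every outcome that contains at least one sale.
p_enum = 0.0
for sale_a, sale_b in product([True, False], repeat=2):
    prob = (p_sale if sale_a else 1 - p_sale) * (p_sale if sale_b else 1 - p_sale)
    if sale_a or sale_b:
        p_enum += prob

print(round(p_either, 2), round(p_enum, 2))
```

Both routes give 0.84, i.e. an 84% chance of at least one sale.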

4.  To verify whether a course in Research Methodology improved performance, a similar test was given to 12 participants before and after the course. The original marks and after the course marks are given below:

Original Marks   44   40   61   52   32   44   70   41   67   72   53   72
Marks after the course   53   38   69   57   46   39   73   48   73   74   60   78

Was the course useful? Consider these 12 participants as a sample from a population.

Solution: Let us take the hypothesis that there is no difference in the marks obtained before and after the course, i.e. the course has not been useful. Applying the t-test (difference formula):
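
The difference formula for paired observations is t = d̄ / (s / √n), where d̄ is the mean of the differences (after minus before), s is their sample standard deviation (divisor n − 1) and n is the number of pairs. A minimal Python sketch of that calculation follows. The 5% two-tailed critical value for 11 degrees of freedom, 2.201, is taken from a standard t-table, and the choice of the 5% level is an assumption, since the question does not state one.

```python
import math
import statistics

# Marks copied from the question.
before = [44, 40, 61, 52, 32, 44, 70, 41, 67, 72, 53, 72]
after = [53, 38, 69, 57, 46, 39, 73, 48, 73, 74, 60, 78]

diffs = [a - b for a, b in zip(after, before)]   # improvement per participant
n = len(diffs)

mean_d = statistics.mean(diffs)     # mean difference
sd_d = statistics.stdev(diffs)      # sample SD of the differences (divisor n - 1)
t_stat = mean_d / (sd_d / math.sqrt(n))

# Table value of t for n - 1 = 11 degrees of freedom at the 5% level,
# two-tailed (assumed significance level).
T_CRIT = 2.201

print(f"mean difference = {mean_d}, sd = {sd_d:.3f}, t = {t_stat:.3f}")
print("reject H0" if abs(t_stat) > T_CRIT else "do not reject H0")
```

The mean difference is 5 marks and the computed t is about 3.45, which exceeds the table value of 2.201. At the 5% level the hypothesis of no difference is rejected, so the data suggest that the course was useful.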

5.   Write short notes on

a)  Bernoulli Trials

b)  Standard Normal distribution

c)  Central Limit theorem

a)  Bernoulli Trials

Solution: In the theory of probability and statistics, a Bernoulli trial is an experiment whose outcome is random and can be either of two possible outcomes, "success" and "failure".

Repeated independent trials in which there can be only two outcomes are called Bernoulli trials in honor of James Bernoulli (1654-1705).
-   Bernoulli trials lead to the binomial distribution.
-   If the number of trials n is large and the probability of success is small, then the probability of k successes in n trials can be approximated by the Poisson distribution.
-   The binomial distribution (for large n) and the Poisson distribution (for a large mean) are closely approximated by the normal (Gaussian) distribution.
-   These three distributions are the foundation of much of the analysis of physical systems for detection, communication and storage of information.

In the theory of probability and statistics, a Bernoulli trial (or binomial trial) is a random experiment with exactly two possible outcomes, "success" and "failure", in which the probability of success is the same every time the experiment is conducted. The mathematical formalization of the Bernoulli trial is known as the Bernoulli process.

Since a Bernoulli trial has only two possible outcomes, it can be framed as some "yes or no" question. For example:

•   Did the coin land heads?
•   Was the newborn child a girl?

Therefore, success and failure are merely labels for the two outcomes, and should not be construed literally. The term "success" in this sense consists in the result meeting specified conditions, not in any moral judgment. Examples of Bernoulli trials include:

•   Flipping a coin. In this context, obverse ("heads") conventionally denotes success and reverse ("tails") denotes failure. A fair coin has the probability of success 0.5 by definition.

•   Rolling a die, where a six is "success" and everything else a "failure".

•   In conducting a political opinion poll, choosing a voter at random to ascertain whether that voter will vote "yes" in an upcoming referendum.

The Assumptions of Bernoulli Trials
i.   Each trial results in one of two possible outcomes, denoted success (S) or failure (F).
ii.   The probability of S remains constant from trial-to-trial and is denoted by p. Write q = 1−p for the constant probability of F.
iii.   The trials are independent.

Independent repeated trials of an experiment with exactly two possible outcomes are called Bernoulli trials. Call one of the outcomes "success" and the other outcome "failure". Let p be the probability of success in a Bernoulli trial. Then the probability of failure q is given by

q = 1 - p.

Random variables describing Bernoulli trials are often encoded using the convention that 1 = "success", 0 = "failure".

Closely related to a Bernoulli trial is a binomial experiment, which consists of a fixed number n of statistically independent Bernoulli trials, each with a probability of success p, and counts the number of successes. A random variable corresponding to a binomial is denoted by B(n, p), and is said to have a binomial distribution. The probability of exactly k successes in the experiment B(n, p) is given by:

P(k) = C(n, k) p^k q^(n − k), where C(n, k) = n! / (k! (n − k)!) is the binomial coefficient and q = 1 − p.
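
The binomial probability of exactly k successes, C(n, k) p^k q^(n − k), can be evaluated directly. The Python sketch below does this with the standard-library function math.comb. The function name binomial_pmf and the coin-tossing example (n = 10, p = 0.5) are illustrative assumptions, not part of the question.

```python
import math

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent Bernoulli trials."""
    q = 1 - p                                   # probability of failure
    return math.comb(n, k) * p**k * q**(n - k)

# Illustrative (assumed) case: 10 tosses of a fair coin.
n, p = 10, 0.5
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

print(pmf[3])      # probability of exactly 3 heads: 120 / 1024
print(sum(pmf))    # the probabilities for k = 0..n add up to 1
```

For the fair-coin case the probability of exactly 3 heads in 10 tosses is 120/1024, about 0.117.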

Bernoulli trials may also lead to negative binomial distributions (which count the number of successes in a series of repeated Bernoulli trials until a specified number of failures are seen), as well as various other distributions.

When multiple Bernoulli trials are performed, each with its own probability of success, these are sometimes referred to as Poisson trials.

============================================================================

b)  Standard Normal distribution

Solution: The standard normal distribution is a special case of the normal distribution. It is the distribution that occurs when a normal random variable has a mean of zero and a standard deviation of one.

The normal random variable of a standard normal distribution is called a standard score or a z score. Every normal random variable X can be transformed into a z score via the following equation:

z = (X - μ) / σ

Where X is a normal random variable, μ is the mean, and σ is the standard deviation.

In other words, a standard normal distribution is a normal distribution with mean 0 and standard deviation 1. Areas under this curve can be found using a standard normal table (Table A in the Moore and Moore & McCabe textbooks). All introductory statistics texts include this table, although some format it differently. From the 68-95-99.7 rule we know that for a variable with the standard normal distribution, 68% of the observations fall between -1 and 1 (within 1 standard deviation of the mean of 0), 95% fall between -2 and 2 (within 2 standard deviations of the mean) and 99.7% fall between -3 and 3 (within 3 standard deviations of the mean).
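
The z-score transformation and the 68-95-99.7 rule can both be reproduced without a printed table. The Python sketch below uses statistics.NormalDist from the standard library (Python 3.8 or later). The helper names z_score and area_between, and the example values passed to z_score, are assumptions made for illustration.

```python
from statistics import NormalDist

Z = NormalDist(mu=0.0, sigma=1.0)   # the standard normal distribution

def z_score(x: float, mu: float, sigma: float) -> float:
    """Standardize a value: z = (X - mu) / sigma."""
    return (x - mu) / sigma

def area_between(a: float, b: float) -> float:
    """Area under the standard normal curve between a and b."""
    return Z.cdf(b) - Z.cdf(a)

# Hypothetical example: a value of 130 on a scale with mean 100 and SD 15.
print(z_score(130, 100, 15))

# Areas within 1, 2 and 3 standard deviations of the mean.
for k in (1, 2, 3):
    print(k, round(area_between(-k, k), 4))
```

The three areas come out at roughly 0.683, 0.954 and 0.997, which is where the rounded 68-95-99.7 figures come from.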

No naturally measured variable has this distribution. However, all other normal distributions are equivalent to this distribution when the unit of measurement is changed to measure standard deviations from the mean. (That's why this distribution is important--it's used to handle problems involving any normal distribution.)

Recall that a density curve models relative frequency as area under the curve.

Assume throughout this document that we are working with a variable Z that has a standard normal distribution. The letter Z is usually used for such a variable; the small letter z is used to indicate the generic value that the variable may take.

For Example-

Question: What is the relative frequency of observations above -1.48?

•   Identify the range of values described by "above -1.48" (shaded green).
•   Identify the area you need to find (shaded blue). It appears to be about 95%.
•   Use the value -1.48 to look up an area in your table. However, be careful. Doing so gives you 0.0694, which is nowhere near 0.95, our initial guess. That's because the table is oriented to find areas under the curve to the left of a given value. So, in fact, looking up -1.48 has found the answer to the question "What is the relative frequency of measurements falling below -1.48?" This range, z < -1.48 (in gray), and the associated area 0.0694 (in purple) are shown below.

There are two ways to proceed. They are, of course, equivalent.

•   Since 0.0694 of the observations fall below -1.48, the remaining 0.9306 = 1 - 0.0694 must fall above -1.48.
•   Since the total area under the curve is exactly 1, and the purple area is 0.0694, the blue area must be 1 - 0.0694 = 0.9306.

In other words, subtraction from 1 is necessary.

93.06% of the observations fall above -1.48. (For any normal distribution, 0.9306 or 93.06% of the observations fall above 1.48 times the standard deviation below the mean.)
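
The table lookup and the subtraction from 1 can be mirrored in a few lines of Python. This is a sketch using statistics.NormalDist from the standard library, whose cdf method returns the area to the left of a value, just as the table does.

```python
from statistics import NormalDist

z = -1.48

below = NormalDist().cdf(z)   # area to the left of z, as in the table
above = 1 - below             # total area is 1, so subtract to get the right tail

print(round(below, 4), round(above, 4))
```

Rounded to four decimal places this reproduces the table values used above, 0.0694 and 0.9306.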

=========================================================================

c)  Central Limit theorem

Solution: In probability theory, the central limit theorem (CLT) states that, given certain conditions, the mean of a sufficiently large number of iterates of independent random variables, each with a well-defined mean and well-defined variance, will be approximately normally distributed. That is, suppose that a sample is obtained containing a large number of observations, each observation being randomly generated in a way that does not depend on the values of the other observations, and that the arithmetic average of the observed values is computed. If this procedure is performed many times, the computed average will not always be the same each time; the central limit theorem says that the computed values of the average will be approximately distributed according to the normal distribution (commonly known as a "bell curve").

The central limit theorem has a number of variants. In its common form, the random variables must be identically distributed. In variants, convergence of the mean to the normal distribution also occurs for non-identical distributions, given that they comply with certain conditions.

The Central Limit Theorem describes the characteristics of the "population of the means" which has been created from the means of an infinite number of random population samples of size (N), all of them drawn from a given "parent population". The Central Limit Theorem predicts that regardless of the shape of the distribution of the parent population (provided it has a finite variance):

1)  The mean of the population of means is always equal to the mean of the parent population from which the population samples were drawn.

2)  The standard deviation of the population of means is always equal to the standard deviation of the parent population divided by the square root of the sample size (N).

3)  The distribution of means will increasingly approximate a normal distribution as the size N of samples increases.
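
The first two of these predictions can be illustrated with a small simulation. The Python sketch below is only an illustration under assumed settings: the parent population is taken to be exponential with mean 1 and standard deviation 1 (a deliberately skewed choice), the sample size is N = 30, and 5000 samples are drawn with a fixed random seed.

```python
import math
import random
import statistics

random.seed(0)          # fixed seed so the run is repeatable

N = 30                  # size of each sample (assumed)
NUM_SAMPLES = 5000      # number of samples drawn (assumed)
PARENT_MEAN = 1.0       # mean of the exponential(1) parent population
PARENT_SD = 1.0         # standard deviation of the exponential(1) parent

# Draw many samples of size N from the skewed parent and keep each sample mean.
sample_means = [
    sum(random.expovariate(1.0) for _ in range(N)) / N
    for _ in range(NUM_SAMPLES)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.pstdev(sample_means)
expected_sd = PARENT_SD / math.sqrt(N)   # prediction 2: sigma / sqrt(N)

print(f"mean of sample means = {mean_of_means:.3f} (parent mean {PARENT_MEAN})")
print(f"sd of sample means   = {sd_of_means:.3f} (predicted {expected_sd:.3f})")
```

The mean of the sample means should land close to the parent mean, and their standard deviation close to σ/√N, even though the parent distribution is far from normal. Plotting a histogram of sample_means for increasing N would show the third prediction, the approach to a bell shape.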

A consequence of the Central Limit Theorem is that if we average measurements of a particular quantity, the distribution of our average tends toward a normal one. In addition, if a measured variable is actually a combination of several other uncorrelated variables, all of them "contaminated" with a random error of any distribution, our measurements tend to be contaminated with a random error that is normally distributed as the number of these variables increases.

Thus, the Central Limit Theorem explains the ubiquity of the famous bell-shaped "Normal distribution" (or "Gaussian distribution") in the measurements domain.

In more general probability theory, a central limit theorem is any of a set of weak-convergence theorems. They all express the fact that a sum of many independent and identically distributed (i.i.d.) random variables, or alternatively, random variables with specific types of dependence, will tend to be distributed according to one of a small set of attractor distributions. When the variance of the i.i.d. variables is finite, the attractor distribution is the normal distribution. In contrast, the sum of a number of i.i.d. random variables with power law tail distributions decreasing as |x|^(−α−1) where 0 < α < 2 (and therefore having infinite variance) will tend to an alpha-stable distribution with stability parameter (or index of stability) of α as the number of variables grows.

Classical CLT

Let {X1, ..., Xn} be a random sample of size n, that is, a sequence of independent and identically distributed random variables drawn from a distribution with expected value µ and finite variance σ². Suppose we are interested in the sample average

Sn = (X1 + ... + Xn) / n

of these random variables. By the law of large numbers, the sample averages converge in probability and almost surely to the expected value µ as n → ∞. The classical central limit theorem describes the size and the distributional form of the stochastic fluctuations around the deterministic number µ during this convergence. More precisely, it states that as n gets larger, the distribution of the difference between the sample average Sn and its limit µ, when multiplied by the factor √n (that is, √n(Sn − µ)), approximates the normal distribution with mean 0 and variance σ². For large enough n, the distribution of Sn is close to the normal distribution with mean µ and variance σ²/n.

The usefulness of the theorem is that the distribution of √n(Sn − µ) approaches normality regardless of the shape of the distribution of the individual Xi’s. Formally, the theorem can be stated as follows:

Lindeberg–Lévy CLT. Suppose {X1, X2, ...} is a sequence of i.i.d. random variables with E[Xi] = µ and Var[Xi] = σ² < ∞. Then as n approaches infinity, the random variables √n(Sn − µ) converge in distribution to a normal N(0, σ²).

In the case σ > 0, convergence in distribution means that the cumulative distribution functions of √n(Sn − µ) converge pointwise to the cdf of the N(0, σ²) distribution: for every real number z,

lim (n → ∞) Pr[√n(Sn − µ) ≤ z] = Φ(z / σ),

where Φ is the cdf of the standard normal distribution. The convergence is uniform in z in the sense that

lim (n → ∞) sup over z ∈ R of | Pr[√n(Sn − µ) ≤ z] − Φ(z / σ) | = 0,

where sup denotes the least upper bound (or supremum) of the set.

Volunteer

#### Leo Lingham

##### Experience

18 years of working management experience covering such areas

PLUS

24 years of management consulting which includes business planning, strategic planning, marketing, product management, training, business coaching etc.

Organizations
BESTBUSICON   Pty Ltd--PRINCIPAL

Education/Credentials
MASTERS IN SCIENCE