
Que: Distinguish between “primary data” and “secondary data” with the help of examples. Which one is easy to collect, why?

Que: What is factor analysis? Mention briefly the purpose & uses of factor analysis.

1. Distinguish between “primary data” and “secondary data” with the help of examples. Which one is easy to collect, why?

1.PRIMARY RESEARCH

Primary Sources

Some definitions of primary sources:

1 Primary sources are original materials on which other research is based

2 They are usually the first formal appearance of results in the print or electronic literature (for example, the first publication of the results of scientific investigations is a primary source.)

3 They present information in its original form, neither interpreted nor condensed nor evaluated by other writers.

4 They are from the time period (for example, something written close to when what it is recording happened is likely to be a primary source.)

5 Primary sources present original thinking, report on discoveries, or share new information.

Some examples of primary sources:

1 scientific journal articles reporting experimental research results

2 proceedings of Meetings, Conferences and Symposia.

3 technical reports

4 dissertations or theses (may also be secondary)

5 patents

6 sets of data, such as census statistics

7 works of literature (such as poems and fiction)

8 diaries

9 autobiographies

10 interviews, surveys and fieldwork

11 letters and correspondence

12 speeches

13 newspaper articles (may also be secondary)

14 government documents

15 photographs and works of art

16 original documents (such as birth certificate or trial transcripts)

17 Internet communications on email, listservs, and newsgroups

Primary research also means gathering information directly from consumers, which could involve:

-using questionnaire.

-using focus group [ face to face interview ]

-telephone interviews

-panel interviews

-person to person interviews

etc etc

==============================

1.MERITS

-from the primary source.

-original information.

-current data.

-reliable.

-clearly defined.

2.DEMERITS

-time consuming.

-expensive process.

-difficult to procure, sometimes.

3.LIMITATIONS.

-due to time/ cost factors, the amount of data gathering is restricted.

###########################################

2.SECONDARY RESEARCH

Secondary Sources

Secondary sources are less easily defined than primary sources. What some define as a secondary source, others define as a tertiary source. Nor is it always easy to distinguish primary from secondary sources. A newspaper article is a primary source if it reports events, but a secondary source if it analyses and comments on those events. In science, secondary sources are those which simplify the process of finding and evaluating the primary literature. They tend to be works which repackage, reorganize, reinterpret, summarise, index or otherwise "add value" to the new information reported in the primary literature.

Some definitions of secondary sources. More generally, secondary sources:

1 describe, interpret, analyze and evaluate the primary sources

2 comment on and discuss the evidence provided by primary sources

3 are works which are one or more steps removed from the event or information they refer to, being written after the fact with the benefit of hindsight.

Some examples of secondary sources:

1 bibliographies (may also be tertiary)

2 biographical works

3 commentaries

4 dictionaries and encyclopedias (may also be tertiary)

5 dissertations or theses (more usually primary)

6 handbooks and data compilations (may also be tertiary)

7 history

8 indexing and abstracting tools used to locate primary & secondary sources (may also be tertiary)

9 journal articles, particularly in disciplines other than science (may also be primary)

10 monographs (other than fiction and autobiography)

11 newspaper and popular magazine articles (may also be primary)

12 review articles and literature reviews

13 textbooks (may also be tertiary)

14 treatises

15 works of criticism and interpretation

Secondary research also means gathering information indirectly from published sources, which could involve:

-using census data.

-buying published data from bureaus

-gathering data from stock exchange

-collecting information from company annual reports.

etc etc

=====================================

1.MERITS

-from the secondary source.

-easy to source

-less time required.

-less expensive.

2.DEMERITS

-repackaged information.

-re-interpretation.

-not so reliable.

-old data and not current.

3.LIMITATIONS.

-not current data.

Under what circumstances might the availability of secondary data make primary research unnecessary?

THE CIRCUMSTANCES INCLUDE

-data used for developing strategic planning.

-data used for developing corporate planning.

-data used for developing business planning.

-data used for developing marketing planning.

-data used for developing demand forecasting.

etc etc

BOTH CAN BE DIFFICULT TO OBTAIN, BUT BOTH CAN BE CARRIED OUT.

PRIMARY DATA COLLECTION IS AN EXPENSIVE AND TIME-CONSUMING PROCESS, SO SECONDARY DATA IS GENERALLY THE EASIER OF THE TWO TO COLLECT.

#############################################

4. What is factor analysis? Mention briefly the purpose & uses of factor analysis.

Factor analysis is a statistical method used to explain variability among observed variables in terms of fewer unobserved variables called factors. The observed variables are modeled as linear combinations of the factors, plus "error" terms. The information gained about the interdependencies can be used later to reduce the set of variables in a dataset.
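The model described in the paragraph above can be written out as a short equation. This is only a sketch in one common notation: the symbols $x_j$, $\lambda_{jk}$, $F_k$ and $e_j$ are assumed here for illustration, while p (number of observed variables) and m (number of factors) match the usage later in this answer.

```latex
% Each of the p observed variables is a linear combination of m factors plus an error term.
x_j = \lambda_{j1} F_1 + \lambda_{j2} F_2 + \dots + \lambda_{jm} F_m + e_j,
\qquad j = 1, \dots, p, \quad m < p
```

Here each $\lambda_{jk}$ plays the role of a factor loading (how strongly variable j depends on factor k), and $e_j$ is the part of variable j that the common factors do not explain.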

Factor analysis is used in behavioral sciences, social sciences, marketing, product management, operations research, and other applied sciences that deal with large quantities of data.

Applications in psychology

Factor analysis is used to identify "factors" that explain a variety of results on different tests. For example, intelligence research found that people who get a high score on a test of verbal ability are also good on other tests that require verbal abilities. Researchers explained this by using factor analysis to isolate one factor, often called crystallized intelligence or verbal intelligence, that represents the degree to which someone is able to solve problems involving verbal skills.

Factor analysis in psychology is most often associated with intelligence research. However, it also has been used to find factors in a broad range of domains such as personality, attitudes, beliefs, etc. It is linked to psychometrics, as it can assess the validity of an instrument by finding if the instrument indeed measures the postulated factors.

Advantages

• Reduction of number of variables, by combining two or more variables into a single factor. For example, performance at running, ball throwing, batting, jumping and weight lifting could be combined into a single factor such as general athletic ability. Usually, in an item by people matrix, factors are selected by grouping related items. In the Q factor analysis technique, the matrix is transposed and factors are created by grouping related people: For example, liberals, libertarians, conservatives and socialists, could form separate groups.

• Identification of groups of inter-related variables, to see how they are related to each other. For example, a factor called "broad visual perception" relates to how good an individual is at visual tasks, and a "broad auditory perception" factor relates to auditory task capability. Furthermore, there may be a global factor, called "g" or general intelligence, that relates to both "broad visual perception" and "broad auditory perception". This means someone with a high "g" is likely to have both a high "visual perception" capability and a high "auditory perception" capability, and that "g" therefore explains a good part of why someone is good or bad in both of those domains.

Disadvantages

• "...each orientation is equally acceptable mathematically. But different factorial theories proved to differ as much in terms of the orientations of factorial axes for a given solution as in terms of anything else, so that model fitting did not prove to be useful in distinguishing among theories." This means all rotations represent different underlying processes, but all rotations are equally valid outcomes of standard factor analysis optimization. Therefore, it is impossible to pick the proper rotation using factor analysis alone.

• Factor analysis can be only as good as the data allows. In psychology, where researchers have to rely on more or less valid and reliable measures such as self-reports, this can be problematic.

• Interpreting factor analysis is based on using a “heuristic”, which is a solution that is "convenient even if not absolutely true". More than one interpretation can be made of the same data factored the same way, and factor analysis cannot identify causality.

Factor analysis in marketing

The basic steps are:

• Identify the salient attributes consumers use to evaluate products in this category.

• Use quantitative marketing research techniques (such as surveys) to collect data from a sample of potential customers concerning their ratings of all the product attributes.

• Input the data into a statistical program and run the factor analysis procedure. The computer will yield a set of underlying attributes (or factors).

• Use these factors to construct perceptual maps and other product positioning devices.

Information collection

The data collection stage is usually done by marketing research professionals. Survey questions ask the respondent to rate a product sample or descriptions of product concepts on a range of attributes. Anywhere from five to twenty attributes are chosen. They could include things like: ease of use, weight, accuracy, durability, colourfulness, price, or size. The attributes chosen will vary depending on the product being studied. The same question is asked about all the products in the study. The data for multiple products is coded and input into a statistical program such as SPSS or SYSTAT.

Analysis

The analysis will isolate the underlying factors that explain the data. Factor analysis is an interdependence technique. The complete set of interdependent relationships is examined. There is no specification of either dependent variables, independent variables, or causality. Factor analysis assumes that all the rating data on different attributes can be reduced to a few important dimensions. This reduction is possible because the attributes are related. The rating given to any one attribute is partially the result of the influence of other attributes. The statistical algorithm deconstructs the rating (called a raw score) into its various components, and reconstructs the partial scores into underlying factor scores. The degree of correlation between the initial raw score and the final factor score is called a factor loading. There are two approaches to factor analysis: "principal component analysis" (the total variance in the data is considered); and "common factor analysis" (the common variance is considered).

==========================

How Many Cases and Variables?

The clearer the true factor structure, the smaller the sample size needed to discover it. But it would be very difficult to discover even a very clear and simple factor structure with fewer than about 50 cases, and 100 or more cases would be much preferable for a less clear structure.

The rules about number of variables are very different for factor analysis than for regression. In factor analysis it is perfectly okay to have many more variables than cases. In fact, generally speaking the more variables the better, so long as the variables remain relevant to the underlying factors.

How Many Factors?

This section describes two rules for choosing the number of factors. Readers familiar with factor analysis will be surprised to find no mention of Kaiser's familiar eigenvalue rule or Cattell's scree test. Both rules are mentioned later, though as explained at that time I consider both rules obsolescent. Also both use eigenvalues, which I have not yet introduced.

Of the two rules that are discussed in this section, the first uses a formal significance test to identify the number of common factors. Let N denote the sample size, p the number of variables, and m the number of factors. Also $R_U$ denotes the residual matrix U transformed into a correlation matrix, $|R_U|$ is its determinant, and $\ln(1/|R_U|)$ is the natural logarithm of the reciprocal of that determinant.

To apply this rule, first compute

$$G = N - 1 - \frac{2p + 5}{6} - \frac{2}{3}m.$$

Then compute

$$\chi^2 = G \, \ln\left(1/|R_U|\right)$$

with

$$df = 0.5\left[(p - m)^2 - p - m\right].$$

If it is difficult to compute $\ln(1/|R_U|)$, that expression is often well approximated by $\sum r_U^2$, where the summation denotes the sum of all squared correlations above the diagonal in matrix $R_U$.

To use this formula to choose the number of factors, start with m = 1 (or even with m = 0) and compute this test for successively increasing values of m, stopping when you find nonsignificance; that value of m is the smallest value of m that is not significantly contradicted by the data. The major difficulty with this rule is that in my experience, with moderately large samples it leads to more factors than can successfully be interpreted.
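The arithmetic of this test is easy to get wrong by hand, so a minimal Python sketch may help. It assumes the determinant of the residual correlation matrix has already been obtained from a factor-analysis program; the function name `factor_count_test` and the input numbers (N = 200, p = 6, m = 1, determinant 0.90) are made up purely for illustration.

```python
import math

def factor_count_test(n, p, m, det_ru):
    """Return (chi_square, df) for the test that m common factors suffice.

    n: sample size, p: number of variables, m: number of factors,
    det_ru: determinant of the residual matrix rescaled to a correlation
    matrix (a value in the interval (0, 1]).
    """
    g = n - 1 - (2 * p + 5) / 6 - (2 / 3) * m
    chi_square = g * math.log(1 / det_ru)
    df = 0.5 * ((p - m) ** 2 - p - m)
    return chi_square, df

# Hypothetical inputs, not taken from any real data set.
chi_square, df = factor_count_test(n=200, p=6, m=1, det_ru=0.90)
print(round(chi_square, 2), df)
```

The resulting chi-square would then be compared with a critical value for the computed degrees of freedom (from a table or a statistics package), repeating the call with m = 2, 3, ... until the result is no longer significant.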

Rotation.

Rotation is the step in factor analysis that allows you to identify meaningful factor names or descriptions.

Linear Functions of Predictors

To understand rotation, first consider a problem that doesn't involve factor analysis. Suppose you want to predict the grades of college students (all in the same college) in many different courses, from their scores on general "verbal" and "math" skill tests. To develop predictive formulas, you have a body of past data consisting of the grades of several hundred previous students in these courses, plus the scores of those students on the math and verbal tests. To predict grades for present and future students, you could use these data from past students to fit a series of two-variable multiple regressions, each regression predicting grade in one course from scores on the two skill tests.

Now suppose a co-worker suggests summing each student's verbal and math scores to obtain a composite "academic skill" score I'll call AS, and taking the difference between each student's verbal and math scores to obtain a second variable I'll call VMD (verbal-math difference). The co-worker suggests running the same set of regressions to predict grades in individual courses, except using AS and VMD as predictors in each regression, instead of the original verbal and math scores. In this example, you would get exactly the same predictions of course grades from these two families of regressions: one predicting grades in individual courses from verbal and math scores, the other predicting the same grades from AS and VMD scores. In fact, you would get the same predictions if you formed composites of 3 math + 5 verbal and 5 math + 3 verbal, and ran a series of two-variable multiple regressions predicting grades from these two composites. These examples are all linear functions of the original verbal and math scores.

The central point is that if you have m predictor variables, and you replace the m original predictors by m linear functions of those predictors, you generally neither gain nor lose any information--you could if you wish use the scores on the linear functions to reconstruct the scores on the original variables. But multiple regression uses whatever information you have in the optimum way (as measured by the sum of squared errors in the current sample) to predict a new variable (e.g. grades in a particular course). Since the linear functions contain the same information as the original variables, you get the same predictions as before.
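The claim that AS and VMD give the same predictions as the original verbal and math scores can be checked numerically. The following is a small sketch in plain Python, with invented scores and grades for six students; the helper names `fit_ols` and `predict` are hypothetical and the solver is a bare-bones normal-equations routine, not a recommendation for real work.

```python
def fit_ols(rows, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    # Augmented system [X'X | X'y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * yi for x, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

def predict(coefs, rows):
    """Fitted values for each row of predictor scores."""
    return [coefs[0] + sum(c * v for c, v in zip(coefs[1:], r)) for r in rows]

# Made-up data: (verbal, math) scores and a course grade for six students.
scores = [(52, 61), (65, 48), (70, 72), (45, 55), (80, 66), (58, 77)]
grades = [2.9, 2.7, 3.6, 2.4, 3.7, 3.3]

# The same information re-expressed: AS = verbal + math, VMD = verbal - math.
as_vmd = [(v + m, v - m) for v, m in scores]

pred_original = predict(fit_ols(scores, grades), scores)
pred_composite = predict(fit_ols(as_vmd, grades), as_vmd)
max_gap = max(abs(a - b) for a, b in zip(pred_original, pred_composite))
print(max_gap)  # differs from zero only by floating-point rounding
```

The regression coefficients themselves differ between the two fits; only the predictions agree, which is exactly the property that rotation exploits.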

Given that there are many ways to get exactly the same predictions, is there any advantage to using one set of linear functions rather than another? Yes there is; one set may be simpler than another. One particular pair of linear functions may enable many of the course grades to be predicted from just one variable (that is, one linear function) rather than from two. If we regard regressions with fewer predictor variables as simpler, then we can ask this question: Out of all the possible pairs of predictor variables that would give the same predictions, which is simplest to use, in the sense of minimizing the number of predictor variables needed in the typical regression? The pair of predictor variables maximizing some measure of simplicity could be said to have simple structure. In this example involving grades, you might be able to predict grades in some courses accurately from just a verbal test score, and predict grades in other courses accurately from just a math score. If so, then you would have achieved a "simpler structure" in your predictions than if you had used both tests for all predictions.

Simple Structure in Factor Analysis

The points of the previous section apply when the predictor variables are factors. Think of the m factors F as a set of independent or predictor variables, and think of the p observed variables X as a set of dependent or criterion variables. Consider a set of p multiple regressions, each predicting one of the variables from all m factors. The standardized coefficients in this set of regressions form a p x m matrix called the factor loading matrix. If we replaced the original factors by a set of linear functions of those factors, we would get exactly the same predictions as before, but the factor loading matrix would be different. Therefore we can ask which, of the many possible sets of linear functions we might use, produces the simplest factor loading matrix. Specifically we will define simplicity as the number of zeros or near-zero entries in the factor loading matrix--the more zeros, the simpler the structure. Rotation does not change matrix C or U at all, but does change the factor loading matrix.
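A small numerical sketch can make this concrete for the two-factor case. The loadings below are hypothetical, and the rotation is a plain orthogonal rotation by a hand-picked angle (real programs choose the angle by optimizing a criterion). The sketch checks that the loadings-times-transpose matrix, i.e. the correlations implied by the factors, is unchanged by rotation, while the count of near-zero loadings goes up.

```python
import math

# Hypothetical unrotated loadings of four variables on two factors.
loadings = [[0.60, 0.55], [0.55, 0.60], [0.60, -0.55], [0.55, -0.60]]

def rotate(loadings, degrees):
    """Orthogonally rotate a two-factor loading matrix by the given angle."""
    c = math.cos(math.radians(degrees))
    s = math.sin(math.radians(degrees))
    return [[a * c + b * s, -a * s + b * c] for a, b in loadings]

def reproduced(loadings):
    """Loading matrix times its transpose: the correlations the factors imply."""
    return [[sum(x * y for x, y in zip(r1, r2)) for r2 in loadings]
            for r1 in loadings]

def near_zero_count(loadings, cutoff=0.1):
    """Number of loadings smaller than the cutoff in absolute value."""
    return sum(abs(v) < cutoff for row in loadings for v in row)

rotated = rotate(loadings, -45)

# Largest change in any implied correlation caused by the rotation.
max_change = max(abs(a - b)
                 for ra, rb in zip(reproduced(loadings), reproduced(rotated))
                 for a, b in zip(ra, rb))

print(near_zero_count(loadings), near_zero_count(rotated), max_change)
```

After rotation each variable has one large loading (about .81) and one small one (about .04), so the rotated matrix is the simpler of the two descriptions of the same data.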

In the extreme case of simple structure, each X-variable will have only one large entry, so that all the others can be ignored. But that would be a simpler structure than you would normally expect to achieve; after all, in the real world each variable isn't normally affected by only one other variable. You then name the factors subjectively, based on an inspection of their loadings.

In common factor analysis the process of rotation is actually somewhat more abstract than I have implied here, because you don't actually know the individual scores of cases on factors. However, the statistics for a multiple regression that are most relevant here--the multiple correlation and the standardized regression slopes--can all be calculated just from the correlations of the variables and factors involved. Therefore we can base the calculations for rotation to simple structure on just those correlations, without using any individual scores.

A rotation which requires the factors to remain uncorrelated is an orthogonal rotation, while others are oblique rotations. Oblique rotations often achieve greater simple structure, though at the cost that you must also consider the matrix of factor intercorrelations when interpreting results. Manuals are generally clear which is which, but if there is ever any ambiguity, a simple rule is that if there is any ability to print out a matrix of factor correlations, then the rotation is oblique, since no such capacity is needed for orthogonal rotations.

An Example

Table 1 illustrates the outcome of rotation with a factor analysis of 24 measures of mental ability.

Table 1

Oblique Promax rotation of 4 factors of 24 mental ability variables

Variable                          Verbal   Numerical      Visual Recognition

General information                  .80         .10        -.01        -.06
Paragraph comprehension              .81        -.10         .02         .09
Sentence completion                  .87         .04         .01        -.10
Word classification                  .55         .12         .23        -.08
Word meaning                         .87        -.11        -.01         .07

Add                                  .08         .86        -.30         .05
Code                                 .03         .52        -.09         .29
Counting groups of dots             -.16         .79         .14        -.09
Straight & curved capitals          -.01         .54         .41        -.16
Woody-McCall mixed                   .24         .43         .00         .18

Visual perception                   -.08         .03         .77        -.04
Cubes                               -.07        -.02         .59        -.08
Paper form board                    -.02        -.19         .68        -.02
Flags                                .07        -.06         .66        -.12
Deduction                            .25        -.11         .40         .20
Numerical puzzles                   -.03         .35         .37         .06
Problem reasoning                    .24        -.07         .36         .21
Series completion                    .21         .05         .49         .06

Word recognition                     .09        -.08        -.13         .66
Number recognition                  -.04        -.09        -.02         .64
Figure recognition                  -.16        -.13         .43         .47
Object-number                        .00         .09        -.13         .69
Number-figure                       -.22         .23         .25         .42
Figure-word                          .00         .05         .15         .37

This table reveals quite a good simple structure. Within each of the four blocks of variables, the high values (above about .4 in absolute value) are generally all in a single column--a separate column for each of the four blocks. Further, the variables within each block all seem to measure the same general kind of mental ability. The major exception to both these generalizations comes in the third block. The variables in that block seem to include measures of both visual ability and reasoning, and the reasoning variables (the last four in the block) generally have loadings in column 3 not far above their loadings in one or more other columns. This suggests that a 5-factor solution might be worth trying, in the hope that it might yield separate "visual" and "reasoning" factors. The factor names in Table 1 were given by Gorsuch, but inspection of the variables in the second block suggests that "simple repetitive tasks" might be a better name for factor 2 than "numerical".
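The way the table is read, looking for loadings of about .4 or more in absolute value, can be sketched in a few lines of Python. Only six rows of Table 1 are copied in to keep the example short, and the function name `salient` and the exact cutoff of .4 are choices made for this illustration.

```python
# A few rows copied from Table 1; columns are in the order listed in `factors`.
factors = ["Verbal", "Numerical", "Visual", "Recognition"]
table = {
    "Sentence completion": [0.87, 0.04, 0.01, -0.10],
    "Add": [0.08, 0.86, -0.30, 0.05],
    "Straight & curved capitals": [-0.01, 0.54, 0.41, -0.16],
    "Cubes": [-0.07, -0.02, 0.59, -0.08],
    "Numerical puzzles": [-0.03, 0.35, 0.37, 0.06],
    "Figure recognition": [-0.16, -0.13, 0.43, 0.47],
}

def salient(loadings, cutoff=0.4):
    """Names of the factors on which a variable loads at or above the cutoff."""
    return [f for f, v in zip(factors, loadings) if abs(v) >= cutoff]

for name, row in table.items():
    print(f"{name:28s} {salient(row)}")
```

Variables that come back with exactly one factor fit the simple structure; those with two factors (or none, as with "Numerical puzzles") are the ones that complicate the interpretation discussed above.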

I don't mean to imply that you should always try to make every variable load highly on only one factor. For instance, a test of ability to deal with arithmetic word problems might well load highly on both verbal and mathematical factors. This is actually one of the advantages of factor analysis over cluster analysis, since you cannot put the same variable in two different clusters.

Factor Analysis

Factor analysis includes both component analysis and common factor analysis. More than other statistical techniques, factor analysis has suffered from confusion concerning its very purpose. This affects my presentation in two ways. First, I devote a long section to describing what factor analysis does before examining in later sections how it does it. Second, I have decided to reverse the usual order of presentation. Component analysis is simpler, and most discussions present it first. However, I believe common factor analysis comes closer to solving the problems most researchers actually want to solve. Thus learning component analysis first may actually interfere with understanding what those problems are. Therefore component analysis is introduced only quite late in this chapter.

What Factor Analysis Can and Can't Do

I assume you have scores on a number of variables-- anywhere from 3 to several hundred variables, but most often between 10 and 100. Actually you need only the correlation or covariance matrix--not the actual scores. The purpose of factor analysis is to discover simple patterns in the pattern of relationships among the variables. In particular, it seeks to discover if the observed variables can be explained largely or entirely in terms of a much smaller number of variables called factors.

Some Examples of Factor-Analysis Problems

1. Factor analysis was invented nearly 100 years ago by psychologist Charles Spearman, who hypothesized that the enormous variety of tests of mental ability--measures of mathematical skill, vocabulary, other verbal skills, artistic skills, logical reasoning ability, etc.--could all be explained by one underlying "factor" of general intelligence that he called g. He hypothesized that if g could be measured and you could select a subpopulation of people with the same score on g, in that subpopulation you would find no correlations among any tests of mental ability. In other words, he hypothesized that g was the only factor common to all those measures.

It was an interesting idea, but it turned out to be wrong. Today the College Board testing service operates a system based on the idea that there are at least three important factors of mental ability--verbal, mathematical, and logical abilities--and most psychologists agree that many other factors could be identified as well.

2. Consider various measures of the activity of the autonomic nervous system--heart rate, blood pressure, etc. Psychologists have wanted to know whether, except for random fluctuation, all those measures move up and down together--the "activation" hypothesis. Or do groups of autonomic measures move up and down together, but separate from other groups? Or are all the measures largely independent? An unpublished analysis of mine found that in one data set, at any rate, the data fitted the activation hypothesis quite well.

3. Suppose many species of animal (rats, mice, birds, frogs, etc.) are trained that food will appear at a certain spot whenever a noise--any kind of noise--comes from that spot. You could then tell whether they could detect a particular sound by seeing whether they turn in that direction when the sound appears. Then if you studied many sounds and many species, you might want to know on how many different dimensions of hearing acuity the species vary. One hypothesis would be that they vary on just three dimensions--the ability to detect high-frequency sounds, ability to detect low-frequency sounds, and ability to detect intermediate sounds. On the other hand, species might differ in their auditory capabilities on more than just these three dimensions. For instance, some species might be better at detecting sharp click-like sounds while others are better at detecting continuous hiss-like sounds.

4. Suppose each of 500 people, who are all familiar with different kinds of automobiles, rates each of 20 automobile models on the question, "How much would you like to own that kind of automobile?" We could usefully ask about the number of dimensions on which the ratings differ. A one-factor theory would posit that people simply give the highest ratings to the most expensive models. A two-factor theory would posit that some people are most attracted to sporty models while others are most attracted to luxurious models. Three-factor and four-factor theories might add safety and reliability. Or instead of automobiles you might choose to study attitudes concerning foods, political policies, political candidates, or many other kinds of objects.

5. Rubenstein (1986) studied the nature of curiosity by analyzing the agreements of junior-high-school students with a large battery of statements such as "I like to figure out how machinery works" or "I like to try new kinds of food." A factor analysis identified seven factors: three measuring enjoyment of problem-solving, learning, and reading; three measuring interests in natural sciences, art and music, and new experiences in general; and one indicating a relatively low interest in money.

The Goal: Understanding of Causes

Many statistical methods are used to study the relation between independent and dependent variables. Factor analysis is different; it is used to study the patterns of relationship among many dependent variables, with the goal of discovering something about the nature of the independent variables that affect them, even though those independent variables were not measured directly. Thus answers obtained by factor analysis are necessarily more hypothetical and tentative than is true when independent variables are observed directly. The inferred independent variables are called factors. A typical factor analysis suggests answers to four major questions:

How many different factors are needed to explain the pattern of relationships among these variables?

What is the nature of those factors?

How well do the hypothesized factors explain the observed data?

How much purely random or unique variance does each observed variable include?

I illustrate these questions later.

Absolute Versus Heuristic Uses of Factor Analysis

A heuristic is a way of thinking about a topic which is convenient even if not absolutely true. We use a heuristic when we talk about the sun rising and setting as if the sun moved around the earth, even though we know it doesn't. "Heuristic" is both a noun and an adjective; to use a heuristic is to think in heuristic terms.

The previous examples can be used to illustrate a useful distinction--between absolute and heuristic uses of factor analysis. Spearman's g theory of intelligence, and the activation theory of autonomic functioning, can be thought of as absolute theories which are or were hypothesized to give complete descriptions of the pattern of relationships among variables. On the other hand, Rubenstein never claimed that her list of the seven major factors of curiosity offered a complete description of curiosity. Rather those factors merely appear to be the most important seven factors--the best way of summarizing a body of data. Factor analysis can suggest either absolute or heuristic models; the distinction is in how you interpret the output.

Is Factor Analysis Objective?

The concept of heuristics is useful in understanding a property of factor analysis which confuses many people. Several scientists may apply factor analysis to similar or even identical sets of measures, and one may come up with 3 factors while another comes up with 6 and another comes up with 10. This lack of agreement has tended to discredit all uses of factor analysis. But if three travel writers wrote travel guides to the United States, and one divided the country into 3 regions, another into 6, and another into 10, would we say that they contradicted each other? Of course not; the various writers are just using convenient ways of organizing a topic, not claiming to represent the only correct way of doing so. Factor analysts reaching different conclusions contradict each other only if they all claim absolute theories, not heuristics. The fewer factors the simpler the theory; the more factors the better the theory fits the data. Different workers may make different choices in balancing simplicity against fit.

A similar balancing problem arises in regression and analysis of variance, but it generally doesn't prevent different workers from reaching nearly or exactly the same conclusions. After all, if two workers apply an analysis of variance to the same data, and both workers drop out the terms not significant at the .05 level, then both will report exactly the same effects. However, the situation in factor analysis is very different. For reasons explained later, there is no significance test in component analysis that will test a hypothesis about the number of factors, as that hypothesis is ordinarily understood. In common factor analysis there is such a test, but its usefulness is limited by the fact that it frequently yields more factors than can be satisfactorily interpreted. Thus a worker who wants to report only interpretable factors is still left without an objective test.

A similar issue arises in identifying the nature of the factors. Two workers may each identify 6 factors, but the two sets of factors may differ--perhaps substantially. The travel-writer analogy is useful here too; two writers might each divide the US into 6 regions, but define the regions very differently.

Another geographical analogy may be more parallel to factor analysis, since it involves computer programs designed to maximize some quantifiable objective. Computer programs are sometimes used to divide a state into congressional districts which are geographically contiguous, nearly equal in population, and perhaps homogeneous on dimensions of ethnicity or other factors. Two different district-creating programs might come up with very different answers, though both answers are reasonable. This analogy is in a sense too good; we believe that factor analysis programs usually don't yield answers as different from each other as district-creating programs do.


Comparing Two Factor Analyses

Since factor loadings are among the most important pieces of output from a factor analysis, it seems natural to ask about the standard error of a factor loading, so that for instance we might test the significance of the difference between the factor loadings in two samples. Unfortunately, no very useful general formula for such a purpose can be derived, because of ambiguities in identifying the factors themselves. To see this, imagine that "math" and "verbal" factors explain roughly equal amounts of variance in a population. The math and verbal factors might emerge as factors 1 and 2 respectively in one sample, but in the opposite order in a second sample from the same population. Then if we mechanically compared, for instance, the two values of the loading of variable 5 on factor 1, we would actually be comparing variable 5's loading on the math factor to its loading on the verbal factor. More generally, it is never completely meaningful to say that one particular factor in one factor analysis "corresponds" to one factor in another factor analysis. Therefore we need a completely different approach to studying the similarities and differences between two factor analyses.

Actually, several different questions might be phrased as questions about the similarity of two factor analyses. First we must distinguish between two different data formats:

1. Same variables, two groups. The same set of measures might be taken on men and women, or on treatment and control groups. The question then arises whether the two factor structures are the same.

2. One group, two conditions or two sets of variables. Two test batteries might be given to a single group of subjects, and questions asked about how the two sets of scores differ. Or the same battery might be given under two different conditions.

The next two sections consider these questions separately.

Comparing Factor Analyses in Two Groups

In the case of two groups and one set of variables, a question about factor structure is obviously not asking whether the two groups differ in means; that would be a question for MANOVA (multivariate analysis of variance). Unless the two sets of means are equal or have somehow been made equal, the question is also not asking whether a correlation matrix can meaningfully be computed after pooling the two samples, since differences in means would destroy the meaning of such a matrix.

The question, "Do these two groups have the same factor structure?" is actually quite different from the question, "Do they have the same factors?" The latter question is closer to the question, "Do we need two different factor analyses for the two groups?" To see the point, imagine a problem with 5 "verbal" tests and 5 "math" tests. For simplicity imagine all correlations between the two sets of tests are exactly zero. Also for simplicity consider a component analysis, though the same point can be made concerning a common factor analysis. Now imagine that the correlations among the 5 verbal tests are all exactly .4 among women and .8 among men, while the correlations among the 5 math tests are all exactly .8 among women and .4 among men. Factor analyses in the two groups separately would yield different factor structures but identical factors; in each gender the analysis would identify a "verbal" factor which is an equally-weighted average of all verbal items with 0 weights for all math items, and a "math" factor with the opposite pattern. In this example nothing would be gained from using separate factor analyses for the two genders, even though the two factor structures are quite different.
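
The verbal/math example above can be checked numerically. The following is a minimal sketch in Python with NumPy (my choice of tool, not part of the original discussion); the helper names block_corr, two_components and verbal_index are hypothetical. It builds the two correlation matrices described above and shows that the component weights agree across the two groups while the loadings do not.

```python
import numpy as np

def block_corr(r_verbal, r_math, k=5):
    """Correlation matrix for k verbal and k math tests with zero cross-correlations."""
    R = np.zeros((2 * k, 2 * k))
    R[:k, :k] = r_verbal
    R[k:, k:] = r_math
    np.fill_diagonal(R, 1.0)
    return R

def two_components(R):
    """Weights (unit eigenvectors) and loadings of the two largest components."""
    eigvals, eigvecs = np.linalg.eigh(R)              # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:2]
    weights = eigvecs[:, top]
    weights = weights * np.sign(weights.sum(axis=0))  # remove the arbitrary sign
    loadings = weights * np.sqrt(eigvals[top])
    return weights, loadings

def verbal_index(weights, k=5):
    """Column whose weight is concentrated on the first k (verbal) tests."""
    return int(np.argmax(np.abs(weights[:k]).sum(axis=0)))

women = block_corr(0.4, 0.8)
men = block_corr(0.8, 0.4)
w_women, l_women = two_components(women)
w_men, l_men = two_components(men)
vw, vm = verbal_index(w_women), verbal_index(w_men)

print("verbal weights, women:", np.round(w_women[:, vw], 3))
print("verbal weights, men:  ", np.round(w_men[:, vm], 3))
print("verbal loadings, women:", np.round(l_women[:5, vw], 3))
print("verbal loadings, men:  ", np.round(l_men[:5, vm], 3))
```

The weights define the same "verbal" component in both groups (equal weights on the verbal tests, zero on the math tests), whereas the loadings depend on how strongly the tests intercorrelate in each group.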

Another important point about the two-group problem is that an analysis which derives 4 factors for group A and 4 for group B has as many factors total as an analysis which derives 8 in the combined group. Thus the practical question may be not whether analyses deriving m factors in each of two groups fit the data better than an analysis deriving m factors in the combined group. Rather the two separate analyses should be compared to an analysis deriving 2m factors in the combined group. To make this comparison for component analysis, sum the first m eigenvalues in each separate group, and compare the mean of those two sums to the sum of the first 2m eigenvalues in the combined group. It would be very rare that this analysis suggests that it would be better to do separate factor analyses for the two groups. This same analysis should give at least an approximate answer to the question for common factor analysis as well.
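
The eigenvalue comparison just described might be sketched as follows in Python with NumPy. Everything specific here is an assumption made for illustration: the simulated two-factor data, the function names, and the choice to center each group before pooling so that mean differences do not distort the pooled correlation matrix.

```python
import numpy as np

def top_eigen_sum(data, k):
    """Sum of the k largest eigenvalues of the correlation matrix of `data`."""
    R = np.corrcoef(data, rowvar=False)
    eigvals = np.linalg.eigvalsh(R)[::-1]   # descending order
    return float(eigvals[:k].sum())

def compare_separate_vs_combined(group_a, group_b, m):
    """Mean of the two m-eigenvalue sums versus the 2m-eigenvalue sum after pooling."""
    separate = 0.5 * (top_eigen_sum(group_a, m) + top_eigen_sum(group_b, m))
    # Assumption: remove each group's means before pooling the cases.
    pooled = np.vstack([group_a - group_a.mean(axis=0),
                        group_b - group_b.mean(axis=0)])
    combined = top_eigen_sum(pooled, 2 * m)
    return separate, combined

# Hypothetical data: 6 variables driven by 2 factors, 300 cases per group.
rng = np.random.default_rng(0)
loadings = np.array([[.7, 0], [.7, 0], [.7, 0], [0, .7], [0, .7], [0, .7]])

def simulate(n):
    factors = rng.standard_normal((n, 2))
    noise = rng.standard_normal((n, 6)) * np.sqrt(1 - 0.49)
    return factors @ loadings.T + noise

group_a, group_b = simulate(300), simulate(300)
separate, combined = compare_separate_vs_combined(group_a, group_b, m=2)
print("mean of separate sums (m each):", round(separate, 3))
print("combined sum (2m):             ", round(combined, 3))
```

In this simulated case the combined analysis with 2m components accounts for more variance than the two separate m-component analyses do on average, which is the usual outcome described above.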

Suppose the question really is whether the two factor structures are identical. This question is very similar to the question as to whether the two correlation or covariance matrices are identical--a question which is precisely defined with no reference to factor analysis at all. Tests of these hypotheses are beyond the scope of this work, but a test on the equality of two covariance matrices appears in Morrison (1990) and other works on multivariate analysis.

Comparing Factor Analyses of Two Sets of Variables in a Single Group

One question people often ask is whether they should analyze variable sets A and B together or separately. My answer is usually "together", unless there is obviously no overlap between the two domains studied. After all, if the two sets of variables really are unrelated then the factor analysis will tell you so, deriving one set of factors for set A and another for set B. Thus to analyze the two sets separately is to prejudge part of the very question the factor analysis is supposed to answer for you.

As in the case of two separate samples of cases, there is a question which often gets phrased in terms of factors but which is better phrased as a question about the equality of two correlation or covariance matrices--a question which can be answered with no reference to factor analysis. In the present instance we have two parallel sets of variables; that is, each variable in set A parallels one in set B. In fact, sets A and B may be the very same measures administered under two different conditions. The question then is whether the two correlation matrices or covariance matrices are identical. This question has nothing to do with factor analysis, but it also has little to do with the question of whether the AB correlations are high. The two correlation or covariance matrices within sets A and B might be equal regardless of whether the AB correlations are high or low.

Darlington, Weinberg, and Walberg (1973) described a test of the null hypothesis that the covariance matrices for variable sets A and B are equal when sets A and B are measured in the same sample of cases. It requires the assumption that the AB covariance matrix is symmetric. Thus for instance if sets A and B are the same set of tests administered in years 1 and 2, the assumption requires that the covariance between test X in year 1 and test Y in year 2 equal the covariance between test X in year 2 and test Y in year 1. Given this assumption, you can simply form two sets of scores I'll call A+B and A-B, consisting of the sums and differences of parallel variables in the two sets. It then turns out that the original null hypothesis is equivalent to the hypothesis that all the variables in set A+B are uncorrelated with all variables in set A-B. This hypothesis can be tested with MANOVA.
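
The reason the sum-and-difference trick works can be seen from a covariance identity: cov(A+B, A-B) = S_AA - S_BB + (S_BA - S_AB). When the AB covariance matrix is symmetric the last term vanishes, so the cross-covariances are all zero exactly when S_AA = S_BB. The sketch below, in Python with NumPy and with simulated data and names that are my own assumptions, only verifies this identity; it does not carry out the MANOVA test itself.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 500, 3

# Hypothetical parallel sets: the same k measures taken on two occasions.
common = rng.standard_normal((n, k))
A = common + 0.5 * rng.standard_normal((n, k))
B = common + 0.5 * rng.standard_normal((n, k))

def cross_cov(X, Y):
    """Sample cross-covariance matrix between the columns of X and of Y."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    return Xc.T @ Yc / (len(X) - 1)

S_AA, S_BB = cross_cov(A, A), cross_cov(B, B)
S_AB, S_BA = cross_cov(A, B), cross_cov(B, A)

lhs = cross_cov(A + B, A - B)          # covariances between sums and differences
rhs = S_AA - S_BB + (S_BA - S_AB)      # the identity stated above

print(np.round(lhs, 4))
print("identity holds:", np.allclose(lhs, rhs))
```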

Factor and Component Analysis in SYSTAT 5

Inputting data

There are three different ways to enter data into SYSTAT 5 in a form usable by the FACTOR procedure. A fourth way (to be described shortly) might seem reasonable, but won't in fact work.

FACTOR will accept data in standard rectangular format. It will automatically compute a correlation matrix and use it for further analysis. If you want to analyze a covariance matrix instead, enter

TYPE = COVARIANCE

If you later want to analyze a correlation matrix, enter

TYPE = CORRELATION

The "correlation" type is the default type, so you need not enter that if you want to analyze only correlation matrices.

A second way to prepare data for a factor analysis is to compute and save a correlation or covariance matrix in the CORR menu. SYSTAT will automatically note whether the matrix is a correlation or covariance matrix at the time it is saved, and will save that information. Then FACTOR will automatically use the correct type.

A third way is useful if you have a correlation or covariance matrix from a printed source, and want to enter that matrix by hand. To do this, combine the INPUT and TYPE commands. For instance, suppose the matrix

.94 .62 .47 .36

.62 .89 .58 .29

.47 .58 .97 .38

.36 .29 .38 .87

is the covariance matrix for the four variables ALGEBRA, GEOMETRY, COMPUTER, TRIGONOM. (Normally enter correlations or covariances to more significant digits than this.) In the DATA module you could type

SAVE MATH

INPUT ALGEBRA, GEOMETRY, COMPUTER, TRIGONOM

TYPE COVARIANCE

RUN

.94

.62 .89

.47 .58 .97

.36 .29 .38 .87

QUIT

Notice that you input only the lower triangular portion of the matrix. In this example you input the diagonal, but if you are inputting a correlation matrix so that all diagonal entries are 1.0, then enter the command DIAGONAL ABSENT just before RUN, then omit the diagonal entries.
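
To make the lower-triangular layout concrete, here is a small sketch in Python with NumPy (not SYSTAT syntax) that expands rows typed in this form into the full symmetric matrix. The function name full_from_lower is hypothetical, and the handling of the diagonal-absent case (each row holding only the entries below the diagonal) is my assumption about the layout.

```python
import numpy as np

def full_from_lower(rows, diagonal_absent=False):
    """Expand lower-triangular rows into a full symmetric matrix.

    With diagonal_absent=True the rows hold only the entries below the
    diagonal, and the diagonal is filled with 1.0 (a correlation matrix).
    """
    offset = 1 if diagonal_absent else 0
    p = len(rows) + offset
    M = np.eye(p) if diagonal_absent else np.zeros((p, p))
    for i, row in enumerate(rows):
        r = i + offset
        for j, value in enumerate(row):
            M[r, j] = M[j, r] = value
    return M

# The covariance matrix for ALGEBRA, GEOMETRY, COMPUTER, TRIGONOM as typed above.
cov = full_from_lower([
    [.94],
    [.62, .89],
    [.47, .58, .97],
    [.36, .29, .38, .87],
])
print(cov)

# A hypothetical 3-variable correlation matrix entered without its diagonal.
corr = full_from_lower([[.5], [.3, .2]], diagonal_absent=True)
print(corr)
```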

The fourth way, which won't work, is to enter or scan the correlation or covariance matrix into a word processor, then use SYSTAT's GET command to move the matrix into SYSTAT. In this method SYSTAT will not properly record the matrix TYPE, and will treat the matrix as a matrix of scores rather than correlations or covariances. Unfortunately, SYSTAT will give you output in the format you expect, and there will be no obvious sign that the whole analysis has been done incorrectly.

Commands for Factor Analysis

The one-word command FACTOR produces a principal component analysis of all numerical variables in the data set. To specify certain variables, name them immediately after the word FACTOR, as in

FACTOR ALGEBRA, GEOMETRY, COMPUTER, TRIGONOM

To choose common factor analysis instead of principal components, add the option IPA for "iterated principal axis". All options are listed after a slash; IPA is an option but the variable list is not. Thus a command might read

FACTOR ALGEBRA, GEOMETRY, COMPUTER, TRIGONOM / IPA

The ITER (iteration) option determines the maximum number of iterations to estimate communalities in common factor analysis. Increase ITER if SYSTAT warns you that communality estimates are suspect; the default is ITER = 25. The TOL option specifies a change in communality estimates below which FACTOR will stop trying to improve communality estimates; default is TOL = .001. The PLOT option yields plots of factor loadings for pairs of factors or components. The number of such plots is m(m-1)/2, which may be large if m is large. A command using all these options might read

FACTOR / IPA, TOL = .0001, ITER = 60, PLOT

These are the only options to the FACTOR command; all other instructions to the FACTOR program are issued as separate commands.
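
For readers who want to see what an iteration limit and a tolerance do in an iterated principal axis procedure, here is a rough sketch in Python with NumPy. It is not SYSTAT's code: the starting values (squared multiple correlations), the clipping of communalities to the range 0 to 1, and the test matrix built from hypothetical loadings are all my assumptions.

```python
import numpy as np

def iterated_principal_axis(R, m, tol=0.001, max_iter=25):
    """Sketch of iterated principal axis factoring with m factors.

    Returns (loadings, communalities, iterations used).
    """
    R = np.asarray(R, dtype=float)
    # Assumed starting values: squared multiple correlations.
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    for iteration in range(1, max_iter + 1):
        reduced = R.copy()
        np.fill_diagonal(reduced, h2)           # communalities replace the 1s
        eigvals, eigvecs = np.linalg.eigh(reduced)
        top = np.argsort(eigvals)[::-1][:m]
        loadings = eigvecs[:, top] * np.sqrt(np.clip(eigvals[top], 0.0, None))
        new_h2 = np.clip((loadings ** 2).sum(axis=1), 0.0, 1.0)
        change = np.max(np.abs(new_h2 - h2))    # maximum change in communalities
        h2 = new_h2
        if change < tol:                        # the role played by TOL
            break
    return loadings, h2, iteration

# Hypothetical population: 6 variables, 2 factors, known loadings.
true_loadings = np.array([[.8, .1], [.7, .0], [.6, .2],
                          [.1, .8], [.0, .7], [.2, .6]])
R = true_loadings @ true_loadings.T
np.fill_diagonal(R, 1.0)

loadings, h2, n_iter = iterated_principal_axis(R, m=2, tol=1e-6, max_iter=500)
print("iterations:", n_iter)
print("estimated communalities:", np.round(h2, 3))
print("true communalities:     ", np.round((true_loadings ** 2).sum(axis=1), 3))
```

A tighter tolerance or a larger iteration limit trades computing time for communality estimates that have settled more fully.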

There are two commands you can use to control the number of factors: NUMBER and EIGEN. The command

NUMBER = 4

instructs FACTOR to derive 4 factors. The command

EIGEN = .5

instructs FACTOR to choose a number of factors equal to the number of eigenvalues above .5. Thus when you factor a correlation matrix, the command

EIGEN = 1

implements the Kaiser rule for choosing the number of factors. The default is EIGEN = 0, which causes FACTOR to derive all possible factors. If you use both NUMBER and EIGEN commands, FACTOR will follow whichever rule produces the smaller number of factors.
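
The way the two rules combine can be written out in a few lines of Python. This is only a sketch of the logic as described above, not SYSTAT's implementation; the function name and the eigenvalues are hypothetical.

```python
def number_of_factors(eigenvalues, number=None, eigen=0.0):
    """Number of factors kept under a NUMBER-style and an EIGEN-style rule.

    `eigen` keeps factors whose eigenvalue exceeds the cutoff; `number`
    caps the count. When both are given, the smaller count wins.
    """
    above_cutoff = sum(1 for value in eigenvalues if value > eigen)
    if number is None:
        return above_cutoff
    return min(number, above_cutoff)

eigenvalues = [2.9, 1.2, 0.8, 0.6, 0.3, 0.2]   # hypothetical, from a correlation matrix

print(number_of_factors(eigenvalues))                       # default: all factors
print(number_of_factors(eigenvalues, eigen=1))              # Kaiser rule
print(number_of_factors(eigenvalues, number=3, eigen=0.5))  # both rules together
```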

The one-word command SORT causes FACTOR to sort the variables by their factor loadings when printing the factor loading matrix. Specifically, it will make FACTOR print first all the variables loading above .5 on factor 1, then all the variables loading above .5 on factor 2, etc. Within each block of variables, variables are sorted by the size of the loading on the corresponding factor, with highest loadings first. This sorting makes it easier to examine a factor structure matrix for simple structure.

The ROTATE command allows you to choose a method of rotation. The choices are

ROTATE = VARIMAX

ROTATE = EQUAMAX

ROTATE = QUARTIMAX

The differences among these methods are beyond the scope of this chapter. In any event, rotation does not affect a factor structure's fit to the data, so you may if you wish use them all and choose the one whose results you like best. In fact, that is commonly done. The default method for rotation is varimax, so typing just ROTATE implements varimax.

There are three options for saving the output of factor analysis into files. To do this, use the SAVE command before the FACTOR command. The command

SAVE MYFILE/SCORES

saves scores on principal components into a file named MYFILE. This cannot be used with common factor analysis (the IPA option) since common factor scores are undefined. The command

SAVE MYFILE/COEF

saves the coefficients used to define components. These coefficients are in a sense the opposite of factor loadings. Loadings predict variables from factors, while coefficients define factors in terms of the original variables. If you specify a rotation, the coefficients are the ones defining the rotated components. The command

SAVE MYFILE/LOADING

saves the matrix of factor loadings; it may be used with either common factor analysis or component analysis. Again, if you specify a rotation, the loadings saved are for rotated factors.
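
The distinction between coefficients and loadings can be illustrated for unrotated principal components. The sketch below uses Python with NumPy and a hypothetical 3-variable correlation matrix; it assumes standardized variables and components scaled to unit variance, which may differ from the scaling a particular program uses.

```python
import numpy as np

# Hypothetical correlation matrix for three standardized variables.
R = np.array([[1.0, 0.6, 0.5],
              [0.6, 1.0, 0.4],
              [0.5, 0.4, 1.0]])

eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1][:2]        # keep the two largest components
vals, vecs = eigvals[order], eigvecs[:, order]

loadings = vecs * np.sqrt(vals)      # predict variables from components
coefficients = vecs / np.sqrt(vals)  # define components from the variables

# With standardized data Z, scores = Z @ coefficients, and the correlations
# between variables and scores are R @ coefficients, i.e. the loadings.
print(np.round(loadings, 3))
print(np.round(R @ coefficients, 3))
```

Under these assumptions the components have unit variance (coefficients.T @ R @ coefficients is the identity matrix), and the correlations between the variables and the component scores reproduce the loading matrix.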

Output

The basic output of FACTOR consists of four tables:

• eigenvalues

• factor loading matrix (called factor pattern for IPA)

• variance explained by factors (usually equal to eigenvalues)

• proportion of variance explained by factors

ROTATE adds three others:

• rotated factor loadings

• variance explained by rotated factors

• proportion of variance explained by rotated factors

IPA adds three others:

• initial communality estimates

• an index of changes in communality estimates

• final communality estimates

PRINT LONG adds two others:

• Input correlation or covariance matrix R

• Matrix of residual covariances--the off-diagonal part of U

The PLOT option to the FACTOR command adds two other items:

• a scree plot

• plots of factor loadings, two factors at a time

There is no overlap in these lists. Thus choosing all these options will cause FACTOR to print 12 tables, a scree plot, and m(m-1)/2 factor loading plots.

An Example

File USDATA (supplied with SYSTAT) includes variables CARDIO, CANCER, PULMONAR, PNEU_FLU, DIABETES, and LIVER, giving the death rates from each of these causes in each of the 50 US states. A factor analysis of these data might shed some light on the public-health factors determining death rates from these 6 causes. To receive all the output types mentioned earlier, use the following commands:

use usdata

rotate = varimax

sort

print long

number = 2

factor cardio, cancer, pulmonar, pneu_flu, diabetes, liver / ipa, plot

Except for a scree plot and a plot of factor loadings, which have been omitted, and a few minor edits I have made for clarity, these commands will produce the following output:

MATRIX TO BE FACTORED

CARDIO CANCER PULMON PNEU_FLU DIAB LIVER

CARDIO 1.000

CANCER 0.908 1.000

PULMONAR 0.441 0.438 1.000

PNEU_FLU 0.538 0.358 0.400 1.000

DIABETES 0.619 0.709 0.227 0.022 1.000

LIVER 0.136 0.363 0.263 -0.097 0.148 1.000

INITIAL COMMUNALITY ESTIMATES

1 2 3 4 5 6

0.901 0.912 0.297 0.511 0.600 0.416

ITERATIVE PRINCIPAL AXIS FACTOR ANALYSIS

ITERATION MAXIMUM CHANGE IN COMMUNALITIES

1 .7032

2 .1849

3 .0877

4 .0489

5 .0421

6 .0372

7 .0334

8 .0304

9 .0279

10 .0259

11 .0241

12 .0226

13 .0212

14 .0201

15 .0190

16 .0181

17 .0054

18 .0009

FINAL COMMUNALITY ESTIMATES

1 2 3 4 5 6

0.867 1.000 0.256 1.000 0.525 0.110

LATENT ROOTS (EIGENVALUES)

1 2 3 4 5 6

2.831 0.968 0.245 -0.010 -0.052 -0.223

FACTOR PATTERN

1 2

CANCER 0.967 0.255

CARDIO 0.931 -0.011

DIABETES 0.620 0.374

PNEU_FLU 0.563 -0.826

LIVER 0.238 0.231

PULMONAR 0.493 -0.113

VARIANCE EXPLAINED BY FACTORS

1 2

2.831 0.968

PERCENT OF TOTAL VARIANCE EXPLAINED

1 2

47.177 16.125

ROTATED FACTOR PATTERN

1 2

CANCER 0.913 0.409

DIABETES 0.718 0.098

CARDIO 0.718 0.593

PNEU_FLU -0.080 0.997

PULMONAR 0.313 0.397

LIVER 0.330 -0.030

VARIANCE EXPLAINED BY ROTATED FACTORS

1 2

2.078 1.680

PERCENT OF TOTAL VARIANCE EXPLAINED

1 2

34.627 28.002

MATRIX OF RESIDUALS

CANCER DIABETES CARDIO PNEU_FLU PULMONAR LIVER

CANCER 0.000

DIABETES 0.011 0.000

CARDIO -0.019 -0.010 0.000

PNEU_FLU 0.005 0.024 0.030 0.000

PULMONAR 0.046 0.014 -0.037 -0.017 0.000

LIVER -0.083 0.074 0.172 -0.040 -0.087 0.000
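
One way to read this output is to check that each final communality equals the sum of squared loadings for that variable, and that rotation leaves those sums unchanged. The short Python sketch below does this with the printed values; matching the six communality columns to the variables in the order given in the FACTOR command is my reading of the output, and agreement is only expected up to rounding of the printed figures.

```python
variables = ["CARDIO", "CANCER", "PULMONAR", "PNEU_FLU", "DIABETES", "LIVER"]
final_communalities = dict(zip(variables, [0.867, 1.000, 0.256, 1.000, 0.525, 0.110]))

# Loadings copied from FACTOR PATTERN and ROTATED FACTOR PATTERN above.
unrotated = {
    "CANCER": (0.967, 0.255), "CARDIO": (0.931, -0.011),
    "DIABETES": (0.620, 0.374), "PNEU_FLU": (0.563, -0.826),
    "LIVER": (0.238, 0.231), "PULMONAR": (0.493, -0.113),
}
rotated = {
    "CANCER": (0.913, 0.409), "DIABETES": (0.718, 0.098),
    "CARDIO": (0.718, 0.593), "PNEU_FLU": (-0.080, 0.997),
    "PULMONAR": (0.313, 0.397), "LIVER": (0.330, -0.030),
}

def communality(loadings):
    """Sum of squared loadings of one variable across the factors."""
    return sum(x * x for x in loadings)

for name in variables:
    print(f"{name:9s} printed={final_communalities[name]:.3f} "
          f"unrotated={communality(unrotated[name]):.3f} "
          f"rotated={communality(rotated[name]):.3f}")
```

The three columns agree to within rounding, which illustrates the earlier remark that rotation does not affect a factor structure's fit to the data.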

#####

Factor analysis is a statistical method used to explain variability among observed variables in terms of fewer unobserved variables called factors. The observed variables are modeled as linear combinations of the factors, plus "error" terms. The information gained about the interdependencies can be used later to reduce the set of variables in a dataset.
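
As an illustration of this model, the following Python sketch with NumPy simulates observed variables as linear combinations of two factors plus independent error terms, and compares the sample correlation matrix with the one the model implies (loadings times their transpose, plus the error variances on the diagonal). The loadings, sample size and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 20000

# Hypothetical loadings of 6 observed variables on 2 uncorrelated factors.
loadings = np.array([[.8, 0], [.7, 0], [.6, 0],
                     [0, .8], [0, .7], [0, .6]])
uniqueness = 1 - (loadings ** 2).sum(axis=1)      # error variance per variable

factors = rng.standard_normal((n, 2))
errors = rng.standard_normal((n, 6)) * np.sqrt(uniqueness)
observed = factors @ loadings.T + errors          # linear combinations plus error

implied = loadings @ loadings.T + np.diag(uniqueness)
sample = np.corrcoef(observed, rowvar=False)

print(np.round(implied, 2))
print(np.round(sample, 2))
```

Variables that share a factor correlate with each other, while variables loading on different factors are nearly uncorrelated; factor analysis works backwards from such a correlation pattern to the loadings.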

Factor analysis is used in behavioral sciences, social sciences, marketing, product management, operations research, and other applied sciences that deal with large quantities of data.

Applications in psychology

Factor analysis is used to identify "factors" that explain a variety of results on different tests. For example, intelligence research found that people who get a high score on a test of verbal ability also do well on other tests that require verbal abilities. Researchers explained this by using factor analysis to isolate one factor, often called crystallized intelligence or verbal intelligence, that represents the degree to which someone is able to solve problems involving verbal skills.

Factor analysis in psychology is most often associated with intelligence research. However, it also has been used to find factors in a broad range of domains such as personality, attitudes, beliefs, etc. It is linked to psychometrics, as it can assess the validity of an instrument by finding if the instrument indeed measures the postulated factors.

Advantages

• Reduction of number of variables, by combining two or more variables into a single factor. For example, performance at running, ball throwing, batting, jumping and weight lifting could be combined into a single factor such as general athletic ability. Usually, in an item by people matrix, factors are selected by grouping related items. In the Q factor analysis technique, the matrix is transposed and factors are created by grouping related people: for example, liberals, libertarians, conservatives and socialists could form separate groups.

• Identification of groups of inter-related variables, to see how they are related to each other. For example, a factor called "broad visual perception" relates to how good an individual is at visual tasks, and a "broad auditory perception" factor relates to auditory task capability. Furthermore, a global factor, called "g" or general intelligence, relates to both "broad visual perception" and "broad auditory perception". This means someone with a high "g" is likely to have both a high "visual perception" capability and a high "auditory perception" capability, and that "g" therefore explains a good part of why someone is good or bad in both of those domains.

Disadvantages

• "...each orientation is equally acceptable mathematically. But different factorial theories proved to differ as much in terms of the orientations of factorial axes for a given solution as in terms of anything else, so that model fitting did not prove to be useful in distinguishing among theories." This means all rotations represent different underlying processes, but all rotations are equally valid outcomes of standard factor analysis optimization. Therefore, it is impossible to pick the proper rotation using factor analysis alone.

• Factor analysis can be only as good as the data allows. In psychology, where researchers have to rely on more or less valid and reliable measures such as self-reports, this can be problematic.

• Interpreting factor analysis is based on using a “heuristic”, which is a solution that is "convenient even if not absolutely true". More than one interpretation can be made of the same data factored the same way, and factor analysis cannot identify causality.

Factor analysis in marketing

The basic steps are:

• Identify the salient attributes consumers use to evaluate products in this category.

• Use quantitative marketing research techniques (such as surveys) to collect data from a sample of potential customers concerning their ratings of all the product attributes.

• Input the data into a statistical program and run the factor analysis procedure. The computer will yield a set of underlying attributes (or factors).

• Use these factors to construct perceptual maps and other product positioning devices.

Information collection

The data collection stage is usually done by marketing research professionals. Survey questions ask the respondent to rate a product sample or descriptions of product concepts on a range of attributes. Anywhere from five to twenty attributes are chosen. They could include things like: ease of use, weight, accuracy, durability, colourfulness, price, or size. The attributes chosen will vary depending on the product being studied. The same question is asked about all the products in the study. The data for multiple products is coded and input into a statistical program such as SPSS or SYSTAT.

Analysis

The analysis will isolate the underlying factors that explain the data. Factor analysis is an interdependence technique. The complete set of interdependent relationships is examined. There is no specification of either dependent variables, independent variables, or causality. Factor analysis assumes that all the rating data on different attributes can be reduced to a few important dimensions. This reduction is possible because the attributes are related. The rating given to any one attribute is partially the result of the influence of other attributes. The statistical algorithm deconstructs the rating (called a raw score) into its various components, and reconstructs the partial scores into underlying factor scores. The degree of correlation between the initial raw score and the final factor score is called a factor loading. There are two approaches to factor analysis: "principal component analysis" (the total variance in the data is considered); and "common factor analysis" (the common variance is considered).

########################

Human Resources

Answers by Expert:

human resource management, human resource planning, strategic planning in resource, management development, training, business coaching, management training, coaching, counseling, recruitment, selection, performance management.

18 years of managerial working experience, covering business planning, strategic planning, marketing, sales management, management service, organization development

PLUS

24 years of management consulting, including business planning, corporate planning, strategic planning, business development, product management, human resource management/development, training, business coaching, etc.

Organizations

Principal---BESTBUSICON Pty Ltd

Education/Credentials

MASTERS IN SCIENCE

MASTERS IN BUSINESS ADMINISTRATION