Management Consulting/Research methodology for Management decisions
What is the meaning of measurement in research? What difference does it make whether we measure in terms of a nominal, ordinal, interval or ratio scale?
Measurement is the process of observing and recording the observations that are collected as part of a research effort. There are two major issues that will be considered here.
First, you have to understand the fundamental ideas involved in measuring. Here we consider two major measurement concepts. In Levels of Measurement, I explain the meaning of the four major levels of measurement: nominal, ordinal, interval and ratio. Then we move on to the reliability of measurement, including consideration of true score theory and a variety of reliability estimators.
Second, you have to understand the different types of measures that you might use in social research. We consider four broad categories of measurements. Survey research includes the design and implementation of interviews and questionnaires. Scaling involves consideration of the major methods of developing and implementing a scale. Qualitative research provides an overview of the broad range of non-numerical measurement approaches. And unobtrusive measures presents a variety of measurement methods that don't intrude on or interfere with the context of the research.
There are different levels of measurement that have been classified into four categories. It is important for the researcher to understand the different levels of measurement, as these levels of measurement play a part in determining the arithmetic and the statistical operations that are carried out on the data.
In ascending order of precision, the four different levels of measurement are nominal, ordinal, interval, and ratio.
The first level of measurement is nominal measurement. At this level, numbers, words, or letters are used only to classify the data. Suppose there are data about people belonging to two different genders. In this case, a person of the female gender could be classified as F, and a person of the male gender could be classified as M. Assigning labels in this way is the nominal level of measurement.
The second level of measurement is the ordinal level of measurement. This level depicts an ordered relationship among the items. Suppose a student scores the highest marks in the class. In this case, that student would be assigned the first rank. Then, the student scoring the second-highest marks would be assigned the second rank, and so on. The ranks are assigned according to a specific criterion (here, the marks). The ordinal level of measurement indicates the order of the measurements but not the size of the gaps between them. The researcher should note that in this type of measurement, the difference or the ratio between any two rankings is not necessarily the same along the scale.
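The ranking example can be sketched in a few lines of Python. The student names and marks below are hypothetical and only serve to illustrate that adjacent ranks always differ by one while the underlying marks may differ by very different amounts.

```python
# Hypothetical marks for five students (illustrative data, not from the source).
marks = {"Asha": 95, "Ben": 94, "Chen": 70, "Dina": 69, "Eli": 40}

# Sort the names by marks, highest first, then assign ranks 1, 2, 3, ...
ordered = sorted(marks, key=marks.get, reverse=True)
ranks = {name: position for position, name in enumerate(ordered, start=1)}

# Adjacent ranks always differ by 1, but the gap in marks varies.
for first, second in zip(ordered, ordered[1:]):
    gap = marks[first] - marks[second]
    print(f"rank {ranks[first]} -> rank {ranks[second]}: mark gap = {gap}")
```

The ranks preserve the order of the students but discard the information about how far apart their marks are.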
The third level of measurement is the interval level of measurement. The interval level of measurement not only classifies and orders the measurements, but it also specifies that the distances between adjacent points on the scale are equal all along the scale, from the low end to the high end. For example, on an interval-level measure of anxiety, the difference between scores of 10 and 11 is the same as the difference between scores of 40 and 41. A popular example of this level of measurement is temperature in centigrade, where, for example, the distance between 94°C and 96°C is the same as the distance between 100°C and 102°C.
The fourth level of measurement is the ratio level of measurement. Its properties are similar to those of the interval level of measurement, but in addition the scale has a true zero point that represents the absence of the quantity being measured, which sets it apart from the other levels. In the ratio level of measurement, the divisions between the points on the scale have an equivalent distance between them, and the rankings assigned to the items are according to their size. Because of the true zero, ratios between values are meaningful: a value of 20 is twice a value of 10.
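A small worked example may help to separate the interval and ratio levels. The sketch below assumes two illustrative temperatures (10°C and 20°C, not taken from the text above) and converts them to kelvin, a temperature scale with a true zero. Differences agree on both scales, but only the kelvin ratio is physically meaningful.

```python
def celsius_to_kelvin(celsius: float) -> float:
    """Convert Celsius (interval scale) to kelvin (ratio scale with a true zero)."""
    return celsius + 273.15

# Illustrative temperatures, chosen only for this example.
a_c, b_c = 10.0, 20.0

# Differences are the same on both scales.
difference_c = b_c - a_c
difference_k = celsius_to_kelvin(b_c) - celsius_to_kelvin(a_c)

# Ratios are not: 20°C is not "twice as hot" as 10°C.
ratio_c = b_c / a_c
ratio_k = celsius_to_kelvin(b_c) / celsius_to_kelvin(a_c)

print(f"difference: {difference_c:.2f} (Celsius) vs {difference_k:.2f} (kelvin)")
print(f"ratio: {ratio_c:.3f} (Celsius) vs {ratio_k:.3f} (kelvin)")
```

The Celsius ratio of 2 is an artifact of where the zero happens to sit, which is why ratio statements are reserved for ratio-level data.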
The researcher should note that among these levels of measurement, the nominal level is simply used to classify data, whereas the interval level and the ratio level are much more exact.
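As a rough illustration of how the level of measurement limits the statistics that make sense, the sketch below pairs each level with one commonly used summary. The gender codes and temperatures follow the examples above; the class ranks and weights are hypothetical, and the pairing of statistic to level is a conventional rule of thumb, not a complete list.

```python
from statistics import mean, median, mode

gender = ["F", "M", "F", "F", "M"]       # nominal: labels only
class_rank = [1, 2, 3, 4, 5]             # ordinal: order, unequal gaps
temperature_c = [94, 96, 100, 102]       # interval: equal gaps, arbitrary zero
weight_kg = [50, 60, 70, 100]            # ratio: equal gaps, true zero (hypothetical data)

summary = {
    "nominal (mode)": mode(gender),                      # most frequent category
    "ordinal (median)": median(class_rank),              # middle rank
    "interval (mean)": mean(temperature_c),              # arithmetic mean
    "ratio (max/min)": max(weight_kg) / min(weight_kg),  # meaningful ratio
}

for level, value in summary.items():
    print(f"{level}: {value}")
```

Each higher level also supports the summaries of the levels below it, so a median can be taken of ratio data, but a mean of gender codes would be meaningless.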
1. Measure the cycle time
2. Measure the business value and ROI, not the number of items or hours worked. Who cares how much fruit you produced if it isn't juicy?
3. Customer satisfaction: most people put their faith in customer surveys. They allocate 1024 questions and forget to free the customer afterwards. The ultimate question is a much superior approach, because you ask just one question. It goes like this: Would you recommend this (product) to a friend? The scale runs from 1 to 10, where 1 means recommending it to an enemy.
For a meaningful comparison, quality criteria for measurement properties are needed.
STUDY DESIGN AND SETTING: Quality criteria for content validity, internal consistency, criterion validity, construct validity, reproducibility, longitudinal validity, responsiveness, floor and ceiling effects, and interpretability were derived from existing guidelines and consensus within our research group.
RESULTS: For each measurement property a criterion was defined for a positive, negative, or indeterminate rating, depending on the design, methods, and outcomes of the validation study.
CONCLUSION: Our criteria make a substantial contribution toward defining explicit quality criteria for measurement questionnaires.
Validity is the extent to which a test measures what it claims to measure. It is vital for a test to be valid in order for the results to be accurately applied and interpreted.
Validity isn’t determined by a single statistic, but by a body of research that demonstrates the relationship between the test and the behavior it is intended to measure. There are three types of validity: content validity, criterion-related validity, and construct validity.
When a test has content validity, the items on the test represent the entire range of possible items the test should cover. Individual test questions may be drawn from a large pool of items that cover a broad range of topics.
In some instances where a test measures a trait that is difficult to define, an expert judge may rate each item’s relevance. Because each judge is basing their rating on opinion, two independent judges rate the test separately. Items that are rated as strongly relevant by both judges will be included in the final test.
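The two-judge selection step can be sketched as a simple filter. The item names, the 1 to 4 relevance scale, and the cutoff for "strongly relevant" are all assumptions made for this illustration; the text above does not specify a rating scale.

```python
# Hypothetical relevance ratings from two independent judges
# (assumed scale: 1 = not relevant ... 4 = strongly relevant).
judge_a = {"item1": 4, "item2": 2, "item3": 4, "item4": 3}
judge_b = {"item1": 4, "item2": 4, "item3": 4, "item4": 1}

STRONGLY_RELEVANT = 4  # assumed cutoff for "strongly relevant"

# Keep only the items that BOTH judges rated as strongly relevant.
final_items = [
    item
    for item in judge_a
    if judge_a[item] >= STRONGLY_RELEVANT and judge_b[item] >= STRONGLY_RELEVANT
]

print(final_items)
```

Requiring agreement from both judges reduces the influence of any single judge's opinion on the final test.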
A test is said to have criterion-related validity when the test is demonstrated to be effective in predicting a criterion or indicators of a construct. There are two different types of criterion validity:
• Concurrent Validity occurs when the criterion measures are obtained at the same time as the test scores. This indicates the extent to which the test scores accurately estimate an individual’s current state with regard to the criterion. For example, on a test that measures levels of depression, the test would be said to have concurrent validity if it measured the current levels of depression experienced by the test taker.
• Predictive Validity occurs when the criterion measures are obtained at a time after the test. Examples of tests with predictive validity are career or aptitude tests, which are helpful in determining who is likely to succeed or fail in certain subjects or occupations.
A test has construct validity if it demonstrates an association between the test scores and the prediction of a theoretical trait. Intelligence tests are one example of measurement instruments that should have construct validity.
Reliability refers to the consistency of a measure. A test is considered reliable if we get the same result repeatedly. For example, if a test is designed to measure a trait (such as introversion), then each time the test is administered to a subject, the results should be approximately the same. Unfortunately, it is impossible to calculate reliability exactly, but there are several different ways to estimate reliability.
To gauge test-retest reliability, the test is administered twice at two different points in time. This kind of reliability is used to assess the consistency of a test across time. This type of reliability assumes that there will be no change in the quality or construct being measured. Test-retest reliability is best used for things that are stable over time, such as intelligence. Generally, reliability will be higher when little time has passed between tests.
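One common way to put a number on test-retest consistency is the Pearson correlation between the two administrations; the text above does not name a specific statistic, so treat this choice as an assumption. The scores below are hypothetical, and the helper assumes that both score lists vary (a constant list would make the denominator zero).

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 scores")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / sqrt(spread_x * spread_y)

# Hypothetical scores of the same five subjects at two points in time.
first_administration = [100, 110, 95, 120, 105]
second_administration = [102, 108, 97, 118, 107]

r = pearson_r(first_administration, second_administration)
print(f"test-retest correlation: {r:.3f}")
```

A correlation close to 1 suggests that subjects keep roughly the same relative standing across the two administrations.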
Inter-rater reliability is assessed by having two or more independent judges score the test. The scores are then compared to determine the consistency of the raters' estimates. One way to test inter-rater reliability is to have each rater assign each test item a score. For example, each rater might score items on a scale from 1 to 10. Next, you would calculate the correlation between the two sets of ratings to determine the level of inter-rater reliability. Another means of testing inter-rater reliability is to have raters determine which category each observation falls into and then calculate the percentage of agreement between the raters. So, if the raters agree 8 out of 10 times, the test has an 80% inter-rater reliability rate.
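The percentage-of-agreement calculation can be sketched as follows. The category labels and the two raters' decisions are hypothetical, arranged so that the raters agree on 8 of 10 observations as in the example above.

```python
def percent_agreement(ratings_1, ratings_2):
    """Percentage of observations that two raters placed in the same category."""
    if len(ratings_1) != len(ratings_2) or not ratings_1:
        raise ValueError("need two non-empty lists of equal length")
    agreements = sum(a == b for a, b in zip(ratings_1, ratings_2))
    return 100 * agreements / len(ratings_1)

# Hypothetical category assignments for ten observations.
rater_1 = ["A", "B", "A", "C", "B", "A", "C", "C", "B", "A"]
rater_2 = ["A", "B", "A", "C", "A", "A", "C", "B", "B", "A"]

agreement = percent_agreement(rater_1, rater_2)
print(f"inter-rater agreement: {agreement:.0f}%")
```

Simple percent agreement does not correct for agreement that would occur by chance, so it is best read as a first, rough estimate.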
Parallel-forms reliability is gauged by comparing two different tests that were created using the same content. This is accomplished by creating a large pool of test items that measure the same quality and then randomly dividing the items into two separate tests. The two tests should then be administered to the same subjects at the same time.
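The random division of the item pool might look like the sketch below. The pool of 20 item names and the function name split_into_parallel_forms are hypothetical; the optional seed is only there to make the split reproducible.

```python
import random

def split_into_parallel_forms(item_pool, seed=None):
    """Randomly divide a pool of items into two forms of (nearly) equal size."""
    rng = random.Random(seed)      # local generator, so global state is untouched
    shuffled = list(item_pool)     # copy, so the original pool is not modified
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical pool of 20 items that all measure the same quality.
pool = [f"item_{number}" for number in range(1, 21)]
form_a, form_b = split_into_parallel_forms(pool, seed=42)

print("Form A:", form_a)
print("Form B:", form_b)
```

Once both forms have been administered to the same subjects, the scores on the two forms can be compared to estimate the reliability.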
Internal Consistency Reliability
This form of reliability is used to judge the consistency of results across items on the same test. Essentially, you are comparing test items that measure the same construct to determine the test's internal consistency. When you see a question that seems very similar to another test question, it may indicate that the two questions are being used to gauge reliability. Because the two questions are similar and designed to measure the same thing, the test taker should answer both questions the same, which would indicate that the test has internal consistency.
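The text above does not name a statistic for internal consistency. One widely used estimator is Cronbach's alpha, which is swapped in here as an assumption. The sketch below computes it for a hypothetical set of five respondents answering three items on a 1 to 5 scale.

```python
from statistics import variance

def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    item_count = len(responses[0])
    if item_count < 2 or len(responses) < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    # Sample variance of each item across respondents.
    item_variances = [
        variance([row[index] for row in responses]) for index in range(item_count)
    ]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in responses])
    return (item_count / (item_count - 1)) * (1 - sum(item_variances) / total_variance)

# Hypothetical answers: one row per respondent, one column per item.
responses = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [1, 2, 1],
    [3, 3, 3],
]

alpha = cronbach_alpha(responses)
print(f"Cronbach's alpha: {alpha:.3f}")
```

Values closer to 1 indicate that the items vary together, which is what you would expect when they measure the same construct.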
Validity refers to how well an experiment accurately represents the real world.
Reliability refers to how well it produces results (or, how well it doesn't fail).
You make an experiment more valid by including as many real world influencing factors as you can.
You make it reliable by eliminating problematic actions/materials that interfere with the experiment's ability to be successful.
Sometimes, you end up compromising one or both, because they can interfere with each other.