Title: Research Methodology: Methods and Techniques, by C. R. Kothari
Description: This is not a set of notes; it is a book, Research Methodology: Methods and Techniques by C. R. Kothari.
Copyright © 2004, 1990, 1985, New Age International (P) Ltd
...
, Publishers
All rights reserved
...
All inquiries should be emailed to rights@newagepublishers
...
newagepublishers
...
I
am presenting this second edition, thoroughly revised and enlarged, to my readers in all humbleness
...
The feedback
received from different sources has been incorporated
...
The other highlights of this revised edition are (i) the subject content has been developed,
refined and restructured at several points, (ii) several new problems have also been added at the end
of various chapters for the benefit of students, and (iii) every page of the book has been read very
carefully so as to improve its quality
...
I firmly believe that there is always scope for improvement and accordingly I shall look
forward to receiving suggestions (which shall be thankfully acknowledged) for further enriching the
quality of the text
...
R
...
Several
research studies are undertaken and accomplished year after year
...
The result is that much of research, particularly in social sciences, contains endless word-spinning
and too many quotations
...
It may be noted, in the
context of planning and development, that the significance of research lies in its quality and not in
quantity
...
The
methodology may differ from problem to problem, yet the basic approach towards research remains
the same
...
, (i) to
enable researchers, irrespective of their discipline, in developing the most appropriate methodology
for their research studies; and (ii) to make them familiar with the art of using different research methods and techniques
...
Regarding the organization, the book consists of fourteen chapters, well arranged in a coherent
manner
...
Chapter
Two explains the technique of defining a research problem
...
Chapter Four presents the details of several sampling
designs
...
Chapter Six presents a comparative study of the different
methods of data collection
...
Chapter Seven deals with processing and analysis
of data
...
Chapter Nine has been exclusively devoted to several parametric tests of hypotheses,
followed by Chapter Ten concerning Chi-square test
...
Important non-parametric tests,
generally used by researchers, have been described and illustrated in Chapter Twelve
...
Factor analysis has been dealt with in relatively more detail
...
The book is primarily intended to serve as a textbook for graduate and M
...
students of
Research Methodology in all disciplines of various universities
...
The book is, in fact, an
outgrowth of my experience of teaching the subject to M
...
students for the last several years
...
I am grateful to all those persons whose writings and works
have helped me in the preparation of this book
...
I thankfully acknowledge the assistance provided by the University Grants
Commission in the form of ‘on account’ grant in the preparation of the manuscript of this book
...
I look forward to suggestions from all readers, especially from experienced researchers and
scholars, for further improving the subject content as well as the presentation of this book
...
R
...
Research Methodology: An Introduction
Meaning of Research
Objectives of Research
Motivation in Research
Types of Research
Research Approaches
Significance of Research
Research Methods versus Methodology
Research and Scientific Method
Importance of Knowing How Research is Done
Research Process
Criteria of Good Research
Problems Encountered by Researchers in India
2
...
Research Design
Meaning of Research Design
Need for Research Design
Features of a Good Design
Important Concepts Relating to Research Design
Different Research Designs
Basic Principles of Experimental Designs
Conclusion
Appendix: Developing a Research Plan
4
...
Measurement and Scaling Techniques
Measurement in Research
Measurement Scales
Sources of Error in Measurement
Tests of Sound Measurement
Technique of Developing Measurement Tools
Scaling
Meaning of Scaling
Scale Classification Bases
Important Scaling Techniques
Scale Construction Techniques
6
...
Processing and Analysis of Data
Processing Operations
Some Problems in Processing
Elements/Types of Analysis
Statistics in Research
Measures of Central Tendency
Measures of Dispersion
Measures of Asymmetry (Skewness)
Measures of Relationship
Simple Regression Analysis
Multiple Correlation and Regression
Partial Correlation
Association in Case of Attributes
Other Measures
Appendix: Summary Chart Concerning Analysis of Data
8
...
Testing of Hypotheses-I (Parametric or Standard Tests of Hypotheses)
What is a Hypothesis?
Basic Concepts Concerning Testing of Hypotheses
Procedure for Hypothesis Testing
Flow Diagram for Hypothesis Testing
Measuring the Power of a Hypothesis Test
Tests of Hypotheses
Important Parametric Tests
Hypothesis Testing of Means
Hypothesis Testing for Differences between Means
Hypothesis Testing for Comparing Two Related Samples
Hypothesis Testing of Proportions
Hypothesis Testing for Difference between Proportions
Hypothesis Testing for Comparing a Variance to Some Hypothesized Population Variance
Testing the Equality of Variances of Two Normal Populations
Hypothesis Testing of Correlation Coefficients
Limitations of the Tests of Hypotheses
10
...
Analysis of Variance and Covariance
Analysis of Variance (ANOVA)
What is ANOVA?
The Basic Principle of ANOVA
ANOVA Technique
Setting up Analysis of Variance Table
Short-cut Method for One-way ANOVA
Coding Method
Two-way ANOVA
ANOVA in Latin-Square Design
Analysis of Co-variance (ANOCOVA)
ANOCOVA Technique
Assumptions in ANOCOVA
12
...
Multivariate Analysis Techniques
Growth of Multivariate Techniques
Characteristics and Applications
Classification of Multivariate Techniques
Variables in Multivariate Analysis
Important Multivariate Techniques
Important Methods of Factor Analysis
Rotation in Factor Analysis
R-type and Q-type Factor Analyses
Path Analysis
Conclusion
Appendix: Summary Chart Showing the Appropriateness of a Particular Multivariate Technique
14
...
The Computer: Its Role in Research
Introduction
The Computer and Computer Technology
The Computer System
Important Characteristics
The Binary Number System
Computer Applications
Computers and Researcher
Appendix: Selected Statistical Tables
Selected References and Recommended Readings
Author Index
Subject Index
1
Research Methodology: An Introduction
MEANING OF RESEARCH
Research in common parlance refers to a search for knowledge
...
In fact, research is an
art of scientific investigation
...
”1 Redman and Mory define research as a “systematized effort to gain
new knowledge
...
It is actually a voyage of discovery
...
This inquisitiveness is the mother of all knowledge and
the method, which man employs for obtaining the knowledge of whatever the unknown, can be
termed as research
...
According to Clifford Woody research comprises defining and redefining problems, formulating
hypothesis or suggested solutions; collecting, organising and evaluating data; making deductions and
reaching conclusions; and at last carefully testing the conclusions to determine whether they fit the
formulating hypothesis
...
Slesinger and M
...
”3 Research is, thus, an original contribution to the existing stock of knowledge
making for its advancement
...
In short, the search for knowledge through an objective and systematic method of
finding a solution to a problem is research
...
As such the term ‘research’ refers to the systematic method
1
The Advanced Learner’s Dictionary of Current English, Oxford, 1952, p
...
L
...
Redman and A
...
H
...
10
...
IX, MacMillan, 1930
...
OBJECTIVES OF RESEARCH
The purpose of research is to discover answers to questions through the application of scientific
procedures
...
Though each research study has its own specific purpose, we may think of
research objectives as falling into the following broad groupings:
1
...
To portray accurately the characteristics of a particular individual, situation or a group
(studies with this object in view are known as descriptive research studies);
3
...
To test a hypothesis of a causal relationship between variables (such studies are known as
hypothesis-testing research studies)
...
The
possible motives for doing research may be either one or more of the following:
1
...
Desire to face the challenge in solving the unsolved problems, i
...
, concern over practical
problems initiates research;
3
...
Desire to be of service to society;
5
...
However, this is not an exhaustive list of factors motivating people to undertake research studies
...
TYPES OF RESEARCH
The basic types of research are as follows:
(i) Descriptive vs
...
The major purpose of descriptive research is description of the state of
affairs as it exists at present
...
The main characteristic
of this method is that the researcher has no control over the variables; he can only report
what has happened or what is happening
...
Ex post facto studies also
include attempts by researchers to discover causes even when they cannot control the
variables
...
In analytical research, on the
other hand, the researcher has to use facts or information already available, and analyze
these to make a critical evaluation of the material
...
Fundamental: Research can be either applied (or action) research or
fundamental (basic or pure) research
...
“Gathering knowledge for knowledge’s sake is termed ‘pure’ or ‘basic’ research
...
Similarly, research studies concerning human behaviour, carried out
with a view to making generalisations about human behaviour, are also examples of
fundamental research, but research aimed at certain conclusions (say, a solution) facing a
concrete social or business problem is an example of applied research
...
Thus, the
central aim of applied research is to discover a solution for some pressing practical problem,
whereas basic research is directed towards finding information that has a broad base of
applications and thus, adds to the already existing organized body of scientific knowledge
...
Qualitative: Quantitative research is based on the measurement of quantity
or amount
...
Qualitative research, on the other hand, is concerned with qualitative phenomena, i
...
,
phenomena relating to or involving quality or kind
...
e
...
This type of research aims at discovering the underlying motives and desires, using in-depth
interviews for the purpose
...
Attitude or opinion research i
...
, research designed to find out how people feel or what
they think about a particular subject or institution is also qualitative research
...
Through such research we can analyse the various
factors which motivate people to behave in a particular manner or which make people like
or dislike a particular thing
...
Young, Scientific Social Surveys and Research, p
...
practice is relatively a difficult job and therefore, while doing such research, one should
seek guidance from experimental psychologists
...
Empirical: Conceptual research is that related to some abstract idea(s) or
theory
...
On the other hand, empirical research relies on experience or
observation alone, often without due regard for system and theory
...
We can also call it the experimental type of research
...
In such research, the researcher must
first provide himself with a working hypothesis or guess as to the probable results
...
He then sets up
experimental designs which he thinks will manipulate the persons or the materials concerned
so as to bring forth the desired information
...
Empirical research is appropriate when proof is sought that
certain variables affect other variables in some way
...
(v) Some Other Types of Research: All other types of research are variations of one or more
of the above stated approaches, based on the purpose of research, the time
required to accomplish research, the environment in which research is done, or
some other similar factor
...
In the former case the research is
confined to a single time-period, whereas in the latter case the research is carried on over
several time-periods
...
Research can as well be understood as clinical or diagnostic research
...
Such
studies usually go deep into the causes of things or events that interest us, using very small
samples and very deep probing data gathering devices
...
The objective of exploratory research is the development of
hypotheses rather than their testing, whereas formalized research studies are those with
substantial structure and with specific hypotheses to be tested
...
to study events or ideas of
the past, including the philosophy of persons and groups at any remote point of time
...
While doing conclusion-oriented research, a researcher is free to pick up a problem, redesign the enquiry as he
proceeds and is prepared to conceptualize as he wishes
...
Operations research is an example
of decision-oriented research since it is a scientific method of providing executive departments
with a quantitative basis for decisions regarding operations under their control
...
, quantitative approach and the qualitative approach
...
This approach can be further sub-classified into inferential,
experimental and simulation approaches to research
...
This
usually means survey research where a sample of the population is studied (questioned or observed) to
determine its characteristics, and it is then inferred that the population has the same characteristics
...
Simulation
approach involves the construction of an artificial environment within which relevant information
and data can be generated
...
The term ‘simulation’ in the context of business and social
sciences applications refers to “the operation of a numerical model that represents the structure of a
dynamic process
...
”5 Simulation approach can also
be useful in building models for understanding future conditions
...
Research in such a situation is a function of the researcher’s insights and impressions
...
Generally, the techniques of focus group interviews,
projective techniques and depth interviews are used
...
Significance of Research
“All progress is born of inquiry
...
Increased amounts of research make progress possible
...
The role of research in several fields of applied economics, whether related to business or
to the economy as a whole, has greatly increased in modern times
...
Research, as an aid to economic policy, has gained added importance, both for government
and business
...
For instance, government’s budgets rest in part on an analysis of the needs and desires of the people
and on the availability of revenues to meet these needs
...
Through research we can
devise alternative policies and can as well examine the consequences of each of these alternatives
...
Meir, William T
...
Dazier, Simulation in Business and Economics, p
...
Decision-making may not be a part of research, but research certainly facilitates the decisions of the
policy maker
...
The plight of
cultivators, the problems of big and small business and industry, working conditions, trade union
activities, the problems of distribution, even the size and nature of defence services are matters
requiring research
...
Another area in government, where research is necessary, is collecting information on the
economic and social structure of the nation
...
Collecting such statistical information is by no means a
routine task, but it involves a variety of research problems
...
Thus, in the context of government,
research as a tool to economic policy has three distinct phases of operation, viz
...
e
...
Research has its special significance in solving various operational and planning problems
of business and industry
...
Market research is the investigation of the structure and development of a market for the purpose of
formulating efficient policies for purchasing, production and sales
...
Motivational
research, which seeks to determine why people behave as they do, is mainly concerned with market characteristics
...
All these are of great help to people in business and industry who are responsible for
taking business decisions
...
Given knowledge of future demand, it is generally not difficult for a firm, or for an industry
to adjust its supply schedule within the limits of its projected capacity
...
Business budgeting, which ultimately results in a
projected profit and loss account, is based mainly on sales estimates which in turn depend on
business research
...
Research, thus, replaces
intuitive business decisions with more logical and scientific decisions
...
It provides the intellectual satisfaction of knowing a
few things just for the sake of knowledge and also has practical utility for the social scientist to know
for the sake of being able to do something better or in a more efficient manner
...
“This double emphasis is perhaps especially appropriate in the case
of social science
...
On
the other hand, because of its social orientation, it is increasingly being looked to for practical guidance
in solving immediate problems of human relations
...
Cook, Research Methods in Social Relations, p
...
In addition to what has been stated above, the significance of research can also be understood
keeping in view the following points:
(a) To those students who are to write a master’s or Ph
...
thesis, research may mean
careerism or a way to attain a high position in the social structure;
(b) To professionals in research methodology, research may mean a source of livelihood;
(c) To philosophers and thinkers, research may mean the outlet for new ideas and insights;
(d) To literary men and women, research may mean the development of new styles and creative
work;
(e) To analysts and intellectuals, research may mean the generalisations of new theories
...
It is a sort of
formal training which enables one to understand the new developments in one’s field in a better way
...
Research methods may be understood as all those methods/techniques that are used
for conducting research
...
Research techniques refer to
the behaviour and instruments we use in performing research operations such as making observations, recording data,
processing data and the like
...
For instance, the difference between methods and techniques of data collection can better
be understood from the details given in the following chart—
Type
Methods
1
...
Field Research
(i) Non-participant direct observation
(ii) Participant observation
(iii) Mass observation
(iv) Mail questionnaire
(v) Opinionnaire
(vi) Personal interview
(vii) Focused interview
(viii) Group interview
(ix) Telephone survey
(x) Case study and life history
3
...
Statistical compilations and manipulations, reference and abstract
guides, contents analysis
...
Interactional recording, possible use of tape recorders, photographic
techniques
...
Identification of social and economic background of respondents
...
Interviewer uses a detailed schedule with open and closed questions
...
Small groups of respondents are interviewed simultaneously
...
Cross sectional collection of data for intensive analysis, longitudinal
collection of data of intensive character
...
From what has been stated above, we can say that methods are more general
...
However, in practice, the two terms are taken as interchangeable and when we talk of research methods we do, by
implication, include research techniques within their compass
...
In other words, all those methods which are used by the
researcher during the course of studying his research problem are termed as research methods
...
Keeping this in view, research methods can be put into the following
three groups:
1
...
These methods will be used where the data already available are not sufficient to
arrive at the required solution;
2
...
The third group consists of those methods which are used to evaluate the accuracy of the
results obtained
...
Research methodology is a way to systematically solve the research problem
...
In it we study the various
steps that are generally adopted by a researcher in studying his research problem along with the logic
behind them
...
Researchers not only need to know how to develop certain indices or tests,
how to calculate the mean, the mode, the median or the standard deviation or chi-square, how to
apply particular research techniques, but they also need to know which of these methods or techniques
are relevant and which are not, and what they would mean and indicate, and why
...
All this means that it is necessary for the researcher to design his methodology
for his problem as the same may differ from problem to problem
...
e
...
Similarly, in research the scientist has to expose
the research decisions to evaluation before they are implemented
...
From what has been stated above, we can say that research methodology has many dimensions
and research methods do constitute a part of the research methodology
...
Thus, when we talk of research methodology
we not only talk of the research methods but also consider the logic behind the methods we use
in the context of our research study and explain why we are using a particular method or
technique and why we are not using others so that research results are capable of being
evaluated either by the researcher himself or by others
...
Research and Scientific Method
For a clear perception of the term research, one should know the meaning of scientific method
...
Research, as we have already stated,
can be termed as “an inquiry into the nature of, the reasons for, and the consequences of any
particular set of circumstances, whether these circumstances are experimentally controlled or recorded
just as they occur
...
”7 On the other hand, the philosophy common to all research methods and
techniques, although they may vary considerably from one science to another, is usually given the
name of scientific method
...
”8 Scientific method is the pursuit of truth as determined by logical
considerations
...
Scientific method
attempts to achieve “this ideal by experimentation, observation, logical arguments from accepted
postulates and a combination of these three in varying proportions
...
Further, logic develops the consequences of such alternatives, and when these are compared with
observable phenomena, it becomes possible for the researcher or the scientist to state which alternative
is most in harmony with the observed facts
...
Experimentation is done to test hypotheses and to discover new relationships
...
But the conclusions drawn on the basis of experimental data are generally criticized for
either faulty assumptions, poorly designed experiments, badly executed experiments or faulty
interpretations
...
The purpose of survey investigations may also be to
provide scientifically gathered information to work as a basis for the researchers for their conclusions
...
2
...
4
...
e
...
It results in probabilistic predictions;
6
...
It aims at formulating the most general axioms or what can be termed as scientific theories
...
Mensing, Statistics in Research, p
...
10–12
...
cit
...
2
...
”10 Accordingly, scientific method implies an objective,
logical and systematic method, i
...
, a method free from personal bias or prejudice, a method to
ascertain demonstrable qualities of a phenomenon capable of being verified, a method wherein the
researcher is guided by the rules of logical reasoning, a method wherein the investigation proceeds in
an orderly manner and a method that implies internal consistency
...
In fact, the importance of knowing the methodology of research or how research is done stems from
the following considerations:
(i) For one who is preparing himself for a career of carrying out research, the importance of
knowing research methodology and research techniques is obvious since the same constitute
the tools of his trade
...
It helps him to develop disciplined
thinking or a ‘bent of mind’ to observe the field objectively
...
(ii) Knowledge of how to do research will inculcate the ability to evaluate and use research
results with reasonable confidence
...
(iii) When one knows how research is done, then one may have the satisfaction of acquiring a
new intellectual tool which can become a way of looking at the world and of judging
everyday experience
...
Thus, the knowledge of research
methodology provides tools to look at things in life objectively
...
The knowledge of methodology helps the consumer of research
results to evaluate them and enables him to take rational decisions
...
Research process consists of a series of actions or
steps necessary to effectively carry out research and the desired sequencing of these steps
...
1 well illustrates a research process
...
Lastrucci, The Scientific Approach: Basic Principles of the Scientific Method, p
...
[Figure: Research process in flow chart]
I. Define research problem
II. Review the literature (review concepts and theories; review previous research findings)
III. Formulate hypotheses
IV. Design research (including sample design)
V. Collect data (execution)
VI. Analyse data (test hypotheses, if any)
VII. Interpret and report
Where F = feed back (helps in controlling the sub-system to which it is transmitted)
FF = feed forward (serves the vital function of providing criteria for evaluation)
Fig
...
1
The chart indicates that the research process consists of a number of closely related activities,
as shown through I to VII
...
At times, the first step determines the nature of the last step to be undertaken
...
One should remember that the
various steps involved in a research process are not mutually exclusive; nor are they separate and
distinct
...
However, the following order concerning various steps provides a useful procedural guideline
regarding the research process: (1) formulating the research problem; (2) extensive literature survey;
(3) developing the hypothesis; (4) preparing the research design; (5) determining sample design;
(6) collecting the data; (7) execution of the project; (8) analysis of data; (9) hypothesis testing;
(10) generalisations and interpretation, and (11) preparation of the report or presentation of the results,
i
...
, formal write-up of conclusions reached
...
1
...
, those
which relate to states of nature and those which relate to relationships between variables
...
e
...
Initially the
problem may be stated in a broad general way and then the ambiguities, if any, relating to the problem
may be resolved
...
The formulation of a general topic into a specific research
problem, thus, constitutes the first step in a scientific enquiry
...
, understanding the problem thoroughly, and rephrasing the
same into meaningful terms from an analytical point of view
...
In an academic institution the researcher can seek
help from a guide who is usually an experienced man and has several research problems in mind
...
In private business units or in governmental
organisations, the problem is usually earmarked by the administrative agencies with whom the
researcher can discuss as to how the problem originally came about and what considerations are
involved in its possible solutions
...
He may review two types of literature—the conceptual literature concerning
the concepts and theories, and the empirical literature consisting of studies made earlier which are
similar to the one proposed
...
After this the researcher rephrases the problem
into analytical or operational terms i
...
, to put the problem in as specific terms as possible
...
The problem to be investigated must be defined unambiguously, for that will help in discriminating
relevant data from irrelevant data
...
Professor W
...
Neiswanger correctly states that
the statement of the objective is of basic importance because it determines the data which are to be
collected, the characteristics of the data which are relevant, relations which are to be explored, the
choice of techniques to be used in these explorations and the form of the final report
...
In fact, formulation of the problem often follows a sequential pattern where a number of
formulations are set up, each formulation more specific than the preceding one, each one phrased in
more analytical terms, and each more realistic in terms of the available data and resources
...
Extensive literature survey: Once the problem is formulated, a brief summary of it should be
written down
...
D
...
At this juncture the researcher should undertake an extensive literature survey connected with the
problem
...
Academic journals, conference proceedings, government
reports, books etc
...
In this process, it should
be remembered that one source will lead to another
...
A good library will be a great help to the researcher at
this stage
...
Development of working hypotheses: After an extensive literature survey, the researcher should
state in clear terms the working hypothesis or hypotheses
...
As such the manner in
which research hypotheses are developed is particularly important since they provide the focal point
for research
...
In most types of research, the
development of a working hypothesis plays an important role
...
The role of the hypothesis is to
guide the researcher by delimiting the area of research and to keep him on the right track
...
It also indicates the
type of data required and the type of methods of data analysis to be used
...
Thus, working hypotheses arise as a result of a-priori thinking about the subject, examination of the
available data and material including related studies and the counsel of experts and interested parties
...
It may as well
be remembered that occasionally we may encounter a problem where we do not need working
hypotheses, especially in the case of exploratory or formulative research which does not aim at testing
the hypothesis
...
4
...
e
...
The preparation of such a design
helps research to be as efficient as possible, yielding maximal information
...
But how all these can be achieved depends mainly on the research
purpose
...
, (i) Exploration, (ii) Description,
(iii) Diagnosis, and (iv) Experimentation
...
But when the purpose happens to be an accurate description of
a situation or of an association between variables, the suitable design will be one that minimises bias
and maximises the reliability of the data collected and analysed
...
Experimental designs can be either informal designs (such as before-and-after without control,
after-only with control, before-and-after with control) or formal designs (such as completely randomized
design, randomized block design, Latin square design, simple and complex factorial designs), out of
which the researcher must select one for his own project
...
e
...
5
...
A complete enumeration of all the items in the ‘population’ is known as
a census inquiry
...
But in practice this may not be true
...
Moreover, there is no way of checking the element of bias or its extent except through a
resurvey or use of sample checks
...
Not only this, census inquiry is not possible in practice under many circumstances
...
Hence, quite often we select only a few items
from the universe for our study purposes
...
The researcher must decide the way of selecting a sample or what is popularly known as the
sample design
...
Thus, the plan to select 12 of a
city’s 200 drugstores in a certain way constitutes a sample design
...
With probability samples each element has a known probability
of being included in the sample but the non-probability samples do not allow the researcher to determine
this probability
...
A brief mention of the
important sample designs is as follows:
(i) Deliberate sampling: Deliberate sampling is also known as purposive or non-probability
sampling
...
When population
elements are selected for inclusion in the sample based on the ease of access, it can be
called convenience sampling
...
This would be an example of convenience sample of gasoline buyers
...
On the other hand, in judgement sampling the researcher’s judgement is
used for selecting items which he considers as representative of the population
...
Judgement sampling is used quite frequently in qualitative research where the
desire happens to be to develop hypotheses rather than to generalise to larger populations
...
For example, if we have to select a sample of 300
items from a universe of 15,000 items, then we can put the names or numbers of all the
15,000 items on slips of paper and conduct a lottery
...
To select the sample, each item is assigned a number
from 1 to 15,000
...
To do
this we select some random starting point and then a systematic pattern is used in proceeding
through the table
...
When a number exceeds the limit of the numbers in the frame, in our case over 15,000, it is
simply passed over and the next number selected that does fall within the relevant range
...
This procedure gives each item an equal probability of being selected
...
(iii) Systematic sampling: In some instances the most practical way of sampling is to select
every 15th name on a list, every 10th house on one side of a street and so on
...
An element of randomness is usually introduced
into this kind of sampling by using random numbers to pick up the unit with which to start
...
In such a
design the selection process starts by picking some random point in the list and then every
nth element is selected until the desired number is secured
...
In this technique, the population is stratified into a number of nonoverlapping subpopulations or strata and sample items are selected from each stratum
...
(v) Quota sampling: In stratified sampling the cost of taking random samples from individual
strata is often so high that interviewers are simply given quotas to be filled from
different strata, the actual selection of items for the sample being left to the interviewer’s
judgement
...
The size of the quota for each stratum is generally
proportionate to the size of that stratum in the population
...
Quota samples generally happen to be judgement samples
rather than random samples
...
Suppose some departmental store wishes to sample its credit card holders
...
The sample size is to be kept at, say, 450
...
Three clusters might then be selected for the sample randomly
...
The clustering approach can, however, make the sampling
procedure relatively easier and increase the efficiency of field work, specially in the case
of personal interviews
...
Under area sampling we first divide
the total area into a number of smaller non-overlapping areas, generally called geographical
clusters, then a number of these smaller areas are randomly selected, and all units in these
small areas are included in the sample
...
It also makes the field interviewing more efficient
since the interviewer can conduct many interviews at each location
...
This
technique is meant for big inquiries extending to a considerably large geographical area like
an entire country
...
If the technique of random-sampling is applied at all stages, the sampling procedure
is described as multi-stage random sampling
...
This design is usually adopted
under acceptance sampling plan in the context of statistical quality control
...
It may be pointed out here that normally one
should resort to random sampling so that bias can be eliminated and sampling error can be estimated
...
Also, there are conditions under which sample designs
other than random sampling may be considered better for reasons like convenience and low costs
...
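Three of the sample designs described above can be sketched in a few lines of code. This is only an illustrative sketch using Python's standard library: the function names, the fixed seed and the three strata "A", "B" and "C" with their sizes are assumptions made for the example, while the figures of 300 items and 15,000 items follow the lottery illustration in the text. The systematic version assumes the universe size is at least the sample size, and the allocation rule (largest remainders) is one possible way of rounding proportionate quotas.

```python
import random


def simple_random_sample(population_size, sample_size, rng):
    """Lottery method: draw item numbers 1..population_size without replacement."""
    return rng.sample(range(1, population_size + 1), sample_size)


def systematic_sample(population_size, sample_size, rng):
    """Pick a random starting unit, then take every k-th item on the list."""
    k = population_size // sample_size      # sampling interval
    start = rng.randint(1, k)               # random start within the first interval
    return [start + i * k for i in range(sample_size)]


def proportional_allocation(stratum_sizes, sample_size):
    """Share the sample among strata in proportion to stratum size."""
    total = sum(stratum_sizes.values())
    exact = {name: sample_size * size / total for name, size in stratum_sizes.items()}
    alloc = {name: int(share) for name, share in exact.items()}
    leftover = sample_size - sum(alloc.values())
    # Give any remaining units to the strata with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda name: exact[name] - alloc[name], reverse=True)
    for name in by_remainder[:leftover]:
        alloc[name] += 1
    return alloc


rng = random.Random(42)                     # fixed seed so the sketch is repeatable
srs = simple_random_sample(15_000, 300, rng)
sys_sample = systematic_sample(15_000, 300, rng)
quotas = proportional_allocation({"A": 9_000, "B": 4_500, "C": 1_500}, 300)

print(len(srs), sys_sample[:3], quotas)
```

In the simple random sample every item number has the same chance of selection, whereas in the systematic sample only the starting unit is random and the rest follow from the interval.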
6
...
There are several
ways of collecting the appropriate data which differ considerably in terms of money costs, time and
other resources at the disposal of the researcher
...
If the researcher
conducts an experiment, he observes some quantitative measurements, or the data, with the help of
which he examines the truth contained in his hypothesis
...
The information obtained relates to
what is currently happening and is not complicated by either the past behaviour or future
intentions or attitudes of respondents
...
As such this method is not
suitable in inquiries where large samples are concerned
...
This method of collecting
data is usually carried out in a structured way where output depends upon the ability of the
interviewer to a large extent
...
This is not a very widely used method but it plays an
important role in industrial surveys in developed regions, particularly, when the survey has
to be accomplished in a very limited time
...
Questionnaires are mailed to the
respondents with a request to return after completing the same
...
Before applying this method, usually
a pilot study for testing the questionnaire is conducted which reveals the weaknesses, if
any, of the questionnaire
...
(v) Through schedules: Under this method the enumerators are appointed and given training
...
These enumerators go to
respondents with these schedules
...
Much depends upon the capability
of enumerators so far as this method is concerned
...
The researcher should select one of these methods of collecting the data taking into
consideration the nature of the investigation, objective and scope of the inquiry, financial resources,
available time and the desired degree of accuracy
...
In this context Dr A
...
Bowley very aptly remarks that in collection of statistical data common sense is the chief requisite
and experience the chief teacher
...
Execution of the project: Execution of the project is a very important step in the research
process
...
The researcher should see that the project is executed in a systematic
manner and in time
...
In such a situation, questions as well as the possible answers may be
coded
...
The training may be given with the help of instruction
manuals which explain clearly the job of the interviewers at each step
...
A careful watch should be kept for unanticipated factors in order to keep the survey as
realistic as possible
...
If some of the respondents do not cooperate, some suitable methods should be
designed to tackle this problem
...
8
...
The analysis of data requires a number of closely related operations such as establishment of
categories, the application of these categories to raw data through coding, tabulation and then drawing
statistical inferences
...
Thus, the researcher should classify the raw data into some
purposeful and usable categories
...
Editing is the
procedure that improves the quality of the data for coding
...
Tabulation is a part of the technical procedure wherein the classified data are put in the form of
tables
...
A great deal of data, specially in
large inquiries, is tabulated by computers
...
Analysis work after tabulation is generally based on the computation of various percentages,
coefficients, etc
...
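The sequence of editing, coding and tabulation described above can be illustrated with a small sketch. The coding frame (1 for "yes", 2 for "no", 3 for "undecided") and the eight raw answers are hypothetical and serve only to show the mechanics; a real inquiry would define its own categories.

```python
from collections import Counter

# Hypothetical coding frame: each possible answer is assigned a numeric code.
CODES = {"yes": 1, "no": 2, "undecided": 3}


def code_responses(raw_responses):
    """Edit each raw answer lightly (trim spaces, lower-case) and apply the code."""
    return [CODES[answer.strip().lower()] for answer in raw_responses]


def tabulate(coded):
    """Return {code: (count, percentage)} for the coded data."""
    counts = Counter(coded)
    n = len(coded)
    return {code: (counts[code], round(100 * counts[code] / n, 1))
            for code in sorted(counts)}


raw = ["Yes", "no", " yes", "Undecided", "NO", "yes", "yes ", "no"]
table = tabulate(code_responses(raw))

for code, (count, pct) in table.items():
    print(code, count, pct)
```

The resulting table of counts and percentages is the kind of summary on which the subsequent analysis is based.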
In the process of analysis,
relationships or differences supporting or conflicting with original or new hypotheses should be subjected
to tests of significance to determine with what validity data can be said to indicate any conclusion(s)
...
Through
the use of statistical tests we can establish whether such a difference is a real one or is the result of
random fluctuations
...
Similarly, the technique of analysis of variance can
help us in analysing whether three or more varieties of seeds grown on certain fields yield significantly
different results or not
...
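The idea behind the analysis of variance can be sketched as a comparison of the variation between groups with the variation within groups. The code below is a minimal illustration, not a full statistical routine: the three lists of yields are invented figures, and the function only computes the F-ratio and its degrees of freedom. Whether the ratio is significant must still be judged against a table of the F-distribution at the chosen level of significance.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F-ratio, df_between, df_within) for a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between, df_within = k - 1, n_total - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Hypothetical yields of three varieties of seeds, three plots each.
yields = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_ratio, df_b, df_w = one_way_anova_f(yields)
print(round(f_ratio, 2), df_b, df_w)
```

A large F-ratio suggests that the differences between the varieties are greater than random fluctuations alone would be expected to produce.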
9
...
Do the facts support the hypotheses or do they
happen to be contrary? This is the usual question which should be answered while testing hypotheses
...
The hypotheses may be tested through the use of one or more of such tests, depending upon
the nature and object of research inquiry
...
If the researcher had no hypotheses to start with, generalisations established on the
basis of data may be stated as hypotheses to be tested by subsequent researches in times to come
...
Generalisations and interpretation: If a hypothesis is tested and upheld several times, it may
be possible for the researcher to arrive at a generalisation, i.e., to build a theory
...
If the researcher had no
hypothesis to start with, he might seek to explain his findings on the basis of some theory
...
The process of interpretation may quite often trigger off new questions which in
turn may lead to further researches
...
Preparation of the report or the thesis: Finally, the researcher has to prepare the report of
what has been done by him
...
The layout of the report should be as follows: (i) the preliminary pages; (ii) the main text,
and (iii) the end matter
...
Then there should be a table of contents followed by a list of tables and list
of graphs and charts, if any, given in the report
...
The scope
of the study along with various limitations should as well be stated in this part
...
If the findings are extensive, they
should be summarised
...
(d) Conclusion: Towards the end of the main text, the researcher should again put down the
results of his research clearly and precisely
...
At the end of the report, appendices should be listed in respect of all technical data
...
...
, consulted, should also be given in the end
...
2
...
3
...
4
...
Criteria of Good Research
Whatever may be the types of research works and studies, one thing that is important is that they all
meet on the common ground of scientific method employed by them
...
The purpose of the research should be clearly defined and common concepts be used
...
The research procedure used should be described in sufficient detail to permit another
researcher to repeat the research for further advancement, keeping the continuity of what
has already been attained
...
The procedural design of the research should be carefully planned to yield results that are
as objective as possible
...
The researcher should report with complete frankness, flaws in procedural design and
estimate their effects upon the findings
...
The analysis of data should be sufficiently adequate to reveal its significance and the
methods of analysis used should be appropriate
...
6
...
7
...
In other words, we can state the qualities of a good research12 as under:
1
...
Systematic
characteristic of the research does not rule out creative thinking but it certainly does reject
the use of guessing and intuition in arriving at conclusions
...
Good research is logical: This implies that research is guided by the rules of logical
reasoning and the logical process of induction and deduction are of great value in carrying
out research
...
In fact, logical reasoning makes research more meaningful in the
context of decision making
...
39 (March, 1958), pp
...
See, Danny N
...
Greenberg, “Marketing Research—A Management Information Approach”,
p
...
12
3
...
4
...
Problems Encountered by Researchers in India
Researchers in India, particularly those engaged in empirical research, are facing several problems
...
The lack of scientific training in the methodology of research is a great impediment
for researchers in our country
...
Many researchers
take a leap in the dark without knowing research methods
...
Research, to many researchers and
even to their guides, is mostly a scissors-and-paste job without any insight shed on the
collated materials
...
, the research results, quite often, do
not reflect the reality or realities
...
Before undertaking research projects, researchers should be well equipped
with all the methodological aspects
...
2
...
A great deal of primary data of non-confidential nature remain untouched/untreated
by the researchers for want of proper contacts
...
There is a
need for developing some mechanisms of a university-industry interaction programme so
that academics can get ideas from practitioners on what needs to be researched and
practitioners can apply the research done by the academics
...
Most of the business units in our country do not have the confidence that the material
supplied by them to researchers will not be misused and as such they are often reluctant in
supplying the needed information to researchers
...
Thus, there is the need for generating the confidence that the
information/data obtained from a business unit will not be misused
...
Research studies overlapping one another are undertaken quite often for want of
adequate information
...
This problem
can be solved by proper compilation and revision, at regular intervals, of a list of subjects on
which and the places where the research is going on
...
5
...
Hence, there is a need for developing a code
of conduct for researchers which, if adhered to sincerely, can overcome this problem
...
Many researchers in our country also face difficulty in obtaining adequate and timely secretarial
assistance, including computer assistance
...
All possible efforts should be made in this direction so that efficient
secretarial assistance is made available to researchers and that too well in time
...
7
...
,
rather than in tracing out relevant material from them
...
There is also the problem that many of our libraries are not able to get copies of old
and new Acts/Rules, reports and other government publications in time
...
Thus,
efforts should be made for the regular and speedy supply of all governmental publications
to reach our libraries
...
There is also the difficulty of timely availability of published data from various
government and other agencies doing this job in our country
...
10
...
Questions
1
...
3
...
Briefly describe the different steps involved in a research process
...
Distinguish between Research methods and Research methodology
...
5
...
6
...
State the problems
that are usually faced by such researchers
...
Univ
...
, M
...
Exam
...
“A research scholar has to work as a judge and derive the truth and not as a pleader who is only eager
to prove his case in favour of his plaintiff
...
8
...
Discuss this statement and examine
the significance of research”
...
Univ
...
, M
...
Exam
...
“Research is much concerned with proper fact finding, analysis and evaluation
...
10
...
Account for this state of affairs and give suggestions for
improvement
...
* A researcher must find the problem and formulate it so that it becomes susceptible
to research
...
To define a problem
correctly, a researcher must know what a problem is.
WHAT IS A RESEARCH PROBLEM?
A research problem, in general, refers to some difficulty which a researcher experiences in the
context of either a theoretical or practical situation and wants to obtain a solution for the same
...
The individual or the organisation, as the case may be, occupies
an environment, say ‘N’, which is defined by values of the uncontrolled variables, Yj
...
A course of
action is defined by one or more values of the controlled variables
...
(iii) There must be at least two possible outcomes, say O1 and O2, of the course of action, of
which one should be preferable to the other
...
...
(iv) The courses of action available must provide some chance of obtaining the objective, but
they cannot provide the same chance, otherwise the choice would not matter
...
In simple words, we can say that the choices
must have unequal efficiencies for the desired outcomes
...
Exploratory
or formulative research studies do not start with a problem or hypothesis, their problem is to find a problem or the
hypothesis to be tested
...
This aspect has been dealt with in chapter entitled
“Research Design”
...
...
Thus, an individual or a group of persons can be said to have a problem which can be
technically described as a research problem, if they (individual or the group), having one or more
desired outcomes, are confronted with two or more courses of action that have some but not equal
efficiency for the desired objective(s) and are in doubt about which course of action is best
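
The condition of unequal efficiency can be written compactly. The notation below is an illustrative assumption rather than a quotation: C_1 and C_2 stand for two available courses of action, O_1 for the preferred outcome, N for the environment, and P for the chance of obtaining the outcome.

```latex
P(O_1 \mid C_1, N) \neq P(O_1 \mid C_2, N)
```

If the two chances were equal, the choice between C_1 and C_2 would not matter and there would be no research problem in the sense described here.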
...
(ii) There must be some objective(s) to be attained
...
(iii) There must be alternative means (or the courses of action) for obtaining the objective(s)
one wishes to attain
...
(iv) There must remain some doubt in the mind of a researcher with regard to the selection of
alternatives
...
(v) There must be some environment(s) to which the difficulty pertains
...
...
There are several factors which may result in making the problem
complicated
...
All such elements (or at least the important ones) may be
thought of in context of a research problem
...
The task is a difficult one,
although it may not appear to be so
...
Nevertheless, every researcher must find out his own salvation, for research problems cannot be
borrowed
...
If our eyes need glasses, it is not the optician alone who decides about the number of the lens
we require
...
Thus, a research guide can at the most only help a researcher choose a
subject
...
(ii) A controversial subject should not become the choice of an average researcher
...
L
...
(iii) Too narrow or too vague problems should be avoided
...
Even then it is quite difficult to
supply definitive ideas concerning how a researcher should obtain ideas for his research
...
He may as well read articles published in current
literature available on the subject and may think how the techniques and ideas discussed
therein might be applied to the solution of other problems
...
In this way he should make all possible efforts in
selecting a problem
...
In other words, before the final selection of a problem is done, a researcher must
ask himself the following questions:
(a) Whether he is well equipped in terms of his background to carry out the research?
(b) Whether the study falls within the budget he can afford?
(c) Whether the necessary cooperation can be obtained from those who must participate
in research as subjects?
If the answers to all these questions are in the affirmative, one may become sure so far as
the practicability of the study is concerned
...
This may not be
necessary when the problem requires the conduct of a research closely similar to one that
has already been done
...
If the subject for research is selected properly by observing the above mentioned points, the
research will not be a boring drudgery, rather it will be love’s labour
...
The subject or the problem selected must involve the researcher and must have an uppermost place
in his mind so that he may take all the pains needed for the study
...
This statement signifies
the need for defining a research problem
...
A proper
definition of research problem will enable the researcher to be on the track whereas an ill-defined
problem may create hurdles
...
What techniques are to
be used for the purpose? and similar other questions crop up in the mind of the researcher who can
well plan his strategy and find answers to all such questions only when the research problem has
been well defined
...
In fact, formulation of a problem is often more essential than its
solution
...
TECHNIQUE INVOLVED IN DEFINING A PROBLEM
Let us start with the question: What does one mean when he/she wants to define a research problem?
The answer may be that one wants to state the problem along with the bounds within which it is to be
studied
...
How to define a research problem is undoubtedly a herculean task
...
The usual
approach is that the researcher should himself pose a question (or in case someone else wants the
researcher to carry on research, the concerned individual, organisation or an authority should pose
the question to the researcher) and set-up techniques and procedures for throwing light on the
question concerned for formulating or defining the research problem
...
Defining a research problem properly and clearly is a crucial part of a research study and must
in no case be accomplished hurriedly
...
Hence, the research problem should be defined in a systematic manner,
giving due weightage to all related points
...
A brief description of all these points will be helpful
...
For this purpose, the researcher must immerse himself thoroughly in the subject matter
concerning which he wishes to pose a problem
...
Then the researcher can himself state the problem or he
can seek the guidance of the guide or the subject expert in accomplishing this task
...
In case there is some directive from an organisational
authority, the problem then can be stated accordingly
...
At the same time the feasibility of a particular solution has to be considered and the same
should be kept in view while stating the problem
...
The best way of understanding the problem is to discuss it
with those who first raised it in order to find out how the problem originally came about and with what
objectives in view
...
For a better
understanding of the nature of the problem involved, he can enter into discussion with those who
have a good knowledge of the problem concerned or similar other problems
...
(iii) Surveying the available literature: All available literature concerning the problem at hand
must necessarily be surveyed and examined before a definition of the research problem is given
...
He must devote sufficient time to reviewing
research already undertaken on related problems
...
“Knowing what data are available often
serves to narrow the problem itself as well as the technique that might be used”
...
This would also
help a researcher to know if there are certain gaps in the theories, or whether the existing theories
applicable to the problem under study are inconsistent with each other, or whether the findings of the
different studies do not follow a pattern consistent with the theoretical expectations and so on
...
...
Studies on related problems are useful for indicating the
type of difficulties that may be encountered in the present study as also the possible analytical
shortcomings
...
(iv) Developing the ideas through discussions: Discussion concerning a problem often produces
useful information
...
Hence, a researcher
must discuss his problem with his colleagues and others who have enough experience in the same
area or in working on similar problems
...
People
with rich experience are in a position to enlighten the researcher on different aspects of his proposed
study and their advice and comments are usually invaluable to the researcher
...
Discussions with such persons should not
only be confined to the formulation of the specific problem at hand, but should also be concerned with
the general approach to the given problem, techniques that might be used, possible solutions, etc
...
Once the nature of the problem has been clearly understood, the
environment (within which the problem has got to be studied) has been defined, discussions over the
problem have taken place and the available literature has been surveyed and examined, rephrasing
the problem into analytical or operational terms is not a difficult task
...
*
In addition to what has been stated above, the following points must also be observed while
defining a research problem:
2
Robert Ferber and P
...
Verdoorn, Research Methods in Economics and Business, p
...
* Working hypotheses are a set of suggested tentative solutions or explanations of a research problem which may or may
not be the real solutions
...
Hypotheses should be clearly and
precisely stated in simple terms, they should be testable, limited in scope and should state relationship between variables
...
(a) Technical terms and words or phrases, with special meanings used in the statement of the
problem, should be clearly defined
...
(c) A straightforward statement of the value of the investigation (i.e., the criteria for the
selection of the problem) should be provided
...
(e) The scope of the investigation or the limits within which the problem is to be studied must
be mentioned explicitly in defining a research problem
...
Rethinking and discussions
about the problem may result in narrowing down the question to:
“What factors were responsible for the higher labour productivity of Japan’s manufacturing
industries during the decade 1971 to 1980 relative to India’s manufacturing industries?”
This latter version of the problem is definitely an improvement over its earlier version for
the various ambiguities have been removed to the extent possible
...
must be explained clearly
...
In case the data for one or more of the industries selected are not available for the
time period concerned, then the said industry or industries will have to be substituted by other industry or industries
...
Thus, all relevant factors must be considered
by a researcher before finally defining a research problem
...
All this results in
a well defined research problem that is not only meaningful from an operational point of view, but is
equally capable of paving the way for the development of working hypotheses and for means of
solving the problem itself
...
Describe fully the techniques of defining a research problem
...
What is a research problem? Define the main issues which should receive the attention of the researcher in
formulating the research problem
...
(Raj
...
EAFM, M
...
Exam
...
How do you define a research problem? Give three examples to illustrate your answer
...
Uni
...
Phil
...
1978)
4
...
5
...
6
...
Explain
...
“Knowing what data are available often serves to narrow down the problem itself as well as the technique
that might be used
...
8
...
3
Research Design
MEANING OF RESEARCH DESIGN
The formidable problem that follows the task of defining the research problem is the preparation of
the design of the research project, popularly known as the “research design”
...
“A research design is the arrangement of conditions for collection and analysis of
data in a manner that aims to combine relevance to the research purpose with economy in procedure”
...
As such the design includes an
outline of what the researcher will do from writing the hypothesis and its operational implications to
the final analysis of data
...
50
...
From what has been stated above, we can state the important features of a research design as
under:
(i) It is a plan that specifies the sources and types of information relevant to the research
problem
...
(iii) It also includes the time and cost budgets since most studies are done under these two
constraints
...
NEED FOR RESEARCH DESIGN
Research design is needed because it facilitates the smooth sailing of the various research operations,
thereby making research as efficient as possible yielding maximal information with minimal expenditure
of effort, time and money
...
Research design stands for advance planning of the methods to be
adopted for collecting the relevant data and the techniques to be used in their analysis, keeping in
view the objective of the research and the availability of staff, time and money
...
Research design, in fact, has a great bearing on the reliability of the results arrived at and as such
constitutes the firm foundation of the entire edifice of the research work
...
The
importance which this problem deserves is not given to it
...
In fact, they may even give misleading conclusions
...
It is, therefore, imperative that an efficient and appropriate design must be prepared before
starting research operations
...
Such a design can even be given to
others for their comments and critical evaluation
...
FEATURES OF A GOOD DESIGN
A good design is often characterised by adjectives like flexible, appropriate, efficient, economical
and so on
...
The design which gives the smallest experimental
error is supposed to be the best design in many investigations
...
Thus, the
question of good design is related to the purpose or objective of the research problem and also with
the nature of the problem to be studied
...
One single design
cannot serve the purpose of all types of research problems
...
If the research study happens to be an exploratory or a formulative one, wherein the major
emphasis is on discovery of ideas and insights, the research design most appropriate must be flexible
enough to permit the consideration of many different aspects of a phenomenon
...
Studies involving the testing of a hypothesis of a causal relationship between variables require a
design which will permit inferences about causality in addition to the minimisation of bias and
maximisation of reliability
...
It is only on the basis of its primary function that a study can be categorised either
as an exploratory or descriptive or hypothesis-testing study and accordingly the choice of a research
design may be made in case of a particular study
...
IMPORTANT CONCEPTS RELATING TO RESEARCH DESIGN
Before describing the different research designs, it will be appropriate to explain the various concepts
relating to designs so that these may be better and more easily understood
...
Dependent and independent variables: A concept which can take on different quantitative
values is called a variable
...
Qualitative phenomena (or the attributes) are also quantified on the basis of the presence
or absence of the attribute(s) concerned
...
* But not all variables are continuous
...
** Age is an example of a continuous variable, but the number of children
is an example of a non-continuous variable
...
For instance, if we say that height depends upon age,
then height is a dependent variable and age is an independent variable
...
Similarly, readymade films and lectures are examples of
independent variables, whereas behavioural changes, occurring as a result of the environmental
manipulations, are examples of dependent variables
...
Extraneous variable: Independent variables that are not related to the purpose of the study, but
may affect the dependent variable are termed as extraneous variables
...
In this case self-concept is an independent variable and social
studies achievement is a dependent variable
...
Whatever effect is noticed on dependent variable as a
result of extraneous variable(s) is technically described as an ‘experimental error’
...
3
...
The technical term ‘control’ is used when we design the study
minimising the effects of extraneous independent variables
...
4
...
5
...
The research hypothesis is a predictive statement that
relates an independent variable to a dependent variable
...
Predictive statements which are not to be
objectively verified or the relationships that are assumed but not to be tested, are not termed research
hypotheses
...
Experimental and non-experimental hypothesis-testing research: When the purpose of
research is to test a research hypothesis, it is termed as hypothesis-testing research
...
Research in which the independent variable
is manipulated is termed ‘experimental hypothesis-testing research’ and a research in which an
independent variable is not manipulated is called ‘non-experimental hypothesis-testing research’
...
A variable for which the individual values fall on the scale only with distinct gaps is called a discrete variable
...
This is an
example of non-experimental hypothesis-testing research because herein the independent variable,
intelligence, is not manipulated
...
At the end of the course, he administers a test to each group in order to judge the
effectiveness of the training programme on the student’s performance-level
...
, the type
of training programme, is manipulated
...
Experimental and control groups: In an experimental hypothesis-testing research when a
group is exposed to usual conditions, it is termed a ‘control group’, but when the group is exposed to
some novel or special condition, it is termed an ‘experimental group’
...
If both groups A and
B are exposed to special studies programmes, then both groups would be termed ‘experimental
groups
...
8
...
In the illustration taken above, the two treatments are the usual
studies programme and the special studies programme
...
9
...
For example, we can conduct an experiment to
examine the usefulness of a certain newly developed drug
...
,
absolute experiment and comparative experiment
...
Often, we undertake comparative experiments when we talk of designs
of experiments
...
Experimental unit(s): The pre-determined plots or the blocks, where different treatments are
used, are known as experimental units
...
DIFFERENT RESEARCH DESIGNS
Different research designs can be conveniently described if we categorize them as: (1) research
design in case of exploratory research studies; (2) research design in case of descriptive and diagnostic
research studies, and (3) research design in case of hypothesis-testing research studies
...
1
...
The main purpose of such studies is that of formulating
a problem for more precise investigation or of developing the working hypotheses from an operational
point of view
...
As such
the research design appropriate for such studies must be flexible enough to provide opportunity for
considering different aspects of a problem under study
...
Generally, the following three methods in the context of research design for
such studies are discussed: (a) the survey of the concerned literature; (b) the experience survey and
(c) the analysis of ‘insight-stimulating’ examples
...
Hypotheses stated by earlier
workers may be reviewed and their usefulness be evaluated as a basis for further research
...
In this way the
researcher should review and build upon the work already done by others, but in cases where
hypotheses have not yet been formulated, his task is to review the available material for deriving the
relevant hypotheses from it
...
He should also make an attempt to
apply concepts and theories developed in different research contexts to the area in which he is
himself working
...
Experience survey means the survey of people who have had practical experience with the
problem to be studied
...
For such a survey people who are competent
and can contribute new ideas may be carefully selected as respondents to ensure a representation of
different types of experience
...
The researcher must prepare an interview schedule for the systematic questioning of informants
...
Generally, the experience-collecting interview is likely to be long and may last for a few hours
...
This will
also give an opportunity to the respondents for doing some advance thinking over the various issues
involved so that, at the time of interview, they may be able to contribute effectively
...
This survey may as well provide information about the practical possibilities
for doing different types of research
...
It is particularly suitable in areas where there is little experience to serve as a guide
...
For this purpose the existing records, if any, may be examined, the unstructured interviewing
may take place, or some other approach may be adopted
...
Now, what sort of examples are to be selected and studied? There is no clear cut answer to it
...
One can mention a few examples of ‘insight-stimulating’ cases such as the reactions of
strangers, the reactions of marginal individuals, the study of individuals who are in transition from one
stage to another, the reactions of individuals from different social strata and the like
...
Thus, in an exploratory or formulative research study which merely leads to insights or hypotheses,
whatever method or research design outlined above is adopted, the only thing essential is that it must
continue to remain flexible so that many different facets of a problem may be considered as and
when they arise and come to the notice of the researcher
...
Research design in case of descriptive and diagnostic research studies: Descriptive research
studies are those studies which are concerned with describing the characteristics of a particular
individual, or of a group, whereas diagnostic research studies determine the frequency with which
something occurs or its association with something else
...
As against this, studies concerned
with specific predictions, with narration of facts and characteristics concerning individual, group or
situation are all examples of descriptive research studies
...
From the point of view of the research design, the descriptive as well as diagnostic
studies share common requirements and as such we may group together these two types of research
studies
...
Since the aim is to obtain complete and accurate information
in the said studies, the procedure to be used must be carefully planned
...
The design in such studies must be rigid and not
flexible and must focus attention on the following:
(a) Formulating the objective of the study (what the study is about and why is it being made?)
(b) Designing the methods of data collection (what techniques of gathering data will be adopted?)
(c) Selecting the sample (how much material will be needed?)
(d) Collecting the data (where can the required data be found and with what time period should
the data be related?)
(e) Processing and analysing the data
...
In a descriptive/diagnostic study the first step is to specify the objectives with sufficient precision
to ensure that the data collected are relevant
...
Then comes the question of selecting the methods by which the data are to be obtained
...
Several methods (viz
...
), with their merits and limitations, are available
for the purpose and the researcher may use one or more of these methods which have been discussed
in detail in later chapters
...
Whichever method is selected, questions must be well examined
and be made unambiguous; interviewers must be instructed not to express their own opinion; observers
must be trained so that they uniformly record a given item of behaviour
...
In other
words, we can say that “structured instruments” are used in such studies
...
More often
than not, a sample has to be designed
...
Here we may only mention that the problem of designing samples
should be tackled in such a fashion that the samples may yield accurate information with a minimum
amount of research effort
...
To obtain data free from errors introduced by those responsible for collecting them, it is necessary
to supervise closely the staff of field workers as they collect and record information
...
“As
data are collected, they should be examined for completeness, comprehensibility, consistency and
reliability
...
This includes steps like coding the interview
replies, observations, etc
...
To
the extent possible, the processing and analysing procedure should be planned in detail before actual
work is started
...
Coding should be done carefully to avoid
error in coding and for this purpose the reliability of coders needs to be checked
...
In case of mechanical
tabulation the material (i
...
, the collected data or information) must be entered on appropriate cards
which is usually done by punching holes corresponding to a given code
...
Finally, statistical computations are needed and as such averages,
percentages and various coefficients must be worked out
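As a rough illustration of the computations mentioned above, the following Python sketch works out an average, percentages and one coefficient (here the coefficient of variation, chosen only as an example). The replies and scores are invented for illustration and do not come from any study discussed in this chapter.

```python
from collections import Counter
from statistics import mean, pstdev

# Hypothetical coded interview replies and test scores (illustrative only).
replies = ["yes", "no", "yes", "yes", "no"]
scores = [10, 20, 30, 40, 50]

def percentages(coded):
    """Return the percentage share of each code in a list of coded replies."""
    counts = Counter(coded)
    total = len(coded)
    return {code: 100.0 * n / total for code, n in counts.items()}

def coefficient_of_variation(values):
    """Population standard deviation expressed as a percentage of the mean."""
    return 100.0 * pstdev(values) / mean(values)

reply_shares = percentages(replies)
average_score = mean(scores)
cv_score = coefficient_of_variation(scores)

print(reply_shares)
print(average_score, round(cv_score, 2))
```

Which coefficients are actually appropriate depends on the objectives of the study; the one used here is only a convenient example.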
...
The appropriate statistical operations, along with the use of appropriate tests of
significance should be carried out to safeguard the drawing of conclusions concerning the study
...
This is the task of communicating the
findings to others and the researcher must do it in an efficient manner
...
Thus, the research design in case of descriptive/diagnostic studies is a comparative design throwing
light on all points narrated above and must be prepared keeping in view the objective(s) of the study
and the resources available
...
The said design can be appropriately referred to as a survey
design since it takes into account all the steps involved in a survey concerning a phenomenon to be
studied
...
, op
...
, p
...
The difference between research designs in respect of the above two types of research studies
can be conveniently summarised in tabular form as under:
Table 3
...
3
...
Such studies require procedures that will not only reduce
bias and increase reliability, but will permit drawing inferences about causality
...
Hence, when we talk of research design in such studies, we often mean the
design of experiments
...
A
...
Beginning of such designs
was made by him when he was working at Rothamsted Experimental Station (Centre for Agricultural
Research in England)
...
Professor Fisher found that by dividing agricultural fields or plots into different blocks and then by
conducting experiments in each of these blocks, whatever information is collected and inferences
drawn from them, happens to be more reliable
...
Today, the experimental designs
are being used in researches relating to phenomena of several disciplines
...
) in experimental designs
...
According to the Principle of Replication, the experiment should be repeated more than once
...
By doing so the statistical
accuracy of the experiments is increased
...
For this purpose we may divide the field into two parts and grow one variety in one
part and the other variety in the other part
...
But if we are to apply the principle of replication to this experiment, then we
first divide the field into several parts, grow one variety in half of these parts and the other variety in
the remaining parts
...
The result so obtained will be more reliable in comparison to the conclusion we
draw without applying the principle of replication
...
Conceptually replication does not present any difficulty, but
computationally it does
...
However, it should be remembered that replication is introduced in
order to increase the precision of a study; that is to say, to increase the accuracy with which the main
effects and interactions can be estimated
...
In other words, this principle indicates that we
should design or plan the experiment in such a way that the variations caused by extraneous factors
can all be combined under the general heading of “chance
...
If
this is so, our results would not be realistic
...
e
...
As such, through the application of the principle of randomization,
we can have a better estimate of the experimental error
...
Under it
the extraneous factor, the known source of variability, is made to vary deliberately over as wide a
range as necessary and this needs to be done in such a way that the variability it causes can be
measured and hence eliminated from the experimental error
...
* In other words,
according to the principle of local control, we first divide the field into several homogeneous parts,
known as blocks, and then each such block is divided into parts equal to the number of treatments
...
Dividing the field into several
homogeneous parts is known as ‘blocking’
...
In brief, through the principle of local control we can
eliminate the variability due to extraneous factor(s) from the experimental error
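The idea of blocking can be sketched in a few lines of Python. The sketch below assumes a hypothetical field of four homogeneous blocks and three treatments labelled A, B and C; the function name and the seed are illustrative choices, not part of the text. Within each block every treatment is allotted to exactly one plot, in random order.

```python
import random

def randomized_block_layout(n_blocks, treatments, seed=None):
    """For each block, return the treatments in a random plot order.

    Every block receives every treatment exactly once, so block-to-block
    differences can be separated from the treatment comparison.
    """
    rng = random.Random(seed)
    layout = []
    for _ in range(n_blocks):
        plots = list(treatments)
        rng.shuffle(plots)  # random order of treatments within this block
        layout.append(plots)
    return layout

# Hypothetical field: 4 homogeneous blocks, 3 treatments.
layout = randomized_block_layout(4, ["A", "B", "C"], seed=1)
for i, plots in enumerate(layout, start=1):
    print(f"Block {i}: {plots}")
```

Because each block contains all the treatments, comparisons between treatments are made within blocks and are not disturbed by differences between blocks.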
...
Important Experimental Designs
Experimental design refers to the framework or structure of an experiment and as such there are
several experimental designs
...
,
informal experimental designs and formal experimental designs
...
Important experimental designs are as follows:
(a) Informal experimental designs:
(i) Before-and-after without control design
...
(iii) Before-and-after with control design
...
R
...
(ii) Randomized block design (R
...
Design)
...
S
...
(iv) Factorial designs
...
1
...
The treatment
is then introduced and the dependent variable is measured again after the treatment has been
introduced
...
The design can be represented thus:
Test area:    Level of phenomenon       Treatment       Level of phenomenon
              before treatment (X)      introduced      after treatment (Y)
              Treatment Effect = (Y) – (X)
Fig
...
1
The main difficulty of such a design is that with the passage of time considerable extraneous
variations may enter into its treatment effect
...
After-only with control design: In this design two groups or areas (test area and control area)
are selected and the treatment is introduced into the test area only
...
Treatment impact is assessed by subtracting the value
of the dependent variable in the control area from its value in the test area
...
3
...
If this assumption is not true, there is the possibility
of extraneous variation entering into the treatment effect
...
In this respect the design is
superior to before-and-after without control design
...
Before-and-after with control design: In this design two areas are selected and the dependent
variable is measured in both the areas for an identical time-period before the treatment
...
The treatment effect is determined by
subtracting the change in the dependent variable in the control area from the change in the dependent
variable in test area
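The arithmetic of the three informal designs can be put side by side in a short Python sketch. The function names and all the measurements below are hypothetical and serve only to show how the treatment effect is worked out in each case.

```python
def before_and_after_without_control(x_before, y_after):
    """Treatment effect = (Y) - (X) in the single test area."""
    return y_after - x_before

def after_only_with_control(test_after, control_after):
    """Treatment effect = test-area value minus control-area value."""
    return test_after - control_after

def before_and_after_with_control(test_before, test_after,
                                  control_before, control_after):
    """Treatment effect = change in test area minus change in control area."""
    return (test_after - test_before) - (control_after - control_before)

# Hypothetical measurements of the dependent variable (illustrative only).
effect_1 = before_and_after_without_control(x_before=50, y_after=62)
effect_2 = after_only_with_control(test_after=62, control_after=55)
effect_3 = before_and_after_with_control(50, 62, 49, 55)
print(effect_1, effect_2, effect_3)
```

With these invented figures the first design attributes the whole change of 12 to the treatment, whereas the third design removes the change of 6 that occurred in the control area as well.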
...
3
...
But at times, due to lack of historical data, time or a comparable control area, we should prefer
to select one of the first two informal designs stated above
...
Completely randomized design (C
...
design): Involves only two principles viz
...
It is the simplest possible
design and its procedure of analysis is also easier
...
For instance, if we have
10 subjects and if we wish to test 5 under treatment A and 5 under treatment B, the randomization
process gives every possible group of 5 subjects selected from a set of 10 an equal opportunity of
being assigned to treatment A and treatment B
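The randomization just described may be sketched in Python as follows. The subject labels, the function name and the seed are assumptions made for illustration; the count of possible groups is the number of ways of choosing 5 subjects out of 10.

```python
import math
import random

def completely_randomized_assignment(subjects, n_treatment_a, seed=None):
    """Randomly split subjects into treatment A and treatment B groups."""
    rng = random.Random(seed)
    group_a = rng.sample(subjects, n_treatment_a)  # every subset equally likely
    group_b = [s for s in subjects if s not in group_a]
    return group_a, group_b

subjects = list(range(1, 11))          # 10 subjects, labelled 1..10
group_a, group_b = completely_randomized_assignment(subjects, 5, seed=42)
n_possible_groups = math.comb(10, 5)   # number of distinct groups of 5

print(group_a, group_b, n_possible_groups)
```

Each of the 252 possible groups of 5 has the same chance of being the group that receives treatment A.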
...
Unequal replications can also work in this design
...
Such a design is generally used when
experimental areas happen to be homogeneous
...
extraneous factors are included under the heading of chance variation, we refer to the design of
experiment as C
...
design
...
4
...
3
...
A further requirement of this design is that items, after being selected randomly from the
population, be randomly assigned to the experimental and control groups (such random
assignment of items to two groups is technically described as the principle of randomization)
...
In a diagram form
this design can be shown in this way:
Two-group simple randomized experimental design (in diagram form)
Since in the simple randomized design the elements constituting the sample are randomly
drawn from the same population and randomly assigned to the experimental and control
groups, it becomes possible to draw conclusions on the basis of samples applicable for the
population
...
This design of experiment is quite common
in research studies concerning behavioural sciences
...
But the limitation of it is
that the individual differences among those conducting the treatments are not eliminated,
i
...
, it does not control the extraneous variable and as such the result of the experiment may
not depict a correct picture
...
Suppose the
researcher wants to compare two groups of students who have been randomly selected
and randomly assigned
...
, the usual training and the specialised
training are being given to the two groups
...
To determine this, he tests each group before and
after the training, and then compares the amount of gain for the two groups to accept or
reject his hypothesis
...
But this does not control the
differential effects of the extraneous independent variables (in this case, the individual
differences among those conducting the training programme)
...
3
...
In the illustration just cited above, the
teacher differences on the dependent variable were ignored, i
...
, the extraneous variable
was not controlled
...
Each
repetition is technically called a ‘replication’
...
, it provides controls for the differential effects of the extraneous independent variables
and secondly, it randomizes any individual differences among those conducting the treatments
...
3
...
The
sample is taken randomly from the population available for study and is randomly assigned
to, say, four experimental and four control groups
...
Generally, an equal number of items is put in each group so that the size of
the group is not likely to affect the result of the study
...
Thus, this
random replication design is, in fact, an extension of the two-group simple randomized
design
...
Randomized block design (R
...
design) is an improvement over the C
...
design
...
B
...
In the R
...
design, subjects are first divided into groups, known as blocks, such that within
each group the subjects are relatively homogeneous in respect to some selected variable
...
The number of subjects in a given block would be equal to the
number of treatments and one subject in each block would be randomly assigned to each treatment
...
The main feature of the R
...
design is that
each treatment appears the same number of times in each block
...
B
...
Let us illustrate the R
...
design with the help of an example
...
Q
...
            Very low I.Q.   Low I.Q.    Average I.Q.   High I.Q.   Very high I.Q.
            Student A       Student B   Student C      Student D   Student E
Form 1      82              67          57             71          73
Form 2      90              68          54             70          81
Form 3      86              73          51             69          84
Form 4      93              77          60             65          71
Fig
...
6
If each student separately randomized the order in which he or she took the four tests (by using
random numbers or some similar device), we refer to the design of this experiment as a R
...
design
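Using the scores listed above, the following Python sketch works out the form means, the student means and the sums of squares that a two-way analysis of variance starts from. It treats the four test forms as treatments and the five students as blocks; the variable names are illustrative, and the sketch stops short of the F-ratios.

```python
# Scores listed above: rows are test forms (treatments),
# columns are students A-E (blocks).
scores = {
    "Form 1": [82, 67, 57, 71, 73],
    "Form 2": [90, 68, 54, 70, 81],
    "Form 3": [86, 73, 51, 69, 84],
    "Form 4": [93, 77, 60, 65, 71],
}
students = ["A", "B", "C", "D", "E"]

forms = list(scores)
n_forms, n_students = len(forms), len(students)
grand_mean = sum(sum(row) for row in scores.values()) / (n_forms * n_students)

form_means = {f: sum(scores[f]) / n_students for f in forms}
student_means = {
    s: sum(scores[f][j] for f in forms) / n_forms for j, s in enumerate(students)
}

# Sum-of-squares decomposition for a two-way layout without replication.
ss_total = sum((x - grand_mean) ** 2 for f in forms for x in scores[f])
ss_forms = n_students * sum((m - grand_mean) ** 2 for m in form_means.values())
ss_students = n_forms * sum((m - grand_mean) ** 2 for m in student_means.values())
ss_error = sum(
    (scores[f][j] - form_means[f] - student_means[s] + grand_mean) ** 2
    for f in forms
    for j, s in enumerate(students)
)

print(form_means)
print(student_means)
print(round(ss_forms, 2), round(ss_students, 2), round(ss_error, 2))
```

The total sum of squares splits into a part due to forms, a part due to students (blocks) and a residual, which is how blocking removes the differences between students from the experimental error.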
...
*
See Chapter 11 for the two-way ANOVA technique
...
Latin square design (L
...
design) is an experimental design very frequently used in agricultural
research
...
For instance, an experiment
has to be made through which the effects of five different varieties of fertilizers on the yield of a
certain crop, say wheat, are to be judged
...
Similarly, there may be impact of varying
seeds on the yield
...
S
...
The Latin-square design is one wherein each fertilizer, in our example, appears five times but is
used only once in each row and in each column of the design
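One simple way of obtaining such an arrangement, sketched below in Python, is to shift the list of fertilizers by one place in each successive row. This cyclic construction is only one of many possible Latin squares and is an illustrative choice, not necessarily the layout used in the text; the function names are hypothetical.

```python
def cyclic_latin_square(treatments):
    """Build a Latin square by shifting the treatment list one place per row."""
    n = len(treatments)
    return [[treatments[(row + col) % n] for col in range(n)] for row in range(n)]

def is_latin_square(square, treatments):
    """Check that every treatment occurs exactly once per row and per column."""
    expected = sorted(treatments)
    rows_ok = all(sorted(row) == expected for row in square)
    cols_ok = all(sorted(col) == expected for col in zip(*square))
    return rows_ok and cols_ok

fertilizers = ["A", "B", "C", "D", "E"]
square = cyclic_latin_square(fertilizers)
for row in square:
    print(" ".join(row))
```

In practice the rows and columns of such a square would usually be rearranged at random before the treatments are laid out in the field.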
...
S
...
The two blocking factors may be represented through rows and columns (one
through rows and the other through columns)
...
, A, B, C, D and E and the two blocking factors viz
...
3
...
S
...
The analysis of the L
...
design is very similar to the
two-way ANOVA technique
...
But this design suffers from one limitation, and it is that although each row and each column
represents equally all fertilizer varieties, there may be considerable difference in the row and column
means both up and across the field
...
S
...
This defect can, however, be
removed by taking the means of rows and columns equal to the field mean by adjusting the results
...
This reduces the utility of this design
...
S
...
If treatments are
10 or more, then each row and each column will be larger in size so that rows and columns may not
be homogeneous
...
Therefore,
L
...
designs of orders (5 × 5) to (9 × 9) are generally used
...
Factorial designs: Factorial designs are used in experiments where the effects of varying more
than one factor are to be determined
...
Factorial designs
can be of two types: (i) simple factorial designs and (ii) complex factorial designs
...
Simple factorial design is also termed
as a ‘two-factor-factorial design’, whereas complex factorial design is known as ‘multifactor-factorial design
...
We
illustrate some simple factorial designs as under:
Illustration 1: (2 × 2 simple factorial design)
...
3
...
Then
there are two treatments of the experimental variable and two levels of the control variable
...
Each of the four combinations would provide
one treatment or experimental condition
...
The means for different cells may be obtained along
with the means for different rows and columns
...
Similarly, the row means in the said design are termed the main effects for levels without
regard to treatment
...
An additional merit of this design is that one can examine the interaction
between treatments and levels, through which one may say whether the treatment and levels are
independent of each other or they are not so
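The main effects and the interaction in a 2 × 2 simple factorial design can be worked out as in the Python sketch below. The cell means are hypothetical values chosen for illustration and are not the figures of the studies referred to in this chapter.

```python
# Hypothetical cell means for a 2 x 2 design (illustrative values only).
# Keys are (level of control variable, treatment of experimental variable).
cell_means = {
    ("Level I", "A"): 20.0, ("Level I", "B"): 30.0,
    ("Level II", "A"): 40.0, ("Level II", "B"): 30.0,
}
levels = ["Level I", "Level II"]
treatments = ["A", "B"]

# Column means: main effects for treatments, without regard to level.
treatment_means = {
    t: sum(cell_means[(lv, t)] for lv in levels) / len(levels) for t in treatments
}
# Row means: main effects for levels, without regard to treatment.
level_means = {
    lv: sum(cell_means[(lv, t)] for t in treatments) / len(treatments)
    for lv in levels
}
# Interaction contrast: does the A-minus-B difference change across levels?
diff_level_1 = cell_means[("Level I", "A")] - cell_means[("Level I", "B")]
diff_level_2 = cell_means[("Level II", "A")] - cell_means[("Level II", "B")]
interaction = diff_level_1 - diff_level_2

print(treatment_means, level_means, interaction)
```

In this invented case the two treatment means are equal, yet the interaction contrast is far from zero: treatment A does worse than B at Level I and better at Level II, so treatment and level are not independent of each other.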
...
The data obtained in case of two (2 × 2) simple factorial
studies may be as given in Fig
...
9
...
5
23
...
4
Level II (High)
35
...
2
33
...
6
26
...
4
20
...
5
Level II (High)
30
...
4
35
...
5
30
...
3
...
Graphically, these can be represented as shown in Fig
...
10
...
3
...
The graph relating to Study II shows that there is no interaction effect which means that treatment
and level in this study are relatively independent of each other
...
e
...
For example, a college teacher compared the effect of the class-size as well as the introduction of the new instruction technique on the learning of research methodology
...
His design in the graphic
form would be as follows:
                                      Experimental Variable I (Class Size)
                                      Small           Usual
Experimental Variable II     New
(Instruction technique)      Usual
Fig
...
11
But if the teacher uses a design for comparing males and females and the senior and junior
students in the college as they relate to the knowledge of research methodology, in that case we will
have a 2 × 2 simple factorial design wherein both the variables are control variables as no manipulation
is involved in respect of both the variables
...
The 4 × 3 simple factorial design will usually include four treatments of the experimental variable
and three levels of the control variable
...
3
...
, A, B, C, and D of the
experimental variable and three levels viz
...
This shows that a 2 × 2 simple factorial design can be generalised to any
number of treatments and levels
...
In
such a design the means for the columns provide the researcher with an estimate of the main effects
for treatments and the means for rows provide an estimate of the main effects for the levels
...
(ii) Complex factorial designs: Experiments with more than two factors at a time involve
the use of complex factorial designs
...
In case of three factors with
one experimental variable having two treatments and two control variables, each one of
which having two levels, the design used will be termed 2 × 2 × 2 complex factorial design
which will contain a total of eight cells as shown below in Fig
...
13
...
3
...
3
...
[Figure: cube diagram of the 2 × 2 × 2 design. Axes: Experimental Variable (Treatment A, Treatment B); Control Variable I (Level I, Level II); Control Variable 2 (Level I, Level II)]
Fig
...
14
The dotted line cell in the diagram corresponds to Cell 1 of the above stated 2 × 2 × 2 design and
is for Treatment A, level I of the control variable 1, and level I of the control variable 2
...
e
...
The researcher can also determine the interactions between each possible pair of
variables (such interactions are called ‘First Order interactions’) and the interaction between variables
taken in triplets (such interactions are called ‘Second Order interactions’)
...
e
...
To determine the main effects for the experimental variable, the researcher must necessarily
compare the combined mean of data in cells 1, 2, 3 and 4 for Treatment A with the combined mean
of data in cells 5, 6, 7 and 8 for Treatment B
...
Similarly, the main effect for control
variable 1, independent of experimental variable and control variable 2, is obtained if we compare the
combined mean of data in cells 1, 3, 5 and 7 with the combined mean of data in cells 2, 4, 6 and 8 of
our 2 × 2 × 2 factorial design
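The two comparisons just described can be sketched in Python as follows. The eight cell means are hypothetical, and the sketch assumes equal numbers of observations in every cell, so that a combined mean is simply the mean of the cell means.

```python
# Hypothetical cell means for cells 1-8 (illustrative values only).
cell_means = {1: 10.0, 2: 12.0, 3: 14.0, 4: 16.0,
              5: 20.0, 6: 22.0, 7: 24.0, 8: 26.0}

def combined_mean(cells):
    """Mean of the cell means for the listed cells (equal cell sizes assumed)."""
    return sum(cell_means[c] for c in cells) / len(cells)

# Main effect of the experimental variable: Treatment B minus Treatment A.
ev_effect = combined_mean([5, 6, 7, 8]) - combined_mean([1, 2, 3, 4])
# Main effect of control variable 1: cells 2, 4, 6, 8 minus cells 1, 3, 5, 7.
cv1_effect = combined_mean([2, 4, 6, 8]) - combined_mean([1, 3, 5, 7])

print(ev_effect, cv1_effect)
```

Each comparison averages over the other two variables, which is what is meant by a main effect being independent of them.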
...
To obtain the first order interaction, say, for EV × CV1 in the above stated design, the researcher
must necessarily ignore control variable 2 for which purpose he may develop 2 × 2 design from the
2 × 2 × 2 design by combining the data of the relevant cells of the latter design as shown in Fig
...
15
...
3
...
The analysis of the first
order interaction, in the manner described above, is essentially a simple factorial analysis as only two
variables are considered at a time and the remaining one is ignored
...
The analysis would be termed as a complex factorial analysis
...
Of course, the greater the number of independent variables included
in a complex factorial design, the higher the order of the interaction analysis possible
...
Factorial designs are used mainly because of two advantages
...
Using factorial designs, we can determine the main effects of two (in
simple factorial design) or more (in case of complex factorial design) factors (or variables) in one
single experiment
...
For example, they give
information about such effects which cannot be obtained by treating one single factor at a time
...
CONCLUSION
There are several research designs and the researcher must decide in advance of collection and
analysis of data as to which design would prove to be more appropriate for his research project
...
Questions
1
...
2
...
(a) Extraneous variables;
(b) Confounded relationship;
(c) Research hypothesis;
(d) Experimental and Control groups;
(e) Treatments
...
Describe some of the important research designs used in experimental hypothesis-testing research
study
...
“Research design in exploratory studies must be flexible but in descriptive studies, it must minimise bias
and maximise reliability
...
5
...
Is a single research design suitable in all research
studies? If not, why?
6
...
7
...
8
...
(Raj
...
EAFM M
...
1978)
Appendix
Developing a Research Plan*
After identifying and defining the problem and accomplishing the related tasks, the researcher must
arrange his ideas in order and write them in the form of an experimental plan or what can be
described as a ‘Research Plan’
...
(b) It provides an inventory of what must be done and which materials have to be collected as
a preliminary step
...
The research plan must contain the following items
...
The research objective should be clearly stated in a line or two which tells exactly what it is
that the researcher expects to do
...
The problem to be studied by the researcher must be explicitly stated so that one may know
what information is to be obtained for solving the problem
...
Each major concept which the researcher wants to measure should be defined in operational
terms in context of the research project
...
The plan should contain the method to be used in solving the problem
...
5
...
For instance, if interview
method is to be used, an account of the nature of the contemplated interview procedure
should be given
...
If public
records are to be consulted as sources of data, the fact should be recorded in the research
plan
...
*
Based on the matter given in the following two books:
(i) Robert M
...
Travers, An Introduction to Educational Research, p
...
(ii) C
...
415–416
...
A clear mention of the population to be studied should be made
...
e
...
The method of identifying the sample should be such that generalisation from the
sample to the original population is feasible
...
The plan must also contain the methods to be used in processing the data
...
Such methods should not be left
until the data have been collected
...
8
...
Time and cost budgets for the research
project should also be prepared and laid down in the plan itself
...
’ A complete enumeration of
all items in the ‘population’ is known as a census inquiry
...
But in
practice this may not be true
...
Moreover, there is no way of checking the element of
bias or its extent except through a resurvey or use of sample checks
...
Therefore, when the field of inquiry is large, this
method becomes difficult to adopt because of the resources involved
...
Perhaps, government is the only institution which can get
the complete enumeration carried out
...
Further, many a time it is not possible to examine
every item in the population, and sometimes it is possible to obtain sufficiently accurate results by
studying only a part of total population
...
However, it needs to be emphasised that when the universe is a small one, it is no use resorting
to a sample survey
...
e
...
The
respondents selected should be as representative of the total population as possible in order to produce
a miniature cross-section
...
’ The survey so conducted is known as
‘sample survey’
...
The researcher must prepare a sample design
for his study i
...
, he must plan how a sample should be selected and of what size such a sample would be
...
It refers to the
technique or the procedure the researcher would adopt in selecting items for the sample
...
e
...
Sample design is determined before data are collected
...
Some designs are relatively more precise and easier to apply
than others
...
STEPS IN SAMPLE DESIGN
While developing a sampling design, the researcher must pay attention to the following points:
(i) Type of universe: The first step in developing any sample design is to clearly define the
set of objects, technically called the Universe, to be studied
...
In a finite universe the number of items is certain, but in case of an infinite universe
the number of items is infinite, i
...
, we cannot have any idea about the total number of
items
...
are examples of infinite universes
...
A sampling unit may be a geographical one such as a state, district, village, etc
...
, or it may be a social unit such as family, club,
school, etc
...
The researcher will have to decide one or more of
such units that he has to select for his study
...
It
contains the names of all items of a universe (in case of finite universe only)
...
Such a list should be comprehensive, correct,
reliable and appropriate
...
(iv) Size of sample: This refers to the number of items to be selected from the universe to
constitute a sample
...
The size of sample should
neither be excessively large, nor too small
...
An optimum sample is
one which fulfills the requirements of efficiency, representativeness, reliability and flexibility
...
The size of the population variance needs to
be considered, since a larger variance usually calls for a bigger sample
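The link between variance and sample size can be illustrated with the familiar formula n = (z·σ/e)² for estimating a mean, where σ is the population standard deviation (the square root of the variance), e is the acceptable error and z is the standard normal value for the chosen confidence level. The Python sketch below assumes a large population and z = 1.96 (95 per cent confidence); the figures and the function name are hypothetical.

```python
import math

def sample_size_for_mean(sigma, margin_of_error, z=1.96):
    """n = (z * sigma / e)^2, rounded up; assumes a large population."""
    return math.ceil((z * sigma / margin_of_error) ** 2)

# Same acceptable error, but the second population is twice as variable
# in terms of standard deviation (illustrative values only).
n_small_spread = sample_size_for_mean(sigma=10, margin_of_error=2)
n_large_spread = sample_size_for_mean(sigma=20, margin_of_error=2)

print(n_small_spread, n_large_spread)
```

Doubling the standard deviation roughly quadruples the required sample size under this formula, which is why the variability of the population has to be kept in view.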
...
The parameters of
interest in a research study must be kept in view, while deciding the size of the sample
...
As such, budgetary constraint must
invariably be taken into consideration when we decide the sample size
...
For instance, we may be
interested in estimating the proportion of persons with some characteristic in the population,
or we may be interested in knowing some average or the other measure concerning the
population
...
All this has a strong impact upon the sample design we
would accept
...
This fact can even lead to the use of a non-probability sample
...
e
...
In
fact, this technique or procedure stands for the sample design itself
...
Obviously, he must select that design which, for a given sample
size and for a given cost, has a smaller sampling error
...
, the cost of
collecting the data and the cost of an incorrect inference resulting from the data
...
, systematic bias and sampling error
...
At best the causes responsible for these errors can be detected and
corrected
...
Inappropriate sampling frame: If the sampling frame is inappropriate i
...
, a biased representation
of the universe, it will result in a systematic bias
...
Defective measuring device: If the measuring device is constantly in error, it will result in
systematic bias
...
Similarly, if the physical measuring device is defective there will be systematic bias in the
data collected through such a measuring device
...
Non-respondents: If we are unable to sample all the individuals initially included in the sample,
there may arise a systematic bias
...
4
...
For instance, if workers are
aware that somebody is observing them in course of a work study on the basis of which the average
length of time to complete a task will be determined and accordingly the quota will be set for piece
work, they generally tend to work slowly in comparison to the speed with which they work if kept
unobserved
...
5
...
There is usually a downward bias in the
income data collected by the government taxation department, whereas we find an upward bias in the
income data collected by some social organisation
...
Generally in psychological surveys, people tend to give what they think is the ‘correct’ answer rather
than revealing their true feelings
...
Since they occur randomly and are equally likely to be in either direction, their nature
happens to be of compensatory type and the expected value of such errors happens to be equal to
zero
...
Sampling error can be measured for a given sample design and size
...
If we increase the sample size,
the precision can be improved
...
, a
large sized sample increases the cost of collecting data and also enhances the systematic bias
...
In practice, however, people prefer a
less precise design because it is easier to adopt the same and also because of the fact that systematic
bias can be controlled in a better way in such a design
...
CHARACTERISTICS OF A GOOD SAMPLE DESIGN
From what has been stated above, we can list down the characteristics of a good sample design as
under:
(a) Sample design must result in a truly representative sample
...
(c) Sample design must be viable in the context of funds available for the research study
...
(e) Sample should be such that the results of the sample study can be applied, in general, for
the universe with a reasonable level of confidence
...
, the representation basis and
the element selection technique
...
Probability sampling is based on the concept of random selection,
whereas non-probability sampling is ‘non-random’ sampling
...
When each sample element is drawn individually from the
population at large, then the sample so drawn is known as ‘unrestricted sample’, whereas all other
forms of sampling are covered under the term ‘restricted sampling’
...
Thus, sample designs are basically of two types viz
...
We take up these two designs separately
...
[Fig. 4.1: chart of basic sampling designs; the surviving entry reads: purposive sampling (such as quota sampling, judgement sampling)]
Non-probability sampling: Non-probability sampling is that sampling procedure which does
not afford any basis for estimating the probability that each item in the population has of being
included in the sample
...
In this type of sampling, items for the sample
are selected deliberately by the researcher; his choice concerning the items remains supreme
...
For instance, if economic
conditions of people living in a state are to be studied, a few towns and villages may be purposively
selected for intensive study on the principle that they can be representative of the entire state
...
In such a design, the personal element has a great chance of entering into the selection of the
sample
...
Thus, there is always the danger of bias
entering into this type of sampling technique
...
However, in such a sampling, there
is no assurance that every element has some specifiable chance of being included
...
As
such this sampling design is rarely adopted in large inquiries of importance
...
Quota sampling is also an example of non-probability
sampling
...
In other words, the actual selection of the
items for the sample is left to the interviewer’s discretion
...
But the samples so selected certainly do not possess the characteristic
of random samples
...
Probability sampling: Probability sampling is also known as ‘random sampling’ or ‘chance
sampling’
...
It is, so to say, a lottery method in which individual units are picked up from the whole
group not deliberately but by some mechanical process
...
The results obtained from probability or random sampling
can be assured in terms of probability, i.e., we can measure the errors of estimation or the significance
of results obtained from a random sample, and this fact brings out the superiority of random sampling
design over the deliberate sampling design
...
This is the reason why random sampling is considered
as the best technique of selecting a representative sample
...
This applies to sampling without
replacement, i.e., once an item is selected for the sample, it cannot appear in the sample again
(Sampling with replacement, in which the element selected for the sample is returned to the
population before the next element is selected, is used less frequently
...
In brief, the
implications of random sampling (or simple random sampling) are:
(a) It gives each element in the population an equal probability of getting into the sample; and
all choices are independent of one another
...
Keeping this in view we can define a simple random sample (or simply a random sample) from
a finite population as a sample which is chosen in such a way that each of the ${}^{N}C_{n}$ possible samples
has the same probability, $1/{}^{N}C_{n}$, of being selected
...
e
...
Suppose that we want to take a
sample of size n = 3 from it
...
If we choose one of these samples in such a way that each has the
probability 1/20 of being chosen, we will then call this a random sample
...
Such a procedure is obviously impractical, if not altogether impossible in complex
problems of sampling
...
Fortunately, we can take a random sample in a relatively easier way without taking the trouble of
listing all possible samples on paper-slips as explained above
...
In doing so we must make sure that in
successive drawings each of the remaining elements of the population has the same chance of being
selected
...
We can
verify this by taking the above example
...
Multiplying the probabilities of these three successive draws, the joint probability of the three elements which constitute
our sample works out to 3/6 × 2/5 × 1/4 = 1/20
...
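The equivalence of the two views of a simple random sample can be checked with a short sketch. It assumes a population of N = 6 elements, as implied by the fraction 3/6 above; the labels a to f are hypothetical placeholders. It counts the possible samples of size 3 and compares 1/(number of samples) with the product of the three successive draw probabilities.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical labels for a finite population of N = 6 elements
population = ["a", "b", "c", "d", "e", "f"]
n = 3

# Every distinct sample of size n (order ignored, no replacement)
all_samples = list(combinations(population, n))
num_samples = len(all_samples)
prob_each_sample = Fraction(1, num_samples)

# Probability of obtaining one particular set of three elements
# when drawing one element at a time without replacement
prob_sequential = Fraction(3, 6) * Fraction(2, 5) * Fraction(1, 4)

print(num_samples, prob_each_sample, prob_sequential)  # 20 1/20 1/20
```

Exact fractions are used instead of floats so that the two probabilities can be compared for equality.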
Even this relatively easy method of obtaining a random sample can be simplified in actual practice
by the use of random number tables
...
Generally, Tippett’s
random number tables are used for the purpose
...
He selected
41600 digits from the census reports and combined them into fours to give his random numbers
which may be used to obtain a random sample
...
First of all we reproduce the first thirty sets of
Tippett’s numbers:
2952 6641 3992 9792 7979 5911
3170 5624 4167 9525 1545 1396
7203 5356 1300 2693 2370 7483
3408 2769 3563 6107 6913 7691
0560 5246 1112 9025 6008 8126
Suppose we are interested in taking a sample of 10 units from a population of 5000 units, bearing
numbers from 3001 to 8000
...
If we randomly decide to read the table numbers
from left to right, starting from the first row itself, we obtain the following numbers: 6641, 3992, 7979,
5911, 3170, 5624, 4167, 7203, 5356, and 7483
...
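The reading procedure described above can be sketched in a few lines. The thirty numbers are those reproduced in the text; the function name select_in_range is a hypothetical helper, and skipping repeated numbers is an assumption made because sampling is without replacement.

```python
# The first thirty four-digit sets from Tippett's table, read left to right
tippett = [
    "2952", "6641", "3992", "9792", "7979", "5911",
    "3170", "5624", "4167", "9525", "1545", "1396",
    "7203", "5356", "1300", "2693", "2370", "7483",
    "3408", "2769", "3563", "6107", "6913", "7691",
    "0560", "5246", "1112", "9025", "6008", "8126",
]

def select_in_range(numbers, low, high, k):
    """Read numbers in order and keep the first k distinct values in [low, high]."""
    chosen = []
    for text in numbers:
        value = int(text)
        if low <= value <= high and value not in chosen:
            chosen.append(value)
        if len(chosen) == k:
            break
    return chosen

# Population units are numbered 3001 to 8000; a sample of 10 units is wanted
sample = select_in_range(tippett, 3001, 8000, 10)
print(sample)
# [6641, 3992, 7979, 5911, 3170, 5624, 4167, 7203, 5356, 7483]
```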
One may note that it is easy to draw random samples from finite populations with the aid of
random number tables only when lists are available and items are readily numbered
...
For example, if we
want to estimate the mean height of trees in a forest, it would not be possible to number the trees, and
choose random numbers to select a random sample
...
RANDOM SAMPLE FROM AN INFINITE UNIVERSE
So far we have talked about random sampling, keeping in view only the finite populations
...
However, a few examples will show the basic
characteristic of such a sample
...
If
the probability of getting a particular number, say 1, is the same for each throw and the 20 throws are
all independent, then we say that the sample is random
...
In brief, one can say
that the selection of each item in a random sample from an infinite population is controlled by the
same probabilities and that successive selections are independent of one another
...
Such designs may as well be called ‘mixed sampling designs’ for many of
such designs may represent a combination of probability and non-probability sampling procedures in
selecting a sample
...
Sampling of this type is known as systematic sampling
...
For instance, if a 4 per cent sample is desired, the first item would be selected randomly from
the first twenty-five and thereafter every 25th item would automatically be included in the sample
...
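A minimal sketch of the 4 per cent example, assuming a hypothetical list of 1,000 numbered items: a random start is chosen among the first twenty-five items and every 25th item after it is taken. The function name and the fixed seed are illustrative choices, not part of the original text.

```python
import random

def systematic_sample(items, interval, rng):
    """Pick a random start within the first `interval` items, then every interval-th item."""
    start = rng.randrange(interval)  # index 0 .. interval-1
    return items[start::interval]

# Hypothetical population list of 1,000 numbered items
population = list(range(1, 1001))

# A 4 per cent sample corresponds to an interval of 25
sample = systematic_sample(population, 25, random.Random(0))
print(len(sample), sample[:3])
```

Only the starting point is random; once it is fixed, the rest of the sample is determined, which is why hidden periodicity in the list is a danger.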
Although a systematic sample is not a random sample in the
strict sense of the term, it is often considered reasonable to treat a systematic sample as if it were
a random sample
...
It can be taken as an improvement over a simple
random sample inasmuch as the systematic sample is spread more evenly over the entire population
...
But there are certain dangers too in using this type of sampling
...
For instance, every 25th item produced by a certain production process is defective
...
If all
elements of the universe are ordered in a manner representative of the total population, i.e., the
population list is in random order, systematic sampling is considered equivalent to random sampling
...
In practice,
systematic sampling is used when lists of population are available and they are of considerable
length
...
Under stratified sampling the population is divided into several sub-populations that are
individually more homogeneous than the total population (the different sub-populations are called
‘strata’) and then we select items from each stratum to constitute a sample
...
In brief, stratified sampling results in more reliable and detailed information
...
This means that the various strata should be formed in
such a way as to ensure that elements are most homogeneous within each stratum and most
heterogeneous between the different strata
...
One should always remember
that careful consideration of the relationship between the characteristics of the population and the
characteristics to be estimated is normally used to define the strata
...
We can do so by
taking small samples of equal size from each of the proposed strata and then examining the variances
within and among the possible stratifications; on that basis we can decide an appropriate stratification plan for our
inquiry
...
Systematic sampling can
be used if it is considered more appropriate in certain situations
...
That
is, if $P_i$ represents the proportion of population included in stratum $i$, and $n$ represents the total sample
size, the number of elements selected from stratum $i$ is $n \cdot P_i$
...
To illustrate it, let us suppose that we
want a sample of size n = 30 to be drawn from a population of size N = 8000 which is divided into
three strata of size N1 = 4000, N2 = 2400 and N3 = 1600
...
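$n_1 = n \cdot P_1 = 30\,(4000/8000) = 15$
Similarly, for the stratum with N2 = 2400, we have
$n_2 = n \cdot P_2 = 30\,(2400/8000) = 9$
and for the stratum with N3 = 1600, we have
$n_3 = n \cdot P_3 = 30\,(1600/8000) = 6$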
...
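The proportional allocation arithmetic can be reproduced with a small sketch. The function name is a hypothetical helper; the figures are those of the example above (n = 30, strata of 4000, 2400 and 1600).

```python
def proportional_allocation(n, stratum_sizes):
    """Allocate a total sample of n in proportion to each stratum's share of the population."""
    total = sum(stratum_sizes)
    return [n * size / total for size in stratum_sizes]

allocation = proportional_allocation(30, [4000, 2400, 1600])
print(allocation)  # [15.0, 9.0, 6.0]
```

In general the results need not be whole numbers, so some rounding rule would have to be chosen in practice.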
, 4000 : 2400 : 1600
...
But in case the purpose happens to be to
compare the differences among the strata, then equal sample selection from each stratum would be
more efficient even if the strata differ in sizes
...
$$\frac{n_1}{N_1 \sigma_1} = \frac{n_2}{N_2 \sigma_2} = \cdots = \frac{n_k}{N_k \sigma_k}$$
where $\sigma_1$, $\sigma_2$,
...
This is called ‘optimum
allocation’ in the context of disproportionate sampling
...
$$n_i = \frac{n \cdot N_i \sigma_i}{N_1 \sigma_1 + N_2 \sigma_2 + \cdots + N_k \sigma_k} \quad \text{for } i = 1, 2, \ldots, k$$
...
Illustration 1
A population is divided into three strata so that N1 = 5000, N2 = 2000 and N3 = 3000
...
How should a sample of size n = 84 be allocated to the three strata, if we want optimum allocation
using disproportionate sampling design?
Solution: Using the disproportionate sampling design for optimum allocation, the sample sizes for
different strata will be determined as under:
Sample size for the stratum with N1 = 5000:
$$n_1 = \frac{84\,(5000)(15)}{(5000)(15) + (2000)(18) + (3000)(5)} = \frac{6300000}{126000} = 50$$
Sample size for the stratum with N2 = 2000:
$$n_2 = \frac{84\,(2000)(18)}{(5000)(15) + (2000)(18) + (3000)(5)} = \frac{3024000}{126000} = 24$$
Sample size for the stratum with N3 = 3000:
$$n_3 = \frac{84\,(3000)(5)}{(5000)(15) + (2000)(18) + (3000)(5)} = \frac{1260000}{126000} = 10$$
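The solution to Illustration 1 can be checked with a short sketch. The function name is a hypothetical helper; the stratum sizes and the standard deviations 15, 18 and 5 are the figures used in the solution above.

```python
def optimum_allocation(n, stratum_sizes, std_devs):
    """Allocate n in proportion to N_i * sigma_i for each stratum."""
    weights = [size * sd for size, sd in zip(stratum_sizes, std_devs)]
    total = sum(weights)
    return [n * weight / total for weight in weights]

allocation = optimum_allocation(84, [5000, 2000, 3000], [15, 18, 5])
print(allocation)  # [50.0, 24.0, 10.0]
```

Note that the second stratum, although the smallest, receives more than the third because its variability is much higher.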
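In addition to differences in stratum size and differences in stratum variability, we may have
differences in stratum sampling cost; we can then have a cost-optimal disproportionate sampling design
by requiring
$$\frac{n_1 \sqrt{C_1}}{N_1 \sigma_1} = \frac{n_2 \sqrt{C_2}}{N_2 \sigma_2} = \cdots = \frac{n_k \sqrt{C_k}}{N_k \sigma_k}$$
...
The allocation in such a situation results in
the following formula for determining the sample sizes for different strata:
$$n_i = \frac{n \cdot N_i \sigma_i / \sqrt{C_i}}{N_1 \sigma_1 / \sqrt{C_1} + N_2 \sigma_2 / \sqrt{C_2} + \cdots + N_k \sigma_k / \sqrt{C_k}}, \quad i = 1, 2, \ldots, k$$
where $C_i$ denotes the cost of sampling in stratum $i$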
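A hedged sketch of cost-based allocation follows. It assumes the standard form of the rule, in which each stratum is weighted by its size times its standard deviation divided by the square root of its unit sampling cost. The function name and the cost figures 4, 9 and 16 are hypothetical; the sizes and standard deviations are borrowed from Illustration 1 purely for comparison.

```python
from math import sqrt

def cost_optimal_allocation(n, stratum_sizes, std_devs, unit_costs):
    """Allocate n in proportion to N_i * sigma_i / sqrt(C_i) for each stratum."""
    weights = [
        size * sd / sqrt(cost)
        for size, sd, cost in zip(stratum_sizes, std_devs, unit_costs)
    ]
    total = sum(weights)
    return [n * weight / total for weight in weights]

sizes = [5000, 2000, 3000]
sds = [15, 18, 5]

# With equal costs the rule reduces to the optimum allocation of Illustration 1
equal_cost = cost_optimal_allocation(84, sizes, sds, [1, 1, 1])

# Hypothetical unit costs: the cheapest stratum gains sample at the expense of the costlier ones
unequal_cost = cost_optimal_allocation(84, sizes, sds, [4, 9, 16])
print(equal_cost, unequal_cost)
```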
It is not necessary that stratification be done keeping in view a single characteristic
...
For example, in a system-wide survey designed
to determine the attitude of students toward a new teaching plan, a state college system with 20
colleges might stratify the students with respect to class, sex and college
...
From what has been stated above in respect of stratified sampling, we can say that the sample so
constituted is the result of successive application of purposive (involved in stratification of items) and
random sampling methods
...
The procedure wherein we
first have stratification and then simple random sampling is known as stratified random sampling
...
Thus in cluster sampling the total population is divided into a number of relatively small subdivisions
which are themselves clusters of still smaller units and then some of these clusters are randomly
selected for inclusion in the overall sample
...
Also assume that there are 20000 machine parts in the
inventory at a given point of time, stored in 400 cases of 50 each
...
Cluster sampling, no doubt, reduces cost by concentrating surveys in selected clusters
...
There is also not as much information in ‘n’
observations within a cluster as there happens to be in ‘n’ randomly drawn observations
...
(iv) Area sampling: If clusters happen to be some geographic subdivisions, in that case cluster
sampling is better known as area sampling
...
The plus and minus points of cluster sampling are also applicable to area sampling
...
Suppose we want to investigate the working efficiency of nationalised banks in India and
we want to take a sample of few banks for this purpose
...
Then we may select certain districts and interview all banks
in the chosen districts
...
If instead of taking a census of all banks within the selected districts, we select certain towns and
interview all banks in the chosen towns
...
If
instead of taking a census of all banks within the selected towns, we randomly sample banks from
each selected town, then it is a case of using a four-stage sampling plan
...
Ordinarily multi-stage sampling is applied in big inquiries extending to a considerably large
geographical area, say, the entire country
...
,
(a) It is easier to administer than most single stage designs mainly because of the fact that sampling
frame under multi-stage sampling is developed in partial units
...
(vi) Sampling with probability proportional to size: In case the cluster sampling units do not
have the same number or approximately the same number of elements, it is considered appropriate to
use a random selection process where the probability of each cluster being included in the sample is
proportional to the size of the cluster
...
Then we must sample systematically the
appropriate number of elements from the cumulative totals
...
The results of this type of sampling
are equivalent to those of a simple random sample and the method is less cumbersome and is also
relatively less expensive
...
Illustration 2
The following are the number of departmental stores in 15 cities: 35, 17, 10, 32, 70, 28, 26, 19, 26,
66, 37, 44, 33, 29 and 28
...
Solution: Let us put the information as under (Table 4
...
As we have to use the starting point of
10*, we successively add increments of 50 till 10 numbers have been selected
...
1) against the concerning cumulative totals
...
This sample of 10 stores is the sample with probability proportional to size
...
Table 4.1
City number   No. of departmental stores   Cumulative total   Sample
 1            35                            35                10
 2            17                            52
 3            10                            62                60
 4            32                            94
 5            70                           164                110, 160
 6            28                           192
 7            26                           218                210
 8            19                           237
 9            26                           263                260
10            66                           329                310
11            37                           366                360
12            44                           410                410
13            33                           443
14            29                           472                460
15            28                           500
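The selection in Illustration 2 can be traced with a short sketch. The store counts, the starting point of 10 and the increment of 50 come from the illustration; the use of bisect_left to locate each selected number within the cumulative totals is an implementation choice made here, not something the text prescribes.

```python
from bisect import bisect_left
from itertools import accumulate

# Number of departmental stores in each of the 15 cities
stores = [35, 17, 10, 32, 70, 28, 26, 19, 26, 66, 37, 44, 33, 29, 28]
cumulative = list(accumulate(stores))  # ends at 500

start, step, sample_size = 10, 50, 10
points = [start + step * i for i in range(sample_size)]  # 10, 60, ..., 460

# A point p falls in the first city whose cumulative total is >= p
selected_cities = [bisect_left(cumulative, p) + 1 for p in points]
print(selected_cities)  # [1, 3, 5, 5, 7, 9, 10, 11, 12, 14]
```

City 5, the largest, is hit twice (by 110 and 160), which is how larger clusters obtain a proportionally greater chance of selection.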
(vii) Sequential sampling: This is a somewhat complex sample design
...
This is usually
adopted in case of acceptance sampling plan in context of statistical quality control
...
But when the number of samples is
more than two but it is neither certain nor decided in advance, this type of system is often referred to
as sequential sampling
...
CONCLUSION
From a brief description of the various sample designs presented above, we can say that normally
one should resort to simple random sampling because under it bias is generally eliminated and the
sampling error can be estimated
...
There are
situations in real life under which sample designs other than simple random samples may be considered
better (say easier to obtain, cheaper or more informative) and as such the same may be used
...
At times, several methods of sampling may well be used in the same
study
...
What do you mean by ‘Sample Design’? What points should be taken into consideration by a researcher
in developing a sample design for this research project
...
How would you differentiate between simple random sampling and complex random sampling designs?
Explain clearly giving examples
...
Why is probability sampling generally preferred in comparison to non-probability sampling? Explain the
procedure of selecting a simple random sample
...
Under what circumstances is stratified random sampling design considered appropriate? How would you
select such sample? Explain by means of an example
...
Distinguish between:
(a) Restricted and unrestricted sampling;
(b) Convenience and purposive sampling;
(c) Systematic and stratified sampling;
(d) Cluster and area sampling
...
Under what circumstances would you recommend:
(a) A probability sample?
(b) A non-probability sample?
(c) A stratified sample?
(d) A cluster sample?
7
...
8
...
What do you mean by such a
systematic bias? Describe the important causes responsible for such a bias
...
(a) The following are the number of departmental stores in 10 cities: 35, 27, 24, 32, 42, 30, 34, 40, 29 and 38
...
(b) What sampling design might be used to estimate the weight of a group of men and women?
10
...
Respective standard deviations are: σ 1 = 1
...
0 , σ 3 = 4
...
8 , σ 5 = 6
...
How should a sample of size n = 226 be allocated to
five strata if we adopt proportionate sampling design; if we adopt disproportionate sampling design
considering (i) only the differences in stratum variability (ii) differences in stratum variability as well as
the differences in stratum sampling costs
...
We also measure when we judge how well we like a song,
a painting or the personalities of our friends
...
Measurement is a relatively complex and demanding task, especially so when it concerns
qualitative or abstract phenomena
...
It is easy to assign numbers in respect of properties of some objects, but it is relatively difficult in
respect of others
...
In other words, properties like weight, height, etc
...
We can expect high accuracy
in measuring the length of pipe with a yard stick, but if the concept is abstract and the measurement
tools are not standardized, we are less confident about the accuracy of the results of measurement
...
In measuring, we devise some form of
scale in the range (in terms of set theory, range may refer to some set) and then transform or map the
properties of objects from the domain (in terms of set theory, domain may refer to some other set)
onto this scale
...
In terms of set theory, this process is one of mapping the observed physical
properties of those coming to the show (the domain) on to a sex classification (the range)
...
Similarly, we can record a person’s marital status as 1, 2, 3 or 4, depending on whether
the person is single, married, widowed or divorced
...
In this artificial or nominal way,
categorical data (qualitative or descriptive) can be made into numerical data and if we thus code the
various categories, we refer to the numbers we record as nominal data
...
For instance if we record marital status as 1, 2, 3, or 4 as stated above, we cannot write
4 > 2 or 3 < 4 and we cannot write 3 – 1 = 4 – 2, 1 + 3 = 4 or 4 ÷ 2 = 2
...
For instance, if one mineral can scratch another, it receives a higher hardness number
and on Mohs’ scale the numbers from 1 to 10 are assigned respectively to talc, gypsum, calcite,
fluorite, apatite, feldspar, quartz, topaz, sapphire and diamond
...
It would also be meaningless to say
that topaz is twice as hard as fluorite simply because their respective hardness numbers on Mohs’
scale are 8 and 4
...
e
...
When in addition to setting up inequalities we can also form differences, we refer to the data as
interval data
...
In this case, we can write 110° > 70° or 95° < 135° which
simply means that 110° is warmer than 70° and that 95° is cooler than 135°
...
On the other hand, it would not mean much if we said that 126° is twice as hot as 63°, even
though 126° ÷ 63° = 2
...
This difficulty arises from the
fact that Fahrenheit and Centigrade scales both have artificial origins (zeros), i.e., the number 0 of
neither scale is indicative of the absence of whatever quantity we are trying to measure
...
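A small sketch makes the point about artificial zeros concrete. It converts the two Fahrenheit readings mentioned above (126° and 63°) to Centigrade with the standard conversion formula and compares the ratios; the function name is a hypothetical helper.

```python
def fahrenheit_to_celsius(f):
    """Standard conversion from degrees Fahrenheit to degrees Centigrade."""
    return (f - 32) * 5 / 9

hot, cool = 126, 63

ratio_fahrenheit = hot / cool
ratio_celsius = fahrenheit_to_celsius(hot) / fahrenheit_to_celsius(cool)

print(ratio_fahrenheit)         # 2.0
print(round(ratio_celsius, 2))  # 3.03
```

The same two temperatures give a ratio of 2 on one scale and about 3 on the other, so the ratio carries no physical meaning; only differences between readings compare consistently across the two scales.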
e
...
In this sense, ratio data includes all the usual measurement (or determinations) of length,
height, money amounts, weight, volume, area, pressures etc
...
A researcher has to
be quite alert about this aspect while measuring properties of objects or of abstract concepts
...
g
...
But
when data is measured in units which are not interchangeable, e.g., product preferences (by ordinal scales), the data is said
to be non-parametric and is susceptible only to a limited extent to mathematical and statistical treatment
...
The most widely used classification of measurement scales
is: (a) nominal scale; (b) ordinal scale; (c) interval scale; and (d) ratio scale
...
The usual example of this is the assignment of numbers to basketball players in
order to identify them
...
Nominal scales provide convenient ways of keeping
track of people, objects and events
...
For example,
one cannot usefully average the numbers on the back of a group of football players and come up with
a meaningful value
...
The counting of members in each group is the only possible arithmetic
operation when a nominal scale is employed
...
There is no generally used measure of dispersion for nominal scales
...
Nominal scale is the least powerful level of measurement
...
A nominal scale simply describes differences between
things by assigning them to categories
...
The scale wastes any
information that we may have about varying degrees of attitude, skills, understandings, etc
...
(b) Ordinal scale: The lowest level of the ordered scale that is commonly used is the ordinal scale
...
Rank orders represent ordinal scales and are frequently used in research
relating to qualitative phenomena
...
One has to be very careful in making statements about scores based on ordinal scales
...
The statement would make no sense at all
...
Ordinal measures have no
absolute values, and the real differences between adjacent ranks may not be equal
...
Thus, the use of an ordinal scale implies a statement of ‘greater than’ or ‘less than’ (an equality
statement is also acceptable) without our being able to state how much greater or less
...
Since the numbers of this scale have only a rank meaning, the appropriate measure of central tendency
is the median
...
Correlations are
restricted to various rank order methods
...
(c) Interval scale: In the case of interval scale, the intervals are adjusted in terms of some rule that
has been established as a basis for making the units equal
...
Interval scales can have an arbitrary zero, but it
is not possible to determine for them what may be called an absolute zero or the unique origin
...
The Fahrenheit scale is an example of an
interval scale and shows similarities in what one can and cannot do with it
...
The ratio of the two temperatures, 30° and 60°,
means nothing because zero is an arbitrary point
...
As such more powerful statistical measures can be
used with interval scales
...
Product moment correlation techniques are
appropriate and the generally used tests for statistical significance are the ‘t’ test and ‘F’ test
...
The term ‘absolute
zero’ is not as precise as it was once believed to be
...
For example, the zero point on a centimeter
scale indicates the complete absence of length or height
...
The number
of minor traffic-rule violations and the number of incorrect letters in a page of type script represent
scores on ratio scales
...
With ratio scales involved one can
make statements like “Jyoti’s typing performance was twice as good as that of Reetu”
...
Ratio scale represents the actual amounts of variables
...
are examples
...
Multiplication and division can be used with this scale but not with other scales
mentioned above
...
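The link between scale type and permissible summary can be sketched as follows. The marital status codes 1 to 4 follow the coding used earlier in the chapter; the data values themselves are hypothetical. For nominal codes only counting is meaningful, so the sketch reports category counts and the most frequent category (the mode); for ordinal ranks it reports the median.

```python
from collections import Counter
from statistics import median

# Nominal data: marital status coded 1 = single, 2 = married, 3 = widowed, 4 = divorced
# (hypothetical responses; the codes are labels, not quantities)
marital_status = [1, 2, 2, 4, 3, 2, 1, 2]
counts = Counter(marital_status)
modal_code = counts.most_common(1)[0][0]

# Ordinal data: ranks given to a product by five respondents (hypothetical)
ranks = [3, 1, 4, 2, 5]
middle_rank = median(ranks)

print(counts[2], modal_code, middle_rank)  # 4 2 3
```

Averaging the marital status codes would run without error but would produce a number with no interpretation, which is exactly the limitation of a nominal scale.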
Thus, proceeding from the nominal scale (the least precise type of scale) to ratio scale (the most
precise), relevant information is obtained increasingly
...
Researchers in physical
sciences have the advantage of describing variables in ratio scale form but the behavioural sciences
are generally limited to describing variables in interval scale form, a less precise type of measurement
...
This objective, however,
is often not met in its entirety
...
The following are the possible sources of error in measurement
...
All this
reluctance is likely to result in an interview of ‘guesses
...
may limit the ability of the respondent to respond accurately and fully
...
Any condition
which places a strain on interview can have serious effects on the interviewer-respondent rapport
...
If the respondent feels that anonymity is not assured, he may be reluctant to express certain
feelings
...
His
behaviour, style and looks may encourage or discourage certain replies from respondents
...
Errors may also creep in because of incorrect coding,
faulty tabulation and/or statistical calculations, particularly in the data-analysis stage
...
The use of complex
words, beyond the comprehension of the respondent, ambiguous meanings, poor printing, inadequate
space for replies, response choice omissions, etc
...
Another type of instrument deficiency is the poor
sampling of the universe of items of concern
...
He must, to the extent possible, try to eliminate, neutralize or otherwise deal
with all the possible sources of error so that the final results may not be contaminated
...
In fact, these are the
three major considerations one should use in evaluating a measurement tool
...
Reliability has to do with the
accuracy and precision of a measurement procedure
...
”1 We briefly take up the relevant details
concerning these tests of sound measurement
...
Test of Validity*
Validity is the most critical criterion and indicates the degree to which an instrument measures what
it is supposed to measure
...
In other words, validity is the
extent to which differences found with a measuring instrument reflect true differences among those
being tested
...
What is relevant evidence often depends upon the nature of the
research problem and the judgement of the researcher
...
1 Robert L
...
, p
...
* Two forms of validity are usually mentioned in research literature viz
...
External validity of research findings is their generalizability to populations, settings, treatment variables and measurement
variables
...
The internal validity of a research design is its
ability to measure what it aims to measure
...
(i) Content validity is the extent to which a measuring instrument provides adequate coverage of
the topic under study
...
Its determination is primarily judgemental and intuitive
...
(ii) Criterion-related validity relates to our ability to predict some outcome or estimate the existence
of some current condition
...
The concerned criterion must possess the following qualities:
Relevance: (A criterion is relevant if it is defined in terms we judge to be the proper measure
...
)
Reliability: (A reliable criterion is stable or reproducible
...
)
In fact, criterion-related validity is a broad term that actually refers to (i) Predictive validity
and (ii) Concurrent validity
...
Criterion-related validity is expressed as the coefficient of correlation between
test scores and some measure of future performance or between test scores and scores on another
measure of known validity
...
A measure is said to possess construct
validity to the degree that it conforms to predicted correlations with other theoretical propositions
...
For determining construct validity, we associate a set of other propositions
with the results received from using our measurement instrument
...
If the above stated criteria and tests are met with, we may state that our measuring instrument
is valid and will result in correct measurement; otherwise we shall have to look for more information
and/or resort to exercise of judgement
...
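Since criterion-related validity is expressed as a coefficient of correlation between test scores and a criterion measure, a minimal sketch of that computation may help. The Pearson product moment formula is written out by hand; both data series are hypothetical and serve only as an illustration.

```python
from math import sqrt

def pearson_r(x, y):
    """Product moment correlation coefficient between two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: selection test scores and a later performance rating
test_scores = [52, 60, 65, 71, 78, 84]
later_performance = [3.1, 3.4, 3.3, 3.9, 4.2, 4.5]

validity_coefficient = pearson_r(test_scores, later_performance)
print(round(validity_coefficient, 3))
```

A coefficient near 1 would suggest that the test predicts the criterion well; how high is high enough remains a matter of judgement for the researcher.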
Test of Reliability
The test of reliability is another important test of sound measurement
...
A reliable measuring instrument does contribute to validity, but
a reliable instrument need not be a valid instrument
...
, is a reliable scale, but it does not give a valid measure of weight
...
e
...
Accordingly reliability is not as valuable as
validity, but it is easier to assess reliability in comparison to validity
...
Two aspects of reliability viz
...
The stability
aspect is concerned with securing consistent results with repeated measurements of the same person
and with the same instrument
...
The equivalence aspect considers how much error may get introduced
by different investigators or different samples of the items being studied
...
Reliability can be improved in the following two ways:
(i) By standardising the conditions under which the measurement takes place, i.e., we must
ensure that external sources of variation such as boredom, fatigue, etc
...
That will improve the stability aspect
...
This will improve the equivalence aspect
...
Test of Practicality
The practicality characteristic of a measuring instrument can be judged in terms of economy,
convenience and interpretability
...
e
...
Economy consideration
suggests that some trade-off is needed between the ideal research project and that which the budget
can afford
...
Although more items give greater reliability as stated earlier, in the interest of limiting
the interview or observation time, we have to take only a few items for our study purpose
...
Convenience
test suggests that the measuring instrument should be easy to administer
...
For instance, a questionnaire,
with clear instructions (illustrated by examples), is certainly more effective and easier to complete
than one which lacks these features
...
The measuring instrument, in
order to be interpretable, must be supplemented by (a) detailed instructions for administering the test;
(b) scoring keys; (c) evidence about the reliability and (d) guides for using the test and for interpreting
results
...
The first and foremost step is that of concept development which means that the researcher
should arrive at an understanding of the major concepts pertaining to his study
...
The second step requires the researcher to specify the dimensions of the concepts that he
developed in the first stage
...
e
...
For instance, one may think of several dimensions such as product
reputation, customer treatment, corporate leadership, concern for individuals, sense of social
responsibility and so forth when one is thinking about the image of a certain company
...
Indicators are specific questions, scales, or other devices by
which respondent’s knowledge, opinion, expectation, etc
...
As there is seldom a perfect
measure of a concept, the researcher should consider several alternatives for the purpose
...
The last step is that of combining the various indicators into an index, i.e., formation of an
index
...
One simple way for getting an overall index is to
provide scale values to the responses and then sum up the corresponding scores
...
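The simple summation approach to index formation can be sketched as below. The five-point scale values, the indicator names and the responses are all hypothetical; the only idea taken from the text is that scale values are assigned to responses and the scores are then added.

```python
# Hypothetical scale values assigned to each response category
scale_values = {
    "strongly disagree": 1,
    "disagree": 2,
    "neutral": 3,
    "agree": 4,
    "strongly agree": 5,
}

# Hypothetical responses of one respondent to four indicators of company image
responses = {
    "product reputation": "agree",
    "customer treatment": "strongly agree",
    "corporate leadership": "neutral",
    "social responsibility": "agree",
}

# Overall index: sum of the scale values of the individual responses
index = sum(scale_values[answer] for answer in responses.values())
print(index)  # 16
```

An unweighted sum treats every indicator as equally important; a weighted sum would be an alternative if some dimensions matter more than others.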
”2 This way we
must obtain an overall index for the various concepts concerning the research study
...
Alternatively, we can say that while measuring attitudes
and opinions, we face the problem of their valid measurement
...
As such
we should study some procedures which may enable us to measure abstract concepts more accurately
...
Meaning of Scaling
Scaling describes the procedures of assigning numbers to various degrees of opinion, attitude and
other concepts
...
, (i) making a judgement about some characteristic
of an individual and then placing him directly on a scale that has been defined in terms of that
characteristic and (ii) constructing questionnaires in such a way that the score of individual’s responses
assigns him a place on a scale
...
g
...
) and the lowest
point along with several intermediate points between these two extreme points
...
112
...
Numbers for
measuring the distinctions of degree in the attitudes/opinions are, thus, assigned to individuals
corresponding to their scale-positions
...
Hence the term ‘scaling’ is applied to the procedures for attempting to determine
quantitative measures of subjective abstract concepts
...
”3
Scale Classification Bases
The number assigning procedures or the scaling procedures may be broadly classified on one or
more of the following bases: (a) subject orientation; (b) response form; (c) degree of subjectivity;
(d) scale properties; (e) number of dimensions and (f) scale construction techniques
...
(a) Subject orientation: Under it a scale may be designed to measure characteristics of the respondent
who completes it or to judge the stimulus object which is presented to the respondent
...
In the latter approach, we
ask the respondent to judge some specific object in terms of one or more dimensions and we presume
that the between-respondent variation will be small as compared to the variation among the different
stimuli presented to respondents for judging
...
Categorical scales are also known as rating scales
...
Under comparative scales, which are also
known as ranking scales, the respondent is asked to compare two or more objects
...
The essence of ranking is, in fact, a relative comparison of a certain property of two or
more objects
...
In the former case, the
respondent is asked to choose which person he favours or which solution he would like to see
employed, whereas in the latter case he is simply asked to judge which person is more effective in
some aspect or which solution will take fewer resources without reflecting any personal preference
...
Nominal scales merely classify without indicating order, distance or unique
origin
...
Interval scales have both order and distance values, but no unique origin
...
(e) Number of dimensions: In respect of this basis, scales can be classified as ‘unidimensional’
and ‘multidimensional’ scales
...
3
Bernard S
...
, p
...
78
Research Methodology
(f) Scale construction techniques: Following are the five main techniques by which scales can
be developed
...
This is
the most widely used approach
...
(ii) Consensus approach: Here a panel of judges evaluates the items chosen for inclusion in
the instrument in terms of whether they are relevant to the topic area and unambiguous in
implication
...
After administering the test, the total scores are
calculated for every one
...
(iv) Cumulative scales are chosen on the basis of their conforming to some ranking of items
with ascending and descending discriminating power
...
(v) Factor scales may be constructed on the basis of intercorrelations of items which indicate
that a common factor accounts for the relationship between items
...
Important Scaling Techniques
We now take up some of the important scaling techniques often used in the context of research,
especially in the context of social or business research
...
When we use rating scales (or categorical scales), we judge an object
in absolute terms against some specified criteria i
...
, we judge properties of objects without reference
to other similar objects
...
There is no specific
rule whether to use a two-point scale, three-point scale or scale with still more points
...
Rating scale may be either a graphic rating scale or an itemized rating scale
...
Under it the
various points are usually put along the line to form a continuum and the rater indicates his
rating by simply making a mark (such as ✓) at the appropriate point on a line that runs from
one extreme to the other
...
The following is an example
of a five-point graphic rating scale when we wish to ascertain people’s liking or disliking of a
product:
Measurement and Scaling Techniques
79
How do you like the product?
(Please check)

|---------------|---------------|---------------|---------------|
Like very       Like            Neutral         Dislike         Dislike very
much            somewhat                        somewhat        much
Fig
...
1
This type of scale has several limitations
...
The meanings of
the terms like “very much” and “somewhat” may depend upon the respondent’s frame of
reference so much so that the statement might be challenged in terms of its equivalency
...
g
...
(ii) The itemized rating scale (also known as numerical scale) presents a series of statements
from which a respondent selects one as best reflecting his evaluation
...
An example of itemized
scale can be given to illustrate it
...
He is often at odds with one or more of his fellow workers
...
He infrequently becomes involved in friction with others
...
The chief merit of this type of scale is that it provides more information and meaning to the rater,
and thereby increases reliability
...
Rating scales have certain good points
...
They require less time, are interesting to use and have a wide range of
applications
...
But their
value for measurement purposes depends upon the assumption that the respondents can and do
make good judgements
...
Three
types of errors are common viz
...
The error of leniency occurs when certain respondents are either easy raters or hard
raters
...
The error of halo effect or the systematic bias occurs when the rater carries over a
generalised impression of the subject from one rating to another
...
In other words, halo effect is
likely to appear when the rater is asked to rate many factors, on a number of which he has no
evidence for judgement
...
The respondents under this method directly compare two or more
objects and make choices among them
...
(a) Method of paired comparisons: Under it the respondent can express his attitude by making a
choice between two objects, say between a new flavour of soft drink and an established brand of
drink
...
For instance, if there are ten suggestions for bargaining proposals available to a workers union, there
are 45 paired comparisons that can be made with them
...
We can
reduce the number of comparisons per respondent either by presenting to each one of them only a
sample of stimuli or by choosing a few objects which cover the range of attractiveness at about equal
intervals and then comparing all other stimuli to these few standard objects
...
If there is substantial consistency, we will find that if X is
preferred to Y, and Y to Z, then X will consistently be preferred to Z
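
The two checks described above can be sketched in a few lines of Python. This is only an illustration: the function names and the representation of judgements as (winner, loser) tuples are assumptions made here, not part of the text.

```python
from itertools import permutations

def number_of_pairs(n):
    """Number of paired comparisons among n objects: n(n - 1) / 2."""
    return n * (n - 1) // 2

def intransitive_triples(prefers):
    """Return circular triads (X, Y, Z): X preferred to Y, Y to Z, yet Z to X.

    `prefers` is a set of (winner, loser) tuples, one per judged pair.
    """
    objects = sorted({obj for pair in prefers for obj in pair})
    found = []
    for x, y, z in permutations(objects, 3):
        # Each circle shows up as three rotations; keep the one that
        # starts with its alphabetically smallest member.
        if x < y and x < z and (x, y) in prefers and (y, z) in prefers and (z, x) in prefers:
            found.append((x, y, z))
    return found

# Ten bargaining proposals give 45 pairs, as stated in the text.
pairs_for_ten = number_of_pairs(10)

# Hypothetical judgements: the first set is consistent, the second is circular.
consistent = {("X", "Y"), ("Y", "Z"), ("X", "Z")}
circular = {("X", "Y"), ("Y", "Z"), ("Z", "X")}
```

An empty result from `intransitive_triples` indicates that the respondent's choices are transitive for every triple of objects.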
...
It should be remembered that paired comparison provides ordinal data, but the same may be
converted into an interval scale by the method of the Law of Comparative Judgement developed by
L
...
Thurstone
...
J
...
Guilford in his book “Psychometric Methods” has given a procedure which is
relatively easier
...
The
committee wants to know how the union membership ranks these proposals
...
1: Response Patterns of 100 Members Paired Comparisons of
4 Suggestions for Union Bargaining Proposal Priorities

Suggestion      A        B        C        D
A               –        65*      32       20
B               40       –        38       42
C               45       50       –        70
D               80       20       98       –
TOTAL:          165      135      168      132
Rank order      2        3        1        4
Mp              0.5375   0.4625   0.5450   0.4550
Zj              0.09     –0.09    0.11     –0.11
Rj              0.20     0.02     0.22     0.00

* Read as 65 members preferred suggestion B to suggestion A.
Comparing the total number of preferences for each of the four proposals, we find that C is the
most popular, followed by A, B and D respectively in popularity
...
By following the composite standard method, we can develop an interval scale from the paired-comparison
ordinal data given in the above table for which purpose we have to adopt the following
steps in order:
(i) Using the data in the above table, we work out the column mean with the help of the
formula given below:
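$$M_p = \frac{C + 0.5(N)}{nN} = \frac{165 + 0.5(100)}{4(100)} = 0.5375$$
where C is the column total for a proposal, n is the number of proposals and N is the number of respondents
...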
The column means have been shown in the Mp row in the above table
...
When the Mp value is less than 0.5, the Zj value is negative; for Mp values above 0.5, the Zj values are positive
...
(iii) As the Zj values represent an interval scale, zero is an arbitrary value
...
11 in our example which we shall take equal to zero) and then adding the absolute
value of this lowest scale value to all other scale items
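
The three steps can be sketched in Python as follows. This is a rough illustration, not the book's own procedure: it uses the standard library's `NormalDist` in place of a printed normal curve area table, and it rounds Zj to two decimals before shifting (an assumption made here to mimic reading values from a table). The column totals, n and N are taken from the paired-comparison table above.

```python
from statistics import NormalDist

# Column totals from the paired-comparison table above
totals = {"A": 165, "B": 135, "C": 168, "D": 132}
n = 4      # number of proposals
N = 100    # number of respondents

def composite_standard(totals, n, N):
    """Return {proposal: (Mp, Zj, Rj)} for the composite standard method."""
    # Step (i): column mean proportion Mp = (C + 0.5 N) / (n N)
    mp = {k: (c + 0.5 * N) / (n * N) for k, c in totals.items()}
    # Step (ii): standard normal deviate for each Mp, rounded to two
    # decimals (assumption: mimics a printed table lookup)
    zj = {k: round(NormalDist().inv_cdf(p), 2) for k, p in mp.items()}
    # Step (iii): shift the scale so that the lowest value becomes zero
    lowest = min(zj.values())
    rj = {k: round(z - lowest, 2) for k, z in zj.items()}
    return {k: (mp[k], zj[k], rj[k]) for k in totals}

scale = composite_standard(totals, n, N)
```

Sorting the proposals by their Rj value gives the same order as the totals (C, A, B, D), but now with interval distances between them.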
...
Graphically we can show this interval scale that we have derived from the paired-comparison
data using the composite standard method as follows:
[Figure: the four proposals marked along a horizontal interval scale at their Rj values, with D and B at the lower end]
...
5 from all Mp values which exceed
...
For all Mp values
of less than
...
5 to secure the values with which to enter the normal curve area table
for which Z values can be obtained but the Z values in this situation will be with negative sign
...
This method is easier and faster than the method of paired comparisons stated
above
...
The problem of transitivity (such as A
is preferred to B, B to C, but C is preferred to A) is also not there in case we adopt the method of rank order
...
To secure a simple ranking of all items involved we simply total
rank values received by each item
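
A minimal sketch of this totalling in Python is given below. The rankings are hypothetical, and the convention that rank 1 means "most preferred" (so the lowest total comes first) is an assumption made for the example.

```python
def total_ranks(rankings):
    """Sum the rank values each item received and order items by that total.

    `rankings` is a list of dicts, one per respondent, mapping item -> rank.
    Assumption: rank 1 is the most preferred, so the smallest total ranks first.
    """
    totals = {}
    for ranking in rankings:
        for item, rank in ranking.items():
            totals[item] = totals.get(item, 0) + rank
    return sorted(totals.items(), key=lambda pair: pair[1])

# Hypothetical rankings of three items by three respondents
rankings = [
    {"A": 1, "B": 2, "C": 3},
    {"A": 2, "B": 1, "C": 3},
    {"A": 1, "B": 3, "C": 2},
]
overall = total_ranks(rankings)
```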
...
But then there are limitations of this method
...
Then there may be the problem of respondents becoming careless in assigning ranks
particularly when there are many (usually more than 10) items
...
Under this approach, the respondent expresses his
agreement or disagreement with a number of statements relevant to the issue
...
Researchers must as well be aware that inferring attitude from what has been recorded in
opinionnaires has several limitations
...
They may not really know how they feel about a social issue
...
Even behaviour itself is at times not a true indication of attitude
...
Thus, there is no sure method of measuring attitude; we only try to measure the expressed opinion
and then draw inferences from it about people’s real feelings or attitudes
...
The researcher should know these techniques so as to
develop an appropriate scale for his own study
...
Table 5
...
Arbitrary approach
Arbitrary scales
2
...
Item analysis approach
4
...
Factor analysis approach
Cumulative scales (such as Guttman’s Scalogram)
Factor scales (such as Osgood’s Semantic
Differential, Multi-dimensional Scaling, etc
...
Arbitrary Scales
Arbitrary scales are developed on ad hoc basis and are designed largely through the researcher’s
own subjective selection of items
...
Some of these are selected for inclusion
in the measuring instrument and then people are asked to check in a list the statements with which
they agree
...
They can also be designed to be highly specific and adequate
...
At the same time there are some limitations of these scales
...
We have simply to rely on researcher’s insight and competence
...
L
...
Under such an approach the selection of items is made by a panel of
judges who evaluate the items in terms of whether they are relevant to the topic area and unambiguous
in implication
...
e
...
(b) These statements are then submitted to a panel of judges, each of whom arranges them in
eleven groups or piles ranging from one extreme to another in position
...
(c) This sorting by each judge yields a composite position for each of the items
...
(d) For items that are retained, each is given its median scale value between one and eleven as
established by the panel
...
(e) A final selection of statements is then made
...
The
statements so selected, constitute the final scale to be administered to respondents
...
After developing the scale as stated above, the respondents are asked during the administration
of the scale to check the statements with which they agree
...
It may be noted
that in the actual instrument the statements are arranged in random order of scale value
...
However, at
times divergence may occur when a statement appears to tap a different attitude dimension
...
Such scales are considered most
appropriate and reliable when used for measuring a single attitude
...
Another weakness of such scales is that the
values assigned to various statements by the judges may reflect their own attitudes
...
Critics of this method also
opine that some other scale designs give more information about the respondent’s attitude in comparison
to differential scales
...
Those items or statements that best meet this sort of
discrimination test are included in the final instrument
...
The respondent
indicates his agreement or disagreement with each statement in the instrument
...
In other words, the overall score represents the respondent’s
position on the continuum of favourable-unfavourableness towards an issue
...
For this reason they are often referred to as Likert-type scales
...
For example, when
asked to express opinion whether one considers his job quite pleasant, the respondent may respond in
any one of the following ways: (i) strongly agree, (ii) agree, (iii) undecided, (iv) disagree, (v) strongly
disagree
...
At one extreme of the scale there is strong
agreement with the given statement and at the other, strong disagreement, and between them lie
intermediate points
...
5
...
Response indicating the least favourable degree of job
satisfaction is given the least score (say 1) and the most favourable is given the highest score (say 5)
...
The Likert scaling technique, thus, assigns a scale value to each of the five
responses
...
This
way the instrument yields a total score for each respondent, which would then measure the respondent’s
favourableness toward the given point of view
...
30 × 5 = 150 Most favourable response possible
30 × 3 = 90 A neutral attitude
30 × 1 = 30 Most unfavourable attitude
...
If the score happens to be above
90, it shows favourable opinion to the given point of view, a score of below 90 would mean unfavourable
opinion and a score of exactly 90 would be suggestive of a neutral attitude
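
The scoring described above can be sketched in Python. The response labels and the 1 to 5 scale values follow the text; the `reverse_items` parameter for unfavourably worded statements is an assumption added here so that a higher score always means a more favourable attitude.

```python
SCALE_VALUES = {
    "strongly agree": 5,
    "agree": 4,
    "undecided": 3,
    "disagree": 2,
    "strongly disagree": 1,
}

def likert_total(responses, reverse_items=()):
    """Total score for one respondent.

    `responses` holds one response label per statement. Indices listed in
    `reverse_items` (hypothetical parameter) are scored in reverse.
    """
    total = 0
    for index, answer in enumerate(responses):
        value = SCALE_VALUES[answer]
        if index in reverse_items:
            value = 6 - value  # 5 <-> 1, 4 <-> 2, 3 stays 3
        total += value
    return total

def interpret(total, n_statements):
    """Compare a total with the neutral point (3 per statement)."""
    neutral = 3 * n_statements
    if total > neutral:
        return "favourable"
    if total < neutral:
        return "unfavourable"
    return "neutral"

most_favourable = likert_total(["strongly agree"] * 30)
neutral_total = likert_total(["undecided"] * 30)
least_favourable = likert_total(["strongly disagree"] * 30)
```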
...
(ii) After the statements have been gathered, a trial test should be administered to a number of
subjects
...
(iii) The responses to the various statements are scored in such a way that a response indicative of
the most favourable attitude is given the highest score of 5 and that with the most unfavourable
attitude is given the lowest score, say, of 1
...
(v) The next step is to array these total scores and find out those statements which have a high
discriminatory power
...
These two
extreme groups are interpreted to represent the most favourable and the least favourable
attitudes and are used as criterion groups by which to evaluate individual statements
...
(vi) Only those statements that correlate with the total test should be retained in the final
instrument and all others must be discarded from it
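
Step (vi) can be illustrated with a small item-total correlation check in Python. This is one possible reading of "correlate with the total test": the Pearson coefficient, the cut-off of 0.3 and the trial data are all assumptions made for the sketch, not values given in the text.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)

def retained_items(scores, threshold=0.3):
    """Indices of statements whose scores correlate with the total score.

    `scores` has one row per respondent and one column per statement.
    `threshold` is a hypothetical cut-off chosen for illustration.
    """
    totals = [sum(row) for row in scores]
    keep = []
    for j in range(len(scores[0])):
        item = [row[j] for row in scores]
        if pearson(item, totals) >= threshold:
            keep.append(j)
    return keep

# Hypothetical trial-test scores: four respondents, three statements.
# The third statement gets the same answer from everyone, so it cannot discriminate.
trial_scores = [
    [5, 5, 3],
    [4, 4, 3],
    [2, 1, 3],
    [1, 2, 3],
]
kept = retained_items(trial_scores)
```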
...
Mention may be made of the important
ones
...
(b) Likert-type scale is considered more reliable because under it respondents answer each
statement included in the instrument
...
(c) Each statement included in the Likert-type scale is given an empirical test for discriminating
ability and as such, unlike the Thurstone-type scale, the Likert-type scale permits the use of
statements that are not manifestly related (i.e., that have no direct relationship) to the attitude
being studied
...
e
...
(e) As the Likert-type scale takes much less time to construct, it is frequently used by students of
opinion research
...
Limitations: There are several limitations of the Likert-type scale as well
...
There is no basis for belief that the five
positions indicated on the scale are equally spaced
...
This means that Likert scale
does not rise to a stature more than that of an ordinal scale, whereas the designers of Thurstone
scale claim the Thurstone scale to be an interval scale
...
It is unlikely that the respondent can validly react to a short
statement on a printed form in the absence of real-life qualifying situations
...
”4 This particular weakness of the Likert-type scale is met by using a cumulative scale
which we shall take up later in this chapter
...
They are equally useful when we are concerned with a programme of
*
A
...
Edwards and K
...
Kenney, “A comparison of the Thurstone and Likert techniques of attitude scale construction”,
Journal of Applied Psychology, 30, 72–83, 1946
...
Best and James V
...
, Prentice-Hall of India Pvt
...
, New Delhi, 1986,
p
...
change or improvement in which case we can use the scales to measure attitudes before and after
the programme of change or improvement in order to assess whether our efforts have had the
desired effects
...
All this accounts for the
popularity of Likert-type scales in social studies relating to measuring of attitudes
...
The
special feature of this type of scale is that statements in it form a cumulative series
...
3, also replies favourably to items No
...
4 also replies favourably to items No
...
This being so an
individual whose attitude is at a certain point in a cumulative scale will answer favourably all the
items on one side of this point, and answer unfavourably all the items on the other side of this point
...
If one knows this total score, one can estimate as to how a
respondent has answered individual statements constituting cumulative scales
...
We attempt a brief description of the
same below
...
Scalogram analysis refers to the procedure for determining whether a set of items
forms a unidimensional scale
...
Under this technique, the respondents are asked to indicate in respect of
each item whether they agree or disagree with it, and if these items form a unidimensional scale, the
response pattern will be as under:
Table 5
...
But a score of 3 would mean that the respondent is not agreeable to
item 4, but he agrees with all others
...
This pattern reveals that the universe of content is scalable
...
In other words, we must lay down in
clear terms the issue we want to deal with in our study
...
(c) The third step consists in pre-testing the items to determine whether the issue at hand is
scalable (The pretest, as suggested by Guttman, should include 12 or more items, while the
final scale may have only 4 to 6 items
...
In a pretest the respondents are asked to record their opinions on all selected items using
a Likert-type 5-point scale, ranging from ‘strongly agree’ to ‘strongly disagree’
...
The
total score can thus range, if there are 15 items in all, from 75 for most favourable to 15 for
the least favourable
...
If the responses of an item form a cumulative scale, its response category
scores should decrease in an orderly fashion as indicated in the above table
...
e
...
Sometimes the overlapping in category responses can be reduced by combining categories
...
(d) The next step is again to total the scores for the various opinionnaires, and to rearray them
to reflect any shift in order, resulting from reducing the items, say, from 15 in pretest to, say,
5 for the final scale
...
4
...
4: The Final Pretest Results in a Scalogram Analysis*

                          Item
Scale type      5    12   3    10   7    Errors      Number of   Number of
                                         per case    cases       errors
5 (perfect)     X    X    X    X    X    0           7           0
4 (perfect)     –    X    X    X    X    0           3           0
(nonscale)      –    X    –    X    X    1           1           1
(nonscale)      –    X    X    –    X    1           2           2
3 (perfect)     –    –    X    X    X    0           5           0
2 (perfect)     –    –    –    X    X    0           2           0
1 (perfect)     –    –    –    –    X    0           1           0
(nonscale)      –    –    X    –    –    2           1           2
(nonscale)      –    –    X    –    –    2           1           2
0 (perfect)     –    –    –    –    –    0           2           0
                n = 5                                N = 25      e = 7
(Figures in the table are arbitrary and have been used to explain the tabulation process only
...
The number of respondents is 25 whose responses to various items have been
tabulated along with the number of errors
...
Non-scale types are those in which the category pattern differs from that
expected from the respondent’s total score i
...
, non-scale cases have deviations from
unidimensionality or errors
...
Guttman has set 0
...
He has
given the following formula for measuring the level of reproducibility:
Guttman’s Coefficient of Reproducibility = 1 – e/(nN)
where e = number of errors
n = number of items
N = number of cases
For the above table figures,
Coefficient of Reproducibility = 1 – 7/(5 × 25) =
...
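
The arithmetic of the formula can be checked with a short Python sketch. The function name is an assumption; the two lists are the "Number of cases" and "Number of errors" columns of the scalogram table above.

```python
def coefficient_of_reproducibility(e, n, N):
    """Guttman's coefficient: 1 - e / (n * N)."""
    return 1 - e / (n * N)

# "Number of cases" and "Number of errors" columns from the table above
cases = [7, 3, 1, 2, 5, 2, 1, 1, 1, 2]
errors = [0, 0, 1, 2, 0, 0, 0, 2, 2, 0]

N = sum(cases)    # number of cases
e = sum(errors)   # number of errors
n = 5             # number of items

cr = coefficient_of_reproducibility(e, n, N)
```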
Scalogram analysis, like any other scaling technique, has several advantages as well as
limitations
...
Researcher’s subjective judgement is not allowed to creep in the development
of scale since the scale is determined by the replies of respondents
...
Scalogram analysis can
appropriately be used for personal, telephone or mail surveys
...
This method is not a
frequently used method for the simple reason that its development procedure is tedious and
complex
...
Conceptually, this analysis is a bit more difficult in comparison to other scaling
methods
...
Factor scales are
particularly “useful in uncovering latent attitude dimensions and approach scaling through the concept
of multiple-dimension attribute space
...
, how to deal
*
A detailed study of the factor scales and particularly the statistical procedures involved in developing factor scales is
beyond the scope of this book
...
5
C
...
264–65
...
An important
factor scale based on factor analysis is Semantic Differential (S
...
) and the other one is
Multidimensional Scaling
...
Semantic differential scale: Semantic differential scale or the S
...
scale developed by Charles
E
...
J
...
H
...
This scale is based on the presumption that an object can have
different dimensions of connotative meanings which can be located in multidimensional property
space, or what can be called the semantic space in the context of S
...
scale
...
For instance, the S
...
scale items for analysing candidates for leadership
position may be shown as under:
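                     3    2    1    0   –1   –2   –3
(E) Successful      __   __   __   __   __   __   __   Unsuccessful
(P) Severe          __   __   __   __   __   __   __   Lenient
(P) Heavy           __   __   __   __   __   __   __   Light
(A) Hot             __   __   __   __   __   __   __   Cold
(E) Progressive     __   __   __   __   __   __   __   Regressive
(P) Strong          __   __   __   __   __   __   __   Weak
(A) Active          __   __   __   __   __   __   __   Passive
(A) Fast            __   __   __   __   __   __   __   Slow
(E) True            __   __   __   __   __   __   __   False
(E) Sociable        __   __   __   __   __   __   __   Unsociable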
Fig
...
4
Candidates for leadership position (along with the concept—the ‘ideal’ candidate) may be
compared and we may score them from +3 to –3 on the basis of the above stated scales
...
, evaluation, potency and activity respectively, written
along the left side are not written in actual scale
...
)
Osgood and others did produce a list of some adjective pairs for attitude research purposes and
concluded that semantic space is multidimensional rather than unidimensional
...
, evaluation, potency and activity, contributed most
to meaningful judgements by respondents
...
Procedure: Various steps involved in developing S
...
scale are as follows:
(a) First of all the concepts to be studied are selected
...
(b) The next step is to select the scales bearing in mind the criterion of factor composition and
the criterion of scale’s relevance to the concepts being judged (it is common practice to use
at least three scales for each factor with the help of which an average factor score has to
be worked out)
...
(c) Then a panel of judges is used to rate the various stimuli (or objects) on the various
selected scales and the responses of all judges would then be combined to determine the
composite scaling
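
The average factor score mentioned in step (b) can be sketched in Python. The assignment of adjective pairs to the evaluation (E), potency (P) and activity (A) factors follows the leadership example given earlier; the dictionary keys and the ratings themselves are hypothetical.

```python
from statistics import mean

# Factor of each bipolar scale, as marked in the leadership example
FACTOR_OF_SCALE = {
    "successful-unsuccessful": "E",
    "progressive-regressive": "E",
    "true-false": "E",
    "sociable-unsociable": "E",
    "severe-lenient": "P",
    "heavy-light": "P",
    "strong-weak": "P",
    "hot-cold": "A",
    "active-passive": "A",
    "fast-slow": "A",
}

def factor_scores(ratings):
    """Average the +3..-3 ratings of one candidate within each factor."""
    by_factor = {}
    for scale, score in ratings.items():
        by_factor.setdefault(FACTOR_OF_SCALE[scale], []).append(score)
    return {factor: mean(values) for factor, values in by_factor.items()}

# Hypothetical ratings of one candidate by one judge
ratings = {
    "successful-unsuccessful": 3, "progressive-regressive": 2,
    "true-false": 3, "sociable-unsociable": 2,
    "severe-lenient": 1, "heavy-light": 0, "strong-weak": 2,
    "hot-cold": -1, "active-passive": 1, "fast-slow": 0,
}
profile = factor_scores(ratings)
```

Repeating this for every judge and averaging the resulting profiles is one simple way to arrive at the composite described in step (c).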
...
D
...
It is an efficient and easy
way to secure attitudes from a large sample
...
The total set of responses provides a comprehensive picture of the
meaning of an object, as well as a measure of the subject doing the rating
...
”6
Multidimensional scaling: Multidimensional scaling (MDS) is a relatively more complicated scaling
device, but with this sort of scaling one can scale objects, individuals or both with a minimum of
information
...
It “provides useful methodology
for portraying subjective judgements of diverse kinds
...
The underlying assumption in MDS is that people (respondents) “perceive a set of
objects as being more or less similar to one another on a number of dimensions (usually uncorrelated
with one another) instead of only one
...
In fact, these techniques attempt to locate
the points, given the information about a set of interpoint distances, in space of one or more dimensions
such as to best summarise the information contained in the interpoint distances
...
For instance, if objects,
say X and Y, are thought of by the respondent as being most similar as compared to all other possible
pairs of objects, MDS techniques will position objects X and Y in such a way that the distance
between them in multidimensional space is shorter than that between any two other objects
...
, the metric approach and the non-metric approach, are usually talked about
in the context of MDS, while attempting to construct a space containing m points such that
m(m – 1)/2 interpoint distances reflect the input data
...
, p
...
Paul E
...
421
...
Sheth, “The Multivariate Revolution in Marketing Research”, quoted in “Marketing Research” by Danny
N
...
Greenberg, p
...
*
Additive constant refers to that constant with which one can, either by subtracting or adding, convert interval scale to
a ratio scale
...
If one were to subtract 3 from each of these distances, they would be 4, 3 and 0 respectively
...
Obviously, one can add 3 to all the
converted distances and reachieve the ratio scale of distances
...
Well
defined iterative approach is employed in practice for estimating appropriate additive constant
...
This approach utilises all the information in the
data in obtaining a solution
...
e
...
If the data reflect exact
distances between real objects in an r-dimensional space, their solution will reproduce the set of
interpoint distances
...
Generally, the judged similarities among a set of
objects are statistically transformed into distances by placing those objects in a multidimensional
space of some dimensionality
...
Such non-metric data is then
transformed into some arbitrary metric space and then the solution is obtained by reducing the
dimensionality
...
This is achieved by requiring only that the distances in the
solution be monotone with the input data
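
The monotonicity requirement can be illustrated with a small Python check. This is not an MDS algorithm; it only tests whether a proposed configuration of points respects the rank order of the input dissimilarities. The function name, the Euclidean distance and the sample data are assumptions made for the sketch.

```python
from itertools import combinations
from math import dist

def is_monotone(dissimilarity, points):
    """True if solution distances never reverse the order of the input data.

    `dissimilarity` maps a sorted pair of object names to its judged
    dissimilarity (only the rank order matters); `points` maps each
    object name to its coordinates in the solution space.
    """
    pairs = list(combinations(sorted(points), 2))
    distance = {p: dist(points[p[0]], points[p[1]]) for p in pairs}
    for p, q in combinations(pairs, 2):
        if dissimilarity[p] < dissimilarity[q] and distance[p] > distance[q]:
            return False
        if dissimilarity[p] > dissimilarity[q] and distance[p] < distance[q]:
            return False
    return True

# Hypothetical one-dimensional solution for three objects
points = {"X": (0.0, 0.0), "Y": (1.0, 0.0), "Z": (4.0, 0.0)}

# Rank-order input: X and Y judged most similar, X and Z least similar
ranks_ok = {("X", "Y"): 1, ("Y", "Z"): 2, ("X", "Z"): 3}
ranks_bad = {("X", "Y"): 3, ("Y", "Z"): 2, ("X", "Z"): 1}
```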
...
The significance of MDS lies in the fact that it enables the researcher to study “the perceptual
structure of a set of stimuli and the cognitive processes underlying the development of this structure
...
”10 The MDS techniques, in fact, do away with the need in the data collection process to
specify the attribute(s) along which the several brands, say of a particular product, may be compared,
as ultimately the MDS analysis itself reveals such attribute(s) that presumably underlie the expressed
relative similarities among objects
...
g
...
company images, advertisement brands, etc
...
Many of its methods are quite laborious in terms of both the collection
of data and the subsequent analyses
...
The techniques have been specifically applied in “finding
out the perceptual dimensions, and the spacing of stimuli along these dimensions, that people, use in
making judgements about the relative similarity of pairs of Stimuli
...
”13
9
Robert Ferber (ed
...
3–51
...
, p
...
11
G
...
Giles, Marketing, p
...
12
Paul E
...
421
...
Nunnally, Psychometric Theory, p
...
10
Questions
1
...
2
...
(2) Stability and equivalence aspects of reliability essentially mean the same thing
...
(4) There is no difference between concept development and concept specification
...
3
...
Describe the tests of sound measurement
...
Are the following nominal, ordinal, interval or ratio data? Explain your answers
...
(b) Military ranks
...
(d) Number of passengers on buses from Delhi to Mumbai
...
5
...
Ranking scales
...
Cumulative scales
...
Factor analysis
...
The following table shows the results of a paired-comparison preference test of four cold drinks from a
sample of 200 persons:
Name           Coca Cola   Limca   Goldspot   Thums Up
Coca Cola      –           60*     105        45
Limca          160         –       150        70
Goldspot       75          40      –          65
Thums Up       165         120     145        –

* To be read as 60 persons preferred Limca over Coca Cola
...
(b) Develop an interval scale for the four varieties of cold drinks
...
(1) Narrate the procedure for developing a scalogram and illustrate the same by an example
...
8
...
9
...
10
...
” Discuss
...
Methods of Data Collection
95
6
Methods of Data Collection
The task of data collection begins after a research problem has been defined and research design/
plan chalked out
...
, primary and secondary
...
The secondary data, on the other hand, are those which have already been collected by someone
else and which have already been passed through the statistical process
...
The methods of collecting primary and
secondary data differ since primary data are to be originally collected, while in case of secondary
data the nature of data collection work is merely that of compilation
...
COLLECTION OF PRIMARY DATA
We collect primary data during the course of doing experiments in an experimental research but in
case we do research of the descriptive type and perform surveys, whether sample surveys or census
surveys, then we can obtain primary data either through observation or through direct communication
with respondents in one form or another or through personal interviews
...
In an experiment the investigator measures the effects of an experiment which he conducts intentionally
...
In a survey, the investigator examines those phenomena which exist in the universe independent of
his action
...
Important ones are: (i) observation method, (ii) interview method, (iii) through questionnaires,
(iv) through schedules, and (v) other methods which include (a) warranty cards; (b) distributor
audits; (c) pantry audits; (d) consumer panels; (e) using mechanical devices; (f) through projective
techniques; (g) depth interviews, and (h) content analysis
...
Observation Method
The observation method is the most commonly used method, especially in studies relating to behavioural
sciences
...
Observation becomes a scientific tool and the method of data collection for the researcher,
when it serves a formulated research purpose, is systematically planned and recorded and is subjected
to checks and controls on validity and reliability
...
For
instance, in a study relating to consumer behaviour, the investigator instead of asking the brand of
wrist watch used by the respondent, may himself look at the watch
...
Secondly, the information
obtained under this method relates to what is currently happening; it is not complicated by either the
past behaviour or future intentions or attitudes
...
This method is
particularly suitable in studies which deal with subjects (i
...
, respondents) who are not capable of
giving verbal reports of their feelings for one reason or the other.
However, the observation method has various limitations
...
Secondly,
the information provided by this method is very limited
...
At times, the fact that some people are rarely accessible to
direct observation creates an obstacle to collecting data effectively by this method
...
But when observation
is to take place without these characteristics being thought of in advance, the same is termed
unstructured observation
...
We often talk about participant and non-participant types of observation in the context of studies,
particularly of social sciences
...
If the observer observes by making himself, more or less, a
member of the group he is observing so that he can experience what the members of the group
experience, the observation is called participant observation
...
(When the
observer is observing in such a manner that his presence may be unknown to the people he is
observing, such an observation is described as disguised observation
...
(ii) The researcher can even gather information which
could not easily be obtained if he observes in a disinterested fashion
...
But
there are also certain demerits of this type of observation viz
...
Sometimes we talk of controlled and uncontrolled observation
...
In non-controlled observation, no attempt is made to use precision instruments
...
It has a
tendency to supply naturalness and completeness of behaviour, allowing sufficient time for observing
it
...
Such observation has a tendency to supply formalised data upon which
generalisations can be built with some degree of assurance
...
There is also the danger of having the feeling that we
know more about the observed phenomena than we actually do
...
Interview Method
The interview method of collecting data involves presentation of oral-verbal stimuli and reply in
terms of oral-verbal responses
...
(a) Personal interviews: The personal interview method requires a person, known as the interviewer,
to ask questions, generally in face-to-face contact with the other person or persons
...
) This sort of interview may be in the
form of direct personal investigation or it may be indirect oral investigation
...
He has to be on the spot and has to meet people from whom data have to be collected
...
But in certain cases it may not be
possible or worthwhile to contact directly the persons concerned or on account of the extensive
scope of enquiry, the direct personal investigation technique may not be used
...
Most of the commissions and committees appointed by government
to carry on investigations make use of this method
...
As such, we call the interviews structured interviews
...
Thus,
the interviewer in a structured interview follows a rigid procedure laid down, asking questions in a
form and order prescribed
...
Unstructured interviews do not follow a system of pre-determined
questions and standardised techniques of recording information
...
He may even change the sequence
of questions
...
But this sort of flexibility results in lack of comparability of one interview with
another and the analysis of unstructured responses becomes much more difficult and time-consuming
than that of the structured responses obtained in case of structured interviews
...
Unstructured interview,
however, happens to be the central technique of collecting information in case of exploratory or
formulative research studies
...
We may as well talk about focussed interview, clinical interview and the non-directive interview
...
Under it the interviewer has the freedom to decide the manner and sequence in which the
questions would be asked and has also the freedom to explore reasons and motives
...
Such interviews are used generally in the development of
hypotheses and constitute a major type of unstructured interviews
...
The
method of eliciting information under it is generally left to the interviewer’s discretion
...
The interviewer often acts as a
catalyst to a comprehensive expression of the respondents’ feelings and beliefs and of the frame of
reference within which such feelings and beliefs take on personal significance
...
The chief merits of the interview method are as
follows:
(i) More information and that too in greater depth can be obtained
...
(iii) There is greater flexibility under this method as the opportunity to restructure questions is
always there, especially in case of unstructured interviews
...
(v) Personal information can as well be obtained easily under this method
...
(vii) The interviewer can usually control which person(s) will answer the questions
...
If so desired, group discussions may also be
held
...
(ix) The language of the interview can be adapted to the ability or educational level of the
person interviewed and as such misinterpretations concerning questions can be avoided
...
But there are also certain weaknesses of the interview method
...
(ii) There remains the possibility of the bias of interviewer as well as that of the respondent;
there also remains the headache of supervision and control of interviewers
...
(iv) This method is relatively more time-consuming, especially when the sample is large and recalls upon the respondents are necessary
...
(vi) Under the interview method the organisation required for selecting, training and supervising
the field-staff is more complex with formidable problems
...
(viii) Effective interview presupposes proper rapport with respondents that would facilitate free
and frank responses
...
Pre-requisites and basic tenets of interviewing: For successful implementation of the interview
method, interviewers should be carefully selected, trained and briefed
...
Occasional field checks should be made to ensure that interviewers are neither cheating, nor deviating
from instructions given to them for performing their job efficiently
...
In fact, interviewing is an art governed by certain scientific principles
...
The interviewer must ask questions properly and
intelligently and must record the responses accurately and completely
...
The interviewer’s approach must be friendly, courteous, conversational and unbiased
...
(b) Telephone interviews: This method of collecting information consists in contacting respondents
by telephone
...
The chief merits of such a system are:
1
...
3
...
5
...
7
...
9
...
It is more flexible in comparison to mailing method
...
e
...
It is cheaper than personal interviewing method; here the cost per response is relatively low
...
There is a higher rate of response than what we have in mailing method; the non-response
is generally very low
...
Interviewer can explain requirements more easily
...
No field staff is required
...
But this system of collecting information is not free from demerits
...
1
...
2
...
3
...
4
...
5
...
6
...
COLLECTION OF DATA THROUGH QUESTIONNAIRES
This method of data collection is quite popular, particularly in case of big enquiries
...
In this method a questionnaire is sent (usually by post) to the persons concerned with a request to
answer the questions and return the questionnaire
...
The questionnaire is mailed to respondents
who are expected to read and understand the questions and write down the reply in the space meant
for the purpose in the questionnaire itself
...
The method of collecting data by mailing the questionnaires to respondents is most extensively
employed in various economic and business surveys
...
There is low cost even when the universe is large and is widely spread geographically
...
3
...
5
It is free from the bias of the interviewer; answers are in respondents’ own words
...
Respondents, who are not easily approachable, can also be reached conveniently
...
The main demerits of this system can also be listed here:
1
...
2
...
3
...
4
...
5
...
6
...
7
...
Before using this method, it is always advisable to conduct a ‘pilot study’ (pilot survey) for testing
the questionnaires
...
A pilot survey is
in fact a replica and rehearsal of the main survey
...
From the
experience gained in this way, improvement can be effected
...
Hence it should be very carefully constructed
...
This fact requires us to study the main aspects of a questionnaire viz
...
Researcher should note the
following with regard to these three main aspects of a questionnaire:
1
...
Structured questionnaires are those questionnaires in which
there are definite, concrete and pre-determined questions
...
Resort is taken to this sort of standardisation
to ensure that all respondents reply to the same set of questions
...
e
...
e
...
Structured questionnaires may also have fixed
alternative questions in which responses of the informants are limited to the stated alternatives
...
When these characteristics are not present
in a questionnaire, it can be termed as unstructured or non-structured questionnaire
...
Structured questionnaires are simple to administer and relatively inexpensive to analyse
...
But
such questionnaires have limitations too
...
They are usually considered inappropriate
in investigations where the aim happens to be to probe for attitudes and reasons for certain actions or
feelings
...
In such situations, unstructured questionnaires may be used effectively
...
2
...
A proper sequence of questions reduces considerably the chances of individual questions
being misunderstood
...
The first few questions are particularly important
because they are likely to influence the attitude of the respondent and help in securing his
cooperation
...
The following type
of questions should generally be avoided as opening questions in a questionnaire:
1
...
questions of a personal character;
3
...
Following the opening questions, we should have questions that are really vital to the research
problem and a connecting thread should run through successive questions
...
Knowing what information is desired,
the researcher can rearrange the order of the questions (this is possible in case of unstructured
questionnaire) to fit the discussion in each particular case
...
Relatively difficult questions must be relegated
towards the end so that even if the respondent decides not to answer such questions, considerable
information would have already been obtained
...
For instance,
if one question deals with the price usually paid for coffee and the next with the reason for preferring
that particular brand, the answer to this latter question may be couched largely in terms of price differences
...
Question formulation and wording: With regard to this aspect of questionnaire, the researcher
should note that each question must be very clear for any sort of misunderstanding can do irreparable
harm to a survey
...
Questions should be constructed with a view to their forming a logical part of a well
thought out tabulation plan
...
e
...
(For
instance, instead of asking
...
, multiple choice
question and the open-end question
...
The
question with only two possible answers (usually ‘Yes’ or ‘No’) can be taken as a special case of the
multiple choice question, or can be named as a ‘closed question
...
Multiple choice or closed questions have the
advantages of being easy to handle, simple to answer, and quick and relatively inexpensive to analyse
...
Sometimes, the provision of alternative replies helps to make
clear the meaning of the question
...
e
...
They are not appropriate when the issue
under consideration happens to be a complex one and also when the interest of the researcher is in
the exploration of a process
...
Such questions give the respondent considerable latitude in phrasing a reply
...
But one
should not forget that, from an analytical point of view, open-ended questions are more difficult to
handle, raising problems of interpretation, comparability and interviewer bias
...
The various forms complement each other
...
For instance, multiple-choice questions constitute the basis of a
structured questionnaire, particularly in a mail survey
...
The researcher must pay proper attention to the wording of questions, since reliable and meaningful
returns depend on it to a large extent
...
Simple words, which are familiar to all respondents should be employed
...
Similarly, danger words, catch-words or words with
emotional connotations should be avoided
...
Question wording, in no case, should bias the
answer
...
Essentials of a good questionnaire: To be successful, a questionnaire should be comparatively
short and simple i
...
, the size of the questionnaire should be kept to the minimum
...
Personal and intimate
questions should be left to the end
...
Questions may be dichotomous (yes or no
answers), multiple choice (alternative answers listed) or open-ended
...
There
should be some control questions in the questionnaire which indicate the reliability of the respondent
...
first in terms of financial expenditure and later in terms of weight
...
Questions affecting
the sentiments of respondents should be avoided
...
There should always be provision for indications of
uncertainty, e
...
, “do not know,” “no preference” and so on
...
Finally, the physical appearance
of the questionnaire affects the cooperation the researcher receives from the recipients and as such
an attractive looking questionnaire, particularly in mail surveys, is a plus point for enlisting cooperation
...
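The points above about fixed alternatives and a provision for uncertainty can be illustrated with a minimal sketch. It is only an illustration under stated assumptions: the class name `ClosedQuestion`, the option wording "Do not know" and the sample question are hypothetical and do not come from the text.

```python
from dataclasses import dataclass
from typing import List

# Wording of the uncertainty option is an assumption made for illustration.
UNCERTAIN_OPTION = "Do not know"


@dataclass
class ClosedQuestion:
    """A fixed-alternative question: replies are limited to the stated options."""
    text: str
    alternatives: List[str]
    allow_uncertain: bool = True  # always offer a way to indicate uncertainty

    def options(self) -> List[str]:
        """Return the stated alternatives, plus the uncertainty option if allowed."""
        opts = list(self.alternatives)
        if self.allow_uncertain:
            opts.append(UNCERTAIN_OPTION)
        return opts

    def record(self, reply: str) -> str:
        """Accept a reply only if it is one of the stated alternatives."""
        if reply not in self.options():
            raise ValueError(f"{reply!r} is not one of the stated alternatives")
        return reply


# Hypothetical dichotomous (yes or no) question.
question = ClosedQuestion("Do you usually drink coffee?", ["Yes", "No"])
print(question.options())
```

A dichotomous question is simply the case with two alternatives; an open-ended question would need no `alternatives` list at all and is not covered by this sketch.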
COLLECTION OF DATA THROUGH SCHEDULES
This method of data collection is very much like the collection of data through questionnaire, the
main difference being that schedules (proformas containing a set of questions) are
filled in by enumerators who are specially appointed for the purpose
...
In certain
situations, schedules may be handed over to respondents and enumerators may help them in recording
their answers to various questions in the said schedules
...
This method requires the selection of enumerators for filling up schedules or assisting respondents
to fill up schedules and as such enumerators should be very carefully selected
...
Enumerators should be intelligent and must possess the capacity of cross-examination in order to find out the truth
...
This method of data collection is very useful in extensive enquiries and can lead to fairly reliable
results
...
Population census all over the world is conducted through this
method
...
There is much resemblance in the nature of these two methods and this fact has led many people
to remark that from a practical point of view, the two methods can be taken to be the same
...
The important points of difference
are as under:
1
...
The schedule
2
...
4
...
6
...
8
...
10
...
12
...
To collect data through questionnaire is relatively cheap and economical since we have to
spend money only in preparing the questionnaire and in mailing the same to respondents
...
To collect data through schedules is relatively more expensive
since a considerable amount of money has to be spent in appointing enumerators and in
imparting training to them
...
Non-response is usually high in case of questionnaire as many people do not respond and
many return the questionnaire without answering all questions
...
As against this, non-response is generally very low in case of
schedules because these are filled by enumerators who are able to get answers to all
questions
...
In case of questionnaire, it is not always clear as to who replies, but in case of schedule the
identity of respondent is known
...
Personal contact is generally not possible in case of the questionnaire method as
questionnaires are sent to respondents by post and are returned by them by post as well
...
Questionnaire method can be used only when respondents are literate and cooperative, but
in case of schedules the information can be gathered even when the respondents happen to
be illiterate
...
Risk of collecting incomplete and wrong information is relatively more under the questionnaire
method, particularly when people are unable to understand questions properly
...
As a result, the information collected through schedules is relatively more accurate
than that obtained through questionnaires
...
In order to attract the attention of respondents, the physical appearance of questionnaire
must be quite attractive, but this may not be so in case of schedules as they are to be filled
in by enumerators and not by respondents
...
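The point about non-response made in the list above covers two different things: people who do not return the questionnaire at all, and people who return it without answering all questions. A minimal sketch of how the two could be measured separately follows. The function names, the labels "unit" and "item" non-response, and all the figures are assumptions made for illustration, not taken from the text.

```python
from typing import Dict, List, Optional


def unit_non_response_rate(sent: int, returned: int) -> float:
    """Share of questionnaires sent out that were never returned."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    return (sent - returned) / sent


def item_non_response_rate(
    returned_forms: List[Dict[str, Optional[str]]], questions: List[str]
) -> float:
    """Share of questions left blank among the forms that were returned."""
    total = len(returned_forms) * len(questions)
    if total == 0:
        raise ValueError("need at least one returned form and one question")
    blanks = sum(
        1
        for form in returned_forms
        for q in questions
        if form.get(q) in (None, "")  # a missing or empty answer counts as blank
    )
    return blanks / total


# Hypothetical mail survey: 200 questionnaires posted, 80 returned.
print(unit_non_response_rate(sent=200, returned=80))

# Two hypothetical returned forms, each with one of three questions left blank.
forms = [
    {"q1": "Yes", "q2": "", "q3": "No"},
    {"q1": "No", "q2": "Yes"},
]
print(item_non_response_rate(forms, ["q1", "q2", "q3"]))
```

With schedules filled in by enumerators, both rates would be expected to be low, which is the contrast the text draws.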
SOME OTHER METHODS OF DATA COLLECTION
Let us consider some other methods of data collection, particularly used by big business houses in
modern times
...
Warranty cards: Warranty cards are usually postal sized cards which are used by dealers of
consumer durables to collect information regarding their products
...
2
...
Distributors get the retail stores audited
through salesmen and use such information to estimate market size, market share, seasonal purchasing
pattern and so on
...
For
instance, in case of a grocery store audit, a sample of stores is visited periodically and data are
recorded on inventories on hand either by observation or copying from store records
...
The principal advantage of this method is that it offers the
most efficient way of evaluating the effect on sales of variations of different techniques of in-store
promotion
...
Pantry audits: Pantry audit technique is used to estimate consumption of the basket of goods at
the consumer level
...
Thus in pantry audit data are recorded from the examination of
consumer’s pantry
...
Quite often, pantry audits are supplemented by direct questioning
relating to reasons and circumstances under which particular products were purchased in an attempt
to relate these factors to purchasing habits
...
An important limitation of pantry audit approach is that, at times, it may not be possible
to identify consumers’ preferences from the audit data alone, particularly when promotion devices
produce a marked rise in sales
...
Consumer panels: An extension of the pantry audit approach on a regular basis is known as
‘consumer panel’, where a set of consumers agree to maintain
detailed daily records of their consumption, which are made available to the investigator on demand
...
Mostly, consumer panels are of two types, viz
...
A transitory consumer panel is set up to measure the effect of
a particular phenomenon
...
Initial interviews
are conducted before the phenomenon takes place to record the attitude of the consumer
...
It is a favourite tool of advertising and
of social research
...
Such
panels have been used in the area of consumer expenditure, public opinion and radio and TV listenership
among others
...
The representativeness of the panel relative to
the population and the effect of panel membership on the information obtained are the two major
problems associated with the use of this method of data collection
...
Use of mechanical devices: Mechanical devices have been widely used to collect
information by indirect means
...
Eye cameras are designed to record the focus of eyes of a respondent on a specific portion of a
sketch or diagram or written material
...
Pupilometric cameras record dilation of the pupil as a result of a visual stimulus
...
Psychogalvanometer is used for measuring
the extent of body excitement as a result of the visual stimulus
...
Influence of packaging or the information given on the label would stimulate a buyer to perform
certain physical movements which can easily be recorded by a hidden motion picture camera in the
shop’s four walls
...
A device is fitted in the television instrument itself to record
these changes
...
6
...
In projective
techniques the respondent in supplying information tends unconsciously to project his own attitudes
or feelings on the subject under study
...
The use of these techniques requires intensive specialised training
...
The stimuli may
arouse many different kinds of reactions
...
The stimulus may be a photograph, a picture, an inkblot and so on
...
in the context of some pre-established psychological conceptualisation of what the
individual’s responses to the stimulus mean
...
(i) Word association tests: These tests are used to extract information regarding such words which
have maximum association
...
If the
interviewer says cold, the respondent may say hot, and the like
...
Analysis of the matching words supplied by the respondents
indicates whether the given word should be used for the contemplated purpose
...
A number of qualities of a product may be listed and informants may be asked to write
brand names possessing one or more of these
...
This technique is frequently used in advertising research
...
Under this, informant may be asked to complete a sentence (such as: persons who
wear Khadi are
...
Several
sentences of this type might be put to the informant on the same subject
...
This technique permits the
testing not only of words (as in case of word association tests), but of ideas as well and thus, helps in
developing hypotheses and in the construction of questionnaires
...
(iii) Story completion tests: Such tests are a step further wherein the researcher may contrive
stories instead of sentences and ask the informants to complete them
...
(iv) Verbal projection tests: These are the tests wherein the respondent is asked to comment on or
to explain what other people do
...
(v) Pictorial techniques: There are several pictorial techniques
...
A
...
): The TAT consists of a set of pictures (some of the
pictures deal with the ordinary day-to-day events while others may be ambiguous pictures
of unusual situations) that are shown to respondents who are asked to describe what they
think the pictures represent
...
(b) Rosenzweig test: This test uses a cartoon format wherein we have a series of cartoons
with words inserted in ‘balloons’ above
...
From what the respondents
write in this fashion, the study of their attitudes can be made
...
The design happens
to be symmetrical but meaningless
...
This test is frequently used, but its validity
still remains a major problem
...
H
...
This test consists of 45 inkblot cards (and not 10 inkblots
as we find in case of Rorschach Test) which are based on colour, movement, shading and
other factors involved in inkblot perception
...
Form responses are interpreted for knowing the accuracy (F) or
inaccuracy (F–) of respondent’s percepts; shading and colour for ascertaining his affectional
and emotional needs; and movement responses for assessing the dynamic aspects of his life
...
I
...
has several special features or advantages
...
Secondly, it facilitates studying
the responses of a respondent to different cards in the light of norms of each card instead of
lumping them together
...
There are some limitations of this test as well
...
This fact
emphasises that the test must be administered individually and a post-test inquiry must as well
be conducted for knowing the nature and sources of responses and this limits the scope of
HIT as a group test of personality
...
is still to be established
...
For
instance, Fisher and Cleveland in their approach for obtaining Barrier score of an individual’s
personality have developed a series of multiple choice items for 40 of HIT cards
...
Subject taking the test is to check the choice he likes most, make a different mark
against the one he likes least and leave the third choice blank
...
(e) Tomkins-Horn picture arrangement test: This test is designed for group administration
...
The respondent is asked to arrange them in
a sequence which he considers as reasonable
...
(vi) Play techniques: Under play techniques subjects are asked to improvise or act out a situation
in which they have been assigned various roles
...
These techniques have been used for
knowing the attitudes of younger ones through manipulation of dolls
...
The manner in
which children organise dolls would indicate their attitude towards the class of persons represented
by dolls
...
The choice of colour, form, words, the sense of orderliness and other reactions may provide opportunities
to infer deep-seated feelings
...
In this procedure both long and short questions are framed to
test through them the memorising and analytical ability of candidates
...
In an indirect way, sociometry attempts to describe attractions or repulsions between
1
S
...
Dass, “Personality Assessment Through Projective Movie Pictures”, p
...
individuals by asking them to indicate whom they would choose or reject in various situations
...
“Under this an
attempt is made to trace the flow of information amongst groups and then examine the ways in which
new ideas are diffused
...
”2 Sociograms
are charts that depict the sociometric choices
...
This approach
has been applied to the diffusion of ideas on drugs amongst medical practitioners
...
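The sociometric choices that a sociogram depicts can also be tabulated before any chart is drawn. The following is a minimal sketch, assuming each individual has simply named the persons he would choose in some situation. The names, the data and the function names `choices_received` and `mutual_pairs` are hypothetical and serve only to illustrate the idea.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical data: each person lists whom they would choose.
choices: Dict[str, List[str]] = {
    "Asha": ["Bina", "Chand"],
    "Bina": ["Asha"],
    "Chand": ["Bina"],
    "Dev": ["Bina", "Asha"],
}


def choices_received(choices: Dict[str, List[str]]) -> Counter:
    """Count how many times each person is chosen by others."""
    counts = Counter({person: 0 for person in choices})
    for chooser, chosen in choices.items():
        for person in chosen:
            if person != chooser:  # ignore self-choices
                counts[person] += 1
    return counts


def mutual_pairs(choices: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Return pairs of persons who chose each other."""
    pairs = set()
    for a, chosen in choices.items():
        for b in chosen:
            if a != b and a in choices.get(b, []):
                pairs.add(tuple(sorted((a, b))))
    return sorted(pairs)


print(choices_received(choices).most_common())
print(mutual_pairs(choices))
```

In this made-up group, the person chosen most often would sit at the centre of the sociogram and a person chosen by nobody at its edge. Rejections could be tabulated in the same way from a second set of lists.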
Depth interviews: Depth interviews are those interviews that are designed to discover underlying
motives and desires and are often used in motivational research
...
In other words, they aim to elicit unconscious as also
other types of material relating especially to personality dynamics and motivations
...
Unless the researcher has specialised training, depth interviewing should not be attempted
...
The difference
lies in the nature of the questions asked
...
Thus, for instance, the informant may be asked about his frequency of air travel and he might
again be asked at a later stage to narrate his opinion concerning the feelings of relatives of some
other man who gets killed in an airplane accident
...
If the depth interview involves questions of such type, the same may be
treated as projective depth interview
...
8
...
Content-analysis prior to 1940’s was mostly quantitative analysis of
documentary materials concerning certain characteristics that can be identified and counted
...
“The difference is somewhat like that between a casual interview and
depth interviewing
...
the latter type of content-analysis
...
Content analysis measures
pervasiveness and that is sometimes an index of the intensity of the force
...
A review of research in any area, for instance, involves the analysis
of the contents of research articles that have been published
...
It is at a simple level when we pursue it on the basis of certain
characteristics of the document or verbal materials that can be identified and counted (such as on the
basis of major scientific concepts in a book)
...
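Content-analysis at the simple level described above, where characteristics of a document are identified and counted, can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `count_concepts`, the sample passage and the list of concepts are hypothetical, and only single-word concepts are handled.

```python
import re
from collections import Counter
from typing import Dict, List


def count_concepts(text: str, concepts: List[str]) -> Dict[str, int]:
    """Count how often each single-word concept appears in the text."""
    words = re.findall(r"[a-z]+", text.lower())  # crude tokenisation, letters only
    counts = Counter(words)
    return {concept: counts[concept.lower()] for concept in concepts}


# Hypothetical passage and concept list.
passage = (
    "Sampling error falls as the sample grows. "
    "Sampling design matters as much as sample size."
)
print(count_concepts(passage, ["sampling", "error", "design"]))
```

Such counts measure how pervasive a concept is in the material. They say nothing by themselves about the meaning or intent behind the words, which is where the deeper kind of content-analysis begins.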
2
G
...
Giles, Marketing, p
...
Carter V
...
Scates, Methods of Research, p
...
4
Ibid
...
670
...
e
...
When the researcher utilises secondary data, then he
has to look into various sources from where he can obtain them
...
Secondary
data may either be published data or unpublished data
...
; (f) reports prepared by research
scholars, universities, economists, etc
...
The sources of unpublished data are many;
they may be found in diaries, letters, unpublished biographies and autobiographies and also may be
available with scholars and research workers, trade associations, labour bureaus and other public/
private individuals and organisations
...
He must make a minute scrutiny
because it is just possible that the secondary data may be unsuitable or may be inadequate in the
context of the problem which the researcher wants to study
...
A
...
Bowley
very aptly observes that it is never safe to take published statistics at their face value without knowing
their meaning and limitations and it is always necessary to criticise arguments that can be based on
them
...
Reliability of data: The reliability can be tested by finding out such things about the said data:
(a) Who collected the data? (b) What were the sources of data? (c) Were they collected by using
proper methods? (d) At what time were they collected? (e) Was there any bias of the compiler?
(f) What level of accuracy was desired? Was it achieved?
2
...
Hence, if the available data are found to be unsuitable, they should not be
used by the researcher
...
Similarly, the object, scope and nature of the original enquiry must also be studied
...
3
...
The data will also be considered inadequate, if they are related to an area which may be either
narrower or wider than the area of the present enquiry
...
The already
available data should be used by the researcher only when he finds them reliable, suitable and
adequate
...
At times, there may be a wealth of
usable information in the already available data which must be used by an intelligent researcher but
with due precaution
...
As such the researcher must judiciously select
the method/methods for his own study, keeping in view the following factors:
1
...
The method selected should be such that it suits the type of enquiry
that is to be conducted by the researcher
...
2
...
When funds at the disposal of the researcher are
very limited, he will have to select a comparatively cheaper method which may not be as efficient
and effective as some other costly method
...
3
...
Some methods take relatively more time, whereas with others the data can be
collected in a comparatively shorter duration
...
4
...
But one must always remember that each method of data collection has its uses and none is
superior in all situations
...
In case funds permit
and more information is desired, personal interview method may be said to be relatively better
...
When funds are ample, time is also
ample and much information with no precision is to be collected, then either personal interview or the
mail-questionnaire or the joint use of these two methods may be taken as an appropriate method of
collecting data
...
The secondary data may be used in case the researcher finds them reliable, adequate
and appropriate for his research
...
Such techniques are of immense value in case the reason is
obtainable from the respondent who knows the reason but does not want to admit it or the reason
relates to some underlying psychological attitude and the respondent is not aware of it
...
Since projective
techniques are as yet in an early stage of development and with the validity of many of them remaining
an open question, it is usually considered better to rely on the straightforward statistical methods
with only supplementary use of projective techniques
...
Thus, the most desirable approach with regard to the selection of the method depends on the
nature of the particular problem and on the time and resources (money and personnel) available
along with the desired degree of accuracy
...
Dr
...
L
...
”
CASE STUDY METHOD
Meaning: The case study method is a very popular form of qualitative analysis and involves a
careful and complete observation of a social unit, be that unit a person, a family, an institution, a
cultural group or even the entire community
...
The
case study places more emphasis on the full analysis of a limited number of events or conditions and
their interrelations
...
Thus, case study is essentially an intensive investigation of the particular unit under consideration
...
According to H
...
”5 Thus, a fairly exhaustive study of a person (as to what he does and has
done, what he thinks he does and had done and what he expects to do and says he ought to do) or
group is called a life or case history
...
”6 Pauline V
...
”7 In brief, we can say
that the case study method is a form of qualitative analysis wherein careful and complete observation of
an individual or a situation or an institution is done; efforts are made to study each and every aspect
of the unit concerned in minute detail and then from case data generalisations and inferences are
drawn
...
Under this method the researcher can take one single social unit or more of such units for
his study purpose; he may even take a situation to study the same comprehensively
...
Here the selected unit is studied intensively i
...
, it is studied in minute detail
...
5
H
...
229
...
26 in Georges Gurvitch and W
...
Moore (Eds
...
7
Pauline V
...
247
...
In the context of this method we make a complete study of the social unit covering all facets
...
4. Under this method the approach happens to be qualitative and not quantitative
...
Every possible effort is made to collect information
concerning all aspects of life
...
For instance, under this method we not only study how many crimes
a man has committed but also look into the factors that forced him to commit crimes when we
are making a case study of a man as a criminal
...
5
...
6
...
7
...
In its
absence, generalised social science may get handicapped
...
The credit for introducing this method to the field of social investigation goes
to Frederic Le Play who used it as a hand-maiden to statistics in his studies of family budgets
...
Dr
...
Similarly, anthropologists, historians,
novelists and dramatists have used this method concerning problems pertaining to their areas of
interests
...
In brief, case study method is being used in several disciplines
...
Assumptions: The case study method is based on several assumptions
...
(ii) The assumption of studying the natural history of the unit concerned
...
Major phases involved: Major phases involved in case study are as follows:
(i) Recognition and determination of the status of the phenomenon to be investigated or the
unit of attention
...
(iii) Diagnosis and identification of causal factors as a basis for remedial or developmental
treatment
...
e
...
(v) Follow-up programme to determine effectiveness of the treatment applied
...
Mention may be made here of the important advantages
...
In the words of Charles Horton Cooley,
“case study deepens our perception and gives us a clearer insight into life…
...
”
(ii) Through case study a researcher can obtain a real and enlightened record of personal
experiences which would reveal man’s inner strivings, tensions and motivations that drive
him to action along with the forces that direct him to adopt a certain pattern of behaviour
...
(iv) It helps in formulating relevant hypotheses along with the data which may be helpful in
testing them
...
(v) The method facilitates intensive study of social units which is generally not possible if we
use either the observation method or the method of collecting information through schedules
...
(vi) Information collected under the case study method greatly helps the researcher in the task
of constructing an appropriate questionnaire or schedule, since that task requires thorough
knowledge of the universe concerned
...
In other words, the use of different
methods such as depth interviews, questionnaires, documents, study reports of individuals,
letters, and the like is possible under case study method
...
This is the reason why at times the case study
method is alternatively known as “mode of organising data”
...
Besides, it is also a technique to suggest measures for improvement
in the context of the present environment of the concerned social units
...
(xi) Case study method enhances the experience of the researcher and this in turn increases
his analysing ability and skill
...
On account of the minute study of
the different facets of a social unit, the researcher can well understand the social change
then and now
...
In fact, it may be considered the gateway to and at the
same time the final destination of abstract knowledge
...
They
are also of immense value in taking decisions regarding several management problems
...
Limitations: Important limitations of the case study method may as well be highlighted
...
Since the subject under case study tells history in his own words,
logical concepts and units of scientific classification have to be read into it or out of it by the
investigator
...
”8 Real information is often not collected because the subjectivity of the
researcher does enter in the collection of information in a case study
...
(iv) It consumes more time and requires a lot of expenditure
...
(v) The case data are often vitiated because the subject, according to Read Bain, may write
what he thinks the investigator wants; and the greater the rapport, the more subjective the
whole process is
...
(vii) Case study method can be used only in a limited sphere
...
Sampling is also not possible under a case study method
...
He often
thinks that he has full knowledge of the unit and can himself answer about it
...
In fact, this is more the fault of the researcher
than that of the case method
...
Most of the limitations can be removed if researchers are always
conscious of these and are well trained in the modern methods of collecting case data and in the
scientific techniques of assembling, classifying and processing the same
...
Possibly, this is also the reason why case studies are becoming popular day by day
...
Enumerate the different methods of collecting data
...
8
Pauline V
...
262
...
“It is never safe to take published statistics at their face value without knowing their meaning and
limitations
...
Illustrate your answer by examples wherever possible
...
Examine the merits and limitations of the observation method in collecting material
...
4
...
5
...
6
...
7
...
8
...
9
...
10
...
(ii) Data collection through projective techniques is considered relatively more reliable
...
11
...
Explain fully the survey method of research
...
Phi
...
1987 Raj
...
]
12
...
” Discuss, what are the problems in
the introduction of this research design in business organisation?
[M
...
A
...
1985 Raj
...
]
Research Methodology
Appendix (i)
Guidelines for Constructing
Questionnaire/Schedule
The researcher must pay attention to the following points in constructing an appropriate and effective
questionnaire or a schedule:
1
...
He must be clear about the various aspects
of his research problem to be dealt with in the course of his research project
...
Appropriate form of questions depends on the nature of information sought, the sampled
respondents and the kind of analysis intended
...
Questions should be simple and must be constructed with a
view to their forming a logical part of a well thought out tabulation plan
...
3
...
Questionnaires or schedules previously drafted (if available)
may as well be looked into at this stage
...
Researcher must invariably re-examine, and in case of need may revise the rough draft for
a better one
...
5
...
The questionnaire may
be edited in the light of the results of the pilot study
...
Questionnaire must contain simple but straightforward directions for the respondents so
that they may not feel any difficulty in answering the questions
...
However, the following points may be kept in
view by an interviewer for eliciting the desired information:
1
...
He must choose a suitable time and place so that the interviewee may be at ease during the
interview period
...
2
...
Initially friendly greetings in
accordance with the cultural pattern of the interviewee should be exchanged and then the
purpose of the interview should be explained
...
All possible effort should be made to establish proper rapport with the interviewee; people
are motivated to communicate when the atmosphere is favourable
...
Interviewer must know that ability to listen with understanding, respect and curiosity is the
gateway to communication, and hence must act accordingly during the interview
...
5
...
But the interviewer must
control the course of the interview in accordance with the objective of the study
...
In case of big enquiries, where the task of collecting information is to be accomplished by
several interviewers, there should be an interview guide to be observed by all so as to
ensure reasonable uniformity in respect of all salient points in the study
...
(ii) Survey-type research studies usually have larger samples because the percentage of
responses generally happens to be low, as low as 20 to 30%, especially in mailed questionnaire
studies
...
As against this, experimental studies generally
need small samples
...
The researcher does not manipulate the variable or arrange for
events to happen
...
They are primarily concerned with the present but at times do consider
past events and influences as they relate to current conditions
...
Experimental research provides a systematic and logical method for answering the question,
“What will happen if this is done when certain variables are carefully controlled or
manipulated?” In fact, deliberate manipulation is a part of the experimental method
...
(iv) Surveys are usually appropriate in case of social and behavioural sciences (because many
types of behaviour that interest the researcher cannot be arranged in a realistic setting)
whereas experiments are mostly an essential feature of physical and natural sciences
...
Appendix (iii): Difference Between Survey and Experiment
(vi) Surveys are concerned with hypothesis formulation and testing the analysis of the relationship
between non-manipulated variables
...
After experimenters define a problem, they propose a hypothesis
...
The confirmation or rejection is always stated in terms of probability
rather than certainty
...
The ultimate
purpose of experimentation is to generalise the variable relationships so that they may be
applied outside the laboratory to a wider population of interest
...
They may also be classified as social
surveys, economic surveys or public opinion surveys
...
Case study method can as well be used
...
(viii) In case of surveys, research design must be rigid, must make enough provision for protection
against bias and must maximise reliability as the aim happens to be to obtain complete and
accurate information
...
(ix) Possible relationships between the data and the unknowns in the universe can be studied
through surveys whereas experiments are meant to determine such relationships
...
*
John W
...
Kahn, “Research in Education”, 5th ed
...
Ltd
...
111
...
This is essential for a scientific study and
for ensuring that we have all relevant data for making contemplated comparisons and analysis
...
The term analysis refers to the computation of certain
measures along with searching for patterns of relationship that exist among data-groups
...
1 But there are persons (Selltiz, Jahoda and others) who do
not like to make a distinction between processing and analysis
...
We, however, shall prefer to observe the difference between the two terms as
stated here in order to understand their implications more clearly
...
1
...
As a matter of fact, editing involves
a careful scrutiny of the completed questionnaires and/or schedules
...
With regard to points or stages at which editing should be done, one can talk of field editing and
central editing
...
B
...
44
...
This type of editing is necessary in view of the
fact that individual writing styles often can be difficult for others to decipher
...
While doing field editing, the investigator must restrain himself and must not correct errors of omission
by simply guessing what the informant would have said if the question had been asked
...
This type of editing implies that all forms should get a thorough editing by a single editor
in a small study and by a team of editors in case of a large inquiry
...
In case of inappropriate or missing replies, the editor can sometimes
determine the proper answer by reviewing the other information in the schedule
...
The editor must strike out the answer if the same is
inappropriate and he has no basis for determining the correct answer or the response
...
All the wrong replies, which are quite obvious, must be
dropped from the final results, especially in the context of mail surveys
...
(b) While crossing out an original entry for one reason or another, they
should just draw a single line on it so that the same may remain legible
...
(d) They should
initial all answers which they change or supply
...
2
...
Such classes should be appropriate
to the research problem under consideration
...
e
...
Another rule to be observed is that of unidimensionality by which is meant that every class is defined
in terms of only one concept
...
Coding decisions
should usually be taken at the designing stage of the questionnaire
...
But in case of hand coding some standard
method may be used
...
The
other method can be to transcribe the data from the questionnaire to a coding sheet
...
3
...
This fact necessitates classification
of data which happens to be the process of arranging data in groups or classes on the basis of
common characteristics
...
Classification can be one of the
following two types, depending upon the nature of the phenomenon involved:
(a) Classification according to attributes: As stated above, data are classified on the basis
of common characteristics which can either be descriptive (such as literacy, sex, honesty,
etc
...
Descriptive characteristics refer
to qualitative phenomenon which cannot be measured quantitatively; only their presence or
absence in an individual item can be noticed
...
Such classification can be simple classification or manifold classification
...
But in manifold classification we consider
two or more attributes simultaneously, and divide the data into a number of classes (total
number of classes of final order is given by $2^n$, where n = number of attributes considered)
...
(b) Classification according to class-intervals: Unlike descriptive characteristics, the
numerical characteristics refer to quantitative phenomenon which can be measured through
some statistical units
...
come under this
category
...
For instance, persons whose incomes, say, are within Rs 201 to Rs 400 can
form one group, those whose incomes are within Rs 401 to Rs 600 can form another group
and so on
...
’ Each group of class-interval, thus, has an upper
limit as well as a lower limit which are known as class limits
...
We may have classes with equal class
magnitudes or with unequal class magnitudes
...
All the classes or groups, with their
respective frequencies taken together and put in the form of a table, are described as group
frequency distribution or simply frequency distribution
...
The decision
about this calls for skill and experience of the researcher
...
Typically, we may have 5 to 15 classes
...
Hence the
*
Classes of the final order are those classes developed on the basis of ‘n’ attributes considered
...
, class AB, class Ab, class aB, and class ab
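As a small illustration of the count of final-order classes mentioned in this footnote, the following Python sketch enumerates them for n attributes. The function name `final_order_classes` and the labelling convention (upper-case letter for presence of an attribute, lower-case letter for its absence) are assumptions made for this sketch, chosen to match the AB, Ab, aB, ab labels used here.

```python
from itertools import product


def final_order_classes(attributes):
    """Return every class of the final order for the given attribute labels.

    Each attribute is either present (upper-case label) or absent
    (lower-case label), so n attributes yield 2**n classes.
    """
    choices = [(a.upper(), a.lower()) for a in attributes]
    return ["".join(combo) for combo in product(*choices)]


classes = final_order_classes(["A", "B"])
print(classes)       # ['AB', 'Ab', 'aB', 'ab']
print(len(classes))  # 4, i.e. 2**2
```

With three attributes the same call would return eight classes, in line with the $2^n$ rule.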
...
Multiples
of 2, 5 and 10 are generally preferred while determining class magnitudes
...
A
...
3 log N)
where
i = size of class interval;
R = Range (i.e., difference between the values of the largest item and smallest item
among the given items);
N = Number of items to be grouped
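The formula itself is truncated in this extract. The surviving fragment and symbol list are consistent with the widely used Sturges' rule, i = R / (1 + 3.3 log N), but that identification is an assumption on our part. Under that assumption, a minimal Python sketch of the calculation could look as follows (the function name is hypothetical, and the logarithm is taken to base 10):

```python
import math


def sturges_class_interval(values):
    """Suggested class-interval size, assuming i = R / (1 + 3.3 * log10(N))."""
    n = len(values)                          # N: number of items to be grouped
    if n < 2:
        raise ValueError("need at least two items to form classes")
    value_range = max(values) - min(values)  # R: largest item minus smallest item
    return value_range / (1 + 3.3 * math.log10(n))


# 100 items running from 1 to 100: R = 99, log10(N) = 2, so i = 99 / 7.6
print(round(sturges_class_interval(list(range(1, 101))), 2))  # 13.03
```

In practice the result would normally be rounded to a convenient figure before the class limits are fixed.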
...
Such intervals may be expressed like under Rs 500 or Rs 10001 and over
...
The researcher must always remain conscious of this fact
while deciding the issue of the total number of class intervals in which the data are to be classified
...
Consistent with this, the class limits should be located at multiples of 2, 5, 10, 20, 100
and such other figures
...
For example, an item whose
value is exactly 30 would be put in 30–40 class interval and not in 20–30 class interval
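The boundary rule in this example can be sketched in Python. The helper below is only an illustration: the name `exclusive_class`, the class width of 10 and the starting point of 0 are assumptions, not something fixed by the text.

```python
from collections import Counter


def exclusive_class(value, lower_start=0, width=10):
    """Return the (lower, upper) exclusive-type class that contains value.

    The lower limit is included and the upper limit is excluded, so an
    item equal to a class boundary falls in the higher class.
    """
    offset = (value - lower_start) // width
    lower = lower_start + offset * width
    return lower, lower + width


print(exclusive_class(30))  # (30, 40)
print(exclusive_class(29))  # (20, 30)

# Frequency of each class for a few illustrative (made-up) item values
items = [12, 30, 25, 39, 40, 18]
frequencies = Counter(exclusive_class(x) for x in items)
print(frequencies[(30, 40)])  # 2
```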
...
Inclusive type class intervals: They are usually stated as follows:
11–20
21–30
31–40
41–50
In inclusive type class intervals the upper limit of a class interval is also included in the
concerning class interval
...
The stated upper limit of the class interval 11–20 is 20 but the real limit is
20
...
When the phenomenon under consideration happens to be a discrete one (i.e., can be measured
and stated only in integers), then we should adopt inclusive type classification
...
*
(iii) How to determine the frequency of each class?
This can be done either by tally sheets or by mechanical aids
...
The general practice is that after every four small
vertical lines in a class group, the fifth line for the item falling in the same group, is
indicated as horizontal line through the said four lines and the resulting flower (IIII)
represents five items
...
An illustrative tally sheet can be shown as under:
Table 7
...
e
...
, sorting machines that are available for the
purpose
...
*
The stated limits of class intervals are different from true limits
...
There are machines which can sort out cards at a speed of something like 25,000 cards per hour
...
4
...
This procedure is referred to as
tabulation
...
e
...
In a broader sense, tabulation is an
orderly arrangement of data in columns and rows
...
1
...
3
...
It conserves space and reduces explanatory and descriptive statement to a minimum
...
It facilitates the summation of items and the detection of errors and omissions
...
Tabulation can be done by hand or by mechanical or electronic devices
...
In relatively large inquiries, we may use mechanical or computer tabulation
if other factors are favourable and necessary facilities are available
...
Hand tabulation may be done using the direct tally, the list and tally or the card
sort and count methods
...
Under this method, the codes are written on a sheet of paper, called tally sheet, and for
each response a stroke is marked against the code in which it falls
...
These groups of five are easy to count and the data are sorted against each code
conveniently
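The direct tally described here can be imitated in a few lines of Python. This is only a sketch: the function name `tally_sheet`, the use of `||||/` to stand for a struck-through group of five, and the sample codes are all assumptions made for illustration.

```python
from collections import Counter


def tally_sheet(codes):
    """Build a tally sheet: one entry per code, marked in groups of five."""
    counts = Counter(codes)
    sheet = {}
    for code in sorted(counts):
        fives, rest = divmod(counts[code], 5)
        groups = ["||||/"] * fives + (["|" * rest] if rest else [])
        sheet[code] = (" ".join(groups), counts[code])
    return sheet


# Coded responses to one question (made-up data)
responses = [1, 2, 1, 1, 3, 1, 1, 1, 2]
for code, (marks, frequency) in tally_sheet(responses).items():
    print(code, marks, frequency)
```

For the sample data, code 1 occurs six times, so its row shows one complete group of five followed by a single stroke.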
...
This way a large number of questionnaires can be listed on
one work sheet
...
The card sorting method is the most flexible
hand tabulation
...
Each hole stands for a code and when cards are stacked, a needle passes
through particular hole representing a particular code
...
In this way frequencies of various codes can be found out by the repetition of this technique
...
Tabulation may also be classified as simple and complex tabulation
...
Simple tabulation generally results
in one-way tables which supply answers to questions about one characteristic of data only
...
Two-way tables, three-way tables or
manifold tables are all examples of what is sometimes described as cross tabulation
...
Every table should have a clear, concise and adequate title so as to make the table intelligible
without reference to the text and this title should always be placed just above the body of
the table
...
Every table should be given a distinct number to facilitate easy reference
...
The column headings (captions) and the row headings (stubs) of the table should be clear
and brief
...
The units of measurement under each heading or sub-heading must always be indicated
...
Explanatory footnotes, if any, concerning the table should be placed directly beneath the
table, along with the reference symbols used in the table
...
Source or sources from where the data in the table have been obtained must be indicated
just below the table
...
Usually the columns are separated from one another by lines which make the table more
readable and attractive
...
8
...
9
...
10
...
Similarly,
percentages and/or averages must also be kept close to the data
...
It is generally considered better to approximate figures before tabulation as the same would
reduce unnecessary details in the table itself
...
In order to emphasise the relative significance of certain categories, different kinds of type,
spacing and indentations may be used
...
It is important that all column figures be properly aligned
...
14
...
15
...
16
...
If the data happen
to be very large, they should not be crowded in a single table for that would make the table
unwieldy and inconvenient
...
Total of rows should normally be placed in the extreme right column and that of columns
should be placed at the bottom
...
Processing and Analysis of Data
18
...
Above all, the table must suit the needs
and requirements of an investigation
...
One category of such
responses may be ‘Don’t Know Response’ or simply DK response
...
But when it is relatively big, it becomes a matter of major concern
in which case the question arises: Is the question which elicited DK response useless? The answer
depends on two points viz
...
In the first case the concerned question is said to be
alright and DK response is taken as legitimate DK response
...
How are DK responses to be dealt with by researchers? The best way is to design better types of
questions
...
But what about the DK responses that have already taken place? One way to tackle this issue is to
estimate the allocation of DK answers from other data in the questionnaire
...
Yet another way is to assume that DK responses occur more or less randomly and as such
we may distribute them among the other answers in the ratio in which the latter have occurred
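The proportional allocation just described can be sketched in Python. The function name, the `"DK"` label and the figures are illustrative assumptions; the only thing taken from the text is the rule that DK answers are shared out in the ratio in which the other answers occurred.

```python
def redistribute_dk(counts, dk_label="DK"):
    """Spread DK responses over the other answers in proportion to
    the frequencies with which those answers occurred."""
    dk = counts.get(dk_label, 0)
    others = {k: v for k, v in counts.items() if k != dk_label}
    total = sum(others.values())
    if total == 0:
        raise ValueError("no substantive answers to allocate DK responses to")
    return {k: v + dk * v / total for k, v in others.items()}


adjusted = redistribute_dk({"Yes": 60, "No": 20, "DK": 20})
print(adjusted)  # {'Yes': 75.0, 'No': 25.0}
```

Here the 20 DK answers are split 3 : 1 between "Yes" and "No", the ratio in which those two answers were originally given. This is only defensible under the stated assumption that DK responses occur more or less randomly.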
...
(b) Use of percentages: Percentages are often used in data presentation for they simplify numbers,
reducing all of them to a 0 to 100 range
...
While using
percentages, the following rules should be kept in view by researchers:
1
...
2
...
3
...
If this is not kept in view,
the real differences may not be correctly read
...
Percentage decreases can never exceed 100 per cent and as such for calculating the
percentage of decrease, the higher figure should invariably be taken as the base
...
Percentages should generally be worked out in the direction of the causal-factor in case of
two-dimension tables and for this purpose we must select the more significant factor out of
the two given factors as the causal factor
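The rule on percentage decreases stated above can be illustrated with a short Python sketch. The function name and the figures are assumptions for illustration, and the figures are assumed to be non-negative.

```python
def percentage_decrease(old, new):
    """Percentage fall from old to new, taking the higher (old) figure as base."""
    if old <= 0 or new < 0:
        raise ValueError("figures must be non-negative and the base positive")
    if new > old:
        raise ValueError("new figure exceeds old figure: this is an increase")
    return (old - new) / old * 100


print(percentage_decrease(200, 50))  # 75.0
print(percentage_decrease(200, 0))   # 100.0, the largest possible decrease
```

Because the higher figure is the base, the result cannot exceed 100 per cent.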
...
Analysis, particularly in case
of survey or experimental data, involves estimating the values of unknown parameters of the population
and testing of hypotheses for drawing inferences
...
“Descriptive
analysis is largely the study of distributions of one variable
...
Composition, efficiency, preferences, etc
...
this sort of analysis may be in respect of one
variable (described as unidimensional analysis), or in respect of two variables (described as bivariate
analysis) or in respect of more than two variables (described as multivariate analysis)
...
We may as well talk of correlation analysis and causal analysis
...
Causal analysis is concerned with the study of how one or more variables affect
changes in another variable
...
This analysis can be termed as regression analysis
...
In modern times, with the availability of computer facilities, there has been a rapid development
of multivariate analysis which may be defined as “all statistical methods which simultaneously
analyse more than two variables on a sample of observations”3
...
The objective of
this analysis is to make a prediction about the dependent variable based on its covariance with all the
concerned independent variables
...
The object of this analysis happens to be to predict an entity’s possibility of
belonging to a particular group based on several predictor variables
...
(d) Canonical analysis: This analysis can be used in case of both measurable and non-measurable
variables for the purpose of simultaneously predicting a set of dependent variables from their joint
covariance with a set of independent variables
...
William Emory, Business Research Methods, p
...
Jagdish N
...
35, No
...
1971), pp
...
*
Readers are referred to standard texts for more details about these analyses
...
It
is also concerned with the estimation of population values
...
e
...
STATISTICS IN RESEARCH
The role of statistics in research is to function as a tool in designing research, analysing its data and
drawing conclusions therefrom
...
Clearly the science of statistics cannot be ignored by any research worker, even though he may not
have occasion to use statistical methods in all their details and ramifications
...
Only after this
we can adopt the process of generalisation from small groups (i.e., samples) to population
...
, descriptive statistics and inferential statistics
...
Inferential statistics are also known as sampling statistics
and are mainly concerned with two major types of problems: (i) the estimation of population parameters,
and (ii) the testing of statistical hypotheses
...
Amongst the measures of central tendency, the three most important ones are the arithmetic
average or mean, median and mode
...
Among the measures of dispersion, variance and its square root, the standard deviation,
are the most often used measures
...
are also
used
...
In respect of the measures of skewness and kurtosis, we mostly use the first measure of skewness
based on mean and mode or on mean and median
...
Kurtosis is also used to measure the
peakedness of the curve of the frequency distribution
...
Multiple correlation coefficient, partial correlation coefficient, regression
analysis, etc
...
Index numbers, analysis of time series, coefficient of contingency, etc
...
We give below a brief outline of some important measures (out of the above listed measures)
often used in the context of research studies
...
MEASURES OF CENTRAL TENDENCY
Measures of central tendency (or statistical averages) tell us the point about which items have a
tendency to cluster
...
Measure of central tendency is also known as statistical average
...
Mean, also known as arithmetic average, is the most common
measure of central tendency and may be defined as the value which we get by dividing the total of
the values of various given items in a series by the total number of items
...
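Mean (or $\bar{X}$)* = $\dfrac{\sum X_i}{n} = \dfrac{X_1 + X_2 + \dots + X_n}{n}$
where $\bar{X}$ = The symbol we use for mean (pronounced as X bar)
$\sum$ = Symbol for summation
$X_i$ = Value of the ith item X, i = 1, 2, …, n
n = total number of items
In case of a frequency distribution, we can work out mean in this way:
$\bar{X} = \dfrac{\sum f_i X_i}{\sum f_i} = \dfrac{f_1 X_1 + f_2 X_2 + \dots + f_n X_n}{f_1 + f_2 + \dots + f_n}$, where $\sum f_i = n$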
Sometimes, instead of calculating the simple mean, as stated above, we may work out the weighted
mean for a realistic average
...
Its chief
use consists in summarising the essential features of a series and in enabling data to be compared
...
It is a relatively
stable measure of central tendency
...
, it is unduly affected by
extreme items; it may not coincide with the actual value of an item in a series, and it may lead to
wrong impressions, particularly when the item values are not given with the average
...
Median is the value of the middle item of a series when it is arranged in ascending or descending
order of magnitude
...
If the values of the items arranged
in the ascending order are: 60, 74, 80, 90, 95, 100, then the value of the 4th item viz
...
*
If we use assumed average A, then mean would be worked out as under:
$\bar{X} = A + \dfrac{\sum (X_i - A)}{n}$ or $\bar{X} = A + \dfrac{\sum f_i (X_i - A)}{\sum f_i}$
This is also known as short cut method of finding $\bar{X}$
Median (M) = Value of $\left(\dfrac{n + 1}{2}\right)$th item
Median is a positional average and is used only in the context of qualitative phenomena, for
example, in estimating intelligence, etc
...
Median is
not useful where items need to be assigned relative importance and weights
...
Mode is the most commonly or frequently occurring value in a series
...
In general, mode is the size of the item
which has the maximum frequency, but at times such an item may not be the mode on account of the
effect of the frequencies of the neighbouring items
...
it is, therefore, useful in all situations where we want to
eliminate the effect of extreme variations
...
For example, a manufacturer of shoes is usually interested in finding out the size most in demand so
that he may manufacture a larger quantity of that size
...
but there are certain limitations of
mode as well
...
It is considered unsuitable in
cases where we want to give relative importance to items under consideration
...
It is defined as the nth root of the
product of the values of n items in a given series
...
G.M. = $\sqrt[n]{X_1 \cdot X_2 \cdot X_3 \cdots X_n}$
where
G.M. = geometric mean,
n = number of items
...
M
...
6
...
e
...
Harmonic mean is defined as the reciprocal of the average of reciprocals of the values of items
of a series
...
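H.M. = Rec. $\dfrac{\text{Rec. } X_1 + \text{Rec. } X_2 + \dots + \text{Rec. } X_n}{n}$
where
H.M. = Harmonic mean
Rec. = Reciprocal
$X_i$ = ith value of the variable X
n = number of items
For instance, the harmonic mean of the numbers 4, 5, and 10 is worked out as
H.M. = Rec. $\dfrac{1/4 + 1/5 + 1/10}{3}$ = Rec. $\dfrac{\frac{15 + 12 + 6}{60}}{3}$ = Rec. $\left(\dfrac{33}{60} \times \dfrac{1}{3}\right) = \dfrac{60}{11} = 5.45$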
The harmonic mean gives largest weight to the smallest item and smallest weight to the largest item
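The averages defined in this section can be checked numerically. The Python sketch below implements each definition directly; the function names and the small data sets are illustrative assumptions, and `math.prod` requires Python 3.8 or later.

```python
import math
import statistics


def arithmetic_mean(xs):
    """Sum of the item values divided by the number of items."""
    return sum(xs) / len(xs)


def frequency_mean(values, freqs):
    """Mean of a frequency distribution: sum(f_i * X_i) / sum(f_i)."""
    return sum(f * x for x, f in zip(values, freqs)) / sum(freqs)


def geometric_mean(xs):
    """nth root of the product of n positive item values."""
    return math.prod(xs) ** (1 / len(xs))


def harmonic_mean(xs):
    """Reciprocal of the average of the reciprocals of the item values."""
    return 1 / (sum(1 / x for x in xs) / len(xs))


items = [4, 5, 10]
print(arithmetic_mean(items))
print(statistics.median(items))           # 5, the middle item of the ordered series
print(geometric_mean(items))
print(round(harmonic_mean(items), 2))     # 5.45, as in the worked example above
print(frequency_mean([10, 20, 30], [1, 2, 1]))  # 20.0
```

The standard library's `statistics` module also offers `mean`, `mode`, `geometric_mean` and `harmonic_mean`, which could be used instead of the hand-written helpers.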
...
From what has been stated above, we can say that there are several types of statistical averages
...
There are no hard and fast rules for the
selection of a particular average in statistical analysis for the selection of an average mostly depends
on the nature, type of objectives of the research study
...
The chief characteristics and the limitations of the
various averages must be kept in view; discriminate use of average is very essential for sound
statistical analysis
...
Specially it fails to give any idea about the scatter of
the values of items of a variable in the series around the true value of average
...
Important measures of
dispersion are (a) range, (b) mean deviation, and (c) standard deviation
...
Thus,
Range = (Highest value of an item in a series) − (Lowest value of an item in a series)
The utility of range is that it gives an idea of the variability very quickly, but the drawback is that
range is affected very greatly by fluctuations of sampling
...
As such, range is mostly used as a rough measure of variability and
is not considered as an appropriate measure in serious research studies
...
Such a difference is technically described as deviation
...
Mean
deviation is, thus, obtained as under:
Mean deviation from mean $(\delta_{\bar{X}}) = \dfrac{\sum |X_i - \bar{X}|}{n}$, if deviations, $|X_i - \bar{X}|$, are obtained from
arithmetic average
or
Mean deviation from median $(\delta_m) = \dfrac{\sum |X_i - M|}{n}$, if deviations, $|X_i - M|$, are obtained from
median
...
where
δ = Symbol for mean deviation (pronounced as delta);
Xi = ith values of the variable X;
n = number of items;
$\bar{X}$ = Arithmetic average;
M = Median;
Z = Mode
...
Coefficient of mean deviation
is a relative measure of dispersion and is comparable to similar measure of other series
...
It is a better measure of variability than range as it takes into consideration the values of
all items of a series
...
(c) Standard deviation is the most widely used measure of dispersion of a series and is commonly
denoted by the symbol ‘σ’ (pronounced as sigma)
...
It is worked out as under:
Standard deviation* $(\sigma) = \sqrt{\dfrac{\sum (X_i - \bar{X})^2}{n}}$
Or
Standard deviation $(\sigma) = \sqrt{\dfrac{\sum f_i (X_i - \bar{X})^2}{\sum f_i}}$, in case of frequency distribution
where $f_i$ means the frequency of the ith item
*
If we use assumed average, A, in place of $\bar{X}$ while finding deviations, then standard deviation would be worked out as
under:
$\sigma = \sqrt{\dfrac{\sum (X_i - A)^2}{n} - \left(\dfrac{\sum (X_i - A)}{n}\right)^2}$
Or
$\sigma = \sqrt{\dfrac{\sum f_i (X_i - A)^2}{\sum f_i} - \left(\dfrac{\sum f_i (X_i - A)}{\sum f_i}\right)^2}$, in case of frequency distribution
...
When this coefficient of standard deviation
is multiplied by 100, the resulting figure is known as coefficient of variation
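The measures of dispersion discussed so far can be computed side by side in a short Python sketch. The helper names and the data set are illustrative assumptions; `statistics.pstdev` is used because the formulas above divide by n (the population form) rather than n − 1.

```python
import statistics


def value_range(xs):
    """Highest item value minus lowest item value."""
    return max(xs) - min(xs)


def mean_deviation(xs, centre):
    """Average of the absolute deviations of the items from a chosen centre."""
    return sum(abs(x - centre) for x in xs) / len(xs)


def coefficient_of_variation(xs):
    """Standard deviation divided by the mean, multiplied by 100."""
    return statistics.pstdev(xs) * 100 / statistics.mean(xs)


series = [2, 4, 4, 4, 5, 5, 7, 9]
mean = statistics.mean(series)

print(value_range(series))               # 7
print(mean_deviation(series, mean))      # 1.5
print(statistics.pstdev(series))         # 2.0
print(coefficient_of_variation(series))  # 40.0
```

Passing the median or the mode as `centre` gives the mean deviation from median or from mode instead.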
...
The standard deviation (along with several related measures like variance, coefficient of variation,
etc
...
It is amenable to mathematical manipulation because the algebraic signs are not ignored
in its calculation (as we ignore in case of mean deviation)
...
These advantages make standard deviation and its coefficient a very popular measure of
the scatteredness of a series
...
MEASURES OF ASYMMETRY (SKEWNESS)
When the distribution of items in a series happens to be perfectly symmetrical, we then have the
following type of curve for the distribution:
[Figure: curve showing no skewness, in which case we have $\bar{X} = M = Z$]
Fig
...
1
Such a curve is technically described as a normal curve and the relating distribution as normal
distribution
...
But if the curve is distorted (whether on the right
side or on the left side), we have asymmetrical distribution which indicates that there is skewness
...
7
...
In a symmetrical distribution, the items show a perfect balance on either side of
the mode, but in a skew distribution the balance is thrown to one side
...
The difference between the
mean, median or the mode provides an easy way of expressing skewness in a series
...
Usually we measure skewness in this way:
Skewness = $\bar{X} - Z$ and its coefficient (j) is worked out as $j = \dfrac{\bar{X} - Z}{\sigma}$
In case Z is not well defined, then we work out skewness as under:
Skewness = $3(\bar{X} - M)$ and its coefficient (j) is worked out as $j = \dfrac{3(\bar{X} - M)}{\sigma}$
The significance of skewness lies in the fact that through it one can study the formation of series
and can have the idea about the shape of the curve, whether normal or otherwise, when the items of
a given series are plotted on a graph
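Both coefficients of skewness given above can be evaluated with a brief Python sketch. The function name and data are illustrative assumptions; the standard deviation is computed with `statistics.pstdev` (division by n), and `statistics.mode` returns the first of several modes if the series has more than one.

```python
import statistics


def skewness_coefficients(xs):
    """Return (j based on mean and mode, j based on mean and median)."""
    mean = statistics.mean(xs)
    sd = statistics.pstdev(xs)
    if sd == 0:
        raise ValueError("all items are equal, so skewness is undefined")
    j_mode = (mean - statistics.mode(xs)) / sd
    j_median = 3 * (mean - statistics.median(xs)) / sd
    return j_mode, j_median


# Mean 5, median 4.5, mode 4, standard deviation 2
print(skewness_coefficients([2, 4, 4, 4, 5, 5, 7, 9]))  # (0.5, 0.75)
```

Both values are positive here, since the mean lies above the median and the mode.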
...
A bell shaped curve or the normal curve
is Mesokurtic because it is kurtic in the centre; but if the curve is relatively more peaked than the
normal curve, it is called Leptokurtic, whereas if a curve is flatter than the normal curve, it is called
Platykurtic
...
It may be pointed out here that knowing the shape of the distribution curve is crucial to the use of
statistical methods in research analysis since most methods make specific assumptions about the
nature of the distribution curve
...
e
...
But if we have the data on two
variables, we are said to have a bivariate population and if the data happen to be on more than two
variables, the population is known as multivariate population
...
In addition, we may also have a corresponding value of the third variable, Z, or
the fourth variable, W, and so on; the resulting sets of values are called a multivariate population
...
We may like to know, for example, whether the number of
hours students devote to studies is somehow related to their family income, to age, to sex or to
other similar factors
...
Thus we
have to answer two types of questions in bivariate or multivariate populations viz
...
There are several methods of applying the two techniques, but the important
ones are as under:
In case of bivariate population: Correlation can be studied through (a) cross tabulation;
(b) Charles Spearman’s coefficient of correlation; (c) Karl Pearson’s coefficient of correlation;
whereas cause and effect relationship can be studied through simple regression equations
...
We can now briefly take up the above methods one by one
...
Under it we
classify each variable into two or more categories and then cross classify the variables in these subcategories
...
A symmetrical relationship is one in which the two variables vary together, but we
assume that neither variable is due to the other
...
Asymmetrical relationship is said to exist if one variable
(the independent variable) is responsible for another variable (the dependent variable)
...
This sort of analysis can be further elaborated in which
case a third factor is introduced into the association through cross-classifying the three variables
...
The correlation, if any, found through this approach is not considered a very
powerful form of statistical correlation and accordingly we use some other methods when data
happen to be either ordinal or interval or ratio data
...
The main objective of this coefficient is to determine
the extent to which the two sets of ranking are similar or dissimilar
...
As rank correlation is a non-parametric technique for measuring relationship between paired
observations of two variables when data are in the ranked form, we have dealt with this technique in
greater details later on in the book in chapter entitled ‘Hypotheses Testing II (Non-parametric tests)’
...
This coefficient assumes the following:
(i) that there is linear relationship between the two variables;
(ii) that the two variables are causally related which means that one of the variables is
independent and the other one is dependent; and
(iii) a large number of independent causes are operating in both variables so as to produce a
normal distribution
...
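Karl Pearson’s coefficient of correlation (or r)* = $\frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{n \cdot \sigma_X \cdot \sigma_Y}$
Alternatively, the formula can be written as:
$r = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum (X_i - \bar{X})^2} \cdot \sqrt{\sum (Y_i - \bar{Y})^2}}$
Or
$r = \frac{\text{Covariance between X and Y}}{\sigma_x \cdot \sigma_y} = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y}) / n}{\sigma_x \cdot \sigma_y}$
Or
$r = \frac{\sum X_i Y_i - n \cdot \bar{X} \cdot \bar{Y}}{\sqrt{\sum X_i^2 - n\bar{X}^2} \cdot \sqrt{\sum Y_i^2 - n\bar{Y}^2}}$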
(This applies when we take zero as the assumed mean for both variables, X and Y
...
This is the short cut approach for finding ‘r’ in case of ungrouped data
...
e
...
Karl Pearson’s coefficient of correlation is also known as the product moment correlation
coefficient
...
Positive values of r indicate positive correlation
between the two variables (i.e., changes in both variables take place in the same direction),
whereas negative values of ‘r’ indicate negative correlation i.e., changes in the two variables taking
place in the opposite directions
...
When r = (+) 1, it indicates perfect positive correlation and when it is (–)1, it indicates
perfect negative correlation, meaning thereby that variations in independent variable (X) explain
100% of the variations in the dependent variable (Y)
...
But if such change occurs in the opposite
direction, the correlation will be termed as perfect negative
...
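The definitional form of Karl Pearson's coefficient and the short cut form for ungrouped data should give the same value of r. A minimal Python sketch, using hypothetical paired observations and hypothetical function names, checks this:

```python
import math


def pearson_r(xs, ys):
    """r from deviations about the means (definitional form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / (math.sqrt(sum_xx) * math.sqrt(sum_yy))


def pearson_r_shortcut(xs, ys):
    """r from raw sums, taking zero as the assumed mean for both variables."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    numerator = sum(x * y for x, y in zip(xs, ys)) - n * mean_x * mean_y
    denom_x = math.sqrt(sum(x * x for x in xs) - n * mean_x ** 2)
    denom_y = math.sqrt(sum(y * y for y in ys) - n * mean_y ** 2)
    return numerator / (denom_x * denom_y)


hours = [1, 2, 3, 4, 5]   # hypothetical X values
marks = [2, 4, 5, 4, 5]   # hypothetical Y values
r_direct = pearson_r(hours, marks)
r_short = pearson_r_shortcut(hours, marks)
print(round(r_direct, 4), round(r_short, 4))
```

Perfectly linear data such as X = 1, 2, 3 with Y = 2, 4, 6 would give r = +1, and reversing the Y values would give r = –1.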
SIMPLE REGRESSION ANALYSIS
Regression is the determination of a statistical relationship between two or more variables
...
Regression can only interpret what exists
physically i
...
, there must be a physical way in which independent variable X can affect dependent
variable Y
...
This equation is known
as the regression equation of Y on X (also represents the regression line of Y on X when drawn on a
graph) which means that each unit change in X produces a change of b in Y, which is positive for
direct and negative for inverse relationships
...
To use it efficiently, we first determine
$\sum x_i^2 = \sum X_i^2 - n\bar{X}^2$
$\sum y_i^2 = \sum Y_i^2 - n\bar{Y}^2$
$\sum x_i y_i = \sum X_i Y_i - n\bar{X} \cdot \bar{Y}$
Then
$b = \frac{\sum x_i y_i}{\sum x_i^2}$, $a = \bar{Y} - b\bar{X}$
These measures define a and b which will give the best possible fit through the original X and Y
points and the value of r can then be worked out as under:
$r = \frac{b\sqrt{\sum x_i^2}}{\sqrt{\sum y_i^2}}$
Thus, the regression analysis is a statistical method to deal with the formulation of mathematical
model depicting relationship amongst variables which can be used for the purpose of prediction of the
values of dependent variable, given the values of the independent variable
...
, a and b by using the following two
normal equations:
$\sum Y_i = na + b \sum X_i$
$\sum X_i Y_i = a \sum X_i + b \sum X_i^2$
and then solving these equations for finding a and b values
...
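To make the computation concrete, here is a small Python sketch of the regression of Y on X. It works out b and a from the deviation sums, derives r from b, and then checks that the fitted constants satisfy the two normal equations. The data and the function name are hypothetical.

```python
import math


def fit_y_on_x(xs, ys):
    """Return (a, b, r) for the regression line of Y on X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Deviation sums obtained from the raw sums.
    sum_x2 = sum(x * x for x in xs) - n * mean_x ** 2
    sum_y2 = sum(y * y for y in ys) - n * mean_y ** 2
    sum_xy = sum(x * y for x, y in zip(xs, ys)) - n * mean_x * mean_y
    b = sum_xy / sum_x2
    a = mean_y - b * mean_x
    r = b * math.sqrt(sum_x2) / math.sqrt(sum_y2)
    return a, b, r


xs = [1, 2, 3, 4, 5]   # hypothetical independent variable
ys = [2, 4, 5, 4, 5]   # hypothetical dependent variable
a, b, r = fit_y_on_x(xs, ys)

# Both normal equations should balance (residuals close to zero).
n = len(xs)
residual_1 = sum(ys) - (n * a + b * sum(xs))
residual_2 = sum(x * y for x, y in zip(xs, ys)) - (
    a * sum(xs) + b * sum(x * x for x in xs)
)
print(round(a, 4), round(b, 4), round(r, 4))
```

For this made-up data the line is Y = 2.2 + 0.6X, so each unit change in X is associated with a change of 0.6 in the estimated Y.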
In a similar fashion, we can develop the regression equation of X on Y viz
...
MULTIPLE CORRELATION AND REGRESSION
When there are two or more than two independent variables, the analysis concerning relationship is
known as multiple correlation and the equation describing such relationship as the multiple regression
equation
...
In this situation the results are interpreted as shown below:
Multiple regression equation assumes the form
$\hat{Y} = a + b_1 X_1 + b_2 X_2$
where X1 and X2 are two independent variables and Y being the dependent variable, and the constants
a, b1 and b2 can be solved by solving the following three normal equations:
$\sum Y_i = na + b_1 \sum X_{1i} + b_2 \sum X_{2i}$
$\sum X_{1i} Y_i = a \sum X_{1i} + b_1 \sum X_{1i}^2 + b_2 \sum X_{1i} X_{2i}$
$\sum X_{2i} Y_i = a \sum X_{2i} + b_1 \sum X_{1i} X_{2i} + b_2 \sum X_{2i}^2$
(It may be noted that the number of normal equations would depend upon the number of
independent variables
...
)
In multiple regression analysis, the regression coefficients (viz
...
, X1, X2) increases
...
In such a situation we should use only one set of the
independent variable to make our estimate
...
Nevertheless, the
prediction for the dependent variable can be made even when multicollinearity is present, but in such
a situation enough care should be taken in selecting the independent variables to estimate a dependent
variable so as to ensure that multi-collinearity is reduced to the minimum
...
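The collective effect is given by the coefficient of multiple correlation,
$R_{y \cdot x_1 x_2}$, defined as under:
$R_{y \cdot x_1 x_2} = \sqrt{\frac{b_1 \left( \sum Y_i X_{1i} - n\bar{Y}\bar{X}_1 \right) + b_2 \left( \sum Y_i X_{2i} - n\bar{Y}\bar{X}_2 \right)}{\sum Y_i^2 - n\bar{Y}^2}}$
Alternatively, we can write
$R_{y \cdot x_1 x_2} = \sqrt{\frac{b_1 \sum x_{1i} y_i + b_2 \sum x_{2i} y_i}{\sum y_i^2}}$
where
$x_{1i} = (X_{1i} - \bar{X}_1)$
$x_{2i} = (X_{2i} - \bar{X}_2)$
$y_i = (Y_i - \bar{Y})$
and b1 and b2 are the regression coefficients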
...
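The three normal equations and the coefficient of multiple correlation can be sketched in Python as follows. The small Gaussian elimination helper, the function names and the data are all hypothetical; the data are deliberately generated from Y = 1 + 2X1 + 3X2 without any error term, so the fit should recover those constants and give R = 1.

```python
import math


def solve_linear(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    m = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    solution = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (m[r][n] - tail) / m[r][r]
    return solution


def fit_two_predictors(x1, x2, y):
    """Return (a, b1, b2) by solving the three normal equations."""
    n = len(y)
    s12 = sum(p * q for p, q in zip(x1, x2))
    coefficients = [
        [n, sum(x1), sum(x2)],
        [sum(x1), sum(p * p for p in x1), s12],
        [sum(x2), s12, sum(q * q for q in x2)],
    ]
    constants = [
        sum(y),
        sum(p * v for p, v in zip(x1, y)),
        sum(q * v for q, v in zip(x2, y)),
    ]
    return tuple(solve_linear(coefficients, constants))


def multiple_r(x1, x2, y, b1, b2):
    """Coefficient of multiple correlation from the raw sums."""
    n = len(y)
    mean_y = sum(y) / n
    s1y = sum(p * v for p, v in zip(x1, y)) - n * mean_y * sum(x1) / n
    s2y = sum(q * v for q, v in zip(x2, y)) - n * mean_y * sum(x2) / n
    syy = sum(v * v for v in y) - n * mean_y ** 2
    return math.sqrt((b1 * s1y + b2 * s2y) / syy)


x1 = [1, 2, 3, 4, 5]        # hypothetical first independent variable
x2 = [2, 1, 4, 3, 5]        # hypothetical second independent variable
y = [9, 8, 19, 18, 26]      # exactly 1 + 2*x1 + 3*x2
a, b1, b2 = fit_two_predictors(x1, x2, y)
big_r = multiple_r(x1, x2, y, b1, b2)
print(round(a, 6), round(b1, 6), round(b2, 6), round(big_r, 6))
```

With real data the fit would not be exact and R would fall below 1; the exact data are used here only so that the result is easy to verify by hand.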
In other words, in partial correlation analysis, we
aim at measuring the relation between a dependent variable and a particular independent variable by
holding all other variables constant
...
To obtain it, it is first necessary to compute the
simple coefficients of correlation between each set of pairs of variables as stated earlier
...
Also,
$r_{yx_2 \cdot x_1}^2 = \frac{R_{y \cdot x_1 x_2}^2 - r_{yx_1}^2}{1 - r_{yx_1}^2}$
in which X1 and X2 are simply interchanged, given the added effect of X2 on Y
...
The partial correlation coefficients are called first order coefficients when one
variable is held constant as shown above; they are known as second order coefficients when two
variables are held constant and so on
...
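The relation between the squared multiple correlation and the squared first order partial correlation can be sketched as below. The function name and the numerical values are hypothetical; the code simply evaluates the ratio described above for the variable that is not held constant.

```python
def partial_r_squared(multiple_r_squared, simple_r_held):
    """Squared partial correlation of Y with the added variable.

    multiple_r_squared: squared multiple correlation of Y on both variables.
    simple_r_held: simple correlation of Y with the variable held constant.
    """
    held_squared = simple_r_held ** 2
    if held_squared >= 1:
        raise ValueError("the held variable already explains all variation")
    return (multiple_r_squared - held_squared) / (1 - held_squared)


# Hypothetical values: R squared = 0.64 and r between Y and X1 = 0.6.
added_effect_of_x2 = partial_r_squared(0.64, 0.6)
print(round(added_effect_of_x2, 4))
```

Here X1 alone explains 36% of the variation in Y, and X2 accounts for about 44% of what X1 leaves unexplained.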
It is not necessary that the objects may possess only one attribute;
rather it would be found that the objects possess more than one attribute
...
For
example, among a group of people we may find that some of them are inoculated against small-pox
and among the inoculated we may observe that some of them suffered from small-pox after inoculation
...
In other words,
we may be interested in knowing whether inoculation and immunity from small-pox are associated
...
The association may be positive or negative (negative association is also known as disassociation)
...
In case the class frequency of AB is equal to expectation, the two attributes are considered
as independent i
...
, are said to have no association
...
$\frac{(A)}{N} \times \frac{(B)}{N} \times N$, then AB are negatively related/associated
...
e
...
If AB =
Where (AB) = frequency of class AB and
$\frac{(A)}{N} \times \frac{(B)}{N} \times N$ = Expectation of AB, if A and B are independent, and N being the number of
items
In order to find out the degree or intensity of association between two or more sets of attributes, we
should work out the coefficient of association
...
It can be mentioned as under:
$Q_{AB} = \frac{(AB)(ab) - (Ab)(aB)}{(AB)(ab) + (Ab)(aB)}$
where,
QAB = Yule’s coefficient of association between attributes A and B
...
(Ab) = Frequency of class Ab in which A is present but B is absent
...
(ab) = Frequency of class ab in which both A and B are absent
...
If the attributes are completely
associated (perfect positive association) with each other, the coefficient will be +1, and if they are
completely disassociated (perfect negative association), the coefficient will be –1
...
The varying degrees
of the coefficients of association are to be read and understood according to their positive and
negative nature between +1 and –1
...
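A short Python sketch of these ideas, using hypothetical class frequencies for inoculation (A) and exemption from attack (B), compares (AB) with its expectation under independence and then works out Yule's coefficient. The function names are illustrative only.

```python
def expected_ab(freq_a, freq_b, total):
    """Expectation of (AB) if A and B are independent: (A)/N x (B)/N x N."""
    return (freq_a / total) * (freq_b / total) * total


def yules_q(ab_both, a_only, b_only, neither):
    """Yule's coefficient of association.

    ab_both = (AB), a_only = (Ab), b_only = (aB), neither = (ab).
    """
    cross_present = ab_both * neither
    cross_mixed = a_only * b_only
    return (cross_present - cross_mixed) / (cross_present + cross_mixed)


# Hypothetical 2 x 2 frequencies.
AB, Ab, aB, ab = 90, 10, 40, 60
N = AB + Ab + aB + ab
expectation = expected_ab(AB + Ab, AB + aB, N)
q = yules_q(AB, Ab, aB, ab)
print(expectation, round(q, 3))
```

Since the observed (AB) of 90 exceeds the expectation of 65, the association is positive, and Q of about +0.86 indicates that it is fairly strong.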
For example, we may observe positive association between inoculation
and exemption from small-pox, but such association may be the result of the fact that there is positive
association between inoculation and richer section of society and also that there is positive association
between exemption from small-pox and richer section of society
...
We can work out the coefficient of partial association
between A and B in the population of C by just modifying the above stated formula for finding
association between A and B as shown below:
$Q_{AB.C} = \frac{(ABC)(abC) - (AbC)(aBC)}{(ABC)(abC) + (AbC)(aBC)}$
where
$Q_{AB.C}$ = Coefficient of partial association between A and B in the population of C; and all other
values are the class frequencies of the respective classes (A, B, C denotes the presence
of concerning attributes and a, b, c denotes the absence of concerning attributes)
...
This sort of association may be the result of
some attribute, say C with which attributes A and B are associated (but in reality there is no association
between A and B)
...
Researcher must
remain alert and must not conclude association between A and B when in fact there is no such
association in reality
...
Association between two attributes in case of manifold classification and the resulting contingency
table can be studied as explained below:
We can have manifold classification of the two attributes in which case each of the two attributes
is first observed and then each one is classified into two or more subclasses, resulting in what is
called a contingency table
...
Table 7
...
For instance, if we combine (A1) + (A2) to form (A) and (A3) + (A4) to form
(a) and similarly if we combine (B1) + (B2) to form (B) and (B3) + (B4) to form (b) in the above
contingency table, then we can write the table in the form of a 2 × 2 table as shown in Table 4
...
Table 7
...
But the practice of combining classes
is not considered very correct and at times it is inconvenient also. Karl Pearson has therefore suggested a
measure known as Coefficient of mean square contingency for studying association in contingency
tables
...
This is considered a satisfactory measure of studying association in contingency tables
...
Index numbers: When series are expressed in same units, we can use averages for the purpose
of comparison, but when the units in which two or more series are expressed happen to be different,
statistical averages cannot be used to compare them
...
One such method
is to convert the series into a series of index numbers
...
We can, thus, define an index
number as a number which is used to measure the level of a given phenomenon as compared to the
level of the same phenomenon at some standard date
...
But one must always remember that index numbers measure only the
relative changes
...
Different indices serve different purposes
...
Index numbers may measure
cost of living of different classes of people
...
But index numbers have their own limitations of which the researcher must always remain
aware
...
Chances of error also remain at one point or the other
while constructing an index number but this does not diminish the utility of index numbers for they still
can indicate the trend of the phenomenon being measured
...
2
...
Such data is labelled as
‘Time Series’
...
Such series are usually the result of
the effects of one or more of the following factors:
(i) Secular trend or long term trend that shows the direction of the series in a long period of
time
...
Sometimes, secular trend is simply stated as trend (or T)
...
e
...
(b) Seasonal fluctuations (or S) are of short duration occurring in a regular sequence at
specific intervals of time
...
Usually
these fluctuations involve patterns of change within a year that tend to be repeated
from year to year
...
(c) Irregular fluctuations (or I), also known as Random fluctuations, are variations which
take place in a completely unpredictable fashion
...
To
study the effect of one type of factor, the other type of factor is eliminated from the series
...
For analysing time series, we usually have two models; (1) multiplicative model; and (2) additive
model
...
Additive model considers the total of various components resulting in the given values of the
overall time series and can be stated as:
Y=T+C+S+I
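Under the additive model, eliminating a component amounts to subtracting it from the series. The following Python sketch builds a series from hypothetical trend, cyclical, seasonal and irregular components and then removes the trend and the seasonal component in turn. All figures are invented for illustration, and in practice the components would have to be estimated rather than known in advance.

```python
# Hypothetical quarterly components for two years (eight quarters).
trend = [100 + 2 * t for t in range(8)]           # T: steady long-term rise
cyclical = [0, 1, 2, 1, 0, -1, -2, -1]            # C: slow swing
seasonal = [5, -3, -4, 2] * 2                     # S: repeats every year
irregular = [0.5, -0.2, 0.1, -0.4, 0.3, 0.0, -0.1, 0.2]  # I: random part

# Additive model: Y = T + C + S + I for every period.
series = [
    t + c + s + i
    for t, c, s, i in zip(trend, cyclical, seasonal, irregular)
]

# Eliminating one factor leaves the combined effect of the others.
detrended = [y - t for y, t in zip(series, trend)]
deseasonalised = [y - s for y, s in zip(series, seasonal)]
print(series[0], detrended[0], deseasonalised[0])
```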
There are various methods of isolating trend from the given series viz
...
The analysis of time series is done to understand the dynamic conditions for achieving the short-term and long-term goals of business firm(s)
...
On the basis of past trends, the future
patterns can be predicted and policy or policies may accordingly be formulated
...
By studying cyclical variations, we can keep in view the impact of
cyclical changes while formulating various policies to make them as realistic as possible
...
Thus, analysis of time
series is important in context of long term as well as short term forecasting and is considered a very
powerful tool in the hands of business analysts and researchers
...
“Processing of data implies editing, coding, classification and tabulation”
...
2
...
, how many classes should
be there? How to choose class limits? How to determine class frequency? State how these problems
should be tackled by a researcher
...
Why is tabulation considered essential in a research study? Narrate the characteristics of a good table
...
(a) How should the problem of DK responses be dealt with by a researcher? Explain
...
Write a brief note on different types of analysis of data pointing out the significance of each
...
What do you mean by multivariate analysis? Explain how it differs from bivariate analysis
...
How will you differentiate between descriptive statistics and inferential statistics? Describe the important
statistical measures often used to summarise the survey/research data
...
What does a measure of central tendency indicate? Describe the important measures of central tendency
pointing out the situation when one measure is considered relatively appropriate in comparison to other
measures
...
Describe the various measures of relationships often used in context of research studies
...
Write short notes on the following:
(i) Cross tabulation;
(ii) Discriminant analysis;
(iii) Coefficient of contingency;
(iv) Multicollinearity;
(v) Partial association between two attributes
...
“The analysis of time series is done to understand the dynamic conditions for achieving the short-term
and long-term goals of business firms
...
12
...
Explain this statement pointing out the utility of index numbers
...
Distinguish between:
(i) Field editing and central editing;
(ii) Statistics of attributes and statistics of variables;
(iii) Exclusive type and inclusive type class intervals;
(iv) Simple and complex tabulation;
(v) Mechanical tabulation and cross tabulation
...
“Discriminate use of average is very essential for sound statistical analysis”
...
15
...
Appendix
(Summary chart concerning analysis of data)
Analysis of Data (in a broad general way can be categorised into)
    Processing of Data (Preparing data for analysis)
        Editing
        Coding
        Classification
        Tabulation
        Using percentages
    Analysis of Data (Analysis proper)
        Descriptive and Causal Analyses
            Uni-dimensional analysis (Calculation of several measures mostly concerning one variable)
                (i) Measures of Central Tendency;
                (ii) Measures of dispersion;
                (iii) Measures of skewness;
                (iv) One-way ANOVA, Index numbers, Time series analysis; and
                (v) Others (including simple correlation and regression in simple classification of paired data)
            Bivariate analysis (Analysis of two variables or attributes in a two-way classification)
                Simple regression* and simple correlation (in respect of variables)
                Association of attributes (through coefficient of association and coefficient of contingency)
                Two-way ANOVA
            Multi-variate analysis (simultaneous analysis of more than two variables/attributes in a multiway classification)
                Multiple regression* and multiple correlation/partial correlation in respect of variables
                Multiple discriminant analysis (in respect of attributes)
                Multi-ANOVA (in respect of variables)
                Canonical analysis (in respect of both variables and attributes)
                Other types of analyses (such as factor analysis, cluster analysis)
        Inferential analysis/Statistical analysis
            Estimation of parameter values
                Point estimate
                Interval estimate
            Testing hypotheses
                Parametric tests
                Nonparametric tests or Distribution free tests
* Regression analysis (whether simple or multiple) is termed as Causal analysis whereas correlation analysis indicates simply co-variation between two or
more variables
...
In other words, it is the
process of obtaining information about an entire population by examining only a part of it
...
The researcher quite often selects only a few items from the universe for his study purposes
...
The items so selected constitute what is technically called a sample, their selection process or technique
is called sample design and the survey conducted on the basis of sample is described as sample
survey
...
NEED FOR SAMPLING
Sampling is used in practice for a variety of reasons such as:
1
...
A sample study is usually less expensive than a census
study and produces results at a relatively faster speed
...
Sampling may enable more accurate measurements for a sample study is generally conducted
by trained and experienced investigators
...
Sampling remains the only way when population contains infinitely many members
...
Sampling remains the only choice when a test involves the destruction of the item under
study
...
Sampling usually enables us to estimate the sampling errors and, thus, assists in obtaining
information concerning some characteristic of the population
...
Sampling Fundamentals
153
1
...
The attributes that are the object of study are referred to as
characteristics and the units possessing them are called as elementary units
...
Thus, all units in any field of inquiry constitute universe and
all elementary units (on the basis of one characteristic or more) constitute population
...
However, a researcher must necessarily define these terms precisely
...
The population is said to be finite if it
consists of a fixed number of elements so that it is possible to enumerate it in its totality
...
The
symbol ‘N’ is generally used to indicate how many elements (or items) are there in case of a finite
population
...
Thus, in an infinite population the number of items is infinite i
...
, we cannot have any
idea about the total number of items
...
One should remember that no truly infinite population of physical
objects does actually exist in spite of the fact that many such populations appear to be very very
large
...
This way we use the theoretical concept of
infinite population as an approximation of a very large finite population
...
Sampling frame: The elementary units or the group or cluster of such units may form the basis
of sampling process in which case they are called as sampling units
...
Thus sampling frame consists of a list of items from which the
sample is to be drawn
...
In most cases they are not identical because
it is often impossible to draw a sample directly from population
...
For
instance, one can use telephone directory as a frame for conducting opinion survey in a city
...
3
...
It refers to the technique or the procedure the researcher would adopt in selecting some
sampling units from which inferences about the population are drawn
...
Various sampling designs have already been explained earlier in the
book
...
Statistic(s) and parameter(s): A statistic is a characteristic of a sample, whereas a parameter is
a characteristic of a population
...
But when such measures describe the characteristics of a population, they are known
as parameter(s)
...
To obtain the estimate of a parameter from a statistic constitutes the prime
objective of sampling analysis
...
Sampling error: Sample surveys do imply the study of a small portion of the population and as
such there would naturally be a certain amount of inaccuracy in the information collected
...
In other words, sampling errors are
those errors which arise on account of sampling and they generally happen to be random variations
(in case of random sampling) in the sample estimates around the true population values
...
Fig
...
1
Sampling error = Frame error + Chance error + Response error
(If we add measurement error or the non-sampling error to sampling error, we get total error)
...
The magnitude of
the sampling error depends upon the nature of the universe; the more homogeneous the universe, the
smaller the sampling error
...
e
...
A measure of the random sampling
error can be calculated for a given sample design and size and this measure is often called the
precision of the sampling plan
...
As opposed to sampling errors, we may have non-sampling errors which may creep in during the
process of collecting actual information and such errors occur in all surveys whether census or
sample
...
6
...
For instance, if the estimate is Rs 4000 and the precision desired is
± 4%, then the true value will be no less than Rs 3840 and no more than Rs 4160
...
But if we desire that the estimate
should not deviate from the actual value by more than Rs 200 in either direction, in that case the
range would be Rs 3800 to Rs 4200
...
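The arithmetic of the two precision ranges can be reproduced with a few lines of Python. The function name and its parameters are hypothetical; the figures are the ones used in the example (an estimate of Rs 4000 with a precision of ± 4%, and alternatively a permitted deviation of Rs 200).

```python
def precision_limits(estimate, relative=None, absolute=None):
    """Return (lower, upper) limits around an estimate.

    Give either a relative precision (0.04 for 4%) or an absolute one.
    """
    if (relative is None) == (absolute is None):
        raise ValueError("give exactly one of relative or absolute")
    margin = estimate * relative if relative is not None else absolute
    return estimate - margin, estimate + margin


low_pct, high_pct = precision_limits(4000, relative=0.04)
low_abs, high_abs = precision_limits(4000, absolute=200)
print(low_pct, high_pct)
print(low_abs, high_abs)
```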
Confidence level and significance level: The confidence level or reliability is the expected
percentage of times that the actual value will fall within the stated precision limits
...
95 in 1) that the
sample results represent the true condition of the population within a specified precision range against
5 chances in 100 (or
...
Precision is the range within which the answer may
vary and still be acceptable; confidence level indicates the likelihood that the answer will fall within
that range, and the significance level indicates the likelihood that the answer will fall outside that
range
...
e
...
e
...
We should also remember that the area of normal curve within precision limits for the specified
confidence level constitutes the acceptance region and the area of the curve outside these limits in
either direction constitutes the rejection regions
...
Sampling distribution: We are often concerned with sampling distribution in sampling analysis
...
, then we can find that each sample may give its own value for
the statistic under consideration
...
Accordingly, we can have sampling distribution of mean, or the sampling distribution of standard
deviation or the sampling distribution of any other statistical measure
...
The sampling distribution tends quite
close to the normal distribution if the number of samples is large
...
Thus, the mean of the sampling distribution can be taken as the mean of the universe
...
A brief mention of each one of these sampling distribution will be helpful
...
Sampling distribution of mean: Sampling distribution of mean refers to the probability distribution
of all the possible means of random samples of a given size that we take from a population
...
But
when sampling is from a population which is not normal (may be positively or negatively skewed),
even then, as per the central limit theorem, the sampling distribution of mean tends quite close to the
normal distribution, provided the number of sample items is large i.e., more than 30
...
e
...
normal variate $z = \frac{\bar{x} - \mu}{\sigma_p / \sqrt{n}}$ for the sampling distribution of mean
...
2
...
This happens in case of statistics of attributes
...
Usually the statistics of attributes correspond to the
conditions of a binomial distribution that tends to become normal distribution as n becomes larger and
larger
...
e
...
e
...
distribution of proportion of successes has a mean = p with standard deviation = $\sqrt{\frac{p \cdot q}{n}}$
Presuming the binomial distribution approximating the normal distribution for large
n, the normal variate of the sampling distribution of proportion $z = \frac{\hat{p} - p}{\sqrt{(p \cdot q) / n}}$, where $\hat{p}$ (pronounced
as p-hat) is the sample proportion of successes, can be used for testing of hypotheses
...
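A minimal Python sketch of the normal variate for a proportion is given below. The figures are hypothetical (120 successes in a sample of 600, tested against an assumed population proportion of 0.25), and q is taken as 1 – p.

```python
import math


def proportion_z(successes, n, p):
    """z for a sample proportion against a hypothesised proportion p."""
    p_hat = successes / n          # sample proportion of successes
    q = 1 - p                      # assumption: q is the complement of p
    standard_error = math.sqrt(p * q / n)
    return (p_hat - p) / standard_error


z = proportion_z(successes=120, n=600, p=0.25)
print(round(z, 4))
```

The negative sign only shows that the sample proportion (0.2) lies below the hypothesised value.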
Student’s t-distribution: When population standard deviation ($\sigma_p$) is not known and the sample
is of a small size (i.e., n < 30), we use t distribution for the sampling distribution of mean and
work out t variable as:
$t = \frac{\bar{X} - \mu}{\sigma_s / \sqrt{n}}$
where $\sigma_s = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n - 1}}$
i.e., the sample standard deviation
...
The variable t differs from z in the sense
that we use sample standard deviation ($\sigma_s$) in the calculation of t, whereas we use standard deviation
of population ($\sigma_p$) in the calculation of z
...
e
...
The degrees of freedom for a sample of size n is n – 1
...
In fact for sample sizes of more than 30, the t distribution is so close to the normal
distribution that we can use the normal to approximate the t-distribution
...
The t-distribution tables are available which give the critical values of t for different degrees of
freedom at various levels of significance
...
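The t variable for a single small sample can be worked out as in the following Python sketch. The data, the hypothesised mean and the function name are hypothetical; the sample standard deviation uses the divisor n – 1.

```python
import math
import statistics


def one_sample_t(data, mu):
    """Return (t, degrees of freedom) for a sample against mean mu."""
    n = len(data)
    sigma_s = statistics.stdev(data)   # sample standard deviation (n - 1)
    t = (statistics.fmean(data) - mu) / (sigma_s / math.sqrt(n))
    return t, n - 1


t_value, degrees_of_freedom = one_sample_t([10, 12, 14, 16, 18], mu=12)
print(round(t_value, 4), degrees_of_freedom)
```

The calculated value would then be compared with the table value of t for 4 degrees of freedom at the chosen level of significance.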
4
...
F ratio is computed in a way that the larger variance is always in the numerator
...
The calculated value of F from the sample data is compared with
the corresponding table value of F and if the former is equal to or exceeds the latter, then we infer
that the null hypothesis of the variances being equal cannot be accepted
...
e j
5
...
Variances of samples require us to add a collection
of squared quantities and thus have distributions that are related to chi-square distribution
...
Thus, $\left( \sigma_s^2 / \sigma_p^2 \right)(n - 1)$ would have the same distribution as chi-square
distribution with (n – 1) degrees of freedom
...
One must know the degrees of freedom for using chi-square distribution
...
The generalised shape of $\chi^2$ distribution depends
upon the d.f. and the $\chi^2$ value is worked out as under:
$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$
Tables are there that give the value of $\chi^2$ for given d.f. which may be used with calculated value of
$\chi^2$ for relevant d.f. at a desired level of significance for testing hypotheses
...
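The chi-square value is a simple sum over the k classes, as the following Python sketch shows. The observed and expected frequencies and the function name are hypothetical.

```python
def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over all classes."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [18, 22, 20]   # hypothetical observed frequencies
expected = [20, 20, 20]   # hypothetical expected frequencies
chi_sq = chi_square(observed, expected)
print(round(chi_sq, 4))
```

The result would be compared with the table value for the relevant degrees of freedom at the desired level of significance.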
CENTRAL LIMIT THEOREM
When sampling is from a normal population, the means of samples drawn from such a population are
themselves normally distributed
...
sample plays a critical role
...
The theorem which explains this sort of relationship between the shape of the population distribution
and the sampling distribution of the mean is known as the central limit theorem
...
It assures that the sampling distribution of the
mean approaches normal distribution as the sample size increases
...
”1
“The significance of the central limit theorem lies in the fact that it permits us to use sample
statistics to make inferences about population parameters without knowing anything about the shape
of the frequency distribution of that population other than what we can get from the sample
...
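The behaviour described by the central limit theorem can be illustrated by simulation. The Python sketch below draws repeated samples from a markedly skewed population (an exponential distribution with mean 1 and standard deviation 1, chosen here purely as an assumption) and examines the resulting sample means. The sample size, number of samples and seed are arbitrary.

```python
import random
import statistics

rng = random.Random(42)    # fixed seed so that the run is repeatable
SAMPLE_SIZE = 40           # more than 30 items per sample
NUM_SAMPLES = 2000


def sample_mean(n):
    """Mean of one random sample of n items from the skewed population."""
    return statistics.fmean(rng.expovariate(1.0) for _ in range(n))


means = [sample_mean(SAMPLE_SIZE) for _ in range(NUM_SAMPLES)]
mean_of_means = statistics.fmean(means)
sd_of_means = statistics.pstdev(means)
expected_sd = 1.0 / SAMPLE_SIZE ** 0.5   # population s.d. divided by root n
print(round(mean_of_means, 3), round(sd_of_means, 3), round(expected_sd, 3))
```

The mean of the sample means should lie close to the population mean, and their spread close to the population standard deviation divided by the square root of the sample size, even though the population itself is far from normal.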
Sampling theory is applicable only to random samples
...
In other
words, a universe is the complete group of items about which knowledge is sought
...
finite universe is one which has a definite and certain number of items, but
when the number of items is uncertain and infinite, the universe is said to be an infinite universe
...
In the former case the universe in fact does
not exist and we can only imagine the items constituting it
...
Existent universe is a universe of concrete objects i
...
, the universe
where the items constituting it really exist
...
The theory of sampling studies the
relationships that exist between the universe and the sample or samples drawn from it
...
The theory of sampling is concerned with estimating the properties of the population from
those of the sample and also with gauging the precision of the estimate
...
In more clear terms “from the sample we attempt to draw inference concerning the
universe
...
”3 The methodology dealing
with all this is known as sampling theory
...
Harnett and James L
...
223
...
Levin, Statistics for Management, p
...
3
J
...
Chaturvedi: Mathematical Statistics, p
...
2
(i) Statistical estimation: Sampling theory helps in estimating unknown population parameters from
a knowledge of statistical measures based on sample studies
...
The estimate can either be a
point estimate or it may be an interval estimate
...
, the upper limit and the lower limit
within which the parameter value may lie
...
(ii) Testing of hypotheses: The second objective of sampling theory is to enable us to decide
whether to accept or reject hypothesis; the sampling theory helps in determining whether observed
differences are actually due to chance or whether they are really significant
...
It also helps in determining the accuracy
of such generalisations
...
, the sampling of attributes and the
sampling of variables and that too in the context of large and small samples (By small sample is
commonly understood any sample that includes 30 or fewer items, whereas a large sample is one in
which the number of items is more than 30)
...
The presence of an attribute may be termed as a ‘success’ and its absence a ‘failure’
...
In such a situation we would say that
sample consists of 600 items (i
...
, n = 600) out of which 120 are successes and 480 failures
...
2 (i
...
, p = 0
...
8
...
If n is large, the binomial distribution tends to become normal distribution
which may be used for sampling analysis
...
(ii) The parameter value is not known and we have to estimate it from the sample
...
e
...
All the above stated problems are studied using the appropriate standard errors and the tests of
significance which have been explained and illustrated in the pages that follow
...
e
...
The tests of significance used for dealing with problems relating to large samples are different
from those used for small samples
...
In case of large samples, we assume that the sampling
distribution tends to be normal and the sample values are approximately close to the population
values
...
When n is large, the probability of a sample value of the statistic deviating from the parameter by
more than 3 times its standard error is very small (it is 0
...
Appropriate standard errors have to be worked out which will enable us to give the
limits within which the parameter values would lie or would enable us to judge whether the difference
happens to be significant or not at certain confidence levels
...
73% confidence
...
The sampling theory for large samples is not applicable in small samples because when samples
are small, we cannot assume that the sampling distribution is approximately normal
...
Sir William S
...
Student’s t-test is used when two conditions are fulfilled viz
...
While using t-test we assume that
the population from which sample has been taken is normal or approximately normal, sample is a
random sample, observations are independent, there is no measurement error and that in the case of
two samples when equality of the two population means is to be tested, we assume that the population
variances are equal
...
e
...
If the calculated value of ‘t’ is either equal to or exceeds the table value, we
infer that the difference is significant, but if calculated value of t is less than the concerning table
value of t, the difference is not treated as significant
...
*
The z-test may as well be applied in case of small sample provided we are given the variance of the population
...
f
...
(iii) To test the significance of the coefficient of simple correlation
$t = \frac{r}{\sqrt{1 - r^2}} \times \sqrt{n - 2}$ or $t = r \sqrt{\frac{n - 2}{1 - r^2}}$
where
r = the coefficient of simple correlation
and the d
...
= (n – 2)
...
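As a minimal sketch of the correlation test stated above, the following Python function works out t and its degrees of freedom from r and n. The function name and the sample values (r = 0.6, n = 27) are illustrative assumptions and are not taken from the text.

```python
import math

def correlation_t(r, n):
    """Return (t, d.f.) for testing the significance of a simple correlation r
    computed from n pairs, using t = r * sqrt(n - 2) / sqrt(1 - r^2)."""
    if n <= 2 or abs(r) >= 1:
        raise ValueError("need n > 2 and |r| < 1")
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)
    return t, n - 2

# Hypothetical inputs, chosen only to illustrate the arithmetic.
t_value, degrees_of_freedom = correlation_t(r=0.6, n=27)
print(round(t_value, 3), degrees_of_freedom)
```

The computed t is then compared with the table value of t for (n – 2) degrees of freedom in the usual way.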
f
...
(v) To test the difference in case of paired or correlated samples data (in which case t test is
often described as difference test)
$t = \frac{\bar{D} - \mu_D}{\sigma_D / \sqrt{n}}$, i.e., $t = \frac{\bar{D} - 0}{\sigma_D / \sqrt{n}}$
where
Hypothesised mean difference ($\mu_D$) is taken as zero (0),
$\bar{D}$ = Mean of the differences of correlated sample items
$\sigma_D$ = Standard deviation of differences worked out as under
$\sigma_D = \sqrt{\frac{\sum D_i^2 - (\bar{D})^2 \, n}{n - 1}}$
Di = Differences {i
...
, Di = (Xi – Yi)}
n = number of pairs in two samples and the d
...
= (n – 1)
...
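The difference test above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name and the two small samples are hypothetical, and the hypothesised mean difference defaults to zero as in the text.

```python
import math

def paired_t(x, y, mu_d=0.0):
    """Return (t, d.f.) for paired samples x and y.

    D_i = X_i - Y_i, sigma_D = sqrt((sum D_i^2 - n * D_bar^2) / (n - 1)),
    t = (D_bar - mu_d) / (sigma_D / sqrt(n)), with d.f. = n - 1.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal size with at least 2 pairs")
    d = [xi - yi for xi, yi in zip(x, y)]
    n = len(d)
    d_bar = sum(d) / n
    sigma_d = math.sqrt((sum(di * di for di in d) - n * d_bar ** 2) / (n - 1))
    t = (d_bar - mu_d) / (sigma_d / math.sqrt(n))
    return t, n - 1

# Hypothetical paired observations on the same five units.
before = [10, 12, 9, 14, 11]
after = [8, 11, 9, 10, 9]
t_paired, df_paired = paired_t(before, after)
print(round(t_paired, 3), df_paired)
```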
His approach
is described as Sandler’s A-test that serves the same purpose as is accomplished by t-test relating to
paired data
...
e
...
Psychologists generally use this
test in case of two groups that are matched with respect to some extraneous variable(s)
...
A-statistic is
found as follows:
$A = \frac{\text{the sum of squares of the differences}}{\text{the square of the sum of the differences}} = \frac{\sum D_i^2}{\left(\sum D_i\right)^2}$
The number of degrees of freedom (d
...
) in A-test is the same as with Student’s t-test i
...
,
d
...
= n – 1, n being equal to the number of pairs
...
f
...
One has to compare the computed value of A with its corresponding table value for drawing inference
concerning acceptance or rejection of null hypothesis
...
But if the calculated value of A is more than its table value, then A-statistic is taken as
insignificant and accordingly we accept H0
...
, t and A are
inversely related
...
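A small Python sketch may make the inverse relation between t and A concrete. The identity A = (n – 1)/(n·t²) + 1/n used below is not quoted from the text; it follows algebraically from the definitions of A and of the paired t (with hypothesised mean difference zero) given above. The function names and the differences are hypothetical.

```python
import math

def sandler_a(differences):
    """A = (sum of squared differences) / (square of the summed differences)."""
    total = sum(differences)
    if total == 0:
        raise ValueError("sum of differences is zero; A is undefined")
    return sum(d * d for d in differences) / total ** 2

def a_from_t(t, n):
    """Algebraic link between A and the paired t: A = (n - 1) / (n * t^2) + 1 / n."""
    return (n - 1) / (n * t * t) + 1 / n

# Hypothetical paired differences D_i.
diffs = [2, 1, 0, 4, 2]
n_pairs = len(diffs)
d_bar = sum(diffs) / n_pairs
sigma_d = math.sqrt((sum(d * d for d in diffs) - n_pairs * d_bar ** 2) / (n_pairs - 1))
t_stat = d_bar / (sigma_d / math.sqrt(n_pairs))

a_stat = sandler_a(diffs)
print(round(a_stat, 4), round(a_from_t(t_stat, n_pairs), 4))
```

Because a larger t gives a smaller A, a computed A equal to or less than its table value corresponds to a significant t.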
As such the use of A-statistic
results in considerable saving of time and labour, especially when matched groups are to be compared
with respect to a large number of variables
...
Sandler’s A-statistic can as well be used “in the one sample case as a direct substitute for the
Student t-ratio
...
When we use A-test in one sample case, the following steps are involved:
(i) Subtract the hypothesised mean of the population ($\mu_H$) from each individual score ($X_i$) to
obtain $D_i$ and then work out $\sum D_i$
...
J Psych
...
225–226
...
4
Richard P
...
28
(ii) Square each $D_i$ and then obtain the sum of such squares, i.e., $\sum D_i^2$
...
(v) Finally, draw the inference as under:
When calculated value of A is equal to or less than the table value, then reject H0 (or accept
Ha) but when computed A is greater than its table value, then accept H0
...
5 of Chapter IX of this book itself
...
E) and
is considered the key to sampling theory
...
The standard error helps in testing whether the difference between observed and expected
frequencies could arise due to chance
...
E
...
E
...
This criterion is based on the fact that at X ± 3 (S
...
) the normal curve
covers an area of 99
...
Sometimes the criterion of 2 S
...
is also used in place of 3 S
...
Thus the standard error is an important measure in significance tests or in examining hypotheses
...
96 times the S
...
, the
difference is taken as significant at 5 per cent level of significance
...
e
...
5 per cent on both sides) outside
the 95 per cent area of the sampling distribution
...
In such a situation our hypothesis that there
is no difference is rejected at 5 per cent level of significance
...
96
times the S
...
, then it is considered not significant at 5 per cent level and we can say with 95 per cent
confidence that it is because of the fluctuations of sampling
...
1
...
The product of the critical value at a certain
level of significance and the S
...
is often described as ‘Sampling Error’ at that particular level of
significance
...
The following table gives some idea about the criteria at various levels for
judging the significance of the difference between observed and expected values:
Table 8
...
0%
95
...
96
196σ
...
> 196σ
...
1
...
0%
2
...
5758 σ
± 2
...
5758 σ
< 2
...
7%
99
...
55%
95
...
2
...
The smaller the
S
...
, the greater the uniformity of sampling distribution and hence, greater is the reliability of sample
...
E
...
In such a situation the unreliability of the sample is greater
...
E
...
If double
reliability is required i
...
, reducing S
...
to 1/2 of its existing magnitude, the sample size should be
increased four-fold
...
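The four-fold rule stated above can be checked with a short Python sketch. It assumes the standard error of the mean in the form σ/√n; the function name and the figures (σ = 10, n = 100) are hypothetical.

```python
import math

def standard_error(sigma, n):
    """Standard error of the mean, sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

se_original = standard_error(sigma=10, n=100)    # hypothetical starting sample
se_fourfold = standard_error(sigma=10, n=400)    # sample size increased four-fold
print(se_original, se_fourfold)
```

Since n enters the formula under a square root, quadrupling the sample size halves the standard error.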
The standard error enables us to specify the limits within which the parameters of the population
are expected to lie with a specified degree of confidence
...
The following table gives the percentage of samples having their mean values
within a range of population mean (µ) ± S
...
Table 8
...
E
...
27%
µ ± 2 S
...
95
...
E
...
73%
µ ± 196 S
...
...
00%
µ ± 2
...
E
...
00%
Important formulae for computing the standard errors concerning various measures based on
samples are as under:
(a) In case of sampling of attributes:
(i) Standard error of number of successes = $\sqrt{n \cdot p \cdot q}$
where
n = number of events in each sample,
p = probability of success in each event,
q = probability of failure in each event
...
Such a situation often arises
in study of association of attributes
...
(ii) Standard error of mean when population standard deviation is unknown:
$\sigma_{\bar{X}} = \frac{\sigma_s}{\sqrt{n}}$
where
$\sigma_s$ = standard deviation of the sample and is worked out as under
$\sigma_s = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n - 1}}$
n = number of items in the sample
...
(v) Standard error of the coefficient of simple correlation:
$\sigma_r = \frac{1 - r^2}{\sqrt{n}}$
where
r = coefficient of simple correlation
n = number of items in the sample
...
)
(b) When two samples are drawn from different populations:
$\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{\sigma_{p_1}^2}{n_1} + \frac{\sigma_{p_2}^2}{n_2}}$
(If σ p1 and σ p2 are not known, then in their places σ s1 and σ s2 respectively may
be substituted
...
But in case of finite population where sampling is done
without replacement and the sample is more than 5% of the population, we must as well use the finite
population multiplier in our standard error formulae
...
E
...
E
...
5, the finite population multiplier is generally not used
...
$\sigma_{\bar{X}} = \frac{\sigma_s}{\sqrt{n}} = \frac{\sqrt{\frac{\sum (X_i - \bar{X})^2}{n - 1}}}{\sqrt{n}}$
(ii) Standard error of difference between two sample means when $\sigma_p$ is unknown
$\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{\sum (X_{1i} - \bar{X}_1)^2 + \sum (X_{2i} - \bar{X}_2)^2}{n_1 + n_2 - 2}} \cdot \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$
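The pooled formula above can be sketched in Python as under. The function name and the two small samples are hypothetical and serve only to show how the terms combine.

```python
import math

def se_difference_of_means(sample_1, sample_2):
    """Standard error of (X1_bar - X2_bar) when sigma_p is unknown:
    sqrt((SS1 + SS2) / (n1 + n2 - 2)) * sqrt(1/n1 + 1/n2),
    where SS is the sum of squared deviations from the sample mean."""
    n1, n2 = len(sample_1), len(sample_2)
    mean_1 = sum(sample_1) / n1
    mean_2 = sum(sample_2) / n2
    ss_1 = sum((x - mean_1) ** 2 for x in sample_1)
    ss_2 = sum((x - mean_2) ** 2 for x in sample_2)
    pooled_sd = math.sqrt((ss_1 + ss_2) / (n1 + n2 - 2))
    return pooled_sd * math.sqrt(1 / n1 + 1 / n2)

# Hypothetical small samples.
se_diff = se_difference_of_means([1, 2, 3, 4, 5], [2, 4, 6, 8])
print(round(se_diff, 4))
```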
ESTIMATION
In most statistical research studies, population parameters are usually unknown and have to be
estimated from a sample
...
The random variables (such as $\bar{X}$ and $\sigma_s^2$) used to estimate population parameters, such as $\mu$ and $\sigma_p^2$, are conventionally called ‘estimators’, while specific values of these (such as $\bar{X} = 105$ or $\sigma_s^2 = 21.44$) are referred to as ‘estimates’ of the population parameters
...
population parameter may be one single value or it could be a range of values
...
The
researcher usually makes these two types of estimates through sampling analysis
...
Accordingly he must know the various properties of a good
estimator so that he can select appropriate estimators for his study
...
This is popularly known as the property of unbiasedness
...
The sample mean $\bar{X}$ is the most widely used estimator because of the fact that it provides
an unbiased estimate of the population mean µ
...
This means that the most efficient
estimator, among a group of unbiased estimators, is one which has the smallest variance
...
(iii) An estimator should use as much as possible the information available from the sample
...
(iv) An estimator should approach the value of population parameter as the sample size becomes
larger and larger
...
Keeping in view the above stated properties, the researcher must select appropriate
estimator(s) for his study
...
ESTIMATING THE POPULATION MEAN (µ)
So far as the point estimate is concerned, the sample mean X is the best estimator of the population
mean, µ , and its sampling distribution, so long as the sample is sufficiently large, approximates the
normal distribution
...
Assume that we take a sample of 36
students and find that the sample yields an arithmetic mean of 6
...
e
...
2
...
5 this time
...
9; fourth a mean of 6
...
We go on drawing such samples till we accumulate a large number of means of samples
of 36
...
When such
means are presented in the form of a distribution, the distribution happens to be quite close to normal
...
Even
if the population is not normal, the sample means drawn from that population are dispersed around
the parameter in a distribution that is generally close to normal; the mean of the distribution of sample
means is equal to the population mean
...
This relationship between a population distribution and a distribution of sample
5
C
...
145
mean is critical for drawing inferences about parameters
...
How to find σ p when we have the sample data only for our analysis? The answer is that we must
use some best estimate of σ p and the best estimate can be the standard deviation of the sample,
σ s
...
Suppose we take one sample of
36 items and work out its mean X to be equal to 6
...
8, Then the best point estimate of population mean µ is 6
...
The standard error of mean
$\sigma_{\bar{X}}$ would be $3.8/\sqrt{36}$ = 3.8/6 = 0
...
If we take the interval estimate of µ to be
...
X ± 1
...
20 ± 1.24 or from 4
...
44, it means that there is a 95 per cent chance that
...
96 to 7
...
In other words, this means that if we were to take
a complete census of all items in the population, the chances are 95 to 5 that we would find the
population mean lies between 4
...
44*
...
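The interval estimate discussed above can be reproduced with a few lines of Python. The inputs (sample mean 6.20, sample standard deviation 3.8, n = 36, z = 1.96) are the figures of the example as understood here and should be treated as assumptions; the function name is hypothetical.

```python
import math

def mean_confidence_interval(x_bar, sigma_s, n, z):
    """Interval estimate X_bar ± z * (sigma_s / sqrt(n)) for a large sample."""
    standard_error = sigma_s / math.sqrt(n)
    margin = z * standard_error
    return x_bar - margin, x_bar + margin

# Assumed inputs: mean 6.20, sample s.d. 3.8, n = 36, 95 per cent confidence.
lower_95, upper_95 = mean_confidence_interval(x_bar=6.20, sigma_s=3.8, n=36, z=1.96)
print(round(lower_95, 2), round(upper_95, 2))
```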
Usually
we think of increasing the sample size till we can secure the desired interval estimate and the degree
of confidence
...
5 years respectively
...
Solution: The given information can be written as under:
6 To make the sample standard deviation an unbiased estimate of the population, it is necessary to divide $\sum (X_i - \bar{X})^2$
by (n – 1) and not by simply (n)
...
n = 36
X = 40 years
σ s = 4
...
96 (as per the normal curve area table)
...
5
or
40 ± 1
...
96 0
...
47 years
36
b gb g
Illustration 2
In a random selection of 64 of the 2400 intersections in a small city, the mean number of scooter
accidents per year was 3
...
8
...
(2) Work out the standard error of mean for this finite population
...
90, what will be the upper and lower limits of the confidence
interval for the mean number of accidents per intersection per year?
Solution: The given information can be written as under:
N = 2400 (This means that population is finite)
n = 64
X = 3
...
8
and the standard variate (z) for 90 per cent confidence is 1
...
Now we can answer the given questions thus:
(1) The best point estimate of the standard deviation of the population is the standard deviation
of the sample itself
...
8
(2) Standard error of mean for the given finite population is as follows:
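$\sigma_{\bar{X}} = \frac{\sigma_s}{\sqrt{n}} \times \sqrt{\frac{N - n}{N - 1}}$
$= \frac{0.8}{\sqrt{64}} \times \sqrt{\frac{2400 - 64}{2400 - 1}}$
$= \frac{0.8}{\sqrt{64}} \times \sqrt{\frac{2336}{2399}}$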
= (0
...
97)
=
...
2 ± b1
...
097g
X±z
s
= 3
...
16 accidents per intersection
...
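The finite population working of Illustration 2 can be sketched in Python as under. N = 2400 and n = 64 come from the illustration; the sample standard deviation (0.8), the sample mean (3.2) and the z value for 90 per cent confidence (1.645) are assumed here, and the function name is hypothetical.

```python
import math

def se_mean_finite(sigma_s, n, population_size):
    """Standard error of the mean multiplied by the finite population
    multiplier sqrt((N - n) / (N - 1))."""
    multiplier = math.sqrt((population_size - n) / (population_size - 1))
    return sigma_s / math.sqrt(n) * multiplier

N, n = 2400, 64                         # from the illustration
sigma_s, x_bar, z = 0.8, 3.2, 1.645     # assumed values
se_finite = se_mean_finite(sigma_s, n, N)
lower_90 = x_bar - z * se_finite
upper_90 = x_bar + z * se_finite
print(round(se_finite, 4), round(lower_90, 2), round(upper_90, 2))
```

With only 64 of 2400 intersections sampled, the multiplier is close to 1, so the correction changes the standard error only slightly.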
But how to handle estimation problem when population standard deviation is not known and
the sample size is small (i
...
, when n < 30 )? In such a situation, normal distribution is not appropriate,
but we can use t-distribution for our purpose
...
There is a different t-distribution for each of the possible degrees of
freedom
...
Let
us illustrate this by taking an example
...
8 tons per shift and the sample standard deviation to be 2
...
Construct a 90 per cent confidence interval around this estimate
...
The
given information can be written as under:
X = 36
...
8 tons per shift
n=4
degrees of freedom = n – 1 = 4 – 1 = 3 and the critical value of ‘t’ for 90 per cent confidence interval
or at 10 per cent level of significance is 2
...
f
...
Thus, 90 per cent confidence interval for population mean is
X±t
= 36
...
353
2
...
8 ± 2
...
= 36
...
294 tons per shift
...
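The small-sample interval above can be sketched in Python as follows. The critical value of t (2.353 for 3 degrees of freedom at the 10 per cent level) is taken as given from the table, the sample standard deviation is assumed to be 2.8 tons, and the function name is hypothetical.

```python
import math

def small_sample_interval(x_bar, sigma_s, n, t_critical):
    """Interval estimate X_bar ± t * (sigma_s / sqrt(n)) for a small sample,
    with t read from the table for (n - 1) degrees of freedom."""
    margin = t_critical * sigma_s / math.sqrt(n)
    return x_bar - margin, x_bar + margin, margin

# Assumed inputs: mean 36.8 tons, s.d. 2.8 tons, n = 4, table t = 2.353.
lower_t, upper_t, margin_t = small_sample_interval(36.8, 2.8, 4, 2.353)
print(round(margin_t, 3), round(lower_t, 3), round(upper_t, 3))
```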
Thus, if we take a
random sample of 50 items and find that 10 per cent of these are defective i
...
, p =
...
10) as best estimator of the population proportion p = p =
...
In case we want to construct a confidence interval to estimate a population proportion, we should use the
binomial distribution with the mean of population $\mu = n \cdot p$, where n = number of trials, p =
probability of a success in any of the trials, and population standard deviation = $\sqrt{n \cdot p \cdot q}$
...
The mean of the sampling distribution of the
proportion of successes ($\mu_p$) is taken as equal to p and the standard deviation for the proportion of
successes, also known as the standard error of proportion, is taken as equal to $\sqrt{\frac{p \, q}{n}}$
...
We now illustrate the use of this formula by an example
...
Find the confidence
limits for the proportion of consumers motivated by advertising in the population, given a confidence
level equal to 0
...
Solution: The given information can be written as under:
n = 64
p = 64% or
...
64 =
...
96 (as per the normal curve area table)
...
64g b0
...
64 ± 1
...
64 ± 196
...
=
...
1176
Thus, lower confidence limit is 52
...
76%
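The confidence limits just obtained can be checked with a short Python sketch using the figures of the example (n = 64, p = .64, z = 1.96). The function name is hypothetical.

```python
import math

def proportion_confidence_interval(p, n, z):
    """Interval estimate p ± z * sqrt(p * q / n), with q = 1 - p."""
    q = 1 - p
    margin = z * math.sqrt(p * q / n)
    return p - margin, p + margin

lower_p, upper_p = proportion_confidence_interval(p=0.64, n=64, z=1.96)
print(round(lower_p * 100, 2), round(upper_p * 100, 2))  # limits in per cent
```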
For the sake of convenience, we can summarise the formulae which give confidence intervals
while estimating population mean (µ) and the population proportion ($\hat{p}$) as shown in the following
table
...
3: Summarising Important Formulae Concerning Estimation
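Estimating population mean (µ) when we know $\sigma_p$:
    In case of infinite population: $\bar{X} \pm z \cdot \frac{\sigma_p}{\sqrt{n}}$
    In case of finite population*: $\bar{X} \pm z \cdot \frac{\sigma_p}{\sqrt{n}} \times \sqrt{\frac{N - n}{N - 1}}$
Estimating population mean (µ) when we do not know $\sigma_p$ (using z):
    In case of infinite population: $\bar{X} \pm z \cdot \frac{\sigma_s}{\sqrt{n}}$
    In case of finite population*: $\bar{X} \pm z \cdot \frac{\sigma_s}{\sqrt{n}} \times \sqrt{\frac{N - n}{N - 1}}$
Estimating population mean (µ) when we do not know $\sigma_p$ (using t):
    In case of infinite population: $\bar{X} \pm t \cdot \frac{\sigma_s}{\sqrt{n}}$
    In case of finite population*: $\bar{X} \pm t \cdot \frac{\sigma_s}{\sqrt{n}} \times \sqrt{\frac{N - n}{N - 1}}$
Estimating the population proportion:
    In case of infinite population: $p \pm z \cdot \sqrt{\frac{p \, q}{n}}$
    In case of finite population*: $p \pm z \cdot \sqrt{\frac{p \, q}{n}} \times \sqrt{\frac{N - n}{N - 1}}$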
* In case of finite population, the standard error has to be multiplied by the finite population multiplier viz
...
SAMPLE SIZE AND ITS DETERMINATION
In sampling analysis the most ticklish question is: What should be the size of the sample or how large
or small should be ‘n’? If the sample size (‘n’) is too small, it may not serve to achieve the objectives
and if it is too large, we may incur huge cost and waste resources
...
e
...
Technically, the sample size should be large enough to give a confidence interval of desired width and
as such the size of the sample must be chosen by some logical process before sample is taken from
the universe
...
If
the items of the universe are homogenous, a small sample can serve the purpose
...
Technically, this can be termed
as the dispersion factor
...
(iii) Nature of study: If items are to be intensively and continuously studied, the sample should
be small
...
(iv) Type of sampling: Sampling technique plays an important part in determining the size of the
sample
...
(v) Standard of accuracy and acceptable confidence level: If the standard of accuracy or
the level of precision is to be kept high, we shall require relatively larger sample
...
(vi) Availability of finance: In practice, size of the sample depends upon the amount of money
available for the study purposes
...
(vii) Other considerations: Nature of units, size of the population, size of questionnaire, availability
of trained investigators, the conditions under which the sample is being conducted, the time
available for completion of the study are a few other considerations to which a researcher
must pay attention while selecting the size of the sample
...
The first approach
is “to specify the precision of estimation desired and then to determine the sample size necessary to
insure it” and the second approach “uses Bayesian statistics to weigh the cost of additional information
against the expected value of the additional information
...
The limitation
of this technique is that it does not analyse the cost of gathering information vis-a-vis the expected
value of information
...
Hence, we shall mainly concentrate
here on the first approach
...
Researcher will have to
specify the precision that he wants in respect of his estimates concerning the population parameters
...
In this case we will say that the desired precision is ± 3 , i
...
, if the
sample mean is Rs 100, the true value of the mean will be no less than Rs 97 and no more than
Rs 103
...
Keeping this in view,
we can now explain the determination of sample size so that specified precision is ensured
...
96 for a 95%
confidence level;
n = size of the sample;
7
Rodney D
...
Siskih, Quantitative Techniques for Business Decisions, p
...
$\sigma_p$ = standard deviation of the population (to be estimated from past experience or on the basis of
a trial sample)
...
8 for our purpose
...
8
e = z⋅
or 3 = 1
...
96g b4
...
834 ≅ 10
...
But in case of finite population, the
above stated formula for determining sample size will become
$n = \frac{z^2 \cdot N \cdot \sigma_p^2}{(N - 1) \, e^2 + z^2 \sigma_p^2}$ *
* In case of finite population the confidence interval for µ is given by
$\bar{X} \pm z \frac{\sigma_p}{\sqrt{n}} \times \sqrt{\frac{N - n}{N - 1}}$
where $\sqrt{\frac{N - n}{N - 1}}$ is the finite population multiplier and all other terms mean the same thing as stated above
...
where
N = size of population
n = size of sample
e = acceptable error (the precision)
σ p = standard deviation of population
z = standard variate at a given confidence level
...
(2) estimate should be within 0
...
Will there be a change in the size of the sample if we assume infinite population in the given
case? If so, explain by how much?
Solution: In the given problem we have the following:
N = 5000;
σ p = 2 ounces (since the variance of weight = 4 ounces);
e = 0
...
8 ounces of the true average weight);
z = 2
...
Hence, the confidence interval for µ is given by
$\bar{X} \pm z \cdot \frac{\sigma_p}{\sqrt{n}} \cdot \sqrt{\frac{N - n}{N - 1}}$
and accordingly the sample size can be worked out as under:
$n = \frac{z^2 \cdot N \cdot \sigma_p^2}{(N - 1) \, e^2 + z^2 \sigma_p^2}$
b2
...
8g + b2
...
95 ≅ 41
3199
...
4196 3225
...
But if we take population to be infinite, the sample size will be worked
out as under:
$n = \frac{z^2 \sigma_p^2}{e^2}$
b2
...
8g
2
=
2
2
=
26
...
28 ~ 41
−
0
...
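The two sample size formulae for estimating a mean can be compared in a short Python sketch. N = 5000 and $\sigma_p$ = 2 ounces come from the illustration; the acceptable error (e = 0.8) and the standard variate (z = 2.57) are assumed here, and the function names are hypothetical.

```python
def sample_size_mean_infinite(z, sigma_p, e):
    """n = z^2 * sigma_p^2 / e^2 for an infinite population."""
    return (z ** 2) * (sigma_p ** 2) / (e ** 2)

def sample_size_mean_finite(z, sigma_p, e, population_size):
    """n = z^2 * N * sigma_p^2 / ((N - 1) * e^2 + z^2 * sigma_p^2)."""
    numerator = (z ** 2) * population_size * (sigma_p ** 2)
    denominator = (population_size - 1) * (e ** 2) + (z ** 2) * (sigma_p ** 2)
    return numerator / denominator

# N and sigma_p from the illustration; z and e are assumed values.
n_finite = sample_size_mean_finite(z=2.57, sigma_p=2, e=0.8, population_size=5000)
n_infinite = sample_size_mean_infinite(z=2.57, sigma_p=2, e=0.8)
print(round(n_finite, 2), round(n_infinite, 2))
```

Both results round to the same whole number of items, so with a population this large relative to the sample the finite population assumption makes practically no difference.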
In the above illustration, the standard deviation of the population was given, but in many cases
the standard deviation of the population is not available
...
In such a situation, if we have an idea about the range (i
...
, the difference between the
highest and lowest values) of the population, we can use that to get a crude estimate of the standard
deviation of the population for getting a working idea of the required sample size
...
7 per cent of the area under normal curve lies within the range of ± 3 standard deviations,
we may say that these limits include almost all of the distribution
...
Thus, a rough estimate of the population
standard deviation would be:
$6\hat{\sigma}$ = the given range
or
$\hat{\sigma} = \frac{\text{the given range}}{6}$
If the range happens to be, say Rs 12, then
$\hat{\sigma} = \frac{12}{6}$ = Rs 2
...
(b) Sample size when estimating a percentage or proportion: If we are to find the sample size for
estimating a proportion, our reasoning remains similar to what we have said in the context of estimating
the mean
...
Since $\hat{p}$ is actually what we are trying to estimate, what value should we assign to it? One
method may be to take the value of p = 0
...
This will be the most conservative sample size
...
In this context it has been suggested that a pilot study of something
like 225 or more items may result in a reasonable approximation of p value
...
But in case of finite population the above stated formula will be
changed as under:
$n = \frac{z^2 \cdot p \cdot q \cdot N}{e^2 (N - 1) + z^2 \cdot p \cdot q}$
Illustration 6
What should be the size of the sample if a simple random sample from a population of 4000 items is
to be drawn to estimate the per cent defective within 2 per cent of the true value with 95
...
02 (since the estimate should be within 2% of true value);
z = 2
...
5%)
...
02 (This may be on the basis of our experience or on the basis of past data or
may be the result of a pilot study)
...
005g b
...
02g b4000g
=
b
...
005g b
...
02g
2
2
=
2
3151699
...
=
= 187
...
0788
...
But if the population happens to be infinite, then our sample size will be as under:
$n = \frac{z^2 \cdot p \cdot q}{e^2}$
b2
...
02g b1 −
...
02g
2
2
=
...
98 ~ 197
...
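The working of Illustration 6 can be sketched in Python as under. N = 4000 comes from the illustration; e = .02, z = 2.005 and p = .02 are the values as understood here and should be treated as assumptions. The function names are hypothetical.

```python
def sample_size_proportion_infinite(z, p, e):
    """n = z^2 * p * q / e^2 for an infinite population, with q = 1 - p."""
    return (z ** 2) * p * (1 - p) / (e ** 2)

def sample_size_proportion_finite(z, p, e, population_size):
    """n = z^2 * p * q * N / (e^2 * (N - 1) + z^2 * p * q)."""
    zpq = (z ** 2) * p * (1 - p)
    return zpq * population_size / ((e ** 2) * (population_size - 1) + zpq)

# N from the illustration; z, p and e are assumed values.
n_prop_finite = sample_size_proportion_finite(z=2.005, p=0.02, e=0.02, population_size=4000)
n_prop_infinite = sample_size_proportion_infinite(z=2.005, p=0.02, e=0.02)
print(round(n_prop_finite, 2), round(n_prop_infinite, 2))
```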
The reservation manager wants to be 95 per cent confident that the
percentage has been estimated to be within ± 3% of the true value
...
03 (since the estimate should be within 3% of the true value);
z = 1
...
As we want the most conservative sample size we shall take the value of p =
...
5
...
b g ⋅ b
...
5g =
...
11 ~ 1067
=
...
03g
2
2
Thus, the most conservative sample size needed for the problem is = 1067
...
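Why p = .5 gives the most conservative sample size can be seen from a short Python sketch: the product p·q in the formula is largest at p = .5. The values z = 1.96 and e = .03 follow the illustration above; the function name and the grid of trial p values are hypothetical.

```python
def sample_size_for_proportion(z, p, e):
    """n = z^2 * p * q / e^2, with q = 1 - p (infinite population)."""
    return (z ** 2) * p * (1 - p) / (e ** 2)

# Try p = 0.1, 0.2, ..., 0.9 and see which value demands the largest sample.
sizes_by_p = {k / 10: sample_size_for_proportion(1.96, k / 10, 0.03) for k in range(1, 10)}
most_demanding_p = max(sizes_by_p, key=sizes_by_p.get)
print(most_demanding_p, round(sizes_by_p[most_demanding_p]))
```

Any other assumed value of p would call for a smaller sample, which is why p = .5 is the safe choice when nothing is known about the true proportion.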
The procedure for finding the optimal value of ‘n’ or the size of sample under this approach is as under:
(i) Find the expected value of the sample information (EVSI)* for every possible n;
(ii) Also work out a reasonably approximated cost of taking a sample of every possible n;
(iii) Compare the EVSI and the cost of the sample for every possible n
...
The computation of EVSI for every possible n and then comparing the same with the respective
cost is often a very cumbersome task and is generally feasible with mechanised or computer help
...
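The steps above lend themselves to a simple computer search. The Python sketch below is only an illustration under loudly stated assumptions: the EVSI curve and the cost function are invented for the purpose, and the rule of choosing the n with the largest excess of EVSI over cost is an assumed selection criterion, not one quoted from the text.

```python
import math

def evsi(n):
    """Hypothetical EVSI curve: rises with n but with diminishing returns."""
    return 1000 * (1 - math.exp(-n / 50))

def sampling_cost(n):
    """Hypothetical cost: a fixed charge plus a constant cost per item."""
    return 200 + 4 * n

def net_gain(n):
    """Excess of EVSI over the cost of a sample of size n."""
    return evsi(n) - sampling_cost(n)

# Step (iii): compare EVSI and cost for every candidate n (here 1 to 500).
candidate_sizes = range(1, 501)
best_n = max(candidate_sizes, key=net_gain)
print(best_n, round(net_gain(best_n), 2))
```

In a real study both functions would have to come from the decision problem itself, which is what makes the approach cumbersome without computer help.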
Questions
1
...
3
...
5
...
7
...
Explain the meaning and significance of the concept of “Standard Error” in sampling analysis
...
State the reasons why sampling is used in the context of research studies
...
Distinguish between the following:
(a) Statistic and parameter;
(b) Confidence level and significance level;
(c) Random sampling and non-random sampling;
(d) Sampling of attributes and sampling of variables;
(e) Point estimate and interval estimation
...
500 articles were selected at random out of a batch containing 10000 articles and 30 were found defective
...
Estimate the population proportion at 95% confidence level
...
A sample of 16 measurements of the diameter of a sphere gave a mean $\bar{X}$ = 4
...
08 inches
...
10
...
Show that the standard error of the population of bad ones in a sample of this size is 0
...
5 and 17
...
*
EVSI happens to be the difference between the expected value with sampling and the expected value without sampling
...
11
...
Estimate the percentage of defective iron nails in the packet and assign limits within which the
percentage probably lies
...
A random sample of 200 measurements from an infinite population gave a mean value of 50 and a
standard deviation of 9
...
13
...
Deduce
that the percentage of bad mangoes in the consignment almost certainly lies between 31
...
75
given that the standard error of the proportion of bad mangoes in the sample 1/16
...
A random sample of 900 members is found to have a mean of 4
...
Can it be reasonably regarded as
a sample from a large population whose mean is 5 cms and variance is 4 cms?
15
...
To test this claim, 9 randomly selected
individuals were examined and the average excess weight was found to be 18 pounds
...
The foreman of a certain mining company has estimated the average quantity of ore extracted to be 34
...
8 tons per shift, based upon a random selection
of 6 shifts
...
17
...
(Is the sample representative of a large consignment with a
mean of 130 ml
...
? Mention the level of significance you use
...
A sample of 900 days is taken from meteorological records of a certain district and 100 of them are found
to be foggy
...
Suppose the following ten values represent random observations from a normal parent population:
2, 6, 7, 9, 5, 1, 0, 3, 5, 4
...
20
...
Set
98% confidence limits on the true proportion of all Playboy readers with this background
...
(a) What are the alternative approaches of determining a sample size? Explain
...
45%
probability
...
Phil
...
(EAFM) RAJ
...
1979]
22
...
(ii) Variance of weight of the cereal containers on the basis of past records = 8 kg
...
4 kg
...
(b) What would be the size of the sample if infinite universe is assumed in question number 22 (a) above?
23
...
If the Corporation wants to be 95% confident that the true mean of this year’s salesmen’s
income does not differ by more than 2% of the last year’s mean income of Rs 12,000, what sample size
would be required assuming the population standard deviation to be Rs 1500?
[M
...
(EAFM) Special Exam
...
Uni
...
Mr
...
He is interested in determining at a confidence
level of 95% what proportion (within plus or minus 4%), is defective
...
5 and q =
...
25
...
How large should the sample size be for the team to be 98% certain that the sample
proportion of cure is within plus and minus 2% of the proportion of all cases that the drug will cure?
26
...
Kishore wants to determine the average time required to complete a job with which he is concerned
...
How large should the sample be so
that Mr
...
Its main function is to
suggest new experiments and observations
...
Decision-makers often face situations wherein they are
interested in testing hypotheses on the basis of available information and then take decisions on the
basis of such testing
...
Thus hypothesis testing enables us to make probability
statements about population parameter(s)
...
Before we explain how hypotheses are tested
through different tests meant for the purpose, it will be appropriate to explain clearly the meaning of
a hypothesis and the related concepts for better understanding of the hypothesis testing techniques
...
But for a researcher hypothesis is a formal question that he intends to
resolve
...
Quite often a research hypothesis is a predictive statement, capable of being tested
by scientific methods, that relates an independent variable to some dependent variable
...
”
These are hypotheses capable of being objectively verified and tested
...
Testing of Hypotheses I
185
Characteristics of hypothesis: Hypothesis must possess the following characteristics:
(i) Hypothesis should be clear and precise
...
(ii) Hypothesis should be capable of being tested
...
Some prior study may be done by
researcher in order to make hypothesis a testable one
...
”1
(iii) Hypothesis should state relationship between variables, if it happens to be a relational
hypothesis
...
A researcher must remember
that narrower hypotheses are generally more testable and he should develop such hypotheses
...
But one must remember that simplicity of hypothesis
has nothing to do with its significance
...
e
...
In other words, it should be one which judges accept
as being the most likely
...
One should not use
even an excellent hypothesis, if the same cannot be tested in reasonable time for one
cannot spend a life-time collecting data to test it
...
This means
that by using the hypothesis plus other known and accepted generalizations, one should be
able to deduce the original problem condition
...
BASIC CONCEPTS CONCERNING TESTING OF HYPOTHESES
Basic concepts in the context of testing of hypotheses need to be explained
...
If we are to compare method A with method B
about its superiority and if we proceed on the assumption that both methods are equally good, then
this assumption is termed as the null hypothesis
...
The
null hypothesis is generally symbolized as H0 and the alternative hypothesis as Ha
...
Then we would say that the null hypothesis is that the population mean is equal to the hypothesised
mean 100 and symbolically we can express as:
H0 : µ = µ H0 = 100
1
C
...
33
...
What we conclude rejecting the null hypothesis is known as alternative hypothesis
...
If we
accept H 0, then we are rejecting H a and if we reject H 0, then we are accepting H a
...
1
Alternative hypothesis
Ha : µ ≠ µ H0
To be read as follows
(The alternative hypothesis is that the population mean is not
equal to 100 i
...
, it may be more or less than 100)
Ha : µ > µ H0
(The alternative hypothesis is that the population mean is greater
than 100)
Ha : µ < µ H0
(The alternative hypothesis is that the population mean is less
than 100)
The null hypothesis and the alternative hypothesis are chosen before the sample is drawn (the researcher
must avoid the error of deriving hypotheses from the data that he collects and then testing the
hypotheses from the same data)
...
Thus, a null hypothesis represents the hypothesis
we are trying to reject, and alternative hypothesis represents all other possibilities
...
(c) Null hypothesis should always be specific hypothesis i
...
, it should not state about or
approximately a certain value
...
Why so? The answer is that on the assumption that null hypothesis is true, one
can assign the probabilities to different possible sample results, but this cannot be done if we proceed
with the alternative hypothesis
...
(b) The level of significance: This is a very important concept in the context of hypothesis testing
...
In case we take the significance level at 5 per cent, then this implies that H0 will be rejected
*
If a hypothesis is of the type µ = µ H0 , then we call such a hypothesis as simple (or specific) hypothesis but if it is
of the type µ ≠ µ H or µ > µ H or µ < µ H , then we call it a composite (or nonspecific) hypothesis
...
e
...
05 probability of occurring if H0
is true
...
Thus the
significance level is the maximum value of the probability of rejecting H0 when it is true and is usually
determined in advance before testing the hypothesis
...
e
...
e
...
For instance, if H0 is that a certain lot is good (there are very few
defective items in it) against Ha that the lot is not good (there are too many defective items in it),
then we must decide the number of items to be tested and the criterion for accepting or rejecting the
hypothesis
...
This
sort of basis is known as decision rule
...
We may reject H0 when H0 is true and we may accept H0 when in fact H0 is
not true
...
In other words, Type I
error means rejection of hypothesis which should have been accepted and Type II error means
accepting the hypothesis which should have been rejected
...
In a tabular form the said two errors can be presented as follows:
Table 9
...
If type I error is fixed at 5 per cent, it means that there are
about 5 chances in 100 that we will reject H0 when H0 is true
...
For instance, if we fix it at 1 per cent, we will say that the maximum
probability of committing Type I error would only be 0
...
But with a fixed sample size, n, when we try to reduce Type I error, the probability of committing
Type II error increases
...
There is a trade-off
between two types of errors which means that the probability of making one type of error can only
be reduced if we are willing to increase the probability of making the other type of error
...
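The trade-off between the two types of errors can be illustrated numerically. The Python sketch below assumes a right-tailed z-test with known standard deviation; the hypothesised mean, the true mean, σ and n are all hypothetical values chosen for illustration, and the function name is not from the text.

```python
from statistics import NormalDist

def type_ii_error(alpha, mu_0, mu_true, sigma, n):
    """Probability of accepting H0: mu = mu_0 in a right-tailed z-test
    when the population mean is really mu_true (> mu_0)."""
    standard_error = sigma / n ** 0.5
    # Sample means above this limit lead to rejection of H0.
    critical_mean = mu_0 + NormalDist().inv_cdf(1 - alpha) * standard_error
    # Type II error: the sample mean falls below the limit although H0 is false.
    return NormalDist(mu_true, standard_error).cdf(critical_mean)

# Hypothetical setting: H0 says mu = 100, the true mean is 103, sigma = 10, n = 36.
beta_at_5_per_cent = type_ii_error(0.05, 100, 103, 10, 36)
beta_at_1_per_cent = type_ii_error(0.01, 100, 103, 10, 36)
print(round(beta_at_5_per_cent, 3), round(beta_at_1_per_cent, 3))
```

With the sample size held fixed, lowering the Type I error from 5 per cent to 1 per cent raises the Type II error, which is the trade-off described above.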
If Type I error involves the time and
trouble of reworking a batch of chemicals that should have been accepted, whereas Type II error
means taking a chance that an entire group of users of this chemical compound will be poisoned, then
in such a situation one should prefer a Type I error to a Type II error
...
2 Hence, in the testing of
hypothesis, one must make all possible effort to strike an adequate balance between Type I and Type
II errors
...
A two-tailed test rejects the null hypothesis if, say, the
sample mean is significantly higher or lower than the hypothesised value of the mean of the population
...
Symbolically, the two-tailed test is appropriate when we have $H_0: \mu = \mu_{H_0}$ and $H_a: \mu \neq \mu_{H_0}$, which may mean $\mu > \mu_{H_0}$
or $\mu < \mu_{H_0}$
...
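[Figure: acceptance and rejection regions in case of a two-tailed test with 5 per cent significance level: 0.475 of area on each side of the mean (0.95 or 95% of area in all) forms the acceptance region, and 0.025 of area in each tail forms the rejection region]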
...
9
...
Levin, Statistics for Management, p
...
Also known as critical regions
...
96
Mathematically we can state:
Acceptance Region A : |Z| < 1
...
Rejection Region R : |Z| > 1.96
If the significance level is 5 per cent and the two-tailed test is to be applied, the probability of the
rejection area will be 0
...
025) and that of the
acceptance region will be 0
...
If we take µ = 100 and if our sample
mean deviates significantly from 100 in either direction, then we shall reject the null hypothesis; but
if the sample mean does not deviate significantly from µ , in that case we shall accept the null
hypothesis
...
A one-tailed test
would be used when we are to test, say, whether the population mean is either lower than or higher
than some hypothesised value
...
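[Figure: acceptance and rejection regions in case of a one-tailed (left-tailed) test with 5 per cent significance level: rejection region of 0.05 of area in the left tail, with the limit at Z = –1.645; reject H0 if the sample mean ($\bar{X}$) falls in this region]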
Fig
...
2
Mathematically we can state:
Acceptance Region A : Z > −1
...
If our µ = 100 and if our sample mean deviates significantly from 100 in the lower direction, we
shall reject H0; otherwise we shall accept H0 at a certain level of significance
...
05 of area in the left
tail as has been shown in the above curve
...
[Figure: acceptance and rejection regions in a one-tailed (right-tailed) test; the rejection region holds 0.05 of area in the right tail; reject H0 if the sample mean falls in this region]
Mathematically we can state:
Acceptance Region A : Z < 1
...
Rejection Region R : Z > 1.645
If our µ = 100 and if our sample mean deviates significantly from 100 in the upward direction, we
shall reject H0, otherwise we shall accept the same
...
05 of area in the right-tail as has been shown in the
above curve
...
We only mean that there is no statistical evidence to reject it, but
we are certainly not saying that H0 is true (although we behave as if H0 is true)
...
In hypothesis testing the main question is: whether to accept the
null hypothesis or not to accept the null hypothesis? Procedure for hypothesis testing refers to all
those steps that we undertake for making a choice between the two actions, i.e., rejection and
acceptance of a null hypothesis
...
This means that hypotheses should be clearly stated,
considering the nature of the research problem
...
Mohan of the Civil Engineering
Department wants to test the load bearing capacity of an old bridge which must be more than 10
tons, in that case he can state his hypotheses as under:
Null hypothesis H0 : µ = 10 tons
Alternative Hypothesis Ha : µ > 10 tons
Take another example
...
To evaluate a state’s education system, the average score of 100 of the state’s students selected on
random basis was 75
...
In such a situation the hypotheses may be stated as under:
Null hypothesis H0 : µ = 80
Alternative Hypothesis Ha : µ ≠ 80
The formulation of hypotheses is an important step which must be accomplished with due care in
accordance with the object and nature of the problem under consideration
...
If Ha is of the type greater than (or of the type
lesser than), we use a one-tailed test, but when Ha is of the type “whether greater or smaller” then
we use a two-tailed test
...
Generally, in practice, either 5% level or 1% level is
adopted for the purpose
...
In brief, the level of
significance must be adequate in the context of the purpose and nature of enquiry
...
The choice generally remains
between normal distribution and the t-distribution
...
(iv) Selecting a random sample and computing an appropriate value: Another step is to select
a random sample(s) and compute an appropriate value from the sample data concerning the test
statistic utilizing the relevant distribution
...
(v) Calculation of the probability: One has then to calculate the probability that the sample result
would diverge as widely as it has from expectations, if the null hypothesis were in fact true
...
If the calculated probability is equal to or smaller than the α value in case of one-tailed test (and α/2 in case of two-tailed test), then reject the null hypothesis (i.e., accept the alternative hypothesis), but if the calculated probability is greater, then accept the null hypothesis
...
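The decision rule in the last step can be sketched as below. This is only an illustration under stated assumptions: the helper name `decide` is hypothetical, the test statistic is assumed to be a standard normal z that already lies in the direction stated by Ha, and z = 1.8 is an invented value.

```python
from statistics import NormalDist


def decide(z, alpha, two_tailed):
    """Compare the tail probability of z with alpha (or alpha / 2)."""
    # Probability of a result diverging at least this far if H0 were true.
    tail_prob = 1 - NormalDist().cdf(abs(z))
    # One-tailed tests use alpha; two-tailed tests use alpha / 2 per tail.
    threshold = alpha / 2 if two_tailed else alpha
    decision = "reject H0" if tail_prob <= threshold else "accept H0"
    return tail_prob, decision


# Hypothetical z value: significant one-tailed, not significant two-tailed.
print(decide(1.8, 0.05, two_tailed=False))
print(decide(1.8, 0.05, two_tailed=True))
```

The same observed value can therefore lead to different decisions depending on whether Ha is one-sided or two-sided, which is why the form of Ha has to be fixed before the data are examined.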
FLOW DIAGRAM FOR HYPOTHESIS TESTING
The above stated general procedure for hypothesis testing can also be depicted in the form of a flowchart for better understanding as shown in Fig
...
4:3
FLOW DIAGRAM FOR HYPOTHESIS TESTING
State H0 as well as Ha
↓
Specify the level of significance (or the α value)
↓
Decide the correct sampling distribution
↓
Sample a random sample(s) and work out an appropriate value from sample data
↓
Calculate the probability that sample result would diverge as widely as it has from expectations, if H0 were true
↓
Is this probability equal to or smaller than α value in case of one-tailed test and α/2 in case of two-tailed test?
Yes → Reject H0, thereby run the risk of committing Type I error
No → Accept H0, thereby run some risk of committing Type II error
Fig
...
4
3
Based on the flow diagram in William A
...
Irwin INC
...
48
...
The probability
of Type I error is denoted as α (the significance level of the test) and the probability of Type II error
is referred to as β
...
But what can we say about β ? We all know that
hypothesis test cannot be foolproof; sometimes the test does not reject H0 when it happens to be a
false one and this way a Type II error is made
...
Alternatively, we would like 1 – β
(the probability of rejecting H0 when H0 is not true) to be as large as possible
...
e
...
0), we can infer that the test is working quite well, meaning thereby
that the test is rejecting H0 when it is not true and if 1 – β is very much nearer to 0
...
Accordingly 1 – β value is the measure of how well the test is working or what is technically
described as the power of the test
...
Thus
power curve of a hypothesis test is the curve that shows the conditional probability of rejecting H0 as
a function of the population parameter and size of the sample
...
In other words, the power
function of a test is that function defined for all values of the parameter(s) which yields the probability
that H0 is rejected and the value of the power function at a specific parameter point is called the
power of the test at that point
...
e
...
We know that this probability is simply the significance level of the test, and as such
the power curve of a test terminates at a point that lies at a height of α (the significance level)
directly over the population parameter
...
If power function is represented as H and operating characteristic function as L, then we have
L = 1 – H
...
How to compute the power of a test (i.e., 1 – β) can be explained through
examples
...
batch with a corresponding standard deviation of 5 lbs
...
of waste per batch
...
Compute the power of the test for µ = 16 lbs
...
would be affected?
Solution: As we want to test the hypothesis that the average quantity of waste per batch of 60 lbs
...
, we can write
as under:
H0 : µ < 15 lbs
...
As Ha is one-sided, we shall use the one-tailed test (in the right tail because Ha is of more than type)
at 10% level for finding the value of standard deviate (z), corresponding to
...
28 as per normal curve area table
...
28 (α p / n )
Accept H0 if $\bar{X} < 15 + 1.28\,(5/\sqrt{100})$
...
64
at 10% level of significance otherwise accept Ha
...
which does not come in the acceptance region as above
...
For finding
the power of the test, we first calculate β and then subtract it from one
...
We can now
write β = p (Accept H0 : µ < 15 | µ = 16)
...
64 (at 10% level of significance), therefore β = p ( X < 15
...
2358
m=
X = 15
...
9
...
1
...
0
...
64 and 16 in the above curve first
by finding z and then using the area table for the purpose
...
64 − 16) / (5/ 100 ) = − 0
...
2642
...
5000 –
0
...
2358 and the power of the test = (1 – β ) = (1 –
...
7642 for µ = 16
...
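The power calculation for µ = 16 can be checked with a short sketch. It takes the figures of the worked example (hypothesised mean 15 lbs, standard deviation 5 lbs, n = 100, the 10 per cent table value 1.28) and uses `statistics.NormalDist` in place of the normal curve area table; the variable names are assumptions introduced here.

```python
from math import sqrt
from statistics import NormalDist

mu_h0, sigma, n = 15, 5, 100   # figures from the worked example
z_table = 1.28                 # one-tailed 10% value from the area table
true_mu = 16                   # value of mu at which power is wanted

std_error = sigma / sqrt(n)
# H0 is accepted when the sample mean falls below this cut-off.
cutoff = mu_h0 + z_table * std_error
# beta = P(accept H0 | mu = 16), read from a normal curve centred at 16.
beta = NormalDist(true_mu, std_error).cdf(cutoff)
power = 1 - beta

print(round(cutoff, 2), round(beta, 4), round(power, 4))
```

Changing `z_table` to the value for another significance level shows how α, β and the power move against one another.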
84 5
100
j
X < 15
...
42 µ = 16
or β =
...
b
g b
g
Hence, 1 − β = 1 −
...
8770
TESTS OF HYPOTHESES
As has been stated above, hypothesis testing determines the validity of the assumption (technically
described as null hypothesis) with a view to choosing between two conflicting hypotheses about the
value of a population parameter
...
Statisticians have developed
several tests of hypotheses (also known as the tests of significance) for the purpose of testing of
hypotheses which can be classified as: (a) Parametric tests or standard tests of hypotheses; and
(b) Non-parametric tests or distribution-free tests of hypotheses
...
Assumptions like observations come from a normal population, sample size is large,
assumptions about the population parameters like mean, variance, etc
...
But there are situations when the researcher cannot or does not want
to make such assumptions
...
Besides, most non-parametric tests assume only nominal or
ordinal data, whereas parametric tests require measurement equivalent to at least an interval scale
...
4 We take up in the present chapter some of the important parametric
tests, whereas non-parametric tests will be dealt with in a separate chapter later in the book
...
All these tests
are based on the assumption of normality, i.e., the source of data is considered to be normally distributed
...
Harnett and James L
...
368
...
This has been made clear in Chapter 10 entitled χ²-test
...
z-test is based on the normal probability distribution and is used for judging the significance of
several statistical measures, particularly the mean
...
This is a most
frequently used test in research studies
...
z-test is generally used for comparing the mean of a sample to
some hypothesised mean for the population in case of large sample, or when population variance is
known
...
z-test is also used for comparing
the sample proportion to a theoretical value of population proportion or for judging the difference in
proportions of two independent samples when n happens to be large
...
t-test is based on t-distribution and is considered an appropriate test for judging the significance
of a sample mean or for judging the significance of difference between the means of two samples in
case of small sample(s) when population variance is not known (in which case we use variance of
the sample as an estimate of the population variance)
...
It can also be used for judging the significance of the
coefficients of simple and partial correlations
...
It may be noted that t-test applies only in case of small sample(s) when
population variance is unknown
...
F-test is based on F-distribution and is used to compare the variances of two independent
samples
...
It is also used for judging the
significance of multiple correlation coefficients
...
The table on pages 198–201 summarises the important parametric tests along with test statistics
and test situations for testing hypotheses relating to important parameters (often used in research
studies) in the context of one sample and also in the context of two samples
...
*
The test statistic is the value obtained from the sample data that corresponds to the parameter under investigation
...
Our testing technique will differ in different situations
...
1
...
Population normal, population finite, sample size may be large or small but variance
of the population is known, Ha may be one-sided or two-sided:
In such a situation z-test is used and the test statistic z is worked out as under (using
finite population multiplier):
$$z = \frac{\bar{X} - \mu_{H_0}}{\left(\sigma_p/\sqrt{n}\right) \times \sqrt{(N - n)/(N - 1)}}$$
3
...
f
...
Population normal, population finite, sample size small and variance of the population
unknown, and Ha may be one-sided or two-sided:
In such a situation t-test is used and the test statistic ‘t’ is worked out as under (using
finite population multiplier):
$$t = \frac{\bar{X} - \mu_{H_0}}{\left(\sigma_s/\sqrt{n}\right) \times \sqrt{(N - n)/(N - 1)}}$$
with d.f. = (n – 1)
...
3: Names of Some Parametric Tests along with Test Situations and Test Statistics used in Context of Hypothesis Testing

Unknown parameter: Mean (µ)
Test situation (Population characteristics and other conditions): ...

z-test and the test statistic
$$z = \frac{\bar{X} - \mu_{H_0}}{\sigma_p/\sqrt{n}}$$
In case σp is not known, we use σs in its place calculating
$$\sigma_s = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n - 1}}$$

z-test for difference in means and the test statistic
$$z = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\sigma_p^2 \left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$
is used when two samples are drawn from the same population
...
OR
$$z = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{\sigma_{p1}^2}{n_1} + \dfrac{\sigma_{p2}^2}{n_2}}}$$
is used when two samples are drawn from different populations
...
We use σs1 and σs2 respectively in their places calculating
$$\sigma_{s1} = \sqrt{\frac{\sum (X_{1i} - \bar{X}_1)^2}{n_1 - 1}} \quad \text{and} \quad \sigma_{s2} = \sqrt{\frac{\sum (X_{2i} - \bar{X}_2)^2}{n_2 - 1}}$$

Unknown parameter: Mean (µ)
Test situation: Population(s) normal and sample size small (i.e., n < 30) and population variance(s) unknown (but the population variances assumed equal in case of test on difference between means)

t-test and the test statistic
$$t = \frac{\bar{X} - \mu_{H_0}}{\sigma_s/\sqrt{n}}$$
with d.f. = (n – 1), where
$$\sigma_s = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n - 1}}$$

t-test for difference in means and the test statistic
$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{\sum (X_{1i} - \bar{X}_1)^2 + \sum (X_{2i} - \bar{X}_2)^2}{n_1 + n_2 - 2}} \times \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$
with d.f. = (n1 + n2 – 2)

Paired t-test or difference test and the test statistic
$$t = \frac{\bar{D} - 0}{\sqrt{\dfrac{\sum D_i^2 - (\bar{D})^2 \cdot n}{n - 1}} \Big/ \sqrt{n}}$$
with d.f. = (n – 1), where n is the number of pairs in two samples
...
p and q in their places
...
But when populations are similar with respect to a given attribute, we work out the best estimate of the population proportion as under:
$$p_0 = \frac{n_1 \hat{p}_1 + n_2 \hat{p}_2}{n_1 + n_2}$$
...
of items in a sample, n1 = No. ... of items in sample two, µH0 = Hypothesised mean for population, σp = standard deviation of population, σs = standard deviation of sample, p = population proportion, q = 1 − p, p̂ = sample proportion, q̂ = 1 − p̂
...
Population may not be normal but sample size is large, variance of the population
may be known or unknown, and Ha may be one-sided or two-sided:
In such a situation we use z-test and work out the test statistic z as under:
$$z = \frac{\bar{X} - \mu_{H_0}}{\sigma_p/\sqrt{n}}$$
(This applies in case of infinite population when variance of the population is known but
when variance is not known, we use σ s in place of σ p in this formula
...
)
Illustration 2
A sample of 400 male students is found to have a mean height 67
...
Can it be reasonably
regarded as a sample from a large population with mean height 67
...
30 inches? Test at 5% level of significance
...
39 inches,
we can write:
H0 : µ H0 = 67
...
39"
and the given information as X = 67
...
Assuming the population to be
...
47 − 67
...
=
0
...
0
...
96
The observed value of z is 1
...
96 and thus
H0 is accepted
...
47") can be regarded
to have been taken from a population with mean height 67
...
30" at 5%
level of significance
...
The past records show that the mean of the
distribution of annual turnover is 320 employees, with a standard deviation of 75 employees
...
Is the sample mean consistent with the population mean? Test at 5% level
...
888
...
67
As Ha is two-sided in the given question, we shall apply a two-tailed test for determining the
rejection regions at 5% level of significance which comes to as under, using normal curve area table:
R : | z | > 1
...
67 which is in the acceptance region since R : | z | > 1
...
e
...
Illustration 4
The mean of a certain production process is known to be 50 with a standard deviation of 2
...
The
production manager may welcome any change in mean value towards higher side but would like to
safeguard against decreasing values of mean
...
5
...
Solution: Taking the mean value of the population to be 50, we may write:
H0 : µ H0 = 50
*
Being a case of finite population
...
)
and the given information as X = 48
...
5 and n = 12
...
5 − 50
2
...
= − 2
...
5 / 3
...
645
The observed value of z is – 2
...
We can conclude that the production process is showing mean which
is significantly less than the population mean and this calls for some corrective action concerning the
said process
...
weight):
578, 572, 570, 568, 572, 578, 570, 572, 596, 544
Test (using Student’s t-statistic) whether the mean breaking strength of the lot may be taken to be
578 kg
...
Verify the inference so drawn by using
Sandler’s A-statistic as well
...
, we can write:
H0 : µ = µ H0 = 578 kg
...
No
...
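S. No.    Xi     (Xi − X̄)    (Xi − X̄)²
...
4         568    –4           16
5         572     0            0
6         578     6           36
7         570    –2            4
8         572     0            0
9         596    24          576
10        544   –28          784
n = 10    ∑Xi = 5720          ∑(Xi − X̄)² = 1456

$$\therefore \bar{X} = \frac{\sum X_i}{n} = \frac{5720}{10} = 572 \text{ kg}$$
$$\sigma_s = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n - 1}} = \sqrt{\frac{1456}{10 - 1}} = 12.72 \text{ kg}$$
Hence,
$$t = \frac{572 - 578}{12.72/\sqrt{10}} = -1.488$$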
f
...
262
As the observed value of t (i.e., – 1
...
weight
...
3: Computations for A-Statistic
S. No.    Xi     Hypothesised mean µH0 = 578 kg    Di = (Xi − µH0)    Di²
1         578    578                                 0                  0
2         572    578                                –6                 36
3         570    578                                –8                 64
4         568    578                               –10                100
...
*
Table No
...
5044
H0 : µ H0 = 578 kg
...
As Ha is two-sided, the critical value of A-statistic from the A-statistic table (Table No
...
e
...
f
...
276
...
5044), being greater than 0
...
weight
...
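Both statistics in this illustration can be recomputed from the ten breaking strengths with a few lines of Python. This is a sketch rather than the book's own working: the variable names are assumptions, and `statistics.stdev` is used because it divides by (n – 1), as the formula for σs does.

```python
from math import sqrt
from statistics import mean, stdev

# Breaking strengths in kg, as listed in the illustration.
strengths = [578, 572, 570, 568, 572, 578, 570, 572, 596, 544]
mu_h0 = 578
n = len(strengths)

# Student's t: (sample mean - hypothesised mean) / (sigma_s / sqrt(n)).
t = (mean(strengths) - mu_h0) / (stdev(strengths) / sqrt(n))

# Sandler's A: sum of squared differences / square of the summed differences.
diffs = [x - mu_h0 for x in strengths]
a_stat = sum(d * d for d in diffs) / sum(diffs) ** 2

print(round(t, 3), round(a_stat, 4))
```

The last decimal of t may differ slightly from a hand calculation that rounds σs before dividing; the A-statistic involves no such intermediate rounding.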
Illustration 6
Raju Restaurant near the railway station at Falna has been having average sales of 500 tea cups per
day
...
During the first
12 days after the start of the bus stand, the daily sales were as under:
550, 570, 490, 615, 505, 580, 570, 460, 600, 580, 530, 526
On the basis of this sample information, can one conclude that Raju Restaurant’s sales have increased?
Use 5 per cent level of significance
...
As the sample size is small and the population standard deviation is not known, we shall use t-test
assuming normal population and shall work out the test statistic t as:
t=
X −µ
σs/ n
(To find X and σ s we make the following computations:)
Table 9
...
No
...
68/ 12
i
=
2
=
23978
= 46
...
558
13
...
796
The observed value of t is 3
...
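A minimal sketch of this one-tailed t-test, using the twelve daily sales figures given above. The names are hypothetical, and the critical value 1.796 is the usual table value of t for 11 degrees of freedom at the 5 per cent level (one tail), entered by hand because the Python standard library has no t-distribution.

```python
from math import sqrt
from statistics import mean, stdev

# Daily sales (tea cups) for the 12 days after the bus stand opened.
sales = [550, 570, 490, 615, 505, 580, 570, 460, 600, 580, 530, 526]
mu_h0 = 500        # average sales before the bus stand
t_table = 1.796    # one-tailed 5% value for 11 d.f., read from a t-table

n = len(sales)
t = (mean(sales) - mu_h0) / (stdev(sales) / sqrt(n))

# Right-tailed test: reject H0 only if t exceeds the table value.
decision = "reject H0" if t > t_table else "accept H0"
print(round(t, 2), decision)
```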
HYPOTHESIS TESTING FOR DIFFERENCES BETWEEN MEANS
In many decision-situations, we may be interested in knowing whether the parameters of two
populations are alike or different
...
We shall explain now the technique of
hypothesis testing for differences between means
...
Alternative hypothesis may be of not equal to or less than or greater than type as stated
earlier and accordingly we shall determine the acceptance or rejection regions for testing the
hypotheses
...
Population variances are known or the samples happen to be large samples:
In this situation we use z-test for difference in means and work out the test statistic z as
under:
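$$z = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{\sigma_{p1}^2}{n_1} + \dfrac{\sigma_{p2}^2}{n_2}}}$$
In case σp1 and σp2 are not known, we use σs1 and σs2 respectively in their places
calculating
$$\sigma_{s1} = \sqrt{\frac{\sum (X_{1i} - \bar{X}_1)^2}{n_1 - 1}} \quad \text{and} \quad \sigma_{s2} = \sqrt{\frac{\sum (X_{2i} - \bar{X}_2)^2}{n_2 - 1}}$$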
2
...
2 (combined standard deviation of the two samples)
in its place calculating
σ s1
...
2
2
2
i
i
− X 1
...
2 =
n1 X 1 + n2 X 2
n1 + n2
3
...
f
...
f
...
per acre with a standard deviation
of 10 lbs
...
with a standard deviation of
12 lbs
...
Solution: Taking the null hypothesis that the means of two populations do not differ, we can write
H0 : µ1 = µ 2
Ha : µ1 ≠ µ 2
and the given information as n1 = 100; n2 = 150;
X 1 = 200 lbs
...
;
σ s1 = 10 lbs
...
;
σ p = 11 lbs
...
08
142
...
96
The observed value of z is – 14
...
This means that the difference
between means of two samples is statistically significant and not due to sampling fluctuations
...
5
City    Mean monthly earnings (Rs)    Standard deviation of sample data of monthly earnings (Rs)    Size of sample
A       695                           40                                                            200
B       710                           60                                                            175
Test the hypothesis at 5 per cent level that there is no difference between monthly earnings of
workers in the two cities
...
)
Hence
$$z = \frac{695 - 710}{\sqrt{\dfrac{(40)^2}{200} + \dfrac{(60)^2}{175}}} = -\frac{15}{\sqrt{8 + 20.57}} = -2.809$$
As Ha is two-sided, we shall apply a two-tailed test for determining the rejection regions at 5 per
cent level of significance which come to as under, using normal curve area table:
R : | z | > 1
...
809 which falls in the rejection region and thus we reject H0 at
5 per cent level and conclude that earnings of workers in the two cities differ significantly
...
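The z-test for the two cities can be sketched as follows, using the sample means, standard deviations and sizes from the table above. The function name `z_two_means` is an assumption, and 1.96 is the usual two-tailed 5 per cent value of the normal curve.

```python
from math import sqrt


def z_two_means(mean1, sd1, n1, mean2, sd2, n2):
    """z statistic for the difference between two large-sample means."""
    standard_error = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / standard_error


# City A: mean 695, s.d. 40, n 200; City B: mean 710, s.d. 60, n 175.
z = z_two_means(695, 40, 200, 710, 60, 175)
decision = "reject H0" if abs(z) > 1.96 else "accept H0"
print(round(z, 2), decision)
```

Carrying full precision gives a value of about –2.81; small differences in the third decimal can arise from rounding at intermediate steps.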
3
4
...
Solution: Taking the null hypothesis that the means of two populations do not differ we can write:
H0 : µ1 = µ 2
Ha : µ1 ≠ µ 2
and the given information as follows:
Table 9
...
3
s
n1 = 5
Sample from town B
As sample two
X 2 = 61
σ 22 = 4
...
f
...
3 + 6 4
...
053
1 1
+
5 7
Degrees of freedom = (n1 + n2 – 2) = 5 + 7 – 2 = 10
As Ha is two-sided, we shall apply a two-tailed test for determining the rejection regions at 5 per
cent level which come to as under, using table of t-distribution for 10 degrees of freedom:
R : | t | > 2
...
053 which falls in the rejection region and thus, we reject H0 and
conclude that the difference in sales in the two towns is significant at 5 per cent level
...
Test at 5 per cent level whether there is significant evidence that
additional protein has increased the weight of the chickens
...
Solution: Taking the null hypothesis that additional protein has not increased the weight of the chickens
we can write:
H0 : µ 1 = µ 2
Ha : µ1 > µ 2 (as we want to conclude that additional protein has increased the weight of
chickens)
Since in the given question variances of the populations are not known and the size of samples is
small, we shall use t-test for difference in means, assuming the populations to be normal and thus
work out the test statistic t as under:
$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{(n_1 - 1)\sigma_{s1}^2 + (n_2 - 1)\sigma_{s2}^2}{n_1 + n_2 - 2}} \times \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$
with d.f. = (n1 + n2 – 2)
From the sample data we work out $\bar{X}_1$, $\bar{X}_2$, $\sigma_{s1}^2$ and $\sigma_{s2}^2$ (taking high protein diet sample as
sample one and low protein diet sample as sample two) as shown below:
Table 9
...
No
...
No
...
2
...
4
...
6
...
n1 = 7;
12
15
11
16
14
14
16
4
25
1
36
16
16
36
∑ X 1i − A1 = 28 ;
∑ X 1i − A1
g
b
1
...
3
...
5
...
667 ounces
7−1
=
b
∑ X 2i − A2
g
2
b
− ∑ X 2i − A2
bn − 1g
g
2
/ n2
2
=
Hence,
t=
− A2
b g
2
69 − 15 /5
5−1
0
4
36
4
25
b
∑ X 2i − A2
= 69
1
σ 22
s
2i
(A2 = 8)
2
5
1
6
4
4
6
b
bX
X2i – A2
= 6 ounces
14 − 11
b7 − 1gb3
...
6 ×
...
381
...
812
The observed value of t is 2
...
HYPOTHESIS TESTING FOR COMPARING TWO RELATED SAMPLES
Paired t-test is a way to test for comparing two related samples, involving small values of n that does
not require the variances of the two populations to be equal, but the assumption that the two populations
are normal must continue to apply
...
e
...
”5 Such a test is generally considered appropriate in a before-and-after-treatment
study
...
To apply this test,
we first work out the difference score for each matched pair, and then find out the average of such
differences, D , along with the sample variance of the difference score
...
e
...
$$\sigma_{diff}^2 = \frac{\sum D_i^2 - (\bar{D})^2 \cdot n}{n - 1}$$
Assuming the said differences to be normally distributed and independent, we can apply the paired t-test for judging the significance of mean of differences and work out the test statistic t as under:
$$t = \frac{\bar{D} - 0}{\sigma_{diff}/\sqrt{n}}$$
with (n – 1) degrees of freedom
where D̄ = Mean of differences
5
Donald L
...
Murphy, “Introductory Statistical Analysis”, p
...
σ diff
...
We can also use Sandler’s A-test for this very purpose as stated earlier in
Chapter 8
...
State at 5 per cent level of
significance whether the training was effective from the following scores:
Student    1     2     3     4     5     6     7     8     9
Before     10    15    9     3     7     12    16    17    4
After      12    17    8     5     6     11    18    20    3
Use paired t-test as well as A-test for your answer
...
/ n
To find the value of t, we shall first have to work out the mean and standard deviation of differences
as shown below:
Table 9
...
778
n
9
and Standard deviation of differences or
∴ Mean of Differences or D =
d i ⋅n
2
∑ Di2 − D
σ diff
...
778 × 9
=
9−1
= 2
...
−0
...
715/ 9
=
−
...
361
0
...
As Ha is one-sided, we shall apply a one-tailed test (in the left tail because Ha is of less than type)
for determining the rejection region at 5 per cent level which comes to as under, using the table of
t-distribution for 8 degrees of freedom:
R : t < – 1
...
361 which is in the acceptance region and thus, we accept H0 and
conclude that the difference in score before and after training is insignificant, i.e., it is only due to
sampling fluctuations
...
Solution using A-test: Using A-test, we workout the test statistic for the given problem thus:
$$A = \frac{\sum D_i^2}{\left(\sum D_i\right)^2} = \frac{29}{(-7)^2} = 0.592$$
...
Accordingly, at 5% level of
significance the table value of A-statistic for (n – 1) or (9 – 1) = 8 d
...
in the given case is 0
...
The computed value of A, i.e., 0
...
In other words,
we should conclude that the training was not effective
...
)
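The paired t-test and the A-test for the training scores can be reproduced with the sketch below. Differences are taken as (before – after), which matches ∑Di = –7 and ∑Di² = 29 in the working above; the variable names are assumptions.

```python
from math import sqrt
from statistics import mean, stdev

# Scores of the nine students before and after training.
before = [10, 15, 9, 3, 7, 12, 16, 17, 4]
after = [12, 17, 8, 5, 6, 11, 18, 20, 3]

# Difference score for each matched pair (before - after).
diffs = [b - a for b, a in zip(before, after)]
n = len(diffs)

# Paired t: mean difference over its standard error, with (n - 1) d.f.
t = mean(diffs) / (stdev(diffs) / sqrt(n))

# Sandler's A-statistic from the same differences.
a_stat = sum(d * d for d in diffs) / sum(diffs) ** 2

print(sum(diffs), sum(d * d for d in diffs), round(t, 2), round(a_stat, 3))
```

Both statistics are built from the same difference scores, so the two tests should always lead to the same conclusion.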
Illustration 12
The sales data of an item in six shops before and after a special promotional campaign are:
Shops                              A     B     C     D     E     F
Before the promotional campaign    53    28    31    48    50    42
After the campaign                 58    29    30    55    56    45
Can the campaign be judged to be a success? Test at 5 per cent level of significance
...
Solution: Let the sales before campaign be represented as X and the sales after campaign as Y and
then taking the null hypothesis that campaign does not bring any improvement in sales, we can write:
H0 : µ 1 = µ 2 which is equivalent to test H 0 : D = 0
Ha : µ 1 < µ 2 (as we want to conclude that campaign has been a success)
...
/ n
To find the value of t, we first work out the mean and standard deviation of differences as under:
Table 9
...
=
Hence,
t =
25
1
1
49
36
9
∑ Di2 = 121
∑ Di
21
=−
= − 3
...
5 − 0
3
...
5 × 6
= 3
...
5
= − 2
...
257
Degrees of freedom = (n – 1) = 6 – 1 = 5
As Ha is one-sided, we shall apply a one-tailed test (in the left tail because Ha is of less than type)
for determining the rejection region at 5 per cent level of significance which come to as under, using
table of t-distribution for 5 degrees of freedom:
R : t < – 2
...
784 which falls in the rejection region and thus, we reject H0 at 5
per cent level and conclude that sales promotional campaign has been a success
...
2744
i
Since Ha in the given problem is one-sided, we shall apply one-tailed test
...
f
...
372 (as per table of A-statistic given in appendix)
...
2744, is less
than this table value and as such A-statistic is significant
...
HYPOTHESIS TESTING OF PROPORTIONS
In case of qualitative phenomena, we have data on the basis of presence or absence of an attribute(s)
...
Instead of taking mean number of successes and standard deviation of the number of
successes, we may record the proportion of successes in each sample in which case the mean and
standard deviation (or the standard error) of the sampling distribution may be obtained as follows:
Mean proportion of successes = (n ⋅ p)/n = p
p⋅q
...
For testing of proportion, we formulate H0 and Ha and construct rejection region, presuming
normal approximation of the binomial distribution, for a predetermined level of significance and then
may judge the significance of the observed sample result
...
Illustration 13
A sample survey indicates that out of 3232 births, 1705 were boys and the rest were girls
...
Solution: Starting from the null hypothesis that the sex ratio is 50 : 50 we may write:
$H_0 : p = p_{H_0} = \frac{1}{2}$
$H_a : p \neq p_{H_0}$
1
1
and the probability of girl birth is also
...
The standard error of proportion of success
...
0088
3232
Observed sample proportion of success, or
p̂ = 1705/3232 = 0
...
5275 −
...
= 3.125
...
96
The observed value of z is 3
...
96 and
thus, H0 is rejected in favour of Ha
...
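A short sketch of the z-test for a single proportion, applied to the births data (1705 boys out of 3232 births, hypothesised proportion 1/2). The function name `z_proportion` is an assumption; the standard error is computed from the hypothesised p, as in the working above.

```python
from math import sqrt


def z_proportion(successes, n, p_h0):
    """z statistic for a sample proportion against a hypothesised value."""
    p_hat = successes / n
    # Standard error of the proportion under H0: sqrt(p * q / n).
    standard_error = sqrt(p_h0 * (1 - p_h0) / n)
    return (p_hat - p_h0) / standard_error


z = z_proportion(1705, 3232, 0.5)
decision = "reject H0" if abs(z) > 1.96 else "accept H0"
print(round(z, 2), decision)
```

Without rounding the standard error to 0.0088 the statistic comes to about 3.13, which leads to the same decision.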
Illustration 14
The null hypothesis is that 20 per cent of the passengers go in first class, but management recognizes
the possibility that this percentage could be more or less
...
Can the null hypothesis be rejected at 10 per cent
level of significance?
Solution: The null hypothesis is
H0 : p = 20% or 0
...
20 and
q = 0
...
175
and the test statistic
$$z = \frac{\hat{p} - p}{\sqrt{\dfrac{p \cdot q}{n}}}$$
=
0175 −
...
...
80
400
= − 1
...
645
The observed value of z is –1
...
Thus the null hypothesis cannot be rejected at 10 per cent level of significance
...
A supplier of new raw material claims
that the use of his material would reduce the proportion of defectives
...
Can the supplier’s claim be
accepted? Test at 1 per cent level of significance
...
10 and the alternative hypothesis
Ha : p < 0
...
Hence,
p = 0
...
90
Observed sample proportion p̂ = 34/400 = 0
...
085 −
...
10 ×
...
015
= − 1
...
015
As Ha is one-sided, we shall determine the rejection region applying one-tailed test (in the left tail
because Ha is of less than type) at 1% level of significance and it comes to as under, using normal
curve area table:
R : z < – 2
...
HYPOTHESIS TESTING FOR DIFFERENCE BETWEEN PROPORTIONS
If two samples are drawn from different populations, one may be interested in knowing whether the
difference between the proportion of successes is significant or not
...
In other words, we take the null hypothesis as $H_0 : \hat{p}_1 = \hat{p}_2$ and for testing the significance of difference, we work out the
test statistic as under:
$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\dfrac{\hat{p}_1 \hat{q}_1}{n_1} + \dfrac{\hat{p}_2 \hat{q}_2}{n_2}}}$$
where
$\hat{p}_1$ = proportion of success in sample one
$\hat{p}_2$ = proportion of success in sample two
$\hat{q}_1 = 1 - \hat{p}_1$
$\hat{q}_2 = 1 - \hat{p}_2$
n1 = size of sample one
n2 = size of sample two
and
$\sqrt{\dfrac{\hat{p}_1 \hat{q}_1}{n_1} + \dfrac{\hat{p}_2 \hat{q}_2}{n_2}}$ = the standard error of difference between two sample proportions
...
We can
now illustrate all this by examples
...
The drugs are administered to two different sets of animals
...
The
research unit wants to test whether there is a difference between the efficacy of the said two drugs
at 5 per cent level of significance
...
But
on the assumption that the populations are similar as regards the given attribute, we make use of the following formula for
working out the standard error of difference between proportions of the two samples:
S
...
Diff
...
e
...
e
...
583
$
$
q1 = 1 − p1 = 0
...
520
$
$
q2 = 1 − p2 = 0
...
583 − 0
...
583gb
...
520gb
...
093
As Ha is two-sided, we shall determine the rejection regions applying two-tailed test at 5% level
which comes as under using normal curve area table:
R : | z | > 1
...
093 which is in the rejection region and thus, H0 is rejected in favour of
Ha and as such we conclude that the difference between the efficacy of the two drugs is significant
...
After the tax on tobacco had been heavily increased, another random sample of 600 men in the same
city included 400 smokers
...
Solution: We start with the null hypothesis that the proportion of smokers even after the heavy tax
on tobacco remains unchanged, i.e., $H_0 : \hat{p}_1 = \hat{p}_2$ and the alternative hypothesis that proportion of
smokers after tax has decreased, i.e.,
$H_a : \hat{p}_1 > \hat{p}_2$
On the presumption that the given populations are similar as regards the given attribute, we work
out the best estimate of proportion of smokers (p0) in the population as under, using the given information:
$$p_0 = \frac{n_1 \hat{p}_1 + n_2 \hat{p}_2}{n_1 + n_2} = \frac{500\left(\dfrac{400}{500}\right) + 600\left(\dfrac{400}{600}\right)}{500 + 600} = \frac{800}{1100} = \frac{8}{11}$$
...
2727
The test statistic z can be worked out as under:
$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\dfrac{p_0 q_0}{n_1} + \dfrac{p_0 q_0}{n_2}}}$$
=
b
400 400
−
500 600
...
2727
...
2727
+
500
600
gb
g b
gb
g
0133
...
926
0
...
645
The observed value of z is 4
...
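The pooled-proportion test used in this illustration can be sketched as below, with 400 smokers out of 500 men in the first sample and 400 out of 600 in the second, as in the working above. The function name `z_pooled_proportions` is hypothetical, and 1.645 is the usual one-tailed 5 per cent value of the normal curve.

```python
from math import sqrt


def z_pooled_proportions(x1, n1, x2, n2):
    """Return (p0, z) using the pooled estimate p0 of the proportion."""
    p1, p2 = x1 / n1, x2 / n2
    # Best estimate of the common proportion: (n1*p1 + n2*p2) / (n1 + n2).
    p0 = (x1 + x2) / (n1 + n2)
    standard_error = sqrt(p0 * (1 - p0) * (1 / n1 + 1 / n2))
    return p0, (p1 - p2) / standard_error


p0, z = z_pooled_proportions(400, 500, 400, 600)
decision = "reject H0" if z > 1.645 else "accept H0"
print(round(p0, 4), round(z, 2), decision)
```

Carried at full precision the statistic is about 4.94; a hand calculation that rounds the standard error gives a slightly different second decimal without changing the decision.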
Testing the difference between proportion based on the sample and the proportion given
for the whole population: In such a situation we work out the standard error of difference between
proportion of persons possessing an attribute in a sample and the proportion given for the population
as under:
Standard error of difference between sample proportion and
=
population proportion or S
...
diff
...
We
take an example to illustrate the same
...
In a random sample study 20 were found smokers in the college and the
proportion of smokers in the university is 0
...
Is there a significant difference between the proportion
of smokers in the college and university? Test at 5 per cent level
...
05
100
2000 − 100
...
95
100 2000
b gb g b gb
g
0150
...
143
0
...
96
The observed value of z is 7
...
=
HYPOTHESIS TESTING FOR COMPARING A VARIANCE
TO SOME HYPOTHESISED POPULATION VARIANCE
The test we use for comparing a sample variance to some theoretical or hypothesised variance of
population is different than z-test or the t-test
...
The chi-square value to test the null hypothesis, viz., $H_0 : \sigma_s^2 = \sigma_p^2$, is worked out as under:
$$\chi^2 = \frac{\sigma_s^2}{\sigma_p^2}\,(n - 1)$$
where $\sigma_s^2$ = variance of the sample
$\sigma_p^2$ = variance of the population
(n – 1) = degree of freedom, n being the number of items in the sample
...
If the calculated value
of χ² is equal to or less than the table value, the null hypothesis is accepted; otherwise the null
hypothesis is rejected
...
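As a small illustration of this decision rule, the sketch below computes the χ² statistic for invented figures (sample variance 6, hypothesised population variance 4, n = 10); none of these numbers come from the text. The table value 16.919 is the usual upper 5 per cent point of χ² for 9 degrees of freedom, entered by hand.

```python
def chi_square_variance(sample_var, pop_var, n):
    """Chi-square statistic for a sample variance against a hypothesised one."""
    return (sample_var / pop_var) * (n - 1)


# Hypothetical figures, chosen only to show the comparison.
chi_sq = chi_square_variance(sample_var=6, pop_var=4, n=10)
table_value = 16.919   # 5% point for 9 d.f., read from a chi-square table

decision = "accept H0" if chi_sq <= table_value else "reject H0"
print(chi_sq, decision)
```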
*
TESTING THE EQUALITY OF VARIANCES
OF TWO NORMAL POPULATIONS
When we want to test the equality of variances of two normal populations, we make use of F-test
based on F-distribution
...
This hypothesis is tested on the basis
of sample data and the test statistic F is found, using $\sigma_{s1}^2$ and $\sigma_{s2}^2$, the sample estimates for $\sigma_{p1}^2$ and
$\sigma_{p2}^2$ respectively, as stated below:
$$F = \frac{\sigma_{s1}^2}{\sigma_{s2}^2}$$
where
$$\sigma_{s1}^2 = \frac{\sum (X_{1i} - \bar{X}_1)^2}{n_1 - 1} \quad \text{and} \quad \sigma_{s2}^2 = \frac{\sum (X_{2i} - \bar{X}_2)^2}{n_2 - 1}$$
While calculating F, $\sigma_{s1}^2$ is treated $> \sigma_{s2}^2$
which means that the numerator is always the greater
**
variance
...
By comparing the observed value of F with the corresponding table value, we can infer
whether the difference between the variances of samples could have arisen due to sampling
fluctuations
...
Degrees of
freedom for greater variance is represented as v1 and for smaller variance as v2
...
If F-ratio is considered non-significant, we accept the null hypothesis, but if F-ratio is considered
significant, we then reject H0 (i
...
, we accept Ha)
...
The object of F-test is to test the hypothesis whether the two samples are from the same normal
population with equal variance or from two normal populations with equal variances
...
F-distribution tables [Table 4(a) and Table 4(b)] have been given in appendix at the end of the book
...
The following examples illustrate the use of F-test for testing the
equality of variances of two normal populations
...
Solution: We take the null hypothesis that the two populations from where the samples have been
2
drawn have the same variances i
...
, H 0 : σ 2 = σ 2
...
10
Sample 1                      Sample 2
X1i    (X1i – X̄1)            X2i    (X2i – X̄2)
20     –2                     27     –8
16     –6                     33     –2
26      4                     42      7
27      5                     35      0
23      1                     32     –3
22      0                     34     –1
18     –4                     38      3
24      2                     28     –7
25      3                     41      6
19     –3                     43      8
                              30     –5
                              37      2
∑ X 2i = 420
= 120
n1 = 10
220
∑ X 1i
=
= 22 ;
10
n1
σ 21
s
=
d
∑ X 1i − X 1
n1 − 1
X2 =
i
=
120
= 13
...
55
12 − 1
2
s2
> σ 21
s
j
28
...
14
1333
...
11 and the table
value of F at 1 per cent level of significance for v1 = 11 and v2 = 9 is 5
...
Since the calculated value of F = 2
...
11 and also less than 5
...
=
Illustration 20
Given n1 = 9; n2 = 8
$\sum (X_{1i} - \bar{X}_1)^2 = 184$
$\sum (X_{2i} - \bar{X}_2)^2 = 38$
Apply F-test to judge whether this difference is significant at 5 per cent level
...
To test this, we work out the F-ratio as under:
$$F = \frac{\sigma_{s1}^2}{\sigma_{s2}^2} = \frac{\sum (X_{1i} - \bar{X}_1)^2 / (n_1 - 1)}{\sum (X_{2i} - \bar{X}_2)^2 / (n_2 - 1)}$$
184/8
23
=
= 4
...
43
v1 = 8 being the number of d
...
for greater variance
v2 = 7 being the number of d
...
for smaller variance
...
73
...
Accordingly we reject
H0 and conclude that the difference is significant
...
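The arithmetic of the F-ratio can be sketched in a few lines of Python. This is only an illustrative sketch using the figures given in Illustration 20; the function names `sample_variance` and `f_ratio` are assumptions introduced here, and the table value of F still has to be read from the F-distribution tables.

```python
def sample_variance(sum_sq_dev, n):
    """Variance estimate: sum of squared deviations divided by (n - 1)."""
    return sum_sq_dev / (n - 1)


def f_ratio(var_1, n_1, var_2, n_2):
    """Return (F, v1, v2), always placing the greater variance in the numerator."""
    if var_1 >= var_2:
        return var_1 / var_2, n_1 - 1, n_2 - 1
    return var_2 / var_1, n_2 - 1, n_1 - 1


# Figures from Illustration 20: n1 = 9, n2 = 8,
# sums of squared deviations 184 and 38.
var_1 = sample_variance(184, 9)  # 184 / 8 = 23
var_2 = sample_variance(38, 8)   # 38 / 7
F, v1, v2 = f_ratio(var_1, 9, var_2, 8)

print(f"F = {F:.2f} with v1 = {v1} and v2 = {v2}")
```

The calculated F is then compared with the table value for v1 and v2 degrees of freedom at the chosen level of significance.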
For this purpose we may use (in the context of
small samples) normally either the t-test or the F-test depending upon the type of correlation coefficient
...
This calculated value of t is then compared with its table value and if the calculated value is less
than the table value, we accept the null hypothesis at the given level of significance and may infer
that there is no relationship of statistical significance between the two variables
...
If the value of t in the table is greater than the calculated value, we may accept null hypothesis
and infer that there is no correlation
...
The test is performed by entering tables of the F-distribution
with
v1 = k – 1 = degrees of freedom for variance in numerator
...
If the calculated value of F is less than the table value, then we may infer that there is no statistical
evidence of significant correlation
...
Readers may look into standard tests for further details
...
But there are several limitations of the said tests which
should always be borne in mind by a researcher
...
It should be kept in view that testing
is not decision-making itself; the tests are only useful aids for decision-making
...
”6
(ii) Tests do not explain the reasons why the difference exists, say between the means
of the two samples
...
(iii) Results of significance tests are based on probabilities and as such cannot be expressed
with full certainty
...
(iv) Statistical inferences based on the significance tests cannot be said to be entirely correct
evidences concerning the truth of the hypotheses
...
For greater reliability, the size of samples should be sufficiently enlarged
...
Questions
1
...
2
...
5"
...
26"
...
6"
...
The procedure of testing hypothesis requires a researcher to adopt several steps
...
6
Ya-Lun-Chou, “Applied Business and Economic Statistics”
...
What do you mean by the power of a hypothesis test? How can it be measured? Describe and illustrate
by an example
...
Briefly describe the important parametric tests used in context of testing hypotheses
...
6
...
7
...
(b) Write a brief note on “Sandler’s A-test” explaining its superiority over t-test
...
Point out the important limitations of tests of hypotheses
...
A coin is tossed 10,000 times and head turns up 5,195 times
...
In some dice throwing experiments, A threw dice 41952 times and of these 25145 yielded a 4 or 5 or 6
...
A machine puts out 16 imperfect articles in a sample of 500
...
Has the machine improved? Test at 5% level of significance
...
In two large populations, there are 35% and 30% respectively fair haired people
...
In a certain association table the following frequencies were obtained:
(AB) = 309, (Ab) = 214, (aB) = 132, (ab) = 119
...
A sample of 900 members is found to have a mean of 3
...
Can it be reasonably regarded as a simple
sample from a large population with mean 3
...
and standard deviation 2
...
?
15
...
5 and 68
...
Can the
samples be regarded to have been drawn from the same population of standard deviation 9
...
16
...
The brand that has been used in the past
has an average life of 1000 hours with a standard deviation of 100 hours
...
It is decided that they will
switch to the new brand unless it is proved with a level of significance of 5% that the new brand has
smaller average life than the old brand
...
Assuming that the standard deviation of the new brand is the same
as that of the old brand,
(a) What conclusion should be drawn and what decision should be made?
(b) What is the probability of accepting the new brand if it has the mean life of 950 hours?
17
...
In the light of these data, discuss the suggestion that the mean height of the
students of the school is 54 inches
...
18
...
Test at five per cent level of significance
...
The heights of six randomly chosen sailors are, in inches, 63, 65, 58, 69, 71 and 72
...
Do these figures indicate
that soldiers are on an average shorter than sailors? Test at 5% level of significance
...
Ten young recruits were put through a strenuous physical training programme by the army
...
Suppose a test on the hypotheses H0 : µ = 200 against Ha : µ > 200 is done with 1% level of significance,
σ p = 40 and n = 16
...
The following nine observations were drawn from a normal population:
27 19 20 24 23 29 21 17 27
(i) Test the null hypothesis H0 : µ = 26 against the alternative hypothesis Ha : µ ≠ 26
...
Suppose that a public corporation has agreed to advertise through a local newspaper if it can be established
that the newspaper circulation reaches more than 60% of the corporation’s customers
...
24
...
25
...
26
...
Is there
reason to doubt the hypothesis that males and females are in equal numbers in the city? Use 1% level of
significance
...
12 students were given intensive coaching and 5 tests were conducted in a month
...
Does the score from Test 1 to Test 5 show an improvement? Use 5% level of
significance
...
of students
1
2
3
4
5
6
7
8
9
10
11
12
Marks in 1st Test
50
42
51
26
35
42
60
41
70
55
62
38
Marks in 5th test
62
40
61
35
30
52
68
51
84
63
72
50
232
Research Methodology
28
...
Another random sample of 200 villages from
the same district gave an average population of 480 per village with a standard deviation of 60
...
(ii) The means of the random samples of sizes 9 and 7 are 196
...
42 respectively
...
94 and 18
...
Can the samples be considered
to have been drawn from the same normal population? Use 5% level of significance
...
A farmer grows crops on two fields A and B
...
10 worth of manure per acre and on B Rs 20
worth
...
Test at 5% level of significance
...
ABC Company is considering a site for locating another plant
...
They take a traffic sample of 20 days and find an average volume per day of 2140 with standard deviation
equal to 100 trucks
...
05, should they purchase the site?
(ii) If we assume the population mean to be 2140, what is the β error?
Chi-square Test
233
10
Chi-Square Test
The chi-square test is an important test amongst the several tests of significance developed by
statisticians
...
As a non-parametric* test, it “can be used to determine if categorical data shows dependency or the
two classifications are independent
...
”1 Thus, the chi-square test is applicable in
a large number of problems
...
CHI-SQUARE AS A TEST FOR COMPARING VARIANCE
The chi-square value is often used to judge the significance of population variance i
...
, we can use
the test to judge if a random sample has been drawn from a normal population with mean (µ) and
with a specified variance ($\sigma_p^2$)
...
Such a distribution we encounter
when we deal with collections of values that involve adding up squares
...
If we take each one of a collection of sample variances, divide them by the known
population variance and multiply these quotients by (n – 1), where n means the number of items in
the sample, we shall obtain a χ²-distribution
...
*
1
g
σ2
σ2
s
s
n − 1 = 2 (d
...
) would have the same
2
σp
σp
See Chapter 12 Testing of Hypotheses-II for more details
...
Ullman, Elementary Statistics—An Applied Approach, p
...
The χ²-distribution is not symmetrical and all the values are positive
...
The smaller the number of degrees of freedom, the more skewed is the
distribution which is illustrated in Fig
...
1:
[Figure: χ²-distribution for different degrees of freedom (curves for df = 1, 3, 5 and 10); horizontal axis: χ² values, 0 to 30]
Fig
...
1
The table given in the Appendix gives selected critical values of χ² for the different degrees of
freedom
...
In brief, when we have to use chi-square as a test of population variance, we have to work out
the value of χ² to test the null hypothesis (viz
...
Then by comparing the calculated value with the table value of χ² for (n – 1) degrees of
freedom at a given level of significance, we may either accept or reject the null hypothesis
...
All this can be made clear by
an example
...
No
...
)
38
40
45
53
47
43
55
48
52
49
Can we say that the variance of the distribution of weight of all students from which the above
sample of 10 students was drawn is equal to 20 kgs? Test this at 5 per cent and 1 per cent level of
significance
...
1
S
...
Xi (Weight in kgs
...
n
10
d
∑ Xi − X
n −1
i
2
=
280
...
σ 2 = 3111
...
In order to test this hypothesis we work out the χ² value as under:
$$\chi^2 = \frac{\sigma_s^2}{\sigma_p^2}\,(n - 1)$$
b
g
3111
...
999
...
At 5 per cent level of significance
=
2
the table value of χ = 16
...
67 for 9 d
...
and both
these values are greater than the calculated value of χ which is 13
...
Hence we accept the null
hypothesis and conclude that the variance of the given distribution can be taken as 20 kgs at 5 per
cent as also at 1 per cent level of significance
...
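The working of this illustration can be retraced with a short Python sketch. The function name `chi_square_for_variance` is an assumption made here for illustration, and the two table values are the standard χ² critical values for 9 degrees of freedom; they are quoted, not computed.

```python
def chi_square_for_variance(sample, hypothesised_variance):
    """Return (chi-square, degrees of freedom) for H0: population variance = hypothesised value."""
    n = len(sample)
    mean = sum(sample) / n
    sum_sq_dev = sum((x - mean) ** 2 for x in sample)
    sample_variance = sum_sq_dev / (n - 1)
    chi_sq = (sample_variance / hypothesised_variance) * (n - 1)
    return chi_sq, n - 1


# Weights (in kg) of the 10 students in the illustration.
weights = [38, 40, 45, 53, 47, 43, 55, 48, 52, 49]
chi_sq, df = chi_square_for_variance(weights, hypothesised_variance=20)

# Table values of chi-square for 9 d.f. (from standard tables).
TABLE_5_PCT = 16.92
TABLE_1_PCT = 21.67

print(f"chi-square = {chi_sq:.3f}, d.f. = {df}")
print("Accept H0 at 5%:", chi_sq <= TABLE_5_PCT)
print("Accept H0 at 1%:", chi_sq <= TABLE_1_PCT)
```

Since the sum of squared deviations is 280, the statistic reduces to 280/20, in line with the calculated value quoted in the text.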
2
Illustration 2
A sample of 10 is drawn randomly from a certain population
...
Test the hypothesis that the variance of the population is 5 at
5 per cent level of significance
...
In order to test this hypothesis, we work out the χ² value as under:
$$\chi^2 = \frac{\sigma_s^2}{\sigma_p^2}\,(n - 1) = \frac{50/9}{5}\,(10 - 1) = \frac{50}{9} \times \frac{1}{5} \times \frac{9}{1} = 10$$
Degrees of freedom = (10 – 1) = 9
...
f
...
92
...
2
2
CHI-SQUARE AS A NON-PARAMETRIC TEST
Chi-square is an important non-parametric test and as such no rigid assumptions are necessary in
respect of the type of population
...
As a non-parametric test, chi-square can be used (i) as a test
of goodness of fit and (ii) as a test of independence
...
When some theoretical distribution is fitted to the given data, we are always interested in
knowing as to how well this distribution fits with the observed data
...
If the calculated value of χ² is less than the table value at a certain level of significance,
the fit is considered to be a good one which means that the divergence between the observed and
expected frequencies is attributable to fluctuations of sampling
...
As a test of independence, the χ² test enables us to explain whether or not two attributes are
associated
...
In such a situation, we proceed
with the null hypothesis that the two attributes (viz
...
On this basis we first calculate
the expected frequencies and then work out the value of χ²
...
e
...
But if the calculated value of χ² is greater
than its table value, our inference then would be that the null hypothesis does not hold good, which means
the two attributes are associated and the association is not because of some chance factor but
exists in reality (i
...
, the new medicine is effective in controlling the fever and as such may be
prescribed)
...
In order that we may apply the chi-square test either as a test of goodness of fit or as a test to
judge the significance of association between attributes, it is necessary that the observed as well as
theoretical or expected frequencies be grouped in the same way and that the theoretical distribution
be adjusted to give the same total frequency as we find in the case of the observed distribution
...
Eij = expected frequency of the cell in ith row and jth column
...
Instead of working out the probabilities, we can use a ready table which gives
probabilities for given values of χ²
...
If the calculated value of χ² is equal to or exceeds the table value, the difference
between the observed and expected frequencies is taken as significant, but if the table value is more
than the calculated value of χ², then the difference is considered as insignificant i
...
, considered to
have arisen as a result of chance and as such can be ignored
...
If there are 10
frequency classes and there is one independent constraint, then there are (10 – 1) = 9 degrees of
freedom
...
f
...
In the case of a
contingency table (i
...
, a table with 2 columns and 2 rows or a table with two columns and more than
two rows or a table with two rows but more than two columns or a table with more than two rows
and more than two columns), the d
...
is worked out as follows:
d
...
= (c – 1) (r – 1)
where ‘c’ means the number of columns and ‘r’ means the number of rows
...
(ii) All the items in the sample must be independent
...
In case where the frequencies
are less than 10, regrouping is done by combining the frequencies of adjoining groups so
that the new frequencies become greater than 10
...
(iv) The overall number of items must also be reasonably large
...
(v) The constraints must be linear
...
e
...
STEPS INVOLVED IN APPLYING CHI-SQUARE TEST
The various steps involved are as follows:
*
For d
...
greater than 30, the distribution of
distribution is
LM
N
2χ 2 −
2χ 2 approximates the normal distribution wherein the mean of
2χ 2
2d
...
− 1 and the standard deviation = 1
...
f
...
f
...
e
...
f
...
Usually in case of a 2 × 2 or any contingency table, the expected
frequency for any given cell is worked out as under:
$$\text{Expected frequency of any cell} = \frac{(\text{Row total for the row of that cell}) \times (\text{Column total for the column of that cell})}{(\text{Grand total})}$$
(ii) Obtain the difference between observed and expected frequencies and find out the squares
of such differences i
...
, calculate (Oij – Eij)2
...
(iv) Find the summation of $(O_{ij} - E_{ij})^2 / E_{ij}$ values or what we call $\sum \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$. This is the
required χ² value.
The χ² value obtained as such should be compared with the relevant table value of χ² and then
inference be drawn as stated above
...
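The steps above can be sketched in Python for a contingency table. This is a minimal sketch, not a definitive implementation: the function names are assumptions introduced here, and the 2 × 2 cell counts used for the demonstration (17, 18, 3 and 12) are the ones that appear later in this chapter in the illustration on Yates' correction.

```python
def expected_frequencies(table):
    """Step (i): expected frequency = row total x column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square(observed, expected):
    """Steps (ii) to (iv): sum of (O - E)^2 / E over all cells."""
    return sum(
        (o - e) ** 2 / e
        for o_row, e_row in zip(observed, expected)
        for o, e in zip(o_row, e_row)
    )


def degrees_of_freedom(table):
    """d.f. = (c - 1)(r - 1) for a contingency table."""
    return (len(table[0]) - 1) * (len(table) - 1)


observed = [[17, 18],
            [3, 12]]
expected = expected_frequencies(observed)
chi_sq = chi_square(observed, expected)
df = degrees_of_freedom(observed)

print("Expected frequencies:", expected)
print(f"chi-square = {chi_sq:.3f} with {df} d.f.")
```

The resulting value is then compared with the table value of χ² for the same degrees of freedom at the chosen level of significance.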
Illustration 3
A die is thrown 132 times with following results:
Number turned up
1
2
3
4
5
6
Frequency
16
20
25
14
29
28
Is the die unbiased?
Solution: Let us take the hypothesis that the die is unbiased
...
Now we can write the observed frequencies along with expected frequencies
and work out the value of χ² as follows:
Table 10
...
Number       Observed       Expected       (Oi – Ei)   (Oi – Ei)²   (Oi – Ei)²/Ei
turned up    frequency Oi   frequency Ei
1            16             22             –6          36           36/22
2            20             22             –2          4            4/22
3            25             22             3           9            9/22
4            14             22             –8          64           64/22
5            29             22             7           49           49/22
6            28             22             6           36           36/22
∑ [(Oi – Ei)2/Ei] = 9
...
Q Degrees of freedom in the given problem is
(n – 1) = (6 – 1) = 5
...
071
...
The result, thus, supports the
hypothesis and it can be concluded that the die is unbiased
...
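The computation in this illustration can be checked with a short Python sketch. The function name `goodness_of_fit` is an assumption made for illustration; the table value 11.071 is the standard χ² critical value for 5 degrees of freedom at the 5 per cent level.

```python
def goodness_of_fit(observed, expected):
    """Return (chi-square, d.f.) for observed against expected frequencies."""
    if abs(sum(observed) - sum(expected)) > 1e-9:
        raise ValueError("observed and expected totals must agree")
    chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi_sq, len(observed) - 1


# Results of 132 throws of the die, faces 1 to 6.
observed = [16, 20, 25, 14, 29, 28]
# An unbiased die gives each face the same expected frequency: 132 / 6 = 22.
expected = [sum(observed) / 6] * 6

chi_sq, df = goodness_of_fit(observed, expected)
TABLE_VALUE_5_PCT = 11.071  # chi-square table value for 5 d.f.

print(f"chi-square = {chi_sq:.2f}, d.f. = {df}")
print("Hypothesis of an unbiased die supported:", chi_sq < TABLE_VALUE_5_PCT)
```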
3
Class        Observed          Expected          Oi – Ei    (Oi – Ei)²/Ei
             frequency Oi      frequency Ei
A and B      (8 + 29) = 37     (7 + 24) = 31     6          36/31
C            44                38                6          36/38
D and E      (15 + 4) = 19     (24 + 7) = 31     –12        144/31
∴ $\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i} = 6$
...
Illustration 5
Genetic theory states that children having one parent of blood type A and the other of blood type B
will always be of one of three types, A, AB, B and that the proportion of three types will on
an average be as 1 : 2 : 1
...
Test the
2
hypothesis by χ test
...
*Table No
...
χ 2 for specified degrees of freedom has been given in Appendix at the end
The expected frequencies of type A, AB and B (as per the genetic theory) should have been 75,
150 and 75 respectively
...
4
Type
Observed
frequency
Oi
A
AB
B
Expected
frequency
Ei
90
135
75
75
150
75
χ2 = ∑
∴
Q
(Oi – Ei)
bO
i
15
–15
0
− Ei
Ei
g
(Oi – Ei)2
(Oi – Ei)2/Ei
225
225
0
225/75 = 3
225/150 = 1
...
5 + 0 = 4
...
f
...
2
Table value of χ for 2 d
...
at 5 per cent level of significance is 5
...
2
The calculated value of χ is 4
...
This supports the theoretical hypothesis of the genetic theory
that on an average type A, AB and B stand in the proportion of 1 : 2 : 1
...
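When the hypothesis is stated as a ratio, the expected frequencies follow by sharing the total in that ratio. The sketch below retraces this illustration under that reading; the helper name `expected_from_ratio` is an assumption introduced here, and 5.991 is the standard χ² table value for 2 degrees of freedom at the 5 per cent level.

```python
def expected_from_ratio(ratio, total):
    """Share the total frequency among the classes in the given ratio."""
    return [total * part / sum(ratio) for part in ratio]


# Observed numbers of children of blood types A, AB and B.
observed = [90, 135, 75]
# Genetic theory: the types occur in the proportion 1 : 2 : 1.
expected = expected_from_ratio([1, 2, 1], sum(observed))

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1
TABLE_VALUE_5_PCT = 5.991  # chi-square table value for 2 d.f.

print("Expected frequencies:", expected)
print(f"chi-square = {chi_sq:.1f}, d.f. = {df}")
print("Theory supported:", chi_sq < TABLE_VALUE_5_PCT)
```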
Test your result
2
with the help of χ at 5 per cent level of significance
...
e
...
On the basis of this hypothesis, the expected
frequency corresponding to the number of persons vaccinated and attacked would be:
Expectation of ( AB) =
when A represents vaccination and B represents attack
...
5: Calculation of Chi-Square
Group
AB
Ab
aB
ab
Observed
frequency
Oij
31
469
185
1315
Expected
frequency
Eij
(Oij – Eij)
(Oij – Eij)2
54
446
162
1338
–23
+23
+23
–23
529
529
529
529
χ
2
dO
=∑
ij
− Eij
i
(Oij – Eij)2/Eij
529/54 = 9
...
186
529/162 = 3
...
395
2
Eij
= 14
...
2
The table value of χ for 1 degree of freedom at 5 per cent level of significance is 3
...
The calculated value of χ² is much higher than this table value and hence the result of the experiment
does not support the hypothesis
...
Illustration 7
Two research workers classified some people in income groups on the basis of sampling studies
...
Solution: Let us take the hypothesis that the sampling techniques adopted by research workers are
similar (i
...
, there is no difference between the techniques adopted by research workers)
...
6
Groups
Investigator A
classifies people as poor
classifies people as
middle class people
classifies people as rich
Investigator B
classifies people as poor
classifies people as
middle class people
classifies people as rich
(Oij – Eij)2 Eij
Observed
frequency
Oij
Expected
frequency
Eij
Oij – Eij
160
120
40
1600/120 = 13
...
00
100/20 = 5
...
88
120
40
90
30
30
10
900/90 = 10
...
33
χ
Hence,
2
dO
=∑
− Eij
ij
i
2
= 55
...
2
The table value of χ for two degrees of freedom at 5 per cent level of significance is 5
...
The calculated value of χ² is much higher than this table value which means that the calculated
value cannot be said to have arisen just because of chance
...
Hence, the hypothesis
does not hold good
...
Naturally, then, the technique of one must be superior to that of the other
...
Solution: Let us take the hypothesis that the coins are not biased
...
In such a case the expected values of getting 0, 1, 2, … heads in a single
throw in 256 throws of eight coins will be worked out as follows*
...
7
Events or
No
...
*
The probabilities of random variable i
...
, various possible events have been worked out on the binomial principle viz
...
The expansion of the term
n
Cr pr qn–r has given the required probabilities which have been multiplied by 256 to obtain the expected frequencies
...
of heads
Expected frequencies
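3    $^8C_3 \left(\frac{1}{2}\right)^3 \left(\frac{1}{2}\right)^5 \times 256 = 56$
4    $^8C_4 \left(\frac{1}{2}\right)^4 \left(\frac{1}{2}\right)^4 \times 256 = 70$
5    $^8C_5 \left(\frac{1}{2}\right)^5 \left(\frac{1}{2}\right)^3 \times 256 = 56$
6    $^8C_6 \left(\frac{1}{2}\right)^6 \left(\frac{1}{2}\right)^2 \times 256 = 28$
7    $^8C_7 \left(\frac{1}{2}\right)^7 \left(\frac{1}{2}\right)^1 \times 256 = 8$
8    $^8C_8 \left(\frac{1}{2}\right)^8 \left(\frac{1}{2}\right)^0 \times 256 = 1$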
The value of χ² can be worked out as follows:
Table 10
...
No. of heads   Observed       Expected       Oi – Ei
               frequency Oi   frequency Ei
0              2              1              1
1              6              8              –2
2              30             28             2
3              52             56             –4
4              67             70             –3
5              56             56             0
6              32             28             4
7              10             8              2
8              1              1              0
∴ $\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i} = 3$
...
00
4/8 = 0
...
14
16/56 = 0
...
13
0/56 = 0
...
57
4/8 = 0
...
00
∴ Degrees of freedom = (n – 1) = (9 – 1) = 8
The table value of χ² for eight degrees of freedom at 5 per cent level of significance is 15
...
The calculated value of χ² is much less than this table value and hence it is insignificant and can be
ascribed to fluctuations of sampling
...
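The binomial expected frequencies and the resulting χ² can be reproduced with a short Python sketch. The function name `binomial_expected` is an assumption made for illustration, `math.comb` needs Python 3.8 or later, and 15.507 is the standard χ² table value for 8 degrees of freedom at the 5 per cent level.

```python
from math import comb


def binomial_expected(n_coins, n_throws, p=0.5):
    """Expected number of throws showing r heads, for r = 0 .. n_coins."""
    return [
        comb(n_coins, r) * p ** r * (1 - p) ** (n_coins - r) * n_throws
        for r in range(n_coins + 1)
    ]


# Observed numbers of throws with 0, 1, ..., 8 heads in 256 throws of eight coins.
observed = [2, 6, 30, 52, 67, 56, 32, 10, 1]
expected = binomial_expected(8, sum(observed))

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1
TABLE_VALUE_5_PCT = 15.507  # chi-square table value for 8 d.f.

print("Expected frequencies:", expected)
print(f"chi-square = {chi_sq:.2f}, d.f. = {df}")
print("Coins may be taken as unbiased:", chi_sq < TABLE_VALUE_5_PCT)
```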
ALTERNATIVE FORMULA
There is an alternative method of calculating the value of χ² in the case of a (2 × 2) table
...
The alternative formula is
rarely used in finding out the value of chi-square as it is not applicable uniformly in all cases but can
be used only in a (2 × 2) contingency table
...
Yates has suggested a correction for continuity in the χ² value calculated in connection with a (2 × 2)
table, particularly when cell frequencies are small (since no cell frequency should be less than 5 in
any case, though 10 is better as stated earlier) and χ² is just on the significance level
...
It involves the reduction of the deviation
of observed from expected frequencies which of course reduces the value of χ²
...
5, but this
adjustment is made in all the cells without disturbing the marginal totals
...
5 N
2
2
In case we use the usual formula for calculating the value of chi-square viz
...
5
χ (corrected) =
2
E1
2
O2 − E 2 − 0
...
It may again be emphasised that Yates’ correction is made only in case of (2 × 2) table and that
too when cell frequencies are small
...
Solution: Take the hypothesis that there is no difference so far as shops run by men and women in
towns and villages
...
9
Groups
(AB)
(Ab)
(aB)
(ab)
Observed
frequency
Oij
Expected
frequency
Eij
17
18
3
12
14
21
6
9
χ2
∴
dO
=∑
ij
(Oij – Eij)
3
–3
–3
3
− Eij
Eij
i
(Oij – Eij)2/Eij
9/14 = 0
...
43
9/6 = 1
...
00
2
= 3
...
5
2
+
14
18 − 21 − 0
...
5
6
b2
...
5g + b2
...
5g
=
2
14
2
2
21
6
2
+
12 − 9 − 0
...
446 + 0
...
040 + 0
...
478
Q Degrees of freedom = (c – 1) (r – 1) = (2 – 1) (2 – 1) = 1
Table value of χ² for one degree of freedom at 5 per cent level of significance is 3
...
The calculated value of χ² by both methods (i
...
, before correction and after Yates’ correction) is less
than its table value
...
We can conclude that there is no difference
between shops run by men and women in villages and towns
...
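Both calculations for this illustration can be sketched in Python, taking Yates' correction as a reduction of each absolute deviation by 0.5 before squaring. The function names are assumptions introduced here, and 3.841 is the standard χ² table value for one degree of freedom at the 5 per cent level.

```python
def chi_square_plain(observed, expected):
    """Usual formula: sum of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi_square_yates(observed, expected):
    """Yates' correction: reduce each absolute deviation by 0.5 before squaring."""
    return sum((abs(o - e) - 0.5) ** 2 / e for o, e in zip(observed, expected))


# Cells (AB), (Ab), (aB), (ab) of the (2 x 2) table in the illustration.
observed = [17, 18, 3, 12]
expected = [14, 21, 6, 9]

uncorrected = chi_square_plain(observed, expected)
corrected = chi_square_yates(observed, expected)
TABLE_VALUE_5_PCT = 3.841  # chi-square table value for 1 d.f.

print(f"chi-square before correction = {uncorrected:.3f}")
print(f"chi-square after Yates' correction = {corrected:.3f}")
print("Both below table value:", max(uncorrected, corrected) < TABLE_VALUE_5_PCT)
```

The correction always lowers the statistic, so it matters most when the uncorrected value lies just above the table value.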
This means that several
values of χ² can be added together and if the degrees of freedom are also added, this number gives
the degrees of freedom of the total value of χ²
...
Such addition of various values of χ² gives one
value of χ² which helps in forming a better idea about the significance of the problem under
consideration
...
Illustration 10
The following values of χ² from different investigations carried out to examine the effectiveness of a
recently invented medicine for checking malaria are obtained:
Investigation
χ2
d
...
1
2
3
4
5
2
...
2
4
...
7
4
...
0
...
f
...
We can now state that the value of χ for 5
degrees of freedom (when all the five investigations are taken together) is 18
...
2
2
Let us take the hypothesis that the new medicine is not effective
...
070
...
As such
the hypothesis is rejected and it can be concluded that the new medicine is effective in checking
malaria
...
In
other words, chi-square tells us about the significance of a relation between variables; it provides no
answer regarding the magnitude of the relation
...
Coefficient of contingency is also known as
coefficient of Mean Square contingency
...
IMPORTANT CHARACTERISTICS OF χ² TEST
(i) This test (as a non-parametric test) is based on frequencies and not on the parameters like
mean and standard deviation
...
(iii) This test possesses the additive property as has already been explained
...
(v) This test is an important non-parametric test as no rigid assumptions are necessary in
regard to the type of population, no parameter values are needed and relatively few mathematical
details are involved
...
It should be borne in mind that the test is to be applied only when the individual observations
of sample are independent which means that the occurrence of one individual observation (event)
has no effect upon the occurrence of any other observation (event) in the sample under consideration
...
The other possible reasons concerning the improper application or misuse of this test can be (i)
neglect of frequencies of non-occurrence; (ii) failure to equalise the sum of observed and the sum of
the expected frequencies; (iii) wrong determination of the degrees of freedom; (iv) wrong computations,
and the like
...
Questions
1
...
2
...
3
...
In a certain
hospital chloromycetin was given to 285 out of the 392 patients suffering from typhoid
...
2
(The χ value at 5 per cent level of significance for one degree of freedom is 3
...
2
(M
...
, Rajasthan University, 1966)
4
...
No
...
, 3
...
5
...
The frequencies of the digits were:
Digit
0
1
2
3
4
5
6
7
8
9
Frequency 18
19
23
21
16
25
22
20
21
15
2
2
Calculate χ
...
Five dice were thrown 96 times and the number of times 4, 5, or 6 was thrown were
Number of dice throwing
4, 5 or 6
5
4
3
2
1
0
Frequency
8
18
35
24
10
1
Find the value of Chi-square
...
Find Chi-square from the following information:
Condition of home
Condition of child
Total
Clean
Clean
Fairly clean
Dirty
Total
8
...
10
...
Dirty
70
80
35
50
20
45
120
100
80
185
115
300
State whether the two attributes viz
...
In a certain cross the types represented by XY, Xy, xY and xy are expected to occur in a 9 : 5 : 4 : 2 ratio
...
The normal rate of infection for a certain disease in cattle is known to be 50 per cent
...
Can the
evidence be regarded as conclusive (at 1 per cent level of significance) to prove the value of the new
vaccine?
Result of throwing die were recorded as follows:
Number falling upwards
1
2
3
4
5
6
Frequency
27
33
31
29
30
24
Is the die unbiased? Answer on the basis of Chi-square test
...
In an
experiment among 1600 beans, the number in the four groups were 882, 313, 287 and 118
...
2
(M
...
A
...
You are given a sample of 150 observations classified by two attributes A and B as follows:
A1
A2
A3
Total
B1
B2
B3
40
11
9
25
26
9
15
8
7
80
45
25
Total
60
60
30
150
Use the χ test to examine whether A and B are associated
...
A
...
, Patiala University, 1975)
13
...
of boys
5
4
3
2
1
0
No
...
of families
Is this distribution consistent with the hypothesis that male and female births are equally probable?
Apply Chi-square test
...
What is Yates’ correction? Find the value of Chi-square applying Yates’ correction to the following data:
Passed
Failed
Total
Day classes
Evening classes
10
4
20
66
30
70
Total
14
86
100
Also state whether the association, if any, between passing in the examination and studying in day
classes is significant using Chi-square test
...
(a) 1000 babies were born during a certain week in a city of which 600 were boys and 400 girls
...
(b) The percentage of smokers in a certain city was 90
...
Is the sample proportion significantly different from the
proportion of smokers in the city? Answer on the basis of Chi-square test
...
A college is running post-graduate classes in five subjects with equal number of students
...
Test the hypothesis that these classes are alike in
absenteeism if the actual absentees in each are as follows:
History
= 19
Philosophy = 18
Economics = 15
Commerce = 12
Chemistry = 11
(M
...
(EAFM) Exam
...
Uni
...
The number of automobile accidents per week in a certain community were as follows:
12, 8, 20, 2, 14, 10, 15, 6, 9, 4
Are these frequencies in agreement with the belief that accident conditions were the same during the
10 week period under consideration?
18
...
From scientific
analysis, sea water is known to contain sodium chloride, magnesium and other elements in the ratio of
62 : 4 : 34
...
Are these data consistent with the scientific model at 5 per cent level of significance?
19
...
The results of the test were
as given below:
2
Area
Total
A
B
C
Strikes
Dry holes
7
10
10
18
8
9
25
37
Total number of test wells
17
28
17
62
Do the three areas have the same potential, at the 10 per cent level of significance?
20
...
The following table gives the observed number of periods in
which there were 0, 1, 2, 3, 4, or more arrivals as well as the expected number of such periods if arrivals per
half hour have a Poisson distribution with λ = 2
...
Number of observed
Number of periods
Number of periods
arrivals (per half hour)
observed
expected (Poisson, λ = 2)
0
1
2
3
4 or more
47
56
71
44
32
34
68
68
45
35
21
...
05) that there are no differences among frequencies of first choice of
tested publications
...
05 that
there are no differences
...
A group of 150 College students were asked to indicate their most liked film star from among six different
well known film actors viz
...
The observed
frequency data were as follows:
Actors
A
B
C
D
E
F
Total
Frequencies 24
20
32
25
28
21
150
Test at 5 per cent whether all actors are equally popular
...
For the data in question 12, find the coefficient of contingency to measure the magnitude of relationship
between A and B
...
(a) What purpose is served by calculating the Phi coefficient ( φ )? Explain
...
2
11
Analysis of Variance
and Co-variance
ANALYSIS OF VARIANCE (ANOVA)
Analysis of variance (abbreviated as ANOVA) is an extremely useful technique for research
in the fields of economics, biology, education, psychology, sociology, business/industry and
several other disciplines
...
As
stated earlier, the significance of the difference between the means of two samples can be judged
through either z-test or the t-test, but the difficulty arises when we happen to examine the significance
of the difference amongst more than two sample means at the same time
...
Using this technique, one can draw inferences about whether
the samples have been drawn from populations having the same mean
...
In such circumstances one generally does not want to consider all possible
combinations of two populations at a time for that would require a great number of tests before we
would be able to arrive at a decision
...
Therefore, one
quite often utilizes the ANOVA technique and through it investigates the differences among the
means of all the populations simultaneously
...
A
...
*
Variance is an important statistical measure and is described as the mean of the squares of deviations taken from the
mean of the given series of data
...
Its square root is known as standard deviation,
i
...
, Standard deviation = $\sqrt{\text{Variance}}$
...
ANOVA is essentially a procedure for testing the difference among different groups of data for
homogeneity
...
”1 There may be variation between samples and also within sample
items
...
Hence, it is a method of
analysing the variance to which a response is subject into its various components corresponding to
various sources of variation
...
Similarly, the differences in
various types of feed prepared for a particular class of animal or various types of drugs manufactured
for curing a specific disease may be studied and judged to be significant or not through the application
of ANOVA technique
...
Thus, through ANOVA technique one can, in general, investigate any number of factors which
are hypothesized or said to influence the dependent variable
...
If we take only one factor and investigate the differences amongst its various
categories having numerous possible values, we are said to use one-way ANOVA and in case we
investigate two factors at the same time, then we use two-way ANOVA
...
e
...
THE BASIC PRINCIPLE OF ANOVA
The basic principle of ANOVA is to test for differences among the means of the populations by
examining the amount of variation within each of these samples, relative to the amount of variation
between the samples
...
e
...
Thus while using
ANOVA, we assume that each of the samples is drawn from a normal population and that each of
these populations has the same variance
...
This, in other words, means that we assume the absence of
many factors that might affect our conclusions concerning the factor(s) to be studied
...
, one based on between
samples variance and the other based on within samples variance
...
\[ F = \frac{\text{Estimate of population variance based on between samples variance}}{\text{Estimate of population variance based on within samples variance}} \]
1
Donald L
...
Murphy, Introductory Statistical Analysis, p
...
258
Research Methodology
This value of F is to be compared to the F-limit for given degrees of freedom
...
4(a) and 4(b) given in
appendix), we may say that there are significant differences between the sample means
...
We then determine if there are differences within that factor
...
e
...
, X k
when there are k samples
...
+ X k
No
...
This is known as the sum of squares for
variance between the samples (or SS between)
...
\( + \; n_k \left( \bar{X}_k - \bar{\bar{X}} \right)^2 \)
(iv) Divide the result of the (iii) step by the degrees of freedom between the samples to obtain
variance or mean square (MS) between samples
...
f
...
(v) Obtain the deviations of the values of the sample items for all the samples from corresponding
means of the samples and calculate the squares of such deviations and then obtain their
total
...
Symbolically this can be written:
SS within = \( \sum_i \left( X_{1i} - \bar{X}_1 \right)^2 + \sum_i \left( X_{2i} - \bar{X}_2 \right)^2 + \)
...
Symbolically, this can be written:
*
It should be remembered that ANOVA test is always a one-tailed test, since a low calculated value of F from the sample
data would mean that the fit of the sample means to the null hypothesis (viz
...
= X k ) is a very good fit
...
e
...
(vii) For a check, the sum of squares of deviations for total variance can also be worked out by
adding the squares of deviations when the deviations for the individual items in all the
samples have been taken from the mean of the sample means
...
e
...
The degrees of freedom for total variance will be equal to the number of items in all
samples minus one i
...
, (n – 1)
...
e
...
(viii) Finally, F-ratio may be worked out as under:
\[ F\text{-ratio} = \frac{\text{MS between}}{\text{MS within}} \]
This ratio is used to judge whether the difference among several sample means is significant
or is just a matter of sampling fluctuations
...
If the
worked out value of F, as stated above, is less than the table value of F, the difference is
taken as insignificant i
...
, due to chance and the null-hypothesis of no difference between
sample means stands
...
The higher the calculated value of F is above the table value, the more definite and
sure one can be about his conclusions
...
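The steps of the direct method can be sketched in Python. This is only a minimal illustration under stated assumptions: the function name `one_way_anova` is hypothetical, the grand mean is taken over all items (which equals the mean of the sample means only when the samples are of equal size), and the figures are the seed-variety columns of the two-way table in Illustration 2, treated here as three independent samples purely for demonstration.

```python
def one_way_anova(samples):
    """Return (SS between, SS within, F-ratio) by the direct method."""
    k = len(samples)                                  # number of samples
    n = sum(len(s) for s in samples)                  # total number of items
    means = [sum(s) / len(s) for s in samples]        # mean of each sample
    grand_mean = sum(sum(s) for s in samples) / n     # assumption: mean of all items
    # SS between: squared deviations of sample means from the grand mean,
    # weighted by the number of items in each sample
    ss_between = sum(len(s) * (m - grand_mean) ** 2 for s, m in zip(samples, means))
    # SS within: squared deviations of items from their own sample mean
    ss_within = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ss_between, ss_within, ms_between / ms_within


# Illustrative figures: three samples of four items each
varieties = [[6, 7, 3, 8], [5, 5, 3, 7], [5, 4, 3, 4]]
ss_b, ss_w, f_ratio = one_way_anova(varieties)
print(ss_b, ss_w, round(f_ratio, 2))
```

For these figures the sketch gives SS between = 8, SS within = 24 and an F-ratio of 1.5, which would then be compared with the table value of F for (k – 1) and (n – k) degrees of freedom.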
Table 11
...
IK
+ nk X k − X
Within
samples or
categories
d
∑ X 1i − X 1
d
i
Degrees of
freedom (d
...
)
2
F-ratio
(k – 1)
SS between
( k – 1)
MS between
MS within
(n – k)
SS within
(n – k )
2
+
...
f
...
The various steps involved in the shortcut method are as under:
(i) Take the total of the values of individual items in all the samples i
...
, work out ∑X ij
i = 1, 2, 3, …
j = 1, 2, 3, …
and call it as T
...
Subtract the
correction factor from this total and the result is the sum of squares for total variance
...
Subtract the correction factor from this total and the result is the sum of squares
for variance between the samples
...
(v) The sum of squares within the samples can be found out by subtracting the result of (iv)
step from the result of (iii) step stated above and can be written as under:
\[ \text{SS within} = \left\{ \sum X_{ij}^2 - \frac{(T)^2}{n} \right\} - \left\{ \sum \frac{(T_j)^2}{n_j} - \frac{(T)^2}{n} \right\} = \sum X_{ij}^2 - \sum \frac{(T_j)^2}{n_j} \]
After doing all this, the table of ANOVA can be set up in the same way as explained
earlier
...
This is based on an important property of
F-ratio that its value does not change if all the n item values are either multiplied or divided by a
common figure or if a common figure is either added or subtracted from each of the given n item
values
...
This method should be used
specially when given figures are big or otherwise inconvenient
...
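The property that coding leaves the F-ratio unchanged can be checked numerically. The following is a small sketch, not part of the original text: the helper `f_ratio` and the data are hypothetical, and the coding used (divide every item by 10, then subtract 5) is just one possible choice.

```python
import math


def f_ratio(samples):
    """F-ratio for a one-way layout, computed with the correction factor."""
    n = sum(len(s) for s in samples)
    k = len(samples)
    correction = sum(sum(s) for s in samples) ** 2 / n
    ss_total = sum(x * x for s in samples for x in s) - correction
    ss_between = sum(sum(s) ** 2 / len(s) for s in samples) - correction
    ss_within = ss_total - ss_between
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical "big" figures and their coded versions
original = [[60, 70, 30, 80], [50, 50, 30, 70], [50, 40, 30, 40]]
coded = [[x / 10 - 5 for x in s] for s in original]

# The two F-ratios agree up to floating-point rounding
print(math.isclose(f_ratio(original), f_ratio(coded)))
```

The sums of squares themselves do change under coding; only their ratio is unaffected, which is why the coded figures may be used throughout the table.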
Illustration 1
Set up an analysis of variance table for the following per acre production data for three varieties of
wheat, each grown on 4 plots and state if the variety differences are significant
...
We try below both the methods
...
2
Source of variation | SS | d.f. | MS | F-ratio | 5% F-limit (from the F-table)
Between sample | 8 | (3 – 1) = 2 | 8/2 = 4.00 | 4.00/2.67 = 1.5 | F(2, 9) = 4.26
Within sample | 24 | (12 – 3) = 9 | 24/9 = 2.67 | |
Total | 32 | (12 – 1) = 11 | | |
The above table shows that the calculated value of F is 1
...
26 at 5% level with d
...
being v1 = 2 and v2 = 9 and hence could have arisen due to chance
...
We may, therefore, conclude
that the difference in wheat output due to varieties is insignificant and is just a matter of chance
...
T in the given case = 60
and
n = 12
Hence, the correction factor = \( (T)^2/n \) = 60 × 60/12 = 300
...
From now onwards we can set up ANOVA table and interpret F-ratio in the same manner
as we have already done under the direct method
...
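The short-cut method can be sketched as follows. This is a hedged illustration only: the function name `one_way_anova_shortcut` is hypothetical, and the figures are an assumed set of three samples of four items each, chosen so that T = 60 and n = 12 as in the working above.

```python
def one_way_anova_shortcut(samples):
    """Return (correction factor, total SS, SS between, SS within)."""
    n = sum(len(s) for s in samples)
    total = sum(sum(s) for s in samples)                  # T
    correction = total ** 2 / n                           # (T)^2 / n
    # Total SS: sum of squared items minus the correction factor
    ss_total = sum(x * x for s in samples for x in s) - correction
    # SS between: squared sample totals over sample sizes, minus the correction factor
    ss_between = sum(sum(s) ** 2 / len(s) for s in samples) - correction
    # SS within is obtained by subtraction
    ss_within = ss_total - ss_between
    return correction, ss_total, ss_between, ss_within


samples = [[6, 7, 3, 8], [5, 5, 3, 7], [5, 4, 3, 4]]   # assumed figures
print(one_way_anova_shortcut(samples))
```

With these figures the correction factor is 300 and the sums of squares come to 32 (total), 8 (between) and 24 (within), the same values that the direct method yields.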
For
example, the agricultural output may be classified on the basis of different varieties of seeds and also
on the basis of different varieties of fertilizers used
...
In a factory, the
various units of a product produced during a certain period may be classified on the basis of different
varieties of machines used and also on the basis of different grades of labour
...
The ANOVA
technique is a little different in case of repeated measurements, where we also compute the interaction
variation
...
(a) ANOVA technique in context of two-way design when repeated values are not there: As we
do not have repeated values, we cannot directly compute the sum of squares within samples as we
had done in the case of one-way ANOVA
...
Analysis of Variance and Co-variance
265
The various steps involved are as follows:
(i) Use the coding device, if the same simplifies the task
...
(iii) Work out the correction factor as under:
\[ \text{Correction factor} = \frac{(T)^2}{n} \]
(iv) Find out the square of all the item values (or their coded values as the case may be) one by
one and then take its total
...
Symbolically, we can write it as:
Sum of squares of deviations for total variance or total SS
\[ = \sum X_{ij}^2 - \frac{(T)^2}{n} \]
(v) Take the total of different columns and then obtain the square of each column total and
divide such squared values of each column by the number of items in the concerning
column and take the total of the result thus obtained
...
(vi) Take the total of different rows and then obtain the square of each row total and divide
such squared values of each row by the number of items in the corresponding row and take
the total of the result thus obtained
...
(vii) Sum of squares of deviations for residual or error variance can be worked out by subtracting
the result of the sum of (v)th and (vi)th steps from the result of (iv)th step stated above
...
(viii) Degrees of freedom (d
...
) can be worked out as under:
d
...
for total variance
= (c
...
f
...
f
...
f
...
3: Analysis of Variance Table for Two-way Anova
Source of variation | Sum of squares (SS) | Degrees of freedom (d.f.) | Mean square (MS) | F-ratio
Between columns treatment | \( \sum \frac{(T_j)^2}{n_j} - \frac{(T)^2}{n} \) | (c – 1) | SS between columns / (c – 1) | MS between columns / MS residual
Between rows treatment | \( \sum \frac{(T_i)^2}{n_i} - \frac{(T)^2}{n} \) | (r – 1) | SS between rows / (r – 1) | MS between rows / MS residual
Residual or error | Total SS – (SS between columns + SS between rows) | (c – 1) (r – 1) | SS residual / (c – 1) (r – 1) |
Total | \( \sum X_{ij}^2 - \frac{(T)^2}{n} \) | (c · r – 1) | |
...
Thus, MS residual or the residual variance provides the basis for the F-ratios concerning
variation between columns treatment and between rows treatment
...
Both the F-ratios are compared with their corresponding table values, for given degrees of
freedom at a specified level of significance, as usual and if it is found that the calculated
F-ratio concerning variation between columns is equal to or greater than its table value,
then the difference among columns means is considered significant
...
Illustration 2
Set up an analysis of variance table for the following two-way design results:
Per Acre Production Data of Wheat (in metric tonnes)
Varieties of fertilizers | Varieties of seeds
                         | A | B | C
W                        | 6 | 5 | 5
X                        | 7 | 5 | 4
Y                        | 3 | 3 | 3
Z                        | 8 | 7 | 4
Also state whether variety differences are significant at 5% level
...
ANOVA table can be set up for the given problem as shown in Table 11
...
From the said ANOVA table, we find that differences concerning varieties of seeds are insignificant
at 5% level as the calculated F-ratio of 4 is less than the table value of 5
...
76
...
For this measure we can calculate the sum
of squares and degrees of freedom in the same way as we had worked out the sum of squares for
variance within samples in the case of one-way ANOVA
...
We then find left-over sums of squares and
left-over degrees of freedom which are used for what is known as ‘interaction variation’ (Interaction
is the measure of inter relationship among the two different classifications)
...
We illustrate the same with an
example
...
4: Computations for Two-way Anova (in a design without repeated values)
Step (i)
T = 60, n = 12, ∴ Correction factor = (T)²/n = (60 × 60)/12 = 300
Step (ii)
Total SS = (36 + 25 + 25 + 49 + 25 + 16 + 9 + 9 + 9 + 64 + 49 + 16) – (60 × 60)/12
= 332 – 300
= 32
Step (iii)
SS between columns treatment = [(24 × 24)/4 + (20 × 20)/4 + (16 × 16)/4] – [(60 × 60)/12]
= 144 + 100 + 64 – 300
= 8
Step (iv)
SS between rows treatment = [(16 × 16)/3 + (16 × 16)/3 + (9 × 9)/3 + (19 × 19)/3] – [(60 × 60)/12]
= 85.33 + 85.33 + 27.00 + 120.33 – 300
= 18
Step (v)
SS residual or error = Total SS – (SS between columns + SS between rows)
= 32 – (8 + 18)
= 6
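The computations for a two-way design without repeated values can be sketched in Python. This is a minimal illustration rather than a definitive implementation: the function name `two_way_anova` is hypothetical, and the table is the per acre production data of Illustration 2, with one row per fertilizer (W, X, Y, Z) and one column per seed variety (A, B, C).

```python
def two_way_anova(table):
    """table[i][j] holds the single value for row i and column j."""
    r, c = len(table), len(table[0])
    n = r * c
    total = sum(sum(row) for row in table)                       # T
    correction = total ** 2 / n                                  # (T)^2 / n
    ss_total = sum(x * x for row in table for x in row) - correction
    # Column totals squared, divided by the number of items per column
    ss_cols = sum(sum(row[j] for row in table) ** 2 / r for j in range(c)) - correction
    # Row totals squared, divided by the number of items per row
    ss_rows = sum(sum(row) ** 2 / c for row in table) - correction
    ss_resid = ss_total - (ss_cols + ss_rows)
    ms_resid = ss_resid / ((c - 1) * (r - 1))
    f_cols = (ss_cols / (c - 1)) / ms_resid
    f_rows = (ss_rows / (r - 1)) / ms_resid
    return ss_cols, ss_rows, ss_resid, f_cols, f_rows


wheat = [[6, 5, 5],   # fertilizer W, seeds A, B, C
         [7, 5, 4],   # fertilizer X
         [3, 3, 3],   # fertilizer Y
         [8, 7, 4]]   # fertilizer Z
print([round(v, 2) for v in two_way_anova(wheat)])
```

The sketch reproduces the sums of squares worked out above (8, 18 and 6) and gives F-ratios of 4 for seed varieties and 6 for fertilizers, each to be compared with its own table value.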
Table 11
...
f
...
e
...
14
Between rows
(i
...
, between varieties
of fertilizers)
Residual or error
18
(4 – 1) = 3
18/3 = 6
6/1 = 6
F(3, 6) = 4
...
Solution: We first make all the required computations as shown below:
We can set up ANOVA table shown in Table 11
...
Table 11
...
72
18
Step (i)
T = 187, n = 18, thus, the correction factor =
Step (ii)
Total SS = [(14)2 + (15)2 + (12)2 + (11)2 + (10)2 + (11)2 + (10)2 +(9)2 + (7)2 + (8)2 + (11)2 + (11)2 + (11)2
LM b187g
+ (11) + (10) + (11) + (8) + (7) ] –
MN 18
2
2
2
2
2
2
OP
PQ
= (2019 – 1942
...
28
SS between columns (i
...
, between drugs) =
Step (iii)
LM 73 × 73 + 56 × 56 + 58 × 58 OP − LM b187g
6
6 Q M 18
N 6
N
2
O
P
P
Q
= 888
...
66 + 560
...
72
= 28
...
e
...
67 + 580
...
67 – 1942
...
78
SS within samples = (14 – 14
...
5)2 + (10 – 9
...
5)2 + (11 – 11)2 + (11 – 11)2
+ (12 – 11
...
5)2 + (7 – 7
...
5)2
+ (10 – 10
...
5)2 + (10 – 10
...
5)2
+ (11 – 11)2 + (11 – 11)2 + (8 – 7
...
5)2
= 3
...
28 – [28
...
78 + 3
...
23
Step (v)
Step (vi)
Table 11
...
77
2
14
...
389
F (2, 9) = 4
...
385
= 36
...
78
2
7
...
389
= 7
...
0
F (2, 9) = 4
...
78
F-ratio
28
...
e
...
e
...
f
...
Source of variation
SS
d
...
MS
F-ratio
5% F-limit
Interaction
29
...
23
4
7
...
389
F (4, 9) = 3
...
50
(18 – 9) = 9
76
...
9
= 0
...
Thus, interaction SS = (76
...
77 + 14
...
50) = 29
...
The above table shows that all the three F-ratios are significant at 5% level which means that
the drugs act differently, different groups of people are affected differently and the interaction term
is significant
...
e
...
Graphic method of studying interaction in a two-way design: Interaction can be studied in a
two-way design with repeated measurements through graphic method also
...
Then we plot the averages for all the samples
on the graph and connect the averages for each variety of the other factor by a distinct mark (or a
coloured line)
...
Let us draw such a graph for the data of illustration 3 of this chapter to see whether
there is any interaction between the two factors viz
...
Graph of the averages for amount of blood pressure reduction in millimeters of
mercury for different drugs and different groups of people
...
11
...
The graph indicates that there is a significant interaction because the different connecting lines
for groups of people do cross over each other
...
The highest reduction in blood pressure in case of C is with drug Y and the
lowest reduction is with drug Z, whereas the highest reduction in blood pressure in case of A and B
is with drug X and the lowest reduction is with drug Y
...
In such a situation,
performing F-tests is meaningless
...
ANOVA IN LATIN-SQUARE DESIGN
Latin-square design is an experimental design used frequently in agricultural research
...
The ANOVA technique in case of Latin-square design remains
more or less the same as we have already stated in case of a two-way design, except that
the variance is split into four parts as under:
(i) variance between columns;
(ii) variance between rows;
(iii) variance between varieties;
(iv) residual variance
...
8
\[ \text{Variance between columns or MS between columns} = \frac{\sum \frac{(T_j)^2}{n_j} - \frac{(T)^2}{n}}{(c - 1)} = \frac{\text{SS between columns}}{\text{d.f.}} \]
\[ \text{Variance between rows or MS between rows} = \frac{\sum \frac{(T_i)^2}{n_i} - \frac{(T)^2}{n}}{(r - 1)} = \frac{\text{SS between rows}}{\text{d.f.}} \]
\[ \text{Variance between varieties or MS between varieties} = \frac{\sum \frac{(T_v)^2}{n_v} - \frac{(T)^2}{n}}{(v - 1)} = \frac{\text{SS between varieties}}{\text{d.f.}} \]
\[ \text{Residual or error variance or MS residual} = \frac{\text{Total SS} - (\text{SS between columns} + \text{SS between rows} + \text{SS between varieties})}{(c - 1)(c - 2)^{*}} \]
where total \( \text{SS} = \sum (X_{ij})^2 - \frac{(T)^2}{n} \)
c = number of columns
r = number of rows
v = number of varieties
Illustration 4
Analyse and interpret the following statistics concerning output of wheat per field obtained as a
result of experiment conducted to test four varieties of wheat viz
...
C
B
A
A
23
25
D
B
D
D
14
18
C
20
17
C
B
20
17
B
21
A
19
20
C
19
19
D
20
A
15
21
Solution: Using the coding method, we subtract 20 from the figures given in each of the small
squares and obtain the coded figures as under:
Row totals
Columns
1
1
2
5
–1
Rows
3
Column
totals
0
1
–3
–10
0
A
B
C
–2
–2
C
D
–6
–1
8
0
B
C
–1
A
B
D
4
3
D
A
D
A
B
C
4
3
2
–3
0
1
0
–4
–1
–7
–7
–5
T = –12
Fig
...
2 (a)
Squaring these coded figures in various columns and rows we have:
Squares of
coded figures
Sum of
squares
Columns
1
1
2
C
D
C
D
9
C
7
C
36
D
4
1
A
1
34
B
1
B
0
0
D
1
4
A
9
A
Rows
3
B
25
4
3
2
B
A
9
0
1
46
11
29
35
25
36
Sum of
squares
46
0
T = 122
Fig
...
2 (b)
\[ \text{Correction factor} = \frac{(T)^2}{n} = \frac{(-12)(-12)}{16} = 9 \]
\[ \text{SS for total variance} = \sum (X_{ij})^2 - \frac{(T)^2}{n} = 122 - 9 = 113 \]
\[ \text{SS for variance between columns} = \sum \frac{(T_j)^2}{n_j} - \frac{(T)^2}{n} = \left\{ \frac{(0)^2}{4} + \frac{(-4)^2}{4} + \frac{(-1)^2}{4} + \frac{(-7)^2}{4} \right\} - 9 = \frac{66}{4} - 9 = 7.5 \]
SS for variance between varieties would be worked out as under:
For finding SS for variance between varieties, we would first rearrange the coded data in the
following form:
Table 11
...
5
4
∴ Sum of squares for residual variance will work out to
113 – (7
...
5 + 48
...
50
d.f. for variance between columns = (c – 1) = (4 – 1) = 3
d.f. for variance between rows = (r – 1) = (4 – 1) = 3
d.f. for variance between varieties = (v – 1) = (4 – 1) = 3
d.f. for total variance = (n – 1) = (16 – 1) = 15
d.f. for residual variance = (c – 1) (c – 2) = (4 – 1) (4 – 2) = 6
ANOVA table can now be set up as shown below:
=
Table 11
...
f
...
50
= 2
...
50
= 143
...
175
F (3, 6) = 4
...
50
3
46
...
3
...
85
...
76
7
...
Source of
variation
SS
Between
varieties
Residual
or error
Total
d
...
MS
F-ratio
5% F-limit
48
...
50
= 1617
...
1617
= 9
...
175
F (3, 6) = 4
...
50
6
10
...
6
113
...
85 and 9
...
76
...
43 is less
than the table value of 4
...
ANALYSIS OF CO-VARIANCE (ANOCOVA)
WHY ANOCOVA?
The object of experimental design in general happens to be to ensure that the results observed may
be attributed to the treatment variable and to no other causal circumstances
...
“In psychology and education primary
interest in the analysis of covariance rests in its use as a procedure for the statistical control of an
uncontrolled variable
...
In other words, covariance analysis
consists in subtracting from each individual score (Yi) that portion of it Yi´ that is predictable from
uncontrolled variable (Zi) and then computing the usual analysis of variance on the resulting
(Y – Y´)’s, of course making the due adjustment to the degrees of freedom because of the fact that
estimation using regression method required loss of degrees of freedom
...
, p
...
Degrees of freedom associated with adjusted sums of squares will be as under:
Between: k – 1
Within: N – k – 1
Total: N – 2
ASSUMPTIONS IN ANOCOVA
The ANOCOVA technique requires one to assume that there is some sort of relationship between
the dependent variable and the uncontrolled variable
...
Other assumptions are:
(i) Various treatment groups are selected at random from the population
...
(iii) The regression is linear and is same from group to group
...
Calculate the adjusted total, within groups and
between groups, sums of squares on X and test the significance of differences between the adjusted
means on X by using the appropriate F-ratio
...
Solution: We apply the technique of analysis of covariance and work out the related measures as
under:
Table 11
...
80
6
...
80
14
...
00
21
...
27
N
∑Y = 33 + 72 + 105 = 210
Correction factor for Y = \( \frac{(\sum Y)^2}{N} \) = 2940
∑X² = 9476, ∑Y² = 3734, ∑XY = 5838
Correction factor for XY = \( \frac{\sum X \cdot \sum Y}{N} \) = 4732
Hence, total SS for X = ∑X 2 – correction factor for X
= 9476 – 7616
...
73
\[ \text{SS between for } X = \left\{ \frac{(49)^2}{5} + \frac{(114)^2}{5} + \frac{(175)^2}{5} \right\} - \{\text{correction factor for } X\} \]
= (480
...
2 + 6125) – (7616
...
13
SS within for X = (total SS for X) – (SS between for X)
= (1859
...
13) = 271
...
8 + 1036
...
6
SS within for Y = (total SS for Y) – (SS between for Y)
= (794) – (519
...
4
Then, we work out the following values in respect of both X and Y
Total sum of product of XY = ∑ XY – correction factor for XY
= 5838 – 4732 = 1106
\[ \text{SS between for } XY = \left\{ \frac{(49)(33)}{5} + \frac{(114)(72)}{5} + \frac{(175)(105)}{5} \right\} - \text{correction factor for } XY \]
= (323
...
6 + 3675) – (4732) = 908
SS within for XY = (Total sum of product) – (SS between for XY)
= (1106) – (908) = 198
ANOVA table for X, Y and XY can now be set up as shown below:
Anova Table for X, Y and XY
Source
d
...
SS for X
SS for Y
Sum of product XY
Between groups
Within groups
2
12
1588
...
60
519
...
40
908
EXY 198
Total
14
TXX 1859
...
00
TXY 1106
\[ \text{Adjusted total SS} = T_{XX} - \frac{(T_{XY})^2}{T_{YY}} \]
b1106g
= 1859
...
73) – (1540
...
13
\[ \text{Adjusted SS within group} = E_{XX} - \frac{(E_{XY})^2}{E_{YY}} \]
b198g
= 271
...
40
= (271
...
87)) = 128
...
13 – 128
...
40
Anova Table for Adjusted X
Source
d
...
SS
MS
F-ratio
Between groups
Within group
2
11
190
...
73
95
...
7
8
...
13
At 5% level, the table value of F for v1 = 2 and v2 = 11 is 3
...
21
...
e
...
14 is greater
than table values) and accordingly we infer that F-ratio is significant at both levels which means the
difference in group means is significant
...
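The adjusted sums of squares in this illustration can be reproduced from the summary figures given above (three groups of five items, group totals 49, 114, 175 on X and 33, 72, 105 on Y, ∑X² = 9476, ∑Y² = 3734, ∑XY = 5838). The following Python sketch is only an illustration; all variable names are hypothetical.

```python
n_groups, n_per_group = 3, 5
N = n_groups * n_per_group
sum_x2, sum_y2, sum_xy = 9476, 3734, 5838
group_x = [49, 114, 175]      # group totals on X
group_y = [33, 72, 105]       # group totals on Y

# Correction factors
cf_x = sum(group_x) ** 2 / N
cf_y = sum(group_y) ** 2 / N
cf_xy = sum(group_x) * sum(group_y) / N

# Total, between and within sums of squares and products
t_xx, t_yy, t_xy = sum_x2 - cf_x, sum_y2 - cf_y, sum_xy - cf_xy
b_xx = sum(t * t for t in group_x) / n_per_group - cf_x
b_yy = sum(t * t for t in group_y) / n_per_group - cf_y
b_xy = sum(a * b for a, b in zip(group_x, group_y)) / n_per_group - cf_xy
e_xx, e_yy, e_xy = t_xx - b_xx, t_yy - b_yy, t_xy - b_xy

# Adjusted sums of squares on X
adj_total = t_xx - t_xy ** 2 / t_yy
adj_within = e_xx - e_xy ** 2 / e_yy
adj_between = adj_total - adj_within

# One further degree of freedom is lost within groups because of the regression
f_ratio = (adj_between / (n_groups - 1)) / (adj_within / (N - n_groups - 1))
print(round(adj_total, 2), round(adj_within, 2), round(adj_between, 2), round(f_ratio, 2))
```

Run as written, the sketch gives an adjusted total SS of about 319.13, an adjusted within-group SS of about 128.73 and an F-ratio of about 8.14 with 2 and 11 degrees of freedom.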
e
...
7216
274
...
40
0
...
00
9
...
80
35
...
80) – 0
...
4) = 15
...
80) – 0
...
40) = 22
...
00) – 0
...
00) = 29
...
(a) Explain the meaning of analysis of variance
...
(b) State the basic assumptions of the analysis of variance
...
What do you mean by the additive property of the technique of the analysis of variance? Explain how
this technique is superior in comparison to sampling
...
Write short notes on the following:
(i) Latin-square design
...
(iii) F-ratio and its interpretation
...
4
...
Variety | Yields in fields per acre
        | 1  | 2  | 3
A       | 30 | 32 | 22
B       | 20 | 18 | 16
Set up a table of analysis of variance and calculate F
...
71 as the table value of F at 5% level for v1 = 1 and v2 = 4
...
Com
...
, Rajasthan University, 1976)
5
...
Four beds were prepared in each plot and
the manure used
...
6
...
A
16
B
10
C
11
D
09
E
09
E
10
C
09
A
14
B
12
D
11
B
15
D
08
E
08
C
10
A
18
D
12
E
06
B
13
A
13
C
12
C
13
A
11
D
10
E
07
B
14
7
...
05 level of significance that µ 1 = µ 2 = µ 3 for the following data:
Samples
No
...
three
(3)
6
7
6
–
–
Total
No
...
Three varieties of wheat W1, W2 and W3 are treated with four different fertilizers viz
...
The
yields of wheat per acre were as under:
Fertilizer treatment | Varieties of wheat | Total
                     | W1  | W2  | W3  |
f1                   | 55  | 72  | 47  | 174
f2                   | 64  | 66  | 53  | 183
f3                   | 58  | 57  | 74  | 189
f4                   | 59  | 57  | 58  | 174
Total                | 236 | 252 | 232 | 720
Set up a table for the analysis of variance and work out the F-ratios in respect of the above
...
The following table gives the monthly sales (in thousand rupees) of a certain firm in three states by its
four salesmen:
States | Salesmen | Total
       | A  | B  | C  | D  |
X      | 5  | 4  | 4  | 7  | 20
Y      | 7  | 8  | 5  | 4  | 24
Z      | 9  | 6  | 6  | 7  | 28
Total  | 21 | 18 | 15 | 18 | 72
Set up an analysis of variance table for the above information
...
10
...
Manufacturing and Fashion retailing:
Banking
Manufacturing
Fashion retailing
41
45
34
53
51
44
54
48
46
55
43
45
43
39
51
Can we consider the psychological health of corporate executives in the given three fields to be equal at
5% level of significance?
11
...
(M
...
(EAFM) Exam
...
University, 1979)
12
...
The following are paired observations for three experimental groups concerning an experiment involving
three methods of teaching performed on a single class
...
12 pupils were assigned at random to 3 groups of 4 pupils each, one group from one
method as shown in the table
...
Also calculate the adjusted means on Y
...
Adjusted means on Y will be as under:
For Group I
20
...
70
For Group III
22
...
The test technique makes use of one or more values obtained from sample data [often
called test statistic(s)] to arrive at a probability statement about the hypothesis
...
For instance, it may assume that population is normally distributed, sample drawn is a random
sample and similar other assumptions
...
But no such assumptions
are made in case of non-parametric tests
...
, an assertion directly related to the
purpose of investigation and other assertions to make a probability statement
...
When we apply a test (to test the hypothesis) without a model, it is known as
distribution-free test, or the nonparametric test
...
In other words, under non-parametric or distribution-free tests we do not assume that a particular
distribution is applicable, or that a certain value is attached to a parameter of the population
...
In fact, there is a growing use of such tests in
situations when the normality assumption is open to doubt
...
The present chapter discusses a few such tests
...
The following distribution-free tests are important
and generally used:
(i) Test of a hypothesis concerning some single value for the given data (such as one-sample
sign test)
...
(iii) Test of a hypothesis of a relationship between variables (such as Rank correlation, Kendall’s
coefficient of concordance and other tests for dependence
...
e
...
, Kruskal-Wallis test
...
, one sample runs test
...
, the chi-square test
...
) The chi-square test can as well be used to make comparison between
theoretical populations and actual data when categories are used
...
1
...
Its name comes from the fact that it is based on
the direction of the plus or minus signs of observations in a sample and not on their numerical
magnitudes
...
(a) One sample sign test: The one sample sign test is a very simple non-parametric test applicable
when we sample a continuous symmetrical population in which case the probability of getting a
sample value less than mean is 1/2 and the probability of getting a sample value greater than mean is
also 1/2
...
But if the value happens
to be equal to µ H0 , then we simply discard it
...
For
performing one sample sign test when the sample is small, we can use tables of binomial probabilities,
but when sample happens to be large, we use normal approximation to binomial distribution
...
*
If it is not possible for one reason or another to assume a symmetrical population, even then we can use the one sample
~ ~
~
sign test, but we shall then be testing the null hypothesis µ = µ H0 , where µ is the population median
...
Use the sign test at 5% level of significance to test the null hypothesis
that professional golfers average µ H0 = 284 for four rounds against the alternative hypothesis
µ < 284
...
05) level of significance, we first replace each value greater than 284 with a plus sign and each
value less than 284 with a minus sign and discard the one value which actually equals 284
...
Now we can examine whether the one plus sign observed in 10 trials supports the null hypothesis
p = 1/2 or the alternative hypothesis p < 1/2
...
010 + 0
...
011
Since this value is less than α = 0
...
In other words, we
conclude that professional golfers’ average is less than 284 for four rounds of golf
...
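The binomial probability used in this one sample sign test can be computed directly rather than read from tables. The sketch below is illustrative only; the function name `binomial_lower_tail` is hypothetical, and it assumes 10 usable signs of which 1 is a plus, as in the illustration.

```python
from math import comb


def binomial_lower_tail(successes, trials, p=0.5):
    """Probability of observing `successes` or fewer in `trials` binomial trials."""
    return sum(comb(trials, x) * p ** x * (1 - p) ** (trials - x)
               for x in range(successes + 1))


# One plus sign in ten trials under the null hypothesis p = 1/2
p_value = binomial_lower_tail(1, 10)
print(round(p_value, 3))
```

The probability comes to about 0.011, which is less than α = 0.05, so the null hypothesis is rejected in favour of p < 1/2.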
If we do that,
we find the observed proportion of success, on the basis of signs that we obtain, is 1/10 and that of
failure is 9/10
...
standard error of proportion assuming null hypothesis p = 1/2 is as under:
σ prop
...
1581
For testing the null hypothesis i
...
, p = 1/2 against the alternative hypothesis p < 1/2, a one-tailed test
is appropriate which can be indicated as shown in the Fig
...
1
...
45 of the area
under normal curve and it is 1
...
Using this, we now work out the limit (on the lower side as the
alternative hypothesis is of < type) of the acceptance region as under:
p − z ⋅ σ b prop
...
64) (0
...
2593
2
0
...
8 given in appendix at the end of the book
...
64) (s
0
...
45 of area)
0
...
12
...
1 which comes in the rejection region, we
reject the null hypothesis at 5% level of significance and accept the alternative hypothesis
...
(b) Two sample sign test (or the sign test for paired data): The sign test has important applications
in problems where we deal with paired data
...
In case the two values are equal, the concerning pair is discarded
...
) The testing technique remains the same as stated in case of one sample sign
test
...
Illustration 2
The following are the numbers of artifacts dug up by two archaeologists at an ancient cliff dwelling
on 30 days
...
Solution: First of all the given paired values are changed into signs (+ or –) as under:
Testing of Hypotheses-II
287
Table 12
...
)
Thus the observed proportion of pluses (or successes) in the sample is = 20/26 = 0
...
2308
...
Hence,
the standard error of proportion of successes, given the null hypothesis and the size of the sample,
is:
σ prop
...
0981
26
Since the alternative hypothesis is that archaeologist X is better (or p > 1/2), a one-tailed
test is appropriate
...
32 (s
0
...
01 of area
0
...
12
...
49 of the
area under normal curve and it is 2
...
Using this, we now work out the limit (on the upper side as the
alternative hypothesis is of > type) of the acceptance region as under:
b
g
p + 2
...
= 0
...
32 0
...
5 + 0
...
7276
and we now find the observed proportion of successes is 0
...
In other words, we accept the alternative hypothesis, and thus conclude
that archaeologist X is better
...
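The normal approximation used in this two sample sign test can be sketched as follows. It is a hedged illustration: the variable names are hypothetical, and the figures assumed are those of the illustration (26 usable pairs, 20 plus signs, and z = 2.32 for a one-tailed test at the 1% level).

```python
from math import sqrt

n_pairs, pluses = 26, 20       # usable pairs and observed plus signs
p0 = 0.5                       # proportion of pluses under the null hypothesis
z = 2.32                       # one-tailed z value at the 1% level, as used in the text

observed = pluses / n_pairs                    # observed proportion of successes
std_error = sqrt(p0 * (1 - p0) / n_pairs)      # standard error of proportion
upper_limit = p0 + z * std_error               # upper limit of the acceptance region

reject_null = observed > upper_limit
print(round(observed, 4), round(std_error, 4), round(upper_limit, 4), reject_null)
```

The observed proportion (about 0.7692) lies above the upper limit of the acceptance region (about 0.7275 when the standard error is not rounded first), so the null hypothesis is rejected.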
They are generally based on binomial distribution, but when the
sample size happens to be large enough (such that n ⋅ p and n ⋅ q both happen to be greater than 5),
we can as well make use of normal approximation to binomial distribution
...
Fisher-Irwin Test
Fisher-Irwin test is a distribution-free test used in testing a hypothesis concerning no difference
between two sets of data
...
Suppose the management of a business unit has designed a new training programme which is now
ready and as such it wishes to test its performance against that of the old training programme
...
This group of
twelve is then divided into two groups of six each, one group for each training programme
...
After the training is completed, all workers are given the
same examination and the result is as under:
Table 12
...
passed
No
...
But the
question arises: Is it really so? It is just possible that the difference in the result of the two groups may
be due to chance factor
...
Then how can a decision be made? We may test the hypothesis for the purpose
...
Prior to testing, the significance level (or the
α value) must be specified. Suppose the management fixes the 5% level for the purpose; this level
must invariably be respected following the test to guard against bias entering into the result and to
avoid the possibility of vacillation on the part of the decision maker
...
This should be done keeping in view the probability principles
...
of Group A doing as well or better
= Pr
...
(6 passing and 0 failing)
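\[ = \frac{{}^{8}C_{5} \times {}^{4}C_{1}}{{}^{12}C_{6}} + \frac{{}^{8}C_{6} \times {}^{4}C_{0}}{{}^{12}C_{6}} = \frac{224}{924} + \frac{28}{924} \]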
= 0
...
03 = 0
...
of Group B doing as well or worse
= Pr
...
(2 passing and 4 failing)
\[ = \frac{{}^{8}C_{3} \times {}^{4}C_{3}}{{}^{12}C_{6}} + \frac{{}^{8}C_{2} \times {}^{4}C_{4}}{{}^{12}C_{6}} = \frac{224}{924} + \frac{28}{924} \]
= 0
...
03 = 0
...
05
already specified by the management
...
05 and hence, we must accept the null hypothesis
...
Hence, we can infer that both training programmes
are equally good
...
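The probability worked out in the Fisher-Irwin test can be verified with a few lines of Python. This is a sketch under the assumptions of the example (12 workers of whom 8 passed and 4 failed, in two groups of 6); the function name `prob_passes_in_group` is hypothetical.

```python
from math import comb


def prob_passes_in_group(x, passed=8, failed=4, group_size=6):
    """Probability that a randomly formed group of `group_size` contains x who passed."""
    return comb(passed, x) * comb(failed, group_size - x) / comb(passed + failed, group_size)


# Group A doing as well or better: 5 passing and 1 failing, or 6 passing and 0 failing
p_value = prob_passes_in_group(5) + prob_passes_in_group(6)
print(round(p_value, 2))
```

The probability is about 0.27, well above the 5% level fixed in advance, which is why the null hypothesis of no difference is accepted.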
For instance, in the given example the worker’s performance was classified as fail or pass and
accordingly numbers failed and passed in each group were obtained
...
This in fact is the limitation
of the Fisher-Irwin test which can be removed if we apply some other test, say, Wilcoxon test as
stated in the pages that follow
...
McNemar Test
McNemar test is one of the important nonparametric tests often used when the data happen to be
nominal and relate to two related samples
...
The experiment is designed for the use of this test in such a way
that the subjects initially are divided into equal groups as to their favourable and unfavourable views
about, say, any system
...
Through McNemar test
we in fact try to judge the significance of any observed change in views of the same subjects before
and after the treatment by setting up a table in the following form in respect of the first and second
set of responses:
Table 12
...
The test statistic under McNemar Test is worked out as under (as it
uses the under-mentioned transformation of Chi-square test):
\[ \chi^2 = \frac{\left( |A - D| - 1 \right)^2}{(A + D)} \quad \text{with d.f.} = 1 \]
The minus 1 in the above equation is a correction for continuity, as the Chi-square distribution is
continuous whereas the observed data represent a discrete distribution
...
Solution: Since in the given question we have nominal data and the study involves before-after
measurements of the two related samples, we can appropriately use the McNemar test
...
This, in other words, means that the probability of favourable response before
and unfavourable response after is equal to the probability of unfavourable response before and
favourable response after i
...
,
H0: P(A) = P (D)
We can test this hypothesis against the alternative hypothesis (Ha) viz
...
67
300
Degree of freedom = 1
...
84
...
67 which is greater than the table value,
indicating that we should reject the null hypothesis
...
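The test statistic can be sketched as a small Python function. The counts used below are hypothetical and are not those of the illustration; A and D stand for the numbers of subjects who changed their response in the two opposite directions, and 3.84 is the table value of Chi-square at 5% level for 1 degree of freedom.

```python
def mcnemar_chi_square(a, d):
    """Chi-square statistic with continuity correction, from the two change cells A and D."""
    return (abs(a - d) - 1) ** 2 / (a + d)


# Hypothetical counts: 10 subjects changed one way, 25 changed the other way
chi_square = mcnemar_chi_square(10, 25)
table_value = 3.84            # 5% level, d.f. = 1
print(round(chi_square, 2), chi_square > table_value)
```

A calculated value above the table value, as with these assumed counts, would lead to rejection of the null hypothesis P(A) = P(D).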
2
4
...
e
...
, Wilcoxon matched-pairs test
...
The actual signs of each difference are then put to
corresponding ranks and the test statistic T is calculated which happens to be the smaller of the two
sums viz
...
While using this test, we may come across two types of tie situations
...
e
...
The other situation arises when two or more pairs
have the same difference value in which case we assign ranks to such pairs by averaging their rank
positions
...
5 i
...
, (5 + 6)/2 = 5
...
When the given number of matched pairs after considering the number of dropped out pair(s), if
any, as stated above is equal to or less than 25, we use the table of critical values of T (Table No
...
For this test, the calculated value of T must be equal to or smaller than the table value in order to
reject the null hypothesis
...
Illustration 4
An experiment is conducted to judge the effect of brand name on quality perception
...
The following data are obtained:
Pair    Brand A    Brand B
1       73         51
2       43         41
3       47         43
4       53         41
5       58         47
6       47         32
7       52         24
8       58         58
9       38         43
10      61         53
11      56         52
12      56         57
13      34         44
14      55         57
15      65         40
16      75         68
Test the hypothesis, using Wilcoxon matched-pairs test, that there is no difference between the
perceived quality of the two samples
...
Solution: Let us first write the null and alternative hypotheses as under:
H0: There is no difference between the perceived quality of two samples
...
Using Wilcoxon matched-pairs test, we work out the value of the test statistic T as under:
Table 12
...
5
13
2
...
Pair    Brand A    Brand B    Difference di    Rank of |di|    Rank with + sign    Rank with – sign
1       73         51         22               13              13                  …
2       43         41         2                2.5             2.5                 …
3       47         43         4                4.5             4.5                 …
4       53         41         12               11              11                  …
5       58         47         11               10              10                  …
6       47         32         15               12              12                  …
7       52         24         28               15              15                  …
8       58         58         0                –               –                   –
9       38         43         –5               6               …                   –6
10      61         53         8                8               8                   …
11      56         52         4                4.5             4.5                 …
12      56         57         –1               1               …                   –1
13      34         44         –10              9               …                   –9
14      55         57         –2               2.5             …                   –2.5
15      65         40         25               14              14                  …
16      75         68         7                7               7                   …
TOTAL                                                          101.5               –18.5
Hence,
T = 18.5
...
The table value of T at five percent level of significance when n = 15 is 25 (using a two-tailed
test because our alternative hypothesis is that there is difference between the perceived quality of
the two samples)
...
5 which is less than the table value of 25
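The working of Illustration 4 can be checked with a short Python sketch. The helper name `wilcoxon_t` is an assumption made for this illustration; the data are the sixteen pairs given in the question, pairs with zero difference are dropped, and tied absolute differences receive averaged ranks as described above.

```python
brand_a = [73, 43, 47, 53, 58, 47, 52, 58, 38, 61, 56, 56, 34, 55, 65, 75]
brand_b = [51, 41, 43, 41, 47, 32, 24, 58, 43, 53, 52, 57, 44, 57, 40, 68]


def wilcoxon_t(x, y):
    """Return (n, sum of + ranks, sum of - ranks, T) for matched pairs."""
    # Drop pairs whose difference is zero.
    diffs = [a - b for a, b in zip(x, y) if a != b]
    abs_sorted = sorted(abs(d) for d in diffs)

    def rank_of(value):
        # Average the rank positions occupied by tied absolute differences.
        first = abs_sorted.index(value) + 1
        ties = abs_sorted.count(value)
        return first + (ties - 1) / 2

    plus = sum(rank_of(abs(d)) for d in diffs if d > 0)
    minus = sum(rank_of(abs(d)) for d in diffs if d < 0)
    return len(diffs), plus, minus, min(plus, minus)


n, plus_sum, minus_sum, t_value = wilcoxon_t(brand_a, brand_b)
print(n, plus_sum, minus_sum, t_value)

# Table value of T quoted in the text for n = 15 at 5% level (two-tailed).
TABLE_T = 25
reject_null = t_value <= TABLE_T
```

Since the calculated T must be equal to or smaller than the table value for rejection, `reject_null` comes out true for these data.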
...
5
...
, the U test and the H test
...
A brief description of the said two tests is given below:
(a) Wilcoxon-Mann-Whitney test (or U-test): This is a very popular test amongst the rank sum
tests
...
It uses more information than the sign test or the Fisher-Irwin test
...
However,
in practice even the violation of this assumption does not affect the results very much
...
We usually adopt low to high
ranking process which means we assign rank 1 to an item with lowest value, rank 2 to the next higher
item and so on
...
For example, if sixth, seventh and eighth values are identical, we
would assign each the rank (6 + 7 + 8)/3 = 7
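The low to high ranking process with averaged ranks for ties may be sketched as follows. The function name `average_ranks` and the nine sample values are assumptions made purely for illustration; the sixth, seventh and eighth values are deliberately identical so that each receives the rank 7.

```python
def average_ranks(values):
    """Rank values from low to high; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = ((pos + 1) + (end + 1)) / 2
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


# Hypothetical observations, already in ascending order for readability.
sample = [12, 15, 18, 21, 24, 30, 30, 30, 36]
print(average_ranks(sample))
```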
...
Then we work out the test statistic i
...
, U, which is a measurement of
the difference between the ranked observations of the two samples as under:
$$U = n_1 \cdot n_2 + \frac{n_1 (n_1 + 1)}{2} - R_1$$
where n1 and n2 are the sample sizes and R1 is the sum of ranks assigned to the values of the first
sample
...
)
In applying U-test we take the null hypothesis that the two samples come from identical populations
...
Under the alternative hypothesis, the
means of the two populations are not equal and if this is so, then most of the smaller ranks will go to
the values of one sample while most of the higher ranks will go to those of the other sample
...
e
...
But if either n1 or n2 is so small that the normal curve
approximation to the sampling distribution of U cannot be used, then exact tests may be based on
special tables such as the one given in the appendix,* showing selected values of Wilcoxon’s (unpaired)
distribution
...
Illustration 5
The values in one sample are 53, 38, 69, 57, 46, 39, 73, 48, 73, 74, 60 and 78
...
Test at the 10% level the hypothesis that they
come from populations with the same mean
...
Solution: First of all we assign ranks to all observations, adopting low to high ranking process on the
presumption that all given items belong to a single sample
...
6 given in appendix at the end of the book
...
5
Size of sample item      Rank      Name of related sample:
in ascending order                 [A for sample one and B for sample two]
32                       1         B
38                       2         A
39                       3         A
40                       4         B
41                       5         B
44                       6.5       B
44                       6.5       B
46                       8         A
48                       9         A
52                       10        B
53                       11.5      B
53                       11.5      A
57                       13        A
60                       14        A
61                       15        B
67                       16        B
69                       17        A
70                       18        B
72                       19.5      B
72                       19.5      B
73                       21.5      A
73                       21.5      A
74                       23        A
78                       24        A
From the above we find that the sum of the ranks assigned to sample one items or R1 = 2 + 3 + 8 +
9 + 11.5 + 13 + 14 + 17 + 21.5 + 21.5 + 23 + 24 = 167.5 and similarly we find that the sum of ranks
assigned to sample two items or R2 = 1 + 4 + 5 + 6.5 + 6.5 + 10 + 11.5 + 15 + 16 + 18 + 19.5 + 19.5
= 132.5 and we have n1 = 12 and n2 = 12
$$U = n_1 \cdot n_2 + \frac{n_1 (n_1 + 1)}{2} - R_1 = (12)(12) + \frac{12 (12 + 1)}{2} - 167.5 = 54.5$$
...
Keeping this in view, we work out the mean and standard
deviation taking the null hypothesis that the two samples come from identical populations as under:
$$\mu_U = \frac{n_1 \times n_2}{2} = \frac{(12)(12)}{2} = 72$$
$$\sigma_U = \sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}} = \sqrt{\frac{(12)(12)(12 + 12 + 1)}{12}} = 17.32$$
...
Accordingly the limits of acceptance region, keeping in view 10% level of
significance as given, can be worked out as under:
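[Figure: normal curve of the sampling distribution of U centred on µU = 72, with 0.45 of the area on either side of the mean lying within the acceptance region and 0.05 of the area in each tail beyond the limits µU ± 1.64 σU. Shaded portion indicates rejection regions.]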
Fig
...
3
As the z value for 0.45 of the area under the normal curve is 1.64, we have the following limits
of acceptance region:
Upper limit = µU + 1.64 σU = 72 + 1.64 (17.32) = 100.40
Lower limit = µU – 1.64 σU = 72 – 1.64 (17.32) = 43.60
As the observed value of U is 54
...
We can as well calculate the U statistic as under using R2 value:
$$U = n_1 \cdot n_2 + \frac{n_2 (n_2 + 1)}{2} - R_2 = (12)(12) + \frac{12 (12 + 1)}{2} - 132.5 = 89.5$$
...
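The U-test working of Illustration 5 can be retraced with the following Python sketch. The function names are assumptions made for illustration; the values of sample one are those stated in the question, while the values of sample two are read off the rows marked B in the ranking table.

```python
import math

sample_one = [53, 38, 69, 57, 46, 39, 73, 48, 73, 74, 60, 78]
# Assumption: sample two is taken from the rows marked "B" in the ranking table.
sample_two = [32, 40, 41, 44, 44, 52, 53, 61, 67, 70, 72, 72]


def rank_sum(sample, pooled_sorted):
    """Sum of low-to-high ranks of `sample` within the pooled data (ties averaged)."""
    total = 0.0
    for value in sample:
        first = pooled_sorted.index(value) + 1
        ties = pooled_sorted.count(value)
        total += first + (ties - 1) / 2
    return total


def u_statistic(n_a, n_b, r_a):
    """U = n1*n2 + n_a(n_a + 1)/2 - R_a, using the rank sum of sample a."""
    return n_a * n_b + n_a * (n_a + 1) / 2 - r_a


pooled = sorted(sample_one + sample_two)
n1, n2 = len(sample_one), len(sample_two)
r1, r2 = rank_sum(sample_one, pooled), rank_sum(sample_two, pooled)
u_from_r1 = u_statistic(n1, n2, r1)
u_from_r2 = u_statistic(n2, n1, r2)

mu_u = n1 * n2 / 2
sigma_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
lower, upper = mu_u - 1.64 * sigma_u, mu_u + 1.64 * sigma_u
accept_null = lower <= u_from_r1 <= upper

print(r1, r2, u_from_r1, u_from_r2, round(sigma_u, 2), accept_null)
```

Both values of U lie inside the acceptance region, so the choice between R1 and R2 does not alter the conclusion.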
We can take one more example concerning U test wherein n1 and n2 are both less than 8 and as
such we see the use of table given in the appendix concerning values of Wilcoxon’s distribution
(unpaired distribution)
...
Test applying Wilcoxon test whether the two samples come from populations with the
same mean at 10% level against the alternative hypothesis that these samples come from populations
with different means
...
6
Size of sample item      Rank      Name of related sample
in ascending order                 (Sample one as A, Sample two as B)
6                        1         B
24                       2         B
33                       3         B
36                       4         A
39                       5         B
44                       6         A
53                       7         B
90                       8         A
94                       9         A
Sum of ranks assigned to items of sample one = 4 + 6 + 8 + 9 = 27
No
...
of items in this sample = 5
As the number of items in the two samples is less than 8, we cannot use the normal curve
approximation technique as stated above and shall use the table giving values of Wilcoxon’s distribution
...
Also, let ‘s’ be
the number of items in the sample with smaller sum and let ‘l’ be the number of items in the sample
with the larger sum
...
We now find the
difference between Ws and the minimum value it might have taken, given the value of s
...
Thus, (Ws – Minimum Ws) = 18 – 15 = 3
...
The entry in this cell is 0
...
Since the alternative hypothesis is that the two samples come from populations with different
means, a two-tailed test is appropriate and accordingly 10% significance level will mean 5% in the
left tail and 5% in the right tail
...
05, given the null hypothesis and the significance level
...
05 (which actually is so in the given case as 0
...
05), then we
should accept the null hypothesis
...
(The same result we can get by using the value of Wl
...
Since for this problem, the maximum value of Wl (given s = 5 and
l = 4) is the sum of 6 through 9 i
...
, 6 + 7 + 8 + 9 = 30, we have Max
...
All other things then remain the
same as we have stated above)
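The probability read from the table of Wilcoxon’s distribution can also be obtained by direct enumeration when the samples are this small. The sketch below is an assumption-laden illustration (the function name `lower_tail_probability` is hypothetical): under the null hypothesis every set of s ranks out of the s + l available ranks is taken to be equally likely, and we count the sets whose sum does not exceed the observed Ws = 18.

```python
from itertools import combinations


def lower_tail_probability(ws, s, l):
    """P(rank sum of the s-item sample <= ws) when all rank allocations are equally likely."""
    all_ranks = range(1, s + l + 1)
    allocations = list(combinations(all_ranks, s))
    favourable = sum(1 for ranks in allocations if sum(ranks) <= ws)
    return favourable / len(allocations)


s, l, ws = 5, 4, 18
minimum_ws = sum(range(1, s + 1))          # 1 + 2 + 3 + 4 + 5
probability = lower_tail_probability(ws, s, l)

print(ws - minimum_ws, round(probability, 3))

# Two-tailed test at 10% level: 5% in each tail.
accept_null = probability > 0.05
```

The enumerated probability exceeds 0.05, which agrees with the decision to accept the null hypothesis.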
...
This test is used to test the null hypothesis that ‘k’ independent random samples
come from identical universes against the alternative hypothesis that the means of these universes
are not equal
...
In this test, like the U test, the data are ranked jointly from low to high or high to low as if they
constituted a single sample
...
+ nk and Ri being the sum of the ranks assigned to ni observations in the ith
sample
...
As such we can reject the null hypothesis at a
given level of significance if H value calculated, as stated above, exceeds the concerned table value
of chi-square
...
A
With Ball No
...
C
With Ball No
...
Siegel
...
7
Bowling results      Rank      Name of the ball associated
302                  1         B
297                  2         D
282                  3         A
279                  4         D
276                  5         B
275                  6         B
271                  7         A
270                  8         D
268                  9         B
266                  10        C
262                  11        A
260                  12        C
258                  13        D
257                  14        A
255                  15        C
252                  16        B
248                  17        A
246                  18        C
242                  19        D
239                  20        C
For finding the values of Ri, we arrange the above table as under:
Table 12
...
02857) (2362
...
51 – 63 = 4
...
Now taking the null hypothesis that the bowler performs equally well with the
four balls, we have the table value of χ² = 7.815 for (k – 1) = 3 degrees of freedom at 5% level of significance. Since the calculated value of H is only 4.51 and does not exceed the table value of
7.815, we accept the null hypothesis and conclude that the bowler performs equally well with the four
bowling balls
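The H statistic for the bowling data can be recomputed from the ranks in the table above. This is a minimal sketch: the function name `kruskal_wallis_h` is an assumption, the ranks for each ball are read off the ranking table (rank 1 for the highest score), and no correction for ties is applied because none occur here.

```python
ranks_by_ball = {
    "A": [3, 7, 11, 14, 17],
    "B": [1, 5, 6, 9, 16],
    "C": [10, 12, 15, 18, 20],
    "D": [2, 4, 8, 13, 19],
}


def kruskal_wallis_h(groups):
    """H = 12 / (n(n + 1)) * sum(Ri^2 / ni) - 3(n + 1)."""
    n = sum(len(ranks) for ranks in groups.values())
    weighted = sum(sum(ranks) ** 2 / len(ranks) for ranks in groups.values())
    return 12 / (n * (n + 1)) * weighted - 3 * (n + 1)


rank_sums = {ball: sum(ranks) for ball, ranks in ranks_by_ball.items()}
h_value = kruskal_wallis_h(ranks_by_ball)

# Chi-square table value for (k - 1) = 3 d.f. at 5% level.
TABLE_CHI2 = 7.815
accept_null = h_value <= TABLE_CHI2

print(rank_sums, round(h_value, 2), accept_null)
```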
...
One Sample Runs Test
One sample runs test is a test used to judge the randomness of a sample on the basis of the order in
which the observations are taken
...
This is particularly true when we have little or no
control over the selection of the data
...
None of this information constitutes a random sample in the strict sense
...
A run is a
succession of identical letters (or other kinds of symbols) which is followed and preceded by different
letters or no letters at all
...
In this way there are 7 runs in all or r = 7
...
In the given case there seems some grouping
i
...
, the diseased trees seem to come in groups
...
We
shall use the following symbols for a test of runs:
n1 = number of occurrences of type 1 (say H in the given case)
n2 = number of occurrences of type 2 (say D in the given case)
*
For the application of H test, it is not necessary that all samples should have equal number of items
...
In the given case the values of n1, n2 and r would be as follows:
n1 = 20; n2 = 10; r = 7
The sampling distribution of the ‘r’ statistic, the number of runs, is to be used and this distribution has
its mean
$$\mu_r = \frac{2 n_1 n_2}{n_1 + n_2} + 1$$
and the standard deviation
$$\sigma_r = \sqrt{\frac{2 n_1 n_2 (2 n_1 n_2 - n_1 - n_2)}{(n_1 + n_2)^2 (n_1 + n_2 - 1)}}$$
In the given case, we work out the values of µr and σr as follows:
$$\mu_r = \frac{(2)(20)(10)}{20 + 10} + 1 = 14.33$$
$$\sigma_r = \sqrt{\frac{(2)(20)(10)\,[(2)(20)(10) - 20 - 10]}{(20 + 10)^2 (20 + 10 - 1)}} = 2.38$$
For testing the null hypothesis concerning the randomness of the planted trees, we should have been
given the level of significance
...
01
...
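[Figure: normal curve of the sampling distribution of r centred on µr = 14.33, with 0.495 of the area on either side of the mean lying within the acceptance region and 0.005 of the area in each tail beyond the limits µr ± 2.58 σr.]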
...
12
...
e
...
But in case
n1 or n2 is so small that the normal curve approximation assumption cannot be used, then exact tests may be based on
special tables which can be seen in the book Non-parametric Statistics for the Behavioural Science by S
...
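By using the table of area under the normal curve, we find the appropriate z value for 0.495 of the area to be 2.58, and the limits of the acceptance region are:
Upper limit = µr + (2.58) (2.38) = 14.33 + 6.14 = 20.47 and
Lower limit = µr – (2.58) (2.38) = 14.33 – 6.14 = 8.19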
...
e
...
e
...
Therefore, we cannot accept the null hypothesis of randomness at the given
level of significance viz
...
01
...
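The runs test computation may be sketched as follows. The function names and the short string of H’s and D’s are assumptions used only to show how runs are counted; the values n1 = 20, n2 = 10 and r = 7 are those of the given case, and 2.58 is the z value used above for the 1% level.

```python
import math


def count_runs(sequence):
    """Count runs, i.e., maximal successions of identical symbols."""
    runs = 0
    previous = None
    for symbol in sequence:
        if symbol != previous:
            runs += 1
            previous = symbol
    return runs


def runs_test_limits(n1, n2, z):
    """Mean, standard deviation and acceptance limits of the number of runs."""
    mean = 2 * n1 * n2 / (n1 + n2) + 1
    variance = (2 * n1 * n2 * (2 * n1 * n2 - n1 - n2)) / (
        (n1 + n2) ** 2 * (n1 + n2 - 1)
    )
    sd = math.sqrt(variance)
    return mean, sd, mean - z * sd, mean + z * sd


# Hypothetical short arrangement, only to demonstrate the counting of runs.
print(count_runs("HHDDDHHH"))

mean, sd, lower, upper = runs_test_limits(n1=20, n2=10, z=2.58)
observed_r = 7
reject_randomness = not (lower <= observed_r <= upper)
print(round(mean, 2), round(sd, 2), round(lower, 2), round(upper, 2), reject_randomness)
```

An observed r below the lower limit points to grouping (too few runs), whereas an r above the upper limit would point to systematic alternation.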
One sample runs test, as explained above, is not limited only to test the randomness of series of
attributes
...
Numbers equal to the median are omitted
...
(The method of runs above and below the median is helpful in testing for trends or cyclical
patterns concerning economic data
...
In case of a cyclical pattern, there will be a systematic alternating of a’s and b’s and probably many
runs
...
Spearman’s Rank Correlation
When the data are not available to use in numerical form for doing correlation analysis but when the
information is sufficient to rank the data as first, second, third, and so forth, we quite often use the
rank correlation method and work out the coefficient of rank correlation
...
In other words, it is
a measure of association that is based on the ranks of the observations and not on the numerical
values of the data
...
For calculating the rank correlation coefficient, first of all the actual observations are replaced by
their ranks, giving rank 1 to the highest value, rank 2 to the next highest value, and following this very
order until ranks are assigned for all values
...
The second step is to record
the difference between ranks (or ‘d’) for each pair of observations, then to square these differences and
total them to obtain the sum which can symbolically be stated as ∑di²
...
Rho is to be used when the sample size does not exceed 30
...
The value of Spearman’s rank correlation coefficient will always vary between –1 and +1, with +1 indicating
a perfect positive correlation and –1 indicating a perfect negative correlation between two variables
...
Suppose we get r = 0
...
But how should we test this value of 0
...
For small values of n (i
...
, n less than 30), the distribution of r is not normal and as such
we use the table showing the values for Spearman’s Rank correlation (Table No
...
Suppose we get r = 0
...
In this case our problem is reduced to testing the null
hypothesis that there is no correlation, i.e., µr = 0, against the alternative hypothesis that there is a
correlation, i.e., µr ≠ 0, at 5% level
...
05 and find that the critical
values for r are ±0
...
e
...
5179 and the lower limit
of the acceptance region is –0
...
And since our calculated r = 0
...
In case the sample consists of more than 30 items, then the sampling distribution of r is
approximately normal with a mean of zero and a standard deviation of 1/√(n – 1) and thus, the
standard error of r is:
$$\sigma_r = \frac{1}{\sqrt{n - 1}}$$
We can use the table of area under normal curve to find the appropriate z values for testing
hypotheses about the population rank correlation and draw inference as usual
...
Illustration 8
Personnel manager of a certain company wants to hire 30 additional programmers for his corporation
...
The agency doing aptitude test had charged Rs
...
200 for a test
...
100 for a test was a reasonable price
...
200
...
If he
becomes confident (using 0
...
What decision should he take on the basis
of the following sample data concerning 35 applicants?
Sample Data Concerning 35 Applicants
Serial Number    Interview score    Aptitude test score
1                81                 113
2                88                 88
3                55                 76
4                83                 129
5                78                 99
6                93                 142
7                65                 93
8                87                 136
9                95                 82
10               76                 91
11               60                 83
12               85                 96
13               93                 126
14               66                 108
15               90                 95
16               69                 65
17               87                 96
18               68                 101
19               81                 111
20               84                 121
21               82                 83
22               90                 79
23               63                 71
24               78                 109
25               73                 68
26               79                 121
27               72                 109
28               95                 121
29               81                 140
30               87                 132
31               93                 135
32               85                 143
33               91                 118
34               94                 147
35               94                 138
Solution: To solve this problem we should first work out the value of Spearman’s r as under:
Table 12
...
No
...
5
6
32
13
1
...
5
6
31
9
...
5
33
24
...
5
21
13
6
15
...
5
3
...
5
22
...
5
24
35
22
...
5
31
33
18
...
5
3
7
7
–28
...
5
–7
–4
12
...
5
–6
–9
...
5
–21
...
5
17
5
–1
13
...
5
–1
...
25
9
49
49
812
...
25
49
16
156
...
25
36
90
...
25
462
...
25
289
25
1
182
...
25
225
∑di2 = 3583
$$\text{Spearman's } r = 1 - \left\{ \frac{6 \sum d_i^2}{n (n^2 - 1)} \right\} = 1 - \left\{ \frac{6 \times 3583}{35 (35^2 - 1)} \right\} = 1 - \frac{21498}{42840} = 0.498$$
...
Hence the standard error of r is
$$\sigma_r = \frac{1}{\sqrt{n - 1}} = \frac{1}{\sqrt{35 - 1}} = 0.1715$$
...
01 level of significance, the problem
can be stated:
Null hypothesis that there is no correlation between interview score and aptitude test score i
...
,
µ r = 0
...
e
...
As such one-tailed test is appropriate which can be indicated as under in the given case:
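[Figure: normal curve centred on µr = 0 with the one-tailed rejection region (0.01 of the area) lying beyond the limit µr + 2.32 σr = 0.3978. Shaded area shows rejection region.]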
Fig
...
5
By using the table of area under the normal curve, we find the appropriate z value for 0.49 of the area to be 2.32, and the limit of the acceptance region works out as:
µr + (2.32) (0.1715) = 0 + 0.3978 = 0.3978
The calculated value of r is 0.498 and as such it comes in the rejection region and, therefore,
we reject the null hypothesis at 1% level and accept the alternative hypothesis
...
Accordingly personnel
manager should decide that the aptitude test be discontinued
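The arithmetic of Illustration 8 can be verified with the short sketch below. The function names are assumptions made for illustration; ∑di² = 3583 and n = 35 are taken from the working above, and 2.32 is the z value used for the one-tailed test at 1% level.

```python
import math


def spearman_r(sum_d_squared, n):
    """Spearman's r = 1 - 6 * sum(d^2) / (n(n^2 - 1))."""
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))


def standard_error(n):
    """Standard error of r for large n: 1 / sqrt(n - 1)."""
    return 1 / math.sqrt(n - 1)


n = 35
r = spearman_r(3583, n)
sigma_r = standard_error(n)

# One-tailed test at 1% level: reject H0 if r exceeds 0 + 2.32 * sigma_r.
limit = 2.32 * sigma_r
reject_null = r > limit

print(round(r, 3), round(sigma_r, 4), round(limit, 4), reject_null)
```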
...
Kendall’s Coefficient of Concordance
Kendall’s coefficient of concordance, represented by the symbol W, is an important non-parametric
measure of relationship
...
When there are only two sets of rankings of N objects, we
generally work out Spearman’s coefficient of correlation, but Kendall’s coefficient of concordance
(W) is considered an appropriate measure of studying the degree of association among three or more
sets of rankings
...
The basis of Kendall’s coefficient of concordance is to imagine how the given data would look if
there were no agreement among the several sets of rankings, and then to imagine how it would look
if there were perfect agreement among the several sets
...
Another applicant would be assigned
a rank 2 by all four and the sum of his ranks will be 2 + 2 + 2 + 2 = 8
...
In general, when
perfect agreement exists among ranks assigned by k judges to N objects, the rank sums are k, 2k,
3k, … Nk
...
The degree of agreement between judges reflects itself in the variation in the rank sums
...
Disagreement between judges reflects itself in a reduction in
the variation of rank sums
...
This provides the basis for the definition of a coefficient of concordance
...
When maximum disagreement exists, W equals
0
...
Thus, coefficient of concordance (W) is an index
of divergence of the actual agreement shown in the data from the perfect agreement
...
of sets of rankings i
...
, the number of judges;
N = number of objects ranked;
$\frac{1}{12} k^2 (N^3 - N)$ = maximum possible sum of the squared deviations, i.e., the sum s which
would occur with perfect agreement among k rankings
...
e
...
If the ties are not numerous, we may
compute ‘W’ as stated above without making any adjustment in the formula; but if the ties are
numerous, a correction factor is calculated for each set of ranks
...
For instance, if the ranks on X are 1, 2, 3
...
5, 8, 10, 8, 8, we have two groups of ties, one
of two ranks and one of three ranks
...
$$T = \frac{(2^3 - 2) + (3^3 - 3)}{12} = 2.5$$
A correction factor T is calculated for each of the k sets of ranks and these are added together
over the k sets to obtain ∑T
...
$$W = \frac{s}{\frac{1}{12} k^2 (N^3 - N) - k \sum T}$$
(e) The method for judging whether the calculated value of W is significantly different from
zero depends on the size of N as stated below:
(i) If N is 7 or smaller, Table No
...
If an observed s is
equal to or greater than that shown in the table for a particular level of significance,
then H0 (i.e., k sets of rankings are independent) may be rejected at that level of
significance
...
W with d.f. = (N – 1) for judging W’s significance at a given level in the usual way of using χ²
values
...
Kendall,
therefore, suggests that the best estimate of the ‘true’ rankings of N objects is provided,
when W is significant, by the order of the various sums of ranks, Rj
...
The best estimate is related to the
lowest value observed amongst Rj
...
Illustration 9
Seven individuals have been assigned ranks by four judges at a certain music competition as shown
in the following matrix:
             Individuals
             A    B    C    D    E    F    G
Judge 1      1    3    2    5    7    4    6
Judge 2      2    4    1    3    7    5    6
Judge 3      3    4    1    2    7    6    5
Judge 4      1    2    5    4    6    3    7
Is there significant agreement in ranking assigned by different judges? Test at 5% level
...
Solution: As there are four sets of rankings, we can work out the coefficient of concordance (W) for
judging significant agreement in ranking by different judges
...
9
k = 4 and N = 7
Individuals            A     B     C     D     E     F     G
Judge 1                1     3     2     5     7     4     6
Judge 2                2     4     1     3     7     5     6
Judge 3                3     4     1     2     7     6     5
Judge 4                1     2     5     4     6     3     7
Sum of ranks (Rj)      7     13    9     14    27    18    24     ∑Rj = 112
(Rj – R̄j)²             81    9     49    4     121   4     64     ∴ s = 332
$$\bar{R}_j = \frac{\sum R_j}{N} = \frac{112}{7} = 16$$
s = 332
$$\therefore\ W = \frac{s}{\frac{1}{12} k^2 (N^3 - N)} = \frac{332}{\frac{1}{12} (4)^2 (7^3 - 7)} = \frac{332}{448} = 0.741$$
...
9 given in appendix for finding the
value of s at 5% level for k = 4 and N = 7
...
0 and thus for accepting the null
hypothesis (H0: k sets of rankings are independent) our calculated value of s should be less than
217
...
741 is significant
...
e
...
The lowest value observed amongst Rj is 7 and as
such the best estimate of true rankings is in the case of individual A i
...
, all judges on the whole place
the individual A as first in the said music competition
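The calculation of W in Illustration 9 can be reproduced with the following sketch. The function name `kendall_w` is an assumption; the rankings are those of the four judges in the matrix above, and no tie correction is applied since each judge’s ranks are all distinct.

```python
individuals = ["A", "B", "C", "D", "E", "F", "G"]
rankings = [
    [1, 3, 2, 5, 7, 4, 6],  # Judge 1
    [2, 4, 1, 3, 7, 5, 6],  # Judge 2
    [3, 4, 1, 2, 7, 6, 5],  # Judge 3
    [1, 2, 5, 4, 6, 3, 7],  # Judge 4
]


def kendall_w(rows):
    """Return (rank sums Rj, s, W) with W = s / ((1/12) k^2 (N^3 - N))."""
    k, n = len(rows), len(rows[0])
    rank_sums = [sum(column) for column in zip(*rows)]
    mean_sum = sum(rank_sums) / n
    s = sum((r - mean_sum) ** 2 for r in rank_sums)
    w = s / (k ** 2 * (n ** 3 - n) / 12)
    return rank_sums, s, w


rank_sums, s, w = kendall_w(rankings)
# The lowest rank sum identifies the individual placed first on the whole.
best = individuals[rank_sums.index(min(rank_sums))]

print(rank_sums, s, round(w, 3), best)
```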
...
577
Determine the significance of W at 5% level
...
577)
or
χ² = (247) (0.577) = 142.52
Table value of χ² at 5% level for N – 1 = 20 – 1 = 19 d.f. is 30.144
...
52 and this is considerably higher than the table value
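For larger N the significance of W is judged through χ² = k(N – 1)W, which may be sketched as below. The function name is an assumption, and k = 13 is inferred here from the product k(N – 1) = 247 with N = 20 appearing in the working above; 30.144 is the Chi-square table value for 19 d.f. at 5% level.

```python
def chi_square_from_w(k, n, w):
    """Chi-square value k(N - 1)W, to be referred to the table with (N - 1) d.f."""
    return k * (n - 1) * w


# Assumption: k = 13 is inferred from k(N - 1) = 247 with N = 20.
chi2 = chi_square_from_w(k=13, n=20, w=0.577)

# Chi-square table value for 19 d.f. at 5% level of significance.
TABLE_CHI2_19DF = 30.144
w_is_significant = chi2 > TABLE_CHI2_19DF

print(round(chi2, 2), w_is_significant)
```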
...
RELATIONSHIP BETWEEN SPEARMAN’S r’s AND KENDALL’S W
As stated above, W is an appropriate measure of studying the degree of association among three or
more sets of ranks, but we can as well determine the degree of association among k sets of rankings
by averaging the Spearman’s correlation coefficients (r’s) between all possible pairs (i.e., kC2 or
k (k – 1)/2) of rankings, keeping in view that W bears a linear relation to the average r’s taken over
all possible pairs
...
Illustration 11
Using data of illustration No
...
Solution: As k = 4 in the given question, the possible pairs are equal to k(k – 1)/2 = 4(4 – 1)/2 = 6 and
we work out Spearman’s r for each of these pairs as shown in Table 12
...
Now we can find W using the following relationship formula between r’s average and W
Average of r’s = (kW – 1)/(k – 1)
or
0.655 = (4W – 1)/(4 – 1)
or
(0.655) (3) = 4W – 1
or
W = [(0.655) (3) + 1]/4 = 2.965/4 = 0.741
...
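The linear relation between the average of Spearman’s r’s and W can be checked numerically on the data of Illustration 9. In this sketch the function name `spearman` and the variable names are assumptions; the four rankings are those given by the judges, and the six pairwise coefficients are averaged before W is recovered from the relationship formula.

```python
from itertools import combinations

judges = [
    [1, 3, 2, 5, 7, 4, 6],  # Judge 1
    [2, 4, 1, 3, 7, 5, 6],  # Judge 2
    [3, 4, 1, 2, 7, 6, 5],  # Judge 3
    [1, 2, 5, 4, 6, 3, 7],  # Judge 4
]


def spearman(x, y):
    """Spearman's r = 1 - 6 * sum(d^2) / (N(N^2 - 1)) for two untied rankings."""
    n = len(x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


k = len(judges)
pair_r = {
    (i + 1, j + 1): spearman(judges[i], judges[j])
    for i, j in combinations(range(k), 2)
}
average_r = sum(pair_r.values()) / len(pair_r)

# Average of r's = (kW - 1)/(k - 1), hence W = (average_r * (k - 1) + 1) / k.
w = (average_r * (k - 1) + 1) / k

print({pair: round(value, 3) for pair, value in pair_r.items()})
print(round(average_r, 3), round(w, 3))
```

The value of W recovered in this way agrees with the value worked out directly from the rank sums.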
They do not suppose any particular distribution and the consequential assumptions
...
They are rather quick and easy to use i
...
, they do not require laborious computations since
in many cases the observations are replaced by their rank order and in many others we
simply use signs
...
They are often not as efficient or ‘sharp’ as tests of significance or the parametric tests
...
The reason being that these tests do not
use all the available information but rather use groupings or rankings and the price we pay
is a loss in efficiency
...
4
...
5
...
6
...
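Table 12 ...
Differences between ranks |d| assigned by the judges and the squares of such differences (d²) for all possible pairs of judges

Individuals    Pair 1 – 2    Pair 1 – 3    Pair 1 – 4    Pair 2 – 3    Pair 2 – 4    Pair 3 – 4
               |d|   d²      |d|   d²      |d|   d²      |d|   d²      |d|   d²      |d|   d²
A              1     1       2     4       0     0       1     1       1     1       2     4
B              1     1       1     1       1     1       0     0       2     4       2     4
C              1     1       1     1       3     9       0     0       4     16      4     16
D              2     4       3     9       1     1       1     1       1     1       2     4
E              0     0       0     0       1     1       0     0       1     1       1     1
F              1     1       2     4       1     1       1     1       2     4       3     9
G              0     0       1     1       1     1       1     1       1     1       2     4
∑d²                  8             20            14            4             28            42

Spearman’s coefficient of correlation, r = 1 – 6∑di²/[N(N² – 1)]:
r12 = 0.857    r13 = 0.643    r14 = 0.750    r23 = 0.929    r24 = 0.500    r34 = 0.250

Average of Spearman’s r’s = (0.857 + 0.643 + 0.750 + 0.929 + 0.500 + 0.250)/6 = 3.929/6 = 0.655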
) cannot
be met, then we can use non-parametric methods
...
This is the reason why such tests have become popular
...
But
then the other side must also be kept in view that the more one assumes, the more one limits the
applicability of one’s methods
...
Give your understanding of non-parametric or distribution free methods explaining their important
characteristics
...
Narrate the various advantages of using non-parametric tests
...
3
...
4
...
Kalicharan had to wait 4, 8, 2, 7, 7, 5, 8, 6, 1, 9, 6, 6, 5, 9 and 5 minutes for the bus he
takes to reach his office
...
Kalicharan should not have to wait more than 5 minutes for a bus
...
The following are the numbers of tickets issued by two policemen on 20 days:
By first policeman:
By second policeman:
7, 10, 14, 12, 6, 9, 11, 13, 7, 6, 10, 8, 14, 8, 12, 11, 9, 8, 10 and 15
...
Use the sign test at 1% level of significance to test the null hypothesis that on the average the two
policemen issue equal number of tickets against the alternative hypothesis that on the average the
second policeman issues more tickets than the first one
...
(a) Under what circumstances is the Fisher-Irwin test used? Explain
...
Two brick
manufacturing concerns have given him nearly identical rates for supplying the bricks
...
The nature of the test is to
subject each sampled brick to a force of 900 pounds
...
The results were as follows:
Of the 8 bricks from concern A, two were broken and of the 8 bricks from concern B, five were broken
...
7
...
The force applied at the time the brick breaks (calling it the breaking
point) is recorded as under:
Breaking-points
Bricks of concern A
Bricks of concern B
880,
915,
950,
790,
990,
905,
975
900,
895,
890,
1030, 1025,
825,
810
1010
885
...
8
...
Use the Kruskal-Wallis test at the level of significance α = 0
...
9
...
Test for
randomness at 1% level of significance
...
Test whether this arrangement of A’s and F’s may be regarded as random at 5% as well as at 10% level
of significance
...
Use a rank correlation at the 1% significance level and determine if there is significant positive correlation
between the two samples on the basis of the following information:
Blender model    Sample 1    Sample 2
A1               1           4
A2               11          12
A3               12          11
B                2           2
C1               13          13
C2               10          10
D1               3           1
D2               4           3
E                14          14
F1               5           8
F2               6           6
G1               9           5
G2               7           9
H                8           7
11
...
Test the significance of W at 5% and 1%
levels of significance and state what should be inferred from the same
...
12
...
607
rac = 0
...
393
Calculate Kendall’s coefficient of concordance W from the above information and test its significance at
5% level
...
We may as well use the term ‘multivariate
analysis’ which is a collection of methods for analyzing data in which a number of observations are
available for each object
...
For instance, in the field of intelligence testing if we start with the theory that general
intelligence is reflected in a variety of specific performance measures, then to study intelligence in
the context of this theory one must administer many tests of mental skills, such as vocabulary, speed
of recall, mental arithmetic, verbal analogies and so on
...
Most of the research
studies involve more than two variables in which situation analysis is desired of the association
between one (at times many) criterion variable and several independent variables, or we may be
required to study the association between variables having no dependency relationships
...
In brief, techniques that
take account of the various relationships among variables are termed multivariate analyses or
multivariate techniques
...
The main reason is that a series of univariate analyses carried out
separately for each variable may, at times, lead to incorrect interpretation of the result
...
As a result, during the last fifty years, a number of statisticians have contributed to the development
of several multivariate techniques
...
These techniques
are used in analyzing social, psychological, medical and economic data, specially when the variables
concerning research studies of these fields are supposed to be correlated with each other and when
rigorous probabilistic models cannot be appropriately used
...
CHARACTERISTICS AND APPLICATIONS
Multivariate techniques are largely empirical and deal with the reality; they possess the ability to
analyse complex data
...
Besides being a tool for analyzing the
data, multivariate techniques also help in various types of decision-making
...
This system, though
apparently fair, may at times be biased in favour of some subjects with the larger standard deviations
...
We may also cite an example from medical field
...
Each of the
results of such examinations has significance of its own, but it is also important to consider relationships
between different test results or results of the same tests at different occasions in order to draw
proper diagnostic conclusions and to determine an appropriate therapy
...
In view of all this, we can state that “if the researcher is interested in
making probability statements on the basis of sampled multiple measurements, then the best strategy
of data analysis is to use some suitable multivariate statistical technique
...
In other words, multivariate techniques transform a mass of observations
into a smaller number of composite scores in such a way that they may reflect as much information
as possible contained in the raw data obtained concerning a research study
...
Mathematically, multivariate techniques consist in “forming a linear
composite vector in a vector subspace, which can be represented in terms of projection of a vector
onto certain specified subspaces
...
Even then before applying multivariate techniques for meaningful results, one must consider
the nature and structure of the data and the real aim of the analysis
...
CLASSIFICATION OF MULTIVARIATE TECHNIQUES
Today, there exist a great variety of multivariate techniques which can be conveniently classified into
two broad categories viz
...
This sort of
classification depends upon the question: Are some of the involved variables dependent upon others?
If the answer is ‘yes’, we have dependence methods; but in case the answer is ‘no’, we have
interdependence methods
...
Firstly, in case some variables are dependent, the question is how many variables are
dependent? The other question is, whether the data are metric or non-metric? This means whether
1
2
K
...
Yanai and B
...
Mukherji, The Foundations of Multivariate Analysis, p
...
Ibid
...
iii
...
The technique to be used for a given situation depends upon the
answers to all these very questions
...
Sheth in his article on “The multivariate revolution in
marketing research”3 has given the flow chart that clearly exhibits the nature of some important
multivariate techniques as shown in Fig
...
1
...
In the former category are included techniques like multiple regression analysis, multiple
discriminant analysis, multivariate analysis of variance and canonical analysis, whereas in the latter
category we put techniques like factor analysis, cluster analysis, multidimensional scaling or MDS
(both metric and non-metric) and the latent structure analysis
...
13
...
35, No
...
1971), pp
...
VARIABLES IN MULTIVARIATE ANALYSIS
Before we describe the various multivariate techniques, it seems appropriate to have a clear idea
about the term, ‘variables’ used in the context of multivariate analysis
...
Important
ones are as under:
(i) Explanatory variable and criterion variable: If X may be considered to be the cause of Y,
then X is described as explanatory variable (also termed as causal or independent variable) and Y is
described as criterion variable (also termed as resultant or dependent variable)
...
, Xp) may be called a set of explanatory variables and the set (Y1, Y2, Y3, …
...
In economics, the explanatory variables are called external or
exogenous variables and the criterion variables are called endogenous variables
...
(ii) Observable variables and latent variables: Explanatory variables described above are supposed
to be observable directly in some situations, and if this is so, the same are termed as observable
variables
...
We call such unobservable variables latent variables
...
(iv) Dummy variable (or Pseudo variable): This term is being used in a technical sense and is
useful in algebraic manipulations in context of multivariate analysis
...
, m) a
dummy variable, if only one of Xi is 1 and the others are all zero
...
This technique is appropriate
when the researcher has a single, metric criterion variable
...
The main objective in using this technique is to predict the variability of the
dependent variable based on its covariance with all the independent variables
...
Given a dependent variable, the linear-multiple regression problem is to estimate
constants B1, B2,
...
+ BkXk + A provides
a good estimate of an individual’s Y score based on his X scores
...
zk; each
z has a mean of 0 and standard deviation of 1
...
+ β k z k
y
*
See Chapter 7 also for other relevant information about multiple regression
...
The expression on the right
side of the above equation is the linear combination of explanatory variables
...
The least-squares method is used to estimate the
beta weights in such a way that the sum of the squared prediction errors is kept as small as possible,
i.e., the expression $\sum (z_y - z'_y)^2$ is minimized
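The least-squares estimation of beta weights from standardized scores can be sketched for the case of two explanatory variables. Everything in this sketch is an illustrative assumption: the function names, the six observations, and the fact that Y is constructed as an exact linear combination of X1 and X2 so that the fit is perfect. For standardized variables the normal equations reduce to β1 + r12 β2 = r1y and r12 β1 + β2 = r2y, which are solved directly.

```python
from statistics import mean, pstdev


def standardize(values):
    """Convert raw scores to z scores with mean 0 and standard deviation 1."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def correlation(zx, zy):
    """Pearson correlation of two standardized series."""
    return sum(a * b for a, b in zip(zx, zy)) / len(zx)


def beta_weights(x1, x2, y):
    """Least-squares beta weights and R squared for two standardized predictors."""
    z1, z2, zy = standardize(x1), standardize(x2), standardize(y)
    r12 = correlation(z1, z2)
    r1y = correlation(z1, zy)
    r2y = correlation(z2, zy)
    denominator = 1 - r12 ** 2
    beta1 = (r1y - r2y * r12) / denominator
    beta2 = (r2y - r1y * r12) / denominator
    r_squared = beta1 * r1y + beta2 * r2y
    return beta1, beta2, r_squared


# Hypothetical data: Y is built as 2*X1 + 3*X2, so the fit should be exact.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [2 * a + 3 * b for a, b in zip(x1, x2)]

beta1, beta2, r_squared = beta_weights(x1, x2, y)
print(round(beta1, 3), round(beta2, 3), round(r_squared, 3))
```

The square root of `r_squared` corresponds to the multiple correlation coefficient between the actual and the predicted standardized scores.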
...
This special correlation coefficient from Karl Pearson is termed the multiple correlation
coefficient (R)
...
e
...
Sometimes the researcher may use step-wise regression techniques to have a better idea of the
independent contribution of each explanatory variable
...
Formal computerized techniques are available for the
purpose and the same can be used in the context of a particular problem being studied by the
researcher
...
Discriminant analysis requires interval independent variables and
a nominal dependent variable
...
is
being investigated, then we should use the technique of discriminant analysis
...
Thus discriminant
analysis is considered an appropriate technique when the single dependent variable happens to be
non-metric and is to be classified into two or more groups, depending upon its relationship with
several independent variables which all happen to be metric
...
In case we classify the dependent variable in more than two groups, then we
use the name multiple discriminant analysis; but in case only two groups are to be formed, we simply
use the term discriminant analysis
...
(i) There happens to be a simple scoring system that assigns a score to each individual or
object
...
On the basis of this score, the individual is assigned to the ‘most
likely’ category
...
Let b1, b2, and b3 be the weights attached
to the independent variables of age, income and education respectively
...
, Handbook of Marketing Research
...
early user, late user or a non-user)
...
Thus,
through the discriminant analysis, the researcher can as well determine which independent
variables are most useful in predicting whether the respondent is to be put into one group or
the other
...
In case only two groups of the individuals are to be formed on the basis of several
independent variables, we can then have a model like this
zi = b0 + b1X1i + b2X2i +
...
= the critical value for the discriminant score
...
, classify individual i as belonging to Group I
If zi < zcrit, classify individual i as belonging to Group II
...
Every individual on one side of the line is classified as Group I and
on the other side, every one is classified as belonging to Group II
...
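The two-group rule just described can be sketched as follows. The weights, the critical value and the three cases are hypothetical numbers chosen only for illustration, and sending a score exactly equal to the critical value to Group II is an arbitrary choice of this sketch, not something the text specifies.

```python
import numpy as np

def discriminant_scores(X, b0, b):
    """z_i = b0 + b1*X1i + b2*X2i + ... for every individual (row) of X."""
    return b0 + np.asarray(X, dtype=float) @ np.asarray(b, dtype=float)

def classify(z, z_crit):
    """Group I when the score exceeds the critical value, otherwise Group II."""
    return np.where(z > z_crit, "I", "II")

# Hypothetical individuals described by age, income and education.
X = [[30, 4.0, 12], [50, 1.0, 8]]
z = discriminant_scores(X, b0=0.0, b=[0.0, 1.0, 0.5])   # hypothetical weights
groups = classify(z, z_crit=7.0)                        # hypothetical critical value
print(z, groups)
```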
In n-group discriminant analysis, a discriminant function is formed for each pair of groups
...
The b values for each function tell which variables are
important for discriminating between particular pairs of groups
...
Then use is made of the transitivity of the relation “more likely than”
...
This way all necessary comparisons are made and the
individual is assigned to the most likely of all the groups
...
For judging the statistical significance between two groups, we work out the Mahalanobis
statistic, D², which happens to be a generalized distance between two groups, where each
group is characterized by the same set of n variables and where it is assumed that the variance-covariance structure is identical for both groups
...
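A minimal sketch of the D² computation, assuming the usual form in which the difference between the two group mean vectors is weighted by the inverse of the pooled variance-covariance matrix. The two small groups below are hypothetical, and the function name is invented for this sketch.

```python
import numpy as np

def mahalanobis_d2(group1, group2):
    """Generalized squared distance between two groups measured on the same variables."""
    g1 = np.asarray(group1, dtype=float)
    g2 = np.asarray(group2, dtype=float)
    n1, n2 = len(g1), len(g2)
    diff = g1.mean(axis=0) - g2.mean(axis=0)
    s1 = np.cov(g1, rowvar=False)
    s2 = np.cov(g2, rowvar=False)
    # Pooled matrix: this is where the "identical structure in both groups" assumption enters.
    pooled = ((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2)
    return float(diff @ np.linalg.solve(pooled, diff))

# Hypothetical groups, each measured on two variables.
g1 = [[0, 0], [2, 0], [0, 2], [2, 2]]
g2 = [[3, 4], [5, 4], [3, 6], [5, 6]]
d2 = mahalanobis_d2(g1, g2)
print(d2)
```

The significance test built on D² is not shown here; the sketch stops at the distance itself.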
From all this, we can conclude that the discriminant analysis provides a predictive equation,
measures the relative importance of each variable and is also a measure of the ability of the equation
to predict actual class-groups (two or more) concerning the dependent variable
...
This technique is considered appropriate when
several metric dependent variables are involved in a research study along with many non-metric
explanatory variables
...
)
In other words, multivariate analysis of variance is specially applied whenever the researcher wants
to test hypotheses concerning multivariate differences in group responses to experimental
manipulations
...
In that case he should use the technique of multivariate analysis of variance
for meeting his objective
...
Both metric and non-metric data can be used in the context of this
multivariate technique
...
For example, if we want to relate
grade school adjustment to health and physical maturity of the child, we can then use canonical
correlation analysis, provided we have for each child a number of adjustment scores (such as tests,
teacher’s ratings, parent’s ratings and so on) and also we have for each child a number of health and
physical maturity scores (such as heart rate, height, weight, index of intensity of illness and so on)
...
Mathematically, in canonical correlation analysis, the weights of the two sets viz
...
yj are so determined that the variables X = a1X1 + a2X2 +
...
The process of finding the weights requires factor
analyses with two matrices
...
(v) Factor analysis: Factor analysis is by far the most often used multivariate technique of research
studies, specially pertaining to social and behavioural sciences
...
For instance, we might have data, say, about an individual’s income, education, occupation and dwelling
*
See, Eleanor W
...
167–168
...
The technique used for such purpose is generally described as factor
analysis
...
This technique allows the researcher to group variables into
factors (based on correlation between variables) and the factors so derived may be treated as new
variables (often termed as latent variables) and their value derived by summing the values of the
original variables which have been grouped into the factor
...
Since the factors happen to be linear combinations
of data, the coordinates of each observation or variable are measured to obtain what are called factor
loadings
...
The mathematical basis of factor analysis concerns a data matrix* (also termed as score
matrix), symbolized as S
...
Thus a1 is the
score of person 1 on measure a, a2 is the score of person 2 on measure a, and kN is the score of
person N on measure k
...
...
                        Measures (variables)
Persons (objects)    a      b      c      ...    k
        1            a1     b1     c1     ...    k1
        2            a2     b2     c2     ...    k2
        3            a3     b3     c3     ...    k3
       ...           ...    ...    ...    ...    ...
        N            aN     bN     cN     ...    kN
It is assumed that scores on each measure are standardized [i.e., $x_i = (X - \bar{X}_i)/\sigma_i$]
...
0
...
After this, we work out factor loadings (i.e., factor-variable correlations)
...
For realistic results, we resort to the technique of rotation, because such rotations reveal different
structures in the data
...
They also facilitate comparison among groups of items as groups
...
*
Alternatively the technique can be applied through the matrix of correlations, R as stated later on
...
As such
factor analysis is not a single unique method but a set of techniques
...
Before we describe these different methods of factor analysis, it seems appropriate that some
basic terms relating to factor analysis be well understood
...
There
can be one or more factors, depending upon the nature of the study and the number of variables
involved in it
...
They are also known as factor-variable correlations
...
It is the absolute size
(rather than the signs, plus or minus) of the loadings that is important in the interpretation of a factor
...
A high value of communality means that not
much of the variable is left over after whatever the factors represent is taken into consideration
...
Eigen value indicates
the relative importance of each factor in accounting for the particular set of variables being analysed
...
This value, when divided by the number of variables (involved in a study),
results in an index that shows how the particular solution accounts for what all the variables taken
together represent
...
If
they fall into one or more highly redundant groups, and if the extracted factors account for all the
groups, the index will then approach unity
...
Just as different stains on it reveal different structures in the tissue, different rotations reveal
different structures in the data
...
However, from the standpoint of making sense of the results of factor analysis, one must
select the right rotation
...
Communality for each variable will remain undisturbed
regardless of rotation, but the eigen values will change as a result of rotation
...
Factor scores can help explain what the factors
mean
...
We can now take up the important methods of factor analysis
...
L
...
* The centroid method tends to
maximize the sum of loadings, disregarding signs; it is the method which extracts the largest sum of
absolute loadings for each factor in turn
...
0 or – 1
...
The main merit of this method is that it is relatively simple, can be easily
understood and involves simpler computations
...
Various steps** involved in this method are as follows:
(i) This method starts with the computation of a matrix of correlations, R, wherein unities are
placed in the diagonal spaces
...
(ii) If the correlation matrix so obtained happens to be positive manifold (i.e., disregarding the
diagonal elements each variable has a larger sum of positive correlations than of negative
correlations), the centroid method requires that the weights for all variables be +1
...
In
other words, the variables are not weighted; they are simply summed
...
(iii) The first centroid factor is determined as under:
(a) The sum of the coefficients (including the diagonal unity) in each column of the correlation
matrix is worked out
...
(c) The sum of each column obtained as per (a) above is divided by the square root of T
obtained in (b) above, resulting in what are called centroid loadings
...
The full set of loadings so
obtained constitute the first centroid factor (say A)
...
For this purpose, the loadings for the two variables on the first centroid factor
are multiplied
...
The resulting matrix of factor cross products may
be named as Q1
...
See, Jum C
...
, p
...
**
Multivariate Analysis Techniques
325
correlation, R, and the result is the first matrix of residual coefficients, R1
...
The aim in doing this
should be to obtain a reflected matrix, R'1, which will have the highest possible sum of
coefficients (T)]
...
When this is done, the matrix is named
as ‘reflected matrix’ from which the loadings are obtained in the usual way (already explained
in the context of first centroid factor), but the loadings of the variables which were reflected
must be given negative signs
...
Thus loadings on the second centroid factor are obtained from R'1
...
) the same process outlined above is repeated
...
This
is then subtracted from R1 (and not from R'1) resulting in R2
...
First, some of the variables would have
to be reflected to maximize the sum of loadings, which would produce R'2
...
Again, it would be necessary to give negative
signs to the loadings of variables which were reflected which would result in third centroid
factor (C)
...
Illustration 1
Given is the following correlation matrix, R, relating to eight variables with unities in the diagonal spaces:
                                   Variables
Variables      1       2       3       4       5       6       7       8
    1        1.000    .709    .204    .081    .626    .113    .155    .774
    2         .709   1.000    .051    .089    .581    .098    .083    .652
    3         .204    .051   1.000    .671    .123    .689    .582    .072
    4         .081    .089    .671   1.000    .022    .798    .613    .111
    5         .626    .581    .123    .022   1.000    .047    .201    .724
    6         .113    .098    .689    .798    .047   1.000    .801    .120
    7         .155    .083    .582    .613    .201    .801   1.000    .152
    8         .774    .652    .072    .111    .724    .120    .152   1.000
Using the centroid method of factor analysis, work out the first and second centroid factors from the
above information
...
Each diagonal element is a partial variance i
...
, the
variance that remains after the influence of the first factor is partialed
...
e
...
This can be verified by looking at the
partial correlation coefficient between any two variables say 1 and 2 when factor A is held constant
$r_{12 \cdot A} = \dfrac{r_{12} - r_{1A} \, r_{2A}}{\sqrt{1 - r_{1A}^2} \, \sqrt{1 - r_{2A}^2}}$
(The numerator in the above formula is what is found in R1 corresponding to the entry for variables 1 and 2
...
Likewise
the partial variance for 2 is found in the diagonal space for that variable in the residual matrix
...
326
Research Methodology
Solution: The given correlation matrix, R, is a positive manifold and as such the weights for all variables
are +1
...
Accordingly, we calculate the first centroid factor (A) as under:
Table 13
...
                                   Variables
Variables      1       2       3       4       5       6       7       8
    1        1.000    .709    .204    .081    .626    .113    .155    .774
    2         .709   1.000    .051    .089    .581    .098    .083    .652
    3         .204    .051   1.000    .671    .123    .689    .582    .072
    4         .081    .089    .671   1.000    .022    .798    .613    .111
    5         .626    .581    .123    .022   1.000    .047    .201    .724
    6         .113    .098    .689    .798    .047   1.000    .801    .120
    7         .155    .083    .582    .613    .201    .801   1.000    .152
    8         .774    .652    .072    .111    .724    .120    .152   1.000
Column sums  3.662   3.263   3.392   3.385   3.324   3.666   3.587   3.605

Sum of the column sums (T) = 27.884     ∴ √T = 5.281

First centroid factor A = 3.662/5.281, 3.263/5.281, 3.392/5.281, 3.385/5.281, 3.324/5.281, 3.666/5.281, 3.587/5.281, 3.605/5.281
                        = .693, .618, .642, .641, .629, .694, .679, .683

Table 13.1 (b)

Variables     Factor loadings concerning
              first Centroid factor A
    1                 .693
    2                 .618
    3                 .642
    4                 .641
    5                 .629
    6                 .694
    7                 .679
    8                 .683
To obtain the second centroid factor B, we first of all develop (as shown on the next page) the
first matrix of factor cross product, Q1:
Since in R1 the diagonal terms are partial variances and the off-diagonal terms are partial covariances, it is easy to convert
the entire table to a matrix of partial correlations
...
First Matrix of Factor Cross Product (Q1)

First centroid
factor A        .693    .618    .642    .641    .629    .694    .679    .683
   .693         .480    .428    .445    .444    .436    .481    .471    .473
   .618         .428    .382    .397    .396    .389    .429    .420    .422
   .642         .445    .397    ...     .412    ...     .446    ...     .438
   .641         .444    .396    .412    .411    .403    .445    .435    .438
   .629         .436    .389    ...     .403    ...     .437    ...     .430
   .694         .481    .429    .446    .445    .437    .482    .471    .474
   .679         .471    .420    ...     .435    ...     .471    ...     .464
   .683         .473    .422    .438    .438    .430    .474    .464    .466
Now we obtain first matrix of residual coefficient (R1) by subtracting Q1 from R as shown
below:
First Matrix of Residual Coefficient (R1)

                                   Variables
Variables      1       2       3       4       5       6       7       8
    1         .520    .281   –.241   –.363    .190   –.368   –.316    .301
    2         .281    .618   –.346   –.307    .192   –.331   –.337    .230
    3        –.241   –.346    ...     .259   –...     .243    ...    –.366
    4        –.363   –.307    .259    .589   –.381    .353    .178   –.327
    5         .190    .192   –...    –.381    ...    –.390   –...     .294
    6        –.368   –.331    .243    .353   –.390    .518    .330   –.354
    7        –.316   –.337    ...     .178   –...     .330    ...    –.312
    8         .301    .230   –.366   –.327    .294   –.354   –.312    .534
Reflecting the variables 3, 4, 6 and 7, we obtain reflected matrix of residual coefficient (R'1) as
under and then we can extract the second centroid factor (B) from it as shown on the next page
...
                                   Variables
Variables      1       2       3*      4*      5       6*      7*      8
    1         .520    .281    .241    .363    .190    .368    .316    .301
    2         .281    .618    .346    .307    .192    .331    .337    .230
    3*        .241    .346    ...     .259    ...     .243    ...     .366
    4*        .363    .307    .259    .589    .381    .353    .178    .327
    5         .190    .192    ...     .381    ...     .390    ...     .294
    6*        .368    .331    .243    .353    .390    .518    .330    .354
    7*        .316    .337    ...     .178    ...     .330    ...     .312
    8         .301    .230    .366    .327    .294    .354    .312    .534
Column sums:   ...   2.642    ...    2.757    ...    2.887    ...    2.718

∴ Sum of column sums (T) = 20. ...     ∴ √T = 4.581
Second centroid factor B = .563, .577, –.539, –.602, .558, –.630, –.518, .593
* These variables were reflected
...
Variables     Centroid factor A     Centroid factor B
    1               .693                  .563
    2               .618                  .577
    3               .642                 –.539
    4               .641                 –.602
    5               .629                  .558
    6               .694                 –.630
    7               .679                 –.518
    8               .683                  .593
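The extraction of the two centroid factors worked through above can be sketched in code. This is a sketch under stated assumptions rather than the book's own procedure: the correlation matrix is typed in from Illustration 1, the function names are invented here, and the greedy rule used to decide which variables to reflect is an assumption of this sketch (the text only asks for the reflection that gives the highest possible sum of coefficients).

```python
import numpy as np

# Correlation matrix R of Illustration 1 (eight variables, unities in the diagonal).
R = np.array([
    [1.000, .709, .204, .081, .626, .113, .155, .774],
    [.709, 1.000, .051, .089, .581, .098, .083, .652],
    [.204, .051, 1.000, .671, .123, .689, .582, .072],
    [.081, .089, .671, 1.000, .022, .798, .613, .111],
    [.626, .581, .123, .022, 1.000, .047, .201, .724],
    [.113, .098, .689, .798, .047, 1.000, .801, .120],
    [.155, .083, .582, .613, .201, .801, 1.000, .152],
    [.774, .652, .072, .111, .724, .120, .152, 1.000],
])

def centroid_loadings(M):
    """Column sums divided by the square root of the sum of column sums (T)."""
    col_sums = M.sum(axis=0)
    return col_sums / np.sqrt(col_sums.sum())

def reflect(M):
    """Reflect variables until no column has a negative off-diagonal sum.

    Greedy rule (an assumption of this sketch): always reflect the variable
    whose off-diagonal column sum is currently the most negative.
    """
    M = M.copy()
    signs = np.ones(len(M))
    while True:
        off = M.sum(axis=0) - np.diag(M)
        j = int(np.argmin(off))
        if off[j] >= -1e-12:
            return M, signs
        M[j, :] *= -1
        M[:, j] *= -1      # the diagonal entry is flipped twice, so it keeps its sign
        signs[j] *= -1

A = centroid_loadings(R)                  # first centroid factor
R1 = R - np.outer(A, A)                   # first residual matrix: R minus Q1
R1_reflected, signs = reflect(R1)
B = signs * centroid_loadings(R1_reflected)   # reflected variables get negative signs

print(np.round(A, 3))
print(np.round(B, 3))
```

Because the code carries unrounded loadings forward, its second factor can differ from the hand-worked figures in the third decimal place. The overall sign of a centroid factor is also arbitrary: reflecting variables 1, 2, 5 and 8 instead of 3, 4, 6 and 7 gives the same factor with every sign reversed.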
...
1 of this
chapter
...
Solution: We work out the communality and eigen values for the given problem as under:
Table 13
...
                  Factor loadings
Variables    Centroid Factor    Centroid Factor    Communality (h²)
                   A                  B
    1            .693               .563           (.693)² + (.563)² = .797
    2            .618               .577           (.618)² + (.577)² = .715
    3            .642              –.539           (.642)² + (–.539)² = .703
    4            .641              –.602           (.641)² + (–.602)² = .773
    5            .629               .558           (.629)² + (.558)² = .707
    6            .694              –.630           (.694)² + (–.630)² = .879
    7            .679              –.518           (.679)² + (–.518)² = .729
    8            .683               .593           (.683)² + (.593)² = .818

Eigen value
(Variance
accounted for, i.e.,
common variance)         3.490              2.631              6.121
Proportion of total
variance                 .44 (44%)          .33 (33%)          .77 (77%)
Proportion of
common variance          .57 (57%)          .43 (43%)          1.00 (100%)
...
For instance, 79
...
3% of the total
variance in variable one scores is thought of as being made up of two parts: a factor specific to the
attribute represented by variable one, and a portion due to errors of measurement involved in the
assessment of variable one (but there is no mention of these portions in the above table because we
usually concentrate on common variance in factor analysis)
...
33 to be the minimum
absolute value to be interpreted
...
This criterion, though arbitrary, is being used more or less by way of
convention, and as such must be kept in view when one reads and interprets the multivariate research
results
...
33 on all variables; such a factor is usually
called “the general factor” and is taken to represent whatever it is that all of the variables have in
common
...
The factor name is
chosen in such a way that it conveys what it is that all variables that correlate with it (that “load on
it”) have in common
...
33, but half of them are
with negative signs
...
Each of these poles is defined by a cluster of variables—one pole by those
with positive loadings and the other pole with negative loadings
...
The
rows at the bottom of the above table give us further information about the usefulness of the two
factors in explaining the relations among the eight variables
...
In this present example, then V = 8
...
The row labeled “Eigen value” or “Common variance” gives
the numerical value of that portion of the variance attributed to the factor in the corresponding column
above it
...
Thus the total value, 8
...
490 as eigen value for factor A and 2
...
121 as the sum of eigen values for these two factors
...
0, are shown in the next row; there we can notice that 77% of the
total variance is related to these two factors, i.e., approximately 77% of the total variance is common
variance whereas the remaining 23% of it is made up of portions unique to individual variables and the
techniques used to measure them
...
Thus it can be concluded that the
two factors together “explain” the common variance
...
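The arithmetic behind the communality and eigen value figures can be checked with a short sketch. The loadings are the two centroid factors obtained in Illustration 1; the variable names are chosen for this sketch only.

```python
import numpy as np

# Loadings of the eight variables on centroid factors A and B (from Illustration 1).
loadings = np.array([
    [.693,  .563],
    [.618,  .577],
    [.642, -.539],
    [.641, -.602],
    [.629,  .558],
    [.694, -.630],
    [.679, -.518],
    [.683,  .593],
])

communality = (loadings ** 2).sum(axis=1)     # h2: row sums of squared loadings
eigen_values = (loadings ** 2).sum(axis=0)    # column sums of squared loadings
n_variables = loadings.shape[0]               # V = 8
prop_total = eigen_values / n_variables       # proportion of total variance
prop_common = eigen_values / eigen_values.sum()   # proportion of common variance

print(np.round(communality, 3))
print(np.round(eigen_values, 3), np.round(prop_total, 2), np.round(prop_common, 2))
```

Squaring and summing across a row gives the communality of a variable; squaring and summing down a column gives the eigen value of a factor, which is why both totals add up to the same common variance.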
C
...
Hotelling,
seeks to maximize the sum of squared loadings of each factor extracted in turn
...
The aim of the principal components method is the construction out of a given set of variables
Xj’s (j = 1, 2, …, k) of new variables (pi), called principal components which are linear combinations
of the Xs
$p_1 = a_{11} X_1 + a_{12} X_2 + \dots + a_{1k} X_k$
...
...
...
...
...
...
e
...
The aij’s are called loadings and are worked out in such a way that the extracted principal
components satisfy two conditions: (i) principal components are uncorrelated (orthogonal) and (ii) the
first principal component (p1) has the maximum variance, the second principal component (p2) has
the next maximum variance and so on
...
e
...
A decision is also taken with regard to the question: how
many of the components to retain in the analysis?
(ii) We then proceed with the regression of Y on these principal components, i.e.,
$\hat{Y} = \hat{y}_1 p_1 + \hat{y}_2 p_2 +$
...
Alternative method for finding the factor loadings is as under:
(i) Correlation coefficients (by the product moment method) between the pairs of k variables
are worked out and may be arranged in the form of a correlation matrix, R, as under:
Correlation Matrix, R

                      Variables
Variables    X1     X2     X3     …     Xk
   X1        r11    r12    r13    …     r1k
   X2        r21    r22    r23    …     r2k
   X3        r31    r32    r33    …     r3k
   …         …      …      …      …     …
   Xk        rk1    rk2    rk3    …     rkk
...
The
correlation matrix happens to be a symmetrical matrix
...
The vector of column sums is
referred to as Ua1 and when Ua1 is normalized, we call it Va1
...
Then elements in Va1
are accumulatively multiplied by the first row of R to obtain the first element in a new
vector Ua2
...
To obtain the second element of Ua2, the same process would be
repeated i
...
, the elements in Va1 are accumulatively multiplied by the 2nd row of R
...
Then Ua2 would be normalized to obtain Va2
...
If
they are nearly identical, then convergence is said to have occurred (If convergence does
not occur, one should go on using these trial vectors again and again till convergence
occurs)
...
e
...
(iii) To obtain factor B, one seeks solutions for Vb, and the actual factor loadings for second
component factor, B
...
(iv) This very procedure is repeated over and over again to obtain the successive PC factors
(viz
...
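The trial-vector procedure described above can be sketched as a short loop. The names Ua and Va follow the text; everything else (the function name, the tolerance, the iteration cap) is an assumption of this sketch. The matrix is the one given in Illustration 1 of this chapter.

```python
import numpy as np

R = np.array([
    [1.000, .709, .204, .081, .626, .113, .155, .774],
    [.709, 1.000, .051, .089, .581, .098, .083, .652],
    [.204, .051, 1.000, .671, .123, .689, .582, .072],
    [.081, .089, .671, 1.000, .022, .798, .613, .111],
    [.626, .581, .123, .022, 1.000, .047, .201, .724],
    [.113, .098, .689, .798, .047, 1.000, .801, .120],
    [.155, .083, .582, .613, .201, .801, 1.000, .152],
    [.774, .652, .072, .111, .724, .120, .152, 1.000],
])

def first_principal_component(R, tol=1e-6, max_iter=1000):
    """Return (characteristic vector Va, characteristic root, factor loadings)."""
    u = R.sum(axis=0)                 # Ua1: vector of column sums
    v = u / np.linalg.norm(u)         # Va1: Ua1 divided by its normalizing factor
    root = np.linalg.norm(u)
    for _ in range(max_iter):
        u = R @ v                     # next trial vector (Ua2, Ua3, ...)
        root = np.linalg.norm(u)      # its normalizing factor
        v_new = u / root
        converged = np.max(np.abs(v_new - v)) < tol
        v = v_new
        if converged:
            break
    return v, root, v * np.sqrt(root)

Va, root, loadings = first_principal_component(R)
print(np.round(Va, 3), round(root, 3))
print(np.round(loadings, 2))
```

The tolerance here is far stricter than the "nearly identical" check used in hand computation, so the loop runs for many more trial vectors than a worked example would, and its loadings may differ from hand-worked ones in the second decimal place.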
Other steps involved in factor analysis
(a) Next the question is: How many principal components to retain in a particular study? Various
criteria for this purpose have been suggested, but one often used is Kaiser’s criterion
...
(b) The principal components so extracted and retained are then rotated from their beginning
position to enhance the interpretability of the factors
...
A high communality figure means
that not much of the variable is left over after whatever the factors represent is taken into
consideration
...
The amount of variance explained (sum of squared
loadings) by each PC factor is equal to the corresponding characteristic root
...
(d) The variables are then regressed against each factor loading and the resulting regression
coefficients are used to generate what are known as factor scores which are then used in
further analysis and can also be used as inputs in several other multivariate analyses
...
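As a small illustration of step (a), the sketch below applies Kaiser's criterion in its commonly stated form, namely that components whose characteristic roots exceed one are retained. The correlation matrix is the one from Illustration 1 of this chapter; using NumPy's symmetric eigenvalue routine instead of trial vectors is a convenience of this sketch.

```python
import numpy as np

R = np.array([
    [1.000, .709, .204, .081, .626, .113, .155, .774],
    [.709, 1.000, .051, .089, .581, .098, .083, .652],
    [.204, .051, 1.000, .671, .123, .689, .582, .072],
    [.081, .089, .671, 1.000, .022, .798, .613, .111],
    [.626, .581, .123, .022, 1.000, .047, .201, .724],
    [.113, .098, .689, .798, .047, 1.000, .801, .120],
    [.155, .083, .582, .613, .201, .801, 1.000, .152],
    [.774, .652, .072, .111, .724, .120, .152, 1.000],
])

roots = np.linalg.eigvalsh(R)[::-1]     # characteristic roots, largest first
retained = int(np.sum(roots > 1.0))     # Kaiser's criterion: keep roots greater than one
print(np.round(roots, 3), "components retained:", retained)
```

The roots of a correlation matrix add up to the number of variables, so a root above one marks a component that accounts for more variance than a single standardized variable does.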
Solution: Since the given correlation matrix is a positive manifold, we work out the first principal
component factor (using trial vectors) as under:
Table 13
...
                                   Variables
Variables          1       2       3       4       5       6       7       8
    1            1.000    .709    .204    .081    .626    .113    .155    .774
    2             .709   1.000    .051    .089    .581    .098    .083    .652
    3             .204    .051   1.000    .671    .123    .689    .582    .072
    4             .081    .089    .671   1.000    .022    .798    .613    .111
    5             .626    .581    .123    .022   1.000    .047    .201    .724
    6             .113    .098    .689    .798    .047   1.000    .801    .120
    7             .155    .083    .582    .613    .201    .801   1.000    .152
    8             .774    .652    .072    .111    .724    .120    .152   1.000
Column sums Ua1  3.662   3.263   3.392   3.385   3.324   3.666   3.587   3.605
Normalizing Ua1 we
obtain Va1, i.e.,
Va1 = Ua1/Normalizing
factor*           .371    .331    .344    .343    .337    .372    .363    .365

* Normalizing factor = √[(3.662)² + (3.263)² + (3.392)² + (3.385)² + (3.324)² + (3.666)² + (3.587)² + (3.605)²]
                     = √97.372 = 9.868
Then we obtain Ua2 by accumulatively multiplying Va1 row by row into R and the result comes as
under:
Ua2 : [1.…, 1.143, 1.…, 1.201, 1.…, 1.308, 1.…, 1.275]
Normalizing it we obtain (normalizing factor for Ua2 will be worked out as above and will be
= 3.…)
Va2 : [.371, …, .344, …, .334, …, .366, …]
Hence Va1 is taken as the characteristic vector, Va
...
The result is as under:

Variables    (Characteristic    ×    √(normalizing factor    =    Principal
               vector Va)                  of Ua2)                 Component I
    1             .371          ×          1.868             =        .69
    2             .331          ×          1.868             =        .62
    3             .344          ×          1.868             =        .64
    4             .343          ×          1.868             =        .64
    5             .337          ×          1.868             =        .63
    6             .372          ×          1.868             =        .70
    7             .363          ×          1.868             =        .68
    8             .365          ×          1.868             =        .68
For finding principal component II, we have to proceed on similar lines (as stated in the context
of obtaining centroid factor B earlier in this chapter) to obtain the following result*:
Variables     Principal Component II
    1                +.57
    2                +.59
    3                –.52
    4                –.59
    5                +.57
    6                –.61
    7                –.49
    8                –.61
The other parts of the question can now be worked out (after first putting the above information in a
matrix form) as given below:
              Principal Components
Variables        I          II         Communality, h²
    1           .69        +.57        (.69)² + (.57)² = .801
    2           .62        +.59        (.62)² + (.59)² = .733
    3           .64        –.52        (.64)² + (–.52)² = .680
    4           .64        –.59        (.64)² + (–.59)² = .758
    5           .63        +.57        (.63)² + (.57)² = .722
    6           .70        –.61        (.70)² + (–.61)² = .862
    7           .68        –.49        (.68)² + (–.49)² = .703
    8           .68        –.61        (.68)² + (–.61)² = .835

Eigen value                 3.4914          2.6007          6.0921
Proportion of total
variance                    .436 (43.6%)    .325 (32.5%)    .761 (76%)
Proportion
of common
variance                    .573 (57%)      .427 (43%)      1.000 (100%)
...
e
...
*
This can easily be worked out
...
(C) Maximum Likelihood (ML) Method of Factor Analysis
The ML method consists in obtaining sets of factor loadings successively in such a way that each, in
turn, explains as much as possible of the population correlation matrix as estimated from the sample
correlation matrix
...
Thus, the ML method is a statistical
approach in which one maximizes some relationship between the sample of data and the population
from which the sample was drawn
...
An iterative approach is also employed in the ML method to find
each factor, but the iterative procedures have proved much more difficult than what we find in the
case of the PC method
...
*
The loadings obtained on the first factor are employed in the usual way to obtain a matrix of the
residual coefficients
...
This goes on repeatedly in search of one factor after another
...
The final
product is a matrix of factor loadings
...
ROTATION IN FACTOR ANALYSIS
One often talks about the rotated solutions in the context of factor analysis
...
e
...
Simple
structure according to L
...
Thurstone is obtained by rotating the axes** until:
(i) Each row of the factor matrix has one zero
...
(iii) For each pair of factors, there are several variables for which the loading on one is virtually
zero and the loading on the other is substantial
...
(v) For every pair of factors, the number of variables with non-vanishing loadings on both of
them is small
...
*
The basic mathematical derivations of the ML method are well explained in S
...
Mulaik’s, The Foundations of Factor
Analysis
...
Only the axes of the graph (wherein the points
representing variables have been shown) are rotated keeping the location of these points relative to each other undisturbed
...
Varimax rotation is one such method that maximizes
(simultaneously for all factors) the variance of the loadings within each factor
...
In essence, the solution obtained through varimax rotation produces factors that are characterized
by large loadings on relatively few variables
...
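A rough sketch of a varimax rotation is given below. It uses a widely circulated iterative scheme based on the singular value decomposition, which is an assumption of this sketch and not a procedure taken from the text; the function names are likewise invented here. The loadings are the two centroid factors of Illustration 1.

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonally rotate a loading matrix so as to raise the varimax criterion."""
    L = np.asarray(loadings, dtype=float)
    p, k = L.shape
    T = np.eye(k)                     # accumulated orthogonal rotation
    d_old = 0.0
    for _ in range(max_iter):
        LR = L @ T
        G = L.T @ (LR ** 3 - LR @ np.diag((LR ** 2).sum(axis=0)) / p)
        u, s, vt = np.linalg.svd(G)
        T = u @ vt
        d = s.sum()
        if d_old != 0.0 and d / d_old < 1.0 + tol:
            break
        d_old = d
    return L @ T, T

def varimax_criterion(L):
    """Sum over factors of the variance of the squared loadings."""
    return float(np.var(np.asarray(L) ** 2, axis=0).sum())

unrotated = np.array([
    [.693,  .563], [.618,  .577], [.642, -.539], [.641, -.602],
    [.629,  .558], [.694, -.630], [.679, -.518], [.683,  .593],
])
rotated, T = varimax(unrotated)
print(np.round(rotated, 2))
```

After rotation each variable should load heavily on one factor and only slightly on the other, the communalities stay as they were, and the sums of squared loadings per factor change.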
As a result, the solution obtained through this method permits a
general factor to emerge, whereas in case of varimax solution such a thing is not possible
...
e
...
It should, however, be emphasised
that the right rotation must be selected for making sense of the results of factor analysis
...
In R-type factor
analysis, high correlations occur when respondents who score high on variable 1 also score high on
variable 2 and respondents who score low on variable 1 also score low on variable 2
...
In Q-type factor analysis, the correlations
are computed between pairs of respondents instead of pairs of variables
...
Factors emerge when there are high correlations within groups of people
...
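The difference between the two types lies only in what is correlated with what, as the following sketch with hypothetical scores suggests (four respondents in rows, three variables in columns).

```python
import numpy as np

# Hypothetical scores: 4 respondents (rows) on 3 variables (columns).
scores = np.array([
    [5.0, 4.0, 1.0],
    [4.0, 5.0, 2.0],
    [1.0, 2.0, 5.0],
    [2.0, 1.0, 4.0],
])

r_type = np.corrcoef(scores, rowvar=False)   # 3 x 3: correlations between pairs of variables
q_type = np.corrcoef(scores)                 # 4 x 4: correlations between pairs of respondents

print(np.round(r_type, 2))
print(np.round(q_type, 2))
```

In this made-up data respondents 1 and 2 have similar profiles, as do respondents 3 and 4, so the Q-type matrix shows high correlations within each pair and negative correlations across the pairs.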
Factor analysis has been mainly used in developing psychological tests (such as IQ tests, personality
tests, and the like) in the realm of psychology
...
Merits: The main merits of factor analysis can be stated thus:
(i) The technique of factor analysis is quite useful when we want to condense and simplify the
multivariate data
...
(iii) The technique can reveal the latent factors (i
...
, underlying factors not directly observed)
that determine relationships among several variables concerning a research study
...
(iv) The technique may be used in the context of empirical clustering of products, media or
people i
...
, for providing a classification scheme when data scored on various rating scales
have to be grouped together
...
Important ones are
as follows:
(i) Factor analysis, like all multivariate techniques, involves laborious computations and a
heavy cost burden
...
e
...
(ii) The results of a single factor analysis are considered generally less reliable and dependable
for very often a factor analysis starts with a set of imperfect data
...
”4 To overcome this difficulty, it has been
realised that analysis should at least be done twice
...
(iii) Factor-analysis is a complicated decision tool that can be used only when one has thorough
knowledge and enough experience of handling this tool
...
To conclude, we can state that in spite of all the said limitations “when it works well, factor
analysis helps the investigator make sense of large bodies of intertwined data
...
5
(vi) Cluster Analysis
Cluster analysis consists of methods of classifying variables into clusters
...
The basic objective of cluster analysis is to determine how many
mutually exclusive and exhaustive groups or clusters, based on the similarities of profiles among entities, really
exist in the population and then to state the composition of such groups
...
Steps: In general, cluster analysis contains the following steps to be performed:
(i) First of all, if some variables have a negative sum of correlations in the correlation matrix,
one must reflect variables so as to obtain a maximum sum of positive correlations for the
matrix as a whole
...
e
...
(iii) Then one looks for those variables that correlate highly with the said two variables and
includes them in the cluster
...
(iv) To obtain the nucleus of the second cluster, we find two variables that correlate highly but
have low correlations with members of the first cluster
...
Such variables along with the said two variables thus
constitute the second cluster
...
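The steps above can be sketched as a simple greedy routine. Several things here are assumptions of the sketch: the cut-off of 0.5 for "correlates highly", the function name, and the simplification that each new nucleus is sought only among variables not yet placed in a cluster. The matrix is the eight-variable correlation matrix from Illustration 1 of this chapter, which needs no reflection because it is a positive manifold.

```python
import numpy as np

R = np.array([
    [1.000, .709, .204, .081, .626, .113, .155, .774],
    [.709, 1.000, .051, .089, .581, .098, .083, .652],
    [.204, .051, 1.000, .671, .123, .689, .582, .072],
    [.081, .089, .671, 1.000, .022, .798, .613, .111],
    [.626, .581, .123, .022, 1.000, .047, .201, .724],
    [.113, .098, .689, .798, .047, 1.000, .801, .120],
    [.155, .083, .582, .613, .201, .801, 1.000, .152],
    [.774, .652, .072, .111, .724, .120, .152, 1.000],
])

def correlation_clusters(R, threshold=0.5):
    """Greedy clusters of variables; `threshold` is a hypothetical cut-off."""
    remaining = set(range(len(R)))
    clusters = []
    while len(remaining) >= 2:
        idx = sorted(remaining)
        # Nucleus: the most highly correlated pair among the remaining variables.
        pairs = [(R[i, j], i, j) for a, i in enumerate(idx) for j in idx[a + 1:]]
        r, i, j = max(pairs)
        if r < threshold:
            break
        cluster = {i, j}
        # Add every variable that correlates highly with both nucleus variables.
        for k in idx:
            if k not in cluster and R[k, i] >= threshold and R[k, j] >= threshold:
                cluster.add(k)
        clusters.append(sorted(v + 1 for v in cluster))   # 1-based variable numbers
        remaining -= cluster
    return clusters

clusters = correlation_clusters(R)
print(clusters)
```

For this matrix the routine separates variables 3, 4, 6 and 7 from variables 1, 2, 5 and 8, the same two groups that show up as the opposite poles of the second centroid factor.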
4
Srinibas Bhattacharya, Psychometrics and Behavioural Research, p
...
William D
...
Sheth in their article on “Factor Analysis” forming chapter 9 in Robert Ferber, (ed
...
2–471
...
For problems concerning a large number of variables, various cut-and-try methods have been proposed for locating clusters
...
In spite of the above stated limitation, cluster analysis has been found useful in context of market
research studies
...
(vii) Multidimensional Scaling**
Multidimensional scaling (MDS) allows a researcher to measure an item in more than one dimension
at a time
...
There are several MDS techniques (also known as techniques for dimensional reduction) often
used for the purpose of revealing patterns of one sort or another in interdependent data structures
...
Then the judged similarities are transformed into distances through statistical manipulations and are
consequently shown in n-dimensional space in a way that the interpoint distances best preserve the
original interpoint proximities
...
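One way to picture the step from distances to a spatial configuration is classical (metric) scaling, sketched below. Classical scaling is named here as a stand-in: the text does not say which MDS technique is meant, and techniques that work from judged similarities are usually non-metric. The distance matrix is hypothetical (four objects lying at the corners of a 3 by 4 rectangle).

```python
import numpy as np

def classical_mds(D, n_dims=2):
    """Coordinates whose interpoint distances approximate the distance matrix D."""
    D = np.asarray(D, dtype=float)
    n = len(D)
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                  # double-centred squared distances
    vals, vecs = np.linalg.eigh(B)
    order = np.argsort(vals)[::-1][:n_dims]      # largest eigenvalues first
    return vecs[:, order] * np.sqrt(np.maximum(vals[order], 0.0))

# Hypothetical distances between four objects.
D = np.array([
    [0, 3, 5, 4],
    [3, 0, 4, 5],
    [5, 4, 0, 3],
    [4, 5, 3, 0],
], dtype=float)

coords = classical_mds(D)
print(np.round(coords, 3))
```

The recovered coordinates are determined only up to rotation and reflection, which is why the interpoint distances, and not the coordinates themselves, are what should be compared with the input.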
The significance of MDS lies in the fact that it enables the researcher to study “The perceptual
structure of a set of stimuli and the cognitive processes underlying the development of this structure
...
”6 With MDS, one can scale objects, individuals or both with a minimum of information
...
(viii) Latent Structure Analysis
This type of analysis shares both of the objectives of factor analysis viz
...
This type of analysis is appropriate when the
variables involved in a study do not possess dependency relationship and happen to be non-metric
...
*
These are beyond the scope of this book and hence have been omitted
...
C
...
E
...
**
See, Chapter No
...
6
Robert Ferber, ed
...
3–52
...
The technique
of path analysis is based on a series of multiple regression analyses with the added assumption of
causal relationship between independent and dependent variables
...
An
illustrative path diagram showing interrelationships between Fathers’ education, Fathers’ occupation,
Sons’ education, Sons’ first and Sons’ present occupation can be shown in the Fig
...
2
...
If linear additive effects are assumed, then through path analysis a simple set
of equations can be built up showing how each variable depends on preceding variables
...
”7
The merit of path analysis in comparison to correlational analysis is that it makes possible the
assessment of the relative influence of each antecedent or explanatory variable on the consequent or
criterion variables by first making explicit the assumptions underlying the causal connections and
then by elucidating the indirect effect of the explanatory variables
...
13
...
Each dependent variable is regarded as determined by the variables preceding it in the path
diagram, and a residual variable, defined as uncorrelated with the other variables, is postulated to
account for the unexplained portion of the variance in the dependent variable
...
”8
7
8
K
...
op
...
, The Foundations of Multivariate Analysis, p
...
Ibid
...
121–122
...
13
...
p21 may be estimated
from the simple regression of X2 on X1, i.e., X2 = b21X1, and p31 and p32 may be estimated from the
regression of X3 on X2 and X1 as under:
$\hat{X}_3 = b_{31.2} X_1 + b_{32.1} X_2$
where b31
...
In path analysis the beta coefficient indicates the direct effect of Xj (j = 1, 2, 3,
...
Squaring the direct effect yields the proportion of the variance in the dependent
variable Y which is due to each of the p number of independent variables Xj (j = 1, 2, 3,
...
After
calculating the direct effect, one may then obtain a summary measure of the total indirect effect of
Xj on the dependent variable Y by subtracting the beta coefficient bj from the zero-order correlation
coefficient ryxj, i.e.,
Indirect effect of Xj on Y = $c_{jy} = r_{yx_j} - b_j$
for all j = 1, 2,
...
Such indirect effects include the unanalysed effects and spurious relationships due to antecedent
variables
...
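The split into direct and indirect effects can be sketched as follows, assuming standardized variables so that the beta coefficients can be obtained directly from the correlations. The correlations used are hypothetical and the function name is invented for this sketch.

```python
import numpy as np

def path_effects(r_xx, r_yx):
    """Direct effects (beta coefficients) and indirect effects of each Xj on Y."""
    r_xx = np.asarray(r_xx, dtype=float)     # correlations among the explanatory variables
    r_yx = np.asarray(r_yx, dtype=float)     # zero-order correlations of each Xj with Y
    direct = np.linalg.solve(r_xx, r_yx)     # beta coefficients b_j
    indirect = r_yx - direct                 # c_jy = r_yxj - b_j
    return direct, indirect

# Hypothetical correlations for two explanatory variables.
r_xx = [[1.0, 0.5],
        [0.5, 1.0]]
r_yx = [0.6, 0.5]
direct, indirect = path_effects(r_xx, r_yx)
print(np.round(direct, 3), np.round(indirect, 3))
```

Adding the direct and indirect effect of a variable gives back its zero-order correlation with Y, which is simply the identity stated above rearranged.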
CONCLUSION
From the brief account of multivariate techniques presented above, we may conclude that such
techniques are important for they make it possible to encompass all the data from an investigation in
one analysis
...
These techniques yield more realistic probability statements
in hypothesis testing and interval estimation studies
...
The common source of each individual observation generally results in dependence or correlation
among the dimensions and it is this feature that distinguishes multivariate data and techniques from
their univariate prototypes
...
As
such their applications in the context of research studies have been accelerated only with the advent
of high-speed electronic computers since the 1950s
...
What do you mean by multivariate techniques? Explain their significance in context of research studies
...
Write a brief essay on “Factor analysis” particularly pointing out its merits and limitations
...
Name the important multivariate techniques and explain the important characteristic of each one of such
techniques
...
Enumerate the steps involved in Thurstone’s centroid method of factor analysis
...
Write a short note on ‘rotation’ in context of factor analysis
...
Work out the first two centroid factors as well as first two principal components from the following
correlation matrix, R, relating to six variables:
1
Variables
1
...
55
1
...
43
...
00
...
25
...
00
5
6
...
31
...
43
1
...
36
...
33
...
44
1
...
71
...
70
...
65
...
40
...
37
–
...
43
–
...
71
...
70
...
64
...
39
...
32
–
...
45
–
...
Compute communality for each of the variable based on first two centroid factors in question six above
and state what does it indicate
...
Compute the proportion of total variance explained by the two factors worked out in question six above
by the principal components method
...
‘
9
...
10
...
Appendix: Summary Chart
Appendix
Summary Chart:
Showing the Appropriateness of a
Particular Multivariate Technique
Techniques of
multivariate analysis
Number of
Explanatory
variables
Criterion variables
1
...
Multiple discriminant
analysis
many
3
...
Factor analysis
many
6
...
Multidimensional
scaling (MDS)
many
8
...
Canonical
correlation analysis
*1
one
Non-metric
Any one of the two
...
A
many*1
many*2
A
metric
many
A
metric
Non-metric
A
Research Methodology
14
Interpretation and Report Writing
After collecting and analyzing the data, the researcher has to accomplish the task of drawing inferences
followed by report writing
...
It is only through interpretation
that the researcher can expose relations and processes that underlie his findings
...
But in case the researcher had no hypothesis to start with, he would try to explain his
findings on the basis of some theory
...
All this analytical information and consequential inference(s) may well be communicated,
preferably through research report, to the consumers of research results who may be either an
individual or a group of individuals or some public/private organisation
...
In fact, it is a search for broader meaning of research findings
...
, (i) the effort to establish continuity in research through
linking the results of a given study with those of another, and (ii) the establishment of some explanatory
concepts
...
Interpretation also extends beyond the data of the study to include the
results of other research, theory and hypotheses
...
WHY INTERPRETATION?
Interpretation is essential for the simple reason that the usefulness and utility of research findings lie
in proper interpretation
...
William Emory, Business Research Methods, p
...
(i) It is through interpretation that the researcher can well understand the abstract principle
that works beneath his findings
...
Fresh inquiries can test these predictions later on
...
(ii) Interpretation leads to the establishment of explanatory concepts that can serve as a guide
for future research studies; it opens new avenues of intellectual adventure and stimulates
the quest for more knowledge
...
(iv) The interpretation of the findings of an exploratory research study often results in hypotheses
for experimental research and as such interpretation is involved in the transition from
exploratory to experimental research
...
TECHNIQUE OF INTERPRETATION
The task of interpretation is not an easy job; rather, it requires great skill and dexterity on the part of the
researcher
...
The researcher
may, at times, seek guidance from experts for accomplishing the task of interpretation
...
In fact, this is the technique of how generalization should be done and concepts be
formulated
...
(iii) It is advisable, before embarking upon final interpretation, to consult someone having insight
into the study and who is frank and honest and will not hesitate to point out omissions and
errors in logical argumentation
...
(iv) The researcher must accomplish the task of interpretation only after considering all relevant
factors affecting the problem to avoid false generalization
...
PRECAUTIONS IN INTERPRETATION
One should always remember that even if the data are properly collected and analysed, wrong
interpretation would lead to inaccurate conclusions
...
The researcher must pay attention to the following points for correct interpretation:
(i) At the outset, the researcher must invariably satisfy himself that (a) the data are appropriate,
trustworthy and adequate for drawing inferences; (b) the data reflect good homogeneity;
and that (c) proper analysis has been done through statistical methods
...
Errors can arise due to false generalization and/or due to wrong
interpretation of statistical measures, such as the application of findings beyond the range
of observations, identification of correlation with causation and the like
...
In fact, the positive test results accepting the hypothesis must be
interpreted as “being in accord” with the hypothesis, rather than as “confirming the validity
of the hypothesis”
...
He should be well equipped with and must know the
correct use of statistical measures for drawing inferences concerning his study
...
As such he must take the task of interpretation
as a special aspect of analysis and accordingly must take all those precautions that one
usually observes while going through the process of analysis viz
...
(iv) He must never lose sight of the fact that his task is not only to make sensitive observations
of relevant occurrences, but also to identify and disengage the factors that are initially
hidden to the eye
...
Broad
generalisation should be avoided as most research is not amenable to it because the coverage
may be restricted to a particular time, a particular area and particular conditions
...
(v) The researcher must remember that “ideally in the course of a research study, there should
be constant interaction between initial hypothesis, empirical observation and theoretical
conceptions
...
”2 He must pay
special attention to this aspect while engaged in the task of interpretation
...
As a matter of fact, even the most
brilliant hypothesis, the best designed and conducted research study, and the most striking
generalizations and findings are of little value unless they are effectively communicated to others
...
Research
results must invariably enter the general store of knowledge
...
Young, Scientific Social Surveys and Research, 4th ed
...
488
...
There are people who do not consider the writing of a report as an integral part of
the research process
...
Writing the report is the last
step in a research study and requires a set of skills somewhat different from those called for in
respect of the earlier stages of research
...
DIFFERENT STEPS IN WRITING REPORT
Research reports are the product of slow, painstaking, accurate inductive work
...
Though all these steps are self explanatory, yet a brief
mention of each one of these will be appropriate for better understanding
...
There are two ways in which to develop a subject: (a) logically and
(b) chronologically
...
Logical treatment often consists
in developing the material from the simplest possible to the most complex structures
...
The directions for doing or
making something usually follow the chronological order
...
They are an aid to the logical organisation
of the material and a reminder of the points to be stressed in the report
...
Such a step is of utmost importance for the researcher now sits to write down
what he has done in the context of his research study
...
Rewriting and polishing of the rough draft: This step happens to be the most difficult part of all
formal writing
...
The careful
revision makes the difference between a mediocre and a good piece of writing
...
The
researcher should also “see whether or not the material, as it is presented, has unity and cohesion;
does the report stand upright and firm and exhibit a definite pattern, like a marble arch? Or does it
resemble an old wall of moldering cement and loose brick
...
He should check the
mechanics of writing—grammar, spelling and usage
...
The bibliography, which is generally appended to the research report, is a list of books
3
4
Elliott S
...
Gatner and Francesco Cordasco, Research and Report Writing, p
...
Ibid
...
50
...
It should contain all those works which
the researcher has consulted
...
Generally, this pattern of bibliography is
considered convenient and satisfactory from the point of view of reader, though it is not the only way
of presenting bibliography
...
Name of author, last name first
...
Title, underlined to indicate italics
...
Place, publisher, and date of publication
...
Number of volumes
...
R
...
Ltd
...
For magazines and newspapers the order may be as under:
1
...
2
...
3
...
4
...
5
...
6
...
Example
Robert V
...
995
...
The only thing important is that,
whatever method one selects, it must remain consistent
...
The final draft should be written in a concise
and objective style and in simple language, avoiding vague expressions such as “it seems”, “there
may be”, and the like
...
Illustrations and examples based on common experiences must be incorporated
in the final draft as they happen to be most effective in communicating the research findings to
others
...
It must be remembered that every report should be an attempt to solve some
intellectual problem and must contribute to the solution of a problem and must add to the knowledge
of both the researcher and the reader
...
For this purpose there is a need for a
proper layout of the report
...
A comprehensive layout of the research report should comprise (A) preliminary pages; (B)
the main text; and (C) the end matter
...
(A) Preliminary Pages
In its preliminary pages the report should carry a title and date, followed by acknowledgements in
the form of ‘Preface’ or ‘Foreword’
...
(B) Main Text
The main text provides the complete outline of the research report along with all details
...
Each main section of the
report should begin on a new page
...
(i) Introduction: The purpose of introduction is to introduce the research project to the readers
...
e
...
A brief
summary of other relevant research may also be stated so that the present study can be seen in that
context
...
The methodology adopted in conducting the study must be fully explained
...
The statistical analysis adopted must also be clearly stated
...
The various
limitations, under which the research project was completed, must also be narrated
...
If the findings happen to be extensive, at this point they should be
put in the summarised form
...
This generally comprises the main body of the report, extending over several chapters
...
All the results should be presented in logical sequence and split into readily identifiable
sections
...
But how one is to decide about what is
relevant is the basic question
...
But ultimately the researcher must
rely on his own judgement in deciding the outline of his report
...
”5
(iv) Implications of the results: Toward the end of the main text, the researcher should again put
down the results of his research clearly and precisely
...
Such implications may have three aspects as stated below:
(a) A statement of the inferences drawn from the present study which may be expected to
apply in similar circumstances
...
(c) The relevant questions that still remain unanswered or new questions raised by the study
along with suggestions for the kind of research that would provide answers for them
...
The conclusion drawn from the study should be clearly
related to the hypotheses that were stated in the introductory section
...
(v) Summary: It has become customary to conclude the research report with a very brief summary,
restating in brief the research problem, the methodology, the major findings and the major conclusions
drawn from the research results
...
Bibliography of sources
consulted should also be given
...
The value of index lies in the fact that it works as a guide
to the reader for the contents in the report
...
448
...
In each individual case, both the length and the
form are largely dictated by the problems at hand
...
Banks, insurance organisations and financial institutions
are generally fond of the short balance-sheet type of tabulation for their annual reports to their
customers and shareholders
...
Chemists report their results in symbols and formulae
...
In the field of education
and psychology, the favourite form is the report on the results of experimentation accompanied by
the detailed statistical tabulations
...
News items in the daily papers are also forms of report writing
...
In such reports the first paragraph usually contains the important information in detail and the
succeeding paragraphs contain material which is progressively less and less important
...
Such reviews also happen to be a kind of short report
...
Such reports are usually considered as important research products
...
D
...
The above narration throws light on the fact that the results of a research investigation can be
presented in a number of ways viz
...
Which method(s) of presentation to be used in a particular
study depends on the circumstances under which the study arose and the nature of the results
...
A popular report is used if the research results have policy
implications
...
A general outline of a technical report can be as follows:
1
...
2
...
3
...
For instance, in
sampling studies we should give details of sample design viz
...
4
...
If secondary
data are used, their suitability to the problem at hand should be fully assessed
...
5
...
This, in
fact, happens to be the main body of the report usually extending over several chapters
...
Conclusions: A detailed summary of the findings and the policy implications drawn from the
results should be explained
...
Bibliography: A bibliography of the various sources consulted should be prepared and attached
...
Technical appendices: Appendices should be given for all technical matters relating to the questionnaire,
mathematical derivations, elaboration on a particular technique of analysis and the like
...
Index: An index must be prepared and invariably be given at the end of the report
...
This, in other words,
means that the presentation may vary in different reports; even the different sections outlined above
will not always be the same, nor will all these sections appear in any particular report
...
(B) Popular Report
The popular report is one which gives emphasis on simplicity and attractiveness
...
An attractive layout, along with large print, many subheadings and
even an occasional cartoon, is another characteristic feature of the popular report
...
We give below a general outline of a popular report
...
The findings and their implications: Emphasis in the report is given on the findings of most
practical interest and on the implications of these findings
...
Recommendations for action: Recommendations for action on the basis of the findings of the
study are made in this section of the report
...
Objective of the study: A general review of how the problem arose is presented along with the
specific objectives of the project under study
...
Methods employed: A brief and non-technical description of the methods and techniques used,
including a short review of the data on which the study is based, is given in this part of the report
...
Results: This section constitutes the main body of the report wherein the results of the study are
presented in clear and non-technical terms with liberal use of all sorts of illustrations such as charts,
diagrams and the like
...
Technical appendices: More detailed information on methods used, forms, etc
...
But the appendices are often not detailed if the report is meant entirely for the
general public
...
The only
important thing about such a report is that it gives emphasis on simplicity and policy implications from
the operational point of view, avoiding the technical details of all sorts to the extent possible
...
The merit of this approach lies in the
fact that it provides an opportunity for give-and-take decisions which generally lead to a better
understanding of the findings and their implications
...
In order to
overcome this difficulty, a written report may be circulated before the oral presentation and referred
to frequently during the discussion
...
Use of slides, wall charts and blackboards is quite helpful in contributing to clarity and
in reducing the boredom, if any
...
This very often happens in academic institutions where the researcher discusses
his research findings and policy implications with others either in a seminar or in a group discussion
...
But in the practical field, and
with problems having policy implications, the technique followed is that of writing a popular report
...
MECHANICS OF WRITING A RESEARCH REPORT
There are very definite and set rules which should be followed in the actual preparation of the
research report or paper
...
The criteria of format should be decided as soon as the materials for
the research paper have been assembled
...
Size and physical design: The manuscript should be written on unruled paper 8½″ × 11″ in
size
...
A margin of at least
one and one-half inches should be allowed on the left-hand side and of at least half an inch on the right-hand side
of the paper
...
The paper should be neat and
legible
...
2
...
3
...
4
...
But if a quotation is of a considerable length (more than four
or five type written lines) then it should be single-spaced and indented at least half an inch to the right
of the normal text margin
...
The footnotes: Regarding footnotes one should keep in view the following:
(a) The footnotes serve two purposes viz
...
In other words, footnotes are meant for cross references,
citation of authorities and sources, acknowledgement and elucidation or explanation of a
point of view
...
The modern tendency is to make the minimum use of footnotes
for scholarship does not need to be displayed
...
Footnotes are customarily separated from the textual
material by a space of half an inch and a line about one and a half inches long
...
The number should be put slightly above the line, say at the end of a quotation
...
Thus, consecutive numbers must be used to correlate the reference in the
text with its corresponding note at the bottom of the page, except in case of statistical
tables and other numerical material, where symbols such as the asterisk (*) or the like
may be used to prevent confusion
...
6
...
Such
documentary footnotes follow a general sequence
...
Author’s name in normal order (and not beginning with the last name as in a bibliography)
followed by a comma;
2
...
Place and date of publication;
4
...
Example
John Gassner, Masters of the Drama, New York: Dover Publications, Inc
...
315
...
1. Author’s name in the normal order;
2. Title of work, underlined to indicate italics;
3. Place and date of publication;
4. Number of volume;
5. Pagination references (the page number)
...
In such cases the order is illustrated as under:
Example 1
“Salamanca,” Encyclopaedia Britannica, 14th Edition
...
But if there should be a detailed reference to a long encyclopedia article, volume and
pagination reference may be found necessary
...
2
...
4
...
6
...
(v) Regarding anthologies and collections reference
Quotations from anthologies or collections of literary works must be acknowledged not
only by author, but also by the name of the collector
...
Original author and title;
2
...
Second author and work
...
F
...
16, quoted in History of the Pacific Ocean area, by R
...
Abel,
p
...
(vii) Case of multiple authorship
If there are more than two authors or editors, then in the documentation the name of only the first
is given and the multiple authorship is indicated by “et al
...
Subsequent references to the same work need not be so detailed as stated above
...
A single page should be referred to as p
...
If there are several pages referred to at a stretch, the practice is often to give the page numbers,
for example, pp
...
Roman numerals are generally used to indicate the number of the
volume of a book
...
cit
...
cit
...
Op
...
or Loc
...
after the
writer’s name would suggest that the reference is to work by the writer which has been cited in
detail in an earlier footnote but intervened by some other references
...
Punctuation and abbreviations in footnotes: The first item after the number in the footnote is
the author’s name, given in the normal signature order
...
After the
comma, the title of the book is given: the article (such as “A”, “An”, “The” etc
...
The title is followed by a comma
...
This entry is followed by a comma
...
for London, N
...
for New York, N
...
for New Delhi and so on
...
Then the name of the publisher is mentioned and this entry is closed
by a comma
...
If the date
appears in the copyright notice on the reverse side of the title page or elsewhere in the volume, the
comma should be omitted and the date enclosed in square brackets [c 1978], [1978]
...
Then follow the volume and page references and are separated by a comma
if both are given
...
But one should remember
that the documentation regarding acknowledgements from magazine articles and periodical literature
follow a different form as stated earlier while explaining the entries in the bibliography
...
The following is a partial list of the most common abbreviations frequently
used in report-writing (the researcher should learn both to recognise them and to
use them):
anon
...
,
before
art
...
,
augmented
bk
...
,
bulletin
cf
...
,
chapter
col
...
,
dissertation
ed
...
ed
...
,
edition cited
e
...
,
exempli gratia: for example
eng
...
al
...
,
ex
...
, ff
...
,
fn
...
, ibidem:
id
...
, illus
...
Intro
...
,
l, or ll,
loc
...
,
loco citato:
MS
...
,
N
...
, nota bene:
n
...
,
n
...
,
no pub
...
,
o
...
,
op
...
or pp
...
,
tr
...
,
vid or vide:
viz
...
or vol(s)
...
, versus:
et sequens: and the following
example
and the following
figure(s)
footnote
in the same place (when two or more successive footnotes refer to the
same work, it is not necessary to repeat complete reference for the second
footnote
...
may be used
...
the same
illustrated, illustration(s)
introduction
line(s)
in the place cited; used as op
...
, (when new reference
is made to the same pagination as cited in the previous note)
Manuscript or Manuscripts
note well
no date
no place
no publisher
number(s)
out of print
in the work cited (If reference has been made to a work
and new reference is to be made, ibid
...
cit
...
The
name of the author must precede
...
Use of statistics, charts and graphs: A judicious use of statistics in research reports is often
considered a virtue for it contributes a great deal towards the clarification and simplification of the
material and research results
...
Statistics are usually presented in the form of tables, charts, bars and line-graphs
and pictograms
...
It should be
suitable and appropriate looking to the problem at hand
...
9
...
For the purpose, the researcher should put to himself questions
like: Are the sentences written in the report clear? Are they grammatically correct? Do they say
what is meant? Do the various points incorporated in the report fit together logically? “Having at
least one colleague read the report just before the final revision is extremely helpful
...
A friendly critic, by pointing out passages
that seem unclear or illogical, and perhaps suggesting ways of remedying the difficulties, can be an
invaluable aid in achieving the goal of adequate communication
...
Bibliography: Bibliography should be prepared and appended to the research report as discussed
earlier
...
Preparation of the index: At the end of the report, an index should invariably be given, the
value of which lies in the fact that it acts as a good guide to the reader
...
The former gives the names of the subject-topics or concepts
along with the number of pages on which they have appeared or been discussed in the report, whereas the
latter gives similar information regarding the names of authors
...
Some people prefer to prepare only one index common for names of authors,
subject-topics, concepts and the like
...
A
good research report is one which does this task efficiently and effectively
...
While determining the length of the report (since research reports vary greatly in length),
one should keep in view the fact that it should be long enough to cover the subject but short
enough to maintain interest
...
2
...
3
...
The
report should be able to convey the matter as simply as possible
...
4
...
For this purpose, charts,
6
Claire Selltiz and others, Research Methods in Social Relations rev
...
Ltd
...
454
...
6
...
8
...
10
...
12
...
14
...
graphs and the statistical tables may be used for the various results in the main report in
addition to the summary of important findings
...
The reports should be free from grammatical mistakes and must be prepared strictly in
accordance with the techniques of composition of report-writing such as the use of quotations,
footnotes, documentation, proper punctuation and use of abbreviations in footnotes and the
like
...
It must reflect a structure
wherein the different pieces of analysis relating to the research problem fit well
...
It must contribute to the solution of a problem and must add to
the store of knowledge
...
It is usually considered desirable if the report makes a forecast of the
probable future of the subject concerned and indicates the kinds of research that still need to
be done in that particular field
...
Bibliography of sources consulted is a must for a good report and must necessarily be
given
...
The report must be attractive in appearance, neat and clean, whether typed or printed
...
Objective of the study, the nature of the problem, the methods employed and the analysis
techniques adopted must all be clearly stated in the beginning of the report in the form of
introduction
...
Questions
1
...
3
...
Write a brief note on the ‘task of interpretation’ in the context of research methodology
...
Why so?
Describe the precautions that the researcher should take while interpreting his findings
...
Elucidate the
given statement explaining the technique of interpretation
...
“It is only through interpretation the researcher can expose the relations and processes that underlie his
findings”
...
6
...
7
...
8
...
9
...
10
...
Is only oral presentation
sufficient? If not, why?
11
...
(b) What are the different forms in which a research work may be reported
...
(M
...
Exam
...
of Rajasthan)
12
...
requires something equally important:
an organisation or synthesis which provides the essential structure into which the pieces of analysis fit
...
(M
...
Exam
...
of Rajasthan)
13
...
14
...
Discuss
...
The development of electronic devices, especially computers,
has given added impetus to this activity
...
The computer is certainly one of the most versatile and ingenious developments of the modern
technological age
...
No longer are they just
big boxes with flashing lights whose sole purpose is to do arithmetic at high speed but they make use
of studies in philosophy, psychology, mathematics and linguistics to produce output that mimics the
human mind
...
Indeed, the advancement
in computers is astonishing
...
Electronic computers have by now become indispensable to research
students in the physical and behavioural sciences as well as in the humanities
...
A basic
understanding of the manner in which a computer works helps a person to appreciate the utility of
this powerful tool
...
answers questions like: What is a computer? How does it function? How does one
communicate with it? How does it help in analysing data?
THE COMPUTER AND COMPUTER TECHNOLOGY
A computer, as the name indicates, is nothing but a device that computes
...
But what has made this term conspicuous today and, what we normally imply when we
speak of computers, are electronically operating machines which are used to carry out computations
...
The computer can be a digital computer or it can be an analogue computer
...
Digital
computer handles information as strings of binary numbers i
...
, zeros and ones, with the help of
a counting process, whereas an analogue computer converts varying quantities such as temperature and pressure
into corresponding electrical voltages and then performs specified functions on the given signals
...
Most computers are digital, so much so that the word computer is generally accepted as being
synonymous with the term ‘digital computer’
...
The
present day microcomputer is far more powerful and costs very little, compared to the world’s first
electronic computer viz
...
The microcomputer works many times faster, is thousands of times more reliable and has a
large memory
...
* Today we
have the fourth generation computer in service and efforts are being made to develop the fifth
generation computer, which is expected to be ready by 1990
...
This machine did not have any facility for storing programs and the instructions had to be fed into it
by a readjustment of switches and wires
...
The transistor replaced the valve in all
electronic devices and made them much smaller and more reliable
...
The third generation computer followed the invention of integrated
circuit (IC) in 1959
...
The fourth generation computers owe their birth to the
advent of microprocessor—the king of chips—in 1972
...
This device has enabled the
development of microcomputers, personal computers, portable computers and the like
...
It is said that the fifth generation computer will be 50 times or so faster
than the present day superfast machines
...
Regarding output devices, the teleprinter has been substituted by various types of low-cost high
speed printers
...
For storing data, the magnetic tapes and discs
* (i) First generation computers were those produced between 1945–60, such as IBM 650, IBM 701
...
(iii) Third generation computers were those produced between 1965–70, such as IBM System 360, 370
...
The Computer: Its Role in Research
are being replaced by devices such as bubble memories and optical video discs
...
THE COMPUTER SYSTEM
In general, all computer systems can be described as containing some kind of input devices, the CPU
and some kind of output devices
...
1 depicts the components of a computer system and their
inter-relationship:
Central Processing Unit
(CPU)
Control Unit
(Interprets the computer
programme
...
15
...
The input
devices translate the characters into binary, understandable by the CPU, and the output devices
retranslate them back into the familiar character i
...
, in a human readable form
...
e
...
So far as
CPU is concerned, it has three segments viz
...
When a computer program or data is input into the CPU, it is in fact input into the internal
storage of the CPU
...
Its
function extends to the input and output devices as well and does not just remain confined to the
sequence of operation within the CPU
...
In terms of overall sequence of events, a computer program is input into the internal storage and
then transmitted to the control unit, where it becomes the basis for overall sequencing and control of
computer system operations
...
After the designated calculations and comparisons have been completed,
output is obtained from the internal storage of the CPU
...
) of computer are collectively called hardware
...
(c) Firmware: It is that software which is incorporated by the manufacturer into the electronic
circuitry of computer
...
It is also
known as operating software and is normally supplied by the computer manufacturer
...
This software is
either written by the user himself or supplied by ‘software houses’, the companies whose
business is to produce and sell software
...
Silicon is the most commonly used semiconductor—a material which is neither
a good conductor of electricity nor a bad one
...
(g) Memory chips: These ICs form the secondary memory or storage of the computer
...
(h) Two-state devices: The transistors on an IC Chip take only two states—they are either on
or off, conducting or non-conducting
...
These two binary digits are called bits
...
A chip is called 8-bit, 16-bit, 32-bit and so on, depending on
the number of bits contained in its standard word
...
This has led to many scientific projects which were
previously impossible
...
If two million calculations have to be performed, it will perform
the two millionth with exactly the same accuracy and speed as the first
...
Hence, it is impossible to store all types of information inside the computer
records
...
(iv) Accuracy: The computer’s accuracy is consistently high
...
Almost without exception, the errors in computing are due to human rather
than to technological weaknesses, i
...
, due to imprecise thinking by the programmer or due
to inaccurate data or due to poorly designed systems
...
The CPU follows these instructions until it meets a last instruction which says ‘stop program
execution’
...
(The binary system is described in further detail under a separate heading in this
chapter
...
THE BINARY NUMBER SYSTEM
An arithmetic concept which uses two levels, instead of ten, but operates on the same logic is called
the binary system
...
The base of this number system is 2
...
Binary numbers can be constructed just like decimal numbers except
that the base is 2 instead of 10
...
On the other hand, in the binary system, the factor being 2 instead of 10,
the first place is still for 1s but the 2nd place is for 2s, the 3rd for 4s, the 4th for 8s and so on
...
The method works as follows:
Start by dividing the given decimal integer by 2
...
Next, divide q1 by 2 and let R2 and q2 be the remainder and quotient respectively
...
The equivalent binary number can be formed
by arranging the remainders as
Rk Rk –1
...
Illustration 1
Find the binary equivalents of 26 and 45
...
1
Number to be
divided by 2    Quotient    Remainder
     45            22           1
     22            11           0
     11             5           1
      5             2           1
      2             1           0
      1             0           1
Thus, we have $(45)_{10} = (101101)_2$, i.e., the binary equivalent of 45 is 101101
...
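The repeated-division method can be sketched in a few lines of Python. This is only an illustration of the procedure described above, not part of the original text; the function name `decimal_to_binary` is an assumption made for the example.

```python
def decimal_to_binary(n: int) -> str:
    """Convert a non-negative decimal integer to binary by repeated division by 2."""
    if n < 0:
        raise ValueError("this sketch handles non-negative integers only")
    if n == 0:
        return "0"
    remainders = []
    while n > 0:
        # The quotient becomes the next number to be divided;
        # the remainder is the next binary digit (right to left).
        n, remainder = divmod(n, 2)
        remainders.append(str(remainder))
    # The last remainder obtained is the leftmost bit of the result.
    return "".join(reversed(remainders))


print(decimal_to_binary(45))  # 101101
print(decimal_to_binary(26))  # 11010
```

Reading the remainders from the last one back to the first gives the same arrangement as the remainder column of Illustration 1 read from bottom to top.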
For example,
$26 = 16 + 8 + 0 + 2 + 0 = 1 \times 2^4 + 1 \times 2^3 + 0 \times 2^2 + 1 \times 2^1 + 0 \times 2^0$
Then collect the multipliers of the powers to form the binary equivalent
...
This alternative method is convenient for
converting small decimal integers by hand
...
This can be described as follows:
Begin the conversion process by doubling the leftmost bit of the given number and add to it the bit
at its right
...
Proceed in this
manner till all the bits have been considered
...
Illustration 2
Convert 1101 to its decimal equivalent using the double-dabble method
...
Doubling the leftmost bit we get 2
...
Adding to it the bit on its right we get 2 + 1 = 3
3
...
Adding to it the next bit we get 6 + 0 = 6
5
...
Finally adding the last bit we get 12 + 1 = 13
Thus, we have $(1101)_2 = (13)_{10}$
In other words, the decimal equivalent of binary 1101 is 13
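A minimal Python sketch of the double-dabble procedure follows. The function name `binary_to_decimal` is hypothetical. The running total starts at zero, so the first pass simply picks up the leftmost bit; from then on each pass doubles the total and adds the next bit, as in Illustration 2.

```python
def binary_to_decimal(bits: str) -> int:
    """Convert a binary string to decimal by the double-dabble method."""
    if not bits or any(b not in "01" for b in bits):
        raise ValueError("expected a non-empty string of 0s and 1s")
    total = 0
    for bit in bits:
        # Double what has been accumulated so far, then add the next bit.
        total = total * 2 + int(bit)
    return total


print(binary_to_decimal("1101"))  # 13
```

For 1101 the running totals are 1, 3, 6 and 13, matching the intermediate sums 2 + 1 = 3, 6 + 0 = 6 and 12 + 1 = 13 in the illustration.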
...
Those interested may read any binary system book
...
The binary addition rules are as shown below:
   0      0      1      1
  +0     +1     +0     +1
  --     --     --     --
   0      1      1     10
Note that sum of 1 and 1 is written as ‘10’ (a zero sum with a 1 carry) which is the equivalent of
decimal digit ‘2’
...
Illustration 3
Add 1010 and 101
...
Solution:
          Binary                Decimal
   Carry   111          Carry  11
          10111000             184
        +   111011          +   59
          --------             ---
          11110011             243
In Illustration 4, we find a new situation (1 + 1 + 1) brought about by the 1 carry
...
We add the digits in turn
...
The third 1 is now added to this result to obtain 11 (a 1 sum
with a 1 carry)
...
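The column-by-column addition with carries can be sketched as follows. This is an illustrative Python routine, not taken from the text, and the name `add_binary` is an assumption. It applies the four addition rules plus the 1 + 1 + 1 case that arises when a carry is present.

```python
def add_binary(a: str, b: str) -> str:
    """Add two binary strings digit by digit, propagating the carry."""
    width = max(len(a), len(b))
    a, b = a.zfill(width), b.zfill(width)  # pad the shorter number with leading zeros
    carry = 0
    digits = []
    # Work from the rightmost column to the leftmost, as in manual addition.
    for x, y in zip(reversed(a), reversed(b)):
        column_sum = int(x) + int(y) + carry  # can be 0, 1, 2 or 3
        digits.append(str(column_sum % 2))    # the digit written in this column
        carry = column_sum // 2               # the carry taken to the next column
    if carry:
        digits.append("1")
    return "".join(reversed(digits))


print(add_binary("1010", "101"))         # 1111
print(add_binary("10111000", "111011"))  # 11110011
```

A column sum of 3 corresponds to the 1 + 1 + 1 situation: the digit written is 1 and the carry is 1.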
×, –, +) by a form of addition
...
g
...
This idea of repeated
addition may seem to be a longer way of doing things, but remember that the computer is well suited to
carry out the operation at great speed
...
(b) Complementary subtraction: Three steps are involved in this method:
Step 1
...
Add this to number from which you are taking away;
Step 3
...
The following two examples illustrate this method
...
Solution:
Decimal
number
Binary number
25
Subtract 10
Illustration 6
Subtract 72 from 14
...
Solution:
Decimal
number
14
Subtract 72
Binary
number
Step 1
...
Step 3
...
Its decimal
equivalent is –58
...
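Since parts of the three steps are not reproduced above, the following Python sketch rests on an assumption: that the method is the usual ones'-complement subtraction with an end-around carry, carried out on a fixed word length (8 bits here). The function name and the `width` parameter are hypothetical.

```python
def subtract_by_complement(minuend: int, subtrahend: int, width: int = 8) -> int:
    """Subtract using ones' complement and addition only (assumed method)."""
    mask = (1 << width) - 1  # e.g. 11111111 for an 8-bit word
    if not (0 <= minuend <= mask and 0 <= subtrahend <= mask):
        raise ValueError("operands must fit in the chosen word length")
    # Step 1: ones' complement of the number being subtracted.
    complement = ~subtrahend & mask
    # Step 2: add it to the number from which we are taking away.
    total = minuend + complement
    # Step 3: a carry out of the word is added back to give the result;
    # without a carry, the sum is recomplemented and given a negative sign.
    if total > mask:
        return (total & mask) + 1
    return -(~total & mask)


print(subtract_by_complement(25, 10))  # 15
print(subtract_by_complement(14, 72))  # -58
```

The second call reproduces the case in Illustration 6, where no carry is produced and the result is negative.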
For example, 45 ÷ 9 may be thought of as 45 – 9 = 36 – 9 = 27 – 9 = 18 – 9
= 9 – 9 = 0 (minus 9 five times)
...
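Division as repeated subtraction can be sketched like this. The example is illustrative only and the function name is an assumption; it counts how many times the divisor can be taken away before the remainder falls below it.

```python
def divide_by_subtraction(dividend: int, divisor: int) -> tuple[int, int]:
    """Return (quotient, remainder) using repeated subtraction only."""
    if divisor <= 0 or dividend < 0:
        raise ValueError("this sketch expects dividend >= 0 and divisor > 0")
    count = 0
    while dividend >= divisor:
        dividend -= divisor  # take the divisor away once more
        count += 1
    return count, dividend


print(divide_by_subtraction(45, 9))  # (5, 0)
```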
The binary
fraction can be converted into decimal fraction as shown below:
0
...
5 + 0
...
125
= 0
...
The whole number part of the first multiplication
gives the first 1 or 0 of the binary fraction;
(ii) The fractional part of the result is carried over and multiplied by 2;
(iii) The whole number part of the result gives the second 1 or 0 and so on
...
625 into its equivalent binary fraction
...
625 × 2 = 1
...
250 × 2 = 0
...
500 × 2 = 1
...
101 is the required binary equivalent
...
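The multiply-by-2 rule for fractions can be sketched in Python as below. The function name and the `max_bits` cut-off are assumptions added for the example; the cut-off is needed because many decimal fractions have no finite binary expansion.

```python
def fraction_to_binary(fraction: float, max_bits: int = 16) -> str:
    """Convert a decimal fraction in [0, 1) to a binary fraction string."""
    if not 0 <= fraction < 1:
        raise ValueError("expected a value in the range [0, 1)")
    bits = []
    while fraction > 0 and len(bits) < max_bits:
        fraction *= 2
        whole = int(fraction)    # the whole number part gives the next bit
        bits.append(str(whole))
        fraction -= whole        # the fractional part is carried over
    if not bits:
        return "0.0"
    return "0." + "".join(bits)


print(fraction_to_binary(0.625))  # 0.101
print(fraction_to_binary(0.375))  # 0.011
```

A mixed number such as 3.375 can then be handled by converting the whole part and the fractional part separately and joining the two results.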
375 into its equivalent binary number
...
First (3)10 = (11)2 as shown earlier
...
375)10 = (0
...
Hence, the required binary equivalent is 11
...
From the above description we see how computer arithmetic is based on addition
...
The
number of individual steps may indeed be increased because all computer arithmetic is reduced to
addition, but the computer can carry out binary additions at such great speed that this is not a
disadvantage
...
Educational, commercial, industrial,
administrative, transport, medical, social, financial and several other organisations are increasingly
depending upon the help of computers to some degree or the other
...
“The motorists,
the air passenger, hospital patients and those working in large departmental stores, are some of the
people for whom computers process information
...
Many people who are working in major organisations and receive
monthly salary have their salary slips prepared by computers
...
1
“Computers can be used by just about anyone: doctors, policemen, pilots, scientists, engineers and
recently even house-wives
...
Without computers we might not have achieved a number of things
...
We might not have
built 100 storied buildings or high speed trains and planes
...
2
Applications in      Some of the various uses
Education            Provide a large data bank of information;
                     Aid to time-tabling;
                     Carry out lengthy or complex calculations;
                     Assist teaching and learning processes;
                     Provide students’ profiles;
                     Assist in career guidance.
Subramanian, “Introduction to Computers”, Tata McGraw-Hill Publishing Company Ltd
...
192
...
, p
...
Applications in
2
...
Banks and Financial
institutions
4
...
Industry
6
...
Scientific Research
8
...
(ii) Handle payroll of personnel, office accounts, invoicing, records
keeping, sales analysis, stock control and financial forecasting
...
(i) Planning of new enterprises;
(ii) Finding the best solution from several options;
(iii) Helpful in inventory management, sales forecasting and
production planning;
(iv) Useful in scheduling of projects
...
(i) Helpful in electronic mail;
(ii) Useful in aviation: Training of pilots, seat reservations, provide
information to pilots about weather conditions;
(iii) Facilitate routine jobs such as crew schedules, time-tables,
maintenance schedules, safety systems, etc
...
(i) Model processing;
(ii) Performing computations;
(iii) Research and data analysis
...
;
(ii) Can be used as an educational aid;
(iii) Home management is facilitated
...
Computers are ideally suited for data analysis concerning large
research projects
...
In all these operations,
computers are of great help
...
Researchers in economics and other social sciences have found, by now, electronic computers
to constitute an indispensable part of their research equipment
...
Computation of means, standard deviations, correlation
coefficients, ‘t’ tests, analysis of variance, analysis of covariance, multiple regression, factor analysis
and various nonparametric analyses are just a few of the programs and subprograms that are available
at almost all computer centres
...
are also available in the market
...
The only work a researcher has to do is to feed in the data
he/she gathered after loading the operating system and particular software package on the computer
...
Techniques involving trial and error process are quite frequently employed in research methodology
...
The computer is best suited for such
techniques, thus reducing the drudgery of researchers on the one hand and producing the final result
rapidly on the other
...
different scenarios are made available to researchers by computers in no
time which otherwise might have taken days or even months
...
Thus, computers do facilitate the research work
...
Moreover, the results obtained are generally correct and reliable
...
Hence, researchers should be given computer education and be trained accordingly so that they can
use computers for their research work
...
A brief mention about each of the above steps is appropriate and can be stated as under:
First of all, the researcher must pay attention to data organisation and coding prior to the input
stage of data analysis
...
For this purpose the data must be coded
...
For instance, regarding sex, we may give number 1 for male
and 2 for female; regarding occupation, numbers 1, 2, and 3 may represent Farmer, Service and
Professional respectively
...
For instance, I
...
Level with marks 120 and above may be given number 1, 90–119 number 2, 60–89 number 3, 30–59
number 4 and 29 and below number 5
...
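The coding scheme in these examples can be sketched in Python as follows. The dictionary and function names are hypothetical; the code numbers and the class limits for marks are the ones given in the text.

```python
# Numeric codes for categorical answers, as in the examples above.
SEX_CODES = {"male": 1, "female": 2}
OCCUPATION_CODES = {"farmer": 1, "service": 2, "professional": 3}


def code_marks(marks: int) -> int:
    """Map a marks value to its class code (1 = 120 and above ... 5 = 29 and below)."""
    if marks >= 120:
        return 1
    if marks >= 90:
        return 2
    if marks >= 60:
        return 3
    if marks >= 30:
        return 4
    return 5


def code_record(sex: str, occupation: str, marks: int) -> tuple[int, int, int]:
    """Turn one respondent's raw answers into a tuple of numeric codes."""
    return (
        SEX_CODES[sex.lower()],
        OCCUPATION_CODES[occupation.lower()],
        code_marks(marks),
    )


print(code_record("Female", "Service", 95))  # (2, 2, 2)
```

Keeping the code tables in one place makes it easier to document the scheme and to decode the results after analysis.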
4000 and above, Rs
...
2000–2999 and below Rs
...
The coded data are to be put in coding forms (most systems
call for a maximum of 80 columns per line in such forms) at the appropriate space meant for each
variable
...
If more than 80 spaces are required for each
subject, then two or more lines will need to be assigned
...
Remaining columns are used for variables
...
Once the data is coded, it is ready to be stored in the computer
...
After this, the researcher must decide the appropriate statistical measure(s) he will use
to analyse the data
...
Most researchers
prefer one of the canned programs easily available but others may manage to develop it with the help
of some specialised agency
...
The above description indicates clearly the usefulness of computers to researchers in data analysis
...
The developments now taking place in computer technology will further enhance and facilitate the
use of computers for researchers
...
In spite of all this sophistication we should not forget that basically computers are machines that
only compute; they do not think
...
As such, researchers should be fully aware of the following limitations of computer-based
analysis:
1
...
All these require time, effort and money
...
2
...
3
...
If
poor data or faulty programs are introduced into the computer, the data analysis would not
be worthwhile
...
Questions
1
...
2
...
3
...
4
...
5
...
Do you agree? Answer pointing out the various
characteristics of computers
...
Write a note on “Computers and Researchers”
...
“In spite of the sophistication achieved in computer technology, one should not forget that basically
computers are machines that only compute, they do not think”
...
8
...
State the decimal equivalent of the sum you arrive at
...
Explain the method of complementary subtraction
...
10
...
110
(b) 0
...
210
(b) 0
...
Convert 842 to binary and 10010101001 to decimal
...
What do you understand by storage in a computer and how is that related to the generations?
Appendix
(Selected Statistical Tables)
Table 1: Area Under Normal Curve
1
An entry in the table is the proportion under the
entire curve which is between z = 0 and a positive
value of z
...
0
2
Areas of a standard normal distribution
z
...
01
...
03
...
05
...
07
...
09
...
1
...
3
...
5
...
0398
...
1179
...
1915
...
0438
...
1217
...
1950
...
0478
...
1255
...
1985
...
0517
...
1293
...
2019
...
0557
...
1331
...
2054
...
0596
...
1368
...
2088
...
0636
...
1406
...
2123
...
0675
...
1443
...
2157
...
0714
...
1480
...
2190
...
0753
...
1517
...
2224
...
7
...
9
1
...
2257
...
2881
...
3413
...
2611
...
3186
...
2324
...
2939
...
3461
...
2673
...
3238
...
2389
...
2995
...
3508
...
2734
...
3289
...
2454
...
3051
...
3554
...
2794
...
3340
...
2517
...
3106
...
3599
...
2852
...
3389
...
1
1
...
3
1
...
5
...
3849
...
4192
...
3665
...
4049
...
4345
...
3888
...
4222
...
3708
...
4082
...
4370
...
3925
...
4251
...
3749
...
4115
...
4394
...
3962
...
4279
...
3790
...
4147
...
4418
...
3997
...
4306
...
3830
...
4177
...
4441
1
...
7
1
...
9
2
...
4452
...
4641
...
4772
...
4564
...
4719
...
4474
...
4656
...
4783
...
4582
...
4732
...
4495
...
4671
...
4793
...
4599
...
4744
...
4515
...
4686
...
4803
...
4616
...
4756
...
4535
...
4699
...
4812
...
4633
...
4767
...
1
2
...
3
2
...
5
...
4861
...
4918
...
4826
...
4896
...
4940
...
4868
...
4922
...
4834
...
4901
...
4943
...
4875
...
4927
...
4842
...
4906
...
4946
...
4881
...
4931
...
4850
...
4911
...
4949
...
4887
...
4934
...
4857
...
4916
...
4952
2
...
7
2
...
9
3
...
4953
...
4974
...
4987
...
4966
...
4982
...
4956
...
4976
...
4987
...
4968
...
4983
...
4959
...
4977
...
4988
...
4970
...
4984
...
4961
...
4979
...
4989
...
4972
...
4985
...
4963
...
4980
...
4990
...
4974
...
4986
...
f
...
20
0
...
05
0
...
01
d
...
Level of significance for one-tailed test
0
...
05
0
...
01
0
...
078
1
...
638
1
...
476
6
...
920
2
...
132
2
...
706
4
...
182
2
...
571
31
...
965
4
...
747
3
...
657
9
...
841
4
...
032
1
2
3
4
5
6
7
8
9
10
1
...
415
1
...
383
1
...
943
1
...
860
1
...
812
2
...
365
2
...
262
2
...
143
2
...
896
2
...
764
3
...
499
3
...
250
3
...
363
1
...
350
1
...
341
1
...
782
1
...
761
1
...
201
2
...
160
2
...
731
2
...
681
2
...
624
2
...
106
3
...
012
2
...
947
11
12
13
14
15
16
17
18
19
20
1
...
333
1
...
328
1
...
746
1
...
734
1
...
725
2
...
110
2
...
093
2
...
583
2
...
552
2
...
528
2
...
898
2
...
861
2
...
323
1
...
319
1
...
316
1
...
717
1
...
711
1
...
080
2
...
069
2
...
060
2
...
508
2
...
492
2
...
831
2
...
807
2
...
787
21
22
23
24
25
26
27
28
29
Infinity
1
...
314
1
...
311
1
...
706
1
...
701
1
...
645
2
...
052
2
...
045
1
...
479
2
...
467
2
...
326
2
...
771
2
...
756
2
...
99
1
2
3
4
5
...
0201
...
297
...
95
...
103
...
711
...
50
...
05
...
01
...
386
2
...
357
4
...
706
4
...
251
7
...
236
3
...
991
7
...
488
11
...
412
7
...
837
11
...
388
6
...
210
11
...
277
15
...
872
1
...
646
2
...
558
1
...
167
2
...
325
3
...
348
6
...
344
8
...
342
10
...
017
13
...
684
15
...
592
14
...
507
16
...
307
15
...
622
18
...
679
21
...
812
18
...
090
21
...
209
11
12
13
14
15
3
...
571
4
...
660
4
...
575
5
...
892
6
...
261
10
...
340
12
...
339
14
...
275
18
...
812
21
...
307
19
...
026
22
...
685
24
...
618
24
...
472
26
...
259
24
...
217
72
...
141
30
...
812
6
...
015
7
...
260
7
...
672
9
...
117
10
...
338
16
...
338
18
...
337
23
...
769
25
...
204
28
...
296
27
...
869
30
...
410
29
...
995
32
...
687
35
...
000
33
...
805
36
...
566
21
22
23
24
25
8
...
542
10
...
856
11
...
591
12
...
091
13
...
611
20
...
337
22
...
337
24
...
615
30
...
007
32
...
382
32
...
924
35
...
415
37
...
343
37
...
968
40
...
566
38
...
289
41
...
980
44
...
198
12
...
565
14
...
953
15
...
151
16
...
708
18
...
336
26
...
336
28
...
336
35
...
741
37
...
087
40
...
885
40
...
337
42
...
773
41
...
140
45
...
693
47
...
642
46
...
278
49
...
892
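Note: For degrees of freedom greater than 30, the quantity $\sqrt{2\chi^2} - \sqrt{2\,\mathrm{d.f.} - 1}$ may be used as a normal variate with unit
variance, i.e., $z_\alpha = \sqrt{2\chi^2} - \sqrt{2\,\mathrm{d.f.} - 1}$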
...
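The normal approximation in the note, z = sqrt(2 × chi-square) − sqrt(2 × d.f. − 1), can be applied with a short Python helper when the degrees of freedom exceed the range of the table. The function name is an assumption, and the sample values are chosen only to make the arithmetic easy to follow.

```python
import math


def chi_square_to_z(chi_square: float, df: int) -> float:
    """Approximate normal deviate for a chi-square value when d.f. > 30."""
    if df <= 30:
        raise ValueError("the approximation is intended for d.f. greater than 30")
    return math.sqrt(2 * chi_square) - math.sqrt(2 * df - 1)


# With chi-square = 50 and d.f. = 41: sqrt(100) - sqrt(81) = 10 - 9
print(chi_square_to_z(50, 41))  # 1.0
```

The resulting z can then be compared with the critical values of the standard normal distribution.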
f
...
4
18
...
13
7
...
61
5
...
59
5
...
12
4
...
84
4
...
67
4
...
54
4
...
45
4
...
38
4
...
32
4
...
28
4
...
24
4
...
21
4
...
18
4
...
08
4
...
92
3
...
5
19
...
55
6
...
79
5
...
74
4
...
26
4
...
98
3
...
80
3
...
68
3
...
59
3
...
52
3
...
47
3
...
42
3
...
38
3
...
35
3
...
33
3
...
23
3
...
07
2
...
7
19
...
28
6
...
41
4
...
35
4
...
86
3
...
59
3
...
41
3
...
29
3
...
20
3
...
13
3
...
07
3
...
03
3
...
99
2
...
96
2
...
93
2
...
84
2
...
68
2
...
6
19
...
12
6
...
19
4
...
12
3
...
63
3
...
36
3
...
18
3
...
06
3
...
96
2
...
90
2
...
84
2
...
80
2
...
76
2
...
73
2
...
70
2
...
61
2
...
45
2
...
2
19
...
01
6
...
05
4
...
97
3
...
48
3
...
20
3
...
02
2
...
90
2
...
81
2
...
74
2
...
68
2
...
64
2
...
60
2
...
57
2
...
54
2
...
45
2
...
29
2
...
0
19
...
94
6
...
95
4
...
87
3
...
37
3
...
09
3
...
92
2
...
79
2
...
70
2
...
63
2
...
57
2
...
53
2
...
49
2
...
46
2
...
43
2
...
34
2
...
17
2
...
9
19
...
85
6
...
82
4
...
73
3
...
23
3
...
95
2
...
77
2
...
64
2
...
55
2
...
48
2
...
42
2
...
38
2
...
34
2
...
31
2
...
28
2
...
18
2
...
02
1
...
v2 = Degrees of freedom for smaller variance
...
9
19
...
74
5
...
68
4
...
57
3
...
07
2
...
79
2
...
60
2
...
48
2
...
38
2
...
31
2
...
25
2
...
20
2
...
16
2
...
13
2
...
10
2
...
00
1
...
83
1
...
1
19
...
64
5
...
53
3
...
41
3
...
90
2
...
61
2
...
42
2
...
29
2
...
19
2
...
11
2
...
05
2
...
01
1
...
96
1
...
93
1
...
90
1
...
79
1
...
61
1
...
3
19
...
53
5
...
36
3
...
23
2
...
71
2
...
40
2
...
21
2
...
07
2
...
96
1
...
88
1
...
81
1
...
76
1
...
71
1
...
67
1
...
64
1
...
51
1
...
25
1
...
5 5403
5625
5764
5859
5982
6106
6235
6366
98
...
00
99
...
25
99
...
33
99
...
42
99
...
50
34
...
82
29
...
71
28
...
91
27
...
05
26
...
13
21
...
00
16
...
98
15
...
21
14
...
37
13
...
45
16
...
27
12
...
39
10
...
67
10
...
89
9
...
02
13
...
92
9
...
15
8
...
47
8
...
72
7
...
88
12
...
55
8
...
85
7
...
19
6
...
47
6
...
65
11
...
65
7
...
01
6
...
37
6
...
67
5
...
86
10
...
02
6
...
42
6
...
80
5
...
11
4
...
31
10
...
56
6
...
99
5
...
39
5
...
71
4
...
91
9
...
21
6
...
87
5
...
07
4
...
40
4
...
60
9
...
93
5
...
41
5
...
82
4
...
16
3
...
36
9
...
70
5
...
21
4
...
62
4
...
96
3
...
17
8
...
51
5
...
04
4
...
46
4
...
80
3
...
00
8
...
36
5
...
89
4
...
32
4
...
67
3
...
87
8
...
23
5
...
77
4
...
20
3
...
55
3
...
75
8
...
11
5
...
67
4
...
10
3
...
46
3
...
65
8
...
01
5
...
58
4
...
01
3
...
37
3
...
57
8
...
93
5
...
50
4
...
94
3
...
30
3
...
49
8
...
85
4
...
43
4
...
87
3
...
23
2
...
42
8
...
78
4
...
37
4
...
81
3
...
17
2
...
36
7
...
72
4
...
31
3
...
76
3
...
12
2
...
31
7
...
66
4
...
26
3
...
71
3
...
07
2
...
26
7
...
61
4
...
22
3
...
67
3
...
03
2
...
21
7
...
57
4
...
18
3
...
63
3
...
99
2
...
17
7
...
53
4
...
14
3
...
59
3
...
96
2
...
10
7
...
49
4
...
11
3
...
56
3
...
93
2
...
13
7
...
45
4
...
07
3
...
53
3
...
90
2
...
06
7
...
42
4
...
04
3
...
50
3
...
87
2
...
03
7
...
39
4
...
02
3
...
47
3
...
84
2
...
01
7
...
18
4
...
83
3
...
29
2
...
66
2
...
80
7
...
98
4
...
65
3
...
12
2
...
50
2
...
60
6
...
79
3
...
48
3
...
96
2
...
34
1
...
38
6
...
60
3
...
32
3
...
80
2
...
18
1
...
00
v1 = Degrees of freedom for greater variance
...
Appendix
381
Table 5: Values for Spearman’s Rank Correlation (rs) for Combined Areas in Both Tails
(n = sample size = 12)
10% of area
10% of area
–
...
3986
n
...
10
...
02
...
002
4
5
6
7
8
9
10
...
7000
...
5357
...
4667
...
8000
...
7714
...
6190
...
5515
—
...
8236
...
7143
...
6364
—
...
8857
...
8095
...
7333
—
—
...
8929
...
8167
...
9643
...
9000
...
4182
...
3791
...
3500
...
4965
...
4593
...
6091
...
5549
...
5179
...
6713
...
6220
...
7455
...
6978
...
6536
...
8182
...
7670
...
3382
...
3148
...
2977
...
4118
...
3895
...
5000
...
4716
...
4451
...
5637
...
5333
...
6324
...
5975
...
5684
...
7083
...
6737
...
2909
...
2767
...
2646
...
3597
...
3435
...
4351
...
4150
...
3977
...
4963
...
4748
...
5545
...
5306
...
5100
...
6318
...
6070
...
2588
...
2480
...
2400
...
3236
...
3113
...
3894
...
3749
...
3620
...
4481
...
4320
...
5002
...
4828
...
4665
...
5757
...
5567
...
100
...
048
...
028
...
134
...
071
...
044
...
143
...
089
...
100
...
029
...
012
...
006
...
057
...
024
...
012
...
071
...
033
...
125
...
131
...
092
...
042
...
097
...
007
...
014
...
005
...
002
*
...
029
...
010
...
004
...
057
...
019
...
008
...
056
...
021
...
095
...
036
...
143
...
129
...
082
...
036
...
077
...
018
...
004
...
001
...
036
...
008
...
003
...
071
...
016
...
005
...
125
...
028
...
009
...
095
...
026
...
009
...
075
...
024
...
111
...
089
...
037
...
074
...
023
...
047
...
085
...
)
5
3
3
3
3
3
3
3
3
Wl
2
3
4
5
6
7
8
2
Table 6: Selected Values of Wilcoxon’s (Unpaired) Distribution
[Ws Min Ws ] or [Max
...
012
...
002
...
001
...
024 *
...
019
...
057
...
129
...
009
...
026
...
063
...
004
...
013
...
032
...
002
...
007
...
017
...
001
...
004
...
010
4
5
6
7
8
28
28
28
28
28
38
50
63
77
92
...
001
...
000
...
006
...
001
...
000
...
005
...
001
...
021
...
004
...
001
...
015
...
003
...
002
...
000
...
000
...
002
...
000
...
008
...
001
...
000
...
005
...
001
...
009
...
002
...
089
...
026
...
123
...
090
...
037
...
069
...
117
...
030
...
054
...
091
...
055*
...
037
...
017
...
009
...
005
...
026
...
007
...
037
...
010
...
051
...
090
...
027
...
049
...
082
...
014
...
027
...
047
...
076
...
116
...
006
...
001
...
015
...
003
...
021
...
005
...
030
...
007
...
010
...
002
*
...
054
...
091
...
020
...
036
...
060
...
095
...
010
...
019
...
032
...
052
...
080
...
117
Indicates that the value at head of this column (add those values that are larger) are not possible for the given values of s and l in this row
...
025
...
005
Level of significance for two-tailed test
n
...
02
...
10
...
40
...
9000
1
...
7500
1
...
6000
1
...
5000
1
...
8100
...
0000
...
9375
1
...
3600
...
0000
...
7500
1
...
5905
...
9914
...
9999
1
...
2373
...
8965
...
9990
1
...
0778
...
6826
...
9898
1
...
0313
...
5000
...
9687
1
...
3487
...
9298
...
9984
...
0000
1
...
0000
1
...
0000
...
2440
...
7759
...
9803
...
9996
1
...
0000
1
...
0060
...
1672
...
6330
...
9452
...
9983
...
0000
...
0108
...
1719
...
6230
...
9453
...
9990
1
...
2824
...
8891
...
9963
...
0000
1
...
0000
1
...
0000
1
...
0000
...
1584
...
6488
...
9456
...
9972
...
0000
1
...
0000
1
...
0022
...
0835
...
4382
...
8418
...
9847
...
9997
1
...
0000
...
0031
...
0729
...
3871
...
8064
...
9806
...
0000
1
...
)
n
r0
...
25
...
50
20
0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
...
3917
...
8669
...
9886
...
9995
...
0000
1
...
0000
1
...
0000
1
...
0000
1
...
0000
1
...
0000
1
...
0032
...
0912
...
4148
...
7857
...
9590
...
9960
...
9998
1
...
0000
1
...
0000
1
...
0000
1
...
0000
...
0005
...
0159
...
1255
...
4158
...
7552
...
9433
...
9934
...
9996
1
...
0000
1
...
0000
1
...
0000
...
0002
...
0059
...
0577
...
2517
...
5881
...
8684
...
9793
...
9987
...
0000
1
...
0000
Table 9: Selected Critical Values of S in the Kendall’s Coefficient of Concordance
Values at 5% level of significance
k
N
Some additional
values for N = 3
3
3
4
5
6
8
10
15
20
4
5
6
48
...
0
89
...
7
49
...
6
75
...
7
127
...
9
258
...
4
88
...
3
136
...
7
231
...
8
468
...
9
143
...
4
221
...
0
376
...
5
764
...
3
217
...
2
335
...
1
571
...
9
1158
...
0
71
...
8
95
...
7
9
12
14
16
18
75
...
5
121
...
2
158
...
8
85
...
0
177
...
4
80
...
5
137
...
3
269
...
2
75
...
3
142
...
1
242
...
1
475
...
2
122
...
2
229
...
4
388
...
0
758
...
2
185
...
0
343
...
6
579
...
0
1129
...
9
Table 10: Table Showing Critical Values of A-Statistic for any Given Value
of n − 1, Corresponding to Various Levels of Probability
(A is significant at a given level if it is < the value shown in the table)
n – 1*
Level of significance for one-tailed test
...
025
...
005
...
10
...
02
...
001
1
2
3
4
5
6
1
2
3
4
5
0
...
412
0
...
376
0
...
5031
0
...
324
0
...
293
0
...
347
0
...
257
0
...
50012
0
...
272
0
...
218
0
...
334
0
...
211
0
...
370
0
...
368
0
...
368
0
...
281
0
...
276
0
...
230
0
...
217
0
...
210
0
...
196
0
...
185
0
...
167
0
...
146
0
...
134
11
12
13
14
15
0
...
368
0
...
368
0
...
273
0
...
270
0
...
269
0
...
205
0
...
202
0
...
178
0
...
174
0
...
170
0
...
126
0
...
121
0
...
368
0
...
368
0
...
368
0
...
268
0
...
267
0
...
200
0
...
198
0
...
197
0
...
168
0
...
166
0
...
117
0
...
114
0
...
112
21
22
23
24
25
0
...
368
0
...
368
0
...
266
0
...
266
0
...
265
0
...
196
0
...
195
0
...
165
0
...
163
0
...
162
0
...
110
0
...
108
0
...
368
0
...
368
0
...
368
0
...
265
0
...
264
0
...
194
0
...
193
0
...
193
0
...
161
0
...
161
0
...
107
0
...
106
0
...
105
(Contd
...
368
0
...
369
0
...
263
0
...
261
0
...
191
0
...
187
0
...
158
0
...
153
0
...
J
...
226
...
102
0
...
095
0
...
2
...
4
...
6
...
8
...
10
...
12
...
14
...
16
...
18
...
, The Design of Social Research, Chicago: University of Chicago Press, 1961
...
, Scientific Method, New York: John Wiley & Sons, 1962
...
Harrell, New Methods in Social Science Research, New York: Praeger Publishers, 1978
...
H
...
L
...
Anderson, T
...
, An Introduction to Multivariate Analysis, New York: John Wiley & Sons, 1958
...
, “Methods of Social Research,” New York, 1978
...
P
...
C
...
Bartee, T
...
, “Digital Computer Fundamentals,” 5th Ed
...
, 1981
...
, The Modern Researcher, rev
...
, New York: Harcourt, Brace &
World, Inc
...
Bell, J
...
, Projective Techniques: A
...
Bellenger, Danny N
...
, Marketing Research—A Management Information
Approach, Homewood, Illinois: Richard D
...
, 1978
...
, and Anderson, John F
...
J
...
, 1974
...
Berenson, Conard, and Colton, Raymond, Research and Report Writing for Business and Economics,
New York: Random House, 1971
...
, and Kahn, James V
...
, New Delhi: Prentice-Hall of India
Pvt
...
, 1986
...
Ltd
...
Boot, John C
...
, and Cox, Edwin B
...
New Delhi:
McGraw-Hill Publishing Co
...
, (International Student Edition), 1979
...
L
...
London: P
...
King and Staples Ltd
...
Selected References and Recommended Readings
19
...
, “Research Methods in Sociology” in Georges Gurvitch and W
...
Moore (Ed
...
20
...
, Statistical Methods for Decision Making, Bombay: D
...
Taraporevala Sons & Co
...
Ltd
...
21
...
C
...
22
...
New York: Holt,
Rinehart & Winston, 1974
...
Clover, Vernon T
...
, Business Research Methods, Columbus, O
...
, 1974
...
Cochran, W
...
, Sampling Techniques, 2nd ed
...
, 1963
...
Cooley, William W
...
, Multivariate Data Analysis, New York: John Wiley & Sons
...
26
...
E
...
J
...
, Applied General Statistics, 3rd ed
...
Ltd
...
27
...
L
...
Chand & Co
...
) Ltd
...
28
...
B
...
, McGraw-Hill International Book Co
...
29
...
Edwards
...
, Inc
...
30
...
31
...
32
...
, New York: Holt, Rinehart & Winston, 1967
...
Edwards, Allen L
...
34
...
William, Business Research Methods, Illinois: Richard D
...
Homewood, 1976
...
Ferber, Robert (ed
...
, 1948
...
Ferber, R
...
J
...
37
...
, Statistical Analysis in Psychology and Education, 4th ed
...
, Inc
...
38
...
), Research Methods in the Behavioral Sciences, New Delhi:
Amerind Publishing Co
...
Ltd
...
39
...
K
...
40
...
A
...
, New York: Hafner Publishing Co
...
41
...
A
...
ed
...
, 1960
...
Fox, James Harold, Criteria of Good Research, Phi Delta Kappa, Vol
...
43
...
, The Principles of Scientific Research, 2nd ed
...
44
...
J
...
Van Nostrand, 1954
...
Gatner, Elliot S
...
, and Cordasco, Francesco, Research and Report Writing, New York: Barnes & Noble,
Inc
...
46
...
, Graves, Harod F
...
S
...
, New York: Prentice-Hall,
1950
...
Ghosh, B
...
, Scientific Methods and Social Research, New Delhi: Sterling Publishers Pvt
...
, 1982
...
Gibbons, J
...
, Nonparametric Statistical Inference, Tokyo: McGraw-Hill Kogakusha Ltd
...
49
...
51
...
53
...
55
...
57
...
59
...
61
...
63
...
65
...
67
...
69
...
71
...
73
...
75
...
77
...
Giles, G
...
, Marketing, 2nd ed
...
, 1974
...
, Survey Research in the Social Sciences, New York: Russell Sage Foundation, 1967
...
, 1977
...
, and Douglas, E
...
, 1954
...
, and Hatt, Paul K
...
Gopal, M
...
, An Introduction to Research Procedure in Social Sciences, Bombay: Asia Publishing
House, 1964
...
H
...
Gorden, Raymond L
...
ed
...
: Dorsey
Press, 1975
...
, Analyzing Multivariate Data, Hinsdale, Ill
...
Green, Paul E
...
J
...
, 1970
...
P
...
, 1954
...
, and Murphy, James L
...
, Inc
...
Hillway, T
...
, Boston: Houghton Mifflin, 1964
...
, Nonparametric Statistical Methods, New York: John Wiley,
1973
...
, and Shelley, J
...
, New Delhi: Prentice-Hall of India Ltd
...
Hyman, Herbert H
...
, Interviewing in Social Research, Chicago: University of Chicago Press, 1975
...
M
...
, 1971
...
Johnson, Rodney D
...
, Quantitative Techniques for Business Decisions, New
Delhi: Prentice-Hall of India Pvt
...
, 1977
...
and Cannell, Charles F
...
Karson, Marvin J
...
Kendall, M
...
, A Course in Multivariate Analysis, London, Griffin, 1961
...
and Pedhazur, Elazar J
...
Kerlinger, Fred N
...
, New York: Holt, Reinhart and Winston,
1973
...
, Survey Sampling, New York: John Wiley & Sons, Inc
...
Kothari, C
...
, Quantitative Techniques, 2nd ed
...
Ltd
...
Lastrucci, Carles L
...
: Schenkman Publishing Co
...
, 1967
...
, “Evidence and Inference in Social Research,” in David Lerher, Evidence and Inference,
Glencoe: The Free Press, 1950
...
Strauss, Field Research, New Jersey: Prentice-Hall Inc
...
Levin, Richard I
...
Ltd
...
79
...
and Elzey, Freeman F
...
, 1968
...
Maranell, Gary M
...
), Scaling: A Source Book for Behavioral Scientists, Chicago: Aldine, 1974
...
Maxwell, Albert E
...
82
...
, and Parsons, A
...
, “Microprocessors: Essentials, Components and Systems,” Pitman, 1983
...
Meir, Robert C
...
, and Dazier, Harold L
...
J: Prentice Hall, Inc
...
84
...
, Handbook of Research Design & Social Measurement, 3rd ed
...
, 1977
...
Moroney, M
...
, Facts from Figures, Baltimore: Penguin Books, 1956
...
Morrison, Donald F
...
87
...
, and Neef, Marian, Policy Analysis in Social Science Research, London: Sage Publications,
1979
...
Nie, N
...
, Bent, D
...
, and Hull, C
...
, Statistical Package for the Social Sciences, New York: McGrawHill, 1970
...
Noether, G
...
, Elements of Nonparametric Statistics, New York: John Wiley & Sons, Inc
...
90
...
, Psychometric Theory, 2nd ed
...
91
...
W
...
,
1929
...
Oppenheim, A
...
, Questionnaire Design and Attitude Measurement, New York: Basic Books, 1966
...
Ostle, Bernard, and Mensing, Richard W
...
, Ames Iowa: The Iowa State
University Press, 1975
...
Payne, Stanley, The Art of Asking Questions, Princeton: Princeton University Press, 1951
...
Pearson, Karl, The Grammar of Science, New York: Meridian Books, Inc
...
96
...
, Social Research, Strategy and Tactics, 2nd ed
...
97
...
, 1973
...
Popper, Karl R
...
99
...
, “Fundamentals of Computers,” New Delhi: Prentice-Hall of India Pvt
...
, 1985
...
Ramchandran, P
...
101
...
V
...
V
...
, The Romance of Research, 1923
...
Roscoe, John T
...
, 1969
...
Runyon, Richard P
...
, 1977
...
Sadhu, A
...
, and Singh, Amarjit, Research Methodology in Social Sciences, Bombay: Himalaya Publishing
House, 1980
...
Seboyar, G
...
, Manual for Report and Thesis Writing, New York: F
...
Crofts & Co
...
106
...
, Research Methods in Social
Relations, rev
...
New York: Holt, Rinehart and Winston, Inc
...
107
...
A
...
, et al
...
Ltd
...
108
...
D
...
P
...
109
...
, Nonparametric Statistics for the Behavioral Sciences, New York: McGraw-Hill Publishing Co
...
, 1956
...
Subramanian, N
...
Ltd
...
111
...
, (Ed
...
, 1970
...
Takeuchi, K
...
and Mukherjee, B
...
, The Foundations of Multivariate Analysis, New Delhi:
Wiley Eastern Ltd
...
113
...
C
...
114
...
and Hagen, Elizabeth P
...
, New York: John Wiley & Sons, 1977
...
Thurstone, L
...
, The Measurement of Values, Chicago: University of Chicago Press, 1959
...
Torgerson, W
...
117
...
W
...
, New York: Macmillan Publishing
Co
...
, 1978
...
Tryon, R
...
, and Bailey, D
...
, Cluster Analysis, New York: McGraw-Hill, 1970
...
Ullman, Neil R
...
, 1978
...
Whitney, F
...
, The Elements of Research, 3rd ed
...
121
...
S
...
L
...
122
...
H
...
123
...
, Statistics: An Introductory Analysis, 3rd ed
...
124
...
, Scientific Social Surveys and Research, 3rd ed
...
Author Index
Ackoff, R
...
, 25, 390
Allen, T
...
L
...
H
...
, 390
Anderson, T
...
, 390
Bailey, D
...
, 338, 394
Bain, Read, 116
Baker, R
...
, 390
Balsey, Howard L
...
C
...
E
...
, 20, 91, 390
Bent, D
...
, 393
Berdie, Douglas R
...
, 86, 121, 390
Bhandarkar, P
...
, 394
Bhattacharya, Srinibas, 337, 390
Boot, John C
...
, 390
Bowley, A
...
, 18, 113, 390
Burgess, Ernest W
...
, 392
Chance, William A
...
C
...
, 391
Cochran, W
...
, 391
Colton, Raymond, 390
Cook, Stuart W
...
H
...
, 391
Cordasco, Francesco, 347, 391
Cowden, D
...
, 391
Cox, Edwin B
...
E
...
L
...
B
...
, 5, 393
Deming, W
...
Scates, 392
Edwards, Allen, 391
Edwards, Allen L
...
, 393
Emory, C
...
, 275, 391
Festinger, Leon, 391
Fiebleman, J
...
, 391
Fisher, R
...
, 39, 61, 256, 391
Fox, James Herold, 20, 391
Freedman, P
...
M
...
, 391
Ghosh, B N
...
D
...
B
...
, 392
Godfrey, Arthur, 392
Good, Carter V
...
, 392
Gopal, M
...
, 392
Gorden, Raymond L
...
, 160
Graff, Henry F
...
, 391
Green, Paul E
...
, 20, 91, 390
Guilford, J
...
, 80, 392
Gurvitch, Georges, 113
Guttman, Louis, 87, 88, 89
Hagen, Elizabeth P
...
, 158, 195, 214, 257, 392
Hatt, Paul K
...
, 392
Hoffman, Lyne S
...
, 391
Hollander, Myles, 392
Holtzman, W
...
, 108, 109
Hotelling H
...
C
...
H
...
, 392
Hyman, Herbert H
...
H
...
, 175, 392
Kahn, James V
...
, 392
Karson, Marvin J
...
G
...
C
...
, 392
Kish, Leslie, 392
Klein, S
...
R
...
, 10, 392
Lazersfeld, Paul F
...
, 158, 188, 392
Levine, S
...
, 391
Mahalanobis, 320
Maranell, Gary M
...
, 393
McQuitty, 338
Meadows, R
...
, 5, 393
Mensing, Richard W
...
, 393
Moore, W
...
, 113
Moroney, M
...
, 393
Morrison, Donald F
...
V
...
, 1, 393
Mukherji, B
...
, 316, 393, 394
Mukherji, S
...
, 393
Mulaik, S
...
, 335
Murphy, James L
...
, 393
Neiswanger, W
...
, 12
Newell, William T
...
H
...
E
...
, 92, 324, 393
Odum, H
...
, 113, 393
Oppenheim, A
...
, 393
Osgood, Charles E
...
J
...
, 77, 393
Piaget, Jean, 393
Le Play, Frederic, 114
Popper, Karl R
...
, 393
Ramachandran, P
...
V
...
, 393
Runyon, Richard P
...
N
...
, 110, 392
Seboyar, G
...
, 393
Selltiz, Claire, 31, 38, 350, 358, 393
Sharma, B
...
V
...
D
...
, 392
Sheth, Jagdish N
...
, 298, 301, 394
Singh, Amarjit, 393
Siskin, Bernard R
...
, 1
Spearman, Charles, 138, 302
Spencer, Herbert, 114
Stephenson, M
...
, 392
Student, 160, 162
Subramaniam, N
...
J
...
, 394
Takeuchi, K
...
C
...
H
...
, 73, 394
Thurstone, L
...
, 80, 83, 84, 324, 394
Tippett, 61
Torgerson, W
...
W
...
C
...
, 233, 394
Verdoorn, P
...
, 28, 391
Wells, William D
...
L
...
S
...
, 321, 394
Wolfe, Douglas A
...
, 394
Yanai, H
...
, 61, 246
Young, Pauline V
...
R
...
R
...
S
...
B
...
I
...
S
...
B
...
A
...
), 108
Thurstone-type scales, 83–84
Time-series analysis, 148–49
Tomkins-Horn picture arrangement test, 109
t-test, 195–96
Two-tailed and one-tailed test, 195–96
Type I and Type II errors, 187
Types of analysis, 130–31
    bivariate, 130
    causal, 130
    correlation, 130
    descriptive, 130
    inferential, 131
    multivariate, 130
    unidimensional, 130
Variable, 33–34, 318
    continuous, 34, 318
    criterion, 318
    dependent, 34, 318
    discrete, 34, 318
    dummy, 318
    explanatory, 318
    extraneous, 34
    independent, 34
    latent, 318
    observable, 318
    pseudo, 318
Varimax rotation, 336
Warranty cards, 106
Wilcoxon-Mann-Whitney test, 293–94
Yates’ correction, 246–49
Yule’s coefficient of association, 145–46
Z-test, 196