Monday, November 14, 2011

Sampling Issues in Research

Lecture 4
Lecture Outline
Basic Concepts
Determining Sample Size
Probability and Non-probability Sampling
Introduction
Students, today we shall discuss various issues in sampling.
To understand them better, we first need to define certain related terms.
In any investigation, the interest lies in assessing the general magnitude of, and the variation in, one or more characteristics of the individuals belonging to a group.
Population
This group of individuals is called the population or universe.
Thus we can define population as any entire collection of people, animals, plants, things or areas on which we may collect data.
It is the entire group of interest, which we wish to describe or about which we wish to draw conclusions.  
Census
Census refers to a complete enumeration of all items in the population.
It is expensive in terms of time, money and energy. In most cases only the government can conduct a census.
However, it is possible to obtain sufficiently accurate results by studying only a part of a population.
Sample and Sampling Procedure
The few items selected need to be as representative of the total population as possible. The selected respondents or items constitute what is termed a sample.
The sampling procedure is the process of selecting the sample. The researcher must prepare a sample design for the study, that is, a plan for how the sample will be selected and what size it will be.
Sampling Design
A sample design refers to a plan for obtaining a sample from a given population including techniques or the procedure that will be adopted.
The sample design may also lay down the number of items to be included in the sample.
There are many sample designs, so the researcher must select one that is reliable and appropriate for the study.
Parameter and Statistic
A parameter is an unknown value, and therefore it has to be estimated.
Parameters are used to represent certain population characteristics. For example, the population mean is a parameter that is often used to indicate the average value of a certain variable in a population.
Within a population, a parameter is a fixed value that does not vary.
Each sample drawn from the population has its own value of any statistic that is used to estimate this parameter. For example, the mean of the data in a sample is used to give information about the overall mean in the population from which that sample was drawn.
A statistic is a quantity that is calculated from a sample of data.
It is used to give information about unknown values in the corresponding population.
A statistic is a function of an observable random sample.
It is therefore an observable random variable.
Statistics are often assigned Roman letters, whereas the equivalent unknown values in the population (parameters) are assigned Greek letters.
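The difference between a parameter and a statistic can be seen in a small simulation. The sketch below is only an illustration: the population of household sizes is made up, and the parameter can be computed here only because the whole population is simulated.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: household sizes for 10,000 households.
population = [random.randint(1, 10) for _ in range(10_000)]

# Parameter: one fixed value for the whole population.
mu = statistics.mean(population)

# Statistic: computed from a sample, so it changes from sample to sample.
sample_means = [statistics.mean(random.sample(population, 100)) for _ in range(5)]

print("population mean (parameter):", round(mu, 3))
print("sample means (statistics):  ", [round(m, 3) for m in sample_means])
```

Each of the five sample means is an estimate of the same fixed population mean, yet no two samples need give the same value.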
Important Points to Consider
Type of the population (universe) – You should clearly define the set of objects in your population.
Sampling unit – Decide what the sampling unit will be. It may be a geographical area (district, ward, village), a construction unit (house, flat) or a social unit (family, club).
Source list (sampling frame) – It is the list from which a sample is drawn. It contains the names of all items from which the sample will be drawn (finite population). If the list is not available, the researcher has to prepare one that is comprehensive, correct, reliable and appropriate. The source list should be as representative of the population as possible.
Size of the sample – This refers to the number of items to be included in the sample. The size of the sample should be neither excessively large nor too small; it should be optimal, that is, efficient, representative, reliable and flexible. The size of the sample will depend on:
Desired precision and acceptable confidence level for the estimate.
Population variance.
Size of the population
Parameter of interest – This has a strong impact on the sample design, e.g. whether we want to estimate a proportion of the population, certain averages, or values for important sub-groups in the population.
Costs
Budgetary constraints – Cost considerations, from a practical point of view, have a major impact on decisions about the sample size and the type of sample.
Sampling Procedure
Two costs are involved in a sampling analysis: the cost of collecting the data and the cost of incorrect inferences resulting from the data.
Incorrect inferences are caused by systematic bias and sampling error.
Systematic Bias
Systematic bias results from errors in the sampling procedures and cannot be corrected by adjusting the sample size. It is the result of one or more of the following factors:
Inappropriate sampling frame (under-coverage) – If the sampling frame is not representative of the population, it will result in a systematic bias.
Defective measuring device – If the measuring device is constantly in error, it will result in a systematic bias.
Non-respondents – If we are unable to collect data from all the individuals initially included in the sample, a systematic bias may arise.
Indeterminacy principle – Sometimes individuals act differently when kept under observation than they do when not observed, and this can result in systematic bias.
Natural bias in the reporting of data (response bias) – People may systematically understate or overstate some data, for example when asked about income for tax purposes or for social status.
Sampling Error
These are random variations in the sample estimates around the true population parameters.
They occur randomly and are equally likely to be in either direction.
They are compensatory in nature.
The expected value of such errors happens to be equal to zero.
Sampling error decreases as the sample size increases, and it is of small magnitude when the population is homogeneous.
Sampling error can be measured for a given sample design and size.
Increasing the sample size can improve the precision, but cost can be a limitation and a larger sample can increase the systematic bias.
The best way to increase the precision is usually to select a better sampling design which has a smaller sampling error for a given sample size at a given cost.
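The behaviour of sampling error described above can be checked with a short simulation. This is a sketch under stated assumptions: the population of incomes is simulated, simple random sampling is used, and the function name `sampling_errors` is hypothetical.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical population of 20,000 incomes; all values are simulated.
population = [random.gauss(500, 100) for _ in range(20_000)]
true_mean = statistics.mean(population)

def sampling_errors(n, draws=500):
    """Return (sample mean - true mean) for `draws` simple random samples of size n."""
    return [statistics.mean(random.sample(population, n)) - true_mean
            for _ in range(draws)]

for n in (25, 100, 400):
    errors = sampling_errors(n)
    print(n,
          "average error:", round(statistics.mean(errors), 2),
          "spread (std dev):", round(statistics.stdev(errors), 2))
```

The average error stays close to zero for every sample size, because errors in either direction offset one another, while the spread of the errors narrows as the sample grows.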
Characteristics of a Good Sample Design
Result in a truly representative sample.
Has small sampling error
Viable in the context of funds available
Systematic bias can be controlled in a better way
The results of sample study can be applied in general for the population with reasonable level of confidence.
Determining the Sample Size
Another issue with samples is the number of observations necessary for a given study.
Sample size is contingent on the amount of variation that exists in the population being studied, the actual size of the population, and the types of questions being asked.
Other factors that determine the sample size for a given project are the level of precision required and the needed confidence level.
It also depends on the degree of variability, that is, the amount of variation present in the population.
Generally speaking, the greater the variability in the population, the larger the sample size needs to be so that all of the variability is measured.
After considering each of the aforementioned criteria, you can then begin to determine the sample size required for a study.
There are several ways of determining sample size. Using a formula has the advantage of indicating how well, and with what precision, the chosen sample size represents the population.
Several formulas can be used to determine sample size.
In this lecture we will discuss one formula for determining the sample size.
n = N / (1 + N × e^2)
Where:
n = sample size
N = population size
e = acceptable error (the precision), expressed as a proportion
For example, Makunganya conducts a study on household health-seeking behaviour in Morogoro municipality.
The total number of households in Morogoro is 65,000.
His acceptable error is 0.07.
What is his sample size?
Sample size (n)
n = 65,000 / (1 + 65,000 × (0.07)^2) = 65,000 / 319.5 ≈ 203
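The calculation can be reproduced with a few lines of code. This is a minimal sketch that uses the values from the worked calculation (N = 65,000 and e = 0.07); the function name `sample_size` is hypothetical, and rounding to the nearest whole number follows the example rather than any fixed rule.

```python
def sample_size(population_size, error):
    """n = N / (1 + N * e^2), with e expressed as a proportion (0.07 for 7%)."""
    if population_size <= 0 or not 0 < error < 1:
        raise ValueError("need N > 0 and 0 < e < 1")
    return population_size / (1 + population_size * error ** 2)

n = sample_size(65_000, 0.07)
print(round(n, 2))  # unrounded value given by the formula
print(round(n))     # rounded to the nearest whole household, as in the example
```

A smaller acceptable error gives a larger required sample, so the choice of e should be settled before the formula is applied. Some researchers round up rather than to the nearest whole number so that the precision target is not missed.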
Types of Sampling
The type of enquiry you want to make and the nature of the data you want to collect fundamentally determine the technique or method of selecting a sample.
The procedure of selecting a sample may be broadly classified under the following two types: Probability and Non-Probability Sampling Methods.
Probability Sampling
A probability sampling method is any method of sampling that utilizes some form of random selection.
In order to have a random selection method, you must set up some process or procedure that assures that every unit in your population has a known, non-zero probability of being chosen (an equal probability in the simplest designs).
Probability sampling is divided into the following types.
Simple Random Sampling
Simple random sampling is the most basic and least complicated.
This is sampling without replacement.
It assumes that the population is known and each case is identifiable.
It requires a complete listing of all elements and the assumption that all such elements are statistically independent of one another.
Procedure
List all elements of the population and assign them consecutive numbers from 1 to N.
Then decide upon the sample size.
Using random number tables, select n different numbers falling between 1 and N.
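The three steps of the procedure can be sketched in code, with a pseudo-random number generator standing in for the random number tables. The function name `simple_random_sample` and the values N = 500 and n = 20 are assumptions chosen for illustration.

```python
import random

def simple_random_sample(N, n, seed=None):
    """Pick n distinct element numbers from 1..N, each equally likely."""
    if not 0 < n <= N:
        raise ValueError("need 0 < n <= N")
    rng = random.Random(seed)
    # random.sample draws without replacement, so no number repeats.
    return sorted(rng.sample(range(1, N + 1), n))

chosen = simple_random_sample(N=500, n=20, seed=7)
print(chosen)
```

The numbers returned are the positions of the selected elements in the numbered list prepared in the first step.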
Systematic Sampling
In systematic sampling every kth item on the list is selected, for example every 10th name.
The element of randomness (probability) is introduced into this kind of sampling by using a random number to pick the unit with which to start.
Systematic sampling is used when a list of the population is available and is of considerable length, such as a roster, a telephone directory or a stack of file cards.
It has the advantage of operational convenience, especially with long lists of this kind.
However, it has the disadvantage that, it may produce a non-representative sample when the list contains hidden periodicity.  
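A minimal sketch of systematic selection follows. It assumes the sampling interval is taken as the list length divided by the sample size (rounded down), which is one common choice and not the only one; the function name `systematic_sample` and the roster are hypothetical.

```python
import random

def systematic_sample(items, n, seed=None):
    """Select every k-th item after a random start, with k = len(items) // n."""
    if not 0 < n <= len(items):
        raise ValueError("need 0 < n <= len(items)")
    k = len(items) // n          # sampling interval
    rng = random.Random(seed)
    start = rng.randrange(k)     # random start within the first interval
    return items[start::k][:n]   # trim when the list is not a multiple of k

roster = [f"person_{i}" for i in range(1, 101)]  # hypothetical list of 100 names
print(systematic_sample(roster, n=10, seed=3))
```

If the list were ordered in a repeating pattern whose length matched the interval, every selected item would fall at the same point in the pattern, which is the periodicity problem noted above.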
Stratified Sampling
In this sampling the population is divided into several sub-populations that are individually more homogeneous than the total population.
Then a random sample is selected from each stratum using simple random sampling.
It is generally applied to obtain a representative sample when the population does not constitute a homogeneous group.
The different sub-populations are called strata.
Strata are formed on the basis of common characteristics of the items to be put in each stratum.
Items are selected from each stratum by using the method of simple random sampling.
Also, systematic sampling can be used when appropriate.
The number of items to be selected from each stratum is usually determined by the method of proportional allocation, under which the sizes of the samples from the different strata are kept proportional to the sizes of the strata.
Example: proportional allocation
Let pi = proportion of the population included in stratum i.
n = total sample size.
The number of elements selected from stratum i = n × pi
Suppose we want a sample size n = 30 from a population of size N = 8000 which is divided into three strata of size N1 = 4000, N2 = 2400, N3 = 1600.
The sample sizes for the three strata are:
n1 = n × p1 = 30 × (4000/8000) = 15
n2 = n × p2 = 30 × (2400/8000) = 9
n3 = n × p3 = 30 × (1600/8000) = 6
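The same allocation can be computed for any number of strata. The sketch below is illustrative only, and the function name `proportional_allocation` is hypothetical.

```python
def proportional_allocation(n, strata_sizes):
    """Allocate a total sample n across strata in proportion to their sizes."""
    N = sum(strata_sizes)
    # n_i = n * (N_i / N); multiplying first keeps exact cases exact.
    return [n * N_i / N for N_i in strata_sizes]

print(proportional_allocation(30, [4000, 2400, 1600]))
```

When the results are not whole numbers they have to be rounded, and the rounded values may then need a small adjustment so that they still add up to n.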
In cases where strata differ not only in size but also in variability, it is reasonable to take larger samples from the more variable strata and smaller samples from the less variable ones.
This is called optimum allocation in the context of disproportionate sampling.
The allocation results in a formula for determining sample sizes for different strata as:
ni = (n × Ni × si) / (N1s1 + N2s2 + ... + Nksk)
Where:
n = total sample size
Ni = size of stratum i
si = standard deviation of stratum i
i = 1, 2, ..., k
Example: A population is divided into three strata so that N1 = 5000, N2 = 2000 and N3 = 3000. The respective standard deviations are s1 = 15, s2 = 18 and s3 = 5. How should a sample of n = 84 be allocated to the three strata if we use optimum allocation under a disproportionate sampling design?
Sample size for the stratum with N1 = 5000
= [(84)(5000)(15)]/[(5000)(15) + (2000)(18) + (3000)(5)]
= [6300000]/[126000]
= 50
Similarly,
Sample size for the stratum with N2 = 2000: [(84)(2000)(18)]/[126000] = 24.
Sample size for the stratum with N3 = 3000: [(84)(3000)(5)]/[126000] = 10.
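The worked example can be checked with a short function. This is a sketch of the optimum allocation formula as stated in this lecture; the function name `optimum_allocation` is hypothetical.

```python
def optimum_allocation(n, strata_sizes, std_devs):
    """n_i = n * N_i * s_i / sum(N_j * s_j) for each stratum i."""
    if len(strata_sizes) != len(std_devs):
        raise ValueError("one standard deviation per stratum is required")
    weights = [N_i * s_i for N_i, s_i in zip(strata_sizes, std_devs)]
    total = sum(weights)
    return [n * w / total for w in weights]

print(optimum_allocation(84, [5000, 2000, 3000], [15, 18, 5]))
```

Note that the second stratum is smaller than the third but receives a larger sample, because its standard deviation is much higher.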
Cluster Sampling
The total population is divided into a number of relatively small subdivisions (clusters).
Some of these clusters are randomly selected to be included in the sample.
It is useful when the total area of study is very big.
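A minimal sketch of cluster sampling follows. It assumes that every unit inside a selected cluster is included in the sample; the villages, households and the function name `cluster_sample` are made up for illustration.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick whole clusters and include every unit inside them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    units = [unit for name in chosen for unit in clusters[name]]
    return chosen, units

# Hypothetical villages (clusters), each with its list of households.
villages = {f"village_{v}": [f"v{v}_house_{h}" for h in range(1, 6)]
            for v in range(1, 11)}
chosen, units = cluster_sample(villages, n_clusters=3, seed=11)
print(chosen)
print(len(units), "households in the sample")
```

Only the list of clusters has to be complete before selection; households need to be listed only inside the clusters that are chosen.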
Multi Stage Sampling
It is a form of cluster sampling that performs selection in stages.
Example: From the 25 regions of the URT, randomly select 4 regions, then 2 districts in each region, 4 villages or townships in each district, 2 streets from each village/township, 2 houses from each street and two respondents (a man and a woman) in each house.
It is easier to administer than most single-stage sampling because the sampling frame is developed in partial units. A large number of units can be sampled for a given cost.
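The staged selection in the example can be sketched as a loop over stages. Everything below is an assumption made for illustration: the unit names, the number of units available at each stage, and the helper `list_units`, which stands in for building a frame only inside units that have already been selected.

```python
import random

rng = random.Random(5)  # fixed seed so the sketch is repeatable

def list_units(parent, kind, count):
    """Stand-in for listing the units inside an already selected parent unit."""
    return [f"{parent}/{kind}{i}" for i in range(1, count + 1)]

# Stage 1: select 4 of 25 regions. Region names are made up.
regions = [f"region{i}" for i in range(1, 26)]
sample = rng.sample(regions, 4)

# Later stages: (unit kind, assumed number available, number to select).
stages = [("district", 6, 2), ("village", 20, 4), ("street", 8, 2), ("house", 30, 2)]
for kind, available, take in stages:
    sample = [unit
              for parent in sample
              for unit in rng.sample(list_units(parent, kind, available), take)]

print(len(sample), "houses selected")
print(sample[0])
```

A frame is built only for the districts, villages, streets and houses that lie inside units selected at the previous stage, which is why the method is cheaper to administer than listing every house in the country.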
Non-probability Sampling
Non-probability sampling is used when the researcher feels it is not feasible to include a large number of units in the study.
It is also used when the researcher does not have sufficient information about the population to undertake probability sampling (for example, it may not be known how many people/events make up the population).
It may prove exceedingly difficult to construct a sample selected through conventional probability sampling, e.g. in research on drug addicts, the homeless or HIV/AIDS carriers.
Non-probability sampling has the following types.
Purposive Sampling
With purposive sampling, the decision with regard to which element/item should be included or excluded in the sample rests on the researcher’s judgment and intuition.
For this reason, purposive sampling is sometimes known as judgmental sampling.
The researcher chooses only those elements which he believes will be able to deliver the required data.    
Convenience/ Accidental Sampling
Accidental sampling involves selecting respondents primarily on the basis of their availability and willingness to respond.
It is a technique of sampling where the major drive for inclusion of an element in the sample is its ease of access.
The advantage of this approach lies in saving time and money.
But it should be understood that whenever you use it there is a greater chance of bias; therefore, research results may not be representative. 
Quota Sampling
It is a form of judgemental sampling with the constraint that the sample includes a minimum number from each specified subgroup in the population.
Quota sampling often is based on such demographic data as geographic location, age, gender, education and income.  
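The idea of filling quotas can be sketched as follows. This is an assumption-laden illustration: respondents are taken in the order they are met, each quota is treated as an exact target instead of a minimum, and the names, groups and function `quota_sample` are hypothetical.

```python
def quota_sample(respondents, group_of, quotas):
    """Take respondents in arrival order until each subgroup quota is filled."""
    remaining = dict(quotas)
    chosen = []
    for person in respondents:
        group = group_of(person)
        if remaining.get(group, 0) > 0:
            chosen.append(person)
            remaining[group] -= 1
        if not any(remaining.values()):
            break  # every quota is filled
    return chosen

# Hypothetical stream of passers-by as (name, gender) pairs.
passers_by = [("A", "F"), ("B", "M"), ("C", "M"), ("D", "M"), ("E", "F"), ("F", "F")]
print(quota_sample(passers_by, group_of=lambda p: p[1], quotas={"F": 2, "M": 2}))
```

Nothing in the selection is random: who ends up in the sample depends on who happens to be met first, which is why quota sampling is classed as a non-probability method.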
Snowball Sampling
This is another type of non-probability sampling technique.
It is employed when you are not certain which respondents have relevant data for your study, but you know a few who do.
You interview or give a questionnaire to those few and then ask them to identify others who are likely to have the required data.
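The referral process can be sketched as a wave-by-wave search. The respondent names, the referral lists and the function `snowball_sample` are hypothetical; in practice the referrals are only learned during the interviews themselves.

```python
from collections import deque

def snowball_sample(seeds, referrals, max_size):
    """Start from known respondents and follow their referrals, wave by wave."""
    chosen, queue = [], deque(seeds)
    seen = set(seeds)
    while queue and len(chosen) < max_size:
        person = queue.popleft()
        chosen.append(person)                    # interview this respondent
        for other in referrals.get(person, []):  # people they point us to
            if other not in seen:
                seen.add(other)
                queue.append(other)
    return chosen

# Hypothetical referral lists: who each respondent names when asked.
referrals = {"Asha": ["Juma", "Neema"], "Juma": ["Neema", "Baraka"], "Neema": ["Zawadi"]}
print(snowball_sample(["Asha"], referrals, max_size=4))
```

The sample grows only through the contacts of people already in it, so it tends to stay within one social network and cannot be assumed to represent the wider population.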
Seminar Question
A population is divided into four strata so that N1 = 600, N2 = 1500, N3 = 800 and N4 = 2500. Respective standard deviations are: s1=15, s2=18, s3 = 4 and s4=5.
How should a sample of size n = 104 be allocated to the four strata, if we want optimum allocation using disproportionate sampling design?