Tests of Hypotheses and Confidence Intervals
Statistical evaluation and decisions are often based on samples rather than on whole populations. For example, we may wish to decide on the basis of a sample whether an immunization is effective in preventing people from getting a disease.
In order to decide, we have to make assumptions, or simply guesses, about the population, which can be "the population in full", "the total production of beef" as well as "all children of age less than 16 years" or "every occurrence of antibiotic material in the seas around Denmark".
If 30 tosses of a die yield 10 "ones", you would perhaps say that the die is unfair. But perhaps this single series of 30 tosses, even though random, turned out to be a little special. A priori we expect a "one" to come up 1/6 of the time. Forget for a moment the problem of deciding how large a sample to choose. We can then make the hypothesis that there is no difference between the result from the sample and the result from the whole population. We call such a hypothesis a Null Hypothesis (H0). In this example the expected result is p = 1/6 (S = 5, where S is the number of "ones" in 30 tosses), because we know from probability calculation that the chance of a "one" is 1/6 both in the whole population and in a sample. The alternative to H0 is p ≠ 1/6, or S ≠ 5. Another alternative, and a different type of hypothesis, would be S < 7.
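The remark that a single series of 30 tosses may be "a little special" can be illustrated with a small simulation. The following is only a sketch: the fair die is simulated with Python's pseudo-random generator, and the seed, the number of repeated series and the names `count_ones`, `mean_count` and `share_ten_or_more` are assumptions made for the illustration, not part of the original example.

```python
import random

def count_ones(n_tosses, rng):
    """Toss a simulated fair die n_tosses times and count the 'ones'."""
    return sum(1 for _ in range(n_tosses) if rng.randint(1, 6) == 1)

rng = random.Random(12345)  # arbitrary fixed seed so the run is repeatable
counts = [count_ones(30, rng) for _ in range(10_000)]  # 10,000 series of 30 tosses

# Average number of ones per series (expected to be close to 30 * 1/6 = 5)
mean_count = sum(counts) / len(counts)
# Share of series in which a fair die still produced 10 or more ones
share_ten_or_more = sum(1 for c in counts if c >= 10) / len(counts)

print(f"average number of ones per series: {mean_count:.2f}")
print(f"share of series with 10 or more ones: {share_ten_or_more:.3f}")
```

The average lands close to 5, while a small share of the series still shows 10 or more "ones" although the simulated die is fair. Whether such a share is small enough to call the die unfair is exactly what a test of hypothesis has to settle.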
Test Design Problems
The procedure that enables us to decide whether to accept or reject a hypothesis H0 is called a test of hypothesis or test of confidence. It determines whether the sample differs significantly from the expected result or whether the difference can be explained as purely random, that is, whether the result from the sample falls within the area of confidence.
To get trustworthy results from tests of hypotheses or other bases of decision, the test must be designed to minimize errors of decision. This is not simple, especially if the size of the sample is given: some errors may be more serious than others, and reducing one type may increase the influence of another. Even if the sample size is not given, it is often difficult to alter it. Think of the number of patients with a specific diagnosis who may suffer pain before the sample size in a study of pain treatment can be changed, or of new knowledge about relations between variables that were collected and registered some years ago.
Level of confidence
In designing a test of a hypothesis we could choose a confidence level of 0.95, which means that we allow a random variation that in 5 of 100 cases will make us reject the hypothesis when we should have accepted it. If we allow this in only 1 of 100 cases, we have to test H0 at a confidence level of 0.99, which means that we accept getting the wrong answer, rejecting a true hypothesis, in 1 of 100 cases. This type of error is called a Type 1 error. If we on the other hand accept a hypothesis when it should have been rejected, it is called a Type 2 error.
If we assume that the sample statistic is normally distributed, which is the case in most situations, it has a mean μs and a standard deviation σs. The distribution in the example above is typically normal. That means the number of "ones" may vary every time we select a sample of 30 tosses, but if the sample is large (>= 30) the number of "ones" concentrates around 5, and the following curve illustrates the distribution of the number of "ones" in the sample. The standardized normal distribution, with mean 0 and variance 1, has been chosen, with z = (X - μs)/σs on the first axis, where X is the number of "ones" counted in the sample.
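The link between a chosen confidence level and the corresponding value on the z axis can be sketched with the standard library's `statistics.NormalDist`. The helper name `critical_z` and its `two_tailed` parameter are assumptions made for this illustration; the values it returns are the ones normally read from a table of the normal distribution.

```python
from statistics import NormalDist

def critical_z(confidence, two_tailed=True):
    """Return the z value that bounds the acceptance region for a confidence level."""
    alpha = 1.0 - confidence                     # accepted risk of a Type 1 error
    tail = alpha / 2 if two_tailed else alpha    # area left in each rejection tail
    return NormalDist().inv_cdf(1.0 - tail)      # standard normal: mean 0, variance 1

print(round(critical_z(0.95), 2))                    # 1.96
print(round(critical_z(0.99), 2))                    # 2.58
print(round(critical_z(0.99, two_tailed=False), 2))  # 2.33
```

In a two-tailed test the accepted risk is split between both tails of the curve, which is why a confidence level of 0.95 gives 1.96 rather than the one-tailed 1.64.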
The die example above: z = (X - μ)/σ, where μ = Np is the expected number of "ones" and σ = √(Npq) is the standard deviation, both following from the Binomial Distribution.
The probabilities of getting a "one" and of getting "not a one" are p = 1/6 and q = 5/6, respectively.
Would we accept H0: p = 1/6 1) with 8 "ones" in a sample of 30 at a confidence level of 0.95, and 2) with 28 "ones" in a sample of 90 at a confidence level of 0.99?
For the first sample N = 30, p = 1/6 under H0, and q = 1 - p = 5/6.
μ = Np = 30*(1/6) = 5 and σ = √(Npq) = √(30*(1/6)*(5/6)) = 2.04
The confidence limits:
z = (X - 5)/2.04 = 1.96, so X = 9.0
z = (X - 5)/2.04 = -1.96, so X = 1.0
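Written as a single inequality, H0 is accepted if:
-1.96 < (X - 5)/2.04 < 1.96
With X = 8 we get z = (8 - 5)/2.04 = 1.47, which lies inside this interval.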
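For the second sample N = 90: μ = Np = 90*(1/6) = 15 and σ = √(Npq) = √(90*(1/6)*(5/6)) = 3.54
The confidence limits at level 0.99: 15 - 2.58*3.54 = 5.9 and 15 + 2.58*3.54 = 24.1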
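1) Accept the null hypothesis, since 8 lies within the confidence limits, if you accept a Type 1 error in 5 of 100 cases.
2) Reject the null hypothesis, since 28 lies outside the confidence limits, even if you accept a Type 1 error in only 1 of 100 cases.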
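The two decisions can be checked with a short calculation. This is a minimal sketch of the normal approximation used in the text, not a general-purpose test; the function name `die_z_test` and its return layout are assumptions made for the illustration. Note that for the second sample the expected number of "ones" is 90*(1/6) = 15.

```python
from math import sqrt

def die_z_test(ones, tosses, z_crit, p=1/6):
    """Two-tailed normal-approximation test of H0: p = 1/6 for a count of 'ones'."""
    mu = tosses * p                       # expected number of ones under H0
    sigma = sqrt(tosses * p * (1 - p))    # binomial standard deviation, sqrt(Npq)
    z = (ones - mu) / sigma               # observed count in standard units
    limits = (mu - z_crit * sigma, mu + z_crit * sigma)  # confidence limits for the count
    return z, limits, abs(z) < z_crit     # True means "accept H0"

for ones, tosses, z_crit in [(8, 30, 1.96), (28, 90, 2.58)]:
    z, (low, high), accept = die_z_test(ones, tosses, z_crit)
    decision = "accept H0" if accept else "reject H0"
    print(f"{ones} ones in {tosses} tosses: z = {z:.2f}, "
          f"limits {low:.1f} to {high:.1f}, {decision}")
```

The first case gives z of about 1.47, inside the limits for confidence level 0.95, and the second gives z of about 3.68, well outside the limits for confidence level 0.99.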
You often formulate a hypothesis H0 in order to reject it, because it is often easier to test whether X equals a number than whether X is less than or greater than a number.
A new fertilizer product was claimed by the supplier to pollute the subsoil water in a measurable amount in only 0.1% of all wells. In a sample of 2000 water tests, one from each randomly selected well, 6 were measured as polluted. Determine whether the claim was legitimate.
Let p denote the probability of measurable pollution:
H0: p <= 0.1%, and the claim is correct
H1: p > 0.1%, and the claim is false
We choose a one-tailed test, since we are interested in determining whether the proportion of polluted wells is too high.
If the level of significance is taken as 0.01, i.e. if the shaded area in the figure is 0.01, then the area between 0 and z1 is 0.49, and the table of the Normal Distribution gives z1 = 2.33. This can also be seen from the figure above when you exclude the one tail.
The decision rule:
1. The claim is not legitimate if z is larger than 2.33 (in which case we reject H0).
2. Otherwise, the claim is legitimate and the observed results are due to chance (in which case we accept H0).
If H0 is true, μ = Np = 2000*0.001 = 2 and σ = √(Npq) = √(2000*0.001*0.999) = 1.4
Now 6 in standard units is (6 - 2)/1.4 = 2.86, which is more than 2.33. Thus we have to conclude that the claim is not legitimate, and H0 is rejected.
If not 6 but 5 polluted wells had been found, the number of standard units would have been (5 - 2)/1.4 = 2.14, which is less than 2.33, and we would have accepted the claim at the same level of significance.
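The one-tailed decision rule for the wells can be sketched in the same way. The function name `polluted_wells_z` and the constant `Z_CRIT` are assumptions made for this illustration. Because the sketch does not round the standard deviation to 1.4, it gives slightly smaller standard units (about 2.83 and 2.12) than the hand calculation, with the same decisions.

```python
from math import sqrt

Z_CRIT = 2.33  # one-tailed critical value at significance level 0.01

def polluted_wells_z(observed, n_wells=2000, p_claimed=0.001):
    """Express the observed number of polluted wells in standard units under H0."""
    mu = n_wells * p_claimed                               # expected number if the claim holds
    sigma = sqrt(n_wells * p_claimed * (1 - p_claimed))    # sqrt(Npq)
    return (observed - mu) / sigma

for observed in (6, 5):
    z = polluted_wells_z(observed)
    # One-tailed rule: reject only when z exceeds the critical value
    verdict = "reject H0 (claim not legitimate)" if z > Z_CRIT else "accept H0 (claim legitimate)"
    print(f"{observed} polluted wells: z = {z:.2f}, {verdict}")
```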
A hot example from Denmark (not finished)
Provisional English comments on the investigation of fertility among immigrants in Denmark
In Denmark there is no correct official account of the number of immigrants living in the country. Now the authorities are trying to improve the basis of their population forecasts by estimating the demographic parameter of fertility among foreign women who have immigrated to Denmark. Fertility is the average number of children expected to be born to each woman of fertile age.
China and Mexico are among the few countries in the world where the UN has observed an adjustment of the fertility rate, even in the native population. The population policies in these countries had to be very restrictive to achieve these decreases. In Denmark the assertion by E. Vesselbo about an adjustment of the fertility rate is aimed at the Islamic immigrants (about 70% of the group of immigrants in Denmark over the last 20 years), and the Danish population policy is certainly the opposite of restrictive.