One-Factor Analysis of Variance
We want a test of whether there are differences among various groups on some characteristic of the response variable. If there is evidence of such differences, a second analysis involves finding just which groups are different and the degree to which they differ.
The analysis of variance procedure involves partitioning the total variation of the observations about the overall mean into two independent parts. One of these, BSS (the between sum of squares), is the portion of the total that is explained by the differences among the sample means. The other part, WSS (the within sum of squares), is the portion of the total that cannot be explained by these differences among the means.
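In symbols, the partition can be sketched as follows. The notation is an assumption introduced for this illustration: Y_{ij} is observation i in group j, n_j is the size of group j, \bar{Y}_j is the mean of group j, \bar{Y} is the overall mean, k is the number of groups, and TSS is a label for the total sum of squares.

```latex
\underbrace{\sum_{j=1}^{k}\sum_{i=1}^{n_j}\left(Y_{ij}-\bar{Y}\right)^{2}}_{\text{TSS}}
=
\underbrace{\sum_{j=1}^{k} n_j\left(\bar{Y}_j-\bar{Y}\right)^{2}}_{\text{BSS}}
+
\underbrace{\sum_{j=1}^{k}\sum_{i=1}^{n_j}\left(Y_{ij}-\bar{Y}_j\right)^{2}}_{\text{WSS}}
```

The cross term vanishes because the deviations of the observations from their own group mean sum to zero within every group.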
When we measure the same phenomenon time after time, apparently under the same circumstances, we may notice variation in the measured results around their mean. You may recognize a similar variation if you measure the same phenomenon in other areas. But perhaps you also recognize differences from area to area. Are we sure we are measuring precisely the same phenomenon in the different areas when we look at the varying results? Can we explain the differences in the sample means by chance? To answer this we have to analyze the variance and compare the differences between areas with the chance fluctuations. How can we measure this chance fluctuation?

120 tests (N) of the antibiotic amount in water of 8 areas (k)

| Test no. (n) / Area (k) | Y1 | Y2 | Y3 | Y4 | Y5 | Y6 | Y7 | Y8 |
|---|---|---|---|---|---|---|---|---|
| X1 | 81 | 81 | 86 | 87 | 83 | 59 | 78 | 94 |
| X2 | 56 | 92 | 94 | 88 | 76 | 67 | 96 | 88 |
| X3 | 58 | 88 | 82 | 86 | 55 | 74 | 86 | 77 |
| X4 | 81 | 86 | 87 | 89 | 90 | 90 | 76 | 57 |
| X5 | 78 | 90 | 79 | 90 | 91 | 61 | 57 | 77 |
| X6 | 83 | 77 | 88 | 87 | 86 | 91 | 58 | 75 |
| X7 | 71 | 74 | 78 | 86 | 84 | 57 | 76 | 78 |
| X8 | 55 | 65 | 62 | 77 | 93 | 56 | 89 | 83 |
| X9 | 76 | 78 | 90 | 78 | 76 | 89 | 94 | 84 |
| X10 | 85 | 81 | 89 | 77 | 52 | 77 | 57 | 85 |
| X11 | 83 | 78 | 77 | 75 | 75 | 76 | 87 | 76 |
| X12 | 58 | 76 | 75 | 93 | 87 | 56 | 53 | 78 |
| X13 | 77 | 79 | 57 | 91 | 75 | 91 | 81 | 90 |
| X14 | 56 | 57 | 87 | 90 | 86 | 65 | 91 | 87 |
| X15 | 54 | 79 | 83 | 84 | 76 | 69 | 74 | 67 |
| Yg (area mean) | 70 | 79 | 81 | 85 | 79 | 72 | 77 | 80 |

Squared deviations of each observation from its area mean:

| Test no. (n) | (Yi1-Yg1)^2 | (Yi2-Yg2)^2 | (Yi3-Yg3)^2 | (Yi4-Yg4)^2 | (Yi5-Yg5)^2 | (Yi6-Yg6)^2 | (Yi7-Yg7)^2 | (Yi8-Yg8)^2 |
|---|---|---|---|---|---|---|---|---|
| 1 | 81 | 4 | 25 | 4 | 9 | 196 | 1 | 196 |
| 2 | 256 | 169 | 169 | 9 | 16 | 36 | 361 | 64 |
| 3 | 196 | 81 | 1 | 1 | 625 | 1 | 81 | 9 |
| 4 | 81 | 49 | 36 | 16 | 100 | 289 | 1 | 529 |
| 5 | 36 | 121 | 4 | 25 | 121 | 144 | 400 | 9 |
| 6 | 121 | 4 | 49 | 4 | 36 | 324 | 361 | 25 |
| 7 | 1 | 25 | 9 | 1 | 16 | 256 | 1 | 4 |
| 8 | 289 | 196 | 361 | 64 | 169 | 289 | 144 | 9 |
| 9 | 16 | 1 | 81 | 49 | 16 | 256 | 289 | 16 |
| 10 | 169 | 4 | 64 | 64 | 784 | 16 | 400 | 25 |
| 11 | 121 | 1 | 16 | 100 | 25 | 9 | 100 | 16 |
| 12 | 196 | 9 | 36 | 64 | 49 | 289 | 576 | 4 |
| 13 | 25 | 0 | 576 | 36 | 25 | 324 | 16 | 100 |
| 14 | 256 | 484 | 36 | 25 | 36 | 64 | 196 | 49 |
| 15 | 260 | 0 | 4 | 1 | 9 | 8 | 9 | 162 |
| S sq. (sum of squares) | 2104 | 1148 | 1467 | 463 | 2036 | 2501 | 2936 | 1217 |
| Variance | 280.57 | 153.08 | 195.64 | 61.79 | 271.47 | 333.50 | 391.47 | 162.29 |
| St. dev. | 16.8 | 12.4 | 14.0 | 7.9 | 16.5 | 18.3 | 19.8 | 12.7 |

WSS (within sum of squares): Σj Σi (Yij - Ygj)^2 = 13873.42

Degrees of freedom: total sample size minus number of groups, N - k = 120 - 8 = 112

Within estimate of variance: WSS/(N - k) = 13873.42/112 = 123.87

The overall mean of Y is the mean of the combined sample of N = 120, which is: Ygg = (Σj Σi Yij)/N = 78

BSS (between sum of squares): Σj nj (Ygj - Ygg)^2 = 2482.19

This sum of squares is based on k - 1 degrees of freedom (number of groups minus 1): 8 - 1 = 7

so that the between estimate of variance is: BSS/(k - 1) = 2482.19/7 = 354.60

and the F ratio is: F = [BSS/(k - 1)] / [WSS/(N - k)] = 354.60/123.87 = 2.86

| Source | Sum of Squares | DF | Mean Square | F | Prob > F |
|---|---|---|---|---|---|
| Between | 2482.19 | k - 1 = 7 | 354.60 | 2.86 | P < 0.01 |
| Within | 13873.42 | k(n - 1) = 112 | 123.87 | | |
| Total | 16355.61 | kn - 1 = 119 | | | |

Whenever H0 is true, this ratio F will have a value near 1, where H0 is the hypothesis of "no difference" in the population means.
Few tables of the F distribution are so comprehensive that they include all possible combinations of degrees of freedom. The nearest tabulated values are:
F.95(7, 120) = 2.09 and F.99(7, 120) = 2.79
F.95(7, ∞) = 2.01 and F.99(7, ∞) = 2.64
The conclusion is that the mean square sb^2 = 354.60 is so much larger than the mean square sw^2 = 123.87 that the difference is significant: F = 2.86 exceeds the 1% critical value F.99(7, 120) = 2.79 (the critical value for 112 degrees of freedom is only marginally larger).
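The comparison of an F ratio with tabulated critical values can be sketched in a few lines of Python. This is only an illustration: the dictionary holds the four critical values quoted above for 7 numerator degrees of freedom, while the names `CRITICAL_F_7` and `strictest_level` and the F ratios in the loop are hypothetical and chosen just to exercise each outcome.

```python
# Critical values of F for 7 numerator degrees of freedom, as quoted in the text.
# Outer key: denominator degrees of freedom; inner key: significance level alpha.
CRITICAL_F_7 = {
    120: {0.05: 2.09, 0.01: 2.79},
    float("inf"): {0.05: 2.01, 0.01: 2.64},
}


def strictest_level(f_ratio, df_within):
    """Return the smallest tabulated alpha at which f_ratio is significant.

    Returns None when f_ratio does not exceed any tabulated critical value.
    df_within must be one of the tabulated rows (120 or infinity).
    """
    row = CRITICAL_F_7[df_within]
    for alpha in sorted(row):  # check the 0.01 level before the 0.05 level
        if f_ratio > row[alpha]:
            return alpha
    return None


# Hypothetical F ratios, used only to show the three possible outcomes.
for f in (1.5, 2.5, 3.0):
    print(f, strictest_level(f, 120))
```

When the actual denominator degrees of freedom fall between two tabulated rows, the row with the smaller degrees of freedom has the larger critical values and therefore gives the more cautious verdict.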
The sum of squares between the area means plus the sum of squares within the areas add up to the total sum of squares. Each sum of squares is also called a variation. When we divide a variation by the appropriate degrees of freedom, we get the estimated variance. The differences between the columns are "explained" by the fact that the values come from different parent populations (areas, pollution, or other unrecognized phenomena). The variance within the columns may be explained by the measuring apparatus, by the person who takes the measurements, or by other factors not yet known. Without knowing the relative importance of the different sources of variation (and how large a share is explained by chance), you cannot improve the measurements, because you do not know where to begin.
The form of analysis of variance considered here is also called simple analysis of variance. It assumes that the sets of observations are classified into k groups. In each group the number of observations (n) may vary or be held constant. If it varies, you have to use a weighted average of the group means, with each group weighted by its size. A constant number in each group is the simplest case and is the one used here. The method does not change much if these assumptions are altered a little.
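A minimal Python sketch of these steps is shown below, including the case where the group sizes differ. The function name `one_way_anova`, the dictionary keys, and the three small groups are hypothetical and are not taken from the antibiotic data. The overall mean is computed as the size-weighted average of the group means, which equals the mean of the combined sample.

```python
def one_way_anova(groups):
    """One-factor analysis of variance for a list of groups of numbers.

    Group sizes may differ. Returns the sums of squares, the degrees of
    freedom, the two variance estimates, and the F ratio.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    group_means = [sum(g) / len(g) for g in groups]

    # Weighted average of the group means, with the group sizes as weights.
    grand_mean = sum(len(g) * m for g, m in zip(groups, group_means)) / n_total

    # Between sum of squares: group means around the overall mean.
    bss = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within sum of squares: observations around their own group mean.
    wss = sum((y - m) ** 2 for g, m in zip(groups, group_means) for y in g)
    # Total sum of squares: observations around the overall mean.
    tss = sum((y - grand_mean) ** 2 for g in groups for y in g)

    df_between = k - 1
    df_within = n_total - k
    ms_between = bss / df_between
    ms_within = wss / df_within
    return {
        "bss": bss, "wss": wss, "tss": tss,
        "df_between": df_between, "df_within": df_within,
        "ms_between": ms_between, "ms_within": ms_within,
        "f": ms_between / ms_within,
    }


# Hypothetical example with unequal group sizes (3, 2 and 4 observations).
result = one_way_anova([[1, 2, 3], [2, 4], [5, 6, 7, 8]])
print(result)
```

In this example the between and within sums of squares add up to the total sum of squares, apart from floating-point rounding, which gives a quick check on the arithmetic.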