Designing and Reporting Experiments in Psychology Peter Harris
     
 
 
 
Designing & Reporting Experiments in Psychology 3/e
 
 
Related Statistics Books
 
  Pallant, SPSS Survival Manual  
     
  Greene & D'Oliveira, Learning to Use Statistical Tests in Psychology  
     
   
Reporting specific inferential statistics

 

B7 More about analysis of variance

B7.1 Multiple comparisons

After using analysis of variance for only a short while you will discover that your F ratios often need to be followed by further tests - either to locate the differences creating a significant main effect where you have three or more conditions on your IV (described in this section) or to locate the precise difference(s) that are creating a significant interaction (described in the next section, Section B7.2).

Comparisons to unpack a main effect can either be planned or unplanned, depending on whether you have predicted the difference. Unplanned tests are also called post hoc tests. If your IV is quantitative you can also test for trends in your data.

Whether planned or unplanned, these further tests usually take the form of additional ANOVAs or t tests in which the error term or degrees of freedom are modified in some way specific to the comparison or to compensate for the fact that the difference was not predicted. You can find out more about this in Greene and D'Oliveira (third edition) Chapter 16 and in Part Five of Pallant (third edition).

There is no particular rule about when to report these, except that logically they appear somewhere after you have reported the relevant omnibus F. An omnibus F is the overall F ratio that tests for an interaction or main effect. (See Sections 13.4 and 13.5 of the book for more on main effects and interactions.) When the numerator of this F ratio has degrees of freedom equal to 2 or more, a significant omnibus F tells you that there is evidence of differences somewhere among the means, but not precisely where any such differences lie. This is a signal that you will need to conduct further tests to locate the difference.

Example

In an experiment researchers tested whether the 16 students who received a positive comment from their tutor once in every class performed differently in the end of course examination from the 16 students who received a mildly critical comment from their tutor once in every class or the 16 control students who received neither comment. Mean percentages in the examination were M = 64.4 (positive group), M = 59.4 (criticised group) and M = 56.3 (control group). Analysis using one-way analysis of variance for unrelated measures revealed a significant main effect of tutor's behaviour on examination performance, F (2, 45) = 7.49, p = .002, at an alpha of .05. Further comparisons are required to locate where the differences among the means lie.
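As a rough check on the figures reported above, the degrees of freedom and the p value can be recomputed from the design. The sketch below is in Python, which is not used in the original text, and the function name is purely illustrative. It assumes three unrelated groups of 16 students. It also relies on the fact that, when the numerator of F has exactly 2 degrees of freedom, the upper-tail probability has a simple closed form, so no statistics package is needed.

```python
# Sketch only: recompute the df and p value for the tutor-comment example.
# Assumes 3 unrelated groups of 16 students; all names are illustrative.

def p_value_f_with_2_numerator_df(f_ratio, df_error):
    """Upper-tail p for F(2, df_error). Valid ONLY when numerator df = 2."""
    return (1 + 2 * f_ratio / df_error) ** (-df_error / 2)

n_groups = 3
n_per_group = 16
n_total = n_groups * n_per_group   # 48 students in all

df_between = n_groups - 1          # numerator df
df_within = n_total - n_groups     # denominator (error) df

p = p_value_f_with_2_numerator_df(7.49, df_within)
print(f"F ({df_between}, {df_within}) = 7.49, p = {p:.3f}")
```

The p value comes out at roughly .0016, which rounds to the reported .002. For any other numerator degrees of freedom you would need your statistics package rather than this shortcut.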

Planned comparisons

Had these differences been predicted prior to the experiment, you could conduct planned comparisons. These can take a variety of forms. The ones described below are the most complicated and involve the use of orthogonal contrasts. Orthogonal contrasts test independent questions about the data: knowing that one orthogonal contrast is statistically significant tells you nothing about whether any of the others will be. You can make as many orthogonal contrasts as there are degrees of freedom for the main effect. In the above example the main effect of tutor's behaviour on examination performance has 2 degrees of freedom - the first of the two values in F (2, 45) - so the contrasts employed here test two orthogonal questions: (1) Is the combined mean of the two treatment groups, positive and critical comments, different from the control group mean? (2) Does the mean of the positive comment group differ from the mean of the criticised group? For the first contrast, t = -3.04, with degrees of freedom = 45 and p = .004. For the second contrast, t = 2.39, with degrees of freedom = 45 and p = .021. These analyses could be written succinctly following the omnibus analysis as:

  Planned comparisons employing orthogonal contrasts revealed that, when combined, the groups receiving positive or critical comments performed significantly better in the end of course examination than did the control group, t (45) = -3.04, p = .004, but also that the positive comment group performed significantly better than did the group receiving critical comments, t (45) = 2.39, p = .021.

Note, however, that you should ONLY use planned comparisons when you have quite clearly predicted, BEFORE running the study, that you will make these comparisons. If you have not done this then you need to use the stricter post hoc tests. This is because, if you test for differences only once you can see which conditions look as though they differ from each other, you need to control for the fact that you are (at least in your mind's eye) comparing all possible pairs of conditions. You can find more about this in Greene and D'Oliveira (third edition) Chapter 16 and in Part Five of Pallant (third edition).

If you are using SPSS you can find an option for specifying contrasts in the contrasts window of One Way ANOVA and in the General Linear Model analyses.

For example, the following contrasts would test the orthogonal contrasts specified in the above example:

 

              Positive comment    Critical comment    Control
Question 1           +1                  +1              -2
Question 2           +1                  -1               0

Question 1: Is the combined mean of the two treatment groups, positive and critical comments, different from the control group mean? (The contrasts weight the means for the positive and critical comment conditions equally and with the same sign, but each is given half the weight of the control, which has the opposite sign. These coefficients tell the computer to test the combined mean of the two treatment groups [+1, +1] against the value for the control [-2].)

Question 2: Does the mean of the positive comment group differ from the mean of the criticised group? (Now the contrasts weight the means for the positive and critical comment equally but with different signs, so they are now contrasted with each other rather than combined. The zero weight for the control group eliminates it from the analysis, so now only the means of the two treatment groups are compared.)
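The logic of the two sets of coefficients can be checked with a few lines of arithmetic. The sketch below is in Python, which is not used in the original text, and its names are illustrative only. It applies the coefficients to the three group means from the example and assumes equal group sizes, which is the condition under which the simple "products sum to zero" test of orthogonality holds.

```python
# Sketch only: check the two contrasts from the tutor-comment example.
# Group order throughout: positive comment, critical comment, control.
means = [64.4, 59.4, 56.3]
question_1 = [1, 1, -2]   # two treatment groups combined vs control
question_2 = [1, -1, 0]   # positive comment vs critical comment

def weighted_sum(weights, values):
    """Multiply each value by its weight and add up the results."""
    return sum(w * v for w, v in zip(weights, values))

# The coefficients of each contrast should sum to zero.
each_sums_to_zero = sum(question_1) == 0 and sum(question_2) == 0

# With equal group sizes, two contrasts are orthogonal when the products
# of their corresponding coefficients sum to zero.
orthogonal = weighted_sum(question_1, question_2) == 0

estimate_1 = weighted_sum(question_1, means)  # (64.4 + 59.4) - 2 * 56.3
estimate_2 = weighted_sum(question_2, means)  # 64.4 - 59.4

print(each_sums_to_zero, orthogonal)
print(round(estimate_1, 1), round(estimate_2, 1))
```

The sign of each estimate (and of the resulting t) simply reflects which conditions carry the positive coefficients. Reversing every sign within a contrast reverses the sign of t but leaves its p value unchanged.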

Note that if this all seems too complicated to you and impossible to follow, don't worry. You do not need to run orthogonal contrasts - just try to understand how the coefficients instruct the computer to combine conditions (where they share the same sign) and contrast conditions (where the signs differ) and weight conditions the same (where they share the same number) or differently (where the number differs). You can then just test whatever contrast you have predicted irrespective of whether a set of contrasts is orthogonal.

Post hoc comparisons

The performance of the criticised group in the above example is surprising and is quite likely to have been unexpected. (We might have expected people receiving critical comments from their tutor to have lost confidence and to perform worst of all, yet this is not the case.) If we had not predicted this difference, therefore, we would need to analyse our data more cautiously. Post hoc tests are designed for this purpose - for testing for differences that we did not explicitly predict before we collected the data. They are statistically more conservative. There is a range of post hoc tests available and these are easy to find and use in most software packages. The one used here is one of the most frequently used, Tukey's HSD test. (There are different variants of this test depending on whether the sample sizes in the groups being compared are the same or different - however, your software package should take care of this for you.) Like all post hoc tests it has its strengths and weaknesses. The analyses revealed:

 

Post hoc comparisons using Tukey's HSD test revealed that the group receiving positive comments performed significantly better in the end of course examination than did the control group or the group receiving critical comments. However, the difference in performance between the control group and the critical comment group was not significant.

So, you can see here that the unexpected difference in performance was not large enough to be statistically significant when tested properly using a relevant post hoc test.

If you have put the relevant DVs in a table, then a very convenient way of displaying the results of post hoc tests is by using subscripts to the means in that table. Means that share a subscript do not differ significantly from each other at whatever alpha level you have chosen to use for the study. The above results could, for example, be displayed in this way:

Table 1
The Effect of the Comments on Mean Performance on the End of Course Examination

        Positive comment    Critical comment    Control
M            64.4_a              59.4_b          56.3_b
SD           10.3                12.3            11.6

Note. Scores are marks out of 100. Means sharing subscripts do not differ significantly (alpha = .05) on post hoc tests (Tukey's HSD).

Tests of trend

One particular form of planned comparison involves testing for trends in the data. You can do this when you have measured your IV on a quantitative scale. (The DV is, of course, always quantitative.) For example, in the alcohol and driving experiment described in Chapter 10 of the book, we could test to see whether there was a predictable relationship between the amount of alcohol consumed by the driver and their errors in driving performance. One possibility is that there might be a linear trend in the data. A linear trend is a straight-line relationship, such that for every additional glass of alcohol consumed there is a constant increase in errors. (In fact a linear trend is only likely to hold for low to moderate amounts of alcohol. After enough alcohol the driving performance will probably become so poor that more alcohol will lead to little or no further measurable deterioration.) There are other types of trend you can test for as well, such as the quadratic trend, which is a U (or inverted U) shape.

Example

Four different groups of respondents drove on a driving simulator after 0, 1, 2, or 3 standard measures (units) of alcohol. Mean errors were 4.00, 9.00, 12.00, and 13.00 respectively. Given that the alcohol IV has more than two levels and has been measured quantitatively, we can include contrasts to test whether there is any evidence of a linear increase in errors with amount of alcohol:

 

Analysis using one-way analysis of variance for unrelated measures revealed a significant main effect of alcohol on the number of errors made on the driving simulator, F (3, 16) = 25.13, p < .001, at an alpha of .05. There was a significant linear trend in the data, F (1, 16) = 69.23, p < .001, with increases in alcohol being accompanied by increases in errors.
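A test of trend is a contrast like any other, using a standard set of coefficients. The sketch below is in Python, which is not used in the original text, and its names are illustrative only. It applies the usual linear (-3, -1, +1, +3) and quadratic (+1, -1, -1, +1) coefficients for four equally spaced levels to the means above. It assumes 5 drivers per group, which is an inference from the reported degrees of freedom (16 + 4 = 20 participants over 4 groups), not something stated in the example.

```python
# Sketch only: trend contrasts for the alcohol example.
# Assumes 4 equally spaced doses and 5 drivers per group (inferred from df).
means = [4.00, 9.00, 12.00, 13.00]   # errors after 0, 1, 2, 3 units
n_per_group = 5                      # assumption, see note above

linear = [-3, -1, 1, 3]              # straight-line trend
quadratic = [1, -1, -1, 1]           # U or inverted-U trend

def contrast_ss(weights, group_means, n):
    """Sum of squares for a contrast when every group has n scores."""
    estimate = sum(w * m for w, m in zip(weights, group_means))
    return n * estimate ** 2 / sum(w * w for w in weights)

grand_mean = sum(means) / len(means)
ss_between = n_per_group * sum((m - grand_mean) ** 2 for m in means)

ss_linear = contrast_ss(linear, means, n_per_group)
ss_quadratic = contrast_ss(quadratic, means, n_per_group)
linear_share = ss_linear / ss_between

print(ss_between, ss_linear, ss_quadratic)
print(f"Linear trend accounts for {linear_share:.0%} of between-groups variation")
```

Under these assumptions the linear component accounts for most of the variation among the four means, which fits the large linear-trend F reported above. The smaller quadratic component reflects the way the increase in errors flattens off at the higher doses.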

If you are using SPSS you can find an option for testing for linear trend ("linearity") with a one way ANOVA with an unrelated measures IV in the compare means analysis. You can also specify it (and other tests of trend) by ticking the "polynomial" box in the contrasts window of One Way ANOVA. You can also define contrasts to specify such comparisons in the contrasts window of the General Linear Model analyses. (If you are unfamiliar with the term unrelated measures IV, see Section 10.2 of the book. If you are unfamiliar with terms like "one way" and "two way", see Section 13.3 of the book.) Indeed, SPSS tends to print out tests of trend as the default whenever these can be calculated.

Some issues to watch out for

Basic textbooks of statistics tend to cover post hoc tests better than they do the use of planned comparisons. Post hoc tests are often also easier to run on statistical software packages. Make sure that you report which post hoc test you have used - there are many and they differ in their assumptions and in when they should be used. Some tests require the omnibus test to be significant for further comparisons to be made. You may need to adjust the significance level of the tests to control for what is known as the familywise error rate. Basically, this means that repeated analysis of the same set of data increases your type I error rate and you need to do something about it. (For more on type I error, see Section 11.3 of the book.) One relatively straightforward thing to do is to make the significance level of each test more stringent using what is called the Bonferroni adjustment. You can find out more about how to do this in Greene and D'Oliveira (third edition) Chapter 16 and in Part Five of Pallant (third edition).
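The arithmetic behind the familywise error rate and the Bonferroni adjustment is simple enough to sketch. The example below is in Python, which is not used in the original text, and its names are illustrative only. It takes the three-group tutor-comment design and assumes that every pair of groups is compared. The familywise figure also assumes the tests are independent, which pairwise comparisons on the same data are not, so treat it as an approximation.

```python
# Sketch only: familywise error rate and the Bonferroni adjustment.
# Assumes all pairwise comparisons among 3 groups, each tested at alpha = .05.
alpha = 0.05
n_groups = 3

# Number of distinct pairs of groups: k(k - 1) / 2
n_comparisons = n_groups * (n_groups - 1) // 2

# Approximate chance of at least one type I error across all the tests,
# treating the tests as independent (an approximation).
familywise_error = 1 - (1 - alpha) ** n_comparisons

# Bonferroni: divide alpha by the number of tests and use the result
# as the significance level for each individual test.
bonferroni_alpha = alpha / n_comparisons

print(n_comparisons)
print(round(familywise_error, 3))
print(round(bonferroni_alpha, 4))
```

With three comparisons the chance of at least one type I error rises from .05 to roughly .14, and the Bonferroni adjustment brings it back under control by testing each comparison at about .017 instead of .05.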

 

 

 

 

Open University Press, McGraw-Hill