Tukey's B method

Tukey's B method, also known as the Tukey-Kramer B procedure or Tukey's Wholly Significant Difference (WSD), is a post-hoc multiple comparison statistical test. It identifies which specific group means differ significantly from each other after an analysis of variance (ANOVA) has produced a statistically significant result. It is considered a compromise between two other popular multiple comparison procedures: Tukey's range test (Tukey's honestly significant difference, or HSD, test) and the Newman-Keuls method.

The primary purpose of post-hoc tests like Tukey's B is to control the family-wise error rate (FWER), which is the probability of making at least one Type I error across the whole family of comparisons. Without such control, this probability grows with the number of comparisons made. With $k$ groups there are $k(k-1)/2$ pairwise comparisons.
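The short Python sketch below makes this inflation concrete. It counts the pairwise comparisons among $k$ groups and computes $1 - (1 - \alpha)^m$, the chance of at least one Type I error in $m$ tests that are each run at level $\alpha$. That formula assumes the tests are independent. Pairwise comparisons of means are not independent, so treat the result as a rough illustration only. The function names are hypothetical.

```python
def pairwise_comparisons(k: int) -> int:
    """Number of distinct pairs among k group means: k(k - 1) / 2."""
    return k * (k - 1) // 2


def fwer_if_independent(alpha: float, m: int) -> float:
    """Probability of at least one Type I error across m tests, each at
    level alpha, ASSUMING the tests are independent (a simplification:
    pairwise comparisons of means share data and are correlated)."""
    return 1.0 - (1.0 - alpha) ** m


if __name__ == "__main__":
    for k in (2, 3, 5, 10):
        m = pairwise_comparisons(k)
        print(f"k={k:2d}  pairs={m:2d}  FWER approx {fwer_if_independent(0.05, m):.3f}")
```

For five groups (10 pairs) tested at $\alpha = 0.05$, the value is about 0.40.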

History and context

Multiple comparison procedures grew out of the work of Ronald Fisher, John Tukey and others in the mid-20th century. Tukey's HSD test is a conservative method that keeps the FWER at or below the chosen significance level (e.g., $\alpha = 0.05$). By contrast, the Newman-Keuls (NK) method offers higher statistical power but is known to be anti-conservative. It does not strictly control the FWER, and its actual FWER can exceed the nominal level as the number of groups increases.

Tukey's B method was introduced to provide an intermediate level of conservatism. It seeks to balance the strict error control of HSD with the greater sensitivity to differences offered by Newman-Keuls.

Methodology

Tukey's B method operates by comparing all possible pairs of means. For each pair, it calculates a critical value based on the studentized range distribution.

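Tukey's HSD uses a single critical value derived from the total number of groups ($k$). Newman-Keuls uses critical values that depend on how many steps apart two means are in the ordered list ($r$, the number of means spanned by the pair, $2 \le r \le k$). Tukey's B takes its critical value ($q_B$) to be the simple arithmetic mean of the critical values from those two procedures:

$$q_B = \frac{q_{\alpha;k,\nu} + q_{\alpha;r,\nu}}{2}$$

Here $q_{\alpha;m,\nu}$ denotes the upper-$\alpha$ critical value of the studentized range distribution for $m$ means and $\nu$ error degrees of freedom from the ANOVA.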
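The hypothetical helper below illustrates the averaging step. It builds $q_B$ for every span $r$ from a table of studentized range critical values. The example values are for $\alpha = 0.05$, $\nu = 12$ and four groups, rounded to two decimals as in common printed tables. Treat them as illustrative inputs. In practice they could be computed with a library such as SciPy, for example `scipy.stats.studentized_range.ppf(1 - alpha, m, nu)`.

```python
def tukey_b_critical_values(q_table):
    """Given q_table[m] = q_{alpha; m, nu} for m = 2..k, return {r: q_B}
    with q_B = (q_{alpha; k, nu} + q_{alpha; r, nu}) / 2."""
    k = max(q_table)        # the widest span equals the number of groups
    q_hsd = q_table[k]      # Tukey's HSD: one critical value for every pair
    return {r: (q_hsd + q_table[r]) / 2 for r in sorted(q_table)}


if __name__ == "__main__":
    # Illustrative inputs: alpha = 0.05, nu = 12, values rounded to 2 d.p.
    q_table = {2: 3.08, 3: 3.77, 4: 4.20}
    for r, q_b in tukey_b_critical_values(q_table).items():
        print(f"r={r}: NK={q_table[r]:.2f}  B={q_b:.3f}  HSD={q_table[4]:.2f}")
```

With these inputs, $q_B$ is 3.64 for adjacent means and 3.985 for $r = 3$. For $r = k$ it equals the HSD value of 4.20. Tukey's B therefore always lies between the two procedures it averages.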

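The absolute difference between two means, $|\bar{x}_i - \bar{x}_j|$, is then compared against a critical difference computed from the Tukey's B critical value $q_B$ for that pair:

$$CD = q_B \sqrt{\frac{MSE}{2}\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}$$

where:

$MSE$ is the mean squared error from the ANOVA, and
$n_i$ and $n_j$ are the sample sizes of the groups being compared.

With equal group sizes $n$, this reduces to $CD = q_B\sqrt{MSE/n}$. If $|\bar{x}_i - \bar{x}_j| > CD$, the difference is declared statistically significant.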
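A minimal Python sketch of the pairwise procedure described above might look like the following. It is an illustration under stated assumptions, not a reference implementation:

- The function `tukey_b` and its arguments are hypothetical.
- Critical values are supplied as a table. The example uses rounded values for $\alpha = 0.05$ and $\nu = 12$ from a standard studentized range table.
- Every pair is tested directly against its critical difference, without any step-down ordering rule that some software may apply.

```python
import math
from itertools import combinations


def tukey_b(means, sizes, mse, q_table):
    """Sketch of Tukey's B pairwise comparisons (hypothetical helper).

    means:   dict, group label -> sample mean
    sizes:   dict, group label -> sample size n_i
    mse:     mean squared error from the one-way ANOVA
    q_table: dict, m -> q_{alpha; m, nu} for m = 2..k
    Returns a list of (group_i, group_j, r, q_B, CD, significant).
    """
    order = sorted(means, key=means.get)   # group labels sorted by mean
    k = len(order)
    q_hsd = q_table[k]                     # HSD value, based on all k groups
    results = []
    for a, b in combinations(range(k), 2):
        gi, gj = order[a], order[b]
        r = b - a + 1                      # number of means spanned by the pair
        q_b = (q_hsd + q_table[r]) / 2     # average of HSD and NK critical values
        cd = q_b * math.sqrt(mse / 2 * (1 / sizes[gi] + 1 / sizes[gj]))
        diff = abs(means[gi] - means[gj])
        results.append((gi, gj, r, q_b, cd, diff > cd))
    return results


if __name__ == "__main__":
    # Hypothetical data: 4 groups of 4 observations, so nu = 16 - 4 = 12.
    means = {"A": 10.0, "B": 12.5, "C": 11.0, "D": 15.3}
    sizes = {g: 4 for g in means}
    # Illustrative inputs: alpha = 0.05, nu = 12, rounded table values.
    q_table = {2: 3.08, 3: 3.77, 4: 4.20}
    for gi, gj, r, q_b, cd, sig in tukey_b(means, sizes, 2.0, q_table):
        print(f"{gi}-{gj}: r={r} q_B={q_b:.3f} CD={cd:.3f} significant={sig}")
```

With these hypothetical data, the B-D difference of 2.8 exceeds its Tukey's B critical difference (about 2.57). It would fall short of the single HSD critical difference, $4.20\sqrt{0.5} \approx 2.97$.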

Characteristics and comparison with other methods

Tukey's B method is a standard post-hoc option in statistical packages such as SPSS, and provides a middle ground for researchers:

Error rate control: it offers better control over the family-wise error rate than the Newman-Keuls method, but is less conservative than Tukey's HSD.
Statistical power: it generally has greater statistical power than Tukey's HSD, making it more likely to detect true differences.
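Under the complete null hypothesis (all population means equal), every significant pair is a Type I error, so the trade-off above can be illustrated by simulation. This is a rough sketch, not a validation study. It assumes four normal groups of four observations and uses rounded tabulated critical values for $\alpha = 0.05$ and $\nu = 12$. Each critical difference is applied to every pair without a step-down rule. The names `any_rejection` and `simulate_fwer` are hypothetical.

For every pair, the Tukey's B critical value never exceeds the HSD value. Every simulated data set flagged by HSD is therefore also flagged by Tukey's B, so the Tukey's B FWER estimate is at least as high. The HSD estimate should sit near the nominal 0.05.

```python
import math
import random
from itertools import combinations

# Illustrative setup: K groups of N observations, nu = K * (N - 1) = 12.
# Rounded alpha = 0.05 studentized range critical values for nu = 12.
K, N = 4, 4
Q_TABLE = {2: 3.08, 3: 3.77, 4: 4.20}


def any_rejection(groups, use_tukey_b):
    """True if at least one pair of group means is declared different.

    HSD compares every pair with Q_TABLE[K]; Tukey's B uses the average of
    Q_TABLE[K] and Q_TABLE[r], where r is the number of means spanned.
    Every pair is tested directly (no step-down rule).
    """
    means = [sum(g) / N for g in groups]
    # ANOVA mean squared error: pooled within-group variance, df = K(N - 1).
    sse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    mse = sse / (K * (N - 1))
    se = math.sqrt(mse / N)  # equal n: CD = q * sqrt(MSE / n)
    ordered = sorted(means)
    for a, b in combinations(range(K), 2):
        r = b - a + 1
        q = (Q_TABLE[K] + Q_TABLE[r]) / 2 if use_tukey_b else Q_TABLE[K]
        if ordered[b] - ordered[a] > q * se:
            return True
    return False


def simulate_fwer(reps=10000, seed=1):
    """Estimated FWER of HSD and Tukey's B when all population means are
    equal, so any rejection is a Type I error."""
    rng = random.Random(seed)
    hits_hsd = hits_b = 0
    for _ in range(reps):
        groups = [[rng.gauss(0.0, 1.0) for _ in range(N)] for _ in range(K)]
        hits_hsd += any_rejection(groups, use_tukey_b=False)
        hits_b += any_rejection(groups, use_tukey_b=True)
    return hits_hsd / reps, hits_b / reps


if __name__ == "__main__":
    fwer_hsd, fwer_b = simulate_fwer()
    print(f"HSD: {fwer_hsd:.3f}  Tukey's B: {fwer_b:.3f}")
```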

Statistical criticism

In contemporary statistical practice, the procedure has largely fallen out of favor due to several factors:

Theoretical grounding: Tukey's HSD takes its critical value directly from the studentized range distribution. Tukey's B averages two such critical values, and there is no rigorous mathematical justification for doing so.
Error rate control: because it is a hybrid, it does not guarantee the same level of family-wise error rate protection as more modern, stepwise procedures.
Availability of alternatives: more powerful and theoretically sound procedures, such as the Ryan-Einot-Gabriel-Welsch (REGW) procedure or the Fisher-Hayter test, have made Tukey's B largely obsolete in modern practice, even though some packages, such as SPSS, still offer it.
