School of Mathematics and Statistics - Theses

Permanent URI for this collection

Search Results

Now showing 1 - 1 of 1
  • Item
    Thumbnail Image
    The performance of multiple hypothesis testing procedures in the presence of dependence
    Clarke, Sandra Jane ( 2010)
    Hypothesis testing is foundational to the discipline of statistics. Procedures exist which control for individual Type I error rates and more global or family-wise error rates for a series of hypothesis tests. However, the ability of scientists to produce very large data sets with increasing ease has led to a rapid rise in the number of statistical tests performed, often with small sample sizes. This is seen particularly in the area of biotechnology and the analysis of microarray data. This thesis considers this high-dimensional context with particular focus on the effects of dependence on existing multiple hypothesis testing procedures. While dependence is often ignored, there are many existing techniques employed currently to deal with this context but these are typically highly conservative or require difficult estimation of large correlation matrices. This thesis demonstrates that, in this high-dimensional context when the distribution of the test statistics is light-tailed, dependence is not as much of a concern as in the classical contexts. This is achieved with the use of a moving average model. One important implication of this is that, when this is satisfied, procedures designed for independent test statistics can be used confidently on dependent test statistics. This is not the case however for heavy-tailed distributions, where we expect an asymptotic Poisson cluster process of false discoveries. In these cases, we estimate the parameters of this process along with the tail-weight from the observed exceedences and attempt to adjust procedures. We consider both conservative error rates such as the family-wise error rate and more popular methods such as the false discovery rate. We are able to demonstrate that, in the context of DNA microarrays, it is rare to find heavy-tailed distributions because most test statistics are averages.