Group sequential testing


Typically the sample size of a study is fixed, ideally based on a power analysis. We collect all our data, then analyse it once with the appropriate statistical model. This is an ideal that people often fall short of in practice: it's common for scientists and analysts alike to check in on an experiment as the data come in (Albers, 2019; Miller, 2010). There's a lot of potential for error here if experiments are stopped or changed based on ad hoc peeks at the data, because every extra look is another chance to cross the significance threshold by chance alone.

One way to address this issue is to go down the pre-registration route, where the full experimental protocol is specified in advance and published for others to see. This makes it harder to deviate from your plan without people noticing. Another option is to allow interim analyses as the data accumulate.

Sequential analysis describes a range of techniques that allow researchers to carry out interim analyses. An experiment may then be stopped at one of these interim analyses, meaning the sample size is variable rather than fixed. The point of sequential analysis techniques is that they let you test sequentially in a principled way, rather than peeking at the data at random (see Albers, 2019 for an introduction to sequential testing). Sequential analysis is a broad topic covering a wide range of techniques (Whitehead, 2005 provides a nice historical overview).

This post will focus on group sequential designs as a method for sequential analysis. An outline of these methods is provided below, after introducing the problem that sequential analysis addresses.

Previously I outlined a set of functions to simulate A/B tests with a binary outcome. These functions are used here, so if you're interested in how they work, go read the previous post.

To begin with, let's simulate a set of experiments. We'll simulate 1,000 experiments, each lasting 4 weeks, with 500 new people each week. The base rate in the control group is set to 50%, and there is no difference between treatment and control (treatment_effect = 0).

Now that there's a set of tests, some naive early stopping can be added. Here a flag is added to indicate weeks where the \(p\)-value for a proportion test is below 0.05.

```r
  # flag for whether p-value is less than 0.05
  # first p-value stopped at; if stop = 0 for
  # all weeks, this will be the p-value for the first week
  mutate(stop_p_value = first(p_value, desc(stop))) %>%
```
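To make the naive early-stopping problem concrete, here is a minimal, self-contained sketch of the simulation described above. It does not use the helper functions from the previous post; instead it assumes base R only, that the 500 new people each week are split evenly (250 per arm), and that each weekly look runs `prop.test()` on all data collected so far. The helper name `simulate_experiment` is hypothetical. The step that computes `stop_p_value` mirrors the `first(p_value, desc(stop))` logic: take the first week that crossed 0.05, or the week-1 p-value if none did.

```r
# Minimal sketch: naive weekly peeking at A/B tests with no true effect.
# Assumptions (not from the original post): base R only, 500 people per week
# split evenly between arms, and each weekly look uses all data so far.
set.seed(2024)

n_experiments    <- 1000  # number of simulated experiments
n_weeks          <- 4     # each experiment lasts 4 weeks
n_arm_per_week   <- 250   # assumed: 500 new people per week, 250 per arm
base_rate        <- 0.5   # conversion rate in the control group
treatment_effect <- 0     # no true difference between treatment and control

# Hypothetical helper: one cumulative p-value per week for a single experiment
simulate_experiment <- function() {
  control   <- cumsum(rbinom(n_weeks, n_arm_per_week, base_rate))
  treatment <- cumsum(rbinom(n_weeks, n_arm_per_week, base_rate + treatment_effect))
  n_cum     <- n_arm_per_week * seq_len(n_weeks)
  vapply(seq_len(n_weeks), function(w) {
    prop.test(x = c(control[w], treatment[w]), n = c(n_cum[w], n_cum[w]))$p.value
  }, numeric(1))
}

# Rows are experiments, columns are weeks
p_values <- t(replicate(n_experiments, simulate_experiment()))

# Flag weeks where the p-value is below 0.05 (the `stop` column in the post)
stop_flag <- p_values < 0.05

# p-value at the first week that stopped; if no week stopped, the week-1 p-value
stop_p_value <- vapply(seq_len(n_experiments), function(i) {
  hits <- which(stop_flag[i, ])
  if (length(hits) > 0) p_values[i, hits[1]] else p_values[i, 1]
}, numeric(1))

naive_fpr <- mean(stop_p_value < 0.05)        # peeking every week
fixed_fpr <- mean(p_values[, n_weeks] < 0.05) # single analysis at week 4

cat("False-positive rate, naive early stopping:", naive_fpr, "\n")
cat("False-positive rate, fixed sample (week 4):", fixed_fpr, "\n")
```

Because there is no true effect, every significant result is a false positive. With these settings `naive_fpr` should typically land well above the nominal 0.05, while `fixed_fpr` stays at or slightly below it (`prop.test()` applies a continuity correction by default). By construction `naive_fpr` can never be smaller than `fixed_fpr`: any experiment that is significant at week 4 is also caught by the weekly peeking rule.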