Programs for Computing Group Sequential Boundaries Using the Lan-DeMets Method
Version 2
by
July 16, 1996
Introduction
The design of many clinical trials includes some strategy for early stopping if an interim analysis reveals large differences between treatment groups. In addition to saving time and resources, such a design feature can reduce study participants’ exposure to the inferior treatment. However, when repeated significance testing on accumulating data is done, some adjustment of the usual hypothesis testing procedure must be made to maintain an overall significance level (Armitage, McPherson & Rowe, 1969; McPherson & Armitage, 1971). The methods described by Pocock (1977) and O’Brien & Fleming (1979), among others, are popular implementations of group sequential testing for clinical trials. Sometimes interim analyses are equally spaced in terms of calendar time or the information available from the data, but this assumption can be relaxed to allow for unplanned or unequally spaced analyses. Lan & DeMets (1983) introduced type I error spending functions, denoted $\alpha^*(t)$, and determined boundaries by
$$P_0\{Z_1 \ge Z_c(1) \text{ or } Z_2 \ge Z_c(2) \text{ or } \ldots \text{ or } Z_k \ge Z_c(k)\} = \alpha^*(t_k),$$
where $Z_c(1), \ldots, Z_c(k)$ are (upper) boundaries for the sequence of interim test statistics $Z_1, \ldots, Z_k$ and $t_k$ is either the proportion of elapsed time to maximum duration or observed information to total information. That is, if the interim standardized test statistic at the $k$th interim analysis is denoted by $Z_k$, we continue the trial as long as $|Z_k| < Z_c(k)$ (two-sided), otherwise termination is considered. The spending function satisfies $\alpha^*(t) = 0$ for $t = 0$ and $\alpha^*(t) = \alpha$ for $t = 1$. That is, this flexible procedure guarantees a fixed $\alpha$ level when the trial is complete. Neither the timing nor the number of analyses needs to be specified in advance: only $\alpha^*(t)$ must be specified. Issues surrounding the use of calendar time and information have been discussed by Lan & DeMets (1989) and Lan, Reboussin & DeMets (1994). Spending functions, which are also called use functions, are prespecified and correspond to those described by Lan & DeMets (1983) and Kim & DeMets (1987a). These are similar to commonly used group sequential boundaries proposed by Pocock (1977) and O’Brien & Fleming (1979). Additional spending functions may be found in Hwang, Shih & de Cani (1990).
Figure: Sequential outcomes and boundaries for interim standardized test statistics from a clinical trial.
Options
The program described here performs computations related to group sequential boundaries, such as the one illustrated in the figure above. The program begins by prompting the user to specify whether it is being run interactively or not, and then to specify one of four options. It continues prompting based on the selected option. The options are:
- computation of boundaries for a specified spending function (including graphical presentation);
- power calculation for a specified set of boundary values and a drift parameter corresponding to the alternative hypothesis;
- computation of the exit probabilities for a specified spending function, analysis times, and drift parameter;
- computation of confidence intervals following termination of a trial.
For interim analysis of an ongoing clinical trial, the first option takes as input the times of the previous and current interim analyses, and the type I error spending function. The program then reports what boundaries should be used to determine whether or not to stop the trial. This is accomplished using the boundary equation given in the Introduction and a searching routine which makes an initial choice of boundaries, computes stopping probabilities, and alters boundaries until the desired alpha level is obtained. The other options, in contrast, evaluate probabilities associated with a given set of boundaries. They require as input boundaries and times for the interim analyses. The package can be used to design sequential trials, determine boundary values while the trial is ongoing, or compute confidence intervals when the trial has ended. We present examples for design or analyses using test statistics comparing mean, binomial, survival or repeated measures outcomes.
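The search described above can be sketched in a few lines. The following is a minimal illustration, not the program's own routine: it assumes a two-sided symmetric boundary, works with the score process S = Z * sqrt(t), replaces the program's search with plain bisection, and approximates the numerical integration with a trapezoid grid. The names `exit_prob`, `propagate` and `spending_bounds` are hypothetical. The input is the BHAT calendar-time example from later in this manual with the spending function alpha * t.

```python
from math import erfc, exp, pi, sqrt


def exit_prob(nodes, mass, dt, bound):
    """P(|S| >= bound) after a null Brownian increment of variance dt,
    starting from the weighted points (nodes, mass)."""
    c = sqrt(2.0 * dt)
    return sum(m * 0.5 * (erfc((bound - u) / c) + erfc((bound + u) / c))
               for u, m in zip(nodes, mass))


def propagate(nodes, mass, dt, bound, grid):
    """Sub-density of S on [-bound, bound] as trapezoid nodes and masses."""
    h = 2.0 * bound / grid
    norm = 1.0 / sqrt(2.0 * pi * dt)
    new_nodes = [-bound + i * h for i in range(grid + 1)]
    new_mass = []
    for i, s in enumerate(new_nodes):
        dens = norm * sum(m * exp(-0.5 * (s - u) ** 2 / dt)
                          for u, m in zip(nodes, mass))
        new_mass.append(dens * (h / 2.0 if i in (0, grid) else h))
    return new_nodes, new_mass


def spending_bounds(times, spend, grid=200):
    """Two-sided symmetric Z-scale bounds; spend(t) is the cumulative
    two-sided type I error allowed by time t."""
    nodes, mass = [0.0], [1.0]          # all mass at S = 0 when t = 0
    prev_t, spent, bounds = 0.0, 0.0, []
    for t in times:
        dt = t - prev_t
        target = spend(t) - spent       # error to spend at this look
        lo, hi = 0.0, 10.0
        for _ in range(50):             # bisection on the Z-scale bound
            z = 0.5 * (lo + hi)
            if exit_prob(nodes, mass, dt, z * sqrt(t)) > target:
                lo = z                  # crossing too likely: raise bound
            else:
                hi = z
        z = 0.5 * (lo + hi)
        bounds.append(z)
        nodes, mass = propagate(nodes, mass, dt, z * sqrt(t), grid)
        prev_t, spent = t, spend(t)
    return bounds


bhat_times = [0.2292, 0.3333, 0.4375, 0.5833, 0.7083, 0.8333]
bhat_bounds = spending_bounds(bhat_times, lambda t: 0.05 * t)
print([round(z, 2) for z in bhat_bounds])
```

The printed values can be compared with the boundary values the manual reports for this example, (2.53, 2.61, 2.57, 2.47, 2.43, 2.38).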
Summary of methodology
A detailed presentation of the methodology may be found in Lan & DeMets (1983), DeMets & Lan (1984), and Lan & Zucker (1993). Group sequential procedures for interim analyses are equivalent to discrete boundary crossing problems for a Brownian motion process $W(t)$ with drift parameter $\theta$. We take advantage of this correspondence in both theoretical developments and in implementation. At each interim analysis, a standardized test statistic $Z_k$ is computed. These normally distributed variates have mean $\theta\sqrt{t_k}$, where $\theta$ is the “drift” parameter, and $\mathrm{Cov}(Z_j, Z_k) = \sqrt{t_j/t_k}$ for $j \le k$, where $t_k$ is the information fraction (or information time) at the $k$th analysis, e.g. $t_k = n_k/N$ if $N$ is the maximum sample size (per arm). The drift parameter $\theta$ and the standardized difference $\Delta$ are related by the equation
$$\theta = \Delta\sqrt{N}.$$
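To reiterate in more technical terms, the program uses the boundary equation given in the Introduction to determine one of
- the boundaries $Z_c(1), \ldots, Z_c(k)$ for given $\alpha$, $\alpha^*(t)$ and analysis times $t_1, \ldots, t_k$,
- the drift $\theta$ given the boundaries, the analysis times and the desired power,
- the exit probabilities $P_k$, where $P_k$ is the probability of first crossing the boundary at the $k$th analysis, given the boundaries with their analysis times and $\theta$,
- a confidence interval for $\theta$ given the boundaries, the analysis times, and the final observed statistic.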
It may be useful to note correspondences between the notation used here and in some other references (see Table 1).
Table: Correspondence of notation for commonly used group sequential parameters.
To clarify notation for the sample size, let $n_k$ be the number of subjects at the $k$th look in each treatment arm. $N$ is the maximum number of subjects per treatment arm and $K$ is the maximum number of looks or interim analyses. If there are $n$ subjects accumulated between interim analyses, $N = nK$. The drift parameter can be expressed in terms of the noncentrality parameter in Pocock (1977), written here as $\Delta_P$ to keep it apart from the standardized difference, as $\theta = \Delta_P\sqrt{K}$.
Use of the Program for Study Design
Although spending functions provide flexibility in data monitoring and do not require analysis times to be prespecified, the anticipated number and timing of interim analyses must be specified for design purposes. This is not more restrictive than for the group sequential procedures proposed by Pocock (1977) or O’Brien & Fleming (1979). Even substantial deviation from the initial design does not cause a serious loss of power. Thus for design only, we shall assume $N = nK$, where $K$ is the anticipated number of interim analyses and $n$ is the anticipated number of subjects accrued between analyses.
Kim & DeMets (1992) provide a detailed discussion of sample size determination for group sequential testing. The relationship between sample size and power depends on two quantities: the drift parameter $\theta$ of the underlying Brownian motion and the standardized difference $\Delta$ between control and treatment arms. Thus by determining $\theta$ and $\Delta$ for a particular design problem, the required sample size can be computed. The value of $\theta$ depends on the desired power, the set of boundaries and analysis times, and the properties of Brownian motion. Exit or rejection probabilities for Brownian motion given a set of boundaries can be computed by the program or, for certain designs, found in the tables provided by Kim & DeMets (1992). The sequential boundaries are determined by the choice of spending function $\alpha^*(t)$, the number and timing of interim analyses, the level $\alpha$ and whether the test is one or two sided. The standardized difference $\Delta$, on the other hand, depends on the type of data to be collected by the study. Several examples are detailed below for normal, binomial and survival data.
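As a rough illustration of how the two quantities combine, the sketch below assumes the relation that the program itself prints in its output, namely that the drift equals the standardized treatment difference times the square root of the total information per arm. For normal data with known standard deviation we further assume that the standardized difference is the mean difference divided by sigma * sqrt(2). The function names and the input values are hypothetical and are not taken from any example in this manual.

```python
from math import sqrt


def standardized_difference_normal(mean_control, mean_treatment, sigma):
    """Assumed form for two-arm normal data with known sigma:
    |mu_C - mu_T| / (sigma * sqrt(2))."""
    return abs(mean_control - mean_treatment) / (sigma * sqrt(2.0))


def per_arm_sample_size(drift, delta):
    """Solve drift = delta * sqrt(N) for the per-arm sample size N."""
    return (drift / delta) ** 2


# Hypothetical design: means 10 and 8, sigma 4, drift 3.28 for 90% power.
delta = standardized_difference_normal(10.0, 8.0, 4.0)
n_per_arm = per_arm_sample_size(3.28, delta)
print(round(delta, 4), round(n_per_arm, 2))
```

Because the drift depends only on the boundaries, the analysis times and the power, alternative assumptions about the treatment effect only change `delta`, and the sample size can be recomputed without rerunning the drift calculation.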
Kim & DeMets (1992) provide tables of drift parameters for spending functions producing O’Brien-Fleming type and Pocock type boundaries ($\alpha_1^*$ and $\alpha_2^*$, respectively). The program currently offers five choices for spending functions, but others can be added (see Appendix).
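For reference, the five choices listed in the program menu can be written down directly. The sketch below uses the forms given by Lan & DeMets (1983) for the O’Brien-Fleming type and Pocock type functions; the function names are hypothetical. It also assumes, based on the cumulative alpha column printed in the sample sessions, that a two-sided symmetric test spends half of the overall level on each side.

```python
from math import e, log, sqrt
from statistics import NormalDist

_STD = NormalDist()


def obrien_fleming_type(t, alpha):
    """alpha-star 1: 2 - 2 * Phi(z_{alpha/2} / sqrt(t)), one-sided level alpha."""
    z = _STD.inv_cdf(1.0 - alpha / 2.0)
    return 2.0 * (1.0 - _STD.cdf(z / sqrt(t)))


def pocock_type(t, alpha):
    """alpha-star 2: alpha * ln(1 + (e - 1) * t)."""
    return alpha * log(1.0 + (e - 1.0) * t)


def power_family(t, alpha, rho=1.0):
    """alpha-star 3, 4, 5: alpha * t**rho with rho = 1, 1.5, 2."""
    return alpha * t ** rho


# Two-sided 0.05 test: spend 0.025 on each side and add the two sides.
two_sided_obf = [2.0 * obrien_fleming_type(t, 0.025)
                 for t in (0.2, 0.4, 0.6, 0.8, 1.0)]
print([round(a, 5) for a in two_sided_obf])

# One-sided 0.05 Pocock type: the first bound follows from the amount spent.
first_pocock_bound = _STD.inv_cdf(1.0 - pocock_type(0.2, 0.05))
print(round(first_pocock_bound, 4))
```

The first list can be compared with the "cum alpha" column of the O’Brien-Fleming session in Section 4.1, and the last value with the first bound of the one-sided Pocock session in Section 4.2.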
Normally distributed data
Kim-DeMets example
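Kim & DeMets (1992) discuss the following example. Suppose that a normally distributed response has mean $\mu_C$ in controls with standard deviation $\sigma$. The null hypothesis is $H_0: \mu_C = \mu_T$, where $\mu_T$ is the mean in the experimental group, expected to be 200. The test statistic is
$$Z = \frac{\bar{X}_C - \bar{X}_T}{\sigma\sqrt{2/N}}.$$
Then the drift parameter is
$$\theta = \frac{\mu_C - \mu_T}{\sigma\sqrt{2}}\sqrt{N}.$$
So
$$N = \frac{2\sigma^2\theta^2}{(\mu_C - \mu_T)^2}.$$
For the program, we specify two-sided O’Brien-Fleming type ($\alpha_1^*$) boundaries with K = 5 looks at 0.2, 0.4, 0.6, 0.8 and 1.0 (see Section 4.1). The output boundary values are 4.8769, 3.3569, 2.6803, 2.2898 and 2.0310.
Kim & DeMets (1992) indicate that $\theta = 3.28$ for 90% power, so substituting into the expression for $N$ gives $N = 48.44$ subjects per arm.
The program can verify that $\theta = 3.28$ corresponds to 90% power, and that alternative timings of analyses do not greatly affect the power (see Section 4.1). The effect of alternative assumptions for $\Delta$ on sample size can be determined without recomputing $\theta$.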
Kim-DeMets example with Pocock boundary
Suppose that in the previous example the O’Brien-Fleming type boundaries were replaced with Pocock type boundaries ($\alpha_2^*$). The computations are identical except for the value of $\theta$. Two-sided 0.05 Pocock type boundary values are
Kim and DeMets (1992) indicate that for 90% power using these boundaries, , so
An example using Pocock’s notation
We duplicate an example from Pocock (1977). If we take and N = 5, corresponding boundaries are determined. For a desired power , we determine using the program that so that . To compare two sample means, we compute
where and and from Pocock (1977)
For ,
so 2nN = 2(20)(5) = 200 subjects.
Binomially distributed data
In the binomial case, where we test , assume and . The statistic
has asymptotically a normal distribution with a mean of 0 and a variance of 1 (under ). The standardized difference is
where
Kim and DeMets example
Kim & DeMets (1992) show so
For example, if and under the alternative hypothesis, then , and for a one sided test using five interim analyses and Pocock type boundaries ( ), we have
For $\alpha = 0.05$ and 90% power, Kim & DeMets (1992) report $\theta = 3.21$ (or see Section 4.2), so
O’Brien-Fleming example
As another binomial example, consider a two sided test with O’Brien-Fleming type ( ) boundaries, and for design purposes only, assume K=5 equally spaced analyses at 0.2, 0.4, 0.6, 0.8 and 1.0. As above, we take , but now let and under the alternative hypothesis (a 25% reduction, ). The program produces
From Kim and DeMets (1992), $\theta = 3.28$ (see Section 4.1) so
Survival data
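Suppose we are interested in comparing the hazard rates of two populations. Let $\lambda_C(t)$ be the hazard function of the control group and $\lambda_T(t)$ the hazard function in the treatment group. Under the null hypothesis $\lambda_C(t) = \lambda_T(t)$ and $\log(\lambda_C/\lambda_T) = 0$. The logrank statistic is
$$L(d) = \sum_{i=1}^{d}\left(x_i - \frac{r_{iC}}{r_{iC} + r_{iT}}\right),$$
where $d$ is the number of events, $x_i$ is 1 if the event at $t_i$ is in the control group and 0 if it is in the treatment group, $r_{iC}$ is the number of patients in the control group at risk just before $t_i$, and $r_{iT}$ is the number of patients in the treatment group at risk just before $t_i$. The expected value of $L(d)$ is approximately $(d/4)\log(\lambda_C/\lambda_T)$, and the estimated variance is
$$\sum_{i=1}^{d}\frac{r_{iC}\,r_{iT}}{(r_{iC} + r_{iT})^2} \approx \frac{d}{4}.$$
These approximations are reasonable if $r_{iC} \approx r_{iT}$ and $\log(\lambda_C/\lambda_T)$ is close to 0. If $d_k$ is the number of events at analysis $k$, the statistic $Z_k = L(d_k)/\sqrt{d_k/4}$ has a $N(\theta\sqrt{t_k}, 1)$ distribution, so $\theta = \log(\lambda_C/\lambda_T)\sqrt{D/4}$. Then the maximum number of events required is
$$D = \frac{4\theta^2}{[\log(\lambda_C/\lambda_T)]^2}.$$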
If we assume and (see Section 4.3),
Repeated measures
Many clinical trials are designed to measure subjects repeatedly over the course of the trial, and define as the primary outcome the change or slope over time. For such trials, the difference between treatment groups can be tested using the estimated slopes from each group using
where and are the average of the slopes estimated for patients in the treatment and control groups at the interim analysis, and and are their variances. The sequentially computed have been shown to have the required Brownian motion structure when the variance parameters are known (Reboussin, Lan & DeMets, 1992; Wu & Lan, 1992). Lan, Reboussin & DeMets (1994) show
where and are the mean population slopes, is the between patient variance of the slopes, and is the natural estimate of total information at the end of the trial. For the comparison of means and binomial proportions, , but in this case, the natural estimate of total information, denoted , is the sum of the natural estimates of information for each patient:
where R is the ratio of within to between patient variance. For design purposes, we may assume an identical number and timing of measurements for all patients, so that is . Then
and
so
If a sufficient number of observations are taken on each patient, the term is nearly one (Lan, Reboussin & DeMets, 1994), so that the power computations are similar to the normal case.
Using the Program to Sequentially Analyze a Trial
We describe how to run the program using data from the Beta-Blocker Heart Attack Trial or BHAT (Beta-Blocker Heart Attack Trial Research Group, 1982). BHAT, a study sponsored by the National Heart, Lung and Blood Institute, was designed to test whether long term use of propranolol by patients with recent heart attack reduced mortality. The following example does not correspond exactly to what was actually done for BHAT, though it is similar. From June 1978 to October 1980, 3837 patients were randomized to either propranolol (1916 patients) or placebo (1921 patients). Follow-up was originally scheduled to end in June 1982. The total information D (number of deaths by June 1982) was never observed since the trial was terminated early in October 1981. The value of D was estimated to be 628 when BHAT was designed, but with the data available in September 1982, it was estimated to be around 400 (Lan & DeMets, 1989). At the six Policy and Data Monitoring Board meetings (May 1979, October 1979, March 1980, October 1980, April 1981, and October 1981), the observed numbers of deaths were (56, 77, 126, 177, 247, 318) and the normalized log-rank statistics were (1.68, 2.24, 2.37, 2.30, 2.34, 2.82).
Computing boundaries with a spending function
Let $t_c$ denote calendar time measured from the beginning of the trial, and $T_c$ denote the maximum duration in calendar time. Let $t$ be the information fraction or “information time”, which must often be estimated by $\hat{t}$, some function either of calendar time or number of observed patients or events. We begin with an example using only calendar time.
Example with calendar time
Set $t_c = 0$ in June 1978 and assume the maximum duration is $T_c = 48$ months, which corresponds to June 1982. Then the calendar times for interim analyses correspond to (11, 16, 21, 28, 34, 40) months after the start of the trial. We estimate $t$ as a function of calendar time by $\hat{t} = t_c/T_c$, so the information times are (0.2292, 0.3333, 0.4375, 0.5833, 0.7083, 0.8333), and adopt the spending function $\alpha^*(t) = \alpha t$ to construct a data monitoring boundary. This corresponds to $\alpha_3^*$ in Lan & DeMets (1983) and Kim & DeMets (1987a). The original BHAT design had a two-sided significance level of 0.05.
When the data were monitored in May 1979, $t_c = 11$, $\hat{t} = 0.2292$ and $\alpha^*(0.2292) = 0.0115$. The program produces a boundary value of 2.53: if $Z$ is standard normal, $P(|Z| \ge 2.53) = 0.0115$. In October 1979, $t_c = 16$, $\hat{t} = 0.3333$, and $\alpha^*(0.3333) = 0.0167$. Ignoring the observed number of deaths and using only calendar time, the calculation proceeds as follows. Suppose $Z_1$ and $Z_2$ are standard normal with correlation coefficient $\sqrt{0.2292/0.3333} = 0.829$. We wish to find $b$ such that $P(|Z_1| < 2.53, |Z_2| \ge b) = 0.0167 - 0.0115 = 0.0052$. This solution requires some numerical integration which the program performs. In fact, this equality is satisfied if $b = 2.61$.
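The two-look calculation can be checked by one-dimensional numerical integration, conditioning on the first statistic. The sketch below is an illustration only: the function names are hypothetical, Simpson's rule stands in for whatever integration the program uses, and the four-decimal boundary values are copied from the program output in Section 4. The second call uses the correlation implied by the observed deaths, sqrt(56/77), as in the two-time-scale example later in this section.

```python
from math import erfc, exp, pi, sqrt


def norm_tail(x):
    """Upper tail probability of the standard normal."""
    return 0.5 * erfc(x / sqrt(2.0))


def second_look_exit(b1, b2, rho, steps=2000):
    """P(|Z1| < b1, |Z2| >= b2) for standard normal Z1, Z2 with
    correlation rho, by Simpson's rule over z1 (steps must be even)."""
    s = sqrt(1.0 - rho * rho)           # sd of Z2 given Z1

    def integrand(z):
        dens = exp(-0.5 * z * z) / sqrt(2.0 * pi)
        return dens * (norm_tail((b2 - rho * z) / s)
                       + norm_tail((b2 + rho * z) / s))

    h = 2.0 * b1 / steps
    total = integrand(-b1) + integrand(b1)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(-b1 + i * h)
    return total * h / 3.0


# Calendar time only: correlation from the two estimated information times.
p_calendar = second_look_exit(2.5284, 2.6098, sqrt(0.2292 / 0.3333))
# Two time scales: correlation from the observed numbers of deaths.
p_information = second_look_exit(2.5284, 2.5905, sqrt(56 / 77))
print(round(p_calendar, 5), round(p_information, 5))
```

Both probabilities should be close to the type I error available at the second analysis, 0.05 * (0.3333 - 0.2292).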
In this example, after specifying Option 1, the user is prompted for
- the number of interim analyses (2),
- whether the analyses are equally spaced (no),
- times of the interim analyses (0.2292, 0.3333),
- whether a second time scale for information will be entered (Lan and DeMets, 1989) (no),
- the overall significance level (.05)
- whether the test was one-sided or two-sided symmetric (2),
- which function to apply ($\alpha_3^*$),
- whether the boundary values should be truncated (no)
(see Section 4.4). The program returns 2.53 and 2.61. For the third analysis, we enter 0.2292, 0.3333, and 0.4375. The program returns 2.53 and 2.61 as before, plus 2.57. Continuing in this manner, we obtain the boundary values (2.53, 2.61, 2.57, 2.47, 2.43, 2.38).
Example with information
We now repeat the above calculation using the information in the number of deaths. Assuming the total information is the number of expected events, D = 628, the information fractions are (56/628, 77/628, 126/628, 177/628, 247/628, 318/628), or (0.0892, 0.1226, 0.2006, 0.2818, 0.3933, 0.5064). Then at the second interim analysis, the program would ask for
- the number of interim analyses (2),
- whether the analyses are equally spaced (no),
- times of the interim analyses (0.0892, 0.1226),
- whether a second time scale for information will be entered (Lan and DeMets, 1989) (no),
- the overall significance level (.05)
- whether the test was one-sided or two-sided symmetric (2),
- which function to apply ($\alpha_3^*$),
- whether the boundary values should be truncated (no)
The information fractions are treated as times. Since we do not enter the information separately apart from the information fractions, the answer to the question on a second time scale is “no”. The output boundary values are (2.84, 2.97). At the sixth analysis, when the additional times are input, the resulting boundary values are (2.84, 2.97, 2.79, 2.72, 2.61, 2.54).
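The two sets of analysis times used in the BHAT examples follow directly from the meeting dates and the death counts. The short sketch below recomputes them; the variable names are hypothetical, and the 48 month maximum duration is the interval from June 1978 to June 1982.

```python
# Months from June 1978 to each monitoring board meeting.
months = [11, 16, 21, 28, 34, 40]
# Observed deaths at each meeting, and the design estimate of total deaths.
deaths = [56, 77, 126, 177, 247, 318]
T_MAX_MONTHS = 48
D_DESIGN = 628

calendar_fraction = [round(m / T_MAX_MONTHS, 4) for m in months]
information_fraction = [round(d / D_DESIGN, 4) for d in deaths]

print(calendar_fraction)
print(information_fraction)
```

Either list can be entered at the "Times of interim analyses" prompt, depending on which scale is used to spend the type I error.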
Two time scales
Some users may be familiar with the use of both information and calendar time as described in Lan & DeMets (1989) and Lan, Reboussin & DeMets (1994). The program includes such an option. We will use the percent of elapsed calendar time to determine how much type I error probability is to be spent, but for the correlation of successive test statistics, we will use the information in the number of deaths. The first boundary is computed exactly as above. For the analysis in October 1979, at 16 months, the estimated information time $\hat{t} = 0.3333$, the cumulative type I error $\alpha^*(0.3333) = 0.0167$, and the error to be spent at this analysis, 0.0052, are also just as before. To evaluate $P(|Z_1| < 2.53, |Z_2| \ge b)$, note that even though $D$ is unknown, the ratio of the numbers of deaths at the two analyses, 56/77, is observed. If $Z_1$ and $Z_2$ are standard normal then the correlation coefficient is $\sqrt{56/77} = 0.853$, and the solution to $P(|Z_1| < 2.53, |Z_2| \ge b) = 0.0052$ is $b = 2.59$. The program asks the same questions as before (see Section 4.6). Since the times entered were based on the percent of elapsed calendar time, it is desirable to use the information available in the number of deaths. When the question on a second time scale for information is asked, we answer “yes” and enter the information for each analysis, which is the number of deaths in this example. The resulting boundaries are (2.53, 2.59, 2.63, 2.50, 2.51, 2.47) for the six data monitoring points of BHAT, and this boundary is crossed at the sixth analysis, in October of 1981, where the observed statistic 2.82 exceeds 2.47. This is the same as the result given for the example in Lan & DeMets (1989).
Computing confidence intervals
Kim & DeMets (1987b) detail the theory for confidence intervals following early termination using group sequential tests. Suppose that a trial has been stopped at the $k$th analysis with boundary values $Z_c(1), \ldots, Z_c(k)$ and with final standardized estimate of treatment difference $Z_k$. The confidence interval is based on computing upper exit probabilities associated with the boundaries $Z_c(1), \ldots, Z_c(k-1), Z_k$ over a range of values of the drift $\theta$.
Continuing with the previous example, the final observed standardized statistic was 2.82, and suppose that a 95 percent confidence interval is desired. The program prompts for
- the number of analyses (6),
- whether the analyses are equally spaced between 0 and 1 (no),
- the information times of the analyses (.2292, .3333, etc.),
- whether a spending function will be used (no),
- whether the boundary is one or two sided (2),
- whether the two sided boundary is symmetric (yes),
- the boundaries to be evaluated (2.53, 2.61, etc.).
- the value of the standardized statistic at the last analysis (2.82),
- the confidence level (0.95).
Computation of confidence intervals is the most time consuming of the four options since it involves a linear search. The program outputs the result (0.1881, 4.9347) (see Section 4.7).
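Using the equation $\theta = \log(\lambda_C/\lambda_T)\sqrt{D/4}$ we can translate this interval into an interval for $\log(\lambda_C/\lambda_T)$. The statistic is based on 318 events, so $0.1881 = \log(\lambda_C/\lambda_T)\sqrt{318/4}$, or $\log(\lambda_C/\lambda_T) = 0.021$ is the lower bound. Repeating this computation for the upper bound, we obtain (0.021, 0.553) as a 95% confidence interval for $\log(\lambda_C/\lambda_T)$.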
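The conversion of the interval is a single division by the square root of one quarter of the number of events. A minimal sketch, assuming the logrank relation between the standardized scale and the log hazard ratio used in the survival example of Section 2 (the function name is hypothetical):

```python
from math import sqrt


def to_log_hazard_ratio(value, events):
    """Divide a value on the standardized (drift) scale by sqrt(events / 4)."""
    return value / sqrt(events / 4.0)


# Interval reported by the program, based on 318 observed deaths.
lower, upper = (to_log_hazard_ratio(v, 318) for v in (0.1881, 4.9347))
print(round(lower, 3), round(upper, 3))
```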
Examples with Output
This section contains examples of interactive sessions with the program, which were used for the examples considered in Sections 2 and 3.
Normally distributed data
This program output relates to the first example in Section 2.1. For this example, we use 5 equally spaced interim analyses (0.2, 0.4, 0.6, 0.8, and 1.0) with two-sided O’Brien-Fleming boundaries and $\alpha = 0.05$. We first determine the boundaries and then, for these boundaries, determine the drift parameter needed to calculate a sample size.
PROGRAM PROMPTS                                        USER INPUT
Is this an interactive session? (1=yes,0=no)           y
interactive = 1
Enter number for your option:
(1) Compute bounds for given spending function.
(2) Compute drift for given power and bounds
(3) Compute probabilities for given bounds.
(4) Compute confidence interval.                       1
Option 1: You will be prompted for a spending function.
Number of interim analyses?                            5
5 interim analyses.
Equally spaced times between 0 and 1? (1=yes,0=no)     y
Analysis times: 0.200 0.400 0.600 0.800 1.000
Do you wish to specify a second time/information scale?
(e.g. number of patients or number of events,
as in Lan & DeMets 89?) (1=yes, 0=no)                  n
Overall significance level? (>0 and <=1)               .05
alpha = 0.050
One(1) or two(2)-sided symmetric?                      2
2.-sided test
Use function? (1-5)
(1) OBrien-Fleming type
(2) Pocock type
(3) alpha * t
(4) alpha * t^1.5
(5) alpha * t^2                                        1
Use function alpha-star 1
Do you wish to truncate the standardized bounds?
(1=yes, 0=no)                                          n
Bounds will not be truncated.
This program generates two-sided symmetric boundaries.
n = 5
alpha = 0.050
use function for the lower boundary = 1
use function for the upper boundary = 1
Time     Bounds               alpha(i)-alpha(i-1)   cum alpha
0.20    -4.8769    4.8769     0.00000               0.00000
0.40    -3.3569    3.3569     0.00079               0.00079
0.60    -2.6803    2.6803     0.00683               0.00762
0.80    -2.2898    2.2898     0.01681               0.02442
1.00    -2.0310    2.0310     0.02558               0.05000
Do you want to see a graph? (1=yes,0=no)               y
  5.00:           *
  4.60:
  4.20:
  3.80:
  3.40:                   *
  3.00:
  2.60:                           *
  2.20:                                   *       *
  1.80:
  1.40:
  1.00:
  0.60:
  0.20:
 -0.20:
 -0.60:
 -1.00:
 -1.40:
 -1.80:
 -2.20:                                   *       *
 -2.60:                           *
 -3.00:
 -3.40:                   *
 -3.80:
 -4.20:
 -4.60:
 -5.00:           *
       ...............................................
          0   .1  .2  .3  .4  .5  .6  .7  .8  .9  1
Done.
Once these initial boundaries are obtained, to compute the required sample size, we must find the drift parameter corresponding to the desired power. In the program, this is option 2. We enter the times and boundary values and select the desired power. Alternatively, drift parameters for some potential analysis scenarios are contained in Kim & DeMets (1992). In our example, a drift parameter of 3.2788 gives a power of 0.90.
PROGRAM PROMPTS                                        USER INPUT
Is this an interactive session? (1=yes,0=no)           y
interactive = 1
Enter number for your option:
(1) Compute bounds for given spending function.
(2) Compute drift for given power and bounds
(3) Compute probabilities for given bounds.
(4) Compute confidence interval.                       2
Option 2: You will be prompted for bounds and a power level.
Number of interim analyses?                            5
5 interim analyses.
Equally spaced times between 0 and 1? (1=yes,0=no)     y
Analysis times: 0.200 0.400 0.600 0.800 1.000
Are you using a spending function to determine bounds?
(1=yes,0=no)                                           y
Spending function will determine bounds.
Overall significance level? (>0 and <=1)               .05
alpha = 0.050
One(1) or two(2)-sided symmetric?                      2
2.-sided test
Use function? (1-5)
(1) OBrien-Fleming type
(2) Pocock type
(3) alpha * t
(4) alpha * t^1.5
(5) alpha * t^2                                        1
Use function alpha-star 1
Do you wish to truncate the standardized bounds?
(1=yes, 0=no)                                          n
Bounds will not be truncated.
Time     Bounds
0.20    -4.8769    4.8769
0.40    -3.3569    3.3569
0.60    -2.6803    2.6803
0.80    -2.2898    2.2898
1.00    -2.0310    2.0310
Desired power? (>0 and <=1)                            .9
Power is 0.900
n = 5, drift = 3.2788
look   time    lower      upper     exit probability   cum exit pr
1      0.20   -4.8769    4.8769     0.00032            0.00032
2      0.40   -3.3569    3.3569     0.09939            0.09971
3      0.60   -2.6803    2.6803     0.34658            0.44629
4      0.80   -2.2898    2.2898     0.29966            0.74595
5      1.00   -2.0310    2.0310     0.15405            0.90000
Done.
A drift of 3.28 was used in Section 2.1.1 to compute the required sample size for 90% power, which was 48.44 patients per arm.
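The exit probabilities in the session above can be approximated with a short recursive integration over the score process S = Z * sqrt(t), which behaves like Brownian motion with drift. The sketch below is an illustration under stated assumptions rather than the program's own algorithm: it uses a plain trapezoid grid, the name `exit_probabilities` is hypothetical, and the boundary values and drift are copied from the session output.

```python
from math import erfc, exp, pi, sqrt


def exit_probabilities(times, lower, upper, drift, grid=200):
    """Per-look (lower, upper) first-exit probabilities for Z-scale bounds,
    for a process with mean drift * sqrt(t) on the Z scale."""
    nodes, mass = [0.0], [1.0]          # all mass at S = 0 when t = 0
    prev_t, out = 0.0, []
    for t, lo_z, hi_z in zip(times, lower, upper):
        dt = t - prev_t
        a, b = lo_z * sqrt(t), hi_z * sqrt(t)   # bounds on the S scale
        shift = drift * dt                      # mean of the increment
        c = sqrt(2.0 * dt)
        down = sum(m * 0.5 * erfc((u + shift - a) / c)
                   for u, m in zip(nodes, mass))
        up = sum(m * 0.5 * erfc((b - u - shift) / c)
                 for u, m in zip(nodes, mass))
        out.append((down, up))
        # Sub-density of S on the continuation region [a, b].
        h = (b - a) / grid
        norm = 1.0 / sqrt(2.0 * pi * dt)
        new_nodes = [a + i * h for i in range(grid + 1)]
        new_mass = []
        for i, s in enumerate(new_nodes):
            dens = norm * sum(m * exp(-0.5 * (s - u - shift) ** 2 / dt)
                              for u, m in zip(nodes, mass))
            new_mass.append(dens * (h / 2.0 if i in (0, grid) else h))
        nodes, mass, prev_t = new_nodes, new_mass, t
    return out


times = [0.2, 0.4, 0.6, 0.8, 1.0]
bounds = [4.8769, 3.3569, 2.6803, 2.2898, 2.0310]
probs = exit_probabilities(times, [-z for z in bounds], bounds, 3.2788)
power = sum(up for _, up in probs)
print([round(up, 5) for _, up in probs], round(power, 4))
```

Finding the drift for a desired power (option 2) then amounts to a one-dimensional search over `drift` using this function, since the power increases with the drift.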
Consider another sample size determination based on a different initial analysis plan. This set of analyses will be planned for unequally spaced time points 0.1, 0.4, 0.75, 1.0, but other features of the test are the same. The program determines the corresponding drift parameter.
PROGRAM PROMPTS                                        USER INPUT
Is this an interactive session? (1=yes,0=no)           y
interactive = 1
Enter number for your option:
(1) Compute bounds for given spending function.
(2) Compute drift for given power and bounds
(3) Compute probabilities for given bounds.
(4) Compute confidence interval.                       2
Option 2: You will be prompted for bounds and a power level.
Number of interim analyses?                            4
4 interim analyses.
Equally spaced times between 0 and 1? (1=yes,0=no)     n
Times of interim analyses: (>0 & <=1)                  .1 .4 .75 1.0
Analysis times: 0.100 0.400 0.750 1.000
Are you using a spending function to determine bounds?
(1=yes,0=no)                                           y
Spending function will determine bounds.
Overall significance level? (>0 and <=1)               .05
alpha = 0.050
One(1) or two(2)-sided symmetric?                      2
2.-sided test
Use function? (1-5)
(1) OBrien-Fleming type
(2) Pocock type
(3) alpha * t
(4) alpha * t^1.5
(5) alpha * t^2                                        1
Use function alpha-star 1
Do you wish to truncate the standardized bounds?
(1=yes, 0=no)                                          n
Bounds will not be truncated.
Time     Bounds
0.10    -6.9914    6.9914
0.40    -3.3569    3.3569
0.75    -2.3449    2.3449
1.00    -2.0125    2.0125
Desired power? (>0 and <=1)                            .9
Power is 0.900
n = 4, drift = 3.2696
look   time    lower      upper     exit probability   cum exit pr
1      0.10   -6.9914    6.9914     0.00000            0.00000
2      0.40   -3.3569    3.3569     0.09871            0.09871
3      0.75   -2.3449    2.3449     0.58876            0.68746
4      1.00   -2.0125    2.0125     0.21254            0.90000
Done.
The sample size is computed
Notice that the different timing of interim analyses has little impact on the sample size needed to achieve 90% power.
Binomially distributed data
In much the same manner as was done to compare two means from a normal population, we can compare two proportions from a binomial population. Recall the example from Section 2.2.1. We use option 2 to determine the drift parameter for a power of 90% given one sided 0.05 Pocock boundaries and five equally spaced analyses:
PROGRAM PROMPTS                                        USER INPUT
Is this an interactive session? (1=yes,0=no)           y
interactive = 1
Enter number for your option:
(1) Compute bounds for given spending function.
(2) Compute drift for given power and bounds
(3) Compute probabilities for given bounds.
(4) Compute confidence interval.                       2
Option 2: You will be prompted for bounds and a power level.
Number of interim analyses?                            5
5 interim analyses.
Equally spaced times between 0 and 1? (1=yes,0=no)     y
Analysis times: 0.200 0.400 0.600 0.800 1.000
Are you using a spending function to determine bounds?
(1=yes,0=no)                                           y
Spending function will determine bounds.
Overall significance level? (>0 and <=1)               .05
alpha = 0.050
One(1) or two(2)-sided symmetric?                      1
1.-sided test
Use function? (1-5)
(1) OBrien-Fleming type
(2) Pocock type
(3) alpha * t
(4) alpha * t^1.5
(5) alpha * t^2                                        2
Use function alpha-star 2
Do you wish to truncate the standardized bounds?
(1=yes, 0=no)                                          n
Bounds will not be truncated.
Time     Bounds
0.20    -8.0000    2.1762
0.40    -8.0000    2.1437
0.60    -8.0000    2.1132
0.80    -8.0000    2.0895
1.00    -8.0000    2.0709
Desired power? (>0 and <=1)                            .9
Power is 0.900
n = 5, drift = 3.2055
look   time    lower      upper     exit probability   cum exit pr
1      0.20   -8.0000    2.1762     0.22884            0.22884
2      0.40   -8.0000    2.1437     0.25845            0.48729
3      0.60   -8.0000    2.1132     0.19989            0.68718
4      0.80   -8.0000    2.0895     0.13238            0.81956
5      1.00   -8.0000    2.0709     0.08044            0.90000
Done.
The impact of changing frequency
Even if the interim analyses actually performed during the study are not equally spaced, the power is not greatly affected. This can be seen in the following example. Recall our original plan had looks at 0.2, 0.4, 0.6, 0.8 and 1.0 and a target power of 90%. Suppose instead the looks occur at 0.2, 0.5, 0.6, 0.8, and 1.0. Option 3 generates appropriate boundaries and computes the power for a drift of 3.21. As shown, the power is not seriously affected.
PROGRAM PROMPTS                                        USER INPUT
Is this an interactive session? (1=yes,0=no)           y
interactive = 1
Enter number for your option:
(1) Compute bounds for given spending function.
(2) Compute drift for given power and bounds
(3) Compute probabilities for given bounds.
(4) Compute confidence interval.                       3
Option 3: You will be prompted for bounds or a spending function to compute them.
Number of interim analyses?                            5
5 interim analyses.
Equally spaced times between 0 and 1? (1=yes,0=no)     n
Times of interim analyses: (>0 & <=1)                  .2 .5 .6 .8 1.0
Analysis times: 0.200 0.500 0.600 0.800 1.000
Are you using a spending function to determine bounds?
(1=yes,0=no)                                           y
Spending function will determine bounds.
Overall significance level? (>0 and <=1)               .05
alpha = 0.050
One(1) or two(2)-sided symmetric?                      1
1.-sided test
Use function? (1-5)
(1) OBrien-Fleming type
(2) Pocock type
(3) alpha * t
(4) alpha * t^1.5
(5) alpha * t^2                                        2
Use function alpha-star 2
Do you wish to truncate the standardized bounds?
(1=yes, 0=no)                                          n
Bounds will not be truncated.
Time     Bounds
0.20    -8.0000    2.1762
0.50    -8.0000    2.0435
0.60    -8.0000    2.1609
0.80    -8.0000    2.0866
1.00    -8.0000    2.0680
Do you wish to use drift parameters? (1=yes, 0=no)     y
How many drift parameters do you wish to enter?        1
1 drift parameters.
Enter drift parameters:                                3.21
Drift parameters: 3.210
Drift is equal to the standard treatment difference times
the square root of total information per arm.
n = 5, drift = 3.2100
look   time    lower      upper     exit probability   cum exit pr
1      0.20   -8.0000    2.1762     0.22945            0.22945
2      0.50   -8.0000    2.0435     0.38289            0.61234
3      0.60   -8.0000    2.1609     0.07757            0.68991
4      0.80   -8.0000    2.0866     0.13220            0.82211
5      1.00   -8.0000    2.0680     0.07941            0.90152
Done.
Survival data
Referring to the previous survival example in Section 2.3, assume that three equally spaced analyses were initially planned for this study, and that the test was to have 90% power. The following output from the program shows that a Brownian motion drift parameter of 3.261 will give the desired power.
PROGRAM PROMPTS                                        USER INPUT
Is this an interactive session? (1=yes,0=no)           y
interactive = 1
Enter number for your option:
(1) Compute bounds for given spending function.
(2) Compute drift for given power and bounds
(3) Compute probabilities for given bounds.
(4) Compute confidence interval.                       2
Option 2: You will be prompted for bounds and a power level.
Number of interim analyses?                            3
3 interim analyses.
Equally spaced times between 0 and 1? (1=yes,0=no)     y
Analysis times: 0.333 0.667 1.000
Are you using a spending function to determine bounds?
(1=yes,0=no)                                           y
Spending function will determine bounds.
Overall significance level? (>0 and <=1)               .05
alpha = 0.050
One(1) or two(2)-sided symmetric?                      2
2.-sided test
Use function? (1-5)
(1) OBrien-Fleming type
(2) Pocock type
(3) alpha * t
(4) alpha * t^1.5
(5) alpha * t^2                                        1
Use function alpha-star 1
Do you wish to truncate the standardized bounds?
(1=yes, 0=no)                                          n
Bounds will not be truncated.
Time     Bounds
0.33    -3.7103    3.7103
0.67    -2.5114    2.5114
1.00    -1.9930    1.9930
Desired power? (>0 and <=1)                            .90
Power is 0.900
n = 3, drift = 3.2608
look   time    lower      upper     exit probability   cum exit pr
1      0.33   -3.7103    3.7103     0.03380            0.03380
2      0.67   -2.5114    2.5114     0.52651            0.56031
3      1.00   -1.9930    1.9930     0.33969            0.90000
Done.
Computing bounds during analysis of a trial
This is an interactive session using the BHAT data and calendar time as the only time scale. The input sequence is described in Section 3.1.
PROGRAM PROMPTS                                        USER INPUT
Is this an interactive session? (1=yes,0=no)           y
interactive = 1
Enter number for your option:
(1) Compute bounds for given spending function.
(2) Compute drift for given power and bounds
(3) Compute probabilities for given bounds.
(4) Compute confidence interval.                       1
Option 1: You will be prompted for a spending function.
Number of interim analyses?                            2
2 interim analyses.
Equally spaced times between 0 and 1? (1=yes,0=no)     n
Times of interim analyses: (>0 & <=1)                  .2292 .3333
Analysis times: 0.229 0.333
Do you wish to specify a second time/information scale?
(e.g. number of patients or number of events,
as in Lan & DeMets 89?) (1=yes, 0=no)                  no
Overall significance level? (>0 and <=1)               .05
alpha = 0.050
One(1) or two(2)-sided symmetric?                      2
2.-sided test
Use function? (1-5)
(1) OBrien-Fleming type
(2) Pocock type
(3) alpha * t
(4) alpha * t^1.5
(5) alpha * t^2                                        3
Use function alpha-star 3
Do you wish to truncate the standardized bounds?
(1=yes, 0=no)                                          n
Bounds will not be truncated.
This program generates two-sided symmetric boundaries.
n = 2
alpha = 0.050
use function for the lower boundary = 3
use function for the upper boundary = 3
Time     Bounds               alpha(i)-alpha(i-1)   cum alpha
0.23    -2.5284    2.5284     0.01146               0.01146
0.33    -2.6098    2.6098     0.00520               0.01667
Do you want to see a graph? (1=yes,0=no)               n
Done.
In this case, the program outputs the number of analyses so far, the type I error specified, the use function chosen, the times, the computed boundaries, and the type I error “spent” at each analysis so far.
Using the program noninteractively
Some users may want to use the program noninteractively. This can be done by preparing an input file with the appropriate format. Each question is answered on its own line in the input file, and the answer to the first question must be “no” or “0”. Here is an input file which reproduces the above interactive session:
0              # noninteractive
1              # option 1: bounds
2              # number of analyses
0              # equally spaced? (0=no)
.2292 .3333    # times of analyses
0              # second time scale? (0=no)
.05            # alpha
2              # 1 or 2 sided test
3              # use function (1-5)
0              # truncate boundaries (0=no)
0              # show graph? (0=no)
0              # start again? (0=no)
The resulting output is
Is this an interactive session? (1=yes,0=no)
interactive = 0
2 interim analyses.
Analysis times: 0.229 0.333
alpha = 0.050
2.-sided test
Use function alpha-star 3
This program generates two-sided symmetric boundaries.
n = 2
alpha = 0.050
use function for the lower boundary = 3
use function for the upper boundary = 3
Time        Bounds         alpha(i)-alpha(i-1)   cum alpha
0.23   -2.5284   2.5284         0.01146           0.01146
0.33   -2.6098   2.6098         0.00520           0.01667
Do you want to see a graph? (1=yes,0=no)
Done.
Using information to compute boundaries during analysis
For this session, the numbers of events were entered as information, as described in Section 3.1.
PROGRAM PROMPTS                                                       USER INPUT
Is this an interactive session? (1=yes,0=no)                          y
interactive = 1
Enter number for your option:
(1) Compute bounds for given spending function.
(2) Compute drift for given power and bounds
(3) Compute probabilities for given bounds.
(4) Compute confidence interval.                                      1
Option 1: You will be prompted for a spending function.
Number of interim analyses?                                           6
6 interim analyses.
Equally spaced times between 0 and 1? (1=yes,0=no)                    n
Times of interim analyses: (>0 & <=1)                                 .2292 .3333 .4375 .5833 .7083 .8333
Analysis times: 0.229 0.333 0.438 0.583 0.708 0.833
Do you wish to specify a second time/information scale?
(e.g. number of patients or number of events, as in Lan & DeMets 89?)
(1=yes, 0=no)                                                         y
Second scale will estimate covariances.
Information:                                                          56 77 126 177 247 318
Information 56.000 77.000 126.000 177.000 247.000 318.000
Overall significance level? (>0 and <=1)                              .05
alpha = 0.050
One(1) or two(2)-sided symmetric?                                     2
2.-sided test
Use function? (1-5)
(1) OBrien-Fleming type
(2) Pocock type
(3) alpha * t
(4) alpha * t^1.5
(5) alpha * t^2                                                       3
Use function alpha-star 3
Do you wish to truncate the standardized bounds? (1=yes, 0=no)        n
Bounds will not be truncated.
This program generates two-sided symmetric boundaries.
n = 6
alpha = 0.050
use function for the lower boundary = 3
use function for the upper boundary = 3
Time   Information        Bounds         alpha(i)-alpha(i-1)   cum alpha
0.23      56.00      -2.5284   2.5284         0.01146           0.01146
0.33      77.00      -2.5905   2.5905         0.00520           0.01667
0.44     126.00      -2.6327   2.6327         0.00521           0.02187
0.58     177.00      -2.5036   2.5036         0.00729           0.02916
0.71     247.00      -2.5073   2.5073         0.00625           0.03542
0.83     318.00      -2.4655   2.4655         0.00625           0.04166
Do you want to see a graph? (1=yes,0=no)                              n
Done.
In addition to the output described previously, the information is also reported.
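The effect of the second scale can be illustrated for the first two analyses. The following Python sketch is not part of the report's Fortran programs, and the function names `crossing_prob` and `second_bound` are hypothetical. It assumes that type I error is spent on the calendar-time scale with the alpha * t use function, while the correlation between successive statistics is set by whichever scale is passed as `s1` and `s2` (calendar fractions or numbers of events). The printed values can be compared with the second boundaries reported in the two BHAT sessions (2.6098 with calendar time only, 2.5905 with events as information).

```python
from math import sqrt
from statistics import NormalDist

STD = NormalDist()  # standard normal: pdf, cdf, inv_cdf


def crossing_prob(z1, z2, s1, s2, n_grid=1000):
    """P(|Z1| < z1 and Z2 > z2) under zero drift.

    s1 < s2 locate the two analyses on the scale that determines the
    covariance. One composite trapezoidal integration over the first
    statistic is used.
    """
    b1, b2 = z1 * sqrt(s1), z2 * sqrt(s2)  # bounds on the Brownian scale
    sd1, sd2 = sqrt(s1), sqrt(s2 - s1)     # sd of W(s1) and of the increment
    step = 2.0 * b1 / n_grid
    total = 0.0
    for k in range(n_grid + 1):
        u = -b1 + k * step
        weight = 0.5 if k in (0, n_grid) else 1.0
        density = STD.pdf(u / sd1) / sd1
        # Upper-tail probability of the increment, written as a lower tail.
        total += weight * density * STD.cdf((u - b2) / sd2)
    return total * step


def second_bound(alpha, t1, t2, s1, s2):
    """First two upper bounds of a two-sided test with spending alpha * t."""
    z1 = STD.inv_cdf(1.0 - 0.5 * alpha * t1)
    target = 0.5 * alpha * (t2 - t1)       # upper-tail error spent at look 2
    lo, hi = 0.0, 8.0
    for _ in range(50):                    # bisection: probability falls as z2 rises
        mid = 0.5 * (lo + hi)
        if crossing_prob(z1, mid, s1, s2) > target:
            lo = mid
        else:
            hi = mid
    return z1, 0.5 * (lo + hi)


calendar = second_bound(0.05, 0.2292, 0.3333, 0.2292, 0.3333)
events = second_bound(0.05, 0.2292, 0.3333, 56.0, 77.0)
print("calendar time:", [round(z, 4) for z in calendar])
print("information:  ", [round(z, 4) for z in events])
```

Only the ratio `s1 / s2` matters here, so the event counts can be passed without dividing by a total. The first boundary is the same in both cases because no covariance is involved at the first analysis.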
Computing a confidence interval at the end of a trial
In addition to the information needed to compute probabilities associated with a set of boundaries, computing a confidence interval also requires the last value of the standardized test statistic.
PROGRAM PROMPTS                                                       USER INPUT
Is this an interactive session? (1=yes,0=no)                          y
interactive = 1
Enter number for your option:
(1) Compute bounds for given spending function.
(2) Compute drift for given power and bounds
(3) Compute probabilities for given bounds.
(4) Compute confidence interval.                                      4
Option 4: You will be prompted for bounds and a confidence level.
Number of interim analyses?                                           6
6 interim analyses.
Equally spaced times between 0 and 1? (1=yes,0=no)                    n
Times of interim analyses: (>0 & <=1)                                 .2292 .3333 .4375 .5833 .7083 .8333
Analysis times: 0.229 0.333 0.438 0.583 0.708 0.833
Are you using a spending function to determine bounds? (1=yes,0=no)   no
You must enter a set of bounds.
One(1)- or two(2)-sided?                                              2
2-sided test
Symmetric bounds? (1=yes,0=no)                                        y
Two sided symmetric bounds.
Enter upper bounds (standardized):                                    2.53 2.61 2.57 2.47 2.43 2.38
Bounds entered.
Time        Bounds
0.23   -2.5300   2.5300
0.33   -2.6100   2.6100
0.44   -2.5700   2.5700
0.58   -2.4700   2.4700
0.71   -2.4300   2.4300
0.83   -2.3800   2.3800
Enter the standardized statistic at the last analysis:                2.82
Last value: 2.8200
Enter confidence level (>0 and <1):                                   .95
95. percent confidence interval
Starting computation for lower limit . . .
Lower limit computed, starting on upper limit . . .
95. percent confidence interval: ( 0.1881, 4.9347)
Drift is equal to the standard treatment difference times
the square root of total information per arm.
Done.
Translation of the standardized parameter back to an estimate of the difference between treatment groups is done in Section 3.2.
Acknowledgements
The authors wish to acknowledge Kris Erlandson and Bill Ladd for assistance in constructing examples, and Wen Wei for assistance in programming.
References
Armitage, P., McPherson, C. K. & Rowe, B. C. (1969), `Repeated significance tests on accumulating data’, Journal of the Royal Statistical Society, Series A 132, 235-244.
Beta-Blocker Heart Attack Trial Research Group (1982), `A randomized trial of propranolol in patients with acute myocardial infarction. I. Mortality results’, Journal of the American Medical Association 247, 1707-1714.
DeMets, D. L. & Lan, K. K. G. (1984), `An overview of sequential methods and their applications in clinical trials’, Communications in Statistics, Theory and Methods 13, 2315-2338.
Hwang, I. K., Shih, W. J. & de Cani, J. S. (1990), `Group sequential designs using a family of type I error probability spending functions’, Statistics in Medicine 9, 1439-1445.
Kim, K. & DeMets, D. L. (1987a), `Design and analysis of group sequential tests based on the type I error spending rate function’, Biometrika 74, 149-154.
Kim, K. & DeMets, D. L. (1987b), `Confidence intervals following group sequential tests in clinical trials’, Biometrics 43, 857-864.
Kim, K. & DeMets, D. L. (1992), `Sample size determination for group sequential clinical trials with immediate response’, Statistics in Medicine 11, 1391-1399.
Lan, K. K. G. & DeMets, D. L. (1983), `Discrete sequential boundaries for clinical trials’, Biometrika 70, 659-663.
Lan, K. K. G. & DeMets, D. L. (1989), `Group sequential procedures: calendar versus information time’, Statistics in Medicine 8, 1191-1198.
Lan, K. K. G. & Zucker, D. M. (1993), `Sequential monitoring of clinical trials: the role of information and Brownian motion’, Statistics in Medicine 12, 753-765.
Lan, K. K. G., Reboussin, D. M. & DeMets, D. L. (1994), `Information and information fractions for design and sequential monitoring of clinical trials’, Communications in Statistics, Part A–Theory and Methods 23, 403-420.
McPherson, C. K. & Armitage, P. (1971), `Repeated significance tests on accumulating data when the null hypothesis is not true’, Journal of the Royal Statistical Society, Series A 134, 15-25.
O’Brien, P. C. & Fleming, T. R. (1979), `A multiple testing procedure for clinical trials’, Biometrics 35, 549-556.
Pocock, S. J. (1977), `Group sequential methods in the design and analysis of clinical trials’, Biometrika 64, 191-199.
Reboussin, D. M., DeMets, D. L., Kim, K. & Lan, K. K. G. (1992), Programs for computing group sequential boundaries using the Lan-DeMets method, Technical Report 60, Department of Biostatistics, University of Wisconsin-Madison.
Reboussin, D. M., Lan, K. K. G. & DeMets, D. L. (1992), Group sequential testing of longitudinal data, Technical Report 72, Department of Biostatistics, University of Wisconsin-Madison.
Wu, M. C. & Lan, K. K. G. (1992), `Sequential monitoring for comparison of changes in a response variable in clinical studies’, Biometrics 48, 765-779.
Appendix
Theory related to the computations.
Consider a Brownian motion process in continuous time, $W(t)$, $0 \le t \le 1$, having unknown drift parameter $\theta$, which may be inspected at times $t_1 < t_2 < \cdots < t_K$. We wish to test the hypothesis $\theta = 0$ at each inspection time and proceed only if the test fails to reject; that is, if $W(t_i)$ does not exceed some value $b_i$, so that the sequential test rejects if $W(t_i) \ge b_i$ for some $i$. Consider a sequence of boundaries, $b_1, \ldots, b_K$, applied at times $t_1, \ldots, t_K$. Let $g$ denote the standard normal density function, $g(x) = (2\pi)^{-1/2} e^{-x^2/2}$, and $\Phi$ the standard normal distribution function. The probability distribution for $W$ at analysis $i$ is determined recursively by
$$f_1(x) = \frac{1}{\sqrt{t_1}}\, g\!\left(\frac{x - \theta t_1}{\sqrt{t_1}}\right)$$
and
$$f_i(x) = \int_{-\infty}^{b_{i-1}} f_{i-1}(u)\, \frac{1}{\sigma_i}\, g\!\left(\frac{x - u - \theta \sigma_i^2}{\sigma_i}\right) du, \qquad i = 2, \ldots, K,$$
where $\sigma_i^2$ is the variance of $W(t_i) - W(t_{i-1})$, that is, $\sigma_i^2 = t_i - t_{i-1}$. Integrating $f_i$ from $-\infty$ to $b_i$ gives the probability that the trial continues past the $i$th analysis.
Computations at the first analysis involve only the standard normal density and distribution function, but for the second and beyond, numerical integration is necessary. By applying Fubini’s theorem, we have the continuation probability at analysis $i$
$$\int_{-\infty}^{b_i} f_i(x)\, dx = \int_{-\infty}^{b_{i-1}} f_{i-1}(u)\, \Phi\!\left(\frac{b_i - u - \theta \sigma_i^2}{\sigma_i}\right) du.$$
Note that only a single numerical integration is now required. This manipulation allows simple, accurate approximations to the normal distribution function $\Phi$ to be used for computing this probability. Extension of the above to two-sided tests is straightforward: if $a_i$ is the lower bound, it can be substituted for $-\infty$ in the above integrals.
Description of computations.
The first analysis uses only the cumulative normal distribution. The probability calculated for exceeding the first upper boundary is
$$P\{W(t_1) \ge b_1\} = 1 - \Phi\!\left(\frac{b_1 - \theta t_1}{\sqrt{t_1}}\right).$$
In the programs, given $f_{i-1}$, separate subroutines are called to compute the exit probability at analysis $i$ and, if there are more analyses to come, to compute $f_i$. For the routine computing the exit probability, a grid of values of $f_{i-1}(u)$ for $a_{i-1} \le u \le b_{i-1}$, saved from the previous step, is needed. The grid size is standardized, so that it is finer when the increment has a smaller standard deviation. At each grid point $u$, the quantity $f_{i-1}(u)\,[1 - \Phi((b_i - u - \theta \sigma_i^2)/\sigma_i)]$ is computed and stored in an array. This array is then passed to a numerical integration routine along with the integration limits and the grid size, and the exit probability is returned. The other subroutine computes $f_i(x)$ for a grid of values of $x$ between $a_i$ and $b_i$. For each grid point, the grid of values of $f_{i-1}$ is needed. Letting $u$ denote a point in the grid from $a_{i-1}$ to $b_{i-1}$ and $x$ denote a point in the grid from $a_i$ to $b_i$, the quantity $f_{i-1}(u)\, g((x - u - \theta \sigma_i^2)/\sigma_i)/\sigma_i$ is computed and stored in an array. As before, this array is passed to a numerical integration routine, along with the integration limits and the grid size, and $f_i(x)$ is obtained and stored for the next step. Currently, the numerical integration routine is a composite trapezoidal rule, which appears to produce fairly accurate results. Reboussin, DeMets, Kim & Lan (1992) present tests of the programs for computational accuracy and simulation results for validity. Their appendices contain listings of the code.
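The two routines described above (one returning an exit probability, one propagating the density to the next grid) can be sketched in Python. This is an illustrative transcription under stated assumptions, not the report's Fortran code: the names `trapezoid` and `exit_probabilities` are hypothetical, standardized bounds are converted to the Brownian-motion scale by multiplying by the square root of the analysis time, the drift enters as a mean of drift times t, and a fixed number of grid intervals replaces the standardized grid size.

```python
from math import sqrt
from statistics import NormalDist

STD = NormalDist()


def trapezoid(values, step):
    """Composite trapezoidal rule on an equally spaced grid."""
    return step * (sum(values) - 0.5 * (values[0] + values[-1]))


def exit_probabilities(times, z_lower, z_upper, drift=0.0, n_grid=300):
    """Probability of first crossing either bound at each analysis."""
    # Standardized bounds -> Brownian-motion scale, W(t) = Z * sqrt(t).
    a = [z * sqrt(t) for z, t in zip(z_lower, times)]
    b = [z * sqrt(t) for z, t in zip(z_upper, times)]

    # First analysis: only the normal distribution function is needed.
    sd, mean = sqrt(times[0]), drift * times[0]
    probs = [STD.cdf((a[0] - mean) / sd) + STD.cdf((mean - b[0]) / sd)]

    # Density of W(t_1) on a grid over the continuation region.
    step = (b[0] - a[0]) / n_grid
    grid = [a[0] + k * step for k in range(n_grid + 1)]
    dens = [STD.pdf((u - mean) / sd) / sd for u in grid]

    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        sd, shift = sqrt(dt), drift * dt
        # Exit probability: a single integration over the previous grid.
        integrand = [
            f * (STD.cdf((a[i] - u - shift) / sd)
                 + STD.cdf((u + shift - b[i]) / sd))
            for u, f in zip(grid, dens)
        ]
        probs.append(trapezoid(integrand, step))
        if i < len(times) - 1:
            # Propagate the sub-density to a new grid for the next analysis.
            new_step = (b[i] - a[i]) / n_grid
            new_grid = [a[i] + k * new_step for k in range(n_grid + 1)]
            dens = [
                trapezoid([f * STD.pdf((x - u - shift) / sd) / sd
                           for u, f in zip(grid, dens)], step)
                for x in new_grid
            ]
            grid, step = new_grid, new_step
    return probs


times = [1 / 3, 2 / 3, 1.0]
upper = [3.7103, 2.5114, 1.9930]
lower = [-z for z in upper]
under_null = exit_probabilities(times, lower, upper)
under_drift = exit_probabilities(times, lower, upper, drift=3.2608)
print("type I error:", round(sum(under_null), 4))
print("exit probabilities:", [round(p, 4) for p in under_drift])
```

The usage lines take the O'Brien-Fleming type bounds and the drift from the Option 2 session, so the output can be checked against the overall significance level of 0.05 and the exit probability column printed there.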
Programming for spending functions.
Boundaries and information fractions are related by the type I error spending function. The program contains five choices for these functions in a single subroutine called alphas. The critical source code is:
c     Calculate probabilities according to use function.
      do 50 i=1,nn
         if (iuse .eq. 1) then
            pe(i)=2.d0*
     .      (1.d0-pnorm(znorm(1.d0-(alpha/side)/2.d0)/dsqrt(t(i))))
         else if (iuse .eq. 2) then
            pe(i)=(alpha/side)*dlog(1.d0 + (e-1.d0)*t(i))
         else if (iuse .eq. 3) then
            pe(i)=(alpha/side)*t(i)
         else if (iuse .eq. 4) then
            pe(i)=(alpha/side)*(t(i) ** 1.5d0)
         else if (iuse .eq. 5) then
            pe(i)=(alpha/side)*(t(i) ** 2.0d0)
c     Add other spending function options here: e.g.
c        else if (iuse.eq.6) then . . .
         else
            write(6,*) ' Warning: invalid use function.'
         end if
   50 continue
Additional spending functions can be added as “silent” options by editing this section of code. For example, here is the code for a spending function which does not allow stopping until the trial is half over. Once half the information has accumulated, the type I error is spent uniformly until the end of the trial.
         else if (iuse .eq. 6) then
c           Spend nothing until half the information has accumulated.
            if (t(i).le.0.5d0) then
               pe(i)=0.0d0
            else
               pe(i)=(alpha/side)*(t(i) * 2.0d0 - 1.d0)
            end if
This could also be added to the input routine with some additional programming effort.
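For experimenting with use functions outside the Fortran source, the same formulas can be transcribed into Python. This is a sketch only: the function name `alpha_spent` is hypothetical, `e` is assumed to be the base of the natural logarithm as in the Pocock type formula, and option 6 follows the prose description (nothing is spent until half the information has accumulated, then the error is spent uniformly).

```python
from math import e, log, sqrt
from statistics import NormalDist

STD = NormalDist()


def alpha_spent(iuse, t, alpha=0.05, side=2):
    """Cumulative type I error spent per side at information fraction t."""
    per_side = alpha / side
    if iuse == 1:   # O'Brien-Fleming type
        z = STD.inv_cdf(1.0 - per_side / 2.0)
        return 2.0 * (1.0 - STD.cdf(z / sqrt(t)))
    if iuse == 2:   # Pocock type
        return per_side * log(1.0 + (e - 1.0) * t)
    if iuse == 3:   # alpha * t
        return per_side * t
    if iuse == 4:   # alpha * t^1.5
        return per_side * t ** 1.5
    if iuse == 5:   # alpha * t^2
        return per_side * t ** 2
    if iuse == 6:   # hypothetical "silent" option: no stopping before t = 0.5
        return 0.0 if t <= 0.5 else per_side * (2.0 * t - 1.0)
    raise ValueError("invalid use function")


# First standardized bound for the O'Brien-Fleming type function at t = 1/3.
first_obf_bound = STD.inv_cdf(1.0 - alpha_spent(1, 1 / 3))
print(round(first_obf_bound, 4))
# Two-sided error spent by alpha * t at the first BHAT analysis.
print(round(2 * alpha_spent(3, 0.2292), 5))
```

At the first analysis the bound depends only on the error spent so far, which is why it can be obtained from the inverse normal distribution function alone. The two printed values can be compared with the first rows of the session tables (3.7103 and 0.01146).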