Review Article - (2015) Volume 6, Issue 3

Analysis of Two-Stage Adaptive Seamless Trial Design

Chow SC1* and Lin M2
1Duke University School of Medicine, Durham, North Carolina, USA
2Food and Drug Administration, Silver Spring, Maryland, USA
*Corresponding Author: Chow SC, Department of Biostatistics and Bioinformatics, Duke University School of Medicine, 2424 Erwin Road, Hock Suite 1102, Room 11068, Durham, NC, USA, Tel: 919-668-7523 Email:

Abstract

In the past decade, adaptive design methods in clinical research have attracted much attention because they offer the principal investigator not only potential flexibility for identifying the clinical benefit of a test treatment under investigation, but also efficiency for speeding up the development process. One of the most commonly considered adaptive designs is probably the two-stage seamless (e.g., phase I/II or phase II/III) adaptive design. Two-stage seamless adaptive designs can be classified into four categories depending upon the study objectives and study endpoints at the different stages. These categories include (I) designs with the same study objectives and the same study endpoints at different stages, (II) designs with the same study objectives but different study endpoints at different stages, (III) designs with different study objectives but the same study endpoints at different stages, and (IV) designs with different study objectives and different study endpoints at different stages. In this article, an overview of statistical methods for the analysis of these different types of two-stage designs is provided. In addition, a case study concerning the evaluation of a test treatment for treating hepatitis C infected patients utilizing a Category IV trial design is presented.

Keywords: Adaptive designs; Seamless phase II/III; Efficiency; Flexibility; Validity; Integrity

Introduction

In the past decade, adaptive design methods in clinical research have attracted much attention because they offer the principal investigator not only potential flexibility for identifying the clinical benefit of a test treatment under investigation, but also efficiency for speeding up the development process. The FDA adaptive design draft guidance defines an adaptive design as a clinical study that includes a prospectively planned opportunity for modification of one or more specified aspects of the study design and hypotheses based on analysis of data (usually interim data) from subjects in the study [1]. As recognized by many investigators and researchers, the use of adaptive design methods in clinical trials may allow researchers to correct assumptions made at the planning stage and to select the most promising option early. In addition, adaptive designs make use of the cumulative information of the on-going trial, which provides the investigator an opportunity to react earlier to surprises, regardless of whether the results are positive or negative. Thus, adaptive design approaches may speed up the drug development process.

Despite the possible benefit of having a second chance to modify the trial at interim when utilizing an adaptive design, the trial can be more problematic operationally due to bias that may have been introduced into the conduct of the trial. As indicated by the FDA draft guidance, operational biases may occur when adaptations in trial and/or statistical procedures are applied after the review of interim (unblinded) data. As a result, it is a concern whether the scientific integrity and validity of the trial are warranted. Chow and Chang [2] indicated that trial procedures include, but are not limited to, inclusion/exclusion criteria, dose/dose regimen and treatment duration, endpoint selection and assessment, and/or the laboratory testing procedures employed. On the other hand, statistical procedures refer to the study design, statistical hypotheses (which reflect the study objectives), endpoint selection, power analysis for sample size calculation, sample size re-estimation and/or sample size adjustment, randomization schedules, and the statistical analysis plan (SAP). With respect to these trial and statistical procedures, commonly employed adaptations at interim include (1) sample size re-estimation at the interim analysis, (2) adaptive randomization with unequal treatment allocation (e.g., changing from a 1:1 ratio to a 2:1 ratio), (3) deleting, adding, or modifying treatment arms after the review of interim data, (4) shifting of the patient population due to protocol amendments, (5) using different statistical methods, (6) changing study endpoints (e.g., changing response rate and/or survival to time-to-disease progression in cancer trials), and (7) changing hypotheses/objectives (e.g., switching from a superiority hypothesis to a non-inferiority hypothesis). Therefore, the use of adaptive design methods in clinical trials seems promising because of the potential flexibility for identifying any possible clinical benefit, signal, and/or trend regarding the efficacy and safety of the test treatment under investigation. However, major adaptations may have an impact on the integrity and validity of the clinical trial, which may raise critical concerns regarding the accurate and reliable evaluation of the test treatment under investigation. These concerns include (1) the control of the overall type I error rate at a pre-specified level of significance, (2) the correctness of the obtained p-values, and (3) the reliability of the obtained confidence intervals. Most importantly, major (significant) adaptations may result in a totally different trial that is unable to address the scientific/medical questions the original study intended to answer.

As indicated by Chow [3], a seamless trial design is defined as a trial design that combines two independent trials into a single study that can address the study objectives of the individual studies. An adaptive seamless design is a seamless trial design that uses data collected both before and after the adaptation in the final analysis. In practice, a two-stage seamless adaptive design typically consists of two stages (phases): a learning (or exploratory) phase (Stage 1) and a confirmatory phase (Stage 2). The objective of the learning phase is not only to obtain information regarding the uncertainty of the test treatment under investigation, but also to provide the investigator the opportunity to stop the trial early due to safety and/or futility/efficacy based on accrued data, or to apply adaptations such as adaptive randomization at the end of Stage 1. The objective of the second stage is to confirm the findings observed in the first stage. A two-stage seamless adaptive trial design has the following advantages: (1) it may reduce the lead time between studies required by the traditional approach, and (2) it provides the investigator a second chance to re-design the trial after the review of the accumulated data at the end of Stage 1. Most importantly, data collected from both stages are combined for a final analysis in order to fully utilize all data collected from the trial for a more accurate and reliable assessment of the test treatment under investigation.

As indicated in Chow [3] and Chow and Tu [4], in practice, two-stage seamless adaptive trial designs can be classified into the following four categories depending upon the study objectives and study endpoints at the different stages.

Table 1 indicates that there are four different types of two-stage seamless adaptive designs depending upon whether the study objectives and/or study endpoints at the different stages are the same. For example, Category I designs (i.e., SS designs) include those designs with the same study objectives and the same study endpoints, while Category II and Category III designs (i.e., SD and DS designs) refer to those designs with the same study objectives but different study endpoints, and different study objectives but the same study endpoints, respectively. Category IV designs (i.e., DD designs) are study designs with different study objectives and different study endpoints. In practice, different study objectives could be treatment selection for Stage 1 and efficacy confirmation for Stage 2. On the other hand, different study endpoints could be a biomarker, a surrogate endpoint, or a clinical endpoint with a shorter treatment duration at the first stage versus the clinical endpoint at the second stage. Note that a group sequential design with one planned interim analysis is often considered an SS design.

Study Objective          Study Endpoint
                         Same (S)      Different (D)
Same (S)                 I = SS        II = SD
Different (D)            III = DS      IV = DD
Source: Chow [2]

Table 1: Types of two-stage seamless adaptive designs.

In practice, typical examples of a two-stage adaptive seamless design include a two-stage adaptive seamless phase I/II design and a two-stage adaptive seamless phase II/III design. For the two-stage adaptive seamless phase I/II design, the objective at the first stage may be biomarker development, while the study objective at the second stage is usually to establish early efficacy. For a two-stage adaptive seamless phase II/III design, the study objective at the first stage is often treatment selection (or dose finding), while the study objective at the second stage is efficacy confirmation. In this article, our focus will be placed on Category II designs. The results can be similarly applied to Category III and Category IV designs.

It should be noted that the terms seamless and phase II/III were not used in the FDA draft guidance, as they have sometimes been adopted to describe various design features [1]. In this article, a two-stage adaptive seamless phase II/III design refers only to a study containing an exploratory phase II stage (Stage 1) and a confirmatory phase III stage (Stage 2), where the data collected at both phases (stages) will be used for the final analysis.

One of the questions that is commonly asked when applying a two-stage adaptive seamless design in clinical trials concerns sample size calculation/allocation. For the first kind (i.e., Category I, SS) of two-stage seamless designs, the methods based on individual p-values described in Chow and Chang [2] can be applied. However, for the other kinds (i.e., Category II to Category IV) of two-stage seamless trial designs, standard statistical methods for group sequential designs are not appropriate and hence should not be applied directly. For Category II-IV trial designs, power analysis and/or statistical methods for data analysis are challenging for the biostatistician. For example, a commonly asked question is "How do we control the overall type I error rate at a pre-specified level of significance?" In the interest of stopping the trial early, "How to determine the stopping boundaries?" is a challenge for the investigator and the biostatistician. In practice, it is often of interest to determine whether the typical O'Brien-Fleming type of boundaries is feasible. Another challenge is "How to perform a valid analysis that combines data collected from different stages?" To address these questions, Cheng and Chow [5] proposed the concept of a multiple-stage transitional seamless adaptive design, which takes different study objectives and study endpoints into consideration.

Properties of Two-Stage Adaptive Design

As compared to the traditional approach (i.e., two separate studies), a two-stage seamless adaptive design is preferable in terms of controlling the overall type I error rate and power. For the comparison of the overall type I error rate, consider a two-stage adaptive trial design that combines a phase II trial and a phase III study. Let αII and αIII be the corresponding type I error rates for the phase II trial and the phase III study, respectively. For the traditional approach, the overall type I error rate is given by α = αII × αIII, since both trials must (falsely) reach positive conclusions. In the two-stage adaptive seamless phase II/III design, on the other hand, the actual α is αIII. Thus, as compared to the traditional approach, the α for a two-stage adaptive phase II/III design is actually 1/αII times larger. Similarly, let PowerII and PowerIII be the power of the phase II trial and the phase III study, respectively. Then, the power of the traditional approach is PowerII × PowerIII, while the power of the two-stage phase II/III adaptive design is PowerIII. Thus, as compared to the traditional approach, the power of a two-stage phase II/III adaptive design is 1/PowerII times larger.
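As a quick numerical illustration of the above relationships, the following Python sketch compares the two approaches. The values chosen for αII, αIII and the corresponding powers are illustrative assumptions only, not values taken from any particular trial.

# Illustrative comparison of the overall alpha and power for the traditional
# (two separate trials) approach versus a two-stage seamless phase II/III design.
alpha_II, alpha_III = 0.05, 0.025   # assumed type I error rates of the separate phase II and III trials
power_II, power_III = 0.80, 0.90    # assumed powers of the separate phase II and III trials

alpha_traditional = alpha_II * alpha_III      # both trials must (falsely) succeed
alpha_seamless = alpha_III                    # single confirmatory test at alpha_III
print(alpha_seamless / alpha_traditional)     # = 1 / alpha_II = 20-fold larger alpha

power_traditional = power_II * power_III      # both trials must succeed
power_seamless = power_III
print(power_seamless / power_traditional)     # = 1 / power_II = 1.25-fold larger power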

A two-stage seamless adaptive trial design has the following advantages. First, it may reduce the lead time between studies required by the traditional approach. In practice, the lead time between the end of the phase II trial and the kick-off of the phase III study is estimated to be about 6 to 12 months. This is because the phase III study usually will not be initiated until the final clinical report of the phase II trial is completed. After the completion of a clinical study, it usually takes about 4 to 6 months to clean and lock the database, perform programming and data analysis, and prepare the final report. Besides, before kicking off the phase III trial, protocol development, site selection/initiation, and IRB review/approval will also take some time. Thus, the use of a two-stage phase II/III adaptive trial design will definitely reduce the lead time between studies. In addition, the nature of the adaptive trial design also allows the investigator to make a go/no-go decision early (i.e., at the end of the first stage). In terms of the sample size required, a two-stage phase II/III adaptive design may require a smaller sample size as compared to the traditional approach. Most importantly, a two-stage phase II/III adaptive trial design allows us to fully utilize the data collected from both stages in a combined analysis, which provides a more accurate and reliable assessment of the test treatment under investigation. In what follows, an overview of statistical methods for the analysis of the different types (i.e., Category I to IV) of two-stage designs is provided. In addition, a case study concerning the evaluation of a test treatment for treating patients with hepatitis C infection in a clinical study utilizing a Category IV adaptive design is presented.

Analysis for Category I Adaptive Designs

A Category I design with the same study objectives and the same study endpoints at different stages is similar to a typical group sequential design with one planned interim analysis. Thus, standard statistical methods for group sequential designs are often employed. It should be noted, however, that with the various adaptations that may be applied, these standard statistical methods may not be appropriate. In practice, many interesting methods for Category I designs are available in the literature. These methods include (1) Fisher's criterion for combining independent p-values [6-8], (2) weighted test statistics [9], (3) the conditional error function approach [10,11], and (4) conditional power approaches [12].

Among these methods, Fisher's method for combining p-values provides great flexibility in selecting statistical tests for individual hypotheses based on sub-samples. Fisher's method, however, lacks flexibility in the choice of boundaries [13]. For Category I adaptive designs, many related issues have been studied. For example, Rosenberger and Lachin [14] explored the potential use of response-adaptive randomization. Chow, Chang, and Pong [15] examined the impact of population shift due to protocol amendments. Li et al. [12] studied a two-stage adaptive design with a survival endpoint, while Hommel et al. [16] studied a two-stage adaptive design with correlated data. An adaptive design with a bivariate endpoint was studied by Todd [17]. Tsiatis and Mehta [18] showed that for any adaptive design with sample size adjustment there exists a more powerful group sequential design.

For illustration purposes, in what follows we introduce the method based on the sum of p-values (MSP) proposed by Chang [2,19]. The MSP follows the idea of considering a linear combination of the p-values from the different stages.

Theoretical framework

Consider a clinical trial utilizing a K-stage design. This is similar to a clinical trial with K interim analyses, where the final analysis is the Kth interim (final) analysis. Suppose that at each interim analysis a hypothesis test is performed. The objective of the trial can be formulated as the following intersection of the individual hypotheses from the interim analyses:

H0: H01 ∩ H02 ∩ ... ∩ H0K,

where H0i is the null hypothesis to be tested at the ith interim analysis. Note that there are some restrictions on the H0i; that is, rejection of any H0i will lead to the same clinical implication (e.g., the drug is efficacious). Hence, all H0i are constructed for testing the same endpoint within a trial; otherwise the global hypothesis cannot be interpreted.

In practice, H0i is tested based on a sub-sample from each stage. Without loss of generality, assume that H0i is a test for the efficacy of the test treatment under investigation, which can be written as

H0i: ηi1 ≥ ηi2 versus Hai: ηi1 < ηi2,

where ηi1 and ηi2 are the responses of the two treatment groups at the ith stage and larger values are assumed to be better. When ηi1 = ηi2, the p-value pi for the sub-sample at the ith stage is uniformly distributed on (0, 1) under H0i. Under the null hypothesis, Bauer and Kohne [6] used Fisher's combination of the p-values to construct a test statistic for multiple-stage adaptive designs. Following a similar idea, Chang [19] considered a linear combination of the p-values as follows,

Tk = Σ_{i=1}^{k} wki pi,  k = 1, ..., K,          (1)

where wki > 0 and K is the number of interim analyses planned. If wki = 1 for all i, this leads to

Tk = Σ_{i=1}^{k} pi = Tk−1 + pk,  k = 1, ..., K.          (2)

Tk can be viewed as the cumulative evidence against H0; the smaller Tk is, the stronger the evidence. Alternatively, we can consider Tk/k, which is an average of the evidence against H0. Intuitively, one may consider the stopping rules

Stop for efficacy if Tk ≤ αk; stop for futility if Tk > βk; continue otherwise,          (3)

where αk and βk (αk < βk for k = 1, ..., K − 1, and αK = βK) are monotonically increasing functions of k. Note that αk and βk are referred to as the efficacy and futility boundaries, respectively. To reach the kth stage, a trial has to pass stages 1 through (k − 1). Therefore, a so-called proceeding probability can be defined as the following unconditional probability:

ψk(t) = P(Tk < t, α1 < T1 ≤ β1, ..., αk−1 < Tk−1 ≤ βk−1) = ∫_{α1}^{β1} ⋯ ∫_{αk−1}^{βk−1} ∫_{−∞}^{t} fT1⋯Tk(t1, ..., tk) dtk dtk−1 ⋯ dt1,          (4)

where Ti is the test statistic at the ith stage and fT1⋯Tk is the joint probability density function of T1, ..., Tk. Thus, the error rate at the kth stage can be obtained as

πk = ψk(αk).          (5)

Since the type I error events at different stages are mutually exclusive, the experiment-wise type I error rate is the sum of the πk, k = 1, ..., K. Thus, we have

α = Σ_{k=1}^{K} πk = Σ_{k=1}^{K} ψk(αk).          (6)

Note that the stopping boundaries can be determined with appropriate choices of the αk. The adjusted p-value calculation is the same as in a classic group sequential design [20]. The key idea is that when the test statistic at the kth stage equals αk (i.e., it is just on the efficacy stopping boundary), the p-value equals the alpha spent up to the kth stage, Σ_{i=1}^{k} πi. This is true regardless of which error-spending function is used and is consistent with the p-value definition of the traditional design. As indicated in Chang [19], the adjusted p-value corresponding to an observed test statistic Tk = t at the kth stage can be defined as

p(t; k) = Σ_{i=1}^{k−1} πi + ψk(t),  k = 1, ..., K.          (7)

Note that pi in equation (1) is the stage-wise (unadjusted) p-value from the sub-sample at the ith stage, while p(t; k) is the adjusted p-value calculated from the test statistic, which is based on the cumulative data up to the kth stage where the trial stops. Equations (6) and (7) are valid regardless of how the pi are calculated.
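To make the error-spending argument concrete, the following Python sketch (an illustration, not part of the original derivation) uses Monte Carlo simulation to verify that a two-stage MSP rule controls the overall one-sided type I error. The boundaries α1 = 0.01 and α2 = 0.1832 correspond to an entry of Table 2 presented later; under the global null hypothesis the stage-wise p-values are assumed to be independent Uniform(0, 1), and NumPy is assumed to be available.

import numpy as np

# Monte Carlo check that the two-stage MSP rule T2 = p1 + p2 with boundaries
# (alpha1, alpha2) yields an overall one-sided type I error of about 0.025.
rng = np.random.default_rng(2015)
alpha1, alpha2 = 0.010, 0.1832
n_sim = 1_000_000

# Under the global null, the stage-wise p-values are independent Uniform(0, 1).
p1 = rng.uniform(size=n_sim)
p2 = rng.uniform(size=n_sim)

reject_stage1 = p1 <= alpha1                            # early efficacy stopping
reject_stage2 = (p1 > alpha1) & (p1 + p2 <= alpha2)     # rejection at the final analysis
print((reject_stage1 | reject_stage2).mean())           # ~0.025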

Two-stage design

In this section, for simplicity, we consider the method of sum of p-values (MSP) and apply the general framework to the two-stage designs outlined in Chang [19] and Chow and Chang [2], which are suitable for adaptive designs that allow (1) early efficacy stopping, (2) early stopping for both efficacy and futility, and (3) early futility stopping. These adaptive designs are briefly described below.

Early efficacy stopping – For simplicity, consider K = 2 (i.e., a two-stage design) that allows early stopping for efficacy only (i.e., β1 = 1). By (5), the type I error rates to spend at Stage 1 and Stage 2 are given by

π1 = ψ1(α1) = α1          (8)

and

π2 = ψ2(α2) = ½ (α2 − α1)²,          (9)

respectively. Using equations (8) and (9), (6) becomes

α = α1 + ½ (α2 − α1)².          (10)

Solving for α2, we obtain

α2 = √(2(α − α1)) + α1.          (11)

Note that α1 is the stopping probability (error spent) at the first stage under the null hypothesis and ½ (α2 − α1)² is the error spent at the second stage. As a result, if the test statistic p1 > α2, it is certain that T2 = p1 + p2 > α2. Therefore, the trial should stop for futility when p1 > α2.

Based on the relationship among α1, α2, and α given in (10), various stopping boundaries can be considered with appropriate choices of α1, α2, and α. For illustration purposes, Table 2 provides some examples of the stopping boundaries generated from equations (10) and (11).

One-sided α        α1:   0.005    0.010    0.015    0.020    0.025    0.030
0.025              α2:   0.2050   0.1832   0.1564   0.1200   0.0250   -
0.05               α2:   0.3050   0.2928   0.2796   0.2649   0.2486   0.2300
Source: Chang [19], Statistics in Medicine, 26, 2772-2784.

Table 2: Stopping boundaries for two-stage efficacy designs.
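The entries of Table 2 can be reproduced from the relation in equation (11). The short Python sketch below does exactly that; it is an illustration under the reconstructed relation rather than code from the original paper.

from math import sqrt

# Reproduce the efficacy-only stopping boundaries in Table 2 from equation (11):
# alpha2 = sqrt(2 * (alpha - alpha1)) + alpha1, valid for alpha1 <= alpha.
def efficacy_boundary(alpha, alpha1):
    """Final-stage MSP boundary alpha2 for a two-stage design with early efficacy stopping."""
    if alpha1 > alpha:
        raise ValueError("alpha1 must not exceed the overall alpha")
    return sqrt(2.0 * (alpha - alpha1)) + alpha1

for alpha in (0.025, 0.05):
    row = [round(efficacy_boundary(alpha, a1), 4)
           for a1 in (0.005, 0.010, 0.015, 0.020, 0.025, 0.030) if a1 <= alpha]
    print(alpha, row)   # matches the alpha2 rows of Table 2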

By (7)-(11), the adjusted p-value is given by

p(t; k) = t, if k = 1;
p(t; k) = α1 + ½ (t − α1)², if k = 2,          (12)

where t = p1 if the trial stops at Stage 1 and t = p1 + p2 if the trial stops at Stage 2.
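The following minimal Python sketch evaluates the adjusted p-value in equation (12), assuming the two-branch form given above; the numerical inputs are illustrative.

# Adjusted p-value for the two-stage efficacy-only MSP design (equation (12)).
def adjusted_p(t, stage, alpha1):
    """t = p1 if the trial stops at Stage 1, t = p1 + p2 if it stops at Stage 2."""
    if stage == 1:
        return t
    return alpha1 + 0.5 * (t - alpha1) ** 2

# Example: alpha1 = 0.01; the trial continues (p1 = 0.10) and stops at Stage 2 with p2 = 0.05.
print(adjusted_p(0.10 + 0.05, stage=2, alpha1=0.01))   # ~0.0198 < 0.025, so H0 is rejected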

Early efficacy or futility stopping – For this case, it is obvious that if β1 ≥ α2, the stopping boundaries are the same as those for the design with early efficacy stopping only. However, the futility boundary β1, when β1 < α2, is expected to affect the power of the hypothesis test. Therefore,

π1 = α1          (13)

and

π2 = ∫_{α1}^{β1} ∫_{0}^{α2−p1} dp2 dp1 = α2(β1 − α1) − ½ (β1² − α1²),  for β1 ≤ α2.          (14)

Thus, it can be verified that

α = α1 + α2(β1 − α1) − ½ (β1² − α1²),  for β1 ≤ α2.          (15)

Similarly, under (15), various boundaries can be obtained with appropriate choices of α1, α2, and β1 (Table 3). The adjusted p-value is given by

p(t; k) = t, if k = 1;
p(t; k) = α1 + ½ (t − α1)², if k = 2 and t ≤ β1;
p(t; k) = α1 + t(β1 − α1) − ½ (β1² − α1²), if k = 2 and t > β1,          (16)

One-sided α   β1      α1:   0.005    0.010    0.015    0.020    0.025
0.025         0.15    α2:   0.2154   0.1871   0.1566   0.1200   0.0250
0.05          0.20    α2:   0.3333   0.3155   0.2967   0.2767   0.2554
Source: Chang [19], Statistics in Medicine, 26, 2772-2784.

Table 3: Stopping boundaries for two-stage efficacy and futility designs.
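The following Python sketch solves the relation (15) for α2 given α1 and β1, falling back to the efficacy-only relation (11) when the futility bound is not binding (β1 > α2). It reproduces entries of Tables 2 and 3 and is provided only as an illustration under the reconstructed relations.

from math import sqrt

# Final boundary alpha2 of a two-stage MSP design with efficacy (alpha1) and futility (beta1) stopping.
def efficacy_futility_boundary(alpha, alpha1, beta1):
    # When beta1 <= alpha2: alpha = alpha1 + alpha2*(beta1 - alpha1) - (beta1**2 - alpha1**2)/2.
    alpha2 = (alpha - alpha1 + 0.5 * (beta1 ** 2 - alpha1 ** 2)) / (beta1 - alpha1)
    if alpha2 < beta1:
        # Futility bound never binds before alpha2; use the efficacy-only relation (11) instead.
        alpha2 = sqrt(2.0 * (alpha - alpha1)) + alpha1
    return alpha2

print(round(efficacy_futility_boundary(0.025, 0.005, 0.15), 4))  # 0.2154, as in Table 3
print(round(efficacy_futility_boundary(0.025, 0.020, 0.15), 4))  # 0.1200 (futility bound not binding)
print(round(efficacy_futility_boundary(0.05, 0.005, 0.20), 4))   # 0.3333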

where t = p1 if the trial stops at Stage 1 and t = p1 + p2 if the trial stops at Stage 2.

Early futility stopping – A design featuring early futility stopping only is a special case of the previous design with α1 = 0 in equation (15). Hence, we have

α = α2 β1 − ½ β1²,  for β1 ≤ α2.          (17)

Solving for α2, it can be obtained that

α2 = α/β1 + β1/2, for β1 < √(2α);  α2 = √(2α), for β1 ≥ √(2α).          (18)

Examples of the stopping boundaries generated using equation (18) are presented in Table 4. The adjusted p-value can be obtained from equation (16) with α1 = 0, that is,

p(t; 2) = ½ t², if t ≤ β1;  p(t; 2) = tβ1 − ½ β1², if t > β1,  where t = p1 + p2.          (19)

One-sided α   β1:   0.1      0.2      0.3      ≥0.4
0.025         α2:   0.3000   0.2250   0.2236   0.2236
0.05          α2:   0.5500   0.3500   0.3167   0.3162
Source: Chang [19], Statistics in Medicine, 26, 2772-2784.

Table 4: Stopping boundaries for two-stage futility design.
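Similarly, the rows of Table 4 can be reproduced from the relation (18), as in the short sketch below (illustrative only).

from math import sqrt

# Futility-only boundaries from equation (18).
def futility_only_boundary(alpha, beta1):
    return alpha / beta1 + beta1 / 2.0 if beta1 < sqrt(2.0 * alpha) else sqrt(2.0 * alpha)

print([round(futility_only_boundary(0.025, b), 4) for b in (0.1, 0.2, 0.3, 0.4)])
# [0.3, 0.225, 0.2236, 0.2236] -- matches the alpha = 0.025 row of Table 4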

Conditional power

Conditional power with or without clinical trial simulation is often considered for sample size re-estimation in adaptive trial designs. As discussed earlier, since the stopping boundaries of most existing methods are based on either the z-scale or the p-value scale, to link the two we consider pk = 1 − Φ(zk), or inversely, zk = Φ⁻¹(1 − pk), where zk and pk are the normal z-score and the p-value from the sub-sample at the kth stage, respectively. It should be noted that z2 is asymptotically normally distributed with unit variance and mean δ/se(δ̂2) under the alternative hypothesis, where δ̂2 is the estimate of the treatment difference based on the second-stage data and

se(δ̂2) = σ √(2/n2), where n2 is the Stage 2 sample size per group and σ² is the common variance.

The conditional power can be evaluated under the alternative hypothesis. Rejection of the null hypothesis H0 at the second stage requires

z2 ≥ B(α2, p1),          (20)

where B(α2, p1) is the Stage 2 rejection boundary on the z-scale, which depends on the method used for combining the stage-wise p-values. Thus, the conditional probability of rejecting H0 at the second stage, given the first-stage naïve p-value p1, is given by

PC(p1, δ) = 1 − Φ(B(α2, p1) − δ/se(δ̂2)),  α1 < p1 ≤ β1.          (21)

As an example, for the method based on the product of stage-wise p-values (MPP), the rejection criterion for the second stage is

p1 p2 ≤ α̂2.

Therefore, for MPP, B(α2, p1) = Φ⁻¹(1 − α̂2/p1), where α̂2 denotes the final rejection boundary for MPP.

Similarly, for the method based on the sum of stage-wise p-values (MSP), the rejection criterion for the second stage is

p1 + p2 ≤ α2, so that B(α2, p1) = Φ⁻¹(1 − max(0, α2 − p1)).

On the other hand, for the inverse-normal method [21], the rejection criterion for the second stage is

w1 z1 + w2 z2 ≥ Φ⁻¹(1 − α2),

so that B(α2, p1) = [Φ⁻¹(1 − α2) − w1 Φ⁻¹(1 − p1)] / w2,

where w1 and w2 are prefixed weights satisfying w1² + w2² = 1. Note that the group sequential design and the CHW method [9] are special cases of the inverse-normal method. Since the inverse-normal method requires two additional parameters (w1 and w2), for simplicity we will only compare the conditional powers of MPP and MSP. For a valid comparison, the same α1 is used for both methods.

As can be seen from equation (21), the comparison of the conditional power is equivalent to the comparison of the boundary function B(α2, p1). Equating the two functions B(α2, p1) for MPP and MSP, we have

α̂2 / p1 = α2 − p1,          (22)

where α̂2 and α2 are the final rejection boundaries for MPP and MSP, respectively. Solving (22) for p1, we obtain the critical points for p1:

p1 = [α2 ± √(α2² − 4α̂2)] / 2.          (23)

Equation (23) indicates that when p1 < [α2 − √(α2² − 4α̂2)]/2 or p1 > [α2 + √(α2² − 4α̂2)]/2, MPP has a higher conditional power than MSP; when p1 lies between these two critical points, MSP has a higher conditional power than MPP. As an example, for a one-sided test at α = 0.025, if we choose α1 = 0.01 and β1 = 0.3, the corresponding final boundaries α̂2 and α2 can be obtained from the respective boundary relationships, and the two critical points then follow from equation (23).

Note that the unconditional power, Pw, is nothing but the expectation of the conditional power, i.e.,

Pw = E[PC(p1, δ)].          (24)

Therefore, the difference in unconditional power between MSP and MPP depends on the distribution of p1 and, consequently, on the true difference δ and the first-stage stopping boundaries α1 and β1.
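To illustrate the comparison, the following Python sketch evaluates the conditional power in equation (21) for MSP and MPP under the reconstructed stage-2 rejection rules p1 + p2 ≤ α2 and p1 p2 ≤ α̂2 and a two-sample normal model. The boundary values, effect size δ, σ, and the Stage 2 per-group sample size n2 are illustrative assumptions and do not come from the original text.

from math import sqrt
from statistics import NormalDist

norm = NormalDist()

def conditional_power(p1, delta, sigma, n2, p2_threshold):
    """P(reject at Stage 2 | p1) = P(p2 <= p2_threshold) under the alternative."""
    if p2_threshold <= 0.0:
        return 0.0
    if p2_threshold >= 1.0:
        return 1.0
    b = norm.inv_cdf(1.0 - p2_threshold)        # stage-2 boundary B(alpha2, p1) on the z-scale
    theta = delta / sigma * sqrt(n2 / 2.0)      # mean of z2 under the alternative
    return 1.0 - norm.cdf(b - theta)

alpha2_msp, alpha2_mpp = 0.1832, 0.0044         # assumed final boundaries for alpha1 = 0.01, beta1 = 0.3
delta, sigma, n2 = 0.3, 1.0, 100
for p1 in (0.02, 0.08, 0.17):
    cp_msp = conditional_power(p1, delta, sigma, n2, alpha2_msp - p1)   # MSP: p2 <= alpha2 - p1
    cp_mpp = conditional_power(p1, delta, sigma, n2, alpha2_mpp / p1)   # MPP: p2 <= alpha2_mpp / p1
    print(p1, round(cp_msp, 3), round(cp_mpp, 3))

Running the sketch shows MPP ahead for very small or relatively large p1 and MSP ahead in between, which is the qualitative behavior described by equation (23).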

Note that Bauer and Kohne's [6] method using Fisher's combination leads to the equation α = α1 + cα ln(β1/α1), where cα = exp(−½ χ²4,1−α) is fixed; it is obvious that the determination of β1 leads to a unique α1 and, consequently, a unique α2. This is a non-flexible approach. However, it can be verified that the method can be generalized to α = α1 + α2 ln(β1/α1), where α2 does not have to be exp(−½ χ²4,1−α).

Note that Tsiatis and Mehta [18] indicated that for any adaptive design with sample size adjustment there exists a more powerful group sequential design. It should be noted, however, that the efficiency gain of the classic group sequential design comes at a price. For example, as the number of interim analyses increases (e.g., from 3 to 10), the associated cost may increase substantially. Also, the optimality holds under the condition of a pre-specified error-spending function, whereas adaptive designs in general do not require a fixed error-spending function.

Analysis for Category II Adaptive Designs

Now consider a Category II two-stage phase II/III seamless adaptive design with the same study objectives but different (continuous) study endpoints. Let xi be the observed value of the study endpoint (e.g., a biomarker) from the ith subject in phase II (Stage 1), i = 1, ..., n, and let yj be the observed value of the study endpoint (i.e., the primary clinical endpoint) from the jth subject in phase III (Stage 2), j = 1, ..., m. Suppose that the xi's and the yj's are independently and identically distributed with E(xi) = ν, Var(xi) = τ², and E(yj) = μ, Var(yj) = σ², respectively. Chow, Lu and Tse [22] proposed obtaining predicted values of the clinical endpoint based on the data collected on the biomarker (or surrogate endpoint) under an established relationship between the biomarker and the clinical endpoint. These predicted values are then combined with the data collected at the confirmatory phase (Stage 2) to derive statistical inference on the treatment effect under investigation. For simplicity, suppose that x and y are related through the following straight-line relationship,

y = β0 + β1 x + ε,          (25)

where ε is a random error with zero mean and variance σε², assumed to be independent of x. In practice, we assume that this relationship is well established; in other words, the parameters β0 and β1 are assumed known. Based on equation (25), the observations xi collected at the first stage can be transformed to predicted values of the clinical endpoint, ŷi = β0 + β1 xi. Each ŷi is then treated as an observation of the clinical endpoint and combined with the observations yj collected at the second stage to estimate the treatment mean μ. Chow, Lu and Tse [22] proposed the following weighted-mean estimator,

μ̂ = ω ȳ1 + (1 − ω) ȳ2,          (26)

where ȳ1 = (1/n) Σ_{i=1}^{n} ŷi, ȳ2 = (1/m) Σ_{j=1}^{m} yj, and 0 ≤ ω ≤ 1. It should be noted that μ̂ is the minimum-variance unbiased estimator among all weighted-mean estimators when the weight is given by

ω = [n/Var(ŷi)] / [n/Var(ŷi) + m/Var(yj)],          (27)

if the variances Var(ŷi) and Var(yj) are known. In practice, these variances are usually unknown and ω is commonly estimated by

ω̂ = (n/S1²) / (n/S1² + m/S2²),          (28)

where S1² and S2² are the sample variances of the ŷi's and the yj's, respectively. The corresponding estimator of μ, which is denoted by

μ̂GD = ω̂ ȳ1 + (1 − ω̂) ȳ2,          (29)

and is referred to as the Graybill-Deal (GD) estimator of μ. Note that Meier [23] proposed an approximately unbiased estimator of the variance of the GD estimator. Khatri and Shah [24] gave an exact expression for the variance of this estimator in the form of an infinite series.
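A minimal Python sketch of the Category II analysis described above is given below. It assumes a known straight-line relationship y = β0 + β1 x, transforms simulated Stage 1 biomarker data into predicted clinical-endpoint values, and combines them with simulated Stage 2 clinical data through the Graybill-Deal weighted mean with the estimated weight in equation (28). The data, the parameter values, and the first-order variance approximation are illustrative assumptions, and NumPy is assumed to be available.

import numpy as np

# Transform Stage 1 biomarker readings to predicted clinical-endpoint values (equation (25)),
# then combine them with Stage 2 clinical data via the Graybill-Deal weighted mean.
rng = np.random.default_rng(7)
b0, b1 = 1.0, 2.0                          # assumed known relationship parameters
x = rng.normal(2.0, 1.0, size=60)          # Stage 1 biomarker observations (one treatment group)
y = rng.normal(5.0, 1.5, size=90)          # Stage 2 clinical-endpoint observations

y_hat = b0 + b1 * x                        # predicted clinical endpoint for Stage 1 subjects
n, m = len(y_hat), len(y)
s1_sq, s2_sq = y_hat.var(ddof=1), y.var(ddof=1)

w = (n / s1_sq) / (n / s1_sq + m / s2_sq)          # estimated optimal weight (equation (28))
mu_gd = w * y_hat.mean() + (1.0 - w) * y.mean()    # Graybill-Deal estimator (equation (29))
var_gd = 1.0 / (n / s1_sq + m / s2_sq)             # first-order variance approximation
print(round(mu_gd, 3), round(var_gd, 4))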

Based on the GD estimator, the comparison of the two treatments can be made by testing the following hypotheses

H0: μ1 = μ2 versus Ha: μ1 ≠ μ2.          (30)

Let ŷij = β0 + β1 xij be the predicted value, which is used as the prediction of y for the jth subject under the ith treatment in phase II (Stage 1). From equation (29), the GD estimator of μi is given by

μ̂GD,i = ω̂i ȳ1i + (1 − ω̂i) ȳ2i,          (31)

where ω̂i = (ni/S1i²) / (ni/S1i² + mi/S2i²), ȳ1i and ȳ2i are the sample means, and S1i² and S2i² are the sample variances, of the ŷij's and the yij's under the ith treatment, respectively. For the hypotheses in (30), consider the following test statistic,

T = (μ̂GD,1 − μ̂GD,2) / √(V̂1 + V̂2),          (32)

where

V̂i = (ni/S1i² + mi/S2i²)⁻¹ [1 + 4 ω̂i (1 − ω̂i)(1/(ni − 1) + 1/(mi − 1))]

is an estimator of Var(μ̂GD,i), i = 1, 2. Consequently, an approximate 100(1 − α)% confidence interval for μ1 − μ2 is given as

(μ̂GD,1 − μ̂GD,2 − z1−α/2 √(V̂1 + V̂2),  μ̂GD,1 − μ̂GD,2 + z1−α/2 √(V̂1 + V̂2)),          (33)

where z1−α/2 is the (1 − α/2)th quantile of the standard normal distribution. As a result, the null hypothesis H0 is rejected if the above confidence interval does not contain 0. Thus, under the local alternative hypothesis that μ1 − μ2 = δ ≠ 0, the required sample size to achieve a power of 1 − β satisfies

|δ| / √(V1 + V2) = zα/2 + zβ.

Thus, if we let mi = ρ ni and n2 = γ n1, then the total sample size NT required for achieving the desired power for detecting a clinically meaningful difference δ between the two treatments is NT = (1 + ρ)(1 + γ) n1, where n1 is given by

n1 = (zα/2 + zβ)² [(1/σ11² + ρ/σ21²)⁻¹ + γ⁻¹(1/σ12² + ρ/σ22²)⁻¹] / δ²,          (34)

where σ1i² = Var(ŷij) and σ2i² = Var(yij), i = 1, 2.

If one wishes to test for the following superiority hypotheses

H0: μ1 − μ2 ≤ δ0 versus Ha: μ1 − μ2 > δ0,

where δ0 > 0 is the superiority margin. The required sample size for achieving a power of 1 − β satisfies

(δ − δ0) / √(V1 + V2) = zα + zβ.

This gives

n1 = (zα + zβ)² [(1/σ11² + ρ/σ21²)⁻¹ + γ⁻¹(1/σ12² + ρ/σ22²)⁻¹] / (δ − δ0)²,          (35)

where δ > δ0. For the case of testing for equivalence with equivalence margin δ at significance level α, consider the local alternative hypothesis H1: |μ1 − μ2| = δ1 with |δ1| < δ. The required sample size to achieve a power of 1 − β satisfies

(δ − |δ1|) / √(V1 + V2) = zα + zβ/2.

Thus, the total sample size for the two treatment groups is NT = (1 + ρ)(1 + γ) n1, with n1 given by

n1 = (zα + zβ/2)² [(1/σ11² + ρ/σ21²)⁻¹ + γ⁻¹(1/σ12² + ρ/σ22²)⁻¹] / (δ − |δ1|)²,          (36)

where δ1 denotes the true mean difference under the local alternative hypothesis.

Note that formulas for sample size calculation and allocation for testing equality, non-inferiority, superiority, and equivalence for other data types, such as binary responses and time-to-event endpoints, can be obtained similarly.

Analysis for Category III and IV Adaptive Designs

In this section, statistical inference for Category III and IV phase II/III seamless adaptive designs is discussed. For a Category III design, the study objectives at different stages are different (e.g., dose selection versus efficacy confirmation) but the study endpoints are the same. For a Category IV design, both the study objectives and the study endpoints at different stages are different (e.g., dose selection versus efficacy confirmation, with a surrogate endpoint versus a clinical study endpoint).

As indicated earlier, how to control the overall type I error rate at a pre-specified level is one of the major regulatory concerns when adaptive design methods are employed in confirmatory clinical trials. Another concern is how to perform a power analysis for sample size calculation/allocation that achieves the individual study objectives originally set for the two separate studies (different stages). A further concern is how to combine the data collected from both stages for a combined and valid final analysis. Under a Category III or IV phase II/III seamless adaptive design, in addition, the investigator plans an interim analysis at each stage. Thus, if we consider the initiation of the study, the first interim analysis, the end-of-Stage-1 analysis, the second interim analysis, and the final analysis as critical milestones, the two-stage adaptive design becomes a four-stage transitional seamless trial design. In what follows, we focus on the analysis of a four-stage transitional seamless design without adaptations (the non-adaptive version) and with adaptations (the adaptive version), respectively.

Non-adaptive version

For a given clinical trial comparing k treatment groups, T1, ..., Tk, with a control group C, suppose a surrogate (biomarker) endpoint and a well-established clinical endpoint are available for the assessment of the treatment effect. Denote by θi and ψi the treatment effect of Ti compared with C as assessed by the surrogate (biomarker) endpoint and the clinical endpoint, respectively. Under the surrogate and clinical endpoints, the treatment effect can be tested by the following hypotheses:

H0,1: ψ1 = ψ2 = ... = ψk = 0,          (37)

which is for the clinical endpoint, while the hypothesis

H0,2: θ1 = θ2 = ... = θk = 0          (38)

is for the surrogate (biomarker) endpoint. Cheng and Chow [5] assumed that each ψi is a monotone increasing function of the corresponding θi and proposed testing hypotheses (37) and (38) at stages 1, 2a, 2b, and 3, based on the accrued data at the four interim analyses. Their proposed tests are briefly described below. For simplicity, the variances of the surrogate (biomarker) endpoint and the clinical outcome are denoted by σθ² and σψ², which are assumed known.

Stage 1 – At this stage, (k + 1) n1 subjects are randomly assigned to receive one of the k treatments or the control at a 1:1:...:1 ratio; that is, there are n1 subjects in each group. At the first interim analysis, the most effective treatment will be selected based on the surrogate (biomarker) endpoint and will proceed to the subsequent stages. For the pairwise comparisons, consider the surrogate-endpoint test statistics of each treatment versus the control, θ̂1,i, i = 1, ..., k. If max_i θ̂1,i ≤ c1 for some pre-specified critical value c1, then the trial is stopped and we are in favor of the null hypothesis. On the other hand, if θ̂1,S = max_i θ̂1,i > c1, then treatment TS is considered the most promising treatment and proceeds to the subsequent stages. Subjects who receive either the promising treatment or the control will be followed for the clinical endpoint. Treatment assessment for all other subjects will be terminated, but those subjects will undergo necessary safety monitoring.
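The Stage 1 selection rule can be sketched as follows. The Python code below simulates surrogate-endpoint data for k treatments and a control, forms pairwise z-statistics, selects the largest, and compares it with a pre-specified critical value c1. The data, the effect sizes, and the value of c1 are illustrative assumptions and are not taken from Cheng and Chow [5]; NumPy is assumed to be available.

import numpy as np

# Stage 1 (sketch): select the most promising treatment based on the surrogate endpoint.
rng = np.random.default_rng(42)
k, n1, sigma = 4, 50, 1.0
c1 = 1.0                                    # assumed selection threshold (not a significance bound)

control = rng.normal(0.0, sigma, size=n1)
treatments = [rng.normal(effect, sigma, size=n1) for effect in (0.1, 0.2, 0.35, 0.25)]

z = np.array([(t.mean() - control.mean()) / (sigma * np.sqrt(2.0 / n1)) for t in treatments])
best = int(np.argmax(z))
if z[best] <= c1:
    print("stop the trial at Stage 1 (no promising treatment)")
else:
    print(f"carry treatment T{best + 1} (z = {z[best]:.2f}) and the control into Stage 2a")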

Stage 2a – At Stage 2a, n2 additional subjects will be equally randomized to receive either the selected treatment TS or the control C. The second interim analysis is scheduled when the short-term surrogate measures from these n2 Stage 2a subjects and the primary-endpoint measures from the n1 Stage 1 subjects who received either TS or C become available. Let Z1,θ and Z1,ψ be the pairwise test statistics from Stage 1 based on the surrogate endpoint and the primary endpoint, respectively, and let Z2,θ be the corresponding statistic from Stage 2a based on the surrogate endpoint. If the combined surrogate-endpoint statistic based on Z1,θ and Z2,θ does not exceed its critical value c2, then stop the trial and accept the null hypotheses. If the combined surrogate statistic exceeds c2 and Z1,ψ exceeds its critical value, then stop the trial and reject both H0,2 and H0,1. Otherwise, if the combined surrogate statistic exceeds c2 but Z1,ψ does not, then we move on to Stage 2b.

Stage 2b – At Stage 2b, no additional subjects will be recruited. The third interim analysis will be performed when the Stage 2a subjects have completed their primary-endpoint assessment. The clinical-endpoint statistics from Stages 1 and 2a are then combined, where Z2,ψ is the pairwise test statistic for the primary endpoint from Stage 2b. If the combined clinical-endpoint statistic exceeds its critical value c3, then stop the trial and reject H0,1; otherwise, we move on to Stage 3.

Stage 3 – At Stage 3, the final stage, n3 additional subjects will be recruited and followed until their primary-endpoint assessment. At the fourth (final) analysis, the clinical-endpoint statistics from all stages are combined, where Z3,ψ is the pairwise test statistic for the primary endpoint from Stage 3. If the combined statistic exceeds its critical value c4, then stop the trial and reject H0,1; otherwise, accept H0,1. The parameters of the above design, namely the critical values c1, c2, c3, and c4 and the stage sample sizes n1, n2, and n3, are determined such that the procedure controls the overall type I error rate at α and attains a target power of 1 − β.

In the above design, the surrogate data at the first stage are used to select the most promising treatment rather than to formally assess H0,2. This means that, upon completion of Stage 1, a dose does not need to reach statistical significance in order to be carried into the subsequent stages. In practice, it is recommended that the selection criterion be based on a precision analysis (desired precision or maximum error allowed) rather than a power analysis (desired power). This property is attractive to the investigator since the selection does not suffer from a lack of power due to the limited sample size at Stage 1.

As discussed above, under the four-stage transitional seamless design, two sets of hypotheses, namely H0,1 and H0,2, are to be tested. Since the rejection of H0,1 leads to the claim of efficacy, it is the hypothesis of primary interest. However, in the interest of controlling the overall type I error rate at a pre-specified level of significance, H0,1 and H0,2 need to be tested following the principle of the closed testing procedure to avoid any statistical penalty.

In summary, the two-stage phase II/III seamless adaptive design is attractive due to its efficiency, such as potentially reducing the lead time between studies (i.e., a phase II trial and a phase III study), and its flexibility, such as making an early decision and taking appropriate actions (e.g., stopping the trial early or deleting/adding dose groups).

Adaptive version

The design described in the previous section is basically a group sequential procedure with treatment selection at interim; no additional adaptations are involved. Regarding additional adaptations (the adaptive version), Tsiatis and Mehta [18] and Jennison and Turnbull [25] argued that adaptive designs typically suffer from a loss of efficiency and hence are typically not recommended in regular practice. Proschan et al. [26], however, indicated that in some scenarios, particularly when not enough primary-outcome information is available, it is appealing to use an adaptive procedure as long as it is statistically valid and justified. The transitional feature of the multiple-stage design enables us not only to verify whether the surrogate (biomarker) endpoint is predictive of the clinical outcome, but also to modify the design adaptively after the review of the interim data. A possible modification is to adjust the treatment effect of the clinical outcome while validating the relationship between the surrogate (e.g., biomarker) endpoint and the clinical outcome. In practice, it is often assumed that there exists a local linear relationship between ψ and θ, which is a reasonable assumption if we focus only on values in a neighborhood of the most promising treatment TS. Thus, at the end of Stage 2a, we can re-estimate the treatment effect of the primary endpoint using

ψ̂ = (ψ̂1,S / θ̂1,S) θ̂S, where ψ̂1,S and θ̂1,S are the Stage 1 estimates of the clinical-endpoint and surrogate-endpoint effects of TS, and θ̂S is the updated surrogate-endpoint estimate based on the combined Stage 1 and Stage 2a data.

Consequently, the sample size can be re-assessed at Stage 3 based on a modified treatment effect of the primary endpoint, ψ̃ = max(ψ̂, ψmin), where ψmin is a minimally clinically relevant treatment effect. Suppose m is the re-estimated Stage 3 sample size based on ψ̃. Then there is no modification to the procedure if m ≤ n3. On the other hand, if m > n3, then m (instead of n3, as originally planned) subjects per arm will be recruited at Stage 3. The detailed justification of the above adaptation can be found in Cheng and Chow [5].
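The following Python sketch illustrates the Stage 3 re-assessment just described. It assumes that the modified effect is taken as the larger of the re-estimated effect and the minimally clinically relevant effect, and that the per-arm sample size is computed for a one-sided two-sample z-test; all numerical values are illustrative assumptions rather than values from Cheng and Chow [5].

from math import ceil
from statistics import NormalDist

# Stage 3 sample size re-assessment (sketch).
def per_arm_sample_size(delta, sigma, alpha=0.025, beta=0.10):
    """Per-arm sample size for a one-sided two-sample z-test with effect delta and SD sigma."""
    z = NormalDist()
    return ceil(2.0 * ((z.inv_cdf(1 - alpha) + z.inv_cdf(1 - beta)) * sigma / delta) ** 2)

psi_hat = 0.22            # re-estimated clinical effect at the end of Stage 2a (assumed)
psi_min = 0.25            # minimally clinically relevant treatment effect (assumed)
sigma, n3_planned = 1.0, 300

psi_tilde = max(psi_hat, psi_min)                 # modified treatment effect
m = per_arm_sample_size(psi_tilde, sigma)         # re-estimated Stage 3 per-arm sample size
n3_final = max(n3_planned, m)                     # recruit m per arm only if m > n3
print(m, n3_final)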

A case study – hepatitis C infection

A pharmaceutical company is interested in conducting a clinical trial for the evaluation of the safety, tolerability, and efficacy of a test treatment for patients with hepatitis C virus (HCV) infection. For this purpose, a two-stage seamless adaptive design is considered. The proposed trial design combines two independent studies (a phase IIb study for treatment selection and a phase III study for efficacy confirmation) into a single study. Thus, the study consists of two stages: treatment selection (Stage 1) and efficacy confirmation (Stage 2). The study objective at the first stage is treatment selection, while the study objective at Stage 2 is to establish the non-inferiority of the treatment selected at the first stage as compared to the standard of care (SOC). Thus, at this point the design is a Category III design (a two-stage adaptive design with different study objectives but the same study endpoint at different stages).

For genotype 1 HCV patients, the treatment duration is usually 48 weeks of treatment followed by 24 weeks of follow-up. The well-established clinical endpoint is the sustained virologic response (SVR) at week 72, defined as an undetectable HCV RNA level (< 10 IU/mL) at week 72. Thus, it takes a long time to observe a response. The pharmaceutical company is therefore interested in considering a biomarker or a surrogate endpoint, such as a regular clinical endpoint with a shorter duration, to make an early decision for treatment selection among the four active treatments under study at the end of Stage 1. As a result, the clinical endpoint of early virologic response (EVR) at week 12 is considered as a surrogate endpoint for treatment selection at Stage 1. At this point, the trial design has become a typical Category IV adaptive trial design (i.e., a two-stage adaptive design with different study endpoints and different study objectives at different stages). The resultant Category IV adaptive design is briefly outlined below (Figure 1):

Stage 1 – At this stage, the design begins with five arms (four active treatment arms and one control arm). Qualified subjects are randomly assigned to one of the five arms at a 1:1:1:1:1 ratio. After all Stage 1 subjects have completed Week 12 of the study, an interim analysis will be performed based on the EVR at week 12 for treatment selection. Treatment selection will be made under the assumption that the 12-week EVR is predictive of the 72-week SVR. Under this assumption, the most promising treatment arm will be selected using a precision analysis under pre-specified selection criteria; in other words, the treatment arm with the highest confidence level for achieving statistical significance (i.e., that the observed difference from the control is not due to chance alone) will be selected. Stage 1 subjects who have not yet completed the study protocol will continue with their assigned therapies for the remainder of the planned 48 weeks, with final follow-up at Week 72. The selected treatment arm will then proceed to Stage 2.

Stage 2 – At Stage 2, the selected treatment arm from Stage 1 will be tested for non-inferiority against the control (SOC). A separate cohort of subjects will be randomized to receive either the treatment selected at Stage 1 or the control (SOC) at a 1:1 ratio. A second interim analysis will be performed when all Stage 2 subjects have completed Week 12 and 50% of the subjects (Stage 1 and Stage 2 combined) have completed 48 weeks of treatment and 24 weeks of follow-up. The purpose of this interim analysis is two-fold. First, it is to validate the assumption that the EVR at week 12 is predictive of the SVR at week 72. Second, it is to perform a sample size re-estimation to determine whether the trial will achieve the study objective (establishing non-inferiority) with the desired power if the observed treatment effect is preserved until the end of the study.

Statistical tests as described in the previous section will be used to test the non-inferiority hypotheses at the interim analyses and at the end-of-stage analyses. For the two planned interim analyses, the incidence of EVR at week 12 as well as the safety data will be reviewed by an independent data safety monitoring board (DSMB). The commonly used O'Brien-Fleming type of conservative boundaries will be applied for controlling the overall type I error rate at 5% [27]. Adaptations such as stopping the trial early, discontinuing selected treatment arms, and re-estimating the sample size based on pre-specified criteria may be applied as recommended by the DSMB. Stopping rules for the study will be designated by the DSMB, based on their ongoing analyses of the data and as per their charter.

Figure 1: A diagram of the four-stage transitional seamless trial design.

Concluding Remarks

Chow and Chang [2] pointed out that standard statistical methods for a group sequential trial (with one planned interim analysis) are often applied to the planning and data analysis of a two-stage adaptive design, regardless of whether the study objectives and/or the study endpoints are the same at the different stages. As discussed earlier, two-stage seamless adaptive designs can be classified into four categories depending upon the study objectives and endpoints used at the different stages. The direct application of standard statistical methods leads to the concern that the obtained p-value and confidence interval for the assessment of the treatment effect may not be correct or reliable. Most importantly, the sample size required for achieving a desired power obtained under a standard group sequential trial design may not be sufficient for achieving the study objectives under the two-stage seamless adaptive trial design, especially when the study objectives and/or study endpoints at the different stages are different.

As indicated in the 2010 FDA draft guidance on adaptive clinical trial design, adaptive designs are classified as either well-understood designs or less well-understood designs, depending upon the availability of well-established statistical methods for the specific designs [1]. In practice, most adaptive designs (including the two-stage seamless adaptive designs discussed in this article) are considered less well-understood designs. Thus, the major challenge is not only the development of valid statistical methods for these less well-understood designs, but also the development of a set of criteria for choosing an appropriate design among them for the valid and reliable assessment of the test treatment under investigation.

Disclaimer

The views presented in this article have not been formally disseminated by the U.S. Food and Drug Administration and should not be construed to represent any agency determination or policy.

References

Citation: Chow SC, Lin M (2015) Analysis of Two-Stage Adaptive Seamless Trial Design. Pharm Anal Acta 6:341.

Copyright: © 2015 Chow SC, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.