PoS Bayesian Framework for Pivotal Oncology Trials
Source: vignettes/pos_bayes_framework.Rmd
Introduction
Here we describe the Bayesian framework used to estimate the phase III efficacy probability of success (PoS), based on work by Hampson et al. (2022). The framework consists of study level and population level models, which are described in the following sections.
Let $k$ denote the endpoint of interest, here progression-free survival (PFS), and let $\theta_{k3}$ denote the log hazard ratio (HR) of PFS for a phase III study. The subscript 3 denotes the phase of the study (i.e., phase III). In addition, we assume that data on either the same endpoint or a different endpoint are observed in an earlier phase I/II or phase II study that preceded the pivotal trial (subscript 1 or 2, respectively).
General study level model
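The study level model for the observed log HR of PFS in the earlier study (phase I/II or phase II, $j \in \{1, 2\}$), $\hat{\theta}_{kj}$, is assumed to have the following form:
$$\hat{\theta}_{kj} \mid \theta_{kj} \sim N\left(\theta_{kj},\; I_{kj}^{-1}\right),$$
where $\theta_{kj}$ represents the mean treatment effect on endpoint $k$ in the earlier study and $I_{kj}$ is the Fisher information for $\theta_{kj}$. Under the Bayesian framework, the prior for $\theta_{kj}$ is
$$\theta_{kj} \mid \mu_k, \tau_j \sim N\left(\mu_k,\; \tau_j^2\right),$$
where $\mu_k$ is a population level parameter for the treatment effect on endpoint $k$. The parameter $\tau_j$ characterizes the degree of heterogeneity in the treatment effect on endpoint $k$ across different earlier studies and is assumed to follow a half-Normal distribution, $\tau_j \sim \mathrm{HN}(\phi_j)$, with scale parameter $\phi_j$.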
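Similarly, the phase III study level model for the treatment effect on endpoint $k$, $\theta_{k3}$, has the following distribution:
$$\theta_{k3} \mid \mu_k, \tau_3 \sim N\left(\mu_k,\; \tau_3^2\right).$$
The population level parameter $\mu_k$ is shared across the phases of clinical development. The heterogeneity parameter $\tau_3$ is assumed to follow a half-Normal distribution, $\tau_3 \sim \mathrm{HN}(\phi_3)$, with scale parameter $\phi_3$. The choice of the half-Normal scale follows Supplementary Materials E in Hampson et al. (2022).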
Population level model
The population level treatment effect, $\mu_k$, is assumed to come from a mixture prior with a random mixing weight:
$$\mu_k \mid \omega \sim \omega\, N\left(\delta_k,\; \sigma_E^2\right) + (1 - \omega)\, N\left(0,\; \sigma_S^2\right), \qquad \omega \sim \mathrm{Beta}(a, b),$$
with the following components:
$\omega$ is the probability that $\mu_k$ comes from the enthusiastic prior component. $\omega$ is treated as a random variable to incorporate uncertainty and variability in the benchmark probability of success (PoS). The Beta prior allows integration of historical information (e.g., industry phase III success rates or machine-learning-based predictions) while permitting data-driven updating.
$N(\delta_k, \sigma_E^2)$ is the enthusiastic component, i.e., a distribution centered at the target treatment effect $\delta_k$ (i.e., the alternative hypothesis). $\sigma_E$ is set as the solution to $P(\mu_k \geq 0 \mid \text{enthusiastic component}) = \gamma$, which is consistent with the interpretation of this component as the enthusiastic (“alternative”) component.
$N(0, \sigma_S^2)$ is the skeptical component, i.e., a distribution centered at the null hypothesis. $\sigma_S$ is set as the solution to $P(\mu_k \leq \delta_k \mid \text{skeptical component}) = \gamma$, which is consistent with the interpretation of this component as the skeptical (“null”) component.
When $\mu_k$ denotes the log HR of PFS, $\gamma$ is the probability that the population level treatment effect is equal to or worse than the null (i.e., log HR is $\geq 0$) when the benchmarking data indicate an optimistic expectation of the treatment effect, or the probability that the population level treatment effect is equal to or better than the target effect in phase III (i.e., log HR is $\leq \delta_k$) when the benchmarking data indicate a pessimistic expectation of the treatment effect. $\gamma$ should be set to a small number so that the probability of either a lack of treatment effect under the enthusiastic prior or a substantial treatment effect under the skeptical prior is small.
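The calibration of the two mixture components can be sketched in a few lines. The example below is an illustration only: it assumes both components are Normal, that the target effect is a negative log HR, and that the tail probability is the same small value for both components. The function name `mixture_component_sds` and the numerical inputs are hypothetical.

```python
import math
from statistics import NormalDist


def mixture_component_sds(target_log_hr: float, tail_prob: float) -> tuple[float, float]:
    """Return (sd_enthusiastic, sd_skeptical) under a Normal-component assumption.

    Enthusiastic component: Normal(target_log_hr, sd^2) with P(effect >= 0) = tail_prob.
    Skeptical component:    Normal(0, sd^2) with P(effect <= target_log_hr) = tail_prob.
    """
    if target_log_hr >= 0:
        raise ValueError("target_log_hr must be negative (a beneficial log HR)")
    if not 0 < tail_prob < 0.5:
        raise ValueError("tail_prob must lie strictly between 0 and 0.5")
    z = NormalDist().inv_cdf(1 - tail_prob)
    sd_enthusiastic = -target_log_hr / z  # solves 1 - Phi((0 - target) / sd) = tail_prob
    sd_skeptical = -target_log_hr / z     # solves Phi(target / sd) = tail_prob
    return sd_enthusiastic, sd_skeptical


# Hypothetical inputs: target HR of 0.7 and a 5% tail probability.
target = math.log(0.7)
sd_e, sd_s = mixture_component_sds(target, tail_prob=0.05)

# Check the two defining tail probabilities.
tail_enthusiastic = 1 - NormalDist(target, sd_e).cdf(0.0)
tail_skeptical = NormalDist(0.0, sd_s).cdf(target)
```

Under these assumptions the two standard deviations coincide by symmetry, because both components place the same tail mass at the same distance from their centers.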
Phase III efficacy PoS prediction
After fitting the models outlined above, we can generate a phase III efficacy PoS prediction based on the distribution of $\theta_{k3}$.
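Let $A$ denote the number of analyses considered in a group sequential design for a future phase III study. For instance, $A = 2$ means that the study has one interim analysis (IA) and one final analysis (FA). The observed log HRs for endpoint $k$ at the $a$-th analysis, $\hat{\theta}_{k3}^{(a)}$ for $a = 1, \dots, A$, are jointly distributed as follows:
$$\left(\hat{\theta}_{k3}^{(1)}, \dots, \hat{\theta}_{k3}^{(A)}\right)^\top \mid \theta_{k3} \sim \mathrm{MVN}\left(\theta_{k3}\,\mathbf{1}_A,\; \Sigma\right),$$
where $\theta_{k3}$ is the underlying true log hazard ratio, common to all analyses. $\Sigma$ is the covariance matrix that encodes the Fisher information for $\theta_{k3}$ at each analysis:
$$\Sigma_{aa'} = \frac{1}{I_{\max(a, a')}}, \qquad I_a = d_a\, r\, (1 - r),$$
with $d_a$ being the target number of events at the $a$-th analysis and $r$ the planned proportion of patients in the control group.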
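As an illustration of how the covariance matrix can be built from the planned event counts, the sketch below assumes the usual group sequential approximations: the information at each analysis is the number of events times the product of the two allocation proportions, and the covariance between two analyses equals the variance at the later one. The function names and the event counts are hypothetical.

```python
def analysis_information(events, control_fraction):
    """Approximate Fisher information for the log HR at each analysis."""
    return [d * control_fraction * (1 - control_fraction) for d in events]


def log_hr_covariance(events, control_fraction):
    """Covariance matrix of the log HR estimates across analyses.

    Assumes Cov(estimate_a, estimate_b) = 1 / information at the later analysis.
    """
    info = analysis_information(events, control_fraction)
    n = len(info)
    return [[1.0 / info[max(a, b)] for b in range(n)] for a in range(n)]


# Hypothetical design: 200 events at the IA, 300 at the FA, 1:1 allocation.
cov = log_hr_covariance([200, 300], control_fraction=0.5)
```

With 1:1 allocation the information reduces to one quarter of the number of events, so the variance of the log HR estimate at the interim analysis is 4/200.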
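The predicted treatment effect is generated $M$ times based on the Bayesian hierarchical model, and success at the $a$-th analysis is determined by a frequentist efficacy boundary, $c_a$, here taken on the log HR scale so that smaller values indicate success. Writing $\hat{\theta}_{k3}^{(a, m)}$ for the simulated estimate at analysis $a$ in draw $m$, the probability of stopping a phase III trial for efficacy at the first IA is estimated as
$$\hat{p}_1 = \frac{1}{M} \sum_{m=1}^{M} \mathbb{1}\left\{\hat{\theta}_{k3}^{(1, m)} \leq c_1\right\},$$
and the probability of stopping a phase III trial for efficacy at the $a$-th analysis is estimated as
$$\hat{p}_a = \frac{1}{M} \sum_{m=1}^{M} \mathbb{1}\left\{\hat{\theta}_{k3}^{(1, m)} > c_1, \dots, \hat{\theta}_{k3}^{(a-1, m)} > c_{a-1},\; \hat{\theta}_{k3}^{(a, m)} \leq c_a\right\}.$$
Finally, the overall PoS is $\mathrm{PoS} = \sum_{a=1}^{A} \hat{p}_a$.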
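The Monte Carlo step can be sketched as follows. This is a minimal illustration, not the package implementation: it assumes efficacy boundaries expressed as log HR thresholds (success when the estimate is at or below the threshold), strictly increasing event counts, and information approximated by events times the product of the allocation proportions. The draws of the true log HR are a stand-in for draws from the fitted hierarchical model, and `simulate_pos` and all numerical inputs are hypothetical.

```python
import math
import random


def simulate_pos(theta_draws, events, control_fraction, boundaries, seed=1):
    """Estimate per-analysis efficacy stopping probabilities and the overall PoS."""
    rng = random.Random(seed)
    info = [d * control_fraction * (1 - control_fraction) for d in events]
    stops = [0] * len(events)
    for theta in theta_draws:
        score = 0.0      # cumulative score statistic with independent increments
        prev_info = 0.0
        for a, (info_a, bound_a) in enumerate(zip(info, boundaries)):
            increment = info_a - prev_info  # must be positive
            score += rng.gauss(theta * increment, math.sqrt(increment))
            prev_info = info_a
            estimate = score / info_a       # Normal(theta, 1 / info_a)
            if estimate <= bound_a:
                stops[a] += 1               # stop for efficacy at the first crossing
                break
    n = len(theta_draws)
    stop_probs = [s / n for s in stops]
    return stop_probs, sum(stop_probs)


# Stand-in for model-based draws of the true phase III log HR (hypothetical values).
draw_rng = random.Random(2024)
theta_draws = [draw_rng.gauss(math.log(0.7), 0.15) for _ in range(5000)]
stop_probs, pos = simulate_pos(theta_draws, [200, 300], 0.5, [-0.35, -0.20])
```

Building the estimates from independent score increments reproduces the covariance structure in which the covariance between two analyses equals the variance at the later analysis, without requiring a matrix factorization.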
Study level model when phase III primary endpoint is not available from earlier study(ies)
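When an earlier study does not provide a reliable PFS estimate and only ORR is available from a randomized controlled phase II study, ORR can be used for PoS estimation instead. Let $\theta_{\mathrm{ORR},2}$ represent the log odds ratio (OR) for the treatment effect on ORR. The observed treatment effect, $\hat{\theta}_{\mathrm{ORR},2}$, has the following distribution:
$$\hat{\theta}_{\mathrm{ORR},2} \mid \theta_{\mathrm{ORR},2} \sim N\left(\theta_{\mathrm{ORR},2},\; I_{\mathrm{ORR},2}^{-1}\right),$$
where $I_{\mathrm{ORR},2}$ is the Fisher information associated with $\theta_{\mathrm{ORR},2}$.
Further, let $\theta_{\mathrm{PFS},2}$ be the treatment effect for PFS in phase II. Motivated by the results in Blumenthal et al. (2015), we assume the following linear model between the PFS and ORR treatment effects:
$$\theta_{\mathrm{PFS},2} \mid \theta_{\mathrm{ORR},2} \sim N\left(\beta_0 + \beta_1\, \theta_{\mathrm{ORR},2},\; \frac{\sigma^2}{n}\right),$$
where $n$ is the number of patients in a given trial and the regression parameters are assigned the following priors:
$$\beta_0 \sim N\left(b_0,\; s_0^2\right), \qquad \beta_1 \sim N\left(b_1,\; s_1^2\right).$$
The values of $(b_0, b_1)$, $(s_0, s_1)$ and $\sigma$ are determined from historical data provided in the meta-analysis of Blumenthal et al. (2015). Specifically, $(b_0, b_1)$ are the point estimates of the intercept and slope from a weighted least squares (WLS) simple linear regression of log(HR PFS) on log(OR ORR), $(s_0, s_1)$ are their respective standard errors, and $\sigma$ is estimated from the WLS regression residual variance.
Based on the approximated correlation between $\theta_{\mathrm{ORR},2}$ and $\theta_{\mathrm{PFS},2}$, a distribution for the phase II PFS treatment effect, $\theta_{\mathrm{PFS},2}$, can be obtained and, therefore, a predicted efficacy PoS can be estimated using the models outlined above.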
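To make the ORR-to-PFS step concrete, the sketch below propagates a given ORR log OR through a linear model with Normal priors on the intercept and slope. It is an illustration under stated assumptions: the residual standard deviation is taken to scale as one over the square root of the trial size, the ORR effect is treated as fixed rather than uncertain, and every numerical value is a hypothetical placeholder rather than an estimate from Blumenthal et al. (2015).

```python
import math
import random


def draw_pfs_log_hr(orr_log_or, n_patients, b0, b1, s0, s1, sigma,
                    n_draws=4000, seed=1):
    """Draw PFS log HRs implied by an ORR log OR under a linear surrogate model."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        beta0 = rng.gauss(b0, s0)            # intercept prior draw
        beta1 = rng.gauss(b1, s1)            # slope prior draw
        mean = beta0 + beta1 * orr_log_or    # regression mean on the log HR scale
        draws.append(rng.gauss(mean, sigma / math.sqrt(n_patients)))
    return draws


# Hypothetical placeholder values, for illustration only.
pfs_draws = draw_pfs_log_hr(orr_log_or=0.8, n_patients=100,
                            b0=-0.1, b1=-0.4, s0=0.05, s1=0.1, sigma=1.0)
mean_pfs_log_hr = sum(pfs_draws) / len(pfs_draws)
```

In a full analysis the ORR effect would itself carry uncertainty from the phase II data, so the fixed input here understates the spread of the resulting PFS distribution.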
Indication-specific surrogate-primary endpoint relationships (e.g., ORR–PFS)
When the early endpoint objective response rate (ORR) is used to predict phase III progression-free survival (PFS), the strength and direction of the association may vary substantially across cancer types. To account for this heterogeneity, we group cancer indications into five categories, each associated with a distinct set of ORR–PFS regression parameters derived from prior Bayesian hierarchical modeling.
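Trial indexed by: $i = 1, \dots, N$. Indication indexed by: $g = 1, \dots, G$, with $g[i]$ denoting the indication of trial $i$.
$\theta_{k,i}$: treatment effect on endpoint $k$ at trial $i$. $s_{k,i}$: standard error of treatment effect on endpoint $k$ at trial $i$.
Level 1: Observed trial-level model
Each trial reports an observed log-odds ratio for ORR, $\hat{\theta}_{\mathrm{ORR},i}$, and an observed log-hazard ratio for PFS, $\hat{\theta}_{\mathrm{PFS},i}$, modeled as:
$$\hat{\theta}_{\mathrm{ORR},i} \sim N\left(\theta_{\mathrm{ORR},i},\; s_{\mathrm{ORR},i}^2\right), \qquad \hat{\theta}_{\mathrm{PFS},i} \sim N\left(\theta_{\mathrm{PFS},i},\; s_{\mathrm{PFS},i}^2\right).$$
Conditional on the (latent) true PFS effect, the true ORR effect follows an indication-specific regression:
$$\theta_{\mathrm{ORR},i} \mid \theta_{\mathrm{PFS},i} \sim N\left(\alpha_{g[i]} + \beta_{g[i]}\, \theta_{\mathrm{PFS},i},\; \sigma^2\right).$$
Level 2: Indication-specific regression parameters
Information is shared across indications via hierarchical modeling. Each pair $(\alpha_g, \beta_g)$ is learned adaptively across indications:
$$(\alpha_g, \beta_g)^\top \sim N_2\left(\boldsymbol{\mu}_{\alpha\beta},\; \boldsymbol{\Sigma}_{\alpha\beta}\right), \qquad g = 1, \dots, G.$$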
Indication Groups
Group 1: Hematologic malignancies
Includes classical Hodgkin lymphoma (cHL), diffuse large B-cell lymphoma (DLBCL), follicular lymphoma (FL), multiple myeloma (MM), non-Hodgkin lymphoma (NHL), and peripheral T-cell lymphoma (PTCL).
Group 2: Gynecologic cancers
Includes cervical, endometrial, and ovarian cancers.
Group 3: Thoracic malignancies
Includes non-small cell lung cancer (NSCLC), small cell lung cancer (SCLC), and mesothelioma.
Group 4: Urologic and gastrointestinal solid tumors
Includes bladder cancer, gastric cancer, and renal cell carcinoma (RCC).
Group 5: Breast cancer
Includes breast cancer.
If no indication is specified, the average ORR–PFS relationship across all indication groups will be used.
Prior ORR data
The method for computing the observed log-odds ratio and its standard error differs depending on whether the earlier study was a two-arm or single-arm trial.
Two-arm setting
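In the two-arm setting, the observed log-odds ratio $\hat{\theta}_{\mathrm{ORR}}$ and its standard error are estimated directly from the observed response counts in each arm using frequentist estimation:
$$\hat{\theta}_{\mathrm{ORR}} = \log\left(\frac{x_{\mathrm{TRT}} / (n_{\mathrm{TRT}} - x_{\mathrm{TRT}})}{x_{\mathrm{SOC}} / (n_{\mathrm{SOC}} - x_{\mathrm{SOC}})}\right), \qquad \mathrm{SE}\left(\hat{\theta}_{\mathrm{ORR}}\right) = \sqrt{\frac{1}{x_{\mathrm{TRT}}} + \frac{1}{n_{\mathrm{TRT}} - x_{\mathrm{TRT}}} + \frac{1}{x_{\mathrm{SOC}}} + \frac{1}{n_{\mathrm{SOC}} - x_{\mathrm{SOC}}}},$$
where $x_{\mathrm{SOC}}$, $x_{\mathrm{TRT}}$ are the number of responders and $n_{\mathrm{SOC}}$, $n_{\mathrm{TRT}}$ are the total number of patients in the SOC and treatment groups, respectively.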
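The two-arm calculation can be illustrated with the standard large-sample estimator for a log odds ratio. The sketch below assumes no continuity correction and rejects tables with an empty cell; whether the package applies a correction in that case is not stated here. The function name and the counts are hypothetical.

```python
import math


def log_odds_ratio(resp_trt, n_trt, resp_soc, n_soc):
    """Return the log odds ratio (treatment vs SOC) and its standard error."""
    non_trt = n_trt - resp_trt
    non_soc = n_soc - resp_soc
    if min(resp_trt, non_trt, resp_soc, non_soc) <= 0:
        raise ValueError("all four cells must be positive; no continuity correction is applied")
    log_or = math.log((resp_trt / non_trt) / (resp_soc / non_soc))
    se = math.sqrt(1 / resp_trt + 1 / non_trt + 1 / resp_soc + 1 / non_soc)
    return log_or, se


# Hypothetical counts: 30/60 responders on treatment, 20/60 on SOC.
log_or, se = log_odds_ratio(30, 60, 20, 60)
```

For these counts the odds are 1 on treatment and 0.5 on SOC, giving an odds ratio of 2.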
Single-arm setting
When only single-arm data are available (i.e., no concurrent control arm), the observed ORR treatment effect cannot be computed directly from two-arm counts. Instead, following Weber et al. (2021), uncertainty in the control ORR is incorporated by placing a distribution over the Standard of Care (SOC) response rate, $p_{\mathrm{SOC}}$, using user-specified lower and upper bounds.
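Let low_soc_rr and upp_soc_rr denote the lower and upper bounds for the control ORR. The corresponding logit-scale bounds are:
$$\ell = \log\left(\frac{\text{low\_soc\_rr}}{1 - \text{low\_soc\_rr}}\right), \qquad u = \log\left(\frac{\text{upp\_soc\_rr}}{1 - \text{upp\_soc\_rr}}\right).$$
Rather than fixing $p_{\mathrm{SOC}}$ to a point estimate, a normal distribution is placed on the logit scale:
$$\mathrm{logit}\left(p_{\mathrm{SOC}}\right) \sim N\left(m,\; s^2\right),$$
where the mean and standard deviation are derived from the specified bounds:
$$m = \frac{\ell + u}{2}, \qquad s = \frac{u - \ell}{2\, \Phi^{-1}\left(\frac{1 + \text{ci\_rr}}{2}\right)}.$$
$\Phi^{-1}$ denotes the quantile function of the standard normal distribution.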
ci_rr is the user-specified confidence level for the SOC response bounds (default = 80%). Lower values may be used for more conservative assumptions or when the SOC data are uncertain. In the single-arm setting, the control-arm quantities entering the two-arm expressions above are drawn from the distribution over $p_{\mathrm{SOC}}$ rather than observed directly.
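The single-arm construction can be sketched as below. The example assumes the bounds are read as a central interval at level ci_rr of a Normal distribution on the logit scale, so the mean is the midpoint of the logit bounds and the standard deviation is the half-width divided by the matching standard normal quantile. The argument names mirror `low_soc_rr`, `upp_soc_rr` and `ci_rr` from the text, while the function names and the bounds used are hypothetical.

```python
import math
import random
from statistics import NormalDist


def logit(p):
    return math.log(p / (1 - p))


def soc_logit_prior(low_soc_rr, upp_soc_rr, ci_rr=0.80):
    """Mean and SD of a Normal on logit(SOC response rate) matching the bounds."""
    lower, upper = logit(low_soc_rr), logit(upp_soc_rr)
    z = NormalDist().inv_cdf((1 + ci_rr) / 2)  # quantile for a central ci_rr interval
    return (lower + upper) / 2, (upper - lower) / (2 * z)


def draw_soc_rates(mean, sd, n_draws, seed=1):
    """Draw SOC response rates by sampling on the logit scale and back-transforming."""
    rng = random.Random(seed)
    return [1 / (1 + math.exp(-rng.gauss(mean, sd))) for _ in range(n_draws)]


# Hypothetical bounds: SOC ORR between 20% and 40% with 80% confidence.
mean, sd = soc_logit_prior(0.20, 0.40, ci_rr=0.80)
rates = draw_soc_rates(mean, sd, n_draws=2000)
```

Under this reading, lowering ci_rr for the same bounds shrinks the quantile in the denominator and therefore widens the distribution, which matches the description of lower values as more conservative.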