
Introduction

Here we describe the Bayesian framework used to estimate the phase III efficacy probability of success (PoS), based on the work of Hampson et al. (2022). The framework consists of study-level and population-level models, which are described in the following sections.

Let $P$ denote progression-free survival (PFS), and assume that the log hazard ratio (HR) of PFS in a phase III study is $\theta_{P3}$. The subscript 3 denotes the phase of the study (i.e., phase III). In addition, we assume that data on either the same endpoint $P$ or a different endpoint $D$ were observed in an earlier phase I/II or phase II study that preceded the pivotal trial, with true treatment effects $\theta_{P2}$ or $\theta_{D2}$, respectively.

General study level model

The study level model for the observed log HR of PFS in the earlier (phase I/II or phase II) study, $\hat{\theta}_{P2}$, is assumed to have the following form:

$$\hat{\theta}_{P2} \sim \text{Normal}(\theta_{P2}, \mathcal{I}^{-1}_{P2}),$$

where $\theta_{P2}$ is the mean treatment effect on $P$ in the earlier study and $\mathcal{I}_{P2}$ is the Fisher information for $\theta_{P2}$. In the Bayesian framework, the prior for $\theta_{P2}$ is $\theta_{P2} \sim \text{Normal}(\mu_P, \tau_{P2}^2)$, where $\mu_P$ is a population-level parameter for the treatment effect on $P$. The parameter $\tau_{P2}$ characterizes the degree of heterogeneity in the treatment effect on $P$ across different earlier studies and is assumed to follow a half-normal distribution, $\tau_{P2} \sim HN(z_2^2)$.

Similarly, the phase III study level model for the treatment effect on endpoint $P$, $\theta_{P3}$, is:

$$\theta_{P3} \sim \text{Normal}(\mu_P, \tau_{P3}^2)$$

The population-level parameter $\mu_P$ is shared across the phases of clinical development. $\tau_{P3}$ is assumed to follow a half-normal distribution, $\tau_{P3} \sim HN(z_3^2)$. The choice of the priors for $\tau_{P2}$ and $\tau_{P3}$ (i.e., of $z_2$ and $z_3$) follows Supplementary Materials E in Hampson et al. (2022).
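To illustrate how the study-level layer generates data, here is a minimal prior-predictive sketch for a fixed value of $\mu_P$. It assumes that $HN(z^2)$ denotes the distribution of $|X|$ with $X \sim \text{Normal}(0, z^2)$; the function names and numeric inputs are hypothetical. In the actual framework, $\mu_P$, $\tau_{P2}$, and $\tau_{P3}$ are estimated jointly rather than fixed.

```python
import math
import random


def half_normal(scale, rng):
    """Draw from HN(scale^2), assumed here to mean |X| with X ~ Normal(0, scale^2)."""
    return abs(rng.gauss(0.0, scale))


def draw_study_level(mu_p, z2, z3, info_p2, rng):
    """One prior-predictive draw of the study-level model, conditional on mu_P.

    info_p2 is the Fisher information I_P2 of the earlier-study estimate.
    Returns (theta_hat_P2, theta_P2, theta_P3).
    """
    tau_p2 = half_normal(z2, rng)                 # between-study SD, earlier phase
    tau_p3 = half_normal(z3, rng)                 # between-study SD, phase III
    theta_p2 = rng.gauss(mu_p, tau_p2)            # true earlier-study effect
    theta_p3 = rng.gauss(mu_p, tau_p3)            # true phase III effect
    theta_hat_p2 = rng.gauss(theta_p2, 1.0 / math.sqrt(info_p2))  # observed log HR
    return theta_hat_p2, theta_p2, theta_p3


# Hypothetical values: mu_P = log(0.75), z2 = z3 = 0.1, I_P2 = 30.
rng = random.Random(1)
draws = [draw_study_level(math.log(0.75), 0.1, 0.1, 30.0, rng) for _ in range(20000)]
```

Both $\theta_{P2}$ and $\theta_{P3}$ are centered on the shared $\mu_P$, which is how the earlier study informs the phase III prediction.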

Population level model

The population-level treatment effect, $\mu_P$, is assumed to come from a mixture prior with a random mixing weight:

$$
\begin{aligned}
\mu_P &\sim \omega\,\text{Normal}(\delta_P, \sigma_{P1}^2) + (1-\omega)\,\text{Normal}(0, \sigma_{P2}^2),\\
\omega &\sim \text{Beta}(\alpha, \beta),
\end{aligned}
$$

with the following components:

  • $\omega$ is the probability that $\mu_P$ comes from the enthusiastic prior component. $\omega$ is treated as a random variable to incorporate uncertainty and variability in the benchmark probability of success (PoS). The Beta prior allows integration of historical information (e.g., industry Phase III success rates or machine-learning-based predictions) while permitting data-driven updating.

  • $\text{Normal}(\delta_P, \sigma_{P1}^2)$ is the enthusiastic component, i.e., a distribution centered at the target treatment effect $\delta_P$ (the alternative hypothesis). $\sigma_{P1}^2$ is set as the solution to $P(\mu_P \ge 0 \mid \omega = 1) = \gamma$, consistent with the interpretation of the enthusiastic (“alternative”) component; equivalently, $\sigma_{P1} = \frac{\delta_P}{\Phi^{-1}(\gamma)}$.

  • $\text{Normal}(0, \sigma_{P2}^2)$ is the skeptical component, i.e., a distribution centered at the null hypothesis. $\sigma_{P2}^2$ is set as the solution to $P(\mu_P \le \delta_P \mid \omega = 0) = \gamma$, consistent with the interpretation of the skeptical (“null”) component; equivalently, $\sigma_{P2} = \frac{\delta_P}{\Phi^{-1}(\gamma)}$.

When $P$ denotes the log HR of PFS, $\gamma$ has two interpretations. Under the enthusiastic component, which reflects benchmarking data indicating an optimistic expectation of the treatment effect, $\gamma$ is the probability that the population-level treatment effect is equal to or worse than the null (i.e., log HR $\ge 0$). Under the skeptical component, which reflects benchmarking data indicating a pessimistic expectation, $\gamma$ is the probability that the population-level treatment effect is equal to or better than the target effect in phase III (i.e., log HR $\le \delta_P$). $\gamma$ should be set to a small number so that the probability of either a lack of treatment effect under the enthusiastic component or a substantial treatment effect under the skeptical component is small.
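The two conditions defining $\sigma_{P1}$ and $\sigma_{P2}$ both reduce to $\Phi(\delta_P/\sigma) = \gamma$, so the two components share the same standard deviation. The sketch below computes it with Python's `statistics.NormalDist` and draws $\mu_P$ from the mixture prior by first drawing $\omega$ from its Beta prior. The function names and the example values ($\delta_P = \log 0.7$, $\gamma = 0.05$) are illustrative assumptions, not package defaults.

```python
import math
import random
from statistics import NormalDist


def component_sd(delta_p, gamma):
    """sigma = delta_P / Phi^{-1}(gamma), shared by both mixture components.

    Assumes delta_P < 0 (a beneficial log HR) and 0 < gamma < 0.5, so sigma > 0.
    """
    if not (delta_p < 0 and 0 < gamma < 0.5):
        raise ValueError("expected delta_p < 0 and 0 < gamma < 0.5")
    return delta_p / NormalDist().inv_cdf(gamma)


def draw_mu_p(delta_p, gamma, alpha, beta, rng):
    """One draw of mu_P from the mixture prior with omega ~ Beta(alpha, beta)."""
    sigma = component_sd(delta_p, gamma)
    omega = rng.betavariate(alpha, beta)
    if rng.random() < omega:
        return rng.gauss(delta_p, sigma)   # enthusiastic component
    return rng.gauss(0.0, sigma)           # skeptical component


delta_p, gamma = math.log(0.7), 0.05
sigma = component_sd(delta_p, gamma)       # about 0.217
# P(mu_P >= 0 | omega = 1) and P(mu_P <= delta_P | omega = 0) should both equal gamma:
print(1 - NormalDist(delta_p, sigma).cdf(0.0), NormalDist(0.0, sigma).cdf(delta_p))
```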

Phase III efficacy PoS prediction

After fitting the models outlined above, we can generate a phase III efficacy PoS prediction based on the distribution of $\theta_{P3}$.

Let $J$ denote the number of analyses in a group sequential design for a future phase III study. For instance, $J = 2$ means that the study has one interim analysis (IA) and one final analysis (FA). The observed log HRs for endpoint $P$ at the $j$-th analysis, $\hat{\theta}_{P3j}$, $j = 1, \dots, J$, have the following joint distribution:

$$\hat{\boldsymbol{\theta}}_{P3} \sim \text{Normal}(\theta_{P3}\mathbf{1}_J, \boldsymbol{\Sigma}_{J \times J}),$$

where $\theta_{P3}$ is the underlying true log hazard ratio, common to all $J$ analyses, and $\boldsymbol{\Sigma}$ is the covariance matrix that encodes the Fisher information for $\hat{\theta}_{P3j}$:

$$\boldsymbol{\Sigma}_{ij} = \frac{\sigma_{unit}^2}{n_j} \quad \text{for all } i \le j \text{ (and } \boldsymbol{\Sigma}_{ji} = \boldsymbol{\Sigma}_{ij}\text{)},$$

with $n_j$ the target number of events at the $j$-th analysis and $\sigma_{unit}^2 = \frac{1}{p_0(1-p_0)}$, where $p_0$ is the planned proportion of patients in the control group.
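A direct transcription of this covariance structure, as a minimal Python sketch (the function names are hypothetical). Filling both triangles with $\sigma_{unit}^2 / n_{\max(i,j)}$ gives the symmetric matrix; the implied correlation between analyses $i < j$ is $\sqrt{n_i / n_j}$, the square root of the information fraction.

```python
def sigma_unit_sq(p0):
    """sigma_unit^2 = 1 / (p0 * (1 - p0)); p0 is the planned control-arm proportion."""
    if not 0 < p0 < 1:
        raise ValueError("p0 must lie in (0, 1)")
    return 1.0 / (p0 * (1.0 - p0))


def gsd_covariance(events, p0):
    """J x J covariance of the observed log HRs: Sigma_ij = sigma_unit^2 / n_max(i,j).

    events: target event counts n_1 < n_2 < ... < n_J.
    """
    if any(a >= b for a, b in zip(events, events[1:])):
        raise ValueError("event targets must be strictly increasing")
    s2 = sigma_unit_sq(p0)
    J = len(events)
    return [[s2 / events[max(i, j)] for j in range(J)] for i in range(J)]


# 1:1 randomization (sigma_unit^2 = 4), IA at 200 events and FA at 400 events:
print(gsd_covariance([200, 400], 0.5))   # [[0.02, 0.01], [0.01, 0.01]]
```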

The predicted treatment effect $\hat{\boldsymbol{\theta}}_{P3}^{(l)}$ is generated $L$ times ($l = 1, \dots, L$) from the Bayesian hierarchical model, and success at the $j$-th analysis is determined by a frequentist efficacy boundary $z_{P3j}$ on the log HR scale. Thus, the probability of stopping a phase III trial for efficacy at the first IA is estimated as

$$\hat{PoS}_{31} = \frac{1}{L}\sum_{l=1}^{L} I\big(\hat{\theta}_{P31}^{(l)} < z_{P31}\big),$$

and the probability of stopping a phase III trial for efficacy at the $j$-th analysis is estimated as:

$$\hat{PoS}_{3j} = \frac{1}{L}\sum_{l=1}^{L} I\big(\hat{\theta}_{P3j}^{(l)} < z_{P3j},\ \hat{\theta}_{P3i}^{(l)} \ge z_{P3i},\ i = 1, \dots, j-1\big).$$

Finally, the overall PoS is $\hat{PoS}_3 = \sum_{j=1}^{J} \hat{PoS}_{3j}$.
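The Monte Carlo estimators above can be sketched in Python as follows. To stay within the standard library, this sketch generates the correlated estimates through independent increments: with $W$ a Brownian motion, $\hat{\theta}_{P3j} = \theta_{P3} + \sigma_{unit} W(n_j)/n_j$ has variance $\sigma_{unit}^2/n_j$ and covariance $\sigma_{unit}^2/n_{\max(i,j)}$, matching $\boldsymbol{\Sigma}$. In practice the draws of $\theta_{P3}$ come from the fitted hierarchical model; the normal draws, event counts, and critical values in the usage lines are hypothetical.

```python
import math
import random


def simulate_pos(theta_draws, events, p0, bounds, rng):
    """Monte Carlo phase III PoS for a group sequential design.

    theta_draws: draws of the true log HR theta_P3 (one simulated trial per draw).
    events:      target event counts n_1 < ... < n_J.
    bounds:      efficacy boundaries z_P3j on the log HR scale (success if below).
    Returns ([PoS_31, ..., PoS_3J], overall PoS).
    """
    s_unit = math.sqrt(1.0 / (p0 * (1.0 - p0)))
    stops = [0] * len(events)
    n_sims = 0
    for theta in theta_draws:
        n_sims += 1
        w, prev_n = 0.0, 0
        for j, n in enumerate(events):
            w += rng.gauss(0.0, math.sqrt(n - prev_n))  # independent increment of W
            prev_n = n
            theta_hat = theta + s_unit * w / n            # observed log HR at analysis j
            if theta_hat < bounds[j]:                     # first boundary crossing
                stops[j] += 1
                break                                     # trial stops for efficacy
    per_analysis = [c / n_sims for c in stops]
    return per_analysis, sum(per_analysis)


# Hypothetical inputs: posterior draws of theta_P3, 1:1 randomization (sigma_unit = 2),
# IA/FA at 200/400 events, one-sided critical values 2.96 / 1.97 mapped to the log HR scale.
rng = random.Random(2024)
draws = [rng.gauss(-0.25, 0.10) for _ in range(20000)]
events = [200, 400]
bounds = [-z * 2.0 / math.sqrt(n) for z, n in zip([2.96, 1.97], events)]
per_analysis, overall = simulate_pos(draws, events, 0.5, bounds, rng)
```

The `break` encodes the condition that a trial can only succeed at analysis $j$ if it did not cross an earlier boundary, so the per-analysis estimates sum to the overall PoS.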

Study level model when the phase III primary endpoint is not available from earlier study(ies)

When an earlier study did not produce a reliable PFS estimate and only objective response rate (ORR) data are available from a randomized controlled phase II study, ORR can be used for PoS estimation instead. Let $\theta_{ORR,2}$ represent the log odds ratio (OR) of the treatment effect on ORR. The observed treatment effect, $\hat{\theta}_{ORR,2}$, has the following distribution:

$$\hat{\theta}_{ORR,2} \sim \text{Normal}(\theta_{ORR,2}, \mathcal{I}_{ORR,2}^{-1}),$$

where $\mathcal{I}_{ORR,2}$ is the Fisher information associated with $\hat{\theta}_{ORR,2}$. Further, let $\theta_{PFS,2}$ be the treatment effect on PFS in phase II. Motivated by the results in Blumenthal et al. (2015), we assume the following linear model between the PFS and ORR treatment effects:

$$\theta_{ORR,2} \sim \text{Normal}\left(\beta_0 + \beta_1 \theta_{PFS,2}, \frac{\sigma_{WLS}^2}{N_{patients}}\right),$$

where $N_{patients}$ is the number of patients in a given trial and the regression parameters are assigned the following priors:

$$\beta_0 \sim \text{Normal}(m_0, \nu_0^2), \qquad \beta_1 \sim \text{Normal}(m_1, \nu_1^2).$$

The values of $m_0$, $m_1$, $\nu_0$, $\nu_1$, and $\sigma_{WLS}^2$ are determined from historical data provided in the meta-analysis of Blumenthal et al. (2015). Specifically, $(m_0, m_1)$ are the point estimates of the intercept and slope from a weighted least squares (WLS) linear regression of log(OR ORR) on log(HR PFS), $(\nu_0, \nu_1)$ are their respective standard errors, and $\sigma_{WLS}^2$ is estimated from the WLS residual variance.

Based on this assumed linear relationship between $\theta_{ORR,2}$ and $\theta_{PFS,2}$, a distribution for $\theta_{PFS,2}$ can be obtained and, in turn, a predicted efficacy PoS can be estimated using the models outlined above.
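To make this step concrete: if $\beta_0$, $\beta_1$, $\sigma_{WLS}$, $\mu_P$, and $\tau_{P2}$ are held fixed, integrating out $\theta_{ORR,2}$ gives $\hat{\theta}_{ORR,2} \mid \theta_{PFS,2} \sim \text{Normal}(\beta_0 + \beta_1\theta_{PFS,2},\ \mathcal{I}_{ORR,2}^{-1} + \sigma_{WLS}^2/N_{patients})$, and combining this with the study-level prior $\theta_{PFS,2} \sim \text{Normal}(\mu_P, \tau_{P2}^2)$ yields a normal conditional posterior. The Python sketch below computes it. Holding these parameters fixed is a simplification for illustration only (the full model places priors on them and is fitted jointly), and the function name is hypothetical.

```python
import math


def pfs_posterior_given_orr(theta_hat_orr, se_orr, beta0, beta1, sigma_wls,
                            n_patients, mu_p, tau_p2):
    """Conditional normal posterior of theta_PFS,2 given an observed ORR log OR.

    se_orr is I_ORR,2^{-1/2}. All regression and hierarchy parameters are held fixed.
    Returns (posterior mean, posterior SD).
    """
    v = se_orr ** 2 + sigma_wls ** 2 / n_patients       # marginal variance of theta_hat_orr
    precision = 1.0 / tau_p2 ** 2 + beta1 ** 2 / v      # prior precision + data precision
    mean = (mu_p / tau_p2 ** 2 + beta1 * (theta_hat_orr - beta0) / v) / precision
    return mean, 1.0 / math.sqrt(precision)


# Hypothetical inputs for illustration only:
print(pfs_posterior_given_orr(-0.9, 0.35, -0.1, 2.0, 1.5, 120, math.log(0.8), 0.2))
```

With a diffuse prior and $\sigma_{WLS} = 0$, the posterior mean reduces to $(\hat{\theta}_{ORR,2} - \beta_0)/\beta_1$, i.e., inverting the regression line.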

Indication-specific surrogate-primary endpoint relationships (e.g., ORR → PFS)

When the early endpoint objective response rate (ORR) is used to predict phase III progression-free survival (PFS), the strength and direction of the association may vary substantially across cancer types. To account for this heterogeneity, we group cancer indications into five categories, each associated with a distinct set of ORR–PFS regression parameters derived from prior Bayesian hierarchical modeling.

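Trials are indexed by $j = 1, \dots, J$ and indications by $k = z_j \in \{1, \dots, K\}$, where $z_j$ is the indication group of trial $j$. (In this section, $j$ and $J$ index trials rather than analyses.)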

$\theta^P_j$: treatment effect on endpoint $P$ in trial $j$.

$\sigma^P_j$: standard error of the estimated treatment effect on endpoint $P$ in trial $j$.

Level 1: Observed trial-level model

Each trial reports an observed log odds ratio for ORR, $\hat\theta^{\text{ORR}}_j$, and an observed log hazard ratio for PFS, $\hat\theta^{\text{PFS}}_j$, modeled as:

$$\hat\theta^{\text{ORR}}_j \sim \mathcal{N}\!\left(\theta^{\text{ORR}}_j,\ \left(\sigma^{\text{ORR}}_j\right)^2\right)$$

$$\hat\theta^{\text{PFS}}_j \sim \mathcal{N}\!\left(\theta^{\text{PFS}}_j,\ \left(\sigma^{\text{PFS}}_j\right)^2\right)$$

Conditional on the (latent) true PFS effect, the true ORR effect follows an indication-specific regression:

$$\theta^{\text{ORR}}_j \sim \mathcal{N}\!\left(\alpha_{z_j} + \beta_{z_j}\,\theta^{\text{PFS}}_j,\ \frac{\sigma_{\text{WLS}}^{2}}{N_{\text{patients},j}}\right)$$

Level 2: Indication-specific regression parameters

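Information is shared across indications through hierarchical modeling: the indication-specific intercepts and slopes $(\alpha_k, \beta_k)$ are drawn from common distributions, so each pair is partially pooled toward the across-indication average: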

$$\alpha_k \sim \mathcal{N}(a_0, s_0^2), \qquad \beta_k \sim \mathcal{N}(b_0, s_1^2), \qquad k = 1, \dots, 5$$

Level 3: Population-level hyperpriors

$$a_0 \sim \mathcal{N}(0, 5^2), \quad b_0 \sim \mathcal{N}(2, 5^2), \quad s_0 \sim \mathcal{N}(0, 5^2), \quad s_1 \sim \mathcal{N}(0, 5^2), \quad \sigma_{\text{WLS}} \sim \mathcal{N}(0, 5^2)$$

Because $s_0$, $s_1$, and $\sigma_{\text{WLS}}$ are standard deviations and must be positive, their $\mathcal{N}(0, 5^2)$ priors act as half-normal priors.
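The three levels can be read as a generative process. The following Python sketch simulates one prior-predictive draw of the observed ORR and PFS effects for a set of trials. It treats the true PFS effects as given inputs (in the model they are latent parameters), draws the scale parameters from $|\mathcal{N}(0, 5^2)|$, and uses hypothetical function and field names.

```python
import random


def prior_predictive(trials, n_groups=5, seed=0):
    """Simulate observed (theta_hat_ORR, theta_hat_PFS) for each trial.

    trials: list of dicts with hypothetical keys 'group' (z_j, 1-based),
            'theta_pfs' (true log HR), 'n' (patients), 'se_orr', 'se_pfs'.
    """
    rng = random.Random(seed)
    # Level 3: population-level hyperparameters (scales made positive via abs).
    a0 = rng.gauss(0.0, 5.0)
    b0 = rng.gauss(2.0, 5.0)
    s0, s1, sigma_wls = (abs(rng.gauss(0.0, 5.0)) for _ in range(3))
    # Level 2: indication-group intercepts alpha_k and slopes beta_k.
    alpha = [rng.gauss(a0, s0) for _ in range(n_groups)]
    beta = [rng.gauss(b0, s1) for _ in range(n_groups)]
    observed = []
    for t in trials:
        k = t["group"] - 1
        # Level 1: true ORR effect given the true PFS effect, then the observed estimates.
        theta_orr = rng.gauss(alpha[k] + beta[k] * t["theta_pfs"], sigma_wls / t["n"] ** 0.5)
        observed.append((rng.gauss(theta_orr, t["se_orr"]),
                         rng.gauss(t["theta_pfs"], t["se_pfs"])))
    return observed


trials = [
    {"group": 3, "theta_pfs": -0.3, "n": 150, "se_orr": 0.35, "se_pfs": 0.15},
    {"group": 5, "theta_pfs": -0.1, "n": 200, "se_orr": 0.30, "se_pfs": 0.12},
]
print(prior_predictive(trials, seed=42))
```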

Indication Groups

Group 1: Hematologic malignancies

Includes classical Hodgkin lymphoma (cHL), diffuse large B-cell lymphoma (DLBCL), follicular lymphoma (FL), multiple myeloma (MM), non-Hodgkin lymphoma (NHL), and peripheral T-cell lymphoma (PTCL).

Group 2: Gynecologic cancers

Includes cervical, endometrial, and ovarian cancers.

Group 3: Thoracic malignancies

Includes non-small cell lung cancer (NSCLC), small cell lung cancer (SCLC), and mesothelioma.

Group 4: Urologic and gastrointestinal solid tumors

Includes bladder cancer, gastric cancer, and renal cell carcinoma (RCC).

Group 5: Breast cancer

Includes breast cancer.

If no indication is specified, the average ORR–PFS relationship across all indication groups will be used.
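The grouping above amounts to a lookup table. A hypothetical Python version follows: the keys are illustrative labels rather than the package's accepted inputs, `None` stands for "use the average relationship across groups", and raising an error for an unlisted indication is a design choice of this sketch (the text does not specify that case).

```python
# Hypothetical indication labels mapped to ORR-PFS groups 1-5, following the grouping above.
INDICATION_GROUP = {
    "cHL": 1, "DLBCL": 1, "FL": 1, "MM": 1, "NHL": 1, "PTCL": 1,  # hematologic
    "cervical": 2, "endometrial": 2, "ovarian": 2,                # gynecologic
    "NSCLC": 3, "SCLC": 3, "mesothelioma": 3,                     # thoracic
    "bladder": 4, "gastric": 4, "RCC": 4,                         # urologic/GI
    "breast": 5,                                                  # breast
}


def orr_pfs_group(indication=None):
    """Return the group index, or None to request the average ORR-PFS relationship."""
    if indication is None:
        return None
    if indication not in INDICATION_GROUP:
        raise ValueError(f"unlisted indication: {indication!r}")
    return INDICATION_GROUP[indication]
```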

Prior ORR data

The method for computing the observed log odds ratio $\hat\theta^{\text{ORR}}$ and its standard error depends on whether the earlier study was a two-arm or a single-arm trial.

Two-arm setting

In the two-arm setting, $\hat\theta^{\text{ORR}}$ and its standard error are estimated directly from the observed response counts in each arm using frequentist estimation:

$$\hat\theta^{\text{ORR}} = \log\!\left(\frac{x_{\text{SOC}}}{n_{\text{SOC}} - x_{\text{SOC}}}\right) - \log\!\left(\frac{x_{\text{trt}}}{n_{\text{trt}} - x_{\text{trt}}}\right)$$

$$SE\!\left(\hat\theta^{\text{ORR}}\right) = \sqrt{\frac{1}{x_{\text{trt}}} + \frac{1}{n_{\text{trt}} - x_{\text{trt}}} + \frac{1}{x_{\text{SOC}}} + \frac{1}{n_{\text{SOC}} - x_{\text{SOC}}}}$$

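where $x_{\text{SOC}}$ and $x_{\text{trt}}$ are the numbers of responders and $n_{\text{SOC}}$ and $n_{\text{trt}}$ are the total numbers of patients in the SOC and treatment groups, respectively. With this sign convention, negative values of $\hat\theta^{\text{ORR}}$ favor the treatment, as for the log HR of PFS.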
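A minimal Python sketch of the two formulas above (the function name is hypothetical). It rejects zero cells, for which the log odds ratio and its standard error are undefined; how the package handles that case is not described here.

```python
import math


def orr_log_or(x_trt, n_trt, x_soc, n_soc):
    """Observed log OR (SOC odds over treatment odds) and its standard error."""
    cells = (x_trt, n_trt - x_trt, x_soc, n_soc - x_soc)
    if min(cells) <= 0:
        raise ValueError("zero cell: log OR and SE are undefined")
    theta_hat = math.log(x_soc / (n_soc - x_soc)) - math.log(x_trt / (n_trt - x_trt))
    se = math.sqrt(sum(1.0 / c for c in cells))
    return theta_hat, se


# 30/60 responders on treatment vs 15/60 on SOC:
theta_hat, se = orr_log_or(30, 60, 15, 60)   # theta_hat = log(1/3) ~ -1.099, se ~ 0.394
```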

Single-arm setting

When only single-arm data are available (i.e., there is no concurrent control arm), the observed ORR treatment effect cannot be computed directly from two-arm counts. Instead, following Weber et al. (2021), uncertainty in the control ORR is incorporated by placing a distribution over the standard of care (SOC) response rate, $p_{\text{SOC}}$, using user-specified lower and upper bounds.

Let `low_soc_rr` and `upp_soc_rr` denote the lower and upper bounds for the control ORR. The corresponding logit-scale bounds are:

$$\texttt{logit\_low} = \log\!\left(\frac{\texttt{low\_soc\_rr}}{1 - \texttt{low\_soc\_rr}}\right), \qquad \texttt{logit\_upp} = \log\!\left(\frac{\texttt{upp\_soc\_rr}}{1 - \texttt{upp\_soc\_rr}}\right)$$

Rather than fixing $p_{\text{SOC}}$ at a point estimate, a normal distribution is placed on the logit scale:

$$\text{logit}(p_{\text{SOC}}) \sim \mathcal{N}\!\left(\mu_{\text{SOC}}, \sigma_{\text{SOC}}^2\right) \;\Rightarrow\; p_{\text{SOC}} = \frac{e^x}{1 + e^x}, \quad x \sim \mathcal{N}\!\left(\mu_{\text{SOC}}, \sigma_{\text{SOC}}^2\right)$$

where the mean and standard deviation are derived from the specified bounds:

$$\mu_{\text{SOC}} = \frac{\texttt{logit\_low} + \texttt{logit\_upp}}{2}$$

$$\sigma_{\text{SOC}} = \frac{\texttt{logit\_upp} - \texttt{logit\_low}}{2\cdot\Phi^{-1}\!\left(1 - \dfrac{1 - \texttt{ci\_rr}}{2}\right)}$$

$\Phi^{-1}(\cdot)$ denotes the quantile function of the standard normal distribution. `ci_rr` is the user-specified confidence level for the SOC response bounds (default = 80%). Lower values may be used for more conservative assumptions or when the SOC data are uncertain: a lower `ci_rr` gives a smaller normal quantile in the denominator and therefore a larger $\sigma_{\text{SOC}}$. In the single-arm setting, $x_{\text{SOC}}$ and $n_{\text{SOC}}$ entering the expressions above are drawn from the distribution over $p_{\text{SOC}}$ rather than observed directly.
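The single-arm construction can be sketched in Python as follows, keeping the parameter names `low_soc_rr`, `upp_soc_rr`, and `ci_rr` from the text; the function names and the example bounds are hypothetical. By construction, the $(1 - \texttt{ci\_rr})/2$ and $(1 + \texttt{ci\_rr})/2$ quantiles of $p_{\text{SOC}}$ reproduce the user-specified bounds.

```python
import math
import random
from statistics import NormalDist


def logit(p):
    return math.log(p / (1.0 - p))


def soc_logit_prior(low_soc_rr, upp_soc_rr, ci_rr=0.8):
    """Mean and SD of the normal prior on logit(p_SOC) implied by the bounds."""
    logit_low, logit_upp = logit(low_soc_rr), logit(upp_soc_rr)
    z = NormalDist().inv_cdf(1.0 - (1.0 - ci_rr) / 2.0)
    return (logit_low + logit_upp) / 2.0, (logit_upp - logit_low) / (2.0 * z)


def sample_p_soc(low_soc_rr, upp_soc_rr, ci_rr=0.8, size=1000, seed=0):
    """Draw p_SOC = exp(x) / (1 + exp(x)), computed as 1 / (1 + exp(-x)), x ~ N(mu, sd^2)."""
    mu, sd = soc_logit_prior(low_soc_rr, upp_soc_rr, ci_rr)
    rng = random.Random(seed)
    return [1.0 / (1.0 + math.exp(-rng.gauss(mu, sd))) for _ in range(size)]


# Hypothetical SOC response rate believed to lie between 20% and 40% with 80% confidence:
mu_soc, sigma_soc = soc_logit_prior(0.20, 0.40)
p_soc = sample_p_soc(0.20, 0.40)
```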

Blumenthal, Gideon M, Stella W Karuri, Hui Zhang, et al. 2015. “Overall Response Rate, Progression-Free Survival, and Overall Survival with Targeted and Standard Therapies in Advanced Non–Small-Cell Lung Cancer: US Food and Drug Administration Trial-Level and Patient-Level Analyses.” Journal of Clinical Oncology 33 (9): 1008.
Hampson, Lisa V, Björn Bornkamp, Björn Holzhauer, et al. 2022. “Improving the Assessment of the Probability of Success in Late Stage Drug Development.” Pharmaceutical Statistics 21 (2): 439–59.
Weber, Sebastian, Yue Li, John W Seaman III, Tomoyuki Kakizume, and Heinz Schmidli. 2021. “Applying Meta-Analytic-Predictive Priors with the R Bayesian Evidence Synthesis Tools.” Journal of Statistical Software 100 (19): 1–39. https://doi.org/10.18637/jss.v100.i19.