Phase I oncology – a statistical disaster zone?

The traditional 3+3 design, and its variants such as the rolling-6, have been widely, almost routinely, used in phase I oncology trials despite being, from a statistical perspective, a bit of a disaster zone. First of all, the decision criteria consider only the current dose, ignoring informative data from other dose groups. To make matters worse, there is huge uncertainty about the true dose-limiting toxicity (DLT) rate with such small sample sizes.

So, if we observe a 33% DLT rate we behave as if we know that a given dose is not tolerated, and often make irrevocable decisions about dosing of patients in future development. In fact, with 2 DLTs in 6 patients the exact 95% confidence interval tells us the outcome is consistent with a true DLT rate, at that dose, of anywhere between 4% and 78%. With 3 patients, virtually no DLT rate can be ruled out!
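To make the width of those intervals concrete, here is a minimal sketch of the exact (Clopper-Pearson) interval using only the Python standard library. The function names are my own, and the limits are found by simple bisection rather than any particular statistical package.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def clopper_pearson(x, n, conf=0.95):
    """Exact two-sided confidence interval for x events in n patients."""
    alpha = 1 - conf

    def solve(f):
        # f is decreasing in p on [0, 1]; locate its root by bisection
        lo, hi = 0.0, 1.0
        for _ in range(60):
            mid = (lo + hi) / 2
            if f(mid) > 0:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower limit: the p at which P(X >= x) equals alpha / 2
    lower = 0.0 if x == 0 else solve(
        lambda p: alpha / 2 - (1 - binom_cdf(x - 1, n, p)))
    # Upper limit: the p at which P(X <= x) equals alpha / 2
    upper = 1.0 if x == n else solve(
        lambda p: binom_cdf(x, n, p) - alpha / 2)
    return lower, upper

for dlts, patients in [(2, 6), (1, 3)]:
    lo, hi = clopper_pearson(dlts, patients)
    print(f"{dlts}/{patients} DLTs: 95% CI {lo:.1%} to {hi:.1%}")
```

For 2 DLTs in 6 patients this gives roughly 4% to 78%, and for 1 DLT in 3 patients the interval runs from below 1% to above 90%.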

There are of course alternatives, such as the continual reassessment method (CRM), and I wonder why they are not being used more, especially as there now seems to be regulatory acceptance when they are used carefully. The maths behind the CRM is complicated and the implementation fiddly, and maybe that’s part of the barrier, but the philosophy is quite intuitive – let me try and explain.

The key advantage of the CRM is that all available data are utilised in informing the next dose decision. In addition, it is quite explicit about the uncertainty concerning the true identity of the maximum tolerated dose (MTD), thus encouraging more patients to be studied. The first step in this Bayesian technique is to quantify the prior beliefs about the MTD. Then, as data are observed, the prior is continually updated using data from all doses, creating a posterior describing the probability that each dose exceeds the MTD. The more data observed, the less the calculations are influenced by the prior. Let’s look at a few examples.
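The point about the prior mattering less as data accumulate can be illustrated with a deliberately simplified sketch. It is not the CRM itself: it looks at a single dose in isolation, using a conjugate Beta prior on the DLT rate. The two priors and the patient counts are hypothetical, chosen only for illustration.

```python
def posterior_mean(prior_a, prior_b, n_dlt, n_patients):
    """Posterior mean DLT rate for a Beta(prior_a, prior_b) prior and binomial data."""
    return (prior_a + n_dlt) / (prior_a + prior_b + n_patients)

# Two deliberately different (hypothetical) priors for the DLT rate at one dose
optimistic = (1, 9)    # prior mean 10%
pessimistic = (4, 6)   # prior mean 40%

# Hypothetical data sets that all show the same observed DLT rate of 1 in 6
gaps = []
for n_dlt, n_patients in [(1, 6), (10, 60), (100, 600)]:
    low = posterior_mean(*optimistic, n_dlt, n_patients)
    high = posterior_mean(*pessimistic, n_dlt, n_patients)
    gaps.append(high - low)
    print(f"{n_dlt}/{n_patients} DLTs: posterior mean {low:.3f} vs {high:.3f}")
```

The two posterior means start well apart and converge as the sample grows, which is the behaviour described above. In a real CRM the same updating happens on the parameters of a dose-toxicity curve, so data from every dose contribute.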

First of all, defining a prior – you could spend days debating the prior, but an appealing approach, when only pre-clinical data are available, is to use weakly informative priors (Neuenschwander et al., 2008). In the example below, there is an assumed dose response for the most likely DLT rates, with the best estimate of the MTD being between 12 and 15 mg (where the 0.33 dashed line intersects the point estimates). However, the probability intervals for the DLT rate at each dose (you can think of them as confidence intervals) are wide, appropriately allowing for large uncertainty as to where the MTD lies.

As DLT data are obtained the CRM model is continually updated to provide probabilities about the MTD. Let’s imagine we’ve observed the following data so far.

Dose    No. of DLTs    No. of patients
1 mg    0              3
2 mg    0              3
4 mg    0              3
6 mg    1              6
8 mg    1              6

Applying the CRM we can calculate two quantities for each potential dose:

The probability that the true DLT rate is greater than 33%, or ‘overdose’
The probability that the true DLT rate is between, say, 20% and 33%, or ‘target’ toxicity

If we use the escalation with overdose control (EWOC) principle, the next recommended dose is the one that has the highest probability of target toxicity amongst those whose probability of overdose does not exceed a pre-specified limit, say 25%. These quantities can be calculated for all potential doses.

In this case, the next dose recommended would be 8mg, not 10mg as governed by 3+3 criteria, as all doses higher than 8mg have a > 25% probability of having a > 33% DLT rate and 8mg has the highest probability of target toxicity. The protocol would need to define whether further toxicity data should be obtained at 8mg, which would probably be wise. If a further 6 patients were dosed at 8mg and no further toxicities were observed, the CRM would then conclude it would be safe to dose at 10mg. Equally, scenarios can occur where the CRM would recommend escalation/expansion when the 3+3 criteria wouldn’t.
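For readers who want to see the mechanics, the calculation can be sketched in a few lines of Python. This is a rough grid approximation of a two-parameter logistic dose-toxicity model applied to the data observed so far, not the model behind the numbers quoted above. The candidate dose list, the reference dose, the independent normal prior on (log alpha, log beta) and all function names are my own assumptions, so the recommendation it prints need not match the 8mg quoted above.

```python
import math

# DLT data observed so far: dose (mg) -> (no. of DLTs, no. of patients)
data = {1: (0, 3), 2: (0, 3), 4: (0, 3), 6: (1, 6), 8: (1, 6)}

# ASSUMPTIONS (not from the article): candidate doses, reference dose and a
# vague, independent normal prior on (log alpha, log beta)
doses = [1, 2, 4, 6, 8, 10, 12, 15]
REF_DOSE = 12.0
PRIOR_MEAN = (math.log(0.33 / 0.67), 0.0)  # about 33% DLT rate at REF_DOSE
PRIOR_SD = (2.0, 1.0)

def dlt_prob(dose, log_alpha, log_beta):
    """Logistic model: logit(p) = log(alpha) + beta * log(dose / REF_DOSE)."""
    eta = log_alpha + math.exp(log_beta) * math.log(dose / REF_DOSE)
    return 1.0 / (1.0 + math.exp(-eta))

def grid(mean, sd, n=61, width=3.0):
    """Equally spaced points covering mean +/- width * sd."""
    return [mean + sd * width * (2 * i / (n - 1) - 1) for i in range(n)]

def posterior_summary(data, doses, target=(0.20, 0.33)):
    """Return {dose: (P(target toxicity), P(overdose))} by grid approximation."""
    points = []
    for la in grid(PRIOR_MEAN[0], PRIOR_SD[0]):
        for lb in grid(PRIOR_MEAN[1], PRIOR_SD[1]):
            # Unnormalised prior density at this grid point
            w = math.exp(-0.5 * (((la - PRIOR_MEAN[0]) / PRIOR_SD[0]) ** 2
                                 + ((lb - PRIOR_MEAN[1]) / PRIOR_SD[1]) ** 2))
            # Binomial likelihood, using the data from every dose studied
            for d, (x, n) in data.items():
                p = dlt_prob(d, la, lb)
                w *= p ** x * (1 - p) ** (n - x)
            points.append((la, lb, w))
    total = sum(w for _, _, w in points)
    summary = {}
    for d in doses:
        p_target = p_over = 0.0
        for la, lb, w in points:
            p = dlt_prob(d, la, lb)
            if p > target[1]:
                p_over += w
            elif p >= target[0]:
                p_target += w
        summary[d] = (p_target / total, p_over / total)
    return summary

def ewoc_recommendation(summary, max_overdose=0.25):
    """Highest P(target) among doses with P(overdose) <= max_overdose."""
    eligible = {d: pt for d, (pt, po) in summary.items() if po <= max_overdose}
    return max(eligible, key=eligible.get) if eligible else None

summary = posterior_summary(data, doses)
for d in doses:
    pt, po = summary[d]
    print(f"{d:>2} mg  P(target)={pt:.2f}  P(overdose)={po:.2f}")
print("Recommended next dose (mg):", ewoc_recommendation(summary))
```

Changing PRIOR_MEAN and PRIOR_SD and re-running is a quick way to see how much the recommendation leans on the prior when only 21 patients have been treated.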

In practice, the CRM should only ever provide recommendations, and the ultimate decision should rest with a Safety Review Committee (SRC) comprising investigators and sponsor staff. The SRC would consider the nature of the DLTs observed and whether, for example, there was a pattern of persistent and troublesome Grade 2 toxicity. It is also necessary to define a period of follow-up, say 1 or 2 cycles, within which DLTs are counted. If later toxicity is observed, the model could be re-run with longer follow-up, and there are variants that handle longer follow-up, such as the time-to-event CRM.

CRM models are beginning to be used and, despite the underlying mathematical complexity, I hope that these approaches become more widespread.

Neuenschwander B, Branson M, Gsponer T. Critical aspects of the Bayesian approach to phase I cancer trials. Stat Med. 2008;27(13):2420-2439.