Recasting traditional frequentist thinking into Bayesian beliefs illustrates that requirements for 'strength of evidence', not hopeful hyperbole, should drive the sizing of pharmacogenomic trials. There is no 'free lunch' in pharmacogenomic investigations.
Keywords: Pivotal; Phase 3
Many authors over the years have held up the promise that pharmacogenomic clinical trials can be smaller than their non-pharmacogenomic equivalents [1-3]. This is certainly true if the pharmacogenomically targeted population has a smaller variance in any efficacy or safety outcome than a heterogeneous 'all-comers' sample. Clinical trial sizes (n) scale with the variance of measurements, i.e. with the square of the typical measurement error. A homogeneous population segment of patients selected on a specific biomarker should have a smaller error variance and thus need a smaller trial to investigate. However, among non-aficionados of statistics this has led to the belief that pharmacogenomic trials are per se something special, beyond the realms of ordinary regulatory concerns about trial sizing. Pharmaceutical executives often mistakenly hear, or may be unscrupulously led to believe, that a segmented medicine will always be in some sense cheaper or quicker to develop. Considering the weight of evidence requirements is an illuminating way of understanding that the sizing of pharmacogenomic trials is not exceptional when it comes to regulatory sufficient proof.
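The scaling of n with variance can be made concrete with a small sketch. It assumes the usual normal-approximation formula for a two-arm comparison of means, n per arm = 2σ²(z₁₋α + z₁₋β)²/δ², which is a standard textbook approximation and not a formula given in this article; the values of σ (outcome standard deviation) and δ (difference to detect) are purely hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm(sigma, delta, alpha=0.05, power=0.80):
    """Approximate per-arm size for a two-arm comparison of means.

    Assumes a normal approximation and a one-sided type 1 error alpha.
    sigma and delta are illustrative inputs, not values from the article.
    """
    z = NormalDist().inv_cdf  # standard normal quantile function
    return ceil(2 * sigma**2 * (z(1 - alpha) + z(power)) ** 2 / delta**2)


# Hypothetical 'all-comers' sample versus a more homogeneous biomarker segment:
# halving the standard deviation cuts the required size roughly four-fold.
n_all_comers = n_per_arm(sigma=10.0, delta=5.0)
n_segment = n_per_arm(sigma=5.0, delta=5.0)
print(n_all_comers, n_segment)
```

The saving comes entirely from the smaller variance; nothing in the calculation depends on the trial being pharmacogenomic.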
A positive pharmacogenomic result in a clinical trial arises when a hypothesis of 'no-effect' of the genomic biomarker on the outcome is rejected by the data. Statisticians frequently refer to this as 'rejecting the null' in favour of 'the alternative' (where the alternative hypothesis is that there is some pharmacogenomic effect). One could look at these hypotheses as encapsulating the experimenter's subjective belief about the world.
Given these two mutually exclusive and exhaustive pharmacogenomic hypotheses, H0 (belief in the null) and H1 (belief in the alternative), Bayes' rule gives the posterior probability of the null hypothesis (i.e. after the trial data have been collected) as:

$$p(H_0 \mid D) = \frac{p(D \mid H_0)\,p(H_0)}{p(D \mid H_0)\,p(H_0) + p(D \mid H_1)\,p(H_1)}$$
Where | means 'given', D is the trial's data and p(H) is the prior probability of the hypothesis. Before an experiment, the two hypotheses may be believed equally likely (i.e. p(H) = 0.5 ∀ H, as in the crucial equipoise philosophy of standard frequentist sample size calculations). This Bayesian approach belies the fact that H0 is a point hypothesis while the alternative spreads the probability out according to the extra parameter(s) and selection specification(s) included in H1, but it is a useful framework here for illustration.
How can this argument be used to approximate the support for belief in hypotheses when designing a pharmacogenomic clinical trial programme?
Consider that a pivotal phase 3 pharmacogenomic trial is planned such that the propter hoc type 1 error is 0.05 (one-sided) and the propter hoc type 2 error is 0.20 (i.e. 80% power). The trial runs and all that one knows is that it yields a 'significant' p value for the observed pharmacogenomic result, the actual values of the estimated parameters being unknown.
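Then, from the above, as the probability of obtaining a result at least as extreme as that observed is ≤ 0.05 under the null belief, and was planned as 0.8 under the alternative, approximately

$$p(H_0 \mid D) \approx \frac{0.05 \times 0.5}{0.05 \times 0.5 + 0.8 \times 0.5} = \frac{0.05}{0.05 + 0.8} = 0.059$$

since the equal priors of the two hypotheses cancel, and approximately

$$p(H_1 \mid D) \approx \frac{0.8}{0.05 + 0.8} = 0.941$$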
Note the key importance of the relative weight of evidence (0.8/0.05 = 16, the surrogate for the actual observed likelihood ratio).
In terms of a probability ratio or Bayes Factor in favour of the alternative, this is 0.941/0.059 = 16 (or 4 bits). As a guide, this probability ratio would be considered 'Moderate evidence to support' by Evett et al. and by Aitken et al., and in bits 'Strong' by Jeffreys. It is equivalent to a weight of evidence (= 10 × log10 [probability ratio]) of 12.04 decibans, being 'Moderate to strong' according to Good. It maps to a 'deviance' from general statistical theory (= twice the natural logarithm of the likelihood ratio), as used for hypothesis testing in linear modelling, of 5.55.
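The arithmetic for a single 'significant' trial can be checked with a short sketch. It assumes, as in the text, that the planned type 1 error and power stand in for p(D|H0) and p(D|H1); the function and variable names are illustrative only.

```python
from math import log, log2, log10


def posterior_null(p_d_h0, p_d_h1, prior_h0=0.5):
    """Bayes' rule for two exclusive and exhaustive hypotheses."""
    prior_h1 = 1.0 - prior_h0
    numerator = p_d_h0 * prior_h0
    return numerator / (numerator + p_d_h1 * prior_h1)


alpha, power = 0.05, 0.80  # planned type 1 error and power, used as surrogates

post_h0 = posterior_null(alpha, power)
post_h1 = 1.0 - post_h0
bayes_factor = post_h1 / post_h0  # equals power / alpha under equal priors

bits = log2(bayes_factor)              # evidence in bits
decibans = 10 * log10(bayes_factor)    # weight of evidence in decibans
deviance = 2 * log(bayes_factor)       # twice the natural log of the ratio

print(f"p(H0|D) = {post_h0:.3f}, p(H1|D) = {post_h1:.3f}")
print(f"Bayes Factor = {bayes_factor:.1f}, bits = {bits:.2f}, "
      f"decibans = {decibans:.2f}, deviance = {deviance:.2f}")
```

Because the priors are equal, the Bayes Factor reduces to power divided by type 1 error, which is why the relative weight of evidence carries the whole argument.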
In accordance with general regulatory practice in pharmaceutical development, a second (independent) replicate pivotal trial is planned with the same assumptions as the first; let us assume that it too yields an unspecified 'significant' pharmacogenomic result. Then, putting aside for now the issue of whether the actual parameter values found are the same as before, it follows again from the Bayes argument above, with the first trial's posteriors now serving as priors, that the new posterior for belief in the alternative is

$$p(H_1 \mid D_1, D_2) \approx \frac{0.8 \times 0.941}{0.8 \times 0.941 + 0.05 \times 0.059} = 0.996$$

where D1 and D2 denote the data from the first and second trials. The new posterior belief in the null hypothesis is 1 minus this, i.e. approximately 0.004.
By now, the evidence for the alternative pharmacogenomic hypothesis (i.e. that there is a pharmacogenomic effect) is 0.996/0.004 = a Bayes Factor of 249 ('Moderately strong evidence to support'), or 7.96 bits ('Decisive'), or 23.96 decibans ('Strong'). This now maps to an impressive deviance of 11.04.
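The two-trial update can be sketched as sequential applications of Bayes' rule, each posterior becoming the prior for the next trial. The sketch assumes both trials share the planned 0.05 type 1 error and 80% power and that both are 'significant'; names are illustrative. Carrying unrounded posteriors through gives a Bayes Factor of 256 (16 × 16), slightly above the 249 obtained from the rounded figures 0.996/0.004.

```python
from math import log, log2, log10


def update(prior_h1, p_sig_h0=0.05, p_sig_h1=0.80):
    """Posterior p(H1 | 'significant' result) given a prior p(H1)."""
    numerator = p_sig_h1 * prior_h1
    return numerator / (numerator + p_sig_h0 * (1.0 - prior_h1))


belief_h1 = 0.5  # equipoise before any trial
history = []
for trial in (1, 2):
    belief_h1 = update(belief_h1)
    history.append(belief_h1)
    print(f"after trial {trial}: p(H1|data) = {belief_h1:.3f}")

# Evidence measures from the unrounded posterior after both trials.
bf_unrounded = belief_h1 / (1.0 - belief_h1)
print(f"Bayes Factor = {bf_unrounded:.0f}, bits = {log2(bf_unrounded):.2f}, "
      f"decibans = {10 * log10(bf_unrounded):.2f}, "
      f"deviance = {2 * log(bf_unrounded):.2f}")

# The same measures from the rounded posteriors used in the text.
bf_rounded = 0.996 / 0.004
```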
Now imagine instead that a pharmaceutical company executive believes that pharmacogenomic trials are somehow special and can be done differently, perhaps with just one 'smaller, cheaper' pivotal trial. What obtained 'significance' level, matching this planned trial's type 1 error, would generate at least the equivalent alternative hypothesis posterior to executing the two trials (given a planned 80% power)?
For a single planned pivotal trial with a pre-specified one-sided type 1 error α and 80% power, the answer is given through simple algebraic equivalence as

$$\frac{0.8 \times 0.5}{0.8 \times 0.5 + \alpha \times 0.5} = \frac{0.8}{0.8 + \alpha} \geq 0.996$$

to which the solution is a planned (and obtained) α = 0.003. This is close to heuristically assuming the resultant support is 0.05 (for the first trial) times 0.05 (for the second trial), i.e. 0.0025, and very near the standard 'extreme p value' of 0.001 desired in regulatory guidelines for single pivotal trials. One assumes that regulators require a slightly lower type 1 error for confidence in resultant assertions as they have now lost the reliability of 2 'independent' sets of pharmacogenomic investigators investigating the drug. This extra burden of proof also compensates for the fact that, despite the consilience in rejecting the null belief, the actual parameter values, whilst considered the same as before, will vary in practice in each trial, as well as allowing for any perceived issue of multiple testing. As clinical trial sizes (n) scale non-linearly with desired significance, required numbers in the study explode for such very small type 1 errors! A smaller, cheaper trial is not indicated.
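A short sketch of this equivalence follows. The first function simply rearranges the posterior expression for α under equal priors. The second is an assumption added for illustration and is not taken from the article: it uses the common normal-approximation result that n is proportional to (z₁₋α + z₁₋β)² to gauge how the size of the single trial compares with one trial of the replicate pair.

```python
from statistics import NormalDist


def alpha_for_posterior(target_post_h1, power=0.80):
    """Type 1 error at which one 'significant' trial reaches the target
    posterior for H1, assuming equal priors:
    power / (power + alpha) = target."""
    return power * (1.0 - target_post_h1) / target_post_h1


def relative_n(alpha, power=0.80):
    """Quantity proportional to trial size under a normal approximation
    (an assumed textbook scaling, one-sided alpha)."""
    z = NormalDist().inv_cdf
    return (z(1.0 - alpha) + z(power)) ** 2


alpha_single = alpha_for_posterior(0.996)
size_ratio = relative_n(alpha_single) / relative_n(0.05)

print(f"required alpha for a single trial: {alpha_single:.4f}")
print(f"size relative to one trial of the pair: {size_ratio:.2f}")
```

Under these assumptions the single trial comes out at roughly twice the size of one trial of the pair, so little or nothing is saved overall.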
A recent systematic review has highlighted that many pharmacogenomic trials being carried out are in fact small, some with additional design failings too. The above algebraic outline does not contain any special term for the use of a pharmacogenomic biomarker - whether the trial is pharmacogenomic or not, the force of logic still applies. A single trial to generate proof needs to be much larger than expected compared to the size of one of an equivalent pair of clinical trials - there is 'no free lunch'.
Note (most usefully): If the planned significance level changes (or the planned power is altered) for the second pharmacogenomic trial - that can be fed into the above approximate argument. Similarly the above algebra can also be used for the actual significance levels obtained and the post hoc power obtained versus the planned alternative for each and every pharmacogenomic trial to calculate more accurate support. Similarly, should alternatives change for the second replicate trial then recalculation of the post hoc power of that trial back to the original first trial's alternative can be made to ensure appropriate synthesis (or alternatively the original trial's evidence be recast to the new alternative). One can use this approach including 'failed' trials, more than 2 replicate trials etc. Any pharmacogenomic trial type can be used and folded into the same argument about beliefs. Of course, for final formal support over any particular hypothesis, the actual data likelihood (p(D|H)) values over the whole possible hypothesis space (i.e. not just restricted to these two didactic exclusive and exhaustive hypotheses) should be used for each and every trial.
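One way to fold several trials into the same argument is to accumulate odds, multiplying by each trial's surrogate likelihood ratio. The sketch below is an illustration under stated assumptions rather than a prescribed method: a 'significant' trial contributes power/α, and a 'failed' (non-significant) trial is assumed to contribute (1 − power)/(1 − α). The trial values shown are hypothetical.

```python
def combine(trials, prior_h1=0.5):
    """Posterior p(H1) after a sequence of trials.

    trials: iterable of (alpha, power, significant) tuples, where alpha and
    power may be planned or post hoc values for each trial.
    """
    odds = prior_h1 / (1.0 - prior_h1)
    for alpha, power, significant in trials:
        if significant:
            odds *= power / alpha                    # evidence for H1
        else:
            odds *= (1.0 - power) / (1.0 - alpha)    # evidence against H1
    return odds / (1.0 + odds)


# Two 'significant' replicate trials, as in the main argument.
two_positive = combine([(0.05, 0.80, True), (0.05, 0.80, True)])

# Hypothetical programme: one success, one 'failed' trial, then a third
# trial planned with a tighter type 1 error and higher power.
mixed = combine([(0.05, 0.80, True), (0.05, 0.80, False), (0.025, 0.90, True)])

print(f"two positive trials: {two_positive:.3f}")
print(f"mixed programme:     {mixed:.3f}")
```

Working on the odds scale keeps each trial's contribution separate, so a changed significance level or power for any one trial only alters its own factor.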
These views are personal and self-financed. They should not be construed as representing in any way those of the University of Reading, Daiichi-Sankyo Development Ltd, nor the Royal Society, London.