NEW Bayesian - Approaches.in - Oncology.using.R.and - OpenBUGS
NEW Bayesian - Approaches.in - Oncology.using.R.and - OpenBUGS
in Oncology Using R
and OpenBUGS
Bayesian Approaches
in Oncology Using R
and OpenBUGS
Atanu Bhattacharjee
First edition published 2021
by CRC Press
6000 Broken Sound Parkway NW, Suite 300, Boca Raton, FL 33487-2742
and by CRC Press
2 Park Square, Milton Park, Abingdon, Oxon, OX14 4RN
© 2021 Taylor & Francis Group, LLC
[First edition published by Willan 2008]
[Sixth edition published by Routledge 2009]
CRC Press is an imprint of Taylor & Francis Group, LLC
The right of Atanu Bhattacharjee to be identified as author of this work has been asserted by him in
accordance with sections 77 and 78 of the Copyright, Designs and Patents Act 1988.
Reasonable efforts have been made to publish reliable data and information, but the author and
publisher cannot assume responsibility for the validity of all materials or the consequences of their
use. The authors and publishers have attempted to trace the copyright holders of all material repro
duced in this publication and apologize to copyright holders if permission to publish in this form
has not been obtained. If any copyright material has not been acknowledged please write and let us
know so we may rectify in any future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced,
transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or
hereafter invented, including photocopying, microfilming, and recording, or in any information
storage or retrieval system, without written permission from the publishers.
For permission to photocopy or use material electronically from this work, access www.copyright.
com or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA
01923, 978-750-8400. For works that are not available on CCC please contact mpkbookspermis
[email protected]
Trademark notice: Product or corporate names may be trademarks or registered trademarks and
are used only for identification and explanation without intent to infringe.
Contents
Preface xiii
Author xv
1.1 Introduction to R . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Packages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.5 OpenBUGS . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.7 Illustration . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.9.1 BAEssd . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.9.2 BayesianPower . . . . . . . . . . . . . . . . . . . . . . 24
2.9.3 NPHMC . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.9.4 PowerTOST . . . . . . . . . . . . . . . . . . . . . . . 26
2.9.5 SampleSize4ClinicalTrials . . . . . . . . . . . . . . . . 26
2.9.6 SSRMST . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.9.7 powerSurvEpi . . . . . . . . . . . . . . . . . . . . . . 28
2.9.8 SampleSizeMeans . . . . . . . . . . . . . . . . . . . . 28
vii
viii Contents
3 Study Design-I 31
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
4 Study Design-II 39
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.2 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
4.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
5.7 Illustration . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
5.9 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
5.10.1 DoseFinding . . . . . . . . . . . . . . . . . . . . . . . 60
6 Survival Analysis 65
6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
Contents ix
6.3.1 BayesSurvival . . . . . . . . . . . . . . . . . . . . . . 76
6.3.2 muhaz . . . . . . . . . . . . . . . . . . . . . . . . . . 76
7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
8.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
model . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
x Contents
12.7.2 JM . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
Contents xi
Bibliography 227
Index 241
Preface
Over the past decades, computational flexibility has extended the application
of the Bayesian approach as a handy method in research practice. The oncol
ogy disease complexity and existing challenge to obtain the best therapeutic
effect has provided immense opportunity to develop the statistical methodol
ogy. Perhaps, the acceptance of the Bayesian approach in oncology research
is relatively low. There is a vast scope to extend the analytical approach by
Bayesian in oncology research practice. This book intended to make a single
source of information on Bayesian statistical methodology for oncology re
search to cover several dimensions like study design, sample size calculation,
time-to-event data analysis, and diagnostics implication. The idea is to overall
extending the Bayesian approach in oncology research practice.
This book is structured with four sections to extend Bayesian in oncology
research. Study design-related topics proposed to bring in Part I. The impor
tance of statistics in the oncology domain first felt during the setup of the
required participant’s size for an oncology trial. Chapter 1 gives an introduc
tion about R and OpenBUGS software. It is prepared to understand about
basic stuff of R and OpenBUGS.
Now Chapter 2 is explained with sample size determination. The Bayesian
counterpart to calculate the sample size determination presented. An inferior
ity trial overrules the application of a placebo-controlled trial due to the eth
ical constraints and availability of several molecules in the oncology domain.
Chapter 3 illustrated about conventional study design for oncology clinical tri
als. The alternative approach of traditional design of study shown in Chapter
4. It is not always that high-dose chemotherapy will work as the best ther
apeutic arm. Oncology is a complex disease. Sometimes low-dose frequently
used chemo regime works well to control the disease progression. The low-dose
chemotherapy is decided based on the optimum biological dose (OBD). The
Bayesian statistical methodology on low-dose chemotherapy selection proce
dure demonstrated in Chapter 5.
Section II proposed to develop and present a different statistical approach
with real data application with time-to-event data analysis. The conventional
technique on survival analysis is a frequentist-based approach and method
with Kaplan-Meier and Cox proportional hazard models always performed.
However, there is minimal literature support about the Bayesian counterpart
on Cox proportional hazard modeling. Chapter 6 shows about Bayesian sur
vival analysis. The competing risk, i.e., cancer patient, died due to other causes
xiii
xiv Preface
than cancer is a new issue in oncology data. The decision about the best ther
apeutic regime in the presence of competing for risk is always challenging.
Chapter 7 is presented with Bayesian competing risk analysis to compare two
therapeutic schemes. The concept of frailty provides a convenient way of in
troducing unobserved heterogeneity and associations into models for survival
data. Frailty is an unobserved random proportionality factor. It influences the
hazard function of an individual or related individual chapter 7 prepared on
Bayesian frailty data analysis. Similarly, Chapter 8 prepared about Bayesian
relative survival analysis. This work developed with an illustration.
In this chapter, we will show with a Bayesian analysis for right-censored
survival suitable for curable cancer site data. Bayesian inference with Markov
chain Monte Carlo (MCMC) methods will be established through model se
lection technique and example with a real dataset. The Section III is about
statistical methodology in longitudinal and survival data analysis. Bayesian
inferences for longitudinal data analysis in oncology research is evitable. It is
attractive due to the ability to consider prior information for statistical growth
curve modeling. Chapter 10 focused on longitudinal data analysis. The mixed-
effect model illustrated with R. The Bayesian inference on missing longitudinal
data illustrated in Chapter 11. Similarly, the Bayesian approach is suitable to
handle Joint longitudinal and survival data. It helps to consider the depen
dence between two types of responses obtained by two different methods. In
Chapter 12, the joint longitudinal and survival analysis presented. Different
prior assumptions and covariance matrix presented with posterior inference.
The work performed with OpenBUGS software. Chapter 13 has presented
with different covariance structures observed by longitudinal measurements.
The Bayesian counterpart illustrated different covariance structures.
The final section-IV is detailed on diagnosis test statistics with available
statistical methodology. The author illustrated the Bayesian method to pro
vide inference on different inter and intra-rater agreement analyzes. Chapter
14 is present about mixed effect modeling. It is dedicated to select the spe
cific dose in oncology. Chapter 15 is dedicated to concordance analysis by
the Bayesian approach. The CCC is a useful tool in agreement analysis. The
importance of high-dimensional data in oncology research will be explored.
The Bayesian approach in high-dimensional data is in Chapter 16. The R and
OpenBUGS codes illustrated.
Author
xv
Part I
Bayesian in Clinical
Research
1
Chapter 1
Abstract
This chapter starts with the history of R software development. Steps to
install R are discussed. It provides about the installation of different packages
in R.Graphical representation by R will help many users to get interested in a
graphical illustration.Steps to install R studio. Description of OpenBUGS soft
ware and illustration to perform Markov Chain Monte Carlo in OpenBUGS.
Introduction about Bayesian using Gibbs sampling accessible. Different types
of Markov chain Monte Carlo (MCMC) approaches are outlined. OpenBUGS
model explanation, trace plot generation, data loading and posterior estimates
generation are described. The real-world examples are used to illustrate the
methods presented. It should provide some familiarity about the R process
and essentially environment on what actually works. R software works as the
integration of data calculation, graphical procedure, and data manipulation.
Command to work with R is presented. This chapter did not explain about
statistics in R. The aim is to provide R and OpenBUGS environment. It is
preferred to show the environment to work with the classical and modern sta
tistical test to implement. Illustrations are presented by the workstations with
windows system. It will guide the users about the available facility.
1.1 Introduction to R
R is a programming language developed by Ross Ihaka and Robert
Gentleman in 1993. It is a free software for graphics and statistical computing.
The source code of R is prepared primarily in Fortran, R, and C. Although R
works are command-driven software, there are other graphical user interfaces
like RStudio. R is named partly after the first names of the first two R authors
and partly as a play on the name of S [1].
There are several dedicated R packages available in R for specific statistical
methods. However, linear modeling, nonlinear modeling, time-series analysis,
clustering, and other analysis can be performed in R itself. No special packages
3
4 Bayesian Approaches in Oncology Using R and OpenBUGS
are required to install. The R community is noted for its active contributions
in terms of packages. R users typically access the command-line interpreted.
For example, if the user types 4+4 at the command prompt and press enter,
the computer replies with 8, as shown below.
>4+4
>8
1.3 Packages
The datasets and functions available in R are stored in packages. A spe
cific package is needed to install before its contents available. The packages
available can be obtained as
> library()
> search()
The standard (or base) packages are defined as part of the R source code.
This contains the essential functions that allow R to work. Datasets and stan
dard statistical and graphical functions are available. Data can be generated
from the normal distribution by ‘rnorm’ as
#Generate Data Distribution
x < rnorm(50)
y < rnorm(x)
plot(x, y)
Introduction to R and OpenBUGS 5
Now the Plot function is useful. A graphics window will appear automati
cally. There are thousands of functions available in R. This function can be ob
tained from the Comprehensive R Archive Network (CRAN): https://cran.r
project.org.
1.5 OpenBUGS
BUGS is an open source statistical software package used for Bayesian in
ference using Gibbs sampling. The statistical model is specified by merely stat
ing the relationship between covariates and outcome variables. The MCMC
technique is used for analyzing the specified model. It can be downloaded
from: http://www.openbugs.net/w/Downloads. A setup program for Open-
BUGS on Windows computers is available. The latest version is available as
version 3.2.2. The manual is also available over there.
It is better to have knowledge of MCMC methods to work with Open-
BUGS. There are three different families of MCMC algorithms like Gibbs,
Metropolis-Hastings, and slice sampling to work with OpenBUGS. Primarily,
Gibbs sampling algorithm works to sample from the conditional distribution
of each node given for all the others in the graph. It works as a particular case
of Metropolis-Hastings algorithm.
8 Bayesian Approaches in Oncology Using R and OpenBUGS
model
{
for (i in 1 : Dogs) {
xa[i, 1] < 0; xs[i, 1] < 0 p[i, 1] < 0
for (j in 2 : Trials) {
xa[i, j] < sum(Y[i, 1 : j - 1])
xs[i, j] < j - 1 - xa[i, j]
log(p[i, j]) < alpha × xa[i, j] + beta × xs[i, j]
y[i, j] < 1 - Y[i, j]
y[i, j] ∼ dbern(p[i, j])
}
}
alpha ∼ dnorm(0, 0.00001)I(, -0.00001)
beta ∼ dnorm(0, 0.00001)I(, -0.00001)
A < exp(alpha)
B < exp(beta)
}
10 Bayesian Approaches in Oncology Using R and OpenBUGS
#Results
Abstract
The sample size for a clinical trial jointly works with study design. A
different issue like a number of sites, number of treatments also considered for
sample size determination. The appropriate response plays an essential role in
sample size calculation. Computing large sample size always wastes of time,
money and unnecessarily put patients into a risk. Similarly, sample size too
small always makes a less powerful statement. Thus, it is essential to compute
sample size wisely to decide minimal clinically meaningful effect size with
pre-specified type I error and power. It is not possible to plan a clinical trial
unless we know the required sample size. However, there is a different strategy
to perform any oncology trial. The study design and study hypothesis play a
combined role to determine the sample size. There are some great sample size
and power calculation software packages available. We presented this chapter
with some R sample size calculation packages. However, there are undoubtedly
other packages that you might consider. The R packages are illustrated toward
implementing Bayesian for sample size calculation. Theoretical explanation
about Bayesian Sample size determination is discussed. Different R packages
are used to solve the problem and all of the problems that encountered do a
better job the other on certain types of scenario.
2.1 Introduction
How many subjects do we need? How long will the study take to complete?
In survival analysis, we need to specify information regarding the censoring
mechanism and the particular survival distributions in the null and alternative
hypotheses.
First, one needs either to specify what parametric survival model to use, or
that the test will be semi-parametric, e.g., the log-rank test. It requires to
determining the number of deaths (or events) required to meet the power and
other design specifications.
13
14 Bayesian Approaches in Oncology Using R and OpenBUGS
Second, one must also provide an estimate of the number of patients that need
to entered into the trial to produce the required number of deaths. We shall
assume that the patients enter a trial over a certain accrual period of length
a, and then followed for an additional period f known as the follow-up time.
Patients still alive at the end of follow-up are censored. Sample size calcula
tions are an essential part of study design Consider sample size requirements
early.
A well-designed trial is large enough to detect clinically important differences
between groups with high probability.
To perform sample size calculations, we need well-defined study endpoints,
hypotheses, and statistical tests. Study hypotheses decided on a clearly de
fined endpoint and period of study.
In most RCTs, known as superiority trials, the study hypothesis presented as
a null hypothesis of no difference in the distribution of the primary endpoint
between study groups. We have an alternative hypothesis in mind, for exam
ple: HA . The frequency of events at 30 days will differ in the two treatment
groups.
In superiority trials, we test the null hypothesis against a two-sided alterna
tive. When we test the null hypothesis, there are two possible states of nature
and two decisions: We will perform a test that has a small probability of a
Type 1 error, usually 0.05. The power of the study is the probability that
we will reject the null hypothesis when the alternative hypothesis is true. We
want this probability to be large, typically at least 0.8.
2p̄ ∗ (1 − p̄)(Zα/2 + Zβ )2
n= (2.1)
Δ2
Zα/2 and Zβ are the critical values of the normal distribution, p̄ is the average
of the event rates under the alternative hypothesis, and Δ is the true difference
under HA .
For example, α = 0.05, β = 0.20, 2n = 1, 903 or n = 806.
Sample Size Determination 15
−Difference :Δ = µT − µC (2.3)
−H0 : Δ = 0, Hα : Δ > 0 orΔ < 0 (2.4)
A simple formula for the sample size, assuming all subjects are followed to
the event, is
n = 2(Zα/2 + Zβ )2 /[In(λC /λT )]2 (2.5)
where the values of λC and λT are given by HA . If the subjects are recruited
over some time, and the study ends when some items have not had the event,
library(“gsDesign”)
ss < nSurvival(lambda1=.2, lambda2=.1, eta = .1,
Ts = 2, Tr = .5,sided=1, alpha=.025)
ss
# R Output
library(“gsDesign”)
x<-gsDesign(k = 5, test.type = 2, n.fix=ss$nEvents,
nFixSurv=ss$n,delta1=log(ss$lambda2/ss$lambda1))
x
Sample Size Determination 17
#R Output
library(“powerSurvEpi”)
ssizeCT.default(power = 0.8,k = 1,pE=0.3707,
pC = 0.4890,RR = 0.7,alpha = 0.05)
18 Bayesian Approaches in Oncology Using R and OpenBUGS
Now
and B(., .) represents the beta function. The term (x1 , x2 ) can be formulated
Sample Size Determination 19
as
2
� (nx i )B(xi + ci , ni − xi + di )
i
p(x1 , x2 ) = (2.11)
i=1
B(ci , di )
The simplified form of joint posterior distribution can be formulated as
which is non-zero over the region in the plane bounded by the lines π1 =
0, π1 = 1, π1 = θ and π1 = θ + 1. The marginal posterior distribution of θ can
be formulated as
� min(θ+1,1)
f (θ|x1 , x2 , n1 , n2 ) = f (π1 , θ|x1 , x2 , n1 , n2 )dπ1 (2.13)
max(0,θ)
√ √
n(U − µ0 ) nω
1 − β = P (Z > z1−α |H1 ) = P ( > z1−α − |H1 ) (2.18)
σ̂ σ
√
nω
1 − β = P (Z > z1−α |H1 ) � Φ(−z1−α + ) (2.19)
σ
√
(z1−α + z1−β ) = nω
Finally, the sample size is obtained with n by
(z1−α + z1−β )2 σ 2
n= (2.20)
ω2
where
n
� n
� � a(x1 ,x2 +l)
P r{θ ∈ (a(x1 , x2 ), a(x1 , x2 )+l)} = f (θ|x1 , x2 , n)dθ
x1 =0 x2 =0 a(x1 ,x2 )
(2.22)
The lower limit of the Highest Posterior Density (HPD) is defined
through x1 , x2 . The term probability 1 − α varies with x with HPD
interval l. It helps to define the sample with minimum size n by
� � a(x,n)+1
{ f (θ|x)dθ}f (x)dx ≥ 1 − α (2.23)
X a(x,n)
Sample Size Determination 21
If we fixed the interval with length l and defined the coverage prob
ability with 1 − α then the minimum sample size n can be defined
as ∗ ∗
� a(x1 ,x2 )+l
f (θ|x∗1 , x∗2 , n)dθ ≥ (1 − α) (2.28)
a(x∗ ∗
1 ,x2 )
The optimum sample size not possible to obtain by ACC and ALC.
The conservative approach to maximize the length of l and reduce the
minimum coverage probability of (1 − α) in connection to the data.
The optimum sample size is obtained by
� a(x,n)+l
inf { f (θ|x)dθ} ≥ 1 − α (2.29)
x∈X a(x,n)
>library(“SampleSizeProportions”)
library(“SampleSizeProportions”)
propdiff.acc(len=0.05, c1=3, d1=2, c2=2, d2=3)
[1] 2483 2483
In the above equation the length is defined as the posterior credible interval
for the two proportions difference. Now c1 , c2 are the first priors parameter
with Beta density for the first and second population. Similarly d1 , d2 are the
second priors parameter with Beta density for the first and second population.
Level of significant is defined as 0.95. Now the calculated sample size for both
the groups are (n1 , n2 ).
2.7 Illustration
Let there are two experimental arms named as Arm A and Arm B respec
tively. Arm A is exsisting arm and we want to compare the therapeutic effect
of Arm B with reference to Arm A. Now the null hypothesis is required to for
mulate. The null hypothesis should be linked with sucess of Arm A is similar
with Arm B. The success of Arm A is defined as probability of survival of the
patients at the end of 3 years is 38%. Similarly, for Arm B it is assumed as
61%. The prior parameters assumption for beta density on Arm A is defined as
Sample Size Determination 23
library(“BAEssd”)
f1 < norm1KV.2sided(sigma=5,theta0=0,prob=0.5,mu=2,tau=.1)
ss1 < ssd.norm1KV.2sided(alpha=0.25,w=0.5,minn=2,maxn=200)
ss1
Sample Size: 34
Total Average Error: 0.2451311
Acceptable sample size determined!
plot(ss1)
plot(ss1,y=“AE1”,alpha.line=FALSE) abline(h=0.05,lty=2)
24 Bayesian Approaches in Oncology Using R and OpenBUGS
2.9.2 BayesianPower
This package is useful to compute the sample size and power for comparing
inequality constrained hypotheses. The example of calculating the sample size
by “bayes sampsize” function is in the following text.
#Calculating Sample Size by BayesianPower Package
2.9.3 NPHMC
This Sample Size Calculating package is used for the Proportional Hazards
Mixture having cure or without cure Model
#Sample Size Calculation for PH Mixture Cure Model and Standard
PH Model
library(“NPHMC”)
NPHMC(power=0.90,alpha=0.05,accrualtime=3,
followuptime=4,p=0.5,accrualdist=“uniform”,
hazardratio=2/2.5,oddsratio=2.25,pi0=0.1,
survdist=“exp”,k=1,lambda0=0.5)
26 Bayesian Approaches in Oncology Using R and OpenBUGS
#Output on NPHMC
2.9.4 PowerTOST
This package is used for calculating the Power and Sample Size for (Bio)
Equivalence Studies. It can be obtained by two and one-sided t-tests.
library(“PowerTOST”)
power.TOST(CV = 0.25, n = 24)
2.9.5 SampleSize4ClinicalTrials
This pacakge is used to calculating sample size for Means and Proportion
in Phase III Clinical trials. Different ressearch design can be performed by this
package. Research design like: (1) Testing for equality, (2) Superiority trial,
(3) Non-inferiority trial, and (4) Equivalence trial. Now performed sample size
calcuation by comparison of Means and Proportions separately.
#Sample Size Calculation by SampleSize4ClinicalTrials Package
library(“SampleSize4ClinicalTrials”)
Calculation of Means
ssc meancomp(design = 3L, ratio = 1, alpha = 0.05, power = 0.8, sd
= 0.1, theta = 0, delta = -0.05)
Treatment Control
50 50
Calculation of Proportion
ssc propcomp(design = 4L, ratio = 1, alpha = 0.05, power = 0.8, p1
= 0.75, p2 = 0.80, delta = 0.2)
Treatment Control
133 133
Sample Size Determination 27
2.9.6 SSRMST
The sample size calculation by Restricted Mean Survival Time is possible
to compute. It helps to calculate the sample size based on the difference in
Restricted Mean Survival Time.
#SSRMST I
library(“SSRMST”)
ac rate=The Accrual rate is the number of patients
per unit time. ac period=The Accrual period as the time point at
last accrual.
ac number=The Accrual number of accrual patients.
tot time=The Total study time as the time point
at last follow-up.
tau Truncation is the time point to calculate RMSTs.
shape0, shape1 Shape parameters for the Weibull
distribution in
both the control (arm0) and the
treatment (arm1).
ac rate=15
ac period=35
tot time = 510
tau = 500
scale0 = 7000
scale1 = 7000
margin = 10
a=ssrmst(ac rate=ac rate,ac period=
ac period, tot time=tot time,
tau=tau, scale0=scale0, scale1=scale1,
margin=margin, ntest=20)
print(a)
28 Bayesian Approaches in Oncology Using R and OpenBUGS
#Result of SSRMST I
Non-inferiority test
2.9.7 powerSurvEpi
library(“powerSurvEpi”)
ssizeEpi.default(power = 0.80, theta = 2, p = 0.39, psi = 0.505,
rho2 = 0.1322 , alpha = 0.05)
139
library(“powerSurvEpi”)
power=postulated power.
theta=postulated hazard ratio.
p=proportion of subjects taking value one for the covariate of interest.
psi=proportion of subjects died of the disease of interest.
rho2=square of the correlation between the covariate of interest and
the other covariate.
alpha=type I error rate.
2.9.8 SampleSizeMeans
This package is useful to obtain the normal means. Several functions are
available to calculate the sample size by Bayesian approach. For example,
“mu.acc” function is used to obtain the given coverage probability on average
for a posterior credible interval of fixed length for a normal mean.
Sample Size Determination 29
#SampleSizeMeans I
library(“SampleSizeMeans”).
mu.acc(len, alpha, beta, n0, level=0.95).
len is the desired fixed length of the posterior
credible interval for the mean.
alpha is the First parameter of the Gamma prior density
for the precision (reciprocal of the variance).
beta is the second parameter of the Gamma prior density
for the precision (reciprocal of the variance).
n0 is the prior sample size equivalent for the mean.
level is the desired average coverage probability of the
posterior credible interval (e.g.,0.95).
#Calculated Sample Size#
mu.acc(len=0.2, alpha=2, beta=2, n0=50)
[1] 721
Similarly, sample size can be determined by the single normal mean using
the Average Length Criterion. The function is “mu.alc”. If the variance is
known, then sample size is calculated by mixed of Bayesian and likelihood
approach with “mu.mbl.varknown” function.
#SampleSizeMeans II
library(“SampleSizeMeans”)
mu.mbl.varknown(len, lambda, level = 0.95)
len is the desired total length of the posterior.
credible interval for the mean.
lambda is the known precision(reciprocal of variance).
level is the desired coverage probability of the
posterior credible interval (e.g., 0.95).
#Calculated Sample Size#
mu.mbl.varknown(len=0.2, lambda=1/4)
[1] 1537
Study Design-I
Abstract
It is preferred to define suitable statistical models for dose-response mod
eling. It is used as consistent by the relevant biological process with a specific
case. The natural process, cell growth and biomarker changes or covariates
with response variable is required under consideration. Mostly, dose-response
models are curve fitting practice. Curve fitting can be performed by conven
tional or Bayesian approaches. This chapter is dedicated toward the applica
tion of the Bayesian technique in early phase oncology trial. Oncology trial’s
drugs are considered for humans with a small number of participants. The
primary objective of any initial phase clinical trial is to explore the safety of a
drug. Therefore, finalize the effective dose of medicine by looking acceptabil
ity on pharmacokinetics, and pharmacodynamics principle. Phase I studies
are dedicated to safe-effective treatment. Oncology drugs come with a toxic
agent. Phase I trial patients are cancer patients for those standard therapies
failed to respond. Phase I dose-escalation method is explained with Open-
BUGS. Further, the continual reassessment method is detailed by Bayesian
and illustrated with an example. The model-based designs for determining
the maximum tolerated dose is explained. Packages available in R for dose
selection modeling is detailed.
3.1 Introduction
The Bayesian technique is prominent methodology in data science. Data
science is inevitable in clinical research. So Bayesian merged tremendously
in the clinical trials well. Study design, sample size calculation, and statis
tical analysis have emerged with the Bayesian technique. The approach to
analyzing data in Bayesian is different from the classical approach. Recent
computation advancement helped to incorporate Bayesian in clinical trials. It
also helped to upgrade data analysis practice. Oncology research is involved
with time-to-event outcomes (i.e., death). Data analysis with survival analysis
31
32 Bayesian Approaches in Oncology Using R and OpenBUGS
Now di and yi are the dose level and toxicity outcome for patient i, and where
yi = 1 if a DLT is observed and yi = 0 if not. The algorithm to perform the
CRM design is given in the following text.
Study Design-I 35
library(“bcrm”)
p.tox0 < c(0.05, 0.10, 0.20, 0.30)
dose.label < c(5, 10, 15, 25)
bcrm(stop = list(nmax = 28), p.tox0 = p.tox0,
dose = dose.label,ff = “power”,
prior.alpha = list(1, 1, 1), target.tox = 0.30,start = 1)
36 Bayesian Approaches in Oncology Using R and OpenBUGS
R Output
library(“bcrm”)
dose < c(1,2.5,5,10,15,20,25,30,40,50,75,100,150,200,250)
p.tox0 < c(0.010, 0.015, 0.020, 0.025, 0.030, 0.040, 0.050,
0.100, 0.170, 0.300, 0.400, 0.500, 0.650, 0.800, 0.900)
## Data from the first 5 cohorts of 18 patients
data < data.frame(patient=1:18, dose=rep(c(1:4, 7),
c(3, 4, 5, 4, 2)), tox=rep(0:1, c(16, 2)))
## Target toxicity level
target.tox < 0.30
## A 1-parameter power model is used, with standardised
doses calculated using
## the plug-in prior median
## Prior for alpha is lognormal with mean 0 (on log scale)
## and standard deviation 1.34 (on log scale)
Power.LN.bcrm < bcrm(stop=list(nmax=18), data=data,
p.tox0=p.tox0, dose=dose, ff=“power”,
prior.alpha=list(3, 0, 1.342 ),
target.tox=target.tox, constrain=FALSE,
sdose.calculate=“median”, pointest=“mean”)
38 Bayesian Approaches in Oncology Using R and OpenBUGS
plot(Power.LN.bcrm)
Study Design-II
Abstract
It is a difficult task to decide optimal study design for phase II study. Some
times, it becomes more complicated while work with new molecular targeted
agents. as a result, phase III gets failed. It raises the issue that several agents
to be tested with the increasing complexity and cost. Simultaneously, the
study design is expected to be robust and efficient. This chapter is dedicated
to a phase II study design. Different study designs are considered to perform
phase II study. Similarly, different types of outcomes may arise in phase II
study as binary and continuous outcomes. Results observed with binary or
continuous are illustrated. Likewise, the sample size is another issue for the
phase II study design. Sample size determination procedure also elaborated.
Different R packages required to perform phase II study with the Bayesian
approach are explained. Example are presented with prior value, likelihood
and posterior estimates generation. This chapter will help to perform a Phase
II study in oncology domains.
4.1 Introduction
Phase II oncology trials are performed to decide a trial can be conducted
in massive or not. The primary endpoint in oncology trial is tumor size reduc
tion. Similarly, the disease progression both is critical endpoints in Phase-II
clinical trials. Drugs used in oncology are cytostatic. The cytostatic drug de
stroys the tumor cells and increases survival. The effect of the cytostatic drug
is measured by progression-free survival. However, the tumor progression and
tumor shrinkage both are anatomical burdens of the tumor. The tumor size re
duction is decided based on the Response Evaluation Criteria in Solid Tumors
(RECIST). There are clinicians from academia and industry, image special
ist, statistician and another subject expert to as RECIST Working Group.
The RECIST guideline is routinely updated. The target of Cytostatic drugs
is molecularly targeted agents (MTA). The control on MTA helps to prolong
39
40 Bayesian Approaches in Oncology Using R and OpenBUGS
the survival. The solid tumor classified as progressive disease (PD), partial
responses (PR), stable disease (SD) or complete responses (CR).The terms
PD and SD stand with treatment failure. Similarly, treatment success as CR
and PR. Jointly the PR and CR patients are defined as the objective response
rate (ORR). The proportion of patients under disease control are defined as
the disease control rate (DCR). It is better to not bring Phase II patients into
Phase III due to the occurrence of the substantial risk of toxicity or death.
The measurement of toxicity requires to take in Phase II trial design. The
combined inference about treatment success is decided by (I) change in tumor
size, (II) occurrence of new lesions, and (III) death/relapse. The treatment
failure is defined as a dichotomous endpoint ( died or not). But the reduction
in tumor size is measured as continuous (i.e., the amount of tumor shrinkage).
Perhaps, the tumor shrinkage can be classified into a dichotomous category
by predefined threshold reduction. The consolidated endpoint is measured by
continuous and binary components. The challenge is to estimate the differ
ent probability and compare the uncertainty of the estimates by confidence
interval (CI).
4.2 Methods
4.2.1 Estimating treatment success
Suppose that the proportion of tumor shrinkage is assumed as 20%, and
these patients are free from toxicity or death. A total of n patients are allocated
to the treatment under consideration. The measurements are taken as t1i ,t2i
and t3i . Now D1i = 1 if patients “i” fails due to other reason and D2i = 1 if
fails occurs during the two measurement. The log tumor size ratio is defined
as
z1i z2i
(y1i , y2i ) = (log( ), log( ) (4.1)
z0i z0i
Further, the tumor shrinkage model is defined with bivariate distribution
�
(y1i , y2i )T |zi0 ≈ N ((µ1i , µ2i )T , ) (4.2)
where
µ1i = α + γzi0 , µ2i = β + γzi0 (4.3)
If non-shrinkage tumor size is intermediately unmeasured, then it can be
treated as missing at random (MAR). The model can be estimated by the
linear model. The probability of non-shrinkage failure before the interim mea
surement is presented as a probability of Di1 = 1. If non-shrinkage tumor size
is intermediately unmeasured, then it can be treated as missing at random
(MAR). The model can be estimated by the linear model. The probability of
non-shrinkage failure before the interim measurement is presented as a prob
ability of Di1 = 1. Logistics Model is useful for both model
� log(0.7) � ∞
P (Si = 1|z0i , θ) =
−∞ −∞ (4.7)
P (D1i = 0|z0i , θ)P (D2i = 0|D1i = 0, z0i , y1i ; θ)dy1i dy2i
The pdf of the bivaraite distribution is defined as fY1 ,Y2 (y1i , y2i ; θ).
42 Bayesian Approaches in Oncology Using R and OpenBUGS
The ith patients 1st observation is determined as yi1 . The mean response of
ith patients 1st measurement is defined as
Logit(P (Di2 = 1|Di1 = 0, ti , zi0 , zi1 )) = αD2 + βD2 ti + γD2 zi1 (4.14)
Chain 1:
Chain 1: Gradient evaluation took 0 seconds
Chain 1: 1000 transitions using 10 leapfrog steps
per transition would take 0 seconds.
Chain 1: Adjust your expectations accordingly!
Chain 1:
Chain 1:
Chain 1: Iteration: 1 / 2000 [ 0%] (Warmup)
Chain 1: Iteration: 200 / 2000 [ 10%] (Warmup)
Chain 1: Iteration: 400 / 2000 [ 20%] (Warmup)
Chain 1: Iteration: 600 / 2000 [ 30%] (Warmup)
Chain 1: Iteration: 800 / 2000 [ 40%] (Warmup)
Chain 1: Iteration: 1000 / 2000 [ 50%] (Warmup)
Chain 1: Iteration: 1001 / 2000 [ 50%] (Sampling)
Chain 1: Iteration: 1200 / 2000 [ 60%] (Sampling)
Chain 1: Iteration: 1400 / 2000 [ 70%] (Sampling)
Chain 1: Iteration: 1600 / 2000 [ 80%] (Sampling)
Chain 1: Iteration: 1800 / 2000 [ 90%] (Sampling)
Chain 1: Iteration: 2000 / 2000 [100%] (Sampling)
Chain 1:
Chain 1: Elapsed Time: 5.849 seconds (Warm-up)
Chain 1: 4.818 seconds (Sampling)
Chain 1: 10.667 seconds (Total)
Chain 1:
Study Design-II 45
Chain 2:
Chain 2: Gradient evaluation took 0 seconds
Chain 2: 1000 transitions using 10 leapfrog steps
per transition would take 0 seconds.
Chain 2: Adjust your expectations accordingly!
Chain 2:
Chain 2:
Chain 2: Iteration: 1 / 2000 [ 0%] (Warmup)
Chain 2: Iteration: 200 / 2000 [ 10%] (Warmup)
Chain 2: Iteration: 400 / 2000 [ 20%] (Warmup)
Chain 2: Iteration: 600 / 2000 [ 30%] (Warmup)
Chain 2: Iteration: 800 / 2000 [ 40%] (Warmup)
Chain 2: Iteration: 1000 / 2000 [ 50%] (Warmup)
Chain 2: Iteration: 1001 / 2000 [ 50%] (Sampling)
Chain 2: Iteration: 1200 / 2000 [ 60%] (Sampling)
Chain 2: Iteration: 1400 / 2000 [ 70%] (Sampling)
Chain 2: Iteration: 1600 / 2000 [ 80%] (Sampling)
Chain 2: Iteration: 1800 / 2000 [ 90%] (Sampling)
Chain 2: Iteration: 2000 / 2000 [100%] (Sampling)
Chain 2:
Chain 2: Elapsed Time: 5.702 seconds (Warm-up)
Chain 2: 14.104 seconds (Sampling)
Chain 2: 19.806 seconds (Total)
Chain 2:
46 Bayesian Approaches in Oncology Using R and OpenBUGS
4.4 Discussion
The proposed methods work in combination with binary and continuous
outcomes. It is suited well in phase II cancer research. The method compatible
one decision about binary and another response about continuous outcomes.
The sample size determination is a phase II clinical trial is a significant issue.
Conventionally study design is required to calculate the sample size. It can
be specified that any Phase II study planned with 50 sample size can be
accommodated with the pre-specified methodology.
Chapter 5
Abstract
The biomarker is a realistic indicator to explore angiogenesis in cancer.
Recently, Subtoxic doses of chemotherapy are administered and defined as
Metronomic Chemotherapy (MC). Now set the best effective treatment in MC
is a challenging task. This chapter is elaborated about dose fixation by looking
biomarker level. The dose could help to maintain the desired level of the
biomarker as the optimum biological dose (OBD). We presented the Bayesian
algorithm to determine the OBD. The curve fitting procedure is illustrated in
OpenBUGS to determine the OBD. This work is explored by looking at the
performance of one surrogate marker toward influencing the disease outcome.
Disease outcome is defined as time-to-event outcomes. Conventional, time-to
event methodology like Kaplan-Meier estimates are elaborated. Similarly, the
treatment outcome is classified as a dose that is suitable to control the OBD.
In this chapter, theoretical explanation about Toxicity profile testing, Logistic
Response Model and Quadratic Logistic Design are detailed with Bayesian
illustration. The illustration will help to determine the best dose by looking
at the posterior estimates of regression coefficients. Similarly, model suitability
is explained by standard deviation and model selection criteria.
5.1 Introduction
The maximum dose of the effective drug within the tolerable limit is called
the maximum tolerated dose (MTD). Conventionally, it assumed that in
creased dose is effective. The best effective dose concluded by one level below
the tolerable highest dose [3, 4, 5]. Metronomic chemotherapy (MC) emerges
as a therapeutic option in medical oncology [6, 7, 8]. Sometimes, conventional
chemotherapy becomes resistant [8, 9, 10]. Development of chemoresistance
is a common problem [11]. The alternative of high-dose chemotherapy stands
with Metronomic chemotherapy (MC). The MC causes less severe side effects
than standard chemotherapy. The MC is marked as low-dose chemotherapy.
47
48 Bayesian Approaches in Oncology Using R and OpenBUGS
xj ∼ binom(nj , qj ) (5.1)
qj ∼ beta(a, b) (5.2)
g(pj ) = β1 + β2 dj (5.4)
The predefined dose of OBD is determined as [c1 , c2 ] with uniform distribution
� 1
if c1 ≤ s ≤ c2 ;
f (s) = c2 −c1
0 if otherwise.
The cummulative dose is presented as pj as
dj
dj − c1
�
pj = f (s)ds = for c1 ≤ dj ≤ c2 (5.5)
c1 c2 − c1
−c1 1
It becomes β1 = c2 −c1 and β2 = c2 −c1
50 Bayesian Approaches in Oncology Using R and OpenBUGS
exp(β1 + β2 dj )
pj = (5.11)
1 + exp(β1 + β2 dj )
pj
log( ) = β 1 + β2 dj (5.12)
1 − pj
log(1 − pj ) = −log[1 + exp(β1 + β2 dj )] (5.13)
If the patients size is N then the log-likelihood function can be defined as
N
�
l= [yi (β1 + β2 dj ) − ni log[1 + exp(β1 + β2 dj )] + log(nyii ) (5.14)
i=1
δl � exp(β1 + β2 di ) �
U1 = = {yi − ni [ ]} = (yi − ni pj ) (5.15)
δβ1 1 + exp(β1 + β2 di )
δl � exp(β1 + β2 dj )
U2 = = {yi dj − ni dj [ ]} (5.16)
δβ2 1 + exp(β1 + β2 dj )
� �
U2 = (yi − ni pj ) = dj (yi − ni pj ) (5.17)
The information matrix is defined as
� � � �
n p (1 − pj ) � ni dj pj (1 − pj )
T = �� � i j
� �
�.
ni dj pj (1 − pj ) ni d2j pj (1 − pj ) �
Optimum Biological Dose Selection 51
Suppose there are γi patients treated with dj dose from a toal of ηj . The
likelihood function is D = {yj , ηj ; j = 1, ..., J} with
J 2
� eα+βdj +γdj 1
L(D|α, β, γ) ∝ ( α+βdj +γd2j
)γj ( α+βdj +γd2j
)ηi −γi (5.21)
j=1 1+e 1+e
5.7 Illustration
In this illustration, two models are provided. Results are obtained through
posterior mean and standard deviation. A different scenario is prepared, and
those are given in Table 5.3. It provides the scope to consider the doses and
toxicity occurrences. The optimal dose can be obtained from this table. Since
the work is about low-dose chemotherapy, then the appearance of over toxicity
is very minimal.
This work is presented by a simulation study. The posterior probability of
efficacy and safety is presented as E and S, respectively. Suppose the marginal
posterior probability of the associated parameters are defined as (σ2 or τ2 ) for
S and E respectively.
The parameter S and E is linked as Pr(S = 1) = V ∗ Pr(E = 1). Now
V = 1 shows that OBD through a biomarker is achieved. If V > 1, then the
OBD of a biomarker is not achieved. Now the term σ and τ represents the
relation between S and E. if there is no relation between σ1 and τ1 then the
efficacy E could be ignored and the corresponding dose may be ignored. Now
Logistic response model is defined as Model1. Similarly, Quadratic Logistic
design is defined as Model 2.The additive parameters σ and τ for Model 1 is
defined as σ1 and τ1 respectively. Similarly, Model 2 is defined as σ2 and τ2 . In
this illustration a total of 20,000 iterations with 1,000 burn-in is carried out
with different fixed value of σ1 , τ1 , σ2 and τ2 . The simulation is performed to
obtain the convergence. The convergence is obtained by observing the trace
Optimum Biological Dose Selection 55
plots for each parameter of the model. The models are compared by Decision
Information Criteria(DIC) value.
56 Bayesian Approaches in Oncology Using R and OpenBUGS
# OpenBUGS code 1
Model1
model
{
for (i in 1:20)
{
mu.dmf[i] < beta1 + beta2 × fl[i]
dmf[i] ∼ dnorm(mu.dmf[i],tau)
}
beta1 ∼ dnorm(0.0,0.000001)
beta2 ∼ dnorm(0.0,0.000001)
sigma1 ∼ dunif(0,400)
tau1 < 1/(sigma1×sigma1)
for (i in 1:20)
{ # calculate residuals
residual[i] < dmf[i] - mu.dmf[i]
}
pred.mean.1.7 < beta1 + beta2×1.7
pred.ind.1.7 ∼ dnorm(pred.mean.1.7, tau1)
}
58 Bayesian Approaches in Oncology Using R and OpenBUGS
# OpenBUGS code 2
Model2
model {
for (i in 1:20)
{
mu.dmf[i] < alpha + beta×fl[i] + gamma×fl[i]×fl[i]
# regression equation
dmf[i] ∼ dnorm(mu.dmf[i],tau)
# distribution individual values
}
alpha ∼ dnorm(0.0, 0.000001)
beta ∼ dnorm(0.0, 0.000001)
gamma ∼ dnorm(0.0, 0.000001)
sigma2 ∼ dunif(0,200)
tau2 < 1/(sigma2×sigma2)
for (i in 1:20)
{ # calculate residuals
residual[i] < dmf[i]-mu.dmf[i]
}
pred.mean.1.7 < alpha + beta1×1.7 + beta2×1.7×1.7
# mean prediction for fl=1.7
pred.ind.1.7 ∼ dnorm(pred.mean.1.7, tau2)
# individual pred for fl=1.7
}
Optimum Biological Dose Selection 59
# OpenBUGS code 2
dmf[] fl[]
159 15
168 7.5
201 15
206 15
209 15
212 15
230 12.5
265 12.5
270 9
285 7.5
290 7.5
195 9
201 9
236 7.5
191 12.5
225 12.5
232 7.5
265 12.5
396 9
412 9
END
list(dmf=c( 159,168,201,206,209,212,230,265,
270,285,290,195,201,236,191,225,232,265,396,
412),
fl=c(15,7.5,15,15,15,15,12.5,12.5,9,7.5,
7.5,9,9,7.5,12.5,12.5,7.5,12.5,9,9))}
5.9 Discussion
The MC can be classified as next-generation chemotherapy [13]. It is possi
ble to explore the performance of one surrogate marker toward influencing the
disease outcome. Similarly, the treatment outcome can be classified as a dose
best to control the OBD. Perhaps, it is difficult to establish the biomarker or
surrogate marker.
60 Bayesian Approaches in Oncology Using R and OpenBUGS
install.packages(“DoseFinding”)
library(“DoseFinding”)
doses < c(0, 10, 25, 50, 100, 150)
fmodels < Mods(linear = NULL, emax = 25,
logistic = c(50, 10.88111), exponential = 85,
betaMod = rbind(c(0.33, 2.31), c(1.39, 1.39)),
linInt = rbind(c(0, 1, 1, 1, 1),
c(0, 0, 1, 1, 0.8)),
doses=doses, placEff = 0, maxEff = 0.4,
addArgs=list(scal=200))
## Calculate doses giving an improvement of 0.3 over placebo
TD(fmodels, Delta=0.3)
## discrete version
TD(fmodels, Delta=0.3, TDtype =“discrete”, doses=doses)
## doses giving 50% of the maximum effect
ED(fmodels, p=0.5)
ED(fmodels, p=0.5, EDtype = “discrete”, doses=doses)
plot(fmodels, plotTD = TRUE, Delta = 0.3)
Optimum Biological Dose Selection 61
Bayesian in Time-to-Event
Data Analysis
63
Chapter 6
Survival Analysis
Abstract
The survival analysis is widely adopted statistical analysis in oncology.
Commonly, the Kaplan-Meier estimate and Cox proportional hazard models
are performed in survival analysis. The Kaplan-Meier showed by assuming
the censoring mechanism is non-informative. However, it gets violated some
times. The informative censoring procedure using a Bayesian framework is
suitable to overcome this violation. This chapter is dedicated to Bayesian sur
vival analysis. Real-life data analysis is illustrated with OpenBUGS software.
Similarly, survival analysis with R by Kaplan-Meier method, Cox PH model,
Schoenfeld Residuals and other tools are theoretically explained.
6.1 Introduction
The objective of the survival analysis is to estimate and interpret hazard
functions from survival data. Secondly, compare the hazard functions. Thirdly,
explore the relationship of explanatory variables to survival time. Hazard func
tion created from several groups.
Survival analysis comes with an analytical problem as censoring. Censoring
comes while we have some information about individual survival time, but we
do not have actual survival time.
The censoring data occurred if the event may not occur before the study end,
the person is lost to follow-up during the study ends, and a person withdraws
from the study due to other causes of death.
If the real survival time is equal to or greater than the observed survival time
then it is defined as Right censoring.
Similarly, if the real survival time is less than or equal to the observed survival
time, then it is defined as Left censoring.
Conventionally, the survival analysis is performed with hazard function. The
hazard function h(t) is defined as the instantaneous potential per unit time
for the event to occur, given that the individual has survived up to time t.
65
66 Bayesian Approaches in Oncology Using R and OpenBUGS
library(“survival”)
data(lung)
KM.result<-survfit(Surv(time, status) ∼ 1, data=lung)
KM.result
Kaplan-Meier Plot R
library(“survival”)
library(“ranger”)
library(“ggplot2”)
library(“dplyr”)
library(“ggfortify”)
autoplot(“KM.result”)
Survival Analysis 67
library(“mice”)
library(“MASS”)
lung$status < 2
ch < nelsonaalen(lung, time, status)
plot(x = lung$time y = ch,ylab=‘Cumulative hazard’, xlab=‘Time’)
= X β
= Xj β (6.5)
hj (t) h0 (t)e j e
The term
hi (t)
hj (t)
is constant over time and the model is known as the proportional hazards
model.
The log partial likelihood with respect to β gives the p × 1 score vector,
U (β):
n �
� ∞
U (β) = [Xi (s) − x̄(β, s)]dNi (s) (6.8)
i=1 0
and �
Yi (s)ri (S)Xi (s)
x̄(β, s) =
� (6.9)
Yi (s)ri (s)
with Yi (s)ri (s) as the weights. The maximum partial likelihood estimator is
found by solving the partial likelihood equation:
U (β̂) = 0 (6.10)
Now β̂ is consistent and asymptotically normally distributed with mean β
with variance {EI(β)}−1
Cox PH using R
library(“survival”)
data(lung)
a<-coxph(Surv(time, status) ∼ ph.ecog,data=lung)
summary(a)
70 Bayesian Approaches in Oncology Using R and OpenBUGS
#R Output
library(“survival”)
data(lung)
b<-coxph(Surv(time, status) ∼ sex + strata(ph.ecog),data=lung)
summary(b)
Survival Analysis 71
#R Output
Any survival function S(t; x) plotted on the complementary log-log scale, dif
�
fers from InΔ0 by the constant amount β x throughout its duration.
Hence there are two functions. S(t; x1 ) and S(t; x2 ) for different values of the
This gives the basic means of inspecting the PH assumption. Reliable esti
library(“survival”)
data(lung)
res.cox<-coxph(Surv(time, status) ∼ sex,data=lung)
res.cox
test.ph < cox.zph(res.cox)
test.ph
#R Output
chisq df p
sex 2.86 1 0.091
GLOBAL 2.86 1 0.091
(or been censored) so were at risk of failure at this time. The conditional
probability that i failed, given that there was a failure at tl is given by
�
hi (ti ) eβ x
pi = � =� (6.17)
β � xj
j�Ri hj (ti ) j�Ri e
library(“survival”)
fit < coxph(Surv(time,status) ∼ sex+factor(ph.ecog)+age,
data=lung,x=TRUE)
plot(cox.zph(fit))
library(“survival”)
library(“BMA”)
data(veteran)
veteran[1:10,]
test.bic.surv< bic.surv(Surv(time,status) ∼ ., data = veteran,
factor.type = TRUE)
summary(test.bic.surv, conditional=FALSE, digits=2)
plot(test.bic.surv)
imageplot.bma(test.bic.surv)
74 Bayesian Approaches in Oncology Using R and OpenBUGS
#R Output
6.3.1 BayesSurvival
PlotBayesSurv to Create Posterior Mean with Credible Band for the
Survival Function or Cumulative Hazard.
library(“BayesSurvival”)
library(“simsurv”)
library(“ggplot2”)
hazard.true < function(t,x, betas, ...)
1.2×(5×(t+0.05)3 -10 × (t+0.05)2 +5×(t+0.05))+0.7
sim.df < data.frame(id = 1:1000)
df < simsurv(x = sim.df, maxt = 1, hazard = hazard.true)
bs <- BayesSurv(df, “eventtime”, “status”)
PlotBayesSurv(bs, object = “survival”)
cumhaz.plot + labs(title = “Cumulative hazard”)
6.3.2 muhaz
library(“muhaz”)
library(“survival”)
data(lung)
attach(lung)
fit < pehaz(time,status)
plot(fit)
fit < muhaz(time,status)
summary(fit)
plot(fit)
Now q[i] represent the probability of recurrence and R[i] as the number at risk
at the beginning of the ith ordered time. A beta prior for the q[i] is suitable
to estimate as
q[i] ∼ beta(0.01, 0.01) (6.22)
It used to estimate the posterior distribution for the recurrence probability.
Now the recurrence probability is estimated as
and
qc[i] ∼ beta(0.01, 0.01) (6.26)
Now distribution assumptions about death and censoring jointly specify the
number at risk. Now the task is to generate the posterior distribution of S(t(i) )
as the conditional probabilities
model;
{
for (i in 1:m1){ d1[i] ∼ dbin(q1[i],r1[i])}
for (i in 1:m1){ ce1[i] ∼ dbin(qc1[i],r1[i])}
for (i in 1:m1){ q1[i] ∼ dbeta (.01,.01)}
for (i in 1:m1){ qc1[i] ∼ dbeta (.01,.01)}
for(i in 1:m1){ p[i]<-(r[i]-d[i])/r[i]}
for (i in 1:m1){ p1[i]<-1-q1[i]}
r1[1]< 35
for(i in 2:m1){ r1[i]<-r1[i-1]-d1[i-1]-ce1[i-1]}
for (i in 2:m1){ s1[i]<-s1[i-1]× p1[i]}
s1[1]<-p1[1]
for (i in 1:m2){ d2[i]∼ dbin(q2[i],r2[i])}
for (i in 1:m2){ ce2[i]∼ dbin(qc2[i],r2[i])}
for (i in 1:m2){ q2[i]∼ dbeta (.01,.01)}
for (i in 1:m2){ qc2[i]∼ dbeta (.01,.01)}
for(i in 1:m2){ p2[i]<-(r2[i]-d2[i])/r2[i]}
for (i in 1:m2){ p2[i]<-1-q2[i]}
r2[1]< 19
for(i in 2:m2){ r2[i]<-r2[i-1]-d2[i-1]-ce2[i-1]}
for (i in 2:m2){ s2[i]<-s2[i-1] × p2[i]}
s2[1]<-p2[1]
}
#call the data #
list(m1 = 9,d1 = c(3,2,1,1,2,1,1,1,1),ce1 = c(1,1,2,0,3,0,5,6,4),
m2 = 10,d2 = c(2,2,1,2,2,4,2,2,1,1),
ce2 = c(0,0,0,0,0,0,0,0,0,0))
#Define the initial data#
list(q1 = c(.5,.5,.5,.5,.5,.5,.5,.5,.5),
q2 = c(.5,.5,.5,.5,.5,.5,.5,.5,.5,.5))
Now the same covariate X can be separated as X ∗ and X. The term X ∗ shows
the one category of a variable X. Similarly, X defined another category. The
HR is defined as
i=p
�
HR = exp( βi (Xi∗ − Xi )) (6.36)
i=1
Now the Bayesian specify the prior distribution of β. It used to obtain the
hazard ratio. If X = 0 and X ∗ = 1 then hazard ratio formed as
TABLE 6.3: Posterior estimates Hazard Ratio for Arm=2 with respect to
Arm=1
Parameter Mean SD val2.5pc median val97.5pc
HR 4.763 2.645 1.375 4.022 11.52
Survival Analysis 81
model
{
for(i in 1:N) {
for(j in 1:T) {
Y[i,j] < step(obs.t[i]-t[j]+eps)
dN[i, j] < Y[i, j] × step(t[j + 1]-obs.t[i]-eps) × fail[i]
}}
for(j in 1:T) {
for(i in 1:N) {
dN[i, j] ∼ dpois(Idt[i, j])
Idt[i, j] < Y[i, j] × exp(beta × x1[i]) × dL0[j]
}
dL0[j] ∼ dgamma(mu[j], c)
mu[j] < dL0.star[j] × c
S.treat[j] < pow(exp(-sum(dL0[1 : j]))
exp(beta × -0.5))
S.placebo[j] < pow(exp(-sum(dL0[1 : j]))
exp(beta × 0.5))
}
c < 0.001
r < 0.1
for (j in 1 : T) {
dL0.star[j] < r × (t[j + 1]-t[j])
}
beta ∼ dnorm(0.0,0.000001)
HR<-exp(beta)
}
list(N = 44,T = 16,eps=1.0E-20,obs.t = c(1, 1, 2, 2, 3, 4, 4, 5, 5,
8, 8, 8, 8, 11,11,12,12,12,15,15,17,22,23,6,6, 6, 6, 7, 9, 10,
10,11,13,16,17,19,20,22,23,25,32,32,34,35),
fail=c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
1, 1, 1, 1, 1, 1,1,0, 1, 0, 1, 0, 0, 1, 1, 0,
0, 0, 1, 1, 0, 0, 0, 0, 0),
x1 = c(0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5,
0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5,0.5,0.5, 0.5, 0.5,
0.5, -0.5,-0.5, -0.5,-0.5, -0.5,-0.5,-0.5, -0.5,-0.5,-0.5,
-0.5,-0.5,-0.5,-0.5,-0.5,-0.5,-0.5,-0.5, -0.5, -0.5, -0.5),
t = c(1, 2, 3, 4, 5, 6, 7, 8, 10, 11, 12, 13, 15, 16,
17, 22, 23))
list(beta = 0.0,dL0 = c(1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,
1.0,1.0,1.0,1.0,1.0,1.0,1.0, 1.0))
82 Bayesian Approaches in Oncology Using R and OpenBUGS
Abstract
Survival analysis data comes with multiple and mutually exclusive events.
The application of competing risk modeling helps to account different causes
of death. There are several techniques to work with competing risk model
ing to prioritize the specific cause. Several techniques are available to work
with the cause-specific framework, mixture models, sub-distribution hazard,
and method of pseudo-observations. This chapter is dedicated toward differ
ent methodology to handle cause-specific model. Details review of competing
risk model is available. Theory about competing Risk as Bivariate Random
Variable, Cumulative Incidence Rate, and Cause-specific hazard model are
discussed. The illustration to perform competing risk model with R and Open-
BUGS are elaborated. This chapter will be helpful to perform a competing
risk model in oncology or time-to-event data.
7.1 Introduction
The occurrence of Competing risk is prevalent in oncology. The reason
for death for the treated cancer patients may be different, and it is defined as
‘Competing Risk’. Now, the reason of death may be a local recurrence, distant
metastases, occurrence of a secondary tumor. There are different strategies to
deal with competing risks with survival analysis. The conventional survival
suggests pursuing with cause-specific death modeling. It occurs when there
are at least two possible ways that a person can fail, but only one such fail
ure type can occur. The situation in which an individual can experience more
than one type of event [14, 15].The failure to achieve independence between
the time to an event and the censoring mechanism [16, 17].The concept of
competing risks as the situation where one type of event ’either precludes the
occurrence of another event under investigation or fundamentally alters the
probability of occurrence of this other event [18].
Suppose, T is a continuous positive random variable provides survival time.
87
88 Bayesian Approaches in Oncology Using R and OpenBUGS
The probability density function (pdf) is defined as f (t). The cumulative dis
tribution function (cdf) is defined as
Now the survival function of the probability of being alive at t. The survival
function of the probability of being alive at t is presented as.
Fi (t) = P (T ≤ t, C = i) (7.6)
The probability that an event i does not occur by time t can be denoted as
When the competing risks are not present the overall distribution function
spans the interval [0, 1].
library(“cmprsk”)
library(“survival”)
os < rexp(100)
death<-sample(1:2,100,replace=TRUE)
arm< factor(sample(1:2,100,replace=TRUE), 1:2,c(‘arm1’,‘arm2’))
competigndata<-cuminc(os,death,arm)
plot(competigndata,lty=1,color=1:4)
fit=cuminc(os,death)
fit
90 Bayesian Approaches in Oncology Using R and OpenBUGS
#R Output
$var
1 2 3 4 5
1 1 0.002135 0.0024856 0.0025274 0.0025562 0.0026351
1 2 0.002282 0.0025567 0.0025940 0.0025940 0.0026136
timepoints(fit,times=c(1,2,3,4))
Competing Risk Data Analysis 91
#R Output
$est
1 2 3 4
1 1 0.30 0.41 0.43 0.44
1 2 0.34 0.47 0.52 0.52
$var
1 2 3 4
1 1 0.00213530 0.002485680 0.002527465 0.002556282
1 2 0.00228206 0.002556745 0.002594095 0.002594095
It can be obtained as
K
�
E(t) = Sk (t) (7.13)
k=1
92 Bayesian Approaches in Oncology Using R and OpenBUGS
If Sk (t) stands with unadjusted survival function. let the causes are k. It is
defined as � t
� �
Sk (t) = exp(− λk (t )dt ) (7.14)
t� =0
It can be obtained as
0 ≤ Sk (t) ≤ 1, ∀k = 1, ..., K, t ≥ 0 (7.15)
It is obtained as Sk (0) = 1. The one-dimensional integration can be obtained
as
� t K
� � � �
Fk (t) = ( Sk (t ))λk (t )dt (7.16)
t� =0
k� =1
It can be reformed as Weibull survival model as
λk (t) = αk γk tαk −1 (7.17)
Posterior probability represents ith model when xn are observed. The model is
selected from r models. The process is to consider the model that comes with
the highest probability. The model maximize the numerator pi (xn )P (Mi ) is
defined as best model.
It is assumed that the prior probability P (Mi ) are similar in all the models.
It maximizes the marginal likelihood through pi (xn ) among the data selected.
P (M1 |xn ) p1 (xn ) P (M1 )
= (7.22)
P (M2 |xn ) p2 (xn ) P (M2 )
The Bayes factor is defined as
p1 (xn ) f1 (xn |θ1 )π1 (θ1 )dθ1
B12 = = (7.23)
p2 (xn ) f1 (xn |θ1 )π2 (θ2 )dθ2
data(“mgus2”)
head(mgus2)
etime < with(mgus2, ifelse(pstat==0, futime, ptime))
event < with(mgus2, ifelse(pstat==0, 2× death, 1))
event < factor(event, 0:2, labels=c(‘censor’, ‘pcm’, ‘death’))
pdata < finegray(Surv(etime, event) ∼ ., data=mgus2)
fgfit < coxph(Surv(fgstart, fgstop, fgstatus) ∼ age,weight=fgwt,
data=pdata)
summary(fgfit)
#R Output
Call:
coxph(formula = Surv(fgstart, fgstop, fgstatus) ~ age,
data = pdata,weights = fgwt)
Illustration Using R
library(“survival”)
library(“BMA”)
os < rexp(100)
death<-sample(1:2,100,replace=TRUE)
arm<-rbinom(100, 1,.5)
age<-rnorm(100,50,10)
diagtime<-rnorm(100,5,1)
cdata<-data.frame(os,death,arm,age,diagtime)
test.bic.surv< bic.surv(Surv(os,death) ∼., data =cdata)
summary(test.bic.surv, conditional=FALSE, digits=2)
imageplot.bma(test.bic.surv)
#R Output
Call:
bic.surv.formula(f = Surv(os, death) ~ .,
data = cdata)
nVar 0 1
BIC 0.000 1.179
post prob 0.484 0.268
nVar 1 1 2
BIC 3.541 3.808 4.290
post prob 0.082 0.072 0.057
Competing Risk Data Analysis 95
Abstract
Disease progression by recurrence, metastases are common in cancer. How
ever, disease progression is dependent on individual-level effect. Several un
known factors are contributing to disease progression. Survival analysis of
disease progression is useful. It makes it essential to accommodate several
contributing factors on disease progression by frailty model. Frailty model is
a branch of survival analysis.
This chapter is dedicated to frailty modelling. There are several techniques
to work with frailty data to make survival analysis robust. Different method
ology to handle frailty data analysis is presented. Data analysis with frailty
model is prepared with R. Further, Bayesian inference of frailty modelling
is presented. There are several techniques to work with frailty data to make
survival analysis robust. This chapter is dedicated to different methodology
to handle frailty data analysis. Details review of frailty data analysis is avail
able. Theory about frailty data analysis is discussed. Example to handle frailty
data analysis is illustrated with R. The Bayesian counterpart of Frailty data
analysis also explained and presented by examples.
8.1 Introduction
A cancer patient may be observed with several events after or during
the first treatment. For example, the patient opted for the head, and cancer
may appear with disease recurrence or metastases disease. Cancer patients
may experience recurrent even in the superficial tumor. The entire chance of
recurrence is dependent on patient-level influence or can be defined as the
individual-level effect. There could be several unknown factors that are con
tributing to disease appearance. The data handling technique on the recurrent
event is conventionally known as survival analysis. However, the inclusion of
several factors contribution at the individual level presents the extension of
conventional survival model as a frailty model. One patient is defined as frail
97
98 Bayesian Approaches in Oncology Using R and OpenBUGS
effect model by lognormal frailty [28]. There are different multivariate forma
tion by frailty data distribution [29]. The relation between baseline to time
to event plays a crucial role. Different distribution formation is required to
approach different frailty distribution [30]. Generally, the gamma distribution
is a widely accepted choice for frailty distribution. The widely used R package
is survival. Function available coxph() in survival package is useful to handle
frailties as a semi-parametric model. Similarly, frailtypack package [31] pro
vides the gamma frailty models as a semi-parametric estimation. The Weibull
is the parametric choice. It can also be used in frailtypack package. The coxme
[32] and phmm [33] functions also work in a similar direction with lognormal
frailty model. But a large choice of parametric distributions is available in
parfm [34], an R package. The gamma, inverse Gaussian, lognormal, Weibull,
Gompertz, log-logistic are available in parfm. The parametric distributional
estimation is performed by marginal log-likelihood maximization. Frailty dis
tribution data is dependent with individual level variation.
travel to get treatment among head and neck cancer patients are established
by Cox shared frailty model [37]. The overall survival(OS) was evaluated.
#Example1
# Output1
library(“parfm”)
head(kidney)
kidney$ sex < kidney$sex - 1
mod < parfm(Surv(time, status) ∼ sex + age, cluster=‘id’,
data=kidney, dist=‘exponential’, frailty=‘gamma’)
mod
ci.parfm(mod, level=0.05)[‘sex’,]
u < predict(mod)
plot(u, sort=‘i’)
ESTIMATE SE p-val
theta 0.301 0.156
lambda 0.025 0.014
sex -1.485 0.396 <.001 ***
age 0.005 0.011 0.657
---
Signif.codes:0 '***' 0.001 '**' 0.01'*' 0.05 '.' 0.1 ' '1
low up
0.104 0.492
102 Bayesian Approaches in Oncology Using R and OpenBUGS
Frailty
Baseline gamma invGau posSta
exponential.........ok......ok......ok....
Weibull.............ok......ok......ok....
Gompertz............ok......nc......ok....
loglogistic.........ok......ok......ok....
lognormal...........ok......ok......ok....
or equivalently,
yij = log(tij ) = xTij β + vi + �ij (8.8)
Now xij = (1, xTij )T provides the intercept value. The corresponding vector is
formulated as β = (β0 , β T )T . The heteroscedastic error can be estimated as
�ij . It is independent of vi , and P (eβ0 +�ij > t|zij ) = S0,zij (t). It is defined as
β ∼ Np+1 (m0 , S0 )
library(“coda”)
library(“survival”)
library(“spBayesSurv”)
library(“fields”)
library(“BayesX”)
library(“R2BayesX”)
library(“spBayesSurv”)
106 Bayesian Approaches in Oncology Using R and OpenBUGS
data(“LeukSurv”)
d <- LeukSurv[order(LeukSurv$district), ] head(d)
Thereafter district are sorted from the LeukSurv dataset. Finally, the
adjacency matrix E is obtained as Next task is to assign a function to
run the MCMC algorithm. A total of 5,000 burn-in scans are planned with
nburn=5000. A final chain of 2,000 is fixed as nsave=2000. The thinning in
terval to run the MCMC is defined by nskip=4. A total of scans to be saved
is defined as nsave. The predefined number of scans to be displayed on the
screen is called by ndisplay.
Frailty Data Analysis 107
set.seed(1)
mcmc <- list(nburn = 5000, nsave = 2000, nskip = 4, ndisplay = 1000)
The posterior estimates for age, sex, WBC and tpi are obtained as 8.58,
−0.05, −0.26, −0.00 and −0.06 respectively. The estimates obtained through
Bayesian is presented with posterior estimates.
108 Bayesian Approaches in Oncology Using R and OpenBUGS
#R Output
traceplot(mcmc(res1beta[1, ]))
Abstract
Relative survival is helpful to understand the more deaths due to cancer
in comparison to the general population. It helps to determine the deaths due
to cancer by eliminating other causes of mortality. However, it requires to
define the causes of death. Even cancer patients may die due to other causes.
The cancer registry data always influenced by other causes of mortality. It is
required to define the causes of death rigorously. The application of relative
survival is minimal only on cancer have low survival rates. Now the relative
survival analysis by Bayesian counterpart is presented in the chapter. Cancer
clinical trial data example is shown in this chapter. The definition of piece
wise hazard function is presented. Further, the step to compute the piecewise
hazard value is performed. This chapter will help to provide information that
is difficult to address by conventional analysis. The computation of relative
survival is presented by OpenBUG software, as a Bayesian counterpart. The
methodology of relative survival by the multiplicative and additive model is
presented.
9.1 Introduction
Survival analysis is the only tool to understand cancer disease severity. It
is very much required to understand different progress in cancer care by apply
ing the survival analysis. Now the cancer survival can be presented in different
forms. Now the net cancer survival separates the impact of a cancer diagnosis
on different survival, and it is useful statistics to explain the cancer prognosis.
Net cancer survival defines the probability of surviving a cancer diagnosis in
the absence of competing causes of death. Net survival is most often used by
two technique, i.e., competing for risk and relative survival. Now, in this chap
ter, we will show the work with relative survival. The application of relative
survival becomes useful in the National Cancer Institute’s Surveillance, Epi
demiology and End Results (SEER) the program. Relative survival estimates
the percent of persons surviving by death adjusted for expected deaths by life
table data. Now relative survival estimates the life expectancies for the US
111
112 Bayesian Approaches in Oncology Using R and OpenBUGS
This ratio helps to make a comparison of patients from the general popula
tion. Survival of the patients becomes very poor if the ratio is lesser than one.
It is not obvious that the survival curve will be monotonously increasing [45].
The ratio can be used to compare the survival measures between two cohorts
adjusted by different demographic variable. It is assumed that the overall haz
ard of each cancer patients can be defined as λOi . It can be split into two parts
as a hazard due to cancer(λEi ) and hazard present in the general population
(λP i ). There is a gap between methodological development and practical ap
plication. Some methodological development is required on ad-hoc changes in
the existing function. The relative survival analysis can be performed in R.
The “resurv” is suitable enough to perform the different relative survival anal
ysis. This function is useful toward importing the population tables through
regression analysis. Futher, the individual level survival can be defined as
� t �t
exp{− 0 λOi (u)du}
SEi (t) = exp{− λEi (u)du} = �t (9.2)
0 exp{− 0 λPi (u)du}
Suppose a cohort size is N . The marginal relative survival ratio can be defined
as
N
1 �
SE (t) = SE (t) (9.3)
N i=1 i
Now the net survival is defined as
#Definition of Net Survival
1
�N 1
�N
N i=1 SOi (t) N i=1 SOi (t)
SR (t) = 1
�N ; SE (t) = 1
�N (9.4)
N i=1 SPi (t) N i=1 SPi (t)
114 Bayesian Approaches in Oncology Using R and OpenBUGS
FIGURE 9.1: Survival comparison between EGFR postive and EGFR neg
ative.
Relative Survival Analysis 115
FIGURE 9.2: Cumulative survial curve between EGFR postive and EGFR
negative.
library(“survival”)
haz < survexp(OS∼1,data=mydata, rmap = list(year=year,
age=age1), method=‘individual.h’)
FIGURE 9.3: Survial comparison between EGFR postive and EGFR nega
tive.
library(“survival”)
haz < survexp(OS∼1, data=mydata,
rmap = list(year=year, age=age1),method=“individual.h”)
glm(mydata$death∼1 + offset(log(haz)), family=poisson)
glm(mydata$death ∼mydata$EGFR + offset(log(haz)),
family=poisson)
pfit<-coxph(Surv(OS,death>0) EGFR+Stage+Diabetes+
metapresent,data=mydata)
plot(survfit(Surv(OS,death>0)∼EGFR, data=mydata),
ylab=“Survival probability”,xlab=“Survival in Days”,
main=“Relative Surival of having EGFR positive”,
mark.time=FALSE)
lines(survexp(∼EGFR, ratetable=pfit, data=mydata),
col=‘purple’)
Call:glm(formula = mydata$death~1+offset(log(haz)),
family = poisson)
Coefficients:
(Intercept)
3.474
Call:glm(formula=mydata$death~mydata$EGFR+offset(log(haz)),
family = poisson)
Coefficients:
(Intercept) mydata$EGFR
3.49530 -0.04192
(X(τj ) − X(τj−1 ))
τj = �n (9.9)
(T
i=1 i Λτ j − Ti Λτj−1 )I(T > τj−1 )
FIGURE 9.4: Piecewise hazard rate estimates comparison for different sur
vival duration.
library(“flexrsurv”)
library(“relsurv”)
data(rdata)
data(slopop)
rdata$iage<-findInterval(rdata$age ×365.24+rdata$time,
attr(slopop,“cutpoints”)[[1]])
data$iyear<-findInterval(rdata$year+rdata$time,
attr(slopop, “cutpoints”)[[2]])
therate < rep(-1, dim(rdata)[1])
for( i in 1:dim(rdata)[1]){
therate[i] < slopop[rdataiage[i], rdataiyear[i],
rdata$sex[i]] } rdata$slorate < therate
rdata$sex01 < rdata$sex -1
fit < flexrsurv(Surv(time,cens) ∼ sex01+
NLL(age, Knots=60, Degree=3,
Boundary.knots = c(24, 95)),
rate=slorate, data=rdata,
knots.Bh=1850, # one interior knot at 5 years
degree.Bh=3,
Max T=5400,
Spline =“b-spline”,
initbyglm=TRUE,
initbands=seq(0, 5400, 100),
int meth=“BANDS”,
bands=seq(0, 5400, 50) )
summary(fit)
122 Bayesian Approaches in Oncology Using R and OpenBUGS
FIGURE 9.5: Different age wise cancer diagnosis, prostate cancer case dis
tribution.
Coefficients:
Estimate Std. Error z value Pr(>|z|)
Baseline hazard:1 -9.3669 1.9478 -4.809 1.52e-06 ***
Baseline hazard:2 -10.8062 1.9569 -5.522 3.35e-08 ***
Baseline hazard:3 -10.6876 2.1408 -4.992 5.96e-07 ***
Baseline hazard:4 -10.7724 2.2308 -4.829 1.37e-06 ***
Baseline hazard:5 -10.5813 2.6117 -4.051 5.09e-05 ***
sex01 0.6852 0.1858 3.687 0.000227 ***
NLL(age):1 -0.6171 2.7501 -0.224 0.822446
NLL(age):2 1.6521 1.6531 0.999 0.317608
NLL(age):3 0.9809 2.5093 0.391 0.695865
NLL(age):4 2.3640 2.4392 0.969 0.332446
—
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Log-likelihood: -4781.327
Relative Survival Analysis 123
9.8.2 relsurv
The relsurv package is useful to analyzing the relative survival data. It in
cludes relative survival ratio, crude mortality, multiplicative regression model.
#R Package “Relsurv” for Relative Survival
library(“relsurv”)
data(slopop)
data(rdata)
rs.surv(Surv(time,cens)∼sex,rmap=list(age=age×365.241),
ratetable=slopop,data=rdata)
d < rs.surv.rsadd(fit,newdata=data.frame(sex=1,
age=65,year=as.date(“1Jul1982”))
fit < rsadd(Surv(time,cens) sex+age+year,
rmap=list(age=age×365.241),
ratetable=slopop,data=rdata,int=c(0:10,15))
plot(d,xscale=365.241)
9.8.3 survexp.fr
This package is useful to obtain the Relative survival. Function
“survexp plot” avalible in “survexp.fr” is helpful to obtain the relative survival
plot. The illustration given below.
#R Package “Survexp.fr” is Useful for Relative Survival
library(“survexp.fr”)
attach(data.example)
survexp plot(futime, status, age, sex, entry date)
9.9 Discussion
Relative survival analysis can help us to provide information that is difficult
to address by conventional analysis. It is really useful to adopt a conventional
analysis of a better therapeutic effect comparison. We illustrate the application
through OpenBUG software. It is performed as a Bayesian counterpart. The
methodology of relative survival by the multiplicative and additive model is
well-elaborated [49]. The application of relative survival at an individual level
is also documented [50]. The application of relative survival is very limited
only on cancer have low survival rates. A good explanation about relative and
net survival is well-documented [45]. Cause-specific survival can be used to
ward median duration survival estimates. However, it is applicable in relative
survival. The hazard function estimates are developed through predictive scor
ing. The treatment effect can be compared through different age interval by
piecewise hazard. It is also applicable in health policy implementation. This
chapter provides the importance of relative survival, and piecewise estimates
of hazard function are feasible to use to develop prediction score as well. It
will provide us with another dimension about the establishment of therapeutic
effect. It may be important toward health policy decision. This work will help
to understand the hazard function at a different time point interval.
Part III
Bayesian in Longitudinal
Data Analysis
129
Chapter 10
Abstract
To describe different types of statistical tool and analytical approach for
longitudinal data analysis in oncology. A different aspect of longitudinal study
design is presented. The Bayesian counterpart to perform longitudinal data
analysis in oncology is furnished. The Bayesian estimation of a balanced lon
gitudinal model with ARMA(1) the structure is provided. Another covariance
structure like compound symmetry is explained with R packages. Finally, A
study is presented to compare the Quality of Life in palliative patients treated
with two chemotherapeutic arms. It provides the best arm to maintain bet
ter QoL among treated patients. The observations are taken repeatedly for
each patient during treatment. The application of the mixed effect model is
implemented.
10.1 Introduction
In oncology, conventionally the endpoints are considered as a clinical out
come as clinical efficacy. Conventionally, clinical efficacy is defined by overall
survival (OS). Similarly, biological efficacy also considered through disease-free
survival (DFS). The OS is the gold standard in any oncology trial. Perhaps, it
is difficult to consider for any clinical benefits. The therapeutic effect becomes
more attractive if it can provide a better quality of life (QoL). The QoL always
measured in a repeated manner through patients experience. The QoL mea
sured as a standard questionnaire. After that, the comparison of QoL between
the treatment arm provides the best effective treatment. The QoL measures
repeated outcomes over time. In a conventional randomized controlled trial,
the baseline QoL assessed and followed up longitudinally. In repeated QoL,
the same individual subjects observed, and observations are correlated. The
traditional statistical approach for analyzing such data is not suitable enough.
In this context, this chapter dedicated to mixed effect modeling to Analysis
of QoL data. The Linear mixed effect is a method to deal with repeatedly
131
132 Bayesian Approaches in Oncology Using R and OpenBUGS
repeated measurements are defined as yij . The ith individuals jth time point
measurement is defined as yij . Now if the maximum number of visits are de
fined as m then j = 1, ..., mi . Similarly, a total of n sample data is defined as
i = 1, ...., n. The mean parameter linked with each measurement is µij . Now
the term µij changes from different situations. For example, the presence of
two time period (j = 1, 2) says the estimation of µi1 , µi2 will vary. The simple
relation can be obtained as µij = αµi,j−1 .
The model can be settled as
10.6.1 bayeslongitudinal
This function is used to obtain the longitudinal regression model by
Bayesian methodology. There are two functions, i.e., mhsc and mharma11.
The mharma11 is used for autoregressive modeling and mhsc for compound
symmetry structure. Both the function is illustrated below:
Longitudinal Data Analysis 135
library(“bayeslongitudinal”)
attach(Dental)
Y=as.vector(distance)
X=as.matrix(cbind(1,age))
mharma11(Y,X,27,4,c(1,1),0.5,0.5,1,1,1,1,500,50)
mharma11(Data, Matriz, individuos, tiempos, betai, rhoi, gammai,
beta1i, beta2i, beta1j, beta2j, iteraciones, burn).
Data A=vector with the observations of the response variable.
Matriz=The model design matrix.
individuos=A numerical value indicating the number of individuals
in the study.
tiempos=A numerical value indicating the number of times observa
tions were repeated.
betai=A vector with the initial values of the vector of regressors.
rhoi= A numerical value with the initial value of the correlation for
rho.
gammai= A numerical value with the initial value of the correlation
for phi.
beta1i= A numerical value with the shape parameter of a beta apriori
distribution of rho.
beta2i= A numerical value with the scaling parameter of a beta
apriori distribution of
rho.
beta1j= A numerical value with the shape parameter of a beta apriori
distribution of phi.
beta2j= A numerical value with the scaling parameter of a beta
apriori distribution of phi.
iteraciones= A numerical value with the number of iterations that
will be applied the algorithm MCMC.
burn= Number of iterations that are discarded from the chain.
136 Bayesian Approaches in Oncology Using R and OpenBUGS
library(“bayeslongitudinal”)
attach(Dental)
Y=as.vector(distance)
X=as.matrix(cbind(1,age))
mhsc(Y,X,27,4,c(1,1),0.5,1,1,500,50)
mhsc(Data, Matriz, individuos, tiempos, betai, rhoi, beta1i, beta2i,
iteraciones, burn)
Data A=vector with the observations of the response variable.
Matriz=The model design matrix.
individuos=A numerical value indicating the number of individuals
in the study.
tiempos=A numerical value indicating the number of times observa
tions were repeated.
betai=A vector with the initial values of the vector of regressors.
rhoi=A numerical value with the initial value of the correlation.
beta1i=A numerical value with the shape parameter of a beta apriori
distribution of rho.
beta2i=A numerical value with the scaling parameter of a beta
apriori distribution of
rho=iteraciones numerical value with the number of iterations that
will be applied the algorithm.
MCMC=burn Number of iterations that are discarded from the chain.
library(“ggplot2”)
mydata<-data.frame(ID,painscore,visit,Arm)
p < ggplot(data = mydata, aes(x = time,y=painscore,group=ID))
p + geomline()
138 Bayesian Approaches in Oncology Using R and OpenBUGS
model {
for (i in 1:n) {
for (j in 1:K) {
mu[i,j] < m + a[i]
y[i,j]∼ dnorm( mu[i,j], tau )
}
a[i] ∼ dnorm( 0, tau.a)
}
m∼ dnorm(0.0,0.001) #Define prior distribution of m
tau∼dgamma(0.001,0.001)#Define prior distribution of tau
tau.a∼dgamma(0.001,0.001)#Define prior distribution of tau.a
s2< 1/tau # Define prior value for s2
s2.a< 1/tau.a # Define prior value for s2.a
ts2< s2+s2.a # Define value for ts2
cor< s2.a/ts2 # calculate correaltion coefficient
s<-sqrt(s2)
s.a< sqrt(s2.a)
for(i in 1:n) {
for(j in 1:K){
res[i,j]< y[i,j]-mu[i,j]
}}
R2<-1-pow(sd(res[1:n,1:K])/sd(y[1:n,1:K]),2)
}
140 Bayesian Approaches in Oncology Using R and OpenBUGS
DATA
list(n=14, K=7,y=structure(.Data=c(13.18,17.32,29.55,34.53,
23.78,20.18,30.63,27.46,15.63,26.45,14.62,29.15,29.53,35.14,
39.10,20.98,9.62,29.44,23.60,30.95,27.28,20.75,30.74,23.24,
34.17,24.93,32.94,23.51,16.90,24.23,2.03,27.03,20.09,29.49,
24.23,18.62,25.08,16.80,17.73,20.02,11.96,39.44,15.59,21.12,
13.63,33.99,12.74,26.82,20.28,27.95,13.01,27.19,34.90,38.22,
22.25,36.68,12.98,15.71,23.14,21.09,30.17,30.09,10.30,
36.75,15.31,16.35,21.38,18.44,11.39,17.33,28.60,30.38,
33.50,31.80,22.62,19.50,22.95,28.68,29.28,18.75,24.31,
22.89,14.33,25.12,22.60,12.62,15.89,19.03,18.36,25.80,
8.39,25.17,22.21,45.62,7.82,17.88,19.02,20.87)
,.Dim = c(14, 7)))
INITS
list(m=0.0, tau=1.0)
10.8 Result
The application of longitudinal data in oncology research arises several
times [52, 53, 54, 55, 56, 57, 54]. In epidemiological and clinical setup, both
the case it is possible to occur. The mixed-effect model is used to show whether
the changes in measurement over time individual level or group level as well.
If the group level measurement difference occurs, then it is significant or not.
The simple application of a mixed effect model is about a group of patients
comparison between two time periods. The Bayesian method in longitudinal
data presented in this chapter. The posterior mean of changes in pain score ob
tained with 23.11. The results also present individual variation and group-wise
variation. The Analysis executed with 20,000 observations for the simulation,
with the burn-in of 1,000 and a refresh of 100. The posterior distributions of
within- and between-subject variabilities are observed through (σa2 and σ 2 , re
spectively) respectively. The trace plots for all these parameters given below.
The OpenBUGS gives the trace plots. It shows the variation result for each
parameter during the simulation.
Longitudinal Data Analysis 141
Abstract
The occurrence of missing data is common in any experimental field of
research. Oncology research is not exceptional. Patients follow-up visits often
occur. The follow-up visits raise the chance of appearance of missing observa
tions. Missing observation seriously affects statistical inference. This chapter
is about presenting a Bayesian technique to handle the missing data analy
sis. The objective is to obtain the best statistical inference in the presence of
missing data.
11.1 Introduction
There is a different definition of missing data. Suppose the ith individuals
jth time point measurement is defined as Yij . Now i could take a value from
1 to N. N is the sample size. Similarly, j can take any value from 0 to a
maximum number of visits. Now another term Rij as an indicator to represent
the missing observation. If Yij is missing then Rij = 0. If Yij is present then
Rij = 1. The term Rij can be defined as the probability
143
144 Bayesian Approaches in Oncology Using R and OpenBUGS
The Bayesian approach helps in this context. The likelihood is multiplied with
the prior value of θ. The application of Bayes helped to obtain the posterior
value of θ. Bayes’s theorem is the posterior density of theta. Now the posterior
estimate of θ can be obtained as
drug. The quality of life (QOL) data is presented among those patients. Sim
ilarly, another group of the cohort (n=40) is treated with cisplatin therapy.
Data is filled with European Organisation for Research and Treatment of Can
cer(EORTC) QOL questionnaire. The primary objective of this study was to
compare the QOL between Cisplatin and Cetuximab arm. Presence of missing
data is handled by Bayesian methodology. The imputation technique is per
formed to obtain the missing data. The comparison between treatment effect
on repeated measurements is presented in Chapter 9. This chapter is restricted
only on missing data handling technique. Data is presented below.
library(“ggplot2”)
library(“BMTAR”)
data(missingest)
autoplot.regime missing(missingest,1)
library(ggplot2)
data(datasim)
data = datasim$Sim$Zt
parameters=list(l=1,orders = list(pj = 1))
initial=mtarinipars(tsregime obj = tsregime(data),
list model = list(pars = parameters))
estim1=mtarns(ini obj = initial,niter = 500,chain = TRUE)
autoplot.regime model(estim1,2)
autoplot.regime model(estim1,2)
autoplot.regime model(estim1,3)
autoplot.regime model(estim1,5)
data(datasim)
yt=datasim$Sim
Yt=yt$Yt
Zt=yt$Zt
data=tsregime(Yt,Zt)
autoplot.tsregime(data)
Missing Data Analysis 147
FIGURE 11.1: regime model object ggplot for the outputs on the function
outputs.
11.5.2 NestedCohort
The NestedCohort is dedicated for survial analysis if the covaraites are
missing. The nested cohort analysis is performed. One example to create sur
vival plot is provided below. Secondly, the application of Cox PH by Nestec-
Cohort package is given below.
Survival Analysis with Missing Covariate
library(“NestedCohort”)
data(zinc)
mod < nested.km(survfitformula=“Surv(futime01,ec01==1)∼ zn
quartiles”,
samplingmod=“ec01×basehist”,data=zinc)
plot(mod,ymin=.6,xlab=“Time (Days)”,ylab=”Survival”,
main=“Survival by Quartile of Zinc”,lty=1:4,)
legend(2000,0.7,c(“Q1”,“Q2”,“Q3”,“Q4”),lty=1:4)
coxmod < nested.coxph(coxformula=“Surv(futime01,ec01==1)∼
sex+agepill+smoke+drink+mildysp+moddysp+sevdysp+anyhist+
zncent”,samplingmod=“ec01×basehist”,data=zinc)
summary(coxmod)
mod < nested.stdsurv(outcome=“Surv(futime01,ec01==1)”,
exposures=“znquartiles”,
confounders=“sex+agestr+smoke+drink+mildysp+moddysp+
sevdysp+anyhist”
samplingmod=“ec01×basehist”,exposureofinterest=“Q4”,plot=TRUE,
main=“Time to Esophageal Cancer by Quartiles of Zinc”,data=zinc)
Prepare Dataset
model;
{
beta1 ∼ dnorm(0.0, 0.001)
beta2 ∼ dnorm(0.0, 0.001)
for(i in 1:N)for (j in 2:M)Y[i,j]∼ dnorm(m[i,j],T)
for(i in 1:N)for (j in 2:M)m[i,j]<-beta1 ×(1-rho)
+beta2×(visit[j]+rho×Y[i,j-1]-rho×visit[j-1])
for(i in 1:N)Y[i,1]∼ dnorm(mu[i,1],T)
rho ∼ dbeta(1,1)
for(i in 1:N)mu[i,1]<-beta1+beta2×visit[1]
sigma<-1/T
T ∼ dgamma(.01,.01)
}
list(N = 14,M= 4,Y = structure(.Data = c(
13.18,17.32,29.55,34.53,27.46,15.63,NA,NA,39.10,20.98,
9.62,29.44,20.75,30.74,23.24,34.17,16.90,24.23,NA,NA,
18.62,25.08,16.80,17.73,15.59,21.12,NA,NA,27.95,13.01,
27.19,34.90,12.98,15.71,NA,NA,36.75,15.31,16.35,21.38,
28.60,30.38,33.50,31.80,28.68,29.28,NA,NA,22.60,12.62,
15.89,19.03,25.17,22.21,NA,NA),. Dim = c(14,4)),
visit = c(1,2,3,4))
# initial values
list(beta1 = 0,beta2 = 0,rho =.5,T= 1)
model;
{
beta1 ∼ dnorm(0.0, 0.001)
beta2 ∼ dnorm(0.0, 0.001)
for(i in 1:N1){for (j in 2:M1){Y[i,j]∼dnorm(mu[i,j],tau)}}
for(i in 1:N1){for (j in 2:M1){mu[i,j]<-beta1∗ (1-rho)
+beta2∗ (age[j]-rho∗ age[j-1])+rho∗ Y[i,j-1]}}
for(i in 1:N1)Y[i,1]{dnorm(mu[i,1],tau)}
rho ∼ dbeta(1,1)
for(i in 1:N1){mu[i,1]<-beta1+beta2∗ age[1]}
tau ∼ dgamma(.01,.01)
sigma<-1/tau
}
list(N1 = 17,M1 =4,Y = structure(.Data = c(10.18,14.32,26.55,31.53,
24.46,NA,23.45,11.62,
36.10,NA,6.62,26.44,
17.75,27.74,20.24,31.17,
13.90,21.23,NA,24.03,
15.62,22.08,13.80,14.73,
12.59,NA,10.63,30.99,
24.95,10.01,24.19,31.90,
9.98,NA,20.14,18.09,
33.75,12.31,13.35,18.38,
25.60,NA,NA,28.80,
25.68,26.28,15.75,21.31,
19.60,9.62,12.89,16.03,
22.17,NA,NA,NA,
24.61,22.87,17.00,17.87,
36.14,24.20,26.92,24.21,
16.85,16.43,22.93,41.27),.Dim = c(17,4)),
age = c(2,4,14,16))
# initial values
list(beta1 = 0,beta2 = 0,rho =.5,tau = 1)
11.6 Conclusion
Repeatedly measured immune response and QOL is typical in oncology
research. Longitudinal modeling is required in repeatedly measured oncology
152 Bayesian Approaches in Oncology Using R and OpenBUGS
Abstract
This chapter is dedicated about joint modeling toward analyzing the joint
modeling approach. It is possible to work separately for longitudinal data and
survival data. However, we will explore the joint model to handle both types
of data. The Cox PH hazard model is capable of working with survival data.
However, we can extend to incorporate time-dependent variables. The work is
about extending the longitudinal measurement as a time-dependent covariate.
A conventional approach like partial likelihood is not suitable enough in this
context. Joint modeling is attempted to estimate the same parameters present
in the two or more models.
12.1 Introduction
The survival outcomes are handled by the Cox model. Similarly, the linear
mixed effect model is suitable to work with longitudinal measurements. The
joint model plays the role of longitudinal covariate and survival outcomes.
Present relation between survival and longitudinal model can be worked in
two ways one by using trajectory obtained from the longitudinal observations
in the hazard function of the death event. The shared random effects are
also useful in this context [59]. Models can be estimated by the likelihood
or Bayesian method. R packages to fit joint models are joinR, JMbayes, JM
[60, 61, 62, 63, 64]. The Joint modeling was first considered in AIDS research
[65, 66]. The primary interest was about survival analysis in the presence
of time-dependent covariates. It was measured repeatedly with measurement
error. However, drop out presence is obvious [67, 68]. Joint modeling provides
less bias and more efficient estimates of the treatment effect. It is considered
as most power treatment comparison method [67].
153
154 Bayesian Approaches in Oncology Using R and OpenBUGS
The term di shows the time of dropout, e.g., di = 3 as ri = (1, 1, 0, 0, ...). The
separate model for longitudinal data works with mixed effect model. The time-
to-event model works for survival data. In time-dependent survival model,
the time-dependent covariates are external because the value of the covariate
at point t is not affected by the occurrence of an event at any time point
u, t > u. Perhaps, it is not same in longitudinal setup. It is inherited property
by the subject those are involved in the failure process. Therefore, the joint
model is required to perform. Different software is capable of working with
longitudinal and survival model separately. These are available for several
years. For example, most advanced statistical software in R (R Development
Core Team 2019) several packages are available to work with mixed effect
modeling. Packages are nlme [69] and package lme4 [70]. Survival analysis
packages are available as survival [71]. Finally, the packages available for joint
modeling as well. In this work, we explored the Bayesian counterpart of the
joint model through OpenBUGS.
and
µij β = xTij β + zij
T
bi (12.3)
The times of observation for the ith individual is presented as sij . It is defined
as
xTij β = β10 + β11 sij + β12 sij × drugi + β13 genderi + β14 previ + β15 stratumi
(12.4)
T
zij bi = W1i (sij ) (12.5)
It is formaulated as
W1i (sij ) = b0i + b1i sij (12.6)
Now there is two component. One is random effect and another is random slop
for time. It is presented with bivariate normal prior distribution of covariance
G, i.e., bi ∼ N (0, G). The noninformative prior is considered with multivariate
normal distribution formation. The inverse gamma distribution is defined for
σ 2 ∼ IG(0.1, 0.1). There could be a different prior choice of the inverse gamma
distribution.
ti ∼ Weibull(φ, Ψi ) (12.7)
xT2i β2 = β20 + β21 drugi + β22 genderi + β23 prev + β24 stratumi (12.9)
156 Bayesian Approaches in Oncology Using R and OpenBUGS
The objective is to link the true and unobserved value of longitudinal data
at time point t. The unobserved value is defined as Mi (t) with the outcome
event Ti∗ .
hi (t|Mi (t), wi ) = limdt→0 Pr{t ≤ Ti∗ < t + dt|Ti∗ ≥ t, Mi (t), wi }/dt (12.13)
Trgscore=1 32 3 NA 2668 NA
Trgscore=2 148 18 NA 1445 NA
coef exp(coef) se(coef) z p-value
Trgscore 0.4705 1.6008 0.6267 0.751 0.453
It can be expressed as
and
� t
Vi (t) = exp{γ T ωi + αmi (s)}ds (12.17)
0
Further, the baseline hazard rate can be formulated as h0 (.) with a spe
cific distribution. Now the entire covariate history is defined as Mi (t) to
influence the subject-specific risk factor. Now h0 () can be evaluated as a
subject-specific risk by Vi (t)). Covriate infulence Mi (t) can be formualted
as Si {t|Mi (t)} = S0 Vi (t). Thereafter the survival models presented as mi (t)
to define the value of longitudinal covariate at single time point t. But the lon
gitudinal information is collected repeatedly and covariate or risk developed
by longitudinal trajectory Mi (t) as
The fixed effect parameter presented by β. Now the random effect is xi (t)[72,
73].
158 Bayesian Approaches in Oncology Using R and OpenBUGS
FIGURE 12.2: Linear mixed effect plot on CLC on Tumor Regression Grad
ing.
Joint Longitudinal and Survival Analysis 159
model {
for (i in 1:N) {
for (j in 1:M) {
Y[i, j] ∼ dnorm(muy[i, j], tauz)
muy[i, j]< beta1[1]+beta1[2] ×
t[j]+beta1[3] × t[j]× trgscore[i]+beta1[4]× ctcaegrade[i]
+beta1[5]× tstage[i]+beta1[6] × yptstage[i]+U[i,1]+ U[i,2] × t[j] }
for (j in 1: M) {
yp1[i,j] ∼ dnorm(muy[i, j], tauz)
r12[i,j]<-Y[i,j]-yp1[i,j]
sqr1[i,j]<-r12[i,j]× r12[i,j]
}
surt[i] ∼ dweib(p,mut[i]) I(surt.cen[i],)
log(mut[i])<-beta2[1]+beta2[2]*trgscore[i]+
beta2[3]× ctcaegrade[i]+beta2[4]× tstage[i]+
beta2[5]× yptstage[i]+r1*U[i,1]+r2 × U[i, 2]
U[i,1:2] ∼ dmnorm(U0[],tau[,])
surP[i] ∼ dweib(p,mut[i]) I(surt.cen[i],)
surR[i]<-surt[i]-surP[i]
sqr2[i]<-surR[i]× surR[i]
}
mspe1<-mean(sqr1[,])
mspe2<-mean(sqr2[])
p ∼ dgamma(1,1) # p < 1
sigmaz<-1/tauz
tau[1:2,1:2]<-inverse(sigma[,])
sigmab1 ∼ dunif(0,100) # priors
sigmab2 ∼ dunif(0,100)
cor ∼ dunif(-1,1)
sigma[1,1] < pow(sigmab1,2)
sigma[2,2] < pow(sigmab2,2)
sigma[1,2] < cor ×sigmab1∗ sigmab2
sigma[2,1] < sigma[1,2]
beta1[1:6] ∼ dmnorm(betamu1[],Sigma1[,])
tauz ∼ dgamma(0.1, 0.1)
beta2[1:5] ∼ dmnorm(betamu2[],Sigma2[,])
r1 ∼ dnorm(0, 0.01)
r2 ∼ dnorm(0, 0.01)
164 Bayesian Approaches in Oncology Using R and OpenBUGS
#Importing Data
list(N=45,M=4,betamu1=c(0,0,0,0,0,0),
betamu2=c(0,0,0,0,0),Sigma1=structure(.Data=
c(0.01,0,0,0,0,0,0,0.01,0,0,0,0,0,0,0.01,0,0,0,0,0,0,
0.01,0,0,0,0,0,0,0.01,0,0,0,0,0,0,0.01),.Dim=c(6,6)),
Sigma2=structure(.Data=c(0.01,0,0,0,0,0,0.01,0,0,0,
0,0,0.01,0,0,0,0,0,0.01,0,0,0,0,0,0.01),.Dim=c(5,5)),
U0=c(0,0),R=structure(.Data=c(100, 0, 0, 100),
.Dim=c(2,2)),t=c(0,2,6,12),Y=structure
(.Data=c(17.53,25.60,19.85,17.82,25.68,17.23,25.30,
18.30,16.26,18.01,19.47,19.98,18.00,15.96,12.57,20.46,
19.68,13.23,12.93,8.49,8.19,7.03,8.14,7.84,14.67,17.61,
17.31,15.30,6.422,5.22,6.51,7.85,7.55,4.55,13.89,13.59,
15.00,18.34,19.95,18.92,14.81,16.40,16.10,12.77,8.32,
8.02,6.82,6.52,18.62,10.67,10.37,14.46,12.97,13.81,
12.98,10.18,9.88,12.68,13,12.70,16.34,13.94,14.56,
14.26,9.79,11.00,9.89,9.70,40.46,40.16,41.43,32.59,
11.33,8.61,10.07,12.32,8.39,10.74,10.44,12.02,8.09,
25.24,18.73,22.05,21.75,24.94,8.75,8.28,15.90,15.60,
8.45,8.93,55.56,28.12,44.68,30.08,21.51,19.15,20.05,
29.78,31.48,34.47,28.01,34.69,15.85,17.00,16.8115.30,
15.55,16.70,16.51,18.69,8.45,7.07,15.60,12.41,11.17,
8.63,10.97,27.82,44.38,11.82,12.62,11.18,7.24,7.63,
7.257,8.23,19.38,20.87,19.83,16.70,4.85,15.00,6.63,
6.37,16.98,18.39,16.67,19.19,20.60,23.51,19.36,19.65,
13.98,12.75,13.55,9.75,17.61,20.68,6.94,22.06,7.20,
7.93,6.68,8.74,16.82,16.73,15.96,15.84,3.73,33.37,
12.31,22.89,16.37,18.89,20.30,23.21,19.94,6.77,26.25,
12.45,13.25,9.45,17.31,20.38,32.26,23.80,18.68,35.64),
.Dim=c(45,4)),surt=c(2752,NA,2731,2731,2736,2719,NA,
2695,2668,NA,2645,2611,2603,2536,2492,2467,NA,NA,2393,
NA,NA,2352,2327,2255,2240,NA,2234,2199,2164,2164,NA,NA,
2088,2059,2031,NA,2010,1982,1961,1919,NA,NA,NA,1485,NA),
surt.cen=c(0,319,0,0,0,0,19,0,0,2668,0,0,0,0,0,0,1110,
2432,0,21,493,0,0,0,0,614,0,0,0,0,1067,692,0,0,0,246,0,
0,0,0,160,200,365,0,1445),trgscore=c(0,1,1,0,1,0,0,1,0,
1,0,1,0,1,1,0,0,0,1,0,1,1,1,0,0,1,0,1,1,0,0,1,1,0,1,0,0,
1,0,1,0,0,0,1,0),ctcaegrade=c(1,1,1,0,0,0,0,0,1,1,1,1,0,
1,1,1,1,1,1,1,0,1,1,1,1,0,0,0,1,1,1,0,0,0,0,0,0,0,1,0,0,
1,0,0,0),tstage=c(4,1,2,3,1,3,2,4,1,2,4,4,4,2,1,3,3,3,2,
1,2,2,4,4,1,4,2,1,1,2,2,2,1,4,4,4,4,4,4,2,2,4,3,3,1),
yptstage=c(1,0,1,1,1,1,1,0,1,1,1,0,0,0,1,1,0,1,1,1,1,0,
0,0,1,0,0,1,1,,1,1,1,1,0,1,1,0,0,0,1,0,0,0,1))
Joint Longitudinal and Survival Analysis 165
list(beta1=c(0,0,0,0,0,0),
tauz=1,beta2=c(0,0,0,0,0),r1=0,r2=0,U=structure(.Data=c(0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0),.Dim=c(45,2)))
library(“joint.Cox”)
data(dataOvarian)
t.event=dataOvarian$t.event
event=dataOvarian$event
t.death=dataOvarian$t.death
death=dataOvarian$death
Z1=dataOvarian$CXCL12
group=dataOvarian$group
alpha given=0
kappa grid=seq(10,1e+17,length=30)
set.seed(1)
jointCox.indep.reg(t.event=t.event,event=event,
t.death=t.death,death=death,Z1=Z1,Z2=Z1,
group=group,alpha=alpha given,kappa1=kappa grid,
kappa2=kappa grid,LCV.plot=TRUE,Adj=500)
12.7.2 JM
This JM package is useful for Joint Modeling of Longitudinal and Survival
Data.
#R Code on JM Function
library(“JM”)
fitLME < lme(log(serBilir) ∼ drug × year,
random = ∼ 1 | id, data = pbc2)
fitSURV < survreg(Surv(years, status2) ∼ drug,
data = pbc2.id, x = TRUE)
fitJOINT < jointModel(fitLME, fitSURV, timeVar =“year”)
plot(fitJOINT, 3, add.KM = TRUE, col =“red”,lwd = 2)
par(mfrow = c(2, 2))
plot(fitJOINT)
Joint Longitudinal and Survival Analysis 167
Covariance modeling
Abstract
The mixed-effect model is useful to analyze the longitudinal data. Conven
tionally, the patient-specific changes are considered as random. The random
effect model is supposed to perform the analysis. However, the random effect
model found that the correlation between all the observations is the same. It
assumed that the correlation coefficient is the same for all the patients who
are the same as well. Nevertheless, these assumptions are not correct always.
Correlation always depends on the time gap between the observations. It is pa
tients specific. It is better to use a random coefficient model while the response
variable is linked as the time of interest. It suits to allow the random slopes
varies between the patients. Different covariance structures are detailed. Co
variance structure handling techniques are illustrated with R and OpenBUG
software.
13.1 Introduction
It is assumed that the patients will have different covariance, and the ran
dom effect model can not capture it. The covariance structure can be defined
separately from random effects. It helps to consider the specific pattern of
covariance for each visit. Sometimes, visit wise observations are considered as
same for patients. Now for a clinical trial, a specific pattern across the periods
could be defined same between the observations appeared on the same pa
tients. We do define the covariance pattern by matrix R. This matrix provides
that the observations are correlated. It is presented as
⎛ ⎞
R1 0 0 0 0
⎜ 0 R2 0 0 0⎟
⎜ ⎟
R=⎜ ⎜ 0 0 R 3 0 0 ⎟
⎟ (13.1)
⎝0 0 0 R4 0 ⎠
0 0 0 0 R5
169
170 Bayesian Approaches in Oncology Using R and OpenBUGS
σ12
⎛ ⎞
θ12 θ13 θ14
⎜θ12 σ22 θ23 θ24 ⎟
R=⎜ ⎟ (13.3)
⎝θ13 θ23 σ32 θ34 ⎠
θ14 θ24 θ34 σ42
(III) Compound symmetry
σ12
⎛ ⎞
θ12 θ13 θ14
⎜θ12 σ22 θ23 θ24 ⎟
R=⎜ ⎟ (13.4)
⎝θ13 θ23 σ32 θ34 ⎠
θ14 θ24 θ34 σ42
(IV) Toeplitz
σ2
⎛ ⎞
θ1 θ2 θ3
⎜ θ1 σ2 θ1 θ2 ⎟
R=⎜ ⎟ (13.5)
⎝θ13 θ23 σ32 θ34 ⎠
θ14 θ24 θ34 σ42
Covariance modeling 171
There are two methods to handle the covariance pattern for repeatedly mea
sure data: (I) Random effect model, (II) Multivariate normal distribution. The
random effect model can address the compound symmetry pattern. However, it
is required to perform the sphericity test before assuming the compound sym
metry model. In case the sphericity test failed then it is necessary to achieve
multivariate normal distribution. The multivariate normal distribution can
consider the general covariance pattern. The multivariate normal distribution
can address several covariance patterns. Perhaps, missing data inflated data
can not be dealing with the multivariate normal distribution. Multiple impu
tation techniques are needed to overcome the missing values, and after that,
covariance patterns can be taken care of suitable modeling. The suitable co
variance pattern is general covariance if the measurements differ between the
time points. Different variance for repeatedly measured data can be addressed
by (V) Heterogeneous uncorrelated, (VI) Heterogeneous Compound Symme
try, (VII) Heterogeneous first-order autoregressive, and (VIII) Heterogeneous
Toeplitz. (V) Heterogeneous uncorrelated
⎛ 2 ⎞
σ1 0 0 0
⎜ 0 σ22 0 0⎟
Ri = ⎜⎝0 2
⎟ (13.6)
0 σ3 0 ⎠
0 0 0 σ42
(VI)Heterogeneous Compound Symmetry.
σ12 ρσ1 σ2 ρ2 σ1 σ3 ρ3 σ1 σ4
⎛ ⎞
⎜ ρσ1 σ2 σ22 ρσ2 σ3 ρ2 σ2 σ4 ⎟
Ri = ⎜
⎝ρ2 σ1 σ3 ρσ2 σ3
⎟ (13.7)
σ32 ρσ3 σ4 ⎠
2
ρσ1 σ4 ρ σ2 σ4 ρσ3 σ4 σ42
(VII) Heterogeneous First-Order Autoregressive.
σ12 ρσ1 σ2 ρ2 σ1 σ3 ρ3 σ1 σ4
⎛ ⎞
⎜ ρσ1 σ2 σ22 ρ2 σ2 σ3 ρ2 σ2 σ4 ⎟
Ri = ⎜ 2
⎟ (13.8)
⎝ρ σ1 σ3 ρσ2 σ3 σ32 ρσ3 σ4 ⎠
3 2
ρ σ1 σ4 ρ σ2 σ4 ρ1 σ3 σ4 σ42
(VIII) Heterogeneous Toeplitz
σ12
⎛ ⎞
ρ1 σ1 σ2 ρ2 σ1 σ3 ρ3 σ1 σ4
⎜ρ1 σ1 σ2 σ22 ρ1 σ2 σ3 ρ2 σ2 σ4 ⎟
Ri = ⎜ ⎟ (13.9)
⎝ρ2 σ1 σ3 ρ1 σ2 σ3 σ32 ρ1 σ3 σ4 ⎠
ρ3 σ1 σ4 ρ2 σ2 σ4 ρ1 σ3 σ4 σ42
The covariance structure sometimes changes differently between the thera
peutic arms. It may be possible that the measurements are more correlated
172 Bayesian Approaches in Oncology Using R and OpenBUGS
library(“blme”)
data(“sleepstudy”, package=“lme4”)
fm1<-blmer(Reaction ∼ Days+(0+Days|Subject),sleepstudy,
cov.prior=gamma)
summary(fm1)
174 Bayesian Approaches in Oncology Using R and OpenBUGS
Scaled residuals:
Min 1Q Median 3Q Max
-3.5411 -0.5565 0.0534 0.6239 4.6236
Random effects:
Groups Name Variance Std. Dev.
Subject Days 58.49 7.648
Residual 833.79 28.875
Number of obs: 180, groups: Subject, 18
Fixed effects:
Estimate Std. Error t value
(Intercept) 251.405 4.000 62.847
Days 10.467 1.952 5.362
Scaled residuals:
Min 1Q Median 3Q Max
-3.5281 -0.5575 0.0533 0.6242 4.6146
Random effects:
Groups Name Variance Std. Dev.
Subject Days 55.9 7.477
Residual 837.3 28.935
Number of obs: 180, groups: Subject, 18
Fixed effects:
Estimate Std. Error t value
(Intercept) 251.405 4.009 62.717
Days 10.467 1.916 5.464
Scaled residuals:
Min 1Q Median 3Q Max
-2.3231 -0.5760 0.0324 0.5479 2.9331
Random effects:
Groups Name Variance Std. Dev.
Subject Days 0 0.00
Residual 2277 47.71
Number of obs: 180, groups: Subject, 18
Fixed effects:
Estimate Std. Error t value
(Intercept) 251.405 6.610 38.033
Days 10.467 1.238 8.454
Scaled residuals:
Min 1Q Median 3Q Max
-4.0094 -0.4665 0.0167 0.4698 5.2207
Random effects:
Groups Name Variance Std.Dev. Corr
Subject (Intercept) 646.16 25.420
Days 40.33 6.351 0.00
Residual 644.93 25.396
Number of obs: 180, groups: Subject, 18
Fixed effects:
Estimate Std. Error t value
(Intercept) 251.405 6.948 36.18
Days 10.467 1.636 6.40
Scaled residuals:
Min 1Q Median 3Q Max
-3.9930 -0.4627 0.0128 0.4683 5.2088
Random effects:
Groups Name Variance Std.Dev. Corr
Subject (Intercept) 634.52 25.190
Days 38.59 6.212 0.04
Residual 647.68 25.450
Number of obs: 180, groups: Subject, 18
Fixed effects:
Estimate Std. Error t value
(Intercept) 251.405 6.905 36.408
Days 10.467 1.606 6.517
#Prior as PenaltyFn
# Prior as PenaltyFn
Scaled residuals:
Min 1Q Median 3Q Max
-3.5104 -0.5588 0.0541 0.6244 4.6022
Random effects:
Groups Name Variance Std. Dev.
Subject Days 52.7 7.26
Residual 842.0 29.02
Number of obs: 180, groups: Subject, 18
Fixed effects:
Estimate Std. Error t value
(Intercept) 251.405 4.020 62.539
Days 10.467 1.869 5.599
Scaled residuals:
Min 1Q Median 3Q Max
-3.9688 -0.4628 0.0222 0.4674 5.1974
Random effects:
Groups Name Variance Std.Dev. Corr
Subject (Intercept) 613.67 24.772
Days 35.13 5.927 0.06
Residual 650.42 25.503
Number of obs: 180, groups: Subject, 18
Fixed effects:
Estimate Std. Error t value
(Intercept) 251.229 6.822 36.825
Days 10.467 1.545 6.773
Scaled residuals:
Min 1Q Median 3Q Max
-3.9442 -0.4592 -0.0049 0.4604 5.1673
Random effects:
Groups Name Variance Std.Dev. Corr
Subject (Intercept) 63677.9 252.34
Days 141.6 11.90 0.88
Residual 655.0 25.59
Number of obs: 180, groups: Subject, 18
Fixed effects:
Estimate Std. Error t value
(Intercept) 0.03354 0.70696 0.047
Days 0.04581 0.64477 0.071
Bayesian in Diagnostics
Test Statistics
183
Chapter 14
Abstract
It becomes an attractive choice to get a higher blockade of cancer cell pro
gression by Molecular Targeted Agents (MTA). The statistical methodology
for MTA has required attention. The literature on early phase dose-finding
studies with MTA is minimal. The dose-finding design in the MTA setting is
attempted by continuously updating the influence of the biological agent. In
this chapter, the Bayesian approach is presented to perform the linear mixed
effect models. The model has performed with the Markov chain Monte Carlo
(MCMC) method. It provides an algorithm to assign different possible doses
for each subject. It helps to define the optimum treatment on MTA value.
It is potent enough to accumulate the doses information by capturing their
variability, an overlooked area in the conventional model. A simulation study
is performed to create dose values. This work is suitable for Phase-I dose with
out the MTD or DLT. The intention is to look for specific biomarker value
by contributing effect of different dose. The illustration in this work will help
to build a dose-response curve to detect the best effective treatment through
MTA in a group of patients.
14.1 Introduction
The likelihood estimation is one way to perform a linear mixed effect
model. However, another alternative is Bayesian inference. Earlier, it was chal
lenging to achieve Bayesian mixed-effect model through R. However, R with
“glm” and “lmer” package becomes a useful tool to perform a mixed-effect
model with R. But these two packages helpful to work with likelihood-based
models. It is a fact that Bayesian is not a replacement of maximum likelihood
estimation. However, Bayesian is easy to perform in recent years. It requires
to the known fundamental concept of Bayesian before application. Hypothesis
185
186 Bayesian Approaches in Oncology Using R and OpenBUGS
p(θ|y)∞l(θ|y)p(θ) (14.1)
The term p(θ|y) used to define the posterior probability. We can only un
derstand the parameter from the data. It can give us predictions. The prior
probability is p(θ). The specification of the prior is the challenging task to
explore Bayesian modeling. The conventional dose-finding trial is based on
controlling the tolerable dose. Now the cytotoxic agents are tested about the
maximum tolerated dose (MTD). Now it is anticipated that drug response
will be higher through high doses. However, the toxicity will also increase
with increased dose. Similarly, the probability of toxicity will also incline. The
method used to identify the best effective dose is well known as rule-base
as a 3 + 3 design. Now the model-based design can be classified as Contin
ual Reassessment Method (CRM) [74, 75], and modified CRM [76, 77, 78],
Time-to-Event CRM (TITE-CRM) [79] and Escalation with Overdose Con
trol (EWOC) design [80]. Now the challenge is that a small sample size needs
to be lesser to take any decision about drug-dose. The maximum tolerable dose
that any patient can tolerate is defined as Dose Limiting Toxicity(DLT). In
this continuation, it is required to promote the model-based designs [81, 82].
But making any decision about effective dose by looking MTD is not enough
[83, 84, 85, 86]. The drugs or other essences promote tumor progression [87].
It becomes an attractive choice to get a higher blockade of cancer cell progres
sion by molecular targeted agents (MTA) .The transduction pathway by the
bypassing mechanism [88]. The statistical methodology for MTA is required to
apply. The literature on early phase dose-finding studies with MTA is minimal.
The dose-finding design in the MTA setting is attempted by continuously up
dating the influence of the biological agent [89]. Recently, there is an attempt
to compare the dose-efficacy by curves [90]. The dose-toxicity modeling can
be performed by change-point analysis [91] is performed to estimate the best
effective dose in the MTA setup. Several methodologies are about balancing
between dose-efficacy and dose-toxicity. However, the preliminary assumption
that the increase of dose will be more efficient for treatment. Alternatively, it
is also observed that increasing dose can reduce the dose efficiency [89, 91]. It
is aimed to develop a novel dose-finding model of MTA. It is required to the
point that best effective dose in MTA or better called the optimum biological
dose (OBD). There are computationally possibilities to work with Bayesian
inference by a linear mixed effect model [92, 93, 94] to get the OBD. Ini
tial attempts motivate us to propose a model with a mixed effect model set
up to deal with OBD. The dose-Response model is performed. Different dose
levels are defined as covariates on treatment efficacy. The autoregressive [95]
and dynamic linear mixed effect modeling [96] are considered. In contrast, a
flexible-dose regime is presented to treated patients to obtain the best effec
tive dose. The non-linear varying coefficient is considered to run the linear
mixed-effect model. The best effective dose is obtained by a linear setting
Bayesian Inference in Mixed-Effect Model 187
The response value measured as the vector of responses from 0 to t for the
individuals (i = 1, ..., N ). The prior to baseline measurements are defined by
(h)
superscript (h) as Yi,t and baseline measurements by Yi,0 and the k-th time
point measurements are defined as Yi,k . Further, the administered doses are
defined as
(d)
xi,t = (Xi,1 , Xi,2 , ..., Xi,t )T (14.3)
for the time point from 1 to t. The doses are defined as time dependent
covariates. The doses administered between time points t − 1 and t are defined
as Xi,t . The time of the last measurement for the ith subject is defined as Ti .
The likelihood function is defined as
(h) (d)
L(θ, ψ) = f (Yi,Ti , Xi,Ti |Zi , θ, ψ) (14.4)
The parameter θ and ψ are adopted to define the measurement process and
dose process, respectively. The time independent covariates are defined by Zi ,
188 Bayesian Approaches in Oncology Using R and OpenBUGS
Yi,t = ρYi,t−1 + (βint + bint i ) + (βdose + bdose i )Xi,t + �i,t − ρ�i,t−1 (t > 0)
(14.16)
The term Yi,t and Xi,t are the representative of response and dose con
tribution for the i-th subject at time point t, respectively. The fixed-
effect determined by β = (βbase , βint , βdose )T and random-effects by bi =
(bbase i , bint i , bdose i )T for the baseline, follow-up intercept, and follow-up
dose effect, respectively. It is assumed that the random-effect, i.e., bi , follows
the normal distribution with mean zero and covariance structure, i.e., unstruc
tured as G. The assumption of �i considered as N (0, σ 2 ). The autoregressive
error is captured through �i . Further, the marginal representation of model
for (t > 0) is represented as
t
�
Yi,t = ρt (βbase +bbase i + ρt−j {(βint +bint i )+(βdose +bdose i )Xi,j +�i,j }
j=1
(14.17)
The dose effect at time point s on the response at time t(t ≥ s) is defined as
ρt−s (βdose + bdose i )Xi,s . It will decline while the time progressed. However, if
the dose X provided repeatedly with time, then the asymptotes for the popu
lation mean and the ith subject is defined as (1 − ρ)−1 (βint + βdose X)and(1 −
ρ)−1 (βint + βdose + (βdose X + bdosei X) respectively.
Doses D̄ D̂ DIC pD
2
15 mg/m 85.7 83.68 87.73 2.02
12.5 mg/m2 93.75 91.45 96.05 2.29
10mg/m2 20.78 20.25 21.31 0.53
7.5 mg/m2 83.28 81.13 85.42 2.14
5 mg/m2 84.67 83.39 85.94 1.27
14.10 Discussion
Our proposed method provides an algorithm to assign different possible
doses for each subject. It helps to define the optimum dose on MTA value.
It is potent enough to accumulate the doses information by capturing their
variability, an overlooked area in the conventional model. A simulation study
is performed to create dose values. This work is suitable for Phase-I dose with
out the MTD or DLT. The intention is to look for specific biomarker value by
contributing effect of different doses [126]. The continual reassessment method
is found suitable to work with OBD. It is detected by highest toxicity label
[127]. The OBD detection technique is performed through two-stage clinical
trial design [128, 129]. Further, the dose-finding design is performed with OBD
by adaptive trial on molecularly targeted agents. The low-dose chemotherapy
is found suitable to suppress the tumor vessel growth and controlling the dam
age in vascular endothelial cells [130]. The semiparametric approach proposed
to detect the whole dose levels of MC by OBD [131]. In survival analysis
with MCMC iteration method, the possible threshold value for OBD serves
the same challenge [132].The high dose of chemotherapy with cisplatin can
produce a more toxic effect [130]. The targeted MC on molecularly targeted
Bayesian Inference in Mixed-Effect Model 195
model;
{
#prior distribution for beta
for(i in 1:10){beta[i]∼ dnorm(0,.0001) } #Dose=X
for(i in 1:N1){for(j in 1:M1)
{Y1[i,j] ∼ dnorm(mu1[i,j],tau)}
for(i in 1:N1){for(j in 1:M1)
{
mu1[i,j]< beta[1]+beta[2] ×time1[j]+
b[i,1]+b[i,2]×time1[j]}}
for (i in 1:N5){for (j in 1:M5)mu5[i,j]< beta[9]
+beta[10]×time4[j]+b[i,9]+b[i,10]×time5[j]}
for(i in 1:N1)b[i,1:10] ∼ dmnorm(vu[],Omega[,])}
for (
i in 1:10){vu[i]<-0
Omega[1:10,1:10]{ dwish(R[,],10)
Sigma[1:10,1:10]<-inverse(Omega[,])
tau ∼ dgamma(.00001,.00001)
Var<-1/tau
}
196 Bayesian Approaches in Oncology Using R and OpenBUGS
#Model 2
model;
{
beta[i] ∼ dnorm(0.0, 0.001) # Dose=X
for(i in 1:N1){ for (j in 2:M1)Y1[i,j] ∼ dnorm(mu1[i,j],tau) }}
for(i in 1:N1){for (j in 2:M1)
mu1[i,j]<-beta1[i]×(1-rho)+beta2[i]×(time1[j]
rho ×time1[j-1])+rho×Y1[i,j-1]}}
for(i in 1:N1){ Y1[i,1] ∼ dnorm(mu1[i,1],tau)}
for(i in 1:N1){ mu1[i,1]<-beta1+beta2 ×time1[1]}
tau ∼ dgamma(.001,.001)
rho ∼ dbeta(1,1)
sigma<-1/tau
}
Chapter 15
Concordance Analysis
Abstract
One of the parameter to promote cancer research is dependent on diagnos
tics technique expansion. However, every time new diagnostic technique need
to test in presences of an actual diagnostic test. Several times disease measured
by continuous measurement. Now concordance correlation coefficient (CCC)
is one of the tools in diagnostics test statistics. It performs as a tool in agree
ment analysis. It supports continuous measurement. The continuous variable,
i.e., tumor size measured or observed with a different diagnostic procedure,
can be computed by CCC. This chapter shows the Bayesian extension of the
CCC for continuous data. Now Bayes factor is illustrated by real-life data to
define the best diagnostics tool. The approach illustrated in this work provides
the researchers with an opportunity to find out the most appropriate model
for specific data and apply CCC to fulfil the desired hypothesis.
15.1 Introduction
One of the significant proportion of oncology research is devoted to diag
nostic research. The risk of misdiagnosis always stands in oncology. The diag
nosis is not easy and straightforward. The complex diagnosis and differential
diagnosis needs to be similar to define conclusive remark about the presence
of the disease condition. Sometimes physicians make a diagnosis error. Some
times two physicians disagree on a diagnosis among them. However, they look
at the same CT scan or MRI report. It becomes confusing while compare CT
scans with MRI reports. Alternatively, two physicians independently comment
same or different diagnosis tets. The disease status detection is not an easy
task. There is always a dilemma to define cancer staging. Due to rapid disease
progression and disease complexity, it is not an easy task to detect tumor stag
ing. Now it requires to make a conclusive remark about diagnosis test. The
agreement comes while we define the diagnosis status. Since the tumor mea
sured into size or dimension, so the tumor measurement data generated as a
197
198 Bayesian Approaches in Oncology Using R and OpenBUGS
continuous variable. The continuous variable, i.e., tumor size measured or ob
served with a different diagnostic procedure can be concluded by concordance
correlation, in this chapter, the concordance correlation illustrated with tumor
size and tumor volume measurements. While the measured outcome defined
with the binary variable, then it is presented with kappa statistics. Sometimes,
the weighted kappa also useful. The ‘inter-rater agreement’ and ‘inter-device
agreement’ required to quantifying the extent of agreement between two or
more examiners or devices is of primary interest.Widely known as the intr
aclass correlation measurement is a choice [139, 140, 141]. The application
of the within-subject coefficient of variation found attractive in this context
[142]. Now the Inter-rater agreement is important, for example, when different
raters evaluate the severity of a specific disease during a clinical trial, and the
reliability of each of their subjective evaluation is to determine by measuring
agreement among the raters [143].
#Concordance Correlation Coefficient
Now, µx and µy are the average of the two continuous variables. The
term σx2 and σy2 are the corresponding variance. Now ρ is the correlation
coefficient between the two variables.
k k
1� 2 1� 2
E( Si ) = σ (15.5)
k i=1 k i=1 i
k−1 k k−1 k
1 � � 1 � � σ2
E( (Ȳi −Ȳi� )2 ) = (µi −µi� )2 + e (15.6)
k(k − 1) i=1 � k(k − 1) i=1 � n
i =i+1 i =i+1
�k−1 �k
Now, i=1 i� =i+1 (Y¯i − Ȳi� )2 is a biased estimator of
�k−1 �k 2
i� =i+1 (µi − µi ) .
�
i=1
The corresponding unbiased estimator is defined as
k−1 k
� � k(k − 1) 2
(Ȳi − Ȳi� )2 − σe = W − Z (15.7a)
i=1 i� =i+1
n
k−1
� k
�
W = (Ȳi − Ȳi� )2 (15.7b)
�
i=1 i =i+1
k−1 k
k(k − 1) � �
Z= (Si2 + Si2� − 2Sii� )2
n i=1 � i =i+1
(15.7c)
Finally, unbiased estimate of ρCCC is defined as ρ̄CCC
�k−1 �k
2 i=1 i� =i+1 Sii�
ρ̄CCC = (15.8)
P −Q
where
k
� k−1
� k
�
P = (k − 1) Si2 + (Ȳi − Ȳi� )2 (15.9)
i=1 i=1 i =i+1�
Concordance Analysis 201
k−1 k
k(k − 1) � �
Q= (Si2 + Si2� − 2Sii� ) (15.10)
n i � i =k+1
�
Suppose i and i represents different reading for the jth patients. The
method number assigned by k = 2, i.e., CT Scanner and MRI Scanner. The
responses are defined as Yij and Yi� j of the same patients � j � . Now the linear
model for two scanners are presented as
k−1 k
2 1 � �
σrandom = (µi − µi� )2 (15.13)
k(k − 1) i=1 �
i =i+1
and
k−1 k
2 2 � � 1 2
σerror = (σ + σi2� − 2σii� )2
k(k − 1) i=1 � 2 i
i =i+1
k k−1 k
1 � 2 � �
= σi2 − σ � (15.14)
k i=1
k(k − 1) i=1 i=i+1 ii
M1 : Y = θ + βX + �ii� j (15.20)
M0 : Y = θ + �ii� j (15.21)
The model (M1 ) states the presence of CCC and absence of it by model (M0 ).
Now, the Bayes Factor through JZS is defined [148, 149, 150] as,
∞
(n/2)1/2
�
(n−1)
BF10 = × (1 + g)(n−2)/2 × [1 + (1 − r2 )g]− 2 (15.22)
τ (1/2) 0
p(Y (M1 )
BF10 = (15.23)
p(Y (M0 )
If the value of BF10 becomes more than 1, it states about the presences
of CCC otherwise not. The statistical test can be performed with two Hy
potheses: the Null Hypothesis, H0 as given in model (M0 ) and the alternative
hypothesis H1 or (M1 ). The prior probability of null hypothesis is assigned as
p(M0 ) and alternative as p(M1 ). Therefore, Baye’s theorem is applied to the
observed data to compute the posterior probability of the Hypothesis. The
appearance of the posterior probability of alternative Hypothesis is computed
as
p(Y |M1 )p(M1 )
p(M1 /Y ) = (15.24)
p(Y |M1 )p(M1 ) + p(Y |M0 )p(M0 )
Concordance Analysis 203
The term P (Y |M1 ) is the marginal likelihood of the data for alternative hy
potheses. Further, the marginal likelihood is calculated as
� ∞
p(Y |M1 ) = p(Y |θ, M1 )p(θ|H1 )dθ (15.25)
θ
p(M1 |Y ) p(M1 )
= BF10 × (15.26)
p(M0 |Y ) p(M0 )
15.4 Results
Initially, we performed a descriptive data analysis. The classical test car
ried on CCC between tumor size measurements. Under the null hypothesis is
assumed that the concordance correlation coefficient is zero as ρccc = 0. The
function named as cccUst is available in the “cccrm” package of R. Similarly,
CCC is performed on tumor volume. Demographic profile of the patients pro
vided in Table 15.1. The Bayesian posterior estimate of CCC detailed in Table
15.2 and Table 15.3. In Bayesian, the same models selected to fit the correla
tion coefficient to measure the association. It implies that the precision of the
methods is moderate. The correlation coefficients are not very much different
from calculated classical CCC detailed in Table 15.3. It can be concluded by
having high estimates for accuracy. The estimates obtained through Bayesian
methodology is presented in Table 15.3 and Table 15.4. The same models are
used to obtain the correlation coefficient to obtain the association. The result
shows that the precision is moderate enough. Now the correlation coefficients
are not different from calculated conventional CCC in Table 15.3 with high
accuracy value. In Model 1, the posterior estimate of the concordance correla
tion coefficient on tumor size is obtained as 0.69. The SD is obtained as 0.23.
The HPD is obtained as (0.13,0.95). Similarly, the posterior estimate on tumor
volume is measured as 0.75(0.13). The HPD in a similar context is measured
as (0.09,0.90). It shows that the confidence interval of the estimates is closed.
Finally, the Bayes factor is measured as 9.53.
Concordance Analysis 205
15.5 Conclusion
The CCC is an available tool to work on continuously measured variables
for agreement analysis. The presence of covariance, variance, and mean easily
handled by CCC [146, 147]. Perhaps, it may be generated with biased esti
mates or in the presence of error [152]. The marginal modeling found attractive
to reduce bias value [144]. There is an application of CCC by stratified model
ing [153]. The wider extension of CCC is known as the generalized concordance
correlation coefficient. The variance component mechanism is adopted for gen
eralized concordance correlation by [139]. There is a robust outcome observed
through the application of CCC [153]. It is also found useful in time-to-event
data analysis. In oncology, the time-to-event measurements are nonignorable.
The CCC found attractive for censoring data [154, 145]. The extension also
performed for longitudinal measurements [151]. There is an attempt by the
linear mixed effect model by using the generalized CCC (GCCC). The com
putational algorithm is developed to boost up the Bayesian methodology in
CCC. Initially, the data fitted by GLMM and later on the Bayesian CCC is
considered in this direction. It provides scope to extend it by different types
of hypothesis testing. The generation of the p-value can be overlooked in this
Concordance Analysis 207
Abstract
Currently, the proliferation of knowledge of cancer promotes cancer man
agement by gene therapy. High-dimensional data measure genomic informa
tion of cancer patients. Motivated by these essential applications in cancer
research, there has been a dramatic growth in the development of statisti
cal methodology in the analysis of high-dimensional data, mainly related to
regression model selection, estimation and prediction. The high-dimensional
data is useful to explore the cancer progression, especially for disease man
agement. However, the analyst faces difficulties to deal with high-dimensional
data where the feature dimension p grows exponentially. This chapter is ded
icated to shows the Bayesian approach in high-dimensional data. Variable
selection in high-dimensional data is one of the challenges. Bayesian variable
selection strategy is presented in this work. Different R packages for variable
selection steps are illustrated. This chapter will help to consider Bayesian in
high-dimensional variable selection.
16.1 Introduction
The base sequence procedure is now well developed. A large amount of ge
netic data is available publically. This large amount of data challenged us for
the development of analytical tools for analyzing such accumulated data. It
is essential to analyze such extensive genetic data by advanced computational
methodology coupled with statistical techniques for processing genetic data.
Similarly, microarray also provides gene expression information. Commonly,
ten of thousands of variables obtained by a single experiment. Dataset with
this large number of variables are known as high-dimensional data. Earlier, this
used to measure gene expression in serum or tissue. Currently, it used for DNA
methylation expression. Tremendous progress in a microarray experiment ob
served. Similar, growth in the statistical analysis method followed. Primarily,
the gene effect classification is the main challenge in high-dimensional data
209
210 Bayesian Approaches in Oncology Using R and OpenBUGS
analysis. Filter out a few variables from ten of thousands of variables is the task
of the gene classification. The conventional approach for statistical methodol
ogy is known as an unsupervised approach. But currently, the direction shifted
from unsupervised to supervised approach. The supervised approach help to
define the characteristics (Y ) to gene expression data (X).
The objective of high-dimensional data analysis is to make clusters be
tween the observations. The preparation of clustering is useful to merge the
representation into a similar group of individuals. It is required to take a sim
ilar type of treatment decision for the same cluster. There are different types
of clustering methods, and we will explore through R.However, the statisti
cal model is useful to predict an individual for specific clustering. Mainly, it
is known as a predictive model for survival data. The method is useful, like
penalized partial log-likelihood (PLL) for the estimation of the regression co
efficients. The widely adopted method is lasso [157, 158]. The lasso defines the
L1 penalty function. The amount of variations of the penalization is presented
as SCAD [159, 160]. Other methods like the elastic net [161] and Dantzig
selector[162]. The dimension reduction is also useful and can define by p ≥ n.
Methods like tree-based methods, as random survival forests [163] now well
explored. The conventional approach stands with a Cox proportional hazards
model. It works with time-to-event data. The procedure is to use the predic
tive PLL and after that, the cross-validation. It helps to variable selections
from a ten of thousand of variable [164]. As an alternative to the PLL, the
prediction error curve works well [165]. It helps by identifying the probabilities
of survival from a risk prediction model. The prediction of the error curve is
obtained by the expected squared difference of the risk prediction with actual
event status by Brier score at a given time point. In a Cox regression model,
there is a possibility to obtain unstable estimated regression coefficients. But
stability can be achieved by maximizing the penalized partial log-likelihood.
A penalty function of the regression coefficients subtracted from the partial
log-likelihood. It helps to choose the optimal weight of the penalty function
through maximizing the predictive value of the model [165].
hr<-hclust(as.dist(1-cor(t(geneExpmatrix),method=“pearson”)),
method=“complete”)
hc<-hclust(as.dist(1-cor(geneExpmatrix,method=“spearman”)),
method=“complete”)
mycl<-cutree(hr, h=max(hr$height)/1.5)
mycolhc<-rainbow(length(unique(mycl)),start=0.1,end=0.9)
mycolhc<-mycolhc[as.vector(mycl)]
mycol<-colorpanel(40,“darkblue”,“yellow”,“white”)
heatmap.2(geneExpmatrix,Rowv=as.dendrogram(hr),
Colv=as.dendrogram(hc),col=mycol,scale=“row”,density.info=
“none”,trace=“none”RowSideColors=mycolhc)
212 Bayesian Approaches in Oncology Using R and OpenBUGS
library(“gplots”)
df<-read.csv(“import the data from hard drive.csv”, header=TRUE)
geneExpmatrix < as.matrix(df[2:25])
pca < prcomp(geneExpmatrix, scale=T)
summary(pca)
plot(pca$x, pch=20, col=“blue”, type=“n”)
text(pca$x, rownames(pca$x), cex=0.8)
High-Dimensional Data Analysis 213
for the hazard λ(t|Zi ), i.e., the instantaneous risk of having an event at time
t, given the covariate information in Zi , for individual i. The λ0 (t) is used for
�
baseline hazard function. Suppose the parameter vector beta = (β1 , ..., βp ) is
defined by maximizing the PLL with
n n
� � � �
J(β) = δi (zi β − log( J(tj ≥ ti )exp(zj β)) (16.2)
i=1 j=1
Now the estimated vector of coefficients are fitted by β̂b− after leaving the bth
part of the data out. Now, xb is the bth part of the data with b = 1, ..., B.
the rows of the design matrix X. The proportional hazards (PH) model can
be defined as
h(t|Xi ) = h0 (t)exp(XiT β) (16.7)
The vector of the regression coefficients R can be defined by maximizing the
partial log-likelihood as
n
� �
pl(β) = di (XiT β) − In( exp(XjT β)) (16.8)
i=1 tj ≥ti
Suppose the baseline hazard function is defined as h0 (t). Suppose the event
point is defined as ti with di = 1. Further, the Breslow estimator can be
formulated as �
ĥ0 (ti ) = 1/ exp(XjT β̂) (16.9)
tj ≥tj
with
�
H0 (t) = h0 (S) (16.11)
s≤t
library(“BVSNLP”)
n < 100
p < 40
set.seed(123)
Sigma < diag(p)
full < matrix(c(rep(0.5, p∗ p)), ncol=p)
Sigma < full + 0.5∗ Sigma
cholS < chol(Sigma)
Beta < c(-1.8, 1.2, -1.7, 1.4, -1.4, 1.3)
X = matrix(rnorm(n×p), ncol=p)
X = X X < scale(X)
beta < numeric(p)
beta[c(1:length(Beta))] < Beta
XB < X
surtimes < −rexp(n,exp(XB))
censtimes < rexp(n,0.2)
times < pmin(surtimes,censtimes)
status < as.numeric(surtimes <= censtimes)
exmat < cbind(times,status,X)
L < 10; J < 10
d < 2 ∗ ceiling(log(p))
temps < seq(3, 1, length.out = L)
tau < 0.5; r < 1; a < 6; b < p-a
nlptype < 0
cur cols < c(1,2,3)
nf < 0
High-Dimensional Data Analysis 217
# Runthe Function
length(coxouthash_key)
length(coxout$hash_key)
[1] 156
which(coxoutmax_model}>0)
which(coxout$max_model>0)
[1] 1 2 3 4 5 6
coxoutmax_prob
[1] -247.6858
218 Bayesian Approaches in Oncology Using R and OpenBUGS
library(“BVSNLP”)
n < 200
p < 40
set.seed(123)
Sigma < diag(p)
full < matrix(c(rep(0.5, p×p)), ncol=p)
Sigma < full + 0.5×Sigma
cholS < chol(Sigma)
Beta < c(-1.7,1.8,2.5)
X < matrix(rnorm(n×p), ncol=p)
X < X×cholS
colnames(X) < c(paste(“gene ”,c(1:p),sep=“ ”))
beta < numeric(p)
beta[c(1:length(Beta))]<-Beta
Xout <- PreProcess(X)
X <- Xout$X
XB < X×beta
probs < as.vector(exp(XB)/(1+exp(XB)))
y < rbinom(n,1,probs)
X < as.data.frame(X)
train idx < sample(1:n,0.×n)
test idx < setdiff(1:n,train idx)
X train < X[train idx,]
y train < y[train idx]
bout < bvs(X train, y train, prep=FALSE,
family=“logistic”,
mod prior =“beta”,niter=50)
BMAout<-predBMA(bout,X,y,prep=FALSE,logT=FALSE,
train idx=train idx,test idx=test idx,
family=“logistic”)
BMAout$auc
0.6953545
High-Dimensional Data Analysis 219
#BMA Prediction
$auc
[1] 0.6953545
$roc_curve
TPR FPR
[1,] 1.00000000 1.00000000
[2,] 0.89248140 0.98816334
[3,] 0.83054193 0.97708525
[4,] 0.83054193 0.94737135
[5,] 0.77365963 0.94283795
[6,] 0.73734265 0.94283795
[7,] 0.70102566 0.93996391
[8,] 0.66470867 0.93708988
[9,] 0.62839169 0.93708988
[10,] 0.60986843 0.88243937
[11,] 0.54350027 0.88019476
[12,] 0.54075684 0.85048086
[13,] 0.54075684 0.82076696
[14,] 0.50443986 0.82076696
[15,] 0.49018789 0.76967992
[16,] 0.47424697 0.72165646
[17,] 0.47424697 0.69194256
[18,] 0.46870614 0.66222866
[19,] 0.43238916 0.66222866
[20,] 0.39607217 0.66222866
[21,] 0.35975518 0.66222866
[22,] 0.35426833 0.62798136
[23,] 0.31795135 0.62798136
[24,] 0.30924020 0.56879746
[25,] 0.27292322 0.56167016
[26,] 0.27292322 0.53195626
[27,] 0.26252634 0.48879049
[28,] 0.25072939 0.44575389
[29,] 0.20422736 0.43512925
[30,] 0.20422736 0.39891747
[31,] 0.20422736 0.36920357
[32,] 0.20422736 0.32873852
[33,] 0.15120017 0.31770459
[34,] 0.13694820 0.27798936
[35,] 0.12290807 0.22769629
[36,] 0.08307837 0.21862949
[37,] 0.03847714 0.21070557
[38,] 0.02744947 0.15743430
[39,] 0.02169358 0.12772040
[40,] 0.01818087 0.09163778
[41,] 0.01298243 0.04376265
[42,] 0.00000000 0.00000000
220 Bayesian Approaches in Oncology Using R and OpenBUGS
FIGURE 16.4: True positive rate and false positive rate relation.
install.packages(“glmnet”)
install.packages(“Matrix”)
install.packages(“tidyverse”)
install.packages(“broom”)
install.packages(“survival”)
#Load Packages
library(“glmnet”)
library(“Matrix”)
library(“tidyverse”)
library(“broom”)
library(“survival”)
High-Dimensional Data Analysis 221
#Output1
#Output2
log(cv.fit2$lambda.min)
-2.229185
cv.fit2$lambda.min
0.1076161
#Output3
#Output4
active.k.vals = est.coef[active.k]
active.k.vals
numeric(0)
222 Bayesian Approaches in Oncology Using R and OpenBUGS
install.packages(“bama”)
library(bama) # call the library bma#
Y < bama.data$y # generate continous variable of size 1000#
A < bama.data$ a # generate continous variable of size 1000#
M < as.matrix(bama.data[, paste0(“m”, 1:100)],
nrow(bama.data))
#Create the Matrix of having M#
C1 < matrix(1, 1000, 1) #Create one covariate of C1#
C2 < matrix(1, 1000, 1)#Create one covariate of C2#
beta.m < rep(0, 100)
alpha.a < rep(0, 100)
set.seed(12345)
#bama function is used filter influencing variable in the regression#
out<-bama(Y,A,M,C1,C2,beta.m,alpha.a,burnin=1000,
ndraws = 100)
summary < summary(out)
head(summary)
High-Dimensional Data Analysis 223
16.9.2 countgmifs
This package is useful to fit negative binomial nd Poisson regression by
stagewise manner. It is illustrated in the following text:
#R Code on Countgmifsp
install.packages(“countgmifs”)
library(“countgmifs”)
set.seed(10)
n < 50 # Sample size of the data
p < 500 #Covariates required for the data
intercept< .5
#generate parameter for 500 covariates#
beta< c(log(1.5), log(1.5),
-log(1.5), -log(1.5),
-log(1.5), rep(0,495))
alpha< 0.5 # Intercept
x< matrix(rnorm(n×p,0,1), nrow=n, ncol=p,
byrow=TRUE) #Covariate values
colnames(x)< paste(“Var”,1:p, sep=“ ”)
mu< exp(intercept + crossprod(t(x),beta))
y< rnbinom(n=n, size=1/alpha ,mu=mu) # Discrete response
data< data.frame(y,x)
nb<-countgmifs(y 1 , data=data, offset=NULL,
x=x, epsilon=0.01, tol=0.001,
scale=TRUE, verbose=FALSE)
coef.AIC<-coef(nb, model.select=“AIC”)
coef.AIC[coef.AIC!=0]
predict(nb, model.select=“AIC”)
plot(predict(nb, model.select=“AIC”), y)
plot(nb)
16.9.3 fastcox
This R package is used for Lasso and Elastic-Net Penalized Cox’s Regres
sion in High Dimensions Models by the Cocktail Algorithm. In this package,
the majorization-minimization principle is applied of the elastic net penalized
Cox’s proportional hazard model. It is illustrated on the following page.
224 Bayesian Approaches in Oncology Using R and OpenBUGS
#R Code on Fastcox
install.packages(“fastcox”)
library(“fastcox”)
data(FHT)
m1<-cocktail(x=FHT$x,y=FHT$y,d=FHT$status,alpha=0.5)
predict(m1,type=“nonzero”)
plot(m1)
plot(m1,xvar=“lambda”,label=TRUE)
plot(m1,color=TRUE)
16.9.4 HighDimOut
The HighDimOut package is useful for Outlier Detection Algorithms in
High-Dimensional Data. It is helpful for outlier detection by unification
scheme. One example of running this package is detailed.
HighDimensional Data Analysis 225
library(“HighDimOut”)
library(“ggplot2”)
res.ABOD < Func.ABOD(data=TestData[,1:2], basic=FALSE,
perc=0.2)
data.temp < TestData[,1:2]
data.temp$Ind < NA
data.temp[order(res.ABOD,
decreasing = FALSE)[1:10],“Ind”]
< “Outlier”
data.temp[is.na(data.temp$Ind),“Ind”] < “Inlier”
data.temp$Ind < factor(data.temp$Ind)
ggplot(data = data.temp) + geom point(aes(x = x, y = y,
color=Ind, shape=Ind))
226 Bayesian Approaches in Oncology Using R and OpenBUGS
16.10 Discussion
Data science is an emerging field in oncology. Due to the availability of huge
gene data, it can not be avoided. Always analyst faces difficulties to deal with
high-dimensional data where the feature dimension p grows exponentially. The
sample size is fixed at n, but log(p) = 0(nα )for some α ∈ (0, 1/2). Currently,
modern technology, like microarray analysis and next-generation sequencing
data is available to generate genomics data. Large amounts of data containing
more than thousands of variables are available. It is common to collect gene
expressions from p > 20,000 genomics studies. These genomics data are having
a large sample size and a limited size p parameter. The large size gene data
analysis is a challenge. Consideration of not relevant features in the statistical
literature indeed may provide undesirable computing challenges. The chal
lenges are handled by variable screening and selection techniques. Among all
methodological development, the penalized approach is favored by researchers
with K-fold cross-validation.
Bibliography
227
228 Bibliography
[21] Per Kragh Andersen, John P Klein, and Susanne Rosthøj. Generalised
linear models for correlated pseudo-observations, with applications to
multi-state models. Biometrika, 90(1):15–27, 2003.
[22] Bernhard Haller, Georg Schmidt, and Kurt Ulm. Applying competing
risks regression models: an overview. Lifetime data analysis, 19(1):33–
58, 2013.
[23] Leo Breiman. Random forests. Machine learning, 45(1):5–32, 2001.
[24] Hemant Ishwaran, Thomas A Gerds, Udaya B Kogalur, Richard D
Moore, Stephen J Gange, and Bryan M Lau. Random survival forests
for competing risks. Biostatistics, 15(4):757–773, 2014.
[25] G Schwarz. Estimating the dimension of a model. annals of statistics, 6
(2), 461–464, 1978.
[26] H Akaike. hon entropy maximization principle. iin pr krishnarah (ed.),
applications of statistics. Amsterdam: NorthjHolland, 1977.
[27] John P Klein and Melvin L Moeschberger. Survival analysis: techniques
for censored and truncated data. Springer Science & Business Media,
2006.
[28] Luc Duchateau and Paul Janssen. The frailty model. Springer science
& business media, 2007.
[29] Philip Hougaard. Shared frailty models. In Analysis of multivariate
survival data, pages 215–262. Springer, 2000.
[30] José Cortiñas Abrahantes, Catherine Legrand, Tomasz Burzykowski,
Paul Janssen, Vincent Ducrocq, and Luc Duchateau. Comparison of
different estimation procedures for proportional hazards model with ran
dom effects. Computational statistics & data analysis, 51(8):3913–3930,
2007.
[31] Virginie Rondeau, Yassin Mazroui, and Juan R Gonzalez. frailtypack:
an R package for the analysis of correlated survival data with frailty
models using penalized likelihood estimation or parametrical estimation.
Journal of statistical software, 47(4):1–28, 2012.
[32] Terry Therneau. coxme: mixed effects cox models. R package version
2.2-3. Vienna, Austria: R Foundation for Statistical Computing, 2012.
[33] Il Do Ha, Maengseok Noh, and Youngjo Lee. frailtyhl: A package for
fitting frailty models with h-likelihood. The R Journal, 4(2):28–36, 2012.
[34] Marco Munda, Federico Rotolo, Catherine Legrand, et al. Parfm: para
metric frailty models in uppercaseR. Journal of statistical software,
51(11):1–20, 2012.
230 Bibliography
[35] Andreas Wienke. Frailty models in survival analysis. CRC press, 2010.
[36] Ronald T Cotton, Hop S Tran Cao, Abbas A Rana, Yvonne H Sada,
David A Axelrod, John A Goss, Mark A Wilson, Steven A Curley, and
Nader N Massarweh. Impact of the treating hospital on care outcomes
for hepatocellular carcinoma. Hepatology, 68(5):1879–1889, 2018.
[37] Evan M Graboyes, Mark A Ellis, Hong Li, John M Kaczmar, Anand K
Sharma, Eric J Lentsch, Terry A Day, and Chanita Hughes Halbert.
Racial and ethnic disparities in travel for head and neck cancer treatment
and the impact of travel distance on survival. Cancer, 124(15):3181–
3191, 2018.
[38] Yuhui Chen, Timothy Hanson, and Jiajia Zhang. Accelerated hazards
model based on parametric families generalized with bernstein polyno
mials. Biometrics, 70(1):192–201, 2014.
[39] Haiming Zhou and Timothy Hanson. A unified framework for fitting
bayesian semiparametric models to arbitrarily censored survival data,
including spatially referenced data. Journal of the american statistical
association, 113(522):571–581, 2018.
[40] T Hakulinen and L Tenkanen. Regression analysis of relative survival
rates. Journal of the royal statistical society: series C (applied statistics),
36(3):309–317, 1987.
[41] Arun Pokhrel and Timo Hakulinen. Age-standardisation of relative sur
vival ratios of cancer patients in a comparison between countries, genders
and time periods. European journal of cancer, 45(4):642–647, 2009.
[42] PC Lambert, PW Dickman, CP Nelson, and P Royston. Estimating the
crude probability of death due to cancer and other causes using relative
survival models. Statistics in medicine, 29(7-8):885–895, 2010.
[43] Klemen Pavlič and Maja Pohar Perme. On comparison of net survival
curves. BMC medical research methodology, 17(1):79, 2017.
[44] Klemen Pavlič and Maja Pohar Perme. Using pseudo-observations for
estimation in relative survival. Biostatistics, 20(3):384–399, 2019.
[45] Maja Pohar Perme, Janez Stare, and Jacques Estève. On estimation in
relative survival. Biometrics, 68(1):113–120, 2012.
[46] Paola Rebora, Stefania Galimberti, and Maria Grazia Valsecchi. Us
ing multiple timescale models for the evaluation of a time-dependent
treatment. Statistics in medicine, 34(28):3648–3660, 2015.
[47] Margaret Sullivan Pepe and Motomi Mori. Kaplan-meier, marginal or
conditional probability curves in summarizing competing risks failure
time data? Statistics in medicine, 12(8):737–751, 1993.
Bibliography 231
[60] Dimitris Rizopoulos. Jm: An R package for the joint modelling of longi
tudinal and time-to-event data. Journal of statistical software (Online),
35(9):1–33, 2010.
[61] Dimitris Rizopoulos. The R package jmbayes for fitting joint models
for longitudinal and time-to-event data using mcmc. arXiv preprint
arXiv:1404.7625, 2014.
[62] Pete Philipson, Peter Diggle, Ines Sousa, Ruwanthi Kolamunnage-Dona,
Paula Williamson, and Robin Henderson. joiner: Joint modelling of
repeated measurements and time-to-event data. 2012.
[63] Cécile Proust-Lima and Benoit Liquet. Lcmm: an R package for esti
mation of latent class mixed models and joint latent class models. In
The R User Conference, useR! 2011 August 16-18 2011 University of
Warwick, Coventry, UK, page 66. Citeseer, 2011.
[64] Menggang Yu, Ngayee J Law, Jeremy MG Taylor, and Howard M San
dler. Joint longitudinal-survival-cure models and their application to
prostate cancer. Statistica Sinica, pages 835–862, 2004.
[65] Anastasios A Tsiatis, Victor Degruttola, and Michael S Wulfsohn. Mod
eling the relationship of survival to longitudinal data measured with er
ror. applications to survival and cd4 counts in patients with aids. Journal
of the american statistical association, 90(429):27–37, 1995.
[66] Michael S Wulfsohn and Anastasios A Tsiatis. A joint model for survival
and longitudinal data measured with error. Biometrics, pages 330–339,
1997.
[67] Geert Verbeke. Linear mixed models for longitudinal data. In Linear
mixed models in practice, pages 63–153. Springer, 1997.
[68] Dimitris Rizopoulos, Geert Verbeke, and Geert Molenberghs. Shared
parameter models under random effects misspecification. Biometrika,
95(1):63–74, 2008.
[69] Jose Pinheiro, Douglas Bates, Saikat DebRoy, Deepayan Sarkar, et al.
Linear and nonlinear mixed effects models. R package version, 3:57,
2007.
[70] D Bates, M Maechler, and B Bolker. The lme4 package, uppercaseR
package, version 2. 2007.
[71] Terry M Therneau and Thomas Lumley. Package survival. Survival
analysis published on CRAN, 2:3, 2014.
[72] Jimin Ding and Jane-Ling Wang. Modeling longitudinal data with non
parametric multiplicative random effects jointly with survival data. Bio
metrics, 64(2):546–556, 2008.
Bibliography 233
[96] Xu Xu, Min Yuan, and Partha Nandy. Analysis of dose–response in flex
ible dose titration clinical studies. Pharmaceutical statistics, 11(4):280–
286, 2012.
[97] Toshihiro Misumi and Sadanori Konishi. Mixed effects historical
varying-coefficient model for evaluating dose–response in flexible dose
trials. Journal of the royal statistical society: series C (applied statis
tics), 65(2):331–344, 2016.
[98] JQ Shi, B Wang, EJ Will, and RM West. Mixed-effects gaussian process
functional regression models with application to dose–response curve
prediction. Statistics in medicine, 31(26):3165–3177, 2012.
[99] Damla Sent¨
¸ urk and Hans-Georg M¨ uller. Functional varying coefficient
models for longitudinal data. Journal of the american statistical associ
ation, 105(491):1256–1264, 2010.
[100] Trevor Hastie and Robert Tibshirani. Varying-coefficient models. Jour
nal of the royal statistical society: series B (methodological), 55(4):757–
779, 1993.
[101] Wensheng Guo. Functional mixed effects models. Biometrics, 58(1):121–
128, 2002.
[102] Jian Qing Shi and Taeryon Choi. Gaussian process regression analysis
for functional data. CRC Press, 2011.
[103] Hulin Wu and Jin-Ting Zhang. Nonparametric regression methods for
longitudinal data analysis: mixed-effects modeling approaches, volume
515. John Wiley & Sons, 2006.
[104] Harvey Goldstein. Efficient statistical modelling of longitudinal data.
Annals of human biology, 13(2):129–141, 1986.
[105] Nan M Laird and James H Ware. Random-effects models for longitudinal
data. Biometrics, pages 963–974, 1982.
[106] Douglas Bates, Martin Mächler, Ben Bolker, and Steve Walker. Fitting
linear mixed-effects models using lme4. arXiv preprint arXiv:1406.5823,
2014.
[107] Jose Pinheiro, Douglas Bates, Saikat DebRoy, Deepayan Sarkar, and
R Core Team. nlme: Linear and nonlinear mixed effects models. R
package version, 3(2), 2015.
[108] Elena N Naumova, Aviva Must, and Nan M Laird. Tutorial in biostatis
tics: evaluating the impact of critical periods in longitudinal studies of
growth using piecewise mixed effects models. International journal of
epidemiology, 30(6):1332–1341, 2001.
236 Bibliography
[122] Paul CD Johnson, Sarah JE Barry, Heather M Ferguson, and Pie Müller.
Power analysis for generalized linear mixed models in ecology and evo
lution. Methods in ecology and evolution, 6(2):133–142, 2015.
[123] Julien GA Martin, Daniel H Nussey, Alastair J Wilson, and Denis Réale.
Measuring individual differences in reaction norms in field and experi
mental studies: a power analysis of random regression models. Methods
in ecology and evolution, 2(4):362–374, 2011.
[124] Xianggui Qu. Linear mixed-effects models using R: A step-by-step ap
proach, 2014.
[125] Nicholas G Reich, Jessica A Myers, Daniel Obeng, Aaron M Milstone,
and Trish M Perl. Empirical power and sample size calculations for
cluster-randomized and cluster-randomized crossover studies. PLoS one,
7(4), 2012.
[126] Christophe Le Tourneau, J Jack Lee, and Lillian L Siu. Dose escalation
methods in phase I cancer clinical trials. JNCI: Journal of the national
cancer institute, 101(10):708–720, 2009.
[127] Wei Zhang, Daniel J Sargent, and Sumithra Mandrekar. An adaptive
dose-finding design incorporating both toxicity and efficacy. Statistics
in medicine, 25(14):2365–2383, 2006.
[128] Mei-Yin Polley and Ying Kuen Cheung. Two-stage designs for dose-
finding trials with a biologic endpoint using stepwise tests. Biometrics,
64(1):232–241, 2008.
[129] Yong Zang, J Jack Lee, and Ying Yuan. Adaptive designs for identifying
optimal biological dose for molecularly targeted agents. Clinical trials,
11(3):319–327, 2014.
[130] Fang-Zhen Shen, Jing Wang, Jun Liang, Kun Mu, Ji-Yuan Hou, and
Yan-Tao Wang. Low-dose metronomic chemotherapy with cisplatin: can
it suppress angiogenesis in h22 hepatocarcinoma cells? International
journal of experimental pathology, 91(1):10–16, 2010.
[131] Atanu Bhattacharjee and Vijay M Patil. Determining an optimum bi
ological dose of a metronomic chemotherapy. Journal of data science,
15(1):77–94, 2017.
[132] Atanu Bhattacharjee and Vijay M Patil. Time-dependent area under
the roc curve for optimum biological dose detection. Turkiye klinikleri
journal of biostatistics, 8(2), 2016.
[133] Yuval Shaked, Alessia Ciarrocchi, Marcela Franco, Christina R Lee,
Shan Man, Alison M Cheung, Daniel J Hicklin, David Chaplin, F Stuart
238 Bibliography
[158] Robert Tibshirani. The lasso method for variable selection in the cox
model. Statistics in medicine, 16(4):385–395, 1997.
[159] Jianqing Fan and Runze Li. Variable selection via nonconcave penalized
likelihood and its oracle properties. Journal of the american statistical
Association, 96(456):1348–1360, 2001.
[160] Jianqing Fan and Runze Li. Variable selection for cox’s proportional
hazards model and frailty model. Annals of statistics, pages 74–99,
2002.
[161] Hui Zou and Trevor Hastie. Regularization and variable selection via the
elastic net. Journal of the royal statistical society: series B (statistical
methodology), 67(2):301–320, 2005.
[162] Emmanuel Candes, Terence Tao, et al. The dantzig selector: Statisti
cal estimation when p is much larger than n. The annals of statistics,
35(6):2313–2351, 2007.
[163] Hemant Ishwaran, Udaya B Kogalur, Eugene H Blackstone, Michael S
Lauer, et al. Random survival forests. The annals of applied statistics,
2(3):841–860, 2008.
[164] Pierre JM Verweij and Hans C Van Houwelingen. Cross-validation in
survival analysis. Statistics in medicine, 12(24):2305–2314, 1993.
[165] Erika Graf, Claudia Schmoor, Willi Sauerbrei, and Martin Schumacher.
Assessment and comparison of prognostic classification schemes for sur
vival data. Statistics in medicine, 18(17-18):2529–2545, 1999.
[166] Pierre JM Verweij and Hans C Van Houwelingen. Penalized likelihood
in cox regression. Statistics in medicine, 13(23-24):2427–2436, 1994.
[167] Hege M Bøvelstad, Ståle Nygård, Hege L Størvold, Magne Aldrin, Ør
nulf Borgan, Arnoldo Frigessi, and Ole Christian Lingjærde. Predicting
survival from microarray data—a comparative study. Bioinformatics,
23(16):2080–2087, 2007.
Index
3+3 design, 33
covariance, 169
covariance matrix, 41
Aalen-Nelson estimator, 67
Covariance pattern, 170
ACC, 21
Coverage probability, 28
ALC, 21
CRTSize package, 29
Altshuler estimator, 67
CT scan, 199
33
Cumulative incidence function, 89
Asymptotically, 20
Autoregressive, 186
Data science, 226
Bayeslongitudinal, 134
Dose-limiting toxicities (DLTs), 32
186
Competing risk, 87
EurosarcBayes package, 29
(CRAN), 6
Fixed effect model, 188
Concordance correlation
flexsurv package, 120
coefficient(CCC), 199
Frailty data analysis, 97
Conditonal probability, 78
Frailty on recurrent events, 104
(CRM), 34
Gibbs, 7
coprimary package, 29
glm, 185
241
242 Index
Hazard function, 65
Mixed effect model, 187
171
Molecular targeted agents(MTA),
Heterogeneous first-order
186
autoregressive, 171
MRI, 199
Hypothesis, 14
JM package, 166
OpenBUGS, 7
Kaplan-Meier estimator, 66
Partial log-likelihood, 215
LASSO, 210
escalation(PGDE) design,
lme4, 186
Phase I trial, 32
lmer, 185
Phase II, 39
Log-Rank, 13
Piecewise Hazard, 118
Logistics model, 41
Piecewise hazard, 118
longpower package, 29
Posterior error rate, 23
195
Product inverse moment (piMOM),
Carlo), 7
Product limit, 66
MCMCglmm, 192
Product-moment (pMOM), 216
216
Metropolis, 7
Quadratic logistic design, 52
Microarray, 209
Index 243
RCTs, 14
Relative survival, 111, 120
Right censoring, 65
Risk, 19
Rule-based design, 32
Wald score, 71
Wald statistics, 42
Walds test, 119