Performs multivariable survival analysis using Cox proportional hazards regression. In multivariable survival analysis, person-time follow-up is crucial for properly adjusting for covariates while accounting for varying observation periods. The Cox proportional hazards model incorporates person-time by modeling the hazard function, which represents the instantaneous event rate per unit of person-time. When stratifying analyses or examining multiple predictors, the model accounts for how these factors influence event rates relative to the person-time at risk in each subgroup.
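As a rough illustration of the kind of model described above, the sketch below fits a multivariable Cox model directly with the survival package rather than through multisurvival. The covariates are chosen only for illustration, and restricting the colon data to death records (etype == 2) is an assumption made here so that each patient contributes one follow-up time.

```r
library(survival)

# colon holds two rows per patient (recurrence and death); keep the death
# records only so that each patient contributes a single follow-up time.
deaths <- subset(colon, etype == 2)

# Multivariable Cox model. The hazard is modelled as
#   h(t | x) = h0(t) * exp(b1 * sex + b2 * age + b3 * nodes)
fit <- coxph(Surv(time, status) ~ sex + age + nodes, data = deaths)

# Hazard ratios with 95% confidence intervals
hr_table <- exp(cbind(HR = coef(fit), confint(fit)))
print(round(hr_table, 3))
```

Each hazard ratio compares instantaneous event rates per unit of person-time at risk, holding the other covariates fixed.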
Usage
multisurvival(
  data,
  elapsedtime,
  tint = FALSE,
  dxdate,
  fudate,
  timetypedata = "ymd",
  timetypeoutput = "months",
  uselandmark = FALSE,
  landmark = 3,
  outcome,
  outcomeLevel,
  dod,
  dooc,
  awd,
  awod,
  analysistype = "overall",
  explanatory,
  contexpl,
  multievent = FALSE,
  hr = FALSE,
  sty = "t1",
  ph_cox = FALSE,
  km = FALSE,
  endplot = 60,
  byplot = 12,
  ci95 = FALSE,
  risktable = FALSE,
  censored = FALSE,
  medianline = "none",
  pplot = TRUE,
  cutp = "12, 36, 60",
  calculateRiskScore = FALSE,
  numRiskGroups = "four",
  plotRiskGroups = FALSE,
  ac = FALSE,
  adjexplanatory,
  ac_method = "average",
  showNomogram = FALSE,
  use_modelSelection = FALSE,
  modelSelection = "enter",
  selectionCriteria = "aic",
  pEntry = 0.05,
  pRemoval = 0.1,
  use_stratify = FALSE,
  stratvar,
  person_time = FALSE,
  time_intervals = "12, 36, 60",
  rate_multiplier = 100,
  use_tree = FALSE,
  min_node = 20,
  complexity = 0.01,
  max_depth = 5,
  show_terminal_nodes = TRUE,
  use_time_dependent = FALSE,
  td_format = "wide",
  time_dep_vars,
  change_times = "6, 12, 18",
  td_suffix_pattern = "_t{time}",
  start_time_var,
  stop_time_var,
  use_frailty = FALSE,
  frailty_var,
  frailty_distribution = "gamma",
  use_splines = FALSE,
  spline_vars,
  spline_df = 3,
  spline_type = "pspline"
)
Arguments
- data
The dataset to be analyzed, provided as a data frame. Must contain the variables specified in the options below.
- elapsedtime
The numeric variable representing follow-up time until the event or last observation. If tint = FALSE, this should be a pre-calculated numeric time variable. If tint = TRUE, dxdate and fudate will be used to calculate this time.
- tint
If true, survival time will be calculated from dxdate and fudate. If false, elapsedtime should be provided as a pre-calculated numeric variable.
- dxdate
Date of diagnosis. Required if tint = TRUE. Must match the format specified in timetypedata.
- fudate
Follow-up date or date of last observation. Required if tint = TRUE. Must match the format specified in timetypedata.
- timetypedata
Specifies the format of the date variables in the input data. This is critical if tint = TRUE, as dxdate and fudate will be parsed according to this format to calculate survival time. For example, if your data files record dates as "YYYY-MM-DD", select ymd.
- timetypeoutput
The units in which survival time is reported in the output. Choose from days, weeks, months, or years.
- uselandmark
If true, applies a landmark analysis starting at a specified time point.
- landmark
The time point (in the units defined by timetypeoutput) at which to start landmark analyses. Only used if uselandmark = TRUE.
- outcome
The outcome variable. Typically indicates event status (e.g., death, recurrence). For survival analysis, this may be a factor or numeric event indicator.
- outcomeLevel
The level of outcome considered as the event. For example, if outcome is a factor, specify which level indicates the event occurrence.
- dod
The level of outcome corresponding to death due to disease, if applicable.
- dooc
The level of outcome corresponding to death due to other causes, if applicable.
- awd
The level of outcome corresponding to alive with disease, if applicable.
- awod
The level of outcome corresponding to alive without disease, if applicable.
- analysistype
Type of survival analysis:
  - overall: All-cause survival
  - cause: Cause-specific survival
  - compete: Competing risks analysis
- explanatory
Categorical explanatory (predictor) variables included in the Cox model.
- contexpl
Continuous explanatory (predictor) variables included in the Cox model.
- multievent
If true, multiple event levels will be considered for competing risks analysis. Requires specifying dod, dooc, etc.
- hr
If true, generates a plot of hazard ratios for each explanatory variable in the Cox model.
- sty
The style of the hazard ratio (forest) plot. "finalfit" or "survminer forestplot".
- ph_cox
If true, tests the proportional hazards assumption for the Cox model. Use if you suspect violations of the PH assumption.
- km
If true, produces a Kaplan-Meier survival plot. Useful for visualization of survival functions without covariate adjustment.
- endplot
The maximum follow-up time (in units defined by timetypeoutput) to display on survival plots.
- byplot
The interval (in units defined by timetypeoutput) at which time points or labels are shown on plots.
- ci95
If true, displays 95% confidence intervals around the survival estimates on plots.
- risktable
If true, displays the number of subjects at risk at each time point below the survival plot.
- censored
If true, marks censored observations (e.g., using tick marks) on the survival plot.
- medianline
Specifies whether and how a line marking the median survival time is drawn on the survival plot. The default, "none", draws no line.
- pplot
If true, displays the p-value from the survival comparison test on the survival plot.
- cutp
.
- calculateRiskScore
If true, calculates a risk score from the Cox model coefficients for each individual.
- numRiskGroups
Select the number of risk groups to create from the risk scores. The data will be divided into equal quantiles based on this selection.
- plotRiskGroups
If true, stratifies individuals into risk groups based on their calculated risk scores and plots their survival curves.
- ac
.
- adjexplanatory
.
- ac_method
Method for computing adjusted survival curves.
- showNomogram
.
- use_modelSelection
If true, applies a variable selection procedure to find the best-fitting model based on criteria like AIC or likelihood ratio tests.
- modelSelection
The method used to select variables:
  - enter: Includes all variables (no selection)
  - forward: Adds variables one at a time if they improve the model
  - backward: Removes variables that do not significantly contribute
  - both: Combination of forward and backward steps
- selectionCriteria
The criterion used for adding or removing variables in model selection:
  - aic: Balances model fit and complexity
  - lrt: Uses likelihood ratio tests to decide inclusion/removal
- pEntry
Significance level at which a variable enters the model during forward or stepwise selection.
- pRemoval
Significance level at which a variable is removed from the model during backward or stepwise selection.
- use_stratify
If true, uses stratification to handle variables that violate the proportional hazards assumption. Stratification creates separate baseline hazard functions for different groups.
- stratvar
Variables used for stratification. When proportional hazards are not met, stratification can adjust the model to better fit the data by allowing different baseline hazards.
- person_time
Enable this option to calculate and display person-time metrics, including total follow-up time and incidence rates. These metrics help quantify the rate of events per unit of time in your study population.
- time_intervals
Specify time intervals for stratified person-time analysis. Enter a comma-separated list of time points to create intervals. For example, "12, 36, 60" will create intervals 0-12, 12-36, 36-60, and 60+.
- rate_multiplier
Specify the multiplier for incidence rates (e.g., 100 for rates per 100 person-years, 1000 for rates per 1000 person-years).
- use_tree
If true, fits a survival decision tree to identify subgroups with different survival outcomes. Decision trees provide an intuitive alternative to Cox regression for identifying risk factors.
- min_node
The minimum number of observations required in a terminal node. Larger values create simpler trees that may be more generalizable but potentially miss important subgroups.
- complexity
The complexity parameter for tree pruning. Higher values result in smaller trees. This parameter controls the trade-off between tree size and goodness of fit.
- max_depth
The maximum depth of the decision tree. Limits the complexity of the tree to avoid overfitting.
- show_terminal_nodes
If true, displays Kaplan-Meier survival curves for each terminal node of the decision tree.
- use_time_dependent
Enable time-dependent covariates for Cox regression. This allows modeling variables that change values at specific time points during follow-up (e.g., treatment changes, biomarker measurements, disease progression).
- td_format
Specify whether your data is in wide format (one row per subject with time points as separate variables) or long format (multiple rows per subject with time intervals).
- time_dep_vars
Variables that change values over time. In wide format, these are baseline variables that will be updated at change points. In long format, these are the time-varying variables.
- change_times
Time points (in same units as survival time) when time-dependent variables change. For wide format data, specify comma-separated time points (e.g., "6, 12, 18"). The function will create intervals and update covariate values at these times.
- td_suffix_pattern
For wide format: Pattern for time-specific variable names. Use {time} as the placeholder. Example: if the baseline variable is 'treatment' and the pattern is '_t{time}', the function looks for 'treatment_t6', 'treatment_t12', etc.
- start_time_var
For long format only: Variable indicating the start time of each interval. Leave empty for wide format data.
- stop_time_var
For long format only: Variable indicating the stop time of each interval. Leave empty for wide format data.
- use_frailty
Add a frailty term to account for unobserved heterogeneity or clustering in the data. Frailty models add random effects to the Cox model.
- frailty_var
Clustering variable for the frailty term (e.g., hospital, family, or study center). Each level represents a cluster with shared frailty.
- frailty_distribution
Distribution of the frailty term. Gamma is most commonly used and assumes multiplicative effect on the hazard. Gaussian assumes additive effect on log-hazard.
- use_splines
Use penalized splines to model time-varying effects (non-proportional hazards). This is an alternative to stratification for handling PH violations.
- spline_vars
Variables to model with time-varying coefficients using splines. These are variables that violate the proportional hazards assumption.
- spline_df
Degrees of freedom for the spline functions. Higher values allow more flexible time-varying effects but may lead to overfitting.
- spline_type
Type of spline basis to use. Penalized splines provide smooth functions with automatic smoothness selection. Natural splines are constrained to be linear at the boundaries.
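The proportional hazards test, stratification, frailty, and spline options above correspond to standard tools in the survival package. The sketch below shows those tools called directly; whether multisurvival builds its models in exactly this way is an assumption, and the data sets and covariates (colon, kidney, sex, age, obstruct) are chosen only for illustration. Note that pspline() as used here fits a smooth non-linear effect of a covariate, which is not the same as a coefficient that varies over follow-up time.

```r
library(survival)

# One row per patient: keep the death records of the colon data.
deaths <- subset(colon, etype == 2)

# Test of the proportional hazards assumption (cf. ph_cox),
# based on scaled Schoenfeld residuals.
base_fit <- coxph(Surv(time, status) ~ sex + age + obstruct, data = deaths)
ph_test <- cox.zph(base_fit)
print(ph_test)

# Stratification (cf. use_stratify, stratvar): a separate baseline
# hazard for each level of sex, with shared covariate effects.
strat_fit <- coxph(Surv(time, status) ~ age + obstruct + strata(sex),
                   data = deaths)

# Shared gamma frailty (cf. use_frailty, frailty_var): the kidney data
# have two observations per patient, so id acts as the cluster.
frailty_fit <- coxph(
  Surv(time, status) ~ age + sex + frailty(id, distribution = "gamma"),
  data = kidney
)

# Penalized spline term (cf. use_splines, spline_df).
spline_fit <- coxph(Surv(time, status) ~ sex + pspline(age, df = 3),
                    data = deaths)

print(strat_fit)
print(frailty_fit)
print(spline_fit)
```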
Value
A results object containing:
results$todo | a html
results$text | a html
results$text2 | a html
results$personTimeTable | a table
results$personTimeSummary | a html
results$plot | an image
results$plot3 | an image
results$cox_ph | a preformatted
results$plot8 | an image
results$plotKM | an image
results$risk_score_analysis | a preformatted
results$risk_score_analysis2 | a html
results$riskScoreTable | a table
results$riskScoreMetrics | a html
results$riskGroupPlot | an image
results$stratificationExplanation | a html
results$calculatedtime | an output
results$outcomeredefined | an output
results$addRiskScore | an output
results$addRiskGroup | an output
results$plot_adj | an image
results$plot_nomogram | an image
results$nomogram_display | a html
results$tree_summary | a html
results$tree_plot | an image
results$node_survival_plots | an image
results$mydataview_modelselection | a preformatted
results$text_model_selection | a html
results$selection_method | a html
results$text2_model_selection | a html
Tables can be converted to data frames with asDF or as.data.frame. For example:
results$personTimeTable$asDF
as.data.frame(results$personTimeTable)
Examples
# Example 1: Basic multivariable Cox regression
library(survival)
data(colon)
multisurvival(
  data = colon,
  elapsedtime = "time",
  outcome = "status",
  outcomeLevel = "1",
  explanatory = c("sex", "obstruct", "perfor"),
  contexpl = c("age", "nodes"),
  timetypeoutput = "days",
  hr = TRUE  # Show hazard ratio plot
)
#> Error in multisurvival(data = colon, elapsedtime = "time", outcome = "status", outcomeLevel = "1", explanatory = c("sex", "obstruct", "perfor"), contexpl = c("age", "nodes"), timetypeoutput = "days", hr = TRUE): argument "adjexplanatory" is missing, with no default
# Example 2: Using dates to calculate survival time
# mydata is a placeholder for your own data frame containing
# diagnosis and follow-up date columns
multisurvival(
  data = mydata,
  tint = TRUE,
  dxdate = "diagnosis_date",
  fudate = "last_followup_date",
  timetypedata = "ymd",
  timetypeoutput = "months",
  outcome = "vital_status",
  outcomeLevel = "Dead",
  explanatory = c("stage", "grade"),
  contexpl = "age"
)
#> Error: object 'mydata' not found
# Example 3: Risk stratification analysis
# addRiskScore and addRiskGroup are outputs of the results object
# (see Value), not arguments, so they are not passed here
multisurvival(
  data = colon,
  elapsedtime = "time",
  outcome = "status",
  outcomeLevel = "1",
  explanatory = c("sex", "obstruct"),
  contexpl = c("age", "nodes"),
  calculateRiskScore = TRUE,
  numRiskGroups = "three",
  plotRiskGroups = TRUE
)
# Example 4: Model with stratification for non-proportional hazards
multisurvival(
  data = colon,
  elapsedtime = "time",
  outcome = "status",
  outcomeLevel = "1",
  explanatory = c("obstruct", "perfor"),
  contexpl = c("age", "nodes"),
  use_stratify = TRUE,
  stratvar = "sex",  # Stratify by sex if PH assumption violated
  ph_cox = TRUE      # Test proportional hazards assumption
)
#> Error in multisurvival(data = colon, elapsedtime = "time", outcome = "status", outcomeLevel = "1", explanatory = c("obstruct", "perfor"), contexpl = c("age", "nodes"), use_stratify = TRUE, stratvar = "sex", ph_cox = TRUE): argument "adjexplanatory" is missing, with no default
# Example 5: Stepwise model selection
multisurvival(
  data = colon,
  elapsedtime = "time",
  outcome = "status",
  outcomeLevel = "1",
  explanatory = c("sex", "obstruct", "perfor", "adhere"),
  contexpl = c("age", "nodes"),
  use_modelSelection = TRUE,
  modelSelection = "both",  # Stepwise selection
  selectionCriteria = "aic",
  pEntry = 0.05,
  pRemoval = 0.10
)
#> Error in multisurvival(data = colon, elapsedtime = "time", outcome = "status", outcomeLevel = "1", explanatory = c("sex", "obstruct", "perfor", "adhere"), contexpl = c("age", "nodes"), use_modelSelection = TRUE, modelSelection = "both", selectionCriteria = "aic", pEntry = 0.05, pRemoval = 0.1): argument "adjexplanatory" is missing, with no default
# Example 6: Person-time analysis
multisurvival(
  data = colon,
  elapsedtime = "time",
  outcome = "status",
  outcomeLevel = "1",
  explanatory = "sex",
  contexpl = "age",
  person_time = TRUE,
  time_intervals = "180, 365, 730",  # 6mo, 1yr, 2yr
  rate_multiplier = 1000             # Rate per 1000 person-days
)
#> Error in multisurvival(data = colon, elapsedtime = "time", outcome = "status", outcomeLevel = "1", explanatory = "sex", contexpl = "age", person_time = TRUE, time_intervals = "180, 365, 730", rate_multiplier = 1000): argument "adjexplanatory" is missing, with no default
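To make the person-time metrics in Example 6 concrete, the sketch below computes interval-specific person-time and incidence rates by hand in base R, using the same cut points (180, 365, 730 days) and a multiplier of 1000. This is only an illustration of the arithmetic; the interval convention (events counted in the interval where follow-up ends) and the restriction to death records are assumptions made here, not a description of how multisurvival computes its table.

```r
library(survival)

# One row per patient: keep the death records of the colon data.
deaths <- subset(colon, etype == 2)

# Interval boundaries in days: 0-180, 180-365, 365-730, 730+
breaks <- c(0, 180, 365, 730, Inf)
lower <- head(breaks, -1)
upper <- tail(breaks, -1)

# Person-days each patient contributes inside an interval:
# time spent between lower and upper, never negative.
person_days <- mapply(
  function(lo, hi) sum(pmax(0, pmin(deaths$time, hi) - lo)),
  lower, upper
)

# Events are counted in the interval where follow-up ends.
events <- mapply(
  function(lo, hi) sum(deaths$status == 1 & deaths$time > lo & deaths$time <= hi),
  lower, upper
)

rate_table <- data.frame(
  interval = paste0(lower, "-", upper),
  events = events,
  person_days = person_days,
  rate_per_1000 = events / person_days * 1000
)
print(rate_table)
```

The interval person-days add up to the total follow-up time and the interval events add up to the total number of deaths, which is a quick consistency check on any person-time table.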