Performs multivariable survival analysis using Cox proportional hazards regression. In multivariable survival analysis, person-time follow-up is crucial for properly adjusting for covariates while accounting for varying observation periods. The Cox proportional hazards model incorporates person-time by modeling the hazard function, which represents the instantaneous event rate per unit of person-time. When stratifying analyses or examining multiple predictors, the model accounts for how these factors influence event rates relative to the person-time at risk in each subgroup.

Usage

multisurvival(
  data,
  elapsedtime = NULL,
  tint = FALSE,
  dxdate = NULL,
  fudate = NULL,
  timetypedata = "ymd",
  timetypeoutput = "months",
  uselandmark = FALSE,
  landmark = 3,
  outcome = NULL,
  outcomeLevel,
  dod,
  dooc,
  awd,
  awod,
  analysistype = "overall",
  explanatory = NULL,
  contexpl = NULL,
  multievent = FALSE,
  hr = FALSE,
  sty = "t1",
  ph_cox = FALSE,
  km = FALSE,
  endplot = 60,
  byplot = 12,
  ci95 = FALSE,
  risktable = FALSE,
  censored = FALSE,
  medianline = "none",
  pplot = FALSE,
  cutp = "12, 36, 60",
  calculateRiskScore = FALSE,
  numRiskGroups = "four",
  plotRiskGroups = FALSE,
  ac = FALSE,
  adjexplanatory = NULL,
  ac_method = "average",
  showNomogram = FALSE,
  use_stratify = FALSE,
  stratvar = NULL,
  person_time = FALSE,
  time_intervals = "12, 36, 60",
  rate_multiplier = 100,
  use_tree = FALSE,
  min_node = 20,
  complexity = 0.01,
  max_depth = 5,
  show_terminal_nodes = FALSE,
  use_time_dependent = FALSE,
  td_format = "wide",
  time_dep_vars = NULL,
  change_times = "6, 12, 18",
  td_suffix_pattern = "_t{time}",
  start_time_var = NULL,
  stop_time_var = NULL,
  use_frailty = FALSE,
  frailty_var = NULL,
  frailty_distribution = "gamma",
  use_splines = FALSE,
  spline_vars = NULL,
  spline_df = 3,
  spline_type = "pspline",
  showExplanations = FALSE,
  showSummaries = FALSE,
  ml_method = "none",
  ml_validation = "cv",
  ml_cv_folds = 5,
  ml_feature_selection = FALSE,
  ml_importance = FALSE,
  ml_calibration = FALSE,
  ml_performance = FALSE,
  ml_shap = FALSE,
  ml_hyperparameter_tuning = FALSE,
  ml_ensemble_weights = "equal"
)

Arguments

data

The dataset to be analyzed, provided as a data frame. Must contain the variables specified in the options below.

elapsedtime

The numeric variable representing follow-up time until the event or last observation. If tint = false, this should be a pre-calculated numeric time variable. If tint = true, dxdate and fudate will be used to calculate this time.

tint

If true, survival time will be calculated from dxdate and fudate. If false, elapsedtime should be provided as a pre-calculated numeric variable.

dxdate

Date of diagnosis. Required if tint = true. Accepts: (1) Date/datetime text, (2) Numeric Unix epoch seconds (from DateTime Converter's corrected_datetime_numeric output), (3) Numeric datetime values from R. Time intervals calculated as difference from follow-up date.

fudate

Follow-up date or date of last observation. Required if tint = true. Accepts: (1) Date/datetime text, (2) Numeric Unix epoch seconds (from DateTime Converter's corrected_datetime_numeric output), (3) Numeric datetime values from R. Must be in same format as diagnosis date.

timetypedata

Specifies the format of the date variables in the input data. This is critical if tint = true, as dxdate and fudate will be parsed according to this format to calculate survival time. For example, if your data files record dates as "YYYY-MM-DD", select ymd.
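As a rough illustration of what calculating survival time from dates involves, the base R sketch below parses two "YYYY-MM-DD" columns and converts the difference to days and months. The column names are hypothetical, and the average month length of 30.44 days is an assumption made for this sketch; the function's internal conversion may differ.

```r
# Hypothetical data; the column names are illustrative only.
d <- data.frame(
  diagnosis_date     = c("2020-01-15", "2019-06-01"),
  last_followup_date = c("2021-01-15", "2020-12-01")
)

# Parse the text dates using the "ymd" layout.
dx <- as.Date(d$diagnosis_date, format = "%Y-%m-%d")
fu <- as.Date(d$last_followup_date, format = "%Y-%m-%d")

# Elapsed time in days, then converted to months.
elapsed_days   <- as.numeric(difftime(fu, dx, units = "days"))
elapsed_months <- elapsed_days / 30.44  # assumed average month length

print(data.frame(elapsed_days, elapsed_months))
```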

timetypeoutput

The units in which survival time is reported in the output. Choose from days, weeks, months, or years.

uselandmark

If true, applies a landmark analysis starting at a specified time point.

landmark

The time point (in the units defined by timetypeoutput) at which to start landmark analyses. Only used if uselandmark = true.
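A landmark analysis can be sketched in base R as follows: subjects whose follow-up ends at or before the landmark are dropped, and the clock restarts at the landmark. Both choices (the strict inequality and resetting the time origin) are assumptions made for this illustration rather than a description of the function's internals.

```r
# Toy data: follow-up time in months and an event indicator.
d <- data.frame(time = c(2, 5, 10, 24), status = c(1, 1, 0, 1))
landmark <- 3

# Keep only subjects still under observation after the landmark.
lm_data <- d[d$time > landmark, ]

# Measure time from the landmark rather than from baseline.
lm_data$time <- lm_data$time - landmark

print(lm_data)
```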

outcome

The outcome variable. Typically indicates event status (e.g., death, recurrence). For survival analysis, this may be a factor or numeric event indicator.

outcomeLevel

The level of outcome considered as the event. For example, if outcome is a factor, specify which level indicates the event occurrence.

dod

The level of outcome corresponding to death due to disease, if applicable.

dooc

The level of outcome corresponding to death due to other causes, if applicable.

awd

The level of outcome corresponding to alive with disease, if applicable.

awod

The level of outcome corresponding to alive without disease, if applicable.

analysistype

Type of survival analysis:
- overall: All-cause survival
- cause: Cause-specific survival
- compete: Competing risks analysis

explanatory

Categorical explanatory (predictor) variables included in the Cox model.

contexpl

Continuous explanatory (predictor) variables included in the Cox model.

multievent

If true, multiple event levels will be considered for competing risks analysis. Requires specifying dod, dooc, etc.

hr

If true, generates a plot of hazard ratios for each explanatory variable in the Cox model.
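The hazard ratios shown in such a plot are the exponentiated Cox coefficients. A minimal sketch using the survival package directly, with its bundled lung data standing in for your own dataset, might look like this:

```r
library(survival)

# Multivariable Cox model on the lung data shipped with survival.
fit <- coxph(Surv(time, status) ~ age + factor(sex) + ph.ecog, data = lung)

# Hazard ratios with 95% confidence limits: exponentiate the
# coefficients and their confidence intervals.
hr_table <- exp(cbind(HR = coef(fit), confint(fit)))
print(round(hr_table, 3))
```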

sty

The style of the hazard ratio (forest) plot. "finalfit" or "survminer forestplot".

ph_cox

If true, tests the proportional hazards assumption for the Cox model. Use if you suspect violations of the PH assumption.
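One common way to test the proportional hazards assumption is the scaled Schoenfeld residual test in survival::cox.zph. Whether this option uses exactly that test is an assumption; the sketch only shows the general technique.

```r
library(survival)

fit <- coxph(Surv(time, status) ~ age + sex, data = lung)

# Per-covariate and global tests of proportional hazards.
# Small p-values suggest the effect changes over time.
zph <- cox.zph(fit)
print(zph)
```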

km

If true, produces a Kaplan-Meier survival plot. Useful for visualization of survival functions without covariate adjustment.

endplot

The maximum follow-up time (in units defined by timetypeoutput) to display on survival plots.

byplot

The interval (in units defined by timetypeoutput) at which time points or labels are shown on plots.

ci95

If true, displays 95% confidence intervals around the survival estimates on plots.

risktable

If true, displays the number of subjects at risk at each time point below the survival plot.

censored

If true, marks censored observations (e.g., using tick marks) on the survival plot.

medianline

Controls whether a line indicating the median survival time is drawn on the survival plot. The default, "none", draws no line.

pplot

If true, displays the p-value from the survival comparison test on the survival plot.

cutp

.

calculateRiskScore

If true, calculates a risk score from the Cox model coefficients for each individual.

numRiskGroups

Select the number of risk groups to create from the risk scores. The data will be divided into equal quantiles based on this selection.

plotRiskGroups

If true, stratifies individuals into risk groups based on their calculated risk scores and plots their survival curves.
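The idea behind the risk score options can be sketched with the survival package: take the linear predictor of a fitted Cox model as the score and cut it at equal quantiles. Using the linear predictor and the group labels below are assumptions for illustration only.

```r
library(survival)

fit <- coxph(Surv(time, status) ~ age + sex, data = lung)

# Risk score: the Cox linear predictor for each subject.
risk_score <- predict(fit, type = "lp")

# Three groups split at the tertiles of the score.
breaks <- quantile(risk_score, probs = seq(0, 1, length.out = 4))
risk_group <- cut(
  risk_score,
  breaks = breaks,
  include.lowest = TRUE,
  labels = c("Low", "Intermediate", "High")  # illustrative labels
)

print(table(risk_group))
```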

ac

.

adjexplanatory

.

ac_method

Method for computing adjusted survival curves

showNomogram

.

use_stratify

If true, uses stratification to handle variables that violate the proportional hazards assumption. Stratification creates separate baseline hazard functions for different groups.

stratvar

Variables used for stratification. When proportional hazards are not met, stratification can adjust the model to better fit the data by allowing different baseline hazards.
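In the survival package, stratification is expressed with strata() in the model formula. The stratifying variable gets its own baseline hazard and therefore no coefficient, as this small sketch on the lung data shows:

```r
library(survival)

# sex is stratified on, so it has a separate baseline hazard per level
# and does not appear among the estimated coefficients.
fit <- coxph(Surv(time, status) ~ age + ph.ecog + strata(sex), data = lung)

print(coef(fit))
```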

person_time

Enable this option to calculate and display person-time metrics, including total follow-up time and incidence rates. These metrics help quantify the rate of events per unit of time in your study population.

time_intervals

Specify time intervals for stratified person-time analysis. Enter a comma-separated list of time points to create intervals. For example, "12, 36, 60" will create intervals 0-12, 12-36, 36-60, and 60+.

rate_multiplier

Specify the multiplier for incidence rates (e.g., 100 for rates per 100 person-years, 1000 for rates per 1000 person-years).
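The person-time calculation can be sketched by splitting each subject's follow-up at the interval boundaries and dividing events by person-time within each interval. The toy data, variable names, and the use of survival::survSplit are assumptions chosen for illustration, not a description of the function's internals.

```r
library(survival)

# Toy data: follow-up time in months and an event indicator.
d <- data.frame(id = 1:4, time = c(10, 20, 40, 70), status = c(1, 0, 1, 1))
cuts <- c(12, 36, 60)
rate_multiplier <- 100

# Split follow-up into the intervals 0-12, 12-36, 36-60, and 60+.
pieces <- survSplit(Surv(time, status) ~ ., data = d,
                    cut = cuts, episode = "interval")

# Person-time and events contributed within each interval.
person_time <- tapply(pieces$time - pieces$tstart, pieces$interval, sum)
events      <- tapply(pieces$status, pieces$interval, sum)

rates <- data.frame(
  interval    = c("0-12", "12-36", "36-60", "60+"),
  person_time = as.numeric(person_time),
  events      = as.numeric(events)
)
rates$rate <- rates$events / rates$person_time * rate_multiplier

print(rates)
```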

use_tree

If true, fits a survival decision tree to identify subgroups with different survival outcomes. Decision trees provide an intuitive alternative to Cox regression for identifying risk factors.

min_node

The minimum number of observations required in a terminal node. Larger values create simpler trees that may be more generalizable but potentially miss important subgroups.

complexity

The complexity parameter for tree pruning. Higher values result in smaller trees. This parameter controls the trade-off between tree size and goodness of fit.

max_depth

The maximum depth of the decision tree. Limits the complexity of the tree to avoid overfitting.

show_terminal_nodes

If true, displays Kaplan-Meier survival curves for each terminal node of the decision tree.
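A survival tree with comparable controls can be fitted with the rpart package, which accepts a Surv response. Mapping min_node, complexity, and max_depth onto rpart's minbucket, cp, and maxdepth is an assumption made for this sketch.

```r
library(survival)
library(rpart)

tree <- rpart(
  Surv(time, status) ~ age + sex + ph.ecog,
  data = lung,
  control = rpart.control(
    minbucket = 20,   # assumed counterpart of min_node
    cp        = 0.01, # assumed counterpart of complexity
    maxdepth  = 5     # assumed counterpart of max_depth
  )
)

print(tree)
```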

use_time_dependent

Enable time-dependent covariates for Cox regression. This allows modeling variables that change values at specific time points during follow-up (e.g., treatment changes, biomarker measurements, disease progression).

td_format

Specify whether your data is in wide format (one row per subject with time points as separate variables) or long format (multiple rows per subject with time intervals).

time_dep_vars

Variables that change values over time. In wide format, these are baseline variables that will be updated at change points. In long format, these are the time-varying variables.

change_times

Time points (in same units as survival time) when time-dependent variables change. For wide format data, specify comma-separated time points (e.g., "6, 12, 18"). The function will create intervals and update covariate values at these times.

td_suffix_pattern

For wide format: Pattern for time-specific variable names. Use {time} as the placeholder. Example: if the baseline variable is 'treatment' and the pattern is '_t{time}', the function looks for 'treatment_t6', 'treatment_t12', etc.

start_time_var

For long format only: Variable indicating the start time of each interval. Leave empty for wide format data.

stop_time_var

For long format only: Variable indicating the stop time of each interval. Leave empty for wide format data.
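To make the wide versus long distinction concrete, the sketch below converts a small wide dataset with one change time (6) into start/stop rows. The data, the column treatment_t6, and the use of survival::survSplit are assumptions for illustration; the function's own conversion may work differently.

```r
library(survival)

# Wide format: one row per subject, with the value after time 6
# stored in a suffixed column (pattern "_t{time}").
wide <- data.frame(
  id           = 1:3,
  time         = c(4, 10, 15),
  status       = c(1, 0, 1),
  treatment    = c("A", "A", "B"),
  treatment_t6 = c("A", "B", "B")
)

# Long format: split follow-up at the change time.
long <- survSplit(Surv(time, status) ~ ., data = wide,
                  cut = 6, episode = "period")
long <- long[order(long$id, long$tstart), ]

# Use the baseline value before the change time and the updated value after.
long$treatment_td <- ifelse(long$period == 1,
                            long$treatment, long$treatment_t6)

print(long[, c("id", "tstart", "time", "status", "treatment_td")])
```

A model on such data would use the counting-process form Surv(tstart, time, status) on the left-hand side of the formula.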

use_frailty

Add a frailty term to account for unobserved heterogeneity or clustering in the data. Frailty models add random effects to the Cox model.

frailty_var

Clustering variable for the frailty term (e.g., hospital, family, or study center). Each level represents a cluster with shared frailty.

frailty_distribution

Distribution of the frailty term. Gamma is most commonly used and assumes multiplicative effect on the hazard. Gaussian assumes additive effect on log-hazard.
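In the survival package, a shared frailty is added with a frailty() term in the formula. The sketch uses the kidney data bundled with survival, where id identifies the patient (cluster); it illustrates the technique and is not necessarily how this function builds its model.

```r
library(survival)

# Gamma-distributed shared frailty for repeated observations per patient.
fit <- coxph(
  Surv(time, status) ~ age + sex + frailty(id, distribution = "gamma"),
  data = kidney
)

print(fit)
```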

use_splines

Use penalized splines to model time-varying effects (non-proportional hazards). This is an alternative to stratification for handling PH violations.

spline_vars

Variables to model with time-varying coefficients using splines. These are variables that violate the proportional hazards assumption.

spline_df

Degrees of freedom for the spline functions. Higher values allow more flexible time-varying effects but may lead to overfitting.

spline_type

Type of spline basis to use. Penalized splines provide smooth functions with automatic smoothness selection. Natural splines are constrained to be linear at the boundaries.
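For reference, a penalized spline term in the survival package is written with pspline(). The sketch below fits a smooth term for age with 3 degrees of freedom. Note that this models a flexible covariate effect; how the function combines such terms with time is not documented here, so treat the correspondence as an assumption.

```r
library(survival)

# Penalized spline for age with 3 degrees of freedom, adjusted for sex.
fit <- coxph(Surv(time, status) ~ pspline(age, df = 3) + sex, data = lung)

print(fit)
```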

showExplanations

Display detailed explanations for each analysis component to help interpret the statistical methods and results.

showSummaries

Display natural language summaries alongside tables and plots. These summaries provide plain-language interpretations of the statistical results. Turn off to reduce visual clutter when summaries are not needed.

ml_method

Machine learning survival analysis method

ml_validation

Model validation approach

ml_cv_folds

Number of cross-validation folds.

ml_feature_selection

Enable feature selection

ml_importance

Show variable importance

ml_calibration

Generate calibration plot

ml_performance

Show performance metrics

ml_shap

Compute SHAP values

ml_hyperparameter_tuning

Enable hyperparameter tuning

ml_ensemble_weights

Ensemble model weights

Value

A results object containing:

results$todo                              a html
results$multivariableCoxHeading           a preformatted
results$text                              a html
results$text2                             a html
results$multivariableCoxSummaryHeading    a preformatted
results$multivariableCoxSummary           a html
results$personTimeHeading                 a preformatted
results$personTimeTable                   a table
results$personTimeSummaryHeading          a preformatted
results$personTimeSummary                 a html
results$survivalPlotsHeading              a preformatted
results$plot                              an image
results$plot3                             an image
results$cox_ph                            a preformatted
results$plot8                             an image
results$plotKM                            an image
results$risk_score_analysis               a preformatted
results$risk_score_analysis2              a html
results$riskScoreHeading                  a preformatted
results$riskScoreSummaryHeading           a preformatted
results$riskScoreTable                    a table
results$riskScoreSummary                  a html
results$riskScoreMetrics                  a html
results$riskGroupPlot                     an image
results$stratificationExplanation         a html
results$calculatedtime                    an output
results$outcomeredefined                  an output
results$addRiskScore                      an output
results$addRiskGroup                      an output
results$adjustedSurvivalHeading           a preformatted
results$plot_adj                          an image
results$adjustedSurvivalSummaryHeading    a preformatted
results$adjustedSurvivalSummary           a html
results$nomogramHeading                   a preformatted
results$plot_nomogram                     an image
results$nomogram_display                  a html
results$nomogramSummaryHeading            a preformatted
results$nomogramSummary                   a html
results$mydataview_survivaldecisiontree   a preformatted
results$survivalTreeHeading               a preformatted
results$treeSummaryHeading                a preformatted
results$tree_summary                      a html
results$tree_plot                         an image
results$node_survival_plots               an image
results$multivariableCoxExplanation       a html
results$multivariableCoxHeading3          a preformatted
results$adjustedSurvivalExplanation       a html
results$riskScoreExplanation              a html
results$nomogramExplanation               a html
results$personTimeExplanation             a html
results$stratifiedAnalysisExplanation     a html
results$survivalPlotsHeading3             a preformatted
results$survivalPlotsExplanation          a html
results$ml_variable_importance            a table
results$ml_performance_metrics            a html
results$ml_feature_selection_results      a table
results$ml_ensemble_summary               a html
results$ml_prediction_intervals           a table
results$ml_cross_validation_summary       a html

Tables can be converted to data frames with asDF or as.data.frame. For example:

results$personTimeTable$asDF

as.data.frame(results$personTimeTable)

Examples

# Example 1: Basic multivariable Cox regression
library(survival)
data(colon)

multisurvival(
    data = colon,
    elapsedtime = "time",
    outcome = "status",
    outcomeLevel = "1",
    explanatory = c("sex", "obstruct", "perfor"),
    contexpl = c("age", "nodes"),
    timetypeoutput = "days",
    hr = TRUE  # Show hazard ratio plot
)

# Example 2: Using dates to calculate survival time
# Assuming you have diagnosis and follow-up dates
multisurvival(
    data = mydata,
    tint = TRUE,
    dxdate = "diagnosis_date",
    fudate = "last_followup_date",
    timetypedata = "ymd",
    timetypeoutput = "months",
    outcome = "vital_status",
    outcomeLevel = "Dead",
    explanatory = c("stage", "grade"),
    contexpl = "age"
)

# Example 3: Risk stratification analysis
multisurvival(
    data = colon,
    elapsedtime = "time",
    outcome = "status",
    outcomeLevel = "1",
    explanatory = c("sex", "obstruct"),
    contexpl = c("age", "nodes"),
    calculateRiskScore = TRUE,
    numRiskGroups = "three",
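    plotRiskGroups = TRUE
    # The per-subject risk score and risk group are returned as the outputs
    # results$addRiskScore and results$addRiskGroup; they are not arguments
    # in the Usage signature, so they are not passed here.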
)

# Example 4: Model with stratification for non-proportional hazards
multisurvival(
    data = colon,
    elapsedtime = "time",
    outcome = "status",
    outcomeLevel = "1",
    explanatory = c("obstruct", "perfor"),
    contexpl = c("age", "nodes"),
    use_stratify = TRUE,
    stratvar = "sex",  # Stratify by sex if PH assumption violated
    ph_cox = TRUE      # Test proportional hazards assumption
)

# Example 5: Stepwise model selection
# multisurvival(
#     data = colon,
#     elapsedtime = "time",
#     outcome = "status",
#     outcomeLevel = "1",
#     explanatory = c("sex", "obstruct", "perfor", "adhere"),
#     contexpl = c("age", "nodes"),
#     use_modelSelection = TRUE,
#     modelSelection = "both",  # Stepwise selection
#     selectionCriteria = "aic",
#     pEntry = 0.05,
#     pRemoval = 0.10
# )

# Example 6: Person-time analysis
multisurvival(
    data = colon,
    elapsedtime = "time",
    outcome = "status",
    outcomeLevel = "1",
    explanatory = "sex",
    contexpl = "age",
    person_time = TRUE,
    timetypeoutput = "days",           # colon$time is recorded in days
    time_intervals = "180, 365, 730",  # about 6 months, 1 year, 2 years
    rate_multiplier = 1000  # Rate per 1000 person-days
)