Medical Decision Tree — tree • ClinicoPath

Usage

tree(
  data,
  vars,
  facs,
  target,
  targetLevel,
  train,
  trainLevel,
  imputeMissing = FALSE,
  balanceClasses = FALSE,
  scaleFeatures = FALSE,
  clinicalMetrics = FALSE,
  featureImportance = FALSE,
  showInterpretation = FALSE,
  showPlot = FALSE,
  showPartitionPlot = FALSE,
  minCases = 10,
  maxDepth = 4,
  confidenceInterval = FALSE,
  riskStratification = FALSE,
  exportPredictions = FALSE,
  clinicalContext = "diagnosis",
  costRatio = 1,
  prevalenceAdjustment = FALSE,
  expectedPrevalence = 10,
  crossValidation = FALSE,
  cvFolds = 5,
  bootstrapValidation = FALSE,
  bootstrapSamples = 1000,
  showROCCurve = FALSE,
  showCalibrationPlot = FALSE,
  showClinicalUtility = FALSE,
  variableImportanceMethod = "frequency",
  customThresholds = FALSE,
  sensitivityThreshold = 0.8,
  specificityThreshold = 0.8,
  treeVisualization = "standard",
  showNodeStatistics = FALSE,
  compareModels = FALSE,
  spatialCoords,
  useAutocart = FALSE,
  spatialAlpha = 0.5,
  spatialBeta = 0.5,
  modelComparisonMetric = "bacc"
)

Arguments

data

The data as a data frame containing clinical variables, biomarkers, and patient outcomes.

vars

Continuous variables such as biomarker levels, age, laboratory values, or quantitative pathological measurements.

facs

Categorical variables such as tumor grade, stage, histological type, or patient demographics.

target

Primary outcome variable: disease status, treatment response, survival status, or diagnostic category.

targetLevel

Level representing disease presence, positive outcome, or event of interest.

train

Variable indicating training vs validation cohorts. If not provided, data will be split automatically.

trainLevel

Level indicating the training/discovery cohort.

imputeMissing

Impute missing values using medically appropriate methods (median within disease groups for continuous, mode for categorical).
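
The imputation rule described for imputeMissing (group-wise median for continuous variables, mode for categorical ones) can be sketched in base R. This is only an illustration of the stated rule, not the package's internal code; `stat_mode`, `impute_column`, and the `toy` data frame are hypothetical names introduced here.

```r
# Hypothetical helper (not part of ClinicoPath): most frequent non-missing value.
stat_mode <- function(x) {
  tab <- table(x)  # table() ignores NA by default
  names(tab)[which.max(tab)]
}

# Hypothetical helper: median within outcome groups for a numeric column,
# overall mode for a categorical column.
impute_column <- function(x, group) {
  if (is.numeric(x)) {
    group_median <- ave(x, group, FUN = function(v) median(v, na.rm = TRUE))
    x[is.na(x)] <- group_median[is.na(x)]
  } else {
    x[is.na(x)] <- stat_mode(x)
  }
  x
}

# Made-up toy data for illustration only.
toy <- data.frame(
  diagnosis = c("cancer", "cancer", "cancer", "benign", "benign", "benign"),
  PSA       = c(8, NA, 12, 1, 3, NA),
  grade     = c("high", "high", NA, "low", "low", "high"),
  stringsAsFactors = FALSE
)

toy$PSA   <- impute_column(toy$PSA, toy$diagnosis)
toy$grade <- impute_column(toy$grade, toy$diagnosis)
```

Imputing within outcome groups uses the outcome itself, so in a validation setting the medians would normally be estimated on the training cohort only.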

balanceClasses

Balance classes to handle rare diseases or imbalanced outcomes. Recommended for disease prevalence <20%.

scaleFeatures

Standardize continuous variables (useful when combining biomarkers with different scales/units).
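
As a rough illustration of standardization, the sketch below centers and scales two continuous variables with base R's `scale()`. It assumes the constants are estimated on the training cohort and reused for the validation cohort; whether tree() does this internally is not stated here. The data values are made up.

```r
# Made-up training and validation cohorts.
train <- data.frame(PSA = c(2, 4, 6, 8), age = c(50, 60, 70, 80))
valid <- data.frame(PSA = c(5, 10),      age = c(65, 90))

# Estimate centering and scaling constants on the training cohort only.
centers <- colMeans(train)
scales  <- apply(train, 2, sd)

# Apply the same constants to both cohorts.
train_z <- scale(train, center = centers, scale = scales)
valid_z <- scale(valid, center = centers, scale = scales)
```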

clinicalMetrics

Display sensitivity, specificity, predictive values, likelihood ratios, and other clinical metrics.
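
For reference, the metrics named above follow from the four cells of a 2x2 confusion matrix. The sketch below uses the standard textbook definitions; `clinical_metrics` is a hypothetical helper, and the counts are made up.

```r
# Hypothetical helper: standard diagnostic accuracy metrics from 2x2 counts.
clinical_metrics <- function(tp, fp, fn, tn) {
  sens <- tp / (tp + fn)
  spec <- tn / (tn + fp)
  c(
    sensitivity = sens,
    specificity = spec,
    ppv         = tp / (tp + fp),       # positive predictive value
    npv         = tn / (tn + fn),       # negative predictive value
    lr_positive = sens / (1 - spec),    # positive likelihood ratio
    lr_negative = (1 - sens) / spec     # negative likelihood ratio
  )
}

m <- clinical_metrics(tp = 45, fp = 10, fn = 5, tn = 90)
round(m, 3)
```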

featureImportance

Identify most important clinical variables and biomarkers for the decision tree.

showInterpretation

Provide clinical interpretation of results including diagnostic utility and clinical recommendations.

showPlot

Display visual representation of the decision tree.

showPartitionPlot

Display 2D decision boundary visualization using parttree. Requires exactly 2 continuous variables for optimal visualization.

minCases

Minimum number of cases required in each terminal node (higher values prevent overfitting).

maxDepth

Maximum depth of decision tree (deeper trees may overfit).
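
To see what these two constraints do to a fitted tree, the sketch below uses the rpart package (a recommended package shipped with standard R installations) and its bundled `kyphosis` data. Mapping minCases to `minbucket` and maxDepth to `maxdepth` is an assumption made for illustration; this page does not say which engine tree() uses.

```r
library(rpart)

# Assumed mapping: minCases -> minbucket, maxDepth -> maxdepth.
ctrl <- rpart.control(minbucket = 10, maxdepth = 4)

fit <- rpart(Kyphosis ~ Age + Number + Start,
             data = kyphosis, method = "class", control = ctrl)

# Number of cases in each terminal node.
leaf_sizes <- fit$frame$n[fit$frame$var == "<leaf>"]

# rpart numbers nodes like a binary heap, so depth = floor(log2(node number)).
node_depth <- floor(log2(as.numeric(rownames(fit$frame))))
```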

confidenceInterval

Display confidence intervals for performance metrics.

riskStratification

Analyze risk stratification performance and create risk categories based on tree predictions.

exportPredictions

Add predicted classifications and probabilities to the dataset.

clinicalContext

Clinical context affects interpretation thresholds and recommendations (e.g., screening requires high sensitivity).

costRatio

Relative cost of missing a case vs false alarm. Higher values favor sensitivity over specificity.
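
One common way to see why a higher cost ratio favors sensitivity is the expected-cost decision rule: if a missed case costs `cost_ratio` times as much as a false alarm, predicting "positive" has the lower expected cost whenever the predicted probability exceeds 1 / (1 + cost_ratio). The sketch below illustrates that rule only; how tree() applies costRatio internally is not documented here, and `cost_threshold` is a hypothetical helper.

```r
# Hypothetical helper: probability cutoff implied by a false-negative to
# false-positive cost ratio under the expected-cost rule.
cost_threshold <- function(cost_ratio) 1 / (1 + cost_ratio)

cost_threshold(1)  # equal costs give a cutoff of 0.5
cost_threshold(4)  # a missed case costing 4x lowers the cutoff to 0.2

# Made-up predicted probabilities classified under a cost ratio of 4.
probs    <- c(0.15, 0.25, 0.60)
positive <- probs > cost_threshold(4)
```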

prevalenceAdjustment

Adjust predictive values for expected disease prevalence in target population (different from study sample).

expectedPrevalence

Expected disease prevalence in target population for adjusted predictive value calculations.
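
Prevalence-adjusted predictive values are conventionally obtained from sensitivity and specificity with Bayes' theorem. The sketch below shows that calculation. It assumes expectedPrevalence is given as a percentage (the default of 10 suggests this, but it is not stated explicitly); `adjust_predictive_values` is a hypothetical helper.

```r
# Hypothetical helper: PPV and NPV at a target-population prevalence.
# prevalence_pct is assumed to be a percentage, e.g. 10 means 10%.
adjust_predictive_values <- function(sens, spec, prevalence_pct) {
  prev <- prevalence_pct / 100
  ppv <- sens * prev / (sens * prev + (1 - spec) * (1 - prev))
  npv <- spec * (1 - prev) / (spec * (1 - prev) + (1 - sens) * prev)
  c(ppv = ppv, npv = npv)
}

adj <- adjust_predictive_values(sens = 0.9, spec = 0.8, prevalence_pct = 10)
round(adj, 3)
```

With these inputs the PPV is only about one third even though sensitivity is 90%, which is why predictive values from an enriched study sample can mislead when prevalence in practice is low.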

crossValidation

Perform k-fold cross-validation for robust performance estimation.

cvFolds

Number of folds for cross-validation (typically 5 or 10).
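
As a generic illustration of k-fold cross-validation (not the package's own routine), each case is assigned to one of k folds, and each fold serves once as the held-out set. The sample size and seed below are arbitrary.

```r
set.seed(42)
n <- 23  # arbitrary number of cases
k <- 5   # corresponds to cvFolds

# Assign each case to a fold; fold sizes differ by at most one.
fold_id <- sample(rep(seq_len(k), length.out = n))

# Build the train/test index pair for every fold.
splits <- lapply(seq_len(k), function(i) {
  list(train = which(fold_id != i), test = which(fold_id == i))
})
```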

bootstrapValidation

Perform bootstrap validation for confidence intervals on performance metrics.

bootstrapSamples

Number of bootstrap samples for confidence interval calculation.
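
A percentile bootstrap is one common way to obtain such intervals. The sketch below resamples cases with replacement and recomputes sensitivity each time. It is a generic illustration under that assumption; the method tree() actually uses is not specified here, and the labels are made up.

```r
set.seed(1)

# Made-up labels: 40 diseased, 60 healthy, with 6 false negatives
# and 10 false positives.
truth <- rep(c(1, 0), times = c(40, 60))
pred  <- truth
flip  <- c(1:6, 41:50)
pred[flip] <- 1 - pred[flip]

sensitivity <- function(truth, pred) {
  sum(pred == 1 & truth == 1) / sum(truth == 1)
}

n_boot <- 1000  # corresponds to bootstrapSamples
boot_sens <- replicate(n_boot, {
  idx <- sample(seq_along(truth), replace = TRUE)
  sensitivity(truth[idx], pred[idx])
})

# 95% percentile interval around the observed sensitivity.
observed <- sensitivity(truth, pred)
ci <- quantile(boot_sens, c(0.025, 0.975))
```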

showROCCurve

Display ROC curve with AUC for model discrimination assessment.
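
The AUC can be read as the probability that a randomly chosen diseased case receives a higher score than a randomly chosen healthy one. A minimal base R sketch using the rank-sum (Mann-Whitney) identity follows; `auc_rank` is a hypothetical helper and the scores are made up.

```r
# Hypothetical helper: AUC from ranks (equivalent to the Mann-Whitney U statistic).
auc_rank <- function(score, truth) {
  r     <- rank(score)  # ties receive average ranks
  n_pos <- sum(truth == 1)
  n_neg <- sum(truth == 0)
  (sum(r[truth == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

score <- c(0.9, 0.8, 0.7, 0.4, 0.3, 0.2)
truth <- c(1, 1, 0, 1, 0, 0)
auc <- auc_rank(score, truth)  # 8 of the 9 positive-negative pairs are ordered correctly
```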

showCalibrationPlot

Display probability calibration plot to assess prediction reliability.

showClinicalUtility

Display clinical utility curve for threshold optimization.

variableImportanceMethod

Method for calculating variable importance in decision trees.

customThresholds

Set custom thresholds for clinical performance interpretation.

sensitivityThreshold

Minimum sensitivity threshold for clinical acceptability.

specificityThreshold

Minimum specificity threshold for clinical acceptability.

treeVisualization

Style of decision tree visualization display.

showNodeStatistics

Display detailed statistics at each decision tree node.

compareModels

Compare FFTrees performance with logistic regression, CART, and autocart (if spatial coordinates provided).

spatialCoords

X and Y coordinates for spatial analysis. Required for autocart spatial regression trees. Typically longitude/latitude or tissue microarray coordinates.

useAutocart

Enable spatial-aware regression trees using autocart methodology. Requires spatial coordinates (X, Y variables).

spatialAlpha

Weight for spatial autocorrelation in autocart splitting (0.0-1.0). Higher values emphasize spatial clustering in decision tree splits.

spatialBeta

Weight for spatial compactness in autocart splitting (0.0-1.0). Higher values favor spatially compact regions in tree partitions.

modelComparisonMetric

Primary metric for comparing model performance.
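
The default value "bacc" presumably stands for balanced accuracy, the mean of sensitivity and specificity; that reading is an assumption based on common naming and is not confirmed on this page. The sketch below shows why this metric is often preferred over raw accuracy for rare outcomes; `balanced_accuracy` is a hypothetical helper and the counts are made up.

```r
# Hypothetical helper: balanced accuracy from 2x2 counts.
balanced_accuracy <- function(tp, fp, fn, tn) {
  sens <- tp / (tp + fn)
  spec <- tn / (tn + fp)
  (sens + spec) / 2
}

# A classifier that calls everyone healthy in a sample with 1% prevalence:
# raw accuracy is 0.99, but balanced accuracy is only 0.5.
always_negative <- balanced_accuracy(tp = 0, fp = 0, fn = 10, tn = 990)

# A classifier with 80% sensitivity and 90% specificity on the same sample.
useful_model <- balanced_accuracy(tp = 8, fp = 99, fn = 2, tn = 891)
```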

Value

A results object containing:

`results$todo`                    an html
`results$text1`                   a preformatted
`results$text2`                   a preformatted
`results$text2a`                  a preformatted
`results$text2b`                  a preformatted
`results$text3`                   a preformatted
`results$text4`                   an html
`results$dataQuality`             a preformatted
`results$missingDataReport`       a table
`results$modelSummary`            an html
`results$clinicalMetrics`         a table
`results$clinicalInterpretation`  an html
`results$featureImportance`       a table
`results$riskStratification`      a table
`results$confusionMatrix`         a table
`results$adjustedMetrics`         a table
`results$crossValidationResults`  a table
`results$bootstrapResults`        a table
`results$modelComparison`         a table
`results$spatialAnalysis`         a table
`results$spatialInterpretation`   an html
`results$plot`                    an image
`results$partitionPlot`           an image
`results$rocPlot`                 an image
`results$calibrationPlot`         an image
`results$clinicalUtilityPlot`     an image
`results$deploymentGuidelines`    an html
`results$predictions`             an output
`results$probabilities`           an output

Tables can be converted to data frames with asDF or as.data.frame. For example:

results$missingDataReport$asDF

as.data.frame(results$missingDataReport)

Details

Enhanced decision tree analysis for medical research, pathology and oncology. Provides clinical performance metrics, handles missing data appropriately, and offers interpretations relevant to medical decision-making.

Examples

# Example for cancer diagnosis
data(cancer_biomarkers)
tree(
  data = cancer_biomarkers,
  vars = c("PSA", "age", "tumor_size"),
  facs = c("grade", "stage"),
  target = "diagnosis",
  targetLevel = "cancer",
  train = "cohort",
  trainLevel = "discovery",
  imputeMissing = TRUE,
  balanceClasses = TRUE
)