Introduction_to_RCTRecruit

Data Initialization

The LoadData function ensures that input data is properly formatted, processed for analysis, and stored internally for the session. The package includes the gripsYR1 dataset, which contains the first year of recruitment data from the GRIPS study. We will use the ScreenDt and Enrolled columns to initialize our baseline.

library(RCTRecruit)

# View the structure of the historical data
head(gripsYR1)
##     ScreenDt Enrolled
## 1 2019-06-19        1
## 2 2019-06-20        0
## 3 2019-06-21        0
## 4 2019-06-24        0
## 5 2019-06-25        0
## 6 2019-06-26        0
# Load the data into the package's internal memory
LoadData(data = gripsYR1, date = ScreenDt, enrolled = Enrolled)
## 
## Variables Enrolled and ScreenDt were successfully loaded

Cumulative Projections

The GetWeekPredCI function predicts cumulative weekly recruitment, providing median estimates and a 95% projection band. By default, weight functions are generated using a Binomial (51, 0.5) probability mass function, anchoring the peak of the curve at the same calendar week in the empirical data to capture seasonal patterns. An s3 plot method is available to visualize the results.

set.seed(123)
res <- GetWeekPredCI()

# View the first 10 weeks of the prediction matrix
res$predCI[1:10, ]
##   2.5% 50% 97.5%
## 0    0   0     0
## 1    0   0     2
## 2    0   1     3
## 3    0   1     4
## 4    0   2     5
## 5    0   2     5
## 6    0   2     6
## 7    0   2     6
## 8    0   3     7
## 9    0   3     7
# Plot the predicted recruitment against the input data
plot(res)

Anomaly Adjustments (Gap Weeks & Efficiency)

There are instances where the empirical data contains prolonged periods with zero enrollment, referred to as “gap weeks,” which are not expected to recur (e.g., pandemic disruptions). Setting fillGaps = TRUE replaces these with expected values. To adjust for other anticipated changes, such as a 50% increase in recruitment efficiency, use the efficiencyFactor multiplier.

set.seed(123)
res_anomaly <- GetWeekPredCI(fillGaps = TRUE, efficiencyFactor = 1.5)

# View the first 10 weeks of the adjusted prediction matrix for comparison
res_anomaly$predCI[1:10, ]
##   2.5% 50% 97.5%
## 0    0   0     0
## 1    0   0     3
## 2    0   2     4
## 3    0   2     6
## 4    0   3     8
## 5    0   3     8
## 6    0   3     9
## 7    0   3     9
## 8    0   4    10
## 9    0   4    10

Timeline Estimations

The Time2Nsubjects function estimates the number of weeks required to recruit a specified number of subjects based on the historical recruitment data. The default target is 50 subjects.

set.seed(123)
Time2Nsubjects()
## Enrolling 50 subjects requires 148 weeks
## 
##  2.5%   50% 97.5% 
##   107   148   197

Validation Metrics and Scenario Comparison

The GetDistance function calculates the Euclidean Distance (ED) between predicted and actual weekly recruitment to assess model accuracy. It requires a target cumulative enrollment vector. Here, we use the second year of recruitment (gripsYR2Weekly) as our target benchmark.

We can evaluate how adjustments affect predictive accuracy by comparing the distance of a default model against an adjusted model.

# Set the target vector
target <- gripsYR2Weekly$enrolled

# Calculate Euclidean Distance using defaults
set.seed(123)
GetDistance(target = target)
##  2.5%   50% 97.5% 
##    66   104   138
# Calculate Euclidean Distance accounting for gap weeks
set.seed(123)
GetDistance(target = target, fillGaps = TRUE)
##  2.5%   50% 97.5% 
##    26    60    93
# Compare four different predictive scenarios visually
scenarios <- list(
  sc1 = GetWeekPredCI(),
  sc2 = GetWeekPredCI(cauchyWt = TRUE),
  sc3 = GetWeekPredCI(fillGaps = TRUE),
  sc4 = GetWeekPredCI(fillGaps = TRUE, efficiencyFactor = 1.5)
)

maxY <- sapply(scenarios, \(x) x$pargs$maxY) |> max()

defaultGraphicParams <- par(no.readonly = TRUE)
graphics::par(mfrow = c(2, 2), cex.main = 1)
for (x in scenarios) plot(x, yMax = maxY, main = x$call.)