| Title: | HIV Incidence Estimation using Recency Testing Data with Population Adjustment |
|---|---|
| Description: | Tools for estimating HIV incidence using cross-sectional recency testing data, adjusting for internal and external target populations and supporting subtype-specific parameters. The statistical methodology implemented builds on the framework described in Wang, Duerr, and Gao (2025) <doi:10.1002/sim.70216>. |
| Authors: | Sirong Li [aut], Fei Gao [aut, cre] (ORCID: <https://orcid.org/0000-0001-6797-5468>), Marlena Bannick [aut] (ORCID: <https://orcid.org/0000-0001-6797-5978>) |
| Maintainer: | Fei Gao <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.1.0 |
| Built: | 2026-05-22 08:47:02 UTC |
| Source: | https://github.com/cran/XSRecencyX |
A public-use dataset from the Consortium for the Evaluation and Performance of HIV Incidence Assays (CEPHIA).
cephia
A data frame with 212831 rows and 38 variables:
Assay name.
CEPHIA panel identifier.
Laboratory where testing was performed.
Date of assay testing.
Field corresponding to assay result.
Numeric assay result value.
Method used to obtain assay result.
Specific result identifier.
Generic result identifier.
Unique participant identifier.
Visit identifier.
Type of biological specimen.
HIV status at visit.
HIV status at cohort entry.
Days since cohort entry.
HIV subtype classification.
Indicator whether subtype was confirmed.
Country of participant.
Biological sex of participant.
Age in years at visit.
Interval size for estimated date of detectable infection (EDDI).
Days since estimated date of detectable infection.
Days since earliest possible date of detectable infection.
Days since latest possible date of detectable infection.
Indicator for elite controller status at visit.
Indicator whether participant was ever designated as elite controller.
Indicator whether participant was treatment naive at visit.
Indicator whether participant was on treatment at visit.
Indicator for first treatment episode.
Days since first antiretroviral therapy (ART).
Days since current ART episode.
Days from EDDI to first ART.
Days from EDDI to current ART.
Viral load measurement closest to visit.
Offset between viral load date and visit date.
Type of viral load measurement.
Indicator whether viral load was detectable.
CD4 count at visit.
The dataset was obtained from Zenodo (2025 release, version 2) and is redistributed under the terms of the Creative Commons Attribution 4.0 International (CC BY 4.0) license.
The data are used internally by XSRecencyX for estimation of mean duration of recent infection (MDRI) and false recent rate (FRR) when these parameters are not supplied by the user.
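As a rough illustration of what a false recent rate is, the sketch below computes a naive FRR from a toy specimen table: the share of specimens infected longer than the time cutoff T that are still classified as recent. The column names `days_since_eddi` and `recent` and all values are hypothetical and are not the CEPHIA column names. The sketch ignores repeated specimens per participant and any exclusions, so it should not be read as the package's internal procedure.

```r
# Toy specimen-level data; column names and values are hypothetical.
specimens <- data.frame(
  days_since_eddi = c(30, 90, 200, 800, 900, 1200, 1500, 2000),
  recent          = c(1, 1, 1, 0, 1, 0, 0, 0)
)

# Time cutoff T, expressed in days (2 years)
cutoff_days <- 2 * 365.25

# Naive FRR: proportion classified recent among specimens infected longer than T
long_infected <- specimens$days_since_eddi > cutoff_days
frr_hat <- mean(specimens$recent[long_infected] == 1)
frr_hat
```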
Grebe, E., et al. (2025). CEPHIA public use data. Zenodo. doi:10.5281/zenodo.17439895.
Distributed under the Creative Commons Attribution 4.0 International License (CC BY 4.0).
Facente, S.N., et al. (2020). Estimated Dates of Detectable Infection (EDDIs) as an improvement upon Fiebig staging for HIV infection dating. Epidemiology and Infection, 148:e53.
The function returns the estimated HIV incidence rate (cases per person-year) with optional adjustment to internal or external target populations using inverse probability weights. Supports subtype-specific parameters for recency test performance.
estimate_incidence(data, status_col, recency_col, covariates = NULL, target_data = NULL, target_col = NULL, subtype_col = NULL, recency.params = NULL, n_boot = NULL, seed = NULL, return_weights = FALSE, cephia_information_message = FALSE, assays = NULL, algorithm = NULL)
data |
A data frame containing cross-sectional recency testing data. It must include HIV status and recency test results, and may optionally include covariates, a target group indicator and subtype label. |
status_col |
Character. Column name in |
recency_col |
Character. Column name in |
covariates |
Character vector. Vector of column names in |
target_data |
Data frame (optional). A data frame containing covariates (and subtype if applicable) from the external target population. Required only when adjusting to an external population. |
target_col |
Character (optional). Column name in data indicating inclusion in the internal target population data (1 = target, 0 = not target). Required only when estimating for an internal population. |
subtype_col |
Character (optional). Column name in |
recency.params |
A named list with the following elements (element names must match those below, case-insensitive):
Notes:
|
n_boot |
Integer (optional). Number of bootstrap replicates for confidence intervals and variances. |
seed |
Integer (optional). Seed for reproducibility. |
return_weights |
Logical (optional). If |
cephia_information_message |
Logical (optional). If |
assays |
Character vector (optional). Names of assays used in the recency testing algorithm. Default is
|
algorithm |
Function (optional). Defines the recency indicator with arguments in the same order as the Notes: |
This function estimates HIV incidence using cross-sectional recency testing data, optionally adjusting for differences between the observed sample and a specified target population. The target population can be:
the same as the observed cross-sectional population, by leaving both target_data and target_col as NULL,
an internal subset of the cross-sectional population by specifying target_col,
or a separate external population (e.g., for transportability applications) by specifying target_data.
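The three choices above can be summarised with a small helper. `target_mode()` is a hypothetical function written for this illustration and is not part of XSRecencyX. How the package behaves when both arguments are supplied is not stated here, so the sketch simply treats that case as an error.

```r
# Hypothetical helper (not part of XSRecencyX): report which target population
# a given combination of arguments would select.
target_mode <- function(target_data = NULL, target_col = NULL) {
  if (!is.null(target_data) && !is.null(target_col)) {
    # Assumption of this sketch: supplying both is treated as ambiguous
    stop("Specify at most one of target_data and target_col.")
  }
  if (!is.null(target_data)) return("external")
  if (!is.null(target_col)) return("internal")
  "cross-sectional"
}

target_mode()                                             # both NULL
target_mode(target_col = "intrial")                       # internal subset
target_mode(target_data = data.frame(College = c(0, 1)))  # external population
```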
Incidence is estimated using a weighted version of the adjusted cross-sectional incidence estimator as in Wang et al. (2025). Weights are derived via logistic regression to adjust for population heterogeneity in covariates. Subtype-specific MDRI and FRR parameters can be incorporated to improve estimation accuracy when recency test performance varies by HIV subtype. The estimator plugs in the estimated mean duration of recent infection (MDRI) and false recent rate (FRR) for each HIV subtype; see Wang et al. (2025) for the explicit expression.
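For orientation, the unweighted, single-subtype form of the adjusted cross-sectional estimator can be sketched as below. This is the widely used textbook form, not the weighted, subtype-specific expression implemented in the package. The function name `adjusted_incidence()`, its argument names, and the counts in the call are assumptions made for illustration. MDRI is supplied in days and converted to years so that it is on the same scale as T.

```r
# Sketch of the standard (unweighted, single-subtype) adjusted estimator:
#   lambda = (N_recent - FRR * N_pos) / (N_neg * (MDRI - FRR * T))
# Hypothetical helper; not the package's weighted estimator.
adjusted_incidence <- function(n_neg, n_pos, n_recent, mdri_days, frr, big_t_years) {
  mdri_years <- mdri_days / 365.25  # put MDRI on the same time scale as T
  (n_recent - frr * n_pos) / (n_neg * (mdri_years - frr * big_t_years))
}

# Illustrative counts (made up); MDRI, FRR and T as in the second subtype of Example 1
lambda_hat <- adjusted_incidence(
  n_neg = 4500, n_pos = 500, n_recent = 40,
  mdri_days = 186, frr = 0.02, big_t_years = 2
)
lambda_hat  # cases per person-year
```

With `frr = 0` the expression reduces to the number of recent infections divided by the product of the number of HIV-negative individuals and the MDRI.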
Bootstrapping is used to construct confidence intervals for the incidence estimate.
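A minimal sketch of a nonparametric bootstrap for an incidence estimate follows. The data are simulated, `point_estimate()` is a hypothetical stand-in that uses the unweighted adjusted estimator with fixed parameters, and the percentile interval is an assumption: the package may resample and form its intervals differently.

```r
set.seed(1)

# Simulated cross-sectional sample (illustrative only)
n    <- 2000
pos  <- rbinom(n, 1, 0.10)
rpos <- ifelse(pos == 1, rbinom(n, 1, 0.08), 0)
toy  <- data.frame(pos = pos, rpos = rpos)

# Hypothetical point estimator: unweighted adjusted estimator with fixed
# MDRI (years), FRR and T (years)
point_estimate <- function(d, mdri = 186 / 365.25, frr = 0.02, big_t = 2) {
  (sum(d$rpos) - frr * sum(d$pos)) / (sum(d$pos == 0) * (mdri - frr * big_t))
}

# Resample rows with replacement and re-estimate each time
n_boot <- 200
boot_est <- replicate(n_boot, {
  idx <- sample.int(nrow(toy), replace = TRUE)
  point_estimate(toy[idx, ])
})

se_boot <- sd(boot_est)                         # bootstrap standard error
ci_boot <- quantile(boot_est, c(0.025, 0.975))  # percentile 95% interval (assumption)
```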
Uncertainty in MDRI/FRR is incorporated via their confidence intervals assuming lognormal distributions.
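One way to turn a point estimate and its 95% confidence interval into lognormal draws is sketched below. `draw_lognormal()` is a hypothetical helper. It takes the log of the point estimate as the location and derives the log-scale standard deviation from the width of the interval. A parameter reported as exactly 0 with interval (0, 0), such as the first FRR in Example 1, is treated as fixed. These choices are assumptions and may not match the package internals.

```r
# Hypothetical helper: draw parameter values from the lognormal distribution
# implied by a point estimate and its 95% confidence interval.
draw_lognormal <- function(n, estimate, ci) {
  # Assumption: a parameter reported as exactly 0 with CI (0, 0) is held fixed
  if (estimate == 0 && all(ci == 0)) return(rep(0, n))
  stopifnot(estimate > 0, ci[1] > 0, ci[2] > ci[1])
  # The interval spans 2 * 1.96 standard deviations on the log scale
  sdlog <- (log(ci[2]) - log(ci[1])) / (2 * qnorm(0.975))
  rlnorm(n, meanlog = log(estimate), sdlog = sdlog)
}

set.seed(42)
mdri_draws <- draw_lognormal(1000, estimate = 186, ci = c(170, 198))  # days
frr_draws  <- draw_lognormal(1000, estimate = 0,   ci = c(0, 0))
```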
A named list with the following elements:
incidence: Point estimate of HIV incidence in the specified target population.
se_incidence: Standard error of the incidence estimate based on bootstrap.
ci_incidence: 95% confidence interval(s) of the incidence estimate.
recency.params: A named list of recency test parameters, as specified in Arguments.
weights: (Optional) A numeric vector of weights used in the point estimation, returned if return_weights = TRUE.
Wang, X., Duerr, A., & Gao, F. (2025). Addressing population heterogeneity for HIV incidence estimation based on recency test. Statistics in Medicine. https://doi.org/10.1002/sim.70216
    ## Example 1: Incidence estimation with full recency parameters

    # Define covariates used in the model
    covariates <- c("rInfection_pos", "Receptive", "Anal_nocondom", "College")

    # Full recency parameters:
    # MDRI, its 95% CI, FRR, its 95% CI, and time cutoff T
    recency.params <- list(
      MDRI    = c(182, 186),                      # MDRI (days)
      MDRI_CI = list(c(174, 189), c(170, 198)),   # 95% CI for MDRI
      FRR     = c(0, 0.02),                       # False recent rate
      FRR_CI  = list(c(0, 0), c(0.015, 0.03)),    # 95% CI for FRR
      T       = 2                                 # Time cutoff (years)
    )

    # Run the estimator using observed recency status
    estimate_incidence(
      data = test.cross,
      target_data = test.target,
      status_col = "pos",
      recency_col = "rpos",
      covariates = covariates,
      recency.params = recency.params,
      subtype_col = "Subtype",
      n_boot = 3
    )
Computes weights for individuals in the cross-sectional HIV population when estimating HIV incidence for an external population.
estimate_weights_external(data, status_col, covariates, target_data, subtype_col = NULL)
data |
A data frame containing cross-sectional recency testing data. It must include HIV status, recency test results, and covariates, and optionally subtype label. |
status_col |
Character. Column name in |
covariates |
Character vector. Vector of column names in |
target_data |
A data frame containing covariates (and subtype if applicable) from the external target population. Required only when estimating for an external population. |
subtype_col |
Character (optional). Column name in |
A numeric vector of estimated weights for each individual in the cross-sectional dataset.
    ## Example: external target population weighting

    ## Define covariates used for weighting
    covariates <- c("rInfection_pos", "Receptive", "Anal_nocondom", "College")

    ## Estimate external weights
    weights_ext <- estimate_weights_external(
      data = test.cross,
      status_col = "pos",
      covariates = covariates,
      target_data = test.target,
      subtype_col = "Subtype"
    )

    ## Inspect weights for different subtypes
    unique(weights_ext)
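To show the general idea behind logistic-regression weights for an external target, the sketch below uses generic inverse odds weighting on simulated data. It is not the code of `estimate_weights_external()`, which may restrict to particular rows, handle subtype, or normalise the weights differently. The data frames `cross` and `target` and the single covariate are stand-ins invented for this illustration.

```r
set.seed(123)

# Simulated stand-ins for the cross-sectional sample and the external target
cross  <- data.frame(College = rbinom(1000, 1, 0.30))
target <- data.frame(College = rbinom(500, 1, 0.60))

# Stack both sources; in_target = 1 marks rows from the target population
stacked <- rbind(
  cbind(cross,  in_target = 0),
  cbind(target, in_target = 1)
)

# Logistic regression for membership in the target population
fit <- glm(in_target ~ College, family = binomial(), data = stacked)

# Inverse odds weights for the cross-sectional rows
p_target <- predict(fit, newdata = cross, type = "response")
w_ext    <- p_target / (1 - p_target)

# After weighting, the covariate mean in the sample matches the target mean
weighted.mean(cross$College, w_ext)
mean(target$College)
```

With a single binary covariate the model is saturated, so the two means agree up to numerical tolerance. With several covariates the balance is only approximate.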
Computes inverse probability weights for individuals in the cross-sectional HIV population when estimating HIV incidence for an internal population.
estimate_weights_internal(data, status_col, covariates, target_col, subtype_col = NULL)
data |
A data frame containing cross-sectional recency testing data. It must include HIV status, recency test results, covariates, target group indicator, and optionally subtype label. |
status_col |
Character. Column name in |
covariates |
Character vector. Vector of column names in |
target_col |
Character. Column name in |
subtype_col |
Character (optional). Column name in |
A numeric vector of estimated weights for each individual in the cross-sectional dataset.
    ## Example: internal population weighting

    ## Define covariates used for weighting
    covariates <- c("rInfection_pos", "Receptive", "Anal_nocondom", "College")

    ## Estimate internal weights
    weights_int <- estimate_weights_internal(
      data = test.cross,
      status_col = "pos",
      covariates = covariates,
      target_col = "intrial",
      subtype_col = "Subtype"
    )

    ## Inspect the weights for different subtypes
    unique(weights_int)
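For an internal target, one generic way to reweight the full sample so that it resembles the target subset is to weight each row by its fitted probability of belonging to that subset. The sketch below does this on simulated data. It is an illustration of the reweighting idea only. The exact weight definition used by `estimate_weights_internal()`, including which rows receive weights and how they are scaled, is not shown here and may differ.

```r
set.seed(321)

# Simulated sample in which membership in the target subset depends on a covariate
n       <- 1500
College <- rbinom(n, 1, 0.4)
intrial <- rbinom(n, 1, ifelse(College == 1, 0.6, 0.2))
cross   <- data.frame(College = College, intrial = intrial)

# Logistic regression for membership in the internal target subset
fit  <- glm(intrial ~ College, family = binomial(), data = cross)
p_in <- fitted(fit)

# Weight proportional to P(target | covariates), scaled to average 1
w_int <- p_in / mean(cross$intrial)

# The weighted full-sample mean matches the mean within the target subset
weighted.mean(cross$College, w_int)
mean(cross$College[cross$intrial == 1])
```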
A simulated dataset generated to illustrate cross-sectional HIV incidence estimation with subtype-specific recency parameters and population adjustment.
test.cross
A data frame with 5000 rows and 9 variables:
Binary indicator of HIV infection status (1 = positive, 0 = negative).
Binary indicator of recent infection among HIV-positive individuals (1 = recent, 0 = non-recent).
Binary simulation indicator used for internal data generation.
Binary indicator of rectal infection.
Binary indicator of receptive anal intercourse.
Binary indicator of anal intercourse without condom use.
Binary indicator of postsecondary education.
Factor indicating HIV subtype classification.
Binary indicator of enrollment in the target (trial) population.
The dataset is intended solely for demonstration and testing purposes.
Simulated data generated under assumptions described in Wang, Duerr, and Gao (2025) for methodological illustration.
A simulated cohort dataset representing an external target population used for evaluating transportability and population-adjusted HIV incidence estimation.
test.target
A data frame with 2500 rows and 8 variables:
Observed follow-up time (in years) in the target cohort.
Binary indicator of HIV seroconversion during follow-up (1 = event, 0 = censored).
Binary simulation indicator used for internal data generation.
Binary indicator of rectal infection.
Binary indicator of receptive anal intercourse.
Binary indicator of anal intercourse without condom use.
Binary indicator of postsecondary education.
Factor indicating HIV subtype classification.
The dataset includes follow-up time and event indicators, along with baseline covariates and subtype information. It is intended solely for methodological illustration and testing purposes.
This dataset can be used as an external target population when estimating inverse probability weights to transport cross-sectional incidence estimates.
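Because a cohort of this kind records follow-up time and seroconversion events, a crude observed incidence rate can serve as a benchmark for a transported cross-sectional estimate. The sketch below uses a toy cohort. The column names `time` and `event` and the values are hypothetical and are not the column names of `test.target`.

```r
# Toy cohort; column names and values are hypothetical
cohort <- data.frame(
  time  = c(1.0, 2.0, 0.5, 2.0, 1.5),  # follow-up in years
  event = c(0, 1, 1, 0, 0)             # 1 = seroconversion, 0 = censored
)

# Crude incidence rate: events per person-year of follow-up
rate <- sum(cohort$event) / sum(cohort$time)
rate
```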
Simulated data generated under assumptions described in Wang, Duerr, and Gao (2025) for methodological illustration.