
Create bootstrap replicate weights using the "doubled half bootstrap" method of Antal and Tillé (2014).
Source: R/make_bootstrap_weights.R
Creates bootstrap replicate weights using the method of Antal and Tillé (2014). This method is applicable to single-stage sample designs, potentially with stratification and clustering. It can be used for designs that use simple random sampling without replacement or unequal probability sampling without replacement. One advantage of this method is that it yields integer replicate factors of 0, 1, 2, or 3.
Usage
make_doubled_half_bootstrap_weights(
  num_replicates = 100,
  samp_unit_ids,
  strata_ids,
  samp_unit_sel_probs,
  output = "weights"
)
Arguments
- num_replicates
Positive integer giving the number of bootstrap replicates to create.
- samp_unit_ids
Vector of sampling unit IDs.
- strata_ids
Vector of strata IDs for each sampling unit at each stage of sampling.
- samp_unit_sel_probs
Vector of selection probabilities for each sampling unit.
- output
Either "weights" (the default) or "factors". Specifying output = "factors" returns a matrix of replicate adjustment factors which can later be multiplied by the full-sample weights to produce a matrix of replicate weights. Specifying output = "weights" returns the matrix of replicate weights, where the full-sample weights are inferred using samp_unit_sel_probs.
Value
A matrix with the same number of rows as the length of samp_unit_ids and a number of columns equal to the value of the argument num_replicates.
Specifying output = "factors" returns a matrix of replicate adjustment factors which can later be multiplied by the full-sample weights to produce a matrix of replicate weights. Specifying output = "weights" returns the matrix of replicate weights, where the full-sample weights are inferred using samp_unit_sel_probs.
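As a rough illustration of how the two output types relate, the base R sketch below multiplies a small matrix of replicate factors by full-sample weights. It assumes the full-sample weights are the inverses of the selection probabilities. The probabilities and factors are made up for illustration and are not output from this function.

```r
# Hypothetical selection probabilities for four sampling units
samp_unit_sel_probs <- c(0.10, 0.25, 0.50, 0.20)

# Full-sample weights, assumed here to be inverse selection probabilities
full_sample_weights <- 1 / samp_unit_sel_probs

# Made-up replicate factors (rows = units, columns = replicates);
# in practice these would come from a call with output = "factors".
# Note that matrix() fills column by column.
replicate_factors <- matrix(
  c(2, 0, 2, 0,
    0, 2, 0, 2,
    2, 2, 0, 0),
  nrow = 4, ncol = 3
)

# Recycling runs down each column, so row i is scaled by weight i
replicate_weights <- replicate_factors * full_sample_weights
```

Each column of replicate_weights then plays the role of one set of replicate weights, comparable to what output = "weights" would return under the same assumption.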
Details
For stratified sampling, the replicate factors are generated independently in each stratum. For cluster sampling at a given stage, the replicate factors are generated at the cluster level and then the cluster's replicate factors are applied to all units in the cluster.
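The cluster-level behavior can be pictured with a short base R sketch. The cluster IDs and factor values are hypothetical placeholders chosen by hand. They are not drawn with the Antal and Tillé algorithm and only show how one factor per cluster is carried down to every unit in that cluster.

```r
# Hypothetical unit-level cluster IDs: six units in three clusters
samp_unit_ids <- c("A", "A", "B", "B", "B", "C")

# Made-up factors for a single replicate, one value per cluster
# (placeholders only, not generated by the bootstrap method)
cluster_factors <- c(A = 2, B = 0, C = 1)

# Look up each unit's cluster so that all units in a cluster
# share that cluster's replicate factor
unit_factors <- unname(cluster_factors[samp_unit_ids])
```

Indexing the named vector by samp_unit_ids repeats each cluster's factor once per unit, so units in the same cluster always enter or leave a replicate together.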
In the case of unequal probability sampling, this bootstrap method is only recommended for high entropy sampling methods (i.e., most methods other than systematic sampling).
See Section 7 of Antal and Tillé (2014) for a clear description of how the replicates are formed. The paper presents two options for the resampling probabilities used in replication: the R function uses the option referred to in the paper as "the \(\pi\)-bootstrap."
References
Antal, E. and Tillé, Y. (2014). "A new resampling method for sampling designs without replacement: The doubled half bootstrap." Computational Statistics, 29(5), 1345-1363. https://doi.org/10.1007/s00180-014-0495-0
See also
If the survey design can be accurately represented using svydesign, then it is easier to simply use as_bootstrap_design with argument type = "Antal-Tille".
Use estimate_boot_reps_for_target_cv to help choose the number of bootstrap replicates.
Examples
# \donttest{
library(survey)
# Example 1: A cluster sample
data('library_multistage_sample', package = 'svrep')
replicate_factors <- make_doubled_half_bootstrap_weights(
  num_replicates = 5,
  samp_unit_ids = library_multistage_sample$PSU_ID,
  strata_ids = rep(1, times = nrow(library_multistage_sample)),
  samp_unit_sel_probs = library_multistage_sample$PSU_SAMPLING_PROB,
  output = "factors"
)
# Example 2: A single-stage sample selected with unequal probabilities, without replacement
## Load an example dataset of U.S. counties with 2004 Presidential vote counts
data("election", package = 'survey')
pps_wor_design <- svydesign(data = election_pps,
                            pps = "overton",
                            fpc = ~ p, # Inclusion probabilities
                            ids = ~ 1)
## Create bootstrap replicate weights
set.seed(2022)
bootstrap_replicate_weights <- make_doubled_half_bootstrap_weights(
  num_replicates = 5000,
  samp_unit_ids = pps_wor_design$cluster[,1],
  strata_ids = pps_wor_design$strata[,1],
  samp_unit_sel_probs = pps_wor_design$prob
)
## Create a replicate design object with the survey package
bootstrap_rep_design <- svrepdesign(
  data = pps_wor_design$variables,
  repweights = bootstrap_replicate_weights,
  weights = weights(pps_wor_design, type = "sampling"),
  type = "bootstrap"
)
## Compare std. error estimates from bootstrap versus linearization
data.frame(
  'Statistic' = c('total', 'mean'),
  'SE (bootstrap)' = c(SE(svytotal(x = ~ Bush, design = bootstrap_rep_design)),
                       SE(svymean(x = ~ I(Bush/votes),
                                  design = bootstrap_rep_design))),
  'SE (Overton\'s PPS approximation)' = c(SE(svytotal(x = ~ Bush,
                                                      design = pps_wor_design)),
                                          SE(svymean(x = ~ I(Bush/votes),
                                                     design = pps_wor_design))),
  check.names = FALSE
)
#>   Statistic SE (bootstrap) SE (Overton's PPS approximation)
#> 1     total   2.437243e+06                     2.939608e+06
#> 2      mean   1.359043e-01                     1.089739e-01
# }