Creates bootstrap replicate weights using the method of Antal and Tillé (2014). This method is applicable to single-stage sample designs, potentially with stratification and clustering. It can be used for designs that use simple random sampling without replacement or unequal probability sampling without replacement. One advantage of this method is that it yields integer replicate factors of 0, 1, 2, or 3.

Usage

make_doubled_half_bootstrap_weights(
  num_replicates = 100,
  samp_unit_ids,
  strata_ids,
  samp_unit_sel_probs,
  output = "weights"
)

Arguments

num_replicates

Positive integer giving the number of bootstrap replicates to create.

samp_unit_ids

Vector of sampling unit IDs.

strata_ids

Vector of strata IDs for each sampling unit.

samp_unit_sel_probs

Vector of selection probabilities for each sampling unit.

output

Either "weights" (the default) or "factors". Specifying output = "factors" returns a matrix of replicate adjustment factors which can later be multiplied by the full-sample weights to produce a matrix of replicate weights. Specifying output = "weights" returns the matrix of replicate weights, where the full-sample weights are inferred using samp_unit_sel_probs.

Value

A matrix with one row per element of samp_unit_ids and a number of columns equal to the value of the argument num_replicates. Specifying output = "factors" returns a matrix of replicate adjustment factors which can later be multiplied by the full-sample weights to produce a matrix of replicate weights. Specifying output = "weights" returns the matrix of replicate weights, where the full-sample weights are inferred using samp_unit_sel_probs.
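The relationship between the two output types can be sketched in base R. This is only an illustration: the selection probabilities and the factor matrix below are made up, and the sketch assumes the full-sample weights are the inverses of the selection probabilities.

```r
# Hypothetical selection probabilities for four sampled units (made-up values)
sel_probs <- c(0.10, 0.25, 0.25, 0.50)

# Assumption: full-sample weights are the inverse selection probabilities
full_sample_weights <- 1 / sel_probs

# Hypothetical replicate factors: 4 units (rows) by 3 replicates (columns).
# These are illustrative values, not output of the function.
rep_factors <- matrix(
  c(2, 0, 1,
    0, 2, 0,
    1, 1, 3,
    1, 1, 0),
  nrow = 4, byrow = TRUE
)

# Multiplying a matrix by a vector whose length equals nrow() scales
# row i of the matrix by element i of the vector
rep_weights <- rep_factors * full_sample_weights
```

Each column of rep_weights then plays the role of one set of replicate weights, with a unit's weight in a replicate equal to its full-sample weight times its factor in that replicate.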

Details

For stratified sampling, the replicate factors are generated independently in each stratum. For cluster sampling at a given stage, the replicate factors are generated at the cluster level and then the cluster's replicate factors are applied to all units in the cluster.
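The bookkeeping described above (draw factors per cluster within each stratum, then copy them to the units) might look roughly like the following base R sketch for a single replicate. The data are hypothetical, and the random draw is only a stand-in, not the Antal and Tillé (2014) resampling scheme: it randomly assigns the factors 0 and 2 to the two clusters in each stratum.

```r
# Hypothetical unit-level data: six units in four clusters across two strata
unit_data <- data.frame(
  stratum = c("A", "A", "A", "B", "B", "B"),
  cluster = c("a1", "a1", "a2", "b1", "b2", "b2")
)

# One row per distinct (stratum, cluster) combination
clusters <- unique(unit_data)

set.seed(1)
clusters$factor <- NA_real_
for (s in unique(clusters$stratum)) {
  in_stratum <- clusters$stratum == s
  # Stand-in draw (NOT the Antal-Tille algorithm): a random permutation of
  # c(0, 2). This works here because every stratum has exactly two clusters.
  clusters$factor[in_stratum] <- sample(c(0, 2), size = sum(in_stratum))
}

# Apply each cluster's factor to every unit in that cluster
unit_data$factor <- clusters$factor[match(unit_data$cluster, clusters$cluster)]
```

Because the loop handles one stratum at a time, the draw in stratum A has no influence on the draw in stratum B, and all units within a cluster end up sharing that cluster's factor.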

In the case of unequal probability sampling, this bootstrap method is only recommended for high entropy sampling methods (i.e., most methods other than systematic sampling).

See Section 7 of Antal and Tillé (2014) for a clear description of how the replicates are formed. The paper presents two options for the resampling probabilities used in replication. This R function uses the option referred to in the paper as "the \(\pi\)-bootstrap."

References

Antal, E. and Tillé, Y. (2014). "A new resampling method for sampling designs without replacement: The doubled half bootstrap." Computational Statistics, 29(5), 1345-1363. https://doi.org/10.1007/s00180-014-0495-0

See also

If the survey design can be accurately represented using svydesign, then it is easier to simply use as_bootstrap_design with argument type = "Antal-Tille".

Use estimate_boot_reps_for_target_cv to help choose the number of bootstrap replicates.

Examples

# \donttest{
 library(survey)
 
 # Example 1: A cluster sample
 
   data('library_multistage_sample', package = 'svrep')
  
   replicate_factors <- make_doubled_half_bootstrap_weights(
     num_replicates      = 5,
     samp_unit_ids       = library_multistage_sample$PSU_ID,
     strata_ids          = rep(1, times = nrow(library_multistage_sample)),
     samp_unit_sel_probs = library_multistage_sample$PSU_SAMPLING_PROB,
     output              = "factors"
   )

 # Example 2: A single-stage sample selected with unequal probabilities, without replacement

   ## Load an example dataset of U.S. counties with 2004 Presidential vote counts
   data("election", package = 'survey')
   pps_wor_design <- svydesign(data = election_pps,
                               pps = "overton",
                               fpc = ~ p, # Inclusion probabilities
                               ids = ~ 1)

   ## Create bootstrap replicate weights
   set.seed(2022)
   bootstrap_replicate_weights <- make_doubled_half_bootstrap_weights(
     num_replicates      = 5000,
     samp_unit_ids       = pps_wor_design$cluster[,1],
     strata_ids          = pps_wor_design$strata[,1],
     samp_unit_sel_probs = pps_wor_design$prob
   )

   ## Create a replicate design object with the survey package
   bootstrap_rep_design <- svrepdesign(
     data       = pps_wor_design$variables,
     repweights = bootstrap_replicate_weights,
     weights    = weights(pps_wor_design, type = "sampling"),
     type       = "bootstrap"
   )

   ## Compare std. error estimates from bootstrap versus linearization
   data.frame(
     'Statistic' = c('total', 'mean'),
     'SE (bootstrap)' = c(SE(svytotal(x = ~ Bush, design = bootstrap_rep_design)),
                          SE(svymean(x = ~ I(Bush/votes),
                                     design = bootstrap_rep_design))),
     'SE (Overton\'s PPS approximation)' = c(SE(svytotal(x = ~ Bush,
                                                         design = pps_wor_design)),
                                             SE(svymean(x = ~ I(Bush/votes),
                                                        design = pps_wor_design))),
     check.names = FALSE
   )
#>   Statistic SE (bootstrap) SE (Overton's PPS approximation)
#> 1     total   2.437243e+06                     2.939608e+06
#> 2      mean   1.359043e-01                     1.089739e-01
# }