Randomly subsamples the replicates of a survey design object, keeping only a subset. The scale factor used in variance estimation is increased to account for the subsampling.

Usage

subsample_replicates(design, n_reps)

Arguments

design

A survey design object, created with either the survey or srvyr packages.

n_reps

The number of replicates to keep after subsampling.

Value

An updated survey design object, where only a random selection of the replicates has been retained. The overall 'scale' factor for the design (accessed with design$scale) is increased to account for the sampling of replicates.

Statistical Details

Suppose the initial replicate design has \(L\) replicates, with respective constants \(c_k\) for \(k=1,\dots,L\) used to estimate variance with the formula $$v_{R} = \sum_{k=1}^L c_k\left(\hat{T}_y^{(k)}-\hat{T}_y\right)^2$$

With subsampling of replicates, \(L_0\) of the original \(L\) replicates are randomly selected, and variances are then estimated using the formula $$v_{R} = \frac{L}{L_0} \sum_{k=1}^{L_0} c_k\left(\hat{T}_y^{(k)}-\hat{T}_y\right)^2$$ where the sum runs over the \(L_0\) retained replicates, relabeled \(k=1,\dots,L_0\).
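
One way to see why the factor \(L/L_0\) appears, assuming the \(L_0\) replicates are drawn by simple random sampling without replacement (the text above only says they are randomly selected, so this is an assumption): write \(d_k = \hat{T}_y^{(k)}-\hat{T}_y\) and let \(S\) (notation introduced here for illustration) denote the set of retained replicates. Each replicate is then retained with probability \(L_0/L\), so, averaging over the random choice of \(S\) with the sample held fixed, $$E_S\left[\frac{L}{L_0}\sum_{k \in S} c_k d_k^2\right] = \frac{L}{L_0}\sum_{k=1}^{L}\frac{L_0}{L}\, c_k d_k^2 = \sum_{k=1}^{L} c_k d_k^2$$ That is, under this assumption the subsampled estimator equals the full-replicate estimator on average.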

This subsampling is suggested for certain replicate designs in Fay (1989). Kim and Wu (2013) provide a detailed theoretical justification and also propose alternative methods of subsampling replicates.
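
The two variance formulas above can be sketched in base R without any survey packages. This is a minimal illustration, not the package's implementation: the replicate estimates, the constants `c_k`, and all variable names are hypothetical, and the replicates are assumed to be drawn by simple random sampling without replacement.

```r
set.seed(1)

# Hypothetical inputs (not taken from the package): a full-sample estimate,
# L replicate estimates, and per-replicate constants c_k.
L  <- 8
L0 <- 4
full_estimate <- 100
rep_estimates <- full_estimate + rnorm(L, sd = 5)
c_k <- rep(1 / L, L)

# Contribution of each replicate: c_k * (replicate estimate - full estimate)^2
sq_dev <- c_k * (rep_estimates - full_estimate)^2

# Variance estimate using all L replicates
v_full <- sum(sq_dev)

# Variance estimate from a random subsample of L0 replicates,
# scaled up by L / L0
kept  <- sample(L, size = L0)
v_sub <- (L / L0) * sum(sq_dev[kept])

# Averaging the subsampled estimator over every possible subset of size L0
# recovers the full-replicate estimate
all_subsets <- combn(L, L0)
v_all <- apply(all_subsets, 2, function(idx) (L / L0) * sum(sq_dev[idx]))
stopifnot(isTRUE(all.equal(mean(v_all), v_full)))
```

Any single draw of `v_sub` will generally differ from `v_full`; only the average over all possible subsamples matches, which is the extra variability that subsampling introduces.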

References

Fay, Robert. 1989. "Theory and Application of Replicate Weighting for Variance Calculations." In Proceedings of the Survey Research Methods Section, 495–500. Alexandria, VA: American Statistical Association. http://www.asasrms.org/Proceedings/papers/1989_033.pdf

Kim, J.K. and Wu, C. 2013. "Sparse and Efficient Replication Variance Estimation for Complex Surveys." Survey Methodology, Statistics Canada, 39(1), 91–120.

Examples

library(survey)
library(svrep) # provides as_fays_gen_rep_design() and subsample_replicates()
set.seed(2023)

# Create an example survey design object

  sample_data <- data.frame(
    STRATUM = c(1,1,1,1,2,2,2,2),
    PSU     = c(1,2,3,4,5,6,7,8)
  )

  survey_design <- svydesign(
    data = sample_data,
    strata = ~ STRATUM,
    ids = ~ PSU,
    weights = ~ 1
  )

  rep_design <- survey_design |>
    as_fays_gen_rep_design(variance_estimator = "Ultimate Cluster")

# Inspect replicates before subsampling

  rep_design |> getElement("repweights")
#>          REP_1     REP_2     REP_3     REP_4     REP_5     REP_6     REP_7
#> [1,] 1.3535534 0.6464466 1.3535534 0.6464466 1.3535534 0.6464466 1.3535534
#> [2,] 1.0722540 0.6864786 1.3135214 0.9277460 0.6920437 1.5492236 0.4507764
#> [3,] 0.4135167 1.1689015 0.8310985 1.5864833 1.3507810 1.0668008 0.9331992
#> [4,] 1.1606758 1.4981733 0.5018267 0.8393242 0.6036219 0.7375290 1.2624710
#> [5,] 1.3535534 1.3535534 1.3535534 1.3535534 1.3535534 1.3535534 1.3535534
#> [6,] 0.9342068 1.3506702 1.3506702 0.9342068 0.8300909 0.4136276 0.4136276
#> [7,] 1.2618712 0.6028047 0.6028047 1.2618712 0.5024265 1.1614930 1.1614930
#> [8,] 0.4503686 0.6929717 0.6929717 0.4503686 1.3139292 1.0713260 1.0713260
#>          REP_8
#> [1,] 0.6464466
#> [2,] 1.3079563
#> [3,] 0.6492190
#> [4,] 1.3963781
#> [5,] 1.3535534
#> [6,] 0.8300909
#> [7,] 0.5024265
#> [8,] 1.3139292
#> attr(,"scale")
#> [1] 1
#> attr(,"rscales")
#> [1] 1 1 1 1 1 1 1 1

# Inspect replicates after subsampling

  rep_design |>
    subsample_replicates(n_reps = 4) |>
    getElement("repweights")
#>          REP_5     REP_1     REP_7     REP_8
#> [1,] 1.3535534 1.3535534 1.3535534 0.6464466
#> [2,] 0.6920437 1.0722540 0.4507764 1.3079563
#> [3,] 1.3507810 0.4135167 0.9331992 0.6492190
#> [4,] 0.6036219 1.1606758 1.2624710 1.3963781
#> [5,] 1.3535534 1.3535534 1.3535534 1.3535534
#> [6,] 0.8300909 0.9342068 0.4136276 0.8300909
#> [7,] 0.5024265 1.2618712 1.1614930 0.5024265
#> [8,] 1.3139292 0.4503686 1.0713260 1.3139292
#> attr(,"scale")
#> [1] 4
#> attr(,"rscales")
#> [1] 1 1 1 1