# Creates replicate factors for the generalized survey bootstrap

Source:`R/generalized_bootstrap.R`

`make_gen_boot_factors.Rd`

Creates replicate factors for the generalized survey bootstrap method. The generalized survey bootstrap is a method for forming bootstrap replicate weights from a textbook variance estimator, provided that the variance estimator can be represented as a quadratic form whose matrix is positive semidefinite (this covers a large class of variance estimators).

## Arguments

- Sigma
The matrix of the quadratic form used to represent the variance estimator. Must be positive semidefinite.

- num_replicates
The number of bootstrap replicates to create.

- tau
Either

`"auto"`

, or a single number. This is the rescaling constant used to avoid negative weights through the transformation \(\frac{w + \tau - 1}{\tau}\), where \(w\) is the original weight and \(\tau\) is the rescaling constant`tau`

.

If`tau="auto"`

, the rescaling factor is determined automatically as follows: if all of the adjustment factors are nonnegative, then`tau`

is set equal to 1; otherwise,`tau`

is set to the smallest value needed to rescale the adjustment factors such that they are all at least`0.01`

.- exact_vcov
If

`exact_vcov=TRUE`

, the replicate factors will be generated such that their variance-covariance matrix exactly matches the target variance estimator's quadratic form (within numeric precision). This is desirable as it causes variance estimates for totals to closely match the values from the target variance estimator. This requires that`num_replicates`

exceeds the rank of`Sigma`

. The replicate factors are generated by applying PCA-whitening to a collection of draws from a multivariate Normal distribution, then applying a coloring transformation to the whitened collection of draws.

## Value

A matrix with the same number of rows as `Sigma`

, and the number of columns
equal to `num_replicates`

. The object has an attribute named `tau`

which can be retrieved
by calling `attr(which = 'tau')`

on the object. The value `tau`

is a rescaling factor
which was used to avoid negative weights.

In addition, the object has attributes named `scale`

and `rscales`

which can be
passed directly to svrepdesign. Note that the value of `scale`

is \(\tau^2/B\),
while the value of `rscales`

is vector of length \(B\), with every entry equal to \(1\).

## Statistical Details

Let \(v( \hat{T_y})\) be the textbook variance estimator for an estimated population total \(\hat{T}_y\) of some variable \(y\). The base weight for case \(i\) in our sample is \(w_i\), and we let \(\breve{y}_i\) denote the weighted value \(w_iy_i\). Suppose we can represent our textbook variance estimator as a quadratic form: \(v(\hat{T}_y) = \breve{y}\Sigma\breve{y}^T\), for some \(n \times n\) matrix \(\Sigma\). The only constraint on \(\Sigma\) is that, for our sample, it must be symmetric and positive semidefinite.

The bootstrapping process creates \(B\) sets of replicate weights, where the \(b\)-th set of replicate weights is a vector of length \(n\) denoted \(\mathbf{a}^{(b)}\), whose \(k\)-th value is denoted \(a_k^{(b)}\). This yields \(B\) replicate estimates of the population total, \(\hat{T}_y^{*(b)}=\sum_{k \in s} a_k^{(b)} \breve{y}_k\), for \(b=1, \ldots B\), which can be used to estimate sampling variance.

$$ v_B\left(\hat{T}_y\right)=\frac{\sum_{b=1}^B\left(\hat{T}_y^{*(b)}-\hat{T}_y\right)^2}{B} $$

This bootstrap variance estimator can be written as a quadratic form:

$$ v_B\left(\hat{T}_y\right) =\mathbf{\breve{y}}^{\prime}\Sigma_B \mathbf{\breve{y}} $$ where $$ \boldsymbol{\Sigma}_B = \frac{\sum_{b=1}^B\left(\mathbf{a}^{(b)}-\mathbf{1}_n\right)\left(\mathbf{a}^{(b)}-\mathbf{1}_n\right)^{\prime}}{B} $$

Note that if the vector of adjustment factors \(\mathbf{a}^{(b)}\) has expectation \(\mathbf{1}_n\) and variance-covariance matrix \(\boldsymbol{\Sigma}\),
then we have the bootstrap expectation \(E_{*}\left( \boldsymbol{\Sigma}_B \right) = \boldsymbol{\Sigma}\). Since the bootstrap process takes the sample values \(\breve{y}\) as fixed, the bootstrap expectation of the variance estimator is \(E_{*} \left( \mathbf{\breve{y}}^{\prime}\Sigma_B \mathbf{\breve{y}}\right)= \mathbf{\breve{y}}^{\prime}\Sigma \mathbf{\breve{y}}\).
Thus, we can produce a bootstrap variance estimator with the same expectation as the textbook variance estimator simply by randomly generating \(\mathbf{a}^{(b)}\) from a distribution with the following two conditions:
**Condition 1**: \(\quad \mathbf{E}_*(\mathbf{a})=\mathbf{1}_n\)
**Condition 2**: \(\quad \mathbf{E}_*\left(\mathbf{a}-\mathbf{1}_n\right)\left(\mathbf{a}-\mathbf{1}_n\right)^{\prime}=\mathbf{\Sigma}\)

While there are multiple ways to generate adjustment factors satisfying these conditions,
the simplest general method is to simulate from a multivariate normal distribution: \(\mathbf{a} \sim MVN(\mathbf{1}_n, \boldsymbol{\Sigma})\).
This is the method used by this function.

## Details on Rescaling to Avoid Negative Adjustment Factors

Let \(\mathbf{A} = \left[ \mathbf{a}^{(1)} \cdots \mathbf{a}^{(b)} \cdots \mathbf{a}^{(B)} \right]\) denote the \((n \times B)\) matrix of bootstrap adjustment factors. To eliminate negative adjustment factors, Beaumont and Patak (2012) propose forming a rescaled matrix of nonnegative replicate factors \(\mathbf{A}^S\) by rescaling each adjustment factor \(a_k^{(b)}\) as follows: $$ a_k^{S,(b)} = \frac{a_k^{(b)} + \tau - 1}{\tau} $$ where \(\tau \geq 1 - a_k^{(b)} \geq 1\) for all \(k\) in \(\left\{ 1,\ldots,n \right\}\) and all \(b\) in \(\left\{1, \ldots, B\right\}\).

The value of \(\tau\) can be set based on the realized adjustment factor matrix \(\mathbf{A}\) or by choosing \(\tau\) prior to generating the adjustment factor matrix \(\mathbf{A}\) so that \(\tau\) is likely to be large enough to prevent negative bootstrap weights.

If the adjustment factors are rescaled in this manner, it is important to adjust the scale factor used in estimating the variance with the bootstrap replicates, which becomes \(\frac{\tau^2}{B}\) instead of \(\frac{1}{B}\). $$ \textbf{Prior to rescaling: } v_B\left(\hat{T}_y\right) = \frac{1}{B}\sum_{b=1}^B\left(\hat{T}_y^{*(b)}-\hat{T}_y\right)^2 $$ $$ \textbf{After rescaling: } v_B\left(\hat{T}_y\right) = \frac{\tau^2}{B}\sum_{b=1}^B\left(\hat{T}_y^{S*(b)}-\hat{T}_y\right)^2 $$ When sharing a dataset that uses rescaled weights from a generalized survey bootstrap, the documentation for the dataset should instruct the user to use replication scale factor \(\frac{\tau^2}{B}\) rather than \(\frac{1}{B}\) when estimating sampling variances.

## References

The generalized survey bootstrap was first proposed by Bertail and Combris (1997).
See Beaumont and Patak (2012) for a clear overview of the generalized survey bootstrap.
The generalized survey bootstrap represents one strategy for forming replication variance estimators
in the general framework proposed by Fay (1984) and Dippo, Fay, and Morganstein (1984).

- Beaumont, Jean-François, and Zdenek Patak. 2012. “On the Generalized Bootstrap for Sample Surveys with Special Attention to Poisson Sampling: Generalized Bootstrap for Sample Surveys.” International Statistical Review 80 (1): 127–48. https://doi.org/10.1111/j.1751-5823.2011.00166.x.

- Bertail, and Combris. 1997. “Bootstrap Généralisé d’un Sondage.” Annales d’Économie Et de Statistique, no. 46: 49. https://doi.org/10.2307/20076068.

- Dippo, Cathryn, Robert Fay, and David Morganstein. 1984. “Computing Variances from Complex Samples with Replicate Weights.” In, 489–94. Alexandria, VA: American Statistical Association. http://www.asasrms.org/Proceedings/papers/1984_094.pdf.

- Fay, Robert. 1984. “Some Properties of Estimates of Variance Based on Replication Methods.” In, 495–500. Alexandria, VA: American Statistical Association. http://www.asasrms.org/Proceedings/papers/1984_095.pdf.

## See also

The function `make_quad_form_matrix`

can be used to
represent several common variance estimators as a quadratic form's matrix,
which can then be used as an input to `make_gen_boot_factors()`

.

## Examples

```
if (FALSE) {
library(survey)
# Load an example dataset that uses unequal probability sampling ----
data('election', package = 'survey')
# Create matrix to represent the Horvitz-Thompson estimator as a quadratic form ----
n <- nrow(election_pps)
pi <- election_jointprob
horvitz_thompson_matrix <- matrix(nrow = n, ncol = n)
for (i in seq_len(n)) {
for (j in seq_len(n)) {
horvitz_thompson_matrix[i,j] <- 1 - (pi[i,i] * pi[j,j])/pi[i,j]
}
}
## Equivalently:
horvitz_thompson_matrix <- make_quad_form_matrix(
variance_estimator = "Horvitz-Thompson",
joint_probs = election_jointprob
)
# Make generalized bootstrap adjustment factors ----
bootstrap_adjustment_factors <- make_gen_boot_factors(
Sigma = horvitz_thompson_matrix,
num_replicates = 80,
tau = 'auto'
)
# Determine replication scale factor for variance estimation ----
tau <- attr(bootstrap_adjustment_factors, 'tau')
B <- ncol(bootstrap_adjustment_factors)
replication_scaling_constant <- tau^2 / B
# Create a replicate design object ----
election_pps_bootstrap_design <- svrepdesign(
data = election_pps,
weights = 1 / diag(election_jointprob),
repweights = bootstrap_adjustment_factors,
combined.weights = FALSE,
type = "other",
scale = attr(bootstrap_adjustment_factors, 'scale'),
rscales = attr(bootstrap_adjustment_factors, 'rscales')
)
# Compare estimates to Horvitz-Thompson estimator ----
election_pps_ht_design <- svydesign(
id = ~1,
fpc = ~p,
data = election_pps,
pps = ppsmat(election_jointprob),
variance = "HT"
)
svytotal(x = ~ Bush + Kerry,
design = election_pps_bootstrap_design)
svytotal(x = ~ Bush + Kerry,
design = election_pps_ht_design)
}
```