Stratified Sampling in R (part 1)
Stratified sampling explained and demonstrated with a simulated example.
Part 2 of this series: https://youtu.be/WLPm-X4isvk
Part 3 of this series: https://youtu.be/j6BUHlnb6fs

R code:

#Stratified Sampling
#alternative to SRS (simple random sampling)
#divide the population into k non-overlapping, distinct subpopulations called strata

#Why Stratify?
#1 - interested in learning about the subpopulations; perhaps to compare them later
#2 - convenient for organizing data collection
#3 - improve precision of your estimate; smaller error of estimation; especially
#    when strata are homogeneous; efficiency gain

#Simulated population: 6000 F (rows 1-6000) and 4000 M (rows 6001-10000)
set.seed(9850)
df = data.frame(gender=rep(c("F","M"), c(6000,4000)),
                ht=c(rnorm(6000, mean=60, sd=5), rnorm(4000, mean=90, sd=5)))

#Population parameters (mu, sigma, strata, etc)
mean(df$ht)
table(df$gender)
var(df$ht)
sd(df$ht)
aggregate(df$ht ~ df$gender, FUN=mean)
aggregate(df$ht ~ df$gender, FUN=sd)

#Generating 1000 SRS, size n=50, for purposes of measuring precision of estimate for mu
set.seed(9850)
xbar = apply(replicate(1000, sample(df$ht, 50)), 2, FUN=mean)
mean(xbar)
var(xbar)
sd(xbar)

#theoretical variance of xbar under SRS; sample() draws without replacement,
#so the finite population correction (N-n)/(N-1) is included, as in the stratified formula below
sigmasq_xbar = (((var(df$ht) * (length(df$ht) - 1)) / length(df$ht)) / 50) *
  ((length(df$ht) - 50) / (length(df$ht) - 1))
#simulation-based estimate of the same quantity
sigmasq_xbar_hat = var(xbar)

#proportional stratified RS using gender for strata
table(df$gender) / nrow(df) * 50  #30 F and 20 M

xbarStrat = numeric(1000)
set.seed(9850)
for (i in 1:1000) {
  #30 rows from the F block and 20 rows from the M block
  xbarStrat[i] = mean(c(df[sample(6000, 30), "ht"],
                        df[sample(6001:10000, 20), "ht"]))
}
mean(xbarStrat)
var(xbarStrat)

#theoretical variance of the stratified mean: sum over strata of
#(N_h/N)^2 * (sigma_h^2 / n_h) * (N_h - n_h)/(N_h - 1)
sigmasq_xbarStrat =
  (6000/10000)^2 * (((var(df[df$gender %in% "F", "ht"]) * (6000 - 1)) / 6000) / 30) * ((6000 - 30) / (6000 - 1)) +
  (4000/10000)^2 * (((var(df[df$gender %in% "M", "ht"]) * (4000 - 1)) / 4000) / 20) * ((4000 - 20) / (4000 - 1))

#visualizing the precision differential between the two sampling techniques
par(mfrow=c(1,2))
hist(xbar, freq=F, xlim=c(65,85))
lines(density(xbar), col="red")
hist(xbarStrat, freq=F, xlim=c(65,85))
lines(density(xbarStrat), col="red")
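The hard-coded expression for sigmasq_xbarStrat in the script can be generalized to any number of strata. The sketch below is one possible way to do that and is not part of the original script: the function name strat_var_mean and its arguments are hypothetical, and it assumes sampling without replacement within each stratum, as sample() does by default. It rebuilds the same simulated population, computes the theoretical variance of the stratified mean and of the SRS mean (both with the finite population correction), and takes their ratio.

```r
# Hypothetical helper (not from the original script): theoretical variance of the
# stratified sample mean, sampling without replacement within each stratum.
#   y      - numeric vector holding the whole population
#   strata - vector of stratum labels, same length as y
#   n_h    - named numeric vector of per-stratum sample sizes
strat_var_mean <- function(y, strata, n_h) {
  N <- length(y)
  groups <- split(y, strata)          # one numeric vector per stratum
  n_h <- n_h[names(groups)]           # align sample sizes with stratum order
  terms <- mapply(function(y_h, n) {
    N_h <- length(y_h)
    sigma2_h <- var(y_h) * (N_h - 1) / N_h   # population variance (divisor N_h)
    (N_h / N)^2 * (sigma2_h / n) * ((N_h - n) / (N_h - 1))
  }, groups, n_h)
  sum(terms)
}

# Same simulated population as in the script above
set.seed(9850)
df <- data.frame(gender = rep(c("F", "M"), c(6000, 4000)),
                 ht = c(rnorm(6000, mean = 60, sd = 5),
                        rnorm(4000, mean = 90, sd = 5)))

# Proportional allocation: 30 F and 20 M out of n = 50
v_strat <- strat_var_mean(df$ht, df$gender, c(F = 30, M = 20))

# Theoretical variance of the SRS mean, with finite population correction
N <- nrow(df)
n <- 50
sigma2 <- var(df$ht) * (N - 1) / N
v_srs <- (sigma2 / n) * ((N - n) / (N - 1))

# Values above 1 mean the stratified design is more precise
v_srs / v_strat
```

Because the two simulated strata have means far apart relative to their within-stratum spread, most of the population variance is between strata, which is exactly the part stratification removes. The ratio should therefore come out well above 1, in line with the narrower histogram of xbarStrat.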