Correct Batch Effects in RNA-seq: ComBat-seq, limma & MLM
Master batch effect correction in RNA-seq with three proven methods: ComBat-seq, limma's removeBatchEffect, and mixed linear models. Learn when to use each approach and how to integrate batch adjustment into DESeq2, edgeR, and limma workflows.

FULL TUTORIAL WITH CODE: https://ngs101.com/how-to-analyze-rnaseq-data-for-absolute-beginners-21-a-comprehensive-guide-to-batch-effects-covariates-adjustment/

TIMESTAMPS:
0:00 - Why Batch Effects Destroy RNA-seq Results
2:18 - Setup R Environment: sva, limma, edgeR, lme4
2:37 - Prepare the Example Dataset
4:22 - Identify Batch Effects with PCA Visualization
5:11 - Method 1: ComBat-seq Empirical Bayes Correction
6:23 - Method 2: limma removeBatchEffect with Voom
7:34 - Method 3: Mixed Linear Models for Complex Designs
10:43 - Adjust Batch Effects during DE Analysis

WHAT YOU'LL LEARN:
- Identify batch effects through PCA visualization
- Apply ComBat-seq empirical Bayes correction to count data
- Use limma's removeBatchEffect for normalized expression
- Implement mixed linear models for hierarchical designs
- Integrate batch as covariate in DESeq2, edgeR, limma
- Choose between correction vs statistical modeling
- Avoid overcorrection and complete confounding
- Validate correction effectiveness with visualizations

WHY BATCH EFFECTS MATTER:
Batch effects are systematic technical variations from different sequencing runs, reagent lots, sample preparation protocols, personnel, and environmental conditions. They can dwarf biological signals, cause false discoveries, and compromise reproducibility. Common sources include processing dates, instruments, technicians, and time-related factors spanning weeks or months.

THREE CORRECTION METHODS:

**ComBat-seq (Empirical Bayes)**:
- Works directly on RNA-seq count data
- Borrows information across genes, which helps with small sample sizes
- Function: ComBat_seq() from the sva package

**limma removeBatchEffect**:
- Works on normalized log-CPM values
- Integrates with the limma-voom workflow
- Best for visualization (not for DE testing)

**Mixed Linear Models (MLM)**:
- Handles nested or crossed batch structures
- Models fixed and random effects
- Uses lmer() from the lme4 package

CORRECTION VS MODELING:

**Data Correction** (for visualization):
- Transform data to remove batch variation
- Use for PCA plots, heatmaps, clustering
- Never use corrected counts for statistical testing

**Statistical Modeling** (for DE analysis):
- Include batch in the design formula: ~ batch + treatment
- Proper approach for hypothesis testing in DESeq2, edgeR, limma
- Don't correct data AND include batch in the model - choose ONE

WORKFLOW:
1. Visualize batch effects with PCA before correction
2. Apply the appropriate correction method
3. Validate with an after-correction PCA
4. For DE analysis: include batch in the design formula instead

CRITICAL PITFALLS:

**Complete Confounding**: If all controls are in batch 1 and all treated samples are in batch 2, batch and treatment cannot be separated, and no method can correct for it. Redesign the experiment.

**Overcorrection**: Correction can remove real biology. Compare PCA plots before and after correction to check that the biological groups still separate.

**Wrong Formula Order**: Use ~ batch + treatment (not ~ treatment + batch). Put the variable of interest last, since default contrasts (for example in DESeq2's results()) test the last variable in the design.

**Double Adjustment**: Never correct data AND include batch in the model

SOFTWARE & PACKAGES:
- sva: ComBat-seq for count correction
- limma: removeBatchEffect and voom
- edgeR: TMM normalization and QL framework
- lme4: Mixed linear models
- DESeq2: Batch in design formula
- ggplot2: Visualization

RNA-SEQ COURSE NAVIGATION:
- Part 20: Comparing limma, DESeq2, and edgeR - https://ngs101.com/how-to-analyze-rnaseq-data-for-absolute-beginners-part-20-comparing-limma-deseq2-and-edger-in-differential-expression-analysis/

COMPLETE PLAYLIST: https://youtube.com/playlist?list=PLTfZb80upqPhaClVX_sVXeCvJRQ6KQdH4
Subscribe: @NGS101-LearningHub
Free scripts: https://ngs101.com

#RNAseq #BatchEffects #BioinformaticsTutorial #ComBatSeq #DESeq2 #edgeR #limma #GeneExpression
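
CHECKING FOR COMPLETE CONFOUNDING (SKETCH):
The complete-confounding pitfall described above can be checked before any correction is attempted. What follows is a minimal base R sketch, not code from the tutorial: the two sample sheets and the helper name is_estimable are hypothetical and made up for illustration. It cross-tabulates batch against treatment, then tests whether the model matrix for ~ batch + treatment has full column rank.

```r
# Hypothetical sample sheets; the batch and treatment labels are invented.
balanced <- data.frame(
  batch     = factor(c("b1", "b1", "b2", "b2")),
  treatment = factor(c("control", "treated", "control", "treated"))
)
confounded <- data.frame(
  batch     = factor(c("b1", "b1", "b2", "b2")),
  treatment = factor(c("control", "control", "treated", "treated"))
)

# Hypothetical helper: batch and treatment effects can both be estimated
# only if the model matrix for ~ batch + treatment has full column rank.
# Complete confounding makes two columns identical, so the rank drops.
is_estimable <- function(samples) {
  design <- model.matrix(~ batch + treatment, data = samples)
  qr(design)$rank == ncol(design)
}

# Cross-tabulations: a zero cell pattern like the second one is the warning sign.
print(table(balanced$batch, balanced$treatment))      # each batch holds both groups
print(table(confounded$batch, confounded$treatment))  # each batch holds one group only

print(is_estimable(balanced))    # TRUE
print(is_estimable(confounded))  # FALSE
```

The same formula is what would be supplied as the design in DESeq2, or passed to model.matrix() for edgeR and limma, so a rank-deficient matrix here means those fits cannot separate batch from treatment either.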