Increasingly there is interest in studying the microbiome from environments that contain few microbes (low microbial biomass). 16S rRNA gene sequencing allows researchers to study microbes from environments in a high throughput culture independent manner, but recent research has highlighted that this technique is prone to contaminants when applied to samples originating from low microbial biomass environments.
We recently performed a study to understand the impact of decreasing microbial biomass on 16S rRNA gene sequencing experiments and evaluate the current computational strategies to control for contaminants (preprint available here).
Here, we provide the compiled R Markdown document to reproduce the analysis presented in the manuscript, divided into 6 primary sections:
Here, we provide the compiled R Markdown document to reproduce the analyses of the mock microbial community presented in the manuscript. Readers interested in the details of how to use the different methods should click on the respective links above.
# load libraries
library(tidyr)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(kableExtra)
##
## Attaching package: 'kableExtra'
## The following object is masked from 'package:dplyr':
##
## group_rows
library(ggplot2)
library(ggpubr)
## Loading required package: magrittr
##
## Attaching package: 'magrittr'
## The following object is masked from 'package:tidyr':
##
## extract
library(knitr)
options(digits=2)
library(cowplot)
##
## Attaching package: 'cowplot'
## The following object is masked from 'package:ggpubr':
##
## get_legend
## The following object is masked from 'package:ggplot2':
##
## ggsave
load('results_filtering.RData')
load('results_decontam.RData')
load('results_sourcetracker.RData')
This loaded a LOT of data! Each method that was evaluated has its own results variable. The method used is indicated in the variable name:
Variable name | Method | Parameter changed |
---|---|---|
results_blank | negative control | none |
results_filter_001 | abundance filter | abundance < 0.01 % removed |
results_filter_01 | abundance filter | abundance < 0.10 % removed |
results_filter_1 | abundance filter | abundance < 1.00 % removed |
results_decontam_0_1 | decontam frequency | threshold = 0.1 |
results_decontam_0_2 | decontam frequency | threshold = 0.2 |
results_decontam_0_3 | decontam frequency | threshold = 0.3 |
results_decontam_0_4 | decontam frequency | threshold = 0.4 |
results_decontam_0_5 | decontam frequency | threshold = 0.5 |
results_st_sc1_case1 | SourceTracker | Scenario 1, case 1 |
results_st_sc1_case2 | SourceTracker | Scenario 1, case 2 |
results_st_sc2_case1 | SourceTracker | Scenario 2, case 1 |
results_st_sc2_case2 | SourceTracker | Scenario 2, case 2 |
All of these objects have the same overall structure, but a few modifications are needed to make everything work together (and look nice).
We need to make a few adjustments to the result data frames to make them easier to manage, summarize, and use for analysis.
These changes are documented as comments.
# Change the method in results_original to better match terminology used with other methods. (current is "Original Data")
results_original$method <- 'Uncorrected Results'
# Add prevalence to the original data and expected data (for plotting)
results_original$prevalence <- results_filter_0_1$prevalence
results_expected$prevalence <- results_filter_0_1$prevalence
Working with the results in individual variables is probably do-able, but likely a headache. We will save ourselves by combining all the results_* objects into one object called results_all. This was kind of done in a brute force way, if anyone has suggestions to do this more elegantly, please share!
We also have a few modifications to the results_all data frame that to make the data cleaner/easier to work with down the line.
# create massive data table with all results
results_all <- bind_rows(results_original,results_expected, results_blank, results_decontam_0_1,results_decontam_0_2,results_decontam_0_3, results_decontam_0_4, results_decontam_0_5, results_filter_0_01, results_filter_0_1, results_filter_1, results_st_sc1_case1, results_st_sc1_case2,results_st_sc2_case1, results_st_sc2_case2 )
# change 'Blank control removal' to 'Negative control filter' to reflect the manuscript
results_all <- results_all %>%
mutate(method = recode_factor(method,`Blank control removal`= 'Negative control filter'))
# change NAs to 0 (confrimed that NAs are only due to ASVs removed from blank control removal)
results_all[is.na(results_all)] <- 0
# resubset results_blank so it has all expected ASVs (ASVs that are not represented in the data set are now 0)
results_blank <- results_all %>%
filter(method == 'Negative control filter')
# reorder to be contistent with the rest of the resulst file
## figure out how to do this better
results_blank <- results_blank[,c(17,18,1:14,19:22,15,16)]
# order the methods
results_all$method <- factor(results_all$method, levels = c("Uncorrected Results", "Expected Results" ,"Negative control filter", "Abundance filter, 0.01","Abundance filter, 0.1", "Abundance filter, 1", "Decontam frequency, thr =0.1", "Decontam frequency, thr =0.2", "Decontam frequency, thr =0.3", "Decontam frequency, thr =0.4", "Decontam frequency, thr =0.5", "SourceTracker, scenario 1 case 1", "SourceTracker, scenario 1 case 2","SourceTracker, scenario 2 case 1", "SourceTracker, scenario 2 case 2"))
# create a column for contaminant removal method type
results_all <- results_all %>% mutate(method_type = gsub(',.*', '', method))
# order the method types
results_all$method_type <- factor(results_all$method_type, levels = c("Uncorrected Results", "Expected Results" ,"Negative control filter", "Abundance filter", "Decontam frequency","SourceTracker"))
The results_all variable contains all of our results in one nice organized data.frame. However, this is not the best format for everything.
Here, we subset the data into 1. results_asv_long - This variable has only the ASV abundance and associated data in long format to make it easy to work with ggplot.
2. results_limited - This variable has only a subset of samples spanning low, moderate, and high levels of contaminants (represented by D3, D6, and D8) 3. results_asv_limited_long - This variable has only the ASV abundance with associated data from a subset of dilution series samples and is in long format
# Create a pretty subset of ASV-only data for plotting
results_asv_long <- results_all %>%
select(contains("ASV_"),sample_names, method, method_type) %>%
gather(key = 'ASV',value = 'Abundance', -sample_names, -method, - method_type)
# Create a small subset of the result data representing low, moderate, and high levels of contaminants for plotting
results_limited <- results_all %>%
filter(sample_names %in% c('D3','D6','D8'))
results_asv_limited_long <- results_limited %>%
select(contains("ASV_"),sample_names, method, method_type) %>%
gather(key = 'ASV',value = 'Abundance', -sample_names, -method, - method_type)
We evaluated the effect of contaminant ASVs on the summary metrics typically used in microbiome studies - alpha diversity metrics such as the number of Observed ASVs, Shannon Diversity, and Inverse Simpson indices.
# Alpha diversity dataframe
results_all_alpha <- results_all %>% filter (method_type %in% c('Expected Results', 'Uncorrected Results','Negative control filter','Abundance filter','Decontam frequency','SourceTracker')) %>%
select(Observed, InvSimpson, Shannon,method, method_type, sample_names, prevalence) %>%
gather(key = 'alpha_diversity_measure',value = 'Value', -sample_names, -method, - method_type, - prevalence) %>%
mutate(alpha_diversity_measure = factor(alpha_diversity_measure, levels = c('Observed', 'InvSimpson', 'Shannon')))
# Figure 2 - Alpha diversity plots for expected and uncorrected results
figure_2 <- results_all_alpha %>% filter (method_type %in% c('Expected Results', 'Uncorrected Results')) %>%
ggplot(., aes(x = sample_names, y = Value, shape = method)) +
geom_point() + facet_wrap(~alpha_diversity_measure, scales = "free_y") +
theme_bw() +
theme(legend.position = "right", legend.title = element_text(size = 10),
axis.text = element_text(size = 8),
axis.title = element_text(size = 10),
plot.title = element_text(size = 10),
strip.text.x = element_text(size = 10)) +
labs(x = "Sample") +
ggtitle('Figure 2')
figure_2
From the above plot, we can see that leaving contaminants unaccounted for can lead to drastically inflated values of alpha diversity.
To evaluate the overall performance of each contaminant removal method, ASVs were classified as being correctly or incorrectly identified as a contaminant ASV (an ASV that was not expected to be part of the mock community ) or a mock community ASV (an ASV that is expected to occur in the mock community.
library(gtable)
library(grid)
customPalette <- c('#969696','#bdbdbd', '#d6604d', '#4393c3')
### Individual plots for publication
# Filter methods
classification_data <- results_all %>%
mutate(method = plyr::revalue(method, c("Uncorrected Results" = "A. None", "Expected Results" = "Expected Results", "Negative control filter" = "B. Negative control ", "Abundance filter, 0.01" = "C. Abundance, 0.01" , "Abundance filter, 0.1" = "D. Abundance, 0.1" ,"Abundance filter, 1" = "E. Abundance, 1.0" , "Decontam frequency, thr =0.1" = "F. Threshold = 0.1", "Decontam frequency, thr =0.2" = "G. Threshold = 0.2", "Decontam frequency, thr =0.3" = "H. Threshold = 0.3", "Decontam frequency, thr =0.4" = "I. Threshold = 0.4", "Decontam frequency, thr =0.5" = "J. Threshold = 0.5" , "SourceTracker, scenario 1 case 1" = "K. S1, case 1", "SourceTracker, scenario 1 case 2" = "L. S1, case 2", "SourceTracker, scenario 2 case 1" = "M. S2, case 1", "SourceTracker, scenario 2 case 2" = "N. S2, case 2")))
long_profile <- classification_data %>%
filter(method_type %in% c('Uncorrected Results', 'Abundance filter', 'Negative control filter')) %>%
select(true_pos, true_neg, false_pos, false_neg, sample_names, method, method_type) %>%
gather(key = 'SequenceClass',value = 'Abundance', -sample_names, -method, - method_type)
# Figures
classificationPlot_filter <- ggplot(long_profile, aes(x = sample_names, y = Abundance)) + facet_wrap(~method, ncol = 5) +
geom_col(aes(fill = SequenceClass), width = 0.7, position = position_fill()) +
scale_fill_manual(values=customPalette, ) + theme(text = element_text(size=9)) +
labs(x = NULL, y = 'Proportion of Reads') +
theme(legend.position = "none",
axis.text = element_text(size = 9),
axis.text.x = element_text(angle = 90, hjust = 1),
axis.title = element_text(size = 9),
plot.title = element_text(size = 9))
# Decontam methods
long_profile <- classification_data %>%
filter(method_type %in% c('Decontam frequency')) %>%
select(true_pos, true_neg, false_pos, false_neg, sample_names, method, method_type) %>%
gather(key = 'SequenceClass',value = 'Abundance', -sample_names, -method, - method_type)
# Figures
classificationPlot_decontam <- ggplot(long_profile, aes(x = sample_names, y = Abundance)) + facet_wrap(~method, ncol = 5) +
geom_col(aes(fill = SequenceClass), width = 0.7, position = position_fill()) +
scale_fill_manual(values=customPalette) + theme(text = element_text(size=9)) +
labs(x = NULL, y = 'Proportion of Reads') +
theme(legend.position = "none",
axis.text = element_text(size = 9),
axis.text.x = element_text(angle = 90, hjust = 1),
axis.title = element_text(size = 9),
plot.title = element_text(size = 9))
# SourceTracker
long_profile <- classification_data %>%
filter(method_type %in% c('SourceTracker')) %>%
select(true_pos, true_neg, false_pos, false_neg, sample_names, method, method_type) %>%
gather(key = 'SequenceClass',value = 'Abundance', -sample_names, -method, - method_type)
# Figures
classificationPlot_sourcetracker <- ggplot(long_profile, aes(x = sample_names, y = Abundance)) + facet_wrap(~method, ncol = 4) +
geom_col(aes(fill = SequenceClass), width = 0.7, position = position_fill()) +
scale_fill_manual(values=customPalette, labels = c("Mock \n(incorrect)","Contaminants \n(incorrect)", "Mock \n(correct)", "Contaminant \n(correct)")) +
guides(color = guide_legend(override.aes = list(size=7))) +
theme(text = element_text(size=9)) +
labs(x = "Sample", y = 'Proportion of Reads') +
theme(legend.position = "right",
axis.text = element_text(size = 9),
axis.text.x = element_text(angle = 90, hjust = 1),
axis.title = element_text(size = 9),
plot.title = element_text(size = 9),
legend.justification='left',
legend.margin = margin(0, unit='cm'))
figure_3 <- ggdraw() +
draw_plot(classificationPlot_filter, x = 0, y = .68, width = 1, height = .3) +
draw_plot(classificationPlot_decontam, x = 0, y = .35, width = 1, height = .3) +
draw_plot(classificationPlot_sourcetracker, x = 0, y = 0, width = 1, height = 0.33) +
draw_plot_label(label = c('Figure 3','Filter', 'Decontam frequency', 'SourceTracker'), size = 12,
x = c(.75,0, 0, 0), y = c(1,1, .67, .35), hjust = 0)
figure_3
We wanted to succinctly be able to evaluate the performance of each method and compare across methods. There are many ways to do this, we chose to use a classification scheme and evaluate performance based on common classification metrics which are commonly used and easy to interpret such as accuracy.
Accuracy measures the proportion of classifications that were correct. In our case, this is the number of ASVs correctly classified as contaminant ASVs (True Positives) AND number of ASVs correctly classified as mock community ASVs (True Negatives).
We summarized how accurate each method was by plotting the accuracy by the prevalence of contaminant ASVs (in the original sample) for each method.
# Figure 4 - Accuracy faceted by method type (in manuscript)
color_palette <- c(`1` ="#3182bd", `2` ="#fd8d3c" , `3` = "#d95f02", `4` = "#a63603", `5` = "#bcbddc" ,`6` = "#9e9ac8", `7`= "#807dba", `8` ="#6a51a3" , `9` = "#4a1486", `10`= "#99d8c6", `12` = "#66c2a4", `13` ="#2ca25f", `14` = "#006d2c")
figure_4 <- results_all %>% filter (method_type %in% c('Negative control filter','Abundance filter','Decontam frequency','SourceTracker')) %>%
ggplot(., aes(x = prevalence, y = accuracy, color = method)) +
geom_line() + geom_point() + facet_grid(~method_type) +
theme_bw() +
theme(legend.position = "right", legend.title = element_text(size = 10),
legend.text=element_text(size=8),
axis.text = element_text(size = 10),
axis.title = element_text(size = 10),
plot.title = element_text(size = 8),
strip.text.x = element_text(size = 10),
legend.key.size = unit(.01, 'lines'),
panel.spacing = unit(.3, "lines"),
legend.margin = margin(l = 0, unit = 'cm' )) +
scale_color_manual(values = unname(color_palette)) +
labs(x = "Contaminant Prevalence", y = "Accuracy") +
ggtitle('Figure 4')
figure_4
From this graph, we see that the negative control filter performed poorly regardless of the level of contaminants. All other methods worked well (with an accuracy near 1) for low levels of contaminants (less than 5% contaminant ASVs). All methods start declining once the amount of contaminants is greater than 20%, though the sourcetracker method with well-defined experimental environments, abundance filter of 1%, and decontam with a threshold of 0.5 work the best.
We also evaluated the ability of each method to predict contaminants correctly by measuring the sensitivity of each method.
# Sensitivity (not in manuscript)
results_all %>% filter (method_type %in% c('Negative control filter','Abundance filter','Decontam frequency','SourceTracker')) %>%
ggplot(., aes(x = prevalence, y = sensitivity, color = method)) +
geom_line() + geom_point() + facet_grid(~method_type) +
theme(legend.position = "right", legend.title = element_text(size = 10),
axis.text = element_text(size = 10),
axis.title = element_text(size = 10),
plot.title = element_text(size = 10),
strip.text.x = element_text(size = 10)) +
scale_color_manual(values = unname(color_palette))
To demonstrate the impact of contaminant removal on microbiome analyses, we present the relative abundance data after contaminant removal. The full results are in the supplemental materials (Supplemental Figure 1), and a subset is presented in the main manuscript.
# representative
fig_5a <- results_asv_limited_long %>% filter(ASV == 'ASV_1') %>%
filter (method_type %in% c('Abundance filter','Decontam frequency','SourceTracker', 'Negative control filter')) %>%
ggplot(., aes(x = method, y = Abundance, fill = method)) + geom_bar(stat = 'identity') + facet_wrap(~sample_names , ncol = 9) + theme(axis.text.x = element_text(angle = 90, hjust = 1)) + ylim(0,35) + scale_fill_manual(values = unname(color_palette)) +
geom_hline(data = results_asv_limited_long %>% filter(method %in% c('Expected Results'), ASV == 'ASV_1'), aes(yintercept = Abundance, linetype ="Expected Results*"), show.legend = TRUE, color = 'black') +
theme(legend.position = "none",
axis.text.y = element_text(size = 12),
axis.title.y = element_text(size = 12),
axis.title.x = element_blank(),
axis.text.x = element_blank(),
plot.title = element_text(size = 12))
# representative
fig_5b <- results_asv_limited_long %>% filter(ASV == 'ASV_2') %>%
filter (method_type %in% c('Abundance filter','Decontam frequency','SourceTracker', 'Negative control filter')) %>%
ggplot(., aes(x = method, y = Abundance, fill = method)) +
geom_bar(stat = 'identity') + facet_wrap(~sample_names , ncol = 9) +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) + ylim(0,30) +
scale_fill_manual(values = unname(color_palette)) +
geom_hline(data = results_asv_limited_long %>% filter(method %in% c('Expected Results'), ASV == 'ASV_2'), aes(yintercept = Abundance, linetype ="Expected Results*"), show.legend = TRUE, color = 'black') +
theme(legend.position = "none",
axis.text.y = element_text(size = 12),
axis.title.y = element_text(size = 12),
axis.title.x = element_blank(),
axis.text.x = element_blank(),
plot.title = element_text(size = 12))
# representative
fig_5c <- results_limited %>%
filter (method_type %in% c('Abundance filter','Decontam frequency','SourceTracker', 'Negative control filter')) %>%
ggplot(., aes(x = method, y = InvSimpson, fill = method) )+ geom_bar( stat = "identity") + facet_wrap(~sample_names , ncol = 9) + theme(axis.text.x = element_text(angle = 90, hjust = 1)) + scale_fill_manual(values = unname(color_palette)) + geom_hline(data = results_limited %>% filter(method %in% c('Expected Results')), aes(yintercept = InvSimpson, linetype ="Expected Results*"), show.legend = TRUE, color = 'black') +
theme(legend.position = "none",
axis.text.y = element_text(size = 12),
axis.title.y = element_text(size = 12),
axis.title.x = element_text(size = 12),
axis.text.x = element_text(size = 10),
plot.title = element_text(size = 12))
figure_5 <- ggdraw() +
draw_plot(fig_5a, x = 0, y = .78, width = 1, height = .19) +
draw_plot(fig_5b, x = 0, y = .56, width = 1, height = .19) +
draw_plot(fig_5c, x = 0, y = 0, width = 1, height = 0.53) +
draw_plot_label(label = c('Figure 5','A. Relative abundance of Lactobacillus', 'B. Relative abundance of Escherichia/Shigella', 'C. Inverse Simpson Index'), size = 12,
x = c(.75,0, 0, 0), y = c(1,1, .77, .55), hjust = 0)
figure_5
customPalette <- c("#1F78B4", "#A6CEE3", "#33A02C", "#B2DF8A", "#E31A1C", "#FB9A99", "#FF7F00", "#FDBF6F", "#6A3D9A", "#d3d3d3")
### Individual plots for publication
long_profile <- classification_data %>%
filter(method_type %in% c('Uncorrected Results', 'Abundance filter', 'Negative control filter')) %>%
select(ASV_1, ASV_2, ASV_3, ASV_4, ASV_5, ASV_6, ASV_7, ASV_8, ASV_10, con_abundance,sample_names, method, method_type) %>%
gather(key = 'SequenceClass',value = 'Abundance', -sample_names, -method, - method_type)
# Figures
composition_filter <- ggplot(long_profile, aes(x = sample_names, y = Abundance)) + facet_wrap(~method, ncol = 5) +
geom_col(aes(fill = SequenceClass), width = 0.7, position = position_fill()) +
scale_fill_manual(values=customPalette, ) +
labs(x = NULL, y = 'Proportion of Reads') +
theme(legend.position = "none",
axis.text = element_text(size = 7),
axis.title = element_text(size = 8),
strip.text = element_text(size=8))
# Decontam methods
long_profile <- classification_data %>%
filter(method_type %in% c('Decontam frequency')) %>%
select(ASV_1, ASV_2, ASV_3, ASV_4, ASV_5, ASV_6, ASV_7, ASV_8, ASV_10, con_abundance,sample_names, method, method_type) %>%
gather(key = 'SequenceClass',value = 'Abundance', -sample_names, -method, - method_type)
# Figures
composition_decontam <- ggplot(long_profile, aes(x = sample_names, y = Abundance)) + facet_wrap(~method, ncol = 5) +
geom_col(aes(fill = SequenceClass), width = 0.7, position = position_fill()) +
scale_fill_manual(values=customPalette) +
labs(x = NULL, y = 'Proportion of Reads') +
theme(legend.position = "none",
axis.text = element_text(size = 7),
axis.title = element_text(size = 8),
strip.text = element_text(size=8))
# SourceTracker
long_profile <- classification_data %>%
filter(method_type %in% c('SourceTracker')) %>%
select(ASV_1, ASV_2, ASV_3, ASV_4, ASV_5, ASV_6, ASV_7, ASV_8, ASV_10, con_abundance,sample_names, method, method_type) %>%
gather(key = 'SequenceClassification',value = 'Abundance', -sample_names, -method, - method_type)
# Figures
composition_sourcetracker <- ggplot(long_profile, aes(x = sample_names, y = Abundance)) + facet_wrap(~method, ncol = 4) +
geom_col(aes(fill = SequenceClassification), width = 0.7, position = position_fill()) +
scale_fill_manual(values=customPalette, labels = c('Lactobacillus','Staphylococcus','Escherichia/Shigella','Salmonella ASV1','Bacillus','Pseudomonas','Listeria','Enterococcus','Salmonella ASV2','Contaminants')) + theme(text = element_text(size=8)) +
labs(x = "Sample", y = 'Proportion of Reads') +
theme(legend.position = "right",
legend.title = element_text(size = 8),
axis.text = element_text(size = 7),
axis.title = element_text(size = 10),
strip.text = element_text(size=8))
figure_S1 <- ggdraw() +
draw_plot(composition_filter, x = 0, y = .68, width = 1, height = .3) +
draw_plot(composition_decontam, x = 0, y = .35, width = 1, height = .3) +
draw_plot(composition_sourcetracker, x = 0, y = 0, width = 1, height = 0.33) +
draw_plot_label(label = c('Figure S1','Filter method', 'Decontam frequency method', 'SourceTracker method'), size = 12,
x = c(.75,0, 0, 0), y = c(1,1, .67, .35), hjust = 0)
figure_S1
We also evaluated the alpha diversity measures after removing contaminants (Supplemental Figure 2).
# Alpha diversity for all (supplemental figure 2)
fig_observed <- results_all_alpha %>% filter (method_type %in% c('Negative control filter','Abundance filter','Decontam frequency','SourceTracker')) %>%
filter(alpha_diversity_measure == 'Observed') %>%
ggplot(., aes(x = sample_names, y = Value, color = method)) +
geom_point() + facet_wrap(~method_type, ncol = 6) +
theme(legend.position = "none",
axis.text = element_text(size = 8),
axis.title = element_text(size = 10),
plot.title = element_text(size = 8),
strip.text.x = element_text(size = 8)) +
labs(x = "Sample", y = 'Observed ASVs') +
scale_color_manual(values = unname(color_palette))
fig_invSimpson <- results_all_alpha %>% filter (method_type %in% c('Negative control filter','Abundance filter','Decontam frequency','SourceTracker')) %>%
filter(alpha_diversity_measure == 'InvSimpson') %>%
ggplot(., aes(x = sample_names, y = Value, color = method)) +
geom_point() + facet_wrap(~method_type, ncol = 4) +
theme(legend.position = "right",
legend.title = element_text(size = 10),
legend.text = element_text(size = 8),
axis.text = element_text(size = 6),
axis.title = element_text(size = 10),
plot.title = element_text(size = 8),
strip.text.x = element_text(size = 8)) +
labs(x = "Sample", y = 'Inverse Simpson Index') +
scale_color_manual(values = unname(color_palette))
fig_shannon <-results_all_alpha %>% filter (method_type %in% c('Negative control filter','Abundance filter','Decontam frequency','SourceTracker')) %>%
filter(alpha_diversity_measure == 'Shannon') %>%
ggplot(., aes(x = sample_names, y = Value, color = method)) +
geom_point() + facet_wrap(~method_type, ncol = 4) +
theme(legend.position = "none",
axis.text = element_text(size = 6),
axis.title = element_text(size = 10),
plot.title = element_text(size = 8),
strip.text.x = element_text(size = 8)) +
labs(x = "Sample", y = 'Shannon Index') +
scale_color_manual(values = unname(color_palette))
figure_S2 <- ggdraw() +
draw_plot(fig_observed, x = 0, y = .68, width = .72, height = .3) +
draw_plot(fig_invSimpson, x = 0, y = .35, width = 1, height = .3) +
draw_plot(fig_shannon, x = 0, y = 0, width = .72, height = 0.33) +
draw_plot_label(label = c('Figure S2'), size = 10,
x = c(.75), y = c(1), hjust = 0)
figure_S2