Antibiotic resistance gene analyses in microbial communities: challenges and opportunities - The Global Antibiotics Resistance Foundation

Culture-independent antibiotic resistance gene analyses enable broad explorations of microbial communities but often fail to link such genes to bacterial hosts and genetic contexts. This makes it difficult to assess the prevalence of resistant pathogens and the likelihood of further transmission or resistance evolution.

Rationale for studying antibiotic resistance genes

Antibiotic resistance has become one of the most pressing global health challenges to date. As bacteria and their genetic material tend to move between humans, domestic animals and external environments, there is a need for both interventions and research across the entire One-Health spectrum. Dynamics within microbial communities are complex: Non-pathogenic bacteria may act as sources or intermediary carriers of genetic resistance determinants or can, without being resistant themselves, impact the success of resistant bacteria in the same community. Given the challenges in cultivating the majority of bacterial species, culture-independent analyses such as metagenomic sequencing or polymerase chain reactions (PCR) have provided opportunities to gain a more holistic view of antibiotic resistance genes (ARGs) in microbial communities, far beyond individual cultivable pathogens. Consequently, studies of the nature and abundance of ARGs are today used as the basis for addressing intriguing questions. These include quantifying transmission risks and routes of resistant pathogens, understanding selection pressures on microbial communities for resistance, and providing insights into regional resistance situations.

Analysing ARGs in communities—technical limitations and solutions

For decades, scientists have analysed ARGs via (quantitative) PCR in complex samples such as waste waters, soils and human microbial communities. PCR sensitively measures individual abundances of genes, and high-throughput PCR arrays or multiplexing approaches can analyse hundreds of ARGs in parallel; however, given the millions of predicted and identified ARGs, PCR arrays with targets defined a priori may overlook many relevant ones. PCR is, furthermore, inherently sensitive to non-specific primer binding, which carries a high risk of false positives and erroneous quantification. This risk often becomes evident when working with highly diverse environmental samples, such as waste waters, that contain many similar, potentially cross-reacting gene sequences, and it calls for better validation under realistic conditions.
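
The cross-reactivity concern can be illustrated with a minimal in-silico sketch. It assumes, as a deliberate simplification, that a primer might bind wherever a sequence window differs from it by only a few bases; real primer binding also depends on annealing temperature, the position of mismatches (especially at the 3' end) and both strands. The function names, the mismatch threshold and the toy sequences are all hypothetical.

```python
def min_mismatches(primer, template):
    """Smallest number of mismatches between the primer and any
    same-length window of the template (one strand only)."""
    n = len(primer)
    return min(
        sum(a != b for a, b in zip(primer, template[i:i + n]))
        for i in range(len(template) - n + 1)
    )


def potential_off_targets(primer, templates, max_mismatches=2):
    """Return the templates whose best window is within the mismatch
    threshold (an assumed, illustrative cut-off)."""
    hits = {}
    for name, seq in templates.items():
        distance = min_mismatches(primer, seq)
        if distance <= max_mismatches:
            hits[name] = distance
    return hits


# Hypothetical toy sequences, not real ARGs.
primer = "ATGCGTACGT"
templates = {
    "intended_target": "GGATGCGTACGTCC",
    "close_variant": "GGATGCGAACGTCC",   # one substitution in the binding site
    "unrelated": "TTTTTTTTTTTTTT",
}
print(potential_off_targets(primer, templates))
```

In this toy case the close variant is flagged alongside the intended target, which is the kind of candidate that would need experimental validation in a realistic sample matrix.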

High-throughput sequencing allows for a random, broad and deep shotgun approach that can essentially identify any ARG, thereby circumventing the challenge of non-specific PCR primer binding. Such sequencing technologies also paved the way for studying any gene recognisable as an ARG, as long as a similar gene is present in a reference database. As the choice of which ARGs to look for can be made after the data are generated, the sequencing data can be reused for retrospective ARG analyses. In addition, the same data enable studies of taxonomic composition and other biochemical functions. Despite improved accuracy of newer sequencing technologies, insufficient sequencing depth, which refers to the total amount of DNA sequenced, remains a limitation in many applications given the high diversity of most microbial communities. Hence, a major remaining challenge with shotgun metagenomics is to detect and quantify anything but the most commonly occurring ARGs.
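
Why depth limits the detection of rare ARGs can be made concrete with a back-of-the-envelope sketch. It assumes that reads are sampled uniformly at random from the community DNA, that the number of reads hitting a gene is approximately Poisson distributed, and that each host genome carries one copy of the gene. All function names, parameters and numbers are illustrative assumptions, not values from the article.

```python
import math


def expected_arg_reads(total_reads, host_dna_fraction, gene_length_bp, genome_length_bp):
    """Expected number of reads originating from the gene.

    host_dna_fraction: assumed share of community DNA that comes from
    genomes carrying one copy of the gene.
    """
    gene_fraction = host_dna_fraction * gene_length_bp / genome_length_bp
    return total_reads * gene_fraction


def detection_probability(expected_reads, min_reads=1):
    """Poisson probability of observing at least `min_reads` reads."""
    p_fewer = sum(
        math.exp(-expected_reads) * expected_reads ** k / math.factorial(k)
        for k in range(min_reads)
    )
    return 1.0 - p_fewer


# Hypothetical scenario: 10 million reads, 1 kb gene, 5 Mb host genome.
for fraction in (1e-2, 1e-6):
    lam = expected_arg_reads(10_000_000, fraction, 1_000, 5_000_000)
    print(fraction, lam, detection_probability(lam))
```

Under these assumptions, a gene whose hosts make up 1% of the community DNA yields about 20 expected reads, whereas a gene in hosts at one part per million yields about 0.002 expected reads and is therefore very unlikely to be observed at all.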

Another critical limitation, shared with PCR, is placing ARGs in accurate genetic contexts. While there are numerous bioinformatic tools that assemble shorter DNA sequences from sequenced communities into longer, ARG-containing contigs, they often perform poorly when encountering genes or DNA sequences that are mobile and tend to occur in multiple contexts in different bacteria, as ARGs often do. The underlying problem arises because a read (or read pair) typically does not span both sides of a mobile sequence. The assembly process will therefore often generate complex assembly graphs with multiple sequences both up- and downstream of every mobile sequence, with very limited possibilities, despite taking coverage into account, to conclude with certainty which ones are truly connected (Fig). Risks for incorrect assemblies will increase as the number of mobile elements and the complexity of the community grow. Long-read sequencing, such as Oxford Nanopore and PacBio, has the potential to markedly reduce this problem. Nonetheless, benchmarking studies show that downstream analytical steps, particularly assembly and post-assembly processing, can represent a major source of artefacts even when using high-accuracy long reads, leading to chimeras, unsupported sequences, or misrepresented genomic features. With parallelised technology platforms, some of the sequencing depth lost relative to Illumina may also be regained. Another notable remaining difference is the higher biomass typically required for long-read sequencing, which is sometimes a limiting factor.
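
The spanning problem can be shown with a toy sketch. It assumes error-free reads at full coverage and two hypothetical genomes that share one mobile sequence between different flanks; a flank pairing counts as confirmed only if a single read covers the whole mobile sequence plus a few anchor bases on each side. The sequences, the function names and the `anchor` parameter are illustrative assumptions and do not represent how any particular assembler works.

```python
from itertools import product


def reads_from(genome, read_len):
    """All error-free substrings of length read_len (idealised coverage)."""
    return [genome[i:i + read_len] for i in range(len(genome) - read_len + 1)]


def spanning_pairings(contexts, mobile, read_len, anchor=4):
    """Return the (upstream, downstream) flank pairs confirmed by at least
    one read that spans the mobile sequence and `anchor` bases per side."""
    reads = [
        read
        for up, down in contexts
        for read in reads_from(up + mobile + down, read_len)
    ]
    upstreams = {up for up, _ in contexts}
    downstreams = {down for _, down in contexts}
    confirmed = set()
    for up, down in product(upstreams, downstreams):
        signature = up[-anchor:] + mobile + down[:anchor]
        if any(signature in read for read in reads):
            confirmed.add((up, down))
    return confirmed


# Hypothetical toy data: one 14 bp mobile sequence in two genomic contexts.
mobile = "ATGCGTACGTTAGC"
contexts = [
    ("AAAACCCCGG", "TTGGAACCAA"),   # flanks in genome A
    ("GGTTGGTTCA", "CATCATCATC"),   # flanks in genome B
]

print(len(spanning_pairings(contexts, mobile, read_len=10)))  # short reads
print(spanning_pairings(contexts, mobile, read_len=30))       # long reads
```

With reads shorter than the mobile sequence, none of the four possible upstream and downstream combinations is confirmed, so all remain candidates in the graph. With reads long enough to span it, only the two true pairings are supported.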

By: Nature Communications. D. G. Joakim Larsson, Carl-Fredrik Flach, Erik Kristiansson.