
Abstract

Improvements in the accuracy and availability of long-read sequencing mean that complete bacterial genomes are now routinely reconstructed using hybrid (i.e. short- and long-read) assembly approaches. Complete genomes allow a deeper understanding of bacterial evolution and genomic variation beyond single nucleotide variants. They are also crucial for identifying plasmids, which often carry medically significant antimicrobial resistance genes. However, small plasmids are often missed or misassembled by long-read assembly algorithms. Here, we present Hybracter, which allows for the fast, automatic and scalable recovery of near-perfect complete bacterial genomes using a long-read-first assembly approach. Hybracter can be run either as a hybrid assembler or as a long-read-only assembler. We compared Hybracter to existing automated hybrid and long-read-only assembly tools using a diverse panel of samples of varying levels of long-read accuracy with manually curated ground truth reference genomes. We demonstrate that Hybracter as a hybrid assembler is more accurate and faster than Unicycler, the existing gold standard automated hybrid assembler. We also show that Hybracter with long reads only is the most accurate long-read-only assembler and is comparable to hybrid methods in accurately recovering small plasmids.

Keywords: assembly, long-reads and plasmids

Funding
This study was supported by the:
  • Hospital Research Foundation (Award Top Up Scholarship)
    • Principal Award Recipient: Ghais Houtak
  • University of Adelaide (Award Barbara Kidman Women’s Fellowship)
    • Principal Award Recipient: Anna E. Sheppard
  • Garnett Passe and Rodney Williams Memorial Foundation (Award Senior Fellowship)
    • Principal Award Recipient: Sarah Vreugde
  • NIH (Award RC2DK116713)
    • Principal Award Recipient: Robert A. Edwards

This is an open-access article distributed under the terms of the Creative Commons Attribution License. This article was made open access via a Publish and Read agreement between the Microbiology Society and the corresponding author’s institution.
/content/journal/mgen/10.1099/mgen.0.001244
2024-05-08
2024-05-19


Supplements

Supplementary material 1 (Excel)

Supplementary material 2 (PDF)