Genome Assembly and Annotation Metrics

Welcome to the GenomeQC website!

How to cite GenomeQC:

GenomeQC: A quality assessment tool for genome assemblies and gene structure annotations. Nancy Manchanda, John L. Portwood II, Margaret R. Woodhouse, Arun S. Seetharam, Carolyn J. Lawrence-Dill, Carson M. Andorf, Matthew B. Hufford.

https://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-020-6568-2

GenomeQC is a user-friendly, interactive platform that generates descriptive summaries with intuitive graphics for genome assemblies and structural annotations. It also benchmarks user-supplied assemblies and annotations against publicly available reference genomes of the user's choice.

The web application is designed to compute assembly and annotation statistics for small to medium-sized genomes with an upper limit of 2.5 Gb (the approximate size of the maize genome).
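Before uploading, it can help to check an assembly against this limit locally. The following is a minimal sketch, not part of GenomeQC itself: the function names are hypothetical, and treating the 2.5 Gb limit as inclusive is an assumption.

```python
from typing import Iterable

# 2.5 Gb upper limit stated above; treating it as inclusive is an assumption.
MAX_GENOME_BP = 2_500_000_000


def fasta_total_length(lines: Iterable[str]) -> int:
    """Sum the sequence characters of all FASTA records, skipping header lines."""
    total = 0
    for line in lines:
        line = line.strip()
        if line and not line.startswith(">"):
            total += len(line)
    return total


def within_size_limit(total_bp: int, limit_bp: int = MAX_GENOME_BP) -> bool:
    """Return True if the assembly is no larger than the limit."""
    return total_bp <= limit_bp


# Toy input; for a real file, pass an open file handle instead of a list.
example = [">scaffold_1", "ACGTACGT", "NNNN", ">scaffold_2", "GGCC"]
size = fasta_total_length(example)
print(size, within_size_limit(size))
```

For a real assembly, `fasta_total_length(open("assembly.fa"))` streams the file line by line, so memory use stays small even for large genomes.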

The tool’s analysis interface is organized into three sections:

1. Compare reference genomes

This section displays various assembly and annotation metrics for the user-selected list of reference genomes.

2. Analyse your genome assembly

This section lets users analyse their own genome assembly and benchmark the results against pre-computed reference genomes.

3. Analyse your genome annotation

This section lets users analyse their own genome annotation and benchmark the results against pre-computed reference annotations.

Frequently Asked Questions (FAQs):

Q: How do I cite the GenomeQC tool?

A: GenomeQC: A quality assessment tool for genome assemblies and gene structure annotations. Nancy Manchanda, John L. Portwood II, Margaret R. Woodhouse, Arun S. Seetharam, Carolyn J. Lawrence-Dill, Carson M. Andorf, Matthew B. Hufford. https://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-020-6568-2

Q: What should I do if the web page gets disconnected?

A: The Shiny server is sensitive to internet connectivity, so you may experience periodic disconnection of the page. If this happens, reload the page and resubmit your job. Once a BUSCO job has been submitted, it no longer depends on an active internet connection: even if the page disconnects or you close the browser, the plots will still be sent to the email address you provided in the input box.

Q: How long does it take for the BUSCO analysis to complete?

A: BUSCO analysis of genome assemblies and annotations is computationally intensive, and the expected run time depends on the size of the assembly or annotation set. Expected run times for different genome sizes are:

Genomes up to 200 Mb: up to 2 hours
Genomes between 200 Mb and 400 Mb: 3-4 hours
Genomes between 400 Mb and 700 Mb: 4-8 hours
Genomes between 700 Mb and 1.5 Gb: 8-24 hours
Genomes greater than 1.5 Gb: more than 1 day
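As a rough illustration, the run-time estimates above can be written as a simple lookup. This is only a sketch: the function name is hypothetical, and assigning a genome that falls exactly on a boundary (for example 200 Mb) to the lower bracket is an assumption, since the estimates do not say which bracket applies.

```python
def expected_busco_runtime(genome_size_mb: float) -> str:
    """Map a genome size in Mb to the run-time estimate quoted in the FAQ.

    Boundary sizes are placed in the lower bracket; that choice is an assumption.
    """
    if genome_size_mb <= 0:
        raise ValueError("genome size must be positive")
    if genome_size_mb <= 200:
        return "up to 2 hours"
    if genome_size_mb <= 400:
        return "3-4 hours"
    if genome_size_mb <= 700:
        return "4-8 hours"
    if genome_size_mb <= 1500:  # 1.5 Gb
        return "8-24 hours"
    return "more than 1 day"


print(expected_busco_runtime(135))   # a genome of about 135 Mb
print(expected_busco_runtime(2300))  # a genome of about 2.3 Gb
```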

Q: What should I do if I did not receive BUSCO plots via email?

A: First, confirm that you clicked on the “Assembly Busco and Contamination Plots” tab. This step is required in addition to clicking the “Click to Submit your Job” button. Second, please check your spam folder if you do not find the plots in your inbox. Finally, if you still do not receive the BUSCO plots by email, please contact us with the details of your job submission including your job ID.

The user guide for this web application is available at: https://github.com/HuffordLab/GenomeQC

Please send questions (after reading the user guide) to: john.portwood@ars.usda.gov

For the best results, please click on each tab from left to right, one at a time: explore the tab completely, download its results, and then move on to the next tab. Click the BUSCO tab last.

Pop-up plots are available for each metric: click a row in the Assembly or Annotation metrics table to open its plot.

NG(X) plots provide information on the contiguity of the assembled genome sequence. The higher the curve, the better the contiguity of the assembly.
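To make the curve concrete, here is a minimal sketch of how NG(X) values are commonly computed: scaffolds are sorted from longest to shortest, and NG(X) is the length of the scaffold at which the running total first reaches X% of the estimated genome size. The function names are hypothetical, and whether GenomeQC computes the curve in exactly this way is an assumption.

```python
from typing import List, Optional, Tuple


def ngx(lengths: List[int], genome_size: int, x: float) -> Optional[int]:
    """Length of the scaffold at which the cumulative sum reaches x% of genome_size.

    Returns None when the assembly is too short to cover x% of the genome.
    """
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare scaled values to avoid dividing by 100.
        if running * 100 >= genome_size * x:
            return length
    return None


def ngx_curve(lengths: List[int], genome_size: int, step: int = 10) -> List[Tuple[int, Optional[int]]]:
    """NG(X) for X = step, 2*step, ..., 100; these are the points of the plot."""
    return [(x, ngx(lengths, genome_size, x)) for x in range(step, 101, step)]


# Toy assembly: 150 bp of scaffolds for an estimated genome size of 200 bp.
scaffold_lengths = [50, 40, 30, 20, 10]
for x, value in ngx_curve(scaffold_lengths, genome_size=200, step=25):
    print(x, value)
```

In this toy case the assembly covers only 75% of the estimated genome, so the curve has no value at X = 100. An assembly with longer scaffolds keeps larger NG(X) values for longer, which is why a higher curve indicates better contiguity.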

This tab outputs metrics including the number of scaffolds, L50, N50, LG50, NG50, and the percentage of gaps (%N). A good-quality assembly will have fewer total scaffolds, higher N50 and NG50 values, and lower L50, LG50, and %N values.
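The metrics in this table follow standard definitions, sketched below under stated assumptions. N50 and L50 are measured against the total assembly length, while NG50 and LG50 are measured against an estimated genome size supplied by the user. The function names and the dictionary keys are hypothetical, and this is not GenomeQC's own code.

```python
from typing import Dict, List, Optional, Tuple


def nx_and_lx(lengths: List[int], target_total: int, fraction: float = 0.5) -> Tuple[Optional[int], Optional[int]]:
    """Return (scaffold length, scaffold count) at which the cumulative length,
    taken from longest to shortest, first reaches fraction * target_total."""
    threshold = target_total * fraction
    running = 0
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        if running >= threshold:
            return length, count
    return None, None  # assembly too short (or empty) to reach the threshold


def assembly_metrics(scaffolds: Dict[str, str], estimated_genome_size: int) -> Dict[str, object]:
    """Compute the table's metrics from scaffold sequences keyed by name."""
    lengths = [len(seq) for seq in scaffolds.values()]
    total = sum(lengths)
    n50, l50 = nx_and_lx(lengths, total)
    ng50, lg50 = nx_and_lx(lengths, estimated_genome_size)
    n_bases = sum(seq.upper().count("N") for seq in scaffolds.values())
    return {
        "num_scaffolds": len(lengths),
        "total_length": total,
        "N50": n50,
        "L50": l50,
        "NG50": ng50,
        "LG50": lg50,
        "percent_N": 100.0 * n_bases / total if total else 0.0,
    }


# Toy assembly: three scaffolds totalling 100 bp, 10 of which are gap (N) bases.
toy = {"s1": "A" * 50, "s2": "C" * 30 + "N" * 10, "s3": "G" * 10}
metrics = assembly_metrics(toy, estimated_genome_size=160)
print(metrics)
```

Here the longest scaffold alone covers half of the assembly (N50 = 50, L50 = 1), but two scaffolds are needed to cover half of the larger estimated genome (NG50 = 40, LG50 = 2).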

This tab provides a summary of annotation metrics such as the number and average length of gene models, exons, and transcripts.
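As an illustration of this kind of summary, the sketch below counts features and averages their lengths from GFF3-formatted lines. It assumes the nine tab-separated GFF3 columns with 1-based inclusive coordinates, and it assumes transcripts are labelled `mRNA`; the function name is hypothetical, and the metrics GenomeQC reports may be defined differently.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def feature_summary(gff_lines: Iterable[str],
                    types: Tuple[str, ...] = ("gene", "mRNA", "exon")) -> Dict[str, Tuple[int, float]]:
    """Return {feature type: (count, average length in bp)} for the chosen types."""
    counts: Dict[str, int] = defaultdict(int)
    total_len: Dict[str, int] = defaultdict(int)
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments, and directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] not in types:
            continue
        start, end = int(cols[3]), int(cols[4])
        counts[cols[2]] += 1
        total_len[cols[2]] += end - start + 1  # GFF3 coordinates are inclusive
    return {t: (counts[t], total_len[t] / counts[t]) for t in types if counts[t]}


# Toy annotation: one gene with one transcript and two exons.
toy_gff = [
    "##gff-version 3",
    "chr1\tsrc\tgene\t1\t1000\t.\t+\t.\tID=g1",
    "chr1\tsrc\tmRNA\t1\t1000\t.\t+\t.\tID=t1;Parent=g1",
    "chr1\tsrc\texon\t1\t200\t.\t+\t.\tParent=t1",
    "chr1\tsrc\texon\t601\t1000\t.\t+\t.\tParent=t1",
]
summary = feature_summary(toy_gff)
print(summary)
```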

This tab outputs the percentage of BUSCO genes in each category. A good-quality assembly and annotation should have a high number of complete, single-copy BUSCOs and a low number of fragmented and missing BUSCO genes. Please be patient: all plots will be emailed once the analysis is finished.
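The percentages can be derived from raw BUSCO category counts, as in the minimal sketch below. BUSCO also reports complete-but-duplicated genes, which are included here alongside the categories named above. The function name and the example counts are hypothetical.

```python
from typing import Dict


def busco_percentages(single_copy: int, duplicated: int,
                      fragmented: int, missing: int) -> Dict[str, float]:
    """Convert BUSCO category counts into percentages of all genes searched."""
    total = single_copy + duplicated + fragmented + missing
    if total == 0:
        raise ValueError("no BUSCO genes were searched")

    def pct(n: int) -> float:
        return round(100 * n / total, 1)

    return {
        "complete": pct(single_copy + duplicated),
        "single_copy": pct(single_copy),
        "duplicated": pct(duplicated),
        "fragmented": pct(fragmented),
        "missing": pct(missing),
    }


# Made-up counts for illustration only: 1000 BUSCO genes searched.
result = busco_percentages(single_copy=900, duplicated=50, fragmented=30, missing=20)
print(result)
```

Reading the result against the guidance above: a high `single_copy` value together with low `fragmented` and `missing` values points to a more complete assembly or annotation.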

NG(X) plots provide information on the contiguity of the assembled genome sequence. The higher the curve, the better the contiguity of the assembly. Please note: if you uploaded a large genome, there may be a short lag between clicking this tab and the start of your job.

This tab outputs metrics including the number of scaffolds, L50, N50, LG50, NG50, and the percentage of gaps (%N). A high-quality assembly will have fewer total scaffolds, higher N50 and NG50 values, and lower L50, LG50, and %N values. Please note: if you uploaded a large genome, there may be a short lag between clicking this tab and the start of your job.

This tab outputs the percentage of BUSCO genes in each category. A good-quality assembly and annotation should have a higher number of complete, single-copy BUSCOs and a lower number of fragmented and missing BUSCO genes. For contamination analysis, the assembled sequences are screened against the NCBI UniVec database to quickly identify sequences of vector origin, as well as adaptor and linker sequences. Please note: there may be a short lag between clicking this tab and the submission of your BUSCO job to the server. Please be patient: all plots will be emailed once the analysis is finished. See the FAQ section for the expected run time based on the size of the submitted assembly.

Annotation Metrics Table
Annotation Busco Plot

This tab provides a summary of annotation metrics such as the number and average length of gene models, exons, and transcripts. Please note: if you uploaded a large genome, there may be a short lag between clicking this tab and the start of your job.

This tab outputs the percentage of BUSCO genes in each category. A good-quality assembly and annotation should have a higher number of complete, single-copy BUSCOs and a lower number of fragmented and missing BUSCO genes. Please note: there may be a short lag between clicking this tab and the submission of your BUSCO job to the server. Please be patient: all plots will be emailed once the analysis is finished. See the FAQ section for the expected run time based on the size of the submitted annotation.