5 Our Metagenomic practice
In this practical session we will cover functional annotation of metagenomic data, both assembly-based and MAG-based. Due to resource and time limits, we will run Prodigal, HMMER and QUAST.
For the assembly-based analysis, the assembly FASTA file is located at: /SERVER/mg_data/mg/Assembly_metaflye/assembly.fasta
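Before launching any tool, it helps to check that the file is readable and looks like a FASTA. The minimal sketch below uses only awk to count the sequences and the total number of bases. `fasta_stats` is a hypothetical helper name and is not part of Prodigal or QUAST.

```bash
#!/bin/bash
# Hypothetical helper: quick sanity check of a (multi-)FASTA file.
# Prints the number of sequences (header lines starting with '>')
# and the total number of bases over all sequence lines.
fasta_stats() {
    awk '
        /^>/ { n++; next }                              # header: one more sequence
        { gsub(/[ \t\r]/, ""); total += length($0) }    # sequence line: add its length
        END { printf "sequences\t%d\ntotal_bp\t%d\n", n, total }
    ' "$1"
}
```

Usage: `fasta_stats /SERVER/mg_data/mg/Assembly_metaflye/assembly.fasta`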
5.1 ORF Prediction
5.1.1 Prodigal
Fast, reliable protein-coding gene prediction for prokaryotic genomes.
https://github.com/hyattpd/Prodigal
Set up the conda environment
Tips:
- Input: Prodigal runs on a FASTA file, for example the assembly FASTA given above
- Output: you should obtain a GFF file and an amino-acid FASTA file of the predicted ORFs
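After Prodigal has run, a quick summary of the protein file shows how many ORFs were predicted and how many are complete. This sketch assumes Prodigal's usual `.faa` header layout (`>contig_N # start # end # strand # ID=...;partial=XY;...`). In that layout, `partial=00` marks a gene with both a start and a stop codon, and a `1` in either position marks a gene that runs off a contig edge. `count_orfs` is a hypothetical helper name.

```bash
#!/bin/bash
# Hypothetical helper: summarise a Prodigal protein FASTA (.faa).
# Assumes headers like:  >contig_1_1 # 2 # 1234 # 1 # ID=1_1;partial=00;...
# partial=00 = complete gene; any other partial code = gene truncated at an edge.
count_orfs() {
    awk '
        /^>/ {
            total++
            if ($0 ~ /partial=00/) complete++
        }
        END { printf "proteins\t%d\ncomplete\t%d\npartial\t%d\n", total, complete, total - complete }
    ' "$1"
}
```

Usage: `count_orfs output_s01/protein_translations.faa`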
Scripts that we will use
Create an empty file called s01_prodigal.sh:
touch s01_prodigal.sh
Write the commands into the script as follows:
#!/bin/bash
# Predict ORFs (gene coordinates and protein translations) on the assembly, in metagenomic mode
assembly="/SERVER/mg_data/mg/Assembly_metaflye/assembly.fasta"
outfolder="output_s01"
mkdir -p "${outfolder}"
prodigal -i "${assembly}" \
-o "${outfolder}/genes.gff" \
-a "${outfolder}/protein_translations.faa" \
-f gff \
-p meta
Create the output directory for this script (replace irsa with your utenteX username):
mkdir -p /home/irsa/analisi_MG/output_s01/
Make the script executable:
chmod u+x s01_prodigal.sh
Execute it:
./s01_prodigal.sh
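The GFF file links each predicted gene to the contig it was found on. This sketch assumes Prodigal's GFF output: comment lines start with `#`, and feature lines are tab-separated with the feature type `CDS` in column 3. Under that assumption, the number of genes per contig can be tallied with awk. `cds_per_contig` is a hypothetical helper name.

```bash
#!/bin/bash
# Hypothetical helper: count predicted CDS features per contig in a Prodigal GFF.
# Skips comment lines, counts rows whose 3rd column is "CDS",
# and prints "contig<TAB>count" sorted by count (descending), then by name.
cds_per_contig() {
    awk -F'\t' '
        /^#/ { next }
        $3 == "CDS" { n[$1]++ }
        END { for (c in n) print c "\t" n[c] }
    ' "$1" | sort -k2,2nr -k1,1
}
```

Usage: `cds_per_contig output_s01/genes.gff | head`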
5.3 Assembly Quality Check
5.3.1 QUAST
The QUAST package works both with and without reference genomes. However, it is much more informative when at least one closely related reference genome is provided along with the assemblies. The tool accepts several assemblies at once, which makes it suitable for comparing them.
https://github.com/ablab/quast
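To make QUAST's contiguity statistics less of a black box, the sketch below recomputes two of them from a FASTA file. The first is the number and total length of contigs at or above each length threshold, which is the idea behind `--contig-thresholds`. The second is N50: the length L such that contigs of length at least L together cover at least half of the assembly. This is an illustration only. QUAST applies its own minimum contig length (`--min-contig`) to most metrics, so its numbers can differ. `contig_lengths` and `contig_metrics` are hypothetical helper names.

```bash
#!/bin/bash
# Hypothetical helpers illustrating two QUAST-style metrics (not QUAST itself).

# Print the length of each sequence in a FASTA file, one per line.
contig_lengths() {
    awk '
        /^>/ { if (seen) print len; seen = 1; len = 0; next }
        { gsub(/[ \t\r]/, ""); len += length($0) }
        END { if (seen) print len }
    ' "$1"
}

# Read contig lengths on stdin; arguments are thresholds in bp.
# For each threshold t print: contigs_ge_<t>, number of contigs, total bp.
# Then print N50: walking contigs from longest to shortest, the length
# of the contig at which the running sum first reaches half the total.
contig_metrics() {
    sort -nr | awk -v thr="$*" '
        { len[NR] = $1 + 0; total += $1 }
        END {
            n_thr = split(thr, t, " ")
            for (i = 1; i <= n_thr; i++) {
                count = 0; bp = 0
                for (j = 1; j <= NR; j++)
                    if (len[j] >= t[i] + 0) { count++; bp += len[j] }
                printf "contigs_ge_%d\t%d\t%d\n", t[i], count, bp
            }
            run = 0
            for (j = 1; j <= NR; j++) {
                run += len[j]
                if (run * 2 >= total) { print "N50\t" len[j]; break }
            }
        }'
}
```

Usage: `contig_lengths /SERVER/mg_data/mg/Assembly_metaflye/assembly.fasta | contig_metrics 0 1000 10000 100000 1000000`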
Set up the conda environment
Scripts that we will use
Create an empty file called s04_quast.sh:
touch s04_quast.sh
Write the commands into the script as follows:
#!/bin/bash
assembly="/SERVER/mg_data/mg/Assembly_metaflye/assembly.fasta"
outfolder="output_s04"
quast --labels flye --contig-thresholds 0,1000,10000,100000,1000000 --threads 2 \
-o "${outfolder}" "${assembly}"
Create the output directory for this script (replace irsa with your utenteX username):
mkdir -p /home/irsa/analisi_MG/output_s04/
Make the script executable:
chmod u+x s04_quast.sh
Execute it:
./s04_quast.sh
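QUAST writes its results to the output folder in several formats. Assume the tab-separated `report.tsv` has the usual layout: metric names in the first column and one column per assembly label (here only `flye`). Under that assumption, a single metric can be pulled out for notes or comparison tables. `quast_metric` is a hypothetical helper name, and the metric name must match the first column exactly, for example `N50` or `# contigs (>= 1000 bp)`.

```bash
#!/bin/bash
# Hypothetical helper: print one metric row from a QUAST report.tsv.
# $1: path to report.tsv; $2: metric name exactly as in the first column.
# Prints the values for all assemblies in that row, tab-separated.
quast_metric() {
    awk -F'\t' -v m="$2" '
        $1 == m {
            out = $2
            for (i = 3; i <= NF; i++) out = out "\t" $i
            print out
        }
    ' "$1"
}
```

Usage: `quast_metric output_s04/report.tsv N50`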
What if you use MetaQUAST?
MetaQUAST is the extension of QUAST for metagenomic datasets: it evaluates and compares metagenome assemblies based on alignments to closely related references. It is based on the QUAST genome quality assessment tool, but addresses features specific to metagenomic datasets.
5.4 Web Tools for Annotation
- KofamKOALA https://www.genome.jp/tools/kofamkoala/
- eggNOG-mapper http://eggnog-mapper.embl.de
- dbCAN2 https://bcb.unl.edu/dbCAN2/blast.php
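Web servers often limit how many sequences, or how large a file, can be uploaded in one job, and these limits differ between services. Check each site's current limits before submitting. If the protein file from Prodigal is too large, one option is to split it into chunks with a fixed number of sequences each. The sketch below does this with awk. `split_fasta`, the chunk size and the output file names are hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: split a FASTA file into chunks of at most N sequences.
# $1: input FASTA; $2: sequences per chunk; $3: output prefix.
# Writes <prefix>_001.faa, <prefix>_002.faa, ...
split_fasta() {
    awk -v size="$2" -v prefix="$3" '
        /^>/ {
            # every "size" records, close the current chunk and open the next one
            if (n % size == 0) {
                if (out != "") close(out)
                out = sprintf("%s_%03d.faa", prefix, n / size + 1)
            }
            n++
        }
        # copy header and sequence lines to the current chunk
        out != "" { print > out }
    ' "$1"
}
```

Usage (chunk size chosen only for illustration): `split_fasta output_s01/protein_translations.faa 50000 output_s01/proteins_part`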