YAMAP
YAMAP (Yet Another Microbial/Metagenomic Annotation Pipeline/Program) is a Perl application created for the NERC Microbial Metagenomics programme. It gives Bio-Linux users a user-friendly way to run a selection of first-pass annotation tools on metagenomic sequences, in order to determine whether these sequences merit more detailed investigation.
See the CVS for information on changes. The latest version can also be obtained from the apt repository (see below).
For more information please refer to the sections below:
If you have a Debian system
To install YAMAP, add the following line to your /etc/apt/sources.list, then run apt-get update followed by apt-get install bio-linux-yamap.
deb http://envgen.nox.ac.uk/bio-linux/ unstable bio-linux
Alternatively, you can download the package directly from the repository.
If you don't have a Debian system
The only current option is to download the package from our repository, and convert it to tar.gz or rpm format with Alien. YAMAP is intended for Bio-Linux users on a specific project, but if there is sufficient demand it may be made available in other formats. If you install from another format you'll need to make sure that all the associated annotation programs are installed. If they aren't, YAMAP will exit and give you a list of what's missing.
YAMAP is designed to process FASTA format nucleotide sequence files only, one sequence per file.
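If your sequences are held in a single multi-sequence FASTA file, they need to be split up before YAMAP can use them. The following is a minimal sketch of one way to do that in Python; it is not part of YAMAP. The function name split_fasta, the .fasta extension, and the choice of the first word of each header as the file name are all assumptions made for illustration.

```python
from pathlib import Path


def split_fasta(multi_fasta, out_dir):
    """Write each record of a multi-sequence FASTA file to its own file.

    Returns the list of files written, in input order.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    handle = None
    with open(multi_fasta) as src:
        for line in src:
            if line.startswith(">"):
                if handle is not None:
                    handle.close()
                # Assumption: name each file after the first word of the header.
                words = line[1:].split()
                name = words[0] if words else f"seq{len(written) + 1}"
                safe = "".join(c if c.isalnum() or c in "._-" else "_" for c in name)
                path = out_dir / f"{safe}.fasta"
                if path in written:
                    # Avoid overwriting an earlier record with the same name.
                    path = out_dir / f"{safe}_{len(written) + 1}.fasta"
                handle = open(path, "w")
                written.append(path)
            if handle is not None:
                # Lines before the first header are ignored.
                handle.write(line)
    if handle is not None:
        handle.close()
    return written
```

Calling split_fasta("contigs.fasta", "single_seqs") (both names hypothetical) would leave one file per sequence in single_seqs, all in the same directory, ready to be chosen with the "select sequences" button.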
Main interface
The interface contains several sections, as listed below:
- File selection buttons: The “select sequences” button will pop up a file selection dialogue with which you may select sequences to be annotated. You may select multiple sequences by holding the shift or control key as you right click on them. You may delete files from the sequence list box by selecting them in the same way and clicking on the “delete selected” button.
- Sequence list box: Contains a list of all the sequence files that are to be annotated. The scrollbar will become active should sufficient names be entered into the box.
- Annotation program checkboxes: The red checkboxes denote applications that will be run. Next to each is a configuration button which will pop up a separate window allowing you to configure that application.
- Run and quit buttons: Run will save your selected configuration to ~/.yamap/yamap_run.ini and then run the analysis. Quit will exit the application without saving your configuration.
Application configuration
All application configuration windows are broadly similar.
This is a typical application configuration window, in this case for configuring blast (using big-blast.pl) against a database of your choice (in this case, swissprot). Buttons allow the selection of the blast program to use, and other options may be entered in the boxes. If you are using a Bio-Linux machine then it is recommended that you leave the number of jobs at 2. The "other options" box allows options to be passed directly to blast, and in this case it shows an expect value being specified with the option “-e 0.0001”.
For more details of any of the programs run by YAMAP, please refer to the program's own documentation. This can be accessed by clicking on the documentation button in the configuration window, which will cause a terminal to appear and display the documentation with man, less, or lynx. In general, the space bar and b key will page down or up respectively, and the q key will quit the documentation.
When YAMAP is run, it will create an output directory in the same directory as the first file in your list of files. It is best practice to make sure all the files you select are in the same directory. The list below shows the files produced when all the appropriate (i.e. not QuickMine) annotation programs were run on a bacterial whole genome in FASTA format, NC_000117 (Chlamydia trachomatis).
yamap_out:
NC_000117.embl
NC_000117.qm
NC_000117.out/
yamap_out/NC_000117.out:
NC_000117
NC_000117.bigblast.crunch
NC_000117.bigblast.stdout
NC_000117.bigblast.tab
NC_000117.einverted.stdout
NC_000117.einverted.tab
NC_000117.equicktandem.stdout
NC_000117.etandem.stdout
NC_000117.etandem.tab
NC_000117.glimmer.stdout
NC_000117.glimmer.tab
NC_000117.msatfinder.stdout
NC_000117.msatfinder.tab
NC_000117.palindrome.stdout
NC_000117.palindrome.tab
NC_000117.rbs.output
NC_000117.rbs.tab
NC_000117.selfblast.crunch
NC_000117.selfblast.stdout
NC_000117.selfblast.tab
NC_000117.tan.out
NC_000117.trnascan.out
NC_000117.trnascan.stdout
The yamap_out directory contains the following:
- A directory of further output files, NC_000117.out/.
- An EMBL formatted sequence file. N.B. this is not appropriate for EMBL submission, as it has not been edited to remove a lot of the “junk” that may be generated by the annotation programs.
The subdirectory yamap_out/NC_000117.out contains:
- A link to the original file, NC_000117, which is used by Artemis.
- A selection of files ending in .tab. These are the feature tables that are viewed by Artemis, and written into the EMBL formatted file one directory up.
- Files ending in .crunch. These are the output of MSPcrunch, a blast post-processing program useful when comparing two sequences with ACT. These are treated like .tab files.
- Files ending in .stdout. These are the output the annotation programs produce as they run. If all is successful you can ignore these, but if there is a problem then they should be inspected to find the cause.
- Files ending in .out. These are the output files of programs before they have been converted to .tab format; most users will not need these.
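When many programs have been run, it can help to see the contents of the output subdirectory grouped by the file types described above. The following Python sketch does this by file suffix; it is not part of YAMAP, and the function name group_outputs and the group labels are assumptions chosen for illustration. Anything with a suffix not described above (including the link to the original sequence) is placed in "other".

```python
from collections import defaultdict
from pathlib import Path

# Suffix meanings taken from the list above; the labels themselves are illustrative.
ROLES = {
    ".tab": "feature table",
    ".crunch": "MSPcrunch output",
    ".stdout": "program log",
    ".out": "raw program output",
}


def group_outputs(out_dir):
    """Group the file names in a YAMAP output subdirectory by their suffix."""
    groups = defaultdict(list)
    for path in sorted(Path(out_dir).iterdir()):
        groups[ROLES.get(path.suffix, "other")].append(path.name)
    return dict(groups)


if __name__ == "__main__":
    import sys

    # Usage: python group_outputs.py yamap_out/NC_000117.out
    if len(sys.argv) > 1:
        for role, names in group_outputs(sys.argv[1]).items():
            print(f"{role}: {len(names)} file(s)")
            for name in names:
                print(f"  {name}")
```

If a program appears to have produced nothing, the "program log" group shows which .stdout files to inspect first.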
No data will be deleted from this directory between runs, so before you run YAMAP again it is a good idea to delete it (with rm -rf yamap_out), backing it up first if necessary (tar czvf yamap_out.tar.gz yamap_out).
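The backup and delete steps can also be combined so that the directory is only removed once the archive has been written. The following is a minimal Python sketch of that idea using the standard tarfile and shutil modules; it is not part of YAMAP, and the function name archive_and_remove is an assumption for illustration.

```python
import shutil
import tarfile
from pathlib import Path


def archive_and_remove(out_dir="yamap_out", archive=None):
    """Archive out_dir as a .tar.gz next to it, then delete the directory.

    Returns the archive path, or None if out_dir does not exist.
    """
    out_dir = Path(out_dir)
    if not out_dir.is_dir():
        return None
    # Default archive name mirrors "tar czvf yamap_out.tar.gz yamap_out".
    if archive is None:
        archive = out_dir.with_name(out_dir.name + ".tar.gz")
    archive = Path(archive)
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(out_dir, arcname=out_dir.name)
    # Only reached if the archive was written without raising an error.
    shutil.rmtree(out_dir)
    return archive
```

Note that an existing archive with the same name is overwritten, so pass a different archive name if you want to keep several backups.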
Click on the button that will appear in the program window and Artemis will start. When you close Artemis, it will re-start on the next sequence. If you have run a lot of sequences and get bored with looking at them, you can kill this process off by clicking on the flashing button.
The data that will appear in Artemis comes from the EMBL formatted file, which is created from the .tab and .crunch files that have been read in. You can edit the annotations and save any sequence that you particularly like; please see the Artemis manual for more information.
Saving the output
Eventually, it will be necessary to save sequences that you're happy with in /home/db/collector (for Bio-Linux users). More information on this will be added later, once the system is up and running.
Pfam searching
Because it is likely that Microbial Metagenomics (MM) users will be annotating many small metagenomic sequences, YAMAP has been designed to use a local Pfam database in order to reduce network traffic. This would normally be kept in /home/db on a Bio-Linux machine. If you wish to do this, you may download a version of the database here and untar it in /home/db. Alternative instructions for obtaining and formatting the database are here and in the pfam_scan documentation that can be accessed from the pfam configuration window.
Programmes and Parsers
The table below lists the different programmes used in YAMAP, with a link to each programme's webpage and to its associated parser, written for YAMAP.
- Sometimes a program will run successfully, but will find nothing. It will then pass this information on to the next program in the chain, which will fail due to lack of input. This may sometimes happen with Glimmer, for example, and means that you may find that no CDS regions are found in the output.
- The annotation process can be halted whilst it is running by clicking on the flashing button. However, if this is done when big-blast is running then blastall may ignore this command and keep running. If this happens, kill it off by typing “killall blastall” into a terminal.
- YAMAP is a bit lazy about checking that the user has supplied the correct file, and strange things may happen if the wrong type is used (probably ugly error messages from bioperl).
- If you find that some of the configuration options look odd (particularly msatfinder), then delete ~/.yamap.
- The output directory will not be deleted when YAMAP is re-run. If you want to save your results then make a copy of it before running again. However, this does mean that you can keep re-running and over-writing old results or adding new tests.
- If you double click on a configuration or a documentation button, then two copies of the configuration or documentation windows will pop up. This is, in fact, a feature rather than a bug, and can be avoided by not double clicking. If you have several configuration windows of the same type open then the configuration saved by YAMAP will be whichever was in the window you last clicked save in.
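Because YAMAP does little checking of its input files, as noted in the list above, it may be worth checking files yourself before selecting them. The following Python sketch flags files that do not look like single-sequence nucleotide FASTA; it is not part of YAMAP, and the function name check_fasta and the set of accepted characters (the IUPAC nucleotide codes plus the gap character) are assumptions for illustration.

```python
# IUPAC nucleotide codes plus "-" for gaps (an assumption about acceptable input).
IUPAC_NUCLEOTIDES = set("ACGTURYSWKMBDHVN-")


def check_fasta(path):
    """Return a list of problems found; an empty list means the file looks usable."""
    problems = []
    headers = 0
    residues = 0
    unexpected = set()
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                headers += 1
            elif headers == 0:
                problems.append("text found before the first '>' header line")
                break
            else:
                residues += len(line)
                unexpected |= set(line.upper()) - IUPAC_NUCLEOTIDES
    if headers == 0 and not problems:
        problems.append("no '>' header line found")
    if headers > 1:
        problems.append(f"{headers} sequences found; YAMAP expects one per file")
    if headers >= 1 and residues == 0:
        problems.append("no sequence data found")
    if unexpected:
        problems.append("non-nucleotide characters: " + "".join(sorted(unexpected)))
    return problems
```

A protein FASTA file, for example, would normally be reported as containing non-nucleotide characters. The check is deliberately simple and will not catch every kind of malformed file.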
If you have any questions or comments, please e-mail Tim Booth. I'd be interested to hear of any problems or suggestions for improvements. I don't have much development time but will try to include requested features and will fix any bugs.