

DECODAGE – Communauté d’Annotation des Génomes

Usage

How to use the TriAnnot pipeline

How to use the pipeline?

Official URLs

Home:

Direct access to the submission window with login/password :

 

Input BAC sequence

Each upload can contain a multi-FASTA file of up to 10 sequences. Each sequence must be longer than 10 kb and shorter than 3 Mb. Sequences are case insensitive. There is no limit on the number of submissions, since the pipeline places submitted sequences in a queue.

Example :
>Contig385B22 from BAC T. aestivum BAC library Pool A
 TTTCTCTTTGGGATAATTAGATTTATGCCCCTAGTTGTGTCCCACTCGTC
 TGTTTTACCCCTAATTCCCAAAAGTCACCAGTTCTGTCCAAATCACTTTC
 CTCCTCTTATGCTTTTGCCCTTTGACCGTTTGACCGTTAGTTTGAAAACT
 TCATAACTAATTCATACTAAATCAGAAAAATTCAAATAAGATACCAAAAT
 GTTCAGAAAAACATCACCTATATGCCAGTGTCATTTGCATCCATGAAAAA
 AGTGTTGGAAAGTGCCCATCTGAGTTTTAGCTCTCATGCTACCACCATGA

Avoid blanks within the sequence.
The sequence characters should be A, T, G and C, plus the common IUB codes for DNA sequences:
U (T), M (A C), R (A G), W (A T), S (C G), Y (C T), K (G T), B (C G T), D (A G T), H (A C T), V (A C G), X/N (A C G T).
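
The input rules above can be checked locally before uploading. The following is only a minimal sketch: the function names are hypothetical, and it assumes that "10 kb" and "3 Mb" mean 10,000 and 3,000,000 bases. The limits actually enforced by the TriAnnot web form may differ.

```python
from typing import List, Tuple

MAX_SEQUENCES = 10        # at most 10 sequences per upload
MIN_LENGTH = 10_000       # assumption: "10 kb" means 10,000 bases
MAX_LENGTH = 3_000_000    # assumption: "3 Mb" means 3,000,000 bases
ALLOWED = set("ATGCUMRWSYKBDHVXN")  # A T G C plus the IUB codes listed above


def read_fasta(text: str) -> List[Tuple[str, str]]:
    """Return (header, sequence) pairs from multi-FASTA text, in input order."""
    records: List[Tuple[str, List[str]]] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            records.append((line[1:], []))
        elif records:
            records[-1][1].append(line)
    return [(header, "".join(parts)) for header, parts in records]


def check_submission(text: str) -> List[str]:
    """Return a list of human-readable problems; an empty list means no problem found."""
    problems: List[str] = []
    records = read_fasta(text)
    if not records:
        problems.append("no FASTA record found")
    if len(records) > MAX_SEQUENCES:
        problems.append(
            f"too many sequences: {len(records)} (maximum {MAX_SEQUENCES})"
        )
    for header, seq in records:
        name = header.split()[0] if header.split() else "<unnamed>"
        # The size limits are strict: longer than 10 kb, shorter than 3 Mb.
        if not (MIN_LENGTH < len(seq) < MAX_LENGTH):
            problems.append(f"{name}: length {len(seq)} is outside the allowed range")
        # Case insensitive; blanks inside a sequence line are reported as well.
        bad = sorted(set(seq.upper()) - ALLOWED)
        if bad:
            problems.append(f"{name}: unexpected characters {bad}")
    return problems
```

Calling `check_submission(open("my_bac.fasta").read())` (with a file name of your choice) returns an empty list when no problem is detected.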

Analysis time

In principle, the pipeline can be used to annotate full genomes. However, for technical reasons and to allow parallelization, at most 10 sequences of up to 3 Mb each can be submitted online at once. As it would be cumbersome to annotate several Mb or Gb of sequence this way, online access is better suited to small-scale analyses (i.e. BACs or small BAC contigs), in which users submit their sequence directly on the web page (copy/paste or upload) and start the analysis with a single click. The pipeline uses a queue for sequence submission, so the time needed for the automatic structural and functional annotation depends on the queue length. In general, in this configuration, TriAnnot can deliver a BAC annotation in less than one hour, depending on the cluster load. For example, a default analysis of a 117 kb sequence containing 6 genes takes about 30 minutes.
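
If a small project has slightly more than 10 contigs, the records can be grouped into successive uploads of at most 10 sequences each. The helper below is a hypothetical illustration of that grouping only; it does not talk to the TriAnnot server, and each batch still has to be submitted through the web page.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batches(records: Sequence[T], size: int = 10) -> Iterator[List[T]]:
    """Yield consecutive groups of at most `size` records (10 per upload by default)."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(records), size):
        yield list(records[start:start + size])
```

For 23 records, this yields three groups of 10, 10 and 3 records.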

A management screen is available (My Analysis) to check the status of your analysis (see figure below):

Web page for TriAnnot status

Four different statuses

Submission of your own sequence

First of all, the first time you log in you will have to fill in the “My Profile” screen shown below. This only has to be done once.

User profile web page

On the TriAnnot pipeline analysis submission screen (see below), you first give your analysis a title and then choose a pipeline template. The template defines the recipe for your analysis by building the step.xml file that the pipeline requires. At present there are five pipeline templates:

  • “Wheat default IWGSC Annotation” - a default analysis that has been optimized for the annotation of wheat chromosome 3B (French ANR 3BSEQ project).
  • “Rice default analysis” - a specific step.xml (template) written for rice sequences, with a more suitable combination of databanks and/or ab initio gene predictors. However, this template has not been optimized to the same extent as the wheat one.
  • “Oak default analysis” - same as rice, except that oak-specific RNA-seq data are used to improve the SIMsearch-based gene prediction.
  • “Barley default analysis” - same as oak.
  • “Maize default analysis” - same as rice.

Submission web page

Then, you paste or upload your sequence before you click on the “Submit analysis” button.

Remarks: Other templates can be provided on request, especially if you want to use the TriAnnot pipeline for other species. In that case, please contact triannot-support@clermont.inra.fr.

Email & links

The pipeline automatically sends you an email when the structural and functional annotation data for your sequence are available. Here is an example:

Example of email sent by TriAnnot when the analysis is completed

The link will take you back to the TriAnnot management interface (My Analysis). Then, you have two possibilities:

  1. have a quick look at the results using a graphical display such as GBrowse
  2. download all your data to your own computer.

GBrowse graphical display

How to launch the GBrowse graphical viewer

From the “My Analysis” web page you can also display a graphical view of your annotation using the Genome Browser. Note that the online GBrowse viewer is for display only and cannot be used as an editor for manual curation.

By default, and to speed up the display, only four tracks are shown:

  • the sequence overview;
  • the Gene overview;
  • the “Structural & Functional Gene Annotation” (the TriAnnot predicted gene models) track
  • the “05_RepeatMasker – TREPplus” track.

You can add new tracks using the “Select Tracks” tab. Be aware that you cannot keep your track configuration from one analysis to another: there is no way to save the current configuration.

 

An example of GBrowse display

Download all your data

How to download data

From the “My Analysis” web page you can download the analysis of your choice. This gives you all the generated output files (gff, embl, align, etc.; see the “Output Files” paragraph below), which you can use locally with your own graphical editors (Artemis/GenomeView/Apollo). Each embl, gff or align file is tagged with the TriAnnot analysis step number (see Architecture of the pipeline). You can also retrieve the submitted sequence, in both its initial and masked forms, in FASTA format, as well as the protein FASTA file (translated gene models).

From the TriAnnot management window you can delete previous analyses at any time.
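
Once an analysis has been downloaded, the protein FASTA file can be sanity-checked against the rule given in the “Output Files” paragraph: every protein should start with M (methionine) and end with a star (*). The sketch below is a hypothetical helper, assuming the file is ordinary FASTA with “>” header lines; it only reports the entries that break the rule.

```python
from typing import List, Tuple


def read_fasta(text: str) -> List[Tuple[str, str]]:
    """Return (header, sequence) pairs from FASTA text, in input order."""
    records: List[Tuple[str, List[str]]] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            records.append((line[1:], []))
        elif records:
            records[-1][1].append(line)
    return [(header, "".join(parts)) for header, parts in records]


def incomplete_proteins(text: str) -> List[str]:
    """Return the names of proteins that do not start with M or do not end with *."""
    flagged: List[str] = []
    for header, seq in read_fasta(text):
        name = header.split()[0] if header.split() else "<unnamed>"
        if not (seq.upper().startswith("M") and seq.endswith("*")):
            flagged.append(name)
    return flagged
```

For example, `incomplete_proteins(open("proteins.seq").read())` lists the gene models whose translation looks truncated at either end.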

Output Files

The TriAnnot analysis generates several output files organized within 4 folders (see Architecture of the pipeline for more details).

  • BLAST results folder
    • EMBL folder
      • A number related to the step number: 06_BLASTN_*; 07_BLASTX_* and 12_BESTHIT_BLASTP_*
      • _* corresponds to the databank used (see databanks)
      • Extension .embl
    • GFF folder
      • Same as above but with .gff extension
  • EMBL folder
    • In each folder, files are named according to the following rules:
      • A number related to the step number
      • The type of program, e.g. REPEATMASKER; AUGUSTUS; EXONERATE; EUGENE; GENEMODEL; SIMsearch; BESTHIT; TRNASCAN-SE; TRF; BLASTN; BLASTX
      • The databank used (see databanks). When no databank is used, only the type of program is displayed, e.g. 16_TRF.embl. For ab initio gene prediction programs, the matrix used is displayed, e.g. 5_AUGUSTUS_wheat.embl
      • Extension .embl
    • A few examples for wheat:
      • Step 1 - tRNAscan
        • 1_TRNASCAN-SE.embl
      • Step 2 - Transposable Elements annotation & masking / univec and E. coli contamination
        • 2_REPEATMASKER_Ecoli.embl
        • 2_REPEATMASKER_univec.embl
        • 2_REPEATMASKER_MIPSrepeatPoa.embl
      • Step 3 - BLASTx against TREPprot
        • 3_BLASTX_TREPprot.embl
      • Step 5 - ab initio gene prediction
        • 5_AUGUSTUS_wheat.embl
        • 5_FGENESH.embl
        • 5_GENEID.embl
      • Step 6 - BLASTn / Exonerate
        • 6_EXONERATE_cdsBDI.embl
        • 6_EXONERATE_cdsOSAirgsp.embl
        • 6_EXONERATE_rnaSeqWheat.embl
        • 6_EXONERATE_TAEugs.embl
        • etc ...
      • Step 7 - BLASTx / Exonerate
        • 7_EXONERATE_pepBDI.embl
        • 7_EXONERATE_pepZMA.embl
        • 7_EXONERATE_protTRIT.embl
        • 7_EXONERATE_SIMprot.embl
        • etc ...
      • Step 8 - Combiner
        • 8_EUGENE.embl
      • Step 9 - SIMsearch results
        • 8_SIMsearch_TRITfl_CAT01.embl
        • 8_SIMsearch_SIMnucWheat_CAT02.embl
        • 8_SIMsearch_MAGNmrna_CAT03.embl
      • Step 10 - Gene structure without functional annotation
        • 10_MERGE.embl
      • Step 11 - Gene structure with functional annotation
        • 9_GENEMODEL.embl
          • This is the most important file since it gives the final gene structure and contains the functional annotation
      • Step 12 - BLASTp / Exonerate (Best Hit)
        • 12_EXONERATE_pepBDI.embl
        • 12_EXONERATE_protSAC.embl
      • Step 13 - InterProScan for protein domain identification and Gene Ontology tags
        • 13_INTERPROSCAN.embl
      • Step 14 - BLASTn (CNSs)
        • 14_BLASTN_genoBDI.embl
        • 14_BLASTN_genoOSAirgsp.embl
        • 14_BLASTN_refSeqChloro.embl
        • 14_BLASTN_refSeqMito.embl
        • 14_BLASTN_tgacWGSAash.embl
        • etc ...
      • Step 15 - BLASTx (CNSs)
        • 15_BLASTX_SIMprot.embl
      • Step 16 - Microsatellite markers (SSRs)
        • 16_TRF.embl
  • GFF folder
    • Exactly the same as above except that the file extension is .gff
  • Other files folder (as an example)
    • Best hit alignment files with percentage coverage and identity, and missing or additional gaps over 9 amino acids
      • 12_BESTHIT_BLASTP_pepBDI.align
      • 12_BESTHIT_BLASTP_pepOSAirgsp.align
      • 12_BESTHIT_BLASTP_protTRIT.align
      • etc ...
    • Result files obtained with GTallymer
      • 4_GTtallymer_Cs1XOCC1.fplot
      • 4_GTtallymer_Cs1XOCC1.res  (to display the k-mer composition of repeat sequences)
    • There is also a tabular file for TE annotation
      • Global_XM_for_TE_RNA_Nmask.xm
  • sequences folder
    • Initial fasta file submitted to TriAnnot
      • initial.seq
    • Several types of masked sequences (Ns / lower case)
      • RNA_Nmask.seq / RNA_LCmask.seq
      • TE_RNA_Nmask.seq / TE_RNA_LCmask.seq
      • Gene_TE_RNA_Nmask.seq
    • Protein sequences derived from the TriAnnot gene model annotation. This file is important since it gives the final annotation as protein sequences, which can be used for further analysis. All proteins should start with an M (methionine) and end with a star (*)
      • proteins.seq
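
The naming rules above (step number, program type, optional databank or matrix, extension) make it possible to sort downloaded files programmatically. The following is a minimal sketch only: the `OutputName` type and `parse_output_name` function are hypothetical, and the pattern is inferred from the example file names listed here, not from the TriAnnot source code. Note that it splits at the first underscore after the program token, so a two-part program label such as BESTHIT_BLASTP is returned as program `BESTHIT` with detail `BLASTP_pepBDI`.

```python
import re
from typing import NamedTuple, Optional


class OutputName(NamedTuple):
    step: int               # step number at the start of the file name
    program: str            # program type, e.g. AUGUSTUS or TRNASCAN-SE
    detail: Optional[str]   # databank or matrix, None when absent
    extension: str          # embl, gff, align, ...


# <step>_<PROGRAM>[_<databank or matrix>].<extension>
_PATTERN = re.compile(r"^(\d+)_([^_.]+)(?:_(.+))?\.(\w+)$")


def parse_output_name(filename: str) -> Optional[OutputName]:
    """Split a TriAnnot-style output file name; return None if it does not match."""
    match = _PATTERN.match(filename)
    if match is None:
        return None
    step, program, detail, extension = match.groups()
    return OutputName(int(step), program, detail, extension)
```

With this sketch, `parse_output_name("5_AUGUSTUS_wheat.embl")` gives step 5, program `AUGUSTUS`, detail `wheat` and extension `embl`, while names that do not begin with a step number, such as `initial.seq`, return `None`.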

Manual Annotation

After downloading the gff/embl files, you may use a graphical genome annotation editor for manual curation, such as:

  • ARTEMIS (Carver et al., 2008 Bioinformatics 24, 2672-2676)
    • In this case it is preferable to use the EMBL files
  • GenomeView (Abeel et al., 2011 Nucleic Acids Research, doi: 10.1093/nar/gkr995)
    • GenomeView needs a unique feature name to differentiate each track, so each EMBL file is created with this constraint. The Databanks page gives, for each EMBL file, the feature name used with GenomeView (track list - see databanks)
  • APOLLO (Lewis et al., 2002 Genome Biology 3)
    • In this case it is preferable to use the GFF files