Each submission accepts a multi-FASTA file containing up to 10 sequences. Each sequence must be longer than 10 kb and shorter than 3 Mb. Sequences are case insensitive. There is no limit on the number of submissions, since the pipeline uses a queuing list for sequence submission.
Example:
>Contig385B22 from BAC T. aestivum BAC library Pool A
TTTCTCTTTGGGATAATTAGATTTATGCCCCTAGTTGTGTCCCACTCGTC
TGTTTTACCCCTAATTCCCAAAAGTCACCAGTTCTGTCCAAATCACTTTC
CTCCTCTTATGCTTTTGCCCTTTGACCGTTTGACCGTTAGTTTGAAAACT
TCATAACTAATTCATACTAAATCAGAAAAATTCAAATAAGATACCAAAAT
GTTCAGAAAAACATCACCTATATGCCAGTGTCATTTGCATCCATGAAAAA
AGTGTTGGAAAGTGCCCATCTGAGTTTTAGCTCTCATGCTACCACCATGA
Avoid blanks within the sequence. Allowed characters are A, T, G and C, plus the common IUB codes for DNA sequences: U (T), M (A C), R (A G), W (A T), S (C G), Y (C T), K (G T), B (C G T), D (A G T), H (A C T), V (A C G), X/N (A C G T).
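The submission rules above can be checked locally before uploading. The following Python sketch is not part of TriAnnot; the function names and the exact error messages are illustrative assumptions, and the limits are taken from the rules stated above (at most 10 sequences, each longer than 10 kb and shorter than 3 Mb, IUB characters only).

```python
from typing import List, Tuple

# Characters accepted according to the submission rules (case insensitive).
ALLOWED = set("ATGCUMRWSYKBDHVXN")
MAX_SEQUENCES = 10
MIN_LENGTH = 10_000       # each sequence must be longer than 10 kb
MAX_LENGTH = 3_000_000    # and shorter than 3 Mb


def parse_fasta(text: str) -> List[Tuple[str, str]]:
    """Return (header, sequence) pairs; blanks inside a line are kept on purpose."""
    records: List[Tuple[str, str]] = []
    header, chunks = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def check_submission(text: str) -> List[str]:
    """Return a list of human-readable problems; an empty list means no problem found."""
    problems: List[str] = []
    records = parse_fasta(text)
    if not records:
        problems.append("no FASTA record found")
    if len(records) > MAX_SEQUENCES:
        problems.append(f"{len(records)} sequences, maximum is {MAX_SEQUENCES}")
    for header, seq in records:
        bad = sorted(set(seq.upper()) - ALLOWED)
        if bad:
            problems.append(f"{header}: invalid characters {bad}")
        if not (MIN_LENGTH < len(seq) < MAX_LENGTH):
            problems.append(f"{header}: length {len(seq)} outside allowed range")
    return problems
```

Because blanks inside a sequence line are not stripped, they are reported as invalid characters, which matches the recommendation to avoid them.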
Analysis time
In principle, the pipeline can be used to annotate full genomes. However, for technical reasons and for parallelization purposes, at most 10 sequences of up to 3 Mb each can be submitted online at once. As it would be cumbersome to annotate several Mb or Gb of sequence this way, online access is better suited to small-scale analyses (e.g. a BAC or small BAC contigs), in which users submit their sequence directly on the web page (copy/paste or upload) and start the analysis with a single click. The pipeline uses a queuing list for sequence submission, so the time needed for the automatic structural and functional annotation depends on the queue length. In general, in this configuration, TriAnnot can deliver a BAC annotation in less than one hour, depending on the cluster load. For example, a default analysis of a 117 kb sequence containing 6 genes takes about 30 minutes.
A management screen is available (My Analysis) to check the status of your analysis (see figure below):
Web page for TriAnnot status
The 4 different statuses
Submission of your own sequence
When you log in for the first time, you have to fill in the “My Profile” screen. This only has to be done once.
User profile web page
On the TriAnnot pipeline analysis submission screen (see below), you first have to give a title to your analysis and then choose a pipeline template. The pipeline template defines the recipe of your analysis by building the step.xml file required by the pipeline. At present there are five pipeline templates:
“Wheat default IWGSC Annotation” - a default analysis which has been optimized for the annotation of the wheat chromosome 3B (French ANR 3BSEQ project).
“Rice default analysis” - a specific step.xml (template) has been written for rice sequences, with a more suitable combination of databanks and/or ab initio gene predictors. However, this template has not been optimized to the same extent as the wheat template.
“Oak default analysis” - same as rice, except that oak-specific RNA-seq data are used with SIMsearch to improve gene prediction.
“Barley default analysis” - same as oak.
“Maize default analysis” - same as rice.
Submission web page
Then, you paste or upload your sequence before you click on the “Submit analysis” button.
Remark: other templates can be provided on request, especially if you want to use the TriAnnot pipeline for other species. In this case, please contact triannot-support@clermont.inra.fr.
Email & links
The pipeline automatically sends you an email when the structural and functional annotation data for your sequence are available. Here is an example:
Example of an email sent by TriAnnot when an analysis is completed
The link will take you back to the TriAnnot management interface (My Analysis). Then, you have two possibilities:
have a quick look at the results using a graphical display such as GBrowse
download all your data to your own computer.
GBrowse graphical display
How to launch the GBrowse graphical viewer
Using the “My Analysis” web page, you can also display a graphical view of your annotation in the Genome Browser. Note that the online GBrowse viewer is not an editor and cannot be used for manual curation.
By default, and to speed up the display, only four tracks are shown:
the sequence overview;
the Gene overview;
the “Structural & Functional Gene Annotation” (the TriAnnot predicted gene models) track
the “05_RepeatMasker – TREPplus” track.
You can add new tracks by using the “Select Tracks” tab. Be aware that you cannot keep your track configuration from one analysis to another: there is no way to save the current configuration.
An example of GBrowse display
Download all your data
How to download data
Using the “My Analysis” web page, you can download the analysis of your choice. This gives you all generated output files (gff, embl, align, etc.; see the “Output Files” paragraph below) for local use with your own graphical editors (Artemis/GenomeView/Apollo). Each embl, gff or align file is tagged with the TriAnnot analysis step number (see Architecture of the pipeline). You can also retrieve the submitted sequence in FASTA format, both in its initial and masked forms, and a protein FASTA file (translated gene models).
From the TriAnnot management window you can delete previous analyses at any time.
Output Files
The TriAnnot analysis generates several output files organized in 4 folders (see Architecture of the pipeline for more details).
BLAST results folder
EMBL folder
A number related to the step number: 06_BLASTN_*; 07_BLASTX_* and 12_BESTHIT_BLASTP_*
_* corresponds to the databank used (see databanks)
Extension .embl
GFF folder
Same as above but with .gff extension
EMBL folder
In each folder, files are named according to the following rules:
A number related to the step number
The type of program, e.g. REPEATMASKER; AUGUSTUS; EXONERATE; EUGENE; GENEMODEL; SIMsearch; BESTHIT; TRNASCAN-SE; TRF; BLASTN; BLASTX
The databank used (see databanks). When no databank is used, only the type of program is displayed, e.g. 16_TRF.embl. For ab initio gene prediction programs, the matrix used is displayed, e.g. 5_AUGUSTUS_wheat.embl
Extension .embl
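The naming rules above can be used to sort or filter downloaded files. The following Python sketch is only an illustration under the assumption that every file name has the form step_PROGRAM[_databank-or-matrix].extension; the OutputName type and the parse_output_name function are hypothetical names, not part of TriAnnot.

```python
import re
from typing import NamedTuple, Optional


class OutputName(NamedTuple):
    step: int                # leading number, related to the pipeline step
    program: str             # e.g. AUGUSTUS, EXONERATE, TRF
    detail: Optional[str]    # databank or matrix, None when absent
    extension: str           # embl, gff, align, ...


# step, program (no underscore or dot), optional remainder, extension
_PATTERN = re.compile(r"^(\d+)_([^_.]+)(?:_([^.]+))?\.(\w+)$")


def parse_output_name(filename: str) -> OutputName:
    """Split a file name into its parts, or raise ValueError if it does not fit."""
    match = _PATTERN.match(filename)
    if match is None:
        raise ValueError(f"not a step-tagged file name: {filename!r}")
    step, program, detail, extension = match.groups()
    return OutputName(int(step), program, detail, extension)


if __name__ == "__main__":
    for name in ("16_TRF.embl", "5_AUGUSTUS_wheat.embl"):
        print(parse_output_name(name))
```

Everything after the program name is kept as a single field, so a name such as 12_BESTHIT_BLASTP_pepBDI.align is split into program BESTHIT and detail BLASTP_pepBDI. Files that carry no step number raise a ValueError and need separate handling.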
A few examples for wheat:
Step1- tRNAscan
1_TRNASCAN-SE.embl
Step2 -Transposable Elements annotation & masking / univec and E. coli contamination
2_REPEATMASKER_Ecoli.embl
2_REPEATMASKER_univec.embl
2_REPEATMASKER_MIPSrepeatPoa.embl
Step3 -BLASTx against TREPprot
3_BLASTX_TREPprot.embl
Step5 - ab initio gene prediction
5_AUGUSTUS_wheat.embl
5_FGENESH.embl
5_GENEID.embl
Step6- BLASTn / Exonerate
6_EXONERATE_cdsBDI.embl
6_EXONERATE_cdsOSAirgsp.embl
6_EXONERATE_rnaSeqWheat.embl
6_EXONERATE_TAEugs.embl
etc ...
Step7- BLASTx/Exonerate
7_EXONERATE_pepBDI.embl
7_EXONERATE_pepZMA.embl
7_EXONERATE_protTRIT.embl
7_EXONERATE_SIMprot.embl
etc ...
Step8- Combiner
8_EUGENE.embl
Step9- SIMsearch results
8_SIMsearch_TRITfl_CAT01.embl
8_SIMsearch_SIMnucWheat_CAT02.embl
8_SIMsearch_MAGNmrna_CAT03.embl
Step10- Gene structure without functional annotation
10_MERGE.embl
Step11 - Gene structure with functional annotation
9_GENEMODEL.embl
This is the most important file, since it gives the final gene structure and contains the functional annotation.
Step12– BLASTp / Exonerate (Best Hit)
12_EXONERATE_pepBDI.embl
12_EXONERATE_protSAC.embl
Step13– InterProScan for protein domains identification and Gene Ontology tag
13_INTERPROSCAN.embl
Step14 – BLASTn (CNSs)
14_BLASTN_genoBDI.embl
14_BLASTN_genoOSAirgsp.embl
14_BLASTN_refSeqChloro.embl
14_BLASTN_refSeqMito.embl
14_BLASTN_tgacWGSAash.embl
etc ...
Step15-BLASTx (CNSs)
15_BLASTX_SIMprot.embl
Step16–Microsatellite markers (SSRs)
16_TRF.embl
GFF folder
Exactly the same as above except that the file extension is .gff
Other files folder (as an example)
Best hit alignment files with percentage coverage and identity, and missing or additional gaps over 9 amino acids
12_BESTHIT_BLASTP_pepBDI.align
12_BESTHIT_BLASTP_pepOSAirgsp.align
12_BESTHIT_BLASTP_protTRIT.align
etc ...
Result files obtained with GTallymer
4_GTtallymer_Cs1XOCC1.fplot
4_GTtallymer_Cs1XOCC1.res (to display the k-mer composition of repeat sequences)
There is also a tabulated file for TE annotation
Global_XM_for_TE_RNA_Nmask.xm
Sequences folder
Initial fasta file submitted to TriAnnot
initial.seq
Several types of masked sequences (Ns / lower case)
RNA_Nmask.seq / RNA_LCmask.seq
TE_RNA_Nmask.seq / TE_RNA_LCmask.seq
Gene_TE_RNA_Nmask.seq
Protein sequences derived from the TriAnnot gene model annotation. This file is important, since it gives the final annotation as protein sequences that can be used for further analysis. All proteins should start with an M (methionine) and end with an asterisk (*).
proteins.seq
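The rule that every protein should start with M and end with an asterisk can be checked once proteins.seq has been downloaded. The following Python sketch assumes that proteins.seq is a plain FASTA file; the function names are hypothetical and the check is not part of TriAnnot.

```python
from typing import List, Tuple


def read_proteins(fasta_text: str) -> List[Tuple[str, str]]:
    """Return (header, sequence) pairs from FASTA-formatted text."""
    records: List[Tuple[str, str]] = []
    header, chunks = None, []
    for raw in fasta_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def incomplete_proteins(fasta_text: str) -> List[str]:
    """Headers of proteins that do not start with M or do not end with '*'."""
    return [
        header
        for header, seq in read_proteins(fasta_text)
        if not (seq.upper().startswith("M") and seq.endswith("*"))
    ]
```

A typical use would be to read the file with open("proteins.seq").read() and pass the text to incomplete_proteins; any header returned points to a gene model worth inspecting during manual annotation.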
Manual Annotation
After downloading the GFF/EMBL files, you can carry out manual curation with graphical genome annotation editors such as:
ARTEMIS (Carver et al., 2008 Bioinformatics 24, 2672-2676)
In this case it is preferable to use the EMBL files
GenomeView (Abeel et al., 2011 Nucleic Acids Research 2011;doi: 10.1093/nar/gkr995)
GenomeView needs a unique feature name to differentiate each track, so each EMBL file is created with this constraint. The Databanks link gives, for each EMBL file, the feature name used with GenomeView (track list - see databanks).