Institute of Bioinformatics Münster
TEclass
Changes
2023-08-02: Adapt Download_dependencies.sh, so it works now (e.g. in Debian 12)
2021-02-25: Remove local installation problem
2020-03-10: Extend max runtime to 2 hours
2016-05-30: New Classifiers
TEclass - Classification of TE consensus sequences

Description

TEclass classifies unknown transposable element (TE) consensus sequences into four categories according to their mechanism of transposition: DNA transposons, LTRs, LINEs, and SINEs. The classification uses support vector machines, random forests, and learning vector quantisation, and the tool also predicts ORFs. In the current version, the input sequences must be in FASTA format. You can either upload the file you want to process or paste the sequences directly. Note that the tool cannot distinguish between TEs and non-TEs, so every sequence will be classified into one of the four categories (or, in ambiguous cases, marked as unknown) even if it is not a TE.
To start the newer version, please use TEclass2.

Notes

  • TEclass is not a tool to annotate whole-genome data, so it is not a replacement for RepeatMasker or Censor. Its primary purpose is to classify repeat libraries, which can subsequently be used by these two tools. The input should therefore not contain more than a few thousand sequences; if you have significantly more, it is a sign that you are almost certainly using TEclass improperly.
  • The entered data must not exceed 1MB in size!
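The two limits above can be checked locally before submitting. The following is only a minimal sketch: the function name check_fasta, the reading of "1MB" as 1,000,000 bytes, and the cut-off of 5000 records for "a few thousand sequences" are assumptions made for illustration and are not part of TEclass.

```python
MAX_BYTES = 1_000_000   # assumption: "1MB" read as 10^6 bytes
MAX_RECORDS = 5000      # assumption: stand-in for "a few thousand sequences"


def check_fasta(text: str) -> dict:
    """Return size, record count, and a list of problems for FASTA text."""
    problems = []

    # Size of the input as it would be sent, in bytes.
    size = len(text.encode("utf-8"))
    if size > MAX_BYTES:
        problems.append(f"input is {size} bytes, limit is {MAX_BYTES}")

    # Ignore blank lines; the first remaining line must be a header.
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        problems.append("input does not start with a FASTA header ('>')")

    # Every header line starts one record.
    records = sum(1 for ln in lines if ln.startswith(">"))
    if records > MAX_RECORDS:
        problems.append(f"{records} sequences, more than {MAX_RECORDS}")

    return {"bytes": size, "records": records, "problems": problems}


if __name__ == "__main__":
    report = check_fasta(">repeat_1\nACGTACGT\n>repeat_2\nGGCCGGCC\n")
    print(report)
```

An empty "problems" list only means the input passes these two rough checks; it says nothing about whether the sequences are suitable repeat consensi.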

Methods

We analyze repeats in different size categories: 0-600 bp, 601-1800 bp, 1801-4000 bp, >4000 bp, and build independent classifiers for all these length classes. We use libsvm as the SVM engine, with a Gaussian kernel. The classification process is binary, with the following steps: forward versus reverse sequence orientation > DNA versus Retrotransposon > LTRs versus nonLTRs (for retroelements) > LINEs versus SINEs (for nonLTR repeats). The last step is performed only for repeats with lengths below 1800 bp, because we are not aware of SINEs longer than 1800 bp. Separate classifiers were built for each length class and for each classification step. If the different methods of classification lead to conflicting results, TEclass reports the repeat either as unknown or as the last category on which the classification methods agree.
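The length classes, the binary cascade, and the handling of conflicting methods can be illustrated with a short sketch. This is not the TEclass implementation: the function names are hypothetical, the three predicates stand in for trained classifiers, the orientation step is left out for brevity, and two points are assumptions, namely that nonLTR repeats longer than 1800 bp are labelled LINE and that "last category in agreement" means the longest shared prefix of the methods' decision paths.

```python
from typing import Callable, List, Sequence

Predicate = Callable[[str], bool]  # stand-in for one trained binary classifier

# Upper bounds (inclusive) of the first three length classes from the text.
LENGTH_CLASSES = [(600, "0-600"), (1800, "601-1800"), (4000, "1801-4000")]


def length_class(n_bp: int) -> str:
    """Map a repeat length in bp to its length class."""
    for upper, label in LENGTH_CLASSES:
        if n_bp <= upper:
            return label
    return ">4000"


def cascade(seq: str, is_dna: Predicate, is_ltr: Predicate,
            is_line: Predicate) -> List[str]:
    """Return the decision path of one method through the binary steps."""
    if is_dna(seq):
        return ["DNA"]
    path = ["Retro"]
    if is_ltr(seq):
        return path + ["LTR"]
    path.append("nonLTR")
    if len(seq) <= 1800:
        # LINE versus SINE is only decided for the two shorter classes.
        path.append("LINE" if is_line(seq) else "SINE")
    else:
        # Assumption: no SINEs above 1800 bp, so the repeat is called LINE.
        path.append("LINE")
    return path


def consensus(paths: Sequence[Sequence[str]]) -> str:
    """Return the last step on which all methods agree, else 'unknown'."""
    agreed = []
    for step in zip(*paths):
        if len(set(step)) != 1:
            break
        agreed.append(step[0])
    return agreed[-1] if agreed else "unknown"


if __name__ == "__main__":
    svm_path = ["Retro", "nonLTR", "LINE"]
    forest_path = ["Retro", "nonLTR", "SINE"]
    print(length_class(1500), consensus([svm_path, forest_path]))
```

In this sketch, two methods that agree up to nonLTR but differ on LINE versus SINE yield "nonLTR", while methods that already differ at the DNA versus Retrotransposon step yield "unknown".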

Download

Links

Tools for de novo reconstruction of repeat consensi
Tools for similarity based repeat identification:

Citation

Please cite: Abrusan G, Grundmann N, DeMeester L, Makalowski W 2009. TEclass: a tool for automated classification of unknown eukaryotic transposable elements. Bioinformatics 25:1329-1330.

Credits

Please contact the bioinformatics team or the author directly at gyorgy01||gmail||com (replace || with the appropriate signs) if you have any questions. The classification tool was written by György Abrusán and was funded by the Katholieke Universiteit Leuven, Belgium (postdoctoral fellowship for G.A.) and the University of Münster.
2023-10-03 08:22