The Animal Transcription Factor DataBase (AnimalTFDB) is a comprehensive TF database in which we identified and classified all the genome-wide TFs in 50 sequenced animal genomes (Ensembl release 60). In addition to TFs, the AnimalTFDB also collects the transcription co-factors and chromatin remodeling factors of those genomes, which play regulatory roles in transcription. Here, we define TFs as proteins that contain a sequence-specific DNA-binding domain (DBD) and regulate target gene expression. Currently, the AnimalTFDB classifies all the animal TFs into 72 families according to their conserved DBDs.
- 1. Methods for predicting TFs, transcription co-factors and chromatin remodeling factors
- 2. TF families and seeds
- 3. Gene annotation
- 4. Web server
|1. Methods for predicting TFs, transcription co-factors and chromatin remodeling factors:|
The identification of TFs was based on the Hidden Markov Model (HMM) profiles of their DBDs.
Among the 71 defined families, 59 families had HMM profiles for their DBDs in the Pfam database (v25.0), while the remaining 12 families, which are present in the InterPro database (v32.0), had no available HMM profiles. To build HMM profiles for these 12 families, we aligned their DBD sequences with ClustalW2 and ran the hmmbuild program in the HMMER package on the resulting multiple sequence alignments. We then applied the hmmsearch program to the proteome sequences of each species to predict TFs. Based on our manual curation, we took an E-value of 0.0001 as the cutoff.
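The E-value filtering step can be illustrated with a minimal Python sketch. It assumes that hmmsearch was run with its `--tblout` option, whose table lists the target name in the first column, the profile name in the third and the full-sequence E-value in the fifth. The function name `parse_tblout` and the choice to keep hits at or below the cutoff are our assumptions for illustration, not part of the original pipeline.

```python
E_VALUE_CUTOFF = 1e-4  # cutoff stated in the text (0.0001)


def parse_tblout(text, cutoff=E_VALUE_CUTOFF):
    """Return {protein_id: (profile_name, e_value)} for hits passing the cutoff.

    `text` is the content of an hmmsearch --tblout file. If a protein is hit
    by several profiles, only the hit with the lowest E-value is kept
    (an assumption made for this sketch).
    """
    hits = {}
    for line in text.splitlines():
        # Skip blank lines and the '#' comment/header lines of the table.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        target, profile, evalue = fields[0], fields[2], float(fields[4])
        if evalue > cutoff:
            continue
        if target not in hits or evalue < hits[target][1]:
            hits[target] = (profile, evalue)
    return hits
```

Each protein that remains in the returned dictionary would be treated as a predicted TF of the family named by the profile.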
Transcription co-factors are defined as proteins that interact with TFs in the transcription complex but do not bind DNA directly. Chromatin remodeling factors are defined as proteins that regulate transcription by modifying chromatin structure. To identify them, we first obtained the human transcription co-factors and chromatin remodeling factors from TFCONES and the GO database, using the terms "transcription cofactor activity" and "chromatin remodeling", respectively. We then used the human sequences as BLAST queries against each species and took the best BLAST hit as the transcription co-factor or chromatin remodeling factor of that species.
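As a rough illustration of the best-hit selection, the following Python sketch picks the top hit for each human query from BLAST tabular output. It assumes the standard 12-column tabular format (query ID, subject ID, percent identity, alignment length, mismatches, gap opens, query start and end, subject start and end, E-value, bit score) and ranks hits by bit score; the ranking criterion and the function name `best_hits` are assumptions, since the text does not state how "best" was determined.

```python
def best_hits(tabular_text):
    """Map each query ID to (subject ID, bit score) of its highest-scoring hit."""
    best = {}
    for line in tabular_text.splitlines():
        # Skip blank lines and optional '#' comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        query, subject, bitscore = cols[0], cols[1], float(cols[11])
        # Keep the hit with the highest bit score seen so far for this query.
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best
```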
|2. TF families and seeds:|
|3. Gene annotation:|
1) Gene basic information
Gene basic information includes the Ensembl ID, Gene ID, symbol, aliases, full name, other designations, chromosome location and transcripts, among other fields. All of the gene information was extracted from the gene_info file downloaded from the NCBI FTP site (ftp://ftp.ncbi.nih.gov/gene/DATA/) and the GTF files downloaded from the Ensembl FTP site (ftp://ftp.ensembl.org/pub/release-60/gtf/).
2) Gene structure
The gene structure describes the distribution of the CDS, UTRs and introns of each gene on the chromosome.
3) Gene ontology (GO) annotation
The GO annotations were parsed from the gene2go file, which was downloaded from the NCBI FTP site (ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/).
4) Functional domain annotation
The functional domain annotation displays the domain distribution of the longest protein of each gene. All of the protein domains were identified with HMM profiles downloaded from the Pfam database.
5) 3D structure hit
The 3D structure hits of each gene were obtained from the PDB database. We used BLAST to search the protein sequences of the factors against the sequences in PDB, with the cutoffs E-value <= 1e-15, hit length >= 50 amino acids and identity >= 55%.
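The three cutoffs can be expressed as a simple filter. The sketch below assumes BLAST tabular output with percent identity in the third column, alignment length in the fourth and E-value in the eleventh, and it interprets "hit length" as the alignment length; both the interpretation and the function names are assumptions for illustration.

```python
def passes_pdb_cutoffs(identity, aligned_len, evalue,
                       max_evalue=1e-15, min_len=50, min_identity=55.0):
    """True if a hit meets all three cutoffs quoted in the text."""
    return (evalue <= max_evalue
            and aligned_len >= min_len
            and identity >= min_identity)


def filter_pdb_hits(tabular_text):
    """Return (query, pdb_subject) pairs from BLAST tabular text that pass."""
    kept = []
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        identity, aligned_len, evalue = float(cols[2]), int(cols[3]), float(cols[10])
        if passes_pdb_cutoffs(identity, aligned_len, evalue):
            kept.append((cols[0], cols[1]))
    return kept
```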
6) Protein-protein Interaction
The protein-protein interactions were extracted from the BioGRID and HPRD databases and from one publication (PMID:20211142).
7) Pathway annotation
The pathway annotations were obtained from the KEGG and BioCarta databases.
8) Ortholog information
We used the reciprocal best BLAST hit (RBH) strategy to predict orthologs: two genes from two genomes are considered orthologs if each is the best BLAST hit of the other in the opposite genome.
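The RBH rule can be sketched in a few lines of Python. The sketch assumes that the best hit of every gene has already been computed in both directions and stored in two dictionaries; the function name and the gene IDs in the usage example are hypothetical.

```python
def reciprocal_best_hits(best_a_to_b, best_b_to_a):
    """Return sorted (gene_a, gene_b) pairs that are each other's best hit.

    best_a_to_b maps each gene of genome A to its best hit in genome B,
    and best_b_to_a maps each gene of genome B to its best hit in genome A.
    """
    return sorted(
        (a, b)
        for a, b in best_a_to_b.items()
        if best_b_to_a.get(b) == a
    )


# Usage with made-up IDs: only hA1/mB1 find each other in both directions.
pairs = reciprocal_best_hits(
    {"hA1": "mB1", "hA2": "mB2", "hA3": "mB1"},
    {"mB1": "hA1", "mB2": "hA9"},
)
```

Note that hA3 also has mB1 as its best hit, but mB1 points back to hA1, so hA3 receives no ortholog under this rule.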
9) Paralog Information
The BLAST Score Ratio (BSR) approach was used to identify paralogs. Paralogs were required to have BLAST identity >= 50%, E-value <= 1e-20, coverage >= 70% and BSR >= 0.4.
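The four paralog criteria can be combined as in the following sketch. It assumes the usual definition of the BSR as the bit score of the query-subject hit divided by the bit score of the query aligned to itself, and it computes coverage as aligned length over query length; the coverage definition, the `Hit` class and the function name are assumptions, since the text does not specify them.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    query: str
    subject: str
    identity: float    # percent identity reported by BLAST
    evalue: float
    aligned_len: int   # alignment length in residues
    query_len: int     # full length of the query protein
    bitscore: float


def is_paralog(hit, self_bitscore):
    """Apply the identity, E-value, coverage and BSR thresholds from the text."""
    if hit.query == hit.subject:
        return False  # a gene is not its own paralog
    bsr = hit.bitscore / self_bitscore
    coverage = 100.0 * hit.aligned_len / hit.query_len
    return (hit.identity >= 50.0
            and hit.evalue <= 1e-20
            and coverage >= 70.0
            and bsr >= 0.4)
```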
10) Cross Reference
The cross reference provides links to other databases, including UniGene, UniSTS, GenBank, OMIM, HGNC and CGNC.
|4. Web server:|
The current URL of AnimalTFDB is http://www.bioguo.org/AnimalTFDB/. Users can browse or search the data at different levels.
1) Browse by species. Users can browse data by clicking the logo of a species or by clicking its name in the tree view on the left;
2) Browse by families. Users can browse data by clicking the logo of a family or by clicking its name in the tree view on the left.
1) Quick search, at the top of each page, by Ensembl ID (gene, transcript or protein), Entrez gene ID or gene symbol;
2) Advanced search. The advanced search page provides multiple ways to search: users can search by the basic information and annotation information of a TF. Before searching, users must choose the families and species.