
The Animal Transcription Factor DataBase (AnimalTFDB) is a comprehensive TF database in which we identified and classified the genome-wide TFs of 50 sequenced animal genomes (Ensembl release 60). In addition to TFs, AnimalTFDB also collects the transcription co-factors and chromatin remodeling factors of those genomes, which play regulatory roles in transcription. Here we define TFs as proteins that contain a sequence-specific DNA-binding domain (DBD) and regulate target gene expression. Currently, AnimalTFDB classifies all animal TFs into 72 families according to their conserved DBDs.


1. Methods for predicting TFs, transcription co-factors and chromatin remodeling factors:
TFs were identified using Hidden Markov Model (HMM) profiles of their DBDs. Among the 71 defined families (all 72 families except "Others"), 59 had HMM profiles for their DBDs in the Pfam database (v25.0), while the remaining 12, which are described in the InterPro database (v32.0), had no HMM profile available. To build HMM profiles for these 12 families, we aligned their DBD sequences with ClustalW2 and ran the hmmbuild program of the HMMER package on the alignments. We then used the hmmsearch program to search the proteome sequences of each species and predict TFs. Based on manual curation, we took an E-value of 0.0001 as the cutoff.
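The E-value filtering step can be sketched as follows. This is a minimal illustration, not the pipeline actually used: it assumes hmmsearch was run with its `--tblout` option, that the cutoff is applied to the full-sequence E-value (the fifth column of that table), and that a protein matching several profiles is assigned to the family with the lowest E-value. The protein IDs and rows in `SAMPLE_TBLOUT` are made up.

```python
E_VALUE_CUTOFF = 1e-4  # cutoff stated in the text


def filter_tblout(text, cutoff=E_VALUE_CUTOFF):
    """Return {protein_id: (profile_name, e_value)} for hits passing the cutoff.

    Assumes the whitespace-delimited layout written by `hmmsearch --tblout`:
    target name, target accession, query name, query accession,
    full-sequence E-value, ...
    """
    best = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split()
        target, profile, e_value = fields[0], fields[2], float(fields[4])
        if e_value > cutoff:
            continue
        # Assumption: keep only the best-scoring profile per protein.
        if target not in best or e_value < best[target][1]:
            best[target] = (profile, e_value)
    return best


# Made-up rows for illustration only.
SAMPLE_TBLOUT = """\
# target name  accession  query name  accession  E-value  score  bias
PROT_A  -  Homeobox  PF00046.24  1.2e-20  70.1  0.3
PROT_B  -  Homeobox  PF00046.24  0.02     10.5  0.0
PROT_A  -  HLH       PF00010.21  3.0e-5   20.0  0.1
"""

predicted = filter_tblout(SAMPLE_TBLOUT)
```

Here `PROT_B` is dropped because its E-value exceeds the cutoff, and `PROT_A` is kept once, under the profile with the lower E-value.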

Transcription co-factors are defined as proteins that interact with TFs in the transcription complex but do not bind DNA directly. Chromatin remodeling factors are defined as proteins that regulate transcription by modifying chromatin structure. To identify them, we first obtained the human transcription co-factors and chromatin remodeling factors from TFCONES and the GO database, using the GO terms "transcription cofactor activity" (for co-factors) and "chromatin remodeling" (for chromatin remodeling factors). We then used the human sequences as BLAST queries and took the best BLAST hit in each searched species as the transcription co-factor or chromatin remodeling factor of that species.
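Selecting the best BLAST hit per human query could look like the sketch below. It assumes BLAST tabular output with the default 12 columns (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore) and ranks hits by bit score; the text does not say which score was used to rank hits, so that choice is an assumption. All sequence IDs are hypothetical.

```python
def best_hits(tabular_text):
    """Map each query ID to the subject ID of its highest-bit-score hit."""
    best = {}
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


# Hypothetical hits of two human co-factors against one target proteome.
rows = [
    ("HUMAN_COF1", "SP_P1", "80.0", "200", "40", "0", "1", "200", "1", "200", "1e-90", "320"),
    ("HUMAN_COF1", "SP_P2", "45.0", "180", "99", "2", "5", "184", "3", "182", "1e-30", "110"),
    ("HUMAN_COF2", "SP_P3", "60.0", "150", "60", "1", "1", "150", "1", "150", "1e-50", "190"),
]
sample = "\n".join("\t".join(row) for row in rows)
cofactor_candidates = best_hits(sample)
```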
2. TF families and seeds:
Family | DNA-binding domain | Pfam ID or InterPro ID
AF-4 | AF-4 | PF05110
AP-2 | AP_2 | IPR004979
ARID | ARID | PF01388
bHLH | HLH | PF00010
bZIP:
  TF_bZIP | bZIP | IPR004827
  C/EBP | bZIP | IPR016468
CBF | CBF_alpha | PF02312
COE | COE | IPR003523
CSL | Beta-trefoil | PF09270
CG-1 | CG-1 | PF03859
CP2 | CP2 | PF04516
CSD | CSD | PF00313
E2F | E2F_TDP | PF02319
ETS | Ets | PF00178
Fork head | Fork_head | PF00250
GCM | GCM | PF03615
GTF2I | GTF2I | PF02946
HMG | HMG_box | PF00505
HMGI/HMGY | HMGI/HMGY | IPR000116
Homeobox:
  Homeobox | Homeobox | PF00046
  Pou | Homeobox, Pou | PF00157
  CUT | Homeobox, CUT | PF02376
  TF_Otx | Homeobox, TF_Otx | PF03529
HSF | HSF_DNA-bind | PF00447
HTH | HTH_psq | PF05225
IRF | IRF | PF00605
MH1:
  CTF/NFI | MH1 | PF00859
  MH1 | MH1 | PF03165
MYB | Myb_DNA-bd | PF00249
MBD | MBD | PF01429
NDT80/PhoG | NDT80_PhoG | PF05224
NF-Y:
  NF-YA | CBFB_NFYA | PF02045
  NF-YBC | CBFD_NFYB_HMF | PF00808
Nrf1 | Nrf1_DNA-bind | PF10491
PC4 | PC4 | PF02229
P53 | P53 | PF00870
PAX | PAX | PF00292
Prox1 | Prox1 | PF05044
RFX | RFX | PF02257
RHD | RHD | PF00554
Runt | Runt | PF00853
SAND | SAND | PF01342
SRF | SRF | PF00319
STAT | STAT_bind | PF02864
T-box | T-box | PF00907
TEA | TEA | PF01285
TSC22 | TSC22 | PF01166
Tub | Tub | PF01167
Zinc finger:
  zf-C2HC | zf-C2HC | PF01530
  zf-GAGA | zf-GAGA | PF09237
  zf-BED | zf-BED | PF02892
  zf-C2H2:
    ZBTB | zf-C2H2 | PF00651
    zf-C2H2 | zf-C2H2 | PF00096
  Nuclear Receptor (zf-C4):
    PPAR receptor | zf-C4 | IPR003074
    Androgen receptor | zf-C4 | PF02166
    COUP_TF | zf-C4 | IPR003068
    Ecdystd receptor | zf-C4 | IPR003069
    GCR | zf-C4 | PF02155
    Nuclear hormone receptor | zf-C4 | IPR003070
    Oestrogen receptor | zf-C4 | PF02159
    Progesterone receptor | zf-C4 | PF02161
    Retinoic acid receptor | zf-C4 | IPR003078
    ROR receptor | zf-C4 | IPR003079
    Thyroid hormone receptor | zf-C4 | IPR001728
    Other nuclear receptor | zf-C4 | PF00104
  DM | DM | PF00751
  zf-GATA | zf-GATA | PF00320
  zf-LITAF-like | zf-LITAF-like | PF10601
  zf-MIZ | zf-MIZ | PF02891
  zf-NF-X1 | zf-NF-X1 | PF01422
  THAP | THAP | PF05485
Others
3. Gene annotation:
1) Gene basic information
Gene basic information includes the Ensembl ID, Gene ID, symbol, aliases, full name, other designations, chromosome location and transcripts. All gene information was extracted from the gene_info file downloaded from the NCBI FTP site (ftp://ftp.ncbi.nih.gov/gene/DATA/) and the GTF files downloaded from the Ensembl FTP site (ftp://ftp.ensembl.org/pub/release-60/gtf/).
2) Gene structure
The gene structure shows the distribution of the CDS, UTRs and introns of a gene along the chromosome.
3) Gene ontology (GO) annotation
The GO annotations were parsed from the gene2go file, which was downloaded from the NCBI FTP site (ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/).
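A minimal sketch of such a parse is shown below. It assumes the tab-delimited gene2go layout distributed by NCBI (tax_id, GeneID, GO_ID, Evidence, Qualifier, GO_term, PubMed, Category); the function name, the taxonomy ID and the sample rows are illustrative placeholders, not real records.

```python
from collections import defaultdict


def parse_gene2go(text, tax_id):
    """Collect (GO_ID, GO_term, Category) tuples per GeneID for one species."""
    annotations = defaultdict(set)
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # the file starts with a '#'-prefixed header line
        fields = line.split("\t")
        if fields[0] != str(tax_id):
            continue
        gene_id, go_id, go_term, category = fields[1], fields[2], fields[5], fields[7]
        annotations[gene_id].add((go_id, go_term, category))
    return dict(annotations)


# Placeholder rows (not real annotations), built with explicit tabs.
header = "#tax_id\tGeneID\tGO_ID\tEvidence\tQualifier\tGO_term\tPubMed\tCategory"
sample_rows = [
    ("1111", "101", "GO:0000001", "IEA", "-", "example term A", "-", "Function"),
    ("1111", "101", "GO:0000002", "IDA", "-", "example term B", "-", "Process"),
    ("2222", "202", "GO:0000001", "IEA", "-", "example term A", "-", "Function"),
]
sample_gene2go = "\n".join([header] + ["\t".join(r) for r in sample_rows])
go_by_gene = parse_gene2go(sample_gene2go, tax_id=1111)
```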
4) Function domain annotation
The function domain view displays the domain distribution of the longest protein of each gene. All protein domains were identified with HMMER using HMM profiles downloaded from the Pfam database.
5) 3D structure hit
The 3D structure hits of each gene were obtained from the PDB database. We used BLAST to search the protein sequences of the factors against the sequences in PDB, with the cutoffs E-value<=1e-15, hit length>=50 amino acids and identity>=55%.
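The three cutoffs can be applied to parsed BLAST hits as in the sketch below. It interprets "hit length" as the alignment length reported by BLAST, which is an assumption; the dictionary keys and the hit records are hypothetical.

```python
def filter_pdb_hits(hits, max_evalue=1e-15, min_length=50, min_identity=55.0):
    """Keep hits that satisfy all three cutoffs stated in the text."""
    return [
        hit
        for hit in hits
        if hit["evalue"] <= max_evalue
        and hit["length"] >= min_length      # alignment length in amino acids
        and hit["identity"] >= min_identity  # percent identity
    ]


# Hypothetical hits of one factor against PDB sequences.
sample_hits = [
    {"pdb": "XXXX_A", "evalue": 1e-40, "length": 120, "identity": 78.0},  # passes
    {"pdb": "YYYY_B", "evalue": 1e-40, "length": 35, "identity": 90.0},   # too short
    {"pdb": "ZZZZ_C", "evalue": 1e-10, "length": 200, "identity": 60.0},  # E-value too high
    {"pdb": "WWWW_D", "evalue": 1e-30, "length": 150, "identity": 40.0},  # identity too low
]
structure_hits = filter_pdb_hits(sample_hits)
```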
6) Protein-protein Interaction
The protein-protein interactions were extracted from the BioGRID and HPRD databases and from one published study (PMID:20211142).
7) Pathway annotation
The pathway annotations were obtained from the KEGG and BioCarta databases.
8) Ortholog information
We used the reciprocal best BLAST hit (RBH) strategy to predict orthologs, on the assumption that two orthologous genes in two genomes find each other as the best hit in the other genome.
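The RBH rule can be illustrated with a short sketch. It assumes the best hit of every gene has already been computed in both directions and stored in two dictionaries; the gene names are hypothetical.

```python
def reciprocal_best_hits(a_to_b, b_to_a):
    """Return sorted (gene_a, gene_b) pairs that are each other's best hit.

    a_to_b maps each gene of genome A to its best hit in genome B;
    b_to_a maps each gene of genome B to its best hit in genome A.
    """
    return sorted(
        (gene_a, gene_b)
        for gene_a, gene_b in a_to_b.items()
        if b_to_a.get(gene_b) == gene_a
    )


# Hypothetical best-hit tables for two genomes.
best_a_to_b = {"A1": "B1", "A2": "B2", "A3": "B2"}
best_b_to_a = {"B1": "A1", "B2": "A3", "B3": "A2"}
ortholog_pairs = reciprocal_best_hits(best_a_to_b, best_b_to_a)
```

In this example `A2` has `B2` as its best hit, but `B2` points back to `A3`, so only the pairs `A1`/`B1` and `A3`/`B2` are reported.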
9) Paralog Information
The BLAST Score Ratio (BSR) approach was used to identify paralogs. Paralogs are required to have BLAST identity>=50%, E-value<=1e-20, coverage>=70% and BSR>=0.4.
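The four paralog criteria can be combined as in the sketch below. The text does not define BSR or coverage, so two assumptions are made here: BSR is taken as the bit score of the query-subject alignment divided by the bit score of the query aligned to itself (the usual definition of the BLAST score ratio), and coverage is taken as alignment length divided by query length. The parameter names are illustrative.

```python
def is_paralog(identity, evalue, aln_length, query_length,
               pair_bitscore, self_bitscore):
    """Apply the identity, E-value, coverage and BSR cutoffs from the text."""
    coverage = aln_length / query_length   # assumed definition of coverage
    bsr = pair_bitscore / self_bitscore    # assumed definition of BSR
    return (
        identity >= 50.0
        and evalue <= 1e-20
        and coverage >= 0.70
        and bsr >= 0.4
    )


# Hypothetical pair: 62% identity, 180 of 200 residues aligned, BSR = 0.5.
accepted = is_paralog(62.0, 1e-40, 180, 200, 250.0, 500.0)
# Same pair with a lower pairwise bit score, BSR = 0.3.
rejected = is_paralog(62.0, 1e-40, 180, 200, 150.0, 500.0)
```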
10) Cross Reference
Cross references provide links to other databases, including UniGene, UniSTS, GenBank, OMIM, HGNC and CGNC.
4. Web server:
The current URL of AnimalTFDB is http://www.bioguo.org/AnimalTFDB/. Users can browse or search the data at different levels.
Browse:
1) Browse by species. Users can browse data by clicking the logo of a species or its name in the tree view on the left;
2) Browse by families. Users can browse data by clicking the logo of a family or its name in the tree view on the left.
Search:
1) Quick search, at the head of each page, by Ensembl ID (gene, transcript or protein), Entrez gene ID or gene symbol;
2) The advanced search page provides multiple ways to search. Users can search by the basic information and annotation information of a TF. Before searching, users must choose the families and species.