Donaldson and Gottgens (2006).
ABBREVIATION:
TFBS = Transcription factor binding site
UCSC custom track files:
chr1
chr2
chr3
chr4
chr5
chr6
chr7
chr8
chr9
chr10
chr11
chr12
chr13
chr14
chr15
chr16
chr17
chr18
chr19
chrX
chrY
These files can be downloaded and viewed in the UCSC Genome Browser
(mouse March 2005 assembly).
Alternatively, the URL of each file can be pasted into the custom
track input box. The address is:
http://hscl.cimr.cam.ac.uk/sup_donaldson06/ucsc_also_human_nodup_noexon_mar05_4track_chrN.bed
(where N is the chromosome: 1-19, X or Y).
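Because the track files differ only in the chromosome name, the full set of URLs can be generated from the pattern above. The sketch below does that and adds a minimal reader for the standard BED columns. It assumes ordinary BED conventions (whitespace-separated fields, 0-based start, end-exclusive, `track`/`browser` header lines); the exact column layout of these particular files is not described here, so only the first three columns are read, and any separate tracks within one file are merged. The function names are illustrative.

```python
# Sketch: build the per-chromosome custom-track URLs from the documented
# pattern (N = 1-19, X or Y) and read the first three BED columns.
URL_TEMPLATE = (
    "http://hscl.cimr.cam.ac.uk/sup_donaldson06/"
    "ucsc_also_human_nodup_noexon_mar05_4track_chr{n}.bed"
)
MOUSE_CHROMOSOMES = [str(i) for i in range(1, 20)] + ["X", "Y"]


def track_urls():
    """Return one URL per mouse chromosome by substituting N in the pattern."""
    return [URL_TEMPLATE.format(n=n) for n in MOUSE_CHROMOSOMES]


def parse_bed(lines):
    """Yield (chrom, start, end) tuples from BED text.

    Assumes standard BED: whitespace-separated fields, 0-based start,
    end-exclusive. 'track', 'browser', comment and blank lines are skipped,
    so features from several tracks in one file are merged.
    """
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("track", "browser", "#")):
            continue
        fields = line.split()
        yield fields[0], int(fields[1]), int(fields[2])
```

For example, `for url in track_urls(): print(url)` lists the 21 addresses, and `parse_bed(open("chr1.bed"))` would iterate over the records of a locally saved copy.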
Motif data files:
Complete M.m./P.t. conserved Xie TFBSs
Complete M.m./H.s. conserved Xie TFBSs
Xie TFBSs conserved between M.m./P.t. that are not conserved between M.m./H.s. (filtered)
(filtered to remove TFBSs in exons and to retain only those from mouse regions aligned
to both chimpanzee and human sequence)
Complete Xie TFBSs conserved between M.m./P.t. (filtered)
(filtered to remove TFBSs in exons and to retain only those from mouse regions aligned
to both chimpanzee and human sequence)
Mouse gene symbol files:
Files containing the symbols of genes located within, or within 25 kb either side of, genome
tiles in which chimpanzee-mouse TFBSs (not present in mouse-human) are over-represented
compared with the entire chimpanzee-mouse TFBS set.
>= 3 fold over median enrichment
>= 6 fold over median enrichment
>= 2 Std Dev enrichment
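The three thresholds above can be read as simple cut-offs on a per-tile enrichment score. The sketch below shows one possible reading, not the study's actual procedure: it assumes each tile already has a numeric enrichment score, takes ">= k fold over median" to mean a score of at least k times the median tile score, takes ">= 2 Std Dev" to mean at least the mean plus two sample standard deviations, and treats a gene as belonging to a tile if it overlaps the tile extended by 25 kb on each side. All names and input shapes are hypothetical.

```python
import statistics

# Hypothetical input: tile ID -> enrichment score for chimpanzee-only TFBSs.


def tiles_over_fold_median(scores, fold):
    """Tile IDs whose score is at least `fold` times the median score."""
    cutoff = fold * statistics.median(scores.values())
    return sorted(tile for tile, s in scores.items() if s >= cutoff)


def tiles_over_n_sd(scores, n_sd):
    """Tile IDs whose score is at least `n_sd` sample SDs above the mean."""
    values = list(scores.values())
    cutoff = statistics.mean(values) + n_sd * statistics.stdev(values)
    return sorted(tile for tile, s in scores.items() if s >= cutoff)


def genes_near_tiles(tile_coords, genes, flank=25_000):
    """Symbols of genes overlapping any tile extended by `flank` bp per side.

    tile_coords: {tile_id: (chrom, start, end)}, half-open intervals.
    genes: iterable of (symbol, chrom, start, end), half-open intervals.
    """
    genes = list(genes)  # allow repeated passes over a generator
    symbols = set()
    for chrom, start, end in tile_coords.values():
        lo, hi = start - flank, end + flank
        for symbol, g_chrom, g_start, g_end in genes:
            if g_chrom == chrom and g_start < hi and g_end > lo:
                symbols.add(symbol)
    return sorted(symbols)
```

A median-based cut-off is less affected by a few extreme tiles than the mean, which is one reason to report both kinds of threshold side by side.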
GoToolBox input files:
These files contain the input datasets generated by the GoToolBox 'Create-Dataset' tool.
>= 3 fold over median enrichment
>= 6 fold over median enrichment
>= 2 Std Dev enrichment
Summary table of TFBS data for this study:
Candidate TFBSs were identified as outlined in the text. Motif numbers and IUPAC consensus sequences refer to the TFBSs reported by Xie et al. The name after each motif number is the best-matching TRANSFAC TFBS, as determined by that study. Numbers of binding sites are shown for each TFBS found in the mouse-chimpanzee and mouse-human sequence alignments. Also shown are the numbers of candidate TFBSs lost in the human genome, together with the numbers of genes localised to these TFBSs. Numbers are presented for the total non-redundant dataset (unfiltered) and the final filtered dataset (filtered). M.m. = Mus musculus; H.s. = Homo sapiens; P.t. = Pan troglodytes.
| Motif | IUPAC string | Conserved TFBSs, M.m./P.t. *1 | Conserved TFBSs, M.m./H.s. *1 | P.t. TFBSs not in H.s., unfiltered *2 | P.t. TFBSs not in H.s., filtered *3 |
|---|---|---|---|---|---|
| 1: NRF-2 | RCGCAnGCGY | 1137 | 1311 | 60 | 41 |
| 2: MYC | CACGTG | 12870 | 13824 | 562 | 385 |
| 3: ELK-1 | SCGGAAGY | 3410 | 3716 | 3410 | 1906 |
| 4: -- | ACTAYRnnnCCCR | 386 | 431 | 16 | 11 |
| 5: NF-Y | GATTGGY | 11724 | 12160 | 624 | 485 |
| 6: SP1 | GGGCGGR | 8848 | 10244 | 680 | 549 |
| 7: AP-1 | TGAnTCA | 85194 | 87414 | 2152 | 1759 |
| 8: -- | TMTCGCGAnR | 253 | 283 | 3 | 2 |
| 9: ATF3 | TGAYRTCA | 15444 | 16000 | 389 | 303 |
| 10: YY1 | GCCATnTTG | 3398 | 3559 | 180 | 138 |
| 11: GABP | MGGAAGTG | 6911 | 7213 | 362 | 272 |
| 12: E12 | CAGGTG | 97105 | 101061 | 5627 | 4247 |
| 13: LEF1 | CTTTGT | 116849 | 119762 | 6749 | 5323 |
| 14: ATF3 | TGACGTCA | 1446 | 1566 | 26 | 22 |
| 15: AP-4 | CAGCTG | 168758 | 174850 | 3800 | 2834 |
| 16: C-ETS-2 | RYTTCCTG | 29315 | 30360 | 1409 | 1092 |
| 17: IRF1 | AACTTT | 105180 | 107711 | 6522 | 5137 |
| 18: SREBP-1 | TCAnnTGAY | 16135 | 16588 | 896 | 741 |
| 19: -- | GKCGCn(7)TGAYG | 18 | 19 | 1 | 1 |
| 20: E4F1 | GTGACGY | 2412 | 2626 | 116 | 84 |
| 21: -- | GGAAnCGGAAnY | 105 | 117 | 6 | 5 |
| 22: -- | TGCGCAnK | 2496 | 2776 | 140 | 71 |
| 23: CHX10 | TAATTA | 205934 | 210584 | 4816 | 3895 |
| 24: MAZ | GGGAGGRR | 20902 | 22637 | 1434 | 1150 |
| 25: ESRRA | TGACCTY | 43545 | 44548 | 2506 | 2012 |
| 26: E4BP4 | TTAYRTAA | 16974 | 17382 | 551 | 465 |
| 27: -- | TGGn(6)KCCAR | 5249 | 5492 | 304 | 229 |
| 28: RSRFC4 | CTAWWWATA | 15942 | 16359 | 754 | 660 |
| 29: -- | CTTTAAR | 60391 | 61875 | 3506 | 2740 |
| 30: -- | YGCGYRCGC | 1185 | 1456 | 195 | 116 |
NOTES:
*1 = Binding sites that are conserved between aligned genomes
*2 = Non-overlapping binding sites that are present in the chimpanzee genome but not in the human genome (relative to the mouse genome)
*3 = Motif libraries from which sites lying in sequence annotated as exons, or affected by structural variation, have been removed
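The IUPAC strings in the table use the standard nucleotide ambiguity codes (R = A/G, Y = C/T, S = C/G, W = A/T, K = G/T, M = A/C, B/D/H/V = three bases, n = any base), with "n(k)" denoting k unspecified bases. The sketch below translates such a string into a regular expression and reports overlapping matches on one strand. It only illustrates the notation; it is not the site-calling pipeline used by Xie et al. or in this study, and reverse-strand scanning and de-duplication are not shown. Function names are illustrative.

```python
import re

# Standard IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def iupac_to_regex(motif):
    """Translate e.g. 'GKCGCn(7)TGAYG' into 'G[GT]CGC[ACGT]{7}TGA[CT]G'.

    A code followed by '(k)' is repeated k times, matching the table's
    'n(7)' notation. Case is ignored, so 'n' and 'N' are equivalent.
    """
    parts = []
    for code, count in re.findall(r"([A-Za-z])(?:\((\d+)\))?", motif):
        char_class = IUPAC[code.upper()]
        parts.append(char_class + (f"{{{count}}}" if count else ""))
    return "".join(parts)


def find_sites(motif, seq):
    """0-based start positions of matches on the given strand.

    The lookahead lets overlapping sites (e.g. in TAATTAATTA) be reported.
    """
    pattern = re.compile(f"(?=({iupac_to_regex(motif)}))")
    return [m.start() for m in pattern.finditer(seq.upper())]
```

For instance, `iupac_to_regex("RCGCAnGCGY")` yields `[AG]CGCA[ACGT]GCG[CT]` for motif 1, and `find_sites("TAATTA", "TAATTAATTA")` reports two overlapping CHX10-type sites.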