Supplementary Data.

Donaldson and Gottgens (2006).

ABBREVIATION:
TFBS = Transcription factor binding site

UCSC custom track files:
chr1 chr2 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 chrX chrY

These files can be downloaded and viewed in the UCSC genome browser (Mouse, March 2005 assembly). Alternatively, the URL of each file can be pasted into the custom track input box. The address is:

http://hscl.cimr.cam.ac.uk/sup_donaldson06/ucsc_also_human_nodup_noexon_mar05_4track_chrN.bed (where N = chromosome; 1-19, X or Y).
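
The address pattern above can be expanded into one URL per chromosome. The following is a minimal Python sketch that only builds the strings and does not check that the files are still hosted; the names `BASE_URL` and `track_urls` are illustrative and not part of the original data release.

```python
# Illustrative helper (not part of the original supplement): expand the
# address pattern given above into one custom-track URL per chromosome.
BASE_URL = (
    "http://hscl.cimr.cam.ac.uk/sup_donaldson06/"
    "ucsc_also_human_nodup_noexon_mar05_4track_chr{chrom}.bed"
)


def track_urls():
    """Return a dict mapping 'chrN' to its track URL for N = 1-19, X, Y."""
    chromosomes = [str(n) for n in range(1, 20)] + ["X", "Y"]
    return {"chr" + c: BASE_URL.format(chrom=c) for c in chromosomes}


if __name__ == "__main__":
    # Print one URL per line, ready to paste into the custom track box.
    for name, url in track_urls().items():
        print(name, url)
```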

Motif data files:
Complete M.m./P.t. conserved Xie TFBSs
Complete M.m./H.s. conserved Xie TFBSs

Xie TFBSs conserved between M.m./P.t. that are not conserved between M.m./H.s. (filtered)
(filtered to remove TFBSs in exons and to retain only those from mouse regions aligned to both chimpanzee and human sequence)
Complete Xie TFBSs conserved between M.m./P.t. (filtered)
(filtered to remove TFBSs in exons and to retain only those from mouse regions aligned to both chimpanzee and human sequence)

Mouse gene symbol files:
Files containing the symbols of genes located within genome tiles, or within the 25 kb flanking them on either side, where chimpanzee-mouse TFBSs (not present in mouse-human) are over-represented compared to the entire chimpanzee-mouse TFBS set.
>= 3 fold over median enrichment
>= 6 fold over median enrichment
>= 2 Std Dev enrichment
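
The three cut-offs above can be illustrated with a short Python sketch. It rests on assumptions that this page does not state: that enrichment for a tile is the ratio of chimpanzee-mouse TFBSs absent from mouse-human to all chimpanzee-mouse TFBSs in that tile, that "fold over median" compares this ratio with the median ratio across tiles, and that the standard deviation cut-off is the mean ratio plus two standard deviations. The function name `enriched_tiles` and its arguments are hypothetical.

```python
from statistics import mean, median, pstdev


def enriched_tiles(lost, total, fold=3.0, n_sd=2.0):
    """Return (tiles passing the fold cut-off, tiles passing the SD cut-off).

    lost  : dict tile -> count of chimpanzee-mouse TFBSs not in mouse-human
    total : dict tile -> count of all chimpanzee-mouse TFBSs
    Assumed definition of enrichment: lost / total for each tile.
    """
    # Per-tile ratio; tiles without any TFBS are skipped.
    ratios = {t: lost.get(t, 0) / n for t, n in total.items() if n > 0}
    values = list(ratios.values())

    # Assumed cut-offs: a multiple of the median ratio, and
    # mean + n_sd population standard deviations.
    fold_cutoff = fold * median(values)
    sd_cutoff = mean(values) + n_sd * pstdev(values)

    by_fold = sorted(t for t, r in ratios.items() if r >= fold_cutoff)
    by_sd = sorted(t for t, r in ratios.items() if r >= sd_cutoff)
    return by_fold, by_sd
```

Calling the function with `fold=6.0` would give the stricter fold cut-off under the same assumptions.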

GoToolBox input files:
These are the input files generated by GoToolBox 'Create-Dataset'.
>= 3 fold over median enrichment
>= 6 fold over median enrichment
>= 2 Std Dev enrichment

Summary table of TFBS data for this study:
Candidate TFBSs were identified as outlined in the text. Motif numbers and IUPAC consensus sequences refer to the TFBSs reported by Xie et al. The name given with each motif is the best-matching TRANSFAC TFBS, as determined by that study. Numbers of binding sites are shown for each TFBS found in the mouse-chimpanzee and mouse-human sequence alignments. Also shown are the numbers of candidate TFBSs lost in the human genome, together with the numbers of genes localised to these TFBSs. Numbers are presented for the total non-redundant dataset (unfiltered) and the final filtered dataset (filtered). M.m. = Mus musculus; H.s. = Homo sapiens; P.t. = Pan troglodytes.
Motif  Name      IUPAC string     Conserved TFBSs             P.t. TFBSs not in H.s.
                                  M.m./P.t. *1  M.m./H.s. *1  Unfiltered *2  Filtered *3
1      NRF-2     RCGCAnGCGY       1137          1311          60             41
2      MYC       CACGTG           12870         13824         562            385
3      ELK-1     SCGGAAGY         3410          3716          3410           1906
4      --        ACTAYRnnnCCCR    386           431           16             11
5      NF-Y      GATTGGY          11724         12160         624            485
6      SP1       GGGCGGR          8848          10244         680            549
7      AP-1      TGAnTCA          85194         87414         2152           1759
8      --        TMTCGCGAnR       253           283           3              2
9      ATF3      TGAYRTCA         15444         16000         389            303
10     YY1       GCCATnTTG        3398          3559          180            138
11     GABP      MGGAAGTG         6911          7213          362            272
12     E12       CAGGTG           97105         101061        5627           4247
13     LEF1      CTTTGT           116849        119762        6749           5323
14     ATF3      TGACGTCA         1446          1566          26             22
15     AP-4      CAGCTG           168758        174850        3800           2834
16     C-ETS-2   RYTTCCTG         29315         30360         1409           1092
17     IRF1      AACTTT           105180        107711        6522           5137
18     SREBP-1   TCAnnTGAY        16135         16588         896            741
19     --        GKCGCn(7)TGAYG   18            19            1              1
20     E4F1      GTGACGY          2412          2626          116            84
21     --        GGAAnCGGAAnY     105           117           6              5
22     --        TGCGCAnK         2496          2776          140            71
23     CHX10     TAATTA           205934        210584        4816           3895
24     MAZ       GGGAGGRR         20902         22637         1434           1150
25     ESRRA     TGACCTY          43545         44548         2506           2012
26     E4BP4     TTAYRTAA         16974         17382         551            465
27     --        TGGn(6)KCCAR     5249          5492          304            229
28     RSRFC4    CTAWWWATA        15942         16359         754            660
29     --        CTTTAAR          60391         61875         3506           2740
30     --        YGCGYRCGC        1185          1456          195            116

NOTES:
*1 = Binding sites that are conserved between aligned genomes
*2 = Non-overlapping binding sites that are present in the chimpanzee genome BUT NOT in the human genome (relative to the mouse genome)
*3 = Motif libraries from which sites located in sequence annotated as exons, or affected by structural variation, have been removed

Last modified: Thursday 8 June 2006.