1 Introduction

This document offers an introduction to and overview of motifbreakR, which allows the biologist to judge whether the sequence surrounding a polymorphism or mutation is a good match to known transcription factor binding sites, and how much information is gained or lost in one allele of the polymorphism relative to another, or in a mutation relative to wildtype. motifbreakR is flexible, offering a choice of three algorithms for interrogating genomes with motifs from public sources: 1) a weighted sum, 2) log-probabilities, and 3) relative entropy. motifbreakR can predict effects for novel variants as well as for variants previously described in public databases, making it suitable for tasks beyond the scope of its original design. Lastly, it can be used to interrogate any genome curated within Bioconductor.

motifbreakR works with position probability matrices (PPMs). A PPM is derived as the fractional occurrence of nucleotides A, C, G, and T at each position of a position frequency matrix (PFM). A PFM is simply the tally of each nucleotide at each position across a set of aligned sequences. With a PPM, one can generate probabilities based on the genome, or more practically, create any number of position-specific scoring matrices (PSSMs) based on the principle that the PPM contains information about the likelihood of observing a particular nucleotide at a particular position of a true transcription factor binding site.
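The relationship between PFM, PPM, and PSSM can be sketched in a few lines of base R. This is a minimal illustration only: the counts are invented, and the uniform background and pseudocount are assumptions chosen for the example, not the package's own defaults.

```r
# Toy position frequency matrix (PFM): nucleotide counts at 4 positions
# across 10 aligned sites. The counts are invented for illustration.
pfm <- matrix(c(8,  0, 1, 0,
                1,  0, 7, 2,
                1, 10, 1, 1,
                0,  0, 1, 7),
              nrow = 4, byrow = TRUE,
              dimnames = list(c("A", "C", "G", "T"), paste0("pos", 1:4)))

# PPM: divide each column by its total so every position sums to 1.
ppm <- sweep(pfm, 2, colSums(pfm), "/")

# One possible PSSM: log2 odds against a uniform background. A small
# pseudocount (an assumption here) keeps zero counts from giving -Inf.
pseudocount <- 0.01
background  <- c(A = 0.25, C = 0.25, G = 0.25, T = 0.25)
ppm.smoothed <- sweep(ppm + pseudocount, 2, colSums(ppm + pseudocount), "/")
pssm <- log2(ppm.smoothed / background)

round(ppm, 2)
round(pssm, 2)
```

Positive PSSM entries mark nucleotides that are more frequent in the motif than in the background; negative entries mark nucleotides that are depleted.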

This guide includes a brief overview of the processing flow, an in-depth example of using motifbreakR in practice, and finally a detailed section on the scoring methods employed by the package.

2 Processing overview

motifbreakR may be used to interrogate SNPs or SNVs for their potential effect on transcription factor binding by examining how the two alleles of the variant affect the binding score of a motif. The basic process is outlined in the figure below.

2.1 Outline of process

motifbreakR workflow: How inputs (trapezoids) are generated from R functions (rectangles). Diamonds represent decisions of the user.

This straightforward process allows the interrogation of SNPs and SNVs in any species represented by a BSgenome package (at least 22 species). It also allows the use of the full MotifDb data set, which includes over 4200 motifs across 8 studies and 22 organisms. We have supplemented MotifDb with over 2800 additional human motifs from four further studies; see data(encodemotif)1, data(factorbook)2, data(hocomoco)3 and data(homer)4.

In practice, motifbreakR involves three phases.

  1. Read in Single Nucleotide Variants: The first step is to read in the list of variants. Variants can be a list of rsIDs if your SNPs are represented in one of the SNPlocs packages, may be included as a .bed file with a specifically formatted name field, or may be provided as a .vcf file. We then transform this input so that it can be read by motifbreakR.
  2. Find Broken Motifs: Next we present motifbreakR with the input generated in the previous step, along with a set of motifs formatted as class MotifList, and your preferred scoring method.
  3. Visualize SNPs and motifs: Finally we can visualize which motifs are broken by any individual SNP using our plotting function.
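The comparison at the heart of phase 2 can be sketched in base R as scoring the same sequence window once with each allele. The PPM values, the 4-bp window, and the helper score.seq are all hypothetical, and the plain sum of probabilities is a simplified stand-in rather than any of the package's scoring methods.

```r
# Toy PPM for a 4-bp motif (rows A, C, G, T); values are invented.
ppm <- matrix(c(0.80, 0.05, 0.10, 0.05,
                0.10, 0.05, 0.70, 0.15,
                0.05, 0.85, 0.10, 0.10,
                0.05, 0.05, 0.10, 0.70),
              nrow = 4, byrow = TRUE,
              dimnames = list(c("A", "C", "G", "T"), NULL))

# Sum the probability of the observed base at each motif position.
# (A simple stand-in score, not the package's own method.)
score.seq <- function(seq, ppm) {
  bases <- strsplit(seq, "")[[1]]
  stopifnot(length(bases) == ncol(ppm))
  sum(ppm[cbind(match(bases, rownames(ppm)), seq_along(bases))])
}

ref.seq <- "AGCT"   # window carrying the reference allele (G at position 2)
alt.seq <- "AACT"   # same window with the alternate allele (A at position 2)

ref.score <- score.seq(ref.seq, ppm)
alt.score <- score.seq(alt.seq, ppm)
alt.score - ref.score   # negative: the alternate allele weakens the match
```

A large difference between the two scores is what marks a motif as potentially broken by the variant.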

3 How To Use motifbreakR: A Practical Example

This section offers an example of how to use motifbreakR to identify transcription factor binding sites potentially disrupted by 701 SNPs output from a FunciSNP analysis of Prostate Cancer (PCa) genome-wide association study (GWAS) risk loci. The SNPs are included in this package and can be loaded as follows:

library(motifbreakR)
pca.snps.file <- system.file("extdata", "pca.enhancer.snps", package = "motifbreakR")
pca.snps <- as.character(read.table(pca.snps.file)[,1])

The simplest form of a motifbreakR analysis is summarized as follows:

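library(SNPlocs.Hsapiens.dbSNP142.GRCh37) # rsID lookup (dbSNP142, hg19)
library(BSgenome.Hsapiens.UCSC.hg19)     # reference sequence
library(MotifDb)                         # motif collection
variants <- snps.from.rsid(rsid = pca.snps,
                           dbSNP = SNPlocs.Hsapiens.dbSNP142.GRCh37,
                           search.genome = BSgenome.Hsapiens.UCSC.hg19)
motifbreakr.results <- motifbreakR(snpList = variants, pwmList = MotifDb, threshold = 0.9)
plotMB(results = motifbreakr.results, rsid = "rs7837328", effect = "strong")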

Let's look at these steps more closely and see how we can customize our analysis.

3.1 Step 1 | Read in Single Nucleotide Variants

Variants can be input either as a list of rsIDs or as a file in BED or VCF format. The main factor determining which you will use is whether your variants have rsIDs that are included in one of the Bioconductor SNPlocs packages. The SNPlocs packages available at the time of writing are listed here:

library(BSgenome)
available.SNPs()
##  [1] "SNPlocs.Hsapiens.dbSNP.20101109"      "SNPlocs.Hsapiens.dbSNP.20120608"     
##  [3] "SNPlocs.Hsapiens.dbSNP141.GRCh38"     "SNPlocs.Hsapiens.dbSNP142.GRCh37"    
##  [5] "SNPlocs.Hsapiens.dbSNP144.GRCh37"     "SNPlocs.Hsapiens.dbSNP144.GRCh38"    
##  [7] "SNPlocs.Hsapiens.dbSNP149.GRCh38"     "SNPlocs.Hsapiens.dbSNP150.GRCh38"    
##  [9] "SNPlocs.Hsapiens.dbSNP151.GRCh38"     "XtraSNPlocs.Hsapiens.dbSNP141.GRCh38"
## [11] "XtraSNPlocs.Hsapiens.dbSNP144.GRCh37" "XtraSNPlocs.Hsapiens.dbSNP144.GRCh38"

For cases where your rsIDs are not available in a SNPlocs package, or you have novel variants that are not cataloged at all, variants may be entered in BED format as seen below:

snps.file <- system.file("extdata", "snps.bed", package = "motifbreakR")
read.delim(snps.file, header = FALSE)
##     V1        V2        V3                V4 V5 V6
## 1 chr2  12581137  12581138        rs10170896  0  +
## 2 chr2  12594017  12594018 chr2:12594018:G:A  0  +
## 3 chr3 192388677 192388678        rs13068005  0  +
## 4 chr4 122361479 122361480        rs12644995  0  +
## 5 chr6  44503245  44503246 chr6:44503246:A:T  0  +
## 6 chr6  44503247  44503248 chr6:44503248:G:C  0  +
## 7 chr6  85232897  85232898         rs4510639  0  +
## 8 chr6  44501872  44501873          rs932680  0  +

Our requirements for the BED file are that it must include chromosome, start, end, name, score and strand fields – where the name field is required to be in one of two formats, either an rsID that is present in a SNPlocs package, or in the form chromosome:position:referenceAllele:alternateAllele e.g., chr2:12594018:G:A. It is also essential that the fields are TAB separated, not a mixture of tabs and spaces.
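As a quick sanity check before import, the name field can be tested against the two accepted formats. The function classify.name and its regular expressions are hypothetical, written from the description above rather than taken from the package's own validation code.

```r
# Classify BED name fields as an rsID, a custom variant, or invalid.
# The patterns are assumptions based on the two formats described above.
classify.name <- function(x) {
  is.rsid   <- grepl("^rs[0-9]+$", x)
  is.custom <- grepl("^[^:]+:[0-9]+:[ACGT]:[ACGT]$", x)
  ifelse(is.rsid, "rsid", ifelse(is.custom, "custom", "invalid"))
}

classify.name(c("rs10170896", "chr2:12594018:G:A", "chr2 12594018 G A"))
# [1] "rsid"    "custom"  "invalid"
```

Any entry reported as "invalid" is worth fixing by hand before calling snps.from.file.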

More to the point, here are the two methods for reading in the variants.

3.1.1 SNPs from rsID:

We use SNPlocs.Hsapiens.dbSNP142.GRCh37, which holds the SNP locations and alleles defined in dbSNP142, as the source for looking up our rsIDs, and BSgenome.Hsapiens.UCSC.hg19, which holds the reference sequence for UCSC genome build hg19. Additional SNPlocs packages are available from Bioconductor.

library(SNPlocs.Hsapiens.dbSNP142.GRCh37) # dbSNP142 in hg19
library(BSgenome.Hsapiens.UCSC.hg19)     # hg19 genome
head(pca.snps)
## [1] "rs1551515"  "rs1551513"  "rs17762938" "rs4473999"  "rs7823297"  "rs9656964"
snps.mb <- snps.from.rsid(rsid = pca.snps,
                          dbSNP = SNPlocs.Hsapiens.dbSNP142.GRCh37,
                          search.genome = BSgenome.Hsapiens.UCSC.hg19)
## Warning in rowids2rowidx(user_rowids, ids, x_rowids, ifnotfound): SNP ids not found: rs78914317, rs75425437, rs114099824, rs79509278, rs74738513
##   
##   They were dropped.
## Warning in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, : EOF within quoted
## string

## Warning in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, : EOF within quoted
## string

## Warning in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, : EOF within quoted
## string

## Warning in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, : EOF within quoted
## string
snps.mb
## GRanges object with 700 ranges and 4 metadata columns:
##              seqnames    ranges strand |      SNP_id alleles_as_ambig            REF            ALT
##                 <Rle> <IRanges>  <Rle> | <character>   <DNAStringSet> <DNAStringSet> <DNAStringSet>
##   rs10007915     chr4 106065308      * |  rs10007915                S              C              G
##   rs10015716     chr4  95548550      * |  rs10015716                R              G              A
##   rs10034824     chr4  95524838      * |  rs10034824                K              G              T
##   rs10056823     chr5 115609454      * |  rs10056823                S              C              G
##    rs1006140    chr19  38778915      * |   rs1006140                R              A              G
##          ...      ...       ...    ... .         ...              ...            ...            ...
##    rs9901746    chr17  36103149      * |   rs9901746                R              G              A
##    rs9908087    chr17  69106937      * |   rs9908087                K              T              G
##     rs991429    chr17  69109773      * |    rs991429                R              G              A
##    rs9973650     chr2 238380266      * |   rs9973650                R              G              A
##     rs998071     chr4  95591976      * |    rs998071                S              C              G
##   -------
##   seqinfo: 93 sequences (1 circular) from hg19 genome
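The alleles_as_ambig column in the output above holds the IUPAC ambiguity code for each REF/ALT pair. The lookup below is a small base R sketch of that mapping for two-allele variants; ambiguity.code is a hypothetical helper, not a function of the package.

```r
# IUPAC codes for two-nucleotide combinations (standard nomenclature).
iupac.pairs <- c(AG = "R", CT = "Y", CG = "S", AT = "W", GT = "K", AC = "M")

# Look up the ambiguity code for each REF/ALT pair, ignoring allele order.
ambiguity.code <- function(ref, alt) {
  key <- mapply(function(r, a) paste(sort(c(r, a)), collapse = ""), ref, alt)
  unname(iupac.pairs[key])
}

# First three variants from the output above: C/G, G/A, G/T.
ambiguity.code(ref = c("C", "G", "G"), alt = c("G", "A", "T"))
# [1] "S" "R" "K"
```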

A far greater variety of variants may be read into motifbreakR via the snps.from.file function. In fact, motifbreakR will work with any organism available as a Bioconductor BSgenome package. The listing below shows 91 genome packages (including masked builds) covering 26 species.

library(BSgenome)
genomes <- available.genomes()
length(genomes)
## [1] 91
genomes
##  [1] "BSgenome.Alyrata.JGI.v1"                   "BSgenome.Amellifera.BeeBase.assembly4"    
##  [3] "BSgenome.Amellifera.UCSC.apiMel2"          "BSgenome.Amellifera.UCSC.apiMel2.masked"  
##  [5] "BSgenome.Aofficinalis.NCBI.V1"             "BSgenome.Athaliana.TAIR.04232008"         
##  [7] "BSgenome.Athaliana.TAIR.TAIR9"             "BSgenome.Btaurus.UCSC.bosTau3"            
##  [9] "BSgenome.Btaurus.UCSC.bosTau3.masked"      "BSgenome.Btaurus.UCSC.bosTau4"            
## [11] "BSgenome.Btaurus.UCSC.bosTau4.masked"      "BSgenome.Btaurus.UCSC.bosTau6"            
## [13] "BSgenome.Btaurus.UCSC.bosTau6.masked"      "BSgenome.Btaurus.UCSC.bosTau8"            
## [15] "BSgenome.Carietinum.NCBI.v1"               "BSgenome.Celegans.UCSC.ce10"              
## [17] "BSgenome.Celegans.UCSC.ce11"               "BSgenome.Celegans.UCSC.ce2"               
## [19] "BSgenome.Celegans.UCSC.ce6"                "BSgenome.Cfamiliaris.UCSC.canFam2"        
## [21] "BSgenome.Cfamiliaris.UCSC.canFam2.masked"  "BSgenome.Cfamiliaris.UCSC.canFam3"        
## [23] "BSgenome.Cfamiliaris.UCSC.canFam3.masked"  "BSgenome.Dmelanogaster.UCSC.dm2"          
## [25] "BSgenome.Dmelanogaster.UCSC.dm2.masked"    "BSgenome.Dmelanogaster.UCSC.dm3"          
## [27] "BSgenome.Dmelanogaster.UCSC.dm3.masked"    "BSgenome.Dmelanogaster.UCSC.dm6"          
## [29] "BSgenome.Drerio.UCSC.danRer10"             "BSgenome.Drerio.UCSC.danRer5"             
## [31] "BSgenome.Drerio.UCSC.danRer5.masked"       "BSgenome.Drerio.UCSC.danRer6"             
## [33] "BSgenome.Drerio.UCSC.danRer6.masked"       "BSgenome.Drerio.UCSC.danRer7"             
## [35] "BSgenome.Drerio.UCSC.danRer7.masked"       "BSgenome.Ecoli.NCBI.20080805"             
## [37] "BSgenome.Gaculeatus.UCSC.gasAcu1"          "BSgenome.Gaculeatus.UCSC.gasAcu1.masked"  
## [39] "BSgenome.Ggallus.UCSC.galGal3"             "BSgenome.Ggallus.UCSC.galGal3.masked"     
## [41] "BSgenome.Ggallus.UCSC.galGal4"             "BSgenome.Ggallus.UCSC.galGal4.masked"     
## [43] "BSgenome.Ggallus.UCSC.galGal5"             "BSgenome.Hsapiens.1000genomes.hs37d5"     
## [45] "BSgenome.Hsapiens.NCBI.GRCh38"             "BSgenome.Hsapiens.UCSC.hg17"              
## [47] "BSgenome.Hsapiens.UCSC.hg17.masked"        "BSgenome.Hsapiens.UCSC.hg18"              
## [49] "BSgenome.Hsapiens.UCSC.hg18.masked"        "BSgenome.Hsapiens.UCSC.hg19"              
## [51] "BSgenome.Hsapiens.UCSC.hg19.masked"        "BSgenome.Hsapiens.UCSC.hg38"              
## [53] "BSgenome.Hsapiens.UCSC.hg38.masked"        "BSgenome.Mfascicularis.NCBI.5.0"          
## [55] "BSgenome.Mfuro.UCSC.musFur1"               "BSgenome.Mmulatta.UCSC.rheMac2"           
## [57] "BSgenome.Mmulatta.UCSC.rheMac2.masked"     "BSgenome.Mmulatta.UCSC.rheMac3"           
## [59] "BSgenome.Mmulatta.UCSC.rheMac3.masked"     "BSgenome.Mmulatta.UCSC.rheMac8"           
## [61] "BSgenome.Mmusculus.UCSC.mm10"              "BSgenome.Mmusculus.UCSC.mm10.masked"      
## [63] "BSgenome.Mmusculus.UCSC.mm8"               "BSgenome.Mmusculus.UCSC.mm8.masked"       
## [65] "BSgenome.Mmusculus.UCSC.mm9"               "BSgenome.Mmusculus.UCSC.mm9.masked"       
## [67] "BSgenome.Osativa.MSU.MSU7"                 "BSgenome.Ptroglodytes.UCSC.panTro2"       
## [69] "BSgenome.Ptroglodytes.UCSC.panTro2.masked" "BSgenome.Ptroglodytes.UCSC.panTro3"       
## [71] "BSgenome.Ptroglodytes.UCSC.panTro3.masked" "BSgenome.Ptroglodytes.UCSC.panTro5"       
## [73] "BSgenome.Ptroglodytes.UCSC.panTro6"        "BSgenome.Rnorvegicus.UCSC.rn4"            
## [75] "BSgenome.Rnorvegicus.UCSC.rn4.masked"      "BSgenome.Rnorvegicus.UCSC.rn5"            
## [77] "BSgenome.Rnorvegicus.UCSC.rn5.masked"      "BSgenome.Rnorvegicus.UCSC.rn6"            
## [79] "BSgenome.Scerevisiae.UCSC.sacCer1"         "BSgenome.Scerevisiae.UCSC.sacCer2"        
## [81] "BSgenome.Scerevisiae.UCSC.sacCer3"         "BSgenome.Sscrofa.UCSC.susScr11"           
## [83] "BSgenome.Sscrofa.UCSC.susScr3"             "BSgenome.Sscrofa.UCSC.susScr3.masked"     
## [85] "BSgenome.Tgondii.ToxoDB.7.0"               "BSgenome.Tguttata.UCSC.taeGut1"           
## [87] "BSgenome.Tguttata.UCSC.taeGut1.masked"     "BSgenome.Tguttata.UCSC.taeGut2"           
## [89] "BSgenome.Vvinifera.URGI.IGGP12Xv0"         "BSgenome.Vvinifera.URGI.IGGP12Xv2"        
## [91] "BSgenome.Vvinifera.URGI.IGGP8X"
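To see how many distinct organisms such a listing covers, the organism abbreviation can be pulled out of each package name. The sketch below uses a handful of names copied from the listing so that it runs without BSgenome installed; with BSgenome loaded, the same two lines could be applied to the result of available.genomes().

```r
# A few package names copied from the listing above.
genomes <- c("BSgenome.Hsapiens.UCSC.hg19",
             "BSgenome.Hsapiens.UCSC.hg19.masked",
             "BSgenome.Hsapiens.UCSC.hg38",
             "BSgenome.Drerio.UCSC.danRer7",
             "BSgenome.Mmusculus.UCSC.mm10")

# The second dot-separated field is the abbreviated organism name.
organism <- vapply(strsplit(genomes, ".", fixed = TRUE), `[`, character(1), 2)

table(organism)            # packages per organism
length(unique(organism))   # number of distinct organisms
```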

3.1.2 SNPs from BED formatted file:

Here we examine two possibilities. In the first, we have a mixture of rsIDs and our naming scheme that allows for arbitrary variants. In the second, we have a list of variants for the zebrafish Danio rerio, which does not have a SNPlocs package but does have its genome present among the available.genomes().

snps.bed.file <- system.file("extdata", "snps.bed", package = "motifbreakR")
# see the contents
read.table(snps.bed.file, header = FALSE)
##     V1        V2        V3                V4 V5 V6
## 1 chr2  12581137  12581138        rs10170896  0  +
## 2 chr2  12594017  12594018 chr2:12594018:G:A  0  +
## 3 chr3 192388677 192388678        rs13068005  0  +
## 4 chr4 122361479 122361480        rs12644995  0  +
## 5 chr6  44503245  44503246 chr6:44503246:A:T  0  +
## 6 chr6  44503247  44503248 chr6:44503248:G:C  0  +
## 7 chr6  85232897  85232898         rs4510639  0  +
## 8 chr6  44501872  44501873          rs932680  0  +

Since some of our SNPs are listed by their rsIDs, we can look those up by passing a SNPlocs object as the dbSNP argument to snps.from.file:

library(SNPlocs.Hsapiens.dbSNP142.GRCh37)
# import the BED file
snps.mb.frombed <- snps.from.file(file = snps.bed.file,
                                  dbSNP = SNPlocs.Hsapiens.dbSNP142.GRCh37,
                                  search.genome = BSgenome.Hsapiens.UCSC.hg19,
                                  format = "bed")
snps.mb.frombed
## Warning message:
## In snps.from.file(file = snps.bed.file, dbSNP = SNPlocs.Hsapiens.dbSNP142.GRCh37:
##   7601289 was found as a match for chr2:12594018:G:A; using entry from dbSNP
## GRanges object with 8 ranges and 4 metadata columns:
##                     seqnames    ranges strand |            SNP_id alleles_as_ambig            REF
##                        <Rle> <IRanges>  <Rle> |       <character>   <DNAStringSet> <DNAStringSet>
##          rs10170896     chr2  12581138      + |        rs10170896                R              G
##          rs12644995     chr4 122361480      + |        rs12644995                M              C
##          rs13068005     chr3 192388678      + |        rs13068005                R              G
##           rs4510639     chr6  85232898      + |         rs4510639                Y              C
##            rs932680     chr6  44501873      + |          rs932680                K              G
##             7601289     chr2  12594018      + |           7601289                R              G
##   chr6:44503246:A:T     chr6  44503246      + | chr6:44503246:A:T                W              A
##   chr6:44503248:G:C     chr6  44503248      + | chr6:44503248:G:C                S              G
##                                ALT
##                     <DNAStringSet>
##          rs10170896              A
##          rs12644995              A
##          rs13068005              A
##           rs4510639              T
##            rs932680              T
##             7601289              A
##   chr6:44503246:A:T              T
##   chr6:44503248:G:C              C
##   -------
##   seqinfo: 93 sequences (1 circular) from hg19 genome

We also see that one of our custom variants, chr2:12594018:G:A, was already included in dbSNP and was therefore annotated in the output with its dbSNP entry (rs7601289, shown as 7601289).

If our BED file includes no rsIDs, then we may omit the dbSNP argument from the function. This example uses variants from Danio rerio:

library(BSgenome.Drerio.UCSC.danRer7)
snps.bedfile.nors <- system.file("extdata", "danRer.bed", package = "motifbreakR")
read.table(snps.bedfile.nors, header = FALSE)
##       V1       V2       V3                 V4 V5 V6
## 1  chr18 13030932 13030933 chr18:13030933:G:A  0  +
## 2  chr18 30445455 30445456 chr18:30445456:T:A  0  +
## 3   chr5 22065023 22065024  chr5:22065024:A:T  0  +
## 4  chr14 36140941 36140942 chr14:36140942:T:A  0  +
## 5   chr3 16701576 16701577  chr3:16701577:T:A  0  +
## 6  chr14 20887995 20887996 chr14:20887996:G:A  0  +
## 7   chr7 25195449 25195450  chr7:25195450:G:T  0  +
## 8   chr2 59181852 59181853  chr2:59181853:A:G  0  +
## 9   chr3 58162674 58162675  chr3:58162675:C:T  0  +
## 10 chr22 18708824 18708825 chr22:18708825:T:A  0  +
snps.mb.frombed <- snps.from.file(file = snps.bedfile.nors,
                                  search.genome = BSgenome.Drerio.UCSC.danRer7,
                                  format = "bed")
## Warning in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, : EOF within quoted
## string

## Warning in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, : EOF within quoted
## string
snps.mb.frombed
## GRanges object with 10 ranges and 4 metadata columns:
##                      seqnames    ranges strand |             SNP_id alleles_as_ambig            REF
##                         <Rle> <IRanges>  <Rle> |        <character>   <DNAStringSet> <DNAStringSet>
##   chr18:13030933:G:A    chr18  13030933      + | chr18:13030933:G:A                R              G
##   chr18:30445456:T:A    chr18  30445456      + | chr18:30445456:T:A                W              T
##    chr5:22065024:A:T     chr5  22065024      + |  chr5:22065024:A:T                W              A
##   chr14:36140942:T:A    chr14  36140942      + | chr14:36140942:T:A                W              T
##    chr3:16701577:T:A     chr3  16701577      + |  chr3:16701577:T:A                W              T
##   chr14:20887996:G:A    chr14  20887996      + | chr14:20887996:G:A                R              G
##    chr7:25195450:G:T     chr7  25195450      + |  chr7:25195450:G:T                K              G
##    chr2:59181853:A:G     chr2  59181853      + |  chr2:59181853:A:G                R              A
##    chr3:58162675:C:T     chr3  58162675      + |  chr3:58162675:C:T                Y              C
##   chr22:18708825:T:A    chr22  18708825      + | chr22:18708825:T:A                W              T
##                                 ALT
##                      <DNAStringSet>
##   chr18:13030933:G:A              A
##   chr18:30445456:T:A              A
##    chr5:22065024:A:T              T
##   chr14:36140942:T:A              A
##    chr3:16701577:T:A              A
##   chr14:20887996:G:A              A
##    chr7:25195450:G:T              T
##    chr2:59181853:A:G              G
##    chr3:58162675:C:T              T
##   chr22:18708825:T:A              A
##   -------
##   seqinfo: 26 sequences (1 circular) from danRer7 genome
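A BED file in the layout used above can be produced from a table of positions and alleles. The sketch below is one way to do this in base R; the variants data frame and its column names are hypothetical, and the two rows reuse coordinates from the Danio rerio example.

```r
# Hypothetical variant table; pos is the 1-based position of the SNV.
# Integers are used so coordinates are never written in scientific notation.
variants <- data.frame(chrom = c("chr18", "chr5"),
                       pos   = c(13030933L, 22065024L),
                       ref   = c("G", "A"),
                       alt   = c("A", "T"),
                       stringsAsFactors = FALSE)

# BED start is 0-based, so start = pos - 1 and end = pos.
bed <- data.frame(chrom  = variants$chrom,
                  start  = variants$pos - 1L,
                  end    = variants$pos,
                  name   = paste(variants$chrom, variants$pos,
                                 variants$ref, variants$alt, sep = ":"),
                  score  = 0L,
                  strand = "+",
                  stringsAsFactors = FALSE)

# Write TAB-separated, with no header, row names, or quoting.
bed.file <- tempfile(fileext = ".bed")
write.table(bed, bed.file, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
readLines(bed.file)
```

The resulting path can then be passed as the file argument of snps.from.file with format = "bed".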

snps.from.file can also take a VCF file of SNVs as input, by using format = "vcf".

3.2 Step 2 | Find Broken Motifs

Now that we have our data in the required format, we can determine which variants modify potential transcription factor binding. An important element of this task is identifying the set of transcription factor binding motifs that we wish to query. Fortunately, MotifDb includes a large selection of motifs across multiple species, as seen here:

library(MotifDb)
MotifDb
## MotifDb object of length 9933
## | Created from downloaded public sources: 2013-Aug-30
## | 9933 position frequency matrices from 14 sources:
## |    FlyFactorSurvey:  614
## |        HOCOMOCOv10: 1066
## |              HOMER:  332
## |        JASPAR_2014:  592
## |        JASPAR_CORE:  459
## |             ScerTF:  196
## |       SwissRegulon:  684
## |           UniPROBE:  380
## |         cisbp_1.02:  874
## |               hPDI:  437
## |         jaspar2016: 1209
## |         jaspar2018: 1564
## |          jolma2013:  843
## |            stamlab:  683
## | 61 organism/s
## |           Hsapiens: 4616
## |          Mmusculus: 1411
## |      Dmelanogaster: 1287
## |        Scerevisiae: 1051
## |          Athaliana:  803
## |           Celegans:   90
## |              other:  675
## Scerevisiae-ScerTF-ABF2-badis 
## Scerevisiae-ScerTF-CAT8-badis 
## Scerevisiae-ScerTF-CST6-badis 
## Scerevisiae-ScerTF-ECM23-badis 
## Scerevisiae-ScerTF-EDS1-badis 
## ...
## Mmusculus-UniPROBE-Zfp740.UP00022 
## Mmusculus-UniPROBE-Zic1.UP00102 
## Mmusculus-UniPROBE-Zic2.UP00057 
## Mmusculus-UniPROBE-Zic3.UP00006 
## Mmusculus-UniPROBE-Zscan4.UP00026
### Here we can see which organisms are available under which sources
### in MotifDb
table(mcols(MotifDb)$organism, mcols(MotifDb)$dataSource)
FlyFactorSurvey HOCOMOCOv10 HOMER JASPAR_2014 JASPAR_CORE ScerTF SwissRegulon UniPROBE cisbp_1.02 hPDI jaspar2016 jaspar2018 jolma2013 stamlab
Acarolinensis 0 0 0 0 0 0 0 0 1 0 0 0 0 0
Amajus 0 0 0 3 3 0 0 0 0 0 3 3 0 0
Anidulans 0 0 0 0 0 0 0 0 8 0 0 0 0 0
Apisum 0 0 0 0 0 0 0 0 1 0 0 0 0 0
Aterreus 0 0 0 0 0 0 0 0 1 0 0 0 0 0
Athaliana 0 0 0 48 5 0 0 0 107 0 191 452 0 0
Bdistachyon 0 0 0 0 0 0 0 0 2 0 0 0 0 0
Celegans 0 0 0 15 5 0 0 2 22 0 23 23 0 0
Cparvum 0 0 0 0 0 0 0 1 1 0 0 0 0 0
Csativa 0 0 0 0 0 0 0 0 2 0 0 0 0 0
Ddiscoideum 0 0 0 0 0 0 0 0 9 0 0 0 0 0
Dmelanogaster 614 0 0 131 125 0 0 0 138 0 139 140 0 0
Drerio 0 0 0 0 0 0 0 0 3 0 0 0 0 0
Gaculeatus 0 0 0 0 0 0 0 0 1 0 0 0 0 0
Gallus 0 0 0 1 2 0 0 0 0 0 0 0 0 0
Ggallus 0 0 0 0 0 0 0 0 0 0 4 4 0 0
Hcapsulatum 0 0 0 0 0 0 0 0 1 0 0 0 0 0
Hroretzi 0 0 0 1 1 0 0 0 0 0 1 1 0 0
Hsapiens 0 640 0 117 66 0 684 2 313 437 442 522 710 683
Hvulgare 0 0 0 1 1 0 0 0 0 0 1 1 0 0
Mdomestica 0 0 0 0 0 0 0 0 1 0 0 0 0 0
Mgallopavo 0 0 0 0 0 0 0 0 1 0 0 0 0 0
Mmurinus 0 0 0 0 0 0 0 0 1 0 0 0 0 0
Mmusculus 0 426 0 66 47 0 0 282 132 0 165 160 133 0
Mmusculus;Hsapiens 0 0 0 0 0 0 0 0 0 0 0 3 0 0
Mmusculus;Rnorvegicus 0 0 0 0 0 0 0 0 0 0 0 2 0 0
Mmusculus;Rnorvegicus;Hsapiens 0 0 0 0 0 0 0 0 0 0 0 6 0 0
Mmusculus;Rnorvegicus;Hsapiens;Ocuniculus 0 0 0 0 0 0 0 0 0 0 0 1 0 0
Mmusculus;Rnorvegicus;Omykiss;Ggallus;Hsapiens 0 0 0 0 0 0 0 0 0 0 0 1 0 0
Mmusculus;Rnorvegicus;Xlaevis;Stropicalis;Ggallus;Hsapiens;Btaurus;Ocuniculus 0 0 0 0 0 0 0 0 0 0 0 2 0 0
Mmusculus;Rrattus;Hsapiens;Ocuniculus 0 0 0 0 0 0 0 0 0 0 0 1 0 0
Mtruncatula 0 0 0 0 0 0 0 0 1 0 0 0 0 0
NA 0 0 0 0 0 0 0 0 0 0 0 40 0 0
Ncrassa 0 0 0 0 0 0 0 0 15 0 0 0 0 0
Ngruberi 0 0 0 0 0 0 0 0 1 0 0 0 0 0
Nhaematococca 0 0 0 0 0 0 0 0 1 0 0 0 0 0
Nsp. 0 0 0 0 0 0 0 0 0 0 1 1 0 0
Nsylvestris 0 0 0 1 1 0 0 0 0 0 0 0 0 0
Nvectensis 0 0 0 0 0 0 0 0 1 0 0 0 0 0
Ocuniculus 0 0 0 1 1 0 0 0 0 0 1 1 0 0
Osativa 0 0 0 0 0 0 0 0 5 0 0 0 0 0
Otauri 0 0 0 0 0 0 0 0 2 0 0 0 0 0
Pcapensis 0 0 0 0 0 0 0 0 1 0 0 0 0 0
Pfalciparum 0 0 0 0 0 0 0 2 26 0 0 0 0 0
Phybrida 0 0 0 1 1 0 0 0 0 0 1 1 0 0
Ppatens 0 0 0 0 0 0 0 0 7 0 0 0 0 0
Ppygmaeus 0 0 0 0 0 0 0 0 1 0 0 0 0 0
Psativum 0 0 0 3 3 0 0 0 1 0