Summix2

Summix2 is a suite of methods that detect and leverage substructure in genetic summary data. This package builds on Summix, a method developed by the Hendricks Research Team at the University of Colorado Denver that estimates and adjusts for substructure in genetic summary data.

Find more details about Summix in our manuscript published in the American Journal of Human Genetics.

For individual function specifics in Summix2:

summix (fast forward to example)

adjAF (fast forward to example)

summix_local (fast forward to example)

Package Installation

To install Summix2, make sure you are running the devel version of R (version 4.4). Start R and run the following commands:

if(!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

# Switch BiocManager to the Bioconductor development version, which provides Summix2
BiocManager::install(version = "devel")

BiocManager::install("Summix")




summix

The summix() function estimates mixture proportions of reference groups within genetic summary (allele frequency) data using sequential quadratic programming performed with the slsqp() function in the nloptr package.
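
To make the estimation step concrete, here is a minimal sketch and not the package's internal code. It assumes a least-squares objective between the observed allele frequencies and the mixture of reference allele frequencies, with proportions bounded in [0, 1] and constrained to sum to 1. The data are simulated, and the names ref_A, ref_B, loss, and grad are hypothetical; slsqp() is the nloptr function named above.

```r
library(nloptr)

set.seed(1)
n_snps <- 500

# Simulated reference allele frequencies for two hypothetical groups
ref <- cbind(ref_A = runif(n_snps, 0.05, 0.95),
             ref_B = runif(n_snps, 0.05, 0.95))

# Observed allele frequencies built as an exact 80/20 mixture
true_pi <- c(0.8, 0.2)
obs <- as.vector(ref %*% true_pi)

# Assumed objective: squared difference between observed and mixed AFs
loss <- function(p) sum((obs - ref %*% p)^2)
grad <- function(p) as.vector(-2 * t(ref) %*% (obs - ref %*% p))

# Sequential quadratic programming with bounds and a sum-to-one constraint
fit <- slsqp(x0 = c(0.5, 0.5), fn = loss, gr = grad,
             lower = c(0, 0), upper = c(1, 1),
             heq = function(p) sum(p) - 1)

round(fit$par, 3)
```

Because the simulated observed frequencies are an exact mixture, the recovered proportions should land very close to true_pi; with real data the fit is approximate, which is what the goodness.of.fit value in the output reports on.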

summix() Input

Mandatory parameters are:

Optional parameters are:

summix() Output

A data frame with the following columns:




adjAF

The adjAF() function adjusts allele frequencies to match reference group substructure mixture proportions in a given target group or individual.
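
The adjustment can be illustrated with a short sketch. This is an assumption about the arithmetic, not the package source: for each reference group in turn, the group's allele frequency is solved for from the observed frequency and the other references, the target mixture is formed, and the resulting estimates are averaged. The helper adjust_af_sketch() is hypothetical. The input values are copied from the first five rows printed in the adjAF() demo in this README (pi.observed = c(.85, .15), pi.target = c(1, 0)).

```r
# Hypothetical helper: leave-one-out adjustment averaged over reference groups
adjust_af_sketch <- function(obs, ref, pi_obs, pi_target) {
  k <- ncol(ref)
  est <- sapply(seq_len(k), function(l) {
    others <- setdiff(seq_len(k), l)
    ref_others <- ref[, others, drop = FALSE]
    # Solve for group l from the observed AF, then re-mix at the target proportions
    as.vector((pi_target[l] / pi_obs[l]) *
                (obs - ref_others %*% pi_obs[others]) +
                ref_others %*% pi_target[others])
  })
  rowMeans(est)  # average the k leave-one-out estimates
}

# Values taken from the adjAF() demo output (first five SNPs)
ref <- cbind(
  afr = c(0.040925268, 0.217971527, 0.181117576, 0.007117446, 0.064056997),
  eur = c(0, 0, 0.001149425, 0, 0)
)
obs <- c(0.04171490, 0.18774500, 0.15198300, 0.00422064, 0.05445710)

adj <- adjust_af_sketch(obs, ref, pi_obs = c(0.85, 0.15), pi_target = c(1, 0))
round(adj, 6)
```

For these five SNPs the sketch agrees with the adjustedAF column shown in the demo to about six decimal places. It ignores the sample-size arguments (N_reference, N_observed) and the filtering of out-of-range values that adjAF() applies.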

adjAF() Input

Mandatory parameters are:

Optional parameters are:

adjAF() Output

A data frame with the following columns:




summix_local

The summix_local() function estimates local ancestry mixture proportions in genetic summary data using the same slsqp() functionality as summix(). summix_local() can also perform an optional selection scan that identifies regions of selection along the given chromosome.
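
The idea of block-wise estimation can be sketched as follows. This is a simplified illustration under stated assumptions, not the windows or fastcatch algorithms: it uses fixed, non-overlapping blocks of 150 variants, a least-squares objective, and simulated data in which the true proportions change halfway along the chromosome. The helper estimate_pi() and all variable names are hypothetical.

```r
library(nloptr)

# Hypothetical helper: least-squares mixture estimate for one block of variants
estimate_pi <- function(obs, ref) {
  k <- ncol(ref)
  fit <- slsqp(x0 = rep(1 / k, k),
               fn = function(p) sum((obs - ref %*% p)^2),
               lower = rep(0, k), upper = rep(1, k),
               heq = function(p) sum(p) - 1)
  fit$par
}

set.seed(2)
n <- 600
pos <- sort(sample.int(1e6, n))  # simulated base-pair positions
ref <- cbind(ref_A = runif(n, 0.05, 0.95),
             ref_B = runif(n, 0.05, 0.95))

# True proportion of ref_A: 0.8 in the first half, 0.6 in the second half
pi_A <- ifelse(seq_len(n) <= n / 2, 0.8, 0.6)
obs <- pi_A * ref[, 1] + (1 - pi_A) * ref[, 2]

# Assign each variant to a fixed-size block and estimate proportions per block
window_size <- 150
block <- ceiling(seq_len(n) / window_size)
local_est <- do.call(rbind, lapply(split(seq_len(n), block), function(idx) {
  p <- estimate_pi(obs[idx], ref[idx, , drop = FALSE])
  data.frame(Start_Pos = pos[idx[1]], End_Pos = pos[idx[length(idx)]],
             ref_A = p[1], ref_B = p[2], nSNPs = length(idx))
}))

print(local_est)
```

Each row of local_est describes one block, loosely mirroring the Start_Pos, End_Pos, and nSNPs columns of the summix_local() output. Fixed blocks are the simplest choice; an adaptive scheme would instead vary block boundaries in response to changes in the estimated proportions.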

summix_local() Input

Mandatory parameters are:

Optional parameters are:

Conditional parameters are:

If algorithm = "windows":

If algorithm = "fastcatch":

If type = "variants":

If type = "bp":

If algorithm = "fastcatch" and type = "variants":

If algorithm = "fastcatch" and type = "bp":

If selection_scan = TRUE:

summix_local() Output

A data frame with a row for each local ancestry block and the following columns:

Additional Output if selection_scan = TRUE:

Examples using toy data in the Summix package

For quick runs of all demos, we suggest using ancestryData, the dataset bundled with the Summix package.

A quick demo of summix()

The commands:

library(Summix)

# load the data
data("ancestryData")

# Estimate 5 reference ancestry proportion values for the gnomAD African/African American group
# using a starting guess of .2 for each ancestry proportion.
summix(data = ancestryData,
    reference=c("reference_AF_afr",
        "reference_AF_eas",
        "reference_AF_eur",
        "reference_AF_iam",
        "reference_AF_sas"),
    observed="gnomad_AF_afr",
    pi.start = c(.2, .2, .2, .2, .2),
    goodness.of.fit=TRUE)
#>   goodness.of.fit iterations           time filtered reference_AF_afr
#> 1       0.4853597         20 0.6533918 secs        0         0.812142
#>   reference_AF_eur reference_AF_iam
#> 1         0.169953         0.017905





A quick demo of adjAF()

The commands:

library(Summix)

# load the data
data("ancestryData")


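# Adjust the gnomAD African/African American allele frequencies from the
# observed mixture (85% AFR, 15% EUR) to a target of 100% AFR.
adjusted_data <- adjAF(data = ancestryData,
     reference = c("reference_AF_afr", "reference_AF_eur"),
     observed = "gnomad_AF_afr",
     pi.target = c(1, 0),
     pi.observed = c(.85, .15),
     adj_method = 'average',
     N_reference = c(704,741),
     N_observed = 20744,
     filter = TRUE)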
#> [1] "Average fold change between observed and target group proportions is: 0.58"
#> 
#> 
#> [1] "Note: In this AF adjustment, 0 SNPs (with adjusted AF > -.005 & < 0) were rounded to 0. 0 SNPs (with adjusted AF > 1) were rounded to 1, and 0 SNPs (with adjusted AF <= -.005) were removed from the final results."
#> 
#> [1] $pi
#>          ref.group pi.observed pi.target
#> 1 reference_AF_afr        0.85         1
#> 2 reference_AF_eur        0.15         0
#> 
#> [1] $observed.data
#> [1] "observed AF data to update: 'gnomad_AF_afr'"
#> 
#> [1] $Nsnps
#> [1] 1000
#> 
#> 
#> [1] $effective.sample.size
#> [1] 18336
#> 
#> 
#> [1] "use $adjusted.AF$adjustedAF to see adjusted AF data"
#> 
#> 
#> [1] "Note: The accuracy of the AF adjustment is likely lower for rare variants (< .5%)."
print(adjusted_data$adjusted.AF[1:5,])
#>        POS REF ALT CHROM reference_AF_afr reference_AF_eas reference_AF_eur
#> 1 31652001   T   A chr22      0.040925268                0      0.000000000
#> 2 34509945   C   G chr22      0.217971527                0      0.000000000
#> 3 34636589 CAA   C chr22      0.181117576                0      0.001149425
#> 4 38889885   A AAG chr22      0.007117446                0      0.000000000
#> 5 49160931   G   T chr22      0.064056997                0      0.000000000
#>   reference_AF_iam reference_AF_sas gnomad_AF_afr  adjustedAF
#> 1                0                0    0.04171490 0.045000811
#> 2                0                0    0.18774500 0.219423999
#> 3                0                0    0.15198300 0.179859133
#> 4                0                0    0.00422064 0.006041453
#> 5                0                0    0.05445710 0.064062087





A quick demo of summix_local()

The commands:

library(Summix)

# load the data
data("ancestryData")

# Estimate local ancestry proportions for the gnomAD African/African American
# group with the fastcatch algorithm, with window sizes defined in variants.
results <- summix_local(data = ancestryData,
                        reference = c("reference_AF_afr", 
                                      "reference_AF_eas", 
                                      "reference_AF_eur", 
                                      "reference_AF_iam", 
                                      "reference_AF_sas"),
                        NSimRef = c(704,787,741,47,545),
                        observed="gnomad_AF_afr",
                        goodness.of.fit = TRUE,
                        type = "variants",
                        algorithm = "fastcatch",
                        minVariants = 150,
                        maxVariants = 250,
                        maxStepSize = 1000,
                        diffThreshold = .02,
                        override_fit = FALSE,
                        override_removeSmallAnc = TRUE,
                        selection_scan = FALSE,
                        position_col = "POS")
#> [1] "Done getting LA proportions"

print(results$results)
#>   Start_Pos  End_Pos goodness.of.fit iterations      time filtered
#> 1  10595784 19258643       1.2555376         10 0.1949110        0
#> 2  19258643 25252606       0.5018649         13 0.1896176        0
#> 3  25252606 30743600       0.2304807         11 0.2167401        0
#> 4  30743600 35846592       0.2933341         14 0.1748781        0
#> 5  35846592 42706228       0.5480859         14 0.2054605        0
#> 6  42706228 47902876       0.2634092         11 0.1897080        0
#> 7  47902876 50791970       0.2891929         10 0.1914284        0
#>   reference_AF_afr reference_AF_eas reference_AF_eur reference_AF_iam
#> 1         0.809208         0.000000         0.146185         0.034417
#> 2         0.816933         0.000000         0.161511         0.021556
#> 3         0.805795         0.002730         0.160926         0.000000
#> 4         0.820812         0.002558         0.161353         0.015276
#> 5         0.806428         0.016357         0.157855         0.019360
#> 6         0.810130         0.004046         0.181798         0.004025
#> 7         0.811265         0.000000         0.148492         0.019896
#>   reference_AF_sas nSNPs
#> 1         0.010189   150
#> 2         0.000000   149
#> 3         0.030550   149
#> 4         0.000000   149
#> 5         0.000000   149
#> 6         0.000000   149
#> 7         0.020347   104