Changes in version 3.16.0:
o estimateDisp() now respects weights in calculating the APLs.
o Added design matrix to the output of estimateDisp().
o glmFit() now constructs the design matrix from y$samples$group if
design=NULL.
o New argument 'null' in glmTreat(), and a change in how p-values are
calculated by default.
o Modified the default 'main' in plotMD().
o Created a new S3 class, compressedMatrix, to store offsets and
weights efficiently.
o Added the makeCompressedMatrix() function to make a compressedMatrix
object.
o Switched storage of offsets in DGEGLM objects to use the
compressedMatrix class.
o Added the addPriorCount() function for adding prior counts.
o Modified spliceVariants() calculation of the average log-CPM.
o Migrated some internal calculations and checks to C++ for greater
efficiency.
Changes in version 3.14.0:
o estimateDisp(), estimateCommonDisp(), estimateTrendedDisp(),
estimateTagwiseDisp(), splitIntoGroups() and equalizeLibSizes() are
now S3 generic functions.
o The default methods of estimateGLMTrendedDisp() and
estimateGLMTagwiseDisp() now return only dispersion estimates
instead of a list.
o Add fry method for DGEList objects.
o Import R core packages explicitly.
o New function gini() to compute Gini coefficients.
o New argument poisson.bound for glmQLFTest(). If TRUE (default), the
p-value returned by glmQLFTest() will never be less than what would
be obtained for a likelihood ratio test with NB dispersion equal to
zero.
o New argument samples for DGEList(). It takes a data frame containing
information for each sample.
o glmFit() now protects against zero library sizes and infinite offset
values.
o glmQLFit.default() now avoids passing a NULL design to .residDF().
o cpm.default() now outputs a matrix of the same dimensions as the
input even when the input has zero rows or zero columns.
o DGEList() now gives a warning message when a zero lib.size is
detected.
o Bug fix to calcNormFactors(method="TMM") when two libraries have
identical counts but the lib.sizes have been set unequal.
o Add a CRISPR-Cas9 screen case study to the users' guide and rename
Nigerian case study to Yoruba.
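The gini() entry above refers to the Gini coefficient, a standard
measure of inequality. As a rough illustration only, the Python sketch
below computes it for one vector of non-negative values using the
sorted-values formula. It is not edgeR's R implementation, and the name
gini_coefficient is hypothetical.

```python
def gini_coefficient(values):
    """Gini coefficient of non-negative values (0 means perfectly even)."""
    x = sorted(values)
    n = len(x)
    total = sum(x)
    if n == 0 or total == 0:
        return 0.0  # convention chosen for this sketch, not taken from edgeR
    # G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n, i = 1..n over sorted x
    weighted = sum(i * xi for i, xi in enumerate(x, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


even = gini_coefficient([1, 1, 1, 1])    # all values equal
uneven = gini_coefficient([0, 0, 0, 4])  # one value holds everything
```

With n values, the most uneven case gives (n - 1) / n rather than
exactly 1, which is a property of this finite-sample formula.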
Changes in version 3.12.0:
o New argument tagwise for estimateDisp(), allowing users not to
estimate tagwise dispersions.
o estimateTrendedDisp() has more stable performance and does not
return negative trended dispersion estimates.
o New plotMD methods for DGEList, DGEGLM, DGEExact and DGELRT objects
to make a mean-difference plot (aka MA plot).
o readDGE() now recognizes HTSeq style meta genes.
o Remove the F-test in glmLRT().
o New argument contrast for diffSpliceDGE(), allowing users to specify
the testing contrast.
o glmTreat() returns both logFC and unshrunk.logFC in the output
table.
o New method implemented in glmTreat() to increase the power of the
test.
o New kegga methods for DGEExact and DGELRT objects to perform KEGG
pathway analysis of differentially expressed genes using Entrez
Gene IDs.
o New dimnames<- methods for DGEExact and DGELRT objects.
o Bug fix to dimnames<- method for DGEGLM objects.
o User's Guide updated. Three old case studies are replaced by two new
comprehensive case studies.
Changes in version 3.10.0:
o A DGEList method for romer() has been added, allowing access to
rotation gene set enrichment analysis.
o New function dropEmptyLevels() to remove unused levels from a
factor.
o New argument p.value for topTags(), allowing users to apply a
p-value or FDR cutoff for the results.
o New argument prior.count for aveLogCPM().
o New argument pch for the plotMDS method for DGEList objects. Old
argument col is now removed, but can be passed using .... Various
other improvements to the plotMDS method for DGEList objects,
including better labelling of the axes and protection against
degenerate dimensions.
o treatDGE() is renamed as glmTreat(). It can now optionally work with
either likelihood ratio tests or with quasi-likelihood F-tests.
o glmQLFit() is now an S3 generic function.
o glmQLFit() now breaks the output component s2.fit into three
separate components: df.prior, var.post and var.prior.
o estimateDisp() now protects against fitted values of zeros, giving
more accurate dispersion estimates.
o DGEList() now gives a message rather than an error when the count
matrix has non-unique column names.
o Minor corrections to User's Guide.
o requireNamespace() is now used internally instead of require() to
access functions in suggested packages.
Changes in version 3.8.0:
o New goana() methods for DGEExact and DGELRT objects to perform Gene
Ontology analysis of differentially expressed genes using Entrez
Gene IDs.
o New functions diffSpliceDGE(), topSpliceDGE() and plotSpliceDGE()
for detecting differential exon usage and displaying results.
o New function treatDGE() that tests for DE relative to a specified
log2-FC threshold.
o glmQLFTest() is split into three functions: glmQLFit() for fitting
quasi-likelihood GLMs, glmQLFTest() for performing quasi-likelihood
F-tests and plotQLDisp() for plotting quasi-likelihood dispersions.
o processHairpinReads() renamed to processAmplicons() and allows for
paired end data.
o glmFit() now stores unshrunk.coefficients from prior.count=0 as well
as shrunk coefficients.
o estimateDisp() now has a min.row.sum argument to protect against all
zero counts.
o APL calculations in estimateDisp() are hot-started using fitted
values from previous dispersions, to avoid discontinuous APL
landscapes.
o adjustedProfileLik() is modified to accept starting coefficients.
glmFit() now passes starting coefficients to mglmOneGroup().
o calcNormFactors() is now an S3 generic function.
o The SAGE datasets from Zhang et al (1997) are no longer included
with the edgeR package.
Changes in version 3.6.0:
o Improved treatment of fractional counts. Previously the classic
edgeR pipeline permitted fractional counts but the glm pipeline did
not. edgeR now permits fractional counts throughout.
o All glm-based functions in edgeR now accept quantitative
observation-level weights. The glm fitting functions mglmLS() and
mglmSimple() are retired, and all glm fitting is now done by either
mglmLevenberg() or mglmOneWay().
o New capabilities for robust estimation allowing for
observation-level outliers. In particular, the new function
estimateGLMRobustDisp() computes a robust dispersion estimate for
each gene.
o More careful calculation of residual df in the presence of exactly
zero fitted values for glmQLFTest() and estimateDisp(). The new
code allows for deflation of residual df for more complex
experimental designs.
o New function processHairpinReads() for analyzing data from shRNA-seq
screens.
o New function sumTechReps() to collapse counts over technical
replicate libraries.
o New functions nbinomDeviance() and nbinomUnitDeviance(). Old function
deviances.function() removed.
o New function validDGEList().
o rpkm() is now a generic function, and it now tries to find the gene
lengths automatically if available from the annotation information
in a DGEList object.
o Subsetting a DGEList object now has the option of resetting the
library sizes to the new column sums. Internally, the subsetting
code for DGEList, DGEExact, DGEGLM, DGELRT and TopTags data objects
has been simplified using the new utility function
subsetListOfArrays in the limma package.
o To strengthen the interface and to strengthen the object-orientated
nature of the functions, the DGEList methods for estimateDisp(),
estimateGLMCommonDisp(), estimateGLMTrendedDisp() and
estimateGLMTagwiseDisp() no longer accept offset, weights or
AveLogCPM as arguments. These quantities are now always taken from
the DGEList object.
o The User's Guide has new sections on read alignment, producing a
table of counts, and on how to translate scientific questions into
contrasts when using a glm.
o camera.DGEList(), roast.DGEList() and mroast.DGEList() now include
... argument.
o The main computation of exactTestByDeviance() is now implemented in
C++ code.
o The big.count argument has been removed from functions
exactTestByDeviance() and exactTestBySmallP().
o New default value for offset in dispCoxReid.
o More tolerant error checking for dispersion value when computing
aveLogCPM().
o aveLogCPM() now returns a value even when all the counts are zero.
o The functions is.fullrank and nonEstimable are now imported from
limma.
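The sumTechReps() entry above describes collapsing counts over
technical replicate libraries. A minimal Python sketch of that idea,
assuming a genes-by-libraries table held as a list of rows and one ID
per library, is given below. The function name sum_tech_reps and the
data layout are hypothetical and do not reflect edgeR's R interface.

```python
def sum_tech_reps(counts, ids):
    """Sum the columns of a genes-by-libraries table that share an ID."""
    # Keep IDs in order of first appearance.
    unique_ids = list(dict.fromkeys(ids))
    position = {sample: j for j, sample in enumerate(unique_ids)}
    collapsed = []
    for row in counts:
        if len(row) != len(ids):
            raise ValueError("each row needs one count per library")
        summed = [0] * len(unique_ids)
        for value, sample in zip(row, ids):
            summed[position[sample]] += value
        collapsed.append(summed)
    return unique_ids, collapsed


# Two genes, three libraries; the first two libraries are replicates of "A".
counts = [[10, 12, 5], [0, 1, 3]]
ids = ["A", "A", "B"]
samples, collapsed = sum_tech_reps(counts, ids)
```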
Changes in version 3.4.0:
o estimateDisp() now creates the design matrix correctly when the
design matrix is not given as an argument and there is only one
group. Previously this case gave an error.
o plotMDS.DGEList now gives a friendly error message when there are
fewer than 3 data columns.
o Updates to DGEList() so that arguments lib.size, group and
norm.factors are now set to their defaults in the function
definition rather than set to NULL. However NULL is still accepted
as a possible value for these arguments in the function call, in
which case the default value is used as if the argument was
missing.
o Refinement to cutWithMinN() to make the bin numbers more equal in
the worst case. Also a bug fix so that cutWithMinN() does not fail
even when there are many repeated x values.
o Refinement to the computation of nbins in dispBinTrend(). It now
changes more smoothly with the number of genes. The trace argument
is retired.
o Updates to help pages for the data classes.
o Fixes to calcNormFactors with method="TMM" so that it takes account
of lib.size and refCol if these are preset.
o Bug fix to glmQLFTest when plot=TRUE but abundance.trend=FALSE.
o predFC() with design=NULL now uses normalization factors correctly.
However this use of predFC() to compute counts per million is being
phased out in favour of cpm().
Changes in version 3.2.0:
o The User's Guide has a new section on between and within subject
designs and a new case study on RNA-seq profiling of unrelated
Nigerian individuals. Section 2.9 (item 2) now gives a code example
of how to pre-specify the dispersion value.
o New functions estimateDisp() and WLEB() to automate the estimation
of common, trended and tagwise dispersions. The function
estimateDisp() provides a simpler alternative pipeline and in
principle replaces all the other dispersion estimation functions,
for both glms and for classic edgeR. It can also incorporate
automatic estimation of the prior degrees of freedom, and can do
this in a robust fashion.
o glmLRT() now permits the contrast argument to be a matrix with
multiple columns, making the treatment of this argument analogous
to that of the coef argument.
o glmLRT() now has a new F-test option. This option takes into account
the uncertainty with which the dispersion is estimated and is more
conservative than the default chi-square test.
o glmQLFTest() has a number of important improvements. It now has a
simpler alternative calling sequence: it can take either a fitted
model object as before, or it can take a DGEList object and design
matrix and do the model fit itself. If provided with a fitted model
object, it now checks whether the dispersion is of a suitable type
(common or trended). It now optionally produces a plot of the raw
and shrunk residual mean deviances versus AveLogCPM. It now has the
option of robustifying the empirical Bayes step. It now has a more
careful calculation of residual df that takes special account of
cases where all replicates in a group are identically zero.
o The gene set test functions roast(), mroast() and camera() now have
methods defined for DGEList data objects. This facilitates gene set
testing and pathway analysis of expression profiles within edgeR.
o The default method of plotMDS() for DGEList objects has changed. The
new default forms log-counts-per-million and computes Euclidean
distances. The old method based on BCV-distances is available by
setting method="BCV". The annotation of the plot axes has been
improved so that the distance method used is apparent from the
plot.
o The argument prior.count.total used for shrinking log-fold-changes
has been changed to prior.count in various functions throughout the
package, and now refers to the average prior.count per observation
rather than the total prior count across a transcript. The
treatment of prior.counts has also been changed very slightly in
cpm() when log=TRUE.
o New function aveLogCPM() to compute the average log count per
million for each transcript across all libraries. This is now used
by all functions in the package to set AveLogCPM, which is now the
standard measure of abundance. The value for AveLogCPM is now
computed just once, and not updated when the dispersion is
estimated or when a linear model is fitted. glmFit() now preserves
the AveLogCPM vector found in the DGEList object rather than
recomputing it. The use of the old abundance measure is being
phased out.
o The glm dispersion estimation functions are now much faster.
o New function rpkm() to compute reads per kilobase per million
(RPKM).
o New option method="none" for calcNormFactors().
o The default span used by dispBinTrend() has been reduced.
o Various improvements to internal C++ code.
o Functions binCMLDispersion() and bin.dispersion() have been removed
as obsolete.
o Bug fix to subsetting for DGEGLM objects.
o Bug fix to plotMDS.DGEList to make consistent use of norm.factors.
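The rpkm() entry above refers to reads per kilobase per million. As an
illustration of the textbook definition only, the Python sketch below
divides each count by the gene length in kilobases and by the library
size in millions. The name rpkm_sketch is hypothetical, and details of
the R function (for example how library sizes are normalized) are not
reproduced here.

```python
def rpkm_sketch(counts, gene_lengths, lib_sizes):
    """Reads per kilobase per million for a genes-by-libraries table.

    counts: list of rows (one per gene), each with one count per library.
    gene_lengths: gene lengths in bases, one per row.
    lib_sizes: total reads per library, one per column.
    """
    if len(counts) != len(gene_lengths):
        raise ValueError("need one gene length per row")
    if any(length <= 0 for length in gene_lengths):
        raise ValueError("gene lengths must be positive")
    if any(lib <= 0 for lib in lib_sizes):
        raise ValueError("library sizes must be positive")
    # count / (length / 1e3) / (lib / 1e6) == count * 1e9 / (length * lib)
    return [
        [c * 1e9 / (length * lib) for c, lib in zip(row, lib_sizes)]
        for row, length in zip(counts, gene_lengths)
    ]


# One gene of 2000 bases with 100 reads in a library of one million reads.
values = rpkm_sketch([[100]], [2000], [1_000_000])
```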
Changes in version 3.0.0:
o New chapter in the User's Guide covering a number of common types of
experimental designs, including multiple groups, multiple factors
and additive models. New sections in the User's Guide on clustering
and on making tables of read counts. Many other updates to the
User's Guide and to the help pages.
o New function edgeRUsersGuide() to open the User's Guide in a pdf
viewer.
o Many functions have been made faster by rewriting the core computations
in C++. This includes adjustedProfileLik(), mglmLevenberg(),
maximizeInterpolant() and goodTuring().
o New argument verbose for estimateCommonDisp() and
estimateGLMCommonDisp().
o The trended dispersion methods based on binning and interpolation
have been rewritten to give more stable results when the number of
genes is not large.
o The amount by which the tagwise dispersion estimates are squeezed
towards the global value is now specified in estimateTagwiseDisp(),
estimateGLMTagwiseDisp() and dispCoxReidInterpolateTagwise() by
specifying the prior degrees of freedom prior.df instead of the
prior number of samples prior.n.
o The weighted likelihood empirical Bayes code has been simplified or
developed in a number of ways. The old functions weightedComLik()
and weightedComLikMA() are now removed as no longer required.
o The functions estimateSmoothing() and approx.expected.info() have
been removed as no longer recommended.
o The span used by estimateGLMTagwiseDisp() is now chosen by default
as a decreasing function of the number of tags in the dataset.
o New method "loess" for the trend argument of estimateTagwiseDisp,
with "tricube" now treated as a synonym.
o New functions loessByCol() and locfitByCol() for smoothing columns
of a matrix by non-robust loess curves. These functions are used in
the weighted likelihood empirical Bayes procedures to compute local
common likelihood.
o glmFit now shrinks the estimated fold-changes towards zero. The
default shrinkage is as for exactTest().
o predFC output is now on the natural log scale instead of log2.
o mglmLevenberg() is now the default glm fitting algorithm, avoiding
the occasional errors that occurred previously with mglmLS().
o The arguments of glmLRT() and glmQLFTest() have been simplified so
that the argument y, previously the first argument of glmLRT, is no
longer required.
o glmQLFTest() now ensures that no p-value is smaller than what would
be obtained by treating the likelihood ratio test statistic as
chisquare.
o glmQLFTest() now treats tags with all zero counts in replicate
arrays as having zero residual df.
o gof() now optionally produces a qq-plot of the genewise goodness of
fit statistics.
o Argument null.hypothesis removed from equalizeLibSizes().
o DGEList no longer outputs a component called all.zeros.
o goodTuring() no longer produces a plot. Instead there is a new
function goodTuringPlot() for plotting log-probability versus
log-frequency. goodTuring() has a new argument 'conf' giving the
confidence factor for the linear regression approximation.
o Added plot.it argument to maPlot().
Changes in version 2.6.0:
o edgeR now depends on limma.
o Considerable work on the User's Guide. New case study added on
pathogen-inoculated Arabidopsis illustrating a two-group comparison
with batch effects. All the other case studies have been updated
and streamlined. New section explaining why adjustments for GC
content and mappability are not necessary in a differential
expression context.
o New and more intuitive column headings for topTags() output. 'logFC'
is now the first column. Log-concentration is now replaced by
log-counts-per-million ('logCPM'). 'PValue' replaces 'P.Value'.
These column headings are now inserted in the table of results by
exactTest() and glmLRT() instead of being modified by the show
method for the TopTags object generated by topTags(). This means
that the column names will be correct even when users access the
fitted model objects directly instead of using the show method.
o plotSmear() and plotMeanVar() now use logCPM instead of logConc.
o New function glmQLFTest() provides quasi-likelihood hypothesis
testing using F-tests, as an alternative to likelihood ratio tests
using the chisquare distribution.
o New functions normalizeChIPtoInput() and calcNormOffsetsforChIP()
for normalization of ChIP-Seq counts relative to input control.
o New capabilities for formal shrinkage of the logFC. exactTest() now
incorporates formal shrinkage of the logFC, controlled by argument
'prior.count.total'. predFC() provides similar shrinkage capability
for glms.
o estimateCommonDisp() and estimateGLMCommonDisp() now set the
dispersion to NA when there is no replication, instead of setting
the dispersion to zero. This means that users will need to set a
dispersion value explicitly to use functions further down the
analysis pipeline.
o New function estimateTrendedDisp() analogous to
estimateGLMTrendedDisp() but for classic edgeR.
o The algorithm implemented in estimateTagwiseDisp() now uses fewer
grid points but interpolates, similar to estimateGLMTagwiseDisp().
o The power trend fitted by dispCoxReidPowerTrend() now includes a
positive asymptote. This greatly improves the fit on real data
sets. This now becomes the default method for
estimateGLMTrendedDisp() when the number of genes is less than 200.
o New user-friendly function plotBCV() displays estimated dispersions.
o New argument target.size for thinCounts().
o New utility functions getDispersion() and zscoreNBinom().
o dimnames() methods for DGEExact, DGELRT and TopTags classes.
o Function pooledVar() removed as no longer necessary.
o Minor fixes to various functions to ensure correct results in
special cases.
Changes in version 2.4.0:
o New function spliceVariants() for detecting alternative exon usage
from exon-level count data.
o A choice of rejection regions is now implemented for exactTest(),
and the default is changed from one based on small probabilities to
one based on doubling the smaller of the tail probabilities. This
gives better results than the original conditional test when the
dispersion is large (especially > 1). A Beta distribution
approximation to the tail probability is also implemented when the
counts are large, making exactTest() much faster and less memory
hungry.
o estimateTagwiseDisp() now includes an abundance trend on the
dispersions by default.
o exactTest() now uses tagwise.dispersion by default if found in the
object.
o estimateCRDisp() is removed. It is now replaced by
estimateGLMCommonDisp(), estimateGLMTrendedDisp() and
estimateGLMTagwiseDisp().
o Changes to glmFit() so that it automatically detects dispersion
estimates if present in the data object. It uses tagwise if
available, then trended, then common.
o Add getPriorN() to calculate the weight given to the common
parameter likelihood in order to smooth (or stabilize) the
dispersion estimates. Used as default for estimateTagwiseDisp and
estimateGLMTagwiseDisp().
o New function cutWithMinN() used in binning methods.
o glmFit() is now an S3 generic function, and has a new method
argument specifying the fitting algorithm.
o DGEGLM objects are now subsettable.
o plotMDS.dge() is retired; instead, a DGEList method is now defined
for plotMDS() in the limma package. One advantage is that the plot
can be repeated with different graphical parameters without
recomputing the distances. The MDS method is also now much faster.
o Add as.data.frame method for TopTags objects.
o New function cpm() to calculate counts per million conveniently.
o Added arguments to dispCoxReidInterpolateTagwise() to give more access
to tuning parameters.
o estimateGLMTagwiseDisp() now uses trended.dispersion by default if
trended.dispersion is found.
o Change to glmLRT() to ensure character coefficient argument will
work.
o Change to maPlot() so that any really extreme logFCs are brought
back to a more reasonable scale.
o estimateGLMCommonDisp() now returns NA when there are no residual df
rather than returning dispersion of zero.
o The trend computation of the local common likelihood in
dispCoxReidInterpolateTagwise() is now based on moving averages
rather than lowess.
o Changes to binGLMDispersion() to allow trended dispersion for data
sets with small numbers of genes, but with extra warnings.
o dispDeviance() and dispPearson() now give graceful estimates and
messages when the dispersion is outside the specified interval.
o Bug fix to mglmOneWay(), which was confusing parametrizations when
the design matrix included negative values.
o mglmOneWay() (and hence glmFit) no longer produces NA coefficients
when some of the fitted values were exactly zero.
o Changes to offset behaviour in estimateGLMCommonDisp(),
estimateGLMTrendedDisp() and estimateGLMTagwiseDisp() to fix bug.
Changes to several other functions on the way to fixing bugs when
computing dispersions in data sets with genes that have all zero
counts.
o Bug fix to mglmSimple() with matrix offset.
o Bug fix to adjustedProfLik() when there are fitted values exactly at
zero for one or more groups.
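The cpm() entry above refers to counts per million. A minimal Python
sketch of the basic calculation follows, assuming library sizes default
to the column sums of the count table. The name counts_per_million is
hypothetical, and options of the R function (such as log transformation
or normalization factors) are deliberately left out.

```python
def counts_per_million(counts, lib_sizes=None):
    """Scale each column of a genes-by-libraries table to one million."""
    if lib_sizes is None:
        # Assumption for this sketch: library size = column sum.
        lib_sizes = [sum(col) for col in zip(*counts)]
    if any(lib <= 0 for lib in lib_sizes):
        raise ValueError("library sizes must be positive")
    return [
        [1e6 * c / lib for c, lib in zip(row, lib_sizes)]
        for row in counts
    ]


# Two genes, two libraries with column sums 4 and 8.
cpm_values = counts_per_million([[1, 2], [3, 6]])
```

Each column of the result sums to one million when the default library
sizes are used, which is a quick sanity check on any CPM table.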