
4.2: Estimating Rates using Independent Contrasts - Biology


The information required to estimate evolutionary rates is efficiently summarized in the early (but still useful) phylogenetic comparative method of independent contrasts (Felsenstein 1985). Independent contrasts summarize the amount of character change across each node in the tree, and can be used to estimate the rate of character change across a phylogeny. There is also a simple mathematical relationship between contrasts and maximum-likelihood rate estimates that I will discuss below.

We can understand the basic idea behind independent contrasts if we think about the branches in the phylogenetic tree as the historical “pathways” of evolution. Each branch on the tree represents a lineage that was alive at some time in the history of the Earth, and during that time experienced some amount of evolutionary change. We can imagine trying to measure that change, initially, by comparing sister taxa. We can compare the trait values of two sister taxa by taking the difference in their trait values, and then compare that difference to the total amount of time the two taxa have had to evolve it. By doing this for all sister taxa in the tree, we will get an estimate of the average rate of character evolution (Figure 4.1A). But what about deeper nodes in the tree? We could use other non-sister species pairs, but then we would be counting some branches in the tree of life more than once (Figure 4.1B). Instead, we use a “pruning algorithm” (Felsenstein 1985; Felsenstein 2004), chopping off pairs of sister taxa to create a smaller tree (Figure 4.1C). Eventually, all of the nodes in the tree will be trimmed off and the algorithm will finish. Independent contrasts thus generalize the approach of comparing sister taxa so that we can quantify the rate of evolution throughout the whole tree.

A more precise algorithm describing how Phylogenetic Independent Contrasts (PICs) are calculated is provided in Box 4.2, below (from Felsenstein 1985). Each contrast can be described as an estimate of the direction and amount of evolutionary change across the nodes in the tree. PICs are calculated from the tips of the tree towards the root, as differences between trait values at the tips of the tree and/or calculated average values at internal nodes. The differences themselves are sometimes called “raw contrasts” (Felsenstein 1985). These raw contrasts will all be statistically independent of each other under a wide range of evolutionary models. In fact, as long as each lineage in a phylogenetic tree evolves independently of every other lineage, regardless of the evolutionary model, the raw contrasts will be independent of each other. However, people almost never use raw contrasts because they are not identically distributed; each raw contrast has a different expected distribution that depends on the model of evolution and the branch lengths of the tree. In particular, under Brownian motion we expect more change on longer branches of the tree. Felsenstein (1985) divided the raw contrasts by their expected standard deviation under a Brownian motion model, resulting in standardized contrasts. These standardized contrasts are, under a BM model, both independent and identically distributed, and can be used in a variety of statistical tests. Note that we must assume a Brownian motion model in order to standardize the contrasts; results derived from the contrasts, then, depend on this Brownian motion assumption.

Box 4.2: Algorithm for Phylogenetic Independent Contrasts

One can calculate PICs using the algorithm from Felsenstein (1985). I reproduce this algorithm below. Keep in mind that this is an iterative algorithm – you repeat the five steps below once for each contrast, or n − 1 times over the whole tree (see Figure 4.1C as an example).

  1. Find two tips on the phylogeny that are adjacent (say nodes i and j) and have a common ancestor, say node k. Note that the choice of which node is i and which is j is arbitrary. As you will see, we will have to account for this “arbitrary direction” property of PICs in certain analyses where we use them!
  2. Compute the raw contrast, the difference between their two tip values: $$c_{ij} = x_i - x_j \label{4.1}$$
  • Under a Brownian motion model, $c_{ij}$ has expectation zero and variance proportional to $v_i + v_j$, where $v_i$ and $v_j$ are the lengths of the branches leading to tips i and j.
  3. Calculate the standardized contrast by dividing the raw contrast by its expected standard deviation: $$s_{ij} = \frac{c_{ij}}{\sqrt{v_i + v_j}} = \frac{x_i - x_j}{\sqrt{v_i + v_j}} \label{4.2}$$
  • Under a Brownian motion model, this contrast follows a normal distribution with mean zero and variance equal to the Brownian motion rate parameter $\sigma^2$.
  4. Remove the two tips from the tree, leaving behind only the ancestor k, which now becomes a tip. Assign it the character value: $$x_k = \frac{(1/v_i)x_i + (1/v_j)x_j}{1/v_i + 1/v_j} \label{4.3}$$
    • It is worth noting that $x_k$ is a weighted average of $x_i$ and $x_j$, but does not represent an ancestral state reconstruction, since the value is only influenced by species that descend directly from that node and not other relatives.
  5. Lengthen the branch below node k by increasing its length from $v_k$ to $v_k + v_i v_j/(v_i + v_j)$. This accounts for the uncertainty in assigning a value to $x_k$.
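The algorithm above corresponds to the `pic` function in the R package ape. A minimal sketch on a small simulated tree (the tree and trait here are made up purely for illustration):

```r
library(ape)

# Simulate a small tree and a continuous trait for illustration
set.seed(1)
tree <- rcoal(10)                 # random 10-tip ultrametric tree
trait <- rnorm(10)                # hypothetical trait values
names(trait) <- tree$tip.label    # pic() matches values to tips by name

# Standardized phylogenetic independent contrasts (Box 4.2)
contrasts <- pic(trait, tree)
contrasts                         # n - 1 = 9 standardized contrasts
```

By default `pic` returns contrasts already divided by their expected standard deviations, i.e., the standardized contrasts of Equation 4.2.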

As mentioned above, we can apply the algorithm of independent contrasts to learn something about rates of body size evolution in mammals. We have a phylogenetic tree with branch lengths as well as body mass estimates for 49 species (Figure 4.2). If we ln-transform mass and then apply the method above to our data on mammal body size, we obtain a set of 48 standardized contrasts. A histogram of these contrasts is shown in Figure 4.2 (data from Garland 1992).

Figure 4.2. Histogram of PICs for ln-transformed mammal body mass on a phylogenetic tree with branch lengths in millions of years (data from Garland 1992). Image by the author, can be reused under a CC-BY-4.0 license.

Note that each contrast is an amount of change, $x_i - x_j$, divided by a branch length, $v_i + v_j$, which is a measure of time. Thus, PICs from a single trait can be used to estimate $\sigma^2$, the rate of evolution under a Brownian model. The PIC estimate of the evolutionary rate is:

$$\hat{\sigma}_{PIC}^2 = \frac{\sum s_{ij}^2}{n-1} \label{4.4}$$

That is, the PIC estimate of the evolutionary rate is the average of the n − 1 squared contrasts. This sum is taken over all sij, the standardized independent contrast across all (i, j) pairs of sister branches in the phylogenetic tree. For a fully bifurcating tree with n tips, there are exactly n − 1 such pairs. If you are statistically savvy, you might note that this formula looks a bit like a variance. In fact, if we state that the contrasts have a mean of 0 (which they must because Brownian motion has no overall trends), then this is a formula to estimate the variance of the contrasts.

If we calculate the mean sum of squared contrasts for the mammal body mass data, we obtain a rate estimate of $\hat{\sigma}_{PIC}^2 = 0.09$. We can put this into words: if we simulated mammalian body mass evolution under this model, we would expect the variance across replicated runs to increase by 0.09 per million years. Or, in more concrete terms, if we think about two lineages diverging from one another for a million years, we can draw changes in ln-body mass for both of them from a normal distribution with a variance of 0.09. Their difference, then, which is the amount of expected divergence, will be normal with a variance of $2 \cdot 0.09 = 0.18$. Thus, with 95% confidence, we can expect the two species to differ by at most two standard deviations of this distribution, $2 \cdot \sqrt{0.18} = 0.85$. Since we are on a log scale, this amount of change corresponds to a factor of $e^{0.85} \approx 2.3$, meaning that one species will commonly be about twice as large (or small) as the other after just one million years.
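Equation 4.4 is a one-liner once the contrasts are in hand. Continuing the hypothetical sketch from Box 4.2 (for the real analysis you would substitute the mammal tree and ln-transformed masses from Garland 1992):

```r
# 'trait' and 'tree' as simulated in the Box 4.2 sketch
contrasts <- pic(trait, tree)

# PIC estimate of the Brownian motion rate (Equation 4.4):
# the mean of the n - 1 squared standardized contrasts
sigma2_pic <- sum(contrasts^2) / (Ntip(tree) - 1)
sigma2_pic                        # identical to mean(contrasts^2)
```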


Transcriptome analysis of Sinorhizobium meliloti during symbiosis

Rhizobia induce the formation of new organs, the root nodules, on specific legumes as the result of an elaborate developmental program involving the two partners. In order to contribute to a more global view of the genetics underlying this plant–microbe symbiosis, we have mined the recently determined Sinorhizobium meliloti genome sequence for genes potentially relevant to symbiosis. We describe here the construction and use of dedicated nylon macroarrays to study simultaneously the expression of 200 of these genes under a variety of environmental conditions pertinent to symbiosis.

Results

The expression of 214 S. meliloti genes was monitored under ten environmental conditions, including free-living aerobic and microaerobic conditions, addition of the plant symbiotic elicitor luteolin, and a variety of symbiotic conditions. Five new genes induced by luteolin were identified, as well as nine new genes induced in mature nitrogen-fixing bacteroids. A bacterial and a plant symbiotic mutant affected in nodule development proved of particular interest for deciphering gene expression at the intermediate stage of the symbiotic interaction. S. meliloti gene expression in the cultivated legume Medicago sativa (alfalfa) and in the model plant M. truncatula was compared, and only a small number of differences was found.

Conclusions

In addition to exploring conditions for a genome-wide transcriptome analysis of the model rhizobium S. meliloti, the present work has highlighted the differential expression of several classes of genes during symbiosis. These genes are related to invasion, oxidative stress protection, iron mobilization, and signaling, thus emphasizing possible common mechanisms between symbiosis and pathogenesis.


1. Introduction

In clinical research, we usually compare the results of two treatment groups (experimental and control). The statistical methods used in the data analysis depend on the type of outcome. [1] If the outcome data are continuous variables (such as blood pressure), the researchers may want to know whether there is a significant difference in the mean values between the two groups. If the data are normally distributed, the two-sample t-test (for two independent groups) and the paired t-test (for matched samples) are probably the most widely used statistical methods for comparing differences between two samples. Although this is well documented in the statistical literature, confusion exists with regard to the use of these two tests, resulting in their inappropriate use.

The reason for this confusion revolves around whether we should regard two samples as independent (marginally) or not, and, if not, what the source of the correlation is. According to Kirkwood: ‘When comparing two populations, it is important to pay attention to whether the data sample from the populations are two independent samples or are, in fact, one sample of related pairs (paired samples)’. [2] In some cases, independence can be easily identified from the data-generating procedure. Two samples can be considered independent if the selection of the individuals or objects that make up one sample does not influence the selection of the individuals or subjects in the other sample in any way. [3] In this case, the two-sample t-test should be applied to compare the mean values of the two samples. On the other hand, if the observations in the first sample are coupled with particular observations in the other sample, the samples are considered to be paired. [3] Dependence occurs when the objects in one sample are all measured twice (as is common in “before and after” comparisons), when the objects are related somehow (for example, if twins, siblings, or spouses are being compared), or when the objects are deliberately matched by the experimenters to have similar characteristics. [2]

This paper aims to clarify some of the confusion surrounding the use of t-tests in data analysis. We take a close look at the differences and similarities between the independent t-test and the paired t-test. Section 2 illustrates the data structure for two independent samples and for matched-pair samples. We discuss the differences and similarities of these two t-tests in Section 3.

In Section 4, we present three examples to explain the calculation process of the independent t-test for independent samples, and of the paired t-test for time-related samples and matched samples, respectively. The conclusion and discussion are reported in Section 5.
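Both tests are available in base R through the same function, `t.test`; the only difference in the call is the `paired` argument. A minimal sketch with made-up blood pressure data:

```r
# Two independent samples (e.g., experimental vs. control groups)
control <- c(120, 132, 128, 141, 125)
treated <- c(118, 125, 121, 130, 122)
t.test(treated, control)              # two-sample t-test (Welch by default;
                                      # add var.equal = TRUE for Student's)

# Paired samples (e.g., the same subjects measured before and after)
before <- c(135, 128, 142, 131, 126)
after  <- c(130, 127, 138, 129, 120)
t.test(after, before, paired = TRUE)  # paired t-test on the differences
```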


CRAN Task View: Phylogenetics, Especially Comparative Methods

The history of life unfolds within a phylogenetic context. Comparative phylogenetic methods are statistical approaches for analyzing historical patterns along phylogenetic trees. This task view describes R packages that implement a variety of different comparative phylogenetic methods. This is an active research area and much of the information is subject to change. One thing to note is that many important packages are not on CRAN: either they were formerly on CRAN and were later archived (for example, if they failed to incorporate necessary changes as R is updated) or they are developed elsewhere and have not been put on CRAN yet. Such packages may be found on GitHub, R-Forge, or authors' websites.

Getting trees into R: Trees in R are usually stored in the S3 phylo class (implemented in ape), though the S4 phylo4 class (implemented in phylobase) is also available. ape can read trees from external files in newick format (sometimes popularly known as phylip format) or NEXUS format. It can also read trees input by hand as a newick string (i.e., "(human,(chimp,bonobo))"). phylobase and its lighter-weight sibling rncl can use the Nexus Class Library to read NEXUS, Newick, and other tree formats. treebase can search for and load trees from the online tree repository TreeBASE; rdryad can pull data from the online data repository Dryad. RNeXML can read, write, and process metadata for the NeXML format. PHYLOCH can load trees from BEAST, MrBayes, and other phylogenetics programs (PHYLOCH is only available from the author's website). phyext2 can read and write various tree formats, including simmap formats. rotl can pull in a synthetic tree and individual study trees from the Open Tree of Life project. The treeio package can read trees in Newick, Nexus, New Hampshire eXtended format (NHX), jplace and Phylip formats and data output from BEAST, EPA, HyPhy, MrBayes, PAML, PHYLDOG, pplacer, r8s, RAxML and RevBayes. phylogram can convert Newick files into dendrogram objects. brranching can fetch phylogenies from online repositories, including phylomatic.
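For instance, reading a tree from a newick string or file with ape (the file names here are hypothetical):

```r
library(ape)

# A tree typed in by hand as a newick string
tr <- read.tree(text = "(human,(chimp,bonobo));")
plot(tr)

# Or from external files
# tr <- read.tree("mytree.phy")    # newick format
# tr <- read.nexus("mytree.nex")   # NEXUS format
```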

Utility functions: These packages include functions for manipulating trees or associated data. ape has functions for randomly resolving polytomies, creating branch lengths, getting information about tree size or other properties, pulling in data from GenBank, and many more. phylobase has functions for traversing a tree (i.e., getting all descendants from a particular node specified by just two of its descendants). geiger can prune trees and data to an overlapping set of taxa. tidytree can convert a tree object in to a tidy data frame and has other tidy approaches to manipulate tree data. evobiR can do fuzzy matching of names (to allow some differences). SigTree finds branches that are responsive to some treatment, while allowing correction for multiple comparisons. dendextend can manipulate dendrograms, including subdividing trees, adding leaves, and more. apex can handle multiple gene DNA alignments making their use and analysis for tree inference easier in ape and phangorn. aphid can weight sequences based on a phylogeny and can use hidden Markov models (HMMs) for a variety of purposes including multiple sequence alignment.

Ancestral state reconstruction: Continuous characters can be reconstructed using maximum likelihood, generalised least squares or independent contrasts in ape. Root ancestral character states under Brownian motion or Ornstein-Uhlenbeck models can be reconstructed in ouch, though ancestral states at the internal nodes are not. Discrete characters can be reconstructed using a variety of Markovian models that parameterize the transition rates among states using ape. markophylo can fit a broad set of discrete character types with models that can incorporate constrained substitution rates, rate partitioning across sites, branch-specific rates, sampling bias, and non-stationary root probabilities. phytools can do stochastic character mapping of traits on trees.
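As a small illustration of the continuous case, ape's `ace` function reconstructs ancestral values by maximum likelihood (the tree and trait below are simulated stand-ins):

```r
library(ape)
set.seed(5)
tree <- rcoal(12)
trait <- rnorm(12)
names(trait) <- tree$tip.label

# Continuous character, ML reconstruction under Brownian motion
fit_cont <- ace(trait, tree, type = "continuous", method = "ML")
fit_cont$ace    # estimated states at the internal nodes

# For a discrete character one would instead use, e.g.,
# ace(states, tree, type = "discrete", model = "ER")
```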

Diversification Analysis: Lineage-through-time plots can be done in ape. A simple birth-death model for when you have extant species only (sensu Nee et al. 1994) can be fitted in ape, as can survival models and goodness-of-fit tests (as applied to testing models of diversification). TESS can calculate the likelihood of a tree under a model with time-dependent diversification, including mass extinctions. Net rates of diversification (sensu Magallón and Sanderson) can be calculated in geiger. diversitree implements the BiSSE method (Maddison et al. 2007) and later improvements (FitzJohn et al. 2009). TreePar estimates speciation and extinction rates with models where rates can change as a function of time (i.e., at mass extinction events) or as a function of the number of species. caper can do the macrocaic test to evaluate the effect of a trait on diversity. apTreeshape also has tests for differential diversification (see its description). iteRates can identify and visualize areas on a tree undergoing differential diversification. DDD can fit density-dependent models, as well as models with occasional escape from density-dependence; it implements maximum likelihood methods based on the diversity-dependent birth-death process to test whether speciation or extinction are diversity-dependent, and can also identify key innovations and simulate a density-dependent process. BAMMtools is an interface to the BAMM program to allow visualization of rate shifts, comparison of diversification models, and other functions. PBD can calculate the likelihood of a tree under a protracted speciation model. phyloTop has functions for investigating tree shape, with special functions and datasets relating to trees of infectious diseases.

Divergence Times: Non-parametric rate smoothing (NPRS) and penalized likelihood can be implemented in ape. geiger can do congruification to stretch a source tree to match a specified standard tree. treedater implements various clock models, ways to assess confidence, and outlier detection.

Phylogenetic Inference: UPGMA, neighbour joining, bio-nj and fast ME methods of phylogenetic reconstruction are all implemented in the package ape. phangorn can estimate trees using distance, parsimony, and likelihood. phyclust can cluster sequences. phytools can build trees using MRP supertree estimation and least squares. phylotools can build supermatrices for analyses in other software. pastis can use taxonomic information to make constraints for Bayesian tree searches. For more information on importing sequence data, see the Genetics task view; pegas may also be of use.

Time series/Paleontology: Paleontological time series data can be analyzed using a likelihood-based framework for fitting and comparing models (using a model testing approach) of phyletic evolution (based on the random walk or stasis model) using paleoTS. strap can do stratigraphic analysis of phylogenetic trees.

Tree Simulations: Trees can be simulated using constant-rate birth-death with various constraints in TreeSim and a birth-death process in geiger. Random trees can be generated in ape by random splitting of edges (for non-parametric trees) or random clustering of tips (for coalescent trees). paleotree can simulate fossil deposition, sampling, and the tree arising from this as well as trees conditioned on observed fossil taxa. TESS can simulate trees with time-dependent speciation and/or extinction rates, including mass extinctions.
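A brief sketch of two of the ape simulators mentioned above (the TreeSim call is commented out and its parameter values are arbitrary):

```r
library(ape)
set.seed(2)

t1 <- rtree(20)     # random splitting of edges (non-ultrametric)
t2 <- rcoal(20)     # random clustering of tips (coalescent, ultrametric)

# Constant-rate birth-death trees conditioned on 20 surviving tips:
# library(TreeSim)
# t3 <- sim.bd.taxa(n = 20, numbsim = 1, lambda = 1, mu = 0.5)[[1]]
```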

Trait evolution: Independent contrasts for continuous characters can be calculated using ape, picante, or caper (which also implements the brunch and crunch algorithms). Analyses of discrete trait evolution, including models of unequal rates or rates changing at a given instant of time, as well as Pagel's transformations, can be performed in geiger. Brownian motion models can be fit in geiger, ape, and paleotree. Deviations from Brownian motion can be investigated in geiger and OUwie. mvMORPH can fit Brownian motion, early burst, ACDC, OU, and shift models to univariate or multivariate data. Ornstein-Uhlenbeck (OU) models can be fitted in geiger, ape, ouch (with multiple means), and OUwie (with multiple means, rates, and attraction values); geiger fits only single-optimum models. Other continuous models, including Pagel's transforms and models with trends, can be fit with geiger. ANOVAs and MANOVAs in a phylogenetic context can also be implemented in geiger. Multiple-rate Brownian motion can be fit in RBrownie. Traditional GLS methods (sensu Grafen or Martins) can be implemented in ape, PHYLOGR, or caper. Phylogenetic autoregression (sensu Cheverud et al.) and phylogenetic autocorrelation (Moran's I) can be implemented in ape or, if you wish the significance test of Moran's I to be calculated via a randomization procedure, in adephylo. Correlation between traits using a GLMM can also be investigated using MCMCglmm. phylolm can fit phylogenetic linear regression and phylogenetic logistic regression models using a fast algorithm, making it suitable for large trees. brms can examine correlations between continuous and discrete traits, and can incorporate multiple measurements per species. phytools can also investigate rates of trait evolution and do stochastic character mapping. metafor can perform meta-analyses accounting for phylogenetic structure. pmc evaluates the model adequacy of several trait models (from geiger and ouch) using Monte Carlo approaches. phyreg implements the Grafen (1989) phylogenetic regression. geomorph can do geometric morphometric analysis in a phylogenetic context. Disparity through time, and other disparity-related analyses, can be performed with dispRity. MPSEM can predict features of one species based on information from related species using phylogenetic eigenvector maps. Rphylip wraps PHYLIP, which can do independent contrasts, the threshold model, and more. convevol and windex can both test for convergent evolution on a phylogeny.
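As one concrete example of this kind of model fitting, geiger's `fitContinuous` can fit and compare BM and OU models; the tree and trait below are simulated placeholders:

```r
library(ape)
library(geiger)
set.seed(6)
tree <- rcoal(20)
trait <- rnorm(20)
names(trait) <- tree$tip.label

fit_bm <- fitContinuous(tree, trait, model = "BM")
fit_ou <- fitContinuous(tree, trait, model = "OU")

c(BM = fit_bm$opt$aicc, OU = fit_ou$opt$aicc)  # smaller AICc is preferred
fit_bm$opt$sigsq                               # estimated BM rate
```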

Trait Simulations: Continuous traits can be simulated under Brownian motion in ouch, geiger, ape, picante, OUwie, and caper; under the Hansen model (a form of the OU model) in ouch and OUwie; and under a speciational model in geiger. Discrete traits can be simulated using a continuous-time Markov model in geiger. phangorn can simulate DNA or amino acids. Both discrete and continuous traits can be simulated under models where rates change through time in geiger. phytools can simulate discrete characters using stochastic character mapping. phylolm can simulate continuous or binary traits along a tree.

Tree Manipulation: Branch length scaling using ACDC, Pagel's (1999) lambda, delta and kappa parameters, and the Ornstein-Uhlenbeck alpha parameter (for ultrametric trees only) are available in geiger. phytools also allows branch length scaling, as well as several tree transformations (adding tips, finding subtrees). Rooting, resolving polytomies, dropping of tips, and setting of branch lengths, including by Grafen's method, can all be done using ape. Extinct taxa can be pruned using geiger. phylobase offers numerous functions for querying and using trees (S4). Tree rearrangements (NNI and SPR) can be performed with phangorn. paleotree has functions for manipulating trees based on sampling issues that arise with fossil taxa, as well as more universal transformations. dendextend can manipulate dendrograms, including subdividing trees, adding leaves, and more. enveomics.R can prune a tree to keep clade representatives.

Community/Microbial Ecology: picante, vegan, SYNCSA, phylotools, PCPS, caper, and DAMOCLES integrate several tools for using phylogenetics with community ecology. HMPTrees and GUniFrac provide tools for comparing microbial communities. betapart allows computing pair-wise dissimilarities (distance matrices) and multiple-site dissimilarities, separating the turnover and nestedness-resultant components of taxonomic (incidence- and abundance-based), functional and phylogenetic beta diversity. adiv can calculate various indices of biodiversity including species, functional and phylogenetic diversity, as well as alpha, beta, and gamma diversities. entropart can measure and partition diversity based on Tsallis entropy as well as calculate alpha, beta, and gamma diversities. metacoder is an R package for handling large taxonomic data sets, such as those generated by modern high-throughput sequencing and metabarcoding.

Phyloclimatic Modeling: phyloclim integrates several new tools in this area.

Phylogeography / Biogeography: phyloland implements a model of space colonization mapped on a phylogeny; it aims at estimating limited dispersal and competitive exclusion in a statistical phylogeographic framework. jaatha can infer demographic parameters for two species with multiple individuals per species. diversitree implements the GeoSSE method for diversification analyses based on two areas.

Species/Population Delimitation: adhoc can estimate an ad hoc distance threshold for a reference library of DNA barcodes.

Tree Plotting and Visualization: User trees can be plotted using ape, adephylo, phylobase, phytools, ouch, and dendextend; several of these have options for branch or taxon coloring based on some criterion (ancestral state, tree structure, etc.). paleoPhylo and paleotree are specialized for drawing paleobiological phylogenies. Trees can also be examined (zoomed) and viewed as correlograms using ape. Ancestral state reconstructions can be visualized along branches using ape and paleotree. phytools can project a tree into a morphospace. BAMMtools can visualize rate shifts calculated by BAMM on a tree. The popular R visualization package ggplot2 can be extended by ggtree to visualize phylogenies. Trees can also be interactively explored (as dendrograms) using idendr0. phylocanvas is an htmlwidgets widget that enables embedding of phylogenetic trees using the phylocanvas javascript library. ggmuller allows plotting a phylogeny along with frequency dynamics.

Tree Comparison: Tree-tree distances can be evaluated, and used in additional analyses, in distory and Rphylip. ape can compute tree-tree distances and also create a plot showing two trees with links between associated tips. kdetrees implements a non-parametric method for identifying potential outlying observations in a collection of phylogenetic trees, which could represent inference problems or processes such as horizontal gene transfer. dendextend can evaluate multiple measures comparing dendrograms.

Taxonomy: taxize can interact with a suite of web APIs for taxonomic tasks, such as verifying species names, getting taxonomic hierarchies, and verifying name spelling. evobiR contains functions for making a tree at higher taxonomic levels, downloading a taxonomy tree from NCBI or ITIS, and various other miscellaneous functions (simulations of character evolution, calculating D-statistics, etc.).

Gene tree - species tree: HyPhy can count the duplication and loss cost to reconcile a gene tree to a species tree. It can also sample histories of gene trees from within family trees.

Interactions with other programs: geiger can call PATHd8 through its congruify function. ips wraps several tree inference and other programs, including MrBayes, BEAST, and RAxML, allowing their easy use from within R. Rphylip wraps PHYLIP, a broad variety of programs for tree inference under parsimony, likelihood, and distance, bootstrapping, character evolution, and more. BoSSA can use information from various tools to place a query sequence into a reference tree. pastis can use taxonomic information to make constraints for MrBayes tree searches.

Notes: At least ten packages in this domain have names starting with phy*, including two pairs of similarly named packages (phytools and phylotools, phylobase and phybase). This can easily lead to confusion, and future package authors are encouraged to consider such overlaps when naming packages. For clarification: phytools provides a wide array of functions, especially for comparative methods, and is maintained by Liam Revell; phylotools has functions for building supermatrices and is maintained by Jinlong Zhang; phylobase implements S4 classes for phylogenetic trees and associated data and is maintained by Francois Michonneau; phybase has tree utility functions and many functions for gene tree - species tree questions and is authored by Liang Liu, but no longer appears on CRAN.


Rate of Change

Rates of change can be positive or negative. This corresponds to an increase or decrease in the y-value between two data points. When a quantity does not change over time, it has a zero rate of change.

Positive rate of change

When the value of x increases, the value of y increases and the graph slants upward.

Negative rate of change

When the value of x increases, the value of y decreases and the graph slants downward.

Zero rate of change

When the value of x increases, the value of y remains constant. That is, there is no change in the y-value, and the graph is a horizontal line.

Use the table to find the rate of change. Then graph it.

Time Driving (h), x      Distance Travelled (mi), y
2                        80
4                        160
6                        240

A rate of change is a rate that describes how one quantity changes in relation to another quantity.

The rate of change is the change in distance divided by the change in time: (160 − 80)/(4 − 2) = 80/2 = 40/1, or 40. This means the vehicle is traveling at a rate of 40 miles per hour.



4.4 A three stage model

Figure 4.5: The life cycle of garlic mustard using a post-breeding census (Pardini et al. 2009). The census takes place in May of each year. Each arrow represents the transition from May to May. Seeds germinate in early spring and become rosettes (basal leaves near the soil surface). The rosettes experience mortality all summer, fall, and winter. Surviving rosettes become reproductive adults the following spring and summer. Adults flower and are pollinated in June, after which the fruits ripen and seeds mature. Seeds overwinter for at least six months before germinating in the spring. Not all seeds germinate, but they may remain viable in the seed bank for several years. Thus, the complete life cycle takes at least two years. Once the seeds germinate, the plant requires over a year to reach maturity and produce flowers, fruits, and seeds.

Let’s work through these probability transitions.

  • $s_1$, the probability that a germinated seed survives as a rosette.
  • $s_2$, the probability of surviving from May to August as a rosette.
  • $s_3$, the probability of surviving from August to early May and becoming a reproductive plant.
  • $v$, the probability that a seed is viable (survives and can germinate).
  • $g_1$, the probability that a viable seed germinates in the first season; $1-g_1$ is the probability that it remains ungerminated.
  • $g_2$, the probability that a viable seed germinates in the second season; $1-g_2$ is the probability that it does not.

Fecundity, $f$, is the average number of seeds per reproductive plant.

The transition matrix A would thus be

Put into your own words each of the transition elements.

What about the transition from adult to rosette? Did the plant shrink? While perennial plants can get smaller, or regress, that is not what happens here. In this transition, the adult in May gets pollinated, develops fruits, and the seeds mature and are deposited on the soil late that summer or fall. Those seeds survive overwinter, germinate in early spring, grow into rosettes that summer, and survive until the next census in May. In that way, stage 3 (adult) contributes to stage 2 (rosette) through reproduction plus survival and growth. The transition from adult to seed, $p_{13}$, occurs only when the seeds do not germinate after the first winter, but instead spend another year in the seed bank in the soil.

Once we have the transition matrix, we can use it to characterize many features of the population, including the finite rate of increase ($\lambda$), the predicted relative abundances of the various stages, and the relative importance of each separate transition $p_{ij}$ for the long-term population growth rate. We will do this in a later section, but first we will explore projection. It is frequently useful to actually project the population, one year at a time, into the future.
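Projection itself is one matrix multiplication per year. A minimal sketch in R, where the numbers in the matrix are placeholders rather than the estimates of Pardini et al. (2009):

```r
# Hypothetical 3-stage transition matrix (seed, rosette, adult)
A <- matrix(c(0.20, 0.00, 4.00,
              0.05, 0.00, 6.00,
              0.00, 0.30, 0.00),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("seed", "rosette", "adult"),
                            c("seed", "rosette", "adult")))

N0 <- c(seed = 100, rosette = 10, adult = 1)  # starting abundances
years <- 10
out <- matrix(NA, nrow = years + 1, ncol = 3,
              dimnames = list(0:years, names(N0)))
out[1, ] <- N0
for (t in 1:years) out[t + 1, ] <- A %*% out[t, ]  # project one year ahead

matplot(0:years, out, type = "l", log = "y",
        xlab = "Year", ylab = "Abundance (log scale)")
```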


Biology Lab. Introduction to Science

You should submit your document in a Word (.doc or .docx) or Rich Text Format (.rtf) for best compatibility.

Exercise 1: Data Interpretation

Table 1: Water Quality vs. Fish Population

Dissolved Oxygen          0   2   4   6   8   10  12  14  16  18
Number of Fish Observed   0   1   3   10  12  13  15  10  12  13

1. What patterns do you observe based on the information in Table 1?

The patterns that I observe based on the information in Table 1 are:

• The level of ‘Dissolved Oxygen’ consistently increases by an increment of 2 with each subsequent data point

• The ‘Number of Fish Observed’ seems to fluctuate with no real consistency, with the exception that after the peak of 15 fish was observed, the next 3 data points were “10, 12, 13”, the same numbers, in the same order, that were observed just before the peak of 15.

(In short, the pattern 10, 12, 13 seems to have repeated itself.)

• The level of ‘Dissolved Oxygen’ does not seem to decrease when the ‘Number of Fish Observed’ decreases

2. Develop a hypothesis relating to the amount of dissolved oxygen measured in the water sample and the number of fish observed in the body of water.


Based on the information provided in the table, I would hypothesize that the number of fish observed has no bearing on the level of dissolved oxygen. This hypothesis is based on the fact that the dissolved oxygen steadily and consistently increased by an increment of 2 with each progressive data point. When the number of fish observed increased significantly, from 3 to 10, the dissolved oxygen level still increased by only 2. Conversely, when the number of fish observed decreased significantly, from 15 to 10, the dissolved oxygen still maintained that consistent increase of 2.


3. What would your experimental approach be to test this hypothesis?

The experimental approach that I would use to test this hypothesis would be to obtain two freshwater fish tanks, fresh water, fish, an aquarium water-level meter, and a dissolved oxygen meter. In one tank, I would fill the aquarium to a specifically determined water level and measure the level of dissolved oxygen present with no fish. Then I would gradually begin adding fish daily, starting with one fish. Each day I would ensure that the water level remained the same as it was prior to adding the first fish, and I would continue increasing or decreasing the total number of fish daily. I would also consistently measure the dissolved oxygen levels as I introduced or removed fish.

In the second fish tank, I would ensure that the levels of fresh water and dissolved oxygen matched the levels of the first fish tank prior to adding any fish. Then, I would add the maximum number of fish that I intended to observe in the first tank and observe the oxygen level. For the duration of the experiment, I would ensure that the water level remained the same, but I would not modify the total number of fish in this tank. I would also observe the oxygen levels in the second tank throughout the experiment.

4. What are the independent and dependent variables?

The independent variable in this experiment would be the total number of fish being observed, and the dependent variable would be the dissolved oxygen.

5. What would be your control?

My control in this experiment would be the second fish tank, in which I would not vary the total number of fish.

6. What type of graph would be appropriate for this data set? Why?

The most appropriate type of graph for the data being compared in this example would be a line graph. I would use a line graph because it most clearly and effectively demonstrates how the two data series are related, as well as how fluctuations in one relate to changes in the other.

7. Graph the data from Table 1: Water Quality vs. Fish Population (found at the beginning of this exercise). You may use Excel, then “Insert” the graph, or use another drawing program. You may also draw it neatly by hand and scan your drawing. If you choose this option, you must insert the scanned jpg image here.
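If you prefer R to Excel, a minimal sketch of such a line graph (units for dissolved oxygen are not specified in Table 1):

```r
dissolved_oxygen <- c(0, 2, 4, 6, 8, 10, 12, 14, 16, 18)
fish_observed    <- c(0, 1, 3, 10, 12, 13, 15, 10, 12, 13)

plot(dissolved_oxygen, fish_observed, type = "b",
     xlab = "Dissolved oxygen",
     ylab = "Number of fish observed",
     main = "Water quality vs. fish population")
```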

8. Interpret the data from the graph made in Question 7.

The data from the graph supports my hypothesis that the total number of fish observed does not have any bearing on the level of dissolved oxygen, which steadily increases by a level of two with each data point.

Exercise 2: Testable Observations

Determine which of the following observations (A-J) could lead to a testable hypothesis.

For those which are testable:

Write a hypothesis and null hypothesis

What would be your experimental approach?

What are the dependent and independent variables?

How will you collect your data?

How will you present your data (charts, graphs, types)?

How will you analyze your data?

1. When a plant is placed on a window sill, it grows three inches faster per day than when it is placed on a coffee table in the middle of the living room. – TESTABLE

• Hypothesis – The plant will grow at a faster rate per day when it is placed on a window sill as opposed to being placed on a coffee table in the middle of a living room.

• Null Hypothesis – The location of the plant has no bearing on the growth rate per day.

• Experimental Approach – I would gather four identical plants, two of which I would rotate between the living room and the window sill daily; the other two would remain in their locations for the entire duration of the experiment. I would treat and care for all plants in an identical manner, ensure that their respective locations remained precisely the same, and measure and record the growth of each plant daily. After a sufficient period of time had elapsed, I would record the final relevant data in Excel and insert a line graph with all four plants incorporated into a single chart, which would also demonstrate the growth rate over time.

Subsequently, based on the information contained within the data points, and the line graph comparison, I would draw a final conclusion and present my data to interested parties in the form of a brief Microsoft PowerPoint presentation. I would include a brief summary of the intent of the experiment, a detailed explanation of the tools and exact process in which I used to conduct my tests, and all of the raw data statistics relative to the daily growth rate of all four plants.

• Dependent Variable – The growth rate of the plants.

• Independent Variable – The location of the plants.

• Control – The 2 static plants.

2. The teller at the bank with brown hair and brown eyes and is taller than the other tellers. – NOT TESTABLE

3. When Sally eats healthy foods and exercises regularly, her blood pressure is 10 points lower than when she does not exercise and eats unhealthy foods. – TESTABLE

• Hypothesis – Sally’s blood pressure will be lower when she eats healthy foods and exercises regularly.

• Null Hypothesis – The fact that Sally eats healthy foods and exercises regularly will have no effect on Sally’s blood pressure.

• Experimental Approach – I would first observe and record, for a sufficient period of time, Sally’s eating habits, exercise regimen, and blood pressure while she is not eating healthily or exercising regularly, in order to accurately gauge a reliable average range for her blood pressure in this phase of the experiment. Then, I would ensure that Sally was placed on a healthy eating plan, approved by a nutritionist, and prescribe an exercise routine. Sally’s eating habits and exercise regimen would again be recorded daily, along with her blood pressure statistics and other relevant information.

I would track and record the daily relevant statistics in Excel, and I would also use a line graph to illustrate the comparison of her blood pressure over time under the two different scenarios. Subsequently, based on the information contained within the data points and the line graph comparison, I would draw a final conclusion and present my data to interested parties in the form of a brief Microsoft PowerPoint presentation. I would include a brief summary of the intent of the experiment, a detailed explanation of the tools and the exact process I used to conduct my tests, and all of the raw data relative to the changes in Sally’s blood pressure as well as her diet and exercise habits throughout the experiment.

• Dependent Variable – Sally’s blood pressure.

• Independent Variable – Sally’s eating and exercise plan.

• Control – the phase of the experiment when Sally’s blood pressure is observed and recorded when she is not eating healthy or exercising regularly.

4. The Italian restaurant across the street closes at 9 pm but the one two blocks away closes at

5. For the past two days the clouds have come out at 3 pm and it has started raining at 3:15 pm. – NOT TESTABLE

6. George did not sleep at all the night following the start of daylight savings. – NOT TESTABLE

For each of the following, convert each value into the designated units.
1. 46,756,790 mg = 46.7568 kg
2. 5.6 hours = 20,160 seconds
3. 13.5 cm = 5.31496 inches
4. 47 °C = 116.6 °F

Exercise 4: Accuracy and Precision

1. During gym class, four students decided to see if they could beat the norm of 45 sit-ups in a minute. The first student did 64 sit-ups, the second did 69, the third did 65, and the fourth did 67. – Precision

2. The average score for the 5th grade math test is 89.5. The top 4th graders took the test and scored 89, 93, 91 and 87. – Both

3. Yesterday the temperature was 89 °F, tomorrow it’s supposed to be 88 °F, and the next day it’s supposed to be 90 °F, even though the average for September is only 75 °F! – Precision

4. Four friends decided to go out and play horseshoes. They took a picture of their results, shown to the right. – Neither

5. A local grocery store was holding a contest to see who could most closely guess the number of pennies inside a large jar. The first six people guessed the numbers 735, 209, 390, 300, 1005 and 689. The grocery clerk said the jar actually contains 568 pennies. – Neither

Exercise 5: Significant Digits and Scientific Notation

Part 1: Determine the number of significant digits in each number and write out the specific significant digits.
405000 – 3 (405)
0.0098 – 2 (98)
39.999999 – 8 (39999999)
13.00 – 4 (1300)
80,000,089 – 8 (80000089)
55,430.00 – 7 (5543000)
0.000033 – 2 (33)
620.03080 – 8 (62003080)

Part 2: Write the numbers below in scientific notation, incorporating what you know about significant digits.
70,000,000,000 = 7 × 10^10
0.000000048 = 4.8 × 10^-8
67,890,000 = 6.789 × 10^7
70,500 = 7.05 × 10^4
450,900,800 = 4.509008 × 10^8
0.009045 = 9.045 × 10^-3
0.023 = 2.3 × 10^-2


13.4.1 Doing the test in R

Running a Welch test in R is pretty easy. All you have to do is not bother telling R to assume equal variances. That is, you take the command we used to run a Student’s t-test and drop the var.equal = TRUE bit. So the command for a Welch test becomes:
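The command itself is not reproduced in this excerpt. As a sketch of the equivalent call in base R, with a hypothetical data frame harpo containing a grade column and a two-level tutor factor:

```r
# Student test (assumes equal variances):
# t.test(grade ~ tutor, data = harpo, var.equal = TRUE)

# Welch test: just drop var.equal = TRUE (it defaults to FALSE)
t.test(grade ~ tutor, data = harpo)
```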

Not too difficult, right? Not surprisingly, the output has exactly the same format as it did last time too:

The very first line is different, because it’s telling you that it’s run a Welch test rather than a Student test, and of course all the numbers are a bit different. But I hope that the interpretation of this output is fairly obvious. You read the output in the same way that you would for the Student test. You’ve got your descriptive statistics, the hypotheses, the test results and some other information. So that’s all pretty easy.

Except, except… our result isn’t significant anymore. When we ran the Student test, we did get a significant effect, but the Welch test on the same data set is not significant (t(23.03) = 2.03, p = .054). What does this mean? Should we panic? Is the sky burning? Probably not. The fact that one test is significant and the other isn’t doesn’t itself mean very much, especially since I kind of rigged the data so that this would happen. As a general rule, it’s not a good idea to go out of your way to try to interpret or explain the difference between a p-value of .049 and a p-value of .051. If this sort of thing happens in real life, the difference in these p-values is almost certainly due to chance. What does matter is that you take a little bit of care in thinking about what test you use. The Student test and the Welch test have different strengths and weaknesses. If the two populations really do have equal variances, then the Student test is slightly more powerful (lower Type II error rate) than the Welch test. However, if they don’t have the same variances, then the assumptions of the Student test are violated and you may not be able to trust it: you might end up with a higher Type I error rate. So it’s a trade-off. However, in real life, I tend to prefer the Welch test, because almost no one actually believes that the population variances are identical.


8.7 Multi-factor designs and linear models

8.7.1 What is a multifactorial design?

Let’s assume that in addition to the siRNA knockdown of the pasilla gene, we also want to test the effect of a certain drug. We could then envisage an experiment in which the experimenter treats the cells either with negative control, with the siRNA against pasilla, with the drug, or with both. To analyse this experiment, we can use the notation

$$y = \beta_0 + x_1 \beta_1 + x_2 \beta_2 + x_1 x_2 \beta_{12} \qquad (8.1)$$

This equation can be parsed as follows. The left-hand side, $y$, is the experimental measurement of interest. In our case, this is the suitably transformed expression level (we’ll discuss this in Section 8.8.3) of a gene. Since in an RNA-Seq experiment there are lots of genes, we’ll have as many copies of Equation (8.1), one for each. The coefficient $\beta_0$ is the base level of the measurement in the negative control; often it is called the intercept.

⊕ Sometimes Equation (8.1) is written with an additional term $x_0$ that is multiplied with $\beta_0$, where it is understood that $x_0=1$ always. It turns out that this makes subsequent notation and bookkeeping easier, since then the intercept can be handled consistently together with the other $\beta$s, instead of being a separate case. The design factors $x_1$ and $x_2$ are binary indicator variables: $x_1$ takes the value 1 if the siRNA was transfected and 0 if not, and similarly, $x_2$ indicates whether the drug was administered. In the experiment where only the siRNA is used, $x_1=1$ and $x_2=0$, and the third and fourth terms of Equation (8.1) vanish. Then, the equation simplifies to $y=\beta_0+\beta_1$. This means that $\beta_1$ represents the difference between treatment and control. If our measurements are on a logarithmic scale, then

$$\beta_1 = y - \beta_0 \qquad (8.2)$$

is the logarithmic fold change due to treatment with the siRNA. In exactly the same way, $\beta_2$ is the logarithmic fold change due to treatment with the drug. What happens if we treat the cells with both siRNA and drug? In that case, $x_1=x_2=1$, and Equation (8.1) can be rewritten as

$$y = \beta_0 + \beta_1 + \beta_2 + \beta_{12} \qquad (8.3)$$

This means that $\beta_{12}$ is the difference between the observed outcome, $y$, and the outcome expected from the individual treatments, obtained by adding to the baseline the effect of siRNA alone, $\beta_1$, and of drug alone, $\beta_2$.

We call $\beta_{12}$ the interaction effect of siRNA and drug. It has nothing to do with a physical interaction; the terminology indicates that the effects of these two different experimental factors do not simply add up, but combine in a more complicated fashion.

⊕ Note that the addition is on the logarithmic scale, which corresponds to multiplication on the original scale. For instance, if the target of the drug and of the siRNA were equivalent, leading to the same effect on the cells, then we biologically expect that $\beta_1=\beta_2$. We also expect that their combination has no further effect, so that $\beta_{12}=-\beta_1$. If, on the other hand, the targets of the drug and of the siRNA are in parallel pathways that can buffer each other, we’ll expect that $\beta_1$ and $\beta_2$ are both relatively small, but that the combined effect is synergistic, and $\beta_{12}$ is large.

Not always do we care about interactions. Many experiments are designed with multiple factors where we care most about each of their individual effects. In that case, the combinatorial treatment might not be present in the experimental design, and the model to use for the analysis is a version of Equation (8.1) with the rightmost term removed.

We can succinctly encode the design of the experiment in the design matrix. For instance, for the combinatorial experiment described above, the design matrix is

$$\begin{pmatrix} 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 1 & 1 & 1 \end{pmatrix} \qquad (8.4)$$

The columns of the design matrix correspond to the experimental factors, and its rows represent the different experimental conditions, four in our case. If, instead, the combinatorial treatment is not performed, then the design matrix is reduced to only the first three rows of (8.4).
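In R there is no need to write the design matrix out by hand; `model.matrix` builds it from a formula. A minimal sketch for the two-factor design with interaction:

```r
# The four conditions of the combinatorial design
conditions <- expand.grid(x1 = c(0, 1), x2 = c(0, 1))

# Columns: intercept, x1, x2, and the interaction x1:x2,
# matching the four columns of the design matrix (8.4)
model.matrix(~ x1 * x2, data = conditions)
```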

8.7.2 What about noise and replicates?

Equation (8.1) provides a conceptual decomposition of the observed data into the effects caused by the different experimental variables. If our data (the $y$s) were absolutely precise, we could set up a linear system of equations, one equation for each of the four possible experimental conditions represented by the $x$s, and solve for the $\beta$s.

Of course, we usually wish to analyze real data that are affected by noise. We then need replicates to estimate the levels of noise and assess the uncertainty of our estimated $\beta$s. Only then can we empirically assess whether any of the observed changes between conditions are significantly larger than those occurring just due to experimental or natural variation. We need to slightly extend the equation,

$$y_j = \beta_0 + x_{j1} \beta_1 + x_{j2} \beta_2 + x_{j1} x_{j2} \beta_{12} + \varepsilon_j \qquad (8.5)$$

We have added the index $j$ and a new term $\varepsilon_j$. The index $j$ now explicitly counts over our individual replicate experiments; for instance, if for each of the four conditions we perform three replicates, then $j$ counts from 1 to 12. The design matrix now has 12 rows, and $x_{jk}$ is the value of the matrix in its $j$th row and $k$th column.

⊕ Remember that since $\beta_0$ is the intercept, $x_{j0}=1$ for all $j$. The additional terms $\varepsilon_j$, which we call the residuals, are there to absorb differences between replicates. However, one additional modeling component is needed: the system of twelve equations (8.5) would be underdetermined without further information, since it now has more variables (twelve epsilons and four betas) than it has equations (twelve, one for each $j$). To fix this, we require that the $\varepsilon_j$ be small. One popular way (we’ll encounter others) to achieve this is to minimize the sum of squared residuals,

$$\sum_j \varepsilon_j^2 \qquad (8.6)$$

It turns out that with this requirement satisfied, the $\beta$s represent the average effects of each of the experimental factors, while the residuals $\varepsilon_j$ reflect the experimental fluctuations around the mean between the replicates. This approach, which is called least sum of squares fitting, is mathematically convenient, since it can be achieved by straightforward matrix algebra. It is what the R function lm does.
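A small simulated illustration of such a fit with lm (all parameter values are made up):

```r
set.seed(3)
design <- expand.grid(x1 = c(0, 1), x2 = c(0, 1))
d <- design[rep(1:4, each = 3), ]        # three replicates per condition
beta <- c(2, 1, 0.5, -0.3)               # hypothetical beta_0 .. beta_12
d$y <- with(d, beta[1] + beta[2] * x1 + beta[3] * x2 +
               beta[4] * x1 * x2 + rnorm(nrow(d), sd = 0.1))

fit <- lm(y ~ x1 * x2, data = d)
coef(fit)    # least-squares estimates of beta_0, beta_1, beta_2, beta_12
```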

An alternative way to write Equation (8.5) is

$$y_j = \sum_{k=0}^{3} x_{jk} \beta_k + \varepsilon_j \qquad (8.7)$$

How can this be mapped to Equation (8.5), i.e., what happened to the interaction term $x_{j1} x_{j2} \beta_{12}$?

This is really just a trivial matter of notation: the sum extends over $k=0,\ldots,3$, where the terms for $k=0,1,2$ are exactly as we know them already. We write $\beta_3$ instead of $\beta_{12}$, and $x_{j3}$ is defined to be $x_{j1} x_{j2}$. The generic notation (8.7) is practical to use in computer software that implements linear models and in mathematical proofs. It also highlights that the “scientific content” of a linear model is condensed in its design matrix.

Show that if we have fit Equation (8.5) to data such that objective (8.6) holds, the fit residuals $\hat{\varepsilon}_j$ have an average of 0.

8.7.3 Analysis of variance

A model like (8.5) is called a linear model, and often it is implied that criterion (8.6) is used to fit it to data. This approach is elegant and powerful, but for novices it can take some time to appreciate all its facets. What is the advantage over simply taking, for each distinct experimental condition, the average over replicates and comparing these values across conditions? In simple cases, the latter approach can be intuitive and effective. However, it comes to its limits when the replicate numbers are not all the same in the different groups, or when one or more of the $x$-variables is continuous-valued. In these cases, one will invariably end up with something like fitting (8.5) to the data. A useful way to think about (8.5) is contained in the term analysis of variance, abbreviated ANOVA. In fact, what Equation (8.5) does is decompose the variability of $y$ that we observed in the course of our experiments into elementary components: its baseline value $\beta_0$, the variability caused by the effect of the first variable, $\beta_1$, the variability caused by the effect of the second variable, $\beta_2$, the variability caused by the effect of the interaction, $\beta_{12}$, and variability that is unaccounted for. The last of these we commonly call noise; the other ones, systematic variability.

⊕ The distinction between noise and systematic variability is in the eye of the beholder, and depends on our model, not on reality.

8.7.4 Robustness

The sum (8.6) is sensitive to outliers in the data. A single measurement $y_j$ with an outlying value can draw the $\beta$ estimates far away from the values implied by the other replicates. This is the well-known fact that methods based on least sum of squares have a low breakdown point: if even only a single data point is outlying, the whole statistical result can be strongly affected. For instance, the average of a set of $n$ numbers has a breakdown point of $\frac{1}{n}$, meaning that it can be arbitrarily changed by changing only a single one of the numbers. On the other hand, the median has a much higher breakdown point. Changing a single number often has no effect at all, and when it does, the effect is limited to the range of the data points in the middle of the ranking (i.e., those adjacent to rank $\frac{n}{2}$). To change the median by an arbitrarily high amount, you need to change half the observations. We call the median robust, and its breakdown point is $\frac{1}{2}$. Remember that the median of a set of numbers $y_1, y_2, \ldots$ minimizes the sum $\sum_j |y_j - \beta_0|$.

To achieve a higher degree of robustness against outliers, other choices than the sum of squares (8.6) can be used as the objective of minimization. Among these are:

$$R = \sum_j |\varepsilon_j| \qquad (8.8)$$

$$R = \sum_j \rho_s(\varepsilon_j) \qquad (8.9)$$

$$R = Q_\theta\!\left(\varepsilon_1^2, \ldots, \varepsilon_n^2\right) \qquad (8.10)$$

$$R = \sum_j w_j \varepsilon_j^2 \qquad (8.11)$$

Here, $R$ is the quantity to be minimized. Choice (8.8) is called least absolute deviations regression. It can be viewed as a generalization of the median. Although conceptually simple, and attractive at first sight, it is harder to minimize than the sum of squares, and it can be less stable and less efficient, especially if the data are limited or do not fit the model (the Wikipedia article gives an overview). Choice (8.9), also called M-estimation, uses a penalization function $\rho_s$ (least-squares regression is the special case with $\rho_s(\varepsilon)=\varepsilon^2$) that looks like a quadratic function for a limited range of $\varepsilon$, but has a smaller slope, flattens out, or even drops back to zero for absolute values $|\varepsilon|$ that are larger than the scale parameter $s$. The intention behind this is to downweight the effect of outliers, i.e., of data points that have large residuals (Huber 1964). A choice of $s$ needs to be made, and it determines what is called an outlier. One can even drop the requirement that $\rho_s$ is quadratic around 0 (as long as its second derivative is positive), and a variety of choices for the function $\rho_s$ have been proposed in the literature. The aim is to give the estimator desirable statistical properties (say, bias and efficiency) when and where the data fit the model, but to limit or nullify the influence of those data points that do not, and to keep computations tractable.

Plot the graph of the function \(\rho_s(\varepsilon)\) proposed by Huber (1964) for M-estimators.

Huber’s paper defines, on page 75:

\[
\rho_s(\varepsilon) =
\begin{cases}
\frac{1}{2}\varepsilon^2, & \text{if } |\varepsilon| < s\\[2pt]
s|\varepsilon| - \frac{1}{2}s^2, & \text{if } |\varepsilon| \ge s
\end{cases}
\]

The graph produced by the code below is shown in Figure 8.8.
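
A minimal base-R sketch that produces such a graph, assuming \(s = 2\) as in the caption; it also overlays the least-squares parabola \(\varepsilon^2/2\) for comparison:

```r
## Huber's rho_s: quadratic for |eps| < s, linear beyond.
rho <- function(eps, s) {
  ifelse(abs(eps) < s, eps^2 / 2, s * abs(eps) - s^2 / 2)
}
eps <- seq(-7, 7, length.out = 200)
plot(eps, rho(eps, s = 2), type = "l",
     xlab = expression(epsilon), ylab = expression(rho[s](epsilon)))
lines(eps, eps^2 / 2, lty = 2)   # least-squares penalty, for comparison
```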

Figure 8.8: Graph of \(\rho_s(\varepsilon)\), for a choice of \(s = 2\).

Choice (8.10) generalizes the least sum of squares method in yet another way. In least quantile of squares (LQS) regression, the sum over the squared residuals is replaced with a quantile, for instance \(Q_{50}\), the median, or \(Q_{90}\), the 90%-quantile (Rousseeuw 1987). In a variation thereof, least trimmed sum of squares (LTS) regression, a sum of squared residuals is used, but the sum extends not over all residuals, only over the fraction \(0 \le \theta \le 1\) of smallest residuals. The motivation in either case is that outlying data points lead to large residuals, and as long as they are rare, they do not affect the quantile or the trimmed sum.
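
Both estimators are available off the shelf; here is a minimal sketch using MASS::lqs on simulated data with planted outliers (the data and the number of outliers are illustrative assumptions, not from the text):

```r
## LQS/LTS regression via MASS::lqs; the planted outliers barely
## influence the robust fits, unlike ordinary least squares.
library(MASS)
set.seed(3)
x <- runif(50)
y <- 2 + 3 * x + rnorm(50, sd = 0.2)
y[1:5] <- y[1:5] + 10                      # five gross outliers
coef(lm(y ~ x))                            # least squares: pulled away
coef(lqs(y ~ x, method = "lts"))           # least trimmed squares
coef(lqs(y ~ x, method = "lqs"))           # least quantile of squares
```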

However, there is a price: while the least sum of squares optimization (8.6) can be done through straightforward linear algebra, more complicated iterative optimization algorithms are needed for M-estimation, LQS and LTS regression.

Approach (8.11) weights down outliers in yet another way. It assumes that we have some way of deciding what weight \(w_j\) we want to give to each observation, presumably down-weighting outliers. For instance, in Section 8.10.3, we will encounter the approach used by the DESeq2 package, in which the leverage of each data point on the estimated \(\beta\)s is assessed using a measure called Cook’s distance. For those data points whose Cook’s distance is deemed too large, the weight \(w_j\) is set to zero, whereas the other data points get \(w_j = 1\). In effect, this means that the outlying data points are discarded and ordinary regression is performed on the others. The extra computational effort of carrying the weights along is negligible, and the optimization is still straightforward linear algebra.
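
A minimal sketch of this scheme in plain R (simulated data; the \(4/n\) cutoff is a common rule of thumb assumed here, not the DESeq2 criterion):

```r
## Flag high-influence points by Cook's distance, zero their weights,
## and refit: lm() with w_j = 0 effectively discards those points.
set.seed(4)
x <- runif(50)
y <- 2 + 3 * x + rnorm(50, sd = 0.3)
y[c(7, 23)] <- y[c(7, 23)] + 8            # two planted outliers
fit <- lm(y ~ x)
w <- as.numeric(cooks.distance(fit) <= 4 / length(y))
refit <- lm(y ~ x, weights = w)
coef(refit)                               # close to the true (2, 3)
```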

All of these approaches to outlier robustness introduce a degree of subjectiveness and rely on sufficient replication. The subjectiveness is reflected in the parameter choices that need to be made: \(s\) in (8.9), \(\theta\) in (8.10), the weights \(w_j\) in (8.11). One scientist’s outlier may be the Nobel prize of another. On the other hand, outlier removal is no remedy for sloppy experiments and no justification for wishful thinking.

Search the documentation of R and CRAN packages for implementations of the above robust regression methods. A good place to start is the CRAN task view on robust statistical methods.
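
As a starting point (a hedged sketch, not an exhaustive answer): the MASS package that ships with R covers M-estimation and LQS/LTS, and the robustbase package collects many further estimators.

```r
## Robust regression entry points in standard packages, on toy data.
library(MASS)
set.seed(5)
x <- runif(30); y <- 1 + 2 * x + rnorm(30); y[1] <- y[1] + 10
rlm(y ~ x)                  # M-estimation (Huber's psi by default)
lqs(y ~ x)                  # LTS by default; also methods "lqs" and "lms"
# install.packages("robustbase"); robustbase::lmrob(y ~ x)  # MM-estimation
```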

