Jillian and Joachim Willey describe how to use R (http://www.R-project.org/) to estimate the properties of random processes from observed time-series data. A working knowledge of R is assumed. Experimental data on the action of thymidine kinase from a phage TK2 are analyzed using autocorrelation and regression techniques. The analysis provides evidence for a short-term memory component, for a multivariate stochastic process with zero cross-correlation, and for a nonlinear relationship between growth and TK2 enzyme activity. A simulation study reinforces the biological background and supports the use of these R procedures in data analysis.
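The authors work in R. The Python sketch below is a language-neutral illustration of the central quantity in such an autocorrelation analysis, the sample autocorrelation at lag k. It uses the standard biased estimator (divisor n at every lag on the demeaned series), which is what R's `acf()` reports by default. The function name `sample_acf` and the simulated AR(1) series are hypothetical. An AR(1) process is used only as one simple model of short-term memory and is an assumption, not the model fitted to the TK2 data.

```python
import random


def sample_acf(x, max_lag):
    """Sample autocorrelations r_0 .. r_max_lag of a 1-D series.

    r_k = sum_{t=0}^{n-k-1} (x[t] - m)(x[t+k] - m) / sum_t (x[t] - m)^2,
    i.e. the biased estimator applied to the demeaned series.
    """
    n = len(x)
    if n < 2:
        raise ValueError("need at least two observations")
    if not 0 <= max_lag < n:
        raise ValueError("max_lag must satisfy 0 <= max_lag < len(x)")
    m = sum(x) / n
    dev = [v - m for v in x]
    denom = sum(d * d for d in dev)
    if denom == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
        for k in range(max_lag + 1)
    ]


if __name__ == "__main__":
    # Hypothetical AR(1) series with coefficient phi: a simple short-memory process.
    random.seed(1)
    phi, series = 0.6, [0.0]
    for _ in range(2000):
        series.append(phi * series[-1] + random.gauss(0.0, 1.0))
    print([round(r, 2) for r in sample_acf(series, 4)])
```

For a short-memory process like this one, the printed autocorrelations should fall off roughly like 0.6**k and become negligible after a few lags. That rapid decay is the signature an autocorrelation analysis looks for.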
The Human Genome Project launched a new era in genomic science with the goal of determining the complete, unduplicated sequence of all organisms' genomes. This project was accompanied by several complementary efforts: the sequencing of large collections of organisms' genomes, the sequencing of the genomes of a wider range of organisms, and the development of a set of 'standard organisms' that could serve as models of human biology. In this paper we describe a two-part computational project aimed at the latter goal: phylogenetic reconstruction of the ancestral sequence of all protein-coding genes in Drosophila species. First, we focus on the species whose genomes were sequenced in the Human Genome Project: Drosophila melanogaster, Drosophila pseudoobscura and Drosophila virilis. Second, we apply a phylogenetic reconstruction strategy that combines a sequence homology search component with a gene order and synteny reconstruction component. We assess the quality of our results and address the accuracy and scalability of the new computational strategy. Testing the method on the whole-genome sequences of the four species, we show that it can accurately reconstruct the missing genomes of many fly species and that, even for short genomes, it captures the ancestral sequence of all protein-coding genes.
Faced with limited computing power, scientists must choose useful quantitative phenotypes from a complex range of possible phenotypes. In large-animal genetics, for example, many animals are surveyed at birth, but only a few are actually phenotyped under standardized conditions. New statistical approaches are needed to quantify the sources of variation in these birth data and to identify the main sources that can serve as a basis for selection. Here we propose a novel method for identifying the relative importance of discrete phenotypes for selection in animal genetics. We first classify phenotypes according to three criteria: (i) ease of measurement at birth, (ii) heritability, and (iii) the need for additional indicators (e.g. show) to link the phenotype to a particular gene of interest. Data from animals in replicate lines are modeled as phenotypes of a single trait, and factor models are used to assess (i) the additive genetic effects on each criterion and (ii) the degree to which each criterion is required to identify a given allele, taking into account both the additive genetic variance and the variance component due to random sampling from ancestors. Using a genetic diversity panel of breed-cross lines, we demonstrate the method on both simulated and real data, explaining various sources of variation in reproduction, growth and disease resistance, and quantifying the importance of each criterion for detecting an allele of interest.

We have investigated the behaviour of a variety of correlation measures used to assess co-expression in yeast ribosomal RNA (rRNA) data. We show that simple correlation measures, such as Pearson and Spearman correlation, are likely to be inappropriate for rRNA data, both because of the nature of rRNA transcriptional regulation and because of the association of rRNA data with cell size. By contrast, we find that measures that model the multiplicity of rRNA operons, such as the Jaccard measure, are more robust. Finally, we consider the effect of non-rRNA loci on these quantitative measures and find that the Tukey family provides the most robust correlation measures, even in the presence of non-rRNA loci.
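The abstract does not define its "Tukey family" of measures. One widely used Tukey-based robust correlation is the biweight midcorrelation. It measures each value's deviation from the median and down-weights large deviations with Tukey's biweight, so extreme values contribute little or nothing. The Python sketch below implements that measure next to ordinary Pearson correlation. It illustrates what such a robust measure looks like under that assumption and is not presented as the authors' method. The tuning constant 9 (in units of the median absolute deviation) is the conventional choice. The function names and the toy vectors are hypothetical.

```python
import math
from statistics import median


def _biweight_terms(v):
    """Biweight-weighted deviations from the median, scaled to unit Euclidean norm."""
    med = median(v)
    mad = median([abs(a - med) for a in v])
    if mad == 0:
        raise ValueError("median absolute deviation is zero")
    terms = []
    for a in v:
        u = (a - med) / (9 * mad)
        # Tukey's biweight: weight (1 - u^2)^2 for |u| < 1, zero otherwise
        w = (1 - u * u) ** 2 if abs(u) < 1 else 0.0
        terms.append((a - med) * w)
    norm = math.sqrt(sum(t * t for t in terms))
    return [t / norm for t in terms]


def bicor(x, y):
    """Biweight midcorrelation of two equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return sum(a * b for a, b in zip(_biweight_terms(x), _biweight_terms(y)))


def pearson(x, y):
    """Ordinary Pearson correlation, for comparison."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    x = list(range(1, 11))
    y = x[:-1] + [1000]  # identical to x except for one extreme value
    print(round(bicor(x, y), 3), round(pearson(x, y), 3))
```

With x = 1..10 and y equal to x except that the last value is replaced by 1000, the biweight midcorrelation stays at roughly 0.88, while Pearson correlation drops to roughly 0.53. The outlier receives zero weight in the biweight version but dominates the Pearson sums.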