Abstract
While Next-Generation Sequencing (NGS) offers biologists a way to obtain genetic data at a fraction of the cost of conventional sequencing and at unprecedented speed, it also comes with unprecedented complexity: imagine trying to reassemble an encyclopedia from its shredded pages. That's roughly the task that alignment programs face, and it's only the first step in NGS analysis. Hundreds of algorithms have been developed to work with NGS data, but no tool has yet emerged as a "one size fits all" solution. Each has its own specific function, as well as its own file format, documentation, and parameters, and just about all of them still have only a command-line interface. The potential for human error is huge, so meticulous data provenance is key, and a biologist (especially one with little command-line experience) is likely to run into serious usability issues.
We are currently working with Nicola Camp at the University of Utah's Division of Genetic Epidemiology to develop a visualization technique for an ongoing research project that searches for deleterious and protective genetic variations affecting breast cancer risk in the chromosomal regions surrounding three apoptosis genes: CASP8, DR4, and DR5. We have set up a Galaxy server as a more user-friendly environment for chaining NGS tools together, and we are still brainstorming how to visualize the results effectively.
Piyush Rai
Title: Nonparametric Bayesian Models for Inferring Low-Dimensional Structures from High-Dimensional Data
Abstract
Real-world data sets, such as text corpora, gene-expression data sets, image databases, and speech signals, are complex and high-dimensional. However, many of these data sets can be described using simpler, lower-dimensional "latent" structures. Finding these latent structures yields two benefits: (1) a better understanding of the data via dimensionality reduction, and (2) the ability to use the latent structures in prediction problems that are otherwise prone to overfitting in high dimensions. This talk will describe some of my recent work on nonparametric Bayesian models that learn such low-dimensional structures and automatically infer their "right size", avoiding the need for ad hoc model-selection methods.