Department of Information Technology

Applications in Systems Biology

Background

An important aim of systems biology is to understand the biological phenomena we observe at the cellular, tissue and organism levels in terms of the molecular interactions that occur inside individual cells. Computational Systems Biology deals with computer-assisted modeling in cell biology, as well as with developing the methods that make such approaches possible. In a young and interdisciplinary field such as Systems Biology, fundamental methods development needs to go hand in hand with software development. This lets new methods be disseminated rapidly and push the boundaries of what modelers consider feasible. Cloud services and Software as a Service (SaaS) are an effective way to put the latest methods research in the hands of practitioners.

A popular way to model and simulate biochemical networks in a subcellular setting is through stochastic models such as Markov processes. Using such models, one can generate time series of individual realizations of the chemical network dynamics. To study the effects of stochasticity, or the robustness of a biochemical network to noise, vast numbers of such stochastic realizations have to be generated and analyzed statistically. Often, this needs to be repeated for many different values of the model parameters, a pattern common to many applications that rely on Monte Carlo techniques. The computational experiments needed to address biological questions therefore often require large computational resources and may have demanding storage requirements. However, they are usually largely task-parallel and map well to distributed and heterogeneous computational resources.
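A widely used way to generate such realizations for a well-mixed Markov process model is Gillespie's stochastic simulation algorithm (SSA). The sketch below is our own illustration under stated assumptions, not StochSS or URDME code. It applies the direct method to a hypothetical birth-death model (∅ → X with propensity k, X → ∅ with propensity γx) and collects an ensemble of independent realizations for statistical analysis. The function names and parameter values are ours.

```python
import random
from statistics import mean, variance


def ssa_birth_death(k, gamma, x0, t_end, rng):
    """One realization of the hypothetical birth-death model
    0 -> X (propensity k), X -> 0 (propensity gamma * x),
    simulated with Gillespie's direct method.
    Returns the copy number of X at time t_end."""
    t, x = 0.0, x0
    while True:
        a_birth = k
        a_death = gamma * x
        a_total = a_birth + a_death
        if a_total == 0.0:
            return x  # no reaction can fire; the state is frozen
        # Time to the next reaction is exponential with rate a_total.
        tau = rng.expovariate(a_total)
        if t + tau > t_end:
            return x  # the state is constant between reaction events
        t += tau
        # Pick the reaction with probability proportional to its propensity.
        if rng.random() * a_total < a_birth:
            x += 1
        else:
            x -= 1


def ensemble(k, gamma, x0, t_end, n_realizations, seed=0):
    """Statistically independent realizations; one seeded generator
    is shared only so that the ensemble is reproducible."""
    rng = random.Random(seed)
    return [ssa_birth_death(k, gamma, x0, t_end, rng)
            for _ in range(n_realizations)]


def summarize(samples):
    """Sample mean and sample variance of the ensemble."""
    return mean(samples), variance(samples)
```

Starting from x0 = 0, the copy number in this toy model is Poisson distributed at every time. For k = 10 and γ = 1, by t = 10 the mean and variance are both essentially k/γ = 10. So `summarize(ensemble(10.0, 1.0, 0, 10.0, 2000))` should return values close to (10, 10). The exact numbers depend on the seed.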

Research

StochSS - Stochastic Simulation Service

We participate in the development of StochSS - Stochastic Simulation Service. Based on our observations of how stochastic biochemical network simulations are used in modeling projects today, we are building a service-oriented platform where a model can be scaled across modeling levels. These range from simple ordinary differential equations, to well-mixed stochastic models, to PDEs, and up to spatial stochastic simulations. In addition to scaling the modeling level, we are developing the infrastructure to scale computational resources from local workstations to local clusters, grids and (hybrid) clouds. The choice depends on the resource requirements of the scientific workflow a modeler runs.
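At the coarsest modeling level mentioned above, a reaction network is described by deterministic ordinary differential equations, the reaction rate equations. As a minimal sketch (our own illustration, not StochSS code), the hypothetical birth-death model ∅ → X (rate k), X → ∅ (rate γx) gives dx/dt = k - γx. The code below integrates this equation with forward Euler and checks the result against the closed-form solution. The function names and step size are assumptions.

```python
import math


def rre_euler(k, gamma, x0, t_end, dt=1e-3):
    """Integrate the reaction rate equation dx/dt = k - gamma * x
    (deterministic level of the hypothetical birth-death model)
    with forward Euler. t_end is rounded to a whole number of steps."""
    n_steps = int(round(t_end / dt))
    x = float(x0)
    for _ in range(n_steps):
        x += dt * (k - gamma * x)
    return x


def rre_exact(k, gamma, x0, t):
    """Closed-form solution x(t) = k/gamma + (x0 - k/gamma) * exp(-gamma * t)."""
    x_inf = k / gamma
    return x_inf + (x0 - x_inf) * math.exp(-gamma * t)
```

In this toy model, both propensities are at most linear in x, so the ODE solution coincides with the mean of the corresponding stochastic model. For networks with bimolecular reactions this is no longer true in general, which is one reason to move up to the stochastic modeling levels.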

The figure shows a snapshot of the StochSS UI - a web application that can run as a native application (OSX, Linux) when served from localhost, or be deployed as a service by an administrator.

For more information, visit us at GitHub.

Distributed Computation

While a single realization can give valuable insight into the dynamics, a complete picture requires generating large ensembles of independent realizations and analyzing them statistically. Furthermore, many parameters, such as kinetic rate constants, are usually not determined by experiments, or are known only with poor precision (e.g. ±100%). In practice, this means that parameter sweeps are necessary to fit the model to experimental data, or to assess how robust a hypothesis is to variations in the parameters. The problem is data-intensive but highly task-parallel, and in principle scales linearly with the number of available processors. Problems of this type are especially well suited to distributed computational environments such as clouds and grids, and are often referred to as e-Science applications.
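Each parameter point, and each realization within it, is independent, so a parameter sweep is essentially a `map` over a list of tasks. The sketch below illustrates this under our own assumptions and is not the URDME or StratUm interface. A hypothetical task computes an ensemble mean for one decay rate γ of the toy model X → ∅. The sweep accepts any `map`-like function, so the same code can run serially or on a process pool. Each task gets its own seed, so results do not depend on how tasks are scheduled. All names here are hypothetical.

```python
import math
import random
from statistics import mean


def decay_realization(gamma, x0, t_end, rng):
    """Exact realization of X -> 0 with rate gamma per molecule:
    molecules decay independently with Exp(gamma) lifetimes,
    so count the survivors at t_end."""
    return sum(1 for _ in range(x0) if rng.expovariate(gamma) > t_end)


def run_task(task):
    """One independent unit of work: an ensemble mean for a single
    parameter point. Module-level so a process pool can pickle it."""
    gamma, x0, t_end, n_reps, seed = task
    rng = random.Random(seed)
    return gamma, mean(decay_realization(gamma, x0, t_end, rng)
                       for _ in range(n_reps))


def parameter_sweep(gammas, x0, t_end, n_reps, base_seed=0, map_fn=map):
    """Build one task per parameter value and evaluate the tasks with
    map_fn (the built-in map, or e.g. ProcessPoolExecutor.map)."""
    tasks = [(g, x0, t_end, n_reps, base_seed * 1_000_003 + i)
             for i, g in enumerate(gammas)]
    return dict(map_fn(run_task, tasks))


def expected_mean(gamma, x0, t_end):
    """Analytical ensemble mean x0 * exp(-gamma * t_end), for checking."""
    return x0 * math.exp(-gamma * t_end)
```

To use several cores, one could pass `map_fn=pool.map`, where `pool` is a `concurrent.futures.ProcessPoolExecutor` created inside an `if __name__ == "__main__":` guard. Conceptually, moving to clusters, grids or clouds replaces `map_fn` with a remote dispatcher. Real frameworks must also handle data staging, scheduling and failures.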

Together with collaborators in eSSENCE, P-O Östberg (UmU), Erik Elmroth (UmU), Sverker Holmgren (UU) and with Brian Drawert (UCSB) we are investigating different approaches and tools to efficiently and flexibly scale simulations with URDME to distributed computing environments such as grids and clouds.

The figure shows the architecture of an integration framework that we created as a case study of how to enable grid support for URDME through the StratUm framework developed in the Distributed Computing Group, UmU (image credit: Brian Drawert). Distributed computations are an economical way of conducting, e.g., large-scale parameter sweeps, as shown to the right.

For more information, see

P-O Östberg, A. Hellander, B. Drawert, E. Elmroth, S. Holmgren and L. Petzold, Abstractions for Scaling eScience Applications to Distributed Computing Environments; A StratUm Integration Case Study in Molecular Systems Biology, Proceedings of BIOINFORMATICS 2012, International Conference on Bioinformatics Models, Methods, and Algorithms, pp. 290-294, 2012.

P-O Östberg, A. Hellander, B. Drawert, E. Elmroth, S. Holmgren and L. Petzold, Reducing Complexity in Management of eScience Computations, Proceedings of CCGrid 2012 - The 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, pp. 845-852, 2012.

Simulation Output Analysis

A major challenge is the efficient management and analysis of the vast amounts of simulation output a computational experiment may generate. Consider, for example, a spatial stochastic simulation with a fine spatial grid, high temporal resolution and $10^6$ different parameter points. Even a modest number of independent realizations per parameter point (e.g. 100) could easily produce 100 TB of output data. In some cases, aggregation and reduction functions can be applied to the data during the simulation to reduce the size of the output. In other cases, the data must be made persistent, at least for some time. One reason is that, early in a modeling project, modelers might not yet know all the ways they will need to analyze the simulation output; this is more or less inherent in the scientific process.
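To put these numbers in perspective: $10^6$ parameter points times 100 realizations is $10^8$ trajectories, so 100 TB corresponds to roughly 1 MB per trajectory. When the quantities of interest are known in advance, a reduction applied during simulation can avoid storing the trajectories at all. The sketch below is our own illustration, not the URDME output pipeline. It uses Welford's online algorithm to accumulate the per-time-point mean and variance of an ensemble one trajectory at a time. It also includes the standard pairwise combination formula, so partial results from independent workers can be merged. The class and method names are hypothetical.

```python
class RunningStats:
    """Per-time-point mean and variance of an ensemble of trajectories,
    accumulated with Welford's online algorithm (O(n_times) memory)."""

    def __init__(self, n_times):
        self.count = 0
        self.mean = [0.0] * n_times
        self.m2 = [0.0] * n_times  # sums of squared deviations from the mean

    def push(self, trajectory):
        """Fold one trajectory (a sequence of n_times values) into the statistics."""
        if len(trajectory) != len(self.mean):
            raise ValueError("trajectory length does not match n_times")
        self.count += 1
        for i, x in enumerate(trajectory):
            delta = x - self.mean[i]
            self.mean[i] += delta / self.count
            self.m2[i] += delta * (x - self.mean[i])

    def merge(self, other):
        """Combine with statistics accumulated elsewhere (e.g. another worker)
        using the pairwise formula for means and squared-deviation sums."""
        if len(other.mean) != len(self.mean):
            raise ValueError("time grids do not match")
        n = self.count + other.count
        if n == 0:
            return self
        for i in range(len(self.mean)):
            delta = other.mean[i] - self.mean[i]
            self.mean[i] += delta * other.count / n
            self.m2[i] += other.m2[i] + delta * delta * self.count * other.count / n
        self.count = n
        return self

    def variance(self):
        """Sample variance (n - 1 denominator) at each time point."""
        if self.count < 2:
            raise ValueError("need at least two trajectories")
        return [m / (self.count - 1) for m in self.m2]
```

Each worker could keep one `RunningStats` per parameter point and send back only `count`, `mean` and `m2`. The limitation is that statistics not chosen in advance cannot be recovered later, so persistent storage of the raw output remains necessary in many projects.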

In collaboration with Tore Risch, Sverker Holmgren and Salman Toor, we are exploring the utility of two tools for flexible and efficient postprocessing of the time series data that URDME generates during a computational experiment. The first is SciSPARQL, an extension of SPARQL for scientific array data developed by Risch's group. The second is Chelonia, a distributed storage system developed by Toor. For more information, consult

S. Toor, A. Andrejev, A. Hellander, S. Holmgren and T. Risch, Scientific analysis by queries in extended SPARQL over a distributed e-Science data store, Proceedings of the 9th International Conference on e-Science, 2013.

For other aspects of Computational Systems Biology research at the department, please visit the Computational Systems Biology Research Group.

Contact

Andreas Hellander