Centralized Life Sciences Data (CLSD)

The Problem
Increasingly, biomedical research requires integrating data from a wide variety of sources. Assembling the data a researcher needs takes considerable work, and each researcher has to repeat that work.

Commonly available web interfaces work poorly for queries that are frequent, numerous, or large. Such queries are best handled programmatically.


The Solution
Public datasets are downloaded and prepared for use locally at IU. User applications draw data through a single, centralized interface called CLSD. CLSD is implemented as a DB2 database on our IBM Research SP supercomputer. The database has been enhanced with IBM's Information Integrator so that users can access both local and external data.
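As a rough illustration of what a single interface makes possible, the sketch below runs one SQL query that joins a local table with a table from a second source. It is not CLSD code: SQLite stands in for DB2, an attached in-memory database stands in for an external source, and the table names, column names, and rows are all made up for the example.

```python
import sqlite3

# Stand-in for a CLSD connection; the real system is DB2, not SQLite.
conn = sqlite3.connect(":memory:")
# A second, attached database plays the role of an external data source.
conn.execute("ATTACH DATABASE ':memory:' AS remote")

# Hypothetical tables and rows, for illustration only.
conn.execute("CREATE TABLE gene (gene_id TEXT PRIMARY KEY, symbol TEXT)")
conn.execute("CREATE TABLE remote.pathway (gene_id TEXT, pathway TEXT)")
conn.executemany(
    "INSERT INTO gene VALUES (?, ?)",
    [("G1", "ABC1"), ("G2", "XYZ2")],
)
conn.executemany(
    "INSERT INTO remote.pathway VALUES (?, ?)",
    [("G1", "example pathway A"), ("G2", "example pathway B")],
)


def pathways_for(symbols):
    """Answer a whole batch of gene symbols with one parameterized query."""
    placeholders = ",".join("?" for _ in symbols)
    sql = (
        "SELECT g.symbol, p.pathway FROM gene AS g "
        "JOIN remote.pathway AS p ON p.gene_id = g.gene_id "
        f"WHERE g.symbol IN ({placeholders}) ORDER BY g.symbol"
    )
    return conn.execute(sql, list(symbols)).fetchall()


rows = pathways_for(["ABC1", "XYZ2"])
```

The application sees one connection and one query language, whether a table is held locally or comes from elsewhere.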

Maintenance
The clsd-update and clsd-monitor programs work together to automate the process of downloading data from FTP sites, parsing it into relational form, and loading it into DB2 for use by CLSD, while checking and reporting on any errors in the process.
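The parse, load, and report steps can be sketched as follows. This is only an assumption about the general shape of such a pipeline, not the actual clsd-update or clsd-monitor code: the FTP download is left out, SQLite stands in for DB2, and the two-column tab-delimited format, the entry table, and the sample records are hypothetical.

```python
import sqlite3


def parse_records(text):
    """Turn tab-delimited lines into (accession, description) rows.

    Returns (rows, errors). Malformed lines are reported, not loaded.
    """
    rows, errors = [], []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) != 2:
            errors.append(f"line {lineno}: expected 2 fields, got {len(fields)}")
            continue
        rows.append((fields[0], fields[1]))
    return rows, errors


def load(conn, rows):
    """Replace the table contents in one transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "CREATE TABLE IF NOT EXISTS entry "
            "(accession TEXT PRIMARY KEY, description TEXT)"
        )
        conn.execute("DELETE FROM entry")
        conn.executemany("INSERT INTO entry VALUES (?, ?)", rows)


# Made-up sample standing in for a downloaded flat file.
sample = (
    "A0001\tfirst made-up entry\n"
    "broken line without a tab\n"
    "A0002\tsecond made-up entry\n"
)

rows, errors = parse_records(sample)
conn = sqlite3.connect(":memory:")  # stand-in for the DB2 target
load(conn, rows)
loaded = conn.execute("SELECT COUNT(*) FROM entry").fetchone()[0]

# A monitoring step could forward these messages instead of printing them.
for message in errors:
    print("update error:", message)
```

Collecting errors in a list rather than stopping at the first bad line lets a single run report every problem in a file while still loading the valid records.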

Indiana University

Copyright 2005, The Trustees of Indiana University