Data platform improves cancer data access

New web-based software that can extract potentially life-saving knowledge from data in minutes has bioinformatics research applications that could prove essential to laboratory technicians seeking to improve the diagnosis and treatment of cancer.

In Europe, cancer is the second leading cause of death; worldwide it accounts for 23.5 per cent of all deaths. The race to beat this disease increasingly depends on the techniques and methods used to extract relevant data, and current practices being developed in bioinformatics create massive amounts of data.

While the sheer volume of biological data, especially in cancer research, deepens understanding, the worry is that a bottleneck is forming in processing it, one that threatens to engulf the good work achieved so far.

Specific kinds of data will include expression data. Related to the transcriptome, these data will include single-assay results such as RT-PCR or Northern blots, as well as newer types of biological data such as microarrays (on nylon membranes, glass or Affymetrix DNA chips).

Positional and mutational data, which relate to the genome, will include comparative genomic hybridisation with intact chromosomes or BAC arrays, sequence or SNP (single nucleotide polymorphism) analyses, and DNA microarrays.

"Bioinformatics faces several challenges," says Philippe Boutruche, coordinator of the IST project HKIS. "Life scientists need to access data from many different sources and in a variety of formats." He adds that they lack standards for cross-referencing all this data, which covers everything from human DNA to genomes, and may spend weeks doing so manually.

New techniques that combine data from the genome, transcriptome and proteome can be used to detect cancer and evaluate therapy, improving both diagnosis and treatment efficiency. To become widespread, these techniques require integrated intelligent processing, together with easy control of, and access to, analysis data and results.

The project proposes building a platform that integrates the most recent advances in data processing, to be validated by three European oncology hospitals holding very rich but under-exploited tumour and clinical databases. The resulting web-based platform will allow complex chains of data-mining and data-morphing treatments to be formalised as reusable, simple-to-operate workflows. To reach this goal, the project is organised into a definition phase, a realisation phase and an evaluation phase; the last of these prepares the ground for future redeployment.

The new integrated software platform developed under HKIS is deemed suitable for biological and biomedical data processing in cancerology. It was built around Amadea, established software used by banks and marketers for processing, crossing and transforming data.

At just 20 MB in size, the basic interactive platform is minuscule compared with similar tools. Aimed principally at medical and biological professionals, it can connect to all data types saved in any form or structure. It can also integrate and analyse new data sources from public and private databases much faster than more labour-intensive solutions.

The platform needs no programming, can be accessed over the Internet and may be used by people with differing levels of expertise. Thanks to a cache memory management system and special algorithms, it provides graphical output for each analysis stage in real time, even if the data are stored on another server.

"We want to provide doctors, bioinformaticians and clinicians with a common environment to build data-driven experiments," said Boutruche. "The project's platform is homogeneous, so there is no need to export or configure data from one format to another. Being integrated, it allows a continuous workflow with raw data saved in XML format. Users can run statistical mining or algorithms, which may show why the genes of some patients are more susceptible to cancer."


Experimental trials were conducted in 2003 at specialist cancer hospitals: the Ulm Medicine University, the Curie Institute and the European Oncology Institute. Two hospitals used real medical data from their own databases, while the third focused on data mining. "Our platform helped to define some predictive diagnostic genes for identifying genes of interest in bladder and pancreas cancer," notes the coordinator.

He believes the project's technology could benefit a variety of other medical and biological domains, among them genetic diseases, therapeutic targets and drug discovery, genotyping and biotechnology in general. Another is the management of genetic databases, where the software could enable quality assessment and automation.

By mid-2005, the partners will have a commercial product for biology labs, adding a specialised bio-pack to the original software. This pack will integrate the project's major results, including the ability to access data from different databases and to upgrade the platform.