Improved data handling drives drug discovery

Drug research and development faces a major problem in the sheer volume of data being generated, which threatens to slow productivity further if that data is not managed and interpreted efficiently.

That was the verdict of Sheryl Torr-Brown, of Worldwide Safety Sciences at Pfizer Global R&D, who was speaking at the Drug Discovery Technology Europe conference in London this week. Her concern was that pharmaceutical companies are not fully extracting and understanding the relationships between genes and diseases, or between compounds and side effects, which lie at the very heart of drug discovery.

She explained that new problems in knowledge management require new solutions, and pointed to the greater geographic and cultural dispersion of research as new markets such as India and China enter the field. At the same time, the anticipated explosion of data is placing new emphasis on interpretation and on the early elimination of substandard data.

Torr-Brown also identified external pressure for transparency, not only from regulatory bodies such as the Food and Drug Administration (FDA) but also from the public. Confidence in drug discovery is at an all-time low following high-profile withdrawals of popular drugs.

More significantly, she questioned the future of pharmaceutical companies and their ability to produce the numerous blockbuster drugs that characterised the industry in the mid-1990s.

"Declining R&D productivity as a result of budget cuts is making it difficult for pharmaceutical companies to produce best-selling drugs. As a consequence, we may be at the end of the blockbuster era," she commented.

"Developing a drug can easily take 15 years and cost between $800 million (€600m) and $1.7 billion. The investment, coupled with a lack of guarantees, makes pharmaceutical companies wary of the innovation and risk-taking that characterised the blockbuster drug period."

More data was generated between 1999 and 2002 than in the whole of the pharmaceutical industry's previous history. Over the past 40 years alone, a fivefold increase in the number of abstracts published in Medline has pushed the total up to 12 million, most of which are available online as full-text articles.

In addition, there are patents, internal reports and other potentially valuable in-house and public sources. Although a small proportion of this information is in a structured form that can be managed using database systems, around 80 per cent is unstructured and written in natural language. Torr-Brown said there was simply too much to review or keep up to date with manually.

She said: "The problem is that data is trapped in hierarchical silos, restricted by structure, location, systems and semantics."

"The situation has become a data graveyard."

She outlined a number of potential solutions. The problem of data sharing could be addressed by introducing common front-end tools, such as a centralised clinical trial management system and interactive voice response systems, deployed as standalone applications.

Greater benefits could be achieved by using these systems and other electronic tools together as part of an integrated, web-enabled solution that would allow data to be accessed almost immediately and speed up decision-making from trial planning through to database lock.

Torr-Brown went on to say that data mining was in dire need of query tools that could pinpoint the most relevant information quickly and cheaply. According to studies by International Data Corporation (IDC), an enterprise employing 1,000 workers wastes nearly $2.5 million a year, or roughly $2,500 per employee, through an inability to locate and retrieve information, resulting in lost opportunities and diminished competitiveness.

"It's not all about more. It's about awareness, connection and meaning. Impact vs activity and knowledge vs data. Those are the key issues we must face sooner or later."

One theme of Torr-Brown's speech was a renewed emphasis on ontology, viewed from a shifted perspective. In this sense, ontology is a branch of applied science that deals with the development and use of knowledge networks; essentially, it is the science of context and communication.

Ontologies can underpin a range of information management applications, such as enhanced information retrieval, text mining, database annotation, data mining, and tools that support discovery and decision-making.
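To make "enhanced information retrieval" concrete, here is a minimal Python sketch, assuming an ontology reduced to simple subject-relation-object triples with an `is_a` hierarchy. The concepts, documents and the helper names `build_children`, `expand` and `search` are hypothetical and chosen purely for illustration; commercial ontology products are far richer. A search for a broad concept is first expanded to every narrower concept beneath it, and only then matched against the documents.

```python
from collections import defaultdict

# Hypothetical toy ontology expressed as (subject, relation, object) triples.
TRIPLES = [
    ("kinase inhibitor", "is_a", "enzyme inhibitor"),
    ("imatinib", "is_a", "kinase inhibitor"),
    ("gefitinib", "is_a", "kinase inhibitor"),
    ("imatinib", "treats", "chronic myeloid leukaemia"),
]

# Hypothetical unstructured in-house documents.
DOCS = {
    "report-1": "Imatinib response in a phase II cohort",
    "report-2": "Gefitinib dosing notes",
    "report-3": "Statin adherence survey",
}


def build_children(triples):
    """Map each concept to the concepts directly below it via is_a."""
    children = defaultdict(set)
    for subj, rel, obj in triples:
        if rel == "is_a":
            children[obj].add(subj)
    return children


def expand(term, children):
    """Return the term plus every concept below it in the is_a hierarchy."""
    seen = {term}
    stack = [term]
    while stack:
        current = stack.pop()
        for child in children.get(current, ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def search(query, documents, triples):
    """Return ids of documents mentioning the query or any narrower concept."""
    terms = expand(query, build_children(triples))
    return sorted(
        doc_id
        for doc_id, text in documents.items()
        if any(term in text.lower() for term in terms)
    )


if __name__ == "__main__":
    print(search("kinase inhibitor", DOCS, TRIPLES))
```

A plain keyword search for "kinase inhibitor" would match neither report, whereas `search("kinase inhibitor", DOCS, TRIPLES)` returns `["report-1", "report-2"]`, because the ontology records that imatinib and gefitinib are kinase inhibitors.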

"Ontology makes knowledge visible and accessible and enables teams to share their knowledge and profit from experience. In this way, knowledge becomes reusable with multiple tools, extending its 'cognitive reach,'" she commented.

This idea of harnessing the complex relationships that exist in the biomedical domain has been taken up by a number of companies, which have set out to build integration platforms providing related tools focused on drug discovery.

BioWisdom's Discovery Ontologies describe the life sciences domain, with around 500,000 concepts covering targets, diseases, chemicals, phenotypes, technologies and institutions. Through intuitive navigation facilities, BioWisdom combines various indexing techniques with the database architecture of biomedical information systems.

Sagitus Solutions' approach focuses on exploiting different ontologies together. The OIL (Ontology Inference Layer) language, which combines features of frame-based systems and description logics (DLs), is designed to represent ontologies developed in different formalisms, allowing information to be transferred between them.

By representing ontologies in OIL, Sagitus aims to facilitate data mining and management across data sources, from genes to clinical trials. The vision is to develop and integrate new and existing ontologies to capture knowledge across the whole drug discovery process.

3rd Millennium has created a system that extracts high-throughput biological data and places it in its biological context, enabling semantic integration and greater knowledge of biological pathways. The aim is to enable high-throughput biology to measurably accelerate drug discovery.

Network Inference's Cerebra Server platform is combined with BioWisdom's ontologies and domain expertise to support discovery and innovation; together they aim to discover and display associations using a multi-relational ontology.
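The idea of discovering associations through a multi-relational ontology can be illustrated with a generic graph search. This is not how Cerebra Server or BioWisdom's ontologies work internally, since their methods are not described here; it is a minimal sketch that assumes concepts linked by typed, directed relations and uses breadth-first search to find the shortest chain connecting two of them. All concept names, relation names and the `find_association` helper are hypothetical.

```python
from collections import deque

# Hypothetical multi-relational graph: every edge carries a relation type.
EDGES = [
    ("compound_X", "inhibits", "target_T1"),
    ("target_T1", "participates_in", "pathway_P"),
    ("pathway_P", "implicated_in", "disease_D"),
    ("compound_Y", "binds", "target_T2"),
]


def find_association(start, goal, edges):
    """Breadth-first search over typed, directed edges.

    Returns the shortest list of (subject, relation, object) steps linking
    start to goal, an empty list if start == goal, or None if unconnected.
    """
    graph = {}
    for subj, rel, obj in edges:
        graph.setdefault(subj, []).append((rel, obj))

    queue = deque([(start, [])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for rel, nxt in graph.get(node, []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, path + [(node, rel, nxt)]))
    return None


if __name__ == "__main__":
    chain = find_association("compound_X", "disease_D", EDGES)
    for subj, rel, obj in chain or []:
        print(f"{subj} --{rel}--> {obj}")
```

Here `find_association("compound_X", "disease_D", EDGES)` returns the three-step chain inhibits, participates_in, implicated_in, flagging a possible link worth investigating, while the same query from `compound_Y` returns `None`. Keeping the relation type on every edge is what lets the result explain *why* two concepts are associated rather than merely that they are.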

She summed up by saying that the knowledge management landscape was changing, with the old model of technology-enabled data management gradually being replaced by technology-enabled knowledge management.

"Local communities are becoming increasingly global. The concept of web development is now becoming a more semantic web, with an emphasis on ontology-driven knowledge management," she concluded.