Building a better biomedical knowledgebase
TSRI Associate Professor Andrew Su, who co-led the new project, told us the researchers launched it because open biomedical data is powerful. “For all the effort and resources we (as a society) put into doing biomedical research, we currently don't do a great job of making the products of that research usable and accessible,” he said.
“All science (including drug development) benefits from building on past research and knowledge. Making that knowledge more accessible will naturally make science more efficient,” explained Su.
However, according to the researchers, the biggest challenge in the past has been the stability of resources.
“You can imagine that creating a ‘database of all biomedical knowledge’ is a pretty monumental task, and not many groups have the technical know-how to do it,” said Su.
Wikidata, however, is run by the Wikimedia Foundation, the group behind Wikipedia, which has experience scaling this type of community resource, “so we're hopeful now is the right time,” added Su.
The next-biggest challenge will be representing the data, a task the researchers refer to as “data modeling.”
“If two people are contributing similar types of data to Wikidata, then they have to converge on some common way to store that data in Wikidata,” explained Su. “Here, biomedical ontologies are very important, and thankfully that field has really come together in the past 10 years or so. So again, it seems like everything is converging at the right time here.”
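As a rough illustration of what converging on a common way to store data can mean, here is a minimal Python sketch. It is not the researchers' code and does not use Wikidata's actual data model or API; the two labs, their record layouts, the property name `associated_with_disease`, and the placeholder identifiers are all hypothetical.

```python
from typing import NamedTuple


class Statement(NamedTuple):
    """A shared (subject, property, value) shape that both contributors agree on."""
    subject: str
    prop: str
    value: str


def from_lab_a(record: dict) -> Statement:
    # Hypothetical lab A stores gene-disease links as {"gene": ..., "disease": ...}.
    return Statement(record["gene"], "associated_with_disease", record["disease"])


def from_lab_b(record: tuple) -> Statement:
    # Hypothetical lab B stores the same kind of link as a (disease, gene) pair.
    disease, gene = record
    return Statement(gene, "associated_with_disease", disease)


# Placeholder identifiers, not real gene or disease names.
statements = {
    from_lab_a({"gene": "GENE_X", "disease": "DISEASE_Y"}),
    from_lab_b(("DISEASE_Y", "GENE_X")),
}

# Both contributions describe the same fact, so the set holds one statement.
print(len(statements))
```

Once both sources are mapped to the same shape, duplicate facts collapse into a single entry instead of living on as two incompatible records.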
The researchers have already reached out to several groups, which have been happy to let the team load data on their behalf. Other groups have independently converged on Wikidata. For example, scientists at the CDC have looked into depositing data, and researchers at NIH are interested in creating a data hub for information related to the Zika virus.
Additionally, TSRI Research Associate Sebastian Burgstaller-Muehlbacher has already added data on all human and mouse genes, all human diseases, and all drugs approved by the US Food and Drug Administration, and others have added data focusing on microbial genomes.
According to the team, collecting all this information in a single system makes it easier for scientists to spot connections between diseases, pathogens, and biological processes.
While the researchers have not yet reached out to pharma, they hope there will be “quite a bit of precompetitive data that they would be willing to share in the spirit of open data.”
Ultimately, the researchers hope to compile a comprehensive database that is easy to search and open to all.