Leveraging scientific data using the power of the semantic web: Who wants to start?
Monday, March 31st, 2008
As much as I appreciate the mission of PLoS, I haven’t actually gotten around to covering any of their articles in this space. The other day I happened to come across one entitled “Open Access: Taking Full Advantage of the Content”, and knew that I would have to read it.
In the paper, the authors echo the call of Science Commons to work on creating applications which can leverage open scientific content. They describe some of the benefits and current shortcomings of producing manuscripts with XML markup (which makes machine reading and data extraction easier). They then go on to argue that the only way to convince people to go through the trouble of creating the machine-readable file is to demonstrate what can be done with the current level of markup and then draw a picture of what expanding it would make possible.
The authors have been involved in the development of several data mining tools (BioLit and PubNet are mentioned in this article). I took a look at both of these, and they are definitely interesting. My favorite was PubNet, which allows cross-referencing of PubMed queries. For instance, I could input my name as one query and my advisor’s as another, and it would generate a map of co-authorship. I would point out that there are some issues: someone else with the same name as me (at least according to PubMed) has papers in the database, which throws off the co-authorship map by including papers I’m not looking for. BioLit seemed to have some problems of its own; often I would click on a link and nothing would seem to happen.
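As a rough illustration of the co-authorship idea (not how PubNet is actually implemented), a map like this can be thought of as a graph whose edges count how often two names share a paper. The sketch below assumes each query returns a list of (paper ID, author names) pairs; the IDs, names, and the `coauthor_edges` helper are all made up for the example. It also shows why the name problem arises: authors are matched purely by the name string, so two different people called "Smith J" collapse into one node.

```python
from collections import Counter
from itertools import combinations

# Hypothetical results of two queries: (paper_id, [author names]).
# These IDs and names are invented for illustration only.
query_a = [
    ("p1", ["Smith J", "Lee K"]),
    ("p2", ["Smith J", "Lee K", "Patel R"]),
    ("p3", ["Smith J", "Gomez M"]),  # imagine this is a different "Smith J"
]
query_b = [
    ("p2", ["Smith J", "Lee K", "Patel R"]),  # also returned by query_a
    ("p4", ["Lee K", "Patel R"]),
]


def coauthor_edges(*query_results):
    """Merge query results by paper ID and count shared papers per author pair."""
    papers = {}
    for result in query_results:
        for paper_id, authors in result:
            papers[paper_id] = authors  # a paper found by both queries is kept once

    edges = Counter()
    for authors in papers.values():
        # Sort the names so each pair is always stored in the same order.
        for pair in combinations(sorted(set(authors)), 2):
            edges[pair] += 1
    return edges


edges = coauthor_edges(query_a, query_b)
for (first, second), shared in sorted(edges.items()):
    print(f"{first} -- {second}: {shared} shared paper(s)")
```

Because the only key is the name string, the "Gomez M" edge gets attached to the same "Smith J" node as everything else. Separating the two people would take extra information, such as affiliation or a unique author identifier.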
The paper continues with more examples of applications and systems people are developing based around open access data, such as SciVee (which I’ve mentioned in the past). They end with a “call to action” for scientists and others to engage in expanding these tools and developing new ones in order to encourage interest in the idea of Open Access.
I agree with the authors of this paper: I’d love to see more being done with the data we currently have available. If we are able to point out places where having more data would lead to even greater returns, so much the better. I also appreciate that this is published in a journal, as I think professors are more likely to see it there than on blogs, where the majority of the conversation has been held. I do wonder if they are “preaching to the choir” a bit, in that the users of PLoS are more likely to already be aware of these tools and their potential for growth. Perhaps this is partially the point: to move those who are currently thinking “I really enjoy things like PLoS and SciVee” into a more active role.



