Archive for March, 2008

The Return of the Search Box - I think PLoS is gonna pull through

Monday, March 31st, 2008

:: Caution - Shameless Butt-kissing ahead ::

Kudos to the web team at PLoS.  When I am on their website, I don’t feel like I’m reading a journal - I feel like I’m using a very slick application to learn some exciting stuff.  There are so many little things about their sites that I like: the layout, color schemes specific to each journal, the right-hand bar when you’re reading an article with the “interactive” features presented, and on and on.  They’ve really done a remarkable job in coming up with ways to move beyond the flat text of a standard journal’s online access.

I say this for a few reasons.  First of all, because I really mean it - I have a lot of respect for what they’ve done so far and look forward to future developments.  Secondly, because I’d still love to at least interview for a job there, as soon as I can figure out what I’m most qualified to do for them.  Finally, because I’ve been dogging them over the last couple of weeks as they’ve wrestled with some major performance problems.

As I was writing the previous entry for PA, I noticed that after several weeks’ absence, the search bar has returned at the top of their site.  Not only is it there, it seems to be working quite nicely!  Finally I was able to start digging through some of the archives for specific things.

Overall the site seems to be moving quite nicely now.  I was able to access comments and ratings for different articles; however, I do have problems submitting new content.  I’m not sure if this is an issue with their site or the fact that I’m running the Firefox Beta, which has a few quirks left to iron out.

Regardless, I wanted to congratulate the PLoS web team on what I’m sure was a few weeks of very hard and stressful work (which may or may not be over yet) and thank them for a job well done.

Leveraging scientific data using the power of the semantic web: Who wants to start?

Monday, March 31st, 2008

As much as I appreciate the mission of PLoS, I haven’t actually gotten around to covering any of their articles in this space.  The other day I happened to come across one entitled “Open Access: Taking Full Advantage of the Content”, and knew that I would have to read it.

In the paper, the authors echo the call of Science Commons to work on creating applications which can leverage open scientific content.  They describe some of the benefits and current shortcomings of producing manuscripts with XML markup (which provides for more facile machine reading and data extraction).  They then go on to argue that the only way to convince people to go through the trouble of creating the machine-readable file is to demonstrate what can be done with the current level of markup and then draw a picture of what expanding it would do.
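To make the “machine reading” idea a bit more concrete, here is a minimal Python sketch of pulling tagged entities out of a marked-up article.  The tag names (article, protein), the id attribute and the sample text are all made up for illustration; they are not the markup scheme that PLoS or the paper’s authors actually use.

```python
import xml.etree.ElementTree as ET

# Hypothetical marked-up article; the tags and identifiers are invented for this sketch.
ARTICLE_XML = """
<article>
  <title>An example article</title>
  <body>
    <p>We studied <protein id="P12345">hypothetical protein A</protein> and
       <protein id="Q67890">hypothetical protein B</protein>.</p>
  </body>
</article>
"""

def extract_proteins(xml_text):
    """Return (id, name) pairs for every <protein> element in the document."""
    root = ET.fromstring(xml_text.strip())
    return [(el.get("id"), el.text) for el in root.iter("protein")]

proteins = extract_proteins(ARTICLE_XML)
for protein_id, name in proteins:
    print(protein_id, name)
```

With flat text, the same extraction would mean guessing at names with pattern matching; with markup, it is a simple walk over the document tree.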

The authors have been involved in the development of several data mining tools (BioLit and PubNet are mentioned in this article).  I took a look at both of these, and they are definitely interesting.  My favorite was PubNet, which allows cross-referencing of PubMed queries.  For instance, I could input my name as one query and my advisor’s as another.  This can generate a map of co-authorship.  I would point out here that there are some issues, such as the fact that someone else with the same name as I (at least according to PubMed) has papers in the database.  This throws off the co-authorship map by including things which I’m not looking for.  BioLit seemed to have some problems - often I would click on a link and nothing would seem to happen.
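As a rough illustration of the co-authorship idea (and of the name problem), here is a small Python sketch.  I have no idea how PubNet is actually implemented; the records, author names and function names below are hypothetical, and the only assumption is that each query returns a list of papers with an id and a list of author names.

```python
from collections import defaultdict
from itertools import combinations

def merge_queries(*result_sets):
    """Union several query results, keeping each paper once (keyed by its id)."""
    seen = {}
    for results in result_sets:
        for paper in results:
            seen.setdefault(paper["id"], paper)
    return list(seen.values())

def coauthor_graph(papers):
    """Map each author to a dict of {coauthor: number of shared papers}."""
    graph = defaultdict(lambda: defaultdict(int))
    for paper in papers:
        # Every pair of distinct authors on a paper gets an edge.
        for a, b in combinations(sorted(set(paper["authors"])), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph

# Made-up query results.  Paper p2 is by a *different* person who happens to
# share the name "Smith J", which is exactly the ambiguity described above.
query_a = [
    {"id": "p1", "authors": ["Smith J", "Jones A"]},
    {"id": "p2", "authors": ["Smith J", "Lee K"]},
]
query_b = [
    {"id": "p1", "authors": ["Smith J", "Jones A"]},
    {"id": "p3", "authors": ["Jones A", "Patel R"]},
]

graph = coauthor_graph(merge_queries(query_a, query_b))
print(sorted(graph["Smith J"]))
```

Because authors are matched on name strings alone, “Lee K” shows up as a co-author of “Smith J” even though that paper belongs to someone else.  Fixing this would need some extra identifier beyond the name.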

The paper continues with more examples of applications and systems people are developing based around open access data, such as SciVee (which I’ve mentioned in the past).  They end with a “call to action” for scientists and others to engage in expanding these tools and developing new ones in order to encourage interest in the idea of Open Access.

I agree with the authors of this paper that I’d love to see more things being done with the data we currently have available.  If we are able to point out places where having more data would lead to even greater returns, so much the better.  I also appreciate that this is published in a journal, as I think this is something that professors are more likely to see (as compared to blogs, where the majority of the conversation has been held).  I do wonder if they are “preaching to the choir” a bit, in that the users of PLoS are more likely to be aware of these tools and potential for growth.  Perhaps this is partially the point - to engage those who are currently thinking “I really enjoy things like PLoS and SciVee” in a more active role.

Bourne, P.E., Fink, J.L., Gerstein, M. (2008). Open Access: Taking Full Advantage of the Content. PLoS Computational Biology, 4(3), e1000037. DOI: 10.1371/journal.pcbi.1000037

Talking the talk but not walking the walk?

Friday, March 28th, 2008

I mention in some of the background information on this site that I’m currently in the process of writing my thesis.  However, you won’t find this work-in-progress on any publicly accessible site.  Does that make me a big hypocrite?

Before jumping in, I’ll point out that some people do (rarely) write their thesis and academic publications online and in a raw sort of “as you go” style, although to be fair these are often researchers who are at the bleeding edge of open access.  See Pimm or UsefulChemistry for examples.  I’ve thought about doing this myself, and as someone who cares so deeply about advancing Open Access, why not?

(more…)

Open Access Prodigy

Friday, March 28th, 2008

The idea of a database of scientific facts gleaned from research publications reminded me of this old IBM commercial:

This quote really got me:

Collecting data is only the first step toward wisdom, but sharing data is the first step towards community

I feel like I remember some internet-based project along these lines… People could go to a website and type in “facts”, which the AI would then learn. Does anyone else remember more about this? I can’t even recall the name at the moment.

::EDIT:: I think I was thinking of Open Mind Common Sense from MIT

A reply from John Wilbanks

Thursday, March 27th, 2008

My post from yesterday seems to have gotten attention… from the subject of the post, John Wilbanks.  He emailed me to say that he attempted to comment, but it didn’t go through, so he followed up on his Nature Network blog.

I’m really excited about his response, and I’ll just briefly mention a few of my thoughts.  First of all, he points out the crux of the matter:

It’s a point I have actually removed from my talks recently because I was finding it misconstrued – it’s a little subtle and hard to grok sometimes, and it’s an example of how hard it is for the lawyers and the scientists to understand each other.

To be honest, considering the fact that I was so on board with just about everything else he had to say in the talk, I had a hunch this was the case. Perhaps I didn’t make this clear enough in my original post, but my point was not that I really thought he was saying scientists are all uncreative robots, but that this was how I thought many would parse what he was saying.

He goes on to clear things up in a way that makes sense to me:

The copyright on the overall article, that comes from the connectors and the clause structures, is being used to control the movement of the facts of the experiment

This is an excellent point, and one that I think is a major issue (although to be sure a bit harder to include in a casual talk).

To be honest, my traffic is so low that it’s probably best to move the conversation over to his blog, in order to expand the audience as much as possible.  I’ll continue my own thoughts in the comments there.

Of course comments will be open here as well (assuming you can get them to wind their way through WordPress).  Actually, now I see that Nature Network requires signup to use, but PA is a bit more open :)

Also, some link love from Open Access News.

Working in the lab: Common space or individual benches?

Thursday, March 27th, 2008

When we think of a scientist at work, we usually imagine a lone figure (probably Caucasian male, unfortunately) in a lab coat standing at a bench loaded with apparatus.  Just check out the Google Image search for scientist.

The fact of the matter is that unless the scientist in question is very rich and/or a misanthropic hermit, they probably aren’t working in a lab all by themselves.  There will be other scientists there as well.

My question today is to those who have worked in labs themselves.  Do you prefer a setup with individual bench space, or one in which all of the areas are “common use”, and you can just plop down and start working wherever you please?

I’m a fan of everyone having their own space.  I think that this makes people more likely to keep “their” area clean, and if someone chooses to have a messy bench it doesn’t impact you.  Also, it means you can sort of keep things that you use often close at hand, and configure the space to suit your work.  There are drawbacks, of course - territorial disputes being the most common.  In labs where all the space is common use, it’s nice to be able to do your work near whatever piece of equipment you may be using at the time.  In my experience, however, people don’t do as good a job at tidying up the space when they’re done, and it can get crowded when several people are using space near one another.

So, what is your favorite way to work?  Why do you like it as opposed to the other way?  How is your lab set up now?

If you aren’t a scientist, are there similar environments in your work space?  How do you handle them?

Are scientific papers “Creative Works”?

Wednesday, March 26th, 2008

A few weeks ago, I saw (and blogged about) a video from a talk at MIT on Open Access.  In it the vice president of Science Commons, John Wilbanks, gave a very interesting overview of why Open Access is such a great thing and some of the ways Science Commons is attempting to build on and leverage information which is free.

A couple of times in the talk, however, he put up some text from a scientific paper, and made a comment that I thought was a bit odd when I first heard it.  To paraphrase, he says something to the effect of “I don’t really see how this is a creative work.  There aren’t too many ways you can rewrite this and have it retain the same meaning”.  Perhaps it’s a sign of how little exposure I’ve had to the laws in the domain of copyright protections, but it took me a bit before I understood why he was saying this.  The reason, of course, is that only “creative works” are considered to be under the jurisdiction of copyright laws.  By attempting to frame scientific papers as something other than creative works, he was implying that they should not be governed by copyright law.

I have to say that I strongly empathize with the goals of the Open Access movement, but I also strongly disagree with the idea that papers are not creative works. (more…)

I do read papers that don’t come out of David Baker’s Lab

Tuesday, March 25th, 2008

But not today.  To be fair, the paper I’ll be talking about today (from today’s issue of PNAS) involves quite a few researchers from several institutions.  In it, the researchers describe sort of a new way of “solving” protein structures, although the technique they describe really sits at the boundary of solution and prediction.

In effect, the method involves the early stages of solving a protein’s structure using NMR spectroscopy.  This is a well-established method which has yielded lots of structures.  One of the nice things about NMR is that it’s relatively easy (compared to X-ray crystallography) to get your sample - you “just” need to have a very pure, highly concentrated bit of protein.  Once you have the sample, it’s also relatively facile to collect data on it - typical NMR experiments take a few days, but pertinent information is available in minutes.  Contrast this to X-ray crystallography, in which growing the crystal might take weeks or months (even once the proper conditions are identified), and data collection takes on the order of hours.

The gist of the method described in the paper is this: you collect the “fast and easy” information on your NMR sample, then construct a model of your protein that uses this data to extract fragments of already-solved structures and put them together in a way which matches your NMR information.  It’s sort of like building a new device from LEGO parts which you’ve gotten by breaking up other devices.  Well, to extend the analogy to the breaking point, what they really do is use the amino acid sequence of the protein they are examining to pull out a bunch of LEGO pieces (solved structures with similar sequences), and then apply the NMR data to pick the “best” piece for that bit of the protein.  They call this CS-ROSETTA (CS for chemical shift, the NMR data; ROSETTA for the method of picking out the individual segments of structure and assembling them into a new protein).
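For the programmers in the audience, the “pick the best LEGO piece” step can be caricatured in a few lines of Python.  To be clear, this is a toy version of my analogy and not the actual CS-ROSETTA scoring: the fragment names, the shift values and the use of a plain root-mean-square difference are all assumptions made purely for illustration.

```python
def shift_rmsd(predicted, observed):
    """Root-mean-square difference between two equal-length lists of chemical shifts (ppm)."""
    if len(predicted) != len(observed):
        raise ValueError("shift lists must have the same length")
    squared = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return (squared / len(observed)) ** 0.5

def pick_best_fragment(candidates, observed_shifts):
    """Return the candidate fragment whose shifts agree best with the measured ones."""
    return min(candidates, key=lambda frag: shift_rmsd(frag["shifts"], observed_shifts))

# Made-up candidate fragments for one stretch of the protein, each with the
# chemical shifts we would expect if that fragment were the right one.
candidates = [
    {"name": "fragment_1", "shifts": [58.0, 57.5, 58.2]},
    {"name": "fragment_2", "shifts": [54.0, 53.8, 54.5]},
]
observed = [57.8, 57.9, 58.0]  # made-up "measured" shifts for the same stretch

best = pick_best_fragment(candidates, observed)
print(best["name"])
```

The real method then has to assemble the chosen pieces into a full structure and rank the results by energy, which is where most of the hard work lies.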

[Figure: CS-ROSETTA Overlay]

Now that I’ve thoroughly muddled a rather efficient and clean approach, how does it do?  Pretty well, according to the authors.  The figure to the right is from the paper, and shows an overlay of the actual structures of some of the test proteins (determined by X-ray crystallography or NMR, in blue) and the lowest-energy structures from their CS-ROSETTA method (in red).  You can see that they overlap quite nicely.

They follow up the test case by using the CS-ROSETTA technique on several structures that were in the process of being solved by a proteomics consortium.  Once again, they find very good agreement between the final structures (solved after the modeling with CS-ROSETTA) and their predicted folds.

As a structural biochemist myself, I’m always interested in new ways to solve structures as quickly and efficiently as possible.  What concerns me is that methods based on simulations, no matter how accurate or elegant, will always be viewed with skepticism by the scientific community.  Although there is a fair amount of simulation already in the “standard” methods of NMR and X-ray crystallography, it seems that this can be more easily justified in the eyes of the community than the types of calculations needed to “solve” a structure with ROSETTA or other computational methods.

Shen, Y., Lange, O., Delaglio, F., Rossi, P., Aramini, J.M., Liu, G., Eletsky, A., Wu, Y., Singarapu, K.K., Lemak, A., Ignatchenko, A., Arrowsmith, C.H., Szyperski, T., Montelione, G.T., Baker, D., Bax, A. (2008). From the Cover: Consistent blind protein structure generation from NMR chemical shift data. Proceedings of the National Academy of Sciences, 105(12), 4685-4690. DOI: 10.1073/pnas.0800256105

Is academia turning too “nice”?

Tuesday, March 25th, 2008

There has been a growing movement in education for probably the past 10 or 15 years.  Call it the “positive reinforcement” model.  Instead of “fail” you have “needs improvement”.  The point of this system seems to be to avoid discouraging students, and instead work with them to improve their learning.  This is an admirable objective, but I often feel like it’s taken too far.

Grade inflation is a serious problem these days, all the way through college in my opinion.  It’s easier for a teacher to barely pass a student who should otherwise fail: they don’t have to deal with as many administrative headaches, the parents won’t come screaming, and hey, they don’t have to teach that poor student again!  The problem is that I feel people have forgotten what it means to “pass” a student - that they sufficiently understand the material to advance to the next level.

Although grad school doesn’t really have “grades” for the most part, I think that this same sort of “positive reinforcement” method has started to negatively impact even the pursuit of a higher degree.  For instance, our department gives a weekly Journal Club, at which a graduate student presents a recent paper from the literature.  In my early years here, the expectations were well known: you should pick a paper that is broadly relevant, not just the latest paper in your specific field, and you should prepare well and understand the background research.  It was a matter of common knowledge that the faculty would rake you over the coals if you went in without understanding at least the basic supporting information.  While I’ve been here, both of these aspects have changed dramatically.  It’s typical now for a graduate student to choose a paper that is more or less directly tied to their research - you can almost predict the paper if you know the lab the student is from.  Also, the faculty don’t seem to have quite the same fire during the questioning.  Students often get away with a simple “I don’t know” as an answer, without any requests by the faculty for them to at least speculate.  Speculating would at least demonstrate some basic scientific understanding and deductive reasoning.

I think that this has made these Journal Clubs superfluous.  The audience gets nothing out of them, as the papers are so esoteric as to be more or less uninteresting to the majority.  The student gets nothing in the way of education, since they don’t need to do any more than drop the main paper’s figures into a powerpoint and mumble for about 45 minutes.

This is just one small example.  I know that people feel that graduate school is often very tough (and it can be), but the faculty need to hold the students to a higher level of competency if a Ph.D. is going to mean anything 25 years down the road.  With this continual backsliding due to the “positive reinforcement” paradigm of education, it won’t be long before we’ll need to tack another degree on after the doctorate.  Hmm… what percentage of doctorates go on to post-docs again?

In which I gnaw on my fingernails - Sent in a job application

Monday, March 24th, 2008

Perhaps it’s because I’m coming off of a relaxing weekend, or perhaps it’s just because I’m in a “get things done” sort of mode, but I decided to send in a job application today.  It’s for a position at the Research Collaboratory for Structural Bioinformatics (RCSB), specifically at the Protein Data Bank.  This is the main repository for protein structures, and it serves an absolutely critical and invaluable role to the research community.

I had sent in an application for the position once before.  That time, I was rapidly told that they were not hiring any additional people for the position I was interested in.  However, the opening has remained listed on the website, and I simply couldn’t take the first “no” as an answer.

The reason is that, by all appearances, the job seems to be perfect for me.  The criteria they list are more or less an itemized list of my research experience and personal interests.  The RCSB is one of the most established and well-respected repositories of structural information; they are non-profit (yay) and the information they collect is freely available (super yay).  The job is right next door to where my wife might get an offer for a post-doc, and it’s not so far away that we’d be leaving all of our friends behind.

Of course, the very fact that this job seems so perfect also makes me especially nervous about being turned down for it.  C’est la vie, I suppose.  Wish me luck.