Archive for the ‘linux/OSS’ Category

Sort of a busy day, but I just wanted to comment on a talk by Peter Suber

Friday, April 4th, 2008

I’ve been trying to get training on one of the department’s common-use instruments for some time now, and finally the person in charge is going to take care of me.  Unfortunately this means that most of my day is full, and I won’t be able to write as in-depth a discussion as I’d like.

Yesterday, I watched a talk by Peter Suber on what universities can do to promote Open Access.  It was interesting to me, even though I am not really in a position to influence this level of policy at my institution.  Dr. Suber seems to place a lot of weight on institutional repositories as a good way to sort of do an end-run around strict copyright regulations from publishers.  This is called the “green” road to OA.  Much of his talk focused on ways to “encourage” or “mandate” authors of academic manuscripts to archive their work in these institutional repositories.

I find this idea interesting.  Our university does already have a repository, but I have to admit that I didn’t know this until I did a search today.  The policy “invites” authors to submit their work, but as far as I know there is no requirement to do so.  It seems that the author must seek out the repository and take the initiative for the deposition themselves.

To me, the keystone to traveling the path to OA down the “green road” lies in interoperability.  It’s great if all of Harvard’s research is in their institutional repository, but the key is in creating sort of a “shadow” literature database - one that combines the contents of all institutional repositories in one location that is easy to index and search.  I think this can be accomplished by agreeing to mark works in these repositories with a standard set of metadata tags.  Standalone software can then be written which can mine the repositories for these tags and parse the manuscripts accordingly.
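
A minimal sketch of what such tag-mining software might look like.  It assumes each repository can export its records as XML marked up with Dublin Core tags (title, creator, date); the post does not name a metadata standard, so that choice, the sample records, and the function names below are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Dublin Core element namespace. Using Dublin Core here is an assumption;
# any tag set that every repository agrees on would work the same way.
DC = "{http://purl.org/dc/elements/1.1/}"

# Hypothetical metadata exports from two institutional repositories.
REPO_A = """<records xmlns:dc="http://purl.org/dc/elements/1.1/">
  <record>
    <dc:title>Protein folding kinetics</dc:title>
    <dc:creator>Smith, J.</dc:creator>
    <dc:date>2007</dc:date>
  </record>
</records>"""

REPO_B = """<records xmlns:dc="http://purl.org/dc/elements/1.1/">
  <record>
    <dc:title>Open access and citation impact</dc:title>
    <dc:creator>Lee, K.</dc:creator>
    <dc:date>2006</dc:date>
  </record>
  <record>
    <dc:title>Folding of membrane proteins</dc:title>
    <dc:creator>Garcia, M.</dc:creator>
    <dc:date>2008</dc:date>
  </record>
</records>"""


def harvest(repo_name, xml_text):
    """Return one dict per <record>, keyed by the shared tag names."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iter("record"):
        records.append({
            "repository": repo_name,
            "title": rec.findtext(DC + "title", default=""),
            "creator": rec.findtext(DC + "creator", default=""),
            "date": rec.findtext(DC + "date", default=""),
        })
    return records


def build_index(dumps):
    """Merge records from every repository into one list sorted by title."""
    index = []
    for name, xml_text in dumps.items():
        index.extend(harvest(name, xml_text))
    return sorted(index, key=lambda r: r["title"].lower())


def search(index, term):
    """Case-insensitive title search across all harvested repositories."""
    term = term.lower()
    return [r for r in index if term in r["title"].lower()]


if __name__ == "__main__":
    index = build_index({"repo_a": REPO_A, "repo_b": REPO_B})
    for hit in search(index, "folding"):
        print(hit["repository"], "|", hit["title"], "|", hit["creator"])
```

Because every record carries the same tags, the aggregator never needs to know how an individual repository stores its manuscripts; it only needs the agreed-upon metadata.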

This is already accomplished by some of the software tools being used to build the repositories.  The key is in making sure that all of the institutional repositories are on board with a common system.  I’m not a librarian, so I’m not sure about the best way to go about doing this.  I do think, however, that only when there is an interconnected network of repositories at a majority of institutions will this resource become a go-to directory for academic work.  We can see this occurring already at the magnificent arXiv.

I’d love to go more into what is being done on this front in other fields as well, but unfortunately the time has come for me to do some “real” work.  Please comment with your thoughts and other repository aggregators that you know of.  We can continue the discussion in the comments section, and also with posts at a later date.

Open Source LIMS solutions?

Wednesday, April 2nd, 2008

Much is made of the extra effort it would take to produce scientific manuscripts with XML formatting in order to facilitate machine reading and therefore increase our ability to catalog and cross-reference the content therein.  While I don’t necessarily agree that the effort itself would be as difficult as some might think, I’m of the opinion that the hard part is just getting people to do it in the first place - a paradigm shift in how the papers are written.

As it stands, writing papers is already sort of tough.  You have to convert a large amount of often somewhat disjointed and relatively uncatalogued data (mostly in the lab notebooks and brains of the people who did the work) into a concise and understandable document, complete with figures and references.

One way to smooth both the production of XML-rich manuscripts and the actual process of writing the papers would be to place the data, more or less as it is collected, into an information management system, part of which would (mostly behind the scenes) index the data with XML tags.  Many products have been developed to handle research information, and these are usually referred to as Electronic Laboratory Notebooks (ELN) or Laboratory Information Management Systems (LIMS).

Unfortunately, most of the LIMS systems I have come across are closed-source applications marketed to industry or medical labs at a high cost.   They tend to be rather inflexible, and often require the users to learn a customized method of interacting with the software which doesn’t correlate to any other user experience.  I would much rather see an open-source, cost-effective (free would be nice) system marketed to all labs (although my main interest is in academic adoption).  The software should have an intuitive user interface, and the more it resembles tools that the researchers may already be familiar with, the better.

For instance, one component that comes to mind is a Twitter-like box for entering the day’s experiments.  These could be brief statements (“Performed minipreps on overnight cultures from 4/1/08.  Stored samples at -20 C”) that would be aggregated into a timeline by the software package.
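
A rough sketch of how that entry box and timeline might fit together, with the XML tagging happening behind the scenes.  The `Notebook` class, its method names, and the `<notebook>`/`<entry>` tags are all hypothetical; no existing LIMS or ELN product is being described.

```python
from datetime import datetime
import xml.etree.ElementTree as ET


class Notebook:
    """Collects short free-text entries and orders them into a timeline."""

    def __init__(self):
        self.entries = []  # list of (timestamp, text) tuples

    def post(self, text, when):
        """Record one brief statement along with the time it refers to."""
        self.entries.append((when, text))

    def timeline(self):
        """Return all entries in chronological order."""
        return sorted(self.entries, key=lambda entry: entry[0])

    def to_xml(self):
        """Serialize the timeline as XML so other tools can index it."""
        root = ET.Element("notebook")
        for when, text in self.timeline():
            node = ET.SubElement(root, "entry", date=when.isoformat())
            node.text = text
        return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    nb = Notebook()
    nb.post("Performed minipreps on overnight cultures from 4/1/08.",
            datetime(2008, 4, 2, 10, 30))
    nb.post("Started overnight cultures.", datetime(2008, 4, 1, 17, 0))
    for when, text in nb.timeline():
        print(when.date(), text)
    print(nb.to_xml())
```

The researcher only ever types a sentence; the date attribute and the tags are added by the software, which is the "mostly behind the scenes" part.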

(more…)

Some Python code for you to ridicule

Tuesday, April 1st, 2008

Mrs. PA is also working on her thesis, and she came to me with what sounded like a simple programming problem.  She has a lot of gels from different samples, with many bands in each sample’s lane.  What she wants to do is: given the sizes of the bands, place them into bins of a range of sizes.  There are so many samples and so many bins that this is very tedious to do by hand.

I thought that this would be no problem at all and sat down to work.  I decided right away that I’d use Python to do the chore, since I worked with it for my own project a while back and remembered that it was pretty good at this sort of thing.  Well, after a lot of cursing and tracebacks, I’ve managed to hack together something that works.
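
The script itself is not part of this excerpt.  As an illustration of the task only, here is a minimal sketch of one way to do the binning, assuming band sizes are plain numbers and the bins are given as a sorted list of edges; the sample names, sizes, and edges are made up.

```python
from bisect import bisect_right
from collections import Counter


def bin_bands(band_sizes, edges):
    """Count bands per bin, where bin i covers edges[i] <= size < edges[i+1].

    `edges` must be sorted in ascending order. Bands smaller than the first
    edge, or at or above the last edge, are ignored.
    """
    counts = Counter()
    for size in band_sizes:
        i = bisect_right(edges, size) - 1  # index of the bin's lower edge
        if 0 <= i < len(edges) - 1:
            counts[(edges[i], edges[i + 1])] += 1
    return counts


def bin_samples(samples, edges):
    """Apply bin_bands to every sample's lane; returns {sample: Counter}."""
    return {name: bin_bands(sizes, edges) for name, sizes in samples.items()}


if __name__ == "__main__":
    # Hypothetical data: band sizes for each sample's lane.
    samples = {
        "sample_1": [120, 180, 450, 900],
        "sample_2": [200, 210, 1500],
    }
    edges = [100, 200, 500, 1000]  # bins: 100-200, 200-500, 500-1000
    for name, counts in bin_samples(samples, edges).items():
        for low, high in zip(edges, edges[1:]):
            print(name, f"{low}-{high}:", counts[(low, high)])
```

Using `bisect_right` on the sorted edges finds the right bin without looping over every bin for every band, which matters once there are many samples and many bins.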

(more…)

The Return of the Search Box - I think PLoS is gonna pull through

Monday, March 31st, 2008

:: Caution - Shameless Butt-kissing ahead ::

Kudos to the web team at PLoS.  When I am on their website, I don’t feel like I’m reading a journal - I feel like I’m using a very slick application to learn some exciting stuff.  There are so many little things about their sites that I like: the layout, color schemes specific to each journal, the right-hand bar when you’re reading an article with the “interactive” features presented, and on and on.  They’ve really done a remarkable job in coming up with ways to move beyond the flat text of a standard journal’s online access.

I say this for a few reasons.  First of all, because I really mean it - I have a lot of respect for what they’ve done so far and look forward to future developments.  Secondly, because I’d still love to at least interview for a job there, as soon as I can figure out what I’m most qualified to do for them.  Finally, because I’ve been dogging them over the last couple of weeks as they’ve wrestled with some major performance problems.

As I was writing the previous entry for PA, I noticed that after several weeks’ absence, the search bar has returned at the top of their site.  Not only is it there, it seems to be working quite nicely!  Finally I was able to start digging through some of the archives for specific things.

Overall the site seems to be moving quite nicely now.  I was able to access comments and ratings for different articles; however, I do have problems submitting new content.  I’m not sure if this is an issue with their site or the fact that I’m running the Firefox Beta, which has a few quirks left to iron out.

Regardless, I wanted to congratulate the PLoS web team on what I’m sure was a few weeks of very hard and stressful work (which may or may not be over yet) and thank them for a job well done.

Open Access Prodigy

Friday, March 28th, 2008

The idea of a database of scientific facts gleaned from research publications reminded me of this old IBM commercial:

This quote really got me:

Collecting data is only the first step toward wisdom, but sharing data is the first step towards community

I feel like I remember some internet-based project along these lines… People could go to a website and type in “facts”, which the AI would then learn. Does anyone else remember more about this? I can’t even recall the name at the moment.

::EDIT:: I think I was thinking of Open Mind Common Sense from MIT

Whuffie: The currency of the meritocracy

Tuesday, March 11th, 2008

Cory Doctorow is fairly well known as a champion against DRM and strict copyright regulations.  He’s one of the contributors to the massively popular blog BoingBoing, and also a (science?) fiction writer.

I’ve read all of his work (I think), but the story which has had the most lasting effect on me was his first, Down and Out in the Magic Kingdom (just an aside here: Doctorow releases all his work under a Creative Commons license, so if you click that last link and look at the top of the page you will see a “Download for Free” link.  This is indeed the full novel if you’d like to read it for yourself).  (more…)

A funny Linux bug

Wednesday, March 5th, 2008

I’ve been running the alpha of Ubuntu’s upcoming Hardy Heron release for a month or so now.  Since the software is still being developed, they push out updates pretty often.  Well, after this morning’s updates, for some reason my keyboard layout would randomly start typing in other languages.  It seemed to have a particular preference for Arabic.

After checking out the system a bit, Iرeَلِزe

see, there it went again.  Anyway, the bug seems to be with the Smart Common Input Method (SCIM).  It’s pretty easy to put back to English (I just have to click the icon and select the standard keyboard layout again), but I still haven’t figured out what’s caused it to go haywire in the first place.

As far as bugs go, this one is pretty mild and actually sort of funny.

::EDIT:: Harry seems to have figured out the pattern.  It was happening when I hit shift-space, which occurs when I type too fast.  I removed that from the triggers for SCIM and we’ll see if that helps :)