Archive for the ‘linux/OSS’ Category

Django is like an alien spaceship of awesomeness

Thursday, May 29th, 2008

We sit eyeing one another, this spaceship and I.  The power it holds within is clear, but the methodology for harnessing that power escapes me.  It’s evident that a vastly superior being has designed this device to do amazing things, but a manner of interacting efficiently with it is not forthcoming to my uninitiated cortex.  I have managed to move it a bit, but I don’t think that holding a match to the propellant tank is what the designers had in mind.

It’s an enigma, this machine, but I plan to plumb the depths of its intricacies, perhaps learning more about myself in the process…

Brainstorming a Feature Set for an Open-Source LIMS

Friday, May 23rd, 2008

As I investigate Django, I find myself matching up features of the framework with applications I’d like to implement if I were writing my own Laboratory Information Management System (LIMS).  So far my typical cycle goes something like this:

  • Find new (to me) development framework and do a cursory investigation
  • Work through some basic tutorials
  • Choose one component of a custom LIMS that looks to be the simplest to implement with the new framework and work on it
  • Get bogged down
  • Give up

With Django, I’m at step 3 of the process.  The interesting thing this time is that I can envision solutions to writing several of the LIMS modules I have in mind, rather than a rough idea for one and a hope that I’ll figure out the rest as I go.  Maybe this time around I just “get it” a little more than with previous systems.  Perhaps I just think that I do :)

With that in mind, I’ve turned once again to brainstorming the set of features that I would like to see in a LIMS.  Even if I don’t end up writing the software myself (a likely scenario), it’s worthwhile to have the ideas out there.  Here is my list, but feel free to add any you can think of in the comments.

  • User authentication
    • Django largely takes care of this automatically
  • Manuscript repository with version control (for collaborative document writing)
  • To-Do lists/Workflow management
  • Inventory/Re-ordering management
    • Chemical locations, MSDS links
  • Wiki (with the standard Wiki history allowing for reverts)
    • I’m imagining this as being used for protocols, but it could potentially hold a lot of things
  • Literature repository (can hold actual PDFs or link to Institutional/other open Repository)
  • Calendar (group & individual, perhaps the group calendar just aggregates the others)
  • Research Image repository/browser
  • Personal blogs/microblogs
    • Could use tags/categories to separate “lab notebook” entries from other, less formal posts
  • Portal page which can serve as public lab homepage if desired
  • Grant/manuscript tracking (could be integrated with the workflow manager above)
  • Teaching material repository
  • Automated backup of data
    • Daily/weekly database & file backup
  • Instrument interface API?

That is what I can think of off the top of my head.  Now, tell me all the things I’m missing.

One that I’m aware of is integration of laboratory instruments - the ability to have an instrument dump the data directly into the LIMS.  My reason for leaving this out is that I really think this is the most complicated part.  Every instrument will have different ways of outputting data.  My most ambitious goal would be to have some sort of ability for people to write their own interface modules, which could then be added on by that particular lab.  Even this is a task that I’m not really sure how to start on.
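
One possible starting point for the add-on interface modules, sketched in plain Python.  Everything here is an assumption on my part rather than a real LIMS API: the PARSERS registry, the register decorator, the "plate_reader_csv" name and the two-column CSV layout are all invented for illustration.  The idea is simply that each lab writes one small parser function per instrument and registers it under a name.

```python
from typing import Callable, Dict, List

# Hypothetical registry: maps an instrument name to a parser function.
PARSERS: Dict[str, Callable[[str], List[dict]]] = {}


def register(instrument: str):
    """Decorator that adds a parser function to the registry."""
    def wrap(func):
        PARSERS[instrument] = func
        return func
    return wrap


@register("plate_reader_csv")
def parse_plate_reader(raw: str) -> List[dict]:
    """Parse an invented 'well,absorbance' CSV export into records."""
    records = []
    for line in raw.strip().splitlines()[1:]:  # skip the header row
        well, value = line.split(",")
        records.append({"well": well.strip(), "absorbance": float(value)})
    return records


def import_data(instrument: str, raw: str) -> List[dict]:
    """Look up the lab's module for this instrument and run it."""
    if instrument not in PARSERS:
        raise KeyError(f"no interface module registered for {instrument!r}")
    return PARSERS[instrument](raw)
```

The core system would only ever call import_data; supporting a new instrument means adding one decorated function, which seems like a small enough job for an individual lab to take on.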

First look: Creating scientific web applications with Django

Thursday, May 22nd, 2008

Unrelated to the actual body of this post, but possibly of more interest to you, dear reader, is that I’ve sent in another job application.  This time it is for an Associate Editor position at the esteemed Science magazine.  My qualifications are a bit less than what they seemed to be looking for, so I’m not terribly optimistic (what’s new).  As usual though I’m nervous…  All right, on to the actual post!

I like to think (perhaps a bit ambitiously) that all of my tinkering around has elevated me to the level of “novice” programmer.  I can usually decipher things that others have written (ok, I can often do so), and I’ve written several command-line scripts that will do something useful.  I think one of the key things I’ve learned is that coding is hard, and I have tons of respect for the people who’ve chosen to do this as their career.  Now that I’m starting to get a handle on everything I don’t know, I feel like I’m also starting to find the handholds I need to climb a little farther up the cliff/learning curve.

So far I’ve had the most success writing things in Python.  This is most likely because it’s a relatively simple language, designed to be accessible to noobs like me.  It’s a fine language which tends to do what I like in ways that (more or less) make sense, and since its usage is fairly widespread in bioinformatics I don’t feel like it’s a waste of time to learn.

The problem with most of my “applications” so far is that, like I said above, they are uniformly command-line scripts which either take console or text file input.  For my own personal use this is fine - I understand the quirks of the program and am comfortable operating from the console.  This tends to be a barrier to more widespread usage, however.  Most people (who might use one of the things I’ve coded) aren’t very comfortable at all with entering commands into the terminal or editing a configuration file by hand.

So, I wanted to look into ways of writing things that had a friendlier user interface.  I looked into using Glade to make graphical front-ends, but was having trouble wrapping my head around all of the handlers and things.  I was also a little worried that this would restrict the final product to a Gnome-based desktop.  What I really wanted to do was make something accessible via the web, so that I could install the application on our lab’s central machine and let people use it from their own computers.  My problem was that I couldn’t find a decent (i.e. quickly understandable by me) way to build web apps based on Python.  That is, until I found Django.

Django is a web framework based on Python that just makes it easy to develop a Python-based application and distribute it via the web.  I haven’t had time to build anything from the ground up yet (I’ve been working my way through the online tutorial/book), but I can definitely see the potential.  I’ve gotten much farther with Django in a much shorter time than with any of the other solutions I’ve looked at so far.

I’ll keep you up to date as I continue my experimentation.

More testing with VMD and Tachyon

Tuesday, May 6th, 2008

I’m still testing out some of the advanced features of using Tachyon to render nice images of biological macromolecules. I came across these beautiful images of bacteria which are able to consume radioactive waste, and decided to tinker a bit to see if I could get something similar out of VMD.

First of all I loaded in my molecule and set it up similar to the exercises from the other day: white background, surface representation, diffuse material. I also added the Depth Cue feature of VMD, which adds a fog which increases in density with depth. This helps to add a bit of a 3D feel to the representation. I also played around with the various lights, settling on having lights 0 & 2 on.
I rendered the image with:

"/usr/local/lib/vmd/tachyon_LINUX" -aasamples 4 -rescale_lights 0.3 -add_skylight 0.9 %s -format TARGA -o %s.tga

Note: this takes about 8 minutes to render on my laptop at about 700×700 resolution.
If my understanding is correct, this should give a scene that is dominated a fair bit by the skylight parameter, and this is more or less the case. The image, while interesting in some ways, is far too bright!
Let’s drop the skylight down then:

"/usr/local/lib/vmd/tachyon_LINUX" -aasamples 4 -rescale_lights 0.3 -add_skylight 0.6 %s -format TARGA -o %s.tga

Well that darkened the shadows a bit, but the overall image is still way too bright. How about dropping the lights?

"/usr/local/lib/vmd/tachyon_LINUX" -aasamples 4 -rescale_lights 0.1 -add_skylight 0.6 %s -format TARGA -o %s.tga

Well, still far too light. What’s happening is that the depth cue fades the image to the background color (in this case white) as it goes. Let’s drop the depth cue density in order to cut back on the lightening. This setting is found in Display–>Display Settings. I adjusted it to a value of 0.15, still using the Exp2 function for the density. When I rendered this (using the same settings as the last one above), it looked OK, but not fantastic. Mostly it was just “flat”, if that makes sense - not a lot of visual appeal. I rescaled the lights back up to 0.3, and this was better.
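
To see why the density setting matters so much, here is a rough sketch of the fading.  It assumes that the Exp2 mode follows the usual OpenGL-style definition, where the fraction of the object's own color that survives is exp(-(density × depth)²).  I haven't confirmed that VMD or Tachyon use exactly this formula or how they scale depth, so treat the function names and the depth units as illustrative only.

```python
import math


def exp2_fog_factor(density: float, depth: float) -> float:
    """Fraction of the object's own color kept at a given depth (1.0 = no fog)."""
    return math.exp(-((density * depth) ** 2))


def fogged(color: float, background: float, density: float, depth: float) -> float:
    """Blend one color channel toward the background by the fog factor."""
    f = exp2_fog_factor(density, depth)
    return f * color + (1.0 - f) * background


if __name__ == "__main__":
    # A dark surface (0.2) against a white background (1.0), at depth 2.0.
    for density in (0.15, 0.5):
        print(density, fogged(0.2, 1.0, density, 2.0))
```

With a white background, a larger density pushes every channel toward 1.0, which matches the washed-out look; lowering the density keeps more of the surface's own shading.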

Something still isn’t “there”, though. To be sure, the tachyon renders look nice, but I just don’t feel like this is the best that can be done. I’ll have to keep toying with it.
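
Since each render takes several minutes, it might be easier to generate the whole batch of command lines up front rather than editing them by hand.  This is a minimal sketch that only uses the options from the commands above; the scene file name "molecule.dat" and the output naming scheme are made up for the example, and nothing is actually executed.

```python
import itertools
import shlex

TACHYON = "/usr/local/lib/vmd/tachyon_LINUX"  # path from the commands above


def tachyon_command(scene, rescale_lights, skylight, aasamples=4):
    """Build the argument list for one render, mirroring the commands above."""
    # Invented naming scheme so that each setting gets its own output file.
    output = f"{scene}_l{rescale_lights}_s{skylight}.tga"
    return [
        TACHYON,
        "-aasamples", str(aasamples),
        "-rescale_lights", str(rescale_lights),
        "-add_skylight", str(skylight),
        scene,
        "-format", "TARGA",
        "-o", output,
    ]


def sweep(scene, light_values, skylight_values):
    """Return one command per combination of light and skylight settings."""
    return [
        tachyon_command(scene, lights, sky)
        for lights, sky in itertools.product(light_values, skylight_values)
    ]


if __name__ == "__main__":
    # "molecule.dat" is a placeholder for the scene file written out by VMD.
    for cmd in sweep("molecule.dat", [0.1, 0.3], [0.6, 0.9]):
        print(shlex.join(cmd))
```

The printed lines can be pasted into a shell, or each list could be handed to subprocess.run to render the combinations one after another.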

Hold the phone here, Tachyon looks pretty nice

Wednesday, April 30th, 2008

In the last post, I went over some of the POV-Ray basics. The toughest part of actually using POV-Ray to render figures of proteins is importing the structure into the rendering package - the complex geometry of the macromolecule has to be translated to the system of simple objects understood by POV-Ray.

As I was looking into some software packages that can output .pov files, I came across another raytracing program called Tachyon which is included (sort of) in the latest version of VMD.  The example images made using the ambient occlusion lighting capability of Tachyon made my jaw hit the floor.  Instead of babbling on about ways to get POV-Ray to play nice, I’ll go over how to get and use VMD/Tachyon.

Weekend Liveblog: Switching my home computer from Windows XP to Ubuntu

Saturday, April 19th, 2008

I’ve been using Ubuntu as my operating system at work for about 2 years now. I really love it. I’ve tinkered with making it the OS on my home computer, but since I do a bit of gaming I keep putting it off (as gaming support in Linux tends to lag behind Windows a bit).

I’ve been getting more and more fed up with Windows lately, however, so I’ve decided to give Linux another go. I’ll be liveblogging the changeover for you below.

Feasibility of using LaTeX as a means of generating machine-readable papers

Friday, April 18th, 2008

As more papers are published in Open Access journals, it becomes possible to text-mine these articles and build large, centralized, and cross-linked databases of scientific knowledge.  One of the keys to this effort is the ability to read in the content of scientific articles using computational algorithms.  These algorithms need to be able to “understand” the content, at least to the point where they can properly assign things such as the title, authors, system under study, key findings, etc.

Most people who are working on this problem, which is tied in with the semantic web, believe that in order to do this, articles must be accompanied by markup language, or metadata.  This means that in addition to the actual content of the article (the text of the paper), we need to include some information about the content.  For instance, the raw file might look something like this:

<journal>The Journal I Just Made Up</journal>
<issue>1</issue>
<pages>1-2</pages>
<title>A Study of Everything: Not so complicated as you might think</title>
<author1>Accuracy, P.</author1>

The things between the <> are metadata - they tell a computer something about what is inside.  A program could quite easily scan this raw file and then do things with it, whether that is assigning the content between the tags to a database, or formatting the article for easier reading online.
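
As a rough illustration of how little code that scanning takes, here is a short Python sketch that reads the made-up record above into a dictionary.  The regular expression and the parse_metadata name are my own choices for the example, and it assumes flat, non-nested tags exactly like the ones shown.

```python
import re

RAW = """\
<journal>The Journal I Just Made Up</journal>
<issue>1</issue>
<pages>1-2</pages>
<title>A Study of Everything: Not so complicated as you might think</title>
<author1>Accuracy, P.</author1>
"""

# Matches <tag>content</tag> where the closing tag repeats the opening name.
TAG = re.compile(r"<(\w+)>(.*?)</\1>", re.DOTALL)


def parse_metadata(raw):
    """Return a dict mapping each tag name to the text it encloses."""
    return {name: text.strip() for name, text in TAG.findall(raw)}


record = parse_metadata(RAW)
```

From here, record["title"] or record["journal"] could be written to a database column or dropped into a page template.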

Markup like this is already very common in web development.  If you right-click this page and then click on “view source”, you’ll see that the raw file used to generate the page is full of tags like this.  Your web browser formats the raw file according to a simple style sheet (also provided by the web developer - you can see the PlausibleAccuracy sheet here) in order to display it on your screen.

So one of the issues of machine-readable scientific articles is that we need to apply some sort of markup like this to the papers.  This requires some cost, both in time and money (you presumably have to pay someone to do it), and as such you have to convince the people involved that it’s worth doing.

There is already document writing software which includes markup and is particularly well-suited to academic papers - LaTeX.  Much in the same way that Cascading Style Sheets can be used to separate web design from site content, LaTeX uses style files to format a content-rich document.  When writing a paper in LaTeX, you simply tag things as you go and focus on writing what is important, rather than wrangling with formatting.  You then process the document and a nice looking file is generated.
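
Because a LaTeX source file already tags things like the title and authors, a program can pull them back out.  Below is a minimal sketch using a toy document I made up; it assumes simple one-argument commands with no nested braces, so a real paper would need a proper LaTeX parser rather than a regular expression.

```python
import re

# A toy LaTeX source, invented for this example.
SOURCE = r"""
\documentclass{article}
\title{A Study of Everything}
\author{P. Accuracy}
\begin{document}
\maketitle
\section{Introduction}
Everything is not so complicated as you might think.
\end{document}
"""


def latex_fields(source, commands=("title", "author", "section")):
    """Collect the arguments of simple one-argument LaTeX commands.

    Assumes the argument contains no nested braces.
    """
    fields = {}
    for command in commands:
        pattern = r"\\" + command + r"\{([^{}]*)\}"
        fields[command] = re.findall(pattern, source)
    return fields


fields = latex_fields(SOURCE)
```

Each entry in fields is a list, since a command such as \section will normally appear more than once in a paper.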

It seems to me that writing papers in LaTeX would be a benefit to everyone involved in the publishing of academic papers.  Authors can just write up their results, and spend less time wrestling with Microsoft Word to try and shoehorn the paper to match the journal’s formatting rules.  Journals should be able to write relatively simple style files for the LaTeX submissions to generate documents suitable for publication.  In the same vein, those interested in machine parsing of the documents could write software which also leveraged the markup for their own uses.  Of course, however, there are some concerns.

The main one, in my view, is reluctance on the part of authors to learn LaTeX.  It does take a (small) amount of effort to learn, and to be sure the actual writing is more complicated than doing so in a What You See Is What You Get (WYSIWYG) editor like Microsoft Word.  It’s my experience that for some reason people see the time and effort put into learning LaTeX and including markup during the writing process as more work than the formatting changes that they have to make after the fact in Word, even though this may take more time and be more frustrating.

It’s also harder to just read a LaTeX file.  It’s sort of like reading a raw web page without a browser - you stumble across the tags.  More important than this is group editing of a document; most professors like to receive an electronic copy from the student writing the paper, so they can make their edits directly and return it.  As LaTeX tends to output PDF files, this isn’t so simple.  There are other alternatives, like WYSIWYG editors for LaTeX, but the experience I’ve had with them is less than stellar.

In spite of these issues, I think that LaTeX (or perhaps a modified version of it) is an excellent candidate as we think about ways to generate a semantically addressed body of scientific literature.  Perhaps development of specific markup and improved WYSIWYG editors would aid this effort.

Should Open Access walk before it crawls?

Tuesday, April 8th, 2008

A post over on Zzzoot insisting that Open Access must also mean accessibility by machines has stirred up some commentary around the blogs.  I wanted to add my own thoughts.

First of all I’ll say that for me, a scientist working in the developed world at a well-funded institution, the ability to crawl and text mine literature is the primary academic reason to desire Open Access in the first place.  So-called “free access”, which allows humans to read the papers but forbids machine processing in order to remix and recombine that information, is substantially reduced in value from a discovery point of view.  However, free access remains quite useful for just getting the data out there - to any interested party, not just those with large library budgets.

Bill Hooker points out that the permission to crawl is already in the founding text of the Open Access platform (emphasis mine):

By “open access” to this literature, we mean its free availability on the public internet, permitting any users to read, download, copy, distribute, print, search, or link to the full texts of these articles, crawl them for indexing, pass them as data to software, or use them for any other lawful purpose, without financial, legal, or technical barriers other than those inseparable from gaining access to the internet itself.

So it’s clear that removal of these permission barriers is one of the central aims of the OA movement.

I have to argue, however, that perhaps this is one battle that we should wait to fight another day.  We are still operating under conditions in which the majority of published papers are “restricted access”, and the task of changing this paradigm requires a focus of effort to achieve.  I think that it’s a more logical progression to push for Open Access, but be willing to accept (for now) Free Access, as long as we are converting away from Restricted Access.  Once the level of adoption of F/OA has reached critical mass, then we can put more effort into moving everything to “true” OA.

Firstly, I think this is more logical because this achieves the philanthropic goals of OA (making the literature accessible to everyone) in the least amount of time.  It’s one less stipulation that has to be worked out before starting or converting a database/journal from a restricted model.  At the same time, we can work on tools that employ text mining to demonstrate the power of the method.  The next step is to start working on the databases/repositories/journals themselves to convince them to drop the restriction on text mining of the freely available literature.  This is a simpler argument to make than the restricted –> free conversion debate, and by splitting the two up you can tackle the issue without scaring away the other party.

It’s true that this approach might be frustrating - in some ways it may seem easier to go directly from restricted access to OA, in that you only have one step.  I see it more like a reaction with intermediates, however.  First we have to apply the activation energy to get to a transition state of mainly F/OA, and then apply a little more energy to get over the barrier to full OA.  These efforts individually are more tractable than an all-at-once approach.

Short software review: Referencer

Friday, April 4th, 2008

I’ve been writing my thesis in LaTeX.  It’s really magnificent, because I can focus on the content rather than all of the manual formatting that using a standard word processor (like OpenOffice) would require.  Things are made even easier for me because my school provides a LaTeX template file for theses.

One thing that is a bit cumbersome, however, is handling references.  To date, I’ve just been using Hubmed to export BibTeX citations for any papers, and pasting them into a running BibTeX file.  This does not deal well with duplicates, and is cumbersome to search rapidly.
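
For the duplicates problem at least, a few lines of Python can flag repeated entries in a pasted-together file.  This is only a sketch: it looks at citation keys (not at whether two differently-keyed entries describe the same paper), and the sample entries are invented.

```python
import re
from collections import Counter

# Matches the start of an entry such as "@article{doe2007," and captures
# the entry type and the citation key.
ENTRY_START = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,")

# Entry types that do not define a citable reference.
NON_ENTRIES = {"comment", "preamble", "string"}


def duplicate_keys(bibtex_text):
    """Return, in sorted order, citation keys that appear more than once."""
    keys = [
        key
        for kind, key in ENTRY_START.findall(bibtex_text)
        if kind.lower() not in NON_ENTRIES
    ]
    counts = Counter(keys)
    return sorted(key for key, n in counts.items() if n > 1)


# Invented sample entries for demonstration.
SAMPLE = """
@article{doe2007, title={First}}
@article{roe2008, title={Second}}
@article{doe2007, title={First again}}
"""

duplicates = duplicate_keys(SAMPLE)
```

Running it over the real file before each LaTeX build would at least tell me which keys to clean up by hand.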

Today, I saw a link to Referencer (via Lifehacker) and decided to give it a go.  Just a note here before people go scrambling to the comment box: I do know about other packages such as EasyBib and JabRef.  I’ve tried them, and just can’t seem to get into them for some reason.

Sort of a busy day, but I just wanted to comment on a talk by Peter Suber

Friday, April 4th, 2008

I’ve been trying to get training on one of the department’s common-use instruments for some time now, and finally the person in charge is going to take care of me.  Unfortunately this means that most of my day is full, and I won’t be able to write as in-depth a discussion as I’d like.

Yesterday, I watched a talk by Peter Suber on what Universities can do to promote Open Access.  It was interesting to me, even though I am not really in a position to influence this level of policy at my institution.  Dr. Suber seems to place a lot of weight on institutional repositories as a good way to sort of do an end-run around strict copyright regulations from publishers.  This is called the “green” road to OA.  Much of his talk focused on ways to “encourage” or “mandate” authors of academic manuscripts to archive their work in these institutional repositories.

I find this idea interesting.  Our university does already have a repository, but I have to admit that I didn’t know this until I did a search today.  The policy “invites” authors to submit their work, but as far as I know there is no requirement to do so.  It seems that the author must seek out the repository and take the initiative for the deposition themselves.

To me, the keystone to traveling the path to OA down the “green road” lies in interoperability.  It’s great if all of Harvard’s research is in their institutional repository, but the key is in creating sort of a “shadow” literature database - one that combines the contents of all institutional repositories in one easy to index and search location.  I think this can be accomplished by agreeing to mark works in these repositories by a standard set of metadata tags.  Standalone software can then be written which can mine the repositories for these tags and parse the manuscripts accordingly.

This is already accomplished by some of the software tools being used to build the repositories.  The key is in making sure that all of the institutional repositories are on board with a common system.  I’m not a librarian, so I’m not sure about the best way to go about doing this.  I do think, however, that only when there is an interconnected network of repositories at a majority of institutions will this resource become a go-to directory for academic work.  We can see this occurring already at the magnificent arXiv.
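
To make the “shadow” database idea a little more concrete, here is a toy sketch of the aggregation step.  It is not how any existing repository software works: the three required tags, the record layout and the function names are all assumptions of mine, and the harvesting itself (actually fetching records from each institution) is left out entirely.

```python
from collections import defaultdict

REQUIRED_TAGS = ("title", "authors", "year")  # hypothetical shared tag set


def aggregate(repositories):
    """Merge records from many repositories into one title-keyed index.

    `repositories` maps a repository name to a list of record dicts.
    Records missing any required tag are skipped.
    """
    index = defaultdict(list)
    for repo_name, records in repositories.items():
        for record in records:
            if not all(tag in record for tag in REQUIRED_TAGS):
                continue
            key = record["title"].strip().lower()
            index[key].append({**record, "repository": repo_name})
    return dict(index)


def search(index, word):
    """Return every record whose title contains the given word."""
    word = word.lower()
    return [rec for key, recs in index.items() if word in key for rec in recs]
```

The point of the sketch is that the merge only works because every repository agrees on the same tag names; a record that lacks them simply can't be indexed.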

I’d love to go more into what is being done on this front in other fields as well, but unfortunately the time has come for me to do some “real” work.  Please comment with your thoughts and other repository aggregators that you know of.  We can continue the discussion in the comments section, and also with posts at a later date.