Archive for the ‘linux/OSS’ Category

Identi.ca launches open source Twitter lookalike

Wednesday, July 2nd, 2008

The number of microblogging services out there keeps growing. While Twitter went through a burst of popularity (and is still in heavy use), frequent service outages have users looking elsewhere. There is a moderately active group over at FriendFeed, and through that site I’ve heard of a new microblog called Identi.ca. This one looks quite a bit like Twitter (140-character text messages); however, it is open source, and its content is released under a Creative Commons license. Being so new, it is light on features for the moment. As soon as they release an API, I’ll try to work on some kind of WordPress integration.

The good news is that since it is open source, it will likely be possible to hook into Twitter, meaning that you don’t necessarily have to leave behind any contacts you have on that service. Indeed, I expect this to be one of the first features that the community develops, if the actual dev team doesn’t do it first.

Meanwhile, follow me.

More fun & games with ElementTree

Wednesday, June 25th, 2008

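I’m still deeply in love with ElementTree. Here’s a script that prints all the lines spoken by a given player. Note that the import line has changed: on this machine I’m using Python 2.4, which predates xml.etree in the standard library (it was added in 2.5), so ElementTree comes from the separately installed elementtree package: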

#!/usr/bin/python

# Initialization
import elementtree.ElementTree as ET

# Read in our XML file
infile = raw_input("Input XML file name > ")
xmltree = ET.ElementTree(file=infile)
rootelem = xmltree.getroot()

speaker_find = raw_input("Which player's lines do you wish? > ")

act_list = rootelem.findall('ACT')
for act in act_list:
	scene_list = act.findall('SCENE')
	for scene in scene_list:
		speech_list = scene.findall('SPEECH')
		for speech in speech_list:
			speaker_list = speech.findall('SPEAKER')
			for speaker in speaker_list:
				if speaker.text == speaker_find.upper():
					print speaker.text
					lines = speech.findall('LINE')
					for line in lines:
						print line.text
					print ''

and the output:

Input XML file name > hamlet.xml
Which player's lines do you wish? > osric
OSRIC
Your lordship is right welcome back to Denmark.

OSRIC
Sweet lord, if your lordship were at leisure, I
should impart a thing to you from his majesty.

OSRIC
I thank your lordship, it is very hot.

OSRIC
It is indifferent cold, my lord, indeed.

OSRIC
Exceedingly, my lord; it is very sultry,--as
'twere,--I cannot tell how. But, my lord, his
majesty bade me signify to you that he has laid a
great wager on your head: sir, this is the matter,--

OSRIC
Nay, good my lord; for mine ease, in good faith.
Sir, here is newly come to court Laertes; believe
me, an absolute gentleman, full of most excellent
differences, of very soft society and great showing:
indeed, to speak feelingly of him, he is the card or
calendar of gentry, for you shall find in him the
continent of what part a gentleman would see.

OSRIC
Your lordship speaks most infallibly of him.

OSRIC
Sir?

OSRIC
Of Laertes?

OSRIC
I know you are not ignorant--

OSRIC
You are not ignorant of what excellence Laertes is--

OSRIC
I mean, sir, for his weapon; but in the imputation
laid on him by them, in his meed he's unfellowed.

OSRIC
Rapier and dagger.

OSRIC
The king, sir, hath wagered with him six Barbary
horses: against the which he has imponed, as I take
it, six French rapiers and poniards, with their
assigns, as girdle, hangers, and so: three of the
carriages, in faith, are very dear to fancy, very
responsive to the hilts, most delicate carriages,
and of very liberal conceit.

OSRIC
The carriages, sir, are the hangers.

OSRIC
The king, sir, hath laid, that in a dozen passes
between yourself and him, he shall not exceed you
three hits: he hath laid on twelve for nine; and it
would come to immediate trial, if your lordship
would vouchsafe the answer.

OSRIC
I mean, my lord, the opposition of your person in trial.

OSRIC
Shall I re-deliver you e'en so?

OSRIC
I commend my duty to your lordship.

OSRIC
Ay, my good lord.

OSRIC
A hit, a very palpable hit.

OSRIC
Nothing, neither way.

OSRIC
Look to the queen there, ho!

OSRIC
How is't, Laertes?

OSRIC
Young Fortinbras, with conquest come from Poland,
To the ambassadors of England gives
This warlike volley.

It’s pretty great: 25 lines of code, and I’m still doing things the long way. For instance, I’ve already figured out how to rewrite the counting script from my previous post using the getiterator method:

#!/usr/bin/python

# Initialization
import elementtree.ElementTree as ET

# Read in our XML file
infile = raw_input("Input XML file name > ")
xmltree = ET.ElementTree(file=infile)
rootelem = xmltree.getroot()

speaker_find = raw_input("Which player's lines do you wish? > ")

i = 0
speaker_list = xmltree.getiterator('SPEAKER')
for speaker in speaker_list:
	if speaker.text == speaker_find.upper():
		i = i + 1
print i

This steps through all the levels of the XML document for us, so there isn’t any need to do that manually. It’s just as easy to rewrite the “Player’s Lines” script the same way:

#!/usr/bin/python

# Initialization
import elementtree.ElementTree as ET

# Read in our XML file
infile = raw_input("Input XML file name > ")
xmltree = ET.ElementTree(file=infile)
rootelem = xmltree.getroot()

speaker_find = raw_input("Which player's lines do you wish? > ")

speeches = xmltree.getiterator('SPEECH')
for speech in speeches:
	speaker = speech.find('SPEAKER')
	lines = speech.findall('LINE')
	if speaker.text == speaker_find.upper():
		print speaker.text
		for line in lines:
			print line.text
		print ''
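To see why getiterator saves all those nested loops, here is a small comparison, written for Python 3, where getiterator() has been removed (as of 3.9) and iter() does the same job. The miniature play is made up for illustration; it only mimics the nesting of the Hamlet file. findall('SPEECH') looks only at direct children, while iter('SPEECH') (or findall('.//SPEECH')) searches every level.

```python
import xml.etree.ElementTree as ET

# A made-up miniature play with the same nesting as the Hamlet file:
# PLAY > ACT > SCENE > SPEECH > SPEAKER / LINE
MINI_PLAY = """<PLAY>
  <ACT>
    <SCENE>
      <SPEECH><SPEAKER>OSRIC</SPEAKER><LINE>Sir?</LINE></SPEECH>
      <SPEECH><SPEAKER>HAMLET</SPEAKER><LINE>A made-up line.</LINE></SPEECH>
    </SCENE>
  </ACT>
</PLAY>"""

root = ET.fromstring(MINI_PLAY)

# findall('SPEECH') only checks the direct children of <PLAY> (the ACTs),
# so it finds nothing.
direct = root.findall('SPEECH')

# iter('SPEECH') walks the whole subtree, at any depth.
anywhere = list(root.iter('SPEECH'))

# findall also accepts the limited XPath form './/SPEECH' for "any depth".
xpath_style = root.findall('.//SPEECH')

print(len(direct), len(anywhere), len(xpath_style))  # → 0 2 2
```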

So far I’m still learning a lot with every poke at ElementTree. I’m sure I’ll be adding more as I go.

Hope you find this sort of thing interesting.
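For anyone on a newer Python, here is a minimal sketch of the player’s-lines script ported to Python 3, assuming the same PLAY/ACT/SCENE/SPEECH layout as the Hamlet file. It uses the standard-library xml.etree module and iter() in place of getiterator(). The function names player_lines and print_player_lines are my own, not part of ElementTree.

```python
import xml.etree.ElementTree as ET


def player_lines(root, player):
    """Return the speeches of one player as a list of lists of line strings."""
    wanted = player.upper()
    speeches = []
    for speech in root.iter('SPEECH'):
        # A SPEECH may list several SPEAKERs, so check all of them.
        speakers = [s.text for s in speech.findall('SPEAKER')]
        if wanted in speakers:
            # itertext() also picks up text nested inside a LINE
            # (for example an inline stage direction), unlike .text.
            speeches.append([''.join(line.itertext())
                             for line in speech.findall('LINE')])
    return speeches


def print_player_lines(path, player):
    """Print every speech by `player` from the play XML file at `path`."""
    root = ET.parse(path).getroot()
    for lines in player_lines(root, player):
        print(player.upper())
        for line in lines:
            print(line)
        print()
```

Calling print_player_lines("hamlet.xml", "osric") should give output in the same shape as the Osric listing above.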

Using Python to parse XML is easier than it should be

Tuesday, June 24th, 2008

A few months back when I was just starting to poke around with Python, I saw this XKCD comic come through my RSS feed:
[Image: XKCD comic, "import soul"]
At the time, I thought it was sort of funny, more for the complete nerdiness of creating a pet from an Eee PC and a hamster ball than anything else. The kicker at the end about importing a soul was just icing.

I bring this up because, in preparation for the Elsevier Article 2.0 Challenge coming up in September, I wanted to start spending more time learning how to handle XML files. Since Python has become my language of choice (OK, in full honesty, it’s the only language I can speak at all, and even then only in primitive grunts), I wanted to see how hard it would be to parse XML with it. It’s really easy. You just have to import it.

import xml.etree.ElementTree as ET

I wrote a very simple, short script just to make sure that it was as easy as I thought it was, and sure enough it is.

xmlparser.py

# Initialization
import xml.etree.ElementTree as ET

# Read in our XML file
infile = raw_input("Input XML file name > ")
xmltree = ET.ElementTree(file=infile)
rootelem = xmltree.getroot()

print "This should be root_element"
print rootelem.tag

print "This should print two subelement tags"
for subelement in rootelem:
	print subelement.tag

print "This should print out the content of the sub elements"
for subelement in rootelem:
	print subelement.text

I tested it on a small hand-written file, test.xml:

<root_element>
	<sub_element>This is a sub element</sub_element>
	<sub_element id="2">This is a sub element with the ID set to "2"</sub_element>
</root_element>

and the output pretty much matches what you would guess:

Input XML file name > test.xml
This should be root_element
root_element
This should print two subelement tags
sub_element
sub_element
This should print out the content of the sub elements
This is a sub element
This is a sub element with the ID set to "2"
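The script never looks at the id attribute on the second sub_element. Here is a small Python 3 sketch that parses the same test document from a string (so no file is needed) and reads the attribute with Element.get(), which returns None when the attribute is missing.

```python
import xml.etree.ElementTree as ET

# The same hand-written test document, parsed from a string instead of a file.
TEST_XML = """<root_element>
    <sub_element>This is a sub element</sub_element>
    <sub_element id="2">This is a sub element with the ID set to "2"</sub_element>
</root_element>"""

root = ET.fromstring(TEST_XML)
print(root.tag)
for sub in root:
    # .get('id') returns the attribute value, or None if it is absent
    print(sub.tag, sub.get('id'), sub.text)
```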

This took all of about 10 minutes to do… I’m still sort of stunned.  I’m sure the programmers/Python jockeys are laughing right now, but c’est la vie, I suppose.

I mean, it’s really almost frighteningly simple.  Let’s try playing with Hamlet, available online in XML format of course.  We can write a quick script to count how often Rosencrantz speaks:

#!/usr/bin/python

# Initialization
import xml.etree.ElementTree as ET

# Read in our XML file
infile = raw_input("Input XML file name > ")
xmltree = ET.ElementTree(file=infile)
rootelem = xmltree.getroot()

i = 0
act_list = rootelem.findall('ACT')
for act in act_list:
	scene_list = act.findall('SCENE')
	for scene in scene_list:
		speech_list = scene.findall('SPEECH')
		for speech in speech_list:
			speaker_list = speech.findall('SPEAKER')
			for speaker in speaker_list:
				if speaker.text == "ROSENCRANTZ":
					i = i + 1

print i

It’s 49, in case you are wondering.
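Counting one character at a time invites a generalization: tally every speaker in one pass. This Python 3 sketch uses collections.Counter with iter(), assuming the same SPEECH/SPEAKER layout; the function name speech_counts is my own.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def speech_counts(root):
    """Count speeches per speaker anywhere below `root`."""
    counts = Counter()
    for speech in root.iter('SPEECH'):
        # A SPEECH may list more than one SPEAKER, so count each of them,
        # as the nested loop above does.
        for speaker in speech.findall('SPEAKER'):
            counts[speaker.text] += 1
    return counts
```

With the real file, speech_counts(ET.parse("hamlet.xml").getroot()) gives a Counter where counts["ROSENCRANTZ"] is the single-character tally and counts.most_common(5) lists the five most talkative characters.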

I’m pretty excited by my experiments with ElementTree so far.  As usual I’ve got a ton to learn, but it’s great to know that this powerful tool was lurking inside Python the whole time.

Elsevier Article 2.0 Contest starts September 1

Friday, June 20th, 2008

Over at The Life Scientists room on FriendFeed (thanks to Bill for introducing me to the site), Pierre has posted a link to the Elsevier Article 2.0 Competition.  The details are scarce at the moment, but enough to pique my interest:

We will provide contestants with access to approximately 7,500 full-text XML scientific articles (including images) and challenge each contestant to be the publisher. In other words, each contestant will have complete freedom for how they would like to present the scientific research articles contained in the Article 2.0 dataset.

And there are prizes to be won as well:

First Prize:  	$2500
Second Prize: 	$1000
Third Prize: 	$500

I’m really hoping that I’ll be able to put together an entry. I’m not as familiar with XQuery as I should be, but I’ve got a couple of months to learn, I suppose. I’ve already started brainstorming a pile of ideas, of course. It’s what I do best.

I’m not sure if the rules will allow for teams, but I think it would be really great if some of the Open Access proponents could work together to generate a really fine product which relies on the many OA resources that are available on the internet. It’s a prime opportunity to demonstrate the value added by opening up the articles in this way.

More on lab management

Friday, June 6th, 2008

Using web-based tools to manage labs is a key interest of mine.  I believe that these tools are more or less already available, and would greatly aid investigators who are already spread thin due to competing demands on their time.  This is why I’ve made calls for open source LIMS packages as well as taken some initial steps towards building one myself.

My efforts to create a system have not had a high sense of urgency about them, for several reasons: it will be some time before I have to worry about this myself (if ever), I have other work to do that is actually related to my Ph.D., and I tend to get discouraged when I run into programming challenges that I can’t handle quickly.

The whole thing has become more important to me, however, since one of my good friends (who posts in the comments occasionally as The Argonaut) is preparing to begin his career as a tenure-track professor.  He needs a solution, and rather soonish.  I’m not sure if I’ll be able to create something functional in the time period he’s got before he starts his job, so I thought I would try to cobble together a list of already-available technologies that I think are really useful, and can be installed today.

First of all, set up a Google Calendar.  Use it.  Post everything there, and share it with your lab.  I’ll repeat this: use it.  It’s easy to think of listing what you are doing on a calendar as an inconvenience and a waste of time, but it’s invaluable (both for your own scheduling and for your students’).

Install some sort of project and version management system.  Trac + Subversion is a good way to go.  This sort of system is used very often in software development, but I think it has applicability to any project, including research projects.  The system is designed to assign and monitor the workflow of a project; just think of it as a file folder of your progress on every project going on in the lab.  Trac has a built-in wiki, which you can use to store protocols and other lab-wide documents that you’d like to share.  You can use Subversion to get version control of your grants and papers, rather than dealing with endless iterations of new Word documents.  It’s becoming more trendy to use a distributed version control system, so you might look at using Git as opposed to Subversion.  Both should integrate with Trac, although Git requires a plugin (make sure to check out the entire trac-hacks site, as there are many useful plugins there).

If Trac seems too daunting, you can try out MediaWiki, the software that runs Wikipedia as well as OpenWetWare and many other great sites.  The version control isn’t quite as rigorous (you’re left looking at page edit histories), but it’s a bit more user-friendly.

So at this point you have a calendar and project management running.  This is a pretty solid base, and you’re blowing most labs out of the water as far as organization goes.  I would also leverage the wiki functionality of Trac to build in other things, like inventory management.

The last thing you probably want is a public-facing website.  You have several options here, although a content management system (CMS) of some type is going to make life a lot easier.  If you just want a simple website, you can use blogging software such as WordPress or Movable Type.  These are relatively easy to install, theme, and update with new content.  If you’re looking for something more powerful, you may consider the free and open-source Drupal.  It’s more complicated to use, but also has a lot more functionality.

If you can manage to get all of these running and convince your lab to use them, congratulations!  It will probably take some time to become familiar with using each of these systems, but for the most part they are accessible to novices.  The effort it takes will be well worth it.  Of course, it would be best if all of these functions lived under one roof, rather than split across 3 or 4 different software packages.  This is the goal of a LIMS, and I should probably GB2W on my pet project…

Interview with Lorrie Lejeune on the Science Commons blog

Tuesday, June 3rd, 2008

Donna Wentworth has published an interview with Lorrie Lejeune (of OpenWetWare) over at the Science Commons blog.  It is interesting, and highlights the “non-standard” thought processes going on at OWW:

To paraphrase what we state on the OWW wiki, some users of OpenWetWare think that the best thing to happen would be if somebody “stole” your idea and finished the work before you. Then you could go work on another idea. Good researchers usually have more ideas than they have time to explore, and having more people exploring those ideas will in the end benefit your research.

I am really happy that OWW is around and attracting users (the interview cites approximately 4000). I think it’s a good project with laudable goals. As of now, I do think it’s a bit limited by being so wiki-centric. Wikis are great for certain content, and not so great for others. I’d like to see them implement some other technologies in order to improve site navigation/usability, as well as to add new and interesting features to the toolkit. This would probably require a major reorganization of the current site, however, and I’m not sure if it’s feasible at this time. My gut feeling is that when the site was started as a collaboration between two labs, it just wasn’t designed to be as scalable as it could be. This isn’t anyone’s fault, but it means that they might have to go through some growing pains in order to make the service more attractive to an even broader community.

I also found something of specific interest to me (and perhaps you) in the interview; a link to an open guide/beginning of a book on using Python to do science. If you need me, I’ll be reading it.

Django is like an alien spaceship of awesomeness

Thursday, May 29th, 2008

We sit eyeing one another, this spaceship and I.  The power it holds within is clear, but the methodology for harnessing that power escapes me.  It’s evident that a vastly superior being has designed this device to do amazing things, but a manner of interacting efficiently with it is not forthcoming to my uninitiated cortex.  I have managed to move it a bit, but I don’t think that holding a match to the propellant tank is what the designers had in mind.

It’s an enigma, this machine, but I plan to plumb the depths of its intricacies, perhaps learning more about myself in the process…

Brainstorming a Feature Set for an Open-Source LIMS

Friday, May 23rd, 2008

As I investigate Django, I find myself matching up features of the framework with applications I’d like to implement if I were writing my own Laboratory Information Management System (LIMS).  So far my typical cycle goes something like this:

  • Find new (to me) development framework and do a cursory investigation
  • Work through some basic tutorials
  • Choose one component of a custom LIMS that looks to be the simplest to implement with the new framework and work on it
  • Get bogged down
  • Give up

With Django, I’m at step 3 of the process.  The interesting thing this time is that I can envision solutions to writing several of the LIMS modules I have in mind, rather than a rough idea for one and a hope that I’ll figure out the rest as I go.  Maybe this time around I just “get it” a little more than with previous systems.  Perhaps I just think that I do :)

With that in mind, I’ve turned once again to brainstorming the set of features that I would like to see in a LIMS.  Even if I don’t end up writing the software myself (a likely scenario), it’s worthwhile to have the ideas out there.  Here is my list, but feel free to add any you can think of in the comments.

  • User authentication
    • Django largely takes care of this automatically
  • Manuscript repository with version control (for collaborative document writing)
  • To-Do lists/Workflow management
  • Inventory/Re-ordering management
    • Chemical locations, MSDS links
  • Wiki (with the standard Wiki history allowing for reverts)
    • I’m imagining this as being used for protocols, but it could potentially hold a lot of things
  • Literature repository (can hold actual PDFs or link to Institutional/other open Repository)
  • Calendar (group & individual, perhaps the group calendar just aggregates the others)
  • Research Image repository/browser
  • Personal blogs/microblogs
    • Could use tags/categories to separate “lab notebook” entries from other, less formal posts
  • Portal page which can serve as public lab homepage if desired
  • Grant/manuscript tracking (could be integrated with the workflow manager above)
  • Teaching material repository
  • Automated backup of data
    • Daily/weekly database & file backup
  • Instrument interface API?
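To make the inventory/re-ordering item a little more concrete, here is a minimal sketch of the data it might hold, written as a plain Python dataclass rather than a Django model so it runs on its own. The class name, field names, and the "reorder at or below a threshold" rule are all my own assumptions, not a design I’ve settled on; in Django the same fields would become a models.Model.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class InventoryItem:
    """One stock item; the field names are illustrative, not a spec."""
    name: str
    location: str                   # where the item is stored
    quantity: int                   # units currently on hand
    reorder_level: int              # reorder at or below this many units
    msds_url: Optional[str] = None  # link to the MSDS, if there is one


def items_to_reorder(items: List[InventoryItem]) -> List[InventoryItem]:
    """Return the items whose stock has fallen to their reorder level."""
    return [item for item in items if item.quantity <= item.reorder_level]
```

A nightly job could call items_to_reorder() over the whole inventory and email the result to whoever places orders.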

That is what I can think of off the top of my head.  Now, tell me all the things I’m missing.

One that I’m aware of is integration of laboratory instruments - the ability to have an instrument dump the data directly into the LIMS.  My reason for leaving this out is that I really think this is the most complicated part.  Every instrument will have different ways of outputting data.  My most ambitious goal would be to have some sort of ability for people to write their own interface modules, which could then be added on by that particular lab.  Even this is a task that I’m not really sure how to start on.
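One common way to let each lab add its own interface modules is a registry that maps an instrument name to a parser function; the core LIMS only ever calls the registry. The sketch below assumes a hypothetical plate reader that exports a two-column CSV (well, absorbance). The instrument name, column names, and function names are all made up for illustration.

```python
import csv
import io
from typing import Callable, Dict, List

# Registry of interface modules: instrument name -> parser function.
PARSERS: Dict[str, Callable[[str], List[dict]]] = {}


def register(instrument: str):
    """Decorator that adds a parser function to the registry."""
    def wrap(func):
        PARSERS[instrument] = func
        return func
    return wrap


@register("plate_reader_csv")  # hypothetical instrument name
def parse_plate_reader(text: str) -> List[dict]:
    """Parse a CSV export with a header row of: well,absorbance."""
    return [{"well": row["well"], "value": float(row["absorbance"])}
            for row in csv.DictReader(io.StringIO(text))]


def import_data(instrument: str, text: str) -> List[dict]:
    """Dispatch raw instrument output to the matching interface module."""
    try:
        parser = PARSERS[instrument]
    except KeyError:
        raise ValueError(f"No interface module for {instrument!r}") from None
    return parser(text)
```

A lab with a different instrument would only need to write one more decorated parser function; nothing else in the LIMS would have to change.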

First look: Creating scientific web applications with Django

Thursday, May 22nd, 2008

Unrelated to the actual body of this post, but possibly of more interest to you, dear reader, is that I’ve sent in another job application.  This time it is for an Associate Editor position at the esteemed Science magazine.  My qualifications are a bit less than what they seemed to be looking for, so I’m not terribly optimistic (what’s new).  As usual, though, I’m nervous…  All right, on to the actual post!

I like to think (perhaps a bit ambitiously) that all of my tinkering around has elevated me to the level of “novice” programmer.  I can usually decipher things that others have written (ok, I can often do so), and I’ve written several command-line scripts that will do something useful.  I think one of the key things I’ve learned is that coding is hard, and I have tons of respect for the people who’ve chosen to do this as their career.  Now that I’m starting to get a handle on everything I don’t know, I feel like I’m also starting to find the handholds I need to climb a little farther up the cliff/learning curve.

So far I’ve had the most success writing things in Python.  This is most likely because it’s a relatively simple language, designed to be accessible to noobs like me.  It’s a fine language that tends to do what I want in ways that (more or less) make sense, and since its usage is fairly widespread in bioinformatics, I don’t feel like it’s a waste of time to learn.

The problem with most of my “applications” so far is that, like I said above, they are uniformly command-line scripts which either take console or text file input.  For my own personal use this is fine - I understand the quirks of the program and am comfortable operating from the console.  This tends to be a barrier to more widespread usage, however.  Most people (who might use one of the things I’ve coded) aren’t very comfortable at all with entering commands into the terminal or editing a configuration file by hand.

So, I wanted to start looking into ways to write things that had a friendlier user interface.  I looked into using Glade to make graphical front-ends, but was having trouble wrapping my head around all of the signal handlers.  I was also a little worried that this would restrict the final product to a GNOME-based desktop.  What I really wanted to do was make something accessible via the web, so that I could install the application on our lab’s central machine and let people use it from their own computers.  My problem was that I couldn’t find a decent (i.e. quickly understandable by me) way to build web apps based on Python.  That is, until I found Django.

Django is a Python web framework that makes it easy to develop an application and serve it over the web.  I haven’t had time to build anything from the ground up yet (I’ve been working my way through the online tutorial/book), but I can definitely see the potential.  I’ve gotten much farther with Django in a much shorter time than with any of the other solutions I’ve looked at so far.

I’ll keep you up to date as I continue my experimentation.

More testing with VMD and Tachyon

Tuesday, May 6th, 2008

I’m still testing out some of the advanced features of using Tachyon to render nice images of biological macromolecules. I came across these beautiful images of bacteria which are able to consume radioactive waste, and decided to tinker a bit to see if I could get something similar out of VMD.

First of all I loaded in my molecule and set it up similar to the exercises from the other day: white background, surface representation, diffuse material. I also added the Depth Cue feature of VMD, which adds a fog which increases in density with depth. This helps to add a bit of a 3D feel to the representation. I also played around with the various lights, settling on having lights 0 & 2 on.
I rendered the image with the following command line (VMD substitutes the scene file name for each %s):

"/usr/local/lib/vmd/tachyon_LINUX" -aasamples 4 -rescale_lights 0.3 -add_skylight 0.9 %s -format TARGA -o %s.tga

Note: this takes about 8 minutes to render on my laptop at about 700×700 resolution.
If I understand the options correctly, this should give a scene dominated largely by the skylight (ambient occlusion) lighting, and that is more or less what I got. The image, while interesting in some ways, is far too bright!
Let’s drop the skylight down then:

"/usr/local/lib/vmd/tachyon_LINUX" -aasamples 4 -rescale_lights 0.3 -add_skylight 0.6 %s -format TARGA -o %s.tga

Well that darkened the shadows a bit, but the overall image is still way too bright. How about dropping the lights?

"/usr/local/lib/vmd/tachyon_LINUX" -aasamples 4 -rescale_lights 0.1 -add_skylight 0.6 %s -format TARGA -o %s.tga

Well, still far too light. What’s happening is that the depth cue fades the image toward the background color (in this case white) with increasing depth. Let’s drop the depth cue density in order to cut back on the lightening. This setting is found in Display–>Display Settings. I adjusted it to a value of 0.15, still using the Exp2 function for the density. When I rendered this (using the same settings as the last command above), it looked OK, but not fantastic. Mostly it was just “flat”, if that makes sense: not a lot of visual appeal. I rescaled the lights back up to 0.3, and this was better.
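Why does lowering the density help so much? Assuming VMD’s Exp2 mode follows the standard OpenGL GL_EXP2 fog equation, the fraction of an object’s own color that survives at depth z is f = exp(-(density·z)²), and the rest is blended toward the background. This small sketch just evaluates that formula; the depth values and the 0.4 comparison density are arbitrary, and the function name is my own.

```python
import math


def exp2_fog_factor(density, depth):
    """Fraction of an object's own colour kept at `depth`, assuming
    exp2 fog: 1.0 means no fading, 0.0 means fully background colour."""
    return math.exp(-(density * depth) ** 2)


# Compare an arbitrary higher density with the 0.15 used above.
for density in (0.4, 0.15):
    kept = [round(exp2_fog_factor(density, z), 2) for z in (1, 2, 4)]
    print(density, kept)
```

Because the exponent is squared, the fade stays gentle up close and then drops off quickly, and lowering the density pushes that drop-off much farther back into the scene.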

Something still isn’t “there”, though. To be sure, the tachyon renders look nice, but I just don’t feel like this is the best that can be done. I’ll have to keep toying with it.
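Since each render takes minutes, one way to keep toying with it is to script a small sweep over the two parameters I’ve been adjusting by hand. This sketch only builds the command lines, using the same Tachyon flags as the commands above; the scene file name scene.dat is a hypothetical stand-in for whatever VMD exports, and the output naming scheme is my own.

```python
import shlex
from itertools import product

# Path to the Tachyon binary, as used in the commands above.
TACHYON = "/usr/local/lib/vmd/tachyon_LINUX"


def tachyon_commands(scene_file, skylights, light_scales, aasamples=4):
    """Build one Tachyon command line per (skylight, light scale) pair."""
    commands = []
    for sky, scale in product(skylights, light_scales):
        # Encode the settings in the output name so renders are easy to compare.
        out = f"{scene_file}_sky{sky}_lights{scale}.tga"
        commands.append([
            TACHYON, "-aasamples", str(aasamples),
            "-rescale_lights", str(scale),
            "-add_skylight", str(sky),
            scene_file, "-format", "TARGA", "-o", out,
        ])
    return commands


for cmd in tachyon_commands("scene.dat", [0.9, 0.6], [0.3, 0.1]):
    print(shlex.join(cmd))
```

To actually render, each list can be passed to subprocess.run(cmd, check=True); at roughly 8 minutes per 700×700 image, this four-image sweep is about half an hour of laptop time.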