Archive for June, 2008

A weekend of hiking and old games

Monday, June 30th, 2008

I spent the end of last week wrestling with Django and not getting very far. I’m trying to recode this site more or less from the ground up in order to implement a few features that I’d like. Unfortunately, I seem to progress rather quickly through the verbose error messages to the terse, impossible-to-decipher sort. After a few days of this I was in need of a nice break.

Fortunately the weekend came around, and it was off to the woods for another round of geocaching. We chose 4 sites in a state park near us and took off, finding 3 of the 4. We probably would have gotten all of them if we hadn’t bitten off a bit more than we could chew for the actual hiking portion; we ended up going something like 6 or 7 miles if the park map is to be believed.

After collapsing back at the house, I decided to install and play around with some old computer games. First I gave Space Quest 6 a shot, but that one seems to have a bug that makes a certain portion impassable. After that I installed Grim Fandango, a LucasArts puzzle game. It was a lot of fun (although frustrating as only these types of games can be) to play through some of that with Mrs. PA.

I also installed two demos of newer games from Steam: Overlord and Audiosurf. Overlord received some good reviews in the gaming press when it came out a while back, but the demo was sort of “meh”, and I think at $39.99 I’m going to pass. Audiosurf, on the other hand, was a lot of fun. This $10 game allows you to play sort of a musical Tetris, with your own audio files as the source material. You select one of your MP3s, and the game auto-generates a level based on the sonic profile. You then fly your ship through the level, collecting colored blocks in different lanes to make rows. It’s a bit hard to describe, but I highly recommend checking it out.

Oh, and I also got to watch Spain beat Germany in Euro 2008 :) It was a nice weekend! Now the week is upon us once again, so it’s off to the lab…

ChemSpider tantalizes me with the promise of article markup tools

Thursday, June 26th, 2008

From the ChemSpider blog:

We are adding our finishing touches to some markup tools for Open Access articles at present and they will unveil shortly.

Don’t leave me hanging! Are these automated tools that put XML all over the manuscript, making it facile to slice and dice the articles as we please? Or do the tools you’ve developed do most of the slicing and dicing themselves?

Regardless, I’ll be keeping my eye on further developments.

More fun & games with ElementTree

Wednesday, June 25th, 2008

I’m still deeply in love with ElementTree. Here’s a script that will give you all the lines of a given player. Note that the import line has changed, since on this machine I’m using Python 2.4, where ElementTree is a separate install rather than part of the standard library:

#!/usr/bin/python

# Initialization
import elementtree.ElementTree as ET
import os, string

# Read in our XML file
infile = raw_input("Input XML file name > ")
xmltree = ET.ElementTree(file=infile)
rootelem = xmltree.getroot()

speaker_find = raw_input("Which player's lines do you wish? > ")

act_list = rootelem.findall('ACT')
for act in act_list:
	scene_list = act.findall('SCENE')
	for scene in scene_list:
		speech_list = scene.findall('SPEECH')
		for speech in speech_list:
			speaker_list = speech.findall('SPEAKER')
			for speaker in speaker_list:
				if speaker.text == speaker_find.upper():
					print speaker.text
					lines = speech.findall('LINE')
					for line in lines:
						print line.text
					print ''

and the output:

Input XML file name > hamlet.xml
Which player's lines do you wish? > osric
OSRIC
Your lordship is right welcome back to Denmark.

OSRIC
Sweet lord, if your lordship were at leisure, I
should impart a thing to you from his majesty.

OSRIC
I thank your lordship, it is very hot.

OSRIC
It is indifferent cold, my lord, indeed.

OSRIC
Exceedingly, my lord; it is very sultry,--as
'twere,--I cannot tell how. But, my lord, his
majesty bade me signify to you that he has laid a
great wager on your head: sir, this is the matter,--

OSRIC
Nay, good my lord; for mine ease, in good faith.
Sir, here is newly come to court Laertes; believe
me, an absolute gentleman, full of most excellent
differences, of very soft society and great showing:
indeed, to speak feelingly of him, he is the card or
calendar of gentry, for you shall find in him the
continent of what part a gentleman would see.

OSRIC
Your lordship speaks most infallibly of him.

OSRIC
Sir?

OSRIC
Of Laertes?

OSRIC
I know you are not ignorant--

OSRIC
You are not ignorant of what excellence Laertes is--

OSRIC
I mean, sir, for his weapon; but in the imputation
laid on him by them, in his meed he's unfellowed.

OSRIC
Rapier and dagger.

OSRIC
The king, sir, hath wagered with him six Barbary
horses: against the which he has imponed, as I take
it, six French rapiers and poniards, with their
assigns, as girdle, hangers, and so: three of the
carriages, in faith, are very dear to fancy, very
responsive to the hilts, most delicate carriages,
and of very liberal conceit.

OSRIC
The carriages, sir, are the hangers.

OSRIC
The king, sir, hath laid, that in a dozen passes
between yourself and him, he shall not exceed you
three hits: he hath laid on twelve for nine; and it
would come to immediate trial, if your lordship
would vouchsafe the answer.

OSRIC
I mean, my lord, the opposition of your person in trial.

OSRIC
Shall I re-deliver you e'en so?

OSRIC
I commend my duty to your lordship.

OSRIC
Ay, my good lord.

OSRIC
A hit, a very palpable hit.

OSRIC
Nothing, neither way.

OSRIC
Look to the queen there, ho!

OSRIC
How is't, Laertes?

OSRIC
Young Fortinbras, with conquest come from Poland,
To the ambassadors of England gives
This warlike volley.

It’s pretty great… 25 lines of code, and I’m still doing things in sort of a long way. For instance, I’ve already figured out how to rewrite the counting script from my previous post using the getiterator function:

#!/usr/bin/python

# Initialization
import elementtree.ElementTree as ET
import os, string

# Read in our XML file
infile = raw_input("Input XML file name > ")
xmltree = ET.ElementTree(file=infile)
rootelem = xmltree.getroot()

speaker_find = raw_input("Which player's lines do you wish? > ")

i = 0
speaker_list = xmltree.getiterator('SPEAKER')
for speaker in speaker_list:
	if speaker.text == speaker_find.upper():
		i = i + 1
print i

This steps through all the levels of the XML document for us, so there isn’t any need to do that manually. It’s facile to rewrite the “Player’s Lines” script similarly:

#!/usr/bin/python

# Initialization
import elementtree.ElementTree as ET
import os, string

# Read in our XML file
infile = raw_input("Input XML file name > ")
xmltree = ET.ElementTree(file=infile)
rootelem = xmltree.getroot()

speaker_find = raw_input("Which player's lines do you wish? > ")

speeches = xmltree.getiterator('SPEECH')
for speech in speeches:
	# A speech may list more than one SPEAKER, so check all of them
	speakers = [s.text for s in speech.findall('SPEAKER')]
	lines = speech.findall('LINE')
	if speaker_find.upper() in speakers:
		print speaker_find.upper()
		for line in lines:
			print line.text
		print ''

So far I’m still learning a lot with every poke at ElementTree. I’m sure I’ll be adding more as I go.
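For anyone following along on a newer interpreter, here is a minimal sketch of the same idea in Python 3, where getiterator() was removed (in Python 3.9) in favor of iter(). The helper name speeches_by and the inline SAMPLE document are hypothetical stand-ins for illustration only; the HAMLET line is a placeholder, not text from the play.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature of hamlet.xml, using the same tag names as the scripts above
SAMPLE = """<PLAY><ACT><SCENE>
<SPEECH><SPEAKER>OSRIC</SPEAKER><LINE>Sir?</LINE></SPEECH>
<SPEECH><SPEAKER>HAMLET</SPEAKER><LINE>placeholder line</LINE></SPEECH>
<SPEECH><SPEAKER>OSRIC</SPEAKER><LINE>Of Laertes?</LINE></SPEECH>
</SCENE></ACT></PLAY>"""

def speeches_by(root, name):
    """Return one list of line texts for each SPEECH spoken by name."""
    wanted = name.upper()
    result = []
    # iter() walks the whole tree, as getiterator() does in the scripts above
    for speech in root.iter("SPEECH"):
        speakers = [s.text for s in speech.findall("SPEAKER")]
        if wanted in speakers:
            result.append([line.text for line in speech.findall("LINE")])
    return result

for speech_lines in speeches_by(ET.fromstring(SAMPLE), "osric"):
    print("OSRIC")
    for text in speech_lines:
        print(text)
    print()
```

Returning the lines as lists rather than printing inside the loop is just a design choice that makes the result easier to reuse or check.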

Hope you find this sort of thing interesting.

Using Python to parse XML is easier than it should be

Tuesday, June 24th, 2008

A few months back when I was just starting to poke around with Python, I saw this XKCD comic come through my RSS feed (my apologies if this clashes with the right hand sidebar; maximizing your window might help):
[Image: XKCD comic, “import soul”]
At the time, I thought it was sort of funny, more for the complete nerdiness of creating a pet from an Eee PC and a hamster ball than anything else. The kicker at the end about importing a soul was just icing.

I bring this up because in preparation for the Elsevier Article 2.0 Challenge coming up in September, I wanted to start spending more time learning how to handle XML files. Since Python has become my language of choice (ok, full honesty - it’s the only language I can speak at all really, and even then only in primitive grunts), I wanted to see how hard it would be to work up an XML parser. It’s really easy. You just have to import it.

import xml.etree.ElementTree as ET

I wrote a very very simple and short script just to make sure that it was as easy as I thought it was, and sure enough this is the case.

xmlparser.py

# Initialization
import xml.etree.ElementTree as ET

# Read in our XML file
infile = raw_input("Input XML file name > ")
xmltree = ET.ElementTree(file=infile)
rootelem = xmltree.getroot()

print "This should be root_element"
print rootelem.tag

print "This should print two subelement tags"
for subelement in rootelem:
	print subelement.tag

print "This should print out the content of the sub elements"
for subelement in rootelem:
	print subelement.text

And I used a self-generated test file, test.xml:

<root_element>
	<sub_element>This is a sub element</sub_element>
	<sub_element id="2">This is a sub element with the ID set to "2"</sub_element>
</root_element>

and the output pretty much matches what you would guess:

Input XML file name > test.xml
This should be root_element
root_element
This should print two subelement tags
sub_element
sub_element
This should print out the content of the sub elements
This is a sub element
This is a sub element with the ID set to "2"
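The second sub element in test.xml carries an id attribute that the script above never reads. A small sketch of how that could be pulled out, written for Python 3 (an assumption; the script above is Python 2) and using a hypothetical helper name, describe_subelements:

```python
import xml.etree.ElementTree as ET

# Same content as the test.xml file above, inlined so the sketch is self-contained
TEST_XML = """<root_element>
	<sub_element>This is a sub element</sub_element>
	<sub_element id="2">This is a sub element with the ID set to "2"</sub_element>
</root_element>"""

def describe_subelements(xml_text):
    """Return a (tag, id, text) tuple for each child of the root element."""
    root = ET.fromstring(xml_text)
    # Element.get() returns None when the attribute is missing
    return [(child.tag, child.get("id"), child.text) for child in root]

for tag, elem_id, text in describe_subelements(TEST_XML):
    print(tag, elem_id, text)
```

The first sub element has no id, so its middle value comes back as None rather than raising an error.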

This took all of about 10 minutes to do… I’m still sort of stunned. I’m sure the programmers/Python jockeys are laughing right now, but c’est la vie I suppose.

I mean, it’s really almost frighteningly simple.  Let’s try playing with Hamlet, available online in XML format of course.  We can write a quick script to count how often Rosencrantz speaks:

#!/usr/bin/python

# Initialization
import xml.etree.ElementTree as ET

# Read in our XML file
infile = raw_input("Input XML file name > ")
xmltree = ET.ElementTree(file=infile)
rootelem = xmltree.getroot()

i = 0
act_list = rootelem.findall('ACT')
for act in act_list:
	scene_list = act.findall('SCENE')
	for scene in scene_list:
		speech_list = scene.findall('SPEECH')
		for speech in speech_list:
			speaker_list = speech.findall('SPEAKER')
			for speaker in speaker_list:
				if speaker.text == "ROSENCRANTZ":
					i = i + 1

print i

It’s 49, in case you are wondering.
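The same tally can probably be made for every speaker at once. Below is a rough sketch in Python 3 (an assumption; the script above is Python 2) that uses collections.Counter and ElementTree's './/SPEAKER' path to skip the nested loops. The function name count_speeches and the tiny SAMPLE document are hypothetical; the line texts are placeholders, not quotes from the play.

```python
from collections import Counter
import xml.etree.ElementTree as ET

# Hypothetical stand-in for hamlet.xml, with the same tag names as the script above
SAMPLE = """<PLAY>
  <ACT>
    <SCENE>
      <SPEECH><SPEAKER>ROSENCRANTZ</SPEAKER><LINE>first</LINE></SPEECH>
      <SPEECH><SPEAKER>HAMLET</SPEAKER><LINE>second</LINE></SPEECH>
      <SPEECH><SPEAKER>ROSENCRANTZ</SPEAKER><LINE>third</LINE></SPEECH>
    </SCENE>
  </ACT>
</PLAY>"""

def count_speeches(root):
    """Tally SPEAKER elements at any depth below root, keyed by speaker name."""
    # './/SPEAKER' matches SPEAKER elements at every level below root
    return Counter(speaker.text for speaker in root.findall(".//SPEAKER"))

counts = count_speeches(ET.fromstring(SAMPLE))
print(counts["ROSENCRANTZ"])
```

A Counter returns 0 for names it has never seen, so asking about a player who never speaks does not raise an error.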

I’m pretty excited about my experimentation with ElementTree so far. As usual I’ve got a ton to learn, but it’s great to know that this powerful tool was lurking inside of Python the whole time.

All right, this is getting a little scary

Monday, June 23rd, 2008

There is a date on the calendar I can point to.  As of this writing, on that day:

  • We won’t have a place to live (lease runs out on our house)
  • Which is sort of good, because our household income will consist of a single grad student stipend

Needless to say, since that date is less than 3 months away, I’m a little nervous these days.

Mrs. PA and I are going into major job hunt mode.  She’s got a few candidates kicking around, but unfortunately I have work to do on getting my grad school situation sorted.  I’ve got a few things out there, but I think it’s (past) time for “the talk” with my adviser, and also a redoubling of my own hunt for a position.

More later.

The #1 article on Reddit at the time of this writing is calling out for some OA commenting love

Friday, June 20th, 2008

06/20/08 11:51AM EST Reddit #1 Story

In case you can’t read it, the title says “Who else is sick of sites hosting research papers that show all their content to Google so it gets indexed, but when people visit, they want you to pay exorbitant fees?”

Link

Elsevier Article 2.0 Contest starts September 1

Friday, June 20th, 2008

Over at The Life Scientists room on FriendFeed (thanks to Bill for introducing me to the site), Pierre has posted a link to the Elsevier Article 2.0 Competition.  The details are scarce at the moment, but enough to pique my interest:

We will provide contestants with access to approximately 7,500 full-text XML scientific articles (including images) and challenge each contestant to be the publisher. In other words, each contestant will have complete freedom for how they would like to present the scientific research articles contained in the Article 2.0 dataset.

And there are prizes to be won as well:

First Prize:  	$2500
Second Prize: 	$1000
Third Prize: 	$500

I’m really hoping that I’ll be able to put together an entry. I’m not as familiar with XQuery as I should be, but I’ve got a couple of months to learn I suppose. I’ve already started brainstorming a pile of ideas of course. It’s what I do best.

I’m not sure if the rules will allow for teams, but I think it would be really great if some of the Open Access proponents could work together to generate a really fine product which relies on the many OA resources that are available on the internet. It’s a prime opportunity to demonstrate the value added by opening up the articles in this way.

PlausibleAccuracy does not plan to charge for quoting at this time

Thursday, June 19th, 2008

There has been a lot of heat on the net over the past few days as the Associated Press went after a blog for posting “blockquotes” and has declared that they will levy fees against anyone who quotes more than 5 words of one of their articles.  The predominant opinion seems to be that quoting an AP article falls under the terms of Fair Use, and that it’s ludicrous for them to charge for the privilege.  The AP disagrees:

AP considers taking the headline and lede of a story without a proper license to be an infringement of its copyrights

As I’m sure you can guess, I fall in with the crowd that believes the AP is just being ridiculous. First of all, I’m not sure that the headline of an article is a “Creative Work” (can I call this the Wilbanks Postulate?) Take this headline, for instance (for which I’m likely to be fined I suppose, as it’s over 5 words):

Obama opts out of public campaign finance system

Copyright 2008 Associated Press

This is a statement of fact, not a truly creative work.  I fail to see how one can copyright this and have a legal leg to stand on.

Secondly, a short excerpt clearly falls within the realm of Fair Use, especially given the link back to the article itself.

The problem is that the AP spends a lot of money writing their articles in the first place, and deserves to get some return on them of course. If a blog comes along and copies the content wholesale, then this is a problem that needs to be addressed. From my point of view, this looks like an “old media” company backlash against “new media”; an attempt to subvert and stretch copyright law past the breaking point. They’ve worsened the situation by propping up a shill who claims to negotiate for all bloggers. I hope that the AP comes to their senses and realizes that cracking down on spam blogs that are scraping their content is a completely different matter from squashing Fair Use and open discussion of world events.

Happenings in the Digital Wonderland - Reddit goes Open Source and Spore Creature Creator arrives

Wednesday, June 18th, 2008

Two items not related to Open Access (at least in a scientific sense) today that I think you’ll enjoy.

One of the largest “social media” sites (and my personal favorite), Reddit, today announced that they would make about 95% of their code open source, including the algorithm which determines the stories that make the front page. They’ve got a Trac site set up at code.reddit.com interfaced with a Git version control system. The code appears to be largely built on Python. I could definitely see this being leveraged to create some interesting science-related applications.

The second item is completely for fun, although xenobiologists might enjoy it as well. Spore is an upcoming game from Will Wright, the creator of The Sims, and it looks like it’s going to be great. The player starts out as a single-celled organism in a pool and has to evolve up to a spacefaring civilization. The game has been in development for a long time, to the point that some people thought it might end up as vaporware. Now, however, it has a September release date. Even though the full game isn’t out yet, the Creature Creator portion was released yesterday, both as a free trial and (soon?) a $10 full version. At first I thought it was weird to release the character creation part of a game as a standalone product, until I got to play with it. It’s really amazing, but also hard to describe.

When you create a new creature, you start with a randomly shaped blob, which is the creature’s body.  You can move it and reshape it, then start adding parts to make your critter.  The parts you add and the conformations you place them in all influence how the creature acts and moves.  I’m really shocked at how seamless and simple it is.  Once you’re happy with your creation, you can easily upload it to the Sporepedia, a conglomeration of everyone’s aliens.  You can also make a video and send it to YouTube with a few clicks.  The quality of the YouTube videos could be better; I feel like there is a compression issue to work out. Nevertheless, here is one I made:

I can’t really describe how much fun it is to play, so you’ll just have to check it out yourself. If you make any creature videos, link them up in the comments :)

100th post retrospective

Tuesday, June 17th, 2008

According to my WordPress dashboard, this will be the 100th post here at PA. I’m fairly pleased with the response that the blog has gotten so far, and I feel like my decision to keep the focus tightly on Open Access was a good one.

I have to admit feeling a bit like I’m both preaching to the choir and shouting into the wind at the same time.  Most of the comments come either from a dedicated core (awesome) or others in the OA movement who I presume are getting pings from Google hits (also great).  At some point though, I’d like to start hooking into that next layer; people who might be interested in doing more things with OA but don’t know how or are unfamiliar with all of the great options.  We can sit here all day and pat one another on the back for a point well made, but it’s the public-facing products and discussions that will really make a difference.  By this I mean the things that you’re already familiar with; sites like OpenWetWare, journals like PLoS, public talks by proponents of OA, etc.

You might say I’m interested in expanding PA’s deliverables.

At the moment I’m sort of investigating some of the projects that are already out there.  To be honest, I do feel like the OA movement is in some ways disjointed, in that there are many projects which are being maintained by small groups and have relatively narrow impact.  This isn’t necessarily bad - this is the sort of “grass roots” community-based enthusiasm that many causes would love to have.  I’m more interested, however, in applying whatever minimal force I can bring to bear in a way that has more widespread effects.  I’m not sure what this is yet, but I’m thinking hard about it.

Once again, thanks for reading.  It still amazes me that one person pecking away at a laptop keyboard can engage in these discussions in a (hopefully) productive way.  I look forward to continuing the conversation with all of you.