This is the second of several articles describing my work on a Semantic Web project that I hope will be helpful to Gentoo developers and users. I use Gentoo as the example distribution in this article, but creating doapfiend plugins for other distributions is trivial.
In the first article in this series I described DOAP, a vocabulary for describing Open Source project metadata, and sketched some imaginary tools that could be built using Semantic Web technology. Since then I’ve created a tool that does much of what my imaginary tools promised.

Introducing Doapfiend
DOAP can describe a project’s name, homepage, description, bug tracker URL, file release URLs and changelogs, screenshot URLs, VCS URLs (svn, cvs, bzr, mercurial etc.), wiki URLs, programming languages and licenses, as well as the names and email addresses of developers, documenters, translators and more.
Doapfiend is a command-line client and Python library that displays or serializes DOAP in several formats. You can also search the Semantic Web with relative ease and actually do stuff with that metadata.
Doapfiend is entirely plugin based. Some of the plugins available let you search for DOAP using a Gentoo package name, a SourceForge, Freshmeat, Ohloh or Python Package Index project name, or a project’s homepage.
You can simply display DOAP in human-readable format or serialize it in various formats. A very basic plugin takes DOAP and creates a skeleton ebuild or a webpage with HTML and CSS.
The simplest usage example, and quite boring:
doapfiend -u /path/to/somedoap.rdf
or
doapfiend -u http://example.com/some.rdf
This will display the project metadata in human readable format, nicely formatted. It’s not terribly exciting, but you get concise information quickly.
If you only want some metadata about the project, use the --fields option. Say you want to find out the project’s Subversion location URL and the homepage URL:
doapfiend -u some-project.rdf --fields svn.location,homepage
That’s a little more exciting, but what if you don’t know what kind of version control system they’re using? What if you don’t know where their DOAP file is on the web, or even the project’s homepage, but you do know the name your package manager uses for the project?
doapfiend --gentoo dev-python/doapfiend --vcs checkout
This looks in the portage tree for the doapfiend ebuild, gets the homepage, searches the Semantic Web for DOAP with that project homepage, fetches it and gets the repository. The VCS plugin determines the project uses Subversion and sends the ‘checkout’ command to the repository using the URL it found.
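To make the last step of that pipeline concrete, here is a minimal sketch of how a VCS plugin might work out which checkout command to run. It is not Doapfiend's actual code: the function names, the command table and the sample DOAP (with placeholder example.com URLs) are assumptions made for illustration. It also reads only one common RDF/XML layout using the standard library, whereas a real tool would use an RDF parser such as rdflib.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DOAP = "http://usefulinc.com/ns/doap#"

# Hypothetical sample DOAP; the project and URLs are placeholders.
SAMPLE_DOAP = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://usefulinc.com/ns/doap#">
  <Project>
    <name>example</name>
    <homepage rdf:resource="http://example.com/"/>
    <repository>
      <SVNRepository>
        <location rdf:resource="http://example.com/svn/trunk"/>
      </SVNRepository>
    </repository>
  </Project>
</rdf:RDF>
"""

# DOAP repository class -> command used to fetch the source (assumed mapping)
CHECKOUT_COMMANDS = {
    "SVNRepository": ["svn", "checkout"],
    "GitRepository": ["git", "clone"],
    "HgRepository": ["hg", "clone"],
    "BazaarBranch": ["bzr", "branch"],
}


def find_repository(doap_xml):
    """Return (repository class name, location URL), or None if not found."""
    root = ET.fromstring(doap_xml)
    for repo in root.iter("{%s}repository" % DOAP):
        for node in repo:
            kind = node.tag.split("}", 1)[-1]  # strip the namespace part
            location = node.find("{%s}location" % DOAP)
            if kind in CHECKOUT_COMMANDS and location is not None:
                return kind, location.get("{%s}resource" % RDF)
    return None


def checkout_command(doap_xml):
    """Build the argument list for the checkout, without running it."""
    found = find_repository(doap_xml)
    if found is None:
        return None
    kind, url = found
    return CHECKOUT_COMMANDS[kind] + [url]


print(checkout_command(SAMPLE_DOAP))
```

Returning the command as a list rather than running it keeps the sketch side-effect free; the list could be handed to subprocess once you are happy with what it found.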
When you create an ebuild, rpm, deb file etc., you need the basic project metadata. I’ve written a very basic plugin that generates a Gentoo ebuild. Say you know the SourceForge name (or the Ohloh or Freshmeat name etc.):
doapfiend --sf project_name --ebuild
This prints an ebuild to stdout, nothing too fancy, but it only took about 30 minutes to write the plugin. A more sophisticated plugin would start by showing you file releases and letting you choose, naming the ebuild accordingly etc. We could also determine the programming language from the DOAP and if we have a more suitable ebuild generator, like g-cpan for Perl or g-pypi for Python, call those.
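The idea behind such a generator fits in a few lines. The following is a rough sketch, not the plugin's real source: the dictionary keys, the EAPI and KEYWORDS defaults and the helper names are all assumptions. It also assumes the license has already been mapped to a Gentoo license name, since DOAP usually gives the license as a URI.

```python
EBUILD_TEMPLATE = """\
# Skeleton generated from DOAP metadata. Review every value before use.
EAPI={eapi}

DESCRIPTION="{description}"
HOMEPAGE="{homepage}"
SRC_URI="{src_uri}"

LICENSE="{license}"
SLOT="0"
KEYWORDS="{keywords}"
"""


def escape_for_double_quotes(text):
    """Escape characters that are special inside a double-quoted bash string."""
    for char in ("\\", '"', "$", "`"):  # backslash must be handled first
        text = text.replace(char, "\\" + char)
    return text


def make_ebuild(meta, eapi="8", keywords="~amd64 ~x86"):
    """Fill the template from a plain dict of DOAP-like values (assumed keys)."""
    return EBUILD_TEMPLATE.format(
        eapi=eapi,
        description=escape_for_double_quotes(meta.get("shortdesc", "")),
        homepage=escape_for_double_quotes(meta.get("homepage", "")),
        src_uri=escape_for_double_quotes(meta.get("download", "")),
        license=escape_for_double_quotes(meta.get("license", "")),
        keywords=keywords,
    )


# Hypothetical metadata, as it might look after being pulled out of DOAP.
sample = {
    "shortdesc": 'A "demo" tool',
    "homepage": "http://example.com/",
    "download": "http://example.com/demo-1.0.tar.gz",
    "license": "GPL-2",
}

print(make_ebuild(sample))
```

Escaping the values matters because descriptions scraped from the web can contain quotes or dollar signs that would otherwise break the generated file.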
Doapfiend isn’t strictly limited to DOAP files. You can throw any RDF file at Doapfiend and it will try to do something with it. If you have a FOAF (Friend of a Friend) file that lists the person’s Open Source projects, Doapfiend will print all of those projects’ homepages. Add the -f switch and it will search for DOAP for each project and display all the metadata.
Doapfiend API - Don’t Panic
Doapfiend contains a library with a simple API designed to be easy to use for coders who have little or no RDF experience. It’s based on RDFAlchemy, an ORM which uses rdflib. If you’re familiar with SQLAlchemy, you’re all set. The RDFAlchemy API was created to let you create code that uses SQLAlchemy or RDFAlchemy with little to no code changes. If you’re an RDF guru you can drop down to rdflib and access triples after using Doapfiend’s API to search the Semantic Web if you prefer.
If all that means nothing but you know a little Python, here’s how you’d fetch metadata for a project with a SourceForge name of ‘nut’.
from doapfiend.doaplib import get_by_pkg_index
# 'sf' selects the SourceForge index; 'nut' is the project name there
print(get_by_pkg_index('sf', 'nut'))
That will print out all of the project’s metadata in plain text, but say you just want a few pieces of information, using the Freshmeat project name:
from doapfiend.doaplib import get_by_pkg_index, load_graph
# Fetch the DOAP by Freshmeat ('fm') project name and load it into an object
doap = load_graph(get_by_pkg_index('fm', 'nut'))
print(doap.name)     # nut
print(doap.created)  # 2008-04-19
So there you have a taste of what you can do with DOAP today. Of course you’re wondering how much DOAP is out there, who’s creating it and how you can create it for your own projects.
Where Does DOAP Come From? Who’s Using It?
Doapfiend uses doapspace.org to search for most DOAP. I get new and updated DOAP URLs from PingTheSemanticWeb.com and re-spider them daily. The Ohloh plugin uses the RDFOhloh website. I started work on doapspace.org last year and have been spidering DOAP, creating DOAP by scraping HTML from SourceForge and other package indexes, using metadata from FLOSSMole, and importing and converting Freshmeat’s publicly available data. The Python Package Index provides DOAP for every project listed. All this DOAP is made freely available on doapspace.org.
Today I have approximately 54,000 DOAP files hosted on doapspace.org. That isn’t DOAP for 54,000 different projects; there are duplicates, because it’s common to have metadata for a single project from SourceForge, Ohloh, Freshmeat and PyPI, for instance. I’ve monitored the last 17,000 SourceForge project releases and created DOAP for each. I’m about 99% happy with my SourceForge spider. When it’s ready I’ll spider all of SourceForge and keep that metadata up to date ‘in real time’.
When I started doapspace.org, having all that duplicated metadata was worrisome. I was trying to figure out which data to serve up. Ohloh doesn’t provide file release info, but Freshmeat does. Freshmeat only lists current releases, which is handy, while SourceForge has all the file releases. Was I going to have to figure out what data the client wants and then serve up the ‘best’ RDF file?
That was before I realised how flexible RDF is and how easy it’s going to be to aggregate all that metadata about a single project into a single graph. I’m not there yet, but I’m getting there.
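The reason aggregation is easy is that an RDF graph is just a set of (subject, predicate, object) triples, so merging graphs amounts to a set union in which duplicate statements collapse. Here is a toy sketch of that idea in plain Python. It makes several assumptions: the data is invented, the prefixed names stand in for full URIs, and both sources are assumed to already use the same URI for the project. In practice the identifiers would first have to be reconciled, for example by matching on homepage.

```python
# Hypothetical project identifier shared by both sources.
PROJECT = "http://example.com/project/example"

# Invented triples, as they might come from two different package indexes.
from_sourceforge = {
    (PROJECT, "doap:name", "example"),
    (PROJECT, "doap:homepage", "http://example.com/"),
    (PROJECT, "doap:download-page", "http://example.com/files/"),
}
from_ohloh = {
    (PROJECT, "doap:name", "example"),
    (PROJECT, "doap:homepage", "http://example.com/"),
    (PROJECT, "doap:programming-language", "Python"),
}


def merge_graphs(*graphs):
    """Union any number of triple sets; identical triples are kept once."""
    merged = set()
    for graph in graphs:
        merged |= set(graph)
    return merged


def describe(graph, subject):
    """Group everything the graph says about one subject by predicate."""
    properties = {}
    for s, p, o in graph:
        if s == subject:
            properties.setdefault(p, set()).add(o)
    return properties


merged = merge_graphs(from_sourceforge, from_ohloh)
for predicate, values in sorted(describe(merged, PROJECT).items()):
    print(predicate, sorted(values))
```

Each source contributes what it knows, and nothing has to be chosen as the ‘best’ file: the merged graph holds the download page from one source and the programming language from the other.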
In My Next Article in the Series…
In my next article I’ll show you how to create DOAP, put it on the web and get it spidered by a Semantic Web crawler. You’ll learn how to add a few lines to a FOAF file for each project you’re involved with. I’ll also discuss some other vocabularies that tie in with DOAP such as SIOC and BAETLE, the Bug And Enhancement Tracking LanguagE, which will allow a semantic interface to existing bug trackers.
This series of articles will also explore using FOAF and DOAP to make Gentoo’s metadata more easily available to developers and users. For instance, our LDAP information is only accessible to developers. How about taking that info and creating a FOAF file for every developer in their dev.gentoo.org/~user account? We could automatically add DOAP for the projects or herds they’re involved with, then let them edit from there to add as much personal information as they’d like. Take a look at SPARQLbot, an IRC bot (#sparqlbot on Freenode), to see where I’m headed with this.