This is a brief post regarding why I decided to remove
Bio::FeatureIO from the main core distribution. The fact that it has taken this long for someone to notice is interesting.
tl;dr: In BioPerl, I would be wary of anything using
Bio::SeqFeature::Annotated, or relying on having that particular module’s functionality. It’s considered deprecated; these really need to be converted to use
A bit of back story
In BioPerl, the
Bio::FeatureIO modules were centered around features, in particular GFF3 (and I believe Chado a bit), using
Bio::SeqFeature::Annotated. The idea was that
Bio::SeqFeature::Annotated would have pretty much everything type-checked. The idea is sound, but in this implementation everything was converted into different annotation objects (score, primary tag, all tag information, etc.), and all features were checked against the Sequence Ontology (SO).
I believe this comes from some conflation about how data is stored in a feature: is it a simple key-value pair, or is it annotation? The idea here was to have this all converted to annotation, but checked against SO when needed using
Now, because this completely breaks the
SeqFeature API (e.g.
$sf->score would return a Bio::Annotation::SimpleValue), the class overloaded all its accessors to print strings. This overloading was implemented in all
Bio::Annotation and many other modules (
Bio::Ontology IIRC was also involved). Additional code was then written to rely on
This of course led to all sorts of hard-to-debug problems (
Bio::Annotation was never meant to be overloaded, and lots of modules expecting objects got strings instead), not to mention performance issues.
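To make the failure mode concrete, here is a minimal sketch in plain Perl using only the core overload pragma. My::SimpleValue and My::Feature are hypothetical stand-ins made up for illustration; they are not the real Bio::Annotation::SimpleValue or Bio::SeqFeature::Annotated code, which did considerably more.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an annotation object such as
# Bio::Annotation::SimpleValue. Not the real BioPerl class.
package My::SimpleValue;

use overload
    '""'     => sub { $_[0]->{value} },   # stringify to the wrapped value
    fallback => 1;                         # let eq, ==, etc. use the string form

sub new {
    my ($class, %args) = @_;
    return bless { value => $args{value} }, $class;
}

sub value { $_[0]->{value} }

# Hypothetical feature whose score accessor returns an object
# instead of the plain number that callers expect.
package My::Feature;

sub new {
    my ($class, %args) = @_;
    my $self = bless {}, $class;
    $self->{score} = My::SimpleValue->new(value => $args{score});
    return $self;
}

sub score { $_[0]->{score} }

package main;

my $sf    = My::Feature->new(score => 42);
my $score = $sf->score;

# Thanks to the overload, the object prints like a plain value...
print "score: $score\n";

# ...but it is still a blessed reference underneath.
print "score is an object of class ", ref($score), "\n" if ref $score;

# Interpolating it produces an ordinary string, so the object
# (and its methods) are silently lost from that point on.
my $copy = "$score";
print "copy is a plain string\n" unless ref $copy;
```

The object prints like a plain value, so casual code keeps working, but anything downstream that checks ref, or calls a method on what it receives, behaves differently depending on whether it got the object or its stringified copy.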
Sadly enough, at the same time additional code was written that required this behavior. Then… the original developer basically quit working on the code, leaving an unfinished set of modules deeply integrated into BioPerl, changing behavior of several other core bits, and also having other code (Chado, and additional parser modules) reliant on them.
This ended up being the main blocker for a v1.6 ‘stable’ release. There was no way forward with the current implementation, and no replies from the developer in question to address the problems, so I had to rip that stuff out of core to basically put us back on the road to a new release.
My plan is to release
Bio::FeatureIO as a separate distribution on CPAN, as it was originally written. This is most of the way there; I will likely release this in the next few days (it will have its own version number, v1.7.0). However,
Bio::SeqFeature::Annotated is deprecated, and any code reliant on it should be migrated to another module. This was announced prior to the 1.6 release and reiterated on the mailing list a few times.
Bio::FeatureIO is also being rewritten with a simpler seqfeature class in mind, possibly using Rob Buels’
Bio::GFF3 or Barry Moore’s GAL tools.
Just so this is clear, I have no hard feelings today towards the developer involved here. The idea in general made sense, even if the implementation in practice was problematic. I think it’s a great idea to try messing with core code (and I applaud them for trying, just as much as I smack my forehead that this wasn’t done on a branch).
What this has done, for me at least, is push me to promote discussing major code changes on the mailing list and implementing them on a branch. Strangely, no one ever seemed to work off a branch in BioPerl; maybe this has to do with the nature of the code, or that the project used Subversion at the time, or simply that no one understood how branches worked. Thankfully, we switched to git and that has made a world of difference :)
Bio::FeatureIO v. 1.6.902 is now on CPAN.