Monday, August 11, 2014

Our Approach

Python is used to implement the bioinformatics algorithms that follow, simply because that was the language that my colleague had started with.  Python is a fine language for many reasons, not least of which is the fact that it seems to be easier for many beginners to learn, so there was no compelling reason to switch.

Python is an interpreted language, unlike a compiled language such as C, and many people equate that with being ‘slower’.  However, with good algorithm design it is possible to solve all of these bioinformatics challenges well within a 5-minute time limit.  The “traditional” implementation of Python (aka CPython) from www.python.org was used.  Everything was coded within the 2.7.x branch, not the 3.x branch.  A few add-on packages are used for some solutions, but they will be pointed out in the problems that use them.  PyPy was used occasionally for timing comparisons, but none of the solutions require it.  And finally, the solutions are not dependent on high-end development hardware.  My colleague’s somewhat older laptop (about 4 years old) was used for all of the problems and was able to successfully produce solutions within the specified time limits.  Specs for the laptop are: Dell Latitude E4310, Windows 7 64-bit, dual 2.40GHz processor, and 4GB RAM.  Not the worst system in the world but definitely not cutting edge.

This is not to say all of the algorithms are optimal and cannot be improved.  Generally, if an algorithm provided a solution within the parameters of the problem then it was left alone.

A quick soapbox
Now for a quick rant against Python.  Actually, this is not specific to Python; it applies to many other languages, such as R, that are traditionally implemented as interpreted languages.  And to be fair, the rant is against the ‘APPROACH’ that many people bring to programming projects in these languages, not the languages themselves.

Interpreted languages are also commonly referred to as ‘scripting’ languages, i.e. you write ‘scripts’, not ‘programs’.  Yes, this is purely semantics, BUT to many people, scripting implies an approach that requires much less rigor (or has no rigor at all).  The scripting mindset tends to be completely opposed to concepts such as planning, developing pseudo-code, and formal debugging.  This unfortunately does not just manifest itself in how the language is used (i.e. tweaking code to get it to work, or to work better, as opposed to debugging and profiling), but in how the work is performed.

Imagine Jeff Foxworthy as a computer science graduate: “You might be a scripter / hacker if:”
You think comments are for wimps and whitespace or indenting is sheer folly.
You think a solution CANNOT be optimal unless it fits on a single line.
You think variable names should be as uninformative as possible, preferably a single character.
You think randomly changing parts of code is the ideal way to debug or optimize.

Obviously changing the mindset of everyone so as to reach consensus on the best approach to programming is not a battle that can be won.  Nor is this to say that you CANNOT be a good programmer if you prefer the scripting / hacking approach.  HOWEVER, this approach to programming is NOT the most conducive to a teaching environment.  Therefore the solutions that follow attempt to describe the problem being solved, explain the method of developing the final algorithm, and use generally accepted coding practices.  If there is a decision to be made between clarity and compactness, the intent is to always choose clarity.
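
To make the clarity-versus-compactness trade-off concrete, here is a small sketch.  It is a hypothetical illustration and is not taken from the solutions that follow: the task (reverse complement of a DNA string) and the names COMPLEMENT, reverse_complement and rc are assumptions chosen only for this example.  Both versions return the same result; the first is written in the style these posts aim for, the second in the style they try to avoid.

```python
# Hypothetical example, not one of the actual solutions.
# Lookup table pairing each DNA base with its complement.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(dna):
    """Return the reverse complement of a DNA string made of A, C, G, T."""
    complemented_bases = []
    # Walk the strand from its last base to its first,
    # swapping each base for its complement along the way.
    for base in reversed(dna):
        complemented_bases.append(COMPLEMENT[base])
    return "".join(complemented_bases)


# The compact "scripter" version: same result, much harder to read.
rc = lambda s: "".join({"A": "T", "C": "G", "G": "C", "T": "A"}[c] for c in s[::-1])

print(reverse_complement("AAAACCCGGT"))  # prints ACCGGGTTTT
```

The longer version costs a few extra lines, but someone new to the problem can follow each step without having to unpack a nested expression.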


Besides, if you really don’t like this approach, you can always remove the comments and condense it down to one line afterwards :)
