Saturday, April 9, 2011

Python critique, from a new fan

Sometimes I can be stubborn. From the early '90s on, my preferred scripting language was Perl. Need something parsed, munged, extracted, or CGI-ified? It was done in Perl. My regular expression skills rivaled those of Chuck Norris.

My employer, however, leans quite heavily towards Python. Most experienced programmers wind up knowing bits of many popular programming languages, but Python was one I avoided. Every time I looked at Python code written by others, it annoyed me and I found it a bit difficult to understand.

Now, after writing a few server programs in Python, I've become a fan. My preference for script writing is now Python, not Perl. Python code tends to be more "dense." I accomplish some tasks in fewer lines of code than in Perl (or C). This dense code tends to be of the "core algorithm" variety, with a lot of boilerplate argument and error checking elsewhere. After becoming experienced in Python, I find my own code easier to review, because I have fewer LOC to review for a particular task.

Python is also very JSON-friendly, making JSON output of Python data structures trivial. JSON has become the new XML on the web, because it is so compact and easy to read. Compared to Perl, Python needs no separate $, @, or % sigil to distinguish between a scalar value, an array (list), and a hash (dictionary, or key/value map). Python's handling of 'None' feels more consistent than Perl's undef behavior.
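
To show how little is involved, here is a minimal sketch using the standard library's json module. The record and its field names are made up for illustration:

```python
import json

# Hypothetical sample data, invented for this example.
record = {
    "name": "example",
    "tags": ["perl", "python"],
    "count": 3,
    "parent": None,
}

# sort_keys gives stable key order; None is written as JSON null.
text = json.dumps(record, sort_keys=True)
print(text)
# {"count": 3, "name": "example", "parent": null, "tags": ["perl", "python"]}

# Parsing the text gives back equivalent plain Python objects.
assert json.loads(text) == record
```

Dictionaries, lists, strings, numbers, and None all map directly onto JSON, so no conversion layer is needed for data like this.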

That said, I would like to highlight some core Python problems that I feel marginalize it in key areas, or give it "less than professional" behavior:

  1. Global Interpreter Lock (GIL). Modern multi-core, hyper-threaded processors are very thread efficient, and modern programming languages have followed suit. It is therefore inexcusable that CPython, the primary implementation of Python, remains constrained by a global lock that allows only one thread at a time to execute Python bytecode. From a technology standpoint, it is understandable: you are running a bytecode engine which in turn runs your program, unlike C or C++ where your program executes directly on the hardware. Understandable, but nonetheless a major blemish.
  2. Long tracebacks, at the drop of a hat. In my experience, a great many Python programs dump a long, user-unfriendly traceback when they encounter some error. Often this is intentional, as tracebacks provide helpful debugging information. But Python programmers, in my experience, often rely on this as their only method of reporting problems to the user, e.g. relying on Python's OSError traceback output to tell a user "file does not exist." Python programs can handle this gracefully, reporting a user-friendly error message, but many do not.
  3. Fragility in exception handling. Related to the previous item. While Python-the-language's exception handling seems well defined and predictable, my subjective experience is that many Python programs omit, intentionally or accidentally, handlers for exceptions that occur less frequently (but do occur in the field). Sometimes it is not obvious at all which exceptions may be raised by a line of code, and you can see an abundance of "oh, we need to handle this [exception] too" commits in git and svn repositories all over the world. Sure, this problem may occur in a C/C++ program, but Python just feels more susceptible to it than C/C++/Perl.
  4. Lacks an always-sorted data structure. The B-tree was invented in 1972, yet in 2011, Python still lacks a standard library key/value data structure with an always-sorted attribute. The closest Python comes is OrderedDict, which only remembers insertion order and so must be manually re-sorted in its entirety after new items are added, or heapq, whose in-order traversal is destructive.
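
Items 2 and 3 mostly come down to catching the failure you expect and saying something short about it. A minimal sketch, assuming Python 3 (where file errors from open() surface as OSError); read_config is a hypothetical helper name, not something from a real program:

```python
import sys


def read_config(path):
    """Return the file's text, or None after printing a one-line error."""
    try:
        with open(path) as handle:
            return handle.read()
    except OSError as err:
        # OSError covers missing files, permission problems, and similar.
        # err.strerror holds the system message, e.g. "No such file or
        # directory"; fall back to the exception itself if it is unset.
        reason = err.strerror or err
        sys.stderr.write("error: cannot read %s: %s\n" % (path, reason))
        return None
```

A caller can then check for None and exit with a non-zero status, so the user sees one readable line instead of a traceback. It handles only OSError on purpose: anything unexpected still produces a traceback, which is what you want for real bugs.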

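For item 4, the usual workaround is to build the structure yourself. The following is a rough sketch, not a B-tree: it pairs a dict with a key list kept in order by the standard bisect module, so each new key costs an O(n) list insertion. SortedKeyMap and the sample data are names invented for illustration:

```python
import bisect


class SortedKeyMap:
    """Toy key/value map that keeps its keys sorted at all times."""

    def __init__(self):
        self._keys = []    # always kept in sorted order
        self._values = {}  # key -> value

    def __setitem__(self, key, value):
        if key not in self._values:
            # Binary search for the slot, then an O(n) list insertion.
            bisect.insort(self._keys, key)
        self._values[key] = value

    def __getitem__(self, key):
        return self._values[key]

    def __delitem__(self, key):
        del self._values[key]  # raises KeyError if the key is absent
        index = bisect.bisect_left(self._keys, key)
        del self._keys[index]

    def items(self):
        """Yield (key, value) pairs in key order, without consuming them."""
        for key in self._keys:
            yield key, self._values[key]


ports = SortedKeyMap()
ports["ssh"] = 22
ports["http"] = 80
ports["dns"] = 53
print(list(ports.items()))
# [('dns', 53), ('http', 80), ('ssh', 22)]
```

Unlike heapq, traversal here leaves the data intact, and unlike OrderedDict, nothing has to be re-sorted after an insert. It is still something every program has to carry around itself, which is the complaint.
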
Regardless, Python provides enough of a productivity boost that it is my new favorite scripting language. And that says a lot, from a formerly self-styled Perl bigot.