Monday, January 28, 2013

Dynamic Constants. Of the automatic kind

A quickie one -- sometimes we need named constants -
just like "True" and "False" are in Python - and we'd like them to behave
like named constants, not just like variables pointing to an int.

So, if the constant you want has an integer value, something that behaves like
Python's True and False is easily achievable with:


class Constant(int):
    def __new__(cls, name, value=0):
        # the numeric value is fixed at creation time, like for any int
        return int.__new__(cls, value)
    def __init__(self, name, value):
        self.name = name
    def __repr__(self):
        return self.name
    __str__ = __repr__

And there you are:
>>> RED = Constant("RED", 0)
>>> RED
RED
>>> RED + 1
1
>>> print RED
RED
Then... we stumble on the DRY principle. The constant is nice, but the call that creates it requires its name as a string, and we then have to bind the resulting object to a variable of the same name. It is a problem faced by "collections.namedtuple" and by ORMs all around the Galaxy.

We could make the code in the Constant class inject the constant's name into the caller's frame, yes. But that is ugly: it kills the "explicit is better than implicit" principle, and it would break static code analysis tools everywhere - one can't have code that fiddles with the calling frames without it feeling too hackish for production code anyway:
from inspect import currentframe

class Constant(int):
    def __new__(cls, name, value=0):
        return int.__new__(cls, value)
    def __init__(self, name, value):
        self.name = name
        # inject the new name into the caller's namespace
        # (this only "sticks" at module or interactive scope: CPython
        # does not persist writes to a function frame's f_locals)
        currentframe().f_back.f_locals[name] = self
    def __repr__(self):
        return self.name
    __str__ = __repr__

And now one can do:
>>> Constant("GREEN", 1)
GREEN
>>> GREEN
GREEN
>>> print (GREEN, GREEN + 0)
GREEN 1
No more WET stuff. But unsatisfactory, nonetheless... Then you think on the riddle: what is it that can inject clean names into the current namespace? Assignment statements, also some statements like "for", "with", "except" and so on... but also the "import" statement!

There it is -- if we can get our constants to be automatically created when they are imported into the current module, we are done. There is a drawback: the import semantics do not allow one to assign values to the names being imported.

A pity - but arbitrary constants still work when only the name, as a value, is important. With you, the all new, all magic.... DYNAMIC AUTO CONSTANTS


import sys

class Constant(int):
    def __new__(cls, name, value=0):
        return int.__new__(cls, value)
    def __init__(self, name, value):
        self.name = name
    def __repr__(self):
        return self.name
    __str__ = __repr__


class _ConstantFactory(object):
    # keep a reference to Constant in the class itself: once the module
    # is replaced in sys.modules below, the original module object may
    # be collected and its globals cleared
    Constant = globals()["Constant"]
    def __init__(self):
        self.counter = 0
        self._all_constants = {}
    def __getattr__(self, name):
        # only called for attributes not found normally - that is,
        # for any constant not created yet
        cls = self.__class__
        if name in self._all_constants:
            return self._all_constants[name]
        const = cls.Constant(name, self.counter)
        self.counter += 1
        self._all_constants[name] = const
        return const

# the not-so-well-behaved part: the module replaces itself by an instance
sys.modules[__name__] = _ConstantFactory()
So this code, as a not-so-well-behaved module, replaces itself in sys.modules with an instance of the _ConstantFactory class.... an almost ordinary class - but one which will produce a new Constant object each time an attribute is retrieved from it - and... guess what "from <module> import name1, name2" does?

>>> from ConstantFactory import RED, GREEN, BLUE
>>> RED
RED
>>> GREEN + BLUE
5
>>> print RED, GREEN, BLUE
RED GREEN BLUE

There you are: no writing names twice, no name magic appearing in the module namespace, no declarations needed.
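One caveat, not in the original post: "from ConstantFactory import *" makes Python look up "__all__" on the fake module, and other tools probe dunder attributes as well - and the factory above would happily mint constants for those names. A minimal defensive sketch, reusing the Constant class above:

class _ConstantFactory(object):
    Constant = globals()["Constant"]
    def __init__(self):
        self.counter = 0
        self._all_constants = {}
    def __getattr__(self, name):
        # refuse to mint constants for special names such as
        # "__all__" or "__path__"
        if name.startswith("__") and name.endswith("__"):
            raise AttributeError(name)
        if name in self._all_constants:
            return self._all_constants[name]
        const = self.__class__.Constant(name, self.counter)
        self.counter += 1
        self._all_constants[name] = const
        return const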

Monday, October 10, 2011

Cartesian Product for Sets

(Python 2.x)

This is a short one - there is a thread on the Python-ideas mailing list about adding a cartesian product operation for set types in Python, using the "*" operator for it.

Such an operation would yield a set where each element is a tuple with one element from the first set and one element from the second set, for all possible combinations.

Most likely it won't fly as a language modification, since using itertools.product seems both more explicit and more efficient than having a __mul__ operation defined for set * set.

However, for the few cases in which one will want a "set" object with a defined cartesian product operation, it is a simple matter - as are all things Python - of adding a __mul__ method to a "set" subclass that calls itertools.product.

One straightforward way of doing so, one would think, would be to write:


>>> import itertools
>>> class mset(set):
...   __mul__ = itertools.product


However, this won't work, due to an implementation detail in CPython 2: itertools.product is a "built-in" (a.k.a. native code) function, and as such it won't work as an ordinary instance method if put in a class body like above. Pure-Python functions assigned in class bodies like this automatically work as methods of the class, available as instance methods - that is, an extra "self" argument is inserted by the inner workings of Python whenever the method is called. Being a native function, itertools.product gets no "self" inserted by Python - and thus our __mul__ method would require two explicit arguments and would not be fit for acting as the method called when the "*" operator is used between our object and another one.
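One can see the difference right at the interactive prompt (a throwaway example of mine, not from the original post):

>>> class Demo(object):
...     pure = lambda *args: args    # pure Python: promoted to a method
...     native = len                 # native code: stays a plain function
...
>>> Demo().pure()         # "self" was inserted automatically
(<__main__.Demo object at 0x...>,)
>>> Demo().native("abc")  # no "self" inserted - len sees only "abc"
3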

Thus steps in meta-Python. All that is needed for itertools.product to work as an operator is to wrap it in a "pure-Python" function.

One could write
def __mul__(self, other):
    return itertools.product(self, other)

but... where is the _fun_ in that? It is so boring it could be written even in Java.

Instead, we define an "instancemethod" decorator, since Python already has the "classmethod" and "staticmethod" counterparts to it. And all an "instancemethod" decorator needs to do is wrap a function call in "pure Python", so that the wrapper is promoted to a method by the type metaclass (the default Python mechanism for that):

import itertools

# the pure-Python lambda wrapper is what gets promoted to a method
instancemethod = lambda f: (lambda *a, **kw: f(*a, **kw))

class mset(set):
    __mul__ = __rmul__ = instancemethod(itertools.product)

And there it is - it works now.
One can type something like:
>>> a = set(range(5,10))
>>> b = mset(range(5))
>>> a * b
<itertools.product object at 0xaace10>

>>> set(a*b)
set([(4, 7), (4, 8), (2, 8), (0, 7), (1, 6), (3, 7), (2, 5), (4, 9), (2, 9), (1, 5), (3, 6), (2, 6), (4, 5), (3, 9), (0, 5), (1, 9), (0, 8), (3, 5), (2, 7), (4, 6), (3, 8), (0, 6), (1, 8), (1, 7), (0, 9)])


Friday, January 28, 2011

Ordered Unit Tests

Due to the way classes are constructed in Python, and to how tests are grouped in classes when using the unittest module, these tests are ordinarily executed in an order that has nothing to do with the order in which they were written (the default TestLoader sorts them alphabetically by method name - the class dict itself keeps no definition order in CPython 2.x).

It is not difficult to imagine situations where we'd like our unit tests to run in a given order (most likely from simpler to more complex tests). Python 3 metaclasses include a feature unittest could use to ensure ordered execution of tests, if it were coded to do so (the "__prepare__" method of the metaclass can change the class body's namespace to an ordered dict, and the definition order could be annotated from there).

However, I wanted an implementation that would work in Python 2. So I came up with this "ordered" decorator that can ensure execution order for tests using unittest. It is deliciously hackish, as you can see: the methods are still called in arbitrary order - but what runs is the decorator code, which then takes care of calling the next test in order, regardless of the one unittest thinks is running. :-)

Apart from having to use the functools.wraps decorator (to preserve method names so that unittest actually finds them -- I didn't try without it, it might work), the code is rather straightforward once it is written down.

With you, tonight's metapython bit:
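(The original snippet is not reproduced in this archive; a minimal sketch of the trick described above, with a module-level registry of my own invention, could look like this:)

import functools
import unittest

# filled in definition order, since decorators run as the class body executes
_tests_in_order = []
# a mutable cell, as Python 2 has no "nonlocal"
_next_to_run = [0]

def ordered(func):
    _tests_in_order.append(func)
    @functools.wraps(func)  # keep the "test_*" name so unittest finds it
    def wrapper(self):
        # ignore which test unittest thinks it is calling: just run the
        # next test, in definition order, that hasn't run yet
        func_to_run = _tests_in_order[_next_to_run[0]]
        _next_to_run[0] += 1
        return func_to_run(self)
    return wrapper

class OrderedTests(unittest.TestCase):
    @ordered
    def test_simpler(self):
        self.assertTrue(True)
    @ordered
    def test_more_complex(self):
        self.assertEqual(1 + 1, 2)

if __name__ == "__main__":
    unittest.main()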

Monday, December 13, 2010

Interval Search

Not that "meta" but interesting:

Someone popped up on the Brazilian Python discussion list asking for ideas to optimize this search:
there is a set of merchandise distributors, each being able to deliver goods to one or more ZIP code ranges. Given a set of ZIP codes, he needs to find out which distributors can make the delivery for each given ZIP code.

For ~100000 ZIP codes and 1500 ZIP ranges this search takes about 1 min. with pure SQL in a PostgreSQL server with some optimizations.

The implementation here can make the search in ~4s, using in-memory Python data structures.

Leonardo Santagada offered an idea that might be even faster than the one presented here: finding a hash function for the ZIP ranges and building a B-tree with them - thereafter, check each ZIP code against the B-tree and retrieve the matching ranges.

The problem is that such a hash function is not trivial (not for me :-p ) to come up with. And even the B-tree implementation would be an order of magnitude more complicated than the one here:
instead of matching the ZIPs against the set of ranges, one by one, we match the ranges against a sorted list of ZIPs - and for each range get all the ZIPs it matches in O(log(N)) time.

I didn't come up with this - credit is due to my friend Humberto Innareli, who had the idea in a chat in the car. (I myself came up with far more complicated, and far worse, ideas in the meantime.)


For log(n) searches in a sorted list, Python's stdlib provides the "bisect" module - its use is straightforward. Some more fancy classes and dictionary manipulation for a comfortable interface, and there it is, in less than 40 lines, running more than 10 times faster than the PostgreSQL solution for the same values. Btw, in the performance test, ZIP ranges are assumed to be no larger than 2/10ths of the total ZIP space, as this is the largest chunk of contiguous ZIP codes mapping to a location in Brazil - if the ranges are completely random, performance drops dramatically - that is because "harvesting" the ZIP codes themselves, after we have found the range of matching ZIPs, is done in linear time, and is a function of the range size.
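(The original listing is not included in this archive; the core of the bisect idea could be sketched like this - the (start, end, distributor) range layout and the function name are my own:)

import bisect

def distributors_by_zip(zip_codes, ranges):
    """Map each ZIP code to the distributors that can deliver to it.

    "ranges" is a sequence of (start, end, distributor) tuples.
    """
    zips = sorted(zip_codes)
    matches = dict((z, []) for z in zips)
    for start, end, distributor in ranges:
        # two O(log(N)) searches find the slice of ZIPs inside the range
        lower = bisect.bisect_left(zips, start)
        upper = bisect.bisect_right(zips, end)
        # "harvesting" the matched ZIPs is linear in the range size
        for z in zips[lower:upper]:
            matches[z].append(distributor)
    return matches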


Friday, December 3, 2010

Multiple contexts in 'With' statement: not just for 2.7 and 3.1

As you know, one of the syntax innovations in Python 3.1 and 2.7 is the possibility of using multiple,
different contexts in a single "with" statement. I.e., it is now possible to write this:


with open("file1") as f1, open("file2") as f2:
    #do stuff


With earlier Python versions, it is thought, one is required to write:


with open("file1") as f1:
    with open("file2") as f2:
        #do stuff


But the fact is that if you are using Python 2.5 or 2.6 you are not bound to use this dreaded nested "with" construct.

A simple class can be used to allow the "flat 'with' statement" with no new syntax added - and it can be used in Python 2.6 (and probably 2.5). In fact, I am not sure this syntax enhancement isn't in direct violation of "Special cases aren't special enough to break the rules". After all, the "for" statement can be used with multiple variables, and no one has to write:

for index in range(len(seq)), value in seq: 
    #do stuff


And now, for something completely different, the multicontext class:


# -*- coding: utf-8 -*-
# Author: João S. O. Bueno
# License: Creative Commons Attribution Required 3.0

class multicontext(object):
    def __init__(self, *args):
        # accept either a single sequence/iterable of context managers
        # or several of them passed as separate arguments
        if (len(args) == 1 and 
               (hasattr(args[0], "__len__") or 
                hasattr(args[0], "__iter__"))):
            self.objs = list(args[0])
        else:
            self.objs = args
    
    def __enter__(self):
        return tuple(obj.__enter__() for obj in self.objs)
    
    def __exit__(self, type_, value, traceback):
        # exit in reverse order, as nested "with" blocks would
        # (suppression semantics are simplified: an exception is
        # swallowed only if every manager returns a true value)
        return all([obj.__exit__(type_, value, traceback)
                      for obj in reversed(self.objs)])


Zip. That is all. No need to upgrade your Python if all you need are flat "with"s.
Usage example and test:


>>> with multicontext(open("%s.txt" % letter, "wt") for letter in "abc") as (a,b,c):
...   a.write("a")
...   b.write("b")
...   c.write("c")
...
>>> a
<closed file 'a.txt', mode 'wt' at 0x7f4172f17ad0>
>>> b
<closed file 'b.txt', mode 'wt' at 0x7f4172f177a0>
>>> c
<closed file 'c.txt', mode 'wt' at 0x7f4172f179c0>
>>>
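(Not from the original post, but worth noting: Python 2.5 already shipped contextlib.nested, which covers much of the same ground - it was deprecated in 2.7 exactly because of the new syntax. It shares a caveat with the class above: the managers are all constructed before any of them is entered.)

from contextlib import nested  # stdlib in Python 2.5-2.6; deprecated in 2.7

with nested(open("file1"), open("file2")) as (f1, f2):
    pass  # do stuff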


(and thanks @rbp for pointing out I was offering no prize for the "contests" here #dislexia)

Wednesday, November 17, 2010

X-unzip - and peeping into it

Most of us already know the value of the zip function (and of itertools.izip, in Python 2).

Once in a while, there might be the need of "unzipping" a sequence: that is, de-interleaving the sequences zipped inside another one, so that one could get [[0, 1, 2, 3], [10, 11, 12, 13]]
back from [(0, 10), (1, 11), (2, 12), (3, 13)].

Python being what it is, that is an easy, if not trivial, task -
we can just fetch all values in the zipped sequence and append each one to a list, indexed by position, inside a container sequence - it is as easy as:

def unzip(seq):
    result = []
    start = True
    for item in seq:
        if start:
            # create one list per "column" on the first iteration
            result = [[] for _ in xrange(len(item))]
            start = False
        for i, v in enumerate(item):
            result[i].append(v)
    return result
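For instance, with the example sequence above:

>>> unzip([(0, 10), (1, 11), (2, 12), (3, 13)])
[[0, 1, 2, 3], [10, 11, 12, 13]]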


But... what if the original list, with the values to be de-interleaved, is itself huge? Or part of a data stream generated elsewhere? How can we fetch the de-interleaved data, in order, without having to read through the whole stream first?

Well, a function up to this task would return a container sequence where each object is a generator itself. And when called upon, each generator would manage to fetch new values from the original stream as needed, and share the values of the other data sequences with its sibling generators.

Fear not! That is a task for a couple of dynamically generated generators (DGG's) aggressively sharing data in their "nonlocal" variable space (otherwise known as a closure):
def xunzip(seq):
    it = iter(seq)
    first = it.next()
    width = len(first)
    results = []
    # shared buffer of tuples fetched from the stream but not yet
    # consumed by all sibling generators
    buffer_ = [{"tuple": first, "consumed": 0}]
    # This is a list to workaround the non-existence
    # of the "nonlocal" keyword in python 2
    buffer_base = [0]
    for i in xrange(width):
        def freeze_i(j):
            # an extra function scope, just to "freeze" the value of i
            def unzipper():
                offset = 0
                while True:
                    index = offset - buffer_base[0]
                    if not (offset < len(buffer_) + buffer_base[0]):
                        buffer_.append({"tuple": it.next(), "consumed": 0})
                    buffer_[index]["consumed"] += 1
                    value = buffer_[index]["tuple"][j]
                    if buffer_[index]["consumed"] == width:
                        # every sibling has seen this tuple: discard it
                        buffer_.pop(0)
                        buffer_base[0] += 1
                    yield value
                    offset += 1
            unzipper.func_name = "seq_%d" % j
            return unzipper
        results.append(freeze_i(i)())
    return results
And just to finish today's unzipping party, some introspective code so that we can peep all of it taking place:
if __name__ == "__main__":
    import random
    a = zip(range(10), range(10,20), range(20,30))
    #print unzip(a)
    b = xunzip(a)
    finished_count = [0,0,0]
    unpack = [-1, -1, -1]
    while sum(finished_count) < len(b):
        index = random.randrange(len(b))
        try:
            unpack[index] = b[index].next()
            # peep at the shared buffer through the generator's frame
            buffer_length = len(b[index].gi_frame.f_locals["buffer_"])
        except StopIteration:
            finished_count[index] = 1
            buffer_length = "Unknown"
        print unpack, "Buffer length: ", buffer_length
(Alternatively, fetch all the code together from here: http://pastebin.com/fiAq2uy9 )

Thursday, November 11, 2010

"Selfless Python"

Proof of concept of Python methods with no explicit declaration of the "self" parameter.

I mean:


class A:
   __metaclass__ = Selfless
   def __init__(a, b):
      self.a = a
      self.b = b


is made to work by the example below.

One common criticism of Python is the need for the explicit "self" parameter declaration in methods. No matter how this design pays off in terms of internal consistency, keeping things explicit, and reducing the keyword count, people keep complaining about it.

Given the access granted to the inner workings of the language, it is easy to arrange an alternate mechanism for class methods that precludes the declaration of a first parameter. Therefore, people seen complaining could try some shots of "selfless" Python.

My approach here is as follows: I create a metaclass that uses an alternate way to fetch the functions inside the classes. With the normal "type" metaclass - i.e. normal "new style" classes - any function stored in a class body already works as a descriptor. When the attribute with the function's name is retrieved from the class, or from an instance of it, this descriptor creates a method object that wraps the original function. (And yes, the method object is created at attribute access time.) This method object, when called, prepends the instance object as the first parameter in the list, and forwards the call to the wrapped function.
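This standard mechanism can be watched from the interactive prompt (a throwaway class of mine, not from the original post):

>>> class B(object):
...     def meth(self): pass
...
>>> B.__dict__["meth"]   # the plain function sitting in the class dict
<function meth at 0x...>
>>> B().meth             # attribute retrieval creates the method object
<bound method B.meth of <__main__.B object at 0x...>>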

In this "selfless" proof of concept, I change the functions in the class body for descriptor objects, but descriptors that behave differently: instead of wrapping the function in a method object, they recreate the function when the method is acessed.
A new function object is created with the same code object, the same name, the same closure, but, with a different global dictionary. The global dictionary of the original function is copied and changed to include an entry named "self" which points to the object instance. (Or "None" if the function is retrieved from the class, not from an instance).

Therefore, inside these selfless methods, the name "self" can be used without ever being declared: it will exist as a global variable at execution time, and point to the instance - just like the parameter "self" exists as a local variable and points to the instance in regular Python methods.

This should not be used in production code - it is a good proof of concept, but of course there would be lots of strange and inconsistent behavior associated with it. One such quirk is that these methods cannot be used "unbound", as one does with ordinary Python methods by explicitly passing an instance object as the first parameter.
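For instance, with the class A from the example below, the unbound idiom blows up - "self" is already implicit, so the extra argument has nowhere to go (a hypothetical session):

>>> a = A("Hello", "World!")
>>> A.__init__(a, "new", "values")
Traceback (most recent call last):
  ...
TypeError: __init__() takes exactly 2 arguments (3 given)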


As a final consideration, Pedro Werneck had an implementation of this "feature" before, here: http://www.python.org.br/wiki/NoSelf - but unlike Werneck's example, I don't fiddle with the bytecode or anything else to "rebuild" the function so that references to a nonexistent local "self" variable are rewritten and "self" is artificially inserted as a local variable at runtime. The approach presented here is an order of magnitude simpler, and possibly less subject to side effects.

Nonetheless: I warn again - this is just a toy! Don't use this in "real life"!

Other than that, it works better, and in a simpler way, than I expected.

# -*- coding: utf-8 -*-
"""
Changes classes so that, inside each method, "self" is available as a
global variable pointing to the instance.
"""
from types import FunctionType

class SelflessDescriptor(object):
    def __init__(self, func):
        self.func = func
    def __get__(self, instance, class_):
        # rebuild the function with a copy of its globals in which
        # "self" points to the instance the attribute is fetched from
        new_globals = self.func.func_globals.copy()
        new_globals["self"] = instance
        new_func = FunctionType(self.func.func_code,
             new_globals, self.func.func_name,
             self.func.func_defaults,
             self.func.func_closure)
        return new_func

class Selfless(type):
    def __new__(cls, name, bases, dict_):
        # swap every plain function in the class body for our descriptor
        for key, val in dict_.items():
            if isinstance(val, FunctionType):
                dict_[key] = SelflessDescriptor(val)
        return type(name, bases, dict_)


if __name__ == "__main__":
    __metaclass__ = Selfless

    class A:
        def __init__(a, b):
            self.a = a
            self.b = b
    
    a = A("Hello", "World!")
    print a.a, a.b   # prints: Hello World!