Friday, November 5, 2010

Python instance methods - How are they different from ordinary functions?

Python's OO model works exceptionally well. Its introspection capabilities, and the flexibility provided by the descriptor implementation, allow for a lot of interesting things - one of the nicest being what is available through the "property" built-in.

However, the descriptor model implies some behaviors that can raise eyebrows, even for experienced users. Like this bit here:
>>> class A(object):
...   def b(self): pass
...
>>> A.b is A.b
False
>>> a = A()
>>> a.b is a.b
False



Eeeeek!  :-) And now what? Well, it happens that although methods are not just functions that receive the instance as their first parameter (that is, methods are objects of a different class than functions), they are not stored in the class or in the instance itself. Rather, both bound and unbound methods for new-style classes in Python are created at attribute retrieval time: the very "a.b" expression above is, internally, a call to a.__getattribute__("b"), and this call uses the descriptor mechanism to wrap the function stored in the class __dict__ in an instance method.
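To make the retrieval step concrete, here is a minimal sketch that performs by hand what the attribute lookup does for "a.b". It uses the __func__ and __self__ attribute names, which are available from Python 2.6 on as aliases of im_func and im_self; the return value of b is only there for illustration.

```python
class A(object):
    def b(self):
        return "called"

a = A()

# The class __dict__ holds the plain function, not a method object.
raw_function = A.__dict__["b"]

# Calling the function's own __get__ builds a bound method,
# which is what happens behind the scenes for "a.b".
manual = raw_function.__get__(a, A)

print(manual.__self__ is a)             # True
print(manual.__func__ is raw_function)  # True
print(manual == a.b)                    # True: same function, same instance
print(manual is a.b)                    # False: a.b builds a fresh object
```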

So, methods are objects that contain a reference to the original function, but prepend the "self" argument to the function call when they are called as bound methods:

   
>>> class A(object):
...   def b(self): pass
>>> a = A()
>>> a.b.im_func
<function b at 0x7f6613a6cd70>
>>> a.b.im_func is A.b.im_func is A.__dict__["b"]
True


(extra bonus: I just learned that the "is" operator can be chained like this as well :-) )

This way of doing things works fine in almost all circumstances. However, the only kind of object that automatically behaves as a "method factory" when stored in a class body is a function. If you want to turn another kind of callable object (like any instance of a class with a __call__ method) into a proper method, you have to create the descriptor for that yourself. It is actually quite simple - you just have to "implement the descriptor protocol", which is another way of saying that you just have to add a __get__ method :-) . Let's suppose I want a method for a class that is itself an object which keeps track of how many times it was called:
   
class CounterFunction(object):
    """Designed to work as function decorator - will not work for methods """
    def __init__(self, function):
        self.func = function
        self.counter = 0
    def __call__(self, *args, **kw):
        self.counter += 1
        return self.func(*args, **kw)

@CounterFunction
def b(): pass

b()
b()
print b.counter
#-> 2


And if we try to use that as a decorator for a method,
the method will no longer "work":

class C(object):
    @CounterFunction
    def d(self): pass

c = C()
c.d()
#Traceback (most recent call last):
#  File "<stdin>", line 1, in <module>
#  File "<stdin>", line 8, in __call__
#TypeError: d() takes exactly 1 argument (0 given)


However, a call like "c.d(c)" would succeed - the c.d object just does not get its instance prepended to the argument list.

If we do, instead:
from types import MethodType

class CounterFunction(object):
    """Designed to work as function or method decorator """
    def __init__(self, function):
        self.func = function
        self.counter = 0
    def __call__(self, *args, **kw):
        self.counter += 1
        return self.func(*args, **kw)
    def __get__(self, instance, owner):
        return MethodType(self, instance, owner)

class C(object):
    @CounterFunction
    def d(self): pass

c = C()
c.d()
print c.d.counter
#-> 1
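The three-argument MethodType call above is specific to Python 2. As a hedged sketch of the same idea that should also run on Python 3 (where unbound methods no longer exist and MethodType takes only the callable and the instance), __get__ can return the descriptor itself for class access and bind it only when an instance is present. The return value of d is an illustrative addition.

```python
from types import MethodType

class CounterFunction(object):
    """Counts calls; usable as a function or method decorator."""
    def __init__(self, function):
        self.func = function
        self.counter = 0

    def __call__(self, *args, **kw):
        self.counter += 1
        return self.func(*args, **kw)

    def __get__(self, instance, owner=None):
        if instance is None:
            # Accessed on the class (C.d): hand back the counter object.
            return self
        # Accessed on an instance (c.d): bind so that the instance
        # is prepended to the arguments when called.
        return MethodType(self, instance)

class C(object):
    @CounterFunction
    def d(self):
        return "d called"

c = C()
c.d()
c.d()
print(C.d.counter)  # 2
```

Note that the counter lives on the single CounterFunction object stored in the class, so it is shared by all instances of C.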

This "__get__" method, as above, does essentially what plain functions already do when we use an ordinary method. Nothing is replaced when the class object is created (the moment it is defined): function objects implement the descriptor protocol themselves, and it is their own __get__ that builds the method object each time the attribute is retrieved. That is why A.__dict__["b"] in the example above is still the original function.

It should be noted that it is official advice that you can increase performance in certain code patterns by caching an object's method in a local variable rather than retrieving the method each time you want to call it - that is:

for i in xrange(5000):
    a.do_something()


actually runs faster when written as:

b = a.do_something
for i in xrange(5000):
    b()
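A rough way to check this on your own machine is to time both variants with the standard timeit module. The sketch below assumes a hypothetical Worker class with an empty do_something method; absolute numbers vary by interpreter, and on recent CPython versions, which optimize direct method calls, the gap can be small.

```python
import timeit

class Worker(object):
    """Hypothetical class, only here to have a method to call."""
    def do_something(self):
        pass

def lookup_each_time(obj, n=5000):
    # The method object is looked up on every iteration.
    for _ in range(n):
        obj.do_something()

def lookup_once(obj, n=5000):
    # The bound method is created once and reused.
    method = obj.do_something
    for _ in range(n):
        method()

a = Worker()
t_each = timeit.timeit(lambda: lookup_each_time(a), number=200)
t_once = timeit.timeit(lambda: lookup_once(a), number=200)
print("lookup each time: %.4f s" % t_each)
print("lookup once:      %.4f s" % t_once)
```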


The exact behavior of methods and descriptors is described in detail in the official documentation - look at the "User-defined methods" entry:
http://docs.python.org/reference/datamodel.html

What if called functions could access the variables from the caller?

Of course, normally you just pass the variables you want the called function to access as parameters.

But an interesting development paradigm arises if every non-local variable is looked up in the scope of each calling function. Some languages, like PostScript, work this way.

In Python it can be achieved simply by copying the local variables in the code-frame of the caller functions to the global scope of the called one:


This decorator, while solid enough for "production", won't work for variables in functions defined within other functions when they use variable names that exist in the outer functions. Those would still be bound to the defining closure, as free variables. It can be made to work with those, but it would be another beast.
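As a minimal sketch of the idea described above (not the original listing: the name dynamic_scope and every detail of the implementation are assumptions), one can walk the calling frames, copy their locals into a fresh globals dictionary, and run the function's code against it. It relies on sys._getframe, which is a CPython implementation detail, and it shares the closure limitation just mentioned.

```python
import sys
from functools import wraps
from types import FunctionType

def dynamic_scope(func):
    """Hypothetical decorator: callers' locals become globals of func."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        # Start from the real globals so module-level names still resolve.
        namespace = dict(func.__globals__)
        # Collect the frames of all callers, nearest first.
        frames = []
        frame = sys._getframe(1)
        while frame is not None:
            frames.append(frame)
            frame = frame.f_back
        # Apply the outermost frame first so nearer callers win.
        for frame in reversed(frames):
            namespace.update(dict(frame.f_locals))
        # Same code object, different global namespace.
        rebound = FunctionType(func.__code__, namespace, func.__name__,
                               func.__defaults__, func.__closure__)
        return rebound(*args, **kwargs)
    return wrapper

@dynamic_scope
def show_total():
    # Neither name is defined here; both come from the caller.
    return price * quantity

def checkout():
    price = 3
    quantity = 4
    return show_total()

print(checkout())  # 12
```

Rebuilding the function on every call is slow, and any assignment to a global inside the decorated function lands in the temporary dictionary rather than in the module, so this is a toy for exploring the idea rather than a drop-in tool.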

Monday, November 1, 2010

Twitter and Tkinter: Who does not follow you

We are not so "meta" today, and more hands on.

Given the spam avalanche today, with people trying to get me to use a webapp that checks "who among my friends does not follow me", I thought: that is quite easy -- just pick one set of people and subtract the other from it, and voilà!

To get it out of the terminal, I added a little bit of Tkinter. It is quite minimalist, but the Tkinter part contains some interesting patterns that I had to look around to produce (like using the scrollbar, or updating what is displayed in the window in the middle of data processing, with the tk.call("update") thing).

As for Twitter, we are not using any libraries external to Python - just the REST API, for a service that does not need authentication, and good old "urllib". We just loop through the followers and friends page results and handpick just the users' screen names (we could pick the URL and other fields for a more sophisticated app).
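The core of the idea is just a set difference. A tiny sketch with made-up screen names (the data below is hypothetical and stands in for what the API loops would collect):

```python
# Hypothetical screen names, standing in for the fetched results.
friends = {"alice", "bob", "carol", "dave"}    # people I follow
followers = {"bob", "dave", "erin"}            # people who follow me

# Friends who do not follow back: remove the followers from the friends.
not_following_back = friends - followers

print(sorted(not_following_back))  # ['alice', 'carol']
```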



Saturday, October 30, 2010

Creating a Heap Class in one Python Line

Sometimes the stdlib is just strange -- like the "heapq" module: it provides some functions to use a list as a heap, according to some well-known algorithms, but it does not itself provide a "Heap" class. You have to create your heap as an empty list and call "heapq.heappush" and "heapq.heappop" (among other functions), always passing your "heap" (the list) as the first argument.
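For reference, this is what the plain function-based usage looks like, with a few sample values:

```python
import heapq

heap = []  # an ordinary list is the heap
for value in (10, 5, 1, 20):
    heapq.heappush(heap, value)   # the list is always the first argument

print(heapq.heappop(heap))  # 1
print(heapq.heappop(heap))  # 5
print(heap[0])              # 10, the smallest remaining item
```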

Oh... "the object itself" as the first argument -- I had seen that before. So we could possibly just create a class and in its body write "pop = heapq.heappop", and so on: since the signature of these "heap*" functions is always "heapq.heap*(heap[, arg])", if I assign them as attributes of a "Heap" class, they should work as methods. Not!

class Heap(list):
    pop = heapq.heappop 
    .
    .
    .



The only members of a class body that are promoted to instance methods are functions, and the heapq functions are "built-in functions" (implemented in C). This is a quirk of how Python new-style classes work: since they are not Python functions, they are not promoted to methods, and whenever they are called, the instance is not prepended to the parameter list.
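A small sketch of that failure, assuming CPython with the C-accelerated heapq (on an implementation where heappop is a pure Python function, the call would simply work). BrokenHeap is a name made up for this illustration.

```python
import heapq

class BrokenHeap(list):
    # A built-in function stored in a class body: it is not bound on access.
    pop = heapq.heappop

heap = BrokenHeap([1, 2, 3])

try:
    heap.pop()            # the instance is NOT passed as the first argument
except TypeError as error:
    print("TypeError: %s" % error)

# Passing the instance explicitly still works.
print(BrokenHeap.pop(heap))
```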

So, the most obvious way to do that is to create wrapper functions that just call the heapq functions, and use those as methods in a class that inherits from "list".

Since the goal is to do that in one line, this is the perfect occasion to use the dict comprehensions, new in Python 2.7, to create the class body.

So we try:




Heap = type("Heap", (list,), {item: (lambda self, *args: getattr(heapq, "heap" + item)(self, *args))  for item in ("pop", "push", "pushpop", "replace")   } )


This also does not work!
We have to keep in mind, whenever we generate functions inside a for loop in Python, that functions are closures: the variables used in the "for" are looked up only when the generated functions are called, so they always evaluate to the last value they attained in the loop. Hence, all the members in the class above will wrap "heapq.heapreplace".
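The effect is easy to reproduce in isolation. In this small sketch (the names late and frozen are just for illustration), the first dictionary holds functions that all see the last loop value, while the second passes each value through an extra wrapper call so that it is captured separately:

```python
names = ("pop", "push", "pushpop", "replace")

# Every lambda looks up "name" when called, after the loop has finished.
late = {name: (lambda: name) for name in names}
print(sorted(set(func() for func in late.values())))
# ['replace']

# An outer lambda called immediately gives each inner lambda its own value.
frozen = {name: (lambda value: (lambda: value))(name) for name in names}
print(sorted(func() for func in frozen.values()))
# ['pop', 'push', 'pushpop', 'replace']
```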

The way to get it working is to produce the generated functions from another level of wrapper functions, so that the values used in the loop are frozen in these second-level wrappers.





import heapq
Heap = type("Heap", (list,), {item: (lambda item2: (lambda self, *args: getattr(heapq, "heap" + item2)(self, *args)))(item)  for item in ("pop", "push", "pushpop", "replace")})


And now we are in business:



>>> a = Heap()
>>> a.push(10)
>>> a.push(5)
>>> a.push(1)
>>> a.push(20)
>>> a.pop()
1
>>> a.pop()
5
>>> a.pushpop(30)
10
>>>
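If the one-line constraint is dropped, the same construction can be spelled out with a named factory function, which may be easier to read. This is a sketch of an equivalent, not a replacement for the one-liner; make_method is a name introduced here.

```python
import heapq

def make_method(name):
    """Build a real Python function wrapping heapq.heap<name>."""
    heap_function = getattr(heapq, "heap" + name)

    def method(self, *args):
        # "self" is the list subclass instance, used as the heap.
        return heap_function(self, *args)

    method.__name__ = name
    return method

Heap = type("Heap", (list,), {
    name: make_method(name)
    for name in ("pop", "push", "pushpop", "replace")
})

a = Heap()
for value in (10, 5, 1, 20):
    a.push(value)
print(a.pop())        # 1
print(a.pop())        # 5
print(a.pushpop(30))  # 10
```

Because each call to make_method has its own local heap_function, every generated method keeps the right heapq function.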