Wednesday, November 17, 2010

X-unzip - and peeping into it

Most of us already know the value of the zip function. (and xzip, in python2).

Once in a while, there might be the need of "unzipping" a sequence: that is - de-interleave a sequence zipped inside other one. So that one could get [[0, 1, 2, 3], [10, 11, 12, 13]]
back from [(0, 10), (1, 11), (2, 12), (3, 13)].

Python being what it is, that is an easy, if not trivial task -
we can just fetch all values on the zipped sequence, and append each one on a list enumerated inside a container sequence - it is easy as:

def unzip(seq):
    start = True
    for next in seq:
        if start:
            result = [[] for _ in xrange(len(next))]
            start = False
        for i,v in enumerate(next):
            result[i].append(v)
    return result


But... what if the original list, with te values to be de-interleaved is itself huge? Or part of a data-stream generated elsewhere? How can we fetch de-interleaved data, in order, without having to read through all that stream?

Well, the function up to this would return us a container sequence where each object would be a generator itself. And when called upon, this generator would manage to fetch new values from the original stream as needed, and share the values of the other data sequences with its sibling generators.

Fear not! That is a task for a couple dynamically generated generators (DGG's) agressively sharing data in their "nonlocal" variable space (otherwise known as closure).
def xunzip(seq):
    it = iter(seq)
    next = it.next()
    width = len(next)
    results = []
    buffer_ = [{"tuple": next, "consumed": 0}]
    # This is a list to workaround the non-existence
    # of the "nonlocal" keyword in python 2
    buffer_base = [0]
    for i in xrange(width):
        def freeze_i(j):
            def unzipper():
                offset = 0
                while True:
                    index = offset - buffer_base[0]
                    if not (offset < len(buffer_) + buffer_base[0]):
                        buffer_.append({"tuple": it.next(), "consumed": 0})
                    buffer_[index]["consumed"] += 1
                    value = buffer_[index]["tuple"][j]
                    if buffer_[index]["consumed"] == width:
                        buffer_.pop(0)
                        buffer_base[0] += 1
                    yield value
                    offset += 1
            unzipper.func_name = "seq_%d" % j
            return unzipper
        results.append(freeze_i(i)())
    return results
And just to finish today's unzipping party, some introspective code so that we can peep all of it taking place:
if __name__ == "__main__":
    import random
    a = zip(range(10), range(10,20), range(20,30))
    #print unzip(a)
    b = xunzip(a)
    finished_count = [0,0,0]
    unpack = [-1, -1, -1]
    while sum(finished_count) < len(b):
        index = random.randrange(len(b))
        try:
            unpack[index] = b[index].next()
            buffer_lenght = len(b[index].gi_frame.f_locals["buffer_"])
        except StopIteration:
            finished_count[index] = 1
            buffer_lenght = "Unknown"
        print unpack, "Buffer lenght: ", buffer_lenght
(Allternatively, fetch all code together from here: http://pastebin.com/fiAq2uy9 )

12 comments:

  1. Can't you simply do

    >>> zipped = list(zip(list1, list2))
    >>> unzipped = zip(*zipped)

    ?

    The "list()" around the first zip is because, in Python 3, "zip" returns a "zip object", not a list. This also ensures (if I'm not mistaken) that the generator-like properties you seek are kept.

    ReplyDelete
  2. While this zip-reapplying does unzip - (and D'UH, I had not even perceived that) - the iterator I get on the long example here is not achieved by Python's 3 zip object (or itertools.izip on python 2.0) :
    It returns an iterator that which, when its items are retrieved, return the full deinterleaved sequences as a list, one at a time.
    The examle in this post returns a container list for one iterator for each data sequence.

    ReplyDelete
  3. "Enhance your analytics skills with our power bi training online classes designed for beginners and professionals alike."

    ReplyDelete
  4. EasyPayTax helps businesses stay compliant by offering end-to-end solutions for tax filing, company registration, accounting, and statutory compliance.online tax consultant in Hyderabad

    ReplyDelete
  5. This comment has been removed by the author.

    ReplyDelete
  6. OnlineITGuru DevOps training helps you learn CI/CD, Docker, Kubernetes, cloud tools, and automation through hands-on projects for real industry roles.devops course training

    ReplyDelete
  7. OnlineITGuru Data Modeling course teaches ER diagrams, normalization, and database design best practices to create scalable, business-ready data models.data modeling training courses

    ReplyDelete
  8. "Enhance your career with salesforce administrator course designed for beginners and professionals alike. Gain hands-on skills to manage Salesforce effectively and boost your organizational impact."

    ReplyDelete
  9. EasyPayTax helps individuals online tax consultant in Hyderabad
    and businesses file taxes quickly and accurately with expert guidance and simple online tools.

    ReplyDelete
  10. EasyPayTax delivers fast, reliable tax filing services online, ensuring compliance and maximizing your refunds with minimal effort.tax filing services in Hyderabad

    ReplyDelete
  11. EasyPayTax makes tax filing easy and stress-free with expert support, secure online processing, and accurate return submissions. ITR filing services in Hyderabad

    ReplyDelete
  12. EasyPayTax: smart, simple, and affordable online tax filing for individuals and businesses with professional support. GST registration in Hyderabad

    ReplyDelete