Français - English
Source: svn://svn.saintamh.org/code/lib-python/trunk/saintamh/struct/
Source (en couleurs): http://svn.saintamh.org/code/lib-python/trunk/saintamh/struct/

The idea for this came to me in a dream (!) on the morning of 2010-08-08, so hey, I just couldn't not try and implement it. The original idea was to have something as concise as Scala's case classes, where a single, compact statement can define a class, its constructor, string serialization, comparators, etc.

I suppose if you need this in your code, you're not doing things "the Python way", but I guess sometimes I find "the Python way" a bit too verbose. Or maybe I just don't get it, but it seems to me that writing a class constructor that takes keyword args, sense-checks them, sets defaults and finally sets the instance fields, involves a lot of boilerplate.

This module aims to make it possible for the name of each field to appear only once in the class definition. While we're at it also defines sensible defaults for __repr__ and __cmp__, conversion to and from XML trees or JSON structs, automatic generation of MySQL statements, and automatic serialization to and from CSV files.

At first I tried to resist the urge for too much emulation of strong typing, since that is probably not "the Python way", but I guess in the end I thought it *is* useful to have basic type checks, and it *is* extremely useful to start writing a program by laying down a concise, but precise definition of what your data types are. Representation is, after all, the essence of programming (or so they say).

Similarly, all structs are immutable. That's definitely not "the Python way", but it's my way.

Here's a quick synopsis of how it works:

>>> from saintamh.struct import *

The `struct' function takes in keyword parameters that describe a data
structure in terms of its fields, and constraints on their type and values.
It returns a class object, that can be instantiated or inherited from:

>>> class Artist (struct (
...         name = str,
...         )):
...     pass

>>> class MusicBrainzID (struct (
...         id = {
...             'class': str,
...             'regex': r'^[0-9a-f\-]{36}$',
...             },
...         )):
...     pass

>>> class Track (struct (
...         title = str,
...         total_seconds = nonnegative (int),
...         minutes = lambda self: self.total_seconds/60,
...         seconds = lambda self: self.total_seconds%60,
...         )):
...     pass

>>> class Album (struct (
...         artist = Artist,
...         title = str,
...         year = {
...             'class': int,
...             'accepts_value': lambda n: 1900 < n < 2050,
...             },
...         tracks = seq_of(Track),
...         )):
...     pass

You can then instantiate these classes:

>>> album = Album (
...     artist = "Wah-Wah",
...     title = "Gyroscope",
...     year = 2000,
...     tracks = (
...         Track (title="Elevon", total_seconds=209),
...         Track (title="Gear", total_seconds=514),
...         Track (title="Stringer", total_seconds=413),
...         ),
...     )

And access its values as regular fields:

>>> album.artist.name
'Wah-Wah'
>>> ', '.join (track.title for track in album.tracks)
'Elevon, Gear, Stringer'


Unlike namedtuple, the fields are not defined in any particular order, and
so the generated class' constructor can only take keyword parameters:

>>> t1 = Track ("Up the Ante", 100)
Traceback (most recent call last):
  ...
TypeError: 'Track' constructor takes no non-keyword params


Much of the point of this approach is to ensure that strict constraints are
enforced on the data object's contents. For a start, unless a field is
explicitly marked as `nullable', a non-null value must be supplied.

>>> t1 = Track (title="Up the Ante")
Traceback (most recent call last):
    ...
ValueError: Track.total_seconds: ValueError: value not accepted: None

Fields are converted to the proper class. You'll never accidentally save a
numerical value as a string object.

>>> t1 = Track (title="Up the Ante", total_seconds='100')
>>> isinstance (t1.total_seconds, int)
True

If the given value cannot be converted to the proper class, instantiation
fails

>>> t1 = Track (title="Up the Ante", total_seconds='cent')
Traceback (most recent call last):
  ...
ValueError: Track.total_seconds: ValueError: invalid literal for int() with base 10: 'cent'

If the class definition places constraints are a field's value, the class
cannot be instantiated with a value that doesn't satisfy them:

>>> MusicBrainzID ('not a valid id')
Traceback (most recent call last):
  ...
ValueError: MusicBrainzID.id: ValueError: value not accepted: 'not a valid id'

The class' __repr__ echoes the constructor call:

>>> t1 = Track (title="Up the Ante", total_seconds=100)
>>> t2 = Track (title="Bucking the Trend", total_seconds=150)
>>> print repr(t1)
Track(title='Up the Ante', total_seconds=100)

If a class has only a single non-nullable field,
it doesn't need to be passed as a keyword param:

>>> art = Artist ("The Decadents")
>>> art
Artist('The Decadents')

The generated classes are given an appropriate __cmp__ method

>>> art == Artist ("The Decadents")
True
>>> art == Artist ("Bronenosets Potyomkin")
False
>>> Artist("A") < Artist("B")
True


All objects created from the generated classes are immutable. You can't
change their fields after instantiation. All sequence types are converted
to tuples, dicts to an immutable mapping class.

They also have a __hash__ method, and so are suitable for use a dictionary keys.

>>> t1.title = "Zapped"
Traceback (most recent call last):
    ...
TypeError: Track objects are immutable

The closest thing to setting a field is to derive a copy of the object, 
with certain fields changed.

>>> t1
Track(title='Up the Ante', total_seconds=100)
>>> t3 = t1.derive (title = "Zapped")
>>> t3
Track(title='Zapped', total_seconds=100)
>>> t1
Track(title='Up the Ante', total_seconds=100)

You can declare "virtual fields" (see the `minutes' and `seconds' fields in
the declaration above) whose value depends on other fields in the object.
They are defined as a lambda that takes `self' as an argument. They are
accessed from the outside as fields, not methods. The value is computed the
first time it is read, and cached forever after (which works because the
object is immutable).

>>> '%d:%02d' % (t1.minutes, t1.seconds)
'1:40'


Automatic conversion to/from ETrees is provided

>>> import lxml.etree as ET
>>> album_etree = album.etree()
>>> print ET.tostring (album_etree, pretty_print=True).rstrip()
<Album year="2000">
  <title>Gyroscope</title>
  <artist>
    <name>Wah-Wah</name>
  </artist>
  <tracks>
    <item index="0" total_seconds="209">
      <title>Elevon</title>
    </item>
    <item index="1" total_seconds="514">
      <title>Gear</title>
    </item>
    <item index="2" total_seconds="413">
      <title>Stringer</title>
    </item>
  </tracks>
</Album>
>>> Album.parse_etree(album_etree) == album
True

Automatic conversion to JSON structs is also provided

>>> import pprint
>>> pprint.pprint (album.json_struct())
{'artist': {'name': 'Wah-Wah'},
 'title': 'Gyroscope',
 'tracks': [{'title': 'Elevon', 'total_seconds': 209},
            {'title': 'Gear', 'total_seconds': 514},
            {'title': 'Stringer', 'total_seconds': 413}],
 'year': 2000}

Automatic writing and reading of CSV files:

>>> from StringIO import StringIO
>>> dummy_file = StringIO ('')

>>> Track.csv_file_write (dummy_file, album.tracks)

>>> print dummy_file.getvalue().replace("\\r","").rstrip()
title,total_seconds
Elevon,209
Gear,514
Stringer,413

>>> dummy_file.seek(0)
>>> tuple(Track.csv_file_read (dummy_file)) == album.tracks
True


There are many more utilities for defining field types:

>>> class AirportInfo (struct (
...         iata_id = uppercase_letters(3),
...         icao_id = uppercase_letters(4),
...         )):
...     pass

>>> AirportInfo (iata_id='yul', icao_id='yul')
Traceback (most recent call last):
  ...
ValueError: AirportInfo.icao_id: ValueError: value not accepted: 'YUL'

>>> AirportInfo (iata_id='yul', icao_id='cyul')
AirportInfo(iata_id='YUL', icao_id='CYUL')

Voici quelques exemples de ce module en action:

Vous pouvez aussi lire le code source de ce module.