SQLAlchemy Aggregator

  1. News
  2. Installation
  3. Basic usage
  4. Reference
    1. Aggregators
    2. Aggregations
  5. To Do
  6. Download

News

  • as of 20.06.2008 - supporting project by me (Paul Colomiets) is suspended, there is more recent version at  https://dbcook.svn.sourceforge.net/svnroot/dbcook/trunk/dbcook/misc/aggregator
  • 18.11.2007 - We now have this trac installation, and svn (see below)
  • 27.08.2007 - Version 0.1.4. Code somewhat refactored, small bugfix changes by Svilen. Experimental Average aggregator support (0.1.3 was aimed to fix bugs in inherited mappers but actually fixed nothing)
  • 09.08.2007 - Version 0.1.2. Bugs fixed (pointed out in  SQLAlchemy mailing list)
  • 08.08.2007 - Version 0.1.1, UPDATE support, Sum aggregator fixed (was missing unittests, and not worked)
  • 07.08.2007 - Initial version

Installation

Installation is as easy as typing:

easy_install SQLAlchemyAggregator

Basic usage

Assuming we have some abstract blocks table with children lines table, and we want to store count of lines in block.lines and id of last line in block.lastline

Your tables should look like:

blocks = Table('blocks', metadata,
    Column('id', Integer, primary_key=True, autoincrement=True),
    ....
    Column('lines', Integer, default=0),
    Column('lastline', Integer))
lines = Table('lines', metadata,
    Column('id', Integer, primary_key=True, autoincrement=True),
    Column('block', Integer, ForeingKey(blocks.c.id)),
    ....)

Few things you should know:

  • You must have a foreign key to parent table
  • If there are several first is chosen
  • If you'd not specified default for lines you have block.lines = None until you add first row

(well, all thease restrictions should go in future releases)

Then we need to create mappers:

>>> mapper(Block, blocks)
<sqlalchemy.orm.mapper.Mapper object at 0x8507c2e>
>>> mapper(Line, lines,
...     extension=aggregator.Quick(
...         aggregator.Max(blocks.c.lastline, lines.c.id), aggregator.Count(blocks.c.lines),
...     ))
<sqlalchemy.orm.mapper.Mapper object at 0x8507e4c>

Example is mostly self-explanatory, some details will be later.

Then you can just create, and delete lines, and blocks' counter will be updated appropriately:

>>> block = Block('something')
>>> session.save(block)
>>> session.flush() # this is needed to get block.id
>>> line = Line()
>>> line.block = block.id
>>> session.save(line)
>>> session.flush()
>>> session.refresh(block)
>>> block.lines
1

More things you should know:

  • You need to save and flush children and reload parent object before value will be updated
  • You shouldn't change foreign key field line.block in this case, as well as primary key of parent object (block.id) to get accurate results
  • Aggregator uses atomic updates so if other two restrictions are kept value will be accurate even in highly concurrent environment

You could also do like this:

>>> mapper(Line, lines,
...     extension=aggregator.Quick(
...         aggregator.Max(blocks.c.lastline, lines.c.id), aggregator.Count(blocks.c.lines),
...     ), properties = {
...         '_block': blocks.c.block,
...         'block': relation(Block),
...     })
<sqlalchemy.orm.mapper.Mapper object at 0x8507e4c>
>>> block = Block('something')
>>> line = Line()
>>> line.block = block
>>> session.save(block)
>>> session.save(line)
>>> session.flush()
>>> session.refresh(block)
>>> block.lines
1

With help of relation you don't need to flush and get id before storing children object.

Reference

Aggregators

There are two basic aggregator clases:

  • Accurate - this class uses subquery with aggregation function (e.g. UPDATE blocks SET line=(SELECT COUNT(*) FROM lines WHERE id = blocks.id) WHERE id = ?) to update field after each change of relevant data
  • Quick - this one uses atomic updates to update values where possible (e.g. UPDATE blocks SET line=line+1 WHERE id = ?), but fallbacks to behavior of Accurate in corner cases (e.g. determining the max value when row was deleted)

You can use both for distinct columns using list syntax:

import aggregator as a
mapper(Line, lines,


     extension=[a.Quick(a.Count(blocks.c.lines)), a.Accurate(a.Max(blocks.c.lastline, lines.c.id))],
     ))

Aggregations

Curenntly only four functions available:

  • Count - count of child rows. Arguments:
    • target - field in which number will be stored (in parent table)
  • Max, Min - maximum and minimum of specified fields respectively
    • target - field in which number will be stored (in parent table)
    • source - field which max or min will be calculated (in children table)
  • Sum - sum of the specified field of all children
    • target - field in which number will be stored (in parent table)
    • source - field which sum will be calculated (in children table)

To Do

  • Other aggregate functions
  • Throw warnings when value can be "out of sync"
  • Document interface for adding own aggregation types (it's easy, today you can look at the source)
  • Remove foreign key requirement, allow multiple conditions
  • Test library on databases other than MySQL and SQLite

For latter, just execute:

python tests/simpletest.py scheme://login@host/dbname?everything=you_need

from distribution directory and leave feed back in comments here. By default it uses SQLite's memory database.

Download