A Python ORM for the ORM haters: A minimal implementation in less than 100 lines

Object-relational mappers, or ORMs, continue to be controversial. There are many critics out there (Ted Neward famously called them the Vietnam of computer science), but many others swear by them. They remain the default in big web frameworks like Rails and Django.

I'll state my own position - I'm mostly a hater. The main purported benefit is helping you avoid contact with SQL, but this abstraction layer is leaky: anything remotely serious involving the database is going to mean getting your hands dirty with SQL sooner or later. And if you aren't familiar with their internals, ORMs can feel very opaque and reliant on "magic".

Another problem is lock-in. ORMs are heavily tied to their particular framework, and typically go beyond the role of query helper, taking on schema definition and migration as well. This makes it hard to swap out individual parts, or to play nicely with other apps and change management tools that might use the same database. (These drawbacks frustrated me enough that I wrote a whole non-ORM-reliant migration tool to address them.)

Going off-piste

Nevertheless, if we decide we don't like ORMs, what can we do instead? ORMs are so ubiquitous that writing application code without one can feel like seriously uncharted territory. Say what you like about 'em, at least ORMs are well-documented!

And while aspects of existing ORM implementations can be frustrating, organising our logic into objects and entities seems conceptually sound as a general principle.

Also, we don't want to end up defining fields twice, once in the schema and once in the object definition: we want to specify our fields only once.

What's the minimum amount of code we can get away with that lets us work in accordance with these principles, but avoids the heavy weight and coupling of a full-on ORM?

Principles for something minimal

Let's see what we can throw together that gives us the following:

- fields defined in one place only - the database schema
- real Python objects representing our entities
- queries in plain SQL, fetched in a single round trip
- no framework lock-in - just a small amount of code we control

This definitely won't be suitable as-is for production use. But it shows how you might approach wiring together your own database interaction code that works best for your needs and preferences.

And it definitely won't have feature parity with a full ORM - that's not the point. This is an exercise in trading intricate complexity for directness and explicitness.

Our tools

For this example we'll use Python and PostgreSQL. If you're following along, use recent versions of both - we'll be using dataclasses, which were added to the Python standard library in version 3.7.

We'll be using the results package to help us: it handles connecting to the database, running queries, and inspecting the schema.

Defining and creating

Let's pretend we're writing some software to manage books in a library. A simplified schema might look like this:

create table book(
    book_id uuid default gen_random_uuid() primary key,
    title varchar,
    author varchar
);

create table borrower(
    borrower_id uuid default gen_random_uuid() primary key,
    name varchar
);

create table issue(
    issue_id uuid default gen_random_uuid() primary key,
    book_id uuid,
    borrower_id uuid,
    issued_at timestamptz default now()
);

We'd like a class to represent each of these entities. We can do this succinctly by inspecting the schema and using the result to define dataclasses:

from dataclasses import make_dataclass, field as dcfield
from pathlib import Path
import results

db = results.db('postgresql:///library')

schema_sql = Path('schema.sql').read_text()

db.raw(schema_sql)
# or you can sync the schema (requires pg running locally)
# db.sync_db_structure_to_definition(schema_sql)

i = db.inspect()

def make_class(table_name):
    # Look up the inspected table. The key format here assumes the
    # default "public" schema and quoted identifiers - adjust to match
    # whatever your inspection results actually use.
    it = i.tables[f'"public"."{table_name}"']

    classname = table_name.title()

    columns = [
        (c.name, c.pytype, dcfield(default=None))
        for c
        in it.columns.values()
    ]

    return make_dataclass(
        classname, columns
    )

Book = make_class('book')
Issue = make_class('issue')
Borrower = make_class('borrower')

This small amount of code gives us real classes but with their fields defined from the database.
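For comparison, the generated Book class is roughly what you'd get by writing this by hand (the exact field types depend on what the inspection reports for each column - UUID and str here are illustrative):

from dataclasses import dataclass
from uuid import UUID

@dataclass
class Book:
    book_id: UUID = None
    title: str = None
    author: str = None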

We can create a Book object confident that we're setting the correct fields.

book = Book(title='Moby Dick', author='Herman Melville')  # WORKS
book = Book(title='Moby Dick', bad_field_name='Herman Melville')  # DOESN'T - raises TypeError

Saving

So far so good. Let's add some saving code. We don't want to have to implement that individually for each object, so let's make all our entity objects inherit from a class called Entity, and add some save code to that.

To do that, we just need to declare the class, and add a parameter to the make_dataclass call from earlier:

return make_dataclass(
    classname, columns, bases=(Entity,)
)

We'd like to be able to save an entity like this:

book.save(db)

We can implement this as follows, taking advantage of the convention that a table's primary key is the table name suffixed with _id.

from dataclasses import asdict

class Entity:
    def save(self, db):
        # by convention, table "book" has primary key "book_id"
        table_name = type(self).__name__.lower()
        id_field = f'{table_name}_id'

        fields = asdict(self)
        entity_id = fields[id_field]

        if entity_id is None:
            # not yet saved: plain insert, letting the database generate the id
            upsert_on = None
            fields.pop(id_field)
        else:
            # already has an id: upsert on the primary key
            upsert_on = [id_field]

        inserted = db.insert(table_name, fields, upsert_on=upsert_on)

        if entity_id is None:
            # store the newly generated primary key back on the object
            inserted_id = getattr(inserted.one(), id_field)
            setattr(self, id_field, inserted_id)

This uses PostgreSQL's upsert (insert ... on conflict) to either insert a new entity (if the primary key is not set, we assume it hasn't been saved yet) or update an existing entity by its primary key.
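In use, this looks something like the following sketch (assuming the generated classes now inherit from Entity):

book = Book(title='Moby Dick', author='Herman Melville')

book.save(db)  # book_id was None, so this inserts and populates book.book_id

book.author = 'Herman Melville (1819-1891)'
book.save(db)  # book_id is now set, so this upserts on the primary key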

The dataclasses module helps us here again, with its built-in asdict function that returns an instance's fields as a dictionary.
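For instance, for an unsaved book, this would give something like:

asdict(book)
# e.g. {'book_id': None, 'title': 'Moby Dick', 'author': 'Herman Melville'}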

That takes care of saving.

Querying

For the final piece, we need a way to do the most complicated part: querying. Our preference here is to write our selects in vanilla SQL, for maximum control and directness and minimal magic.

For instance we might wish to write a query like this:

select * from book where ...

and get the corresponding list of Book objects back.

We'll need some code that enables this from Python, something like:

books = query_to_objects(db, 'select * from book', Book)

All that remains is to implement this query_to_objects function, which we can do straightforwardly as follows:

from dataclasses import fields as dataclass_fields

def query_to_objects(db, query, _class):
    rows = db.ss(query)

    fields = [
        _.name
        for _
        in dataclass_fields(_class)
    ]

    return [
        _class(**{
            f: row[f]
            for f
            in fields
        })
        for row in rows
    ]
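For example, a hypothetical query against our book table:

melville_books = query_to_objects(
    db,
    "select * from book where author = 'Herman Melville'",
    Book,
)
# melville_books is a plain list of Book instances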

Joins and n+1

But of course, we typically want joined data too.

A common problem with ORMs is the n+1 problem, which we want to avoid by fetching all the data we need in a single query.

Take this query for example. We want all the overdue books in the library (the where clause here is somewhat arbitrary - let's assume all books are issued for a month):

select
    *
from
    issue i
    join book using (book_id)
    join borrower using (borrower_id)
where
    i.issued_at < now() - '1 month'::interval

We need to get back the resulting Issue objects, plus the linked Book and Borrower objects for each.

That means we need to extend our loading method a bit.

We need to not only load the linked classes, but correctly set up the links between each object based on their rows. These relationships could be one-to-one or one-to-many so we need to handle either situation.

We'd want to call it something like this:

issues = query_to_objects(
    db,
    OVERDUE_BOOKS,
    _class=Issue,
    linked_classes=[Book, Borrower]
)

This would return a list of Issues, with the linked Book and Borrower objects attached to each.

A rough implementation is below.

Queries involving one-to-many relationships return multiple rows for the initial entity, so there is some grouping involved.

def query_to_objects(db, query, _class, linked_classes=None):
    linked_classes = linked_classes or []

    rows = db.ss(query)

    class_name = _class.__name__.lower()
    obj_id_field = f'{class_name}_id'

    fields = [
        _.name
        for _
        in dataclass_fields(_class)
    ]

    # one-to-many joins repeat the initial entity across rows,
    # so group the rows, one group per distinct entity
    grouped = rows.grouped_by(columns=fields).values()

    objects_with_rows = [
        (_class(**{f: g[0][f] for f in fields}), g)
        for g in grouped
    ]

    for obj, obj_rows in objects_with_rows:
        for linked_class in linked_classes:
            linked_fields = [
                _.name
                for _
                in dataclass_fields(linked_class)
            ]
            table_name = linked_class.__name__.lower()
            id_field = f'{table_name}_id'

            linked_objects = [
                linked_class(
                    **{k: r[k] for k in linked_fields}
                )
                for r in obj_rows
            ]

            # set a back-reference from each linked object to the parent
            for o in linked_objects:
                setattr(o, class_name, obj)

            if hasattr(obj, id_field):  # one-to-one
                setattr(obj, table_name, linked_objects[0])
            elif obj_id_field in linked_fields:  # one-to-many
                setattr(obj, f'{table_name}s', linked_objects)

    return [_[0] for _ in objects_with_rows]
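With this in place, working with the results is natural. A sketch, assuming OVERDUE_BOOKS holds the overdue query from above:

issues = query_to_objects(
    db,
    OVERDUE_BOOKS,
    _class=Issue,
    linked_classes=[Book, Borrower],
)

for issue in issues:
    # book and borrower are one-to-one links, set as singular attributes
    print(f'{issue.book.title} is overdue ({issue.borrower.name})')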

This has obvious limitations - in particular, it won't handle longer chains of joins without further enhancement. But for more complex joins, you might wish to use a view anyway.

Views and other non-table selectables often fit awkwardly into the conceptual model of full-fledged ORMs, but can be integrated quite easily into code like this (you can inspect views just like tables, and generate classes similarly).
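As a sketch of that (assuming views show up in the inspection results the same way tables do - adjust the lookup in make_class if your inspection library keeps them separate):

# hypothetical: wrap the overdue join in a view, then treat it like a table
db.raw("""
    create view overdue_issue as
    select
        issue_id, issued_at, title, name as borrower_name
    from
        issue
        join book using (book_id)
        join borrower using (borrower_id)
    where
        issued_at < now() - '1 month'::interval
""")

i = db.inspect()  # re-inspect so the new view is visible

OverdueIssue = make_class('overdue_issue')

overdue = query_to_objects(db, 'select * from overdue_issue', OverdueIssue)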

Fully-fledged ORMs have quite sophisticated change tracking and saving features (for instance, SQLAlchemy's Unit of Work pattern). Your own database integration code won't give you this - again, going ORMless means trading magical implicitness for controlled explicitness.

Next steps

This is an example of a general approach, not something you should actually use as-is. If you'd like to play around with the code, take a look at the gist.

And let me know what you think via the links at the bottom of the page.

If you enjoyed this, you might also enjoy my talk on database migrations: Your migrations are bad, and you should feel bad