Durabledict for App Engine

tldr;

DatastoreDict.

What’s a durabledict?

Good question. Durabledict is a Python implementation of a persistent dictionary. The dictionary values are cached locally and sync with the datastore whenever a value in the datastore changes.

Disqus provides concrete implementations for Redis, Django, ZooKeeper, and in-memory storage. This blog post details an implementation backed by the App Engine datastore and memcache.

Creating your own durabledict

By following the guide in the durabledict README we can create our own implementation. We need to subclass durabledict.base.DurableDict and implement the following interface methods. Strictly speaking, _pop and _setdefault do not have to be implemented, but doing so makes your durabledict behave like a built-in dict in all cases.

persist(key, value) - Persist value at key to your data store.

depersist(key) - Delete the value at key from your data store.

durables() - Return a key=val dict of all keys in your data store.

last_updated() - A comparable value of when the data in your data store was last updated.

_pop(key, default=None) - If key is in the dictionary, remove it and return its value, else return default. If default is not given and key is not in the dictionary, a KeyError is raised.

_setdefault(key, default=None) - If key is in the dictionary, return its value. If not, insert key with a value of default and return default. default defaults to None.
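Before touching the datastore, it helps to see the interface in its smallest possible form. The sketch below (a hypothetical MemoryBackedDict, not part of durabledict) backs all six methods with a plain dict and an integer version counter; it skips the DurableDict base class so it runs standalone.

```python
class MemoryBackedDict(object):
    """Minimal sketch of the durabledict storage interface, backed by a dict."""

    def __init__(self):
        self._store = {}
        self._version = 0  # stands in for the last_updated bookkeeping

    def persist(self, key, value):
        self._store[key] = value
        self._version += 1

    def depersist(self, key):
        del self._store[key]
        self._version += 1

    def durables(self):
        # Return a copy so callers cannot mutate the backing store directly.
        return dict(self._store)

    def last_updated(self):
        return self._version

    def _setdefault(self, key, default=None):
        if key in self._store:
            return self._store[key]
        self._store[key] = default
        self._version += 1
        return default

    def _pop(self, key, default=None):
        if key in self._store:
            value = self._store.pop(key)
            self._version += 1
            return value
        if default is not None:
            return default
        raise KeyError(key)
```

The datastore implementation below follows exactly this shape, swapping the dict for ndb entities and the counter for a memcache key.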

Let’s implement these one-by-one.

persist(key, value)

Persisting a value to the datastore is a relatively simple operation. If the key already exists we update its value; if it does not, we create it. To aid with this operation we create a get_or_create helper that returns an existing entity if one exists, or creates a new entity if one does not.

def persist(self, key, val):
    instance, created = get_or_create(self.model, key, val)

    if not created and instance.value != val:
        instance.value = val
        instance.put()

    self.touch_last_updated()

The last line of this function updates the last time this durabledict was changed, which is used for caching. We create the last_updated and touch_last_updated functions next.

last_updated() and touch_last_updated()

def last_updated(self):
    return self.cache.get(self.cache_key)

def touch_last_updated(self):
    self.cache.incr(self.cache_key, initial_value=self.last_synced + 1)
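The counter pattern here mirrors memcache semantics: add sets a key only if it is absent, and incr bumps it, falling back to initial_value when the key has been evicted. A tiny stand-in (the StubCache class below is hypothetical, for illustration only; it is not the real memcache client) shows why a stale reader notices a write:

```python
class StubCache(object):
    """Hypothetical stand-in for memcache's add/incr/get, for illustration only."""

    def __init__(self):
        self._data = {}

    def add(self, key, value):
        # Like memcache.add: set the key only if it is absent.
        if key not in self._data:
            self._data[key] = value

    def incr(self, key, initial_value=None):
        # Bump the counter; if the key was evicted, recreate it from
        # initial_value (simplified relative to the real client).
        if key not in self._data:
            if initial_value is None:
                return None
            self._data[key] = initial_value
            return self._data[key]
        self._data[key] += 1
        return self._data[key]

    def get(self, key):
        return self._data.get(key)


cache = StubCache()
cache.add('__DatastoreDict:LastUpdated__', 1)

# A reader syncs and remembers the version it saw.
last_synced = cache.get('__DatastoreDict:LastUpdated__')

# A writer persists a value and touches the counter.
cache.incr('__DatastoreDict:LastUpdated__', initial_value=last_synced + 1)

# The reader's next last_updated() call exceeds last_synced, triggering a resync.
needs_resync = cache.get('__DatastoreDict:LastUpdated__') > last_synced
```

Passing initial_value=self.last_synced + 1 matters: if memcache evicts the counter, recreating it at a value above what any live reader has seen still forces those readers to resync.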

__init__

We now have the building blocks to create our initial durabledict. Within the __init__ method we set a manager and cache instance. The manager is responsible for ndb datastore operations to decouple the ndb interface from the durabledict implementation. We decouple our caching method in a similar fashion. We also set the initial value of the cache whenever we create a new instance of the durabledict.

from google.appengine.api import memcache

from durabledict.base import DurableDict
from durabledict.encoding import NoOpEncoding


class DatastoreDict(DurableDict):

    def __init__(self,
                 model,
                 value_col='value',
                 cache=memcache,
                 cache_key='__DatastoreDict:LastUpdated__'):

        self.model = model
        self.value_col = value_col
        self.cache = cache
        self.cache_key = cache_key

        self.cache.add(self.cache_key, 1)

        super(DatastoreDict, self).__init__(encoding=NoOpEncoding)

depersist(key)

Depersist deletes a key from the dictionary (and the datastore). Here we assume a helper method delete that, given an ndb model and a string representing its key, deletes the entity. Since the data has changed, we also touch the last-updated value to force a cache invalidation and data refresh.

def depersist(self, key):
    delete(self.model, key)
    self.touch_last_updated()

durables()

durables() returns the entire dictionary. Since we fetch all matching entities from the datastore, it is important to keep your dictionary relatively small: as the dictionary grows in size, resyncing its state with the datastore gets more and more expensive. This function assumes a get_all helper that returns all instances of a model.

def durables(self):
    encoded_models = get_all(self.model)
    return dict((model.key.id(), getattr(model, self.value_col)) for model in encoded_models)

_setdefault(key, default=None)

_setdefault() overrides the built-in dict setdefault, which inserts a key with the default value if it does not exist, and returns the existing value if it does.

For example, the following sequence of code creates a key for y, which does not exist, and returns the existing value for x.

>>> d = {'x': 1}
>>> d.setdefault('y', 2)
2
>>> d
{'y': 2, 'x': 1}
>>> d.setdefault('x', 3)
1
>>> d
{'y': 2, 'x': 1}

We can implement _setdefault using the get_or_create helper method, updating the cache if we have changed the dictionary.

def _setdefault(self, key, default=None):
    instance, created = get_or_create(self.model, key, default)

    if created:
        self.touch_last_updated()

    return getattr(instance, self.value_col)

_pop(key, default=None)

_pop returns the value for a key and deletes the key. This is fairly straightforward given the get and delete helper methods.

def _pop(self, key, default=None):
    instance = get(self.model, key)

    if instance:
        value = getattr(instance, self.value_col)
        delete(self.model, key)
        self.touch_last_updated()
        return value

    if default is not None:
        return default

    raise KeyError(key)
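One subtlety in the implementation above: because the signature uses default=None, _pop cannot distinguish pop(key) from an explicit pop(key, None), so passing None as the default still raises KeyError. If you need that distinction, the standard Python fix is a private sentinel object. A standalone sketch (the helper name pop_with_sentinel is made up for illustration):

```python
_MISSING = object()  # sentinel no caller can pass by accident


def pop_with_sentinel(store, key, default=_MISSING):
    """Dict-style pop that distinguishes 'no default given' from 'default is None'."""
    if key in store:
        return store.pop(key)
    if default is _MISSING:
        raise KeyError(key)
    return default
```

The same pattern would drop into _pop by comparing default against the sentinel instead of against None.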

The Help

The previous discussion uses a few helper methods that we haven't defined yet. Each of these helpers takes an arbitrary ndb model class and performs an operation on it. (Note that get_or_create assigns to a value attribute directly, so as written it assumes the default value_col='value'.)

def build_key(cls, key):
    return ndb.Key(DatastoreDictAncestorModel,
                   DatastoreDictAncestorModel.generate_key(cls).string_id(),
                   cls, key.lower(),
                   namespace='')


@ndb.transactional
def get_all(cls):
    return cls.query(
        ancestor=DatastoreDictAncestorModel.generate_key(cls)).fetch()


@ndb.transactional
def get(cls, key):
    return build_key(cls, key).get()


@ndb.transactional
def get_or_create(cls, key, value=None):
    key = build_key(cls, key)

    instance = key.get()
    if instance:
        return instance, False

    instance = cls(key=key, value=value)
    instance.put()

    return instance, True


@ndb.transactional
def delete(cls, key):
    key = build_key(cls, key)
    return key.delete()

The last item of note is the use of a parent for each DatastoreDict. This common ancestor forces strong read consistency for the get_all method, allowing us to update a dictionary and have a consistent view of the data on subsequent reads. We use an additional model to provide the strong read consistency.

class DatastoreDictAncestorModel(ndb.Model):

    @classmethod
    def generate_key(cls, child_cls):
        key_name = '__%s-%s__' % ('ancestor', child_cls.__name__)
        return ndb.Key(cls, key_name, namespace='')