When doing a MapReduce operation there are times when you want to edit a set of entities without triggering the post or pre put hooks associated with those entities. On such ocassions using the raw datastore entity allows you to process the data without unwanted side effects. This article will show how to use the RawDatastoreInputReader to process datastore entities.
When doing a MapReduce operation there are times when you want to edit a set of entities without triggering the post or pre put hooks associated with those entities. On such ocassions using the raw datastore entity allows you to process the data without unwanted side effects.
For the sake of this discussion let’s assume we want to move a phone_number
field to a work_number
field for all entities of a certain Kind in the datastore.
Getting the raw datastore entity
The MapReduce library provides a RawDatastoreInputReader
that will feed raw datastore entities to your mapping function. We can set our MapReduce operation to use the RawDatastoreInputReader
using a mapreduce.yaml
declaration.
- name: move_phone_numbers
mapper:
input_reader: mapreduce.input_readers.RawDatastoreInputReader
handler: app.pipelines.move_phone_numbers_map
params:
- name: entity_kind
default: MyModel
Manipulating a raw datastore entity
Our raw_datastore_map
function to use the datastore entity in its raw form. The raw form of the datastore entity provides a dictionary like interface that we can use to manipulate the entity. With this interface we can move the phone number to the correct field.
def move_phone_numbers_map(entity):
phone_number = entity.get('phone_number')
if phone_number:
entity['work_number'] = phone_number
del entity['phone_number']
yield op.db.Put(entity)
Using op.db.Put
will put the entity to the datastore using the raw datastore
API, thereby bypassing any ndb hooks that are in place. For more information on
the raw datastore API the best resource is the source code itself, available
from the App Engine SDK
repository.