At VendAsta we have a few APIs that are backed by search documents built using the App Engine Search API. These APIs are queried using a search string entered in a text box. One way to improve the user experience of this text box is to offer the user suggestions of popular searches to use as their query. This article describes how to achieve this.

Finding the most likely search terms

Before presenting suggestions to the user, we need to collect data on which searches are popular. For each prefix (i.e., an incomplete search term), this data records the search terms most often entered with it. For example, given the incomplete search term ne, we need to return the most frequent searches that begin with that prefix.

[Figure: Incomplete search]

Suppose the user searches for the term Netflix. We then increment the frequency of the (prefix, search term) tuple for each prefix of Netflix. The end result is a set of datastore entities, one for each (prefix, search term) tuple.

[Figure: Netflix search]

If the term Netflix is searched for a second time, we increment the frequency count of each (prefix, search term) tuple again, bringing each count to 2.

[Figure: Tuples of Netflix]

Now suppose one person searched for the term news. We build up our frequency table with each (prefix, search term) tuple again, using news as the search term.

[Figure: Tuples of news]
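
The bookkeeping described so far can be sketched in memory with a plain counter, before any datastore is involved. This is only an illustration: the function name record_search is hypothetical, and two choices here are assumptions rather than part of the original design: terms are lower-cased so that ne matches Netflix, and the complete term is counted as one of its own prefixes.

```python
from collections import Counter


def record_search(table, search_term):
    """Increment the count of every (prefix, search_term) pair."""
    term = search_term.lower()  # assumption: case-insensitive matching
    # Non-empty prefixes of length 1..len(term); including the full term
    # is an assumption made for this sketch.
    for end in range(1, len(term) + 1):
        table[(term[:end], term)] += 1


table = Counter()
record_search(table, "Netflix")
record_search(table, "Netflix")
record_search(table, "news")

print(table[("ne", "netflix")])  # 2
print(table[("ne", "news")])     # 1
```

After two searches for Netflix and one for news, the prefix ne is associated with both terms, with Netflix carrying the higher count.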

Once we’ve assembled the data, we can go back to our original problem of finding the most likely searches for a given incomplete search. With our dataset, this is a lookup of every record matching the prefix, ordered by descending frequency.

[Figure: Ordered table]
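
The lookup itself can be illustrated with a small in-memory sketch. The function name suggest, the limit parameter, and the alphabetical tie-break are all hypothetical choices for this example, and the table is hand-built to match the running Netflix and news example rather than taken from real data.

```python
from collections import Counter


def suggest(table, prefix, limit=5):
    """Return the search terms recorded under `prefix`, most frequent first."""
    matches = [(term, count) for (p, term), count in table.items() if p == prefix]
    # Sort by descending frequency; ties are broken alphabetically (an assumption).
    matches.sort(key=lambda pair: (-pair[1], pair[0]))
    return [term for term, _ in matches[:limit]]


# Hand-built excerpt of the frequency table: Netflix searched twice, news once.
table = Counter({
    ("ne", "netflix"): 2,
    ("ne", "news"): 1,
    ("net", "netflix"): 2,
    ("new", "news"): 1,
})

print(suggest(table, "ne"))   # ['netflix', 'news']
print(suggest(table, "new"))  # ['news']
```

In the datastore version, the filter and the sort are pushed into the query instead of being done in Python.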

A sample implementation

The following is a sample implementation encapsulating the ideas presented above.

from google.appengine.ext import ndb

class SearchSuggestionModel(ndb.Model):
    """ Model class for scoring of search frequency. """

    created = ndb.DateTimeProperty(auto_now_add=True)
    updated = ndb.DateTimeProperty(auto_now=True)

    prefix = ndb.StringProperty(required=True)
    search_term = ndb.StringProperty(required=True)
    frequency = ndb.IntegerProperty(required=True, default=0)

    @classmethod
    def build_key(cls, prefix, search_term, pid):
        """ Builds a key in the partner's namespace (the upper-cased pid). """
        id_ = "%s:%s" % (prefix, search_term)
        return ndb.Key(cls, id_, namespace=pid.upper())

    @classmethod
    def prefix_query(cls, prefix, pid):
        """ Return all models with the matching prefix. Ordered by frequency. """
        # Upper-case the pid so the query uses the same namespace as build_key.
        return cls.query(cls.prefix == prefix, namespace=pid.upper()).order(-cls.frequency)

    @classmethod
    def increment(cls, search_term, partner_id):
        """ Given a search_term, increment each (prefix, search_term) combination for all prefixes of that search_term. """
        if not search_term:
            return

        entities = []

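        # index runs from 0 to len(search_term) - 1, so the slices below are the
        # proper prefixes of search_term; the empty one (index 0) is skipped.
        for index, _ in enumerate(search_term):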
            prefix = search_term[0:index]
            if prefix:
                key = cls.build_key(prefix, search_term, partner_id)
                entity = key.get()
                if entity:
                    entity.frequency = entity.frequency + 1
                else:
                    # Put new entity
                    entity = cls(key=key, prefix=prefix, search_term=search_term, frequency=1)