For the past year I’ve been making a concerted effort to learn French using the methods from the book Fluent Forever, which is an excellent resource for learning how to learn a language. For those not familiar with the method, it boils down to this:
- Learn Pronunciation: knowing how to correctly pronounce words in your target language makes everything else easier.
- Learn Frequently Used Words: not all words are created equal, learn the most frequently used words first.
- Learn Grammar: put together grammatical sentences using the words you already know.
If you turn your head to the side and squint at that list, it somewhat resembles the steps you would take to learn a language as an infant — first understand the sounds of the language, then learn words (“mommy”, “daddy”), and finally put together correct sentences. In addition, as an infant you have a constant source of high quality input helping you learn words and grammar. You can imagine the following “conversation” between an adult and a hungry child:
- “Tommy, do you want an apple?”
- (points at Apple)
- “Apple? Want Apple?”
- (points at Apple again)
- “Apple now?”
It won’t take long for that hungry infant to connect the word “Apple” to the object in front of them. This type of reinforcement works great for an infant, but as adults we have to simulate it with frequent review. This is where a spaced-repetition program can help. If you aren’t familiar with spaced repetition, the simplified version is that you create flash cards and review them at the points in time that help you remember them best. Spaced-repetition software computes the optimal time to review each flash card to help drive it into long-term memory. The most flexible and feature-rich spaced-repetition program I’ve found is Anki, which I’ve been using with a lot of success to learn frequently used words.
Now, a particular sticking point in my progress learning French is verb conjugation. I’m tackling it with Anki by finding grammatically correct sentences and creating flash cards from them. These cards ask you for the root form of a verb and for the conjugated form that fits grammatically in the example sentence (for a full explanation of the method, see the Fluent Forever blog). To reinforce correct grammar and pronunciation, each sentence should ideally be accompanied by a recording of a native speaker saying it. Unfortunately, it’s not always easy to find a native speaker willing to record sentences for you. This is where Amazon Polly comes in.
Amazon Polly is a service that turns text into speech in a wide variety of languages and voices. By leveraging Polly, you can easily create good-quality approximations of native speech for learning a language. To help automate the creation of these recordings, I created a simple serverless web application that takes text as input, turns that text into speech using Polly, and stores the result in S3. The rest of this post describes this application. Full source code is available on GitHub.
The API for this simple service exposes two endpoints: one for creating a recording and one for retrieving recordings. These endpoints are exposed through API Gateway and are backed by Lambda functions. The Lambda functions handle converting text to speech and storing that speech in S3. A DynamoDB table lists all the recordings and their locations in S3.
The following diagram shows the application architecture.
When the user wants to create a new recording:
- An HTTP call is made to the create endpoint exposed by API Gateway.
- API Gateway invokes a Lambda function responsible for converting the text into speech and storing the result. The function performs the following steps:
- Use Polly to convert text into an audio file.
- Store the result in S3.
- Store a record of the input text and the resulting mp3 file location in DynamoDB.
When the user wants to get an existing recording:
- An HTTP call is made to the get endpoint exposed by API Gateway.
- API Gateway invokes a Lambda function responsible for retrieving the record data from DynamoDB.
- The user uses the S3 URL returned by DynamoDB to download the mp3 file.
Now let’s walk through how to create the application using the Chalice serverless framework from AWS labs.
Creating The Backing Resources
Our serverless application relies on two AWS resources: an S3 bucket to store recorded speech, and a DynamoDB table to index the S3 URL for the recorded text. Since I don’t want to keep these recordings forever, I set an expiration time of two days on all S3 objects in the bucket, and also configured a time-to-live of two days for DynamoDB entries. The following CloudFormation template creates the required resources:
In the Resources section, we create TranslationsBucket, an S3 bucket. The bucket includes a LifecycleConfiguration rule specifying when objects placed in the bucket expire. The TranslationsTable is a DynamoDB table with a simple id as the primary hash key. The TimeToLiveSpecification names the Dynamo attribute we will use to expire records and enables TTL for the table. Note that Dynamo does not require you to define a full schema ahead of time; you only need to specify the key to start using the table.
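As a rough illustration of the resources just described, here is a minimal sketch that builds an equivalent template as a Python dictionary and prints it as JSON, which CloudFormation also accepts. The `BucketName` and `TableName` parameters, the rule `Id`, the string type of `id`, and the on-demand billing mode are my assumptions for the sketch, not details taken from the original template.

```python
import json


def build_template(expire_days=2):
    """Return a CloudFormation template (as a dict) for the bucket and table."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        # Assumed parameters so the bucket and table names can be chosen at deploy time.
        "Parameters": {
            "BucketName": {"Type": "String"},
            "TableName": {"Type": "String"},
        },
        "Resources": {
            "TranslationsBucket": {
                "Type": "AWS::S3::Bucket",
                "Properties": {
                    "BucketName": {"Ref": "BucketName"},
                    # Expire every object a fixed number of days after creation.
                    "LifecycleConfiguration": {
                        "Rules": [
                            {
                                "Id": "ExpireRecordings",  # hypothetical rule name
                                "Status": "Enabled",
                                "ExpirationInDays": expire_days,
                            }
                        ]
                    },
                },
            },
            "TranslationsTable": {
                "Type": "AWS::DynamoDB::Table",
                "Properties": {
                    "TableName": {"Ref": "TableName"},
                    # Only the key attribute has to be declared up front.
                    "AttributeDefinitions": [
                        {"AttributeName": "id", "AttributeType": "S"}
                    ],
                    "KeySchema": [{"AttributeName": "id", "KeyType": "HASH"}],
                    "BillingMode": "PAY_PER_REQUEST",  # assumption: on-demand capacity
                    # Dynamo deletes items once the epoch time in `expires` has passed.
                    "TimeToLiveSpecification": {
                        "AttributeName": "expires",
                        "Enabled": True,
                    },
                },
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_template(), indent=2))
```

Generating the template from code is only a convenience here; a hand-written YAML or JSON file with the same properties would work equally well.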
You can deploy this CloudFormation template to create the required resources for our serverless application. Be sure to specify the desired name for your S3 bucket and for your Dynamo table.
The Chalice Application
With our resources ready to use, we can create the Chalice application that implements our service. The following sequence of commands creates a new Chalice application:
You can then deploy and test the simple hello world example:
The create endpoint is responsible for synthesizing text into speech, storing the result in S3, and indexing the S3 URL in Dynamo for future retrieval.
Using Chalice, we define a route called recordings that accepts POST requests. We also enable CORS support and require an API key for a minimal layer of security. You can add the endpoint to the Chalice file:
Now we need to fill this out to implement the desired functionality.
This function starts by accessing the JSON body of the current request, available from the Chalice request metadata. From there, it extracts the text from the request and converts that text to speech using a randomly chosen French voice.
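The request handling described above could look roughly like the sketch below. It is written as a plain function so it can be exercised without deploying anything. The `text` key, the `parse_request` name, and the list of voice IDs are assumptions on my part; check the current Polly voice list before relying on them.

```python
import random

# Assumed French voice IDs for Polly; verify against the current voice list.
FRENCH_VOICES = ["Celine", "Lea", "Mathieu"]


def parse_request(body, rng=random):
    """Extract the text to speak and pick one French voice at random.

    `body` is the decoded JSON body of the request; in a Chalice view this
    would come from `app.current_request.json_body`.
    """
    text = str((body or {}).get("text") or "").strip()
    if not text:
        raise ValueError("request body must include a non-empty 'text' field")
    return text, rng.choice(FRENCH_VOICES)
```

In a Chalice application, a view calling this helper would be registered with something like `@app.route('/recordings', methods=['POST'], cors=True, api_key_required=True)`.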
We can now implement each of the functions required to create a recording.
Synthesizing speech requires an API call to Polly with the text to synthesize and the voice to speak with. We save the result to the Lambda function’s temporary file system.
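A minimal sketch of this step follows, assuming the Polly client is passed in (for example `boto3.client('polly')`) so the function can be tested with a stand-in. The function name and the random file name are illustrative choices of mine.

```python
import os
import tempfile
import uuid


def synthesize_to_file(polly, text, voice, directory=None):
    """Ask Polly for MP3 audio and write it to a temporary file.

    `polly` is expected to behave like a boto3 Polly client.
    Returns the path of the file that was written.
    """
    response = polly.synthesize_speech(
        Text=text, VoiceId=voice, OutputFormat="mp3"
    )
    # On Lambda the temporary directory is /tmp, the only writable location.
    directory = directory or tempfile.gettempdir()
    path = os.path.join(directory, f"{uuid.uuid4()}.mp3")
    with open(path, "wb") as handle:
        # AudioStream is a file-like object holding the encoded audio.
        handle.write(response["AudioStream"].read())
    return path
```

Passing the client in rather than creating it inside the function keeps the AWS dependency at the edge of the code, which makes local testing easier.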
Uploading to S3
We can now upload the result from the temporary file system to S3. After uploading, we set the file to be publicly readable so we can retrieve it later through a web interface.
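The upload step might be sketched as follows, again with the S3 client injected (for example `boto3.client('s3')`). The function name, the object key derived from the file name, and the URL format are assumptions. Note also that newer buckets block public ACLs by default, so the ACL call may require changing the bucket's settings.

```python
import os


def upload_recording(s3, path, bucket):
    """Upload an MP3 file to S3, make it public, and return its URL.

    `s3` is expected to behave like a boto3 S3 client.
    """
    key = os.path.basename(path)
    s3.upload_file(path, bucket, key, ExtraArgs={"ContentType": "audio/mpeg"})
    # Make the object readable by anyone who has the URL.
    s3.put_object_acl(Bucket=bucket, Key=key, ACL="public-read")
    # Assumed virtual-hosted-style URL; adjust if your bucket needs a regional endpoint.
    return f"https://{bucket}.s3.amazonaws.com/{key}"
```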
Indexing with Dynamo
Lastly, we can index the request and the S3 URL in Dynamo for later retrieval. We set the expires attribute to be two days in the future so that Dynamo’s time-to-live feature will expire old recordings.
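One way to write this step is sketched below. The `table` argument is assumed to behave like a boto3 DynamoDB `Table` resource, and every item field other than `id` and `expires` is a name I made up for illustration. Dynamo's time-to-live feature expects the expiry as a Unix epoch timestamp in seconds.

```python
import time
import uuid

TWO_DAYS_SECONDS = 2 * 24 * 60 * 60


def index_recording(table, text, voice, url, now=None):
    """Store a record of the recording and return the stored item."""
    now = int(time.time()) if now is None else int(now)
    item = {
        "id": str(uuid.uuid4()),
        "text": text,    # hypothetical field name
        "voice": voice,  # hypothetical field name
        "url": url,      # hypothetical field name
        # Epoch seconds after which Dynamo may delete the item.
        "expires": now + TWO_DAYS_SECONDS,
    }
    table.put_item(Item=item)
    return item
```

Expired items are removed by Dynamo in the background, so a record can remain visible for some time after its `expires` value has passed.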
Putting this all together, we get the following Chalice file for creating a French recording of input text.
We can go ahead and deploy our application:
$ chalice deploy
Update IAM Policies
The Lambda function deployed by Chalice will need to have access to the S3 bucket, the DynamoDB table, and to Amazon Polly. Set the policy on the Lambda execution role created by Chalice to include this access.
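As a sketch of what such a policy could contain, the function below builds a policy document covering the calls this kind of application makes. The exact action list is my assumption about what the application needs, so trim or extend it to match your own code. The bucket name and table ARN are supplied by the caller.

```python
import json


def build_policy(bucket, table_arn):
    """Return an IAM policy document granting the access the function needs."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Polly speech synthesis is not scoped to a specific resource.
                "Effect": "Allow",
                "Action": ["polly:SynthesizeSpeech"],
                "Resource": "*",
            },
            {
                # Write objects and set their ACLs within the recordings bucket.
                "Effect": "Allow",
                "Action": ["s3:PutObject", "s3:PutObjectAcl"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
            {
                # Write, read, and list entries in the recordings table.
                "Effect": "Allow",
                "Action": ["dynamodb:PutItem", "dynamodb:GetItem", "dynamodb:Scan"],
                "Resource": table_arn,
            },
        ],
    }


if __name__ == "__main__":
    # Placeholder names for illustration only.
    arn = "arn:aws:dynamodb:us-east-1:123456789012:table/example-table"
    print(json.dumps(build_policy("example-bucket", arn), indent=2))
```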
Testing Record Creation
After deploying, you can test the recording function using httpie by calling your endpoint. Substitute your own API Gateway URL in the request.
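If you would rather test from Python than from httpie, the request can be built with the standard library as in the sketch below. The `/recordings` path follows the route described earlier, while the `text` body key and the function name are assumptions. API Gateway reads the API key from the `x-api-key` header.

```python
import json
import urllib.request


def build_create_request(base_url, api_key, text):
    """Build (but do not send) a POST request that creates a recording."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/recordings",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "x-api-key": api_key},
    )
```

Passing the returned object to `urllib.request.urlopen` sends the request to your deployed endpoint.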
The get endpoint is fairly straightforward. We use Chalice’s URL parameter functionality to capture a recording identifier from the request path, then use that identifier to fetch the corresponding entry from DynamoDB and return it to the user. For convenience, the endpoint can also return all entries from the table.
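The lookup logic might be sketched as below, with the table injected so it can be tested without AWS. The `recording_id` parameter name is hypothetical, and `table` is assumed to behave like a boto3 DynamoDB `Table` resource. The sketch ignores scan pagination, which would matter for larger tables.

```python
def get_recording(table, recording_id=None):
    """Return one recording by id, or every recording when no id is given."""
    if recording_id is None:
        # A single scan call returns one page of results; a fuller version
        # would follow LastEvaluatedKey to fetch the remaining pages.
        return table.scan().get("Items", [])
    # get_item omits the "Item" key when nothing matches, so this yields None.
    return table.get_item(Key={"id": recording_id}).get("Item")
```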
Testing Recording Retrieval
We can use our get endpoint to retrieve our recordings.
To make generating example sentences a little easier, I created a simple user interface that accepts text and calls our API endpoints to record that text as an mp3 file in S3.
You can find the full source code for the interface on GitHub.
When starting this application I was skeptical that Polly would provide a natural-sounding rendition of example sentences. Thankfully, I was pleasantly surprised by the quality of the speech. With the serverless application, I am now able to quickly create recordings of any French word or phrase to aid in language learning. Combined with Anki for spaced repetition, it has become a valuable resource for learning and recalling verb conjugations.