I’ve recently been working with Java via the Google Cloud Dataflow SDK. One thing I struggled with was the BigQuery Java client: it was never entirely clear how to create a repeated record. This article explains how repeated records map onto TableRow objects and how to build one yourself.

First, create a new TableRow. For this example, let’s assume we are logging events, each identified by a GUID and a timestamp.

TableRow row = new TableRow()
  .set("guid", java.util.UUID.randomUUID().toString())
  .set("timestamp", ISODateTimeFormat.dateTime().print(new DateTime(DateTimeZone.UTC)));
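If you are on Java 8 or later and would rather not depend on Joda-Time, java.time can produce a comparable timestamp string. This is a swapped-in alternative, not what the code above uses, and the class name EventTimestamp is hypothetical. The pattern is meant to mirror ISODateTimeFormat.dateTime() for a UTC DateTime, but it is worth comparing the two outputs before mixing them in one table.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class EventTimestamp {
    // ISO-8601 with millisecond precision, rendered in UTC. Pattern letter
    // 'XXX' prints "Z" when the offset is zero, e.g. 2015-03-01T12:30:00.000Z.
    private static final DateTimeFormatter ISO_MILLIS_UTC =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSXXX")
                    .withZone(ZoneOffset.UTC);

    // Formats the given instant; digits below milliseconds are truncated.
    static String format(Instant instant) {
        return ISO_MILLIS_UTC.format(instant);
    }

    public static void main(String[] args) {
        // In the row above this would be used as:
        // .set("timestamp", EventTimestamp.format(Instant.now()))
        System.out.println(format(Instant.now()));
    }
}
```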

For additional logging, we include arbitrary key-value pairs of metadata with each event. In BigQuery, the metadata field is defined as a repeated record in which both key and value are of type STRING. In Java, a repeated record is built from one TableRow per element: each key-value pair gets its own TableRow. In this example, we create metadata entries for the IP address and hostname of the machine logging the event.
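Because the field is REPEATED, the value it expects is a list of records, not a single record. A frequent mistake is to set one TableRow directly on the field. The sketch below checks that shape without needing the BigQuery client: TableRow extends GenericJson, which is a Map&lt;String, Object&gt;, so plain maps stand in for TableRow here. The helper names isKeyValueRecordList and keyValue are hypothetical, not part of any library.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RepeatedRecordCheck {
    // True if value looks like the metadata field described above: a List in
    // which every element is a record (a Map) holding String values under
    // "key" and "value". A List<TableRow> built as in this article should pass.
    static boolean isKeyValueRecordList(Object value) {
        if (!(value instanceof List)) {
            return false; // a repeated field must be a list, not a single record
        }
        for (Object element : (List<?>) value) {
            if (!(element instanceof Map)) {
                return false;
            }
            Map<?, ?> rec = (Map<?, ?>) element;
            if (!(rec.get("key") instanceof String) || !(rec.get("value") instanceof String)) {
                return false;
            }
        }
        return true; // an empty list is also a valid repeated value
    }

    // Hypothetical stand-in for new TableRow().set("key", k).set("value", v).
    static Map<String, Object> keyValue(String key, String value) {
        Map<String, Object> rec = new LinkedHashMap<>();
        rec.put("key", key);
        rec.put("value", value);
        return rec;
    }

    public static void main(String[] args) {
        Object list = Arrays.asList(keyValue("ip", "127.0.0.1"), keyValue("host", "localhost"));
        Object single = keyValue("ip", "127.0.0.1");
        System.out.println(isKeyValueRecordList(list));   // true
        System.out.println(isKeyValueRecordList(single)); // false: not wrapped in a list
    }
}
```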

TableRow ip = new TableRow()
  .set("key", "ip")
  .set("value", "127.0.0.1");

TableRow host = new TableRow()
  .set("key", "host")
  .set("value", "localhost");

The value of the repeated field itself is a List of those TableRow objects.

List<TableRow> metadata = new ArrayList<TableRow>();
metadata.add(host);
metadata.add(ip);
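When the metadata comes from an arbitrary set of pairs rather than two hard-coded entries, the list can be built in a loop. This is a minimal sketch assuming the pairs arrive as a Map&lt;String, String&gt;; the class and method names are hypothetical, and plain maps stand in for TableRow so the example runs without the BigQuery client.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MetadataRecords {
    // Turns key-value metadata into the list that backs a repeated record,
    // one {key, value} record per entry. With the BigQuery client available,
    // each element would instead be new TableRow().set("key", k).set("value", v).
    static List<Map<String, Object>> toRecords(Map<String, String> metadata) {
        List<Map<String, Object>> records = new ArrayList<>();
        for (Map.Entry<String, String> entry : metadata.entrySet()) {
            Map<String, Object> rec = new LinkedHashMap<>();
            rec.put("key", entry.getKey());
            rec.put("value", entry.getValue());
            records.add(rec);
        }
        return records;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, so records come out host, then ip.
        Map<String, String> metadata = new LinkedHashMap<>();
        metadata.put("host", "localhost");
        metadata.put("ip", "127.0.0.1");
        System.out.println(toRecords(metadata));
        // prints [{key=host, value=localhost}, {key=ip, value=127.0.0.1}]
    }
}
```

A map input is convenient but forces keys to be unique; the repeated record itself has no such restriction, so if duplicate keys matter, a list of pairs is the safer input type.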

Finally, set the metadata field on the original row to that list of TableRow objects.

// Add the repeated record to the original row
row.set("metadata", metadata);
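It can help to picture the finished row as JSON, where a repeated record is an array of objects. The sketch below renders a row built from plain maps and lists (again standing in for TableRow, which is a Map&lt;String, Object&gt;). RowJson and its methods are hypothetical; the renderer handles only Strings, Maps and Lists, escapes only quotes and backslashes, and is meant for illustration, not as a replacement for a real JSON library. The guid and timestamp values are fixed placeholders.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RowJson {
    // Minimal renderer: Strings become JSON strings, Maps become objects,
    // Lists become arrays. Control characters are not escaped.
    static String toJson(Object value) {
        if (value instanceof String) {
            String s = ((String) value).replace("\\", "\\\\").replace("\"", "\\\"");
            return "\"" + s + "\"";
        }
        if (value instanceof Map) {
            StringBuilder sb = new StringBuilder("{");
            String sep = "";
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                sb.append(sep).append(toJson(String.valueOf(e.getKey())))
                  .append(':').append(toJson(e.getValue()));
                sep = ",";
            }
            return sb.append('}').toString();
        }
        if (value instanceof List) {
            StringBuilder sb = new StringBuilder("[");
            String sep = "";
            for (Object item : (List<?>) value) {
                sb.append(sep).append(toJson(item));
                sep = ",";
            }
            return sb.append(']').toString();
        }
        throw new IllegalArgumentException("unsupported type: " + value);
    }

    // Hypothetical stand-in for new TableRow().set("key", k).set("value", v).
    static Map<String, Object> keyValue(String key, String value) {
        Map<String, Object> rec = new LinkedHashMap<>();
        rec.put("key", key);
        rec.put("value", value);
        return rec;
    }

    public static void main(String[] args) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("guid", "00000000-0000-0000-0000-000000000000");
        row.put("timestamp", "2015-03-01T12:30:00.000Z");
        row.put("metadata", Arrays.asList(keyValue("host", "localhost"), keyValue("ip", "127.0.0.1")));
        System.out.println(toJson(row));
    }
}
```

Running main prints `{"guid":"00000000-0000-0000-0000-000000000000","timestamp":"2015-03-01T12:30:00.000Z","metadata":[{"key":"host","value":"localhost"},{"key":"ip","value":"127.0.0.1"}]}`: the metadata field is a JSON array with one object per TableRow in the list.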

Full source code follows:

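import com.google.api.services.bigquery.model.TableRow;
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;
import org.joda.time.DateTime;
import org.joda.time.DateTimeZone;
import org.joda.time.format.ISODateTimeFormat;

// Event row: a GUID plus a UTC timestamp in ISO-8601 format
TableRow row = new TableRow()
  .set("guid", UUID.randomUUID().toString())
  .set("timestamp", ISODateTimeFormat.dateTime().print(new DateTime(DateTimeZone.UTC)));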

// Metadata
TableRow ip = new TableRow()
  .set("key", "ip")
  .set("value", "127.0.0.1");

TableRow host = new TableRow()
  .set("key", "host")
  .set("value", "localhost");

List<TableRow> metadata = new ArrayList<TableRow>();
metadata.add(host);
metadata.add(ip);

row.set("metadata", metadata);