Writing Repeated BigQuery Records Using the Java Client Library

I’ve recently been working in Java with the Google Cloud Dataflow SDK. One thing I struggled with was the BigQuery Java client: it was never entirely clear how to create a repeated record. This article explains how repeated records are represented and how to build one yourself.

First, create a new TableRow for the event itself. For this example, assume we are logging events, each identified by a GUID and a timestamp.

TableRow row = new TableRow()
  .set("guid", java.util.UUID.randomUUID().toString())
  .set("timestamp", ISODateTimeFormat.dateTime().print(new DateTime(DateTimeZone.UTC)));
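If you would rather not pull in Joda-Time just for the timestamp, here is a minimal sketch using java.time (Java 8 or later) that should produce the same layout as ISODateTimeFormat.dateTime() for a UTC time: millisecond precision with a trailing Z. The EventTimestamps class and its methods are hypothetical names for illustration.

```java
import java.time.Instant;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class EventTimestamps {
  // Mirrors Joda's yyyy-MM-dd'T'HH:mm:ss.SSSZZ layout; "XXX" renders a
  // zero (UTC) offset as a literal "Z"
  private static final DateTimeFormatter ISO_MILLIS =
      DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSXXX");

  // Formats an instant in UTC, e.g. for the "timestamp" field of a row
  public static String format(Instant instant) {
    return OffsetDateTime.ofInstant(instant, ZoneOffset.UTC).format(ISO_MILLIS);
  }

  // Convenience wrapper for the current time
  public static String now() {
    return format(Instant.now());
  }

  public static void main(String[] args) {
    System.out.println(format(Instant.ofEpochMilli(0L))); // prints 1970-01-01T00:00:00.000Z
    System.out.println(now());
  }
}
```

The timestamp field could then be set with EventTimestamps.now() in place of the Joda-Time call; the rest of the row-building code is unchanged.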

For additional context, each event also carries arbitrary key-value pairs of metadata. In the BigQuery schema, metadata is a repeated record (a field of type RECORD with mode REPEATED) whose key and value fields are both of type STRING. In Java, you create one TableRow for each element of the repeated record. Here we create two metadata entries: the IP address and the hostname of the machine logging the event.

TableRow ip = new TableRow()
  .set("key", "ip")
  .set("value", "127.0.0.1");

TableRow host = new TableRow()
  .set("key", "host")
  .set("value", "localhost");

The repeated record itself is then simply a List of those TableRow objects.

List<TableRow> metadata = new ArrayList<TableRow>();
metadata.add(host);
metadata.add(ip);
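When the metadata really is arbitrary, building each TableRow by hand gets repetitive. Because TableRow is itself a Map<String, Object> (it extends GenericJson), the same idea can be sketched with only the standard library: one entry map per key-value pair, collected into a List. The MetadataRows class and toMetadataRows method are hypothetical, and LinkedHashMap stands in for TableRow so the sketch runs without the BigQuery client on the classpath.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MetadataRows {
  // Converts arbitrary key/value metadata into the shape of a repeated record
  // with STRING "key" and "value" fields: one map (row) per pair, in a List.
  // With the BigQuery client, each LinkedHashMap below would instead be
  // new TableRow().set("key", ...).set("value", ...).
  public static List<Map<String, Object>> toMetadataRows(Map<String, String> metadata) {
    List<Map<String, Object>> rows = new ArrayList<>();
    for (Map.Entry<String, String> entry : metadata.entrySet()) {
      Map<String, Object> row = new LinkedHashMap<>();
      row.put("key", entry.getKey());
      row.put("value", entry.getValue());
      rows.add(row);
    }
    return rows;
  }

  public static void main(String[] args) {
    // LinkedHashMap keeps insertion order, so host comes before ip
    Map<String, String> metadata = new LinkedHashMap<>();
    metadata.put("host", "localhost");
    metadata.put("ip", "127.0.0.1");
    System.out.println(toMetadataRows(metadata));
    // prints [{key=host, value=localhost}, {key=ip, value=127.0.0.1}]
  }
}
```

With the real client, the loop would build TableRow objects and return a List<TableRow>, which can be passed directly to row.set("metadata", ...).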

Finally, set the metadata field on the original row to that list of TableRow objects.

// Add the repeated record to the original row
row.set("metadata", metadata);

Full source code follows:

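import com.google.api.services.bigquery.model.TableRow;
import java.util.ArrayList;
import java.util.List;
import org.joda.time.DateTime;
import org.joda.time.DateTimeZone;
import org.joda.time.format.ISODateTimeFormat;

// Event row: a random GUID plus the current UTC time in ISO 8601 format
TableRow row = new TableRow()
  .set("guid", java.util.UUID.randomUUID().toString())
  .set("timestamp", ISODateTimeFormat.dateTime().print(new DateTime(DateTimeZone.UTC)));

// Metadata: one TableRow per element of the repeated record
TableRow ip = new TableRow()
  .set("key", "ip")
  .set("value", "127.0.0.1");

TableRow host = new TableRow()
  .set("key", "host")
  .set("value", "localhost");

// The repeated record is a List of TableRow objects
List<TableRow> metadata = new ArrayList<TableRow>();
metadata.add(host);
metadata.add(ip);

// Add the repeated record to the original row
row.set("metadata", metadata);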
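It can help to see what the finished row amounts to as data: the repeated record is simply a JSON array of objects nested inside the row. The client library serializes TableRow objects itself, so the following is only an illustrative sketch: a minimal, hypothetical JSON renderer (RowJson) for the nested Map/List/String structure, using plain maps in place of TableRow and fixed, made-up GUID and timestamp values so the output is repeatable.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RowJson {
  // Renders String, Map and List values as JSON (minimal: no numbers, booleans or nulls)
  public static String toJson(Object value) {
    if (value instanceof String) {
      return quote((String) value);
    }
    if (value instanceof Map) {
      StringBuilder sb = new StringBuilder("{");
      boolean first = true;
      for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
        if (!first) sb.append(",");
        first = false;
        sb.append(quote(String.valueOf(e.getKey()))).append(":").append(toJson(e.getValue()));
      }
      return sb.append("}").toString();
    }
    if (value instanceof List) {
      StringBuilder sb = new StringBuilder("[");
      boolean first = true;
      for (Object item : (List<?>) value) {
        if (!first) sb.append(",");
        first = false;
        sb.append(toJson(item));
      }
      return sb.append("]").toString();
    }
    throw new IllegalArgumentException("Unsupported type: " + value);
  }

  // Escapes backslashes and double quotes only; enough for this example's values
  private static String quote(String s) {
    return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
  }

  // Plain-map stand-in for one metadata TableRow
  private static Map<String, Object> entry(String key, String value) {
    Map<String, Object> m = new LinkedHashMap<>();
    m.put("key", key);
    m.put("value", value);
    return m;
  }

  // Same structure as the article's row, with fixed placeholder GUID and timestamp
  public static Map<String, Object> exampleRow() {
    Map<String, Object> row = new LinkedHashMap<>();
    row.put("guid", "00000000-0000-0000-0000-000000000000");
    row.put("timestamp", "2015-01-01T00:00:00.000Z");
    List<Map<String, Object>> metadata = new ArrayList<>();
    metadata.add(entry("host", "localhost"));
    metadata.add(entry("ip", "127.0.0.1"));
    row.put("metadata", metadata);
    return row;
  }

  public static void main(String[] args) {
    System.out.println(toJson(exampleRow()));
  }
}
```

Running main prints a single line: {"guid":"00000000-0000-0000-0000-000000000000","timestamp":"2015-01-01T00:00:00.000Z","metadata":[{"key":"host","value":"localhost"},{"key":"ip","value":"127.0.0.1"}]}. Each element of the metadata array corresponds to one of the key/value TableRow objects.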