I’ve recently been working with Java via the Google Cloud Dataflow SDK. One problem I’ve had is working with the BigQuery Java Client. It was never entirely clear how to create a repeated record. This article explains how it works and how you can accomplish the same thing.
First, you need to create a new TableRow. For this example, let’s assume we are logging events using a guid and a timestamp.
TableRow row = new TableRow()
    .set("guid", java.util.UUID.randomUUID().toString())
    .set("timestamp", ISODateTimeFormat.dateTime().print(new DateTime(DateTimeZone.UTC)));
For additional logging, we include arbitrary key-value pairs of metadata with our event. In BigQuery, the metadata is defined as a repeated record in which both the key and the value are of type STRING. In Java, to create a repeated record we create one TableRow for each entry in the repeated record. In this example, we create a metadata entry for the IP address and another for the hostname of the machine logging the event.
TableRow ip = new TableRow()
    .set("key", "ip")
    .set("value", "127.0.0.1");
TableRow host = new TableRow()
    .set("key", "host")
    .set("value", "localhost");
Now the actual repeated record is defined as a list of TableRow objects.
List<TableRow> metadata = new ArrayList<TableRow>();
metadata.add(host);
metadata.add(ip);
Finally, we set the repeated record field metadata on the original row to our list of TableRow objects.
// Add the repeated record to the original row
row.set("metadata", metadata);
Full source code follows:
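import com.google.api.services.bigquery.model.TableRow;
import java.util.ArrayList;
import java.util.List;
import org.joda.time.DateTime;
import org.joda.time.DateTimeZone;
import org.joda.time.format.ISODateTimeFormat;

// The event row, with a random guid and the current UTC time in ISO 8601 format
TableRow row = new TableRow()
    .set("guid", java.util.UUID.randomUUID().toString())
    .set("timestamp", ISODateTimeFormat.dateTime().print(new DateTime(DateTimeZone.UTC)));
// Metadata: one TableRow per key-value pair
TableRow ip = new TableRow()
    .set("key", "ip")
    .set("value", "127.0.0.1");
TableRow host = new TableRow()
    .set("key", "host")
    .set("value", "localhost");
List<TableRow> metadata = new ArrayList<TableRow>();
metadata.add(host);
metadata.add(ip);
// Add the repeated record to the original row
row.set("metadata", metadata);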
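If the metadata arrives as a map instead of being written out by hand, the same steps can be wrapped in a small helper. The following is only a sketch under a few assumptions: the class name MetadataRecords and the method toRepeatedRecord are hypothetical names chosen for illustration, and plain java.util maps stand in for TableRow so the code runs without the BigQuery client on the classpath. Since TableRow is itself a Map<String, Object>, swapping in `new TableRow()` and `set` for `new LinkedHashMap<>()` and `put` should give the equivalent result.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MetadataRecords {

    // Hypothetical helper: turns arbitrary key-value pairs into the list of
    // {"key": ..., "value": ...} records that the repeated record field holds.
    // Map<String, Object> is a stand-in for TableRow (an assumption made so
    // this sketch compiles with the standard library only).
    public static List<Map<String, Object>> toRepeatedRecord(Map<String, String> pairs) {
        List<Map<String, Object>> records = new ArrayList<>();
        for (Map.Entry<String, String> pair : pairs.entrySet()) {
            // One record per key-value pair, mirroring one TableRow per entry
            Map<String, Object> record = new LinkedHashMap<>();
            record.put("key", pair.getKey());
            record.put("value", pair.getValue());
            records.add(record);
        }
        return records;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, so the records come out as host, ip
        Map<String, String> pairs = new LinkedHashMap<>();
        pairs.put("host", "localhost");
        pairs.put("ip", "127.0.0.1");

        // Stand-in for the event row; the list is set under the "metadata" field
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("guid", java.util.UUID.randomUUID().toString());
        row.put("metadata", toRepeatedRecord(pairs));

        // prints [{key=host, value=localhost}, {key=ip, value=127.0.0.1}]
        System.out.println(row.get("metadata"));
    }
}
```

Using an insertion-ordered map keeps the output order predictable, although the order of entries within a repeated record rarely matters for key-value metadata like this.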