Paper Review: Informix under CONTROL: Online Query Processing

Title and Author of Paper: Informix under CONTROL: Online Query Processing. J. M. Hellerstein et al. Summary: The CONTROL project attempts to improve the interaction between users and computers during data analysis. Traditional data analysis systems are a black box: a user enters a query and waits for some amount of time before receiving a result. The CONTROL project aims to make this process interactive by continuously providing approximate results that improve over time. Implementing such a system requires rethinking some fundamental tenets of database systems. First, with interactive systems, queries may never complete; instead, they may be halted when results are “good enough”. Second, interactive systems must be able to provide approximate results quickly while maximizing the rate at which an accurate answer is found. This paper explores the changes in database technology needed to support interactive use cases. ...
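To make "approximate results that improve over time" concrete, here is a minimal sketch of a running average over rows scanned in random order, stopped once the estimate is "good enough". It is an illustration only, not the paper's estimators: the function name `online_average`, the `report_every` and `z` parameters, and the normal-approximation interval (which ignores any finite-population correction) are all assumptions.

```python
import math
import random


def online_average(values, report_every=1000, z=1.96):
    """Yield (rows_seen, running_mean, half_width) while scanning rows in random order.

    half_width is a normal-approximation confidence half-width (an assumption,
    not the paper's estimator); it shrinks as more rows are seen.
    """
    sample = list(values)
    random.shuffle(sample)  # random order makes every prefix a random sample
    count, mean, m2 = 0, 0.0, 0.0
    for x in sample:
        # Welford's update keeps the mean and variance in a single pass.
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
        if count % report_every == 0 or count == len(sample):
            variance = m2 / (count - 1) if count > 1 else 0.0
            yield count, mean, z * math.sqrt(variance / count)


# The caller decides when the answer is good enough and stops the query early.
for seen, estimate, half_width in online_average(range(1, 100001)):
    if half_width < 500:
        break
```

The key design point is that the consumer, not the query engine, decides when to stop: the generator keeps refining the estimate for as long as it is asked to.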

March 3, 2017 · 4 min · Kevin Sookocheff

How Much is That Going to Cost Anyway? Calculating Cost of Goods Sold

One of the realities of running a business is that you will, at some point, need to make money. This means you must sell something for more than it cost to make it: Profit = Total Revenue - Total Expenses. So, to sell a service or feature (like a third-party API portal) at a profit, we need to understand how much it costs to make it, so we know how much to sell it for. In business parlance, the quantity we are looking for is the cost of goods sold: the accumulated costs used to create a good, including direct labor, materials, and any overhead. ...
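As a small worked example of the arithmetic, the sketch below adds up the three cost categories named above and applies the profit formula. The function names and all dollar figures are hypothetical, chosen only for illustration.

```python
def cost_of_goods_sold(direct_labor, materials, overhead):
    """Accumulated cost of creating the good: labor + materials + overhead."""
    return direct_labor + materials + overhead


def profit(total_revenue, total_expenses):
    """Profit = Total Revenue - Total Expenses."""
    return total_revenue - total_expenses


# Hypothetical monthly figures for a single feature.
cogs = cost_of_goods_sold(direct_labor=6000, materials=2500, overhead=1500)
print(cogs)                                              # 10000
print(profit(total_revenue=12500, total_expenses=cogs))  # 2500
```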

February 26, 2017 · 8 min · Kevin Sookocheff

Paper Review: An Array-Based Algorithm for Simultaneous Multidimensional Aggregates

Title and Author of Paper: An Array-Based Algorithm for Simultaneous Multidimensional Aggregates. Y. Zhao et al. Summary: One of the core functions of an OLAP system is computing aggregations and group-by operations. This functionality has been characterized by the “Cube” operator, which computes group-by aggregations over all possible subsets of a specified set of dimensions. As an example of the Cube operator, consider a model with the dimensions product, store, and date, and the measured value sales. Computing the Cube for this data set requires computing sales for all subsets of the dimensions: sales by product, store, and date; sales by product and store; sales by product; etc. As a user, I want the system to prepare these results for me in response to ad-hoc queries or as part of an ETL job that prepares the data for analysis. Because there is a lot of data involved, the challenge of implementing the Cube operator is in computing these aggregations as efficiently as possible. ...
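The product, store, and date example can be sketched as a naive Cube that runs one group-by per subset of the dimensions. This is deliberately the brute-force version, not the array-based algorithm the paper proposes; the function name `cube` and the sample rows are made up for illustration.

```python
from collections import defaultdict
from itertools import combinations


def cube(rows, dimensions, measure):
    """Sum `measure` grouped by every subset of `dimensions` (naive Cube).

    Returns a dict mapping each subset (a tuple of dimension names) to a
    dict of {group key tuple: total}.
    """
    results = {}
    for size in range(len(dimensions), -1, -1):
        for group in combinations(dimensions, size):
            totals = defaultdict(int)
            for row in rows:  # one full scan per subset: 2^n scans in total
                key = tuple(row[d] for d in group)
                totals[key] += row[measure]
            results[group] = dict(totals)
    return results


# Hypothetical sample data.
rows = [
    {"product": "pen", "store": "north", "date": "2017-02-01", "sales": 10},
    {"product": "pen", "store": "south", "date": "2017-02-01", "sales": 5},
    {"product": "ink", "store": "north", "date": "2017-02-02", "sales": 7},
]
result = cube(rows, ("product", "store", "date"), "sales")
print(result[("product",)])  # {('pen',): 15, ('ink',): 7}
print(result[()])            # {(): 22}
```

With three dimensions there are 2^3 = 8 group-bys, and this version rescans the input for each one, which is exactly the cost a more efficient Cube implementation tries to avoid.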

February 20, 2017 · 5 min · Kevin Sookocheff

Italics in Vim

I write Markdown using Vim and the vim-pencil plugin, and one of the things that particularly bothered me was that, by default, iTerm2 and the solarized colour scheme did not support italic text. I finally sat down this evening and got this working. Really, I just found and watched Greg Hurrell’s YouTube video explaining the whole thing. Enabling italics requires updating the terminfo database. Terminfo enables programs to use the terminal in a device-independent manner. For us, this means that it allows applications to look up the correct escape codes for displaying italics. If the terminfo database has the correct escape codes, italics are displayed. If not? No italics. ...

February 16, 2017 · 2 min · Kevin Sookocheff

Comparing Swagger with Thrift or gRPC

I’ve been asked recently, what’s the difference between Swagger and Thrift (or gRPC)? Although they look similar, they solve fundamentally different problems. Let’s look at the differences. Swagger: At the most basic level, Swagger is a REST API specification language. The great part is that there is an entire ecosystem of tools built around this specification language to support API design, client and server code generation, and interactive documentation. Key features: REST + JSON API framework; JSON requests and responses; code generation; documentation generation. Thrift: Thrift is a software framework for supporting RPC. An interface definition language is used to describe your system in terms of data types and interfaces, the Thrift compiler generates client and server code that match your definition, and the Thrift library handles serialization and transport. ...

February 9, 2017 · 3 min · Kevin Sookocheff

Being Good Enough

“Better a diamond with a flaw than a pebble without.” (Confucius) We all want to have a perfect product, a perfect system, and a perfect development story. Unfortunately, reality is … reality. And it’s not perfect. One of the biggest struggles of engineering well is understanding the constant push and pull among the forces that govern the “rest of the business”, and governing your technological and development choices accordingly. Do not strive for perfection. Strive to be good enough. ...

January 31, 2017 · 1 min · Kevin Sookocheff

Yet Another S3 Static Site

Here it is. My version of the S3 static site. This one is publishable through CloudFormation and uses CodeCommit and CodeBuild to regenerate and publish the site with every push to the host Git repository. Any change to the CodeCommit Git repository automatically triggers a build through CodeBuild. This build runs the Hugo static site generator on your repo and syncs the results to an S3 bucket configured for serving a static site. ...

January 18, 2017 · 1 min · Kevin Sookocheff

Packaging a Custom Boomi Connector

Having created a custom Boomi connector, the next step is packaging it as a JAR file for testing and release. This requires setting some configuration files that allow the Atom process to load and run your files. Connector configuration file: The first configuration we need to set is the META-INF/connector-config.xml file. This file tells the Atom process which class implements your custom connector. This file must have a root XML element named GenericConnector and specify the class name of your connector, which must be the class that extends the BaseConnector class. ...

January 17, 2017 · 2 min · Kevin Sookocheff

Creating a Custom Dell Boomi Connector

This article will show you how to create a custom connector for reading data from Dell Boomi. The connector will read the list of GitHub followers from the public GitHub API. This should provide an overview of how to write your own custom connector for a unique I/O source. Prerequisites: To follow along, you need to have a valid Boomi licence (or free trial) to set up the Boomi Connector SDK. The SDK is written in Java, so Java development experience is assumed. ...

January 16, 2017 · 6 min · Kevin Sookocheff

Paper Review: Implementing Data Cubes Efficiently

Business intelligence and analytics use cases involve complex queries on potentially very large databases. To minimize query response times, query optimization is critical. One approach to optimizing query response times is to precompute relevant values ahead of time, and to use those precomputed results to answer queries. Unfortunately, it is not always feasible to precompute every potential value that is required to answer arbitrary queries. This paper describes a framework and presents algorithms that pick a good subset of queries to precompute to optimize response time. ...
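One way to picture "picking a good subset of queries to precompute" is a greedy, benefit-driven selection: repeatedly materialize the view that saves the most rows scanned across the queries it can answer. The sketch below is a simplified illustration under stated assumptions, not a reproduction of the paper's algorithms; the function name `greedy_select`, the view names, and the row counts are all hypothetical.

```python
def greedy_select(sizes, answers, top, k):
    """Greedily pick up to k views to materialize in addition to `top`.

    sizes:   view -> row count (cost of answering a query from that view)
    answers: view -> set of views it can answer, including itself
    top:     the base view, always materialized; assumed to answer every view
    """
    chosen = {top}
    # Initially every query is answered from the base view.
    cost = {view: sizes[top] for view in sizes}
    for _ in range(k):
        best, best_benefit = None, 0
        for view in sizes:
            if view in chosen:
                continue
            # Benefit: rows saved across every query this view could answer.
            benefit = sum(max(0, cost[w] - sizes[view]) for w in answers[view])
            if benefit > best_benefit:
                best, best_benefit = view, benefit
        if best is None:  # nothing left that improves any query
            break
        chosen.add(best)
        for w in answers[best]:
            cost[w] = min(cost[w], sizes[best])
    return chosen


# Hypothetical two-dimensional example: group by product and store.
sizes = {"product,store": 100, "product": 20, "store": 50, "all": 1}
answers = {
    "product,store": {"product,store", "product", "store", "all"},
    "product": {"product", "all"},
    "store": {"store", "all"},
    "all": {"all"},
}
print(sorted(greedy_select(sizes, answers, top="product,store", k=1)))
# ['product', 'product,store']
```

Here "product" is chosen first because it cuts the cost of two queries from 100 rows to 20, a larger saving than either of the other candidates offers.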

January 14, 2017 · 3 min · Kevin Sookocheff