Paper Review: The Anatomy of a Large-Scale Hypertextual Web Search Engine

Title and Author of Paper: The Anatomy of a Large-Scale Hypertextual Web Search Engine. Sergey Brin and Lawrence Page.

Summary: This paper describes the underpinnings of the Google search engine. It presents the initial Google prototype and describes the challenges in scaling search engine technology to handle large datasets. At the time of writing, Google’s main goal was to improve the quality of web searches by using the link data already embedded in web pages to calculate the quality of a page.
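To make the idea of scoring pages from link data concrete, here is a minimal sketch of a simplified PageRank-style iteration. The tiny link graph, the damping factor of 0.85, and the iteration count are illustrative assumptions, not details taken from this summary, and the code is not a reconstruction of Google's implementation.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank over a dict mapping page -> list of outgoing links."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            if outgoing:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # Assumption: a page with no links spreads its rank evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: "c" is linked to by three other pages.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(links)
```

In this toy graph the page with the most incoming links ends up with the highest score, while the page nobody links to keeps only the baseline share.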

Paper Review: Consistency Analysis in Bloom: a CALM and Collected Approach

Title and Author of Paper: Consistency Analysis in Bloom: a CALM and Collected Approach. Alvaro et al.

Summary: Distributed programs are difficult for even experienced developers to get right. Understanding the tradeoff between consistency, availability, and latency, while guaranteeing data correctness, poses many problems for the application developer. This paper presents a language and method for programmatically verifying distributed consistency.

CALM - Consistency and Logical Monotonicity

There is a connection between distributed consistency algorithms and logical monotonicity. That is, our programs must be correct even when messages and data are delayed and re-ordered across different nodes in a system.
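A small sketch can illustrate why monotonic logic tolerates re-ordering. The example below is plain Python rather than Bloom, and both functions are hypothetical illustrations: one only ever accumulates facts (monotonic), the other overwrites a value, so its result depends on arrival order.

```python
from itertools import permutations


def accumulate(messages):
    """Monotonic: the set of seen messages only grows, duplicates are harmless."""
    seen = set()
    for message in messages:
        seen.add(message)
    return seen


def last_writer(messages):
    """Order-sensitive: each message overwrites the previous value."""
    value = None
    for message in messages:
        value = message
    return value


messages = ["x", "y", "z"]

# Try every possible delivery order of the same three messages.
monotonic_results = {frozenset(accumulate(order)) for order in permutations(messages)}
overwrite_results = {last_writer(order) for order in permutations(messages)}
```

Every delivery order yields the same accumulated set, whereas the overwriting version produces a different answer depending on which message happens to arrive last.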

SQS or Kinesis? Comparing Apples to Oranges

When designing Workiva’s durable messaging system we took a hard look at using Amazon’s Kinesis as the message storage and delivery mechanism. At first glance, Kinesis has a feature set that looks like it can solve any problem: it can store terabytes of data, it can replay old messages, and it can support multiple message consumers. But if you dig a little deeper you will find that Kinesis is well suited for a very particular use case, and if your application doesn’t fit this use case, Kinesis may be a lot more trouble than it’s worth.

Software Architecture as Business Analysis

“Architecture is the bridge between (often abstract) business goals and the final (concrete) resulting system.” (Software Architecture in Practice)

A software architect should act as a bridge between business stakeholders and technical stakeholders. To be this bridge requires understanding the business problem being solved, and being able to distill that problem into a technical solution that a software team can implement. In essence, the architect acts as a technical business analyst that helps to define the needs of an organization and recommend solutions that deliver value to stakeholders.

Paper Review: The CQL continuous query language: semantic foundations and query execution

Title and Author of Paper: The CQL continuous query language: semantic foundations and query execution. Arasu et al.

Summary: CQL is a derivative of the SQL query language developed for running continuous queries over streams of data. The goal of the system is to provide a precise set of language semantics for running such continuous stream workloads. The paper starts by defining precise abstract semantics for continuous queries that cover two data types (streams and relations) and three classes of operators: one that produces a relation from a stream, one that produces a relation from other relations, and one that produces a stream from a relation.
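The three operator classes can be sketched in a few lines of Python. This is a loose illustration, not CQL itself: the function names, the time-based window, and the use of sets for relations (CQL relations are bags) are simplifying assumptions.

```python
def range_window(stream, now, width):
    """Stream-to-relation: values whose timestamp falls in (now - width, now]."""
    return {value for (timestamp, value) in stream if now - width < timestamp <= now}


def select(relation, predicate):
    """Relation-to-relation: keep only the values that satisfy the predicate."""
    return {value for value in relation if predicate(value)}


def inserted(previous, current):
    """Relation-to-stream: emit the values that are new since the last instant."""
    return current - previous


# Hypothetical input stream of (timestamp, value) pairs.
stream = [(1, 5), (2, 12), (3, 7), (4, 20)]

output_stream = []
previous = set()
for now in range(1, 5):
    current = select(range_window(stream, now, width=2), lambda value: value > 6)
    for value in sorted(inserted(previous, current)):
        output_stream.append((now, value))
    previous = current
```

At each instant the window turns the stream into a relation, the selection transforms that relation, and the final step turns the changing relation back into a stream of newly inserted values.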

Paper Review: BlinkDB: Queries with Bounded Errors and Bounded Response Times on Very Large Data

Title and Author of Paper: BlinkDB: Queries with Bounded Errors and Bounded Response Times on Very Large Data. Agarwal et al.

Summary: BlinkDB is a massively parallel database that provides approximate results for queries over large data sets. BlinkDB’s distinguishing feature is that it lets users trade query accuracy for response time: partial results are returned with annotated error bars describing their accuracy at the current point in time.
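The accuracy-versus-time tradeoff can be illustrated with a rough sketch that estimates an average from a random sample and attaches an error bar. The uniform sampling, the normal-approximation 95% margin, and the synthetic data are all assumptions made for illustration; this is not a description of how BlinkDB builds or chooses its samples.

```python
import math
import random
import statistics


def approximate_mean(values, sample_size, rng):
    """Estimate the mean from a uniform random sample, with a ~95% error margin."""
    sample = rng.sample(values, sample_size)
    estimate = statistics.fmean(sample)
    # Normal approximation: 1.96 standard errors on either side of the estimate.
    margin = 1.96 * statistics.stdev(sample) / math.sqrt(sample_size)
    return estimate, margin


rng = random.Random(42)
values = list(range(100_000))  # synthetic "large" column

fast_estimate, fast_margin = approximate_mean(values, 100, rng)
slow_estimate, slow_margin = approximate_mean(values, 10_000, rng)
```

Scanning 100 rows gives a quick answer with a wide error bar, while scanning 10,000 rows takes longer and narrows the bar roughly tenfold.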

Paper Review: Informix under CONTROL: Online Query Processing

Title and Author of Paper: Informix under CONTROL: Online Query Processing. J. M. Hellerstein et al.

Summary: The CONTROL project attempts to improve the interaction between users and computers during data analysis. Traditional data analysis systems are a black box: a user enters a query and waits for some amount of time before receiving a result. The CONTROL project aims to make this process interactive by continuously providing approximate results that improve over time.
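As a rough sketch of what "approximate results that improve over time" can look like, the generator below reports a running average every few rows instead of waiting for the full scan. The function name, the reporting interval, and the sample rows are hypothetical, and a real system would also need to read rows in random order for the early estimates to be statistically meaningful.

```python
def online_average(rows, report_every):
    """Yield (rows_seen, running_average) periodically while scanning rows."""
    total = 0.0
    count = 0
    for row in rows:
        total += row
        count += 1
        if count % report_every == 0:
            yield count, total / count
    # Always report the exact answer once the scan completes.
    if count and count % report_every != 0:
        yield count, total / count


rows = [4, 8, 6, 10, 2, 6, 12]
estimates = list(online_average(rows, report_every=3))
```

A user watching the estimates can stop the query as soon as the running value is good enough, rather than waiting for the final row.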

How Much is That Going to Cost Anyway? Calculating Cost of Goods Sold

One of the realities of running a business is that you will, at some point, need to make money. This means you must sell something for more than it costs to make.

Profit = Total Revenue - Total Expenses

So, to sell a service or feature (like a third-party API portal) at a profit, we need to understand how much it costs to make, so we know how much to sell it for.

Paper Review: An Array-Based Algorithm for Simultaneous Multidimensional Aggregates

Title and Author of Paper: An Array-Based Algorithm for Simultaneous Multidimensional Aggregates. Y. Zhao et al.

Summary: One of the core functions of an OLAP system is computing aggregations and group-by operations. This functionality has been characterized by the “Cube” operator, which computes group-by aggregations over all possible subsets of the specified dimensions. As an example of the Cube operator, consider a model with the dimensions product, store, and date, and the measured value sales.
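To show what the Cube operator computes, here is a naive sketch that sums sales for every subset of the dimensions product, store, and date. It recomputes each group-by from scratch, so it is not the array-based algorithm the paper proposes; the sample rows are made up for illustration.

```python
from collections import defaultdict
from itertools import combinations


def cube(rows, dimensions, measure):
    """Sum the measure for every subset of dimensions (2^n group-bys)."""
    result = {}
    for size in range(len(dimensions) + 1):
        for group in combinations(dimensions, size):
            totals = defaultdict(int)
            for row in rows:
                key = tuple(row[dimension] for dimension in group)
                totals[key] += row[measure]
            result[group] = dict(totals)
    return result


# Hypothetical fact table.
rows = [
    {"product": "pen", "store": "north", "date": "mon", "sales": 3},
    {"product": "pen", "store": "south", "date": "mon", "sales": 5},
    {"product": "ink", "store": "north", "date": "tue", "sales": 2},
]
aggregates = cube(rows, ["product", "store", "date"], "sales")
```

With three dimensions the result holds eight group-bys, from the grand total (the empty subset) up to the full product, store, date breakdown.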

Italics in Vim

I write Markdown using Vim and the vim-pencil plugin. One of the things that particularly bothered me was that, by default, iTerm2 and the solarized colour scheme did not support italic text. I finally sat down this evening and got this working. Really, I just found and watched Greg Hurrell’s YouTube video explaining the whole thing. Enabling italics requires updating the terminfo database. Terminfo enables programs to use the terminal in a device-independent manner.