Write-ahead logging and the ARIES crash recovery algorithm

A central tenet of databases is that committed data survives a crash or failure. Write-ahead logging is a fundamental primitive that ensures every change to data is first written safely to stable storage before it is applied. Coupled with careful use of sequence numbers, it lets us guarantee that changes made to a database survive system crashes. Motivation: Let’s start with a simple transaction T1 that reads object A and updates the value of A with a write. To simplify matters, A is stored on disk as a single page, as in Figure 1. ...

August 26, 2022 · 21 min · Kevin Sookocheff
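
The write-ahead rule from the excerpt above can be sketched in a few lines of Python. This is a toy, in-memory illustration only: the class names, the record layout, and the redo-only `recover` function are assumptions made for this sketch, and the replay is far simpler than ARIES, which also runs analysis and undo phases.

```python
class WriteAheadLog:
    """Toy stand-in for a log kept on stable storage (hypothetical layout)."""

    def __init__(self):
        self.records = []
        self.next_lsn = 1  # log sequence numbers increase monotonically

    def append(self, txn, kind, key=None, value=None):
        lsn = self.next_lsn
        self.next_lsn += 1
        self.records.append(
            {"lsn": lsn, "txn": txn, "kind": kind, "key": key, "value": value}
        )
        return lsn


class Database:
    def __init__(self, log):
        self.log = log
        self.pages = {}  # key -> (value, LSN of the last update applied)

    def write(self, txn, key, value):
        # Write-ahead rule: the log record is appended first...
        lsn = self.log.append(txn, "UPDATE", key, value)
        # ...and only then is the change applied to the page.
        self.pages[key] = (value, lsn)

    def commit(self, txn):
        self.log.append(txn, "COMMIT")


def recover(log):
    """Rebuild pages by replaying updates of committed transactions only."""
    committed = {r["txn"] for r in log.records if r["kind"] == "COMMIT"}
    pages = {}
    for r in log.records:
        if r["kind"] == "UPDATE" and r["txn"] in committed:
            pages[r["key"]] = (r["value"], r["lsn"])
    return pages
```

Replaying the log after a simulated crash keeps the write made by a committed T1 and drops a write from a transaction that never committed, which is the guarantee the excerpt describes.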

How to kill a process that is using a port on macOS

Every so often I get stuck with a running process that’s using a specific port, preventing me from running some new application that uses the same port. Then I’m left Googling for solutions or rebooting the machine to make progress. But not any more! I’m recording the solution for my future self (and of course for you, dear reader).

> sudo lsof -i :<PortNumber> # returns list of processes using the port, with PID
> kill -9 <PID> # kill the specific pid

For example, you can find out what’s running on port number 8080 by running the command: ...

June 29, 2022 · 1 min · Kevin Sookocheff

Progress is a lake, not a line

When people describe progress, they often describe it in terms of a linear progression taking us from primitive to advanced — an idea or invention occurs as a singular event, and somewhere further down the line of time a new idea or invention completely replaces it, relegating the old to the annals of history. This viewpoint is exemplified by traditional worldviews that organize all beings according to a chain of evolution, sometimes called the “great chain of being” (or scala naturae). The chain of being links God, angels, humans, animals, plants, and minerals in a hierarchy. All beings on earth, animate and inanimate, could be organized according to an increasing scale of perfection, from mushrooms and rocks at the bottom up through lobsters and rabbits, all the way to human beings and God at the top. ...

June 1, 2022 · 3 min · Kevin Sookocheff

Why Systems Work So Well

In the book “Thinking in Systems”, Donella Meadows dedicates an entire chapter to explaining why functioning systems seem to work so well. In it, she recognizes three characteristics: resilience, self-organization, and hierarchy. Resilience We can use the standard definition from the Oxford English dictionary to describe resilience: re·sil·ience /rəˈzilyəns/ noun the capacity to recover quickly from difficulties; toughness. “the often remarkable resilience of so many British institutions” the ability of a substance or object to spring back into shape; elasticity. “nylon is excellent in wearability and resilience” According to Donella Meadows this definition can be translated to systems as the “ability to survive and persist within a variable environment. The opposite of resilience is brittleness or rigidity.” ...

May 11, 2022 · 3 min · Kevin Sookocheff

Java For The Experienced Beginner

Java was the first programming language I was taught at University, and the language I used for the first decade of my career. It continues to be a reliable companion throughout my software development career. Unfortunately, not having developed with Java professionally for several years, I’ve found there are many aspects of the modern Java language that I’m simply not familiar with. To rectify this, I’ve collected the major improvements to the language beginning with Java 8, combined with a short explanation of how they work and how to use them. It assumes you know Java, but don’t really know Java. Hopefully, it can take you from experienced beginner to just plain experienced again. ...

April 27, 2022 · 43 min · Kevin Sookocheff

Behaviour Parameterization

One of the core features of modern Java is lambda expressions. Introduced in Java 8, lambdas provide concise syntax allowing the deferred execution of a block of code. Put a different way, lambdas allow us to pass behaviour as a method parameter. When the method executes, the lambda expression is run. This capability is often referred to as behaviour parameterization. Behaviour parameterization can be achieved in a number of ways, of which lambda expressions are usually the most convenient, and they are definitely the most concise. But what is behaviour parameterization, and why would we want to use it? To motivate this discussion, let’s work through a real-world example of filtering a list of items according to some criteria. More concretely, let’s investigate the problem of filtering a list of students to find the ones with the best grades. ...

April 26, 2022 · 6 min · Kevin Sookocheff
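
The post itself works in Java, but the idea of passing behaviour as a parameter carries over to other languages. As a rough analogue, here is the student-filtering problem in Python. The `Student` type, the `filter_students` helper, and the grade threshold are all made up for illustration and do not come from the post.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Student:
    name: str
    grade: float


def filter_students(
    students: List[Student], predicate: Callable[[Student], bool]
) -> List[Student]:
    # The filtering rule is not hard-coded here; the caller supplies it.
    return [s for s in students if predicate(s)]


students = [Student("Ada", 92), Student("Bob", 71), Student("Cy", 88)]

# The lambda is the behaviour being passed in. It runs only when
# filter_students calls it on each student.
top = filter_students(students, lambda s: s.grade >= 85)
```

Changing the criteria means passing a different lambda, while `filter_students` stays untouched.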

What complex systems can teach us about building software

As a software system scales, it becomes so large that the number of working parts, coupled with the number of programmers making changes to it, makes the behaviour of the system extremely difficult to reason about. This complexity is exacerbated by the transition of many organizations towards a microservice architecture, as exemplified by the so-called “death star” architecture, where each point on the circumference of the circle represents a microservice and the lines between services represent their interactions. ...

March 9, 2022 · 13 min · Kevin Sookocheff

Hybrid Logical Clocks

The theory of distributed systems promoted the use of logical clocks by introducing the idea of causality tracking as an abstraction for reasoning about concurrency between events in the system. In practice, a lot of systems continue to operate using physical time, which presents difficulties due to clock synchronization drift. In an effort to bridge the gap between physical and logical time, HybridTime combines both logical and physical times in one system. Hybrid Logical Clocks (HLC) extend these earlier causality and timekeeping systems: they capture the causality relationship like logical clocks, but they can also be substituted for physical clocks because their logical value always stays close to NTP time. With these semantics, HLC can be used in lieu of a physical clock source like NTP for database reads, and it can simultaneously be used as a logical clock to identify consistent global snapshots. ...

January 24, 2022 · 4 min · Kevin Sookocheff
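
To make the combination of physical and logical time concrete, here is a minimal Python sketch of a hybrid logical clock. The update rules follow the send and receive rules as I understand them from the HLC paper by Kulkarni et al.; the class name, method names, and the injected `physical_clock` callable are assumptions of this sketch, not part of the post.

```python
class HybridLogicalClock:
    """Timestamps are (l, c): l tracks the largest physical time seen so far,
    and c is a logical counter that breaks ties when l does not advance."""

    def __init__(self, physical_clock):
        self.physical_clock = physical_clock  # callable returning the current time
        self.l = 0
        self.c = 0

    def now(self):
        """Timestamp a local or send event."""
        pt = self.physical_clock()
        if pt > self.l:
            # Physical time moved forward: adopt it and reset the counter.
            self.l, self.c = pt, 0
        else:
            # Physical time has not advanced past l: bump the counter.
            self.c += 1
        return (self.l, self.c)

    def update(self, msg_l, msg_c):
        """Timestamp a receive event carrying the sender's (msg_l, msg_c)."""
        pt = self.physical_clock()
        new_l = max(self.l, msg_l, pt)
        if new_l == self.l and new_l == msg_l:
            self.c = max(self.c, msg_c) + 1
        elif new_l == self.l:
            self.c += 1
        elif new_l == msg_l:
            self.c = msg_c + 1
        else:
            self.c = 0  # the local physical clock is ahead of both
        self.l = new_l
        return (self.l, self.c)
```

Because `l` only moves to a value that some node's physical clock has actually reported, it stays close to physical time as long as the clocks themselves are reasonably synchronized, while `c` preserves causal ordering between events that share the same `l`.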

HybridTime

HybridTime introduces a hybrid between physical and logical clocks that can be used to implement globally consistent database snapshots. HybridTime is one implementation of the general concept of Hybrid Logical Clocks that combine the logical clock semantics of Lamport and Vector clocks with a physical representation of time such as Spanner’s TrueTime. HybridTime follows update semantics similar to those of Lamport and Vector Clocks, where each node in the system updates its internal clock in response to events received. It differs in that HybridTime values are not purely logical: they include a physical time component that allows values to be associated with a physical point in time. ...

January 11, 2022 · 5 min · Kevin Sookocheff

TrueTime

TrueTime is Google’s solution for providing globally consistent timestamps that determine the ordering of events. Originally developed to support the Spanner distributed database, TrueTime is a clock implementation that depends on two key factors: well-engineered, accurate GPS and atomic clocks, and the representation of time as an interval of uncertainty. Representing TrueTime: TrueTime explicitly represents each timestamp as an interval that includes bounded time uncertainty. TrueTime represents time as an interval of the type TTinterval, which includes two timestamps for the beginning and the end of the interval. These timestamps have type TTstamp. With this in mind, the TrueTime API provides the following methods reproduced from Spanner: Google’s Globally-Distributed Database: ...

December 23, 2021 · 5 min · Kevin Sookocheff
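
The interval representation can be illustrated with a small Python model. This is only a sketch under stated assumptions: the `TrueTimeSketch` class, the injected `clock` callable, and the fixed `epsilon` uncertainty bound are invented for illustration, and the `now`, `after`, and `before` methods mirror the API as I recall it from the Spanner paper rather than any real client library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TTInterval:
    earliest: float  # the true time is assumed to be no earlier than this
    latest: float    # ...and no later than this


class TrueTimeSketch:
    def __init__(self, clock, epsilon):
        self.clock = clock      # callable returning the local clock reading
        self.epsilon = epsilon  # assumed bound on clock uncertainty

    def now(self):
        # Report an interval around the local reading instead of a single point.
        t = self.clock()
        return TTInterval(t - self.epsilon, t + self.epsilon)

    def after(self, t):
        # True only if t has definitely passed.
        return t < self.now().earliest

    def before(self, t):
        # True only if t has definitely not arrived yet.
        return t > self.now().latest
```

A timestamp that falls inside the current interval is neither definitely past nor definitely future, so both `after` and `before` return False for it. That ambiguity is exactly what the interval makes explicit.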