Imagine a fledgling software startup consisting of one or two developers. They are following the lean startup methodology by throwing ideas and implementations at a wall to see what sticks. This methodology demands keeping your application as simple as possible until you find the right market. An ongoing concern for the developers is finding a simple, flexible way to store application data: NoSQL or SQL?
The NoSQL database offers a premium out-of-the-box experience: install a package, start the database, and post and retrieve data using a JSON API. And by eschewing the details of a schema and the inconvenience of data modelling, the NoSQL database allows for fast iteration.
The SQL solution requires more up-front investment: installation requires configuring a multitude of knobs and switches, inserting data requires first setting up your database tables and schemas, and getting data out of the system requires pairing SQL with an object-relational mapping system.
Given these trade-offs, the startup chooses NoSQL. This choice is vindicated as the product matures and the first paying customers are secured — it’s time to celebrate the brilliant technical and business decisions leading to this moment and frame that first revenue check for posterity.
Stage One: Denial
Having chosen a NoSQL database, you have implicitly made two very important decisions: you don’t need ACID transactions, and you don’t need a schema. These decisions may seem innocuous, but they have a profound impact on the future of your application.
Transactions
The first form of denial is that you don’t need ACID transactions. This may be true during the early days of an application, but there is a broad set of use cases where transactions make your application logic easier to write and to reason about. Abandoning transactions completely makes these tasks much more difficult.
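To make the point concrete, here is a minimal sketch of the kind of logic a transaction simplifies. It uses SQLite through Python's standard sqlite3 module purely for illustration; the accounts table, the transfer helper, and the balances helper are hypothetical names, not part of any particular application.

```python
import sqlite3

# Hypothetical accounts table, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany(
    "INSERT INTO accounts (name, balance) VALUES (?, ?)",
    [("alice", 100), ("bob", 50)],
)
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; both updates commit or neither does."""
    try:
        # Using the connection as a context manager commits on success
        # and rolls back if any statement raises.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft; the credit to dst
        # made earlier in the same transaction has been rolled back too.
        return False


def balances(conn):
    """Return the current balances as a {name: balance} dict."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

Calling transfer(conn, "alice", "bob", 500) fails on the second update, and the first update is undone with it. Without transactions, the application would have to detect the half-finished transfer and reverse the credit itself.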
Schema
The second form of denial is that you don’t need a schema. As described elsewhere, deferring the creation of a schema to application code is most useful for semi-structured data. Yet it is hard to think of use cases where semi-structured data is the norm: almost all data inserted into a database consists of properties with values on a single entity or table. Ultimately, this form of denial implies that it is the application’s responsibility to track the relationships between pieces of data.
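As a rough illustration of what a schema does for you, the sketch below declares the relationship between two tables once and lets the database reject data that breaks it. It again assumes SQLite via Python's sqlite3 module; the customers and orders tables and the try_insert_order helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE customers (
        id    INTEGER PRIMARY KEY,
        email TEXT NOT NULL UNIQUE
    );
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        total_cents INTEGER NOT NULL CHECK (total_cents >= 0)
    );
    """
)
conn.execute("INSERT INTO customers (id, email) VALUES (1, 'a@example.com')")
conn.commit()


def try_insert_order(conn, customer_id, total_cents):
    """Return None on success, or the database's error message on rejection."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO orders (customer_id, total_cents) VALUES (?, ?)",
                (customer_id, total_cents),
            )
        return None
    except sqlite3.IntegrityError as exc:
        # Raised for NOT NULL, CHECK, and FOREIGN KEY violations alike.
        return str(exc)
```

An order for a customer that does not exist, or one with a negative total, never reaches the table, no matter which application tried to write it.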
Stage Two: Anger
Having worked through denial and accepted that maybe transactions are nice to have, and that maybe schema-on-write is useful, you realize something else about your NoSQL database. It fails. You lose data.
NoSQL databases are designed to be deployed in distributed clusters that provide a balance between consistency and availability. And the moment you step into the realm of distributed systems, you must contend with complex distributed algorithms for coordination, replication, and failover.
As chronicled by Kyle Kingsbury in his excellent Jepsen series of posts, providing consistency guarantees in a distributed environment is difficult.
Stage Three: Bargaining
Your anger at lost data has subsided, and you realize that you need transactions and that a schema can be a good idea. What now? You make a bargain with your NoSQL database.
The first bargain is to write a transaction manager for your application. Unfortunately, transaction systems are typically monolithic, deeply entwined systems involving interdependencies between … concurrency control, recovery management, and access methods. You may succeed, but the end result will be difficult to extend or maintain and has likely cost your business more value than it provided.
The second bargain is to navigate around your schema-less database using complex organizational rules and processes for enforcing data integrity. These rules are embedded in application logic and need to be communicated to every development team using the database and written into every application that accesses the data.
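A sketch of what such a rule tends to look like when it lives in application code is shown below. The document shape, the REQUIRED_FIELDS table, and the validate_order function are all hypothetical; the point is only that every application writing to the store needs an equivalent of this check.

```python
# Hypothetical rules for an "order" document. Every application that writes
# to the store would need its own copy of this check.
REQUIRED_FIELDS = {"customer_id": int, "total_cents": int}


def validate_order(doc, known_customer_ids):
    """Return a list of rule violations; an empty list means the doc is OK."""
    errors = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in doc:
            errors.append(f"missing field: {field}")
        elif not isinstance(doc[field], expected_type):
            errors.append(f"wrong type for field: {field}")

    total = doc.get("total_cents")
    if isinstance(total, int) and total < 0:
        errors.append("total_cents must be non-negative")

    # Referential integrity has to be checked by hand, and the check can
    # race with a concurrent delete of the customer.
    customer_id = doc.get("customer_id")
    if isinstance(customer_id, int) and customer_id not in known_customer_ids:
        errors.append("unknown customer_id")
    return errors
```

Nothing stops a second application, or an ad-hoc script, from skipping this function entirely, which is why the rules also have to be communicated and policed.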
Stage Four: Depression
After coming up with a workable solution to your transaction problem, and an ad-hoc solution for enforcing a schema, you realize your NoSQL database can’t handle your reporting requirements. It can’t join tables. It can’t group records. Complex queries are out of the question. Depression sets in as you look to Hadoop for your reporting needs, only to realize the significant operational expertise required to manage a Hadoop cluster.
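For comparison, here is roughly what a simple report looks like when the database can join and group for you. This is a sketch using SQLite through Python's sqlite3 module; the tables and sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        total_cents INTEGER NOT NULL
    );
    INSERT INTO customers (id, name) VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO orders (customer_id, total_cents)
        VALUES (1, 1000), (1, 2500), (2, 400);
    """
)

# Revenue per customer: one join, one GROUP BY, sorted by revenue.
report = conn.execute(
    """
    SELECT c.name,
           COUNT(o.id)        AS order_count,
           SUM(o.total_cents) AS revenue_cents
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.id, c.name
    ORDER BY revenue_cents DESC
    """
).fetchall()

for name, order_count, revenue_cents in report:
    print(name, order_count, revenue_cents)
```

In a store without joins or grouping, the same report means fetching both collections and doing the matching and summing in application code, or exporting the data to a separate system.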
Stage Five: Acceptance
Wow. Maybe NoSQL is not a panacea. Maybe I have needs that it doesn’t cover?
Finally accepting this truth, you critically evaluate your database solution. You find that some portions of your data are relational, and some portions are not. You find that relational databases can scale to meet some of your needs, and NoSQL solutions scale to meet others. You understand that the benefits of NoSQL must be weighed against an equal number of costs.
Ultimately, you accept that technology choice should not be driven by media hype but by taking your time to evaluate technologies against the criteria that matter to you and your business. You understand that early adopters live at the edge of a cliff, as likely to fall off as to climb higher.