I've been talking to a decent number of developers about the system that Robert and I are writing at GFX, and I'm met with many blank stares every time the acronym EAV pops up. I thought it might be worthwhile to try and explain what is probably the most double edged sword in data modeling I've ever run into. If you do a search for EAV problems, you'll get 99 people out of 100 telling you one horror story after another about implementing/using it in production systems. They're not crazy, or wrong. EAV architecture is a very tricky beast to tame. Unfortunately, there's not an alternative I've found out there that gives the flexibility that an EAV system offers in an RDBMS. Knowledge is power, and in order to tame the beast, you first must understand it.
EAV (Entity Attribute Value) systems have been around for a good long while, if you're unfamiliar with the concepts, check out that wiki link, and I'll do my best to sum in about a paragraph or two here:
Basically, it takes the concept of a Table, with Columns describing the table's data and a row's context, and pivot's everything. You have your standard Entity that you're describing (a Car for example), which you use rows to describe your columns (the attributes, eg. Size, Color, WeightInLbs, # of doors, etc), and a matrix of rows to describe an individual Entity's attributes (Values, eg. Midsize, Red, 2210, 4). In a RDBMS, it's easy for users to create rows, but difficult (and almost always a bad practice) for them to create columns/tables. Using EAV offers a pretty high degree of flexibility within an application, but it's not without it's downsides: Performance, Deadlocking, and an increase in Complexity are the front runners.
There's two types of EAV systems: EAV, and EAV/CR (Class Relationships). The first, is the type of EAV system you get from most modern ecommerce sites, where you can add any number of attributes to one of your products and they're all arbitrary and weakly typed. This is the simplest implementation, and (my guess) the most common. The second is where the rabbit hole opens up. In that scenario, an attribute is given a more formal definition and contains meta attribute information. This is the EAV type I'll be talking about. A good read on EAV/CR can be found here.
The large project I have been working on at my day job, is essentially an EAV/CR system for our clients. I'll quickly state the most basic requirements of the system:
- Allows users to define their own Attributes, and strongly type them both in the domain and in the database.
- Theoretically needs to support unlimited number of attributes a user can create to describe their entity.
- Different "sets" of EAV data, so one user could have completely different data that is not shared between the context boundaries. (Context boundary could be anything, but typically is tied to each client, so every client has their own sets of attributes for entity types)
- Multiple Entity types, with their own unique attributes
- Extended attribute functionality that recreates many RDBMS column properties (eg. Uniqueness, Formulas, Bind able Context menus, etc.)
- Must be easily searchable
- Must allow for fast and scalable operations
These requirements are not without their own basic implementation challenges, but those last two bullet points, in my experience, are where the majority of the complications in building a system like this arise. EAV/CR's suffer from an exponential degradation in performance. n attributes with x entities produces nx=(v)alues. With 4,000 entities of the same type, and 200 attributes for each entity, you end up with 80,000 value rows. That's just for one entity type for one contextual boundary. Extend that with 50+ clients, and 10 entity types (80000 * 50 * 10), and you can quickly see how your v table gets out of control.
The complexities of re-implementing your own database in code causes searching to become an exercise in pivoting and endless sub-queries forking for each attribute data type you expose. Full text indexing goes from being a checkbox in most DB engines to a caching like framework you inject into your system.
In later parts of this series, I'm going to talk about some implementation challenges and a few ways I've found to overcome them in OO and NHibernate.