Best Guess Theory

A place to discuss Development techniques, .NET, XNA, NHibernate or anything else that tickles your fancy

Friday, February 5, 2010

The NHibernate Dereference pattern

Sometimes NHibernate associations are so complex that relying on cascades is not possible or optimal. Other times, you need to ensure that an object deleted through the session is actually removed from your in-memory reference model (e.g. the loaded entities which still reference it) without having to refresh your objects from the database. I ran into this a while back when dealing with a bi-directional ManyToMany in which neither side would allow an all-delete-orphan cascade, and created a variant of the Disposable pattern to resolve it.

I'm sure that other developers implement something similar during their delete process, but I haven't run across anything formally defined. This keeps coming up, and after pointing several FluentNHibernate & NHibernate users in this direction I decided to do a writeup. I'm going to try to state the pattern here for your reference and feedback. Disclaimer: I'm not saying it's perfect, or that there's not something better. I'm just saying it works for what we need, and we haven't found anything better yet.

Intent:
Called before an NHibernate Session.Delete(). Disassociates the entity from the domain model by removing all references to it, which lets NHibernate generate the appropriate UPDATE/DELETE statements in the required order, so there are no foreign key violations. This pattern works with inheritance.

Implementation:


    using System.Collections.Generic;
    using System.Linq; // needed for ToList()

    public interface IDereferenceable
    {
      void Dereference();
    }

    public abstract class PersistedBase : IDereferenceable
    {
      // Guards against dereferencing the same instance more than once.
      protected bool IsDereferencing;

      public void Dereference()
      {
        Dereference(IsDereferencing);
      }

      protected virtual void Dereference(bool dereferencing)
      {
        if (dereferencing)
        { return; }
        IsDereferencing = true;

        // Dereference Children (enumerate a copy; we modify the collection)
        foreach (Child child in Children.ToList())
        {
          RemoveChildCore(child);
        }

        // Dereference Parents (declaration of the parent field is omitted here)
        if (parent != null)
        {
          parent.RemoveChild(this);
          parent = null;
        }
      }

      private ICollection<Child> children;
      public IEnumerable<Child> Children { get { return children; } }

      public void RemoveChild(Child child)
      {
        if (!IsDereferencing)
        {
          RemoveChildCore(child);
        }
      }

      protected void RemoveChildCore(Child child)
      {
        children.Remove(child);
        IDereferenceable deref = child as IDereferenceable;
        if (deref != null)
        {
          deref.Dereference();
        }
      }
    }

    public class Concrete : PersistedBase
    {
      protected override void Dereference(bool dereferencing)
      {
        if (dereferencing)
        { return; }
        IsDereferencing = true;

        // Dereference Children (enumerate a copy; we modify the collection)
        foreach (StepChild stepChild in StepChildren.ToList())
        {
          RemoveStepChildCore(stepChild);
        }

        // Dereference Parents (declaration of the stepParent field is omitted here)
        if (stepParent != null)
        {
          stepParent.RemoveChild(this);
          stepParent = null;
        }

        // Pass false so the base class does not skip its own dereferencing.
        base.Dereference(false);
      }

      private ICollection<StepChild> stepChildren;
      public IEnumerable<StepChild> StepChildren { get { return stepChildren; } }

      public void RemoveStepChild(StepChild stepChild)
      {
        if (!IsDereferencing)
        {
          RemoveStepChildCore(stepChild);
        }
      }

      protected void RemoveStepChildCore(StepChild stepChild)
      {
        stepChildren.Remove(stepChild);
        IDereferenceable deref = stepChild as IDereferenceable;
        if (deref != null)
        {
          deref.Dereference();
        }
      }
    }


Let's go down the list:
  1. Our abstract class/interface. This is the base class that handles basic dereference association and implements the IDereferenceable interface.
  2. IsDereferencing - This is similar to the IsDisposing/Disposed flag on IDisposable objects. It's a way to ensure that we only Dereference an object once. It is only set within the protected Dereference method, and only when the dereferencing parameter passed in is false.
  3. Dereference() - Public no-parameter. This is what is called by the code that's about to delete the object. It's also called by other classes whenever they want to dereference this instance as a child reference. Passing in the current IsDereferencing flag prevents us from getting into a cyclic call if it's called several times by many parent classes.
  4. Dereference(bool dereferencing)
     A.) We escape early if our dereferencing parameter is true, so we don't perform any dereference logic more than once.
     B.) Dereference children first. Children are typically going to be defined as collections, but there are always exceptions.
     C.) We enumerate over a copy of the collection. Reason? We're going to get a "collection was modified during enumeration" exception otherwise.
     D.) We call the protected RemoveChildCore(child), which does not do a dereferencing check (otherwise nothing would happen).
     E.) We check first to see if parent is null. This is primarily here for situations where a parent is allowed to be null, or during testing where you may not populate all the parents of an entity.
     F.) If parent is not null, then we call the public RemoveChild(this) method on the parent so we're no longer referenced in a parent collection. It's important to note that RemoveChild() on the parent does perform a !IsDereferencing check on the parent, just like we have for the child. After all, the child's Dereference() could have been called by the parent's Dereference(), so we want to respect this.
     G.) We null out the parent.
  5. Our collection is exposed out as a read only collection, which tells the user that we do not allow adding/removing directly on the collection. Good practice and safety precaution if you have additional association logic, which you would put into the AddChild(Child child) method (not implemented here for the sake of simplifying the example).
  6. The public RemoveChild(Child child) checks if we're dereferencing. If we are, then we skip actually removing the child. This is there so that when children are being dereferenced, and call parent.RemoveChild(this), they do not enter a cyclic loop. If we're not dereferencing, it calls RemoveChildCore.
  7. RemoveChildCore(Child child) has no checks, and is protected, so it should be tightly controlled by the class hierarchy. This will remove the item from the private collection, cast the child as a dereferenceable, and then if the child is dereferenceable, it will proceed to call Dereference on it. NOTE: child.Dereference() should only be called if the child dies with the parent, so if the parent is dereferenced, the child should be as well. In some cases this does not hold, and there might only be an incidental association between them, as with a ManyToMany() association. In that case, dereferencing the child should be avoided (though you may still need to keep the two lists synchronized in the MTM example, so you'd call child.RemoveParents(this) in place of child.Dereference()).
  8. The Concrete implementation of the base class. Here we have a similar setup as the base class, except we've got additional information that needs to be dereferenced (another collection and parent, defined as StepChildren and StepParent)
  9. Dereference(bool dereferencing). Our setup here is the same, and can be referenced from 4 A-G. However, it adds a new element at the end: H.) We call base.Dereference(false) to continue up the dereference stack. Passing it false is important, since we know we're in the middle of dereferencing and need to bypass the base class's dereferencing check. If we passed the IsDereferencing field, the base Dereference would not fire, meaning that any references specific to the base class would still hold a reference to our class. As we're overriding, polymorphism dictates that our bottom level class (most concrete) will have Dereference called on it first. So it dereferences from most specific to most generic.
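For the incidental (ManyToMany) case mentioned in item 7, one possible shape is sketched below. Article, Tag, and the AddArticleCore/RemoveArticleCore method names are hypothetical and not part of the code above; the only point is that dereferencing one side unlinks both lists without dereferencing the entity on the other side.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical many-to-many pair: an Article has many Tags and vice versa.
public class Tag
{
    private readonly ICollection<Article> articles = new List<Article>();
    public IEnumerable<Article> Articles { get { return articles; } }

    internal void AddArticleCore(Article article) { articles.Add(article); }

    // Only unlinks the Article; the Tag itself lives on.
    internal void RemoveArticleCore(Article article) { articles.Remove(article); }
}

public class Article
{
    private readonly ICollection<Tag> tags = new List<Tag>();
    private bool isDereferencing;

    public IEnumerable<Tag> Tags { get { return tags; } }

    public void AddTag(Tag tag)
    {
        tags.Add(tag);
        tag.AddArticleCore(this);
    }

    public void Dereference()
    {
        if (isDereferencing) { return; }
        isDereferencing = true;

        // Enumerate over a copy, since we modify the collection as we go.
        foreach (Tag tag in tags.ToList())
        {
            tags.Remove(tag);
            // Keep the other side in sync, but do NOT dereference the Tag.
            tag.RemoveArticleCore(this);
        }
    }
}
```

After article.Dereference(), the article holds no tags and no tag still lists the article, while every Tag stays attached to its other articles.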
Usage:

    using (var session = sessionFactory.OpenSession())
    using (var transaction = session.BeginTransaction())
    {
      var concrete = session.Get<Concrete>(id);
      concrete.Dereference();
      session.Delete(concrete);
      transaction.Commit();
    }


So there you go, a way to manage the disassociations with NHibernate (or another ORM) and your object model. Hopefully this helps someone out, and if it doesn't, feel free to leave a comment if you've got any critiques or questions. Always open to feedback! (Also wrote all of the code in notepad, so if anything's off let me know and I'll correct it)

Friday, January 15, 2010

CodeMash 2010 Session Materials

Just finished my second session at CodeMash. Was a good time, and had a pretty packed house. As promised:


Thanks for attending, and CodeMash has been pretty amazing. Will come to Ohio again ;)

Thursday, December 31, 2009

EAV in OO/NHibernate: Part 1 - Intro

I've been talking to a decent number of developers about the system that Robert and I are writing at GFX, and I'm met with many blank stares every time the acronym EAV pops up. I thought it might be worthwhile to try and explain what is probably the most double edged sword in data modeling I've ever run into. If you do a search for EAV problems, you'll get 99 people out of 100 telling you one horror story after another about implementing/using it in production systems. They're not crazy, or wrong. EAV architecture is a very tricky beast to tame. Unfortunately, there's not an alternative I've found out there that gives the flexibility that an EAV system offers in an RDBMS. Knowledge is power, and in order to tame the beast, you first must understand it.

EAV (Entity Attribute Value) systems have been around for a good long while. If you're unfamiliar with the concepts, check out that wiki link, and I'll do my best to sum it up in a paragraph or two here:

Basically, it takes the concept of a Table, with Columns describing the table's data and a row's context, and pivots everything. You have your standard Entity that you're describing (a Car for example), rows that describe what would normally be your columns (the Attributes, e.g. Size, Color, WeightInLbs, # of doors, etc.), and a matrix of rows to describe an individual Entity's attributes (the Values, e.g. Midsize, Red, 2210, 4). In an RDBMS, it's easy for users to create rows, but difficult (and almost always a bad practice) for them to create columns/tables. Using EAV offers a pretty high degree of flexibility within an application, but it's not without its downsides: Performance, Deadlocking, and an increase in Complexity are the front runners.
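To make the pivot concrete, here is a minimal in-memory sketch. AttributeRow, ValueRow, and EavPivot are hypothetical names for illustration only (not from the system described in this post), and values are stored as plain strings, which matches the simple weakly typed flavor of EAV rather than EAV/CR.

```csharp
using System.Collections.Generic;
using System.Linq;

// One row per attribute definition (what would be a column in a normal table).
public class AttributeRow
{
    public int Id { get; set; }
    public string Name { get; set; }
}

// One row per (entity, attribute) pair, holding that entity's value.
public class ValueRow
{
    public int EntityId { get; set; }
    public int AttributeId { get; set; }
    public string Value { get; set; }
}

public static class EavPivot
{
    // Rebuilds the "wide" view of a single entity: attribute name -> value.
    public static Dictionary<string, string> Pivot(
        int entityId,
        IEnumerable<AttributeRow> attributes,
        IEnumerable<ValueRow> values)
    {
        return values
            .Where(v => v.EntityId == entityId)
            .Join(attributes, v => v.AttributeId, a => a.Id,
                  (v, a) => new { a.Name, v.Value })
            .ToDictionary(x => x.Name, x => x.Value);
    }
}
```

With attributes Color and Size and two value rows for car 1 (Red, Midsize), Pivot(1, attributes, values) gives back a dictionary that looks like one row of a conventional Car table.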

There are two types of EAV systems: EAV, and EAV/CR (Class Relationships). The first is the type of EAV system you get from most modern ecommerce sites, where you can add any number of attributes to one of your products and they're all arbitrary and weakly typed. This is the simplest implementation, and (my guess) the most common. The second is where the rabbit hole opens up. In that scenario, an attribute is given a more formal definition and contains meta attribute information. This is the EAV type I'll be talking about. A good read on EAV/CR can be found here.

The large project I have been working on at my day job is essentially an EAV/CR system for our clients. I'll quickly state the most basic requirements of the system:
  • Allows users to define their own Attributes, and strongly type them both in the domain and in the database.
  • Theoretically needs to support unlimited number of attributes a user can create to describe their entity.
  • Different "sets" of EAV data, so one user could have completely different data that is not shared between the context boundaries. (Context boundary could be anything, but typically is tied to each client, so every client has their own sets of attributes for entity types)
  • Multiple Entity types, with their own unique attributes
  • Extended attribute functionality that recreates many RDBMS column properties (e.g. Uniqueness, Formulas, Bindable Context menus, etc.)
  • Must be easily searchable
  • Must allow for fast and scalable operations
These requirements are not without their own basic implementation challenges, but those last two bullet points, in my experience, are where the majority of the complications in building a system like this arise. EAV/CR systems degrade quickly in performance because the value table grows multiplicatively: n attributes across x entities produce n × x = v value rows. With 4,000 entities of the same type, and 200 attributes for each entity, you end up with 800,000 value rows. That's just for one entity type for one contextual boundary. Extend that to 50+ clients and 10 entity types (800,000 × 50 × 10 = 400,000,000 rows), and you can quickly see how your v table gets out of control.
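Multiplying the figures out is a quick sanity check: 4,000 entities × 200 attributes is 800,000 value rows for a single entity type in a single context. The helper below is a hypothetical sketch (EavSizing is not part of any real system) and assumes the worst case, where every entity type in every context carries the same number of entities and attributes.

```csharp
public static class EavSizing
{
    // v = entities * attributes, repeated for each entity type and each
    // context boundary (e.g. client). Assumes uniform counts everywhere.
    public static long ValueRows(long entities, long attributesPerEntity,
                                 long entityTypes, long contexts)
    {
        return entities * attributesPerEntity * entityTypes * contexts;
    }
}
```

EavSizing.ValueRows(4000, 200, 1, 1) is 800,000, and EavSizing.ValueRows(4000, 200, 10, 50) is 400,000,000.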

The complexities of re-implementing your own database in code causes searching to become an exercise in pivoting and endless sub-queries forking for each attribute data type you expose. Full text indexing goes from being a checkbox in most DB engines to a caching like framework you inject into your system.

In later parts of this series, I'm going to talk about some implementation challenges and a few ways I've found to overcome them in OO and NHibernate.

Thursday, November 5, 2009

Join Fetch on NH queries returns dupes

Domain:

    public class Parent
    {
      public ICollection<Child> Children { get; set; }
    }

    public class Child
    {
      public string Description { get; set; }
    }


Say we want to retrieve all the Parents in HQL, but we want to join fetch to get the children simultaneously, so we don't lazy-load them. In my case, I knew I'd be working with children immediately after retrieving Parents. The HQL would be something like this:

SELECT p FROM Parent p JOIN FETCH p.Children

Nothing fancy. When we execute that HQL query, we're going to get a Parent row returned to us for every child of every parent. So if we've got 3 Parents, and each one has 2 children, we will have 6 rows returned from that statement.

Each row will have all of the attributes of Parent on it (as columns), and it will also have all of the attributes of child (as columns). This is how NHibernate ensures it retrieves both objects in the database simultaneously.

So, after you execute the query, NHibernate gives you a list of 6 Parents, in which each Parent appears twice. Why? Because that's what came back as rows when you asked for Parents. The rows contain both Parent and Child, but you're only selecting the Parent as the result of the query. NHibernate is smart enough to preload each Parent's collection of Children, so when you access the collection it does not need to go back to the database to load it.

So my problem was how to get just 3 Parents back instead of 6. A few people might suggest using DISTINCT in your HQL, so it'd look something like:

SELECT DISTINCT p FROM Parent p JOIN FETCH p.Children

Execute that, and you'll still get 6 Parents: each Parent, along with its dupe. Why? Because DISTINCT is applied in SQL, and those rows are not dupes (each one carries a different Child). Only the objects that are hydrated in the object model are. So you're not going to have any luck there.

What are your options?
1.) Use the results transformer:
query/criteria.SetResultTransformer(new DistinctRootEntityResultTransformer());
The result transformer should work after NH has retrieved the objects back from the database, and will attempt to do a comparison with your root entity (I believe on the ID) to determine if they're the same.
2.) Do it yourself in Linq with results.Distinct();
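Option 2 can be illustrated without a database. The sketch below only simulates the shape of the result list: the same Parent instance repeated once per child row. This Parent class and the JoinFetchDemo helper are stand-ins for illustration, not NHibernate API, and the sketch assumes the duplicates are the same object reference (which is what you get inside a single session).

```csharp
using System.Collections.Generic;
using System.Linq;

public class Parent
{
    public string Name { get; set; }
}

public static class JoinFetchDemo
{
    // Simulates a join-fetched result: 3 parents with 2 children each
    // come back as 6 entries, every parent instance appearing twice.
    public static List<Parent> SimulatedResults()
    {
        var parents = new[]
        {
            new Parent { Name = "A" },
            new Parent { Name = "B" },
            new Parent { Name = "C" }
        };
        return parents.SelectMany(p => new[] { p, p }).ToList();
    }

    // Distinct() falls back to reference equality here, because Parent
    // does not override Equals/GetHashCode.
    public static List<Parent> Dedupe(IEnumerable<Parent> results)
    {
        return results.Distinct().ToList();
    }
}
```

If your entities override Equals/GetHashCode (for example, comparing on ID), Distinct() will use that logic instead.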

Hopefully this helps explain the situation a bit for a few people out there that might be experiencing the issue.

Saturday, October 17, 2009

It's been a while

Things have been busy, and apologies all around for not updating this as often as I should.

I'm doing an NHibernate presentation at the Strange Loop conference in St. Louis next week, although Robert will not be co-presenting contrary to the session description. Hopefully, after I return from there, there will be more time for me to continue the NH/FNH tips and general updates. (I will end up making a post with the session materials after the session however)

Since my last post, FNH 1.0 has come out, thanks in large part to the efforts of James Gregory; Tuna's gotten into XNA and found several physics frameworks for it that I need to check out (and make a few XNA related posts about); and NH officially went to 2.1.

I've also just learned that my FNH session has been accepted at CodeMash for January, so I hope to see anyone I don't see in St. Louis next week, there.

Until then!

Thursday, July 16, 2009

Video: Fluent NHibernate session from Chicago's Alt.NET

Sergio posted the video of my session from 7/8.
http://chicagoalt.net/event/July2009Meeting060withFluentNHibernate

This one was a good bit longer than the one from Chicago Code Camp, due to no hard time restrictions. Thanks Sergio!

Thursday, July 9, 2009

NHibernate Tip #3 - ID's

(Prequalifier, this is still a subject of much Debate, please toss in your thoughts into the comments. These are Tips that I've run across that I wanted to share, it's not to say that this is the best way to do something every time, and should not be considered the catch all solution)

A subject of much debate, especially if anyone in the debate happens to have anything resembling a background as a DBA. I'll come right out and say what I believe to be the most ideal solution in a non-legacy environment that isn't geared towards squeezing the most performance out of the database as possible:



    <id name="Id"
        column="PrimaryTableID"
        type="System.Guid">
      <generator class="assigned" />
    </id>
    <version name="Version"
             unsaved-value="0" />




I would also recommend to have all of your objects generate the ID in the non-default constructor (you typically want to leave an empty default constructor for NHibernate for it to load an object back from the database without any "Object initialization" logic)

The above mapping is, in my opinion, the most flexible ID setup to use with NHibernate. Why?
  • NHibernate never has to go back to the database to "find out" what the ID is for the object, which is a very expensive operation on an Identity generation setup. A Hi-Lo setup would be a decent second choice to address this concern
  • Assuming you generate the ID in the non-default constructors, the ID of an object created anywhere is going to be consistent, whether you're outside of a session (such as in model tests), or in a session using hydrated objects
  • It significantly simplifies the equality checks you're required to do. As someone commented in my previous post, if you're unable to rely on having a unique ID at all times, your equality logic has to start incorporating additional rules to account for the pre-persistence scenarios
  • Version manages when an object should be inserted through a cascade or not (as opposed to an unsaved value attribute). Version is also handy to have for other reasons that I'll bring up in a later post.
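A minimal sketch of what such an entity could look like is below. Entity and Customer are hypothetical names, and the equality check is deliberately simplified (it compares on Id only and ignores things like proxy types), so treat it as an illustration of the idea rather than a drop-in base class.

```csharp
using System;

public abstract class Entity
{
    // Parameterless constructor left empty for NHibernate to use when
    // loading from the database; no ID is generated here.
    protected Entity() { }

    // Non-default constructor: the ID is assigned at creation time.
    protected Entity(Guid id) { Id = id; }

    public virtual Guid Id { get; protected set; }
    public virtual int Version { get; protected set; }

    // With an ID that always exists, equality can rely on it alone.
    public override bool Equals(object obj)
    {
        var other = obj as Entity;
        return other != null && other.Id == Id;
    }

    public override int GetHashCode() { return Id.GetHashCode(); }
}

public class Customer : Entity
{
    protected Customer() { }

    public Customer(string name) : base(Guid.NewGuid())
    {
        Name = name;
    }

    public virtual string Name { get; protected set; }
}
```

A new Customer("Acme") has a non-empty Id before it ever touches a session, so it can be compared or put into a hash set in model tests the same way as a hydrated instance.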

A couple of cons:
  • Looking at Guids in the database is not a pleasing experience, but really, you shouldn't be interacting with the database at that level. It's an implementation detail.
  • Using Guids in the database can increase DB fragmentation, but really, that's where I feel the Database Engine Tuning Advisor (if you're using MsSql) or a DBA steps in. There are ways around this, and as far as we've been able to tell with millions of rows in a production application, the differences seem to be almost unnoticeable. Still, there's no way to get around it: it's not the best performing scenario on the DB side.

To be honest, I was originally on the side of continuing to use Identity, although I was looking at a hi-lo generator algorithm with it. My background is in a more DBA-ish field, so the concept presented above was like garlic to a vampire.

Times, they be a-changin. We need to let go of some of our old DBA habits in favor of emerging frameworks/concepts which can hide persistence complexities.