Filing this under things Tuna has recommended me that I don't have time to check into yet, but which look pretty friggan sweet. I'll update later with anything I find on it. Or you guys can tell me if you've had any experience with it ;)
Best Guess Theory
A place to discuss Development techniques, .NET, XNA, NHibernate or anything else that tickles your fancy
Wednesday, February 17, 2010
Friday, February 12, 2010
Converting CHM to ePUB (and an e-reader review)
Just got my Astak Ez Reader Pocket Pro 5", and I love it. It's the perfect size, solid resolution, has all the features, all the document support you could ever want, is very fast, has 3 different places to change the page, including link navigation, supports up to a 16gb SD card (aka a metric assload of books), no bullshit DRM and has a battery life measured in months. That's right. Months. This thing just doesn't die. It's brilliant. I've been raving about it for the week or so that I've had it now.
So on to the issue. I have a few technical books as CHM files (Old school, right?), and with the advent of ePub as a standard e-reader format, I wanted a way to convert them into some spiffy epubs so I can take advantage of text reflow (which is also supported in PDFs on the Astak). ePub is an amazing format. Low size, fantastic orientation, and chapter support. Plus, it's a standard document definition, unlike PDF, which is all over the place. It's also quite a good bit faster than PDF to navigate and turn pages on. Faster in the e-reader world typically means less processing, and less processing means longer battery life (plus a happier reader not having to wait as long for the page to turn).
Step 1: CHM to HTML
You have to get the CHM into HTML somehow. There are a few tools to do this. My favorite is ABC Amber's CHM Converter. It does cost, but it's worth it if you've got a lot of CHMs. If you don't, you can still use the trial and get away with not having watermarks crapped out all over the result (I'll explain in a few). You can also use hh.exe in Windows (if that's your platform of choice) to decompile the CHM into a messy website in a folder of your choice (HH.EXE -decompile C:\Temp\decompile-folder C:\Temp\yourCHM.chm). You can typically convert in one of two ways:
- A single HTML file
- Multiple HTML files (website)
If you go with option 1, it's much easier to track and manage; however, because of the way e-readers work, navigating pages and chapters will be a huge pain. On a 600 page book, it'll take quite a few seconds to turn to the next page and 10-15 seconds to navigate chapters. Option 2 makes turning pages and chapter navigation significantly faster, but involves more work on your end to manage.
We want to go with option 2. It gives the best speed, and that counts for a lot on an e-reader. Now that we've got an output directory containing our "website", we want to investigate the dir. Inside, you should find a single HTML file and a folder of the same name. Open up the folder and go to the images directory (or something similar). Remove any superfluous images such as navigation arrows or whatever else might have been included in the CHM (like header separators and such). Feel free to also remove any images that you feel won't add much to reading the book.
Now, look for a main.html or go up a level and look at the html file in the root folder. What you're looking for is the HTML file that acts as the Menu. The menu HTML file is important because it points to all of our chapters/pages for use during the next stage. It's also the place that if you used a trial of Amber CHM, you can do a search and replace for "[Trial Version]" and replace with an empty string and remove all of the "watermarking" that's done during the trial version conversion process. At this point, it's up to you to weed out any chapters you don't want to include in your ePub, such as appendixes, which can help to reduce the "noise" you'll see when navigating the ePub's menu file. Save the file if you've made any changes and continue on to the next step.
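If you'd rather script the trial-marker cleanup than do it by hand in an editor, a minimal sketch in C# might look like the following. The class name, method names and the path you pass in are placeholders of my own, and it assumes the marker shows up in the HTML as the exact literal text "[Trial Version]".

```csharp
using System;
using System.IO;

public static class MenuCleaner
{
    // Assumption: the trial build inserts this exact literal into the HTML.
    private const string TrialMarker = "[Trial Version]";

    // Returns the HTML with every occurrence of the marker removed.
    public static string StripTrialMarkers(string html)
    {
        return html.Replace(TrialMarker, string.Empty);
    }

    // Rewrites the menu file in place. menuPath is whichever file you
    // identified as the menu in this step (hypothetical parameter).
    public static void CleanFile(string menuPath)
    {
        string html = File.ReadAllText(menuPath);
        File.WriteAllText(menuPath, StripTrialMarkers(html));
    }
}
```

One thing to watch: File.ReadAllText and File.WriteAllText default to UTF-8, so if your decompiled HTML uses a different encoding, pass the matching Encoding to both calls or accented characters may get mangled.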
Step 2: HTML to ePUB
Going to introduce you to an awesome piece of free OSS software: Calibre. It converts from almost any format into an ePub, and it's great at it. If you're using a single HTML file from step 1, then all you need to do here is "Add book", point it at the HTML file, and then "Convert To" and specify ePub as the target format. Done. However, that's not what we've chosen to do if you're following along. So when you go to "Add book" in Calibre, you need to select *only* the MENU HTML FILE that you created in the first step. This file is going to tell Calibre where all of the rest of your "chapters" are.
Give Calibre a few moments to import and create a zip containing all of the referenced files. When it's completed, you should see the book appear in the list (hopefully with a size > 0 next to it if you've done it correctly). Right click and go to "Convert". Choose ePub as your output. I'd also recommend deselecting the options for a title page/image. Also select the option in the Menu section for Calibre to generate its own menu file (it'll clean it up a bit).
Convert. You can now copy the converted ePub to your eReader (which is hopefully an Astak! =p ). Where is it? It's in the Calibre directory created for managing your books. Can't find that? Right click on the book in the Calibre list and go to "View source folder". Voilà.
Now you've got a lightning quick ePub file out of a single CHM. Hopefully this helps a few people out :) Good luck!
Friday, February 5, 2010
The NHibernate Dereference pattern
Sometimes, NHibernate associations are so complex that relying on cascades is not always possible or optimal. Other times, you need to ensure that an object deleted through the session is actually removed from your in-memory reference model (i.e. the loaded entities which still reference it) without having to refresh your objects from the database. I ran into this a while back when dealing with a bi-directional ManyToMany in which neither side would allow an all-delete-orphan cascade, and created a variant of the Disposable pattern to resolve it.
I'm sure that other developers implement something similar during their delete process, but I haven't run across anything formally defined. This keeps coming up, and after helping several FluentNHibernate & NHibernate users out by giving them direction I decided to do a writeup. I'm going to try and state the pattern on here for your reference and feedback. Disclaimer: I'm not saying it's perfect, or that there's not something better, I'm just saying it works for what we need, and we haven't found anything better yet.
Intent:
Called before an NHibernate Session.Delete(). Disassociates the entity from the domain model, removing all references to the entity, thereby allowing NHibernate to generate the appropriate UPDATE/DELETE statements in the database, in the order required, so there are no foreign key violations. This pattern works with inheritance.
Implementation:
using System.Collections.Generic;
using System.Linq;

// Note: the Child and StepChild entity types and the parent/stepParent
// fields are assumed to be defined elsewhere in the domain model; they are
// left out here to keep the example short.

public interface IDereferenceable
{
    void Dereference();
}

public abstract class PersistedBase : IDereferenceable
{
    // Guards against dereferencing the same instance more than once.
    protected bool IsDereferencing;

    public void Dereference()
    {
        Dereference(IsDereferencing);
    }

    protected virtual void Dereference(bool dereferencing)
    {
        if (dereferencing)
        { return; }
        IsDereferencing = true;

        //Dereference Children (iterate over a copy, we modify the collection)
        foreach (Child child in Children.ToList())
        {
            RemoveChildCore(child);
        }

        //Dereference Parents
        if (parent != null)
        {
            parent.RemoveChild(this);
            parent = null;
        }
    }

    private ICollection<Child> children = new List<Child>();
    public IEnumerable<Child> Children { get { return children; } }

    public void RemoveChild(Child child)
    {
        if (!IsDereferencing)
        {
            RemoveChildCore(child);
        }
    }

    protected void RemoveChildCore(Child child)
    {
        children.Remove(child);
        IDereferenceable deref = child as IDereferenceable;
        if (deref != null)
        {
            deref.Dereference();
        }
    }
}

public class Concrete : PersistedBase
{
    protected override void Dereference(bool dereferencing)
    {
        if (dereferencing)
        { return; }
        IsDereferencing = true;

        //Dereference Children
        foreach (StepChild stepChild in StepChildren.ToList())
        {
            RemoveStepChildCore(stepChild);
        }

        //Dereference Parents
        if (stepParent != null)
        {
            stepParent.RemoveChild(this);
            stepParent = null;
        }

        // Pass false so the base class runs its own dereference logic.
        base.Dereference(false);
    }

    private ICollection<StepChild> stepChildren = new List<StepChild>();
    public IEnumerable<StepChild> StepChildren { get { return stepChildren; } }

    public void RemoveStepChild(StepChild stepChild)
    {
        if (!IsDereferencing)
        {
            RemoveStepChildCore(stepChild);
        }
    }

    protected void RemoveStepChildCore(StepChild stepChild)
    {
        stepChildren.Remove(stepChild);
        IDereferenceable deref = stepChild as IDereferenceable;
        if (deref != null)
        {
            deref.Dereference();
        }
    }
}

Let's go down the list:
- Our abstract class/interface. This is the base class that handles basic dereference association and implements the IDereferenceable interface.
- IsDereferencing - This is similar to the IsDisposing/Disposed flag on IDisposable objects. It's a way to ensure that we only Dereference an object once. It is only set within the protected Dereference method, and only when the dereferencing parameter passed in is false.
- Dereference() - Public no-parameter. This is what is called by the code that's about to delete the object. It's also called by other classes whenever they want to dereference this instance as a child reference. Passing in the current IsDereferencing flag prevents us from getting into a cyclic call if it's called several times by many parent classes.
- Dereference(bool dereferencing)
  A.) We early escape if our dereferencing parameter is true, so we don't perform any dereference logic more than once.
  B.) Dereference children first. Children are typically going to be defined as collections, but there are always exceptions.
  C.) We enumerate over a copy of the collection. Reason? Otherwise we'd get a "collection was modified during enumeration" exception.
  D.) We call the protected RemoveChildCore(child), which does not do a dereferencing check (otherwise nothing would happen).
  E.) We check first to see if Parent is null. This is primarily here for situations where a Parent is allowed to be null, or during testing where you may not populate all the parents of an entity.
  F.) If parent is not null, then we call the public RemoveChild(this) method on the parent so we're no longer referenced in a parent collection. Important to note that RemoveChild() on the parent does perform a !IsDereferencing check on the parent, just like we have for the child. After all, the Child's Dereference() could have been called by the Parent's Dereference(), so we want to respect this.
  G.) We null out the parent.
- Our collection is exposed out as a read only collection, which tells the user that we do not allow adding/removing directly on the collection. Good practice and safety precaution if you have additional association logic, which you would put into the AddChild(Child child) method (not implemented here for the sake of simplifying the example).
- The public RemoveChild(Child child) checks if we're dereferencing, if we are, then we skip actually removing the child. This is there so that when children are being dereferenced, and call parent.RemoveChild(this), they do not enter a cyclic loop. If not dereferencing, call Core dereference.
- RemoveChildCore(Child child) has no checks, and is protected, so it should be tightly controlled by the class and its subclasses. This will remove the item from the private collection, cast the child as a dereferenceable, and then if the child is dereferenceable, it will proceed to call Dereference on it. NOTE: child.Dereference() should only be called if the child dies with the parent, so if the parent is dereferenced, the child should be as well. In some cases this doesn't hold, and there might only be an incidental association between them, as with a ManyToMany() association. In that case, dereferencing the child should be avoided (though you may need to keep the two lists synchronized in the MTM example, so you'd call child.RemoveParents(this) in place of child.Dereference()).
- The Concrete implementation of the base class. Here we have a similar setup as the base class, except we've got additional information that needs to be dereferenced (another collection and parent, defined as StepChildren and StepParent)
- Dereference(bool dereferencing). Our setup here is the same as steps A-G of the base Dereference(bool dereferencing) above. However, it adds a new element at the end: H.) We call base.Dereference(false) to continue up the dereference stack. Passing it false is important, since we know we're in the middle of dereferencing and need to bypass the dereferencing check. If we passed the IsDereferencing field, the base Dereference would not fire, meaning that any references specific to the base class would still hold a reference to our class. As we're overriding, polymorphism dictates that our bottom level class (most concrete) will have Dereference called on it first. So it dereferences from most specific to most generic.
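To make the ManyToMany note above a bit more concrete, here's a rough sketch of what "keep the two lists synchronized without dereferencing the other side" could look like. Post, Tag, AddPostCore and RemovePostCore are made-up names for illustration only (they aren't part of the code above), and the NHibernate mappings are left out entirely.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical "other side" of the many-to-many. It outlives any single Post,
// so it is never dereferenced itself; it only drops its reference.
public class Tag
{
    private readonly ICollection<Post> posts = new HashSet<Post>();
    public IEnumerable<Post> Posts { get { return posts; } }

    internal void AddPostCore(Post post) { posts.Add(post); }
    internal void RemovePostCore(Post post) { posts.Remove(post); }
}

// Hypothetical entity that is about to be deleted.
public class Post
{
    private readonly ICollection<Tag> tags = new HashSet<Tag>();
    private bool isDereferencing;

    public IEnumerable<Tag> Tags { get { return tags; } }

    public void AddTag(Tag tag)
    {
        tags.Add(tag);
        tag.AddPostCore(this);
    }

    public void Dereference()
    {
        if (isDereferencing) { return; }
        isDereferencing = true;

        // Iterate over a copy, since we modify the collection as we go.
        foreach (Tag tag in tags.ToList())
        {
            tags.Remove(tag);
            // Synchronize the other list instead of calling tag.Dereference().
            tag.RemovePostCore(this);
        }
    }
}
```

After post.Dereference(), neither side holds a reference to the other, but the Tag and any other Posts attached to it are left alone.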
Usage:
using (var session = sessionFactory.OpenSession())
using (var transaction = session.BeginTransaction())
{
    var concrete = session.Get<Concrete>(id);
    concrete.Dereference();
    session.Delete(concrete);
    transaction.Commit();
}

So there you go, a way to manage the disassociations with NHibernate (or another ORM) and your object model. Hopefully this helps someone out, and if it doesn't, feel free to leave a comment if you've got any critiques or questions. Always open to feedback! (Also wrote all of the code in notepad, so if anything's off let me know and I'll correct it)
Labels: Best Practice, C#, Development, NHibernate, SQL
Friday, January 15, 2010
CodeMash 2010 Session Materials
Just finished my second session at CodeMash. Was a good time, and had a pretty packed house. As promised:
Here's my session materials
Thanks for attending, and CodeMash has been pretty amazing. Will come to Ohio again ;)
Thursday, December 31, 2009
EAV in OO/NHibernate: Part 1 - Intro
I've been talking to a decent number of developers about the system that Robert and I are writing at GFX, and I'm met with many blank stares every time the acronym EAV pops up. I thought it might be worthwhile to try and explain what is probably the most double edged sword in data modeling I've ever run into. If you do a search for EAV problems, you'll get 99 people out of 100 telling you one horror story after another about implementing/using it in production systems. They're not crazy, or wrong. EAV architecture is a very tricky beast to tame. Unfortunately, there's not an alternative I've found out there that gives the flexibility that an EAV system offers in an RDBMS. Knowledge is power, and in order to tame the beast, you first must understand it.
EAV (Entity Attribute Value) systems have been around for a good long while. If you're unfamiliar with the concepts, check out the wiki article on it, and I'll do my best to sum it up in a paragraph or two here:
Basically, it takes the concept of a Table, with Columns describing the table's data and a row's context, and pivots everything. You have your standard Entity that you're describing (a Car for example), rows that describe what would normally be your columns (the Attributes, e.g. Size, Color, WeightInLbs, # of doors, etc.), and a matrix of rows holding an individual Entity's attribute values (the Values, e.g. Midsize, Red, 2210, 4). In an RDBMS, it's easy for users to create rows, but difficult (and almost always a bad practice) for them to create columns/tables. Using EAV offers a pretty high degree of flexibility within an application, but it's not without its downsides: Performance, Deadlocking, and an increase in Complexity are the front runners.
There are two types of EAV systems: EAV, and EAV/CR (Class Relationships). The first is the type of EAV system you get from most modern ecommerce sites, where you can add any number of attributes to one of your products and they're all arbitrary and weakly typed. This is the simplest implementation, and (my guess) the most common. The second is where the rabbit hole opens up. In that scenario, an attribute is given a more formal definition and contains meta attribute information. This is the EAV type I'll be talking about. A good read on EAV/CR can be found here.
The large project I have been working on at my day job is essentially an EAV/CR system for our clients. I'll quickly state the most basic requirements of the system:
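To picture the pivot in code, here's a minimal sketch of the plain (weakly typed) flavor of EAV. The class names, the string-only Value and the ToRow helper are all my own simplifications for illustration, not the schema of any real system.

```csharp
using System.Collections.Generic;
using System.Linq;

// One row per attribute definition: what would normally be a column.
public sealed class EavAttribute
{
    public int Id { get; set; }
    public string Name { get; set; }
}

// One row per (entity, attribute) pair, holding the actual value.
public sealed class EavValue
{
    public int EntityId { get; set; }
    public int AttributeId { get; set; }
    public string Value { get; set; }
}

public static class EavPivot
{
    // Rebuilds the "conventional row" for one entity: attribute name -> value.
    public static Dictionary<string, string> ToRow(
        int entityId,
        IEnumerable<EavAttribute> attributes,
        IEnumerable<EavValue> values)
    {
        return values
            .Where(v => v.EntityId == entityId)
            .Join(attributes, v => v.AttributeId, a => a.Id,
                  (v, a) => new { a.Name, v.Value })
            .ToDictionary(x => x.Name, x => x.Value);
    }
}
```

For the Car example, four attribute rows (Size, Color, WeightInLbs, Doors) plus four value rows for one car pivot back into a single four-entry dictionary. Adding a fifth attribute is just another row, no schema change needed, which is the whole appeal.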
- Allows users to define their own Attributes, and strongly type them both in the domain and in the database.
- Theoretically needs to support unlimited number of attributes a user can create to describe their entity.
- Different "sets" of EAV data, so one user could have completely different data that is not shared between the context boundaries. (Context boundary could be anything, but typically is tied to each client, so every client has their own sets of attributes for entity types)
- Multiple Entity types, with their own unique attributes
- Extended attribute functionality that recreates many RDBMS column properties (e.g. Uniqueness, Formulas, Bindable Context menus, etc.)
- Must be easily searchable
- Must allow for fast and scalable operations
These requirements are not without their own basic implementation challenges, but those last two bullet points, in my experience, are where the majority of the complications in building a system like this arise. EAV/CRs suffer from a steep degradation in performance as the value table grows. n attributes with x entities produces n * x = v value rows. With 4,000 entities of the same type, and 200 attributes for each entity, you end up with 800,000 value rows. That's just for one entity type for one contextual boundary. Extend that with 50+ clients, and 10 entity types (800,000 * 50 * 10 = 400,000,000), and you can quickly see how your v table gets out of control.
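If you want to plug your own numbers into that estimate, the arithmetic is simple enough to sketch. The helper below is purely illustrative (the names are mine), and it assumes every client and entity type has roughly the same number of entities and attributes, which real data won't.

```csharp
public static class EavSizing
{
    // Value rows for one entity type in one context boundary:
    // every entity carries one value row per attribute.
    public static long ValueRows(long entities, long attributesPerEntity)
    {
        return entities * attributesPerEntity;
    }

    // Scales that figure across entity types and context boundaries (clients),
    // assuming each one has about the same entity and attribute counts.
    public static long TotalValueRows(long entities, long attributesPerEntity,
                                      long entityTypes, long clients)
    {
        return ValueRows(entities, attributesPerEntity) * entityTypes * clients;
    }
}
```

EavSizing.ValueRows(4000, 200) comes out to 800,000, and EavSizing.TotalValueRows(4000, 200, 10, 50) to 400,000,000. The growth is multiplicative in every dimension, so doubling any one of the inputs doubles the value table.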
The complexities of re-implementing your own database in code causes searching to become an exercise in pivoting and endless sub-queries forking for each attribute data type you expose. Full text indexing goes from being a checkbox in most DB engines to a caching like framework you inject into your system.
In later parts of this series, I'm going to talk about some implementation challenges and a few ways I've found to overcome them in OO and NHibernate.
Thursday, November 5, 2009
Join Fetch on NH queries returns dupes
Domain:
public class Parent
{
    public ICollection<Child> Children { get; set; }
}

public class Child
{
    public string Description { get; set; }
}
Say we want to retrieve all the Parents in HQL, but we want to join fetch to get the children simultaneously, so we don't lazy load them. In my case, I knew I'd be working with children immediately after retrieving Parents. The HQL would be something like this:
SELECT p FROM Parent p JOIN FETCH p.Children
Nothing fancy. When we execute that HQL query, we're going to get a Parent row returned to us for every child of every parent. So if we've got 3 Parents, and each one has 2 children, we will have 6 rows returned from that statement.
Each row will have all of the attributes of Parent on it (as columns), and it will also have all of the attributes of child (as columns). This is how NHibernate ensures it retrieves both objects in the database simultaneously.
So, after you execute the query, NHibernate gives you a list of 6 Parents. Each parent has a duplicate of itself in the retrieved list. Why? Because that's what was returned, row for row, when you asked for Parents. You're getting both Parent and Child, but you're only selecting the Parent as the result from the query. NHibernate is smart enough to preload each Parent's collection of Children, so when you attempt to access the collection, NHibernate does not need to go back to the database to load it.
So, my problem was how to get just 3 Parents back, instead of 6. Well, a few people might suggest using DISTINCT in your HQL, so it'd look something like:
SELECT DISTINCT p FROM Parent p JOIN FETCH p.Children
Execute that, and you'll still get 6 Parents: each parent, along with its dupe. Why? Because DISTINCT is SQL-based syntax, and those rows are not dupes (each one carries a different child). Only the objects that are hydrated in the object model are. So you're not going to have any luck there.
What are your options?
1.) Use the results transformer:
query/criteria.SetResultTransformer(new DistinctRootEntityResultTransformer());
The result transformer should work after NH has retrieved the objects back from the database, and will attempt to do a comparison with your root entity (I believe on the ID) to determine if they're the same.
2.) Do it yourself in Linq with results.Distinct();
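Option 2 is easy to try without a database. The sketch below fakes the shape of a join fetch result (the same Parent instance repeated once per child row) and then de-dupes it. SimulateJoinFetch and Dedupe are made-up helpers, and the Parent class here is a stripped-down stand-in for the mapped entity. It leans on the assumption that, within one session, the duplicates are the very same object instance, so the default reference equality used by Distinct() is enough.

```csharp
using System.Collections.Generic;
using System.Linq;

// Stand-in for the mapped entity, trimmed to what the demo needs.
public class Parent
{
    public int Id { get; set; }
}

public static class DistinctDemo
{
    // Mimics a join fetch result: one list entry per child row,
    // each entry being the same Parent instance repeated.
    public static List<Parent> SimulateJoinFetch(int parents, int childrenPerParent)
    {
        var results = new List<Parent>();
        for (int id = 1; id <= parents; id++)
        {
            var parent = new Parent { Id = id };
            for (int c = 0; c < childrenPerParent; c++)
            {
                results.Add(parent);
            }
        }
        return results;
    }

    // Collapses repeated references down to one entry per instance,
    // keeping the original order.
    public static List<Parent> Dedupe(IEnumerable<Parent> results)
    {
        return results.Distinct().ToList();
    }
}
```

With 3 parents and 2 children each, SimulateJoinFetch returns 6 entries and Dedupe brings it back to 3. If your duplicates could ever be different instances with the same ID (say, merged from two sessions), you'd need to override Equals/GetHashCode or pass an IEqualityComparer instead.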
Hopefully this helps explain the situation a bit for a few people out there that might be experiencing the issue.
Saturday, October 17, 2009
It's been a while
Things have been busy, and apologies all around for not updating this as often as I should.
I'm doing an NHibernate presentation at the Strange Loop conference in St. Louis next week, although Robert will not be co-presenting contrary to the session description. Hopefully, after I return from there, there will be more time for me to continue the NH/FNH tips and general updates. (I will end up making a post with the session materials after the session however)
Since my last post, FNH 1.0 has come out, thanks in large part to the efforts of James Gregory; Tuna's gotten into XNA and found several physics frameworks for it that I need to check out (and make a few XNA related posts about); and NH officially went to 2.1.
I've also just learned that my FNH session has been accepted at CodeMash for January, so I hope to see anyone I don't see in St. Louis next week, there.
Until then!