A place to discuss Development techniques, .NET, XNA, NHibernate or anything else that tickles your fancy

Friday, February 12, 2010

Converting CHM to ePUB (and an e-reader review)

Just got my Astak Ez Reader Pocket Pro 5", and I love it. It's the perfect size, has solid resolution, all the features and all the document support you could ever want, and it's very fast. It has 3 different places to turn the page, supports link navigation, takes SD cards up to 16 GB (aka a metric assload of books), has no bullshit DRM, and has a battery life measured in months. That's right. Months. This thing just doesn't die. It's brilliant. I've been raving about it for the week or so that I've had it.

So, on to the issue. I have a few technical books as CHM files (old school, right?), and with the advent of ePub as a standard e-reader format, I wanted a way to convert them into some spiffy ePubs so I can take advantage of text reflow (which the Astak also supports for PDFs). ePub is an amazing format: small file size, fantastic orientation, and chapter support. Plus, it's a standard document definition, unlike PDF, which is all over the place. It's also quite a bit faster than PDF to navigate and turn pages in. On an e-reader, faster typically means less processing, and less processing means longer battery life (plus a happier reader who doesn't have to wait as long for the page to turn).

Step 1: CHM to HTML
You have to get the CHM into HTML somehow. There are a few tools to do this. My favorite is ABC Amber's CHM Converter. It costs money, but it's worth it if you've got a lot of CHMs. If you don't, you can still use the trial and get away with not having watermarks crapped out all over the result (I'll explain in a bit). If you're on Windows, you can also use HH.exe to decompile the CHM into a messy website in a folder of your choice (HH.EXE -decompile C:\Temp\decompile-folder C:\Temp\yourCHM.chm); that always gives you multiple files. Converters like ABC Amber typically let you convert in one of two ways:
  1. A single HTML file
  2. Multiple HTML files (website)
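If you'd rather script the HH.exe route, here's a minimal Python sketch that wraps the command above. It assumes Python 3.9+ on Windows with hh.exe on the PATH; the function names are made up for illustration. HH.exe just dumps the CHM's internal pages and images into the folder, so you end up with the multiple-files layout (option 2).

```python
import shutil
import subprocess
from pathlib import Path


def build_decompile_command(chm_file: str, output_dir: str, hh_exe: str = "hh.exe") -> list[str]:
    """HH.EXE takes the output folder first, then the CHM file."""
    return [hh_exe, "-decompile", output_dir, chm_file]


def decompile_chm(chm_file: str, output_dir: str) -> list[Path]:
    """Decompile a CHM into a folder (Windows only) and return the HTML pages found there."""
    hh = shutil.which("hh.exe")
    if hh is None:
        raise FileNotFoundError("hh.exe not found on PATH (it ships with Windows)")
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    # Assumption: hh.exe's exit code is not treated as a success signal,
    # so we look at what actually landed in the folder instead.
    subprocess.run(build_decompile_command(chm_file, output_dir, hh), check=False)
    return sorted(out.rglob("*.htm*"))
```

On Windows, decompile_chm(r"C:\Temp\yourCHM.chm", r"C:\Temp\decompile-folder") does the same thing as the command line above; an empty list back means nothing was extracted.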

If you go with option 1, it's much easier to track and manage; however, because of the way e-readers work, navigating pages and chapters will be a huge pain. It'll take quite a few seconds to turn to the next page and 10-15 seconds to navigate chapters on a 600-page book. Option 2 makes turning pages and chapter navigation significantly faster, but involves more work on your end to manage.

We want to go with option 2, since it gives the best speed, and that counts for a lot on an e-reader. Now that we've got an output directory containing our "website", let's look inside it. You should find a single HTML file and a folder of the same name. Open up the folder and go to the images directory (or something similar). Remove any superfluous images, such as navigation arrows or whatever else might have been included in the CHM (like header separators). Feel free to also remove any images that you feel wouldn't be of significant benefit when reading the book.
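If the images folder is big, a small Python sketch like this can do the first pass for you. The name fragments in JUNK_PATTERNS are pure guesses (hypothetical); open the folder, see what your CHM actually calls its arrows and separators, and adjust. It defaults to a dry run so you can check the list before anything is deleted.

```python
from pathlib import Path

# Hypothetical name fragments for navigation chrome; adjust to what your CHM uses.
JUNK_PATTERNS = ("arrow", "prev", "next", "separator")
IMAGE_SUFFIXES = {".gif", ".png", ".jpg", ".jpeg", ".bmp"}


def find_junk_images(image_dir: str, patterns=JUNK_PATTERNS) -> list[Path]:
    """Return image files whose names contain any of the fragments (case-insensitive)."""
    junk = []
    for path in sorted(Path(image_dir).iterdir()):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            if any(fragment in path.stem.lower() for fragment in patterns):
                junk.append(path)
    return junk


def remove_images(paths: list[Path], dry_run: bool = True) -> None:
    """Print what would be deleted; only delete when dry_run is False."""
    for path in paths:
        print(("would remove " if dry_run else "removing ") + str(path))
        if not dry_run:
            path.unlink()
```

The HTML pages will still reference the deleted files, exactly as if you had deleted them by hand.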

Now, look for a main.html, or go up a level and look at the HTML file in the root folder. What you're looking for is the HTML file that acts as the menu. The menu file is important because it points to all of our chapters/pages for use during the next stage. It's also where, if you used the trial of ABC Amber's CHM Converter, you can search for "[Trial Version]" and replace it with an empty string to remove all of the "watermarking" added during the trial conversion. At this point, it's up to you to weed out any chapters you don't want in your ePub, such as appendices, which can help reduce the "noise" you'll see when navigating the ePub's menu. Save the file if you've made any changes and continue on to the next step.
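The watermark removal and the chapter review can be scripted too. This Python sketch assumes the menu file is the one you found above and that it's saved in an ASCII-compatible encoding such as UTF-8 or Windows-1252; strip_watermark and list_chapters are hypothetical helper names. It only lists the menu's links, so deciding which chapters to weed out stays a manual edit.

```python
from html.parser import HTMLParser
from pathlib import Path
from typing import Optional

WATERMARK = "[Trial Version]"


def strip_watermark(menu_file: str, watermark: str = WATERMARK) -> int:
    """Delete every occurrence of the watermark; return how many were removed.

    Works on raw bytes so an ASCII-compatible encoding (UTF-8, Windows-1252, ...)
    is left untouched; it will NOT match in UTF-16 files.
    """
    path = Path(menu_file)
    data = path.read_bytes()
    needle = watermark.encode("ascii")
    count = data.count(needle)
    if count:
        path.write_bytes(data.replace(needle, b""))
    return count


class _LinkCollector(HTMLParser):
    """Collect (href, link text) pairs for every <a href=...> in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.links = []
        self._href: Optional[str] = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse whitespace so "Chapter 1 " and "Chapter\n1" both read "Chapter 1".
            self.links.append((self._href, " ".join("".join(self._text).split())))
            self._href = None


def list_chapters(menu_file: str) -> list[tuple[str, str]]:
    """Return the menu's links in document order, to help decide what to weed out."""
    parser = _LinkCollector()
    parser.feed(Path(menu_file).read_text(encoding="utf-8", errors="replace"))
    parser.close()
    return parser.links
```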

Step 2: HTML to ePUB
I'm going to introduce you to an awesome piece of free, open-source software: Calibre. It converts almost any format into an ePub, and it's great at it. If you're using a single HTML file from Step 1, all you need to do here is "Add book", point it at the HTML file, then "Convert To" and specify ePub as the output format. Done. However, that's not what we've chosen to do if you're following along. So when you go to "Add book" in Calibre, you need to select *only* the MENU HTML FILE that you cleaned up in the first step. This file is going to tell Calibre where all of the rest of your "chapters" are.

Give Calibre a few moments to import the book and create a zip containing all of the referenced files. When it's done, you should see the book appear in the list (hopefully with a size greater than 0 next to it, if you've done it correctly). Right-click it and go to "Convert". Choose ePub as your output. I'd also recommend deselecting the options for a title page/image. Also select the option in the Table of Contents (menu) section that makes Calibre generate its own menu (it'll clean things up a bit).
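If you'd rather skip the GUI, Calibre also installs a command-line converter, ebook-convert. As far as I know, the two switches below roughly correspond to the GUI steps: --use-auto-toc forces Calibre's auto-generated table of contents, and --no-default-epub-cover stops it from generating a cover image. Treat this as a sketch: it assumes ebook-convert is on your PATH, and the helper names are made up.

```python
import shutil
import subprocess


def build_convert_command(menu_html: str, output_epub: str) -> list[str]:
    """Command line roughly equivalent to the GUI steps (assumption, not verified per version)."""
    return [
        "ebook-convert",
        menu_html,                  # only the menu file; the HTML input follows its links
        output_epub,                # the .epub extension selects ePub output
        "--use-auto-toc",           # always use Calibre's own generated menu (TOC)
        "--no-default-epub-cover",  # don't generate a title page/cover image
    ]


def convert_to_epub(menu_html: str, output_epub: str) -> None:
    """Run the conversion; raises if ebook-convert is missing or fails."""
    if shutil.which("ebook-convert") is None:
        raise FileNotFoundError("ebook-convert not found on PATH; it is installed with Calibre")
    subprocess.run(build_convert_command(menu_html, output_epub), check=True)
```

Unlike "Add book", this writes the ePub straight to the path you give it instead of into the Calibre library.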

Convert. You can now copy the converted ePub to your e-reader (which is hopefully an Astak! =p ). Where is it? It's in the directory Calibre created for managing your books. Can't find that? Right-click the book in the Calibre list and go to "View source folder". Voilà.
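Finding and copying the file can be scripted as well. This sketch simply picks the most recently modified .epub anywhere under the Calibre library folder, which assumes the book you just converted is the newest one there. newest_epub and copy_to_reader are hypothetical names, and the paths you pass in are yours to fill: your library location and the drive letter or mount point your e-reader shows up as.

```python
import shutil
from pathlib import Path


def newest_epub(library_dir: str) -> Path:
    """Return the most recently modified .epub anywhere under the library folder."""
    epubs = list(Path(library_dir).rglob("*.epub"))
    if not epubs:
        raise FileNotFoundError(f"no .epub files under {library_dir}")
    return max(epubs, key=lambda p: p.stat().st_mtime)


def copy_to_reader(epub: Path, reader_dir: str) -> Path:
    """Copy the ePub (keeping its file name and timestamps) onto the reader."""
    dest = Path(reader_dir) / epub.name
    shutil.copy2(epub, dest)
    return dest
```

For example, copy_to_reader(newest_epub(r"C:\Users\you\Calibre Library"), "E:\\"), with both placeholder paths swapped for your own.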

Now you've got a lightning-quick ePub made from a single CHM. Hopefully this helps a few people out :) Good luck!

4 comments:

Koen said...

This did the trick, thanks! I was already looking for a tool to convert some old CHM files...

Anonymous said...

Hi, can you explain how to use hh.exe to do single HTML file and multiple HTML files. Thanks

Team DoIT said...

Don't know how you managed to get more than one line in the "search" but I can't do it.

Guidance said...

Superb post.
