
Remove Kindle DRM with OCR and web automation

Background

I don’t have a lot of space for books, which is one of the reasons I’m a frequent patron of my local library. I borrow both traditional books and ebooks, which I can read on my Kobo reader. If I were going to buy a book, I would prefer an ebook over a paper copy, as I don’t want the physical good. A number of years ago, I wanted to read The Road Chose Me, by Dan Grec. It wasn’t available from my library, and I’m pretty sure it wasn’t available on Kobo at the time (it is now!).

I could have ordered it on Kindle, but I had a Kobo, and I don’t like the idea of “buying” something with digital rights management (DRM)/copy protection. For me, DRM is fine for renting or streaming something, but if I’m buying something, I don’t want it tied to a specific device or supporting service. I want to be able to consume it when and how I choose, as experience has shown that DRM-encumbered media eventually stops working.

A quick Google search found that I could buy the Kindle copy of the book, and remove the DRM using DeDRM_tools. DeDRM_tools required:

  • A Kindle device associated with your Amazon account, and its serial number
  • Amazon’s “Download and Transfer” functionality, which let you download a book from the Amazon website with DRM tied to a specific device

When configured with your Kindle’s serial number, DeDRM could remove the DRM, allowing you to read the ebook with any device. As I didn’t have a Kindle, I picked up an inexpensive model through a used goods online marketplace, registered it with my account, and DeDRM worked great - I read The Road Chose Me on my Kobo. I ended up buying a few ebooks through Amazon, removed the DRM, and managed them on my PC using Calibre.

Kindle Download and Transfer Discontinued

In mid-February, various technology sites reported that Amazon was discontinuing the download and transfer feature. Sure enough, I logged in, and saw the notice.

[Image: Amazon Download and Transfer Discontinued]

Along with what I’m sure is a very small group of like-minded Amazon customers I see in my feeds, I removed the DRM from a few books I’d purchased but hadn’t got around to converting yet.

When Hackaday posted its article about Amazon deprecating this feature, they linked back to a 2013 post about a fun Lego robot that removed Kindle DRM: it took a photo of the screen, ran OCR, flipped the page, and continued until the end of the book, creating an electronic copy without DRM.

This inspired me to do the same, except with the Kindle web portal and Selenium browser automation.
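To make the idea concrete, here is a minimal sketch of that capture loop. It is not the actual kindleOCRer code. The function names, the "stop when two screenshots in a row are identical" end-of-book test, and the assumption that the web reader turns pages on the right-arrow key are all mine, for illustration only.

```python
import hashlib
import time
from typing import Callable, List, Tuple


def capture_book(take_screenshot: Callable[[], bytes],
                 next_page: Callable[[], None],
                 max_pages: int = 2000) -> List[bytes]:
    """Collect one screenshot per page until the page stops changing."""
    pages: List[bytes] = []
    last_digest = None
    for _ in range(max_pages):
        image = take_screenshot()
        digest = hashlib.sha256(image).hexdigest()
        if digest == last_digest:
            # Assumption: an unchanged screen means we hit the last page.
            break
        pages.append(image)
        last_digest = digest
        next_page()
    return pages


def selenium_callbacks(driver) -> Tuple[Callable[[], bytes], Callable[[], None]]:
    """Build the two callables from an already logged-in Selenium WebDriver."""
    # Imported here so the loop above can be tried without Selenium installed.
    from selenium.webdriver.common.action_chains import ActionChains
    from selenium.webdriver.common.keys import Keys

    def take_screenshot() -> bytes:
        return driver.get_screenshot_as_png()

    def next_page() -> None:
        # Assumption: the reader flips forward on the right-arrow key.
        ActionChains(driver).send_keys(Keys.ARROW_RIGHT).perform()
        time.sleep(1.0)  # crude wait for the next page to render

    return take_screenshot, next_page
```

Keeping the loop separate from the browser makes it easy to try with fake pages first. A real run would need something more robust than an exact hash comparison if the page contains anything that changes between screenshots.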

KindleOCRer

[Image: Book on Kindle Portal, kindleOCRer on console, and DRM free output]

I created a script which opens the Kindle web app, takes screenshots, flips the pages, runs an OCR process, and produces an ePub with a working table of contents. It’s built with Selenium, marker, and Pandoc with the pypandoc wrapper.
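As a rough illustration of the final step, the sketch below joins OCR'd text into Markdown with one heading per chapter and hands it to Pandoc through pypandoc, which builds the ePub navigation from those headings. This is a simplified stand-in, not the code from the repository: the function names and the (heading, text) input shape are hypothetical, and the OCR step itself is left out.

```python
from typing import List, Tuple


def build_markdown(chapters: List[Tuple[str, str]]) -> str:
    """Join (heading, text) pairs into one Markdown document.

    Each chapter becomes a level-1 heading, which Pandoc uses to
    generate the ePub table of contents.
    """
    sections = []
    for heading, text in chapters:
        sections.append(f"# {heading}\n\n{text.strip()}\n")
    return "\n".join(sections)


def write_epub(markdown: str, title: str, outputfile: str) -> None:
    """Convert Markdown to ePub with pypandoc (requires Pandoc installed)."""
    # Imported here so build_markdown can be tried without pypandoc.
    import pypandoc

    pypandoc.convert_text(
        markdown,
        to="epub",
        format="md",
        outputfile=outputfile,
        extra_args=["--metadata", f"title={title}"],
    )


if __name__ == "__main__":
    # Hypothetical sample input standing in for real OCR output.
    sample = [("Chapter One", "First page of text."),
              ("Chapter Two", "More text.")]
    write_epub(build_markdown(sample), "Sample Book", "sample.epub")
```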

The script is available here: https://github.com/raudette/kindleOCRer

I tried it with The Kipling Reader, which I can share here as it is in the public domain. Running it is as easy as:

python kindleOCRer.py --title=The\ Kipling\ Reader --jobname=The\ Kipling\ Reader

On my PC, this takes about 5 minutes. You can review the script output here: The Kipling Reader converted by kindleOCRer and compare it with The Kipling Reader from Project Gutenberg.
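For anyone curious how the two flags in that command reach a Python script, here is a small sketch using argparse. The flag names come from the command above, but the help text, the choice to make both required, and my reading of what each flag means are assumptions rather than the script's actual behaviour. The backslashes in the shell command escape the spaces, so each flag arrives as a single argument.

```python
import argparse
import shlex
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the two flags shown in the example command."""
    parser = argparse.ArgumentParser(description="Sketch of a kindleOCRer-style CLI")
    # Assumption: --title names the book to open in the Kindle web app.
    parser.add_argument("--title", required=True, help="book title")
    # Assumption: --jobname labels the working files and output.
    parser.add_argument("--jobname", required=True, help="name for this run")
    return parser.parse_args(argv)


# shlex.split mimics how a POSIX shell splits the command line.
example = shlex.split(
    r"--title=The\ Kipling\ Reader --jobname=The\ Kipling\ Reader"
)
args = parse_args(example)
print(args.title, "|", args.jobname)
```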

A video of the script running is available on YouTube: https://youtu.be/3-07wMCKlkw.

Full Source

https://github.com/raudette/kindleOCRer