I’ve been pretty stressed out the last few days (leaving a job and starting a new one is always stressful…and I will miss my old coworkers!), so my attention span is not what it should be. So today, I decided to play around with a project I have been mulling for quite some time: scanning old copies of Weaver’s magazine and converting them to searchable PDFs. I discovered to my delight that the price of ABBYY FineReader 10 (the best quasi-affordable text-recognition program out there) had dropped from $399 to $149, which put it in my budget range. So I downloaded a copy and tried it out on a photo of a page from Prairie Wool Companion (the predecessor to Weaver’s). It worked!! It rendered a 3.5 MB .jpg file into a 91 KB PDF file. And the PDF file is searchable!
(I could have taken the option of including the original image in the PDF file – that works too, but for obvious reasons results in a much larger file.)
There are some problems with the text recognition, mostly with stylized text, blurry text (from poor photography), and weaving drafts. Those will require time-consuming manual cleanup. But overall I was impressed by how well it performed. Is it as good as the original? No, not by a long shot. But will it do what I want it to do, which is provide a way to search the text of all 60 issues of Weaver’s and Prairie Wool Companion? Hell yeah!
So I am going to work on this seriously over the next couple weeks. Mike and I are going to try to find a copy stand and a remote so I can take better photos (I had major text-recognition problems where the photo was blurry), and I’ll work on refining the photographing and text-recognition process. Because there are 60 magazines involved, it will probably take me a few months to complete, but at the end of it I should have all of Weaver’s and Prairie Wool Companion at my electronic fingertips. If I dump it into Evernote, I’ll be able to search and access the magazines from any web browser, anywhere in the world.
Ain’t technology great?
(This is a format conversion, which, from the couple of hours I spent researching it yesterday, is considered “fair use” under copyright law. It would only become an issue if I sold or gave away the originals, or tried to sell the scans, neither of which I’m about to do. You’d have to pry my issues of Weaver’s out of my cold dead hands.)
If this goes well, I’m sort of tempted to try scanning my collection of Handwoven magazines. But since I lucked into a nearly complete collection, over 30 years of Handwovens, this would be a pretty Herculean task. I’m not sure I’m that motivated – I figure, let’s see how this first batch goes.