Open Source: Cheap Open Source Document Imaging - Page 2


If you don't have a line like that in your /etc/fstab you can add it by typing as root:

(users with 2.4.x kernels use usbdevfs instead of usbfs)
echo 'none /proc/bus/usb usbfs auto,devmode=0666 0 0' >> /etc/fstab
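Running the echo command twice leaves a duplicate entry in /etc/fstab. A minimal sketch of a safer variant follows, assuming a helper named add_line_once (a made-up name, not part of any package) that appends the line only when that exact line is not already in the file:

```shell
#!/bin/bash
# add_line_once is a hypothetical helper for this sketch.
# It appends LINE to FILE only if FILE does not already contain that exact line.
add_line_once() {
    local line="$1" file="$2"
    # -q: quiet, -x: match the whole line, -F: treat the pattern as a fixed string
    if grep -qxF -- "$line" "$file" 2>/dev/null; then
        echo "already present in $file"
    else
        printf '%s\n' "$line" >> "$file"
        echo "added to $file"
    fi
}

# Possible use, as root (2.4.x kernels: usbdevfs instead of usbfs):
# add_line_once 'none /proc/bus/usb usbfs auto,devmode=0666 0 0' /etc/fstab
```

The call at the bottom is left commented out so that pasting the block does not touch /etc/fstab by accident.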

Next, change USB access control:

umount /proc/bus/usb; mount /proc/bus/usb; mknod -m 666 /dev/usbscanner c 180 48

The scanner should now work properly.
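One way to confirm this is to ask SANE which devices it can see. scanimage -L prints one line beginning with "device" for each scanner it detects. The sketch below counts those lines; count_scanners is a made-up helper name, and the snippet assumes the SANE command-line tools are installed:

```shell
#!/bin/bash
# count_scanners is a hypothetical helper for this sketch.
# It reads `scanimage -L` style output on stdin and prints the number of
# lines that start with "device" (one such line per detected scanner).
count_scanners() {
    # grep -c prints 0 but exits non-zero when nothing matches,
    # so "|| true" keeps the function's exit status at 0.
    grep -c '^device' || true
}

# Possible use:
# scanimage -L | count_scanners
```

A result of 0 would suggest the permissions or driver setup still needs attention.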

Step 3: A Small Script

I wrote a small script to do all the processing for me. This way, I can just load some pages into the ADF and run the script. It scans everything in, converts the images to PDFs, concatenates the individual PDFs, and then deletes all the temporary files. It is pasted below:

=========
#!/bin/bash

#Automatic scan/conversion script
#Requires sane, imagemagick, and pdftk

#Stop if no output name was given (otherwise the result would be named ".pdf")
if [ -z "$1" ]; then
    echo "Usage: $0 filename" >&2
    exit 1
fi

#Scan in the pages
scanadf --mode "Black & White" --resolution 200

#Convert each page to a pdf file and delete the original image file
#(variables are quoted so that names containing spaces still work)
for file in image-*
do
    convert "$file" "$file.pdf"
    rm "$file"
done

#Concatenate all the individual pdf files into one single file and delete the original pdf files
pdftk image-*.pdf cat output "$1.pdf"
rm image-*.pdf

exit 0
=========

I have it configured to scan in black and white at 200 dpi. This works fine for the majority of things that I do and results in comparatively small files. If you want color scanning or a higher resolution, change the --mode and --resolution options accordingly. I run the script by typing ./scan.sh filename, where filename is whatever I want the output file to be called. The .pdf extension is added automatically by the script.
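The script has no check for whether filename.pdf already exists. A minimal sketch of a guard that could be called near the top of the script follows; refuse_overwrite is a made-up helper name and is not part of the script above:

```shell
#!/bin/bash
# refuse_overwrite is a hypothetical helper for this sketch.
# It takes the output name without the extension, prints the final
# PDF name, and fails if a file with that name is already there.
refuse_overwrite() {
    local name="$1"
    if [ -e "$name.pdf" ]; then
        echo "$name.pdf already exists, not overwriting" >&2
        return 1
    fi
    echo "$name.pdf"
}

# Possible use inside scan.sh, before scanning starts:
# out=$(refuse_overwrite "$1") || exit 1
```

Checking before the scan starts means a name clash is caught while the pages are still in the feeder.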

Conclusion

In all, the installation process (on Debian) takes approximately 10 minutes and the scanner only costs about $125. I don't know how many pages per hour can be imaged with this setup, but with my settings it takes approximately 2-3 seconds to scan a page and just a couple of minutes to scan a whole semester's worth of calculus notes. I'm not sure whether this could replace larger document imaging systems used in some companies because of the ADF size, but for personal and small business purposes it's a cheap and easy open source solution.


Originally posted at politicalapathy.com.


Article comments

  • 1 - James Eglin

    Feb 01, 2006 at 12:26 pm

    Interesting approach. At what point are these files named so they can later be retrieved?

  • 2 - Adam Drake

    Feb 01, 2006 at 12:58 pm

    James:

    The file name is supplied as an argument to the script. For example, if you were scanning notes from a class or something you would load them in the scanner and type "./scan.sh classnotes"

    That would create a file called classnotes.pdf in your current directory.

    I hope that answered your question, let me know if it didn't.

  • 3 - James Eglin

    Feb 01, 2006 at 3:59 pm

    Adam:

    Thank you. Don't computers need a file name to be unique?

  • 4 - Frank Russo

    Feb 01, 2006 at 4:16 pm

    I realize that this may be out of the article's scope, but what do you use for document storage, organization, and retrieval? I mean, any scanner with a SANE driver and an auto sheet feeder can do what you have described above, but where is the FOSS replacement suite for Paperport/Omnipage/etc.? I have been looking for one for quite a while now. Currently, I am using krusader to stay organized.

    Thanx Much,
    Frank Russo

  • 5 - Adam Drake

    Feb 02, 2006 at 9:12 am

    James:
    Filenames must be unique. I don't have any checks in the script to confirm that a file with the same name doesn't already exist but that could be easily added. I didn't need it for my purposes although I may expand this script later, at which time it will be added.

    Frank:
    I don't use anything special. If the document is to be network-accessible then it will be put on the network drive. Otherwise, I just move it wherever I want it to reside. I don't have any experience with either of the two commercial packages that you mentioned, so I can't really comment on them directly.

  • 6 - Brian C

    Feb 02, 2006 at 11:29 am

    I do something similar, but instead of converting each file to pdf then putting them together, ImageMagick can do that for you in one step.

    convert -adjoin image-* $1.pdf

    There is no need for pdftk either. Of course, this will fail if you have too many files for the command line to handle, but that's another issue. I've done documents of about 10 pages this way.

  • 7 - Tom R

    Feb 02, 2006 at 1:55 pm

    We use the same scanner for our Doc Imaging.
    Currently we scan our delivery copies (75-100 images a day) for our business. We also automatically capture, at time of printing, an image of the original sales ticket as well as invoices to customers. We store all tickets, invoices, and delivery scans in MySQL databases and use custom-written PHP to search and display data across our intranet. It's fast, and we tie all 3 of those documents together as well as an image of any associated Purchase Orders. We have it set up so it is all automatic (except for putting the delivery copies into the scanner). All on Open Source Software! (We also use MSACCESS to put data into MySQL tables automatically.)

  • 8 - Karl O. Pinc

    Feb 02, 2006 at 7:24 pm

    Having to run around and update proprietary drivers whenever I wanted to upgrade my system drove me nuts. Now I stick with the FOSS drivers and painlessly upgrade from the internet in one step whenever I desire, and am much happier.

    There's a reason they say "Use binary only drivers, hate life."

  • 9 - Bullwinkle

    Mar 09, 2006 at 11:01 am

    I am interested in how you would edit or input text/data into these documents or forms and pdf's? Proprietary apps such as Paperport have had this capability for quite some time, and Novell/Suse has shown some promise in the doc management field with DjVu. Still much work needs to be done.

    Simple storage/management/retrieval isn't the real problem here...but it's certainly a start.

  • 10 - Richard Cooke

    May 05, 2006 at 10:54 am

    I have noticed that this printer has an ethernet port.

    I had great trouble getting my broadband modem up and running, and changed to an ethernet modem, which worked 'just like that'. Would not the same be true of this printer?

    As a relative linux newbie I find your instructions daunting (though good and detailed), and I run mandrake at the moment (about to try SUSE), which would require some changes.

  • 11 - Adam Drake

    May 05, 2006 at 12:50 pm

    Richard,

    USB Broadband modems have been difficult (if not impossible) to configure with Linux and as you said, they do work fine if connected via ethernet.

    This printer does have an ethernet connection, but as I understand it that is solely for the printing function. The scanning function cannot be used in that way regardless of operating system.

    I am happy that you are trying Linux. The learning curve is more than something like Mac OS but after you are comfortable in the OS it is difficult to imagine switching back.

    If you have any problems or need any help let me know and I'll do my best to assist you.

  • 12 - Richard Cooke

    May 10, 2006 at 8:20 am

    Thank you,

    I have just received said printer. Unfortunately it did not work straight off using the ethernet cable - not sure why. But I will install the rpms tonight and hopefully ...! Thanks for your encouragement.

  • 13 - Adam Drake

    May 10, 2006 at 9:37 pm

    Richard,

    I have never used the printer via ethernet, only USB. I hope that it works as well for you as it has for me (the scanning feature that is).

  • 14 - Rob Word

    Dec 14, 2006 at 1:37 pm

    I have the all-in-one unit discussed in this thread. The Ethernet port can be used for scanning. From my understanding, Brother uses saned for network scanning. If you can configure sane on your computer, you should be able to access the scanner over the network.

  • 15 - G

    Jan 05, 2007 at 5:56 pm

    thanks for the great script, i've been using it quite a bit lately

    one suggestion... you can substantially reduce the size of pdf files you are creating by tweaking the convert line in the script to read:

    convert $file -compress LZW $file.pdf

  • 16 - SuperQ

    Sep 02, 2007 at 4:41 pm

    Thanks for the script, I wanted a bunch more functionality for use with my HP officejet.. (works great in linux over ethernet)

    I didn't really want the PDF part, so I dropped it.. I suppose I could re-add the PDF mode.

    The script is here.

    New features:
    * scans to PNG files
    * has modes for BW, Gray, and Color.
    * has a series of options for paper sizes
    * prompts for dates for filenames
    * supports multi-page documents

    Features I haven't done yet: (but want to)
    * checking to make sure it won't over-write files
    * inline OCR
    * PDF mode
    * multiple multi-page documents

    My HP doesn't seem to auto-detect how many pages are in the feeder, so this version requires you know how many pages are in the hopper ahead of time.

  • 17 - Kevin

    Jun 12, 2008 at 1:22 am

    Great article, but does the Brother do double sided scanning?

    I am contemplating a similar setup but am leaning towards a Scansnap S300. The only trouble is the Linux drivers are poor, so I would probably connect it using usb over ethernet to a windows box until driver support improved.
