[GreenKeys] another m28 document... Warning - long - if you're
not interested in document preservation - JUST SKIP THIS.
Randy or Sherry Guttery
comcents at bellsouth.net
Mon Jan 14 10:28:13 EST 2008
Sheldon Daitch wrote:
> dumb question time.
The only "dumb" question is the one not asked. Everyone has to learn
along the way - and if they aren't afforded the opportunity to "peek
over the shoulder" - then Q&A is the next best thing.
> I don't know very much about PDF file creation, as we have two models
> of HP high end scanners used for PDF preparation. But they are line
> scanners, that is, if the displayed file is expanded (magnified), the
> scan lines become very apparent. We don't have the "creation" type PDF
> program that would take a text document and convert it to PDF.
The key to scanning is the software used - and some (most) software
requires some skill (learning, trial and error, and practice) to
accomplish a good result.
First - the quality of the scanner is almost irrelevant - as long as
it's capable of a minimum of 300dpi (dots per inch of "resolution").
Even with an old scanner - good software can produce excellent results -
as the software can "drive" the scanner to produce the desired (scan)
results. This particular document was scanned on a very old and tired
Microtek. Its basic scan accuracy is 300dpi - yet software can "push"
it to 2400dpi - but the trade-off is time - which is the "major"
difference in scanners - your newer high quality scanners can turn out
an excellent scan in a single - comparatively fast - pass. My old beast
can turn out a similar quality scan -- but it'll take the software maybe
ten minutes to get the job done. The average scan time per page on this
document was less than a minute - so the quality was only "acceptable"
and considering the source - a printed copy (even the graphics) looks
better than the originals - which are old, faded, yellowed, etc...
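
To put rough numbers on the resolution side of that trade-off, here is a
minimal sketch. It only counts samples per page; it says nothing about how
any particular scanner or driver spends its time. The page size (US Letter)
and the function name are assumptions for illustration, not from the scan
job described above.

```python
def scan_samples(width_in, height_in, dpi):
    """Return (width_px, height_px, total_samples) for a page scanned at `dpi`."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px, height_px, width_px * height_px

# Assumed page size: US Letter, 8.5 x 11 inches (hypothetical, not from the post).
PAGE_W_IN, PAGE_H_IN = 8.5, 11

base = scan_samples(PAGE_W_IN, PAGE_H_IN, 300)[2]
for dpi in (300, 2400):
    w, h, total = scan_samples(PAGE_W_IN, PAGE_H_IN, dpi)
    print(f"{dpi:>4} dpi: {w} x {h} = {total:,} samples "
          f"({total // base}x the 300dpi page)")
```

Eight times the resolution in each direction works out to 64 times the
samples per page, which is one reason "pushing" a scanner costs so much
time.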
> In the first 6 pages or so, all text, there is no evidence of the
> document being scanned, it is like it has been "typeset" into a PDF
> document. May I assume the first text pages were not simply scanned?
Yup... Here is a case of using the best software for the desired
result. The first six pages - being all text - were scanned and
converted by Omnipage Pro 12. This is a scan-to-text OCR (optical
character recognition) software package that is very good at doing its
task. We (Sherry and I) found less than a half dozen outright errors
when we proofed it (it subbed a 9 for a 4 in one place), and one of the
errors was actually an error in spelling on the document (perceptable
vs. perceptible). Since we try to keep the new document as "true as
possible" to the original - the "spelling error" was retained. The other
errors were abbreviations - which its spelling checker sometimes
guesses wrong - and it's so close to some "real word" that it doesn't
trip the "proofer" to present it for Ignore/Change confirmation.
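
A toy sketch of why that kind of error slips past a dictionary check. This
is not how Omnipage's proofer works internally (I don't know its
internals); the word list, the sample text and the function name are all
made up for illustration.

```python
def flag_for_review(tokens, wordlist):
    """Return the tokens a dictionary-only checker would stop on."""
    return [t for t in tokens if t.lower() not in wordlist]

# Hypothetical word list and OCR output, made up for illustration.
WORDLIST = {"the", "motor", "rotor", "shaft", "turns"}

garbled = "the rnotor shaft turns".split()    # misread that is not a word
plausible = "the rotor shaft turns".split()   # misread that happens to be a word

print(flag_for_review(garbled, WORDLIST))     # the bad token is flagged
print(flag_for_review(plausible, WORDLIST))   # nothing is flagged
```

The second case is the one that needs a human proofing against the
original: the text is wrong, but every token is a "real word".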
The table on page two (parts list) presented a little formatting issue -
as I allowed the program to "guess" at the page's entire content- when I
should have manually "drawn" the regions. Omnipage can either "guess"
the regions of a page (plain text areas; formatted text (tables); and
graphics areas)- or you can manually draw these areas to ensure the
results more closely match the original. Here again - more time - better
results.
> On the other hand, the last four pages, all the drawings pages, they
> are scanned, and the scan lines become very apparent about 400% display.
OK - two different issues here - one is scan mode / and the other is
post processing. As noted - different software does different things. If
the graphic elements are simple and the original is good - Omnipage Pro
can handle it pretty well (its base graphic mode is TIFF). However if
the graphics are complex (high resolution, detailed, or poor quality
originals) then I use one of two different pieces of software to scan
the graphics. Whatever I use to scan - Photoshop is the "post
processor" as it can do magic (amount of "magic" is directly
proportionate to time invested). If I'm doing "onesies" - then
Microtek's ScanWizard plug-in to Photoshop is used. If there is a bunch
to be scanned - then Vuescan is the program of choice - as it is VERY
powerful, and automates a great deal of the "repetitive" process of
scanning things (I say things - because that's also the software I use
to drive a Nikon 35mm Slide scanner - which can scan 50 slides at a time
using its autoloader).
The last four pages were scanned with Microtek's ScanWizard - 300dpi -
line art mode. Here is the key. High res graphics are scanned in
either gray scale (B&W) or high res color. The problem is that those
modes "retain" all the detail - including the faded, yellowed
background. You COULD go in with Photoshop and manually clean
that up - but that takes time. You can change "mode" with Photoshop
(from gray scale or RGB to bitmap) and let it "drop out" the background
"clutter/noise" that way - but if you do it "page wide" it either
doesn't get it all - or it gets too much and loses some (desired)
detail. If, on the other hand - you set the scanning software to bitmap
mode (line art) - it *dynamically* adjusts as it scans - (usually) doing
a quite good job of separating the desired "stuff" from the chaff. Again
- point being - you *could* do a better job manually - *IF* you have the
time. And around here - time is in short supply.
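
The difference between a "page wide" cutoff and one that adjusts as it goes
can be sketched in a few lines. This is only an illustration of the idea. I
am assuming a simple local-mean threshold, which is one common approach and
not necessarily what ScanWizard or Photoshop actually do. The pixel values,
radius and offset are invented.

```python
def global_threshold(img, cutoff):
    """Mark a pixel as ink (1) when it is darker than one page-wide cutoff."""
    return [[1 if p < cutoff else 0 for p in row] for row in img]

def local_threshold(img, radius=2, offset=20):
    """Mark a pixel as ink (1) when it is darker than the mean of its
    neighbourhood by more than `offset`."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [img[j][i]
                      for j in range(max(0, y - radius), min(h, y + radius + 1))
                      for i in range(max(0, x - radius), min(w, x + radius + 1))]
            mean = sum(window) / len(window)
            row.append(1 if img[y][x] < mean - offset else 0)
        out.append(row)
    return out

# Invented gray values (0 = black, 255 = white): clean paper on the left,
# a yellowed stain on the right, one vertical stroke in each half.
ROW = [220, 220, 150, 220, 220, 140, 140, 60, 140, 140]
PAGE = [ROW, ROW, ROW]

print(global_threshold(PAGE, 100)[0])  # low cutoff: loses the faded stroke
print(global_threshold(PAGE, 160)[0])  # high cutoff: the stain turns black
print(local_threshold(PAGE)[0])        # local cutoff: keeps both strokes only
```

No single page-wide cutoff gets both halves of this made-up page right,
which is the same "doesn't get it all or gets too much" problem described
above.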
Ok... now that we have "text" pages - they are edited in MS Word (they
were saved out of Omnipage Pro directly into a Word document). Spelling
checked, and proofread against the originals, any formatting issues
fixed. Even though the font - Times New Roman - in this case - is very
similar to the original - it doesn't "layout" exactly the same (letters
per line, etc.) as the original document - so some "tweaking" is done to
"look the same".
The graphics are already in Photoshop - so a little touchup here and
there - sharpen, crop/size, convert to 72dpi (so it displays the same on
computer screens as it prints); and save. In the case of page 10 - the
original print is messed up bad - obviously went through the press
"crooked" and smeared a bit. So some clean up - and replace most text
so it's readable.
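
For the 72dpi step, the arithmetic looks roughly like this - assuming the
conversion resamples the image so the physical page size stays the same
(an image editor can also just change the dpi tag without resampling,
which this sketch does not cover). The input size is an assumed full US
Letter page at 300dpi, and the function name is hypothetical.

```python
def resample_size(width_px, height_px, src_dpi, dst_dpi):
    """Pixel dimensions after resampling so the physical size stays the same."""
    return (round(width_px * dst_dpi / src_dpi),
            round(height_px * dst_dpi / src_dpi))

# Assumed input: a full US Letter page scanned at 300dpi (2550 x 3300 pixels).
print(resample_size(2550, 3300, 300, 72))
```

Under those assumptions the page comes out at 612 x 792 pixels, still
8.5 x 11 inches at 72dpi.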
Now open Adobe Acrobat - import the Word document - check - (yeah -
looks ok) - import the four pages of graphics - oops one text legend got
messed up - back to Photoshop - fix, save - re-import to Acrobat.
Save a master - then save using the "reduce file size" option, which limits
"compatibility" - but I figure most people have at least Reader 5.x - so
that's the option - and it cuts the final file size by 2/3. Upload to
server via FTP - let everyone know it's there.
> I am not sure what questions I ought to
> ask on how you did it.
Well - now that you have an "overview" of the process - and the tools
that I use - and there are certainly others... you can jump in - and
lend a hand preserving these old documents before they are lost to time
and dust... As noted - I wish I had more time, but right now that's a
luxury I don't have. Pesky doctor wants to do "a procedure" on me
tomorrow - so I have to get things in order before "reporting in". Then
it's back to business at hand. Of course being busy helps pay the bills
- so I guess that's better than too much time on my hands at this point
;-) (pesky doctors LIKE to do procedures - with their hand out, of
course!).
I am considering getting a newer scanner - not that this one isn't
"adequate" for the job from a quality viewpoint - but with several
thousands of pages to scan - (many Navy training manuals, the Test
Methods and Practices; and Reference data volumes of the EIMB - (the
1964 edition that still has TUBES in them!)).... it'd sure be nice to
turn a scanner loose like I can the slide scanner...
best regards...
--
randy guttery
A Tender Tale - a page dedicated to those Ships and Crews
so vital to the United States Silent Service:
http://tendertale.com