hist-brewing: Creating online sources

Spencer W Thomas spencer at engin.umich.edu
Mon Oct 5 20:10:07 PDT 1998

The problem with images: they're big.
The nice thing about images: they work.  :-)

The nice thing about text: it's compact and portable.
The bad thing about text: it's hard to make.  (It doesn't work. :-)

Another problem: the appropriate form for storing "printer resolution"
images is not the appropriate form for on-screen viewing.  At JSTOR
we've taken the approach of storing the high-res bitmap images (TIFF
with G4 encoding), but displaying lower-res gray scale images (GIF),
computed as needed and cached briefly.  This approach requires some
investment in software development as well as storage space.
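
A rough sketch of the "computed as needed and cached briefly" idea, in
Python.  This is not JSTOR's code: the class name, the render callback
and the five-minute lifetime are all made up for illustration, and the
real TIFF-to-GIF conversion is left to whatever function you pass in.

```python
import time
from typing import Callable, Dict, Tuple


class BriefCache:
    """Keep derived screen images for a short time; recompute on a miss."""

    def __init__(self,
                 render: Callable[[str], bytes],
                 ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._render = render      # hypothetical converter: page id -> image bytes
        self._ttl = ttl_seconds    # assumed lifetime; "briefly" is not quantified
        self._clock = clock        # injectable so the cache can be tested
        self._entries: Dict[str, Tuple[float, bytes]] = {}

    def get(self, page_id: str) -> bytes:
        now = self._clock()
        hit = self._entries.get(page_id)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]                      # still fresh: reuse it
        data = self._render(page_id)           # missing or stale: recompute
        self._entries[page_id] = (now, data)
        return data
```

Stale entries are only replaced when the same page is requested again,
so a real server would also want a periodic sweep or a size limit.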

For the Wahl-Henius pages, I preconverted the TIFF images to GIFs and
built a static set of table-of-contents (HTML) files.  This works
because there are only about 600 pages.
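
The static table-of-contents files could be generated along these
lines.  This is a sketch, not the script I used: the page0001.gif
naming scheme and the 50-pages-per-file split are assumptions made up
for the example.

```python
from html import escape
from typing import List


def build_toc(title: str, page_count: int, pages_per_file: int = 50) -> List[str]:
    """Return one HTML document per block of pages, each linking to its GIFs."""
    documents = []
    for start in range(1, page_count + 1, pages_per_file):
        end = min(start + pages_per_file - 1, page_count)
        # One list item per page; the file naming scheme is hypothetical.
        items = "\n".join(
            f'<li><a href="page{n:04d}.gif">Page {n}</a></li>'
            for n in range(start, end + 1)
        )
        documents.append(
            f"<html><head><title>{escape(title)}, pages {start}-{end}</title></head>\n"
            f"<body><h1>{escape(title)}</h1>\n<ul>\n{items}\n</ul></body></html>\n"
        )
    return documents
```

With 600 pages and the default block size this yields 12 files, which
is small enough to build once and serve as plain files.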

A couple of other solutions are available:

PDF files with embedded page images.  You can even stick the OCR text
"behind" the page image for searching and "cut-and-paste", if your OCR
engine supports this output option.  
  ++ "Everybody" has Acrobat Reader
  -- They're big -- 80 KB per page at 300 DPI.

"CPC" format (www.cartesianinc.com) This is an image-only format, but
will store high-res page images with significant space savings (4x -
10x) over TIFFs.  They have a plug-in viewer for Windows and a clever
solution for Mac & Unix that converts the CPC file to PDF and passes
it to Acrobat.
  ++ Small (for images) files, viewer (on Windows) supports arbitrary
	resolution, thumbnails, etc.
  -- Encoder costs $$.  About a penny/page in "large" quantities, but
	still real money.  Not everybody has a viewer.
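
To put rough numbers on these trade-offs, here is a back-of-envelope
calculation in Python.  The per-page figures are the ones quoted above;
applying them to a 600-page set is only an illustration, the TIFF size
is an input you have to supply, and the penny-a-page rate was quoted
for "large" quantities, so a small job may well cost more.

```python
from typing import Tuple


def pdf_set_size_mb(pages: int, kb_per_page: float = 80.0) -> float:
    """Total size of an image PDF set, using the 80 KB/page figure."""
    return pages * kb_per_page / 1024.0


def cpc_size_range_kb(tiff_kb: float,
                      low_factor: float = 4.0,
                      high_factor: float = 10.0) -> Tuple[float, float]:
    """Smallest and largest CPC size implied by a 4x to 10x saving over TIFF."""
    return (tiff_kb / high_factor, tiff_kb / low_factor)


def encoding_cost_dollars(pages: int, cents_per_page: float = 1.0) -> float:
    """Encoder cost at the quoted bulk rate of about a penny per page."""
    return pages * cents_per_page / 100.0


if __name__ == "__main__":
    print(pdf_set_size_mb(600))        # size in MB of a 600-page PDF set
    print(cpc_size_range_kb(100.0))    # for a hypothetical 100 KB TIFF page
    print(encoding_cost_dollars(600))  # dollars at the bulk rate
```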

I plan to take Jim's 50-page sample and see if I can train our OCR
software on it, at least to try to recognize the "f/s" character (the
long s).  The software is still sensitive to image quality: blots and
broken characters almost invariably cause recognition errors.

Here's the output from one page, with no editing on my part.  The
second and third lines were a subtitle, in an italic font.  As you
can see, the software had a lot of trouble with these lines.

See it online at http://hubris.engin.umich.edu:8080/London/
along with the original page image and a hybrid PDF example.

     'and of the proper Soils, &c.    41

For eBreing Tah  and Amber AleJ and

  As the brown Malts are Brewed with
River, thefe are Brewed with Well or
Spring Liquors. The Liquors are by
fome taken fharper for pale than brown
Malts, and after the firf fcalding Liquor
is put over, fome lower the relt by de-
grees to'the laft which is quite Cold, for
their fmall Beer; fo alfo for Butt-Beers
there is no other difference than the ad-
dition of mo.re Hops, and boiling, and
the method of working.  But the reafons
for Brewing pale Malts with Spring or
hard Well waters, I have mentioned in
my fecond Book of Brewing.

  Eor Brewing Entire Guile fmall Beer.
  On the firRt Liquor they throw fome
hully Malt to fhew' the break of it, and
when it is very fharp, they let in fome
cold Liquor, and run it into the Tun milk
warm; this is mafh'd with thirty or for-
ty pulls of the Oar, and let Rland till the
fecon4 Liquor is ready, which mufl  be
almoft fcalding hot, to the back-of the
Hand, then run it by the Cock into the
Tun, mafn it up and let it ftand 4 Hour
                                . 'before

