Rudolf Steiner e.Lib


Steps in Digitizing a Document ...

What does it take to get a document digitized and published online here at the Rudolf Steiner Archive? Here are the steps:

Locate and acquire the document, either by purchase or from a library.
Scan each page, saving it as a computer file (preferably, but not necessarily, a TIFF [Tagged Image File Format] file). At this point it is a graphic image, like a photo of each page.
Run the files through OCR (Optical Character Recognition) software, which converts any alphabetic characters it finds in the image to actual “text” characters and produces text files. The accuracy of this process varies from 95% recognition for very clear documents to no recognition at all for some old manuscripts, which have to be keyed in (typed) by hand.
Proofread and correct each text file, comparing it against the original document (preferred) or the scanned images, and save the result as one or more revised files. This includes:
- editing for typographic errors, whether caused by OCR inaccuracy or present in the original document (it happens!),
- verifying special characters, especially left and right quotation marks, and diacritical marks such as umlauts.
Proofread to locate all footnotes and graphics (e.g., diagrams, drawings) in order to place them correctly in the online version.
Proofread to locate all references to items online in order to set up links for cross-references.
Convert to HTML. For a single lecture, this is a single file. A book or collection requires multiple files, including cover image, contents, prefaces, appendices, synopses, notes, footnotes, and cross-references. Much of this is automated, but the human eye is still needed, and a lot of the work must be done manually.
Not all browsers are equal! Quite a bit of work is needed to make the document render at least close to the same way in all browsers. What looks fine in one browser may look terrible in another, and fixing it for the second browser can break it in the first. We recommend Firefox!

Put the document into the database(s), cross-reference it with other documents, and create the index, keywords, and other information needed for our database and the search/research tools we have created.
Publish on the website (Whew!).
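
Two of the steps above, verifying special characters and converting to HTML, can be partly automated. The following is a minimal sketch in Python, not the Archive's actual tooling: the function names `flag_suspect_characters` and `paragraphs_to_html` are hypothetical, and it assumes the proofread text is plain UTF-8 with paragraphs separated by blank lines. It only flags characters for a human proofreader to check; it does not correct anything.

```python
import html
import re
import unicodedata

# Straight quotes usually need a human decision: left or right quotation mark?
STRAIGHT_QUOTES = re.compile(r"[\"']")


def flag_suspect_characters(text):
    """Return (line, column, character, reason) tuples for a proofreader to review."""
    findings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if STRAIGHT_QUOTES.match(ch):
                findings.append((line_no, col, ch, "straight quote"))
            elif ord(ch) > 127 and unicodedata.category(ch).startswith("L"):
                # Non-ASCII letters (umlauts etc.) are where OCR tends to slip.
                findings.append((line_no, col, ch, "non-ASCII letter, check diacritic"))
    return findings


def paragraphs_to_html(text, title):
    """Wrap blank-line-separated paragraphs in <p> tags inside a minimal HTML page."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    body = "\n".join(
        # Collapse internal line breaks and escape &, <, > so the text is safe HTML.
        "<p>{}</p>".format(html.escape(" ".join(p.split()), quote=False))
        for p in paragraphs
    )
    return (
        "<!DOCTYPE html>\n<html>\n<head>\n<meta charset=\"utf-8\">\n"
        "<title>{}</title>\n</head>\n<body>\n{}\n</body>\n</html>\n"
    ).format(html.escape(title), body)
```

Footnotes, diagrams, and cross-reference links are deliberately left out of this sketch, since those are the parts that still need the human eye.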
From start to finish, a 10-page lecture could take anywhere from one to eight hours, from initial scanning to finally appearing online. For a collection or book, it can take 10-50% more time to handle all the indexing, cross-referencing, and formatting. Also, graphics and diagrams can take a lot of work to clean up after they have been scanned. Some of our materials are original typewritten manuscripts on very fragile, yellowed paper, and are nearly impossible to process with OCR. Currently, there are 1112 on-site volumes and 4171 individual documents here at the Archive!
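
To get a feel for those figures, here is a rough back-of-the-envelope sketch in Python. The function `estimate_hours` is hypothetical, and it assumes that effort scales linearly with page count, which the figures above do not actually claim; treat the result as an order-of-magnitude guide only.

```python
def estimate_hours(pages, is_collection=False):
    """Return a (low, high) range of hours, scaled linearly from the quoted figures."""
    # Quoted figures: a 10-page lecture takes roughly 1 to 8 hours.
    low = pages / 10 * 1.0
    high = pages / 10 * 8.0
    if is_collection:
        # Quoted overhead for a book or collection: 10% to 50% more time.
        low *= 1.10
        high *= 1.50
    return low, high
```

Under that linear assumption, `estimate_hours(200, is_collection=True)` gives roughly 22 to 240 hours for a 200-page book.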

Most of the digitizing project is done in-house, but we have wonderful volunteers all over the world who acquire and scan documents, run them through their own OCR software (if they have it), and create files that they send to us. The final proofreading, cross-references, creation of HTML files, setting up for our databases and tools, and online publication are all done in-house. And, of course, we provide the heavy-duty servers and broadband to make it all available to the world.

Our Search and Research Tools and Database Management

Jim Stewart has designed and created the online tools (the database, searching capability, keyword indexing and cross-referencing, etc.) that enable users to access and research the online documents with ease. This has been an ongoing project for almost 30 years.
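
For readers curious what keyword indexing involves in principle, here is a minimal sketch of an inverted index in Python. It is an illustration only and not the Archive's implementation: the function names, the stop-word list, and the document IDs are all hypothetical.

```python
import re
from collections import defaultdict

# A tiny, assumed stop-word list; a real index would use a much longer one.
STOP_WORDS = frozenset({"a", "an", "and", "in", "of", "on", "the", "to"})


def build_keyword_index(documents):
    """Map each keyword to the sorted list of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in re.findall(r"\w+", text.lower()):
            if word not in STOP_WORDS:
                index[word].add(doc_id)
    return {word: sorted(ids) for word, ids in index.items()}


def search(index, query):
    """Return IDs of documents that contain every non-stop word in the query."""
    words = [w for w in re.findall(r"\w+", query.lower()) if w not in STOP_WORDS]
    if not words:
        return []
    result = set(index.get(words[0], []))
    for word in words[1:]:
        result &= set(index.get(word, []))
    return sorted(result)
```

Given `documents = {"doc-1": "Notes on colour and light", "doc-2": "Light and warmth"}`, calling `search(build_keyword_index(documents), "colour light")` returns only `doc-1`, because every query word must appear in a matching document.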

How to Help

We have a tremendous backlog of materials we want to get online, and there are so many irreplaceable resources at risk worldwide! If you can afford to donate even a little to support this initiative, you will be helping to save irreplaceable works and to make the information available to so many others! Please check our Donation and Appeal pages to see how you can help!

The Rudolf Steiner e.Lib is maintained by:

The e.Librarian / James Stewart /

Copyright © 1980 – 2024
Rudolf Steiner e.Lib
Page was last updated on Saturday October 15, 2022 at 05:58:12.