Conversion of Paper Documents (or Microfilm) to Electronic Images
and the impact on users, network, and systems

©2001 by Charles A. Plesums, Austin, Texas, USA


When paper or microfilm files are converted to electronic images, there is a huge temptation to put all the documents for a file/customer/policy together as a single file. This process is often followed to minimize the cost of indexing. Sometimes a few users argue that "the entire file must be reviewed anyway," so this approach could make sense.

Experience at several sites that have followed this practice has revealed several significant problems, described below for a fairly typical 50-page file.

The alternative is to do more granular indexing during conversion, so users only retrieve the documents that they specifically need. When examined closely, the extra effort to provide the indexing can be minimal.

Expected system degradation

The average insurance policy file folder contains 30 to 80 pages, so we will assume 50 pages in this analysis. At about 50,000 bytes per page, delivering a 50-page document typically involves the transfer of 2.5 million bytes, or 20 million bits. With communications overhead, this will be about 25-30 million bits. Ethernet, used for most local area networks, is designed for small messages, so it is inefficient when dealing with large messages - some people consider 30% utilization to be "full" with records this large. With these assumptions, those 50 pages could "saturate" a 10 Mbps Ethernet segment for 10 seconds.

Of course this is only an average - larger files can easily be 100 pages or more, and complex documents are much larger than the 50,000 bytes in an average page, so there will be cases where one user will tie up the entire network segment for 30 seconds or more, while all the other users wait. The physical layout and network technology are major factors in the number of users on a network segment and the way they interact, but up to 100 users is common. This means all the users in an area, up to 100 of them, have to wait an average of 10 seconds for any network response whenever anyone retrieves a document. With multiple users and larger documents that wait may be minutes. The latest networking techniques can minimize the impact of this inefficiency, just as enough horsepower could let a car proceed at highway speeds with four flat tires.
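The arithmetic behind the 10-second figure can be checked with a short Python sketch. The constants are the ones quoted in the text (50,000 bytes per page, a 10 Mbps segment, 30% usable capacity). The overhead factor of 1.5 is an assumption that corresponds to the upper end of the 25-30 million bit estimate, and the function name is purely illustrative.

```python
# Back-of-the-envelope estimate of how long one retrieval occupies a
# shared Ethernet segment. Constants come from the text; the overhead
# factor of 1.5 is an assumption (upper end of the 25-30 million bit range).

def segment_busy_seconds(pages, bytes_per_page=50_000, overhead=1.5,
                         link_bps=10_000_000, usable_fraction=0.30):
    """Return the seconds needed to deliver `pages` pages over the segment."""
    payload_bits = pages * bytes_per_page * 8   # raw image data in bits
    bits_on_wire = payload_bits * overhead      # add communications overhead
    usable_bps = link_bps * usable_fraction     # capacity considered "full"
    return bits_on_wire / usable_bps


if __name__ == "__main__":
    # A small document, the typical file, and the three-fiche example.
    for pages in (3, 50, 150):
        print(f"{pages:>3} pages: about {segment_busy_seconds(pages):.1f} s")
```

Under these assumptions a 3-page document occupies the segment for well under a second, while the 50-page file takes about 10 seconds and a 150-page retrieval about 30 seconds.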

A life insurance company with about 200 image system users did a backfile conversion. For a variety of reasons, the conversion had to be done in a hurry. By the time the conversion was done, a few users (those who always had to review the entire file) were thrilled - everything they wanted came in a single request. But most of the users were disappointed, since system performance was painfully slow. A series of consultants were brought in to find the system problem. Each of us reached the same conclusion (although each described the problem in different terms). The insurance company didn't like the answer, since the conversion was already completed. Having paid for the conversion, they didn't want to "pay again" to fix the conversion, so they did nothing. And were never satisfied with their image system. And never got the full benefit of using the system.

Another insurance company with hundreds of image and work management system users kept their documents small except when they converted microfiche; each fiche became a single document (of up to 50 or more pages). Since they did not have a high volume of large documents, their perception was that the system "...occasionally slowed down, but just wait a minute and it will be okay." I watched a demonstration of the system where the request to get work just happened to deliver work with three fiche documents attached - about 150 pages. As they waited a couple of minutes (that seemed like hours), the user explained the occasional slowdown. I noticed the other users in the area were also waiting idly. When the documents arrived, everything started working normally again.

User Impact

If the workflow always requires reading all the documents in a file, then there is little difference between reading one large document and several smaller documents - go ahead and put all the pages in a single document. On the other hand, all new work will be stored as small, individually identified documents, so users will become used to the idea of selecting just the documents needed. Many of the people who have to "read the entire file," like quality teams or regulatory compliance examiners, will do part of their review from the "table of contents" - the list of documents displayed when doing a lookup - and will only select random documents to retrieve.

If a user needs to find a specific form - such as an application, loan document, or change of beneficiary - it is a far more efficient use of the person's time to just select and receive that document, rather than delivering 50 pages that have to be read.

Indexing techniques

There is a legitimate concern about the skills required (cost) to identify all the possible documents in a file, especially when a block of business has been purchased, with filing techniques and forms that are not familiar. It could be horribly expensive to identify and index every document in every file.

An application for insurance is pretty easy to identify - most companies put the application at the beginning of the converted file before filming or scanning. The illustration may also be isolated, right after the application. And the medical authorization form. And the medical history. And the Attending Physician Statement (APS). A patch page or bar code sheet can be placed between each of these documents to separate and identify them, as the staples are being removed and the documents are arranged.
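As a rough illustration of how separator sheets turn one scanned stream into individually identified documents, here is a minimal Python sketch. The sheet representation (a list of tuples) and the function name are hypothetical; a real scanning system would recognize the patch page or bar code itself and report it in its own format.

```python
# Minimal sketch: split one scanned stream into documents at separator
# sheets. The (kind, value) tuple format is a hypothetical stand-in for
# whatever the scanner software reports.

def split_at_separators(scanned_sheets):
    """Group sheets into (doc_type, pages) pairs.

    A sheet is ("separator", doc_type) for a patch or bar code sheet, or
    ("page", page_id) for a real page. Pages that appear before any
    separator are labelled "other".
    """
    documents = []
    current_type, current_pages = "other", []
    for kind, value in scanned_sheets:
        if kind == "separator":
            if current_pages:                      # close the previous document
                documents.append((current_type, current_pages))
            current_type, current_pages = value, []
        else:
            current_pages.append(value)
    if current_pages:                              # close the last document
        documents.append((current_type, current_pages))
    return documents


sample_stream = [
    ("separator", "application"), ("page", 1), ("page", 2),
    ("separator", "illustration"), ("page", 3),
    ("separator", "medical authorization"), ("page", 4),
    ("separator", "other"), ("page", 5), ("page", 6),
]

if __name__ == "__main__":
    for doc_type, pages in split_at_separators(sample_stream):
        print(doc_type, pages)
```

The separator sheets themselves are not kept as pages, so the stored documents contain only the original material.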

Certainly there are other documents in the file - the rambling letter from the agent or insured that nobody wants to read again just to do a conversion. The change of address from 15 years ago. The change of beneficiary. The inquiry about a loan balance. It may be hard to accurately identify what each of these documents is, although almost anyone can identify the beginning and end of each document - perhaps because the pages are stapled together! Therefore, after isolating the key documents as above, it may be sufficient for the conversion to identify five 3-page documents as "other," rather than one 15-page "other" document. The first time they are used, all five documents need to be retrieved (no savings here), but it is very easy for the user to change the index of a few documents (e.g. the loan and beneficiary documents). Even if the other documents are unchanged (left as "other"), the next retrieval will either be able to select the document desired, or will only have to retrieve three, rather than five, "other" documents. The savings are in the subsequent retrievals.

In a study many years ago, users were asked to review "fat files" when used with an image system. The users concluded that it was most convenient if a large document didn't exceed about 25 pages. These users recommended that a 100-page medical report be broken into 4-5 "chapters" so that later users would only have to refer to chapter 3, page 15, to see the notes that were originally on page 75 - which may be the only notes of interest. A similar situation involved property insurance, where hundreds of receipts had been submitted for jewelry, electronics, art, etc. The users did not want several hundred one-page documents (receipts), nor did they want one document with hundreds of pages of receipts. They didn't care whether the grouping was by time, type, or source, but they liked the pages grouped into sets no larger than 20-30 pages.
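The "chapter" idea amounts to simple page arithmetic, sketched below in Python. The chapter size of 30 pages is an assumption, chosen from the 20-30 page range because it reproduces the example in which original page 75 becomes chapter 3, page 15. The function names are illustrative only.

```python
# Sketch of breaking a large document into fixed-size "chapters".
# The default of 30 pages per chapter is an assumption taken from the
# upper end of the 20-30 page range reported by the users.

def chapter_ranges(total_pages, pages_per_chapter=30):
    """Return a list of (first_page, last_page) for each chapter, 1-based."""
    return [
        (first, min(first + pages_per_chapter - 1, total_pages))
        for first in range(1, total_pages + 1, pages_per_chapter)
    ]


def locate_page(original_page, pages_per_chapter=30):
    """Map a 1-based original page number to (chapter, page_within_chapter)."""
    if original_page < 1:
        raise ValueError("page numbers start at 1")
    chapter_index, offset = divmod(original_page - 1, pages_per_chapter)
    return chapter_index + 1, offset + 1


if __name__ == "__main__":
    print(chapter_ranges(100))   # four chapters for a 100-page report
    print(locate_page(75))       # chapter and page for original page 75
```

With a different chapter size the same page lands elsewhere, so whatever size is chosen should be recorded with the converted document.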

Document Purge

After reviewing hundreds (it feels like thousands) of files at many different companies, I have commonly found that many pages or documents can be discarded from even well-maintained files. These may be copies of documents made during processing that were returned to the file, even though the original was still in the file. It may be a request for information that was satisfied ten years ago. It may be an incomplete application that was later replaced by a corrected or completed document. It may be some notes from years ago that someone saved, "just in case." After careful analysis, it is common to find that half of the pages could now be discarded (or should never have been saved).

Of course, to get that "perfect" purge that allows half of the pages to be discarded may require legal review, executive discussions, and far more professional attention and cost than anyone wants to give each of thousands of files.

A quick purge has been defined and used for several conversions. After the detailed analysis of dozens of files, documents are identified by their common name or printed heading, forms codes, and other information that a temporary employee can easily find. Simple purge rules are established and converted to posters for the working area. And most important, any document that might be confused with a critical document is retained. With these simpler rules, the conversions were able to discard about a third (35%) of the document pages.
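The poster-style rules can be thought of as a small lookup table with a conservative default. The Python sketch below is only an illustration: the headings in both sets are hypothetical examples, not rules from any actual conversion, and a real rule set would come from the detailed analysis of sample files described above.

```python
# Sketch of a "quick purge" decision. Both sets are hypothetical examples;
# real rules would come from analyzing sample files for a given conversion.

DISCARD_HEADINGS = {            # easily recognized, safe to discard
    "photocopy",
    "routing slip",
    "information request (satisfied)",
}
CRITICAL_LOOKALIKES = {         # anything resembling these is always kept
    "application",
    "change of beneficiary",
    "loan document",
}


def purge_decision(heading):
    """Return "retain" or "discard" for a document's printed heading."""
    key = heading.strip().lower()
    if key in CRITICAL_LOOKALIKES:
        return "retain"         # never risk losing a critical document
    if key in DISCARD_HEADINGS:
        return "discard"
    return "retain"             # when in doubt, keep it


if __name__ == "__main__":
    for heading in ("Photocopy", "Application", "Letter from agent"):
        print(f"{heading}: {purge_decision(heading)}")
```

The default of retaining anything unrecognized reflects the rule that a document is discarded only when it clearly matches a purge rule.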

What does this "simple" purge cost? Since the documents have to be examined anyway (to remove staples and arrange them with the critical documents first), the purge is actually free. Experience has shown that discarding a duplicate document is faster than removing the staple and preparing it for scanning. And of course, there are then 35% fewer pages to store, deliver, and review. The result is a cheaper conversion, faster system, and more efficient users.


Many conversions are headed for future trouble. If an entire file is scanned as a single document, it is very expensive to later manually break the 50-page image document into separate documents - it may even be cheaper to discard the conversion and start over with document preparation and scanning. I strongly recommend that anyone currently doing a conversion into a single document immediately change their procedures to at least create the individual documents in their image or microfilm system, and concentrate on whatever tools are necessary to improve the efficiency of the indexing process.

©2001 by Charles A. Plesums, Austin, Texas USA. ALL RIGHTS RESERVED. You may license additional copies of this document through a nominal royalty payment as specified on