Conversion of Microfilm Office Documents

to electronic digital images

©2004, 2007 by Charles A. Plesums, Austin, Texas, USA


Electronic imaging is currently the low-cost way of storing business documents, but this has not always been the case. For many years microfilm was the storage medium of choice, so many companies have a large amount of microfilm (roll and fiche) in their records storage. Anyone considering the use or conversion of microfilm needs to be familiar with the film technology. If you need a review of the types of microfilm and its use for business documents, see the film tutorial available on this web site at www.plesums.com/image/filmintro.html

Every company with older documents on microfilm and newer documents on a digital document image system faces some choices:

Many companies did not follow good practices or standards when making the original film years ago, so quality is often bad. Users often hate using film, because retrieval is time consuming and the poor quality makes the documents difficult to read. Finding a particular document on microfiche may require looking through 50 or more pages. Retrieval of digital images has always been faster and more economical, so as the storage costs for electronic images have fallen, many companies are moving to digital image technology.

Conversion costs in this paper are based on multiple studies in the 2003-2004 timeframe, with vendors throughout the country. There have not been dramatic changes in technology that would lead to drastically different prices today. These are not exceptionally high or low prices, but estimates of the competitive prices available from national vendors. Specific local vendor prices were often much higher.

Roll Film Conversion

Most companies find that it costs several dollars to find and mount a roll of microfilm, locate a document on that roll, rewind and re-file the roll, but only a few cents to capture each image desired from that roll. Further, since roll film almost always has an index (or we wouldn't be able to find a particular document), the primary cost of converting roll film is mounting and scanning. Therefore it is often cheaper to convert all the images on a roll of film, while it is mounted, rather than mounting the roll hundreds of times to convert one document at a time.

Further, since all the images on the roll were taken and processed together, the images are relatively consistent throughout the roll (whether good or bad). Since any adjustments would apply to thousands of images, during a conversion it pays to optimize the scanning for each roll.

If we are going to do a bulk conversion of all the images on a roll, then we could also use a service company to do all the rolls, since these companies have high-speed machines and staff specialized in optimizing image quality and speed. There are exceptions, of course.

Steps in a High Volume Roll Film Conversion

  1. The film index is often called a CAR (Computer Assisted Retrieval) system, or with Kodak, a KAR system, and was typically a CICS application on a mainframe. In many companies the traditional mainframe index has been moved to a LAN or a personal computer. To start, extract the indexing information from the CAR system, one record per document, with the roll and starting frame number for the document, plus all indexing information such as company code, contract/policy number, and document type. This will be referred to as the CAR data.
  2. "Sort" the CAR data by roll and frame, and analyze the index to determine the number of pages in each document. (CAR systems sometimes don't store the number of pages in a document, but the count can be calculated from the difference in frame numbers between a document and the one that follows it on the roll.) If a database is used, the index need not be physically sorted, just processed in roll and frame order.
  3. Put a code in the CAR data for documents that do not need to be converted, such as old documents or a block of business that has been sold. (Generally we will want to add a "delete" code so that the document can be skipped at a later step in the process, rather than actually deleting the records from the extract.)
  4. Extract a list of active contracts from the primary administrative computer system. Include any that are inactive, but have been active within the period where retention is required for reinstatement, tax, legal, or regulatory compliance purposes. Since there may be extraneous documents in the CAR data or on the film, including documents for contracts that are no longer even known to the primary administrative computer systems, the match between the administrative system and the CAR data is often done in terms of a "retain" flag rather than a "delete" flag.
  5. Reorder the CAR data if required to create an index of all the documents on each roll, plus the document retain or delete indicators. By creating a separate index for each roll of film, the conversion may be scheduled in any sequence, typically most recent documents (rolls) or the most active rolls first.
  6. The scanning operation or service bureau will adjust the scanner for the film quality of this particular roll, and scan the entire roll to a local disk drive. The scanned image may be electronically enhanced (sometimes for a small extra fee) perhaps improving contrast, reducing the skew (rotation), reducing the space around the edge of the document, removing the line at the edge of the page, etc. A quick scanning operation may assume a "safe" 14 x 12 inch image, creating a large image file and an awkward image to work with (especially if it must later be printed). A careful scanning operation, working with good film, may scan closer to the typical 8.5 x 11 inch image, or electronically trim the larger image to the actual document size.
  7. Locating each image on the roll is not a perfect science. The blip at the edge of the film may be missed. A small page or one with little information may be hard to find. Sometimes a "blank area" of film will appear the same as the image of a "white" sheet of paper - thus being interpreted as an extra page. Scanning operations may check that the expected number of pages from a roll match the number actually found, if the count is known (there may also have been errors in the original indexing of the roll). Finding missing pages, and discarding header labels, identifying frames, or other marks that were treated as images, can be done by the scanning operation, or at a later step in the process, but must be considered.
  8. The next couple of steps involve the actual data on the images. Some service companies will perform these steps (for an additional fee), while others just return all the images, and leave these steps to the company that owns the data. Since they are critical steps as well as a good way of providing quality control of the scanning operation, there is a compelling argument for keeping these steps in-house even though they are part of the conversion.

  9. Until recently the programs to match the index to the scanned documents were not widely available - some service bureaus provided the function, but most organizations had to build custom systems. Iimage Retrieval Inc. in Dallas has published their "Iiridium" program that will take most index files, match against a set of scanned images, and "ripple" the index along as pages are added and deleted to synchronize the set of images with the index.

    The index needs to be matched against the images captured. This can be a spot-check of every 10th or 50th document (or a more thorough check, if necessary, perhaps due to poor film quality). Just as the reader-printer could lose count when moving down the roll of film, the bulk scanner can also miss an image, or an error could have occurred when the film was originally indexed. Since some business knowledge and judgment is required to resolve the errors, many service companies don't like to do this. This step is necessary to create documents from the individual pages (similar to stapling the pages of a document together after making a paper copy of a file). Expect to pay a service company about an extra $0.005 (half a cent) per frame for doing this match and, if desired, combining the several image pages from each document into a multipage image file.
  10. Some cameras can film the front and back of a document as a single frame on the film (duplex). If a roll has duplex images, the double size image has to be split into two images. Optionally, if the "back side" is blank it can be eliminated. (The split and blank page deletion has to be done late in the process, to avoid changing the page count that is used to match images to the index.) For estimating the cost, the duplex page will probably only count as one image for scanning costs, but there is likely to be an extra charge of about a half cent per frame ($10-15 per roll) to split the image and eliminate blank pages.
  11. The converted images are often written to a CD or digital backup tape. (Expect an extra charge of $5 to $15 for each CD written. One CD holds over 10,000 pages or about 5 rolls of film.) For recent rolls of film, 100% of the documents may be required, but only a small percentage of the documents on a very old roll may be needed. (If only a few percent of the documents are required, it may be cheaper to manually convert the documents on a digital reader printer, rather than follow this bulk conversion process.) If the vendor has matched the index to the images (in the previous step), they may also be willing to delete the documents that are marked for deletion in the index file, thus saving delivery cost. Some of these vendors charge for every page processed, whether it is output or not, while others charge a higher price per page, but just for the pages delivered. With high volumes (thousands of rolls of film) expect to pay $0.015 per frame/page scanned, plus extras that will bring the cost to about 2-4 cents per frame. For smaller volumes, expect to pay 3-6 cents per frame.
  12. The image file names may be generated based on the roll and page number, but if the index is matched to the film, other options are available. The image file may be given a name assigned in the index file, or the index file may be updated with the image file name assigned during the conversion. None of these techniques are particularly good or bad, but they need to be established in advance.
  13. When all the steps are completed, including matching the images to the index and discarding obsolete documents, we must still import the images and indexing information into the digital image system. This includes the image itself, indexing information for the image, any associated work items, links to folders, history records, and whatever other types of database records are required.
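
The index preparation in steps 2 through 5 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names (roll, start_frame, contract) and the function names are hypothetical, not the layout of any real CAR system, and the last frame number on each roll is assumed to be known (for example from the scan), since the final document on a roll has no following document to subtract from.

```python
from itertools import groupby
from operator import itemgetter


def add_page_counts(car_records, last_frame_by_roll):
    """Return CAR records ordered by roll and frame, each with a 'pages' count.

    The page count is the difference between a document's starting frame
    and the starting frame of the next document on the same roll.
    """
    ordered = sorted(car_records, key=itemgetter("roll", "start_frame"))
    result = []
    for roll, docs in groupby(ordered, key=itemgetter("roll")):
        docs = list(docs)
        for doc, following in zip(docs, docs[1:] + [None]):
            if following is not None:
                next_start = following["start_frame"]
            else:
                # Last document on the roll: it runs to the final frame,
                # which is assumed to be known from the scan or a roll log.
                next_start = last_frame_by_roll[roll] + 1
            result.append({**doc, "pages": next_start - doc["start_frame"]})
    return result


def flag_retained(car_records, active_contracts):
    """Set retain=True only for documents whose contract is on the active list."""
    return [{**doc, "retain": doc["contract"] in active_contracts}
            for doc in car_records]


def index_by_roll(car_records):
    """Split the flagged records into one index per roll (step 5)."""
    per_roll = {}
    for doc in car_records:
        per_roll.setdefault(doc["roll"], []).append(doc)
    return per_roll
```

Using a "retain" flag rather than a "delete" flag means that a document for a contract unknown to the administrative system is skipped by default, which matches the reasoning in step 4.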

It is unlikely that thousands of rolls of film will be sent to a service bureau in a single shipment. Therefore a logging system needs to be established, to track each roll of film, and to be sure all the output makes it through the process once and only once.
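
A minimal sketch of such a logging system follows. The stage names and the RollLog class are assumptions made for illustration; a real log would be kept in a database and would carry dates, shipment numbers, and operator names. The point it shows is the "once and only once" rule: a roll can only move forward one stage at a time, so a repeated or skipped step is rejected.

```python
from enum import Enum


class Stage(Enum):
    # Hypothetical stages; adjust to the actual conversion workflow.
    SHIPPED = 1
    SCANNED = 2
    MATCHED = 3
    IMPORTED = 4


class RollLog:
    """Track each roll so it passes through every stage exactly once."""

    def __init__(self):
        self._stage = {}

    def advance(self, roll_id, stage):
        current = self._stage.get(roll_id)
        expected = 1 if current is None else current.value + 1
        if stage.value != expected:
            # Either a repeat (duplicate import) or a skipped step.
            raise ValueError(f"roll {roll_id}: cannot move to {stage.name} now")
        self._stage[roll_id] = stage

    def outstanding(self):
        """Rolls that have entered the process but are not yet imported."""
        return sorted(roll for roll, stage in self._stage.items()
                      if stage is not Stage.IMPORTED)
```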

Some service bureaus have all new equipment, and are more expensive. Others have a variety of new and old equipment. Sample runs are performed on the newest equipment, but high volume work (or rush work) may be performed on the older equipment with less experienced operators. Therefore it is important to monitor the quality of the work delivered - not just assume that all work will be as good as the sample.

A large service bureau may be able to scan 200 rolls of film per day or more, representing 500,000 image pages per day (40-50 CDs per day). Realistically, few operations can perform a quality check on the service bureau at that rate, nor can most image systems store images that quickly without special facilities or additional capacity. How fast can this type of conversion proceed? There is no absolute answer, but for planning, assume 2 CDs or about 25 rolls of film per day can be handled routinely without operational impact. One modern scanner (if the work is done locally) can theoretically handle 3-4 rolls per hour, but in practice only count on about 2 rolls per hour, so one on-site roll film scanner may be able to handle as much film as the people and image system can handle.
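
The planning arithmetic can be made explicit with a short calculation. The rates come from the figures above; the library size of 5,000 rolls and the 8-hour shift are assumptions chosen only to make the example concrete.

```python
import math

# 500,000 pages from 200 rolls implies about 2,500 pages per roll.
PAGES_PER_ROLL = 500_000 // 200


def working_days(rolls, rolls_per_day):
    """Working days needed to convert a library at a steady daily rate."""
    return math.ceil(rolls / rolls_per_day)


library = 5_000  # hypothetical number of rolls to convert

days_at_bureau_speed = working_days(library, 200)     # bureau scanning flat out
days_at_planning_rate = working_days(library, 25)     # what can be absorbed
days_with_one_scanner = working_days(library, 2 * 8)  # 2 rolls/hour, 8-hour shift
```

On these assumptions the service bureau could finish the scanning in 25 working days, but the organization receiving the images would need about 200 working days to check and load them, so the receiving side, not the scanner, sets the schedule.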

Steps in a Low Volume Roll Film Conversion

If there is a very small volume of roll film, or if only a few documents are required from a roll, or if a roll has been damaged (without backup) and requires special handling, it may pay to do the conversion manually.

Many of today's reader printers use digital technology internally, at least in the printing process. Many reader-printers have a low cost feature to allow that digital image to be saved to an attached PC (even multiple times as the image is viewed and the quality is adjusted). A few reader-printers have a standard digital interface, and still others fit into a complex (expensive) system for collecting digital image output.

A document is located on the reader printer, just as it would be for viewing and/or printing. In a process similar to printing, each page is captured as a digital image. The image can be cropped and the quality checked on the computer screen, just as it would have been checked on the reader printer. As each page is complete, the unit manually or automatically moves to the next frame. The rated speed to print pages may be as high as 10 to 20 pages per minute, but in actual use (since each page is manually handled) don't expect to average more than about 3 pages per minute.

As the several pages of a document are completed, that document must be added to the digital image system. This normally requires a custom program, to get the image into the format required by the image system, to store the image, and to index the document.

As individual documents are converted, it is wise to update the CAR system to note that the documents are done, so that multiple copies will not be converted and added to the image system.

A reader printer process, similar to this, may be a component of the quality checking of a bulk conversion, to determine what is actually on the film and to immediately scan missing pages or to rescan unreadable pages and add them to the document.

Microfiche Conversion

Most microfiche consolidate all the documents for a customer or contract in a single "jacket". Large files may consist of several jackets kept together, just as a large paper file might be in several adjacent file folders. Indexing is based on the right documents being placed in the jacket, and finding that jacket.

When a document on microfiche is requested in most companies today, the fiche is pulled from the film cabinets, and is copied or delivered to the users. A copy of the entire fiche can be made in seconds at a cost well under a dollar (probably less than 25 cents). Each user works with the copy using a microfiche viewer at their workstation. There is probably also a film printer in the area, in case a hard copy is required.

Microfiche are indexed for each contract/policy/customer, but do not have any identification of individual documents (other than, perhaps, the first document is always the original application, or other conventions). If the entire "file" must be reviewed, the fiche is easy, but if a particular document is required, the users may have to look through all 50 or so pages to find the one needed.

When doing a conversion, it is tempting to treat the entire fiche as a single 50-page document, but that approach will put a heavy load on the image system and network, as well as the user, whenever the documents are used in the future. It can become a productivity disaster for years to come. Therefore special effort is required to identify individual documents. (See the white paper on conversion of paper files at www.plesums.com/image/PaperConv.html for a discussion of indexing documents in a conversion. That article concludes that each document, or small group of documents, should be stored separately, even if it is not completely indexed initially.)

If any part of the fiche is required, the entire fiche should be converted. It would be very hard to identify which documents had been converted and which remain only on a fiche, since there is no formal index of the individual documents in the fiche.

When converting microfiche, there is a significant labor cost to identify the documents, and perhaps to adjust the quality of the scan (since each document in the fiche was captured at a different time). Typical service bureau costs for converting jacket microfiche are 7-10 cents or more per frame for scanning, plus another 3-5 cents per frame if manual cropping is required. Thus microfiche is far more expensive to convert, both in terms of scanning and indexing cost, than roll film. Therefore it is generally better to only convert the fiche as needed, rather than trying to convert all of the fiche. (There is no economy of scale with fiche, as there was by doing an entire roll of microfilm at a time.)
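
A rough comparison of the price ranges quoted in this paper can be worked out as follows. The function name is hypothetical, costs are kept in whole cents to avoid rounding surprises, and the 2,500 frames per roll is the figure implied by the service bureau volumes quoted earlier (500,000 pages from 200 rolls). The fiche figures are lower bounds, since the scanning price is "7-10 cents or more".

```python
def cost_range_cents(frames, low_cents_per_frame, high_cents_per_frame):
    """Return the (low, high) cost in cents for a number of frames."""
    return frames * low_cents_per_frame, frames * high_cents_per_frame


# One 50-frame jacket fiche, scanning only (7-10 cents per frame).
fiche_scan = cost_range_cents(50, 7, 10)

# The same fiche with manual cropping added (another 3-5 cents per frame).
fiche_cropped = cost_range_cents(50, 7 + 3, 10 + 5)

# One roll of about 2,500 frames at the high-volume rate of 2-4 cents.
roll = cost_range_cents(2_500, 2, 4)
```

Per frame, cropped fiche comes to 10-15 cents against 2-4 cents for high-volume roll film, before counting the labor to identify the individual documents on the fiche.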

In an on-demand conversion, the turn-around time required is normally short (the user is waiting), so it is not as convenient to use an off-site service company. The best solution generally involves on-site microfiche scanner(s), and custom programs to help identify and index the individual documents. Assuming that there are thousands of fiche to be converted, the scanner should be a production grade unit (50-200 pages per minute), not a reader printer with an image capture adapter (3-6 pages per minute).

One of the big challenges with microfiche conversion is how the quality (darkness) can vary drastically from page to page, since they were filmed at different times, with different cameras and processing. Some of the newest scanners save the gray scale image, at least temporarily, with great detail. These gray images are far larger than practical for permanent storage, but they allow the virtual rescan of a "bad" page, in a quality checking or indexing step, without returning to the scanner. Vendors are only starting to build systems that take advantage of this technology.
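
The idea of a "virtual rescan" can be illustrated with a simple threshold applied to a saved gray scale image. This is a sketch only: the image is represented as rows of gray values from 0 (black) to 255 (white), the function name is hypothetical, and real scanner software uses more sophisticated (often adaptive) methods.

```python
def threshold(gray_rows, cutoff):
    """Convert a gray image to black and white.

    Pixels darker than the cutoff become black (1); all others become
    white (0). The 1 = black convention is an assumption of this sketch.
    """
    return [[1 if pixel < cutoff else 0 for pixel in row] for row in gray_rows]


# A faint page: the "ink" is only slightly darker than the background.
faint_page = [
    [230, 180, 230],
    [180, 180, 230],
]

first_try = threshold(faint_page, 128)  # setting used for the rest of the fiche
second_try = threshold(faint_page, 200)  # adjusted later, without rescanning
```

With the original setting the faint page comes out completely white, while the adjusted cutoff recovers the text. Because the gray values were saved, the second attempt needs only the stored image, not the fiche or the scanner.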

Potential Microfiche Equipment vendors

Early Sunrise brand scanners were the standard of quality for many years, but were complex to set up and operate. A replacement line of Sunrise 2000 models was remarkably unreliable; between the quality and company problems, many distributors dropped the Sunrise line. Presumably the problems have been fixed with new management and new products, but they have not completely rebuilt their reputation or distribution and support network, and their software quality is suspect.

Mekel is another popular brand of microfiche scanner. The older models had very poor image quality but the newer model M565 is greatly improved. They have an automatic feeder, so theoretically no operator intervention is required between fiche. However, in practice, manual intervention is needed, so the automatic feed is disabled. Contrast adjustment and recognition of each frame is faster/easier than the early Sunrise, but the actual scan is slower, so the net result is comparable performance. Mekel was bought by another company, who said they did not change the system, but one installation I was involved in before the purchase worked well, and a different installation after the purchase, using similar procedures and interface software, was a major headache.

Wicks and Wilson are the market leaders in aperture cards, good on roll film, and reportedly slow and steady on microfiche scanners, but I have not worked with their equipment, nor worked with anyone who has direct experience.

The founders of nextScan recognized this dismal selection of units a few years ago, and formed a new company to enter the film conversion marketplace. Their roll film scanner is impressive, and after numerous delays, their microfiche scanner looks good. The company appears to have good technology but is getting a bad reputation for support and customer service.

I am starting to get rumors that a new product may be coming in the microfiche conversion marketplace... stay tuned.

Steps in a Microfiche Conversion

  1. Select the fiche to be converted. (Many organizations have a system for users to request documents, which may be adapted to trigger the conversion.) Sometimes this request list is printed (so the person pulling the fiche has a list to carry), including a bar code with the policy or contract number. (Minor changes may be required to the existing request system.)
  2. Name the batch, often with the policy or contract number. In some cases this is taken from a bar code printed on the "pull list." This can be used as the name of the directory that will hold the images from that fiche or set of fiche. Image file naming conventions can be established to keep the fiche number (of the few fiche in a batch), as well as the row and column information that may be useful during indexing.
  3. Insert the fiche and perform the prescan. The fiche will appear on the display, with a colored line around each image that was identified by the scanner. Adjust the light and contrast to optimize recognition of the images and to achieve the best quality scan.
  4. Adjust the discovered images as required. A smudge may be picked up as an image, and can be discarded. If two pieces of film are touching, they may be treated as a single very wide page; the operator can tell the scanner to split the pages apart. If an image was missed by the prescan, the operator can mark the area where the scanner will find it.
  5. Perform the final scan. The images will be stored in a directory on the PC associated with the scanner. In some cases a separate indexing file can be created with a line of text for each image/frame (especially useful if the scan must be restarted). It is likely that a few frames on each fiche will be so light or dark that they cannot be satisfactorily scanned with the settings used for the rest of the images on the fiche. With the Sunrise scanner, it is easy to "add an extra fiche" to the batch with a rescan of the "bad" pages (which will need to be put in the proper place in a later step). With the Mekel scanner, a subsequent scan (perhaps with a different setting) of specific frames will replace the corresponding image file for that frame.
  6. When the batch, consisting of one or a few related fiche, is complete, the image files can be transferred to a separate review and indexing PC, network connected to the scan station. Indexing information such as the contract number and company code, and perhaps other information associated with the entire fiche is entered if it wasn't already used to identify the directory/batch. Replace "bad" pages with rescanned pages, if necessary. Perform any further quality checking required.
  7. (Optional) Perform image enhancement if desired (beyond any already performed during scanning). Software is available to automatically remove edge lines, deskew the image, eliminate noise (speckles) in the image, etc. If quality is very important, the operator can even manually crop the image to the original size. In the standard system the threshold will have already been set, and the image will be pure black and white. Some scanners capture and save the image as gray tones, rather than a binary black and white image. The review and indexing workstation operator can then adjust the threshold (sometimes called the brightness or contrast) and convert the image to pure black and white at the time of review, thus enhancing the quality without rescanning the image.
  8. If some of the pages still need to be repaired (rescanned), this might be done on a low cost reader-printer rather than rescanning the fiche on the high speed scanner. The viewing/editing/indexing program described below might be run on the computer attached to the reader-printer, so that any repaired images could immediately be added to (or replace) the previously scanned pages.
  9. The images are now ready to be stored. An approach that has worked well is to display the images in a viewer/editor that allows pages to be grouped into individual documents. Thumbnail images of each page can be "drag and dropped" into documents. A page or document can be selected for "full size" viewing, if desired. As each set of pages is reviewed and identified, it can be saved, entering (or selecting from a list) the type of document or other information necessary to complete the indexing and storage process. The other indexing information can also be changed, if necessary, if the document was placed in the wrong jacket (misfiled).
  10. (Optional) Delete obsolete or duplicate documents, or documents that are no longer required (e.g. an address change from several years ago). This basic file purge might be very fast, and in most customer-oriented files can typically eliminate 30-50% of the pages.
  11. When a fiche is complete, the document request system or other notification may need to be triggered so that the user waiting for the fiche knows the documents are available. This may require a custom interface to existing systems.
  12. The completed fiche should be stored separately, so that it is not scanned again. After a brief period, such as 60 days, it should be destroyed. Legally it is not wise to keep two separate sets of files, since both would be subject to legal discovery and thus would both have to be fully stored, supported, and maintained.
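
The file naming convention mentioned in step 2 might look like the sketch below. The exact pattern (batch directory, then fiche number, row, and column) and the function names are assumptions for illustration; scanner software usually has its own naming options.

```python
import re

# Hypothetical pattern: F<fiche>_R<row>_C<column>.tif inside a batch directory.
_NAME = re.compile(r"F(\d{2})_R(\d{2})_C(\d{2})\.tif")


def frame_file_name(batch, fiche_no, row, col):
    """Build an image file name that records where the frame came from."""
    return f"{batch}/F{fiche_no:02d}_R{row:02d}_C{col:02d}.tif"


def parse_frame_file_name(name):
    """Recover (batch, fiche number, row, column) from a file name."""
    batch, _, leaf = name.rpartition("/")
    match = _NAME.fullmatch(leaf)
    if match is None:
        raise ValueError(f"unexpected image file name: {name}")
    fiche_no, row, col = (int(group) for group in match.groups())
    return batch, fiche_no, row, col
```

Because the numbers are zero-padded, an ordinary alphabetical sort of the file names returns the frames in fiche, row, and column order. A rescanned "bad" page written with the same name replaces the earlier file, so it lands in the proper place without manual reordering.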

Conversion economics

Microfiche conversions are fun. Once the equipment is installed and the users are trained, fiche can be converted as needed by the users. If 100 files (fiche) are required per day, all 100 users will have the images of the fiche within minutes or hours. In fact, many scanning units can handle around 200 fiche per day with skilled operators, so half the capacity of the equipment might be used to immediately satisfy user needs, and the other half of the capacity might be used to "convert the rest" of the fiche - to gradually eliminate the separate film records operation.

Roll microfilm conversions are less fun. If high speed equipment is installed and the operators trained, 20-50 rolls can typically be converted per day. But if 100 files are required each day, and each file has an average of 10 documents, as many as 1000 rolls would have to be converted to meet the requirement. Only a few users, not all 100 users, will be satisfied when the conversion is begun. And most of the requests still will need to be handled following the "old" procedures that we were trying to eliminate.
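
The arithmetic behind this can be sketched as follows. It assumes the worst case described above, in which every document in a file sits on a different roll, and the function names are hypothetical.

```python
def worst_case_rolls(files_per_day, documents_per_file):
    """Rolls needed if every requested document is on a different roll."""
    return files_per_day * documents_per_file


def files_fully_served(rolls_scanned_per_day, documents_per_file):
    """Complete files that a day's scanning can satisfy in the worst case."""
    return rolls_scanned_per_day // documents_per_file


rolls_needed = worst_case_rolls(100, 10)     # 100 files of 10 documents each
served_slow_day = files_fully_served(20, 10)  # scanner converts 20 rolls
served_fast_day = files_fully_served(50, 10)  # scanner converts 50 rolls
```

Against a demand of 1,000 rolls, a scanner converting 20-50 rolls per day fully satisfies only 2 to 5 of the 100 requests at the start of the conversion.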

After the first few months, thousands of rolls will have been converted. Hopefully the most active rolls will be converted first, so many of the 1000 documents that might be required to satisfy 100 requests per day will already be on the image system. A few files may already be completely converted, so no film request is needed at all. Some requests will only need a few documents, which might be satisfied by moving the required rolls to the "head of the list." The "old" conversion procedures will probably still be required, but the number of documents that have to be sent through the old (expensive) process will be dropping.

Eventually the roll film scanner will be able to handle all the requests received each day, and the old conversion process can be discontinued. How long before this happens depends on the size of the roll-film library, and the pattern of requests. The conversion should continue "full speed" as time allows, until all the film is converted.

The use of a service bureau will probably cost more than an in-house roll-film scanning operation, but will allow the conversion to proceed more rapidly. Thus the expensive old procedures can be eliminated far sooner than with an in-house operation, perhaps making the total cost of the conversion less than with in-house scanning. This is certainly an option that should be considered.

Send e-mail comments to Charlie@Plesums.com

©2004, 2007 by Charles A. Plesums, Austin, Texas USA. ALL RIGHTS RESERVED. You may license additional copies of this document through a nominal royalty payment as specified on www.plesums.com.