When establishing a document imaging system, one critical part of the operation is the scanning and indexing. Often this is best done in the mail room - they are the people who have understood, identified, and delivered the mail for the last 100 years! Another great candidate is the records area, which has been filing, identifying, and locating documents for as long as there has been paper or microfilm.
One of the major questions is whether it is better to have a few very high speed scanners or several slower scanners. The answer to that question is related to the variation in the documents, and the amount of preparation required. If every letter is a different shape and size, it can be economical to do minimal preparation prior to scanning, then manually feed each (odd) sheet at the scanner. If there is consistency in most of the pages, then a high speed scanner is more economical, with preparation completed before bringing the work to the scanner.
The performance of some of the lower cost scanners has reached the level of the far more expensive production scanners. So why pay the big money for the production scanner? The production scanners are designed for high volume operation - a large duty cycle with lots of pages per day, and lots of total use. As one person put it, metal gears vs. plastic gears. Some of the low cost scanners are great, but if you need a heavy duty unit, it doesn't pay to be cheap.
Many of the newer scanners can capture color images, or can send the gray image (that is the start of every pure black and white image) to the connected workstation. Color or gray scale may not be practical for all documents yet (see the separate paper on Gray Documents), but the investment may be worthwhile. For example, one production scanner only costs about 15% more if full color is delivered with the new scanner, but a later field upgrade is twice as expensive. Software is starting to emerge that will allow a "rescan" from the gray scale image, without physically rescanning the document.
With a slow speed scanner, preparation can be as simple as extracting the mail from the envelope, and confirming it is business work - not an ad for ink jet cartridges or a training course. But with high speed scanners, it is common to
The goal of the preparation is to allow the scanner operator to keep the scanner running continuously, while capturing high quality, properly oriented images that are ready to use. The people who do the preparation work can also substitute as scanner operators, and vice versa (this builds understanding of the issues faced in both roles), but most people enjoy building their proficiency, so they should be allowed to specialize in a primary position.
How many people are required for document preparation? If you are just scanning postcards, almost none. If each envelope contains a complex combination of documents, the process is far slower. For a starting estimate assume that each prep clerk can average (over an entire day) between 300 and 900 pages per hour - 5 to 15 pages per minute.
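The starting estimate above can be turned into a rough staffing range. The following is a minimal sketch, not a planning tool: the function name, the 8 hour shift, and the 20,000 pages per day volume are assumptions chosen for illustration, and only the 300 to 900 pages per hour range comes from the text.

```python
import math


def prep_clerks_needed(pages_per_day, pages_per_hour, hours_per_shift=8.0):
    """Return the number of prep clerks needed, rounded up to whole people.

    pages_per_hour is the sustained all-day average for one clerk.
    hours_per_shift is an assumed working day, not a figure from the article.
    """
    if pages_per_hour <= 0 or hours_per_shift <= 0:
        raise ValueError("rates and shift length must be positive")
    return math.ceil(pages_per_day / (pages_per_hour * hours_per_shift))


# Hypothetical daily volume, used only to show the spread of the estimate.
daily_pages = 20_000
fewest = prep_clerks_needed(daily_pages, 900)  # simple, consistent mail
most = prep_clerks_needed(daily_pages, 300)    # complex envelope contents
print(f"Plan for between {fewest} and {most} prep clerks")
```

The threefold spread between the two results is the point: the complexity of the incoming mail matters more than the raw page count, so measure a sample of your own mail before committing to a number.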
The goal of document indexing is threefold:
There are several different situations that must be considered in planning the indexing function:
High priority work that will be processed immediately. This may only be identified by the type of work, to get it to the right processing unit. For example, cash received for investment must be processed within hours, so most organizations move it to the processing area as quickly as possible, and don't try to identify it before it is processed. Within hours, it will be both identified and processed.
Work that may take longer to process. Enough identification (such as customer number) needs to be added before it goes into the work queue, so that it can be located if the customer calls. Some people argue that there needs to be at least two indexes so that a document can still be located if one is entered incorrectly - an automatically entered "date received" often fills that need.
Turn-around documents returned. Bar codes or Optical Character Recognition - OCR - can be used to provide virtually all the indexing. These documents are usually isolated as they are extracted from the envelopes.
Turn-around documents received. These are the documents that have to be examined or completed, and then sent out. Generally a "file copy" is scanned, even if the form must be completed, to confirm that it was received, and protect from loss. If no changes are made, this may be the official record (for example death certificates are typically examined then returned by insurance companies). If the form must be completed, it may be scanned again as part of the outbound operation.
Work that requires substantial data entry. Applications, orders, or even address changes may require substantial data entry that would also provide the indexing information. Some organizations have the indexing staff provide the initial data entry, or establish a special team for that purpose. If the data is entered almost immediately, separate indexing is not required. If there can be a backlog that delays the processing, enough "indexing data" must be entered so that the documents can be found.
Money that must be isolated and secured as soon as possible. Scanning is preferable to copying, but some procedures call for identifying the funds before scanning (the users can work from the image, but treasury must work from the check). This could be by looking up the contract number and writing it on the check for additional investments, or if there is no contract yet, placing a matching bar code on the application and check.
Premium customers. Many organizations try to provide a faster service to their best customers (or the customers of their best sales people). That work may be given a higher priority, or may be handled by a more experienced team. If a code is on the documents as they arrive (such as a special Post Office box) the system will collapse as everyone learns the special coding. The alternative is to identify the customer during indexing, so the premium customer coding can be part of the workflow. This can also be used to isolate employees who may also be customers, well known (VIP) customers, etc.
Why burden the scan center with processing beyond just delivering the mail? By reading the entire letter, all the business issues are identified - perhaps the most critical issue is in the middle or end of the letter. Some companies consider an address change very high priority processing - we may be mailing something to the customer today. (If there are business consequences of the address change, they are processed later by the experts, who are relieved of the data entry effort.) And the good folks in the mail room are often less expensive than the staff with special business skills.
How many people are required for indexing? If the process is simply to read a number from the image and key it, operators can sustain a rate of three documents per minute over the course of a day - 20 seconds per document. However, if they must sometimes look up the account number that is not on the letter, or check a number that is not valid, the rate can quickly drop to an average of one document per minute. Some companies have the mail room read all letters to be sure all the business issues are identified, look up every account number to be sure the customer did not make an error, and make routine changes such as address changes. In one large operation this led to a standard of 25 letters per hour over all the working hours in the day - about 2.4 minutes per letter.
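To compare the three indexing scenarios side by side, the rates can be put in a small table and converted to staff counts. This is only a sketch: the scenario labels, the function names, the 8 hour shift, and the 10,000 documents per day volume are assumptions for illustration, while the three rates (180, 60, and 25 documents per hour) are the ones quoted above.

```python
import math

# Sustained all-day rates from the text, in documents per operator hour.
# The dictionary keys are made-up labels for the three scenarios.
INDEXING_RATES_PER_HOUR = {
    "key a number from the image": 180,      # three documents per minute
    "occasional lookup or validation": 60,   # one document per minute
    "read all, look up all, routine changes": 25,
}


def seconds_per_document(docs_per_hour):
    """Convert an hourly rate into average handling time per document."""
    return 3600 / docs_per_hour


def indexers_needed(docs_per_day, docs_per_hour, hours_per_shift=8.0):
    """Whole number of indexers for one shift (the shift length is assumed)."""
    return math.ceil(docs_per_day / (docs_per_hour * hours_per_shift))


daily_documents = 10_000  # hypothetical volume
for scenario, rate in INDEXING_RATES_PER_HOUR.items():
    print(
        f"{scenario}: {seconds_per_document(rate):.0f} s per document, "
        f"{indexers_needed(daily_documents, rate)} indexers"
    )
```

The same daily volume needs roughly seven times as many people in the third scenario as in the first, which is why the decision about how much work to give the mail room should be made before the staffing estimate, not after.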
One definition of the skill required for indexing is to make the position comparable to a receptionist. This person must understand the general terminology of the industry, and the organization of the company. They must be able to look up customer and organization data, perhaps using computer systems. They direct people (or mail) to the correct place for processing. And although they may not be perfect, they must do it right the overwhelming portion of the time.
Be sure to distinguish between a document (such as the three pages of a letter), and a page (which represents the two pieces of paper that have to go through the scanner for that three page letter), and an image (we could capture four images - front and back of the two pages for that three page letter). Hopefully the process will allow the blank fourth image to be discarded. And when looking for the speed of the scanner, count the number of documents plus the number of pages, because most systems will require an extra "patch page" or other divider sheet to separate documents in a stack - and that patch page has to be scanned also.
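The three counts are easy to confuse, so here is a small sketch that keeps them apart for the three page letter in the example. The function names and the default of one separator sheet per document are assumptions for illustration; "pages" is used in the sense above, meaning pieces of paper.

```python
def sheets_fed(documents, pages, separators_per_document=1):
    """Sheets that physically go through the scanner.

    Counts the content pages plus an assumed number of patch pages
    (or other divider sheets) per document.
    """
    return pages + documents * separators_per_document


def images_captured(pages, duplex=True):
    """Images of the content pages only (separator sheets are not counted)."""
    return pages * (2 if duplex else 1)


def images_kept(pages, blank_sides=0, duplex=True):
    """Images remaining after blank sides are discarded."""
    captured = images_captured(pages, duplex)
    if blank_sides > captured:
        raise ValueError("cannot discard more sides than were captured")
    return captured - blank_sides


# The three page letter from the text: 1 document on 2 pieces of paper,
# printed on both sides, with the fourth side blank.
print("sheets through the scanner:", sheets_fed(documents=1, pages=2))
print("images captured:", images_captured(pages=2))
print("images kept:", images_kept(pages=2, blank_sides=1))
```

When comparing scanner speeds or vendor quotes, check which of these three numbers is being counted; a rate quoted in images per minute is twice the rate in pages per minute for the same duplex scanner.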
The actual speed of a scanner is an interesting number. However, it bears little relation to the number of pages that can be scanned in an 8 hour work shift. For example, one of the better scanners on the market will feed 160 pages per minute - theoretically over 75,000 pages or 150,000 images per day. In practice that scanner can be used for about 10,000 documents per day - perhaps 25,000 production pages per day, or 35,000 pages counting bar codes or patch pages. Thus when planning scanning capacity, recognize that outstanding performance by skilled operators often produces less than 50% of the theoretical speed. And for planning, count on at least 6 weeks on the job for a new operator to get up to that "full" speed.
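The arithmetic behind the example is worth checking against your own scanner's rating. A minimal sketch follows, using the 160 pages per minute, 25,000 and 35,000 page figures from the text; the function names and the assumption of a full 8 hour shift with no breaks are illustrative only.

```python
def theoretical_pages_per_shift(pages_per_minute, shift_hours=8.0):
    """Pages fed if the scanner ran at rated speed for the whole shift."""
    return pages_per_minute * 60 * shift_hours


def utilization(actual_pages_per_shift, pages_per_minute, shift_hours=8.0):
    """Fraction of the theoretical shift capacity actually achieved."""
    return actual_pages_per_shift / theoretical_pages_per_shift(
        pages_per_minute, shift_hours
    )


rated = theoretical_pages_per_shift(160)
print(f"Theoretical capacity: {rated:,.0f} pages per shift")
print(f"Production pages only: {utilization(25_000, 160):.0%} of theoretical")
print(f"Including patch pages: {utilization(35_000, 160):.0%} of theoretical")
```

For capacity planning, multiply the rated speed by a utilization you have actually observed rather than by 100%, and use a lower figure for operators still in their first weeks on the job.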
If you have detailed records of your mail volume, great. If not, most facilities plan on twice the average mail volume on the peak mail day (typically Monday) - i.e. Monday will normally have at least 40% of the average weekly volume. Other facilities estimate that the busy day will bring 50% of the total volume for the week (pretty close to the other estimate). And the volume just after a 3 day weekend (or an advertising promotion) can be far higher, above and beyond whatever seasonal variations are seen in your business.
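The two rules of thumb can be compared directly. The sketch below assumes a five day mail week and a made-up weekly volume of 50,000 pieces; the function names are hypothetical, and only the "twice the average" and "50% of the week" rules come from the text.

```python
def peak_day_volume(weekly_volume, working_days=5, peak_multiplier=2.0):
    """Peak day estimate as a multiple of the average daily volume."""
    return weekly_volume / working_days * peak_multiplier


def peak_day_share(working_days=5, peak_multiplier=2.0):
    """The same estimate expressed as a fraction of the weekly volume."""
    return peak_multiplier / working_days


weekly = 50_000  # hypothetical weekly mail volume
print(f"Twice the daily average: {peak_day_volume(weekly):,.0f} pieces "
      f"({peak_day_share():.0%} of the week)")
print(f"Half of the week: {0.5 * weekly:,.0f} pieces")
```

Neither estimate covers the day after a 3 day weekend or a promotion, so treat the larger of the two as a floor for peak planning rather than a ceiling.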
Since the busy mail day is typically also a busy telephone day, the users will probably not be able to even look at, much less process, all the Monday mail until mid-Tuesday or later. Many companies insist that the mail room process all mail the day it was received, but eventually realize that it won't be processed immediately, even if it is all scanned and indexed. Therefore it may pay to isolate high priority mail and carry over some of the rest of the mail, rather than forcing overtime to complete all mail handling the day it arrives.
Should indexing be done before scanning, during scanning, or after scanning?
A strong argument can be made for indexing before scanning - a more detailed examination of the paper that will best identify turn-around documents or other special cases. However, this is rarely done. Most facilities categorize the documents (using preprinted bar code sheets) during preparation, and then read the image of the mail to check the scanning and complete the indexing after scanning. It is never recommended to print custom bar code sheets, with account numbers, during document preparation before scanning. The procedures are awkward (inefficient or expensive), and the chance of error is very high.
Indexing during scanning - sometimes called interactive scanning - is sometimes used with low speed scanners and special documents. However, it should only be used where simple procedures or special documents are more important than operating efficiency. With the operator having to alternate between mechanical skills (running the scanner is like running a copier), and clerical skills (indexing), they will never become efficient at either.
Indexing after scanning is the most common approach. With high speed scanners, the operator can barely detect if the image is being captured, so this is also an important check on the quality of the scanned image.
How many people are required for scanning? With good document preparation, no interactive indexing, and typical folds and wrinkles in the paper, one operator is still required for each high speed scanner, with additional people who can keep the scanner running while the operator takes a break or goes to lunch. If the documents are excellent quality, one operator can support more than one scanner. (But if the documents are that good, such as printed reports that are being scanned, there are probably better ways of capturing the images.) A full time operator is also required for each low speed scanner.
Scanning can be a boring job, but there are a few people who enjoy the challenge of producing very high quality images, while getting the largest volume of work through the scanner. Generally it is a small step up from scanner operator to indexer, but an especially good scanner operator is a valuable asset to a company and should not be encouraged to change. The document preparation people are generally the same level as most scanner operators, and are often used as substitute scanner operators during breaks and absences. (Having prep clerks run the scanner also helps them understand how to do the best job of document preparation.)
Electronic document imaging allows a mail room operation to be located away from the primary processing facility for a company - as one person said, "the post office charges the same to deliver the mail anywhere!" Sometimes locating the mail room near a major postal center can make up for time that might be lost moving the mail across the country. Beware, though, the lure of cheaper real estate and lower salaries should not move a mail operation to a remote location with poor mail service.
Some companies are looking into off-shore services. Physically moving the paper off-shore for scanning is difficult without delaying the capture, except in some border towns. But many companies are using remote indexing and data entry services, even around the world, so that people (with native language skills) can get all the data entered "by the next morning." Since the data entry is also often the check of scanning quality, the availability of gray scale images for an electronic rescan, as noted above, may become interesting in the future.
In the event of a disaster, many companies worry about having sufficient backup.
One type of disaster is loss of a key piece of equipment - such as the mechanical failure of a high speed scanner. One rule of thumb is to configure the system so that peak days can be processed with two or more scanners in a single shift. This allows normal mail processing on most non-peak days with one of the scanners out of service. It allows the mail to be processed with overtime if a scanner is lost on a peak day.
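The rule of thumb can be checked with a few lines of arithmetic. In this sketch the per-scanner rate of 4,375 sheets per hour, the 60,000 sheet peak day, the 30,000 sheet normal day, and the 8 hour shift are all assumptions chosen for illustration; substitute the practical (not rated) throughput of your own scanners.

```python
def hours_to_finish(sheets, scanners_in_service, sheets_per_scanner_hour):
    """Scanning hours needed to clear a day's sheets with the scanners available."""
    if scanners_in_service < 1:
        raise ValueError("at least one scanner must be in service")
    return sheets / (scanners_in_service * sheets_per_scanner_hour)


SHIFT_HOURS = 8.0     # assumed single shift
RATE = 4_375          # assumed practical sheets per scanner per hour
PEAK_DAY = 60_000     # hypothetical peak day volume
NORMAL_DAY = 30_000   # hypothetical non-peak day volume

cases = [
    ("peak day, both scanners", PEAK_DAY, 2),
    ("normal day, one scanner down", NORMAL_DAY, 1),
    ("peak day, one scanner down", PEAK_DAY, 1),
]
for label, sheets, scanners in cases:
    hours = hours_to_finish(sheets, scanners, RATE)
    note = "fits one shift" if hours <= SHIFT_HOURS else "needs overtime"
    print(f"{label}: {hours:.1f} hours ({note})")
```

With these assumed numbers the configuration behaves as the rule describes: the first two cases fit in a single shift, and only the loss of a scanner on a peak day pushes the work into overtime.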
Another type of disaster is physical, whether an earthquake or tornado, or even just the flu among the staff. A separate physical facility is required for this type of protection. (Some companies contract with other companies to be their backup for these rare cases, rather than establishing a second center and losing the economy of scale.) Remember that physical mail is involved, so a second facility across the country cannot be used to process the critical mail received today. Although the backup facility should be close, one rule of thumb is that it should be at least 15 miles away for protection from natural disasters. If there is a choice of location, remember that an incoming mail operation needs to be towards the East, so that the mail is available in the earlier time zone.
A staggered operating shift should be a given. High volume mail operations can pick up mail directly from the post office as early as 5 or 6 in the morning. There are special delivery services that do an early post office pick-up for more modest mail operations. Extremely high volume operations can even arrange to take their mail directly off the airplane at the airport. This allows the extraction and preparation people to start early, followed by scanner operators, followed by indexers, so that users can have the first mail received by the time they arrive at work. By the middle of the normal work day, all the mail can be in the system.
One enterprising company, recognizing that much of their mail had to be processed before the close of the stock markets on the day it was received, made a point of not getting the mail until late afternoon, so that they had almost 24 hours to get it into the system and processed.
1 With contributions on the performance standards for various positions by Steven M. Parker, Austin, Texas, who has run several major scan centers, and (as a consultant) has built several more new centers.
©2001, 2003 by Charles A. Plesums, Austin, Texas USA. ALL RIGHTS RESERVED. If you would like to make or distribute copies of this document, a nominal royalty payment is required, as specified on www.plesums.com.