SoftwareForLitSupport FAQ
QUESTION
Why am I seeing lines from the LFP file instead of OCR text?
ANSWER
MergeOCR locates each single-page text file using the location of the LFP file itself, together with the key values and directory information inside the LFP. Directory entries in the LFP are treated as relative to the location of the LFP file.
The following best-practice example shows how to set this up.
- Suppose your file structure looks like the following:
VOL001
VOL001\IMAGES
VOL001\IMAGES\00
VOL001\IMAGES\00\00
VOL001\IMAGES\01
VOL001\DATA
- The IMAGES subdirectories contain single-page TIFF files and single-page OCR text files.
- The DATA directory contains an image viewer load file (LFP, OPT, or DII file).
Step (1) : Create a directory named "SPOCR", for Single Page OCR.
Step (2) : Create a directory named "MPOCR". This directory will hold the consolidated OCR files.
VOL001
VOL001\IMAGES
VOL001\IMAGES\00
VOL001\IMAGES\00\00
VOL001\IMAGES\01
VOL001\DATA
VOL001\SPOCR
VOL001\MPOCR
Step (3) : Copy all the single-page OCR files from the IMAGES subdirectories into the SPOCR directory.
Step (4) : Copy the LFP file into the VOL001 directory and rename the copy OCR.LFP. If you have only an OPT or DII file, use the free load file conversion tool iConvert to create an LFP file.
Step (6) : Open the LFP in a text editor that supports column-mode editing (e.g. UltraEdit). Search for all occurrences of two consecutive commas (,,) and replace them with two commas separated by a space (, ,). This lets you edit columns accurately.
Step (7) : Change the LFP's directory information for each line to "SPOCR".
Before:
IM,BATES001,D,0;VOL001;IMAGES\00\00;BATES001.TIF;2
IM,BATES002, ,0;VOL001;IMAGES\00\00;BATES002.TIF;2
After:
IM,BATES001,D,0;VOL001;SPOCR;BATES001.TIF;2
IM,BATES002, ,0;VOL001;SPOCR;BATES002.TIF;2
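If no column-mode editor is available, the directory change in Step (7) can also be scripted. The following is a minimal Python sketch, not part of MergeOCR. It assumes every image line starts with "IM," and has exactly the five semicolon-separated fields shown in the Before/After example, with the directory in the third field. The function names and the latin-1 encoding are assumptions made for illustration. Lines that do not match the expected layout are left unchanged.

```python
def point_line_at_spocr(line, new_dir="SPOCR"):
    """Return an IM line with its directory field replaced by new_dir.

    Assumed layout (taken from the example above):
        IM,<key>,<flag>,0;<volume>;<directory>;<image file>;<type>
    """
    if not line.startswith("IM,"):
        return line  # not an image line: leave untouched
    fields = line.split(";")
    if len(fields) != 5:
        return line  # unexpected layout: do not guess
    fields[2] = new_dir  # third semicolon-separated field is the directory
    return ";".join(fields)


def rewrite_lfp(src_path, dst_path, new_dir="SPOCR"):
    """Write a copy of src_path to dst_path with every IM line pointing at new_dir."""
    # latin-1 is an assumption; it round-trips any byte value unchanged.
    with open(src_path, "r", encoding="latin-1") as src:
        lines = src.read().splitlines()
    with open(dst_path, "w", encoding="latin-1") as dst:
        for line in lines:
            dst.write(point_line_at_spocr(line, new_dir) + "\n")
```

For example, rewrite_lfp(r"VOL001\DATA\VOL001.LFP", r"VOL001\OCR.LFP") would combine the copy in Step (4) with the edit in Step (7). The source file name VOL001.LFP is hypothetical.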
Step (8) : Run MergeOCR. Select OCR.LFP. Select "MPOCR" as the output file location.
- MergeOCR opens the file OCR.LFP, then looks in the relative directory "SPOCR" for the files BATES001.TXT, BATES002.TXT, etc.
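Before running MergeOCR, it can help to confirm that every text file the LFP refers to is actually present. The sketch below mirrors the lookup described above; it is not MergeOCR's own code. It assumes the five-field line layout from the example, that the text file has the same base name as the image with a .TXT extension, and that the directory field is resolved relative to the folder containing the LFP. The volume field is ignored. The function names are hypothetical.

```python
from pathlib import Path


def expected_text_files(lfp_path):
    """List the single-page text files the LFP's IM lines point to."""
    lfp_path = Path(lfp_path)
    base = lfp_path.parent  # directory entries are taken relative to the LFP
    results = []
    for line in lfp_path.read_text(encoding="latin-1").splitlines():
        fields = line.split(";")
        if not line.startswith("IM,") or len(fields) != 5:
            continue  # skip anything that is not a five-field image line
        directory, image_name = fields[2], fields[3]
        txt_name = Path(image_name).stem + ".TXT"
        # LFP directories use backslashes; split them so this runs on any OS.
        results.append(base.joinpath(*directory.split("\\"), txt_name))
    return results


def missing_text_files(lfp_path):
    """Return the expected text files that do not exist on disk."""
    return [p for p in expected_text_files(lfp_path) if not p.is_file()]
```

Calling missing_text_files(r"VOL001\OCR.LFP") returns an empty list when every page has a matching text file in SPOCR. Any paths it does return are files that the lookup described above would not find.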
When you are satisfied that the consolidated OCR files are accurate, you can delete OCR.LFP, directory SPOCR and directory MPOCR.
KEYWORDS
MergeOCR