OCR SDK Knowledge Base

Article ID: 1188 | Category: Recognition | Type: How To | Last Modified: 1/15/2014

Regular Expressions

Description

How do I use regular expressions to restrict recognition results?

Solution

Three settings are strongly recommended when using regular expressions:

  • specify the letter set (set_LetterSet),
  • specify that the language is not a natural language (IsNaturalLanguage = false),
  • specify that only words from the dictionary are allowed (AllowWordsFromDictionaryOnly = true).

The following sample code applies all three settings:

string fileName = Path.GetFileName(filePath);
string outputPath = Path.Combine(_outputDir, fileName);
var frDoc = _engine.CreateFRDocumentFromImage(filePath, null);
var rp = _engine.CreateRecognizerParams();
//set RegExp------------------------------------------------
// Build a custom text language with a single base language.
FREngine.LanguageDatabase languageDatabase = _engine.CreateLanguageDatabase();
FREngine.TextLanguage textLang = languageDatabase.CreateTextLanguage();
FREngine.BaseLanguage baseLang = textLang.BaseLanguages.AddNew();
// 1. Letter set: only the characters expected in the field.
baseLang.set_LetterSet(FREngine.BaseLanguageLetterSetEnum.BLLS_Alphabet, "$0123456789,.");
// 2. The language is not a natural language.
baseLang.IsNaturalLanguage = false;
// 3. Only words from the dictionary (here, the regular expression) are allowed.
baseLang.AllowWordsFromDictionaryOnly = true;
// Attach the regular expression as the dictionary of the base language.
var dictDescr = baseLang.DictionaryDescriptions.AddNew(FREngine.DictionaryTypeEnum.DT_RegularExpression);
dictDescr.GetAsRegExpDictionaryDescription().SetText(@"[$0-9,.]+");
rp.TextLanguage = textLang;
//------------------------------------------------
// Add a text block over the field area and assign the recognizer parameters to it.
var region = _engine.CreateRegion();
region.AddRect(375, 21, 465, 31);
frDoc.Pages[0].Layout.Blocks.AddNew(FREngine.BlockTypeEnum.BT_Text, region, 0);
frDoc.Pages[0].Layout.Blocks[0].GetAsTextBlock().RecognizerParams = rp;
// Recognize the block and export the result as Unicode text.
frDoc.Recognize(null, null);
frDoc.Export(outputPath + ".txt", FREngine.FileExportFormatEnum.FEF_TextUnicodeDefaults, null);
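
The letter set and the regular expression in the sample have to agree: a character that the expression allows but the letter set lacks cannot be recognized. Sample field values can be checked against both before running the Engine. The sketch below is not part of the FineReader Engine API. It uses the standard .NET Regex class, and the names AmountFieldCheck, MatchesPattern and CharsOutsideLetterSet are hypothetical. It assumes that this simple pattern means the same in .NET syntax as in the Engine's syntax, which may not hold for more complex expressions.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the FineReader Engine API.
public static class AmountFieldCheck
{
    // Same values as in the sample code above.
    public const string LetterSet = "$0123456789,.";
    public const string Pattern = @"[$0-9,.]+";

    // True when the whole string matches the pattern.
    // \A and \z anchor the match to the start and end of the input.
    public static bool MatchesPattern(string text)
    {
        return Regex.IsMatch(text, @"\A(?:" + Pattern + @")\z");
    }

    // Distinct characters of the text that are missing from the letter set,
    // in order of first occurrence.
    public static string CharsOutsideLetterSet(string text)
    {
        return new string(text.Where(c => LetterSet.IndexOf(c) < 0)
                              .Distinct()
                              .ToArray());
    }
}
```

For example, MatchesPattern("$1,234.56") returns true, while MatchesPattern("USD 12") returns false and CharsOutsideLetterSet("USD 12") returns "USD " (the three letters and the space), which shows that such a value needs both a wider letter set and a different expression.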

The regular expression alphabet supported by ABBYY FineReader Engine is described in the Help: see Help > Index > regular expressions.
