OCR SDK Knowledge Base

Article ID: 2063 | Category: Recognition | Type: How To | Last Modified: 1/31/2017

Creating a custom language with additional characters

Description

How can I use the ABBYY FineReader Engine API to create a copy of a predefined language that includes additional characters?

Solution

Documents sometimes contain characters that are not part of the recognition language. For instance, an English text about maths or physics may use Greek letters, and financial documents may contain currency symbols.

To recognize such documents, you can create a custom language based on one of the predefined languages and add the necessary characters to it. Additional characters should be added to a letterset (prefixes, suffixes, or the main alphabet) of one of the base languages.

The C# code sample below creates a custom language based on English with the rupee symbol (₹) added to the prefix letterset:

// Create a new recognition language and use it for document processing
TextLanguage textLanguage = makeTextLanguage();
DocumentProcessingParams documentProcessingParams = engineLoader.Engine.CreateDocumentProcessingParams();
documentProcessingParams.PageProcessingParams.RecognizerParams.TextLanguage = textLanguage;
document.Process( documentProcessingParams );

private TextLanguage makeTextLanguage()
{
    // Create a new TextLanguage object
    LanguageDatabase languageDatabase = engineLoader.Engine.CreateLanguageDatabase();
    TextLanguage textLanguage = languageDatabase.CreateTextLanguage();

    // Copy all attributes from the predefined English language
    TextLanguage englishLanguage = engineLoader.Engine.PredefinedLanguages.Find("English").TextLanguage;
    textLanguage.CopyFrom(englishLanguage);
    textLanguage.InternalName = "SampleTextLanguage";

    // Get the first (and only) BaseLanguage object within the TextLanguage
    BaseLanguage baseLanguage = textLanguage.BaseLanguages[0];

    // Change the internal name of the base language to a user-defined one
    baseLanguage.InternalName = "SampleBaseLanguage";

    // Read the prefix letterset, add the rupee symbol at the front, and write it back
    string letterSet = baseLanguage.get_LetterSet(BaseLanguageLetterSetEnum.BLLS_Prefixes);
    letterSet = letterSet.Insert(0, "₹");
    baseLanguage.set_LetterSet(BaseLanguageLetterSetEnum.BLLS_Prefixes, letterSet);

    return textLanguage;
}
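
When several characters need to be added, the string manipulation from the sample above can be moved into a small helper that skips characters the letterset already contains. The following is only a sketch: LetterSetHelper and AddCharacters are hypothetical names, not part of the FineReader Engine API, and the starting value "$€" is a made-up stand-in for whatever get_LetterSet actually returns. It also assumes that every added character fits in a single UTF-16 code unit, which is true for ₹ and for Greek letters.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of the FineReader Engine API.
public static class LetterSetHelper
{
    // Returns a copy of letterSet with the characters from extraChars placed
    // at the front, skipping characters that are already present and
    // duplicates within extraChars itself.
    public static string AddCharacters(string letterSet, string extraChars)
    {
        letterSet = letterSet ?? string.Empty;
        var added = new StringBuilder();
        foreach (char c in extraChars ?? string.Empty)
        {
            bool alreadyPresent = letterSet.IndexOf(c) >= 0
                || added.ToString().IndexOf(c) >= 0;
            if (!alreadyPresent)
            {
                added.Append(c);
            }
        }
        return added.ToString() + letterSet;
    }
}

public static class Program
{
    public static void Main()
    {
        // Made-up stand-in for a prefix letterset; the real value comes from the engine.
        string prefixes = "$€";

        // '€' is already present, so only '₹' and '₽' are added.
        string updated = LetterSetHelper.AddCharacters(prefixes, "₹€₽");
        Console.WriteLine(updated == "₹₽$€");
    }
}
```

In makeTextLanguage, a helper like this would take the place of the letterSet.Insert(0, "₹") call: pass it the string returned by get_LetterSet and hand the result to set_LetterSet.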