Text

Recognition of textual elements is delegated to the Tesseract OCR library.
The OMR engine performs this recognition automatically during the TEXTS step. It can also be launched manually on a selection of glyphs.

The resulting hierarchy of sentences and words can also be manually modified by the user.


Table of contents

  1. Recognition of text items
  2. TEXTS step
  3. Manual OCR
  4. Sentence vs Words
  5. Plain Sentence
  6. Chord Name
  7. Lyric Line

Recognition of text items

It is very difficult to automatically derive the meaning from the textual items in a musical score.

For lyrics, each syllable is connected to a chord above or below.
But it is not always obvious whether the text relates to the staff above or to the staff below, nor which voice is concerned.

For plain text, Audiveris tries to detect the text role, such as directions or typical header elements like: title, composer and lyricist. If it fails, the role can be easily corrected manually.

TEXTS step

The TEXTS step runs Tesseract OCR on the whole image and tries to assign each textual item its probable content, type and role.

This engine step is influenced by three options available in the Book parameters menu:

  • Support for chord names
  • Support for lyrics (assumed to be located below the related staff)
  • Support for lyrics even located above staff

Chord names and lyrics are special items. Their recognition must therefore be explicitly selected, to avoid collateral damage in the OMR engine when they are not desired.

Manual OCR

Tesseract OCR can also be launched manually on a selection of one or more glyphs, by pressing one of the two buttons provided in the Physicals family of the Shape palette:

  • The lyric button,
  • The text button.

There are two separate buttons because lyric items behave quite differently from plain text items. In particular, the gap between words can be much wider. By choosing one button or the other, the user clearly specifies the desired result type of the OCR operation.

Sentence vs Words

A Sentence Inter is an ensemble of one or several Word Inter(s):

  • A Word handles its textual value and location. Word sub-classes (ChordName and LyricItem) handle additional data.
    The word value is modifiable by the user.

  • A Sentence is a sequence of words.
    (We can easily navigate from a selected word to its containing sentence via the ToEnsemble button).
    Its textual content is defined as the concatenation of its word members. This content is not modifiable directly, but rather via its word members.
    The sentence role is modifiable by the user.

A sentence role can be set to any value among: UnknownRole, Lyrics, ChordName, Title, Direction, Number, PartName, Creator, CreatorArranger, CreatorComposer, CreatorLyricist, Rights, EndingNumber, EndingText.
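
The relationship between a sentence and its words can be sketched as a small data model. This is only an illustration in Python, not the actual Audiveris classes: the names Role, Word and Sentence are hypothetical, and joining words with a single space is an assumption. It shows that the sentence content is derived from the words, while the role is set on the sentence itself.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import List


class Role(Enum):
    # Role names copied from the list above.
    UnknownRole = auto()
    Lyrics = auto()
    ChordName = auto()
    Title = auto()
    Direction = auto()
    Number = auto()
    PartName = auto()
    Creator = auto()
    CreatorArranger = auto()
    CreatorComposer = auto()
    CreatorLyricist = auto()
    Rights = auto()
    EndingNumber = auto()
    EndingText = auto()


@dataclass
class Word:
    # Hypothetical stand-in for a Word Inter: only the textual value is kept.
    value: str


@dataclass
class Sentence:
    # Hypothetical stand-in for a Sentence Inter: an ordered list of words.
    words: List[Word] = field(default_factory=list)
    role: Role = Role.UnknownRole

    @property
    def content(self) -> str:
        # Read-only: always rebuilt from the word members.
        # The single-space separator is an assumption of this sketch.
        return " ".join(word.value for word in self.words)
```

In this sketch, changing words[0].value changes what content returns on the next access, whereas content itself has no setter.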

Plain Sentence

A “plain” sentence is any sentence whose role is not Lyrics.

After OCR (by the OMR engine or manually), the role of each resulting sentence is determined by heuristics. In the case of manual OCR, the lyric button will always result in the lyrics role, whereas the text button will always result in a non-lyrics role.

Since the 5.2 release, in all cases, the end-user can manually modify the sentence role afterwards, from any role to any other role.

Chord Name

A chord name is a musical symbol which names and describes the related chord.

For example: C, D7, F♯, B♭min, Em♭5, G6/B, Gdim, F♯m7, D7♯5, Am7♭5, A(9), BMaj7/D♯.

As of this writing, the Audiveris engine is not yet able to recognize chord names that include true sharp (♯) or flat (♭) characters. Perhaps one day, we will succeed in training Tesseract OCR on this text content.
For the time being, Audiveris is able to recognize such chord names when these characters have been replaced (by OCR “mistake”, or by manual modification) by more “usual” characters:

  • '#' (number sign) as replacement for '♯' (sharp),
  • 'b' (lowercase b) as replacement for '♭' (flat).

When we OCR a chord name word, Audiveris may be able to decode it as a chord name and thus wrap it within a chord name sentence.

If Audiveris has failed, we can still force the chord name role (at the sentence level) and type in the missing b or # characters if needed (at the word level). The chord name will then be decoded on-the-fly with its new textual content.

Note that we don’t have to manually enter the true sharp or flat signs. Entering them via their Unicode value is a bit tricky and, in the end, unnecessary.
Instead, when text has been recognized or assigned as a chord name, its internal b or # characters are automatically replaced by their true alteration signs.
For example, we can type “Bb” then press Enter and the chord name will be translated and displayed as “B♭”.
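
The substitution described above can be sketched as follows. This is a minimal illustration in Python, not the actual Audiveris code: the function name prettify_chord_name is hypothetical, and the rule used to tell a flat 'b' from an ordinary letter (it follows a root letter A to G, or precedes a digit) is an assumption of this sketch.

```python
import re

FLAT = "\u266d"   # the true flat sign
SHARP = "\u266f"  # the true sharp sign

# Assumption: a lowercase 'b' is read as a flat when it directly follows a
# root letter (as in "Bb") or directly precedes a degree digit (as in "m7b5").
_FLAT_PATTERN = re.compile(r"(?<=[A-G])b|b(?=\d)")


def prettify_chord_name(text: str) -> str:
    """Replace the 'b' and '#' stand-ins by the true alteration signs."""
    with_flats = _FLAT_PATTERN.sub(FLAT, text)
    # In this sketch every '#' is taken to be a sharp.
    return with_flats.replace("#", SHARP)
```

With these rules, prettify_chord_name("Bb") returns “B♭” and prettify_chord_name("Am7b5") returns “Am7♭5”, while a name without alteration such as “Gdim” is left unchanged.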

Lyric Line

A lyric line is a sentence composed of lyric items.

When a lyric line is selected, the Inter board displays additional data:

  • Voice number,
  • Verse number,
  • Location with respect to staff.

Each syllable (lyric item) is usually linked to a related chord.

If a syllable is not linked to the correct chord, we can modify this link manually by dragging from the syllable to the suitable chord. This will update on-the-fly the line data (voice, verse, location).