Text
updated for 5.6
The recognition of textual elements is delegated to the Tesseract OCR library.
The TEXTS step runs the OCR on the current sheet. We can also run the OCR manually on a selected collection of glyphs, or even drag and drop text items from the Shape board.
Running the OCR results in one or several words, gathered into sentences, which we can then modify manually in terms of:
- textual content,
- font attributes and size,
- type of words and sentences,
- role of every sentence.
TEXTS step
The TEXTS step runs the OCR on the whole sheet image and tries to assign to each OCR’d item its content, attributes, size, type and role.
This engine step is influenced by three options available in the Book → Set book parameters menu:
- Support for chord names
- Support for lyrics (assumed to be located below the related staff)
- Support for lyrics even located above staff
Chord names and lyrics are special items; this is why their recognition must be explicitly enabled, to avoid collateral damage by the OMR engine when they are not desired.
Metronome marks, on the other hand, have a recognizable structure and thus don’t require any specific option.
Manual OCR
The OCR can also be launched manually on a selection of one or several glyphs, by pressing one of the buttons provided in the Texts palette of the Shape board:
- The text button,
- The lyric button,
- The metronome button.
There are separate buttons because lyric items behave significantly differently from other text items; in particular, the gap between words can be much wider. The metronome, for its part, is a specific item of its own.
By manually choosing one button or another, we clearly specify the desired result type – and thus the sentence role – of the OCR operation.
We can also drag and drop items from the same Texts palette.
In this case, no OCR is performed, and we have to enter the content of every word manually.
Sentence vs. Words
A Sentence inter is an ensemble of one or several Word inters:
- Any Word handles its textual content, font attributes, font size and location. We can manually modify any of these pieces of information.
- A Sentence is a sequence of words (we can easily navigate from a selected word to its containing sentence via the ToEnsemble button of the InterBoard). A sentence is assigned a role, which we can edit.
The sentence content is simply defined as the concatenation of the contents of its member words. Except for the Metronome case, which mixes text and music characters, we cannot modify a sentence content directly, but only via each of its member words.
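To make the word/sentence relationship concrete, here is a minimal Java sketch. TextWord and TextSentence are hypothetical, simplified stand-ins, not the actual Audiveris inter classes, and joining words with a single space is an assumption. The point it illustrates is that a sentence stores no content of its own: fixing one word is enough to fix the sentence.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical, simplified word: content, attributes and font size (location omitted). */
class TextWord {
    String value;
    String attributes;
    float fontSize;

    TextWord(String value, String attributes, float fontSize) {
        this.value = value;
        this.attributes = attributes;
        this.fontSize = fontSize;
    }
}

/** Hypothetical, simplified sentence: an ordered sequence of words plus an editable role. */
class TextSentence {
    private final List<TextWord> words = new ArrayList<>();
    String role;

    TextSentence(String role) {
        this.role = role;
    }

    void add(TextWord word) {
        words.add(word);
    }

    /** Content is recomputed from the words each time (space separator is an assumption). */
    String content() {
        return words.stream().map(w -> w.value).collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        TextSentence sentence = new TextSentence("Direction");
        TextWord first = new TextWord("scnza", "IS", 10f);
        sentence.add(first);
        sentence.add(new TextWord("sordino", "S", 10f));
        System.out.println(sentence.content()); // scnza sordino

        // Editing the word is the way to edit the sentence content.
        first.value = "senza";
        System.out.println(sentence.content()); // senza sordino
    }
}
```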
Word editing
Here is an example of an input line and the corresponding OCR result at the end of the TEXTS step:
Source | Image |
---|---|
Input line | ![]() |
OCR result | ![]() |
Word content
The word “senza” has been OCR’d as “scnza”.
We can fix the content in the text field and press ENTER.
Word attributes
new in 5.6
We can make the same content modification for the word “sordino”, OCR’d as “sardino”.
But this is not enough: the source was in italic, while the result is displayed in upright style.
To fix this, we can now modify the word attributes via the Attributes field.
For “senza” the OCR’d attributes were “IS” (Italic Serif), while for “sordino” the OCR’d attributes are just “S” (Serif).
So we can simply change the string in the Attributes field to “IS” and press ENTER.
Handling word attributes is a new feature provided by the 5.6 release.
The attributes, as transcribed by OCR or modified by the user, can be represented by a string composed of any of the BIUMSC characters (one letter per attribute):
Letter | Meaning | Use |
---|---|---|
B | Bold | Style |
I | Italic | Style |
U | Underlined | - ignored - |
M | Monospaced | Font type |
S | Serif | Font type |
C | Small Caps | - ignored - |
These attributes are used to:
- Choose the text font (based on the M and S attributes): the chosen font is Serif (if specified), otherwise Monospaced (if specified), otherwise Sans Serif [1].
- Apply a font style (based on the B and I attributes): the chosen style is Plain, Bold, Italic or Bold+Italic.
Note: As of this writing, the U (Underlined) and C (Small Caps) attributes are not supported and are thus simply ignored.
This results in the following possible combinations:
Type / Style | ( ) Plain | (B)old | (I)talic | (B)old (I)talic |
---|---|---|---|---|
( ) Sans Serif | ![]() | ![]() | ![]() | ![]() |
(S)erif | ![]() | ![]() | ![]() | ![]() |
(M)onospaced | ![]() | ![]() | ![]() | ![]() |
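The selection rules above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: WordFontChooser and its methods are hypothetical names (not Audiveris API), and the standard java.awt.Font logical family names and style bits are used to express the result. Note how "MS" falls back to Serif and how U and C have no effect.

```java
import java.awt.Font;

/** Hypothetical helper applying the attribute rules described above. */
class WordFontChooser {

    /** Family: Serif if S is present, otherwise Monospaced if M, otherwise Sans Serif. */
    static String familyOf(String attributes) {
        if (attributes.indexOf('S') >= 0) {
            return Font.SERIF;
        }
        if (attributes.indexOf('M') >= 0) {
            return Font.MONOSPACED;
        }
        return Font.SANS_SERIF;
    }

    /** Style: BOLD and/or ITALIC bits, PLAIN if neither; U and C are ignored. */
    static int styleOf(String attributes) {
        int style = Font.PLAIN;
        if (attributes.indexOf('B') >= 0) {
            style |= Font.BOLD;
        }
        if (attributes.indexOf('I') >= 0) {
            style |= Font.ITALIC;
        }
        return style;
    }

    /** True if every character belongs to the BIUMSC set. */
    static boolean isValid(String attributes) {
        return attributes.chars().allMatch(c -> "BIUMSC".indexOf(c) >= 0);
    }

    /** Builds an AWT font from an attribute string and a point size. */
    static Font fontOf(String attributes, int size) {
        return new Font(familyOf(attributes), styleOf(attributes), size);
    }

    public static void main(String[] args) {
        for (String attrs : new String[] {"S", "IS", "BM", "MS", "U"}) {
            System.out.println(attrs + " -> " + familyOf(attrs) + ", style " + styleOf(attrs));
        }
    }
}
```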
Word location and size
Still working on the same sentence, the very first word (“sempre”) needs to be fixed:
After modifying the word content and attributes, we get this first result:
The word bounds are too wide when compared to the underlying pixels. To fix this, we have to put the word into edit mode, either by double-clicking on the word or by ticking the edit checkbox in the InterBoard.
A word being edited shows two handles:
- The middle handle moves the word in any direction,
- The right handle can increase or decrease the word size (and the related font size accordingly).
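The exact rule Audiveris uses to derive the new font size is not given here. A plausible reading, sketched below as an assumption only, is that the font size scales linearly with the word width; WordResize and scaledFontSize are hypothetical names.

```java
/** Hypothetical sketch: assumes font size scales linearly with the dragged word width. */
class WordResize {

    /** Font size that would make the word span newWidth, under the linear assumption. */
    static double scaledFontSize(double fontSize, double oldWidth, double newWidth) {
        if (oldWidth <= 0 || newWidth <= 0) {
            throw new IllegalArgumentException("widths must be positive");
        }
        return fontSize * newWidth / oldWidth;
    }

    public static void main(String[] args) {
        // Shrinking a 120-pixel-wide word to 90 pixels turns a size-16 font into size 12.
        System.out.println(scaledFontSize(16, 120, 90)); // 12.0
    }
}
```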
Moving the right handle to the left reduces the word width:
And later, in the SYMBOLS step, the “PP” glyph will indeed be recognized as a pianissimo symbol.
Sentence editing
A sentence role can be set to any of the following values:
- UnknownRole
- Lyrics
- ChordName
- Title
- Direction
- Number
- PartName
- Creator
- CreatorArranger
- CreatorComposer
- CreatorLyricist
- Rights
- EndingNumber
- EndingText
- Metronome
Since the 5.2 release, in all cases, we can manually modify the sentence role afterwards, from any role to any other role.
Plain sentence
A “plain” sentence is any sentence which is assigned a role different from Lyrics, ChordName and Metronome.
Following an OCR recognition (TEXTS step or manual OCR), the role of each resulting plain sentence is refined: based on a set of heuristics, the engine tries to distinguish further between plain roles such as direction, part name, title, composer, lyricist, etc.
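The role list and the "plain" definition can be captured by a small Java enum. SentenceRole is a hypothetical name that simply mirrors the list above; it is not claimed to be the actual Audiveris type. A sentence is plain exactly when its role is outside the three special roles.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical enum mirroring the sentence roles listed above. */
enum SentenceRole {
    UnknownRole, Lyrics, ChordName, Title, Direction, Number, PartName,
    Creator, CreatorArranger, CreatorComposer, CreatorLyricist, Rights,
    EndingNumber, EndingText, Metronome;

    /** The three roles that make a sentence special rather than plain. */
    private static final Set<SentenceRole> SPECIAL = EnumSet.of(Lyrics, ChordName, Metronome);

    /** A plain sentence has any role other than Lyrics, ChordName or Metronome. */
    boolean isPlain() {
        return !SPECIAL.contains(this);
    }

    public static void main(String[] args) {
        for (SentenceRole role : values()) {
            System.out.println(role + (role.isPlain() ? " (plain)" : " (special)"));
        }
    }
}
```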
Chord name
A chord name is a musical symbol which names and describes the related chord.
For example: C, D7, F♯, B♭min, Em♭5, G6/B, Gdim, F♯m7, D7♯5, Am7♭5, A(9), BMaj7/D♯.
As of this writing, the Audiveris engine is not yet able to recognize chord names that include true sharp (♯) or flat (♭) characters. Perhaps one day we will succeed in training Tesseract OCR on this kind of text content.
For the time being, Audiveris is able to recognize such chord names when these characters have been replaced (by OCR “mistake” or by manual modification) with more “usual” characters:
- '#' (number sign) as replacement for '♯' (sharp),
- 'b' (lowercase b) as replacement for '♭' (flat).
When we OCR a chord name word, Audiveris may be able to decode it as a chord name and thus wrap it within a chord name sentence.
If Audiveris has failed, we can still force the chord name role (at the sentence level) and type in the missing b or # characters if needed (at the word level). The chord name will then be decoded on the fly from its new textual content.
Note that we don’t have to enter the true sharp or flat signs manually. Entering them via their Unicode values is a bit tricky and, in the end, useless.
Instead, when a text has been recognized or assigned as a chord name, its internal b or # characters are automatically replaced by the true alteration signs.
For example, we can type “Bb”, then press ENTER, and the chord name will be translated and displayed as “B♭”.
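The display substitution can be sketched as a plain character replacement. This is a simplified illustration, not Audiveris code: ChordNameText and toDisplay are hypothetical names, and treating every lowercase 'b' as a flat is an assumption that holds for usual suffixes such as m, min, dim, Maj7 or sus4, which contain no lowercase b.

```java
/** Hypothetical sketch: shows '#' as '♯' (U+266F) and 'b' as '♭' (U+266D) in a chord name. */
class ChordNameText {
    static final char SHARP = '\u266F';
    static final char FLAT = '\u266D';

    /** Simplification: every lowercase 'b' is taken as a flat sign. */
    static String toDisplay(String typed) {
        return typed.replace('#', SHARP).replace('b', FLAT);
    }

    public static void main(String[] args) {
        for (String typed : new String[] {"Bb", "F#m7", "Am7b5", "BMaj7/D#"}) {
            System.out.println(typed + " -> " + toDisplay(typed));
        }
    }
}
```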
Lyric line
A lyric line is a sentence composed of lyric items.
When a lyric line is selected, the InterBoard displays additional data:
- Voice number,
- Verse number,
- Location with respect to staff.
Each syllable (lyric item) is usually linked to a related chord, either above or below.
But it is not always obvious whether the text relates to the staff above or below, nor is it always clear which voice is concerned.
If a syllable is not linked to the correct chord, we can modify this link manually by dragging from the syllable to the suitable chord. This updates the line data (voice, verse, location) on the fly.
Metronome mark
Since the 5.4 release, the metronome marks can be automatically recognized.
We can also edit them afterwards and even create new marks from scratch.
A metronome mark is a sentence composed of words.
One of its words is special, as it contains music characters rather than textual characters: this is the BeatUnit word.
Editing the metronome is detailed in this specific section.
[1] Combining M (monospaced) and S (serif) could produce a font like “Courier New”, but the result is unsatisfactory. Therefore, in “MS” the M is ignored in favor of the S.