BrailleMaster Resource Page

Constructing A Braille Rule File

Before the arrival of BrailleMaster, programs which performed text-to-Braille conversion were based on a fixed set of rules, defined by the programmer. As Braille code is quite complex, and, like a living language, subject to evolutionary changes, it often happens that fixed-rule Braille translation programs do not follow precisely the particular Braille code which they are supposed to represent.

BrailleMaster represents an important breakthrough in computerized Braille production. It makes it possible to adapt the translation mechanism of a Braille translator to follow given Braille rules precisely, and it also allows Braille users to construct their own translation rules, regardless of language or application.

BrailleMaster does this with the introduction of a symbolic language called LOUIS. LOUIS is simple to use, yet powerful and flexible enough to be able to define Braille rules for most national languages, as well as for special applications, such as math Braille, music Braille, etc.

The name LOUIS was chosen as a tribute to Louis Braille. Robotron conceived LOUIS in order to produce better Braille code and to provide the public with the ultimate tool in Braille production and research. Such a universal tool has been long overdue.

LOUIS Rule File Structure

LOUIS represents a Braille code as a sequence of definition lines in a text file called a Rule File. Each definition line represents one rule of the Braille code. The order in which the lines appear in the Rule File matters: the rules are always searched from top to bottom, and the higher a definition line is placed in the file, the higher its precedence over the rules that follow.

All the LOUIS definitions that make up a complete description of a particular Braille code are contained in a single Rule File. It is possible to have many Rule Files for different languages and applications. A Rule File can be selected even from within the translated text, so that different Rule Files can be applied to different portions of the text.

The LOUIS Rule Files are plain text files, consisting of lines. There are three types of lines: Comment Lines, Rule Header Lines and Rule Definition Lines.

A Comment Line always starts with a semicolon and can contain comments for user reference. A Comment Line is ignored during the translation process.

Rule Headers mark individual groups of rules. A header consists of a single character followed by "-rule" (a dash immediately followed by the word "rule"). For example, all the rules for the letter "A" are preceded by the header "A-rule".
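A minimal Python sketch of how a program reading a Rule File might tell these line types apart is shown below. The function name `classify_line` is hypothetical, and the sketch assumes that blank lines may also appear and are simply skipped. It does not recognize the Symbol Definition lines described under Advanced LOUIS; it would report them as definitions.

```python
def classify_line(line: str) -> str:
    """Classify one Rule File line as 'blank', 'comment', 'header' or 'definition'."""
    stripped = line.strip()
    if not stripped:
        return "blank"          # assumption: empty lines are tolerated and skipped
    if stripped.startswith(";"):
        return "comment"        # ignored during translation
    if len(stripped) == 6 and stripped[1:] == "-rule":
        return "header"         # e.g. "A-rule", or "--rule" for the Non-Letter Rules
    return "definition"


for text in ["; English Grade 2", "A-rule", "--rule", "|{A}| --*---"]:
    print(f"{text!r:20} -> {classify_line(text)}")
```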

The LOUIS Rule Definitions are divided into three major groups: The Composition Rules, the Non-Letter Rules and the Letter Rules.

Composition Rules are always the first rules in a LOUIS Rule File. They have the highest priority because they determine the general behaviour of the whole Braille system. For example, in Standard English Grade 2 Braille, the Composition Rules determine such essential requirements as that each number begins with a number sign, each upper-case letter is preceded by a capital sign, etc.

In the Rule File, the Composition Rules are followed by the Non-Letter Rules.

Non-Letter Rules are preceded by the header "--rule". (Two dashes immediately followed by the word "rule".)

This rule group defines the Braille translation of all printable characters except for letters of the alphabet. These include punctuation marks, special symbols and digits.

The Non-Letter Rules group is further subdivided into three subgroups: The Space Rules, Punctuation Mark Rules and Numeral Rules.

The Space Rules determine how a space character is translated into Braille. This can be a very simple single-line group if the space is always represented by a single Braille code. However, in some Braille codes, such as Standard English Grade 2 Braille, the space is sometimes removed from the corresponding Braille text, for example between the words "and", "for", "of", "the", "with", etc. Such exceptions have to be defined in the Space Rules.

The Space Rules are followed by the Punctuation Marks group. Here BrailleMaster's translator is instructed how to interpret punctuation marks. All ambiguities in how the Braille code represents punctuation marks must be defined in this section. (For example a dot between digits can result in a different Braille code compared to a dot terminating a sentence.)

The Punctuation Mark Rules are followed by a small group of Numeral Rules, which define the Braille representation of digits. The composition of complete numbers is determined by the Composition Rules.

The Numeral Rules are followed by the Letter Rules, which contain rules for every letter of the alphabet. There are as many Rule Headers in this group as there are letters in the alphabet of the language the rules describe. For example, the subgroup of rules for the letter A starts with "A-rule", the subgroup for the letter M starts with "M-rule", etc.

If a character in the translated text does not have a corresponding rule in the Letter Rules, it will be ignored. The Letter Rules can sometimes be very simple, in instances where a certain Braille code always corresponds to a certain letter of the alphabet, such as in Computer Braille. On the other hand, the Letter Rules can be quite complex if there are many ambiguities or many abbreviations and contractions, such as in English Grade 2 Braille.

LOUIS Rule Syntax

Each LOUIS rule consists of two parts: The text part on the left and the corresponding Braille part on the right. A LOUIS rule determines the relationship between the portion of translated text which is matched by the left-hand part of the rule (also called "contextual part") and the resulting Braille code, as specified by the right-hand Braille part of the rule.

For example, a simple rule for the word "here" goes like this:

|{HERE}| 5 125

The | symbols on the left determine the start and end of the context. The { and } symbols denote the text that will be replaced by the Braille code shown on the right. On the right, there are two Braille signs, Dot 5 and Dot 1,2,5, written here as dot numbers; they can also be written as a series of asterisks and dashes (more on this later).

Indeed, the correct Braille code in English Grade 2 Braille for the word "here" is Dot 5 followed by Dot 1,2,5. In other words, the word "here", if it stands alone, is always contracted into a special two-cell Braille sign.

But what about words such as "Hereford", where the initial "Here" is not supposed to be contracted to Dot 5 and Dot 1,2,5 but rather spelled character by character? That's why the "text" and "context" symbols are separate. In the word "Hereford" the "Here" doesn't stand alone. Its context is different: It "stands alone" from the left, but is followed by the word "ford" immediately on the right.

For this, we can attempt to construct an appropriate rule:

|{H}EREFORD| -**-*-

Note that the rule covers only the character "h": only the "h" is enclosed by the text delimiters {}, so only it is replaced by the Braille code for a single "h", Dot 1,2,5. The following characters are handled by the rules for "e", "r", then "e" again, and so on. During translation, the imaginary translation "cursor" moves to the character immediately after the right text delimiter } and invokes the rules for that particular character.

For the "Hereford" rule to take precedence over the more general "here" rule, the "Hereford" rule is simply placed before the "here" rule, like this:

|{H}EREFORD| -**-*-
|{HERE}| ----*- -**-*-
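To make the delimiters and the top-to-bottom precedence concrete, here is a small Python sketch. It splits a rule line into left context, Rule Text, right context and Braille cells, then returns the first rule whose whole context matches at a given cursor position. The names `parse_rule` and `first_match` are illustrative only, and the sketch compares literal characters only, ignoring the wildcard symbols introduced later on this page.

```python
def parse_rule(line: str):
    """Split '|LEFT{TEXT}RIGHT| BRAILLE' into its four parts."""
    start = line.index("|")
    end = line.index("|", start + 1)
    left, rest = line[start + 1:end].split("{", 1)
    text, right = rest.split("}", 1)
    return left, text, right, line[end + 1:].split()


def first_match(rules, word: str, pos: int):
    """Return (Rule Text, Braille cells) of the first rule matching at pos.

    Only literal characters are compared; wildcard symbols are not handled."""
    for left, text, right, braille in rules:
        start = pos - len(left)
        context = left + text + right
        if start >= 0 and word[start:start + len(context)].upper() == context:
            return text, braille
    return None  # no rule applies at this position


# Rule File order matters: the specific "Hereford" rule comes first.
RULES = [parse_rule(r) for r in ("|{H}EREFORD| -**-*-", "|{HERE}| ----*- -**-*-")]
print(first_match(RULES, "Hereford", 0))  # ('H', ['-**-*-'])
print(first_match(RULES, "here", 0))      # ('HERE', ['----*-', '-**-*-'])
```

For "Hereford" the first rule wins and only the "H" is consumed; for "here" the first rule fails and the scan falls through to the general rule.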

Naturally, not all the words of a human language can be contained in the rules. Therefore, BrailleMaster uses various symbols that make it possible to generalize when constructing rules. For example, to instruct the translator that every number has to be preceded by a number sign, a general symbol for "any number" can be used, without having to specify all the numbers in the universe, one by one!

The Braille part of a LOUIS rule consists of a series of Braille codes, written either as dot numbers or as sequences of dashes and asterisks. For example, "hello" in English Braille can be represented either as

125 15 123 123 135

or

-**-*- --*-*- ***--- ***--- *-*-*-

Both forms of Braille description can be freely mixed within the same line.
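The two notations can be converted mechanically. In the Python sketch below, the column order of the dash-and-asterisk form (dots 3, 2, 1 for the left hand, then 4, 5, 6 for the right) is an assumption inferred from the examples on this page, such as Dot 1,2,5 being written -**-*-. The function names are hypothetical.

```python
# Column order of the dash/asterisk notation, inferred from the examples on this
# page: left-hand keys for dots 3, 2, 1, then right-hand keys for dots 4, 5, 6.
KEY_ORDER = (3, 2, 1, 4, 5, 6)


def dots_to_keys(dots: str) -> str:
    """'125' -> '-**-*-' (asterisk = key pressed, dash = key released)."""
    pressed = {int(d) for d in dots}
    return "".join("*" if dot in pressed else "-" for dot in KEY_ORDER)


def keys_to_dots(keys: str) -> str:
    """'-**-*-' -> '125'; an all-dash cell (a blank) gives ''."""
    return "".join(sorted(str(dot) for dot, key in zip(KEY_ORDER, keys) if key == "*"))


def normalize(cells):
    """Convert a line that mixes both notations into dot numbers only."""
    return [keys_to_dots(c) if set(c) <= {"*", "-"} else c for c in cells]


hello = "125 15 123 123 135".split()
print(" ".join(dots_to_keys(c) for c in hello))  # -**-*- --*-*- ***--- ***--- *-*-*-
print(normalize(["125", "--*-*-", "***---"]))    # ['125', '15', '123']
```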

In the examples above, we have seen that the left-hand part of any LOUIS rule is always surrounded by special delimiters. These characters will be referred to as the "context delimiters". In the LOUIS examples shown here, we will assume that the context delimiters are vertical bar characters: |.

These context delimiters surround the whole contextual part of a rule, so that BrailleMaster knows where this part begins and where it ends.

Within each contextual part, there are two more delimiters, referred to as Text Delimiters. The symbols we will be using here are curly brackets (braces): { and }. The text they enclose is always replaced by the Braille code on the right.

When processing a text file, the LOUIS translator scans the rule lines one by one, comparing the original text to the entire Rule Context. When a match is found, only that portion of the original text which is matched by the Rule Text is converted to the Braille code which is specified on the right-hand side of the Rule.

Let us have another example to illustrate why we need the separate concepts of "Rule Text" and "Rule Context":

Firstly, let us establish the rule for contraction "one". According to current English Grade 2 Braille rules, "one" should be Brailled as Dot 5, Dot 1,3,5. The rule should look like this:

|{ONE}| ----*- *-*-*-

or like this:

|{ONE}| 5 135

In these two rules, there is obviously no need to separate Text and Context.

However, the contraction "one" should not be used in words such as "colonel", "anemone", etc. To prevent the "one" in "colonel" from being contracted, the following rule should precede the original one:

|COL{O}NEL| *-*-*-

This means that the "o" will be translated as Dot 1,3,5, after which the translator leaves the "o" rules and moves on to the next character, "n".

Recall that the text is scanned from left to right, character by character, using an imaginary translation "cursor". When the translation starts, the cursor points at the first text character and a scan through the rules begins. As soon as a match is found, the appropriate Braille code is generated and the cursor is placed after the converted portion of the text, i.e. after the portion matched by the text part of the rule.
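The cursor-driven scan can be sketched in Python as follows, using the "colonel" and "one" rules above plus single-letter fallback rules (the dot numbers for c, e, l and n are the standard Braille letters). This is an illustrative sketch, not BrailleMaster's implementation: the rules are kept in one flat list rather than under Rule Headers, only literal characters are compared, Braille cells are written as dot numbers, and rules with an empty Rule Text (the Composition Rules described later) are not supported, because they would not move the cursor.

```python
def parse_rule(line: str):
    """Split '|LEFT{TEXT}RIGHT| BRAILLE' into (left, text, right, cells)."""
    start = line.index("|")
    end = line.index("|", start + 1)
    left, rest = line[start + 1:end].split("{", 1)
    text, right = rest.split("}", 1)
    return left, text, right, line[end + 1:].split()


def matches(rule, source: str, pos: int) -> bool:
    """Literal match of the whole Rule Context, with the Rule Text at pos."""
    left, text, right, _ = rule
    start = pos - len(left)
    context = left + text + right
    return start >= 0 and source[start:start + len(context)] == context


def translate(rules, source: str):
    """Scan the rules top to bottom at each cursor position; first match wins."""
    source = source.upper()
    cells, cursor = [], 0
    while cursor < len(source):
        for rule in rules:
            if matches(rule, source, cursor):
                cells.extend(rule[3])
                cursor += len(rule[1])  # move past the Rule Text only (never empty here)
                break
        else:
            cursor += 1  # no rule for this character: it is ignored
    return cells


RULE_LINES = [
    "|COL{O}NEL| 135",  # must precede the general "one" rule
    "|{ONE}| 5 135",
    "|{C}| 14",
    "|{E}| 15",
    "|{L}| 123",
    "|{N}| 1345",
    "|{O}| 135",
]
RULES = [parse_rule(r) for r in RULE_LINES]

print(translate(RULES, "one"))      # ['5', '135']
print(translate(RULES, "colonel"))  # ['14', '135', '123', '135', '1345', '15', '123']
```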

Let us have yet another example, in which we will introduce another concept: say we are converting the word "can" into Braille. According to Standard English Grade 2 Braille rules, the word "can" needs to be converted to a single Braille code, Dot 1,4. To implement this particular rule, the Context Part of the Rule Definition could look like this:

| {CAN} |

The Text Part contains the whole word, since the whole word is going to be replaced if the characters C, A and N are matched. The Rule Context, however, also contains a space character on each side of the Rule Text. This is consistent with the English Grade 2 Braille requirements, which specify that only a stand-alone word should be translated in this way. Now consider the word "cannot": while its first three letters match the above Rule Text, the whole word does not match the Rule Context, so the rule fails for this word, which is precisely what we need.

The Braille part of a LOUIS rule can be specified in two ways: By a mixed sequence of dashes and asterisks, which resembles the layout of a Braille keyboard, or by dot numbers.

Using the first method, a dash symbolizes a released key, while an asterisk represents a key pressed down.

The code for "can" is Dot 1,4, therefore the Braille Part would look like this:

--**--

or

14

The complete Rule Line for the "can" rule would then be:

| {CAN} | --**--

or

| {CAN} | 14

However, within a sentence, the word "can" can also be terminated by a punctuation mark rather than a space. It can also be preceded by a punctuation mark if the word is at the start of a line and the previous line ends with a punctuation mark.

To make the above rule reliable, we have to replace the space characters in the Context Part by general punctuation mark symbols. These symbols are similar in concept to wild cards: any punctuation mark (including space) in the converted text will be matched against them.

Let's assume that the wild card for a punctuation mark is ~ (usually called "tilde"). So, for the purpose of Braille rule translation, a tilde will represent any punctuation mark.

The correct and final version of the "can" rule will then be:

|~{CAN}~| --**--

or

|~{CAN}~| 14
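A sketch of how the wildcard symbols might be matched is shown below, in Python. It assumes that ~ matches any character that is neither a letter nor a digit, and also the position before the start or after the end of the text, which is what lets such a rule work at the edges of a line; # and @ are assumed to match a digit and a letter respectively. The function names are hypothetical.

```python
def symbol_matches(symbol, ch):
    """Compare one context symbol with one text character (None = outside the text)."""
    if ch is None:
        return symbol == "~"             # assumption: a text boundary counts as punctuation
    if symbol == "~":                    # any punctuation mark, including space
        return not ch.isalnum()
    if symbol == "#":                    # any digit
        return ch.isdigit()
    if symbol == "@":                    # any letter
        return ch.isalpha()
    return symbol.upper() == ch.upper()  # literal character


def context_matches(left, text, right, source, pos):
    """True if LEFT{TEXT}RIGHT matches with the Rule Text starting at source[pos]."""
    def at(i):
        return source[i] if 0 <= i < len(source) else None

    # The left context is compared backwards from the cursor.
    left_ok = all(symbol_matches(s, at(pos - 1 - k)) for k, s in enumerate(reversed(left)))
    # The Rule Text itself must lie inside the text.
    text_ok = all(at(pos + k) is not None and symbol_matches(s, at(pos + k))
                  for k, s in enumerate(text))
    end = pos + len(text)
    right_ok = all(symbol_matches(s, at(end + k)) for k, s in enumerate(right))
    return left_ok and text_ok and right_ok


# The final "can" rule, |~{CAN}~| 14
print(context_matches("~", "CAN", "~", "I can.", 2))   # True
print(context_matches("~", "CAN", "~", "cannot", 0))   # False
print(context_matches("~", "CAN", "~", "Can we?", 0))  # True
```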

Composition Rules

The Composition Rules are exceptional because their Text Part is empty. The Text Delimiters still exist but there is no text between them. This is because the Composition Rules do not replace any text by a Braille code. This task is left to the other rules. Instead, the Composition Rules look at transitions between parts of text.

The Composition Rules define relationships between various groups of characters as a whole. This relationship in Braille is usually specified by extra Braille codes inserted into text.

For example, the Composition Rules may be used to create a rule for each whole number to be preceded by a number sign, but not each digit within that number. Or a rule that each upper case character should be preceded by a capital sign - but not each upper case character within a whole word composed of upper case letters.

To demonstrate the usage of Composition Rules, let us implement the number sign convention for Standard English Grade 2 Braille. Let us start from the simplest requirement, that each number should be preceded by a number sign:

Let us assume that the generalized LOUIS symbol for a digit is # (a "hash" symbol). A rule implementing this requirement will look like this:

| {}#| *--***

(A Number Sign in Grade 2 Braille is Dot 3,4,5,6.)

We also need to cater for situations where a number immediately follows a letter. Naturally, we do want a Number Sign there:

|@{}#| *--***

Note that we have used "@" as a generalized LOUIS symbol for a letter.

If a punctuation mark, preceded by a letter, precedes a number, we also want a number sign there. The following rule will cater for this requirement:

|@~{}#| *--***
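Composition Rules can be pictured as checks on the boundary in front of each cursor position: when such a rule matches, its Braille code is emitted but the cursor does not move. The Python sketch below applies the three number-sign rules above under that assumption. The wildcard matching is also assumed (~ for punctuation, space or a text boundary, # for a digit, @ for a letter), and the hypothetical `number_sign_positions` only reports where the number sign would be inserted.

```python
def symbol_matches(symbol, ch):
    """Compare one context symbol with one text character (None = outside the text)."""
    if ch is None:
        return symbol == "~"             # assumption: a text boundary counts as punctuation
    if symbol == "~":
        return not ch.isalnum()
    if symbol == "#":
        return ch.isdigit()
    if symbol == "@":
        return ch.isalpha()
    return symbol.upper() == ch.upper()  # literal, e.g. the space in | {}#|


def boundary_matches(left, right, source, pos):
    """Does a Composition Rule LEFT{}RIGHT apply at the boundary before source[pos]?"""
    def at(i):
        return source[i] if 0 <= i < len(source) else None
    return (all(symbol_matches(s, at(pos - 1 - k)) for k, s in enumerate(reversed(left)))
            and all(symbol_matches(s, at(pos + k)) for k, s in enumerate(right)))


NUMBER_SIGN = "*--***"  # Dot 3,4,5,6
# | {}#|, |@{}#| and |@~{}#| written as (left context, right context) pairs
COMPOSITION_RULES = [(" ", "#"), ("@", "#"), ("@~", "#")]


def number_sign_positions(source):
    """Cursor positions at which NUMBER_SIGN would be emitted."""
    return [pos for pos in range(len(source))
            if any(boundary_matches(l, r, source, pos) for l, r in COMPOSITION_RULES)]


print(number_sign_positions("page 12"))  # [5]: before "12" only, not before "2"
print(number_sign_positions("A1"))       # [1]
print(number_sign_positions("x,3"))      # [2]
print(number_sign_positions("12"))       # []
```

The last case shows a gap in these three rules taken on their own: a number at the very start of the text gets no number sign, because | {}#| asks for a literal space in front of the number.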

But what happens if a number follows a dot, which follows a space? This is obviously a decimal number, such as .5, and according to Standard English Braille it must be translated as a number sign, followed by a special sign for the decimal point, followed by the number.

The number sign must precede the sign for the dot. There is no obvious way for the Composition Rules themselves to swap the number sign and the dot sign. Indeed, what we need to do is replace the dot with a decimal point code.

Since we are talking about replacement, we must move away from the Composition Rules. And since we are replacing a punctuation mark, in our case a dot, we have to add an extra rule to the Punctuation Rules. A rule catering for decimal numbers that start with a decimal point would read like this:

| {.}#| *--*** -*----

This rule should be included in the Punctuation Rules, preceding a general rule for a dot.

Space Rules

Space Rules represent the first rule group of Non-Letter rules.

In practically all Braille codes in use, a space is represented by a gap between Braille symbols. In LOUIS, this is expressed by a sequence of six dashes. The simplest Space Rule will therefore be:

|{ }| ------

However, in Standard English Grade 2 Braille, a space is sometimes suppressed between certain words, such as between the words "and", "for", "of", "the", "with", etc.

Say we wish to implement the rule that a space between "and" and "for" will be suppressed. This is achieved by including the following rule into the Space Rules:

|~AND{ }FOR~|

The Braille part of this Rule Line is left blank: the space is replaced by no Braille code at all. Also note that the whole context of the words "and" and "for" is surrounded by default LOUIS symbols for punctuation marks. This allows the rule to operate even when this context is found at the start or end of a sentence, or at the start or end of a line.

The |~AND{ }FOR~| rule must precede the |{ }| ------ rule, which, being the most general rule, must be located at the very end of the Space Rules.

Punctuation Rules

The Punctuation Rules can be very simple if a particular punctuation mark is always replaced by a particular Braille code. The situation gets more complicated if there are context-dependent ambiguities.

For example, the rule for a colon is very straightforward indeed:

|{:}| -*--*-

The comma, however, is slightly more complicated:

|#{,}###~| *-----
|{,}| -*----

The second, general rule accommodates any context that does not satisfy the first rule.

The first rule examines the text surrounding the comma. If the comma is immediately preceded by a digit and immediately followed by exactly three digits, it is interpreted as the comma dividing thousands in a long number and translated accordingly. The requirement of exactly three digits is enforced by the general punctuation symbol that follows the three general digit symbols: in the actual text, it matches any punctuation mark (for example another comma, or a dot at the end of the sentence), but not another digit. Space is also considered to be a punctuation mark.
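The two comma rules can be tried out with the Python sketch below. The wildcard matching is an assumption (~ matches any non-alphanumeric character or a text boundary, # any digit), and `comma_cell` is a hypothetical helper that returns the Braille cell chosen for the comma at a given position, scanning the rules in Rule File order.

```python
def symbol_matches(symbol, ch):
    """Compare one context symbol with one text character (None = outside the text)."""
    if ch is None:
        return symbol == "~"             # assumption: a text boundary counts as punctuation
    if symbol == "~":                    # any punctuation mark, including space
        return not ch.isalnum()
    if symbol == "#":                    # any digit
        return ch.isdigit()
    if symbol == "@":                    # any letter
        return ch.isalpha()
    return symbol.upper() == ch.upper()


def context_matches(left, text, right, source, pos):
    def at(i):
        return source[i] if 0 <= i < len(source) else None
    end = pos + len(text)
    return (all(symbol_matches(s, at(pos - 1 - k)) for k, s in enumerate(reversed(left)))
            and all(at(pos + k) is not None and symbol_matches(s, at(pos + k))
                    for k, s in enumerate(text))
            and all(symbol_matches(s, at(end + k)) for k, s in enumerate(right)))


# The two comma rules in Rule File order: specific first, general last.
COMMA_RULES = [
    ("#", ",", "###~", "*-----"),  # |#{,}###~| *-----  comma dividing thousands
    ("", ",", "", "-*----"),       # |{,}| -*----       any other comma
]


def comma_cell(source, pos):
    for left, text, right, cell in COMMA_RULES:
        if context_matches(left, text, right, source, pos):
            return cell
    return None


print(comma_cell("1,000.", 1))    # *-----
print(comma_cell("1,0000", 1))    # -*----  (four digits follow the comma)
print(comma_cell("yes, 100", 3))  # -*----  (no digit before the comma)
```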

The representation of punctuation marks in Braille can sometimes lead to interesting situations that require the co-operation of Punctuation Rules and Composition Rules in order to provide a correct translation. An example of this is the per cent sign: in Standard English Grade 2 Braille, the per cent sign is shown as the Braille code Dot 2,5 followed by the code for "p", placed before the actual number. This requirement is satisfied by using the Composition Rule

| {}\%| -*--*- ****-- *--***

and the Punctuation Rule

|{%}|

The first rule instructs the translator to precede any number that is followed by a per cent sign with three Braille codes, namely Dot 2,5; Dot 1,2,3,4; and Dot 3,4,5,6. The last one is the number sign. Also note that the character \ has been used as a wild-card symbol for any decimal number.

The co-operating Punctuation Rule has no Braille part, so the per cent sign itself produces no Braille code. The rule simply means that the Punctuation Rules disregard the per cent sign, since it has already been taken care of by the Composition Rules.

Letter Rules

The Letter Rules are divided into groups, one for each letter of the alphabet. The first letter of the Rule Text of every rule in a group must match the letter in that group's Rule Header.

There are a few simple points to remember when constructing Braille rules with LOUIS:

1. The sequence of the rule lines is important. For example, "bb" in the middle of a word is defined as Dot 2,3 in Standard English Grade 2 Braille. However, the "bb" in the word "babble" should NOT be translated using the Dot 2,3 code. This is because the contraction for the following "ble" is preferred. To cater for this situation, the general rule for "bb" has to be preceded by a specific rule for "babble":

|BA{B}BLE| -**---
|@{BB}@| **----

Note that it is sufficient to replace only the first "b" of the "bb" in "babble": The second "b" will take care of itself, since after the replacement of the first "b", the translation cursor will be pointing at the second "b". A new scan through the rules will be made, during which the |@{BB}@| rule will not apply any more, however a rule for "ble" will be found.

2. Every group of Letter Rules must be terminated by a general character rule, which represents a "fall back" position if all preceding rules fail. So that this last rule can never fail, it must not have any left or right context. For example, the letter "A" rules in Standard English Grade 2 Braille must terminate with

|{A}| --*---

Similarly, the letter "B" rules have to terminate with

|{B}| -**---

etc.

3. The first character of the Rule Text of all rules that follow a particular Rule Header (up to the next Rule Header or the end of the Rule File), must be identical and match that Rule Header.
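The "babble" case from point 1 can be traced with the Python sketch below, which scans a flat list of rules with a moving cursor. Everything here is illustrative: the wildcard matching is assumed (@ matches any letter, ~ any non-alphanumeric character or a text boundary), the "ble" rule and its Dot 3,4,5,6 cell are assumptions since that rule is not given above, and the single-letter rules for l and e use the standard letter cells. Removing the specific rule shows what goes wrong without it.

```python
def symbol_matches(symbol, ch):
    """Compare one context symbol with one text character (None = outside the text)."""
    if ch is None:
        return symbol == "~"             # assumption: a text boundary counts as punctuation
    if symbol == "~":
        return not ch.isalnum()
    if symbol == "@":                    # any letter
        return ch.isalpha()
    if symbol == "#":                    # any digit
        return ch.isdigit()
    return symbol.upper() == ch.upper()


def context_matches(left, text, right, source, pos):
    def at(i):
        return source[i] if 0 <= i < len(source) else None
    end = pos + len(text)
    return (all(symbol_matches(s, at(pos - 1 - k)) for k, s in enumerate(reversed(left)))
            and all(at(pos + k) is not None and symbol_matches(s, at(pos + k))
                    for k, s in enumerate(text))
            and all(symbol_matches(s, at(end + k)) for k, s in enumerate(right)))


def translate(rules, source):
    cells, cursor = [], 0
    while cursor < len(source):
        for left, text, right, braille in rules:
            if context_matches(left, text, right, source, cursor):
                cells.extend(braille)
                cursor += len(text)
                break
        else:
            cursor += 1  # no matching rule: the character is ignored
    return cells


RULES = [
    ("BA", "B", "BLE", ["-**---"]),  # |BA{B}BLE| -**---
    ("@", "BB", "@", ["**----"]),    # |@{BB}@| **----
    ("", "BLE", "", ["*--***"]),     # hypothetical "ble" rule, Dot 3,4,5,6 assumed
    ("", "B", "", ["-**---"]),       # |{B}| -**---  (fallback)
    ("", "A", "", ["--*---"]),       # |{A}| --*---
    ("", "L", "", ["***---"]),       # Dot 1,2,3
    ("", "E", "", ["--*-*-"]),       # Dot 1,5
]

print(translate(RULES, "babble"))      # b, a, b, ble
print(translate(RULES[1:], "babble"))  # b, a, bb, l, e  (specific rule removed)
```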

A complete set of Braille rules constructed in LOUIS, usable for English Grade 2 conversion, can be downloaded from this site.
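Points 2 and 3 above lend themselves to an automatic check. The hypothetical `check_letter_rules` function below sketches one in Python: it collects the rules under each single-letter Rule Header, then reports any Rule Text that does not start with the header's letter and any group that does not end with a context-free fallback rule. It assumes the default delimiters | { } and skips comment and blank lines.

```python
def parse_context(line):
    """Return (left, text, right) from '|LEFT{TEXT}RIGHT| ...'."""
    start = line.index("|")
    end = line.index("|", start + 1)
    left, rest = line[start + 1:end].split("{", 1)
    text, right = rest.split("}", 1)
    return left, text, right


def check_letter_rules(lines):
    """Return a list of problems found in the Letter Rules of a Rule File."""
    groups, current = {}, None
    for line in lines:
        s = line.strip()
        if not s or s.startswith(";"):
            continue                                    # blank or comment line
        if len(s) == 6 and s[1:] == "-rule":
            current = s[0] if s[0].isalpha() else None  # "--rule" is not a letter group
            if current:
                groups[current] = []
            continue
        if current:
            groups[current].append(parse_context(s))

    problems = []
    for letter, rules in groups.items():
        for left, text, right in rules:
            if not text.upper().startswith(letter.upper()):
                problems.append(f"{letter}-rule: Rule Text {text!r} does not start with {letter}")
        if not rules or rules[-1] != ("", letter.upper(), ""):
            problems.append(f"{letter}-rule: group does not end with |{{{letter}}}|")
    return problems


print(check_letter_rules(["B-rule", "|{B}| -**---", "|@{BB}@| **----"]))
# ['B-rule: group does not end with |{B}|']
```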

Advanced LOUIS

This chapter describes advanced features of LOUIS symbolic language. This information will be of interest to those users who wish to develop serious Braille applications beyond simple modifications of the factory-supplied rules.

User Definable Symbol Characters

In the previous text, we have been using default delimiters and wild-card symbols, such as |, {, @, #, etc.

LOUIS makes it possible to override these symbols with user-defined symbols. Any character from the standard ANSI or IBM character set can be used, except character 255.

The definition of LOUIS symbols must be done at the start of the Rule file, using Symbol Definition instructions. The syntax of a symbol definition instruction is as follows:

Symbol Name = Character Code

The Character Code is simply an ASCII character enclosed between single quotes.

The following reserved symbol names can be used to replace the default symbol definitions in basic LOUIS:

SymDel Context Delimiter (| used so far)
SymTxl Left Text Delimiter ({ used so far)
SymTxr Right Text Delimiter (} used so far)
SymPun Punctuation (~ used so far)
SymDig Digit (# used so far)
SymNum Number (\ used so far)
SymLet Letter (@ used so far)
SymUcl Upper Case Letter (^ used so far)
SymLcl Lower Case Letter (not used yet)

For example, in default LOUIS symbology, the rule for the word "can" is

|~{CAN}~| --**--

Now let us add the following symbol definitions to the start of the rule file:

SymDel='\'
SymTxl='['
SymTxr=']'
SymPun='#'

After this, the equivalent rule will look like this:

\#[CAN]#\ --**--
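A Python sketch of how Symbol Definition lines might be read, and a rule rewritten back into the default symbols, follows. The names `read_symbols` and `to_default_notation` are hypothetical. Note that in the example above '\' and '#' are also the default Number and Digit symbols; the sketch assumes that a redefined symbol takes over its character, so only the redefined symbols are mapped back.

```python
import re

# Default symbols of basic LOUIS, as listed above (SymLcl has no default).
DEFAULTS = {
    "SymDel": "|", "SymTxl": "{", "SymTxr": "}", "SymPun": "~",
    "SymDig": "#", "SymNum": "\\", "SymLet": "@", "SymUcl": "^",
}
SYMBOL_DEFINITION = re.compile(r"^(Sym\w+)\s*=\s*'(.)'\s*$")


def read_symbols(lines):
    """Apply Symbol Definition lines such as SymTxl='[' on top of the defaults."""
    symbols = dict(DEFAULTS)
    for line in lines:
        m = SYMBOL_DEFINITION.match(line.strip())
        if m and m.group(1) in DEFAULTS:
            symbols[m.group(1)] = m.group(2)
    return symbols


def to_default_notation(rule, symbols):
    """Rewrite a rule written with redefined symbols into the default symbols.

    Only redefined symbols are mapped, so a character taken over by a
    redefinition is read with its new meaning."""
    table = {symbols[name]: DEFAULTS[name]
             for name in DEFAULTS if symbols[name] != DEFAULTS[name]}
    return rule.translate(str.maketrans(table))


header = ["SymDel='\\'", "SymTxl='['", "SymTxr=']'", "SymPun='#'"]
symbols = read_symbols(header)
print(to_default_notation("\\#[CAN]#\\ --**--", symbols))  # |~{CAN}~| --**--
```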

Word Repetition

For BrailleMaster to be the ultimate Braille production tool for most languages, the rules must be flexible enough to generate perfect Braille code even in various "exotic" Braille alphabets.

A special facility exists in LOUIS to incorporate contractions of repetition words, which occur especially in Malaysian and Indonesian Braille. This option is not activated in basic LOUIS and must be defined at the start of the Rule File, using the reserved symbol name SymRpt, for example like this:

SymRpt=''

A rule incorporating the Word Repetition symbol can then be included in the Punctuation Rules:

|{-}| ******

When a repetition word is encountered in the text, for example "orang-orang", which means "people" in the Malay language, the translator will convert the word "orang" only once, append a "repetition sign" to it (Dot 1,2,3,4,5,6, as defined in the above rule) and reposition the translation cursor to the end of the repetition word (i.e. not just after the hyphen, as it normally would). The hyphen between the two identical words will be disregarded.
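The behaviour described above can be sketched in Python with a regular expression that finds a run of letters, a hyphen and the same run again. How BrailleMaster detects repetition words internally is not described here, so this is only one plausible approach. The callback `translate_word` is a hypothetical stand-in for the normal word translation, and the repetition sign is the Dot 1,2,3,4,5,6 cell from the rule above.

```python
import re

REPETITION_SIGN = "******"  # Dot 1,2,3,4,5,6, from the |{-}| rule above
# A run of letters, a hyphen, then the same run again, ending at a word boundary.
REPEATED_WORD = re.compile(r"([^\W\d_]+)-\1\b", re.IGNORECASE)


def translate_repetition(source, cursor, translate_word):
    """If a repetition word starts at the cursor, return (cells, new_cursor).

    The word is translated once, the repetition sign is appended, and the
    cursor moves to the end of the whole repetition word."""
    m = REPEATED_WORD.match(source, cursor)
    if m is None:
        return None  # not a repetition word: normal rules apply
    return translate_word(m.group(1)) + [REPETITION_SIGN], m.end()


def spell(word):
    """Stand-in for real word translation: one placeholder cell per letter."""
    return list(word.upper())


print(translate_repetition("orang-orang datang", 0, spell))
# (['O', 'R', 'A', 'N', 'G', '******'], 11)
print(translate_repetition("orang-utan", 0, spell))  # None
```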

User Definable Groups

This facility makes it possible to define not only symbol characters, but also groups of text characters or words represented by them.

For example, the fact that a space is disregarded between words "and", "for", "of", "the", "with" and "a" in English Grade 2 Braille can be described as the following sequence of seventeen rules:

|~AND{ }A~|
|~AND{ }FOR~|
|~AND{ }OF~|
|~AND{ }THE~|
|~AND{ }WITH~|
|~FOR{ }A~|
|~FOR{ }THE~|
|~FOR{ }WITH~|
|~FOR{ }OF~|
|~OF{ }A~|
|~OF{ }THE~|
|~OF{ }WITH~|
|~OF{ }FOR~|
|~WITH{ }A~|
|~WITH{ }THE~|
|~WITH{ }OF~|
|~WITH{ }FOR~|

User Definable Groups can simplify these rules substantially.

There are five user-definable groups, defined by the reserved symbols SymUs1, SymUs2, SymUs3, SymUs4 and SymUs5.

Let us simplify the above group of rules by defining a group, for example like this, using the character _ (Alt 225) as a symbol:

SymUs1='_': 'and', 'for', 'of', 'the', 'with', 'a'

The following single rule will then replace all the previous seventeen:

|~_{ }_~|

The general syntax of the User Definable Group definition is as follows:

Symbol Name = Character Code : Group

A "character code" is a character surrounded by single quotes. A "group" is a sequence of words enclosed by single quotes and separated by commas.

There is a beneficial side-effect of using user-defined groups: since the speed of translation depends on the number of rules, reducing the number of rules usually increases the translation speed. This applies especially if the reduction occurs in the Composition and Punctuation Rules. (The Composition Rules add the most time to the translation, since they have to be scanned at each position of the translation cursor, checking for boundaries between words, digits, etc.)

Note that there is a limit on the total number of characters in each user-defined group: it may not exceed 255 (not counting the separating commas and quotation marks). If the total number of characters is greater, BrailleMaster reports an error during the rule initialization process.
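The group syntax and its 255-character limit can be sketched in Python as follows. `parse_group` and `expand` are hypothetical helpers, and `expand` assumes that each occurrence of a group symbol in a rule may stand for any member of the group independently. Under that assumption, the single rule |~_{ }_~| corresponds to 36 plain rules, a superset of the seventeen listed earlier (it also covers, for example, "the" followed by "and").

```python
import re
from itertools import product

GROUP_DEFINITION = re.compile(r"^(SymUs[1-5])\s*=\s*'(.)'\s*:\s*(.+)$")
MAX_GROUP_CHARS = 255  # limit stated above; commas and quotes are not counted


def parse_group(line):
    """Parse  SymUsN = 'c' : 'word', 'word', ...  into (name, symbol, words)."""
    m = GROUP_DEFINITION.match(line.strip())
    if m is None:
        raise ValueError(f"not a group definition: {line!r}")
    name, symbol, members = m.groups()
    words = re.findall(r"'([^']*)'", members)
    total = sum(len(w) for w in words)
    if total > MAX_GROUP_CHARS:
        raise ValueError(f"{name}: {total} characters, limit is {MAX_GROUP_CHARS}")
    return name, symbol, [w.upper() for w in words]


def expand(rule, symbol, words):
    """List the plain rules a group rule stands for, one per combination of words."""
    pieces = rule.split(symbol)
    expanded = []
    for combo in product(words, repeat=len(pieces) - 1):
        out = pieces[0]
        for word, piece in zip(combo, pieces[1:]):
            out += word + piece
        expanded.append(out)
    return expanded


name, symbol, words = parse_group("SymUs1='_': 'and', 'for', 'of', 'the', 'with', 'a'")
rules = expand("|~_{ }_~|", symbol, words)
print(len(rules), rules[:2])  # 36 ['|~AND{ }AND~|', '|~AND{ }FOR~|']
```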



Copyright © 2009 Robotron Group