mdbFLT - Font Layout Table
For simple scripts, the rendering engine converts character codes into glyph
codes one by one by consulting the encoding of each selected font. However, to
render text that requires complicated layout (e.g. Thai and Indic scripts),
one-to-one conversion is not sufficient. A sequence of characters may have to
be drawn as a single ligature. Some glyphs may have to be drawn at positions
shifted in two dimensions.
To handle those complicated scripts, the m17n library uses Font Layout Tables
(FLTs for short). The FLT driver interprets an FLT and converts a character
sequence into a glyph sequence that is ready to be passed to the rendering
engine.
An FLT can contain information to extract a grapheme cluster from a character
sequence and to reorder the characters in the cluster, in addition to
information found in OpenType Layout Tables (CMAP, GSUB, and GPOS).
An FLT is a cascade of one or more conversion stages. In each stage, a sequence
is converted into another sequence to be read in the next stage. The length of
sequences may differ from stage to stage. Each element in a sequence has the
following integer attributes:
- code : In the first conversion stage, this is the character code in the
  original character sequence. In the last stage, it is the glyph code passed
  to the rendering engine. In other cases, it is an intermediate glyph code.
- category : The category code defined in the CATEGORY-TABLE of the current
  stage, or defined in one of the former stages and not overwritten by later
  stages.
- combining-spec : If nonzero, it specifies how to combine this (intermediate)
  glyph with the previous one.
- left-padding-flag : If nonzero, it instructs the rendering function to insert
  a padding space before this (intermediate) glyph so that the glyph does not
  overlap with the previous one.
- right-padding-flag : If nonzero, it instructs the rendering function to
  insert a padding space after this (intermediate) glyph so that the glyph does
  not overlap with the next one.
When the layout engine draws text, it first determines a font and an FLT for
each character in the text. For each subsequence of characters that use the
same font and FLT, the layout engine generates a corresponding intermediate
glyph sequence. The code attribute of each element in the intermediate glyph
sequence is its character code, and all other attributes are zero. This
sequence is processed in the first stage of the FLT as the current run.
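As an illustration only, the initial sequence handed to the first stage can be sketched in Python. The Glyph class and the initial_run function are hypothetical names chosen for this sketch; they are not part of the m17n library API, and the value 0 for "no category" is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    """One element of an FLT sequence (hypothetical model, not the m17n C struct)."""
    code: int                    # character code, intermediate or final glyph code
    category: int = 0            # 0 is assumed here to mean "no category yet"
    combining_spec: int = 0
    left_padding_flag: int = 0
    right_padding_flag: int = 0


def initial_run(text: str) -> list[Glyph]:
    """Build the first-stage input: code = character code, all else zero."""
    return [Glyph(code=ord(ch)) for ch in text]


run = initial_run("\u0e01\u0e34")   # two Thai characters
print([hex(g.code) for g in run])   # ['0xe01', '0xe34']
```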
Each stage works as follows.
First, if the stage has a CATEGORY-TABLE, the category of each glyph in the
current run is updated. If there is a glyph that has no category, the current
run ends before that glyph.
Then, the default values of code-offset, combining-spec, and left-padding-flag
of this stage are initialized to zero.
Next, the initial conversion rule of the stage is applied to the current run.
Lastly, the current run is replaced with the newly produced (intermediate) glyph
sequence.
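The first step of a stage can be sketched as follows. This is a minimal Python model, not the library's implementation: update_categories is a hypothetical helper, the category table is assumed to be a plain dict, and 0 is assumed to stand for "no category".

```python
def update_categories(codes, categories, table):
    """Apply a stage's category table and return the length of the current run.

    codes      -- glyph codes of the sequence
    categories -- category per glyph (0 = none), updated in place
    table      -- dict mapping glyph code -> category code (assumed form)
    """
    for i, code in enumerate(codes):
        if code in table:
            categories[i] = table[code]
    # The current run ends before the first glyph that still has no category.
    for i, category in enumerate(categories):
        if category == 0:
            return i
    return len(codes)


codes = [0x0E01, 0x0E34, 0x41]
cats = [0, 0, 0]
table = {0x0E01: ord("C"), 0x0E34: ord("V")}   # invented categories
run_length = update_categories(codes, cats, table)
print(run_length)   # 2
```

In this example the third glyph has no category, so only the first two glyphs form the current run.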
The m17n library loads an FLT from the m17n database using the tag <font,
layouter, FLT-NAME>. The data format of an FLT is as follows:
FONT-LAYOUT-TABLE ::= FLT-DECLARATION ? STAGE0 STAGE *
FLT-DECLARATION ::= '(' 'font' 'layouter' FLT-NAME nil PROP * ')'
FLT-NAME ::= SYMBOL
PROP ::= VERSION | FONT
VERSION ::= '(' 'version' MTEXT ')'
FONT ::= '(' 'font' FONT-SPEC ')'
FONT-SPEC ::= '(' [[ FOUNDRY FAMILY
[ WEIGHT [ STYLE [ STRETCH [ ADSTYLE ]]]]]
[ OTF-SPEC ] [ LANG-SPEC ] ')'
STAGE0 ::= CATEGORY-TABLE GENERATOR
STAGE ::= CATEGORY-TABLE ? GENERATOR
CATEGORY-TABLE ::= '(' 'category' CATEGORY-SPEC + ')'
CATEGORY-SPEC ::= '(' CODE CATEGORY ')'
| '(' CODE CODE CATEGORY ')'
CODE ::= INTEGER
CATEGORY ::= INTEGER
In the definition of CATEGORY-SPEC, CODE is a glyph code, and CATEGORY is the
ASCII code of an uppercase or lowercase letter, i.e. one of 'A', ..., 'Z',
'a', ..., 'z'.
The first form of CATEGORY-SPEC assigns CATEGORY to a glyph whose code is CODE.
The second form assigns CATEGORY to glyphs whose code falls between the two
CODEs.
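The two forms of CATEGORY-SPEC can be illustrated with a small Python sketch. build_category_table is a hypothetical helper; the sketch assumes that the range form includes both end points and that a later spec overrides an earlier one, neither of which is stated above. The codes and categories in the example are invented.

```python
def build_category_table(specs):
    """Expand CATEGORY-SPECs into a dict of glyph code -> category code.

    Each spec is (CODE, CATEGORY) or (CODE, CODE, CATEGORY).
    """
    table = {}
    for spec in specs:
        if len(spec) == 2:
            first, category = spec
            last = first
        else:
            first, last, category = spec
        is_letter = (ord("A") <= category <= ord("Z")
                     or ord("a") <= category <= ord("z"))
        if not is_letter:
            raise ValueError(f"CATEGORY must be an ASCII letter code: {category}")
        for code in range(first, last + 1):   # inclusive range (assumption)
            table[code] = category
    return table


table = build_category_table([
    (0x0E01, 0x0E2E, ord("C")),   # range form
    (0x0E31, ord("V")),           # single-code form
])
print(chr(table[0x0E10]), chr(table[0x0E31]), len(table))   # C V 47
```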
GENERATOR ::= '(' 'generator' RULE MACRO-DEF * ')'
RULE ::= REGEXP-BLOCK | MATCH-BLOCK | SUBST-BLOCK | COND-BLOCK
| FONT-FACILITY-BLOCK | DIRECT-CODE | COMBINING | OTF-SPEC
| PREDEFINED-RULE | MACRO-NAME
MACRO-DEF ::= '(' MACRO-NAME RULE + ')'
Each RULE specifies glyphs to be consumed and glyphs to be produced. When some
glyphs are consumed, they are taken away from the current run. A rule may fail
under some conditions. Unless a rule is explicitly described as failing, it
should be regarded as succeeding.
DIRECT-CODE ::= INTEGER
This rule consumes no glyph and produces a glyph which has the following
attributes:
- code : INTEGER plus the default code-offset
- combining-spec : default value
- left-padding-flag : default value
- right-padding-flag : zero
After having produced the glyph, the default code-offset, combining-spec, and
left-padding-flag are all reset to zero.
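The behavior of DIRECT-CODE, including the reset of the default values, can be sketched as follows. StageState and direct_code are hypothetical names, and glyphs are modeled as plain dicts; this is an illustration of the description above, not the library's code.

```python
class StageState:
    """Default values tracked while a stage runs (hypothetical model)."""

    def __init__(self):
        self.code_offset = 0
        self.combining_spec = 0
        self.left_padding_flag = 0


def direct_code(state, integer, output):
    """DIRECT-CODE: consume nothing, produce one glyph, reset the defaults."""
    output.append({
        "code": integer + state.code_offset,
        "combining-spec": state.combining_spec,
        "left-padding-flag": state.left_padding_flag,
        "right-padding-flag": 0,
    })
    state.code_offset = 0
    state.combining_spec = 0
    state.left_padding_flag = 0
    return True   # the rule is not described as failing


state = StageState()
state.code_offset = 2          # e.g. set earlier by a SUBST-BLOCK range
state.left_padding_flag = 1    # e.g. set earlier by '['
out = []
direct_code(state, 0x61, out)
direct_code(state, 0x61, out)
print([hex(g["code"]) for g in out])   # ['0x63', '0x61']
```

The second call produces the unshifted code because the first call already reset the defaults to zero.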
PREDEFINED-RULE ::= '=' | '*' | '<' | '>' | '|' | '[' | ']'
They perform actions as follows.
- '=' : This rule consumes the first glyph in the current run and produces the
  same glyph. It fails if the current run is empty.
- '*' : This rule repeatedly executes the previous rule. If the previous rule
  fails, this rule does nothing and fails.
- '<' : This rule specifies the start of a grapheme cluster.
- '>' : This rule specifies the end of a grapheme cluster.
- '[' : This rule sets the default left-padding-flag to 1. No glyph is
  consumed. No glyph is produced.
- ']' : This rule changes the right-padding-flag of the most recently generated
  glyph to 1. No glyph is consumed. No glyph is produced.
- '|' : This rule consumes no glyph and produces a special glyph whose category
  is ' ' and whose other attributes are zero. This is the only rule that
  produces that glyph.
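A few of the predefined rules can be modeled in Python to make their effect on the current run and on the output concrete. All function names below are hypothetical, glyphs are plain dicts, and the behavior of the right-padding rule when nothing has been produced yet is not specified above, so the sketch simply does nothing in that case.

```python
def new_glyph(code, category=0):
    return {"code": code, "category": category, "combining-spec": 0,
            "left-padding-flag": 0, "right-padding-flag": 0}


def rule_copy(run, output):
    """Consume the first glyph of the current run and produce it unchanged."""
    if not run:
        return False              # fails on an empty run
    output.append(run.pop(0))
    return True


def rule_left_padding(defaults):
    """Set the default left-padding-flag; nothing consumed or produced."""
    defaults["left-padding-flag"] = 1
    return True


def rule_right_padding(output):
    """Flag the most recently produced glyph for right padding."""
    if output:                    # unspecified case: no glyph produced yet
        output[-1]["right-padding-flag"] = 1
    return True


def rule_separator(output):
    """Produce the special glyph whose category is ' '."""
    output.append(new_glyph(0, ord(" ")))
    return True


run = [new_glyph(0x0E01, ord("C"))]
output = []
defaults = {"left-padding-flag": 0}
first = rule_copy(run, output)     # True: one glyph moved to the output
second = rule_copy(run, output)    # False: the run is now empty
rule_left_padding(defaults)
rule_right_padding(output)
rule_separator(output)
print(first, second, len(output))  # True False 2
```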
REGEXP-BLOCK ::= '(' REGEXP RULE * ')'
REGEXP ::= MTEXT
MTEXT is a regular expression that should match the sequence of categories of
the current run. If a match is found, this rule executes RULEs temporarily
limiting the current run to the matched part. The matched part is consumed by
this rule.
Parenthesized subexpressions, if any, are recorded to be used in MATCH-BLOCK
that may appear in one of RULEs.
If no match is found, this rule fails.
MATCH-BLOCK ::= '(' MATCH-INDEX RULE * ')'
MATCH-INDEX ::= INTEGER
MATCH-INDEX is an integer specifying a parenthesized subexpression recorded by
the previous REGEXP-BLOCK. If such a subexpression was found by the previous
regular expression matching, this rule executes RULEs temporarily limiting the
current run to the matched part of the subexpression. The matched part is
consumed by this rule.
If no match was found, this rule fails.
If this is the first rule of the stage, MATCH-INDEX must be 0, and it matches
the whole current run.
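How a REGEXP-BLOCK and its MATCH-BLOCKs relate can be sketched with Python's re module standing in for the regular-expression engine the library actually uses. regexp_block_match is a hypothetical helper, and anchoring the match at the start of the current run is an assumption of this sketch.

```python
import re


def regexp_block_match(pattern, categories):
    """Match a pattern against the category string of the current run.

    Returns a list of (start, end) spans, or None if there is no match.
    Index 0 is the whole matched part; index N is the N-th parenthesized
    subexpression, which a MATCH-BLOCK with MATCH-INDEX N would refer to.
    """
    category_string = "".join(chr(c) for c in categories)
    m = re.match(pattern, category_string)   # anchored at the run start
    if m is None:
        return None
    return [m.span(i) for i in range(m.re.groups + 1)]


# Categories "C", "V", "C" are invented for the example.
spans = regexp_block_match(r"(C)(V?)(T?)", [ord(c) for c in "CVC"])
print(spans)   # [(0, 2), (0, 1), (1, 2), (2, 2)]
```

Here the matched part covers the first two glyphs, so those two are consumed, and the third glyph stays in the current run for later rules.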
SUBST-BLOCK ::= '(' SOURCE-PATTERN RULE * ')'
SOURCE-PATTERN ::= '(' CODE + ')'
| '(' 'range' CODE CODE ')'
If the sequence of codes of the current run matches SOURCE-PATTERN, this rule
executes RULEs temporarily limiting the current run to the matched part. The
matched part is consumed.
The first form of SOURCE-PATTERN specifies a sequence of glyph codes to be
matched. In this case, this rule resets the default code-offset to zero.
The second form specifies a range of codes that should match the first glyph
code of the code sequence. In this case, this rule sets the default
code-offset to the first glyph code minus the first CODE specifying the range.
If no match is found, this rule fails.
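The two forms of SOURCE-PATTERN and the resulting default code-offset can be sketched as follows. match_source_pattern is a hypothetical helper; the sketch assumes that matching starts at the beginning of the current run, that the range is inclusive, and that the range form consumes exactly one glyph.

```python
def match_source_pattern(run_codes, pattern):
    """Return (matched_length, code_offset), or None if there is no match.

    pattern is ("codes", [c1, c2, ...]) or ("range", first, last).
    """
    if pattern[0] == "codes":
        codes = pattern[1]
        if run_codes[:len(codes)] == codes:
            return len(codes), 0          # code-offset is reset to zero
        return None
    _, first, last = pattern
    if run_codes and first <= run_codes[0] <= last:
        return 1, run_codes[0] - first    # first glyph code minus first CODE
    return None


result = match_source_pattern([0x43, 0x44], ("range", 0x41, 0x5A))
print(result)   # (1, 2)
```

With a code-offset of 2, a following DIRECT-CODE of 0x61 would produce the code 0x63, so a single range rule can map a whole block of codes onto another block.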
FONT-FACILITY-BLOCK ::= '(' FONT-FACILITY RULE * ')'
FONT-FACILITY ::= '(' 'font-facility' CODE * ')'
| '(' 'font-facility' FONT-SPEC ')'
If the current font has glyphs for CODEs or matches FONT-SPEC, this rule
succeeds and RULEs are executed. Otherwise, this rule fails.
COND-BLOCK ::= '(' 'cond' RULE + ')'
This rule sequentially executes RULEs until one succeeds. If no rule succeeds,
this rule fails. Otherwise, it succeeds.
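The first-success behavior of COND-BLOCK amounts to the following Python sketch, in which rules are modeled as callables that return True on success. The names cond_block and make_rule are hypothetical.

```python
def cond_block(rules):
    """Run the rules in order and stop at the first one that succeeds."""
    for rule in rules:
        if rule():
            return True
    return False   # no rule succeeded


tried = []


def make_rule(name, succeeds):
    """Build a dummy rule that records that it was tried."""
    def rule():
        tried.append(name)
        return succeeds
    return rule


result = cond_block([
    make_rule("a", False),
    make_rule("b", True),
    make_rule("c", True),
])
print(result, tried)   # True ['a', 'b']
```

The third rule is never tried because the second one already succeeded.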
OTF-SPEC ::= SYMBOL
OTF-SPEC is a symbol whose name specifies an instruction to the OTF driver. The
name has the following syntax.
OTF-SPEC-NAME ::= ':otf=' SCRIPT LANGSYS ? GSUB-FEATURES ? GPOS-FEATURES ?
SCRIPT ::= SYMBOL
LANGSYS ::= '/' SYMBOL
GSUB-FEATURES ::= '=' FEATURE-LIST ?
GPOS-FEATURES ::= '+' FEATURE-LIST ?
FEATURE-LIST ::= ( SYMBOL ',' ) * [ SYMBOL | '*' ]
Each SYMBOL specifies a tag name defined in the OpenType specification.
For SCRIPT, SYMBOL specifies a Script tag name (e.g. deva for Devanagari).
For LANGSYS, SYMBOL specifies a Language System tag name. If LANGSYS is omitted,
the Default Language System table is used.
For GSUB-FEATURES, each SYMBOL in FEATURE-LIST specifies a GSUB Feature tag name
to apply. '*' is allowed as the last item to specify all remaining features.
If SYMBOL is preceded by '~' and the last item is '*', SYMBOL is excluded from
the features to apply. If no SYMBOL is specified, no GSUB feature is applied.
If GSUB-FEATURES itself is omitted, all GSUB features are applied.
When OTF-SPEC appears in a FONT-SPEC, FEATURE-LIST specifies features that the
font must have (or must not have if preceded by '~'), and a trailing '*', if
present, has no meaning.
The specification of GPOS-FEATURES is analogous to that of GSUB-FEATURES.
Please note that all the tags above must be 4 ASCII printable characters.
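The OTF-SPEC-NAME syntax can be split into its parts with a short Python sketch. parse_otf_spec is a hypothetical helper, not a library function, and it assumes that tags do not themselves contain '/', '=', '+', or ','. It distinguishes an omitted feature part (None, meaning all features apply) from an empty one (an empty list, meaning no feature applies).

```python
import re

_OTF_SPEC = re.compile(
    r"^:otf=(?P<script>[^/=+]+)"     # SCRIPT
    r"(?:/(?P<langsys>[^=+]+))?"     # optional LANGSYS
    r"(?:=(?P<gsub>[^+]*))?"         # optional GSUB-FEATURES
    r"(?:\+(?P<gpos>.*))?$"          # optional GPOS-FEATURES
)


def parse_feature_list(text):
    """None = part omitted (all features); [] = no feature applied."""
    if text is None:
        return None
    return [tag for tag in text.split(",") if tag]


def parse_otf_spec(name):
    m = _OTF_SPEC.match(name)
    if m is None:
        raise ValueError(f"not an OTF-SPEC name: {name!r}")
    return {
        "script": m.group("script"),
        "langsys": m.group("langsys"),
        "gsub": parse_feature_list(m.group("gsub")),
        "gpos": parse_feature_list(m.group("gpos")),
    }


spec = parse_otf_spec(":otf=deva=nukt,~liga,*+")
print(spec["script"], spec["gsub"], spec["gpos"])
# deva ['nukt', '~liga', '*'] []
```

In this example the GSUB part applies all features except liga, with nukt first, and the empty GPOS part applies no GPOS feature.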
See the following page for the OpenType specification.
COMBINING ::= SYMBOL
COMBINING is a symbol whose name specifies how to combine the next glyph with
the previous one. This rule sets the default combining-spec to an integer code
that is unique to the symbol name. The name has the following syntax.
COMBINING-NAME ::= VPOS HPOS OFFSET VPOS HPOS
VPOS ::= 't' | 'c' | 'b' | 'B'
HPOS ::= 'l' | 'c' | 'r'
OFFSET ::= '.' | XOFF | YOFF XOFF ?
XOFF ::= ('<' | '>') INTEGER ?
YOFF ::= ('+' | '-') INTEGER ?
VPOS and HPOS specify the vertical and horizontal positions as described below.
                                  POINT VPOS HPOS
                                  ----- ---- ----
    0----1----2 <---- top            0   t    l
    |         |                      1   t    c
    |         |                      2   t    r
    |         |                      3   B    l
    9   10   11 <---- center         4   B    c
    |         |                      5   B    r
  --3----4----5-- <-- baseline       6   b    l
    |         |                      7   b    c
    6----7----8 <---- bottom         8   b    r
                                     9   c    l
    |    |    |                     10   c    c
  left center right                 11   c    r
The left figure shows 12 reference points of a glyph by numbers 0 to 11. The
rectangle 0-6-8-2 is the bounding box of the glyph, the positions 3, 4, and 5
are on the baseline, 9-11 are on the vertical center of the box, 0-2 and 6-8
are on the top and on the bottom respectively. 1, 10, 4, and 7 are on the
horizontal center of the box.
The right table shows how those reference points are specified by a pair of VPOS
and HPOS.
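The numbering in the table is regular enough to be computed: each VPOS selects a group of three points and each HPOS selects a point within the group. The following Python sketch reproduces the table; reference_point is a hypothetical name.

```python
VPOS_ROW = {"t": 0, "B": 1, "b": 2, "c": 3}   # top, baseline, bottom, center
HPOS_COLUMN = {"l": 0, "c": 1, "r": 2}        # left, center, right


def reference_point(vpos, hpos):
    """Return the reference point number (0 to 11) for a VPOS/HPOS pair."""
    return VPOS_ROW[vpos] * 3 + HPOS_COLUMN[hpos]


print(reference_point("B", "c"), reference_point("c", "r"))   # 4 11
```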
The first VPOS and HPOS in the definition of COMBINING-NAME specify the
reference point of the previous glyph, and the second VPOS and HPOS specify
that of the next glyph. The next glyph is drawn so that these two reference
points align.
OFFSET specifies the way of alignment in detail. If it is '.', the reference
points are on the same position.
XOFF specifies how much the X position of the reference point of the next glyph
should be shifted to the left ('<') or right ('>') from the previous
reference point.
YOFF specifies how much the Y position of the reference point of the next glyph
should be shifted upward ('+') or downward ('-') from the previous reference
point.
In both cases, INTEGER is the amount of shift expressed as a percentage of the
font size, i.e., if INTEGER is 10, it means 10% (1/10) of the font size. If
INTEGER is omitted, it is assumed that 5 is specified.
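A COMBINING-NAME can be decoded with a short Python sketch. parse_combining_name is a hypothetical helper; the sign convention in its result (positive x is rightward, positive y is upward) is a choice of this sketch, not something the library defines.

```python
import re

_COMBINING = re.compile(
    r"^([tcbB])([lcr])"                    # point on the previous glyph
    r"(\.|[+-]\d*(?:[<>]\d*)?|[<>]\d*)"    # OFFSET: '.', XOFF, or YOFF XOFF?
    r"([tcbB])([lcr])$"                    # point on the next glyph
)
_SHIFT = re.compile(r"([+\-<>])(\d*)")
DEFAULT_SHIFT = 5   # percent of the font size when INTEGER is omitted


def parse_combining_name(name):
    m = _COMBINING.match(name)
    if m is None:
        raise ValueError(f"not a COMBINING-NAME: {name!r}")
    prev_v, prev_h, offset, next_v, next_h = m.groups()
    x_percent = y_percent = 0
    for sign, digits in _SHIFT.findall(offset):
        amount = int(digits) if digits else DEFAULT_SHIFT
        if sign == "+":
            y_percent = amount       # upward
        elif sign == "-":
            y_percent = -amount      # downward
        elif sign == ">":
            x_percent = amount       # to the right
        else:
            x_percent = -amount      # to the left
    return {"previous": (prev_v, prev_h), "next": (next_v, next_h),
            "x_percent": x_percent, "y_percent": y_percent}


print(parse_combining_name("tc+10>bc"))
# {'previous': ('t', 'c'), 'next': ('b', 'c'), 'x_percent': 5, 'y_percent': 10}
```

For a 20-pixel font, the 10% upward shift in this example corresponds to 2 pixels and the default 5% rightward shift to 1 pixel.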
Once the next glyph is combined with the previous one, they are treated as a
single combined glyph.
MACRO-NAME ::= SYMBOL
MACRO-NAME is a symbol that appears in one of the MACRO-DEFs. It is expanded to the
sequence of the corresponding RULEs.
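Macro expansion can be pictured as splicing the macro's RULEs in place of its name. The Python sketch below is a simplified model: expand_macros is a hypothetical helper, rules are represented as strings or integers, rules nested inside blocks are not visited, and there is no guard against a macro that refers to itself.

```python
def expand_macros(rules, macros):
    """Replace each macro name in a rule list by the macro's rules."""
    expanded = []
    for rule in rules:
        if isinstance(rule, str) and rule in macros:
            # A macro body may itself mention other macros.
            expanded.extend(expand_macros(macros[rule], macros))
        else:
            expanded.append(rule)
    return expanded


macros = {
    "padded-copy": ["[", "=", "]"],    # invented macro names
    "twice": ["padded-copy", "padded-copy"],
}
print(expand_macros(["twice", 0x41], macros))
# ['[', '=', ']', '[', '=', ']', 65]
```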
So far, it has been assumed that each sequence, which is drawn with a specific
font, is context free, i.e. not affected by the glyphs preceding or following
that sequence. This is true when sequence S1 is drawn with font F1 while the
preceding sequence S0 unconditionally requires font F0.
  sequence               S0      S1
  currently used font    F0      F1
  usable font(s)         F0      F1
Sometimes, however, a clear separation of sequences is not possible. Suppose
that the preceding sequence S0 can be drawn not only with F0 but also with F1.
  sequence               S0      S1
  currently used font    F0      F1
  usable font(s)         F0,F1   F1
In this case, glyphs used to draw the preceding S0 may affect glyph generation
of S1. Therefore it is necessary to access information about S0, which has
already been processed, when processing S1. Generation rules in the first
stage (only in the first stage) accept a special regular expression to access
already processed parts. The expression has the following form:
"RE0 RE1"
RE0 and RE1 are regular expressions that match the preceding sequence S0 and the
following sequence S1, respectively.
Pay attention to the space between the two regular expressions. It represents
the special category ' ' (see above). Note that the regular expression above
belongs to glyph generation rules using font F1, therefore not only RE1 but
also RE0 must be expressed with the categories for F1. This means that when the
preceding sequence S0 cannot be expressed with the categories for F1 (as in
the first example above), generation rules having these patterns never match.
See also: FLTs provided by the m17n database
Copyright (C) 2001 Information-technology Promotion Agency (IPA)
Copyright (C) 2001-2011 National Institute of Advanced Industrial Science and
Technology (AIST)
Permission is granted to copy, distribute and/or modify this document under the
terms of the GNU Free Documentation License