Morpho-Syntactic Analysis Framework for Tone Language Text-to-Speech Systems

This paper presents a morpho-syntactic analysis framework based on a data-driven methodology. The proposed framework complements the front-end design of a recent text-to-speech (TTS) project and is generic enough to be applied to other tone language systems. We test the design on Ibibio (ISO 639-2: nic; Ethnologue: IBB), a Lower Cross language of the (New) Benue Congo language family, widely spoken in the south-eastern region of Nigeria. Implementation shows that the design is sufficient for morpho-syntactic parsing and useful for improving prosody in TTS systems. In addition, the methodology detaches a large part of the linguistic feature specification from the program code. This makes it easy to alter utterances morphologically and to replicate the synthesizer for other languages.


Introduction
Natural language processing (NLP) is a field of computational linguistics concerned with the interactions between computers and human (natural) languages. In theory, NLP is an attractive method of human-computer interaction (HCI). Natural language understanding is sometimes referred to as an 'AI-complete' problem (Shapiro, 1992), because it seems to require extensive knowledge about the outside world and the ability to manipulate it. One of the most important reasons why the goal of NLP, i.e. a system capable of analysing, understanding and generating natural languages with precision, has not been reached is that natural languages are ambiguous. Much effort within NLP has gone into resolving ambiguity. Basic research in NLP concentrates on automatically determining the structure of written or spoken language at various linguistic levels, such as morphology, syntax, semantics or discourse. For instance, part-of-speech taggers have been used to resolve lexical ambiguities, and shallow parsers to resolve structural ambiguities.
In this paper, we focus on language analyzer construction. There are two main methodologies for building the knowledge base of a language analyzer: the linguistic approach and the data-driven approach. The linguistic approach relies on the linguist's (potentially corpus-based) abstractions about the paradigms and syntagms of the language. Distributional generalizations are manually coded as a grammar, a system of constraint rules used for discarding contextually illegitimate analyses (Voutilainen & Jarvenen, 1995; Karlsson, Voutilainen, Heikkila, & Anttila, 1995). This approach is, however, labour-intensive, as much skill and effort are required to write an exhaustive grammar. The data-driven approach automatically derives frequency-based information from corpora. The learning corpus can contain plain text, but better results seem achievable with annotated corpora (Merialdo, 1994; Elworth, 1994; Megyesi, 2002). The corpus-based information typically contains sequences of tags or words with well-known exceptions, and can be represented as neural networks (Eineborg & Gambäck, 1994; Schmid, 1994), local rules (Brill, 1992) or collocation matrices (Garside, 1987). This approach requires no human effort for rule writing and can easily be adapted to different NLP tasks such as part-of-speech (PoS) tagging and shallow parsing (Megyesi & Carlson, 2002). However, considerable effort may be required to determine a workable tag set (Cutting, 1994) and to annotate the training corpus. The data-driven approach to syntactic analysis (parsing) is a very active area of research, but relatively little has been done in applying a similar methodology to morphology (Chrupala, 2006; De Pauw & De Schryver, 2008). One reason may be that most research publications deal with English, which does not have a complex inflectional morphology. In African languages, the number of inflected word forms is far larger than in English and Chinese owing to agglutinative inflectional morphology and complex subject-verb-object person concord, which makes morphological tone assignment more difficult and leads to text corpus sparseness (Gibbon, 2001).
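The simplest instance of the data-driven approach is a most-frequent-tag baseline: frequencies are read off an annotated corpus and no rules are written by hand. The sketch below is illustrative only. The toy corpus, the tag names and the fallback tag for unseen words are hypothetical, and practical taggers (e.g. HMM-based or transformation-based ones) also exploit context.

```python
from collections import Counter, defaultdict

def train_most_frequent_tag(corpus):
    """Count how often each word carries each tag in an annotated corpus."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        for word, tag in sentence:
            counts[word][tag] += 1
    # Keep only the most frequent tag for each word.
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}

def tag(words, model, default="N"):
    """Tag each word with its most frequent training tag; unseen words get `default`."""
    return [model.get(word, default) for word in words]

# Hypothetical toy annotated corpus, for illustration only.
corpus = [
    [("the", "DET"), ("old", "ADJ"), ("man", "N"), ("walks", "V")],
    [("the", "DET"), ("crew", "N"), ("man", "V"), ("the", "DET"), ("boats", "N")],
    [("a", "DET"), ("man", "N"), ("sleeps", "V")],
]
model = train_most_frequent_tag(corpus)
print(tag(["the", "man", "runs"], model))  # ['DET', 'N', 'N']
```

Here "man" is tagged N because it occurs twice as a noun and once as a verb in the training data; the unseen word "runs" falls back to the default tag.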
In this paper, we propose a generic framework that is useful for improving prosody in TTS systems. In a recent TTS project, we implemented a parser for grapheme-to-phoneme (g2p) conversion (Ekpenyong, Udoinyang, & Urua, 2009) and integrated a syllabification FST into the TTS system. Although the implementation is done for Ibibio (used as a benchmark for other tone languages), we adopt a data-driven approach that enables easy replication of the TTS system for other tone languages.

Literature Review
Parsing (the grammatical analysis of sentences) has been a subject of intense and widespread research for at least three decades. Many parsers of natural languages have been designed and used either as research vehicles for exploring linguistic or computational theories or as components of large database programs. The implementation of syntactic parsers is a major task in compiler construction and has produced several classic methods and algorithms for parser construction (Appel, 1997; Neto, Pariente, & Leonardi, 1999; Tremblay & Sorenson, 1985; Andrew, 1997). Parsing uncovers the hidden structure of linguistic input. In many applications involving natural languages, the underlying predicate-argument structure of sentences is useful, and syntactic analysis provides a means of explicitly discovering the predicate-argument dependencies that may exist in a sentence. As mentioned earlier, the major bottleneck in parsing natural language is the pervasiveness of ambiguity, since the most plausible analysis has to be chosen from an exponentially large set of alternative analyses. Parsing also recovers information that is not explicitly specified in the input sentence. This implies that, in addition to the input sentences, a parser requires some knowledge about the kind of syntactic analysis it should produce as output. One method of providing such knowledge is to write a grammar of the language, i.e. a set of rules for syntactic analysis. Such grammar rules can, for instance, be written as a context-free grammar (CFG) (Sipser, 2006; Flajolet, 1987).
In many languages, splitting text into tokens at white space is problematic, since each word can contain several components called morphemes. In this case, the meaning of a word can be thought of as a combination of the meanings of its morphemes. Henceforth, we regard a word as being decomposed into a stem associated with several morphemes. To tackle the disambiguation problem for morphology, the problem of splitting a word into its most likely sequence of morphemes can be reduced to a (very complex) part-of-speech tagging task: the word itself is not split into morphemes, but is tagged with a PoS tag that encodes much of the information about its morphemes. For a highly inflected language, this enriched tag set can be a rich source of features for a statistical parser. Seara, Pacheco, Kafta, Seara Jr. and Seara (2010) developed an ad-hoc morpho-syntactic parser for a Brazilian Portuguese TTS system. Their parser consists of a dictionary and a set of rules structured in four levels, and uses a methodology that creates a large annotated dataset and develops the rules for morpho-syntactic classification incrementally. Some sentences are inherently ambiguous and unpredictable. Certain very natural English sentences, for instance, may have hundreds or even thousands of syntactic parse trees. This has remained a major obstacle for natural language processing, especially when a large percentage of the parse trees are enumerated during semantic/pragmatic processing. In English, syntactic ambiguity may grow 'combinatorially' with the number of prepositional phrases (Church & Patil, 1982). Enumerating the parse trees may therefore fail to capture the relevant generalization that prepositional phrases (PPs) are 'every way ambiguous', or, more precisely, that the set of parse trees over i PPs is the same as the set of binary trees that can be constructed over i terminal elements. A formal power series can be used to encapsulate the ambiguity response of the grammar to all possible input sentences. Church and Patil (1982) proposed methods for dealing with syntactic ambiguity that exploit certain regularities among alternative parse trees. These regularities are expressed as linear combinations of augmented transition networks (ATNs) (Wanner, 1980), and also as sums and products of formal power series (Flajolet, 1987; Caprini, Fischer, & Vrkoc, 2010).
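To make the combinatorial growth concrete: the number of binary trees over n terminal elements is the Catalan number C_(n-1). The sketch below counts binary bracketings with a brute-force recurrence and checks the counts against the closed form. The recurrence is the coefficient-level form of the power-series equation C(x) = 1 + x C(x)^2, which is how sums and products of formal power series capture this ambiguity. The sketch illustrates only the counting argument, not Church and Patil's parsing method.

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def count_binary_trees(n_leaves):
    """Number of distinct binary bracketings over n_leaves terminal elements."""
    if n_leaves == 1:
        return 1
    # Split the leaves into a non-empty left part and a non-empty right part.
    return sum(count_binary_trees(k) * count_binary_trees(n_leaves - k)
               for k in range(1, n_leaves))

def catalan(n):
    """Closed form C_n = (2n choose n) / (n + 1)."""
    return comb(2 * n, n) // (n + 1)

for n in range(1, 9):
    assert count_binary_trees(n) == catalan(n - 1)
print([count_binary_trees(n) for n in range(1, 9)])  # [1, 1, 2, 5, 14, 42, 132, 429]
```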
Morpho-syntactic classification is important for improving the prosody of synthesized speech and the pronunciation of words subject to vocalic alternation (Seara, Pacheco, Kafta, Seara, & Seara, 2010). A number of morpho-syntactic parsers have been proposed for TTS systems (Bick, 2006; Ribeiro, Oliveira, & Trancoso, 2003). These systems seek better prosody through a more detailed linguistic description, rather than by artificially changing the acoustic parameters of synthesized speech. Simple approaches to morphological analysis only remove endings and suffixes by means of a generic, pre-defined suffix tree, without properly analysing prefixes and compound words (Dasgupta & Ng, 2007). Besides this lack of precision, a further disadvantage is that the syntactic and semantic information carried by the removed endings is lost, which makes the resulting semantic representation inflexible. That is, there is no way to handle cases where a derived word acquires a specific meaning different from the sense that the combination of the stem and the suffix would suggest. To overcome these shortcomings, the so-called lexical approach (Whitelock, 1988), which assigns all morphological features directly to the corresponding canonical forms in the dictionary, can be applied.
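The contrast between suffix stripping and the lexical approach can be sketched in a few lines. The suffix list, the lexicon entries and their glosses below are hypothetical and use English words only because the effect is easy to see there; the lexicon is a simple full-form dictionary in the spirit of the lexical approach, not Whitelock's actual formalism.

```python
# Hypothetical suffix list and lexicon, for illustration only.
SUFFIXES = ["ness", "ly", "ing", "ed", "s"]

LEXICON = {
    "hardly": {"lemma": "hardly", "pos": "ADV", "gloss": "barely"},
    "hard": {"lemma": "hard", "pos": "ADJ", "gloss": "difficult; firm"},
    "walked": {"lemma": "walk", "pos": "V", "tense": "past"},
}

def strip_suffix(word):
    """Naive stemmer: remove the longest matching suffix from a pre-defined list."""
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def lexical_lookup(word):
    """Lexical approach: features are stored directly with each listed form."""
    return LEXICON.get(word)

print(strip_suffix("hardly"))             # hard   (the sense 'barely' is lost)
print(lexical_lookup("hardly")["gloss"])  # barely
print(strip_suffix("walked"))             # walk   (the 'past' feature is discarded)
```

The stemmer returns a plausible stem but silently discards both the lexicalized meaning of "hardly" and the tense information in "walked", which is exactly the loss of flexibility described above.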
However, a particularly effective approach to implementing morphological analysis is the use of Finite State Transducers (FSTs). Useful research applying this technique can be found in Minnen, Carroll and Pearce (2000), Ganapathiaraju and Levin (2006), and Menon, Saravanan, Loganathan and Soman (2009). There are also a number of frameworks for syntactic analysis that have been used as bases for NLP. Most of these frameworks suffer from serious meta-theoretical or practical defects, especially in terms of power and descriptive accuracy. Well-known syntactic frameworks include lexical-functional grammar (Kaplan & Bresnan, 1982), generalized phrase structure grammar (Gazdar, Klein, Pullum, & Sag, 1985) and lexicase (Starosta, 1985). Data-driven algorithms for morpho-syntactic analysis are available in Starosta and Nomura (1986) and Kumar, Dhanalakshmi and Rajendran (2010).
More recently, research on computational syntax/morphology has been dominated by unsupervised approaches (Pauw & Wagacha, 2007; Wagacha & Abade, 2007; Lavalle & Langlais, 2009; Calvo, Gambino, Gelbukh, & Inui, 2011). These methods attempt to induce the morphological properties of a language automatically from raw, un-annotated text, using minimum edit distance metrics and pattern-matching/grammar inference techniques. The main contribution of this research is the improvement of speech quality. We attain this by tackling prosody, a key factor in the naturalness of TTS output. We also adopt a state-of-the-art approach that provides a benchmark for other tone languages. The paper should also bootstrap further research on the syntax/morphology of less-resourced languages. Initial micro-voices obtained using the framework (cf. Ekpenyong, Urua, Udosen, & Udoh, 2011) sound impressive and are currently being improved. The rest of this paper is organized in five parts: (i) it discusses Ibibio morphology; (ii) it studies the language's phrase structure; (iii) it details the research approach adopted; (iv) it tests the proposed framework on a case-study language (Ibibio); (v) it concludes and highlights future research directions.

The Ibibio Morphology
In this section, we discuss the morphology of Ibibio, a Lower Cross language spoken by approximately four million speakers in the south-eastern region of Nigeria. Ibibio has a classical terraced tone system (Urua, 2001). Although Ibibio has received significant attention in the area of syntax/morphology (Simmons, 1957; Urua, 1990; Akinlabi & Urua, 2002), not much has been done towards building computational resources for the language. In the following section, we present a framework for the language's phrase structure grammar. The aim is to enrich the ongoing language technology collaboration projects, which have promoted Ibibio both locally and internationally. Ibibio is a morphologically rich, inflectional language with considerable potential for language technology research and development. We discuss its word structure under the following headings:

Affixation
In Ibibio, given a root word (verb) such as dí (come), the inflectional prefixes á-, é-, í- and ń- can be added to change its form; these forms depend on person and number, as illustrated in 1. In examples 2-5, the personal markers are the prefixes m-, ń-, ŋ-, á-, etc. In most cases, these markers are the first constituent of an inflected word and are added before any other inflectional affix. The only exception is when a negative marker in the imperative form is added to the root word (Essien, 1990; Essien, 2010). In examples 9-11, notice that the roots have been repeated to create the intended meaning, and that the reduplicative morphemes always come before the root word.
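Prefixation of this kind is the classic use case for finite-state transducers. The sketch below hand-codes a small deterministic transducer that segments the prefixed forms of dí quoted in this section. The state numbering, the '-' boundary symbol and the decision to copy the downstep mark '!' to the output are assumptions for illustration. The transducer assigns no person or number features and covers only this one root.

```python
import unicodedata

def nfc(text):
    """Normalize to NFC so that tone-marked vowels are single code points."""
    return unicodedata.normalize("NFC", text)

# Hypothetical transition table: (state, input symbol) -> (next state, output).
# Covers only the prefixes á-, é-, í-, ń- and the root dí; '!' marks a downstep.
TRANSITIONS = {}
for prefix in ["á", "é", "í", "ń"]:
    TRANSITIONS[(0, nfc(prefix))] = (1, nfc(prefix) + "-")  # prefix + boundary
TRANSITIONS.update({
    (1, "!"): (2, "!"),            # optional downstep after the prefix
    (0, "d"): (3, "d"),            # bare root, no prefix
    (1, "d"): (3, "d"),
    (2, "d"): (3, "d"),
    (3, nfc("í")): (4, nfc("í")),
})
FINAL_STATES = {4}

def analyze(word):
    """Return the segmented form, or None if the transducer rejects the word."""
    state, output = 0, []
    for symbol in nfc(word):
        if (state, symbol) not in TRANSITIONS:
            return None
        state, out = TRANSITIONS[(state, symbol)]
        output.append(out)
    return "".join(output) if state in FINAL_STATES else None

print(analyze("ádí"))   # á-dí
print(analyze("ń!dí"))  # ń-!dí
print(analyze("ádà"))   # None
```

NFC normalization matters here because the same tone-marked vowel can also be encoded as a base letter plus a combining accent, which would otherwise fail to match the transition keys.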

Compounding
Free-standing Ibibio nouns can also be combined to form a new root word, as follows:

Ibibio Phrase Structure Grammar
Syntactic analysis can be done using either of two approaches: (i) dependency graphs, which connect a word (the head of a phrase) with its dependents in that phrase; (ii) phrase structure trees, traditional sentence diagrams that partition a sentence into constituents, with larger constituents formed by merging smaller ones. The latter approach also typically incorporates ideas from generative grammar to help it deal with displaced constituents or apparent long-distance relationships between heads and constituents.
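The two representations are related: given a table saying which child heads each phrase, a phrase structure tree can be converted into head-dependent arcs. The sketch below illustrates this standard head-percolation idea on an English toy sentence. The head table, the category labels and the tuple encoding of trees are assumptions, not part of the proposed framework.

```python
# Hypothetical head table: which child label heads each phrase label.
HEAD_CHILD = {"S": "VP", "VP": "V", "NP": "N"}

def dependencies(tree, arcs):
    """Return the head word of `tree`, appending (head, dependent) arcs.

    A tree is (label, word) for a preterminal or (label, [subtrees]) otherwise.
    """
    label, body = tree
    if isinstance(body, str):  # preterminal: the word is its own head
        return body
    child_heads = [dependencies(child, arcs) for child in body]
    head_position = next(i for i, child in enumerate(body)
                         if child[0] == HEAD_CHILD[label])
    for i, word in enumerate(child_heads):
        if i != head_position:
            arcs.append((child_heads[head_position], word))
    return child_heads[head_position]

tree = ("S", [("NP", [("N", "John")]),
              ("VP", [("V", "eats"), ("NP", [("N", "rice")])])])
arcs = []
print(dependencies(tree, arcs), arcs)
# eats [('eats', 'rice'), ('eats', 'John')]
```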
Phrase structure rules define a language's grammar and generate the deep structures of its sentences. They are rewrite rules that operate on symbols. We propose a phrase structure grammar that extends Essien's (1990) phrase structure grammar (PSG) for simple positive Ibibio sentences. Essien's (1990) PSG is shown in Figure 1.

Grammar Construction and Productions Labelling
In our proposal, we also consider inflection, which is important in the language's morphology. An Ibibio sentence can now be viewed as consisting of three constituents (S -> <NP, INFL, VP>). Starting from the initial symbol S, productions (rewrite rules) generate further strings of symbols. Using rewrite rules (Freidin, 1992), we construct an extended phrase structure grammar (PSG) for Ibibio, as shown in Figure 2. The phrase structure in Figure 2 is intended to be comprehensive for the language, covering all its possible productions. The productions are labelled to distinguish top-level productions from lower-level transitions. Our grammar can generate both simple and complex sentences in Ibibio. A symbol table defining the notation used in the PSG is shown in Table 1.
Table 2. Ibibio SAMPA table
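Rewrite rules of this kind translate directly into a recognizer. The following is a minimal sketch, not the grammar of Figure 2: a backtracking top-down recognizer over a hypothetical toy grammar whose top-level production is S -> NP INFL VP. The lower-level expansions and the PoS tags (PRO, N, AGR, TNS, V) are illustrative assumptions. Top-down recognition of this kind requires a grammar without left recursion.

```python
# Hypothetical toy grammar in the shape S -> NP INFL VP; the lower-level
# expansions and the PoS tags (PRO, N, AGR, TNS, V) are illustrative only.
GRAMMAR = {
    "S": [["NP", "INFL", "VP"]],
    "NP": [["PRO"], ["N"]],
    "INFL": [["AGR", "TNS"], ["AGR"]],
    "VP": [["V", "NP"], ["V"]],
}

def expand(symbol, tags, i):
    """Yield every position reachable after deriving `symbol` from tags[i:]."""
    if symbol not in GRAMMAR:  # terminal: must match the next tag
        if i < len(tags) and tags[i] == symbol:
            yield i + 1
        return
    for rhs in GRAMMAR[symbol]:
        yield from expand_sequence(rhs, tags, i)

def expand_sequence(symbols, tags, i):
    """Derive a sequence of symbols left to right, with backtracking."""
    if not symbols:
        yield i
        return
    for j in expand(symbols[0], tags, i):
        yield from expand_sequence(symbols[1:], tags, j)

def accepts(tags, start="S"):
    """True if the whole tag sequence can be derived from the start symbol."""
    return any(end == len(tags) for end in expand(start, tags, 0))

print(accepts(["PRO", "AGR", "TNS", "V", "N"]))  # True
print(accepts(["V", "PRO"]))                     # False
```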

Methodology
The purpose of a morphological analyzer is to split an input word into morphemes and then determine the grammatical categories of the word. Morphological analyzers may be called either manually or automatically by the syntactic analyzer. Describing the morphology of a natural language requires a special formalism. A feature structure is a data structure consisting of a list of attribute-value pairs. The value of an attribute (field) may be either atomic or a feature structure itself (i.e. the definition is recursive), which allows complex, deeply nested sub-structures to be built. Feature structures are widely used in NLP, mostly: (i) to hold the initial properties of lexical entries in the dictionary; (ii) to place constraints on parser rules; (iii) to pass (or reference) data across different levels of analysis. Morphological rules take the form of a sequence of morpheme classes M_1 M_2 ... M_n, where each M_i is a morpheme class, optionally accompanied by a constraint C_i.
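The recursive attribute-value definition above is what makes unification, the usual operation for combining feature structures and checking constraints on them, straightforward to implement. The sketch below is a generic illustration, not the analyzer's own code. The attribute names (cat, agr, num, per) are hypothetical, and structure sharing (reentrancy) is not modelled.

```python
def unify(a, b):
    """Unify two feature structures (nested dicts with atomic leaves).

    Returns the merged structure, or None if an attribute has conflicting
    values. Inputs are not modified. Reentrancy is not modelled.
    """
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)
        for attribute, b_value in b.items():
            if attribute in result:
                merged = unify(result[attribute], b_value)
                if merged is None:
                    return None  # clash somewhere below this attribute
                result[attribute] = merged
            else:
                result[attribute] = b_value
        return result
    # Atomic values (or an atom against a structure) unify only if equal.
    return a if a == b else None

# Hypothetical feature structures, for illustration only.
verb = {"cat": "V", "agr": {"num": "sg"}}
subject = {"agr": {"num": "sg", "per": "3"}}
print(unify(verb, subject))                 # {'cat': 'V', 'agr': {'num': 'sg', 'per': '3'}}
print(unify(verb, {"agr": {"num": "pl"}}))  # None
```

A failed unification (None) is how such a structure can act as a constraint on a parser rule, use (ii) in the list above.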
A syntactic analyzer scans natural language sentences and outputs a parse tree with information about each sentence. To accomplish this task, a syntactic analyzer requires a grammar file and a dictionary (or may use a morphological analyzer in place of a complete dictionary). Grammar rules for syntactic analyzers are written as CFG rules, possibly augmented with constraints and symbol position regulators. A rule can be written in the form S -> A_1 A_2 ... A_n, together with constraints C_1, C_2, ... and a set of position regulators R, where S is a left-hand side (LHS) non-terminal symbol, the A_i are right-hand side (RHS) terminal or non-terminal symbols, the C_i are constraints, and R is a set of symbol position regulators. Position regulators declare the order of the RHS symbols in the rule, thus allowing non-fixed word order. There are two types of position regulators: (i) one stating that A_i must be placed somewhere before the symbol A_j; (ii) one stating that A_i must be placed immediately before the symbol A_j. A valuable research contribution would be to implement morpho-syntactic parsers that automatically construct syntax acceptors from grammar extensions and generate syntax trees while accepting input sentences. The transducer representing such a parser should activate the semantic actions while the parse tree is automatically generated for a given input sentence. As a rule of thumb, the design process can be defined in the following order: construct the grammar -> label the productions -> group the production rules -> remove self-recursion -> assign states -> build the transducer. This order is an informal but concise process whose steps may be interchanged according to the designer's preference. In this paper, for instance, we prefer to build the transducer before grouping the productions.
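Position regulators turn a rule's right-hand side into a set of admissible word orders. The sketch below enumerates those orders by brute force for a short RHS. The tuple encoding (a, b, kind) and the kind names "before" and "immediately" are assumptions standing in for the two regulator types, and the RHS symbols are assumed to be distinct. Brute-force enumeration is only practical for small rules.

```python
from itertools import permutations

# A regulator (a, b, kind): kind "before" means a must appear somewhere
# before b; kind "immediately" means a must appear directly before b.
def satisfies(order, regulators):
    """Check one ordering of (distinct) RHS symbols against all regulators."""
    position = {symbol: index for index, symbol in enumerate(order)}
    for a, b, kind in regulators:
        if kind == "before" and not position[a] < position[b]:
            return False
        if kind == "immediately" and position[a] + 1 != position[b]:
            return False
    return True

def admissible_orders(rhs, regulators):
    """All orderings of the RHS symbols permitted by the regulators."""
    return [order for order in permutations(rhs) if satisfies(order, regulators)]

rhs = ["NP", "INFL", "VP"]
print(admissible_orders(rhs, [("INFL", "VP", "immediately")]))
# [('NP', 'INFL', 'VP'), ('INFL', 'VP', 'NP')]
print(admissible_orders(rhs, [("INFL", "VP", "immediately"), ("NP", "VP", "before")]))
# [('NP', 'INFL', 'VP')]
```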
The theory of deductive databases can also be implemented within this framework. This theory has been a topic of intensive research and has resulted in several successful prototype systems (Naqvi & Tsur, 1989). It combines the advantages of relational database algebra and logic programming. Four main components are involved: (i) a schema of base predicates; (ii) a set of facts representing the data; (iii) a set of rules deriving the predicates; (iv) a query interface for accessing stored data (for corpus input). The theory is important for the following reasons: (i) recursive queries can be formulated, i.e. transitive relationships are possible; (ii) the non-monotonic operation of negation is supported; (iii) not only atomic object types but also complex ones, such as sets, trees or lists, can be used for data modelling; (iv) updates are performed by means of declarative specifications; (v) imperative predicates are available for users of conventional control structures (e.g. if-then-else); (vi) declarative semantics are preserved. The current design is built to accommodate the implementation of this theory (see Table 1 and the accompanying extraction rules).
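The recursive-query capability, reason (i), can be illustrated with SQLite's recursive common table expressions, used here as a stand-in for a full deductive database. Base facts go into a table, and a recursive query derives the transitive "derivable from" relation. The table name produces, its columns and the facts are hypothetical. Using UNION rather than UNION ALL discards duplicate rows, so the recursion also terminates on cyclic facts.

```python
import sqlite3

def derivable_symbols(facts, start):
    """Return every symbol reachable from `start` via the produces relation.

    facts: (parent, child) pairs, i.e. the base predicate.
    The recursive query plays the role of a derived predicate.
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE produces (parent TEXT, child TEXT)")
    con.executemany("INSERT INTO produces VALUES (?, ?)", facts)
    rows = con.execute(
        """
        WITH RECURSIVE reach(symbol) AS (
            SELECT child FROM produces WHERE parent = ?
            UNION
            SELECT p.child FROM produces AS p JOIN reach AS r ON p.parent = r.symbol
        )
        SELECT symbol FROM reach ORDER BY symbol
        """,
        (start,),
    ).fetchall()
    con.close()
    return [row[0] for row in rows]

# Hypothetical production facts, for illustration only.
facts = [("S", "NP"), ("S", "INFL"), ("S", "VP"), ("NP", "PRO"), ("NP", "N"),
         ("INFL", "AGR"), ("VP", "V"), ("VP", "NP")]
print(derivable_symbols(facts, "S"))   # ['AGR', 'INFL', 'N', 'NP', 'PRO', 'V', 'VP']
print(derivable_symbols(facts, "VP"))  # ['N', 'NP', 'PRO', 'V']
```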

Ibibio Phrase Structure FSTs Design (Building the Transducer)
We present Finite State Transducers (FSTs) that illustrate the top-level components of the extended PSG. Thick lines represent top-level productions, while broken lines indicate sub-transitions. Formal definitions according to the general classification of Finite State Machine (FSM) logic can be found in Sipser (2006). Details of the low-level productions can be found in Ekpenyong, Urua, Udosen and Udoh (2011). These illustrations are useful for checking the completeness of the proposed PSG and are applied in the next section.
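Because the top level of the grammar is non-recursive, it can be flattened into a finite-state machine. The sketch below is a hypothetical simplification, not the FSTs in the figures. It reads a sequence of PoS tags and writes out, for each tag, the top-level constituent (NP, INFL or VP) it belongs to, which is one simple way of making the machine a transducer rather than an acceptor. The tags and state numbering are assumptions.

```python
# Hypothetical transducer: input = PoS tags, output = the top-level
# constituent each tag belongs to, following S -> NP INFL VP.
TRANSITIONS = {
    (0, "PRO"): (1, "NP"), (0, "N"): (1, "NP"),   # subject NP
    (1, "AGR"): (2, "INFL"),                      # agreement marker
    (2, "TNS"): (3, "INFL"),                      # optional tense marker
    (2, "V"): (4, "VP"), (3, "V"): (4, "VP"),     # the verb starts the VP
    (4, "PRO"): (5, "VP"), (4, "N"): (5, "VP"),   # optional object inside the VP
}
FINAL_STATES = {4, 5}

def transduce(tags):
    """Map each tag to its constituent label; return None if rejected."""
    state, labels = 0, []
    for tag in tags:
        if (state, tag) not in TRANSITIONS:
            return None
        state, label = TRANSITIONS[(state, tag)]
        labels.append(label)
    return labels if state in FINAL_STATES else None

print(transduce(["PRO", "AGR", "TNS", "V", "N"]))
# ['NP', 'INFL', 'INFL', 'VP', 'VP']
print(transduce(["PRO", "V"]))  # None
```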

Productions Grouping
From the above FSMs, it is easy to build the productions rule table in a relational database format, with the theory of deductive databases in mind. The table, which defines the rule set for sentence parsing, is shown in Table 3. [...] would enable the system to infer the right productions correctly from the productions rule table. The productions rule table should also be well organized to optimize the search process. [...] 13) for the sample sentence is given in Figure 14. The expanded structure is obtained by enumerating all possible productions in Table 3 (including redundant record entries '-'). The redundant nodes in Figure 13 are greyed out, while non-redundant nodes are emphasized. Table 4 presents an output detailing the set of productions for each morpheme. We are currently integrating this output into our front-end synthesis modules to extend its usability (e.g. for teaching and learning purposes). Folding up the grey nodes (i.e. the lower triangle of Figure 13) produces the normalized form, with data links/keys emphasized, shown in Figure 14. The parsing algorithm we adopted produced an output analysis consistent with Table 4; such output can also serve as a treebank (parsed corpus) for training a parser. Treebank parsers do not need an explicit grammar. Figure 15 shows a Scheme representation of the normalized parse tree (derivation) of Figure 14. A linked-list data structure for the sample sentence can also be formed by tracing the link locations (record indexes) of the derivation tree. This structure is shown in Figure 16.
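The relationship between record indexes, a Scheme-style representation and a linked list can be sketched as follows. The record layout (index mapped to a label and the indexes of its children) and the records themselves are assumptions for illustration; they are not the schema or contents of Table 3.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical rule-table records: index -> (label, child record indexes).
RECORDS = {
    1: ("S", [2, 3, 4]),
    2: ("NP", [5]),
    3: ("INFL", [6]),
    4: ("VP", [7]),
    5: ("PRO", []),
    6: ("AGR", []),
    7: ("V", []),
}

def to_sexpr(records, index):
    """Render the derivation rooted at `index` as a Scheme-style S-expression."""
    label, children = records[index]
    if not children:
        return label
    return "(" + " ".join([label] + [to_sexpr(records, c) for c in children]) + ")"

@dataclass
class Node:
    index: int
    label: str
    next: Optional["Node"] = None

def to_linked_list(records, root):
    """Trace record indexes in pre-order and chain them into a linked list."""
    order = []
    def visit(index):
        order.append(index)
        for child in records[index][1]:
            visit(child)
    visit(root)
    head = None
    for index in reversed(order):  # build the list back to front
        head = Node(index, records[index][0], head)
    return head

print(to_sexpr(RECORDS, 1))  # (S (NP PRO) (INFL AGR) (VP V))
node = to_linked_list(RECORDS, 1)
while node:
    print(node.index, node.label)
    node = node.next
```

The pre-order walk visits records 1, 2, 5, 3, 6, 4, 7, so the list reads S, NP, PRO, INFL, AGR, VP, V; any traversal order could be chosen, as long as the links are followed consistently.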
Figure 16 is a data structure solution for our morpho-syntactic parser and can be implemented effectively in any text processing language (Perl, Python, LISP, etc.). For a robust design, an effective interface is needed to make the detailed operations transparent to users. Our morpho-syntactic framework is currently being refined for a Hidden Markov model (HMM)-based Ibibio TTS system. Initial evaluation shows satisfactory performance and more natural-sounding synthesized speech. A more detailed evaluation of the synthesizer will be reported in a subsequent paper.

Conclusion and Future Research
We have added to the series of efforts aimed at strengthening the linguistic resources of the Ibibio language by presenting a contribution in the area of NLP. With the help of specific formalisms, we have extended the grammar rules in Essien (1990). These formalisms represent a new, though complex, approach that solves some problems connected with NLP. The algorithm consists of finite state automata (FSA) based on a sentence grammar: it accepts a sentence as input, assigns to it its surface syntactic structure and generates the syntax tree with the help of a PoS lexicon. The sentence morphology is also taken into consideration during parsing. This resource will contribute to a complete toolkit for the language and serve as a useful reference for NLP, speech technology and machine translation research. The current limitation of the paper is that some effort is still required to specify most of the linguistic features necessary for implementation. As an outlook, we are working towards an unsupervised approach to speech processing, in which the system requires less linguistic information. We hope that this approach will make it easier to replicate or adapt the system for other tone languages with less modification.
comes (3rd person singular)
(c) é!dí - they come (3rd person plural)
(d) ń!dí - I come (1st person singular)
(e) édí - you come (2nd person plural)
(f) ádí - you come (2nd person singular)
(g) ídí - we come (1st person plural)
A suffix may be added to the prefix/root to show negation, as in: coming
Tense can also be shown in Ibibio using the inflectional prefix, as follows: -dìà mḱpọ́ ntè ìnọ́ - he usually eats like a thief (habitual)
Reduplication
This refers to full or partial repetition of a root word or base (Katamba, 1993). The repeated part of the word serves some inflectional or derivational purpose. Examples are: it (instead of retrieving) the words in 12, 13.(c) are derivations from 12, 13.(a) and (b) respectively. Both morphemes in 12, 13.(c) are called bound morphemes, since they cannot stand alone in a compound context but can as single words.

Figure 3. Syntax tree for Ibibio sentence: anye ama nam aNwaNa ke mme owo enie utreubOk ke usVN OmmO keedkeed 'He made known that people have reward in their respective ways'
There are two main constructions in the grammar file of a morphological analyzer: the morpheme class definition and the morphological rules. The morpheme class definition is used to list all possible morphemes of a given morpheme class. It is possible to declare an empty morpheme, which implies that the morpheme class may be omitted in the morphological rules. A formal syntax for morpheme class definition is:
Figure 7. FST for Ibibio sentence

Table 4.
Figure 13. Redundant state transitions for a sample sentence

Figure 14.A normalized form of experimented sentence

Figure 16.
Figure 15.A Scheme representation of Figure 14

Table 1. Symbol table
The sentences are written in 'Ibibio SAMPA', named after a collaborative language documentation/speech synthesis research project. The Ibibio SAMPA table is shown in Table 2.