Formal specification of full grammar models Implementation of tactical generation resources for all three languages in a Final Prototype


DOCUMENT INFORMATION

Basic information

Title: Formal Specification of Full Grammar Models Implementation of Tactical Generation Resources for All Three Languages in a Final Prototype
Authors: Serge Sharoff, Lena Sokolova, Danail Dochev, Ivana Kruijff-Korbayová, Geert-Jan Kruijff, Jiří Hana, Kamenka Staykova, Elke Teich, John Bateman
Institution: University of Brighton
Field: Linguistics
Type: Thesis
Year of publication: 2000
City: Brighton
Number of pages: 229
File size: 3.75 MB

Table of contents

  • 1. Introduction
    • 1.1 Goals of this deliverable
    • 1.2 Overview of this deliverable
    • 1.3 Notational conventions in this document
  • 2. Theory, methods, techniques and organization of grammar development
    • 2.1 Motivations for the approach of multilingual grammar development pursued in AGILE
    • 2.2 Theory: Systemic Functional Linguistics
    • 2.3 Tactical generation in KPML
    • 2.4 Multilingual grammar development with KPML
      • 2.4.1 Resource sharing: contrastive grammar
      • 2.4.2 KPML as a development environment for multilingual grammars
    • 2.5 Organization of work in Work Packages 6 & 7
  • 3. Linguistic specification and implementation of grammatical resources for the Final Prototype
    • 3.1 Transitivity
      • 3.1.1 Theoretical background
      • 3.1.2 Government patterns and circumstantials
      • 3.1.3 Support verb constructions
    • 3.2 Aspect
      • 3.2.1 Theoretical background
      • 3.2.2 Aspect choice in subordinated clauses
      • 3.2.3 The Aspect region specification
      • 3.2.4 The Aspect region implementation
    • 3.3 Modality
      • 3.3.1 Linguistic Specification
      • 3.3.2 Formal Specifications
      • 3.3.3 Implementation
    • 3.4 Subject dropping
      • 3.4.1 Description of the Phenomenon
      • 3.4.2 Formal specifications
      • 3.4.3 Implementation
    • 3.5 Word order
      • 3.5.3 A Word Ordering Algorithm Using Information Structure
      • 3.5.4 Structural Constraints on Word Order in the grammars
      • 3.5.5 Concluding remarks
    • 3.6 Textual conjunction
      • 3.6.1 Description of the Phenomenon
      • 3.6.2 Formal Specifications
      • 3.6.3 Implementation
    • 3.7 Agreement
      • 3.7.1 Introduction
      • 3.7.2 Subject-Predicate Agreement
      • 3.7.3 Agreement within the nominal group
      • 3.7.4 Agreement of subject and predicative adjective
      • 3.7.5 Conclusion
    • 3.8 Quantification
      • 3.8.1 Cardinal numerals
      • 3.8.2 Quantity selection
    • 3.9 Clause complexity
      • 3.9.1 Formal Specifications
      • 3.9.2 Implementation
    • 3.10 Summary and conclusions

Content

Introduction

Goals of this deliverable

This report aims to provide a comprehensive linguistic description, specification, and implementation of the grammatical phenomena required for the Final Prototype, building on the previous SPEC2 and IMPL2 deliverables. It covers additional grammatical phenomena needed to generate the target texts and gives a refined account of previously discussed phenomena, taking into account the specific grammatical features of Bulgarian, Czech, and Russian. Notable additions include a contrastive analysis of spatial locating circumstantials and a revised description of aspect. The project emphasizes collaborative development: grammar description and implementation tasks were distributed among the partners by linguistic phenomenon rather than by language, which is reflected in the organization of Chapter 3, whose sections are arranged by phenomenon.

This report also explains the reasoning behind the approach to grammar development chosen in AGILE and describes its theoretical foundation and methodology. As this is the concluding report for Work Packages 6 and 7, it revisits the circumstances at the outset of the project that shaped our approach and work strategy, so that the results achieved in these work packages can be properly understood.

Overview of this deliverable

The report begins by outlining the motivations for the grammar development approach adopted in AGILE, followed by its theoretical foundations and the resource-sharing methodology used to implement it. Sections 2.2 to 2.4 condense the initial WP7 deliverable, while Section 2.5 summarizes the three phases of work in WPs 6 and 7 and the distribution of responsibilities among the partners. The technical part of the report (Chapter 3) then gives a comprehensive linguistic description, specification, and implementation of the linguistic phenomena addressed in tasks 6.3 and 7.3.

The linguistic phenomena treated in this report are the following (the partners responsible for each are given in brackets):

- Mood and Modality (Section 3.3; BAS);

This report also summarizes the accomplishments of Work Packages 6 and 7, describing the grammars for Bulgarian, Czech, and Russian as they now exist within the AGILE framework. We identify the limitations of these grammars and evaluate their potential for reuse in other domains and applications.

This section describes the notational conventions used in this document. Figure 1 summarizes the conventions of Systemic Functional Grammar, Figure 2 illustrates the notation for system networks, and Figure 3 gives the syntax of computational system network specifications in KPML. In the text, feature names are set in bold, while system names and region names are given in upper case (e.g., MOOD). The language abbreviations are Bg for Bulgarian, Cz for Czech, and Ru for Russian. Function names such as Actor and Subject are capitalized. Feature selection expressions express delicacy and simultaneity, while realization statements specify operations such as inserting a Subject or conflating Subject and Actor (written Subject/Actor). Lexical constraints are defined for processes and nouns, and syntactic structures are represented as trees, shown as screen dumps.

Figure 1: Notational conventions in Systemic Functional Grammar

[feature-b] (Function2 : feature-y, Function3 ! feature-z)

Figure 2: Notation for system networks

In a computational system specification, the name identifies the system, and the inputs give the entry conditions under which it is activated. The outputs list the system's features, which may carry realization statements such as insert, conflate, or preselect. The region indicates the functional area to which the system belongs; it is a more specific subclassification of the metafunction given in the metafunction slot and helps to organize the resources.
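To make these slots concrete, here is a minimal Python sketch of a system specification as a data structure. It is not KPML's own specification syntax (which Figure 3 describes); the class names `SystemSpec` and `Output`, the tuple encoding of realization statements, and the MOOD-TYPE values are hypothetical, and entry conditions are simplified to a conjunction of features.

```python
from __future__ import annotations

from dataclasses import dataclass

# A realization statement is encoded here as a plain tuple, e.g.
# ("insert", "Subject") or ("conflate", "Subject", "Actor").
Realization = tuple[str, ...]


@dataclass(frozen=True)
class Output:
    """One feature of a system plus the realization statements it triggers."""
    feature: str
    realizations: tuple[Realization, ...] = ()


@dataclass(frozen=True)
class SystemSpec:
    """Hypothetical Python mirror of the slots of a system specification."""
    name: str
    inputs: frozenset[str]      # entry condition, simplified to a conjunction of features
    outputs: tuple[Output, ...]
    region: str                 # functional area, e.g. "MOOD"
    metafunction: str           # e.g. "interpersonal"

    def is_entered(self, selected: set[str]) -> bool:
        """The system becomes active once all of its input features are selected."""
        return self.inputs <= selected

    def features(self) -> list[str]:
        return [output.feature for output in self.outputs]


# Illustrative values only; these are not the actual Nigel or AGILE definitions.
MOOD_TYPE = SystemSpec(
    name="MOOD-TYPE",
    inputs=frozenset({"clause"}),
    outputs=(
        Output("indicative", (("insert", "Subject"), ("insert", "Finite"))),
        Output("imperative"),
    ),
    region="MOOD",
    metafunction="interpersonal",
)
```

With this encoding, checking whether a system is entered reduces to a subset test on the features selected so far; disjunctive entry conditions would need a richer representation.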

Figure 3: Syntax for computational system network specifications

Furthermore, the following abbreviations are used:

- POS for Part Of Speech;

The English glosses of the examples use abbreviations for morphological categories. The categories appear in the order POS, Gender, Number, Case, Person, Determinacy; categories that are irrelevant in a given context are omitted. The possible values of each category are listed below, although not every value applies to all three languages.

- POS: Adj (adjective), PastPart (past participle), Nominalization (nominalized verb), Imper (imperative mood), Indic (indicative mood), Inf (infinitive), Gerund (Russian verbal adverb);

- Gender: M (masculine, in Czech masculine animate), I (masculine inanimate), F (feminine), N (neuter);

- Number: Sg (singular), Pl (plural);

- Case: Nom (nominative), Gen (genitive), Dat (dative), Acc (accusative), Voc (vocative), Loc (locative) and Ins (instrumental);

- Determinacy: Det (definite article for Bulgarian);

- Aspect: Imperf (imperfective), Perf (perfective).
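The ordering rule can be illustrated with a small gloss builder. This is a minimal Python sketch, assuming that a gloss is the English word followed by hyphen-separated category values; the separator, the function name `gloss`, and the example are hypothetical. Aspect is left out because the text does not state its position in the category order.

```python
# Category order as stated in the text. Aspect is listed among the values,
# but its position in the order is not stated, so this sketch rejects it.
GLOSS_ORDER = ("POS", "Gender", "Number", "Case", "Person", "Determinacy")

# Values listed in the text; none are listed for Person, so any value is accepted.
ALLOWED_VALUES = {
    "POS": {"Adj", "PastPart", "Nominalization", "Imper", "Indic", "Inf", "Gerund"},
    "Gender": {"M", "I", "F", "N"},
    "Number": {"Sg", "Pl"},
    "Case": {"Nom", "Gen", "Dat", "Acc", "Voc", "Loc", "Ins"},
    "Determinacy": {"Det"},
}


def gloss(word: str, **categories: str) -> str:
    """Join an English word with its category values in the fixed order.

    Categories that are not passed in are treated as irrelevant and omitted.
    """
    unknown = set(categories) - set(GLOSS_ORDER)
    if unknown:
        raise ValueError(f"unsupported categories: {sorted(unknown)}")
    parts = [word]
    for category in GLOSS_ORDER:
        value = categories.get(category)
        if value is None:
            continue
        allowed = ALLOWED_VALUES.get(category)
        if allowed is not None and value not in allowed:
            raise ValueError(f"{value!r} is not a listed value of {category}")
        parts.append(value)
    return "-".join(parts)


# The keyword order does not matter; the output follows GLOSS_ORDER.
print(gloss("command", Case="Acc", Number="Sg", Gender="I"))  # command-I-Sg-Acc
```

Validating values against the listed sets catches typos such as "Loc" versus an unlisted case label early, at the cost of having to extend the table when a language needs a new value.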

1 The POS categories are sometimes more fine-grained than the classical division into 9 or 10 parts of speech, e.g., PastPart (past participle). The POS category is also omitted if it is the same as for the English word.

Notational conventions in this document


Theory, methods, techniques and organization of grammar development

Motivations for the approach of multilingual grammar development pursued in AGILE

The platform chosen in AGILE for implementing the grammatical resources of Bulgarian, Czech and Russian is the Komet-Penman MultiLingual (KPML) system (Bateman, …), a tactical generator and grammar development workbench that grew out of the Penman system for generating English. In this chapter, we explain the motivations for this choice, give a concise overview of the theory underlying KPML, Systemic Functional Linguistics (SFL; Halliday, 1973; Halliday, 1985; Halliday & Matthiessen, 1999) (Section 2.2), and describe tactical generation in KPML (Section 2.3) as well as KPML's multilingual design and its use as a workbench for developing grammatical resources (Section 2.4).²

To appreciate the results of grammar development in AGILE, it is important to understand the situation at the project's inception. We wanted to avoid ending up with three separate generators using different grammar approaches and formalisms. Given the scale of the project, we agreed that a unified approach was needed so that development efforts could be shared efficiently among all partners.

The first challenge in AGILE was that computational resources had to be developed for several languages at once. Unlike other multilingual projects such as DRAFTER, which could build on existing resources, AGILE dealt with Bulgarian, Czech, and Russian, for which no computational resources for natural language generation existed. Although some morphological components and lexicons were available for these languages, there were no reusable grammatical accounts or generation algorithms. AGILE therefore had to create grammars for Bulgarian, Czech, and Russian from scratch.

Expertise in computational linguistics also varied considerably across the three languages. While the Czech Republic and Russia have a rich tradition in the field, particularly in machine translation, comparable expertise was lacking in Bulgaria. Consequently, any framework for natural language generation and grammar development would be new to all partners.

2 A slightly more detailed description of the major principles of Systemic Functional Linguistics, its use in Natural Language Generation and its implementation of multilinguality is given in the IMPL1 deliverable.

Existing morphological components for Bulgarian, Czech, and Russian have been re-used in AGILE (see deliverable IMPL2). This was advantageous, as it gave all partners an easily accessible framework that is thoroughly documented and well tested.

In choosing an approach to grammar development and tactical generation, we also had to take into account the linguistic tradition in Eastern Europe, which is predominantly functional rather than formal. The Prague School has strongly influenced continental European linguistics, and both Russia and Bulgaria have strong traditions of communicative approaches to language. For all three languages there is a large body of descriptive work in these functional traditions, which we aimed to draw on in our descriptions. The AGILE approach therefore needed to be compatible with established linguistic practice in Eastern Europe.

The project dealt with three Slavonic languages, each representing a different subgroup: Czech (West Slavonic), Russian (East Slavonic), and Bulgarian (South Slavonic). Czech and Russian are highly inflecting languages, whereas Bulgarian has greatly reduced its inflection (notably, it has lost nominal case), so that its position among the Slavonic languages is comparable to that of English among the Germanic languages. To handle both the specific features and the commonalities of these languages, a flexible general linguistic and computational approach was essential.

In summary, the key constraints that shaped the choice of a linguistic and computational approach to grammar development and tactical generation at the start of AGILE can be captured by the following keywords:

- Complexity & size of the task: computational resources had to be developed for three languages that had not previously been described for the purpose of natural language generation.

- Existing expertise: different starting points in terms of existing resources and general computational expertise for the three languages had to be accommodated.

- Existing linguistic traditions: it was desirable to be able to draw on the extensive body of descriptive work in the Eastern European functional linguistic tradition.

- Typological considerations: the typological variation among the three languages had to be accommodated.

After weighing these constraints, we selected the Komet-Penman MultiLingual (KPML) system as the platform for grammar development and tactical generation. KPML is one of the few systems that offer both mature tactical generation capabilities and a workbench for implementing computational grammars for natural language generation, and its design addresses each of the requirements listed above.

Multilinguality in KPML is implemented through resource sharing: the grammar of a new language can be specified by building on existing resources, even across considerable typological differences. Throughout the project, we built on the English grammar available in KPML, the Nigel grammar (Mann and Matthiessen, 1983), despite the marked typological differences between English and the Slavonic languages. Resource sharing is discussed further in Section 2.4.

- Existing expertise: KPML is based on one of the oldest, most extensively tested and most widely used generation systems for English, the Penman system (Penman, 1989). Comprehensive documentation of KPML (Bateman, 1997a) gave the partners ample resources for becoming proficient in the system. In the initial phase of the project, John Bateman and Elke Teich gave several tutorials, which also served to develop training materials for the partners.

- Existing linguistic traditions: KPML's underlying linguistic theory is Systemic Functional Linguistics (SFL; Halliday, 1973; Halliday, 1985; Halliday & Matthiessen, 1999). Because KPML takes a functional approach, it is compatible with other functional approaches. Notably, it incorporates a word order strategy from the Prague School, which provides the flexibility needed for the specific properties of the Slavonic languages; this flexibility was previously lacking.

KPML takes a sophisticated approach to multilinguality that distinguishes it from traditional machine translation systems. It starts from the assumption that any languages share some properties and differ in others, and provides a flexible representation that captures both. Whereas interlingua systems prioritize commonality and transfer systems emphasize differences, KPML's model supports the reuse of computational resources across typologically diverse languages. This model of multilinguality is described in more detail in Section 2.4.

Using KPML as a common platform for tactical generation and grammar development made it possible to build on the existing Nigel grammar for English and facilitated collaboration among the partners. Work was divided by linguistic phenomenon rather than by language, which made the division of labor more efficient. For example, word order and aspect in Bulgarian, Czech, and Russian follow very similar principles, despite minor differences in surface form. By identifying such commonalities, the project could allocate specification and implementation tasks according to shared linguistic features.

In conclusion, KPML was the preferred development platform because it was designed for large-scale generation projects and provides well-tested resources suitable for practical applications. It also supports contrastive-linguistic work, which makes developing resources for a new language considerably easier than writing a grammar from scratch.

Theory: Systemic Functional Linguistics

The KPML system is based on Systemic Functional Linguistics (SFL), a linguistic theory developed in Britain, principally by Halliday, that belongs to the family of functional approaches to language. SFL is related to other functional approaches, such as those of Hjelmslev and Dik, and shares many assumptions with the Continental-European Prague School, as represented by Firbas, Daneš, and Sgall.

Systemic Functional Linguistics rests on the two concepts of function and system. It recognizes three primary functions of language: the ideational, the interpersonal, and the textual. In Systemic Functional Grammar (SFG), the grammar of a language is represented as a network of systems. This representation is classification-based rather than rule-based: it is an inheritance hierarchy of grammatical types, called features, ordered by increasing specificity (delicacy). In this respect, SFG resembles other models in computational linguistics that are organized around type hierarchies, such as Head-Driven Phrase Structure Grammar.
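The inheritance hierarchy can be sketched in a few lines of Python. The sketch assumes that every system has a single entry feature and that the network is acyclic, so it ignores simultaneous systems and complex entry conditions; the system and feature names are illustrative, not taken from Nigel or the AGILE grammars.

```python
# Each system maps to (entry feature, output features). Entry conditions are
# simplified to a single feature; names are illustrative only.
SYSTEMS = {
    "MOOD-TYPE": ("clause", ["indicative", "imperative"]),
    "INDICATIVE-TYPE": ("indicative", ["declarative", "interrogative"]),
}


def parent_map(systems: dict[str, tuple[str, list[str]]]) -> dict[str, str]:
    """Map every feature to the less delicate feature that is its entry condition."""
    parent: dict[str, str] = {}
    for entry, outputs in systems.values():
        for feature in outputs:
            if feature in parent:
                raise ValueError(f"feature {feature!r} appears in more than one system")
            parent[feature] = entry
    return parent


def selection_path(feature: str, parent: dict[str, str]) -> list[str]:
    """Features implied by `feature`, ordered from least to most delicate."""
    path = [feature]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path[::-1]


def delicacy(feature: str, parent: dict[str, str]) -> int:
    """Depth in the hierarchy: the root feature has delicacy 0."""
    return len(selection_path(feature, parent)) - 1


def is_a(feature: str, ancestor: str, parent: dict[str, str]) -> bool:
    """True if `feature` inherits from (is at least as specific as) `ancestor`."""
    return ancestor in selection_path(feature, parent)


PARENT = parent_map(SYSTEMS)
print(selection_path("declarative", PARENT))  # ['clause', 'indicative', 'declarative']
```

Here a feature such as declarative inherits from every feature on its path, which is what makes the network a classification hierarchy; delicacy is simply the depth of a feature in that hierarchy.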

SFG differs from other classification-based approaches in that its grammatical types are functionally motivated, following the SFL notion of metafunctions, i.e., generalized functions of language. The ideational metafunction conveys propositional content, notably through transitivity, which specifies processes, participants (such as Actor and Goal), and circumstances. The interpersonal metafunction reflects the roles of speaker and hearer in interaction; for example, the MOOD system distinguishes declarative, interrogative, and imperative clauses, each with its own syntactic structure. The textual metafunction concerns the organization of text, including coherence and cohesion, as seen in theme-rheme patterns and information structure. These functionally motivated systems are linked to realization statements, which define the syntagmatic constraints associated with each grammatical feature, for instance the requirement that the Subject precede the Finite verb in English declarative clauses.
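As an illustration of how realization statements build syntagmatic structure, the Python sketch below applies three kinds of statements (insert, conflate, and order) and then linearizes the result with a topological sort. The tuple encoding and the function name `realize` are hypothetical, and the example statements are illustrative rather than the actual Nigel realizations. Because ordering constraints are partial, any order consistent with them is acceptable; Python's `graphlib` picks one.

```python
from graphlib import TopologicalSorter


def realize(statements: list[tuple[str, ...]]) -> list[str]:
    """Apply insert/conflate/order statements and return the constituents in order.

    Conflated functions are realized by one constituent, written "Subject/Actor".
    As a simplification, conflate also inserts functions that are not yet present.
    """
    group_of: dict[str, int] = {}   # function -> index of its constituent
    groups: list[list[str]] = []    # constituents as lists of conflated functions
    orderings: list[tuple[str, str]] = []

    def ensure(function: str) -> int:
        if function not in group_of:
            group_of[function] = len(groups)
            groups.append([function])
        return group_of[function]

    for op, *args in statements:
        if op == "insert":
            ensure(args[0])
        elif op == "conflate":
            keep, merge = ensure(args[0]), ensure(args[1])
            if keep != merge:
                for function in groups[merge]:
                    group_of[function] = keep
                groups[keep].extend(groups[merge])
                groups[merge] = []
        elif op == "order":
            orderings.append((args[0], args[1]))
        else:
            raise ValueError(f"unsupported realization statement: {op!r}")

    sorter = TopologicalSorter()
    for index, functions in enumerate(groups):
        if functions:
            sorter.add(index)
    for before, after in orderings:
        if before not in group_of or after not in group_of:
            raise ValueError(f"cannot order uninserted functions: {before}, {after}")
        sorter.add(group_of[after], group_of[before])  # `before` must precede `after`
    return ["/".join(groups[index]) for index in sorter.static_order()]


# Illustrative statements for a clause like "The user chooses the PLINE command."
CLAUSE_STATEMENTS = [
    ("insert", "Subject"), ("insert", "Actor"), ("conflate", "Subject", "Actor"),
    ("insert", "Finite"), ("insert", "Process"), ("conflate", "Finite", "Process"),
    ("insert", "Goal"),
    ("order", "Subject", "Finite"), ("order", "Process", "Goal"),
]
print(realize(CLAUSE_STATEMENTS))  # ['Subject/Actor', 'Finite/Process', 'Goal']
```

The ordering constraint ("order", "Subject", "Finite") is the sketch's counterpart of the English declarative requirement mentioned above; conflicting constraints make `static_order` raise `graphlib.CycleError`.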

By convention, the names of functional elements such as Actor, Goal, and Circumstantial are capitalized in SFG. The term axiality covers both functionally motivated classes (the paradigmatic axis) and syntagmatic structure, which are linked by the relation of realization. Linguistic descriptions are also organized by rank, the primary system in grammatical classification, which distinguishes clauses, nominal groups, prepositional phrases, adjectival and adverbial groups, words, and morphemes. The rank scale establishes the basic grammatical classes, and the descriptions of any two ranks are mutually exclusive. In the SFL model, grammar constitutes the lexico-grammatical stratum, while semantics and context are more abstract strata. The strata are related by inter-stratal realization: contextual categories are realized by semantic categories, which are in turn realized by lexico-grammatical categories. Figure 4 illustrates the five organizing principles: metafunctional diversification, stratification, rank, axiality, and delicacy.

Figure 4: Organizing principles of representation in SFG

The next section relates these properties of Systemic Functional Grammar (SFG) to the tasks involved in tactical generation and describes the generation process with SFG as implemented in the Penman-based generator KPML.

Tactical generation in KPML

Utilizing a systemic functional model as the linguistic foundation for Natural Language Generation enhances the task modeling process, as its features align with the necessary resources for effective generation.

5 For a fairly concise introduction to Systemic Functional Grammar see (Bateman, 1992). More comprehensive accounts of the theory can be found in (Halliday, 1978; Halliday and Matthiessen, 1999).

The close fit between SFL and generation is reflected in the number of generation systems inspired by SFL, among them Penman, KPML, COMMUNAL, Multex, Comet, and Surge. SFG-based tactical generators such as Penman and KPML have been used in several projects, including the German KOMET and TECHDOC projects, the British DRAFTER project (…, 1997), and the Canadian Healthdoc project (DiMarco et al., 1995; Wanner and Hovy, 1996).

In Penman-style generators, SFG-based generation realizes a text plan that has been broken down into grammatically realizable units, such as clauses and nominal groups. Realizing such a unit involves two tasks:

- task 1: to interpret a semantic input expression in terms of the grammar available;

- task 2: to spell out the syntactic constraints of the language in which an utterance is to be generated.

Yang et al. (1991) characterize task 1 as “deciphering the goals given by the speaker program and determining how they can be realized in language”, and task 2 as ensuring that the resulting utterance conforms to the syntactic rules of the language. They refer to these as the functional and the syntactic aspects of tactical generation.

In an SFG-based generator such as KPML, the central question is which features to select from the grammatical system network, and when. Feature selection is guided by the semantic input expression and must take into account the syntactic constraints associated with each feature, which are expressed in realization statements.

The generation process starts from an input expression in SPL (Sentence Planning Language; Kasper, 1989). Figure 5 shows two such SPL expressions (henceforth: SPLs) for the sample sentences (1) and (2):

(1) The user chooses the PLINE command.

(2) Choose the PLINE command.

The SPLs are instances of the Upper Model (Bateman et al., 1990; Henschel and Bateman, 1994), the ontology KPML uses to organize ideational meaning, i.e., the propositional information to be expressed. In addition, an SPL can contain interpersonal information, such as the speech act (e.g., command), and textual information, such as identifiability and thematic organization.

6 Three kinds of keywords are defined for SPL: Upper Model or domain concepts (e.g., choose, actor), inquiries (e.g., :identifiability-q identifiable) and macros, which consist of a set of inquiries (e.g., :speechact command, :theme o1).

(SPL-1) The user chooses the PLINE command.

:identifiability-q identifiable :class-ascription (c / software-command

(SPL-2) Choose the PLINE command.

:identifiability-q identifiable :class-ascription (c / software-command :name PLINE)))

Figure 5: Input to generation with KPML

Given an input such as SPL-1 or SPL-2 (Figure 5), KPML starts to traverse the grammatical system network, which for English is the Nigel grammar (Mann and Matthiessen, 1983).

At each choice point, the chooser associated with the system is activated to consult the input representation for the semantic basis of the choice. A chooser is a decision procedure attached to a system, and it mediates between semantic and grammatical information. Structurally, a chooser is a decision tree whose nodes are inquiries, which interpret semantic knowledge in grammatical terms. Choosing continues as the system network is traversed: each system's chooser is invoked in turn, and the syntagmatic structure is built up by successively executing the realization statements of the selected features. The results of this generation process for SPL-1 and SPL-2 are shown in Figure 6.
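A chooser can be sketched as a small decision tree whose inner nodes pose inquiries and whose leaves name features. In the Python sketch below, a plain dictionary stands in for an SPL expression, and the inquiry name `command-q`, its answers, and the chooser itself are hypothetical simplifications rather than the Nigel definitions.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class Ask:
    """Inner node: pose an inquiry and branch on its answer."""
    inquiry: str
    branches: dict[str, Node]


@dataclass
class Choose:
    """Leaf: select this feature of the system."""
    feature: str


Node = Union[Ask, Choose]

# Inquiries interpret the semantic input. Here the input is a plain dict that
# stands in for an SPL expression; inquiry names and answers are hypothetical.
INQUIRIES: dict[str, Callable[[dict], str]] = {
    "command-q": lambda spl: "command" if spl.get("speechact") == "command" else "nocommand",
}

# A chooser for a MOOD-TYPE-like system with the features indicative/imperative.
MOOD_TYPE_CHOOSER: Node = Ask(
    "command-q",
    {"command": Choose("imperative"), "nocommand": Choose("indicative")},
)


def run_chooser(node: Node, spl: dict) -> str:
    """Walk the chooser tree, answering each inquiry from the input, until a feature is chosen."""
    while isinstance(node, Ask):
        answer = INQUIRIES[node.inquiry](spl)
        node = node.branches[answer]
    return node.feature


# Simplified stand-ins for SPL-1 and SPL-2 (not SPL syntax itself).
SPL_1 = {"process": "choose", "actor": "user", "actee": "PLINE command"}
SPL_2 = {"process": "choose", "actee": "PLINE command", "speechact": "command"}

print(run_chooser(MOOD_TYPE_CHOOSER, SPL_1))  # indicative
print(run_chooser(MOOD_TYPE_CHOOSER, SPL_2))  # imperative
```

Keeping the inquiries separate from the tree mirrors the division of labor described above: the tree belongs to the grammar, while the inquiry implementations are where semantic knowledge is consulted.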

In summary, in a systemically based approach to tactical generation such as KPML, task 1 (interpreting the semantic input in terms of the grammar) consists in selecting paradigmatic features from the grammatical system network, and task 2 (spelling out syntactic constraints) consists in building a syntactic structure by executing the realization statements associated with the selected features.
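The two tasks can be combined in a toy traversal: a system is entered once its entry feature has been selected and its chooser picks a feature (task 1, with each chooser reduced to a plain function of the input), and the realization statements of the chosen features are collected (task 2). This Python sketch assumes single-feature entry conditions and uses hypothetical system contents; it does not reproduce KPML's actual traversal algorithm.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class System:
    name: str
    entry: str                                  # single entry feature (simplification)
    chooser: Callable[[dict], str]              # stands in for a chooser tree
    outputs: dict[str, list[tuple[str, ...]]]   # feature -> realization statements


# The more delicate system is listed first on purpose: it can only be entered
# on a later pass, after "indicative" has been selected.
NETWORK = [
    System(
        "INDICATIVE-TYPE", "indicative",
        lambda spl: "interrogative" if spl.get("speechact") == "question" else "declarative",
        {"declarative": [("order", "Subject", "Finite")],
         "interrogative": [("order", "Finite", "Subject")]},
    ),
    System(
        "MOOD-TYPE", "clause",
        lambda spl: "imperative" if spl.get("speechact") == "command" else "indicative",
        {"indicative": [("insert", "Subject"), ("insert", "Finite")],
         "imperative": []},
    ),
]


def traverse(spl: dict, network: list[System], root: str = "clause"):
    """Task 1: select features via choosers; task 2: collect their realization statements."""
    selected = {root}
    statements: list[tuple[str, ...]] = []
    pending = list(network)
    progress = True
    while progress:
        progress = False
        for system in list(pending):
            if system.entry not in selected:
                continue  # entry condition not (yet) satisfied
            feature = system.chooser(spl)
            if feature not in system.outputs:
                raise ValueError(f"{system.name}: chooser returned unknown feature {feature!r}")
            selected.add(feature)
            statements.extend(system.outputs[feature])
            pending.remove(system)
            progress = True
    return selected, statements


features, realizations = traverse({"process": "choose"}, NETWORK)
print(sorted(features))  # ['clause', 'declarative', 'indicative']
print(realizations)
```

For an input with a command speech act, only MOOD-TYPE is entered and the imperative feature contributes no Subject, so INDICATIVE-TYPE is never reached.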

Multilingual grammar development with KPML

This section outlines the multilinguality model that supports the KPML system, highlighting various dimensions of multilingual descriptions that illustrate the similarities and differences among languages Additionally, it provides a brief overview of KPML's role as a platform for developing grammatical resources.

7 For more details on the tactical generation process in KPML see (Bateman, 1997a).

Figure 6: Output structures (clause level) for SPL-1 and SPL-2

KPML's representation of multilinguality reflects the fact that languages are both similar and different, and that the degree of similarity depends on the level of linguistic abstraction at which they are compared. Whatever level is chosen, some differences are likely to remain, and these are hard to capture if one assumes a single interlingua in which everything is aligned. KPML addresses this by allowing for multiple dimensions of cross-linguistic commonality and contrast, which yields a flexible contrastive-linguistic description.

When examining the contrastive-linguistic properties of multiple languages through the lens of Systemic Functional Linguistics, we can observe that their grammars may align or diverge across several key dimensions.

- At the level of grammar, languages tend to be similar in terms of systems (functional paradigms) and different in terms of syntagmatic, surface-syntactic realization.

- Grammatical systems of low delicacy (grammatical types located high in the classification hierarchy) tend to be similar across languages, whereas systems of higher delicacy tend to be dissimilar.

- Languages may prefer different ranks (clause, nominal group, prepositional phrase, etc.) for expressing a given phenomenon (e.g., nominally vs. verbally).

- Languages tend to be more similar at higher strata, such as semantics, and more different at lower strata, such as lexico-grammar.

- Languages may differ according to which metafunction a particular formal means serves.

The constructs of stratification, metafunctional diversification, rank, axiality, and delicacy thus provide a framework for describing multilingual variation in terms of the similarities and differences among languages. This framework, illustrated in Figure 7 (adapted from Bateman, 1995), underlies contrastive-linguistic representation in KPML and supports resource sharing.

8 This is a major problem with interlingua-based machine translation.

Figure 7: Dimensions of multilingual description

If a language's grammar is described in terms of rank, metafunction, axis, and delicacy, existing grammatical descriptions of other languages that are similar along these dimensions can be reused. This works especially well for functional paradigms (systems); however, the base description must be adapted to language-specific properties. For example, Bulgarian, Czech, and Russian are free word order languages, and Czech and Russian are highly inflecting, which distinguishes them from English.

Resource sharing in KPML refers to the practice of reusing and adapting existing language descriptions for the description of another language (Bateman, 1995; Bateman et al., 1991b; Teich, 1995) This approach is part of a broader strategy known as transfer comparison, which involves constructing grammatical descriptions based on previously established ones (Halliday et al., 1964).
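Resource sharing can be pictured as one network whose systems are tagged with the languages they are valid for, with realizations that may differ per language. The Python sketch below makes that idea concrete under stated assumptions: the classes, the language code En, the DETERMINATION system, and its realization statements are hypothetical; the only linguistic givens are that Bulgarian has a definite article (cf. the Det category in the gloss conventions) while Czech and Russian have none.

```python
from __future__ import annotations

from dataclasses import dataclass, field

Statement = tuple[str, ...]


@dataclass
class SharedSystem:
    name: str
    languages: frozenset[str]     # languages for which the system is valid
    features: tuple[str, ...]
    # (feature, language) -> realization statements; a missing key means none
    realizations: dict[tuple[str, str], list[Statement]] = field(default_factory=dict)


def view_for(language: str, network: list[SharedSystem]) -> dict[str, dict[str, list[Statement]]]:
    """Extract the monolingual grammar for `language` from the shared network."""
    view = {}
    for system in network:
        if language not in system.languages:
            continue
        view[system.name] = {
            feature: system.realizations.get((feature, language), [])
            for feature in system.features
        }
    return view


ALL = frozenset({"En", "Bg", "Cz", "Ru"})

NETWORK = [
    # Shared paradigm: the same choice exists in all four languages.
    SharedSystem("MOOD-TYPE", ALL, ("indicative", "imperative")),
    # Only English and Bulgarian have articles, and they realize them differently:
    # a separate Deictic word in English, a form of the noun itself in Bulgarian.
    SharedSystem(
        "DETERMINATION",
        frozenset({"En", "Bg"}),
        ("identifiable", "nonidentifiable"),
        {
            ("identifiable", "En"): [("insert", "Deictic"), ("order", "Deictic", "Thing")],
            ("identifiable", "Bg"): [("preselect", "Thing", "Det")],
        },
    ),
]

print(sorted(view_for("Cz", NETWORK)))  # ['MOOD-TYPE']
```

The MOOD-TYPE paradigm is shared by all four languages, while DETERMINATION appears only in the English and Bulgarian views, with a different realization in each: similar systems, different syntagmatic realization.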

2.4.2 KPML as a development environment for multilingual grammars

The KPML development environment offers a set of tools that facilitate resource sharing, in particular for multilingual resource development. Multilingual descriptions are built on the paradigmatic and functional aspects of grammar rather than on surface syntax alone. This makes it possible to identify commonalities among quite different languages while still accommodating cross-linguistic variation, which reduces the effort required to write grammars for new languages.

The KPML environment extends the original Penman system (Penman, 1989) with multilingual capabilities, improved usability, and better development support. The features that contribute to this development support include the following (Bateman, 1997a):

- test suites are linked to the resource definitions, so that resources can be examined both from the instance perspective (a generated string) and from the grammar perspective (the system network);

- debugging is generally graphically driven;

- grammatical resources are highly modularised (monolingually and multilingually);

- extensive graphical and textual inspection of all aspects of the grammatical resources and their use is provided (for an example of one of the graphing facilities, see Figure 8).

Figure 8: Graph of a system network produced with the INSPECT option in the development window

KPML is currently used by a number of researchers in natural language generation and is continuously refined according to its users' requirements.

Before detailing the linguistic specifications and implementations carried out in tasks 6.3 and 7.3, we describe how Work Packages 6 and 7 were organized. This organization was made possible by the resource-sharing method supported by KPML, which enabled the team to build on the English grammar Nigel and to allocate development effort by linguistic phenomenon rather than by individual language. As a result, a genuinely contrastive-linguistic mode of operation was achieved.

2.5 Organization of work in Work Packages 6 & 7

Work Packages 6 and 7 were structured into three phases of linguistic specification and implementation, with WP6 serving as the foundation for WP7. The first phase culminated in the Initial Demonstrator after six months, the second produced the Intermediate Prototype by month 18, and the third led to the Final Prototype. Each phase included at least one week-long working meeting at a partner site, where problems concerning the linguistic specifications and their implementations were discussed.

The first phase of linguistic specification and implementation addressed the linguistic phenomena found in the target texts of the Initial Demonstrator. It closely followed a resource-sharing strategy based on the English grammar Nigel. Within three months, we successfully generated complex sentences from the first set of sample texts in Bulgarian, Russian, and Czech, as detailed in Appendix II of IMPL1. This was made possible by KPML's resource-sharing strategy and its supporting tools. Only minimal modifications to the English grammar were needed, such as adapting particular system networks (for example, the MOOD system) and changing realization statements. In this phase, tasks were allocated by language: the Bulgarian partners worked on Bulgarian, the Czech partners on Czech, and the Russian partners on Russian.

After analyzing Bulgarian, Czech, and Russian instructional texts, we identified the key grammatical phenomena needed to generate the various instructional styles. To avoid a sublanguage bias, we related these phenomena to the functional regions defined in the systemic functional grammars of KPML (Table 1). This provided a dual basis for linguistic description and resource implementation: an instance-oriented analysis of the particular text registers and a comprehensive view of the grammatical system.

In February 1998, work on task 7.1 began with a kick-off workshop in Prague. For the following month, participants used a stand-alone version of KPML together with various grammar exploration tools. During this period, KPML was ported to Harlequin Lisp, and the Harlequin version was released by mid-March 1998.

[Table 1 layout lost in extraction. Recoverable ranks: clause; groups/phrases (prepositional, nominal, adjectival, quantity); complex vs simplex. Recoverable regions: Theme, Culmination, Conjunction, Voice, Minor transitivity, Meta-actant, Nominal-type, Epithet, Qualification, Selection, Person, Attitude, Determination, Modifier, Quality-type, Quantity-type, Circumstantial, Comment, Conjunctive.]

Table 1: Functional regions in KPML grammars

At the end of the first phase, the limitations of resource sharing with English became evident, particularly for the complex agreement phenomena and the flexible, primarily pragmatically determined word order of the Slavonic languages. Because these properties are not covered by the Nigel grammar, collaboration across Bulgarian, Czech, and Russian was needed. In the second phase of WP6 and WP7, tasks were therefore divided among the partners by linguistic phenomenon and functional region, such as transitivity, circumstance, qualification, and classification. The Czech team, for instance, concentrated on word order: they used Czech for testing, consulted the other partners about Bulgarian and Russian, and then adapted their results to the needs of all three languages. The resulting computational specifications were exchanged and integrated into each partner's grammar.

Table 2 shows the functional regions introduced above, highlighting those addressed by grammar specification and implementation in Work Packages 6 and 7, with partner responsibilities given in brackets. The non-highlighted regions have also been addressed, but only to the extent required by the target texts rather than as comprehensive grammatical systems.

The second phase analyzed the linguistic phenomena in the target texts for the Intermediate Prototype. These texts showed greater syntactic variety than before, so previously unaddressed phenomena had to be covered (see TEXS2 for the target text types). Instead of generating these texts by handling only the phenomena that occurred in them, we aimed at systematic linguistic coverage of all three language systems rather than of a sublanguage.

[Table 2 layout lost in extraction; partner responsibilities in brackets. Recoverable entries: Complexity (clause and nominal group; CU); Transitivity (RRIAI); Minor transitivity (prepositional groups/phrases); Nominal-type, Epithet, Qualification, Selection; Quality-type and Quantity-type with their Modifiers; adverbial Modifier; Circumstantial, Comment, Conjunctive; complex vs simplex units. Further entries: word order and agreement (both CU); interfacing of external morphological components (RRIAI); Determination, placed at clause rank because it is a clause phenomenon in Russian and Czech; Aspect, placed in the experiential metafunction at clause rank.]

Table 2: Functional regions under focus in AGILE

In the first two phases, tasks 6.1, 6.2, 7.1, and 7.2 were executed sequentially, with 6.1 and 6.2 providing input for 7.1 and 7.2. By the end of the second phase, substantial grammar components for Bulgarian, Czech, and Russian were in place. To carry out tasks 6.3 and 7.3 more efficiently, we merged them into a single task. The work distribution model of phase II was kept for the combined task 6.3/7.3: partners focused on describing and implementing particular linguistic phenomena rather than only their own languages.

The primary objective of this phase was to improve the existing implementations by addressing language-specific requirements and extending the resources needed to generate the target texts of the Final Prototype. The main focus areas were refining the linguistic specifications and treating features specific to the Slavonic languages, particularly aspect, word order, and agreement. To accommodate these needs, KPML's capabilities were further developed.

Distributing the work by linguistic phenomenon rather than by language has fostered effective contrastive-linguistic analysis and collaborative implementation. This approach is more demanding than a straightforward language-based distribution: it required interaction structures that ensured regular communication and information exchange among partners, such as weekly reporting to coordinate the work packages, and discipline in following them.

3. Linguistic specification and implementation of grammatical resources for the Final Prototype

3.1 Transitivity

This section describes the implementation of transitivity in the Slavic grammars, building on the previous deliverables SPEC1 and SPEC2. We first discuss how process participants and circumstances are realized by prepositional phrases (PPs), which serve as key circumstantial elements in clause structure, particularly for spatial location. We then examine support verb constructions, in which the verb mainly carries syntactic information while its argument, often a nominalization, carries the meaning of the process. Support verb constructions are more prevalent in Russian than in Bulgarian and Czech, a difference that reflects socio-cultural influences.

The Nigel grammar of English distinguishes between participants and circumstantial elements through different realization constraints: participants are realized as nominal groups (NGs), and circumstantial elements as prepositional phrases (PPs). Halliday (1985) treats a PP as a condensed clause, in contrast to an NG, which is an elaborated noun. A typical NG elaborates the description of the object denoted by its Head, whereas a PP provides information about the circumstances of a process. Accordingly, the NG inside a PP plays a participant-like role with respect to the preposition.

In Halliday's framework, this NG functions as the Minirange of the preposition, which in turn functions as a Minorprocess; this reflects the similarity between English prepositions and processes. The same reasoning applies to the Slavic languages, as the following properties show:

- the possibility of mutual substitution between some verbs and prepositions:

En: about the trial / concerning the trial

Bg: за процеса / що се отнася до процеса

Cz: o procesu / co se týče procesu

- the derivation of some prepositions from verbal forms:

En: include -> including; regard -> regarding

Ru: включать -> включая; благодарить -> благодаря

Bg: включвам -> включвайки; благодаря -> благодарение на

Cz: pomoci (help, aid) -> pomocí+Gen (with the help of); ohlédnout se -> noun ohled (regard) -> ohledně (regarding)

- the possibility of negation and other modifications of prepositions:

En: not without some misgivings; right behind the door

Ru: не без опасений; сразу за дверью

Bg: не без опасения; непосредствено зад вратата

Cz: ne bez nehody; rovnou/hned za dveřmi

In the Slavic languages, as in English, spatial elements are typically realized by prepositional phrases. Potential participants, however, can be realized either as nominal groups or as prepositional phrases, so the choice is more nuanced than a simple distinction between participants and circumstances. For instance, a participant such as the Addressee can be realized as a prepositional phrase, as in (1) and (2):

(1) Ru: обратиться к пользователям

turn to users-Dat

approach the users (with a suggestion)

(2) Cz: klepnout na tlačítko

click on button-Acc

click a button

Alternatively, the Addressee can be realized as a nominal group using the dative case as in (3) or accusative case as in (4):

(3) Ru: справка подсказывает пользователю

hint-Nom suggest-Sg-3 user-Dat

a hint tells the user

(4) Ru: сообщение информирует пользователя о…

message-Nom inform-Sg-3 user-Acc about…

a message tells the user when…

The realization of the Addressee depends on the verb. Moreover, examples (1)-(4) illustrate that participants and circumstances form a continuum. Example (1) is closer to a circumstance, since the PP can be replaced by another destination specification, such as "to apply to the computer users society". Examples (3) and (4), in contrast, are closer to participants, since they are realized as the Addressee and the Direct complement, respectively.

The Nigel grammar handles similar cases in English by inserting markers before noun groups (NGs), such as "by" and "to" in "They were created by us" and "I told a story to her". This approach was carried over to the Intermediate Prototype, which used empty prepositions governing the dative or instrumental case. The Final Prototype continues to use this technique, as the precise realization of the Addressee is not essential.
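The verb-dependent realization of the Addressee in examples (1), (3), and (4) can be pictured as a small lexicon lookup. This is a hypothetical Common Lisp sketch: the table `*addressee-realization*`, its keywords, and the function `addressee-realization` are illustrative names, not part of the AGILE grammars, and only the three Russian verbs from the examples are covered.

```lisp
;;; Hypothetical lexicon sketch: how a verb realizes its Addressee.
;;; Entries come from examples (1), (3) and (4); all names are illustrative.
(defparameter *addressee-realization*
  '(("обратиться"    :type :prepositional-phrase :preposition "к" :case :dative)
    ("подсказывать"  :type :nominal-group :case :dative)
    ("информировать" :type :nominal-group :case :accusative))
  "Maps a Russian verb to a plist describing the realization of its Addressee.")

(defun addressee-realization (verb)
  "Return the realization plist for VERB's Addressee, or NIL if VERB is unknown."
  (cdr (assoc verb *addressee-realization* :test #'string=)))
```

For example, `(getf (addressee-realization "информировать") :case)` returns `:ACCUSATIVE`, matching example (4).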

To cover some more of the relevant realisations of process participants in the Final Prototype, we need some extensions of the domain model (DM) concerning process type.

In the Intermediate Prototype, user-action processes were mapped to the Upper Model (UM) concept dispositive-material-action. For the Final Prototype, we need to distinguish two subclasses of user-action: USER-ACTION-DIRECTED, which corresponds to the former user-action, and USER-ACTION-NONDIRECTED, which is similar to UM motion processes. The key difference is that the nondirected subclass has no Actee slot; instead, it may specify a source or a destination of the process. These modifications are needed to improve text planning for overviews and functional descriptions, and to generate the correct participant and circumstance realizations in our languages.
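The directed/nondirected split can be pictured as a small class hierarchy. This sketch renders it with CLOS purely for illustration; the source does not say the AGILE domain model is defined this way, and the accessor names and the predicate `has-actee-p` are hypothetical.

```lisp
;;; Hypothetical CLOS rendering of the Final Prototype domain-model split.
(defclass user-action () ()
  (:documentation "Any process performed by the user."))

(defclass user-action-directed (user-action)
  ((actee :initarg :actee :accessor actee))
  (:documentation "The former USER-ACTION: has an Actee, like the UM
concept dispositive-material-action."))

(defclass user-action-nondirected (user-action)
  ((source :initarg :source :initform nil :accessor source)
   (destination :initarg :destination :initform nil :accessor destination))
  (:documentation "No Actee slot; like UM motion processes, it may have a
source or a destination."))

(defun has-actee-p (action)
  "True when ACTION belongs to a concept with an Actee slot."
  (typep action 'user-action-directed))
```

A nondirected action such as `(make-instance 'user-action-nondirected :destination "mode X")` has a destination but no Actee, so `has-actee-p` returns NIL for it.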

Prepositional phrases lacking clear circumstantial meanings can function as either circumstances or participants, influenced by the predicate's semantics and context For instance, in Russian, the preposition “с” and in Czech, “s,” typically indicating a circumstantial meaning, can also denote a relationship between a process and an inherent participant within that process.

(5) Ru: соединить начальную точку с конечной точкой

connect start-Adj point-Acc with end-Adj point-Ins

to connect the start point with the end point

(6) Cz: spojit počáteční bod s koncovým bodem

connect initial point-Acc with final point-Ins

to connect the start point with the end point

The start point and the end point are both essential participants in the connection process; together they can also be realized as a single functional element, the Goal:

(7) Ru: соединить (эти) точки

connect (these) points-Pl-Acc

to connect the points

Destination plays a crucial role in spatial localization, as expressed by prepositions such as Ru: к, Bg: към, and Cz: k+Dat, na+Acc, and do+Gen. It also serves as a participant role in certain concepts of the Domain Model (DM), namely DM::SNAP and DM::APPLY-PROPERTY, which have both an Actee and a Destination. We have therefore added a DESTINATION slot to the SNAP (snap something to something) and APPLY-PROPERTY (apply something to something) concepts.

(8) …so the arc snaps to the end point of the line.

(a) Ru: … чтобы привязать дугу к конечной точке линии,

in order to attach arc-Acc to end point-Dat line-Gen

(b) Bg: за да прикачите дъгата към крайната точка на линията

in order to attach arc to end point of line

(c) Cz: pro přichycení koncového bodu oblouku k čáře,

for snapping end-Adj point-Gen arc-Gen to line-Dat

Example (8) expresses a relation similar to connection (no vicinity is meant here: an attachment implies a link). The Russian preposition "к" can also express the target of an action:

(9) to apply a style to an item

(a) Ru: применить стиль к элементу

apply style-Acc to item-Dat

(b) Bg: прилагам стила към елемента

apply style to item

(c) Cz: aplikovat styl na element

apply style-Acc to element-Acc

The DESTINATION slot also occurs in two other domain concepts, DM::RETURN-TO and DM::SWITCH-MODE. In the Slavic languages, these concepts are expressed by verbs without an Actee role, so RETURN-TO and SWITCH-MODE are classified as subconcepts of USER-ACTION-NONDIRECTED.

(10) Switch to mode X

(a) Ru: перейти в режим X

go into mode-Acc X

(b) Bg: Превключвам на режим X

switch into mode X

(c) Cz: Přepnout do režimu X

switch into mode-Gen X

(11) Return to mode X

(a) Ru: вернуться в режим X

return-Refl into mode-Acc X

(c) Cz: Vrátit se do režimu X

return-Refl into mode-Gen X

In Russian, the Destination of concepts that have an Actee role is realized with the features destination and vicinity (preposition к), whereas concepts without an Actee role take destination and contact (preposition в). The choice between vicinity and contact is made by the following inquiry code:

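(defun REACHING-PROCESS-Q-CODE (locativerelation place)
  ;; PLACE is a symbol whose value is the place (Destination) term;
  ;; its parent in the term graph is the process concept.
  (let ((parent (term-graph-parent (SYMBOL-VALUE place))))
    (if (or (term-type-p parent 'dm::snap)
            (term-type-p parent 'dm::apply-property))
        ;; Concepts with an Actee take vicinity. The response symbol was
        ;; lost in extraction; 'vicinity is an assumption.
        'vicinity
        ;; All other concepts: use the answer given in the input.
        (fetch-feature 'reaching-process-q place))))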
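The inquiry code above depends on KPML's term graph. The decision it encodes can be sketched without KPML as follows; the symbols `vicinity` and `contact`, the optional `specified` argument (standing in for the value the real code obtains with `fetch-feature`), and the Russian preposition table are assumptions read off examples (8) to (11), not part of the AGILE code.

```lisp
;;; Self-contained mock of the vicinity/contact decision (hypothetical names).
(defparameter *concepts-with-actee* '(snap apply-property)
  "Concepts with both an Actee and a Destination slot (DM::SNAP, DM::APPLY-PROPERTY).")

(defun reaching-feature (process-concept &optional (specified 'contact))
  "VICINITY for concepts with an Actee; otherwise SPECIFIED, which stands in
for the answer the real inquiry fetches from the input."
  (if (member process-concept *concepts-with-actee*)
      'vicinity
      specified))

(defun russian-destination-preposition (process-concept)
  "Russian preposition introducing the Destination of PROCESS-CONCEPT."
  (ecase (reaching-feature process-concept)
    (vicinity "к")   ; (8a) привязать дугу к конечной точке
    (contact "в")))  ; (10a) перейти в режим X
```

Under these assumptions, SNAP yields к and SWITCH-MODE yields в, as in examples (8a) and (10a).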

The structure produced by the SPL given below is shown in Figure 9:

The realization of DM::SAVE is particularly interesting across the three Slavic languages because of an additional spatial-locating role that differs among them. Russian and Bulgarian use a nonorienting location, as in "save in a file", while Czech uses a destination, as in "save into a file". To handle this difference, we model SAVE in the DM with two slots, ACTEE and TARGET. The slot-mapper of the splizer then maps the TARGET slot to a nonorienting location for Russian and Bulgarian and to a destination for Czech.

(12) Save the file X in/into the directory Y

Save file X in directory-Loc Y

(b) Bg: Запазете файла X в директория Y

save file X in directory Y

(c) Cz: Uložte soubor X do adresáře Y

Save file-Acc X into directory-Gen Y
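The slot-mapper rule for SAVE can be sketched as a language-dependent renaming of the TARGET slot. The plist format for slots and the functions `save-target-role` and `map-save-slots` are assumptions made for illustration; they do not reproduce the splizer's actual interface.

```lisp
;;; Hypothetical slot-mapper rule for DM::SAVE (not the actual splizer code).
(defun save-target-role (language)
  "Role that the TARGET slot of DM::SAVE is mapped to in LANGUAGE."
  (ecase language
    ((:russian :bulgarian) :nonorienting-location) ; "save in a file"
    (:czech :destination)))                        ; "save into a file"

(defun map-save-slots (slots language)
  "Copy the plist SLOTS, renaming its :TARGET key according to LANGUAGE."
  (loop for (key value) on slots by #'cddr
        append (list (if (eq key :target) (save-target-role language) key)
                     value)))
```

For Czech, `(map-save-slots '(:actee "file X" :target "directory Y") :czech)` yields `(:ACTEE "file X" :DESTINATION "directory Y")`.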

3.1.2.1 Realisation as Prepositional phrase vs Nominal group

The Final Prototype presents this variation for Means circumstances, which in Czech and Russian can be realized by a nominal group in the instrumental case:

(13) By/with the OFFSET command you can create copies.

(a) Ru: Командой OFFSET Вы можете создать копии

Command-Ins OFFSET you-Nom can create copies-Acc

(b) Cz: Příkazem OFFSET můžete vytvářet kopie

Command-Ins OFFSET can-Pl-2 create copies-Acc

(c) Bg: С командата OFFSET можете да създадете копие

With command OFFSET can-Pl-2 create copies

Alternatively, Means can be realized by a prepositional phrase with the preposition с помощью (by means of):

(14) By means of the OFFSET command you can create copies.

(a) Ru: С помощью команды OFFSET Вы можете создать копии

With-help-of command-Gen OFFSET you-Nom can create copies-Acc

(b) Cz: Pomocí příkazu OFFSET můžete vytvářet kopie

With-help-of command-Gen OFFSET can-Pl-2 create copies-Acc

(c) Bg: С помоща на командата OFFSET можете да създадете копие

With-help-of command OFFSET can-Pl-2 create copies

In the instructional texts of the Final Prototype, MEANS are generated as nominal groups in the Instrumental case only when they refer to abstract methods, such as “using one of the following methods.” In contrast, when discussing concrete instruments or functional objects, the realization is expressed through prepositional phrases, like “using a hardware tool” or “using a command.” This distinction is determined by the types of domain concepts involved.
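The decision just described depends on the type of the domain concept and, per examples (13) and (14), on the language. A minimal sketch, assuming a hypothetical concept list `*abstract-means*` and treating Bulgarian, which has no Instrumental case, as always using a prepositional phrase:

```lisp
;;; Hypothetical rule for realizing MEANS; concept names are illustrative.
(defparameter *abstract-means* '(method)
  "Domain concepts treated as abstract methods.")

(defun means-realization (concept language)
  "Instrumental nominal group for abstract methods in Russian and Czech;
prepositional phrase otherwise (concrete instruments, and Bulgarian)."
  (if (and (member concept *abstract-means*)
           (member language '(:russian :czech)))
      '(:nominal-group :case :instrumental)
      '(:prepositional-phrase)))
```

So an abstract method in Russian is realized as an Instrumental nominal group, while a command, being a concrete functional object, gets a prepositional phrase such as с помощью команды.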

In English, the choice of preposition in spatial prepositional phrases (PPs) reflects the dimensionality of the objects involved. For both location and motion, English distinguishes three-dimensional objects ("in" and "into"), one- or two-dimensional objects ("on" and "onto"), and zero-dimensional objects ("at" and "to").

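[System network for spatial prepositions not recoverable from the source; surviving labels: away-from-motion (spacelocative: away-from) and the gate condition (one-two-dimensions, rest-process).]

Basically, the same system is valid for the Slavic languages as well. The systemic choice network above works for Bulgarian without alterations, except for the lexicalization of the minor process:

[Bulgarian lexicalization for (one-two-dimensions, rest-process) not recoverable from the source.]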
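The English dimensionality distinction for spatial prepositions can be sketched as a lookup table. The table and function names are hypothetical, and the away-from column (out of, off, from) is filled in from standard English usage to match the away-from-motion feature of the network; the sketch does not reproduce the Nigel system itself.

```lisp
;;; Hypothetical sketch of the English choice of spatial preposition by dimensionality.
(defparameter *english-spatial-prepositions*
  '(((3 :rest) . "in") ((3 :toward) . "into") ((3 :away) . "out of")
    ((2 :rest) . "on") ((2 :toward) . "onto") ((2 :away) . "off")
    ((0 :rest) . "at") ((0 :toward) . "to")   ((0 :away) . "from"))
  "Keys are (dimension-class motion); class 2 covers one- and two-dimensional objects.")

(defun dimension-class (dimensions)
  "Collapse one and two dimensions into a single class."
  (case dimensions
    (3 3)
    ((1 2) 2)
    (t 0)))

(defun english-spatial-preposition (dimensions motion)
  "MOTION is :REST, :TOWARD or :AWAY."
  (cdr (assoc (list (dimension-class dimensions) motion)
              *english-spatial-prepositions* :test #'equal)))
```

For instance, motion toward a one-dimensional line gives "onto", and rest inside a three-dimensional container gives "in".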

3.2 Aspect

Current approaches to aspect in Slavic languages, in both descriptive and computational linguistics, tend to classify verbs according to their syntagmatic contexts. From the standpoint of natural language generation, however, it is essential to start from the meanings conveyed by aspect. This section outlines the theoretical foundations and the design and implementation of the ASPECT region for Bulgarian, Czech, and Russian in the AGILE grammars.

The SPEC2 deliverable analyzed the linguistic properties of aspect in Bulgarian, Czech, and Russian from a traditional descriptive perspective; at that stage, aspect was not integrated with the other grammatical categories in the computational grammars. The analysis showed that aspect choice is closely linked to other grammatical choices: for instance, imperative clauses consistently used the perfective aspect in all three languages, while medio-passive clauses used the imperfective aspect in Russian and Bulgarian and the perfective aspect in Czech. This analysis was based on the two text styles covered by the Intermediate Prototype.

In the present deliverable, we attempt to specify and implement a more general treatment of aspect starting from a meaning perspective We then move towards a formal specification and present the implementation.

The aspect category in Slavic languages reflects different views of the temporal structure of events and distinguishes imperfective from perfective verbs, as noted by Comrie (1976) and Пешковский (1934). Almost every verb belongs to one of these two classes; verbs belonging to both are rare. Semantically, aspect conveys whether an event is ongoing, repeated, or completed. Our treatment draws on Vendler's classification of events into states, activities, achievements, and accomplishments. Падучева refines it by distinguishing controlled from non-controlled processes, showing how non-controlled achievements and accomplishments differ in their temporal properties, and by dividing states into temporary and constant states.

The imperfective aspect emphasizes the ongoing phase of a process, while the perfective aspect highlights the resulting state. For instance, the imperfective imperative "chitayte" (read) signals that reading should begin, whereas "Misha chitaet" (Mike is reading) describes an action in progress.

In Slavic languages, the perfective aspect emphasizes the state following the completion of an action, making it incompatible with phase verbs like "begin," "continue," and "finish," which focus solely on specific stages of the process.

(a) Ru: (начинать / начать) писать / *написать

(begin-Imperf / begin-Perf) write-Imperf / *write-Perf

(b) Bg: (започвам / започна) пиша/*да напиша

(begin-Imperf / begin-Perf) write-Imperf / *write-Perf

(c) Cz: začínat / začít psát / *napsát

(begin-Imperf / begin-Perf) write-Imperf / *write-Perf

In Slavic languages, states, as classified by Vendler, or relational processes, according to the Upper Model, can only be expressed through the imperfective aspect, as they do not possess any states before or after the completion of the process.
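The two hard constraints just described, phase-verb complements and states, can be sketched as a filter applied before any other aspect choice. The function `forced-aspect`, its keyword arguments, and the representation of states as `:relational` process types are assumptions for illustration, not the AGILE implementation.

```lisp
;;; Hypothetical filter for aspect choices that are fixed by hard constraints.
(defparameter *phase-verbs* '(begin continue finish)
  "Phase verbs: their complement cannot be perfective.")

(defun forced-aspect (&key governing-verb process-type)
  "Return :IMPERFECTIVE when a hard constraint applies, otherwise NIL."
  (cond ((member governing-verb *phase-verbs*) :imperfective) ; begin write-Imperf / *write-Perf
        ((eq process-type :relational) :imperfective)        ; states have no resulting state
        (t nil)))                                             ; free choice remains
```

A NIL result means that the other aspect features, discussed below, still decide.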

An achieved state can be conveyed through a relational verb or a verb indicating an achievement For instance, the position of a button can be described as a relation, or alternatively, as the outcome of its insertion into a location using a material process in passive voice.

(22) The X button {can be found / lies } on the Y toolbar.

(a) Ru: Кнопка X находится (располагается) на панели Y

Button X finds-Refl (situates) on toolbar Y

(b) Bg: Бутонът Х се намира на функционалния ред Y.

Button X finds-Refl on toolbar Y

(c) Cz: Tlačítko X {se nachází / leží} na panelu Y.

Button X finds-Refl (lies) on toolbar Y

(23) The X button is situated on the Y toolbar

(a) Ru: Кнопка X расположена на панели Y

Button X is-situated on toolbar Y

(b) Bg: Бутонът Х е разположен на функционалния ред Y.

The X button is situated on the Y toolbar.

(c) Cz: Tlačítko X {je situováno / je umístěno} na panelu Y.

Button X {is situated / is placed} on toolbar Y

The verbs расположить (Ru), разполага (Bg), and situovat (Cz) denote the action of placing something. In example (23), they emphasize the resulting state of that action, namely that the button is located on the toolbar.

In the systemic choice network, this corresponds to a choice between the features activity-highlighted and result-highlighted, which leads to the selection of the imperfective and the perfective aspect, respectively.

The imperfective aspect also conveys repetition or iteration of a process, expressing either habitual actions or several occurrences of the same process. Our sample texts contain the following example of the imperfective used for multiple instances:

(a) Ru: Нажимайте клавишу r каждый раз …

press-Imperf key r every time …

(b) Bg: Всеки път натискайте клавиша r.

Every time press-Imperf key r

Every time press-Imperf key r

A repeated process is more than a simple iteration of actions: a series of successive events does not necessarily have the "repeated quality" that motivates the imperfective aspect. For instance, the imperfective is not really appropriate for a specified number of steps in a sequence of commands:

(25) Double-click the mouse button

(a) Ru: Дважды нажмите мышь. click-Perf mouse twice

(b) Bg: Щракнете два пъти с мишката. click-Perf twice the mouse

(c) Cz: Dvakrát klikněte / klikejte myší.

Twice click-Perf / click-Imperf the mouse

Here the style (instruction step) is the main factor in the aspect choice, since in other kinds of discourse the imperfective with a quantifier is quite possible.

As Падучева (1996) notes, the feature repeated process is influenced by lexical, syntagmatic, and pragmatic context. Its selection can depend on the lexical meaning of the verb, on syntagmatic constraints such as "каждый раз" (every time), or on the style in which an event is presented, as in the contrast between "Нарисуйте отрезок прямой" (Draw-Perf a line segment) and "Рисуется отрезок прямой" (A line segment is drawn-Imperf). These constraints are often language-specific, as the differing aspect choices in Czech, Bulgarian, and Russian show, with the use of the perfective varying considerably. We therefore propose a basic distinction in aspect choice between repeated and non-repeated processes.
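A hedged sketch of how the repeated/non-repeated distinction might combine with the activity-highlighted/result-highlighted choice. It encodes the instruction-step preferences of examples (24) and (25) as they hold for Russian and Bulgarian (Czech also allows the imperfective in (25)); the function `choose-aspect`, its keyword values, and the default `:result` highlight are assumptions, not part of the AGILE specification.

```lisp
;;; Hypothetical aspect chooser for instruction steps (Russian/Bulgarian preferences).
(defun choose-aspect (&key repetition (highlight :result))
  "REPETITION: :ITERATIVE (e.g. 'every time'), :COUNTED (e.g. 'twice') or NIL.
HIGHLIGHT: :ACTIVITY or :RESULT."
  (cond ((eq repetition :iterative) :imperfective) ; (24) every time press-Imperf
        ((eq repetition :counted) :perfective)     ; (25) click-Perf twice
        ((eq highlight :activity) :imperfective)   ; ongoing phase in focus
        (t :perfective)))                          ; resulting state in focus
```

The order of the clauses reflects the discussion above: a specified number of steps does not have the repeated quality, so it does not trigger the imperfective.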

The motivations behind aspect choice in Bulgarian, Russian, and Czech are often complex and not immediately apparent. In Bulgarian and Russian, the imperfective aspect in impersonal styles may indicate habitual actions or emphasize the ongoing, intermediate stage of a process. Czech, in contrast, tends to favor the perfective aspect in such impersonal contexts, subtly underscoring the completion or result of the action.

In the AGILE system, aspect choice is governed by constraints from the Text Structuring Module, which do not capture several important nuances of the Slavic aspectual system. First, aspect selection can depend on the lexical meanings of the verbs in a perfective/imperfective pair. Second, aspectual and tense meanings interact: in Bulgarian and Russian, the present tense of a perfective verb denotes the future, which conflicts with the semantics of impersonal styles, whereas in Czech this form is chosen mainly for reasons of text style and its future meaning is secondary. Third, aspectual meanings are related to certain circumstantial elements, and such co-occurrence constraints often serve as classification tests in descriptive linguistics. For example, the adverbs "Ru: часто" and "Bg: често" (often) can only be combined with imperfective verbs.

(a) Ru: Он часто писал (*написал) письма

He often wrote-Imperf letters

(b) Bg: Той често пишеше (*писа) писма.

He often wrote-Imperf letters

(c) Cz: Často psal dopisy (dopis)

Often wrote-Imperf-Sg-3 letters (a letter)

He was often writing letters (no assumption about whether a single or multiple letter(s) on each occasion)

(d) Cz: Často napsal dopisy (dopis)

Often wrote-Perf-Sg-3 letters (a letter).

He often wrote letters (multiple letters on every occasion)

Czech uses the perfective aspect more often than Bulgarian and Russian. It also allows the perfective with expressions of habituality or repetition such as "často" (often), as example (d) shows.
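The adverb constraint illustrated in (a)-(d) can be sketched as a language-specific compatibility check. The table lists only the adverbs discussed here, and the names `*imperfective-only-adverbs*` and `aspect-allowed-p` are hypothetical:

```lisp
;;; Hypothetical co-occurrence check between habitual adverbs and aspect.
(defparameter *imperfective-only-adverbs*
  '((:russian "часто") (:bulgarian "често"))
  "Per language, adverbs that combine only with imperfective verbs.")

(defun aspect-allowed-p (aspect adverb language)
  "NIL when ADVERB rules out ASPECT in LANGUAGE. Czech 'často' is not
listed, since Czech allows the perfective with it."
  (or (eq aspect :imperfective)
      (not (member adverb (cdr (assoc language *imperfective-only-adverbs*))
                   :test #'string=))))
```

Thus the Russian *написал with часто is rejected, while the Czech napsal with často in (d) is accepted.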

3.3 Modality

In the deliverables SPEC2 and IMPL2, we gave a computational account of the MOOD system of the interpersonal metafunction in the three Slavic languages. This chapter addresses the second key interpersonal system at clause rank, MODALITY. We outline the linguistic specification relevant to the target texts of the Final Prototype (Section 3.3.1), present a formal description (Section 3.3.2), and describe the implementation (Section 3.3.3).

Instructional texts typically use only a limited range of modality. Here we focus on the modalities that occur in the target texts of the Final Prototype.

The Final Prototype focuses on generating modal clauses that express the possibilities offered by a software tool, in particular the options available to the user and the conditions that must hold. Such clauses are needed for the styles newly introduced in the Final Prototype, the functional description and the overview of the tool.

The examples given in example (31) below show typical expressions of possibility in Bulgarian, Czech and Russian as they occur in our corpus for the Final Prototype.

(31) You can draw an arc.

(a) Bg: Вие можете да начертаете дъга.

You can-Pl2 draw-Pl2 arc

You can-Pl2 draw-Inf arc

(c) Ru: Вы можете нарисовать дугу

You can-Pl2 draw-infinitive arc-Acc

In all three languages, as in English, possibility is conveyed through a modal auxiliary similar to "can". The structure for expressing general possibility is the same throughout: a Modal Operator followed by a Main Verb, which is an infinitive in Czech and Russian and a finite verb introduced by да in Bulgarian.
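As an illustration, the shared Modal Operator + Main Verb structure can be written down per language. This hypothetical sketch follows the glosses in (31): an infinitive Main Verb in Czech and Russian, and a finite Main Verb introduced by да in Bulgarian; the function name `possibility-structure` and the feature keywords are assumptions.

```lisp
;;; Hypothetical function structures for general possibility ("You can draw an arc").
(defun possibility-structure (language)
  "Ordered list of (function features...) for a possibility clause in LANGUAGE."
  (ecase language
    ((:czech :russian)
     '((modal-operator :finite) (main-verb :infinitive)))
    (:bulgarian
     '((modal-operator :finite) (particle "да") (main-verb :finite)))))
```

In every language the Modal Operator precedes the Main Verb; only the form of the Main Verb, and the Bulgarian particle, differ.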

Another kind of modality expression considered in the Final Prototype is illustrated in example (32) below. All of these expressions mean enabling someone to do something.

The process involves two key roles: the Initiator, who provides the opportunity to act, and the Agent, who executes the action.

(32) The system enables (you) to create a multiline.

(a) Bg: (i) Системата позволява да създавате мултилиния.

System-Det enable-Sg3 create-Pl2Imperf multiline

(ii) Системата позволява създаване на мултилиния.

System-Det enable-Sg3 create-nominalization of multiline

(b) Cz: (i) Systém umožňuje vytvořit multičáru.

Systém-Sg enable-Sg3 create-Inf.Perf multiline.

(ii) Systém umožňuje vytvoření multičáry.

Systém-Sg enable-Sg3 create-nominalization multiline.

(c) Ru: Система позволяет создавать мультилинии

System-Sg3 enable-Sg3 create-Infinitive multiline

We account for the structure of such constructions in terms of verbal group complexes.

All types of verbal group complexes in English are described in (Halliday 85). Figure 14 below shows the classification of verbal group complexes in English, with examples.

Verbal group complexes

(i) Parataxis:
“She got killed, got run over by a lorry.”
“I neither like nor dislike it.”
“He tried, but failed, to extract the poison.”

(ii) Hypotaxis: “tried to do”
The primary verbal group (“try”) carries the mood of the clause.
The secondary verbal group (“do”) has a dependent status.

(ii)-I Expansion:
“stops doing”, “turns out to do”
“avoid doing”, “learn to do”
“happen to do”, “help to do”

(ii)-II Projection (mentalized or verbalized):
“want to do”, “pretend to do”, “be scared to do”

Figure 14: Classification of verbal group complexes (Halliday 94)

This classification of verbal group complexes also applies to the Slavic languages. We focus on hypotactic extending verbal group complexes, also known as conation, since this is the only type present in our target texts. There are four subtypes of this category, illustrated in examples (33) to (36).

(33) ”try to do”, “attempt to do”, ”avoid doing”

(a) Bg: опитвам да стремя се да избягвам да

Try to attempt-Refl to avoid to

(b) Cz: zkoušet pokusit se vyhnout se

try to attempt to avoid doing

(c) Ru: попробовать сделать постараться сделать

Try do-Inf attempt do-Inf.

(34) “succeed in doing”, “manage/get to do”, “fail doing/to do”…

(a) Bg: успявам да направя пропускам да направя

succeed to do fail to do

(b) Cz: uspět podařit se selhat

Succeed manage to do fail

(c) Ru: удаваться сделать

succeed do-Inf.

(35) ”be (un)able to do”, “know how to do”

(a) Bg: (не съм) в състояние съм да направя зная да правя

(not be-Sg1) able be-Sg1 to do know-Sg1 to do

(b) Cz: být (nebýt) schopen vědět, jak

be (not to be) able know how to

(c) Ru: быть в состоянии сделать знать, как сделать

Be-Inf able do-Inf know how do-Inf.

(36) “learn to do”, “practise doing”

(a) Bg: уча се да правя упражнявам се да правя

learn-Refl to do exercise-Refl to do

(b) Cz: učit se cvičit

learn practise

(c) Ru: научиться делать

learn-Refl do-Inf.

Conation in the three languages is typically realized in the active voice, either with a reflexive form or as a causative structure. Passive constructions are atypical for the Slavic languages.

Bulgarian and Czech each offer two constructions for the secondary verb in conation: Bulgarian uses either a da-construction or a nominalization after the primary verb, while Czech uses either an infinitive or a nominalization. Russian uses only the infinitive for the secondary verb.

Conative and Reussive verbal group complexes typically involve a single participant in both the primary and the secondary process and are usually realized in the active voice. Potential and Achieval complexes, in contrast, are more commonly expressed by a causative structure, which identifies both the Initiator and the actual "doer" of the action.

(37) Enable someone to do something

(a) Bg: давам възможност/позволявам някому да прави нещо

give possibility/enable so-Dat to do sth

(b) Cz: Umožnit někomu něco udělat.

Enable so-Dat sth to do

(c) Ru: давать возможность кому-нибудь делать что-нибудь

give possibility so-Dat do-Inf sth

(38) Teach someone to do something

(a) Bg: уча някого да прави нещо

teach so to do sth.

(b) Cz: Učit někoho něco dělat.

to teach so sth do-Inf

(c) Ru: учить кого-нибудь делать что-нибудь

teach so do-Inf sth

In the examples discussed above, the semantics is that of general possibility, so the actual agent of the secondary verb is implicit and generic. Czech and Russian achieve this by using the infinitive form of the secondary verb, while Bulgarian uses a nominalization. This characteristic can be further emphasized by omitting any explicit mention of the agent of the secondary process, as illustrated in the following examples.
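The distribution of secondary-verb forms described above can be summarized as a lookup table. The Python sketch below is purely illustrative: the language codes, form labels and the function `secondary_verb_form` are hypothetical, and the fallback for a non-generic agent (taking the first listed form) is our assumption rather than a rule stated in the grammar.

```python
# Hypothetical summary of the secondary-verb realizations in conation complexes.
ALLOWED_FORMS = {
    "bg": ("da-construction", "nominalization"),
    "cs": ("infinitive", "nominalization"),
    "ru": ("infinitive",),
}

# Form used when the agent of the secondary process is implicit and generic
# (general possibility), following the description in the text.
GENERIC_AGENT_FORM = {"bg": "nominalization", "cs": "infinitive", "ru": "infinitive"}


def secondary_verb_form(lang: str, generic_agent: bool) -> str:
    """Return the realization of the secondary verb for a language code."""
    if lang not in ALLOWED_FORMS:
        raise ValueError(f"unsupported language: {lang}")
    if generic_agent:
        return GENERIC_AGENT_FORM[lang]
    # Assumption: with a specific agent, fall back to the first listed form.
    return ALLOWED_FORMS[lang][0]
```

For instance, `secondary_verb_form("bg", True)` returns the nominalization, matching example (32)(a)(ii).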

(39) Implicit Agent of secondary process

(a) Bg: Системата позволява да правите това.

System-Det allow-Sg3 to do-Pl2 it

(b) Cz: Systém umožňuje udělat to.

Systém allow-Sg3 to do-Inf smth.

(c) Ru: Система позволяет делать/сделать это

System allow-Sg3 do-Inf.Imperf/Inf.Perf it.

(40) Explicit Agent of secondary process

(a) Bg: Системата Ви позволява да правите това.

System-Det you-AccPl2 allow-Sg3 to do-Pl2 it

(b) Cz: Systém vám umožňuje udělat to.

Systém you-DatPl allow-Sg3 to do it.

(c) Ru: Система позволяет Вам делать/сделать это.

System allow-Sg3 you-Dat do-Inf.Imperf/Inf.Perf it.

The next section presents the formal grammar specification for the three Slavic grammars covering these types of modality, based on the English grammar Nigel.

We adopted the main part of the system network presented below from the Nigel grammar.

MODAL (+MODAL, MODAL::MODAL-AUX)

If a modal operator is specified in the clause

In Bulgarian, Czech, and Russian, modal verbs have tense forms. However, since our examples do not involve different tenses and our register does not require them, we retain the "temporal-modal" distinction.

The next three systems deal with the semantics of the modal verb.

If the process is ambient

Elseif the process is existential

Then choose NONVOLITIONAL

Elseif the modal operator is explicitly specified as volitional

Then choose VOLITIONAL

Else choose NONVOLITIONAL

Chooser: DEGREE-OF-MODALITY-CHOOSER

If a modal operator is explicitly specified as necessity

GENERAL-POSSIBILITY (MODAL::NOT-ABILITY-AUX)

If the process is ambient

Elseif the process is existential

Then choose GENERAL-POSSIBILITY

Elseif the modal operator is explicitly specified as ability

Then choose ABILITY

Else choose GENERAL-POSSIBILITY
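The three choosers above can be read as a cascade of decision functions. The following Python sketch encodes them under stated assumptions: the `ModalClause` input type is hypothetical; the outcome of the ambient branch, which is not preserved in this copy, is assumed to match the existential one; and the fallback of DEGREE-OF-MODALITY to POSSIBILITY, the entry conditions linking the three systems, and the mapping of POSSIBILITY to possibility-aux are all assumed as well. It is a sketch, not the KPML implementation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ModalClause:
    """Hypothetical chooser input: process type plus explicit modal specification."""
    process_type: str                 # e.g. "material", "ambient", "existential"
    modal_spec: Optional[str] = None  # e.g. "volitional", "necessity", "ability"


def volitionality(c: ModalClause) -> str:
    # Assumption: ambient is treated like existential (branch outcome not preserved).
    if c.process_type in ("ambient", "existential"):
        return "NONVOLITIONAL"
    if c.modal_spec == "volitional":
        return "VOLITIONAL"
    return "NONVOLITIONAL"


def degree_of_modality(c: ModalClause) -> str:
    # Assumption: anything not explicitly marked as necessity expresses possibility.
    return "NECESSITY" if c.modal_spec == "necessity" else "POSSIBILITY"


def possibility_type(c: ModalClause) -> str:
    if c.process_type in ("ambient", "existential"):
        return "GENERAL-POSSIBILITY"
    if c.modal_spec == "ability":
        return "ABILITY"
    return "GENERAL-POSSIBILITY"


# Preselections on the Modal function (POSSIBILITY -> possibility-aux is assumed).
PRESELECTIONS = {"POSSIBILITY": "possibility-aux", "GENERAL-POSSIBILITY": "not-ability-aux"}


def modal_features(c: ModalClause) -> List[str]:
    """Traverse the three systems in order; the entry conditions are assumptions."""
    feats = [volitionality(c)]
    if feats[-1] == "NONVOLITIONAL":
        feats.append(degree_of_modality(c))
        if feats[-1] == "POSSIBILITY":
            feats.append(possibility_type(c))
    return feats


def modal_preselections(c: ModalClause) -> List[str]:
    return [PRESELECTIONS[f] for f in modal_features(c) if f in PRESELECTIONS]
```

Under these assumptions, an unmarked material clause such as "You can draw an arc" follows the path NONVOLITIONAL, POSSIBILITY, GENERAL-POSSIBILITY.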

Generating example (31) involves the NONVOLITIONAL, POSSIBILITY, and GENERAL-POSSIBILITY branches, which assign the features possibility-aux and not-ability-aux to the Modal function. This system works uniformly across all three languages.

MODALITY-CONDITIONAL (FINITE::CONDITIONALITY-AUX, FINITE^MODAL)

MODALITY-NONCONDITIONAL (FINITE/MODAL)

If the clause is explicitly specified as conditional

If the clause is explicitly specified as negative

In Bulgarian, modal verbal groups are formed by a modal verb combined with a "da-construction", i.e. the particle "da" followed by the main verb, which agrees with the finite verb in person and number. The main verb in this structure is realized as the AuxStem function. Finite-AuxStem agreement is discussed in detail in the Agreement chapter of this document.

In Czech and Russian, the main verb of a modal construction is an infinitive, which the grammar expresses by inflecting the AuxStem function. Bulgarian, in contrast, uses a da-construction: the da-particle is inserted and lexified, and an ordering constraint places it before the AuxStem. Finite-AuxStem agreement in person and number is a further feature that distinguishes Bulgarian from Czech and Russian.

In Czech and Russian, AuxStem is inserted and inflected in the infinitive form.

AUXSTEM-INSERTED

BG: (+AUXSTEM, +DA-PARTICLE, DA-PARTICLE!DA, DA-PARTICLE^AUXSTEM, FINITE~AUXSTEM PERSON-FORM, FINITE~AUXSTEM NUMBER-FORM)

CZ: (+AUXSTEM, AUXSTEM:::INFINITIVE-FORM)

RU: (+AUXSTEM, AUXSTEM:::INFINITIVE-FORM)
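To make the effect of these realization statements concrete, the following Python sketch interprets them over a toy list of constituents. The `Constituent` type and `realize_auxstem_inserted` are hypothetical. Placing Finite before the da-particle is an assumption taken from the surface order of example (31), not from the statements themselves.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Constituent:
    function: str
    features: Dict[str, str] = field(default_factory=dict)
    lexeme: Optional[str] = None


def realize_auxstem_inserted(lang: str, person: str, number: str) -> List[Constituent]:
    """Toy interpretation of AUXSTEM-INSERTED; returns constituents in linear order."""
    finite = Constituent("Finite", {"person": person, "number": number})
    auxstem = Constituent("AuxStem")                       # +AUXSTEM
    if lang == "bg":
        da = Constituent("DaParticle", lexeme="да")         # +DA-PARTICLE, DA-PARTICLE!DA
        # FINITE~AUXSTEM PERSON-FORM / NUMBER-FORM: AuxStem agrees with Finite.
        auxstem.features.update(person=person, number=number)
        # DA-PARTICLE^AUXSTEM; Finite first is an assumption (see lead-in).
        return [finite, da, auxstem]
    if lang in ("cs", "ru"):
        auxstem.features["form"] = "infinitive"            # AUXSTEM:::INFINITIVE-FORM
        return [finite, auxstem]
    raise ValueError(f"unsupported language: {lang}")
```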

In all three languages, the main verb of the modal construction (AuxStem) carries voice; this is achieved by conflating AuxStem and Voice in the AUXSTEM-VOICE system.

The Nigel grammar does not implement verbal group complexes in general; it covers only PHASE verbal group complexes. We used the PHASE network as the basis for structuring the system network that generates CONATION verbal group complexes. The proposed solution applies to all three languages, and the differences between them are pointed out explicitly in the presentation that follows.

The CONATION system introduces the Conation and Conationdependent elements into the structure of the clause to be generated.

If the clause contains conation verbal group complex

Each subtype of CONATION leads to particular lexical items in each language; see the system CONATION-TYPE:

CONATIVE leads to the choice of the verb “try”

REUSSIVE leads to the choice of the verb “succeed”

POTENTIAL leads to the choice of the verb “allow”

ACHIEVAL leads to the choice of the verb “learn”

If the verbal group complex is explicitly specified as conative

Then choose CONATIVE

Elseif the verbal group complex is explicitly specified as reussive

Then choose REUSSIVE

Elseif the verbal group complex is explicitly specified as expressing potential

Then choose POTENTIAL

Else choose ACHIEVAL
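The CONATION-TYPE chooser and the lexical consequences of each subtype can be sketched as follows. The English verbs are those given above, and the Bulgarian, Czech and Russian lemmas are taken from examples (32) to (37). Choosing exactly one lemma per subtype and language is a simplifying assumption, and the names `conation_type` and `conation_verb` are hypothetical.

```python
from typing import Optional

# One lemma per subtype and language (a simplification; see lead-in).
CONATION_VERBS = {
    "CONATIVE": {"en": "try", "bg": "опитвам", "cs": "zkoušet", "ru": "попробовать"},
    "REUSSIVE": {"en": "succeed", "bg": "успявам", "cs": "uspět", "ru": "удаваться"},
    "POTENTIAL": {"en": "allow", "bg": "позволявам", "cs": "umožnit", "ru": "позволять"},
    "ACHIEVAL": {"en": "learn", "bg": "уча се", "cs": "učit se", "ru": "научиться"},
}


def conation_type(spec: Optional[str]) -> str:
    """CONATION-TYPE chooser: explicit specification, defaulting to ACHIEVAL."""
    if spec == "conative":
        return "CONATIVE"
    if spec == "reussive":
        return "REUSSIVE"
    if spec == "potential":
        return "POTENTIAL"
    return "ACHIEVAL"


def conation_verb(spec: Optional[str], lang: str) -> str:
    """Return the primary (conation) verb lemma selected for a language."""
    return CONATION_VERBS[conation_type(spec)][lang]
```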

Subject dropping

Czech and Bulgarian are pro-drop languages: the Subject of a declarative clause can be left unexpressed (implicit). The Subject can thus be omitted much as it usually is in English imperative clauses. Russian, like English, does not typically drop the Subject, as the following examples illustrate.

(a) En: Open the Styles dialog box.

(b) Cz: Otevřete dialogové okno Styles.

Open-Pl2 dialogue window-Acc Styles.

(c) Bg: Отворете диалоговия прозорец Styles.

Open-Pl2 dialogue window Styles.

(d) Ru: Откройте диалоговое окно Styles.

Open-Pl2 dialogue window-Acc Styles.

(a) En: We open the Styles dialog box.

(b) Cz: Otevřete | Otevřeme dialogové okno Styles.

Open-Pl2 | Open-Pl1 dialogue window-Acc Styles.

(c) Bg: Отваряте | Отваряме диалоговия прозорец Styles.

Open-Pl2 | Open-Pl1 dialogue window Styles.

(d) Ru: Вы откроете | Мы откроем диалоговое окно Styles.

Open-Pl2 | Open-Pl1 dialogue window -Acc Styles.

In clauses where the Subject is implicit rather than explicitly realized, the finite verb still agrees with it in gender and number. This case must be distinguished from clauses that lack a Subject altogether. For instance, in reflexive passive constructions of verbs that are not transitive, the Goal is expressed by a prepositional phrase and therefore cannot become the Subject of the passive. Only the Goals of transitive verbs, which appear in the accusative case in the active voice, can take the Subject role in passive constructions.

Cz: Klikne se na tlačítko OK

Clicks refl on button OK

One clicks on the OK button / The Ok button is clicked on.

In this case, in a clause with no Subject, the Finite element takes the third person singular form in Czech (Hana Skoumalová, p.c.)

John Bateman proposes an alternative view of the Finite element in clauses with a dropped Subject: its gender and number do not express agreement with a non-existent Subject; rather, it functions similarly to the markers in switch-reference languages, indicating whether the Subject has changed. Moreover, the dropped Subject may refer to entities from previous clauses that were not Subjects there but can be pronominalized. This adds an interesting dimension to switch reference, but further exploration of this alternative view has to be left for future research.

(a) En: Do you open the Styles dialog box?

(b) Cz: Otevřete dialogové okno Styles?

Open-Pl2 dialogue window-Acc Styles?

(c) Bg: Отваряте ли диалоговия прозорец Styles?

Open-Pl2 Q dialogue window Styles?

(d) Ru: Вы откроете диалоговое окно Styles?

You open-Pl2 dialogue window-Acc Styles?

The similarity between the imperative and the indicative with respect to Subject implicitness is limited to interactants. In imperative clauses the Subject is usually the hearer, whereas in indicative clauses an omitted Subject can be either an interactant or a non-interactant.

(47) Indicative: Declarative mood (cont after (45))

(a) En: It appears in the middle of the screen.

(b) Cz: Objeví se uprostřed obrazovky.

Appears refl middle-Prep screen-Gen.

(c) Bg: Появява се в центъра на екрана.

Appears refl in middle of screen.

(d) Ru: Оно появится в центре экрана.

It appears-refl v middle screen-Gen.

The criteria for omitting a subject closely resemble those for subject realization through a weak or unstressed pronoun form in non-pro-drop languages such as English.

The referent of the Subject must be salient in the context, either through the immediately preceding text or through the situation shared by the speaker and the hearer.

The referent of the Subject must be presented as Given, i.e. as contextually bound, and must not be contrasted with another entity (cf. Section 3.5).

In Czech and Bulgarian, dropping a Subject that refers to a contextually salient entity is common, particularly when the Subject refers to an interactant in an instructional text. Interactants are typically salient enough for the Subject to be omitted unless it is introduced as New or contrasted with another entity. Contrast between an interactant and another entity does occur in some instructional texts, as the following examples show, but such cases do not arise in the instructional texts generated in AGILE.

Systém zobrazí obrys okna. Vy pak musíte určit jeho polohu.

System displays frame window-Gen. You-Nom then must-Pl2 specify its location-Acc.

The system displays a window contour. You then have to specify its location.

Style-Acc must-Pl2 define you-Nom.

The style must be defined by you / You must define the style yourself.

In the first of these examples, the interactant-Subject "vy" (you) is Given, but it is contrasted with the non-interactant Subject "systém" (system) of the previous sentence. In the second, the interactant-Subject is introduced as New in the information structure, which again creates a contrast. In AGILE target texts, interactants are consistently presented as neither New nor contrasted, so Subjects referring to interactants are omitted.

In Bulgarian, the situation is similar, as illustrated in the following parallel examples.

(50) Bg: Системата изобразява рамката на прозореца

System displays frame of window-det

The system displays the window contour.

Вие трябва да окажете неговото място.

You must-Pl2 specify its location-Acc

You have to specify its location.

(51) Bg: Вие трябва да дефинирате стила.

you must define-Pl2 style-det

You must define the style yourself.

(52) Bg: Стилът трябва да се дефинира от Вас.

style-det must-Pl2 refl define by you-Acc.

The style must be defined by you.

(53) Стилът трябва да се дефинира от Вас.

style-det must-Pl2 refl define-refl by you-Acc

The style must be defined by you

Non-interactants become salient in a particular context either situationally or textually. The writer of an instructional text may assume that the reader can see certain screen configurations or objects, but a shared situational context alone does not guarantee that these entities are sufficiently salient. For clarity, the writer has to introduce them explicitly into the text. Consequently, situational salience alone is not sufficient for Subject dropping in instructional texts, as the following example illustrates.

Stiskněte Ctrl-O pro otevření dokumentu. Vyberte soubor a klepněte na OK.

Press Ctrl-O for opening document-Gen. Choose file-Acc and click on OK.

Press Ctrl-O to open a document. Choose a file and click OK.

(55) Bg: Натиснете Ctrl-O за отваряне на документа.

Press-Pl2 Ctrl-O for opening of document-det.

Изберете файл и натиснете OK.

Choose file and click OK.

Press Ctrl-O to open a document. Choose a file and click OK.

In both Czech and Bulgarian, the Subject of the last sentence cannot be omitted merely on the basis of the situational context. Since situational salience is insufficient for Subject dropping in written instructions, only textual salience is taken into account in the texts generated in AGILE.

Textual salience is often viewed as a spectrum, where entities mentioned more recently in a text are perceived as more prominent, while the significance of those referenced earlier diminishes over time.

Hajičová (1993) emphasized that both information structure and the form of referring expressions must be considered when assessing the salience of an entity once it has been introduced into the context or referred to again later. Moreover, if a text is analyzed as consisting of hierarchically related segments that serve as "focus spaces" (Grosz and Sidner 1986), the salience of the entities in an embedded segment can decrease considerably when that segment is closed off. Conversely, when a segment is returned to, the salience of its entities can be restored.
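The recency-based view of salience and the effect of closing and revisiting focus spaces can be illustrated with a toy model. In the following Python sketch, the class `FocusSpaceSalience` and its numeric parameters (a decay factor of 0.5 per new mention in the same segment, a factor of 0.2 for entities in closed segments) are hypothetical; neither Hajičová (1993) nor Grosz and Sidner (1986) prescribes these values.

```python
from typing import Dict, List


class FocusSpaceSalience:
    """Toy recency-based salience over a stack of focus spaces (values hypothetical)."""

    def __init__(self, decay: float = 0.5, closed_factor: float = 0.2) -> None:
        self.decay = decay
        self.closed_factor = closed_factor
        self.open_spaces: List[Dict[str, float]] = [{}]   # innermost segment last
        self.closed_spaces: List[Dict[str, float]] = []   # segments that were closed off

    def mention(self, entity: str) -> None:
        # Earlier entities of the current segment fade; the mentioned one is refreshed.
        current = self.open_spaces[-1]
        for other in current:
            current[other] *= self.decay
        current[entity] = 1.0

    def open_segment(self) -> None:
        self.open_spaces.append({})

    def close_segment(self) -> None:
        if len(self.open_spaces) == 1:
            raise RuntimeError("the outermost segment cannot be closed")
        self.closed_spaces.append(self.open_spaces.pop())

    def revisit_last_closed(self) -> None:
        # Returning to a closed segment restores the salience of its entities.
        self.open_spaces.append(self.closed_spaces.pop())

    def salience(self, entity: str) -> float:
        for space in reversed(self.open_spaces):
            if entity in space:
                return space[entity]
        for space in reversed(self.closed_spaces):
            if entity in space:
                return space[entity] * self.closed_factor
        return 0.0
```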

However, Subject dropping is a rather local phenomenon: it seems impossible to drop a non-interactant Subject unless its referent was explicitly mentioned in the immediately preceding clause.
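The conditions for Subject dropping discussed in this section can be combined into a single predicate. The Python sketch below is an assumption-laden summary, not the AGILE implementation: the `Subject` type and `may_drop_subject` are hypothetical, and the non-interactant rule follows the Czech examples. Bulgarian does not allow dropping in continuations such as (58), and this sketch does not model that difference.

```python
from dataclasses import dataclass
from typing import Set

PRO_DROP = {"cs", "bg"}  # Russian is treated as non-pro-drop, as in the text


@dataclass
class Subject:
    referent: str
    interactant: bool
    given: bool
    contrasted: bool = False


def may_drop_subject(lang: str, subj: Subject, previous_clause_referents: Set[str]) -> bool:
    """Decide whether the Subject may stay implicit (hypothetical rule set)."""
    if lang not in PRO_DROP:
        return False
    # The referent must be Given and must not be contrasted with another entity.
    if not subj.given or subj.contrasted:
        return False
    # Interactants are salient enough in instructional texts.
    if subj.interactant:
        return True
    # Non-interactants: only textual salience counts, and only from the
    # immediately preceding clause; situational salience is ignored.
    return subj.referent in previous_clause_referents
```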

Written and spoken instructions differ in that situational salience is insufficient for Subject dropping in written texts, whereas in spoken communication the speaker and the hearer share the same situational context, which makes Subject dropping easier. This is illustrated by the following examples, in which a first sentence is followed by alternative continuations.

Specify the name of the file.

(a) Objeví se v horní liště okna editoru.

Appears refl in top ledge window-Gen editor-Gen.

It appears in the title bar of the editor window.

(a) ? Stiskněte Enter. Objeví se v horní liště okna editoru.

Press Enter. Appears refl in top ledge window-Gen editor-Gen.

Press Enter. It appears in the title bar of the editor window.

(b) Stiskněte Enter. Název se objeví v horní liště okna editoru.

Press Enter. Name refl appears in top ledge window-Gen editor-Gen.

Press Enter. The name appears in the title bar of the editor window.

While (56)+(a) is acceptable, (56)+(57)(a) is unclear because of the dropped Subject. Although the referent could be inferred from lexico-semantic cues, in our judgment Subject dropping is inappropriate here. In contrast, (56)+(57)(b) reads as smooth, natural text.

In Bulgarian, the Subject cannot be dropped in any of these continuations; either a personal pronoun or a nominal group has to be used, as illustrated below:

(58) Bg: Задайте име на файла

Specify name of file-det

Specify the name of the file

(a) То се появява в заглавния ред на прозореца.

it appears refl in top ledge of window-det

It appears in the title bar of the window.

Word order

Generating the correct order of clause elements is a crucial issue in natural language generation (NLG) if fluent spoken or written text is to be produced. An improper word order reduces fluency and can lead to misinterpretation; even in English, quantifier scope and prepositional phrase attachment vary with word order. The issue is even more significant in languages with flexible word order, such as the Slavic languages, where different orderings convey distinct meanings. Selecting the appropriate word order for a given context is therefore essential for effective NLG in these languages.

A detailed discussion of the ordering phenomena at the clause level was presented in LSPEC2 (see footnote 14); we repeat one example in Section 3.5.1 below.

Slavic languages exhibit considerable word order flexibility, yet this flexibility is not arbitrary: it is governed by the discourse context and by communicative intentions. Variations in word order often signal differences in the information status of entities and processes, in particular their familiarity and salience to the hearer. Moreover, as in other languages, word order plays a role in text organization, affecting how information is presented to the reader.

In the SFG framework, ordering is related to the thematic structure and the information structure, i.e. to the arrangement of elements with respect to the broader context. In our LSPEC2 and IMPL3 deliverables, we extended Halliday's Systemic Functional Grammar (SFG) with concepts from the Praguian Functional Generative Description (FGD). This led to an ordering algorithm that combines the insights of both approaches to information structure.

The approach proposed in the LSPEC2 and IMPL3 deliverables was not implemented in the Intermediate Prototype because KPML 2.0 did not handle the :contextual-boundness parameter. With the support provided in KPML 2.1 and later versions, we can now apply our flexible ordering algorithm.

In this chapter, we present our approach to word order in the AGILE project. We begin by illustrating the word order phenomena in our three target languages, as previously discussed in LSPEC2, to help readers understand the concepts used in our ordering algorithm and the problems it addresses (Section 3.5.1). We then give a brief overview of our text planning approach, in order to make clear the interaction between the text planner and the sentence generator (Section 3.5.2).

14 See also (Kruijff-Korbayová et al. in prep.).

In Section 3.5.3, we introduce a flexible ordering algorithm that takes into account both grammatical constraints and information structure when ordering the elements of a clause. This approach has been implemented for Czech and Russian, as described in Sections 3.5.3.2 and 3.5.3.3, respectively. It preserves the SFG notion of Theme, which is placed at the beginning of the clause on the basis of text organization. For the non-thematic constituents, we adopt principles from the Functional Generative Description (FGD): we use systemic ordering and distinguish between contextually bound and contextually non-bound elements. Contextually bound elements are ordered separately from non-bound ones: the non-bound elements follow systemic ordering, while the bound elements may deviate from it, although systemic ordering serves as their default.
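The core of the ordering strategy just outlined can be sketched in a few lines of Python. The `Element` type and `order_elements` are hypothetical, and so is `SYSTEMIC_ORDER`, which we chose only so that the neutral variant (66)(a) (Patient before Instrument) comes out; it is not the systemic ordering used in AGILE. Verb placement and Wackernagel clitics are not modelled.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical systemic ordering, for illustration only.
SYSTEMIC_ORDER = ["Actor", "Time", "Patient", "Instrument", "Place"]


@dataclass
class Element:
    role: str
    text: str
    contextually_bound: bool = False
    theme: bool = False


def systemic_rank(e: Element) -> int:
    return SYSTEMIC_ORDER.index(e.role)


def order_elements(elements: List[Element]) -> List[Element]:
    """Theme first, then contextually bound elements, then non-bound ones.

    Both groups are sorted by systemic ordering; for bound elements this is
    only the default, which a real grammar may override.
    """
    theme = [e for e in elements if e.theme]
    rest = [e for e in elements if not e.theme]
    bound = sorted((e for e in rest if e.contextually_bound), key=systemic_rank)
    non_bound = sorted((e for e in rest if not e.contextually_bound), key=systemic_rank)
    return theme + bound + non_bound
```

With both the file and the command contextually bound, the sketch yields "soubor, příkazem Open", the order of (66)(f); variant (66)(d) shows that bound elements may also deviate from this default.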

Languages differ in the extent to which word order is constrained by grammatical structure and by information structure. In Czech, clitics must appear in the Wackernagel position, which requires specific structural constraints; these are described in Section 3.5.4.1. Bulgarian, on the other hand, relies solely on structural constraints for word order, similar to the approach of the Nigel grammar for English; its implementation is described in Section 3.5.4.2.

3.5.1 Word Order Freedom in Czech, Russian and Bulgarian

In Czech, Russian, and Bulgarian, sentences that differ only in word order are not interchangeable in the same context. However, the degree of word order freedom differs among the three languages. To show how variations in word order convey different meanings, we examine examples from each language; they demonstrate the relationship between word order and information structure.

Open-imp file-Acc command-Ins Open

Open a|the file by the Open command.

File-Acc open-imp command-Ins Open.

Open the file by the Open command.

(c) Příkazem Open otevřete soubor. command-Ins Open open-imp file-Acc

By the Open command open a file.

(d) Příkazem Open soubor otevřete. command-Ins Open file-Acc open-imp

By the Open command open the file.

(e) Otevřete příkazem Open soubor. open-imp command-Ins Open file-Acc

Open a file by the Open command.

File-Acc command-Ins Open open-imp.

Open the file by the Open command.

The sentence in (66)(a) is neutral: it can be used out of the blue or in response to the question "What should we do?". It implies nothing about the existence or the identity of a particular file; both the file and the command are presented as contextually non-bound.

The sentence in (66)(b) is appropriate when a specific file is salient, for example when the user is already working with it; hence the definite article in the English translation. The act of opening the file may also be salient, but this is not necessary. (66)(b) fits contexts characterized by questions such as "What should we do with the file?" or "How should we open the file?", in which the file is contextually bound while the command is not.

The verb form in (66)(b) is homonymous between the imperative and the declarative. The declarative can have a generic reading, such as "Any file can be opened by the Open command", in which the verb does not refer to a specific instance of opening. Here we consider only the meanings related to a single act of opening, as illustrated in example (66).

(66)(f) presupposes that both the file and the Open command are salient, i.e. contextually bound. Contexts appropriate for (66)(f) can be characterized by the question "What should we do with the file using the Open command?".

The Open command is presupposed to be salient (contextually bound) in (66)(c), (d) and (e). In addition, (66)(d) presupposes a specific file, which is why the definite article is used in the English translation. Contexts for (66)(c) are typically characterized by the question "What should we do with the Open command?".

It can also be used where the use of various commands or tools is being discussed, even if the Open command itself is not particularly prominent. In such contexts, the file is non-specific, while the command is contextually bound.

(66)(d) fits contexts characterized by the question "What should we do with the file using the Open command?". Here both the file and the command are contextually bound, which makes this variant the most restricted in its use.

Textual conjunction

To make instructional texts easy to follow, the text has to present the hierarchical organization of tasks and the sequence of steps clearly. This can be achieved through layout and through hypotactic relations between clauses. Where necessary, the sequencing of steps should also be marked explicitly.

The content of an A-box is sequentially ordered: the order of the sub-steps indicates the order in which they are to be executed. To communicate such instructions, the output must reflect this sequence in the arrangement of sentences. The more complex the input, the more structuring the generated text needs. Figures 62 and 63, taken from TEXS3:3.2.1 where they are discussed further, show two ways of achieving this.

To draw a line and arc combination polyline

First start the PLINE command using one of these METHODs:

Windows: From the Polyline flyout on the Draw toolbar, choose Polyline.

DOS and UNIX: From the Draw menu, choose Polyline.

Then specify the start point of the line segment and the endpoint of the line segment.

First switch to Arc mode by entering a. The Arc mode confirmation dialog box appears. Select

Then specify the endpoint of the arc.

First return to Line mode by entering l. The Line mode confirmation dialog box appears. Select

Then enter the distance and angle of the line in relation to the endpoint of the arc.

4 Press Return to end the polyline.

Figure 62: A combination of sequence styles: using numbered list for the top-level GOALs, and explicit sequence discourse markers for the lower-level GOALs, along with aggregation.

To draw a line and arc combination polyline

1 Start the PLINE command using one of these METHODs:

Windows: From the Polyline flyout on the Draw toolbar, choose Polyline.

DOS and UNIX: From the Draw menu, choose Polyline.

2 Specify the start point of the line segment.

3 Specify the endpoint of the line segment.

Then draw an arc segment.

1 Enter a to switch to Arc mode. The Arc mode confirmation dialog box appears.

3 Specify the endpoint of the arc.

Then draw another line segment.

1 Enter l to return to Line mode. The Line mode confirmation dialog box appears.

3 Enter the distance and angle of the line in relation to the endpoint of the arc.

Finally, press Return to end the polyline.

Figure 63: A combination of sequence styles: using explicit sequence markers for the top-level GOALs within the list of steps, and a numbered list for the lower-level steps.

In Section 3.2 of the TEXS3 deliverable, we explored alternative ways of presenting complex content. In TEXS3:3.2.1 in particular, we emphasized the importance of marking sequences explicitly to help readers navigate texts that describe complex structured tasks, and we proposed to distinguish the following styles of sequence realization:

• Unmarked running text sequence (realised by a continuous paragraph where the elements in the sequence do not include any overt sequence markers)

• Linguistically marked running text sequence (realised by a continuous paragraph where the elements in the sequence include overt sequence discourse markers)

• Unmarked list sequences (realised by bullet lists in which the elements in the sequence do not include any overt sequence discourse markers)

• Numbered list sequences (realised by numbered lists in which the numbering alone indicates the order of the elements, without any overt sequence discourse markers)

• Linguistically marked list sequences (realised by lists of elements in which the elements in the sequence include overt sequence discourse markers)
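The five sequence styles can be contrasted with a small renderer. The following Python sketch is illustrative only: `SequenceStyle` and `render` are hypothetical names, English markers are hard-coded (First, Then, Finally, as in Figures 62 and 64), and the layout chosen for each style (bullets for unmarked lists, plain lines for marked lists) is our assumption.

```python
from enum import Enum
from typing import List


class SequenceStyle(Enum):
    UNMARKED_RUNNING_TEXT = 1
    MARKED_RUNNING_TEXT = 2
    UNMARKED_LIST = 3
    NUMBERED_LIST = 4
    MARKED_LIST = 5


def english_markers(n: int) -> List[str]:
    """English sequence markers for a sequence of n >= 2 steps."""
    if n < 2:
        raise ValueError("a marked sequence needs at least two steps")
    if n == 2:
        return ["First", "Then"]
    return ["First"] + ["Then"] * (n - 2) + ["Finally"]


def render(steps: List[str], style: SequenceStyle) -> str:
    """Render non-empty step sentences in one of the five sequence styles."""
    marked = style in (SequenceStyle.MARKED_RUNNING_TEXT, SequenceStyle.MARKED_LIST)
    if marked and len(steps) > 1:
        # Prefix each step with its marker and lowercase the step's first letter.
        steps = [f"{m} {s[0].lower()}{s[1:]}" for m, s in zip(english_markers(len(steps)), steps)]
    if style in (SequenceStyle.UNMARKED_RUNNING_TEXT, SequenceStyle.MARKED_RUNNING_TEXT):
        return " ".join(steps)                    # one continuous paragraph
    if style is SequenceStyle.NUMBERED_LIST:
        return "\n".join(f"{i} {s}" for i, s in enumerate(steps, start=1))
    if style is SequenceStyle.UNMARKED_LIST:
        return "\n".join(f"• {s}" for s in steps)
    return "\n".join(steps)                       # marked list: one step per line
```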

Linguistic sequence discourse markers are expressions such as "firstly", "secondly", "now", and "finally", together with their counterparts in Bulgarian, Czech, and Russian; they make sequences in discourse explicit. Which markers are used depends on the number of elements in the sequence (Figure 64). Examples (99), (100), and (101) show how the same content is realized in English, Czech, Bulgarian, and Russian in different sequence styles.

English
3 and more elements: First, (Then)+, Finally

Bulgarian
3 and more elements: Отначало, (След това)+, Накрая

Czech
3 and more elements: Nejprve, (Potom)+, Nakonec

Russian
2 elements: Сначала, Затем (Теперь is also possible, but its use is restricted to the here-and-now interaction style)
3 and more elements: Сначала, (Затем)+, И наконец

Figure 64: Different linguistic markers depending on the number of elements in a sequence

23 (X)+ means one or more occurrences of X, here of the 'then' marker.
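Figure 64 amounts to a simple rule for choosing markers by sequence length. The Python sketch below encodes it; `MARKERS` and `sequence_markers` are hypothetical names. Only the Russian two-element row survives in this copy of the figure, so using the first marker and the "then" marker for two-element sequences in Bulgarian and Czech is an assumption (for English it matches Figure 62).

```python
from typing import Dict, List, Tuple

# (first, then, last) markers per language, taken from Figure 64.
MARKERS: Dict[str, Tuple[str, str, str]] = {
    "en": ("First", "Then", "Finally"),
    "bg": ("Отначало", "След това", "Накрая"),
    "cs": ("Nejprve", "Potom", "Nakonec"),
    "ru": ("Сначала", "Затем", "И наконец"),
}


def sequence_markers(lang: str, n: int) -> List[str]:
    """Return one marker per element of an n-element sequence."""
    if n < 2:
        raise ValueError("a marked sequence needs at least two elements")
    first, then, last = MARKERS[lang]
    if n == 2:
        # Attested for Russian (and English, Figure 62); assumed for bg and cs.
        return [first, then]
    # 3 and more elements: first, (then)+, last
    return [first] + [then] * (n - 2) + [last]
```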

1 Choose Element Properties. The Element Properties dialog box appears.

2 Enter the offset of the multiline element in the Element Properties dialog box.

3 Select Add to add the element.

1 Vyberte Element Properties. Objeví se dialogové okno Element Properties.

2 V dialogovém okně Element Properties zadejte offset elementu multičáry.

3 Pro přidání elementu vyberte Add.

1 Изберете Element Properties. Появява се диалоговият прозорец Element Properties.

2 Въведете отместването на елемента на мултилинията в диалоговия прозорец Element Properties.

3 Изберете Add, за да добавите елемента.

1 Нажмите кнопку Element Properties. Появится диалоговое окно Element Properties.

2 В диалоговом окне Element Properties введите смещение элемента мультилинии.

3 Нажмите кнопку Add, чтобы добавить элемент.

(a) English

First choose Element Properties. The Element Properties dialog box appears.

Then enter the offset of the multiline element in the Element Properties dialog box.

Finally select Add to add the element.

(b) Czech

Nejprve vyberte Element Properties. Objeví se dialogové okno Element Properties.

Potom v dialogovém okně Element Properties zadejte offset elementu multičáry.

Nakonec vyberte Add pro přidání tohoto elementu.

(c) Bulgarian

Отначало изберете Element Properties. Появява се диалоговият прозорец Element Properties.

След това въведете отместването на елемента на мултилинията в диалоговия прозорец Element Properties.

Накрая изберете Add, за да добавите елемента.

(d) Russian

Сначала нажмите кнопку Element Properties. Появится диалоговое окно Element Properties.

Затем в диалоговом окне Element Properties введите смещение элемента мультилинии.

И наконец нажмите кнопку Add, чтобы добавить элемент.

(a) English: First choose Element Properties. The Element Properties dialog box appears. Then enter the offset of the multiline element in the Element Properties dialog box. Finally select Add to add the element.

(b) Czech: Nejprve vyberte Element Properties. Objeví se dialogové okno Element Properties. Potom v dialogovém okně Element Properties zadejte offset elementu multičáry. Nakonec vyberte Add pro přidání tohoto elementu.

(c) Bulgarian: Отначало изберете Element Properties. Появява се диалоговият прозорец Element Properties. След това въведете отместването на елемента на мултилинията в диалоговия прозорец Element Properties. Накрая изберете Add, за да добавите елемента.

(d) Russian: Сначала нажмите кнопку Element Properties. Появится диалоговое окно Element Properties. Затем в диалоговом окне Element Properties введите смещение элемента мультилинии. И наконец нажмите кнопку Add, чтобы добавить элемент.

The classification of sequence styles shows how layout and linguistic realization interact in the presentation of sequences: the choice between a list and running text is a matter of layout, while the choice between numbering and discourse markers is a linguistic one. Text and sentence planning for these decisions are discussed in detail in the TEXS3 and TEXM3 deliverables.

This section focuses on the use of explicit linguistic sequence markers in procedural texts. Such markers connect separate sentences and signal non-structural relationships between them, such as sequence or cause; in this way they create cohesive bonds across sentences and contribute to the overall coherence of the text.

In SFG, this type of cohesion is known as conjunction (Halliday 1985, pp. 323-330).

To minimize terminological confusion, we will refer to this concept as textual conjunction. Halliday distinguishes meanings of elaboration, extension and enhancement 24, which can be expressed either by conjunctive adjuncts (adverbial groups or prepositional phrases) or by conjunctions such as and, or, nor, but, yet, so and then. Conjunctive adjuncts and conjunctions typically occur at the beginning of the sentence, within the Theme; Halliday observes this for English, and the same pattern holds in Czech, Russian and Bulgarian.

The classification of textual conjunction proposed by Halliday is summarised below:

24 Cf. the discussion of clause complexity in the LSPEC2 deliverable for an explanation of elaboration, extension and enhancement.

• Elaboration
  o Apposition (expository, exemplifying)
  o Clarification (corrective, distractive, dismissive, particularizing, resumptive, summative, verificative)

• Extension
  o Addition (positive, negative)
  o Adversative
  o Variation (replacive, subtractive, alternative)

• Enhancement
  o Temporal
    - Simple external (following, simultaneous, preceding, conclusive)
    - Complex (immediate, interrupted, repetitive, specific, durative, terminal, punctiliar)
    - Simple internal (following, simultaneous, preceding, conclusive)
  o Manner
  o Conditional (positive, negative, concessive)
  o Matter (positive, negative)

In the procedural instructions generated by the Final Prototype of the Agile system, a prominent subtype of textual conjunction is simple external temporal enhancement, i.e. explicit linguistic markers that indicate the sequence of processes. The term 'external' signifies that the conjunctions refer to the chronological order of the actions themselves rather than to the progression of the discourse.

A further classification of textual conjunction is discussed by Martin (1992).

He distinguishes the following different types of sequence regulation:

In the Agile instructions, we use two types of sequence regulation: numerical temporal sequence regulation and temporal sequence regulation. Numerical temporal sequence regulation numbers the steps of a sequence explicitly. As described in the TEXM3 deliverable, however, this numbering is not realized by the lexicogrammar but by inserting appropriate HTML markup, which is then interpreted by a compatible viewer, such as Microsoft Internet Explorer in the Final Prototype.

Temporal sequence regulation marks the sequence explicitly by means of linguistic discourse markers; it combines Martin's sequence regulation with Halliday's simple external temporal enhancement. Unlike numbering, this form of regulation relies on language-specific expressions used as textual conjunctions, and therefore requires the lexico-grammars.

In the next section, we present linguistic specifications pertaining to the temporal sequence regulation type of textual conjunction as developed for the Agile Final Prototype.

This section gives the formal linguistic specification of the temporal sequence regulation type of textual conjunction discussed above. Its implementation is described in the following section.

The specification of temporal sequence regulation starts by distinguishing a conjuncted clause from a nonconjuncted one.

Check whether the semantics specifies a conjunctive relation.

The first systems of temporal sequence regulation distinguish this type of textual conjunction from the other types. First of all, they separate non-structural conjunction from structural conjunction.

For an additional conjunctive relation, choose structural.

Next, it is decided whether the conjunctive relation can be lexified.

If the conjunction can be lexified, then choose lexified-conjunctive; else choose openchoice-conjunctive.

Now, a process-regulated textual conjunction is distinguished.

Chooser: CONJUNCTIVE-PROCESS-REGULATION-CHOOSER

If the conjunction relationship arises from a progression or a logical derivation, then choose process-regulated; else choose not-process-regulated.

Finally, the choice of textual conjunction type is made.

Chooser: PROCESS-REGULATED-TYPE-CHOOSER

If the relationship is a logical consequence, choose necessity-regulation; if it is a presentational, numerical, temporal or logical sequence, choose sequence-regulation; if it is defined by time, choose temporal-regulation; otherwise, choose spatial-regulation.

Once the feature sequence-regulation has been reached, a distinction is made between an absolute and a relative position in the sequence.

If the conjunction relationship corresponds to an absolute position in the sequence, then choose absolute-sequence; else choose relative-sequence.

In English, relative-sequence would be realised as "further". For an absolute position, another decision is made:

Chooser: ABSOLUTE-SEQUENCE-CONJUNCTION-CHOOSER
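The systems and choosers above can be summarised as a walk that collects the selected features. The Python sketch below assumes a hypothetical `Relation` record standing in for the semantic inquiries; the feature names conjuncted, nonconjuncted and non-structural are placeholders (the text does not name them), and the assumption that the later systems are entered regardless of the lexified/open-choice outcome is ours. The ABSOLUTE-SEQUENCE-CONJUNCTION-CHOOSER is not modelled because its decisions are not given here.

```python
from dataclasses import dataclass
from typing import List, Optional

SEQUENCE_KINDS = {"presentational", "numerical", "temporal", "logical"}


@dataclass
class Relation:
    """Hypothetical semantic description of a conjunctive relation."""
    conjunctive: bool = True
    structural: bool = False
    lexifiable: bool = True
    progression: bool = True                  # progression or logical derivation
    logical_consequence: bool = False
    sequence_kind: Optional[str] = "temporal"  # None: not a sequence
    defined_by_time: bool = False
    absolute_position: bool = True


def regulation_type(rel: Relation) -> str:
    """PROCESS-REGULATED-TYPE-CHOOSER, tested in the order given in the text."""
    if rel.logical_consequence:
        return "necessity-regulation"
    if rel.sequence_kind in SEQUENCE_KINDS:
        return "sequence-regulation"
    if rel.defined_by_time:
        return "temporal-regulation"
    return "spatial-regulation"


def select_features(rel: Relation) -> List[str]:
    """Walk the systems in order and collect the selected features."""
    if not rel.conjunctive:
        return ["nonconjuncted"]               # placeholder feature name
    features = ["conjuncted"]                  # placeholder feature name
    if rel.structural:
        return features + ["structural"]
    features.append("non-structural")          # placeholder feature name
    features.append("lexified-conjunctive" if rel.lexifiable
                    else "openchoice-conjunctive")
    if not rel.progression:
        return features + ["not-process-regulated"]
    features.append("process-regulated")
    rtype = regulation_type(rel)
    features.append(rtype)
    if rtype == "sequence-regulation":
        features.append("absolute-sequence" if rel.absolute_position
                        else "relative-sequence")
    return features
```

For instance, a default `Relation()` (a lexifiable temporal sequence at an absolute position) ends in sequence-regulation followed by absolute-sequence.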

Agreement

Agreement (congruence) is the correspondence of two or more syntactic units in certain grammatical features, such as case, number, gender or person. In Czech, Bulgarian and Russian, there are three distinct types of agreement.

L: line-FSg is-Sg3 disappeared-FSg

L: command-FSg is-Sg3 accessible-FSg

L: Command-ISgNom is-Sg3 accessible-ISgNom.

L: Command-FSg is-accessible-FSg.

3 Agreement within the nominal group

(105) En: Enter the fifth external point.

Bg: Задайте петата външна точка.

L: Enter-Pl2 fifth-FSg external-FSg point-FSg

Cz: Zadejte pátý externí bod.

L: Enter-Pl2 fifth-ISgAcc external-ISgAcc point-ISgAcc

Ru: Введите пятую внешнюю точку.

L: Enter-Pl2 fifth-FSgAcc external-FSgAcc point-FSgAcc

Agreement can be viewed from two perspectives: syntactic and semantic. Semantic agreement exists in some languages, but it is relatively uncommon and not relevant here; this discussion therefore concentrates on syntactic agreement.

In this chapter, we use the following abbreviations for morphological categories in word-by-word English glosses. The categories appear in this order: part of speech (POS) 26, gender, number, case and person. Categories that are irrelevant in the context, such as case for finite verbs, are omitted. Each category has several values, although not all values apply in every language.

26 Sometimes more detailed than the classical division into 9 or 10 POS categories, e.g. PastPart (past participle). This category is also omitted if it is the same as for the English word.

• POS: Adj (adjective), PastPart (past participle), etc.

• Gender: M (masculine; in Czech, masculine animate), I (masculine inanimate) 27, F (feminine), N (neuter)

• Number: Sg (singular), Pl (plural)

• Case 28: Nom (nominative), Gen (genitive), Dat (dative), Acc (accusative), Voc (vocative), Loc (locative) and Ins (instrumental)

• Person: 1 (first), 2 (second), 3 (third)

Thus, for example, FSg means feminine singular, Sg3 means singular third person, and NSgNom means neuter singular nominative.
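Because the glosses concatenate category values in a fixed order, they can be decoded mechanically. The sketch below is a hypothetical helper, not part of the Agile resources: it splits a tag such as NSgNom into its categories using the values listed above, covers only the POS values named in the list, and does not handle composite tags such as Pl2DaConstr.

```python
import re
from typing import Dict

# Category slots in the order used in the glosses:
# POS, gender, number, case, person. Every slot is optional.
TAG_RE = re.compile(
    r"(?P<pos>Adj|PastPart)?"
    r"(?P<gender>[MIFN])?"
    r"(?P<number>Sg|Pl)?"
    r"(?P<case>Nom|Gen|Dat|Acc|Voc|Loc|Ins)?"
    r"(?P<person>[123])?"
)

LABELS = {
    "pos": {"Adj": "adjective", "PastPart": "past participle"},
    "gender": {"M": "masculine", "I": "masculine inanimate",
               "F": "feminine", "N": "neuter"},
    "number": {"Sg": "singular", "Pl": "plural"},
    "case": {"Nom": "nominative", "Gen": "genitive", "Dat": "dative",
             "Acc": "accusative", "Voc": "vocative", "Loc": "locative",
             "Ins": "instrumental"},
    "person": {"1": "1st person", "2": "2nd person", "3": "3rd person"},
}
ORDER = ("pos", "gender", "number", "case", "person")


def parse_tag(tag: str) -> Dict[str, str]:
    """Split a gloss tag such as 'NSgNom' into its category values."""
    match = TAG_RE.fullmatch(tag)
    if not tag or match is None:
        raise ValueError(f"not a gloss tag in the expected order: {tag!r}")
    return {k: v for k, v in match.groupdict().items() if v is not None}


def describe(tag: str) -> str:
    """Spell out a tag, e.g. 'NSgNom' -> 'neuter, singular, nominative'."""
    parts = parse_tag(tag)
    return ", ".join(LABELS[k][parts[k]] for k in ORDER if k in parts)
```

Since every slot is optional, the regular expression relies on backtracking to read a bare case tag such as Nom or Ins correctly, and a tag in the wrong order (e.g. 3Sg) is rejected.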

Case Nom, Gen, Dat, Acc, Loc, Ins

Table 3 – Comparison of morphological features relevant for agreement

In Czech, Bulgarian and Russian, a predicate 29 usually agrees with its nominative subject in person, number and gender (if applicable):

L: Command-FSg was-Sg3 accessible-FSg

L: Command-ISg was-ISg3 accessible-ISg.

28 Present only in Czech and Russian.

29 By this we mean the finite verb for simple verbal forms and all parts of compound verbal forms (see 3.7.2.1.1 for more details).

30 It is in fact a past participle. See 3.7.2.1.1 for more details.

L: Command-FSg is-Sg3 accessible-FSg

L: Command-ISg is-Sg3 accessible-ISg.

(108) En: The system enables you to create a multiline style …

Bg: Системата позволява да създадете

L: System-FSg enable-Sg3 create-inf.

Cz: Systém umožňuje vytvářet styly multičár

L: System-ISg enables-Sg3 to-create styles of-multilines

This holds even if the subject is realized by a zero pronoun (so-called pro-drop) 31:

(109) En: Enter the distance between

(110) En: Enter the distance between

Bg: Вие задайте разстоянието между

L: You-Pl2 enter-Pl2 distance between…

Cz: Vy zadejte vzdálenost mezi

L: You enter-Pl2 distance between

If the subject is in a case other than the nominative 32 (e.g. in the genitive):

L: Five points-IPlGen disappeared-NSg3.

or if the category of case does not apply to the subject (infinitival or sentential subjects) 33:

(112) En: To open a drawing is simple.

Cz: Otevřít kresbu je jednoduché.

L: To-open drawing-FSgAcc is simple.

or if the verb has no subject at all (e.g. meteorological verbs or certain verbs of feeling) 34:

31 In Czech and Bulgarian, and in Russian in the imperative, an unstressed subject is often realized as a zero pronoun, i.e. the personal pronoun is omitted. This happens in both indicative and imperative sentences. When the subject is emphasized, however, the personal pronoun must be expressed.

32 Present only in Czech and Russian.

33 Currently not present in our domain.

34 Not present in our domain.

(115) En: The button will be clicked 35

Cz: Klepne se na tlačítko.

L: Click-Sg3 refl on button.

In all these cases, the verb takes the default gender, number and person: neuter, singular and 3rd person.

The number of the predicate is determined by the grammatical number of the subject, regardless of whether the subject denotes a single object or a set of objects.

Compound verbal forms consist of a finite form of an auxiliary verb combined with non-finite forms (infinitives and participles) of the main verb. In Czech, these include the future tense (auxiliary + infinitive, e.g. "já budu volat"), the past tense (auxiliary + past participle, e.g. "já jsem volal"), the present conditional (auxiliary + past participle, e.g. "já bych volal"), the past conditional (present conditional of the auxiliary + past participle, e.g. "já bych byl volal") and the passive (auxiliary + passive participle, e.g. "já jsem volán").

Compound verbal forms are described in detail in Chapter 4, on mood and modality. What matters here is that all their words except the infinitive agree with the subject in the same way as finite verbs do; they differ only in which morphological categories each word can express.

Language Verbal form Gender Number Person

35 “na tlačítko” is adjunct in Czech and “klepnout” is intransitive verb, therefore when transformed into reflexive passive, there is no subject.

36 In Czech, the auxiliary verb is not present in the third person.

37 That is: be + past participle of be.

38 For Bulgarian, it seems more natural to say that only the finite agrees with the subject, while the other parts (infinitive, participle) agree with the finite.

(117) En: You can save the line.

Bg: Вие можете да запазите линията.

L: you-Pl2 can-Pl2 save-Pl2DaConstr line

In Czech and Russian, agreement with coordinated subjects can be complex. For our purposes, however, we can simplify it by assuming that the predicate of a coordinated subject is always plural and that the person is consistent throughout the nominal group. For a more detailed analysis of this issue in Czech, see Bémová (1995).

In Czech, the gender of the predicate is the minimal gender of the coordinated participants in the order masculine animate (m) < masculine inanimate (i) < feminine (f) < neuter (n). This rule also covers the simple case in which all participants have the same gender. In Bulgarian and Russian, by contrast, gender is not distinguished in the plural.

(118) En: The line and the box were deleted.

Cz: Úsečka a políčko byly smazány.

L: Line-FSg and field-NSg were-FPl3 deleted-FPl

Ru: Линия и окно были удалены.

L: Line-FSg and field-NSg were-Pl deleted-Pl

However, in Czech there is an exception: if all participants are neuter and at least one of them is singular, the predicate is feminine 42:

(119) En: The button and the box were enabled.

39 Russian and Bulgarian do not distinguish gender of past participles in plural.

40 The simple da construction does not distinguish gender.

41 For the finite verb, the relative order of masculine inanimate (i) and feminine (f) does not actually matter: in the plural, the verbal and adjectival forms for feminine and masculine inanimate nouns are identical, so the choice between the two genders makes no difference.

42 To make things even more complicated: obě (both) in the second clause must be in the neuter, and therefore the second verb must be in the neuter as well:

En: The button and the box were not enabled and both disappeared.

Cz: Tlačítko a políčko nebyly povolené a obě zmizela

L: Button-NSg and field-NSg not-were-FPl3 enabled-FPl and both-NPl disappeared-NPl

Cz: Tlačítko a políčko byly povolené 43

L: Button-NSg and field-NSg were-FPl3 enabled-FPl

For a more detailed description of agreement in Czech, see [Kopečný 1962].

The main implementation problem is one of timing: at the point where the finite form can still be inflected, the number and gender of the subject are not yet known, and once they are known, the finite can no longer be inflected. To solve this, we use an agreement operator; the same approach is applied to person.

A further problem is implementing linguistically plausible default values for the finite. We treat these cases like ordinary agreement: the values are determined by systems on the subject side and passed to the predicate through agreement operators. This does not work for sentences without a subject, but such sentences fall outside our domain.

Of course, each language uses only the systems it needs (Bulgarian omits the systems dealing with case; Bulgarian and Russian omit inanimate gender, etc.).

The agreement systems depend heavily on the inflectional properties of the noun or pronoun that realizes the subject, so we first outline the relevant features.

THING-CASE-X, where X ∈ {Nom, Gen, Dat, Acc, Voc, Loc, Ins}

THING-NUMBER-X, where X ∈ {Sg, Pl}

Not all of these properties are present in all languages, and the properties of one category need not be in one system.

43 The verb here is really in the feminine plural; the feminine and neuter plural forms are not identical. With the verb in the neuter plural, the sentence is incorrect:

Cz:* Tlačítko a políčko nebyla povolená

L: Button-NSg and field-NSg not-were-NPl3 enabled-NPl

44 Even for person, there are cases in which the person of the predicate differs from the semantically derived person of the subject:

En: Five of you came

L: Five you-PlGen2 came-NSg3

The predicate systems first classify the subject into one of two cases: agreement (SVAgreement), when the subject is in the nominative, and non-agreement (SVNoAgreement), when the subject is in the genitive.

SVAGREEMENT (Thing-Case-Nom)

Systems determining gender of the predicate (neuter is default):

SUBJ-AGR-GENDER-X (Thing-Gender-X & SVAgreement), where X ∈ {M, I, F}

Subj-Agr-Gender-N (Thing-Gender-N or SVNoAgreement)

Systems determining number of the predicate (singular is default):

SUBJ-AGR-NUMBER-SG (Thing-Number-Sg or SVNoAgreement)

Subj-Agr-Number-Pl (Thing-Number-Pl & SVAgreement)

Systems determining person of the predicate (3 rd person is default):

SUBJ-AGR-PERSON-X (Pronoun-Person-X & SVAgreement), where X ∈ {1, 2}

Subj-Agr-Person-3 (Pronoun-Person-3 or nominal-term-resolution or SVNoAgreement)
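Taken together, the subject-side systems copy the subject's gender, number and person under SVAgreement and fall back to neuter, singular and 3rd person otherwise. A minimal Python sketch under stated assumptions: the function name and argument encoding are hypothetical, and, as a simplification, every non-nominative subject is treated as SVNoAgreement, whereas the text mentions only the genitive.

```python
from typing import List, Optional


def subject_agreement(case: str, gender: str, number: str,
                      person: Optional[int] = None) -> List[str]:
    """Hypothetical rendering of the subject-side agreement systems.

    case, gender and number are the subject's Thing-Case-*, Thing-Gender-*
    and Thing-Number-* values; person is its Pronoun-Person-* value, or None
    for a nominal subject (nominal-term-resolution).
    """
    agree = case == "Nom"  # simplification: any other case -> SVNoAgreement
    features = ["SVAgreement" if agree else "SVNoAgreement"]
    # Gender is copied under agreement; neuter is the default.
    features.append("Subj-Agr-Gender-" + (gender if agree else "N"))
    # Number is copied under agreement; singular is the default.
    features.append("Subj-Agr-Number-" + (number if agree else "Sg"))
    # 1st/2nd person only from an agreeing pronoun; 3rd person otherwise.
    features.append("Subj-Agr-Person-" + str(person if agree and person else 3))
    return features
```

With a genitive subject, as in "Five points-IPlGen disappeared-NSg3", the sketch yields the default neuter singular 3rd person features.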

The following systems pass the information determined by the subject-side systems to the appropriate words of the predicate.

System passing information to finite:

45 There are no infinitival or clausal subjects in our domain. In the future, however, the appropriate feature can simply be added to SVNoAgreement after Thing-Case-Gen.

SUBJECT-FINITE-AGREEMENT (Finite-Inserted & Subject-Inserted)

(Subj-Agr-Number-Sg ~ :::Number-Sg-Form) (Subj-Agr-Number-Pl ~ :::Number-Pl-Form)

(Subj-Agr-Person-1 ~ :::Person-1-Form) (Subj-Agr-Person-2 ~ :::Person-2-Form) (Subj-Agr-Person-3 ~ :::Person-3-Form))

This system ensures that, once the subject-side systems have determined the number and person of the predicate (i.e. a feature Subj-Agr-*-* has been entered), the finite is inflected accordingly.

System passing information to past or passive participles:

( (Past-Participle-Inserted | Participle-Passive) & Subject-Inserted) [Subject-AuxStem-Agreement]

(Subj-Agr-Number-Sg ~ :::Number-Sg-Form) (Subj-Agr-Number-Pl ~ :::Number-Pl-Form)

(Subj-Agr-Gender-M ~ :::Gender-M-Form) (Subj-Agr-Gender-I ~ :::Gender-I-Form) (Subj-Agr-Gender-F ~ :::Gender-F-Form) (Subj-Agr-Gender-N ~ :::Gender-N-Form))

This system ensures that, once the subject-side systems have determined the number and gender of the predicate, the participle is inflected accordingly.
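Setting aside the timing problem that motivates the agreement operator, the two systems above reduce to a mapping from Subj-Agr-* features to inflection features of the finite or the participle. The sketch below is a hypothetical illustration of that mapping only: it does not reproduce KPML's ~ operator (here all subject features are assumed to be known up front), and the Bulgarian AUXSTEM-INSERT system is not covered.

```python
from typing import Dict, Iterable, List, Tuple

# Which Subj-Agr-* categories each predicate constituent copies:
# SUBJECT-FINITE-AGREEMENT passes number and person, the participle
# system passes number and gender.
AGREEING_CATEGORIES: Dict[str, Tuple[str, ...]] = {
    "finite": ("Number", "Person"),
    "participle": ("Number", "Gender"),
}
PREFIX = "Subj-Agr-"


def propagate(subject_features: Iterable[str], constituent: str) -> List[str]:
    """Map Subj-Agr-<Category>-<Value> to <Category>-<Value>-Form."""
    wanted = AGREEING_CATEGORIES[constituent]
    forms = []
    for feature in subject_features:
        if not feature.startswith(PREFIX):
            continue  # e.g. SVAgreement itself is not passed on
        category, value = feature[len(PREFIX):].split("-", 1)
        if category in wanted:
            forms.append(f"{category}-{value}-Form")
    return forms
```

The output names mirror the :::Number-Sg-Form, :::Person-3-Form and :::Gender-F-Form targets of the systems above.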

The Bulgarian resources use a similar system to ensure agreement of the da construction (which corresponds to the infinitive): the da construction is linked to the finite by agreement:

AUXSTEM-INSERT (Modal | P-Future | Da-phase)

^ Da-particle ^ Auxstem (Finite = AuxStem

(Person-First-Form ~ :::Person-First-Form) (Person-Second-Form ~ :::Person-Second-Form) (Person-Third-Form ~ :::Person-Third-Form) (Number-Sg-Form ~ :::Number-Sg-Form)

(Number-Pl-Form ~ :::Number-Pl-Form))

For predicates with coordinated subjects, we assume that the predicate is always plural and that the person is consistent within the nominal group. The gender of the predicate, however, still has to be identified and inflected correctly. This is done by computing the minimal gender value, comparing two adjacent conjuncts at a time.
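The minimal-gender computation for Czech coordinated subjects, including the all-neuter exception of example (119), can be sketched as a pairwise reduction over adjacent conjuncts. The function names and the (gender, number) encoding are hypothetical; following the simplification above, the predicate is always plural, and for Bulgarian and Russian no plural gender is returned.

```python
from functools import reduce
from typing import List, Optional, Tuple

# Czech order: masc. animate < masc. inanimate < feminine < neuter
GENDER_ORDER = "MIFN"


def min_gender(a: str, b: str) -> str:
    """Compare two adjacent conjuncts and keep the smaller gender."""
    return a if GENDER_ORDER.index(a) <= GENDER_ORDER.index(b) else b


def coordinated_agreement(conjuncts: List[Tuple[str, str]],
                          lang: str = "cz") -> Tuple[Optional[str], str]:
    """Return (gender, number) of the predicate for coordinated subjects."""
    number = "Pl"  # simplification from the text: always plural
    if lang != "cz":
        return None, number  # bg/ru: no gender distinction in the plural
    genders = [g for g, _ in conjuncts]
    if all(g == "N" for g in genders) and any(n == "Sg" for _, n in conjuncts):
        return "F", number  # Czech exception, cf. example (119)
    return reduce(min_gender, genders), number
```

For example (118), `coordinated_agreement([("F", "Sg"), ("N", "Sg")])` gives feminine plural, matching "byly smazány".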

Quantification

This section deals with two phenomena of the Slavic languages: the quantitative construction and the quantity selection construction. The quantitative construction, in which a cardinal numeral indicates the quantity of a noun, can be modelled easily. The quantity selection construction, as in "one of the following methods", is more complex, particularly in Czech and Russian: unlike English and Bulgarian, these languages have more intricate government patterns, in which the cardinal numeral is the head of the construction.

In Czech, when the numeral is five or higher and the whole nominal group is in the nominative or accusative, the counted object is not in the nominative or accusative but in the genitive plural.

In Russian, the expression of quantity has its own peculiarities. The cardinal "odin" (one) behaves like an adjective and agrees with the noun in gender, number and case. Other cardinal numbers, such as "dva" (two), "tri" (three) and "chetyre" (four), take the noun in the genitive when the whole nominal group is in the nominative or accusative. We consider inanimate nouns only and leave aside the animate/inanimate distinction and animate quantities.

With "dva", "tri" and "chetyre", the Thing is in the genitive singular, while with cardinal numbers above four it is in the genitive plural. In the indirect cases (genitive, dative, instrumental and locative), the Thing is in the plural and the cardinal number agrees with it in case. In addition, the cardinal "two" agrees with the noun in gender (dva/dve).

Two-INom points-IPlNom disappeared-IPl3.

Two-FNom points-FPlNom disappeared-Pl3.

Two points-Pl disappeared-Pl3

Five-INom points-IPlGen disappeared-NSg3.

Five-Nom points-FPlGen disappeared-NSg3.
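The Czech and Russian patterns for the counted noun can be summarised as a small decision function. This is a sketch under the same restriction as the text (inanimate nouns only) and under our further assumption that compound numerals such as 21 or 22, whose government depends on their last word, are out of scope; the function name is hypothetical.

```python
from typing import Tuple

DIRECT_CASES = ("Nom", "Acc")


def thing_case_number(lang: str, numeral: int, group_case: str) -> Tuple[str, str]:
    """Case and number of the counted noun (Thing); inanimate nouns only."""
    if numeral == 1:
        return group_case, "Sg"  # 'one' agrees with the noun like an adjective
    direct = group_case in DIRECT_CASES
    if lang == "cz":
        if numeral > 4 and direct:
            return "Gen", "Pl"   # e.g. pět bodů
        return group_case, "Pl"  # e.g. dva body
    if lang == "ru":
        if direct:
            # 2-4: genitive singular (две точки); 5+: genitive plural (пять точек)
            return "Gen", ("Sg" if numeral <= 4 else "Pl")
        return group_case, "Pl"  # indirect cases: the numeral agrees in case
    raise ValueError(f"unsupported language: {lang!r}")
```

The Czech branch reproduces Figure 94 (seven-Acc points-Gen), and the Russian indirect-case branch corresponds to the dative example of Figure 98.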

The initial system for determining the case of a nominal group is the following:

The idea behind the Thing-Case-Pre* features (Thing-Case-PreNom, Thing-Case-PreGen, etc.) is that the proposed case is used by default unless something specifically changes it; at present, such changes apply only to the nominative and accusative. Note also that this is not the only way of expressing these cases; variations exist, particularly for the genitive.

Currently, we need this special behaviour only for the nominative, accusative and genitive, so the system could be as follows:

The previous treatment is more consistent, because it relies on preselection from higher ranks. Consequently, preselections must use Thing-Case-PreNom instead of Thing-Case-Nom, which was used in earlier versions of Nigel.

Nominative and accusative cases can be converted to genitive under specific conditions To streamline input requirements for other systems, we combine the features Thing-Case-PreNom and Thing-Case-PreAcc into a single feature called NumerativableCases.

NumerativableCases (Or Thing-Case-PreNOM Thing-Case-PreAcc) [NumerativableCases]

Is the object numerified? (This system differs from the original Nigel system Numeration in its input condition.)

If the object is numerified, the following system determines whether the number is higher than 4:

CHOOSER: More-Than-Four-Chooser

The following two systems partition NumerativableCases into two paths:

1) Numerative – numerative case (genitive) will be used

2) NonNumerative – proposed case will be normally used

Numerative (NumerativableCases & More-Than-Four)

NonNumerative (NumerativableCases & (NonNumerified | Not-More-Than-Four))

The following systems determine the case that is actually applied:

Thing-Case-Nom (Thing-Case-PreNom & NonNumerative) [Thing-Case-Nom] Thing:::Case-Nom-Form

Thing-Case-Gen (Thing-Case-PreGen | Numerative) [Thing-Case-Gen] Thing:::Case-Gen-Form

Thing-Case-Acc (Thing-Case-PreAcc & NonNumerative) [Thing-Case-Acc] Thing:::Case-Acc-Form

The remaining systems are simple:

Thing-Case-X (Thing-Case-PreX) [Thing-Case-X] Thing:::Case-X-Form, where X ∈ {Dat, Voc, Loc, Ins}
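The Thing-Case systems above can be traced as a feature-selection procedure. The sketch below is a hypothetical Python rendering, not KPML code: the feature name Numerified (the positive counterpart of NonNumerified) is our own placeholder, and only the Numerative route to the genitive is modelled.

```python
from typing import List, Optional, Tuple

# NumerativableCases = Thing-Case-PreNom | Thing-Case-PreAcc
NUMERATIVABLE = ("Nom", "Acc")


def case_features(pre_case: str,
                  numeral: Optional[int] = None) -> Tuple[List[str], str]:
    """Trace the Thing-Case systems; numeral=None means not numerified."""
    features = [f"Thing-Case-Pre{pre_case}"]
    if pre_case in NUMERATIVABLE:
        features.append("NumerativableCases")
        if numeral is None:
            features += ["NonNumerified", "NonNumerative"]
        elif numeral > 4:  # More-Than-Four-Chooser
            features += ["Numerified", "More-Than-Four", "Numerative"]
        else:
            features += ["Numerified", "Not-More-Than-Four", "NonNumerative"]
    # Thing-Case-Gen is entered via Thing-Case-PreGen or Numerative;
    # every other case keeps its proposed value.
    case = "Gen" if "Numerative" in features else pre_case
    features.append(f"Thing-Case-{case}")
    return features, f"Thing:::Case-{case}-Form"
```

For "Specify seven points" (Figure 94), `case_features("Acc", 7)` ends in Thing-Case-Gen with the realization Thing:::Case-Gen-Form.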

The main systems for agreement within the nominal group between the noun and the cardinal numeral are given below. The additions to the code are emphasized in bold. All modifications restrict the input of the systems according to the algorithm described above.

:INPUTS (AND PLURAL (OR NONNUMERIFIED MORE-THAN-FOUR

(and LESS-THAN-FIVE (or genitive dative INSTRUMENTAL PREPOSITIONAL))))

:OUTPUTS ((1.0 PLURAL-TO-THING (INFLECTIFY THING PLURAL-FORM)))

:INPUTS (AND ACCUSATIVE NOMINAL-GROUP-SIMPLEX

(OR INDIVIDUAL-NAME SINGULAR NONPLURAL NONNUMERIFIED eq-one))

:INPUTS (OR (AND RGENITIVE NOMINAL-GROUP-SIMPLEX)

(AND (OR ACCUSATIVE NOMINATIVE) (or MORE-THAN-FOUR LESS-THAN-FIVE)

These systems handle the realization of THING in nominal groups in the genitive case. Two further systems pass the genitive case from the nominal group down to the terminal ORDINAL node.

Figure 94: Accusative with numeral above 4

Specify seven-Acc points-Gen.

Figure 95: Accusative with numeral not greater than 4 50

50 The numeral dva (two) is not correctly inflected by the Czech morphology.

Create two-Acc multiline-Acc.

Figure 96: Fragment of a sentence with genitive with a numeral

points-Acc five-Gen multilines-Gen

En: points of five multilines.

Figure 97: The numeral “5” with the whole nominal group in the accusative (Ru)

Figure 98 presents an indirect case of the nominal group: the noun is in the plural and in the appropriate case (Dat), and the cardinal agrees with it.

We now describe a model for the quantity selection construction. An example from our sample texts is given below:

(129) En: Open the dialog box using one of the following methods.

(a) Ru: Откройте диалоговое окно одним из следующих способов:

Open dialog-Adj box-N-Acc one-Ins of following-Adj methods-Gen

(b) Bg: Отворете прозореца с един от тези методи

Open window-Det by one of these methods

(c) Cz: Otevřete dialogové okno jednou z následujících metod.

Open dialogue-Adj box-Acc one-Ins from following-Adj methods-Gen

Comparing English with Russian, a notable difference is the use of grammatical case. In Russian, the phrase "one of the following methods" uses the instrumental for "odnim" (one) and the genitive for "sleduyushich sposobov" (following methods), whereas English and Bulgarian do not mark case in this construction.

For English and Bulgarian, the syntactic modelling of this kind of structure is therefore quite straightforward (Figure 99).

Nigel models quantification phenomena with a flat phrase structure, and the same structure is used for Bulgarian: in the tree, the elements Quantifier, Qselector, Deictic and Thing are at the same hierarchical level, with the Thing implicitly serving as the head of the construction. The English and Bulgarian structures for example (129) are identical.

Figure 99: The quantity selection construction as it is modelled for English and Bulgarian.

For Czech and Russian, this representation is inadequate: there, the quantifier "odnim" is the head, and the rest of the phrase functions as its postmodifier. Within this postmodifier, moreover, the preposition "iz" governs its argument "sposobov", so the structure is more complex than in English and Bulgarian.

100 This does not mean that Czech and Russian conceptualise the phenomenon differently from English and Bulgarian; it is only the realizations that differ cross-linguistically.

Figure 100: Structure of quantification construction realized for Czech and Russian grammar

We extended the existing NIGEL systems to build the new structure shown in Figure 100, which depicts the graph structure relevant to the realization of the construction in Russian and Czech. For selective quantification, we introduced a new phrase, the quantity selection list (q-slct-list), which represents the linguistic phenomenon more accurately. In addition, the nominal group, which in the NIGEL model spanned the entire structure, has been moved to a deeper level, corresponding to the collocation of Thing and Status.

Figure 101 shows the SPL for the quantity selection construction that realises the Russian sentence from the example above.

(|a11| / DM::OPEN-SCREEN-OBJECT :SPEECHACT IMPERATIVE

:PROPERTY-ASCRIPTION (J / QUALITY :LEX DIALOGOVYJ)))

:QUANTITY-SELECTION-id 1 :NUMBER PLURAL

:PROPERTY-ASCRIPTION (S / QUALITY :LEX SLEDUJUSCHIJ)))

Figure 101 SPL for the quantity selective construction (Ru)

The SPL expressions :QUANTITY-SELECTION-Q QUANTITY and QUANTITY-SELECTION-id 1 effectively facilitate generation in Russian and Czech In contrast, the :DETERMINER THIS expression is essential for English and Bulgarian but does not impact the generation process for Russian and Czech, rendering it irrelevant.

The implementation for Bulgarian follows the NIGEL implementation.

Figure 102: Quantification system and chooser (Bg)

The quantity selection construction is then completed by the Quantifier-type and Quantifier-select systems shown in Figure 103.

Figure 103: Quantifier-type and Quantifier-select systems (Bg)

Generation for Russian and Czech requires a reorganization: the new group starts by preselecting q-slct-list for the Minirange grammatical function. Consequently, the MINIRANGE-TYPE system in the PPOTHER region has been modified.

Figure 104: The system minirange-type (Ru and Cz)

To extend the Quantification group, the selective quantification case must be distinguished from the MINIRANGE-THING case. We therefore add a new output feature, MINIRANGE-THING-Q-SEL-LIST, to the OUTPUTS of the system.

Clause complexity

This chapter deals with the treatment of clause complexity in the Czech, Russian and Bulgarian grammars of the Final Prototype of the Agile system. A general overview of clause complexity in Systemic Functional Grammar (SFG) (Halliday 1985) was given in the LSPEC2 deliverable, and the implementations for the Intermediate Prototype were described in the IMPL2 deliverable.

The Agile corpus contains various types of clause complexity, but only some of them occurred in the Intermediate Prototype target texts. The Intermediate Prototype grammars therefore concentrated on a narrower range of clause complexity types:

• hypotactic enhancement (manner and purpose circumstantials)

The following examples illustrate the types covered in the Intermediate Prototype:

(131) paratactic extension: positive addition (conjunction)

(a) En: Specify the internal point and press Return.

(b) Cz: Určete vnitřní bod a stiskněte Enter.

Specify-Pl2 internal-Acc point-Acc and press-Pl2 Return.

(c) Bg: Въведете вътрешна точка и натиснете Return

Specify-Pl2 internal point and press-Pl2 Return.

(d) Ru: Укажите внутреннюю точку и нажмите Return

Specify-Pl2 internal-Acc point-Acc and press-Pl2 Return.

(a) En: Choose OK and close the dialog box.

(b) Cz: Vyberte OK a uzavřete dialogový panel.

Choose-Pl2 OK and close-Pl2 dialog-Acc box-Acc

(c) Bg: Изберете ОК и затворете диалоговия прозорец

Choose-Pl2 OK and close-Pl2 dialog box

(d) Ru: Нажмите кнопку OK и закройте диалоговое окно

Choose-Pl2 OK and close-Pl2 dialog-Acc box-Acc

(133) hypotactic enhancement: manner (means) circumstantial

(a) En: Start the PLINE command using one of the following methods.

(b) Cz: Spusťte příkaz PLINE použitím jedné z následujících metod.

Start-Pl2 command-Acc PLINE using-Ins one-Gen of following-Gen methods-Gen

(c) Bg: (dependent: finite) Стартирайте командата PLINE, като използвате един от следните методи

Start-Pl2 command-the PLINE, by use-Pl2 one of following methods

(d) Ru: (dependent: nonfinite) Запустите команду PLINE, воспользовавшись одним из следующих способов

Start-Pl2 command-Acc PLINE using-gerund one-Ins of following-Gen methods-Gen

(134) hypotactic enhancement: purpose circumstantial

(a) En: Press Return to end the polyline.

(b) Cz: (dependent: nominalisation) Stiskněte Return, pro ukončení křivky.

Press-Pl2 Return for ending polyline

(c) Bg: (dependent: finite) Натиснете Return, за да завършите полилинията

Press-Pl2 Return so that end-Pl2 polyline

(d) Ru: (dependent: nonfinite) Нажмите клавишу Return, чтобы завершить рисование полилинии

Press-Pl2 key Return, in-order-to end drawing polyline-Gen

In the Final Prototype, the following additional coverage is needed:

(135) paratactic extension: positive variation (disjunction)

(a) En: Specify the internal point or press Return.

(b) Cz: Určete vnitřní bod nebo stiskněte Enter.

Specify-Pl2 internal-Acc point-Acc or press-Pl2 Return.

(c) Bg: Въведете вътрешна точка или натиснете Return

Specify-Pl2 internal point or press-Pl2 Return.

(d) Ru: Укажите внутреннюю точку или нажмите Return

Specify-Pl2 internal-Acc point-Acc or press-Pl2 Return.

(136) paratactic enhancement: temporal

(a) En: Choose OK and then close the dialog box.

(b) Cz: Vyberte OK a potom uzavřete dialogový panel.

Choose-Pl2 OK and then close-Pl2 dialog-Acc box-Acc

(c) Bg: Изберете ОК и след това затворете диалоговия прозорец

Choose-Pl2 OK and after that close-Pl2 dialog box

(d) Ru: Нажмите кнопку OK и затем закройте диалоговое окно

Choose-Pl2 OK and then close-Pl2 dialog-Acc box-Acc

(137) hypotactic enhancement: condition

(a) En: If you use Windows, enter the Draw command.

(b) Cz: Pokud používáte Windows, zadejte příkaz Draw.

If use-Pl2 Windows enter-Pl2 command-Acc Draw

(c) Ru: (dependent: finite) Если вы используете Windows, введите команду Draw.

If you use Windows enter-Pl2 command-Acc Draw

The Czech grammar covers two types of hypotactic purpose enhancement: in one, already handled in the Intermediate Prototype, the dependent is realized as a nominalization; in the other, the dependent is a finite clause in the conditional mood.

(138) Cz: purpose dependent: finite (conditional) Vyberte OK, abyste uložili vlastnosti multičáry.

Choose-imp-Pl2 OK would-Pl2 save-pastparticiple properties-Acc multiline-Gen

Choose OK in order to save the multiline properties.

Bulgarian likewise offers two ways of realizing a purpose dependent: the finite version already covered by the Intermediate Prototype grammar, and a nominalization similar to the Czech one.

(139) Bg: purpose dependent: nominalisation Натиска се Return за завършване на полилинията

Press-3sg refl Return for ending of polyline

Return is pressed for ending the polyline.

These types were already identified in the LSPEC2 discussion but were not fully integrated into the Czech and Bulgarian grammars of the Intermediate Prototype. Our focus there was primarily on generating the purpose dependent realized through nominalization.

This chapter presents the formal specification of the clause complexity found in the Final Prototype texts. It begins with an overview of clause complexity in Systemic Functional Grammar (SFG) and then details the specific types of clause complexity covered by our grammars, emphasizing the extended coverage of the Final Prototype compared with the Intermediate Prototype. Finally, we describe the implementation of the clause complexity region in the Final Prototype grammars and give examples of generated sentences.

Following Halliday (1985), clause complexity treats a sentence as a complex of clauses, in the same way that a group is viewed as a word complex. Within this framework, a head clause can be identified together with the clauses that modify it, which makes it possible to examine the functional organization of the whole sentence. For Halliday, a sentence is in fact a clause complex, the only grammatical unit recognized above the clause.

The complexity of clausal organization stems from the various ways in which clauses can relate to one another, which SFG interprets through the 'logical' component of language. The analysis distinguishes two dimensions: interdependency (taxis), comprising parataxis and hypotaxis, and the logico-semantic relation, comprising expansion and projection. Since neither the Agile corpus nor the Agile target texts contain any clause complexes involving projection, this discussion addresses only expansion.

This section describes the two dimensions, taxis and expansion, in more detail; clause complexity results from their interaction.

3.9.1.1 Interdependency or Taxis between Clause Complexes

Taxis captures the relative status of the head clause and the modifying clause(s):

- hypotaxis is the relation between a dependent element and its dominant, the element on which it depends, whereas
- parataxis is the relation between two elements of equal status, each of which can stand on its own.

According to SFG, every logical relation in language is either hypotactic or paratactic.

The following specification captures this distinction for the expansion subtype of clause complexity (we are not considering projection, as noted above):

If one of the parts is more prominent, then choose hypotaxis; otherwise, choose parataxis.
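As a rough illustration, the taxis specification can be read as a chooser-style decision function. The Python sketch below is not KPML code (KPML choosers use their own notation); the ClauseNexus class and its numeric prominence scores are hypothetical stand-ins for whatever the semantic input provides about the relative prominence of the two parts.

```python
from dataclasses import dataclass
from enum import Enum


class Taxis(Enum):
    HYPOTAXIS = "hypotaxis"
    PARATAXIS = "parataxis"


@dataclass
class ClauseNexus:
    """Hypothetical stand-in for a pair of clauses related by expansion.

    The prominence scores are illustrative only; they model the answer
    to the question "is one of the parts more prominent?".
    """
    primary: str
    secondary: str
    primary_prominence: int = 0
    secondary_prominence: int = 0


def choose_taxis(nexus: ClauseNexus) -> Taxis:
    # One part more prominent than the other: unequal status.
    if nexus.primary_prominence != nexus.secondary_prominence:
        return Taxis.HYPOTAXIS
    # Equal prominence: equal status.
    return Taxis.PARATAXIS


# Usage with examples from this chapter (prominence values are assumed).
coordinated = ClauseNexus("Specify the internal point", "press Return")
purpose = ClauseNexus("Press Return", "to end the polyline",
                      primary_prominence=1)
print(choose_taxis(coordinated).value)  # parataxis
print(choose_taxis(purpose).value)      # hypotaxis
```

Reducing prominence to a single comparison keeps the sketch as close as possible to the two-way specification; an actual chooser would presumably obtain this information by querying the input rather than from stored scores.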

A pair of clauses related in this way can be considered a clause nexus, relating a primary clause and a secondary clause. The following table associates these terms with taxis:

51 In fact, the tactic dimension applies to all complexes, not only to clause complexes; in the same way, it applies to word, group, and phrase complexes.

Figure 110: Clause nexus / tactic dimensions

In the next subsections, we discuss hypotaxis and parataxis in more detail.

Hypotaxis is the relation between elements of unequal status: a dominant element and a dependent one. The relation is non-symmetric, since the dependent depends on the dominant but not vice versa, and it is non-transitive: if clause C depends on clause B and B depends on clause A, C does not thereby depend on A. The dependency relations therefore form a hierarchy.

In a hypotactic structure, dependency is not tied to linear order, which gives flexibility in sentence construction: the dependent clause can follow, precede, be included in, or enclose the dominant clause.

(a) The file is not stored until you Save it.

(b) If you Save the file, then it will be stored.

(c) Store, if you want, the file.

(d) You might, the manual says, save the file.

Figure 111: Ordering of Dominant and Dependent in a hypotactic clause complex

The ordering of clauses in a hypotactic complex matters for textual organization. As outlined in the TEXS3 deliverable, the default ordering places the dominant clause before the dependent clause, and this default is usually suitable for the texts generated in Agile. When a different ordering is required, the dependent clause should be marked as Thematic, which places it at the front of the sentence.
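A minimal Python sketch of this default-plus-override ordering, assuming a hypothetical representation in which a dependent clause carries a thematic flag. It returns the clauses in linear order and leaves punctuation and capitalization aside; it illustrates the rule only and does not reproduce how KPML realizes Theme.

```python
from dataclasses import dataclass


@dataclass
class Dependent:
    text: str
    thematic: bool = False  # hypothetical flag: dependent marked as Thematic


def order_hypotactic(dominant: str, dependent: Dependent) -> list[str]:
    """Linearize a two-clause hypotactic complex.

    Default ordering: dominant clause first, dependent clause second.
    A dependent marked as Thematic is moved to the front instead.
    """
    if dependent.thematic:
        return [dependent.text, dominant]
    return [dominant, dependent.text]


print(order_hypotactic("press Return", Dependent("to end the polyline")))
# ['press Return', 'to end the polyline']
print(order_hypotactic("enter the Draw command",
                       Dependent("if you use Windows", thematic=True)))
# ['if you use Windows', 'enter the Draw command']
```

The second call mirrors the conditional example earlier in this chapter, where the dependent precedes the dominant clause.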

The specification of hypotactic complexes is complicated by the fact that several dependent elements can modify the same dominant element. The distinction between a

The HYPOTAXIS-ALPHA-COMPLEXITY system distinguishes between "simple" and "multiple" dependent structures. Further systems deal with hypotactic extension, but for brevity we do not discuss them in detail.

If there are multiple dependents, then choose complex-alpha-hypotactic-expansion.
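The HYPOTAXIS-ALPHA-COMPLEXITY decision can be sketched in the same way. Only the feature complex-alpha-hypotactic-expansion is named in the text; the single-dependent alternative is given the hypothetical name simple-alpha-hypotactic-expansion below, and passing the dependents as a plain list of strings is likewise an assumption made for illustration.

```python
from collections.abc import Sequence

COMPLEX_ALPHA = "complex-alpha-hypotactic-expansion"
# Hypothetical name for the single-dependent feature (not given in the text).
SIMPLE_ALPHA = "simple-alpha-hypotactic-expansion"


def choose_alpha_complexity(dependents: Sequence[str]) -> str:
    """Select a feature from the number of dependents of one dominant clause."""
    if not dependents:
        raise ValueError("a hypotactic complex needs at least one dependent")
    # Several dependents modify the same dominant element.
    if len(dependents) > 1:
        return COMPLEX_ALPHA
    return SIMPLE_ALPHA


print(choose_alpha_complexity(["in order to save the multiline properties"]))
# simple-alpha-hypotactic-expansion
print(choose_alpha_complexity(["if you use Windows",
                               "in order to save the multiline properties"]))
# complex-alpha-hypotactic-expansion
```

The two-dependent call is a constructed case for illustration and is not taken from the Agile target texts.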

Parataxis is the relation between a primary and a secondary element of equal status, each of which can function on its own. Unlike hypotaxis, where the dependent element relies on a dominant one, the paratactic relation is logically symmetric and transitive: in "Specify the internal point and press Return" each clause is coordinated with the other, and if clause 1 is paratactically linked to clause 2 and clause 2 to clause 3, clause 1 is also linked to clause 3.
