The information collected for a terminological record and represented on it is varied and subject to change in any of its parts. This affects the nature of the database system chosen to manage the information. Most information items in the database must be considered independent of each other; for the most part they can be entered at separate times and with reference to different sources. It follows that it must be possible to verify and update them separately. It should also be a principle to separate the automatically collected factual information from the selective and evaluative information added by the terminologist; in this way factual data are preserved intact and can be used for different purposes by different terminologists or indeed other users. Human interpretation, as it exists in the separation and categorisation of senses or in the declaration of preferred forms, may vary from person to person and with the purpose of a particular dictionary. If the factual evidence which led to a particular interpretation is maintained, it can serve as corroboration of the terminologist's decision and can be re-assessed at a later date.
These observations point to the importance of separately providing full bibliographical information for each item of information as appropriate. A great deal of writing on terminology has in recent years been devoted to 'proper' referencing of sources; less effort has been spent on characterising the types of sources that provide the most suitable information and even less on the criteria for establishing representativeness of sources.
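The separation described above can be illustrated with a minimal sketch. It assumes a record made up of independent items, each carrying its own source and entry date, with a flag that distinguishes factual data from the terminologist's judgements. All class names, field names and sample values are hypothetical and are not taken from any existing term bank.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Source:
    """Bibliographic reference for one item (illustrative fields only)."""
    title: str
    year: int
    page: Optional[int] = None


@dataclass
class Item:
    """One independently verifiable piece of information on a record."""
    category: str                     # e.g. "context", "definition"
    value: str
    source: Source
    entered: date
    evaluative: bool = False          # True for a terminologist's judgement
    verified: Optional[date] = None   # set when the item is re-checked


@dataclass
class TermRecord:
    term: str
    items: List[Item] = field(default_factory=list)

    def add(self, item: Item) -> None:
        self.items.append(item)

    def factual(self) -> List[Item]:
        """Collected evidence, kept apart from interpretation."""
        return [i for i in self.items if not i.evaluative]

    def judgements(self) -> List[Item]:
        """Selective and evaluative information added by a terminologist."""
        return [i for i in self.items if i.evaluative]

    def verify(self, category: str, on: date) -> int:
        """Mark items of one category as verified; other items are untouched."""
        count = 0
        for item in self.items:
            if item.category == category:
                item.verified = on
                count += 1
        return count


# Hypothetical sample data, for illustration only.
src = Source("Hypothetical manual", 1989, page=12)
record = TermRecord("term bank")
record.add(Item("context", "A term bank stores records.", src, date(1990, 1, 5)))
record.add(Item("preferred_form", "term bank", src, date(1990, 2, 1), evaluative=True))
record.verify("context", date(1991, 3, 1))
```

Because each item holds its own source and dates, a context can be re-verified without touching the judgement on the preferred form, and the factual items can be reused by another terminologist who reaches a different interpretation.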
It should be a principle to proceed automatically as far as possible and to limit human manipulation of lexical data to the specific interpretive tasks the computer cannot perform. The machine can increasingly be instructed to draw inferences from the stored information, and end users can be expected to search an automatic dictionary or a term bank at different levels of depth and sophistication.
Other than by definition, a concept is considered to be suitably explained by indication of the linguistic forms of one or several related concepts. This is expressed by listing such obvious related concepts as antonyms, broader and narrower generic terms, broader and narrower partitive terms—classes of relationships that have been taken over from information science.
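As a rough illustration of the relationship classes listed above, the following sketch stores each link together with its inverse, so that recording a narrower generic term also records the corresponding broader generic term. The relation labels, the class name and the sample terms are assumptions made for this example only.

```python
from collections import defaultdict

# Relation types named in the text, each paired with its inverse.
INVERSE = {
    "antonym": "antonym",
    "broader_generic": "narrower_generic",
    "narrower_generic": "broader_generic",
    "broader_partitive": "narrower_partitive",
    "narrower_partitive": "broader_partitive",
}


class ConceptLinks:
    """Stores typed links between terms and keeps both directions in step."""

    def __init__(self):
        # (term, relation) -> set of related terms
        self._links = defaultdict(set)

    def relate(self, term, relation, other):
        """Record that `other` stands in `relation` to `term`."""
        if relation not in INVERSE:
            raise ValueError(f"unknown relation: {relation}")
        self._links[(term, relation)].add(other)
        self._links[(other, INVERSE[relation])].add(term)

    def related(self, term, relation):
        return sorted(self._links.get((term, relation), ()))


# Illustrative terms only.
links = ConceptLinks()
links.relate("vehicle", "narrower_generic", "car")
links.relate("car", "narrower_partitive", "engine")
print(links.related("car", "broader_generic"))       # ['vehicle']
print(links.related("engine", "broader_partitive"))  # ['car']
```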
It is now increasingly considered necessary to exemplify the usage of technical terms by means of example sentences, called contexts, and usage notes, which further specify the appropriate linguistic environment for a term. In addition, and as a result of the availability of computer processing and storage, it has become good practice to provide full bibliographic references to the sources from which a term, its definition and context were extracted. This information permits a close control of the usage and possible change of meaning of terms, which by their very nature are semantically much more volatile than items of the general lexicon of a language.
Compilation of Terminology 139
In conceptually-based terminological data banks it is customary to give definitions in one language only. This approach reduces terms in a second, non-defining, language to the status of translation equivalents. In this case it is conventionally agreed that the dictionary entry corresponds to a concept and consequently the language of the definition is the source language for the entry term of the terminological record. Since the equivalent term in a second language is not a representation of a matching concept in the culture of this language, this language has the status of a target language. In order to give two terms the same status, two definitions would have to be established, and the source and target language definitions would have to be found to be identical in meaning.
Bilingual terminology is therefore usually directional and non-reversible, i.e. translation equivalents cannot be simply converted into entries in their own right with the source language entry becoming a translation equivalent. In many cases a translation equivalent does not in fact refer to an authentic concept in the culture of the target language, because the translated term effects the introduction of the new concept. For a number of subject fields in natural sciences and for a number of languages of societies enjoying similar states of technological development the strict directional approach is unnecessary because structures of knowledge largely coincide, yet it is difficult to decide where it can safely be abandoned. In cases where terminology is internationally agreed, the reversibility of entries in dictionaries becomes acceptable to the extent of this agreement.
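The directional nature of a bilingual entry can be sketched as follows. The sketch assumes that an entry may be reversed only when the equivalent has a definition of its own and a terminologist has recorded that the two definitions are identical in meaning; that judgement is represented by a simple flag, since the text does not suggest it can be automated. The names Entry, reverse and definitions_identical, and the sample values, are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entry:
    source_lang: str                       # language of the definition
    term: str
    definition: str
    target_lang: str
    equivalent: str                        # translation equivalent only
    equivalent_definition: Optional[str] = None
    definitions_identical: bool = False    # recorded by a terminologist


def reverse(entry: Entry) -> Entry:
    """Return the entry with source and target swapped, if that is justified."""
    if entry.equivalent_definition is None:
        raise ValueError("equivalent has no definition of its own; entry is directional")
    if not entry.definitions_identical:
        raise ValueError("definitions have not been found identical in meaning")
    return Entry(
        source_lang=entry.target_lang,
        term=entry.equivalent,
        definition=entry.equivalent_definition,
        target_lang=entry.source_lang,
        equivalent=entry.term,
        equivalent_definition=entry.definition,
        definitions_identical=True,
    )


# Placeholder values for illustration.
one_way = Entry("en", "source term", "definition in English", "fr", "equivalent")
agreed = Entry("en", "source term", "definition in English", "fr", "equivalent",
               equivalent_definition="definition in French",
               definitions_identical=True)
print(reverse(agreed).source_lang)  # fr
```

Calling reverse(one_way) raises ValueError, which reflects the default case described above: an equivalent without its own definition cannot become an entry in its own right.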
Reversibility of entries is, however, possible and even necessary in dictionaries for bilingual countries or for multilingual regimes of supranational organisations such as the European Communities. In these situations concepts are defined in identical manner in order to permit a multidirectional approach to the entries. This does not apply to the special cases of international standards, which are discussed in section 4.3.6.
5.2.1 Methodological considerations
Modern techniques of computational linguistics make it unnecessary for the terminologist to be concerned with how the data is stored in the computer. Consequently there are no longer the constraints caused by the necessity to manually sequence and order the entries within the terminology collection and further sequence the individual elements within each entry. The structure of a terminological system can be as complex as necessary; it is not beyond the potential of computers to store a multi-dimensional semantic network. There is also no longer the physical limitation of the size of the record card, slip or other non-magnetic medium. Because computers offer, to all intents and purposes, unlimited storage, definitions, for example, can be as long as is necessary to define the term properly.
From a practical standpoint, there is less need for rigorous control during the research stage of terminology compilation. The task of checking terminological data can be handled either by the data acquisition software or by the database management system, as appropriate, and takes place either during input to the system or at the level of storage. It is recognised that the task of checking the completeness and consistency of both individual entries, where very complex relational structures are built up, and whole collections of terminological data is beyond the capabilities of a single individual or group of individuals. This task can, however, be easily and efficiently performed by a reasonably intelligent piece of software.
This use of computers permits both a physical and temporal distribution of the task of compilation.
Information can be collected and stored in stages. As long as each item of data in the term record satisfies the controls imposed by the term-processing system for that data category, e.g. that a definition has a bibliographic reference or that any related term entered does not invalidate the existing conceptual system, then as much data as available for a term can be entered at any time. This is particularly important where compilation is prompted by a production-type environment. The terminology user does not need to wait for a full record to be recorded but can specify which data elements have a high priority. Indeed, the user may even carry out the initial research and enter a subset of the terminological data which a terminologist would verify and complete at a later date.
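The staged entry described above can be sketched with two of the controls the text mentions: a definition is rejected unless it carries a bibliographic reference, and a broader term is rejected if it would make the generic hierarchy circular. Treating a circular hierarchy as the way a related term "invalidates the conceptual system" is an assumption made for this example, as are the class, method and category names.

```python
class ValidationError(ValueError):
    """Raised when an item fails the control for its data category."""


class TermStore:
    def __init__(self):
        self.records = {}   # term -> {category: (value, reference)}
        self.broader = {}   # term -> its broader generic term

    def enter(self, term, category, value, reference=None):
        """Accept one item at any time, provided it passes its own control."""
        if category == "definition" and not reference:
            raise ValidationError("a definition needs a bibliographic reference")
        if category == "broader_term":
            self._check_no_cycle(term, value)
            self.broader[term] = value
        self.records.setdefault(term, {})[category] = (value, reference)

    def _check_no_cycle(self, term, broader):
        # Walk upwards from the proposed broader term; meeting `term`
        # again would make the hierarchy circular.
        seen = {term}
        node = broader
        while node is not None:
            if node in seen:
                raise ValidationError(
                    f"'{broader}' as broader term of '{term}' creates a cycle")
            seen.add(node)
            node = self.broader.get(node)

    def missing(self, term, wanted=("definition", "context", "broader_term")):
        """Categories still to be completed, e.g. by a terminologist later."""
        have = self.records.get(term, {})
        return [c for c in wanted if c not in have]


# Illustrative session: a user enters a partial record first.
store = TermStore()
store.enter("car", "broader_term", "vehicle")
store.enter("car", "context", "The car was parked outside.", "Hypothetical report, 1990")
print(store.missing("car"))  # ['definition']
```

The record for "car" is usable as soon as its first items are accepted, and missing() shows what remains to be verified and completed at a later date.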
Information can be collected on a distributed basis. Work can be distributed among various people and locations without loss of quality. This is particularly important in the case of the compilation of multilingual terminology. Work can be dispersed over several countries or even over several continents so that, for example, all terminology compilation is carried out by native speakers or by subject specialists only.
Terminological data can be collected regardless of the onomasiological or the semasiological approach since ordering of data occurs totally independently of compilation. The dual-linear structure of the conventional dictionary, which forced a distinction between concept- and vocabulary-oriented terminology compilation, has thus become irrelevant.
5.2.2 Quality of data
The use of a computer for input control and validation has resulted in a trend towards terminology of a higher quality. Because the computer is a far more efficient means of storing and disseminating terminology, the dangers of spreading low-quality or dubious terminology are increased unless strict controls are exercised. The inconsistencies in the use of certain data categories can be eradicated, resulting in a more coherent and more reliable terminological collection. This increase in quality is in fact imperative in view of the far-reaching effect which computerised terminology processing will have on terminology dissemination. In order to maintain the higher quality of terminological data, the unchecked integration of existing dictionaries into terminological data banks is not considered sound practice, and where this has taken place in the past it has been necessary to spend a great deal of time cleaning up the collection at a later date. It has been known for term banks actively engaged in both tidying up their data holdings and inputting new data to experience a dramatic reduction in the size of their database, because the elimination of unsound records proceeded faster than the input of new records.
The use of existing dictionaries, even those which publishers may have converted to machine-readable form, is fraught with difficulties. This has been a dilemma which computational linguists as well as terminologists have had to face as part of the rapid advance of their knowledge and experience in automation. Existing dictionaries may not prove suitable because of physical limitations, e.g. the data may be stored in a format which, although suitable for type-setting and printing, is unsuitable for useful machine manipulation.
More frequently, however, data from printed media which have been compiled without computer assistance are incomplete, out-of-date or unreliable in various other ways.
In order to ensure a high quality of data in multilingual terminological collections it has become important to distinguish between original source texts and those which have been translated. Terms extracted from texts in their original language are normally genuine terms of that language and as such have full validity. Terms extracted from translated texts, however, may either be valid terms or only translation equivalents, coined for the particular translation in question. There is, therefore, a trend towards the use of genuine original texts for the extraction both of terms and contexts for a particular entry. Similarly there is a recognition that for many terms no exact match of concepts exists across languages and that the terminologist must offer several possible equivalents along with context and usage information to allow the correct choice to be made by the end user.
5.2.3 Principles of data collection
Automation permits the collection and compilation of terminological data in stages and by team work while at the same time making it possible to exercise stricter control over consistency of data than manual methods. These possibilities impose a greater necessity for generally agreed methodologies which should be based on a set of basic principles, such as the following:
1. Terminological data should be collected with a certain consistency of criteria.
2. All terminological information has sources which must be stated with the same accuracy and completeness as bibliographical data.
3. Terminological data have a limited validity in time. Information must therefore be given full temporal identification.
4. The use of existing dictionaries is not considered sound terminological practice.
5. It is important to distinguish between original and translated texts. Terms extracted from translations may be genuine terms of a language or only translation equivalents.
6. Terminology extracted from running text or discourse offers a greater guarantee of thematic completeness and coherence and ensures accurate dating of terms.
7. The linguistic behaviour of terms should be documented by suitable contexts so that all relevant textual variants are covered.
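Some of the principles listed above lend themselves to mechanical checking. The sketch below, which is only one possible reading, tests a candidate entry against principles 2, 3, 5 and 7: a stated source, a date, a marking of the text as original or translated, and at least one context. The remaining principles call for human judgement and are not modelled. The Candidate class, its field names and the sample values are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Candidate:
    term: str
    source: Optional[str] = None        # principle 2: stated source
    recorded: Optional[date] = None     # principle 3: temporal identification
    origin: Optional[str] = None        # principle 5: "original" or "translated"
    contexts: List[str] = field(default_factory=list)  # principle 7


def unmet_principles(c: Candidate) -> List[int]:
    """Return the numbers of the checked principles the candidate fails."""
    unmet = []
    if not c.source:
        unmet.append(2)
    if c.recorded is None:
        unmet.append(3)
    if c.origin not in ("original", "translated"):
        unmet.append(5)
    if not c.contexts:
        unmet.append(7)
    return unmet


# Illustrative candidate with a source and origin but no date or context.
draft = Candidate("term bank", source="Hypothetical report", origin="translated")
print(unmet_principles(draft))  # [3, 7]
```

A candidate marked "translated" passes the check for principle 5, since the principle asks only that the distinction be recorded; deciding whether the term is genuine or merely a translation equivalent remains the terminologist's task.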