Terminology Standards in the Aspect of Harmonization for International Term Database
Juris Borzovs, Ilze Ilziņa, Valentīna Skujiņa, Andrejs Vasiļjevs 02.11.2006.
After accession to the European Union, every facet of social, political and scientific life had to meet the demands of globalization and adjust to them. Terminology was no exception. Terminologists in the multilingual society of the EU had to meet the challenge of creating a common platform for the main players in the field of terminology and of representing the abundance of terms encountered in every area of everyday life.
Communication and interaction at the international level have become a «must», and exchanging messages in other languages is commonplace. Nevertheless, messages have to convey information in such a way that the receiver can interpret it correctly and without mistakes.
The need to establish correspondence between terms in multilingual communication and in the related data exchange requires standardizing term structure and creating a multilingual database. This is an urgent problem that should not be put off for long. It is the main purpose of the EuroTermBank project, launched to create European digital content for the global networks: Collection of Pan-European Terminology Resources through Cooperation of Terminology Institutions, which promotes linguistic diversity in the information society. The project participants are institutions from Germany, Denmark, Hungary, Poland, Lithuania, Estonia and Latvia. Latvia is also the coordinator of the project.
At the very beginning of the project we found that terminology development in different countries is rather fragmented and follows no unified approach. This paper offers some conclusions from the Latvian experience and outlines some ways of solving the problems.
The EuroTermBank project involves a hierarchy of different tasks and topics, and every step of the project needs coordination and harmonization with the other project participants.
In the area of standardization we distinguish three sets of problems where international harmonization is needed:
1) the general regulations for creating terms and their definitions;
2) the selection of multilingual terms and the evaluation of their quality in subject field term dictionaries and international standards;
3) the coding of terminological material and technical processing for multilingual database input.
Harmonization may also be classified on different levels.
1) To organize terminology work at the national and international level, all partners have to agree on uniform methodological principles that ensure the compatibility of terminological resources. Lithuania, Latvia and Estonia, in collaboration with other EU partners, have already made their contribution; it is time to implement these principles.
2) For a multilingual database, a basic unified core of terms has to be developed, which may be supplemented by equivalent terms in different languages. One basic language for standardized terms and definitions is advisable.
3) The design of data categories, data structures and exchange formats has emerged as a need, as has the digitization and modification of terminological resources according to harmonized structural and technical requirements.
General regulations for creating terms and their definitions
General regulations for creating terms and their definitions are internationally standardized by the International Organization for Standardization (ISO). As defined in ISO 1087-1:2000, a term is a «verbal designation of a general concept in a specific subject field», and a definition is a «representation of a concept by a descriptive statement which serves to differentiate it from related concepts». In addition, the nature and writing of definitions are regulated by ISO 704, ISO 860 and ISO 10241.
Since our main aim is to establish the harmonization aspect of multilingual databases, let us consider the difference between two concepts: harmonization and unification. The Latvian terminologist E. Drezen explained and specified the concept of unification in his 1936 book «Internationalization of Scientific-technical Terminology». For many years the unification of concepts and terms was at the centre of attention both nationally and internationally; for instance, the multilingual term base IOUTN in Warsaw at the end of the 20th century was based precisely on the principle of unification.
Let us compare the content of the two terms, unification and harmonization, using the Oxford Dictionary.
UNIFICATION and HARMONIZATION in comparison
unification (Oxford Dictionary): unification – the act or an instance of unifying; the state of being unified; unify – reduce to unity or uniformity; unity – (1) oneness; being one, single, or individual; being formed of parts that constitute a whole.
harmonization (Oxford Dictionary): harmonize – (2) (often followed by with) bring into or be in harmony; harmony – (2) an apt or aesthetic arrangement of parts; (3) agreement, concord.
Conclusion:
Unification differs from harmonization: its essence can be summed up in the word uniformity, whereas the main goal of harmonization is to reach concord, which means coordination, proportional adequacy, appropriate meaning and prevention of discrepancies.
The general rules of the harmonization process are internationally standardized by ISO 860. According to this standard, the harmonization process is as follows:
- Harmonization starts with a comparison of the concept systems involved in terms of the number of concepts, the relations between concepts, the depth of structure and the type of characteristics, leading to the construction of harmonized concept systems.
- All the concepts must then be analysed by comparing the definitions. If the definitions differ, it must be decided whether the difference is relevant or irrelevant. If relevant, it means that there are indeed two or more different concepts involved that must be defined and placed in the concept system.
- The essential (also called «defining») characteristics of the harmonized concepts have to be established.
- When the concepts are harmonized, the terms can be harmonized taking into account the differences and similarities between languages, the tradition of term formation in the subject field and in a given language as well as the already established terminology.
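The definition-comparison step of this process can be sketched in code. The following is only an illustration under stated assumptions: the `Concept` class, the `compare` function, the sample characteristics and the idea of passing in a set of differences judged irrelevant are all hypothetical and are not taken from ISO 860, which leaves that judgement to terminologists.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    designation: str            # term used for the concept in one language
    characteristics: frozenset  # characteristics taken from its definition


def compare(a: Concept, b: Concept, irrelevant: frozenset = frozenset()) -> str:
    """Compare two concepts by the characteristics in their definitions.

    Differences listed in `irrelevant` are judged minor and ignored.
    Returns "harmonize" when no relevant difference remains, otherwise
    "separate" (two concepts to be defined and placed in the system).
    """
    differences = (a.characteristics ^ b.characteristics) - irrelevant
    return "harmonize" if not differences else "separate"


# Illustrative data only; the characteristic sets are assumptions.
tree_en = Concept("tree", frozenset({"perennial plant", "woody trunk", "crown"}))
koks_lv = Concept("koks", frozenset({"perennial plant", "woody trunk", "crown", "tall"}))
bush_en = Concept("bush", frozenset({"perennial plant", "woody stems"}))

print(compare(tree_en, koks_lv, irrelevant=frozenset({"tall"})))  # harmonize
print(compare(tree_en, bush_en))                                  # separate
```

The sketch covers only the comparison of definitions; comparing whole concept systems and choosing the harmonized terms remain manual steps.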
The terms concept harmonization and term harmonization play an important role and are included in the vocabulary standard ISO 1087:
concept harmonization (3.6.5)
Activity for reducing or eliminating minor differences between two or more concepts which are already closely related to each other.
NOTE: Concept harmonization is an integral part of terminology standardization.
term harmonization (3.6.6)
Activity leading to the designation of one concept in different languages by terms which reflect the same or similar characteristics or have the same or slightly different forms.
When creating multilingual databases, these aspects of harmonization are urgent and important: to coordinate, to prevent discrepancies, and to reach mutual adequacy between concepts and terms. In short, we do not need to unify; rather, respecting the peculiarities of every language, we should try to harmonize the multilingual terms for the database.
From these definitions we may deduce that at the concept level it is important to define closely related concepts more exactly, preventing possible differences. At the same time, language systems differ, and at the term level it is not always possible to express the same concept with a similar designation. Nevertheless, in creating definitions and building terms it is important to express the essential characteristics of the concept by terms in the different languages. This is a fundamental requirement for creating transnational databases and ensuring their practical usability. Formal similarity of the corresponding terms across languages is, of course, welcome.
What do we mean by the essential characteristics of a concept? In the theory of terminology, just as in logic, we distinguish the necessary and sufficient characteristics needed to classify every concept and term within the concept and term system of a given subject field. The place of a concept in a given concept system is determined by vertical characteristics (for superordinate or subordinate concepts) and horizontal characteristics (for two or more coordinate, i.e. adjacent, concepts subordinate to the same concept). For example, in the horizontal aspect we cannot declare ‘green leaves’ an essential characteristic of the concept ‘tree’, because bushes and caulescent plants also have leaves. How, then, can a tree, or the concept of all trees, be characterized? It can be characterized as a perennial plant that differs from other perennial plants by having a woody trunk and a crown.
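The tree example can be expressed as a small set operation. This is a minimal sketch, assuming that characteristics can be written as plain strings; the function name and the characteristic sets for bush and herb are hypothetical and chosen only to mirror the example above.

```python
def delimiting_characteristics(concept: set, coordinate_concepts: list) -> set:
    """Return the characteristics of `concept` that no coordinate concept shares."""
    shared = set().union(*coordinate_concepts)
    return concept - shared


# Assumed characteristic sets, for illustration only.
tree = {"perennial", "has leaves", "woody trunk", "crown"}
bush = {"perennial", "has leaves", "woody stems"}
herb = {"has leaves", "soft stalk"}

# 'has leaves' drops out because bushes and herbs have leaves too;
# what remains is the trunk and the crown.
print(delimiting_characteristics(tree, [bush, herb]))
```

A characteristic that survives this filter separates the concept from its coordinate concepts, which is what makes it usable in a definition.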
The following two schemes illustrate the analysis of concepts and terms and the creation of a system, taking vertical and horizontal characteristics into account:
Superordinate concept: ‘a plant’ | ‘furniture’
↕
Subordinate/superordinate concept: ‘a tree’ | ‘a chair’
↕
Subordinate concept: ‘a conifer’ | ‘an arm-chair’
Figure 1. Concepts divided on the basis of vertical characteristics
‘a plant’: (‘with a trunk’) (‘with a bough’) (‘with a stalk’)
‘furniture’: (‘for sitting’) (‘for sleeping’) (‘for eating’)
Figure 2. Subordinate concepts divided on the basis of horizontal characteristics
Classifying concepts according to the appropriate vertical and horizontal characteristics is the way to create a definite concept system from the corresponding concept field in any subject field and language. To better understand the terms concept field and concept system, let us compare their definitions given in ISO 1087:
concept field (3.2.10)
Unstructured set of thematically related concepts.
concept system (3.2.11)
Set of concepts structured according to the relations among them.
At the term level, the elements that distinguish the two terms are field and system.
At the definition level, the essential characteristic that separates the concepts designated by these terms concerns structuring: the term element field is assigned to a set without a definite structure, and the term element system to a structured set. The essential characteristic that should be mentioned in both cases is the notion of thematically related concepts. In the second definition it is included implicitly in the phrase according to the relations.
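The difference between the two definitions maps directly onto data structures: a concept field is a plain set, while a concept system adds relations. The sketch below assumes a single relation type (subordinate to superordinate) and reuses the plant example from Figure 1; the variable and function names are hypothetical.

```python
# Concept field: an unstructured set of thematically related concepts.
concept_field = {"plant", "tree", "conifer"}

# Concept system: the same concepts plus relations (concept -> superordinate).
concept_system = {"tree": "plant", "conifer": "tree"}


def superordinates(concept: str, system: dict) -> list:
    """Walk up the relations from `concept` to the top of the system."""
    chain = []
    while concept in system:
        concept = system[concept]
        chain.append(concept)
    return chain


print(superordinates("conifer", concept_system))  # ['tree', 'plant']
```

Nothing comparable can be computed from `concept_field` alone, because a set carries no relations among its members.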
In creating a multilingual database, it is essential to arrive at a unified basic term core in the main language (the pivot language). Afterwards it may be expanded with equivalents in other languages, thus forming a cluster of the term in different languages. This core should be represented in the pivot language, with the essential characteristics of the term expressed in the definitions.
Organizations whose standards are used to create Latvian ICT subject field terms
As an example of the selection of multilingual terms and the evaluation of their quality in branch terminology, let us consider the problems of harmonization and standardization in ICT.
First of all, we have to understand the concept behind a term, check its main (essential) characteristics and develop a uniform definition of the concept. All multilingual terminologists have to consider, explore and follow some common source model of terms. This task is made easier by the fact that several organizations in different branches have developed terminology recommendations and standards, including vocabularies and glossaries of terms. Let us mention only some of these organizations, whose standards we have used to create Latvian ICT terms:
- The International Electrotechnical Commission (IEC) – an international standards organization dealing with electrical, electronic and related technologies. Some of its standards are developed jointly with ISO.
- The Institute of Electrical and Electronics Engineers (IEEE) – a leading developer of international standards that underpin many of today's products and services, particularly in telecommunications, information technology and power generation. With an active portfolio of nearly 1,300 standards and projects under development, it is a central source for standardization in a broad range of emerging technologies.
- The International Organization for Standardization (ISO) – an international standard-setting body composed of representatives from national standards bodies. The organization produces world-wide industrial and commercial standards, the so-called ISO standards.
Nevertheless, many ICT companies that create hardware and software develop their own standards, which sometimes become widespread and eventually acquire the status of national branch standards.
Multilingual international databases also provide essential support for term creation.
Databases of standardized terminology offer terms in several languages, but in ICT Latvian terminologists prefer to work with the English databases and create definitions of terms according to them.
Nevertheless, even in one language the same term may have different meanings in different applications. Before starting the international harmonization of a term, we have to determine the branch in which it is applied and find out whether it is used in other branches as well. The three standards organizations mentioned above explain the keyword security as follows:
1) in computer science – the existence and enforcement of techniques which restrict access to data, and the conditions under which data may be obtained;
2) in electricity – the ability of an electric power system to suitably respond to disturbances arising within that system, including both local and widespread disturbances and the loss of major generation and transmission facilities;
3) in ordnance – measures taken by a command to protect itself from espionage, observation, sabotage, annoyance, or surprise; a condition which results from the establishment and maintenance of protective measures which ensure a state of inviolability from hostile acts or influences; protection of supplies or supply establishments against enemy attack, fire, theft, and sabotage.
In these cases one and the same English term security, despite its different definitions in different subject fields (computer science, electricity, ordnance), expresses the same basic concept, and on the basis of this concept it is therefore possible to assign one equivalent term in Latvian too: drošība (at least in the main meaning of this word).
The same Latvian term will not always cover all the meanings of an English (source) term, because one term quite often designates different entities. Sometimes there is one corresponding Latvian term; sometimes different Latvian terms are created for one English term.
The term record is used:
1) in computer data processing – a collection of data items arranged for processing by a program;
2) in a database – a group of fields within a table that are relevant to a specific entity (sometimes called a row);
3) in Virtual Telecommunications Access Method (VTAM), IBM's proprietary telecommunications access method for mainframes and part of its Systems Network Architecture (SNA) – the unit of data that is transmitted from sender to receiver.
For the first and second cases the corresponding Latvian term is ieraksts, for the third case – bloks.
The term frame is used:
1) in telecommunications – data that is transmitted between network points as a unit complete with addressing and necessary protocol control information;
2) in film and video recording and playback – a single image in a sequence of images that are recorded and played back;
3) in computer video display technology – the image that is sent to the display image rendering devices.
In the first case frame is translated into Latvian as kadrs, in the other cases as ietvars.
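The security and record examples suggest that an equivalent has to be looked up by term and subject field together, not by term alone. The following sketch assumes a simple in-memory table; the field labels (including "VTAM" for Virtual Telecommunications Access Method) and the function names are hypothetical, while the Latvian equivalents are those given above.

```python
# (English term, subject field) -> Latvian equivalent, as given in the text.
EQUIVALENTS = {
    ("security", "computer science"): "drošība",
    ("security", "electricity"): "drošība",
    ("security", "ordnance"): "drošība",
    ("record", "data processing"): "ieraksts",
    ("record", "database"): "ieraksts",
    ("record", "VTAM"): "bloks",
}


def latvian_equivalent(term: str, subject_field: str):
    """Return the Latvian equivalent for a term in a subject field, or None."""
    return EQUIVALENTS.get((term, subject_field))


def is_field_dependent(term: str) -> bool:
    """True if the term has more than one Latvian equivalent across fields."""
    equivalents = {lv for (en, _), lv in EQUIVALENTS.items() if en == term}
    return len(equivalents) > 1


print(latvian_equivalent("record", "VTAM"))  # bloks
print(is_field_dependent("security"))        # False
print(is_field_dependent("record"))          # True
```

Keying on the pair keeps both situations representable: one equivalent shared across fields, and different equivalents for different fields.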
Sometimes different branches of terminology, having developed independently and drawn on different source languages, use different Latvian equivalents for the same meaning of a term and justify them by long-established usage. This can be illustrated by the translation of the English term reliability, which in ICT is translated as uzticamība, in accordance with its definition: reliability is the ability of a system to perform and maintain its functions in routine circumstances, as well as in hostile or unexpected circumstances.
In the power sector the same term is rendered as drošums.
This situation illustrates that before terms are harmonized at the international level, the first harmonization step, establishing general regulations for creating terms and their definitions, should be carried out at the national level in the target language.
EuroTermBank - international project for terminology consolidation
Consolidation and harmonization of terminology data at the international level has been the main task of the EuroTermBank project. The goal of EuroTermBank is to collect, harmonize and disseminate terminology resources in new EU member states. The result of the project is the online terminology portal www.eurotermbank.com. Although the project initially focused on terminology resources in Latvia, Lithuania, Estonia, Hungary and Poland, it can easily be extended, and its approach can be applied to other countries as well.
The EuroTermBank project is part of the EU eContent programme, which aims to stimulate the development and use of European digital content on the global networks and to promote linguistic diversity in the information society.
The project partners come from Germany, Denmark, Estonia, Latvia, Lithuania, Hungary and Poland and comprise a good combination of universities, private companies and state institutions that coordinate or regulate terminology.
The project partners have identified and described more than 500 terminology resources. Resources have been prioritized, and agreements have been signed for the inclusion in EuroTermBank of more than 200 resources, with the total number of entries exceeding 700 000. During the project, large-scale digitization of terminology resources has been accomplished, with more than 200 000 terms converted from paper format to uniformly structured digital data.
EuroTermBank serves not only as a store of terminology data but also as an access hub interconnecting a number of terminology databases. In this way the user is provided with a single access point to a vast array of terminology data. At the moment the following external databases are connected to EuroTermBank:
- Termnet.lv – official Latvian terminology database of the Terminology Commission of the Latvian Academy of Sciences (LAS)
- PolTerm – Polish legal terminology database
- Hungarian legislation terminology database
Standards for terminology data processing
Standardization is essential for consolidating diverse terminology resources and ensuring the exchangeability of terminology data. During the EuroTermBank project, standards for terminology data processing were assessed and applied to data modeling and data interchange interfaces.
The most widely recognized standardization body in this field is Technical Committee 37 of the International Organization for Standardization (ISO TC37), Terminology and other language and content resources. A number of standards developed by ISO TC37 describe basic principles for terminology data modeling, processing, storage and interchange.
For purposes of storage and retrieval, terminology data is organized into terminological entries. Each entry includes information related to a single concept. This concept-oriented approach differs from the widespread practice in many dictionaries of organizing entries around lexical units. To consolidate terminology entries in different languages and from different sources, it is necessary to group them around abstract language-independent concepts (see Figure 3).
Figure 3. Transition from lexicographical approach to concept-oriented approach
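The transition from lexical units to concepts can be illustrated as a regrouping of flat records. This is a sketch under a strong assumption: each lexical record already carries a concept identifier (the `"C1"`-style ids below are hypothetical), whereas in practice assigning that identifier is the hard part of the work.

```python
from collections import defaultdict

# Lexicographical view: one record per lexical unit.
lexical_entries = [
    {"term": "security", "lang": "en", "concept": "C1"},
    {"term": "drošība", "lang": "lv", "concept": "C1"},
    {"term": "record", "lang": "en", "concept": "C2"},
    {"term": "ieraksts", "lang": "lv", "concept": "C2"},
    {"term": "record", "lang": "en", "concept": "C3"},
    {"term": "bloks", "lang": "lv", "concept": "C3"},
]


def group_by_concept(entries):
    """Regroup lexical records into concept-oriented entries."""
    concepts = defaultdict(lambda: defaultdict(list))
    for e in entries:
        concepts[e["concept"]][e["lang"]].append(e["term"])
    # Convert nested defaultdicts to plain dicts for a stable result.
    return {cid: dict(langs) for cid, langs in concepts.items()}


by_concept = group_by_concept(lexical_entries)
print(by_concept["C3"])  # {'en': ['record'], 'lv': ['bloks']}
```

Note that the English word record appears in two concept entries, which a word-oriented dictionary would fold into a single headword.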
Individual terminological entries consist of data items according to the chosen data model and data categories. The International Standard ISO 12620, Computer applications in terminology – Data categories, specifies data categories for recording terminological information in both computerized and non-computerized environments and for the interchange and retrieval of terminological information independent of the local software applications or hardware environments in which these data categories are used. The use of uniform standard-compliant data category names and definitions greatly facilitates the interchange of data between different systems and enhances the reusability of data.
Each concept-oriented entry in EuroTermBank is structured in four levels (see Figure 4). The entry level contains language-independent information such as the entry identifier, picture, etc. The language level contains language-specific information such as the definition, reference, explanation, etc. The term level contains the term (a designation of a defined concept in a specific language by a linguistic expression) and other term-related information. The word level concerns the particular words that form a term and lexical information about these words.
Figure 4. Four level data structure of EuroTermBank system
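The four levels can be pictured as nested record types. The sketch below is not the EuroTermBank schema: the class and field names are hypothetical, and only the level names and the example contents (identifier, picture, definition, reference) come from the description above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Word:                      # word level
    form: str
    part_of_speech: Optional[str] = None  # assumed example of lexical information


@dataclass
class Term:                      # term level
    text: str
    words: List[Word] = field(default_factory=list)


@dataclass
class LanguageSection:           # language level
    lang: str
    definition: Optional[str] = None
    reference: Optional[str] = None
    terms: List[Term] = field(default_factory=list)


@dataclass
class Entry:                     # entry level (language independent)
    entry_id: str
    picture: Optional[str] = None
    languages: List[LanguageSection] = field(default_factory=list)


entry = Entry(
    entry_id="demo-1",  # hypothetical identifier
    languages=[
        LanguageSection(
            lang="en",
            definition="Set of concepts structured according to the relations among them.",
            reference="ISO 1087",
            terms=[Term("concept system", [Word("concept"), Word("system")])],
        )
    ],
)
print(entry.languages[0].terms[0].words[1].form)  # system
```

Each level owns a list of the level below it, so one concept can carry several languages, each language several terms, and each term several words.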
For the interchange of terminological data, the international standard ISO 12200, Computer applications in terminology – Machine-readable terminology interchange format (MARTIF), has been developed. It allows the distinct identification of separate data sets and data categories as well as their dependencies and relations. The format relies heavily on the data category names and definitions contained in ISO 12620. MARTIF is based on ISO 8879, Standard Generalized Markup Language (SGML).
MARTIF provides an open, flexible mechanism for exchanging data between different terminology management systems. The main body of the MARTIF standard specifies the formalism to be used in preparing terminology data collections for interchange by defining the SGML Document Type Definition (DTD) and listing the appropriate tags (markup) used to structure the data. The standard also normatively specifies the markup for the individual terminological data categories to be used in the MARTIF environment, based on ISO 12620.
The international standard ISO 16642, Computer applications in terminology – Terminological markup framework (TMF), facilitates the use and re-use of terminological data collections, taking into account the real-life conditions of different formats, database environments and term-bank systems as well as the various data models on which the collections are based. The standard also addresses the need to provide better connections between terminological databases and other lexical resources used, for instance, in machine translation or natural language processing.
The Localization Industry Standards Association (LISA) has developed the industry standard TBX (short for TermBase eXchange). It is a very practical terminology exchange format that is compliant with the terminological markup framework TMF. TBX is based on the TMF structural meta model; it specifies a set of data categories from ISO 12620 and adopts an XML style compatible with MARTIF.
The EuroTermBank system implements the TBX standard to enable data exchange between different system modules, data exchange with external terminology databases, data import and export, data storage and data editing.
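To give an impression of what such an exchange format looks like, the sketch below builds a TBX-style entry fragment with Python's standard `xml.etree.ElementTree` module. It is an assumption-laden illustration: the element names `termEntry`, `langSet`, `tig` and `term` follow the core TBX entry structure, but the fragment is not a complete or validated TBX document, and the helper function and entry id are hypothetical rather than part of EuroTermBank.

```python
import xml.etree.ElementTree as ET

# Qualified name of the predefined xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_term_entry(entry_id: str, terms_by_lang: dict) -> ET.Element:
    """Build a TBX-style <termEntry> with one <langSet> per language."""
    entry = ET.Element("termEntry", {"id": entry_id})
    for lang, terms in terms_by_lang.items():
        lang_set = ET.SubElement(entry, "langSet", {XML_LANG: lang})
        for text in terms:
            tig = ET.SubElement(lang_set, "tig")  # term information group
            ET.SubElement(tig, "term").text = text
    return entry


entry_xml = build_term_entry("c1", {"en": ["security"], "lv": ["drošība"]})
xml_text = ET.tostring(entry_xml, encoding="unicode")
print(xml_text)
```

The nesting mirrors the concept-oriented model: one entry per concept, one language section per language, and one or more terms inside each language section.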
Current research focuses on the possibilities and limitations of automatic database entry compounding, that is, merging bilingual entries from several sources to form one multilingual entry. The key task is to identify, harmonize and merge those entries from several resources that relate directly to the same concept. Theoretical and some practical aspects of international terminology harmonization have been described in this paper. Further research and practical assessment work continues within the framework of the EuroTermBank project.
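Entry compounding can be sketched with a deliberately naive heuristic: two bilingual entries are merged when they share the same pivot-language term and the same subject field. This matching rule, the function name and the German equivalent in the sample data are assumptions made for illustration; they are not the method used in the project, where deciding that two entries denote the same concept would also involve comparing definitions.

```python
def merge_bilingual(entries, pivot="en"):
    """Merge bilingual entries that share a pivot term and a subject field."""
    merged = {}
    for entry in entries:
        key = (entry["terms"][pivot].lower(), entry["field"])
        target = merged.setdefault(key, {"field": entry["field"], "terms": {}})
        for lang, term in entry["terms"].items():
            target["terms"].setdefault(lang, set()).add(term)
    return list(merged.values())


bilingual = [
    {"field": "computer science", "terms": {"en": "security", "lv": "drošība"}},
    # German equivalent added for illustration; not from the source text.
    {"field": "computer science", "terms": {"en": "security", "de": "Sicherheit"}},
    {"field": "database", "terms": {"en": "record", "lv": "ieraksts"}},
    {"field": "VTAM", "terms": {"en": "record", "lv": "bloks"}},
]

result = merge_bilingual(bilingual)
print(len(result))         # 3
print(result[0]["terms"])  # en, lv and de equivalents of "security"
```

The two record entries stay separate because their subject fields differ, which is the behaviour the record example earlier in the paper calls for.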
Conclusions
- To create a definition, the essential characteristics of a concept and its place in the concept system of the subject field have to be established. The definition and the choice of the term itself are based on the necessary and sufficient characteristics of the concept. The analysis of transnational concepts has to be similar in the terminology work of each language. The choice of the term itself is mainly determined by the specific nature of the target language.
- To gradually improve the use of the international database, it is very important to carry out methodically coordinated work in every national partner's language and in every subject-field term system.
- As a result of the EU EuroTermBank project, a methodology for creating a multilingual term database has been developed, and a large number of resources have been consolidated and integrated into an online system. The key principle in developing the EuroTermBank system is to rely on international standards in order to establish a sound foundation for consolidating a large variety of dispersed terminology resources and for ensuring exchangeability with other systems and applications.
References
ISO 704-2:2000 Terminology work – Principles and methods.
ISO 860:1996 Terminology work – Harmonization of concepts and terms.
ISO 1087-1:2000 Terminology work – Vocabulary – Part 1: Theory and application.
ISO 10241-1:1992 International terminology standards – Preparation and layout.
ISO 12200 Computer applications in terminology – Machine-readable terminology interchange format (MARTIF)
ISO 12620 Computer applications in terminology – Data categories
ISO 16642 Computer applications in terminology – Terminological markup framework
Drezen, E. Internationalization of Scientific-technical Terminology. Riga, 2002 (1936). — 71 p.
Picht, H.; Draskau, J. Terminology: An Introduction. Copenhagen, CSE, 1985, p. 36–61.
Skujiņa, V. The Principles of Formation of Latvian Terminology. Riga, 2002, LAS/LLI of LU, 224 p.
Vasiļjevs A., Skadiņš R. Eurotermbank terminology database and cooperation network // The Second Baltic Conference on Human Language Technologies. – Tallinn: 2005. – p. 347-352.
Vasiļjevs A., Borzovs J., Skadiņš R., Liedskalniņš A. Development of web-based terminology database for new EU member countries – problems and opportunities. Seventh International Baltic Conference on Databases and Information Systems, Vilnius, 2006, p.228-238.
Henriksen L., Povlsen C., Vasiljevs A. 2005. EuroTermBank – a Terminology Resource based on Best Practice. In Proceedings of LREC 2006, the 5th International Conference on Language Resources and Evaluation, Genoa, on CD-ROM
Vasiljevs A., Schmitz K.-D., Collection, harmonization and dissemination of dispersed multilingual terminology resources in an online terminology databank. International Conference on Terminology, Standardization and Technology Transfer, Beijing, 2006, p. 265-272.
Wright S. E. A Guide to Terminological Data Categories. Conference on Terminology and Content Development, Copenhagen, 2005, p. 63-66.