BS 8723-4:2007 2008
$198.66
Structured vocabularies for information retrieval. Guide – Interoperability between vocabularies
Published By | Publication Date | Number of Pages |
---|---|---|
BSI | 2008 | 62 |
PDF Catalog
PDF Pages | PDF Title |
---|---|
3 | Contents: Introduction; 1 Scope; 2 Normative references; 3 Definitions, symbols, abbreviations and other conventions; 3.1 Definitions; 3.2 Symbols, abbreviations and other conventions; 4 Structural models for interoperability across vocabularies; 4.1 General; 4.2 Model 1: Structural unity; 4.3 Model 2: Non-equivalent pairs; 4.4 Model 3: Backbone structure; 4.5 Application of different scenarios; 5 Mapping applications in context; 5.1 General; 5.2 The effect of different vocabulary types; 5.3 Establishing mappings for index terms; 5.4 Mapping of search terms; 5.5 Combinations of terms; 5.6 Treatment of pre-coordinated strings; 5.7 Automatic mapping versus human mediation; 6 Relationships and mappings across vocabularies and languages; 6.1 Types of relationship; 6.2 Degrees of equivalence; 6.3 Equivalence between languages within a multilingual thesaurus; 6.4 Mappings across structurally different vocabularies; 7 Establishing equivalence for structurally different vocabularies; 7.1 General; 7.2 Accepting a near-match; 7.3 One-to-many cross-vocabulary equivalence; 7.4 Many-to-one mappings; 8 Establishing equivalence between languages in a multilingual thesaurus; 8.1 General; 8.2 Accepting a near-match; 8.3 Loan terms; 8.4 Coined terms; 9 Managing mappings and other relationship data; 9.1 Management within one system; 9.2 Management within two or more separate systems; 9.3 Management external to the source and target vocabulary systems; 10 Display of mapped vocabularies; 10.1 General; 10.2 Single record display; 10.3 Multilingual thesaurus displays; 10.4 Displays of mappings between structurally different vocabularies; 10.5 Language and character encoding issues |
4 | 11 Mapping system functionality; 11.1 General; 11.2 Switching or augmentation of index terms, notations or captions; 11.3 Switching of search terms; 11.4 Expansion of search terms; 12 Management of projects for mapping vocabularies and languages; 12.1 General; 12.2 Structural considerations; 12.3 Resources for multilingual projects. Annexes: Annex A (informative) Practical examples encountered during preparation of a multilingual thesaurus; Bibliography; Index (BS 8723-4). List of figures: Figure 1 – Model 2 (non-equivalent pairs) as applied to four vocabularies; Figure 2 – Model 3 (backbone model) as applied to four vocabularies; Figure 3 – Single record display in a bilingual thesaurus (English – Spanish); Figure 4 – Single record display in a bilingual thesaurus (Spanish – English); Figure 5 – Alphabetical display for a bilingual thesaurus; Figure 6 – Alternative layout of entries in an alphabetical display; Figure 7 – Hierarchical display for a bilingual thesaurus; Figure 8 – Correspondence table for a bilingual thesaurus (English – Spanish); Figure 9 – Correspondence table for a bilingual thesaurus (Spanish – English); Figure 10 – Alphabetical display with mappings to two other vocabularies. List of tables: Table 1 – Tags and their equivalents in other languages; Table 2 – Additional abbreviations and symbols used in mappings; Table 3 – Elements used to represent concepts; Table 4 – Mapping of index terms; Table 5 – Mapping of search terms; Table 6 – Two different styles of mapping (mappings from one source vocabulary to two target vocabularies are shown) |
5 | Foreword |
7 | Introduction a) In a multinational company, knowledge and information gained at one site needs to be accessible to staff in offices around th… b) Information produced in the public sector needs to be easily accessible to a variety of audiences. These could, for example, … c) A third example concerns large collections of data, indexed in past years and decades with different vocabularies in diverse … |
8 | 1 Scope 2 Normative references 3 Definitions, symbols, abbreviations and other conventions 3.1 Definitions |
10 | 3.2 Symbols, abbreviations and other conventions |
11 | Table 1 Tags and their equivalents in other languages Table 2 Additional abbreviations and symbols used in mappings 4 Structural models for interoperability across vocabularies 4.1 General |
12 | 4.2 Model 1: Structural unity 4.3 Model 2: Non-equivalent pairs |
13 | Figure 1 Model 2 (non-equivalent pairs) as applied to four vocabularies 4.4 Model 3: Backbone structure Figure 2 Model 3 (backbone model) as applied to four vocabularies |
14 | 4.5 Application of different scenarios |
15 | 5 Mapping applications in context 5.1 General 5.2 The effect of different vocabulary types 5.2.1 General |
16 | 5.2.2 Elements to be mapped Table 3 Elements used to represent concepts 5.3 Establishing mappings for index terms |
17 | Table 4 Mapping of index terms 5.4 Mapping of search terms |
18 | Table 5 Mapping of search terms 5.5 Combinations of terms |
19 | 5.6 Treatment of pre-coordinated strings 5.6.1 Mapping index terms where the pre-coordinate scheme is the source vocabulary a) Map the terms or notations representing the separate headings and subdivisions of the pre-coordinate scheme to the target vocabulary. |
20 | b) In addition to a), also map any pre-coordinated strings that are enumerated in the source vocabulary. If possible, map them to pre-coordinated expressions in the target vocabulary, but if not they may be mapped to a combination of terms. c) Sometimes there are good reasons for not using the source vocabulary as the basis of mappings. In order to include additional… 5.6.2 Mapping index terms where the pre-coordinate scheme is the target vocabulary |
21 | 5.6.3 Mappings for search statement conversion, where the pre-coordinate scheme is the source vocabulary 5.6.4 Mappings for search statements, where the pre-coordinate scheme is the target vocabulary 5.7 Automatic mapping versus human mediation |
22 | 6 Relationships and mappings across vocabularies and languages 6.1 Types of relationship 6.1.1 General 6.1.2 Intra-vocabulary equivalence versus cross-vocabulary equivalence 6.2 Degrees of equivalence |
23 | a) Exact equivalence: In this ideal situation, the target vocabulary contains a concept identical in scope to the concept in the… b) Inexact equivalence: In this case, corresponding concepts in the two vocabularies have overlapping scopes. An equivalence rel… c) Partial equivalence: A concept in one of the vocabularies is broader in scope than a concept in the other. This situation nor… d) Non-equivalence: The target vocabulary does not contain a concept that matches the source vocabulary concept, even partially or inexactly. |
24 | 6.3 Equivalence between languages within a multilingual thesaurus 6.3.1 Equivalence between preferred terms |
25 | 6.3.2 Correspondence between non-preferred terms 6.4 Mappings across structurally different vocabularies 6.4.1 General |
26 | 6.4.2 Examples, styles and conventions for non-equivalent pairs of thesauri Table 6 Two different styles of mapping (mappings from one source vocabulary to two target vocabularies are shown) |
27 | 6.4.3 Mapping non-equivalent pairs of different types |
28 | 7 Establishing equivalence for structurally different vocabularies 7.1 General 7.2 Accepting a near-match 7.3 One-to-many cross-vocabulary equivalence a) A broad but simple concept may be made up of several narrower concepts that are comparable and belong to the same fundamental category. |
29 | b) A complex concept represented by a single term (often a multi-word term as described in BS 8723-2:2005, Clause 7) in the source vocabulary may be conveyed by the combination of two or more simple terms in the target vocabulary. 7.4 Many-to-one mappings |
30 | 8 Establishing equivalence between languages in a multilingual thesaurus 8.1 General |
31 | 8.2 Accepting a near-match 8.3 Loan terms |
32 | 8.4 Coined terms a) The source language term, which represents a new concept to the users of the target language, is for some reason not acceptable as a loan term. b) The source language term has already been used as a loan term by authors writing in the target language, but the term needs t… c) In a thesaurus containing three or more languages, a concept first expressed in one of the languages has already been transla… a) literal translation of the source language term or its semantic components; b) construction of a term or phrase which expresses the general meaning of the source language term; |
33 | c) the invention of a neologism, which should be as concise as possible to encourage acceptance (these inventions sometimes approximate to literal translations). 9 Managing mappings and other relationship data 9.1 Management within one system 9.2 Management within two or more separate systems a) the mappings should be maintained in only one of the databases; b) the mappings should be maintained reciprocally in both databases. 9.3 Management external to the source and target vocabulary systems a) the set of two or more equivalent preferred terms, notations or captions, one from each of the interoperating vocabularies; b) the nature of the relationship between them. |
34 | 10 Display of mapped vocabularies 10.1 General 10.2 Single record display Figure 3 Single record display in a bilingual thesaurus (English – Spanish) |
35 | Figure 4 Single record display in a bilingual thesaurus (Spanish – English) 10.3 Multilingual thesaurus displays 10.3.1 Alphabetical displays |
36 | Figure 5 Alphabetical display for a bilingual thesaurus |
37 | Figure 6 Alternative layout of entries in an alphabetical display 10.3.2 Systematic displays |
38 | Figure 7 Hierarchical display for a bilingual thesaurus 10.3.3 Correspondence tables |
39 | Figure 8 Correspondence table for a bilingual thesaurus (English – Spanish) |
40 | Figure 9 Correspondence table for a bilingual thesaurus (Spanish – English) |
41 | 10.3.4 Other displays 10.4 Displays of mappings between structurally different vocabularies 10.4.1 Alphabetical displays Figure 10 Alphabetical display with mappings to two other vocabularies |
42 | 10.4.2 Systematic displays 10.4.3 Correspondence tables 10.5 Language and character encoding issues 10.5.1 General |
43 | 10.5.2 Display issues 10.5.3 Filing orders a) the difference between upper and lower case letters is ignored in sorting for English (and in most other languages in the Latin, Greek, Cyrillic and Georgian scripts); b) in Spanish, the letter ñ files after the letter n, and before the letter o; in French, the tilde (~) is ignored and the letter ñ interfiles with the letter n; c) in Czech, ch is treated as a single letter coming after the letter h and before i; d) several alternatives exist for sorting of ideographic characters, as used in China, Japan and the Republic of Korea. In addit… e) numerical strings are sometimes filed as text, e.g. 1, 10, 102, 11, 120, 2, but other times as numbers, e.g. 1, 2, 10, 11, 102, 120. |
44 | 1) produce multiple outputs, one for each language, in which the sequence of all the others is driven by correspondence to the first one; or 2) choose an underlying systematic sequence (e.g. from smallest to largest or from south to north), applying across all the languages. 10.5.4 Normalization for information retrieval |
45 | 11 Mapping system functionality 11.1 General 11.2 Switching or augmentation of index terms, notations or captions 11.3 Switching of search terms |
46 | 11.4 Expansion of search terms 12 Management of projects for mapping vocabularies and languages 12.1 General 12.2 Structural considerations |
47 | 12.3 Resources for multilingual projects a) a good understanding of each of the natural languages involved; b) a good knowledge of the subject area of the vocabulary; c) a good understanding of the difference between normal translation and the identification of equivalents for information retrieval purposes. |
48 | Annex A (informative) Practical examples encountered during preparation of a multilingual thesaurus A.1 Situation 1 A.1.1 Scenario A.1.2 Solution A A.1.3 Solution B A.1.4 Solution C |
49 | A.1.5 Discussion A.2 Situation 2 A.2.1 Scenario A.2.2 Solution A |
50 | A.2.3 Solution B A.2.4 Solution C |
51 | A.2.5 Solution D A.2.6 Solution E A.2.7 Discussion |
52 | A.3 Situation 3 A.3.1 Scenario |
53 | A.3.2 Solution A A.3.3 Solution B |
54 | A.3.4 Solution C A.3.5 Discussion A.4 Situation 4 A.4.1 Scenario |
55 | A.4.2 Solution A A.4.3 Solution B |
56 | A.4.4 Solution C A.4.5 Discussion |
57 | Bibliography [1] WORLD WIDE WEB CONSORTIUM. Character model for the World Wide Web 1.0: Fundamentals. W3C recommendation, 15 February 2005. [2] WORLD WIDE WEB CONSORTIUM. Character model for the World Wide Web 1.0: Normalization. W3C working draft, 27 October 2005. [3] UNICODE CONSORTIUM, ed. JOAN ALIPRAND et al. The Unicode standard, version 4.0. Boston, MA: Addison-Wesley, 2003. ISBN 0-321-18578-1. [4] WORLD WIDE WEB CONSORTIUM. HTML 4.01 specification. W3C specification, 24 December 1999. |
58 | [5] WORLD WIDE WEB CONSORTIUM. Extensible Markup Language (XML) 1.0 (fourth edition). W3C recommendation, 16 August 2006, edited in place 29 September 2006. [6] WORLD WIDE WEB CONSORTIUM. Extensible Markup Language (XML) 1.1. W3C recommendation, 16 August 2006, edited in place 29 September 2006. |
59 | Index (BS 8723-4) |
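
Item e) of 10.5.3 Filing orders, listed above, notes that numerical strings are sometimes filed as text and sometimes as numbers. The short Python sketch below reproduces the two sequences quoted in that item. It is only an illustration and is not taken from the standard: the function names `file_as_text` and `file_as_numbers` are hypothetical, and the sample values are those given in item e).

```python
# Sample values from item e) of 10.5.3; the function names are
# hypothetical and chosen only for this illustration.
values = ["1", "2", "10", "11", "102", "120"]


def file_as_text(items):
    """File numeric strings character by character, as plain text."""
    return sorted(items)


def file_as_numbers(items):
    """File numeric strings by their integer value."""
    return sorted(items, key=int)


print(file_as_text(values))     # ['1', '10', '102', '11', '120', '2']
print(file_as_numbers(values))  # ['1', '2', '10', '11', '102', '120']
```

The language-specific rules in items a) to d) (for example, Spanish ñ or Czech ch) would need locale-aware collation and are not covered by this sketch.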