cisi_queries.txt
</text>
</query>
<query>
<queryno>query_1</queryno>
<text>
What problems and concerns are there in making up descriptive titles?
What difficulties are involved in automatically retrieving articles from
approximate titles?
What is the usual relevance of the content of articles to their titles?
</text>
</query>
<query>
<queryno>query_2</queryno>
<text>
How can actually pertinent data, as opposed to references or entire articles
themselves, be retrieved automatically in response to information requests?
</text>
</query>
<query>
<queryno>query_3</queryno>
<text>
What is information science? Give definitions where possible.
</text>
</query>
<query>
<queryno>query_4</queryno>
<text>
Image recognition and any other methods of automatically
transforming printed text into computer-ready form.
</text>
</query>
<query>
<queryno>query_5</queryno>
<text>
What special training will ordinary researchers and businessmen need for proper
information management and unobstructed use of information retrieval systems?
What problems are they likely to encounter?
</text>
</query>
<query>
<queryno>query_6</queryno>
<text>
What possibilities are there for verbal communication between computers and
humans, that is, communication via the spoken word?
</text>
</query>
<query>
<queryno>query_7</queryno>
<text>
Describe presently working and planned systems for publishing and printing
original papers by computer, and then saving the byproduct, articles coded in
data-processing form, for further use in retrieval.
</text>
</query>
<query>
<queryno>query_8</queryno>
<text>
Describe information retrieval and indexing in other languages.
What bearing does it have on the science in general?
</text>
</query>
<query>
<queryno>query_9</queryno>
<text>
What possibilities are there for automatic grammatical and contextual analysis
of articles for inclusion in an information retrieval system?
</text>
</query>
<query>
<queryno>query_10</queryno>
<text>
The use of abstract mathematics in information retrieval, e.g. group theory.
</text>
</query>
<query>
<queryno>query_11</queryno>
<text>
What is the need for information consolidation, evaluation, and retrieval in
scientific research?
</text>
</query>
<query>
<queryno>query_12</queryno>
<text>
Give methods for high speed publication, printing, and distribution of
scientific journals.
</text>
</query>
<query>
<queryno>query_13</queryno>
<text>
What criteria have been developed for the objective evaluation of information
retrieval and dissemination systems?
</text>
</query>
<query>
<queryno>query_14</queryno>
<text>
What future is there for automatic medical diagnosis?
</text>
</query>
<query>
<queryno>query_15</queryno>
<text>
How much do information retrieval and dissemination systems, as well as
automated libraries, cost?
Are they worth it to the researcher and to industry?
</text>
</query>
<query>
<queryno>query_16</queryno>
<text>
What systems incorporate multiprogramming or remote stations in information
retrieval? What will be the extent of their use in the future?
</text>
</query>
<query>
<queryno>query_17</queryno>
<text>
Means of obtaining large volume, high speed, customer usable
information retrieval output.
</text>
</query>
<query>
<queryno>query_18</queryno>
<text>
What methods are there for encoding, automatically matching,
and automatically drawing structures extended in two dimensions,
like the structural formulas for chemical compounds?
</text>
</query>
<query>
<queryno>query_19</queryno>
<text>
Techniques of machine matching and machine searching systems.
Coding and matching methods.
</text>
</query>
<query>
<queryno>query_20</queryno>
<text>
Testing automated information systems.
</text>
</query>
<query>
<queryno>query_21</queryno>
<text>
The need to provide personnel for the information field.
</text>
</query>
<query>
<queryno>query_22</queryno>
<text>
Automated information in the medical field.
</text>
</query>
<query>
<queryno>query_23</queryno>
<text>
Amount of use of books in libraries.
Relation to need for automated information systems.
</text>
</query>
<query>
<queryno>query_24</queryno>
<text>
Educational and training requirements for personnel in the information field.
Possibilities for this training. Needs for programs providing this training.
</text>
</query>
<query>
<queryno>query_25</queryno>
<text>
International systems for exchange and dissemination of information.
</text>
</query>
<query>
<queryno>query_26</queryno>
<text>
Cost and determination of cost associated with systems of automated information.
</text>
</query>
<query>
<queryno>query_27</queryno>
<text>
Computerized information retrieval systems. Computerized indexing systems.
</text>
</query>
<query>
<queryno>query_28</queryno>
<text>
Computerized information systems in fields related to chemistry.
</text>
</query>
<query>
<queryno>query_29</queryno>
<text>
Specific advantages of computerized index systems.
</text>
</query>
<query>
<queryno>query_30</queryno>
<text>
Information dissemination by journals and periodicals.
</text>
</query>
<query>
<queryno>query_31</queryno>
<text>
Information systems in the physical sciences.
</text>
</query>
<query>
<queryno>query_32</queryno>
<text>
Attempts at computerized and mechanized systems for general libraries.
Problems and methods of automated general author and title indexing systems.
</text>
</query>
<query>
<queryno>query_33</queryno>
<text>
Retrieval systems which provide for the automated transmission of information
to the user from a distance.
</text>
</query>
<query>
<queryno>query_34</queryno>
<text>
Methods of coding used in computerized index systems.
</text>
</query>
<query>
<queryno>query_35</queryno>
<text>
Government supported agencies and projects dealing with information dissemination.
</text>
</query>
<query>
<queryno>query_36</queryno>
<text>
What are some of the theories and practices in computer translating of
texts from one national language to another? How can machine translating
compete with traditional methods of translating in comprehending nuances
of meaning in languages of different structures?
</text>
</query>
<query>
<queryno>query_37</queryno>
<text>
What lists of words useful for indexing or classifying material are
available? Wanted are lists of terms that are descriptive vocabularies
of particular fields or schedules of words that are related to each other
in meaningful schemes. Wanted are lists that have been tested, at least to
some extent, and found useful for organizing material and for retrieving it.
</text>
</query>
<query>
<queryno>query_38</queryno>
<text>
How can access words in an information retrieval system be kept up to date?
Word meanings and usage often change and lists must be dynamic to be current.
What definitions of the problem and progress toward solutions have been made
in providing necessary flexibility in systems of subject headings, index
words, or other symbols used for getting at stored data?
</text>
</query>
<query>
<queryno>query_39</queryno>
<text>
The progress of information retrieval presents problems of maladjustment
and dislocation of personnel. Training and retraining of people to use
the new equipment is important at all levels. Librarians, assistants,
technicians, students, researchers, and even executives will need education
to learn the purpose, values, and uses of information systems and hardware.
What programs have been developed to change the attitudes and skills of
traditional workers and help them to learn the newer techniques?
</text>
</query>
<query>
<queryno>query_40</queryno>
<text>
What is the status of machine translation? What progress has been
made in the use of computers to transfer from one language to another
with some degree of automation? What problems and stumbling blocks
have been found and are they considered to be insurmountable limitations
or only challenging to the field of documentation on an international scale?
</text>
</query>
<query>
<queryno>query_41</queryno>
<text>
Is alphabetical ordering of material considered to be a useful tool in
information retrieval? What studies have been done to compare the
effectiveness of alphabetical order with other organization schemes?
Is there a generally accepted form of arranging material in
alphabetical order, and is there an easy way of achieving this form
without going to a great amount of effort?
</text>
</query>
<query>
<queryno>query_42</queryno>
<text>
The average student or researcher has difficulty in comprehending the
vocabulary of information retrieval. It appears important that this
new field be understood before it is to be fully accepted. What basic
articles would provide an understanding of the various important aspects
of the information storage and retrieval?
</text>
</query>
<query>
<queryno>query_43</queryno>
<text>
The difficulties encountered in information retrieval systems are often
less related to the equipment used than to the failure to plan
adequately for document analysis, indexing, and machine coding. The
position of the programmer is to take a problem and write it in a way
in which the equipment will understand. What articles have been written
describing research in maximizing the effectiveness of programming?
</text>
</query>
<query>
<queryno>query_44</queryno>
<text>
There are presently fifty to one hundred technical journals being
published. On the average, two new journals appear every day. In
the many journals published, one to two million articles appear every
year. What attempts have been made to cope with this amount of
scientific and technical publication in terms of analysis, control,
storage, and retrieval?
</text>
</query>
<query>
<queryno>query_45</queryno>
<text>
I am looking for information about the impact of automation on
libraries and its significance for libraries in general. This includes
the increasing importance of automation in view of the proliferation of
information today, and how automation can help libraries cope with
this problem. How will automation affect libraries and how should they
react to the idea of automation?
</text>
</query>
<query>
<queryno>query_46</queryno>
<text>
I am seeking information on the use of data processing in libraries and
the mechanization of routine library processes and procedures. I would
like descriptions of both general and specific applications of
automation in such areas as circulation, cataloging, acquisitions,
serial records, and other record-keeping. Examples should be based on
the operation of a conventional public or university library, or
practices in a special library which could also be applied in a public
or university library. Give descriptions of equipment and operations,
both present and projected.
</text>
</query>
<query>
<queryno>query_47</queryno>
<text>
Is there any established means at present for an international exchange
of material about information retrieval? If there is, does it take
the form of an international agency or center which regularly
distributes information retrieval methods and research results? If
there is not, in what ways has this material crossed national
boundaries? What seem to have been some of the problems blocking a
better international exchange, and is any effort being made to solve
some of those problems?
</text>
</query>
<query>
<queryno>query_48</queryno>
<text>
Information retrieval is still such a new and experimental field that a
line distinguishing research and practice is often difficult - even
impossible - to draw. Are there, however, actual centers of research
on information retrieval? If so, in which countries are they
located? Who supports them - government, business, universities, or
libraries? Can information retrieval as a specialized research
discipline be said to be emerging, or is it still an amalgam of skills
from other fields, such as mathematics, engineering, and library
science? In other words, tell me about information retrieval research.
</text>
</query>
<query>
<queryno>query_49</queryno>
<text>
Most resources have been spent on applying information retrieval
techniques to the physical and medical sciences. But, has information
retrieval been used at all in the natural sciences, social sciences,
and humanities? If so, what have been some of the problems which have
been encountered with these subject areas and how have they been
solved, if at all? Have the characteristics of these subject areas
necessitated the development of new information retrieval techniques?
What are the prospects for future machine control in these areas?
</text>
</query>
<query>
<queryno>query_50</queryno>
<text>
Is there any use for traditional classification schemes - DDC, UDC, LC,
etc. - in information retrieval systems? If there is, which scheme
appears most suited to machine use and where has it been applied?
If there is not, why are these classification schemes irrelevant?
Has research shown that a subject classification of knowledge is
completely unnecessary in machine systems? Or, have new schemes
been devised which appear to be more suited to machine use?
</text>
</query>
<query>
<queryno>query_51</queryno>
<text>
Coordinate indexing utilizes descriptors for controlled language. Of
what use are descriptors in the construction of an index? How can
descriptors be used for searching in an information retrieval system?
</text>
</query>
<query>
<queryno>query_52</queryno>
<text>
What are the characteristics of MEDLARS (Medical Literature Analysis
and Retrieval System) project which has been undertaken by the
National Library of Medicine? How does it index current medical
journals and of what relation is this indexing system to Index Medicus?
What are the major components of the MEDLARS project and its major operating
details?
</text>
</query>
<query>
<queryno>query_53</queryno>
<text>
How can the computer be used in medical science for diagnostic and
clinical record keeping purposes? Have any programs of automation
been tried in hospitals? If so, what have been the results?
What problems have been encountered in the use of automation in
medicine? For what purposes can an automated system of clinical
records be used? What are other possible uses of the computer in medicine?
</text>
</query>
<query>
<queryno>query_54</queryno>
<text>
What is the effect on librarians of automation? Note the new types
of technology to be used in the library which will have an effect on
the status, position, and function of the librarians. What changes
are being contemplated or have been initiated to introduce automation
into the education of librarians?
</text>
</query>
<query>
<queryno>query_55</queryno>
<text>
What are the aims and objectives of the medical literature analysis
and retrieval system (MEDLARS)? How does MEDLARS operate? What are
the possible applications of MEDLARS to future information retrieval
systems?
</text>
</query>
<query>
<queryno>query_56</queryno>
<text>
The standard method of finding information in today's libraries is
through the use of the alphabetically arranged card catalog or the
classified catalog based on a classification system such as the DC or
LC. Can these systems be modified for use with automated information
retrieval?
</text>
</query>
<query>
<queryno>query_57</queryno>
<text>
In catalogs which are either arranged alphabetically or arranged by
classification number, the LC entry, printed in readable language, is
ultimately important because the individual looking for information
has a definite author, title, or subject phrase in his language
(probably English in our case) in mind. Will LC entries and subject
headings be used in the same manner in automated systems?
</text>
</query>
<query>
<queryno>query_58</queryno>
<text>
Directions in Library Networking
Bibliographic control before and after MARC is reviewed. The capability
of keying into online systems brought an interdependence among libraries,
the service centers that mediate between them, and the large utilities that
process and distribute data. From this has developed the basic network
structure among libraries in the United States. The independent development
of major networks has brought problems in standardization and coordination.
The authors point out that while technology has led toward centralization
of automated library services, new developments are now pushing toward
decentralization. Coordination is a requirement to avoid fragmentation in
this new environment.
</text>
</query>
<query>
<queryno>query_59</queryno>
<text>
Performance Testing of a Book and Its Index as an Information Retrieval
System
The retrieval performance of book indexes can be measured in terms of
their ability to direct a user selectively to text material whose identity
but not location is known. The method requires human searchers to base
their searching strategies on actual passages from the book rather than on
test queries, natural or contrived. It circumvents the need for relevance
judgement, but still yields performance indicators that correspond
approximately to the recall and precision ratios of large document retrieval
system evaluation. A preliminary application of the method to the subject
indexing of two major encyclopedias showed one encyclopedia apparently
superior in both the finding and discrimination abilities of retrieval
performance. The method is presently best suited for comparative testing
since its ability to yield absolute or reproducible measures is as yet not
established.
</text>
</query>
<query>
<queryno>query_60</queryno>
<text>
The Combined Use of Bibliographic Coupling and Cocitation for Document
Retrieval
A linkage similarity measure which takes into account both the bibliographic
coupling of documents and their cocitations (both cited and citing papers)
produced improved document retrieval over a measure based only on
bibliographic coupling. The test collection consisted of 1712 papers whose
relevance to specific queries had been judged by users. To evaluate the
effect of using cocitation data, we calculated for each query two measures
of similarity between each relevant paper and every other paper retrieved.
Papers were then sorted by the similarity measures, producing two ordered
lists. We then compared the resulting predictions of relevance, partial
relevance, and non-relevance to the user's evaluations of the same papers.
Overall, the change from the bibliographic coupling measure to the linkage
similarity measure, representing the introduction of cocitation data,
resulted in better retrieval performance.
</text>
</query>
<query>
<queryno>query_61</queryno>
<text>
Searching Biases in Large Interactive Document Retrieval Systems
The way that individuals construct and modify search queries on a
large interactive document retrieval system is subject to systematic biases
similar to those that have been demonstrated in experiments on judgements
under uncertainty. These biases are shared by both naive and sophisticated
subjects and cause the inquirer searching for documents on a large interactive
system to construct and modify queries inefficiently. A searching algorithm
is suggested that helps the inquirer to avoid the effect of these biases.
</text>
</query>
<query>
<queryno>query_62</queryno>
<text>
Fuzzy Requests: An Approach to Weighted Boolean Searches
This article concerns the problem of how to permit a patron to
represent the relative importance of various index terms in a Boolean
request while retaining the desirable properties of a Boolean system.
The character of classical Boolean systems is reviewed and related to the
notion of fuzzy sets. The fuzzy set concept then forms the basis of the
concept of a fuzzy request in which weights are assigned to index terms.
The properties of such a system are discussed, and it is shown that such
systems retain the manipulability of traditional Boolean requests.
</text>
</query>
<query>
<queryno>query_63</queryno>
<text>
Feature Comparison of an In-House Information Retrieval System With a
Commercial Search Service
A commercially available online search was used as a standard for
comparative searching and evaluation of an in-house information system
based on automatic indexing. System features were identified and
evaluated on the basis of their usefulness in various kinds of searching,
their ease in implementation, and how they are influenced by differences
in user type or specific applications. Some common features of the
commercial system, such as online instruction, user-specified print formats,
dictionary display, and truncation, are seen to be unnecessary or
impractical for the in-house system. In designing the in-house system,
therefore, detailed consideration must be given to the applications,
operating environment, and real user needs. While a commercial system can
serve as a useful standard for comparative evaluation, one must be
careful not to attempt to duplicate it blindly in-house.
</text>
</query>
<query>
<queryno>query_64</queryno>
<text>
Measurement in Information Science: Objective and Subjective Metrical Space
It is argued that in information science we have to distinguish
physical, objective, or document space from perspective, subjective, or
information space. These two spaces are like maps and landscapes: each
is a systematic distortion of the other. However, transformation can be
easily made once the two spaces are distinguished. If the transformations
are omitted we only get unhelpful physical solutions to information problems.
</text>
</query>
<query>
<queryno>query_65</queryno>
<text>
A Model of Cluster Searching Based on Classification
The use of document clusters has been suggested as an efficient file
organization for a document retrieval system. It is possible that by
using this information about the relationships between documents that the
effectiveness of the system (i.e., its ability to distinguish relevant
from non-relevant documents) may also be improved. In this paper a
probabilistic model of cluster searching based on query classification is
described. This model is tested with retrieval experiments which indicate
that it can be more effective than heuristic cluster searches and cluster
searches based on other models. It can also be more effective than a full
search in which every document is compared to the query. The efficiency
aspects of the implementation of the model are discussed.
(Inform. Systems, Vol. 5, No. 3, 1980, pp. 189-195)
</text>
</query>
<query>
<queryno>query_66</queryno>
<text>
The Technology of Library and Information Networks
Current online library network technology is described, including the
physical and functional aspects of networks. Three types of networks are
distinguished: search service (e.g., SDC, Lockheed), customized service
that provide bibliographic files (e.g., OCLC, Inc., RLIN), and service
center (e.g., NELINET, INCOLSA). It is predicted that as technology
evolves more services will be provided outside the library directly to the
user through his home or office.
</text>
</query>
<query>
<queryno>query_67</queryno>
<text>
The Use of Titles for Automatic Document Classification
An experimental computer program has been developed to classify
documents according to the 80 sections and five major section groupings of
Chemical Abstracts (CA). The program uses pattern recognition techniques
supplemented by heuristics. During the "training" phase, words from
pre-classified documents are selected, and the probability of occurrence
of each word in each section of CA is computed and stored in a reference
dictionary. The "classification" phase matches each word of a document
title against the dictionary and assigns a section number to the document
using weights derived from the probabilities in the dictionary. Heuristic
techniques are used to normalize word variants such as plurals, past
tenses, and gerunds in both the training phase and the classification
phase. The dictionary lookup technique is supplemented by the analysis of
chemical nomenclature terms into their component word roots to influence
the section to which the documents are assigned. Program performance and
human consistency have been evaluated by comparing the program results
against the published sections of CA and by conducting an experiment with
people experienced in the assignment of documents to CA sections. The
program assigned approximately 78% of the documents to the correct major
section groupings of CA and 67% of the correct sections or cross-references
at a rate of 100 documents per second.
</text>
</query>
<query>
<queryno>query_68</queryno>
<text>
Brief Communications
Some of the automatic classification procedures used in information
retrieval derive clusters of documents from an intermediate similarity
matrix, the computation of which involves comparing each of the documents
in the collection with all of the others. It has recently been suggested
that many of these comparisons, specifically those between documents
having no terms in common, may be avoided by means of the use of an inverted
file to the document collection. This communication shows that the
approach will effect reductions in the number of interdocument comparisons
only if the documents are each indexed by a limited number of indexing
terms; if exhaustive indexing is used, many document pairs will be compared
several times over and the computation will be greater than when
conventional approaches are used to generate the similarity matrix.
</text>
</query>
<query>
<queryno>query_69</queryno>
<text>
The Application of a Minicomputer to Thesaurus Construction
The use of a minicomputer in various phases of creating the thesaurus
for the National Information Center for Special Education Materials
(NICSEM) database is described. The minicomputer is used to collect,
edit, and correct candidate thesaurus terms. The use of the minicomputer
eases the process of grouping terms into files of similar concepts and
facilitates the generation of products useful in vocabulary review and in
term structuring. Syndetic relations, indicated by assigning coded
identification numbers, are altered easily in the design phase to reflect
restructuring requirements. Because thesaurus terms are already in machine-
readable form, it is simple to prepare print programs to provide permuted,
alphabetic, hierarchical, and chart formatted term displays. Overall, the
use of the minicomputer facilitates initial thesaurus entry development by
reducing clerical effort, editorial staff decisions, and overall processing
times.
</text>
</query>
<query>
<queryno>query_70</queryno>
<text>
Adaptive Design for Decision Support Systems
Decision Support Systems (DSS) represent a concept of the role of
computers within the decision making process. The term has become a
rallying cry for researchers, practitioners, and managers concerned that
Management Science and Management Information Systems fields have become
unnecessarily narrow in focus. As with many rallying cries, the term is
not well defined. For some writers, DSS simply mean interactive systems
for use by managers. To others, the key issue is support, rather than
system. They focus on understanding and improving the decision process;
a DSS is then designed using any available and suitable technology. Some
researchers view DSS as a subfield of MIS, while others regard it as an
extension of Management Science techniques. The former define Decision
Support as providing managers with access to data and the latter as giving
them access to analytic models.
The key argument of this paper is that the term DSS is relevant to
situations where a "final" system can be developed only through an
adaptive process of learning and evolution. The design strategy must
then focus on getting finished; this is very different from Management
Science and Data Processing approaches. The research issues for DSS
center around adaption and evolution; they include managerial learning,
representation of tasks and user behavior, design architecture and
strategies for getting started.
</text>
</query>
<query>
<queryno>query_71</queryno>
<text>
An Automatic Method for Extracting Significant
Phrases in Scientific or Technical Documents
A new method is described to extract significant phrases in the title
and the abstract of scientific or technical documents. The method is
based upon a text structure analysis and uses a relatively small dictionary.
The dictionary has been constructed based on the knowledge about concepts
in the field of science or technology and some lexical knowledge. For
significant phrases and their component items may be used in different
meanings among the fields. A text analysis approach has been applied to
select significant phrases as substantial and semantic information carriers
of the contents of the abstract.
The results of the experiment for five sets of documents have shown
that the significant phrases are effectively extracted in all cases, and
the number of them for every document and the processing time is fairly
satisfactory. The information representation of the document, partly
using the method, is discussed with relation to the construction of the
document information retrieval system.
(Info. Proc. & Management, Vol. 16, No. 3, 1980, pp.119-127)
</text>
</query>
<query>
<queryno>query_72</queryno>
<text>
Answer-Passage Retrieval by Text Searching
Passage retrieval (already operational for lawyers) has advantages in
output form over reference retrieval and is economically feasible.
Previous experiments in passage retrieval for scientists have demonstrated
recall and false retrieval rates as good or better than those of present
reference retrieval services. The present experiment involved a greater
variety of forms of retrieval question. In addition, search words were
selected independently by two different people for each retrieval question.
The search words selected, in combination with the computer procedures used
for passage retrieval, produced average recall ratios of 72 and 67%,
respectively, for the two selectors. The false retrieval rates were (except
for one predictably difficult question) respectively 13 and 10 falsely
retrieved sentences per answer-paper retrieved.
</text>
</query>
<query>
<queryno>query_73</queryno>
<text>
Partial-Match Retrieval Using Indexed Descriptor Files
In this paper we describe a practical method of partial-match retrieval
in very large data files. A binary code word, called a descriptor, is
associated with each record of the file. These record descriptors are
then used to form a derived descriptor for a block of several records,
which will serve as an index for the block as a whole; hence, the name
"indexed descriptor files."
First the structure of these files is described and a simple, efficient
retrieval algorithm is presented. Then its expected behavior, in terms of
storage accesses, is analyzed in detail. Two different file creation
procedures are sketched, and a number of ways in which the file organization
can be "tuned" to a particular application are suggested.
</text>
</query>
<query>
<queryno>query_74</queryno>
<text>
Cooperation and Competition Among Library Networks
Recent technological advances and the success of OCLC, Inc. have led
to the emergence of three additional nonprofit library networks: the
Research Libraries Information Network (RLIN) of the Research Libraries
Group, Inc., the University of Toronto Library Automation System (UTLAS),
and the Washington Library Network (WLN). This paper examines the economic
and technological factors affecting the evolution of these networks and
also explores the role of those state and regional (multistate) networks
that broker OCLC services. The competitive and cooperative nature of
network relationships is a major theme of the discussion.
</text>
</query>
<query>
<queryno>query_75</queryno>
<text>
An Integrated Understander
A new type of natural language parser is presented. The idea behind
this parser is to map input sentences into the deepest form of the
representation of their meaning and inferences, as is appropriate. The
parser is not distinct from an entire understanding system. It uses an
integrated conception of inferences, scripts, plans and other knowledge to
aid in the parse. Furthermore, it does not attempt to parse everything it
sees. Rather, it determines what is most interesting and concentrates on
that, ignoring the rest.
</text>
</query>
<query>
<queryno>query_76</queryno>
<text>
Library Networks and Resource Sharing in the United States:
An Historical and Philosophical Overview
This paper discusses the origins of library networks and traces their
development in the United States from the late 1960s through the present.
The concept of resource sharing, with particular attention to the inter-
library loan and programs for the cooperative acquisition and storage of
materials, is examined in relationship to library networks. In particular,
attention is given to the question of how these two major components of
library cooperation, which have tended to be separate, might become more
closely integrated.
</text>
</query>
<query>
<queryno>query_77</queryno>
<text>
Normalization of Titles and Their Retrieval
This paper presents a method of normalization of English titles and
their retrieval. The title expressed by a noun phrase or a noun clause
is converted to a function-expression by parsing. For the retrieval with
a reasonable recall rate as well as a high precision rate, the function-
expression is transformed to a predicate-governor form, and then normalized
to a standard form. Therefrom, various items are extracted and recorded
in a hierarchical tree-like inverted file.
In order to keep the recall rate at a reasonable value, several
retrieval stages are implemented based on the key-term and case-label
matching. The retrieval is controlled by the preciseness of the specification
of case-labels for each key-term.
(Info. Proc. & Management, Vol. 16, No. 3, 1980, pp. 155-167)
</text>
</query>
<query>
<queryno>query_78</queryno>
<text>
Cascaded ATN Grammars
A generalization of the notion of ATN grammar, called a cascaded ATN
(CATN), is prescribed. CATN's permit a decomposition of complex language
understanding behavior into a sequence of cooperating ATN's with separate
domains of responsibility, where each stage (called an ATN transducer)
takes its input from the output of the previous stage. The paper includes
an extensive discussion of the principles of factoring: conceptual
factoring reduces the number of places that a given fact needs to be
represented in a grammar, and hypothesis factoring reduces the number
of distinct hypotheses that have to be considered during parsing.
</text>
</query>
<query>
<queryno>query_79</queryno>
<text>
Algorithms for Processing Partial Match Queries Using Word Fragments
Algorithms are given to process partially specified queries in a
compressed database system. The proposed methods handle effectively
queries that use either whole words or word fragments as language elements.
The methods are compared and critically evaluated in terms of the design
and retrieval costs. The analyses show that the method which exploits the
interdependence of fragments as well as the relevance of fragments to
records in the file has maximum design cost and least retrieval cost.
(Inform. Systems, Vol. 5, No. 4, April 1980, pp. 323-332)
</text>
</query>
<query>
<queryno>query_80</queryno>
<text>
A General Formulation of Bradford's Distribution: The Graph-Oriented
Approach
From the detailed analysis of eight previously published mathematical
models, a general formulation of Bradford's distribution can be deduced as
follows: y = a log(x + c) + b, where y is the ratio of the cumulative
frequency of articles to the total number of articles and x is the ratio
of the rank of journals to the total number of journals. The parameters a, b,
and c are the slope, the intercept, and the shift in a straight line to log rank,
respectively. Each of the eight models is a special case of the general
formulation and is one of five types of formulation. In order to estimate
three unknown parameters, a statistical method using root-weighted square
error is proposed. A comparative experiment using 11 databases suggests that
the fifth type of formulation with three unknown parameters is the best fit
to the observed data. A further experiment shows that the deletion of the
droop data leads to a more accurate value of parameters and less error.
</text>
</query>
<query>
<queryno>query_81</queryno>
<text>
Lexical Problems in Large Distributed Information Systems
The lexical problems in large information systems are created by the
necessity of handling a great number of names and their interrelations.
Such lexical problems are not covered completely by the concept of data
dictionaries, which are mostly concerned with database scheme design rather
than the execution of operations. In this paper we introduce our view of a
lexical subsystem as a separate component in an information system architecture,
to deal with linguistic and control functions concerning the lexical problems
in local and network environments. The lexical subsystem is a special
efficiently organized program package, which plays the role of a "linguistic
filter" in a broad sense for lexically incorrect queries, promotes integration
of databases and information retrieval systems, and facilitates the creation
of local information systems. We hope that lexical subsystems can become
productive for any large, especially distributed, information system.
(Information Processing & Management, Vol. 16, February 1980, pp. 259-267)
</text>
</query>
<query>
<queryno>query_82</queryno>
<text>
The Relational Model in Information Retrieval
The relational model has received increasing attention during the
past decade. Its advantages include simplicity, consistency, and a sound
theoretical basis. In this article, the naturalness of viewing information
retrieval relationally is demonstrated. The relational model is presented,
and the relational organization of a bibliographical database is shown.
The notion of normalization is introduced and first, second, third, and
fourth normal forms are demonstrated. Relational languages are discussed,
including the relational calculus, relational algebra, and SEQUEL.
Numerous examples pertinent to information retrieval are presented in these
relational languages. Advantages of the relational approach to information
retrieval are noted.
</text>
</query>
<query>
<queryno>query_83</queryno>
<text>
Electronic Information Interchange in an Office Environment
This paper describes an architectural approach that provides information
exchange across a broad spectrum of user applications and office automation
offerings. Some of the architectures described herein are currently
implemented in existing IBM products. These and other architectures will
provide the basis for document interchange capability between products
such as the IBM 5520 Administrative System, the IBM System/370 Distributed
Office Support System (DISOSS), and the IBM Displaywriter System.
Specifically described is a document distribution architecture and its
associated data streams and others.
A general overview of the architectures as opposed to a detailed
technical description is provided. The architectures described are
protocols for interchange between application processes; they do not
address the specific user interface. The document distribution
architectures utilize SNA for data transmission and communications control
facilities.
(IBM Systems Journal, Vol. 20, No. 1, 1981, pp. 4-22)
</text>
</query>
<query>
<queryno>query_84</queryno>
<text>
The Use of Automatic Relevance Feedback in Boolean Retrieval Systems
A technique is described for automatic reformulation of boolean
queries. Based on patron relevance judgements of an initial retrieval,
prevalence measures are derived for terms appearing in the retrieved set
of documents that reflect a term's distribution among the relevant and
non-relevant documents. These measures are then used to guide the
construction of a boolean query for a subsequent retrieval. To illustrate
the technique, a series of tests is described of its application to a small
data base in an experimental environment. Results compare favourably with
feedback as employed in a SMART-type system. More extensive testing is
suggested to validate the technique.
</text>
</query>
<query>
<queryno>query_85</queryno>
<text>
Interacting in Natural Language With Artificial Systems: The Donau Project
This paper is intended to propose a new methodological approach to
the conception and development of natural language understanding systems.
This new contribution is supported by the design, implementation, and
experimentation of DONAU: a general purpose domain oriented natural
language understanding system developed and presently running at the Milan
Polytechnic Artificial Intelligence Project. The system is based on a two
level modular architecture intended to overcome the lack of flexibility and
generality often pointed out in many existing systems, and to facilitate
the exchange of results and actual experiences between different projects.
The horizontal level allows an independent and parallel development of the
single segments of the system (syntactic analyser, information extractor,
legality controller). The vertical level ensures the possibility of changing
(enlarging or redefining) the definition of the semantic domain on which each
particular version of the system is oriented and specialized in a simple,
incremental, and user-oriented way. In the paper the general architecture of
the system and the mode of operation of each segment are illustrated in
detail. Linguistic models, knowledge representation, and parsing algorithms
are described and illustrated by means of selected examples. Performance
evaluations of the system in the application version on data base inquiry are
reported and discussed. Promising directions for future research are presented
in the conclusions.
(Inform. Systems, Vol. 5, No. 4, February 1980, pp. 333-344)
</text>
</query>
<query>
<queryno>query_86</queryno>
<text>
Approximate String Matching
Approximate matching of strings is reviewed with the aim of
surveying techniques suitable for finding an item in a database when
there may be a spelling mistake or other error in the keyword. The
methods found are classified as either equivalence or similarity problems.
Equivalence problems are seen to be readily solved using canonical forms.
For similarity problems difference measures are surveyed, with a full
description of the well-established dynamic programming method relating