Modern Information Retrieval (English Edition)

Authors:	Ricardo Baeza-Yates, Berthier Ribeiro-Neto
Publisher:	China Machine Press (机械工业出版社)
Series:	Classic Original Editions (经典原版书库)
Tags:	Foreign-language original editions; Library science; Archival science; Social sciences


About the Authors

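Ricardo Baeza-Yates received his PhD in computer science from the University of Waterloo, Canada. He has served as president of the Chilean Computer Science Society (SCCC) and is currently a full professor in the Department of Computer Science at the University of Chile, as well as a visiting professor at several universities around the world. He is a member of the ACM, AMS, EATCS, IEEE, SCCC, and SIAM. His main research interests are algorithms and data structures, text retrieval, graphical interfaces, and visualization applied to databases.

Berthier Ribeiro-Neto received his PhD in computer science from the University of California, Los Angeles. He is currently an associate professor in the Department of Computer Science at the Federal University of Minas Gerais, Brazil, and a member of the ACM, ASIS, and IEEE. His main research interests are information retrieval systems, digital libraries, Web interfaces, and video on demand.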

Book Description

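This book covers the latest developments in information retrieval. Its organization (including the supporting home page, www.dcc.ufmg.br/irbook) lets readers either gain a comprehensive overview of modern information retrieval or acquire detailed knowledge of each of its key topics. The core of the book was written by Baeza-Yates and Ribeiro-Neto, leading figures in the field. For readers who want to study key areas in depth, the book also includes state-of-the-art coverage of special topics contributed by other leading researchers.

Parallel and distributed IR: algorithms and architectures. User interfaces and visualization: the main interface paradigms for query formulation and result visualization. Multimedia IR: models and languages, including MULTOS and SQL3; indexing and searching, including R-trees, GEMINI, and QBIC. Libraries and bibliographical systems: online systems and public access catalogs. Digital libraries: the main challenges for effective deployment. Text IR: all the main IR models, query operations, text operations, and indexing and searching. The Web: challenges, methods and models, search engines, directories, query languages, metasearch, and trends.

The book can serve as a textbook for required courses in information retrieval and for graduate courses in related fields. It is also valuable to students of computer science, information science, and library science, and to programmers and analysts who work on related products.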

Table of Contents

Preface

Acknowledgements

Biographies

1 Introduction

1.1 Motivation

1.1.1 Information versus Data Retrieval

1.1.2 Information Retrieval at the Center of the Stage

1.1.3 Focus of the Book

1.2 Basic Concepts

1.2.1 The User Task

1.2.2 Logical View of the Documents

1.3 Past, Present, and Future

1.3.1 Early Developments

1.3.2 Information Retrieval in the Library

1.3.3 The Web and Digital Libraries

1.3.4 Practical Issues

1.4 The Retrieval Process

1.5 Organization of the Book

1.5.1 Book Topics

1.5.2 Book Chapters

1.6 How to Use this Book

1.6.1 Teaching Suggestions

1.6.2 The Book's Web Page

1.7 Bibliographic Discussion

2 Modeling

2.1 Introduction

2.2 A Taxonomy of Information Retrieval Models

2.3 Retrieval: Ad hoc and Filtering

2.4 A Formal Characterization of IR Models

2.5 Classic Information Retrieval

2.5.1 Basic Concepts

2.5.2 Boolean Model

2.5.3 Vector Model

2.5.4 Probabilistic Model

2.5.5 Brief Comparison of Classic Models

2.6 Alternative Set Theoretic Models

2.6.1 Fuzzy Set Model

2.6.2 Extended Boolean Model

2.7 Alternative Algebraic Models

2.7.1 Generalized Vector Space Model

2.7.2 Latent Semantic Indexing Model

2.7.3 Neural Network Model

2.8 Alternative Probabilistic Models

2.8.1 Bayesian Networks

2.8.2 Inference Network Model

2.8.3 Belief Network Model

2.8.4 Comparison of Bayesian Network Models

2.8.5 Computational Costs of Bayesian Networks

2.8.6 The Impact of Bayesian Network Models

2.9 Structured Text Retrieval Models

2.9.1 Model Based on Non-Overlapping Lists

2.9.2 Model Based on Proximal Nodes

2.10 Models for Browsing

2.10.1 Flat Browsing

2.10.2 Structure Guided Browsing

2.10.3 The Hypertext Model

2.11 Trends and Research Issues

2.12 Bibliographic Discussion

3 Retrieval Evaluation

3.1 Introduction

3.2 Retrieval Performance Evaluation

3.2.1 Recall and Precision

3.2.2 Alternative Measures

3.3 Reference Collections

3.3.1 The TREC Collection

3.3.2 The CACM and ISI Collections

3.3.3 The Cystic Fibrosis Collection

3.4 Trends and Research Issues

3.5 Bibliographic Discussion

4 Query Languages

4.1 Introduction

4.2 Keyword-Based Querying

4.2.1 Single-Word Queries

4.2.2 Context Queries

4.2.3 Boolean Queries

4.2.4 Natural Language

4.3 Pattern Matching

4.4 Structural Queries

4.4.1 Fixed Structure

4.4.2 Hypertext

4.4.3 Hierarchical Structure

4.5 Query Protocols

4.6 Trends and Research Issues

4.7 Bibliographic Discussion

5 Query Operations

5.1 Introduction

5.2 User Relevance Feedback

5.2.1 Query Expansion and Term Reweighting for the Vector Model

5.2.2 Term Reweighting for the Probabilistic Model

5.2.3 A Variant of Probabilistic Term Reweighting

5.2.4 Evaluation of Relevance Feedback Strategies

5.3 Automatic Local Analysis

5.3.1 Query Expansion Through Local Clustering

5.3.2 Query Expansion Through Local Context Analysis

5.4 Automatic Global Analysis

5.4.1 Query Expansion based on a Similarity Thesaurus

5.4.2 Query Expansion based on a Statistical Thesaurus

5.5 Trends and Research Issues

5.6 Bibliographic Discussion

6 Text and Multimedia Languages and Properties

6.1 Introduction

6.2 Metadata

6.3 Text

6.3.1 Formats

6.3.2 Information Theory

6.3.3 Modeling Natural Language

6.3.4 Similarity Models

6.4 Markup Languages

6.4.1 SGML

6.4.2 HTML

6.4.3 XML

6.5 Multimedia

6.5.1 Formats

6.5.2 Textual Images

6.5.3 Graphics and Virtual Reality

6.5.4 HyTime

6.6 Trends and Research Issues

6.7 Bibliographic Discussion

7 Text Operations

7.1 Introduction

7.2 Document Preprocessing

7.2.1 Lexical Analysis of the Text

7.2.2 Elimination of Stopwords

7.2.3 Stemming

7.2.4 Index Terms Selection

7.2.5 Thesauri

7.3 Document Clustering

7.4 Text Compression

7.4.1 Motivation

7.4.2 Basic Concepts

7.4.3 Statistical Methods

7.4.4 Dictionary Methods

7.4.5 Inverted File Compression

7.5 Comparing Text Compression Techniques

7.6 Trends and Research Issues

7.7 Bibliographic Discussion

8 Indexing and Searching

8.1 Introduction

8.2 Inverted Files

8.2.1 Searching

8.2.2 Construction

8.3 Other Indices for Text

8.3.1 Suffix Trees and Suffix Arrays

8.3.2 Signature Files

8.4 Boolean Queries

8.5 Sequential Searching

8.5.1 Brute Force

8.5.2 Knuth-Morris-Pratt

8.5.3 Boyer-Moore Family

8.5.4 Shift-Or

8.5.5 Suffix Automaton

8.5.6 Practical Comparison

8.5.7 Phrases and Proximity

8.6 Pattern Matching

8.6.1 String Matching Allowing Errors

8.6.2 Regular Expressions and Extended Patterns

8.6.3 Pattern Matching Using Indices

8.7 Structural Queries

8.8 Compression

8.8.1 Sequential Searching

8.8.2 Compressed Indices

8.9 Trends and Research Issues

8.10 Bibliographic Discussion

9 Parallel and Distributed IR

9.1 Introduction

9.1.1 Parallel Computing

9.1.2 Performance Measures

9.2 Parallel IR

9.2.1 Introduction

9.2.2 MIMD Architectures

9.2.3 SIMD Architectures

9.3 Distributed IR

9.3.1 Introduction

9.3.2 Collection Partitioning

9.3.3 Source Selection

9.3.4 Query Processing

9.3.5 Web Issues

9.4 Trends and Research Issues

9.5 Bibliographic Discussion

10 User Interfaces and Visualization

10.1 Introduction

10.2 Human-Computer Interaction

10.2.1 Design Principles

10.2.2 The Role of Visualization

10.2.3 Evaluating Interactive Systems

10.3 The Information Access Process

10.3.1 Models of Interaction

10.3.2 Non-Search Parts of the Information Access Process

10.3.3 Earlier Interface Studies

10.4 Starting Points

10.4.1 Lists of Collections

10.4.2 Overviews

10.4.3 Examples, Dialogs, and Wizards

10.4.4 Automated Source Selection

10.5 Query Specification

10.5.1 Boolean Queries

10.5.2 From Command Lines to Forms and Menus

10.5.3 Faceted Queries

10.5.4 Graphical Approaches to Query Specification

10.5.5 Phrases and Proximity

10.5.6 Natural Language and Free Text Queries

10.6 Context

10.6.1 Document Surrogates

10.6.2 Query Term Hits Within Document Content

10.6.3 Query Term Hits Between Documents

10.6.4 SuperBook: Context via Table of Contents

10.6.5 Categories for Results Set Context

10.6.6 Using Hyperlinks to Organize Retrieval Results

10.6.7 Tables

10.7 Using Relevance Judgements

10.7.1 Interfaces for Standard Relevance Feedback

10.7.2 Studies of User Interaction with Relevance Feedback Systems

10.7.3 Fetching Relevant Information in the Background

10.7.4 Group Relevance Judgements

10.7.5 Pseudo-Relevance Feedback

10.8 Interface Support for the Search Process

10.8.1 Interfaces for String Matching

10.8.2 Window Management

10.8.3 Example Systems

10.8.4 Examples of Poor Use of Overlapping Windows

10.8.5 Retaining Search History

10.8.6 Integrating Scanning, Selection, and Querying

10.9 Trends and Research Issues

10.10 Bibliographic Discussion

11 Multimedia IR: Models and Languages

11.1 Introduction

11.2 Data Modeling

11.2.1 Multimedia Data Support in Commercial DBMSs

11.2.2 The MULTOS Data Model

11.3 Query Languages

11.3.1 Request Specification

11.3.2 Conditions on Multimedia Data

11.3.3 Uncertainty, Proximity, and Weights in Query Expressions

11.3.4 Some Proposals

11.4 Trends and Research Issues

11.5 Bibliographic Discussion

12 Multimedia IR: Indexing and Searching

12.1 Introduction

12.2 Background -- Spatial Access Methods

12.3 A Generic Multimedia Indexing Approach

12.4 One-dimensional Time Series

12.4.1 Distance Function

12.4.2 Feature Extraction and Lower-bounding

12.4.3 Experiments

12.5 Two-dimensional Color Images

12.5.1 Image Features and Distance Functions

12.5.2 Lower-bounding

12.5.3 Experiments

12.6 Automatic Feature Extraction

12.7 Trends and Research Issues

12.8 Bibliographic Discussion

13 Searching the Web

13.1 Introduction

13.2 Challenges

13.3 Characterizing the Web

13.3.1 Measuring the Web

13.3.2 Modeling the Web

13.4 Search Engines

13.4.1 Centralized Architecture

13.4.2 Distributed Architecture

13.4.3 User Interfaces

13.4.4 Ranking

13.4.5 Crawling the Web

13.4.6 Indices

13.5 Browsing

13.5.1 Web Directories

13.5.2 Combining Searching with Browsing

13.5.3 Helpful Tools

13.6 Metasearchers

13.7 Finding the Needle in the Haystack

13.7.1 User Problems

13.7.2 Some Examples

13.7.3 Teaching the User

13.8 Searching using Hyperlinks

13.8.1 Web Query Languages

13.8.2 Dynamic Search and Software Agents

13.9 Trends and Research Issues

13.10 Bibliographic Discussion

14 Libraries and Bibliographical Systems

14.1 Introduction

14.2 Online IR Systems and Document Databases

14.2.1 Databases

14.2.2 Online Retrieval Systems

14.2.3 IR in Online Retrieval Systems

14.2.4 'Natural Language' Searching

14.3 Online Public Access Catalogs (OPACs)

14.3.1 OPACs and Their Content

14.3.2 OPACs and End Users

14.3.3 OPACs: Vendors and Products

14.3.4 Alternatives to Vendor OPACs

14.4 Libraries and Digital Library Projects

14.5 Trends and Research Issues

14.6 Bibliographic Discussion

15 Digital Libraries

15.1 Introduction

15.2 Definitions

15.3 Architectural Issues

15.4 Document Models, Representations, and Access

15.4.1 Multilingual Documents

15.4.2 Multimedia Documents

15.4.3 Structured Documents

15.4.4 Distributed Collections

15.4.5 Federated Search

15.4.6 Access

15.5 Prototypes, Projects, and Interfaces

15.5.1 International Range of Efforts

15.5.2 Usability

15.6 Standards

15.6.1 Protocols and Federation

15.6.2 Metadata

15.7 Trends and Research Issues

15.8 Bibliographical Discussion

Appendix: Porter's Algorithm

Glossary

References

Index