Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations (English Edition)

Authors: Ian Witten, Eibe Frank
Publisher: China Machine Press
Series: Classic Original Editions (经典原版书库)
Copyright note: This book is in the public domain or is reproduced with the copyright holder's authorization; please support legitimate editions.
Tags: Database storage and management

About the authors

  Ian H. Witten is a professor in the Department of Computer Science at the University of Waikato, New Zealand. He is a member of the ACM and of the Royal Society of New Zealand, and belongs to professional computing, information retrieval, and engineering societies in the UK, the US, Canada, and New Zealand. He has written several books, contributes to many technical journals, and has published numerous papers.

About the book

"This book is a milestone in the integrated application of data mining, data analysis, information theory, and machine learning." —Jim Gray, Turing Award winner, Microsoft Research

This is an excellent textbook that brings together data mining algorithms and data mining practice. Drawing on their extensive experience, the authors give an accessible introduction to the concepts of data mining and the techniques it employs, machine learning in particular, and offer sound advice on applying machine learning tools to data mining. The key elements of data mining are illustrated throughout by numerous examples. The book also introduces Weka, a Java-based software system that can be used to analyze datasets, find applicable patterns, and carry out the appropriate analyses, as well as to develop machine learning schemes of your own.

Key features of the book:
Explains the principles behind data mining algorithms.
Uses examples to help readers choose an algorithm suited to their situation and to compare and evaluate the results obtained by different methods.
Describes techniques for improving performance, including data preprocessing and combining the output of different methods.
Provides the Weka software used in the book along with supplementary learning material, downloadable from http://www.mkp.com/datamining.

Eibe Frank graduated from the computer science department of the University of Karlsruhe, Germany, and is a researcher in the machine learning group at the University of Waikato, New Zealand. He is regularly invited to present his research at machine learning conferences and has published many papers in machine learning journals.
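As a taste of the algorithmic detail the book promises, the "information" (entropy) measure that drives the divide-and-conquer decision-tree method of Section 4.3 can be sketched in plain Java. This is a minimal illustration only; the class and method names are chosen here for clarity and are not part of the Weka API.

```java
import java.util.Arrays;

// Minimal sketch of the entropy measure used for attribute selection
// in decision-tree induction (Section 4.3). Illustrative names only.
public class InfoGain {

    // Entropy in bits of a class distribution given as instance counts.
    static double entropy(int... counts) {
        int total = Arrays.stream(counts).sum();
        double e = 0.0;
        for (int c : counts) {
            if (c == 0) continue;          // 0 * log(0) is taken as 0
            double p = (double) c / total;
            e -= p * (Math.log(p) / Math.log(2));
        }
        return e;
    }

    public static void main(String[] args) {
        // The weather data of Section 1.2: 9 "yes" and 5 "no" instances.
        System.out.printf("info([9,5]) = %.3f bits%n", entropy(9, 5));
    }
}
```

For the weather data of Section 1.2, with 9 "yes" and 5 "no" instances, this yields the familiar 0.940 bits; an attribute is chosen by how much it reduces this value across the subsets it creates.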

Table of contents

Foreword

Preface

1 What's it all about?

1.1 Data mining and machine learning

Describing structural patterns

Machine learning

Data mining

1.2 Simple examples: The weather problem and others

The weather problem

Contact lenses: An idealized problem

Irises: A classic numeric dataset

CPU performance: Introducing numeric prediction

Labor negotiations: A more realistic example

Soybean classification: A classic machine learning success

1.3 Fielded applications

Decisions involving judgment

Screening images

Load forecasting

Diagnosis

Marketing and sales

1.4 Machine learning and statistics

1.5 Generalization as search

Enumerating the concept space

Bias

1.6 Data mining and ethics

1.7 Further reading

2 Input: Concepts, instances, attributes

2.1 What's a concept?

2.2 What's in an example?

2.3 What's in an attribute?

2.4 Preparing the input

Gathering the data together

ARFF format

Attribute types

Missing values

Inaccurate values

Getting to know your data

2.5 Further reading

3 Output: Knowledge representation

3.1 Decision tables

3.2 Decision trees

3.3 Classification rules

3.4 Association rules

3.5 Rules with exceptions

3.6 Rules involving relations

3.7 Trees for numeric prediction

3.8 Instance-based representation

3.9 Clusters

3.10 Further reading

4 Algorithms: The basic methods

4.1 Inferring rudimentary rules

Missing values and numeric attributes

Discussion

4.2 Statistical modeling

Missing values and numeric attributes

Discussion

4.3 Divide and conquer: Constructing decision trees

Calculating information

Highly branching attributes

Discussion

4.4 Covering algorithms: Constructing rules

Rules versus trees

A simple covering algorithm

Rules versus decision lists

4.5 Mining association rules

Item sets

Association rules

Generating rules efficiently

Discussion

4.6 Linear models

Numeric prediction

Classification

Discussion

4.7 Instance-based learning

The distance function

Discussion

4.8 Further reading

5 Credibility: Evaluating what's been learned

5.1 Training and testing

5.2 Predicting performance

5.3 Cross-validation

5.4 Other estimates

Leave-one-out

The bootstrap

5.5 Comparing data mining schemes

5.6 Predicting probabilities

Quadratic loss function

Informational loss function

Discussion

5.7 Counting the cost

Lift charts

ROC curves

Cost-sensitive learning

Discussion

5.8 Evaluating numeric prediction

5.9 The minimum description length principle

5.10 Applying MDL to clustering

5.11 Further reading

6 Implementations: Real machine learning schemes

6.1 Decision trees

Numeric attributes

Missing values

Pruning

Estimating error rates

Complexity of decision tree induction

From trees to rules

C4.5: Choices and options

Discussion

6.2 Classification rules

Criteria for choosing tests

Missing values, numeric attributes

Good rules and bad rules

Generating good rules

Generating good decision lists

Probability measure for rule evaluation

Evaluating rules using a test set

Obtaining rules from partial decision trees

Rules with exceptions

Discussion

6.3 Extending linear classification: Support vector machines

The maximum margin hyperplane

Nonlinear class boundaries

Discussion

6.4 Instance-based learning

Reducing the number of exemplars

Pruning noisy exemplars

Weighting attributes

Generalizing exemplars

Distance functions for generalized exemplars

Generalized distance functions

Discussion

6.5 Numeric prediction

Model trees

Building the tree

Pruning the tree

Nominal attributes

Missing values

Pseudo-code for model tree induction

Locally weighted linear regression

Discussion

6.6 Clustering

Iterative distance-based clustering

Incremental clustering

Category utility

Probability-based clustering

The EM algorithm

Extending the mixture model

Bayesian clustering

Discussion

7 Moving on: Engineering the input and output

7.1 Attribute selection

Scheme-independent selection

Searching the attribute space

Scheme-specific selection

7.2 Discretizing numeric attributes

Unsupervised discretization

Entropy-based discretization

Other discretization methods

Entropy-based versus error-based discretization

Converting discrete to numeric attributes

7.3 Automatic data cleansing

Improving decision trees

Robust regression

Detecting anomalies

7.4 Combining multiple models

Bagging

Boosting

Stacking

Error-correcting output codes

7.5 Further reading

8 Nuts and bolts: Machine learning algorithms in Java

8.1 Getting started

8.2 Javadoc and the class library

Classes, instances, and packages

The weka.core package

The weka.classifiers package

Other packages

Indexes

8.3 Processing datasets using the machine learning programs

Using M5'

Generic options

Scheme-specific options

Classifiers

Meta-learning schemes

Filters

Association rules

Clustering

8.4 Embedded machine learning

A simple message classifier

8.5 Writing new learning schemes

An example classifier

Conventions for implementing classifiers

Writing filters

An example filter

Conventions for writing filters

9 Looking forward

9.1 Learning from massive datasets

9.2 Visualizing machine learning

Visualizing the input

Visualizing the output

9.3 Incorporating domain knowledge

9.4 Text mining

Finding key phrases for documents

Finding information in running text

Soft parsing

9.5 Mining the World Wide Web

9.6 Further reading

References

Index

About the authors