Data Warehousing, Data Mining, and On-Line Analytical Processing (Reprint Edition)

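Authors: Alex Berson, Stephen J. Smith
Publisher: Computing McGraw-Hill
Series:
Copyright notice: This book is in the public domain or is authorized by the copyright holder; please support legitimate editions.
Tags: not available
ISBN: unknown
Publication date: not available
Binding: not available
Format: unknown
Pages: 0
Word count: not available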

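About the Authors

No author biography is available for Data Warehousing, Data Mining, and On-Line Analytical Processing (Reprint Edition).

Description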

The last few years have seen a growing recognition of information as a key business tool. Those who successfully gather, analyze, understand, and act upon information are among the winners in this new information age. Therefore, it is only reasonable to expect the rate of producing and consuming information to grow. We can define information as that which resolves uncertainty. We can further say that decision making is the progressive resolution of uncertainty and is key to purposeful behavior by any mechanism (or organism). In general, current business market dynamics make it abundantly clear that, for any company, information is the very key to survival. If we look at the evolution of information processing technologies, we can see that while the first generation of client/server systems brought data to the desktop, not all of this data was, unfortunately, easy to understand, and so it was not very useful to end users. As a result, a number of new technologies have emerged that ar...

Table of Contents

Foreword

Preface

Part 1. Foundation

Chapter 1. Introduction to Data Warehousing

1.1 Why All the Excitement?

1.2 The Need for Data Warehousing

1.3 Paradigm Shift

1.3.1 Computing Paradigm

1.3.2 Business Paradigm

1.4 Business Problem Definition

1.5 Operational and Informational Data Stores*

1.6 Data Warehouse Definition and Characteristics

1.7 Data Warehouse Architecture

1.8 Chapter Summary

Chapter 2. Client/Server Computing Model and Data Warehousing

2.1 Overview of Client/Server Architecture

2.1.1 Host-Based Processing

2.1.2 Master-Slave Processing

2.1.3 First-Generation Client/Server Processing

2.1.4 Second-Generation Client/Server Processing

2.2 Server Specialization in Client/Server Computing Environments

2.3 Server Functions

2.4 Server Hardware Architecture

2.5 System Considerations

2.6 RISC versus CISC

2.7 Multiprocessor Systems

2.7.1 SMP Design

2.7.2 SMP Features

2.7.3 SMP Operating Systems

2.8 SMP Implementations

Chapter 3. Parallel Processors and Cluster Systems

3.1 Distributed-Memory Architecture

3.1.1 Shared-Nothing Architectures

3.1.2 Shared-Disk Systems

3.2 Research Issues

3.3 Cluster Systems

3.4 Advances in Multiprocessing Architectures

3.5 Optimal Hardware Architecture for Query Scalability*

3.5.1 Uniformity of Data Access Times

3.5.2 System Architecture Taxonomy and Query Execution

3.6 Server Operating Systems

3.6.1 Operating System Requirements

3.6.2 Microkernel Technology

3.7 Operating System Implementations

3.7.1 UNIX

3.7.2 Windows/NT

3.7.3 OS/2

3.7.4 NetWare

3.7.5 OS Summary

Chapter 4. Distributed DBMS Implementations

4.1 Implementation Trends and Features of Distributed Client/Server DBMS

4.1.1 RDBMS Architecture for Scalability

4.1.2 RDBMS Performance and Efficiency Features

4.1.3 Types of Parallelism

4.2 DBMS Connectivity

4.3 Advanced RDBMS Features

4.4 RDBMS Reliability and Availability

4.4.1 Robustness, Transactions Recovery, and Consistency

4.4.2 Fault Tolerance

4.5 RDBMS Administration

Chapter 5. Client/Server RDBMS Solutions

5.1 State-of-the-Market Overview

5.2 Oracle

5.2.1 System Management

5.2.2 Oracle Universal Server

5.2.3 Oracle ConText Option

5.2.4 Oracle Spatial Data Option

5.3 Informix

5.3.1 Features

5.3.2 Informix Universal Server

5.4 Sybase

5.4.1 SYBASE SQL Server

5.4.2 Performance Improvements in SYBASE System 11

5.5 IBM

5.5.1 Background

5.5.2 DB2 Universal Database

5.6 Microsoft

5.6.1 Background

5.6.2 MS SQL Server

5.6.3 Data Warehousing and Market Positioning

Part 2. Data Warehousing

Chapter 6. Data Warehousing Components

6.1 Overall Architecture

6.2 Data Warehouse Database

6.3 Sourcing, Acquisition, Cleanup, and Transformation Tools

6.4 Metadata

6.5 Access Tools

6.5.1 Query and Reporting Tools

6.5.2 Applications

6.5.3 OLAP

6.5.4 Data Mining

6.5.5 Data Visualization

6.6 Data Marts

6.7 Data Warehouse Administration and Management

6.8 Information Delivery System

Chapter 7. Building a Data Warehouse

7.1 Business Considerations: Return on Investment

7.1.1 Approach

7.1.2 Organizational Issues

7.2 Design Considerations

7.2.1 Data Content

7.2.2 Metadata

7.2.3 Data Distribution

7.2.4 Tools

7.2.5 Performance Considerations

7.2.6 Nine Decisions in the Design of a Data Warehouse

7.3 Technical Considerations

7.3.1 Hardware Platforms

7.3.2 Data Warehouse and DBMS Specialization

7.3.3 Communications Infrastructure

7.4 Implementation Considerations

7.4.1 Access Tools

7.4.2 Data Extraction, Cleanup, Transformation, and Migration

7.4.3 Data Placement Strategies

7.4.4 Metadata

7.4.5 User Sophistication Levels

7.5 Integrated Solutions

7.6 Benefits of Data Warehousing

7.6.1 Tangible Benefits

7.6.2 Intangible Benefits

Chapter 8. Mapping the Data Warehouse to a Multiprocessor Architecture

8.1 Relational Database Technology for Data Warehouse

8.1.1 Types of Parallelism

8.1.2 Data Partitioning

8.2 Database Architectures for Parallel Processing

8.2.1 Shared-Memory Architecture

8.2.2 Shared-Disk Architecture

8.2.3 Shared-Nothing Architecture

8.2.4 Combined Architecture

8.3 Parallel RDBMS Features

8.4 Alternative Technologies

8.5 Parallel DBMS Vendors

8.5.1 Oracle

8.5.2 Informix

8.5.3 IBM

8.5.4 Sybase

8.5.5 Microsoft

8.5.6 Other RDBMS Products

8.5.7 Specialized Database Products

Chapter 9. DBMS Schemas for Decision Support

9.1 Data Layout for Best Access

9.2 Multidimensional Data Model

9.3 Star Schema

9.3.1 DBA Viewpoint

9.3.2 Potential Performance Problems with Star Schemas

9.3.3 Solutions to Performance Problems

9.4 STARjoin and STARindex

9.5 Bitmapped Indexing

9.5.1 SYBASE IQ

9.5.2 Conclusion

9.6 Column Local Storage

9.7 Complex Data Types

Chapter 10. Data Extraction, Cleanup, and Transformation Tools

10.1 Tool Requirements

10.2 Vendor Approaches

10.3 Access to Legacy Data

10.4 Vendor Solutions

10.4.1 Prism Solutions

10.4.2 SAS Institute

10.4.3 Carleton Corporation's Passport and MetaCenter

10.4.4 Vality Corporation

10.4.5 Evolutionary Technologies

10.4.6 Information Builders

10.5 Transformation Engines

10.5.1 Informatica

10.5.2 Constellar

Chapter 11. Metadata

11.1 Metadata Defined

11.2 Metadata Interchange Initiative

11.3 Metadata Repository

11.4 Metadata Management

11.5 Implementation Examples

11.5.1 Platinum Repository

11.5.2 R&O: The ROCHADE Repository

11.5.3 Prism Solutions

11.5.4 Logic Works Universal Directory

11.6 Metadata Trends

Part 3. Business Analysis

Chapter 12. Reporting and Query Tools and Applications

12.1 Tool Categories

12.1.1 Reporting Tools

12.1.2 Managed Query Tools

12.1.3 Executive Information System Tools

12.1.4 OLAP Tools

12.1.5 Data Mining Tools

12.2 The Need for Applications

12.3 Cognos Impromptu

12.4 Applications

12.4.1 PowerBuilder

12.4.2 Forte

12.4.3 Information Builders

Chapter 13. On-Line Analytical Processing (OLAP)

13.1 Need for OLAP

13.2 Multidimensional Data Model

13.3 OLAP Guidelines

13.4 Multidimensional versus Multirelational OLAP

13.5 Categorization of OLAP Tools

13.5.1 MOLAP

13.5.2 ROLAP

13.5.3 Managed Query Environment (MQE)

13.6 State of the Market

13.6.1 Cognos PowerPlay

13.6.2 IBI FOCUS Fusion

13.6.3 Pilot Software

13.7 OLAP Tools and the Internet

13.8 Conclusion

Chapter 14. Patterns and Models

14.1 Definitions

14.1.1 What Is a Pattern? What Is a Model?

14.1.2 Visualizing a Pattern

14.2 A Note on Terminology

14.3 Where Are Models Used?

14.3.1 Problem 1: Selection

14.3.2 Problem 2: Acquisition

14.3.3 Problem 3: Retention

14.3.4 Problem 4: Extension

14.4 What Is the "Right" Model?

14.4.1 The Perfect Model

14.4.2 Missing Data

14.5 Sampling

14.5.1 The Necessity of Sampling

14.5.2 Random Sampling

14.6 Experimental Design

14.6.1 Avoiding Bias

14.6.2 More on Sampling

14.7 Computer-Intensive Statistics

14.7.1 Cross-validation

14.7.2 Jackknife and Bootstrap Resampling

14.8 Picking the Best Model

Chapter 15. Statistics

15.1 Data, Counting, and Probability

15.1.1 Histograms

15.1.2 Types of Categorical Predictors

15.1.3 Probability

15.1.4 Bayes' Theorem

15.1.5 Independence

15.1.6 Causality and Collinearity

15.1.7 Simplifying the Predictors

15.2 Hypothesis Testing

15.2.1 Hypothesis Testing on a Real-World Problem

15.2.2 Hypothesis Testing, P Values, and Alpha

15.2.3 Making Mistakes in Rejecting the Null Hypothesis

15.2.4 Degrees of Freedom

15.3 Contingency Tables, the Chi Square Test, and Noncausal Relationships

15.3.1 Contingency Tables

15.3.2 The Chi Square Test

15.3.3 Sometimes Strong Relationships Are Not Causal

15.4 Prediction

15.4.1 Linear Regression

15.4.2 Other Forms of Regression

15.5 Some Current Offerings of Statistics Tools

15.5.1 SAS Institute

15.5.2 SPSS

15.5.3 MathSoft

Chapter 16. Artificial Intelligence

16.1 Defining Artificial Intelligence

16.2 Expert Systems

16.3 Fuzzy Logic

16.4 The Rise and Fall of AI

Part 4. Data Mining

Chapter 17. Introduction to Data Mining

17.1 Data Mining Has Come of Age

17.2 The Motivation for Data Mining Is Tremendous

17.3 Learning from Your Past Mistakes

17.4 Data Mining? Don't Need It--I've Got Statistics

17.5 Measuring Data Mining Effectiveness: Accuracy, Speed, and Cost

17.6 Embedding Data Mining into Your Business Process

17.7 The More Things Change, the More They Remain the Same

17.8 Discovery versus Prediction

17.8.1 Gold in Them Thar Hills

17.8.2 Discovery--Finding Something You Weren't Looking For

17.8.3 Prediction

17.9 Overfitting

17.10 State of the Industry

17.10.1 Targeted Solutions

17.10.2 Business Tools

17.10.3 Business Analyst Tools

17.10.4 Research Analyst Tools

17.11 Comparing the Technologies

17.11.1 Business Score Card

17.11.2 Applications Score Card

17.11.3 Algorithmic Score Card

Chapter 18. Decision Trees

18.1 What Is a Decision Tree?

18.2 Business Score Card

18.3 Where to Use Decision Trees

18.3.1 Exploration

18.3.2 Data Preprocessing

18.3.3 Prediction

18.3.4 Applications Score Card

18.4 The General Idea

18.4.1 Growing the Tree

18.4.2 When Does the Tree Stop Growing?

18.4.3 Why Would a Decision Tree Algorithm Prevent the Tree From Growing If There Weren't Enough Data?

18.4.4 Decision Trees Aren't Necessarily Finished after They Are Fully Grown

18.4.5 Are the Splits at Each Level of the Tree Always Binary Yes/No Splits?

18.4.6 Picking the Best Predictors

18.4.7 Picking the Right Predictor Value for the Split

18.5 How the Decision Tree Works

18.5.1 Handling High-Cardinality Predictors in ID3

18.5.2 C4.5 Enhances ID3

18.5.3 CART Definition

18.5.4 Predictors Are Picked as They Decrease the Disorder of the Data

18.5.5 CART Splits Unordered Predictors by Imposing Order on Them

18.5.6 CART Automatically Validates the Tree

18.5.7 CART Surrogates Handle Missing Data

18.5.8 CHAID

18.6 Case Study: Predicting Wireless Telecommunications Churn with CART

18.7 Strengths and Weaknesses

18.7.1 Algorithm Score Card

18.7.2 State of the Industry

Chapter 19. Neural Networks

19.1 What Is a Neural Network?

19.1.1 Don't Neural Networks Learn to Make Better Predictions?

19.1.2 Are Neural Networks Easy to Use?

19.1.3 Business Score Card

19.2 Where to Use Neural Networks

19.2.1 Neural Networks for Clustering

19.2.2 Neural Networks for Feature Extraction

19.2.3 Applications Score Card

19.3 The General Idea

19.3.1 What Does a Neural Network Look Like?

19.3.2 How Does a Neural Network Make a Prediction?

19.3.3 How Is the Neural Network Model Created?

19.3.4 How Complex Can the Neural Network Model Become?

19.3.5 Hidden Nodes Are Like Trusted Advisors to the Output Nodes

19.3.6 Design Decisions in Architecting a Neural Network

19.3.7 Different Types of Neural Networks

19.3.8 Kohonen Feature Maps

19.3.9 How Does the Neural Network Resemble the Human Brain?

19.3.10 A Neural Network Learns to Speak

19.3.11 A Neural Network Learns to Drive

19.3.12 The Human Brain Is Still Much More Powerful

19.4 How the Neural Network Works

19.4.1 How Predictions Are Made

19.4.2 How Backpropagation Learning Works

19.4.3 Data Preparation

19.4.4 Combatting Overfitting

19.4.5 Applying and Training the Neural Network

19.4.6 Explaining the Network

19.5 Case Study: Predicting Currency Exchange Rates

19.5.1 The Problem

19.5.2 Implementation

19.5.3 The Results

19.6 Strengths and Weaknesses

19.6.1 Algorithm Score Card

19.6.2 Some Current Market Offerings

19.6.3 Radial-Basis-Function Networks

19.6.4 Genetic Algorithms and Neural Networks

19.6.5 Simulated Annealing and Neural Networks

Chapter 20. Nearest Neighbor and Clustering

20.1 Business Score Card

20.2 Where to Use Clustering and Nearest-Neighbor Prediction

20.2.1 Clustering for Clarity

20.2.2 Clustering for Outlier Analysis

20.2.3 Nearest Neighbor for Prediction

20.2.4 Applications Score Card

20.3 The General Idea

20.3.1 There Is No Best Way to Cluster

20.3.2 How Are Tradeoffs Made When Determining Which Records Fall into Which Cluster?

20.3.3 Clustering Is the Happy Medium between Homogeneous Clusters and the Lowest Number of Clusters

20.3.4 What Is the Difference between Clustering and Nearest-Neighbor Prediction?

20.3.5 What Is an n-Dimensional Space?

20.3.6 How Is the Space for Clustering and Nearest Neighbor Defined?

20.4 How Clustering and Nearest-Neighbor Prediction Work

20.4.1 Looking at an n-Dimensional Space

20.4.2 How Is "Nearness" Defined?

20.4.3 Weighting the Dimensions: Distance with a Purpose

20.4.4 Calculating Dimension Weights

20.4.5 Hierarchical and Nonhierarchical Clustering

20.4.6 Nearest-Neighbor Prediction

20.4.7 K Nearest Neighbors--Voting Is Better

20.4.8 Generalizing the Solution: Prototypes and Sentries

20.5 Case Study: Image Recognition for Human Handwriting

20.5.1 The Problem

20.5.2 Solution Using Nearest-Neighbor Techniques

20.6 Strengths and Weaknesses

20.6.1 Algorithm Score Card

20.6.2 Predicting Future Trends

Chapter 21. Genetic Algorithms

21.1 What Are Genetic Algorithms?

21.1.1 How Do They Relate to Evolution?

21.1.2 Genetic Algorithms, Artificial Life, and Simulated Evolution

21.1.3 How Can They Be Used in Business?

21.1.4 Business Score Card

21.2 Where to Use Genetic Algorithms

21.2.1 Genetic Algorithms for Optimization

21.2.2 Genetic Algorithms for Data Mining

21.2.3 Applications Score Card

21.3 The General Idea

21.3.1 Do Genetic Algorithms Guess the Right Answer?

21.3.2 Are Genetic Algorithms Fully Automated?

21.3.3 Cost Minimization: Traveling Salesperson

21.3.4 Cooperation Strategies: Prisoner's Dilemma

21.4 How the Genetic Algorithm Works

21.4.1 The Overall Process

21.4.2 Survival of the Fittest

21.4.3 Mutation

21.4.4 Sexual Reproduction and Crossover

21.4.5 Exploration versus Exploitation

21.4.6 The Schema Theorem

21.4.7 Epistasis

21.4.8 Classifier Systems

21.4.9 Remaining Challenges

21.4.10 Sharing: A Solution to Premature Convergence

21.4.11 Metalevel Evolution: The Automation of Parameter Choice

21.4.12 Parallel Implementation

21.5 Case Study: Optimizing Predictive Customer Segments

21.6 Strengths and Weaknesses

21.6.1 Algorithm Score Card

21.6.2 State of the Marketplace

21.6.3 Predicting Future Trends

Chapter 22. Rule Induction

22.1 Business Score Card

22.2 Where to Use Rule Induction

22.2.1 What Is a Rule?

22.2.2 What to Do with a Rule

22.2.3 Caveat: Rules Do Not Imply Causality

22.2.4 Types of Databases Used for Rule Induction

22.2.5 Discovery

22.2.6 Prediction

22.2.7 Applications Score Card

22.3 The General Idea

22.3.1 How to Evaluate the Rule

22.3.2 Conjunctions and Disjunctions

22.3.3 Defining "Interestingness"

22.3.4 Other Measures of Usefulness

22.3.5 Rules versus Decision Trees

22.4 How Rule Induction Works

22.4.1 Constructing Rules

22.4.2 A Brute-Force Algorithm for Generating Rules

22.4.3 Combining Evidence

22.5 Case Study: Classifying U.S. Census Returns

22.6 Strengths and Weaknesses

22.7 Current Offerings and Future Improvements

Chapter 23. Selecting and Using the Right Technique

23.1 Using the Right Technique

23.1.1 The Data Mining Process

23.1.2 What All the Data Mining Techniques Have in Common

23.1.3 Cases in Which Decision Trees Are Like Nearest Neighbors

23.1.4 Rule Induction Is Like Decision Trees

23.1.5 Could You Do Link Analysis with a Neural Network?

23.2 Data Mining in the Business Process

23.2.1 Avoiding Some Big Mistakes in Data Mining

23.2.2 Understanding the Data

23.3 The Case for Embedded Data Mining

23.3.1 The Cost of a Distributed Business Process

23.3.2 The Best Way to Measure a Data Mining Tool

23.3.3 The Case for Embedded Data Mining

23.4 How to Measure Accuracy, Explanation, and Integration

23.4.1 Measuring Accuracy

23.4.2 Measuring Explanation

23.4.3 Measuring Integration

23.5 What the Future Holds for Embedded Data Mining

Part 5. Data Visualization and Overall Perspective

Chapter 24. Data Visualization

24.1 Data Visualization Principles

24.2 Parallel Coordinates

24.3 Visualizing Neural Networks

24.4 Visualization of Trees

24.5 State of the Industry

24.5.1 Advanced Visual Systems

24.5.2 Alta Analytics

24.5.3 Business Objects

24.5.4 IBM

24.5.5 Pilot Software

24.5.6 Silicon Graphics

Chapter 25. Putting It All Together

25.1 Design for Scalability

25.2 Data Quality

25.3 Implementation Notes

25.3.1 Operational Data Stores

25.3.2 Data Marts

25.3.3 Star Schema

25.4 Making the Most of Your Warehouse

25.5 The Data Warehousing Market

25.6 Costs and Benefits

25.6.1 Big Data--Bigger Returns

25.6.2 Law of Diminishing Returns

25.7 A Unifying View of Business Information

25.8 What's Next

25.8.1 Distributed Warehouse Environments

25.8.2 Using the Internet or Intranet for Information Delivery

25.8.3 Object-Relational Databases

25.8.4 Very Large Databases (VLDBs)

25.9 Conclusion

Appendix A. Glossary

Appendix B. Big Data--Better Returns: Leveraging Your Hidden Data Assets to Improve ROI

Appendix C. Dr. E. F. Codd's 12 Guidelines for OLAP

Appendix D. 10 Mistakes for Data Warehousing Managers to Avoid

Bibliography

Index