Itanium Architecture: Understanding 64-Bit Processors and EPIC Principles (English Edition)
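Author: Gregory Trimper
Publisher: Tsinghua University Press
Series: Renowned Foreign Textbooks for University Computer Education
Copyright notice: This book is in the public domain or licensed by the rights holder; please support legitimate editions
Tags: Microprocessors/CPU
ISBN: unknown
Publication date: not available
Binding: not available
Trim size: unknown
Pages: 0
Word count: not available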

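About the Author

No author biography is currently available for "Itanium Architecture: Understanding 64-Bit Processors and EPIC Principles (English Edition)".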

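Description

This book gives a comprehensive introduction to the new 64-bit Itanium architecture and its breakthrough performance. Using standard command-line tools and numerous examples, the author introduces Itanium assembly language step by step, covering the instruction set, addressing, register stack engine, predication, I/O, procedure calls, floating-point operations, and other aspects of Itanium's Explicitly Parallel Instruction Computing (EPIC). The design of the Itanium architecture is explained in detail against the background of developments in modern computer architecture. Each chapter includes detailed figures, discussion, and programming exercises, along with an extensive list of references and online resources. The book can serve as a textbook for instructors and students in computer science and related fields who are studying computer architecture or assembly language, and it is also a good reference for researchers who want a systematic understanding of the Itanium architecture.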

Table of Contents

List of Figures

List of Tables

Preface

Acknowledgments

Trademarks

Chapter 1 Architecture and Implementation

1.1 Analogy: Piano Architecture

1.2 Types of Computer Languages

1.3 Why Study Assembly Language?

1.4 Prefixes for Binary Multiples

1.5 Instruction Set Architectures

1.6 The Life Cycle of Computer Architectures

1.6.1 The 32-Bit Intel Architecture and Its Predecessors

1.6.2 The Alpha™ Architecture and Its Predecessors

1.6.3 The Itanium Architecture and Its Predecessors

1.6.4 The Naming of Architectures and Implementations

1.7 SQUARES: A First Programming Example

1.7.1 C, FORTRAN, and COBOL

1.7.2 Assembly Language for Itanium Architecture

1.8 Review of Number Systems

1.8.1 Positional Coefficients and Weights

1.8.2 Binary and Hexadecimal Representations

1.8.3 Signed Integers

Summary

References

Exercises

Chapter 2 Computer Structures and Data Representations

2.1 Computer Structures

2.1.1 The Central Processing Unit

2.1.2 The Memory

2.1.3 The Input/Output System

2.2 Instruction Execution

2.3 Classes of Instruction Set Architectures

2.4 Migration to 64-Bit Architectures

2.5 Itanium Information Units and Data Types

2.5.1 Integers

2.5.2 Floating-Point Numbers

2.5.3 Alphanumeric Characters

Summary

References

Exercises

Chapter 3 The Program Assembler and Debugger

3.1 Programming Environments

3.2 Program Development Steps

3.3 Comparing Variants of a Source File

3.4 Assembler Statement Types

3.4.1 Statement Format

3.4.2 Symbolic Addresses

3.4.3 Classes of Assembly Language Operators

3.5 The Functions of a Symbolic Assembler

3.5.1 Constants

3.5.2 Symbols or Identifiers

3.5.3 Storage Allocation

3.5.4 The Location Counter

3.5.5 Expressions

3.5.6 Control Statements

3.5.7 Elements of a Listing File

3.6 The Assembly Process

3.7 The Linking Process

3.8 The Program Debugger

3.8.1 Capabilities of Debugger Programs

3.8.2 Running SQUARES using gdb (Linux® and HP-UX®)

3.8.3 Running SQUARES using adb (HP-UX)

3.8.4 Examples of Debugger Commands

3.9 Conventions for Writing Programs

Summary

References

Exercises

Chapter 4 Itanium Instruction Formats and Addressing

4.1 Overview of Itanium Instruction Formats

4.1.1 Instruction Bundles

4.1.2 Instruction Bit-Field Layouts

4.1.3 Classes of Itanium Instructions

4.2 Integer Arithmetic Instructions

4.2.1 Addition and Subtraction

4.2.2 Arithmetic Overflow

4.2.3 Shift Left and Add Instruction

4.2.4 Special-Case Arithmetic Operations

4.2.5 Multiplication of 16-Bit Signed Integers

4.2.6 Full-Width Multiplication and Division

4.3 Bit Encoding for Itanium Instructions

4.4 HEXNUM: Using Arithmetic Instructions

4.5 Data Access Instructions

4.5.1 Itanium Cache Structures

4.5.2 Integer Store Instructions

4.5.3 Integer Load Instructions

4.5.4 Move Long Immediate Instruction

4.5.5 Accessing Simple Record Structures

4.5.6 Access to Specialized CPU Registers

4.6 Other ALU Instructions

4.6.1 Sign-Extend Instruction

4.6.2 Zero-Extend Instruction

4.6.3 Instructions for Quantities Less Than 64 Bits in Width

4.7 DOTPROD: Using Data Access Instructions

4.8 Itanium Addressing Modes

4.8.1 Immediate Addressing

4.8.2 Register Direct Addressing

4.8.3 Register Indirect Addressing

4.8.4 Autoincrement Addressing

4.8.5 Summary of Itanium Addressing Modes

4.8.6 Addressing Details in Previous Programs

4.9 Addressing in Other Architectures

4.9.1 Modes Built on Register Indirect Addressing

4.9.2 Modes Built on Displacement Addressing

4.9.3 Comparison of Modes Across Architectures

Summary

References

Exercises

Chapter 5 Comparison, Branches, and Predication

5.1 Hardware Basis for Control of Flow

5.1.1 Condition Codes

5.1.2 State-Management Approaches

5.1.3 Predicate Registers

5.2 Integer Compare Instructions

5.2.1 Signed Comparison and Equality

5.2.2 Unsigned Comparison

5.3 Program Branching

5.3.1 Ordinary Branch Instructions

5.3.2 Timing Considerations for Branches

5.3.3 If...Then...Else Structures

5.3.4 Loop Structures

5.3.5 Branch Addressing Range

5.3.6 Locality and Program Performance

5.4 DOTLOOP: Using a Counted Loop

5.5 Stops, Instruction Groups, and Performance

5.5.1 Study of Stops and Groups in DOTLOOP

5.5.2 Simplified Rules for Data Dependency

5.5.3 How Itanium Assemblers Handle Stops

5.5.4 Local Labels for Loops

5.5.5 Loops, Branches, and Overall Performance

5.6 DOTCLOOP: Using the Loop Count Register

5.7 Other Structured Programming Constructs

5.7.1 Unconditional Compare Instructions

5.7.2 Nested If...Then...Else Structures

5.7.3 Multiway Branching

5.7.4 Simple Case Structures

5.8 MAXIMUM: Using Conditional Instructions

Summary

References

Exercises

Chapter 6 Logical Operations, Bit-Shifts, and Bytes

6.1 Logical Functions

6.1.1 Boolean Functions of Two Variables

6.1.2 Logical Instructions

6.1.3 Applications of Logical Functions

6.1.4 The Single-Bit Test Instruction

6.1.5 Parallel (Logical) Conditions

6.1.6 The Logical Basis of Addition

6.2 HEXNUM2: Using Logical Masks

6.3 Bit and Field Operations

6.3.1 Shift Instructions

6.3.2 Applications of Shift Operations

6.3.3 The Shift Right Pair Instruction

6.3.4 Extract and Deposit Instructions

6.4 SCANTEXT: Processing Bytes

6.5 Integer Multiplication and Division

6.5.1 Booth's Algorithm for Multiplication

6.5.2 Unsigned Multiplication

6.5.3 Division Using Known Reciprocals

6.6 DECNUM: Converting an Integer to Decimal Format

6.7 Using C for ASCII Input and Output

6.7.1 GETPUT: Encapsulating C Functions

6.7.2 IO_C: A Simple Test Program

6.7.3 Additional Concepts

6.8 BACKWARD: Using Byte Manipulations

Summary

References

Exercises

Chapter 7 Subroutines, Procedures, and Functions

7.1 Memory Stacks

7.1.1 Stack Addressing for CISC Architectures

7.1.2 Stack Addressing for Load/Store Architectures

7.1.3 Stack Addressing for Itanium Architecture

7.1.4 User-Defined Stacks

7.2 DECNUM2: Using Stack Operations

7.3 Register Stacks

7.3.1 SPARC Register Windows

7.3.2 Itanium Register Stack

7.3.3 The alloc Instruction

7.3.4 The Register Stack Engine (RSE)

7.3.5 Banked Registers

7.4 Program Segmentation

7.4.1 Source-Level Modularity

7.4.2 Traditional Subroutines

7.4.3 Coroutines

7.4.4 Procedures and Functions

7.4.5 Shared Library Functions

7.5 Calling Conventions

7.5.1 Register Contention and Conventions

7.5.2 Call and Return Branch Instructions

7.5.3 Argument Passing: Locations

7.5.4 Argument Passing: Methods

7.5.5 Prologues and Epilogues

7.5.6 The regstk Directive

7.6 DECNUM3 and BOOTH: Making a Function

7.6.1 Defining the Interface

7.6.2 BOOTH: The Callable Function

7.6.3 DECNUM3: The Test Program

7.6.4 Position-Independent Code

7.7 Integer Quotients and Remainders

7.7.1 Routines Used by a High-Level Language

7.7.2 Open-Source Routines from Intel Corporation

7.8 RANDOM: A Callable Function

7.8.1 Choosing an Algorithm

7.8.2 RANDOM: Developing the Function

7.8.3 High-Level Language Calling Programs

Summary

References

Exercises

Chapter 8 Floating-Point Operations

8.1 Parallels Between Integer and Floating-Point Instructions

8.2 Representations of Floating-Point Values

8.2.1 IEEE Special Values

8.2.2 Values in Itanium Floating-Point Registers

8.3 Copying Floating-Point Data

8.3.1 Floating-Point Store Instructions

8.3.2 Floating-Point Load Instructions

8.3.3 Floating-Point Load Pair Instruction

8.3.4 Floating-Point Pseudoinstructions for Register-Register Copying

8.3.5 Floating-Point Merge Instruction

8.4 Floating-Point Arithmetic Instructions

8.4.1 Addition, Subtraction, and Multiplication

8.4.2 Fused Multiply-Add and Multiply-Subtract Instructions

8.4.3 Normalization as Another Special Case

8.4.4 Maximum and Minimum Operations

8.4.5 Rounding, Exceptions, and Floating-Point Control

8.5 HORNER: Evaluating a Polynomial

8.6 Predication Based on Floating-Point Values

8.6.1 Floating-Point Compare Instruction

8.6.2 Floating-Point Class Instruction

8.7 Integer Operations in Floating-Point Execution Units

8.7.1 Data Conversion Instructions

8.7.2 Integer Multiplication Instructions

8.7.3 Multiplication Strategies

8.7.4 Floating-Point Logical Instructions

8.8 Approximations for Reciprocals and Square Roots

8.8.1 Floating-Point Reciprocal Approximation

8.8.2 Reciprocal Square Root Approximation

8.8.3 Floating-Point Division

8.8.4 Open-Source Routines from Intel Corporation

8.9 APPROXPI: Using Floating-Point Instructions

Summary

References

Exercises

Chapter 9 Input and Output of Text

9.1 File Systems

9.1.1 Unix® I/O Software

9.1.2 Linux® I/O Software

9.2 Keyboard and Display I/O

9.2.1 Unformatted Line I/O

9.2.2 Formatted I/O

9.3 SCANTERM: Using C Standard I/O

9.4 SORTSTR: Sorting Strings

9.5 Text File I/O

9.5.1 Directory-Level Access

9.5.2 Unformatted Line I/O

9.5.3 Formatted I/O

9.6 SCANFILE: Input and Output with Files

9.7 SORTINT: Sorting Integers from a File

9.8 Binary Files

Summary

References

Exercises

Chapter 10 Performance Considerations

10.1 Processor-Level Parallelism

10.1.1 Simplified Instruction Pipeline

10.1.2 Superscalar Pipelining

10.1.3 Itanium 2 Processor Pipelines

10.1.4 Pipeline Hazards

10.2 Instruction-Level Parallelism

10.2.1 RISC Approaches

10.2.2 The VLIW Idea

10.2.3 EPIC as a Way Forward

10.3 Explicit Parallelism in the Itanium Processors

10.3.1 Instruction Templates

10.3.2 Data Dependency and Speculation

10.3.3 Control Dependency and Speculation

10.3.4 Combined Control and Data Speculation

10.4 Software-Pipelined Loops

10.4.1 Traditional Loop Unrolling

10.4.2 Software Pipelining

10.4.3 Rotating Registers

10.4.4 Loop Phases

10.4.5 Branch Instructions for Software Pipelines

10.5 Modulo Scheduling a Loop

10.5.1 DOTCTOP: Implementation-Independent Schedule

10.5.2 DOTCTOP2: Itanium 2 Processor Schedule

10.5.3 Further Considerations

10.6 Program Optimization Factors

10.6.1 Instruction Size

10.6.2 Addressing Mode

10.6.3 Instruction Power

10.6.4 Program Size

10.6.5 Prefetching Lines into Cache

10.6.6 Use of Inline Functions

10.6.7 Instruction Reordering

10.6.8 Recursion and Related Factors

10.7 Fibonacci Numbers

10.7.1 FIB1: Function Using Recursion

10.7.2 FIB2: Function Without Recursion

10.7.3 FIB3: Function Using the Register Stack

10.7.4 TESTFIB: Showing the Cost of Recursion

Summary

References

Exercises

Chapter 11 Looking at Output from Compilers

11.1 Compilers for RISC-like Systems

11.1.1 Optimization Levels for Open-Source Compilers

11.1.2 Optimization Levels for Intel Compilers

11.1.3 Optimization Levels for HP-UX Compilers

11.1.4 Additional Optimization Possibilities

11.2 Compiling a Simple Program

11.2.1 Comparing Output from gcc and ecc (Linux)

11.2.2 Comparing Output from gcc and g77 (Linux)

11.2.3 Comparing Output from cc_bundled and f90 (HP-UX)

11.3 Optimizing a Simple Program

11.3.1 Comparing Levels -O1 and -O2 for g77 (Linux)

11.3.2 Compiler Messages

11.3.3 Loop Length and Optimization with f90 (HP-UX)

11.4 Inline Optimizations

11.5 Profile-Guided or Other Optimizations

11.6 Debugging Optimized Programs

11.7 Recursion for Fibonacci Numbers Revisited

Summary

References

Exercises

Chapter 12 Parallel Operations

12.1 Classification of Computing Systems

12.2 Integer Parallel Operations

12.3 Applications to Integer Multiplication

12.3.1 32x32-Bit Sources Giving 32-Bit Unsigned Product

12.3.2 32x32-Bit Sources Giving 64-Bit Unsigned Product

12.4 Opportunities and Challenges

12.5 Floating-Point Parallel Operations

12.6 Semaphore Support for Parallel Processes

12.6.1 Previous Architectures

12.6.2 Itanium Architecture

Summary

References

Exercises

Chapter 13 Variations Among Implementations

13.1 Why Implementations Change

13.1.1 Demands and Opportunities

13.1.2 Implications of Moore's Law

13.1.3 Anticipating a Long Lifetime for an Architecture

13.2 How Implementations Change

13.3 The Original Itanium Processor

13.3.1 Comparison to the Itanium 2 Processor

13.3.2 Cache Hierarchy

13.3.3 Execution Units and Issue Ports

13.3.4 Pipelines

13.3.5 Latency Factors

13.3.6 Branch Prediction

13.3.7 Other Differences and Features

13.4 A Major Role for Software

13.4.1 New Architectures

13.4.2 New Implementations

13.4.3 New Instructions or More Registers

13.5 IA-32 Instruction Set Mode

13.6 Determining Extensions and Implementation Version

Summary

References

Exercises

Appendix A Command-Line Environments

References

Exercises

Appendix B Suggested System Resources

B.1 System Hardware

B.1.1 Itanium Workstation or Server

B.1.2 Ski Simulator on an IA-32 Linux System

B.1.3 Ski Simulator on a Linux Virtual Machine

B.1.4 Other Simulators

B.2 System Software

B.2.1 Linux

B.2.2 HP-UX

B.2.3 The Ski Simulator

B.2.4 64-Bit Windows

B.2.5 FreeBSD

B.2.6 OpenVMS

B.3 Desktop Client Access Software

B.3.1 Linux Personal Computers

B.3.2 Macintosh Personal Computers

B.3.3 Windows Personal Computers

References

Appendix C Itanium Instruction Set

C.1 Instructions Listed by Function

C.2 Instructions Listed by Assembler Opcode

References

Appendix D Itanium Registers and Their Uses

D.1 Instruction Pointer

D.2 General Registers and NaT Bits

D.3 Predicate Registers

D.4 Branch Registers

D.5 Floating-Point Registers

D.6 Application Registers

D.7 State Management Registers

D.8 System Information Registers

D.9 System Control Registers

References

Appendix E Conditional Assembly and Macros (GCC Assembler)

E.1 Interference from Explicit Stops

E.2 Repeat Blocks

E.2.1 Simple Repeat Blocks

E.2.2 Indefinite Repeat Blocks Using the .irp Directive

E.2.3 Indefinite Repeat Blocks Using the .irpc Directive

E.3 Conditional Assembly

E.4 Macro Processing

E.4.1 Defining a Macro

E.4.2 Invoking a Macro

E.4.3 Processing of Positional Parameters

E.4.4 Processing of Default Values and Keyword Parameters

E.4.5 Processing of String Parameters

E.5 Using Labels with Macros

E.6 Recursive Macros

E.7 Object File Sections

E.8 MONEY: A Macro Illustrating Sections

Summary

References

Exercises

Appendix F Inline Assembly

F.1 HP-UX C Compilers

F.2 GCC Compiler for Linux

F.3 Intel Compilers for Linux

References

Bibliography

Answers and Hints for Selected Exercises

Chapter 1

Chapter 2

Chapter 3

Chapter 4

Chapter 5

Chapter 6

Chapter 7

Chapter 8

Chapter 9

Chapter 10

Chapter 11

Chapter 12

Chapter 13

About the Authors

Index