SRE in Practice (Photo-Reprint Edition, English)

Author: Nat Welch
Publisher: Southeast University Press

Synopsis

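SRE in Practice (Photo-Reprint Edition, English) is a survival guide for software developers facing catastrophic website failures. As companies strive to maximize uptime, Site Reliability Engineering (SRE) is on the front line. When your site goes down and a fix is urgent, this book serves as a hands-on, step-by-step guide. Nat Welch's extensive practical experience in reliability engineering comes from large internet companies that are extremely sensitive to outages. His methods for monitoring modern web services, setting up alerts, and evaluating incident response have all been tested in practice, and learning them will serve you well. The book does more than teach you how to respond to disasters: it also covers the tools and strategies needed to test and release software safely, plan for long-term growth, and anticipate future bottlenecks. With it, you will learn how to build your own robust action plan so that you can demonstrate your value during a company-wide website crisis.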

Table of Contents

Preface

Chapter 1: Introduction

A brief history

What is SRE?

What is in the book?

SRE as a framework for new projects

Summary

References

Chapter 2: Monitoring

Why monitoring?

Instrumenting an application

What should we measure?

A short introduction to SLIs, SLOs, and error budgets

Service levels

Error budgets

Collecting and saving monitoring data

Polling applications

Nagios

Prometheus

Cacti

Sensu

Push applications

StatsD

Telegraf

ELK

Displaying monitoring information

Arbitrary queries

Graphs

Dashboards

Chatbots

Managing and maintaining monitoring data

Communicating about monitoring

Do they even know there is monitoring?

References and related reading

Future reading

Summary

Chapter 3: Incident Response

What is an incident?

What is incident response?

Alerting

When do you alert?

How do you alert?

Alerting services

What is in an alert?

Who do you alert?

Being on call

Communication

Incident Command System (ICS)

Where do you communicate?

Recovering the system

Calling all clear

Summary

Chapter 4: Postmortems

What is a postmortem?

Why write a postmortem?

When to write a postmortem document

Carrying out incident analysis

How to write a postmortem document

Summary

Impact

Timeline

Root cause

Action items

Postmortems without action items

Appendix

Blameless postmortems

Holding a postmortem meeting

Analyzing past postmortems

MTTR and MTBF

Alert fatigue

Discussing past outages

Summary

References

Chapter 5: Testing and Releasing

Testing

What do you test?

Testing code

Testing infrastructure

Testing processes

Releasing

When to release

Releasing to production

Validating your release

Rollbacks

Automation

Continuous everything

Summary

Chapter 6: Capacity Planning

A quick introduction to business finance

Why plan?

Managing risk and managing expectations

Defining a plan

What is our current capacity?

When are we going to run out of capacity?

How should we change our capacity?

State and concurrency

Is your service limited by another service?

Scaling for events

Unpredictable growth: user-generated content

Preplanned versus autoscaling

Delivering

Execute the plan

Architecture: where performance changes come from

Tech as a profit center and procurement

Summary

Chapter 7: Building Tools

Finding projects

Defining projects

RDD

Example

Design documents

Planning projects

Example

Retrospectives and standups

Allocation

Building projects

Advice for writing code

Separation of concerns

Long-term work

Example OKRs

Notebooks

Documenting and maintaining projects

Summary

Chapter 8: User Experience

An introduction to design and UX

Real-world interaction design

User testing

Picking an experience

Designing the test

Finding people to test

Developer experience

Experience of tools

Performance budgets

Security

Authentication

Authorization

Risk profile

Phishing

ACM code of ethics

Summary

References

Chapter 9: Networking Foundations

The internet

Sending an HTTP request

DNS

dig

Ethernet and TCP/IP

Ethernet

IP

CIDR notation

ICMP

UDP

TCP

HTTP

curl and wget

Tools for watching the network

netstat

nc

tcpdump

Summary

Chapter 10: Linux and Cloud Foundations

Linux fundamentals

Everything is a file

Files, directories, and inodes

Sockets

Devices

/proc

Filesystem layout

What is a process?

Zombies

Orphans

What is nice?

syscalls

How to trace

Watching processes

Build your own

Cloud fundamentals

VMs

Containers

Load balancing

Autoscaling

Storage

Queues and Pub/Sub

Units of scale

Example architecture interview

Summary

References

Other Books You May Enjoy

Index