Humans By The Hundred presentation


Slides and text of this presentation
Slide 1
Slide description:
Humans By The Hundred
 Scaling Big Data for Big Team Growth

Slide 2
Slide description:
$ whoami
 SRE Manager at Yelp
 CWRU Alum
 Pittsburgh native
 <3 Web Operations
 Just a dude

Slide 3
Slide description:

Slide 4
Slide description:

Slide 5
Slide description:
What is Yelp?
 Many sites: www, m, biz, api
 Mobile apps
 Partner platform
 Hundreds of developers
 Thousands of servers

Slide 6
Slide description:
Why Am I Here?

Slide 7
Slide description:

Slide 8
Slide description:
DATA
 DATA

Slide 9
Slide description:
This talk is about people

Slide 10
Slide description:

Slide 11
Slide description:

Slide 12
Slide description:

Slide 13
Slide description:

Slide 14
Slide description:

Slide 15
Slide description:

Slide 16
Slide description:

Slide 17
Slide description:
The Goal

Slide 18
Slide description:
Iterate as fast as possible

Slide 19
Slide description:
Regardless of how many people are participating

Slide 20
Slide description:
Deployment

Slide 21
Slide description:
How It Starts

Slide 22
Slide description:
Deployment: the early days
 Get a few people together in slack/irc/etc.
 Merge up the code
 Run the tests
 Manually test it in stage
 Cross your fingers

Slide 23
Slide description:

Slide 24
Slide description:

Slide 25
Slide description:
Things get slower...
 Tests take longer to run
 More hosts = longer downloads
 More developers = more eyeballs
 More features = more code

Slide 26
Slide description:
The Problem: Humans Are Fallible

Slide 27
Slide description:
The Problem: Humans Are Fallible
 “…oh @$#&”

Slide 28
Slide description:

Slide 29
Slide description:
The Problem, With Math
 Assume:
 Every change has a chance of success: 98%
 That means no test failures, no reverts, etc.
 Every deploy has a number of changes: n
 Any failure in the pipeline invalidates the deploy
 Let’s figure out the probability of a successful deployment: p

Slide 30
Slide description:
The Problem, With Math
 Only you
 p = .98 (98%)
 You and a friend
 p = .98 * .98 = .96 (96%)
 You and nine co-workers
 p = .98 * .98 * .98 * … * .98 = .82 (82%)
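
The arithmetic on this slide can be reproduced with a short sketch. The function name `deploy_success_probability` and its default argument are illustrative and not from the talk. It assumes, as the slides do, that each change succeeds independently with probability .98 and that a single failure invalidates the deploy.

```python
def deploy_success_probability(changes: int, per_change_success: float = 0.98) -> float:
    """Probability that all `changes` independent changes in one deploy succeed."""
    if changes < 0:
        raise ValueError("changes must be non-negative")
    # Independent events multiply, so n changes give per_change_success ** n.
    return per_change_success ** changes


if __name__ == "__main__":
    # 1, 2 and 10 changes correspond to the three cases on the slide.
    for n in (1, 2, 10):
        print(f"n={n:>2}: p={deploy_success_probability(n):.2f}")
```

Rounded to two decimals, the three cases give .98, .96 and .82, matching the slide.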

Slide 31
Slide description:
The Problem, With Math
 p = (.98)^n

Slide 32
Slide description:
The Problem, With Math
 p = (.98)^n

Slide 33
Slide description:

Slide 34
Slide description:
This doesn’t scale!
 More developers = more changes
 More changes = longer deploys
 Longer deploys = less time to develop
 Less time to develop = slower to iterate
 Slower to iterate != the goal

Slide 35
Slide description:
Mitigating Exponential Decay
 p = (.98)^n

Slide 36
Slide description:
Mitigating Exponential Decay
 p = (.98)^n

Slide 37
Slide description:

Slide 38
Slide description:
Making it harder to screw up
 Write more tests
 Write better tests
 Get better code reviews
 Get better infrastructure
 Switch programming languages
 Use better tools

Slide 39
Slide description:
Just write better software and stop making mistakes!

Slide 40
Slide description:
PROBLEM SOLVED

Slide 41
Slide description:

Slide 42
Slide description:
The Real World
 Testing builds confidence in our changes
 Testing does not protect you from failure
 Better tools, tests, and infrastructure can raise our success rates

Slide 43
Slide description:
Mitigating Exponential Decay
 p = (.98)^n

Slide 44
Slide description:
Mitigating Exponential Decay
 p = (.98)^n

Slide 45
Slide description:
Service-Oriented Architecture
 Large monolith → smaller services
 Services communicate over network
  Usually HTTP, but you can do RPC, SOAP, etc.
 Service = independent code base
 Independent deployments

Slide 46
Slide description:
Service-Oriented Architecture
 Benefits
 Smaller code bases = upper bound to n
 Failure domains become isolated
 Technology independence
 Federated responsibility
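
To illustrate how a smaller code base puts an upper bound on n, the sketch below compares one deploy containing many changes with the same changes spread across separately deployed services. The figures (100 changes, 10 services, an even spread) are hypothetical and chosen only for illustration; the 98% per-change success rate is the one the talk assumes. Function names are likewise illustrative.

```python
def deploy_success_probability(changes: int, per_change_success: float = 0.98) -> float:
    """Probability that all `changes` independent changes in one deploy succeed."""
    return per_change_success ** changes


def changes_per_service(total_changes: int, services: int) -> int:
    """Largest number of changes any one service deploys, assuming an even spread."""
    if services <= 0:
        raise ValueError("services must be positive")
    # Ceiling division: the busiest service gets the rounded-up share.
    return -(-total_changes // services)


if __name__ == "__main__":
    total, services = 100, 10  # hypothetical workload
    monolith_p = deploy_success_probability(total)
    service_p = deploy_success_probability(changes_per_service(total, services))
    print(f"one deploy of {total} changes: p={monolith_p:.2f}")
    print(f"each of {services} service deploys: p={service_p:.2f}")
```

Under these assumptions each service deploy carries at most 10 changes, so its success probability stays near the "you and nine co-workers" case no matter how large the whole organization grows.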

Slide 47
Slide description:
Service-Oriented Architecture
 Drawbacks
 everything becomes decoupled
 function calls start looking like HTTP requests
 versioning can be a nightmare
 tracking dependencies is hard
 data consistency becomes challenging
 end-to-end testing becomes hard(er), if not impossible

Slide 48
Slide description:
SOA scales people, not code.

Slide 49
Slide description:
Conquering SOA
 With the monolith, it’s easy to focus on mean time between failures (MTBF)

Slide 50
Slide description:
Conquering SOA
 In a SOA, focus on mean time to recovery (MTTR)

Slide 51
Slide description:
Conquering SOA
 Fail fast
 Anticipate failure
 Leverage iteration speed to recover fast

Slide 52
Slide description:
Conquering SOA
 Treat everything as distributed
 That means everything will fail
 Use timeouts, retries
 Find ways to degrade gracefully
 Fail fast & isolated
 Don’t rely on synchronous processes
 Prepare for eventual consistency
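
The "timeouts, retries, degrade gracefully" advice can be sketched as a small wrapper around a remote call. This is a minimal illustration, not code from the talk: `call_with_retries`, its parameters, and the choice of exponential backoff are all assumptions, and `operation` stands in for any hypothetical client call that accepts a timeout in seconds and raises `TimeoutError` or `ConnectionError` on failure.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    operation: Callable[[float], T],
    fallback: T,
    attempts: int = 3,
    timeout_s: float = 0.5,
    backoff_s: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation(timeout_s) up to `attempts` times, then return `fallback`."""
    for attempt in range(attempts):
        try:
            # The operation is expected to honor the timeout so it fails fast.
            return operation(timeout_s)
        except (TimeoutError, ConnectionError):
            # Back off between attempts, but do not sleep after the last one.
            if attempt < attempts - 1:
                sleep(backoff_s * 2 ** attempt)
    # Degrade gracefully: serve a default instead of propagating the failure.
    return fallback
```

A caller might pass a cached or empty result as `fallback`, so that a failing dependency degrades one feature rather than the whole request.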

Slide 53
Slide description:
Reaping the Benefits
 Smaller failure domains
 Fewer people & changes to manage
 Deploys get smaller
 Deploys get faster
 Deploys become continuous

Slide 54
Slide description:
Reaping the Benefits
 Smaller changes
 means smaller code reviews
 means faster validation
 means smaller blast radius
 means faster iteration

Slide 55
Slide description:
Continuous Delivery
 Everyone works against master branch
 Master is deployed when commits are added
 Deployment gated by tests
 Monitoring knows something is wrong before you do!

Slide 56
Slide description:
PROBLEM SOLVED

Slide 57
Slide description:
Testing

Slide 58
Slide description:
Tests are hard to get right.

Slide 59
Slide description:

Slide 60
Slide description:

Slide 61
Slide description:

Slide 62
Slide description:

Slide 63
Slide description:

Slide 64
Slide description:

Slide 65
Slide description:
How can we do better?

Slide 66
Slide description:

Slide 67
Slide description:
“Not Recommended” Tests

Slide 68
Slide description:
“Not Recommended” Tests
 If a test fails on master:
 a feature is broken on the live website, or
 your test sucks and you should ditch it
 In either case, we disable it
 Ticket is created
 Developers can fix it later or just bin it and start fresh
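
The policy on this slide can be sketched as a small registry. This is an illustrative model only, not Yelp's tooling: the class `NotRecommendedRegistry`, its methods, and the ticket text are hypothetical names introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class NotRecommendedRegistry:
    """Tracks tests disabled after failing on master, plus their follow-up tickets."""

    not_recommended: set[str] = field(default_factory=set)
    tickets: list[str] = field(default_factory=list)

    def record_master_failure(self, test_name: str) -> None:
        """Disable a test that failed on master and open one ticket for it."""
        if test_name in self.not_recommended:
            return  # already disabled; avoid duplicate tickets
        self.not_recommended.add(test_name)
        self.tickets.append(f"Fix or delete test: {test_name}")

    def gating_tests(self, all_tests: list[str]) -> list[str]:
        """Return the tests that should still gate deployment."""
        return [t for t in all_tests if t not in self.not_recommended]
```

For example, after `record_master_failure("test_checkout")`, a later run that asks for `gating_tests(["test_login", "test_checkout"])` would only gate on `test_login` until someone fixes or bins the disabled test.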

Slide 69
Slide description:
Reliable tests >> test coverage.

Slide 70
Slide description:
Don’t always run all the tests!

Slide 71
Slide description:
Tests of external services should be monitoring

Slide 72
Slide description:
Define your boundaries.

Slide 73
Slide description:

Slide 74
Slide description:

Slide 75
Slide description:

Slide 76
Slide description:
Questions?

