Basic Troubleshooting

*Imagine a client making a request of a single web server.
*Ninety-nine times out of a hundred that request will be returned within an acceptable period of time.
*But one time out of a hundred it may not.
*Say the disk is slow for some reason.
*If you look at the distribution of latencies, most of them are small, but there's one out on the tail end that's large.
*That's not so bad, really.
*Let's change the example: instead of one server, you now have 100 servers, and a request requires a response from all 100 of them.
*That changes everything about your system's responsiveness.
*Suddenly the majority of requests are slow: 63% of them will take longer than 1 second.
*That's bad.
 
*Scaling out with exactly the same components produces a very unexpected outcome.
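The 63% figure follows from a short calculation. Here is a minimal sketch of it, assuming that each server is slow on 1% of requests and that servers are slow independently of one another (the independence assumption is ours, not the example's). A fan-out request is fast only if every server it waits on is fast, so the chance that it is slow is 1 − 0.99^N. The function name `prob_request_is_slow` is hypothetical.

```python
def prob_request_is_slow(p_slow: float, fanout: int) -> float:
    """Probability that a request waiting on `fanout` servers is slow.

    Assumes (hypothetically) that each server is slow with probability
    `p_slow`, independently of the others.
    """
    # The request is fast only if every one of the servers is fast.
    p_all_fast = (1.0 - p_slow) ** fanout
    return 1.0 - p_all_fast


if __name__ == "__main__":
    # Same per-server behaviour (1% slow), increasing fan-out.
    for n in (1, 10, 100):
        print(f"{n:>3} servers: {prob_request_is_slow(0.01, n):.1%} of requests are slow")
```

With one server, 1% of requests are slow. With 100 servers, about 63% are, even though no individual server got any worse.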
Transiting multiple hops
Slow processing code