Basic Troubleshooting
Proxy Server
Capture Traffic passing through a Transparent Proxy
The following filters, set on a firewall (ScreenOS here), capture the complete packet flow across the firewall.
 set ff src-ip 10.1.1.1 dst-ip 144.32.56.43
 set ff src-ip 192.168.1.1 dst-ip 144.32.56.43
 set ff src-ip 144.32.56.43 dst-ip 65.124.55.31
 set ff src-ip 144.32.56.43 dst-ip 10.1.1.1
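One plausible reading of these filters (an assumption, since the addressing plan isn't stated): 10.1.1.1 and 192.168.1.1 are internal clients, 144.32.56.43 is the proxy, and 65.124.55.31 is the origin server, so the four filters cover the client-to-proxy leg, the proxy's fetch from the origin, and the return leg to the client. On ScreenOS, flow filters like these are typically paired with debug flow basic, with the buffered output read back via get dbuf stream and debugging stopped with undebug all.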
Proxy Server Flow[1]
[Image: Proxy_server_flow_non_transparent.png]
Packet flow for HTTP Traffic
[Image: Proxy_server_flow_non_transparent_http.png]
Packet flow for HTTPS Traffic
[Image: Proxy_server_flow_non_transparent_https.png]
Tail Latency
Sources: highscalability.com, accelazh.github.io
- Imagine a client making a request of a single web server.
- Ninety-nine times out of a hundred that request will be returned within an acceptable period of time.
- But one time out of a hundred it may not; say the disk is slow for some reason.
- If you look at the distribution of latencies, most of them are small, but there's one out on the tail end that's large.
- That's not so bad really.
- All it means is one customer gets a slightly slower response every once in a while.
- Let's change the example: now, instead of one server, you have 100 servers, and a request requires a response from all 100 of them.
- That changes everything about your system's responsiveness.
- Suddenly the majority of queries are slow: since 0.99^100 ≈ 0.37, about 63% of requests will take longer than one second (a quick check is sketched after this list). That's bad.
- Using the same components and scaling them results in a really unexpected outcome.
- This is a fundamental property of scaling systems: you need to worry not just about typical latency but about tail latency, that is, the slowest events in your system.
- High performance equals high tolerances.
- At scale you can’t ignore tail latency.
- This latency could come from:
RPC library
DNS lookups
Slow disk
Packet loss
Microbursts
Deep queues
High task response latency
Locking
Garbage collection
OS stack issues
Router/switch overhead
Transiting multiple hops
Slow processing code
Other reasons:
Overprovisioned VMs
Many OS images forked from a small shared base
A large request pegging your CPU/network/disk, making other requests queue up
A runaway loop ("dead loop") pinning your CPU
- The latency distribution has low, middle, and tail parts.
- To reduce the low and middle parts: provision more resources, cut tasks up and parallelize them, eliminate "head-of-line" blocking, and add caching.
- To reduce the tail: the basic idea is hedging, i.e. sending a backup copy of a slow request to another replica and taking whichever answers first (a sketch follows this list).
- Even if we've parallelized the service, the slowest instance determines when our request is done.
- Code freezes (interrupts, context switches, cache buffers flushing to disk, garbage collection, database reindexing) are a common source of such stragglers.
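The 63% figure above can be checked with a few lines of arithmetic. A minimal sketch, assuming (as in the example) that each server is independently slow 1 time in 100:

 # Probability that a fan-out request hits at least one slow server,
 # assuming each server is independently slow 1% of the time.
 p_slow = 0.01
 
 for fanout in (1, 10, 100):
     p_any_slow = 1 - (1 - p_slow) ** fanout
     print(f"fan-out {fanout:>3}: P(at least one slow reply) = {p_any_slow:.1%}")
 
 # fan-out   1: 1.0%
 # fan-out  10: 9.6%
 # fan-out 100: 63.4%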
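To make the hedging idea concrete, here is a minimal sketch in Python/asyncio. The 50 ms hedge delay, the simulated replica latencies, and the function names are illustrative assumptions, not a recommended policy:

 import asyncio
 import random
 import time
 
 HEDGE_DELAY = 0.05  # assumed budget: hedge if no answer within 50 ms
 
 async def call_replica(replica_id: int) -> str:
     """Simulate one replica: fast 99% of the time, stuck in the tail 1%."""
     latency = 0.01 if random.random() < 0.99 else 1.0
     await asyncio.sleep(latency)
     return f"reply from replica {replica_id}"
 
 async def hedged_request() -> str:
     # Fire the primary request and give it the full hedge delay to answer.
     primary = asyncio.create_task(call_replica(1))
     done, _ = await asyncio.wait({primary}, timeout=HEDGE_DELAY)
     if done:
         return primary.result()
     # Primary blew its budget: send a backup and take whichever finishes first.
     backup = asyncio.create_task(call_replica(2))
     done, pending = await asyncio.wait({primary, backup}, return_when=asyncio.FIRST_COMPLETED)
     for task in pending:
         task.cancel()  # drop the loser so no work is wasted
     return done.pop().result()
 
 async def main() -> None:
     start = time.perf_counter()
     print(await hedged_request(), f"({time.perf_counter() - start:.3f}s)")
 
 asyncio.run(main())

The design point: the backup request is only issued once the primary has used up its latency budget, so the extra load is confined to the small fraction of requests that actually land in the tail.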
References
1. ↑ www.india.fidelity.com