Basic Troubleshooting

= Proxy Server =

Capture Traffic passing through a Transparent Proxy
The following flow filters are entered on the firewall (ScreenOS in this example) to capture the complete packet flow across it.



set ff src-ip 10.1.1.1 dst-ip 144.32.56.43
set ff src-ip 192.168.1.1 dst-ip 144.32.56.43
set ff src-ip 144.32.56.43 dst-ip 65.124.55.31
set ff src-ip 144.32.56.43 dst-ip 10.1.1.1
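
With the filters in place, the usual ScreenOS flow-debug sequence to take and read the capture is sketched below; verify the commands against your firmware release.

clear dbuf
debug flow basic

Reproduce the traffic of interest, then stop debugging and read the capture buffer:

undebug all
get dbuf stream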

Packet flow for HTTPS Traffic


= Tail Latency =

Source: highscalability.com, accelazh.github.io


 * Imagine a client making a request to a single web server.
 * Ninety-nine times out of a hundred that request will be returned within an acceptable period of time.
 * But one time out of a hundred it may not; say the disk is slow for some reason.
 * If you look at the distribution of latencies, most of them are small, but there's one out on the tail end that's large.
 * That's not so bad really.
 * All it means is one customer gets a slightly slower response every once in a while.


 * Let's change the example: instead of one server you now have 100 servers, and a request requires a response from all 100 of them.
 * That changes everything about your system's responsiveness.
 * Suddenly the majority of requests are slow: if each server misses the 1-second budget 1% of the time, the chance that at least one of the 100 misses it is 1 - 0.99^100, roughly 63%, so about 63% of requests will take longer than 1 second. That's bad (see the sketch below).
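
A minimal Python sketch of this fan-out effect; the 1% slow rate and 100-server fan-out come from the example above, everything else is illustrative:

import random

SERVERS = 100      # the request fans out to all 100 backends
P_SLOW = 0.01      # each backend exceeds the 1-second budget 1% of the time
TRIALS = 100_000   # simulated client requests

# Closed form: P(at least one backend is slow) = 1 - (1 - p)^n
analytic = 1 - (1 - P_SLOW) ** SERVERS
print(f"analytic probability of a slow request: {analytic:.3f}")   # ~0.634

# Simulation: a request is slow if any one of its backends is slow
slow = sum(
    any(random.random() < P_SLOW for _ in range(SERVERS))
    for _ in range(TRIALS)
)
print(f"simulated fraction of slow requests:    {slow / TRIALS:.3f}")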


 * Using the same components but scaling them out produces a genuinely unexpected outcome.
 * This is a fundamental property of scaling systems: you need to worry not just about average latency but about tail latency, that is, the slowest events in your system.
 * High performance equals high tolerances.
 * At scale you can’t ignore tail latency.

 * This latency could come from: RPC libraries, DNS lookups, slow disks, packet loss, microbursts, deep queues, high task response latency, locking, garbage collection, OS stack issues, router/switch overhead, transiting multiple hops, or slow processing code.
 * Other reasons: overprovisioned VMs, many OS images being forked from a small shared base, a large request pegging your CPU/network/disk and making other requests queue up, or something going wrong such as an infinite loop pinning a CPU.


 * The latency distribution has low, middle, and tail parts.
 * To reduce the low and middle parts: provisioning more resources, cutting tasks up and parallelizing them, eliminating "head-of-line" blocking, and caching will all help.
 * To reduce the tail latency: the basic idea is hedging, that is, sending a backup copy of a slow request to another replica and taking whichever answer arrives first (see the sketch after this list).
 * Even if we've parallelized the service, the slowest instance determines when our request is done.
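
A minimal sketch of a hedged request in Python, assuming two interchangeable replicas; the replica names, the query_replica stub, and the hedge delay are all made up for illustration:

import concurrent.futures
import random
import time

POOL = concurrent.futures.ThreadPoolExecutor(max_workers=4)
HEDGE_DELAY = 0.05   # seconds; illustrative, roughly the p95 of normal latency

def query_replica(name: str) -> str:
    # Stand-in for an RPC to one replica; 1% of calls are very slow.
    time.sleep(1.0 if random.random() < 0.01 else 0.01)
    return f"response from {name}"

def hedged_request() -> str:
    # Ask the primary replica first.
    primary = POOL.submit(query_replica, "replica-a")
    try:
        return primary.result(timeout=HEDGE_DELAY)      # fast path
    except concurrent.futures.TimeoutError:
        # Primary missed the deadline: hedge to a backup and take the
        # first response that arrives from either replica.
        backup = POOL.submit(query_replica, "replica-b")
        done, _ = concurrent.futures.wait(
            [primary, backup],
            return_when=concurrent.futures.FIRST_COMPLETED,
        )
        return next(iter(done)).result()

if __name__ == "__main__":
    print(hedged_request())

Issuing the hedge only after roughly the 95th-percentile latency keeps the extra load to a few percent while cutting most of the tail, since both replicas rarely straggle at the same time.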

