TCP/IP

From Network Security Wiki


Packet Headers

The headers of IP Protocol suite are as follows:

IP Packet

  • IP Datagram header = 20-60 bytes.
  • Header = 20 bytes if no options.
  • Up to 60 bytes if it contains options.
  • Data length = total length - header length = [65535 - (20~60)] bytes.
  • Total length field = 16 bits.
  • Therefore max packet length = 2^16 - 1 = 65535 bytes.
  • The Fragment Offset field is only 13 bits wide, 3 bits fewer than the 16-bit Total Length field, so it cannot address individual bytes. It counts 8-byte (2^3) blocks instead. Thus: actual byte offset of a fragment = 8 × Fragment Offset.


IPv4 Header Format
Version | HLEN | DSCP | ECN | Total Length
Identification | Flags (DF, MF) | Fragment Offset
Time To Live | Protocol | Header Checksum
Source IP Address
Destination IP Address
Options (if HLEN > 5)



Identification Field:

  • IPID of a packet remains same even after fragmentation.
  • It is used during reassembly of fragmented datagrams.
  • If a packet passes through the same device multiple times, it can be traced by its IPID.

TCP Packet

  • TCP packet is called Segment.
  • TCP header length = 20-60 bytes.
  • Header = 20 bytes if no options.
  • Up to 60 bytes if it contains options.


TCP Header
Source port | Destination port
Sequence number
Acknowledgment number (if ACK set)
Data offset | Reserved (0 0 0) | NS | CWR | ECE | URG | ACK | PSH | RST | SYN | FIN | Window Size
Checksum | Urgent pointer (if URG set)
Options (if data offset > 5. Padded at the end with "0" bytes if necessary.)
...


  • Sequence number field of a segment defines the number assigned to the first data byte contained in that segment.
  • Acknowledgment field in a segment defines the number of the next byte a party expects to receive. It acknowledges that all bytes before that byte number were received. If ACK number = 1381, all bytes up to and including byte 1380 have been received and the receiver now expects byte 1381 onwards.
  • Byte No: The bytes of data being transferred in each connection are numbered by TCP. The numbering starts with an arbitrarily generated number.
  • Data offset: tells the receiver where the data starts. Since the TCP header can be anywhere from 5 to 15 32-bit words (20-60 bytes) long, this field tells where the header ends and the data begins.
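
The sequence and acknowledgment bullets above can be illustrated with a small sketch. The function name `next_ack` is hypothetical, and the example only covers an in-order data segment (it ignores the sequence numbers consumed by SYN and FIN).

```python
SEQ_SPACE = 2 ** 32  # TCP sequence numbers are 32 bits wide and wrap around


def next_ack(seq: int, payload_len: int) -> int:
    """Return the ACK number for an in-order segment (illustrative only).

    seq         -- sequence number of the first data byte in the segment
    payload_len -- number of data bytes carried by the segment
    """
    return (seq + payload_len) % SEQ_SPACE


# A segment carrying bytes 1301..1380 (80 bytes) is acknowledged with 1381.
print(next_ack(1301, 80))
```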


UDP Packet

  • UDP packet is called Datagram.
  • UDP header size = 8 bytes.
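  • The UDP length field (header + data) is 16 bits, so a UDP packet may be between 8 and 65535 bytes.
  • But the enclosing IP datagram is itself limited to 65535 bytes, including its own header.
  • Therefore, UDP length = IP total length - IP header length, i.e. at most 65535 - 20 = 65515 bytes (65507 bytes of data after the 8-byte UDP header).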


UDP Header
Source port Destination port
Length Checksum


Internet Protocol

Options

        This section is under construction.

Fragmentation

  • Only Data in a datagram is fragmented. Header is never fragmented.
  • Options are in a TLV format which can be max 40 Bytes in length.
[Type][Length][Value]
  • Type field is 1 Byte & Length field is also 1 byte in length.
  • Options field may or may not be copied into each fragment.
  • Whether the Options field is copied into a fragment depends on the first bit (the copied flag) of the Type field.
1st bit = 0  ==> Copy options in 1st fragment only
1st bit = 1  ==> Copy options in all fragments
  • Fragmented Packet Analysis:
Before fragmentation
Original IP Datagram
Sequence | Identifier | Total Length | DF Flag | MF Flag | Fragment offset
0 | 345 | 5140 | 0 | 0 | 0
After fragmentation
IP Fragments (Ethernet)
Sequence | Identifier | Total Length | DF Flag | MF Flag | Fragment offset
0-0 | 345 | 1500 | 0 | 1 | 0
0-1 | 345 | 1500 | 0 | 1 | 185
0-2 | 345 | 1500 | 0 | 1 | 370
0-3 | 345 | 700 | 0 | 0 | 555
  • Further Fragmentation of a Fragmented Packet
Before fragmentation
Original IP Datagram
Fragment | Total bytes | Header bytes | Data bytes | MF flag | Fragment offset
1 | 2500 | 20 | 2480 | 1 | 0
2 | 2040 | 20 | 2020 | 0 | 310
After Fragmentation
IP Fragments (Ethernet)
Fragment | Total bytes | Header bytes | Data bytes | MF flag | Fragment offset
1-0 | 1500 | 20 | 1480 | 1 | 0
1-1 | 1020 | 20 | 1000 | 1 | 185
2-0 | 1500 | 20 | 1480 | 1 | 310
2-1 | 560 | 20 | 540 | 0 | 495
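
The numbers in the two fragmentation tables above can be reproduced with a short sketch. The function `fragment` and its parameters are hypothetical, and it assumes a fixed 20-byte header with no options and a 1500-byte Ethernet MTU.

```python
def fragment(total_length, mtu, header_len=20, base_offset=0, more_follow=False):
    """Split one IPv4 datagram (or an existing fragment) to fit `mtu`.

    Returns a list of (total_bytes, data_bytes, mf_flag, fragment_offset)
    tuples, with the offset in 8-byte units as stored in the header.
    base_offset -- offset of the input when it is already a fragment
    more_follow -- True when the input itself had MF = 1
    """
    data_len = total_length - header_len
    # Every fragment except the last must carry a multiple of 8 data bytes.
    max_data = (mtu - header_len) // 8 * 8
    fragments = []
    sent = 0
    while sent < data_len:
        chunk = min(max_data, data_len - sent)
        is_last = sent + chunk == data_len
        mf = 0 if (is_last and not more_follow) else 1
        fragments.append((chunk + header_len, chunk, mf, base_offset + sent // 8))
        sent += chunk
    return fragments


# First table: a 5140-byte datagram crossing a 1500-byte MTU link.
for frag in fragment(5140, 1500):
    print(frag)

# Second table: two existing fragments are fragmented again.
print(fragment(2500, 1500, base_offset=0, more_follow=True))
print(fragment(2040, 1500, base_offset=310, more_follow=False))
```

Note that a re-fragmented piece keeps MF = 1 on its last part whenever the input fragment had MF = 1, which is why fragment 1-1 in the table still has the flag set.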

Transmission Control Protocol

  • IP is
- Unreliable
- Connection-less
- Unstreamed
- Responsible for Routing of Packets
  • TCP is
- Reliable
- Connection Oriented
- Stream Ordered
- Responsible for end to end delivery
  • Sequence number of packet is the number of the first byte in the packet.
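  • Together with the segment's data length (IP total length minus the IP and TCP header lengths, since the TCP header carries no payload length field), we know which packet has which bytes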
  • Three phases:
Connection Establishment
   - Three way handshake
Data Transfer
Connection Termination
   - Three-way Termination
   - Four-way Termination with a half-close option
  • Connection Reset (using RST Flag):
Deny a connection request
Abort an existing connection
Terminate an idle connection
  • Maximum Segment Life
TCP standard defines MSL as being a value of 120 seconds (2 minutes). 
The common value for MSL is between 30 seconds and 1 minute. 
The MSL is the maximum time a segment can exist in the Internet before it is dropped. 

3-Way Handshake

        This section is under construction.


Windows

Do not confuse the following two mechanisms:
Congestion control = Adjusting cwnd in response to Packet loss and Congestion in Network
Flow control = Adjusting Sending Rate so that we do not overwhelm the Receive Buffer
  • TCP Windows
Send window 
Receive window
  • Flow Control
  • Opening and Closing Windows
  • Shrinking of Windows
  • Window Shutdown
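  • Silly Window Syndrome: Occurs when the sending application program creates data slowly, the receiving application program consumes data slowly, or both. Data then travels in very small segments, so header overhead dominates and efficiency drops.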
  • Syndrome due to Sender
Solution - Nagle’s Algorithm
 - TCP sends the first piece of data
 - TCP accumulates data in the output buffer 
 - Waits until either the receiving TCP sends an acknowledgment 
 - Or until enough data has accumulated to fill a maximum-size segment
  • Syndrome due to Receiver
A. Solution - Clark’s Solution
 - Announce a window size of zero until either
     - There is enough space to accommodate a segment of maximum size
     - At least half of the receive buffer is empty.
B. Solution - Delayed Acknowledgment
 - Segment is not acknowledged immediately
 - Prevents the sender TCP from sliding its window
 - Another advantage is it reduces traffic
 - May result in the sender unnecessarily retransmitting the unacknowledged segments.
 - Should not be delayed by more than 500 ms to prevent retransmission
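
The two silly-window remedies above can be written as simple decision rules. This is a simplified sketch, not how any particular stack implements them, and the function and parameter names are hypothetical.

```python
def nagle_can_send(buffered_bytes: int, mss: int, unacked_bytes: int) -> bool:
    """Sender side: decide whether to transmit now (simplified Nagle rule).

    buffered_bytes -- data waiting in the output buffer
    mss            -- maximum segment size for the connection
    unacked_bytes  -- data already sent but not yet acknowledged
    """
    if buffered_bytes == 0:
        return False              # nothing to send
    if buffered_bytes >= mss:
        return True               # enough data to fill a maximum-size segment
    return unacked_bytes == 0     # small data goes out only when nothing is in flight


def clark_advertised_window(free_space: int, buffer_size: int, mss: int) -> int:
    """Receiver side: advertise zero until the free space is worth announcing."""
    if free_space >= mss or free_space * 2 >= buffer_size:
        return free_space
    return 0


print(nagle_can_send(buffered_bytes=1, mss=1460, unacked_bytes=0))      # first small piece is sent
print(nagle_can_send(buffered_bytes=5, mss=1460, unacked_bytes=1))      # held until the ACK arrives
print(clark_advertised_window(free_space=100, buffer_size=4000, mss=1460))
```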

Error Control

  • Checksum
  • Acknowledgment
 - Cumulative Acknowledgment (ACK) 
 - Selective Acknowledgment (SACK) 
  • Retransmission
 - Retransmission after RTO
 - Retransmission after Three Duplicate ACK Segments(Reno) 
  • Out-of-Order Segments
 - Store them temporarily 
 - Flag them as out-of-order segments until the missing segments arrive
 - Not delivered to the process directly
 - Data is delivered to the process in order
  • Lost Segment
 - Receive data in other segments in its buffer but leaves a gap to indicate non continuity in the data
 - Receiver immediately sends an acknowledgment displaying the next byte it expects
 - Segment retransmitted after RTO or after 3 duplicate Acks.
  • Fast Retransmission
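 - Useful when the RTO has a large value, so waiting for the timer would delay recovery
 - Triggered when the sender receives four acknowledgments with the same value (the original plus three duplicates)
 - Segment expected by all of these ACKs is resent immediately, without waiting for the RTO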
  • Delayed Segment
 - May time out
 - If it was retransmitted and both copies reach the receiver, the later copy is discarded as a duplicate
  • Duplicate Segment
 - It is discarded
  • Automatically Corrected Lost ACK
 - Key advantage of using cumulative acknowledgments
 - By Retransmission of Ack of next segments
 - By Ack received of next segments
  • Deadlock Created by Lost Acknowledgment
 - When receiver sends Ack with rwnd 0 
 - Sender shut down its window temporarily
 - Receiver sends Ack if it wants to remove the restriction
 - Problem arises if this acknowledgment is lost
 - Persistence timer (=RTO) is used to resolve this deadlock

Congestion Control

  • Congestion Window
  • Size of the sender’s window is dictated by:
 - Receiver (Receiver-Advertised window size)
 - Network (Congestion window size)
  • Actual Window Size = Minimum (rwnd, cwnd)
  • Slow Start - Exponential Increase
 - Sender starts with a slow rate of transmission(cwnd = 1 MSS)
 - Size increases one MSS each time one acknowledgement arrives
 - Increases the rate exponentially(1,2,4,8....) until a threshold is reached
  • Congestion Avoidance - Additive Increase
 - To avoid congestion TCP must slow down this Exponential growth 
 - Increases the cwnd Additively instead of Exponentially
 - When a “window” of segments is acknowledged the size of congestion window is increased by one
 - A window is the number of segments transmitted during RTT
 - The increase is based on RTT, not on the number of arrived ACKs
 - Therefore size of the congestion window increases additively until congestion is detected
  • Congestion Detection - Multiplicative Decrease
 - If congestion occurs, the window size must be decreased
 - Sender knows about congestion if it needs to retransmit (RTO or 3 Dup Acks received)
 - In both cases, size of Threshold is dropped to half
  • If RTO occurred, TCP Reacts Strongly
 - Stronger possibility of congestion, Segment probably dropped in network
 - TCP sets threshold to half of the current window size
 - Reduces cwnd back to 1 Segment, starts the slow start phase again
  • If 3 Duplicate ACKs are received, TCP has a Weaker Reaction
 - Weaker possibility of congestion 
 - Segment may be dropped but some segments have arrived safely as 3 dup ACKs are received
 - TCP sets threshold to half of the current window size
 - Sets cwnd to value of Threshold
 - Starts the Congestion Avoidance phase
 - This is called fast retransmit and fast recovery
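
As a rough illustration of the slow start and congestion avoidance phases described above, the sketch below tracks cwnd in MSS units per round trip. The function `cwnd_per_round` is hypothetical and does not model loss; capping slow start exactly at the threshold is a simplification.

```python
def cwnd_per_round(rounds: int, ssthresh: int, start: int = 1):
    """Return the congestion window (in MSS) at the start of each RTT round.

    Below ssthresh the window doubles every round (slow start); at or above
    it the window grows by one MSS per round (congestion avoidance).
    """
    cwnd = start
    history = []
    for _ in range(rounds):
        history.append(cwnd)
        if cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)  # exponential increase, capped at the threshold
        else:
            cwnd += 1                        # additive increase
    return history


print(cwnd_per_round(8, ssthresh=16))  # [1, 2, 4, 8, 16, 17, 18, 19]
```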

Tahoe vs Reno

Source: Wikipedia

  • For each connection, TCP maintains a congestion window, limiting the total number of unacknowledged packets that may be in transit end-to-end.
  • This is analogous to TCP's sliding window used for flow control.
  • As long as non-duplicate ACKs are received, the congestion window is additively increased by one MSS every round trip time.
  • When a packet is lost, the likelihood of duplicate ACKs being received is very high:
  • The behavior of Tahoe and Reno differ in how they detect and react to packet loss:


Property | Tahoe | Reno
Feature | Fast Retransmit | Fast Retransmit + Fast Recovery
Detection | RTO expiry or 3 duplicate ACKs | RTO expiry or 3 duplicate ACKs
Slow Start Threshold | Set to half of the current Congestion Window | Set to half of the current Congestion Window (equal to the new Congestion Window)
Congestion Window | Reduced to 1 MSS | Set to half after 3 duplicate ACKs
Resultant State | Resets to Slow-Start State | Performs a Fast Retransmit; enters a phase called Fast Recovery
If the RTO expires, Reno uses Slow Start as Tahoe does.


  • Fast Recovery (Reno only): In this state, TCP retransmits the missing packet that was signaled by three duplicate ACKs, and waits for an acknowledgment of the entire transmit window before returning to congestion avoidance.
  • If there is no acknowledgment, TCP Reno experiences a timeout and enters the slow-start state.
  • Both algorithms reduce congestion window to 1 MSS on a timeout event.
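
The difference between the two variants can be summarised as a small reaction function. This is a sketch with hypothetical names; the floor of 2 MSS on the threshold follows the usual RFC 5681 rule and is an assumption not stated above.

```python
def on_loss(variant: str, event: str, cwnd: int):
    """Return (new_ssthresh, new_cwnd, next_state), all sizes in MSS units.

    variant -- "tahoe" or "reno"
    event   -- "timeout" (RTO expiry) or "dupack" (three duplicate ACKs)
    cwnd    -- congestion window when the loss was detected
    """
    if variant not in ("tahoe", "reno") or event not in ("timeout", "dupack"):
        raise ValueError("unknown variant or event")
    ssthresh = max(cwnd // 2, 2)  # halve the window; the floor of 2 is an assumption
    if event == "timeout" or variant == "tahoe":
        # Both variants restart from 1 MSS on a timeout; Tahoe always does.
        return ssthresh, 1, "slow start"
    # Reno after three duplicate ACKs: fast retransmit, then fast recovery.
    return ssthresh, ssthresh, "fast recovery"


print(on_loss("tahoe", "dupack", 20))
print(on_loss("reno", "dupack", 20))
print(on_loss("reno", "timeout", 20))
```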

TCP Timers

  • Retransmission Time-out (RTO)
 - Needs Round-Trip Time (RTT)
    - Measured RTT - time required for segment to reach the destination and be acknowledged
    - Smoothed RTT
    - RTT Deviation - Most implementations use RTT deviation
 - RTO = Smoothed RTT + [4 x RTT Deviation]
  • Karn’s Algorithm
 - Do not consider RTT of a Retransmitted segment in calculation of RTO value
  • Exponential Backoff
 - The value of RTO is doubled for each retransmission
  • Persistence Timer
 - Resolves the deadlock created when the ACK that would reopen a previously advertised zero window is lost
 - After timeout, sending TCP sends a special segment(1 byte of new data) called Probe
 - Probe causes the receiving TCP to resend Ack
 - If no reply, another probe is sent and value of persistence timer is doubled and reset 
 - Sender continues sending probes, doubling, resetting value of persistence timer until it reaches a threshold(generally 60s)
 - After that the sender sends one probe segment every 60s until the window is reopened
  • Keepalive Timer
 - Without it, a connection would remain open forever if the client crashed
 - Time-out is usually 2 hours.
 - If the server does not hear from the client for 2 hours, it sends a probe segment
 - If there is no response after 10 probes (75s apart), the server terminates the connection
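
The RTO formula, Karn's algorithm and exponential backoff above fit together roughly as follows. The class `RtoEstimator` is a hypothetical sketch; the gains 1/8 and 1/4, the first-sample initialisation and the 1 second initial RTO are taken from RFC 6298 and are not stated in the notes above.

```python
class RtoEstimator:
    ALPHA = 1 / 8  # gain for the smoothed RTT (RFC 6298 value, an assumption here)
    BETA = 1 / 4   # gain for the RTT deviation (RFC 6298 value, an assumption here)

    def __init__(self):
        self.srtt = None    # smoothed RTT, seconds
        self.rttvar = None  # RTT deviation, seconds
        self.rto = 1.0      # initial RTO before any measurement

    def on_measurement(self, rtt: float, retransmitted: bool = False) -> float:
        if retransmitted:
            return self.rto  # Karn's algorithm: ignore samples from retransmitted segments
        if self.srtt is None:
            self.srtt = rtt
            self.rttvar = rtt / 2
        else:
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - rtt)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * rtt
        self.rto = self.srtt + 4 * self.rttvar  # RTO = Smoothed RTT + 4 x RTT Deviation
        return self.rto

    def on_timeout(self) -> float:
        self.rto *= 2  # exponential backoff: double the RTO for each retransmission
        return self.rto


est = RtoEstimator()
print(est.on_measurement(1.0))                      # first sample
print(est.on_measurement(5.0, retransmitted=True))  # ignored, RTO unchanged
print(est.on_timeout())                             # RTO doubled
```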

Options

1-byte options
  • End of option list
- It can only be used as the last option
- There are no more options in the header after EOP
- Only one occurrence of this option is allowed
  • No operation
- NOP option is also a 1-byte option used as a filler
- Comes before another option to help align it on a 4-byte (32-bit word) boundary
Multiple-byte options
  • Maximum Segment Size
- Size of the biggest unit of data that can be received by the destination of the TCP segment
- Defines the maximum size of the data, not the maximum size of the segment
- Field is 16 bits long, the value can be 0 to 65,535 bytes
- Default value is 536 bytes
- Value is determined during connection establishment(1st & 2nd Packet) 
- Does not change during the connection
  • Window Scale Factor
- Window size field is 16 bits long so window can range from 0 to 65,535 bytes
- To increase the window size beyond this limit, WSF is used
- [New Window Size] = [Window Size in Header] × 2^WSF
- Value can be determined only during connection establishment 
- Does not change during the connection
- During data transfer, the size of the window (specified in the header) may be changed, but must be multiplied by same WSF
- One end may set the value of the window scale factor to 0, which means it supports this option but does not want to use it for this connection.
  • Timestamp
- Used to measure RTT
- Protection against wrapped sequence numbers
  • SACK-permitted
- Determined during connection establishment only(1st and 2nd Packet)
  • SACK
- Allows the sender to know which segments are actually lost and which have arrived out of order
- Sender can then send only those segments that are really lost
- Option includes a list for blocks arriving out of order
- Each block occupies two 32-bit numbers
- SACK option cannot define more than 4 blocks
- The information for 5 blocks occupies (5 × 2) × 4 + 2 or 42 bytes
- Allowed size of an option in TCP is only 40 bytes
- The first block of the SACK option can be used to report the duplicates
- The SACK option announces this duplicate data first and then the out-of-order block
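
The window scale and SACK size arithmetic above can be checked with a few lines. The helper names are hypothetical; the upper limit of 14 on the scale factor comes from RFC 7323 and is not stated above.

```python
MAX_TCP_OPTION_BYTES = 40  # space available for options in a TCP header


def effective_window(header_window: int, wsf: int) -> int:
    """Scale the 16-bit window field by 2**wsf."""
    if not 0 <= header_window <= 0xFFFF:
        raise ValueError("window field is 16 bits")
    if not 0 <= wsf <= 14:
        raise ValueError("scale factor must be 0..14 (RFC 7323 limit)")
    return header_window << wsf


def sack_option_length(blocks: int) -> int:
    """Bytes used by a SACK option: 2 (kind + length) plus two 32-bit numbers per block."""
    return 2 + blocks * 2 * 4


print(effective_window(65535, 0))   # unscaled maximum
print(effective_window(65535, 7))   # same header value, scale factor 7
print(sack_option_length(4), sack_option_length(5))  # 4 blocks fit in 40 bytes, 5 do not
```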

Source: packetlife.net

  • TCP Retransmission with ACK
  • TCP SACK
  • Wireshark PCAP

Packet Capture: TCP SACK Sample Capture

PUSH vs URG Flags

Source: Packetlife.net

PSH Flag

  • Buffers are implemented on both sides of a TCP connection in both directions
  • Buffers allow for more efficient transfer of data when sending more than one MSS of data
  • Large buffers do more harm than good when dealing with real-time applications
  • For example, in a Telnet session, if TCP waited until there was enough data to fill a packet before sending one, roughly a thousand characters would be required before the first packet made it to the remote device
  • An application can write to the socket with the option of "pushing" data out immediately, rather than waiting for additional data to enter the buffer
  • PSH flag in the outgoing TCP packet is set to 1
  • Upon receiving a packet with PSH flag, the other side immediately forwards the segment to application


URG Flag

  • RFC 6093 (Proposed Standard) recommends that new applications not use the TCP urgent mechanism
  • The URG flag is used to inform a receiving station that certain data within a segment is urgent and should be prioritized
  • If the URG flag is set, receiver checks the urgent pointer in TCP header
  • This pointer indicates how much of the data in the segment, counting from the first byte, is urgent
  • If the data size is 100 bytes and only first 50 bytes is urgent, the urgent pointer will have a value of 50
  • The URG flag isn't employed much by modern protocols


Capture file: Telnet PCAP

         This section needs a clarification. Refer Discussion page for details.
  • The 0xFF character sent in packet #86 precedes the Telnet command 0xF2 (242) in packet #70 denoting a data mark.
  • Per RFC 854, this command should be sent with the TCP URG flag set.
  • The urgent pointer in packet #68 indicates that the first byte of the segment (which in this case is the entire segment) should be considered urgent data.


TCP Header
Source port | Destination port
Sequence number
Acknowledgment number
Data offset | NS | CWR | ECE | URG | ACK | PSH | RST | SYN | FIN | Window Size
Checksum | Urgent pointer (if URG set)


MTU vs MSS[1]

  • The default TCP Maximum Segment Size is 536.
  • To use higher value, MSS is specified as a TCP option initially in the TCP SYN packet during the TCP handshake.
  • The value cannot be changed after the connection is established.
  • Each direction of data flow can use a different MSS.
  • Small MSS values will reduce or eliminate IP fragmentation, but will result in higher overhead.
  • For most computer users, the MSS option is established by the operating system.
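
The relationship between MTU and MSS is usually taken to be MSS = MTU minus the IP and TCP header sizes. The sketch below assumes 20-byte headers without options; the function name `mss_for_mtu` is hypothetical.

```python
def mss_for_mtu(mtu: int, ip_header: int = 20, tcp_header: int = 20) -> int:
    """Largest TCP payload that fits in one unfragmented IP datagram."""
    return mtu - ip_header - tcp_header


print(mss_for_mtu(1500))  # typical Ethernet MTU
print(mss_for_mtu(576))   # gives the default MSS of 536
```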

ECN

        This section is under construction.
  • Explicit Congestion Notification (ECN) is an extension to the Internet Protocol and to the Transmission Control Protocol.
  • ECN allows end-to-end notification of network congestion without dropping packets.
  • ECN is an optional feature that may be used between two ECN-enabled endpoints when the underlying network infrastructure also supports it.
  • Conventionally, TCP/IP networks signal congestion by dropping packets.
  • When ECN is successfully negotiated, an ECN-aware router may set a mark in the IP header instead of dropping a packet in order to signal impending congestion.
  • The receiver of the packet echoes the congestion indication to the sender, which reduces its transmission rate as if it detected a dropped packet.
  • ECN requires specific support at the Internet layer and the transport layer for the following reasons:
  • In TCP/IP, routers operate within the Internet layer, while the transmission rate is handled by the endpoints at the transport layer.
  • Congestion may be handled only by the transmitter, but since it is known to have happened only after a packet was sent, there must be an echo of the congestion indication by the receiver to the transmitter.
  • Without ECN, congestion indication echo is achieved indirectly by the detection of lost packets.
  • With ECN, the congestion is indicated by setting the ECN field within an IP packet to CE and is echoed back by the receiver to the transmitter by setting proper bits in the header of the transport protocol.
  • For example, when using TCP, the congestion indication is echoed back by setting the ECE bit.

Operation of ECN with IP

  • ECN uses the two least significant (right-most) bits of the DiffServ field in the IPv4 or IPv6 header to encode four different codepoints:
00 – Not ECN-Capable Transport, Not-ECT
10 – ECN Capable Transport, ECT(0)
01 – ECN Capable Transport, ECT(1)
11 – Congestion Experienced, CE.
  • When both endpoints support ECN they mark their packets with ECT(0) or ECT(1).
  • If the packet traverses an active queue management (AQM) queue (e.g., a queue that uses random early detection (RED)) that is experiencing congestion and the corresponding router supports ECN, it may change the codepoint to CE instead of dropping the packet.
  • This act is referred to as “marking” and its purpose is to inform the receiving endpoint of impending congestion.
  • At the receiving endpoint, this congestion indication is handled by the upper layer protocol (transport layer protocol) and needs to be echoed back to the transmitting node in order to signal it to reduce its transmission rate.
  • Because the CE indication can only be handled effectively by an upper layer protocol that supports it, ECN is only used in conjunction with upper layer protocols, such as TCP, that support congestion control and have a method for echoing the CE indication to the transmitting endpoint.
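
The four codepoints above occupy the two low-order bits of the DiffServ (former TOS) byte, so they can be read and marked with simple bit operations. The helper names below are hypothetical, and `mark_ce` only sketches what an ECN-capable router might do.

```python
ECN_CODEPOINTS = {
    0b00: "Not-ECT",
    0b10: "ECT(0)",
    0b01: "ECT(1)",
    0b11: "CE",
}


def split_tos(tos: int):
    """Split the 8-bit DiffServ byte into (dscp, ecn_name)."""
    tos &= 0xFF
    return tos >> 2, ECN_CODEPOINTS[tos & 0b11]


def mark_ce(tos: int) -> int:
    """Return the byte as rewritten by a congested ECN-capable router."""
    if tos & 0b11 == 0b00:
        return tos       # Not-ECT traffic cannot be marked; it would be dropped instead
    return tos | 0b11    # ECT(0) or ECT(1) becomes CE


print(split_tos(0x02))           # DSCP 0, ECT(0)
print(split_tos(mark_ce(0x02)))  # DSCP 0, CE
```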

Operation of ECN with TCP

  • TCP supports ECN using three flags in the TCP header.
  • The first one, the Nonce Sum (NS), is used to protect against accidental or malicious concealment of marked packets from the TCP sender.
  • The other two bits are used to echo back the congestion indication (i.e. signal the sender to reduce the amount of information it sends) and to acknowledge that the congestion-indication echoing was received.
  • These are the ECN-Echo (ECE) and Congestion Window Reduced (CWR) bits.
  • Use of ECN on a TCP connection is optional; for ECN to be used, it must be negotiated at connection establishment by setting the ECE and CWR flags in the SYN segment and the ECE flag in the SYN-ACK segment.
  • When ECN has been negotiated on a TCP connection, the sender indicates that IP packets that carry TCP segments of that connection are carrying traffic from an ECN Capable Transport by marking them with an ECT codepoint.
  • This allows intermediate routers that support ECN to mark those IP packets with the CE codepoint instead of dropping them in order to signal impending congestion.
  • Upon receiving an IP packet with the Congestion Experienced codepoint, the TCP receiver echoes back this congestion indication using the ECE flag in the TCP header.
  • When an endpoint receives a TCP segment with the ECE bit it reduces its congestion window as for a packet drop.
  • It then acknowledges the congestion indication by sending a segment with the CWR bit set.
  • A node keeps transmitting TCP segments with the ECE bit set until it receives a segment with the CWR bit set.

ECN support in IP by routers

  • Since ECN marking in routers is dependent on some form of active queue management, routers must be configured with a suitable queue discipline in order to perform ECN marking.
  • Cisco IOS routers perform ECN marking if configured with the WRED queuing discipline since version 12.2(8)T.
  • Linux routers perform ECN marking if configured with one of the RED or GRED queue disciplines with an explicit ecn parameter, by using the sfb discipline, or by using the CoDel Fair Queueing (fq_codel) discipline.



Misc

Some misc information to be remembered regarding TCP/IP is below.

Ranges of ports used for TCP/IP networks

Well-known ports 0 to 1023
Registered ports 1024 to 49151
Dynamic, private or ephemeral ports 49152–65535


Important port numbers

Port Protocol Transport
20-21 FTP Control(21)/Data(20) TCP
22 SSH/SCP TCP
23 Telnet
25 SMTP
53 DNS
67-68 DHCP/BOOTP(67 - Server, 68 - Client)
69 TFTP
80 HTTP TCP
88 Kerberos TCP,UDP
110 POP3
123 NTP
135 Microsoft RPC
137-139 NetBIOS
161 SNMP (Poll) UDP
162 SNMP (Trap) TCP, UDP
179 BGP
389 LDAP
443 HTTP over SSL
445 SMB
465 SMTP over SSL
500 ISAKMP
514 Syslog
520 RIP
521 RIPng (IPv6)
995 POP3 over SSL
1433 SQL Server
1701 L2TP
1812-1813 RADIUS
3306 MySQL
3389 Terminal Server(RDP)
5060 SIP
6881-6999 BitTorrent


Important protocol numbers

Protocol no Protocol name
1 ICMP
4 IPv4 (encapsulation)
6 TCP
17 UDP
47 GRE
50 ESP
51 AH
88 EIGRP
89 OSPF



Study of a packet header

This is an IP header from an IP packet received at the destination:

4500 003c 1c46 4000 4006 b1e6 ac10 0a63 ac10 0a0c
Bytes Details
45 4 = IP version & 5 = Header Length (4 byte words; 5×4=20 bytes)
00 Type of Service (TOS)
003c Total length of the IP datagram (header + data). Here it is 0x003c = 60 bytes
1c46 Identification Field
4000 These two bytes are divided into 3 bits and 13 bits = flags and fragment offset (here DF is set and the offset is 0)
4006 First byte '40' = TTL field (64) and '06' = protocol field (TCP here)
b1e6 Checksum
ac10 0a63 Source IP address
ac10 0a0c Destination IP address
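
The same 20 bytes can be decoded programmatically. This is a minimal sketch using only the Python standard library; the function names are hypothetical, and the checksum check relies on the usual property that a valid header's 16-bit words sum (with end-around carry) to 0xFFFF.

```python
import ipaddress
import struct

RAW = bytes.fromhex("4500003c1c4640004006b1e6ac100a63ac100a0c")


def parse_ipv4_header(raw: bytes) -> dict:
    """Decode the fixed 20-byte part of an IPv4 header."""
    (ver_ihl, tos, total_len, ident, flags_frag,
     ttl, proto, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", raw[:20])
    return {
        "version": ver_ihl >> 4,
        "header_bytes": (ver_ihl & 0x0F) * 4,            # HLEN is in 4-byte words
        "tos": tos,
        "total_length": total_len,
        "identification": ident,
        "df": (flags_frag >> 14) & 1,
        "mf": (flags_frag >> 13) & 1,
        "fragment_offset_bytes": (flags_frag & 0x1FFF) * 8,  # 13-bit offset in 8-byte units
        "ttl": ttl,
        "protocol": proto,
        "checksum": checksum,
        "src": str(ipaddress.IPv4Address(src)),
        "dst": str(ipaddress.IPv4Address(dst)),
    }


def header_checksum_ok(raw: bytes) -> bool:
    """Sum all 16-bit words with end-around carry; a valid header gives 0xFFFF."""
    total = sum(struct.unpack("!%dH" % (len(raw) // 2), raw))
    while total > 0xFFFF:
        total = (total & 0xFFFF) + (total >> 16)
    return total == 0xFFFF


header = parse_ipv4_header(RAW)
print(header["src"], "->", header["dst"], "TTL", header["ttl"], "protocol", header["protocol"])
print("checksum valid:", header_checksum_ok(RAW))
```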
  • SNMP Connections:


References
  1. www.akamai.net

