Dump

__NOINDEX__

R&S Quick Notes – IGP
RIP

Know your filters: offset-list, distribute-list, and the distance command. Read filter requirements carefully: "between 25 & 45" or "from 25 to 45". Know your prefix-lists, or alternatively how to achieve the same with ACLs. The "passive-interface" command ONLY stops updates from being sent out the interface. The interface will still receive and process updates. Networks on passive interfaces will still be advertised in updates sent out other interfaces.

EIGRP

Advertising a default route out one interface: "ip summary-address eigrp [AS] 0.0.0.0 0.0.0.0". To see if a neighbor is configured as a stub, use "show ip eigrp neighbors [detail]" and look for 'CONNECTED SUMMARY'. On Frame Relay multipoint interfaces, don't forget to disable split horizon. The AD of external EIGRP routes (admin distance = 170) can NOT be changed on a per-prefix basis.

Metric weight values:

1 0 1 0 0 = Default
0 0 1 0 0 = Only DLY
1 0 0 0 0 = Only BW
3 0 1 0 0 = BW has 3 times more weight than DLY

Metric formula:

Metric = ((10^7 / BW) + (DLY / 10)) * 256

BW is the lowest bandwidth along the path in kbps; DLY is the cumulative delay in microseconds.
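As a quick check of the formula, here is a minimal Python sketch assuming the default K values (K1 = K3 = 1) and the integer truncation that classic EIGRP applies. The function name `eigrp_metric` is made up for illustration.

```python
def eigrp_metric(min_bw_kbps: int, total_delay_usec: int) -> int:
    """Classic EIGRP composite metric with default K values (K1 = K3 = 1)."""
    if min_bw_kbps <= 0:
        raise ValueError("bandwidth must be a positive number of kbps")
    bw_term = 10**7 // min_bw_kbps      # integer division, fractions are dropped
    dly_term = total_delay_usec // 10   # delay in tens of microseconds
    return (bw_term + dly_term) * 256


# FastEthernet: 100000 kbps, 100 usec delay
print(eigrp_metric(100000, 100))    # 28160
# T1 serial: 1544 kbps, 20000 usec delay
print(eigrp_metric(1544, 20000))    # 2169856
```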

IPv6

RIPng: split horizon is disabled under the routing process ("no split-horizon" under "ipv6 router rip"), not with an interface command.
EIGRPv6: do not forget to enable EIGRP under the process.
IPv6 tunnel method with the least overhead: IPv6IP.
Tunnel protocol numbers for ACLs: IPv6IP = protocol 41, GRE IPv6 = protocol 47.
You cannot redistribute a default static route (::/0) with OSPFv3.
Dynamic information (i.e. IGP next hops) recurses to the remote link-local address, not the global unicast address of the interface.

Windows Script Backup ScreenOS
@echo off
REM ================================================================
REM ===This Script may give following error:
REM ===FATAL ERROR: Network error: Connection timed out ==> Check IP addresses
REM ===FATAL ERROR: Network error: Connection refused ==> Check SSH Parameters on Firewall
REM ===WARNING - POTENTIAL SECURITY BREACH! ==> SSH Public Keys changed/recalculated
REM ===Access denied ==> Password wrong
REM ================================================================
REM ===No of times "Access denied" message appears ==> no of wrong firewalls with wrong pwds
REM ================================================================
REM ===Configurable Parameters
REM ================================================================
set username=aman
set CFGFILE=BackupList.txt
set DESTDIR=Backups\
REM ================================================================
REM ===Script code starts here
REM ================================================================
SET TIMESTAMP=%date:~-4,4%.%date:~-7,2%.%date:~-10,2%
for /F "tokens=1,2,3 delims=," %%A in (%CFGFILE%) do (
  IF NOT EXIST "%DESTDIR%%TIMESTAMP%" mkdir "%DESTDIR%%TIMESTAMP%"
  plink -ssh -C -batch -pw %%C %username%@%%B get config > "%DESTDIR%%TIMESTAMP%\%%A.cfg"
)
echo Backup completed

"BackupList.txt" file:
R1,192.168.1.3,cisco
R2,192.168.1.4,cisco2

Plink (part of the PuTTY suite) needs to be downloaded before running the script.

HA Best Practices
Basic:

1. The two firewalls should have the same hardware (model, modules, RAM, ports, etc.).
2. Firmware should be exactly the same, i.e. major, minor, and patch version.
3. Licenses & features on both firewalls should be the same (Basic, Advanced, AV, DI, AS, Web filtering, etc.).
4. A firewall with an expired license should not be put in a cluster with a firewall with no license, as it may cause them to go out of sync and end up with different free memory.
5. It is recommended to configure the cluster with 2 dedicated HA links.
6. VSD Group should be 0. If it is not 0, interfaces need to be assigned to that VSD Group on both firewalls.
7. Console access is always recommended before configuring, implementing, or troubleshooting NSRP.
8. Hostnames of the firewalls should be different to differentiate between the devices.

Preempt:

1. Preempt should be enabled.
2. The hold-down timer should have a higher value (~120-180 seconds) to prevent NSRP failover flapping.
3. Preempt need not be configured on the backup device.
4. It should not be configured in environments with dynamic routing protocols, due to protocol re-convergence.
5. The priority of the preferred backup should be a higher value, as the lower priority takes precedence.

Interface Monitoring:

1.	Only add critical interfaces to Monitoring to avoid unnecessary failovers/preempts.

Track-IP:

1. Track-IP is necessary to achieve a successful failover when the primary Juniper firewall stops passing traffic but the monitored interfaces remain up, a failure that interface monitoring alone does not detect.
2. One or more hosts that can reliably respond to ICMP/ARP traffic need to be identified.

Master-Always-Exist:

1. With NSRP monitoring enabled, both NSRP peers can become 'Inoperable'. Enabling the master-always-exist option will ensure that the cluster remains available.
2. Run the command “set nsrp vsd-group master-always-exist” only on the Master; it will sync to the Backup automatically.

Secondary-path:

1. To avoid split brain, configure NSRP with 2 dedicated HA links.
2. The secondary-path option allows NSRP to poll the peer via an alternate, non-dedicated interface. The purpose is only to prevent a split-brain scenario, so NSRP sync data is not carried across this link, only heartbeat messages.

RTO Sync:

1. Backup Session Timeout Acknowledge should be enabled.
2. Route Synchronization should not be used unless a dynamic routing protocol is running.

HA probe:

1. HA probe must be enabled if the HA links are connected through a layer 2 switch.
2. It should NOT be used if they are directly connected.
3. Duplex settings on the switch and firewall interfaces should match.

Authentication & Encryption password:

1. Use NSRP Authentication & Encryption if the HA cables connect through a layer 2 switch.
2. No need to use Authentication & Encryption if they are directly connected.

Misc:

1. While adding the secondary firewall to a cluster, an interface-based default route such as “set interface gateway ” will result in loss of communication, as the interface will become inactive. A regular default route needs to be added before proceeding.
2. Duplicate MAC addresses are seen when 2 sets of NSRP clusters with the same Cluster ID and VSD Group are attached to the same switch or are in the same broadcast domain. Changing the Cluster ID or VSD Group number will resolve the issue.

KBs Referred:

http://kb.juniper.net/KB9311
http://kb.juniper.net/KB9309
http://kb.juniper.net/KB11432

SRX Import config
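1. In configuration mode, run load merge terminal (or load replace terminal), paste the { ... } configuration blocks, press Ctrl+D, then commit.

2. In configuration mode (edit), enter the set commands one by one, then commit.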

Juniper SRX Firewalls
run = used in configuration mode to execute operational-mode commands

//Show Routes
show route brief
show route best x.x.x.x
set routing-options static route 10.2.2.0/24 next-hop 10.1.1.254

//Forwarding Table
run show route forwarding-table destination x.x.x.x/24

//TraceOptions settings
root@fw1# show security flow | display set
set security flow traceoptions file matt_trace
set security flow traceoptions file files 3
set security flow traceoptions file size 100000
set security flow traceoptions flag basic-datapath
set security flow traceoptions packet-filter f0 source-prefix 10.0.0.1/32 destination-prefix 200.1.2.3/32
set security flow traceoptions packet-filter f1 source-prefix 10.0.0.1/32 destination-prefix 200.1.2.3/32
activate security flow traceoptions
commit
monitor start matt_trace
monitor list

!! Kill the capture
monitor stop
clear log            !! Clear the log file
delete security flow traceoptions
commit
file delete 

//Show Traceoptions
show security flow session source-prefix 10.124.80.42 destination-prefix 117.1.1.25
start shell

egrep 'matched filter|(ge|fe|reth)-.*->.*|session found|create session|dst_xlate|routed|search|denied|src_xlate|outgoing phy if' /var/log/matt_trace | sed -e 's/.*RT://g' | sed -e 's/tcp, flag 2 syn/--TCP SYN--/g' | sed -e 's/tcp, flag 12 syn ack/--TCP SYN\/ACK--/g' | sed -e 's/tcp, flag 10/--TCP ACK--/g' | sed -e 's/tcp, flag 4 rst/--TCP RST--/g' | sed -e 's/tcp, flag 14 rst/--TCP RST\/ACK--/g' | sed -e 's/tcp, flag 18/--TCP PUSH\/ACK--/g' | sed -e 's/tcp, flag 11 fin/--TCP FIN\/ACK--/g' | sed -e 's/tcp, flag 5/--TCP FIN\/RST--/g' | sed -e 's/icmp, (0\/0)/--ICMP Echo Reply--/g' | sed -e 's/icmp, (8\/0)/--ICMP Echo Request--/g' | sed -e 's/icmp, (3\/0)/--ICMP Destination Unreachable--/g' | sed -e 's/icmp, (11\/0)/--ICMP Time Exceeded--/g' | awk '/matched/ {print "\n\t\t\t=== PACKET START ==="}; {print};'

//Show Sessions
run show security flow session destination-prefix x.x.x.x

//Match Policy
run show security match-policies from-zone zonea to-zone zoneb source-ip x.x.x.x destination-ip x.x.x.x protocol tcp source-port 1024 destination-port xx

//Check for Block Group
show security policies from-zone untrust to-zone trust | display set | match deny

//Find Syntax for an Existing Command
show | display set | match xxxxxxxxx

//VPN Troubleshooting
show security ike security-associations [index ] [detail]
show security ipsec security-associations [index ] [detail]
show security ipsec statistics [index ]

//VPN
//Set proxy IDs for a route-based tunnel
set security ipsec vpn vpn-name ike proxy-identity local 10.0.0.0/8 remote 192.168.1.0/24 service any

//Packet Capture
set security datapath-debug capture-file my-capture
set security datapath-debug capture-file format pcap
set security datapath-debug capture-file size 1m
set security datapath-debug capture-file files 5
set security datapath-debug maximum-capture-size 400
set security datapath-debug action-profile do-capture event np-ingress packet-dump
set security datapath-debug packet-filter my-filter action-profile do-capture
set security datapath-debug packet-filter my-filter source-prefix 1.2.3.4/32

//Super SRX Packet Capture Filter
egrep 'matched filter|(ge|fe|reth)-.*->.*|session found|Session \(id|session id|create|dst_nat|chose interface|dst_xlate|routed|search|denied|src_xlate|dip id|outgoing phy if|route to|DEST|post' /var/log/mchtrace | uniq | sed -e 's/.*RT://g' | awk '/matched/ {print "\n\t\t\t=== PACKET START ==="} ; {print} ;' | awk '/^$/ {print "\t\t\t=== PACKET END ==="}; {print};' ; echo | awk '/^$/ {print "\t\t\t=== PACKET END ==="}; {print};'

// Policy commands

show | display set (shows policy)
set system syslog
set security log
set interfaces ge-0/0/3 gigether-options auto-negotiation (redundant-parent)
set security policies from-zone xxx to-zone xxx policy policy_name match
set security zones security-zone untrust address-book address
set security nat source rule-set zone-to-zone rule rule-source-nat match source-address 10.0.0.0
set routing-instances
set applications

set security ike proposal
set security ike policy
set security ike gateway
set security ipsec proposal
set security ipsec policy
set security ipsec vpn

show | compare
commit check
commit comment "ticket#2222" and-quit

set security policies from-zone dmz to-zone trust policy 12 match source-address h_10.124.0.1 destination-address h_1.2.3.4 application tcp_22
set security policies from-zone dmz to-zone trust policy 12 then permit
set security policies from-zone dmz to-zone trust policy 12 then log session-init session-close

+        match {
+            source-address h_10.124.0.1;
+            destination-address h_1.2.3.4;
+            application tcp_22;
+        }
+        then {
+            permit;
+            log {
+                session-init;
+                session-close;
+            }
+        }
+    }

Various:
show system uptime 	Uptime
show version 	Version of platform (host/model)
show chassis firmware 	Firmware loaded on FPCs
show system software detail
show chassis routing-engine 	CPU, Memory for Routing-Engine
show chassis fan 	Speed and status of fans
show chassis environment 	Temperature status of components
show chassis hardware detail 	Hardware inventory (backplane)
show system core-dumps 	Core-dumps
show system alarms 	System alarms
show chassis alarms 	Alarms for hardware and chassis
show system boot-messages 	Logs from boot sequence
show log chassisd 	Logs for SRX chassis (Cards)
show log messages 	Recent system messages
show configuration security log 	Syslog configuration
show system buffers 	Utilization of memory buffers
show system virtual-memory 	Virtual memory utilization
show system processes 	Processes running on system
show security idp memory 	IDP memory statistics
show security monitoring performance session 	Session counts on each FPC

MIP in a policy-based VPN
KB9924

This workaround is for configuring a Mapped IP (MIP) address in a policy-based VPN; MIPs are typically created on tunnel interfaces in a route-based VPN. The workaround applies when the customer requirement does not allow for a route-based VPN.

Customer requirements:

 * A site-to-site VPN tunnel between a Juniper firewall and a Cisco.
 * The Cisco peer IP address and the remote subnet must use the same public IP address.
 * MIPs need to be configured for the servers behind the Juniper firewall.

For these requirements, a route-based VPN on the Juniper firewall is not an option because a route is needed to the remote network pointing to the tunnel interface. If the peer IP and remote IP addresses are the same for both devices, the IKE negotiation can not be established. A policy-based VPN can be configured for this design, since only a default route is needed and then a policy can be used to determine the VPN. On the Juniper firewall, a MIP needs to be configured for the servers on the private network, which need to be accessed via a VPN from the Cisco site. However, MIPs are not directly supported in policy-based VPN.

If the outgoing interface is in a zone other than Untrust (for example, zone is ISP), follow KB27122- [ScreenOS] How to configure a MIP in a policy based VPN when outgoing interface is in zone other than Untrust

Untrust-Tun is the tunnel-type zone, the carrier zone that handles encryption and decryption:
set interface tunnel.1 zone Untrust-Tun

Fixed IP on the tunnel interface:
set interface tunnel.1 ip 4.4.4.10/24

The MIP will be used by the Cisco remote network to connect to the server behind the Juniper firewall's local network:
set interface tunnel.1 mip 4.4.4.11 host 20.20.20.5 netmask 255.255.255.255

A route needs to be added to send the traffic to the tunnel interface:
set route 25.34.5.7 interface tunnel.1

Phase 1 configuration:
set ike gateway Netscreen-Cisco-IKE address 25.34.5.7 main outgoing-interface ethernet4 preshare test sec-level standard

Phase 2 configuration:
set vpn Netscreen-Cisco-VPN gateway Netscreen-Cisco-IKE sec-level standard

Bind Tunnel Zone (the Juniper firewall will recognize the MIP configured on the tunnel interface):
set vpn Netscreen-Cisco-VPN bind zone Untrust-Tun

Then an appropriate access-list needs to be configured on the Cisco end to support the Proxy-IDs generated by the policies on the Juniper firewall.
set policy from untrust to trust 2.2.2.2/32 MIP (4.4.4.11) any tunnel vpn Netscreen-Cisco-VPN log
set policy from trust to untrust 20.20.20.5/32 2.2.2.2/32 any tunnel vpn Netscreen-Cisco-VPN log

Note: The MIP will work in only one direction. If traffic needs to be initiated from the Netscreen Trust zone over the tunnel and that traffic must use NAT, then a DIP is required, and the DIP cannot use the same IP as the MIP. This is a limitation. If a bi-directional MIP is required a route based VPN must be used.

Workaround if outgoing is other than Untrust Zone
If the outgoing interface is in a zone other than Untrust (for example, zone is ISP) proceed with following:

ISP is the zone for outgoing interface ethernet0/2:
set zone "ISP"
set interface ethernet0/2 zone "ISP"
set interface ethernet0/2 ip 1.1.1.1/24

ISP-Tun zone is the carrier zone for the tunnel for NAT-ing:
set zone "ISP-Tun" tunnel ISP

ISP-Tun is the tunnel-type zone, the carrier zone that handles encryption and decryption:
set interface tunnel.1 zone ISP-Tun

Fixed IP on the tunnel interface:
set interface tunnel.1 ip 4.4.4.10/24

The MIP will be used by the remote network to connect to the server behind the ScreenOS firewall's local network:
set interface tunnel.1 mip 4.4.4.11 host 20.20.20.5 netmask 255.255.255.255

A route needs to be added to send the traffic to the tunnel interface, so that the translation can take place:
set route 6.7.8.9/32 interface tunnel.1

Phase 1 configuration:
set ike gateway Netscreen-IKE address 2.2.2.2 main outgoing-interface ethernet0/2 preshare test sec-level standard

Phase 2 configuration:
set vpn Netscreen-VPN gateway Netscreen-IKE sec-level standard

Bind Tunnel Zone (the ScreenOS firewall will identify the MIP configured on the tunnel interface):
set vpn Netscreen-VPN bind zone Untrust-Tun

Then an appropriate access-list must be configured on the remote end to support the Proxy-IDs generated by the policies on the ScreenOS firewall.
set policy from ISP to trust 6.7.8.9/32 MIP (4.4.4.11) any tunnel vpn Netscreen-VPN log
set policy from trust to ISP 20.20.20.5/32 6.7.8.9/32 any tunnel vpn Netscreen-VPN log

get sa detail
CORPORATE-> get sa
total configured sa: 1
HEX ID    Gateway  Port Algorithm     SPI      Life:sec kb    Sta PID vsys
00000001< 2.2.2.2  500  esp:3des/sha1 c2e1f0e4 3296     unlim A/- -1  0
00000001> 2.2.2.2  500  esp:3des/sha1 74098e47 3296     unlim A/- -1  0

We can see that the remote peer is 2.2.2.2. The State shows A/-. The possible states are below:

I/I = SA is Inactive; the VPN is currently not connected
A/- = SA is Active; VPN monitoring is not enabled
A/D = SA is Active; VPN monitoring is enabled but failing, thus DOWN
A/U = SA is Active; VPN monitoring is enabled and UP

Gateway IP address for Next Hop
Why is it necessary to specify 'Gateway IP address for Next Hop' during the configuration of static default route?

 * Scenario I: Next-hop gateway IP address is not specified in the static default route.

SSG-> set route 0.0.0.0/0 int eth0/1

SSG-> get db st

route to 4.2.2.2
cached arp entry with MAC 000000000000 for 4.2.2.2
add arp entry with MAC 000000000000 for 4.2.2.2 to cache table
wait for arp rsp for 4.2.2.2
ifp2 ethernet0/1, out_ifp ethernet0/1, flag 10000e00, tunnel ffffffff, rc 0
outgoing wing prepared, not ready

SSG-> get route | i 4.2.2.2
* 16 0.0.0.0/0 eth0/1 0.0.0.0 S 20 1 Root

Because the next-hop IP address is not specified in the default route, the firewall is doing an ARP for 4.2.2.2.

When the firewall needs to forward a packet via the default route, it needs the MAC address of the default router in order to build the frame to forward the packet.

The reason for the failure is that the firewall is waiting for an ARP response from 4.2.2.2, as if it were on a directly connected segment. This is indicated by the 'wait for arp rsp for 4.2.2.2' message; the response never arrives.

It then drops the packet with the message 'outgoing wing prepared, not ready', which indicates that there is no ARP response.

 * Scenario II: Next-hop gateway IP address is specified in the static default route.

SSG-> set route 0.0.0.0/0 int eth0/1 gateway 1.1.1.2

SSG-> get db st

route to 1.1.1.2
cached arp entry with MAC 000000000000 for 1.1.1.2
add arp entry with MAC 002688e8c305 for 1.1.1.2 to cache table
arp entry found for 1.1.1.2
ifp2 ethernet0/1, out_ifp ethernet0/1, flag 10800e00, tunnel ffffffff, rc 1
outgoing wing prepared, ready

SSG-> get route | i 4.2.2.2
* 15 0.0.0.0/0 eth0/1 1.1.1.2 S 20 1 Root

In this scenario, the firewall found the MAC address for the next-hop gateway (ISP router with IP 1.1.1.2) in its ARP table.

It was then able to build the frame and forward the packet to the ISP router, which in turn routed the packet to its next hop, until the packet reached the destination IP 4.2.2.2.

SRX Stuck on old technology
The SRX uses stateful inspection which relies on port and protocol for policy decisions, a technique that is ineffective at controlling applications that use dynamic ports, encryption, or tunnel across often used/allowed ports to bypass firewalls.

Stateful Inspection
This solution allows calls to come from any port on an inside machine, and will direct them to port 25 on the outside.

So why is it wrong?

Our defined restriction is based solely on the outside host's port number, which we have no way of controlling. Now an enemy can access any internal machine and port by originating his call from port 25 on the outside machine.

What would be a better solution?

Permit incoming packets only when the ACK flag is set. The ACK signifies that the packet is part of an ongoing conversation. Packets without the ACK are connection establishment messages, which we are only permitting from internal hosts.
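The difference between the two rule sets can be sketched in a few lines of Python. This is an illustrative model only: the `Packet` fields and both function names are assumptions made for the example, not the syntax of any firewall.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    from_inside: bool   # True if the packet was sent by an internal host
    src_port: int
    dst_port: int
    ack: bool           # state of the TCP ACK flag


def port_only_filter(pkt: Packet) -> bool:
    """Naive rule: allow outbound to port 25 and anything inbound from port 25."""
    if pkt.from_inside:
        return pkt.dst_port == 25
    return pkt.src_port == 25


def ack_filter(pkt: Packet) -> bool:
    """Same rule, but inbound packets must also have the ACK flag set."""
    if pkt.from_inside:
        return pkt.dst_port == 25
    return pkt.src_port == 25 and pkt.ack


# An attacker opens a connection to internal port 23 from source port 25.
attack_syn = Packet(from_inside=False, src_port=25, dst_port=23, ack=False)
# A mail server answers a connection that an inside host started.
mail_reply = Packet(from_inside=False, src_port=25, dst_port=40000, ack=True)

print(port_only_filter(attack_syn), ack_filter(attack_syn))   # True False
print(port_only_filter(mail_reply), ack_filter(mail_reply))   # True True
```

The ACK check is still stateless; it only blocks connection establishment from outside, which is the point the note makes.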

Sub interface number
The maximum permitted sub interface number on the Juniper SSG140 firewall is 100. The firewall accepts a number in the range 1-100 only. Sub interface names on Juniper Netscreen firewalls look like eth0/1.50 or eth0/2.100. A name like eth0/2.101 or eth0/2.200 will not be accepted.

Window size smaller than MTU
If the window size is smaller than the MTU, packet retransmissions will occur. This is an application issue: the receive buffer is smaller than the packets being received.



= Certificates =

A session symmetric key between two parties is used only once.

The symmetric (shared) key in the Diffie-Hellman method is K = g^(xy) mod p.

In public-key cryptography, everyone has access to everyone’s public key; public keys are available to the public.

Our example uses small numbers, but note that in a real situation, the numbers are very large. Assume that g = 7 and p = 23. The steps are as follows:

1. Alice chooses x = 3 and calculates R1 = 7^3 mod 23 = 21.
2. Alice sends the number 21 to Bob.
3. Bob chooses y = 6 and calculates R2 = 7^6 mod 23 = 4.
4. Bob sends the number 4 to Alice.
5. Alice calculates the symmetric key K = 4^3 mod 23 = 18. Bob calculates the symmetric key K = 21^6 mod 23 = 18.

The value of K is the same for both Alice and Bob; g^(xy) mod p = 7^18 mod 23 = 18.
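The arithmetic in this example can be reproduced with Python's built-in three-argument `pow`, which performs modular exponentiation. The helper names `dh_public` and `dh_shared` are made up for this sketch; the numbers are the ones from the example.

```python
def dh_public(g: int, p: int, secret: int) -> int:
    """Value a party sends to its peer: g^secret mod p."""
    return pow(g, secret, p)


def dh_shared(peer_public: int, p: int, secret: int) -> int:
    """Shared key computed from the peer's value: peer_public^secret mod p."""
    return pow(peer_public, secret, p)


g, p = 7, 23
x, y = 3, 6                      # Alice's and Bob's private values

r1 = dh_public(g, p, x)          # Alice -> Bob
r2 = dh_public(g, p, y)          # Bob -> Alice
k_alice = dh_shared(r2, p, x)
k_bob = dh_shared(r1, p, y)

print(r1, r2, k_alice, k_bob)    # 21 4 18 18
print(pow(g, x * y, p))          # 18
```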

Public Announcement: The naive approach is to announce public keys publicly. Bob can put his public key on his website or announce it in a local or national newspaper. When Alice needs to send a confidential message to Bob, she can obtain Bob’s public key from his site or from the newspaper, or even send a message to ask for it. This approach, however, is not secure; it is subject to forgery. For example, Eve could make such a public announcement. Before Bob can react, damage could be done. Eve can fool Alice into sending her a message that is intended for Bob. Eve could also sign a document with a corresponding forged private key and make everyone believe it was signed by Bob. The approach is also vulnerable if Alice directly requests Bob’s public key. Eve can intercept Bob’s response and substitute her own forged public key for Bob’s public key.

CSR has a Public Key.

CA signs it.

Certificate is a proof of public key.

Encrypt using public key & receiver decrypts using private key.

There are two types of certificate authorities (CAs), root CAs and intermediate CAs.

Certificate 1 - Issued To: example.com; Issued By: Intermediate CA 1
Certificate 2 - Issued To: Intermediate CA 1; Issued By: Intermediate CA 2
Certificate 3 - Issued To: Intermediate CA 2; Issued By: Intermediate CA 3
Certificate 4 - Issued To: Intermediate CA 3; Issued By: Root CA

Root CA certificates, on the other hand, are "Issued To" and "Issued By" themselves.
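The chain above can be walked by following each certificate's "Issued By" name until a self-issued certificate is reached. The sketch below models only the names, using a plain dictionary; it does not verify any signature, and `build_chain` and `CERTS` are illustrative names.

```python
def build_chain(leaf: str, issued_by: dict) -> list:
    """Follow 'Issued By' links from the leaf up to a self-issued root."""
    chain = [leaf]
    while True:
        subject = chain[-1]
        issuer = issued_by.get(subject)
        if issuer is None:
            raise ValueError(f"no certificate found for {subject!r}")
        if issuer == subject:        # issued to and by itself: the root CA
            return chain
        if issuer in chain:
            raise ValueError("issuer loop detected")
        chain.append(issuer)


# Issued To -> Issued By, as in the listing above, plus the self-issued root.
CERTS = {
    "example.com": "Intermediate CA 1",
    "Intermediate CA 1": "Intermediate CA 2",
    "Intermediate CA 2": "Intermediate CA 3",
    "Intermediate CA 3": "Root CA",
    "Root CA": "Root CA",
}

print(build_chain("example.com", CERTS))
```

If one intermediate entry is missing from the dictionary, the walk fails, which is the same reason a server has to present the full bundle of intermediate certificates.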

For enhanced security purposes, most end user certificates today are issued by intermediate certificate authorities.

Installing an intermediate CA signed certificate on a web server or load balancer usually requires installing a bundle of certificates.

The CA will also provide a so called intermediate CA file or chain certificate. It proves that your chosen CA is trusted by one of the root CAs. You will need the intermediate CA certificate as 'chain' certificate in your clientssl profile.

A nonce is a "number used once".

In an asymmetric key encryption scheme, anyone can encrypt messages using the public key, but only the holder of the paired private key can decrypt. Security depends on the secrecy of the private key.

In the Diffie–Hellman key exchange scheme, each party generates a public/private key pair and distributes the public key. After obtaining an authentic copy of each other's public keys, Alice and Bob can compute a shared secret offline. The shared secret can be used, for instance, as the key for a symmetric cipher.


 * Public-key encryption, in which a message is encrypted with a recipient's public key. The message cannot be decrypted by anyone who does not possess the matching private key, who is thus presumed to be the owner of that key and the person associated with the public key. This is used in an attempt to ensure confidentiality.


 * Digital signatures, in which a message is signed with the sender's private key and can be verified by anyone who has access to the sender's public key. This verification proves that the sender had access to the private key, and therefore is likely to be the person associated with the public key. This also ensures that the message has not been tampered with, as any manipulation of the message will result in changes to the encoded message digest, which otherwise remains unchanged between the sender and receiver.
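Both uses can be illustrated with textbook RSA and deliberately tiny numbers. This is a toy sketch only: the primes, exponent, and function names are assumptions chosen for the example, there is no padding or hashing, and nothing here is secure. It needs Python 3.8 or later for the modular inverse via `pow`.

```python
P, Q = 61, 53
N = P * Q                      # public modulus (3233)
PHI = (P - 1) * (Q - 1)        # 3120
E = 17                         # public exponent
D = pow(E, -1, PHI)            # private exponent: inverse of E modulo PHI


def encrypt(message: int) -> int:
    """Anyone can encrypt with the public key (E, N)."""
    return pow(message, E, N)


def decrypt(ciphertext: int) -> int:
    """Only the holder of the private key D can decrypt."""
    return pow(ciphertext, D, N)


def sign(digest: int) -> int:
    """The sender signs a message digest with the private key."""
    return pow(digest, D, N)


def verify(digest: int, signature: int) -> bool:
    """Anyone can check the signature with the public key."""
    return pow(signature, E, N) == digest


secret = 65
print(decrypt(encrypt(secret)) == secret)   # True
sig = sign(65)
print(verify(65, sig), verify(66, sig))     # True False
```

The last line shows the tamper check: a signature made over digest 65 does not verify against a different digest.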

= TCP =

Source: TCP/IP Protocol-Suite, B.Forouzan


 * TCP uses the services of IP, a connectionless protocol, but itself is connection-oriented.
 * TCP uses the services of IP to deliver individual segments to the receiver, but it controls the connection itself.
 * If a segment is lost or corrupted, it is retransmitted. IP is unaware of this retransmission.
 * If a segment arrives out of order, TCP holds it until the missing segments arrive; IP is unaware of this reordering.
 * The sequence number of a segment is the number of the first byte it carries.
 * Together with the length of the segment's data, we know which segment carries which bytes.
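A small sketch of that bookkeeping, assuming 32-bit sequence numbers that wrap around. The function names and the sample numbers are illustrative.

```python
SEQ_MODULUS = 2**32   # TCP sequence numbers are 32 bits wide


def byte_range(seq: int, length: int):
    """Return (first, last) byte numbers carried by a segment, or None if empty."""
    if length == 0:
        return None
    return seq, (seq + length - 1) % SEQ_MODULUS


def next_seq(seq: int, length: int) -> int:
    """Sequence number the next data segment will use."""
    return (seq + length) % SEQ_MODULUS


print(byte_range(10001, 1000))   # (10001, 11000)
print(next_seq(10001, 1000))     # 11001
```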

TCP Connection
 * TCP transmits data in full-duplex mode.
 * When two TCPs in two machines are connected, they are able to send segments to each other simultaneously.
 * In TCP, connection-oriented transmission requires three phases: connection establishment, data transfer, and connection termination.

Three way handshake

 * The process starts with the server.
 * The server program tells its TCP that it is ready to accept a connection.
 * This request is called a passive open.
 * The client program issues a request for an active open.
 * A client that wishes to connect to an open server tells its TCP to connect to a particular server.
 * TCP can now start the three-way handshaking process




 * 1st Packet:
 * SYN segment is for synchronization of sequence numbers.
 * The client in our example chooses a random number as the first sequence number and sends this number to the server.
 * This sequence number is called the initial sequence number(ISN).
 * This segment does not contain an acknowledgment number.
 * It does not define the window size either; a window size definition makes sense only when a segment includes an acknowledgment.
 * The segment can also include some options - WS, MSS, SACK_PERM
 * Note that the SYN segment is a control segment and carries no data.
 * However, it consumes one sequence number.
 * When the data transfer starts, the ISN is incremented by 1.
 * We can say that the SYN segment carries no real data, but we can think of it as containing one imaginary byte.


 * 2nd Packet:
 * The server sends the second segment, a SYN + ACK segment with two flag bits set: SYN and ACK.
 * This segment has a dual purpose.
 * First, it is a SYN segment for communication in the other direction.
 * The server uses this segment to initialize a sequence number for numbering the bytes sent from the server to the client.
 * The server also acknowledges the receipt of the SYN segment from the client by setting the ACK flag and displaying the next sequence number it expects to receive from the client.
 * Because it contains an acknowledgment, it also needs to define the receive window size, rwnd (to be used by the client).


 * 3rd Packet:
 * The client sends the third segment which is just an ACK segment.
 * It acknowledges the receipt of the second segment with the ACK flag and acknowledgment number field.
 * Note that the sequence number in this segment is the client ISN plus 1, the same number the first data segment will use, because the ACK segment does not consume any sequence numbers.
 * The client must also define the server window size.
 * In general, the third segment usually does not carry data and consumes no sequence numbers.


 * Note:
 * A SYN cannot carry data, but it consumes one Sequence number.
 * A SYN+ACK cannot carry data, but consumes one Sequence number.
 * An ACK, if carrying no data, consumes no sequence number.
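The sequence and acknowledgment numbers of the three segments can be sketched as below. The numbering follows the TCP specification (RFC 9293), in which the SYN and the SYN+ACK each consume one sequence number and the final ACK carries the client ISN plus one. The function name, the dictionary layout, and the sample ISNs are assumptions for illustration.

```python
SEQ_MODULUS = 2**32


def three_way_handshake(client_isn: int, server_isn: int) -> list:
    """Return the three handshake segments with their seq/ack numbers."""
    syn = {"from": "client", "flags": "SYN", "seq": client_isn, "ack": None}
    syn_ack = {
        "from": "server",
        "flags": "SYN+ACK",
        "seq": server_isn,
        "ack": (client_isn + 1) % SEQ_MODULUS,   # the SYN consumed one number
    }
    ack = {
        "from": "client",
        "flags": "ACK",
        "seq": (client_isn + 1) % SEQ_MODULUS,   # the ACK itself consumes none
        "ack": (server_isn + 1) % SEQ_MODULUS,   # the SYN+ACK consumed one number
    }
    return [syn, syn_ack, ack]


for segment in three_way_handshake(8000, 15000):
    print(segment)
```

With these ISNs the first data byte from the client is numbered 8001 and the first from the server 15001.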

Simultaneous Open

 * A simultaneous open is a rare situation in which both processes issue an active open.
 * In this case, both TCPs transmit a SYN + ACK segment to each other.
 * Only one single connection is established between them.

SYN Flooding Attack

 * TCP handshake is susceptible to SYN flooding attack.
 * This happens when a malicious attacker sends a large number of SYN segments.
 * The server, assuming that the clients are issuing an active open, allocates the necessary resources and sets timers.
 * The TCP server then sends the SYN+ACK segments to the fake clients, which are lost.
 * While the server waits for the third packet, resources are allocated without being used.
 * If the number of SYN segments is large, the server eventually runs out of resources.
 * It may be unable to accept connection requests from valid clients.
 * The SYN flooding attack belongs to the group of denial-of-service attacks.
 * One strategy is to postpone resource allocation until the server can verify that the connection request is coming from a valid IP address, by using a Cookie.
 * SCTP uses this strategy.
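The cookie idea can be sketched as follows: the server derives its initial sequence number from the connection details and a secret, stores nothing, and only allocates resources when the returning ACK proves the client saw that number. This is a simplified illustration, not the format used by any real TCP stack (real SYN cookies also encode a timestamp and the MSS); all names and sample addresses are assumptions.

```python
import hashlib
import hmac
import os

SECRET = os.urandom(16)   # server-side secret, never sent on the wire


def make_cookie(src_ip, src_port, dst_ip, dst_port, client_isn, secret=SECRET):
    """Derive a 32-bit server ISN from the connection details and the secret."""
    msg = f"{src_ip}:{src_port}>{dst_ip}:{dst_port}:{client_isn}".encode()
    digest = hmac.new(secret, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big")


def check_ack(src_ip, src_port, dst_ip, dst_port, client_isn, ack_number,
              secret=SECRET):
    """Accept the third segment only if it acknowledges cookie + 1."""
    cookie = make_cookie(src_ip, src_port, dst_ip, dst_port, client_isn, secret)
    return ack_number == (cookie + 1) % 2**32


conn = ("198.51.100.7", 40000, "192.0.2.1", 80, 1234)
cookie = make_cookie(*conn)
print(check_ack(*conn, (cookie + 1) % 2**32))   # True: client received the SYN+ACK
print(check_ack(*conn, (cookie + 2) % 2**32))   # False: wrong acknowledgment number
```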

Data Transfer

 * After connection is established, bidirectional data transfer can take place.
 * The client and server can send data and acknowledgments in both directions.
 * Data traveling in the same direction as an acknowledgment are carried on the same segment.
 * The acknowledgment is piggybacked with the data.

Connection Termination
 * Either of the two parties involved in exchanging data (client or server) can close the connection, although it is usually initiated by the client.
 * Most implementations today allow two options for connection termination: three-way handshaking, and four-way handshaking with a half-close option.

Three-Way Termination



 * 1st Packet:
 * The client TCP, after receiving a close command from the client process, sends the FIN segment.
 * A FIN segment can include the last chunk of data sent by the client or it can be just a control segment.
 * If it is only a control segment, it consumes only one sequence number.


 * 2nd Packet:
 * The server TCP, after receiving the FIN, informs its process.
 * It then sends a FIN+ACK to confirm the receipt of the FIN from the client and to announce the closing of the connection in the other direction.
 * This segment can also contain the last chunk of data from the server.
 * If it does not carry data, it consumes only one sequence number.


 * 3rd Packet:
 * The client TCP sends an ACK segment to confirm the receipt of the FIN from the TCP server.
 * This segment contains the acknowledgment number, which is one plus the sequence number received in the FIN segment from the server.
 * This segment cannot carry data and consumes no sequence numbers.


 * Note:
 * The FIN segment consumes one sequence number if it does not carry data.
 * The FIN + ACK segment consumes one sequence number if it does not carry data.

Half-Close

 * In TCP, one end can stop sending data while still receiving data. This is called a Half-Close.
 * Either the server or the client can issue a half-close request.
 * It can occur when the server needs all the data before processing can begin.
 * An example is sorting.
 * When the client sends data to the server to be sorted, the server needs to receive all the data before sorting can start.
 * This means the client, after sending all data, can close the connection in the client-to-server direction.
 * However, the server-to-client direction must remain open to return the sorted data.
 * The server, after receiving the data, still needs time for sorting; its outbound direction must remain open.




 * The data transfer from the client to the server stops.
 * The client half-closes the connection by sending a FIN segment.
 * The server accepts the half-close by sending the ACK segment.
 * The server, however, can still send data.
 * When the server has sent all of the processed data, it sends a FIN segment, which is acknowledged by an ACK from the client.
 * After half closing the connection, data can travel from server to client and acknowledgments can travel from client to server.
 * The client cannot send any more data to the server.
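The sorting example above can be tried on a loopback connection. This is a minimal sketch, not a reference implementation: the function names, the space-separated text protocol and the use of port 0 (let the OS pick a free port) are all assumptions made for illustration. The half-close itself is the standard `shutdown(SHUT_WR)` socket call, which sends the FIN for the client-to-server direction while leaving the other direction open.

```python
import socket
import threading


def _sorting_server(listener: socket.socket) -> None:
    """Accept one client, read until its FIN, then return the sorted data."""
    conn, _ = listener.accept()
    with conn:
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # empty read: the client has half-closed (FIN received)
                break
            chunks.append(data)
        numbers = sorted(int(x) for x in b"".join(chunks).split())
        # The server-to-client direction is still open, so the reply can be sent.
        conn.sendall(" ".join(map(str, numbers)).encode())
    # Leaving the with-block closes the socket, which sends the server's FIN.


def sort_via_half_close(numbers):
    """Send numbers to a local sorting server, half-close, read the reply."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: any free port (illustrative)
    listener.listen(1)
    port = listener.getsockname()[1]
    worker = threading.Thread(target=_sorting_server, args=(listener,))
    worker.start()

    with socket.create_connection(("127.0.0.1", port)) as client:
        client.sendall(" ".join(map(str, numbers)).encode())
        client.shutdown(socket.SHUT_WR)  # half-close: no more client data
        reply = b""
        while True:
            data = client.recv(4096)  # receiving still works after half-close
            if not data:
                break
            reply += data

    worker.join()
    listener.close()
    return [int(x) for x in reply.split()]
```

Calling `sort_via_half_close([3, 1, 2])` sends the three numbers, half-closes, and reads back the sorted list. Without the `shutdown` call the server would wait forever for more input, which is exactly why the half-close is needed here.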

Connection Reset
 * TCP at any end may:
 * Deny a connection request
 * Abort an existing connection
 * Terminate an idle connection
 * All of these are done with the RST flag.

Maximum Segment Life

 * The TCP standard defines MSL as being a value of 120 seconds (2 minutes).
 * In modern networks TCP allows implementations to choose a lower value.
 * The common value for MSL is between 30 seconds and 1 minute.
 * The MSL is the maximum time a segment can exist in the Internet before it is dropped.
 * TCP segment is encapsulated in an IP datagram, which has a limited lifetime (TTL).
 * When the IP datagram is dropped, the encapsulated TCP segment is also dropped.

TIME-WAIT state and 2MSL timer
There are two reasons for the existence of the TIME-WAIT state and the 2MSL timer:


 * 1st Reason:
 * If the last ACK segment is lost, the server TCP, which sets a timer for the last FIN, assumes that its FIN is lost and resends it.
 * If the client goes to the CLOSED state and closes the connection before the 2MSL timer expires, it never receives this resent FIN segment, and consequently, the server never receives the final ACK.
 * The server cannot close the connection.
 * The 2MSL timer makes the client wait long enough for an ACK to be lost (one MSL) and a resent FIN to arrive (another MSL).
 * If a new FIN arrives during the TIME-WAIT state, the client sends a new ACK and restarts the 2MSL timer.


 * 2nd Reason:
 * A duplicate segment from one connection might appear in the next one.
 * Assume a client and a server have closed a connection.
 * After a short period of time, they open a connection with the same socket addresses (same source and destination IP addresses and same source and destination port numbers).
 * This new connection is called an incarnation of the old one.
 * A duplicated segment from the previous connection may arrive in this new connection and be interpreted as belonging to the new connection if there is not enough time between the two connections.
 * To prevent this problem, TCP requires that an incarnation cannot occur unless 2MSL amount of time has elapsed.
 * Some implementations, however, ignore this rule if the initial sequence number of the incarnation is greater than the last sequence number used in the previous connection.

TCP Windows
 * TCP uses two windows for each direction of data transfer:
 * Send window
 * Receive window
 * This means four windows for a bidirectional communication.

Send Window



 * The window shown here is of size 100 bytes (normally thousands of bytes).
 * The send window size is dictated by the receiver (flow control) and the congestion in the underlying network (congestion control).
 * The figure shows how a send window opens, closes, or shrinks.

Receive Window


 * TCP allows the receiving process to pull data at its own pace.
 * This means that part of the allocated buffer at the receiver may be occupied by bytes that have been received and acknowledged, but are waiting to be pulled by the receiving process.
 * The receive window size is therefore always smaller than or equal to the buffer size:
rwnd = buffer size − number of waiting bytes to be pulled
 * The receive window size determines the number of bytes that the receive window can accept from the sender before being overwhelmed (flow control).

Flow Control

 * Flow control balances the rate a producer creates data with the rate a consumer can use the data.
 * TCP separates flow control from error control.




 * Data travels from the sending process to the sending TCP, then to the receiving TCP, and finally to the receiving process (paths 1, 2, and 3).
 * Flow control feedback travels from the receiving TCP to the sending TCP and from the sending TCP up to the sending process (paths 4 and 5).
 * Most implementations of TCP do not provide flow control feedback from the receiving process to the receiving TCP; they let the receiving process pull data from the receiving TCP whenever it is ready.
 * Thus receiving TCP controls the sending TCP; the sending TCP controls the sending process.
 * Flow control feedback from the Sending TCP to the Sending Process (path 5) is achieved through simple rejection of data by sending TCP when its window is full.
 * Windows are used to achieve flow control from the receiving TCP to the sending TCP, as discussed in the next section.

Opening and Closing Windows

 * To achieve flow control, TCP forces the sender and the receiver to adjust their window sizes.
 * The size of the buffer for both parties is fixed when the connection is established.
 * The receive window closes (moves its left wall to the right) when more bytes arrive from the sender;
 * It opens (moves its right wall to the right) when more bytes are pulled by the process.
 * Assume that it does not shrink (the right wall does not move to the left).
 * The opening, closing, and shrinking of the send window is controlled by the receiver.
 * The send window closes (moves its left wall to the right) when a new acknowledgement allows it to do so.
 * The send window opens (its right wall moves to the right) when the RWND advertised by the receiver allows it to do so.



The diagram shows 8 segments:

1. The client sends the server a SYN to request a connection. The client announces its ISN = 100. The server allocates a buffer size of 800 (assumption) and sets its window to cover the whole buffer (rwnd = 800). The number of the next byte to arrive starts from 101.

2. This is an ACK + SYN segment. The segment uses ack no = 101 to show that it expects to receive bytes starting from 101. It also announces that the client can set a buffer size of 800 bytes.

3. The third segment is an ACK segment from client to server.

4. After the client has set its window with the size (800) dictated by the server, the process pushes 200 bytes of data. The TCP client numbers these bytes 101 to 300. It creates a segment and sends it to server. The segment has starting byte number as 101 and the segment carries 200 bytes. The window of client is then adjusted to show 200 bytes of data are sent but waiting for acknowledgment. When this segment is received at the server, the bytes are stored, and the receive window closes to show that the next byte expected is byte 301; the stored bytes occupy 200 bytes of buffer.

5. The fifth segment is the feedback from the server to the client. The server acknowledges bytes up to and including 300 (expecting to receive byte 301). The segment also carries the size of the receive window after decrease (600). The client, after receiving this segment, purges the acknowledged bytes from its window and closes its window to show that the next byte to send is byte 301. The window size decreases to 600 bytes. Although the allocated buffer can store 800 bytes, the window cannot open (moving its right wall to the right) because the receiver does not let it.

6. Sent by the client after its process pushes 300 more bytes. The segment defines seq no as 301 and contains 300 bytes. When this segment arrives at the server, the server stores them, but it has to reduce its window size. After its process has pulled 100 bytes of data, the window closes from the left for the amount of 300 bytes, but opens from the right for the amount of 100 bytes. The result is that the size is only reduced 200 bytes. The receiver window size is now 400 bytes.

7. The server acknowledges the receipt of data, and announces that its window size is 400. When this segment arrives at the client, the client has no choice but to reduce its window again and set the window size to the value of rwnd = 400. The send window closes from the left by 300 bytes, and opens from the right by 100 bytes.

8. This one is also from the server after its process has pulled another 200 bytes. Its window size increases. The new rwnd value is now 600. The segment informs the client that the server still expects byte 601, but the server window size has expanded to 600. After this segment arrives at the client, the client opens its window by 200 bytes without closing it. The result is that its window size increases to 600 bytes.
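The receive-window arithmetic in the eight segments above can be replayed with a few lines of code. This is only a sketch of the bookkeeping (rwnd = buffer size minus bytes waiting to be pulled); the class and method names are made up for illustration and do not model sequence numbers or the send side.

```python
class ReceiveBuffer:
    """Tracks rwnd = buffer size - bytes received but not yet pulled."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.waiting = 0  # bytes stored and acknowledged, not yet pulled

    @property
    def rwnd(self) -> int:
        return self.size - self.waiting

    def receive(self, nbytes: int) -> int:
        """Bytes arrive from the sender: the window closes from the left."""
        if nbytes > self.rwnd:
            raise ValueError("sender exceeded the advertised window")
        self.waiting += nbytes
        return self.rwnd

    def pull(self, nbytes: int) -> int:
        """The process pulls bytes: the window opens from the right."""
        if nbytes > self.waiting:
            raise ValueError("cannot pull more bytes than are waiting")
        self.waiting -= nbytes
        return self.rwnd


server = ReceiveBuffer(800)          # segments 1-3: rwnd = 800
after_segment_4 = server.receive(200)   # segment 5 advertises 600
server.receive(300)                  # segment 6 arrives ...
after_segment_6 = server.pull(100)      # ... and 100 bytes are pulled: 400
after_segment_8 = server.pull(200)      # segment 8 advertises 600
```

The three values come out as 600, 400 and 600, matching the rwnd values advertised in segments 5, 7 and 8.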


 * Shrinking of Windows
 * The receive window cannot shrink.
 * The send window can shrink if the receiver defines a value for rwnd that results in shrinking the window.

Window Shutdown

 * Shrinking the send window by moving its right wall to the left is discouraged.
 * There is one exception: the receiver can temporarily shut down the window by sending a RWND of 0.
 * This can happen if the receiver does not want to receive data from the sender for a while.
 * The sender does not actually shrink the size of the window, but stops sending data until a new advertisement has arrived.
 * Even when the window is shut down by an order from the receiver, the sender can always send a segment with 1 byte of data.
 * This is called Probing and is used to prevent a deadlock.

Silly Window Syndrome

 * A serious problem can arise in the sliding window operation when either the sending application program creates data slowly or the receiving application program consumes data slowly, or both.
 * Any of these situations results in the sending of data in very small segments, which reduces the efficiency of the operation.
 * If TCP sends segments containing only 1 byte of data, it means that a 41-byte datagram (20 bytes TCP header and 20 bytes IP header) transfers only 1 byte of user data.
 * The Overhead is 41:1
 * The inefficiency is even worse after accounting for the data link layer and physical layer overhead.


 * Syndrome due to Sender


 * The sending TCP may create a silly window syndrome if it is serving an application program that creates data slowly (e.g. 1 byte at a time).
 * The application program writes 1 byte at a time into the buffer of the sending TCP.
 * If the sending TCP does not have any specific instructions, it may create segments containing 1 byte of data.
 * The result is a lot of 41-byte datagrams that are traveling through an internet.
 * The solution is to prevent the sending TCP from sending the data byte by byte.
 * The sending TCP must be forced to wait and collect data to send in a larger block.
 * If it waits too long, it may delay the process.
 * If it does not wait long enough, it may end up sending small segments.


 * Solution - Nagle’s Algorithm


 * The sending TCP sends the first piece of data it receives from the sending application program even if it is only 1 byte.
 * After sending the first segment, the sending TCP accumulates data in the output buffer and waits until either the receiving TCP sends an acknowledgment or until enough data has accumulated to fill a maximum-size segment.
 * The step above is repeated for the rest of the transmission.
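The three steps of Nagle's algorithm can be sketched as a small state machine. This is a simplified illustration, not how any real stack is written: the class name is invented, one ACK is assumed to acknowledge everything in flight, and windows are ignored.

```python
class NagleSender:
    """Toy sender: small data is held back while earlier data is unacknowledged."""

    def __init__(self, mss: int) -> None:
        self.mss = mss
        self.buffer = b""       # data written by the application, not yet sent
        self.in_flight = False  # True while sent data awaits an ACK
        self.sent = []          # segments handed to the network, in order

    def write(self, data: bytes) -> None:
        """Called when the application produces data."""
        self.buffer += data
        self._try_send()

    def ack_received(self) -> None:
        """Simplification: one ACK covers everything that was in flight."""
        self.in_flight = False
        self._try_send()

    def _try_send(self) -> None:
        # Enough data for a maximum-size segment: send it right away.
        while len(self.buffer) >= self.mss:
            self._emit(self.mss)
        # Less than a full segment: send only if nothing is awaiting an ACK.
        if self.buffer and not self.in_flight:
            self._emit(len(self.buffer))

    def _emit(self, nbytes: int) -> None:
        self.sent.append(self.buffer[:nbytes])
        self.buffer = self.buffer[nbytes:]
        self.in_flight = True


sender = NagleSender(mss=4)
sender.write(b"a")      # first piece goes out immediately, even though it is 1 byte
sender.write(b"b")      # held: "a" is still unacknowledged
sender.write(b"c")      # held as well
sender.ack_received()   # the ACK releases the accumulated "bc" as one segment
```

After this sequence `sender.sent` holds two segments, `b"a"` and `b"bc"`, instead of three 1-byte segments.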


 * Syndrome Created by the Receiver


 * The syndrome may occur if the receiving TCP is serving an application that consumes data slowly (e.g. 1 byte at a time).
 * Assume that the sender creates data in blocks of 1000 bytes, but the receiver consumes data 1 byte at a time.
 * Also assume that the input buffer of the receiving TCP is 4 kilobytes. The sender sends the first 4 kilobytes of data.
 * The receiver stores it in its buffer.
 * Now its buffer is full.
 * It advertises a window size of zero, which means the sender should stop sending data.
 * The receiving application reads the first byte of data from the input buffer of the receiving TCP.
 * Now there is 1 byte of space in the incoming buffer.
 * The receiving TCP announces a window size of 1 byte, which means that the sending TCP takes this advertisement as good news and sends a segment carrying only 1 byte of data.
 * The procedure will continue.
 * One byte of data is consumed and a segment carrying 1 byte of data is sent.
 * This is again an efficiency problem.


 * Two solutions are possible


 * Clark’s Solution


 * Acknowledge data as soon as it arrives, but announce a window size of zero until either:
 * 1) There is enough space to accommodate a segment of maximum size, or
 * 2) At least half of the receive buffer is empty.
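Clark's rule amounts to a single test on the free buffer space. A minimal sketch, with a hypothetical function name and parameters chosen for illustration:

```python
def clark_advertised_window(buffer_size: int, free_space: int, mss: int) -> int:
    """Window to advertise under Clark's solution.

    Advertise zero until the free space can hold a maximum-size segment
    or at least half of the receive buffer is empty.
    """
    if free_space >= mss or 2 * free_space >= buffer_size:
        return free_space
    return 0
```

With the 4-kilobyte buffer from the example above and an assumed MSS of 1000 bytes, `clark_advertised_window(4096, 1, 1000)` returns 0, so the sender is never tempted to send a 1-byte segment; once 1000 bytes are free the real window is advertised.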


 * Delayed Acknowledgment


 * The second solution is to delay sending the acknowledgment.
 * This means that when a segment arrives, it is not acknowledged immediately.
 * The receiver waits until there is a decent amount of space in its incoming buffer before acknowledging the arrived segments.
 * The delayed acknowledgment prevents the sending TCP from sliding its window.
 * After the sending TCP has sent the data in the window, it stops.
 * This removes the syndrome.
 * Delayed acknowledgment also has another advantage: it reduces traffic.
 * The receiver does not have to acknowledge each segment.
 * However, there also is a disadvantage in that the delayed acknowledgment may result in the sender unnecessarily retransmitting the unacknowledged segments.
 * TCP limits this risk by requiring that an acknowledgment not be delayed by more than 500 ms.

Error Control

 * TCP is a reliable transport layer protocol.
 * This means that an application program that delivers a stream of data to TCP relies on TCP to deliver the entire stream to the application program on the other end in order, without error, and without any part lost or duplicated.
 * TCP provides reliability using error control. Error control includes mechanisms for detecting and resending corrupted segments, resending lost segments, storing out-of-order segments until missing segments arrive, and detecting and discarding duplicated segments.
 * Error control in TCP is achieved through the use of three simple tools: checksum, acknowledgment, and time-out.

Checksum

 * Each segment includes a checksum field, which is used to check for a corrupted segment.
 * If a segment is corrupted, as detected by an invalid checksum, the segment is discarded by the destination TCP and is considered lost.
 * TCP uses a 16-bit checksum that is mandatory in every segment.
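The 16-bit checksum is the Internet checksum: the one's-complement of the one's-complement sum of all 16-bit words. The sketch below shows only that arithmetic on a byte string. A real TCP checksum is computed over a pseudoheader, the TCP header and the data, which is not reproduced here; the function name is illustrative.

```python
def internet_checksum(data: bytes) -> int:
    """One's-complement of the one's-complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


payload = bytes.fromhex("0001f203f4f5f6f7")
checksum = internet_checksum(payload)
# The receiver repeats the sum over data plus checksum; 0 means no error detected.
verified = internet_checksum(payload + checksum.to_bytes(2, "big"))
```

For this payload the checksum is 0x220D, and recomputing over payload plus checksum gives 0, which is the receiver's acceptance test.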

Acknowledgment

 * TCP uses acknowledgments to confirm the receipt of data segments.
 * Control segments that carry no data but consume a sequence number are also acknowledged.
 * ACK segments are never acknowledged.

 * Acknowledgment Type:
 * There are two types of acknowledgment: cumulative and selective.


 * Cumulative Acknowledgment (ACK)
 * TCP was originally designed to acknowledge receipt of segments cumulatively.
 * The receiver advertises the next byte it expects to receive, ignoring all segments received and stored out of order.
 * Also called Positive Cumulative Acknowledgment or ACK.
 * “Positive” indicates that no feedback is provided for discarded, lost, or duplicate segments.
 * The 32-bit ACK field in the TCP header is used for cumulative acknowledgments.
 * Its value is valid only when the ACK flag bit is set to 1.


 * Selective Acknowledgment (SACK)
 * A SACK does not replace ACK, but reports additional information to the sender.
 * A SACK reports a block of data that is out of order.
 * It also reports a block of segments that is duplicated.
 * There is no provision in the TCP header for adding this type of information.
 * SACK is implemented as an option at the end of the TCP header.


 * Acknowledgment Generation

1. When end A sends a data segment to end B, it must include (piggyback) an acknowledgment that gives the next sequence number it expects to receive. This rule decreases the number of segments needed and therefore reduces traffic.

2. When the receiver has no data to send and it receives an in-order segment (with expected sequence number) and the previous segment has already been acknowledged, the receiver delays sending an ACK segment until another segment arrives or until a period of time (normally 500 ms) has passed. In other words, the receiver needs to delay sending an ACK segment if there is only one outstanding in-order segment. This rule reduces ACK segment traffic.

3. When a segment arrives with a sequence number that is expected by the receiver, and the previous in-order segment has not been acknowledged, the receiver immediately sends an ACK segment. In other words, there should not be more than two in-order unacknowledged segments at any time. This prevents the unnecessary retransmission of segments that may create congestion in the network.

4. When a segment arrives with an out-of-order sequence number that is higher than expected, the receiver immediately sends an ACK segment announcing the sequence number of the next expected segment. This leads to the fast retransmission of missing segments.

5. When a missing segment arrives, the receiver sends an ACK segment to announce the next sequence number expected. This informs the sender that segments reported missing have been received.

6. If a duplicate segment arrives, the receiver discards the segment, but immediately sends an acknowledgment indicating the next in-order segment expected. This solves some problems when an ACK segment itself is lost.

Retransmission

 * The heart of the error control mechanism is the retransmission of segments.
 * When a segment is sent, it is stored in a queue until it is acknowledged.
 * When the retransmission timer expires or when the sender receives three duplicate ACKs for the first segment in the queue, that segment is retransmitted.


 * Retransmission after RTO


 * The sending TCP maintains one retransmission time-out (RTO) for each connection.
 * When the timer matures, i.e. times out, TCP sends the segment in the front of the queue (the segment with the smallest sequence number) and restarts the timer.
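 * Note that this assumes Sf < Sn, i.e. the first outstanding byte (Sf) precedes the next byte to be sent (Sn), so there is unacknowledged data in the queue.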
 * This version of TCP is sometimes referred to as Tahoe.
 * We will see later that the value of RTO is dynamic in TCP and is updated based on the round-trip time (RTT) of segments.
 * RTT is the time needed for a segment to reach a destination and for an acknowledgment to be received.


 * Retransmission after Three Duplicate ACK Segments (Reno)


 * The previous rule about retransmission of a segment is sufficient if the value of RTO is not large.
 * To help throughput by allowing sender to retransmit sooner than waiting for a time out, most implementations today follow the three duplicate ACKs rule and retransmit the missing segment immediately.
 * This feature is called fast retransmission, and the version of TCP that uses this feature is referred to as Reno.
 * In this version, if three duplicate acknowledgments (i.e., an original ACK plus three exactly identical copies) arrive for a segment, the next segment is retransmitted without waiting for the time-out.


 * Out-of-Order Segments


 * TCP implementations today do not discard out-of-order segments.
 * They store them temporarily and flag them as out-of-order segments until the missing segments arrive.
 * Out-of-order segments are never delivered to the process.
 * TCP guarantees that data are delivered to the process in order.


 * Lost Segment




 * A lost segment is discarded somewhere in the network; a corrupted segment is discarded by the receiver itself.
 * Both are considered lost.
 * We are assuming that data transfer is unidirectional: one site is sending, the other receiving.
 * In our scenario, the sender sends segments 1 and 2, which are acknowledged immediately by an ACK (rule 3).
 * Segment 3, however, is lost.
 * The receiver receives segment 4, which is out of order.
 * The receiver stores the data in the segment in its buffer but leaves a gap to indicate that there is no continuity in the data.
 * The receiver immediately sends an acknowledgment to the sender displaying the next byte it expects (rule 4).
 * Note that the receiver stores bytes 801 to 900, but never delivers these bytes to the application until the gap is filled.
 * The sender TCP keeps one RTO timer for the whole period of connection.
 * When the third segment times out, the sending TCP resends segment 3, which arrives this time and is acknowledged properly (rule 5).
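The receiver's side of the lost-segment scenario can be replayed with a small cumulative-ACK model. This is a sketch under assumptions: the class name is invented, segments are taken to be 100 bytes each with segment 1 starting at byte 501 (chosen so that segment 4 covers bytes 801 to 900 as in the note above), segments never partially overlap, and delayed ACKs are ignored.

```python
class CumulativeReceiver:
    """Stores out-of-order segments and always ACKs the next byte expected."""

    def __init__(self, first_byte: int) -> None:
        self.next_expected = first_byte
        self.out_of_order = {}  # starting sequence number -> length
        self.delivered = []     # (first byte, last byte) handed to the process

    def on_segment(self, seq: int, length: int) -> int:
        """Process one arriving segment and return the ACK number to send."""
        if seq == self.next_expected:
            self._deliver(seq, length)
            # The gap is filled: release any stored segments that now fit.
            while self.next_expected in self.out_of_order:
                start = self.next_expected
                self._deliver(start, self.out_of_order.pop(start))
        elif seq > self.next_expected:
            self.out_of_order[seq] = length  # store, but do not deliver yet
        # seq < next_expected: duplicate, discarded
        return self.next_expected

    def _deliver(self, seq: int, length: int) -> None:
        self.delivered.append((seq, seq + length - 1))
        self.next_expected = seq + length


rx = CumulativeReceiver(first_byte=501)
acks = [
    rx.on_segment(501, 100),  # segment 1
    rx.on_segment(601, 100),  # segment 2
    rx.on_segment(801, 100),  # segment 4 arrives; segment 3 was lost
    rx.on_segment(701, 100),  # segment 3, resent after the time-out
]
```

The ACK numbers are 601, 701, 701 and 901: the out-of-order segment triggers a repeat of 701, and once the resent segment 3 fills the gap the ACK jumps past the stored bytes 801 to 900.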


 * Fast Retransmission


 * Here RTO has a larger value.
 * Each time the receiver receives the fourth, fifth, and sixth segments, it triggers an acknowledgment (rule 4).
 * The sender receives four acknowledgments with the same value (three duplicates).
 * Although the timer has not matured, the rule for fast retransmission requires that segment 3, the segment that is expected by all of these duplicate acknowledgments, be resent immediately.
 * After resending this segment, the timer is restarted.
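On the sender's side, fast retransmission only needs a counter of identical ACK numbers. A minimal sketch (the class name and the ACK value 701 are illustrative assumptions):

```python
class DuplicateAckCounter:
    """Signals a fast retransmit when the third duplicate ACK arrives."""

    def __init__(self) -> None:
        self.last_ack = None
        self.duplicates = 0

    def on_ack(self, ack_no: int) -> bool:
        """Return True exactly when this ACK is the third duplicate."""
        if ack_no == self.last_ack:
            self.duplicates += 1
            return self.duplicates == 3
        self.last_ack = ack_no  # a new ACK number resets the count
        self.duplicates = 0
        return False


counter = DuplicateAckCounter()
# One original ACK followed by three identical copies.
decisions = [counter.on_ack(701) for _ in range(4)]
```

`decisions` is `[False, False, False, True]`: the original ACK and the first two duplicates do nothing, and the third duplicate triggers the retransmission of the segment starting at byte 701.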


 * Delayed Segment
 * TCP uses the services of IP, which is a connectionless protocol.
 * Each IP datagram encapsulating a TCP segment may reach the final destination through a different route with a different delay.
 * Hence TCP segments may be delayed.
 * Delayed segments sometimes may time out.
 * If the delayed segment arrives after it has been resent, it is considered a duplicate segment and discarded.


 * Duplicate Segment
 * A duplicate segment can be created, for example, by a sending TCP when a segment is delayed and treated as lost, so that it is retransmitted before the delayed original arrives.
 * Handling the duplicated segment is a simple process for the destination TCP.
 * The destination TCP expects a continuous stream of bytes.
 * When a segment arrives that contains a sequence number equal to an already received and stored segment, it is discarded.
 * An ACK is sent with ackNo defining the expected segment.


 * Automatically Corrected Lost ACK




 * A key advantage of using cumulative acknowledgments.
 * Figure shows a lost acknowledgment sent by the receiver of data.
 * In the TCP acknowledgment mechanism, a lost acknowledgment may not even be noticed by the source TCP.
 * TCP uses a cumulative acknowledgment system.
 * We can say that the next acknowledgment automatically corrects the loss of the acknowledgment.




 * If the next acknowledgment is delayed for a long time or there is no next acknowledgment (the lost acknowledgment is the last one sent), the correction is triggered by the RTO timer.
 * A duplicate segment is the result.
 * When the receiver receives a duplicate segment, it discards it, and resends the last ACK immediately to inform the sender that the segment or segments have been received.
 * Note that only one segment is retransmitted although two segments are not acknowledged.
 * When the sender receives the retransmitted ACK, it knows that both segments are safe and sound because acknowledgment is cumulative.


 * Deadlock Created by Lost Acknowledgment


 * There is one situation in which loss of an acknowledgment may result in system deadlock.
 * This is the case in which a receiver sends an acknowledgment with rwnd set to 0 and requests that the sender shut down its window temporarily.
 * After a while, the receiver wants to remove the restriction; however, if it has no data to send, it sends an ACK segment and removes the restriction with a nonzero value for rwnd.
 * A problem arises if this acknowledgment is lost.
 * The sender is waiting for an acknowledgment that announces the nonzero rwnd.
 * The receiver thinks that the sender has received this and is waiting for data.
 * This situation is called a deadlock; each end is waiting for a response from the other end and nothing is happening.
 * A retransmission timer is not set.
 * To prevent deadlock, a persistence timer was designed.

Congestion Control

 * Congestion control in TCP is based on both open-loop and closed-loop mechanisms.
 * TCP uses a congestion window and a congestion policy that avoid congestion and detect and alleviate congestion after it has occurred.

 * Congestion Window
 * It is not only the receiver that can dictate to the sender the size of the sender’s window.
 * The network can also dictate the size.
 * If the network cannot deliver the data as fast as it is created by the sender, it must tell the sender to slow down.
 * So the receiver and the network together determine the size of the sender’s window.
 * The sender has two pieces of information: the receiver-advertised window size (rwnd) and the congestion window size (cwnd).
 * The actual size of the window is the minimum of these two:
Actual window size = Minimum (rwnd, cwnd)


 * Congestion Policy
 * TCP’s general policy for handling congestion is based on three phases:
 * Slow Start
 * Congestion Avoidance
 * Congestion Detection


 * In the slow start phase, the sender starts with a slow rate of transmission, but increases the rate rapidly to reach a threshold.
 * When the threshold is reached, the rate of increase is reduced.
 * Finally if ever congestion is detected, the sender goes back to the slow start or congestion avoidance phase, based on how the congestion is detected.


 * Slow Start - Exponential Increase




 * The slow start algorithm is based on the idea that the size of the congestion window (cwnd) starts with 1 MSS.
 * The MSS is determined during connection establishment using an option of the same name.
 * The size of the window increases one MSS each time one acknowledgement arrives.
 * The algorithm starts slowly, but grows exponentially.
 * Assume that rwnd is much larger than cwnd, so that the sender window size always equals cwnd.
 * Ignore delayed-ACK policy for now and assume that each segment is acknowledged individually.
 * The sender starts with cwnd = 1 MSS.
 * This means that the sender can send only one segment.
 * After the first ACK arrives, the size of the congestion window is increased by 1, which means that cwnd is now 2.
 * Now two more segments can be sent.
 * When two more ACKs arrive, the size of the window is increased by 1 MSS for each ACK, which means cwnd is now 4.
 * Now four more segments can be sent.
 * When four ACKs arrive, the size of the window increases by 4, which means that cwnd is now 8.
 * In the slow start algorithm, the size of the congestion window increases exponentially until it reaches a threshold.


 * Congestion Avoidance - Additive Increase




 * In slow start algorithm, the size of the congestion window increases exponentially.
 * To avoid congestion before it happens, one must slow down this exponential growth.
 * TCP's Congestion avoidance feature increases the cwnd additively instead of exponentially.
 * When the size of the congestion window reaches the slow start threshold, the slow start phase stops and the additive phase begins.
 * Each time the whole “window” of segments is acknowledged, the size of the congestion window is increased by one.
 * A window is the number of segments transmitted during RTT.
 * The increase is based on RTT, not on the number of arrived ACKs.
 * Therefore the size of the congestion window increases additively until congestion is detected.


 * Congestion Detection - Multiplicative Decrease


 * If congestion occurs, the congestion window size must be decreased.
 * The only way a sender can guess that congestion has occurred is the need to retransmit a segment.
 * This is a major assumption made by TCP.
 * Retransmission is needed to recover a missing packet, which is assumed to have been dropped by a router because it was overloaded or congested.
 * Retransmission can occur in one of two cases: when the RTO timer times out or when three duplicate ACKs are received.
 * In both cases, the size of the threshold is dropped to half (multiplicative decrease).

Most TCP implementations have two reactions:

1. If a time-out occurs, there is a stronger possibility of congestion; a segment has probably been dropped in the network and there is no news about the following sent segments.

In this case TCP reacts strongly:
 * a. It sets the value of the threshold to half of the current window size.


 * b. It reduces cwnd back to one segment.


 * c. It starts the slow start phase again.

2. If three duplicate ACKs are received, there is a weaker possibility of congestion; a segment may have been dropped, but some segments after that have arrived safely since three duplicate ACKs were received. This is called fast retransmission and fast recovery.

In this case, TCP has a weaker reaction as shown below:
 * a. It sets the value of the threshold to half of the current window size.


 * b. It sets cwnd to the value of the threshold (some implementations add three segment sizes to the threshold).


 * c. It starts the congestion avoidance phase.
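The three phases and the two reactions can be put together in a per-RTT model of cwnd, measured in units of MSS. This is a rough sketch, not a TCP implementation: the function names and event labels are invented, one "rtt" event means a whole window was acknowledged, the initial threshold of 16 is arbitrary, and the floor of 2 MSS on the threshold is an assumption of this sketch.

```python
def react(cwnd: int, ssthresh: int, event: str):
    """Return the new (cwnd, ssthresh) after one event, in units of MSS."""
    if event == "rtt":  # a full window was acknowledged
        if cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)  # slow start: exponential increase
        else:
            cwnd += 1                       # congestion avoidance: additive
    elif event == "timeout":                # strong reaction
        ssthresh = max(cwnd // 2, 2)        # threshold = half the window
        cwnd = 1                            # back to slow start
    elif event == "3dupacks":               # weaker reaction
        ssthresh = max(cwnd // 2, 2)
        cwnd = ssthresh                     # continue in congestion avoidance
    else:
        raise ValueError(f"unknown event: {event}")
    return cwnd, ssthresh


def trace(events, cwnd: int = 1, ssthresh: int = 16):
    """Return the list of cwnd values seen while replaying the events."""
    history = [cwnd]
    for event in events:
        cwnd, ssthresh = react(cwnd, ssthresh, event)
        history.append(cwnd)
    return history


history = trace(["rtt"] * 6 + ["timeout"] + ["rtt"] * 5 + ["3dupacks", "rtt"])
```

The resulting history is 1, 2, 4, 8, 16, 17, 18 (slow start, then additive increase), 1, 2, 4, 8, 9, 10 after the time-out (threshold halved to 9), and 5, 6 after the three duplicate ACKs (threshold and cwnd both set to 5).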

TCP Timers
Most TCP implementations use at least four timers:
 * Retransmission
 * Persistence
 * Keepalive
 * TIME-WAIT


 * Retransmission Timer

To retransmit lost segments, TCP employs one retransmission timer for the whole connection period that handles the retransmission time-out (RTO), the waiting time for an acknowledgment of a segment.

The following rules apply to the retransmission timer:

1. When TCP sends the segment in front of the sending queue, it starts the timer.

2. When the timer expires, TCP resends the first segment in front of the queue, and restarts the timer.

3. When a segment (or segments) are cumulatively acknowledged, the segment (or segments) are purged from the queue.

4. If the queue is empty, TCP stops the timer; otherwise, TCP restarts the timer.

To calculate the retransmission time-out (RTO), we first need to calculate the RTT.
 * Round-Trip Time (RTT)
 * Measured RTT - The measured round-trip time for a segment is the time required for the segment to reach the destination and be acknowledged, although the acknowledgment may include other segments. In TCP only one RTT measurement can be in progress at any time.
 * Smoothed RTT - The measured RTT is likely to change for each round trip. The fluctuation is so high in today’s Internet that a single measurement alone cannot be used for retransmission time-out purposes.
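 * RTT Deviation - Most implementations do not rely on the smoothed RTT alone; they also track the RTT deviation, a smoothed measure of how far the measured RTTs stray from the smoothed RTT.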


 * Retransmission Time-out (RTO)
 * The value of RTO is based on the smoothed round-trip time and its deviation.
 * Take the running smoothed average value of Smoothed RTT, and add four times the running smoothed average value of RTT Deviation (normally a small value).


 * Karn’s Algorithm
 * Do not consider the round-trip time of a retransmitted segment in the calculation of RTTs.
 * Do not update the value of RTTs until you send a segment and receive an acknowledgment without the need for retransmission.
 * TCP does not consider the RTT of a retransmitted segment in its calculation of a new RTO.


 * Exponential Backoff
 * Most TCP implementations use an exponential backoff strategy to calculate the value of RTO if a retransmission occurs.
 * The value of RTO is doubled for each retransmission.
 * So if the segment is retransmitted once, the value is two times the RTO.
 * If it is retransmitted twice, the value is four times the RTO.
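The RTO calculation, Karn's algorithm and exponential backoff fit together as sketched below. The smoothing weights (1/8 for the smoothed RTT, 1/4 for the deviation), the initialisation of the deviation to half the first measurement, and the update order follow RFC 6298; they are not stated in these notes. The class name, the 1-second initial RTO and the omission of any minimum or maximum RTO are assumptions of this sketch.

```python
class RtoEstimator:
    """Smoothed RTT, RTT deviation and RTO, with Karn's rule and backoff."""

    ALPHA = 1 / 8  # weight of a new measurement in the smoothed RTT
    BETA = 1 / 4   # weight of a new measurement in the RTT deviation

    def __init__(self, initial_rto: float = 1.0) -> None:
        self.srtt = None    # smoothed RTT
        self.rttvar = None  # RTT deviation
        self.rto = initial_rto

    def on_ack(self, measured_rtt: float, was_retransmitted: bool = False) -> float:
        """Update the RTO from one RTT measurement and return it."""
        if was_retransmitted:
            return self.rto  # Karn's algorithm: ignore ambiguous measurements
        if self.srtt is None:  # first measurement
            self.srtt = measured_rtt
            self.rttvar = measured_rtt / 2
        else:
            self.rttvar = ((1 - self.BETA) * self.rttvar
                           + self.BETA * abs(self.srtt - measured_rtt))
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * measured_rtt
        self.rto = self.srtt + 4 * self.rttvar
        return self.rto

    def on_timeout(self) -> float:
        """Exponential backoff: double the RTO for each retransmission."""
        self.rto *= 2
        return self.rto


est = RtoEstimator()
rto_1 = est.on_ack(1.5)                          # first measurement
rto_2 = est.on_ack(2.5)                          # second measurement
rto_3 = est.on_timeout()                         # a retransmission doubles the RTO
rto_4 = est.on_ack(9.0, was_retransmitted=True)  # ignored by Karn's rule
```

With measurements of 1.5 s and 2.5 s the RTO is 4.5 s and then 4.875 s; the time-out doubles it to 9.75 s, and the ACK for the retransmitted segment leaves it unchanged until a clean measurement arrives.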


 * Persistence Timer
 * To deal with a zero-window-size advertisement, TCP needs a persistence timer.
 * If the receiving TCP announces a window size of zero, the sending TCP stops transmitting segments until the receiving TCP sends an ACK segment announcing a nonzero window size.
 * This ACK segment can be lost.
 * Remember - ACK segments are not acknowledged nor retransmitted in TCP.
 * Both TCPs might continue to wait for each other forever (a deadlock).
 * To correct this deadlock, TCP uses a persistence timer for each connection.
 * When the sending TCP receives an acknowledgment with a window size of zero, it starts a persistence timer.
 * When the persistence timer goes off, the sending TCP sends a special segment called a Probe.
 * This segment contains only 1 byte of new data.
 * It has a sequence number, but its sequence number is never acknowledged; it is even ignored in calculating the sequence number for the rest of the data.
 * The probe causes the receiving TCP to resend the acknowledgment.
 * The value of the persistence timer is set to the value of the retransmission time.
 * If a response is not received from the receiver, another probe segment is sent and the value of the persistence timer is doubled and reset.
 * The sender continues sending the probe segments and doubling and resetting the value of the persistence timer until the value reaches a threshold (generally 60s).
 * After that the sender sends one probe segment every 60 s until the window is reopened.
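
The probe schedule described above can be sketched like this. The starting value of 1.5 s is only an illustrative assumption for the retransmission time, and `persistence_intervals` is a hypothetical name.

```python
def persistence_intervals(initial, probes, threshold=60.0):
    """Waiting time before each of the first `probes` window probes.

    The timer starts at `initial` (the retransmission time), doubles after
    every unanswered probe, and stays at `threshold` once it gets there.
    """
    intervals = []
    current = initial
    for _ in range(probes):
        intervals.append(min(current, threshold))
        current = min(current * 2, threshold)
    return intervals


print(persistence_intervals(1.5, 8))
```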


 * Keepalive Timer
 * A keepalive timer is used in some implementations to prevent a long idle connection between two TCPs.
 * Suppose a client opens a TCP connection to a server, transfers some data, and then becomes silent, perhaps because it has crashed.
 * In this case, the connection would remain open forever.
 * To remedy this situation, most implementations equip a server with a keepalive timer.
 * Each time the server hears from a client, it resets this timer.
 * The time-out is usually 2 hours.
 * If the server does not hear from the client after 2 hours, it sends a probe segment.
 * If there is no response after 10 probes, each of which is 75s apart, it assumes that the client is down and terminates the connection.

Options
The TCP header can have up to 40 bytes of optional information.


 * 1-byte options
 * 1) End of option list
 * 2) No operation


 * Multiple-byte options
 * 1) Maximum Segment Size
 * 2) Window Scale Factor
 * 3) Timestamp
 * 4) SACK-permitted
 * 5) SACK


 * End of Option
 * EOP is a 1-byte option used for padding at the end of the option section.
 * It can only be used as the last option. There are no more options in the header after EOP.
 * Only one occurrence of this option is allowed.
 * After this option, the receiver looks for the payload data.
 * Data from the application program starts at the beginning of the next 32-bit word.


 * No Operation
 * NOP option is also a 1-byte option used as a filler.
 * It normally comes before another option to help align it on a 32-bit (four-byte) boundary.
 * NOP can be used more than once.


 * Maximum Segment Size (MSS)


 * MSS option defines the size of the biggest unit of data that can be received by the destination of the TCP segment.
 * It defines the maximum size of the data, not the maximum size of the segment.
 * The field is 16 bits long, so the value can be 0 to 65,535 bytes.
 * Each party defines the MSS for the segments it will receive during the connection.
 * If a party does not define this, the default value is 536 bytes.
 * The value of MSS is determined during connection establishment and does not change during the connection.


 * Window Scale Factor
 * Window size field in the header defines the size of the sliding window.
 * This field is 16 bits long, which means that the window can range from 0 to 65,535 bytes.
 * It may not be sufficient if the data are traveling through a long channel with a wide bandwidth.
 * To increase the window size, a window scale factor is used.
 * The new window size is found by first raising 2 to the number specified in the window scale factor.
 * Then this result is multiplied by the value of the window size in the header.

New Window Size = Window Size in Header × 2^(Window Scale Factor)

Example: suppose the window scale factor is 3 and an end point receives an acknowledgment in which the window size is advertised as 32,768. Then New Window Size = 32,768 × 2^3 = 262,144 bytes.
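
The same arithmetic as a short sketch; `effective_window` is a hypothetical helper, and the range checks simply encode the 16-bit header field and the maximum scale factor of 14 from these notes.

```python
def effective_window(header_window, scale_factor):
    """Actual window size from the 16-bit header field and the scale factor."""
    if not 0 <= header_window <= 0xFFFF:
        raise ValueError("window field is 16 bits (0..65535)")
    if not 0 <= scale_factor <= 14:
        raise ValueError("window scale factor must be 0..14")
    # Multiplying by 2**scale_factor is a left shift.
    return header_window << scale_factor


print(effective_window(32768, 3))   # the example above
print(effective_window(65535, 14))  # largest possible window, below 2**30
```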


 * Although the scale factor could be as large as 255, the largest value allowed by TCP/IP is 14.
 * The maximum window size is 2^16 × 2^14 = 2^30, which is less than the maximum value for the sequence number.
 * The size of the window cannot be greater than the maximum value of the sequence number.
 * The value of the window scale factor can also be determined only during connection establishment; it does not change during the connection.
 * During data transfer, the size of the window (specified in the header) may be changed, but it must be multiplied by the same window scale factor.
 * One end may set the value of the window scale factor to 0, which means although it supports this option, it does not want to use it for this connection.


 * Timestamp


 * This is a 10-byte option with the format shown in Figure 15.46. Note that the end with the active open announces a timestamp in the connection request segment (SYN segment).
 * If it receives a timestamp in the next segment (SYN + ACK) from the other end, it is allowed to use the timestamp; otherwise, it does not use it any more.
 * The timestamp option has two applications: it measures the round-trip time and prevents wraparound sequence numbers.


 * Measuring RTT


 * Timestamp can be used to measure the round-trip time (RTT).
 * TCP, when ready to send a segment, reads the value of the system clock and inserts this value, a 32-bit number, in the timestamp value field.
 * The receiver, when sending an acknowledgment for this segment or an accumulative acknowledgment that covers the bytes in this segment, copies the timestamp received in the timestamp echo reply.
 * The sender, upon receiving the acknowledgment, subtracts the value of the timestamp echo reply from the time shown by the clock to find RTT.


 * Note that there is no need for the sender’s and receiver’s clocks to be synchronized because all calculations are based on the sender clock.
 * Also note that the sender does not have to remember or store the time a segment left because this value is carried by the segment itself.

 * The receiver needs to keep track of two variables. The first, lastack, is the value of the last acknowledgment sent.
 * The second, tsrecent, is the value of the most recent timestamp that has not yet been echoed.
 * When the receiver receives a segment that contains the byte matching the value of lastack, it inserts the value of the timestamp field in the tsrecent variable.
 * When it sends an acknowledgment, it inserts the value of tsrecent in the echo reply field.
 * In the example, the sender simply inserts the value of the clock (for example, the number of seconds past midnight) in the timestamp field for the first and second segment.
 * When an acknowledgment comes (the third segment), the value of the clock is checked and the value of the echo reply field is subtracted from the current time.
 * RTT is 12 s in this scenario.
 * The receiver’s function is more involved.
 * It keeps track of the last acknowledgment sent (12000).
 * When the first segment arrives, it contains the bytes 12000 to 12099.
 * The first byte is the same as the value of lastack.
 * It then copies the timestamp value (4720) into the tsrecent variable.
 * The value of lastack is still 12000 (no new acknowledgment has been sent).
 * When the second segment arrives, since none of the byte numbers in this segment include the value of lastack, the value of the timestamp field is ignored.
 * When the receiver decides to send an accumulative acknowledgment with acknowledgment 12200, it changes the value of lastack to 12200 and inserts the value of tsrecent in the echo reply field.
 * The value of tsrecent will not change until it is replaced by a new segment that carries byte 12200 (next segment).
 * Note that as the example shows, the RTT calculated is the time difference between sending the first segment and receiving the third segment.
 * This is actually the meaning of RTT: the time difference between a packet sent and the acknowledgment received.
 * The third segment carries the acknowledgment for the first and second segments.
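
The receiver bookkeeping in this example can be sketched as below. The class name `TimestampReceiver` is hypothetical, the timestamp 4725 on the second segment is an invented value (the notes do not give it), and the arrival time 4732 is inferred from the stated RTT of 12 s.

```python
class TimestampReceiver:
    """Tracks lastack and tsrecent the way the notes describe."""

    def __init__(self, lastack):
        self.lastack = lastack   # last acknowledgment number sent
        self.tsrecent = None     # timestamp waiting to be echoed

    def on_segment(self, first_byte, last_byte, tsval):
        # Only a segment that contains byte `lastack` updates tsrecent.
        if first_byte <= self.lastack <= last_byte:
            self.tsrecent = tsval

    def send_ack(self, ack_number):
        # The echo reply carries tsrecent; lastack moves forward.
        self.lastack = ack_number
        return ack_number, self.tsrecent


receiver = TimestampReceiver(lastack=12000)
receiver.on_segment(12000, 12099, tsval=4720)  # first segment
receiver.on_segment(12100, 12199, tsval=4725)  # second segment (assumed timestamp), ignored
ack, echo = receiver.send_ack(12200)           # cumulative acknowledgment

sender_clock_on_ack = 4732                     # assumed arrival time of the ACK
rtt = sender_clock_on_ack - echo
print(ack, echo, rtt)
```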


 * PAWS


 * The timestamp option has another application, protection against wrapped sequence numbers (PAWS).
 * The sequence number defined in the TCP protocol is only 32 bits long.
 * Although this is a large number, it could be wrapped around in a high-speed connection.
 * This implies that if a sequence number is n at one time, it could be n again during the lifetime of the same connection.
 * Now if the first segment is duplicated and arrives during the second round of the sequence numbers, the segment belonging to the past is wrongly taken as the segment belonging to the new round.
 * One solution to this problem is to increase the size of the sequence number, but this involves increasing the size of the window as well as the format of the segment and more.
 * The easiest solution is to include the timestamp in the identification of a segment.
 * In other words, the identity of a segment can be defined as the combination of timestamp and sequence number.
 * This means increasing the size of the identification.
 * Two segments 400:12,001 and 700:12,001 definitely belong to different incarnations.
 * The first was sent at time 400, the second at time 700.


 * SACK-Permitted and SACK Options


 * As we discussed before, the acknowledgment field in the TCP segment is designed as an accumulative acknowledgment, which means it reports the receipt of the last consecutive byte: it does not report the bytes that have arrived out of order.
 * It is also silent about duplicate segments.
 * This may have a negative effect on TCP’s performance.
 * If some packets are lost or dropped, the sender must wait until a time-out and then send all packets that have not been acknowledged.
 * The receiver may receive duplicate packets.
 * To improve performance, selective acknowledgment (SACK) was proposed.
 * Selective acknowledgment allows the sender to have a better idea of which segments are actually lost and which have arrived out of order.
 * The new proposal even includes a list for duplicate packets.
 * The sender can then send only those segments that are really lost.
 * The list of duplicate segments can help the sender find the segments which have been retransmitted by a short time-out.
 * The SACK-permitted option of two bytes is used only during connection establishment.
 * The host that sends the SYN segment adds this option to show that it can support the SACK option.
 * If the other end, in its SYN + ACK segment, also includes this option, then the two ends can use the SACK option during data transfer.
 * Note that the SACK-permitted option is not allowed during the data transfer phase.
 * The SACK option, of variable length, is used during data transfer only if both ends agree (if they have exchanged SACK-permitted options during connection establishment).
 * The option includes a list for blocks arriving out of order.
 * Each block occupies two 32-bit numbers that define the beginning and the end of the blocks.
 * We will show the use of this option in examples; for the moment, remember that the allowed size of an option in TCP is only 40 bytes.
 * This means that a SACK option cannot define more than 4 blocks.
 * The information for 5 blocks occupies (5 × 2) × 4 + 2 or 42 bytes, which is beyond the available size for the option section in a segment.
 * If the SACK option is used with other options, then the number of blocks may be reduced.
 * The first block of the SACK option can be used to report the duplicates.
 * This is used only if the implementation allows this feature.
 * The SACK option announces this duplicate data first and then the out-of-order block.
 * This time, however, the duplicated block is not yet acknowledged by ACK, but because it is part of the out-of-order block (4001:5000 is part of 4001:6000), it is understood by the sender that it defines the duplicate data.
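
The block-count limit worked out above can be checked with a few lines. The helper names are hypothetical; the 12-byte figure used for a timestamp option (10 bytes plus two bytes of NOP padding) is an assumption about how it is usually aligned.

```python
TCP_MAX_OPTION_BYTES = 40


def sack_option_length(blocks):
    """Size of a SACK option: 2 bytes of kind/length plus two 32-bit edges per block."""
    return 2 + blocks * 2 * 4


def max_sack_blocks(other_option_bytes=0):
    """Largest number of SACK blocks that fit next to the other options."""
    available = TCP_MAX_OPTION_BYTES - other_option_bytes
    return max((available - 2) // 8, 0)


print(sack_option_length(5))   # too big for the 40-byte option space
print(max_sack_blocks())       # SACK alone
print(max_sack_blocks(12))     # alongside a padded timestamp option
```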

= OSPF BGP Interview Questions =

Hardware
1900 series routers are for small branch offices. They support WAN connectivity up to 25 Mbps and belong to the ISR G2 family, with two integrated Gigabit ports. The 3900 series is for medium and large branch offices and supports up to 375 Mbps; for example, the 3945 has three integrated Gigabit ports, and a T3/E3 card can be installed based on the bandwidth requirement of the site.

ASR1002 - for large branch offices and hub topologies.

Cisco Catalyst 2960G 24- and 48-port switches are EOL and have been replaced by the 2960-X series, which comes in 24-port and 48-port models, supports stacking, and provides a backplane of 80 Gbps.

2960X:

 * Total 10/100/1000 Ethernet ports: 24 or 48
 * Uplinks: 2x10 GE (SFP+) or 4x1 GE (SFP) options
 * FlexStack+: optional on all LAN Base and IP Lite models
 * PoE/PoE+ power available: 370W or 740W

Architecture
Small branch office - up to 50 users. For a small branch it is not necessary to have a multilayer architecture.

Medium branch - up to 100 users. For medium and large branches we should have a multilayer architecture to provide high availability and resiliency.

Large branch - up to 200 users or more.

Redistribution from OSPF to BGP:

All routes redistributed into BGP take the AD value of BGP. In order to redistribute all the OSPF routes, internal and external (E1 and E2), we need to use redistribute ospf process-id match internal external 1 external 2.

Redistribution of BGP into OSPF takes a metric of one; redistribution of OSPF into BGP takes the IGP metric.

QoS - Each router maintains two kinds of queue: a hardware queue, which works on FIFO, and software queues (LLQ, CBWFQ, flow-based WFQ). A service policy applies only to the software queues.

Use the tx-ring-limit command to tune the size of the transmit ring to a non-default value (hardware queue is last stop before the packet is transmitted)

Note: An exception to these guidelines for LLQ is Frame Relay on the Cisco 7200 router and other non-Route/Switch Processor (RSP) platforms. The original implementation of LLQ over Frame Relay on these platforms did not allow the priority classes to exceed the configured rate during periods of non-congestion. Cisco IOS Software Release 12.2 removes this exception and ensures that non-conforming packets are only dropped if there is congestion. In addition, packets smaller than an FRF.12 fragmentation size are no longer sent through the fragmenting process, reducing CPU utilization.

It's all based upon whether there is or is not congestion on the link.

The priority queue (LLQ) will always be served first, regardless of congestion. It will be both guaranteed bandwidth AND policed if there is congestion. If there is not congestion, you may get more throughput of your priority class traffic.

If the class is underutilized then the bandwidth may get used by other classes. Generally speaking this is harder to quantify than you may think. Because in normal classes, the "bandwidth" command is a minimum of what's guaranteed. So you may get MORE in varying amounts just depending on what is in the queue at any point in time of congestion.

As mentioned before, policers determine whether each packet conforms to, exceeds, or (optionally) violates the configured traffic policies and take the prescribed action. The action taken can include dropping or re-marking the packet. Conforming traffic is traffic that falls within the rate configured for the policer. Exceeding traffic is traffic that is above the policer rate but still within the burst parameters specified. Violating traffic is traffic that is above both the configured traffic rate and the burst parameters.

An improvement to the single-rate two-color marker/policer algorithm is based on RFC 2697, which details the logic of a single-rate three-color marker.

The single-rate three-color marker/policer uses an algorithm with two token buckets. Any unused tokens in the first bucket are placed in a second token bucket to be used as credits later for temporary bursts that might exceed the CIR. The allowance of tokens placed in this second bucket is called the excess burst (Be), and this number of tokens is placed in the bucket when Bc is full. When the Bc is not full, the second bucket contains the unused tokens. The Be is the maximum number of bits that can exceed the burst size.
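
A minimal sketch of the two-bucket logic, assuming the color-blind mode of RFC 2697. The class and method names are hypothetical, tokens are counted in bytes here (the notes speak of bits), and refilling is driven by an explicit elapsed time rather than a real clock.

```python
class SingleRateThreeColorMarker:
    """Two token buckets: Bc (committed burst) and Be (excess burst)."""

    def __init__(self, cir, bc, be):
        self.cir = cir   # committed information rate, tokens per second
        self.bc = bc     # size of the first bucket
        self.be = be     # size of the second bucket
        self.tc = bc     # both buckets start full
        self.te = be

    def refill(self, elapsed_seconds):
        tokens = int(self.cir * elapsed_seconds)
        # Fill the first bucket; whatever does not fit spills into the second.
        to_first = min(tokens, self.bc - self.tc)
        self.tc += to_first
        self.te = min(self.be, self.te + tokens - to_first)

    def mark(self, packet_size):
        if self.tc >= packet_size:
            self.tc -= packet_size
            return "conform"
        if self.te >= packet_size:
            self.te -= packet_size
            return "exceed"
        return "violate"


marker = SingleRateThreeColorMarker(cir=1000, bc=1500, be=3000)
results = [marker.mark(1500) for _ in range(4)]
print(results)
marker.refill(1.0)
print(marker.mark(1000))
```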

Queuing - FIFO, PQ, WFQ, CBWFQ

PQ - the high-priority queue is always serviced first, irrespective of traffic waiting in the other queues.

WFQ - flow based. Each flow is identified by source port, destination port, and source and destination address. WFQ always gives preference to smaller flows and smaller packet sizes.

CBWFQ - traffic is classified and placed into classes, and each class is allocated some amount of bandwidth. Queues are serviced on the basis of the bandwidth allocated to each queue.

Random Early Detection (RED) is a congestion avoidance mechanism that takes advantage of the congestion control mechanism of TCP. By randomly dropping packets prior to periods of high congestion, RED tells the packet source to decrease its transmission rate. WRED drops packets selectively based on IP precedence. Edge routers assign IP precedences to packets as they enter the network. (WRED is useful on any output interface where you expect to have congestion. However, WRED is usually used in the core routers of a network, rather than at the edge.) WRED uses these precedences to determine how it treats different types of traffic.

When a packet arrives, the following events occur:

1. The average queue size is calculated.

2. If the average is less than the minimum queue threshold, the arriving packet is queued.

3. If the average is between the minimum queue threshold for that type of traffic and the maximum threshold for the interface, the packet is either dropped or queued, depending on the packet drop probability for that type of traffic.

4. If the average queue size is greater than the maximum threshold, the packet is dropped.
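
The four steps above can be sketched as a drop decision. The linear ramp between the thresholds and the "mark probability denominator" (maximum drop probability = 1/denominator) are assumptions based on how WRED is commonly described; the function names are hypothetical.

```python
import random


def wred_drop_probability(avg_queue, min_th, max_th, mark_prob_denominator=10):
    """Drop probability for a packet given the average queue size."""
    if avg_queue < min_th:
        return 0.0                      # step 2: always queue
    if avg_queue > max_th:
        return 1.0                      # step 4: always drop
    # Step 3: probability rises linearly from 0 to 1/denominator.
    ramp = (avg_queue - min_th) / (max_th - min_th)
    return ramp / mark_prob_denominator


def wred_should_drop(avg_queue, min_th, max_th, mark_prob_denominator=10,
                     rng=random.random):
    """Randomly decide whether to drop; rng is injectable for testing."""
    p = wred_drop_probability(avg_queue, min_th, max_th, mark_prob_denominator)
    return rng() < p


print(wred_drop_probability(10, 20, 40))
print(wred_drop_probability(30, 20, 40))
print(wred_drop_probability(50, 20, 40))
```

Per-precedence behavior would come from giving each IP precedence its own `min_th`, `max_th`, and denominator.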

IPSEC
Two modes: transport and tunnel.

In tunnel mode the whole original packet (original IP header and data) is encrypted, and the ESP header is placed between the new IP header and the original IP header.

New IP Header | ESP Header | Original IP Header | Data (the original IP header and the data are encrypted)

In transport mode only the data is encrypted, and the original IP header is placed in front of the ESP header.

Original IP Header | ESP Header | Data (only the data is encrypted)

Encryption algorithms - DES, 3DES, AES

Phase 1 - authentication of the IPsec peers and negotiation of an SA to provide a secure communication channel for phase 2.

Phase 2 - data is transferred based on the SA parameters exchanged and the keys stored in the SA database.

Phase 1 - security policies are negotiated, the Diffie-Hellman exchange takes place (used to generate the shared secret keys), and the remote peer is authenticated.

Transform sets - consist of the encryption algorithm, authentication algorithm, and key length proposed. Diffie-Hellman - a public key exchange method that allows two peers to establish a shared secret key. Secret preshared keys are manually entered to authenticate the remote peer.

An SA consists of the encryption algorithm, authentication algorithm, destination address, key length, and lifetime of the tunnel.

Each SA has a lifetime based on one of two factors: the amount of data transferred or the time in seconds.

1. Define the ISAKMP policy.

2. Define the transform set, which includes both encryption and data integrity.

3. Create an ACL for interesting traffic.

4. Create a crypto map that matches the previously defined parameters.

5. Apply the crypto map on the outgoing interface.

If we want to use RSA keys instead of a preshared key, then the ISAKMP identity needs to be defined:

crypto isakmp policy 1

authentication rsa-encr

group 2

lifetime 240

crypto isakmp identity hostname

IP protocol 50 - ESP traffic; IP protocol 51 - AH traffic; UDP port 500 - ISAKMP traffic.

ISAKMP: authenticates the peers, determines whether authentication is preshared or RSA encryption, and prepares the SA, which includes the group (length of key in bits) and the lifetime of the tunnel.

The IPsec transform set determines the security protocol (AH/ESP) and the encryption standard (DES/3DES) for the data transported across the secure tunnel; esp-sha-hmac defines the hashing algorithm used for integrity and authentication of that data.

The mode (tunnel/transport) can be defined in the transform set only.

All traffic that goes through the ASA is inspected using the Adaptive Security Algorithm and either allowed through or dropped. A simple packet filter can check for the correct source address, destination address, and ports, but it does not check that the packet sequence or flags are correct. A filter also checks every packet against the filter, which can be a slow process.

A stateful firewall like the ASA, however, takes into consideration the state of a packet:

•Is this a new connection?

If it is a new connection, the ASA has to check the packet against access lists and perform other tasks to determine if the packet is allowed or denied. To perform this check, the first packet of the session goes through the "session management path," and depending on the type of traffic, it might also pass through the "control plane path."

The session management path is responsible for the following tasks:

–Performing the access list checks

–Performing route lookups

–Allocating NAT translations (xlates)

–Establishing sessions in the "fast path"

Some packets that require Layer 7 inspection (the packet payload must be inspected or altered) are passed on to the control plane path. Layer 7 inspection engines are required for protocols that have two or more channels: a data channel, which uses well-known port numbers, and a control channel, which uses different port numbers for each session. These protocols include FTP, H.323, and SNMP.

•Is this an established connection?

If the connection is already established, the ASA does not need to re-check packets; most matching packets can go through the "fast" path in both directions. The fast path is responsible for the following tasks:

–IP checksum verification

–Session lookup

–TCP sequence number check

–NAT translations based on existing sessions

–Layer 3 and Layer 4 header adjustments

Data packets for protocols that require Layer 7 inspection can also go through the fast path.

BGP
BGP synchronization rule - If the AS is acting as a transit AS for other ASes, routes learned through iBGP will not be advertised until all the routers in the AS have learned those routes through the IGP.

If we turn on synchronization, a BGP router will not advertise a route learned from an iBGP peer to an eBGP peer unless that route is also learned through the IGP.

Split horizon rule - Routes learned from an iBGP neighbor will not be advertised to other iBGP neighbors.

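BGP path selection criteria - a route is excluded if its next hop is unreachable. The remaining routes are compared in this order:

1. Highest weight.

2. Highest local preference.

3. Locally originated route.

4. Shortest AS path length.

5. Lowest origin code (IGP < EGP < Incomplete).

6. Lowest MED.

7. eBGP over iBGP.

8. Between iBGP paths, the closest IGP neighbor (lowest IGP metric to the next hop).

9. Between eBGP paths, the oldest route.

10. Lowest router ID.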
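
The selection order above can be sketched as a sort key. This is a simplified illustration, not the full decision process: MED is compared between all paths (as if always-compare-med were on), the "oldest route" step is applied to every path rather than only eBGP ones, and the `Path` fields and function names are hypothetical.

```python
from dataclasses import dataclass

ORIGIN_RANK = {"igp": 0, "egp": 1, "incomplete": 2}


@dataclass
class Path:
    name: str
    next_hop_reachable: bool = True
    weight: int = 0
    local_pref: int = 100
    locally_originated: bool = False
    as_path_len: int = 0
    origin: str = "igp"
    med: int = 0
    is_ebgp: bool = False
    igp_metric: int = 0
    age_seconds: int = 0
    router_id: str = "0.0.0.0"


def router_id_key(router_id):
    """Compare dotted router IDs numerically, octet by octet."""
    return tuple(int(octet) for octet in router_id.split("."))


def preference_key(p):
    # Smaller tuples win, so "highest is best" fields are negated.
    return (
        -p.weight,
        -p.local_pref,
        not p.locally_originated,
        p.as_path_len,
        ORIGIN_RANK[p.origin],
        p.med,
        not p.is_ebgp,
        p.igp_metric,
        -p.age_seconds,
        router_id_key(p.router_id),
    )


def best_path(paths):
    candidates = [p for p in paths if p.next_hop_reachable]
    return min(candidates, key=preference_key) if candidates else None


paths = [
    Path("A", local_pref=200, as_path_len=5),
    Path("B", local_pref=100, as_path_len=1),
    Path("C", weight=10, next_hop_reachable=False),
]
print(best_path(paths).name)
```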

BGP message types - Open, Update, Keepalive, Notification.

Routes received from a route reflector client are reflected to other clients and to non-client neighbors. So if we have two route reflectors, we should keep them in separate clusters to avoid loops. That means that if you have multiple RRs with different cluster IDs, the optimal path is selected by preferring the shorter cluster list. Having multiple RRs in the same cluster creates partial connectivity during failure.

The first route reflector also sets an additional BGP attribute called originator ID, set to the BGP router ID of the client that originated the route. Any router that receives a route containing its own router ID as the originator ID ignores the route.

Confederations - Break the AS into smaller sub-ASes that exchange routing updates using intra-confederation eBGP sessions. On the intra-confederation eBGP sessions, the iBGP parameters are still preserved (such as next hop, metric, and local preference).

Commands under the BGP process: bgp confederation identifier x.x - the original AS; bgp confederation peers x.x y... - the other sub-ASes within the confederation.

MED vs AS path prepend - MED does not go beyond the neighboring AS, while AS path prepending propagates beyond it.

bgp always-compare-med - compares MED for paths from neighbors in different ASes.

bgp deterministic-med - compares MED for paths from different peers advertised from the same AS.

BGP conditional advertisement uses two maps, advertise-map and non-exist-map: the prefix in the advertise-map is advertised only if the route defined in the non-exist-map is not present in the BGP table.

BGP conditional injection (inject-map and exist-map) - BGP conditional route injection advertises the specific route defined in the inject-map out of the summary route matched by the exist-map. It is the reverse of aggregation.

SoO - Site of Origin - is used to prevent routing loops. It identifies the site from which a route originated so that the route is not re-advertised back to that site.

SoO is enabled on PE routers, which mark the customer prefixes.

BGP communities are used to tag routes, and upstream routers use them to apply routing policy. The community attribute consists of four octets. In order to send communities we need to use the neighbor send-community command under the BGP process. The well-known BGP communities are:

 * Internet: advertise these routes to all neighbors.
 * Local-AS: prevent sending routes outside the local AS within the confederation.
 * No-Advertise: do not advertise this route to any peer, internal or external.
 * No-Export: do not advertise this route to external BGP peers.

The local-as command can be used during an AS migration: the router generates its BGP OPEN message with the AS number defined in local-as. neighbor x.x.x.x local-as 100 no-prepend replace-as dual-as (dual-as lets the remote peer use whichever AS number is configured on its side).

Peer groups - Peer groups are a way of defining templates/groups with settings for neighbor relationships. The same policy that goes to one neighbor in the peer group must go to all of them; if one neighbor has a slightly different config, we do not use the peer group for that neighbor. The idea is to define a group with all required BGP settings and then add the neighbors to this group so they inherit the settings. With a BGP peer group, one update is generated for the peer group instead of individual updates, which helps optimize updates. It also makes the configuration simpler.

BGP route reflector - Eliminates the need for a BGP full mesh; similar in spirit to the OSPF DR/BDR election, clients only need to peer with the RR. When the RR gets an update from a client, it sends it to the other RRs and to its other clients. The RR does not modify the next-hop attribute. Route reflectors modify the split horizon rule: routes learned through iBGP can be forwarded to other iBGP neighbors, but only by a route reflector. If a client has iBGP sessions with multiple route reflectors, it will receive a copy of every route from each of them, which can create routing loops. To avoid this, each route reflector and its clients form a cluster, identified by a cluster ID that is unique in the AS. Whenever a route is reflected, the route reflector adds its cluster ID (by default its router ID) to the cluster-list attribute. If the route is reflected back to the route reflector for any reason, it recognizes its own cluster ID in the cluster list and does not forward the route.
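
The two loop-prevention checks (cluster list on the reflector, originator ID on the receiving router) can be sketched as follows. Routes are modeled as plain dictionaries and the function names are hypothetical; this is only an illustration of the attribute handling, not of a real BGP implementation.

```python
def reflect(route, cluster_id, learned_from_router_id):
    """Return the route as a reflector would re-advertise it, or None on a loop."""
    if cluster_id in route["cluster_list"]:
        # Our own cluster ID is already present: the route has looped back.
        return None
    reflected = dict(route)
    if reflected.get("originator_id") is None:
        # Only the first reflector sets the originator ID.
        reflected["originator_id"] = learned_from_router_id
    reflected["cluster_list"] = [cluster_id] + list(route["cluster_list"])
    return reflected


def router_accepts(route, my_router_id):
    """A router ignores a route whose originator ID is its own router ID."""
    return route.get("originator_id") != my_router_id


route = {"prefix": "10.0.0.0/8", "originator_id": None, "cluster_list": []}
via_rr1 = reflect(route, cluster_id="1.1.1.1", learned_from_router_id="3.3.3.3")
via_rr2 = reflect(via_rr1, cluster_id="2.2.2.2", learned_from_router_id="1.1.1.1")
back_at_rr1 = reflect(via_rr2, cluster_id="1.1.1.1", learned_from_router_id="2.2.2.2")
print(via_rr2, back_at_rr1)
```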

The BGP Link Bandwidth feature is used to enable multipath load balancing for external links with unequal bandwidth capacity. This feature is enabled under an IPv4 or VPNv4 address family session by entering the bgp dmzlink-bw command. This feature supports iBGP, eBGP, and eiBGP multipath load balancing in Multiprotocol Label Switching (MPLS) Virtual Private Networks (VPNs). When this feature is enabled, routes learned from a directly connected external neighbor are propagated through the internal BGP (iBGP) network with the bandwidth of the source external link.

The link bandwidth extended community indicates the preference of an autonomous system exit link in terms of bandwidth. This extended community is applied to external links between directly connected eBGP peers by entering the neighbor dmzlink-bw command. The link bandwidth extended community attribute is propagated to iBGP peers when extended community exchange is enabled with the neighbor send-community command.

It should be configured in conjunction with the maximum-paths command.

bgp dmzlink-bw

neighbor ip-address dmzlink-bw

neighbor ip-address send-community [both | extended | standard]

Aggregate with the as-set option - Normal aggregation with the summary-only option advertises only the summary prefix and suppresses all the specific routes, and the router performing the aggregation includes only its own AS when sending the update. When the as-set option is used, the update for the summary prefix includes all the ASes that the specific routes traversed, which prevents routing loops.

attribute-map - can be used to reset the community inherited by the aggregate to none. When a router sends a prefix carrying a community such as no-export to the router performing the aggregation, the aggregate inherits the community, which can cause problems when the aggregate prefix is propagated. To avoid this we can set the community to none using the attribute-map option (aggregate-address x.x.x.x x.x.x.x as-set summary-only attribute-map MAP).

BGP backdoor link - used to change the AD of an external route from 20 to 200 so that the IGP-learned route is preferred over eBGP. The command is added on the router that learns the prefixes from the two routing protocols.

router bgp x.x.x.x

network x.x.x.x mask backdoor

OSPF
OSPF packet types - Hello, DBD, LSR, LSU, LSAck. Each interface participating in OSPF sends hellos to 224.0.0.5. For two routers to form a neighborship they must have the same area, the same hello and dead intervals, the same subnet mask, and matching authentication. OSPF states - Down, Init, 2-Way (DR and BDR election), ExStart (master/slave negotiation), Exchange (DBDs contain an entry for each link or network type with the following info: link type, advertising router, sequence number, cost of link), Loading (if a router does not have up-to-date info for a link it sends an LSR; the neighbor replies with an LSU containing the updated LSA, and the requesting router adds the new entry to its LSDB), and Full (once all the routers have identical LSDBs).
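
The neighborship conditions listed above can be sketched as a comparison of hello parameters. The `HelloParams` fields and the function name are hypothetical; real OSPF checks a few more things that these notes do not cover.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class HelloParams:
    area_id: str
    hello_interval: int
    dead_interval: int
    subnet_mask: str
    auth_type: str
    auth_key: str


def hello_mismatches(local, remote):
    """Names of the parameters that stop two routers from becoming neighbors."""
    return [f.name for f in fields(HelloParams)
            if getattr(local, f.name) != getattr(remote, f.name)]


r1 = HelloParams("0.0.0.0", 10, 40, "255.255.255.0", "md5", "secret")
r2 = HelloParams("0.0.0.0", 30, 120, "255.255.255.0", "md5", "secret")
print(hello_mismatches(r1, r1))  # empty list: neighborship can form
print(hello_mismatches(r1, r2))  # timers differ
```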

Routers send their updates to the DR and BDR at 224.0.0.6.

On the broadcast network type, each OSPF-speaking router forms a full adjacency with the DR and BDR and stays in the two-way state with the DROther routers.

sh ip ospf database summary (prefix) gives information about type 3 inter-area routes learned via the ABR. The type 3 LSA is called a summary LSA; this does not mean that network prefixes are summarized when propagated by the ABR, but that the topology information is summarized.

Each LSA in the LSDB contains a sequence number. Each LSA is re-flooded every 30 minutes, and each time it is flooded its sequence number is incremented by one.

Point-to-point - T1, E1. Neighbors are discovered automatically, hellos are sent to the multicast address 224.0.0.5, and there is no DR/BDR election as there are only two routers.

Multiaccess - DR/BDR election. If the DR fails, the BDR becomes the DR and a new BDR is elected.

If a new router is added with the highest priority, it will not preempt the existing DR and BDR; a new election starts only if the DR or BDR goes down.

DR/BDR - ip ospf priority 0 keeps a router as DROther.

Stub area - All the routers in the area must agree on the stub flag. It does not allow type 4 and type 5 LSAs, and the ABR generates a default route into the stub area to reach external destinations. To configure a stub area: area x stub

Totally stubby area - removes type 3, 4, and 5 LSAs, and the ABR generates an inter-area default route. The totally stubby behavior is configured on the ABR of the area: on the ABR use area x stub no-summary, and the other routers need to be configured with the area x stub command.

NSSA - was designed to keep the stub area features while also allowing external routes. The ASBR generates type 7 LSAs in the NSSA and sets the P bit to 1, and the ABR translates type 7 to type 5 and propagates it into the OSPF domain. All routers should agree on the NSSA flag. The ABR does not generate a default route automatically, so if other external ASes are connected to other areas, the NSSA will not have information for those external routes; in that case we need to generate the default route manually.

Totally stubby NSSA - removes type 3, 4, and 5 LSAs, allows type 7 LSAs, and the ABR generates a default route. Note that it is not necessary for every router to be configured as totally stubby NSSA; routers other than the ABR can still run plain NSSA for that area in the OSPF process.

Order of preference of OSPF routes - O, O IA, then external type 1 (E1/N1), then external type 2 (E2/N2).

When the ABR translates an LSA from type 7 to type 5 and we look up the external network in an area using sh ip ospf database external..., there are two fields, Advertising Router and Forwarding Address. The advertising router is the address of the ABR that is doing the translation, and the forwarding address is the address of the ASBR. If the forwarding address field is 0.0.0.0, traffic is forwarded to the router that originated the LSA.

If we have multiple ABRs in the NSSA, the ABR with the highest router ID generates the type 5 LSA. This does not mean all the traffic will follow the ABR with the highest router ID, because the forwarding address field contains the information needed to reach the external destination via the ASBR.

If we want to change the forwarding address on the ABR while translating from type 7 to type 5, we can use the command area x nssa translate type7 suppress-fa.

Note - in the LSA lookup, a forwarding address of 0.0.0.0 means that the router advertising the LSA is announcing itself as the way to reach the destination.

E1 and E2 routes - E1: the external cost is added to the cost of each link the packet traverses. If there are multiple ASBRs, external routes should be marked as type E1.

If there are multiple ASBRs advertising E2 routes, both propagate the same default metric for the external network. In that case each OSPF router uses the forward metric (the cost to reach the ASBR) to pick the best path. If the forward metric is also the same, the decision is based on the router ID of the ASBR.

This can be verified with: sh ip ospf database external x.x.x.x

E2 - External cost only; suitable when there is a single ASBR.
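A small numeric sketch of the E1/E2 difference, with made-up ASBR names and costs. It only models the two comparisons described above (total cost for E1; external metric, then forward metric, for E2) and leaves out any further tie-breaking.

```python
def e1_total(external_metric, cost_to_asbr):
    """E1: the external metric plus the internal cost to reach the ASBR."""
    return external_metric + cost_to_asbr

def best_e2(paths):
    """E2: compare the external metric first, then the forward metric.

    paths: list of (asbr_name, external_metric, forward_metric).
    """
    return min(paths, key=lambda p: (p[1], p[2]))[0]
```

With two ASBRs both advertising an E2 metric of 20, the router picks the ASBR it can reach more cheaply, even though the route's displayed metric stays 20.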

Note - The ABR has information for all its connected areas, so when it generates Type 3 LSAs the topology information is summarized and propagated from one area to another.


 * Loop prevention mechanism in OSPF - An ABR accepts and processes Type 3 LSAs only if they come from the backbone area.

area X filter-list prefix NAME {in|out}. Good news here: this command applies after all summarization has been done and filters the routing information from being used for Type 3 LSA generation. It applies to all three types of prefixes: intra-area routes, inter-area routes, and summaries generated by the area X range command. All information is learned from the router’s RIB. Use it to filter specific prefixes from Type 3 LSAs.

LSA Type 5 filtering - This LSA is originated by an ASBR (a router redistributing external routes) and flooded through the whole OSPF autonomous system. Important - you can filter the redistributed routes with distribute-list out under the OSPF process (naming the protocol that is the source of redistribution), or simply by applying a filter to the redistribution itself.

The key thing you should remember is that non-local route filtering for OSPF is only available at ABRs and ASBRs.

Distribute-list out on an ABR or ASBR filters Type 5 LSAs as they are propagated.

We can verify this using sh ip ospf database external x.x.x.x

Distribute-list in - Filters routes from the local routing table, but the LSAs are still propagated to neighbor routers.

If we have an NSSA and want to filter Type 5 LSAs on the ABR, we can filter on the forwarding address using a distribute-list on the ABR (the forwarding address is copied from the Type 7 LSA when the ABR regenerates the Type 5 LSA from it).

OSPF Network Types :

1. Point-to-point - Supports broadcast/multicast, e.g. T1 or E1 links. There are only two routers, so there is no DR/BDR election. Hello/dead timers are 10/40.

2. Broadcast - e.g. Ethernet; broadcast capability. There is a DR and BDR election. Hello/dead timers are 10/40.

3. Point-to-multipoint (broadcast) - Has broadcast capability, no DR/BDR election, hello/dead timers are 30/120. In a hub-and-spoke topology the hub forms adjacencies with the spokes; the spokes do not form adjacencies with each other because there is no direct Layer 2 connection between them, so when the hub receives an update from a spoke it sets itself as the next hop while propagating the update.

4. Point-to-multipoint non-broadcast - No broadcast capability. Hellos are sent as unicast and are not sent at all unless neighbors are defined manually. There is no DR/BDR election. Hello/dead timers are 30/120. Special next-hop processing.

5. Non-broadcast (NBMA) - The default network type on a multipoint Frame Relay interface, e.g. a main interface. There is no broadcast capability, hellos are sent as unicast, and neighbors must be defined manually. Hello/dead timers are 30/120. There is a DR and BDR election, and the topology can be full mesh or partial mesh. Because a DR/BDR is elected, either all routers must be fully meshed or the DR/BDR must be statically pinned to a router that has adjacencies to all other routers. On a non-broadcast network, make sure the hub is chosen as DR and define the neighbors manually so that OSPF updates are sent as unicast.

Note - On broadcast and non-broadcast networks the DR does not change the next hop when it propagates LSAs to the DROTHER routers. On a broadcast segment that is fine, but on a non-broadcast Frame Relay network we must manually define the Layer 3 to Layer 2 resolution to reach that neighbor. On point-to-point links (e.g. HDLC) there is only one device at the other end, so no Layer 3 to Layer 2 mapping is required.

6. Loopback - In OSPF, loopbacks are advertised as /32 stub hosts with network type loopback. If the loopback mask is /24 and we want to advertise it as /24 into the OSPF domain, we need to change the network type (ip ospf network point-to-point).

By adjusting the hello/dead timers you can make non-compatible OSPF network types appear as neighbors via the “show ip ospf neighbor” but they won’t become “adjacent” with each other. OSPF network types that use a DR (broadcast and non-broadcast) can neighbor with each other and function properly. Likewise OSPF network types (point-to-point and point-to-multipoint) that do not use a DR can neighbor with each other and function properly. But if you mix DR types with non-DR types they will not function properly (i.e. not fully adjacent). You should see in the OSPF database “Adv Router is not-reachable” messages when you’ve mixed DR and non-DR types.

Here is what will work:

Broadcast to Broadcast
Non-Broadcast to Non-Broadcast
Point-to-Point to Point-to-Point
Point-to-Multipoint to Point-to-Multipoint
Broadcast to Non-Broadcast (adjust hello/dead timers)
Point-to-Point to Point-to-Multipoint (adjust hello/dead timers)
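The compatibility list can be derived from two properties of each network type: whether it elects a DR, and its default hello/dead timers. The sketch below assumes the usual Cisco IOS defaults (10/40 and 30/120); the dictionary and function names are made up for illustration.

```python
USES_DR = {
    "broadcast": True,
    "non-broadcast": True,
    "point-to-point": False,
    "point-to-multipoint": False,
}

DEFAULT_TIMERS = {  # (hello, dead) in seconds, assumed IOS defaults
    "broadcast": (10, 40),
    "non-broadcast": (30, 120),
    "point-to-point": (10, 40),
    "point-to-multipoint": (30, 120),
}

def compatibility(a, b):
    """Classify a pair of OSPF network types on the two ends of a link."""
    if USES_DR[a] != USES_DR[b]:
        return "not fully functional"          # DR type mixed with non-DR type
    if DEFAULT_TIMERS[a] != DEFAULT_TIMERS[b]:
        return "works after adjusting timers"  # same DR behavior, timers differ
    return "works"
```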

Command lines:

1. sh ip ospf interface brief
2. sh ip route ospf
3. sh ip ospf border-routers
4. sh ip ospf database summary x.x.x.x - Type 3
5. sh ip ospf database external x.x.x.x - Type 5
6. sh ip ospf database router x.x.x.x - Type 1

Summarization can occur on the ABR and the ASBR.

ABR - uses the area range command. When an ABR or ASBR summarizes, it generates a Null0 route for the summary. If a specific prefix inside the summary becomes unreachable and the router receives traffic for that prefix, it drops the traffic. To avoid this and let a default route forward the traffic instead, use "no discard-route internal" or "no discard-route external" to remove the Null0 route from the routing table.

ASBR - summary-address x.x.x.x mask

RFC 2328 - the reference for learning OSPF.

Virtual links
All areas in an Open Shortest Path First (OSPF) autonomous system must be physically connected to the backbone area (Area 0). In some cases, where this is not possible, you can use a virtual link to connect to the backbone through a non-backbone area. You can also use virtual links to connect two parts of a partitioned backbone through a non-backbone area. The area through which you configure the virtual link, known as a transit area, must have full routing information. The transit area cannot be a stub area.

The transit area cannot be a stub area, because routers in the stub area do not have routes for external destinations. Because data is sent natively, if a packet destined for an external destination is sent into a stub area which is also a transit area, then the packet is not routed correctly. The routers in the stub area do not have routes for specific external destinations.

We can also use a GRE tunnel between the non-backbone area and the backbone area and run area 0 over the tunnel interface, but that adds GRE overhead. With a virtual link only the OSPF packets are sent as tunneled packets; data traffic is sent natively, as in a normal area connected to the backbone area.

EIGRP
EIGRP runs on IP protocol 88, OSPF on IP protocol 89.

EIGRP is a hybrid protocol with some distance-vector and some link-state properties.

Distance vector - it only knows what its directly connected neighbors advertise. Link state - it forms adjacencies.

To form an adjacency, the EIGRP AS number must be the same on both neighbors.

EIGRP multicast address - 224.0.0.10

Like BGP, EIGRP only advertises the routes it is going to install in the routing table.

EIGRP is a classless protocol, but it performs automatic summarization by default on older IOS releases, so we need to disable it (no auto-summary).

EIGRP applies split horizon. With DMVPN we need to disable split horizon on the hub so that routes learned on the tunnel interface from one spoke are advertised to the other spokes out the same tunnel interface.

The passive-interface command works slightly differently in EIGRP: it stops multicast/unicast hellos to neighbors, which prevents adjacencies from forming.

Configuring a neighbor statement in EIGRP on a link makes the router stop listening to the multicast address on that interface, so the neighbor must also be specified manually on the other side to form the adjacency.

EIGRP timers do not need to match to form an adjacency.

EIGRP metric calculation - uses bandwidth, delay, reliability and load. MTU is carried in updates but is not used in the metric.

For the composite metric, EIGRP takes the minimum bandwidth along the path, the total (cumulative) delay, the highest load and the lowest reliability.
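As a worked example of the classic composite metric, the sketch below applies the K-value formula with the default weights (K1=1, K2=0, K3=1, K4=0, K5=0), which reduces to (10^7 / min bandwidth in kbps + total delay in tens of microseconds) * 256. The function name and argument names are made up for illustration, and wide metrics are not modeled.

```python
def eigrp_metric(min_bw_kbps, total_delay_usec, load=1, reliability=255,
                 k=(1, 0, 1, 0, 0)):
    """Classic (32-bit) EIGRP composite metric with integer arithmetic."""
    k1, k2, k3, k4, k5 = k
    bw = 10**7 // min_bw_kbps          # scaled inverse of the slowest link
    delay = total_delay_usec // 10     # cumulative delay in tens of microseconds
    metric = k1 * bw + (k2 * bw) // (256 - load) + k3 * delay
    if k5 != 0:                        # reliability term only applies when K5 is set
        metric = metric * k5 // (reliability + k4)
    return metric * 256
```

For a T1 (1544 kbps, 20000 usec delay) this gives (6476 + 2000) * 256 = 2169856, and for FastEthernet (100000 kbps, 100 usec) it gives 28160.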

Feasible distance (FD) is the best metric along the path to the destination, i.e. the successor's metric, including the metric to reach that neighbor.

Advertised distance (AD, also called reported distance) - the total metric along the path as advertised by the upstream router.

A router is a feasible successor if its AD < FD of the successor.

The FD is used for loop avoidance. Split-horizon rule - never advertise a route out the interface on which it was learned.

Only feasible successors are candidates for unequal-cost load balancing.

Unequal-cost load balancing in EIGRP is done with the variance multiplier. Unlike RIP, OSPF and IS-IS, EIGRP supports load balancing across unequal-cost paths. If the metric through a feasible successor <= variance x FD, the path is chosen for unequal-cost load balancing.
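The feasibility condition and the variance check can be combined in one short function. This is a sketch with made-up neighbor names and metrics; the <= comparison at the variance boundary follows the note above and is an assumption about exact platform behavior.

```python
def classify(paths, variance=1):
    """paths: {neighbor: (advertised_distance, total_metric)}.

    Returns {neighbor: (role, installed)} where installed means the path is
    used for forwarding under the given variance.
    """
    fd = min(total for _, total in paths.values())   # feasible distance
    result = {}
    for nbr, (ad, total) in paths.items():
        if total == fd:
            role = "successor"
        elif ad < fd:                                # feasibility condition
            role = "feasible successor"
        else:
            role = "not feasible"
        installed = role == "successor" or (
            role == "feasible successor" and total <= variance * fd
        )
        result[nbr] = (role, installed)
    return result
```

A path that fails the feasibility condition is never installed, no matter how large the variance is.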

EIGRP traffic engineering is most easily done by modifying the delay value rather than the bandwidth.

EIGRP commands (sh ip eigrp neighbors, sh ip eigrp neighbors detail, sh ip eigrp topology, sh ip route eigrp)

With equal-cost load balancing, traffic is distributed by CEF. To turn off route caching on an interface: no ip route-cache

SIA - Stuck In Active. If a router has sent queries for a destination network and a neighbor takes too long to reply, because of a network flap or some other network condition, the route is considered to be in SIA state.

The time a router waits before putting a route in SIA state can be tuned with the timers active-time command.

To check which neighbors have not yet replied to queries, issue sh ip eigrp topology active; a neighbor marked with r is one the router is still waiting on for a reply.

EIGRP performs auto-summarization for a network when crossing a major network boundary.

* Split horizon should only be disabled on a hub site in a hub-and-spoke network: no ip split-horizon eigrp x

The EIGRP router ID helps with loop prevention for external routes: if a router receives an external route whose originator is equal to its own router ID, it discards the route.

EIGRP provides faster convergence because it does not need to run DUAL if there is a feasible successor for the path. If the router has no alternative route, it sends a query to its neighbors, which in turn propagate the query to their neighbors. If the router does not receive a reply from a neighbor before the active timer expires, it marks the route as Stuck In Active and resets the relationship with the neighbor that did not answer within that period. In OSPF, by contrast, when the primary path goes down an LSA has to be sent and the SPF algorithm is run again. There are several ways to bound the query domain, and they can be combined:

1) Using summary routes - ip summary-address eigrp 'as' [network] [mask] [ad]. If RouterA sends a query to RouterB and summarization is in use, RouterB only has a summary route in its EIGRP topology table, not an exact match for the queried prefix, and therefore sends a "network unknown" reply back to RouterA. This stops the query process immediately at RouterB, only one hop away.

2) Using stub - under router eigrp 1: eigrp stub 'arguments'. The default arguments are connected and summary, meaning the router advertises connected and summary routes only. A router informs its neighbor of its stub status while the adjacency is forming.

Stub routers tell their neighbors “do not send me any queries”. Since no queries are sent, this is extremely effective. However, it is limited in where you can use it: only in non-transit paths and star topologies.

3) Filtering the prefix

Note that an EIGRP router propagates a query received from a neighbor only if it has an exact match for the route in its topology table. If it does not have the exact route, it replies "route unknown" to the neighbor and the query is not propagated further.

4) Different AS domains

Different EIGRP AS numbers. EIGRP processes run independently from each other, and queries from one system don’t leak into another. However, if redistribution is configured between two processes a behavior similar to query leaking is observed.

Both IGRP and EIGRP use an Autonomous System (AS) number, and only routers using the same AS number can exchange routing information using that protocol. When routing information is propagated between IGRP and EIGRP processes that use different AS numbers, redistribution has to be configured manually. However, redistribution occurs automatically when both IGRP and EIGRP use the same AS number.

MPLS
Labels are locally significant between two attached devices. Once mpls ip is enabled, labels are advertised for connected interfaces and IGP-learned routes.

MPLS label - 32 bits: bits 0-19 label value, bits 20-22 experimental (EXP) bits for QoS, bit 23 BoS (bottom-of-stack bit, marks the bottom label in the stack), bits 24-31 TTL value.
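The field layout can be checked by packing and unpacking one label stack entry. This is a minimal sketch using only the standard library; the function names are made up for illustration.

```python
import struct

def pack_label(label, exp=0, bos=1, ttl=255):
    """Build one 4-byte MPLS label stack entry: label(20) | EXP(3) | BoS(1) | TTL(8)."""
    word = (label << 12) | (exp << 9) | (bos << 8) | ttl
    return struct.pack("!I", word)          # network byte order

def unpack_label(data):
    """Split a 4-byte label stack entry back into its fields."""
    (word,) = struct.unpack("!I", data[:4])
    return {
        "label": word >> 12,
        "exp": (word >> 9) & 0x7,
        "bos": (word >> 8) & 0x1,
        "ttl": word & 0xFF,
    }
```

In a two-label VPN packet the outer (transport) entry would carry bos=0 and the inner (VPN) entry bos=1.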

The MPLS label is placed between the Layer 2 and Layer 3 headers and is known as the shim header.

FEC - a group or flow of packets that are forwarded along the same path with the same treatment. The protocols used to distribute labels are LDP, TDP and RSVP; TDP is Cisco proprietary. The LIB is built from the local binding and the remote bindings from all LSRs; which remote binding is actually used is decided by the best route in the IP routing table, and that information is populated into the LFIB.

LDP uses UDP port 646 to multicast address 224.0.0.2 for neighbor discovery,

and TCP port 646 for the neighbor session.

Labels are advertised for connected interfaces and IGP-learned routes.

How does a router determine whether a packet is an IP packet or a labeled packet? The protocol field in the Layer 2 frame tells the router whether to look in the CEF table (IP packet) or in the LFIB (labeled packet).

To see the LFIB: sh mpls forwarding-table

The LFIB can also be queried for one prefix: sh mpls forwarding-table prefix length

MPLS stack operations: push, pop, swap, untagged, aggregate (summarization is performed on the router, so the label is removed and an IP lookup is performed).

Labels 0 to 15 are reserved: label 0 is the explicit null label, label 3 is the implicit null label, label 1 is router alert, label 14 is the OAM alert label.

The implicit null label is used for penultimate hop popping.

The explicit null label is used to preserve the QoS information.

To change the MPLS label range: mpls label range min max (e.g. mpls label range 16 1000000)

LDP hello messages are sent over UDP port 646 to multicast address 224.0.0.2. To check whether LDP hellos are being sent and received: sh mpls ldp discovery detail

COMMAND LINES FOR MPLS

1. ip cef 2. mpls label protocol {tdp | ldp} 3. mpls ip

sh mpls ldp interface, sh mpls ldp neighbor, sh mpls forwarding-table (similar to sh ip route).

PHP - Penultimate Hop Popping: the device next to the last hop in the path removes the label, so that the last-hop router does not need to perform two lookups when sending traffic to the end customer.

To accomplish this, the last-hop router advertises the implicit null label for all its connected and loopback interfaces to its upstream neighbor.

Note - for any destination that is one hop away, the MPLS forwarding table shows Pop Label.

P routers in the core do not need full reachability information for customer routes, because they just switch packets based on labels.

For MPLS to work correctly, we need the BGP next-hop-self command so that eBGP-learned routes are propagated to the iBGP peer with the loopback interface as next hop. If the BGP peering between the PEs is formed over physical interfaces instead of loopbacks, the peering comes up but traffic is black-holed: PHP makes the third-last hop pop the label, so the next-to-last hop receives a plain IP packet for a destination it has no route for. The issue is that PHP happens one hop too soon.

MPLS VPN basics consist of two components: 1) VRFs - separation of customer routing information using per-interface VRFs. 2) Exchange of routing information using MP-BGP.

VRFs without MPLS are called VRF-lite. With VRF-lite the route distinguisher is only locally significant.

Once a VRF is created, the routing lookup for any packet arriving on an interface in that VRF is done in that VRF's table.

VPNv4 route - RD + IPv4 prefix (makes VPNv4 routes globally unique). The RD is 8 bytes.

MPLS VPN label - PE routers exchange a label for each customer route via VPNv4.

Transport label - used to transport the packet across the core to the remote PE.

RT - the route target tells the PE which VRF a route belongs to; it is a BGP extended community attribute.

If we are running EIGRP inside VRFs, we need to specify the autonomous system number separately inside each VRF, otherwise the EIGRP adjacency will not form.

Route-target export - used to advertise routes from the VRF into BGP.

Route-target import - used to import routes from BGP into the VRF.

The PE routers peer with each other globally; customer routes, however, are carried in address-family vpnv4.

Note that when configuring VPNv4 we need to activate the vpnv4 capability with the remote peers.

Loop prevention mechanism for route targets - the router will not import any prefix into a VRF unless its route target is specified for import.
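The roles of the RD and the RT can be illustrated with two tiny helpers. This is a conceptual sketch only: the dictionary layout, function names and the 65000:x values are made up, and real VPNv4 NLRI is a binary encoding rather than a string.

```python
def vpnv4(rd, prefix):
    """RD + IPv4 prefix: keeps overlapping customer prefixes distinct in MP-BGP."""
    return f"{rd}:{prefix}"

def import_into_vrf(bgp_routes, import_rts):
    """Return prefixes whose route targets match at least one import RT of the VRF."""
    wanted = set(import_rts)
    return [r["prefix"] for r in bgp_routes if wanted & set(r["rts"])]
```

Two customers can both use 10.0.0.0/24: the different RDs keep the VPNv4 routes apart, and the import RT decides which one lands in a given VRF. A VRF with no import RT configured imports nothing.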

Packet structure - Layer 2 header | Transport label | VPN label | IP header | Layer 4 header | Payload

So when traffic arrives from the remote PE at the PE on the other side, that PE just refers to the VPN label to see which exit interface or VRF the packet belongs to.

Steps for MPLS VPN once basic connectivity is in place and MPLS is enabled on the interfaces in the MPLS network:

1. Create the VRF with a route distinguisher and RT

2. Assign the VRF to interfaces

3. Run a VRF-aware routing process between PE and CE

4. Establish VPNv4 peers

5. Redistribute subnets from the VRF into BGP and vice versa.

SHAM Links
Sham links are essentially virtual links created between PEs across the BGP network, extending the OSPF domain over MPLS.

When we run OSPF between PE and CE and redistribute OSPF routes into BGP and vice versa, additional OSPF attributes are attached to the BGP VPNv4 routes. On the other PE, when these routes are redistributed back from BGP into OSPF, these attributes determine where the redistributed routes are placed in the OSPF database (as Type 1, 2, 3, 4 or 5).

One additional attribute encoded from OSPF into BGP is, for example, the OSPF domain ID, which is derived from the local process ID. If the local OSPF domain ID is the same as the domain ID in the VPNv4 prefix, the routes are injected into the OSPF database as Type 3 LSAs even though they are redistributed from BGP into OSPF. If the domain IDs do not match, the routes are learned as Type 5 at the other VPN site.

So if we have a backdoor link between two sites, the backdoor link is always preferred over MPLS, because intra-area routes win over inter-area routes. To avoid this we create a sham link between the PEs, like a GRE tunnel, to extend the OSPF domain over MPLS, so that routes redistributed from BGP into OSPF appear as intra-area routes rather than inter-area routes.

How to create sham links:

1. Allocate an address on each PE that is reachable over MPLS

2. Under OSPF for that VRF, create the adjacency between the PEs:

router ospf 1 vrf c
 area 0 sham-link source-address destination-address

OSPF path selection criteria - if two routes are learned as inter-area routes, one via an ABR in the backbone area and the other via an ABR in a non-backbone area, the prefix learned via the backbone area is always preferred.

The loop prevention mechanism for OSPF changes when it is used as the PE-CE protocol in a Layer 3 MPLS VPN.

With OSPF between PE and CE, customer routes are sent as Type 3 LSAs with the DN (down) bit set, so if the same route is received by a PE on the other side, that PE knows not to redistribute the route back into BGP.

The capability vrf-lite command under the OSPF process makes the router ignore the DN bit; without it, Type 3 LSAs with the DN bit set are not installed in the routing table.

For Type 5 LSAs, loops are prevented with either the DN bit or the route tag.

Commands for switching
Note - The Layer 2 header contains source MAC, destination MAC and EtherType; the EtherType field identifies the next Layer 3 protocol, e.g. IPv4 or IPv6.

sh int fa0/1 switchport (trunk, access, administrative mode)

sh int trunk (ports which are trunking)

sh spanning-tree vlan 1 (to check whether traffic is forwarded in spanning tree)

With a Layer 2 EtherChannel, the sh spanning-tree output should show the port-channel interface rather than the individual physical links; otherwise there is an issue.

On a switch we have root ports and designated ports; all traffic leaving the root port is forwarded toward the root bridge.

If two switches are in different VTP domains, the broadcast domain is not affected as long as trunking between them is set up correctly.

Two ways to change the priority for the root bridge:

spanning-tree vlan 2 root primary

spanning-tree vlan 2 priority value (lower than 32768, in multiples of 4096)

In spanning tree, one of the criteria for electing the root port on a non-root bridge is path cost, which is local to the interface.

On the 3560 switch PVST+ is enabled by default.

DTP outcomes: auto + auto results in an access port; dynamic desirable + access results in an access port; trunk with nonegotiate + auto leaves the auto side as an access port, because the nonegotiate side is not sending DTP frames.

Best practice for trunking - mode trunk with nonegotiate. Trunk negotiation is done by DTP; when using DTP both ends should be in the same VTP domain.
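The DTP outcomes can be summarized as a small decision function. This is a sketch of the commonly documented mode matrix, assuming both sides run DTP (no nonegotiate) and are in the same VTP domain; the function name and mode strings are made up for illustration.

```python
MODES = {"access", "trunk", "dynamic auto", "dynamic desirable"}

def dtp_result(a, b):
    """Operational result of two switchport modes on the ends of one link."""
    if a not in MODES or b not in MODES:
        raise ValueError("unknown switchport mode")
    pair = {a, b}
    if pair == {"access", "trunk"}:
        return "mismatch"            # one side tags, the other does not
    if "access" in pair:
        return "access"
    if pair == {"dynamic auto"}:
        return "access"              # neither side initiates negotiation
    return "trunk"                   # at least one side actively asks for a trunk
```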

When a frame traverses a trunk link it is marked by the trunking protocol with its VLAN ID, and on the receiving end the VLAN ID is removed before the frame is sent out an access link.

ISL and 802.1Q

ISL - encapsulates the entire frame; has no native VLAN concept (every frame is encapsulated); the original frame is unmodified. ISL adds a 26-byte header and a 4-byte trailer. ISL supports about 1000 VLANs (normal range 1-1005).

802.1Q - inserts a 4-byte tag; does not tag frames that belong to the native VLAN; the tag includes a priority field, extending QoS support; the 12-bit VLAN ID gives 4096 values, of which VLANs 1-4094 are usable.
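The 4-byte 802.1Q tag is a 16-bit TPID (0x8100) followed by a 16-bit TCI made of a 3-bit priority, a 1-bit DEI and the 12-bit VLAN ID. A minimal sketch with made-up function names:

```python
import struct

TPID_DOT1Q = 0x8100

def dot1q_tag(vid, pcp=0, dei=0):
    """Build the 4-byte 802.1Q tag inserted after the source MAC address."""
    if not 1 <= vid <= 4094:                 # 0 and 4095 are reserved
        raise ValueError("VLAN ID must be 1-4094")
    tci = (pcp << 13) | (dei << 12) | vid
    return struct.pack("!HH", TPID_DOT1Q, tci)

def parse_dot1q(tag):
    """Split a 4-byte 802.1Q tag into priority, DEI and VLAN ID."""
    tpid, tci = struct.unpack("!HH", tag[:4])
    if tpid != TPID_DOT1Q:
        raise ValueError("not an 802.1Q tag")
    return {"pcp": tci >> 13, "dei": (tci >> 12) & 1, "vid": tci & 0x0FFF}
```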

To keep the VLAN database identical across switches, VLAN information is propagated within the same VTP domain. VTP information is advertised over trunk links only.

VTP is a Layer 2 messaging protocol. There are three versions of VTP (1, 2, 3).

Limitation of VTP versions 1 and 2 - extended-range VLANs can only be used when the switch is configured in transparent mode; VTP version 3 removes this limitation.

Server mode - create, delete, modify VLANs; send and forward advertisements; synchronize VLAN database; store information in NVRAM.

Transparent mode - create, delete, modify local VLANs; forward advertisements; does not synchronize VLAN database; store information in NVRAM.

Client mode - cannot create, delete or modify VLANs; forward advertisements; synchronize VLAN database; does not store information in NVRAM.

Important - whenever a new switch is added, make sure its configuration revision is lower than that of the other switches in the VTP domain. If it is higher, it will overwrite the VLAN information on the servers and clients. To protect against that, add the switch in transparent mode or in a different domain.
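The revision-number risk can be modeled in a few lines. This is a deliberately simplified sketch: the dictionary fields and function name are made up, and the password check and VTP version differences are left out.

```python
def vtp_receive(local, advert):
    """Return the switch state after it hears a VTP advertisement.

    local:  {"mode", "domain", "revision", "vlans"}
    advert: {"domain", "revision", "vlans"}
    """
    if local["mode"] == "transparent" or local["domain"] != advert["domain"]:
        return local                               # ignored, database untouched
    if advert["revision"] > local["revision"]:     # higher revision always wins
        return {**local, "revision": advert["revision"],
                "vlans": list(advert["vlans"])}
    return local
```

A lab switch with revision 7 and only VLAN 1 wipes VLANs 10 and 20 from a production switch at revision 3, which is exactly the scenario the note above warns about.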

VTP configuration requires the VTP domain, password and VTP mode on each switch. Verify with sh vtp status or sh vtp counters.

VTP pruning - used to remove unnecessary flooding of broadcast traffic on the network.

STP - used to avoid unwanted loops in the environment.

STP creates one reference point in the network, called the root of the tree, and based on that reference point decides whether there is a redundant path in the network.

Layer 2 forwarding - by default CAM table entries age out after 300 seconds.

We can also create a static MAC address table entry in the CAM: mac-address-table static mac-address vlan id interface type

A bridge segments collision domains but does not segment the broadcast domain.

Root bridge - selection is based on the bridge ID carried in BPDUs, which is the combination of priority and MAC address (lower is better for both). On the root bridge all ports are designated ports (DP). Then a root port is selected on each non-root bridge.

Root port selection is based on the following parameters, in order: lowest root bridge ID, lowest path cost to the root bridge, lowest sender bridge ID, lowest sender port priority, lowest sender port ID.
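Both elections are "lowest tuple wins" comparisons, which maps directly onto Python's tuple ordering. The sketch below uses made-up switch names and field names and assumes all candidates already agree on the root bridge, so the root bridge ID itself is not compared.

```python
def bridge_id(priority, mac):
    """Bridge ID as a comparable tuple: priority first, then MAC address."""
    digits = mac.replace(":", "").replace(".", "").replace("-", "")
    return (priority, int(digits, 16))

def elect_root(bridges):
    """bridges: {name: (priority, mac)} -> name of the bridge with the lowest ID."""
    return min(bridges, key=lambda name: bridge_id(*bridges[name]))

def pick_root_port(candidates):
    """Pick the root port on a non-root bridge.

    Each candidate has: port, cost (root path cost), sender_bid, sender_port_id
    (port priority and port number combined). The lowest tuple wins.
    """
    best = min(candidates,
               key=lambda c: (c["cost"], c["sender_bid"], c["sender_port_id"]))
    return best["port"]
```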

For every LAN segment a DP is selected (the selection is based on the same root-path criteria).

802.1D states - disabled, blocking (listens to incoming BPDUs), listening, learning, forwarding (transmits BPDUs)

Hello time - default is 2 seconds; the interval at which the root bridge sends configuration BPDUs. A non-root bridge sends TCN BPDUs every 2 seconds as well.

Forward delay - the time a switch port spends in each of the listening and learning states; default is 15 seconds.

Maximum age - how long a BPDU is kept before it is aged out; default is 20 seconds.

If an interface flaps (up/down), the switch sends TCN BPDUs until they reach the root bridge. The root bridge then sends configuration BPDUs with the TC flag set, and each switch shortens its MAC table aging time from the default 300 seconds to the forward delay time, so stale entries are flushed quickly.

The total time for a port to transition from blocking to forwarding is 30 seconds (listening + learning).

PortFast feature - when PortFast is enabled on a port, no TCN BPDU is sent when the port changes state, and the port transitions directly to the forwarding state. A PortFast-enabled port could therefore cause STP loops if a switch is accidentally connected to it; to prevent this we use BPDU Guard along with PortFast.

We can manually select the root bridge: spanning-tree vlan vlan-id priority bridge-priority

We can also force one bridge to become the root bridge: spanning-tree vlan vlan-id root {primary | secondary} [diameter ...]

We can also set the path cost: spanning-tree vlan vlan-id cost cost

Port ID is 16 bits - 8-bit port priority + 8-bit port number.

spanning-tree vlan vlan-id port-priority priority

RSTP has rapid convergence; its port states are discarding, learning, forwarding.

RSTP works on port roles instead of relying on BPDUs from the root bridge.

RSTP port roles - root port, DP, alternate port (backup for the root port, when there are two uplinks), backup port (if the active link on a given segment fails and there is no other path to the root, the backup port becomes active).

In RSTP all full-duplex ports are treated as point-to-point links. BPDUs are exchanged between switches as proposals and agreements: once a port is selected as DP, the other switch answers with an agreement message. RSTP converges quickly through this handshake.

HSRP - provides gateway redundancy. HSRP exchanges hello messages on 224.0.0.2 (HSRP version 1; version 2 uses 224.0.0.102).

VRRP - In VRRP the real IP address of a router can be used as the virtual address. It is an IETF standard. The router with the highest priority is the master router and the others act as backups. VRRP messages are sent to multicast address 224.0.0.18, the default advertisement interval is 1 second, and preemption is enabled by default.

GLBP - uses the concept of an AVG (active virtual gateway): one router acts as the AVG while another acts as backup. The AVG assigns virtual MAC addresses to the AVFs (active virtual forwarders), and the AVFs forward packets sent to the virtual MACs assigned by the AVG.

GLBP communicates through hello packets sent every 3 seconds to multicast address 224.0.0.102. GLBP supports up to 1024 virtual routers.

This table shows the support of MST in Catalyst switches and the minimum software required for that support.

Catalyst Platform: MST with RSTP (12.1 or higher)
Catalyst 2900 XL and 3500 XL: Not Available
Catalyst 2950 and 3550: Cisco IOS® 12.1(9)EA1
Catalyst 3560: Cisco IOS 12.1(9)EA1
Catalyst 3750: Cisco IOS 12.1(14)EA1
Catalyst 2955: All Cisco IOS versions
Catalyst 2948G-L3 and 4908G-L3: Not Available
Catalyst 4000, 2948G, and 2980G (Catalyst OS (CatOS)): 7.1
Catalyst 4000 and 4500 (Cisco IOS): 12.1(12c)EW
Catalyst 5000 and 5500: Not Available
Catalyst 6000 and 6500 (CatOS): 7.1
Catalyst 6000 and 6500 (Cisco IOS): 12.1(11b)EX, 12.1(13)E, 12.2(14)SX
Catalyst 8500

Spanning tree
Spanning tree features that help reduce convergence time

1. PortFast - used on access-layer ports. Ports transition directly to the forwarding state without going through the listening and learning states.

2. UplinkFast - used when one of the uplinks goes down. The root port and alternate ports form an uplink group; if the root port goes down, an alternate port transitions directly to the forwarding state without going through the listening and learning states.

3. BackboneFast - In case of an indirect link failure, a switch with BackboneFast enabled receives inferior BPDUs from its designated switch, which is now announcing itself as root bridge. On receiving the inferior BPDUs, the switch determines that the path to the root bridge may have gone down and sends RLQ (Root Link Query) requests out its other ports toward the root. When the root bridge receives the RLQ it sends a response back; the switch then expires the max-age timer immediately and reconverges the topology, and the port can start its transition to the forwarding state. BackboneFast thus saves the max-age timer and should be enabled globally on all switches.

PAGP
auto Places a port into a passive negotiating state, in which the port responds to PAgP packets it receives but does not start PAgP packet negotiation. This setting minimizes the transmission of PAgP packets. This mode is not supported when the EtherChannel members are from different switches in the switch stack (cross-stack EtherChannel).

desirable

Places a port into an active negotiating state, in which the port starts negotiations with other ports by sending PAgP packets. This mode is not supported when the EtherChannel members are from different switches in the switch stack (cross-stack EtherChannel).

CISCO 3750 Stacking
All stack members are eligible stack masters. If the stack master becomes unavailable, the stack members that remain participate in the election of a new stack master from among themselves

Switches should run the same IOS for a stack member to be fully functional. If there is a major version mismatch the switch will not join the stack; if there is a minor version mismatch, the stack will upgrade the switch so that it becomes fully functional.

The default stack member number of a 3750 switch is 1. When it joins a switch stack, its default stack member number changes to the lowest available member number in the stack. Stack members in the same switch stack cannot have the same stack member number. Every stack member, which includes a standalone switch, retains its member number until you manually change the number or unless the number is already used by another member in the stack.

Provisioning of switch -

You can use the offline configuration feature to provision (to supply a configuration to) a new switch before it joins the switch stack. In advance, you can configure the stack member number, switch type, and interfaces associated with a switch that are not currently part of the stack. The configuration that you create on the switch stack is called the provisioned configuration. The switch that is added to the switch stack and that receives this configuration is called the provisioned switch.

You manually create the provisioned configuration through the switch stack-member-number provision type global configuration command. The provisioned configuration also is automatically created when a switch is added to a switch stack that runs Cisco IOS Release 12.2(20)SE or later and when no provisioned configuration exists.

switch 2 provision ws-c3750-48ts

Remove switch from stack: no switch 2 provision ws-c3750-48ts

Spanning tree security features
Spanning Tree enhancements:

BPDU guard: Enable on the edge ports connected to hosts. If a BPDU is received on such an interface, the interface is put into the err-disabled state.

BPDU filter: Enable on edge ports. When enabled on an interface, the port neither sends nor processes BPDUs; received BPDUs are dropped. When enabled globally for PortFast ports, a port that receives a BPDU loses its PortFast status and goes through the normal STP states.

Root guard: Root guard prevents a downstream switch from becoming the root bridge. It is enabled on the designated ports of the root switch, so that if one of those ports hears a superior BPDU the port is put into the root-inconsistent state.

Loop guard: Spanning Tree Loop Guard helps to prevent loops, typically when you use fibre links, because STP by itself cannot detect a Layer 1 issue such as a unidirectional link. Enable it on alternate/backup ports. When Loop Guard detects that BPDUs are no longer being received on a non-designated port, the port is moved into a loop-inconsistent state instead of transitioning to the listening/learning/forwarding state. It can be enabled on all ports, but it should be enabled on the non-designated ports.

Actually, loop guard is a method of protecting against unidirectional links. In order for spanning tree to function correctly, any link participating in STP has to be bidirectional. If a link should become unidirectional, through a cable failure or interface fault, spanning tree could unblock a link, which would cause a loop.

UDLD (UniDirectional Link Detection) is a Cisco proprietary protocol that will detect this condition. Loopguard is what you would use if you didn't have Cisco switches at each end of the link in question. Based on the various design considerations, you can choose either UDLD or the loop guard feature. In regards to STP, the most noticeable difference between the two features is the absence of protection in UDLD against STP failures caused by problems in software. As a result, the designated switch does not send BPDUs. However, this type of failure is (by an order of magnitude) more rare than failures caused by unidirectional links. In return, UDLD might be more flexible in the case of unidirectional links on EtherChannel. In this case, UDLD disables only failed links, and the channel should remain functional with the links that remain. In such a failure, the loop guard puts it into loop-inconsistent state in order to block the whole channel.

Additionally, loop guard does not work on shared links or in situations where the link has been unidirectional since the link-up. In the last case, the port never receives BPDU and becomes designated. Because this behaviour could be normal, this particular case is not covered by loop guard. UDLD provides protection against such a scenario.

Loop guard is not able to detect miswiring problems, but UDLD can, because UDLD uses its own keepalive messages to detect Layer 1 faults.

DHCP snooping: allows configuration of trusted and untrusted ports. Trusted ports can source all DHCP messages; untrusted ports can source only DHCP requests. If a rogue DHCP server on an untrusted port tries to reply to a DHCP request, DHCP snooping drops the reply. DHCP option 82: the port number is also added to the DHCP request.

Port security only works on ports statically configured as access or trunk ports; it does not work on ports in dynamic mode. We can bind a MAC address with the switchport port-security command. If we use sticky, whatever MAC is learned on the interface is automatically added to the secure CAM table and also to the running config.

The second option is to manually create static entries in the CAM table.

Storm control: used to limit the amount of unicast/multicast/broadcast packets received on an interface. Similar to a policer in MQC.

Port-based ACL: used to apply an access list on a Layer 2 port, but it can only filter inbound traffic. We can also use a MAC-based ACL, but that only restricts non-IP traffic.

IP source guard (on Layer 2 ports) protects against IP address spoofing; dynamic ARP inspection protects against ARP spoofing.

VLAN
VLAN: creates a broadcast domain. PVLAN allows splitting the domain into multiple isolated subdomains.

Private VLANs: Promiscuous, Community, Isolated

Promiscuous: carries traffic for all the PVLANs.

Community VLAN: can only talk to ports in the same community VLAN and to its promiscuous port.

Isolated: can only talk to the promiscuous port.

Primary VLAN— The primary VLAN carries traffic from the promiscuous ports to the host ports, both isolated and community, and to other promiscuous ports.

On low-end switches the switchport protected command acts similarly to an isolated VLAN: ports configured as protected do not talk to each other. Usually, ports configured as protected are also configured not to receive flooded unknown unicast (frames with a destination MAC address not in the switch's MAC table) and multicast frames, for added security.

Configure:

vlan 1000
private-vlan primary

vlan 1012
private-vlan community

vlan 1013
private-vlan isolated

vlan 1000
private-vlan association 1012,1013

Configure ports:

1. Community host port (each host port is a member of two VLANs):
int fa0/1
switchport private-vlan host-association 1000 1012
switchport mode private-vlan host

2. Isolated host port:
int fa0/2
switchport private-vlan host-association 1000 1013
switchport mode private-vlan host

3. Promiscuous port (SVI):
int vlan 1000
private-vlan mapping 1012,1013

This example shows how to associate community VLANs 100 through 103 and isolated VLAN 109 with primary VLAN 5:

switch# configure terminal switch(config)# vlan 5 switch(config-vlan)# private-vlan association 100-103, 109

This example shows how to configure the Ethernet port 1/12 as a host port for a private VLAN and associate it to primary VLAN 5 and secondary VLAN 101:

switch# configure terminal switch(config)# interface ethernet 1/12 switch(config-if)# switchport mode private-vlan host switch(config-if)# switchport private-vlan host-association 5 101

Layer 2 COS
We need to enable MLS QoS. Switches can do both inbound and outbound queuing. When traffic hits the ingress port, the switch first does classification/marking based on the port configuration; the traffic then goes to the policer, if configured, to be transmitted/remarked/dropped; then it goes to inbound queuing before it is transmitted. On switches, when MLS QoS is enabled and no trust boundary is configured, the switch rewrites the markings of the traffic to zero.

Ingress/egress: packets are mapped to a queue based on their DSCP/CoS value. If the port is an access port or Layer 3 port, you need to configure the mls qos trust dscp command. You cannot use the mls qos trust cos command because the frame from the access port or Layer 3 port does not contain a dot1q or ISL tag. CoS bits are present in the dot1q or ISL frame only. If the port is a trunk port, you can configure either the mls qos trust cos or mls qos trust dscp command. The dscp-cos map table is used to calculate the CoS value if the port is configured to trust DSCP. Similarly, the cos-dscp map table is used to calculate the DSCP value if the port is configured to trust CoS.
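As a rough illustration of how the two map tables interact with the trust setting, here is a small Python sketch. It assumes the commonly documented default maps (CoS n maps to DSCP 8n, and each block of eight DSCP values maps back to one CoS value); the function and variable names are made up for this example, and a real switch may carry customized maps.

```python
# Assumed default CoS-to-DSCP map: 0->0, 1->8, 2->16, ... 7->56.
COS_TO_DSCP = {cos: cos * 8 for cos in range(8)}


def dscp_to_cos(dscp):
    """Assumed default DSCP-to-CoS map: DSCP 0-7 -> 0, 8-15 -> 1, ... 56-63 -> 7."""
    return dscp // 8


def ingress_markings(trust, cos=0, dscp=0):
    """Return the (cos, dscp) pair the switch works with after ingress.

    trust: "cos", "dscp", or anything else for an untrusted port.
    """
    if trust == "cos":
        return cos, COS_TO_DSCP[cos]    # DSCP derived from the trusted CoS
    if trust == "dscp":
        return dscp_to_cos(dscp), dscp  # CoS derived from the trusted DSCP
    return 0, 0                         # untrusted: markings rewritten to zero
```

With these assumed defaults, a voice frame trusted on CoS 5 is handled as DSCP 40, while a packet trusted on DSCP 46 (EF) is handled as CoS 5.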

By default, the PC sends data untagged. Untagged traffic from the device attached to the Cisco IP Phone passes through the phone unchanged, regardless of the trust state of the access port on the phone. The phone sends dot1q tagged frames with voice VLAN ID 20. Therefore, if you configure the port with the mls qos trust cos command, it trusts the CoS values of the frames from the phone (tagged frames) and sets the CoS value of the frames (untagged) from the PC to 0. After that, the CoS-DSCP map table sets the DSCP value of the packet inside the frame to 0 because the CoS-DSCP map table has DSCP value 0 for the CoS value 0. If the packets from the PC have any specific DSCP value, that value will be reset to 0. If you configure the mls qos cos 3 command on the port, it sets the CoS value of all the frames from the PC to 3 and does not alter the CoS value of the frames from the phone.

Queuing for 6500:

Receive queue: 1p1q4t (one priority queue and one standard queue with 4 thresholds), 1p1q8t, 1q2t

Transmit queue: 1p3q4t, 1p7q8t

6500 Architecture
Chassis: 6503/6503-E, 6504-E, 6506/6506-E, 6509, 6513 (13-slot chassis). Cisco has introduced the new E-series chassis.

The first generation switching fabric was delivered by the switch fabric modules (WS-C6500-SFM and WS-C6500-SFM2), each providing a total switching capacity of 256 Gbps. More recently, with the introduction of the Supervisor Engine 720, the crossbar switch fabric has been integrated into the Supervisor Engine 720 baseboard itself, eliminating the need for a standalone switch fabric module. The capacity of the new integrated crossbar switch fabric on the Supervisor Engine 720 has been increased from 256 Gbps to 720 Gbps. The Supervisor Engine 720-3B and Supervisor Engine 720-3BXL also maintain the same fabric capacity size of 720 Gbps.

6509: sup cards in slots 5 and 6; supported sups: Sup32 and Sup720.

6513: 13 slots; sup cards in slots 7 and 8; supported sups: Sup32 and Sup720.


SUP32: this supervisor engine provides an integrated PFC3B and MSFC2a by default. Supports 6700 series line cards.

Sup720-3B: same backplane capacity. It incorporates the new PFC3B for additional functionality (mainly support of MPLS in hardware).

Sup720-3BXL: incorporates the new PFC3BXL. It is functionally identical to the Supervisor Engine 720-3B, but differs in its capacity for supporting routes and NetFlow entries.

Sup2T: incorporates the MSFC5 (control plane functions) and PFC4 (hardware-accelerated data plane functions) cards and a 2 Tbps switch fabric.

PFC4 supports additional features: Cisco TrustSec (CTS) and Virtual Private LAN Service (VPLS).

The 2 Tbps switch fabric provides 26 dedicated 20 Gbps or 40 Gbps channels to support the new 6513-E chassis.

Modules supported by SUP2T: all new 6900 series modules; all new 6800 series modules (again, WS-X6816-GBIC is not one of those); those 6700 series modules that are equipped with either a CFC or DFC4; some 6100 series modules.

The control plane functions are mainly performed by the route processor situated on the MSFC3 itself. This includes running the processes for routing protocols, address resolution, maintaining SVIs, and so on.

The switch processor looks after switching functions: building Layer 2 CAM tables and running all Layer 2 protocols (Spanning Tree, VTP, ...).

MSFC: maintains the routing table and does not participate in forwarding packets. It builds the CEF table, which is pushed down to the PFC and DFCs.

The PFC is a daughter card that sits on the supervisor base board and contains the ASICs that are used to accelerate Layer 2 and Layer 3 switching in hardware.

Layer 2 functions: MAC-based forwarding based on the CAM table. Layer 3 functions: forwarding packets using a Layer 3 lookup.

Classic line cards support a connection to the 32-Gbps shared bus but do not have any connections into the crossbar switch fabric. Classic line cards are supported by all generations of the supervisor engines, from the Supervisor Engine 1 through to the Supervisor Engine 720-3BXL.

Redundancy modes in SUP720:

RPR: state information is not in sync; switchover takes 2-4 minutes; traffic is disrupted and I/O modules are reloaded.

RPR+: state is partially initialized, and additional information is needed to bring the system in sync; switchover takes 30 to 60 seconds; I/O modules are not reloaded.

SSO: fully synchronised.

Use show redundancy to check the redundancy status.

To set the redundancy mode:

redundancy
keepalive-enable
mode sso
main-cpu
auto-sync running-config

Sups supporting VSS: VS-S720-10G-3C, VS-S720-10G-3CXL, Sup2T.

Stacking and VSS have a single control plane (on the master), while vPC has two independent control planes.

Nexus Architecture
Independent control and data plane. High availability: dual SUP, power redundancy, line card redundancy.

7009, 7010, 7018

7009: 9 slots, sups in slots 1 and 2, supports 5 fabric modules; each fabric module provides 46 Gbps of backplane capacity, so a total of 5 x 46 = 230 Gbps per-slot bandwidth.

7010: 10 slots, sups in slots 5 and 6, supports 5 fabric modules; each fabric module provides 46 Gbps of backplane capacity, so a total of 5 x 46 = 230 Gbps per-slot bandwidth.

7018: 18 slots, sups in slots 9 and 10, supports 5 fabric modules; each fabric module provides 46 Gbps of backplane capacity, so a total of 5 x 46 = 230 Gbps per-slot bandwidth.

Sups supported: Sup1 supports 4 VDCs including the default VDC. From the default VDC you can allocate resources, and it can perform data plane functions as well.

SUP2: 4+1 VDCs; the extra one is the admin VDC, used just for allocating resources, and it does not pass data. SUP2E: 8+1 VDCs; requires an additional licence to add the extra 4 VDCs.

Line cards supported: M and F series I/O modules.

The initial series of line cards launched by Cisco for the Nexus 7k series switches were M1 and F1. M1 series line cards are basically used for all major Layer 3 operations like MPLS, OTV, routing, etc.; the F1 series line cards, however, are basically Layer 2 cards and are used for FEX, FabricPath, FCoE, etc. If there is only an F1 card in your chassis, you cannot achieve Layer 3 routing. You need to have an M1 card installed in the chassis so that the F1 card can send the traffic to the M1 card for proxy routing. The fabric capacity of an M1 line card is 80 Gbps. Since F1 line cards don't have L3 functionality, they are cheaper and provide a fabric capacity of 230 Gbps. Later Cisco released the M2 and F2 series of line cards. An F2 series line card can also do basic Layer 3 functions; however, it cannot be used for OTV or MPLS. The M2 line card's fabric capacity is 240 Gbps while F2 series line cards have a fabric capacity of 480 Gbps.

There are two series of fabric modules, FAB1 and FAB2. Each FAB1 has a maximum throughput of 46 Gbps per slot, meaning the total per-slot bandwidth available when the chassis is running at full capacity, i.e. with five FAB1s in a single chassis, would be 230 Gbps. Each FAB2 has a maximum throughput of 110 Gbps per slot, meaning the total per-slot bandwidth available with five FAB2s in a single chassis would be 550 Gbps. These are the fabric module capacities; the actual throughput from a line card depends on the type of line card being used and on its fabric connection.

You can mix all cards in the same VDC EXCEPT the F2 card. The F2 card has to be in its own VDC. You can't mix F2 cards with M1/M2 and F1 in the same VDC. As per Cisco, it's a hardware limitation and it creates forwarding issues.

M and M1-XL series: used for Layer 3 routing functions, creation of SVIs, FEX, OTV, TrustSec. Example: M132XP.

F series: Layer 2 functions, FabricPath, vPC+, FCoE. Examples: F132XP, F248XP.
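The per-slot arithmetic above is easy to check with a short Python sketch. The per-module figures (46 and 110 Gbps) and the line card capacities come from these notes; the function names are just illustrative.

```python
# Per-slot throughput contributed by each fabric module, in Gbps (from the notes).
FABRIC_GBPS_PER_SLOT = {"FAB1": 46, "FAB2": 110}


def per_slot_bandwidth(fabric_type, modules_installed):
    """Total fabric bandwidth available to one slot."""
    if not 1 <= modules_installed <= 5:
        raise ValueError("a chassis takes between 1 and 5 fabric modules")
    return FABRIC_GBPS_PER_SLOT[fabric_type] * modules_installed


def usable_bandwidth(fabric_type, modules_installed, card_fabric_capacity):
    """A line card cannot use more than its own fabric connection allows."""
    return min(per_slot_bandwidth(fabric_type, modules_installed),
               card_fabric_capacity)
```

For instance, an M1 card (80 Gbps fabric capacity) is limited by the card itself even with five FAB1s, whereas an F2 card (480 Gbps) behind five FAB1s is limited by the 230 Gbps the fabric can deliver.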

The currently shipping I/O modules do not leverage the full bandwidth; the maximum is 80 Gbps for a 10 Gig module.

In an ideal design we should have a pair of M1 and F1 series modules per VDC.

Depending on the line card, ports run in shared mode or dedicated mode.

Shared mode: all the ports in a port group share the bandwidth.

Dedicated mode: the first port in the port group gets the entire bandwidth and the rest of the ports are disabled.

Example: 32-port 10 Gig I/O module N7K-M132XP-12, with a backplane capacity of 80 Gbps.

Each port group has 10 Gbps of bandwidth that can be used in shared mode or dedicated mode.

A port group is a combination of contiguous ports in odd and even numbering.

A 1 Gig module requires 1 fabric module (46 Gbps), or 2 for N+1 redundancy.

A 10 Gig module requires 2 fabric modules, or 3 for N+1 redundancy.

VoQs are virtual output queues. They are called virtual because they reside on the ingress I/O module but represent egress bandwidth capacity. VoQs are managed by the central arbiter.

Nexus 5000 and 5500: mainly used for Layer 2 only (access layer).

5000: 5010, 5020. 5500: 5548. Layer 2 only, but the 5500 supports a Layer 3 card as well.

Nexus 2k: acts as a remote line card for the 7k and 5k. Once the downlink ports from the 7k or 5k are connected, enable the FEX feature and the parent switch automatically discovers the FEX. The ports on the parent switch that connect to the FEX must be configured with switchport mode fex-fabric and fex associate <number>. Once the feature is enabled and the ports and cables are connected, the FEX starts pulling its software image from the parent switch. Once the FEX is online, its ports appear on the parent switch as interface ethernet <fex-number>/1/x.

Note: the ports on the parent switch facing the FEX need to be configured with switchport mode fex-fabric and fex associate <number>; no configuration is required on the FEX uplink ports connected to the parent.

Nexus 2k does not support local switching. If two hosts in the same VLAN connected to the same 2k try to communicate, the communication happens through the parent switch.

The FEX host ports are pinned to the uplinks connected to the parent switch. All management is done from the parent switch.

Two types of pinning: static pinning and dynamic pinning.

Issue with static pinning: once an uplink between the Nexus 2k and the parent switch fails, all the FEX ports pinned to it need to be manually moved to another uplink to become operational again, while with dynamic pinning they are redistributed automatically.

Nexus 5k: supports static pinning and vPC when we connect a Nexus 2k.

Nexus 7k: not all line cards support FEX, and only port channel is supported when we connect a Nexus 2k to a 7k.

All FEX host ports are considered edge ports from an STP point of view, and BPDU guard is enabled on them.

CFS: Cisco Fabric Services is used to sync configuration and control information between chassis.

The management interface provides out-of-band connectivity, as it is in a separate management VRF.

VDC: virtual device context, used for virtualization of the hardware (both control plane and data plane).

Allocating resources to a VDC: M1, F1 and M2 cards can be allocated together, but F2 cards can only be placed in a VDC of their own.

VDC 1 is the default VDC. It is used to create/delete/suspend other VDCs, allocate resources, configure system-wide QoS, run Ethanalyzer, and upgrade NX-OS across all VDCs.

From the default VDC we can use the switchto vdc command to move to another VDC and switchback to return to the default VDC.

Creating an Admin VDC:

Enter the system admin-vdc command after bootup. The default VDC becomes the admin VDC. All the nonglobal configuration in the default VDC is lost after you enter this command. This option is recommended for existing deployments where the default VDC is used only for administration and does not pass any traffic.

You can change the default VDC to the admin VDC with the system admin-vdc migrate new-vdc-name command. After entering this command, the nonglobal configuration on the default VDC is migrated to the new VDC. This option is recommended for existing deployments where the default VDC is used for production traffic whose downtime must be minimized.

The CMP port is found on SUP1. It is used for console access to the supervisor and runs its own kickstart and system image, separate from the chassis.

A non-default VDC has two separate user roles: vdc-admin has read/write access to the VDC; vdc-operator has read-only access to the VDC.

VDC high availability policy: based on single sup or dual sup.

Bridge Assurance and Network Ports
Cisco NX-OS contains additional features to promote the stability of the network by protecting STP from bridging loops. Bridge assurance works in conjunction with Rapid-PVST BPDUs, and is enabled globally by default in NX-OS. Bridge assurance causes the switch to send BPDUs on all operational ports that carry a port type setting of "network", including alternate and backup ports for each hello time period. If a neighbor port stops receiving BPDUs, the port is moved into the blocking state. If the blocked port begins receiving BPDUs again, it is removed from bridge assurance blocking, and goes through normal Rapid-PVST transition. This bidirectional hello mechanism helps prevent looping conditions caused by unidirectional links or a malfunctioning switch.

Bridge assurance works in conjunction with the spanning-tree port type command. The default port type for all ports in the switch is "normal" for backward compatibility with devices that do not yet support bridge assurance; therefore, even though bridge assurance is enabled globally, it is not active by default on these ports. The port must be configured to a spanning tree port type of "network" for bridge assurance to function on that port. Both ends of a point-to-point Rapid-PVST connection must have the switches enabled for bridge assurance, and have the connecting ports set to type "network" for bridge assurance to function properly. This can be accomplished on two switches running NX-OS, with bridge assurance on by default, and ports configured as type "network" as shown below.

Cisco Nexus 7009: sups in slot 1 and slot 2. Cisco Nexus 7010. Cisco Nexus 7018. Line card capacity differs between modules.

Two types of line cards are available:

1) M series: Layer 3 cards (SVI, OSPF, OTV); can also do Layer 2, TrustSec, FEX.

2) F series: Layer 2 cards only. F2 supports FabricPath, vPC+, FCoE.

Cisco Nexus 5k: mainly used as Layer 2 switches.

5000: 5020 and 5010

5500: 5548 and 5596

Nexus 2k: remote line card

VDC: separate control plane per VDC. Why use VDCs: different roles per chassis per VDC, multiple tenants, test environments for later production use.

VDC limitations: 4 VDCs on Sup1; VDC 1 is the default VDC, used for allocating resources and system-wide QoS, and it passes traffic as well. 4 + 1 VDCs on Sup2; 4 VDCs pass traffic, and 1 is the admin VDC, used only for allocating resources, which does not pass traffic. 8 + 1 VDCs on Sup2E.

VDC 1 is the default VDC and cannot be removed.

vPC: two independent control planes; to sync the state information we use the vPC peer link. When we create a vPC domain ID, it creates a unique system MAC for the peer devices that presents them to downstream devices as a single logical unit. The system MAC is carried in the LACP system ID, which is a combination of priority and MAC address. For orphan ports, each vPC peer uses its own local system MAC.

The vPC primary and secondary are selected based on lowest priority. If the primary goes down and later recovers, the original secondary is still the configured secondary peer, but it keeps the operational primary role.
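A minimal Python sketch of the election rule follows. The lowest-priority rule is from the notes; breaking a tie on the lower system MAC address is an assumption added so the example is deterministic, and the peer names are hypothetical.

```python
def elect_primary(peers):
    """Pick the vPC primary.

    peers: dict of peer name -> (role_priority, system_mac).
    Lowest role priority wins; ties fall back to the lower MAC string
    (assumed tie-break; MACs must use the same lowercase format).
    """
    return min(peers, key=lambda name: peers[name])


# Hypothetical pair of peers: N7K-A has the lower priority and becomes primary.
example = {
    "N7K-A": (100, "00:23:04:ee:be:0a"),
    "N7K-B": (200, "00:23:04:ee:be:01"),
}
primary = elect_primary(example)
```

The election is not preemptive: as noted above, a recovered former primary does not take the operational primary role back automatically.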

The primary peer is responsible for replying to ARP broadcasts and for handling STP BPDU information.

Two types of inconsistency can occur while the peers are forming.

Type 1: global parameters, which include STP parameters, STP mode, port type and STP revision number; and vPC interface configuration (speed, duplex, allowed VLANs, channel mode on/active).

Type 2: HSRP synchronisation, MAC age time, GLBP synchronisation, SVIs, IGMP snooping, VLAN database, etc.

vPC peers: two physical switches. A vPC design is possible only when we have two peers.

Peer link: used to sync control plane information; a Layer 2 link similar to the VSL link in VSS. It uses CFS (Cisco Fabric Services) to sync the control plane information. The peer link is not normally used for the data plane.

Keepalive link: a Layer 3 link used to make sure both chassis are alive. It acts as a heartbeat in the control plane using UDP messages and is also used to prevent a split-brain (dual-active) situation.

vPC member ports: downstream ports facing the end servers that form the port channel used for data plane forwarding.

From a spanning tree point of view we don't have any blocking ports. Whatever VLANs we allow on the vPC port channels, the same VLANs should be allowed on the peer link.

Order of creation of a vPC:

1. Establish IP connectivity for the vPC keepalive link.
2. Enable the vPC and LACP features.
3. Create the vPC domain.
4. Define the keepalive peer address.
5. Create the port channel for the vPC peer link. If this goes down there will be a service failure, so it should have redundancy.
6. Verify the vPC consistency parameters (speed, duplex, allowed VLANs on member ports).
7. Disable the vPC member ports.
8. Configure the vPC member ports.
9. Enable the vPC member ports.

Scenario 1: vPC peer link goes down

The vPC role priority ranges from 1 to 65636, and the default value is 32667. The switch with the lower priority is elected as the vPC primary switch. If the peer link fails, each vPC peer detects whether the other peer is alive through the vPC peer-keepalive link. If the vPC primary switch is alive, the vPC secondary switch suspends its vPC member ports to prevent potential looping, while the vPC primary switch keeps all its vPC member ports active. Without this behaviour, with the peer link down the two vPC peers would no longer appear or act as one virtual switch to the downstream switch; the topology would revert to traditional STP and, because the downstream switch is multi-homed, this could end up in Layer 2 loops.

Second case: what if the secondary vPC peer has disabled its member ports and then the vPC primary peer fails? In this case the member ports on both chassis are down and no traffic is passed.

The vPC auto-recovery feature exists to avoid the above situation.

Scenario 2: peer link up and running, keepalive link down

Q1: According to the official Cisco docs, it seems that nothing is affected by this failure and the only reaction is that the peer link acts as the keepalive link temporarily. So end users will not be aware of this failure at all. Am I right?

Yes, CFS is still running over the peer link.

Scenario 3: both peer link and keepalive link are down; a split-brain scenario is formed.
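The three failure scenarios can be summarised as a small decision function. This is a simplified Python sketch of the behaviour described in these notes (it ignores auto-recovery and the order in which links fail); the function name and return strings are made up for the example.

```python
def vpc_failure_action(peer_link_up, keepalive_up, role):
    """Return what a vPC peer does for a given link state.

    role: "primary" or "secondary" (the operational role of this peer).
    """
    if peer_link_up and keepalive_up:
        return "normal operation"
    if peer_link_up and not keepalive_up:
        # Scenario 2: CFS still runs over the peer link, no traffic impact.
        return "no impact"
    if not peer_link_up and keepalive_up:
        # Scenario 1: the keepalive proves the other peer is alive.
        if role == "secondary":
            return "suspend vPC member ports"
        return "keep vPC member ports active"
    # Scenario 3: neither link works, each peer assumes the other is dead.
    return "split brain"
```

Calling it for both roles with the peer link down and the keepalive up shows the asymmetry: only the secondary suspends its member ports.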

Orphan ports: for example, a server connected with a single link to one switch in the vPC domain. Failure scenarios can lead to connectivity issues for that server.

In an ideal design, use M cards for the keepalive link and a pair of F cards for the peer link.

vPC roles: when we configure vPC for port channeling, one chassis acts as primary and the other as secondary. If both chassis are primary we have a dual-active (split-brain) condition.

vPC loop avoidance: if a frame arrives through a vPC member port and crosses the vPC peer link, it is not allowed to exit through a vPC member port unless the corresponding member port on the other peer is down. vPC loop avoidance is reprogrammed in the data plane based on the state information of the vPC member ports.

In double-sided vPC, two access switches are connected to two aggregation switches whereas in single-sided vPC, one access switch is connected to two aggregation switches.

vPC+: running FabricPath and vPC together.

FHRP: in the case of HSRP/VRRP/GLBP with vPC, we configure one chassis as active and the other as standby for the Layer 3 gateway. If a remote host happens to forward traffic to the standby vPC peer, the standby peer does not forward it over the peer link to the active HSRP gateway; instead it forwards the traffic toward the destination out of its own ports as if it were the active member of the HSRP group, so both peers act as active-active. The peer-gateway feature must be enabled when configuring vPC with HSRP to avoid traffic over the peer link.

FabricPath: Layer 2 routing; eliminates the need for STP. Also called MAC-in-MAC routing.

Terminology in FabricPath:

1. Leaf switch: connects the CE (Classical Ethernet) domain to the FP domain.
2. Spine switch: all ports are in the FP domain.
3. FP core ports: links from leaf to spine or spine to spine (switches in the core). Command: switchport mode fabricpath.

IS-IS is used in the FabricPath core for Layer 2 routing.

Advantage of IS-IS: it uses its own Layer 3 transport (it does not run over IP), so IPv4 or IPv6 is not required.

FabricPath uses the concept of a FabricPath switch ID. It is automatically generated, similar to an OSPF router ID, and the switch IDs are carried in new TLVs defined in the FP protocol.

FabricPath supports ECMP (equal-cost multipathing). FabricPath is similar to TRILL, but FP is a Cisco proprietary feature and TRILL is an open standard.

Switch ID: identifies the node in the shortest path tree.

To manually assign the switch ID: fabricpath switch-id <id>

FabricPath data plane: CE frames received from the Classical Ethernet domain are encapsulated with a FabricPath header. Hardware supporting FabricPath: Nexus 7k F1 and F2 cards and Nexus 5500 only; the 5010 and 5020 do not support FabricPath. Traffic is forwarded in the FP domain using source and destination switch IDs, and the SPT is calculated with the same SPF approach used by IS-IS or OSPF.

FabricPath uses conversational MAC learning to learn source/destination MACs. It does not learn MACs the way traditional MAC learning does during ARP flooding: FP switches install a MAC address in the CAM table only when there is bidirectional communication, and ARP is sent as unicast to get the MAC of the remote host for Layer 2 encapsulation.

Please note that leaf switches must be the root bridges of spanning tree and they are the demarcation point for STP; spanning tree is not extended over FP.

Commands for FabricPath:

1. install feature-set fabricpath

2. feature-set fabricpath

3. Under VLAN configuration mode: mode fabricpath, for those VLANs that need to be tunneled over FabricPath.

4. Leaf switch: switchport mode fabricpath on the upstream ports facing the spine switches. Spine switch: on all ports.

FabricPath runs on conversational MAC learning instead of traditional MAC learning. In traditional MAC learning, a host that needs to send traffic to another host sends an ARP broadcast, and every switch in the transit path populates its MAC table.

In conversational MAC learning, each device in the FP domain has the MAC addresses of its directly connected hosts. If one host needs to communicate with another host across the FP domain, it initiates an ARP request; the MAC address of the remote host is then populated on the FP domain switches involved and communication takes place.

With FabricPath we don't get Layer 2 loops, and there is also a TTL value in the header to prevent a frame looping indefinitely if a loop does occur.

OTV: Layer 2 data centre interconnect technology; Layer 2 VPN over IPv4. For OTV we should have an understanding of SSM and ASM.

To set up FCoE:

1. Enable FCoE on the switch.
2. Map a VSAN for FCoE traffic onto a VLAN.
3. Create virtual Fibre Channel interfaces to carry the FCoE traffic.

switch(config)# feature fcoe

switch(config)# vlan XXX switch(config-vlan)# fcoe vsan YYY switch(config-vlan)# exit

switch(config)# interface vfc ZZ switch(config-if)# bind interface ethernet 1/ZZ switch(config-if)# no shutdown switch(config-if)# exit

switch(config)# vsan database switch(config-vsan-db)# vsan YYY interface vfc ZZ switch(config-vsan-db)# exit

F5 Training
LTM: How BIG-IP processes traffic

Node: represents an IP address. Pool member: combination of IP address and port number; in other words, a pool member is an application server to which F5 directs traffic. Pool: a collection of pool members.

Virtual server: combination of a virtual IP and port, also known as a listener. We associate the virtual server with a pool.

Load balancing methods
Static: round robin, ratio. Dynamic: LFOPD (least connections, fastest, observed, predictive, dynamic ratio).

Least connections: load balancing is based on the connection counts; if the connection counts are equal it uses round robin.
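To make the tie-break concrete, here is a small Python sketch of least-connections selection that falls back to round robin when counts are equal. It illustrates the idea only and is not how BIG-IP is implemented; the class and method names are invented for the example.

```python
class LeastConnectionsPool:
    """Pick the member with the fewest connections; round robin on ties."""

    def __init__(self, members):
        self.order = list(members)
        self.connections = {m: 0 for m in self.order}
        self._next = 0  # round-robin pointer used to break ties

    def pick(self):
        lowest = min(self.connections.values())
        n = len(self.order)
        # Walk the members starting at the round-robin pointer and take
        # the first one that has the lowest connection count.
        for offset in range(n):
            idx = (self._next + offset) % n
            member = self.order[idx]
            if self.connections[member] == lowest:
                self._next = (idx + 1) % n
                self.connections[member] += 1
                return member

    def release(self, member):
        """Call when a connection to the member closes."""
        self.connections[member] -= 1
```

With three idle members the first picks simply rotate through them; once one member closes a connection it has the lowest count and is chosen next.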

Fastest: based on the number of Layer 7 requests pending on each member.

Observed: a ratio load balancing method in which the ratios are assigned by BIG-IP. BIG-IP dynamically checks the connection counts and assigns the ratios for the requests accordingly.

Predictive: similar to observed, but assigns the ratios aggressively based on the average connection counts.

Load balancing can be done by pool member or by node.

Priority activation: helps to configure backup sets for existing pool members. BIG-IP uses the high-priority pool members first.

Fallback host is only used for HTTP requests. If all the pool members are unavailable, BIG-IP redirects the client request to the fallback host.

Monitors: check the status of nodes and pool members. If a pool member's response time is not good or it is not responding, BIG-IP will not send requests to it.

Monitor types:

Address check: BIG-IP sends an ICMP request and waits for a reply. If there is no reply it considers the node down and does not send further traffic to that node.

Service check: checks the TCP port number on which the server is listening. If there is no response it considers the member down.

Content check: checks whether the server is responding with the right content; for HTTP, for example, a GET / HTTP request is sent.

Interactive check: for example a test for an FTP connection. Once the connection is open, the username and password are sent, then a request to get a file is sent; once the file is received the connection is closed.

F5 recommends timeout = 3n + 1, where n is the monitor frequency (interval), when setting the monitor for HTTP.
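The timeout rule is simple enough to express as a one-line helper. This Python sketch just evaluates the 3n + 1 formula from the note above; the function name is illustrative.

```python
def recommended_timeout(interval_seconds):
    """Monitor timeout from the 3n + 1 rule, n being the check interval."""
    return 3 * interval_seconds + 1
```

With a 5-second interval this gives a 16-second timeout, so a member is marked down only after three consecutive checks have gone unanswered.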

Customization of monitor

Assign nodes to monitor

Profiles: define traffic behaviour for a virtual server.

Profiles contain settings for how to process traffic through virtual servers. For certain applications, load balancing each request independently would break the client session; to avoid this we use a persistence profile so that returning requests from the client are sent to the same server.

Persistence profile: configured for clients or groups of clients. It is how BIG-IP knows that a returning client request needs to be sent to the same server. A persistence profile is configured using the source IP address or an HTTP cookie.

SSL termination

FTP profile

All virtual servers have a Layer 4 profile, which includes TCP, UDP and FastL4.

Profile types - service profile, persistence profile, protocol profile, SSL profile, authentication profile, other profiles.

Persistence types
Source address persistence: keeps track of the source IP address. The administrator can set the netmask in the persistence record so that all clients within the same mask are assigned to the same pool member.

Limitation - breaks down if the client address is being NATed, since many clients then share one source address.
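A small sketch of how a netmask groups clients into one persistence record. This only illustrates the idea; the function name and addresses are hypothetical, and it is not how BIG-IP stores its records.

```python
import ipaddress


def persistence_key(client_ip: str, netmask: str) -> str:
    """Mask the client address so all clients in the same
    network share one persistence record key."""
    network = ipaddress.ip_network(f"{client_ip}/{netmask}", strict=False)
    return str(network.network_address)


# Two clients in the same /24 map to the same key, a third does not.
key_a = persistence_key("10.1.1.25", "255.255.255.0")
key_b = persistence_key("10.1.1.200", "255.255.255.0")
key_c = persistence_key("10.1.2.5", "255.255.255.0")
```

With a 255.255.255.255 mask each client gets its own record. Behind a NAT, every client presents the same source address and therefore the same key, which is the limitation noted above.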

Cookie persistence - works with the HTTP protocol only.

Three modes: insert, rewrite and passive.

Insert mode - BIG-IP creates a special cookie in the HTTP response to the client.
Rewrite mode - the pool member creates a blank cookie and BIG-IP rewrites it with the special cookie.
Passive mode - the pool member creates the special cookie and BIG-IP lets it pass through.

SSL Profile
SSL stands for Secure Sockets Layer.

For a website that uses HTTPS we need an SSL profile, since the traffic from source clients is being NATed and the web app uses the HTTPS protocol. With SSL termination, BIG-IP can decrypt the traffic and then assign it to a pool member.

BIG-IP contains SSL encryption hardware, so all the encryption and key exchange are done in hardware. It also gives centralized certificate management.

iRule
An iRule is a script that directs traffic through BIG-IP, based on the Tcl command language. iRules give control over inbound and outbound traffic on BIG-IP.

An iRule contains the following elements: iRule name, events, conditions, actions.

Multicasting
Ranges

Multicast range - 224.0.0.0/4 (224.0.0.0 - 239.255.255.255)

Link-local addresses - 224.0.0.0/24

Source-specific multicast - 232.0.0.0/8

Administratively scoped - 239.0.0.0/8
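A quick way to check which range a group address falls into. A sketch only; the function name and labels are made up, and the SSM block is taken as 232.0.0.0/8 as assigned by IANA.

```python
import ipaddress

MULTICAST = ipaddress.ip_network("224.0.0.0/4")

# Checked in order; anything else inside 224.0.0.0/4 is "other multicast".
RANGES = [
    ("link-local", ipaddress.ip_network("224.0.0.0/24")),
    ("source-specific", ipaddress.ip_network("232.0.0.0/8")),
    ("administratively scoped", ipaddress.ip_network("239.0.0.0/8")),
]


def classify(group: str) -> str:
    """Return the name of the multicast range that contains the address."""
    addr = ipaddress.ip_address(group)
    if addr not in MULTICAST:
        return "not multicast"
    for name, net in RANGES:
        if addr in net:
            return name
    return "other multicast"
```

For example, 224.0.0.13 (PIM routers) classifies as link-local, while the Auto-RP group 224.0.1.39 sits outside the link-local block and is reported as other multicast.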

The multicast control plane works differently from unicast routing: it needs to know who the sender of the multicast is and for which group, and also who the receivers are.

Multicast data plane - performs the RPF check (was the traffic received on the correct interface?) and builds the multicast routing table.

Multicast is source-based routing.

IGMP - hosts on the LAN signal the router to join a multicast group.

Two kinds of request:
(*,G) - any source generating the multicast stream for that group. Supported by IGMP v1 and v2.
(S,G) - join a particular source sending to the multicast group. IGMP version 3 supports both (S,G) and (*,G).

IGMP gets enabled when ip pim (dense-mode, sparse-mode or sparse-dense-mode) is enabled on the interface.

By default IGMP version 2 is enabled.

The ip igmp join-group command (with a group address) can be used for testing on routers, to see whether multicast traffic for a particular group is received on the router.

The ip igmp static-group command can be used to manually place the request for a particular multicast group, instead of relying on IGMP messages from hosts for that group.

PIM - used to signal routers to build the multicast tree. The tree can run from sender to receiver, or from sender to rendezvous point to receiver.

PIM version 1 or 2: by default it is PIM version 2. RP information is already encoded in the PIM packet in version 2, and PIM version 2 has a field for BSR.

Dense mode - implicit join. Multicast traffic is sent across the entire network unless someone reports that it does not want a particular stream. Flood-and-prune behaviour. Neighbor discovery uses multicast address 224.0.0.13, the same as for sparse mode.

Note: if we have a (*,G) entry then we know about the receiver, and if we have an (S,G) entry then we know about the sender as well.

There are two ways to generate multicast traffic: pinging a multicast address, or IP SLA.

In PIM dense mode, the RPF neighbor information is used to send messages upstream towards the source; the message can be a PIM prune or a graft. When the multicast source floods traffic for a particular multicast group, each multicast-enabled router installs an (S,G) entry and a (*,G) entry even if it is not interested.

So in dense mode every router needs to install (*,G) and (S,G) entries, as we cannot have (S,G) until we have (*,G). If the source is active, every router needs to maintain the multicast state table.

Graft message - sent for an (S,G) entry to unprune multicast traffic that was earlier pruned.

State refresh - keeps a pruned link in its pruned state, instead of letting the prune time out and the traffic flood again.

Sparse mode - uses explicit join: traffic is not sent anywhere unless it is requested. Uses the RP as a reference point. With source-specific multicast we do not need an RP; for group-specific (*,G) joins we do. Sparse mode uses both shared trees and shortest path (source-based) trees. The RP needs to know about both the receivers and the senders.

The DR on the source's LAN segment sends an (S,G) register message to the RP, and the RP in turn replies with a register-stop. Receivers on a LAN segment send an IGMP join, which is converted into a PIM (*,G) join towards the RP to form the RPT (shared tree). So the PIM join travels from the receiver to the RP and every device on that path has a (*,G) entry; from the source to the RP every device has an (S,G) entry.

Once the RP knows about both sender and receiver, it sends an (S,G) join back towards the source, and the source's multicast traffic flows to the RP and then on to the receiver. After that it is up to the last-hop router on the receiver side to decide whether to optimise by joining the source directly over the SPT, bypassing the RP.

Note - debug only shows process-switched traffic. To debug data plane traffic we need to disable fast switching on the interface (no ip route-cache). Changing the unicast routing also changes the multicast routing, because the RPF check follows the unicast table. To change the multicast RPF path without touching unicast routing, we can use the ip mroute command.

Source-based tree - the tree is built along the shortest path from the receiver to the sender. Shared tree - the tree runs from the sender to the RP and then from the RP to the receiver.

To check the RP configured on each transit router - sh ip pim rp mapping. The RP can be assigned statically (ip pim rp-address) or dynamically (Auto-RP and BSR).

Auto-RP - uses two data plane multicast addresses. 224.0.1.39 is used by routers willing to become RP to announce themselves to the mapping agents. 224.0.1.40 is used by the mapping agent, which chooses the RP, to advertise the RP information to the rest of the routers.

To stay on the shared tree rather than switching to the SPT - ip pim spt-threshold infinity

Sparse-dense mode - any group for which we have an RP assigned uses sparse mode; other groups use dense mode.

RPF check - used to keep the multicast data plane loop free. When a multicast packet is received on an interface, the router looks up the source in the unicast routing table. If the interface towards the source matches the incoming interface, the RPF check passes; otherwise it fails.
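The RPF check can be sketched as a longest-prefix lookup on the packet's source address, followed by a comparison with the interface the packet arrived on. The routing table, prefixes and interface names below are hypothetical, and a real router uses its RIB/FIB rather than a dictionary scan.

```python
import ipaddress

# Hypothetical unicast routing table: prefix -> interface towards that prefix.
UNICAST_ROUTES = {
    "10.1.0.0/16": "Gi0/0",
    "10.1.5.0/24": "Gi0/1",
    "0.0.0.0/0": "Gi0/2",
}


def rpf_interface(source, routes=UNICAST_ROUTES):
    """Longest-prefix match on the source: the interface the router
    would use to reach it, or None if there is no route."""
    addr = ipaddress.ip_address(source)
    best = None
    for prefix, interface in routes.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, interface)
    return best[1] if best else None


def rpf_check(source, incoming_interface, routes=UNICAST_ROUTES):
    """Pass only if the packet arrived on the interface that leads back to its source."""
    return rpf_interface(source, routes) == incoming_interface
```

A packet from 10.1.5.9 passes when it arrives on Gi0/1 (the /24 wins over the /16) and fails on any other interface, which is what stops a looped copy from being forwarded again.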

Once the multicast routing table is populated, the router always prefers (S,G) over (*,G). Each entry in the multicast routing table has an incoming interface and an OIL (outgoing interface list). If the RPF check passes, the multicast traffic is sent out of all interfaces in the OIL.
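A toy model of the lookup order: try the (S,G) entry first, fall back to (*,G), then forward to the OIL only if the packet arrived on the entry's incoming interface. The table layout, addresses and interface names are assumptions made for illustration.

```python
def lookup_mroute(mroutes, source, group):
    """Prefer the specific (S,G) entry; fall back to the shared (*,G) entry."""
    entry = mroutes.get((source, group))
    if entry is None:
        entry = mroutes.get(("*", group))
    return entry


def forward(mroutes, source, group, incoming_interface):
    """Return the interfaces the packet is replicated to (empty list = drop)."""
    entry = lookup_mroute(mroutes, source, group)
    if entry is None or entry["iif"] != incoming_interface:
        return []  # no state, or RPF failure against the entry's incoming interface
    return [i for i in entry["oil"] if i != incoming_interface]


# Hypothetical state: a shared-tree entry and a more specific source-tree entry.
mroutes = {
    ("*", "239.1.1.1"): {"iif": "Gi0/0", "oil": ["Gi0/1", "Gi0/2"]},
    ("10.1.5.9", "239.1.1.1"): {"iif": "Gi0/3", "oil": ["Gi0/1"]},
}
```

Traffic from 10.1.5.9 is matched against the (S,G) entry, so it is only accepted on Gi0/3; traffic from any other source to the same group falls back to the (*,G) entry and is accepted on Gi0/0.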

On a multicast router - sh ip igmp groups - shows which multicast groups are active on an Ethernet interface and which receiver has joined each group.

To determine which router is the IGMP querier - sh ip igmp interface E0

We can manually tune the query interval and the query max response time:
query interval - ip igmp query-interval 120 (default 60 sec)
response time - ip igmp query-max-response-time 20 (default 10 sec)

IOS command to set which version of IGMP is used - ip igmp version {1 | 2 | 3}

Test commands for IGMP

ip igmp join-group

ip igmp static-group

For sparse mode we need to assign an RP - ip pim rp-address x.x.x.x

To check whether there are any RP mappings - sh ip pim rp mapping

To check the multicast packet counters - sh ip mroute count

In sparse mode there is an SPT (shortest path tree) switchover.

The SPT threshold can be set on the DR (the multicast router that receives the IGMP join requests) in global config mode: ip pim spt-threshold (value). The value is the volume of the multicast feed in kbps.

If the RPF check is failing, we can still get the interface to accept and forward multicast by using a static mroute (ip mroute source mask next-hop-address).

Security
When we enable aaa new-model, the vty lines are checked against local authentication, but we can still log in through the console. Dot1x only works in conjunction with a RADIUS configuration and provides authentication between client and switch.

Wireless
802.11a - 5 GHz, 54 Mbps
802.11b - 2.4 GHz, 11 Mbps
802.11g - 2.4 GHz, 54 Mbps
802.11n - 2.4 or 5 GHz, 600 Mbps
802.11ac - 5 GHz, 1 Gbps

WLAN coverage area - 100 m / 300 ft.

WMAN - 802.16 - WiMAX, used to cover large geographical areas.

SSID - an AP can broadcast multiple SSIDs over a single channel.

Roaming - a client connected in one coverage area moves to another coverage area (from one AP to another AP).

The higher the frequency, the shorter the wavelength. Amplitude - signals can be injected into the air at different power levels.
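The frequency/wavelength relation is wavelength = c / f. A short calculation for the two Wi-Fi bands, using the speed of light in vacuum and nominal band frequencies (real channels sit at slightly different centre frequencies); the function name is made up.

```python
SPEED_OF_LIGHT = 299_792_458  # metres per second


def wavelength_cm(frequency_ghz: float) -> float:
    """Wavelength in centimetres for a frequency given in GHz."""
    return SPEED_OF_LIGHT / (frequency_ghz * 1e9) * 100


wl_24 = wavelength_cm(2.4)  # roughly 12.5 cm
wl_5 = wavelength_cm(5.0)   # roughly 6 cm
```

The 5 GHz wavelength comes out at about half the 2.4 GHz one, which matches the rule that a higher frequency means a shorter wavelength.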

SNR is the signal-to-noise ratio, measured in dB. RSSI is measured in dBm; around -50 dBm is good signal strength.

SSIDs map to VLANs, either at the WLC or at an autonomous AP, and there is a separate encryption domain per SSID. Because the medium is wireless, there is no separate broadcast domain over the air: everyone listens to management frames and discards a frame if it is not intended for them.

WCS manages the WLCs, which in turn manage the APs.

CUWN - manages all the APs through the WLC; most of the configuration is done on the WLC.

LWAPP and CAPWAP - relay a lot of information from the AP to the WLC, such as the coverage and interference the AP is experiencing. Client data is encapsulated in the CAPWAP protocol, along with information about the RSSI and SNR the client is getting.

LWAPP - UDP 12222 (data) and 12223 (control), IPv4 only, encrypts control traffic only.

CAPWAP - UDP 5246 (control) and 5247 (data), IPv4 and IPv6, NAT traversal, uses a different DHCP option for AP association, encrypts via DTLS (Datagram Transport Layer Security), path MTU discovery.

Client data is sent to the AP and relayed to the WLC. The WLC decapsulates the 802.11 header and encapsulates the frame with an 802.3 header and an 802.1Q tag, based on the SSID on which the traffic was received.

WLC - performs RRM (dynamic channel assignment).

WLC modes - Layer 2 mode, where AP and controller are in the same subnet (no longer used), and Layer 3 mode, where AP and controller are in different subnets.

Client roaming -

Layer 2 roaming - covers moving from one AP to another, as well as inter-controller movement. For an inter-controller roam, the client database entry is sent across to the other controller so the client does not have to reauthenticate.

Layer 3 roaming - the controllers are on different subnets. When a client moves from one controller to another controller in a different subnet, the first controller sees that the client has moved and performs the mobility announcement. A copy of the client database entry is sent to the second controller; the first controller marks its entry as anchor and the remote controller marks its copy as foreign. The client IP address is retained. All traffic sent to the second controller is tunneled back to the first controller and forwarded to its destination from there.

Requirements for seamless roaming - controllers in the same mobility group, same SSID, same code version, same ACLs, same virtual IP address, same CAPWAP mode.

Anchor mobility - used for guest access, to send traffic from guest users to a specific WLC in an enterprise environment.

Wireless security -
Symmetric encryption - both ends use the same key to encrypt and decrypt the traffic.
Asymmetric encryption - uses a public/private key pair. Data is encrypted with my public key and sent to me, and I use my private key to decrypt it (public certificates).

Layer 2 authentication and encryption -

Open authentication - open to all.

WEP (Wired Equivalent Privacy) - authentication is open or shared key, integrity is checked through a CRC, and encryption uses a 128-bit key.

Steps - the client sends an authentication request, the AP responds with a clear-text challenge, the client encrypts the challenge with its WEP key and sends it back, and the AP compares the response using the static WEP key.

LEAP - Lightweight Extensible Authentication Protocol - an improvement over static WEP, but it was non-standard and is now deprecated.

WPA - Wi-Fi Protected Access - uses 802.1X for key management and authentication, TKIP for encryption and data integrity.

WPA2 - 802.1X for key management, stronger encryption using AES-CCMP; it was designed with AES in mind.

802.1X - originally used for authentication on wired networks. Uses a RADIUS server for centralized management and the EAP protocol for communication. EAP is used between the client and the AP; RADIUS is used between the AP and the server.

Steps for 802.1X:
1. The client sends authentication credentials to the AP, the AP in turn forwards them to the ACS server using RADIUS, the RADIUS server sends the response back to the AP, and the AP relays it to the client.
2. The client then sends a challenge to the RADIUS server and the RADIUS server responds with validation. Client and RADIUS server derive unique session keys; the key is passed to the AP, which caches it and uses it to encrypt data between client and AP.

EAP is a generic term; actual implementations are EAP-FAST, EAP-TLS, Cisco LEAP.....

For example, EAP-FAST uses Active Directory, while EAP-TLS uses certificates.

Where encryption happens - the AP and WLC form a secure CAPWAP DTLS tunnel, using the manufacturer-installed certificates on both devices to generate the keys. WPA2/AES with PSK or 802.1X is used between client and AP.

CCIE Datacenter
Nexus switches

MDS switches - 9222i, running NX-OS (Multilayer Director Switches, 9200/9500). Native Fibre Channel switches for ToR access or EoR aggregation, with support for FCoE, FCIP and iSCSI.

ACE 4710 - local application load balancing.

ACE GSS 4400 - GSS (Global Site Selector) - DNS-based global load balancing.

UCS 6248 Fabric Interconnect

C Series - rack mount servers

UCS Fabric Interconnect - control and management for C and B series servers.

B Series - blade server chassis

DCNM - Data Center Network Manager

Nexus 5k - end-of-row (EoR) aggregation or top-of-rack (ToR) access; does not have a redundant Sup. Supports unified I/O, meaning both Ethernet and FCoE, as well as native FC switching. Mainly Layer 2 switching, but there is an add-in module for Layer 3 on the 5548 and 5596. The 5548UP and 5596UP also support unified ports.

Storage - just as we create VLANs and use VTP to advertise VLAN information, we can use CFS to distribute the zoning information to other switches. Zoning works like an ACL.

Nexus 2k - top-of-rack access switches. Supports unified I/O with FCoE, but no native FC switching (same as the Nexus 7k). The parent switch can be a 5k or 7k. No local switching: it is a VN-Tag / 802.1BR switch. The Nexus 2k just uses the VN-Tag to forward frames and does not look at the Layer 2 header.

Nexus 7k with redundant Sups uses graceful restart / NSF to signal other devices in the network that a switchover is taking place. The goal is to keep forwarding traffic during the switchover.

There are basically two roles in graceful restart / NSF:

1. NSF-capable device - signals the remote peer about the switchover by sending a grace LSA (a type 9 opaque LSA in OSPF), which tells the remote peer to hold on to the control plane information until the switchover has taken place. Traffic continues to be forwarded by the data plane on the line cards.

2. NSF helper device - understands the graceful restart signals.

NX-OS ISSU and ISSD - with SSO, allow software upgrades or downgrades without traffic disruption.

1. Download the image to flash.
2. Upgrade the standby Sup.
3. Do an SSO so the standby Sup becomes the primary Sup (command line: system switchover).
4. Upgrade the software on the new standby Sup.

EPLD - firmware that needs to be upgraded on the cards for certain functionality. To check: show version module x epld.

OTV - Overlay Transport Virtualization - Layer 2 VPN over IPv4, used as a Layer 2 data centre interconnect (DCI), for example for VMware virtual machine mobility. The same subnet in different data centres communicates using OTV. Helps optimise ARP requests. STP is not extended over the DCI.

1. OTV edge device - device running the OTV feature.
2. Authoritative edge device (AED) - if we have multiple edge devices for redundancy and they could forward traffic for the same set of VLANs, the AED is the active forwarder for a given VLAN, chosen by election.
3. Extended VLANs - VLANs that are bridged over OTV.
4. Site VLAN - internal VLAN that is not extended over OTV.
5. OTV site identifier - unique per DC site, shared between the AEDs of that site.
6. Internal interface - where the end host traffic is received.
7. Overlay interface - tunnel interface that performs the OTV encapsulation.
8. OTV join interface - Layer 3 physical link used to route traffic upstream towards the DCI; cannot be a loopback or SVI.

OTV control group - multicast address used to discover remote sites in the control plane. Uses IS-IS to advertise the MAC addresses of end devices between AEDs. MACs are advertised to the multicast control group, so the DCI must support ASM.

OTV data group - used to tunnel multicast traffic over the OTV data plane. It uses SSM, so AED devices must run IGMPv3 to join source-specific multicast.

OTV adjacency server - used to remove the multicast requirement in the transport. One of the AEDs is chosen as the OTV adjacency server and the other AEDs register with it. All end points are then known, so all data and control traffic is sent as unicast.

NX-OS - Storage Architecture
Storage high-level components:

1. Storage arrays - physical disks providing block-level access to servers. RAID - data is physically written across multiple disks for redundancy. Note that SAN is different from DAS or NAS (anything we need to log in to for file shares).

2. SAN switches - Nexus 5k, 7k, MDS. MDS switches can do multiple protocol conversions, much like IP routers: FC to FCoE, FC to iSCSI, FC to FCIP, and vice versa. Nexus supports Fibre Channel, but does not support the IP-based FCIP or iSCSI.

3. HBA - Host Bus Adapter - the server interface used to access the storage network. Basically the NIC for the SAN: used to connect servers and storage to SAN switches.

Native Fibre Channel HBAs - speeds of 1/2/4/8/16 Gbps. iSCSI HBAs - regular Ethernet NICs with iSCSI offload, 1/10 Gbps.

FCoE - CNA - unified I/O - used for example in UCS C series servers; the card can carry Ethernet as well as FCoE traffic.

Nexus 5548UP and 5596UP ports can be used either as 1/10 Gig Ethernet or as 1/2/4/8 Gbps native Fibre Channel. To change the port type from Ethernet to FC - slot 1 ---> port 24-34 type fc.

Nexus 7k is an Ethernet switch; it supports FCoE but not native FC. FCoE is supported on F1 and F2 modules; F2 requires Sup 2/2E with 6.1(1) or above.

Whatever hardware model we use, storage should be in its own VDC.

Fibre Channel - replaces the traditional locally attached disks on a SCSI cable with disks accessible over a network. Fibre Channel is the protocol stack used to send SCSI data over the SAN.

Fibre Channel port types:
N_Port - node port - end host where the target or initiator resides, in a point-to-point topology.
NL_Port - end port in an arbitrated loop topology.
F_Port - fabric port, which is a switch port.
FL_Port - switch port connected to an NL_Port.
E_Port - expansion port - inter-switch link, similar to trunking in Ethernet.
TE_Port - trunking E_Port, similar to dot1Q in Ethernet; used to trunk multiple VSANs across switches.

Fibre Channel addressing:

WWN - 8-byte address burned in by the manufacturer.

WWNN - physical address of the server, switch or physical disk.
WWPN - physical address of a port of the server, switch or physical disk; used in zoning to limit the traffic.
HBAs in the SAN world can have multiple physical ports that access the same physical disk over multiple VSANs. Each of these ports gets its own port name.

FCID - 3-byte logical address assigned by the fabric, used in data plane switching. It consists of three parts:
1. Domain ID - identifies the switch in the fabric; each switch gets its own domain ID. Assigned by the principal switch (similar to the spanning tree root bridge), or assigned manually.
2. Area ID - a set of ports on a switch shares an area ID.
3. Port ID - an end station connected to the switch has a port ID.
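The three parts of an FCID are one byte each, so splitting a 24-bit FCID is a matter of shifts and masks. A small sketch; the type name, function name and example value are made up.

```python
from typing import NamedTuple


class FCID(NamedTuple):
    domain: int  # identifies the switch
    area: int    # group of ports on that switch
    port: int    # end station on that port


def parse_fcid(fcid: int) -> FCID:
    """Split a 24-bit FCID into its domain, area and port bytes."""
    if not 0 <= fcid <= 0xFFFFFF:
        raise ValueError("FCID must fit in 3 bytes")
    return FCID((fcid >> 16) & 0xFF, (fcid >> 8) & 0xFF, fcid & 0xFF)


example = parse_fcid(0x0A1B2C)  # domain 0x0A, area 0x1B, port 0x2C
```

Because the domain is the top byte, a switch can route on the domain part alone until the frame reaches the destination switch.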

Fibre Channel routing - the Fabric Shortest Path First (FSPF) protocol is used to route traffic between switches. Just as OSPF uses router IDs, FSPF uses the domain IDs from the FCID. Supports ECMP. Runs automatically as a fabric service.

Fibre Channel is a connection-oriented service, meaning end stations need to register in the control plane of the fabric before sending traffic. Registration consists of three parts:

1. FLOGI (fabric login) - the N_Port registers with the F_Port of the fabric; the switch learns the WWNN and WWPN and assigns the FCID to the node. Can be checked using sh flogi database.

2. PLOGI (port login) - the initiator tells the target it wants to talk.

3. PRLI (process login) - upper-layer login between node ports.

Fibre Channel name server (FCNS) - similar to an ARP cache; used to resolve WWNs to FCIDs. It does not require configuration.

VSAN - virtualization of SANs, similar to a VLAN. Fibre Channel services run on a per-VSAN basis (FLOGI, FCNS, zoning etc. per VSAN). Isolates the management and failure domains.

Zoning - similar to an access list in the IP world. Controls which initiator can talk to which target, and is required, not optional. The default zoning policy is deny.

Soft zoning - the initiator registers with the FCNS to get the zoning information. Zoning is enforced in the control plane only, so an initiator can still manually mount the wrong target.

Hard zoning - the initiator registers with the FCNS to get the zoning information. Zoning is enforced in the control plane and the data plane, so an initiator cannot manually mount the wrong target. NX-OS / SAN-OS runs this by default.

Zone / zoneset - a zone is similar to an ACL entry, and zones are grouped into a zoneset. One zoneset can be applied to a VSAN and activated.
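The default-deny behaviour can be pictured as a membership test: two WWPNs may talk only if some zone in the active zoneset contains both. This is a conceptual sketch, not switch code; the WWPNs and zone names are invented.

```python
def can_communicate(active_zoneset, wwpn_a, wwpn_b):
    """True only if both WWPNs are members of at least one common zone.
    With no matching zone the answer is False, i.e. default deny."""
    return any(
        wwpn_a in members and wwpn_b in members
        for members in active_zoneset.values()
    )


# Hypothetical WWPNs and zones, for illustration only.
HOST1 = "21:00:00:00:00:00:00:01"
HOST2 = "21:00:00:00:00:00:00:02"
ARRAY = "50:00:00:00:00:00:00:aa"

active_zoneset = {
    "zone_host1_array": {HOST1, ARRAY},
    "zone_host2_array": {HOST2, ARRAY},
}
```

Each host can reach the array, but the two hosts cannot reach each other because no single zone contains both of them.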

FC aliases - make zoning configuration simpler and can be distributed through manual zone distribution. Device aliases are advertised through CFS.

SAN port channel - similar to an Ethernet port channel, and uses the same numbering space as Ethernet port channels. Use sh port-channel usage on Nexus to check for unused port channel numbers.

Storage arrays typically support three interface types:

1. FC
2. iSCSI, for low to mid range arrays
3. Native FCoE (newer arrays), which can be directly attached to Fibre Channel forwarding switches (like the Nexus 7k and Nexus 5k).

Node port virtualization (NPV) - domain IDs are limited to 256, of which some are reserved and only 239 are available. NPV fixes the domain ID problem by removing the switch from participation in fabric services, so it does not consume a domain ID. Switches that run NPV appear to the rest of the fabric as end hosts. The upstream port on the NPV switch is called an NP_Port (proxy node port), and the downstream-facing port on the NPIV switch is an F_Port.

The upstream (core) switch runs the NPIV feature and the downstream device runs in NPV mode. NPIV allows multiple FCIDs to be allocated on a single port.

IP storage features -

FC vs FCoE - the upper-layer protocol is the same; the difference is only in the Layer 1 and Layer 2 transport. In FC, FCP runs at the upper layer over an FC Layer 1 / Layer 2 transport, while in FCoE Ethernet is the Layer 2 transport.

FCIP - designed to carry SCSI reads and writes over long distances. The stack consists of FCP at the upper layers, followed by TCP, then the IP header, then the Ethernet header. Native FC is sensitive to latency and drops, which is why FCIP was designed.

iSCSI - Internet Small Computer System Interface - sends SCSI commands over IP. A completely different protocol stack from FC. Used for small to medium range SANs. No dedicated SAN switches required. iSCSI is not SAN switching: the storage runs IP, the end host runs IP, and the transport runs IP. MDS is used for protocol conversion between FC and iSCSI.

FCoE - also called converged Ethernet, unified fabric, unified wire.

The FCoE Initialization Protocol is FIP. The FCoE forwarder (FCF) is the fabric switch. The device that starts the negotiation, the end host, is the ENode, and it registers through a virtual Fibre Channel (vfc) interface on the switch.

Port types:
VN_Port - virtual node port - end host side.
VF_Port - virtual fabric port - switch port.
VE_Port - virtual expansion port - for trunking between switches.

In FCoE we replace the Layer 1 and Layer 2 transport with Ethernet, while all the upper-layer services remain the same, including FCID, domain IDs, FCNS, FSPF, etc. FIP is a control plane protocol used for negotiation; it uses EtherType 0x8914 and is used to discover the FCF and perform FLOGI. FCoE data uses a separate EtherType, 0x8906.

FCoE addressing - Ethernet uses 6-byte addresses but the FCID is only 3 bytes. To make up 6 bytes, the switch is configured with a 3-byte FCoE MAC prefix (FC-MAP) that is prepended to the FCID; the result is the 6-byte FPMA (fabric-provided MAC address).
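The 6-byte FPMA can be sketched as a 3-byte switch-side prefix (the FC-MAP) in the upper half and the 3-byte FCID in the lower half. The function name and the example FCID are made up; the prefix 0E:FC:00 is used here as the commonly cited default FC-MAP, so treat it as an assumption if your fabric is configured differently.

```python
DEFAULT_FC_MAP = 0x0EFC00  # assumed default FC-MAP prefix (0e:fc:00)


def fpma(fcid: int, fc_map: int = DEFAULT_FC_MAP) -> str:
    """Build the fabric-provided MAC address: FC-MAP (3 bytes) + FCID (3 bytes)."""
    if not 0 <= fcid <= 0xFFFFFF or not 0 <= fc_map <= 0xFFFFFF:
        raise ValueError("FC-MAP and FCID must each fit in 3 bytes")
    mac = (fc_map << 24) | fcid
    # Format the 48-bit value as six colon-separated hex bytes, high byte first.
    return ":".join(f"{(mac >> shift) & 0xFF:02x}" for shift in range(40, -8, -8))


example_mac = fpma(0x0A1B2C)
```

The lower three bytes of the MAC are the FCID itself, so the switch can recover the FCID from the MAC address without a lookup table.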

Configuring FCoE -

1. Configure the VSAN.
2. Associate the VSAN to a VLAN.
3. Configure the virtual Fibre Channel (vfc) interface.
4. Bind the physical interface to the vfc.
5. Assign the vfc to the VSAN.
6. Configure the physical interface as a trunk to carry the Ethernet LAN traffic as well.
7. Activate the interfaces.