Transcript
Generic Packet Switch Architecture
Traditional Router Architecture
Problems of the Traditional Router
Evolution of the Forwarding Scheme
Gigabit Router Architecture
Key Technologies of the Gigabit Router
wire-speed forwarding
wire-speed switching
wire-speed QoS provisioning
Applications of the Gigabit Router
Internet backbone router
Enterprise (Campus) Network backbone router
Analysis of Cisco Solutions
Catalyst 8500 series
Cisco GSR 12000 series
Conclusion
3
Generic Packet Switch Architecture
A packet switch (ATM switch, LAN switch, IP router) consists of three functions:
[Figure: input port i -> Forwarding Function -> Switching Function (Fabric + Buffer) -> output port j, with a Routing/Signalling/Management Function above]
Forwarding Function: "To which outgoing interface (output port) should the arriving ATM cell / MAC frame / IP packet be forwarded?"
Switching Function (Fabric + Buffer): after the forwarding decision, delivers the cell/frame/packet to that outgoing interface (output port). "With what method and structure, and how fast?"
Routing/Signalling/Management Function: runs routing protocols to compute routes and to create and maintain the routing table; system management, configuration management, ...
4
Routing Function
[Figure: IP router (packets), ATM switch (cells), and LAN switch (frames) all share the same forwarding/switching/routing structure]
The difference lies in the protocols (IP, ATM, Ethernet):
IP: RIP, OSPF, BGP
ATM: UNI, PNNI, Q.2931
Ethernet: STP
5
Forwarding Function
[Figure: per-technology forwarding lookup at input port i, followed by the Switching Function (Fabric + Buffer) toward output port j]
IP Router: lookup in the IP routing table {dst prefix, next hop, outgoing interface, MAC addr, ...} -> internal routing tag, header rewrite
ATM Switch: lookup in the VPI/VCI table {in port, VPI/VCI, out port, VPI/VCI} -> internal routing tag (new VPI/VCI)
LAN Switch: lookup in the MAC SAT table {MAC addr, port#, age} -> internal routing tag
6
Switching Function
Buffer placement: Output Buffering, Shared Buffering, Input Buffering
[Figure: Switching Function (Fabric + Buffer) between input port i and output port j]
Fabric: shared bus, shared memory, crossbar, Banyan, multistage interconnection network, ...
7
Various switches
[Figure: control plane, forwarding table, and switching hardware of each switch type]
ATM switch: PNNI/Q.2931 control, VPI/VCI forwarding table, ATM switch H/W
IP switch (Ipsilon): IP + IFMP control, VPI/VCI forwarding table, ATM switch H/W
Tag switch (Cisco, MPLS): IP + TDP control, VPI/VCI forwarding table, ATM switch H/W
MPOA: IP + ATMARP/NHRP/MARS + PNNI/Q.2931 control, VPI/VCI forwarding table, ATM switch H/W
Legacy router: IP control, IP forwarding table {dst prefix, ...}, shared bus
Gigabit router (L3 switch): IP control, IP forwarding table {dst prefix, ...}, switch fabric (shared memory, crossbar)
8
Various switches (continued)
L2 switch: STP control, SAT, switch fabric (shared memory, crossbar)
L2/3 switch (multilayer switch): IP + STP control, IP routing table + SAT, switch fabric (shared memory, crossbar)
9
IP Router
[Figure: input port i -> Forwarding Function -> Switching Function -> output port j, with the Routing/Signalling/Management Function above]
Control Function
- Routing protocols (RIP, OSPF, BGP, ...)
- Routing table management, system management, ...
Data Path Function
- Forwarding Function
- Switching Function
Speeding up the router = moving the data path into hardware
10
Traditional Router Architecture
[Figure: CPU with routing table and route cache, shared buffer, and line interfaces (Ethernet, FDDI, ...) attached to a shared bus; the network processor runs the routing protocols (RIP, OSPF, BGP, ...), system management, routing table management, packet forwarding (LPM), and packet filtering (security, separation)]
General-purpose microprocessor (CPU): performs both the routing and the forwarding functions
Packet forwarding: CPU + cache + memory (full routing table)
Centralized forwarding (every packet passes through the network processor)
Shared bus: 533 Mbps (Cisco 4000), up to 2-3 Gbps
Supports multiple protocols (IP, IPX, AppleTalk, DECnet, ...)
Supports various LAN technologies (Ethernet, Token Ring, FDDI, ...)
Two data paths: slow path (process switching) and fast path (cache switching)
11
Process switching (slow path): first packet
[Figure: packet flow across the shared bus between a line interface, the shared buffer, and the CPU with RT and cache]
1. Copy the packet to the shared buffer
2. Look up the cache (exact match): miss for the first packet
3. Look up the routing table (LPM)
4. Initialize a cache entry
5. Rewrite the MAC frame header
6. Send to the outgoing line interface
7. Calculate the CRC and send to the link
12
Fast cache switching (fast path): subsequent packets
[Figure: same architecture; only the cache is consulted]
1. Copy the packet to the shared buffer
2. Look up the cache (exact match): hit
3. Rewrite the MAC header
4. Send to the outgoing line interface
5. Calculate the CRC and send to the link
13
Longest Prefix Matching (LPM)
Routing table:
{128.32.1.5/16, port 1}   -> 10000000 00100000 *
{128.32.225.0/18, port 3} -> 10000000 00100000 11*
{128.0.0.0/8, port 5}     -> 10000*
Destination IP address of the arriving packet: 128.32.195.1 = 10000000 00100000 11000011 00000001
Class A (8-bit), Class B (16-bit), and Class C (24-bit) network IDs need only an exact match.
CIDR (Classless Inter-Domain Routing) allows network IDs of arbitrary length, so the destination prefix becomes variable-length.
LPM: on an IP address lookup, find the prefix that matches the destination IP address of the arriving packet over the greatest length.
Here the longest matching prefix is 10000000 00100000 11*, so the packet is forwarded to port 3.
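To make the rule concrete, here is a minimal sketch (not from the original slides) of a longest-prefix match done as a plain linear scan over the example table above; real routers use tries or dedicated hardware, as discussed later. Function and variable names are ours.

# Minimal LPM sketch (Python): linear scan over (prefix, length, port) entries.
def ip_to_int(addr):
    a, b, c, d = (int(x) for x in addr.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d

ROUTES = [  # (prefix, prefix length, output port) - the example routing table
    ("128.32.1.5", 16, 1),
    ("128.32.225.0", 18, 3),
    ("128.0.0.0", 8, 5),
]

def lpm(dst):
    dst_int = ip_to_int(dst)
    best = None  # (prefix length, port) of the longest match so far
    for prefix, plen, port in ROUTES:
        mask = ((1 << plen) - 1) << (32 - plen)
        if (dst_int & mask) == (ip_to_int(prefix) & mask) and (best is None or plen > best[0]):
            best = (plen, port)
    return best

print(lpm("128.32.195.1"))  # -> (18, 3): the /18 entry wins, so the packet goes to port 3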
14
LPM (routing table) vs EM (cache)
Routing table (variable-length prefixes; roughly 500,000 entries in the Internet core):
{IP prefix, next hop, outgoing I/f, MAC address, ...}
{128.32.1.5/16, R2, port 1, MACaddr(R2), ...}
{128.32.225.0/18, R3, port 3, MACaddr(R3), ...}
{128.0.0.0/8, R4, port 5, MACaddr(R4), ...}
Cache (fixed-length addresses):
{IP address, next hop, outgoing I/f, MAC address, age, ...}
{128.32.195.1, R3, port 3, MACaddr(R3), 5, ...}
{165.133.80.1, R4, port 5, MACaddr(R4), 10, ...}
Destination IP address of the arriving packet = 128.32.195.1
First packet: longest-matching-prefix lookup in the routing table (variable-length prefix).
Subsequent packets: exact-match lookup in the cache (fixed-length address).
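The two lookup paths contrasted above can be sketched as follows (a minimal illustration, not Cisco's implementation; lookup_lpm is a stand-in for a full routing-table LPM such as the one sketched earlier):

# Route-cache forwarding sketch (Python): exact match on the cache, LPM fallback.
def lookup_lpm(dst):
    # stand-in for the longest-prefix match against the full routing table
    return ("R3", 3) if dst.startswith("128.32.") else ("R4", 5)

cache = {}  # dst IP address -> (next hop, port); fixed-length, exact-match key

def forward(dst):
    entry = cache.get(dst)       # fast path: exact match on the full address
    if entry is None:
        entry = lookup_lpm(dst)  # slow path: LPM on the routing table (first packet)
        cache[dst] = entry       # initialize a cache entry for subsequent packets
    return entry

forward("128.32.195.1")  # first packet: slow path, cache entry installed
forward("128.32.195.1")  # subsequent packets: cache hit (exact match)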
15
Problems of the traditional router (1)
Forwarding scheme
The main CPU also performs packet forwarding, on top of the routing and management functions.
Centralized forwarding: a single CPU forwards every packet arriving from every line interface (whether via the routing table or the cache).
Traffic concentrates at the network processor (NP); the bottlenecks are access to the NP and the CPU's forwarding decision.
S/W-based forwarding: LPM -> low forwarding rate.
Cache-based forwarding: locality problem. With many routes in the routing table the cache hit rate drops; if the hit rate falls from 100% to 50%, performance drops by about 90%.
[Figure: packets from all line interfaces crossing the shared bus to the network processor (NP)]
16
Problems of the traditional router (2)
Switching fabric
Shared bus: not scalable (bus capacitance limitation), < 2.5 Gbps.
Two-hop operation: under congestion the delay waiting for bus access grows, and system throughput falls, since every packet must cross the bus twice; with a 1 Gbps shared bus the effective switching capacity is only 0.5 Gbps.
[Figure: each packet traverses the shared bus twice, line interface -> CPU -> line interface]
17
Evolution of the forwarding scheme - cache-based (1): distributed forwarding
Distributed forwarding: the routing cache is distributed to each line card.
Data path:
1) First packet: shared buffer -> routing table lookup (LPM) -> outgoing line interface
2) The route processor updates the caches in each line interface
3) Subsequent packets: cache lookup in the line interface -> outgoing line interface
Advantages:
1) Distributed forwarding relieves the traffic concentration at the NP; the main CPU forwards only the first packet.
2) Fewer bus accesses, so system throughput increases.
[Figure: CPU with RT and cache updating per-line-card caches and buffers over the shared bus]
18
Evolution of the forwarding scheme - cache-based (2): separating the routing and forwarding functions
BBN MGR (Multi-Gigabit Router): separates the routing function from the forwarding function by introducing forwarding engine cards (FECs) dedicated solely to forwarding.
Multiple FECs are used, relieving the traffic bottleneck at the central CPU:
1) removes the bottleneck caused by traffic concentrating on a single forwarding point,
2) shares the load among the forwarding engines.
FEC: software-based forwarding (Alpha 21164 + cache + memory, LPM); the high-performance processor raises the forwarding rate.
A switching fabric is used: high throughput.
Data path:
1) A packet arrives: only the header is sent to a forwarding engine card (through the switch fabric).
2) The FE card sends {updated header + forwarding instruction} back to the line interface.
3) The line interface sends the packet to the outgoing interface.
[Figure: line-card buffers and Alpha 21164 FE cards (RT + cache) attached to a switch fabric; paths of the first and second packets]
19
Evolution of the forwarding scheme - full RT lookup (1): separating the routing and forwarding functions
No cache is used; LPM is performed on the FIB by an ASIC or a dedicated processor.
[Figure: FIB-based forwarding variants in the Cisco 7200, 7500, 8500, and 12000: a centralized FIB with a shared buffer, and FIBs distributed to each line card and updated by the route processor, over a shared bus or a switch fabric]
20
Evolution of the forwarding scheme - full RT lookup (1): separating RF and FF with a forwarding table
Instead of a cache, every packet is looked up in the forwarding table to find the longest matching prefix.
Forwarding table: a mirror image of the routing table (organized for LPM; it contains all RT entries).
The main CPU updates the forwarding table whenever routes change.
Data path: topology-driven forwarding, not traffic-driven!
LPM: ASIC or dedicated processor.
Advantage: solves the locality problem of cache-based schemes.
[Figure (C7200): CPU with RT updating a central FT and search engine; line interfaces on a shared bus]
21
Evolution of the forwarding scheme - full RT lookup (2): forwarding tables distributed to each line interface
As above, no cache is used: every packet is looked up in a forwarding table (a mirror image of the routing table, organized for LPM and containing all RT entries) to find the longest matching prefix; the main CPU updates the forwarding tables whenever routes change (topology-driven, not traffic-driven forwarding), and LPM is done by an ASIC or dedicated processor, which solves the locality problem of cache-based schemes.
Here the forwarding table and search engine are replicated in every line interface, so lookups are performed locally at each line card.
[Figure: CPU with RT on a shared bus; each line interface holds its own FT, search engine (SE), and buffer]
22
Evolution of the forwarding scheme - full RT lookup (3): using a switch fabric
The shared bus is replaced by a switching fabric (shared memory or crossbar).
Capacity growth: 2 Gbps -> 5 Gbps -> 40 Gbps -> 160 Gbps.
[Figure: CPU with RT updating per-line-card FTs; line interfaces with FT, search engine, and buffer attached to the switch fabric]
23
Evolution of the forwarding scheme (continued)
[Figure, left: CPU with RT/FT updating per-line-card FTs over a switch fabric - Cisco 8500, Cisco GSR 12000, Juniper M40, ...]
[Figure, right: dedicated forwarding engine cards (Alpha 21164 + RT cache) attached to a switch fabric - BBN MGR, ...]
24
Conventional Router vs Gigabit Router
Conventional router
Forwarding scheme:
- The main CPU performs packet forwarding (forwarding is bundled with routing).
- S/W-based forwarding: LPM -> low forwarding rate.
- Centralized forwarding: a single CPU forwards every packet arriving from each line interface.
- Cache-based forwarding: locality problem.
Switching fabric:
- Shared bus: not scalable, < a few Gbps.
- Two-hop operation: under congestion the delay for bus access grows.
[Figure, left: central CPU (routing protocols, system management, routing table management, packet forwarding (LPM), packet filtering) with shared buffer and RT/cache on a shared bus]
[Figure, right: central CPU (routing protocols, system management, routing table management) updating per-line-card forwarding engines and buffers over a switch fabric]
Gigabit router
- Distributed, hardware-based forwarding
- Separation of the data path and the control plane
- Fast switching using a switch fabric
25
Comparisons
Connection fabric BW - Traditional router: limited bandwidth, 533 Mbps up to 2-2.5 Gbps, shared by all media (line) cards. New fast/gigabit router: nonblocking switching fabric (shared memory, crossbar, crosspoint), 1.2-2.4 Gbps of dedicated bandwidth per media (line) card.
Route lookup table - Traditional: SW-based, cache-based. Gigabit: H/W-based, full route table lookup.
Packet forwarding engine - Traditional: centralized forwarding (single main CPU/NP). Gigabit: multiple forwarding engines, on separate boards or in each line card.
Forwarding rate - Traditional: up to 40-50 Kpps. Gigabit: up to 1-40-160 Mpps.
Media supported - Traditional: E/FE, TR, FDDI, ... Gigabit: FE/GE, ATM OC-12/48, POS OC-12/48 (a SAR chip is unlikely for OC-48/192); required port density = 1.2-2.4 Gbps and beyond.
26
Gigabit Router Products
(Product - area; switching fabric; forwarding rate/engine; line interfaces)
Cisco 8500 - LAN BBR; shared memory, 10-40 (full-duplex 5-20) Gbps; 4-16 Mpps; 1.25 Gbps per line card; E/FE/GE (ATM, POS).
Cisco 12000 - LAN/ISP BBR; crossbar (16x16 @ 2.4 Gbps), 80 Gbps, 64 B cells, CXB arbitration: SLIP; 1 Mpps; 2.4 Gbps per line card; ATM OC-48, POS OC-12, GE.
Juniper M40 - ISP BBR; shared memory (1), SDRAM, 128 MB, 40 (full-duplex 20) Gbps, 64 B; 40 Mpps, a single forwarding-engine ASIC serves all ports (centralized forwarding, no cache; IPv4: LPM on the IP routing table, MPLS: exact match on the MPLS forwarding table); 2.4 Gbps per line card; ATM OC-12 (32), GE, POS OC-48 (8).
Ascend GFR 400 - LAN/ISP BBR; crossbar (4x4 @ 1 Gbps), 4 Gbps, 1-3 us; forwarding engine in each line interface (distributed forwarding, no cache), full routing table + 16 MB buffer + hardware lookup, 1 Gbps per line card; E/FE, ATM OC-3, POS OC-3.
BBN MGR - LAN/ISP BBR; crossbar (15x15 @ 3.32 Gbps), 50 Gbps, 64 B, CXB arbitration: GWFA; 6.5 Mpps, forwarding card: Alpha 21164 + cache-based LPM; media card 2.4 Gbps; FE/GE, ATM OC-12, POS OC-48.
Torrent IP9000 - ISP BBR, LER; shared memory (3), 10 Gbps; 10 Mpps, forwarding engine in each line interface (distributed forwarding), full route lookup (no cache), LPM: proprietary ASIC algorithm; FE/GE, ATM OC-12, POS OC-12.
Foundry BigIron - LAN/ISP BBR; crosspoint (8x8), ~128 Gbps; 100 Mpps; E/FE/GE.
27
Catalyst 8500 Scalable Switching Architecture
[Figure: line modules (FE, GE, ATM, PoS, FEC, GEC ports) with distributed FIBs attached to the switch fabric; the switch CPU / route processor holds the routing table and forwarding information base]
High-QoS shared-memory fabric: non-blocking, low latency; dynamic buffering with per-flow queuing; weighted round robin (WRR) queuing and scheduling for differentiated delay and loss; traffic policing and rate limiting.
Cisco IOS route processor: runs the routing protocols (OSPF, EIGRP, etc.) and computes the forwarding information base.
Cisco Express Forwarding ASICs: FIB-based longest-match hardware; policy classification for QoS and filtering (ACLs).
Flexible port interfaces: GBIC technology for GE; fiber and copper for FE(C), ATM, and POS uplinks.
28
[Figure: generic gigabit router building blocks]
Network processor: routing protocols, routing table management, distribution of updated routes, management functions.
Line card (input port processor, IPP): packet classification, line-speed route lookup, ingress queue management, L3 forwarding at gigabit/second rates (line interface, packet filter, route lookup, IP header processing, ingress queue).
Line card (output port processor, OPP): queuing, congestion handling, flow control, scheduling, QoS enforcement and propagation (egress queue, queueing control/RED, scheduler).
Switch fabric: a highly scalable, multicast-capable, nonblocking fabric for inter-card communication, with a fabric controller.
29
Key Technologies of the Gigabit Router
Wire-speed packet forwarding (several to hundreds of Mpps)
- LPM (Longest Prefix Matching): trie, binary search, ...
- Implementation: ASIC-based, CAM, RISC
Wire-speed packet classification
Switching fabric (several to hundreds of Gbps)
- Switch arbiter
QoS scheduling (WFQ), WRED, CoS, Diffserv
Layer 4 switching
- Policy-based forwarding/queueing/scheduling
Line interface
30
Wire-speed packet forwarding
"If we could do FAST longest prefix matches, then we wouldn't be here." (Nick McKeown, Stanford Univ.)
"If you could get the whole IP forwarding table in fast memory (and update it invisibly), then who needs ATM?"
Approaches
Distributed forwarding: make the forwarding decision on the line card.
LPM: from the binary tree algorithm to new, more efficient LPM algorithms (reduce the number of memory accesses needed to find the longest match).
From a general-purpose microprocessor to dedicated ASIC hardware (speed up the memory accesses).
* RISC processor + cache + conventional Patricia tree algorithm = near wire-speed performance, but too expensive!
31
Longest Prefix Matching (LPM)
(Recap of the earlier example.) Routing table: {128.32.1.5/16, port 1}, {128.32.225.0/18, port 3}, {128.0.0.0/8, port 5}. The arriving packet's destination IP address is 128.32.195.1 = 10000000 00100000 11000011 00000001; the longest matching prefix is 10000000 00100000 11* (the /18 entry), so the packet is forwarded to port 3.
Class A (8-bit), Class B (16-bit), and Class C (24-bit) network IDs need only exact matching; with CIDR (Classless Inter-Domain Routing) a network ID of arbitrary length may be used, so the destination prefix is variable-length and LPM must find the prefix matching the arriving packet's destination address over the greatest length.
32
Binary tree (trie) algorithm
Prefix set {Pi}: P1 = 10*, P2 = 111*, P3 = 1010*, P4 = 11*, P5 = 0*, P6 = 1000*, P7 = 100011*, P8 = 100000*, P9 = 100001*
Example lookup keys: 100010111000, 1011111010, 010101010, 1101110111
[Figure: binary trie built from the prefixes; looking up 100010111000 follows 1-0-0-0 to P6, then 1, and stops because node 10001 has no child for the next bit 0, so the answer is P6]
In the worst case the search needs as many memory accesses as the prefix length; for IPv4 that is at most 32 accesses and about 22 on average (the current average prefix length).
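A minimal sketch of the trie described above (assumed one bit per level, as in the figure): it is built from P1..P9, walks one bit per memory access, and remembers the last prefix node passed, so the key 100010111000 resolves to P6. Class and function names are ours.

# Binary (one-bit) trie for longest prefix matching (Python).
class TrieNode:
    def __init__(self):
        self.child = [None, None]   # children for bit 0 and bit 1
        self.prefix = None          # label (e.g. "P6") if a prefix ends here

def insert(root, bits, label):
    node = root
    for b in bits:
        i = int(b)
        if node.child[i] is None:
            node.child[i] = TrieNode()
        node = node.child[i]
    node.prefix = label

def lookup(root, key_bits):
    node, best = root, None
    for b in key_bits:              # one memory access per bit examined
        node = node.child[int(b)]
        if node is None:            # no child for this bit: stop searching
            break
        if node.prefix is not None:
            best = node.prefix      # longest match seen so far
    return best

root = TrieNode()
for bits, label in [("10", "P1"), ("111", "P2"), ("1010", "P3"), ("11", "P4"), ("0", "P5"),
                    ("1000", "P6"), ("100011", "P7"), ("100000", "P8"), ("100001", "P9")]:
    insert(root, bits, label)

print(lookup(root, "100010111000"))  # -> P6 (node 10001 has no child for the next bit 0)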
33
Required forwarding rate
IPv4: at most 32, on average 22 memory accesses per lookup.
22 accesses/packet x 100 ns/access = 2.2 us/packet => about 450,000 pps.
Packet length = 64 B (control packet), 576 B (typical data packet); port speed = 1 Gbps.
Required forwarding rate = 1 Gbps / (576 B/packet) ~ 0.217 Mpps, or 1 Gbps / (64 B/packet) ~ 1.95 Mpps (one packet every 512 ns).
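Written out as formulas (the same numbers as above):

\[ 22\ \tfrac{\text{accesses}}{\text{packet}} \times 100\ \tfrac{\text{ns}}{\text{access}} = 2.2\ \mu\text{s/packet} \;\Rightarrow\; \approx 4.5\times 10^{5}\ \text{pps}, \qquad \frac{1\ \text{Gb/s}}{576\times 8\ \text{bit}} \approx 0.217\ \text{Mpps}, \qquad \frac{1\ \text{Gb/s}}{64\times 8\ \text{bit}} \approx 1.95\ \text{Mpps}\ (512\ \text{ns/packet}). \]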
34
Recent approaches
Binary search on prefix lengths (Washington Univ., 1997)
Compressed multibit trie (Lulea Univ., 1997)
Single-level prefix expansion (Stanford Univ., 1998)
Binary search on prefix ranges (Washington Univ., 1998)
Controlled prefix expansion (Washington Univ., 1998)
Commercial lookup chips: MUSIC CAM for L2/3/4 address lookup
- L2/L4: exact match
- L3: longest prefix match (up to 20 Mpps)
Vendor solutions: not disclosed, patented
35
Wire-speed switching
Buffer placement: Output Buffering, Shared Buffering, Input Buffering
[Figure: Switching Function (Fabric + Buffer) between input port i and output port j]
Fabric: shared bus, shared memory, crossbar, Banyan, multistage interconnection network, ...
36
Classification of packet switches
Switching fabric
- Space-division switch: crossbar switch, Banyan switch, N^2 disjoint-path switch
- Time-division switch: shared-medium switch (Cubit, Fore, Cisco 4000), shared-memory switch (Cisco 8500)
Buffering scheme
- Input buffering (enhancements: windowing, destination queueing (VOQ), output expansion, input expansion, DQ-based input expansion)
- Output buffering
- Input & output buffering
- Shared buffering
37
Buffer memory requirements (single-port memory, line rate V)
Output queueing: memory bandwidth (N+1)V per buffer (N writes + 1 read per cell time).
- Advantage: optimal performance (throughput and queueing delay). Disadvantage: an N-fold speedup is required.
Shared queueing: memory bandwidth 2NV (N writes + N reads per cell time).
- Advantage: optimal performance (throughput, queueing delay, and cell loss rate). Disadvantage: a 2N-fold speedup is required.
Input queueing: memory bandwidth 2V (1 write + 1 read per cell time).
- Advantages: no speedup needed, good scalability. Disadvantage: limited throughput.
38
Example: design a 16 x 16 switch at 2.5 Gbps per port
Traditional shared-bus fabric (OQ): only a few Gbps
- required shared-bus bandwidth = 16 x 2.5 Gbps = 40 Gbps (impossible!)
Shared-memory fabric (OQ): memory technology grows more in size than in speed
- memory bus ~64 bits, 10 ns dual-port SRAM: 6.4 Gbps
- required memory size: 32K-64K cells/packets (too expensive!)
Multistage interconnection network built from shared-memory switching elements
- possible, but traffic management becomes complex (multiple queueing points)
- to remain nonblocking, 4-6 times more switching elements are needed for a 2x scale-up (cost)
- additional delay
Crossbar fabric (IQ)
- the crossbar switching speed and the input buffer memory run at 2.5 Gbps regardless of switch size N = 16, 32, 64, ...: excellent scalability
- HOL blocking: solvable (VOQ, ...)
- implementing the crossbar scheduler is the hard part
39
Buffering schemes (1): input queueing
Each input can send only one cell per cell time (2.73 us), and each output can receive only one cell per cell time.
The input buffers and the switching fabric only need to run at the bandwidth of the input/output links, independent of switch size N.
Because of HOL blocking the maximum throughput is limited to 58.6%.
Example product: Xylan X-cell switch.
Input buffering - advantages: no speedup, scalability; disadvantage: limited throughput.
Input buffer (FIFO): memory bandwidth 2V (1 write + 1 read per cell time).
40
Buffering schemes (2): output queueing
All cells arriving at the N inputs are delivered to their destination outputs within the same slot.
Performance: ideal delay vs. throughput characteristics.
The switching fabric and the buffer memory are hard to implement (an N-fold speedup is required).
Under bursty traffic the cell loss rate is high: with input buffering a burst is spread across the input buffers, but with output buffering it all converges on one output buffer, causing cell loss.
Example products: 3Com, Bay.
Output buffering - advantage: optimal performance (throughput and queueing delay); disadvantage: N-fold speedup required.
Output buffer (FIFO): memory bandwidth (N+1)V (N writes + 1 read per cell time).
41
Buffering schemes (3): input & output queueing
By speeding up the switching fabric C times, up to C cells can be delivered to an output in one cell time; the rest wait in the input buffers.
Throughput: C = 2 gives 88.6%, C = 4 gives 99.6% (as B = N -> infinity).
The loss characteristics are better than with pure output queueing.
Operating modes: feedback mode, loss mode.
Example product: IBM 8265 Nways switch.
[Figure: input FIFOs feeding output buffers through a fabric with speedup C, with feedback of overflow cells]
42
Problems of input buffering
Constraints: each input port can send only one cell per slot, each output port can receive only one cell per slot, and only the head-of-line cell of each input buffer is eligible for transmission.
Resulting problems: output conflicts, input conflicts, and HOL (head-of-line) blocking.
- The maximum throughput is limited to 58.6%.
- QoS cannot be guaranteed.
[Figure: 4 x 4 example showing blocked cells]
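The 58.6% figure quoted above is the classical saturation-throughput limit of FIFO input queueing under uniform random traffic (Karol, Hluchyj, and Morgan, 1987; the reference is ours, not the slide's):

\[ \rho_{\max} \;\xrightarrow{\;N \to \infty\;}\; 2 - \sqrt{2} \;\approx\; 0.586 \]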
43
Improving the maximum throughput of input-buffered switches: technical tree
Input-buffered switch (FIFO input buffers, N x N fabric): highly scalable and easy to implement (fabric, memory), but suffers from HOL blocking, input conflicts, and output conflicts.
Improvements
- Input buffer structure: windowing, VOQ (logically N FIFOs per input)
- Switching fabric structure: output expansion, input expansion, speedup
Goal: performance approaching that of output buffering or shared buffering, at a cost far below output buffering or shared buffering.
44
Plain input buffering
Input buffer: FIFO; only the head cell of each input takes part in contention.
Switching fabric: N x N; only one cell may be sent from each input and only one cell may be delivered to each output per slot.
Performance: under random traffic with N = infinity, throughput = 58.6%.
[Figure: the cell at input 3 destined for output 2 is HOL-blocked; the cell at input 1 destined for output 4 is blocked by an input conflict; the cell at input 3 destined for output 3 is blocked by an output conflict]
45
Windowing scheme, FIRO(w)
Window scheme - input buffer: FIRO (first-in random-out); the first w cells take part in contention and may be sent in any order.
Contention arbitration: w rounds are carried out sequentially.
Advantage: less HOL blocking -> higher throughput.
Performance: random traffic, w = 2, N = infinity -> throughput = 70.4%.
[Figure: compared with the pure input buffering scheme, the cell for output 2 is now transmitted thanks to the window (window size w = 2, rounds 1 and 2)]
46
Switch expansion: output expansion and input expansion
Output expansion scheme
- Input buffer: FIFO; switching fabric: N x rN (output expansion ratio r), so up to r cells can be delivered to one output per slot; each input still contends with, and sends, only one cell.
- Advantage: fewer output conflicts -> higher throughput.
- Performance: random traffic, r = 2, N = infinity -> throughput = 88.5%.
Window-based input expansion scheme
- Input buffer: FIFO; switching fabric: sN x N (input expansion ratio s), so up to s cells from one input can contend and be sent per slot; only one cell can be delivered to each output.
- Advantage: fewer input conflicts -> higher throughput.
- Performance: random traffic, s = 2, N = infinity -> throughput = 76.4%.
[Figure: the cells for outputs 3 and 1 are delivered thanks to output expansion, the cell for output 4 thanks to input expansion, and the cell for output 2 thanks to windowing]
47
Comparison of maximum throughput under random traffic
[Figure: maximum throughput (0.5-1.0) versus the performance-improvement factor (v, r, alpha, m, s, w) for windowing (w), windowing-based input expansion (s = w), destination queueing (m), destination-queueing-based input expansion (alpha), output expansion (r), and speedup (v)]
48
Comparison of maximum throughput under bursty traffic (mean burst length = 8)
[Figure: maximum throughput versus the performance-improvement factor (v, r, alpha, m, s, w) for the same schemes: windowing (w), windowing-based input expansion (s = w), destination queueing (m), destination-queueing-based input expansion (alpha), output expansion (r), and speedup (v)]
49
VOQ (logically N FIFOs per input)
Switch structure considered: each input buffer is organized as per-destination queues (N FIFOs); the switching fabric is an N x N crossbar.
Operation: a cell arriving at input i destined for output j is stored in queue Q(i,j); every input with cells to send requests a path from the scheduler; each slot, the scheduler decides which input-output pairs will transmit and turns on the corresponding crosspoints.
VOQ removes HOL blocking -> up to 100% throughput.
Constraints: at most one cell may be selected from each input, and at most one cell may be delivered to each output, per slot.
Throughput: the more input-output pairs selected for transmission in each slot, the higher the throughput.
[Figure: 4 x 4 crossbar with per-input VOQs Q(i,1)..Q(i,4) and a central scheduler]
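A minimal sketch of the VOQ organization and of the request matrix handed to the scheduler (names are illustrative; the scheduling algorithms themselves are covered on the following slides):

# Virtual output queueing at one input of an N x N switch (Python).
from collections import deque

class VOQInput:
    def __init__(self, n_outputs):
        self.q = [deque() for _ in range(n_outputs)]  # Q(i,j) for j = 0..N-1

    def enqueue(self, cell, dst):
        self.q[dst].append(cell)   # a cell for output j goes into Q(i,j)

    def requests(self):
        # one row of the request matrix: 1 where a cell waits for output j
        return [1 if self.q[j] else 0 for j in range(len(self.q))]

N = 4
inputs = [VOQInput(N) for _ in range(N)]
inputs[0].enqueue("cell-a", 2)
inputs[3].enqueue("cell-b", 2)
request_matrix = [inp.requests() for inp in inputs]
# Each slot, the scheduler may pick at most one '1' per row (input) and per
# column (output), then closes the corresponding crosspoints.
print(request_matrix)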
50
Structure of the N-FIFO queues
[Figure: cells arriving at an input port are demultiplexed into N per-destination FIFOs Q(i,1)..Q(i,N); the queue occupancies form the request matrix presented to the crossbar scheduler]
51
Arbitration over the N-FIFO queues: maximum size matching vs. maximum weight matching
Maximum size matching: choose the matching that maximizes the number of selected input-output pairs.
Maximum weight matching: choose the matching that maximizes the total weight (e.g., weight = queue length).
[Figure: the same request matrix matched two ways - a maximum size matching of size 4 and weight 4, and a maximum weight matching of size 2 and weight 20]
52
Problem with maximum matching: starvation
Q(2,1) has an output conflict with Q(1,1) and an input conflict with Q(2,2).
A maximum size matching always selects only Q(1,1) and Q(2,2): throughput is high because many input-output pairs are selected, but Q(2,1) is never selected (starvation, unfairness).
[Figure: (a) the size-2 matching {Q(1,1), Q(2,2)} vs. (b) the size-1 matching {Q(2,1)}]
Requirements for an N-FIFO queue scheduler: resolve input and output conflicts, achieve high throughput, avoid starvation and unfairness, and be implementable (real-time arbitration, completed within about 2.8 us).
53
Scheduling algorithms for the N-FIFO queues
PIM (Parallel Iterative Matching) algorithm: AN2 (DEC)
SLIP algorithm: GSR 12000 (Cisco), Tiny Tera (Stanford Univ.)
2DRR (Two-Dimensional Round Robin) algorithm: MGR (BBN)
MUCS (Matrix Unit Cell Scheduler): iPOINT (Univ. of Illinois)
54
PIM (Parallel Iterative Matching) algorithm
Each iteration has three phases: unmatched inputs send requests to all outputs for which they hold cells; each unmatched output grants one request chosen at random; each input that received grants accepts one of them at random. The iterations repeat over the still-unmatched ports.
Properties: parallelism (the arbiters work independently), real-time operation, and randomness (which removes starvation).
For a 16 x 16 switch, about 4 iterations resolve roughly 98% of the match.
[Figure: request, grant, and accept phases over a first and second iteration, with random input and output arbiters]
55
Problems with the PIM algorithm
Unfairness: input-output pairs that face little competition receive more transmission opportunities. In the example, Q(2,1) conflicts with Q(1,1) at the output and with Q(2,2) at the input, so it receives the fewest opportunities: with random grants and accepts, Q(1,1) and Q(2,2) each obtain about 3/4 of their output's capacity while Q(2,1) obtains only about 1/4.
[Figure: request/grant/accept probabilities (grants of 1/2 each; resulting service shares 3/4, 1/4, 3/4) for queues Q(1,1), Q(2,1), Q(2,2)]
56
SLIP algorithm
[Figure: request, grant, and accept phases with the round-robin pointers g1 = 1, g2 = 4, a1 = 1, a3 = 1 and their updates]
Like PIM, SLIP arbitrates between inputs and outputs by iterative matching; the following three steps are repeated for i iterations.
Request: every unmatched input sends a request to each output for which it holds queued cells.
Grant: when an unmatched output j receives requests, it chooses an input in round-robin order starting from the input with the highest priority in the current slot (pointer gj), and sends that input a grant. The pointer gj, which marks the highest-priority input at output j, is advanced to (granted input + 1) mod N only if the grant is accepted in the first iteration.
Accept: when an unmatched input i receives grants, it chooses an output in round-robin order starting from the output with the highest priority in the current slot (pointer ai), and sends that output an accept. The pointer ai, which marks the highest-priority output at input i, is advanced to (accepted output + 1) mod N only if the match is made in the first iteration.
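A compact sketch of one time slot of the algorithm as described above (the pointer update only on a first-iteration match is the key detail); requests[i][j] = 1 means input i has a cell queued for output j, and all names are ours:

# iSLIP-style scheduler sketch (Python): one time slot with a few iterations.
def slip_schedule(requests, g, a, iterations=4):
    N = len(requests)
    match_in = [None] * N        # output matched to each input (None = unmatched)
    match_out = [None] * N       # input matched to each output
    for it in range(iterations):
        # Grant: each unmatched output picks one requesting, unmatched input,
        # round-robin starting from its pointer g[j].
        grant = [None] * N
        for j in range(N):
            if match_out[j] is None:
                for k in range(N):
                    i = (g[j] + k) % N
                    if match_in[i] is None and requests[i][j]:
                        grant[j] = i
                        break
        # Accept: each input that received grants accepts one output,
        # round-robin starting from its pointer a[i].
        for i in range(N):
            if match_in[i] is not None:
                continue
            granted = [j for j in range(N) if grant[j] == i]
            if not granted:
                continue
            j = min(granted, key=lambda out: (out - a[i]) % N)
            match_in[i], match_out[j] = j, i
            if it == 0:                 # pointers advance only on a
                g[j] = (i + 1) % N      # first-iteration match
                a[i] = (j + 1) % N
    return match_in

g, a = [0] * 4, [0] * 4                 # round-robin pointers, kept across slots
reqs = [[1, 1, 0, 0], [1, 0, 0, 0], [0, 1, 0, 1], [0, 0, 0, 0]]
print(slip_schedule(reqs, g, a))        # -> [0, None, 3, None]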
57
Properties of the SLIP algorithm
High throughput: 100% (HOL blocking removed).
Starvation-free: because a pointer is advanced only when its grant is accepted in the first iteration, no queue Q(i,j) can go unserved indefinitely.
Fast: simple round-robin arbiters, only 4 iterations per packet time.
Simple to implement: the scheduler for a 16 x 16 switch fits on a single chip.
58
LQF (Longest Queue First) algorithm
While PIM arbitrates using only the presence or absence of cells in each input-output queue, LQF uses the queue-length information of each input-output pair: the output arbiter grants the requesting input with the longest queue, and the input arbiter accepts the granting output with the longest queue.
[Figure: request/grant/accept in a 4 x 4 example with queue lengths as weights]
59
OCF (Oldest Cell First) algorithm
OCF arbitrates using the queueing delay (age) of the head cell of each input-output pair: the output arbiter grants the input whose head cell has waited longest, and the input arbiter accepts the output whose head cell has waited longest.
[Figure: request/grant/accept in a 4 x 4 example with head-cell waiting times as weights]
60
2DRR (Two-Dimensional Round Robin) algorithm
- Arbitrates by scanning the N diagonals of the request matrix.
- At each step, the entries on the current diagonal that cause no input or output conflict are selected for transmission.
- In the next slot the scan starts from the next diagonal, so every input-output pair gets top priority once every N slots (fairness).
- Very simple to implement, but unstable under certain switch loads (throughput is not guaranteed).
- A variant, the GWFA algorithm, is implemented in the BBN MGR.
[Figure: a 4 x 4 request matrix scanned diagonal by diagonal over steps 1-4]
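A minimal sketch of the diagonal scan described above (the diagonal numbering, with entry (i, (i+d) mod N) on diagonal d, is our assumption of the usual 2DRR layout):

# 2DRR sketch (Python): one slot over an N x N request matrix.
def two_drr(requests, start_diag):
    N = len(requests)
    row_used, col_used = [False] * N, [False] * N
    selected = []                       # (input, output) pairs chosen this slot
    for step in range(N):               # examine the N diagonals in order
        d = (start_diag + step) % N
        for i in range(N):
            j = (i + d) % N             # entry (i, j) lies on diagonal d
            if requests[i][j] and not row_used[i] and not col_used[j]:
                selected.append((i, j))
                row_used[i] = True      # at most one cell per input ...
                col_used[j] = True      # ... and one per output
    return selected

reqs = [[1, 0, 0, 1],
        [0, 1, 1, 0],
        [0, 1, 0, 1],
        [1, 1, 1, 0]]
print(two_drr(reqs, start_diag=0))      # slot 1
print(two_drr(reqs, start_diag=1))      # slot 2: the scan starts one diagonal later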
61
Problem with 2DRR: unfairness under non-uniform traffic
When traffic is not uniform, the fixed diagonal order can serve competing queues unequally: in the example the transmissions over four slots go to inputs 3, 4, 4, and 2, so the three contending pairs receive 25%, 25%, and 50% of the output capacity instead of 33% each.
[Figure: request matrices over slots 1-4 showing the unequal service of inputs 2, 3, and 4]
62
MUCS (Matrix Unit Cell Scheduler)
- Traffic matrix: a_ij = 1 if queue Q(i,j) has a cell waiting, 0 if it is empty.
- Weight matrix: w_ij = a_ij / rowsum(i) + a_ij / colsum(j).
- Select the input-output pair (i,j) with the largest weight, then delete row i and column j from the traffic matrix (resolving that pair's input and output conflicts), and repeat.
- By favoring pairs with few conflicts, the number of pairs that can still contend in later steps is kept as large as possible.
- Implemented in the iPOINT switch (Univ. of Illinois).
[Figure: a 4 x 4 traffic matrix, its weight matrix with row and column sums, and three selection steps choosing (3,4), (1,2), (2,1), (4,3)]
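A small sketch of the weight computation and iterative selection just described; the example traffic matrix is the one on the slide, and ties are broken by scan order:

# MUCS sketch (Python): pick the heaviest entry, delete its row and column, repeat.
def mucs(traffic):
    N = len(traffic)
    a = [row[:] for row in traffic]     # working copy of the traffic matrix
    selected = []
    while True:
        rowsum = [sum(a[i]) for i in range(N)]
        colsum = [sum(a[i][j] for i in range(N)) for j in range(N)]
        best, best_w = None, 0.0
        for i in range(N):
            for j in range(N):
                if a[i][j]:
                    w = 1.0 / rowsum[i] + 1.0 / colsum[j]   # w_ij when a_ij = 1
                    if w > best_w:
                        best, best_w = (i, j), w
        if best is None:
            return selected
        i, j = best
        selected.append(best)
        for k in range(N):              # delete row i and column j: this resolves
            a[i][k] = 0                 # the input conflict at i ...
            a[k][j] = 0                 # ... and the output conflict at j

traffic = [[0, 1, 1, 0],
           [1, 0, 0, 0],
           [0, 0, 0, 1],
           [1, 0, 1, 0]]
print(mucs(traffic))  # -> [(2, 3), (0, 1), (1, 0), (3, 2)], i.e. (3,4), (1,2), (2,1), (4,3) in the slide's 1-based notation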
63
Advantages of MUCS
By selecting input-output pairs with few conflicts, MUCS keeps the number of pairs that can still contend in later steps as large as possible; it is a kind of maximum size matching and therefore achieves high throughput.
[Figure: choosing Q(3,4) in step 1 leaves 5 pairs able to contend in step 2, whereas choosing Q(4,3) in step 1 leaves only 3]
64
Problem with MUCS: starvation
PIM, which arbitrates by random selection, already favors pairs with few input/output conflicts: in the example, pairs (1,1) and (2,2) each use about 75% of their output's capacity while (2,1) uses only 25%. MUCS is even worse: because (1,1) and (2,2) always receive higher weights than (2,1), pair (2,1) is never selected at all.
[Figure: queues Q(1,1), Q(2,1), Q(2,2) with service shares 3/4, 1/4, 3/4 under PIM and the corresponding MUCS weights]
65
Analysis of Cisco router architectures
Catalyst 8500 series ('98): LAN backbone router
Cisco GSR 12000 series ('98): ISP/LAN backbone router
66
Cisco’s Catalyst 8510/8540 architecture
1 Switching capacity
: 10Gbps/40Gbps
2 Switching fabric
: shared memory 3MB/12MB
3 Forwarding rate
: 6Mpps/24Mpps
4 Forwarding scheme
: dCEF
5 Route entries
: 16K/64K routes
6 L3 protocol: IP, Ipx
7 Line card
: E/FE/GE,(ATM,PoS)
RT: Routing TableFIB: Forwarding Information BaseAT: Adjacency Table
125Gbps
125Gbps
slot0
slot1
slot2
RT
FIB
AT
Route
processor
Shared
Memory
10Gbps
(3Mbyte)
Framescheduler
Switch route processor(SRP)
System processor
125Gbps
125
Gbps
Route Processor
Runs Routing Protocols (OSPF, EIGRP, etc)
ComputesForwardingInformation Base
slot3
FIB
AT
FE
FIB
AT
FE
FIB
AT
FE
FIB
AT
FE
67
Switch Route Processor (SRP): switching fabric and arbitration
C8510: 3 MB, 10 (full-duplex 5) Gbps shared memory; 4 priority classes per port (management and control frames destined for the CPU get the highest priority).
FSA (Frame Scheduler ASIC) arbitration between the FSA and the CEFAs on the line cards: each CEFA sends a request (TDM) and the FSA grants.
WRITE: a frame entering the switch fabric is written into the priority queue of its destination port according to its internal routing tag.
READ: weighted round robin (WRR) across the priority queues, e.g., 8:4:2:1.
[Figure: shared memory organized as per-port priority queues for ports 0-31, with the frame scheduler and the system/route processors]
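A tiny sketch of the WRR read side mentioned above, with the 8:4:2:1 weights over the four priority queues of one output port (queue and variable names are illustrative, not Cisco's):

# Weighted round-robin read scheduler sketch (Python).
from collections import deque

queues = [deque() for _ in range(4)]    # priority 0 (highest) .. 3 (lowest)
weights = [8, 4, 2, 1]

def wrr_read():
    # One WRR cycle: read up to weights[p] frames from each priority queue.
    sent = []
    for p, w in enumerate(weights):
        for _ in range(w):
            if not queues[p]:
                break                   # queue empty: move to the next priority
            sent.append(queues[p].popleft())
    return sent

for p in range(4):
    for k in range(3):
        queues[p].append("P%d-frame%d" % (p, k))
print(wrr_read())  # priorities 0 and 1 drain fully, priority 2 sends 2 frames, priority 3 sends 1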
68
Switch Route Processor (SRP): system processor
System processor / route processor: CPU: 100 MHz R4600 RISC; OS: Cisco IOS 12.0.
Routing protocols: IP: RIP, OSPF, IGRP, EIGRP; IPX: RIP, EIGRP; multicast: PIM, DVMRP.
The route processor maintains three tables:
- Routing table: {dst prefix, next hop}
- FIB table: {dst prefix, next hop, pointer into the adjacency table}
- Adjacency table (AT): {outgoing i/f, MAC addr}, in 1:1 mapping with the next hops
[Figure: example entry with destination prefix 165.133..., next hop R2, outgoing interface FE1, and R2's MAC address, linked from the RT through the FIB to the AT]
69
Line Card Architecture
[Figure: four PHY ports feeding the CEF ASIC (microcontroller, search engine, 16K or 64K CAM, 8 KB internal memory including 2 KB of command instructions), with a Frame FIFO and a Control FIFO toward the fabric interface and the switching fabric]
CEFA (Cisco Express Forwarding ASIC): destination IP prefix (and MAC address) lookup.
- Serves 4 ports on the line card (an 8-port line card has 2 CEFAs).
- Internal memory (8 KB SRAM): holds the packet while the address lookup is performed.
- Microcontroller: (1) arbitrates fair access to the internal memory among the 4 ports; (2) passes control packets (ARP, routing protocols, STP, ...) to the main CPU (RSP).
- Search engine: L2 address exact match, L3 longest prefix match.
FI (Fabric Interface): attaches the internal routing tag to the frame; Frame FIFO and Control FIFO.
Internal routing tag = {port-of-exit, delay priority, drop priority}.
70
Data path in a Line Card (1)
[Figure: the first 64 bytes (L2/L3 headers) of the arriving frame go to the CEF ASIC for lookup while the frame is held in internal memory; the frame then leaves through the fabric interface with its routing tag toward the switching fabric]
71
Data path in a Line Card (2)
1. A frame arrives; the microcontroller reads the first 64 bytes of the frame.
2. The microcontroller sends {src/dst MAC address (L2), dst IP address (L3), L4 port information} to the search engine.
3. The search engine runs the binary tree algorithm: MAC address match, IP longest prefix match.
4. The search engine returns {port-of-exit, rewrite info, QoS priority}.
5. The microcontroller writes the {port-of-exit, rewrite info, priority} from the search engine into the Control FIFO.
6. The internal memory sends the packet to the Frame FIFO.
7. The fabric interface rewrites the new L2 information, recomputes the checksum, attaches the internal routing tag {port-of-exit, delay priority, drop priority}, and requests the fabric (from the FSA on the SRP).
72
Cisco’s GSR 12000 series architecture
125Gbps
125
Gbps
slot0
slot1
slot2
RT
FT
Route
processor
Shared
Memory
10Gbps
(3Mbyte)
Framescheduler
Switch route processor(SRP)
System processor
125
Gbps
125Gbps
slot3
FT
FE
FT
FE
FT
FE
FT
FE
24
Gbps
24
Gbps
24Gbps
RT
FT
Routeprocessor
24
Gbps
FT
FE
FT
FE
FT
FE
FT
FE
GRP
BUF32MB
BUF32MB
BUF32MB
BUF32MB
C8500
GSR 12000
CrossbarSwitchFabric(1616@24Gbps)40Gbps
Crossbar
scheduler
Main changes
SF: SM(5Gbps) Crossbar(40Gbps)
Shared queueing Input queueing scheme
Forwarding scheme은동일
73
Applications of the Gigabit Router
- Internet backbone router
- Enterprise (campus) network backbone router
[Figure: gigabit routers deployed in an IP router backbone, an IP-over-ATM backbone (MPLS), and a multiservice backbone (IP, ATM, FR) in the global Internet/WAN, and as backbone routers/switches (BBR, L2SW, BBSW, ER, IP switch/CSR, LANE and MPOA components: LEC, LES, MPC, MPS, one-armed router) in enterprise networks / private intranets / LANs]
74
Conclusion
Trend in routing
- Separation of the routing function and the forwarding function.
- Distribution, hardware implementation, full route table lookup.
- Use of a switching fabric.
- Providing QoS is the hot issue.
Trend in switching
- Single-stage shared memory and shared bus are not suitable for high-speed switches.
- Input-buffered switches and multistage switches are the suitable choices.
Trend in the enterprise (campus) network
- Repeater -> bridge -> L2 switch -> router -> L3 switch (gigabit switch, multilayer switch).
- LANE/CIOA with a one-armed router -> (IP switch, CSR or NHRP, MPOA, or L3 switch).
Trend in the Internet backbone
- Currently an Internet based on traditional routers -> (IP over ATM, or MPLS, or gigabit routers).