Hi All,

I'm curious as to how many people are using MikroTik devices as border routers? We're using three CCR1072s at three different datacentres.

The problem (as you can probably guess) is the BGP implementation. We're taking partial routes from peers (IXs, local networks we work with, our own downstream IaaS platforms, etc). Currently we're taking around 250k routes in each router, and they are already struggling.

Given we don't take full tables and we have default routes at each site, convergence isn't necessarily a huge problem (although it is slow). But things like "routing bgp advertisements print where peer=ABC", run as part of diagnostics, are taking over a minute and becoming a problem.

How many others are actually using these devices in these sorts of scenarios? I'm beginning to think that once you reach this point, these things need to be tossed aside for real hardware which can cope in the real world...

Thoughts?

Shane
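For concreteness, a minimal RouterOS v6 sketch of the kind of setup Shane describes (partial-table eBGP peers plus a static default route at each site); every name, address and AS number below is an illustrative placeholder:

    # RouterOS v6; addresses and ASNs are placeholders
    /routing bgp instance
    set default as=64500 router-id=192.0.2.1

    # static default route per site, so traffic flows regardless of BGP state
    /ip route
    add dst-address=0.0.0.0/0 gateway=192.0.2.254 distance=250

    # inbound filter chain referenced by the peer below (kept trivial here)
    /routing filter
    add chain=bgp-in action=accept

    # partial-table eBGP session to an IX peer
    /routing bgp peer
    add name=ix-peer remote-address=192.0.2.10 remote-as=64501 in-filter=bgp-in

    # the diagnostic that becomes slow at ~250k routes
    /routing bgp advertisements print peer=ix-peer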
Hi Shane,

I highly recommend you watch this video from MUM Europe 2018, where they compare CHR BGP performance across different hypervisors. If you can switch from CCR to CHR, I think you will find some performance benefits. https://youtu.be/xcgdGA1W_0o It's about 30 minutes long. I haven't been able to find the slides for it yet.

I know a lot of it is about BGP convergence, but it also shows how much faster the general processing performance of a CHR is, due to the higher per-core processor speed. If possible, I'd recommend you build one or more CHR VMs (the free license may be suitable despite its 1Mbps per-interface transmit limit, and there is a free 60-day trial of the other license levels). You could lab them up as in the video (using pre-built route tables), or peer them with your existing CCR routers to pull in the live route tables, and then run the commands you see performing slowly on the CCRs.

Key points about their lab:
- Route tables are around 680k routes each.
- Tests are run on the same host hardware, with the same non-CHR devices, all on traffic THROUGH the devices, not to/from the devices directly.
- The primary difference between the tests is the hypervisor itself. More details about the equipment are in the video.

Results (CPU usage in parentheses where reported):

    Test                            VMware         KVM (Proxmox)  Hyper-V
    BGP convergence, 1 table        25s            25s            12s
    BGP convergence, 2 tables       4m25s          1m34s          <1m
    Throughput, 1500-byte packets   5.3Gbps (28%)  4.4Gbps (28%)  4.4Gbps (32%)
    Throughput, 512-byte packets    3Gbps (38%)    1.8Gbps (29%)  1.7Gbps (33%)
    Throughput, 96-byte packets     1Gbps          400Mbps        400Mbps (37%)
    Full BGP load under traffic     11m            9m             43s

So it seems that, given a free choice of hypervisor, it comes down to a trade-off between throughput and BGP convergence performance. If no other factors affected the decision (e.g. cost, management skills), route reflectors may be better hosted on Hyper-V, with packet-pushing routers hosted on VMware.

Regards,

Philip Loenneker | Network Engineer | TasmaNet
40-50 Innovation Drive, Dowsing Point, Tas 7010, Australia
P: 1300 792 711
philip.loenneker@tasmanet.com.au
www.tasmanet.com.au
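A minimal sketch of the live-peering lab Philip suggests, assuming RouterOS v6 and an iBGP session between the production CCR and the lab CHR; all names and addresses here are placeholders:

    # on the production CCR: iBGP session feeding the live table to the lab CHR
    /routing bgp peer
    add name=chr-lab remote-address=10.99.0.2 remote-as=64500

    # on the lab CHR (same AS, so the session is iBGP)
    /routing bgp instance
    set default as=64500 router-id=10.99.0.2
    /routing bgp peer
    add name=ccr-live remote-address=10.99.0.1 remote-as=64500

    # once the table has transferred, time the commands that are slow on the CCR
    /routing bgp advertisements print peer=ccr-live

Since eBGP-learned routes are advertised to iBGP peers by default, the CHR should end up holding roughly the same table as the CCR without any extra export policy.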
I made the move from x86 to CCR (I wanted physical hardware) and didn't try CHR. I had to get creative with my BGP: it was taking up to 3 minutes per peer to process 4 full-table BGP peers. I did a lot of filtering, and I found that to get down to 2-3 seconds I needed to be under 5k routes.

Alex
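A sketch of the kind of inbound filtering Alex describes, assuming RouterOS v6 syntax; the two accept rules are illustrative strategies rather than a recommended policy, and all names are placeholders:

    # example inbound filter, attached per-peer with in-filter=bgp-in
    /routing filter
    # (a) keep routes originated by the peer or its direct customers
    add chain=bgp-in bgp-as-path-length=1-2 action=accept
    # (b) keep only large aggregates, /20 and shorter
    add chain=bgp-in prefix-length=0-20 action=accept
    # drop everything else; the default route carries the rest
    add chain=bgp-in action=discard

    /routing bgp peer
    set [find name=ix-peer] in-filter=bgp-in

In practice the accept rules are whatever cuts the table to a size the hardware handles comfortably, per Alex's roughly 5k-route figure.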