Disclaimer:
Every network is different, so one solution cannot be applied to all. Therefore, try to understand the logic and create your own solution for your own network scenario. Don't just copy and paste.
In case anybody here thinks I am an expert on this stuff: I am NOT certified in anything Mikrotik, Cisco, Linux or Windows. However, I have worked with some core networks, and I read, research and try things all the time. So I am not speaking or posting about things I am formally trained in; I mostly go by experience and what I have learned on my own. And if I don't know something, I read and learn all about it.
So please don't expect me or my postings to always be 100 percent correct. I make mistakes just like everybody else. However, I do my best, learn from my mistakes and always try to help others.
Scenario-1:
We are using a Mikrotik CCR as a PPPoE NAS. We use a public IP routing setup, so each user is assigned a public IP via the PPPoE profile.
Scenario-2:
We are using a single Mikrotik CCR as a PPPoE NAS. We also have a local DSL service, therefore NATting is done on the same router.
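For reference, the two scenarios might look roughly like this on the NAS. This is only a minimal sketch assuming RouterOS v6 console syntax; the pool names, profile names, the 10.0.0.0/16 private range, the local address 1.1.1.1 and the uplink `ether1` are hypothetical placeholders, not values from an actual setup.

```routeros
# Scenario-1 (hypothetical values): users get public IPs directly from the PPPoE profile
/ip pool add name=pppoe_public ranges=1.1.1.2-1.1.1.254
/ppp profile add name=public_users local-address=1.1.1.1 remote-address=pppoe_public

# Scenario-2 (hypothetical values): private-IP users are masqueraded on the same NAS
/ip pool add name=pppoe_private ranges=10.0.0.2-10.0.255.254
/ppp profile add name=private_users local-address=10.0.0.1 remote-address=pppoe_private
# ether1 is assumed to be the upstream (internet) interface
/ip firewall nat add chain=srcnat src-address=10.0.0.0/16 out-interface=ether1 action=masquerade
```

The masquerade rule in Scenario-2 is the part that matters for the problem described next.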
Problem:
When we have network outages, such as a power failure in a particular area, the log shows many PPPoE sessions disconnecting with ‘peer not responding’ messages. At exactly that moment our NAS CPU usage reaches almost 100%, and the router stops passing any kind of traffic. This can last for a minute or so.
As shown in the image below …
If you are using masquerade/NAT on the router, that is the problem. When masquerade is used, RouterOS has to do a full connection tracking recalculation on EACH interface connect/disconnect.
So if you have lots of PPP sessions connecting and disconnecting, connection tracking is constantly being recalculated, which can cause high CPU usage. In short: frequent interface connects/disconnects combined with NAT give you high CPU usage.
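To confirm that this is what is happening on your own NAS, a few read-only console commands can help. This is a sketch assuming RouterOS v6/v7 syntax; the log text matched below is the ‘peer not responding’ message mentioned above.

```routeros
# Is masquerade in use on this NAS?
/ip firewall nat print where action=masquerade

# How many connections are currently being tracked?
/ip firewall connection print count-only

# Current CPU load
/system resource print

# PPPoE disconnects logged with "peer not responding"
/log print where message~"peer not responding"

# Live per-process CPU usage (Ctrl-C to stop); run it during an outage to see which process is eating the CPU
/tool profile
```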
Solution or Possible Workarounds:
First, read this:
Separating NATTING from ROUTING in Mikrotik
https://aacable.wordpress.com/2018/03/27/separating-natting-from-routing-in-mikrotik/
- If you have private IP users with NATting, stop using masquerade on the same router that has a lot of dynamic interfaces. Just DO NOT use NAT on any router that has a high number of connecting/disconnecting interfaces. Place an additional router connected to your PPPoE NAS, and do the NAT there.
Example: Add another router and perform all NATting on that router by sending marked traffic from the private IP range to it. Set up routing between the PPPoE NAS and the NAT router.
- If all of your clients are on public IPs, you can simply turn off connection tracking completely. This is the simplest approach, but beware that turning off CT will also disable all NAT and connection marking.
Note: Instead of turning it off completely, you can also exempt only your public pool from connection tracking.
- Any device that is a core device or gateway on your network should be assigned to perform one job only. Try not to mix multiple functions in one device. This will save you from troubleshooting headaches later.
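The "separate NAT router" idea could be sketched as follows. This assumes RouterOS v6 syntax (`routing-mark`; RouterOS v7 uses routing tables instead), a private user range of 10.0.0.0/16, a point-to-point link 192.168.255.0/30 between the NAS (.1) and the NAT router (.2) with addresses already configured, and `ether1` as the NAT router's uplink. All of these names and addresses are hypothetical.

```routeros
# --- On the PPPoE NAS: no NAT rules here ---
# Mark internet-bound traffic from private-IP users (skip user-to-user and router-local traffic)
/ip firewall mangle add chain=prerouting src-address=10.0.0.0/16 dst-address=!10.0.0.0/16 dst-address-type=!local action=mark-routing new-routing-mark=to_nat passthrough=no
# Send the marked traffic to the NAT router over the point-to-point link
/ip route add dst-address=0.0.0.0/0 gateway=192.168.255.2 routing-mark=to_nat

# --- On the NAT router ---
# Return path to the private user range via the NAS
/ip route add dst-address=10.0.0.0/16 gateway=192.168.255.1
# All NAT happens here, on a router with no dynamic PPP interfaces
/ip firewall nat add chain=srcnat src-address=10.0.0.0/16 out-interface=ether1 action=masquerade
```

If the NAT router's uplink has a static IP, `action=src-nat` with `to-addresses=` is an alternative to masquerade there.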
Please read this:
Features affected by connection tracking
- NAT
- firewall:
  - connection-bytes
  - connection-mark
  - connection-type
  - connection-state
  - connection-limit
  - connection-rate
  - layer7-protocol
  - p2p
  - new-connection-mark
  - tarpit
- p2p matching in simple queues
So if you turn OFF connection tracking, the above features will stop working.
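If all clients are on public IPs and none of the features above are needed, turning connection tracking off completely is a single setting. A sketch in RouterOS console syntax; check the current state before changing it.

```routeros
# Show the current connection tracking settings (enabled=yes|no|auto)
/ip firewall connection tracking print

# Disable connection tracking completely; NAT and all connection-based matchers stop working
/ip firewall connection tracking set enabled=no
```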
– Code Snippet:
A working example of excluding your public pool from connection tracking:
- First, make sure connection tracking is set to auto:
/ip firewall connection tracking set enabled=auto
- Then create an address list containing your users' IP pool, so that we can use this list as an object in multiple rules later:
/ip firewall address-list
add address=1.1.1.0/24 list=public_pool
#add address=2.1.1.0/24 list=public_pool
- Now create rules in the RAW table to turn off connection tracking for our public IP users, in both directions:
/ip firewall raw
add action=notrack chain=prerouting src-address-list=public_pool
add action=notrack chain=prerouting dst-address-list=public_pool
That’s it!
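To check that the exemption is working, you can watch the RAW rule counters and the tracked connection count. A sketch in RouterOS console syntax; the exact columns shown by `print stats` may differ between RouterOS versions.

```routeros
# Packet/byte counters on the notrack rules should increase as users pass traffic
/ip firewall raw print stats

# The tracked connection count should drop once public-pool traffic is no longer tracked
/ip firewall connection print count-only
```

Connections that were already tracked before the rules were added should disappear only as they time out.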
Some Tips for General Router Management
- Turn off all non-essential services that are not actually being used or needed. Every service places additional CPU load on the system. For example, you can move the DHCP role to Cisco switches for better response; this is also highly recommended for inter-VLAN routing. Also, if your RouterOS box is acting as a DNS server, move the DNS role to a dedicated DNS server such as BIND. This will free up some resources on the core system.
- Use 10-gig network cards instead of 1-gig, and 1-gig network cards instead of 100-meg.
- Disable STP if it is not needed. Now, this part is highly debatable, I know 🙂
- Use dynamic queues; they are spread over multiple cores.
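A few of these tips expressed as RouterOS commands. This is a sketch assuming RouterOS v6.41 or later (for the bridge `protocol-mode` setting); the DNS server address 10.0.0.53, the bridge name `bridge1`, the rate limit and the services chosen are hypothetical examples, so adapt them to what your router actually uses.

```routeros
# Disable management services you do not use (example selection)
/ip service set telnet disabled=yes
/ip service set ftp disabled=yes
/ip service set api disabled=yes

# Hand PPPoE users a dedicated DNS server instead of the router itself
/ppp profile set [find name=default] dns-server=10.0.0.53
# Stop the router from answering DNS queries from clients
/ip dns set allow-remote-requests=no

# Turn off STP on a bridge where it is not needed (only if you are sure there are no loops)
/interface bridge set [find name=bridge1] protocol-mode=none

# A rate-limit on the PPP profile creates a dynamic simple queue for each session
/ppp profile set [find name=default] rate-limit=10M/10M
```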