I have a strange problem on our network. We have 4 CMTS routers: 3 Cisco and 1 Motorola 64K. The Motorola is new and only has a handful of customers on it (fewer than 60). Some of the modems on the Motorola are randomly getting knocked offline. What is strange is that the router still sees the modem in an online state and the modem thinks it is still online (solid 'online' light), but I can't ping it. The only way to fix the problem is to power cycle the modem. If I leave the modem alone, eventually the router will see it as offline and the 'online' light on the modem will flash.
In one case I have two modems on two separate routers that both exhibit the same problem. The modems are right next to each other and they go offline within a few hours of each other. My plant is pretty clean and all the numbers look pretty good. The upstream frequencies on the two modems I saw with the problem are 24.784 (Motorola) and 21.376 (Cisco). I know this is happening with about 6 modems on the Motorola. I'm not sure of the numbers on the other routers since I don't track those as closely.
Anyone have any ideas?
I can't tell you exactly what is going on, but I may be able to get you looking in the right direction.
What is happening is that you are losing communication with your modems. They do not go offline right away because they wait for a specified timer value (the T3 timer, I believe) for a RNG-RSP from the CMTS, and they only go offline if they have not heard from the CMTS by the time that timer expires. The same goes for the CMTS: it does not mark the modem offline the first time it fails to check in, but waits for a specified timeout before marking it offline.
Now, as to the cause, that is a little trickier. You are losing downstream and/or upstream communication with the modem. This could be an outage, ingress, common path distortion, or many other things. Since you stated that your plant is clean, here are some other ideas for you.
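To make the "still shows online" behaviour concrete, here is a minimal sketch of that kind of timeout logic. It is not the actual DOCSIS state machine: the class name and the 30-second value are made up for illustration, and real T3/T4 timer values and retry counts differ.

```python
from dataclasses import dataclass


@dataclass
class LinkMonitor:
    """One side's view of the other side (modem's view of the CMTS, or vice versa)."""

    timeout_s: float          # hypothetical grace period, not a real DOCSIS timer value
    last_heard_s: float = 0.0

    def heard(self, now_s: float) -> None:
        # Record the time of the last message received from the peer.
        self.last_heard_s = now_s

    def state(self, now_s: float) -> str:
        # The peer is only declared offline once the grace period has fully elapsed.
        if now_s - self.last_heard_s < self.timeout_s:
            return "online"
        return "offline"


modem_view = LinkMonitor(timeout_s=30.0)
modem_view.heard(0.0)           # last response heard at t = 0
print(modem_view.state(10.0))   # link already broken, but still reported as online
print(modem_view.state(45.0))   # grace period expired, now reported as offline
```

Between the moment the link actually breaks and the moment the timeout expires, both sides keep reporting "online", which matches the solid light on a modem that cannot be pinged.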
1. We had a problem where modems were being seen on the wrong upstream port because all the upstreams on that blade were configured for the same upstream frequency. This caused those modems to exhibit behavior similar to yours, and caused ghost modems to appear on other upstream ports. It happened randomly for a while and was not a huge problem, but then one day a lot of these ghost modems stopped coming up altogether. The solution was to put the upstream ports in question on different frequencies. The problem was easily identifiable by the modems appearing on the wrong upstream port.
2. Use MRTG or something similar (I would recommend Zenoss or Cacti) to graph the signal levels of the modem and the signal levels and online modem counts of the CMTS ports, and see if anything identifiable shows up when the problem occurs. You may also want to keep track of free minislots, processor utilization, and upstream utilization.
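If you end up with the polled values in a file or database rather than only on a graph, a check along these lines could help pinpoint when something changed. This is only a sketch: the sample data, the modem names, and the 3 dB threshold are all made up, and collecting the readings (SNMP polling or otherwise) is left out.

```python
from collections import defaultdict


def find_level_shifts(samples, threshold_db=3.0):
    """Return (modem, time_s, previous_db, current_db) for each jump above threshold_db.

    samples is an iterable of (time_s, modem, level_db) tuples in any order.
    """
    by_modem = defaultdict(list)
    for time_s, modem, level_db in samples:
        by_modem[modem].append((time_s, level_db))

    shifts = []
    for modem, readings in by_modem.items():
        readings.sort()  # order each modem's readings by time
        # Compare every reading with the one that came just before it.
        for (_, prev_db), (time_s, cur_db) in zip(readings, readings[1:]):
            if abs(cur_db - prev_db) > threshold_db:
                shifts.append((modem, time_s, prev_db, cur_db))
    return shifts


# Hypothetical upstream SNR readings (seconds, modem, dB), polled every 5 minutes.
samples = [
    (0, "modem-a", 32.0),
    (300, "modem-a", 31.5),
    (600, "modem-a", 24.0),
    (0, "modem-b", 33.0),
    (300, "modem-b", 33.2),
]

for shift in find_level_shifts(samples):
    print(shift)
```

Lining up the flagged timestamps against the times the modems stop answering pings is the part that would actually tell you something.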
All of our upstreams are using the same frequency. We have 4 blades active with 5 upstreams each. All upstreams on a blade share the same frequency, so we are using 4 separate frequencies. Do you share any frequencies on your upstream at all?
We only had that problem with one blade, specifically two ports on that blade. On those ports we had to change the upstream frequencies: one is 21.x and the other is 24.x. All the other ports on that blade, and on the other blades, have the same frequency set, 27.x MHz. We replaced the blade and the laser in the combining network multiple times. It probably points to some faulty equipment, likely in the combining network, but four separate network engineers could not figure it out. One of our network engineers here found an obscure Cisco doc documenting the problem along with the solution.
The indicator of the problem was that two different modems connected to the same outlet in the headend had two very different timing offsets, when in reality they should have been close if not identical: one was 2200, the other 1300.
Here is the blurb he emailed me:
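That comparison is easy to automate if you can export the timing offsets. A rough sketch, assuming you already know which modems share an outlet; the tolerance of 100 is an arbitrary placeholder, not a recommended value:

```python
def offset_spread(offsets):
    """Difference between the largest and smallest timing offset in the group."""
    return max(offsets.values()) - min(offsets.values())


def offsets_look_suspicious(offsets, tolerance=100):
    """True if modems that should be at the same distance disagree by more than tolerance."""
    return offset_spread(offsets) > tolerance


# The two offsets we saw on modems connected to the same headend outlet.
same_outlet = {"modem-1": 2200, "modem-2": 1300}

print(offset_spread(same_outlet))
print(offsets_look_suspicious(same_outlet))
```

With our two modems the spread is 900, far more than you would expect from two devices on the same outlet.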
"One very rare alternative cause of Negative Time Offsets is where two or
more CMTS devices serve a common cable segment. If two CMTSs are set up with
the same upstream frequency settings for a particular cable segment then one
CMTS may "overhear" an Initial Ranging Request from a Cable Modem connecting
to another CMTS. This Initial Ranging Request may be heard at a random time
within the Initial Ranging Interval and hence an invalid time offset will be
calculated for the Cable Modem." From:
http://www.cisco.com/en/US/tech/tk86/tk89/technologies_tech_note09186a00...
460b.shtml
We were affected by this even though only one CMTS serves this particular area.
What's the SNR on the return, and what are the channel width and modulation? What kind of modems are they, and do they have the latest firmware?
Can you email me a show tech and the modem firmware version?
~Cmcaldas@gmail.com