Arris E6000 RSM failure | docsis.org

ckgth

Hi there,

We have an E6000 Gen-1.
The problem is that the RSMs in slots 6 and 7 fail and switch over. I have already replaced the RPICs of the RSMs.
At the time of the failure, the log shows:

09/12/22 18:15:26 [00.0000011951] <2473075513> (notc) CM 44:4e:6d:05:fa:92 (AVM) reset:SM_RANGING_OTHER_CHAN uchan:121 SID:10640 State:Ranging Complete uptime:150 sec card:0 CM cfg file:
09/12/22 18:15:34 [06.0000000119] <2473085714> (notc) [SBY] Maintenance Event: RSMNPUXING10TOPSWITCHRXSTATRDDIP4OOS: stoxspi42/10:Rsm Stox - STOX_SPI 1_0 Top Switch RD_DIP4_OOS (DIP-4 errors above threshold)
09/12/22 18:15:34 [06.0000000120] <2473085714> (notc) [SBY] Maintenance Event: RSMNPUXING10TOPSWITCHRXSTATRYDISABLED: stoxspi42/10:Rsm Stox - STOX_SPI 1_0 Top Switch RY DISABLE(calendar state machine disabled)
09/12/22 18:15:34 [06.0000000121] <2473085714> (notc) [SBY] Maintenance Event: RSMNPU1SPIASTATLKDWN: ezNpxx/1:Fatal Device Error - NPU1 SPI-A Stat Link Down
09/12/22 18:15:34 [06.0000000122] <2473080646> (notc) [SBY] SystemMtceCard::handleSystemMtceNotification(): SystemMtceCard.6: deviceError=9: FaultFlags=0x1
09/12/22 18:15:34 [06.0000000123] <2473080650> (alrt) [SBY] SystemMtceCard::handleSystemMtceNotification(): SystemMtceCard.6: Fatal Error: Device Fatal Error: RSMNPU1SPIASTATLKDWN:ezNpxx/1:NPU1 SPI-A Stat Link Down:
09/12/22 18:15:34 [06.0000000124] <2473080320> (notc) [SBY] SystemMtce::cardNeedsRecovery(): SystemMtceCard.6: Fatal Device Error Fault: Initiating Card Recovery
09/12/22 18:15:34 [06.0000000125] <2473080325> (notc) [SBY] SystemMtce::handleResetSystem(): resetType=LOCAL_SCM
09/12/22 18:15:34 [07.0000000408] <2473080332> (alrt) [ACT] SystemMtce::handleSystemMtceStatus(): mtceRequests=0x8: cloneRsmCardMtceState=IN_RECOVERY
09/12/22 18:15:34 [07.0000000409] <2473080656> (notc) [ACT] SystemMtceCard::resetCard(): Resetting Card 6: resetType=DELAYED_RESET: recoveryCnt=1
09/12/22 18:15:34 [07.0000000410] <2473080443> (notc) [ACT] Dump of SystemMtce (0x45244c40)
09/12/22 18:15:34 [07.0000000411] <2473080443> (notc) [ACT] cloneRsmCardNumber=7: coldStartStatus=NO: reloadColdStart=0: isConfigured=1
09/12/22 18:15:34 [07.0000000412] <2473080443> (notc) [ACT] systemMtceFlags=0xd3: systemMtceFlags2=0x10: cardRecoveriesEnabled=1: chassisSerial=15023CHS0055
09/12/22 18:15:34 [07.0000000413] <2473080443> (notc) [ACT] Cached Running Image: CER_V06.00.00.0080_RDU0105.gen1rsm.full.img: file=CER_V06.00.00.0080_RDU0105.gen1rsm.full.img: size=226232352: COMMITTED: patchVersion=_RDU0105;1533934043
09/12/22 18:15:34 [07.0000000414] <2473080443> (notc) [ACT] portMtceInfoId=1719: portMtceEventDelay=0: portMtceEventDelayCnt=0: systemReloadStatus=0x1
09/12/22 18:15:34 [07.0000000415] <2473080443> (notc) [ACT] checkFSStatus=0x100: activeMiniSlotCounterDevice=2: timeStampAuditEnabled=1: timestampSyncNeeded=0
09/12/22 18:15:34 [07.0000000416] <2473080443> (notc) [ACT] miniSlotCounter=0xf109eb80: miniSlotCounterWrap=0x197: miniSlotUpdateCounter=14: miniSlotSyncInProgress=0
09/12/22 18:15:34 [07.0000000417] <2473080443> (notc) [ACT] cloneStatusCounter=0: cloneStatusThreshold=248: cloneResetCnt=1: cloneResetSoakInterval=2: cloneTestMode=0
09/12/22 18:15:34 [07.0000000418] <2473080443> (notc) [ACT] assertActivePulse=0: rfSwitchValue=0x0: slotPowerMask=0x1e: rsmCardHardwareDetectedType=1
09/12/22 18:15:34 [07.0000000419] <2473080443> (notc) [ACT] actStbyAuditCount=0
09/12/22 18:15:34 [07.0000000420] <2473080443> (notc) [ACT] dqTid=0: dqTableId=21: dqNumEntries=16
09/12/22 18:15:34 [07.0000000421] <2473080443> (notc) [ACT] systemMtceTimeouts:
09/12/22 18:15:34 [07.0000000422] <2473080443> (notc) [ACT] cloneMonitorThreshold=4: cloneInitThreshold=5: cloneRcvyThreshold=240
09/12/22 18:15:34 [07.0000000423] <2473080443> (notc) [ACT] systemMtceStatusSeqno=0x7a2f324f: cloneSystemMtceStatus:
09/12/22 18:15:34 [07.0000000424] <2473080443> (notc) [ACT] seqno=0xcf3fda6f: rfSwitchValue=0x0: mtceFlags=0xd5: mtceStatusFlags2=0x0: mtceRequests=0x8
09/12/22 18:15:34 [07.0000000425] <2473080443> (notc) [ACT] localMtceState=IN_RECOVERY: cloneMtceState=ACTIVE
09/12/22 18:15:34 [07.0000000426] <2473080443> (notc) [ACT] clockSyncTimer ON: systemMtceTimer ON: watchdogTimer ON: auditTimer ON: rfSwitchTimer ON: overloadUpdateTimer ON
09/12/22 18:15:34 [07.0000000427] <2473080443> (notc) [ACT] portMtceNotificationTimer OFF: minislotAuditTimer ON: reassertActiveTimer OFF: autoCommitTimer ON
09/12/22 18:15:34 [07.0000000428] <2473080443> (notc) [ACT] portMtceEventDelayTimer OFF: spareGroupLicenseTimer ON: dataplaneLinkStatusTimer OFF
09/12/22 18:15:34 [07.0000000429] <2473080443> (notc) [ACT] fabricOpBitMap 0x0: fabricRmtOpBitMap 0x0: waitFabricOpsCounter 1
09/12/22 18:15:34 [07.0000000430] <2473080443> (notc) [ACT] SpareGroupManagers:
09/12/22 18:15:34 [07.0000000431] <2473080443> (notc) [ACT] minislotSync->0x545ec0c0: systemMtceReset->0x0: itsSystemReload->0x0
09/12/22 18:15:34 [07.0000000432] <2473080443> (notc) [ACT] localRsmCpuOverload=NORMAL
09/12/22 18:15:34 [07.0000000433] <2473080443> (notc) [ACT] userSystemCpuOverload=NORMAL: value=0: sensitivity=2: threshold=2: leak/ypeg/rpeg=20/40/100: normal/yellow/red=1200/1600/2400
09/12/22 18:15:34 [07.0000000434] <2473080443> (notc) [ACT] Current state(s) = BasicMtce InService WaitForEvents
09/12/22 18:15:34 [07.0000000435] <2473080652> (notc) [ACT] Dump of SystemMtceCard.6 (0x452f1540)
09/12/22 18:15:34 [07.0000000436] <2473080652> (notc) [ACT] cardNumber=7: cardVersionInfoRetrieved=1: isClone=1: isPresent=1: cardUpTime=2723
09/12/22 18:15:34 [07.0000000437] <2473080652> (notc) [ACT] slotHasPower=1: cardHeldInReset=0
09/12/22 18:15:34 [07.0000000438] <2473080652> (notc) [ACT] basicMtceState=IN_RECOVERY: stableMtceState=STDBY: stableMtceStateTime=2576: firstKnownState=UNKNOWN: mtceReconfigureState=UNKNOWN
09/12/22 18:15:34 [07.0000000439] <2473080652> (notc) [ACT] recoveryCnt=1: resetPending=DELAYED_RESET: pingFaultErrorCount=0: successiveFailedBoots=0
09/12/22 18:15:34 [07.0000000440] <2473080652> (notc) [ACT] isSane=1: longInitInProgress=0: monitoringActive=1: delayedRecoveryInProgress=0: spareGroupLeader=-1
09/12/22 18:15:34 [07.0000000441] <2473080652> (notc) [ACT] lastCardDeviceError=0: lastMtceFaultId=11603: lastResetReason=: lastCardFatalError=None
09/12/22 18:15:34 [07.0000000442] <2473080652> (notc) [ACT] cardHasValidCardEntryData=1: cardHasValidPortEntryData=1: canSendCardMtceNotification=0: cardDiagTarget=1: cardNeeds2bSynced=0
09/12/22 18:15:34 [07.0000000443] <2473080652> (notc) [ACT] cardDetectedType=6: cardSubDetectedType=6: cardHardwareDetectedType=1: cardMisconfigured=0: syncInProgress=0
09/12/22 18:15:34 [07.0000000444] <2473080652> (notc) [ACT] cardLicPortsAdminUp=0: dataQueryTid=0: dataQueryType=0: cardCommitInProgress=0
09/12/22 18:15:34 [07.0000000445] <2473080652> (notc) [ACT] fabricPortTargetState=ENABLED: fabricPortState=DISABLED: clockFaulted=0
09/12/22 18:15:34 [07.0000000446] <2473080652> (notc) [ACT] dataplaneLinkAFaulted=0: dataplaneSecondaryLinkAFaulted=0: dataplaneLinkBFaulted=0: dataplaneSecondaryLinkBFaulted=0
09/12/22 18:15:34 [07.0000000447] <2473080652> (notc) [ACT] internalOverloadState=1: userOverloadState=1: value=0: leak/ypeg/rpeg=20/40/100: normal/yellow/red=1200/1600/2400
09/12/22 18:15:34 [07.0000000448] <2473080652> (notc) [ACT] badMtceStateCount=0: picPresent=1: systemMtcePmdData->0x55dbc9a0
09/12/22 18:15:34 [07.0000000449] <2473080652> (notc) [ACT] cardMtceInfo:
09/12/22 18:15:34 [07.0000000450] <2473080652> (notc) [ACT] id=20: cardMtceFlags=0x0: sysCpuOverload=1: rsmCpuOverload=1
09/12/22 18:15:34 [07.0000000451] <2473080652> (notc) [ACT] recoveryModes=0x1: reloadType=0: sparingSummaryMask=0x0: dulPacketInterval=10000
09/12/22 18:15:34 [07.0000000452] <2473080652> (notc) [ACT] cardStates [ 6 6 6 6 1 1 9 6 1 1 1 1 1 1 ]
09/12/22 18:15:34 [07.0000000453] <2473080652> (notc) [ACT] CerCardEntry:
09/12/22 18:15:34 [07.0000000454] <2473080652> (notc) [ACT] cardType=6: cardSubType=6: cardAllowedAnnexes=1: cardAdminState=1: cardPrState=1: cardSecState=5: cardDplxStatus=2
09/12/22 18:15:34 [07.0000000455] <2473080652> (notc) [ACT] cardDetected=6: cardSubDetected=6: cardHardwareDetected=1: cardSpareGroupId=0: cardSpareGroupMode=0: cardTemperature=1
09/12/22 18:15:34 [07.0000000456] <2473080652> (notc) [ACT] cardPicDetected=6: cardLastResetReason=Clone Faulted
09/12/22 18:15:34 [07.0000000457] <2473080652> (notc) [ACT] CerCardDataEntry:
09/12/22 18:15:34 [07.0000000458] <2473080652> (notc) [ACT] cardSerialNum=15023RSM0028: cardHwVerion=RSM-08241W/D05:AK:000: cardHwDeviations=000
09/12/22 18:15:34 [07.0000000459] <2473080652> (notc) [ACT] cardSwVersion=CER_V06.00.00.0080:CER_V06.00.00.0080;030918121914: cardPatchVersions=_RDU0105;1533934043
09/12/22 18:15:34 [07.0000000460] <2473080652> (notc) [ACT] cardFwVersion=FW_RSM_V01.02:bud/0;02.00.59.00:stox/0;00.03.01.6d:aeon/0;00.01.06.0c:
09/12/22 18:15:34 [07.0000000461] <2473080652> (notc) [ACT] cardCommittedSwVersion=Kernel-1;CER_V06.00.00.0080;03/09/18 04:33:48 PM?Kernel-2;CER_V06.00.00.0080;03/09/18 04:33:48 PM: cardCpuType=: cardCpuSpeed=1500000000: cardBusSpeed=500000000: cardRamSize=4096
09/12/22 18:15:34 [07.0000000462] <2473080652> (notc) [ACT] cardNorFlashSize=67108864: cardNandFlashSize=2147483648
09/12/22 18:15:34 [07.0000000463] <2473080652> (notc) [ACT] cardFpgaSource=5: cardLastBootSource=1: cardBootVersion=CER_BOOT0_V00.00.09;03/07/13 04:51:51 PM: cardLastBootVersion=
09/12/22 18:15:34 [07.0000000464] <2473080652> (notc) [ACT] picSerialNum=16507RHB0058: picHwVersion=D01: picHwDeviations=000
09/12/22 18:15:34 [07.0000000465] <2473080652> (notc) [ACT] picModelNum=ARCT03323: picModelName=RPIC-10002W: picMfgRevision=AG
09/12/22 18:15:34 [07.0000000466] <2473080652> (notc) [ACT] picMfgDateTime=0000-00-00 69:39:38: picMfg=ARRIS
09/12/22 18:15:34 [07.0000000467] <2473080652> (notc) [ACT] cardMfgDateTime=0000-00-00 69:39:38: cardMfg=PLEXUS
09/12/22 18:15:34 [07.0000000468] <2473080652> (notc) [ACT] cardProductName=RSM-08241W: cardPartModelNum=ARCT03309: cardProductVersion=D05
09/12/22 18:15:34 [07.0000000469] <2473080652> (notc) [ACT] cardAssetTag=15023RSM0028: cardFeedAPresent=1: cardFeedBPresent=1
09/12/22 18:15:34 [07.0000000470] <2473080652> (notc) [ACT] cardDataAllowedPorts=-1: cardDataLicenseAnnex=1: cardDataLicensePorts=0: cardDataLicenseDate=0000-00-00 00:00:00
09/12/22 18:15:34 [07.0000000471] <2473080652> (notc) [ACT] cardMtceNotificationTimer OFF: cardEntryTimer ON: portEntryTimer OFF: auditPortEntryTimer ON
09/12/22 18:15:34 [07.0000000472] <2473080652> (notc) [ACT] nmiBackupRecoveryTimer ON: delayedCardResetTimer ON: delayedRcvyTimer OFF
09/12/22 18:15:34 [07.0000000473] <2473080652> (notc) [ACT] readPortEntryTimer OFF: cardInitializationTimer OFF: versionInfoTimer OFF
09/12/22 18:15:34 [07.0000000474] <2473080652> (notc) [ACT] statusRefreshTimer OFF: commDisableTimer OFF: cardLastResetReasonTimer OFF: cardPresenceTimer OFF
09/12/22 18:15:34 [07.0000000475] <2473080652> (notc) [ACT] userOverloadTimer OFF: dataplaneLinkFaultedTimer OFF: clockFaultedTimer OFF: cardPwrUpTimer OFF
09/12/22 18:15:34 [07.0000000476] <2473080652> (notc) [ACT] auditPortEntryMode FALSE
09/12/22 18:15:34 [07.0000000477] <2473080652> (notc) [ACT] itsCardMtceFunction->0x53bd2300: itsMtceControl->0x5582cba0: itsSystemMtce->0x45244c40: itsServiceLEDBlink->0x0
09/12/22 18:15:34 [07.0000000478] <2473080652> (notc) [ACT] Current state(s) = WaitForEvents CardProvisioned Waiting
09/12/22 18:15:34 [07.0000000479] <2473080652> (notc) [ACT] MtceControl: pingRate=0: current state(s) = Normal Idle MtcePingDisabled
09/12/22 18:15:34 [07.0000000480] <2473080652> (notc) [ACT] allowInit=1: allowConfigure=0: allowEnterService=0
09/12/22 18:15:34 [07.0000000481] <2473080738> (notc) [ACT] State PMD data seemed to fit in buffer rc=1171172, bufSz=1536000
09/12/22 18:15:34 [07.0000000482] <2473080377> (notc) [ACT] SystemMtce: State PMD generation scheduled: cardNumber=6: dumpNumber=0
09/12/22 18:15:34 [07.0000000483] <2473080448> (notc) [ACT] SystemMtce::notifyCardStateChange(): SystemMtceCard.6: basicMtceState=OUT_OF_SERVICE
09/12/22 18:15:34 [07.0000000484] <2473078604> (notc) [ACT] RsmCardMtceFunction: Clone RSM card failed: shutting down remote services
09/12/22 18:15:34 [07.0000000485] <2473089584> (notc) [ACT] DPX: (ACT) received MtceStandbyOOS
09/12/22 18:15:34 [07.0000000486] <2473089584> (notc) [ACT] DPX: (ACT) DpdLocalChannel received STANDBY_OOS
09/12/22 18:15:34 [07.0000000487] <2473089584> (notc) [ACT] DPX: (ACT) DpdRemoteChannel received STANDBY_OOS
09/12/22 18:15:34 [07.0000000488] <2473089584> (notc) [ACT] DPX: (ACT) DpdRemoteChannel changing linkState(Down)
09/12/22 18:15:34 [07.0000000489] <2473089584> (notc) [ACT] DPX: (ACT) DpxRemoteChannel received STANDBY_OOS
09/12/22 18:15:34 [07.0000000490] <2473089584> (notc) [ACT] DPX: (ACT) DpxRemoteChannel changing linkState(Down)
09/12/22 18:15:34 [07.0000000491] <2473089584> (notc) [ACT] DPX: (ACT) Setting sync state to not allowed
09/12/22 18:15:34 [07.0000000492] <2473080377> (notc) [ACT] SystemMtce: State PMD generation beginning: cardNumber=6: dumpNumber=0
09/12/22 18:15:34 [07.0000000493] <2473080377> (notc) [ACT] SystemMtce: State PMD generation completed: cardNumber=6: dumpNumber=0
09/12/22 18:15:34 [07.0000000494] <2473080696> (notc) [ACT] SystemMtceCard::handleBootReport(): Card 6: Entering Core Dump: MTCE_STATUS_ENTERING_CORE_DUMP/Entering Core Dump: bms=OUT_OF_SERVICE
09/12/22 18:15:35 [07.0000000495] <0000000003> (erro) [ACT] Link Down : Index(0x07060008) : Admin(up) : Oper(down) : Descr(ethernet 6/0)
09/12/22 18:15:35 [07.0000000496] <0000000003> (erro) [ACT] Link Down : Index(0x07060048) : Admin(up) : Oper(down) : Descr(mgmt 6/0)
09/12/22 18:15:35 [07.0000000497] <2473080640> (notc) [ACT] Card Primary State Change:
Trap Severity=major,Card:6,CardType=RSM,Card Subtype=RSM10g,Card Primary State=oos
09/12/22 18:15:35 [07.0000000498] <2473080641> (notc) [ACT] Card Secondary State Change:
Trap Severity=major,Card:6,CardType=RSM,Card Subtype=RSM10g,Card Secondary State=fault
09/12/22 18:15:35 [07.0000000499] <2473080642> (notc) [ACT] Card Duplex Status Change:
Trap Severity=minor,Card:6,CardType=RSM,Card Subtype=RSM10g,Card Duplex Status=notapplicable
09/12/22 18:15:55 [07.0000000500] <2473080696> (notc) [ACT] SystemMtceCard::handleBootReport(): Card 6: Exiting Core Dump: MTCE_STATUS_EXITING_CORE_DUMP/Exiting Core Dump: bms=OUT_OF_SERVICE
09/12/22 18:15:55 [07.0000000501] <2473080656> (notc) [ACT] SystemMtceCard::resetCard(): Resetting Card 6: resetType=POWERDOWN_RESET: recoveryCnt=1
09/12/22 18:15:55 [07.0000000502] <2473085714> (notc) [ACT] Maintenance Event: RSMXOVERP4DOWN: fm4000pt/104:Crossover Data Plane Ethernet Switch Ports - LINK DOWN XOVER Bottom Switch Port 4
09/12/22 18:15:55 [07.0000000503] <2473085714> (notc) [ACT] Maintenance Event: RSMXOVERP19DOWN: fm4000pt/119:Crossover Data Plane Ethernet Switch Ports - LINK DOWN XOVER Bottom Switch Port 19
09/12/22 18:15:55 [07.0000000504] <2473085714> (notc) [ACT] Maintenance Event: RSMXOVERP21DOWN: fm4000pt/121:Crossover Data Plane Ethernet Switch Ports - LINK DOWN XOVER Bottom Switch Port 21
09/12/22 18:15:59 [07.0000000505] <2473085714> (notc) [ACT] Maintenance Event: RSMNPUXING00TOPSWITCHATLVCNTO12: stoxspi42/0:Rsm Stox - STOX reports: Flow Control from NPU 0 SPI A for Eth 6/0 is stuck ON
09/12/22 18:16:12 [00.0000011952] <2473075513> (notc) CM 44:4e:6d:85:1a:25 (AVM) reset:SM_RANGING_OTHER_CHAN uchan:146 SID:10641 State:Ranging Complete uptime:157 sec card:0 CM cfg file:
09/12/22 18:16:19 [02.0000000048] <2473098669> (notc) DsDataMgr: Aged Out Flow Reported: tfid 16943, sid 559, sfid 1118, slot 0, md 3, mac_addr c80e.148b.28de
09/12/22 18:16:20 [03.0000000041] <2473098669> (notc) DsDataMgr: Aged Out Flow Reported: tfid 17313, sid 929, sfid 1858, slot 1, md 6, mac_addr dc15.c80d.fb61
09/12/22 18:16:20 [07.0000000506] <2473080641> (notc) [ACT] Card Secondary State Change:
Trap Severity=indeterminate,Card:6,CardType=RSM,Card Subtype=RSM10g,Card Secondary State=firmwarepump
09/12/22 18:17:14 [07.0000000507] <2473080641> (notc) [ACT] Card Secondary State Change:
Trap Severity=indeterminate,Card:6,CardType=RSM,Card Subtype=RSM10g,Card Secondary State=swdownload
09/12/22 18:17:15 [01.0000004158] <2473075513> (notc) CM cc:ce:1e:a4:52:69 (AVM) reset:SM_RANGING_OTHER_CHAN uchan:199 SID:9424 State:Ranging Complete uptime:154 sec card:1 CM cfg file:
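
For anyone who wants to pull only the severe events out of a dump like the one above, here is a minimal Python sketch. The field names are assumptions read off the lines themselves, not a documented format: the first two digits in the square brackets are treated as the slot number, the value in angle brackets is kept as an opaque `tag`, and `parse_line` and `severe_events` are hypothetical helper names. Continuation lines such as the `Trap Severity=...` lines do not match the pattern and are skipped.

```python
import re
from typing import Iterable, Iterator, NamedTuple, Optional

# Assumed layout: date time [slot.seq] <tag> (severity) [ACT|SBY] message
LINE_RE = re.compile(
    r"^(?P<date>\d{2}/\d{2}/\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"\[(?P<slot>\d{2})\.(?P<seq>\d+)\] <(?P<tag>\d+)> "
    r"\((?P<sev>\w+)\) (?:\[(?P<role>ACT|SBY)\] )?(?P<msg>.*)$"
)


class LogEntry(NamedTuple):
    date: str
    time: str
    slot: int            # assumption: first field in [..] is the card slot
    seq: int
    tag: str             # meaning unknown, kept verbatim
    severity: str        # e.g. notc, alrt, erro
    role: Optional[str]  # ACT, SBY, or None when the line has no role tag
    message: str


def parse_line(line: str) -> Optional[LogEntry]:
    """Return a LogEntry, or None for lines that do not match the layout."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return LogEntry(
        m["date"], m["time"], int(m["slot"]), int(m["seq"]),
        m["tag"], m["sev"], m["role"], m["msg"],
    )


def severe_events(lines: Iterable[str]) -> Iterator[LogEntry]:
    """Yield entries logged as alrt/erro or whose message mentions 'Fatal'."""
    for line in lines:
        entry = parse_line(line)
        if entry is None:
            continue
        if entry.severity in ("alrt", "erro") or "Fatal" in entry.message:
            yield entry


# Four lines copied from the log above.
SAMPLE = [
    "09/12/22 18:15:26 [00.0000011951] <2473075513> (notc) CM 44:4e:6d:05:fa:92 (AVM) reset:SM_RANGING_OTHER_CHAN uchan:121 SID:10640 State:Ranging Complete uptime:150 sec card:0 CM cfg file:",
    "09/12/22 18:15:34 [06.0000000121] <2473085714> (notc) [SBY] Maintenance Event: RSMNPU1SPIASTATLKDWN: ezNpxx/1:Fatal Device Error - NPU1 SPI-A Stat Link Down",
    "09/12/22 18:15:34 [06.0000000123] <2473080650> (alrt) [SBY] SystemMtceCard::handleSystemMtceNotification(): SystemMtceCard.6: Fatal Error: Device Fatal Error: RSMNPU1SPIASTATLKDWN:ezNpxx/1:NPU1 SPI-A Stat Link Down:",
    "09/12/22 18:15:35 [07.0000000495] <0000000003> (erro) [ACT] Link Down : Index(0x07060008) : Admin(up) : Oper(down) : Descr(ethernet 6/0)",
]

if __name__ == "__main__":
    for e in severe_events(SAMPLE):
        print(e.time, e.slot, e.role, e.severity, e.message[:60])
```

Run against the full log, this should reduce the output to the NPU1 SPI-A fatal error on the standby RSM and the link-down errors that follow it, which makes it easier to compare several failures side by side.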

Can you help, or can you tell me what the following messages mean?
09/12/22 18:15:34 [06.0000000119] <2473085714> (notc) [SBY] Maintenance Event: RSMNPUXING10TOPSWITCHRXSTATRDDIP4OOS: stoxspi42/10:Rsm Stox - STOX_SPI 1_0 Top Switch RD_DIP4_OOS (DIP-4 errors above threshold)
09/12/22 18:15:34 [06.0000000120] <2473085714> (notc) [SBY] Maintenance Event: RSMNPUXING10TOPSWITCHRXSTATRYDISABLED: stoxspi42/10:Rsm Stox - STOX_SPI 1_0 Top Switch RY DISABLE(calendar state machine disabled)
09/12/22 18:15:34 [06.0000000121] <2473085714> (notc) [SBY] Maintenance Event: RSMNPU1SPIASTATLKDWN: ezNpxx/1:Fatal Device Error - NPU1 SPI-A Stat Link Down
09/12/22 18:15:34 [06.0000000122] <2473080646> (notc) [SBY] SystemMtceCard::handleSystemMtceNotification(): SystemMtceCard.6: deviceError=9: FaultFlags=0x1

In particular, what does the NPU1 fatal error indicate?

Many thanks,
Christian