In an environment with cross-replicating Spectrum Protect servers, if you decommission a virtual machine while unreplicated data for it still remains on one of the servers, subsequent replications (REPLICATE NODE commands) will fail.
Replication fails because, since some version of SP (I don't know which), replicating decommissioned objects is no longer allowed: being able to do so was considered a program error, and I believe there was even an APAR for it.
The problem is that when this happens, the cause is not at all clear. The error messages around the failing replication don't immediately point in the direction of decommissioned objects.
You get a slew of errors like this:
ANR8216W Error sending data on socket 73404. Reason 10053.
ANR3178E A communication error occurred during session 3030812 with replication server ISPBEI01.
ANR3334W The server experienced a TCP/IP error while receiving data on socket -22736.
This one, if you happen to spot it, gives the best indication of what is going on (provided you know to connect it with replication being disallowed for decommissioned file spaces):
ANR1650W The server detected partially replicated data from a previous replication operation. This might result in extended processing time for process 203 while the server is replicating node CLSIK020_HV_TGT, file space \VMFULL-SVSIK121. (SESSION: 188936, PROCESS: 203)
In the Operations Center console, lots of critical ANR9999D errors are thrown.
That is a really nasty way of logging a standard situation.
The message preceding these ANR9999D batches does explain what is going on:
Feb 24, 2022, 11:56:35 AM ANR2363E SESSION 1263181: Operation is not allowed because file space \VMFULL-SVSIM0M8 belonging to node CLVMQ001_HV_TGT is decommissioned. (SESSION: 1263181)
Feb 24, 2022, 11:56:35 AM ANR9999D_2077889187 SmReplServerSession(smrepl.c:2812) Thread<194>: Replication protocol error, unknown verbType=0400 (SESSION: 1263181)
Feb 24, 2022, 11:56:35 AM ANR9999D Thread<194> issued message 9999 from: (SESSION: 1263181)
Feb 24, 2022, 11:56:35 AM ANR9999D Thread<194> 7ff8389cc882 OutDiagToCons()+b2 output.c:1447 (SESSION: 1263181)
Feb 24, 2022, 11:56:35 AM ANR9999D Thread<194> 7ff8389c58e2 outDiagfExt()+122 outvarg.c:294 (SESSION: 1263181)
Feb 24, 2022, 11:56:35 AM ANR9999D Thread<194> 7ff838847768 SmReplServerSession()+1938 smrepl.c:2078 (SESSION: 1263181)
Feb 24, 2022, 11:56:35 AM ANR9999D Thread<194> 7ff83876859d DoReplServer()+a4d smexec.c:10634 (SESSION: 1263181)
Feb 24, 2022, 11:56:35 AM ANR9999D Thread<194> 7ff83875d286 smExecuteSession()+2636 smexec.c:4304 (SESSION: 1263181)
Feb 24, 2022, 11:56:35 AM ANR9999D Thread<194> 7ff838ae8847 psSessionThread()+457 tcpcomm.c:2895 (SESSION: 1263181)
Feb 24, 2022, 11:56:35 AM ANR9999D Thread<194> 7ff837cbd273 startThread()+5b3 pkthread.c:3936 (SESSION: 1263181)
Feb 24, 2022, 11:56:35 AM ANR9999D Thread<194> 7ff84e70fb80 o__realloc_base()+60 (SESSION: 1263181)
Feb 24, 2022, 11:56:35 AM ANR9999D Thread<194> 7ff84f9b84d4 BaseThreadInitThunk()+14 (SESSION: 1263181)
Feb 24, 2022, 11:56:35 AM ANR9999D Thread<194> 7ff851f51791 RtlUserThreadStart()+21 (SESSION: 1263181)
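The ANR2363E message above is the one to look for, and it can at least be picked out of the activity log automatically. The following is a minimal Python sketch, assuming the activity log has been exported to a plain text file with one message per line. The regular expression is modelled only on the ANR2363E wording in the excerpt above (other server levels or languages may word it differently), and the function name `find_decommissioned_blockers` is hypothetical, not part of any Spectrum Protect tooling.

```python
import re
import sys
from typing import Iterable, List, Tuple

# Modelled on the ANR2363E text quoted above (assumption: same wording on
# other server levels); captures the file space and node names.
ANR2363E = re.compile(
    r"ANR2363E .*?file space (?P<filespace>\S+) belonging to node "
    r"(?P<node>\S+) is decommissioned"
)


def find_decommissioned_blockers(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return unique (node, file space) pairs named in ANR2363E messages,
    in the order in which they first appear."""
    seen = set()
    result = []
    for line in lines:
        match = ANR2363E.search(line)
        if match:
            pair = (match.group("node"), match.group("filespace"))
            if pair not in seen:  # the same refusal can repeat every run
                seen.add(pair)
                result.append(pair)
    return result


if __name__ == "__main__":
    # Read exported activity log text from stdin, print node<TAB>file space.
    for node, filespace in find_decommissioned_blockers(sys.stdin):
        print(f"{node}\t{filespace}")
```

Run against the excerpt above, it reports the single pair `CLVMQ001_HV_TGT` and `\VMFULL-SVSIM0M8`, skipping the ANR9999D noise entirely.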
Please clean up the error handling of this situation.
Our VM environment is managed more and more automatically, and it's difficult to predict when it would be a good time to decommission a VM.
A safe moment requires that replication has run and that no new backups have been made since, so that there are no differences between the SP servers.
It has become impossible to pick a time slot, every day, in which all these conditions are met.
So, bottom line: we just have to accept that decommissioned VMs will sometimes still have replication changes pending.
A nice, user-friendly error message to that effect would go a long way, instead of critical server events about replication protocol errors.
It would lower the cost of ownership of Spectrum Protect (try explaining to a novice SP engineer that they should look for ANR2363E messages to solve replication failures!).
This request may not be delivered within the release currently under development, but the theme is aligned with the current multi-year strategy. IBM may consider and evaluate any RFE Community feedback for this request through activities such as voting. IBM will update this request in the future.