IBM System Storage Ideas Portal



Status Future consideration
Created by Guest
Created on Feb 24, 2022

Replication failures caused by decommissioned filespaces (virtual machines) are not sufficiently visible

In an environment with cross-replicating Spectrum Protect servers, if you decommission a virtual machine, subsequent replications (replicate node commands) will fail if unreplicated data remains on one of the servers.
Replication fails because, as of some SP version I can't pin down, replicating decommissioned objects is no longer allowed (being able to do so was considered a program error; I believe there was even an APAR for it).
The problem is that when this happens, it's not at all clear what is wrong. The error messages around the failing replication don't immediately point in the direction of decommissioned objects.

You get a slew of errors like this:
ANR8216W Error sending data on socket 73404. Reason 10053.
ANR3178E A communication error occurred during session 3030812 with replication server ISPBEI01.
ANR3334W The server experienced a TCP/IP error while receiving data on socket -22736.

This one, if you happen to spot it, gives the best indication of what is going on (if you know to make the connection with replication being disallowed for decommissioned filespaces).
ANR1650W The server detected partially replicated data from a previous replication operation. This might result in extended processing time for process 203 while the server is replicating node CLSIK020_HV_TGT, file space \VMFULL-SVSIK121. (SESSION: 188936, PROCESS: 203)

In the Operations Center console, lots of critical ANR9999D errors are thrown.
That is a really nasty way of logging a routine situation.
The message preceding these ANR9999D batches does explain what is going on:

Feb 24, 2022, 11:56:35 AM ANR2363E SESSION 1263181: Operation is not allowed because file space \VMFULL-SVSIM0M8 belonging to node CLVMQ001_HV_TGT is decommissioned. (SESSION: 1263181)
Feb 24, 2022, 11:56:35 AM ANR9999D_2077889187 SmReplServerSession(smrepl.c:2812) Thread<194>: Replication protocol error, unknown verbType=0400 (SESSION: 1263181)
Feb 24, 2022, 11:56:35 AM ANR9999D Thread<194> issued message 9999 from: (SESSION: 1263181)
Feb 24, 2022, 11:56:35 AM ANR9999D Thread<194> 7ff8389cc882 OutDiagToCons()+b2 output.c:1447 (SESSION: 1263181)
Feb 24, 2022, 11:56:35 AM ANR9999D Thread<194> 7ff8389c58e2 outDiagfExt()+122 outvarg.c:294 (SESSION: 1263181)
Feb 24, 2022, 11:56:35 AM ANR9999D Thread<194> 7ff838847768 SmReplServerSession()+1938 smrepl.c:2078 (SESSION: 1263181)
Feb 24, 2022, 11:56:35 AM ANR9999D Thread<194> 7ff83876859d DoReplServer()+a4d smexec.c:10634 (SESSION: 1263181)
Feb 24, 2022, 11:56:35 AM ANR9999D Thread<194> 7ff83875d286 smExecuteSession()+2636 smexec.c:4304 (SESSION: 1263181)
Feb 24, 2022, 11:56:35 AM ANR9999D Thread<194> 7ff838ae8847 psSessionThread()+457 tcpcomm.c:2895 (SESSION: 1263181)
Feb 24, 2022, 11:56:35 AM ANR9999D Thread<194> 7ff837cbd273 startThread()+5b3 pkthread.c:3936 (SESSION: 1263181)
Feb 24, 2022, 11:56:35 AM ANR9999D Thread<194> 7ff84e70fb80 o__realloc_base()+60 (SESSION: 1263181)
Feb 24, 2022, 11:56:35 AM ANR9999D Thread<194> 7ff84f9b84d4 BaseThreadInitThunk()+14 (SESSION: 1263181)
Feb 24, 2022, 11:56:35 AM ANR9999D Thread<194> 7ff851f51791 RtlUserThreadStart()+21 (SESSION: 1263181)
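Until the messaging improves, a small script can pull the one useful message out of the noise. The following is only a rough sketch, assuming the activity log has been exported as plain text lines in the same layout as the excerpt above; the function name `summarize_decommission_failures` and the `SAMPLE` list are made up for illustration and are not part of Spectrum Protect. It finds ANR2363E messages, extracts the node and file space, and counts the ANR9999D lines logged for the same session.

```python
import re

# Patterns based only on the message layout shown in the excerpt above;
# other server versions or locales may format these lines differently.
ANR2363E = re.compile(
    r"ANR2363E\b.*?file space (?P<filespace>\S+) "
    r"belonging to node (?P<node>\S+) is decommissioned"
)
SESSION = re.compile(r"\(SESSION: (?P<session>\d+)")


def summarize_decommission_failures(lines):
    """Map session number -> node, file space, and count of ANR9999D lines.

    Only sessions that logged an ANR2363E message are reported.
    ANR9999D lines are counted only if they follow the ANR2363E line.
    """
    found = {}
    for line in lines:
        session_match = SESSION.search(line)
        if session_match is None:
            continue  # line carries no session tag, nothing to correlate
        session = session_match.group("session")
        decommissioned = ANR2363E.search(line)
        if decommissioned:
            found[session] = {
                "node": decommissioned.group("node"),
                "filespace": decommissioned.group("filespace"),
                "diag_lines": 0,
            }
        elif "ANR9999D" in line and session in found:
            found[session]["diag_lines"] += 1
    return found


# Hypothetical input: three lines copied from the excerpt, plus one unrelated line.
SAMPLE = [
    r"Feb 24, 2022, 11:56:35 AM ANR2363E SESSION 1263181: Operation is not allowed because file space \VMFULL-SVSIM0M8 belonging to node CLVMQ001_HV_TGT is decommissioned. (SESSION: 1263181)",
    r"Feb 24, 2022, 11:56:35 AM ANR9999D_2077889187 SmReplServerSession(smrepl.c:2812) Thread<194>: Replication protocol error, unknown verbType=0400 (SESSION: 1263181)",
    r"Feb 24, 2022, 11:56:35 AM ANR9999D Thread<194> issued message 9999 from: (SESSION: 1263181)",
    r"ANR8216W Error sending data on socket 73404. Reason 10053.",
]

if __name__ == "__main__":
    for session, info in summarize_decommission_failures(SAMPLE).items():
        print(
            f"session {session}: node {info['node']}, "
            f"file space {info['filespace']} is decommissioned "
            f"({info['diag_lines']} ANR9999D lines followed)"
        )
```

Keying on the session number is a deliberate choice here: in the excerpt, the ANR2363E message and the ANR9999D batch that follows it share the same SESSION value, which is the only link between the readable cause and the diagnostic noise.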

Please clean up the error handling of this situation.
Our VM environment is managed more and more automatically, and it's difficult to predict when would be a good time to decommission a VM.
To decommission cleanly, replication has to have run and no new backups may have been made since, so that there are no differences between the SP servers.
It has become impossible to pick a time slot, every day, in which all these conditions are met.

So, bottom line, we just have to accept that decommissioned VMs will sometimes still have replication changes pending.
A nice, user-friendly error message to that effect would go a long way, instead of critical server events about replication protocol errors.
It will improve the cost-of-ownership of Spectrum Protect (try explaining to a novice SP engineer to look for ANR2363E messages to solve replication failures!).

Idea priority Medium
  • Admin
    Juan Carlos Jimenez Fuentes
    Jun 13, 2022

    This request may not be delivered within the release currently under development, but the theme is aligned with the current multi-year strategy. IBM may consider and evaluate any RFE Community feedback for this request through activities such as voting. IBM will update this request in the future.