IBM System Storage Ideas Portal



Status Future consideration
Created by Guest
Created on Jul 24, 2023

Improve z/OS host IP connection loss detection and failover behavior

Incidents in the field have shown that, with the current design, CSM detection of a z/OS host IP connection loss and failover to a redundant active host connection can be delayed by 15 to 20 minutes. This happens especially when the system connection used by the HyperSwap session fails during a planned or unplanned HyperSwap.

CSM may still recognize the HyperSwap trigger, but it then tries to query HyperSwap status via a host connection that may no longer be working. The default timeout of 15 minutes for command responses could be decreased, but that may cause problems when normal command processing by IOS legitimately takes longer than the configured timeout.

Suggestions for improved design:

- Each host connection should have its own connection pinger with a configurable timeout (e.g. 60 seconds). When the connection ping fails, the connection should be closed with an I/O exception.

- Once a connection has been closed, connection retries can be started independently of any session command processing. When the connection is re-established, it can be marked as active again and potentially be reused by sessions that communicate with that sysplex.

- Session communication with a system currently uses only one host connection, even when more active host connections to the sysplex are available. When a command times out after 15 minutes, another attempt appears to be made on the same connection with a shorter timeout. If the connection pinger closes an unresponsive host connection more quickly, all sessions using that closed connection should react to the event and immediately switch over to another active host connection to the same sysplex, if one is available.
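The suggestions above can be sketched roughly as follows. This is a minimal illustrative model, not CSM internals: the class names (`HostConnection`, `ConnectionPool`), the injected ping function, and the failover logic are all assumptions made for the sketch. It shows the core idea that a per-connection pinger marks a dead connection closed, so that a router can immediately pick another active connection instead of waiting out a long command timeout.

```python
# Illustrative sketch of the proposed per-connection pinger and failover.
# HostConnection, ConnectionPool, and ping_fn are hypothetical names,
# not actual CSM components.

class HostConnection:
    def __init__(self, name, ping_fn):
        self.name = name
        self.ping_fn = ping_fn  # returns True if the host answered within the ping timeout
        self.active = True

    def ping(self):
        """One pinger cycle: close the connection when a ping fails.
        In the proposed design this would raise/record an I/O exception
        and a separate retry loop would later re-mark it active."""
        if self.active and not self.ping_fn():
            self.active = False
        return self.active


class ConnectionPool:
    """Routes session commands to the first active host connection,
    so sessions fail over as soon as the pinger closes a dead one."""
    def __init__(self, connections):
        self.connections = connections

    def pick(self):
        for conn in self.connections:
            if conn.active:
                return conn
        raise IOError("no active host connection to the sysplex")


# Demo: conn-A stops answering pings; the pool fails over to conn-B.
pool = ConnectionPool([
    HostConnection("conn-A", lambda: False),  # simulated dead connection
    HostConnection("conn-B", lambda: True),   # still responsive
])
pool.connections[0].ping()      # pinger detects the loss and closes conn-A
print(pool.pick().name)         # sessions now route via conn-B
```

The key design point is that the pinger runs independently of session command processing, so a hung command on one connection never blocks the detection of the loss or the switch to a redundant connection.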

 

Idea priority High
  • Guest
    Reply
    |
    Sep 15, 2023
Reopening this as an IDEA uncommitted candidate. The issue is that once a command has been issued across the IP network, there isn't an easy way for us to determine that the connection is actually dead. We need to rearchitect a way either to determine faster that the connection to a system on the sysplex is gone, or to detect the loss after the command is sent and kill the in-flight command. Will look at this as a future feature.
  • Guest
    Reply
    |
    Jul 24, 2023
Not sure whether the code works exactly as described; however, this appears to be more of a defect, since the redundant connection should fail over faster. Opened an internal defect to track investigating this issue more closely.