IBM System Storage Ideas Portal



Status Not under consideration
Created by Guest
Created on Jan 28, 2015

Trigger Error 1195 (a node has been missing for 30 minutes) after any condition that leaves a node in a non-functional state.

Generally, the sequence we have seen is a 1196 warning email that a node has failed, followed 30 minutes later by a 1195 error indicating that the node has been gone for 30 minutes, which also generates an email and a call home.

What we have experienced, twice on two different clusters, is a 1196 warning that a node has left the cluster, which generates an email, followed by a 1193 UPS warning, which also generates an email. After 30 minutes, however, the expected 1195 error never arrived, so no call home was generated. Unless someone is monitoring their email, a failed node can go unnoticed for hours.

We raised this concern a year ago and were told this is working as designed. It happened again recently. This does not seem like the proper SVC response to a node being offline.

Working through our Lab Advocate we received the following response to this request for a design change:
====================================================

When a node goes into service state due to low UPS charge, a warning is generated that this has happened.
At this point the SVC node is just waiting for the UPS to report that it has sufficient charge, and then the node will spring back into life.
UPSs are very bad at knowing how long it will take to get there (or even if they will ever get there).

As such the SVC does not know when it will become available again.
From the cluster's point of view, the node has not left the cluster, it is not offline, and it is fully ready to spring into performing IO once more, after the UPS says it is ready.

If the UPS does decide that the battery is dead - a call home will be triggered, but the UPSs are very bad at doing this.

This is a known issue, and we have fixed it by using our own batteries in the DH8, which behaves as you have asked it to :)
===================================================

Our response is that from the cluster's point of view the node is not offline, but if the other remaining node in the IO group were to then fail, the IO group would go offline. Right? So the node with the UPS issue may as well be offline, as it is providing no value in its state of limbo. The point is that the reason for a call home is to inform IBM that something is wrong and the risk is raised. When only one node has been active for 30 minutes, the call home should be generated regardless of the reason.

I do appreciate the fact that the DH8 does not have this same exposure; however, that does not fix the situation (design flaw, sorry, let's call it what it is :-) ) on the vast majority of the existing SVC population.
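
To make the request concrete, here is a minimal sketch of the rule we are asking for. It is not SVC code and does not reflect how the product is implemented; the names NodeStatus, should_call_home, and the reason labels are hypothetical and exist only for illustration. The only value taken from the behaviour described above is the 30 minute window.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# 30 minute window, matching the existing 1195 timing described above.
CALL_HOME_THRESHOLD = timedelta(minutes=30)


@dataclass
class NodeStatus:
    """Hypothetical view of one node; not an SVC data structure."""
    name: str
    functional: bool
    reason: str  # illustrative label only, e.g. "offline" or "ups_low_charge"
    non_functional_since: Optional[datetime] = None


def should_call_home(node: NodeStatus, now: datetime) -> bool:
    """Return True once a node has been non-functional for 30 minutes.

    The reason is deliberately ignored: a node waiting on a UPS charge
    is treated the same as a node that has left the cluster.
    """
    if node.functional or node.non_functional_since is None:
        return False
    return now - node.non_functional_since >= CALL_HOME_THRESHOLD


if __name__ == "__main__":
    start = datetime(2015, 1, 28, 9, 0)
    waiting_on_ups = NodeStatus("node2", False, "ups_low_charge", start)
    print(should_call_home(waiting_on_ups, start + timedelta(minutes=29)))
    print(should_call_home(waiting_on_ups, start + timedelta(minutes=30)))
```

The design point is that the check looks only at how long the node has been unable to do IO, so the UPS case cannot fall through the gap.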

Idea priority Medium
  • Guest
    Jun 12, 2015

    Due to processing by IBM, this request was reassigned to have the following updated attributes:
    Brand - Servers and Systems Software
    Product family - Storage
    Product - IBM System Storage SAN Volume Controller (SVC) / Spectrum Virtualize

    For record keeping, the previous attributes were:
    Brand - Tivoli
    Product family - Storage
    Product - IBM System Storage SAN Volume Controller (SVC) / Spectrum Virtualize

  • Guest
    Feb 18, 2015

    While the concern is understood, the new node design of the 2145-DH8 no longer presents this issue, and we are of necessity placing our development resources in other areas to improve the SVC for you.

  • Guest
    Jan 30, 2015

    I disagree with #1service. I am sure customers who deploy SVCs would say the current design is lacking (I agree), and if there is an automated way to alert users via a call home, IBM should implement it. If this is to be an enterprise solution, IBM needs to get serious about this. Thank you.

  • Guest
    Jan 30, 2015

    I believe this Machine is working as designed. The Machine sends an e-mail when a node goes offline. It is then the responsibility of the customer to do problem determination on why the node went down. The SVC was designed to run with a node down in a cluster without customer impact, so I do not believe that this should be an issue. My vote is no.