IBM System Storage Ideas Portal


Status Future consideration
Created by Guest
Created on Jan 8, 2019

TS7760T with a problematic PVOL requires SSR intervention to go ROR. Request is to change this behavior to mitigate interve

The problem to be discussed:
a) Reclaim of a non-exported, not-properly-ejected, physically damaged tape should have been easier to do in a multi-cluster grid, in the scenario DTNA experienced.
b) Reclaim of offsite tapes should not have been impacted due to the damaged tape, in the scenario DTNA experienced.
c) Even after getting the damaged tape's PVOL to show zero active LVOLs, nothing DTNA could do from the customer GUI would get the tape to logically eject and stop generating error messages.
The fundamental design shortcoming, from DTNA's perspective, is that the TS7760T in question kept insisting on regaining access to the damaged tape (which was impossible), ignoring the fact that it could have obtained all of the needed LVOLs from other clusters in the grid. Every LVOL on the damaged PVOL, and every LVOL on the offsite tapes, had extra consistent copies in the cache of one or two other TS7700s in the same grid family. It does not make sense that the customer has to involve the support center to initiate a ROR process. The EJECT or MOVE commands from the customer GUI should have been able to remove the tape from consideration, or at least move as many LVOLs as possible (just like the ROR does).
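To make the requested behavior concrete, here is a minimal sketch of the decision the EJECT or MOVE path could make: for each LVOL on the damaged PVOL, check whether a consistent copy exists on another cluster and, if so, use it instead of the tape. Everything in it (the `LvolCopy` type, `plan_recovery`, the cluster names and volsers) is hypothetical and for illustration only; it is not how the TS7700 microcode is implemented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LvolCopy:
    """One copy of an LVOL held by a cluster in the grid (hypothetical model)."""
    cluster: str
    consistent: bool


def plan_recovery(damaged_cluster, lvols_on_pvol, copies):
    """Split the LVOLs on a damaged PVOL into two groups.

    recoverable: LVOL -> peer cluster that holds a consistent copy
    stranded:    LVOLs with no consistent copy outside the damaged cluster
    """
    recoverable = {}
    stranded = []
    for lvol in lvols_on_pvol:
        peers = sorted(
            c.cluster
            for c in copies.get(lvol, [])
            if c.consistent and c.cluster != damaged_cluster
        )
        if peers:
            recoverable[lvol] = peers[0]  # any consistent peer would do
        else:
            stranded.append(lvol)
    return recoverable, stranded


# Hypothetical example data: three LVOLs on the damaged tape.
copies = {
    "L00001": [LvolCopy("cluster1", True), LvolCopy("cluster2", True)],
    "L00002": [LvolCopy("cluster0", True)],   # only on the damaged cluster
    "L00003": [LvolCopy("cluster2", False)],  # peer copy is not consistent
}
recoverable, stranded = plan_recovery(
    "cluster0", ["L00001", "L00002", "L00003"], copies
)
```

In the scenario described here, every LVOL had one or two consistent peer copies, so under this model the stranded list would have been empty and the tape could have been dropped from consideration without the support center.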
The one potential complication that could have affected these attempts to EJECT or MOVE the LVOLs using other copies in the grid is if the timing of the removal of scratched LVOLs is inconsistent. For instance, the first attempt to EJECT the bad volume would have happened when 33 scratched LVOLs were 1 or 2 hours past their “earliest deletion on” date/time, and 2 scratched LVOLs had started their 1-day grace period just 1 or 2 hours prior to the EJECT attempt. Perhaps some of the 33 ‘eligible for deletion’ LVOLs had already been physically deleted on the other members of the grid. All of the 33 LVOLs eligible for deletion were in cache on two other clusters leading up to this event.
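The timing described above can be worked through with a small sketch. It assumes only what the write-up states, namely a 1-day grace period after a volume is scratched; the function names and the example timestamps are hypothetical and chosen purely for illustration.

```python
from datetime import datetime, timedelta

GRACE = timedelta(days=1)  # the 1-day grace period mentioned in the write-up


def scratch_state(scratched_at, now, grace=GRACE):
    """Classify a scratched LVOL relative to its earliest deletion time."""
    earliest_deletion = scratched_at + grace
    if now >= earliest_deletion:
        return "eligible_for_deletion"
    return "in_grace_period"


def hours_past_eligibility(scratched_at, now, grace=GRACE):
    """Hours since the LVOL became eligible (negative while still in grace)."""
    return (now - (scratched_at + grace)).total_seconds() / 3600


# Hypothetical timestamps: an EJECT attempt, one LVOL scratched 26 hours
# earlier and one scratched 2 hours earlier.
eject_attempt = datetime(2019, 1, 1, 12, 0)
older = eject_attempt - timedelta(hours=26)
newer = eject_attempt - timedelta(hours=2)

older_state = scratch_state(older, eject_attempt)  # past earliest deletion
newer_state = scratch_state(newer, eject_attempt)  # still in grace period
```

If each cluster acts on eligibility at a slightly different moment, an LVOL in the first state may already be gone from a peer's cache while still counted on the cluster that owns the bad tape, which is the inconsistency suspected here.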
One reason this potential timing problem with scratched tapes comes to mind is that when the ROR was finally initiated by the IBM support center, all but 1 LVOL moved; the remaining LVOL was a scratched LVOL that had reached its ‘eligible for deletion’ time 21 hours prior to the ROR attempt. We have a cut-and-paste display that shows it no longer in cache on the companion clusters, yet apparently not physically removed from the cluster that owned the bad tape. There were 10 other LVOLs that had been scratched at the same time, and none of those held up the ROR, but that 1 LVOL lingered for some reason.
A second flaw is that on-demand Copy Export Reclaim appeared to be ‘stuck’. DTNA had at least 33 PVOLs for which they had issued COPYEXP,RECLAIM commands. These were shown to be in CE_RECLAIM status, yet no reclaims were completing. After a ROR was done for the bad PVOL, getting that bad PVOL down to only 1 remaining LVOL (which was in scratched-but-not-removed status), the reclaims resumed. However, oddly, once the 1 remaining LVOL did get removed, the reclaims stopped once again. It took a forced pause and resume of the TS7760T to get the reclaims running again.

Idea priority Medium
  • Guest
    Jan 8, 2019

    Attachment (Description): Full write up email from the client on the perceived design shortcomings.