The actual problem was WAN saturation, which delayed replication between our two sites. The vast majority of our replication between sites is deferred. However, some workloads (such as HSM migration and DB2 log offloads) require sync mode between sites. Those workloads were affected and ran very slowly while the WAN was saturated. We suspected network issues but had no trend data against which to compare the current response times.
Having network statistics available for trending, like those that appear at the link level in the output of the GRIDLINK STATUS commands, would have pointed us to a general network issue. Statistics such as latency, packets sent/retransmitted, and read/write/total MB/s would be a good start. Having those at the cluster level would help, but having them at the link level would be even better. We've had instances where one link was operating at a much lower level, and being able to trend that could help in diagnosing other routing issues out of a specific LAN leg.
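To illustrate the kind of trending we have in mind, here is a minimal Python sketch. It assumes per-link samples have already been collected somehow and stored as plain records. The `LinkSample` fields, the link names, and the 3-sigma threshold are all hypothetical choices made for illustration. They do not reflect the actual GRIDLINK STATUS output format or any existing IBM interface.

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass
class LinkSample:
    """One hypothetical per-link measurement (field names are assumptions)."""
    link: str
    latency_ms: float
    packets_sent: int
    packets_retransmitted: int


def retransmit_ratio(sample: LinkSample) -> float:
    """Fraction of sent packets that were retransmitted (0.0 if none sent)."""
    if sample.packets_sent == 0:
        return 0.0
    return sample.packets_retransmitted / sample.packets_sent


def is_anomalous(history: list[float], current: float, threshold: float = 3.0) -> bool:
    """True if `current` is more than `threshold` standard deviations above the
    mean of `history`. With fewer than two historical points there is no
    baseline, so nothing is flagged."""
    if len(history) < 2:
        return False
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return current > mu
    return (current - mu) / sigma > threshold


def links_with_high_latency(samples_by_link: dict[str, list[LinkSample]]) -> list[str]:
    """Compare the newest sample on each link with that link's own history and
    return the names of links whose latest latency is anomalous."""
    flagged = []
    for link, samples in samples_by_link.items():
        if not samples:
            continue
        *past, latest = samples
        if is_anomalous([s.latency_ms for s in past], latest.latency_ms):
            flagged.append(link)
    return flagged
```

Because each link is compared only against its own baseline, a single degraded link stands out even when the cluster-level average still looks normal, which is the reason for asking for link-level statistics.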
Could you provide more specific examples of stats you think would have helped? What was the actual problem and symptom? How did the team figure out why it was occurring?