IBM System Storage Ideas Portal

Status Not under consideration
Workspace ESS
Created by Guest
Created on Mar 17, 2021

Spectrum Scale / ESS soft lockup issue

I need help with this requirement for a system that uses Power9 AC922 servers and ESS 3000 storage. The customer has hit soft lockup issues with their applications running on the AC922 nodes, which also run the Spectrum Scale client. They are working to fix the issues in their application, but in the meantime their production requirements are very strict, and we have been strongly asked to provide alternatives.

Problems:
Two or more application processes on multiple nodes in a Scale cluster open the same file read-only (RO). Once a soft lockup happens on one node, the other processes cannot access the Scale cluster until the locked-up node is expelled from the cluster, which happens only after 'failureDetectionTime' has elapsed. The customer cannot accept waiting for this recovery.

Reasons:
The metanode, which manages the inode information for the RO file, hangs. Subsequent processes that need to ask the metanode for the file's status and/or request token revocation must wait.

Alternative Ideas:
1. None of the applications modify any files. If the customer does not mind atime not being maintained in metadata, then nothing should change the metadata, and no locking mechanism should be needed.
We can specify the 'noatime' option in mmchfs, and can also set '-i yes' in mmchattr to tell Scale that the files are read-only.

With the latest version of Spectrum Scale, the above scenario does not work: we still see processes waiting until the node is expelled. Some internal code enhancement would therefore be required.

2. Bind the user's applications to CPU cores that are isolated from the cores used by Scale processes and other kernel processes. We think 'tuna', 'taskset' or 'isolcpus' could be used to achieve this, but we have no technical confirmation that it would work.
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/performance_tuning_guide/index
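
As a rough illustration of idea 2, the sketch below restricts a process to a chosen set of cores, which is the same kind of affinity change that 'taskset' makes. It is only a sketch under assumptions: the helper names parse_cpu_list and pin_to_cores and the core range "4-7" are hypothetical, it is Linux-only, and it does not keep kernel threads or the Scale daemon off those cores (that would need something like 'isolcpus' at boot). Whether this avoids the soft lockup at all is unverified.

```python
import os


def parse_cpu_list(spec):
    """Parse a Linux-style CPU list such as "0-3,8" into a set of core ids."""
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-", 1)
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return cpus


def pin_to_cores(app_cores, pid=0):
    """Restrict process `pid` (0 means the calling process) to `app_cores`.

    Only cores already permitted for the process are used. Returns the
    affinity mask that is in effect after the call.
    """
    allowed = os.sched_getaffinity(pid)
    target = set(app_cores) & allowed
    if not target:
        raise ValueError("none of the requested cores are available to this process")
    os.sched_setaffinity(pid, target)
    return os.sched_getaffinity(pid)
```

An application could call, for example, pin_to_cores(parse_cpu_list("4-7")) at startup, where "4-7" stands for whatever cores the site reserves for user workloads.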

Idea priority Urgent
  • Guest, Apr 23, 2021

    We still have no clear view of whether this is a bug, a problem with application behavior, or an enhancement request. If it is an enhancement request, we don't know what is being asked for.

  • Guest, Apr 6, 2021

    Sorry, it's not clear what the request here is for. What exactly does "soft lockup" mean? That sounds like a bug, not an enhancement. Can you please provide more description of what exactly the problem is?