IBM System Storage Ideas Portal


This portal is to open public enhancement requests against IBM System Storage products. To view all of your ideas submitted to IBM, create and manage groups of Ideas, or create an idea explicitly set to be either visible by all (public) or visible only to you and IBM (private), use the IBM Unified Ideas Portal (https://ideas.ibm.com).


Shape the future of IBM!

We invite you to shape the future of IBM, including product roadmaps, by submitting ideas that matter to you the most. Here's how it works:

Search existing ideas

Start by searching and reviewing ideas and requests to enhance a product or service. Take a look at ideas others have posted, and add a comment, vote, or subscribe to updates on them if they matter to you. If you can't find what you are looking for, post your own idea.

Post your ideas
  1. Post an idea.

  2. Get feedback from the IBM team and other customers to refine your idea.

  3. Follow the idea through the IBM Ideas process.


Specific links you will want to bookmark for future use

Welcome to the IBM Ideas Portal (https://www.ibm.com/ideas) - Use this site to find out additional information and details about the IBM Ideas process and statuses.

IBM Unified Ideas Portal (https://ideas.ibm.com) - Use this site to view all of your ideas, create new ideas for any IBM product, or search for ideas across all of IBM.

ideasibm@us.ibm.com - Use this email to suggest enhancements to the Ideas process or request help from IBM for submitting your Ideas.

Recover a failed Ganesha node (marked with the F flag) without a reboot


Hi

We had an NFS Ganesha node that failed, but we were not able to recover it; only a reboot solved it.


Sysmon attempted to remove the F flag three times, but all attempts to acquire the lock failed.
As a result, the node remained in a failed state despite being healthy.
Restarting the node allowed the process to retry successfully once the lock became available.

NFS failure: both the RPC null check and the stat check (which collects the I/O count twice and fails if the two values are the same) failed, which is why the node reported nfs_not_active. This was fixed soon after.
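The stat check described above can be pictured with a small sketch. It assumes the check samples a cumulative I/O counter twice, with a short pause in between, and treats an unchanged value as a failure. The function name `stat_check` and the `read_io_count` callable are hypothetical stand-ins; this is not the actual sysmon/mmhealth implementation.

```python
import time
from typing import Callable


def stat_check(read_io_count: Callable[[], int], interval_s: float = 1.0) -> bool:
    """Return True (pass) if the I/O counter advanced between two samples.

    Hypothetical sketch: read_io_count stands in for whatever source the
    real monitor uses to read the NFS server's I/O statistics.
    """
    first = read_io_count()
    time.sleep(interval_s)
    second = read_io_count()
    # The same value twice means no I/O progressed in the interval: the test fails.
    return second != first


if __name__ == "__main__":
    samples = iter([100, 100])
    print(stat_check(lambda: next(samples), interval_s=0))  # False: counter did not move
```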
Why the node stayed in the failed state even though NFS was healthy: to remove the Failed state, the node needs the fail-over lock, and it tries to acquire it three times. It did not get the lock, and by the time the lock became available, all three attempts had already been exhausted. This is working as designed, but ideally we may want to redesign it.
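The failure mode above, a fixed retry budget that runs out before the lock frees up, can be reproduced with a minimal sketch. It assumes the F flag may only be cleared while holding a fail-over lock and that the monitor makes exactly three timed attempts. `try_clear_failed_flag` and the local `threading.Lock` are hypothetical stand-ins for the cluster-wide lock; this is not sysmon's actual code.

```python
import threading


def try_clear_failed_flag(lock: threading.Lock, max_attempts: int = 3,
                          wait_s: float = 0.1) -> bool:
    """Try to take the fail-over lock a fixed number of times.

    Returns True if the lock was acquired (so the F flag could be cleared),
    False once all attempts are exhausted. Hypothetical sketch only.
    """
    for _ in range(max_attempts):
        # acquire(timeout=...) waits up to wait_s seconds for the lock.
        if lock.acquire(timeout=wait_s):
            try:
                # Placeholder: the real monitor would clear the F flag here.
                return True
            finally:
                lock.release()
    # Retry budget spent: nothing retries later, so the node stays failed
    # even if the lock becomes free a moment afterwards.
    return False


if __name__ == "__main__":
    fail_over_lock = threading.Lock()
    fail_over_lock.acquire()                      # another holder owns the lock
    print(try_clear_failed_flag(fail_over_lock))  # False: 3 attempts exhausted
    fail_over_lock.release()                      # the lock becomes available later
    # Only a fresh attempt (for example after a restart) now succeeds.
    print(try_clear_failed_flag(fail_over_lock))  # True
```

The bounded loop keeps the monitor from blocking indefinitely on a lock held elsewhere; the cost, as the idea points out, is that a healthy node is never re-evaluated once the attempts are used up.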

Idea priority Medium
  • Guest
    Feb 24, 2025

    Hi

     

    Can we have mmhealth monitor the following errors in the ganesha.log file (or monitor all CRIT messages)? For example:

     

    2025-01-26 12:20:48 : epoch 001c0612 : ess4-proto6 : gpfs.ganesha.nfsd-2163267[svc_4386] fsal_find_fd :FSAL :CRIT :Open for locking failed for access Read/Write

    2025-01-26 12:20:48 : epoch 001c0612 : ess4-proto6 : gpfs.ganesha.nfsd-2163267[svc_2687] fsal_find_fd :FSAL :CRIT :Open for locking failed for access Read/Write

    2025-01-26 12:20:48 : epoch 001c0612 : ess4-proto6 : gpfs.ganesha.nfsd-2163267[svc_4005] fsal_find_fd :FSAL :CRIT :Open for locking failed for access Read/Write

    2025-01-26 12:20:49 : epoch 001c0612 : ess4-proto6 : gpfs.ganesha.nfsd-2163267[svc_4612] fsal_find_fd :FSAL :CRIT :Open for locking failed for access Read/Write

     

    2025-01-26 16:07:25 : epoch 0010060f : ess4-proto2 : gpfs.ganesha.nfsd-2657517[svc_749] mdcache_lru_fds_available :INODE LRU :CRIT :FD Hard Limit (943718) Exceeded (open_fd_count = 943719), waking LRU thread.

    2025-01-26 16:07:25 : epoch 0010060f : ess4-proto2 : gpfs.ganesha.nfsd-2657517[svc_989] mdcache_lru_fds_available :INODE LRU :CRIT :FD Hard Limit (943718) Exceeded (open_fd_count = 943723), waking LRU thread.

    2025-01-26 16:07:25 : epoch 0010060f : ess4-proto2 : gpfs.ganesha.nfsd-2657517[svc_752] mdcache_lru_fds_available :INODE LRU :CRIT :FD Hard Limit (943718) Exceeded (open_fd_count = 943723), waking LRU thread.
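As a rough illustration of what such a monitor would need to do, the sketch below picks CRIT entries out of ganesha.log lines and counts them per host and function. The field layout in the regular expression is inferred only from the sample lines above. `parse`, `crit_summary`, and `LogEntry` are hypothetical names, and the sketch does not show how a check would be registered with mmhealth.

```python
import re
import sys
from collections import Counter
from typing import Iterable, Iterator, NamedTuple

# Layout inferred from the sample lines:
# "<date> <time> : epoch <id> : <host> : <proc>[<thread>] <func> :<component> :<level> :<message>"
LINE_RE = re.compile(
    r"^(?P<ts>\S+ \S+) : epoch (?P<epoch>\S+) : (?P<host>\S+) : "
    r"(?P<proc>[^\[]+)\[(?P<thread>[^\]]+)\] (?P<func>\S+) :"
    r"(?P<component>[^:]+?) :(?P<level>[A-Z_]+) :(?P<msg>.*)$"
)


class LogEntry(NamedTuple):
    ts: str
    host: str
    func: str
    component: str
    level: str
    msg: str


def parse(lines: Iterable[str]) -> Iterator[LogEntry]:
    """Yield parsed entries; lines that do not match the layout are skipped."""
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            yield LogEntry(m["ts"], m["host"], m["func"],
                           m["component"], m["level"], m["msg"])


def crit_summary(lines: Iterable[str]) -> Counter:
    """Count CRIT entries per (host, function).

    Grouping by function rather than full message keeps lines such as the
    FD Hard Limit warnings together even though open_fd_count varies.
    """
    return Counter((e.host, e.func) for e in parse(lines) if e.level == "CRIT")


if __name__ == "__main__":
    # Usage: python crit_summary.py <path-to-ganesha.log>
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for (host, func), n in crit_summary(fh).most_common():
            print(f"{host} {func}: {n} CRIT")
```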