IBM System Storage Ideas Portal



Status Future consideration
Created by Guest
Created on Dec 18, 2023

Raise the system memory hard limit at GPFS startup to accommodate large-memory AI systems

We hit the following issue on Spectrum Scale 5.1.6.1 while GPFS was starting up.
2023-12-13_21:22:28.648+0800: [I] Verifying minimum system memory configurations.
2023-12-13_21:22:28.648+0800: [I] The system memory configuration is 2063930 MiB
2023-12-13_21:22:28.648+0800: [I] The daemon memory configuration hard floor is 1536 MiB
2023-12-13_21:22:28.649+0800: [I] Initializing the main process ...
2023-12-13_21:22:28.683+0800: [E] Failed to allocate 92274688 bytes in memory pool, err -1
2023-12-13_21:22:28.683+0800: [X] logAssertFailed: err == E_OK
2023-12-13_21:22:28.683+0800: [X] return code 12, reason code 0, log record tag 0
2023-12-13_21:22:28.909+0800: [X] logAssertFailed: !"clock_gettimeP is NULL"  
The error message reports "Failed to allocate 92274688 bytes", but the system still has plenty of free memory (more than 1500G free out of about 2015G total).


The issue turns out to be caused by a vmalloc address limit. GPFS only accepts addresses within 1T above VMALLOC_START, so when other kernel modules occupy a large part of the vmalloc area, GPFS fails to allocate new memory because the address it gets back exceeds the 1T limit.
/proc/vmallocinfo contains 888252 entries like the ones below. Each takes about 3M (3149824 bytes), which is roughly 2.5 TiB in total.
0xffffa4ee71c00000-0xffffa4ee71f01000 3149824 ttm_bo_kmap+0x233/0x2a0 [ttm] phys=0x0000000094005000 ioremap
0xffffa4ee72000000-0xffffa4ee72301000 3149824 ttm_bo_kmap+0x233/0x2a0 [ttm] phys=0x0000000094005000 ioremap
0xffffa4ee72800000-0xffffa4ee72b01000 3149824 ttm_bo_kmap+0x233/0x2a0 [ttm] phys=0x0000000094005000 ioremap
0xffffa4ee72c00000-0xffffa4ee72f01000 3149824 ttm_bo_kmap+0x233/0x2a0 [ttm] phys=0x0000000094005000 ioremap
0xffffa4ee73c00000-0xffffa4ee73f01000 3149824 ttm_bo_kmap+0x233/0x2a0 [ttm] phys=0x0000000094005000 ioremap  
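
For anyone who wants to check whether a node is in the same state, the counting above can be reproduced with a small script. This is only a sketch, not an IBM tool: the function names `summarize_vmallocinfo` and `address_span` are made up for illustration, and it assumes each line has the `start-end size caller ...` layout shown in the entries above. The lowest mapped address is used as a rough stand-in for VMALLOC_START, which is itself an assumption.

```python
from collections import defaultdict


def summarize_vmallocinfo(lines):
    """Return {caller: (entry_count, total_bytes)} for vmallocinfo-style lines."""
    totals = defaultdict(lambda: [0, 0])
    for line in lines:
        fields = line.split()
        # Expected layout: "<start>-<end> <size> <caller+offset/len> ..."
        if len(fields) < 3 or "-" not in fields[0]:
            continue
        try:
            size = int(fields[1])
        except ValueError:
            continue
        caller = fields[2].split("+")[0]  # drop the "+0x233/0x2a0" offset part
        totals[caller][0] += 1
        totals[caller][1] += size
    return {caller: (count, size) for caller, (count, size) in totals.items()}


def address_span(lines):
    """Return highest end address minus lowest start address, in bytes."""
    lowest, highest = None, None
    for line in lines:
        fields = line.split()
        if not fields or "-" not in fields[0]:
            continue
        start_text, end_text = fields[0].split("-", 1)
        try:
            start, end = int(start_text, 16), int(end_text, 16)
        except ValueError:
            continue
        lowest = start if lowest is None else min(lowest, start)
        highest = end if highest is None else max(highest, end)
    return 0 if lowest is None else highest - lowest


# Example use on a live node (reading real addresses usually requires root):
#
#     with open("/proc/vmallocinfo") as f:
#         lines = f.readlines()
#     for caller, (count, size) in sorted(
#             summarize_vmallocinfo(lines).items(), key=lambda kv: -kv[1][1]):
#         print(f"{caller:40s} {count:8d} entries {size / 2**30:10.2f} GiB")
#     print(f"address span: {address_span(lines) / 2**40:.2f} TiB")
```

If one caller (here `ttm_bo_kmap`) dominates the totals and the address span is well beyond the limit GPFS enforces, the node is likely to hit the same allocation failure.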
We have also tried the efix that raises the limit to 4T, and we still hit the issue on some nodes.
We cannot start GPFS right now. Even if we reboot the node, the issue may go away temporarily, but it can come back at any time.
We sincerely hope this hard limit can be raised to fit AI systems with large memory. Thank you.

Idea priority Urgent