Implement recall thread timeout on HSM service driver.

Explanation first:

Whenever a file is recalled at the OS file system level (e.g. double-clicked in Explorer), this sets off a driver call defined by the reparse point in the "STUB" file. The kernel mode driver then asks the HSM service to perform a recall, which is initiated at that time.

There is a fixed number, x, of concurrent recall threads.

Currently there is no timeout on the recall threads. If the TSM server misbehaves and does not deliver data, but also does not bail out with an error, the recall thread is stuck forever.

In this scenario, recall requests beyond the x concurrently running recall threads are queued up, and all of those requests hold kernel references (because the file system level driver triggered by the reparse point call carries a reference from the kernel mode process to the user mode process responsible for performing the recall).

If all x concurrent recall threads get stuck in this way, recall requests build up until the kernel non-paged memory pool is exhausted by the high number of kernel mode <--> user mode references, and the server bluescreens as a result.

Apparently there is a cleanup routine that kills off stuck threads, but this routine is only triggered after a thread successfully returns. If all threads get stuck within the time limit after which a thread is considered stuck, there is no further control mechanism to prevent the recall queue from building up.

This should clearly be considered a bug, but apparently there is no bug reporting mechanism for this product, so here we go with an "enhancement" request.

REQUEST:

Create a control mechanism that monitors recall threads, or change the way recall threads run, so that a timeout is enforced on recall operations. Any recall thread exceeding the defined timeout is then terminated and cleaned up, so that the server is not brought down by exhaustion of the kernel non-paged memory pool.
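
To make the request concrete, here is a minimal Python sketch of the kind of monitor meant above. It is only an illustration, not the product's code: the class name `RecallWatchdog`, its methods, and the injectable `clock` parameter are all hypothetical, and cancellation is cooperative (the worker has to check the event), whereas the real service would need its own way to abort a blocked call to the TSM server.

```python
import threading
import time


class RecallWatchdog:
    """Tracks running recalls and cancels any that exceed a timeout.

    Hypothetical sketch: none of these names come from the HSM product.
    """

    def __init__(self, timeout_seconds, clock=time.monotonic):
        self._timeout = timeout_seconds
        self._clock = clock  # injectable so the logic can be tested
        self._lock = threading.Lock()
        self._active = {}  # recall id -> (start time, cancel event)

    def start(self, recall_id):
        """Register a recall and return the event its worker must poll."""
        cancel = threading.Event()
        with self._lock:
            self._active[recall_id] = (self._clock(), cancel)
        return cancel

    def finish(self, recall_id):
        """Unregister a recall that completed or failed on its own."""
        with self._lock:
            self._active.pop(recall_id, None)

    def sweep(self):
        """Cancel and drop every recall older than the timeout.

        Meant to run on its own schedule, so it does not depend on
        any recall thread returning first.
        """
        now = self._clock()
        expired = []
        with self._lock:
            for recall_id, (started, cancel) in list(self._active.items()):
                if now - started > self._timeout:
                    cancel.set()  # signal the worker to give up
                    del self._active[recall_id]
                    expired.append(recall_id)
        return expired
```

The important point is that `sweep()` is driven by a separate timer thread rather than by the return of a recall thread, so it still runs when every recall thread is stuck.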

Idea priority

Urgent

Guest

Jun 12, 2015

Due to processing by IBM, this request was reassigned to have the following updated attributes:
Brand - Servers and Systems Software
Product family - Storage
Product - Tivoli Storage Manager (TSM) Family

For record keeping, the previous attributes were:
Brand - Tivoli
Product family - Storage
Product - Tivoli Storage Manager (TSM) Family

Guest

Sep 14, 2011

Thank you for submitting this enhancement request. We understand the requirement and the rationale behind it, but unfortunately we do not currently plan to implement it. The current HSM logic is that when someone tries to recall a file, the expected result is that the file is restored (unless an error occurs). If we followed the suggested enhancement, it might cause multiple failures for reasons unrelated to the HSM functionality, without user interaction.
