The initial retention script `ret.js` suffered from memory problems: it would initiate a number of requests but eventually hit an `ENOMEM` error inside either node-fetch or the native node fetch interface.
I tried a few approaches to solve this. Each one reduced memory usage further, but none fixed the error:
1. An `Item` object consumes far more memory than the simple JSON API responses because of the parsed XML it contains, so do not map JSON to `Item`s as you go, only in the final `summarize` function.
2. Create a queue and track active requests, allowing only N active requests at a time (I abandoned this approach entirely).
3. Try both an older (20.16.0) and a newer (22.8.0) node version.
4. Switch from `node-fetch` to node's native fetch implementation.
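For reference, the bounded-queue idea in approach 2 can be sketched as below. This is a minimal illustration, not the code that was in `ret.js`; `runLimited` and its arguments are hypothetical names.

```javascript
// Hypothetical helper: run async tasks with at most `limit` in flight.
// `tasks` is an array of functions that each return a promise.
async function runLimited(tasks, limit) {
  const results = new Array(tasks.length);
  let next = 0;

  // Each worker claims the next unstarted task until none remain.
  async function worker() {
    while (next < tasks.length) {
      const index = next++;
      results[index] = await tasks[index]();
    }
  }

  const count = Math.min(limit, tasks.length);
  await Promise.all(Array.from({ length: count }, () => worker()));
  return results; // same order as `tasks`
}
```

Assuming a list of request URLs, usage would look something like `await runLimited(urls.map((u) => () => fetch(u).then((r) => r.json())), 5)`.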
Of all these, 1 and 4 made a noticeable difference, but the errors continued. Finally, I gave up on concurrent requests and rewrote the `search` function to `await` both the HTTP response and the parsing of the JSON response body. This means there is only one request at a time, and node is better able to garbage collect prior response and data objects. Memory still spikes when `Item`s are created, but the script did complete successfully, so this issue is merely a record of what I did.
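The one-request-at-a-time pattern looks roughly like this. It is a sketch under assumptions: `fetchPage`, the `results` property, and offset-based paging are hypothetical stand-ins, not the actual `search` function or API response shape.

```javascript
// Hypothetical sketch of a sequential search loop.
// `fetchPage(start)` is assumed to return a fetch-like response whose
// JSON body has a `results` array; the real API may differ.
async function searchSequentially(fetchPage) {
  const items = [];
  let start = 0;
  while (true) {
    // Await the response and the JSON parsing before requesting more,
    // so only one response body is alive at a time.
    const response = await fetchPage(start);
    const data = await response.json();
    if (data.results.length === 0) break;
    items.push(...data.results); // keep plain JSON; build Items later
    start += data.results.length;
  }
  return items;
}
```

Because each iteration finishes before the next request starts, earlier response objects become unreachable and eligible for garbage collection.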
If this problem recurs in the future, here are a few more ideas:
- The number of items more than 7 years old will only continue to grow; try processing them in date range chunks using not only `modifiedBefore` but `modifiedAfter` as well.
- Use the `node` command line flag `--max-old-space-size` (value in megabytes, e.g. `node --max-old-space-size=4096 ret.js`) to let node use more memory.
- Split `ret.js` into a `get.js` script, which simply streams JSON API data into an unprocessed file, and a `ret.js` that streams those JSON items through `Item` to determine whether they should be deaccessioned and writes the final items.json file.
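The date range idea could start from a helper like the one below, which splits a span into consecutive windows. This is only a sketch: `dateChunks` is a hypothetical name, and the `YYYY-MM-DD` date format is an assumption about what the API accepts for `modifiedAfter` and `modifiedBefore`.

```javascript
// Split [start, end) into consecutive windows of `months` months each.
// Dates are emitted as YYYY-MM-DD strings (an assumed API format).
function dateChunks(start, end, months) {
  if (!Number.isInteger(months) || months < 1) {
    throw new RangeError('months must be a positive integer');
  }
  const chunks = [];
  const stop = new Date(end);
  let after = new Date(start);
  while (after < stop) {
    const before = new Date(after);
    before.setUTCMonth(before.getUTCMonth() + months);
    // Clamp the final window so it never runs past `end`.
    const upper = before < stop ? before : stop;
    chunks.push({
      modifiedAfter: after.toISOString().slice(0, 10),
      modifiedBefore: upper.toISOString().slice(0, 10),
    });
    after = upper;
  }
  return chunks;
}
```

Each returned pair can then drive its own search, so that only one window's worth of results is held in memory at a time.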
Any process that makes repeated requests to VAULT is seeing similar errors (e.g. course_lists is too), and I think it has to do with the application. I haven't found a solution other than spacing out our requests, which mitigates but does not eliminate the problem.