
ENOMEM HTTP error in ret.js #43

Open
phette23 opened this issue Sep 13, 2024 · 1 comment
Labels
bug, javascript (Pull requests that update Javascript code)

Comments

@phette23
Member

The initial retention script ret.js suffered from memory problems: it would initiate a number of requests but eventually hit an ENOMEM error inside either node-fetch or node's native fetch interface:

node:internal/deps/undici/undici:13185
      Error.captureStackTrace(err);
            ^

TypeError: fetch failed
    at node:internal/deps/undici/undici:13185:13
    at process.processTicksAndRejections (node:internal/process/task_queues:105:5) {
  [cause]: Error: connect ENOMEM 209.40.90.39:443 - Local (0.0.0.0:0)
      at internalConnect (node:net:1093:16)
      at defaultTriggerAsyncIdScope (node:internal/async_hooks:464:18)
      at GetAddrInfoReqWrap.emitLookup [as callback] (node:net:1492:9)
      at GetAddrInfoReqWrap.onlookupall [as oncomplete] (node:dns:132:8) {
    errno: -12,
    code: 'ENOMEM',
    syscall: 'connect',
    address: '209.40.90.39',
    port: 443
  }
}

I tried a few approaches, each of which reduced memory usage, but none fixed the error:

  1. an Item object consumes far more memory than the simple JSON API response because of the parsed XML it contains, so do not map JSON to Items as you go; do it only in the final summarize function
  2. create a queue that tracks active requests, allowing only N active requests at a time (abandoned this approach entirely)
  3. try both an older (node 20.16.0) and a newer (22.8.0) node version
  4. switch from node-fetch to node's native fetch implementation
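Approach 2 can be sketched as a small concurrency limiter that caps active requests at N. This is a hypothetical illustration of the technique, not the abandoned code from ret.js:

```javascript
// Return a wrapper that runs at most `max` tasks concurrently.
// Tasks beyond the cap wait in a FIFO queue until a slot frees up.
function limitConcurrency(max) {
  let active = 0
  const waiting = []
  const next = () => {
    if (active >= max || waiting.length === 0) return
    active++
    const { task, resolve, reject } = waiting.shift()
    task().then(resolve, reject).finally(() => {
      active--
      next() // a slot opened up; start the next queued task
    })
  }
  return (task) =>
    new Promise((resolve, reject) => {
      waiting.push({ task, resolve, reject })
      next()
    })
}
```

Each request would be wrapped as `limit(() => fetch(url))`, so only N sockets and response bodies are alive at once.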

Of all these, 1 and 4 made a noticeable difference, but the errors continued. Finally, I gave up on asynchronous code and rewrote the search function to await both the HTTP response and the parsing of the JSON response body. This means there's only one request at a time, and node is better able to garbage collect prior response and data objects. Memory still spikes when Items are created, but the script completed successfully, so this issue merely tracks what I did.
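The sequential rewrite can be sketched as a loop that fully awaits each response before issuing the next request. The paginated endpoint, `start` query parameter, and the `available`/`results` response shape here are assumptions for illustration, not the actual VAULT API contract:

```javascript
// Fetch all search results one page at a time. Because both the fetch
// and the JSON parse are awaited before the next request starts, only
// one response body is alive at a time and prior objects can be
// garbage collected.
async function searchAll(searchUrl, fetchFn = fetch) {
  const items = []
  let start = 0
  let total = Infinity
  while (start < total) {
    const response = await fetchFn(`${searchUrl}?start=${start}`)
    const data = await response.json()
    total = data.available
    items.push(...data.results)
    start += data.results.length
  }
  return items
}
```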

If this problem recurs in the future, here are a couple more ideas:

  1. The number of items more than 7 years old will only continue to grow; try processing them in date-range chunks using not only modifiedBefore but modifiedAfter as well.
  2. Use the node command-line flag --max-old-space-size to tell node to use more memory
  3. Split ret.js into a get.js script that simply streams JSON API data into an unprocessed file, then have ret.js stream the JSON items through Item to determine whether they should be deaccessioned and write the final items.json file.
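Idea 1 could be sketched as a helper that splits the retention window into date-range chunks; the modifiedAfter/modifiedBefore names come from the issue, while the chunk size and date formatting are assumptions:

```javascript
// Split [start, end) into chunks of `stepMonths` months, each expressed
// as a modifiedAfter/modifiedBefore pair of YYYY-MM-DD strings, so each
// chunk can be fetched and processed separately.
function dateRangeChunks(start, end, stepMonths = 12) {
  const chunks = []
  let from = new Date(start)
  while (from < end) {
    const to = new Date(from)
    to.setMonth(to.getMonth() + stepMonths)
    chunks.push({
      modifiedAfter: from.toISOString().slice(0, 10),
      modifiedBefore: (to < end ? to : end).toISOString().slice(0, 10),
    })
    from = to
  }
  return chunks
}
```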
@phette23 phette23 added the bug and javascript labels Sep 13, 2024
@phette23 phette23 reopened this Sep 13, 2024
@phette23 phette23 assigned phette23 and unassigned phette23 Sep 13, 2024
@phette23
Member Author

Any process that makes repeated requests to VAULT sees similar errors (e.g. course_lists does too), and I think it has to do with the application. I haven't found a solution other than spacing out our requests, which mitigates but does not eliminate the problem.
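The spacing mitigation can be sketched as a fixed delay between strictly sequential requests; the helper names and delay value are illustrative, not what course_lists actually uses:

```javascript
// Resolve after `ms` milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms))

// Fetch each URL in order, pausing `delayMs` between requests so the
// server (and our process) never sees a burst of concurrent connections.
async function fetchSpaced(urls, delayMs, fetchFn = fetch) {
  const responses = []
  for (const url of urls) {
    responses.push(await fetchFn(url))
    await sleep(delayMs)
  }
  return responses
}
```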
