
ENOMEM HTTP error in ret.js #43

Open
phette23 opened this issue Sep 13, 2024 · 1 comment
Labels
bug, javascript (Pull requests that update Javascript code)

Comments

@phette23
Member

The initial retention script ret.js suffered from memory problems: it would initiate a number of requests but eventually hit an ENOMEM error inside either node-fetch or node's native fetch interface:

node:internal/deps/undici/undici:13185
      Error.captureStackTrace(err);
            ^

TypeError: fetch failed
    at node:internal/deps/undici/undici:13185:13
    at process.processTicksAndRejections (node:internal/process/task_queues:105:5) {
  [cause]: Error: connect ENOMEM 209.40.90.39:443 - Local (0.0.0.0:0)
      at internalConnect (node:net:1093:16)
      at defaultTriggerAsyncIdScope (node:internal/async_hooks:464:18)
      at GetAddrInfoReqWrap.emitLookup [as callback] (node:net:1492:9)
      at GetAddrInfoReqWrap.onlookupall [as oncomplete] (node:dns:132:8) {
    errno: -12,
    code: 'ENOMEM',
    syscall: 'connect',
    address: '209.40.90.39',
    port: 443
  }
}

I tried a few approaches, each of which reduced memory usage, but none fixed the error:

  1. an Item object consumes far more memory than the simple JSON API response because of the parsed XML it contains, so do not map JSON to Items as you go; do it only in the final summarize function
  2. create a queue that tracks active requests, allowing only N active requests at a time (abandoned this approach entirely)
  3. try both an older (node 20.16.0) and a newer (22.8.0) node version
  4. switch from node-fetch to node's native fetch implementation
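Approach 2 can be sketched as a small concurrency limiter that caps active requests at N. This is a hypothetical illustration of the technique, not the abandoned code from ret.js:

```javascript
// Return a wrapper that runs at most `max` tasks concurrently.
// Tasks beyond the cap wait in a FIFO queue until a slot frees up.
function limitConcurrency(max) {
  let active = 0
  const waiting = []
  const next = () => {
    if (active >= max || waiting.length === 0) return
    active++
    const { task, resolve, reject } = waiting.shift()
    task().then(resolve, reject).finally(() => {
      active--
      next() // a slot opened up; start the next queued task
    })
  }
  return (task) =>
    new Promise((resolve, reject) => {
      waiting.push({ task, resolve, reject })
      next()
    })
}
```

Each request would be wrapped as `limit(() => fetch(url))`, so only N sockets and response bodies are alive at once.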

Of all these, 1 and 4 made a noticeable difference, but the errors continued. Finally, I gave up on asynchronous code and rewrote the search function to await both the HTTP response and the parsing of the JSON response body. This means there's only one request at a time, and node is better able to garbage collect prior response and data objects. Memory still spikes when Items are created, but the script completed successfully, so this issue merely tracks what I did.
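The sequential rewrite can be sketched as a loop that fully awaits each response before issuing the next request. The paginated endpoint, `start` query parameter, and the `available`/`results` response shape here are assumptions for illustration, not the actual VAULT API contract:

```javascript
// Fetch all search results one page at a time. Because both the fetch
// and the JSON parse are awaited before the next request starts, only
// one response body is alive at a time and prior objects can be
// garbage collected.
async function searchAll(searchUrl, fetchFn = fetch) {
  const items = []
  let start = 0
  let total = Infinity
  while (start < total) {
    const response = await fetchFn(`${searchUrl}?start=${start}`)
    const data = await response.json()
    total = data.available
    items.push(...data.results)
    start += data.results.length
  }
  return items
}
```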

If this problem recurs in the future, here are a couple more ideas:

  1. The number of items more than 7 years old will only continue to grow; try processing them in date-range chunks using not only modifiedBefore but modifiedAfter as well.
  2. Use the node command-line flag --max-old-space-size to tell node to use more memory
  3. Split ret.js into a get.js script that simply streams JSON API data into an unprocessed file, then have ret.js stream the JSON items through Item to determine whether they should be deaccessioned and write the final items.json file.
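Idea 1 could be sketched as a helper that splits the retention window into date-range chunks; the modifiedAfter/modifiedBefore names come from the issue, while the chunk size and date formatting are assumptions:

```javascript
// Split [start, end) into chunks of `stepMonths` months, each expressed
// as a modifiedAfter/modifiedBefore pair of YYYY-MM-DD strings, so each
// chunk can be fetched and processed separately.
function dateRangeChunks(start, end, stepMonths = 12) {
  const chunks = []
  let from = new Date(start)
  while (from < end) {
    const to = new Date(from)
    to.setMonth(to.getMonth() + stepMonths)
    chunks.push({
      modifiedAfter: from.toISOString().slice(0, 10),
      modifiedBefore: (to < end ? to : end).toISOString().slice(0, 10),
    })
    from = to
  }
  return chunks
}
```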
@phette23 phette23 added the bug and javascript labels Sep 13, 2024
@phette23 phette23 reopened this Sep 13, 2024
@phette23 phette23 assigned phette23 and unassigned phette23 Sep 13, 2024
@phette23
Member Author

Any process that makes repeated requests to VAULT sees similar errors (e.g. course_lists does too), and I think it has to do with the application. I haven't found a solution other than spacing out our requests, which mitigates but does not eliminate the problem.
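The spacing mitigation can be sketched as a fixed delay between strictly sequential requests; the helper names and delay value are illustrative, not what course_lists actually uses:

```javascript
// Resolve after `ms` milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms))

// Fetch each URL in order, pausing `delayMs` between requests so the
// server (and our process) never sees a burst of concurrent connections.
async function fetchSpaced(urls, delayMs, fetchFn = fetch) {
  const responses = []
  for (const url of urls) {
    responses.push(await fetchFn(url))
    await sleep(delayMs)
  }
  return responses
}
```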
