Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] Out of bounds memory access #339

Open
rhuanbarreto opened this issue Sep 24, 2024 · 8 comments
Open

[Bug] Out of bounds memory access #339

rhuanbarreto opened this issue Sep 24, 2024 · 8 comments

Comments

@rhuanbarreto
Copy link

rhuanbarreto commented Sep 24, 2024

When running tests using pglite, i'm getting this error many times:

bun test v1.1.29 (6d43b366)

index.test.ts:

# Unhandled error between tests
-------------------------------
1 | import{b as Ge,d as se,e as Ce,f as We,g as X,h as G,i as ie,j as re}from"./chunk-SJVDOE3S.js";import{A as Q,B as Ae,C as $,D as Me,E as Y,F as Ve,G as Re,a as ze,b as je,c as ae,d as le,e as ce,f as de,g as ue,h as pe,i as fe,j as me,k as he,l as ye,m as ge,n as we,o as xe,p as be,q as C,r as ve,s as Ee,t as Pe,u as _e,v as Te,w as Fe,x as Se,y as W,z as De}from"./chunk-7M5G4I52.js";import"./chunk-TAZBUB4Z.js";import{a as ne,b as oe,c as Qe,d as $e}from"./chunk-3BCYDNSE.js";import{b as Le,c as qe,d as i,e as p,f,g as S,i as h}from"./chunk-Y3AVQXKT.js";h();h();h();var et=new Error("timeout while waiting for mutex to become available"),tt=new Error("mutex already locked"),Ke=new Error("request for lock canceled"),Xe=function(u,t,e,s){function d(r){return r instanceof e?r:new e(function(n){n(r)})}return new(e||(e=Promise))(function(r,n){function o(c){try{l(s.next(c))}catch(m){n(m)}}function a(c){try{l(s.throw(c))}catch(m){n(m)}}function l(c){c.done?r(c.value):d(c.value...

RuntimeError: Out of bounds memory access (evaluating 'this.mod._pg_initdb()')
      at postgres.wasm.wasm-function[sort_checkpoint_bufferids_med3]
      at postgres.wasm.wasm-function[sort_checkpoint_bufferids]
      at postgres.wasm.wasm-function[CheckPointBuffers]
      at postgres.wasm.wasm-function[CheckPointGuts]
      at postgres.wasm.wasm-function[CreateCheckPoint]
      at postgres.wasm.wasm-function[ShutdownXLOG]
      at postgres.wasm.wasm-function[pg_proc_exit]
      at postgres.wasm.wasm-function[pg_initdb]
      at /<redacted>/node_modules/@electric-sql/pglite/dist/index.js:1:10594
-------------------------------

I'm using PGlite v0.2.6 and Bun v1.1.29

This is flaky. Sometimes it works, sometimes doesn't. Any help on fixing this?

@samwillis
Copy link
Collaborator

Hey @rhuanbarreto

Thanks for the report. If you could narrow down the cause and produce a reproduction it would be super helpful.

If you want to dig in yourself, setting the debug level to 1 with PGlite.create({ debug: 1 }) will output a trace of what Postgres is doing internally. It usually gives a good indication of where these errors can occur.

@rhuanbarreto
Copy link
Author

Upgraded to v0.2.8 and enabled debug. Got a new related error (still happens at initdb):

pglite: no db
# WARNING: program "postgres" is needed by initdb but was not found in the same directory as "/tmp/pglite/bin/initdb"
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

# 3442
The database cluster will be initialized with this locale configuration:
  provider:    libc
  LC_COLLATE:  C
  LC_CTYPE:    C.UTF-8
  LC_MESSAGES: C
  LC_MONETARY: C
  LC_NUMERIC:  C
  LC_TIME:     C
# 2651
# 2705
# 2706
# 3444
The default text search configuration will be set to "english".
# 3446

Data page checksums are disabled.

# 3458
creating directory /tmp/pglite/base ... ok
creating subdirectories ... ok
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
# 3461: TODO: fsync_pgdata ?
# FIXING: int shmget (key_t __key=738, size_t __size=40, int __shmflg=1920) pagesize default=65536
# FIXING: void *shmat (int __shmid=666, const void *__shmaddr=0, int __shmflg=0)
# 360: InitPostgres(boot): /home/runner/work/pglite/pglite/postgresql-16.4/src/backend/bootstrap/bootstrap.c
bun test v1.1.29 (6d43b366)

tests/facilities.test.ts:
popen failure: Function not implemented
# popen["/tmp/pglite/bin/postgres" --boot -X 1048576  -F -c log_checkpoints=false  ] (BOOT)
# pg_pclose(/tmp/initdb.boot.txt) 129:../../src/include/pg_config_os.h
# popen["/tmp/pglite/bin/postgres" --single -F -O -j -c search_path=pg_catalog -c exit_on_error=true -c log_checkpoints=false  template1 >/dev/null] (SINGLE)
# pg_pclose(/tmp/initdb.single.txt) 129:../../src/include/pg_config_os.h
2024-09-30 10:33:13.246 GMT [42] DEBUG:  invoking IpcMemoryCreate(size=144228352)
2024-09-30 10:33:13.311 GMT [42] DEBUG:  dynamic shared memory system will support 224 segments
2024-09-30 10:33:13.341 GMT [42] DEBUG:  created dynamic shared memory control segment 3475055906 (5388 bytes)
2024-09-30 10:33:13.342 GMT [42] DEBUG:  transaction ID wrap limit is 2147483650, limited by database with OID 1
2024-09-30 10:33:13.342 GMT [42] DEBUG:  MultiXactId wrap limit is 2147483648, limited by database with OID 1
2024-09-30 10:33:13.342 GMT [42] DEBUG:  creating and filling new WAL file
2024-09-30 10:33:13.346 GMT [42] DEBUG:  done creating and filling new WAL file
2024-09-30 10:33:13.347 GMT [42] DEBUG:  InitPostgres
2024-09-30 10:33:13.347 GMT [42] NOTICE:  database system was shut down at 2024-09-30 10:33:13 GMT
2024-09-30 10:33:13.348 GMT [42] DEBUG:  checkpoint record is at 0/100028
2024-09-30 10:33:13.348 GMT [42] DEBUG:  redo record is at 0/100028; shutdown true
2024-09-30 10:33:13.348 GMT [42] DEBUG:  next transaction ID: 3; next OID: 10000
2024-09-30 10:33:13.348 GMT [42] DEBUG:  next MultiXactId: 1; next MultiXactOffset: 0
2024-09-30 10:33:13.348 GMT [42] DEBUG:  oldest unfrozen transaction ID: 3, in database 1
2024-09-30 10:33:13.348 GMT [42] DEBUG:  oldest MultiXactId: 1, in database 1
2024-09-30 10:33:13.348 GMT [42] DEBUG:  commit timestamp Xid oldest/newest: 0/0
2024-09-30 10:33:13.348 GMT [42] DEBUG:  transaction ID wrap limit is 2147483650, limited by database with OID 1
2024-09-30 10:33:13.348 GMT [42] DEBUG:  MultiXactId wrap limit is 2147483648, limited by database with OID 1
2024-09-30 10:33:13.348 GMT [42] DEBUG:  starting up replication slots
2024-09-30 10:33:13.348 GMT [42] DEBUG:  xmin required by slots: data 0, catalog 0
2024-09-30 10:33:13.350 GMT [42] DEBUG:  starting up replication origin progress state
2024-09-30 10:33:13.351 GMT [42] DEBUG:  reading stats file "pg_stat/pgstat.stat"
2024-09-30 10:33:13.351 GMT [42] DEBUG:  MultiXactId wrap limit is 2147483648, limited by database with OID 1
2024-09-30 10:33:13.351 GMT [42] DEBUG:  MultiXact member stop limit is now 4294914944 based on MultiXact 1

# Unhandled error between tests
-------------------------------
1 | import{b as Ge,d as se,e as Ce,f as We,g as X,h as G,i as ie,j as re}from"./chunk-SJVDOE3S.js";import{A as Q,B as Ae,C as $,D as Me,E as Y,F as Ve,G as Re,a as ze,b as je,c as ae,d as le,e as ce,f as de,g as ue,h as pe,i as fe,j as me,k as he,l as ye,m as ge,n as we,o as xe,p as be,q as C,r as ve,s as Ee,t as Pe,u as _e,v as Te,w as Fe,x as Se,y as W,z as De}from"./chunk-7M5G4I52.js";import"./chunk-TAZBUB4Z.js";import{a as ne,b as oe,c as Qe,d as $e}from"./chunk-3BCYDNSE.js";import{b as Le,c as qe,d as r,e as p,f,g as S,i as h}from"./chunk-Y3AVQXKT.js";h();h();h();var et=new Error("timeout while waiting for mutex to become available"),tt=new Error("mutex already locked"),Ke=new Error("request for lock canceled"),Xe=function(d,t,e,s){function c(n){return n instanceof e?n:new e(function(a){a(n)})}return new(e||(e=Promise))(function(n,a){function u(o){try{l(s.next(o))}catch(m){a(m)}}function i(o){try{l(s.throw(o))}catch(m){a(m)}}function l(o){o.done?n(o.value):c(o.value).then(u,i)}l((s=s.apply(d,t||[])).next())}

RuntimeError: call_indirect to a null table entry (evaluating 'this.mod._pg_initdb()')
      at postgres.wasm.wasm-function[InputFunctionCall]
      at postgres.wasm.wasm-function[OidInputFunctionCall]
      at postgres.wasm.wasm-function[InsertOneValue]
      at postgres.wasm.wasm-function[boot_yyparse]
      at postgres.wasm.wasm-function[BootstrapModeMain]
      at postgres.wasm.wasm-function[pg_initdb]
      at /home/vsts/work/r1/a/_typescript/node_modules/@electric-sql/pglite/dist/index.js:1:10623
-------------------------------

@pmp-p
Copy link
Collaborator

pmp-p commented Oct 2, 2024

Why does "bun test v1.1.29 (6d43b366)" appears in the initdb flow ? is something trying to run concurrently ?

i don't get where this block is coming from

bun test v1.1.29 (6d43b366)

tests/facilities.test.ts:
popen failure: Function not implemented

@jamesgpearce
Copy link
Contributor

jamesgpearce commented Oct 4, 2024

I just started getting this in my tests after upgrading from 0.2.7 to 0.2.10, using Jest.

Looks like 0.2.8 is where it starts failing. I can try to isolate more tomorrow...

    RuntimeError: memory access out of bounds

      at _emscripten_memcpy_bulkmem (wasm:/wasm/postgres.wasm-028403ca:1:7169865)
      at __memcpy (wasm:/wasm/postgres.wasm-028403ca:1:7169850)
      at __fwritex (wasm:/wasm/postgres.wasm-028403ca:1:7191391)
      at fwrite (wasm:/wasm/postgres.wasm-028403ca:1:7191480)
      at internal_putbytes (wasm:/wasm/postgres.wasm-028403ca:1:6002810)
      at socket_putmessage (wasm:/wasm/postgres.wasm-028403ca:1:6003857)
      at pq_endmessage (wasm:/wasm/postgres.wasm-028403ca:1:5951676)
      at EmitErrorReport (wasm:/wasm/postgres.wasm-028403ca:1:3898103)
      at errfinish (wasm:/wasm/postgres.wasm-028403ca:1:3890102)
      at ShutdownXLOG (wasm:/wasm/postgres.wasm-028403ca:1:1501253)
      at pg_proc_exit (wasm:/wasm/postgres.wasm-028403ca:1:1942531)
      at pg_shutdown (wasm:/wasm/postgres.wasm-028403ca:1:476636)
      at Object.Module._pg_shutdown (node_modules/@electric-sql/pglite/dist/postgres.js:9554:118)
      at H._pg_shutdown [as close] (node_modules/@electric-sql/pglite/src/pglite.ts:462:7)

UPDATE: I was closing the database too quickly and it hadn't completed unlistening to a channel. In 0.2.8 and above this seems important!

@samwillis
Copy link
Collaborator

@jamesgpearce

I was closing the database too quickly and it hadn't completed unlistening to a channel. In 0.2.8 and above this seems important!

A few versions before 0.2.8 had a bug that was resulting in a .close() not checkpointing the WAL. Postgres then always had to replay the WAL on start. We fixed that, but it may be something odd is happen as a result (we may have messed something)

Is the error happing on restart or after the close call?

I'm keen to reproduce this so we can sure it's prevented. Do you have a rough idea of the sequence that triggers it?

@rhuanbarreto
Copy link
Author

I'm working on checking if it's a race condition. We are using pglite in the test suite but we are not initializing the DB in the beforeAll phase. So maybe this is generating a race condition

@jamesgpearce
Copy link
Contributor

I'm keen to reproduce this so we can sure it's prevented. Do you have a rough idea of the sequence that triggers it?

I'm tempted to say I'm in the wrong! I didn't await the unlisten (the async handle that is returned from listen) before immediately closing. Awaiting that (or putting in a tiny sleep) was enough to prevent the OOB error.

@seveibar
Copy link

Hey everyone, I have a consistent reproduction of this issue on a repo, it could be a bit more simplified, but thought I would share.

I can only get this error on CI (ubuntu), can't get it on my macbook!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants