Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TDbCache::setValue collisions - TApplicationComponent::getClassFxEvents thrashing cache #940

Open
majuca opened this issue May 10, 2023 · 38 comments

Comments

@majuca
Copy link
Contributor

majuca commented May 10, 2023

HI, since the version 4.2.2, we have a lot error like this:

image

An idee?

@ctrlaltca
Copy link
Member

Possibly related to issue #820 and this change in PR #841: https://github.com/pradosoft/prado/pull/841/files#diff-e89e3dcbd35b5518b965e802e9aa3d2d3ecb051a7c6fe6feaafd0928c17bc277
Is the database mysql/innodb? You should be able to get some more information on the last deadlock with SHOW ENGINE INNODB STATUS;
Also, did you flush (wipe) the cache when upgrading prado version? prado-cli includes a "cache" action to perform the flush

@majuca
Copy link
Contributor Author

majuca commented May 10, 2023

Indeed, I would confirm that problem is related to the issue #820. If I change the application mode to Normal instead of Performance, it seems to be ok. Yes, I'am with mariadb/innodb 10.11.2.

The cache was removed completly.

Maybe the folloing code here in the function TApplicationComponent::getClassFxEvents:

if ($mode === TApplicationMode::Performance || isset(Prado::$classMap[$baseClassName])) {
$_classfx[$className] = $fx;
$cache->set(self::APP_COMPONENT_FX_CACHE, $_classfx);
}

@belisoful
Copy link
Member

belisoful commented May 10, 2023

It is difficult to debug that function. I have three comments:

  1. Getting this function right is critical.
  2. My apologies.
  3. When mode is Performance the cache should always be set because it's not found, so something else is going on

The whole point is to load all the class data once and put it in one place for loading once.

@belisoful
Copy link
Member

@majuca The issue is that the code "shouldn't" be getting to that point. The TApplicationSTatePersister should have had the GetClassFxEvents loaded from cache already. That code only runs once to load the cache, and not a second time, unless the cache is manually busted.

If that code is running multiple times (as it seems to be doing in your instance), then the classFxEvents are not being cached properly.

@belisoful
Copy link
Member

PS. The TApplicationStatePersister (being loaded) is in the Class map, and would set the cache regardless of being in performance mode. That's the weird part.

@belisoful
Copy link
Member

I think I got it. The scope of static $_classfx = null; is not correct. It should be outside the IF statement in the body of the function rather than in the if statement block. put static $_classfx = null; on line 54 rather than line 57.

When $_classfx[$className] = $fx; happens we are working on $_classfx out of scope.

@belisoful
Copy link
Member

The bug was there before #820, #820 unlocked the bug, like a Fortnite achievement award. lol.

@belisoful belisoful changed the title TDbCache TDbCache - getClassFxEvents thrashing cache May 10, 2023
belisoful added a commit to belisoful/prado that referenced this issue May 10, 2023
@majuca
Copy link
Contributor Author

majuca commented May 11, 2023

@belisoful Many thanks for your realy fast fix. I will test it today, I have several server where I can check this.

@belisoful
Copy link
Member

Mathematically (technically), this is the issue. and God bless you for having servers to test the correction.

@majuca
Copy link
Contributor Author

majuca commented May 11, 2023

Bad news, I applied the fix on one server and the error still happens, some examples:

#0 /var/www/html/vendor/pradosoft/prado/framework/Caching/TDbCache.php(552): Prado\Data\TDbCommand->execute()
#1 /var/www/html/vendor/pradosoft/prado/framework/Caching/TDbCache.php(495): Prado\Caching\TDbCache->deleteValue('46f3d3389421db0...')
#2 /var/www/html/vendor/pradosoft/prado/framework/Caching/TCache.php(155): Prado\Caching\TDbCache->setValue('46f3d3389421db0...', Array, 0)
#3 /var/www/html/vendor/pradosoft/prado/framework/TApplicationComponent.php(76): Prado\Caching\TCache->set('prado:applicati...', Array)
#4 /var/www/html/vendor/pradosoft/prado/framework/TComponent.php(503): Prado\TApplicationComponent->getClassFxEvents(Object(Prado\Exceptions\TErrorHandler))
#5 /var/www/html/vendor/pradosoft/prado/framework/TComponent.php(381): Prado\TComponent->listen()
#6 /var/www/html/vendor/pradosoft/prado/framework/TApplication.php(812): Prado\TComponent->__construct()
#7 /var/www/html/vendor/pradosoft/prado/framework/TApplication.php(1162): Prado\TApplication->getErrorHandler()
#8 /var/www/html/vendor/pradosoft/prado/framework/TApplication.php(390): Prado\TApplication->onError(Object(Prado\Exceptions\TDbException))
#9 /var/www/html/index.php(19): Prado\TApplication->run()
#10 {main}
#0 /var/www/html/vendor/pradosoft/prado/framework/Caching/TDbCache.php(552): Prado\Data\TDbCommand->execute()
#1 /var/www/html/vendor/pradosoft/prado/framework/Caching/TDbCache.php(495): Prado\Caching\TDbCache->deleteValue('46f3d3389421db0...')
#2 /var/www/html/vendor/pradosoft/prado/framework/Caching/TCache.php(155): Prado\Caching\TDbCache->setValue('46f3d3389421db0...', Array, 0)
#3 /var/www/html/vendor/pradosoft/prado/framework/TApplicationComponent.php(76): Prado\Caching\TCache->set('prado:applicati...', Array)
#4 /var/www/html/vendor/pradosoft/prado/framework/TComponent.php(503): Prado\TApplicationComponent->getClassFxEvents(Object(TWebSocketModule))
#5 /var/www/html/vendor/pradosoft/prado/framework/TComponent.php(381): Prado\TComponent->listen()
#6 /var/www/html/vendor/pradosoft/prado/framework/Prado.php(354): Prado\TComponent->__construct()
#7 /var/www/html/vendor/pradosoft/prado/framework/TApplication.php(982): Prado\Prado::createComponent('Application.mod...')
#8 /var/www/html/vendor/pradosoft/prado/framework/TApplication.php(1047): Prado\TApplication->internalLoadModule('webSocket')
#9 /var/www/html/vendor/pradosoft/prado/framework/TApplication.php(1072): Prado\TApplication->applyConfiguration(Object(Prado\TApplicationConfiguration), false)
#10 /var/www/html/vendor/pradosoft/prado/framework/TApplication.php(1072): Prado\TApplication->applyConfiguration(Object(Prado\TApplicationConfiguration), false)
#11 /var/www/html/vendor/pradosoft/prado/framework/TApplication.php(1100): Prado\TApplication->applyConfiguration(Object(Prado\TApplicationConfiguration), false)
#12 /var/www/html/vendor/pradosoft/prado/framework/TApplication.php(373): Prado\TApplication->initApplication()
#13 /var/www/html/index.php(19): Prado\TApplication->run()
#14 {main}
#0 /var/www/html/vendor/pradosoft/prado/framework/Caching/TDbCache.php(552): Prado\Data\TDbCommand->execute()
#1 /var/www/html/vendor/pradosoft/prado/framework/Caching/TDbCache.php(495): Prado\Caching\TDbCache->deleteValue('46f3d3389421db0...')
#2 /var/www/html/vendor/pradosoft/prado/framework/Caching/TCache.php(155): Prado\Caching\TDbCache->setValue('46f3d3389421db0...', Array, 0)
#3 /var/www/html/vendor/pradosoft/prado/framework/TApplicationComponent.php(76): Prado\Caching\TCache->set('prado:applicati...', Array)
#4 /var/www/html/vendor/pradosoft/prado/framework/TComponent.php(503): Prado\TApplicationComponent->getClassFxEvents(Object(Prado\TApplicationStatePersister))
#5 /var/www/html/vendor/pradosoft/prado/framework/TComponent.php(381): Prado\TComponent->listen()
#6 /var/www/html/vendor/pradosoft/prado/framework/TApplication.php(872): Prado\TComponent->__construct()
#7 /var/www/html/vendor/pradosoft/prado/framework/TApplication.php(469): Prado\TApplication->getApplicationStatePersister()
#8 /var/www/html/vendor/pradosoft/prado/framework/TApplication.php(1234): Prado\TApplication->loadGlobals()
#9 /var/www/html/vendor/pradosoft/prado/framework/TApplication.php(386): Prado\TApplication->onLoadState()
#10 /var/www/html/index.php(19): Prado\TApplication->run()
#11 {main}
#0 /var/www/html/vendor/pradosoft/prado/framework/Caching/TDbCache.php(552): Prado\Data\TDbCommand->execute()
#1 /var/www/html/vendor/pradosoft/prado/framework/Caching/TDbCache.php(495): Prado\Caching\TDbCache->deleteValue('46f3d3389421db0...')
#2 /var/www/html/vendor/pradosoft/prado/framework/Caching/TCache.php(155): Prado\Caching\TDbCache->setValue('46f3d3389421db0...', Array, 0)
#3 /var/www/html/vendor/pradosoft/prado/framework/TApplicationComponent.php(76): Prado\Caching\TCache->set('prado:applicati...', Array)
#4 /var/www/html/vendor/pradosoft/prado/framework/TComponent.php(503): Prado\TApplicationComponent->getClassFxEvents(Object(Prado\Web\TUrlManager))
#5 /var/www/html/vendor/pradosoft/prado/framework/TComponent.php(381): Prado\TComponent->listen()
#6 /var/www/html/vendor/pradosoft/prado/framework/Web/THttpRequest.php(324): Prado\TComponent->__construct()
#7 /var/www/html/vendor/pradosoft/prado/framework/Web/THttpRequest.php(762): Prado\Web\THttpRequest->getUrlManagerModule()
#8 /var/www/html/vendor/pradosoft/prado/framework/Web/THttpRequest.php(778): Prado\Web\THttpRequest->parseUrl()
#9 /var/www/html/vendor/pradosoft/prado/framework/TApplication.php(1103): Prado\Web\THttpRequest->resolveRequest(Array)
#10 /var/www/html/vendor/pradosoft/prado/framework/TApplication.php(373): Prado\TApplication->initApplication()
#11 /var/www/html/index.php(19): Prado\TApplication->run()
#12 {main}

@belisoful
Copy link
Member

belisoful commented May 11, 2023

It still seems that the cache is not getting set properly and is setting the cache much more than needed. I can imagine the first second of a web server going online, loading all its classes, and there being a collision here or maybe there, but after loading the first time, the code being executed should never be hit, especially in a performance mode.

That's the part that is boggling my mind. Are you running a shared DB cache system?

Could it be that the other instances are clobbering the cache causing the correction to not work properly? If other instances are incorrectly invalidating the shared DB cache, the new code is going to have the same issue.

@belisoful
Copy link
Member

I can see how your situation would happen in a shared DB cache environment where other instances are not behaving nicely like the new fix. The other misbehaving servers will thrash the cache despite the fix.

In such a case, the solution is to deploy the micro-fix to all servers in the shared environment. It'll speed things up too.

@belisoful
Copy link
Member

such suspense... lol. 😅 (j/k. more accurately: 🕵️)

@majuca
Copy link
Contributor Author

majuca commented May 15, 2023

I doing some tests this week, last end of wee was very busy. But, in our application, the database is not shared. But when this problem happens, it seams on servers where we have a lot of request by seconds.

@belisoful
Copy link
Member

Especially in performance mode, that code shouldn't be touched after initial load. So somehow, the cache is still being thrashed.

@majuca
Copy link
Contributor Author

majuca commented May 15, 2023

Mmm, if I put a debug in the function getClassFxEvents, this one is not called only one time, or perhaps I misunderstood your remark concerning the initial load. Indeed, each time a new class is used (like a TPage), this the APP_COMPONENT_FX_CACHE cache is updated.

I would like to say that the problem is coming from the function setValue of TDbCache. Indeed, this function is not thread safe (atomic):

protected function setValue($key, $value, $expire)
{
   $this->deleteValue($key);
   return $this->addValue($key, $value, $expire);
}

As we delete the value and create it just after, two threads could have conflict and specialy if the two thread started a transaction. This explain the message :

TDbCommand failed to execute the SQL statement "DELETE FROM pradocache WHERE itemkey=:key": SQLSTATE[40001]: Serialization failure: 1213 Deadlock found when trying to get lock; try restarting transaction

To have the function setValue as an atomic function, I would propose the following code:

protected function setValue($key, $value, $expire)
{
        $isCurrentTransaction = $this->getDbConnection()->getCurrentTransaction();
        $transaction = $this->getDbConnection()->getCurrentTransaction() ?? $this->getDbConnection()->beginTransaction();

 	$this->deleteValue($key);
	$res = $this->addValue($key, $value, $expire);

        if (!$isCurrentTransaction) {
            $transaction->commit();
        }

        return $res;
}

I currently trying this code on one server to see if it is better.

@belisoful
Copy link
Member

belisoful commented May 15, 2023

There is SQL for adding or updating a key if it exists, which may be a better route than transaction with deleting then re-adding. The "Add but update if exists" may be more efficient than a transaction as I believe that would be atomic.

What I am saying is that the first half of getClassFxEvents should be called with each new instance. but once the cache is loaded, the second half shouldn't be run, ever, especially in performance mode. your code keep running the second half and, again, it shouldn't be running the second half once you are cached and loaded. Somehow, your cache is getting busted... leading to "setValue" when it shouldn't be called.

the fact that you are even having a "setValue" is the problem. That it isn't atomic is a second problem.

To explain further, the first half gets the data, but all the data should be cached and shouldn't have to setValue any more. In performance mode, it shouldn't get to that point once the cache is loaded.

I hope this isn't lost in translation.

@belisoful
Copy link
Member

I think the non-transaction atomic SQL we are looking for is INSERT INTO TABLE (id, name, age) VALUES (1, "A", 19) ON DUPLICATE KEY UPDATE name = VALUES (name), ...

@belisoful
Copy link
Member

belisoful commented May 15, 2023

Put simply.... if you are getting a a butch of requests at the same time doing a "setValue" (and even remotely possibly colliding) then the caching isn't working right!!!! SetValue is the least of the problems.

@majuca
Copy link
Contributor Author

majuca commented May 15, 2023

Sure, you are right, your sql INSERT is for sure better than a SQL transaction and should fix the atomic problem of the funtion setValue.

And concerning the function getClassFxEvents, for me is it now ok and the only problem is see is on the setValue.

@belisoful
Copy link
Member

Thank you for that. The getClassFxEvents working right was keeping me up.

@belisoful belisoful changed the title TDbCache - getClassFxEvents thrashing cache TDbCache::setValue collisions - TApplicationComponent::getClassFxEvents thrashing cache May 16, 2023
@belisoful
Copy link
Member

belisoful commented May 16, 2023

This should work for you:

protected function setValue($key, $value, $expire)
	{
		if (!$this->_cacheInitialized) {
			$this->initializeCache();
		}
		$expire = ($expire <= 0) ? 0 : time() + $expire;
		$sql = "INSERT INTO {$this->_cacheTable} (itemkey,value,expire) VALUES (:key,:value,$expire) ON DUPLICATE KEY UPDATE value=VALUES(value), expire=VALUES(expire)";
		$command = $this->getDbConnection()->createCommand($sql);
		$command->bindValue(':key', $key, \PDO::PARAM_STR);
		$command->bindValue(':value', serialize($value), \PDO::PARAM_LOB);

		try {
			$command->execute();
			return true;
		} catch (\Exception $e) {
			try {
				$this->initializeCache(true);
				$command->execute();
				return true;
			} catch (\Exception $e) {
				return false;
			}
		}
	}

NM on the addition stuff.

belisoful added a commit to belisoful/prado that referenced this issue May 16, 2023
This is a better implementation of getClassFXEvents.  it shaves a few milliseconds from the testing time.  woot.
belisoful added a commit to belisoful/prado that referenced this issue May 16, 2023
@belisoful
Copy link
Member

The getClassFxEvents still isn't working quite right because TDBCache isn't loaded for most of its use. It's used mainly in loading modules (and their global fx event handlers; speeding it up) but TDBCache is a module that isn't installed until after the modules are loaded.

As such, getClassFXEvents shouldn't be using TDBCache. So this update to TDBCache is important for the DB Cache to work correctly in heavily concurrent environments, but it will be removed from this context. The TDBCache won't be any further issue in this function because it's being totally removed.

The solution for getClassFXEvents is to do an exclusive lock write to a new FXEvents cache file in the "runtime" folder. When loading, it reads that file. If there are any new classes to be cached by any instance, they do an exclusive lock write to the file. If the file cannot be locked, the writing is (properly) skipped. this will stop any concurrency issues. This way, the cache is built and restored immediately and properly without TDBCache.

I have a new function for setValue that takes into account most things. This SQL has been tested in SQLite3 and simulated Oracle and MSSQL (using ChatGPT as if it were an Oracle and MSSQL database). It also includes your proposed transaction fix for any unsupported DBs. Thank you for that.

	protected function setValue($key, $value, $expire)
	{
		if (!$this->_cacheInitialized) {
			$this->initializeCache();
		}
		$db = $this->getDbConnection();
		$driver = $db->getDriverName();
		if (in_array($driver, ['mysql', 'mysqli', 'sqlite', 'ibm', 'oci', 'sqlsrv', 'mssql', 'dblib', 'pgsql'])) {
			$expire = ($expire <= 0) ? 0 : time() + $expire;
			if (in_array($driver, ['mysql', 'mysqli', 'sqlite']))
				$sql = "REPLACE INTO {$this->_cacheTable} (itemkey,value,expire) VALUES (:key,:value,$expire)";
			elseif ($driver === 'pgsql') {
				$sql = "INSERT INTO {$this->_cacheTable} (itemkey, value, expire) VALUES (:key, :value, :expire) ".
					"ON CONFLICT (itemkey) DO UPDATE SET value = EXCLUDED.value, expire = EXCLUDED.expire";
			} else {
				$sql = "MERGE INTO {$this->_cacheTable} AS c " .
				"USING (SELECT :key AS itemkey, :value AS value, $expire AS expire) AS data " .
				"ON c.itemkey = data.itemkey " .
				"WHEN MATCHED THEN " .
					"UPDATE SET c.value = data.value, c.expire = data.expire " .
				"WHEN NOT MATCHED THEN " .
					"INSERT (itemkey, value, expire) " .
					"VALUES (data.itemkey, data.value, data.expire)";
			}
			$command = $db->createCommand($sql);
			$command->bindValue(':key', $key, \PDO::PARAM_STR);
			$command->bindValue(':value', serialize($value), \PDO::PARAM_LOB);
	
			try {
				$command->execute();
				return true;
			} catch (\Exception $e) {
				try {
					$this->initializeCache(true);
					$command->execute();
					return true;
				} catch (\Exception $e) {
					return false;
				}
			}
		} else {
			$isCurrentTransaction = $this->getDbConnection()->getCurrentTransaction();
			$transaction = $this->getDbConnection()->getCurrentTransaction() ?? $this->getDbConnection()->beginTransaction();
			
			$this->deleteValue($key);
			$return = $this->addValue($key, $value, $expire);
			
			if (!$isCurrentTransaction) {
				$transaction->commit();
			}
			
			return $return;
		}
	}

@belisoful
Copy link
Member

Alright. I have a new getClassFxEvents method that doesn't use TDBCache. So the DB collisions won't be a problem at that point any more. In essence, due to TDBCache being a Module, it wasn't loading the cache properly for when the modules load... which is the point this method is designed to alleviate.

TDBCache loads too late.

This uses a file to store the cache as the cache is loaded and from then on its only read only, so no further writes when things are loaded up. Normal mode only stores the events of the framework, and performance mode stores the events of all classes.

Regarding TDBCache, that isn't a problem but it still should be updated to handle concurrency better. I have included an interesting solution: Driver specific SQL for 3 different kinds of systems and a fallback to @majuca beautifully simple transaction based "set" (delete/insert, "upsert").

(I mean "simple" in the mathematical simple solutions that solve things like E=mc^2; where simplicity is the beauty of the thing)

Others have run into this problem of "universal DB Set row" and reason why I think we should include a good solution in PRADO.

Regarding testing the Oracle, DB2, MSSQL, and Postgres SQL.... Interestingly, I had ChatGPT pretend to be these databases in turn, "...accept only SQL statements that the DB would accept, and output only what the DB would output. Do not output anything but what the DB would output"

I then entered the DB specific code (with prep work for the db having data to "set"). I tested with both existing data (update) and new data (insert). It worked properly. I believe these to be valid tests, I entered the SQL for the other DB systems and only got errors... as expected.

I think the new code on both fronts will resolve your issues. I didn't expect this bug to expose the faults of getClassFXEvents. We found the faults, and the PR should fix it.

@majuca
Copy link
Contributor Author

majuca commented May 17, 2023

Nice solution.

I could try it on a server having a lot of requests and give you a feed back.

@belisoful
Copy link
Member

belisoful commented May 17, 2023

Indeed. That would be lovely. This should speed things up a little bit too.

ctrlaltca pushed a commit that referenced this issue May 17, 2023
…Value uses atomic SQL (#952)

* #940 TApplicationComponent::getClassFxEvents

This is a better implementation of getClassFXEvents.  it shaves a few milliseconds from the testing time.  woot.

* #940 TDBCache::setValue uses Atomic SQL "INSERT... ON DUPLICATE"

* TDBCache::setValue uses "REPLACE" rather than "INSERT INTO  ON DUPLICATE KEY UPDATE"

* shortened the sql

* TDBCache::setValue selects on database type for SQL, fallback to transaction delete/add

And corrected a "/" into "DIRECTORY_SEPARATOR" in TApplicationStatePersister

* TApplicationComponent::getClassFxEvents uses runtime file cache

Rather than waiting for TDBCache module to load, this uses a runtime file cache to store the fxEvents.

* remove testing

* getClassFXEvents doesn't cache in Debug Mode

* removed cleanup that doesn't need to happen for Debug mode

* no change to the ticket 589test
@belisoful
Copy link
Member

No news is good news.

@belisoful
Copy link
Member

Are we ready to close this?

@belisoful
Copy link
Member

Also, I'd like to reiterate that this bug is severe enough and the fix important enough as to maybe warrant releasing 4.2.3 ASAP (sooner rather than later).

I think it's also important to get closures as PRADO event handlers out there too.

@majuca
Copy link
Contributor Author

majuca commented Sep 12, 2023

Sorry for my delay, I was not able to give you a feedback these last months for personal reasons. I retry the patch this week and give you a feedback at least friday.

@belisoful
Copy link
Member

@majuca I hope things are well for you and/or moving in the right direction.

@majuca
Copy link
Contributor Author

majuca commented Sep 24, 2023

I try the patch on two server with a lot of traffic. I does't have any problem or error this last days. I would confirm that it is working as expected.

@majuca
Copy link
Contributor Author

majuca commented Oct 25, 2023

Could we imagine making a new version of PRADO with just this one change? The difference is truly incredible.

@belisoful
Copy link
Member

That's what I was suggesting.... 4.2.3 w/ all the event updates 'n stuff. I saw immediately how critical this update was when I put it in place.

@ctrlaltca
Copy link
Member

I can prepare a release, but since we bumped up requirements from php 7.4 to php 8.1 I'd go directly for a 4.3.0 if you don't mind

@ctrlaltca
Copy link
Member

Done, 4.3.0 is released. Updated prado-demos, prado-app, the pradoframework.net and api docs websites.

@belisoful
Copy link
Member

yeeeeeesssss..... I entirely agree. and thank you. I hope people start contributing PRADO extensions in composer/packagist. That's a huge one. It was such a small change, with two functions, to unlock so much potential.

@majuca
Copy link
Contributor Author

majuca commented Oct 26, 2023

Good news. A big thank you to both of you for the incredible job you're doing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants