Primary and secondary client indexes and client-side transaction validation #314
base: main
Conversation
Added some commits with further optimizations:
Before:
After:
The performance difference is still 20%-30%, although the absolute numbers are much more digestible now: since little more than transaction code runs on transact, the difference persists, as both the client side and the server side benefit from the same improvements. The server side, though, benefits from a deep copy implementation of models that would normally not be available to it.
Force-pushed from 582489f to 9640378
Some more performance improvements:
Results show a 10% to 25% difference, but there is IO involved; looking at the profile, the numbers are closer to 5%.
Note: we are using the bridge model, which has a 4096-element int array that relatively hurts performance and is not particularly representative of what we would find in ovn-k.
Force-pushed from 266ce31 to 4158e94
Index duplication is a commit constraint and should be reported as an additional operation result. Verification should be deferred until all operations have been processed and can be taken into account.

From the RFC:

If "indexes" is specified, it must be an array of zero or more <column-set>s. A <column-set> is an array of one or more strings, each of which names a column. Each <column-set> is a set of columns whose values, taken together within any given row, must be unique within the table. This is a "deferred" constraint, enforced only at transaction commit time, after unreferenced rows are deleted and dangling weak references are removed. Ephemeral columns may not be part of indexes.

and:

if all of the operations succeed, but the results cannot be committed, then "result" will have one more element than "params", with the additional element being an <error>.

Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
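A minimal sketch of that deferred check, assuming illustrative `Row`, `OperationResult` and `checkIndexes` names rather than the types actually used in this PR: all operations are applied first, and only then is each <column-set> checked, with a violation reported as the single extra element of the result array.

```go
package main

import "fmt"

// Row is a minimal stand-in for an OVSDB row.
type Row map[string]interface{}

// OperationResult mirrors the shape of an ovsdb operation result.
type OperationResult struct {
	Error   string
	Details string
}

// checkIndexes runs only after every operation has been applied to the staged
// rows. If any <column-set> is duplicated, a single extra error result is
// appended, i.e. "result" ends up with one more element than "params".
func checkIndexes(rows []Row, indexes [][]string, results []OperationResult) []OperationResult {
	for _, columnSet := range indexes {
		seen := map[string]bool{}
		for _, row := range rows {
			key := ""
			for _, column := range columnSet {
				key += fmt.Sprintf("%v|", row[column])
			}
			if seen[key] {
				return append(results, OperationResult{
					Error:   "constraint violation",
					Details: fmt.Sprintf("duplicate value for index %v", columnSet),
				})
			}
			seen[key] = true
		}
	}
	return results
}

func main() {
	rows := []Row{{"name": "br-int"}, {"name": "br-int"}}
	fmt.Println(checkIndexes(rows, [][]string{{"name"}}, nil))
}
```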
To be able to reuse it in the client without introducing circular dependencies. Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
Adds a client option that enables transaction verification. When transaction verification is enabled, transactions are serialized in the client. This avoids verifying a transaction while other transactions are in flight, ensuring the consistency of such verification. The verification is done by running the transaction in a similar way to how it is done server-side, but on a throw-away database built on top of the client cache. The primary use case for this verification is to check for duplicate indexes client-side instead of server-side, and specifically client indexes, which cannot be checked server-side, although support for this specific case is introduced in a later commit. Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
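A rough sketch of the flow this commit describes, under assumed names (`snapshot`, `applyOps`, `transact`); the actual implementation reuses the server-side transaction code rather than the toy `applyOps` below.

```go
package sketch

import "sync"

// operation, database and applyOps are illustrative stand-ins only.
type operation struct{}

type database struct {
	// table -> row UUID -> row
	tables map[string]map[string]map[string]interface{}
}

// snapshot builds the throw-away database on top of the client cache.
func snapshot(cache *database) *database {
	cp := &database{tables: map[string]map[string]map[string]interface{}{}}
	for table, rows := range cache.tables {
		cp.tables[table] = map[string]map[string]interface{}{}
		for uuid, row := range rows {
			cp.tables[table][uuid] = row // rows are never mutated in place
		}
	}
	return cp
}

func applyOps(db *database, ops []operation) error {
	// ... server-like processing of the operations against db ...
	return nil
}

type client struct {
	mu    sync.Mutex // serializes transactions while verification is enabled
	cache *database
}

func (c *client) transact(ops ...operation) error {
	c.mu.Lock()
	defer c.mu.Unlock()
	if err := applyOps(snapshot(c.cache), ops); err != nil {
		return err // e.g. a duplicate index, caught before the RPC is sent
	}
	// ... send the operations to the server as usual ...
	return nil
}
```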
Introduces two types of client indexes: primary and secondary. Primary indexes are verified not to be duplicated at transaction verification, when enabled; otherwise there is no difference between them. Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
So that we have DeepCopy of models available and the benchmarks produce more realistic numbers. Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
Creating the event channel has a performance cost because it is buffered with a high capacity. There is no use for it in some situations, like a transaction or database cache, where the event processor won't be run. Delay initialization of the event channel until the event processor actually runs. Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
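A minimal sketch of the lazy-initialization idea, with an illustrative `eventProcessor` type and buffer size; the real event processor's fields and capacity may differ.

```go
package sketch

import "sync"

// bufferSize stands in for the event channel's high capacity; the value is
// illustrative.
const bufferSize = 16384

type eventProcessor struct {
	once   sync.Once
	events chan interface{}
}

// eventChannel allocates the buffered channel only on first use, so callers
// that never run the processor (e.g. a transaction or database cache) do not
// pay for the allocation at all.
func (e *eventProcessor) eventChannel() chan interface{} {
	e.once.Do(func() {
		e.events = make(chan interface{}, bufferSize)
	})
	return e.events
}
```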
We can run the transaction on shallow model copies of its own cache, as it never modifies them directly; it uses tableUpdates instead. We can do the same for the temporary database of the validating client transaction. Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
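A small sketch of why a shallow copy is enough here, using a hypothetical `shallowTable` helper: only the map is duplicated, while the model values are shared, which is safe as long as the transaction never mutates them in place.

```go
package sketch

// Model stands in for a generated database model.
type Model interface{ GetUUID() string }

// shallowTable copies the UUID-to-model map while sharing the model values
// themselves. This is safe only because the transaction treats the models as
// read-only and records changes in tableUpdates instead of mutating rows.
func shallowTable(src map[string]Model) map[string]Model {
	dst := make(map[string]Model, len(src))
	for uuid, m := range src {
		dst[uuid] = m
	}
	return dst
}
```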
Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
Force-pushed from 5f6ce60 to bd57ef7
/hold
Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
For updates and mutates, the difference for each column is calculated and set in the Modify row of an update2 notification. Calculating this difference has a performance cost when the column is a set with thousands of unordered values. While calculating the Modify row is a spec requirement over the wire, internally updates can be handled through the Old & New rows, and a transaction that is specific to validation purposes can take advantage of this. Make the cache capable of processing an update2 with a nil Modify and non-nil Old & New as if it were a normal update notification, and avoid setting Modify for updates on validating transactions. Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
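A simplified sketch of the described cache behaviour, with stand-in `Row` and `RowUpdate2` types: a nil Modify with non-nil Old & New is treated like a plain update, while the Modify branch below is deliberately simplified (the real cache applies set differences rather than overwriting columns wholesale).

```go
package sketch

// Row and RowUpdate2 are simplified stand-ins for the OVSDB types.
type Row map[string]interface{}

type RowUpdate2 struct {
	Old    *Row
	New    *Row
	Modify *Row
}

// applyUpdate shows the branching only: with a nil Modify and non-nil Old and
// New, the update is applied like a plain update notification, skipping the
// per-column difference entirely (the expensive part for large sets).
func applyUpdate(current Row, u RowUpdate2) Row {
	if u.Modify == nil && u.New != nil {
		return *u.New
	}
	for column, value := range *u.Modify {
		current[column] = value // simplified merge of the Modify columns
	}
	return current
}
```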
Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
This PR introduces two different types of client indexes, primary and secondary,
and an option to enable transaction validation.
Primary indexes are verified to be unique at transaction validation, in the context
of that specific client instance (cache), when validation is enabled; otherwise there
is no difference between the two types of indexes.
When transaction validation is enabled, transactions are serialized in
the client. This avoids validating a transaction while other
transactions are in flight, ensuring the consistency of the
validation.
The validation is done by running the transaction in a similar way to how it
is done server-side, but on a throw-away database built on top of the
client cache.
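A hypothetical usage sketch only; the `IndexKind`, `ClientIndex` and option names below are placeholders and may not match the API this PR actually introduces. It is meant to show the intended difference between the two index kinds when validation is enabled.

```go
package sketch

// IndexKind and ClientIndex are placeholder types for illustration.
type IndexKind int

const (
	Primary   IndexKind = iota // uniqueness checked at client-side validation
	Secondary                  // lookup only, never checked for duplicates
)

type ClientIndex struct {
	Columns []string
	Kind    IndexKind
}

// Example: declare a unique "name" index and a non-unique "datapath_type"
// index for the Bridge table. With something like a WithTransactionValidation
// client option enabled, a transaction inserting a second Bridge named
// "br-int" would fail client-side before ever reaching the server.
var bridgeIndexes = map[string][]ClientIndex{
	"Bridge": {
		{Columns: []string{"name"}, Kind: Primary},
		{Columns: []string{"datapath_type"}, Kind: Secondary},
	},
}
```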
Topics for discussion:
Transaction validation has a performance cost. Specifically, checking
indexes needs to account for them being updated throughout the transaction, via
updates, mutates, or deletes, and the check needs to be deferred until all operations are processed.
Processing the transaction as it is done in the server, and reusing that code, is the most natural approach.
Measuring with one of the added benchmarks shows a 5% to 20% performance loss.
These benchmarks do incur IO, which makes drawing conclusions difficult. Profiling consistently shows an extra 4%-5% of total time spent:
Note: we are using the bridge model, which has a 4096-element int array that relatively hurts performance and is not particularly representative of what we would find in ovn-k.
It remains to be checked how this really affects specific use cases like ovn-k.
Ways to go forward:
We currently don't support garbage collecting strongly referenced rows that are no
longer referenced, and we don't correctly consider that on validation. Regarding indexes
and client-side validation, the outcome is not too bad: validation may error when it should not, but it
will not lead to any undesired index duplicates. The workarounds would be not to use validation, to explicitly
remove strongly referenced rows in the transaction, or to create a row with the same index in a separate
transaction from the one that removes the strongly referenced row.