Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Primary and secondary client indexes and client-side transaction validation #314

Open
wants to merge 18 commits into
base: main
Choose a base branch
from

Conversation

jcaamano
Copy link
Collaborator

@jcaamano jcaamano commented Jun 1, 2022

This PR introduces two different types of client indexes, primary and secondary,
and an option to enable transaction validation.

Primary indexes are verified to be unique at transaction validation in the context
of that specific client instance (cache) when validation is enabled, otherwise there
is no difference between both types of indexes.

When transaction validation is enabled, transactions are serialized in
the client. This avoids the validation of a transaction when other
transactions are in flight ensuring the consistency of such
validation.

The validation is done by running the transaction in a similar way it
is done server-side but on a throw-away database built on top of the
client-cache.

Topics for discussion:

  • Transaction validation has a performance cost and is not free. Specifically checking
    indexes needs to account for them being updated throughout the transaction, either via
    updates, mutates or deletes; and the check needs to be deferred until all operations are processed.
    Processing the transaction as it is done in the server and reusing that code is the more natural approach.

    Measuring with one of the added benchmarks shows from 5% to 20% performance loss.

    cpu: Intel Core Processor (Skylake, IBRS)
    BenchmarkClientValidateTransaction/1_update_ops_with_validating_client-4                    1524            707144 ns/op          691235 B/op       1361 allocs/op
    BenchmarkClientValidateTransaction/1_update_ops_with_non_validating_client-4                1638            614031 ns/op          558956 B/op       1211 allocs/op
    BenchmarkClientValidateTransaction/10_update_ops_with_validating_client-4                    385           3191497 ns/op         6568366 B/op       6422 allocs/op
    BenchmarkClientValidateTransaction/10_update_ops_with_non_validating_client-4                462           2535916 ns/op         5276034 B/op       5350 allocs/op
    BenchmarkClientValidateTransaction/100_update_ops_with_validating_client-4                    48          25087192 ns/op        66203392 B/op      56964 allocs/op
    BenchmarkClientValidateTransaction/100_update_ops_with_non_validating_client-4                57          20474521 ns/op        53091101 B/op      46411 allocs/op
    

    These benchmark tests do incur in IO that make drawing conclusions difficult. Profiling consistently shows a total extra of 4%-5% time spent:
    Alt text

    note: we are using the bridge model which has a 4096 int sized array which relatively hurts performance and is not particularly representative of what we would find in ovn-k.

    Pending to check how this really affects specific use cases like ovn-k.

    Ways to go forward:

    • The PR already includes several optimizations but there could be chances for more. I guess this was never a concern since the server is only used for testing and there might be room of improvement.
    • Instead of enabling validation globally, let the user decide when to do it on a per-transaction basis.
    • Compromise & only validate operations over tables with client indexes.
    • Look for different implementation approaches.
  • We currently don't support garbage collecting strongly referenced rows that are not
    referenced any more and we don't correctly consider that on validation. Regarding indexes
    and client side validation, the outcome is not too bad as it will error when it should not but
    will not lead to any undesired index dup. The workarounds would be not use validation, explicitly
    remove strong referenced rows in the transaction, or use a separate transaction than the one
    that removes the strongly referenced row to create a similar one with the same index.

@coveralls
Copy link

coveralls commented Jun 1, 2022

Pull Request Test Coverage Report for Build 2534420969

  • 508 of 1286 (39.5%) changed or added relevant lines in 13 files are covered.
  • 3 unchanged lines in 2 files lost coverage.
  • Overall coverage decreased (-6.2%) to 68.58%

Changes Missing Coverage Covered Lines Changed/Added Lines %
model/client.go 0 1 0.0%
ovsdb/bindings.go 9 10 90.0%
mapper/mapper.go 21 28 75.0%
server/server.go 14 23 60.87%
client/transact.go 73 97 75.26%
cache/cache.go 39 70 55.71%
database/transaction.go 103 147 70.07%
test/test_model.go 226 887 25.48%
Files with Coverage Reduction New Missed Lines %
ovsdb/schema.go 1 87.24%
client/client.go 2 70.74%
Totals Coverage Status
Change from base Build 2436166898: -6.2%
Covered Lines: 5232
Relevant Lines: 7629

💛 - Coveralls

@jcaamano
Copy link
Collaborator Author

jcaamano commented Jun 1, 2022

Added some commits with further optimizations:

  • Make sure we have model clone available in test benchmarks
  • Don't allocate the even channel unless it is going to be used
  • Run the transaction on shallow model copies
  • Done clone replaced model on update

Before:

cpu: Intel Core Processor (Skylake, IBRS)
BenchmarkClientValidateTransaction/1_update_ops_with_validating_client-4         	      97	  12235776 ns/op	12775774 B/op	   18223 allocs/op
BenchmarkClientValidateTransaction/1_update_ops_with_non_validating_client-4     	     115	   8840098 ns/op	 7041598 B/op	    9751 allocs/op
BenchmarkClientValidateTransaction/10_update_ops_with_validating_client-4        	      10	 103985626 ns/op	53229720 B/op	  175543 allocs/op
BenchmarkClientValidateTransaction/10_update_ops_with_non_validating_client-4    	      13	  81000966 ns/op	33356238 B/op	   91082 allocs/op
BenchmarkClientValidateTransaction/100_update_ops_with_validating_client-4       	       1	1745260207 ns/op	548136664 B/op	 1783419 allocs/op
BenchmarkClientValidateTransaction/100_update_ops_with_non_validating_client-4   	       1	1452035366 ns/op	378110136 B/op	  936796 allocs/op
PASS
ok  	github.com/ovn-org/libovsdb/client	38.376s

After

cpu: Intel Core Processor (Skylake, IBRS)
BenchmarkClientValidateTransaction/1_update_ops_with_validating_client-4         	    1906	    684957 ns/op	  696900 B/op	    1403 allocs/op
BenchmarkClientValidateTransaction/1_update_ops_with_non_validating_client-4     	    2040	    574269 ns/op	  562441 B/op	    1250 allocs/op
BenchmarkClientValidateTransaction/10_update_ops_with_validating_client-4        	     364	   3169125 ns/op	 6644786 B/op	    6971 allocs/op
BenchmarkClientValidateTransaction/10_update_ops_with_non_validating_client-4    	     460	   2584837 ns/op	 5320557 B/op	    5790 allocs/op
BenchmarkClientValidateTransaction/100_update_ops_with_validating_client-4       	      46	  25111775 ns/op	67033054 B/op	   62515 allocs/op
BenchmarkClientValidateTransaction/100_update_ops_with_non_validating_client-4   	      57	  20995812 ns/op	53555582 B/op	   50825 allocs/op
PASS
ok  	github.com/ovn-org/libovsdb/client	11.135s

The performance difference is still 20%-30% although absolute numbers are much more digestible now: since nothing much more than transaction code is happening on transact, the difference sticks as both client-side and server-side benefit for the same improvements. The server side though is benefiting for a deep copy implementation of models that would normally not be available to it.

@jcaamano jcaamano force-pushed the verify-indexes branch 4 times, most recently from 582489f to 9640378 Compare June 2, 2022 00:25
@jcaamano
Copy link
Collaborator Author

jcaamano commented Jun 2, 2022

Some more performance improvements:

cpu: Intel Core Processor (Skylake, IBRS)
BenchmarkClientValidateTransaction/1_update_ops_with_validating_client-4         	    1911	    749220 ns/op	  697705 B/op	    1405 allocs/op
BenchmarkClientValidateTransaction/1_update_ops_with_non_validating_client-4     	    1995	    588297 ns/op	  562736 B/op	    1250 allocs/op
BenchmarkClientValidateTransaction/10_update_ops_with_validating_client-4        	     340	   3289901 ns/op	 6640095 B/op	    6848 allocs/op
BenchmarkClientValidateTransaction/10_update_ops_with_non_validating_client-4    	     411	   2671668 ns/op	 5317990 B/op	    5724 allocs/op
BenchmarkClientValidateTransaction/100_update_ops_with_validating_client-4       	      42	  25968994 ns/op	67051895 B/op	   61211 allocs/op
BenchmarkClientValidateTransaction/100_update_ops_with_non_validating_client-4   	      55	  20992106 ns/op	53539775 B/op	   50109 allocs/op
PASS
ok  	github.com/ovn-org/libovsdb/client	11.294s

Results show 10% to 25%. There is IO involved. Looking at the profile, numbers are closer to 5%.

Alt text

@jcaamano
Copy link
Collaborator Author

jcaamano commented Jun 2, 2022

note: we are using the bridge model which as a 4096 int sized array which relatively hurts performance and is not particularly representative of what we would find in ovn-k.

@jcaamano jcaamano force-pushed the verify-indexes branch 3 times, most recently from 266ce31 to 4158e94 Compare June 2, 2022 13:12
jcaamano added 6 commits June 6, 2022 10:27
Index duplication is a commit constraint and should be reported as an
additional operation result. Verification should be deferred until all
operations have been processed and can be taken into account.

From RFC:

   If "indexes" is specified, it must be an array of zero or more
      <column-set>s.  A <column-set> is an array of one or more strings,
      each of which names a column.  Each <column-set> is a set of
      columns whose values, taken together within any given row, must be
      unique within the table.  This is a "deferred" constraint,
      enforced only at transaction commit time, after unreferenced rows
      are deleted and dangling weak references are removed.  Ephemeral
      columns may not be part of indexes.

and:

   if all of the operations succeed, but the results cannot
   be committed, then "result" will have one more element than "params",
   with the additional element being an <error>.

Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
To be able to reuse it in the client without incurring in circular
dependencies.

Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
Adds a client option that enables transaction verification.

When transaction verification is enabled, transactions are serialized in
the client. This avoids the verification of a transaction when other
transactions are on flight ensuring the consistency of such
verification.

The verification is done by running the transaction in a similar way it
is done server-side but on a throw-away database built on top of the
client-cache.

The primary use case for this verification is to check for duplicate
indexes client-side instead of server-side and specifically client indexes
which can not be checked server-side. Although support for this specific
case will be introduced in later commit.

Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
Introduces two types of client indexes: primary and secondary. Primary
indexes are verified to no be duplicate at transaction verification,
when enable, otherwise there is no difference between them.

Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
So that we have DeepCopy of models available and benchmarks throw out
more realistic numbers.

Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
jcaamano added 7 commits June 6, 2022 11:16
Creating the event channel has a performance cost due to being buffered
and high capacity. There is no use for it in some situations, like a
transaction or database cache and in such cases the event processor wont
be run. Delay initialization of the event channel until that time.

Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
We can run the transaction on shallow model copies of its own cache as
it never modifies them directly, it uses tableUpdates instead.

We can do the same for the temporary database of the validating client
transaction.

Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
@jcaamano jcaamano force-pushed the verify-indexes branch 2 times, most recently from 5f6ce60 to bd57ef7 Compare June 6, 2022 12:59
@jcaamano
Copy link
Collaborator Author

/hold
still assessing performance impact, and we probably need to do transaction serialization better

Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
jcaamano added 4 commits June 20, 2022 16:13
Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
For updates and mutates, the difference for each column is calculated
and set in the Modify row of an update2 notification. Calculating this
difference has a performance cost when the column is a set with
thousands of unordered values.

While calculating the Modify row is a spec requirement over the wire,
internally updates could be handled through the Old & New rows and a
transaction that is specific for validating purposes can take advantage
of this.

Make the cache capable of processing an update2 with nil Modify and
non-nil Old & New as if it were a normal update notification and avoid
setting Modify for updates on validating transactions.

Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants