potentially unsound answers #8

florianschanda · 2018-01-18T10:10:11Z

Hi,

We've found the following answers of gosat to be probably incorrect (sat that should be unsat and vice versa).

The filenames below come from my bag of benchmarks. https://github.com/florianschanda/smtlib_schanda I've tried my best to make sure the various status are correct by mistakes are of course possible.

nyxbrain/crafted-edge-cases/RNE:IEEE_equality_is_almost_zero_difference_5_unsat.smt2
nyxbrain/crafted-edge-cases/RNE:the_answer_is_NaN_5_unsat.smt2
nyxbrain/crafted-edge-cases/RTN:IEEE_equality_is_almost_zero_difference_5_unsat.smt2
nyxbrain/crafted-edge-cases/RTN:the_answer_is_NaN_5_unsat.smt2
nyxbrain/crafted-edge-cases/RTP:IEEE_equality_is_almost_zero_difference_5_unsat.smt2
nyxbrain/crafted-edge-cases/RTP:the_answer_is_NaN_5_unsat.smt2
nyxbrain/crafted-edge-cases/RTZ:IEEE_equality_is_almost_zero_difference_5_unsat.smt2
nyxbrain/crafted-edge-cases/RTZ:the_answer_is_NaN_5_unsat.smt2
random/fp.cast/cast_rna_20033.smt2
random/fp.cast/cast_rna_20193.smt2
random/fp.cast/cast_rna_20201.smt2
random/fp.cast/cast_rna_20202.smt2
random/fp.cast/cast_rna_20204.smt2
random/fp.cast/cast_rna_20208.smt2
random/fp.cast/cast_rne_19904.smt2
random/fp.cast/cast_rne_19908.smt2
random/fp.cast/cast_rne_19916.smt2
random/fp.cast/cast_rne_19929.smt2
random/fp.cast/cast_rne_19930.smt2
random/fp.cast/cast_rtn_20753.smt2
random/fp.cast/cast_rtn_20760.smt2
random/fp.cast/cast_rtn_20761.smt2
random/fp.cast/cast_rtn_20763.smt2
random/fp.cast/cast_rtn_20775.smt2
random/fp.cast/cast_rtz_20828.smt2
random/fp.cast/cast_rtz_20879.smt2
random/fp.cast/cast_rtz_20966.smt2
random/fp.cast/cast_rtz_21033.smt2
random/fp.cast/cast_rtz_21034.smt2
random/fp.cast/cast_rtz_21036.smt2
random/fp.cast/cast_rtz_21037.smt2
random/fp.cast/cast_rtz_21040.smt2
random/fp.cast/cast_rtz_21054.smt2
wintersteiger/mul/mul-has-no-other-solution-10044.smt2
wintersteiger/mul/mul-has-no-other-solution-13484.smt2
wintersteiger/mul/mul-has-no-other-solution-15092.smt2
wintersteiger/mul/mul-has-no-other-solution-15226.smt2
wintersteiger/mul/mul-has-no-other-solution-15435.smt2
wintersteiger/mul/mul-has-no-other-solution-15552.smt2
wintersteiger/mul/mul-has-no-other-solution-19889.smt2
wintersteiger/mul/mul-has-no-other-solution-3202.smt2
wintersteiger/mul/mul-has-no-other-solution-3804.smt2
wintersteiger/mul/mul-has-no-other-solution-7814.smt2
wintersteiger/mul/mul-has-no-other-solution-8691.smt2

The text was updated successfully, but these errors were encountered:

florianschanda · 2018-01-18T10:12:54Z

Also, for some of these the platform of the computer I am on seems to matter.

wintersteiger/mul/mul-has-no-other-solution-15884.smt2 for example returns SAT on my AMD computer but UNKNOWN on my Intel computer.

I am using the same binary (which I compiled on my intel). Both computers are running the same Debian. Below the relevant extracts from /proc/cpuinfo
model name : Intel(R) Core(TM) i7-5960X CPU @ 3.00GHz
model name : AMD Opteron(tm) Processor 6328

florianschanda · 2018-01-19T08:25:08Z

Some of the issues could be resolved by not using -march=native that the CMakeLists hard-codes. However the bulk of the issued described above remain.

The distance between any number and a NaN should be a (fixed) positve number. The original treatment is unsound Partial fix for #8

Currently, we do not support setting rounding modes. We detect such constructs in the given formula, and warn the user that soundness might be affected Related to #8

Related to #8

- Bug fix in how we treated NaNs in the distance function. The distance between any number and a NaN should by > 0. Currently, it's fixed to 1024. - Bug fix in how negation (`not` operator) propagated to sub-expressions. Observe de-morgan's law - Bug fix in handling logical equality and inequality. Formerly, we did not make a distinction between logical equality and FP equality which is wrong. Two separate distance functions were added to support this distinction. - Warn users about unsupported rounding modes. goSAT does not honor rounding mode spec and keeps the default mode in the underlaying machine. - Did minor code refactoring

abenkhadra · 2018-01-21T12:44:25Z

I can confirm that all smt examples you provided can lead to unsound results. The following bug fixes have been applied:

Bug fix in how we treated NaNs in the distance function. The distance between any number and a NaN should by > 0. Currently, it is set to 1024.
Bug fix in how negation (not operator) propagated to sub-expressions.
Bug fix in handling logical equality and inequality. Earlier, we did not make a distinction between logical equality and FP equality which is wrong. Two separate distance functions were added to support this distinction.

Also, the examples exposed two current limitations which might be addressed in the future:

goSAT does not honor rounding mode spec and keeps the default mode of the underlaying machine. For now, we warn users about unsupported rounding modes. Actually, LLVM does not support rounding modes directly in the IR yet. There are discussions about this issue, but still nothing in the upstream. Note that the C++ facility fesetround can be useful for setting the mode per formula (before calling the optimizer). What we need, however, is a method of finer granularity to set the mode per sub-expression.
We natively support FP64 (double) only, which is the input accepted by nlopt. FP32 is partially supported by casting it to double at the interface with nlopt. FP16 and FP128 are not supported. However, you can still specify variables of size FP16 or FP128 which will internally be treated as FP64.

Additionally, the changes have passed the regression tests on the griggio benchmarks. Hence, the results of our paper are still reproducible. However, the changes are significant and may cause bugs on other benchmarks :)

Finally, I do not think the -march=native plays a role here. The answers for a few benchmarks were unstable, with repeated runs, since they are very close to zero. Something in the order of 10^-17.
Playing with the XTOL parameter might help. However, I have not looked much into tuning nlopt parameters.

Thanks for experimenting with goSAT!

florianschanda · 2018-01-22T08:57:43Z

Thank you for the fast fixes :) I am currently re-running our benchmark suite and will let you know if anything new pops up.

I think it would be good that for anything you see that you don't support (like different rounding modes in the same expression) to instantly emit "unsupported" (see page 47 of the smtlib standard) or give the "unknown" response to a check-sat query.

Treating Float16 or Float128 as Float64 will not be sound, I would again encourage you to emit "unknown" at that point.

I've also now compiled without -march and with -march=opteron-sse3 and both work. -march=native really does not if you build on a recent Intel box and then use the binary on an AMD machine. Many executions terminate with "illegal instruction" and others just give incorrect results.

florianschanda · 2018-01-22T11:33:57Z

Results are different now, we've found 4970 unsound answers, but they are not the same ones as before. Below a reduced list, but it should cover everything.

crafted/QF_FP/why3_review_01.smt2
nyxbrain/crafted-edge-cases/RNE:gt_to_lt_unsat.smt2
nyxbrain/crafted-edge-cases/RTN:gt_to_lt_unsat.smt2
nyxbrain/crafted-edge-cases/RTP:gt_to_lt_unsat.smt2
nyxbrain/crafted-edge-cases/RTZ:gt_to_lt_unsat.smt2
random/fp.cast/cast_rne_19862.smt2
random/fp.div/div_rne_00703.smt2
random/fp.div/div_rne_00758.smt2
random/fp.div/div_rne_00852.smt2
random/fp.div/div_rne_00920.smt2
random/fp.div/div_rne_00925.smt2
random/fp.div/div_rne_00939.smt2
random/fp.div/div_rne_01008.smt2
random/fp.div/div_rne_01093.smt2
random/fp.div/div_rne_01107.smt2
random/fp.div/div_rne_01125.smt2
random/fp.div/div_rne_01156.smt2
random/fp.div/div_rne_01181.smt2
random/fp.div/div_rne_01209.smt2
random/fp.mul/mul_rne_08016.smt2
random/fp.mul/mul_rne_08042.smt2
random/fp.mul/mul_rne_08061.smt2
random/fp.mul/mul_rne_08081.smt2
random/fp.mul/mul_rne_08114.smt2
random/fp.mul/mul_rne_08137.smt2
random/fp.mul/mul_rne_08292.smt2
random/fp.mul/mul_rne_08324.smt2
random/fp.mul/mul_rne_08326.smt2
random/fp.mul/mul_rne_08341.smt2
random/fp.mul/mul_rne_08345.smt2
random/fp.mul/mul_rne_08395.smt2
random/fp.sub/sub_rne_11182.smt2
random/fp.sub/sub_rne_11531.smt2
wintersteiger/add/add-has-no-other-solution-10020.smt2
wintersteiger/add/add-has-no-other-solution-1003.smt2
wintersteiger/div/div-has-no-other-solution-10317.smt2
wintersteiger/div/div-has-no-other-solution-11528.smt2
wintersteiger/eq/eq-has-no-other-solution-10005.smt2
wintersteiger/eq/eq-has-no-other-solution-10015.smt2
wintersteiger/gt/gt-has-no-other-solution-10.smt2
wintersteiger/gt/gt-has-no-other-solution-10013.smt2
wintersteiger/lt/lt-has-no-other-solution-10001.smt2
wintersteiger/lt/lt-has-no-other-solution-10003.smt2
wintersteiger/mul/mul-has-no-other-solution-10112.smt2
wintersteiger/mul/mul-has-no-other-solution-10506.smt2
wintersteiger/sub/sub-has-no-other-solution-1.smt2
wintersteiger/sub/sub-has-no-other-solution-10068.smt2

We explicitly check the argument sort of Z3_OP_EQ. In case they are of sort Z3_FLOATING_POINT_SORT we treat equality as Z3_OP_FPA_EQ, otherwise code for Z3_OP_EQ will be generated as usual. Partial fix for #8

abenkhadra · 2018-01-23T14:53:41Z

I added a few usability features which includes reporting "unsupported" upon detecting FP16 or FP128 variables. More importantly, it turns out that the parser of z3 provides Z3_OP_EQ (logical equality) in places where we expect Z3_OP_FPA_EQ (FP equality). This causes a bug since the treatment of the two equality types was split, for good, in the recent updates.

For example, consider the benchmark wintersteiger/eq/eq-has-no-other-solution-10015.smt2 which is partially reproduced below:

(set-logic QF_FP)
(set-info :status unsat)
(define-sort FPN () (_ FloatingPoint 11 53))
(declare-fun x () FPN)
(declare-fun y () FPN)
(assert (= x (fp #b0 #b01000101001 #b0000011101011111100011111011000000011101101101000010)))
(assert (= y (fp #b0 #b11000011110 #b0110111100000001000011010011110101010001011010111100)))
(assert  (not (= (fp.eq x y) false)))
(check-sat)

the parser gives Z3_OP_FPA_EQ for the comparison between x and y which is expected. However, it gives Z3_OP_EQ for the comparison between x and its corresponding constant despite the fact that the comparison is between FP (not booleans).

I've implemented a workaround where for operation Z3_OP_EQ, we look ahead into the type of the arguments. Based on that, we decide what code should be generated.

florianschanda · 2018-01-23T15:02:21Z

The Z3 parser is correct, the comparison is SMTLIB =, not FP EQ.

I believe your workaround is only going to add unsoundness when benchmarks depend on subtle difference between the two. For these specific constants they are equivalent. But they are not equivalent if either of the constant is 0 or NaN.

In C terms you might want to translate (fp.eq x y) as x == y, and (= x y) as *(int*)(&x) == *(int*)(&y).

florianschanda · 2018-01-23T15:04:53Z

I should further note that the SMTLIB floating point standard makes this distinction, and several benchmarks depend on this.

abenkhadra · 2018-01-23T16:26:25Z

Sorry for the misunderstanding. I believe that z3 is doing the correct thing, however, as far as the code generator is concerned, it was unexpected.

I'm interested in supporting a reasonable subset of QF_FP in a sound or, for practical reasons, may be "soundy" way. Along the way, we can learn about the limits of the stochastic search approach. So far, goSAT passes the soundness test on the griggio benchmarks in addition to the examples provided.

abenkhadra added a commit that referenced this issue Jan 21, 2018

Fix bug in the distance function

cc5e670

The distance between any number and a NaN should be a (fixed) positve number. The original treatment is unsound Partial fix for #8

abenkhadra added a commit that referenced this issue Jan 21, 2018

Detect unsupported rounding mode

539b947

Currently, we do not support setting rounding modes. We detect such constructs in the given formula, and warn the user that soundness might be affected Related to #8

abenkhadra added a commit that referenced this issue Jan 21, 2018

Fix bug in expr negation

5fd48e1

Related to #8

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

potentially unsound answers #8

potentially unsound answers #8

florianschanda commented Jan 18, 2018 •

edited

Loading

florianschanda commented Jan 18, 2018 •

edited

Loading

florianschanda commented Jan 19, 2018

abenkhadra commented Jan 21, 2018

florianschanda commented Jan 22, 2018

florianschanda commented Jan 22, 2018

abenkhadra commented Jan 23, 2018

florianschanda commented Jan 23, 2018 •

edited

Loading

florianschanda commented Jan 23, 2018 •

edited

Loading

abenkhadra commented Jan 23, 2018

potentially unsound answers #8

potentially unsound answers #8

Comments

florianschanda commented Jan 18, 2018 • edited Loading

florianschanda commented Jan 18, 2018 • edited Loading

florianschanda commented Jan 19, 2018

abenkhadra commented Jan 21, 2018

florianschanda commented Jan 22, 2018

florianschanda commented Jan 22, 2018

abenkhadra commented Jan 23, 2018

florianschanda commented Jan 23, 2018 • edited Loading

florianschanda commented Jan 23, 2018 • edited Loading

abenkhadra commented Jan 23, 2018

florianschanda commented Jan 18, 2018 •

edited

Loading

florianschanda commented Jan 18, 2018 •

edited

Loading

florianschanda commented Jan 23, 2018 •

edited

Loading

florianschanda commented Jan 23, 2018 •

edited

Loading