Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

potentially unsound answers #8

Open
florianschanda opened this issue Jan 18, 2018 · 9 comments
Open

potentially unsound answers #8

florianschanda opened this issue Jan 18, 2018 · 9 comments

Comments

@florianschanda
Copy link

florianschanda commented Jan 18, 2018

Hi,

We've found the following answers of gosat to be probably incorrect (sat that should be unsat and vice versa).

The filenames below come from my bag of benchmarks. https://github.com/florianschanda/smtlib_schanda I've tried my best to make sure the various status are correct by mistakes are of course possible.

nyxbrain/crafted-edge-cases/RNE:IEEE_equality_is_almost_zero_difference_5_unsat.smt2
nyxbrain/crafted-edge-cases/RNE:the_answer_is_NaN_5_unsat.smt2
nyxbrain/crafted-edge-cases/RTN:IEEE_equality_is_almost_zero_difference_5_unsat.smt2
nyxbrain/crafted-edge-cases/RTN:the_answer_is_NaN_5_unsat.smt2
nyxbrain/crafted-edge-cases/RTP:IEEE_equality_is_almost_zero_difference_5_unsat.smt2
nyxbrain/crafted-edge-cases/RTP:the_answer_is_NaN_5_unsat.smt2
nyxbrain/crafted-edge-cases/RTZ:IEEE_equality_is_almost_zero_difference_5_unsat.smt2
nyxbrain/crafted-edge-cases/RTZ:the_answer_is_NaN_5_unsat.smt2
random/fp.cast/cast_rna_20033.smt2
random/fp.cast/cast_rna_20193.smt2
random/fp.cast/cast_rna_20201.smt2
random/fp.cast/cast_rna_20202.smt2
random/fp.cast/cast_rna_20204.smt2
random/fp.cast/cast_rna_20208.smt2
random/fp.cast/cast_rne_19904.smt2
random/fp.cast/cast_rne_19908.smt2
random/fp.cast/cast_rne_19916.smt2
random/fp.cast/cast_rne_19929.smt2
random/fp.cast/cast_rne_19930.smt2
random/fp.cast/cast_rtn_20753.smt2
random/fp.cast/cast_rtn_20760.smt2
random/fp.cast/cast_rtn_20761.smt2
random/fp.cast/cast_rtn_20763.smt2
random/fp.cast/cast_rtn_20775.smt2
random/fp.cast/cast_rtz_20828.smt2
random/fp.cast/cast_rtz_20879.smt2
random/fp.cast/cast_rtz_20966.smt2
random/fp.cast/cast_rtz_21033.smt2
random/fp.cast/cast_rtz_21034.smt2
random/fp.cast/cast_rtz_21036.smt2
random/fp.cast/cast_rtz_21037.smt2
random/fp.cast/cast_rtz_21040.smt2
random/fp.cast/cast_rtz_21054.smt2
wintersteiger/mul/mul-has-no-other-solution-10044.smt2
wintersteiger/mul/mul-has-no-other-solution-13484.smt2
wintersteiger/mul/mul-has-no-other-solution-15092.smt2
wintersteiger/mul/mul-has-no-other-solution-15226.smt2
wintersteiger/mul/mul-has-no-other-solution-15435.smt2
wintersteiger/mul/mul-has-no-other-solution-15552.smt2
wintersteiger/mul/mul-has-no-other-solution-19889.smt2
wintersteiger/mul/mul-has-no-other-solution-3202.smt2
wintersteiger/mul/mul-has-no-other-solution-3804.smt2
wintersteiger/mul/mul-has-no-other-solution-7814.smt2
wintersteiger/mul/mul-has-no-other-solution-8691.smt2
@florianschanda
Copy link
Author

florianschanda commented Jan 18, 2018

Also, for some of these the platform of the computer I am on seems to matter.

wintersteiger/mul/mul-has-no-other-solution-15884.smt2 for example returns SAT on my AMD computer but UNKNOWN on my Intel computer.

I am using the same binary (which I compiled on my intel). Both computers are running the same Debian. Below the relevant extracts from /proc/cpuinfo
model name : Intel(R) Core(TM) i7-5960X CPU @ 3.00GHz
model name : AMD Opteron(tm) Processor 6328

@florianschanda
Copy link
Author

Some of the issues could be resolved by not using -march=native that the CMakeLists hard-codes. However the bulk of the issued described above remain.

abenkhadra added a commit that referenced this issue Jan 21, 2018
The distance between any number and a NaN should
be a (fixed) positve number. The original treatment is unsound

Partial fix for #8
abenkhadra added a commit that referenced this issue Jan 21, 2018
Currently, we do not support setting rounding modes. We detect
such constructs in the given formula, and warn the user that
soundness might be affected

Related to #8
abenkhadra added a commit that referenced this issue Jan 21, 2018
abenkhadra added a commit that referenced this issue Jan 21, 2018
- Bug fix in how we treated NaNs in the distance function.
  The distance between any number and a NaN should by > 0.
  Currently, it's fixed to 1024.

- Bug fix in how negation (`not` operator) propagated to
  sub-expressions. Observe de-morgan's law

- Bug fix in handling logical equality and inequality.
  Formerly, we did not make a distinction between logical
  equality and FP equality which is wrong. Two separate distance
  functions were added to support this distinction.

- Warn users about unsupported rounding modes. goSAT does not
  honor rounding mode spec and keeps the default mode in the
  underlaying machine.

- Did minor code refactoring
@abenkhadra
Copy link
Owner

I can confirm that all smt examples you provided can lead to unsound results. The following bug fixes have been applied:

  • Bug fix in how we treated NaNs in the distance function. The distance between any number and a NaN should by > 0. Currently, it is set to 1024.

  • Bug fix in how negation (not operator) propagated to sub-expressions.

  • Bug fix in handling logical equality and inequality. Earlier, we did not make a distinction between logical equality and FP equality which is wrong. Two separate distance functions were added to support this distinction.

Also, the examples exposed two current limitations which might be addressed in the future:

  • goSAT does not honor rounding mode spec and keeps the default mode of the underlaying machine. For now, we warn users about unsupported rounding modes. Actually, LLVM does not support rounding modes directly in the IR yet. There are discussions about this issue, but still nothing in the upstream. Note that the C++ facility fesetround can be useful for setting the mode per formula (before calling the optimizer). What we need, however, is a method of finer granularity to set the mode per sub-expression.

  • We natively support FP64 (double) only, which is the input accepted by nlopt. FP32 is partially supported by casting it to double at the interface with nlopt. FP16 and FP128 are not supported. However, you can still specify variables of size FP16 or FP128 which will internally be treated as FP64.

Additionally, the changes have passed the regression tests on the griggio benchmarks. Hence, the results of our paper are still reproducible. However, the changes are significant and may cause bugs on other benchmarks :)

Finally, I do not think the -march=native plays a role here. The answers for a few benchmarks were unstable, with repeated runs, since they are very close to zero. Something in the order of 10^-17.
Playing with the XTOL parameter might help. However, I have not looked much into tuning nlopt parameters.

Thanks for experimenting with goSAT!

@florianschanda
Copy link
Author

Thank you for the fast fixes :) I am currently re-running our benchmark suite and will let you know if anything new pops up.

I think it would be good that for anything you see that you don't support (like different rounding modes in the same expression) to instantly emit "unsupported" (see page 47 of the smtlib standard) or give the "unknown" response to a check-sat query.

Treating Float16 or Float128 as Float64 will not be sound, I would again encourage you to emit "unknown" at that point.

I've also now compiled without -march and with -march=opteron-sse3 and both work. -march=native really does not if you build on a recent Intel box and then use the binary on an AMD machine. Many executions terminate with "illegal instruction" and others just give incorrect results.

@florianschanda
Copy link
Author

Results are different now, we've found 4970 unsound answers, but they are not the same ones as before. Below a reduced list, but it should cover everything.

crafted/QF_FP/why3_review_01.smt2
nyxbrain/crafted-edge-cases/RNE:gt_to_lt_unsat.smt2
nyxbrain/crafted-edge-cases/RTN:gt_to_lt_unsat.smt2
nyxbrain/crafted-edge-cases/RTP:gt_to_lt_unsat.smt2
nyxbrain/crafted-edge-cases/RTZ:gt_to_lt_unsat.smt2
random/fp.cast/cast_rne_19862.smt2
random/fp.div/div_rne_00703.smt2
random/fp.div/div_rne_00758.smt2
random/fp.div/div_rne_00852.smt2
random/fp.div/div_rne_00920.smt2
random/fp.div/div_rne_00925.smt2
random/fp.div/div_rne_00939.smt2
random/fp.div/div_rne_01008.smt2
random/fp.div/div_rne_01093.smt2
random/fp.div/div_rne_01107.smt2
random/fp.div/div_rne_01125.smt2
random/fp.div/div_rne_01156.smt2
random/fp.div/div_rne_01181.smt2
random/fp.div/div_rne_01209.smt2
random/fp.mul/mul_rne_08016.smt2
random/fp.mul/mul_rne_08042.smt2
random/fp.mul/mul_rne_08061.smt2
random/fp.mul/mul_rne_08081.smt2
random/fp.mul/mul_rne_08114.smt2
random/fp.mul/mul_rne_08137.smt2
random/fp.mul/mul_rne_08292.smt2
random/fp.mul/mul_rne_08324.smt2
random/fp.mul/mul_rne_08326.smt2
random/fp.mul/mul_rne_08341.smt2
random/fp.mul/mul_rne_08345.smt2
random/fp.mul/mul_rne_08395.smt2
random/fp.sub/sub_rne_11182.smt2
random/fp.sub/sub_rne_11531.smt2
wintersteiger/add/add-has-no-other-solution-10020.smt2
wintersteiger/add/add-has-no-other-solution-1003.smt2
wintersteiger/div/div-has-no-other-solution-10317.smt2
wintersteiger/div/div-has-no-other-solution-11528.smt2
wintersteiger/eq/eq-has-no-other-solution-10005.smt2
wintersteiger/eq/eq-has-no-other-solution-10015.smt2
wintersteiger/gt/gt-has-no-other-solution-10.smt2
wintersteiger/gt/gt-has-no-other-solution-10013.smt2
wintersteiger/lt/lt-has-no-other-solution-10001.smt2
wintersteiger/lt/lt-has-no-other-solution-10003.smt2
wintersteiger/mul/mul-has-no-other-solution-10112.smt2
wintersteiger/mul/mul-has-no-other-solution-10506.smt2
wintersteiger/sub/sub-has-no-other-solution-1.smt2
wintersteiger/sub/sub-has-no-other-solution-10068.smt2

abenkhadra added a commit that referenced this issue Jan 23, 2018
We explicitly check the argument sort of Z3_OP_EQ.
In case they are of sort Z3_FLOATING_POINT_SORT we treat equality
as Z3_OP_FPA_EQ, otherwise code for Z3_OP_EQ will be generated
as usual.

Partial fix for #8
@abenkhadra
Copy link
Owner

I added a few usability features which includes reporting "unsupported" upon detecting FP16 or FP128 variables. More importantly, it turns out that the parser of z3 provides Z3_OP_EQ (logical equality) in places where we expect Z3_OP_FPA_EQ (FP equality). This causes a bug since the treatment of the two equality types was split, for good, in the recent updates.

For example, consider the benchmark wintersteiger/eq/eq-has-no-other-solution-10015.smt2 which is partially reproduced below:

(set-logic QF_FP)
(set-info :status unsat)
(define-sort FPN () (_ FloatingPoint 11 53))
(declare-fun x () FPN)
(declare-fun y () FPN)
(assert (= x (fp #b0 #b01000101001 #b0000011101011111100011111011000000011101101101000010)))
(assert (= y (fp #b0 #b11000011110 #b0110111100000001000011010011110101010001011010111100)))
(assert  (not (= (fp.eq x y) false)))
(check-sat)

the parser gives Z3_OP_FPA_EQ for the comparison between x and y which is expected. However, it gives Z3_OP_EQ for the comparison between x and its corresponding constant despite the fact that the comparison is between FP (not booleans).

I've implemented a workaround where for operation Z3_OP_EQ, we look ahead into the type of the arguments. Based on that, we decide what code should be generated.

@florianschanda
Copy link
Author

florianschanda commented Jan 23, 2018

The Z3 parser is correct, the comparison is SMTLIB =, not FP EQ.

I believe your workaround is only going to add unsoundness when benchmarks depend on subtle difference between the two. For these specific constants they are equivalent. But they are not equivalent if either of the constant is 0 or NaN.

In C terms you might want to translate (fp.eq x y) as x == y, and (= x y) as *(int*)(&x) == *(int*)(&y).

@florianschanda
Copy link
Author

florianschanda commented Jan 23, 2018

I should further note that the SMTLIB floating point standard makes this distinction, and several benchmarks depend on this.

@abenkhadra
Copy link
Owner

Sorry for the misunderstanding. I believe that z3 is doing the correct thing, however, as far as the code generator is concerned, it was unexpected.

I'm interested in supporting a reasonable subset of QF_FP in a sound or, for practical reasons, may be "soundy" way. Along the way, we can learn about the limits of the stochastic search approach. So far, goSAT passes the soundness test on the griggio benchmarks in addition to the examples provided.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants