NOTES ON GENERATING PSEUDO-BOOLEAN BENCHMARKS AND RUNNING PSEUDO-BOOLEAN SOLVERS
================================================================================
These notes explain some of the basics of how to generate
pseudo-Boolean benchmarks and how to run the RoundingSat pseudo-Boolean
solver. We should point out that RoundingSat is still very much research
software, and we would be most grateful if you could report any
problems you run into to Jakob Nordstrom (jn@di.ku.dk) and Andy Oertel
(andy.oertel@cs.lth.se).
INSTALLATION OF CNFGEN FOR GENERATING BENCHMARKS
To generate benchmark instances you can use the tool CNFgen. You need to
have Python installed to run CNFgen. The easiest way to install CNFgen is
through pip. When you have Python and pip installed, just type
pip3 install --user cnfgen
in a terminal window to install. Now you should be able to run 'cnfgen
--help' or 'pbgen --help' for options and an explanation of how to use these
two tools.
By default, CNFgen just prints the generated CNF formula on standard
output. To write the formula to a file, use the '--output' option. As an
example, the command
cnfgen --output out.cnf php 10 9
generates the well-known pigeonhole principle formula claiming that 10
pigeons fit into 9 holes although every pigeon wants a hole of its
own. Note, though, that the command line above generates the CNF formula in
DIMACS format --- in order to make use of pseudo-Boolean solvers, we will
instead be interested in pseudo-Boolean formulas that use the full power of
0-1 integer linear inequalities to achieve more compact and expressive
encodings.
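To make the contrast concrete, here is a small Python sketch (our own illustration, not CNFgen's or pbgen's actual output, and with our own variable numbering) of what a pigeonhole instance could look like as 0-1 linear inequalities in OPB format:

```python
# Hypothetical generator of a pigeonhole instance in OPB format.
# Variable x_{i,j} means "pigeon i sits in hole j" and is numbered
# i*holes + j + 1.

def php_opb(pigeons, holes):
    """Return a pigeonhole instance as a string in OPB format."""
    var = lambda i, j: i * holes + j + 1
    cons = []
    # Every pigeon sits in some hole: sum_j x_{i,j} >= 1.
    for i in range(pigeons):
        cons.append(" ".join(f"+1 x{var(i, j)}" for j in range(holes))
                    + " >= 1 ;")
    # Every hole takes at most one pigeon, normalized to >= form:
    # -x_{0,j} - ... - x_{pigeons-1,j} >= -1.
    for j in range(holes):
        cons.append(" ".join(f"-1 x{var(i, j)}" for i in range(pigeons))
                    + " >= -1 ;")
    header = f"* #variable= {pigeons * holes} #constraint= {len(cons)}"
    return "\n".join([header] + cons)
```

For 3 pigeons and 2 holes this yields 5 constraints over 6 variables; note how the "at most one pigeon per hole" constraints need no auxiliary clauses, unlike a pure CNF encoding with larger coefficients would.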
CNFgen can also be installed from its GitHub repository
(https://github.com/MassimoLauria/cnfgen). First, make sure you have Python
installed. Then clone the repository and run
python3 setup.py install --user
inside the cloned directory to install CNFgen.
SAT4J AND IMPLICIT HITTING SET PSEUDO-BOOLEAN SOLVERS
Below follows some information about two other solvers that will be
mentioned briefly in the lectures in addition to RoundingSat, which is the
solver developed by our research group.
Arguably the most well-known and widely used pseudo-Boolean solver is
Sat4j. More information about Sat4j can be found at https://www.sat4j.org/,
and the solver can be downloaded from
https://gitlab.ow2.org/sat4j/sat4j/-/releases/. Sat4j is written in Java,
which causes a bit of a performance hit. For many benchmarks (but far from
all) it is significantly slower than RoundingSat, due also to how conflict
analysis has been implemented (which will be discussed briefly in the
lectures).
The Constraint Reasoning and Optimization (CoReO) group in Helsinki has
developed an implicit hitting set (IHS) based solver on top of
RoundingSat. This solver is available at
https://bitbucket.org/coreo-group/pbo-ihs-solver/, where you can also find
installation instructions.
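The implicit hitting set idea itself can be sketched in a few lines of Python. This is a toy model, not the CoReO solver: the oracle call is stubbed by a fixed hidden family of cores, and the minimum-cost hitting set is found by brute force, whereas a real IHS solver would query a PB oracle under assumptions and use an IP solver for the hitting sets:

```python
from itertools import combinations

def min_hitting_set(cores, universe, cost):
    """Brute-force a minimum-cost hitting set of the given cores
    (a real IHS solver would call an IP solver here)."""
    best_cost, best = float("inf"), set()
    for r in range(len(universe) + 1):
        for subset in map(set, combinations(universe, r)):
            if all(subset & core for core in cores):
                c = sum(cost[x] for x in subset)
                if c < best_cost:
                    best_cost, best = c, subset
    return best

def ihs(hidden_cores, universe, cost):
    """Skeleton of the IHS loop. The oracle is stubbed by checking a
    fixed hidden core family; a real solver would instead run the PB
    oracle under assumptions and extract a fresh core on failure."""
    found = []
    while True:
        hs = min_hitting_set(found, universe, cost)
        violated = next((c for c in hidden_cores if not (hs & c)), None)
        if violated is None:
            return hs  # hits every core: corresponds to an optimum
        found.append(violated)
```

With cores {1,2} and {2,3} over unit-cost elements, the loop converges to the hitting set {2}, which is the cheapest way to "pay" for all cores.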
EXECUTIVE SUMMARY FOR THE PSEUDO-BOOLEAN SOLVER ROUNDINGSAT
Below follows an explanation of the main different ways of running the
RoundingSat pseudo-Boolean solver.
(1) BASIC VERSION
For decision problems: --lp=0
For optimization problems: --lp=0 --opt-mode=linear
[but this pure solution-improving search is not very good]
(2) WITH LINEAR PROGRAMMING (LP) SOLVER INTEGRATION
For decision problems: --lp=1
For optimization problems: --lp=1 --opt-mode=linear
Uses the LP solver SoPlex from the SCIP suite (if that solver is
available), and so is subject to the SoPlex license. Trying it out for
research or evaluation purposes is totally fine, however. (SoPlex has to be
installed separately, but is included in the Docker image that we
provide.)
(3) WITH CORE-GUIDED OPTIMIZATION
Only relevant for optimization problems:
--lp=0 --opt-mode=hybrid --cg-strat=1 --cg-indcores=0
To this one can also try to add (while keeping the above options):
--cg-encoding=reified --cg-resprop=1
The above variants of the solver should already go pretty far. Note that
the recommended use is to switch on *either* LP integration *or*
core-guided techniques, but not both. They do not work well together. There
should be great potential for doing something about this, but we simply
have not had time to look into it.
For users who want to have more information, below follows a more detailed
technical write-up that discusses various options and what they mean. But
first, let us explain how to get an executable version of the solver.
INSTALLATION OF ROUNDINGSAT
RoundingSat can be installed as a Docker image (Windows, MacOS and Linux),
downloaded as a compiled binary (Linux only), or compiled from source
(Linux only).
To install the RoundingSat Docker image from Docker Hub, make sure to have
Docker installed (https://docs.docker.com/engine/install/). If you
installed Docker Desktop, then open it and enter
docker run aoer/roundingsat
to run the Docker image. You can pass any RoundingSat option by just
appending it. For instance,
docker run aoer/roundingsat --help
prints out a list of the options that RoundingSat supports with some
(perhaps somewhat terse) explanations of what they mean.
To run the solver on a benchmark formula stored in a file, use
docker run -v /path/to/instance:/instance aoer/roundingsat /instance/filename.opb
to mount the directory /path/to/instance as a directory /instance that the
Docker container can access.
An example of how to run RoundingSat with some options is
docker run -v /path/to/instance:/instance aoer/roundingsat --lp=0 --opt-mode=linear /instance/filename.opb
RoundingSat can also be downloaded as a compiled binary. A version of
RoundingSat with support for arbitrary-precision arithmetic can be found at
https://lu.box.com/s/sx3yos861l3onfqlbujuri4oz8y7igsg
Another version that interleaves the pseudo-Boolean conflict-driven search
with solving LP relaxations (inspired by MIP solvers), and that does so by
using the SoPlex solver from SCIP, is at
https://lu.box.com/s/hbel34r2khv0bzgz3e73i7qyzi32l22r
It is worth noting that SoPlex has a fairly strict license regarding usage,
especially for commercial actors, but according to what the SCIP team has
told us, in practice everything should be OK as long as the SoPlex
integration is used for research purposes only and not for commercial gain.
You can also compile RoundingSat from source by going to the repository
https://gitlab.com/MIAOresearch/software/roundingsat
and following the instructions in README.md.
You can run RoundingSat with the option '--help' to get a full list of
options. Below follows a discussion of some particular settings that might
be interesting to investigate.
One setting that should probably be default, but is not, is emulating
standard VSIDS behaviour of CDCL solvers with '--vsids-var=0.8'. This gives
the VSIDS variable decay factor used by, e.g., Glucose. The current default
of RoundingSat is instead the MiniSat value of 0.95, which is often
slightly worse. So, in what follows below, you might always want to include
'--vsids-var=0.8' also (though the impact is often not huge).
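As a toy model of why the decay factor matters (a simplification for illustration, not RoundingSat's actual bookkeeping): under geometric decay, a bump from t conflicts ago contributes with weight decay^t, so a lower decay like 0.8 forgets old conflicts much faster than 0.95:

```python
def vsids_score(bump_times, now, decay):
    """Activity of a variable bumped at the given conflict indices,
    under per-conflict geometric decay (a simplified VSIDS model)."""
    return sum(decay ** (now - t) for t in bump_times)

# A bump from 50 conflicts ago has essentially vanished with decay 0.8
# (0.8**50 is about 1.4e-5) but still carries noticeable weight with
# decay 0.95 (0.95**50 is about 0.077).
```

So decay 0.8 focuses branching much more sharply on recent conflicts, which for many instances is the better heuristic.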
PSEUDO-BOOLEAN SOLVING WITH LINEAR PROGRAMMING INTEGRATION
Especially for decision problems, it can be interesting to try RoundingSat
with LP integration. The option '--lp=RATIO' switches LP integration on if
RATIO > 0. RATIO is (roughly) the ratio between the number of pivots made
by the simplex algorithm in the LP solver and the number of conflicts
encountered during conflict-driven pseudo-Boolean search. The default value
is 1. The higher this value is, the more time is given to the LP solver as
compared to the PB solver.
Some settings to try for PB solving with LP integration (i.e., with
'--lp=RATIO' for RATIO > 0) are:
(a) '--lp-cut-learned=0'
(b) '--lp-cut-gomory=0'
(c) '--lp-cut-gomory=1' but for different values of VAL in the option
'--lp-cut-maxcos=VAL' and potentially also for different values of NUM in
'--lp-cut-gomlim=NUM'.
By way of explanation, if '--lp-cut-learned=1' is specified, some
constraints learned during PB search are also passed to the LP solver. If
'--lp-cut-learned=0' is set, then the LP solver only runs on the original
input formula. The default is 1.
If the LP solver finds a rational solution and the option '--lp-cut-gomory=1'
is set (which it is by default), then Gomory cuts are added to the LP
relaxation as new constraints that remove this rational solution;
'--lp-cut-gomory=0' switches this off. The option '--lp-cut-maxcos=VAL'
determines how parallel (i.e., similar) different cuts are allowed to be,
as measured by the cosine function. This is a float between 0 and 1 with
default value 0.1, and the smaller the value, the more "different" the
added cut constraints have to be. The option '--lp-cut-gomlim=NUM'
determines how many Gomory cuts can be added in one go; the default is 100.
[As a side remark, Gomory cuts are super-important for MIP solvers but
don't seem to make too much of a difference for RoundingSat. There should be
room for improving this, but we have not had the time and resources to look
into it.]
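The parallelism test can be pictured as follows (our own sketch of the idea, not RoundingSat's code): cuts are compared via the cosine of the angle between their coefficient vectors, and a new cut is only kept if it is sufficiently different from all previously accepted ones:

```python
import math

def cut_cosine(a, b):
    """Cosine similarity between the coefficient vectors of two cuts;
    values near 1 mean nearly parallel (nearly redundant) cuts."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def accept_cut(new, accepted, maxcos=0.1):
    """Keep a new cut only if its cosine against every accepted cut
    stays below the threshold (mimicking the role of maxcos)."""
    return all(abs(cut_cosine(new, c)) <= maxcos for c in accepted)
```

An orthogonal cut (cosine 0) always passes, while a near-copy of an existing cut (cosine close to 1) is rejected; lowering maxcos makes the filter stricter.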
RoundingSat with LP integration can also be used for optimization, in which
case the best setting is probably '--opt-mode=linear'.
PSEUDO-BOOLEAN SOLVING WITH CORE-GUIDED SEARCH
For optimization problems, RoundingSat also has a version incorporating
different forms of core-guided search. There are different optimization
modes which are set by the option '--opt-mode=OPT', where OPT can be one of
linear, coreguided, coreboosted, or hybrid. Hybrid mode is the default.
In our experience so far, core-guided search does not play well with LP
integration, so in what follows one should always have '--lp=0'. But it is
also possible to have '--lp=1' --- it just does not seem very good at
present, but we did not really work on tuning this.
[This is another problem that could be interesting to investigate, although
there are also much more exciting problems related to techniques that are
currently not implemented in RoundingSat at all.]
Some settings to try for core-guided search are:
(a) Default settings, i.e., '--opt-mode=hybrid --cg-hybrid=0.50'
(b) '--opt-mode=hybrid --cg-hybrid=FRAC' for 0 < FRAC < 1
(c) Whatever was best before plus '--cg-indcores=0 --cg-strat=0'
(d) Whatever was best before plus '--cg-indcores=1 --cg-strat=1'
(e) '--opt-mode=coreboosted --cg-boost=[10% of total running time]'
(assuming that there is a time-out limit)
(f) And, just for the record, the best of the above options should be
compared against '--opt-mode=linear --lp=1', which often seems to be
competitive with or better than the best core-guided variant.
The hybrid mode switches between core-guided search focused on improving
the lower bound and linear search trying to find better solutions and
improve the upper bound. (Recall that in the OPB format optimization
problems are always written as minimization problems.)
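The linear (solution-improving) half of this loop can be sketched with a toy brute-force oracle (illustration only; a real solver replaces pb_solve with conflict-driven search): every time a solution is found, a constraint demanding a strictly better objective value is added, until the instance becomes infeasible:

```python
from itertools import product

def pb_solve(n, constraints):
    """Toy oracle: brute-force a 0-1 assignment satisfying constraints
    of the form sum(c_i * x_i) >= rhs. Returns a tuple or None."""
    for x in product((0, 1), repeat=n):
        if all(sum(c * xi for c, xi in zip(coefs, x)) >= rhs
               for coefs, rhs in constraints):
            return x
    return None

def linear_search(n, constraints, objective):
    """Solution-improving search: after each solution, demand a
    strictly smaller objective value and solve again."""
    cons, best = list(constraints), None
    while True:
        x = pb_solve(n, cons)
        if x is None:
            return best  # last solution found is optimal
        best = x
        val = sum(c * xi for c, xi in zip(objective, x))
        # objective <= val - 1, rewritten as sum(-c_i * x_i) >= -(val - 1)
        cons.append(([-c for c in objective], -(val - 1)))
```

Minimizing x1+x2+x3 subject to x1+x2 >= 1 and x2+x3 >= 1, the loop terminates with the optimal solution setting only x2, of objective value 1.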
The option '--cg-hybrid' decides what fraction of the running time will be
dedicated to core-guided search (though the time measure is a very rough
pseudo-time measure, so this is not exact in any way, and a 50-50 split
tends to give less time to core-guided search than to linear search in
practice). We have tried values for FRAC like 0.25, 0.5, or 0.75, which are
all reasonable.
The option '--cg-strat=0/1' switches off or on stratification, which means
that the solver will focus on the big coefficients in the objective
function first, and might therefore also find upper-bounding solutions
during the "core-guided" phase. The default is 1, i.e., stratification
on. In general, I believe that you might want a larger value for
'--cg-hybrid=FRAC' when '--cg-strat' is switched on, because we get less
"pure core-guided" search when stratification is on, and so it might be
good to increase the FRAC value to compensate for this. When stratification
is off, lower values like FRAC=0.25 or so might be better.
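Stratification itself is simple to picture (a sketch of the idea, not the solver's code): the objective literals are grouped by coefficient magnitude, and the strata are processed from the largest coefficients down:

```python
def strata(objective):
    """Group objective literals, given as (literal, coefficient) pairs,
    by coefficient magnitude, largest first (a simplified model of
    stratification)."""
    by_coef = {}
    for lit, coef in objective:
        by_coef.setdefault(abs(coef), []).append(lit)
    return [by_coef[c] for c in sorted(by_coef, reverse=True)]
```

For an objective 5*x1 + 1*x2 + 5*x3 this yields the strata [x1, x3] and then [x2], so the search first concentrates on the expensive literals.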
The option '--cg-indcores=1' switches on independent cores. In our
experiments so far, this has not tended to be very successful. One issue
here, though, might be that once you switch independent cores on, the way
we have it implemented is that it stays on forever. We should probably
change this so that it is switched off after a while. Also, we have
recently figured out that we apparently implemented this slightly
differently from Berg, Järvisalo et al. in Helsinki, and this might be a
reason why they get this to work better in their MaxSAT solvers than we do
in RoundingSat. (Technically speaking, RoundingSat looks for "disjoint cores",
which is a bit more restrictive than what Berg et al. mean by "independent
cores".)
The setting '--cg-boost=T' runs core-guided search for T seconds initially,
and then switches to linear search for the rest of the execution. We have
seen that this can sometimes be quite successful for T being something like
10% of the time-out limit. It is not clear whether this is better than doing,
say, '--opt-mode=hybrid --cg-hybrid=0.20' instead, though.
If you have read this far, and try out a reasonable subset of the
combinations discussed above, then you probably get pretty close to
squeezing out the best performance that RoundingSat can currently deliver.
The next section on arithmetic precision could potentially also be quite
important, depending on the concrete applications, but further down the
discussions will get increasingly technical (and probably less
relevant).
SETTINGS FOR ARITHMETIC PRECISION
Very roughly, the default setting of RoundingSat keeps coefficients of
constraints in (almost) 32 bits and does the conflict analysis in (almost)
64 bits, except for a couple of bits sacrificed for simplicity and
efficiency. For some applications, such as arithmetic circuit verification,
we have seen that higher precision, or even arbitrary precision, for the
coefficients can be crucial for good performance.
The options governing arithmetic precision are as follows (where specifying
the value 0 will give arbitrary-precision arithmetic):
'--bits-overflow=BO': number of bits during conflict analysis. The default
is BO=62.
'--bits-reduced=BR': coefficients are reduced to this bit size when they
exceed size BO. The default is 29. (This should be smaller than BO by some
margin, to avoid hitting overflow again and again.)
'--bits-learned=BL': the number of bits in constraints that are learned and
stored in the constraint database.
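One sound way to do such a reduction can be sketched as follows (our own illustration, not the exact procedure in RoundingSat): a normalized constraint sum(a_i * x_i) >= degree with nonnegative coefficients is divided by a power of two with everything rounded up; since each rounded-up coefficient dominates the original quotient and the left-hand side is integral, the result is implied by the original constraint (it may just be weaker):

```python
def reduce_bits(coeffs, degree, target_bits):
    """Divide a normalized constraint sum(a_i * x_i) >= degree
    (all a_i >= 0) by a power of two, rounding up, so that every
    coefficient fits in target_bits bits. The resulting constraint
    is implied by the original one."""
    shift = max(0, max(coeffs).bit_length() - target_bits)
    rho = 1 << shift
    ceil_div = lambda a: -(-a // rho)  # integer ceiling division
    return [ceil_div(a) for a in coeffs], ceil_div(degree)
```

For example, 1000*x1 + 3*x2 + 700*x3 >= 900 squeezed into 8 bits is divided by 4, yielding 250*x1 + 1*x2 + 175*x3 >= 225.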
In addition to the default, two settings that could perhaps be interesting
are:
(a) Unlimited precision: '--bits-overflow=0 --bits-reduced=0 --bits-learned=0'
(b) Learned constraints in limited precision but conflict analysis in
unlimited precision: '--bits-overflow=0 --bits-reduced=0 --bits-learned=29'
SOME MORE IN-DEPTH TECHNICAL SETTINGS FOR CORE-GUIDED SEARCH
'--cg-solutionphase=1': Switch off standard phase saving during linear
search and instead fix the phase to agree with the incumbent solution
(i.e., the best solution found so far). This is the default. By giving the
option '--cg-solutionphase=0' one instead gets standard phase saving as in
CDCL SAT solvers.
'--cg-encoding=reified' uses a different encoding of lazy variables during
core-guided search that gives stronger propagation. This should arguably be
the default setting, but the default is currently '--cg-encoding=lazysum'
instead. We have not seen any truly major impact here, though.
'--cg-resprop=1' resolves away propagated assumption variables instead of
leaving them in the core during core-guided search. Again, this should
probably be the default, but the current default is '--cg-resprop=0',
i.e., leaving all assumption variables in the core, regardless of whether
they were assumed or propagated.
SOME TECHNICAL SETTINGS INHERITED FROM CDCL SOLVERS
RoundingSat currently only supports static, Luby-style restarts. To adapt
the restart frequency, use '--luby-base=BASE' to change the base of the
Luby restart sequence (the default is BASE=2) and '--luby-mult=FAC' to
change the multiplicative factor (the default is FAC=100).
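For reference, the Luby sequence can be computed as follows (a standard implementation in the style of MiniSat's luby() function; how exactly RoundingSat combines it with BASE and FAC is our reading of the option names, so treat the interval computation as an assumption):

```python
def luby(base, i):
    """The i-th element (0-indexed) of the Luby sequence with the
    given base: 1, 1, base, 1, 1, base, base**2, ..."""
    size, seq = 1, 0
    while size < i + 1:       # find the subsequence containing index i
        seq += 1
        size = 2 * size + 1
    while size - 1 != i:      # descend to the exact position within it
        size = (size - 1) // 2
        seq -= 1
        i %= size
    return base ** seq

# Presumed restart intervals with the defaults BASE=2, FAC=100:
intervals = [100 * luby(2, i) for i in range(7)]
```

With the defaults this gives conflict budgets 100, 100, 200, 100, 100, 200, 400, and so on between restarts.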
As mentioned already, one can use '--vsids-var=VDECAY' to set the VSIDS
variable decay. The default 0.95 is inherited from MiniSat. More modern
solvers tend to have lower values like 0.8, or values that change slightly
over time (which is not supported by RoundingSat).