NOTES ON GENERATING PSEUDO-BOOLEAN BENCHMARKS AND RUNNING PSEUDO-BOOLEAN SOLVERS
================================================================================
These notes explain some of the basics of how to generate
pseudo-Boolean benchmarks and how to run the RoundingSat pseudo-Boolean
solver. We should point out that RoundingSat is still very much research
software, and we would be most grateful if you could report any
problems you run into to Jakob Nordstrom (jn@di.ku.dk) and Andy Oertel
(andy.oertel@cs.lth.se).
INSTALLATION OF CNFGEN FOR GENERATING BENCHMARKS
To generate benchmark instances you can use the tool CNFgen. You need to
have Python installed to run CNFgen. The easiest way to install CNFgen is
through pip. When you have Python and pip installed, just type
pip3 install --user cnfgen
in a terminal window to install. Now you should be able to run 'cnfgen
--help' or 'pbgen --help' for options and an explanation of how to use these
two tools.
By default, CNFgen just prints the generated CNF formula on standard
output. To write the formula to a file, use the '--output' option. As an
example, the command
cnfgen --output out.cnf php 10 9
generates the well-known pigeonhole principle formula claiming that 10
pigeons fit into 9 holes although every pigeon wants a hole of its
own. Note, though, that the command line above generates the CNF formula in
DIMACS format --- in order to make use of pseudo-Boolean solvers, we will
instead be interested in pseudo-Boolean formulas that use the full power of
0-1 integer linear inequalities to achieve more compact and expressive
encodings.
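To make the contrast concrete, here is a small Python sketch (our own illustration, not CNFgen's or pbgen's actual output, and with our own variable numbering) of what a pigeonhole instance could look like as 0-1 linear inequalities in OPB format:

```python
# Hypothetical generator of a pigeonhole instance in OPB format.
# Variable x_{i,j} means "pigeon i sits in hole j" and is numbered
# i*holes + j + 1.

def php_opb(pigeons, holes):
    """Return a pigeonhole instance as a string in OPB format."""
    var = lambda i, j: i * holes + j + 1
    cons = []
    # Every pigeon sits in some hole: sum_j x_{i,j} >= 1.
    for i in range(pigeons):
        cons.append(" ".join(f"+1 x{var(i, j)}" for j in range(holes))
                    + " >= 1 ;")
    # Every hole takes at most one pigeon, normalized to >= form:
    # -x_{0,j} - ... - x_{pigeons-1,j} >= -1.
    for j in range(holes):
        cons.append(" ".join(f"-1 x{var(i, j)}" for i in range(pigeons))
                    + " >= -1 ;")
    header = f"* #variable= {pigeons * holes} #constraint= {len(cons)}"
    return "\n".join([header] + cons)
```

For 3 pigeons and 2 holes this yields 5 constraints over 6 variables; note how the "at most one pigeon per hole" constraints need no auxiliary clauses, unlike a pure CNF encoding with larger coefficients would.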
CNFgen can also be installed from its GitHub repository
(https://github.com/MassimoLauria/cnfgen). First, make sure you have Python
installed. Then clone the repository and run
python3 setup.py install --user
inside the cloned directory to install CNFgen.
SAT4J AND IMPLICIT HITTING SET PSEUDO-BOOLEAN SOLVERS
Below follows some information about two other solvers that will be
mentioned briefly in the lectures in addition to RoundingSat, which is the
solver developed by our research group.
Arguably the most well-known and widely used pseudo-Boolean solver is
Sat4j. More information about Sat4j can be found at https://www.sat4j.org/,
and the solver can be downloaded from
https://gitlab.ow2.org/sat4j/sat4j/-/releases/. Sat4j is written in Java,
which causes a bit of a performance hit. For many benchmarks (but far from
all) it is significantly slower than RoundingSat, due also to how conflict
analysis has been implemented (which will be discussed briefly in the
lectures).
The Constraint Reasoning and Optimization (CoReO) group in Helsinki has
developed an implicit hitting set (IHS) based solver on top of
RoundingSat. This solver is available at
https://bitbucket.org/coreo-group/pbo-ihs-solver/, where you can also find
installation instructions.
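The implicit hitting set idea itself can be sketched in a few lines of Python. This is a toy model, not the CoReO solver: the oracle call is stubbed by a fixed hidden family of cores, and the minimum-cost hitting set is found by brute force, whereas a real IHS solver would query a PB oracle under assumptions and use an IP solver for the hitting sets:

```python
from itertools import combinations

def min_hitting_set(cores, universe, cost):
    """Brute-force a minimum-cost hitting set of the given cores
    (a real IHS solver would call an IP solver here)."""
    best_cost, best = float("inf"), set()
    for r in range(len(universe) + 1):
        for subset in map(set, combinations(universe, r)):
            if all(subset & core for core in cores):
                c = sum(cost[x] for x in subset)
                if c < best_cost:
                    best_cost, best = c, subset
    return best

def ihs(hidden_cores, universe, cost):
    """Skeleton of the IHS loop. The oracle is stubbed by checking a
    fixed hidden core family; a real solver would instead run the PB
    oracle under assumptions and extract a fresh core on failure."""
    found = []
    while True:
        hs = min_hitting_set(found, universe, cost)
        violated = next((c for c in hidden_cores if not (hs & c)), None)
        if violated is None:
            return hs  # hits every core: corresponds to an optimum
        found.append(violated)
```

With cores {1,2} and {2,3} over unit-cost elements, the loop converges to the hitting set {2}, which is the cheapest way to "pay" for all cores.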
EXECUTIVE SUMMARY FOR THE PSEUDO-BOOLEAN SOLVER ROUNDINGSAT
Below follows an explanation of the main different ways of running the
RoundingSat pseudo-Boolean solver.
(1) BASIC VERSION
For decision problems: --lp=0
For optimization problems: --lp=0 --opt-mode=linear
[but this pure solution-improving search is not very good]
(2) WITH LINEAR PROGRAMMING (LP) SOLVER INTEGRATION
For decision problems: --lp=1
For optimization problems: --lp=1 --opt-mode=linear
Uses the LP solver SoPlex from the SCIP suite (if that solver is
available), and so is subject to the SoPlex license. Trying it out for
research or evaluation purposes is totally fine, however. (SoPlex has to be
installed separately, but is included in the Docker image that we
provide.)
(3) WITH CORE-GUIDED OPTIMIZATION
Only relevant for optimization problems:
--lp=0 --opt-mode=hybrid --cg-strat=1 --cg-indcores=0
To this one can also try to add (while keeping the above options):
--cg-encoding=reified --cg-resprop=1
The above variants of the solver should already go pretty far. Note that
the recommended use is to switch on *either* LP integration *or*
core-guided techniques, but not both. They do not work well together. There
should be great potential for doing something about this, but we simply
have not had time to look into it.
For users who want to have more information, below follows a more detailed
technical write-up that discusses various options and what they mean. But
first, let us explain how to get an executable version of the solver.
INSTALLATION OF ROUNDINGSAT
RoundingSat can be installed as a Docker image (Windows, MacOS and Linux),
downloaded as a compiled binary (Linux only), or compiled from source
(Linux only).
To install the RoundingSat Docker image from Docker Hub, make sure to have
Docker installed (https://docs.docker.com/engine/install/). If you
installed Docker Desktop, then open it and enter
docker run aoer/roundingsat
to run the Docker image. You can pass any RoundingSat option by just
appending it. For instance,
docker run aoer/roundingsat --help
prints out a list of the options that RoundingSat supports with some
(perhaps somewhat terse) explanations of what they mean.
To run the solver on a benchmark formula stored in a file, use
docker run -v /path/to/instance:/instance aoer/roundingsat /instance/filename.opb
to mount the directory /path/to/instance as a directory /instance that the
Docker container can access.
An example of how to run RoundingSat with some options is
docker run -v /path/to/instance:/instance aoer/roundingsat --lp=0 --opt-mode=linear /instance/filename.opb
RoundingSat can also be downloaded as a compiled binary. A version of
RoundingSat with support for arbitrary-precision arithmetic can be found at
https://lu.box.com/s/sx3yos861l3onfqlbujuri4oz8y7igsg
Another version that interleaves the pseudo-Boolean conflict-driven search
with solving LP relaxations (inspired by MIP solvers), and that does so by
using the SoPlex solver from SCIP, is at
https://lu.box.com/s/hbel34r2khv0bzgz3e73i7qyzi32l22r
It is worth noting that SoPlex has a fairly strict license regarding usage,
especially for commercial actors, but according to what the SCIP team has
told us, in practice everything should be OK as long as the SoPlex
integration is used for research purposes only and not for commercial gain.
You can also compile RoundingSat from source by going to the repository
https://gitlab.com/MIAOresearch/software/roundingsat
and following the instructions in README.md.
You can run RoundingSat with the option '--help' to get a full list of
options. Below follows a discussion of some particular settings that might
be interesting to investigate.
One setting that should probably be default, but is not, is emulating
standard VSIDS behaviour of CDCL solvers with '--vsids-var=0.8'. This gives
the VSIDS variable decay factor used by, e.g., Glucose. The current default
of RoundingSat is instead the MiniSat value of 0.95, which is often
slightly worse. So, in what follows below, you might always want to include
'--vsids-var=0.8' also (though the impact is often not huge).
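As a toy model of why the decay factor matters (a simplification for illustration, not RoundingSat's actual bookkeeping): under geometric decay, a bump from t conflicts ago contributes with weight decay^t, so a lower decay like 0.8 forgets old conflicts much faster than 0.95:

```python
def vsids_score(bump_times, now, decay):
    """Activity of a variable bumped at the given conflict indices,
    under per-conflict geometric decay (a simplified VSIDS model)."""
    return sum(decay ** (now - t) for t in bump_times)

# A bump from 50 conflicts ago has essentially vanished with decay 0.8
# (0.8**50 is about 1.4e-5) but still carries noticeable weight with
# decay 0.95 (0.95**50 is about 0.077).
```

So decay 0.8 focuses branching much more sharply on recent conflicts, which for many instances is the better heuristic.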
PSEUDO-BOOLEAN SOLVING WITH LINEAR PROGRAMMING INTEGRATION
Especially for decision problems, it can be interesting to try RoundingSat
with LP integration. The option '--lp=RATIO' switches LP integration on if
RATIO > 0. RATIO is (roughly) the ratio between the number of pivots made
by the simplex algorithm in the LP solver and the number of conflicts
encountered during conflict-driven pseudo-Boolean search. The default value
is 1. The higher this value is, the more time is given to the LP solver as
compared to the PB solver.
Some settings to try for PB solving with LP integration (i.e., with
'--lp=RATIO' for RATIO > 0) are:
(a) '--lp-cut-learned=0'
(b) '--lp-cut-gomory=0'
(c) '--lp-cut-gomory=1' but for different values of VAL in the option
'--lp-cut-maxcos=VAL' and potentially also for different values of NUM in
'--lp-cut-gomlim=NUM'.
By way of explanation, if '--lp-cut-learned=1' is specified, some
constraints learned during PB search are also passed to the LP solver. If
'--lp-cut-learned=0' is set, then the LP solver only runs on the original
input formula. The default is 1.
If the LP solver finds a rational solution and the option '--lp-cut-gomory=1'
is set (which it is by default), then Gomory cuts are added to the LP
relaxation as new constraints that remove this rational solution;
'--lp-cut-gomory=0' switches this off. The option '--lp-cut-maxcos=VAL'
determines how parallel (i.e., similar) different cuts are allowed to be,
as measured by the cosine function. This is a float between 0 and 1 with
default value 0.1, and the smaller the value, the more "different" the
added cut constraints have to be. The option '--lp-cut-gomlim=NUM'
determines how many Gomory cuts can be added in one go; the default is 100.
[As a side remark, Gomory cuts are super-important for MIP solvers but
don't seem to make too much of a difference for RoundingSat. There should be
room for improving this, but we have not had the time and resources to look
into it.]
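The parallelism test can be pictured as follows (our own sketch of the idea, not RoundingSat's code): cuts are compared via the cosine of the angle between their coefficient vectors, and a new cut is only kept if it is sufficiently different from all previously accepted ones:

```python
import math

def cut_cosine(a, b):
    """Cosine similarity between the coefficient vectors of two cuts;
    values near 1 mean nearly parallel (nearly redundant) cuts."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def accept_cut(new, accepted, maxcos=0.1):
    """Keep a new cut only if its cosine against every accepted cut
    stays below the threshold (mimicking the role of maxcos)."""
    return all(abs(cut_cosine(new, c)) <= maxcos for c in accepted)
```

An orthogonal cut (cosine 0) always passes, while a near-copy of an existing cut (cosine close to 1) is rejected; lowering maxcos makes the filter stricter.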
RoundingSat with LP integration can also be used for optimization, in which
case the best setting is probably '--opt-mode=linear'.
PSEUDO-BOOLEAN SOLVING WITH CORE-GUIDED SEARCH
For optimization problems, RoundingSat also has a version incorporating
different forms of core-guided search. There are different optimization
modes which are set by the option '--opt-mode=OPT', where OPT can be one of
linear, coreguided, coreboosted, or hybrid. Hybrid mode is the default.
In our experience so far, core-guided search does not play well with LP
integration, so in what follows one should always have '--lp=0'. But it is
also possible to have '--lp=1' --- it just does not seem very good at
present, but we did not really work on tuning this.
[This is another problem that could be interesting to investigate, although
there are also much more exciting problems related to techniques that are
currently not implemented in RoundingSat at all.]
Some settings to try for core-guided search are:
(a) Default settings, i.e., '--opt-mode=hybrid --cg-hybrid=0.50'
(b) '--opt-mode=hybrid --cg-hybrid=FRAC' for 0 < FRAC < 1
(c) Whatever was best before plus '--cg-indcores=0 --cg-strat=0'
(d) Whatever was best before plus '--cg-indcores=1 --cg-strat=1'
(e) '--opt-mode=coreboosted --cg-boost=[10% of total running time]'
(assuming that there is a time-out limit)
(f) And, just for the record, the best of the above options should be
compared against '--opt-mode=linear --lp=1', which often seems to be
competitive with or better than the best core-guided variant.
The hybrid mode switches between core-guided search focused on improving
the lower bound and linear search trying to find better solutions and
improve the upper bound. (Recall that in the OPB format optimization
problems are always written as minimization problems.)
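The linear (solution-improving) half of this loop can be sketched with a toy brute-force oracle (illustration only; a real solver replaces pb_solve with conflict-driven search): every time a solution is found, a constraint demanding a strictly better objective value is added, until the instance becomes infeasible:

```python
from itertools import product

def pb_solve(n, constraints):
    """Toy oracle: brute-force a 0-1 assignment satisfying constraints
    of the form sum(c_i * x_i) >= rhs. Returns a tuple or None."""
    for x in product((0, 1), repeat=n):
        if all(sum(c * xi for c, xi in zip(coefs, x)) >= rhs
               for coefs, rhs in constraints):
            return x
    return None

def linear_search(n, constraints, objective):
    """Solution-improving search: after each solution, demand a
    strictly smaller objective value and solve again."""
    cons, best = list(constraints), None
    while True:
        x = pb_solve(n, cons)
        if x is None:
            return best  # last solution found is optimal
        best = x
        val = sum(c * xi for c, xi in zip(objective, x))
        # objective <= val - 1, rewritten as sum(-c_i * x_i) >= -(val - 1)
        cons.append(([-c for c in objective], -(val - 1)))
```

Minimizing x1+x2+x3 subject to x1+x2 >= 1 and x2+x3 >= 1, the loop terminates with the optimal solution setting only x2, of objective value 1.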
The option '--cg-hybrid' decides what fraction of the running time will be
dedicated to core-guided search (though the time measure is a very rough
pseudo-time measure, so this is not exact in any way, and a 50-50 split
tends to give less time to core-guided search than to linear search in
practice). We have tried values for FRAC like 0.25, 0.5, or 0.75, which are
all reasonable.
The option '--cg-strat=0/1' switches off or on stratification, which means
that the solver will focus on the big coefficients in the objective
function first, and might therefore also find upper-bounding solutions
during the "core-guided" phase. The default is 1, i.e., stratification
on. In general, I believe that you might want a larger value for
'--cg-hybrid=FRAC' when '--cg-strat' is switched on, because we get less
"pure core-guided" search when stratification is on, and so it might be
good to increase the FRAC value to compensate for this. When stratification
is off, lower values like FRAC=0.25 or so might be better.
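Stratification itself is simple to picture (a sketch of the idea, not the solver's code): the objective literals are grouped by coefficient magnitude, and the strata are processed from the largest coefficients down:

```python
def strata(objective):
    """Group objective literals, given as (literal, coefficient) pairs,
    by coefficient magnitude, largest first (a simplified model of
    stratification)."""
    by_coef = {}
    for lit, coef in objective:
        by_coef.setdefault(abs(coef), []).append(lit)
    return [by_coef[c] for c in sorted(by_coef, reverse=True)]
```

For an objective 5*x1 + 1*x2 + 5*x3 this yields the strata [x1, x3] and then [x2], so the search first concentrates on the expensive literals.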
The option '--cg-indcores=1' switches on independent cores. In our
experiments so far, this has not tended to be very successful. One issue
here, though, might be that once you switch independent cores on, the way
we have it implemented is that it stays on forever. We should probably
change this so that it is switched off after a while. Also, we have
recently figured out that we apparently implemented this slightly
differently from Berg, Järvisalo et al. in Helsinki, and this might be a
reason why they get this to work better in their MaxSAT solvers than we do
in RoundingSat. (Technically speaking, RoundingSat looks for "disjoint cores",
which is a bit more restrictive than what Berg et al. mean by "independent
cores".)
The setting '--cg-boost=T' runs core-guided search for T seconds initially,
and then switches to linear search for the rest of the execution. We have
seen that this can sometimes be quite successful for T being something like
10% of the time-out limit. It is not clear whether this is better than doing,
say, '--opt-mode=hybrid --cg-hybrid=0.20' instead, though.
If you have read this far, and try out a reasonable subset of the
combinations discussed above, then you probably get pretty close to
squeezing out the best performance that RoundingSat can currently deliver.
The next section on arithmetic precision could potentially also be quite
important, depending on the concrete applications, but further down the
discussions will get increasingly technical (and probably less
relevant).
SETTINGS FOR ARITHMETIC PRECISION
Very roughly, the default setting of RoundingSat keeps coefficients of
constraints in (almost) 32 bits and does the conflict analysis in (almost)
64 bits, except for a couple of bits sacrificed for simplicity and
efficiency. For some applications, such as arithmetic circuit verification,
we have seen that higher precision, or even arbitrary precision, for the
coefficients can be crucial for good performance.
The options governing arithmetic precision are as follows (where specifying
the value 0 will give arbitrary-precision arithmetic):
'--bits-overflow=BO': number of bits during conflict analysis. The default
is BO=62.
'--bits-reduced=BR': coefficients are reduced to this bit size when they
exceed size BO. The default is 29. (This should be smaller than BO by some
margin, to avoid hitting overflow again and again.)
'--bits-learned=BL': the number of bits in constraints that are learned and
stored in the constraint database.
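One sound way to do such a reduction can be sketched as follows (our own illustration, not the exact procedure in RoundingSat): a normalized constraint sum(a_i * x_i) >= degree with nonnegative coefficients is divided by a power of two with everything rounded up; since each rounded-up coefficient dominates the original quotient and the left-hand side is integral, the result is implied by the original constraint (it may just be weaker):

```python
def reduce_bits(coeffs, degree, target_bits):
    """Divide a normalized constraint sum(a_i * x_i) >= degree
    (all a_i >= 0) by a power of two, rounding up, so that every
    coefficient fits in target_bits bits. The resulting constraint
    is implied by the original one."""
    shift = max(0, max(coeffs).bit_length() - target_bits)
    rho = 1 << shift
    ceil_div = lambda a: -(-a // rho)  # integer ceiling division
    return [ceil_div(a) for a in coeffs], ceil_div(degree)
```

For example, 1000*x1 + 3*x2 + 700*x3 >= 900 squeezed into 8 bits is divided by 4, yielding 250*x1 + 1*x2 + 175*x3 >= 225.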
In addition to the default, two settings that could perhaps be interesting
are:
(a) Unlimited precision: '--bits-overflow=0 --bits-reduced=0 --bits-learned=0'
(b) Learned constraints in limited precision but conflict analysis in
unlimited precision: '--bits-overflow=0 --bits-reduced=0 --bits-learned=29'
SOME MORE IN-DEPTH TECHNICAL SETTINGS FOR CORE-GUIDED SEARCH
'--cg-solutionphase=1': Switch off standard phase saving during linear
search and instead fix the phase to agree with the incumbent solution
(i.e., the best solution found so far). This is the default. By giving the
option '--cg-solutionphase=0' one instead gets standard phase saving as in
CDCL SAT solvers.
'--cg-encoding=reified' uses a different encoding of lazy variables during
core-guided search that gives stronger propagation. This should arguably be
the default setting, but the default is currently '--cg-encoding=lazysum'
instead. We have not seen any truly major impact here, though.
'--cg-resprop=1' resolves away propagated assumption variables instead of
leaving them in the core during core-guided search. Again, this should
probably be the default, but the current default is '--cg-resprop=0',
i.e., leaving all assumption variables in the core, regardless of whether
they were assumed or propagated.
SOME TECHNICAL SETTINGS INHERITED FROM CDCL SOLVERS
RoundingSat currently only supports static, Luby-style restarts. To adapt
the restart frequency, use '--luby-base=BASE' to change the base of the
Luby restart sequence (the default is BASE=2) and '--luby-mult=FAC' to
change the multiplicative factor (the default is FAC=100).
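For reference, the Luby sequence can be computed as follows (a standard implementation in the style of MiniSat's luby() function; how exactly RoundingSat combines it with BASE and FAC is our reading of the option names, so treat the interval computation as an assumption):

```python
def luby(base, i):
    """The i-th element (0-indexed) of the Luby sequence with the
    given base: 1, 1, base, 1, 1, base, base**2, ..."""
    size, seq = 1, 0
    while size < i + 1:       # find the subsequence containing index i
        seq += 1
        size = 2 * size + 1
    while size - 1 != i:      # descend to the exact position within it
        size = (size - 1) // 2
        seq -= 1
        i %= size
    return base ** seq

# Presumed restart intervals with the defaults BASE=2, FAC=100:
intervals = [100 * luby(2, i) for i in range(7)]
```

With the defaults this gives conflict budgets 100, 100, 200, 100, 100, 200, 400, and so on between restarts.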
As mentioned already, one can use '--vsids-var=VDECAY' to set the VSIDS
variable decay. The default 0.95 is inherited from MiniSat. More modern
solvers tend to have lower values like 0.8, or values that change slightly
over time (which is not supported by RoundingSat).