All search methods in DISCOVER are designed to find an approximate or exact solution to the same fundamental problem. Given a large set of $M$ candidate symbolic features $\Phi = \{\Phi_1, \Phi_2, \dots, \Phi_M\}$ and a target property vector $\mathbf{y}$, the goal is to find a linear model with at most $D$ features that minimizes the prediction error. This is the $L_0$-norm regularized least-squares problem:
\[\min_{\beta} \left\| \mathbf{y} - \Phi \beta \right\|_2^2 \quad \text{subject to} \quad \left\| \beta \right\|_0 \le D \tag{1}\]
where $\|\beta\|_0$ counts the number of nonzero elements in the coefficient vector $\beta$. This problem is NP-hard, necessitating the use of exact, heuristic, approximate, or specialized algorithms.
The following methods are implemented in search.py to select the optimal subset of $D$ features from the screened candidates.
Brute-Force Search (_find_best_models_brute_force)

This is the most straightforward and exhaustive approach. It systematically evaluates every possible combination of \(D\) features from the pool of \(M\) candidates.
Generate feature combinations
For a given dimension \(D\), generate all \(\binom{M}{D}\) unique combinations of features.
Form feature matrix
For each combination, form a feature matrix \(X_D\).
Fit an Ordinary Least Squares (OLS) model
Solve for the coefficients \(\beta\):
\[\beta = X_D^\dagger \mathbf{y}\]
(the pseudo-inverse is used if \(X_D\) is rank-deficient)
Compute model error
Calculate the Root Mean Squared Error (RMSE):
\[\text{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}\]
or, equivalently,
\[\text{RMSE} = \sqrt{\frac{1}{n} \|\mathbf{y} - X_D\beta\|_2^2}\]
Select the best model
The combination that yields the minimum RMSE is selected as the best model for dimension \(D\).
This method guarantees finding the globally optimal set of \(D\) symbolic features from the screened pool.
However, its computational cost of \(O\!\left(\binom{M}{D}\right)\) model fits makes it practical only for small \(M\) and \(D\).
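The steps above can be sketched in a few lines of NumPy (a minimal illustration, not the actual `search.py` implementation):

```python
import itertools

import numpy as np

def brute_force_search(Phi, y, D):
    """Evaluate every D-feature OLS model and return the best one.

    Phi : (n, M) array of candidate features; y : (n,) target vector.
    Returns (indices, coefficients, rmse) of the lowest-RMSE combination.
    """
    n, M = Phi.shape
    best_combo, best_beta, best_rmse = None, None, np.inf
    for combo in itertools.combinations(range(M), D):
        X = Phi[:, combo]
        # lstsq falls back to the pseudo-inverse, so rank-deficient X is handled
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        rmse = np.sqrt(np.mean((y - X @ beta) ** 2))
        if rmse < best_rmse:
            best_combo, best_beta, best_rmse = combo, beta, rmse
    return best_combo, best_beta, best_rmse
```

For \(M = 1000\) and \(D = 3\) this already requires roughly \(1.7 \times 10^8\) least-squares fits, which is why the method is restricted to small problems.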
Greedy Forward Selection (_find_best_models_greedy)

This is a forward selection algorithm that builds the model one feature at a time. It is computationally efficient but may not find the global optimum.
Dimension 1
Find the single best feature \(\Phi_1^*\) by fitting \(M\) separate 1D models.
The feature with the lowest error is selected:
\[\Phi_1^* = \underset{\Phi_j}{\text{argmin}} \; \min_{\beta_j} \left\| \mathbf{y} - \Phi_j \beta_j \right\|_2^2\]
Dimension 2
Keep \(\Phi_1^*\) fixed.
Search through all remaining \(M - 1\) features to find the second feature \(\Phi_2^*\) that, when combined with \(\Phi_1^*\), yields the best 2D model:
\[\Phi_2^* = \underset{\Phi_j \ne \Phi_1^*}{\text{argmin}} \; \min_{\beta} \left\| \mathbf{y} - [\Phi_1^*, \Phi_j]\,\beta \right\|_2^2\]
Dimension D
Continue this process, adding at each step the one feature that provides the greatest improvement to the existing model, until a \(D\)-dimensional model is constructed.
The greedy search quickly identifies a good, but not necessarily optimal, combination of symbolic features. In each step, it selects the symbolic expression that best explains the variance not already explained by the previously selected features.
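A minimal sketch of this forward selection (assuming NumPy arrays; the real implementation adds screening and bookkeeping):

```python
import numpy as np

def greedy_search(Phi, y, D):
    """Forward selection: at each step, add the feature whose inclusion
    yields the lowest RMSE together with the features already chosen."""
    selected = []
    for _ in range(D):
        best_j, best_rmse = None, np.inf
        for j in range(Phi.shape[1]):
            if j in selected:
                continue
            X = Phi[:, selected + [j]]
            beta, *_ = np.linalg.lstsq(X, y, rcond=None)
            rmse = np.sqrt(np.mean((y - X @ beta) ** 2))
            if rmse < best_rmse:
                best_j, best_rmse = j, rmse
        selected.append(best_j)
    return selected
```

Note the cost: roughly \(M \times D\) least-squares fits instead of \(\binom{M}{D}\), at the price of possibly missing the global optimum.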
Orthogonal Matching Pursuit (_find_best_models_omp)

Orthogonal Matching Pursuit (OMP) is a more robust version of the greedy algorithm. Instead of just adding the next best feature, it recomputes the coefficients of all selected features at each step.
Initialize the residual and active set
Initialize the residual: \(\mathbf{r}_0 = \mathbf{y}\) and the selected feature set: \(\mathcal{S}_0 = \emptyset\)
Iterative selection (for \(k = 1, \dots, D\))
Find the most correlated feature
Select the feature \(\Phi_k^*\) from the remaining pool that is most correlated with the current residual \(\mathbf{r}_{k-1}\):
\[\Phi_k^* = \underset{\Phi_j \notin \mathcal{S}_{k-1}}{\text{argmax}} \left| \langle \mathbf{r}_{k-1}, \Phi_j \rangle \right|\]
Add the selected feature
Update the active set: \(\mathcal{S}_k = \mathcal{S}_{k-1} \cup \{\Phi_k^*\}\)
Compute new coefficients
Solve a least-squares problem using all features currently in the active set \(\mathcal{S}_k\):
\[\beta_k = \Phi_{\mathcal{S}_k}^\dagger \mathbf{y}\]
(least-squares solution via the pseudo-inverse)
Update the residual
\[\mathbf{r}_k = \mathbf{y} - \Phi_{\mathcal{S}_k} \beta_k\]
OMP provides a more stable path to a solution than a simple greedy search. By re-fitting the entire model at each step, it better accounts for correlations between the selected symbolic features, often leading to a more physically meaningful final equation.
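The loop above, sketched in NumPy (feature columns are assumed normalized so that plain inner products serve as correlations; this is an illustration, not the library code):

```python
import numpy as np

def omp_search(Phi, y, D):
    """Orthogonal Matching Pursuit: select the feature most correlated with
    the residual, then refit *all* active coefficients by least squares."""
    residual = y.copy()
    active, beta = [], None
    for _ in range(D):
        corr = np.abs(Phi.T @ residual)
        if active:
            corr[active] = -np.inf       # never reselect an active feature
        active.append(int(np.argmax(corr)))
        beta, *_ = np.linalg.lstsq(Phi[:, active], y, rcond=None)
        residual = y - Phi[:, active] @ beta
    return active, beta
```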
SISSO++ (_find_best_models_sisso_pp)

This is a highly efficient breadth-first search algorithm that avoids the combinatorial explosion of brute force by leveraging linear-algebra updates.
It maintains a “beam” of the best-performing models at each dimension.
Dimension 1
Evaluate all 1D models and retain the top \(N_{\text{beam}}\) models with the lowest residual sum of squares (RSS).
Dimension 2
For each of the \(N_{\text{beam}}\) models from \(D = 1\), try adding every other available feature.
The QR Update
The key to its efficiency is using QR decomposition.
If we have the decomposition \(X_D = Q_D R_D\) for a feature set, the RSS is easily calculated:
\[\text{RSS}_D = \|\mathbf{y}\|_2^2 - \|Q_D^T \mathbf{y}\|_2^2\]
To find the RSS for a new model with an added feature \(\mathbf{x}_{\text{new}}\), we don’t refit.
Instead, we compute the component \(\mathbf{w}_{\text{new}}\) of \(\mathbf{x}_{\text{new}}\) orthogonal to the space spanned by \(Q_D\):
\[\mathbf{w}_{\text{new}} = \mathbf{x}_{\text{new}} - Q_D Q_D^T \mathbf{x}_{\text{new}}\]
The reduction in RSS is then calculated directly:
\[\Delta \text{RSS} = \frac{(\mathbf{r}_D^T \mathbf{w}_{\text{new}})^2}{\|\mathbf{w}_{\text{new}}\|_2^2}\]
where the residual is defined as:
\[\mathbf{r}_D = \mathbf{y} - Q_D Q_D^T \mathbf{y}\]
Beam update
A new beam of the top \(N_{\text{beam}}\) 2D models is created, and the process repeats for \(D = 3, \dots, D_{\text{max}}\).
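The QR-update formulas can be verified numerically in a few lines of NumPy (a self-contained check, assuming a thin QR factorization of the current feature matrix):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 40
X = rng.normal(size=(n, 2))      # current feature matrix (D = 2)
x_new = rng.normal(size=n)       # candidate feature to add
y = rng.normal(size=n)

Q, _ = np.linalg.qr(X)           # thin QR: Q has orthonormal columns
r = y - Q @ (Q.T @ y)            # residual r_D of the current model
w = x_new - Q @ (Q.T @ x_new)    # part of x_new orthogonal to span(Q)
delta_rss = (r @ w) ** 2 / (w @ w)

def rss(A, b):
    """Residual sum of squares of the least-squares fit b ~ A @ beta."""
    beta, *_ = np.linalg.lstsq(A, b, rcond=None)
    return float(np.sum((b - A @ beta) ** 2))

# The cheap update agrees with a full refit of the extended model:
assert np.isclose(rss(X, y) - rss(np.column_stack([X, x_new]), y), delta_rss)
```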
SISSO++ explores a much wider range of feature combinations than a simple greedy search, without incurring the cost of brute-force.
It is highly effective at finding high-quality, low-dimensional symbolic models that might be missed by a purely sequential approach.
Stochastic Searches (_find_best_models_rmhc, _find_best_models_sa)

These are stochastic heuristic search algorithms that explore the solution space by making random changes to a candidate model.
Random-Mutation Hill Climbing (_find_best_models_rmhc): starting from a candidate feature set, it repeatedly applies a random mutation (e.g., swapping one selected feature for an unselected one) and keeps the change only if the model error decreases.
Simulated Annealing (_find_best_models_sa):
SA is similar to RMHC, but with a crucial difference: it can accept “downhill” moves (worse solutions) to escape local optima. The probability of accepting a worse solution with error increase $\Delta E$ is given by the Metropolis criterion:
\(P(\text{accept}) = e^{-\frac{\Delta E}{T}}\)
where $T$ is a “temperature” parameter that starts high (allowing many bad moves) and is gradually decreased (the “annealing schedule”), making the search converge towards a good solution.

These methods are excellent for refining a model found by a faster method such as a greedy search. They can escape the local optima that greedy methods are prone to, potentially swapping out a feature for another that, while worse on its own, enables a much better overall model when combined with the other $D-1$ features.
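A toy sketch of the annealing loop over feature subsets (pure standard library; the error function, move type, and cooling schedule here are illustrative assumptions, not DISCOVER’s defaults):

```python
import math
import random

def metropolis_accept(delta_e, T, rng):
    """Always accept improvements; accept a worse move with prob. exp(-dE/T)."""
    return delta_e <= 0 or rng.random() < math.exp(-delta_e / T)

def anneal(error, M, D, T0=1.0, cooling=0.95, steps=500, seed=0):
    """Simulated annealing over D-subsets of {0, ..., M-1}.
    Each move swaps one selected feature for an unselected one."""
    rng = random.Random(seed)
    current = rng.sample(range(M), D)
    e = error(current)
    best, best_e, T = list(current), e, T0
    for _ in range(steps):
        cand = list(current)
        cand[rng.randrange(D)] = rng.choice(
            [j for j in range(M) if j not in current])
        e_cand = error(cand)
        if metropolis_accept(e_cand - e, T, rng):
            current, e = cand, e_cand
            if e < best_e:
                best, best_e = list(current), e
        T *= cooling
    return best, best_e
```

With a toy error such as `lambda S: len(set(S) - {1, 3})`, the search reaches the zero-error subset `{1, 3}` even though some intermediate moves are uphill.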
Mixed-Integer Quadratic Programming (_find_best_models_miqp)

This method provides a mathematically rigorous, exact solution to the $L_0$ problem. It recasts the problem into a format that can be solved by specialized solvers such as Gurobi.
When applicable (for regression with $L_2$ loss) and computationally feasible, MIQP is the gold standard. It guarantees that the selected combination of $D$ symbolic features is the provably optimal one from the entire candidate pool, leaving no doubt: no better linear combination of $D$ features exists.
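One common way to pose Eq. (1) for such a solver is the big-$M$ reformulation (shown here as a sketch; the exact encoding used in search.py may differ):
\[\min_{\beta,\, z}\ \left\| \mathbf{y} - \Phi \beta \right\|_2^2 \quad \text{subject to} \quad -M_B\, z_j \le \beta_j \le M_B\, z_j, \qquad \sum_{j=1}^{M} z_j \le D, \qquad z_j \in \{0, 1\},\]
where each binary variable $z_j$ switches feature $j$ on or off and $M_B$ is a sufficiently large bound on the coefficients (written $M_B$ to avoid a clash with the feature count $M$). A quadratic objective with linear and binary constraints is exactly the form an MIQP solver accepts.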
Convex-Hull Greedy Search (_find_best_models_ch_greedy)

This is a specialized greedy search for the ch_classification task. The scoring function is not RMSE but a geometric measure of class separability.
This method finds a set of symbolic features that transform the original data into a new space where the different classes are maximally separable by geometric boundaries (convex hulls). The resulting “formula” is not a single equation for a property, but a set of coordinate transformations (the descriptors) that define this optimal classification space.
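As a rough 2D illustration of hull-based separability (pure standard library; DISCOVER’s actual scoring is more sophisticated, and these helper names are hypothetical):

```python
def cross(o, a, b):
    """2D cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in CCW order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for chain, seq in ((lower, pts), (upper, reversed(pts))):
        for p in seq:
            while len(chain) >= 2 and cross(chain[-2], chain[-1], p) <= 0:
                chain.pop()
            chain.append(p)
    return lower[:-1] + upper[:-1]

def inside(hull, p):
    """Point-in-convex-polygon test for a CCW hull (boundary counts)."""
    return len(hull) >= 3 and all(
        cross(hull[i], hull[(i + 1) % len(hull)], p) >= 0
        for i in range(len(hull)))

def hull_overlap(class_a, class_b):
    """Separability score: points of each class inside the other's hull.
    A score of 0 means the two classes are hull-separable."""
    ha, hb = convex_hull(class_a), convex_hull(class_b)
    return (sum(inside(hb, p) for p in class_a)
            + sum(inside(ha, p) for p in class_b))
```

The greedy search would then add, at each dimension, the descriptor that drives such an overlap score toward zero.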
Non-Linear Refinement (_refine_model_with_nlopt)

After a linear model has been identified by one of the search methods, this optional step can refine it by introducing non-linear parameters.
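A toy sketch of the idea (the exponential model form, parameter name, and scan-based optimizer are illustrative assumptions; the actual step optimizes the real model with NLopt):

```python
import numpy as np

def refine_scale(x, y, scales):
    """Refine a nonlinear scale parameter inside a single feature.

    Model: y ~ beta * exp(-x / s). For each candidate s the optimal beta
    has a closed form, so the 1D nonlinear search reduces to a scan.
    """
    best_s, best_beta, best_rmse = None, None, np.inf
    for s in scales:
        f = np.exp(-x / s)
        beta = float(f @ y) / float(f @ f)   # 1D least squares
        rmse = float(np.sqrt(np.mean((y - beta * f) ** 2)))
        if rmse < best_rmse:
            best_s, best_beta, best_rmse = s, beta, rmse
    return best_s, best_beta, best_rmse
```

In practice the linear coefficients are refit exactly as before, while only the few nonlinear parameters are handed to the nonlinear optimizer.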