bayesian
Bayesian Optimization: Select experiments from the predicted posterior and update the prior
Classes
BayesianOptimizer: A class that implements a Bayesian optimization algorithm.
- class obsidian.optimizer.bayesian.BayesianOptimizer(X_space: ParamSpace, surrogate: str | dict | list[str] | list[dict] = 'GP', seed: int | None = None, verbose: int = 1)
Bases: Optimizer
BayesianOptimizer is a class that implements a Bayesian optimization algorithm.
This class is used to optimize a given function by iteratively selecting the next set of input parameters based on the results of previous evaluations. It uses a surrogate model to approximate the underlying function and an acquisition function to determine the next set of parameters to evaluate.
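The loop described above (fit a surrogate, score candidates with an acquisition function, run the best candidate, repeat) can be illustrated with a toy example. This is a minimal sketch using only the standard library and is not how obsidian implements it: the one-dimensional zero-mean GP, the squared-exponential kernel, the fixed length scale, the upper-confidence-bound rule, and the `experiment` function are all assumptions chosen for brevity.

```python
import math

def rbf(a, b, length_scale=0.3):
    """Squared-exponential kernel between two scalar inputs."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def gp_posterior(X, y, x_new, jitter=1e-6):
    """Zero-mean GP posterior mean and standard deviation at x_new."""
    K = [[rbf(a, b) + (jitter if i == j else 0.0) for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    k = [rbf(a, x_new) for a in X]
    alpha = solve(K, y)  # K^-1 y
    v = solve(K, k)      # K^-1 k
    mean = sum(ki * ai for ki, ai in zip(k, alpha))
    var = max(1.0 - sum(ki * vi for ki, vi in zip(k, v)), 0.0)
    return mean, math.sqrt(var)

def experiment(x):
    """Stand-in for a real experiment; the optimum is at x = 0.7."""
    return -(x - 0.7) ** 2

def ucb(x):
    """Acquisition: posterior mean plus two posterior standard deviations."""
    mean, sd = gp_posterior(X, y, x)
    return mean + 2.0 * sd

grid = [i / 100 for i in range(101)]   # candidate settings
X = [0.0, 1.0]                         # initial experiments
y = [experiment(x) for x in X]

for _ in range(8):
    candidates = [x for x in grid if x not in X]  # skip settings already run
    x_next = max(candidates, key=ucb)             # select from the posterior
    X.append(x_next)
    y.append(experiment(x_next))                  # update the data

best_x = X[y.index(max(y))]
print(best_x, max(y))
```

Each pass refits the surrogate on all data collected so far, so later suggestions shift from exploring uncertain regions toward the best observed region.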
- Parameters:
X_space (ParamSpace) – The parameter space defining the search space for the optimization.
surrogate (str | dict | list[str] | list[dict], optional) – The surrogate model(s) to use. It can be a string naming a single model type, a dictionary mapping a single model type to its hyperparameters, or a list of such strings or dictionaries. Defaults to 'GP'. Options are as follows:
- 'GP': Gaussian Process with default settings (Matern kernel, Gamma covariance priors)
- 'MixedGP': GP with mixed parameter types (continuous, categorical). Selected automatically if 'GP' is requested and the input space is mixed.
- 'DKL': GP with a NN feature-extractor (deep kernel learning)
- 'GPflat': GP without priors. May result in optimization instability, but removes bias for special situations.
- 'GPprior': GP with custom priors on the mean, likelihood, and covariance
- 'MTGP': Multi-task GP for multi-output optimization. Selected automatically if 'GP' is requested and the input space contains Task parameters.
- 'DNN': Dropout neural network. Uses MC sampling to mask neurons during training and to estimate uncertainty.
seed (int | None, optional) – The random seed to use. Defaults to None.
verbose (int, optional) – The verbosity level. Defaults to 1.
- surrogate_type
The shorthand name of each surrogate model.
- Type:
list[str]
- surrogate_hps
The hyperparameters for each surrogate model.
- Type:
list[dict]
- is_fit
Indicates whether the surrogate model has been fit to data.
- Type:
bool
- Raises:
TypeError – If the surrogate argument is not a string, dict, or list of str/dict.
ValueError – If the surrogate dictionary contains more than one surrogate model type.
KeyError – If the surrogate model is not selected from the available models.
ValueError – If the number of responses does not match the number of specified surrogate models.
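The accepted forms of the surrogate argument and the errors listed above can be summarized in a short sketch. `normalize_surrogate` is a hypothetical helper written for illustration, not an obsidian function, and the hyperparameter name `some_hp` is a placeholder. The sketch assumes each dictionary maps exactly one model name to a dict of hyperparameters, as the ValueError entry implies.

```python
# Hypothetical helper, not part of obsidian: it mirrors the documented
# argument forms and error types for `surrogate`.
AVAILABLE = {"GP", "MixedGP", "DKL", "GPflat", "GPprior", "MTGP", "DNN"}

def normalize_surrogate(surrogate="GP"):
    """Return (types, hyperparameters) as two parallel lists."""
    if isinstance(surrogate, (str, dict)):
        surrogate = [surrogate]  # treat a single spec as a one-item list
    if not isinstance(surrogate, list):
        raise TypeError("surrogate must be a str, dict, or list of str/dict")

    types, hps = [], []
    for spec in surrogate:
        if isinstance(spec, str):
            name, params = spec, {}
        elif isinstance(spec, dict):
            if len(spec) != 1:
                raise ValueError("each dict must name exactly one model type")
            name, params = next(iter(spec.items()))
        else:
            raise TypeError("surrogate entries must be str or dict")
        if name not in AVAILABLE:
            raise KeyError(f"unknown surrogate model {name!r}")
        types.append(name)
        hps.append(dict(params))
    return types, hps

# 'some_hp' is a placeholder, not a real obsidian hyperparameter.
print(normalize_surrogate(["GP", {"DNN": {"some_hp": 1}}]))
```

The two returned lists correspond in spirit to the surrogate_type and surrogate_hps attributes described above.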
- evaluate(X_suggest: DataFrame, X_t_pending: Tensor | None = None, target: Target | list[Target] | None = None, acquisition: str | dict | None = None, objective: MCAcquisitionObjective | None = None, eval_aq: bool = False) -> DataFrame
- Parameters:
X_suggest (pd.DataFrame) – Experiment matrix of real input variables, selected by optimizer.
X_t_pending (Tensor) – Suggested experiments yet to be run
target (Target or list of Target, optional) – The response(s) to be used for optimization.
acquisition (str | dict, optional) – Acquisition function name (str) or dictionary containing the acquisition function name and its hyperparameters.
objective (MCAcquisitionObjective, optional) – The objective function to be used for optimization. The default is None.
eval_aq (bool, optional) – Whether or not to also evaluate the acquisition function. The default is False.
- Returns:
Response prediction, prediction interval, transformed mean, acquisition value, and objective function evaluation(s)
- Return type:
pd.DataFrame
- fit(Z: DataFrame, target: Target | list[Target])
Fits the BO surrogate model to data.
- Parameters:
- Returns:
None. Updates the model in self.surrogate
- Raises:
NameError – If the target is not present in the data.
ValueError – If the number of responses does not match the number of specified surrogate models.
- property is_fit
Checks whether all surrogate models in the optimizer are fit.
- Returns:
True if the optimizer is fit, False otherwise.
- Return type:
bool
- classmethod load_state(config_save: dict)
Loads the parameters of the Bayesian Optimizer from a previously fit optimizer.
- Parameters:
config_save (dict) – A dictionary containing the fit parameters for later loading.
- Returns:
None. Updates the parameters of the BayesianOptimizer and its surrogate model.
- Raises:
ValueError – If the number of saved models does not match the number of named models.
- maximize(optim_samples=1026, optim_restarts=50, fixed_var: dict[str, float | str] | None = None) -> tuple[DataFrame, DataFrame]
Predicts the conditions which return the maximum response value within the parameter space.
- Parameters:
optim_samples (int) – The number of samples to be used for optimization. Default is 1026.
optim_restarts (int) – The number of restarts for the optimization process. Default is 50.
fixed_var (dict(str: float), optional) – Name of a variable and setting, over which the suggestion should be fixed. Default value is None.
- Returns:
tuple[pd.DataFrame, pd.DataFrame] = (X_suggest, eval_suggest)
- X_suggest (pd.DataFrame): Experiment matrix of real input variables, selected by optimizer.
- y_suggest (pd.DataFrame): Mean results and prediction interval for each suggested experiment.
- predict(X: DataFrame, return_f_inv: bool = True, PI_range: float = 0.7) -> DataFrame
Predicts a response over a range of experiments using the surrogate function.
- Parameters:
X (pd.DataFrame) – Experiments to predict over.
return_f_inv (bool, optional) – Whether or not to return the inverse-transformed objective function, which is the raw response (unscored). The default is True. Most internal calls set this to False to handle the transformed objective function.
PI_range (float, optional) – The nominal coverage range for the returned prediction interval. The default is 0.7.
- Returns:
Mean prediction and prediction interval for each response
- Return type:
pd.DataFrame
- Raises:
TypeError – If the input is not a DataFrame.
UnfitError – If the surrogate model has not been fit before predicting.
ValueError – If the prediction interval range is greater than 1.
NameError – If the input does not contain all of the required predictors from the training set.
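To make the role of PI_range concrete, the sketch below computes a central interval with a given nominal coverage around a Gaussian prediction. The function `prediction_interval` is hypothetical, and the assumptions that the interval is central and that the posterior is Gaussian are for illustration only; the surrogate in use may compute its interval differently.

```python
import math
from statistics import NormalDist

def prediction_interval(mean, sd, PI_range=0.7):
    """Central interval around `mean` with nominal coverage PI_range.

    Assumes a Gaussian predictive distribution (an illustrative assumption).
    """
    if PI_range > 1:
        raise ValueError("PI_range must not be greater than 1")
    if PI_range == 1:
        return (-math.inf, math.inf)  # full coverage is unbounded
    # Leave (1 - PI_range) / 2 probability in each tail.
    z = NormalDist().inv_cdf(0.5 + PI_range / 2)
    return (mean - z * sd, mean + z * sd)

lower, upper = prediction_interval(mean=10.0, sd=2.0, PI_range=0.7)
print(lower, upper)
```

Under these assumptions, a PI_range of 0.7 corresponds to roughly one standard deviation on either side of the mean.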
- save_state() -> dict
Saves the parameters of the Bayesian Optimizer so that they can be reloaded without fitting.
- Returns:
A dictionary containing the fit parameters for later loading.
- Return type:
dict
- Raises:
UnfitError – If the surrogate model has not been fit before saving the optimizer.
- suggest(m_batch: int = 1, target: Target | list[Target] | None = None, acquisition: list[str] | list[dict] = None, optim_sequential: bool = True, optim_samples: int = 512, optim_restarts: int = 10, objective: MCAcquisitionObjective | None = None, out_constraints: Output_Constraint | list[Output_Constraint] | None = None, eq_constraints: Linear_Constraint | list[Linear_Constraint] | None = None, ineq_constraints: Linear_Constraint | list[Linear_Constraint] | None = None, nleq_constraints: Nonlinear_Constraint | list[Nonlinear_Constraint] | None = None, task_index: int = 0, fixed_var: dict[str, float | str] | None = None, X_pending: DataFrame | None = None, eval_pending: DataFrame | None = None) -> tuple[DataFrame, DataFrame, DataFrame]
Suggest future experiments based on a maximization of some acquisition function calculated from the expectation of a surrogate model.
- Parameters:
m_batch (int, optional) – The number of experiments to suggest at once. The default is 1.
target (Target or list of Target, optional) – The response(s) to be used for optimization.
acquisition (list of str or list of dict, optional) – Indicator for the desired acquisition function(s). A list will propose experiments for each acquisition function based on optim_sequential. The default is ['NEI'] for single-output and ['NEHVI'] for multi-output. Options are as follows:
- 'EI': Expected Improvement (relative to the best of y_train). Accepts hyperparameter 'inflate', a positive or negative float to inflate/deflate the best point for explore/exploit.
- 'NEI': Noisy Expected Improvement. More robust than EI and uses all of y_train, but accepts no hyperparameters.
- 'PI': Probability of Improvement (relative to the best of y_train). Accepts hyperparameter 'inflate', a positive or negative float to inflate/deflate the best point for explore/exploit.
- 'UCB': Upper Confidence Bound. Accepts hyperparameter 'beta', a positive float which sets the number of standard deviations above the mean.
- 'SR': Simple Regret
- 'RS': Random Sampling
- 'Mean': Mean of the posterior distribution (pure exploitation/maximization of objective)
- 'SF': Space Filling. Requests points that maximize the minimum distance to X_train based on Euclidean distance.
- 'NIPV': Negative Integrated Posterior Variance. Requests the point which most improves the prediction interval for a random selection of points in the design space. Used for active learning.
- 'EHVI': Expected Hypervolume Improvement. Can accept a ref_point; otherwise uses a point just below the minimum of y_train.
- 'NEHVI': Noisy Expected Hypervolume Improvement. Can accept a ref_point; otherwise uses a point just below the minimum of y_train.
- 'NParEGO': Noisy Pareto Efficient Global Optimization. Can accept scalarization_weights, a list of weights for each objective.
optim_sequential (bool, optional) – Whether or not to optimize batch designs sequentially (by fantasy) or simultaneously. Default is True.
optim_samples (int, optional) – The number of samples to use for quasi Monte Carlo sampling of the acquisition function. Also used for initializing the acquisition optimizer. The default value is 512.
optim_restarts (int, optional) – The number of restarts to use in the global optimization of the acquisition function. The default value is 10.
objective (MCAcquisitionObjective, optional) – The objective function to be used for optimization. The default is None.
out_constraints (Output_Constraint | list[Output_Constraint], optional) – An output constraint, or a list thereof, restricting the search space by outcomes. The default is None.
eq_constraints (Linear_Constraint | list[Linear_Constraint], optional) – A linear constraint, or a list thereof, restricting the search space by equality (=). The default is None.
ineq_constraints (Linear_Constraint | list[Linear_Constraint], optional) – A linear constraint, or a list thereof, restricting the search space by inequality (>=). The default is None.
nleq_constraints (Nonlinear_Constraint | list[Nonlinear_Constraint], optional) – A nonlinear constraint, or a list thereof, restricting the search space by nonlinear feasibility. The default is None.
task_index (int, optional) – The index of the task to optimize for multi-task models. The default is 0.
fixed_var (dict(str: float), optional) – Name of a variable and setting, over which the suggestion should be fixed. Default value is None.
X_pending (pd.DataFrame, optional) – Experiments that are expected to be run before the next optimal set
eval_pending (pd.DataFrame, optional) – Acquisition values associated with X_pending
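For intuition about the 'EI', 'PI', and 'UCB' options listed above, the following sketch evaluates their textbook closed forms for a single Gaussian prediction. These functions are illustrative and are not obsidian's implementation. `best` stands for the incumbent value that 'inflate' would shift (the exact shifting rule is not specified here), and the UCB form follows the description above, where beta counts standard deviations; some libraries instead scale the standard deviation by the square root of beta.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used for pdf and cdf

def expected_improvement(mean, sd, best):
    """Analytic EI of a Gaussian prediction over the incumbent `best`."""
    if sd == 0:
        return max(mean - best, 0.0)
    z = (mean - best) / sd
    return (mean - best) * _STD_NORMAL.cdf(z) + sd * _STD_NORMAL.pdf(z)

def probability_of_improvement(mean, sd, best):
    """Probability that a Gaussian prediction exceeds `best`."""
    if sd == 0:
        return 1.0 if mean > best else 0.0
    return _STD_NORMAL.cdf((mean - best) / sd)

def upper_confidence_bound(mean, sd, beta=1.0):
    """Mean plus `beta` standard deviations (illustrative convention)."""
    return mean + beta * sd

# Two candidates with the same mean: the more uncertain one scores higher.
for sd in (0.1, 1.0):
    print(sd,
          expected_improvement(1.0, sd, best=1.0),
          probability_of_improvement(1.0, sd, best=1.0),
          upper_confidence_bound(1.0, sd, beta=2.0))
```

Raising `best` makes both EI and PI favor candidates with larger uncertainty, which is the explore/exploit trade-off that the 'inflate' description refers to.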
- Returns:
tuple[pd.DataFrame, pd.DataFrame] = (X_suggest, eval_suggest)
- X_suggest (pd.DataFrame): Experiment matrix of real input variables, selected by optimizer.
- eval_suggest (pd.DataFrame): Mean results (response, prediction interval, f(response), objective function) for each suggested experiment.
- Raises:
UnfitError – If the surrogate model has not been fit before suggesting new experiments.
TypeError – If the target is not a Target object or a list of Target objects.
IncorrectObjectiveError – If the objective does not successfully execute on a sample.
TypeError – If the acquisition is not a list of strings or dictionaries.
UnsupportedError – If the provided acquisition function does not support output constraints.
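The 'SF' (Space Filling) option described for the acquisition parameter can be pictured as a maximin rule: among the candidates, pick the one whose nearest training point is farthest away. The sketch below is a hypothetical illustration over a finite candidate list, not obsidian's implementation, which optimizes over the parameter space rather than a fixed list.

```python
import math

def space_filling_pick(candidates, X_train):
    """Return the candidate that maximizes the minimum Euclidean
    distance to any point in X_train (maximin criterion)."""
    def min_distance(candidate):
        return min(math.dist(candidate, x) for x in X_train)
    return max(candidates, key=min_distance)

X_train = [(0.0, 0.0), (1.0, 1.0)]
candidates = [(0.1, 0.1), (1.0, 0.0), (0.5, 0.5)]
print(space_filling_pick(candidates, X_train))
```

The selected point sits in the corner farthest from both existing experiments, which is the behavior one would expect from a space-filling request.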