bayesian#
Bayesian Optimization: Select experiments from the predicted posterior and update the prior
Classes

BayesianOptimizer – BayesianOptimizer is a class that implements a Bayesian optimization algorithm.
- class obsidian.optimizer.bayesian.BayesianOptimizer(X_space: ParamSpace, surrogate: str | dict | list[str] | list[dict] = 'GP', seed: int | None = None, verbose: int = 1)[source]#
Bases: Optimizer

BayesianOptimizer is a class that implements a Bayesian optimization algorithm.
This class is used to optimize a given function by iteratively selecting the next set of input parameters based on the results of previous evaluations. It uses a surrogate model to approximate the underlying function and an acquisition function to determine the next set of parameters to evaluate.
- Parameters:
X_space (ParamSpace) – The parameter space defining the search space for the optimization.
surrogate (str | dict | list[str] | list[dict], optional) – The surrogate model(s) to use. It can be a string naming a single model type, a dictionary mapping a single model type to its hyperparameters, or a list of such strings or dictionaries. Defaults to 'GP'. Options are as follows:
  - 'GP': Gaussian Process with default settings (Matern kernel, Gamma covariance priors)
  - 'MixedGP': GP with mixed parameter types (continuous, categorical). Will be re-selected by default if 'GP' is selected and the input space is mixed.
  - 'DKL': GP with a NN feature-extractor (deep kernel learning)
  - 'GPflat': GP without priors. May result in optimization instability, but removes bias for special situations.
  - 'GPprior': GP with custom priors on the mean, likelihood, and covariance
  - 'MTGP': Multi-task GP for multi-output optimization. Will be re-selected by default if 'GP' is selected and the input space contains Task parameters.
  - 'DNN': Dropout neural network. Uses MC sampling to mask neurons during training and to estimate uncertainty.
seed (int | None, optional) – The random seed to use. Defaults to None.
verbose (int, optional) – The verbosity level. Defaults to 1.
- surrogate_type#
The shorthand name of each surrogate model.
- Type:
list[str]
- surrogate_hps#
The hyperparameters for each surrogate model.
- Type:
list[dict]
- is_fit#
Indicates whether the surrogate model has been fit to data.
- Type:
bool
- Raises:
TypeError – If the surrogate argument is not a string, dict, or list of str/dict.
ValueError – If the surrogate dictionary contains more than one surrogate model type.
KeyError – If the surrogate model is not selected from the available models.
ValueError – If the number of responses does not match the number of specified surrogate models.
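A minimal construction sketch. The obsidian.optimizer.bayesian.BayesianOptimizer path is taken from this page; the ParamSpace and Param_Continuous import path and constructor signatures shown below are assumptions for illustration only.

    from obsidian.optimizer.bayesian import BayesianOptimizer
    from obsidian.parameters import ParamSpace, Param_Continuous  # assumed import path

    # Two continuous factors; Param_Continuous(name, min, max) is an assumed signature
    X_space = ParamSpace([
        Param_Continuous('Temperature', 50, 100),
        Param_Continuous('Time', 1, 24),
    ])

    # 'GP' is the default surrogate; a dict form could instead pass hyperparameters
    optimizer = BayesianOptimizer(X_space, surrogate='GP', seed=0, verbose=1)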
- evaluate(X_suggest: DataFrame, X_t_pending: Tensor | None = None, target: Target | list[Target] | None = None, acquisition: str | dict | None = None, objective: MCAcquisitionObjective | None = None, eval_aq: bool = False) → DataFrame[source]#
- Parameters:
X_suggest (pd.DataFrame) – Experiment matrix of real input variables, selected by optimizer.
X_t_pending (Tensor) – Suggested experiments yet to be run
target (Target or list of Target, optional) – The response(s) to be used for optimization.
acquisition (str | dict, optional) – Acquisition function name (str) or dictionary containing the acquisition function name and its hyperparameters.
objective (MCAcquisitionObjective, optional) – The objective function to be used for optimization. The default is None.
eval_aq (bool, optional) – Whether or not to also evaluate the acquisition (aq) function. The default is False.
- Returns:
Response prediction, prediction interval, transformed mean, aq value, and objective function evaluation(s)
- Return type:
pd.DataFrame
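A hedged usage sketch, continuing the fitted optimizer from the examples on this page; it assumes an X_suggest frame returned by suggest() (documented below).

    # eval_aq=True also scores the acquisition function for each candidate row
    eval_df = optimizer.evaluate(X_suggest, acquisition='NEI', eval_aq=True)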
- fit(Z: DataFrame, target: Target | list[Target])[source]#
Fits the BO surrogate model to data.
- Parameters:
Z (pd.DataFrame) – Data to fit the surrogate model to, containing both the input variables defined in X_space and the response values.
target (Target or list of Target) – The response(s) to be used for optimization.
- Returns:
None. Updates the model in self.surrogate
- Raises:
NameError – If the target is not present in the data.
ValueError – If the number of responses does not match the number of specified surrogate models.
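A hedged fitting sketch, assuming the optimizer constructed above and a Target whose name matches a response column in Z; the Target constructor arguments shown are assumptions.

    import pandas as pd

    from obsidian.parameters.targets import Target  # module path taken from the suggest() signature

    # Z holds both the inputs defined in X_space and the response column
    Z = pd.DataFrame({
        'Temperature': [55.0, 70.0, 90.0],
        'Time': [2.0, 8.0, 16.0],
        'Yield': [0.31, 0.58, 0.74],
    })

    target = Target('Yield', aim='max')  # assumed constructor arguments
    optimizer.fit(Z, target=target)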
- property is_fit#
Check if all surrogate models in the optimizer are fit
- Returns:
True if the optimizer is fit, False otherwise.
- Return type:
bool
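For example, continuing the sketch above, the property can guard calls that require a trained surrogate (predict() raises UnfitError otherwise):

    if optimizer.is_fit:
        preds = optimizer.predict(Z[['Temperature', 'Time']])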
- classmethod load_state(config_save: dict)[source]#
Loads the parameters of the Bayesian Optimizer from a previously fit optimizer.
- Parameters:
config_save (dict) – A dictionary containing the fit parameters for later loading.
- Returns:
None. Updates the parameters of the BayesianOptimizer and its surrogate model.
- Raises:
ValueError – If the number of saved models does not match the number of named models.
- maximize(optim_samples=1026, optim_restarts=50, fixed_var: dict[str, float | str] | None = None) → tuple[DataFrame, DataFrame][source]#
Predicts the conditions which return the maximum response value within the parameter space.
- Parameters:
optim_samples (int) – The number of samples to be used for optimization. Default is 1026.
optim_restarts (int) – The number of restarts for the optimization process. Default is 50.
fixed_var (dict[str, float | str], optional) – Name of a variable and setting, over which the suggestion should be fixed. Default is None.
- Returns:
tuple[pd.DataFrame, pd.DataFrame] = (X_suggest, y_suggest)
- X_suggest (pd.DataFrame): Experiment matrix of real input variables, selected by the optimizer.
- y_suggest (pd.DataFrame): Mean results and prediction interval for each suggested experiment.
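A brief sketch using the signature above; holding 'Time' fixed at 8.0 is illustrative and assumes the parameter space from the constructor sketch.

    # Predicted optimum over the parameter space, with Time held constant
    X_best, y_best = optimizer.maximize(fixed_var={'Time': 8.0})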
- predict(X: DataFrame, return_f_inv: bool = True, PI_range: float = 0.7) → DataFrame[source]#
Predicts a response over a range of experiments using the surrogate function.
- Parameters:
X (pd.DataFrame) – Experiments to predict over.
return_f_inv (bool, optional) – Whether or not to return the inverse-transformed objective function, which is the raw response (unscored). The default is True. Most internal calls set this to False to handle the transformed objective function.
PI_range (float, optional) – The nominal coverage range for the returned prediction interval. The default is 0.7.
- Returns:
Mean prediction and prediction interval for each response
- Return type:
pd.DataFrame
- Raises:
TypeError – If the input is not a DataFrame.
UnfitError – If the surrogate model has not been fit before predicting.
ValueError – If the prediction interval range is greater than 1.
NameError – If the input does not contain all of the required predictors from the training set.
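A short prediction sketch, continuing the fitted optimizer from the earlier examples; the candidate values are illustrative.

    import pandas as pd

    # Candidate experiments must use the same column names as the training inputs
    X_new = pd.DataFrame({'Temperature': [60.0, 80.0], 'Time': [4.0, 12.0]})

    # Mean prediction plus an 80% prediction interval per response
    pred = optimizer.predict(X_new, PI_range=0.8)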
- save_state() → dict[source]#
Saves the parameters of the Bayesian Optimizer so that they can be reloaded without fitting.
- Returns:
A dictionary containing the fit parameters for later loading.
- Return type:
dict
- Raises:
UnfitError – If the surrogate model has not been fit before saving the optimizer.
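A hedged round-trip sketch pairing save_state() with load_state(). Writing the dict to JSON, and treating load_state() as handing back a usable optimizer, are assumptions (the Returns note for load_state() above describes it as updating parameters).

    import json

    # Persist the fit optimizer as a plain dict (assumed JSON-serializable here)
    config = optimizer.save_state()
    with open('optimizer_state.json', 'w') as f:
        json.dump(config, f)

    # Later: rebuild without re-fitting
    with open('optimizer_state.json') as f:
        optimizer_reloaded = BayesianOptimizer.load_state(json.load(f))  # assumed to return the restored optimizer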
- suggest(m_batch: int = 1, target: Target | list[Target] | None = None, acquisition: list[str] | list[dict] = None, optim_sequential: bool = True, optim_samples: int = 512, optim_restarts: int = 10, objective: MCAcquisitionObjective | None = None, out_constraints: Output_Constraint | list[Output_Constraint] | None = None, eq_constraints: Linear_Constraint | list[Linear_Constraint] | None = None, ineq_constraints: Linear_Constraint | list[Linear_Constraint] | None = None, nleq_constraints: Nonlinear_Constraint | list[Nonlinear_Constraint] | None = None, task_index: int = 0, fixed_var: dict[str, float | str] | None = None, X_pending: pd.DataFrame | None = None, eval_pending: pd.DataFrame | None = None) → tuple[DataFrame, DataFrame, DataFrame][source]#
Suggest future experiments based on a maximization of some acquisition function calculated from the expectation of a surrogate model.
- Parameters:
m_batch (int, optional) – The number of experiments to suggest at once. The default is 1.
target (Target or list of Target, optional) – The response(s) to be used for optimization.
acquisition (list of str or list of dict, optional) – Indicator for the desired acquisition function(s). A list will propose experiments for each acquisition function based on optim_sequential. The default is ['NEI'] for single-output and ['NEHVI'] for multi-output. Options are as follows:
  - 'EI': Expected Improvement (relative to best of y_train). Accepts hyperparameter 'inflate', a positive or negative float to inflate/deflate the best point for explore/exploit.
  - 'NEI': Noisy Expected Improvement. More robust than EI and uses all of y_train, but accepts no hyperparameters.
  - 'PI': Probability of Improvement (relative to best of y_train). Accepts hyperparameter 'inflate', a positive or negative float to inflate/deflate the best point for explore/exploit.
  - 'UCB': Upper Confidence Bound. Accepts hyperparameter 'beta', a positive float which sets the number of standard deviations above the mean.
  - 'SR': Simple Regret
  - 'RS': Random Sampling
  - 'Mean': Mean of the posterior distribution (pure exploitation/maximization of objective)
  - 'SF': Space Filling. Requests points that maximize the minimum distance to X_train based on Euclidean distance.
  - 'NIPV': Negative Integrated Posterior Variance. Requests the point which most improves the prediction interval for a random selection of points in the design space. Used for active learning.
  - 'EHVI': Expected Hypervolume Improvement. Can accept a ref_point, otherwise a point just below the minimum of y_train.
  - 'NEHVI': Noisy Expected Hypervolume Improvement. Can accept a ref_point, otherwise a point just below the minimum of y_train.
  - 'NParEGO': Noisy Pareto Efficient Global Optimization. Can accept scalarization_weights, a list of weights for each objective.
optim_sequential (bool, optional) – Whether or not to optimize batch designs sequentially (by fantasy) or simultaneously. Default is True.
optim_samples (int, optional) – The number of samples to use for quasi-Monte Carlo sampling of the acquisition function. Also used for initializing the acquisition optimizer. The default value is 512.
optim_restarts (int, optional) – The number of restarts to use in the global optimization of the acquisition function. The default value is 10.
objective (MCAcquisitionObjective, optional) – The objective function to be used for optimization. The default is None.
out_constraints (Output_Constraint | list[Output_Constraint], optional) – An output constraint, or a list thereof, restricting the search space by outcomes. The default is None.
eq_constraints (Linear_Constraint | list[Linear_Constraint], optional) – A linear constraint, or a list thereof, restricting the search space by equality (=). The default is None.
ineq_constraints (Linear_Constraint | list[Linear_Constraint], optional) – A linear constraint, or a list thereof, restricting the search space by inequality (>=). The default is None.
nleq_constraints (Nonlinear_Constraint | list[Nonlinear_Constraint], optional) – A nonlinear constraint, or a list thereof, restricting the search space by nonlinear feasibility. The default is None.
task_index (int, optional) – The index of the task to optimize for multi-task models. The default is 0.
fixed_var (dict[str, float | str], optional) – Name of a variable and setting, over which the suggestion should be fixed. Default is None.
X_pending (pd.DataFrame, optional) – Experiments that are expected to be run before the next optimal set.
eval_pending (pd.DataFrame, optional) – Acquisition values associated with X_pending.
- Returns:
tuple[pd.DataFrame, pd.DataFrame] = (X_suggest, eval_suggest)
- X_suggest (pd.DataFrame): Experiment matrix of real input variables, selected by the optimizer.
- eval_suggest (pd.DataFrame): Mean results (response, prediction interval, f(response), and objective function) for each suggested experiment.
- Raises:
UnfitError – If the surrogate model has not been fit before suggesting new experiments.
TypeError – If the target is not a Target object or a list of Target objects.
IncorrectObjectiveError – If the objective does not successfully execute on a sample.
TypeError – If the acquisition is not a list of strings or dictionaries.
UnsupportedError – If the provided acquisition function does not support output constraints.
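A closing sketch of a batched call, continuing the fitted optimizer from the earlier examples. The dict form used to pass 'beta' to UCB is an assumed structure for the name-plus-hyperparameter dictionaries described above, and the result is indexed rather than unpacked because the signature and the Returns note differ on the tuple length.

    # Request a batch of experiments, scored with both UCB (beta = 2.0) and NEI
    results = optimizer.suggest(
        m_batch=2,
        acquisition=[{'UCB': {'beta': 2.0}}, 'NEI'],
        optim_samples=256,
        optim_restarts=5,
    )
    X_suggest, eval_suggest = results[0], results[1]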