bayesian#

Bayesian Optimization: Select experiments from the predicted posterior and update the prior

Classes

BayesianOptimizer(X_space[, surrogate, ...])

BayesianOptimizer is a class that implements a Bayesian optimization algorithm.

class obsidian.optimizer.bayesian.BayesianOptimizer(X_space: ParamSpace, surrogate: str | dict | list[str] | list[dict] = 'GP', seed: int | None = None, verbose: int = 1)[source]#

Bases: Optimizer

BayesianOptimizer is a class that implements a Bayesian optimization algorithm.

This class is used to optimize a given function by iteratively selecting the next set of input parameters based on the results of previous evaluations. It uses a surrogate model to approximate the underlying function and an acquisition function to determine the next set of parameters to evaluate.

Parameters:
  • X_space (ParamSpace) – The parameter space defining the search space for the optimization.

  • surrogate (str | dict | list[str] | list[dict], optional) –

    The surrogate model(s) to use. It can be a string naming a single model type, a dictionary specifying one model type together with its hyperparameters, or a list of strings or dictionaries (one entry per surrogate model).

    Defaults to 'GP'. Options are as follows:

    • 'GP': Gaussian Process with default settings (Matern Kernel, Gamma covariance priors)

    • 'MixedGP': GP with mixed parameter types (continuous, categorical). Selected automatically if 'GP' is requested and the input space is mixed.

    • 'DKL': GP with a NN feature-extractor (deep kernel learning)

    • 'GPflat': GP without priors. May result in optimization instability, but removes bias for special situations.

    • 'GPprior': GP with custom priors on the mean, likelihood, and covariance

    • 'MTGP': Multi-task GP for multi-output optimization. Selected automatically if 'GP' is requested and the input space contains Task parameters.

    • 'DNN': Dropout neural network. Uses MC sampling to mask neurons during training and to estimate uncertainty.

  • seed (int | None, optional) – The random seed to use. Defaults to None.

  • verbose (int, optional) – The verbosity level. Defaults to 1.

surrogate_type#

The shorthand name of each surrogate model.

Type:

list[str]

surrogate_hps#

The hyperparameters for each surrogate model.

Type:

list[dict]

is_fit#

Indicates whether the surrogate model has been fit to data.

Type:

bool

Raises:
  • TypeError – If the surrogate argument is not a string, dict, or list of str/dict.

  • ValueError – If the surrogate dictionary contains more than one surrogate model type.

  • KeyError – If the surrogate model is not selected from the available models.

  • ValueError – If the number of responses does not match the number of specified surrogate models.
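As a rough illustration of how the accepted surrogate forms could be normalized into the surrogate_type and surrogate_hps lists, the sketch below mirrors the TypeError, ValueError, and KeyError conditions listed above. It is not the library's implementation: the helper name normalize_surrogate, the AVAILABLE set, the assumed dict layout {name: {hyperparameters}}, and the 'p_dropout' key are all hypothetical.

```python
# Hypothetical helper; names, dict layout, and hyperparameter keys are assumptions.
AVAILABLE = {"GP", "MixedGP", "DKL", "GPflat", "GPprior", "MTGP", "DNN"}


def normalize_surrogate(surrogate="GP"):
    """Split a str | dict | list spec into parallel name and hyperparameter lists."""
    if not isinstance(surrogate, (str, dict, list)):
        raise TypeError("surrogate must be a str, dict, or list of str/dict")
    specs = surrogate if isinstance(surrogate, list) else [surrogate]

    surrogate_type, surrogate_hps = [], []
    for spec in specs:
        if isinstance(spec, str):
            name, hps = spec, {}
        elif isinstance(spec, dict):
            if len(spec) != 1:
                raise ValueError("a surrogate dict must name exactly one model type")
            name, hps = next(iter(spec.items()))
        else:
            raise TypeError("list entries must be str or dict")
        if name not in AVAILABLE:
            raise KeyError(f"unknown surrogate model: {name!r}")
        surrogate_type.append(name)
        surrogate_hps.append(dict(hps or {}))
    return surrogate_type, surrogate_hps


types, hps = normalize_surrogate(["GP", {"DNN": {"p_dropout": 0.1}}])
print(types)
print(hps)
```

Keeping the names and hyperparameters in two parallel lists matches the list[str] and list[dict] attribute types documented above.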

evaluate(X_suggest: DataFrame, X_t_pending: Tensor | None = None, target: Target | list[Target] | None = None, acquisition: str | dict | None = None, objective: MCAcquisitionObjective | None = None, eval_aq: bool = False) DataFrame[source]#
Parameters:
  • X_suggest (pd.DataFrame) – Experiment matrix of real input variables, selected by optimizer.

  • X_t_pending (Tensor, optional) – Suggested experiments that have yet to be run. The default is None.

  • target (Target or list of Target, optional) – The response(s) to be used for optimization, packed into a Target object or list thereof.

  • acquisition (str | dict, optional) – Acquisition function name (str) or dictionary containing the acquisition function name and its hyperparameters.

  • objective (MCAcquisitionObjective, optional) – The objective function to be used for optimization. The default is None.

  • eval_aq (bool, optional) – Whether or not to also evaluate the acquisition function. The default is False.

Returns:

Response prediction, prediction interval, transformed mean, acquisition value, and objective function evaluation(s)

Return type:

pd.DataFrame

fit(Z: DataFrame, target: Target | list[Target])[source]#

Fits the BO surrogate model to data.

Parameters:
  • Z (pd.DataFrame) – Total dataset including inputs (X) and response values (y)

  • target (Target or list of Target) – The responses (y) to be used for optimization, packed into a Target object or list thereof

Returns:

None. Updates the model in self.surrogate

Raises:
  • NameError – If the target is not present in the data.

  • ValueError – If the number of responses does not match the number of specified surrogate models.
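The two checks listed under Raises can be sketched on a plain column-oriented dataset. This is only an illustration under stated assumptions: the function name split_inputs_and_responses is hypothetical, Z is a dict of columns rather than a pd.DataFrame, and the input names are passed explicitly instead of being read from a ParamSpace.

```python
def split_inputs_and_responses(Z, x_names, target_names, n_surrogates):
    """Validate a column-oriented dataset and split it into inputs X and responses y."""
    missing = [name for name in target_names if name not in Z]
    if missing:
        raise NameError(f"target(s) not present in the data: {missing}")
    if len(target_names) != n_surrogates:
        raise ValueError(
            f"{len(target_names)} response(s) but {n_surrogates} surrogate model(s)"
        )
    X = {name: Z[name] for name in x_names}
    y = {name: Z[name] for name in target_names}
    return X, y


# Z holds inputs and responses side by side, like the Z argument described above.
Z = {
    "Temperature": [20.0, 40.0, 60.0],
    "Yield": [0.31, 0.52, 0.47],
}
X, y = split_inputs_and_responses(Z, ["Temperature"], ["Yield"], n_surrogates=1)
print(X)
print(y)
```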

property is_fit#

Check whether all surrogate models in the optimizer are fit.

Returns:

True if the optimizer is fit, False otherwise.

Return type:

bool

classmethod load_state(config_save: dict)[source]#

Loads the parameters of the Bayesian Optimizer from a previously fit optimizer.

Parameters:

config_save (dict) – A dictionary containing the fit parameters for later loading.

Returns:

None. Updates the parameters of the BayesianOptimizer and its surrogate model.

Raises:

ValueError – If the number of saved models does not match the number of named models.

maximize(optim_samples=1026, optim_restarts=50, fixed_var: dict[str, float | str] | None = None) tuple[DataFrame, DataFrame][source]#

Predicts the conditions which return the maximum response value within the parameter space.

Parameters:
  • optim_samples (int) – The number of samples to be used for optimization. Default is 1026.

  • optim_restarts (int) – The number of restarts for the optimization process. Default is 50.

  • fixed_var (dict[str, float | str], optional) – Name of a variable and its setting, over which the suggestion should be held fixed. The default is None.

Returns:

tuple[pd.DataFrame, pd.DataFrame] = (X_suggest, y_suggest)
  • X_suggest (pd.DataFrame): Experiment matrix of real input variables, selected by optimizer.
  • y_suggest (pd.DataFrame): Mean results and prediction interval for each suggested experiment.
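To make the roles of optim_samples, optim_restarts, and fixed_var more concrete, here is a minimal multi-start search in plain Python. It is a sketch only: the function maximize_with_restarts, the coordinate search, and the quadratic test function are assumptions chosen for illustration, and the optimizer the library actually uses is not described on this page.

```python
import random


def maximize_with_restarts(f, bounds, n_samples=1026, n_restarts=50,
                           fixed_var=None, seed=0):
    """Seed starts from random samples, then refine each with a shrinking-step search."""
    rng = random.Random(seed)
    fixed_var = fixed_var or {}
    free = [name for name in bounds if name not in fixed_var]

    def sample():
        point = {name: rng.uniform(*bounds[name]) for name in free}
        point.update(fixed_var)  # pinned variables never move
        return point

    # Keep the best n_restarts of the random samples as starting points.
    starts = sorted((sample() for _ in range(n_samples)), key=f, reverse=True)
    starts = starts[:n_restarts]

    best = None
    for point in starts:
        step = 0.25  # fraction of each variable's range
        while step > 1e-6:
            improved = False
            for name in free:
                lo, hi = bounds[name]
                for sign in (1.0, -1.0):
                    trial = dict(point)
                    moved = point[name] + sign * step * (hi - lo)
                    trial[name] = min(hi, max(lo, moved))
                    if f(trial) > f(point):
                        point, improved = trial, True
            if not improved:
                step /= 2  # no neighbor is better at this scale, so refine
        if best is None or f(point) > f(best):
            best = point
    return best


def response(p):
    """Toy stand-in for a surrogate mean, peaked at x=0.3, y=0.7."""
    return -((p["x"] - 0.3) ** 2) - (p["y"] - 0.7) ** 2


bounds = {"x": (0.0, 1.0), "y": (0.0, 1.0)}
free_best = maximize_with_restarts(response, bounds)
pinned_best = maximize_with_restarts(response, bounds, fixed_var={"y": 0.2})
print(free_best)
print(pinned_best)
```

In the second call the pinned variable keeps its fixed setting while the search moves only along the remaining variable.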

predict(X: DataFrame, return_f_inv: bool = True, PI_range: float = 0.7) DataFrame[source]#

Predicts a response over a range of experiments using the surrogate function.

Parameters:
  • X (pd.DataFrame) – Experiments to predict over.

  • return_f_inv (bool, optional) – Whether or not to return the inverse-transformed objective function, which is the raw response (unscored). The default is True. Most internal calls set this to False in order to work with the transformed objective function.

  • PI_range (float, optional) – The nominal coverage range for the returned prediction interval. The default is 0.7.

Returns:

Mean prediction and prediction interval for each response

Return type:

pd.DataFrame

Raises:
  • TypeError – If the input is not a DataFrame.

  • UnfitError – If the surrogate model has not been fit before predicting.

  • ValueError – If the prediction interval range is greater than 1.

  • NameError – If the input does not contain all of the required predictors from the training set.
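One way to read PI_range is as the nominal coverage of a central interval around the predicted mean. The sketch below assumes a Gaussian predictive distribution with known mean and standard deviation; the function name prediction_interval is hypothetical, and the library may compute its interval differently (for example from posterior quantiles).

```python
from statistics import NormalDist


def prediction_interval(mean, std, pi_range=0.7):
    """Central interval with nominal coverage pi_range under a Gaussian assumption."""
    # Stricter than the documented check: coverage must lie strictly inside (0, 1).
    if not 0.0 < pi_range < 1.0:
        raise ValueError("pi_range must be between 0 and 1")
    # Split the excluded probability equally between the two tails.
    z = NormalDist().inv_cdf(0.5 + pi_range / 2.0)
    return mean - z * std, mean + z * std


lower, upper = prediction_interval(mean=10.0, std=2.0, pi_range=0.7)
coverage = NormalDist(10.0, 2.0).cdf(upper) - NormalDist(10.0, 2.0).cdf(lower)
print(lower, upper, coverage)
```

With pi_range = 0.7 the half-width is roughly one standard deviation (z is about 1.04), so the default interval is noticeably narrower than a conventional 95% interval.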

save_state() dict[source]#

Saves the parameters of the Bayesian Optimizer so that they can be reloaded without fitting.

Returns:

A dictionary containing the fit parameters for later loading.

Return type:

dict

Raises:

UnfitError – If the surrogate model has not been fit before saving the optimizer.
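The save and reload cycle can be pictured with a toy class. Everything here is a stand-in: ToyOptimizer, its dictionary keys, and the local UnfitError are hypothetical, the "model" is just a stored mean, and the toy load_state returns the rebuilt object, which is an assumption of this sketch rather than the documented return value.

```python
import json


class UnfitError(Exception):
    """Local stand-in for the library's UnfitError."""


class ToyOptimizer:
    def __init__(self, surrogate_type=("GP",), seed=None):
        self.surrogate_type = list(surrogate_type)
        self.seed = seed
        self.model_states = []  # one entry per fitted surrogate

    @property
    def is_fit(self):
        return len(self.model_states) == len(self.surrogate_type) > 0

    def fit(self, y):
        # Toy "model": remember the mean of the training responses.
        mean = sum(y) / len(y)
        self.model_states = [{"mean": mean} for _ in self.surrogate_type]

    def save_state(self):
        if not self.is_fit:
            raise UnfitError("fit the optimizer before saving it")
        return {
            "surrogate_type": list(self.surrogate_type),
            "seed": self.seed,
            "model_states": [dict(s) for s in self.model_states],
        }

    @classmethod
    def load_state(cls, config_save):
        names = config_save["surrogate_type"]
        states = config_save["model_states"]
        if len(states) != len(names):
            raise ValueError("number of saved models does not match named models")
        restored = cls(names, config_save["seed"])
        restored.model_states = [dict(s) for s in states]
        return restored


opt = ToyOptimizer(seed=7)
opt.fit([1.0, 2.0, 3.0])
blob = json.dumps(opt.save_state())  # this toy state happens to be JSON-friendly
restored = ToyOptimizer.load_state(json.loads(blob))
print(restored.is_fit, restored.model_states)
```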

suggest(m_batch: int = 1, target: ~obsidian.parameters.targets.Target | list[~obsidian.parameters.targets.Target] | None = None, acquisition: list[str] | list[dict] = None, optim_sequential: bool = True, optim_samples: int = 512, optim_restarts: int = 10, objective: ~botorch.acquisition.objective.MCAcquisitionObjective | None = None, out_constraints: ~obsidian.constraints.output.Output_Constraint | list[~obsidian.constraints.output.Output_Constraint] | None = None, eq_constraints: ~obsidian.constraints.input.Linear_Constraint | list[~obsidian.constraints.input.Linear_Constraint] | None = None, ineq_constraints: ~obsidian.constraints.input.Linear_Constraint | list[~obsidian.constraints.input.Linear_Constraint] | None = None, nleq_constraints: ~obsidian.constraints.input.Nonlinear_Constraint | list[~obsidian.constraints.input.Nonlinear_Constraint] | None = None, task_index: int = 0, fixed_var: dict[slice(<class 'str'>, float | str, None)] | None = None, X_pending: ~pandas.core.frame.DataFrame | None = None, eval_pending: ~pandas.core.frame.DataFrame | None = None) tuple[DataFrame, DataFrame, DataFrame][source]#

Suggest future experiments based on a maximization of some acquisition function calculated from the expectation of a surrogate model.

Parameters:
  • m_batch (int, optional) – The number of experiments to suggest at once. The default is 1.

  • target (Target or list of Target, optional) – The response(s) to be used for optimization, packed into a Target object or list thereof.

  • acquisition (list of str or list of dict, optional) –

    Indicator for the desired acquisition function(s). A list will propose experiments for each acquisition function based on optim_sequential.

    The default is ['NEI'] for single-output and ['NEHVI'] for multi-output. Options are as follows:

    • 'EI': Expected Improvement (relative to best of y_train). Accepts hyperparameter 'inflate', a positive or negative float to inflate/deflate the best point for explore/exploit.

    • 'NEI': Noisy Expected Improvement. More robust than EI and uses all of y_train, but accepts no hyperparameters.

    • 'PI': Probability of Improvement (relative to best of y_train). Accepts hyperparameter 'inflate', a positive or negative float to inflate/deflate the best point for explore/exploit.

    • 'UCB': Upper Confidence Bound. Accepts hyperparameter 'beta', a positive float which sets the number of standard deviations above the mean.

    • 'SR': Simple Regret

    • 'RS': Random Sampling

    • 'Mean': Mean of the posterior distribution (pure exploitation/maximization of objective)

    • 'SF': Space Filling. Requests points that maximize the minimum Euclidean distance to X_train.

    • 'NIPV': Negative Integrated Posterior Variance. Requests the point which most improves the prediction interval for a random selection of points in the design space. Used for active learning.

    • 'EHVI': Expected Hypervolume Improvement. Can accept a ref_point; otherwise uses a point just below the minimum of y_train.

    • 'NEHVI': Noisy Expected Hypervolume Improvement. Can accept a ref_point; otherwise uses a point just below the minimum of y_train.

    • 'NParEGO': Noisy Pareto Efficient Global Optimization. Can accept scalarization_weights, a list of weights for each objective.

  • optim_sequential (bool, optional) – Whether or not to optimize batch designs sequentially (by fantasy) or simultaneously. Default is True.

  • optim_samples (int, optional) – The number of samples to use for quasi Monte Carlo sampling of the acquisition function. Also used for initializing the acquisition optimizer. The default value is 512.

  • optim_restarts (int, optional) – The number of restarts to use in the global optimization of the acquisition function. The default value is 10.

  • objective (MCAcquisitionObjective, optional) – The objective function to be used for optimization. The default is None.

  • out_constraints (Output_Constraint | list[Output_Constraint], optional) – An output constraint, or a list thereof, restricting the search space by outcomes. The default is None.

  • eq_constraints (Linear_Constraint | list[Linear_Constraint], optional) – A linear constraint, or a list thereof, restricting the search space by equality (=). The default is None.

  • ineq_constraints (Linear_Constraint | list[Linear_Constraint], optional) – A linear constraint, or a list thereof, restricting the search space by inequality (>=). The default is None.

  • nleq_constraints (Nonlinear_Constraint | list[Nonlinear_Constraint], optional) – A nonlinear constraint, or a list thereof, restricting the search space by nonlinear feasibility. The default is None.

  • task_index (int, optional) – The index of the task to optimize for multi-task models. The default is 0.

  • fixed_var (dict[str, float | str], optional) – Name of a variable and its setting, over which the suggestion should be held fixed. The default is None.

  • X_pending (pd.DataFrame, optional) – Experiments that are expected to be run before the next optimal set

  • eval_pending (pd.DataFrame, optional) – Acquisition values associated with X_pending
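For intuition about three of the acquisition options listed above ('EI', 'PI', and 'UCB'), their textbook closed forms for a single Gaussian prediction are sketched below. This is not the library's code, which may rely on sampled (Monte Carlo) versions; the function names are hypothetical, best stands for the best observed response, and beta is applied directly as a number of standard deviations, following the description above.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def expected_improvement(mu, sigma, best):
    """E[max(y - best, 0)] for y ~ N(mu, sigma^2)."""
    if sigma <= 0:
        return max(mu - best, 0.0)
    z = (mu - best) / sigma
    return (mu - best) * _STD_NORMAL.cdf(z) + sigma * _STD_NORMAL.pdf(z)


def probability_of_improvement(mu, sigma, best):
    """P(y > best) for y ~ N(mu, sigma^2)."""
    if sigma <= 0:
        return float(mu > best)
    return _STD_NORMAL.cdf((mu - best) / sigma)


def upper_confidence_bound(mu, sigma, beta=1.0):
    """Mean plus beta standard deviations."""
    return mu + beta * sigma


# Two hypothetical candidates given as (posterior mean, posterior std).
candidates = {"A": (1.0, 0.1), "B": (0.8, 0.6)}
best_so_far = 1.0

by_mean = max(candidates, key=lambda k: candidates[k][0])
by_ei = max(candidates, key=lambda k: expected_improvement(*candidates[k], best_so_far))
print(by_mean, by_ei)
```

Ranking by the mean alone favors the confident candidate A, while expected improvement favors the more uncertain candidate B, which is the explore/exploit trade-off these options control.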

Returns:

tuple[pd.DataFrame, pd.DataFrame] = (X_suggest, eval_suggest)
  • X_suggest (pd.DataFrame): Experiment matrix of real input variables, selected by optimizer.
  • eval_suggest (pd.DataFrame): Mean results (response, prediction interval, f(response), objective function) for each suggested experiment.

Raises:
  • UnfitError – If the surrogate model has not been fit before suggesting new experiments.

  • TypeError – If the target is not a Target object or a list of Target objects.

  • IncorrectObjectiveError – If the objective does not successfully execute on a sample.

  • TypeError – If the acquisition is not a list of strings or dictionaries.

  • UnsupportedError – If the provided acquisition function does not support output constraints.
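The 'SF' option described above (maximize the minimum Euclidean distance to X_train) can be sketched over a finite candidate set, with a greedy loop standing in for a batch of m_batch suggestions. The function name space_filling_batch and the greedy strategy are assumptions for illustration; the library optimizes over the continuous parameter space rather than a fixed list. Requires Python 3.8+ for math.dist.

```python
import math


def space_filling_batch(candidates, X_train, m_batch=1):
    """Greedily pick points that maximize the minimum distance to occupied points."""
    chosen, occupied, pool = [], list(X_train), list(candidates)
    for _ in range(min(m_batch, len(pool))):
        pick = max(pool, key=lambda c: min(math.dist(c, x) for x in occupied))
        chosen.append(pick)
        occupied.append(pick)  # later picks also stay away from earlier picks
        pool.remove(pick)
    return chosen


X_train = [(0.0, 0.0), (1.0, 1.0)]
candidates = [(0.1, 0.1), (0.0, 1.0), (0.5, 0.5), (0.9, 0.1)]
picks = space_filling_batch(candidates, X_train, m_batch=2)
print(picks)
```

Adding each pick to the occupied set before choosing the next one is what keeps a batch from clustering in a single empty region.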