Data Structure

Experimental design space \(X_{space}\)

Basic Syntax

Each input variable is defined according to its variable type and domain. A continuous variable is specified by the variable name, followed by its lower and upper bounds.

Param_Continuous('varName', lower_bound, upper_bound)

A discrete variable is specified by the variable name, followed by the (ordered) list of possible values in string format.

Param_Categorical('varName', ['level 1', 'level 2', 'level 3', ...])

An example list of input parameter specifications covering the commonly used variable types (continuous, categorical, and ordinal):

from obsidian.parameters import Param_Continuous, Param_Categorical, Param_Ordinal

params = [
    Param_Continuous('Temperature', -10, 30),
    Param_Continuous('Concentration', 10, 150),
    Param_Continuous('Enzyme', 0.01, 0.30),
    Param_Categorical('Variant', ['MRK001', 'MRK002', 'MRK003']),
    Param_Ordinal('StirRate', ['Low', 'Medium', 'High']),
]

The \(X_{space}\) is then specified as a ParamSpace object, initialized from the list of parameters.

from obsidian import ParamSpace
X_space = ParamSpace(params)

The ParamSpace object can be exported to dictionary format, which makes it easy to save to a JSON file and reload for future use:

import json

with open('X_space.json', 'w') as f:
    X_space_dict = X_space.save_state()
    json.dump(X_space_dict, f)
  
with open('X_space.json', 'r') as f:
    X_space_dict = json.load(f)
    X_space_reload = ParamSpace.load_state(X_space_dict)

In addition, the ParamSpace class contains various instance methods for input variable transformation, which are called implicitly during optimization and do not need to be accessed directly by the user.

Additional Variable Types

  • Continuous observational variable

    For example, an entire time course is measured during the experiment, and data at all timepoints ranging from 0 to 10 are used for model fitting; during optimization, however, we are only interested in improving the result at a fixed time point of 6.

    from obsidian.parameters import Param_Observational
    Param_Observational(name = 'Time', min = 0, max = 10, design_point = 6)
    
  • Discrete numerical variable

    from obsidian.parameters import Param_Discrete_Numeric
    Param_Discrete_Numeric('LightStage', [1, 2, 3, 4, 5])
    
  • Task variable

    Only one special 'task' categorical variable is allowed for encoding multiple tasks. A distinct response will be predicted for each task.

    from obsidian.parameters import Task
    Task('TaskVar', ['Task_A', 'Task_B', 'Task_C', 'Task_D'])
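These additional types can be mixed with the basic ones in a single ParamSpace. A configuration sketch combining the calls shown above (variable names are illustrative, and it is assumed that Param_Observational is importable from obsidian.parameters like the other parameter types):

```python
from obsidian import ParamSpace
from obsidian.parameters import (
    Param_Continuous,
    Param_Discrete_Numeric,
    Param_Observational,
    Task,
)

# Illustrative design space mixing basic and additional variable types
params_extra = [
    Param_Continuous('Temperature', -10, 30),
    Param_Discrete_Numeric('LightStage', [1, 2, 3, 4, 5]),
    Param_Observational(name='Time', min=0, max=10, design_point=6),
    Task('TaskVar', ['Task_A', 'Task_B']),
]
X_space_extra = ParamSpace(params_extra)
```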
    

Initial experimental conditions, or seed experiments \(X_0\)

When we start the APO workflow from scratch, the initial experimental conditions are usually generated by random sampling or design-of-experiments algorithms.

For example, generate six input conditions \(X_0\) according to the previously specified \(X_{space}\) using the Latin hypercube sampling (LHS) method:

from obsidian.experiment import ExpDesigner

designer = ExpDesigner(X_space, seed = 0)
X0 = designer.initialize(m_initial = 6, method='LHS')
print(X0.to_markdown())

|    |   Temperature |   Concentration |      Enzyme | Variant   | StirRate   |
|---:|--------------:|----------------:|------------:|:----------|:-----------|
|  0 |      13.3333  |         68.3333 |   0.2275    | MRK003    | High       |
|  1 |       6.66667 |        115      |   0.0825    | MRK003    | Low        |
|  2 |      26.6667  |         45      |   0.0341667 | MRK002    | Medium     |
|  3 |      20       |         91.6667 |   0.275833  | MRK001    | Low        |
|  4 |      -6.66667 |         21.6667 |   0.179167  | MRK002    | Medium     |
|  5 |       0       |        138.333  |   0.130833  | MRK001    | High       |

The designer returns the experimental conditions as a pandas DataFrame, which is the default data format throughout obsidian.
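Because designs are plain pandas DataFrames, standard pandas I/O applies to them. A minimal sketch (example values copied from the table above) round-trips a small design through CSV text:

```python
import io

import pandas as pd

# A small stand-in for a designer output: plain DataFrame, mixed dtypes
X0_example = pd.DataFrame({
    'Temperature': [13.3333, 6.66667],
    'Variant': ['MRK003', 'MRK003'],
})

# Round-trip through an in-memory CSV buffer (a file path works the same way)
buffer = io.StringIO()
X0_example.to_csv(buffer, index=False)
buffer.seek(0)
X0_loaded = pd.read_csv(buffer)
```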

Experimental outcome variable(s) \(Y\)

Basic Syntax

Similar to the ParamSpace object for input variables, there is a Target class that handles the specification and preprocessing of experimental outcome variables.

For each outcome measurement, there are three essential arguments to be specified:

  • name: Variable name, a required user input

  • f_transform: Transformation function for preprocessing the raw response values, to facilitate numerical computation during optimization.

    • 'Identity': (default) No transformation

    • 'Standard': Normalization to zero mean and unit standard deviation

    • 'Logit_MinMax': Logit transformation with the range or scale automatically calculated from the data

    • 'Logit_Percentage': Assuming the input response is a percentage between 0 and 100, apply a logit transformation with scale 1/100

  • aim: Either 'max' (default) or 'min', specifying the desired direction of improvement. Currently only continuous outcome values are handled.
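To make the 'Logit_Percentage' option concrete, here is a sketch of the underlying math, assuming the standard logit form log(p / (1 - p)) applied to y/100 (an illustration of the idea, not obsidian's internal code):

```python
import math

def logit_percentage(y_pct):
    """Map a percentage response in (0, 100) onto the real line."""
    p = y_pct / 100.0  # scale 1/100
    return math.log(p / (1.0 - p))

# A 50% response maps to 0; responses near the 0/100 bounds are stretched
# outward, which makes the transformed values easier to handle numerically.
```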

Depending on the number of outcomes, define a single Target object or a list of multiple objects:

from obsidian import Target

target = Target(name = 'Yield', f_transform = 'Logit_Percentage', aim='max')

target_multiple = [
    Target(name = 'Yield', f_transform = 'Logit_Percentage', aim='max'),
    Target(name = 'Cost', f_transform = 'Standard', aim='min')
]

Example

To demonstrate the usage of the Target class, we simulate a single-task experimental outcome \(y_0\) using the previously generated \(X_0\) and the analytical function 'shifted_parab'.

from obsidian.experiment import Simulator
from obsidian.experiment.benchmark import shifted_parab

simulator = Simulator(X_space, shifted_parab, name='Yield')
y0 = simulator.simulate(X0)
print(y0.to_markdown())

|    |   Yield |
|---:|--------:|
|  0 | 47.8147 |
|  1 | 62.5599 |
|  2 | 60.7972 |
|  3 | 39.1121 |
|  4 | 83.0833 |
|  5 | 52.2631 |

If \(y_0\) is entered manually, it should be a pandas DataFrame whose column name matches the name 'Yield' specified in the target definition.
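For example, a manually entered outcome table could be built like this (the values here are illustrative, not measured data):

```python
import pandas as pd

# Column name must match the name given in the Target definition ('Yield')
y0_manual = pd.DataFrame({'Yield': [47.8, 62.6, 60.8, 39.1, 83.1, 52.3]})
```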

When the 'transform_f' function is called with 'fit=True' during the optimization workflow, the raw response is saved as an attribute of the target object:

y_transformed = target.transform_f(y0, fit = True)
type(target.f_raw) # torch.Tensor

The Target object, along with the input response 'f_raw' (if present), can be exported to dictionary format, which makes it easy to save to a JSON file and reload for future use:

import json

with open('target.json', 'w') as f:
    target_dict = target.save_state()
    json.dump(target_dict, f)
  
with open('target.json', 'r') as f:
    target_dict = json.load(f)
    target_reload = Target.load_state(target_dict)

Use the Campaign object to manage data

The Campaign class acts as the central hub, seamlessly connecting all components within the APO workflow, including data management, the optimizer, and the experimental designer. It is the recommended approach, offering a more streamlined workflow than using each component separately.

Here is an example of creating a Campaign object and adding the initial dataset to its 'data' attribute:

from obsidian.campaign import Campaign
import pandas as pd

data_Iter0 = pd.concat([X0, y0], axis=1)
my_campaign = Campaign(X_space, target, seed=0)
my_campaign.add_data(data_Iter0)

The 'add_data' method appends each new batch of data to a single pandas DataFrame, tagged with an incrementing integer 'Iteration'. The new data should be a DataFrame containing both the input experimental conditions and the target outcomes.
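The accumulated table can then be sliced with ordinary pandas operations. A sketch on mock data with the same 'Iteration' bookkeeping (column names follow the example above; values are illustrative, not from the example run):

```python
import pandas as pd

# Mock of accumulated campaign data across two batches
data = pd.DataFrame({
    'Yield': [47.4, 61.4, 55.0, 70.2],
    'Iteration': [0, 0, 1, 1],
})

# Rows from the most recent batch
latest = data[data['Iteration'] == data['Iteration'].max()]

# Best observed outcome per iteration
best_per_iter = data.groupby('Iteration')['Yield'].max()
```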

There are various ways to retrieve data from Campaign:

print(my_campaign.data.to_markdown())

|   Observation ID |   Temperature |   Concentration |      Enzyme | Variant   | StirRate   |   Yield |   Iteration |
|-----------------:|--------------:|----------------:|------------:|:----------|:-----------|--------:|------------:|
|                0 |      13.3333  |         68.3333 |   0.2275    | MRK003    | High       | 47.4471 |           0 |
|                1 |       6.66667 |        115      |   0.0825    | MRK003    | Low        | 61.3989 |           0 |
|                2 |      26.6667  |         45      |   0.0341667 | MRK002    | Medium     | 63.6213 |           0 |
|                3 |      20       |         91.6667 |   0.275833  | MRK001    | Low        | 43.4116 |           0 |
|                4 |      -6.66667 |         21.6667 |   0.179167  | MRK002    | Medium     | 84.5542 |           0 |
|                5 |       0       |        138.333  |   0.130833  | MRK001    | High       | 51.8577 |           0 |

and, individually, the input conditions and outcome values:

my_campaign.X
my_campaign.y