Data Structure#

Experimental design space \(X_{space}\)#

Basic Syntax#

Each input variable is defined according to its type and domain. A continuous variable is specified by its name, followed by its lower and upper bounds.

Param_Continuous('varName', lower_bound, upper_bound)

A discrete variable is specified by its name, followed by the (ordered) list of possible values as strings.

Param_Categorical('varName', ['level 1', 'level 2', 'level 3', ...])

An example list of input parameter specifications including commonly used variable types: continuous, categorical and ordinal:

from obsidian.parameters import Param_Continuous, Param_Categorical, Param_Ordinal

params = [
    Param_Continuous('Temperature', -10, 30),
    Param_Continuous('Concentration', 10, 150),
    Param_Continuous('Enzyme', 0.01, 0.30),
    Param_Categorical('Variant', ['MRK001', 'MRK002', 'MRK003']),
    Param_Ordinal('StirRate', ['Low', 'Medium', 'High']),
]

The \(X_{space}\) is then specified as a ParamSpace object, initialized with the list of parameters.

from obsidian import ParamSpace
X_space = ParamSpace(params)

The ParamSpace object can be exported to a dictionary, which makes it easy to save to a JSON file and reload later:

import json

with open('X_space.json', 'w') as f:
    X_space_dict = X_space.save_state()
    json.dump(X_space_dict, f)
  
with open('X_space.json', 'r') as f:
    X_space_dict = json.load(f)
    X_space_reload = ParamSpace.load_state(X_space_dict)
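
The save/reload pattern above relies on the exported state containing only JSON-serializable types. A minimal sketch of that pattern follows, using a hypothetical `ToySpace` class that borrows the `save_state`/`load_state` method names; it is an illustration only, not obsidian's ParamSpace implementation.

```python
import json

class ToySpace:
    """Hypothetical stand-in for a parameter space; not obsidian's ParamSpace."""

    def __init__(self, bounds):
        # bounds maps a variable name to a (lower, upper) pair
        self.bounds = {name: tuple(b) for name, b in bounds.items()}

    def save_state(self):
        # Return only JSON-friendly types (dict, list, str, numbers)
        return {"bounds": {name: list(b) for name, b in self.bounds.items()}}

    @classmethod
    def load_state(cls, state):
        # Rebuild an equivalent object from the saved dictionary
        return cls(state["bounds"])

space = ToySpace({"Temperature": (-10, 30), "Concentration": (10, 150)})
text = json.dumps(space.save_state())            # serialize to a JSON string
space_reload = ToySpace.load_state(json.loads(text))
print(space_reload.bounds == space.bounds)
```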

In addition, the ParamSpace class contains various instance methods for input variable transformation. These are called implicitly during optimization, so the user does not need to access them directly.
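
As an illustration of what such input transformations typically involve, the sketch below scales a continuous variable to the unit interval and one-hot encodes a categorical variable. The helper names `scale_continuous` and `one_hot` are hypothetical, and the encodings obsidian applies internally may differ.

```python
def scale_continuous(value, lower, upper):
    """Map a value from [lower, upper] onto the unit interval [0, 1]."""
    return (value - lower) / (upper - lower)

def one_hot(value, levels):
    """Encode a categorical level as a 0/1 indicator vector."""
    if value not in levels:
        raise ValueError(f"{value!r} is not one of {levels}")
    return [1.0 if value == level else 0.0 for level in levels]

# Encode one experimental condition from the example parameter list
temperature = scale_continuous(20, -10, 30)                    # 0.75
variant = one_hot('MRK002', ['MRK001', 'MRK002', 'MRK003'])   # [0.0, 1.0, 0.0]
encoded = [temperature] + variant
print(encoded)
```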

Additional Variable Types#

Continuous observational variable

For example, suppose an entire time course is measured during the experiment, and data from all time points between 0 and 10 are used for fitting. During optimization, however, we are only interested in improving the result at a fixed time point of 6.
```
from obsidian.parameters import Param_Observational
Param_Observational(name = 'Time', min = 0, max = 10, design_point = 6)
```
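
One way to picture this behaviour: the model is fit on rows covering every measured time point, while each candidate condition considered during optimization has the observational variable pinned to its design point. The sketch below uses plain dictionaries and a hypothetical helper `pin_observational`; it illustrates the idea only and is not how obsidian implements it.

```python
def pin_observational(candidates, name, design_point):
    """Return copies of the candidate conditions with one variable fixed."""
    return [{**row, name: design_point} for row in candidates]

# Training data may span the whole time course (0 to 10); values are illustrative
training_rows = [
    {'Temperature': 20, 'Time': 0},
    {'Temperature': 20, 'Time': 5},
    {'Temperature': 20, 'Time': 10},
]

# Candidates for the next experiment are only evaluated at Time = 6
candidates = [{'Temperature': 5}, {'Temperature': 25}]
pinned = pin_observational(candidates, 'Time', 6)
print(pinned)
```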

Discrete numerical variable

from obsidian.parameters import Param_Discrete_Numeric
Param_Discrete_Numeric('LightStage', [1, 2, 3, 4, 5])

Task variable

Only one special ‘task’ categorical variable is allowed for encoding multiple tasks. A distinct response will be predicted for each task.
```
from obsidian.parameters import Task
Task('TaskVar', ['Task_A', 'Task_B', 'Task_C', 'Task_D'])
```

Initial experimental conditions, or seed experiments \(X_0\)#

When we start the APO workflow from scratch, the initial experimental conditions are usually generated by random sampling or design-of-experiments algorithms.

For example, generate six input conditions \(X_0\) from the previously specified \(X_{space}\) using the Latin hypercube sampling (LHS) method:

from obsidian.experiment import ExpDesigner

designer = ExpDesigner(X_space, seed = 0)
X0 = designer.initialize(m_initial = 6, method='LHS')
print(X0.to_markdown())

|   | Temperature | Concentration | Enzyme | Variant | StirRate |
|---|---|---|---|---|---|
| 0 | 13.3333 | 68.3333 | 0.2275 | MRK003 | High |
| 1 | 6.66667 | 115 | 0.0825 | MRK003 | Low |
| 2 | 26.6667 | 45 | 0.0341667 | MRK002 | Medium |
| 3 | 20 | 91.6667 | 0.275833 | MRK001 | Low |
| 4 | -6.66667 | 21.6667 | 0.179167 | MRK002 | Medium |
| 5 | 0 | 138.333 | 0.130833 | MRK001 | High |

The designer returns experimental conditions as a pandas dataframe, which is the default data format in various obsidian functions.
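
The continuous columns in the table above sit at the midpoints of six equal-width bins, with each bin used exactly once per variable, which is characteristic of a centered Latin hypercube design. A minimal pure-Python sketch of that idea follows. The function name `centered_lhs` is hypothetical, and the row order it produces will not match the output of `ExpDesigner`, whose internals may differ.

```python
import random

def centered_lhs(bounds, m, seed=0):
    """Centered Latin hypercube: one sample at the midpoint of each of m bins."""
    rng = random.Random(seed)
    columns = {}
    for name, (lower, upper) in bounds.items():
        width = (upper - lower) / m
        # Midpoint of each of the m equal-width bins
        centers = [lower + (i + 0.5) * width for i in range(m)]
        # Shuffle independently per variable so bins are paired at random
        rng.shuffle(centers)
        columns[name] = centers
    # Transpose the columns into a list of m experimental conditions
    return [{name: columns[name][i] for name in bounds} for i in range(m)]

bounds = {'Temperature': (-10, 30), 'Concentration': (10, 150), 'Enzyme': (0.01, 0.30)}
design = centered_lhs(bounds, m=6)
for row in design:
    print({name: round(value, 4) for name, value in row.items()})
```

Because every bin of every variable is sampled exactly once, the design covers each one-dimensional range evenly even with only six points.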

Experimental outcome variable(s) \(Y\)#

Basic Syntax#

Similar to the ParamSpace object for input variables, there is a Target class that handles the specification and preprocessing of experimental outcome variables.

For each outcome measurement, there are three essential arguments to be specified:

- name: Variable name, which must be provided by the user.
- f_transform: Transformation function for preprocessing the raw response values, to facilitate numerical computation during optimization.
  - ‘Identity’: (default) No transformation
  - ‘Standard’: Normalization to zero mean and unit standard deviation
  - ‘Logit_MinMax’: Logit transformation with the range or scale calculated automatically from the data
  - ‘Logit_Percentage’: Assumes the input response is a percentage between 0 and 100, and applies a logit transformation with scale 1/100
- aim: Either ‘max’ (default) or ‘min’, which specifies the desired direction of improvement. Currently only continuous outcome values are handled.
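
To make the transformations concrete, here is a sketch of the textbook forms of the ‘Standard’ and ‘Logit_Percentage’ options, written with the standard library only. The function names are hypothetical, and obsidian's own implementation may differ in details such as whether the sample or population standard deviation is used.

```python
import math
import statistics

def standardize(values):
    """Shift and scale values to zero mean and unit (sample) standard deviation."""
    mean = statistics.mean(values)
    std = statistics.stdev(values)
    return [(v - mean) / std for v in values]

def logit_percentage(values):
    """Rescale percentages by 1/100, then apply logit(p) = log(p / (1 - p))."""
    out = []
    for v in values:
        p = v / 100.0
        if not 0.0 < p < 1.0:
            raise ValueError("logit is only defined strictly between 0 and 100 %")
        out.append(math.log(p / (1.0 - p)))
    return out

yields = [20.0, 50.0, 80.0]          # illustrative percentages
print(logit_percentage(yields))      # symmetric around 0, since logit(50 %) = 0
print(standardize(yields))
```

The logit stretches values near 0 % and 100 % onto an unbounded scale, which is why it suits responses that are bounded percentages.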

Depending on the number of outcomes, define a single Target object or a list of Target objects:

from obsidian import Target

target = Target(name = 'Yield', f_transform = 'Logit_Percentage', aim='max')

target_multiple = [
    Target(name = 'Yield', f_transform = 'Logit_Percentage', aim='max'),
    Target(name = 'Cost', f_transform = 'Standard', aim='min')
]

Example#

To demonstrate the usage of the Target class, we simulate a single-task experimental outcome \(y_0\) using the previously generated \(X_0\) and the analytical function ‘shifted_parab’.

from obsidian.experiment import Simulator
from obsidian.experiment.benchmark import shifted_parab

simulator = Simulator(X_space, shifted_parab, name='Yield')
y0 = simulator.simulate(X0)
print(y0.to_markdown())

	Yield
0	47.8147
1	62.5599
2	60.7972
3	39.1121
4	83.0833
5	52.2631

If \(y_0\) is entered manually, it should be a pandas dataframe whose column name (‘Yield’) matches the name specified in the target definition.

When the ‘transform_f’ function is called with ‘fit=True’ during the optimization workflow, the raw response is saved as an attribute of the target object:

y_transformed = target.transform_f(y0, fit = True)
type(target.f_raw) # torch.Tensor

The Target object, together with the raw response ‘f_raw’ (if it exists), can be exported to a dictionary, which makes it easy to save to a JSON file and reload later:

import json

with open('target.json', 'w') as f:
    target_dict = target.save_state()
    json.dump(target_dict, f)
  
with open('target.json', 'r') as f:
    target_dict = json.load(f)
    target_reload = Target.load_state(target_dict)

Use campaign object to manage data#

The Campaign object acts as the central hub, connecting all components within the APO workflow, including data management, the optimizer, and the experimental designer. Using it is the recommended approach, as it offers a more streamlined workflow than using each component separately.

Here is an example of creating a Campaign class object and adding the initial dataset to its ‘data’ attribute:

import pandas as pd
from obsidian.campaign import Campaign

data_Iter0 = pd.concat([X0, y0], axis=1)
my_campaign = Campaign(X_space, target, seed=0)
my_campaign.add_data(data_Iter0)

The ‘add_data’ method appends each new batch of data to a single pandas dataframe and labels it with an incrementing integer ‘Iteration’. The new data should be a dataframe containing both the input experimental conditions and the target outcomes.
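
The bookkeeping described above can be pictured with a small stand-in that tags each batch with an iteration counter. `ToyCampaign` is hypothetical and uses plain dictionaries instead of a dataframe. Starting the counter at 0 is an assumption, not confirmed behaviour of obsidian's Campaign.

```python
class ToyCampaign:
    """Hypothetical stand-in that accumulates batches of experimental rows."""

    def __init__(self):
        self.data = []          # all rows collected so far
        self._iteration = 0     # label for the next batch (assumed to start at 0)

    def add_data(self, batch):
        # Tag every row in the batch with the current iteration, then append
        for row in batch:
            self.data.append({**row, 'Iteration': self._iteration})
        self._iteration += 1

campaign = ToyCampaign()
# Illustrative values only; each row holds inputs and the measured outcome
campaign.add_data([{'Temperature': 20, 'Yield': 40.0}, {'Temperature': 0, 'Yield': 50.0}])
campaign.add_data([{'Temperature': 10, 'Yield': 60.0}])
print([row['Iteration'] for row in campaign.data])
```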

There are various ways to retrieve data from Campaign:

print(my_campaign.data.to_markdown())

| Observation ID | Temperature | Concentration | Enzyme | Variant | StirRate | Yield |
|---|---|---|---|---|---|---|
| 0 | 13.3333 | 68.3333 | 0.2275 | MRK003 | High | 47.4471 |
| 1 | 6.66667 | 115 | 0.0825 | MRK003 | Low | 61.3989 |
| 2 | 26.6667 | 45 | 0.0341667 | MRK002 | Medium | 63.6213 |
| 3 | 20 | 91.6667 | 0.275833 | MRK001 | Low | 43.4116 |
| 4 | -6.66667 | 21.6667 | 0.179167 | MRK002 | Medium | 84.5542 |
| 5 | 0 | 138.333 | 0.130833 | MRK001 | High | 51.8577 |

and

my_campaign.X
my_campaign.y