Data Structure#
Experimental design space \(X_{space}\)#
Basic Syntax#
Each input variable is defined according to its type and domain. A continuous variable is specified by its name, followed by its lower and upper bounds.
Param_Continuous('varName', lower_bound, upper_bound)
A discrete variable is specified by its name, followed by the (ordered) list of possible values given as strings.
Param_Categorical('varName', ['level 1', 'level 2', 'level 3', ...])
An example list of input parameter specifications including commonly used variable types: continuous, categorical and ordinal:
from obsidian.parameters import Param_Continuous, Param_Categorical, Param_Ordinal
params = [
    Param_Continuous('Temperature', -10, 30),
    Param_Continuous('Concentration', 10, 150),
    Param_Continuous('Enzyme', 0.01, 0.30),
    Param_Categorical('Variant', ['MRK001', 'MRK002', 'MRK003']),
    Param_Ordinal('StirRate', ['Low', 'Medium', 'High']),
]
The design space \(X_{space}\) is then specified as a ParamSpace class object, initialized with the list of parameters.
from obsidian import ParamSpace
X_space = ParamSpace(params)
The ParamSpace class object can be exported to a dictionary, which makes it easy to save to a JSON file and reload for future use:
import json
with open('X_space.json', 'w') as f:
    X_space_dict = X_space.save_state()
    json.dump(X_space_dict, f)
with open('X_space.json', 'r') as f:
    X_space_dict = json.load(f)
    X_space_reload = ParamSpace.load_state(X_space_dict)
In addition, the ParamSpace class contains various instance methods for input variable transformation. These are called implicitly during optimization, so the user does not need to access them directly.
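As a rough illustration of what such input transformations might involve, the sketch below min-max scales a continuous value onto the unit interval and one-hot encodes a categorical level. The helper names `scale_continuous` and `one_hot` are hypothetical and are not part of obsidian; the encodings that ParamSpace actually applies may differ.

```python
def scale_continuous(value, lower, upper):
    """Map a value from [lower, upper] onto [0, 1] (min-max scaling)."""
    if not lower <= value <= upper:
        raise ValueError(f"{value} is outside [{lower}, {upper}]")
    return (value - lower) / (upper - lower)


def one_hot(level, levels):
    """Encode a categorical level as a 0/1 vector over the allowed levels."""
    if level not in levels:
        raise ValueError(f"{level!r} is not one of {levels}")
    return [1 if level == candidate else 0 for candidate in levels]


# Illustrative usage with the 'Temperature' and 'Variant' parameters above
temperature_scaled = scale_continuous(20, -10, 30)
variant_encoded = one_hot('MRK002', ['MRK001', 'MRK002', 'MRK003'])
print(temperature_scaled, variant_encoded)
```

Scaling every continuous input to a common range is a common way to keep variables with very different units (such as temperature and enzyme loading) on a comparable footing during model fitting.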
Additional Variable Types#
Continuous observational variable
For example, suppose an entire time course is measured during the experiment, and the data at all time points from 0 to 10 are used for fitting. During optimization, however, we are only interested in improving the result at a fixed time point of 6.
from obsidian.parameters import Param_Observational
Param_Observational(name = 'Time', min = 0, max = 10, design_point = 6)
Discrete numerical variable
from obsidian.parameters import Param_Discrete_Numeric
Param_Discrete_Numeric('LightStage', [1, 2, 3, 4, 5])
Task variable
Only one special ‘task’ categorical variable is allowed for encoding multiple tasks. A distinct response is predicted for each task.
from obsidian.parameters import Task
Task('TaskVar', ['Task_A', 'Task_B', 'Task_C', 'Task_D'])
Initial experimental conditions, or seed experiments \(X_0\)#
When we start the APO workflow from scratch, the initial experimental conditions are usually generated by random sampling or design-of-experiments algorithms.
For example, the following generates six input conditions \(X_0\) from the previously specified \(X_{space}\) using the Latin hypercube sampling (LHS) method:
from obsidian.experiment import ExpDesigner
designer = ExpDesigner(X_space, seed = 0)
X0 = designer.initialize(m_initial = 6, method='LHS')
print(X0.to_markdown())
| | Temperature | Concentration | Enzyme | Variant | StirRate |
|---|---|---|---|---|---|
| 0 | 13.3333 | 68.3333 | 0.2275 | MRK003 | High |
| 1 | 6.66667 | 115 | 0.0825 | MRK003 | Low |
| 2 | 26.6667 | 45 | 0.0341667 | MRK002 | Medium |
| 3 | 20 | 91.6667 | 0.275833 | MRK001 | Low |
| 4 | -6.66667 | 21.6667 | 0.179167 | MRK002 | Medium |
| 5 | 0 | 138.333 | 0.130833 | MRK001 | High |
The designer returns the experimental conditions as a pandas dataframe, which is the default data format in various obsidian functions.
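The continuous values in the table above are consistent with each variable being sampled once at the center of each of six equal-width bins (for Temperature on [-10, 30], the centers are -6.67, 0, 6.67, 13.33, 20 and 26.67). The sketch below illustrates that idea of a centered Latin hypercube in plain Python. The function `centered_lhs` is a hypothetical helper, not obsidian's implementation, and it will not reproduce the same row ordering as ExpDesigner.

```python
import random


def centered_lhs(bounds, n_samples, seed=0):
    """Return n_samples points; each dimension uses every bin center exactly once."""
    rng = random.Random(seed)
    columns = []
    for lower, upper in bounds:
        width = (upper - lower) / n_samples
        # One center per equal-width bin
        centers = [lower + (i + 0.5) * width for i in range(n_samples)]
        # Shuffle independently per dimension so bins are paired at random
        rng.shuffle(centers)
        columns.append(centers)
    # Transpose the per-dimension columns into one tuple per sample
    return list(zip(*columns))


# Bounds of 'Temperature' and 'Concentration' from the parameter list above
samples = centered_lhs([(-10, 30), (10, 150)], n_samples=6)
for row in samples:
    print(row)
```

Because every bin of every variable is visited exactly once, a small number of runs still covers the full range of each input, which is the main appeal of LHS for seed experiments.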
Experimental outcome variable(s) \(Y\)#
Basic Syntax#
Similar to the ParamSpace object for input variables, a Target class object handles the specification and preprocessing of experimental outcome variables.
For each outcome measurement, there are three essential arguments to be specified:
- name: Variable name, which must be provided by the user.
- f_transform: Transformation function for preprocessing the raw response values, to facilitate the numerical computations during optimization.
  - ‘Identity’: (default) No transformation
  - ‘Standard’: Normalization to zero mean and unit standard deviation
  - ‘Logit_MinMax’: Logit transformation with the range or scale automatically calculated from the data
  - ‘Logit_Percentage’: Assuming the input response is a percentage between 0 and 100, apply a logit transformation with scale 1/100.
- aim: Either ‘max’ (default) or ‘min’, which specifies the desirable direction for improvement. Currently, only continuous outcome values are handled.
Depending on the number of outcomes, define one Target object or a list of Target objects:
from obsidian import Target
target = Target(name = 'Yield', f_transform = 'Logit_Percentage', aim='max')
target_multiple = [
    Target(name = 'Yield', f_transform = 'Logit_Percentage', aim='max'),
    Target(name = 'Cost', f_transform = 'Standard', aim='min')
]
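To make the ‘Logit_Percentage’ and ‘Standard’ options more concrete, here is a minimal sketch of the underlying arithmetic in plain Python. The function names are hypothetical and not part of obsidian, and details such as the use of the sample standard deviation are assumptions; the library's own transforms may handle edge cases differently.

```python
import math
import statistics


def logit_percentage(y):
    """Logit of a percentage in (0, 100), after rescaling by 1/100."""
    p = y / 100.0
    return math.log(p / (1.0 - p))


def inverse_logit_percentage(z):
    """Map a logit value back onto the percentage scale."""
    return 100.0 / (1.0 + math.exp(-z))


def standardize(values):
    """Shift and scale values to zero mean and unit (sample) standard deviation."""
    mean = statistics.mean(values)
    std = statistics.stdev(values)
    return [(v - mean) / std for v in values]


z = logit_percentage(50.0)            # midpoint of the percentage scale
y_back = inverse_logit_percentage(z)  # round trip back to a percentage
scaled = standardize([1.0, 2.0, 3.0])
print(z, y_back, scaled)
```

The logit stretches values near 0 and 100 onto an unbounded scale, so a model fitted in the transformed space cannot predict a percentage outside the valid range once its output is mapped back.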
Example#
To demonstrate the usage of the Target class, we simulate a single-task experimental outcome \(y_0\) using the previously generated \(X_0\) and the analytical function ‘shifted_parab’.
from obsidian.experiment import Simulator
from obsidian.experiment.benchmark import shifted_parab
simulator = Simulator(X_space, shifted_parab, name='Yield')
y0 = simulator.simulate(X0)
print(y0.to_markdown())
| | Yield |
|---|---|
| 0 | 47.8147 |
| 1 | 62.5599 |
| 2 | 60.7972 |
| 3 | 39.1121 |
| 4 | 83.0833 |
| 5 | 52.2631 |
If \(y_0\) is entered manually, it should be a pandas dataframe whose column name (‘Yield’ here) matches the name specified in the target definition.
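A manually entered response might be assembled as in the sketch below, which assumes only that pandas is installed. The values are copied from the simulated table above, and the variable name `y0_manual` is illustrative.

```python
import pandas as pd

# Responses entered by hand; values copied from the simulated table above
y0_manual = pd.DataFrame(
    {'Yield': [47.8147, 62.5599, 60.7972, 39.1121, 83.0833, 52.2631]}
)

# The column name must match the name given to the Target ('Yield' here)
target_name = 'Yield'
if target_name not in y0_manual.columns:
    raise ValueError(f"Expected a column named {target_name!r}")

print(y0_manual.shape)
```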
When the ‘transform_f’ function is called with ‘fit=True’ during the optimization workflow, the raw response is saved as an attribute of the target object:
y_transformed = target.transform_f(y0, fit = True)
type(target.f_raw) # torch.Tensor
The Target class object, together with the input response ‘f_raw’ (if it exists), can be exported to a dictionary, which makes it easy to save to a JSON file and reload for future use:
import json
with open('target.json', 'w') as f:
    target_dict = target.save_state()
    json.dump(target_dict, f)
with open('target.json', 'r') as f:
    target_dict = json.load(f)
    target_reload = Target.load_state(target_dict)
Use campaign object to manage data#
The Campaign class object acts as the central hub, connecting all components of the APO workflow, including data management, the optimizer, and the experimental designer.
It is the recommended approach, as it offers a more streamlined workflow than using each component separately.
Here is an example of creating a Campaign class object and adding the initial dataset to its ‘data’ attribute:
import pandas as pd
from obsidian.campaign import Campaign
# Combine inputs and outcomes column-wise into one dataframe
data_Iter0 = pd.concat([X0, y0], axis=1)
my_campaign = Campaign(X_space, target, seed=0)
my_campaign.add_data(data_Iter0)
The ‘add_data’ method appends each new batch of data to a single pandas dataframe and labels it with an incrementing integer ‘Iteration’. The new data should be a dataframe that contains both the input experimental conditions and the target outcomes.
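The bookkeeping described here can be pictured with the following pandas sketch. The function `append_batch` is a hypothetical stand-in rather than obsidian's ‘add_data’ implementation, and the two batches contain placeholder values chosen purely for illustration.

```python
import pandas as pd


def append_batch(existing, new_batch):
    """Append a batch, tagging its rows with the next integer 'Iteration'."""
    tagged = new_batch.copy()
    if existing is None:
        # First batch starts the iteration counter at 0
        tagged['Iteration'] = 0
        combined = tagged.reset_index(drop=True)
    else:
        tagged['Iteration'] = int(existing['Iteration'].max()) + 1
        combined = pd.concat([existing, tagged], ignore_index=True)
    combined.index.name = 'Observation ID'
    return combined


# Two placeholder batches (illustrative values, not real measurements)
batch_a = pd.DataFrame({'Temperature': [0.0, 20.0], 'Yield': [50.0, 40.0]})
batch_b = pd.DataFrame({'Temperature': [10.0], 'Yield': [60.0]})

data = append_batch(None, batch_a)
data = append_batch(data, batch_b)
print(data)
```

Keeping an explicit ‘Iteration’ column makes it straightforward to filter the history by round, for example to compare the seed experiments against later suggested batches.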
There are various ways to retrieve data from a Campaign:
print(my_campaign.data.to_markdown())
| Observation ID | Temperature | Concentration | Enzyme | Variant | StirRate | Yield | Iteration |
|---|---|---|---|---|---|---|---|
| 0 | 13.3333 | 68.3333 | 0.2275 | MRK003 | High | 47.4471 | 0 |
| 1 | 6.66667 | 115 | 0.0825 | MRK003 | Low | 61.3989 | 0 |
| 2 | 26.6667 | 45 | 0.0341667 | MRK002 | Medium | 63.6213 | 0 |
| 3 | 20 | 91.6667 | 0.275833 | MRK001 | Low | 43.4116 | 0 |
| 4 | -6.66667 | 21.6667 | 0.179167 | MRK002 | Medium | 84.5542 | 0 |
| 5 | 0 | 138.333 | 0.130833 | MRK001 | High | 51.8577 | 0 |
and
my_campaign.X
my_campaign.y