Merge project files
This guide walks through merging multiple project JSON files. Merging is useful when several people run experiments against the same underlying project.
TL;DR
To perform a project merge, run lazyscribe.project.Project.merge:
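from lazyscribe import Project

# Load each version of the project from its JSON file
myversion = Project(fpath="project.json")
otherversion = Project(fpath="other-project.json")
# Merge the two; myversion acts as the primary project
new = myversion.merge(otherversion)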
The new project will take on the author and fpath attributes from myversion.
myversion also takes priority when both projects contain the same experiment with an identical last_updated value (see Handling manual updates).
Appending
lazyscribe will compare experiments in two ways:
If the slug is the same, compare the last_updated value, or
If the slug is not the same, compare the created_at value.
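The two rules above can be sketched in plain Python. This is only an illustration on dictionaries shaped like the experiment JSON in this guide; the is_newer function is a hypothetical name and is not part of lazyscribe, whose internal comparison may differ.

```python
from datetime import datetime


def is_newer(left: dict, right: dict) -> bool:
    """Return True if ``left`` should be treated as newer than ``right``.

    Hypothetical helper for illustration; not a lazyscribe function.
    """
    if left["slug"] == right["slug"]:
        # Same slug: both records describe one experiment, so compare edits.
        key = "last_updated"
    else:
        # Different slugs: different experiments, so compare creation times.
        key = "created_at"
    return datetime.fromisoformat(left[key]) > datetime.fromisoformat(right[key])


edited = {
    "slug": "first-experiment-20220101093000",
    "created_at": "2022-01-01T09:30:00",
    "last_updated": "2022-01-05T11:30:00",
}
stale = {**edited, "last_updated": "2022-01-01T09:30:00"}
second = {
    "slug": "second-experiment-20220105103000",
    "created_at": "2022-01-05T10:30:00",
    "last_updated": "2022-01-05T10:30:00",
}

print(is_newer(edited, stale))   # True: same slug, later last_updated
print(is_newer(second, edited))  # True: different slug, later created_at
```

Note that the second comparison ignores last_updated entirely, even though the edited record was touched later than the second experiment was created.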
Two experiments are considered equal only if their entire contents match, so appending
new experiments assumes that each one has a unique slug value. Suppose you have the following two projects, Project 1 followed by Project 2:
[
{
"author": "Me",
"created_at": "2022-01-01T09:30:00",
"last_updated": "2022-01-01T09:30:00",
"last_updated_by": "Me",
"metrics": {"auroc": 0.4},
"name": "First experiment",
"parameters": {},
"short_slug": "first-experiment",
"slug": "first-experiment-20220101093000",
"tests": [],
"artifacts": []
}
]
[
{
"author": "Me",
"created_at": "2022-01-01T09:30:00",
"last_updated": "2022-01-01T09:30:00",
"last_updated_by": "Me",
"metrics": {"auroc": 0.4},
"name": "First experiment",
"parameters": {},
"short_slug": "first-experiment",
"slug": "first-experiment-20220101093000",
"tests": [],
"artifacts": []
},
{
"author": "My Friend",
"created_at": "2022-01-05T10:30:00",
"last_updated": "2022-01-05T10:30:00",
"last_updated_by": "My Friend",
"metrics": {"auroc": 0.5},
"name": "Second experiment",
"parameters": {"features": ["col1", "col2"]},
"short_slug": "second-experiment",
"slug": "second-experiment-20220105103000",
"tests": [],
"artifacts": []
}
]
In this scenario, the first experiment is identical in each project, but Project 2 has a new experiment. The result from the merge will be Project 2’s experiment list.
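The appending behaviour can be approximated with a short sketch. It assumes experiments are plain dictionaries (trimmed here to a few fields), and that the merged list is ordered by created_at; append_new is a hypothetical helper, not a lazyscribe function.

```python
from datetime import datetime


def append_new(primary: list, other: list) -> list:
    """Combine two experiment lists, skipping exact duplicates.

    Hypothetical helper for illustration; not a lazyscribe function.
    """
    merged = list(primary)
    for experiment in other:
        # Dict equality compares every field, so an experiment is skipped
        # only when its entire contents match one already in the list.
        if experiment not in merged:
            merged.append(experiment)
    # Assumption: the merged list is ordered by creation time.
    return sorted(merged, key=lambda e: datetime.fromisoformat(e["created_at"]))


first = {
    "slug": "first-experiment-20220101093000",
    "created_at": "2022-01-01T09:30:00",
    "last_updated": "2022-01-01T09:30:00",
    "metrics": {"auroc": 0.4},
}
second = {
    "slug": "second-experiment-20220105103000",
    "created_at": "2022-01-05T10:30:00",
    "last_updated": "2022-01-05T10:30:00",
    "metrics": {"auroc": 0.5},
}

project_1 = [first]
project_2 = [dict(first), second]

merged = append_new(project_1, project_2)
print([e["slug"] for e in merged])
```

Because the first experiment is identical in both lists, it appears once in the result, and the result matches Project 2's experiment list.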
Updating
Suppose you have the following two projects, Project 1 followed by Project 2:
[
{
"author": "Me",
"created_at": "2022-01-01T09:30:00",
"last_updated": "2022-01-05T11:30:00",
"last_updated_by": "Me",
"metrics": {"auroc": 0.4},
"name": "First experiment",
"parameters": {"features": ["col1"]},
"short_slug": "first-experiment",
"slug": "first-experiment-20220101093000",
"tests": [],
"artifacts": []
}
]
[
{
"author": "Me",
"created_at": "2022-01-01T09:30:00",
"last_updated": "2022-01-01T09:30:00",
"last_updated_by": "Me",
"metrics": {"auroc": 0.4},
"name": "First experiment",
"parameters": {},
"short_slug": "first-experiment",
"slug": "first-experiment-20220101093000",
"tests": [],
"artifacts": []
},
{
"author": "My Friend",
"created_at": "2022-01-05T10:30:00",
"last_updated": "2022-01-05T10:30:00",
"last_updated_by": "My Friend",
"metrics": {"auroc": 0.5},
"name": "Second experiment",
"parameters": {"features": ["col1", "col2"]},
"short_slug": "second-experiment",
"slug": "second-experiment-20220105103000",
"tests": [],
"artifacts": []
}
]
In this scenario, I forgot to log the features parameter when I created the first experiment, so
I opened it in editable mode a few days later and added it. This means that Project 2 holds an outdated
representation of that experiment. When the projects are merged, the newer record is preserved for
first-experiment-20220101093000, and second-experiment-20220105103000 is added:
[
{
"author": "Me",
"created_at": "2022-01-01T09:30:00",
"last_updated": "2022-01-05T11:30:00",
"last_updated_by": "Me",
"metrics": {"auroc": 0.4},
"name": "First experiment",
"parameters": {"features": ["col1"]},
"short_slug": "first-experiment",
"slug": "first-experiment-20220101093000",
"tests": [],
"artifacts": []
},
{
"author": "My Friend",
"created_at": "2022-01-05T10:30:00",
"last_updated": "2022-01-05T10:30:00",
"last_updated_by": "My Friend",
"metrics": {"auroc": 0.5},
"name": "Second experiment",
"parameters": {"features": ["col1", "col2"]},
"short_slug": "second-experiment",
"slug": "second-experiment-20220105103000",
"tests": [],
"artifacts": []
}
]
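A rough sketch of the updating rule, again on trimmed dictionaries rather than real lazyscribe objects: for each slug, keep the record with the latest last_updated value. The keep_latest helper is hypothetical, and ordering the output by created_at is an assumption.

```python
from datetime import datetime


def _updated(experiment: dict) -> datetime:
    """Parse the experiment's last_updated timestamp."""
    return datetime.fromisoformat(experiment["last_updated"])


def keep_latest(primary: list, other: list) -> list:
    """Keep one record per slug, preferring the most recently updated.

    Hypothetical helper for illustration; not a lazyscribe function.
    """
    latest = {}
    for experiment in [*primary, *other]:
        current = latest.get(experiment["slug"])
        # Strict ">" means a tie keeps the record seen first (the primary's).
        if current is None or _updated(experiment) > _updated(current):
            latest[experiment["slug"]] = experiment
    # Assumption: the merged list is ordered by creation time.
    return sorted(
        latest.values(), key=lambda e: datetime.fromisoformat(e["created_at"])
    )


project_1 = [
    {
        "slug": "first-experiment-20220101093000",
        "created_at": "2022-01-01T09:30:00",
        "last_updated": "2022-01-05T11:30:00",
        "parameters": {"features": ["col1"]},
    }
]
project_2 = [
    {
        "slug": "first-experiment-20220101093000",
        "created_at": "2022-01-01T09:30:00",
        "last_updated": "2022-01-01T09:30:00",
        "parameters": {},
    },
    {
        "slug": "second-experiment-20220105103000",
        "created_at": "2022-01-05T10:30:00",
        "last_updated": "2022-01-05T10:30:00",
        "parameters": {"features": ["col1", "col2"]},
    },
]

merged = keep_latest(project_1, project_2)
for experiment in merged:
    print(experiment["slug"], experiment["parameters"])
```

The first experiment keeps the edited parameters from Project 1, and the second experiment is carried over from Project 2.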
Handling manual updates
Merging updated experiments works well when the user changes the experiment through the Python interface.
However, if you choose to edit the project JSON directly, please make sure to update the last_updated
field. If the last_updated field is not changed, the wrong experiment might persist in the final project.
When the last_updated values are identical, the merge gives priority to the first project: if you call
project1.merge(project2), the experiment from project1 will be preserved.
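If you do edit the JSON by hand, a small script can refresh the timestamp for you. The sketch below assumes the project file is a top-level list of experiment dictionaries, as in the examples in this guide, and that timestamps are naive ISO 8601 strings with second precision; touch_experiment is a hypothetical helper, not part of lazyscribe.

```python
import json
import tempfile
from datetime import datetime
from pathlib import Path


def touch_experiment(fpath, slug: str, editor: str) -> str:
    """Set last_updated and last_updated_by for one experiment in a project file.

    Hypothetical helper for illustration; not a lazyscribe function.
    """
    path = Path(fpath)
    experiments = json.loads(path.read_text())
    # Assumption: same timestamp format as the examples (no timezone).
    stamp = datetime.now().isoformat(timespec="seconds")
    for experiment in experiments:
        if experiment["slug"] == slug:
            experiment["last_updated"] = stamp
            experiment["last_updated_by"] = editor
            break
    else:
        raise KeyError(f"No experiment with slug {slug!r} in {path}")
    path.write_text(json.dumps(experiments, indent=4))
    return stamp


# Demonstrate on a throwaway file rather than a real project.
with tempfile.TemporaryDirectory() as tmp:
    fpath = Path(tmp) / "project.json"
    original = {
        "slug": "first-experiment-20220101093000",
        "last_updated": "2022-01-01T09:30:00",
        "last_updated_by": "Me",
        "parameters": {},
    }
    fpath.write_text(json.dumps([original]))
    stamp = touch_experiment(fpath, "first-experiment-20220101093000", editor="Me")
    updated = json.loads(fpath.read_text())[0]

print(updated["last_updated"] == stamp)
```

Running a helper like this after a manual edit means the edited record carries a later last_updated value than any stale copy, so the merge does not have to fall back on project order.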