Merge project files
This guide walks you through merging multiple project JSON files. This is useful when multiple people are running experiments with the same underlying project.
TL;DR
To merge two projects, call `lazyscribe.Project.merge`:
```python
from lazyscribe import Project

myversion = Project(fpath="project.json")
otherversion = Project(fpath="other-project.json")
# `new` takes its `author` and `fpath` from `myversion`
new = myversion.merge(otherversion)
```
The new project takes on the `author` and `fpath` attributes from `myversion`. `myversion` also takes priority when both projects contain the same experiment with the same `last_updated` value (see Handling manual updates).
Appending
`lazyscribe` compares experiments in one of two ways:

- If the `slug` is the same, it compares the `last_updated` value, or
- If the `slug` is not the same, it compares the `created_at` value.
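The following minimal sketch makes the comparison rule concrete. It is written in plain Python and works on raw JSON records, not on lazyscribe objects. The function `is_newer` is hypothetical and not part of lazyscribe. The records are trimmed to the three fields the rule uses.

```python
from datetime import datetime


def is_newer(a: dict, b: dict) -> bool:
    """Hypothetical helper: return True if record ``a`` ranks after record ``b``.

    Records with the same slug are compared on ``last_updated``;
    records with different slugs are compared on ``created_at``.
    """
    key = "last_updated" if a["slug"] == b["slug"] else "created_at"
    return datetime.fromisoformat(a[key]) > datetime.fromisoformat(b[key])


first = {
    "slug": "first-experiment-20220101093000",
    "created_at": "2022-01-01T09:30:00",
    "last_updated": "2022-01-05T11:30:00",
}
second = {
    "slug": "second-experiment-20220105103000",
    "created_at": "2022-01-05T10:30:00",
    "last_updated": "2022-01-05T10:30:00",
}

# Different slugs, so `created_at` decides: `second` ranks after `first`
# even though `first` was updated more recently.
print(is_newer(second, first))  # True
```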
Equality is determined by the contents of the entire experiment record, so appending new experiments assumes that each experiment has a unique `slug` value. Suppose you have the following projects:
Project 1:

```json
[
    {
        "author": "Me",
        "created_at": "2022-01-01T09:30:00",
        "last_updated": "2022-01-01T09:30:00",
        "last_updated_by": "Me",
        "metrics": {"auroc": 0.4},
        "name": "First experiment",
        "parameters": {},
        "short_slug": "first-experiment",
        "slug": "first-experiment-20220101093000",
        "tests": [],
        "artifacts": []
    }
]
```
Project 2:

```json
[
    {
        "author": "Me",
        "created_at": "2022-01-01T09:30:00",
        "last_updated": "2022-01-01T09:30:00",
        "last_updated_by": "Me",
        "metrics": {"auroc": 0.4},
        "name": "First experiment",
        "parameters": {},
        "short_slug": "first-experiment",
        "slug": "first-experiment-20220101093000",
        "tests": [],
        "artifacts": []
    },
    {
        "author": "My Friend",
        "created_at": "2022-01-05T10:30:00",
        "last_updated": "2022-01-05T10:30:00",
        "last_updated_by": "My Friend",
        "metrics": {"auroc": 0.5},
        "name": "Second experiment",
        "parameters": {"features": ["col1", "col2"]},
        "short_slug": "second-experiment",
        "slug": "second-experiment-20220105103000",
        "tests": [],
        "artifacts": []
    }
]
```
In this scenario, the first experiment is identical in both projects, but Project 2 has a new experiment. The merged project will therefore contain Project 2’s experiment list.
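The sketch below shows why the identical first experiment is not duplicated. It appends only those records from Project 2 that are not already present, comparing whole records. `append_new` is a hypothetical helper written for illustration, not lazyscribe's implementation. The records are trimmed to two fields.

```python
def append_new(mine: list[dict], other: list[dict]) -> list[dict]:
    """Hypothetical helper: add records from ``other`` that are not already in ``mine``.

    Two records count as equal only if every field matches.
    """
    return mine + [record for record in other if record not in mine]


first = {"slug": "first-experiment-20220101093000", "metrics": {"auroc": 0.4}}
second = {"slug": "second-experiment-20220105103000", "metrics": {"auroc": 0.5}}

project1 = [first]
project2 = [dict(first), second]  # an identical copy of `first`, plus a new record

merged = append_new(project1, project2)
print(merged == project2)  # True
```

Equality covers the entire record, so this helper would append a second, different version of an experiment with the same `slug`. That is why appending relies on unique slugs. It is also why records that share a slug are resolved by comparing `last_updated` instead.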
Updating
Suppose you have the following projects:
Project 1:

```json
[
    {
        "author": "Me",
        "created_at": "2022-01-01T09:30:00",
        "last_updated": "2022-01-05T11:30:00",
        "last_updated_by": "Me",
        "metrics": {"auroc": 0.4},
        "name": "First experiment",
        "parameters": {"features": ["col1"]},
        "short_slug": "first-experiment",
        "slug": "first-experiment-20220101093000",
        "tests": [],
        "artifacts": []
    }
]
```
Project 2:

```json
[
    {
        "author": "Me",
        "created_at": "2022-01-01T09:30:00",
        "last_updated": "2022-01-01T09:30:00",
        "last_updated_by": "Me",
        "metrics": {"auroc": 0.4},
        "name": "First experiment",
        "parameters": {},
        "short_slug": "first-experiment",
        "slug": "first-experiment-20220101093000",
        "tests": [],
        "artifacts": []
    },
    {
        "author": "My Friend",
        "created_at": "2022-01-05T10:30:00",
        "last_updated": "2022-01-05T10:30:00",
        "last_updated_by": "My Friend",
        "metrics": {"auroc": 0.5},
        "name": "Second experiment",
        "parameters": {"features": ["col1", "col2"]},
        "short_slug": "second-experiment",
        "slug": "second-experiment-20220105103000",
        "tests": [],
        "artifacts": []
    }
]
```
In this scenario, I forgot to log the `features` parameter when I created the experiment, so I opened it in editable mode a few days later and added it. This means that Project 2 has an outdated representation of the experiment. When the projects are merged, the newer record of `first-experiment-20220101093000` is preserved and `second-experiment-20220105103000` is added:
```json
[
    {
        "author": "Me",
        "created_at": "2022-01-01T09:30:00",
        "last_updated": "2022-01-05T11:30:00",
        "last_updated_by": "Me",
        "metrics": {"auroc": 0.4},
        "name": "First experiment",
        "parameters": {"features": ["col1"]},
        "short_slug": "first-experiment",
        "slug": "first-experiment-20220101093000",
        "tests": [],
        "artifacts": []
    },
    {
        "author": "My Friend",
        "created_at": "2022-01-05T10:30:00",
        "last_updated": "2022-01-05T10:30:00",
        "last_updated_by": "My Friend",
        "metrics": {"auroc": 0.5},
        "name": "Second experiment",
        "parameters": {"features": ["col1", "col2"]},
        "short_slug": "second-experiment",
        "slug": "second-experiment-20220105103000",
        "tests": [],
        "artifacts": []
    }
]
```
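The updating behavior can be sketched as a merge keyed on `slug`. A record from the second project replaces one from the first only when its `last_updated` is strictly later, and the result is ordered by `created_at`. This is a hypothetical `merge_records` function on trimmed JSON records, not lazyscribe's code. Visiting the first project's records first is what gives that project priority on ties.

```python
from datetime import datetime


def merge_records(mine: list[dict], other: list[dict]) -> list[dict]:
    """Hypothetical sketch: merge two experiment lists keyed on ``slug``.

    A record from ``other`` replaces one from ``mine`` only if its
    ``last_updated`` is strictly later, so ``mine`` wins ties.
    """
    merged = {record["slug"]: record for record in mine}
    for record in other:
        current = merged.get(record["slug"])
        if current is None or (
            datetime.fromisoformat(record["last_updated"])
            > datetime.fromisoformat(current["last_updated"])
        ):
            merged[record["slug"]] = record
    # Order the merged experiments by creation time.
    return sorted(merged.values(), key=lambda r: datetime.fromisoformat(r["created_at"]))


first_new = {
    "slug": "first-experiment-20220101093000",
    "created_at": "2022-01-01T09:30:00",
    "last_updated": "2022-01-05T11:30:00",
    "parameters": {"features": ["col1"]},
}
# The outdated copy held by Project 2: same slug, older timestamp, no features.
first_old = dict(first_new, last_updated="2022-01-01T09:30:00", parameters={})
second = {
    "slug": "second-experiment-20220105103000",
    "created_at": "2022-01-05T10:30:00",
    "last_updated": "2022-01-05T10:30:00",
    "parameters": {"features": ["col1", "col2"]},
}

merged = merge_records([first_new], [first_old, second])
print([r["parameters"] for r in merged])
# [{'features': ['col1']}, {'features': ['col1', 'col2']}]
```

Swapping the arguments, as in `merge_records([first_old, second], [first_new])`, gives the same result because the newer `last_updated` wins regardless of order.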
Handling manual updates
Merging updated experiments works well when you change an experiment through the Python interface, as in the example above.
However, if you edit the project JSON directly, make sure to also update the `last_updated` field yourself. If `last_updated` is not changed, the outdated version of the experiment might persist in the merged project.
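In that case, the edited record and its unedited copy can end up with the same `last_updated` value. When that happens, the merge gives priority to the first project: if you call `project1.merge(project2)`, the experiment from `project1` is preserved.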
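If you do need to edit a project file by hand, one way to avoid a stale `last_updated` is to make the change through a small script that refreshes the timestamp at the same time. The helper below, `edit_experiment`, is hypothetical. It assumes the project file is a JSON list of experiment records like the ones above. It writes `last_updated` in the same `YYYY-MM-DDTHH:MM:SS` format using the local clock, which is an assumption about how your project records time.

```python
import json
import tempfile
from datetime import datetime
from pathlib import Path


def edit_experiment(fpath: str, slug: str, updates: dict, editor: str) -> None:
    """Hypothetical helper: apply ``updates`` to one experiment in a project JSON file.

    It also refreshes ``last_updated`` and ``last_updated_by`` so that a later
    merge sees the edited record as the newest version.
    """
    path = Path(fpath)
    records = json.loads(path.read_text())
    for record in records:
        if record["slug"] == slug:
            record.update(updates)
            # Assumption: naive local timestamps in the format used above;
            # switch the clock (e.g. to UTC) if your project records use another one.
            record["last_updated"] = datetime.now().strftime("%Y-%m-%dT%H:%M:%S")
            record["last_updated_by"] = editor
            break
    else:
        raise KeyError(f"No experiment with slug {slug!r} in {fpath}")
    path.write_text(json.dumps(records, indent=4))


# Example: add the forgotten `features` parameter to a throwaway project file.
with tempfile.TemporaryDirectory() as tmp:
    project_file = Path(tmp) / "project.json"
    project_file.write_text(json.dumps([{
        "slug": "first-experiment-20220101093000",
        "last_updated": "2022-01-01T09:30:00",
        "last_updated_by": "Me",
        "parameters": {},
    }]))
    edit_experiment(
        str(project_file),
        "first-experiment-20220101093000",
        {"parameters": {"features": ["col1"]}},
        editor="Me",
    )
    edited = json.loads(project_file.read_text())[0]

print(edited["parameters"])  # {'features': ['col1']}
```

Under the rules described above, merging the edited file with an unedited copy should then keep the edited record, because its `last_updated` is now strictly later.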