Version artifacts¶
This guide will walk you through the process of using lazyscribe.repository.Repository
to access and version artifacts across projects.
A lazyscribe.repository.Repository is an organized structure that stores and versions your artifacts.
Using a repository as a standalone structure¶
If you have artifacts and/or objects that were generated outside of the lazyscribe ecosystem,
you can still use them with the lazyscribe.repository.Repository structure. Similar to the guide
on saving artifacts with experiments, we will use
lazyscribe.repository.Repository.log_artifact():
from lazyscribe import Repository
repository = Repository("repository.json", mode="w")
repository.log_artifact(name="features", value=[0, 1, 2], handler="json", indent=4)
repository.save()
After calling lazyscribe.repository.Repository.log_artifact(), the value [0, 1, 2] will be associated
with the repository. However, it won’t appear as a JSON file until you call
lazyscribe.repository.Repository.save(). You can retrieve the artifact using
lazyscribe.repository.Repository.load_artifact():
from lazyscribe import Repository
repository = Repository("repository.json", mode="r") # read-only mode
features = repository.load_artifact(name="features")
So, what’s the big deal? In the repository class, you can log artifacts with overlapping names. Each artifact is assigned an integer version number as well as a creation date, allowing you to time-travel between versions.
from lazyscribe import Repository
# append-only mode reads in the existing repository and allows for new artifacts
repository = Repository("repository.json", mode="a")
repository.log_artifact(name="features", value=[0, 1, 2, 3], handler="json", indent=4)
repository.save()
Now we have two versions of the same features artifact. There are multiple ways to load a specific
version of your artifact.
from lazyscribe import Repository
repository = Repository("repository.json", mode="r")
# Without any additional parameters, Repository will retrieve the most recent version
newest = repository.load_artifact("features")
# You can specify a specific integer version (0-indexed)
oldest = repository.load_artifact("features", version=0)
# Or the exact datetime
on_this_date = repository.load_artifact("features", version="YYYY-MM-DDTHH:MM:SS")
# To "time-travel", use `match="asof"` with a datetime version to get the most recent version
# as of the given date
as_of_this_date = repository.load_artifact("features", version="YYYY-MM-DDTHH:MM:SS", match="asof")
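To make the `match="asof"` behaviour concrete, here is a minimal standard-library sketch of an as-of lookup. It does not use lazyscribe and is not a description of its implementation; the `created_at` list and the `asof_index` helper are hypothetical names introduced only for illustration.

```python
from bisect import bisect_right
from datetime import datetime

# Hypothetical creation timestamps for three versions of one artifact,
# sorted from oldest (version 0) to newest (version 2).
created_at = [
    datetime(2024, 1, 1, 9, 0, 0),
    datetime(2024, 2, 1, 9, 0, 0),
    datetime(2024, 3, 1, 9, 0, 0),
]


def asof_index(timestamps: list[datetime], when: datetime) -> int:
    """Return the index of the latest timestamp that is <= ``when``."""
    position = bisect_right(timestamps, when)
    if position == 0:
        raise ValueError(f"No version exists as of {when.isoformat()}")
    return position - 1


# A date between versions 1 and 2 resolves to version 1.
mid_february = asof_index(created_at, datetime(2024, 2, 15))
# An exact creation time resolves to that version.
exact = asof_index(created_at, datetime(2024, 1, 1, 9, 0, 0))
```

In this sketch, a date earlier than every version raises an error instead of silently returning something; how lazyscribe itself handles that case is not covered here.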
Promote artifacts from experiments to the repository¶
Model experimentation is meant to be ephemeral. The Repository provides us with a structure to deploy and track versions of artifacts over time. So, how do these systems interact?
We can use lazyscribe.experiment.Experiment.promote_artifact() to associate an artifact with a repository.
The notion is that you may want to deploy/version the artifacts from the most successful experiment in
a project. Here’s how you use it.
First, let’s create a project and log an experiment:
from lazyscribe import Project
project = Project("project.json")
with project.log("my-experiment") as exp:
    exp.log_artifact(name="features", value=[0, 1, 2], handler="json", indent=4)
project.save()
Now, let’s reload that project and promote the artifact to the repository:
from lazyscribe import Project, Repository
project = Project("project.json", mode="r")
repository = Repository("repository.json")
project["my-experiment"].promote_artifact(repository, "features")
If you are calling lazyscribe.experiment.Experiment.promote_artifact() after re-loading a project,
the method
copies the artifact from the experiment filesystem location to the repository filesystem location, and
calls lazyscribe.repository.Repository.save() to ensure repository.json is “in sync” with the filesystem.
If you log the artifact to an experiment and call lazyscribe.experiment.Experiment.promote_artifact() before
calling lazyscribe.project.Project.save(), it will behave exactly as if you called
lazyscribe.repository.Repository.log_artifact() – you will be responsible for calling
lazyscribe.repository.Repository.save().
Create associated groups of artifact-versions¶
Important
New in 2.0.0.
While versioning individual artifacts is useful, oftentimes we want to create groups of related assets. These assets have implicit compatibility, allowing users to time-travel through an entire deployment. We have implemented this type of functionality through releases.
All we need is a repository:
from lazyscribe import Repository
from lazyscribe import release as lzr
repository = Repository(..., mode="r")
release = lzr.create_release(repository, "v0.1.0")
The output lazyscribe.release.Release object contains 3 attributes:
tag: a string identifier for the release. Commonly coincides with semantic versioning.
artifacts: a list of the latest available artifact names and versions in the repository.
created_at: a creation timestamp for the release (in UTC).
Then, we can dump this release to a file:
with open("releases.json", "w") as outfile:
    lzr.dump([release], outfile)
Now, if someone wants to reference the collective group of individual artifact-versions associated with this release, they can
open the repository,
load the release, and
filter the repository.
In action:
complete_repository = Repository(..., mode="r")
with open("releases.json", "r") as infile:
    releases = lzr.load(infile)
my_release = lzr.find_release(releases, "v0.1.0")
# Filter the repository we just opened down to the artifact-versions in the release
filtered_repo_ = complete_repository.filter(my_release.artifacts)
filtered_repo_ is a read-only version of the original repository object. It will have, at maximum, one
version for each artifact present in the original repository.
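The "at maximum, one version for each artifact" property can be pictured with plain data structures. The sketch below is a standard-library illustration only, not lazyscribe's implementation; the tuple layout and the `filter_entries` helper are assumptions made for this example.

```python
# Hypothetical repository contents: (artifact name, version) pairs.
repository_entries = [
    ("features", 0),
    ("features", 1),
    ("model", 0),
    ("model", 1),
    ("model", 2),
]

# Hypothetical release contents: the version of each artifact that was the
# latest when the release was created (before "model" version 2 was logged).
release_artifacts = [("features", 1), ("model", 1)]


def filter_entries(entries, pinned):
    """Keep only the entries whose (name, version) pair is pinned by the release."""
    wanted = set(pinned)
    return [entry for entry in entries if entry in wanted]


filtered = filter_entries(repository_entries, release_artifacts)
names = [name for name, _ in filtered]
```

Because a release pins one version per artifact name, the filtered result cannot contain two versions of the same artifact, even when newer versions have since been added to the repository.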
Just like artifacts themselves, lazyscribe.release.find_release() supports asof matches based
on the release creation timestamp.
Automated release creation via pyproject.toml¶
If you have multiple repositories under a single project header, managing groups of artifact-versions alongside the project can be a challenge. We have additional functionality to synchronize your project version and repositories. Suppose you have the following setup:
├── src
│ ├── model-1
│ │ ├── ...
│ │ ├── repository.json
│ ├── model-2
│ │ ├── ...
│ │ ├── repository.json
├── pyproject.toml
We can integrate project metadata with the release files. All we have to do is add a section to our pyproject.toml
that tells Lazyscribe where to look:
[project]
version = "1.0.0"
...
[tool.lazyscribe]
repositories = [
"src/model-1/repository.json",
"src/model-2/repository.json"
]
With this configuration, we can create releases for both repositories at once using
lazyscribe.release.release_from_toml():
import lazyscribe.release as lzr
with open("pyproject.toml") as infile:
    lzr.release_from_toml(infile.read())
Afterwards, we will have two new files in our tree, one releases.json next to each repository:
├── src
│ ├── model-1
│ │ ├── ...
│ │ ├── repository.json
│ │ ├── releases.json
│ ├── model-2
│ │ ├── ...
│ │ ├── repository.json
│ │ ├── releases.json
├── pyproject.toml
These releases will have tag v1.0.0.