Logging non-global metrics and artifacts with tests

In this tutorial, we demonstrate how to use lazyscribe.test.Test objects to log metrics, parameters, and artifacts for specific sub-populations of your experiment data.

A common pattern in ML development is to evaluate a model on the overall dataset and on specific data slices (e.g. by demographic group, data source, or class). Attaching these per-slice results directly to the experiment — rather than keeping them in separate files — makes it easier to compare slices across experiments and to reproduce past evaluations.
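The per-slice pattern itself is independent of any tracking library. As a minimal standalone sketch (with hypothetical toy labels, not using lazyscribe), computing an overall score alongside one score per slice looks like this:

```python
# Hypothetical toy labels and group assignments for illustration only.
y_true = [0, 1, 1, 0, 1, 0]
y_pred = [0, 1, 0, 0, 1, 1]
groups = ["a", "a", "a", "b", "b", "b"]

def accuracy(truth, pred):
    """Fraction of positions where the labels match."""
    return sum(t == p for t, p in zip(truth, pred)) / len(truth)

# One overall score, plus one score per data slice.
overall = accuracy(y_true, y_pred)
by_group = {
    g: accuracy(
        [t for t, gg in zip(y_true, groups) if gg == g],
        [p for p, gg in zip(y_pred, groups) if gg == g],
    )
    for g in sorted(set(groups))
}
print(overall, by_group)
```

Attaching each entry of `by_group` to the experiment, as shown below with `log_test`, keeps the slice scores next to the overall score rather than in separate files.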

import json
import tempfile
from pathlib import Path

from sklearn.datasets import make_classification
from sklearn.svm import SVC

from lazyscribe import Project

First, create some toy data and split off a “subpopulation” (the last 200 samples).

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_sub, y_sub = X[800:], y[800:]

Next, initialise the project and run the experiment. We use lazyscribe.experiment.Experiment.log_test() as a context manager to log the sub-population evaluation.

Inside the context, we can call the same log_metric() and log_parameter() methods as on a regular experiment, as well as the new log_artifact() method.

tmpdir = Path(tempfile.mkdtemp())
project = Project(fpath=tmpdir / "project.json", mode="w")

with project.log(name="base-performance") as exp:
    model = SVC(kernel="linear", random_state=0)
    model.fit(X, y)
    exp.log_metric("score", model.score(X, y))

    with exp.log_test(name="subpopulation-a") as test:
        sub_score = model.score(X_sub, y_sub)
        predictions = model.predict(X_sub).tolist()

        test.log_metric("score", sub_score)
        test.log_parameter("n_samples", len(y_sub))

        # Persist the predictions list as a JSON artifact.
        test.log_artifact(name="predictions", value=predictions, handler="json")
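The "json" handler serializes the value with Python's standard library json module when the project is saved. A minimal standalone sketch of the equivalent round-trip (not using lazyscribe; the filename here is illustrative, the real one gets a timestamp suffix as shown in the output below):

```python
import json
import tempfile
from pathlib import Path

predictions = [0, 1, 1, 0, 1]

# Write the list to disk the way a JSON handler would at save time.
artifact_dir = Path(tempfile.mkdtemp())
fname = artifact_dir / "predictions.json"  # illustrative filename
with open(fname, "w") as buf:
    json.dump(predictions, buf)

# Reading it back recovers the original Python list.
with open(fname) as buf:
    loaded = json.load(buf)
print(loaded == predictions)  # True
```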

Artifacts are not written to disk at call time. Call lazyscribe.Project.save() to persist both the project JSON and any pending artifact files.

project.save()

Let’s verify the test was captured by printing its data.

exp_data = project["base-performance"]
test_data = exp_data.tests[0]
print(json.dumps(test_data.to_dict(), indent=4, default=str))
{
    "name": "subpopulation-a",
    "description": null,
    "metrics": {
        "score": 0.955
    },
    "parameters": {
        "n_samples": 200
    },
    "artifacts": [
        {
            "name": "predictions",
            "fname": "predictions-20260424140221.json",
            "created_at": "2026-04-24T14:02:21",
            "expiry": null,
            "version": 0,
            "handler": "json"
        }
    ]
}

To reload the test artifact in a later session, open the project in read mode and call lazyscribe.test.Test.load_artifact() on the test.

project_read = Project(fpath=tmpdir / "project.json", mode="r")
exp_read = project_read["base-performance"]
test_read = exp_read.tests[0]
loaded_predictions = test_read.load_artifact("predictions")

print(loaded_predictions)
[0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1]
