.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "tutorials/tests.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_tutorials_tests.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_tutorials_tests.py:

Logging non-global metrics and artifacts with tests
===================================================

In this tutorial, we will demonstrate how you can use
:py:class:`lazyscribe.test.Test` objects to log metrics, parameters, and
artifacts for specific sub-populations of your experiment data.

A common pattern in ML development is to evaluate a model on the overall
dataset *and* on specific data slices (e.g. by demographic group, data
source, or class). Attaching these per-slice results directly to the
experiment, rather than keeping them in separate files, makes it easier to
compare slices across experiments and to reproduce past evaluations.

.. GENERATED FROM PYTHON SOURCE LINES 17-27

.. code-block:: Python

    import json
    import tempfile
    from pathlib import Path

    from sklearn.datasets import make_classification
    from sklearn.svm import SVC

    from lazyscribe import Project

.. GENERATED FROM PYTHON SOURCE LINES 28-29

First, create some toy data and split off a "subpopulation" (the last 200
samples).

.. GENERATED FROM PYTHON SOURCE LINES 29-33

.. code-block:: Python

    X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
    X_sub, y_sub = X[800:], y[800:]

.. GENERATED FROM PYTHON SOURCE LINES 34-40

Next, initialise the project and run the experiment. We use
:py:meth:`lazyscribe.experiment.Experiment.log_test` as a context manager to
log the sub-population evaluation.
Inside the context, we can call the same
:py:meth:`~lazyscribe.test.Test.log_metric` and
:py:meth:`~lazyscribe.test.Test.log_parameter` methods as on a regular
experiment, as well as the new
:py:meth:`~lazyscribe.test.Test.log_artifact` method.

.. GENERATED FROM PYTHON SOURCE LINES 40-59

.. code-block:: Python

    tmpdir = Path(tempfile.mkdtemp())
    project = Project(fpath=tmpdir / "project.json", mode="w")
    with project.log(name="base-performance") as exp:
        model = SVC(kernel="linear", random_state=0)
        model.fit(X, y)
        exp.log_metric("score", model.score(X, y))

        with exp.log_test(name="subpopulation-a") as test:
            sub_score = model.score(X_sub, y_sub)
            predictions = model.predict(X_sub).tolist()
            test.log_metric("score", sub_score)
            test.log_parameter("n_samples", len(y_sub))
            # Persist the predictions list as a JSON artifact.
            test.log_artifact(name="predictions", value=predictions, handler="json")

.. GENERATED FROM PYTHON SOURCE LINES 60-62

Artifacts are **not** written to disk at call time. Call
:py:meth:`lazyscribe.Project.save` to persist both the project JSON and any
pending artifact files.

.. GENERATED FROM PYTHON SOURCE LINES 62-65

.. code-block:: Python

    project.save()

.. GENERATED FROM PYTHON SOURCE LINES 66-67

Let's verify the test was captured by printing its data.

.. GENERATED FROM PYTHON SOURCE LINES 67-72

.. code-block:: Python

    exp_data = project["base-performance"]
    test_data = exp_data.tests[0]
    print(json.dumps(test_data.to_dict(), indent=4, default=str))

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    {
        "name": "subpopulation-a",
        "description": null,
        "metrics": {
            "score": 0.955
        },
        "parameters": {
            "n_samples": 200
        },
        "artifacts": [
            {
                "name": "predictions",
                "fname": "predictions-20260424140221.json",
                "created_at": "2026-04-24T14:02:21",
                "expiry": null,
                "version": 0,
                "handler": "json"
            }
        ]
    }

.. GENERATED FROM PYTHON SOURCE LINES 73-75

To reload the test artifact in a later session, open the project in read
mode and call :py:meth:`lazyscribe.test.Test.load_artifact` on the test.

.. GENERATED FROM PYTHON SOURCE LINES 75-82

.. code-block:: Python

    project_read = Project(fpath=tmpdir / "project.json", mode="r")
    exp_read = project_read["base-performance"]
    test_read = exp_read.tests[0]
    loaded_predictions = test_read.load_artifact("predictions")
    print(loaded_predictions)

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    [0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1]

.. rst-class:: sphx-glr-timing

**Total running time of the script:** (0 minutes 0.019 seconds)

.. _sphx_glr_download_tutorials_tests.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: tests.ipynb <tests.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: tests.py <tests.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: tests.zip <tests.zip>`

.. only:: html

  .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_
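As an aside, the dictionary printed by ``to_dict()`` in the tutorial is plain, JSON-serialisable data, so it can be post-processed with the standard library alone, without reloading lazyscribe or the model. A minimal sketch (the ``summarise_tests`` helper and the trimmed ``record`` literal are our own illustration, not part of the lazyscribe API):

```python
import json

# A test record in the shape printed by ``to_dict()`` above,
# trimmed to the fields used here.
record = {
    "name": "subpopulation-a",
    "metrics": {"score": 0.955},
    "parameters": {"n_samples": 200},
}


def summarise_tests(records):
    """Map each test name to its logged "score" metric."""
    return {r["name"]: r["metrics"]["score"] for r in records}


print(json.dumps(summarise_tests([record])))  # {"subpopulation-a": 0.955}
```

Collecting such records from several saved projects makes it straightforward to tabulate per-slice scores across experiments.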