Merged
Changes from 9 commits
34 changes: 16 additions & 18 deletions docs/basics/variable_registry.rst
Original file line number Diff line number Diff line change
@@ -63,24 +63,24 @@ Update the registry

Multiple functions are available to modify the plotting information of the variables in the registry, add or remove some parameters.

Ranges
------
Binning and ranges
------------------

The :func:`update_variable_registry_ranges() <plothist.variable_registry.update_variable_registry_ranges>` function automatically updates the range parameter in the ``yaml`` file to the ``min`` and ``max`` values of the variable in the dataset:
The :func:`update_variable_registry_binning() <plothist.variable_registry.update_variable_registry_binning>` function automatically updates two parameters in the ``yaml`` file: the number of bins is set to the number of edges returned by `numpy.histogram_bin_edges <https://numpy.org/doc/2.1/reference/generated/numpy.histogram_bin_edges.html#numpy-histogram-bin-edges>`_ minus one (the bins are regular), and the range is set to the ``min`` and ``max`` values of the variable in the dataset:

.. code-block:: python

from plothist import update_variable_registry_ranges
from plothist import update_variable_registry_binning

update_variable_registry_ranges(df, variable_keys)
update_variable_registry_binning(df, variable_keys)

The range has been updated for all the variables in ``variables_keys``. The ``yaml`` file is now:
The number of bins and the range have been updated for all the variables in ``variable_keys``. The ``yaml`` file is now:

.. code-block:: yaml

variable_0:
name: variable_0
bins: 50
bins: 121 # = len(numpy.histogram_bin_edges(df["variable_0"], bins="auto")) - 1
range:
- -10.55227774892869 # min(df["variable_0"])
- 10.04658448558009 # max(df["variable_0"])
@@ -94,10 +94,9 @@ The range has been updated for all the variables in ``variables_keys``. The ``ya
variable_1:
...

Then, you may manually modify the ``yaml`` to get a more suitable range to display in the plot.

Calling this function again on the same variable keys will not overwrite their ``range`` parameter, unless the ``overwrite`` parameter is set to ``True``.
Then, you may manually modify the ``yaml`` to get a more suitable binning and range to display in the plot.

Calling this function again on the same variable keys will not overwrite their ``bins`` or ``range`` parameters, unless the ``overwrite`` parameter is set to ``True``.
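
As a rough illustration of where the ``bins`` and ``range`` values in the ``yaml`` example come from, the sketch below reproduces the computation with NumPy alone. The synthetic ``values`` array and the variable names are assumptions made for this example; in plothist the step happens inside ``update_variable_registry_binning()``, and the exact numbers depend on the dataset.

```python
import numpy as np

# Synthetic stand-in for df["variable_0"] (an assumption for this sketch).
rng = np.random.default_rng(seed=0)
values = rng.normal(loc=0.0, scale=3.0, size=10_000)

# bins="auto" lets NumPy choose the bin width; the edges are regularly spaced
# and span the full extent of the data.
edges = np.histogram_bin_edges(values, bins="auto")

# Number of bins = number of edges minus one (stored under ``bins``).
n_bins = len(edges) - 1

# First and last edge = min and max of the data (stored under ``range``).
value_range = (float(edges[0]), float(edges[-1]))

print(n_bins, value_range)
```

Because ``"auto"`` adapts to the sample size and spread, the resulting bin count can be much larger than the previous fixed default of 50, which is why a manual adjustment of the ``yaml`` afterwards may still be worthwhile.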

Add or modify variable properties
---------------------------------
@@ -124,7 +123,7 @@ This will add the new properties to the ``yaml`` file to all the variables in ``

variable_0:
name: variable_0
bins: 50
bins: 121
range:
- -10.55227774892869
- 10.04658448558009
@@ -160,15 +159,14 @@ To remove a parameter from the plotting information, you can use the :func:`remo

from plothist import remove_variable_registry_parameters

remove_variable_registry_parameters(["range", "log", "legend_ncols", "new_property"], variable_keys)
remove_variable_registry_parameters(["bins", "range", "log", "legend_ncols", "new_property"], variable_keys)

The ``yaml`` file is updated:

.. code-block:: yaml

variable_0:
name: variable_0
bins: 50
label: variable_0
legend_location: best
docstring: ''
@@ -197,7 +195,7 @@ Here is an example of how to create, update, and use the variable registry to pl
plot_hist,
create_variable_registry,
update_variable_registry,
update_variable_registry_ranges,
update_variable_registry_binning,
get_variable_from_registry,
add_text,
)
@@ -208,8 +206,8 @@ Here is an example of how to create, update, and use the variable registry to pl
# Create the registry
create_variable_registry(variable_keys)

# Update the ranges
update_variable_registry_ranges(df, variable_keys)
# Update the number of bins and range
update_variable_registry_binning(df, variable_keys)

# Add custom info
update_variable_registry({"text": "my analysis"}, variable_keys)
@@ -245,7 +243,7 @@ Example: to plot a zoom on a variable but still keep the original one, you can c

variable_0:
name: variable_0
bins: 50
bins: 121
range:
- -10
- 10
@@ -257,7 +255,7 @@ Example: to plot a zoom on a variable but still keep the original one, you can c

variable_0_zoom:
name: variable_0
bins: 50
bins: 121
range:
- -1
- 1
Binary file modified docs/img/2d_hist_correlations_0.png
Binary file modified docs/img/2d_hist_correlations_1.png
Binary file modified docs/img/2d_hist_correlations_2.png
11 changes: 3 additions & 8 deletions src/plothist/__init__.py
@@ -9,12 +9,7 @@
get_ratio,
get_ratio_variances,
)
from .histogramming import (
create_axis,
flatten_2d_hist,
make_2d_hist,
make_hist,
)
from .histogramming import create_axis, flatten_2d_hist, make_2d_hist, make_hist
from .plothist_style import (
add_luminosity,
add_text,
@@ -43,7 +38,7 @@
get_variable_from_registry,
remove_variable_registry_parameters,
update_variable_registry,
update_variable_registry_ranges,
update_variable_registry_binning,
)

__all__ = [
@@ -81,7 +76,7 @@
"set_fitting_ylabel_fontsize",
"set_style",
"update_variable_registry",
"update_variable_registry_ranges",
"update_variable_registry_binning",
]


4 changes: 2 additions & 2 deletions src/plothist/examples/2d_hist/2d_hist_correlations.py
@@ -20,15 +20,15 @@
get_variable_from_registry,
make_2d_hist,
plot_2d_hist,
update_variable_registry_ranges,
update_variable_registry_binning,
)

# No need to redo this step if the registry was already created before
variable_keys = ["variable_0", "variable_1", "variable_2"]
unique_id = str(int(time.time() * 1000))[-8:] # unique ID based on current time
temporary_registry_path = f"./_temporary_variable_registry_{unique_id}.yaml"
create_variable_registry(variable_keys, path=temporary_registry_path)
update_variable_registry_ranges(df, variable_keys, path=temporary_registry_path)
update_variable_registry_binning(df, variable_keys, path=temporary_registry_path)

# Get all the correlation plot between the variables
variable_keys_combinations = list(combinations(variable_keys, 2))
31 changes: 22 additions & 9 deletions src/plothist/variable_registry.py
@@ -8,6 +8,7 @@
import warnings

import boost_histogram as bh
import numpy as np
import yaml

from plothist.histogramming import create_axis
@@ -126,7 +127,7 @@ def create_variable_registry(
{
variable_key: {
"name": variable_key,
"bins": 50,
"bins": "auto",
"range": ("min", "max"),
"label": variable_key,
"log": False,
@@ -259,25 +260,27 @@ def remove_variable_registry_parameters(
_save_variable_registry(variable_registry, path=path)


def update_variable_registry_ranges(
def update_variable_registry_binning(
data,
variable_keys: list[str] | None = None,
path: str = "./variable_registry.yaml",
overwrite: bool = False,
) -> None:
"""
Update the range parameters for multiple variables in the variable registry file.
Update both the bins and range parameters for multiple variables in the variable registry file.

Parameters
----------
data : numpy.ndarray or pandas.DataFrame
A dataset containing the data for the variables.
variable_keys : list[str]
A list of variable keys for which to update the range parameters in the registry. The variable needs to have a bin and range properties in the registry. Default is None: all variables in the registry are updated.
A list of variable keys for which to update the parameters in the registry.
Each variable needs to have ``bins`` and ``range`` properties in the registry.
Default is None: all variables in the registry are updated.
path : str, optional
The path to the variable registry file (default is "./variable_registry.yaml").
overwrite : bool, optional
If True, the range parameters will be overwritten even if it's not equal to ("min", "max") (default is False).
If True, the bins and range parameters will be overwritten even if they differ from "auto" or ("min", "max") (default is False).

Returns
-------
@@ -302,13 +305,23 @@ def update_variable_registry_ranges(
f"Variable {variable_key} does not have a name, bins or range property in the registry {path}."
)

range = ("min", "max") if overwrite else variable["range"]
bins = "auto" if overwrite else variable["bins"]
bin_number = len(np.histogram_bin_edges(data[variable["name"]], bins=bins)) - 1

if tuple(range) == ("min", "max"):
axis = create_axis(variable["bins"], tuple(range), data[variable["name"]])
range_val = ("min", "max") if overwrite else variable["range"]

if bins == "auto" or tuple(range_val) == ("min", "max"):
axis = create_axis(
bin_number if bins == "auto" else bins,
tuple(range_val),
data[variable["name"]],
)
if isinstance(axis, bh.axis.Regular):
update_variable_registry(
{"range": (float(axis.edges[0]), float(axis.edges[-1]))},
{
"bins": bin_number,
"range": (float(axis.edges[0]), float(axis.edges[-1])),
},
[variable_key],
path=path,
overwrite=True,
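
The condition introduced in the hunk above can be summarised in a small pure-Python sketch. ``needs_update`` is a hypothetical helper written only for illustration, not a function in plothist; it mirrors when the new code recomputes ``bins`` and ``range`` and leaves out the axis creation and the write to the ``yaml`` file.

```python
def needs_update(bins, value_range, overwrite=False):
    """Return True when a registry entry would be recomputed from the data.

    Hypothetical helper: it only mirrors the condition used in
    update_variable_registry_binning, not the update itself.
    """
    if overwrite:
        # Overwriting resets both parameters to their placeholder values.
        bins, value_range = "auto", ("min", "max")
    # Placeholders left by create_variable_registry trigger an update.
    return bins == "auto" or tuple(value_range) == ("min", "max")


# Freshly created entry: both placeholders are still present.
print(needs_update("auto", ("min", "max")))
# Entry already filled in (or edited by hand): left untouched by default.
print(needs_update(121, [-10.55, 10.05]))
# Same entry with overwrite=True: recomputed.
print(needs_update(121, [-10.55, 10.05], overwrite=True))
```

Note that a placeholder in either parameter is enough to trigger the update, so an entry with a hand-set ``bins`` but a ``range`` of ``("min", "max")`` is still refreshed.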