Evaluating the Benchmark Results
To evaluate the benchmark results, first make sure the input data has been downloaded with the data loader and is available at path/to/data-loader/datasets.
At a high level, the benchmark evaluation pipeline proceeds through the following steps:
- Compute three error bound levels for each input dataset.
- Compress each input dataset with all the benchmark compressors for all three error bounds.
- Compute compressor performance metrics for evaluation purposes.
- Optional: Create summary plots of the benchmark results.
We will now go through each of these steps in more detail. As you work through these steps, the pipeline will progressively populate the directories datasets-error-bounds, compressed-datasets, metrics, and plots.
Create Error Bounds
Begin by creating the error bounds for each dataset using the following command:
uv run python -m climatebenchpress.compressor.scripts.create_error_bounds \
--data-loader-basepath=path/to/data-loader
The computed error bounds are stored in the datasets-error-bounds directory.
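If you want to verify that the error bounds were generated before moving on, a quick directory listing is enough. The snippet below is a minimal check in Python; it assumes datasets-error-bounds was created in your current working directory, so adjust the path if your setup differs.

from pathlib import Path

# Assumed location of the error-bounds output; adjust if your run places it elsewhere.
bounds_dir = Path("datasets-error-bounds")

# List every file that the error-bounds step produced.
for entry in sorted(bounds_dir.rglob("*")):
    if entry.is_file():
        print(entry)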
Compress Input Datasets
Next, compress all the input datasets by running:
uv run python -m climatebenchpress.compressor.scripts.compress \
--data-loader-basepath=path/to/data-loader
This populates the compressed-datasets directory with the following structure:
compressed-datasets/
    dataset1/
        {var_name}-{err_bound_type}={low_err_bound_val}_{var_name2}-{err_bound_type2}={low_err_bound_val2}/
            compressor1/
                decompressed.zarr
                measurements.json
            compressor2/
                ...
        {var_name}-{err_bound_type}={mid_err_bound_val}_{var_name2}-{err_bound_type2}={mid_err_bound_val2}/
            ...
        {var_name}-{err_bound_type}={high_err_bound_val}_{var_name2}-{err_bound_type2}={high_err_bound_val2}/
            ...
    dataset2/
        ...
    ...
Here, var_name indicates the variable(s) of the dataset being compressed, while err_bound_type is either abs_error or rel_error.
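If you want to inspect a single run outside the pipeline, the decompressed data and the per-run measurements can be read directly. The snippet below is a minimal sketch: the concrete path is hypothetical, and it assumes the decompressed.zarr store opens with xarray and that measurements.json is plain JSON.

import json
from pathlib import Path

import xarray as xr

# Hypothetical path following the layout above; substitute a real
# dataset / error-bound / compressor combination from your run.
run_dir = Path("compressed-datasets/dataset1/t2m-abs_error=0.1/compressor1")

# Open the decompressed data (assumed to be an xarray-compatible zarr store).
decompressed = xr.open_zarr(run_dir / "decompressed.zarr")
print(decompressed)

# Read the per-run measurements (assumed to be plain JSON).
with open(run_dir / "measurements.json") as f:
    measurements = json.load(f)
print(measurements)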
Additional arguments control which compressors and datasets are processed: --exclude-compressor and --exclude-dataset skip specific compressors and datasets, while --include-compressor and --include-dataset restrict the run to the selected compressors and datasets.
For example, the command
uv run python -m climatebenchpress.compressor.scripts.compress \
--data-loader-basepath=path/to/data-loader \
--include-compressor sz3 jpeg2000 \
--include-dataset era5
compresses only the era5 dataset with the sz3 and jpeg2000 compressors.
Compute Metrics
After compression, compute the evaluation metrics for the compressed datasets using:
uv run python -m climatebenchpress.compressor.scripts.compute_metrics \
--data-loader-basepath=path/to/data-loader
This script accepts the same --exclude-compressor, --exclude-dataset, --include-compressor, and --include-dataset arguments as in the compression step.
Once the metrics are computed, combine all the metrics into a single CSV file by running:
uv run python -m climatebenchpress.compressor.scripts.concatenate_metrics
This creates the metrics/all_results.csv file, which contains all the results.
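Once the combined CSV exists, it can be explored directly with pandas. The column names used below (compressor, compression_ratio) are assumptions for illustration only; check the CSV header for the actual names.

import pandas as pd

# Load the combined benchmark results produced by concatenate_metrics.
results = pd.read_csv("metrics/all_results.csv")

# Inspect the available columns before relying on any particular name.
print(results.columns.tolist())
print(results.head())

# Example aggregation; "compressor" and "compression_ratio" are assumed
# column names and may differ in the actual file.
print(results.groupby("compressor")["compression_ratio"].describe())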
Optional: Create Plots
Finally, generate visualization plots with the following command:
uv run python -m climatebenchpress.compressor.plotting.plot_metrics \
--data-loader-basepath=path/to/data-loader
The plots are saved in the plots directory. By default, plotting assumes access to a LaTeX compiler; if you do not have one on your system, add the --avoid-latex flag to the command.
Note that the full plotting process can take a long time because it generates individual plots for every error-bound, compressor, and dataset combination. To skip the individual plots for certain datasets, use the --exclude-dataset command line option.
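If you only need a quick overview rather than the full plot suite, you can also plot directly from metrics/all_results.csv. As above, the column names (compressor, compression_ratio, rmse) are assumptions for illustration and may differ in your version of the file.

import matplotlib.pyplot as plt
import pandas as pd

# Load the combined results; the column names below are assumed and may
# need to be adjusted to match the actual header of all_results.csv.
results = pd.read_csv("metrics/all_results.csv")

# One scatter series per compressor: compression ratio versus error.
for compressor, group in results.groupby("compressor"):
    plt.scatter(group["compression_ratio"], group["rmse"], label=compressor)

plt.xlabel("compression ratio")
plt.ylabel("RMSE")
plt.legend()
plt.savefig("quick_overview.png", dpi=150)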