Recommended Access Process for the LLC4320 dataset on SciServer

See here for some details on the LLC4320 dataset. The uncompressed dataset is about 4 PB, and it can be opened on SciServer with xarray in about 100 seconds. After that, simple plots take a few seconds each (see below for details on how to do this yourself).

Thanks to Oliver Jahn and Miguel Jimenez-Urias for creating the data on SciServer! They transferred the data from the NASA HPC cluster where it was originally computed and transformed the dataset for more performant user access.

Please send feedback to Thomas.Haine@jhu.edu and/or raise issues at the Poseidon-share GitHub repo.

To cite this dataset please use {zenodo doi tbc…} and:

Haine, T.W.N. 2025. Democratize the data: A new way to analyze and design ocean models. Oceanography 38(3), https://doi.org/10.5670/oceanog.2025.e303.

Good LLC4320 papers to cite are:

Rocha, C. B., Chereskin, T. K., Gille, S. T., & Menemenlis, D. (2016). Mesoscale to Submesoscale Wavenumber Spectra in Drake Passage. Journal of Physical Oceanography, 46(2), 601–620. https://doi.org/10.1175/JPO-D-15-0087.1
Arbic, B. K., Alford, M. H., Ansong, J. K., Buijsman, M. C., Ciotti, R. B., Farrar, J. T., Hallberg, R. W., Henze, C. E., Hill, C. N., Luecke, C. A., Menemenlis, D., Metzger, E. J., Müller, M., Nelson, A. D., Nelson, B. C., Ngodock, H. E., Ponte, R. M., Richman, J. G., Savage, A. C., … Zhao, Z. (2018). A Primer on Global Internal Tide and Internal Gravity Wave Continuum Modeling in HYCOM and MITgcm. In E. P. Chassignet, A. Pascual, J. Tintoré, & J. Verron (Eds.), New Frontiers in Operational Oceanography (pp. 307–392). GODAE OceanView. https://doi.org/10.17125/gov2018.ch13

See also this link to LLC4320 animations.

Access Instructions:

Create an account on SciServer and log in.
Request access to the Grendel_ceph_Oceanography group by sending Tom Haine an email with your SciServer login name (you’ll see an invitation in the Groups tab when it’s confirmed).
Click Compute.
Create a container using:
- The Grendel K8s domain,
- The SciServer Essentials 4.0 Compute Image, and
- The Poseidon ceph data volume.
Start the container by clicking on it in the Containers list.
Transfer the LLC4320_access_demonstration.ipynb notebook to SciServer. This notebook opens the entire LLC4320 dataset.
Follow the instructions in the notebook to pip install the right packages.
Run the notebook to test opening the LLC4320 dataset and making some simple plots.
Optional: Transfer the LLC4320_2D_subsampled_access_demonstration.ipynb notebook to SciServer and run the notebook. This notebook quickly opens just the surface (2D) fields and vertically-subsampled 3D fields.

Notes:

The data is stored in Zarr v2 format and is most easily accessed using xarray.
To see all the available variables without opening anything, run:
ls -l ~/workspace/poseidon_ceph/LLC4320/Kerchunks/
which lists the Kerchunk JSON metadata files (one for each variable).
Each JSON file provides a consolidated, performant way to open the entirety of that variable. It also lets you select only the variables of interest before creating the xarray dataset (the notebook demonstrates this). This is faster than using xarray to open the whole dataset and then discarding unneeded variables.
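As a rough illustration of selecting variables before building the dataset, the sketch below lists the Kerchunk JSON files and opens only the requested ones. It is not taken from the demonstration notebook. It assumes that each file is named after its variable with a ".json" extension, that the files are standard Kerchunk references readable through the fsspec "reference://" filesystem, and that xarray, fsspec, zarr, and dask are installed. The helper names list_kerchunk_files and open_variables are hypothetical.

```python
import os
from pathlib import Path

# Directory named in the notes above ("~" is the SciServer home directory).
KERCHUNK_DIR = Path(os.path.expanduser("~/workspace/poseidon_ceph/LLC4320/Kerchunks"))


def list_kerchunk_files(kerchunk_dir=KERCHUNK_DIR):
    """Map each variable name (the JSON file stem) to its Kerchunk JSON path."""
    return {p.stem: p for p in sorted(Path(kerchunk_dir).glob("*.json"))}


def open_variables(names, kerchunk_dir=KERCHUNK_DIR):
    """Open only the requested variables and merge them into one dataset.

    Assumption: each JSON is a standard Kerchunk reference file. The
    demonstration notebook may open the files differently.
    """
    files = list_kerchunk_files(kerchunk_dir)
    missing = [n for n in names if n not in files]
    if missing:
        raise KeyError(f"No Kerchunk file found for: {missing}")

    import xarray as xr  # imported lazily so listing works without xarray

    datasets = [
        xr.open_dataset(
            "reference://",
            engine="zarr",
            chunks={},  # keep the arrays lazy (requires dask)
            backend_kwargs={
                "consolidated": False,
                "storage_options": {"fo": str(files[n])},
            },
        )
        for n in names
    ]
    # compat="override" skips comparing shared coordinates, which avoids
    # reading them from disk just to check that they agree.
    return xr.merge(datasets, compat="override")
```

Calling list_kerchunk_files() first shows which names are valid on your system. The names passed to open_variables must match the file stems found there.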
The actual full dataset lives in: “/home/idies/workspace/poseidon_ceph/LLC4320_data”
Inside “LLC4320_data” are the folders containing the zarr stores for all the LLC4320 data. These are:
Salt Surface_Variables THETA U V W
Inside each of the “Salt”, “Surface_Variables”, “THETA”, “U”, “V”, and “W” folders are many zarr stores, named {“0_100”, “100_200”, “200_300”, … “10000_10100”, “10100_10200”, “10200_10311”}. Each name gives the range of time snapshots that the store contains. For example, “0_100” holds the first 100 snapshots of the variable in question.
Within each “0_100” etc. folder are the individual zarr chunk files. There are a lot of them, so beware of trying to list the directory contents (it takes a long time).
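If you do want to look inside one of these folders, one option is to read only the first few directory entries instead of listing everything. The sketch below uses only the Python standard library. The helper name peek_directory is hypothetical, and how quickly this returns on the ceph volume has not been measured here.

```python
import os
from itertools import islice


def peek_directory(path, n=5):
    """Return up to n entry names from path without listing everything.

    os.scandir yields entries lazily and islice stops after n of them,
    so the full directory is never sorted or stat-ed.
    """
    with os.scandir(path) as entries:
        return [entry.name for entry in islice(entries, n)]
```

For example, peek_directory("/home/idies/workspace/poseidon_ceph/LLC4320_data/Salt/0_100") returns a handful of names from that store. The order of the entries is arbitrary.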
Each of these “0_100” etc. folders is a self-contained zarr store. That means you can open them directly with:

xr.open_zarr("/home/idies/workspace/poseidon_ceph/LLC4320_data/Salt/0_100")

This creates the xarray dataset for the 4D “Salt” variable over those first 100 time snapshots. The same works for “10000_10100” and the other stores.
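To find which store holds a given snapshot, the naming scheme can be turned into a small lookup. The sketch below assumes that the names are half-open ranges (so “0_100” covers snapshots 0 to 99), that the stores not listed explicitly above follow the same 100-snapshot pattern, and that the final store “10200_10311” holds the remaining 111 snapshots. The function names are hypothetical.

```python
DATA_ROOT = "/home/idies/workspace/poseidon_ceph/LLC4320_data"
STORE_LENGTH = 100   # snapshots per regular store, from the "0_100" naming
LAST_START = 10200   # the final store is named "10200_10311"
N_SNAPSHOTS = 10311  # assumed total, read off the final store name


def store_names():
    """All store names in time order, under the assumed naming pattern."""
    regular = [f"{s}_{s + STORE_LENGTH}" for s in range(0, LAST_START, STORE_LENGTH)]
    return regular + [f"{LAST_START}_{N_SNAPSHOTS}"]


def locate_snapshot(index):
    """Return (store_name, index_within_store) for a global snapshot index."""
    if not 0 <= index < N_SNAPSHOTS:
        raise ValueError(f"snapshot index must be in [0, {N_SNAPSHOTS})")
    # Snapshots from 10200 onward all live in the longer final store.
    start = min((index // STORE_LENGTH) * STORE_LENGTH, LAST_START)
    end = N_SNAPSHOTS if start == LAST_START else start + STORE_LENGTH
    return f"{start}_{end}", index - start


def store_path(variable, index):
    """Path of the zarr store holding snapshot `index` of `variable`."""
    name, _ = locate_snapshot(index)
    return f"{DATA_ROOT}/{variable}/{name}"
```

The result of store_path("Salt", 10050) can be passed to xr.open_zarr, and the second value returned by locate_snapshot gives the position of that snapshot along the time dimension of the opened store.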
The grid and mask data live in: “/home/idies/workspace/poseidon_ceph/LLC4320”
You can create a dataset for the grid and masks with:
xr.open_zarr("/home/idies/workspace/poseidon_ceph/LLC4320")
The surface (2D) and vertically sub-sampled data live in: “/home/idies/workspace/poseidon_ceph/LLC4320_2D” and “/home/idies/workspace/poseidon_ceph/LLC4320_subsample”. These data have different zarr chunking than the full dataset, which makes them faster to read.
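Before opening anything, it can help to confirm that the data locations named in these notes are visible from your container, for example to catch a container created without the Poseidon ceph data volume. The sketch below is a simple sanity check. The labels and the helper name check_roots are hypothetical, and the paths are the ones listed above.

```python
from pathlib import Path

BASE = Path("/home/idies/workspace/poseidon_ceph")

# Sub-directories named in the notes above, with illustrative labels.
ROOTS = {
    "grid and masks": "LLC4320",
    "full dataset": "LLC4320_data",
    "surface (2D) fields": "LLC4320_2D",
    "vertically subsampled fields": "LLC4320_subsample",
}


def check_roots(base=BASE):
    """Report which of the documented data directories exist under base."""
    return {label: (Path(base) / sub).is_dir() for label, sub in ROOTS.items()}
```

Running check_roots() in a notebook cell returns a dictionary of True/False values. A False entry suggests that the data volume is not attached or that the path has changed.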

Coming soon:

LLC4320 data access using the OceanSpy API.
LLC4320 data access using the Seaduck API.
Updated Poseidon-viewer with access to the full LLC4320 dataset (not just 10 days), plus upgrades and documentation.
Custom Oceanography image with pre-loaded and up-to-date python environment.

Poseidon is a collaboration between researchers at:

Johns Hopkins University

Columbia University

MIT – Massachusetts Institute of Technology