Recommended Access Process for the LLC4320 dataset on SciServer

See here for some details on the LLC4320 dataset. The uncompressed dataset is about 4 PB, and it can be opened on SciServer with xarray in about 100 seconds. After that, making simple plots takes a few seconds (see below for how to do this yourself).

Thanks to Oliver Jahn and Miguel Jimenez-Urias for making the data available on SciServer! They transferred the data from the NASA HPC cluster where it was originally computed and reorganized the dataset for faster user access. Since that transfer finished, the NASA copy of the dataset is no longer available.

Please send feedback to Thomas.Haine@jhu.edu and/or raise issues at the Poseidon-share GitHub repo.

To cite this dataset please use:

Haine, T.W.N. 2025. Democratize the data: A new way to analyze and design ocean models. Oceanography 38(3), https://doi.org/10.5670/oceanog.2025.e303.

Good LLC4320 papers to cite are:

Rocha, C. B., Chereskin, T. K., Gille, S. T., & Menemenlis, D. (2016). Mesoscale to Submesoscale Wavenumber Spectra in Drake Passage. Journal of Physical Oceanography, 46(2), 601–620. https://doi.org/10.1175/JPO-D-15-0087.1
Arbic, B. K., Alford, M. H., Ansong, J. K., Buijsman, M. C., Ciotti, R. B., Farrar, J. T., Hallberg, R. W., Henze, C. E., Hill, C. N., Luecke, C. A., Menemenlis, D., Metzger, E. J., Müller, M., Nelson, A. D., Nelson, B. C., Ngodock, H. E., Ponte, R. M., Richman, J. G., Savage, A. C., … Zhao, Z. (2018). A Primer on Global Internal Tide and Internal Gravity Wave Continuum Modeling in HYCOM and MITgcm. In E. P. Chassignet, A. Pascual, J. Tintoré, & J. Verron (Eds.), New Frontiers in Operational Oceanography (pp. 307–392). GODAE OceanView. https://doi.org/10.17125/gov2018.ch13

See also this link to LLC4320 animations. The LLC4320 cubed-sphere grid is described in this paper and this user guide. “LLC” stands for “latitude-longitude cap”, and “4320” refers to the 4320×4320 horizontal grid points in each of the 13 faces that make up the cubed sphere. The LLC4320 dataset spans the period 13 Sep 2011 to 15 Nov 2012.
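To see roughly where the 4 PB figure comes from, the grid dimensions above can be multiplied out. This is a back-of-the-envelope sketch, not a storage specification: it assumes 90 vertical levels, 32-bit floats written for every grid point (land included), 10311 snapshots (inferred from the last store name, 10200_10311, described in the Notes) and five 3D fields (Salt, THETA, U, V, W).

```python
# Rough size estimate for the LLC4320 3D fields (order of magnitude only).
# Assumptions, not stated as facts on this page: 90 vertical levels,
# float32 values stored for every grid point, 10311 snapshots, five 3D fields.
FACES = 13              # faces of the LLC grid, as described above
N = 4320                # grid points along each side of a face
NZ = 90                 # assumed number of vertical levels
SNAPSHOTS = 10311       # assumed, from the last store name "10200_10311"
BYTES_PER_VALUE = 4     # assumed float32 storage
FIELDS_3D = ["Salt", "THETA", "U", "V", "W"]

horizontal_points = FACES * N * N
values_per_field = horizontal_points * NZ * SNAPSHOTS
total_bytes = values_per_field * BYTES_PER_VALUE * len(FIELDS_3D)

print(f"Horizontal grid points: {horizontal_points:,}")
print(f"Estimated size of the 3D fields: {total_bytes / 1e15:.1f} PB")
```

Under these assumptions the 3D fields alone come to about 4.5 PB, the same order as the quoted size; a different precision or level count scales the result proportionally.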

To quickly and interactively explore the dataset, use the Poseidon-viewer visualization tool. Launch it from your SciServer container, or use the standalone version.

Access Instructions:

Create an account on SciServer and log in.
Request access to the Grendel_ceph_Oceanography group by sending Tom Haine an email with your SciServer login name (you’ll see an invitation in the Groups tab when it’s confirmed).
Click Compute.
Create a container using:
- The Kraken domain,
- The Oceanography Compute Image, and
- The Poseidon ceph data volume.
Start the container by clicking on it in the Containers list.
Launch the Poseidon Viewer App to quickly explore the dataset interactively.
Transfer the LLC4320_access_demonstration.ipynb notebook to SciServer. This notebook opens the entire LLC4320 dataset.
Run the notebook using the Oceanography kernel to test opening the LLC4320 dataset and making some simple plots.
Optional: Transfer the LLC4320_2D_subsampled_access_demonstration.ipynb notebook to SciServer and run the notebook using the Oceanography kernel. This notebook quickly opens just the surface (2D) fields and vertically-subsampled 3D fields.
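Once the container is running, a quick way to confirm that the Poseidon ceph volume is mounted is to check for the folders named in the Notes below. This is a minimal sketch; it assumes that "~" resolves to /home/idies inside the container, as the two path spellings used on this page suggest, and the function name is hypothetical.

```python
from pathlib import Path

# Assumption: "~" resolves to /home/idies inside the SciServer container.
POSEIDON_ROOT = Path.home() / "workspace" / "poseidon_ceph"

# Folders named in the Notes section of this page.
EXPECTED_FOLDERS = ["LLC4320", "LLC4320_data", "LLC4320_2D", "LLC4320_subsample"]


def check_poseidon_mount(root: Path = POSEIDON_ROOT) -> dict:
    """Return {folder name: True/False} depending on whether each folder exists."""
    return {name: (Path(root) / name).is_dir() for name in EXPECTED_FOLDERS}


if __name__ == "__main__":
    for name, found in check_poseidon_mount().items():
        print(f"{name:<20} {'found' if found else 'MISSING'}")
```

If any folder reports MISSING, the container was probably created without the Poseidon ceph data volume.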

Notes:

The data is stored in Zarr v2 format and is most easily accessed using xarray.
To see all the available variables without opening anything, run:
ls -l ~/workspace/poseidon_ceph/LLC4320/Kerchunks/
This lists the Kerchunk JSON metadata files (one for each variable).
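The same listing can be done from Python, which is handy for choosing variables programmatically. This sketch assumes each reference file is named after its variable (for example, a hypothetical THETA.json); check the actual file names with the ls command above before relying on that. The function name is hypothetical.

```python
from pathlib import Path

KERCHUNK_DIR = Path.home() / "workspace" / "poseidon_ceph" / "LLC4320" / "Kerchunks"


def list_kerchunk_variables(kerchunk_dir: Path = KERCHUNK_DIR) -> list:
    """Return the sorted stems of the *.json files, assumed to be variable names."""
    kerchunk_dir = Path(kerchunk_dir)
    if not kerchunk_dir.is_dir():
        raise FileNotFoundError(f"{kerchunk_dir} not found; is the Poseidon volume mounted?")
    return sorted(p.stem for p in kerchunk_dir.glob("*.json"))
```

Inside a running container, print(list_kerchunk_variables()) shows the candidate names without opening any data.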
Each JSON file provides a consolidated, performant way to open the whole of that variable. The files also let you build a dataset containing only the variables of interest before creating the xarray dataset (the notebook demonstrates this). This is faster than opening the whole dataset with xarray and then discarding the variables you don't need.
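Below is a sketch of opening just a few variables from their Kerchunk files and merging them, using fsspec's reference:// filesystem with xarray's zarr engine. The <variable>.json naming, the remote_protocol="file" setting and the variable name in the usage line are assumptions; the demonstration notebook shows the calls known to work on SciServer, and details may differ with the installed zarr and fsspec versions. The helper names are hypothetical.

```python
from pathlib import Path

KERCHUNK_DIR = Path.home() / "workspace" / "poseidon_ceph" / "LLC4320" / "Kerchunks"


def kerchunk_paths(variables, kerchunk_dir=KERCHUNK_DIR) -> dict:
    """Map each variable to its reference file, assuming '<variable>.json' naming."""
    return {v: Path(kerchunk_dir) / f"{v}.json" for v in variables}


def open_variables(variables, kerchunk_dir=KERCHUNK_DIR):
    """Lazily open only the requested variables and merge them into one Dataset."""
    import xarray as xr  # imported here so kerchunk_paths works without xarray

    datasets = []
    for ref_file in kerchunk_paths(variables, kerchunk_dir).values():
        ds = xr.open_dataset(
            "reference://",
            engine="zarr",
            chunks={},  # dask-backed, lazy arrays using the store's own chunking
            backend_kwargs={
                "consolidated": False,
                "storage_options": {
                    "fo": str(ref_file),
                    # Assumption: the references point at files on the local ceph mount.
                    "remote_protocol": "file",
                },
            },
        )
        datasets.append(ds)
    return xr.merge(datasets)
```

For example, ds = open_variables(["THETA"]) (hypothetical name) would return a lazy dataset; no chunk data is read until you plot or compute.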
The actual full dataset lives in: “/home/idies/workspace/poseidon_ceph/LLC4320_data”
Inside “LLC4320_data” are the folders containing the zarr stores for all the LLC4320 data: Salt, Surface_Variables, THETA, U, V, and W.
Inside each of the “Salt”, “Surface_Variables”, “THETA”, “U”, “V”, and “W” folders are many zarr stores, named “0_100”, “100_200”, “200_300”, … “10000_10100”, “10100_10200”, “10200_10311”. Each name gives the range of time snapshots the store holds. For example, “0_100” holds the first 100 snapshots of that variable.
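Given a global snapshot index, the store that holds it follows from the naming scheme. This sketch assumes that a store named a_b holds snapshots a to b-1 (zero-based), consistent with “0_100” holding the first 100 snapshots, and that the record ends with the last store, 10200_10311. The function name is hypothetical.

```python
SNAPSHOTS_PER_STORE = 100
LAST_STORE_START = 10200    # final store is named "10200_10311"
TOTAL_SNAPSHOTS = 10311     # assumed: store "a_b" holds snapshots a .. b-1


def store_for_snapshot(t: int) -> tuple:
    """Return (store name, position within that store) for global snapshot t."""
    if not 0 <= t < TOTAL_SNAPSHOTS:
        raise IndexError(f"snapshot {t} is outside 0..{TOTAL_SNAPSHOTS - 1}")
    # Stores are 100 snapshots long, except the last one, which runs to the end.
    start = min(t // SNAPSHOTS_PER_STORE * SNAPSHOTS_PER_STORE, LAST_STORE_START)
    stop = TOTAL_SNAPSHOTS if start == LAST_STORE_START else start + SNAPSHOTS_PER_STORE
    return f"{start}_{stop}", t - start
```

For example, store_for_snapshot(10250) returns ("10200_10311", 50): under the stated assumption, that snapshot is position 50 within the final store.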
Within each “0_100” etc. folder are the individual zarr chunk files. There are a lot of them, so beware of trying to list the directory contents (it takes a long time).
Each of these “0_100” etc. folders is a self-contained zarr store, so you can open it directly:

import xarray as xr
ds = xr.open_zarr("/home/idies/workspace/poseidon_ceph/LLC4320_data/Salt/0_100")

This creates an xarray dataset for the 4D Salt field over its first 100 time snapshots. Opening “10000_10100” (or any other store) the same way gives the corresponding block of snapshots.
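To work with a time range that spans several stores, the stores can be opened one by one and concatenated along time. This is a sketch: it assumes the same a_b naming convention as above, that the record dimension is called time, and that variables without a time dimension (such as grid coordinates) are identical across stores, which is why compat="override" is used. The helper names are hypothetical.

```python
from pathlib import Path

DATA_ROOT = Path("/home/idies/workspace/poseidon_ceph/LLC4320_data")
LAST_STORE_START = 10200    # final store is named "10200_10311"
TOTAL_SNAPSHOTS = 10311     # assumed: store "a_b" holds snapshots a .. b-1


def stores_covering(first: int, last: int) -> list:
    """Names of the zarr stores that hold snapshots first..last (inclusive)."""
    if not 0 <= first <= last < TOTAL_SNAPSHOTS:
        raise IndexError(f"range {first}..{last} is outside 0..{TOTAL_SNAPSHOTS - 1}")
    names = []
    start = min(first // 100 * 100, LAST_STORE_START)
    while start <= last:
        stop = TOTAL_SNAPSHOTS if start == LAST_STORE_START else start + 100
        names.append(f"{start}_{stop}")
        start = stop
    return names


def open_snapshots(folder: str, first: int, last: int):
    """Lazily open snapshots first..last (inclusive) of one folder, e.g. "Salt"."""
    import xarray as xr  # imported here so stores_covering works without xarray

    names = stores_covering(first, last)
    parts = [xr.open_zarr(str(DATA_ROOT / folder / name)) for name in names]
    # Assumption: the record dimension is "time"; variables without it
    # (e.g. grid coordinates) are taken from the first store unchanged.
    ds = xr.concat(parts, dim="time", data_vars="minimal",
                   coords="minimal", compat="override")
    offset = int(names[0].split("_")[0])  # global index of the first loaded snapshot
    return ds.isel(time=slice(first - offset, last - offset + 1))
```

For example, open_snapshots("Salt", 50, 250) would lazily open the stores 0_100, 100_200 and 200_300 and return snapshots 50 to 250.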
The grid and mask data live in: “/home/idies/workspace/poseidon_ceph/LLC4320”
You can create a dataset for the grid and masks with:
grid = xr.open_zarr("/home/idies/workspace/poseidon_ceph/LLC4320")
The surface (2D) and vertically sub-sampled data live in: “/home/idies/workspace/poseidon_ceph/LLC4320_2D” and “/home/idies/workspace/poseidon_ceph/LLC4320_subsample”. These data have different zarr chunking than the full dataset, which makes them faster to read.
There are a few known dataset glitches that concern some 2D fields:
- The time_fx variable (not the time variable) applies to these 2D fields: oceFWflx, oceQnet, oceQsw, oceSflux, oceTAUX, oceTAUY. There is a 25-second difference between time_fx and time.
- A few snapshots of some 2D fields were missing in the original NASA files. The affected fields are
  PhiBot, KPPhbl, oceFWflx, oceQnet, oceQsw, oceSflux, oceTAUX, and oceTAUY, at time indices:
  [ 0, 288, 384, 408, 552, 588, 888, 924, 1140,
  1860, 1896, 1980, 2088, 2160, 2232, 2316, 2424, 2508,
  2592, 2676, 2772, 2856, 2868, 2904, 2952, 2964, 2976,
  3024, 3072, 3120, 3156, 3192, 3264, 3324, 4056, 4080,
  4500, 5712, 5976, 6024, 6096, 6756, 7128, 7692, 7764,
  7968, 8016, 8028, 8040, 8076, 8148, 8316, 8340, 8520,
  9336, 9408, 9492, 9732, 9864, 10140, 10141, 10142, 10143]
  These data are set to NaN.
- The 2D fields (except SSH and SSH_notides) lost the snapshots at time indices 10150-10199 during conversion to zarr. These data are set to NaN.
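The glitches above can be encoded as a small lookup so that analysis code picks the right time coordinate and skips snapshots known to be NaN. This sketch copies the field names and indices from the lists above; it treats the 10150-10199 range as inclusive at both ends and leaves it to the caller to say whether a field is 2D (is_2d), since the full list of 2D fields is not given here. The function names are hypothetical.

```python
# Fields whose timestamps are given by time_fx rather than time.
TIME_FX_FIELDS = frozenset(
    {"oceFWflx", "oceQnet", "oceQsw", "oceSflux", "oceTAUX", "oceTAUY"}
)

# Fields with NaN snapshots inherited from the original NASA files.
NASA_GAP_FIELDS = frozenset({"PhiBot", "KPPhbl"}) | TIME_FX_FIELDS
NASA_GAP_INDICES = frozenset([
    0, 288, 384, 408, 552, 588, 888, 924, 1140,
    1860, 1896, 1980, 2088, 2160, 2232, 2316, 2424, 2508,
    2592, 2676, 2772, 2856, 2868, 2904, 2952, 2964, 2976,
    3024, 3072, 3120, 3156, 3192, 3264, 3324, 4056, 4080,
    4500, 5712, 5976, 6024, 6096, 6756, 7128, 7692, 7764,
    7968, 8016, 8028, 8040, 8076, 8148, 8316, 8340, 8520,
    9336, 9408, 9492, 9732, 9864, 10140, 10141, 10142, 10143,
])

# Snapshots lost in the zarr conversion (2D fields except SSH and SSH_notides).
# Assumption: the quoted range 10150-10199 is inclusive at both ends.
CONVERSION_GAP = range(10150, 10200)
CONVERSION_GAP_EXEMPT = frozenset({"SSH", "SSH_notides"})


def time_coord_name(field: str) -> str:
    """Name of the time coordinate to use for a field."""
    return "time_fx" if field in TIME_FX_FIELDS else "time"


def known_nan_snapshots(field: str, is_2d: bool = True) -> set:
    """Global snapshot indices known to be NaN for this field."""
    bad = set()
    if field in NASA_GAP_FIELDS:
        bad |= NASA_GAP_INDICES
    if is_2d and field not in CONVERSION_GAP_EXEMPT:
        bad |= set(CONVERSION_GAP)
    return bad


def usable_snapshots(field: str, first: int, last: int, is_2d: bool = True) -> list:
    """Snapshot indices in first..last (inclusive) that are not known to be NaN."""
    bad = known_nan_snapshots(field, is_2d)
    return [t for t in range(first, last + 1) if t not in bad]
```

The list returned by usable_snapshots can be passed to .isel along the time dimension to drop the NaN snapshots before computing statistics.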

Coming soon:

LLC4320 data access using the OceanSpy API.
LLC4320 data access using the Seaduck API.
LLC4320v2…


Poseidon is a collaboration between researchers at:

Johns Hopkins University

Columbia University

MIT – Massachusetts Institute of Technology