Interpolating data

Xarray offers flexible interpolation routines, which have a similar interface to our indexing.

Note

interp requires scipy installed.

Scalar and 1-dimensional interpolation

Interpolating a DataArray works mostly like labeled indexing of a DataArray,

In [1]: da = xr.DataArray(
   ...:     np.sin(0.3 * np.arange(12).reshape(4, 3)),
   ...:     [("time", np.arange(4)), ("space", [0.1, 0.2, 0.3])],
   ...: )
   ...: 

# label lookup
In [2]: da.sel(time=3)
Out[2]: 
<xarray.DataArray (space: 3)>
array([ 0.427,  0.141, -0.158])
Coordinates:
    time     int64 3
  * space    (space) float64 0.1 0.2 0.3

# interpolation
In [3]: da.interp(time=2.5)
Out[3]: 
<xarray.DataArray (space: 3)>
array([0.701, 0.502, 0.259])
Coordinates:
  * space    (space) float64 0.1 0.2 0.3
    time     float64 2.5

Similar to the indexing, interp() also accepts an array-like, which gives the interpolated result as an array.

# label lookup
In [4]: da.sel(time=[2, 3])
Out[4]: 
<xarray.DataArray (time: 2, space: 3)>
array([[ 0.974,  0.863,  0.675],
       [ 0.427,  0.141, -0.158]])
Coordinates:
  * time     (time) int64 2 3
  * space    (space) float64 0.1 0.2 0.3

# interpolation
In [5]: da.interp(time=[2.5, 3.5])
Out[5]: 
<xarray.DataArray (time: 2, space: 3)>
array([[0.701, 0.502, 0.259],
       [  nan,   nan,   nan]])
Coordinates:
  * space    (space) float64 0.1 0.2 0.3
  * time     (time) float64 2.5 3.5

To interpolate data with a numpy.datetime64 coordinate you can pass a string.

In [6]: da_dt64 = xr.DataArray(
   ...:     [1, 3], [("time", pd.date_range("1/1/2000", "1/3/2000", periods=2))]
   ...: )
   ...: 

In [7]: da_dt64.interp(time="2000-01-02")
Out[7]: 
<xarray.DataArray ()>
array(2.)
Coordinates:
    time     datetime64[ns] 2000-01-02

The interpolated data can be merged into the original DataArray by specifying the time periods required.

In [8]: da_dt64.interp(time=pd.date_range("1/1/2000", "1/3/2000", periods=3))
Out[8]: 
<xarray.DataArray (time: 3)>
array([1., 2., 3.])
Coordinates:
  * time     (time) datetime64[ns] 2000-01-01 2000-01-02 2000-01-03

Interpolation of data indexed by a CFTimeIndex is also allowed. See Non-standard calendars and dates outside the Timestamp-valid range for examples.

Note

Currently, our interpolation only works for regular grids. Therefore, similarly to sel(), only 1D coordinates along a dimension can be used as the original coordinate to be interpolated.

Multi-dimensional Interpolation

Like sel(), interp() accepts multiple coordinates. In this case, multidimensional interpolation is carried out.

# label lookup
In [9]: da.sel(time=2, space=0.1)
Out[9]: 
<xarray.DataArray ()>
array(0.974)
Coordinates:
    time     int64 2
    space    float64 0.1

# interpolation
In [10]: da.interp(time=2.5, space=0.15)
Out[10]: 
<xarray.DataArray ()>
array(0.601)
Coordinates:
    time     float64 2.5
    space    float64 0.15

Array-like coordinates are also accepted:

# label lookup
In [11]: da.sel(time=[2, 3], space=[0.1, 0.2])
Out[11]: 
<xarray.DataArray (time: 2, space: 2)>
array([[0.974, 0.863],
       [0.427, 0.141]])
Coordinates:
  * time     (time) int64 2 3
  * space    (space) float64 0.1 0.2

# interpolation
In [12]: da.interp(time=[1.5, 2.5], space=[0.15, 0.25])
Out[12]: 
<xarray.DataArray (time: 2, space: 2)>
array([[0.888, 0.867],
       [0.601, 0.381]])
Coordinates:
  * time     (time) float64 1.5 2.5
  * space    (space) float64 0.15 0.25

interp_like() method is a useful shortcut. This method interpolates an xarray object onto the coordinates of another xarray object. For example, if we want to compute the difference between two DataArray s (da and other) staying on slightly different coordinates,

In [13]: other = xr.DataArray(
   ....:     np.sin(0.4 * np.arange(9).reshape(3, 3)),
   ....:     [("time", [0.9, 1.9, 2.9]), ("space", [0.15, 0.25, 0.35])],
   ....: )
   ....: 

it might be a good idea to first interpolate da so that it will stay on the same coordinates of other, and then subtract it. interp_like() can be used for such a case,

# interpolate da along other's coordinates
In [14]: interpolated = da.interp_like(other)

In [15]: interpolated
Out[15]: 
<xarray.DataArray (time: 3, space: 3)>
array([[0.787, 0.911,   nan],
       [0.912, 0.789,   nan],
       [0.348, 0.069,   nan]])
Coordinates:
  * time     (time) float64 0.9 1.9 2.9
  * space    (space) float64 0.15 0.25 0.35

It is now possible to safely compute the difference other - interpolated.

Interpolation methods

We use scipy.interpolate.interp1d for 1-dimensional interpolation. For multi-dimensional interpolation, an attempt is first made to decompose the interpolation in a series of 1-dimensional interpolations, in which case scipy.interpolate.interp1d is used. If a decomposition cannot be made (e.g. with advanced interpolation), scipy.interpolate.interpn() is used.

The interpolation method can be specified by the optional method argument.

In [16]: da = xr.DataArray(
   ....:     np.sin(np.linspace(0, 2 * np.pi, 10)),
   ....:     dims="x",
   ....:     coords={"x": np.linspace(0, 1, 10)},
   ....: )
   ....: 

In [17]: da.plot.line("o", label="original")
Out[17]: [<matplotlib.lines.Line2D at 0x7f44f5eb4090>]

In [18]: da.interp(x=np.linspace(0, 1, 100)).plot.line(label="linear (default)")
Out[18]: [<matplotlib.lines.Line2D at 0x7f4505f4b1d0>]

In [19]: da.interp(x=np.linspace(0, 1, 100), method="cubic").plot.line(label="cubic")
Out[19]: [<matplotlib.lines.Line2D at 0x7f44f5d47090>]

In [20]: plt.legend()
Out[20]: <matplotlib.legend.Legend at 0x7f44d5b738d0>
../_images/interpolation_sample1.png

Additional keyword arguments can be passed to scipy’s functions.

# fill 0 for the outside of the original coordinates.
In [21]: da.interp(x=np.linspace(-0.5, 1.5, 10), kwargs={"fill_value": 0.0})
Out[21]: 
<xarray.DataArray (x: 10)>
array([ 0.   ,  0.   ,  0.   ,  0.814,  0.604, -0.604, -0.814,  0.   ,  0.   ,  0.   ])
Coordinates:
  * x        (x) float64 -0.5 -0.2778 -0.05556 0.1667 ... 0.8333 1.056 1.278 1.5

# 1-dimensional extrapolation
In [22]: da.interp(x=np.linspace(-0.5, 1.5, 10), kwargs={"fill_value": "extrapolate"})
Out[22]: 
<xarray.DataArray (x: 10)>
array([-2.893, -1.607, -0.321,  0.814,  0.604, -0.604, -0.814,  0.321,  1.607,  2.893])
Coordinates:
  * x        (x) float64 -0.5 -0.2778 -0.05556 0.1667 ... 0.8333 1.056 1.278 1.5

# multi-dimensional extrapolation
In [23]: da = xr.DataArray(
   ....:     np.sin(0.3 * np.arange(12).reshape(4, 3)),
   ....:     [("time", np.arange(4)), ("space", [0.1, 0.2, 0.3])],
   ....: )
   ....: 

In [24]: da.interp(
   ....:     time=4, space=np.linspace(-0.1, 0.5, 10), kwargs={"fill_value": "extrapolate"}
   ....: )
   ....: 
Out[24]: 
<xarray.DataArray (space: 10)>
array([ 0.805,  0.497,  0.189, -0.119, -0.427, -0.718, -0.991, -1.264, -1.538, -1.811])
Coordinates:
    time     int64 4
  * space    (space) float64 -0.1 -0.03333 0.03333 0.1 ... 0.3 0.3667 0.4333 0.5

Advanced Interpolation

interp() accepts DataArray as similar to sel(), which enables us more advanced interpolation. Based on the dimension of the new coordinate passed to interp(), the dimension of the result are determined.

For example, if you want to interpolate a two dimensional array along a particular dimension, as illustrated below, you can pass two 1-dimensional DataArray s with a common dimension as new coordinate.

advanced indexing and interpolation

For example:

In [25]: da = xr.DataArray(
   ....:     np.sin(0.3 * np.arange(20).reshape(5, 4)),
   ....:     [("x", np.arange(5)), ("y", [0.1, 0.2, 0.3, 0.4])],
   ....: )
   ....: 

# advanced indexing
In [26]: x = xr.DataArray([0, 2, 4], dims="z")

In [27]: y = xr.DataArray([0.1, 0.2, 0.3], dims="z")

In [28]: da.sel(x=x, y=y)
Out[28]: 
<xarray.DataArray (z: 3)>
array([ 0.   ,  0.427, -0.773])
Coordinates:
    x        (z) int64 0 2 4
    y        (z) float64 0.1 0.2 0.3
Dimensions without coordinates: z

# advanced interpolation, without extrapolation
In [29]: x = xr.DataArray([0.5, 1.5, 2.5, 3.5], dims="z")

In [30]: y = xr.DataArray([0.15, 0.25, 0.35, 0.45], dims="z")

In [31]: da.interp(x=x, y=y)
Out[31]: 
<xarray.DataArray (z: 4)>
array([ 0.556,  0.635, -0.466,    nan])
Coordinates:
    x        (z) float64 0.5 1.5 2.5 3.5
    y        (z) float64 0.15 0.25 0.35 0.45
Dimensions without coordinates: z

where values on the original coordinates (x, y) = ((0.5, 0.15), (1.5, 0.25), (2.5, 0.35), (3.5, 0.45)) are obtained by the 2-dimensional interpolation and mapped along a new dimension z. Since no keyword arguments are passed to the interpolation routine, no extrapolation is performed resulting in a nan value.

If you want to add a coordinate to the new dimension z, you can supply DataArray s with a coordinate. Extrapolation can be achieved by passing additional arguments to SciPy’s interpnd function,

In [32]: x = xr.DataArray([0.5, 1.5, 2.5, 3.5], dims="z", coords={"z": ["a", "b", "c", "d"]})

In [33]: y = xr.DataArray(
   ....:     [0.15, 0.25, 0.35, 0.45], dims="z", coords={"z": ["a", "b", "c", "d"]}
   ....: )
   ....: 

In [34]: da.interp(x=x, y=y, kwargs={"fill_value": None})
Out[34]: 
<xarray.DataArray (z: 4)>
array([ 0.556,  0.635, -0.466, -0.735])
Coordinates:
    x        (z) float64 0.5 1.5 2.5 3.5
    y        (z) float64 0.15 0.25 0.35 0.45
  * z        (z) <U1 'a' 'b' 'c' 'd'

For the details of the advanced indexing, see more advanced indexing.

Interpolating arrays with NaN

Our interp() works with arrays with NaN the same way that scipy.interpolate.interp1d and scipy.interpolate.interpn do. linear and nearest methods return arrays including NaN, while other methods such as cubic or quadratic return all NaN arrays.

In [35]: da = xr.DataArray([0, 2, np.nan, 3, 3.25], dims="x", coords={"x": range(5)})

In [36]: da.interp(x=[0.5, 1.5, 2.5])
Out[36]: 
<xarray.DataArray (x: 3)>
array([ 1., nan, nan])
Coordinates:
  * x        (x) float64 0.5 1.5 2.5

In [37]: da.interp(x=[0.5, 1.5, 2.5], method="cubic")
Out[37]: 
<xarray.DataArray (x: 3)>
array([nan, nan, nan])
Coordinates:
  * x        (x) float64 0.5 1.5 2.5

To avoid this, you can drop NaN by dropna(), and then make the interpolation

In [38]: dropped = da.dropna("x")

In [39]: dropped
Out[39]: 
<xarray.DataArray (x: 4)>
array([0.  , 2.  , 3.  , 3.25])
Coordinates:
  * x        (x) int64 0 1 3 4

In [40]: dropped.interp(x=[0.5, 1.5, 2.5], method="cubic")
Out[40]: 
<xarray.DataArray (x: 3)>
array([1.19 , 2.508, 2.93 ])
Coordinates:
  * x        (x) float64 0.5 1.5 2.5

If NaNs are distributed randomly in your multidimensional array, dropping all the columns containing more than one NaNs by dropna() may lose a significant amount of information. In such a case, you can fill NaN by interpolate_na(), which is similar to pandas.Series.interpolate().

In [41]: filled = da.interpolate_na(dim="x")

In [42]: filled
Out[42]: 
<xarray.DataArray (x: 5)>
array([0.  , 2.  , 2.5 , 3.  , 3.25])
Coordinates:
  * x        (x) int64 0 1 2 3 4

This fills NaN by interpolating along the specified dimension. After filling NaNs, you can interpolate:

In [43]: filled.interp(x=[0.5, 1.5, 2.5], method="cubic")
Out[43]: 
<xarray.DataArray (x: 3)>
array([1.309, 2.316, 2.738])
Coordinates:
  * x        (x) float64 0.5 1.5 2.5

For the details of interpolate_na(), see Missing values.

Example

Let’s see how interp() works on real data.

# Raw data
In [44]: ds = xr.tutorial.open_dataset("air_temperature").isel(time=0)
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
File /build/python-xarray-VrETFG/python-xarray-2023.01.0/xarray/tutorial.py:132, in open_dataset(name, cache, cache_dir, engine, **kws)
    131 try:
--> 132     import pooch
    133 except ImportError as e:

ModuleNotFoundError: No module named 'pooch'

The above exception was the direct cause of the following exception:

ImportError                               Traceback (most recent call last)
Cell In [44], line 1
----> 1 ds = xr.tutorial.open_dataset("air_temperature").isel(time=0)

File /build/python-xarray-VrETFG/python-xarray-2023.01.0/xarray/tutorial.py:134, in open_dataset(name, cache, cache_dir, engine, **kws)
    132     import pooch
    133 except ImportError as e:
--> 134     raise ImportError(
    135         "tutorial.open_dataset depends on pooch to download and manage datasets."
    136         " To proceed please install pooch."
    137     ) from e
    139 logger = pooch.get_logger()
    140 logger.setLevel("WARNING")

ImportError: tutorial.open_dataset depends on pooch to download and manage datasets. To proceed please install pooch.

In [45]: fig, axes = plt.subplots(ncols=2, figsize=(10, 4))

In [46]: ds.air.plot(ax=axes[0])
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In [46], line 1
----> 1 ds.air.plot(ax=axes[0])

File /build/python-xarray-VrETFG/python-xarray-2023.01.0/xarray/core/common.py:278, in AttrAccessMixin.__getattr__(self, name)
    276         with suppress(KeyError):
    277             return source[name]
--> 278 raise AttributeError(
    279     f"{type(self).__name__!r} object has no attribute {name!r}"
    280 )

AttributeError: 'Dataset' object has no attribute 'air'

In [47]: axes[0].set_title("Raw data")
Out[47]: Text(0.5, 1.0, 'Raw data')

# Interpolated data
In [48]: new_lon = np.linspace(ds.lon[0], ds.lon[-1], ds.dims["lon"] * 4)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In [48], line 1
----> 1 new_lon = np.linspace(ds.lon[0], ds.lon[-1], ds.dims["lon"] * 4)

File /build/python-xarray-VrETFG/python-xarray-2023.01.0/xarray/core/common.py:278, in AttrAccessMixin.__getattr__(self, name)
    276         with suppress(KeyError):
    277             return source[name]
--> 278 raise AttributeError(
    279     f"{type(self).__name__!r} object has no attribute {name!r}"
    280 )

AttributeError: 'Dataset' object has no attribute 'lon'

In [49]: new_lat = np.linspace(ds.lat[0], ds.lat[-1], ds.dims["lat"] * 4)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In [49], line 1
----> 1 new_lat = np.linspace(ds.lat[0], ds.lat[-1], ds.dims["lat"] * 4)

File /build/python-xarray-VrETFG/python-xarray-2023.01.0/xarray/core/common.py:278, in AttrAccessMixin.__getattr__(self, name)
    276         with suppress(KeyError):
    277             return source[name]
--> 278 raise AttributeError(
    279     f"{type(self).__name__!r} object has no attribute {name!r}"
    280 )

AttributeError: 'Dataset' object has no attribute 'lat'

In [50]: dsi = ds.interp(lat=new_lat, lon=new_lon)
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In [50], line 1
----> 1 dsi = ds.interp(lat=new_lat, lon=new_lon)

NameError: name 'new_lat' is not defined

In [51]: dsi.air.plot(ax=axes[1])
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In [51], line 1
----> 1 dsi.air.plot(ax=axes[1])

NameError: name 'dsi' is not defined

In [52]: axes[1].set_title("Interpolated data")
Out[52]: Text(0.5, 1.0, 'Interpolated data')
../_images/interpolation_sample3.png

Our advanced interpolation can be used to remap the data to the new coordinate. Consider the new coordinates x and z on the two dimensional plane. The remapping can be done as follows

# new coordinate
In [53]: x = np.linspace(240, 300, 100)

In [54]: z = np.linspace(20, 70, 100)

# relation between new and original coordinates
In [55]: lat = xr.DataArray(z, dims=["z"], coords={"z": z})

In [56]: lon = xr.DataArray(
   ....:     (x[:, np.newaxis] - 270) / np.cos(z * np.pi / 180) + 270,
   ....:     dims=["x", "z"],
   ....:     coords={"x": x, "z": z},
   ....: )
   ....: 

In [57]: fig, axes = plt.subplots(ncols=2, figsize=(10, 4))

In [58]: ds.air.plot(ax=axes[0])
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In [58], line 1
----> 1 ds.air.plot(ax=axes[0])

File /build/python-xarray-VrETFG/python-xarray-2023.01.0/xarray/core/common.py:278, in AttrAccessMixin.__getattr__(self, name)
    276         with suppress(KeyError):
    277             return source[name]
--> 278 raise AttributeError(
    279     f"{type(self).__name__!r} object has no attribute {name!r}"
    280 )

AttributeError: 'Dataset' object has no attribute 'air'

# draw the new coordinate on the original coordinates.
In [59]: for idx in [0, 33, 66, 99]:
   ....:     axes[0].plot(lon.isel(x=idx), lat, "--k")
   ....: 

In [60]: for idx in [0, 33, 66, 99]:
   ....:     axes[0].plot(*xr.broadcast(lon.isel(z=idx), lat.isel(z=idx)), "--k")
   ....: 

In [61]: axes[0].set_title("Raw data")
Out[61]: Text(0.5, 1.0, 'Raw data')

In [62]: dsi = ds.interp(lon=lon, lat=lat)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In [62], line 1
----> 1 dsi = ds.interp(lon=lon, lat=lat)

File /build/python-xarray-VrETFG/python-xarray-2023.01.0/xarray/core/dataset.py:3313, in Dataset.interp(self, coords, method, assume_sorted, kwargs, method_non_numeric, **coords_kwargs)
   3310     kwargs = {}
   3312 coords = either_dict_or_kwargs(coords, coords_kwargs, "interp")
-> 3313 indexers = dict(self._validate_interp_indexers(coords))
   3315 if coords:
   3316     # This avoids broadcasting over coordinates that are both in
   3317     # the original array AND in the indexing array. It essentially
   3318     # forces interpolation along the shared coordinates.
   3319     sdims = (
   3320         set(self.dims)
   3321         .intersection(*[set(nx.dims) for nx in indexers.values()])
   3322         .difference(coords.keys())
   3323     )

File /build/python-xarray-VrETFG/python-xarray-2023.01.0/xarray/core/dataset.py:2316, in Dataset._validate_interp_indexers(self, indexers)
   2312 def _validate_interp_indexers(
   2313     self, indexers: Mapping[Any, Any]
   2314 ) -> Iterator[tuple[Hashable, Variable]]:
   2315     """Variant of _validate_indexers to be used for interpolation"""
-> 2316     for k, v in self._validate_indexers(indexers):
   2317         if isinstance(v, Variable):
   2318             if v.ndim == 1:

File /build/python-xarray-VrETFG/python-xarray-2023.01.0/xarray/core/dataset.py:2281, in Dataset._validate_indexers(self, indexers, missing_dims)
   2278 from xarray.coding.cftimeindex import CFTimeIndex
   2279 from xarray.core.dataarray import DataArray
-> 2281 indexers = drop_dims_from_indexers(indexers, self.dims, missing_dims)
   2283 # all indexers should be int, slice, np.ndarrays, or Variable
   2284 for k, v in indexers.items():

File /build/python-xarray-VrETFG/python-xarray-2023.01.0/xarray/core/utils.py:857, in drop_dims_from_indexers(indexers, dims, missing_dims)
    855     invalid = indexers.keys() - set(dims)
    856     if invalid:
--> 857         raise ValueError(
    858             f"Dimensions {invalid} do not exist. Expected one or more of {dims}"
    859         )
    861     return indexers
    863 elif missing_dims == "warn":
    864 
    865     # don't modify input

ValueError: Dimensions {'lat', 'lon'} do not exist. Expected one or more of Frozen({'x': 3, 'y': 4})

In [63]: dsi.air.plot(ax=axes[1])
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In [63], line 1
----> 1 dsi.air.plot(ax=axes[1])

NameError: name 'dsi' is not defined

In [64]: axes[1].set_title("Remapped data")
Out[64]: Text(0.5, 1.0, 'Remapped data')
../_images/interpolation_sample4.png