xarray Data Structures: Coordinates

孟凯泽

2023-12-01

Coordinates are ancillary variables stored for DataArray and Dataset objects in the coords attribute:

In [65]: ds.coords
Out[65]: 
Coordinates:
    lat             (x, y) float64 42.25 42.21 42.63 42.59
    lon             (x, y) float64 -99.83 -99.32 -99.79 -99.23
  * time            (time) datetime64[ns] 2014-09-06 2014-09-07 2014-09-08
    reference_time  datetime64[ns] 2014-09-05
    day             (time) int64 6 7 8

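Unlike attributes, xarray interprets and preserves coordinates in operations that transform xarray objects. There are two types of coordinates in xarray: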

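Dimension coordinates are one-dimensional coordinates whose name equals their sole dimension (marked with * when a dataset or data array is printed). They are used for label-based indexing and alignment, like the index on a pandas DataFrame or Series. In fact, these "dimension" coordinates use a pandas.Index internally to store their values.
Non-dimension coordinates are variables that contain coordinate data but are not dimension coordinates. They can be multidimensional (see Working with Multidimensional Coordinates), and there is no relationship between the name of a non-dimension coordinate and the names of its dimensions. Non-dimension coordinates can be useful for indexing or plotting; otherwise, xarray makes no direct use of the values associated with them. They are not used for alignment or automatic indexing, nor are they required to match when doing arithmetic (see Coordinates).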
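The distinction between the two types can be sketched with a small dataset. The variable names mirror the article's `ds`, but the temperature values are made up for illustration and the dataset is smaller than the one printed above.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical toy data; the values are illustrative, not those of the article's `ds`.
temperature = np.arange(12, dtype=float).reshape(2, 2, 3)

ds = xr.Dataset(
    data_vars={"temperature": (("x", "y", "time"), temperature)},
    coords={
        # Dimension coordinate: 1-D, and its name equals its only dimension.
        "time": pd.date_range("2014-09-06", periods=3),
        # Non-dimension coordinates: 2-D, 1-D under a different name, and scalar.
        "lat": (("x", "y"), [[42.25, 42.21], [42.63, 42.59]]),
        "day": ("time", [6, 7, 8]),
        "reference_time": pd.Timestamp("2014-09-05"),
    },
)

# A coordinate is a dimension coordinate when its name is also a dimension name.
dimension_coords = [name for name in ds.coords if name in ds.dims]
non_dimension_coords = sorted(name for name in ds.coords if name not in ds.dims)

# Label-based indexing goes through the dimension coordinate.
one_day = ds.sel(time="2014-09-07")
```

Here only `time` is backed by an index, so only `time` can be used directly with `sel`; `lat`, `day` and `reference_time` are carried along as labels.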

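Note:
xarray's terminology differs from the CF terminology, where "dimension coordinates" are called "coordinate variables" and "non-dimension coordinates" are called "auxiliary coordinate variables" (see GH1295 for details).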

Modifying coordinates

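To add or remove coordinate arrays entirely, you can use dictionary-like syntax: item assignment to add a coordinate and del to remove one.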
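A minimal sketch of the dictionary-like syntax, using a small hypothetical dataset rather than the article's `ds`:

```python
import numpy as np
import xarray as xr

# Hypothetical dataset used only for this illustration.
ds = xr.Dataset(
    {"temperature": (("x", "time"), np.zeros((2, 3)))},
    coords={"time": [0, 1, 2]},
)

# Add a coordinate with item assignment on `coords`.
ds.coords["day"] = ("time", [6, 7, 8])
names_after_add = sorted(ds.coords)
is_data_variable = "day" in ds.data_vars

# Remove it again with `del`.
del ds.coords["day"]
names_after_delete = sorted(ds.coords)
```

Both statements modify `ds` in place, unlike set_coords() and reset_coords(), which return new objects.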

To convert back and forth between data variables and coordinates, use the set_coords() and reset_coords() methods:

In [66]: ds.reset_coords()
Out[66]: 
<xarray.Dataset>
Dimensions:             (time: 3, x: 2, y: 2)
Coordinates:
  * time                (time) datetime64[ns] 2014-09-06 2014-09-07 2014-09-08
Dimensions without coordinates: x, y
Data variables:
    temperature         (x, y, time) float64 11.04 23.57 20.77 ... 9.61 15.91
    temperature_double  (x, y, time) float64 22.08 47.15 41.54 ... 19.22 31.82
    precipitation       (x, y, time) float64 5.904 2.453 3.404 ... 1.709 3.947
    lat                 (x, y) float64 42.25 42.21 42.63 42.59
    lon                 (x, y) float64 -99.83 -99.32 -99.79 -99.23
    reference_time      datetime64[ns] 2014-09-05
    day                 (time) int64 6 7 8

In [67]: ds.set_coords(['temperature', 'precipitation'])
Out[67]: 
<xarray.Dataset>
Dimensions:             (time: 3, x: 2, y: 2)
Coordinates:
    temperature         (x, y, time) float64 11.04 23.57 20.77 ... 9.61 15.91
    precipitation       (x, y, time) float64 5.904 2.453 3.404 ... 1.709 3.947
    lat                 (x, y) float64 42.25 42.21 42.63 42.59
    lon                 (x, y) float64 -99.83 -99.32 -99.79 -99.23
  * time                (time) datetime64[ns] 2014-09-06 2014-09-07 2014-09-08
    reference_time      datetime64[ns] 2014-09-05
    day                 (time) int64 6 7 8
Dimensions without coordinates: x, y
Data variables:
    temperature_double  (x, y, time) float64 22.08 47.15 41.54 ... 19.22 31.82

In [68]: ds['temperature'].reset_coords(drop=True)
Out[68]: 
<xarray.DataArray 'temperature' (x: 2, y: 2, time: 3)>
array([[[11.041, 23.574, 20.772],
        [ 9.346,  6.683, 17.175]],

       [[11.6  , 19.536, 17.21 ],
        [ 6.301,  9.61 , 15.909]]])
Coordinates:
  * time     (time) datetime64[ns] 2014-09-06 2014-09-07 2014-09-08
Dimensions without coordinates: x, y

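Note that these operations skip coordinates whose names are given by dimensions, since those are used for indexing. This is mostly because we are not entirely sure how to design the interface around the fact that xarray cannot store a coordinate and a variable with the same name but different values in the same dictionary. We do recognize, however, that supporting something like this would be useful.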
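The conversions above, including the way the dimension coordinate is left alone, can be reproduced with a small self-contained sketch. The dataset is hypothetical and much smaller than the article's `ds`.

```python
import numpy as np
import xarray as xr

# Hypothetical dataset used only for this illustration.
ds = xr.Dataset(
    {
        "temperature": (("x", "time"), np.ones((2, 3))),
        "precipitation": (("x", "time"), np.zeros((2, 3))),
    },
    coords={"time": [0, 1, 2], "day": ("time", [6, 7, 8])},
)

# Promote a data variable to a coordinate.
promoted = ds.set_coords("precipitation")

# Demote non-index coordinates to data variables; `time` stays a coordinate.
demoted = ds.reset_coords()

# Drop the non-index coordinates of a single array instead of demoting them.
bare = ds["temperature"].reset_coords(drop=True)
```

Each call returns a new object, so `ds` itself is unchanged afterwards.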

Coordinates methods

Coordinates objects also have a few useful methods, mostly for converting them into dataset objects:

In [69]: ds.coords.to_dataset()
Out[69]: 
<xarray.Dataset>
Dimensions:         (time: 3, x: 2, y: 2)
Coordinates:
    reference_time  datetime64[ns] 2014-09-05
    lat             (x, y) float64 42.25 42.21 42.63 42.59
  * time            (time) datetime64[ns] 2014-09-06 2014-09-07 2014-09-08
    day             (time) int64 6 7 8
    lon             (x, y) float64 -99.83 -99.32 -99.79 -99.23
Dimensions without coordinates: x, y
Data variables:
    *empty*

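The merge method is particularly interesting, because it implements the same logic used for merging coordinates in arithmetic operations (see Computation):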

In [70]: alt = xr.Dataset(coords={'z': [10], 'lat': 0, 'lon': 0})

In [71]: ds.coords.merge(alt.coords)
Out[71]: 
<xarray.Dataset>
Dimensions:         (time: 3, z: 1)
Coordinates:
  * time            (time) datetime64[ns] 2014-09-06 2014-09-07 2014-09-08
    reference_time  datetime64[ns] 2014-09-05
    day             (time) int64 6 7 8
  * z               (z) int64 10
Data variables:
    *empty*

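The coords.merge method may be useful if you want to implement your own binary operations that act on xarray objects. In the future, we hope to write more helper functions so that you can easily make your functions behave like xarray's built-in arithmetic.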
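As a rough illustration of that shared logic, the sketch below compares coords.merge with the coordinates left on the result of an addition. The datasets `a` and `b` and the coordinate names `lat` and `elev` are hypothetical: `lat` conflicts between the two objects while `elev` agrees.

```python
import xarray as xr

# Two hypothetical datasets sharing the index `x`;
# the scalar coordinate `lat` differs, `elev` is identical.
a = xr.Dataset(
    {"v": ("x", [1.0, 2.0])},
    coords={"x": [0, 1], "lat": 10.0, "elev": 5.0},
)
b = xr.Dataset(
    {"v": ("x", [3.0, 4.0])},
    coords={"x": [0, 1], "lat": 20.0, "elev": 5.0},
)

# Merging the coordinates drops the conflicting non-index coordinate `lat`.
merged = a.coords.merge(b.coords)
merged_names = sorted(merged.coords)

# Arithmetic on the data variables leaves the same coordinates on the result.
total = a["v"] + b["v"]
total_names = sorted(total.coords)
```

This matches the article's example, where `lat` and `lon` disappear from the merged result because their values differ between `ds` and `alt`.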

Indexes

To convert a coordinate (or any DataArray) into an actual pandas.Index, use the to_index() method:

In [72]: ds['time'].to_index()
Out[72]: DatetimeIndex(['2014-09-06', '2014-09-07', '2014-09-08'], dtype='datetime64[ns]', name='time', freq='D')

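A useful shortcut is the indexes property (on both DataArray and Dataset), which lazily constructs a dictionary whose keys are the dimension names and whose values are Index objects: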

In [73]: ds.indexes
Out[73]: time: DatetimeIndex(['2014-09-06', '2014-09-07', '2014-09-08'], dtype='datetime64[ns]', name='time', freq='D')
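Both access paths can be checked against each other in a short sketch; the dataset here is a hypothetical stand-in for the article's `ds`.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical dataset: `time` has a coordinate, `x` does not.
ds = xr.Dataset(
    {"temperature": (("x", "time"), np.zeros((2, 3)))},
    coords={"time": pd.date_range("2014-09-06", periods=3)},
)

# Convert one coordinate to a pandas index.
time_index = ds["time"].to_index()

# `indexes` behaves like a mapping from name to pandas index.
index_names = list(ds.indexes)
same_index = ds.indexes["time"].equals(time_index)
```

The dimension `x` has no coordinate, so it does not appear in `ds.indexes`.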

MultiIndex coordinates

Xarray supports labeling coordinate values with a pandas.MultiIndex:

In [74]: midx = pd.MultiIndex.from_arrays([['R', 'R', 'V', 'V'], [.1, .2, .7, .9]],
   ....:                                  names=('band', 'wn'))
   ....: 

In [75]: mda = xr.DataArray(np.random.rand(4), coords={'spec': midx}, dims='spec')

In [76]: mda
Out[76]: 
<xarray.DataArray (spec: 4)>
array([0.642, 0.275, 0.462, 0.871])
Coordinates:
  * spec     (spec) MultiIndex
  - band     (spec) object 'R' 'R' 'V' 'V'
  - wn       (spec) float64 0.1 0.2 0.7 0.9

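For convenience, multi-index levels are directly accessible as "virtual" or "derived" coordinates (marked with - when a dataset or data array is printed):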

In [77]: mda['band']
Out[77]: 
<xarray.DataArray 'band' (spec: 4)>
array(['R', 'R', 'V', 'V'], dtype=object)
Coordinates:
  * spec     (spec) MultiIndex
  - band     (spec) object 'R' 'R' 'V' 'V'
  - wn       (spec) float64 0.1 0.2 0.7 0.9

In [78]: mda.wn
Out[78]: 
<xarray.DataArray 'wn' (spec: 4)>
array([0.1, 0.2, 0.7, 0.9])
Coordinates:
  * spec     (spec) MultiIndex
  - band     (spec) object 'R' 'R' 'V' 'V'
  - wn       (spec) float64 0.1 0.2 0.7 0.9

Indexing by multi-index levels is also possible using the sel method (see Multi-level indexing for details).
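A minimal sketch of level-based selection follows. As an assumption made for portability across xarray versions, the multi-index is built with set_index() from two ordinary coordinates instead of passing a pandas.MultiIndex to the constructor as the article does, and the data values are fixed rather than random.

```python
import numpy as np
import xarray as xr

# Build a multi-indexed `spec` dimension from two ordinary coordinates.
mda = xr.DataArray(
    np.array([1.0, 2.0, 3.0, 4.0]),
    dims="spec",
    coords={
        "band": ("spec", ["R", "R", "V", "V"]),
        "wn": ("spec", [0.1, 0.2, 0.7, 0.9]),
    },
).set_index(spec=["band", "wn"])

level_names = list(mda.indexes["spec"].names)

# Select every entry of one level value.
red = mda.sel(band="R")

# Fix all levels to pick out a single element.
single = mda.sel(band="V", wn=0.7)
```

Selecting on `band` alone keeps the two matching entries, while fixing both levels returns one value.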

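Unlike other coordinates, "virtual" level coordinates are not stored in the coords attribute of DataArray and Dataset objects (although they are shown when the coords attribute is printed). Consequently, most coordinate-related methods do not apply to them, nor can they be used to replace one particular level.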

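Because each multi-index level in a DataArray or Dataset object is accessible as a "virtual" coordinate, its name must not conflict with the names of the other levels, coordinates and data variables of the same object. Although Xarray sets default names for multi-indexes with unnamed levels, it is recommended that you set the level names explicitly.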
