Introduction to Parallel NetCDF

郎欣然

2023-12-01

Parallel NetCDF API

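All C interface functions carry the ncmpi prefix (e.g. ncmpi_create instead of netCDF's nc_create), and all Fortran interface functions carry the nfmpi prefix.
Functions return an integer netCDF status code (NC_NOERR on success).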

1. Variable and Parameter Types

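Functions use the MPI_Offset type for size arguments. Compared with size_t, which is only 32 bits wide on 32-bit platforms, MPI_Offset is typically a 64-bit integer, so the amount of data it can describe is practically unlimited.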

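Scalars and vectors describing variable access, such as the starting indices (start), the per-dimension lengths (count), and the per-dimension intervals (stride), must all be declared as MPI_Offset.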

2. Dataset Functions

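Compared with nc_create and nc_open in serial netCDF, ncmpi_create and ncmpi_open take two extra arguments: an MPI communicator naming the processes that share the file, and an MPI_Info object, which is mainly used to pass hints. Passing MPI_INFO_NULL skips the hint feature.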

int ncmpi_create(MPI_Comm comm,
                 const char *path,
                 int cmode,
                 MPI_Info info,
                 int *ncidp);

int ncmpi_open(MPI_Comm comm,
               const char *path,
               int omode,
               MPI_Info info,
               int *ncidp);

3. Define Mode Functions

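All processes must call these functions with identical arguments. When define mode ends, the definitions made by each process are checked and compared; if they differ, ncmpi_enddef returns an error code.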

4. Inquiry Functions

Inquiry functions can be called in either define mode or data mode.

5. Attribute Functions

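Attributes store scalar or vector values in a netCDF file, mainly to describe variables (global attributes describe the dataset as a whole).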

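As in the original (serial) netCDF interface, attribute functions can be called in define mode or data mode. However, modifying attributes in data mode may fail, mainly because the change could alter the amount of header space the file requires.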

6. Data Mode Functions

Data mode has two states: collective mode and independent mode. After the user calls ncmpi_enddef or ncmpi_open, the file is automatically in collective mode.

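In collective mode, all processes must call the same function at the same point in the code, although arguments such as start, count, and stride may differ between processes. In independent mode, processes do not need to call the API together.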

Independent mode cannot be entered from define mode: first call ncmpi_enddef to leave define mode and enter data mode, and only then call ncmpi_begin_indep_data.

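Data mode functions fall into two categories. The first mimics the traditional netCDF functions, providing a simple migration path from the traditional netCDF interface to the parallel netCDF interface. We call this category the high-level data mode interface.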

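The second category uses more MPI features to handle in-memory data better (the caller describes the user buffer with an MPI datatype) and exposes more of MPI-IO's capabilities to applications. All first-category functions are implemented on top of this category. We call it the flexible data mode interface.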

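Both categories provide independent and collective versions of each operation. Collective function names end in _all, and all processes must call them together.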

6.1. High Level Data Mode Interface

Each function closely resembles its counterpart in the netCDF data mode interface. The main change is that MPI_Offset replaces size_t.

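ncmpi_put_var_<type>: writes all values of a variable to the netCDF file;
ncmpi_put_vara_<type>: writes a subarray whose starting indices are given by the start vector and whose per-dimension lengths are given by count;
ncmpi_put_vars_<type>: like vara, with stride additionally giving the sampling interval along each dimension;
ncmpi_put_varm_<type>: like vars, with an additional imap vector describing how the subarray is laid out in memory (mapped access).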

6.2. Flexible Data Mode Interface

6.3. Mapping Between NetCDF and MPI Types

7. Q & A

For more details, please refer to the Parallel netCDF Q&A.

Q: How do I use the buffered nonblocking write APIs?
A: Buffered nonblocking write APIs copy the contents of user buffers into an internally allocated buffer, so the user buffers can be reused immediately after the calls return. A typical way to use these APIs is described below.

First, tell PnetCDF how much space it may allocate for use by these APIs (ncmpi_buffer_attach).
Then make calls to the buffered put APIs (ncmpi_bput_*).
Next, make calls to the (collective) wait APIs (e.g. ncmpi_wait_all).
Finally, free the space allocated for the internal buffer (ncmpi_buffer_detach).

For further information about the buffered nonblocking APIs, readers are referred to this page.

Q: What is the difference between collective and independent APIs?
A: Collective APIs require all MPI processes to participate in the call. This requirement allows MPI-IO and PnetCDF to coordinate the requesting processes and rearrange their requests into a form that achieves the best performance from the underlying file system. Independent APIs (also referred to as non-collective), on the contrary, have no such requirement. All PnetCDF collective data access APIs carry the suffix _all, which distinguishes them from their independent counterparts (create, open, close, and the define-mode functions are also collective but have no suffix). To switch from collective data mode to independent mode, users must call ncmpi_begin_indep_data; ncmpi_end_indep_data exits independent mode.

Q: Should I use collective APIs or independent APIs?
A: Users are encouraged to use collective APIs whenever possible. Collective API calls require the participation of all MPI processes that opened the shared file. This requirement allows MPI-IO and PnetCDF to coordinate the requesting processes and rearrange their requests into a form that achieves the best performance from the underlying file system. If the nature of the application's I/O does not permit collective calls (for example, the number of requests differs among processes or is only determined at run time), we recommend the following.

Force all processes to participate in the collective calls. When a process has nothing to request, it can still call a collective API with a zero-length request, by setting the contents of the count argument to zero.
Use nonblocking APIs. Individual processes can make any number of nonblocking calls independently of other processes. At the end, calling the collective wait API ncmpi_wait_all is recommended, so that all nonblocking requests are committed to the file system together.

Summary: use collective APIs whenever possible; even where they do not seem to fit, try to use them anyway.

8. Example

/*********************************************************************
 *
 *  Copyright (C) 2012, Northwestern University and Argonne National Laboratory
 *  See COPYRIGHT notice in top-level directory.
 *
 *********************************************************************/
/* $Id$ */

/* simple demonstration of pnetcdf 
 * text attribute on dataset
 * write out rank into 1-d array collectively.
 * The most basic way to do parallel i/o with pnetcdf */

/* This program creates a file, say named output.nc, with the following
   contents, as shown by running the ncmpidump command.

    % mpiexec -n 4 pnetcdf-write-standard /orangefs/wkliao/output.nc

    % ncmpidump /orangefs/wkliao/output.nc 
    netcdf output {
    // file format: CDF-2 (large file)
    dimensions:
            d1 = 4 ;
            time = UNLIMITED ; // (2 currently)
    variables:
            int v1(time, d1) ;
            int v2(d1) ;

    // global attributes:
                :string = "Hello World\n",
        "" ;
    data:

         v1 = 
            0, 1, 2, 3,
            1, 2, 3, 4 ;


         v2 = 0, 1, 2, 3 ;
    }
*/

#include <stdlib.h>
#include <mpi.h>
#include <pnetcdf.h>
#include <stdio.h>

static void handle_error(int status, int lineno)
{
    fprintf(stderr, "Error at line %d: %s\n", lineno, ncmpi_strerror(status));
    MPI_Abort(MPI_COMM_WORLD, 1);
}

int main(int argc, char **argv) {

    int ret, ncfile, nprocs, rank, dimid1, dimid2, varid1, varid2, ndims;
    MPI_Offset start, count=1;
    int t, i;
    int v1_dimid[2];
    MPI_Offset v1_start[2], v1_count[2];
    int v1_data[4];
    char buf[13] = "Hello World\n";
    int data;

    MPI_Init(&argc, &argv);

    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    if (argc != 2) {
        if (rank == 0) printf("Usage: %s filename\n", argv[0]);
        MPI_Finalize();
        exit(-1);
    }

    ret = ncmpi_create(MPI_COMM_WORLD, argv[1],
                       NC_CLOBBER, MPI_INFO_NULL, &ncfile);
    if (ret != NC_NOERR) handle_error(ret, __LINE__);

    ret = ncmpi_def_dim(ncfile, "d1", nprocs, &dimid1);
    if (ret != NC_NOERR) handle_error(ret, __LINE__);

    ret = ncmpi_def_dim(ncfile, "time", NC_UNLIMITED, &dimid2);
    if (ret != NC_NOERR) handle_error(ret, __LINE__);

    v1_dimid[0] = dimid2;
    v1_dimid[1] = dimid1;
    ndims = 2;

    ret = ncmpi_def_var(ncfile, "v1", NC_INT, ndims, v1_dimid, &varid1);
    if (ret != NC_NOERR) handle_error(ret, __LINE__);

    ndims = 1;

    ret = ncmpi_def_var(ncfile, "v2", NC_INT, ndims, &dimid1, &varid2);
    if (ret != NC_NOERR) handle_error(ret, __LINE__);

    /* length 13 = strlen(buf) + 1, so the terminating NUL is stored too */
    ret = ncmpi_put_att_text(ncfile, NC_GLOBAL, "string", 13, buf);
    if (ret != NC_NOERR) handle_error(ret, __LINE__);

    /* all processors defined the dimensions, attributes, and variables,
     * but here in ncmpi_enddef is the one place where metadata I/O
     * happens.  Behind the scenes, rank 0 takes the information and writes
     * the netcdf header.  All processes communicate to ensure they have
     * the same (cached) view of the dataset */

    ret = ncmpi_enddef(ncfile);
    if (ret != NC_NOERR) handle_error(ret, __LINE__);

    start=rank, count=1, data=rank;

    ret = ncmpi_put_vara_int_all(ncfile, varid2, &start, &count, &data);
    if (ret != NC_NOERR) handle_error(ret, __LINE__);

    for (t = 0; t<2; t++){

        v1_start[0] = t, v1_start[1] = rank;
        v1_count[0] = 1, v1_count[1] = 1;
        /* v1_count is {1, 1}, so only v1_data[0] is actually written */
        for (i = 0; i<4; i++){
            v1_data[i] = rank+t;
        }

        /* each process writes rank+t into row t, column rank of the 2-D variable v1 */
        ret = ncmpi_put_vara_int_all(ncfile, varid1, v1_start, v1_count, v1_data);
        if (ret != NC_NOERR) handle_error(ret, __LINE__);

    }
    
    ret = ncmpi_close(ncfile);
    if (ret != NC_NOERR) handle_error(ret, __LINE__);

    MPI_Finalize();

    return 0;
}

Reposted from: https://www.cnblogs.com/li12242/p/5551387.html