CPU Information
[xiajing@hostname ~]$ cat /proc/cpuinfo | more
processor : 0
...
power management
Memory Information
HPCC Output
########################################################################
This is the DARPA/DOE HPC Challenge Benchmark version 1.4.2 October 2012
Produced by Jack Dongarra and Piotr Luszczek
Innovative Computing Laboratory
University of Tennessee Knoxville and Oak Ridge National Laboratory
See the source files for authors of specific codes.
Compiled on Dec 31 2013 at 15:40:56
Current time (1388539865) is Wed Jan 1 09:31:05 2014
Hostname: 'localhost.localdomain'
########################################################################
================================================================================
HPLinpack 2.0 -- High-Performance Linpack benchmark -- September 10, 2008
Written by A. Petitet and R. Clint Whaley, Innovative Computing Laboratory, UTK
Modified by Piotr Luszczek, Innovative Computing Laboratory, UTK
Modified by Julien Langou, University of Colorado Denver
================================================================================
An explanation of the input/output parameters follows:
T/V : Wall time / encoded variant.
N : The order of the coefficient matrix A.
NB : The partitioning blocking factor.
P : The number of process rows.
Q : The number of process columns.
Time : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.
The following parameter values will be used:
N : 10000
NB : 80
PMAP : Row-major process mapping
P : 2
Q : 2
PFACT : Right
NBMIN : 4
NDIV : 2
RFACT : Crout
BCAST : 1ringM
DEPTH : 1
SWAP : Mix (threshold = 64)
L1 : transposed form
U : transposed form
EQUIL : yes
ALIGN : 8 double precision words
--------------------------------------------------------------------------------
- The matrix A is randomly generated for each test.
- The following scaled residual check will be computed:
||Ax-b||_oo / ( eps * ( || x ||_oo * || A ||_oo + || b ||_oo ) * N )
- The relative machine precision (eps) is taken to be 1.110223e-16
- Computational tests pass if scaled residuals are less than 16.0
Begin of MPIRandomAccess section.
Running on 32 processors (PowerofTwo)
Total Main table size = 2^29 = 536870912 words
PE Main table size = 2^24 = 16777216 words/PE
Default number of updates (RECOMMENDED) = 2147483648
Number of updates EXECUTED = 1438577152 (for a TIME BOUND of 60.00 secs)
CPU time used = 9.393572 seconds
Real time used = 29.611642 seconds
0.048581472 Billion(10^9) Updates per second [GUP/s]
0.001518171 Billion(10^9) Updates/PE per second [GUP/s]
Verification: CPU time used = 4.051384 seconds
Verification: Real time used = 7.062347 seconds
Found 0 errors in 536870912 locations (passed).
Current time (1388539903) is Wed Jan 1 09:31:43 2014
End of MPIRandomAccess section.
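The GUP/s figures reported in this section appear to follow directly from the executed update count and the real (wall-clock) time. Below is a minimal Python sketch of that arithmetic, using values copied from the section above. The function name `gups` is illustrative and not part of HPCC.

```python
def gups(updates: int, seconds: float) -> float:
    """Giga-updates per second: updates / elapsed time / 10^9."""
    return updates / seconds / 1e9


# Values copied from the MPIRandomAccess section of this log.
updates_executed = 1438577152
real_time_s = 29.611642
processes = 32

total = gups(updates_executed, real_time_s)
per_pe = total / processes  # the log also reports a per-PE figure

print(f"{total:.9f} Billion(10^9) Updates per second [GUP/s]")
print(f"{per_pe:.9f} Billion(10^9) Updates/PE per second [GUP/s]")
```

The same division applied to the MPIRandomAccess_LCG numbers (1369197504 updates in 24.240022 seconds) should reproduce that section's rate as well.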
Begin of StarRandomAccess section.
Main table size = 2^24 = 16777216 words
Number of updates = 67108864
CPU time used = 4.358337 seconds
Real time used = 4.572036 seconds
0.014678114 Billion(10^9) Updates per second [GUP/s]
Found 0 errors in 16777216 locations (passed).
Node(s) with error 0
Minimum GUP/s 0.013138
Average GUP/s 0.015399
Maximum GUP/s 0.027626
Current time (1388539912) is Wed Jan 1 09:31:52 2014
End of StarRandomAccess section.
Begin of SingleRandomAccess section.
Node(s) with error 0
Node selected 20
Single GUP/s 0.059106
Current time (1388539915) is Wed Jan 1 09:31:55 2014
End of SingleRandomAccess section.
Begin of MPIRandomAccess_LCG section.
Running on 32 processors (PowerofTwo)
Total Main table size = 2^29 = 536870912 words
PE Main table size = 2^24 = 16777216 words/PE
Default number of updates (RECOMMENDED) = 2147483648
Number of updates EXECUTED = 1369197504 (for a TIME BOUND of 60.00 secs)
CPU time used = 8.101769 seconds
Real time used = 24.240022 seconds
0.056484994 Billion(10^9) Updates per second [GUP/s]
0.001765156 Billion(10^9) Updates/PE per second [GUP/s]
Verification: CPU time used = 3.612450 seconds
Verification: Real time used = 5.581661 seconds
Found 0 errors in 536870912 locations (passed).
Current time (1388539946) is Wed Jan 1 09:32:26 2014
End of MPIRandomAccess_LCG section.
Begin of StarRandomAccess_LCG section.
Main table size = 2^24 = 16777216 words
Number of updates = 67108864
CPU time used = 4.337340 seconds
Real time used = 4.496009 seconds
0.014926319 Billion(10^9) Updates per second [GUP/s]
Found 0 errors in 16777216 locations (passed).
Node(s) with error 0
Minimum GUP/s 0.011877
Average GUP/s 0.015436
Maximum GUP/s 0.027538
Current time (1388539955) is Wed Jan 1 09:32:35 2014
End of StarRandomAccess_LCG section.
Begin of SingleRandomAccess_LCG section.
Node(s) with error 0
Node selected 1
Single GUP/s 0.062886
Current time (1388539957) is Wed Jan 1 09:32:37 2014
End of SingleRandomAccess_LCG section.
Begin of PTRANS section.
M: 5000
N: 5000
MB: 80
NB: 80
P: 2
Q: 2
TIME     M     N  MB  NB   P   Q     TIME  CHECK     GB/s RESID
---- ----- ----- --- --- --- --- -------- ------ -------- -----
WALL  5000  5000  80  80   2   2     0.07 PASSED    2.731  0.00
CPU   5000  5000  80  80   2   2     0.07 PASSED    2.778  0.00
WALL  5000  5000  80  80   2   2     0.08 PASSED    2.639  0.00
CPU   5000  5000  80  80   2   2     0.07 PASSED    2.817  0.00
WALL  5000  5000  80  80   2   2     0.07 PASSED    2.639  0.00
CPU   5000  5000  80  80   2   2     0.07 PASSED    2.986  0.00
WALL  5000  5000  80  80   2   2     0.07 PASSED    2.639  0.00
CPU   5000  5000  80  80   2   2     0.06 PASSED    3.077  0.00
WALL  5000  5000  80  80   2   2     0.09 PASSED    2.348  0.00
CPU   5000  5000  80  80   2   2     0.08 PASSED    2.353  0.00
Finished 5 tests, with the following results:
5 tests completed and passed residual checks.
0 tests completed and failed residual checks.
0 tests skipped because of illegal input values.
END OF TESTS.
Current time (1388539960) is Wed Jan 1 09:32:40 2014
End of PTRANS section.
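The PTRANS rate can be cross-checked from the matrix dimensions and the run time. The sketch below assumes the rate counts each 8-byte matrix element once (M x N x 8 bytes divided by the elapsed time). It uses the unrounded time `PTRANS_time=0.085165` that appears in the Summary section of this log, since the table above rounds times to two decimals. The function name is hypothetical.

```python
def ptrans_gb_per_s(m: int, n: int, seconds: float, bytes_per_word: int = 8) -> float:
    """Transpose rate in GB/s, assuming every matrix element is moved once."""
    return m * n * bytes_per_word / seconds / 1e9


# M and N from the PTRANS section; the time is PTRANS_time from the Summary section.
rate = ptrans_gb_per_s(5000, 5000, 0.085165)
print(f"{rate:.5f} GB/s")
```

Under that assumption the result agrees with the last WALL row of the table (2.348 GB/s), which is also the value the Summary section records as `PTRANS_GBs`.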
Begin of StarDGEMM section.
Scaled residual: 0.0126049
Node(s) with error 0
Minimum Gflop/s 2.133491
Average Gflop/s 2.482096
Maximum Gflop/s 6.068602
Current time (1388539983) is Wed Jan 1 09:33:03 2014
End of StarDGEMM section.
Begin of SingleDGEMM section.
Node(s) with error 0
Node selected 18
Single DGEMM Gflop/s 3.081236
Current time (1388540000) is Wed Jan 1 09:33:20 2014
End of SingleDGEMM section.
Begin of StarSTREAM section.
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 8333333, Offset = 0
Total memory required = 0.1863 GiB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 89854 microseconds.
(= 89854 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function      Rate (GB/s)   Avg time     Min time     Max time
Copy:              2.1708     0.0826       0.0614       0.0879
Scale:             3.6641     0.0927       0.0364       0.1360
Add:               5.2765     0.1365       0.0379       0.2205
Triad:             5.3872     0.1326       0.0371       0.2591
-------------------------------------------------------------
Results Comparison:
Expected : 9610839459316406272.000000 1922167891863281152.000000 2562890522484375040.000000
Observed : 9610839458179614720.000000 1922167891630767360.000000 2562890521962342912.000000
Solution Validates
-------------------------------------------------------------
Node(s) with error 0
Minimum Copy GB/s 1.485591
Average Copy GB/s 1.998431
Maximum Copy GB/s 6.207920
Minimum Scale GB/s 1.492092
Average Scale GB/s 2.062079
Maximum Scale GB/s 6.174541
Minimum Add GB/s 1.596921
Average Add GB/s 2.407787
Maximum Add GB/s 7.079412
Minimum Triad GB/s 1.568196
Average Triad GB/s 2.759946
Maximum Triad GB/s 7.631835
Current time (1388540005) is Wed Jan 1 09:33:25 2014
End of StarSTREAM section.
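Both the "Total memory required" figure and the per-kernel rates in the table above can be approximated from the array size. The following Python sketch assumes three arrays of 8-byte words and the usual STREAM byte counting (Copy and Scale touch two arrays, Add and Triad touch three). The minimum times are copied from the table, so the results differ slightly from the printed rates because those times are rounded to four decimals. All names in the code are illustrative.

```python
BYTES_PER_WORD = 8
ARRAY_SIZE = 8333333

# Arrays touched per element by each kernel (assumed standard STREAM counting).
ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}

# "Min time" column from the table in this section, in seconds.
MIN_TIME_S = {"Copy": 0.0614, "Scale": 0.0364, "Add": 0.0379, "Triad": 0.0371}


def memory_gib(n: int, arrays: int = 3) -> float:
    """Memory for `arrays` vectors of n double-precision words, in GiB."""
    return arrays * BYTES_PER_WORD * n / 2**30


def rate_gb_s(kernel: str) -> float:
    """Bytes moved by one kernel pass divided by its best time, in GB/s."""
    moved = ARRAYS_TOUCHED[kernel] * BYTES_PER_WORD * ARRAY_SIZE
    return moved / MIN_TIME_S[kernel] / 1e9


print(f"Total memory required = {memory_gib(ARRAY_SIZE):.4f} GiB")
for kernel in ARRAYS_TOUCHED:
    print(f"{kernel}: {rate_gb_s(kernel):.4f} GB/s")
```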
Begin of SingleSTREAM section.
Node(s) with error 0
Node selected 14
Single STREAM Copy GB/s 13.680721
Single STREAM Scale GB/s 13.557017
Single STREAM Add GB/s 15.426190
Single STREAM Triad GB/s 14.121284
Current time (1388540005) is Wed Jan 1 09:33:25 2014
End of SingleSTREAM section.
Begin of MPIFFT section.
Number of nodes: 32
Vector size: 67108864
Generation time: 0.093
Tuning: 0.115
Computing: 0.834
Inverse FFT: 0.929
max(|x-x0|): 2.156e-15
Gflop/s: 10.459
Current time (1388540008) is Wed Jan 1 09:33:28 2014
End of MPIFFT section.
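The MPIFFT rate is consistent with the conventional operation count for a complex FFT of length n, namely 5 n log2(n) floating-point operations. The Python sketch below applies that count to the vector size and the "Computing" time printed above. Because that time is rounded to three decimals, the result only approximately matches the reported 10.459 Gflop/s. The function name is hypothetical.

```python
import math


def fft_gflops(n: int, seconds: float) -> float:
    """Gflop/s for a length-n complex FFT, assuming 5 * n * log2(n) flops."""
    return 5.0 * n * math.log2(n) / seconds / 1e9


# Vector size and "Computing" time from the MPIFFT section.
rate = fft_gflops(67108864, 0.834)
print(f"{rate:.3f} Gflop/s")
```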
Begin of StarFFT section.
Vector size: 4194304
Generation time: 0.185
Tuning: 0.001
Computing: 0.510
Inverse FFT: 0.651
max(|x-x0|): 1.861e-15
Node(s) with error 0
Minimum Gflop/s 0.513238
Average Gflop/s 0.842781
Maximum Gflop/s 1.601306
Current time (1388540010) is Wed Jan 1 09:33:30 2014
End of StarFFT section.
Begin of SingleFFT section.
Node(s) with error 0
Node selected 8
Single FFT Gflop/s 1.883471
Current time (1388540010) is Wed Jan 1 09:33:30 2014
End of SingleFFT section.
Begin of LatencyBandwidth section.
------------------------------------------------------------------
Latency-Bandwidth-Benchmark R1.5.1 (c) HLRS, University of Stuttgart
Written by Rolf Rabenseifner, Gerrit Schulz, and Michael Speck, Germany
Details - level 2
-----------------
MPI_Wtime granularity.
Max. MPI_Wtick is 0.000001 sec
wtick is set to 0.000001 sec
Message Length: 8
Latency min / avg / max: 0.000457 / 0.000457 / 0.000457 msecs
Bandwidth min / avg / max: 17.507 / 17.507 / 17.507 MByte/s
Use MPI_Wtick for estimation of max pairs
message size: 8
max time : 10.000000 secs
latency for msg: 0.000457 msecs
estimation for ping pong: 0.041127 msecs
max number of ping pong pairs = 200000
max client pings = max server pongs = 447
stride for latency = 1
Message Length: 8
Latency min / avg / max: 0.000417 / 0.000835 / 0.001917 msecs
Bandwidth min / avg / max: 4.173 / 11.503 / 19.174 MByte/s
Message Length: 2000000
Latency min / avg / max: 0.220060 / 0.220060 / 0.220060 msecs
Bandwidth min / avg / max: 9088.416 / 9088.416 / 9088.416 MByte/s
MPI_Wtime granularity is ok.
message size: 2000000
max time : 30.000000 secs
latency for msg: 0.220060 msecs
estimation for ping pong: 1.760483 msecs
max number of ping pong pairs = 17040
max client pings = max server pongs = 130
stride for latency = 1
Message Length: 2000000
Latency min / avg / max: 0.214458 / 0.302969 / 0.685930 msecs
Bandwidth min / avg / max: 2915.748 / 7079.332 / 9325.857 MByte/s
Message Size: 8 Byte
Natural Order Latency: 0.002861 msec
Natural Order Bandwidth: 2.796203 MB/s
Avg Random Order Latency: 0.002614 msec
Avg Random Order Bandwidth: 3.060580 MB/s
Message Size: 2000000 Byte
Natural Order Latency: 3.729463 msec
Natural Order Bandwidth: 536.270289 MB/s
Avg Random Order Latency: 3.922102 msec
Avg Random Order Bandwidth: 509.930628 MB/s
Execution time (wall clock) = 4.566 sec on 32 processes
- for cross ping_pong latency = 0.124 sec
- for cross ping_pong bandwidth = 2.677 sec
- for ring latency = 0.033 sec
- for ring bandwidth = 1.733 sec
------------------------------------------------------------------
Latency-Bandwidth-Benchmark R1.5.1 (c) HLRS, University of Stuttgart
Written by Rolf Rabenseifner, Gerrit Schulz, and Michael Speck, Germany
Major Benchmark results:
------------------------
Max Ping Pong Latency: 0.001917 msecs
Randomly Ordered Ring Latency: 0.002614 msecs
Min Ping Pong Bandwidth: 2915.748349 MB/s
Naturally Ordered Ring Bandwidth: 536.270289 MB/s
Randomly Ordered Ring Bandwidth: 509.930628 MB/s
------------------------------------------------------------------
Detailed benchmark results:
Ping Pong:
Latency min / avg / max: 0.000417 / 0.000835 / 0.001917 msecs
Bandwidth min / avg / max: 2915.748 / 7079.332 / 9325.857 MByte/s
Ring:
On naturally ordered ring: latency= 0.002861 msec, bandwidth= 536.270289 MB/s
On randomly ordered ring: latency= 0.002614 msec, bandwidth= 509.930628 MB/s
------------------------------------------------------------------
Benchmark conditions:
The latency measurements were done with 8 bytes
The bandwidth measurements were done with 2000000 bytes
The ring communication was done in both directions on 32 processes
The Ping Pong measurements were done on
- 992 pairs of processes for latency benchmarking, and
- 992 pairs of processes for bandwidth benchmarking,
out of 32*(32-1) = 992 possible combinations on 32 processes.
(1 MB/s = 10**6 byte/sec)
------------------------------------------------------------------
Current time (1388540015) is Wed Jan 1 09:33:35 2014
End of LatencyBandwidth section.
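Two figures in this section can be checked with simple arithmetic: the number of ordered process pairs, and the ping-pong bandwidth, which appears to be the message size divided by the measured per-message time (with 1 MB/s = 10**6 byte/sec, as the log states). The Python sketch below illustrates this under that assumption. The helper names are not part of the benchmark.

```python
def ordered_pairs(processes: int) -> int:
    """Number of ordered (sender, receiver) pairs among `processes` ranks."""
    return processes * (processes - 1)


def bandwidth_mb_s(message_bytes: int, latency_ms: float) -> float:
    """Bandwidth in MB/s (10**6 byte/sec) from a per-message time in msecs."""
    return message_bytes / (latency_ms * 1e-3) / 1e6


print(ordered_pairs(32))

# Min and max per-message times for 2000000-byte messages, from this section.
print(f"max bandwidth ~ {bandwidth_mb_s(2000000, 0.214458):.3f} MB/s")
print(f"min bandwidth ~ {bandwidth_mb_s(2000000, 0.685930):.3f} MB/s")
```

The shortest time gives the maximum bandwidth and the longest time gives the minimum. Small differences from the printed values come from the rounding of the times.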
Begin of HPL section.
================================================================================
HPLinpack 2.0 -- High-Performance Linpack benchmark -- September 10, 2008
Written by A. Petitet and R. Clint Whaley, Innovative Computing Laboratory, UTK
Modified by Piotr Luszczek, Innovative Computing Laboratory, UTK
Modified by Julien Langou, University of Colorado Denver
================================================================================
An explanation of the input/output parameters follows:
T/V : Wall time / encoded variant.
N : The order of the coefficient matrix A.
NB : The partitioning blocking factor.
P : The number of process rows.
Q : The number of process columns.
Time : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.
The following parameter values will be used:
N : 10000
NB : 80
PMAP : Row-major process mapping
P : 2
Q : 2
PFACT : Right
NBMIN : 4
NDIV : 2
RFACT : Crout
BCAST : 1ringM
DEPTH : 1
SWAP : Mix (threshold = 64)
L1 : transposed form
U : transposed form
EQUIL : yes
ALIGN : 8 double precision words
--------------------------------------------------------------------------------
- The matrix A is randomly generated for each test.
- The following scaled residual check will be computed:
||Ax-b||_oo / ( eps * ( || x ||_oo * || A ||_oo + || b ||_oo ) * N )
- The relative machine precision (eps) is taken to be 1.110223e-16
- Computational tests pass if scaled residuals are less than 16.0
================================================================================
T/V                N    NB     P     Q               Time                 Gflops
--------------------------------------------------------------------------------
WR11C2R4       10000    80     2     2              26.59              2.508e+01
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0054518 ...... PASSED
================================================================================
Finished 1 tests with the following results:
1 tests completed and passed residual checks,
0 tests completed and failed residual checks,
0 tests skipped because of illegal input values.
--------------------------------------------------------------------------------
End of Tests.
================================================================================
Current time (1388540044) is Wed Jan 1 09:34:04 2014
End of HPL section.
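The Gflops value in the HPL result line can be reproduced from N and the solve time. The sketch below assumes the operation count HPL uses for an LU factorization plus solve, 2/3 N^3 + 3/2 N^2. The function name is illustrative.

```python
def hpl_gflops(n: int, seconds: float) -> float:
    """Gflop/s for solving an order-n dense system, assuming
    2/3 * n^3 + 3/2 * n^2 floating-point operations."""
    flops = (2.0 / 3.0) * n**3 + 1.5 * n**2
    return flops / seconds / 1e9


# N and Time from the result line above (time rounded to two decimals).
from_table = hpl_gflops(10000, 26.59)

# HPL_time from the Summary section of this log (less rounding).
from_summary = hpl_gflops(10000, 26.5869)

print(f"{from_table:.2f} Gflops (table time)")
print(f"{from_summary:.4f} Gflops (summary time)")
```

With the summary time the result agrees with `HPL_Tflops=0.0250806` to the printed precision.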
Begin of Summary section.
VersionMajor=1
VersionMinor=4
VersionMicro=2
VersionRelease=f
LANG=C
Success=1
sizeof_char=1
sizeof_short=2
sizeof_int=4
sizeof_long=8
sizeof_void_ptr=8
sizeof_size_t=8
sizeof_float=4
sizeof_double=8
sizeof_s64Int=8
sizeof_u64Int=8
sizeof_struct_double_double=16
CommWorldProcs=32
MPI_Wtick=1.000000e-06
HPL_Tflops=0.0250806
HPL_time=26.5869
HPL_eps=1.11022e-16
HPL_RnormI=7.83977e-11
HPL_Anorm1=2560.41
HPL_AnormI=2560.25
HPL_Xnorm1=10832.2
HPL_XnormI=5.05887
HPL_BnormI=0.499879
HPL_N=10000
HPL_NB=80
HPL_nprow=2
HPL_npcol=2
HPL_depth=1
HPL_nbdiv=2
HPL_nbmin=4
HPL_cpfact=R
HPL_crfact=C
HPL_ctop=1
HPL_order=R
HPL_dMACH_EPS=1.110223e-16
HPL_dMACH_SFMIN=2.225074e-308
HPL_dMACH_BASE=2.000000e+00
HPL_dMACH_PREC=2.220446e-16
HPL_dMACH_MLEN=5.300000e+01
HPL_dMACH_RND=1.000000e+00
HPL_dMACH_EMIN=-1.021000e+03
HPL_dMACH_RMIN=2.225074e-308
HPL_dMACH_EMAX=1.024000e+03
HPL_dMACH_RMAX=1.797693e+308
HPL_sMACH_EPS=5.960464e-08
HPL_sMACH_SFMIN=1.175494e-38
HPL_sMACH_BASE=2.000000e+00
HPL_sMACH_PREC=1.192093e-07
HPL_sMACH_MLEN=2.400000e+01
HPL_sMACH_RND=1.000000e+00
HPL_sMACH_EMIN=-1.250000e+02
HPL_sMACH_RMIN=1.175494e-38
HPL_sMACH_EMAX=1.280000e+02
HPL_sMACH_RMAX=3.402823e+38
dweps=1.110223e-16
sweps=5.960464e-08
HPLMaxProcs=4
HPLMinProcs=4
DGEMM_N=2886
StarDGEMM_Gflops=2.4821
SingleDGEMM_Gflops=3.08124
PTRANS_GBs=2.34838
PTRANS_time=0.085165
PTRANS_residual=0
PTRANS_n=5000
PTRANS_nb=80
PTRANS_nprow=2
PTRANS_npcol=2
MPIRandomAccess_LCG_N=536870912
MPIRandomAccess_LCG_time=24.24
MPIRandomAccess_LCG_CheckTime=5.58166
MPIRandomAccess_LCG_Errors=0
MPIRandomAccess_LCG_ErrorsFraction=0
MPIRandomAccess_LCG_ExeUpdates=1369197504
MPIRandomAccess_LCG_GUPs=0.056485
MPIRandomAccess_LCG_TimeBound=60
MPIRandomAccess_LCG_Algorithm=0
MPIRandomAccess_N=536870912
MPIRandomAccess_time=29.6116
MPIRandomAccess_CheckTime=7.06235
MPIRandomAccess_Errors=0
MPIRandomAccess_ErrorsFraction=0
MPIRandomAccess_ExeUpdates=1438577152
MPIRandomAccess_GUPs=0.0485815
MPIRandomAccess_TimeBound=60
MPIRandomAccess_Algorithm=0
RandomAccess_LCG_N=16777216
StarRandomAccess_LCG_GUPs=0.0154363
SingleRandomAccess_LCG_GUPs=0.0628861
RandomAccess_N=16777216
StarRandomAccess_GUPs=0.0153986
SingleRandomAccess_GUPs=0.0591065
STREAM_VectorSize=8333333
STREAM_Threads=1
StarSTREAM_Copy=1.99843
StarSTREAM_Scale=2.06208
StarSTREAM_Add=2.40779
StarSTREAM_Triad=2.75995
SingleSTREAM_Copy=13.6807
SingleSTREAM_Scale=13.557
SingleSTREAM_Add=15.4262
SingleSTREAM_Triad=14.1213
FFT_N=4194304
StarFFT_Gflops=0.842781
SingleFFT_Gflops=1.88347
MPIFFT_N=67108864
MPIFFT_Gflops=10.4591
MPIFFT_maxErr=2.15566e-15
MPIFFT_Procs=32
MaxPingPongLatency_usec=1.91728
RandomlyOrderedRingLatency_usec=2.61388
MinPingPongBandwidth_GBytes=2.91575
NaturallyOrderedRingBandwidth_GBytes=0.53627
RandomlyOrderedRingBandwidth_GBytes=0.509931
MinPingPongLatency_usec=0.417233
AvgPingPongLatency_usec=0.835485
MaxPingPongBandwidth_GBytes=9.32586
AvgPingPongBandwidth_GBytes=7.07933
NaturallyOrderedRingLatency_usec=2.86102
FFTEnblk=16
FFTEnp=8
FFTEl2size=1048576
M_OPENMP=-1
omp_get_num_threads=0
omp_get_max_threads=0
omp_get_num_procs=0
MemProc=-1
MemSpec=-1
MemVal=-1
MPIFFT_time0=9.53674e-07
MPIFFT_time1=0.170466
MPIFFT_time2=0.10667
MPIFFT_time3=0.101029
MPIFFT_time4=0.243675
MPIFFT_time5=0.178892
MPIFFT_time6=9.53674e-07
CPS_HPCC_FFT_235=0
CPS_HPCC_FFTW_ESTIMATE=0
CPS_HPCC_MEMALLCTR=0
CPS_HPL_USE_GETPROCESSTIMES=0
CPS_RA_SANDIA_NOPT=0
CPS_RA_SANDIA_OPT2=0
CPS_USING_FFTW=0
End of Summary section.
########################################################################
End of HPC Challenge tests.
Current time (1388540044) is Wed Jan 1 09:34:04 2014
########################################################################
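The Summary section is a flat list of `key=value` lines, which makes it easy to post-process. Below is a minimal Python sketch that parses such a block and recomputes the HPL scaled residual from the formula printed in the HPL header. It assumes that `HPL_RnormI` is the infinity norm of Ax-b, and it embeds only a short excerpt of the summary for illustration. The function names are hypothetical and not part of HPCC.

```python
# A short excerpt of the Summary section of this log.
SUMMARY_EXCERPT = """\
Begin of Summary section.
HPL_eps=1.11022e-16
HPL_RnormI=7.83977e-11
HPL_AnormI=2560.25
HPL_XnormI=5.05887
HPL_BnormI=0.499879
HPL_N=10000
HPL_cpfact=R
End of Summary section.
"""


def parse_summary(text: str) -> dict:
    """Collect key=value pairs between the Summary begin/end markers.
    Numeric values become floats; anything else stays a string."""
    values = {}
    inside = False
    for line in text.splitlines():
        line = line.strip()
        if line == "Begin of Summary section.":
            inside = True
        elif line == "End of Summary section.":
            inside = False
        elif inside and "=" in line:
            key, _, raw = line.partition("=")
            try:
                values[key] = float(raw)
            except ValueError:
                values[key] = raw
    return values


def scaled_residual(s: dict) -> float:
    """||Ax-b||_oo / (eps * (||A||_oo * ||x||_oo + ||b||_oo) * N)."""
    denom = s["HPL_eps"] * (s["HPL_AnormI"] * s["HPL_XnormI"] + s["HPL_BnormI"]) * s["HPL_N"]
    return s["HPL_RnormI"] / denom


summary = parse_summary(SUMMARY_EXCERPT)
residual = scaled_residual(summary)
print(f"scaled residual = {residual:.7f}")
print("PASSED" if residual < 16.0 else "FAILED")
```

Under that assumption the result matches the 0.0054518 printed in the HPL section, well below the 16.0 threshold.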