測試機器,這結果,還有天理嘛?還有王法嘛?!
機器配置:I3 2100 (關閉超線程、虛擬化,以雙核使用),4G ddr3model name : Intel(R) Core(TM) i3-2100 CPU @ 3.10GHz
$ free -m
total used free shared buffers cached
Mem: 3390 833 2556 0 36 392
-/+ buffers/cache: 405 2984
Swap: 5439 0 5439
數學庫:GotoBLAS2-1.13_bsd.tar.gz
$ mpirun -V
mpirun (Open MPI) 1.4.3
HPL 2.0
分別使用單進程、openmpi啟雙進程,測試結果如下:
單進程:
The following parameter values will be used:
N : 8192
NB : 128
PMAP : Row-major process mapping
P : 1
Q : 1
PFACT : Left Crout Right
NBMIN : 2 4
NDIV : 2
RFACT : Left Crout Right
BCAST : 1ring
DEPTH : 0
SWAP : Mix (threshold = 64)
L1 : transposed form
U : transposed form
EQUIL : yes
ALIGN : 8 double precision words
--------------------------------------------------------------------------------
- The matrix A is randomly generated for each test.
- The following scaled residual check will be computed:
||Ax-b||_oo / ( eps * ( || x ||_oo * || A ||_oo + || b ||_oo ) * N )
- The relative machine precision (eps) is taken to be 1.110223e-16
- Computational tests pass if scaled residuals are less than 16.0
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR00L2L2 8192 128 1 1 17.86 2.052e+01
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0008950 ...... PASSED
open-mpi啟雙進程:
The following parameter values will be used:
N : 8192
NB : 128
PMAP : Row-major process mapping
P : 1
Q : 2
PFACT : Left Crout Right
NBMIN : 2 4
NDIV : 2
RFACT : Left Crout Right
BCAST : 1ring
DEPTH : 0
SWAP : Mix (threshold = 64)
L1 : transposed form
U : transposed form
EQUIL : yes
ALIGN : 8 double precision words
--------------------------------------------------------------------------------
- The matrix A is randomly generated for each test.
- The following scaled residual check will be computed:
||Ax-b||_oo / ( eps * ( || x ||_oo * || A ||_oo + || b ||_oo ) * N )
- The relative machine precision (eps) is taken to be 1.110223e-16
- Computational tests pass if scaled residuals are less than 16.0
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR00L2L2 8192 128 1 2 20.39 1.798e+01
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0009460 ...... PASSED
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR00L2L4 8192 128 1 2 20.17 1.818e+01
測試過程中使用top監控CPU使用率,發現單進程時,2個核都跑到了90%以上,xhpl進程佔cpu在190%左右。
為什麼單進程成績會比多進程更好呢?
《解決方案》
忘貼系統環境了:$ uname -a
Linux Fedora 2.6.40.3-0.fc15.i686 #1 SMP Tue Aug 16 04:24:09 UTC 2011 i686 i686 i386 GNU/Linux
$ gcc -v
Using built-in specs.
COLLECT_GCC=/usr/bin/gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/i686-redhat-linux/4.6.0/lto-wrapper
Target: i686-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-bootstrap --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --enable-languages=c,c++,objc,obj-c++,java,fortran,ada,go,lto --enable-plugin --enable-java-awt=gtk --disable-dssi --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-1.5.0.0/jre --enable-libgcj-multifile --enable-java-maintainer-mode --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --disable-libjava-multilib --with-ppl --with-cloog --with-tune=generic --with-arch=i686 --build=i686-redhat-linux
Thread model: posix
gcc version 4.6.0 20110603 (Red Hat 4.6.0-10) (GCC)
$ mpicc -v
Using built-in specs.
COLLECT_GCC=/usr/bin/gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/i686-redhat-linux/4.6.0/lto-wrapper
Target: i686-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-bootstrap --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --enable-languages=c,c++,objc,obj-c++,java,fortran,ada,go,lto --enable-plugin --enable-java-awt=gtk --disable-dssi --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-1.5.0.0/jre --enable-libgcj-multifile --enable-java-maintainer-mode --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --disable-libjava-multilib --with-ppl --with-cloog --with-tune=generic --with-arch=i686 --build=i686-redhat-linux
Thread model: posix
gcc version 4.6.0 20110603 (Red Hat 4.6.0-10) (GCC)
$ mpif90 -v
Using built-in specs.
COLLECT_GCC=/usr/bin/gfortran
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/i686-redhat-linux/4.6.0/lto-wrapper
Target: i686-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-bootstrap --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --enable-languages=c,c++,objc,obj-c++,java,fortran,ada,go,lto --enable-plugin --enable-java-awt=gtk --disable-dssi --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-1.5.0.0/jre --enable-libgcj-multifile --enable-java-maintainer-mode --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --disable-libjava-multilib --with-ppl --with-cloog --with-tune=generic --with-arch=i686 --build=i686-redhat-linux
Thread model: posix
gcc version 4.6.0 20110603 (Red Hat 4.6.0-10) (GCC)
求大俠指點下!!!謝謝!!!
《解決方案》
路過的高手和達人們,請留下您寶貴的箴言吧……:mrgreen:
《解決方案》
你編譯GotoBLAS的時候,使用了多線程。
ps -Lef查看
所以,單進程也把CPU都佔滿了。雙進程由於開了多個線程,Linpack效率就很低了。
export OMP_NUM_THREADS=1
或者
更改GotoBLAS的配置文件,重新編譯。
《解決方案》
回復 4# blues083
太感謝了!!!我回頭就試。再次嚴重感謝!!!
《解決方案》
試驗結果出來了,的確是blues083所說,感謝感謝!!!
《解決方案》
呵呵,好!
《解決方案》
CU就是牛人多啊。