Benchmarking a machine, and this result... is there no justice? Is there no law?!

火星人 @ 2014-03-04, reply: 0

Machine configuration: i3-2100 (Hyper-Threading and virtualization disabled, so it runs as a plain dual core), 4 GB DDR3
model name : Intel(R) Core(TM) i3-2100 CPU @ 3.10GHz

$ free -m
             total       used       free     shared    buffers     cached
Mem:          3390        833       2556          0         36        392
-/+ buffers/cache:        405       2984
Swap:         5439          0       5439

Math library: GotoBLAS2-1.13_bsd.tar.gz

$ mpirun -V
mpirun (Open MPI) 1.4.3

HPL 2.0
I ran it once as a single process and once as two processes launched with Open MPI. The results are below:

Single process:
The following parameter values will be used:

N    : 8192
NB    :    128
PMAP : Row-major process mapping
P    :    1
Q    :    1
PFACT  : Left Crout Right
NBMIN  :    2       4
NDIV :    2
RFACT  : Left Crout Right
BCAST  : 1ring
DEPTH  :    0
SWAP : Mix (threshold = 64)
L1    : transposed form
U    : transposed form
EQUIL  : yes
ALIGN  : 8 double precision words

--------------------------------------------------------------------------------

- The matrix A is randomly generated for each test.
- The following scaled residual check will be computed:
   ||Ax-b||_oo / ( eps * ( || x ||_oo * || A ||_oo + || b ||_oo ) * N )
- The relative machine precision (eps) is taken to be             1.110223e-16
- Computational tests pass if scaled residuals are less than             16.0

================================================================================
T/V             N NB    P    Q             Time                Gflops
--------------------------------------------------------------------------------
WR00L2L2       8192 128    1    1             17.86             2.052e+01
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=       0.0008950 ...... PASSED
Two processes launched with Open MPI:
The following parameter values will be used:

N    : 8192
NB    :    128
PMAP : Row-major process mapping
P    :    1
Q    :    2
PFACT  : Left Crout Right
NBMIN  :    2       4
NDIV :    2
RFACT  : Left Crout Right
BCAST  : 1ring
DEPTH  :    0
SWAP : Mix (threshold = 64)
L1    : transposed form
U    : transposed form
EQUIL  : yes
ALIGN  : 8 double precision words

--------------------------------------------------------------------------------

- The matrix A is randomly generated for each test.
- The following scaled residual check will be computed:
   ||Ax-b||_oo / ( eps * ( || x ||_oo * || A ||_oo + || b ||_oo ) * N )
- The relative machine precision (eps) is taken to be             1.110223e-16
- Computational tests pass if scaled residuals are less than             16.0

================================================================================
T/V             N NB    P    Q             Time                Gflops
--------------------------------------------------------------------------------
WR00L2L2       8192 128    1    2             20.39             1.798e+01
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=       0.0009460 ...... PASSED
================================================================================
T/V             N NB    P    Q             Time                Gflops
--------------------------------------------------------------------------------
WR00L2L4       8192 128    1    2             20.17             1.818e+01
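During the tests I monitored CPU usage with top. In the single-process run, both cores were above 90% busy, and the xhpl process showed about 190% CPU.

Why does the single-process run score better than the multi-process run?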
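
As a quick sanity check on the figures above, here is a minimal sketch that recomputes the Gflops column from N and the wall time, assuming HPL's usual operation count of 2/3 N^3 + 3/2 N^2. The function name `hpl_gflops` is made up for this illustration. The values it prints should land close to the reported ones, with small differences because the printed times are rounded.

```shell
# Recompute HPL's Gflops figure from the problem size N and the wall time
# in seconds, assuming the operation count 2/3*N^3 + 3/2*N^2.
# hpl_gflops is a hypothetical helper name, not part of HPL.
hpl_gflops() {
    awk -v n="$1" -v t="$2" \
        'BEGIN { printf "%.2f\n", (2.0 / 3.0 * n * n * n + 1.5 * n * n) / t / 1e9 }'
}

hpl_gflops 8192 17.86   # single process, 1 x 1 grid
hpl_gflops 8192 20.39   # two processes, 1 x 2 grid
```

Since N is the same in both runs, the whole difference in Gflops comes from the wall time: the two-process run simply took longer to do the same work.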

《Solution》

I forgot to post the system environment:
$ uname -a
Linux Fedora 2.6.40.3-0.fc15.i686 #1 SMP Tue Aug 16 04:24:09 UTC 2011 i686 i686 i386 GNU/Linux
$ gcc -v
Using built-in specs.
COLLECT_GCC=/usr/bin/gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/i686-redhat-linux/4.6.0/lto-wrapper
Target: i686-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-bootstrap --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --enable-languages=c,c++,objc,obj-c++,java,fortran,ada,go,lto --enable-plugin --enable-java-awt=gtk --disable-dssi --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-1.5.0.0/jre --enable-libgcj-multifile --enable-java-maintainer-mode --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --disable-libjava-multilib --with-ppl --with-cloog --with-tune=generic --with-arch=i686 --build=i686-redhat-linux
Thread model: posix
gcc version 4.6.0 20110603 (Red Hat 4.6.0-10) (GCC)
$ mpicc -v
Using built-in specs.
COLLECT_GCC=/usr/bin/gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/i686-redhat-linux/4.6.0/lto-wrapper
Target: i686-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-bootstrap --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --enable-languages=c,c++,objc,obj-c++,java,fortran,ada,go,lto --enable-plugin --enable-java-awt=gtk --disable-dssi --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-1.5.0.0/jre --enable-libgcj-multifile --enable-java-maintainer-mode --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --disable-libjava-multilib --with-ppl --with-cloog --with-tune=generic --with-arch=i686 --build=i686-redhat-linux
Thread model: posix
gcc version 4.6.0 20110603 (Red Hat 4.6.0-10) (GCC)
$ mpif90 -v
Using built-in specs.
COLLECT_GCC=/usr/bin/gfortran
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/i686-redhat-linux/4.6.0/lto-wrapper
Target: i686-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-bootstrap --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --enable-languages=c,c++,objc,obj-c++,java,fortran,ada,go,lto --enable-plugin --enable-java-awt=gtk --disable-dssi --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-1.5.0.0/jre --enable-libgcj-multifile --enable-java-maintainer-mode --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --disable-libjava-multilib --with-ppl --with-cloog --with-tune=generic --with-arch=i686 --build=i686-redhat-linux
Thread model: posix
gcc version 4.6.0 20110603 (Red Hat 4.6.0-10) (GCC)
Any pointers from the experts would be much appreciated. Thanks!

《Solution》

Experts passing by, please leave a word of advice...

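《Solution》

Your GotoBLAS was built with multithreading enabled.

You can check with ps -Lef.

That is why even the single-process run keeps both cores busy. In the two-process run each process starts several BLAS threads of its own, so more threads than cores are competing for the CPU and Linpack efficiency drops.

export OMP_NUM_THREADS=1
or
change the GotoBLAS build configuration and recompile.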
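
The suggestion above can be sketched roughly as follows. This assumes a Linux system with /proc, that xhpl and HPL.dat sit in the current directory, and that the BLAS build honors OMP_NUM_THREADS as described in this reply. The function names `count_threads` and `run_hpl_one_thread_per_rank` are made up for this illustration.

```shell
# Print the number of threads (LWPs) of a process, read from /proc.
# Linux only; count_threads is a hypothetical helper name.
count_threads() {
    awk '/^Threads:/ { print $2 }' "/proc/$1/status"
}

# Launch xhpl with the BLAS limited to one thread per MPI process.
# Assumes ./xhpl and HPL.dat are in the current directory.
run_hpl_one_thread_per_rank() {
    ranks="$1"    # should equal P * Q from HPL.dat
    OMP_NUM_THREADS=1 mpirun -np "$ranks" ./xhpl
}
```

For the two-process case you would call `run_hpl_one_thread_per_rank 2`, and from a second terminal run `for pid in $(pgrep xhpl); do count_threads "$pid"; done` while the benchmark is running. If every xhpl process reports a low thread count instead of one per core, the two MPI processes are no longer fighting over the same two cores.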

《Solution》

Reply to #4 blues083

Thank you so much! I will try it as soon as I get back. Thanks again!

《Solution》

The test results are in, and it was exactly what blues083 said. Many thanks!

《Solution》

Heh, good!

《Solution》

CU really is full of experts.

Article URL: http://coctec.com/docs/service/show-post-4880.html