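MPI_BCAST : Message truncated error
With matrix A (n*n), a single CPU handles n=4000 without problems, but as the number of CPUs increases, the largest n that still runs correctly becomes very small, as shown below (np is the number of CPUs):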
n=200 run np=4 ok!
n=500 run np=4 failure, with the following error output:
1 - MPI_BCAST : Message truncated
Aborting program !
Aborting program!
p1_28511: p4_error: : 14
2 - MPI_BCAST : Message truncated
Aborting program !
rm_l_1_28528: (5.464844) net_send: could not write to fd=5, errno = 32
p3_23996: p4_error: net_recv read: probable EOF on socket: 1
rm_l_3_24013: (5.179688) net_send: could not write to fd=5, errno = 32
Aborting program!
p2_23977: p4_error: : 14
rm_l_2_23994: (5.246094) net_send: could not write to fd=5, errno = 32
p1_28511: (9.472656) net_send: could not write to fd=5, errno = 32
p2_23977: (9.250000) net_send: could not write to fd=5, errno = 32
p3_23996: (9.183594) net_send: could not write to fd=5, errno = 32
What is this error, and how can it be fixed? Cluster environment:
compiler: Intel Fortran 9.0
os: RedHat Enterprise Linux AS4.0
cpu: Nocona 3.0G
mpi: MPICH1.2.7
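
For anyone hitting the same message: the MPI standard requires every rank to pass the same count and datatype to MPI_BCAST as the root does. One common way to get "Message truncated" is a non-root rank calling the broadcast with a smaller count than the root, for example because n was set only on the root. Whether that is what happens in this program cannot be told from the log alone. The sketch below is a toy model in plain Python and calls no MPI library; `model_bcast`, `counts_for`, and the sizes are hypothetical names and values chosen for illustration, and the model only flags the too-small case.

```python
# Toy model only: no MPI library is called here.
class MessageTruncated(Exception):
    """Raised when a modeled receiver posts a buffer smaller than the message."""


def model_bcast(payload, recv_counts):
    """Model a broadcast from a root that sends len(payload) elements.

    recv_counts maps each non-root rank to the element count it passed
    to its own broadcast call. Returns the data each rank would hold.
    """
    received = {}
    for rank, count in sorted(recv_counts.items()):
        if count < len(payload):
            raise MessageTruncated(
                f"rank {rank}: buffer holds {count} elements, "
                f"message has {len(payload)}"
            )
        received[rank] = list(payload)
    return received


def counts_for(n_by_rank):
    """Each rank derives its count from its own local n (count = n * n)."""
    return {rank: n * n for rank, n in n_by_rank.items()}


# Hypothetical bug: only the root knows n = 5; ranks 1-3 still use n = 2.
root_n = 5
matrix = [0.0] * (root_n * root_n)
try:
    model_bcast(matrix, counts_for({1: 2, 2: 2, 3: 2}))
    mismatch_result = "ok"
except MessageTruncated as exc:
    mismatch_result = f"truncated ({exc})"

# Fix in the model: every rank first learns n from the root, so counts agree.
matched = model_bcast(matrix, counts_for({1: root_n, 2: root_n, 3: root_n}))
print(mismatch_result)
print(sorted(matched), len(matched[1]))
```

In a real program the equivalent check would be to print n and the count argument on every rank just before the MPI_BCAST call and confirm they are identical, broadcasting n itself first if only the root reads it.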
[火星人]
http://coctec.com/docs/service/show-post-9509.html