Heartbeat email notification problem

火星人 @ 2014-03-04, replies: 0


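After configuring it I do receive the email notifications, but they have no subject and no body.

By the way, can heartbeat send notifications about things it learns on its own, for example that the other node has died?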

# cat /var/lib/heartbeat/crm/cib.xml
<cib admin_epoch="0" epoch="2" num_updates="1" generated="false" have_quorum="false" ignore_dtd="false" num_peers="0" cib_feature_revision="2.0" cib-last-written="Thu Mar 26 12:12:55 2009">
<configuration>
   <crm_config>
   <cluster_property_set id="cib-bootstrap-options">
      <attributes>
         <nvpair id="cib-bootstrap-options-symmetric-cluster" name="symmetric-cluster" value="true"/>
         <nvpair id="cib-bootstrap-options-no-quorum-policy" name="no-quorum-policy" value="stop"/>
         <nvpair id="cib-bootstrap-options-default-resource-stickiness" name="default-resource-stickiness" value="0"/>
         <nvpair id="cib-bootstrap-options-default-resource-failure-stickiness" name="default-resource-failure-stickiness" value="0"/>
         <nvpair id="cib-bootstrap-options-stonith-enabled" name="stonith-enabled" value="false"/>
         <nvpair id="cib-bootstrap-options-stonith-action" name="stonith-action" value="reboot"/>
         <nvpair id="cib-bootstrap-options-startup-fencing" name="startup-fencing" value="true"/>
         <nvpair id="cib-bootstrap-options-stop-orphan-resources" name="stop-orphan-resources" value="true"/>
         <nvpair id="cib-bootstrap-options-stop-orphan-actions" name="stop-orphan-actions" value="true"/>
         <nvpair id="cib-bootstrap-options-remove-after-stop" name="remove-after-stop" value="false"/>
         <nvpair id="cib-bootstrap-options-short-resource-names" name="short-resource-names" value="true"/>
         <nvpair id="cib-bootstrap-options-transition-idle-timeout" name="transition-idle-timeout" value="5min"/>
         <nvpair id="cib-bootstrap-options-default-action-timeout" name="default-action-timeout" value="20s"/>
         <nvpair id="cib-bootstrap-options-is-managed-default" name="is-managed-default" value="true"/>
         <nvpair id="cib-bootstrap-options-cluster-delay" name="cluster-delay" value="60s"/>
         <nvpair id="cib-bootstrap-options-pe-error-series-max" name="pe-error-series-max" value="-1"/>
         <nvpair id="cib-bootstrap-options-pe-warn-series-max" name="pe-warn-series-max" value="-1"/>
         <nvpair id="cib-bootstrap-options-pe-input-series-max" name="pe-input-series-max" value="-1"/>
         <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="2.1.3-node: 552305612591183b1628baa5bc6e903e0f1e26a3"/>
      </attributes>
   </cluster_property_set>
   </crm_config>
   <nodes>
   <node id="6e49df15-d383-4926-8aa4-030556557a53" uname="f801.haijiye" type="normal"/>
   <node id="fac48e02-49f9-45af-b8fe-81a797f01586" uname="f802.haijiye" type="normal"/>
   </nodes>
   <resources>
   <group id="group_1">
      <primitive class="ocf" id="IPaddr_192_168_1_118" provider="heartbeat" type="IPaddr">
         <operations>
         <op id="IPaddr_192_168_1_118_mon" interval="5s" name="monitor" timeout="5s"/>
         </operations>
         <instance_attributes id="IPaddr_192_168_1_118_inst_attr">
         <attributes>
            <nvpair id="IPaddr_192_168_1_118_attr_0" name="ip" value="192.168.1.118"/>
         </attributes>
         </instance_attributes>
      </primitive>
      <primitive class="heartbeat" id="httpd_2" provider="heartbeat" type="httpd">
         <operations>
         <op id="httpd_2_mon" interval="120s" name="monitor" timeout="60s"/>
         </operations>
      </primitive>
      <primitive class="ocf" id="MailTo_3" provider="heartbeat" type="MailTo">
         <operations>
         <op id="MailTo_3_mon" interval="120s" name="monitor" timeout="60s"/>
         </operations>
         <instance_attributes id="MailTo_3_inst_attr">
         <attributes>
            <nvpair id="MailTo_3_attr_0" name="email" value="cocobear@yeah.net"/>
            <nvpair id="MailTo_3_attr_1" name="subject" value="Status-of-httpd-changed"/>
         </attributes>
         </instance_attributes>
      </primitive>
   </group>
   </resources>
   <constraints>
   <rsc_location id="rsc_location_group_1" rsc="group_1">
      <rule id="prefered_location_group_1" score="100">
         <expression attribute="#uname" id="prefered_location_group_1_expr" operation="eq" value="f801.haijiye"/>
      </rule>
   </rsc_location>
   </constraints>
</configuration>
</cib>

《Solution》

Reply to post #1 by 可可熊

Querying a parameter of a resource. Say the resource is the following:
<primitive id="example_mail" class="ocf" type="MailTo" provider="heartbeat">
<instance_attributes id="example_mail_inst">
<attributes>
<nvpair id="example_mail_inst_attr0" name="email" value="root"/>
<nvpair id="example_mail_inst_attr1" name="subject" value="Example Failover"/>
</attributes>
</instance_attributes>
</primitive>

You could query the email address using the following:
crm_resource -r example_mail -g email
You can test setting it with:
crm_resource -r example_mail -p email -v "abc@abc.com"
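To check offline which values a query like the one above should return, the parameters can also be read straight from the XML. This is only a minimal sketch, assuming the Heartbeat 2.x CIB layout shown in this thread (`instance_attributes/attributes/nvpair`); the helper name `get_param` is hypothetical and not part of heartbeat.

```python
import xml.etree.ElementTree as ET

# The example resource from this post, copied as a string.
EXAMPLE = """
<primitive id="example_mail" class="ocf" type="MailTo" provider="heartbeat">
  <instance_attributes id="example_mail_inst">
    <attributes>
      <nvpair id="example_mail_inst_attr0" name="email" value="root"/>
      <nvpair id="example_mail_inst_attr1" name="subject" value="Example Failover"/>
    </attributes>
  </instance_attributes>
</primitive>
"""


def get_param(xml_text, resource_id, name):
    """Return instance attribute `name` of primitive `resource_id`, or None.

    Hypothetical helper: it only mimics what `crm_resource -r ID -g NAME`
    looks up, by reading the XML text directly.
    """
    root = ET.fromstring(xml_text)
    # iter() also yields the root element, so this works for a bare
    # <primitive> snippet as well as for a complete cib.xml.
    for prim in root.iter("primitive"):
        if prim.get("id") != resource_id:
            continue
        for nv in prim.iterfind("instance_attributes/attributes/nvpair"):
            if nv.get("name") == name:
                return nv.get("value")
    return None


if __name__ == "__main__":
    print(get_param(EXAMPLE, "example_mail", "email"))
    print(get_param(EXAMPLE, "example_mail", "subject"))
```

Running the same function against the contents of `/var/lib/heartbeat/crm/cib.xml` on each node would show whether both nodes carry the same `email` and `subject` values.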

[ Last edited by kns1024wh on 2009-3-26 15:08 ]

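《Solution》

I am using heartbeat V2 with the CRM module. I want to receive an email notification when one of the nodes has a problem. For testing, I stopped heartbeat manually:

service heartbeat stop

The email arrives, but it is empty, and I don't know why.

I would also like the primary to send an email when the standby node has a problem, but that does not seem possible: I tried rebooting the standby and received no email at all.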

《Solution》

Originally posted by kns1024wh on 2009-3-26 13:19

[ Last edited by kns1024wh on 2009-3-26 16:04 ]

《Solution》

For example, after the standby node is shut down, heartbeat on the primary writes this to its log:

crmd: 2009/03/26_14:05:25 notice: crmd_ha_status_callback: Status update: Node f802.haijiye now has status

Is it possible to send a mail notification in this situation?
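Nothing in this thread confirms a built-in heartbeat setting for this, so one workaround is an external watcher that scans the log and mails on a match. The sketch below is an assumption-laden illustration, not heartbeat functionality: it matches the `WARN: node ... is dead` line that heartbeat writes in the log posted later in this thread, and the function names, addresses and SMTP host are hypothetical.

```python
import re
import smtplib
from email.message import EmailMessage

# Matches heartbeat log lines such as:
#   heartbeat: 2009/03/26_16:02:59 WARN: node f802.haijiye: is dead
DEAD_RE = re.compile(r"^heartbeat: (\S+) WARN: node (\S+): is dead")


def dead_nodes(lines):
    """Return (node, timestamp) pairs for every 'is dead' line in `lines`."""
    found = []
    for line in lines:
        match = DEAD_RE.match(line)
        if match:
            found.append((match.group(2), match.group(1)))
    return found


def build_alert(node, stamp, sender, recipient):
    """Build a mail with an explicit subject and body for one dead node."""
    msg = EmailMessage()
    msg["Subject"] = f"heartbeat: node {node} is dead"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(f"heartbeat reported node {node} dead at {stamp}.\n")
    return msg


def send_alert(msg, smtp_host="localhost"):
    """Send `msg` through an SMTP server; the host is an assumption."""
    with smtplib.SMTP(smtp_host) as smtp:
        smtp.send_message(msg)


if __name__ == "__main__":
    sample = ["heartbeat: 2009/03/26_16:02:59 WARN: node f802.haijiye: is dead"]
    for node, stamp in dead_nodes(sample):
        # Addresses are placeholders; call send_alert(alert) to actually send.
        alert = build_alert(node, stamp, "root@localhost", "admin@example.com")
        print(alert["Subject"])
```

Because the watcher builds the subject and body itself, it does not depend on how the local `mail` command on each node behaves.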

《Solution》

<primitive id="resource_" class="ocf" type="MailTo" provider="heartbeat">
      <meta_attributes id="resource__meta_attrs">
         <attributes>
         <nvpair id="resource__metaattr_target_role" name="target_role" value="started"/>
         </attributes>
      </meta_attributes>
      <instance_attributes id="resource__instance_attrs">
         <attributes>
         <nvpair id="0af557aa-018e-4fa5-9b48-953f2d33750a" name="email" value="lvsheat@qq.com"/>
         <nvpair id="f68a1346-2bd3-42ae-8b5a-3c96bbef8910" name="subject" value="test001"/>
         </attributes>
      </instance_attributes>
   </primitive>
test001 Takeover in progress at Thu Mar 26 15:44:49 CST 2009 onxxxxx
test001 Migrating resource away at Thu Mar 26 15:52:44 CST 2009 fromxxxxx

[ Last edited by kns1024wh on 2009-3-26 16:05 ]

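《Solution》

Reply to post #6 by kns1024wh

Right now one machine sends emails with content, while the other sends only blank ones.

f801 is the server that sends the blank emails. I rebooted f802, and the email f801 sent when it took over was blank.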

cib: 2009/03/26_16:02:44 info: mem_handle_event: Got an event OC_EV_MS_INVALID from ccm
cib: 2009/03/26_16:02:44 info: mem_handle_event: no mbr_track info
cib: 2009/03/26_16:02:44 info: mem_handle_event: Got an event OC_EV_MS_NEW_MEMBERSHIP from ccm
cib: 2009/03/26_16:02:44 info: mem_handle_event: instance=9, nodes=1, new=0, lost=1, n_idx=0, new_idx=1, old_idx=3
cib: 2009/03/26_16:02:44 info: cib_ccm_msg_callback: LOST: f802.haijiye
cib: 2009/03/26_16:02:44 info: cib_ccm_msg_callback: PEER: f801.haijiye
cib: 2009/03/26_16:02:44 info: cib_process_readwrite: We are now in R/W mode
pengine: 2009/03/26_16:02:44 info: G_main_add_SignalHandler: Added signal handler for signal 15
pengine: 2009/03/26_16:02:44 info: pe_init: Starting pengine
crmd: 2009/03/26_16:02:44 info: join_make_offer: Making join offers based on membership 9
crmd: 2009/03/26_16:02:44 info: do_dc_join_offer_all: join-1: Waiting on 1 outstanding join acks
tengine: 2009/03/26_16:02:44 info: G_main_add_SignalHandler: Added signal handler for signal 15
tengine: 2009/03/26_16:02:44 info: G_main_add_TriggerHandler: Added signal manual handler
tengine: 2009/03/26_16:02:44 info: G_main_add_TriggerHandler: Added signal manual handler
tengine: 2009/03/26_16:02:44 info: te_init: Registering TE UUID: 112c0c00-03fb-4f6a-be94-4cf8d83f5cc9
tengine: 2009/03/26_16:02:44 info: set_graph_functions: Setting custom graph functions
tengine: 2009/03/26_16:02:44 info: unpack_graph: Unpacked transition -1: 0 actions in 0 synapses
tengine: 2009/03/26_16:02:44 info: te_init: Starting tengine
tengine: 2009/03/26_16:02:44 info: te_connect_stonith: Attempting connection to fencing daemon...
cib: 2009/03/26_16:02:44 info: cib_null_callback: Setting cib_diff_notify callbacks for tengine: on
crmd: 2009/03/26_16:02:44 info: update_dc: Set DC to f801.haijiye (2.0)
crmd: 2009/03/26_16:02:45 info: do_state_transition: State transition S_INTEGRATION -> S_FINALIZE_JOIN [ input=I_INTEGRATED cause=C_FSA_INTERNAL origin=check_join_state ]
crmd: 2009/03/26_16:02:45 info: do_state_transition: All 1 cluster nodes responded to the join offer.
cib: 2009/03/26_16:02:45 info: sync_our_cib: Syncing CIB to all peers
crmd: 2009/03/26_16:02:45 info: update_attrd: Connecting to attrd...
attrd: 2009/03/26_16:02:45 info: attrd_local_callback: Sending full refresh
tengine: 2009/03/26_16:02:45 info: te_connect_stonith: Connected
crmd: 2009/03/26_16:02:45 info: update_dc: Set DC to f801.haijiye (2.0)
heartbeat: 2009/03/26_16:02:45 WARN: glib: TTY write timeout on (no connection or bad cable? )
heartbeat: 2009/03/26_16:02:45 info: glib: See http://linux-ha.org/FAQ#TTYtimeout for details
crmd: 2009/03/26_16:02:45 info: do_dc_join_ack: join-1: Updating node state to member for f801.haijiye
tengine: 2009/03/26_16:02:45 info: process_graph_event: Action IPaddr_192_168_1_118_monitor_0 initiated by a different transitioner
tengine: 2009/03/26_16:02:45 info: update_abort_priority: Abort priority upgraded to 1000000
tengine: 2009/03/26_16:02:45 info: update_abort_priority: 'DC Takeover' abort superceeded
tengine: 2009/03/26_16:02:45 info: process_graph_event: Action httpd_2_monitor_0 initiated by a different transitioner
tengine: 2009/03/26_16:02:45 info: process_graph_event: Action MailTo_3_monitor_0 initiated by a different transitioner
crmd: 2009/03/26_16:02:45 info: do_state_transition: State transition S_FINALIZE_JOIN -> S_POLICY_ENGINE [ input=I_FINALIZED cause=C_FSA_INTERNAL origin=check_join_state ]
crmd: 2009/03/26_16:02:45 info: do_state_transition: All 1 cluster nodes are eligible to run resources.
pengine: 2009/03/26_16:02:46 info: determine_online_status: Node f801.haijiye is online
pengine: 2009/03/26_16:02:46 notice: group_print: Resource Group: group_1
pengine: 2009/03/26_16:02:46 notice: native_print:    IPaddr_192_168_1_118 (heartbeat::ocf:IPaddr): Started f801.haijiye
pengine: 2009/03/26_16:02:46 notice: native_print:    httpd_2 (heartbeat:httpd): Stopped
pengine: 2009/03/26_16:02:46 notice: native_print:    MailTo_3 (heartbeat::ocf:MailTo): Stopped
pengine: 2009/03/26_16:02:46 notice: NoRoleChange: Leave resource IPaddr_192_168_1_118 (f801.haijiye)
pengine: 2009/03/26_16:02:46 notice: RecurringOp: f801.haijiye    IPaddr_192_168_1_118_monitor_5000
pengine: 2009/03/26_16:02:46 notice: StartRsc:  f801.haijiye Start httpd_2
pengine: 2009/03/26_16:02:46 notice: RecurringOp: f801.haijiye    httpd_2_monitor_120000
pengine: 2009/03/26_16:02:46 notice: StartRsc:  f801.haijiye Start MailTo_3
pengine: 2009/03/26_16:02:46 notice: RecurringOp: f801.haijiye    MailTo_3_monitor_120000
crmd: 2009/03/26_16:02:46 info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=route_message ]
tengine: 2009/03/26_16:02:46 info: unpack_graph: Unpacked transition 0: 8 actions in 8 synapses
tengine: 2009/03/26_16:02:46 info: te_pseudo_action: Pseudo action 11 fired and confirmed
tengine: 2009/03/26_16:02:46 info: send_rsc_command: Initiating action 5: IPaddr_192_168_1_118_start_0 on f801.haijiye
crmd: 2009/03/26_16:02:46 info: do_lrm_rsc_op: Performing op=IPaddr_192_168_1_118_start_0 key=5:0:112c0c00-03fb-4f6a-be94-4cf8d83f5cc9)
lrmd: 2009/03/26_16:02:46 info: rsc:IPaddr_192_168_1_118: start
pengine: 2009/03/26_16:02:46 info: process_pe_message: Transition 0: PEngine Input stored in: /var/lib/heartbeat/pengine/pe-input-96.bz2
IPaddr: 2009/03/26_16:02:46 INFO: Using calculated nic for 192.168.1.118: eth1
IPaddr: 2009/03/26_16:02:46 INFO: Using calculated netmask for 192.168.1.118: 255.255.255.0
lrmd: 2009/03/26_16:02:47 info: Managed IPaddr_192_168_1_118:start process 1977 exited with return code 0.
crmd: 2009/03/26_16:02:47 info: process_lrm_event: LRM operation IPaddr_192_168_1_118_start_0 (call=6, rc=0) complete
tengine: 2009/03/26_16:02:47 info: match_graph_event: Action IPaddr_192_168_1_118_start_0 (5) confirmed on f801.haijiye (rc=0)
tengine: 2009/03/26_16:02:47 info: send_rsc_command: Initiating action 6: IPaddr_192_168_1_118_monitor_5000 on f801.haijiye
tengine: 2009/03/26_16:02:47 info: send_rsc_command: Initiating action 7: httpd_2_start_0 on f801.haijiye
crmd: 2009/03/26_16:02:47 info: do_lrm_rsc_op: Performing op=IPaddr_192_168_1_118_monitor_5000 key=6:0:112c0c00-03fb-4f6a-be94-4cf8d83f5cc9)
crmd: 2009/03/26_16:02:47 info: do_lrm_rsc_op: Performing op=httpd_2_start_0 key=7:0:112c0c00-03fb-4f6a-be94-4cf8d83f5cc9)
lrmd: 2009/03/26_16:02:47 info: rsc:httpd_2: start
lrmd: 2009/03/26_16:02:47 info: RA output: (httpd_2:start:stdout) Starting httpd:
lrmd: 2009/03/26_16:02:47 info: Managed IPaddr_192_168_1_118:monitor process 2038 exited with return code 0.
crmd: 2009/03/26_16:02:47 info: process_lrm_event: LRM operation IPaddr_192_168_1_118_monitor_5000 (call=7, rc=0) complete
tengine: 2009/03/26_16:02:47 info: match_graph_event: Action IPaddr_192_168_1_118_monitor_5000 (6) confirmed on f801.haijiye (rc=0)
lrmd: 2009/03/26_16:02:50 info: RA output: (httpd_2:start:stderr) httpd: apr_sockaddr_info_get() failed for f801.haijiye

lrmd: 2009/03/26_16:02:50 info: RA output: (httpd_2:start:stderr) httpd: Could not reliably determine the server's fully qualified domain name, using 127.0.0.1 for ServerName

lrmd: 2009/03/26_16:02:52 info: RA output: (httpd_2:start:stdout) [
lrmd: 2009/03/26_16:02:52 info: RA output: (httpd_2:start:stdout) OK
lrmd: 2009/03/26_16:02:52 info: RA output: (httpd_2:start:stdout) ]
lrmd: 2009/03/26_16:02:52 info: RA output: (httpd_2:start:stdout)
lrmd: 2009/03/26_16:02:52 info: RA output: (httpd_2:start:stdout)

lrmd: 2009/03/26_16:02:52 info: Managed httpd_2:start process 2039 exited with return code 0.
crmd: 2009/03/26_16:02:52 info: process_lrm_event: LRM operation httpd_2_start_0 (call=8, rc=0) complete
tengine: 2009/03/26_16:02:52 info: match_graph_event: Action httpd_2_start_0 (7) confirmed on f801.haijiye (rc=0)
tengine: 2009/03/26_16:02:52 info: send_rsc_command: Initiating action 8: httpd_2_monitor_120000 on f801.haijiye
crmd: 2009/03/26_16:02:52 info: do_lrm_rsc_op: Performing op=httpd_2_monitor_120000 key=8:0:112c0c00-03fb-4f6a-be94-4cf8d83f5cc9)
tengine: 2009/03/26_16:02:52 info: send_rsc_command: Initiating action 9: MailTo_3_start_0 on f801.haijiye
crmd: 2009/03/26_16:02:52 info: do_lrm_rsc_op: Performing op=MailTo_3_start_0 key=9:0:112c0c00-03fb-4f6a-be94-4cf8d83f5cc9)
lrmd: 2009/03/26_16:02:52 info: rsc:MailTo_3: start
lrmd: 2009/03/26_16:02:53 info: Managed MailTo_3:start process 2060 exited with return code 0.
crmd: 2009/03/26_16:02:53 info: process_lrm_event: LRM operation MailTo_3_start_0 (call=10, rc=0) complete
lrmd: 2009/03/26_16:02:53 WARN: G_SIG_dispatch: Dispatch function for SIGCHLD was delayed 230 ms (> 100 ms) before being called (GSource: 0x93fd1f0)
lrmd: 2009/03/26_16:02:53 info: G_SIG_dispatch: started at 429454334 should have started at 429454311
lrmd: 2009/03/26_16:02:53 info: Managed httpd_2:monitor process 2059 exited with return code 0.
crmd: 2009/03/26_16:02:53 info: process_lrm_event: LRM operation httpd_2_monitor_120000 (call=9, rc=0) complete
lrmd: 2009/03/26_16:02:53 info: Managed IPaddr_192_168_1_118:monitor process 2065 exited with return code 0.
tengine: 2009/03/26_16:02:53 info: match_graph_event: Action MailTo_3_start_0 (9) confirmed on f801.haijiye (rc=0)
tengine: 2009/03/26_16:02:53 info: te_pseudo_action: Pseudo action 12 fired and confirmed
tengine: 2009/03/26_16:02:53 info: send_rsc_command: Initiating action 10: MailTo_3_monitor_120000 on f801.haijiye
crmd: 2009/03/26_16:02:53 info: do_lrm_rsc_op: Performing op=MailTo_3_monitor_120000 key=10:0:112c0c00-03fb-4f6a-be94-4cf8d83f5cc9)
tengine: 2009/03/26_16:02:54 info: match_graph_event: Action httpd_2_monitor_120000 (8) confirmed on f801.haijiye (rc=0)
lrmd: 2009/03/26_16:02:54 info: Managed MailTo_3:monitor process 2100 exited with return code 0.
crmd: 2009/03/26_16:02:54 info: process_lrm_event: LRM operation MailTo_3_monitor_120000 (call=11, rc=0) complete
tengine: 2009/03/26_16:02:54 info: match_graph_event: Action MailTo_3_monitor_120000 (10) confirmed on f801.haijiye (rc=0)
tengine: 2009/03/26_16:02:54 info: run_graph: Transition 0: (Complete=8, Pending=0, Fired=0, Skipped=0, Incomplete=0)
tengine: 2009/03/26_16:02:54 info: notify_crmd: Transition 0 status: te_complete - <null>
crmd: 2009/03/26_16:02:54 info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_IPC_MESSAGE origin=route_message ]
heartbeat: 2009/03/26_16:02:55 info: Link f802.haijiye:/dev/ttyS0 dead.
heartbeat: 2009/03/26_16:02:59 WARN: node f802.haijiye: is dead
heartbeat: 2009/03/26_16:02:59 info: Link f802.haijiye:eth0 dead.
crmd: 2009/03/26_16:02:59 notice: crmd_ha_status_callback: Status update: Node f802.haijiye now has status
lrmd: 2009/03/26_16:02:59 info: Managed IPaddr_192_168_1_118:monitor process 2119 exited with return code 0.
lrmd: 2009/03/26_16:03:04 info: Managed IPaddr_192_168_1_118:monitor process 2133 exited with return code 0.

Article URL: http://coctec.com/docs/service/show-post-6543.html