SUSE:SLE-15-SP7:GA
File bsc#1173668-0001-Fix-attrd-prevent-leftover-attributes-of-shutdown-no.patch of Package pacemaker.21299
From a1a9c54cfc451e36f66db3738fd798b7464a1239 Mon Sep 17 00:00:00 2001
From: "Gao,Yan" <ygao@suse.com>
Date: Fri, 25 Sep 2020 15:47:23 +0200
Subject: [PATCH] Fix: attrd: prevent leftover attributes of shutdown node in cib

This commit prevents a cib_replace event from triggering a write-out of
attributes on a node that is requesting shutdown, so that no leftover
attributes of the shutdown node remain in cib.

Race conditions were encountered on shutdown and startup of a node. The
affected Pacemaker is v1.1.18+, but I think the latest revision is
potentially impacted too.

Node2 was shutting down. When crmd was stopped on node2, node1 erased
all the transient attributes of node2 from cib:

```
Sep 21 08:33:49 [7849] node1 crmd: notice: peer_update_callback: Our peer on the DC (node2) is dead
Sep 21 08:33:49 [7844] node1 cib: info: cib_process_request: Completed cib_delete operation for section //node_state[@uname='node2']/transient_attributes: OK (rc=0, origin=node1/crmd/114, version=0.649.144)
```

And node1 became the DC and did a cib_replace:

```
Sep 21 08:33:49 [7849] node1 crmd: info: update_dc: Set DC to node1 (3.1.0)
Sep 21 08:33:49 [7849] node1 crmd: info: do_dc_join_finalize: join-2: Syncing our CIB to the rest of the cluster
Sep 21 08:33:49 [7844] node1 cib: info: cib_process_replace: Replaced 0.649.144 with 0.649.144 from node1
Sep 21 08:33:49 [7844] node1 cib: info: cib_process_request: Completed cib_replace operation for section 'all': OK (rc=0, origin=node1/crmd/126, version=0.649.144)
```

Meanwhile the cib and attrd daemons on node2 had not received SIGTERM yet
and were still running. Attrd reacted to the cib_replace and wrote all
the node attributes back into cib again, including its "shutdown"
attribute:

```
Sep 21 08:33:49 [4444] node2 attrd: notice: attrd_cib_replaced_cb: Updating all attributes after cib_refresh_notify event
Sep 21 08:33:49 [4441] node2 cib: info: cib_perform_op: ++ /cib/status/node_state[@id='14548837']/transient_attributes[@id='14548837']/instance_attributes[@id='status-14548837']: <nvpair id="status-14548837-shutdown" name="shutdown" value="1600677133"/>
Sep 21 08:33:49 [4444] node2 attrd: info: attrd_cib_callback: Update 1103 for shutdown[node2]=1600677133: OK (0)
```

Later, attrd received SIGTERM and shut down:

```
Sep 21 08:33:49 [4439] node2 pacemakerd: notice: stop_child: Stopping attrd | sent signal 15 to process 4444
Sep 21 08:33:49 [4444] node2 attrd: notice: crm_signal_dispatch: Caught 'Terminated' signal | 15 (invoking handler)
Sep 21 08:33:49 [4444] node2 attrd: info: main: Shutting down attribute manager
```

When node2 started again, it cleared its node attributes from cib, but
the cib of node1 had not joined yet by then:

```
Sep 21 08:42:46 [4844] node2 attrd: info: attrd_erase_attrs: Clearing transient attributes from CIB | xpath=//node_state[@uname='node2']/transient_attributes
Sep 21 08:42:46 [4841] node2 cib: info: cib_process_request: Completed cib_delete operation for section //node_state[@uname='node2']/transient_attributes: OK (rc=0, origin=node2/attrd/2, version=0.649.0)
```

Then the cib of node1 joined:

```
Sep 21 08:42:47 [4841] node2 cib: info: pcmk_cpg_membership: Node 14548836 joined group cib (counter=1.0, pid=0, unchecked for rivals)
```

Soon the node attributes of node2 got back into cib again through the
cib sync from node1:

```
Sep 21 08:42:47 [4846] node2 crmd: info: update_dc: Set DC to node1 (3.1.0)
Sep 21 08:42:48 [4841] node2 cib: info: cib_process_replace: Replaced 0.649.90 with 0.649.410 from node1
Sep 21 08:42:48 [4841] node2 cib: info: cib_perform_op: ++ /cib/status/node_state[@id='14548837']/transient_attributes[@id='14548837']/instance_attributes[@id='status-14548837']: <nvpair id="status-14548837-shutdown" name="shutdown" value="1600677133"/>
```

So the leftover "shutdown" attribute of node2 caused it to shut down
again:

```
Sep 21 08:42:51 [4846] node2 crmd: error: handle_request: We didn't ask to be shut down, yet our DC is telling us to.
```
---
 attrd/commands.c     | 11 +++++++++++
 attrd/attrd_common.c | 35 ++++++++++++++++++++++++++++++++-
 attrd/attrd_common.h |  3 +++
 3 files changed, 48 insertions(+), 1 deletion(-)

Index: pacemaker-1.1.18+20180430.b12c320f5/attrd/commands.c
===================================================================
--- pacemaker-1.1.18+20180430.b12c320f5.orig/attrd/commands.c
+++ pacemaker-1.1.18+20180430.b12c320f5/attrd/commands.c
@@ -904,6 +904,17 @@ attrd_peer_update(crm_node_t *peer, xmlN
         v->current = (value? strdup(value) : NULL);
         a->changed = TRUE;
 
+        if (safe_str_eq(host, attrd_cluster->uname)
+            && crm_str_eq(attr, XML_CIB_ATTR_SHUTDOWN, TRUE)) {
+
+            if (value != NULL && safe_str_neq(value, "0")) {
+                attrd_set_requesting_shutdown();
+
+            } else {
+                attrd_clear_requesting_shutdown();
+            }
+        }
+
         // Write out new value or start dampening timer
         if (a->timer) {
             crm_trace("Delayed write out (%dms) for %s", a->timeout_ms, attr);
Index: pacemaker-1.1.18+20180430.b12c320f5/attrd/attrd_common.c
===================================================================
--- pacemaker-1.1.18+20180430.b12c320f5.orig/attrd/attrd_common.c
+++ pacemaker-1.1.18+20180430.b12c320f5/attrd/attrd_common.c
@@ -22,6 +22,7 @@
 
 cib_t *the_cib = NULL;
 
+static bool requesting_shutdown = FALSE;
 // volatile because attrd_shutdown() can be called for a signal
 static volatile bool shutting_down = FALSE;
 
@@ -29,6 +30,38 @@ static GMainLoop *mloop = NULL;
 
 /*!
  * \internal
+ * \brief Set requesting_shutdown state
+ */
+void
+attrd_set_requesting_shutdown()
+{
+    requesting_shutdown = TRUE;
+}
+
+/*!
+ * \internal
+ * \brief Clear requesting_shutdown state
+ */
+void
+attrd_clear_requesting_shutdown()
+{
+    requesting_shutdown = FALSE;
+}
+
+/*!
+ * \internal
+ * \brief Check whether we're currently requesting shutdown
+ *
+ * \return TRUE if requesting shutdown, FALSE otherwise
+ */
+gboolean
+attrd_requesting_shutdown()
+{
+    return requesting_shutdown;
+}
+
+/*!
+ * \internal
  * \brief Check whether we're currently shutting down
  *
  * \return TRUE if shutting down, FALSE otherwise
Index: pacemaker-1.1.18+20180430.b12c320f5/attrd/main.c
===================================================================
--- pacemaker-1.1.18+20180430.b12c320f5.orig/attrd/main.c
+++ pacemaker-1.1.18+20180430.b12c320f5/attrd/main.c
@@ -97,7 +97,7 @@ attrd_cpg_destroy(gpointer unused)
 static void
 attrd_cib_replaced_cb(const char *event, xmlNode * msg)
 {
-    if (attrd_shutting_down()) {
+    if (attrd_requesting_shutdown() || attrd_shutting_down()) {
         return;
     }
 
Index: pacemaker-1.1.18+20180430.b12c320f5/attrd/attrd_common.h
===================================================================
--- pacemaker-1.1.18+20180430.b12c320f5.orig/attrd/attrd_common.h
+++ pacemaker-1.1.18+20180430.b12c320f5/attrd/attrd_common.h
@@ -16,6 +16,9 @@
 
 void attrd_init_mainloop(void);
 void attrd_run_mainloop(void);
+void attrd_set_requesting_shutdown(void);
+void attrd_clear_requesting_shutdown(void);
+gboolean attrd_requesting_shutdown(void);
 gboolean attrd_shutting_down(void);
 void attrd_shutdown(int nsig);
 void attrd_init_ipc(qb_ipcs_service_t **ipcs,
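
The guard added by this patch can be illustrated outside of Pacemaker with a small standalone sketch. This is not the attrd code: `on_local_attr_update`, `on_cib_replaced`, `replay_shutdown_race`, and the `write_outs` counter are hypothetical names chosen for illustration, and the CIB write-out is reduced to incrementing a counter. It only shows the idea that a non-zero local "shutdown" attribute suppresses the reaction to a CIB replace.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins for attrd state; not the real Pacemaker symbols. */
static bool requesting_shutdown = false;
static bool shutting_down = false;   /* would be set once SIGTERM is handled */
static int write_outs = 0;           /* counts simulated CIB write-outs */

/* Track whether the local node currently has a non-zero "shutdown" value. */
static void on_local_attr_update(const char *attr, const char *value)
{
    assert(attr != NULL);
    if (strcmp(attr, "shutdown") != 0) {
        return;
    }
    requesting_shutdown = (value != NULL && strcmp(value, "0") != 0);
}

/* React to a CIB replace: skip the write-out while a shutdown is pending. */
static void on_cib_replaced(void)
{
    if (requesting_shutdown || shutting_down) {
        return;
    }
    write_outs++;   /* stands in for writing all attributes back to the CIB */
}

/* Replay the sequence from the commit message; returns the write-out count. */
static int replay_shutdown_race(void)
{
    write_outs = 0;
    shutting_down = false;
    on_local_attr_update("shutdown", "1600677133"); /* node requests shutdown */
    on_cib_replaced();                              /* new DC syncs its CIB */
    return write_outs;
}
```

In this sketch, replaying the race leaves the counter at zero because the replace event arrives while the shutdown request is still set. Clearing the attribute (value "0" or no value) re-enables write-outs, which mirrors the set/clear pair in the patch.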