Sign Up
Log In
Log In
or
Sign Up
Places
All Projects
Status Monitor
Collapse sidebar
SUSE:SLE-15-SP7:GA
pacemaker.20189
bsc#1173668-0001-Fix-attrd-prevent-leftover-att...
Overview
Repositories
Revisions
Requests
Users
Attributes
Meta
File bsc#1173668-0001-Fix-attrd-prevent-leftover-attributes-of-shutdown-no.patch of Package pacemaker.20189
From a1a9c54cfc451e36f66db3738fd798b7464a1239 Mon Sep 17 00:00:00 2001 From: "Gao,Yan" <ygao@suse.com> Date: Fri, 25 Sep 2020 15:47:23 +0200 Subject: [PATCH] Fix: attrd: prevent leftover attributes of shutdown node in cib This commit prevents writing out of attributes from being triggered by cib_replace event on a node that is requesting shutdown, so that it prevents leftover attributes of the shutdown node in cib. Race conditions were encountered on shutdown and startup of a node. Pacemaker is v1.1.18+, but I think the latest revision should be potentially impacted too. Node2 was shutting down. When crmd was stopped from node2, node1 erased all the transient attributes of node2 from cib: ``` Sep 21 08:33:49 [7849] node1 crmd: notice: peer_update_callback: Our peer on the DC (node2) is dead Sep 21 08:33:49 [7844] node1 cib: info: cib_process_request: Completed cib_delete operation for section //node_state[@uname='node2']/transient_attributes: OK (rc=0, origin=node1/crmd/114, version=0.649.144) ``` And node2 became the DC and did a cib_replace: ``` Sep 21 08:33:49 [7849] node1 crmd: info: update_dc: Set DC to node1 (3.1.0) Sep 21 08:33:49 [7849] node1 crmd: info: do_dc_join_finalize: join-2: Syncing our CIB to the rest of the cluster Sep 21 08:33:49 [7844] node1 cib: info: cib_process_replace: Replaced 0.649.144 with 0.649.144 from node1 Sep 21 08:33:49 [7844] node1 cib: info: cib_process_request: Completed cib_replace operation for section 'all': OK (rc=0, origin=node1/crmd/126, version=0.649.144) ``` Meanwhile cib and attrd daemons on node2 didn't receive SIGTERM yet and were still running. Attrd reacted to the cib_replace and wrote all the node attributes back into cib again including its "shutdown" attribute: ``` Sep 21 08:33:49 [4444] node2 attrd: notice: attrd_cib_replaced_cb: Updating all attributes after cib_refresh_notify event Sep 21 08:33:49 [4441] node2 cib: info: cib_perform_op: ++ /cib/status/node_state[@id='14548837']/transient_attributes[@id='14548837']/instance_attributes[@id='status-14548837']: <nvpair id="status-14548837-shutdown" name="shutdown" value="1600677133"/> Sep 21 08:33:49 [4444] node2 attrd: info: attrd_cib_callback: Update 1103 for shutdown[node2]=1600677133: OK (0) ``` Later, attrd received SIGTERM and shut down: ``` Sep 21 08:33:49 [4439] node2 pacemakerd: notice: stop_child: Stopping attrd | sent signal 15 to process 4444 Sep 21 08:33:49 [4444] node2 attrd: notice: crm_signal_dinode1tch: Caught 'Terminated' signal | 15 (invoking handler) Sep 21 08:33:49 [4444] node2 attrd: info: main: Shutting down attribute manager ``` When node2 started again, it cleared its node attributes from cib, but cib of node1 didn't join yet by then: ``` Sep 21 08:42:46 [4844] node2 attrd: info: attrd_erase_attrs: Clearing transient attributes from CIB | xpath=//node_state[@uname='node2']/transient_attributes Sep 21 08:42:46 [4841] node2 cib: info: cib_process_request: Completed cib_delete operation for section //node_state[@uname='node2']/transient_attributes: OK (rc=0, origin=node2/attrd/2, version=0.649.0) ``` Then cib of node1 joined: ``` Sep 21 08:42:47 [4841] node2 cib: info: pcmk_cpg_membership: Node 14548836 joined group cib (counter=1.0, pid=0, unchecked for rivals) ``` Soon the node attributes of node2 got back into cib again by syncing of cib from node1: ``` Sep 21 08:42:47 [4846] node2 crmd: info: update_dc: Set DC to node1 (3.1.0) Sep 21 08:42:48 [4841] node2 cib: info: cib_process_replace: Replaced 0.649.90 with 0.649.410 from node1 Sep 21 08:42:48 [4841] node2 cib: info: cib_perform_op: ++ /cib/status/node_state[@id='14548837']/transient_attributes[@id='14548837']/instance_attributes[@id='status-14548837']: <nvpair id="status-14548837-shutdown" name="shutdown" value="1600677133"/> ``` So the "leftover" "shutdown" attribute of node2 caused it to shut down again: ``` Sep 21 08:42:51 [4846] node2 crmd: error: handle_request: We didn't ask to be shut down, yet our DC is telling us to. ``` --- daemons/attrd/attrd_commands.c | 11 +++++++++++ daemons/attrd/attrd_utils.c | 35 ++++++++++++++++++++++++++++++++- daemons/attrd/pacemaker-attrd.h | 3 +++ 3 files changed, 48 insertions(+), 1 deletion(-) diff --git a/daemons/attrd/attrd_commands.c b/daemons/attrd/attrd_commands.c index 96d992a20..deff82f47 100644 --- a/daemons/attrd/attrd_commands.c +++ b/daemons/attrd/attrd_commands.c @@ -886,6 +886,17 @@ attrd_peer_update(crm_node_t *peer, xmlNode *xml, const char *host, bool filter) v->current = (value? strdup(value) : NULL); a->changed = TRUE; + if (pcmk__str_eq(host, attrd_cluster->uname, pcmk__str_casei) + && pcmk__str_eq(attr, XML_CIB_ATTR_SHUTDOWN, pcmk__str_none)) { + + if (!pcmk__str_eq(value, "0", pcmk__str_null_matches)) { + attrd_set_requesting_shutdown(); + + } else { + attrd_clear_requesting_shutdown(); + } + } + // Write out new value or start dampening timer if (a->timeout_ms && a->timer) { crm_trace("Delayed write out (%dms) for %s", a->timeout_ms, attr); diff --git a/daemons/attrd/attrd_utils.c b/daemons/attrd/attrd_utils.c index 3b8446a9f..b7957fdd7 100644 --- a/daemons/attrd/attrd_utils.c +++ b/daemons/attrd/attrd_utils.c @@ -24,9 +24,42 @@ cib_t *the_cib = NULL; +static bool requesting_shutdown = FALSE; static bool shutting_down = FALSE; static GMainLoop *mloop = NULL; +/*! + * \internal + * \brief Set requesting_shutdown state + */ +void +attrd_set_requesting_shutdown() +{ + requesting_shutdown = TRUE; +} + +/*! + * \internal + * \brief Clear requesting_shutdown state + */ +void +attrd_clear_requesting_shutdown() +{ + requesting_shutdown = FALSE; +} + +/*! + * \internal + * \brief Check whether we're currently requesting shutdown + * + * \return TRUE if requesting shutdown, FALSE otherwise + */ +gboolean +attrd_requesting_shutdown() +{ + return requesting_shutdown; +} + /*! * \internal * \brief Check whether we're currently shutting down @@ -191,7 +224,7 @@ attrd_cib_disconnect() void attrd_cib_replaced_cb(const char *event, xmlNode * msg) { - if (attrd_shutting_down()) { + if (attrd_requesting_shutdown() || attrd_shutting_down()) { return; } diff --git a/daemons/attrd/pacemaker-attrd.h b/daemons/attrd/pacemaker-attrd.h index b82455564..15e08f55e 100644 --- a/daemons/attrd/pacemaker-attrd.h +++ b/daemons/attrd/pacemaker-attrd.h @@ -20,6 +20,9 @@ void attrd_init_mainloop(void); void attrd_run_mainloop(void); +void attrd_set_requesting_shutdown(void); +void attrd_clear_requesting_shutdown(void); +gboolean attrd_requesting_shutdown(void); gboolean attrd_shutting_down(void); void attrd_shutdown(int nsig); void attrd_init_ipc(qb_ipcs_service_t **ipcs, -- 2.26.2
Locations
Projects
Search
Status Monitor
Help
OpenBuildService.org
Documentation
API Documentation
Code of Conduct
Contact
Support
@OBShq
Terms
openSUSE Build Service is sponsored by
The Open Build Service is an
openSUSE project
.
Sign Up
Log In
Places
Places
All Projects
Status Monitor