SUSE:SLE-15-SP4:Update
pacemaker
File 0001-Fix-controller-Delay-join-finalization-if-a-transiti.patch of Package pacemaker
From 39a497edcc01d0ab67c6da308cc7dd7d4bc96011 Mon Sep 17 00:00:00 2001
From: Reid Wahl <nrwahl@protonmail.com>
Date: Wed, 22 Mar 2023 02:28:19 -0700
Subject: [PATCH] Fix: controller: Delay join finalization if a transition is in progress

While a transition is in progress, CIB updates may be generated and
received rapidly as resource actions complete. This can cause problems
if it happens during a controller join sequence.

The last two major steps of the join sequence are:
1. The client sends XML containing its resource history, obtained from
   its local executor, to the DC in do_cl_join_finalize_respond().
2. The DC receives this client resource history in do_dc_join_ack(),
   deletes the client's node state in the CIB, and writes the received
   client resource history to the CIB as the client's new node state.

However, suppose a resource action completes after the client generates
its resource history XML. Further suppose that action is recorded in the
CIB and is received by the DC's CIB manager before the DC updates the
client's node state. In this case, the newer history item is deleted
from the DC's CIB. The DC updated the client's node state based on the
history XML that the client fetched earlier.

Now, the DC does not know that the action completed on the client. This
can result in an action improperly being scheduled a second time.
Specifically, a user reported an issue in which a migrate_to operation
was run a second time after completing. The second time, the migrate_to
operation failed because the resource was no longer physically present
on the source node.

The do_dc_join_finalize() function in controld_join_dc.c contains a
block that delays join finalization while a transition is in progress.
If the R_IN_TRANSITION bit is set in the input register, the controller
stalls.

The problem is that nothing sets this bit. It was added by commit
a1c1b340 in 2005, and the line of code that set the bit was mistakenly
removed by commit feef7987 in 2008.

We can tell that removing the bit-setting line of code was a mistake,
because the code that clears the bit was kept (and moved elsewhere),
while the code that checks the bit was unmodified.

We do want to delay finalization if a transition is in progress.
However, the R_IN_TRANSITION bit itself is no longer necessary:
controld_globals.transition_graph->complete fulfills the same role, so
we can use that and remove R_IN_TRANSITION.

The complete flag is initialized to false (via calloc()) when a new
graph is created during do_te_invoke(). It's set to true by the time we
reach notify_crmd() (usually by te_graph_trigger()), which is where we
previously cleared the R_IN_TRANSITION bit.

This simple fix appears to resolve the known race conditions with client
history fetching versus CIB updates during a join sequence.

Closes T375

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
---
 daemons/controld/controld_fsa.h        | 2 --
 daemons/controld/controld_join_dc.c    | 2 +-
 daemons/controld/controld_te_actions.c | 1 -
 3 files changed, 1 insertion(+), 4 deletions(-)

Index: pacemaker-2.1.2+20211124.ada5c3b36/daemons/controld/controld_fsa.h
===================================================================
--- pacemaker-2.1.2+20211124.ada5c3b36.orig/daemons/controld/controld_fsa.h
+++ pacemaker-2.1.2+20211124.ada5c3b36/daemons/controld/controld_fsa.h
@@ -418,8 +418,6 @@ enum crmd_fsa_input {
                                    response?
                                    if so perhaps we shouldn't stop yet */
 
-# define R_IN_TRANSITION 0x10000000ULL
-                                /*  */
 # define R_SENT_RSC_STOP 0x20000000ULL /* Have we sent a stop action to all
                                          * resources in preparation for
                                          * shutting down */
Index: pacemaker-2.1.2+20211124.ada5c3b36/daemons/controld/controld_join_dc.c
===================================================================
--- pacemaker-2.1.2+20211124.ada5c3b36.orig/daemons/controld/controld_join_dc.c
+++ pacemaker-2.1.2+20211124.ada5c3b36/daemons/controld/controld_join_dc.c
@@ -440,7 +440,7 @@ do_dc_join_finalize(long long action,
         controld_set_fsa_input_flags(R_HAVE_CIB);
     }
 
-    if (pcmk_is_set(fsa_input_register, R_IN_TRANSITION)) {
+    if (!transition_graph->complete) {
         crm_warn("Delaying join-%d finalization while transition in progress",
                  current_join_id);
         crmd_join_phase_log(LOG_DEBUG);
Index: pacemaker-2.1.2+20211124.ada5c3b36/daemons/controld/controld_te_actions.c
===================================================================
--- pacemaker-2.1.2+20211124.ada5c3b36.orig/daemons/controld/controld_te_actions.c
+++ pacemaker-2.1.2+20211124.ada5c3b36/daemons/controld/controld_te_actions.c
@@ -630,7 +630,6 @@ notify_crmd(crm_graph_t * graph)
 
     graph->abort_reason = NULL;
     graph->completion_action = tg_done;
-    controld_clear_fsa_input_flags(R_IN_TRANSITION);
 
     if (event != I_NULL) {
         register_fsa_input(C_FSA_INTERNAL, event, NULL);
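
The race described in the commit message can be illustrated with a small stand-alone C model. This is only a sketch under stated assumptions: every name in it (`model_cib`, `model_graph`, `client_snapshot`, `dc_join_ack`, `dc_join_finalize`, `run_join`) is hypothetical and is not part of Pacemaker. The only idea taken from the patch is that join finalization stalls while the transition graph's `complete` flag is false.

```c
#include <stdbool.h>

/* Hypothetical model, not Pacemaker code: the CIB stores, for one client,
 * how many completed resource actions its node state records. */
struct model_cib {
    int recorded_actions;
};

/* Hypothetical stand-in for the transition graph's "complete" flag. */
struct model_graph {
    bool complete;
};

/* Client side (join step 1): snapshot the local resource history. */
static int client_snapshot(int local_actions)
{
    return local_actions;
}

/* DC side (join step 2): overwrite the client's node state with the
 * snapshot, discarding whatever the CIB held before. */
static void dc_join_ack(struct model_cib *cib, int snapshot)
{
    cib->recorded_actions = snapshot;
}

/* DC side: true if finalization may proceed, false if it must stall.
 * Models the patched check: stall while the transition is incomplete. */
static bool dc_join_finalize(const struct model_graph *graph)
{
    return graph->complete;
}

/* Run one join sequence that overlaps a transition. Returns how many of
 * the client's two completed actions the DC knows about afterwards. */
int run_join(bool gate_on_transition)
{
    struct model_cib cib = { 0 };
    struct model_graph graph = { .complete = false }; /* transition running */
    int local_actions = 1;  /* actions completed on the client so far */
    int snapshot = -1;      /* -1 means no snapshot has been taken yet */

    if (!gate_on_transition || dc_join_finalize(&graph)) {
        /* Ungated: the snapshot is taken while the transition runs. */
        snapshot = client_snapshot(local_actions);
    }

    /* A resource action (such as migrate_to) completes and is recorded
     * in the CIB; then the transition finishes. */
    local_actions++;
    cib.recorded_actions = local_actions;
    graph.complete = true;

    if (snapshot < 0 && dc_join_finalize(&graph)) {
        /* Gated: finalization was delayed, so the snapshot is current. */
        snapshot = client_snapshot(local_actions);
    }

    dc_join_ack(&cib, snapshot);
    return cib.recorded_actions;
}
```

In this model, `run_join(false)` returns 1: the stale snapshot overwrites the newer history item, so the DC could schedule the action again. `run_join(true)` returns 2, because the snapshot is only taken once the transition has completed.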