yahoo-eng-team mailing list archive
Message #96256
[Bug 2119126] Re: [Caracal][Offline][Masakari/Nova] - Instance-HA partially working
I'm not seeing anything that points to a nova bug here;
if there is one, please state it explicitly.
The changed API behaviour in nova in microversion 2.95 simply determines whether the VM will be stopped after the evacuate or
restored to the state it was in before.
Masakari can choose its desired behaviour.
The behaviour was changed because evacuating to stopped is generally safer, and it is not always possible to start a VM
after it is evacuated, for example if it has encrypted cinder volumes. So evacuate to stopped works in more cases.
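For illustration only, here is a minimal sketch of how the microversion choice drives the post-evacuation power state when calling the evacuate action directly; `$NOVA_ENDPOINT`, `$TOKEN` and the reuse of this bug's instance UUID are placeholders/assumptions, and masakari's own internal call may differ.
```
# Hedged sketch: pin the compute API microversion on the evacuate action.
# With "compute 2.95" (and later) the server is left STOPPED on the destination;
# with an older microversion it is returned to its previous power state.
# $NOVA_ENDPOINT and $TOKEN are placeholders for the compute endpoint and a valid token.
curl -s -X POST "$NOVA_ENDPOINT/servers/969ca417-9111-42de-836c-eb883e52f131/action" \
  -H "X-Auth-Token: $TOKEN" \
  -H "Content-Type: application/json" \
  -H "OpenStack-API-Version: compute 2.95" \
  -d '{"evacuate": {}}'

# An instance evacuated to stopped can then be started explicitly with the
# os-start action, or with "openstack server start <server-id>".
```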
** Changed in: nova
Status: In Progress => Invalid
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2119126
Title:
[Caracal][Offline][Masakari/Nova] - Instance-HA partially working
Status in masakari:
In Progress
Status in OpenStack Compute (nova):
Invalid
Bug description:
++++++++++++
ENV Details:
++++++++++++
OSA Version: OFFLINE Caracal 2024.1
OS: Ubuntu-22.04
Tool: OpenStack-Ansible
Virtual setup
++++++
Issue:
++++++
* Masakari is installed and running.
* Created an instance and set the `HA_Enabled=True` property.
* The instance is running on the source node `cmpt001`; the destination node is `offline20241`.
* On the source node, started the instance-HA operation as follows: ### or, if there is a better way, please let me know (an alternative sketch follows the second virsh output below)
```
# Source compute
root@cmpt001:~# virsh list --all
Id Name State
-----------------------------------
4 instance-0000001e running
root@cmpt001:~#
```
```
systemctl stop nova-compute.service
systemctl stop pacemaker.service
kill -9 $(ps -eaf | grep instance-0000001e | awk '{print $2}' | head -n 1)
systemctl stop corosync.service
```
```
root@cmpt001:~#
root@cmpt001:~# virsh list --all
Id Name State
------------------------------------
- instance-0000001e shut off
root@cmpt001:~#
```
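As a possible alternative (only a sketch, not a validated procedure), the failure could also be injected via libvirt or by crashing the node itself, so that the host monitor rather than just the instance monitor reports it:
```
# Hedged alternative: instead of kill -9 on the QEMU process, hard power-off the
# domain via libvirt, or crash the whole node to simulate a full host failure.
virsh destroy instance-0000001e        # hard power-off of just this domain
# or, to take down the entire node immediately (it will crash at once):
echo c > /proc/sysrq-trigger
```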
* Evacuation started; it took about 15 seconds to migrate the instance.
* Instance evacuation is happening; however, the instance on the destination node is showing the status as SHUTOFF (see the notification-check sketch below).
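A minimal sketch for confirming that masakari received and processed the notification, assuming the python-masakariclient OSC plugin is available on the controller; the notification UUID is the one from the logs further down:
```
# Hedged sketch: list masakari notifications and inspect the one generated by the
# instance monitor, then check the failover segment configuration.
openstack notification list
openstack notification show 3be2b8e5-ed78-4085-8898-54b3fd5a9f78
openstack segment list
```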
# Source compute logs
```
Jul 30 14:50:25 cmpt001.ct.lan masakari-instancemonitor[1344]: 2025-07-30 14:50:25.555 1344 INFO masakarimonitors.instancemonitor.libvirt_handler.callback [-] Libvirt Event: type=VM, hostname=cmpt001.ct.lan, uuid=969ca417-9111-42de-836c-eb883e52f131, time=2025-07-30 14:50:25.552570, event_id=LIFECYCLE, detail=STOPPED_FAILED)
```
```
Jul 30 14:50:25 cmpt001.ct.lan masakari-instancemonitor[1344]: 2025-07-30 14:50:25.557 1344 INFO masakarimonitors.ha.masakari [-] Send a notification. {'notification': {'type': 'VM', 'hostname': 'cmpt001.ct.lan', 'generated_time': datetime.datetime(2025, 7, 30, 14, 50, 25, 552570), 'payload': {'event': 'LIFECYCLE', 'instance_uuid': '969ca417-9111-42de-836c-eb883e52f131', 'vir_domain_event': 'STOPPED_FAILED'}}}
```
```
Jul 30 14:50:25 cmpt001.ct.lan masakari-instancemonitor[1344]: 2025-07-30 14:50:25.786 1344 INFO masakarimonitors.ha.masakari [-] Response: openstack.instance_ha.v1.notification.Notification(type=VM, hostname=cmpt001.ct.lan, generated_time=2025-07-30T14:50:25.552570, payload={'event': 'LIFECYCLE', 'instance_uuid': '969ca417-9111-42de-836c-eb883e52f131', 'vir_domain_event': 'STOPPED_FAILED'}, id=1, notification_uuid=3be2b8e5-ed78-4085-8898-54b3fd5a9f78, source_host_uuid=36cd2bdb-29e4-4cc4-9b10-a933e2608edc, status=new, created_at=2025-07-30T14:50:25.000000, updated_at=None, location=Munch({'cloud': '192.168.131.200', 'region_name': 'RegionOne', 'zone': None, 'project': Munch({'id': 'a5aebb0fbfc64ac49e3ea028e4f740dc', 'name': None, 'domain_id': None, 'domain_name': None})}))
```
# Destination compute + controller logs
```
root@offline20241:~# openstack server show 969ca417-9111-42de-836c-eb883e52f131 --fit
+-------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Field | Value |
+-------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| OS-DCF:diskConfig | MANUAL |
| OS-EXT-AZ:availability_zone | nova |
| OS-EXT-SRV-ATTR:host | offline20241.ct.lan |
| OS-EXT-SRV-ATTR:hostname | nc-masak-002 |
| OS-EXT-SRV-ATTR:hypervisor_hostname | offline20241.ct.lan |
| OS-EXT-SRV-ATTR:instance_name | instance-0000001e |
| OS-EXT-SRV-ATTR:kernel_id | |
| OS-EXT-SRV-ATTR:launch_index | 0 |
| OS-EXT-SRV-ATTR:ramdisk_id | |
| OS-EXT-SRV-ATTR:reservation_id | r-01be9wmi |
| OS-EXT-SRV-ATTR:root_device_name | /dev/vda |
| OS-EXT-SRV-ATTR:user_data | None |
| OS-EXT-STS:power_state | Running |
| OS-EXT-STS:task_state | None |
| OS-EXT-STS:vm_state | active |
| OS-SRV-USG:launched_at | 2025-07-30T14:52:30.000000 |
| OS-SRV-USG:terminated_at | None |
| accessIPv4 | |
| accessIPv6 | |
| addresses | provider141=192.168.141.91 |
| config_drive | |
| created | 2025-07-30T14:46:27Z |
| description | nc-masak-002 |
| flavor | description=, disk='0', ephemeral='0', , id='m1.tiny', is_disabled=, is_public='True', location=, name='m1.tiny', original_name='m1.tiny', ram='512', rxtx_factor=, swap='0', vcpus='1' |
| hostId | 1cb881e0cf1cb53f01ef64ad3b04badf5e418abc5a42566593853d4a |
| host_status | UP |
| id | 969ca417-9111-42de-836c-eb883e52f131 |
| image | N/A (booted from volume) |
| key_name | None |
| locked | False |
| locked_reason | None |
| name | nc-masak-002 |
| progress | 0 |
| project_id | 2f5a2a06638942cbaaeeb466b2e17e10 |
| properties | HA_Enabled='True' |
| security_groups | name='secgroup1' |
| server_groups | [] |
| status | ACTIVE |
| tags | |
| trusted_image_certificates | None |
| updated | 2025-07-30T15:05:57Z |
| user_id | cfc72fc0bc3d4ba09ef9756ea2fb6395 |
| volumes_attached | delete_on_termination='False', id='b82238fb-fe80-41ac-b426-33de21eb6756' |
+-------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
root@offline20241:~# virsh list --all
Id Name State
-----------------------------------
16 instance-0000001e running
root@offline20241:~#
```
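At this point both the API and libvirt report the instance as running on `offline20241`. A minimal sketch, using standard openstackclient commands, for tracing which recorded action later changes that state; `<request-id>` is a placeholder taken from the list output:
```
# Hedged sketch: list the instance actions nova recorded for this server;
# the evacuate action and any subsequent stop should each appear with a request-id.
openstack server event list 969ca417-9111-42de-836c-eb883e52f131
# Inspect a single action in detail:
openstack server event show 969ca417-9111-42de-836c-eb883e52f131 <request-id>
```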
### On the source node the VM was still showing in `virsh list` as shut off; after rebooting the source compute node the stale virsh entry was removed, however the instance on the destination then went to the SHUTOFF state, both in the server list and in `virsh list`.
```
Jul 30 15:04:30 offline20241.ct.lan nova-compute[2108454]: 2025-07-30 15:04:30.670 2108454 DEBUG oslo_concurrency.lockutils [None req-a919b840-363e-42b5-8ff3-e2ecd871c94b - - - - - -] Acquiring lock "969ca417-9111-42de-836c-eb883e52f131" by "nova.compute.manager.ComputeManager._sync_power_states.<locals>._sync.<locals>.query_driver_power_state_and_sync" inner /openstack/venvs/nova-29.2.3.dev1/lib/python3.10/site-packages/oslo_concurrency/lockutils.py:402
Jul 30 15:04:30 offline20241.ct.lan nova-compute[2108454]: 2025-07-30 15:04:30.671 2108454 DEBUG oslo_concurrency.lockutils [None req-a919b840-363e-42b5-8ff3-e2ecd871c94b - - - - - -] Lock "969ca417-9111-42de-836c-eb883e52f131" acquired by "nova.compute.manager.ComputeManager._sync_power_states.<locals>._sync.<locals>.query_driver_power_state_and_sync" :: waited 0.001s inner /openstack/venvs/nova-29.2.3.dev1/lib/python3.10/site-packages/oslo_concurrency/lockutils.py:407
Jul 30 15:04:30 offline20241.ct.lan nova-compute[2108454]: 2025-07-30 15:04:30.714 2108454 INFO nova.compute.manager [None req-a919b840-363e-42b5-8ff3-e2ecd871c94b - - - - - -] [instance: 969ca417-9111-42de-836c-eb883e52f131] During _sync_instance_power_state the DB power_state (0) does not match the vm_power_state from the hypervisor (1). Updating power_state in the DB to match the hypervisor.
Jul 30 15:04:30 offline20241.ct.lan neutron-server[3870]: 2025-07-30 15:04:30.795 3870 WARNING neutron.db.agents_db [None req-c0ba4060-e268-4940-9e06-6366966d1a48 - - - - - -] Agent healthcheck: found 4 dead agents out of 9:
Type Last heartbeat host
DHCP agent 2025-07-23 12:31:59 offline20241
L3 agent 2025-07-23 12:32:16 offline20241
Open vSwitch agent 2025-07-23 12:32:16 offline20241
Metadata agent 2025-07-23 12:31:59 offline20241
Jul 30 15:04:30 offline20241.ct.lan nova-compute[2108454]: 2025-07-30 15:04:30.817 2108454 WARNING nova.compute.manager [None req-a919b840-363e-42b5-8ff3-e2ecd871c94b - - - - - -] [instance: 969ca417-9111-42de-836c-eb883e52f131] Instance is not stopped. Calling the stop API. Current vm_state: stopped, current task_state: None, original DB power_state: 0, current VM power_state: 1
Jul 30 15:04:30 offline20241.ct.lan nova-compute[2108454]: 2025-07-30 15:04:30.818 2108454 DEBUG nova.compute.api [None req-a919b840-363e-42b5-8ff3-e2ecd871c94b - - - - - -] [instance: 969ca417-9111-42de-836c-eb883e52f131] Going to try to stop instance force_stop /openstack/venvs/nova-29.2.3.dev1/lib/python3.10/site-packages/nova/compute/api.py:2768
Jul 30 15:04:30 offline20241.ct.lan apache2[1957602]: 192.168.131.60 - - [30/Jul/2025:15:04:30 +0000] "POST /v3/auth/tokens HTTP/1.1" 201 9818 "-" "openstacksdk/3.0.0 keystoneauth1/5.6.1 python-requests/2.31.0 CPython/3.10.12"
Jul 30 15:04:30 offline20241.ct.lan haproxy[370326]: 192.168.131.200:55974 [30/Jul/2025:15:04:30.373] keystone_service-front-2 keystone_service-back/offline20241 0/0/0/550/550 201 9770 - - ---- 129/1/0/0/0 0/0 "POST /v3/auth/tokens HTTP/1.1"
Jul 30 15:04:30 offline20241.ct.lan nova-compute[2108454]: 2025-07-30 15:04:30.945 2108454 DEBUG oslo_concurrency.lockutils [None req-a919b840-363e-42b5-8ff3-e2ecd871c94b - - - - - -] Lock "969ca417-9111-42de-836c-eb883e52f131" "released" by "nova.compute.manager.ComputeManager._sync_power_states.<locals>._sync.<locals>.query_driver_power_state_and_sync" :: held 0.274s inner /openstack/venvs/nova-29.2.3.dev1/lib/python3.10/site-packages/oslo_concurrency/lockutils.py:421
Jul 30 15:04:30 offline20241.ct.lan nova-compute[2108454]: 2025-07-30 15:04:30.975 2108454 DEBUG oslo_concurrency.lockutils [None req-f69d5ab0-a27a-4006-8473-b4a9f670753d - - - - - -] Acquiring lock "969ca417-9111-42de-836c-eb883e52f131" by "nova.compute.manager.ComputeManager.stop_instance.<locals>.do_stop_instance" inner /openstack/venvs/nova-29.2.3.dev1/lib/python3.10/site-packages/oslo_concurrency/lockutils.py:402
Jul 30 15:04:30 offline20241.ct.lan nova-compute[2108454]: 2025-07-30 15:04:30.976 2108454 DEBUG oslo_concurrency.lockutils [None req-f69d5ab0-a27a-4006-8473-b4a9f670753d - - - - - -] Lock "969ca417-9111-42de-836c-eb883e52f131" acquired by "nova.compute.manager.ComputeManager.stop_instance.<locals>.do_stop_instance" :: waited 0.001s inner /openstack/venvs/nova-29.2.3.dev1/lib/python3.10/site-packages/oslo_concurrency/lockutils.py:407
Jul 30 15:04:30 offline20241.ct.lan nova-compute[2108454]: 2025-07-30 15:04:30.977 2108454 DEBUG nova.compute.manager [None req-f69d5ab0-a27a-4006-8473-b4a9f670753d - - - - - -] [instance: 969ca417-9111-42de-836c-eb883e52f131] Checking state _get_power_state /openstack/venvs/nova-29.2.3.dev1/lib/python3.10/site-packages/nova/compute/manager.py:1782
Jul 30 15:04:30 offline20241.ct.lan nova-compute[2108454]: 2025-07-30 15:04:30.983 2108454 DEBUG nova.compute.manager [None req-f69d5ab0-a27a-4006-8473-b4a9f670753d - - - - - -] [instance: 969ca417-9111-42de-836c-eb883e52f131] Stopping instance; current vm_state: stopped, current task_state: powering-off, current DB power_state: 1, current VM power_state: 1 do_stop_instance /openstack/venvs/nova-29.2.3.dev1/lib/python3.10/site-packages/nova/compute/manager.py:3359
```
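The log above shows `_sync_power_states` calling the stop API because the database vm_state is already `stopped` (which matches the evacuate-to-stopped behaviour described in the nova reply at the top) while the hypervisor reports the VM running. A minimal sketch, assuming the default nova.conf location on the destination node, for checking the sync interval and bringing the instance back to ACTIVE:
```
# Hedged sketch: the periodic sync is ComputeManager._sync_power_states, whose
# cadence comes from nova's [DEFAULT] sync_power_state_interval (default 600s).
grep -n sync_power_state_interval /etc/nova/nova.conf || echo "using default (600s)"

# The evacuated instance is left with vm_state=stopped; start it explicitly so the
# DB state and the hypervisor state agree, then verify:
openstack server start 969ca417-9111-42de-836c-eb883e52f131
openstack server show 969ca417-9111-42de-836c-eb883e52f131 -c status -c OS-EXT-STS:power_state
```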
To manage notifications about this bug go to:
https://bugs.launchpad.net/masakari/+bug/2119126/+subscriptions