yahoo-eng-team team mailing list archive

[Bug 1214115] Re: ipavailabilityranges race condition when allocating from same range on multiple neutron-servers

To: yahoo-eng-team@xxxxxxxxxxxxxxxxxxx
From: Thierry Carrez <thierry.carrez+lp@xxxxxxxxx>
Date: Wed, 05 Mar 2014 20:08:19 -0000
Reply-to: Bug 1214115 <1214115@xxxxxxxxxxxxxxxxxx>
Sender: bounces@xxxxxxxxxxxxx
** Changed in: neutron
       Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1214115

Title:
  ipavailabilityranges race condition when allocating from same range on
  multiple neutron-servers

Status in OpenStack Neutron (virtual network service):
  Fix Released

Bug description:
  Let's say that we start with an allocation pool whose availability
  range looks like this:

  +--------------------------------------+----------------+----------------+
  | allocation_pool_id                   | first_ip       | last_ip        |
  +--------------------------------------+----------------+----------------+
  | 0f175416-a378-463b-9a84-18528f396e6f | 192.168.1.10   | 192.168.1.254  |
  +--------------------------------------+----------------+----------------+

  We then allocate a few of those IPs, let's say 192.168.1.10 through
  192.168.1.19. Our pool now looks like this:

  +--------------------------------------+----------------+----------------+
  | allocation_pool_id                   | first_ip       | last_ip        |
  +--------------------------------------+----------------+----------------+
  | 0f175416-a378-463b-9a84-18528f396e6f | 192.168.1.20   | 192.168.1.254  |
  +--------------------------------------+----------------+----------------+

  Now we free a few of those IPs, let's say 16, 17 and 18, and we have
  this in the DB:

  +--------------------------------------+----------------+----------------+
  | allocation_pool_id                   | first_ip       | last_ip        |
  +--------------------------------------+----------------+----------------+
  | 0f175416-a378-463b-9a84-18528f396e6f | 192.168.1.16   | 192.168.1.18   |
  | 0f175416-a378-463b-9a84-18528f396e6f | 192.168.1.20   | 192.168.1.254  |
  +--------------------------------------+----------------+----------------+

  The race condition I'm about to describe will probably hamper the
  above operation, but that's okay. Let's just pretend for the sake of
  illustration. Now let's suppose that I have 2 neutron-servers running:
  one gets a request to allocate 192.168.1.16 and the other gets a
  request to free 192.168.1.15. Both servers are going to generate
  UPDATEs to the DB, which will look something like this:

  SERVER 1: UPDATE ipavailabilityranges SET first_ip = '192.168.1.17' WHERE first_ip = '192.168.1.16'
  SERVER 2: UPDATE ipavailabilityranges SET first_ip = '192.168.1.15' WHERE first_ip = '192.168.1.16'

  Depending on ordering, and on how busy your neutron-servers and your
  database are, one of the above statements is going to fail, since both
  are conditioned on the same first_ip value. That's okay in itself: the
  failure is reported up through the API. The issue we see is that
  retries also tend to fail, since usually only one operation affecting
  a given row in the table ever succeeds. If you have a very active
  neutron API with lots of free and allocate requests, you end up in an
  unusable state where active periods for the API are full of errors,
  and requests get bogged down and fail until activity stops.
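
  The conflict between the two UPDATEs can be reproduced in isolation.
  The following is only a minimal sketch, not Neutron code: it assumes
  an in-memory SQLite table with the three columns shown above, and it
  runs the two statements one after the other to stand in for the two
  servers. The helper name move_first_ip is hypothetical.

```python
import sqlite3

POOL_ID = "0f175416-a378-463b-9a84-18528f396e6f"

# Assumed schema: only the three columns shown in the tables above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ipavailabilityranges ("
    "allocation_pool_id TEXT, first_ip TEXT, last_ip TEXT)"
)
conn.execute(
    "INSERT INTO ipavailabilityranges VALUES (?, ?, ?)",
    (POOL_ID, "192.168.1.16", "192.168.1.18"),
)
conn.commit()


def move_first_ip(connection, old_first_ip, new_first_ip):
    """Run the UPDATE from the bug report and return the rows it changed."""
    cursor = connection.execute(
        "UPDATE ipavailabilityranges SET first_ip = ? WHERE first_ip = ?",
        (new_first_ip, old_first_ip),
    )
    connection.commit()
    return cursor.rowcount


# SERVER 1 allocates 192.168.1.16; SERVER 2 frees 192.168.1.15.
# Both decided what to write while first_ip was still 192.168.1.16.
server1_rows = move_first_ip(conn, "192.168.1.16", "192.168.1.17")
server2_rows = move_first_ip(conn, "192.168.1.16", "192.168.1.15")

final_first_ip = conn.execute(
    "SELECT first_ip FROM ipavailabilityranges"
).fetchone()[0]

print(server1_rows, server2_rows, final_first_ip)
```

  Whichever statement runs second matches no rows, because the value in
  its WHERE clause was already replaced by the first. A retry that
  reissues the same statement without re-reading the row keeps missing
  for the same reason.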

  This is one example of the race condition. There are obviously other
  ways to trigger it if you sit down and look at the applicable piece of
  code. Some kind of concurrency management is probably in order, though
  I'm not sure what the best way to solve this would be.
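
  One possible form of concurrency management is an optimistic retry
  that re-reads the range before every attempt, so that a retry never
  reuses a stale first_ip. The sketch below is an illustration under
  assumptions, not the fix that was released for this bug: the schema is
  the same three-column SQLite table as above, and the function
  allocate_lowest_ip and its before_write hook are hypothetical names
  (the hook exists only to simulate the other server in a single
  process).

```python
import ipaddress
import sqlite3

POOL_ID = "0f175416-a378-463b-9a84-18528f396e6f"


def allocate_lowest_ip(conn, pool_id, max_attempts=5, before_write=None):
    """Take the lowest free IP of a pool, re-reading the range on each try."""
    for attempt in range(max_attempts):
        rows = conn.execute(
            "SELECT first_ip, last_ip FROM ipavailabilityranges "
            "WHERE allocation_pool_id = ?",
            (pool_id,),
        ).fetchall()
        if not rows:
            raise RuntimeError("no free addresses left in pool")
        # Compare as addresses, not as strings, to find the lowest range.
        first, last = min(rows, key=lambda r: ipaddress.ip_address(r[0]))

        if before_write is not None:
            before_write(attempt)  # demo hook: lets "another server" interfere

        if first == last:
            # The range holds a single address, so it disappears entirely.
            cursor = conn.execute(
                "DELETE FROM ipavailabilityranges WHERE allocation_pool_id = ? "
                "AND first_ip = ? AND last_ip = ?",
                (pool_id, first, last),
            )
        else:
            next_ip = str(ipaddress.ip_address(first) + 1)
            cursor = conn.execute(
                "UPDATE ipavailabilityranges SET first_ip = ? "
                "WHERE allocation_pool_id = ? AND first_ip = ? AND last_ip = ?",
                (next_ip, pool_id, first, last),
            )
        conn.commit()
        if cursor.rowcount == 1:
            return first  # our write matched the row we read
        # Otherwise the row changed underneath us: loop and read it again.
    raise RuntimeError("gave up after %d attempts" % max_attempts)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ipavailabilityranges ("
    "allocation_pool_id TEXT, first_ip TEXT, last_ip TEXT)"
)
conn.execute(
    "INSERT INTO ipavailabilityranges VALUES (?, ?, ?)",
    (POOL_ID, "192.168.1.16", "192.168.1.18"),
)
conn.commit()


def other_server_frees_15(attempt):
    """Simulate SERVER 2 freeing 192.168.1.15 during our first attempt."""
    if attempt == 0:
        conn.execute(
            "UPDATE ipavailabilityranges SET first_ip = '192.168.1.15' "
            "WHERE first_ip = '192.168.1.16'"
        )
        conn.commit()


allocated = allocate_lowest_ip(conn, POOL_ID, before_write=other_server_frees_15)
remaining = conn.execute(
    "SELECT first_ip, last_ip FROM ipavailabilityranges"
).fetchall()
print(allocated, remaining)
```

  In this run the first attempt misses because the range moved, and the
  second attempt succeeds against the freshly read row. Under sustained
  contention a loop like this can still exhaust its attempts, so row
  locking in the database (for example a locking read on engines that
  support it) may be a better fit; that choice is outside what this
  sketch shows.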

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1214115/+subscriptions