← Back to team overview

yahoo-eng-team team mailing list archive

[Bug 1524124] [NEW] unscalable database schema design

 

Public bug reported:

I have noticed the keystone SQL database schema design is not scalable.
It can hold maybe a few hundreds or maximum thousands of entries, but
beyond this, it is going to certainly create very serious efficiency
problems, both in terms storage space and query response time. Here are
the main problem points I have spotted:

i) most of the tables use primary keys of varchar(64) type: role,
domain, project, token, user, group etc., supposed to contain unique hex
identifiers. I am not exactly sure about the rationale behind this
design? If the idea is to be able to accommodate up to 16**64=10**77
distinct records, than this is clearly flawed, as the tables won't hold
more than a few thousand entries given the current length of the primary
key (and foreign keys, for those minor entity tables that refer to the
major entity).

ii) some tables have composite keys on multiple varchar(64) fields:

Create Table: CREATE TABLE `assignment` (
  `type` enum('UserProject','GroupProject','UserDomain','GroupDomain') NOT NULL,
  `actor_id` varchar(64) NOT NULL,
  `target_id` varchar(64) NOT NULL,
  `role_id` varchar(64) NOT NULL,
  `inherited` tinyint(1) NOT NULL,
  PRIMARY KEY (`type`,`actor_id`,`target_id`,`role_id`),
  KEY `ix_actor_id` (`actor_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
1 row in set (0.00 sec)

iii) some tables have unique keys defined on varchar(255) columns:

Create Table: CREATE TABLE `role` (
  `id` varchar(64) NOT NULL,
  `name` varchar(255) NOT NULL,
  `extra` text,
  PRIMARY KEY (`id`),
  UNIQUE KEY `name` (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8

iv) the generated public id for (user,domain) entities is currently 64
hex chars, while only 32 hex chars are needed to ensure uniqueness up to
16**16=2**64=10**19 entries, which should be more than sufficient for
any practical installation.

In order to remedy these problems, I propose the following improvements:

i) replace the varchar(64) hex primary key by an auto-incremented
integer(4) column. This will hold up to 4 billion records and will
greatly reduce the  storage requirements and improve query performance.

ii) reduce the generated public id for (user, domain) entities to 32 hex
chars, stored in binary form as two bigint(8) columns.

iii) reduce the "name" field length to more manageable length or reduce
index size using a hash function.

** Affects: keystone
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Identity (keystone).
https://bugs.launchpad.net/bugs/1524124

Title:
  unscalable database schema design

Status in OpenStack Identity (keystone):
  New

Bug description:
  I have noticed the keystone SQL database schema design is not
  scalable. It can hold maybe a few hundreds or maximum thousands of
  entries, but beyond this, it is going to certainly create very serious
  efficiency problems, both in terms storage space and query response
  time. Here are the main problem points I have spotted:

  i) most of the tables use primary keys of varchar(64) type: role,
  domain, project, token, user, group etc., supposed to contain unique
  hex identifiers. I am not exactly sure about the rationale behind this
  design? If the idea is to be able to accommodate up to 16**64=10**77
  distinct records, than this is clearly flawed, as the tables won't
  hold more than a few thousand entries given the current length of the
  primary key (and foreign keys, for those minor entity tables that
  refer to the major entity).

  ii) some tables have composite keys on multiple varchar(64) fields:

  Create Table: CREATE TABLE `assignment` (
    `type` enum('UserProject','GroupProject','UserDomain','GroupDomain') NOT NULL,
    `actor_id` varchar(64) NOT NULL,
    `target_id` varchar(64) NOT NULL,
    `role_id` varchar(64) NOT NULL,
    `inherited` tinyint(1) NOT NULL,
    PRIMARY KEY (`type`,`actor_id`,`target_id`,`role_id`),
    KEY `ix_actor_id` (`actor_id`)
  ) ENGINE=InnoDB DEFAULT CHARSET=utf8
  1 row in set (0.00 sec)

  iii) some tables have unique keys defined on varchar(255) columns:

  Create Table: CREATE TABLE `role` (
    `id` varchar(64) NOT NULL,
    `name` varchar(255) NOT NULL,
    `extra` text,
    PRIMARY KEY (`id`),
    UNIQUE KEY `name` (`name`)
  ) ENGINE=InnoDB DEFAULT CHARSET=utf8

  iv) the generated public id for (user,domain) entities is currently 64
  hex chars, while only 32 hex chars are needed to ensure uniqueness up
  to 16**16=2**64=10**19 entries, which should be more than sufficient
  for any practical installation.

  In order to remedy these problems, I propose the following
  improvements:

  i) replace the varchar(64) hex primary key by an auto-incremented
  integer(4) column. This will hold up to 4 billion records and will
  greatly reduce the  storage requirements and improve query
  performance.

  ii) reduce the generated public id for (user, domain) entities to 32
  hex chars, stored in binary form as two bigint(8) columns.

  iii) reduce the "name" field length to more manageable length or
  reduce index size using a hash function.

To manage notifications about this bug go to:
https://bugs.launchpad.net/keystone/+bug/1524124/+subscriptions


Follow ups