← Back to team overview

maria-developers team mailing list archive

Updated (by Monty): Add support for google protocol buffers (34)

 

-----------------------------------------------------------------------
                              WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Add support for google protocol buffers
CREATION DATE..: Tue, 21 Jul 2009, 21:11
SUPERVISOR.....: Monty
IMPLEMENTOR....: 
COPIES TO......: 
CATEGORY.......: Server-Sprint
TASK ID........: 34 (http://askmonty.org/worklog/?tid=34)
VERSION........: WorkLog-3.4
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0

PROGRESS NOTES:

-=-=(Monty - Thu, 11 Feb 2010, 19:59)=-=-
High Level Description modified.
--- /tmp/wklog.34.old.17915     2010-02-11 17:59:17.000000000 +0000
+++ /tmp/wklog.34.new.17915     2010-02-11 17:59:17.000000000 +0000
@@ -1,5 +1,21 @@
-Add support for Google Protocol Buffers (further GPB). It should be possible 
-to have columns that store GPB-encoded data, as well as use SQL constructs to
+Add support for dynamic columns:
+
+- A column that can hold information from many columns 
+- One can instantly add or remove column data
+
+This is a useful feature for any store type of application, where you want to
+store different type of information for different kind of items.
+
+For example, for shoes you want to store: material, size, colour, maker
+For a computer you want to store ram, hard disk size etc...
+
+In a normal 'relational' system you would need to a table for each type.
+With dynamic columns you have all common items as fixed fields (like
+product_code, manufacturer, price) and the rest stored in a dynamic column.
+
+The proposed idea is to store the dynamic information in a blob in
+Google Protocol Buffers (further GPB) format and use SQL constructs to
 extract parts of GPB data for use in select list, for filtering, and so forth.
+
 Any support for indexing GPB data is outside of scope of this WL entry.
 

-=-=(Knielsen - Fri, 22 Jan 2010, 11:38)=-=-
Low Level Design modified.
--- /tmp/wklog.34.old.29965     2010-01-22 11:38:57.000000000 +0200
+++ /tmp/wklog.34.new.29965     2010-01-22 11:38:57.000000000 +0200
@@ -2,3 +2,12 @@
   and a parser for text form of .proto file which then exposes the parsed
   file via standard GPB message navigation API.
 
+* We should have both server-side support and client-side support (client side
+  means functions in libmysqlclient so that user can select the full BLOB and
+  extract fields in the application).
+
+* Add some kind of header to the GPB blob to support versioning and future
+  extensibility.
+
+* Add complete syntax description (update, add, drop, exists, ...).
+

-=-=(Psergey - Tue, 21 Jul 2009, 21:13)=-=-
Low Level Design modified.
--- /tmp/wklog.34.old.6462      2009-07-21 21:13:13.000000000 +0300
+++ /tmp/wklog.34.new.6462      2009-07-21 21:13:13.000000000 +0300
@@ -1 +1,4 @@
+* GPB tarball contains a protocol definition for .proto file structure itself
+  and a parser for text form of .proto file which then exposes the parsed
+  file via standard GPB message navigation API.
 

-=-=(Psergey - Tue, 21 Jul 2009, 21:12)=-=-
High-Level Specification modified.
--- /tmp/wklog.34.old.6399      2009-07-21 21:12:23.000000000 +0300
+++ /tmp/wklog.34.new.6399      2009-07-21 21:12:23.000000000 +0300
@@ -1 +1,78 @@
 
+<contents>
+1. GPB Encoding overview
+2. GPB in an SQL database
+2.1 Informing server about GPB field names and types
+2.2 Addressing GPB fields
+2.2.1 Option1: SQL Function
+2.2.2 Option2: SQL columns
+</contents>
+
+
+1. GPB Encoding overview
+========================
+
+GBB is a compact encoding for structured and typed data. A unit of GPB data
+(it is called message) is only partially self-describing: it's possible to 
+iterate over its parts, but, quoting the spec
+
+http://code.google.com/apis/protocolbuffers/docs/encoding.html:
+  " the name and declared type for each field can only be determined on the
+  decoding end by referencing the message type's definition (i.e. the .proto 
+  file). "
+
+2. GPB in an SQL database
+=========================
+
+It is possible to store GPB data in MariaDB today - one can declare a binary
+blob column and use it to store GPB messages. Storing and retrieving entire
+messages will be the only available operations, though, as the server has no
+idea about the GPB format. 
+It is apparent that ability to peek inside GPB data from SQL layer would be of
+great advantage: one would be able to 
+- select only certain fields or parts of GPB messages
+- filter records based on the values of GPB fields
+- etc
+performing such operations at SQL layer will allow to reduce client<->server 
+traffic right away, and will open path to getting the best possible
+performance.
+
+2.1 Informing server about GPB field names and types
+----------------------------------------------------
+User-friendly/meaningful access to GPB fields requires knowledge of GPB field
+names and types, which are not available from GPB message itself (see "GPB 
+encoding overview" section).
+
+So the first issue to be addressed is to get the server to know the definition
+of stored messages. We intend to assume that all records have GPB messages
+that conform to a certain single definition, which gives one definition per
+GPB field.
+
+DecisionToMake: How to pass the server the GPB definition?
+First idea: add a CREATE TABLE parameter which will specify either the
+definition itself or path to .proto file with the definition.
+
+2.2 Addressing GPB fields
+-------------------------
+We'll need to provide a way to access GPB fields. This can be complicated as
+structures that are encoded in GPB message can be nested and recursive.
+
+2.2.1 Option1: SQL Function
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Introduce an SQL function GPB_FIELD(path) which will return contents of the
+field.  
+- Return type of the function will be determined from GPB message definition. 
+- For path, we can use XPath selector (a subset of XPath) syntax.
+
+(TODO ^ the above needs to be specified in more detail. is the selector as
+simple as filesystem path or we allow quantifiers (with predicates?)?)
+
+2.2.2 Option2: SQL columns
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+Make GPB columns to be accessible as SQL columns. 
+This approach has problems:
+- It might be hard to implement code-wise
+  - (TODO will Virtual columns patch help??)
+- It is not clear how to access fields from nested structures. Should we allow 
+  quoted names like `foo/bar[2]/baz' ? 
+



DESCRIPTION:

Add support for dynamic columns:

- A column that can hold information from many columns 
- One can instantly add or remove column data

This is a useful feature for any store type of application, where you want to
store different type of information for different kind of items.

For example, for shoes you want to store: material, size, colour, maker
For a computer you want to store ram, hard disk size etc...

In a normal 'relational' system you would need to a table for each type.
With dynamic columns you have all common items as fixed fields (like
product_code, manufacturer, price) and the rest stored in a dynamic column.

The proposed idea is to store the dynamic information in a blob in
Google Protocol Buffers (further GPB) format and use SQL constructs to
extract parts of GPB data for use in select list, for filtering, and so forth.

Any support for indexing GPB data is outside of scope of this WL entry.


HIGH-LEVEL SPECIFICATION:



<contents>
1. GPB Encoding overview
2. GPB in an SQL database
2.1 Informing server about GPB field names and types
2.2 Addressing GPB fields
2.2.1 Option1: SQL Function
2.2.2 Option2: SQL columns
</contents>


1. GPB Encoding overview
========================

GBB is a compact encoding for structured and typed data. A unit of GPB data
(it is called message) is only partially self-describing: it's possible to 
iterate over its parts, but, quoting the spec

http://code.google.com/apis/protocolbuffers/docs/encoding.html:
  " the name and declared type for each field can only be determined on the
  decoding end by referencing the message type's definition (i.e. the .proto 
  file). "

2. GPB in an SQL database
=========================

It is possible to store GPB data in MariaDB today - one can declare a binary
blob column and use it to store GPB messages. Storing and retrieving entire
messages will be the only available operations, though, as the server has no
idea about the GPB format. 
It is apparent that ability to peek inside GPB data from SQL layer would be of
great advantage: one would be able to 
- select only certain fields or parts of GPB messages
- filter records based on the values of GPB fields
- etc
performing such operations at SQL layer will allow to reduce client<->server 
traffic right away, and will open path to getting the best possible
performance.

2.1 Informing server about GPB field names and types
----------------------------------------------------
User-friendly/meaningful access to GPB fields requires knowledge of GPB field
names and types, which are not available from GPB message itself (see "GPB 
encoding overview" section).

So the first issue to be addressed is to get the server to know the definition
of stored messages. We intend to assume that all records have GPB messages
that conform to a certain single definition, which gives one definition per
GPB field.

DecisionToMake: How to pass the server the GPB definition?
First idea: add a CREATE TABLE parameter which will specify either the
definition itself or path to .proto file with the definition.

2.2 Addressing GPB fields
-------------------------
We'll need to provide a way to access GPB fields. This can be complicated as
structures that are encoded in GPB message can be nested and recursive.

2.2.1 Option1: SQL Function
~~~~~~~~~~~~~~~~~~~~~~~~~~~
Introduce an SQL function GPB_FIELD(path) which will return contents of the
field.  
- Return type of the function will be determined from GPB message definition. 
- For path, we can use XPath selector (a subset of XPath) syntax.

(TODO ^ the above needs to be specified in more detail. is the selector as
simple as filesystem path or we allow quantifiers (with predicates?)?)

2.2.2 Option2: SQL columns
~~~~~~~~~~~~~~~~~~~~~~~~~~
Make GPB columns to be accessible as SQL columns. 
This approach has problems:
- It might be hard to implement code-wise
  - (TODO will Virtual columns patch help??)
- It is not clear how to access fields from nested structures. Should we allow 
  quoted names like `foo/bar[2]/baz' ? 


LOW-LEVEL DESIGN:



* GPB tarball contains a protocol definition for .proto file structure itself
  and a parser for text form of .proto file which then exposes the parsed
  file via standard GPB message navigation API.

* We should have both server-side support and client-side support (client side
  means functions in libmysqlclient so that user can select the full BLOB and
  extract fields in the application).

* Add some kind of header to the GPB blob to support versioning and future
  extensibility.

* Add complete syntax description (update, add, drop, exists, ...).


ESTIMATED WORK TIME

ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)