dubuntu-team team mailing list archive

Re: API and Database specification

To: "Jay I." <jay.27182818@xxxxxxxxx>
From: Charl Wentzel <charl.wentzel@xxxxxxxxxxxxxx>
Date: Mon, 27 Jul 2009 09:32:55 +0200
Cc: dubuntu-team@xxxxxxxxxxxxxxxxxxx
In-reply-to: <1248526286.5207.26.camel@jay>
Reply-to: charl.wentzel@xxxxxxxxxxxxxx
Sender: Charl Wentzel <wentzel.charl@xxxxxxxxx>
Thanks Jay

I'll comment below...

> i was very busy with some urgent work and, sad to say, spent absolutely
> no time on coding. the database description will take a lot of space so
> i'll post it in another letter. in this one i'd like to describe the
> current state of the api:

Same here.  I'm not trying to get you in trouble at work, but I do want
to prevent another 6 month break in the project. So make a living first
and use (some of) your spare time on the project.  I'm working under
similar constraints.

Ok, after looking at your api, it's clear we agree on one thing:  The
API is a front-end for the local database.  It controls all interaction
with the database, such as extracting, changing, updating data.  It
should also take care of remote updates (from the main website) as well
as fetching/posting saved environments and any contributions (i.e.
categorisation of packages).

The library and its API should simply give developers of front-ends the
tools they need to do whatever they need... and keep them as far away
from the actual database as possible.  We really don't want some
"clever" ____ to screw up the database.

Is this correct?

> == s_db* db_open(void);
> 
> opens a connection and populates 's_db' structure. currently it doesn't
> accept any arguments. it will accept connection related parameters and a
> language identifier that will be used for fetching localized data from
> the db.

Good

> == bool db_close(s_db* db);
> 
> closes a previously open connection and frees memory allocated for s_db
> instance. btw the 's_' prefix is part of my naming convention. it allows
> me to think less about possible namespace collisions as well as define
> variables/fields/etc with more natural names, e.g.:

Happy!  I agree with a good naming convention to overcome namespace
issues.

> struct s_book{
>   s_text text;
> };
> 
> i don't insist on using it but in my code you can find it everywhere so
> i thought a little explanation would be necessary.

Not sure what this is for, so I'll wait for your explanation.

> == size_t db_exists(s_db* db, const char* path, bool* is_cat);
> 
> checks whether a path exists in the database. the format for path is
> this:
> 
> path = '/' | ( '/' name ){1,}
> 
> examples:
> /
> /development/desktop
> /multimedia/audio/players/realplayer
> 
> and this is how the function  works: it creates a sha1 hash from the
> path, and fetches a record that matches the hash from a special table
> called 'rels' (short for relations). the matched record (if any)
> contains a package/category id and a boolean field 'is_cat' telling
> whether the record corresponds to a category or package. the parameter
> is_cat if not NULL accepts the value of this field and id is returned
> from the function.

Good!  I've never used the technique of a hash to find items in a tree!
This is a new technique to me, but it does sound very efficient.  I'd
like to look into it for some of my projects.  Do you have a link to
some documentation on this technique?  Or is this your own idea?  I like
it.
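
To check my understanding of the lookup described above, here is a minimal
sketch in C: validate the path against the grammar, hash it, and match the
hash against a 'rels'-style table.  Everything in it is an assumption for
illustration only.  The names s_rel, path_is_valid, path_hash and
rels_lookup are hypothetical, the table is a plain in-memory array rather
than a real database table, returning 0 for "not found" is my guess, and a
64-bit FNV-1a hash stands in for SHA-1 purely to avoid external libraries.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one row of the 'rels' table. */
typedef struct {
    uint64_t hash;   /* hash of the full path */
    size_t   id;     /* package or category id */
    bool     is_cat; /* true if the row refers to a category */
} s_rel;

/* Check a path against: path = '/' | ( '/' name ){1,}
   Assumption: a name is one or more characters other than '/'. */
bool path_is_valid(const char *path)
{
    if (path == NULL || path[0] != '/')
        return false;
    if (path[1] == '\0')
        return true; /* the root path "/" */
    for (size_t i = 0; path[i] != '\0'; i++) {
        /* every '/' must be followed by at least one name character */
        if (path[i] == '/' && (path[i + 1] == '/' || path[i + 1] == '\0'))
            return false;
    }
    return true;
}

/* 64-bit FNV-1a, used here only as a stand-in for SHA-1. */
uint64_t path_hash(const char *path)
{
    uint64_t h = 0xcbf29ce484222325ULL; /* FNV offset basis */
    for (; *path != '\0'; path++) {
        h ^= (unsigned char)*path;
        h *= 0x100000001b3ULL;          /* FNV prime */
    }
    return h;
}

/* Return the id stored for path, or 0 if the path is invalid or unknown
   (the 0 convention is an assumption).  is_cat is filled in only when it
   is not NULL and a row was found. */
size_t rels_lookup(const s_rel *rels, size_t count,
                   const char *path, bool *is_cat)
{
    if (!path_is_valid(path))
        return 0;
    uint64_t h = path_hash(path);
    for (size_t i = 0; i < count; i++) {
        if (rels[i].hash == h) {
            if (is_cat != NULL)
                *is_cat = rels[i].is_cat;
            return rels[i].id;
        }
    }
    return 0;
}
```

In the real thing the loop would of course be a single indexed query on the
hash column, which is where I'd expect the efficiency to come from.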

> == s_db_item* db_read(s_db* db, const char* path);
> 
> this function just fetches a record from either 'packs' or 'cats' table.
> it uses db_exists to obtain an id and populates a s_db_item structure
> with information from a corresponding record. s_db_item is used for both
> categories and packages. it contains several fields that are common for
> packages and categories and a union for those that differ. if this
> function succeeds it returns a pointer to a malloced s_db_item instance
> that can be destroyed with free.

I usually use inheritance on structures... a very cool feature of C++!
So you create the inherited structure you require, but return it as the
base class.  The function receiving it can then determine what it is.  

I would suggest adding a "type" parameter here, e.g.
"category"/"package"/"all".  Just in case the caller of the function
only wants a specific type returned.
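
Roughly what I have in mind for the "type" parameter, as a sketch only.  The
enum names, the item_matches helper and the two union members (child_count,
version) are hypothetical placeholders; only the idea of a 'type' field plus
a union comes from your description.

```c
#include <stdbool.h>
#include <stddef.h>

/* What a stored item actually is. */
typedef enum { ITEM_CATEGORY, ITEM_PACKAGE } e_item_type;

/* What the caller is willing to receive. */
typedef enum { WANT_ALL, WANT_CATEGORY, WANT_PACKAGE } e_item_filter;

/* Simplified stand-in for s_db_item: common fields plus a union. */
typedef struct {
    e_item_type type;
    const char *name;
    union {
        struct { size_t child_count; } cat;   /* hypothetical field */
        struct { const char *version; } pack; /* hypothetical field */
    } u;
} s_db_item;

/* True if the item is of a type the caller asked for. */
bool item_matches(const s_db_item *item, e_item_filter want)
{
    if (item == NULL)
        return false;
    switch (want) {
    case WANT_ALL:      return true;
    case WANT_CATEGORY: return item->type == ITEM_CATEGORY;
    case WANT_PACKAGE:  return item->type == ITEM_PACKAGE;
    }
    return false;
}
```

db_read could then take an extra e_item_filter argument and return NULL
when the record found at the path does not match it.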

> == bool db_write(s_db* db, const char* path, s_db_item* item);
> 
> item is populated by the frontend. the function succeeds only when path
> is valid.

Great!

> == bool db_create(s_db* db, const char* path, s_db_item* item);
> 
> adds a record to either 'cats' or 'packs'. s_db_item has a field 'type'
> that tells the function which table to use.

Ok.

> == bool db_destroy(s_db* db, const char* path);
> 
> removes a record from the db.

Ok.

> == char** db_list(s_db* db, const char* path, bool** cats);
> 
> lists all items inside a category. this function fails if path
> corresponds to a package. returns a list of null-term. strings. cats if
> not NULL contains an array of bools after the function returns.

I assume this will be a linked list?  That's cool.  But instead of a
linked list and a bool array can I make an alternative suggestion:

As referred to above we have a base class:

  typedef struct s_item t_item;

  struct s_item {
    char *     name;
    bool       cat;
    t_item *   next_item;
  };

This function could populate and return only the base class.  So now you
have only one return value, the start of the populated list.

Your db_read and db_write functions could use the same structure, but
have derived structures as follows...

  typedef struct s_pack t_pack;
  typedef struct s_cat t_cat;

  struct s_cat : s_item {
    // whatever is specific to a category
  };

  struct s_pack : s_item {
    // whatever is specific to a package
  };

So most of the time you'll be working with t_item and when necessary
create one of the inherited classes.

The base structure is already a linked list.  So all the inherited
classes are automatically linked lists as well.  So your read and write
functions could also return (or accept) a list of items instead of just
one (optional of course).

PS: the reason for the typedefs is so you can easily create the
structures, e.g.:
  new_item = (t_item*)malloc( sizeof(t_item));
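
Since the rest of the API is plain C, the same idea can be sketched without
C++ inheritance by embedding the base structure as the first member of the
"derived" one.  This is only an illustration: s_item/t_item follow the
layout above, while the version field and the helpers dup_string,
list_push_cat, list_push_pack, list_length and list_free are hypothetical
names I made up for the example.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

typedef struct s_item t_item;
typedef struct s_pack t_pack;

struct s_item {
    char   *name;
    bool    cat;
    t_item *next_item;
};

/* "Derived" structure: the base is the first member, so a t_pack* can be
   handed around as a t_item* and cast back when cat is false. */
struct s_pack {
    t_item base;
    char  *version; /* hypothetical package-only field */
};

static char *dup_string(const char *s)
{
    char *copy = malloc(strlen(s) + 1);
    if (copy != NULL)
        strcpy(copy, s);
    return copy;
}

/* Push a category onto the front of the list.  Returns the new head, or
   NULL on allocation failure (the old list is left untouched). */
t_item *list_push_cat(t_item *head, const char *name)
{
    t_item *item = malloc(sizeof(t_item));
    if (item == NULL)
        return NULL;
    item->name = dup_string(name);
    if (item->name == NULL) {
        free(item);
        return NULL;
    }
    item->cat = true;
    item->next_item = head;
    return item;
}

/* Same as above, but allocates the larger package structure. */
t_item *list_push_pack(t_item *head, const char *name, const char *version)
{
    t_pack *pack = malloc(sizeof(t_pack));
    if (pack == NULL)
        return NULL;
    pack->base.name = dup_string(name);
    pack->version = dup_string(version);
    if (pack->base.name == NULL || pack->version == NULL) {
        free(pack->base.name);
        free(pack->version);
        free(pack);
        return NULL;
    }
    pack->base.cat = false;
    pack->base.next_item = head;
    return &pack->base;
}

size_t list_length(const t_item *head)
{
    size_t n = 0;
    for (; head != NULL; head = head->next_item)
        n++;
    return n;
}

/* Free the whole list.  Assumes every item with cat == false was created
   by list_push_pack and is therefore really a t_pack. */
void list_free(t_item *head)
{
    while (head != NULL) {
        t_item *next = head->next_item;
        if (!head->cat)
            free(((t_pack *)head)->version);
        free(head->name);
        free(head);
        head = next;
    }
}
```

A caller would build the list with the push functions, walk it through
next_item, and hand the head to list_free when done, so there is one
allocation and one free per item.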

What do you think?  I know a lot of people hate them, but I love linked
lists and they are one of my specialities.  Good coding overcomes memory
leaks... I hate garbage collection because it makes you lazy and sloppy.
Also the advanced string_list objects are usually unnecessary and just
bloats the binary.  But hey... that's just my opinion!

> == bool db_set_language(s_db* db, const char* language);
> 
> sets current language.

Multi-lingual!  Excellent!  I haven't even thought of that.  Although
English is not my first language it is the main official language (one
of eleven!) in our country.  We're so used to using it that having to
translate doesn't even bother most of us when using PCs.

> == bool db_get_language(s_db* db, char* language);
> 
> gets current language. language is a five character sequence:
> 
> xx-XX
> 
> e.g. en-US
> 
> == that's it for the moment. i'm gonna add functions for
> adding/removing/listing repositories and functions for explicit
> pattern-matching.

Well it is a good start.
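
One small thing on the xx-XX language format quoted above: since
db_get_language writes into a caller-supplied char*, it may be worth fixing
the buffer size and validating the string in one place.  A sketch, where
LANGUAGE_BUF_SIZE and language_is_valid are hypothetical names, and where I
am assuming two lowercase letters, a hyphen and two uppercase letters
(the description only says "five character sequence"):

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Buffer size a caller needs for "xx-XX" plus the terminating NUL. */
#define LANGUAGE_BUF_SIZE 6

/* True if language looks like "en-US": two lowercase letters, '-',
   two uppercase letters, and nothing after that.  The && chain stops at
   the first mismatch, so it never reads past the end of a short string. */
bool language_is_valid(const char *language)
{
    if (language == NULL)
        return false;
    return islower((unsigned char)language[0])
        && islower((unsigned char)language[1])
        && language[2] == '-'
        && isupper((unsigned char)language[3])
        && isupper((unsigned char)language[4])
        && language[5] == '\0';
}
```

db_set_language could reject anything that fails this check, and
db_get_language could document that the caller must pass a buffer of at
least LANGUAGE_BUF_SIZE bytes.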

> p.s.
> as we haven't come in terms on several quite important subjects yet i
> think now we should concentrate on working with packages and package
> categories and leave search categories/package groups/etc for future. if
> everyone will decide to go his own way then at least we will have some
> common codebase that can be used in both our approaches.

I say we give it a go anyway.  As you have remarked yourself, our
approaches have a lot of similarities.  In fact the differences are in how
the data is presented in a gui not in how it is stored or searched in
the back-end.  I think we'll find it fairly easy to get a common
database and api!

Think about it.  Whether you have your "big tree" approach or my "split
trees" and "separate results", you still need to tell the API which
"search categories" you wish to apply and what the "search strings"
would be.  Even the search results function could cater for both our
approaches by adding the "type" parameter I've mentioned.

The gui/cli front-end will just apply the api slightly differently to
cater for its specific needs.

Isn't that the whole idea?  I can guarantee you that there will be
others with other ideas on what the gui should look like.  So let them
create their own versions!  

The critical factor is that we MUST use the same core library (db and
API)!!! 

Let me know what you think.

Regards
Charl
References

API and Database specification
From: Charl Wentzel, 2009-07-23
Re: API and Database specification
From: Jay I., 2009-07-25