← Back to team overview

oqgraph-dev team mailing list archive

Re: IMDB tools

 

Do you have appropiate indexes for it?

Antony T Curtis
On Feb 26, 2013 6:00 PM, "Arjen Lentz" <arjen@xxxxxxxxxxxxx> wrote:

> Hmm somehow a typo (or old syntax) was still present in the SQL file.
>
> $ zcat actors.list.gz | pv | ./imdb2sql | mysql -u root
> --socket=/tmp/mysql.sock test
> 725MB 0:47:17
> loading the actors requires 260263 pages (or K) in the keybuffer.
>
> $ zcat actresses.list.gz | pv | ./imdb2sql | mysql -u root
> --socket=/tmp/mysql.sock test
>  427MB 0:24:42
> After this, 380564 MyISAM index pages in the key buffer.
>
> So key_buffer_size=400M for the above two operations is sufficient.
>
> mysql> insert into actors_movies_actors select
> am1.actor_movie_id,am2.actor_id from actors_movies am1 join actors_movies
> am2 on (am1.movie_id=am2.movie_id AND am1.actor_id != am2.actor_id);
> Query OK, 169301244 rows affected (1 hour 39 min 7.58 sec)
> Records: 169301244  Duplicates: 0  Warnings: 0
>
> MariaDB [test]> insert ignore into actors_graph_base (origid,destid)
> select am.actor_id,ama.actor_id from actors_movies_actors ama join
> actors_movies am using (actor_movie_id);
> Query OK, 149853110 rows affected, 65535 warnings (1 hour 51 min 19.86 sec)
> Records: 169301244  Duplicates: 19448134  Warnings: 19448134
>
> Ok so a simple actor -> actor graph is 149 million vertices.
>
> Somehow the system doesn't like the actors_graph table, getting errors
> (not found, error 2) no matter what I do. When I name it ag it's fine, with
> the basetable named agb.
> Single underscores have been used in other tests, so I dunno what's going
> on there.
>
>
> ----- Original Message -----
> > From: "Arjen Lentz" <arjen@xxxxxxxxxxxxx>
> > To: oqgraph-dev@xxxxxxxxxxxxxxxxxxx
> > Sent: Tuesday, 26 February, 2013 10:06:51 AM
> > Subject: [Oqgraph-dev] IMDB tools
> > Hi all
> >
> > Here is my old IMDB conversion tool (C code), and the schema I used.
> > It uses only Actors and Movies, filtering out TV series and DVDs.
> >
> > 1484084 actors \_ actors
> > 866799 actresses /
> > 574006 movies
> > 5900319 actors_movies
> >
> > actors.list.gz and actresses.list.gz files from
> > ftp://ftp.fu-berlin.de/pub/misc/movies/database/
> > (via http://www.imdb.com/interfaces)
> >
> > My laptop is still building the actors_movies_actors table so I don't
> > yet have a total for that, but from memory I think it was around 45
> > million edges.
> >
> > Currently I load using MyISAM with delay key writes and large key
> > buffer size.
> > We can possibly optimise the load methodology a bit further.
> >
> >
> > Cheers,
> > Arjen.
> > --
> > Arjen Lentz, Exec.Director @ Open Query (http://openquery.com)
> > Australian peace of mind for your MySQL/MariaDB infrastructure.
> >
> > Follow us at http://openquery.com/blog/ & http://twitter.com/openquery
> >
> >
> > --
> > Mailing list: https://launchpad.net/~oqgraph-dev
> > Post to : oqgraph-dev@xxxxxxxxxxxxxxxxxxx
> > Unsubscribe : https://launchpad.net/~oqgraph-dev
> > More help : https://help.launchpad.net/ListHelp
>
> --
> Arjen Lentz, Exec.Director @ Open Query (http://openquery.com)
> Australian peace of mind for your MySQL/MariaDB infrastructure.
>
> Follow us at http://openquery.com/blog/ & http://twitter.com/openquery
>
>
> --
> Mailing list: https://launchpad.net/~oqgraph-dev
> Post to     : oqgraph-dev@xxxxxxxxxxxxxxxxxxx
> Unsubscribe : https://launchpad.net/~oqgraph-dev
> More help   : https://help.launchpad.net/ListHelp
>
>

Follow ups

References