
software-store-developers team mailing list archive

Re: How to implement "Recommended for you"

 

Hi,

I commented on the individual parts of the mail below too, but it
seems like we should discuss at a high level what our recommendation
data sources are and where they come from. 

Here is my list, additions welcome :)

The current data about other people we have is:
- what all other people have installed (new recommender service/popcon)
- what all other people are using (zeitgeist/new recommender service/popcon)
- what specific apps other people like or dislike (rnr)

The data we have about the user's system is:
- what apps the user has installed
- what apps the user is using (popcon/zeitgeist)
- what mimetypes the user is working with (zeitgeist)
- *maybe* the SSO ID of the user
- *maybe* what apps the user likes (based on his/her reviews)
(- the user's contacts)

There is a certain overlap with popcon, so we should consider reusing
parts of it, and parts of the raw data we already have, in this new
system.

Our review-based data will be relatively small because people have to
write a full review in order to "rate" an app. Having a lower threshold
here in the form of just "like/dislike this app" or "1-5 stars"
(without a review) would generate more data that we could use for the
purpose of good recommendations.

On Tue, Jun 14, 2011 at 11:02:51AM +0100, Matthew Paul Thomas wrote:
[..]
> >                                                              I know
> > there was some exciting conversation at UDS about tapping RnR and
> > contacts to make suggestions but when would this likely be ready?
> >...
> 
> I guess in some cases basing it on contacts would produce results better
> than basing it on Ubuntu users in general. For example, if you had a lot
> of classmates or colleagues as contacts ("oh, everyone else here is
> playing Hedgewars and chatting on Mumble, maybe I should too").
> 
> To start with, though, let's design recommendations assuming that we can
> use ratings but not contacts yet. We can always juice them up with
> contacts later.

Contacts are an interesting idea. I think this needs some more
discussion as I see some challenges here:
- Diversity of the contacts. I have in my contacts my family, my
  friends and my co-workers and more people I know but don't interact
  much with. Their interests and computer habits are very diverse, I
  really wonder if that will give me anything better than
  recommendations on the whole s-c user population. We could use
  "friends" or "favorite contacts" instead (which is also not quite
  right but probably closer)
- Privacy. We need to be careful with this feature: if a user has only
  very few contacts, it could be used to gather data about the apps
  those contacts have installed. We either need to make this opt-in or be
  very careful about leaking information. The nature of the data is
  not that sensitive so we may well be fine, but we need to take it
  into consideration.
- Technical: the server will have to know the user's contacts
  (ubuntuone or uploading when the feature is activated) and the
  server will have to match Ubuntu SSO IDs to the applist of the given
  user. This will exclude users without an Ubuntu SSO account.


> On Friday I identified six problems to solve for each of these new
> features: generation, storage, serving, displaying, caching, and fallback.
> 
> So, here's a straw-man sketch of what those might look like for
> recommendations. Please, everyone pick holes in it. :-)

Your suggestions below apparently mix "generation of recommends" and
"storing the app list" into a single task. I think it's easier to
discuss them as two separate tasks, especially if we consider
reusing popcon for parts of it.

> Generation
> 
>     By default, the "Recommended for you" box contains only a "Turn On
>     Recommendations" button, and an explanation that turning them on
>     will submit data about what software you have installed. When you
>     turn it on, USC securely submits to the server a list of all the
>     packages you have installed, together with a UUID and (if you're
>     signed in) your SSO ID to link with your ratings. So you don't need
>     to sign in to an SSO account to get recommendations, you just need
>     to click one button.

I like the idea of this very prominent way of turning this feature
on as we need as many users as possible to make good recommendations.

In the context of the UUID we will have to think some more about:
- users with multiple machines
- users who reinstall their machine

So we should probably do a periodic "ping" with the UUID (a ping to
tell the server that the system is still in use, even if it does not
install/remove software) to be able to remove no longer valid UUIDs
over time.
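
To make the ping idea concrete, here is a minimal server-side sketch.
Everything in it is an assumption for discussion: the function names,
the in-memory dict and the 90 day window are made up, and the real
storage and policy are up to whoever implements the server.

```python
from datetime import datetime, timedelta

# Hypothetical retention window; the real value is a server policy decision.
MAX_SILENCE = timedelta(days=90)


def record_ping(last_seen, uuid, now):
    """Remember that the machine identified by uuid is still in use."""
    last_seen[uuid] = now


def prune_stale_uuids(last_seen, now, max_silence=MAX_SILENCE):
    """Drop UUIDs that have been silent for longer than max_silence.

    Returns the sorted list of UUIDs that were removed.
    """
    stale = [u for u, seen in last_seen.items() if now - seen > max_silence]
    for u in stale:
        del last_seen[u]
    return sorted(stale)
```

A reinstalled machine would simply stop pinging with its old UUID, so
the old entry ages out without any explicit "unregister" call.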

>     Whenever you submit the data (which is whenever you install,
>     remove, or rate anything subsequently), 

Given that the result of the recommends does also depend on the other
users we should probably re-generate periodically even if your system
does not change. How this needs to be cached will depend on the
complexity of the job. This is something that we need to discuss with
ISD and the people implementing this on the server side.

> the server uses a
>     recommender algorithm
>     <http://en.wikipedia.org/wiki/Recommendation_system#Algorithms> to
>     identify the ~50 packages you don't have installed that you're
>     most likely to rate as excellent.
> <http://cacm.acm.org/blogs/blog-cacm/22925-what-is-a-good-recommendation-algorithm/fulltext>

The acm.org link does not work for me, it gives me an error that it
can't connect.

I guess you added the wikipedia link because this is intended for the
spec. If that is the case I think we should just leave that out for
now. People who google "recommender algorithm" will get it as the
first hit anyway, and for the spec it will IMO only be helpful once we
have decided which algorithm to use (which is something we need to
discuss with the ISD team).

[..]
> Serving
> 
>     When sent a request containing the UUID, the server returns a
>     space-separated ordered list of packages representing the
>     recommendations for that UUID.

We will use something more standard like JSON for this. What exactly
the API should look like is something we need to discuss with the ISD
team that will actually implement it.

But at a very high level it is indeed: "There will be a REST API
call that involves the UUID and that will return the recommendations
in some format that s-c can understand".
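
Just to illustrate what "some format that s-c can understand" could
mean on the client side, here is a small sketch of parsing a JSON
reply. The response shape (a "recommendations" key holding an ordered
list of package names) and the function name are purely hypothetical,
the real API is for the ISD team to define.

```python
import json


def parse_recommendations(body):
    """Return the ordered list of package names from a JSON response.

    Assumed (hypothetical) shape:
        {"uuid": "...", "recommendations": ["pkg1", "pkg2", ...]}
    Raises ValueError if the body is not valid JSON or has another shape.
    """
    data = json.loads(body)  # raises a ValueError subclass on bad JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    recs = data.get("recommendations", [])
    if not isinstance(recs, list):
        raise ValueError("'recommendations' must be a list")
    return [str(pkg) for pkg in recs]
```

Raising a single exception type for anything unparseable makes it easy
for the caller to fall back to whatever is shown without
recommendations.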

> Displaying
> 
>     USC asks the server for updated recommendations whenever the
>     feature is turned on and, since it last requested them:
>     -   you've installed or removed anything

When this happens the "my-installed-apps" list needs to be updated on
the server and the recalculation of the recommends needs to be
triggered. Depending on how long this takes we may need to poll the
server. But this needs discussion with ISD as it will depend on the
implementation.

>     -   you've rated anything

This is something we may do on the server side: when a new review is
entered, that could trigger the recalculation of the recommends on the
server. Of course it depends on whether this is on the same server as
rnrserver or not.

>     -   at least a week has passed (in case anything hot has been
>         released since then)

We can use the HTTP ETag to ask for changes more often and leave it to
the server to set the cache policy. This gives us more flexibility in
the future.
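
As a rough sketch of the ETag approach, a conditional GET with the
Python standard library could look like the following. The function
name and the "opener" parameter (only there so the request can be
faked in tests) are my own assumptions, and the URL layout is not
decided yet.

```python
import urllib.error
import urllib.request


def fetch_if_changed(url, cached_etag=None, opener=urllib.request.urlopen):
    """Fetch url, sending If-None-Match when we already hold an ETag.

    Returns (body, etag). body is None when the server answers
    304 Not Modified, meaning the cached recommendations are still valid.
    """
    request = urllib.request.Request(url)
    if cached_etag:
        # Conditional GET: the server replies 304 if its ETag still matches.
        request.add_header("If-None-Match", cached_etag)
    try:
        with opener(request) as response:
            return response.read(), response.headers.get("ETag")
    except urllib.error.HTTPError as err:
        # urlopen reports 304 as an HTTPError; treat it as "no change".
        if err.code == 304:
            return None, cached_etag
        raise
```

With this the client can ask as often as it likes, and the server
decides when the answer has actually changed.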

>     -   the cache is missing or unparseable.
[..]
> One thing I am really fuzzy on is what should happen if you use multiple
> computers for different purposes. They'll have different UUIDs, but
> you'll be signed in to the same SSO account.

A good point.

Ideally the recommendations will be different but good for both. If,
e.g., you have a gaming machine and an image manipulation machine (with
a huge screen), then it will hopefully recommend the latest games on
the one and the best photo helper tools on the other.

Cheers,
 Michael

