
multi-touch-dev team mailing list archive

Re: Combinatorial touch braindump


On Thu, 2011-05-26 at 09:24 -0400, Stephen M. Webb wrote:

> I'm not clear on what "the global gesture" is here.

Global gesture was perhaps a poor choice of words. Maybe I should have
said something like "combined gesture". But what I meant was this:

The gesture formed by all touches that are in my area of interest
(usually drawing area or whole window).

So if there are two touches, that means the two finger gesture(s). If
there are three, only the three finger ones, but _not_ the pairwise two
finger ones.

> It's true that in theory you get a combinatorial explosion on the number
> of touches.  The reality is that applications only subscribe to a subset
> of gestures and the number of touch combinations involved is quite
> tractable.

But the question is how far we need to scale. Is it OK to freeze your
entire system just by touching your touchpad with 20 points, which
produces over 6000 touch combinations (190 pairs, 1140 triples and 4845
quadruples) for 2, 3 and 4 touch gestures?
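The 20-touch figure is a sum of binomial coefficients: n touches yield
C(n, k) distinct k-touch subsets, so a recognizer that considers every
combination for 2, 3 and 4 touch gestures has to track
C(20, 2) + C(20, 3) + C(20, 4) of them. A quick check in Python (an
illustration only, not code from any gesture stack; `subset_counts` is a
hypothetical helper):

```python
from math import comb

def subset_counts(num_touches, sizes=(2, 3, 4)):
    """Number of distinct touch subsets per gesture size, assuming the
    recognizer considers every combination of the active touches."""
    return {k: comb(num_touches, k) for k in sizes}

counts = subset_counts(20)
print(counts)                # {2: 190, 3: 1140, 4: 4845}
print(sum(counts.values()))  # 6175
```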

> It sounds like what you're
> saying is to pass the raw MT data to the application and have the
> application perform the gesture recognition.

No, absolutely not.

> The hardware is producing those data regardless.  If the bottom half of
> the device driver is eating too much bandwidth it needs to be fixed.  If
> the evdev layer in the kernel needs to consolidate events, it should be
> fixed.  If the gesture recognizer should be consolidating frames or
> gesture events, it should.  None of that should affect the data transfer
> design since they are orthogonal to it.

There is a huge difference between them. The kernel produces only 15
pieces of information (x and y coordinates etc.), which it pretty much
just dumps from the device. The gesture recognizer does mathematical
analysis on all touch subsets and passes every one of those results on.
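To make the difference concrete, here is a minimal sketch of a recognizer
that analyzes every subset. The `Touch` record, the use of the centroid as
the "analysis" and the subset sizes are all assumptions for illustration;
a real recognizer would compute more (rotation, pinch, velocity). The
point is only that the input is linear in the number of touches while the
output has one entry per subset.

```python
from dataclasses import dataclass
from itertools import combinations

@dataclass
class Touch:
    # Hypothetical minimal touch record; real MT events carry more fields.
    touch_id: int
    x: float
    y: float

def centroid(touches):
    n = len(touches)
    return (sum(t.x for t in touches) / n, sum(t.y for t in touches) / n)

def analyze_all_subsets(touches, sizes=(2, 3, 4)):
    """Compute a centroid for every k-touch subset, keyed by touch ids.

    One result per subset, so the output grows combinatorially."""
    results = {}
    for k in sizes:
        for subset in combinations(touches, k):
            ids = tuple(t.touch_id for t in subset)
            results[ids] = centroid(subset)
    return results

touches = [Touch(i, float(i), 0.0) for i in range(5)]
results = analyze_all_subsets(touches)
print(len(touches), "touches in,", len(results), "subset results out")
```

With five touches this prints "5 touches in, 25 subset results out"
(10 pairs, 10 triples and 5 quadruples).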

> > Most applications only have one gesture they care about. This is the
> > common case where only the (window-)global gesture matters. These sorts
> > of applications include EoG, Evince etc. Pairwise gestures have no
> > semantic meaning on these applications.
> This is a big assumption and mostly incorrect.  The assumption is that
> all applications will care about all two-touch gestures within their
> bounding area (windows).  Unity cares about all 3- and 4-touch gestures.
> Some applications may care about all one-touch gestures (i.e. just MT
> events).

These are the same thing _when the number of touches is what the
application cares about_. This leads us to the main semantic question:

Suppose an application subscribes to "two finger drag gestures". What
should it receive when the user does a four finger drag in the app's
window?

I'm not asking what happens in the current code, but rather what should
happen in a perfect world.

A) the application does not receive anything, since it subscribed only
to two finger drags and this one had four
B) it should receive six two finger drags, one for each pair of touches
(the combinatorial approach: C(4, 2) = 6)
C) something else, what?
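Option B for a four finger drag can be sketched as follows, assuming
(hypothetically) that a pair's drag is the mean motion of its two
touches; `pairwise_drags` is an illustrative name, not an existing API.
Four touches give C(4, 2) = 6 pairs, so the subscriber sees six identical
two finger drags:

```python
from itertools import combinations

def pairwise_drags(deltas):
    """Option B: report one two-finger drag per pair of touches.

    `deltas` maps a touch id to its (dx, dy) motion in this frame; the
    drag of a pair is taken to be the mean motion of its two touches."""
    drags = []
    for a, b in combinations(sorted(deltas), 2):
        dx = (deltas[a][0] + deltas[b][0]) / 2
        dy = (deltas[a][1] + deltas[b][1]) / 2
        drags.append(((a, b), (dx, dy)))
    return drags

# Four fingers all moving 10 units to the right.
four_finger_drag = {1: (10, 0), 2: (10, 0), 3: (10, 0), 4: (10, 0)}
for pair, motion in pairwise_drags(four_finger_drag):
    print(pair, motion)
```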

Obviously A can be implemented using B by filtering it in the

I'd say that the answer is "it depends".

Toolkits especially want B, because they need to split gestures into
subareas. For example, suppose an app has two scroll areas which the
user scrolls independently.
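A sketch of what a toolkit might do here: assign each touch to the scroll
area under it, so that a two finger scroll in each area becomes its own
gesture rather than part of one four finger gesture for the whole window.
`Rect`, `group_by_area` and the area names are hypothetical:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rect:
    x: float
    y: float
    w: float
    h: float

    def contains(self, px, py):
        return self.x <= px < self.x + self.w and self.y <= py < self.y + self.h

def group_by_area(touches, areas):
    """Assign each touch (id -> (x, y)) to the first area containing it.

    Touches outside every area are dropped. Each group can then be
    recognized on its own, giving one gesture per scroll area."""
    groups = {name: [] for name in areas}
    for touch_id, (px, py) in touches.items():
        for name, rect in areas.items():
            if rect.contains(px, py):
                groups[name].append(touch_id)
                break
    return groups

areas = {"left_scroll": Rect(0, 0, 100, 300), "right_scroll": Rect(100, 0, 100, 300)}
touches = {1: (20, 50), 2: (40, 60), 3: (150, 80), 4: (170, 90)}
print(group_by_area(touches, areas))  # {'left_scroll': [1, 2], 'right_scroll': [3, 4]}
```

Selecting the pair per area is exactly picking two of the six pairwise
gestures that option B would deliver, which is why toolkits prefer B.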

But on the other hand, what happens with Unity when there are five
touches? There is only ever one Unity. With five finger combinatorics it
is possible to get one four finger drag to the left and one to the
right. Should the sidebar move left? Right? Do nothing at all?
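The Unity case can be reproduced with made-up numbers. Assuming each four
finger drag is reported as the mean horizontal motion of its touches (a
hypothetical simplification), five touches moving in slightly different
directions yield C(5, 4) = 5 four finger drags, some pointing left and
some right:

```python
from itertools import combinations

def four_finger_drags(dx_by_touch):
    """Mean horizontal motion of every four-touch subset."""
    result = {}
    for subset in combinations(sorted(dx_by_touch), 4):
        result[subset] = sum(dx_by_touch[t] for t in subset) / 4
    return result

# Five touches, each with a hypothetical horizontal motion for one frame.
dx = {1: -30, 2: -10, 3: 0, 4: 10, 5: 30}
for subset, mean_dx in four_finger_drags(dx).items():
    direction = "left" if mean_dx < 0 else "right" if mean_dx > 0 else "none"
    print(subset, mean_dx, direction)
```

Here subsets (1, 2, 3, 4) and (1, 2, 3, 5) report a drag to the left,
while (1, 3, 4, 5) and (2, 3, 4, 5) report one to the right.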

What all this boils down to is that there are two different use cases:
one where the gesture is only meaningful when the total number of
touches (in the area of interest) is exactly as specified, and one
where it is not. Do we want to make the users always do the state
tracking and filtering to transform the latter into the former?
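The two use cases map onto two delivery modes. In this sketch
(hypothetical function names, not an existing API), option A is built by
filtering option B, and the filter needs one piece of state the subset
stream does not carry: the total number of touches in the area of
interest. That is the state tracking the application would otherwise
have to do itself.

```python
from itertools import combinations

def subset_gestures(touch_ids, wanted):
    """Option B: one gesture per `wanted`-sized subset of the touches."""
    return list(combinations(sorted(touch_ids), wanted))

def exact_count_gestures(touch_ids, wanted):
    """Option A, implemented by filtering option B.

    Needs the total touch count in the area of interest. When it matches,
    option B yields exactly one subset: the combined gesture."""
    if len(touch_ids) != wanted:
        return []
    return subset_gestures(touch_ids, wanted)

four_touches = [1, 2, 3, 4]
print(len(subset_gestures(four_touches, 2)))  # 6
print(exact_count_gestures(four_touches, 2))  # []
print(exact_count_gestures([1, 2], 2))        # [(1, 2)]
```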