multi-touch-dev team mailing list archive

Re: Combinatorial touch braindump


On Thu, 2011-05-26 at 12:33 +0300, Jussi Pakkanen wrote:

> One of the issues we have been examining is the question of gesture
> combinatorics. The basic problem is that with, say, 5 touches you get
> the global gesture, 10 two finger gestures, 10 three finger gestures and
> 5 four finger gestures. The problem is which one of these should be
> evaluated and provided to the client.

I'm not clear on what "the global gesture" is here.

It's true that in theory you get a combinatorial explosion as the number
of touches grows.  The reality is that applications only subscribe to a
subset of gestures, and the number of touch combinations involved is
quite small.

For example, Unity is only interested in 3- and 4-touch gestures.  With
5 possible touches, that's 15 possible gestures.  15 is hardly an
unwieldy number.
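
For anyone who wants to check the count, here is a quick arithmetic sketch
in Python.  The function name gesture_combinations is made up for
illustration and is not part of any gesture API; it only counts touch
subsets of the subscribed sizes.

```python
from math import comb

def gesture_combinations(active_touches, subscribed_sizes):
    """Count the touch subsets of each subscribed size among the active touches."""
    return {k: comb(active_touches, k) for k in subscribed_sizes}

# A client that only subscribes to 3- and 4-touch gestures, with 5 touches down.
counts = gesture_combinations(5, (3, 4))
total = sum(counts.values())
print(counts, total)
```

The total is the sum of the 3-touch and 4-touch subsets, which is the
figure quoted above.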

> The preliminary plan is that each application is always provided with
> the global gesture and all two touch pairs. Or maybe all pairs, I don't
> know these details. Could someone please provide the current facts on
> this? The client then filters the incoming events and only takes the
> ones he cares about.

No, the preliminary plan is that applications are provided with every
possible subscribed and recognized gesture and component touches.  If
the application asks for a particular gesture, they need to get the
data. No second guessing what the developer's intent was.  There is no
concept of "the global gesture" that I am aware of, unless you are
referring to simply the current touch frame.

> The advantages of this scheme is that the client does not need to
> communicate with the server producing zero round trips.

The _point_ of that scheme is that round trips are not required.

> The disadvantage is that this produces a combinatorial explosion. If the
> amount of touches is low this is still manageable. However suppose there
> are 15 touches (Apple Magic Trackpad supports up to 32 I think), this
> means 105 two pair touches (and 455 three pair touches and 1365 four
> pair touches, but let's ignore those). The hardware produces
> measurements every 10 ms and assuming events are not combined (are
> they?), this means up to 10 500 gesture events every second. Assuming
> one event takes 20 bytes, that gives roughly 205 kB/sec data rate.

The hardware is producing those data regardless.  If the bottom half of
the device driver is eating too much bandwidth it needs to be fixed.  If
the evdev layer in the kernel needs to consolidate events, it should be
fixed.  If the gesture recognizer should be consolidating frames or
gesture events, it should.  None of that should affect the data transfer
design since they are orthogonal to it.

The gesture recognizer has to process all combinations of touches in
parallel anyway, since it doesn't know which gesture it is going to
recognize until it has recognized one.

The amount of data transferred between the recognizer and the
application(s) in each gesture event is not significant.  The limiting
factor is the number of data transfers, not the size of the data.
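
To put numbers on the quoted 15-touch example, here is a rough sketch.  The
10 ms frame interval and 20 bytes per event are the assumptions from the
quoted mail, not measured values, and pair_event_load is a made-up helper
name.  It assumes one event per touch pair per frame with no combining.

```python
from math import comb

def pair_event_load(touches, frame_interval_ms=10, bytes_per_event=20):
    """Return (pairs, events per second, bytes per second) for two-touch events."""
    pairs = comb(touches, 2)
    frames_per_second = 1000 // frame_interval_ms
    events_per_second = pairs * frames_per_second
    bytes_per_second = events_per_second * bytes_per_event
    return pairs, events_per_second, bytes_per_second

pairs, events, rate = pair_event_load(15)
print(pairs, events, rate, round(rate / 1024))
```

The byte rate is small by any modern standard; the events-per-second
figure is the one worth watching.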

Round trips are inherently racy and time consuming.

> Is that a lot? I'm not sure. Anyone with mobile experience want to weigh
> in?
>
> I thought about this issue and came up with the following. It is more of
> an explorative evaluation and not a concrete plan. It also ignores most
> or all of what the implementation currently does, so some parts might
> not be feasible. Consider this a nudge to start the ball rolling.
>
> Design goal
>
> The system should do common case automatically. Uncommon cases should be
> possible and mostly straightforward.
>
> Assumptions
>
> In order to keep this analysis down to earth I make some assumptions on
> usage.
>
> Most applications only have one gesture they care about. This is the
> common case where only the (window-)global gesture matters. These sorts
> of applications include EoG, Evince etc. Pairwise gestures have no
> semantic meaning on these applications.

This is a big assumption and mostly incorrect.  The assumption is that
all applications will care about all two-touch gestures within their
bounding area (windows).  Unity cares about all 3- and 4-touch gestures.
Some applications may care about all one-touch gestures (ie. just MT

> In applications that do want pairwise gestures, only a small subset of
> all pairs is meaningful. Suppose an application that has four
> independent pinch-to-zoom areas. That means up to eight touch points,
> with a total of 28 combinations. Only 4 of these (14%) are used. The
> others are meaningless. In mathematic terms this means that of the
> O(N^2) combinations only O(N) are used.

But the gesture recognizer does not know which combinations are going to
be used.  It has to send them all.  The application can then be a good
citizen and reject the unused ones, but the recognizer cannot rely on
all applications being good citizens.  Developers are free to write bad
applications that can bring the system to its knees.
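
As a sketch of what "send them all, reject the unused ones" looks like for
the quoted four-zoom-area example: the touch-to-area mapping and the
function names below are hypothetical, invented only to illustrate the
idea, and do not correspond to any existing API.

```python
from itertools import combinations

# Hypothetical layout: touch id -> pinch-to-zoom area the touch landed in.
touch_area = {0: "A", 1: "A", 2: "B", 3: "B", 4: "C", 5: "C", 6: "D", 7: "D"}

def candidate_pairs(touch_ids):
    """What the recognizer has to offer: every two-touch combination."""
    return list(combinations(sorted(touch_ids), 2))

def accept(pair):
    """Application-side policy: keep a pair only if both touches share an area."""
    a, b = pair
    return touch_area[a] == touch_area[b]

offered = candidate_pairs(touch_area)
accepted = [p for p in offered if accept(p)]
rejected = [p for p in offered if not accept(p)]
print(len(offered), len(accepted), len(rejected))
```

Only the application holds the touch_area knowledge, which is why the
filtering step cannot live in the recognizer.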

Don't forget that on mobile devices, you're not likely to get 8
simultaneous touchpoints on an input device.  On a larger surface where
you could conceivably have that many touchpoints, you generally enjoy
the luxury of more and faster processors and no power consumption
constraint.  It turns out the requirements scale with the available
resources.

> The only thing that knows which touch pairs are meaningful in an
> application is the application itself. There is no reliable heuristic.

Therefore all subscribed combinations must be sent.

> Individual touches are almost always important to the application.

Are they?  I think few applications are interested in individual
touches.  Just like few applications are interested in which keys on a
keyboard are pressed:  they're interested in what text gets entered
instead.  Some applications are interested in individual touches, and we
support the Touch gesture.

> The common case
> Based on the discussion above, it seems that most applications' gesture
> needs can be fulfilled with just two pieces of information: the
> individual touches and the global gesture that those touches form.

I need a definition of 'global gesture'.  It sounds like what you're
saying is to pass the raw MT data to the application and have the
application perform the gesture recognition.  That design does not work
under the requirement of having Unity grab the 3- and 4-touch gestures
and providing a consistent feel across all applications.

> The complicated case
> This is the big one. The basic case of transferring all pairs is
> relatively simple but computing all the pairs always is a bit wasteful,
> especially since usually only a small subset is required. (In fact the
> most common case is the one above, meaning that none of the pairs is
> ever used.)
> Since the important pairs can not be reliably determined the only way to
> cut down on processing is that the app tells which pairs it cares about
> as they come and go. Something along the lines of gest_id =
> AddGestureDetection(num_touches, touch_point_array, gesture_mask). This
> adds a round trip to the server.

The app can't tell which pairs it wants until it receives them.  Ergo,
the existing design of sending all pairs and rejecting the unused ones
via a round trip.  Unless you're proposing to have the application do
all the recognition using the raw MT data, which is a nonstarter because
it does not support Unity grabbing 3- and 4-touch gestures and does not
provide the consistent feel across all applications that we are aiming
for.

Applications (or toolkits used to build applications) already subscribe
to only a subset of all possible gestures, selected by number of
touches, window, device, and gesture class.  They do not need to get all
combinations of touches for all gestures all of the time, just all those
gestures that could possibly satisfy what they asked for.
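
A minimal sketch of that kind of subscription filtering follows.  The
Subscription type, its field names, and the candidates function are all
assumptions made up for illustration; they are not the actual recognizer
data structures.

```python
from dataclasses import dataclass
from itertools import combinations

@dataclass(frozen=True)
class Subscription:
    # Hypothetical fields mirroring the selection criteria named above.
    window: int
    device: int
    gesture_class: str
    num_touches: int

def candidates(subscriptions, window, device, touch_ids):
    """Yield (subscription, touch subset) for each subset that could satisfy a subscription."""
    for sub in subscriptions:
        if sub.window != window or sub.device != device:
            continue  # not subscribed for this window/device, nothing to send
        for subset in combinations(sorted(touch_ids), sub.num_touches):
            yield sub, subset

subs = [
    Subscription(window=1, device=0, gesture_class="drag", num_touches=3),
    Subscription(window=1, device=0, gesture_class="pinch", num_touches=4),
    Subscription(window=2, device=0, gesture_class="pinch", num_touches=2),
]
# Five touches land in window 1 on device 0.
found = list(candidates(subs, window=1, device=0, touch_ids=range(5)))
print(len(found))
```

The two-touch subscription on window 2 contributes nothing here, so only
the 3- and 4-touch subsets for window 1 are ever considered.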

> If the app wants to add all gesture pairs or other crazy things, it can
> do that. And it also gets the blame when the system grinds to a halt.


> Conclusions
> There are a quite a lot of issues, such as how Unity, the X server and
> apps work together. But this mail is long enough already.

How Unity and the apps work together, and consistency of interaction
across the user experience, is the driving requirement.

Stephen M. Webb <stephen.webb@xxxxxxxxxxxxx>
Canonical Ltd.
