multi-touch-dev team mailing list archive

Combinatorial touch braindump


Hi all

One of the issues we have been examining is the question of gesture
combinatorics. The basic problem is that with, say, 5 touches you get
the global gesture, 10 two-finger gestures, 10 three-finger gestures and
5 four-finger gestures. The question is which of these should be
evaluated and provided to the client.
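
To make the counts above concrete, here is a small Python sketch. It is
not taken from the current implementation, and the name subset_counts is
made up for illustration.

```python
from math import comb

def subset_counts(num_touches, min_size=2):
    """Map each subset size k to the number of k-touch subsets.

    Sizes run from min_size up to num_touches - 1; the full set of
    touches is the global gesture and is not counted here.
    """
    return {k: comb(num_touches, k) for k in range(min_size, num_touches)}

print(subset_counts(5))  # {2: 10, 3: 10, 4: 5}
```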

The preliminary plan is that each application is always provided with
the global gesture and all two touch pairs. Or maybe all pairs, I don't
know these details. Could someone please provide the current facts on
this? The client then filters the incoming events and only takes the
ones it cares about.
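
As a rough illustration of that client-side filtering, here is a Python
sketch. The GestureEvent type and the filter_events helper are
assumptions made up for this example; they do not describe the actual
event format.

```python
from typing import FrozenSet, NamedTuple

class GestureEvent(NamedTuple):
    touches: FrozenSet[int]  # ids of the touches forming this gesture
    kind: str                # hypothetical label, e.g. "pinch"

def filter_events(events, wanted):
    """Keep only the events whose touch set the client cares about."""
    return [e for e in events if e.touches in wanted]

# The server sends every pair formed by touches 0, 1 and 2 ...
incoming = [
    GestureEvent(frozenset({0, 1}), "pinch"),
    GestureEvent(frozenset({0, 2}), "pinch"),
    GestureEvent(frozenset({1, 2}), "pinch"),
]
# ... but this client only cares about the pair (0, 1).
kept = filter_events(incoming, wanted={frozenset({0, 1})})
print(len(incoming), len(kept))  # 3 1
```

No round trip is needed, but two of the three events were computed and
sent for nothing.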

The advantage of this scheme is that the client does not need to
communicate with the server, so there are zero round trips.

The disadvantage is that this produces a combinatorial explosion. If the
number of touches is low this is still manageable. However, suppose there
are 15 touches (Apple Magic Trackpad supports up to 32 I think). This
means 105 two-touch pairs (and 455 three-touch and 1365 four-touch
combinations, but let's ignore those). The hardware produces
measurements every 10 ms and, assuming events are not combined (are
they?), this means up to 10 500 gesture events every second. Assuming
one event takes 20 bytes, that gives a data rate of 210 000 bytes/sec,
or roughly 205 KiB/sec.
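
The arithmetic can be reproduced with a few lines of Python. The 10 ms
sample interval and the 20 byte event size are the assumptions stated
above, not measured values, and pair_event_rate is a name invented for
this sketch.

```python
from math import comb

def pair_event_rate(num_touches, sample_interval_ms=10, event_bytes=20):
    """Return (pairs, events per second, bytes per second).

    Assumes one event per pair per hardware sample and a sample
    interval that divides 1000 ms evenly.
    """
    pairs = comb(num_touches, 2)
    events_per_sec = pairs * (1000 // sample_interval_ms)
    bytes_per_sec = events_per_sec * event_bytes
    return pairs, events_per_sec, bytes_per_sec

pairs, events, rate = pair_event_rate(15)
print(pairs, events, rate)  # 105 10500 210000
```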

Is that a lot? I'm not sure. Anyone with mobile experience want to weigh
in?

I thought about this issue and came up with the following. It is more of
an exploratory evaluation than a concrete plan. It also ignores most
or all of what the implementation currently does, so some parts might
not be feasible. Consider this a nudge to start the ball rolling.

Design goal

The system should handle the common case automatically. Uncommon cases
should be possible and mostly straightforward.


In order to keep this analysis down to earth I make the following
assumptions.

Most applications only have one gesture they care about. This is the
common case where only the (window-)global gesture matters. These sorts
of applications include EoG, Evince etc. Pairwise gestures have no
semantic meaning in these applications.

In applications that do want pairwise gestures, only a small subset of
all pairs is meaningful. Consider an application that has four
independent pinch-to-zoom areas. That means up to eight touch points,
with a total of 28 pairs. Only 4 of these (14%) are used. The
others are meaningless. In mathematical terms this means that of the
O(N^2) combinations only O(N) are used.
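
The four-area example can be checked with a short Python sketch. The
touch ids and area labels are invented for illustration; the point is
that the mapping from touches to areas is something only the
application has.

```python
from itertools import combinations

# Hypothetical application state: which zoom area each touch landed in.
touch_area = {0: "A", 1: "A", 2: "B", 3: "B",
              4: "C", 5: "C", 6: "D", 7: "D"}

all_pairs = list(combinations(sorted(touch_area), 2))
# A pair is meaningful only if both touches are in the same area.
meaningful = [(a, b) for a, b in all_pairs
              if touch_area[a] == touch_area[b]]

print(len(all_pairs), len(meaningful))  # 28 4
```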

The only thing that knows which touch pairs are meaningful in an
application is the application itself. There is no reliable heuristic.

Individual touches are almost always important to the application.

The common case

Based on the discussion above, it seems that most applications' gesture
needs can be fulfilled with just two pieces of information: the
individual touches and the global gesture that those touches form.

The complicated case

This is the big one. The basic case of transferring all pairs is
relatively simple, but always computing all the pairs is wasteful,
especially since usually only a small subset is required. (In fact the
most common case is the one above, meaning that none of the pairs is
ever used.)

Since the important pairs cannot be reliably determined, the only way to
cut down on processing is for the app to tell the server which pairs it
cares about as they come and go. Something along the lines of gest_id =
AddGestureDetection(num_touches, touch_point_array, gesture_mask). This
adds a round trip to the server.
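
A minimal sketch of what the server-side bookkeeping for such a call
could look like, written in Python for brevity. Everything here is
hypothetical: the GestureRegistry class, the gesture mask bits and the
touch_ended hook are assumptions made for illustration, and the
num_touches argument is implied by the length of the touch list.

```python
from itertools import count

# Hypothetical gesture mask bits; none are defined in this mail.
GESTURE_DRAG = 1 << 0
GESTURE_PINCH = 1 << 1
GESTURE_ROTATE = 1 << 2

class GestureRegistry:
    """Table of the touch subsets a client has asked the server to evaluate."""

    def __init__(self):
        self._next_id = count(1)
        self._subscriptions = {}  # gest_id -> (frozenset of touch ids, mask)

    def add_gesture_detection(self, touch_points, gesture_mask):
        # Stand-in for AddGestureDetection(); returns the new gest_id.
        gest_id = next(self._next_id)
        self._subscriptions[gest_id] = (frozenset(touch_points), gesture_mask)
        return gest_id

    def remove_gesture_detection(self, gest_id):
        self._subscriptions.pop(gest_id, None)

    def touch_ended(self, touch_id):
        # Drop every subscription that involves the touch that went away.
        dead = [g for g, (touches, _) in self._subscriptions.items()
                if touch_id in touches]
        for g in dead:
            del self._subscriptions[g]

    def subsets_to_evaluate(self):
        # Only these subsets need gesture recognition on each sample.
        return dict(self._subscriptions)

registry = GestureRegistry()
pinch_a = registry.add_gesture_detection([0, 1], GESTURE_PINCH)
pinch_b = registry.add_gesture_detection([2, 3], GESTURE_PINCH | GESTURE_ROTATE)
registry.touch_ended(3)
print(sorted(registry.subsets_to_evaluate()))  # [1]
```

With this kind of table the per-sample work grows with the number of
registered subsets rather than with the number of possible pairs.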

If the app wants to add all gesture pairs or other crazy things, it can
do that. It then also gets the blame when the system grinds to a halt.


There are quite a lot of open issues, such as how Unity, the X server
and apps work together. But this mail is long enough already.
