Implementing multi-touch camera controls (with inertia!)

"You know what really grinds my gears? Friction.": A meme from the show Family Guy where Peter is shown as the news anchor for a segment called "grinds my gears."
"Inertial panning" (or scrolling) works by applying friction to the movement velocity.
This makes the movement look smooth and natural.

tl;dr: In this post I describe how I implemented "traditional" swipe-based level-camera controls in Godot. This includes drag-to-pan, multi-touch pinch-to-zoom, inertial panning, swipe gesture-smoothing, and automatic camera limits based on the level-boundaries.

An animated GIF of a recording from Meteor Power showing my recent improvements to camera panning and zooming controls with pinch-to-zoom on mobile.
Here's what it looks like on a mobile phone.
Notice the panning inertia and pinch-to-zoom toward a target position.

Let's start simple: A one-touch drag

An animated GIF of a recording from Meteor Power showing camera panning without inertia.
Camera panning without inertia.
When you stop dragging, the camera stops moving!

The core of this is pretty straight-forward.
  1. Listen for a touch-move event
  2. Calculate the displacement since the last touch-move event
  3. Then translate the camera offset using that displacement

You might want to also include a slight multiplier on the camera displacement here, to adjust the pan to be more or less sensitive.
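The steps above can be sketched in plain Python. (The `Camera` class and `PAN_SENSITIVITY` constant are hypothetical names for illustration; the actual project is written in GDScript.)

```python
PAN_SENSITIVITY = 1.0  # Tune this to make panning more or less sensitive.

class Camera:
    def __init__(self):
        self.offset = [0.0, 0.0]
        self.last_touch = None  # Last touch position, or None if no drag is active.

    def on_touch_move(self, x, y):
        if self.last_touch is not None:
            # Displacement since the last touch-move event.
            dx = x - self.last_touch[0]
            dy = y - self.last_touch[1]
            # Pan opposite to the drag, so the level appears to follow the finger.
            self.offset[0] -= dx * PAN_SENSITIVITY
            self.offset[1] -= dy * PAN_SENSITIVITY
        self.last_touch = (x, y)

    def on_touch_release(self):
        self.last_touch = None
```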

What about zoom?

The difficulty comes when you start changing the camera's zoom. When the camera is zoomed-in, you want a swipe gesture to move the camera less far. When the camera is zoomed-out, you want the same swipe gesture to move the camera further.

Well, actually, everything is moving the same distance in "screen space", regardless of the current zoom. It's in "level space" that the distances differ according to zoom. Which brings me to an important point:

Capture touch events in "screen space", then transform them into "level space".

For most of the panning and zooming logic, you want to use positions, distances, velocities, etc. in "level space", but you sometimes still need to know the screen-space coordinates.

If you're considering your touch positions in level-space, then you don't need to explicitly include zoom in your camera-offset calculation. If you're considering your touch positions in screen-space, then you'll need to multiply the offset by the current zoom multiplier.
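As a rough sketch of that transform, assuming Godot 3's convention where a larger zoom value means the camera is zoomed out further (so each screen pixel covers `zoom` level units):

```python
def screen_to_level_displacement(screen_dx, screen_dy, zoom):
    # A screen-space displacement maps to a level-space displacement
    # scaled by the current zoom multiplier.
    return (screen_dx * zoom, screen_dy * zoom)
```

When zoomed out (`zoom > 1`), the same swipe covers more level-space distance, which is exactly the behavior described above.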

Now let's add zoom controls

It's a lot simpler to calculate zoom updates using a mouse with a scroll wheel, than with a multi-touch gesture, so let's consider the mouse-based case first.

The core of this is also pretty straight-forward:

  1. Listen for mouse-scroll events.
  2. If the scroll direction is up, then divide the zoom by a constant zoom-speed-multiplier.
  3. If the scroll direction is down, then multiply the zoom by the same zoom-speed-multiplier.
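Those steps can be sketched as follows. (The `ZOOM_SPEED` value is an assumption for illustration; this uses the Godot 3 convention where a larger zoom value means more zoomed out, so dividing zooms in.)

```python
ZOOM_SPEED = 1.08  # Per-scroll-tick zoom factor (an assumed value).

def next_zoom(current_zoom, scroll_up):
    if scroll_up:
        return current_zoom / ZOOM_SPEED  # Zoom in.
    return current_zoom * ZOOM_SPEED      # Zoom out.
```

Because one direction divides and the other multiplies by the same factor, a scroll up followed by a scroll down returns you to the original zoom.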

In this case, the complexity arises when you try to target the zoom toward the cursor's current position.

Targeting zoom toward a specific position

What does this mean? Take a look at this comparison of zoom with and without a target position.

An animated GIF of a recording from Meteor Power showing zoom in and out toward the mouse cursor.
  • You want zoom to look like this.
  • You've moved the cursor to the bottom-right corner, because you want to zoom-in to get a better look at whatever that green thing is over there.
  • Notice that the zoom keeps the green thing under the cursor the whole time.

An animated GIF of a recording from Meteor Power showing zoom in and out toward the camera's current center position, regardless of where the mouse cursor is.
  • You don't want zoom to look like this.
  • You've moved the cursor to the bottom-right corner, but when you zoom, the camera just zooms toward its current center.
  • And the thing you were looking at in the bottom-right corner disappears from view.

You don't get the desired behavior for free! You have to calculate a corresponding camera offset according to the old zoom, the new zoom, the cursor's position (in level-space), and the camera's current offset (in level-space).
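One way to do that calculation: keep the level-space point under the cursor fixed by scaling the cursor-to-center vector by the zoom ratio. A minimal sketch (function name is my own, and it works with the camera's center rather than Godot's offset property):

```python
def zoom_toward_point(center, cursor, old_zoom, new_zoom):
    # `center` and `cursor` are level-space positions. With the
    # "larger zoom = more zoomed out" convention, the level-space vector
    # from cursor to camera center scales with zoom, while the cursor's
    # screen position stays fixed.
    ratio = new_zoom / old_zoom
    return (cursor[0] + (center[0] - cursor[0]) * ratio,
            cursor[1] + (center[1] - cursor[1]) * ratio)
```

Note that zooming in (`ratio < 1`) pulls the camera center toward the cursor, which is what keeps the green thing in view.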

(Source code)

Multi-touch pinch-to-zoom

Tracking multi-touch events isn't really too difficult to understand and implement. But it does involve a lot more boilerplate and more edge cases that can break. Here are the key steps:

  1. Listen for touch-down, touch-up, and touch-drag events.
  2. Determine the touch position in both screen-space and level-space.
  3. Get the touch-index for the event.
    • This tells you which finger corresponds to the event.
    • Godot (or the underlying platform) handles all of the complicated logic for determining whether or not an event should be attributed to the same finger as another active touch.
  4. Update a mapping from touch-index to touch-positions for all active touches.
  5. Calculate the current distance between touches, and the ratio from the previous distance.
  6. Emit a signal when there is a touch update, and there are exactly two active touches.
  7. Then the camera code updates the zoom according to the current pinch distance ratio.
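Here's a stripped-down sketch of steps 4–7, with the signal replaced by a return value. (The `PinchTracker` class is a hypothetical name; in Godot, the touch index would come from `InputEventScreenTouch.index` / `InputEventScreenDrag.index`.)

```python
class PinchTracker:
    def __init__(self):
        self.touches = {}         # touch-index -> current position.
        self.last_distance = None

    def _distance(self):
        (x1, y1), (x2, y2) = self.touches.values()
        return ((x1 - x2) ** 2 + (y1 - y2) ** 2) ** 0.5

    def on_touch_down(self, index, pos):
        self.touches[index] = pos
        if len(self.touches) == 2:
            self.last_distance = self._distance()

    def on_touch_up(self, index):
        self.touches.pop(index, None)
        self.last_distance = None

    def on_touch_drag(self, index, pos):
        if index not in self.touches:
            return None
        self.touches[index] = pos
        if len(self.touches) != 2 or self.last_distance is None:
            return None
        distance = self._distance()
        ratio = distance / self.last_distance
        self.last_distance = distance
        return ratio  # The camera can scale its zoom by this pinch ratio.
```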

(Source code)

Let's set some boundaries

Another important feature of a well-implemented camera is boundaries. That is, you don't want the player to be able to pan too far away from the level. You also don't want the player to zoom too far in or out.

An animated GIF of a recording from Meteor Power showing camera zooming without min and max limits.
Camera zoom without limits!

Why not let the user handle this?

Obviously, the player could control this themselves, and just choose not to pan or zoom so far that they get lost. But as a UI designer, it's very important for you to understand the number one rule of UI design:

The user is an idiot!

Ok, so maybe not really. But it's very useful for you to think so. Because if you make it easy for the user to do the wrong thing, they're going to do the wrong thing. And then the bad experience is really your fault, not theirs.

So if you know the user will never benefit from zooming too far out or in, or panning too far away, don't let them!

First, define the viewable region

A simple way to do this is to find the minimum axially-aligned bounding-box (AABB) that contains all the collidable geometry in your level. Then you probably want to add some margin to that AABB.
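A quick sketch of that calculation, given the positions of your collidable geometry as points (the function name is my own):

```python
def bounding_box_with_margin(points, margin):
    # Minimum axially-aligned bounding box over a set of (x, y) points,
    # expanded on all sides by a margin.
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs) - margin, min(ys) - margin,
            max(xs) + margin, max(ys) + margin)
```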

(Source code)

Next, calculate the maximum zoom-out

You might just want to set a constant value for this, based on how far out you think things become difficult to see.

Or you could automatically calculate how far-out the camera needs to zoom before it touches the boundaries of the viewable-region on opposite sides. This calculation depends on the aspect ratio of the viewport and of the viewable-region, since the zoom-out will be limited by whichever dimension limit it hits first.

Or you might want to use the best of both approaches and choose whichever of the two above limits is smaller for the current level!
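The combined approach might look like this, assuming the "larger zoom = more zoomed out" convention, so that the visible level-space area is the viewport size times the zoom:

```python
def max_zoom_out(region_size, viewport_size, constant_cap):
    # The zoom-out is limited by whichever dimension of the viewable
    # region the view touches first, hence the min over both axes.
    fit_limit = min(region_size[0] / viewport_size[0],
                    region_size[1] / viewport_size[1])
    # Also never exceed the hand-tuned constant cap.
    return min(fit_limit, constant_cap)
```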

(Source code)

Now you can limit the camera offset

  1. First, clamp the zoom to stay between the min and max values we calculated above.
  2. Then, calculate the current size of the viewable region in level-space according to the current camera zoom.
  3. Then, we can calculate the camera min and max positions according to the current viewable region size and the viewable region boundary limits we calculated above.
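Those three steps, sketched in Python (working with the camera's center position and a `(min_x, min_y, max_x, max_y)` region tuple; these names are illustrative):

```python
def clamp_camera(center, zoom, zoom_min, zoom_max, region, viewport_size):
    # 1. Clamp the zoom to the precomputed limits.
    zoom = max(zoom_min, min(zoom, zoom_max))
    # 2. Current half-size of the visible region in level space.
    half_w = viewport_size[0] * zoom / 2.0
    half_h = viewport_size[1] * zoom / 2.0
    # 3. The valid camera-center range shrinks by the view's half-size,
    #    so the view's edges never leave the viewable region.
    x = max(region[0] + half_w, min(center[0], region[2] - half_w))
    y = max(region[1] + half_h, min(center[1], region[3] - half_h))
    return (x, y), zoom
```

This assumes the max zoom from the previous step guarantees the view still fits inside the region; otherwise the min and max position bounds could cross.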

(Source code)

Panning with inertia


"Pan inertia" (or "scroll inertia") just means that the movement will continue and slow-down gradually after you release your finger. This makes panning and scroll controls feel much more natural and smooth for the user. And this is a pretty standard feature for modern touch controls. So your players will notice when it's not there!

So how do we implement it?

In case the word "inertia" didn't clue you in, you'll need to use some high-school-level physics skills to implement this feature. Here are the key steps:

  1. Update the touch-listener to also track the current drag velocity.
    • This is just the displacement from the previous touch position to the current position divided by the time between the current and previous touch events.
  2. Update the camera-controller to listen for touch-release events, and to then toggle-on deceleration physics.
  3. The deceleration physics updates the current velocity value at a constant rate, and it updates the camera offset according to the current velocity.
  4. The deceleration physics ends when either the velocity becomes small enough or another touch happens.
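Steps 3 and 4 boil down to a small per-frame update. Here's a sketch with exponential friction (the `FRICTION` and `MIN_SPEED` values are assumptions to tune, not from the actual project):

```python
FRICTION = 0.92   # Per-frame velocity multiplier (an assumed value).
MIN_SPEED = 1.0   # Stop decelerating below this speed, in level units/sec.

def step_inertia(offset, velocity, delta):
    # One physics tick: move by the current velocity, then apply friction.
    offset = (offset[0] + velocity[0] * delta,
              offset[1] + velocity[1] * delta)
    velocity = (velocity[0] * FRICTION, velocity[1] * FRICTION)
    speed = (velocity[0] ** 2 + velocity[1] ** 2) ** 0.5
    done = speed < MIN_SPEED  # The caller also stops on a new touch-down.
    return offset, velocity, done
```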

(Source code)

Noisy gesture data

Unfortunately, this simple inertia implementation will probably behave strangely for you. The problem is that touch-based gesture data is very noisy! This is for a couple different reasons:

  1. The actual human movement isn't regular—especially around the start and end of the gesture.
  2. Touch sensors may not be as accurate or precise as you expect.
    • And they can vary a lot on different devices!

You might notice noisy gesture data because of weird stops or giant jumps in the camera panning when you release a swipe. But this might only happen about half the time. For the other half of gestures, you might see pretty-correct-looking post-release deceleration.

An animated GIF of a recording from Meteor Power showing camera panning with inertia but without smoothing the gesture velocity.
Panning with inertia, but without smoothing the gesture velocity.

Smoothing the noise

To fix noisy gesture data, we apply some form of "smoothing". A simple way to do this is to just compare the latest position with a position from X seconds ago, rather than just with the previous position.

To use this fix, you'll need to keep track of many recent events in a buffer, rather than just the last one or two events as stand-alone variables.
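A minimal sketch of that buffer-based smoothing (the class name and 0.1-second window are my own assumptions):

```python
from collections import deque

SMOOTHING_WINDOW = 0.1  # Compare against a sample roughly this many seconds old.

class GestureSmoother:
    def __init__(self):
        self.samples = deque()  # (time, x, y) tuples for recent touch events.

    def add(self, t, x, y):
        self.samples.append((t, x, y))
        # Drop samples older than the window, but always keep at least two,
        # so there's an old sample to compare against.
        while len(self.samples) > 2 and self.samples[1][0] <= t - SMOOTHING_WINDOW:
            self.samples.popleft()

    def velocity(self):
        if len(self.samples) < 2:
            return (0.0, 0.0)
        t0, x0, y0 = self.samples[0]
        t1, x1, y1 = self.samples[-1]
        dt = t1 - t0
        if dt <= 0:
            return (0.0, 0.0)
        # Velocity over the whole window, which averages out per-event noise.
        return ((x1 - x0) / dt, (y1 - y0) / dt)
```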

(Source code)

An animated GIF of a recording from Meteor Power showing my recent improvements to camera panning and zooming controls.
Inertial camera panning and scroll-to-zoom (on a PC)!

🎉 Cheers!

This is a simple icon representing my sabbatical.