How to edit Ambisonic 360 Spatial Audio sound in Adobe Premiere for Zoom H2n workflow tutorial

In this tutorial video, Geoff walks through the workflow for capturing and processing spatial audio (ambisonic surround sound) from the Zoom H2n, then editing it to accompany a 360 video from the Samsung Gear 360. The tools used are Adobe Premiere, Audacity, ffmpeg, and the Google 360 Metadata Injector Tool.

This allows you to create an immersive sound experience when watching a 360 video.


For my other tutorial on how to Process and Edit video from the Samsung Gear 360 Camera, please visit:

Contents of tip.txt as shown in the video (command to run in process.bat):
ffmpeg -i video-input.mp4 -i sound-input.wav -channel_layout 4.0 -c:v copy -c:a copy
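As reproduced here, the command line does not end with an output file, which ffmpeg needs. Below is a minimal sketch of how the same flags could be assembled from a script. The output name `final-video.mov` and the helper `build_mux_command` are assumptions for illustration, not taken from the video.

```python
import shlex

def build_mux_command(video_in, audio_in, video_out):
    """Return the ffmpeg argument list that muxes 4-channel audio onto a video."""
    return [
        "ffmpeg",
        "-i", video_in,            # input video
        "-i", audio_in,            # input 4-channel WAV
        "-channel_layout", "4.0",  # flag copied from tip.txt
        "-c:v", "copy",            # copy the video stream without re-encoding
        "-c:a", "copy",            # copy the audio stream without re-encoding
        video_out,                 # hypothetical output name (assumption)
    ]

# Print the command so it can be pasted into process.bat or a terminal.
cmd = build_mux_command("video-input.mp4", "sound-input.wav", "final-video.mov")
print(shlex.join(cmd))
```

To run it directly instead of printing it, the list can be passed to `subprocess.run(cmd, check=True)`, assuming `ffmpeg` is on the PATH.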

Please note that there is a missing step in this video.
At 11:38 in the tutorial, BEFORE importing the 4 audio tracks into Audacity, change the Audacity Project Rate (Hz), shown in the bottom left of the Audacity window, from the default of 44100 to 48000. This step improves the audio quality.
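One way to confirm that the exported file really ended up at 48000 Hz with 4 channels is to read its header with Python's standard `wave` module. This is only a sketch: the helper names are made up for illustration, and older Python versions may refuse WAV files that Audacity writes with an extensible header.

```python
import wave

def wav_info(path):
    """Return (channels, sample_rate_hz) read from a PCM WAV file header."""
    with wave.open(path, "rb") as wav:
        return wav.getnchannels(), wav.getframerate()

def is_ready_for_mux(path, expected_channels=4, expected_rate=48000):
    """True when the file has the channel count and sample rate expected here."""
    channels, rate = wav_info(path)
    return channels == expected_channels and rate == expected_rate
```

For example, `is_ready_for_mux("sound-input.wav")` should return `True` for a 4-channel, 48000 Hz export.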


Zoom H2n firmware update to version 2.0 or greater to support spatial audio:



Google 360 metadata tool (go to “Step 2 – Preparing for upload”):

Here is a link to the test video “final video injected”, showing Ambisonics, that I created in this tutorial, so you can see the finished product:

VR Spatial Audio Tip:
The Zoom H2n in spatial audio mode records B-format in ambiX channel order, not FuMa.
Both have 4 channels:
ambiX = WYZX
FuMa = WXYZ

YouTube format (ambiX) – WYZX
W – omnidirectional
Y – left/right
Z – up/down
X – forward/back

Alternative format (FuMa) – WXYZ
W – omnidirectional
X – forward/back
Y – left/right
Z – up/down
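The two channel orders listed above differ only in where X sits, so moving between them is a matter of reordering. The sketch below shows that reordering for a single frame (one sample per channel). The function names are hypothetical, and it covers channel order only: the two conventions can also differ in how channel levels are normalized, which is not handled here.

```python
FUMA_ORDER = "WXYZ"
AMBIX_ORDER = "WYZX"

def reorder_channels(frame, source_order, target_order):
    """Reorder one frame (a sequence holding one sample per channel)."""
    by_name = dict(zip(source_order, frame))
    return [by_name[name] for name in target_order]

def fuma_to_ambix_order(frame):
    """WXYZ -> WYZX (channel order only, no level changes)."""
    return reorder_channels(frame, FUMA_ORDER, AMBIX_ORDER)

def ambix_to_fuma_order(frame):
    """WYZX -> WXYZ (channel order only, no level changes)."""
    return reorder_channels(frame, AMBIX_ORDER, FUMA_ORDER)
```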

What are W, X, Y and Z?
With Ambisonic technology, the directionality of the sound field is composed of spherical harmonic components. The zero-order component is termed W and is omnidirectional. The first-order components are figure-of-eight (lemniscate) responses which point forward, left and up. These are termed X, Y and Z, respectively. In practice, second-order and higher components are ignored.
The W, X, Y and Z channels are collectively called B-Format.
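To make the four components concrete, here is a rough sketch of how a single mono sample arriving from a given direction could be spread across W, X, Y and Z. It assumes azimuth is measured counter-clockwise from straight ahead (so positive angles are to the left), elevation is positive upward, and W is left unscaled. Other conventions scale W differently. The function name is made up for illustration.

```python
import math

def encode_first_order(sample, azimuth_deg, elevation_deg):
    """Encode one mono sample into first-order components W, X, Y, Z."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    return {
        "W": sample,                               # omnidirectional
        "X": sample * math.cos(az) * math.cos(el),  # forward/back
        "Y": sample * math.sin(az) * math.cos(el),  # left/right
        "Z": sample * math.sin(el),                 # up/down
    }
```

A source straight ahead (azimuth 0, elevation 0) lands entirely in W and X, while a source directly to the left (azimuth 90) lands in W and Y.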

The fact that the Z component can be recorded creates the opportunity for periphonic (full-sphere) reproduction. Periphony requires speakers to be placed above and below the height of the listeners’ ears.

If you’re familiar with microphone techniques you’ll realize that the W and Y spherical harmonic components are equivalent to the M and S components of the M-S stereo recording technique. Ambisonics is a natural extension of this recording technique to three dimensions.
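The M-S analogy can be sketched as a simple sum and difference: treating W as the mid signal and Y as the side signal yields a basic left/right pair. The 0.5 scaling and the function name are choices made for this illustration, and Y is assumed to be positive toward the left, as described above.

```python
def wy_to_stereo(w, y):
    """Sum/difference decode of W (mid) and Y (side) into (left, right)."""
    left = 0.5 * (w + y)
    right = 0.5 * (w - y)
    return left, right
```

With this sketch, a sound that appears equally in W and Y comes out only on the left, and a sound with Y inverted comes out only on the right.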

Technical notes:
This video was created using licensed and legal versions of Adobe Premiere (for editing), SnagIt (for screen capture), and the free sound editing tool Audacity. Audio was recorded on a Zoom H1 with a Sennheiser ME-2 lavalier condenser microphone, and video footage was captured on a Nexus 5X.
