GSoC : Final Work Submission!

GSoC : Final Work Submission!

Surreal! My first Google Summer of Code has now come to an end. After almost 5 months (including the research pre-GSoC) it’s time to pack up. This has been a journey full of learning. This post is to serve the purpose of the final report for the work I have done during the summer.

Important links ⛓

Subtitle-Resync: High-speed subtitle synchronization

For an ideal subtitle file, the subtitles are perfectly aligned with the base audiovisual content. In other words, the audio and the corresponding subtitles co-oCCur. The misalignment of the subtitle files is the underlying problem that this project aims to solve so that the viewer does not have any burden before the fun starts (this is what matters).

Use case
Synchronization of subtitles between two versions (with and without commercials) of the same audiovisual content. It takes as input the original audiovisual content, the edited audiovisual content and the subtitles document of the original audiovisual content.

What has been done?

The whole project had been divided into modules and sub-modules. Can be seen in the Trello board, the function of each of which I have described in my proposal in great detail.

SiftSRT ✔️
SiftSRT: A complete subtitle parser and editor (Check it out!). Whatever our Tool A and Tool B require, related to subtitle files, SiftSRT has got it covered!.

Dactylogram ✔️
The complete audio-fingerprinting solution for Tool A, using chromaprint .

Audio Segmentation ✔️
Determination of presence of speech in an audio using webrtc for Tool B.

Alignment Algorithm

  • Behind the scenes!
    There is a lot going on under the hood. Everything about the working and more can be found in this post. Internally, the audio fingerprints that dactylogram generates are array of 32-bit integer. We don’t really work with them in this form, but the hashes look something like this:

    1
    2
    3
    4
    
     array([3594440902, 3594416326, 3598545270, 3598528806, 3615303462,
      3615435574, 3589233430, 3589233495, 3622787783, 3622794949,
      3623867085, 3623862925, 3623867357, 3590189549, 3573346669,
      2499605615, 2498559022, 2490230894, 2490230910, 2490362974])
    

    These fingerprints are treated usually on bit level. So it’s more useful to think of the fingerprint as a 2D black-and-white image where each white represents a 1 bit and each black pixel is a 0 bit:

    fp-2D

    We use a small subset of bits from the hashes and cross-match the two fingerprints. For each hash that appears in both fingerprints, we will calculate the difference of offsets at which they appear in the fingerprints. We will build a histogram of these offset differences, do some filtering (effectively estimating a density function using gaussian kernels) and find peaks in it. The peaks will tell us how do we need to align the two fingerprints to find matching segments in them.

    peaks

    Using the alignment algorithm our tool detects the multiple segments in the original file, separating the actual audiovisual content and the commercials. Once the segments are identified we use SiftSRT to edit the subtitle document and create a perfectly aligned subtitle file.

Where do we stand?

At this point, we have a perfectly working tool, capable of doing exactly what it is supposed to do.
Installation

  • Clone the repository from Subtitle-Resync @ Github
    1
    
      git clone https://github.com/CCExtractor/Subtitle-Resync.git
    
  • Navigate to install directory and run build.sh:
    1
    2
    
      cd install
      ./build.sh
    

    The tool is now ready to use!

Usage
For a complete list of options and parameters, please go through the project’s README .

  • Running Subtitle-Resync without any instruments lists all the parameters that are to be passed.
    1
    
       ./resync -o /path/to/original/audio.wav -m /path/to/modified/audio.wav -s /path/to/original/subtitle.srt
    

    w/0-args

  • Sync!
    1
    
      ./resync -o /path/to/original/audio.wav -m /path/to/modified/audio.wav -s /path/to/original/subtitle.srt
    

    What will this trigger?

    1. Read the original audio and modified audio.
    2. Extract audio fingerprints from the audio files.
    3. Compare the fingerprints and detect the different segments in the original content.
    4. Adjust the subtitle file and generate an in-sync subtitle file.

    See it in action:

What else can been done?

The project is in it’s early stage and will keep evolving. The available functions, usage instructions et cetera are expected to refactor over time. There are several ideas and features that can be added to this project.

  • Currently only SRT is the subtitle format that this project supports. Support for other formats can be added.
  • The misalignment of subtitles because of different encoding settings in videos causes a varying temporal offset. This is also a great feature to add in to our project.

Also, contribution of any kind is more than welcome. Feel free to raise an issue tracker here: here .

GSoC: Endgame

Before the final goodbyes, I would like to thank my mentor, Carlos Fernandez Sanz for accepting my proposal, providing guidance whenever I asked, and believing in me when things didn’t go as intended. I am grateful.
Shout out to the CCExtractor community for being so welcoming and supportive. I am fortunate to be a part of the CCExtractor Development
End? No, the journey doesn’t end here, it begins.. I’m sure I will stick around as a regular contributor.

"I pledge my life and honor to the Night's Watch, for this night and all the nights to come."