One thing we do at our workplace is produce video. We have a team of producers, techs, and editors who plan, shoot, and edit video. What we don't have is staff who create subtitles, closed captions, or transcripts. We often outsource this work to a third party and include transcripts with our educational video. But if we have a finished video and the script it was created from, it is more cost-effective (and faster) to create those internally.
I recently surveyed the available software. I don't have a complete list to share, but here is the tool I thought was best during my short testing.
Subtitle Composer: https://github.com/maxrd2/subtitlecomposer
There are versions available for Linux (Arch, Ubuntu, Debian, openSUSE, and an AppImage) as well as Windows. You can of course build from source as well. The full feature list is on the project's webpage.
I like it because it shows the video alongside an audio waveform, which makes it easy to line up the start and stop points of each line of speech. It also has a number of features that correct and adjust your output automatically. The interface is configurable, and other features that help with subtitling and captioning include variable playback rates, translations, and scripting to streamline your workflow.
One feature I'm interested in is speech recognition from the audio/video file, as that would save me considerable time. To enable speech recognition, you need to install pocketsphinx and its dependencies. You can find more info here: https://github.com/maxrd2/SubtitleComposer/issues/88#issuecomment-341405873 and https://github.com/maxrd2/SubtitleComposer/issues/88#issuecomment-364688364
After installing the dependencies, relaunch the program and reload your video. The Video -> Recognize Speech option doesn't appear until you actually play your video, so start playback, run the recognition, and then pause the video. Once recognition completes, you will have the recognized text (it wasn't great, maybe 60% accurate) as well as natural breaking points for your audio. You will likely need to adjust both the break points and the text.
I found it helpful to use the Edit -> Join Lines option to get all of the text into a single block, correct the transcription mistakes, and then use Edit -> Split Lines to break it back up. It looked like it worked perfectly and assigned the text back to the original breakpoints, but then the program crashed. :( Luckily, I had copied all of the text to a text editor for safekeeping.
I reopened the program, reopened my video, created a single subtitle, and pasted in my text (this would also work great if you had an existing script or transcript to work from). Then I used Edit -> Split Lines, which automatically creates breaks based on the amount of text. Next, click Times -> Shift and enter the timestamps where speech starts and where it stops. The program then automatically creates all of the breakpoints, sizing each one roughly in proportion to its number of characters, and offsets them all by the correct amount. Manual adjustments are still required to merge a few lines, split others, and adjust breakpoints, but it gives you an excellent head start.
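To make the idea behind that last step concrete, here is a rough Python sketch of proportional timing: split a script into short lines, then give each line a share of the speech duration based on its character count. This is only my illustration of the concept, not how Subtitle Composer implements it. The function names, the 42-character line width, and the choice of SRT as the output format are all assumptions for the example.

```python
import textwrap


def format_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def script_to_cues(script, speech_start, speech_end, max_chars=42):
    """Split a script into lines and time each by its share of characters.

    max_chars=42 is an assumed line width, not a Subtitle Composer setting.
    Returns a list of (start_seconds, end_seconds, text) tuples.
    """
    lines = textwrap.wrap(script, width=max_chars)
    total_chars = sum(len(line) for line in lines)
    duration = speech_end - speech_start
    cues = []
    elapsed_chars = 0
    for line in lines:
        # Each cue starts where the previous one ended.
        start = speech_start + duration * elapsed_chars / total_chars
        elapsed_chars += len(line)
        end = speech_start + duration * elapsed_chars / total_chars
        cues.append((start, end, line))
    return cues


def cues_to_srt(cues) -> str:
    """Render cues as SRT text: index, time range, text, blank line."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        time_range = f"{format_timestamp(start)} --> {format_timestamp(end)}"
        blocks.append(f"{index}\n{time_range}\n{text}\n")
    return "\n".join(blocks)


if __name__ == "__main__":
    script = "One thing we do at our workplace is produce video."
    # Assume speech starts 2 seconds in and stops at 8 seconds.
    print(cues_to_srt(script_to_cues(script, 2.0, 8.0)))
```

As in the program, the result is only a starting point: character counts don't account for pauses or speaking pace, so the breakpoints still need adjusting by hand against the waveform.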