All of the English dialogue in "Star Wars", split into words, and sorted alphabetically.
Fun facts:
The word "lightsaber" only appears once in this film.
There are 43m5s of spoken English and 81m39s of everything else.
The most common word is "the", of course, said 368 times.
The word with most screen time is "you", at 52.56 seconds.
There are 1695 different words, and 11684 total words.
The longest words are "responsibility", "malfunctioning", "worshipfulness", and "identification", all 14 letters.
I labeled the words manually (!) using some software I wrote specifically for the purpose.
This is the Special Edition to troll Han-shot-first purists. Everyone knows the orig is the most legit.
A bit more information: http://radar.spacebar.org/f/a/weblog/...
Videogrep is a Python script that searches through dialog in videos and then cuts together a new video based on what it finds. Basically, it’s a command-line “supercut” generator. The code is here on GitHub.
The script searches through a video’s associated subtitle file (which needs to be in the same folder as the video, in standard .srt format), identifies timestamps for the dialog, and then uses the wonderful moviepy library to generate the new final cut.
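To make the subtitle step concrete, here is a minimal sketch of how timestamps for matching dialog could be pulled out of an .srt file using only the Python standard library. It is not Videogrep's actual code: the names `Cue`, `parse_srt`, `find_cues`, and the sample subtitle text are all made up for illustration, and the parser assumes well-formed .srt timing lines.

```python
import re
from typing import List, NamedTuple


class Cue(NamedTuple):
    """One subtitle entry (hypothetical helper type, not part of Videogrep)."""
    start: float  # seconds from the beginning of the video
    end: float    # seconds from the beginning of the video
    text: str


# Matches an .srt timing line such as "00:01:10,200 --> 00:01:12,000".
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


def to_seconds(h: str, m: str, s: str, ms: str) -> float:
    """Convert the captured hour/minute/second/millisecond fields to seconds."""
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000.0


def parse_srt(srt_text: str) -> List[Cue]:
    """Split .srt text into blank-line-separated blocks and read each cue."""
    cues = []
    normalized = srt_text.replace("\r\n", "\n").strip()
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.split("\n")
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                fields = match.groups()
                # Everything after the timing line is the spoken text.
                text = " ".join(l.strip() for l in lines[i + 1:] if l.strip())
                cues.append(Cue(to_seconds(*fields[:4]), to_seconds(*fields[4:]), text))
                break
    return cues


def find_cues(cues: List[Cue], pattern: str) -> List[Cue]:
    """Return the cues whose text matches the regular expression (case-insensitive)."""
    regex = re.compile(pattern, re.IGNORECASE)
    return [cue for cue in cues if regex.search(cue.text)]


# Made-up sample subtitles, for illustration only.
SAMPLE_SRT = """1
00:00:01,000 --> 00:00:03,500
Hello there.

2
00:00:04,000 --> 00:00:06,250
Is anyone home?

3
00:01:10,200 --> 00:01:12,000
Hello again.
"""

matches = find_cues(parse_srt(SAMPLE_SRT), r"\bhello\b")
for cue in matches:
    print(cue.start, cue.end, cue.text)
```

Each matching cue gives a (start, end) pair in seconds. A video library such as moviepy could then cut the source video at those ranges and join the pieces into the final supercut; that step is left out here because it depends on the installed library version.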