Independent Films by the Numbers

The marketing of Independent Films

Language Structure of Festival Titles

I ran my collection of festival movie titles through a part-of-speech (POS) tagger. My goal is to use the tagged data to refine some performance analyses I plan to do, but the results are interesting on their own.

First, a disclaimer about automated POS tagging: it is not perfect, especially for titles, which are short and offer few contextual cues. In fact, I had to try a few POS taggers before I found one that worked to my liking. I suspect that nouns (NN) are overrepresented because the tagger falls back to tagging a word as a noun when in doubt.

Also note that I had to relax the cues that distinguish proper nouns from common nouns, since titles capitalize most words. As a result, proper nouns are not broken out in this analysis and are treated as standard nouns.
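To make the counting step concrete, here is a minimal sketch of how tagged titles could be reduced to structures and ranked. It is not the tool I used: the function names (`title_structure`, `rank_structures`), the `PROPER_TO_COMMON` mapping, and the hand-tagged sample titles are all made up for illustration, and it assumes the tagger emits Penn Treebank style tags where NNP/NNPS mark proper nouns.

```python
from collections import Counter

# Assumption: proper-noun tags are folded into their common-noun equivalents,
# mirroring the "treat proper nouns as standard nouns" choice described above.
PROPER_TO_COMMON = {"NNP": "NN", "NNPS": "NNS"}


def title_structure(tagged_title):
    """Return the tag sequence of one title as a string, e.g. 'DT NN'."""
    return " ".join(PROPER_TO_COMMON.get(tag, tag) for _word, tag in tagged_title)


def rank_structures(tagged_titles, top=25):
    """Count tag structures and return (structure, count, percent) rows."""
    counts = Counter(title_structure(t) for t in tagged_titles)
    total = sum(counts.values())
    return [(s, n, round(100.0 * n / total, 2)) for s, n in counts.most_common(top)]


# Made-up, hand-tagged titles for illustration only (not from the festival data).
sample = [
    [("Sunrise", "NNP")],
    [("The", "DT"), ("Visitor", "NNP")],
    [("Harvest", "NN")],
    [("Broken", "JJ"), ("Glass", "NN")],
]

# Print rows in the same layout as the results key: rank, structure, count, percent.
for rank, (structure, count, pct) in enumerate(rank_structures(sample), start=1):
    print(f"{rank} {structure} {count} ({pct:.2f}%)")
```

In practice the `(word, tag)` pairs would come from whichever POS tagger you settle on; only the folding and counting are shown here.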

The top 25 part-of-speech structures for titles are as follows:

key: Format = [Rank 1…25] [POS Tag Structure] [Count] ([% of Sample])

tags: NN = Noun; NNS = Plural Noun; JJ = Adjective; DT = Determiner; POS = Possessive Ending; IN = Preposition; CC = Coordinating Conjunction; VBN = Verb Past Participle; VBG = Verb Present Participle; VBZ = Verb 3rd Person Singular Present; CD = Number; RB = Adverb

1 NN 526 (14.66%)
2 NN NN 248 (6.91%)
3 JJ NN 184 (5.13%)
4 DT NN 161 (4.49%)
5 NNS 90 (2.51%)
6 JJ 72 (2.01%)
7 DT NN NN 69 (1.92%)
8 NN POS NN 53 (1.48%)
9 DT JJ NN 52 (1.45%)
10 DT NN IN NN 46 (1.28%)
11 NN NNS 46 (1.28%)
12 NN IN NN 44 (1.23%)
13 JJ NNS 40 (1.11%)
14 NN IN DT NN 34 (0.95%)
15 VBN 29 (0.81%)
16 NN NN NN 27 (0.75%)
17 DT NNS 24 (0.67%)
18 NN CD 23 (0.64%)
19 NN CC NN 22 (0.61%)
20 RB 21 (0.59%)
21 VBG NN 21 (0.59%)
22 NN VBG 21 (0.59%)
23 NN VBZ 19 (0.53%)
24 NNS IN NN 19 (0.53%)
25 CD NNS 18 (0.50%)
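
As a quick sanity check on the figures, the rows can be parsed and cross-checked against each other. The sketch below is only an illustration: `ROW_RE`, `parse_rows`, and the variable names are hypothetical, and only the first five rows are pasted in. The implied sample size is an estimate backed out from a rounded percentage, not a number stated anywhere in this post.

```python
import re

# First five rows copied from the table above; paste in the rest to check them all.
ROWS = """\
1 NN 526 (14.66%)
2 NN NN 248 (6.91%)
3 JJ NN 184 (5.13%)
4 DT NN 161 (4.49%)
5 NNS 90 (2.51%)
"""

# rank, tag structure (one or more tags), count, percent in parentheses
ROW_RE = re.compile(r"^(\d+)\s+(.+?)\s+(\d+)\s+\(([\d.]+)%\)$")


def parse_rows(text):
    """Parse table lines into (rank, structure, count, percent) tuples."""
    rows = []
    for line in text.splitlines():
        match = ROW_RE.match(line.strip())
        if match:
            rank, tags, count, pct = match.groups()
            rows.append((int(rank), tags, int(count), float(pct)))
    return rows


rows = parse_rows(ROWS)

# Estimate the sample size from the top row: count / (percent / 100).
# This is approximate because the published percentages are rounded.
implied_total = round(rows[0][2] / (rows[0][3] / 100))

# Number of titles covered by the parsed rows, and their share of the estimate.
covered = sum(count for _rank, _tags, count, _pct in rows)
print(f"implied sample size: ~{implied_total}")
print(f"rows parsed: {len(rows)}, titles covered: {covered} "
      f"({100.0 * covered / implied_total:.2f}% of estimate)")
```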
