Construction of a Parallel Corpus of English-Arabic Movie Subtitles

Multimedia or audiovisual translation (AVT) is a branch of translation concerned with transferring multimodal texts in audiovisual materials from one language and culture into another. Subtitling is a common form of AVT in which the spoken dialogue is translated and displayed as written text on the screen. Scholars have examined subtitling, but only on a limited number of audiovisual products. This paper presents a new 1,254,278-word English-Arabic Movie Subtitles Corpus (EAMSC). It describes the data selection and extraction methods and suggests potential applications for the compiled corpus. The corpus includes the scripts of English movies selected on the basis of genre and high Internet Movie Database (IMDb) ratings, along with their Arabic subtitles extracted from Netflix and Orbit Showtime Network (OSN). To allow chronological comparison of the subtitles, the researchers selected seventy movies: two per decade from the 1930s to the 1990s (fourteen in total), thirty-one from 2000 to 2010, and the remaining twenty-five from 2011 to 2020. To improve its quality, the corpus has been manually checked at different levels, namely text segmentation and alignment. This parallel corpus can be used in language teaching, translator training, and AVT research. In addition, it can be used to explore the strategies employed in translating English movies into Arabic. Finally, the study recommends that researchers compile further parallel corpora for other AVT modes.