Correct character encoding is essential for accurate subtitle processing, especially when your subtitles are being translated, transcribed, or parsed by code. The safest and most universally accepted encoding format is UTF-8, though UTF-16 is also supported in many cases.
Incompatible or unknown character sets (charsets) are the number one reason subtitle translation and processing fail on our platform. When a file uses an outdated or platform-specific encoding like Windows-1252 or ISO-8859-1, characters can be misinterpreted, replaced with �, or dropped entirely. The result is broken output or failed translation jobs.
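To make the failure modes concrete, here is a minimal Python sketch of what happens when bytes are decoded with the wrong charset. The sample word and variable names are illustrative only; this is not how our platform processes files internally.

```python
# Illustrative sample text; any accented character behaves the same way.
text = "café"

# Case 1: UTF-8 bytes read as Windows-1252 produce "mojibake".
utf8_bytes = text.encode("utf-8")          # b'caf\xc3\xa9'
misread = utf8_bytes.decode("cp1252")      # 'cafÃ©'

# Case 2: Windows-1252 bytes read as UTF-8 are invalid, so a lenient
# decoder substitutes the replacement character U+FFFD (shown as �).
legacy_bytes = text.encode("cp1252")       # b'caf\xe9'
replaced = legacy_bytes.decode("utf-8", errors="replace")  # 'caf\ufffd'

print(misread)
print(replaced)
```

In both cases the bytes on disk are intact; only the interpretation is wrong, which is why re-saving the file in the correct encoding fixes the problem.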
UTF (Unicode Transformation Format) encodings like UTF-8 and UTF-16 can represent nearly every character from every language in a consistent, cross-platform manner. This makes them ideal for global applications like subtitles, where multiple languages, accents, and symbols are common.
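As a small illustration of that claim, the sketch below round-trips one mixed-language line through UTF-8 and UTF-16, then shows that a legacy single-byte charset cannot encode it at all. The sample line is made up for this example.

```python
# A hypothetical subtitle line mixing Cyrillic, CJK, and accented Latin text.
line = "Привет, 世界! ¿Qué tal?"

# Both UTF encodings can represent every character and decode back losslessly.
for codec in ("utf-8", "utf-16"):
    assert line.encode(codec).decode(codec) == line

# Windows-1252 has no code points for Cyrillic or CJK characters.
try:
    line.encode("cp1252")
    legacy_ok = True
except UnicodeEncodeError:
    legacy_ok = False

print("cp1252 can encode the line:", legacy_ok)
```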
When a subtitle file is UTF-encoded:
While we do our best to automatically detect the encoding of uploaded subtitle files, charset detection is not an exact science. Many encodings share similar byte patterns, and short subtitle lines provide little context for accurate guessing.
For example, a file might look like UTF-8 but actually be encoded in Latin-1 or Windows-1252. This can lead to translation errors, character corruption, or job failure.
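The sketch below shows one simple detection heuristic and why it is inherently a guess. It is an assumption-laden illustration, not the detector our platform uses: `guess_encoding` is a hypothetical helper, and falling back to Windows-1252 is an arbitrary choice made for this example.

```python
import codecs


def guess_encoding(data: bytes) -> str:
    """Best-effort guess at a subtitle file's encoding (hypothetical helper)."""
    # A byte order mark is the only strong signal a file can carry.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    # Strict UTF-8 decoding rejects most legacy single-byte text.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Assumed fallback: nothing in the bytes proves this is right.
        return "cp1252"


# The same byte is valid in several legacy charsets with different meanings,
# so no detector can tell them apart from a short line alone.
ambiguous = b"\xe9t\xe9 \x80"
as_cp1252 = ambiguous.decode("cp1252")   # 'été €'
as_latin1 = ambiguous.decode("latin-1")  # 'été ' plus control character U+0080
```

Because `latin-1` accepts every possible byte value, decoding with it never raises an error, which is exactly why a wrong guess can pass silently and only surface later as corrupted text.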
Ensuring your subtitle files are saved in UTF-8 (without BOM) is one of the easiest ways to avoid processing issues. Here are some tools and methods to do this, depending on your operating system:
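If you prefer to script the conversion, the following Python sketch re-encodes a file as UTF-8 without a BOM. It assumes you already know the source encoding; `convert_to_utf8` and the file names in the usage line are hypothetical.

```python
from pathlib import Path


def convert_to_utf8(src: str, dst: str, source_encoding: str) -> None:
    """Re-encode a subtitle file as UTF-8 without a BOM (hypothetical helper)."""
    # Work on raw bytes so the original line endings are preserved.
    raw = Path(src).read_bytes()
    # Raises UnicodeDecodeError if source_encoding does not match the file.
    text = raw.decode(source_encoding)
    # Drop a leading BOM character if the source codec left one in the text.
    text = text.lstrip("\ufeff")
    # Python's "utf-8" codec never writes a BOM.
    Path(dst).write_bytes(text.encode("utf-8"))
```

For example, `convert_to_utf8("movie.srt", "movie.utf8.srt", "cp1252")` would convert a Windows-1252 file. Writing to a new file rather than overwriting the original keeps a backup in case the assumed source encoding turns out to be wrong.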