Subtitle Validator


Why Subtitle Files Must Be UTF Encoded

Correct character encoding is essential for accurate subtitle processing, especially when your subtitles are being translated, transcribed, or parsed by code. The safest and most widely supported encoding is UTF-8, although UTF-16 is also accepted in many cases.

The Leading Cause of Subtitle Processing Failures

Incompatible or unrecognized character sets (charsets) are the most common reason subtitle translation and processing fail on our platform. When a file uses a legacy or platform-specific encoding such as Windows-1252 or ISO-8859-1, characters can be misinterpreted, replaced with �, or dropped entirely, which leads to broken output or failed translation jobs.

Why UTF Encoding Matters

UTF (Unicode Transformation Format) encodings such as UTF-8 and UTF-16 can represent every character defined in Unicode, which covers virtually all of the world's writing systems, in a consistent, cross-platform way. This makes them ideal for global applications like subtitles, where multiple languages, accents, and symbols are common.
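As a quick illustration, this Python sketch encodes a few arbitrary sample characters in UTF-8 and in UTF-16 (little-endian, without a BOM), prints the resulting bytes, and checks that each one decodes back to the original character.

```python
# Arbitrary samples: ASCII, an accented letter, Cyrillic, CJK, and an emoji.
samples = ["e", "é", "ж", "中", "😀"]

for ch in samples:
    utf8 = ch.encode("utf-8")
    utf16 = ch.encode("utf-16-le")  # little-endian, no byte order mark
    # Both encodings round-trip the character without loss.
    assert utf8.decode("utf-8") == ch
    assert utf16.decode("utf-16-le") == ch
    print(f"U+{ord(ch):04X}  UTF-8: {utf8.hex(' ')}  UTF-16LE: {utf16.hex(' ')}")
```

The byte sequences differ between the two encodings (é is c3 a9 in UTF-8 but e9 00 in UTF-16LE, and the emoji needs four bytes in both, as a surrogate pair in UTF-16), yet neither loses information.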

When a subtitle file is UTF-encoded:

All characters (letters, symbols, punctuation, emojis, etc.) are preserved reliably.
Translation systems can read the text without guessing the encoding or failing.
You avoid garbled output, such as Ã© appearing where é should be.
Your file will work across different systems (Windows, macOS, Linux) and tools without re-encoding.
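The Ã© pattern is what you typically get when UTF-8 text is read as Windows-1252. This Python sketch reproduces it for a single character and shows that, in this simple case, the damage can be undone by reversing the wrong step.

```python
original = "é"

# In UTF-8, é is the two bytes C3 A9.
utf8_bytes = original.encode("utf-8")

# Reading those bytes as Windows-1252 turns each byte into its own character.
garbled = utf8_bytes.decode("cp1252")
print(garbled)  # Ã©

# If no bytes were lost, undoing the wrong decode restores the text.
repaired = garbled.encode("cp1252").decode("utf-8")
print(repaired)  # é
```

Reversal only works when every byte survived the misreading; once text has been saved with � or dropped characters, the original is gone, which is why saving in UTF-8 up front is the reliable fix.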

The Problem With Automatic Charset Detection

While we do our best to automatically detect the encoding of uploaded subtitle files, charset detection is not an exact science. Many encodings share similar byte patterns, and short subtitle lines provide little context for accurate guessing.

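For example, a file that is mostly plain ASCII can look like UTF-8 to a detector but actually be encoded in Latin-1 (ISO-8859-1) or Windows-1252, so the few accented characters it contains are misread. This can lead to translation errors, character corruption, or job failure.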
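This Python sketch, using a hypothetical line of dialogue and a typical timing line, shows why guessing is hard: the same bytes are valid in several single-byte encodings, each giving different text, and ASCII-only lines read identically in all of them, including UTF-8.

```python
# Bytes from a hypothetical subtitle line whose encoding we pretend not to know.
raw = "Déjà vu".encode("cp1252")

# These bytes are valid in all of the following single-byte encodings, so every
# decode "succeeds" with different text; the bytes alone don't say which is right.
for enc in ("cp1252", "iso-8859-1", "cp1251", "mac_roman"):
    print(f"{enc:>10}: {raw.decode(enc)}")

# A strict UTF-8 check rules UTF-8 out here, but can't choose among the others.
try:
    raw.decode("utf-8")
except UnicodeDecodeError:
    print("not valid UTF-8")

# An ASCII-only timing line decodes identically everywhere, giving no evidence.
timing = b"00:00:01,000 --> 00:00:04,000"
readings = {enc: timing.decode(enc) for enc in ("utf-8", "cp1252", "cp1251", "mac_roman")}
print(len(set(readings.values())) == 1)  # True
```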

Best Practices

Always save your subtitle files in UTF-8 without BOM if possible.
Avoid legacy encodings such as ANSI (on Western-language Windows systems this usually means Windows-1252) or ISO-8859-1.
Use modern text editors like VS Code, Notepad++, or Sublime Text to convert encodings if needed.
Check your file with our Subtitle Validator before uploading to ensure it's UTF-compatible.
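For a rough local check before uploading, the Python sketch below may help. check_subtitle_encoding is a hypothetical helper, not part of our Subtitle Validator or any library: it looks for a UTF-16 or UTF-8 byte order mark (BOM), then tries a strict UTF-8 decode and reports the line containing the first invalid byte.

```python
import codecs
import sys


def check_subtitle_encoding(path: str) -> str:
    """Hypothetical pre-upload check: report any BOM and whether the bytes are valid UTF-8."""
    with open(path, "rb") as f:
        data = f.read()

    # UTF-16 files normally begin with a byte order mark (FF FE or FE FF).
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "UTF-16 (BOM present)"

    has_bom = data.startswith(codecs.BOM_UTF8)  # EF BB BF

    # A strict decode raises on the first byte sequence that isn't valid UTF-8.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        line_no = data.count(b"\n", 0, err.start) + 1
        return f"Not valid UTF-8: byte {data[err.start]:#04x} on line {line_no}"

    return "UTF-8 with BOM" if has_bom else "UTF-8 without BOM"


if __name__ == "__main__":
    for arg in sys.argv[1:]:
        print(f"{arg}: {check_subtitle_encoding(arg)}")
```

Run it as, for example, python check_encoding.py movie.srt (both names are illustrative). Treat the result as a first pass only: a UTF-16 file without a BOM can be misreported as UTF-8, and a file that passes may still contain the wrong characters if it was converted from the wrong source encoding.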

How to Convert to UTF-8

Ensuring your subtitle files are saved in UTF-8 (without BOM) is one of the easiest ways to avoid processing issues. Here are some tools and methods to do this, depending on your operating system:
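If you prefer a script, or need to convert many files at once, the following Python sketch re-encodes a file to UTF-8 without a BOM. It assumes you already know the file's original encoding (for example, cp1252 for Windows-1252); it does not detect it. The function name and file names are illustrative.

```python
import sys


def convert_to_utf8(src: str, dst: str, source_encoding: str) -> None:
    """Re-encode a text file from a known encoding to UTF-8 without a BOM."""
    # newline="" disables newline translation, so CRLF line endings are kept as-is.
    with open(src, "r", encoding=source_encoding, newline="") as f:
        text = f.read()

    # Reading UTF-8-with-BOM input as plain "utf-8" leaves U+FEFF at the start; drop it.
    if text.startswith("\ufeff"):
        text = text[1:]

    # Python's "utf-8" codec never writes a BOM.
    with open(dst, "w", encoding="utf-8", newline="") as f:
        f.write(text)


if __name__ == "__main__" and len(sys.argv) == 4:
    convert_to_utf8(sys.argv[1], sys.argv[2], sys.argv[3])
```

For example: python convert.py movie.srt movie.utf8.srt cp1252. If the encoding you pass is wrong, the script either raises UnicodeDecodeError or silently produces the wrong characters, so spot-check the output before uploading.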

Windows Users

Notepad++ (Free, Recommended):
1. Open your subtitle file in Notepad++.
2. Go to Encoding in the top menu.
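3. Select Convert to UTF-8. Do not pick the plain UTF-8 item (Encode in UTF-8 in older versions): it only changes how Notepad++ interprets the existing bytes, whereas Convert to UTF-8 rewrites the text in UTF-8.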
4. Save the file (Ctrl + S).
Windows Notepad (Simple, Built-in):
1. Open the subtitle file in Notepad.
2. Go to File > Save As…
3. In the “Encoding” dropdown at the bottom, select UTF-8 (not UTF-8 with BOM). On older versions of Windows, Notepad's UTF-8 option adds a BOM.
4. Save the file with a new name to be safe.
VS Code (Free, Cross-platform):
1. Open your subtitle file in Visual Studio Code.
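2. Click the encoding label in the status bar (bottom right; it shows the current encoding, such as “UTF-8”).
3. Choose “Save with Encoding”. If the text already looks garbled, choose “Reopen with Encoding” first, select the file's original encoding (for example, Western (Windows 1252)), and then return to this step.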
4. Select UTF-8 (not UTF-8 with BOM). VS Code saves the file in the selected encoding.
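The difference between Notepad++'s Convert and Encode options can be modeled in a few lines of Python. This is a simplified sketch with a made-up line, not how Notepad++ is implemented: reinterpreting keeps the bytes and only changes the assumed encoding, while converting decodes with the real encoding and writes new UTF-8 bytes.

```python
# A hypothetical subtitle line saved in Windows-1252: à is the single byte 0xE0.
original_bytes = "Voilà".encode("cp1252")

# Encode (reinterpret): the bytes are left untouched and merely read as UTF-8,
# so the lone 0xE0 byte is invalid and shows up as the replacement character.
reinterpreted = original_bytes
print(reinterpreted.decode("utf-8", errors="replace"))

# Convert (transcode): decode using the real encoding, then re-encode as UTF-8.
converted = original_bytes.decode("cp1252").encode("utf-8")
print(converted.hex(" "))
print(converted.decode("utf-8"))
```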

macOS Users

TextEdit (Built-in, but needs config):
1. Open TextEdit, then go to TextEdit > Settings (TextEdit > Preferences on older macOS versions).
2. On the Open and Save tab, set the plain text encoding for saving files to Unicode (UTF-8).
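3. Open your file, then choose File > Save As… (if Save As… isn't shown, hold the Option key while the File menu is open).
4. Make sure the document is plain text, not rich text (choose Format > Make Plain Text if needed), and select Unicode (UTF-8) as the plain text encoding.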
VS Code (Same as above, works on macOS):
1. Open the file, click the encoding label at bottom right.
2. Select “Save with Encoding”, then choose UTF-8 (not UTF-8 with BOM).
BBEdit (Free for basic use):
1. Open your subtitle file in BBEdit.
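2. If the text looks garbled, choose File > Reopen Using Encoding and select the file's original encoding (for example, Western (Windows Latin 1)).
3. Choose File > Save As and select Unicode (UTF-8) in the save dialog's encoding menu.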
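TextEdit's defaults can leave you with a rich text file or a name like movie.srt.txt. This hypothetical Python helper flags both: RTF documents start with the control word {\rtf, and TextEdit may offer to append .txt to a name that already ends in .srt. It is a rough heuristic; a legitimate name such as notes.final.txt would also be flagged.

```python
from pathlib import Path


def textedit_pitfalls(path: str) -> list[str]:
    """Hypothetical check for two common TextEdit mistakes; returns a list of problems."""
    p = Path(path)
    problems = []

    # A double extension such as "movie.srt.txt" suggests .txt was appended on save.
    if p.suffix.lower() == ".txt" and len(p.suffixes) > 1:
        problems.append(f"extra .txt extension: rename to {p.stem}")

    # Rich Text Format files begin with "{\rtf"; a plain-text subtitle file would not.
    with p.open("rb") as f:
        if f.read(5) == b"{\\rtf":
            problems.append("saved as rich text (RTF), not plain text")

    return problems


if __name__ == "__main__":
    import sys

    for arg in sys.argv[1:]:
        print(arg, textedit_pitfalls(arg) or "no TextEdit issues found")
```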