Multilingual typesetting on Overleaf using polyglossia and fontspec
Introduction
This article provides an overview of typesetting multilingual documents on Overleaf using the XeLaTeX (or LuaLaTeX) compiler in conjunction with the fontspec and polyglossia LaTeX packages.
For many, if not most, users the default choice of TeX engine is pdfTeX, which, unlike XeTeX and LuaTeX, does not have a built-in capability to read UTF-8 encoded text files. Using pdfTeX makes typesetting certain languages in LaTeX very complicated, especially those that do not use a Latin-based script. Some packages, such as inputenc, fontenc and arabtex, provide support to pdfTeX for typesetting non-Latin languages and scripts, but not all glyphs and characters may be supported or rendered correctly in the output PDF, even if you’ve used the utf8 or utf8x option with inputenc.
For an in-depth discussion of UTF-8, Unicode encoding and the XeTeX/LuaTeX engines, the Overleaf article Unicode, UTF-8 and multilingual text: An introduction is a fascinating read.
Enter XeTeX and LuaTeX
The XeTeX and LuaTeX engines can directly read and process UTF-8 encoded text; consequently, they offer native support for Unicode. They can also work with TrueType and OpenType fonts directly. These properties make them a natural choice for typesetting multilingual or non-Latin documents in LaTeX.
Examples of such documents can be found in the Overleaf Gallery: How to Write Multilingual Text with Different Scripts in LaTeX on Overleaf and Multilingual "Thank-You".
If you’re looking to typeset Chinese, Japanese or Korean, have a look at Overleaf’s dedicated articles on those languages. Xe(La)TeX is still useful for these languages, but more specialised TeX engines designed for typesetting CJK languages are also available, such as pTeX for typesetting Japanese.
Note that if your cursor seems to be misbehaving whilst editing text in certain languages on Overleaf, you may want to click on the Overleaf Menu button (situated above the project file list) and change the “Font Family” option. You could also try changing your browser’s monospaced font preferences or using Overleaf’s Rich Text view instead. However, at the time of writing, the Source and Rich Text views may not (yet) fully support right-to-left text editing at the level of functionality we are aiming to achieve.
Changing the project’s compiler
The fontspec and polyglossia packages require the XeLaTeX or LuaLaTeX compiler, so you’ll need to set up your Overleaf project to use either of those compilers. Detailed instructions can be found in our article Choosing a LaTeX Compiler.
Once you’re compiling with XeLaTeX or LuaLaTeX, you can (and should) remove the inputenc and fontenc packages from your .tex file’s preamble because these Unicode-capable engines will assume input (text) files are UTF-8 encoded. Incidentally, all text files uploaded to Overleaf are converted to UTF-8, so you should usually use utf8 with inputenc when working with the pdfLaTeX and LaTeX compilers on Overleaf.
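If the same source has to compile under both kinds of engine, the choice of packages can be made conditional. The following is only a sketch, assuming the iftex package (not mentioned elsewhere in this article) is available; its \iftutex test is true under XeLaTeX and LuaLaTeX and false under pdfLaTeX.

```latex
\documentclass{article}
\usepackage{iftex} % assumption: used here only to detect the engine
\iftutex
  % XeLaTeX or LuaLaTeX: UTF-8 input is assumed, fonts are loaded with fontspec
  \usepackage{fontspec}
\else
  % pdfLaTeX: declare the input and font encodings explicitly
  \usepackage[utf8]{inputenc}
  \usepackage[T1]{fontenc}
\fi
\begin{document}
Café, naïve, Ærøskøbing.
\end{document}
```

For a project that will only ever be compiled with XeLaTeX or LuaLaTeX, the conditional is unnecessary and loading fontspec alone is enough.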
If your entire document involves just one language
When using the fontspec package you might get away with only setting up a main (serif) font, a sans-serif font and probably a monospaced font designed to support the language you are typesetting. There’s a catch, but we’ll revisit that later in the article. For example, if your entire document is in Greek, with some English words, you can simply write:
\usepackage{fontspec}
\setmainfont[Script=Greek]{GFS Artemisia}
\setsansfont[Script=Greek]{GFS Neohellenic}
\setmonofont[Script=Greek]{Noto Mono}
. . .
Το Lorem Ipsum είναι \textsf{απλά} ένα κείμενο χωρίς νόημα
για τους επαγγελματίες της \texttt{τυπογραφίας} και στοιχειοθεσίας.
You can choose fonts from a list of available TrueType and OpenType fonts. The Ligatures=TeX option is added automatically for \setmainfont and \setsansfont, so you don’t have to add that yourself. (\setromanfont is an alias of \setmainfont.)
The LaTeX code above produces the following output:
Multiple languages/scripts in the same document: Introducing polyglossia
If your document contains non-trivial amounts of text in multiple languages, the polyglossia package helps take care of language-specific typesetting conventions and hyphenation.
\documentclass{article}
\usepackage{fontspec}
\setmainfont{FreeSerif}
\setsansfont{FreeSans}
\setmonofont{FreeMono}
\usepackage{polyglossia}
\setdefaultlanguage{french}
\setotherlanguages{english,russian,thai}
\begin{document}
\begin{abstract}
Le Lorem Ipsum est simplement du faux texte employé dans
la composition et la mise en page avant impression.
\end{abstract}
Merci. \textenglish{Thank you.} \textrussian{Спасибо.} Et plus de
texte en français!
Le Lorem Ipsum est le faux texte standard ...
\begin{english}
Lorem Ipsum is simply dummy text ...
\end{english}
\begin{russian}
Lorem Ipsum - это текст-`\textsf{рыба}', часто используемый в
\texttt{печати} и вэб-дизайне. ...
\end{russian}
\begin{thai}
\XeTeXlinebreaklocale "th_TH"
\textenglish{Lorem Ipsum} คือ เนื้อหาจำลองแบบเรียบๆ ที่ใช้กันในธุรกิจงานพิมพ์หรืองานเรียงพิมพ์
\end{thai}
\end{document}
polyglossia lets you set the main language of the document with \setdefaultlanguage (default is English) and (possibly multiple) ‘other’ languages with \setotherlanguages. (\setmainlanguage is an alias of \setdefaultlanguage.) If you expect to be using just one other language you can use the singular \setotherlanguage. The language names are the same as those used by babel.
We’ve prepared a small example of a (primarily) French document which also contains some English, Russian and Thai text. We’ve decided to use the FreeSerif, FreeSans and FreeMono typefaces.
Because the document’s main language is french, the abstract environment automatically produces the heading ‘Résumé’. Notice how, at the end of the first paragraph, the exclamation mark is typeset using the French spacing convention: it is set apart from ‘français’ even though it immediately follows the word in the source code.
In the main text, short English, Cyrillic and Thai text snippets can be included in a paragraph of French text with \textenglish{Thank you}, \textrussian{Спасибо} and \textthai{ขอบคุณ}. Generally, you can use \textLANGUAGE{...} to typeset text in any LANGUAGE that has been declared by \setdefaultlanguage and \setotherlanguages. Because the document’s main (serif) font is FreeSerif, and FreeSerif contains glyphs for Latin, Cyrillic and Thai (and more!) scripts, fontspec and polyglossia can use it to render all these texts into the output PDF.
For longer paragraphs of text in other languages, it is recommended to use \begin{LANGUAGE}...\end{LANGUAGE}, e.g. \begin{russian}...\end{russian} or \begin{thai}...\end{thai}. Arabic is an exception: you can’t use \begin{arabic}...\end{arabic}, because LaTeX already defines \arabic as the command that prints a counter in Arabic numerals. You’ll have to write \begin{Arabic}...\end{Arabic} instead, while \textarabic{...} is still valid.
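As a minimal sketch of the Arabic case, assuming an English main language and the Amiri font (used later in this article) for Arabic script, the two forms look like this:

```latex
\documentclass{article}
\usepackage{polyglossia}
\setdefaultlanguage{english}
\setotherlanguage{arabic}
% Font used for Arabic-script text
\newfontfamily\arabicfont[Script=Arabic]{Amiri}
\begin{document}
A short snippet inside English text: \textarabic{شكرا}.

% Note the capital A: \begin{arabic} would clash with LaTeX's \arabic
\begin{Arabic}
ما هو
\end{Arabic}
\end{document}
```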
Some considerations may be needed for certain languages: for instance, within the thai environment, the words Lorem Ipsum need to be wrapped in a \textenglish{...} (or \textfrench{...}) command to ensure they are rendered using the Latin-script glyphs.
At this point you might ask: if FreeSerif is so versatile and contains glyphs for Russian and Thai anyway, why would we still need to use \textrussian, \begin{english}...\end{english} etc.? Wouldn’t that be redundant? Let’s see what happens when we remove the \begin{english}...\end{english} and \begin{russian}...\end{russian} environments:
Certainly, the Latin and Cyrillic glyphs are all rendered in the output PDF, but note that some words are now hyphenated incorrectly (‘unk-nown’ and ‘unchan-ged’), and стандартной isn’t hyphenated at all. Without the language-switching environments, the compiler treats this text as French and applies French hyphenation rules which, naturally, produce incorrect results. This is why typography and typesetting are so much more than just font design and selection: they are very language- and culture-specific disciplines.
Revisiting our first Greek example, we now see why it is a good idea to load polyglossia and use \setdefaultlanguage{greek}: to ensure the document is typeset following Greek conventions.
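Put together, the first Greek example with polyglossia added might look like the following sketch. It reuses the fonts from that example and changes nothing else; the only additions are the document class, the polyglossia lines and the document environment.

```latex
\documentclass{article}
\usepackage{fontspec}
\setmainfont[Script=Greek]{GFS Artemisia}
\setsansfont[Script=Greek]{GFS Neohellenic}
\setmonofont[Script=Greek]{Noto Mono}
\usepackage{polyglossia}
% Selects Greek hyphenation patterns and typesetting conventions
\setdefaultlanguage{greek}
\begin{document}
Το Lorem Ipsum είναι \textsf{απλά} ένα κείμενο χωρίς νόημα
για τους επαγγελματίες της \texttt{τυπογραφίας} και στοιχειοθεσίας.
\end{document}
```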
Mixing right-to-left (RTL) and left-to-right (LTR) languages
You need to be careful when typesetting a mixture of right-to-left (RTL) scripts, such as Arabic or Hebrew, and left-to-right (LTR) scripts in the same document. Consider the following small Arabic document with an English word, using Amiri as the main font:
\documentclass{article}
\usepackage{polyglossia}
\setdefaultlanguage{arabic}
\setmainfont{Amiri}
\begin{document}
ما هو differentiation
\end{document}
which produces:
The text is automatically set right-to-left, starting on the right-hand edge of the page. The word “differentiation” itself is typeset correctly as left-to-right text. But wait, no it’s not! It’s rendered as “dffierentiation” in the output! What’s going on?
The Amiri font does have glyphs for the Latin alphabet but here the text differentiation is not marked as English: the compiler treats differentiation as right-to-left text, as if it were a sequence of Arabic characters. During typesetting, the original sequence iff is processed as ffi (i.e., as RTL text) and Amiri’s ligature glyph for “ffi” is typeset. Marking the word with \textenglish{...} ensures it is interpreted correctly as left-to-right text.
\documentclass{article}
\usepackage{polyglossia}
\setdefaultlanguage{arabic}
\setmainfont{Amiri}
\setotherlanguage{english}
\newfontfamily\englishfont{TeX Gyre Termes}
\begin{document}
ما هو \textenglish{differentiation}
\end{document}
Note: If you’re used to the babel package commands you’ll be happy to hear that the commands \selectlanguage, \foreignlanguage and the environment otherlanguage are also supported by polyglossia.
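As an illustration, here is a brief sketch of those babel-style commands in a polyglossia document. It assumes the same French, English and Russian setup and the FreeSerif font used in the earlier example.

```latex
\documentclass{article}
\usepackage{fontspec}
\setmainfont{FreeSerif}
\usepackage{polyglossia}
\setdefaultlanguage{french}
\setotherlanguages{english,russian}
\begin{document}
% Short inline switch, comparable to \textenglish{...}
Merci. \foreignlanguage{english}{Thank you.}

% Environment form, comparable to \begin{russian}...\end{russian}
\begin{otherlanguage}{russian}
Спасибо.
\end{otherlanguage}

% Switches the language for the text that follows
\selectlanguage{english}
From here on, the text is treated as English.
\end{document}
```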
Language-specific options
Some languages support additional options for customisation; for example, greek accepts a variant option (ancient, mono or poly) for ancient, monotonic or polytonic Greek; hindi can be configured with numerals=western or numerals=devanagari. See the polyglossia package documentation for details.
These can be specified when loading the language:
\setdefaultlanguage[variant=poly]{greek}
\setotherlanguage[numerals=western]{hindi}
or later at any time:
\setkeys{greek}{variant=ancient}
or even locally for a specific environment:
\begin{greek}[variant=ancient]
...
\end{greek}
Specifying fonts for specific languages
You can specify the font used for different languages. Suppose you’d like to typeset all English text (contained in our previous example) in italics; you could write:
\newfontfamily\englishfont{FreeSerif Italic}
You can of course use something even more flamboyant:
\newfontfamily\englishfont{Chancery Uralic}
This mechanism of setting fonts for different languages or scripts is especially important when you use a main font that does not have glyphs for all scripts or languages in your document. Suppose we now decide to use Caladea as the main document font:
\setmainfont{Caladea}
Upon compilation we would see the following error:
Package polyglossia Error: The current roman font
does not contain the Cyrillic script!
(polyglossia) Please define
\cyrillicfont with \newfontfamily.
See the polyglossia package documentation for
explanation.
Type H <return> for immediate help.
...
l.15 \select@language {russian}
Package polyglossia Error: The current roman font
does not contain the Thai script!
(polyglossia) Please define
\thaifont with \newfontfamily.
See the polyglossia package documentation for
explanation.
Type H <return> for immediate help.
...
l.23 \select@language {thai}
...
We are now obligated to specify which fonts to use for Cyrillic and Thai scripts. Again, you can refer to the list of available TrueType and OpenType fonts on Overleaf.
\newfontfamily\cyrillicfont[Script=Cyrillic]{Charis SIL}
\newfontfamily\thaifont[Script=Thai]{Garuda}
Note: it is outside the scope of this article to address issues relating to choices of aesthetically-pleasing and typographically-compatible font combinations.
Notice that we’ve defined \cyrillicfont instead of \russianfont, i.e. we defined a font for the Cyrillic script rather than the Russian language. The advantage of defining \cyrillicfont is that if, for example, serbian is also a defined language in your project, then \textserbian would automatically use the defined \cyrillicfont. If you had defined only \russianfont, then using \textserbian would again complain that “the current roman font does not contain the Cyrillic script” and you would need to define \cyrillicfont anyway, unless you did mean to use a different font for Serbian text!
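The script-level definition can be sketched as follows. Ukrainian is used here purely as an assumed stand-in for a second Cyrillic-script language, and the commented-out \ukrainianfont line illustrates the optional per-language override; neither appears in the examples above.

```latex
\documentclass{article}
\usepackage{fontspec}
\setmainfont{Caladea} % main font without Cyrillic glyphs
\usepackage{polyglossia}
\setdefaultlanguage{english}
\setotherlanguages{russian,ukrainian}
% One script-level font serves every Cyrillic-script language
\newfontfamily\cyrillicfont[Script=Cyrillic]{Charis SIL}
% Optional per-language override (illustrative, uncomment to use):
% \newfontfamily\ukrainianfont[Script=Cyrillic]{FreeSerif}
\begin{document}
Russian: \textrussian{Спасибо}

Ukrainian: \textukrainian{Дякую}
\end{document}
```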
Another similar scenario is the Devanagari script, which is used for the Hindi and Sanskrit languages; or the Arabic script used for Arabic and Farsi (Persian).
\setdefaultlanguage{english}
\setotherlanguages{hindi,sanskrit}
\newfontfamily\devanagarifont[Script=Devanagari]{Lohit Devanagari}
...
Hindi: \texthindi{हिन्दी}
Sanskrit: \textsanskrit{संस्कृतम्}
When using \newfontfamily it is necessary to specify the Script, otherwise some glyphs may be rendered incorrectly; for example, if we had written only \newfontfamily\thaifont{Garuda} the typeset result may be wrong (left image below). The correct output is produced by adding [Script=Thai].
[Figure: Thai text typeset without Script=Thai (left, wrong) and with Script=Thai (right, correct).]
Defining other font families
Let’s have a look at another example, this time with Hebrew:
\documentclass{article}
\usepackage{polyglossia}
\setdefaultlanguage[numerals=hebrew]{hebrew}
\setotherlanguage{english}
\newfontfamily\hebrewfont[Script=Hebrew]{Hadasim CLM}
\begin{document}
\section{מבוא}
זוהי עובדה מבוססת שדעתו של הקורא תהיה מוסחת עלידי טקטס קריא כאשר הוא יביט בפריסתו. -
\end{document}
So far so good. Now suppose we were using a template originally created for an English document, which sets section headers in sans serif type using the titlesec package:
\RequirePackage{titlesec}
\titleformat{\section}{\Large\sffamily\bfseries}{\thesection}{1em}{}
\usepackage{polyglossia}
\setdefaultlanguage[numerals=hebrew]{hebrew}
...
We are confronted with the error message:
Package polyglossia Error: The current roman font
does not contain the Hebrew script!
(polyglossia) Please define
\hebrewfont with \newfontfamily.
See the polyglossia package documentation for
explanation.
Type H <return> for immediate help.
...
l.27 \section{מבוא}
This is a bit confusing: didn’t we already define \hebrewfont to be Hadasim CLM? Well, it’s really because we haven’t specified a sans serif font for Hebrew. Let’s remedy this by adding a definition for \hebrewfontsf:
\newfontfamily\hebrewfontsf[Script=Hebrew]{Miriam CLM}
And now we have the output:
Should the need arise, we could also define a monospaced font to use with \hebrewfonttt.
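A sketch of all three Hebrew font families together is shown below. The monospaced font name, Miriam Mono CLM, is an assumption chosen for illustration; substitute any monospaced font with Hebrew glyphs that is available to your compiler.

```latex
\documentclass{article}
\usepackage{polyglossia}
\setdefaultlanguage[numerals=hebrew]{hebrew}
\newfontfamily\hebrewfont[Script=Hebrew]{Hadasim CLM}
\newfontfamily\hebrewfontsf[Script=Hebrew]{Miriam CLM}
% Assumed font name, not taken from the examples above
\newfontfamily\hebrewfonttt[Script=Hebrew]{Miriam Mono CLM}
\begin{document}
\section{מבוא}
\textsf{מבוא} \texttt{מבוא}
\end{document}
```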
Acknowledgements
All lorem ipsum snippets, in various languages, are from https://lipsum.com.