Digitization support reference

This reference lists the file types, settings, and languages supported for document digitization.

Supported file types

The following file types are supported in Reader and in the Flow process-files and fetch-files steps.

| File type | OCR Required | Spatial Layout | Process Attachments | Metadata Generation | Password-protected Attachment Support | Segmentation |
|---|---|---|---|---|---|---|
| Machine readable PDF | No | Yes | Fetch-files step only | Yes | Fetch-files step only | Reader app only |
| Scanned PDF/Images* | Yes | Yes | Fetch-files step only | Yes | Fetch-files step only | Reader app only |
| .xlsx | No | Yes | Yes | No | Process-files and fetch-files step only | Reader app only |
| .docx | No | Yes | No | No | Process-files and fetch-files step only | Reader app only |
| .rtf | No | No | No | No | No | Reader app only |
| .pptx | No | No | No | No | No | Reader app only |
| .eml | No | Yes | Yes | No | Process-files and fetch-files step only | Reader app only |
| .msg | No | Yes | Yes | No | Process-files and fetch-files step only | Reader app only |
| .html | No | Yes | N/A | No | N/A | Reader app only |
| .mht | No | Yes | Yes | No | No | Reader app only |
| .csv | No | N/A | N/A | N/A | N/A | Reader app only |
| .txt | No | N/A | N/A | N/A | N/A | Reader app only |

* Images include .bmp, .gif (single frame), .ico, .jpeg, .jpg, .png, .tif and .tiff file types.
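
As a rough illustration of how the "OCR Required" column could be applied in a pre-processing script, the sketch below maps a file name to the value listed in the table. The function name `ocr_required` and the two extension sets are hypothetical helpers built from the table and the footnote; they are not part of Reader or Flow. A `.pdf` returns `None` because the extension alone cannot tell whether a PDF is machine readable or scanned.

```python
from pathlib import Path
from typing import Optional

# Image extensions from the footnote; the table lists these as "OCR Required: Yes".
IMAGE_EXTENSIONS = {".bmp", ".gif", ".ico", ".jpeg", ".jpg", ".png", ".tif", ".tiff"}

# File types the table lists as "OCR Required: No".
NO_OCR_EXTENSIONS = {".xlsx", ".docx", ".rtf", ".pptx", ".eml", ".msg",
                     ".html", ".mht", ".csv", ".txt"}


def ocr_required(filename: str) -> Optional[bool]:
    """Return True or False per the table, or None for PDFs.

    A .pdf may be machine readable (no OCR) or scanned (OCR), so the
    extension is not enough to decide.
    """
    ext = Path(filename).suffix.lower()
    if ext in IMAGE_EXTENSIONS:
        return True
    if ext in NO_OCR_EXTENSIONS:
        return False
    if ext == ".pdf":
        return None
    raise ValueError(f"unsupported file type: {ext or filename!r}")
```

For example, `ocr_required("scan.TIFF")` returns `True`, while `ocr_required("budget.xlsx")` returns `False`.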

Supported settings by file type

The following settings are supported, by file type, in Reader and in the Flow process-files step.

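| File type | Write Converted Image | Write Thumbnail | Correct Orientation | Correct Resolution | Find Lines** | Find Barcodes |
|---|---|---|---|---|---|---|
| Machine readable PDF | Yes | Yes | N/A | N/A | Yes | Yes |
| Scanned PDF/Images* | Yes | Yes | Yes | Yes | Yes | Yes |
| .xlsx | Yes | Yes | N/A | N/A | Yes | Yes |
| .docx | Yes | Yes | N/A | N/A | Yes | Yes |
| .pptx | No | No | N/A | N/A | No | No |
| .eml | Yes | Yes | N/A | N/A | Yes | Yes |
| .msg | No | No | N/A | N/A | N/A | N/A |
| .html | Yes | Yes | N/A | N/A | Yes | Yes |
| .mht | No | No | N/A | N/A | Yes | Yes |
| .csv | No | No | N/A | N/A | N/A | N/A |
| .txt | No | No | N/A | N/A | N/A | N/A |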

* Images include .bmp, .gif (single frame), .ico, .jpeg, .jpg, .png, .tif and .tiff file types.

** Find Lines is supported through the force_image_ocr setting.
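
The settings table can also be encoded as data, for instance to warn before a setting is requested for a file type that does not support it. The following is a minimal sketch under that assumption: `SETTINGS`, `SUPPORT`, and `supported_settings` are hypothetical names, and the strings are the column titles from the table rather than real configuration keys.

```python
from typing import List

# Column titles from the settings table, in table order.
SETTINGS = ("Write Converted Image", "Write Thumbnail", "Correct Orientation",
            "Correct Resolution", "Find Lines", "Find Barcodes")

# One row per file type; values follow the order of SETTINGS.
SUPPORT = {
    "Machine readable PDF": ("Yes", "Yes", "N/A", "N/A", "Yes", "Yes"),
    "Scanned PDF/Images": ("Yes", "Yes", "Yes", "Yes", "Yes", "Yes"),
    ".xlsx": ("Yes", "Yes", "N/A", "N/A", "Yes", "Yes"),
    ".docx": ("Yes", "Yes", "N/A", "N/A", "Yes", "Yes"),
    ".pptx": ("No", "No", "N/A", "N/A", "No", "No"),
    ".eml": ("Yes", "Yes", "N/A", "N/A", "Yes", "Yes"),
    ".msg": ("No", "No", "N/A", "N/A", "N/A", "N/A"),
    ".html": ("Yes", "Yes", "N/A", "N/A", "Yes", "Yes"),
    ".mht": ("No", "No", "N/A", "N/A", "Yes", "Yes"),
    ".csv": ("No", "No", "N/A", "N/A", "N/A", "N/A"),
    ".txt": ("No", "No", "N/A", "N/A", "N/A", "N/A"),
}


def supported_settings(file_type: str) -> List[str]:
    """Return the settings marked "Yes" for the given table row."""
    row = SUPPORT[file_type]
    return [name for name, value in zip(SETTINGS, row) if value == "Yes"]
```

Keeping "No" and "N/A" as distinct values, instead of collapsing them to a boolean, preserves the table's distinction between a setting that is unsupported and one that does not apply.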

Supported languages and language codes

The following languages are supported in Reader and in the Flow process-files step.

Languages supported by Google Vision (Cloud)

Code Language
af Afrikaans
sq Albanian
ar Arabic
hy Armenian
be Belarusian
bn Bengali
bg Bulgarian
ca Catalan; Valencian
zh Chinese
hr Croatian
cs Czech
da Danish
nl Dutch
en English
et Estonian
fil Filipino
fi Finnish
fr French
de German
el Greek, Modern
gu Gujarati
iw Hebrew
hi Hindi
hu Hungarian
is Icelandic
id Indonesian
it Italian
ja Japanese
kn Kannada
km Khmer
ko Korean
lo Lao
lv Latvian
lt Lithuanian
mk Macedonian
ms Malay
ml Malayalam
mr Marathi (Marāṭhī)
ne Nepali
no Norwegian
fa Persian
pl Polish
pt Portuguese
pa Panjabi, Punjabi
ro Romanian
ru Russian
ru-PETR1708 Russian (old orthography)
sr Serbian
sr-Latn Serbian (Latin)
sk Slovak
sl Slovene
es Spanish; Castilian
sv Swedish
ta Tamil
te Telugu
th Thai
tr Turkish
uk Ukrainian
vi Vietnamese
yi Yiddish

Languages supported by Microsoft OCR

Code Language
ar Arabic
zh-Hans Chinese (Simplified)
zh-Hant Chinese (Traditional)
cs Czech
da Danish
nl Dutch
en English
fi Finnish
fr French
de German
el Greek, Modern
hu Hungarian
it Italian
ja Japanese
ko Korean
pl Polish
pt Portuguese
ro Romanian
ru Russian
sr-cyrl Serbian (Cyrillic)
sr-latn Serbian (Latin)
sk Slovak
es Spanish; Castilian
sv Swedish
tr Turkish
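
A language code can be checked against the Microsoft OCR list above before it is passed to a digitization step. The sketch below is one way to do that; `MICROSOFT_OCR_CODES` and `normalize_language` are hypothetical names. This reference does not state whether codes are matched case-sensitively, so the case-insensitive lookup is an assumption.

```python
# The 25 codes from the Microsoft OCR table, spelled as listed there.
MICROSOFT_OCR_CODES = frozenset({
    "ar", "zh-Hans", "zh-Hant", "cs", "da", "nl", "en", "fi", "fr", "de",
    "el", "hu", "it", "ja", "ko", "pl", "pt", "ro", "ru", "sr-cyrl",
    "sr-latn", "sk", "es", "sv", "tr",
})

# Lowercased code -> spelling used in the table (assumed canonical).
_BY_LOWER = {code.lower(): code for code in MICROSOFT_OCR_CODES}


def normalize_language(code: str) -> str:
    """Return the code as spelled in the table, or raise ValueError."""
    try:
        return _BY_LOWER[code.strip().lower()]
    except KeyError:
        raise ValueError(f"{code!r} is not in the Microsoft OCR list") from None
```

With this helper, `normalize_language("ZH-hans")` returns `"zh-Hans"`, and a code that appears only in the Google Vision or Tesseract lists, such as `"yi"`, raises `ValueError`.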

Languages supported by Tesseract

Code Language
af Afrikaans
am Amharic
ar Arabic
as Assamese
az Azerbaijani
be Belarusian
bg Bulgarian
bn Bengali
br Breton
bs Bosnian
ca Catalan; Valencian
cs Czech
cy Welsh
da Danish
de German
dz Dzongkha
el Greek, Modern
en English
eo Esperanto
es Spanish; Castilian
et Estonian
fi Finnish
fr French
ga Irish
gl Galician
gu Gujarati
he Hebrew (modern)
hi Hindi
hr Croatian
ht Haitian; Haitian Creole
hu Hungarian
id Indonesian
is Icelandic
it Italian
iu Inuktitut
ja Japanese
jv Javanese
ka Georgian
kk Kazakh
km Khmer
kn Kannada
ko Korean
ku Kurdish
ky Kirghiz, Kyrgyz
la Latin
lb Luxembourgish, Letzeburgesch
lo Lao
lt Lithuanian
lv Latvian
mi Māori
mk Macedonian
ml Malayalam
mn Mongolian
mr Marathi (Marāṭhī)
ms Malay
mt Maltese
my Burmese
ne Nepali
nl Dutch
no Norwegian
oc Occitan
or Oriya
pa Panjabi, Punjabi
pl Polish
ps Pashto, Pushto
pt Portuguese
qu Quechua
ru Russian
sa Sanskrit (Saṁskṛta)
sd Sindhi
si Sinhala, Sinhalese
sk Slovak
sl Slovene
sr Serbian
su Sundanese
sv Swedish
sw Swahili
ta Tamil
te Telugu
tg Tajik
th Thai
ti Tigrinya
tl Tagalog
to Tongan
tr Turkish
tt Tatar
ug Uighur, Uyghur
uk Ukrainian
ur Urdu
uz Uzbek
vi Vietnamese
yi Yiddish
yo Yoruba
zh-Hans Chinese (Simplified)
zh-Hant Chinese (Traditional)