As per title, can this be done?
e.g. if it contains Chinese, or Chinese and English?
Thanks!
Search by file name language type
-
void
- Developer
- Posts: 19903
- Joined: Fri Oct 16, 2009 11:31 pm
Re: Search by file name language type
Please try the following searches:
Chinese (Han):
regex:[\p{Han}]
English (Latin):
regex:[\p{Latin}]
Chinese and English:
regex:[\p{Han}] regex:[\p{Latin}]
Chinese and English ignoring the extension:
regex:[\p{Han}].*\.[^.]*$ regex:[\p{Latin}].*\.[^.]*$
The following scripts are also supported:
regex:[\p{Adlam}]
regex:[\p{Ahom}]
regex:[\p{Anatolian_Hieroglyphs}]
regex:[\p{Arabic}]
regex:[\p{Armenian}]
regex:[\p{Avestan}]
regex:[\p{Balinese}]
regex:[\p{Bamum}]
regex:[\p{Bassa_Vah}]
regex:[\p{Batak}]
regex:[\p{Bengali}]
regex:[\p{Bhaiksuki}]
regex:[\p{Bopomofo}]
regex:[\p{Brahmi}]
regex:[\p{Braille}]
regex:[\p{Buginese}]
regex:[\p{Buhid}]
regex:[\p{Canadian_Aboriginal}]
regex:[\p{Carian}]
regex:[\p{Caucasian_Albanian}]
regex:[\p{Chakma}]
regex:[\p{Cham}]
regex:[\p{Cherokee}]
regex:[\p{Chorasmian}]
regex:[\p{Common}]
regex:[\p{Coptic}]
regex:[\p{Cuneiform}]
regex:[\p{Cypriot}]
regex:[\p{Cypro_Minoan}]
regex:[\p{Cyrillic}]
regex:[\p{Deseret}]
regex:[\p{Devanagari}]
regex:[\p{Dives_Akuru}]
regex:[\p{Dogra}]
regex:[\p{Duployan}]
regex:[\p{Egyptian_Hieroglyphs}]
regex:[\p{Elbasan}]
regex:[\p{Elymaic}]
regex:[\p{Ethiopic}]
regex:[\p{Georgian}]
regex:[\p{Glagolitic}]
regex:[\p{Gothic}]
regex:[\p{Grantha}]
regex:[\p{Greek}]
regex:[\p{Gujarati}]
regex:[\p{Gunjala_Gondi}]
regex:[\p{Gurmukhi}]
regex:[\p{Han}]
regex:[\p{Hangul}]
regex:[\p{Hanifi_Rohingya}]
regex:[\p{Hanunoo}]
regex:[\p{Hatran}]
regex:[\p{Hebrew}]
regex:[\p{Hiragana}]
regex:[\p{Imperial_Aramaic}]
regex:[\p{Inherited}]
regex:[\p{Inscriptional_Pahlavi}]
regex:[\p{Inscriptional_Parthian}]
regex:[\p{Javanese}]
regex:[\p{Kaithi}]
regex:[\p{Kannada}]
regex:[\p{Katakana}]
regex:[\p{Kayah_Li}]
regex:[\p{Kharoshthi}]
regex:[\p{Khitan_Small_Script}]
regex:[\p{Khmer}]
regex:[\p{Khojki}]
regex:[\p{Khudawadi}]
regex:[\p{Lao}]
regex:[\p{Latin}]
regex:[\p{Lepcha}]
regex:[\p{Limbu}]
regex:[\p{Linear_A}]
regex:[\p{Linear_B}]
regex:[\p{Lisu}]
regex:[\p{Lycian}]
regex:[\p{Lydian}]
regex:[\p{Mahajani}]
regex:[\p{Makasar}]
regex:[\p{Malayalam}]
regex:[\p{Mandaic}]
regex:[\p{Manichaean}]
regex:[\p{Marchen}]
regex:[\p{Masaram_Gondi}]
regex:[\p{Medefaidrin}]
regex:[\p{Meetei_Mayek}]
regex:[\p{Mende_Kikakui}]
regex:[\p{Meroitic_Cursive}]
regex:[\p{Meroitic_Hieroglyphs}]
regex:[\p{Miao}]
regex:[\p{Modi}]
regex:[\p{Mongolian}]
regex:[\p{Mro}]
regex:[\p{Multani}]
regex:[\p{Myanmar}]
regex:[\p{Nabataean}]
regex:[\p{Nandinagari}]
regex:[\p{New_Tai_Lue}]
regex:[\p{Newa}]
regex:[\p{Nko}]
regex:[\p{Nushu}]
regex:[\p{Nyakeng_Puachue_Hmong}]
regex:[\p{Ogham}]
regex:[\p{Ol_Chiki}]
regex:[\p{Old_Hungarian}]
regex:[\p{Old_Italic}]
regex:[\p{Old_North_Arabian}]
regex:[\p{Old_Permic}]
regex:[\p{Old_Persian}]
regex:[\p{Old_Sogdian}]
regex:[\p{Old_South_Arabian}]
regex:[\p{Old_Turkic}]
regex:[\p{Old_Uyghur}]
regex:[\p{Oriya}]
regex:[\p{Osage}]
regex:[\p{Osmanya}]
regex:[\p{Pahawh_Hmong}]
regex:[\p{Palmyrene}]
regex:[\p{Pau_Cin_Hau}]
regex:[\p{Phags_Pa}]
regex:[\p{Phoenician}]
regex:[\p{Psalter_Pahlavi}]
regex:[\p{Rejang}]
regex:[\p{Runic}]
regex:[\p{Samaritan}]
regex:[\p{Saurashtra}]
regex:[\p{Sharada}]
regex:[\p{Shavian}]
regex:[\p{Siddham}]
regex:[\p{SignWriting}]
regex:[\p{Sinhala}]
regex:[\p{Sogdian}]
regex:[\p{Sora_Sompeng}]
regex:[\p{Soyombo}]
regex:[\p{Sundanese}]
regex:[\p{Syloti_Nagri}]
regex:[\p{Syriac}]
regex:[\p{Tagalog}]
regex:[\p{Tagbanwa}]
regex:[\p{Tai_Le}]
regex:[\p{Tai_Tham}]
regex:[\p{Tai_Viet}]
regex:[\p{Takri}]
regex:[\p{Tamil}]
regex:[\p{Tangsa}]
regex:[\p{Tangut}]
regex:[\p{Telugu}]
regex:[\p{Thaana}]
regex:[\p{Thai}]
regex:[\p{Tibetan}]
regex:[\p{Tifinagh}]
regex:[\p{Tirhuta}]
regex:[\p{Toto}]
regex:[\p{Ugaritic}]
regex:[\p{Unknown}]
regex:[\p{Vai}]
regex:[\p{Vithkuqi}]
regex:[\p{Wancho}]
regex:[\p{Warang_Citi}]
regex:[\p{Yezidi}]
regex:[\p{Yi}]
regex:[\p{Zanabazar_Square}]
PCRE Unicode character properties
-
Ismale.d
- Posts: 11
- Joined: Tue Nov 02, 2021 11:20 am
Re: Search by file name language type
Hi, thanks for the reply. However, can you give me more hints on what terms I should use to search for the syntax for, say, Japanese?
The pcre2pattern spec is too technical for me, and I have tried searching for "pre2 unicode language list", "pre2 unicode Japanese", etc., and I can't find anything that works.
-
void
- Developer
- Posts: 19903
- Joined: Fri Oct 16, 2009 11:31 pm
Re: Search by file name language type
To search for Hiragana OR Katakana:
regex:[\p{Hiragana}\p{Katakana}]
To search for Hiragana OR Katakana OR Han:
regex:[\p{Hiragana}\p{Katakana}\p{Han}]
Using unicode ranges might be better.
For example, Kanji:
regex:[\u4E00-\u9FFF]
https://stackoverflow.com/questions/19899554/unicode-range-for-japanese
http://www.rikai.com/library/kanjitables/kanji_codes.unicode.shtml
-
Ismale.d
- Posts: 11
- Joined: Tue Nov 02, 2021 11:20 am
Re: Search by file name language type
Wow, this is pretty new to me. Thanks for the help and references! I couldn't have understood it otherwise.
-
Ismale.d
- Posts: 11
- Joined: Tue Nov 02, 2021 11:20 am
Re: Search by file name language type
void wrote: Wed Feb 22, 2023 11:40 pm
To search for Hiragana OR Katakana:
regex:[\p{Hiragana}] | regex:[\p{Katakana}]
To search for Hiragana OR Katakana OR Han:
regex:[\p{Hiragana}] | regex:[\p{Katakana}] | regex:[\p{Han}]
Using unicode ranges might be better.
For example, Kanji:
regex:[\u4E00-\u9FFF]
https://stackoverflow.com/questions/19899554/unicode-range-for-japanese
http://www.rikai.com/library/kanjitables/kanji_codes.unicode.shtml
Oh, I played around with it a bit, and the range syntax actually doesn't work. Maybe an additional symbol is needed? I also tried ranges from other languages, especially English, and none of them worked (AC00, D743; U+0000, U+007F).
-
void
- Developer
- Posts: 19903
- Joined: Fri Oct 16, 2009 11:31 pm
Re: Search by file name language type
The PCRE syntax is:
\x{hhh..} character with hex code hhh..
Use a - inside [ and ] to specify a range.
Please try the following:
regex:[\p{Hiragana}\p{Katakana}\x{4E00}-\x{9FFF}]
PCRE Non-printing characters
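As an aside, here is the same "range inside [ and ]" idea as a minimal Python sketch. Python's `re` module writes a code point as `\uXXXX` rather than PCRE's `\x{hhh..}`. The Hiragana (U+3040 to U+309F) and Katakana (U+30A0 to U+30FF) block ranges are used as stand-ins for `\p{Hiragana}` and `\p{Katakana}`, which is an approximation, and `contains_japanese` is a hypothetical helper name.

```python
import re

# One character class holding three ranges:
#   U+3040-U+309F  Hiragana block (stand-in for \p{Hiragana})
#   U+30A0-U+30FF  Katakana block (stand-in for \p{Katakana})
#   U+4E00-U+9FFF  CJK Unified Ideographs, the Kanji range used above
JAPANESE = re.compile(r"[\u3040-\u309F\u30A0-\u30FF\u4E00-\u9FFF]")


def contains_japanese(filename: str) -> bool:
    """True if at least one character falls in any of the three ranges."""
    return JAPANESE.search(filename) is not None
```

Note that the Kanji range is shared with Chinese, so a purely Chinese file name would also match.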
-
Ismale.d
- Posts: 11
- Joined: Tue Nov 02, 2021 11:20 am
Re: Search by file name language type
Works perfectly! Really appreciate the help!
-
samiaziz
- Posts: 4
- Joined: Tue Jan 23, 2024 3:16 pm
Re: Search by file name language type
That is very useful. However, what can I do to search for filenames written in a given language (like Korean) exclusively without any characters from another language?
-
NotNull
- Posts: 5961
- Joined: Wed May 24, 2017 9:22 pm
Re: Search by file name language type
samiaziz wrote: Tue Jan 23, 2024 3:39 pm
what can I do to search for filenames written in a given language (like Korean) exclusively without any characters from another language?
The Korean alphabet is called Hangul (says Internet ..)
With that:
!regex:[^\p{Hangul}]
Explanation:
regex:[^\p{Hangul}] = Show all files/folders that have non-Korean characters in them anywhere.
!regex:... = show all files/folders, except the ones found above, meaning only files with Korean characters exclusively.
Note that the search query above will not list files with a "normal" extension, like .txt, .jpg, .zip as those are non-Korean characters. Same goes for files with numbers (0...9) in them.
So I don't know how practical this will be, but this is what you asked
Regular Expressions Syntax
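The double negation above (drop every name that contains a non-Hangul character) can be sketched in Python as follows. This is only an illustration: the standard `re` module lacks script properties, so three Hangul block ranges are assumed as a stand-in for `\p{Hangul}`, and `is_hangul_only` is a made-up name.

```python
import re

# Assumed stand-in for \p{Hangul}:
#   U+1100-U+11FF  Hangul Jamo
#   U+3130-U+318F  Hangul Compatibility Jamo
#   U+AC00-U+D7A3  Hangul Syllables
# The leading ^ negates the class: it matches any character outside these ranges.
NON_HANGUL = re.compile(r"[^\u1100-\u11FF\u3130-\u318F\uAC00-\uD7A3]")


def is_hangul_only(name: str) -> bool:
    """Mirror of !regex:[^...]: keep the name only if no non-Hangul character is found."""
    return NON_HANGUL.search(name) is None
```

As noted above, `한글.txt` is rejected by this check because the dot and the letters of the extension are not Hangul.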
-
void
- Developer
- Posts: 19903
- Joined: Fri Oct 16, 2009 11:31 pm
Re: Search by file name language type
Please consider the following search to ignore the extension:
regex:^[\p{Hangul}]+\.[a-z]+$
-
samiaziz
- Posts: 4
- Joined: Tue Jan 23, 2024 3:16 pm
Re: Search by file name language type
void wrote: Wed Jan 24, 2024 2:55 am
Please consider the following search to ignore the extension:
regex:^[\p{Hangul}]+\.[a-z]+$
Thanks a lot. That is exactly what I was looking for.
The following search gives the same result of ignoring the extension:
regex:stem:^[\p{Hangul}]+$
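To illustrate the stem-based approach, here is a minimal Python sketch that strips the extension first and then requires everything that remains to be Hangul. `pathlib.PurePath(...).stem` is assumed to be a close equivalent of the stem used in the search (the name without its last extension), the Hangul ranges are an approximation of `\p{Hangul}`, and `stem_is_hangul_only` is a hypothetical name.

```python
import re
from pathlib import PurePath

# Assumed stand-in for \p{Hangul}: Hangul Jamo, Hangul Compatibility Jamo,
# and Hangul Syllables blocks.
HANGUL_RUN = re.compile(r"[\u1100-\u11FF\u3130-\u318F\uAC00-\uD7A3]+")


def stem_is_hangul_only(filename: str) -> bool:
    """True if the name without its last extension consists only of Hangul."""
    stem = PurePath(filename).stem  # "한글.txt" -> "한글"
    return HANGUL_RUN.fullmatch(stem) is not None
```

Unlike the `\.[a-z]+$` form, this check does not care what the extension looks like, so names with an extension containing digits, or with no extension at all, are accepted too.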
-
samiaziz
- Posts: 4
- Joined: Tue Jan 23, 2024 3:16 pm
Re: Search by file name language type
NotNull wrote: Tue Jan 23, 2024 4:28 pm
So I don't know how practical this will be, but this is what you asked
Thank you for your response.
I have some downloaded files whose names are only in a foreign language, and I want to add a translation in my language to the names of these files without removing the original names.
When I search for the Korean language in the file name for example, the search result lists:
- file names in Korean only,
- and file names in Korean and other languages (which I have already changed).
-
NotNull
- Posts: 5961
- Joined: Wed May 24, 2017 9:22 pm
Re: Search by file name language type
I understand. Thanks for explaining. What I meant was that usually the file extension is *not* in Korean, so that would skip lots of files that still might be of interest. But you mentioned:
exclusively without any characters from another language?
And any .txt file contains characters from another language, namely t, x and t.
But now I get that you wanted the filename *without extension* to be Korean-only.
Anyway .. problem solved