Search in .htm (.html) files

AE_AE
Posts: 2
Joined: Mon Nov 11, 2019 6:56 pm

Post by AE_AE » Mon Nov 11, 2019 7:57 pm

Hello!

Thank you very much for this excellent desktop search utility for Windows. "Everything" is a very useful, helpful and easy-to-use search engine. Every day I search for files by filename and by content many times, and in doing so I found some bugs.

My English is poor, but I will try to put my thoughts into words clearly.

fact_01: BUG.
Everything_1.4.1.935 DOESN'T find English or Russian words (letters) in .htm and .html files, regardless of character encoding (UTF-8 with BOM or UTF-8 without BOM), when they are in the bookmark location part of the code: <A HREF=" " >.

fact_02: Normal. Everything_1.4.1.935 FINDS English and Russian words (letters) in .htm and .html files, regardless of character encoding (UTF-8 with BOM or UTF-8 without BOM), when they are in the bookmark description part of the code (<DD>) or the bookmark name part (> </A>).

For example:

Two .html files (Everything_bookmarks_UTF8_BOM.html and Everything_bookmarks_UTF8_no_BOM.html) are attached. They are bookmarks exported from a browser and contain 2 hyperlinks:

Вопросы и ответы - voidtools
https://www.voidtools.com/ru-ru/faq/
Очень быстрый поиск с программой Everything / Хабр
https://habr.com/ru/post/42354/



<DL><p>
<DT><A HREF="https://www.voidtools.com/ru-ru/faq/" ADD_DATE="1573496435" LAST_MODIFIED="1573496435" ICON_URI="https://www.voidtools.com/favicon.ico" >Вопросы и ответы - voidtools</A>
<DT><A HREF="https://habr.com/ru/post/42354/" ADD_DATE="1573496463" LAST_MODIFIED="1573496463" ICON_URI="https://habr.com/images/favicon-16x16.png" >Очень быстрый поиск с программой Everything / Хабр</A>
<DD>Начну немного «издалека». Дело в том, что я (и думаю не я один) — очень люблю маленькие но функциональные программы. Я встречал несколько таких приложений, которые иначе чем шедеврами софтостроения...

</DL>

Using Everything_1.4.1.935

01. You CAN find these words: "Вопросы", "voidtools", "Everything", "Хабр", "встречал", "люблю", because they are in the <DD> or > </A> part of the code.

02. You CANNOT find these words: "voidtools.com", "ru-ru", "habr", "post/42354/", "habr.com", "ww.void", because they are in the <A HREF=" " > part of the code.

Please fix this BUG in a future version of Everything (if this is possible).

Thanks very much.

NotNull
Posts: 1596
Joined: Wed May 24, 2017 9:22 pm

Re: Search in .htm (.html) files

Post by NotNull » Mon Nov 11, 2019 8:52 pm

If I understand the content: function correctly (I don't use it very often), it searches the resulting text of a document (minus formatting and layout), the same text that Windows Search uses for indexing (see iFilter).
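To illustrate the idea, here is a minimal Python sketch. It is not how Everything or any iFilter is actually implemented; `VisibleText` and `visible_text` are hypothetical names, and the sample is a shortened copy of the bookmark markup from the first post. A text extractor that keeps only text nodes drops tag attributes, so a string that occurs only inside HREF never reaches the searchable text.

```python
from html.parser import HTMLParser

# Shortened sample based on the bookmark file quoted in the first post.
SAMPLE = (
    '<DT><A HREF="https://habr.com/ru/post/42354/" ADD_DATE="1573496463">'
    'Очень быстрый поиск с программой Everything / Хабр</A>\n'
    '<DD>Я встречал несколько таких приложений\n'
)


class VisibleText(HTMLParser):
    """Collects only text nodes; tag names and attributes are dropped."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def visible_text(html: str) -> str:
    """Return the text a reader would see, without tags or attributes."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks)


text = visible_text(SAMPLE)
print("Хабр" in text)        # True: part of the link name
print("habr.com" in text)    # False: occurs only inside HREF
print("habr.com" in SAMPLE)  # True: present in the raw file content
```

This matches the pattern reported above: words in the link name or description are found, while words that appear only in the URL are not.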

If you want to search in the raw text, you can use some other Everything functions:
ansicontent:
utf8content:
utf16content:
utf16becontent:

In your case, replace content: with utf8content: to also search in - for example - HREF attributes.
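Conceptually, a raw search encodes the search term in one specific encoding and looks for those bytes in the file. The Python sketch below only illustrates that idea and makes several assumptions: `raw_contains` and `CODECS` are hypothetical names, "cp1252" stands in for the system ANSI code page (which differs between machines), utf16content: is assumed to mean little-endian, and the comparison is case-sensitive, unlike a default Everything search.

```python
# Hypothetical mapping from the function names above to Python codec names.
# "cp1252" is only a stand-in for whatever ANSI code page the system uses.
CODECS = {
    "ansicontent": "cp1252",
    "utf8content": "utf-8",
    "utf16content": "utf-16-le",
    "utf16becontent": "utf-16-be",
}


def raw_contains(data: bytes, function: str, needle: str) -> bool:
    """Return True if needle, encoded with the codec for function, occurs in data."""
    try:
        pattern = needle.encode(CODECS[function])
    except UnicodeEncodeError:
        return False  # needle cannot be represented in that encoding
    return pattern in data


# Raw bytes of a UTF-8 bookmark line, including the HREF attribute.
raw = '<A HREF="https://habr.com/ru/post/42354/">Хабр</A>'.encode("utf-8")

print(raw_contains(raw, "utf8content", "habr.com"))   # True
print(raw_contains(raw, "utf8content", "Хабр"))       # True
print(raw_contains(raw, "utf16content", "habr.com"))  # False: wrong encoding
```

Because the raw bytes are searched, markup such as attribute values is included, which is why the function has to match the file's actual encoding.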

Re: Search in .htm (.html) files

Post by AE_AE » Tue Nov 12, 2019 10:44 pm

Hello, NotNull!

Thanks very much for your quick answer. You helped me.

Now I can find any text in .htm (.html) files by using the query: .HTM utf8content:

I have also begun to use your advice in other cases.

For example, Everything_1.4.1.935 DOESN'T find Russian words (letters) in .txt files when the character encoding is UTF-8 without BOM.

Now I can find any text in .txt files encoded as UTF-8 without BOM by using the query: .txt utf8content:
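One plausible explanation (an assumption, not confirmed in this thread) is that a reader which relies on the BOM to detect UTF-8 falls back to the ANSI code page when there is no BOM. The Python sketch below shows the effect; `guess_decode` is a hypothetical helper, and "cp1251" is assumed as the ANSI code page of a Russian Windows system.

```python
import codecs

word = "Хабр"
no_bom = word.encode("utf-8")        # UTF-8 without BOM
with_bom = codecs.BOM_UTF8 + no_bom  # UTF-8 with BOM


def guess_decode(data: bytes, ansi_codec: str = "cp1251") -> str:
    """Decode as UTF-8 only when a BOM is present; otherwise assume ANSI.

    This mimics a naive reader and is an assumption, not Everything's code.
    """
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):].decode("utf-8")
    return data.decode(ansi_codec, errors="replace")


print(word in guess_decode(with_bom))  # True: the BOM selects UTF-8
print(word in guess_decode(no_bom))    # False: bytes are misread as cp1251
print(word.encode("utf-8") in no_bom)  # True: a raw UTF-8 byte search still matches
```

A raw byte search does not depend on guessing the encoding, which would explain why utf8content: finds the text in both cases.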

Re: Search in .htm (.html) files

Post by NotNull » Tue Nov 12, 2019 11:04 pm

:thumbsup:

Glad that I could help.

FYI: it is under consideration for a next major version of Everything to bypass this iFilter behaviour for certain text-based files like xml, html and json.
That way you don't have to use the utf8content: function to find your text, but you can use the "normal" content: function instead.
