Search inside an mp3 file for some custom or non-standard id3 tag like the AcoustID

Have a suggestion for "Everything"? Please post it here.
defza
Posts: 22
Joined: Thu Apr 18, 2019 12:49 pm

Search inside an mp3 file for some custom or non-standard id3 tag like the AcoustID

Post by defza » Fri Feb 18, 2022 2:57 pm

void wrote:
Fri Dec 04, 2020 6:18 am
Searching, sorting by, displaying and indexing tags is in development.

Thank you for your suggestion.
So I still can't search inside an MP3 file for a custom or non-standard ID3 tag, such as the AcoustID, right?
Even if it's slow and non-indexed?

raccoon
Posts: 717
Joined: Thu Oct 18, 2018 1:24 am

Re: Custom tags

Post by raccoon » Fri Feb 18, 2022 4:37 pm

We can search for them in Everything 1.5 Alpha using a custom tailored binary contents search. If you could attach a very tiny public domain MP3 file with a typical AcoustID tag, I can examine it in a hex editor and give you a couple search options.

horst.epp
Posts: 751
Joined: Fri Apr 04, 2014 3:24 pm

Re: Custom tags

Post by horst.epp » Fri Feb 18, 2022 5:16 pm

I use NTFS streams for tags, which are indexed by Everything's property indexing.
The tags are written by a Total Commander plugin or an XYplorer script.
Both tools support Everything based searches.

defza
Posts: 22
Joined: Thu Apr 18, 2019 12:49 pm

Re: Custom tags

Post by defza » Fri Feb 18, 2022 5:34 pm

raccoon wrote:
Fri Feb 18, 2022 4:37 pm
We can search for them in Everything 1.5 Alpha using a custom tailored binary contents search. If you could attach a very tiny public domain MP3 file with a typical AcoustID tag, I can examine it in a hex editor and give you a couple search options.
I see what you mean, a hex search on the contents... but would it be able to find the ones with a duplicate AcoustID, though?
Here is a small sample https://file.io/5NTZZKF3HJTa

raccoon
Posts: 717
Joined: Thu Oct 18, 2018 1:24 am

Re: Custom tags

Post by raccoon » Fri Feb 18, 2022 6:32 pm

Your file was deleted. Please attach it to the forum here, in the Attachments tab below Submit.

A regular expression search can capture data and place it into a column. You can then sort on this column and detect duplicates.
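Outside Everything, the same "capture a value, then look for repeats" idea can be sketched in Python. This is only an illustration, assuming the raw bytes of each file are already loaded into a dict; `find_duplicate_captures` is a hypothetical helper, not an Everything feature:

```python
import re
from collections import defaultdict
from typing import Dict, List


def find_duplicate_captures(files: Dict[str, bytes], pattern: bytes) -> Dict[bytes, List[str]]:
    """Group file names by capture group 1 of `pattern`, keeping only values seen more than once."""
    regex = re.compile(pattern, re.DOTALL)  # DOTALL: '.' may need to match 0x0A in binary data
    groups: Dict[bytes, List[str]] = defaultdict(list)
    for name, data in files.items():
        match = regex.search(data)
        if match:
            groups[match.group(1)].append(name)
    return {value: names for value, names in groups.items() if len(names) > 1}


files = {"a.mp3": b"ID=123;", "b.mp3": b"ID=123;", "c.mp3": b"ID=456;"}
print(find_duplicate_captures(files, rb"ID=([0-9]+);"))  # {b'123': ['a.mp3', 'b.mp3']}
```

Grouping on the captured value is what sorting the column and scanning for repeats amounts to: any value shared by two or more files is a duplicate.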

defza
Posts: 22
Joined: Thu Apr 18, 2019 12:49 pm

Re: Custom tags

Post by defza » Fri Feb 18, 2022 6:59 pm


raccoon
Posts: 717
Joined: Thu Oct 18, 2018 1:24 am

Re: Custom tags

Post by raccoon » Fri Feb 18, 2022 8:06 pm

@void, what am I doing wrong here?

I can successfully match content:"Acoustid Id" against this file, but I cannot do anything else at all: not regex:content:"(Acoustid Id)", not contentw:"Acoustid Id", nor make any other progress toward handling the content and creating a regex capture group.

void
Developer
Posts: 9622
Joined: Fri Oct 16, 2009 11:31 pm

Re: Search inside an mp3 file for some custom or non-standard id3 tag like the AcoustID

Post by void » Sat Feb 19, 2022 2:02 am

This is going to be difficult to do with Everything currently.



Example usage to find "Acoustid Id"


"02 Skit 1.mp3" regex:binary:content:"\x41\x00\x63\x00\x6F\x00\x75\x00\x73\x00\x74\x00\x69\x00\x64\x00\x20\x00\x49\x00\x64\x00\x00\x00\xFF\xFE(.*?)\x00\x00"
regex: = enable regular expressions so we can capture the tag value.
binary: = treat both the search and the content as a byte stream.
content: = search the file content.
\x?? = a byte written in hex (here: "Acoustid Id" in UTF-16LE, its double-NUL terminator and the FF FE byte order mark).
(.*?) = capture the UTF-16 text after the tag.
\x00\x00 = match the trailing double-NUL terminator.
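For anyone who wants to test the same byte pattern outside Everything, here is a rough Python equivalent using a bytes regex. It is a sketch, assuming the TXXX-style layout described above (name, double NUL, FF FE BOM, value, double NUL); `find_acoustid_bytes` is a made-up name. One deliberate difference: the value is captured two bytes at a time. With a plain `(.*?)`, Python's `re` stops one byte early on this layout, because the NUL high byte of the last character plus the first terminator byte already look like `\x00\x00`.

```python
import re
from typing import Optional

# "Acoustid Id" as UTF-16LE bytes: 41 00 63 00 6F 00 ... (the \x41\x00... run in the search above).
TAG_NAME = "Acoustid Id".encode("utf-16-le")

# Name, double-NUL terminator, FF FE byte order mark, then the value captured one
# 2-byte UTF-16 code unit at a time, up to the next double-NUL terminator.
ACOUSTID_RE = re.compile(
    re.escape(TAG_NAME) + rb"\x00\x00\xFF\xFE((?:..)*?)\x00\x00",
    re.DOTALL,  # let '.' match any byte, including 0x0A
)


def find_acoustid_bytes(data: bytes) -> Optional[bytes]:
    """Return the raw UTF-16LE bytes of the Acoustid Id value, or None if absent."""
    match = ACOUSTID_RE.search(data)
    return match.group(1) if match else None
```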

Use the Regex Match 1 column to view the capture.
Unfortunately, the captured text is treated as a byte stream, so you will see something like:


0 4 c c b 3 e 9 - 9 c a f - 4 1 f f - a 1 7 2 - f 5 c 3 c 2 9 a 9 1 3 3
You could then right-click this column and click find duplicates to find duplicated Acoustid Ids.
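The spaced-out capture is simply UTF-16LE bytes viewed one byte at a time: every ASCII character is followed by a NUL byte. A small Python sketch of turning it back into readable text (hypothetical helper names; `byte_view` only approximates what the column shows):

```python
def readable_capture(raw: bytes) -> str:
    """Decode a captured UTF-16LE value (each ASCII char followed by a NUL byte)."""
    return raw.decode("utf-16-le")


def byte_view(raw: bytes) -> str:
    """Approximate the byte-stream display: one char per byte, NUL bytes shown as spaces."""
    return raw.decode("latin-1").replace("\x00", " ").strip()


raw = "04ccb3e9".encode("utf-16-le")
print(byte_view(raw))         # 0 4 c c b 3 e 9
print(readable_capture(raw))  # 04ccb3e9
```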



I am working on a hexcontent: search function that treats the content as binary, expressed as hexadecimal values, to make this easier to search and to make the regex captures easier to read.


raccoon wrote:
I can successfully match content:"Acoustid Id"
This shouldn't match.
I was unable to reproduce this result on my end.
The default iFilter for MP3 files should return empty content.
Could you please confirm that content:"Acoustid Id" matches this file?


contentw: or utf16becontent: will most likely not match, because they treat the entire content as UTF-16LE / UTF-16BE.
These functions will not work because the header and tags are not word-aligned.

I need a contentw+1: search function...
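A minimal Python sketch of why the alignment matters. The layout is an assumption for illustration (a 10-byte tag header, then a 10-byte TXXX frame header, a 1-byte encoding field and a 2-byte BOM), which puts the UTF-16LE description at byte 23, an odd offset; `aligned_hits` is a hypothetical helper:

```python
def aligned_hits(data: bytes, text: str) -> list:
    """Return the starting offsets (0 and/or 1) at which a UTF-16LE decode contains text."""
    hits = []
    for start in (0, 1):
        chunk = data[start:]
        chunk = chunk[: len(chunk) - len(chunk) % 2]  # keep whole 2-byte code units
        if text in chunk.decode("utf-16-le", errors="replace"):
            hits.append(start)
    return hits


# Hypothetical layout: 10-byte tag header, 10-byte frame header, 1 encoding byte,
# 2-byte BOM, so the UTF-16LE description starts at byte 23 (an odd offset).
prefix = b"ID3\x03\x00\x00\x00\x00\x00\x00" + b"TXXX" + b"\x00" * 6 + b"\x01\xff\xfe"
sample = prefix + "Acoustid Id".encode("utf-16-le") + b"\x00\x00"
print(len(prefix), aligned_hits(sample, "Acoustid Id"))  # 23 [1]
```

Decoding from byte 0 pairs each character's NUL with the next character's low byte, so the text never appears; decoding from byte 1 (the "+1" case) lines the code units up again.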

raccoon
Posts: 717
Joined: Thu Oct 18, 2018 1:24 am

Re: Search inside an mp3 file for some custom or non-standard id3 tag like the AcoustID

Post by raccoon » Sat Feb 19, 2022 6:35 am

void wrote:
Sat Feb 19, 2022 2:02 am
raccoon wrote:I can successfully match content:"Acoustid Id"
This shouldn't match.
I was unable to reproduce this result on my end.
The default iFilter for MP3 files should return empty content.
Could you please confirm that content:"Acoustid Id" matches this file?
I'm using Windows 7. I get a match on this file, but not on other files in my test directory, including other MP3 files. Just this one file matches, correctly IMHO.

[Screenshot: Everything64_R4lQHaff2r.png]
FWIW, PCRE regex can handle binary data just fine. There should be no need to invoke hex. Just pass the raw binary data to the regex function and trust the user to pluck out the match text.

void
Developer
Posts: 9622
Joined: Fri Oct 16, 2009 11:31 pm

Re: Search inside an mp3 file for some custom or non-standard id3 tag like the AcoustID

Post by void » Sat Feb 19, 2022 6:46 am

Everything is falling back to a mixed search (it treats the file as ANSI, UTF-8, UTF-16, UTF-16BE, UTF-16+1 and UTF-16BE+1).

One of these is matching.

The content is converted to UTF-8 before any match is performed.

regex: doesn't work because the conversion produces invalid UTF-8 (PCRE_UTF8_ERR14).
I am working on a fix.
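One general way such invalid UTF-8 can arise (a sketch of the mechanism in Python, not Everything's code): reading arbitrary binary data as UTF-16 can yield lone surrogate code units in the range 0xD800 - 0xDFFF. If those are converted to UTF-8 anyway, each becomes a 3-byte sequence starting with 0xED, which a strict UTF-8 validator rejects; PCRE reports exactly that case as PCRE_UTF8_ERR14. The helper names are hypothetical:

```python
def utf16le_to_utf8_lenient(data: bytes) -> bytes:
    """Convert UTF-16LE to UTF-8, passing lone surrogates through instead of failing."""
    text = data.decode("utf-16-le", errors="surrogatepass")
    return text.encode("utf-8", errors="surrogatepass")


def is_valid_utf8(data: bytes) -> bool:
    """Strict check: Python's UTF-8 decoder rejects encoded surrogates (ED A0 80 ...)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Bytes 00 D8 read as UTF-16LE give the lone high surrogate U+D800, then 'A'.
converted = utf16le_to_utf8_lenient(b"\x00\xd8A\x00")
print(converted, is_valid_utf8(converted))  # b'\xed\xa0\x80A' False
```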

raccoon
Posts: 717
Joined: Thu Oct 18, 2018 1:24 am

Re: Search inside an mp3 file for some custom or non-standard id3 tag like the AcoustID

Post by raccoon » Sat Feb 19, 2022 6:53 am

The text in this file's ID3v2 tag, located at the start of the file, is UTF-16 or UCS-2, so ASCII characters are stored as <ASC><NUL><ASC><NUL>...

The early ID3v2 spec says: All unicode strings [UNICODE] use 16-bit unicode 2.0 (ISO/IEC 10646-1:1993, UCS-2). So yeah. "That" flavor of kinda-sorta-sometimes compliant UTF-16.
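To make that layout concrete, here is a minimal Python reader for TXXX (user-defined text) frames. It assumes an ID3v2.3 tag with no extended header and no unsynchronisation, and handles only text encodings 0 (ISO-8859-1) and 1 (UTF-16 with BOM); the function names are illustrative, and a real tagging library covers far more cases:

```python
from __future__ import annotations

import struct


def syncsafe_to_int(raw: bytes) -> int:
    """Decode an ID3v2 syncsafe integer: 7 significant bits per byte, big-endian."""
    value = 0
    for byte in raw:
        value = (value << 7) | (byte & 0x7F)
    return value


def split_terminated(data: bytes, encoding: int) -> tuple[bytes, bytes]:
    """Split off one NUL-terminated string; UTF-16 uses a 2-byte NUL on an even offset."""
    if encoding == 1:
        for i in range(0, len(data) - 1, 2):
            if data[i] == 0 and data[i + 1] == 0:
                return data[:i], data[i + 2:]
        return data, b""
    end = data.find(b"\x00")
    return (data, b"") if end < 0 else (data[:end], data[end + 1:])


def decode_text(raw: bytes, encoding: int) -> str:
    # Encoding 1 is UTF-16 with a BOM; Python's "utf-16" codec reads and drops the BOM.
    return raw.decode("utf-16" if encoding == 1 else "latin-1")


def read_txxx_frames(data: bytes) -> dict[str, str]:
    """Map description -> value for TXXX frames in an ID3v2.3 tag at the start of data."""
    if len(data) < 10 or data[:3] != b"ID3" or data[3] != 3:
        return {}
    tag_end = min(len(data), 10 + syncsafe_to_int(data[6:10]))
    frames: dict[str, str] = {}
    pos = 10  # assumes no extended header and no unsynchronisation
    while pos + 10 <= tag_end:
        frame_id = data[pos:pos + 4]
        if frame_id == b"\x00\x00\x00\x00":  # reached the padding
            break
        (size,) = struct.unpack(">I", data[pos + 4:pos + 8])  # plain 32-bit size in v2.3
        body = data[pos + 10:pos + 10 + size]
        if frame_id == b"TXXX" and body and body[0] in (0, 1):
            desc_raw, rest = split_terminated(body[1:], body[0])
            value_raw, _ = split_terminated(rest, body[0])
            frames[decode_text(desc_raw, body[0])] = decode_text(value_raw, body[0])
        pos += 10 + size
    return frames
```

Note that the UTF-16 double-NUL terminator is only searched for on even offsets within the string, which is the same alignment issue discussed above.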

void
Developer
Posts: 9622
Joined: Fri Oct 16, 2009 11:31 pm

Re: Search inside an mp3 file for some custom or non-standard id3 tag like the AcoustID

Post by void » Sat Feb 19, 2022 8:06 am

The following will work in the next alpha update:

"02 Skit 1.mp3" regex:utf16content:"Acoustid Id\x00([^\x00]*)\x00" | regex:utf16offset1content:"Acoustid Id\x00([^\x00]*)\x00"

This will also display the captured text nicely.
I'll post again once this is ready for testing.

raccoon
Posts: 717
Joined: Thu Oct 18, 2018 1:24 am

Re: Search inside an mp3 file for some custom or non-standard id3 tag like the AcoustID

Post by raccoon » Sat Feb 19, 2022 8:58 am

Is utf16content: compatible with first1024bytes: so Everything doesn't read the entire file into memory, only as much as should be needed to get the job done?

void
Developer
Posts: 9622
Joined: Fri Oct 16, 2009 11:31 pm

Re: Search inside an mp3 file for some custom or non-standard id3 tag like the AcoustID

Post by void » Sat Feb 19, 2022 9:00 am

Currently, no.
In the next alpha update there will be a content-max-size: search function to specify the maximum number of bytes to read in the following content: function.

For example:
content-offset:0 content-max-size:1024 utf16content:"..."
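The same idea, reading only a window of the file rather than the whole thing, can be sketched in Python. This sketch assumes the ID3v2 tag sits at the start of the file and uses the header's syncsafe size to decide how many bytes are needed; the function names are hypothetical, and the optional ID3v2.4 footer is ignored:

```python
import io
from typing import BinaryIO

HEADER_SIZE = 10  # "ID3" + version (2) + flags (1) + syncsafe size (4)


def id3v2_tag_length(header: bytes) -> int:
    """Full tag length (header + syncsafe size field), or 0 when there is no ID3v2 tag."""
    if len(header) < HEADER_SIZE or header[:3] != b"ID3":
        return 0
    size = 0
    for byte in header[6:10]:
        size = (size << 7) | (byte & 0x7F)  # 7 significant bits per byte
    return HEADER_SIZE + size  # ignores the optional ID3v2.4 footer


def read_window(f: BinaryIO, offset: int, max_size: int) -> bytes:
    """Read at most max_size bytes starting at offset (like content-offset/content-max-size)."""
    f.seek(offset)
    return f.read(max_size)


def read_tag_bytes(f: BinaryIO) -> bytes:
    """Read only the ID3v2 tag at the start of the file, not the audio after it."""
    length = id3v2_tag_length(read_window(f, 0, HEADER_SIZE))
    return read_window(f, 0, length) if length else b""


if __name__ == "__main__":
    # In-memory stand-in for an MP3: a 15-byte tag body followed by "audio".
    fake = io.BytesIO(b"ID3\x03\x00\x00\x00\x00\x00\x0f" + b"\x00" * 15 + b"audio")
    print(len(read_tag_bytes(fake)))  # 25
```

With a real file, `open(path, "rb")` would take the place of `io.BytesIO`.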

void
Developer
Posts: 9622
Joined: Fri Oct 16, 2009 11:31 pm

Re: Search inside an mp3 file for some custom or non-standard id3 tag like the AcoustID

Post by void » Tue Feb 22, 2022 6:58 am

Everything 1.5.0.1301a makes several improvements to content searching.

The following will now show the Acoustid Id value in the Regular Expression Match 1 column:

"02 Skit 1.mp3" regex:utf16offset1content:"Acoustid Id\x00([^\x00]*)\x00" | regex:utf16content:"Acoustid Id\x00([^\x00]*)\x00"
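The two alternatives in that search can be imitated in Python by decoding the bytes as UTF-16LE twice, once from byte 1 and once from byte 0, and running the same text regex on each. A sketch only; `acoustid_value` is a hypothetical name, and it also strips the leading U+FEFF character (the FF FE byte order mark decoded as text), which would otherwise sit at the front of the capture:

```python
import re
from typing import Optional

# Same text pattern as the search above, applied to decoded UTF-16 text,
# where each \x00 is the NUL character U+0000.
TEXT_RE = re.compile(r"Acoustid Id\x00([^\x00]*)\x00")


def utf16le_text(data: bytes, offset: int) -> str:
    """Decode data as UTF-16LE starting at byte `offset`, ignoring a trailing odd byte."""
    chunk = data[offset:]
    chunk = chunk[: len(chunk) - len(chunk) % 2]
    return chunk.decode("utf-16-le", errors="replace")


def acoustid_value(data: bytes) -> Optional[str]:
    """Try offset 1 first, then offset 0, mirroring the two alternatives joined by |."""
    for offset in (1, 0):
        match = TEXT_RE.search(utf16le_text(data, offset))
        if match:
            # FF FE (the byte order mark) decodes to U+FEFF at the front of the capture.
            return match.group(1).lstrip("\ufeff")
    return None
```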



There's still a little bit more work to do here.

I would like to simplify this search into one call:
"02 Skit 1.mp3" regex:binarycontent:"Acoustid Id\x00([^\x00]*)\x00"

This currently doesn't work due to a search optimization.



Added hexcontent: to treat file content as a hex stream.
For example:

hex-content:4944330300000000
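In that example, 49 44 33 is "ID3", followed by 03 00 (ID3v2.3, revision 0), a zero flags byte and the first two bytes of the size field, so it matches an ID3v2.3 header (anywhere in the content, not only at the start). A Python sketch of a hex-content-style check, with hypothetical helper names, which converts the hex pattern back to bytes before searching:

```python
def hex_stream(data: bytes) -> str:
    """Render bytes as uppercase hex, two digits per byte."""
    return data.hex().upper()


def hex_content_match(data: bytes, hex_pattern: str) -> bool:
    """True if the byte sequence written as hex appears anywhere in data."""
    # Convert the pattern back to bytes so matches are always byte-aligned.
    return bytes.fromhex(hex_pattern) in data


header = b"ID3\x03\x00\x00\x00\x00\x01\x05"  # ID3v2.3 header, syncsafe size 133
print(hex_stream(header))                             # 49443303000000000105
print(hex_content_match(header, "4944330300000000"))  # True
```

Searching the raw bytes keeps matches byte-aligned. A naive substring search on the hex string could also match across a nibble boundary: "494433" occurs inside the hex for the unrelated bytes 04 94 43 30.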



Added content-offset: search function to specify the starting byte offset to read from disk.

Added content-max-size: search function to specify the maximum number of bytes to read from disk with a content search function.

Added utf16-offset1-content: search function to start searching UTF-16 content with a 1 byte offset.

Added utf16be-offset1-content: search function to start searching UTF-16BE content with a 1 byte offset.

void
Developer
Posts: 9622
Joined: Fri Oct 16, 2009 11:31 pm

Re: Search inside an mp3 file for some custom or non-standard id3 tag like the AcoustID

Post by void » Thu Feb 24, 2022 5:34 am

Everything 1.5.0.1302a adds support for:

"02 Skit 1.mp3" regex:binarycontent:"Acoustid Id\x00([^\x00]*)\x00"


binarycontent: searches the content as each of the following in turn, until a match is found or all of them have been tried without a match:
  • ANSI
  • UTF-8
  • UTF-16
  • UTF-16BE
  • UTF-16 (with a byte offset of 1)
  • UTF-16BE (with a byte offset of 1)
The search term is still treated as text (not binary), whereas the binary: modifier treats both the content and the search term as binary.
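The try-each-interpretation approach can be sketched in Python as follows. This is an illustration of the idea, not Everything's implementation: "ANSI" is assumed to be cp1252 here (the real ANSI code page depends on the system locale), invalid sequences are replaced rather than rejected, and `mixed_search` is a hypothetical name:

```python
from __future__ import annotations

import re


def _utf16(data: bytes, codec: str, offset: int) -> str:
    """Decode as UTF-16 (LE or BE) from a byte offset, dropping a trailing odd byte."""
    chunk = data[offset:]
    chunk = chunk[: len(chunk) - len(chunk) % 2]
    return chunk.decode(codec, errors="replace")


# The six interpretations, in the order listed above ("ANSI" assumed to be cp1252).
DECODERS = [
    ("ANSI", lambda d: d.decode("cp1252", errors="replace")),
    ("UTF-8", lambda d: d.decode("utf-8", errors="replace")),
    ("UTF-16", lambda d: _utf16(d, "utf-16-le", 0)),
    ("UTF-16BE", lambda d: _utf16(d, "utf-16-be", 0)),
    ("UTF-16 +1", lambda d: _utf16(d, "utf-16-le", 1)),
    ("UTF-16BE +1", lambda d: _utf16(d, "utf-16-be", 1)),
]


def mixed_search(data: bytes, pattern: str) -> tuple[str, re.Match] | None:
    """Return (interpretation name, match) for the first interpretation that matches."""
    regex = re.compile(pattern)  # the search term stays text, as described above
    for name, decode in DECODERS:
        match = regex.search(decode(data))
        if match:
            return name, match
    return None
```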


Everything 1.5.0.1302a adds an ascii-content: search function to search files as ASCII text.



Everything 1.5.0.1302a will no longer treat characters in the range 0xd800 - 0xdfff as NULs.
(This was introduced in Everything 1301a)
While characters in the range 0xd800 - 0xdfff are invalid Unicode characters, they are valid in filenames on Windows.
Force a rebuild if you are indexing files with invalid Unicode characters.



Everything 1.5.0.1302a modifies PCRE to accept characters in the range 0xd800 - 0xdfff.
(disables PCRE_UTF8_ERR14)
