ESN 57222-090413-489357-27


Document Name: Mess of Metadata

Mess of Metadata


2009/04/13

The comments on the article about mdfind turned into a long and rather confused discussion of OS X metadata. I thought I'd try to straighten some of that out in a separate post, though honestly I'm still easily confused myself!

First, what metadata are we talking about? For an old Unix hand, the metadata is the information stored in the inode: file size, permissions, pointers to data blocks, link counts. That's traditional metadata.
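
To make "traditional metadata" concrete, here is a minimal sketch that reads the inode fields through Python's os.stat. The function name inode_metadata is my own, and note that stat does not expose the data block pointers themselves.

```python
import os
import stat
import tempfile


def inode_metadata(path):
    """Return the classic inode fields for path as a plain dict."""
    st = os.stat(path)
    return {
        "inode": st.st_ino,
        "size": st.st_size,
        "permissions": stat.filemode(st.st_mode),
        "links": st.st_nlink,
        "owner_uid": st.st_uid,
        "owner_gid": st.st_gid,
        "mtime": st.st_mtime,
    }


if __name__ == "__main__":
    # Create a throwaway file so the example runs anywhere.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"hello\n")
    try:
        for key, value in inode_metadata(tmp.name).items():
            print(f"{key:12} {value}")
    finally:
        os.unlink(tmp.name)
```

None of this involves Spotlight or extended attributes; it is only what the filesystem itself records about the file.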

However, there's more metadata today, not just in Unix systems, but especially in Mac OS X. There are extended permissions, ACLs, extended attributes (xattrs), and Spotlight-related metadata. It's very hard to ferret all this out of Google because similar terms are used for dissimilar features.

Macs had "resource forks" early on. OS X still has resource forks, but apparently Apple would like to move away from them. That's probably why things get so darn confusing: search for information on metadata and OS X and you'll find lots of pointers to pages that talk about resource forks, but that material is mostly deprecated and usually doesn't apply to OS X.

Let's take Spotlight metadata first. These are specific keys that Spotlight indexes. For example, you can do things like this:

 mdfind 'kMDItemFSSize > 20000000'
 mdfind 'kMDItemFinderComment == "script application wrapper"'
 mdfind 'kMDItemTextContent == "*Seneca*" && kMDItemFSName != "*emlx"'
 mdfind 'kMDItemTextContent == "*Seneca*" && kMDItemContentType != "com.apple.mail.emlx"'
 
 

How does Spotlight get the info to index? It asks a Spotlight Importer. The "Basics of Spotlight" page explains:

Once the Mac OS does kick-off the extraction of metadata from a file, it does so through a Spotlight Importer. Spotlight Importers are plug-ins for the Mac OS that a developer provides specifically for helping files created by their applications to be searchable within Spotlight. Spotlight crawls through its list of changed files, handing each one to the appropriate importer. The importers then read the files, compile a list of metadata, and then hand the metadata back to Spotlight. At this point, the changed file is available for searching within Spotlight.
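
The flow described in that quote can be pictured with a toy model. This is only a sketch of the idea, not Apple's plug-in API: the names text_importer, fallback_importer, IMPORTERS and index_changed_files are all hypothetical, and choosing an importer by file extension is a simplifying assumption (as I understand it, real Spotlight matches on the file's content type).

```python
from pathlib import Path


def text_importer(path):
    """Hypothetical importer: read the file and hand back its text."""
    text = Path(path).read_text(errors="replace")
    return {"kMDItemTextContent": text}


def fallback_importer(path):
    """No importer claims this file type, so contribute nothing."""
    return {}


# Hypothetical registry mapping a file extension to its importer.
IMPORTERS = {".txt": text_importer}


def index_changed_files(changed_paths, index):
    """Hand each changed file to its importer and store what comes back."""
    for path in changed_paths:
        suffix = Path(path).suffix.lower()
        importer = IMPORTERS.get(suffix, fallback_importer)
        metadata = importer(path)
        # The crawler itself can supply generic keys such as the file name.
        metadata["kMDItemFSName"] = Path(path).name
        index[str(path)] = metadata
    return index
```

Calling index_changed_files(["note.txt"], {}) on an existing text file would return a dict keyed by path, holding whatever the importer chose to report. The point is only that the importer, not Spotlight, decides what the metadata is and where it comes from.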

OK, great, but where does the metadata that the importer supplies come from? Apparently, that's up to the developer. Apple's Extracting Metadata from Documents says:

Avoid the use of external files to store metadata content. All critical metadata should be in the same file as the data. The system store of metadata should be considered volatile.

I want to quibble a little: if it's stored in the data file, it's really not metadata, is it? But never mind. Some apps do it that way: ID3 tags, for example. But other apps do not. In my ~/Library/Caches/Metadata I found some interesting stuff: *some* apps store Spotlight metadata there. I found:

 $ ls  ~/Library/Caches/Metadata 
 Billings		Microsoft		Safari
 Camino			Precipitate		com.evernote.Evernote
 

If I look in Billings, I find this:

 <key>MetaData</key>
 <dict>
         <key>com_marketcircle_projectname</key>
         <string>Repair</string>
         <key>kMDItemContentCreationDate</key>
         <date>2009-03-22T12:10:17Z</date>
         <key>kMDItemContentModificationDate</key>
         <date>2009-03-22T12:21:17Z</date>
         <key>kMDItemDisplayName</key>
         <string>Extreme rework</string>
         <key>kMDItemTitle</key>
         <string>Extreme rework</string>
 </dict>
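
That fragment is ordinary XML property list data, so it can be read with Python's plistlib. A minimal sketch follows. The XML declaration, the plist element and the outer dict are my assumptions, added so the fragment parses on its own; the real cache file presumably contains more than what is shown here.

```python
import plistlib

# The Billings fragment with the carriage returns removed. The wrapper
# elements around <key>MetaData</key> are assumed, not copied from the file.
CACHE_FILE = b"""<?xml version="1.0" encoding="UTF-8"?>
<plist version="1.0">
<dict>
    <key>MetaData</key>
    <dict>
        <key>com_marketcircle_projectname</key>
        <string>Repair</string>
        <key>kMDItemContentCreationDate</key>
        <date>2009-03-22T12:10:17Z</date>
        <key>kMDItemContentModificationDate</key>
        <date>2009-03-22T12:21:17Z</date>
        <key>kMDItemDisplayName</key>
        <string>Extreme rework</string>
        <key>kMDItemTitle</key>
        <string>Extreme rework</string>
    </dict>
</dict>
</plist>
"""

# plistlib turns <dict> into a dict and <date> into a datetime object.
metadata = plistlib.loads(CACHE_FILE)["MetaData"]
for key, value in sorted(metadata.items()):
    print(f"{key} = {value!r}")
```

Note the mix of standard kMDItem keys and the app's own com_marketcircle_projectname key in the same dictionary.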
 

But obviously not all apps store their Spotlight-related metadata there. Entourage does, as explained in this "How does Entourage work with Spotlight?" piece:

When you enable Spotlight indexing within Entourage, a "cache" file is created for each item within your Entourage database. If you have 100,000 e-mail messages in your Entourage database, 100,000 cache files will be created. If you want to see the cache files, you can find them within your Library/Caches/Metadata/Microsoft folder.

Each cache file contains all the metadata that will be needed for indexing by Spotlight. All changes within Entourage are reflected to the cache files. Create a new item and a new cache file will be created. Updated an item and its cache file will update. Delete an item and its cache file will be deleted. With all these changes, Spotlight receives file change notifications and eventually will ask the modified cache files to go through the import process using the Entourage Spotlight Importer.

But there's no iTunes folder there.

There are also defaults. If I create a text file with "date > file", running "mdls file" shows these Spotlight keys:

 kMDItemContentCreationDate     = 2009-04-12 12:07:02 -0400
 kMDItemContentModificationDate = 2009-04-12 12:07:02 -0400
 kMDItemContentType             = "public.data"
 kMDItemContentTypeTree         = (
     "public.data",
     "public.item"
 )
 kMDItemDisplayName             = "file"
 kMDItemFSContentChangeDate     = 2009-04-12 12:07:02 -0400
 kMDItemFSCreationDate          = 2009-04-12 12:07:02 -0400
 kMDItemFSCreatorCode           = ""
 kMDItemFSFinderFlags           = 0
 kMDItemFSHasCustomIcon         = 0
 kMDItemFSInvisible             = 0
 kMDItemFSIsExtensionHidden     = 0
 kMDItemFSIsStationery          = 0
 kMDItemFSLabel                 = 0
 kMDItemFSName                  = "file"
 kMDItemFSNodeCount             = 0
 kMDItemFSOwnerGroupID          = 501
 kMDItemFSOwnerUserID           = 501
 kMDItemFSSize                  = 29
 kMDItemFSTypeCode              = ""
 kMDItemKind                    = "Plain text"
 kMDItemLastUsedDate            = 2009-04-12 12:07:02 -0400
 kMDItemUsedDates               = (
     2009-04-12 00:00:00 -0400
 )
 

Obviously the "date" command didn't create those. Spotlight won't even index that file (no extension), but it has some default keys just the same! See "Spotlight, mdfind (Mac OS X Tiger searching)" for more on that.

You can add metadata of your own as an extended attribute, and you can modify one item in Spotlight's domain.

 $ xattr -w mystuff "hello there" file
 $ xattr -l file
 mystuff: hello there
 

The only Spotlight-related data you can modify is kMDItemFinderComment. You do that with Get Info in the Finder, and after adding a comment, xattr shows this:

 $ xattr -l file
 com.apple.metadata:kMDItemFinderComment:
 0000   62 70 6C 69 73 74 30 30 5A 4D 79 20 43 6F 6D 6D    bplist00ZMy Comm
 0010   65 6E 74 08 00 00 00 00 00 00 01 01 00 00 00 00    ent.............
 0020   00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00    ................
 0030   00 00 00 13                                        ....
 
 mystuff: hello there
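
The "bplist00" at the start of that hex dump marks the attribute's value as a binary property list. As a small sketch, the bytes can be copied out of the dump and decoded with Python's plistlib (the variable names are mine):

```python
import plistlib

# The 52 bytes from the "xattr -l" hex dump, copied row by row.
HEX_DUMP = """
62 70 6C 69 73 74 30 30 5A 4D 79 20 43 6F 6D 6D
65 6E 74 08 00 00 00 00 00 00 01 01 00 00 00 00
00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 13
"""

# Strip all whitespace before converting the hex digits to bytes.
raw = bytes.fromhex("".join(HEX_DUMP.split()))

# plistlib recognises the "bplist00" header and uses its binary parser.
comment = plistlib.loads(raw)
print(comment)  # prints: My Comment
```

So the Finder comment is held in the value of the com.apple.metadata:kMDItemFinderComment attribute on the file itself, wrapped as a one-string property list.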
 

Note that the attribute name gives us a clue as to where the data is stored, but I can't find a file with that "com.apple.metadata" name. I do find:

 /System/Library/LaunchAgents/com.apple.metadata.mdwrite.plist
 /System/Library/LaunchDaemons/com.apple.metadata.mds.plist
 

But those are launchd job definitions for the Spotlight daemons, not places where this data is stored.

So what do we know? Well, we know it's up to the application responsible for a file to provide importer code. It's up to the same app to decide where to store metadata. Obviously, that implies that for data that would be the same across all files of a given type, there's no need to store it anywhere: the importer could generate the response when Spotlight asks.

That's as far as I've gone. Maybe someone else can add more.


Author: Anthony Lawrence
Publisher: Anthony Lawrence
Licensee Name: Anthony Lawrence
Reference URL: http://aplawrence.com/MacOSX/metadata_mess.html
Copyright: All Rights Reserved
Registration Date: 4/13/2009 8:23:55 PM UTC