Dive into Lightroom catalogues

Published on by Arnau Siches

Table of contents

This note captures my learnings on the contents of the Adobe Lightroom catalogues, particularly its β€œLibrary” module.

This entry is about Lightroom version 11. I have no expectation of any of this holding true for past or future versions. I have only inspected a catalogue from the MacOS version so I don't know if any of my findings are specific to this operating system.

The following will assume a catalogue named β€œsquirrel” so file and directory names will match that.

# Overview

A catalogue is composed of a few SQLite databases and some auxiliary files.

  • squirrel.lrcat : The catalogue. 116 tables with information about the images, folder structure, metadata (XMP, etc), development history, etc.
  • squirrel Helper.lrdata/helper.db : 22 tables to manage the search using FTS5.
  • squirrel Previews.lrdata/previews.db : 11 tables to manage the previews stored in disk as .rlprev image pyramids.
  • squirrel Previews.lrdata/root-pixels.db : 2 tables to manage small JPEG thumbnails.

# Database schema

The schema is large so I will focus on the core tables from where the rest hangs. Some tables have redundant information, likely to be convenience denormalisation and precomputed values.

To inspect the whole schema you can either use the .schema command in a standard SQLite session or use a GUI such as Beekeeper Studio. You'll notice that there are no explicit foreign key contraints so you'll have to work out which fields link to which tables. Field naming is fairly consistent so it should be a reasonable effort.

Adobe_AdditionalMetadata Adobe_AdditionalMetadata id_local image Adobe_images Adobe_images id_local rootFile Adobe_AdditionalMetadata--Adobe_images 1 1 AgLibraryKeywordImage AgLibraryKeywordImage id_local image tag AgLibraryKeywordImage--Adobe_images 1 0..N AgLibraryKeyword AgLibraryKeyword id_local name parent AgLibraryKeywordImage--AgLibraryKeyword 1 0..N AgLibraryFile AgLibraryFile id_local folder Adobe_images--AgLibraryFile 1 1 AgLibraryKeyword--AgLibraryKeyword 1 0..N AgLibraryRootFolder AgLibraryRootFolder id_local AgLibraryFolder AgLibraryFolder id_local rootFolder AgLibraryFolder--AgLibraryRootFolder 1 1 AgLibraryFile--AgLibraryFolder 1 1 AgHarvestedExifMetadata AgHarvestedExifMetadata id_local image AgHarvestedExifMetadata--Adobe_images 1 1 AgHarvestedIptcMetadata AgHarvestedIptcMetadata id_local image AgHarvestedIptcMetadata--Adobe_images 1 1

# Images

The Adobe_images is the core of the catalogue. In it you can find information about ratings, colour labels, whether it's a copy or an original, orientation and more.

# Filesystem

There are three tables representing the file structure linking between the images in the catalogue and the source images stored outside Lightroom.

  • AgLibraryRootFolder has the absolute and relative paths to work out where the images sit in the disk and where they sit in relation to the catalogue.
  • AgLibraryFolder represents any folder structure existing from the root folder inwards.
  • AgLibraryFile represents any file found in each known folder. Every image in Adobe_images has a file in this table.

# Metadata

Metadata is split in multiple tables by provenance:

  • Adobe_AdditionalMetadata has a field xmp with the whole XMP blob.
  • AgHarvestedExifMetadata has the Exif information.
  • AgHarvestedIptcMetadata has the IPTC information.
  • AgHarvestedDNGMetadata has the DNG information.

There are other metadata tables such as AgLibraryIPTC but I haven't worked out whether they are more relevant than the ones listed above.

# Keywords

Keywords are normalised across. The main tables are AgLibraryKeyword, AgLibraryKeywordImage and AgLibraryKeywordSynonym.

# Collections

Collections are normalised across. The main tables are AgLibraryCollection and AgLibraryCollectionImage.

# Imports

Both AgLibraryImport and AgLibraryImportImage group images to the importing action and the date it happened.

# Querying the catalogue

With an acquaintance of the schema we can start pulling some information out of the catalogue.

The following query lists every image with their respective filesystem path, rating, colour label and details around how the photo was taken.

SELECT
  root_folder.name AS root_folder,
  folder.pathFromRoot AS relative_path,
  file.baseName AS file_basename,
  file.extension AS file_extension,

  COALESCE(image.rating, 0) AS rating,
  image.colorLabels AS colour_label,

  image.captureTime as capture_time,
  exif.focalLength AS focal_length,
  ROUND(exif.aperture, 3) AS aperture,
  exif.shutterSpeed AS shutter_speed,
  ROUND(exif.isoSpeedRating, 0) AS iso_speed_rating,
  camera_model.value AS camera,
  lens.value AS lens

FROM
  AgLibraryFile AS file
LEFT JOIN AgLibraryFolder AS folder
  ON folder.id_local = file.folder
LEFT JOIN AgLibraryRootFolder AS root_folder
  ON root_folder.id_local = folder.rootFolder
LEFT JOIN Adobe_images AS image
  ON file.id_local = image.rootFile
LEFT JOIN AgHarvestedExifMetadata AS exif
  ON exif.image = image.id_local
LEFT JOIN AgInternedExifCameraModel AS camera_model
  ON camera_model.id_local = metadata.cameraModelRef
LEFT JOIN AgInternedExifLens AS lens
  ON lens.id_local = metadata.lensRef

This next query lists the images with their filesystem path, height, width and the full XMP blob.

SELECT
  root_folder.absolutePath || folder.pathFromRoot || file.baseName || '.' || file.extension AS path,
  image.fileHeight AS height,
  image.fileWidth AS width,
  metadata.xmp
FROM
  Adobe_images AS image
JOIN AgLibraryFile AS file
  ON image.rootFile = file.id_local
JOIN AgLibraryFolder AS folder
  ON file.folder = folder.id_local
JOIN AgLibraryRootFolder AS root_folder
  ON folder.rootFolder = root_folder.id_local
JOIN Adobe_AdditionalMetadata metadata
  ON metadata.image = image.id_local

# Timestamps

Timestamps deserve their own section because Lightroom uses different formats depending on the case and not necessarily one you would expect. For example, some values are ISO8601 datetimes, whereas others are large integers looking like a Unix epoch.

These large integers don't have the conventional Unix epoch starting at 1970-01-01T00:00:00Z though, they start at 2001-01-01T00:00:00Z. Thus the corresponding Unix time is 978307200.

Note : At least this is the case in MacOS and seems to boil down to the Cocoa Core Data Timestamp.

With this in mind, to get the creation time in UTC from a virtual copy of an image you'll need to convert values from existing copies to the Unix Epoch and the other ones either filter them out or reset to 0 (i.e. 1970-01-01T00:00:00Z):

SELECT
  id_local,
  copyName,
  strftime('%Y-%m-%dT%H:%M:%SZ', 
    CASE
      WHEN copyName IS NULL THEN
        copyCreationTime - 63113817600
      ELSE
        copyCreationTime + 978307200
    END, 'unixepoch')
  AS copy_creation_time,

FROM
  Adobe_images AS image

ORDER BY
  timestamp DESC

# Thumbnails

There are two types of thumbnail: the root-pixels.db database holds small JPEG blobs whilst the previews.db database holds references to the image pyramid files (.lrprev).

A quick way to see a thumbnail from the root-pixels.db is as follows:

sqlite3 squirrel.db "SELECT quote(jpegData) FROM RootPixels LIMIT 1" | xxd -r -p | icat --scale-up

We extract the JPEG blob as a hexadecimal string, convert it back to binary with xxd and display it with Kitty 's icat.

The previews.db holds information about the .lrprev files and their original images. Using the main tables you can get a good view of what's available:

SELECT
  ImageCacheEntry.imageId AS image_id,
  ImageCacheEntry.orientation,
  Pyramid.uuid,
  Pyramid.digest,
  Pyramid.colorProfile AS color_profile,
  Pyramid.croppedWidth AS width,
  Pyramid.croppedHeight AS height
FROM
  Pyramid
JOIN ImageCacheEntry
  ON ImageCacheEntry.uuid = Pyramid.uuid

# Detour: the .lrprev file

A .lrprev file is a binary file containing the full image pyramid generated by Lightroom. In short, the binary layout excluding paddings looks like:

<signature = AgHg> <size> <label = "header">
<data>
<signature = AgHg> <size> <label = "level_1">
<data>
…
<signature = AgHg> <size> <label = "level_4">
<data>
…
EOF

In other words, the file starts with a four byte signature AgHg (0x41 0x67 0x48 0x67), followed with the size (541 bytes in this example) and then the label of the block (header):

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚00000000β”‚ 41 67 48 67 00 20 00 00 β”Š 00 00 00 00 00 00 02 47 β”‚AgHg0 00β”Š000000β€’Gβ”‚
β”‚00000010β”‚ 00 00 00 00 00 00 00 09 β”Š 68 65 61 64 65 72 00 00 β”‚0000000_β”Šheader00β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”˜

After that comes the actual header which is a key/value structure parseable as a Lua table:

pyramid = {
	colorProfile = "AdobeRGB",
	croppedHeight = 4912,
	croppedWidth = 7360,
	digest = "030b36e11e9d722fdab20884884e0ff2",
	fileTimeStamp = 645111123,
	formatVersion = 3,
	fromProxy = false,
	levels = {
		{
			height = 61,
			width = 90,
		},
		{
			height = 121,
			width = 180,
		},
		{
			height = 241,
			width = 360,
		},
		{
			height = 481,
			width = 720,
		},
		{
			height = 961,
			width = 1440,
		},
		{
			height = 1922,
			width = 2880,
		},
	},
	quality = "standard",
	uuid = "FF4ADF67-3C63-4EB7-85B1-6D4409B537D3",
}

From the header data we know this file has 7 pyramid levels which means the rest of the file is expected to have 7 more blocks labelled from level_1 to level_7 where each data block is a fully functional JPEG of the described width and height. I haven't checked but level 1 looks exactly the same as the small thumbnail found in root-pixels.db.

Note : The uuid and digest are also part of the filename which has a pattern such as {uuid}-{digest}.lrprev. This information also allows joining the dots with the previews.db tables Pyramid, PyramidLevel and ImageCacheEntry which in turn link to the main catalogue Adobe_images table.

After the header, a new block starts with the same signature AgHg, followed by the size and label of the block (level_1), then the data of size bytes and then it repeats for as many levels as the pyramid has.

# Closing thoughts

It's fair to say that proprietary undocumented formats are not my favourite topic. That said, the fact that Lightroom uses SQLite and keeps things roughtly consistent softens the problem of having data held hostage by businesses. It would be great if Adobe had easy to find documentation for the internals but I recognise they likely don't get any benefit from doing so.

I'm not entirely sure whether Lightroom Classic is here to stay or not but if it is it might be worth it learning how to write a plug-in for it. A quick search in Adobe's website yields no useful getting started tutorials which is a bit of a shame.