diff --git a/README.md b/README.md index 45805f9..ee634e0 100644 --- a/README.md +++ b/README.md @@ -44,46 +44,6 @@ See https://github.com/protomaps/PMTiles/tree/master/python/bin for library usag ## Specification -![layout](layout.png) - -PMTiles is a binary serialization format designed for two main access patterns: over the network, via HTTP 1.1 Byte Serving (`Range:` requests), or via memory-mapped files on disk. **All integer values are little-endian.** - -A PMTiles archive is composed of: -* a fixed-size 512,000 byte header section -* Followed by any number of tiles in arbitrary format -* Optionally followed by any number of *leaf directories* - -### Header -* The header begins with a 2-byte magic number, "PM" -* Followed by 2 bytes, the PMTiles specification version (currently 2). -* Followed by 4 bytes, the length of metadata (M bytes) -* Followed by 2 bytes, the number of entries in the *root directory* (N entries) -* Followed by M bytes of metadata, which **must be a JSON string with bounds, minzoom and maxzoom properties (new in v2)** -* Followed by N * 17 bytes, the root directory. - -### Directory structure -A directory is a contiguous sequence of 17 byte entries. A directory can have at most 21,845 entries. **A directory must be sorted by Z, X and then Y order (new in v2).** - -An entry consists of: -* 1 byte: the zoom level (Z) of the entry, with the top bit set to 1 instead of 0 to indicate the offset/length points to a leaf directory and not a tile. -* 3 bytes: the X (column) of the entry. -* 3 bytes: the Y (row) of the entry. -* 6 bytes: the offset of where the tile begins in the archive. -* 4 bytes: the length of the tile, in bytes. - -**All leaf directory entries follow non-leaf entries. All leaf directories in a single directory must have the same Z value. (new in v2).** - -### Notes -* A full directory of 21,845 entries holds exactly a complete pyramid with 8 levels, or 1+4+16+64+256+1024+4096+16384. 
-* A PMTiles archive with less than 21,845 tiles should have a root directory and no leaf directories. -* Multiple tile entries can point to the same offset; this is useful for de-duplicating certain tiles, such as an empty "ocean" tile. -* Analogously, multiple leaf directory entries can point to the same offset; this can avoid inefficiently-packed small leaf directories. -* The tentative media type for PMTiles archives is `application/vnd.pmtiles`. - -### Implementation suggestions -* PMTiles is designed to make implementing a writer simple. Reserve 512KB, then write all tiles, recording their entry information; then write all leaf directories; finally, rewind to 0 and write the header. -* The order of tile data in the archive is unspecified; an optimized implementation should arrange tiles on a 2D space-filling curve. -* PMTiles readers should cache directory entries by byte offset, not by Z/X/Y. This means that deduplicated leaf directories result in cache hits. ## Recipes diff --git a/spec/v2/spec.md b/spec/v2/spec.md new file mode 100644 index 0000000..9100030 --- /dev/null +++ b/spec/v2/spec.md @@ -0,0 +1,42 @@ +# PMTiles version 2 + +*Note: this is deprecated in favor of spec version 3.* + +PMTiles is a binary serialization format designed for two main access patterns: over the network, via HTTP 1.1 Byte Serving (`Range:` requests), or via memory-mapped files on disk. **All integer values are little-endian.** + +A PMTiles archive is composed of: +* a fixed-size 512,000 byte header section +* Followed by any number of tiles in arbitrary format +* Optionally followed by any number of *leaf directories* + +### Header +* The header begins with a 2-byte magic number, "PM" +* Followed by 2 bytes, the PMTiles specification version (currently 2). 
+* Followed by 4 bytes, the length of metadata (M bytes) +* Followed by 2 bytes, the number of entries in the *root directory* (N entries) +* Followed by M bytes of metadata, which **must be a JSON string with bounds, minzoom and maxzoom properties (new in v2)** +* Followed by N * 17 bytes, the root directory. + +### Directory structure +A directory is a contiguous sequence of 17 byte entries. A directory can have at most 21,845 entries. **A directory must be sorted by Z, X and then Y order (new in v2).** + +An entry consists of: +* 1 byte: the zoom level (Z) of the entry, with the top bit set to 1 instead of 0 to indicate the offset/length points to a leaf directory and not a tile. +* 3 bytes: the X (column) of the entry. +* 3 bytes: the Y (row) of the entry. +* 6 bytes: the offset of where the tile begins in the archive. +* 4 bytes: the length of the tile, in bytes. + +**All leaf directory entries follow non-leaf entries. All leaf directories in a single directory must have the same Z value. (new in v2).** + +### Notes +* A full directory of 21,845 entries holds exactly a complete pyramid with 8 levels, or 1+4+16+64+256+1024+4096+16384. +* A PMTiles archive with less than 21,845 tiles should have a root directory and no leaf directories. +* Multiple tile entries can point to the same offset; this is useful for de-duplicating certain tiles, such as an empty "ocean" tile. +* Analogously, multiple leaf directory entries can point to the same offset; this can avoid inefficiently-packed small leaf directories. +* The tentative media type for PMTiles archives is `application/vnd.pmtiles`. + +### Implementation suggestions +* PMTiles is designed to make implementing a writer simple. Reserve 512KB, then write all tiles, recording their entry information; then write all leaf directories; finally, rewind to 0 and write the header. +* The order of tile data in the archive is unspecified; an optimized implementation should arrange tiles on a 2D space-filling curve. 
+* PMTiles readers should cache directory entries by byte offset, not by Z/X/Y. This means that deduplicated leaf directories result in cache hits. \ No newline at end of file diff --git a/spec/v3/spec.md b/spec/v3/spec.md new file mode 100644 index 0000000..54b5f5e --- /dev/null +++ b/spec/v3/spec.md @@ -0,0 +1,91 @@ +# PMTiles version 3 + +## File structure + +A PMTiles archive is a single-file archive of square tiles with five main sections: + +1. A fixed-size, 127-byte **Header** starting with `PMTiles` and then the spec version - currently `3` - that contains offsets to the next sections. +2. A root **Directory**, described below. The Header and Root combined must be less than 16,384 bytes. +3. JSON metadata. +4. Optionally, a section of **Leaf Directories**, encoded the same way as the root. +5. The tile data. + +## Entries + +A Directory is a list of `Entries`, in ascending order by `TileId`: + + Entry = (TileId uint64, Offset uint64, Length uint32, RunLength uint32) + +* `TileId` starts at 0 and corresponds to a cumulative position on the series of square Hilbert curves starting at z=0. +* `Offset` is the position of the tile in the file relative to the start of the data section. +* `Length` is the size of the tile in bytes. +* `RunLength` is how many times this tile is repeated: for example, `TileId=5,RunLength=2` means that the tile is present at IDs 5 and 6. +* If `RunLength=0`, the offset/length points to a Leaf Directory where `TileId` is the first entry. + +## Directory Serialization + +Entries are stored in memory as integers, but serialized to disk using these compression steps: + 1. A little-endian varint indicating the # of entries. + 2. Delta encoding of `TileId` + 3. Zeroing of `Offset`: + * `0` if it is equal to the `Offset` + `Length` of the previous entry + * `Offset+1` otherwise + 4. Varint encoding of all numbers + 5. Columnar ordering: all `TileId`s, all `RunLength`s, all `Length`s, then all `Offset`s + 6.
Finally, general purpose compression as described by the `Header`'s `InternalCompression` field. + +## Directory Hierarchy +* The number of entries in the root directory and leaf directories is up to the implementation. +* However, the compressed size of the header plus root directory is required in v3 to be under **16,384 bytes**. This is to allow latency-optimized clients to prefetch the root directory and guarantee it is complete. A sophisticated writer might need several attempts to optimize this. +* Root size, leaf sizes and depth should be configurable by the user to optimize for different trade-offs: cost, bandwidth, latency. + +## Header Design + +*Certain fields belonging to metadata in v2 are promoted to fixed-size header fields. This allows a map container to be initialized to the desired extent or center without blocking on the JSON metadata.* + +The `Header` is 127 bytes, with little-endian integer values: + +| offset | description | width | +| --- | --- | --- | +| 0 | magic number `PMTiles` | 7 | +| 7 | spec version, currently `3` | 1 | +| 8 | offset of root directory | 8 | +| 16 | length of root directory | 8 | +| 24 | offset of JSON metadata, possibly compressed by `InternalCompression` | 8 | +| 32 | length of JSON metadata | 8 | +| 40 | offset of leaf directories | 8 | +| 48 | length of leaf directories | 8 | +| 56 | offset of tile data | 8 | +| 64 | length of tile data | 8 | +| 72 | # of addressed tiles, 0 if unknown | 8 | +| 80 | # of tile entries, 0 if unknown | 8 | +| 88 | # of tile contents, 0 if unknown | 8 | +| 96 | boolean clustered flag | 1 | +| 97 | internal compression enum (0 = Unknown, 1 = None, 2 = Gzip, 3 = Brotli, 4 = Zstd) | 1 | +| 98 | tile compression enum | 1 | +| 99 | tile type enum (0 = Unknown/Other, 1 = MVT (PBF Vector Tile), 2 = PNG, 3 = JPEG, 4 = WEBP) | 1 | +| 100 | min zoom | 1 | +| 101 | max zoom | 1 | +| 102 | min longitude (IEEE 754 float) | 4 | +| 106 | min latitude | 4 | +| 110 | max longitude | 4 | +| 114 |
max latitude | 4 | +| 118 | center zoom | 1 | +| 119 | center longitude | 4 | +| 123 | center latitude | 4 | + +### Notes + +* **# of addressed tiles**: the total number of tiles before run-length encoding, i.e. `Sum(RunLength)` over all entries. +* **# of tile entries**: the total number of entries across all directories where `RunLength > 0`. +* **# of tile contents**: the number of referenced blobs in the tile section, or the unique # of offsets. If the archive is completely deduplicated, this is equal to the # of unique tile contents. If there is no deduplication, this is equal to the number of tile entries above. +* **boolean clustered flag**: if `True`, blobs in the data section are generally ordered by Hilbert `TileId`. More concretely, this means that when traversing all entries in `TileId` order, the offsets are either contiguous with the immediately previous entry, or refer to a lesser offset - a deduplicated tile. +* **compression enum**: Mandatory, tells the client how to decompress contents as well as provide correct `Content-Encoding` headers to browsers. +* **tile type**: A hint as to the tile contents. Clients and proxies may use this to: + * Automatically determine a visualization method + * Provide a conventional MIME type HTTP `Content-Type` header + * Enforce a canonical file path extension e.g. `.mvt`, `.png`, `.jpeg`, `.webp` + +### Organization + +In most cases, the archive should be in the order `Header`, Root Directory, JSON Metadata, Leaf Directories, Tile Data. It is possible to relocate sections other than `Header` arbitrarily, but no current writers/readers take advantage of this.
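
As an illustration of the v3 header table above, the following is a minimal Python sketch of a header reader. It is not taken from any PMTiles library: the names `HEADER_FORMAT`, `Header` and `parse_header` are hypothetical, and the layout assumes exactly the offsets and widths listed in the table (little-endian, no padding, 4-byte IEEE 754 floats for the bounds and center).

```python
import struct
from typing import NamedTuple

# Assumed layout, mirroring the header table: 7-byte magic, 1-byte version,
# eleven uint64 fields, six uint8 fields, four float32 bounds,
# one uint8 center zoom, two float32 center coordinates.
HEADER_FORMAT = "<7sB11Q6B4fB2f"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 127 bytes with "<" (no padding)


class Header(NamedTuple):
    """Hypothetical container; field order matches the table's byte order."""
    magic: bytes
    version: int
    root_offset: int
    root_length: int
    metadata_offset: int
    metadata_length: int
    leaf_offset: int
    leaf_length: int
    tile_data_offset: int
    tile_data_length: int
    addressed_tiles: int
    tile_entries: int
    tile_contents: int
    clustered: int  # raw byte; nonzero is treated as "clustered"
    internal_compression: int
    tile_compression: int
    tile_type: int
    min_zoom: int
    max_zoom: int
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float
    center_zoom: int
    center_lon: float
    center_lat: float


def parse_header(buf: bytes) -> Header:
    """Decode the first 127 bytes of an archive into a Header."""
    if len(buf) < HEADER_SIZE:
        raise ValueError(f"need at least {HEADER_SIZE} bytes, got {len(buf)}")
    header = Header._make(struct.unpack_from(HEADER_FORMAT, buf))
    if header.magic != b"PMTiles":
        raise ValueError("magic number is not 'PMTiles'")
    if header.version != 3:
        raise ValueError(f"unsupported spec version {header.version}")
    return header
```

A reader could pass the first 127 bytes of a local file, or the body of an HTTP `Range: bytes=0-126` response, to `parse_header`, then use `root_offset` and `root_length` to locate the root directory. Because the header is fixed-size, this step needs no knowledge of the directory or metadata encoding.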