PRG File Format Specification

  • MIME Type: application/vnd.project-graph
  • File Extension: .prg
  • Author: zty012 <[email protected]>
  • Version: 2.3.0

1. Introduction

The PRG file format is a container-based format for storing diagrams or extensions used in the Project Graph application. It leverages the ubiquitous ZIP archive format to bundle the main content (stage.msgpack or extension.js) with its associated binary attachments (images, documents, etc.).

The design goals of this format are:

  • Portability: To serve as a single, shareable file containing all project assets.
  • Interoperability: To be based on well-established standards (ZIP, MessagePack) for ease of implementation.
  • Extensibility: To allow for future evolution of the format while maintaining backward compatibility.

2. Overall Container Structure

A PRG file MUST be a valid ZIP archive. The structure within the ZIP filesystem is as follows:

metadata.msgpack
stage.msgpack/extension.js
attachments/
  <uuid>.<ext>

  • All paths within the ZIP archive MUST use the forward slash (/) as the directory separator.
  • All filenames MUST be encoded using UTF-8.

NOTE: A valid PRG container MUST include metadata.msgpack at the root, and MUST include at least one of stage.msgpack or extension.js. Implementations that encounter a ZIP lacking these required files should reject the file as invalid.
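The required-file rule above can be sketched as a small check. This is a minimal illustration using Python's standard zipfile module, not a reference implementation; the function name check_prg_container and the wording of the problem messages are assumptions made for this example.

```python
import io
import zipfile

REQUIRED_METADATA = "metadata.msgpack"
MAIN_CONTENT = ("stage.msgpack", "extension.js")


def check_prg_container(data: bytes) -> list[str]:
    """Return a list of layout problems; an empty list means the layout is valid.

    Only the presence of the required root-level files is checked here.
    The contents of those files are not decoded.
    """
    if not zipfile.is_zipfile(io.BytesIO(data)):
        return ["not a valid ZIP archive"]
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        names = set(archive.namelist())
    problems = []
    if REQUIRED_METADATA not in names:
        problems.append("missing metadata.msgpack at the root")
    if not any(name in names for name in MAIN_CONTENT):
        problems.append("missing both stage.msgpack and extension.js")
    return problems


def build_sample(names: list[str]) -> bytes:
    """Build an in-memory ZIP with empty entries, for demonstration only."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name in names:
            archive.writestr(name, b"")
    return buffer.getvalue()
```

A reader would typically run such a check before decoding any entry, and reject the file if the returned list is non-empty.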

3. The Metadata File (metadata.msgpack)

3.1. Format & Encoding

The metadata.msgpack file MUST exist at the root of the ZIP container. Its content MUST be a binary stream that is a valid MessagePack serialization of an object containing metadata about the project.

3.2. Content Schema

The deserialized content of metadata.msgpack MUST be an object with the following entries:

  • version: A string value giving the version of the PRG format, expressed using Semantic Versioning (e.g., "2.3.0"). This field MUST be present to allow for future format evolution and compatibility checks.
  • extension?: Required if the PRG file is an extension. An object containing:
    • id: A string value representing the unique identifier of the extension (e.g., "com.example.myextension").
    • name: A string value representing the human-readable name of the extension (e.g., "My Extension").
    • description: A string value providing a brief description of the extension's functionality and purpose.
    • version: A string value representing the version of the extension in Semantic Versioning (e.g., "1.0.0").
    • author: A string value representing the name and email of the extension's author (e.g., "Jane Doe <[email protected]>").

Compatibility and alternative formats:

  • Implementations MUST support metadata.msgpack. Implementations MAY also support a human-readable metadata.json as an alternative. If both metadata.msgpack and metadata.json are present, implementations MUST prefer metadata.msgpack.
  • When encountering a metadata.version with an unknown major version, implementations SHOULD fail gracefully (e.g., refuse to load or operate in read-only mode) and log or surface a clear error. For unknown minor/patch versions, implementations SHOULD ignore unknown fields while continuing processing when possible.
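One way to apply the version rule above is sketched below. It assumes a reader that implements format version 2.3.0 (the version of this document); the names SUPPORTED, parse_semver, and compatibility, and the three result labels, are hypothetical. Treating a malformed version string as unsupported is also an assumption of this sketch, since the text above does not cover that case.

```python
SUPPORTED = (2, 3, 0)  # assumption: this reader implements PRG format 2.3.0


def parse_semver(version: str) -> tuple[int, int, int]:
    """Parse 'MAJOR.MINOR.PATCH', ignoring pre-release and build suffixes."""
    core = version.split("+", 1)[0].split("-", 1)[0]
    major, minor, patch = (int(part) for part in core.split("."))
    return major, minor, patch


def compatibility(version: str) -> str:
    """Classify a metadata version against SUPPORTED.

    'unsupported': unknown major version or malformed string; fail gracefully.
    'lenient':     newer minor/patch; continue and ignore unknown fields.
    'full':        same or older minor/patch within the supported major.
    """
    try:
        parsed = parse_semver(version)
    except ValueError:
        return "unsupported"
    if parsed[0] != SUPPORTED[0]:
        return "unsupported"
    return "lenient" if parsed[1:] > SUPPORTED[1:] else "full"
```

Whether "unsupported" leads to a refusal to load or to a read-only mode is left to the application, as the text above allows either.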

Example (JSON form; equivalent MessagePack should be used in metadata.msgpack):

{
  "version": "2.3.0",
  "extension": {
    "id": "com.example.myextension",
    "name": "My Extension",
    "description": "Provides extra node types",
    "version": "1.0.0",
    "author": "Jane Doe <[email protected]>"
  }
}
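The schema in this section can be checked on the already-decoded object. The sketch below assumes the MessagePack bytes have been decoded into Python dict and str values by some MessagePack library (not shown); validate_metadata and its is_extension parameter are hypothetical names for this example.

```python
EXTENSION_FIELDS = ("id", "name", "description", "version", "author")


def validate_metadata(metadata: object, is_extension: bool) -> list[str]:
    """Return schema problems for a decoded metadata object (empty list = valid).

    Unknown entries are deliberately not reported, so that newer minor
    versions of the format can add fields.
    """
    if not isinstance(metadata, dict):
        return ["metadata must be an object"]
    problems = []
    if not isinstance(metadata.get("version"), str):
        problems.append("version must be a string")
    extension = metadata.get("extension")
    if extension is None:
        if is_extension:
            problems.append("extension is required for extension packages")
        return problems
    if not isinstance(extension, dict):
        problems.append("extension must be an object")
        return problems
    for field in EXTENSION_FIELDS:
        if not isinstance(extension.get(field), str):
            problems.append(f"extension.{field} must be a string")
    return problems
```

A caller could derive is_extension from the presence of extension.js in the container.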

4. Main Content

PRG files can represent either a diagram or an extension.

4.1. Diagram (stage.msgpack)

4.1.1. Format & Encoding

The stage.msgpack file MAY exist at the root of the ZIP container. Its content MUST be a binary stream that is a valid MessagePack serialization of an array produced by the Graphif Serializer.

4.1.2. Content Schema

The deserialized content of stage.msgpack MUST be an array. Each element in this array MUST be an object representing a Stage Object.

4.1.2.1. Stage Object

A Stage Object (an object) MUST contain the following entry:

  • uuid (Key): A string value representing the unique identifier for this stage.

The Stage Object MAY contain any number of other entries to define the graph's nodes, edges, properties, and references to attachments. The specific schema for these entries is defined by the Graphif Serializer specification.

Any implementation that encounters a Stage Object with an unknown structure SHOULD ignore the unrecognized entries and continue processing.
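The minimal Stage Object contract (an array of objects, each with a string uuid, with all other entries tolerated) can be sketched as follows. The input is assumed to be the already-decoded stage.msgpack content; stage_object_uuids is a hypothetical helper name, and raising ValueError is a choice made for this example.

```python
def stage_object_uuids(stage: object) -> list[str]:
    """Return the uuid of every Stage Object, in array order.

    Raises ValueError if the top level is not an array, an element is not
    an object, or an element lacks a string uuid. Entries other than uuid
    are ignored here, whatever their structure.
    """
    if not isinstance(stage, list):
        raise ValueError("stage content must be an array")
    uuids = []
    for index, entry in enumerate(stage):
        if not isinstance(entry, dict):
            raise ValueError(f"element {index} is not an object")
        uuid = entry.get("uuid")
        if not isinstance(uuid, str):
            raise ValueError(f"element {index} has no string uuid")
        uuids.append(uuid)
    return uuids
```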

4.2. Extension (extension.js)

The extension.js file MAY exist at the root of the ZIP container. If present, it MUST be a JavaScript file that defines the behavior and functionality of an extension package. The content of this file is outside the scope of this specification and is determined by the extension's requirements.

5. The Attachments Directory (attachments/)

5.1. Purpose

The attachments/ directory is an optional container for any binary or text files referenced by the stage(s), such as images, documents, or other media.

5.2. Naming Convention

Files within the attachments/ directory MUST be named using a Universally Unique Identifier (UUID) followed by a file extension that implies the file's MIME type.

Example:

  • attachments/b54c5f6c-6f28-4d65-bcc5-4c891c6dbd77.png
  • attachments/f8a3d2b1-4e5c-4789-8123-456789abcdef.pdf

UUID format: Implementations SHOULD follow RFC 4122 (commonly UUID v4). A recommended validation regex is:

^[0-9a-f]{8}-[0-9a-f]{4}-[1-5][0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$

Implementations MUST NOT allow multiple different files in attachments/ that share the same UUID (even with different extensions); presence of duplicate UUIDs is considered invalid and implementations MUST report an error rather than silently choosing one.
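The naming and uniqueness rules above can be sketched as an indexing pass over the archive's entry names. The helper name index_attachments is hypothetical. Note that this sketch treats a name that fails the recommended regex as an error, which is stricter than the SHOULD in the text, and that it takes only the last suffix as the file extension.

```python
import re
from pathlib import PurePosixPath

UUID_PATTERN = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[1-5][0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$"
)


def index_attachments(entry_names: list[str]) -> dict[str, str]:
    """Map each attachment UUID to its archive path.

    Raises ValueError on a malformed name or on two files sharing a UUID,
    even when their extensions differ.
    """
    index: dict[str, str] = {}
    for name in entry_names:
        path = PurePosixPath(name)
        if path.parent != PurePosixPath("attachments"):
            continue  # not a direct child of attachments/
        if not path.suffix or not UUID_PATTERN.match(path.stem):
            raise ValueError(f"invalid attachment name: {name}")
        if path.stem in index:
            raise ValueError(f"duplicate attachment UUID: {path.stem}")
        index[path.stem] = name
    return index
```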

5.3. Referencing Attachments

Stage Objects reference these files by their UUID filename (without the extension). For example, an ImageNode object would reference the attachment b54c5f6c-6f28-4d65-bcc5-4c891c6dbd77.png using the string "b54c5f6c-6f28-4d65-bcc5-4c891c6dbd77".
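Resolving such a reference back to a file can be sketched as a prefix lookup. The function name resolve_attachment is an assumption; the MIME type is guessed from the file extension with Python's standard mimetypes module, which is one possible reading of "a file extension that implies the file's MIME type".

```python
import mimetypes
from typing import Optional


def resolve_attachment(reference: str, entry_names: list[str]) -> tuple[str, Optional[str]]:
    """Return (archive path, guessed MIME type) for an attachment UUID.

    Raises LookupError unless exactly one entry under attachments/ is
    named '<reference>.<ext>'.
    """
    prefix = f"attachments/{reference}."
    matches = [name for name in entry_names if name.startswith(prefix)]
    if len(matches) != 1:
        raise LookupError(f"expected one attachment for {reference}, found {len(matches)}")
    mime_type, _encoding = mimetypes.guess_type(matches[0])
    return matches[0], mime_type
```

The guessed type is only a hint; an application that needs certainty would inspect the file contents as well.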

6. Security Considerations

Implementors and users of this format should be aware of several security-related aspects:

  1. ZIP Container Risks:

    • Compression Bombs: A PRG file may contain a small ZIP that decompresses to an extremely large amount of data, causing denial-of-service. Implementations MUST impose reasonable limits on the number of extracted files and the total uncompressed size.
      • Recommended default limits (implementations MAY make these configurable):
        • Maximum number of entries: 1000
        • Maximum total uncompressed size: 500 * 1024 * 1024 bytes (500 MiB)
        • Maximum single file uncompressed size: 200 * 1024 * 1024 bytes (200 MiB)
    • Path Traversal: Maliciously crafted ZIP entries could have names like ../../../some_important_file. Implementations MUST NOT extract files to the filesystem and MUST instead read them directly from the ZIP stream into memory or a controlled environment.
      • Implementations MUST reject ZIP entries that contain absolute paths, drive letters, or any .. path segments.
  2. Attachment Risks: The attachments directory can contain any file type. The application processing the PRG file is responsible for handling each attachment in a secure manner (e.g., running script files in a sandbox, scanning attachments for malware).
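The ZIP-related checks above can be sketched as a pre-flight scan over the archive's central directory. This is an illustration with hypothetical names (is_unsafe_name, scan_archive), using the recommended default limits. The sizes inspected are the ones declared in the archive, which an attacker can forge, so a real reader would also cap the number of bytes it actually reads from each entry.

```python
import zipfile

MAX_ENTRIES = 1000
MAX_TOTAL_SIZE = 500 * 1024 * 1024
MAX_SINGLE_SIZE = 200 * 1024 * 1024


def is_unsafe_name(name: str) -> bool:
    """True for absolute paths, drive letters, backslashes, or '..' segments."""
    if name.startswith("/") or "\\" in name:
        return True
    if len(name) >= 2 and name[0].isalpha() and name[1] == ":":
        return True
    return ".." in name.split("/")


def scan_archive(archive: zipfile.ZipFile,
                 max_entries: int = MAX_ENTRIES,
                 max_total: int = MAX_TOTAL_SIZE,
                 max_single: int = MAX_SINGLE_SIZE) -> list[str]:
    """Return problems found from entry names and declared sizes only."""
    infos = archive.infolist()
    problems = []
    if len(infos) > max_entries:
        problems.append("too many entries")
    total = 0
    for info in infos:
        if is_unsafe_name(info.filename):
            problems.append(f"unsafe path: {info.filename}")
        if info.file_size > max_single:
            problems.append(f"entry too large: {info.filename}")
        total += info.file_size
    if total > max_total:
        problems.append("total uncompressed size too large")
    return problems
```

Nothing is written to disk by this scan; entries that pass would then be read with archive.read or archive.open into memory.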

7. Future Considerations

This section outlines potential extensions to the format for future discussion and development.

7.1. Use JSON instead of MessagePack for Metadata

To enhance human readability and ease of debugging, the metadata.msgpack file could be supplemented by a metadata.json file. Implementations MUST continue to accept metadata.msgpack; support for metadata.json is optional and, if implemented, the priority rules in section 3.2 apply.

7.2. Versioning (versions/ directory)

A versions/ directory could be introduced to store historical snapshots of the stage.msgpack file, enabling built-in version control and audit trails. Each snapshot could be a copy of stage.msgpack named by a timestamp or commit hash.

7.3. Sub-Stages (sub/ directory)

To avoid the inefficiency of nested ZIP files (storing a .prg inside another .prg), a sub/ directory could store additional Stage files. This would allow for complex, multi-stage projects within a single container.

stage.msgpack
attachments/
  <uuid>.<ext>
sub/
  <uuid>/
    stage.msgpack
  <uuid>/
    stage.msgpack
  <uuid>/
    stage.msgpack

Sub-stages can be nested to arbitrary depth, with each sub-stage having its own stage.msgpack file. References between stages MUST be done using the UUID of the sub-stage.

Storage convention and lookup: sub-stages MUST be stored under sub/<uuid>/stage.msgpack. When resolving a reference to a sub-stage UUID, implementations SHOULD look for a matching folder name under sub/ and load that folder's stage.msgpack. Example layout:

sub/
  b54c5f6c-6f28-4d65-bcc5-4c891c6dbd77/
    stage.msgpack

A sub-stage reference such as b54c5f6c-6f28-4d65-bcc5-4c891c6dbd77 resolves to sub/b54c5f6c-6f28-4d65-bcc5-4c891c6dbd77/stage.msgpack.

7.3.1 Representing Sub-Stages

Because a UUID is unique, a single UUID is sufficient to represent a sub-stage.

7.4. Workspace Settings (settings.msgpack)

Application-specific settings (e.g., default node styles, view preferences) could be stored in a root-level file like settings.msgpack, making the PRG file a self-contained workspace.

7.5. Anchors

To represent a node or a region, we need to design an anchor syntax that can uniquely identify elements within the PRG file. The proposed syntax uses UUIDs to reference specific nodes or regions.

Node UUIDs MUST be separated by ;.

# wrapped for readability
file:///home/user/project.prg#
ec32d43d-7890-4e45-a28f-b31bca4dafea;
6c95db6b-b64e-49e8-a3b0-3a39ee2588c9;
49828d7f-7d05-48d9-bcb5-a699782f9880

This example references three nodes:

  • ec32d43d-7890-4e45-a28f-b31bca4dafea
  • 6c95db6b-b64e-49e8-a3b0-3a39ee2588c9
  • 49828d7f-7d05-48d9-bcb5-a699782f9880

Anchor syntax (recommended): anchors SHOULD be a single-line, semicolon-separated list of UUIDs with no surrounding whitespace. Trailing semicolons are NOT allowed. Example single-line form:

file:///home/user/project.prg#ec32d43d-7890-4e45-a28f-b31bca4dafea;6c95db6b-b64e-49e8-a3b0-3a39ee2588c9;49828d7f-7d05-48d9-bcb5-a699782f9880

When embedded in URLs, implementations MUST apply URL-encoding as required by the surrounding context.
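A parser for the proposed single-line anchor form might look like the sketch below. The function name parse_anchor is hypothetical, and because the syntax is only a proposal, the strictness shown here (lowercase UUIDs matching the regex recommended in section 5.2, no empty items) is an assumption of this example.

```python
import re
from urllib.parse import unquote

UUID_PATTERN = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[1-5][0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$"
)


def parse_anchor(url: str) -> tuple[str, list[str]]:
    """Split 'file.prg#uuid;uuid;...' into (base URL, list of node UUIDs).

    The fragment is percent-decoded first. A trailing semicolon, whitespace,
    or any item that is not a UUID raises ValueError.
    """
    base, _, fragment = url.partition("#")
    if not fragment:
        return base, []
    uuids = unquote(fragment).split(";")
    for item in uuids:
        if not UUID_PATTERN.match(item):
            raise ValueError(f"invalid anchor item: {item!r}")
    return base, uuids
```

Percent-decoding before splitting means an encoded separator (%3B) is accepted as well, which matches the requirement that anchors survive URL-encoding by the surrounding context.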

7.6. Fractional indexing

Source: Realtime editing of ordered sequences | Figma Blog

Instead of OT, Figma uses a trick that’s often used to implement reordering on top of a database. Every object has a real number as an index and the order of the children for an element of the tree is determined by sorting all children by their index. To insert between two objects, just set the index for the new object to the average index of the two objects on either side. We use arbitrary-precision fractions instead of 64-bit doubles so that we can’t run out of precision after lots of edits.

Nodes in a stage could likewise carry a numeric index. This would make it easier to represent the order of nodes (z-index).
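The averaging scheme described in the quote can be sketched with Python's arbitrary-precision fractions.Fraction. This is an illustration of the general technique, not Figma's implementation and not part of the PRG format; index_between and its conventions for inserting at either end (subtract or add 1) are assumptions of this example.

```python
from fractions import Fraction
from typing import Optional


def index_between(before: Optional[Fraction], after: Optional[Fraction]) -> Fraction:
    """Return an index that sorts strictly between two neighbours.

    None means "no neighbour on that side". Fraction is exact, so repeated
    insertion at the same spot never runs out of precision.
    """
    if before is None and after is None:
        return Fraction(0)
    if before is None:
        return after - 1
    if after is None:
        return before + 1
    if not before < after:
        raise ValueError("before must sort strictly below after")
    return (before + after) / 2
```

Sorting nodes by this index would give their z-order, and moving a node between two others only requires assigning it a new index, without renumbering its siblings.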