Checksum-Based Storage

Artifactory stores each binary once, under a file name equal to its SHA1 checksum, in a subdirectory named after the first two characters of that checksum. A database maps the checksum to each repository path that uses it. This design deduplicates storage and speeds up common operations.

Storage Process

  1. Upload File: When a file is uploaded, Artifactory calculates its SHA1 checksum.
  2. Name File By Checksum: The file is stored under its checksum value instead of its original name.
  3. Place File In Checksum Directory: The file is placed in a directory named after the first two characters of the checksum.
  4. Create Database Mapping: Artifactory records a mapping between the checksum and the repository path the file was uploaded to.
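
The four steps above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Artifactory's implementation: the class name ChecksumStore, the in-memory dictionary standing in for the database, and the sample repository paths are all hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


class ChecksumStore:
    """Toy checksum-based filestore (hypothetical; not Artifactory code)."""

    def __init__(self, root: Path) -> None:
        self.root = root
        # Stand-in for the database: repository path -> SHA-1 checksum.
        self.path_to_checksum: dict[str, str] = {}

    def upload(self, repo_path: str, data: bytes) -> str:
        # Step 1: calculate the SHA-1 checksum of the uploaded content.
        checksum = hashlib.sha1(data).hexdigest()
        # Steps 2 and 3: name the file by its checksum and place it in a
        # directory named after the first two checksum characters.
        target = self.root / checksum[:2] / checksum
        if not target.exists():
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(data)
        # Step 4: record the mapping for the uploaded repository path.
        self.path_to_checksum[repo_path] = checksum
        return checksum

    def stored_files(self) -> list[Path]:
        """Return every binary physically present in the filestore."""
        return sorted(p for p in self.root.rglob("*") if p.is_file())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store = ChecksumStore(Path(tmp))
        first = store.upload("libs-release/guava-31.jar", b"example bytes")
        second = store.upload("libs-snapshot/guava-31.jar", b"example bytes")
        # Same content, two repository paths, one stored file.
        print(first == second, len(store.stored_files()))  # prints: True 1
```

Uploading identical content to a second path adds only a mapping entry. The existing file on disk is reused, which is the deduplication effect this page describes.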

Checksum-Based Storage Example

  • A file with checksum ac3f5e56... is stored in directory ac
  • A file with checksum dfe12a4b... is stored in directory df
  • A file with checksum d4a3b2c1... is stored in directory d4
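
As a small illustration of the directory rule, the sketch below groups checksums by their first two characters. The function names are hypothetical, and the inputs are the truncated prefixes used on this page (the fourth comes from the diagram in this section). Real SHA1 checksums are 40 hexadecimal characters long.

```python
from collections import defaultdict


def checksum_directory(checksum: str) -> str:
    """Return the directory name: the first two checksum characters."""
    return checksum[:2]


def group_by_directory(checksums: list[str]) -> dict[str, list[str]]:
    """Group checksums by the directory they would be stored in."""
    groups: dict[str, list[str]] = defaultdict(list)
    for checksum in checksums:
        groups[checksum_directory(checksum)].append(checksum)
    return dict(groups)


# Truncated prefixes for illustration only, not full checksums.
prefixes = ["ac3f5e56", "dfe12a4b", "d4a3b2c1", "d4e7f891"]
print(group_by_directory(prefixes))
# prints: {'ac': ['ac3f5e56'], 'df': ['dfe12a4b'], 'd4': ['d4a3b2c1', 'd4e7f891']}
```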

The following diagram shows how repository paths in the database map to files in the filestore. The d4 directory contains two files whose checksums begin with d4.

flowchart LR
    subgraph CBS["Checksum-Based Storage"]
        subgraph DB["Database"]
            P1["libs-release/guava-31.jar"]
            P2["libs-release/guava-31.jar (copy)"]
            P3["docker-local/layer.tar"]
            P4["npm-local/lodash-4.17.tgz"]
        end

        ART["JFrog Artifactory <br />Same checksum → stored only once"]

        subgraph FS["Filestore"]
            subgraph ac["ac/"]
                AC["ac3f5e56..."]
            end
            subgraph d4["d4/"]
                D4A["d4a3b2c1..."]
                D4B["d4e7f891..."]
            end
            subgraph df["df/"]
                DF["dfe12a4b..."]
            end
        end

        DB <-->|Database| ART
        ART <-->|Filestore| FS

        P1 & P2 -->|ac3f5e56...| AC
        P3       -->|d4a3b2c1...| D4A
        P4       -->|dfe12a4b...| DF

        style D4B stroke-dasharray:5 5
    end

In parallel, Artifactory stores a database entry that maps the checksum to the repository path. This lets many operations, such as copy, move, and delete, run as database transactions instead of direct file manipulation.
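
To make the database indirection concrete, here is a minimal sketch using Python's sqlite3 module. The table name nodes, its columns, and the function names are assumptions for illustration and do not reflect Artifactory's actual schema. The sketch shows that copy, move, and delete can touch only mapping rows and never the stored binary.

```python
import sqlite3
from typing import Optional


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical mapping table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE nodes (repo_path TEXT PRIMARY KEY, sha1 TEXT NOT NULL)"
    )
    return conn


def add(conn: sqlite3.Connection, repo_path: str, sha1: str) -> None:
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO nodes VALUES (?, ?)", (repo_path, sha1))


def checksum_of(conn: sqlite3.Connection, repo_path: str) -> Optional[str]:
    row = conn.execute(
        "SELECT sha1 FROM nodes WHERE repo_path = ?", (repo_path,)
    ).fetchone()
    return row[0] if row else None


def copy_path(conn: sqlite3.Connection, src: str, dst: str) -> None:
    # Copy: add a second row that points at the same checksum.
    with conn:
        conn.execute(
            "INSERT INTO nodes (repo_path, sha1) "
            "SELECT ?, sha1 FROM nodes WHERE repo_path = ?",
            (dst, src),
        )


def move_path(conn: sqlite3.Connection, src: str, dst: str) -> None:
    # Move: rewrite the path; the stored binary is untouched.
    with conn:
        conn.execute(
            "UPDATE nodes SET repo_path = ? WHERE repo_path = ?", (dst, src)
        )


def delete_path(conn: sqlite3.Connection, repo_path: str) -> bool:
    """Delete a mapping; return True if its checksum is now unreferenced."""
    with conn:
        sha1 = checksum_of(conn, repo_path)
        if sha1 is None:
            return False
        conn.execute("DELETE FROM nodes WHERE repo_path = ?", (repo_path,))
        remaining = conn.execute(
            "SELECT COUNT(*) FROM nodes WHERE sha1 = ?", (sha1,)
        ).fetchone()[0]
    return remaining == 0


if __name__ == "__main__":
    db = open_db()
    add(db, "libs-release/guava-31.jar", "ac3f5e56")  # truncated placeholder
    copy_path(db, "libs-release/guava-31.jar", "libs-backup/guava-31.jar")
    print(delete_path(db, "libs-release/guava-31.jar"))  # prints: False
```

In this sketch, delete_path only reports whether any path still references the checksum. When and how an unreferenced binary is actually removed from the filestore is outside what this page describes.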

Benefits

Why Checksum-Based Storage Matters

  • Deduplication: Identical content is stored once, even when it is uploaded multiple times or to multiple repository paths.
  • Fast operations: Copy, move, and delete run as database transactions, not file operations.
  • Storage efficiency: Because identical content is stored only once, total storage use is lower.
  • Performance: Database indirection reduces expensive filesystem work.

Note

Checksum-based storage applies to all binaries in all Artifactory repositories.

For more information about checksum-based storage implementation in Artifactory, see:

Related Topics