Checksum-Based Storage

Artifactory uses checksum-based storage, so each unique binary is stored only once. When a file is uploaded, Artifactory calculates its SHA1 checksum and stores the file under a name equal to that checksum.

Storing binaries by checksum improves both storage efficiency and the speed of repository operations.

Storage Process

  1. Upload File: Artifactory calculates the file's SHA1 checksum.
  2. Name File By Checksum: The file is renamed to its checksum value.
  3. Place File In Checksum Directory: The file is stored in a directory named after the first two checksum characters.
  4. Create Database Mapping: Artifactory creates a mapping between the checksum and the uploaded repository path.
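
The four steps above can be sketched in a few lines of Python. This is only an illustration of the idea, not Artifactory's actual implementation: the function name `store_by_checksum`, the `filestore` directory, and the in-memory `mapping` dictionary (standing in for the database) are all assumptions made for the example.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def store_by_checksum(source: Path, filestore: Path, mapping: dict, repo_path: str) -> str:
    """Store `source` under its SHA1 checksum and record which repository path uses it."""
    # Step 1: calculate the SHA1 checksum, reading in chunks to handle large files.
    sha1 = hashlib.sha1()
    with source.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            sha1.update(chunk)
    checksum = sha1.hexdigest()

    # Steps 2 and 3: the stored file is named after the checksum and placed in a
    # directory named after the first two checksum characters.
    target_dir = filestore / checksum[:2]
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / checksum
    if not target.exists():  # identical content is written only once
        shutil.copyfile(source, target)

    # Step 4: record the mapping (a dict here; a database table in a real system).
    mapping[repo_path] = checksum
    return checksum


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        upload = Path(tmp) / "app.jar"
        upload.write_bytes(b"example binary content")
        index = {}
        digest = store_by_checksum(upload, Path(tmp) / "filestore", index, "libs-release/app.jar")
        print(digest[:2], digest)
```

Uploading the same content under a second repository path adds another entry to `mapping` but leaves the file store unchanged, which is the deduplication effect described in this section.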

Checksum-Based Storage Example

  • A file with checksum ac3f5e56... is stored in directory ac
  • A file with checksum dfe12a4b... is stored in directory df
  • A file with checksum d4a3b2c1... is stored in directory d4

The following example shows the d4 directory, which contains two files whose checksums begin with d4.

In parallel, Artifactory stores a database entry that maps checksum to repository path. This lets many operations run as database transactions instead of direct file manipulation.
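
To illustrate why this mapping makes operations cheap, here is a minimal sketch using Python's built-in `sqlite3` module. The `nodes` table, its columns, and the helper functions are hypothetical and chosen only for the example; they do not reflect Artifactory's real schema. The point is that copy, move, and delete change rows only, and the stored binary is never touched.

```python
import sqlite3


def open_index() -> sqlite3.Connection:
    """Create an in-memory index mapping repository paths to checksums (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE nodes (repo_path TEXT PRIMARY KEY, sha1 TEXT NOT NULL)")
    return conn


def deploy(conn: sqlite3.Connection, repo_path: str, sha1: str) -> None:
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO nodes (repo_path, sha1) VALUES (?, ?)",
            (repo_path, sha1),
        )


def copy(conn: sqlite3.Connection, src: str, dst: str) -> None:
    # A copy is a new row pointing at the same checksum; no bytes are duplicated.
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO nodes (repo_path, sha1) "
            "SELECT ?, sha1 FROM nodes WHERE repo_path = ?",
            (dst, src),
        )


def move(conn: sqlite3.Connection, src: str, dst: str) -> None:
    # A move only rewrites the path column (it fails if dst already exists).
    with conn:
        conn.execute("UPDATE nodes SET repo_path = ? WHERE repo_path = ?", (dst, src))


def delete(conn: sqlite3.Connection, repo_path: str) -> None:
    # A delete removes the row; the binary on disk is not modified by this call.
    with conn:
        conn.execute("DELETE FROM nodes WHERE repo_path = ?", (repo_path,))


def paths_for(conn: sqlite3.Connection, sha1: str) -> list:
    """Return every repository path that currently references the given checksum."""
    rows = conn.execute(
        "SELECT repo_path FROM nodes WHERE sha1 = ? ORDER BY repo_path", (sha1,)
    )
    return [row[0] for row in rows]
```

In this sketch, copying an artifact between repositories is a single `INSERT`, regardless of how large the underlying binary is.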

Benefits

Why Checksum-Based Storage Matters

  • Deduplication: Files are stored once, even when uploaded multiple times.
  • Fast operations: Copy, move, and delete run as database transactions, not file operations.
  • Storage efficiency: Deduplication reduces storage use.
  • Performance: Database indirection reduces expensive filesystem work.

Note

Checksum-based storage applies to all binaries in all Artifactory repositories.

For more information about checksum-based storage implementation in Artifactory, see: