Checksum-Based Storage

Artifactory stores each artifact under its SHA1 checksum in a two-character subdirectory and uses database mappings to deduplicate files.

Artifactory stores binaries by checksum. This design improves storage efficiency and operation speed.

Storage Process

  1. Upload File: Artifactory calculates the file's SHA1 checksum.
  2. Name File By Checksum: The file is renamed to its checksum value.
  3. Place File In Checksum Directory: The file is stored in a directory named after the first two checksum characters.
  4. Create Database Mapping: Artifactory creates a mapping between the checksum and the uploaded repository path.
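The four steps above can be sketched in a few lines of Python. This is an illustrative sketch, not Artifactory's implementation: the `store` function, the temporary `filestore` directory, the in-memory `mapping` dictionary (standing in for the database), and the repository paths are all hypothetical names chosen for the example.

```python
import hashlib
import tempfile
from pathlib import Path


def store(filestore: Path, mapping: dict, repo_path: str, data: bytes) -> Path:
    """Store data under its SHA1 checksum and record repo_path -> checksum."""
    # Step 1: calculate the SHA1 checksum of the uploaded content.
    checksum = hashlib.sha1(data).hexdigest()
    # Steps 2 and 3: name the file by its checksum and place it in a
    # directory named after the first two checksum characters.
    target = filestore / checksum[:2] / checksum
    if not target.exists():  # identical content is written only once
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(data)
    # Step 4: map the repository path to the checksum
    # (a dict stands in for the database here).
    mapping[repo_path] = checksum
    return target


# Hypothetical usage: the same bytes uploaded to two repository paths.
filestore = Path(tempfile.mkdtemp())
mapping = {}
first = store(filestore, mapping, "libs-release/app/1.0/app.jar", b"example bytes")
second = store(filestore, mapping, "libs-snapshot/app/1.0/app.jar", b"example bytes")
```

Both uploads resolve to the same file on disk, while `mapping` holds one entry per repository path.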

Checksum-Based Storage Example

  • A file with checksum ac3f5e56... is stored in directory ac
  • A file with checksum dfe12a4b... is stored in directory df
  • A file with checksum d4a3b2c1... is stored in directory d4

For example, the d4 directory might contain two files, both with checksums that begin with d4.

In parallel, Artifactory stores a database entry that maps the checksum to the repository path. This lets many operations run as database transactions instead of direct file manipulation.
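To illustrate why the mapping makes these operations cheap, here is a minimal sketch using Python's built-in `sqlite3` module. The `nodes` table, its columns, and the helper functions are assumptions made for the example; they do not reflect Artifactory's actual schema or API.

```python
import hashlib
import sqlite3

# Hypothetical mapping table: one row per repository path.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE nodes (repo_path TEXT PRIMARY KEY, sha1 TEXT NOT NULL)")


def copy_artifact(src: str, dst: str) -> None:
    # Copy: add a second path that points at the same checksum; no bytes move.
    with db:  # commits on success, rolls back on error
        db.execute(
            "INSERT INTO nodes (repo_path, sha1) "
            "SELECT ?, sha1 FROM nodes WHERE repo_path = ?",
            (dst, src),
        )


def move_artifact(src: str, dst: str) -> None:
    # Move: repoint the existing row to the new path.
    with db:
        db.execute("UPDATE nodes SET repo_path = ? WHERE repo_path = ?", (dst, src))


def delete_artifact(path: str) -> None:
    # Delete: drop the mapping row. Cleaning up binaries that no path
    # references any longer is outside the scope of this sketch.
    with db:
        db.execute("DELETE FROM nodes WHERE repo_path = ?", (path,))


# Hypothetical usage with a checksum computed from sample bytes.
sha1 = hashlib.sha1(b"example bytes").hexdigest()
with db:
    db.execute("INSERT INTO nodes VALUES (?, ?)", ("repo-a/app.jar", sha1))
copy_artifact("repo-a/app.jar", "repo-b/app.jar")
move_artifact("repo-a/app.jar", "repo-a/archive/app.jar")
delete_artifact("repo-b/app.jar")
rows = db.execute("SELECT repo_path, sha1 FROM nodes ORDER BY repo_path").fetchall()
```

None of the three operations reads or writes the stored binary; each one touches only a row in the mapping table.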

Benefits

Why Checksum-Based Storage Matters

  • Deduplication: Files are stored once, even when uploaded multiple times.
  • Fast operations: Copy, move, and delete run as database transactions, not file operations.
  • Storage efficiency: Deduplication reduces storage use.
  • Performance: Database indirection reduces expensive filesystem work.

Note

Checksum-based storage applies to all binaries in all Artifactory repositories.

For more information about checksum-based storage implementation in Artifactory, see: