Checksum-Based Storage
Artifactory stores artifacts by SHA1 checksum in two-character subdirectories with database mappings for deduplication.
Artifactory stores binaries by checksum. This design improves storage efficiency and operation speed.
Storage Process
- Upload File: Artifactory calculates the file's SHA1 checksum.
- Name File By Checksum: The file is renamed to its checksum value.
- Place File In Checksum Directory: The file is stored in a directory named after the first two checksum characters.
- Create Database Mapping: Artifactory creates a database mapping between the checksum and the repository path to which the file was uploaded.
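The four steps above can be sketched in a few lines of Python. This is only an illustration of the idea, not Artifactory's implementation: the `store` and `checksum_path` helpers, the temporary directory used as the storage root, the in-memory `mapping` dictionary standing in for the database, and the repository path are all assumptions made for the example.

```python
import hashlib
import tempfile
from pathlib import Path


def checksum_path(root: Path, sha1_hex: str) -> Path:
    """Return <root>/<first two checksum characters>/<full checksum>."""
    return root / sha1_hex[:2] / sha1_hex


def store(root: Path, mapping: dict, repo_path: str, data: bytes) -> str:
    """Store data under its SHA1 checksum and record the repository path."""
    sha1_hex = hashlib.sha1(data).hexdigest()   # step 1: calculate the checksum
    target = checksum_path(root, sha1_hex)      # steps 2-3: checksum name, two-character directory
    if not target.exists():                     # identical content is written only once
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(data)
    mapping[repo_path] = sha1_hex               # step 4: hypothetical stand-in for the database mapping
    return sha1_hex


root = Path(tempfile.mkdtemp())                 # assumed storage root for this sketch
mapping = {}
digest = store(root, mapping, "libs-release/app/1.0/app.jar", b"example bytes")
stored_file = checksum_path(root, digest)
print(stored_file.relative_to(root))            # prints <two characters>/<checksum>
```

Note that the file on disk carries no trace of its original name or repository; only the mapping connects the two.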
Checksum-Based Storage Example
- A file with checksum ac3f5e56... is stored in directory ac.
- A file with checksum dfe12a4b... is stored in directory df.
- A file with checksum d4a3b2c1... is stored in directory d4.
The d4 directory, for example, might contain two files whose checksums both begin with d4.
In parallel, Artifactory stores a database entry that maps the checksum to each repository path where the file was uploaded. This lets many operations run as database transactions instead of direct file manipulation.
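To make the database indirection concrete, the sketch below models copy, move, and delete as single statements against a mapping table, with no file touched. The `nodes` table, its columns, the helper functions, and the use of SQLite are assumptions chosen for illustration; they do not reflect Artifactory's actual schema or database.

```python
import hashlib
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical mapping table: one row per repository path, pointing at a checksum.
conn.execute("CREATE TABLE nodes (repo_path TEXT PRIMARY KEY, sha1 TEXT NOT NULL)")

SHA1 = hashlib.sha1(b"example bytes").hexdigest()  # checksum of some already-stored file


def deploy(path: str, sha1: str) -> None:
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO nodes VALUES (?, ?)", (path, sha1))


def copy(src: str, dst: str) -> None:
    with conn:  # a copy is a new row that reuses the same checksum
        conn.execute(
            "INSERT INTO nodes SELECT ?, sha1 FROM nodes WHERE repo_path = ?",
            (dst, src),
        )


def move(src: str, dst: str) -> None:
    with conn:  # a move only rewrites the path column
        conn.execute("UPDATE nodes SET repo_path = ? WHERE repo_path = ?", (dst, src))


def delete(path: str) -> None:
    with conn:  # a delete removes the row; the stored file is not touched here
        conn.execute("DELETE FROM nodes WHERE repo_path = ?", (path,))


def references(sha1: str) -> int:
    row = conn.execute("SELECT COUNT(*) FROM nodes WHERE sha1 = ?", (sha1,)).fetchone()
    return row[0]


deploy("repo-a/app.jar", SHA1)
copy("repo-a/app.jar", "repo-b/app.jar")
move("repo-b/app.jar", "repo-c/app.jar")
delete("repo-a/app.jar")
paths = [r[0] for r in conn.execute("SELECT repo_path FROM nodes ORDER BY repo_path")]
```

After these calls, a single row remains and it still points at the same checksum, so the stored file never had to be read, copied, or renamed.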
Benefits
Why Checksum-Based Storage Matters
- Deduplication: Files are stored once, even when uploaded multiple times.
- Fast operations: Copy, move, and delete run as database transactions, not file operations.
- Storage efficiency: Because identical content is stored once, storage use grows with the amount of unique content rather than with the number of uploads.
- Performance: Database indirection reduces expensive filesystem work.
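The deduplication benefit can be illustrated with a small in-memory model. The `blobs` and `mapping` dictionaries, the `upload` helper, and the repository paths are hypothetical stand-ins for the checksum-named files and the database, used only to show the effect.

```python
import hashlib

blobs = {}    # checksum -> content, stand-in for the checksum-named files
mapping = {}  # repository path -> checksum, stand-in for the database


def upload(repo_path: str, data: bytes) -> None:
    sha1_hex = hashlib.sha1(data).hexdigest()
    blobs.setdefault(sha1_hex, data)  # content is kept only the first time it is seen
    mapping[repo_path] = sha1_hex     # every upload still gets its own path entry


payload = b"x" * 1024
for path in ("repo-a/lib.jar", "repo-b/lib.jar", "repo-c/copy-of-lib.jar"):
    upload(path, payload)

bytes_uploaded = len(payload) * len(mapping)
bytes_stored = sum(len(content) for content in blobs.values())
print(len(mapping), "paths,", len(blobs), "stored file")
```

Three uploads of the same content produce three path entries but only one stored copy, so `bytes_stored` stays at the size of a single file while `bytes_uploaded` triples.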
Note
Checksum-based storage applies to all binaries in all Artifactory repositories.
