guix/tests/store-deduplication.scm

155 lines
6.4 KiB
Scheme
Raw Normal View History

;;; GNU Guix --- Functional package management for GNU
daemon: Do not deduplicate files smaller than 8 KiB. Files smaller than 8 KiB typically represent ~70% of the entries in /gnu/store/.links but only contribute to ~4% of the space savings afforded by deduplication. Not considering these files for deduplication speeds up file insertion in the store and, more importantly, leaves 'removeUnusedLinks' with fewer entries to traverse, thereby speeding it up proportionally. Partly fixes <https://issues.guix.gnu.org/24937>. * config-daemon.ac: Remove symlink hard link check and CAN_LINK_SYMLINK definition. * guix/store/deduplication.scm (%deduplication-minimum-size): New variable. (deduplicate)[loop]: Do not recurse when FILE's size is below %DEDUPLICATION-MINIMUM-SIZE. (dump-port): New procedure. (dump-file/deduplicate)[hash]: Turn into... [dump-and-compute-hash]: ... this thunk. Call 'deduplicate' only when SIZE is greater than %DEDUPLICATION-MINIMUM-SIZE; otherwise call 'dump-port'. * nix/libstore/gc.cc (LocalStore::removeUnusedLinks): Drop files where st.st_size < deduplicationMinSize. * nix/libstore/local-store.hh (deduplicationMinSize): New declaration. * nix/libstore/optimise-store.cc (deduplicationMinSize): New variable. (LocalStore::optimisePath_): Return when PATH is a symlink or smaller than 'deduplicationMinSize'. * tests/derivations.scm ("identical files are deduplicated"): Produce files bigger than %DEDUPLICATION-MINIMUM-SIZE. * tests/nar.scm ("restore-file-set with directories (signed, valid)"): Likewise. * tests/store-deduplication.scm ("deduplicate, below %deduplication-minimum-size"): New test. ("deduplicate", "deduplicate, ENOSPC"): Produce files bigger than %DEDUPLICATION-MINIMUM-SIZE. * tests/store.scm ("substitute, deduplication"): Likewise.
2021-11-13 20:47:15 +00:00
;;; Copyright © 2018, 2020-2021 Ludovic Courtès <ludo@gnu.org>
;;;
;;; This file is part of GNU Guix.
;;;
;;; GNU Guix is free software; you can redistribute it and/or modify it
;;; under the terms of the GNU General Public License as published by
;;; the Free Software Foundation; either version 3 of the License, or (at
;;; your option) any later version.
;;;
;;; GNU Guix is distributed in the hope that it will be useful, but
;;; WITHOUT ANY WARRANTY; without even the implied warranty of
;;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
;;; GNU General Public License for more details.
;;;
;;; You should have received a copy of the GNU General Public License
;;; along with GNU Guix. If not, see <http://www.gnu.org/licenses/>.
(define-module (test-store-deduplication)
#:use-module (guix tests)
#:use-module (guix store deduplication)
Switch to Guile-Gcrypt. This removes (guix hash) and (guix pk-crypto), which now live as part of Guile-Gcrypt (version 0.1.0.) * guix/gcrypt.scm, guix/hash.scm, guix/pk-crypto.scm, tests/hash.scm, tests/pk-crypto.scm: Remove. * configure.ac: Test for Guile-Gcrypt. Remove LIBGCRYPT and LIBGCRYPT_LIBDIR assignments. * m4/guix.m4 (GUIX_ASSERT_LIBGCRYPT_USABLE): Remove. * README: Add Guile-Gcrypt to the dependencies; move libgcrypt as "required unless --disable-daemon". * doc/guix.texi (Requirements): Likewise. * gnu/packages/bash.scm, guix/derivations.scm, guix/docker.scm, guix/git.scm, guix/http-client.scm, guix/import/cpan.scm, guix/import/cran.scm, guix/import/crate.scm, guix/import/elpa.scm, guix/import/gnu.scm, guix/import/hackage.scm, guix/import/texlive.scm, guix/import/utils.scm, guix/nar.scm, guix/pki.scm, guix/scripts/archive.scm, guix/scripts/authenticate.scm, guix/scripts/download.scm, guix/scripts/hash.scm, guix/scripts/pack.scm, guix/scripts/publish.scm, guix/scripts/refresh.scm, guix/scripts/substitute.scm, guix/store.scm, guix/store/deduplication.scm, guix/tests.scm, tests/base32.scm, tests/builders.scm, tests/challenge.scm, tests/cpan.scm, tests/crate.scm, tests/derivations.scm, tests/gem.scm, tests/nar.scm, tests/opam.scm, tests/pki.scm, tests/publish.scm, tests/pypi.scm, tests/store-deduplication.scm, tests/store.scm, tests/substitute.scm: Adjust imports. * gnu/system/vm.scm: Likewise. (guile-sqlite3&co): Rename to... (gcrypt-sqlite3&co): ... this. Add GUILE-GCRYPT. (expression->derivation-in-linux-vm)[config]: Remove. (iso9660-image)[config]: Remove. (qemu-image)[config]: Remove. (system-docker-image)[config]: Remove. * guix/scripts/pack.scm: Adjust imports. (guile-sqlite3&co): Rename to... (gcrypt-sqlite3&co): ... this. Add GUILE-GCRYPT. (self-contained-tarball)[build]: Call 'make-config.scm' without #:libgcrypt argument. (squashfs-image)[libgcrypt]: Remove. [build]: Call 'make-config.scm' without #:libgcrypt. (docker-image)[config, json]: Remove. [build]: Add GUILE-GCRYPT to the extensions Remove (guix config) from the imported modules. * guix/self.scm (specification->package): Remove "libgcrypt", add "guile-gcrypt". (compiled-guix): Remove #:libgcrypt. [guile-gcrypt]: New variable. [dependencies]: Add it. [*core-modules*]: Remove #:libgcrypt from 'make-config.scm' call. Add #:extensions. [*config*]: Remove #:libgcrypt from 'make-config.scm' call. (%dependency-variables): Remove %libgcrypt. (make-config.scm): Remove #:libgcrypt. * build-aux/build-self.scm (guile-gcrypt): New variable. (make-config.scm): Remove #:libgcrypt. (build-program)[fake-gcrypt-hash]: New variable. Add (gcrypt hash) to the imported modules. Adjust load path assignments. * gnu/packages/package-management.scm (guix)[propagated-inputs]: Add GUILE-GCRYPT. [arguments]: In 'wrap-program' phase, add GUILE-GCRYPT to the search path.
2018-08-31 15:07:07 +00:00
#:use-module (gcrypt hash)
#:use-module ((guix utils) #:select (call-with-temporary-directory))
#:use-module (guix build utils)
#:use-module (rnrs bytevectors)
#:use-module (ice-9 binary-ports)
#:use-module (srfi srfi-1)
#:use-module (srfi srfi-26)
#:use-module (srfi srfi-64))
(test-begin "store-deduplication")
daemon: Do not deduplicate files smaller than 8 KiB. Files smaller than 8 KiB typically represent ~70% of the entries in /gnu/store/.links but only contribute to ~4% of the space savings afforded by deduplication. Not considering these files for deduplication speeds up file insertion in the store and, more importantly, leaves 'removeUnusedLinks' with fewer entries to traverse, thereby speeding it up proportionally. Partly fixes <https://issues.guix.gnu.org/24937>. * config-daemon.ac: Remove symlink hard link check and CAN_LINK_SYMLINK definition. * guix/store/deduplication.scm (%deduplication-minimum-size): New variable. (deduplicate)[loop]: Do not recurse when FILE's size is below %DEDUPLICATION-MINIMUM-SIZE. (dump-port): New procedure. (dump-file/deduplicate)[hash]: Turn into... [dump-and-compute-hash]: ... this thunk. Call 'deduplicate' only when SIZE is greater than %DEDUPLICATION-MINIMUM-SIZE; otherwise call 'dump-port'. * nix/libstore/gc.cc (LocalStore::removeUnusedLinks): Drop files where st.st_size < deduplicationMinSize. * nix/libstore/local-store.hh (deduplicationMinSize): New declaration. * nix/libstore/optimise-store.cc (deduplicationMinSize): New variable. (LocalStore::optimisePath_): Return when PATH is a symlink or smaller than 'deduplicationMinSize'. * tests/derivations.scm ("identical files are deduplicated"): Produce files bigger than %DEDUPLICATION-MINIMUM-SIZE. * tests/nar.scm ("restore-file-set with directories (signed, valid)"): Likewise. * tests/store-deduplication.scm ("deduplicate, below %deduplication-minimum-size"): New test. ("deduplicate", "deduplicate, ENOSPC"): Produce files bigger than %DEDUPLICATION-MINIMUM-SIZE. * tests/store.scm ("substitute, deduplication"): Likewise.
2021-11-13 20:47:15 +00:00
(test-equal "deduplicate, below %deduplication-minimum-size"
(list #t (make-list 5 1))
(call-with-temporary-directory
(lambda (store)
;; Note: DATA must be longer than %DEDUPLICATION-MINIMUM-SIZE.
(let ((data "Hello, world!")
(identical (map (lambda (n)
(string-append store "/" (number->string n)
"/a/b/c"))
(iota 5))))
(for-each (lambda (file)
(mkdir-p (dirname file))
(call-with-output-file file
(lambda (port)
(put-bytevector port (string->utf8 data)))))
identical)
(deduplicate store (nar-sha256 store) #:store store)
;; (system (string-append "ls -lRia " store))
(list (= (length (delete-duplicates
(map (compose stat:ino stat) identical)))
(length identical))
(map (compose stat:nlink stat) identical))))))
(test-equal "deduplicate"
(cons* #t #f ;inode comparisons
2 (make-list 5 6)) ;'nlink' values
(call-with-temporary-directory
(lambda (store)
daemon: Do not deduplicate files smaller than 8 KiB. Files smaller than 8 KiB typically represent ~70% of the entries in /gnu/store/.links but only contribute to ~4% of the space savings afforded by deduplication. Not considering these files for deduplication speeds up file insertion in the store and, more importantly, leaves 'removeUnusedLinks' with fewer entries to traverse, thereby speeding it up proportionally. Partly fixes <https://issues.guix.gnu.org/24937>. * config-daemon.ac: Remove symlink hard link check and CAN_LINK_SYMLINK definition. * guix/store/deduplication.scm (%deduplication-minimum-size): New variable. (deduplicate)[loop]: Do not recurse when FILE's size is below %DEDUPLICATION-MINIMUM-SIZE. (dump-port): New procedure. (dump-file/deduplicate)[hash]: Turn into... [dump-and-compute-hash]: ... this thunk. Call 'deduplicate' only when SIZE is greater than %DEDUPLICATION-MINIMUM-SIZE; otherwise call 'dump-port'. * nix/libstore/gc.cc (LocalStore::removeUnusedLinks): Drop files where st.st_size < deduplicationMinSize. * nix/libstore/local-store.hh (deduplicationMinSize): New declaration. * nix/libstore/optimise-store.cc (deduplicationMinSize): New variable. (LocalStore::optimisePath_): Return when PATH is a symlink or smaller than 'deduplicationMinSize'. * tests/derivations.scm ("identical files are deduplicated"): Produce files bigger than %DEDUPLICATION-MINIMUM-SIZE. * tests/nar.scm ("restore-file-set with directories (signed, valid)"): Likewise. * tests/store-deduplication.scm ("deduplicate, below %deduplication-minimum-size"): New test. ("deduplicate", "deduplicate, ENOSPC"): Produce files bigger than %DEDUPLICATION-MINIMUM-SIZE. * tests/store.scm ("substitute, deduplication"): Likewise.
2021-11-13 20:47:15 +00:00
;; Note: DATA must be longer than %DEDUPLICATION-MINIMUM-SIZE.
(let ((data (string-concatenate (make-list 1000 "Hello, world!")))
(identical (map (lambda (n)
(string-append store "/" (number->string n)
"/a/b/c"))
(iota 5)))
(unique (string-append store "/unique")))
(for-each (lambda (file)
(mkdir-p (dirname file))
(call-with-output-file file
(lambda (port)
daemon: Do not deduplicate files smaller than 8 KiB. Files smaller than 8 KiB typically represent ~70% of the entries in /gnu/store/.links but only contribute to ~4% of the space savings afforded by deduplication. Not considering these files for deduplication speeds up file insertion in the store and, more importantly, leaves 'removeUnusedLinks' with fewer entries to traverse, thereby speeding it up proportionally. Partly fixes <https://issues.guix.gnu.org/24937>. * config-daemon.ac: Remove symlink hard link check and CAN_LINK_SYMLINK definition. * guix/store/deduplication.scm (%deduplication-minimum-size): New variable. (deduplicate)[loop]: Do not recurse when FILE's size is below %DEDUPLICATION-MINIMUM-SIZE. (dump-port): New procedure. (dump-file/deduplicate)[hash]: Turn into... [dump-and-compute-hash]: ... this thunk. Call 'deduplicate' only when SIZE is greater than %DEDUPLICATION-MINIMUM-SIZE; otherwise call 'dump-port'. * nix/libstore/gc.cc (LocalStore::removeUnusedLinks): Drop files where st.st_size < deduplicationMinSize. * nix/libstore/local-store.hh (deduplicationMinSize): New declaration. * nix/libstore/optimise-store.cc (deduplicationMinSize): New variable. (LocalStore::optimisePath_): Return when PATH is a symlink or smaller than 'deduplicationMinSize'. * tests/derivations.scm ("identical files are deduplicated"): Produce files bigger than %DEDUPLICATION-MINIMUM-SIZE. * tests/nar.scm ("restore-file-set with directories (signed, valid)"): Likewise. * tests/store-deduplication.scm ("deduplicate, below %deduplication-minimum-size"): New test. ("deduplicate", "deduplicate, ENOSPC"): Produce files bigger than %DEDUPLICATION-MINIMUM-SIZE. * tests/store.scm ("substitute, deduplication"): Likewise.
2021-11-13 20:47:15 +00:00
(put-bytevector port (string->utf8 data)))))
identical)
;; Make the parent of IDENTICAL read-only. This should not prevent
;; deduplication from inserting its hard link.
(chmod (dirname (second identical)) #o544)
(call-with-output-file unique
(lambda (port)
daemon: Do not deduplicate files smaller than 8 KiB. Files smaller than 8 KiB typically represent ~70% of the entries in /gnu/store/.links but only contribute to ~4% of the space savings afforded by deduplication. Not considering these files for deduplication speeds up file insertion in the store and, more importantly, leaves 'removeUnusedLinks' with fewer entries to traverse, thereby speeding it up proportionally. Partly fixes <https://issues.guix.gnu.org/24937>. * config-daemon.ac: Remove symlink hard link check and CAN_LINK_SYMLINK definition. * guix/store/deduplication.scm (%deduplication-minimum-size): New variable. (deduplicate)[loop]: Do not recurse when FILE's size is below %DEDUPLICATION-MINIMUM-SIZE. (dump-port): New procedure. (dump-file/deduplicate)[hash]: Turn into... [dump-and-compute-hash]: ... this thunk. Call 'deduplicate' only when SIZE is greater than %DEDUPLICATION-MINIMUM-SIZE; otherwise call 'dump-port'. * nix/libstore/gc.cc (LocalStore::removeUnusedLinks): Drop files where st.st_size < deduplicationMinSize. * nix/libstore/local-store.hh (deduplicationMinSize): New declaration. * nix/libstore/optimise-store.cc (deduplicationMinSize): New variable. (LocalStore::optimisePath_): Return when PATH is a symlink or smaller than 'deduplicationMinSize'. * tests/derivations.scm ("identical files are deduplicated"): Produce files bigger than %DEDUPLICATION-MINIMUM-SIZE. * tests/nar.scm ("restore-file-set with directories (signed, valid)"): Likewise. * tests/store-deduplication.scm ("deduplicate, below %deduplication-minimum-size"): New test. ("deduplicate", "deduplicate, ENOSPC"): Produce files bigger than %DEDUPLICATION-MINIMUM-SIZE. * tests/store.scm ("substitute, deduplication"): Likewise.
2021-11-13 20:47:15 +00:00
(put-bytevector port (string->utf8 (string-reverse data)))))
(deduplicate store (nar-sha256 store) #:store store)
;; (system (string-append "ls -lRia " store))
(cons* (apply = (map (compose stat:ino stat) identical))
(= (stat:ino (stat unique))
(stat:ino (stat (car identical))))
(stat:nlink (stat unique))
(map (compose stat:nlink stat) identical))))))
(test-equal "deduplicate, ENOSPC"
(cons* #f ;inode comparison
(append (make-list 3 4)
(make-list 7 1))) ;'nlink' values
;; In this scenario the first 3 files are properly deduplicated and then we
;; simulate a full '.links' directory where link(2) gets ENOSPC, thereby
;; preventing deduplication of the subsequent files.
(call-with-temporary-directory
(lambda (store)
(let ((true-link link)
(links 0)
daemon: Do not deduplicate files smaller than 8 KiB. Files smaller than 8 KiB typically represent ~70% of the entries in /gnu/store/.links but only contribute to ~4% of the space savings afforded by deduplication. Not considering these files for deduplication speeds up file insertion in the store and, more importantly, leaves 'removeUnusedLinks' with fewer entries to traverse, thereby speeding it up proportionally. Partly fixes <https://issues.guix.gnu.org/24937>. * config-daemon.ac: Remove symlink hard link check and CAN_LINK_SYMLINK definition. * guix/store/deduplication.scm (%deduplication-minimum-size): New variable. (deduplicate)[loop]: Do not recurse when FILE's size is below %DEDUPLICATION-MINIMUM-SIZE. (dump-port): New procedure. (dump-file/deduplicate)[hash]: Turn into... [dump-and-compute-hash]: ... this thunk. Call 'deduplicate' only when SIZE is greater than %DEDUPLICATION-MINIMUM-SIZE; otherwise call 'dump-port'. * nix/libstore/gc.cc (LocalStore::removeUnusedLinks): Drop files where st.st_size < deduplicationMinSize. * nix/libstore/local-store.hh (deduplicationMinSize): New declaration. * nix/libstore/optimise-store.cc (deduplicationMinSize): New variable. (LocalStore::optimisePath_): Return when PATH is a symlink or smaller than 'deduplicationMinSize'. * tests/derivations.scm ("identical files are deduplicated"): Produce files bigger than %DEDUPLICATION-MINIMUM-SIZE. * tests/nar.scm ("restore-file-set with directories (signed, valid)"): Likewise. * tests/store-deduplication.scm ("deduplicate, below %deduplication-minimum-size"): New test. ("deduplicate", "deduplicate, ENOSPC"): Produce files bigger than %DEDUPLICATION-MINIMUM-SIZE. * tests/store.scm ("substitute, deduplication"): Likewise.
2021-11-13 20:47:15 +00:00
(data1 (string->utf8
(string-concatenate (make-list 1000 "Hello, world!"))))
(data2 (string->utf8
(string-concatenate (make-list 1000 "Hi, world!"))))
(identical (map (lambda (n)
(string-append store "/" (number->string n)
"/a/b/c"))
(iota 10)))
(populate (lambda (data)
(lambda (file)
(mkdir-p (dirname file))
(call-with-output-file file
(lambda (port)
(put-bytevector port data)))))))
(for-each (populate data1) (take identical 5))
(for-each (populate data2) (drop identical 5))
(dynamic-wind
(lambda ()
(set! link (lambda (old new)
(set! links (+ links 1))
(if (<= links 4)
(true-link old new)
(throw 'system-error "link" "~A" '("Whaaat?!")
(list ENOSPC))))))
(lambda ()
(deduplicate store (nar-sha256 store) #:store store))
(lambda ()
(set! link true-link)))
(cons (apply = (map (compose stat:ino stat) identical))
(map (compose stat:nlink stat) identical))))))
(test-assert "copy-file/deduplicate"
(call-with-temporary-directory
(lambda (store)
(let ((source (search-path %load-path "gnu/packages/emacs-xyz.scm")))
(for-each (lambda (target)
(copy-file/deduplicate source
(string-append store target)
#:store store))
'("/a" "/b" "/c"))
(and (directory-exists? (string-append store "/.links"))
(file=? source (string-append store "/a"))
(apply = (map (compose stat:ino stat
(cut string-append store <>))
'("/a" "/b" "/c"))))))))
(test-end "store-deduplication")