Commit Graph

205 Commits

Author SHA1 Message Date
Eelco Dolstra d210cdc435 Fix assertion failure in ‘nix-store --load-db’
Namely:

  nix-store: derivations.cc:242: nix::Hash nix::hashDerivationModulo(nix::StoreAPI&, nix::Derivation): Assertion `store.isValidPath(i->first)' failed.

This happened because of the derivation output correctness check being
applied before the references of a derivation are valid.
2014-02-03 22:36:07 +01:00
Eelco Dolstra d6582c04c1 Give a friendly error message if the DB directory is not writable
Previously we would say "error: setting synchronous mode: unable to
open database file" which isn't very helpful.
2014-02-01 16:57:38 +01:00
Eelco Dolstra 709cbe4e76 Include <cstring> for memset
This should fix building on Illumos.
2013-11-22 10:00:43 +00:00
Eelco Dolstra a478e8a7bb Remove nix-setuid-helper
AFAIK, nobody uses it, it's not maintained, and it has no tests.
2013-11-14 11:57:37 +01:00
Eelco Dolstra a737f51fd9 Retry all SQLite operations
To deal with SQLITE_PROTOCOL, we also need to retry read-only
operations.
2013-10-16 15:58:20 +02:00
Eelco Dolstra ff02f5336c Fix a race in registerFailedPath()
Registering the path as failed can fail if another process does the
same thing after the call to hasPathFailed().  This is extremely
unlikely though.
2013-10-16 14:55:53 +02:00
Eelco Dolstra 4bd5282573 Convenience macros for retrying a SQLite transaction 2013-10-16 14:46:35 +02:00
Eelco Dolstra bce14d0f61 Don't wrap read-only queries in a transaction
There is no risk of getting an inconsistent result here: if the ID
returned by queryValidPathId() is deleted from the database
concurrently, subsequent queries involving that ID will simply fail
(since IDs are never reused).
2013-10-16 14:36:53 +02:00
Eelco Dolstra 7cdefdbe73 Print a distinct warning for SQLITE_PROTOCOL 2013-10-16 14:27:36 +02:00
Eelco Dolstra d05bf04444 Treat SQLITE_PROTOCOL as SQLITE_BUSY
In the Hydra build farm we fairly regularly get SQLITE_PROTOCOL errors
(e.g., "querying path in database: locking protocol").  The docs for
this error code say that it "is returned if some other process is
messing with file locks and has violated the file locking protocol
that SQLite uses on its rollback journal files."  However, the SQLite
source code reveals that this error can also occur under high load:

  if( cnt>5 ){
    int nDelay = 1;                      /* Pause time in microseconds */
    if( cnt>100 ){
      VVA_ONLY( pWal->lockError = 1; )
      return SQLITE_PROTOCOL;
    }
    if( cnt>=10 ) nDelay = (cnt-9)*238;  /* Max delay 21ms. Total delay 996ms */
    sqlite3OsSleep(pWal->pVfs, nDelay);
  }

i.e. if certain locks cannot be not acquired, SQLite will retry a
number of times before giving up and returing SQLITE_PROTOCOL.  The
comments say:

  Circumstances that cause a RETRY should only last for the briefest
  instances of time.  No I/O or other system calls are done while the
  locks are held, so the locks should not be held for very long. But
  if we are unlucky, another process that is holding a lock might get
  paged out or take a page-fault that is time-consuming to resolve,
  during the few nanoseconds that it is holding the lock.  In that case,
  it might take longer than normal for the lock to free.
  ...
  The total delay time before giving up is less than 1 second.

On a heavily loaded machine like lucifer (the main Hydra server),
which often has dozens of processes waiting for I/O, it seems to me
that a page fault could easily take more than a second to resolve.
So, let's treat SQLITE_PROTOCOL as SQLITE_BUSY and retry the
transaction.

Issue NixOS/hydra#14.
2013-10-16 14:19:59 +02:00
Ivan Kozik 34bb806f74 Fix typos, especially those that end up in the Nix manual 2013-08-26 11:15:22 +02:00
Eelco Dolstra a583a2bc59 Run the daemon worker on the same CPU as the client
On a system with multiple CPUs, running Nix operations through the
daemon is significantly slower than "direct" mode:

$ NIX_REMOTE= nix-instantiate '<nixos>' -A system
real    0m0.974s
user    0m0.875s
sys     0m0.088s

$ NIX_REMOTE=daemon nix-instantiate '<nixos>' -A system
real    0m2.118s
user    0m1.463s
sys     0m0.218s

The main reason seems to be that the client and the worker get moved
to a different CPU after every call to the worker.  This patch adds a
hack to lock them to the same CPU.  With this, the overhead of going
through the daemon is very small:

$ NIX_REMOTE=daemon nix-instantiate '<nixos>' -A system
real    0m1.074s
user    0m0.809s
sys     0m0.098s
2013-08-07 14:02:04 +02:00
Eelco Dolstra 1906cce6fc Increase SQLite's auto-checkpoint interval
Common operations like instantiating a NixOS system config no longer
fitted in 8192 pages, leading to more fsyncs.  So increase this limit.
2013-06-20 14:01:33 +00:00
Eelco Dolstra 22144afa8d Don't keep "disabled" substituters running
For instance, it's pointless to keep copy-from-other-stores running if
there are no other stores, or download-using-manifests if there are no
manifests.  This also speeds things up because we don't send queries
to those substituters.
2013-06-20 11:55:15 +02:00
Eelco Dolstra 1b6ee8f4c7 Allow hard links between the outputs of a derivation 2013-06-13 17:29:56 +02:00
Eelco Dolstra f9ff67e948 In repair mode, update the hash of rebuilt paths
Otherwise subsequent invocations of "--repair" will keep rebuilding
the path.  This only happens if the path content differs between
builds (e.g. due to timestamps).
2013-06-13 14:46:07 +02:00
Eelco Dolstra ca70fba0bf Remove obsolete EOF checks 2013-06-07 15:10:23 +02:00
Eelco Dolstra 5959c591a0 Process stderr from substituters while doing have/info queries 2013-06-07 15:02:14 +02:00
Eelco Dolstra c5f9d0d080 Buffer reads from the substituter
This greatly reduces the number of system calls.
2013-06-07 14:00:23 +02:00
Eelco Dolstra 470553bd05 Don't let stderr writes in substituters cause a deadlock 2013-05-01 13:21:39 +02:00
Shea Levy cc63db1dd5 makeStoreWritable: Ask forgiveness, not permission
It is surprisingly impossible to check if a mountpoint is a bind mount
on Linux, and in my previous commit I forgot to check if /nix/store was
even a mountpoint at all. statvfs.f_flag is not populated with MS_BIND
(and even if it were, my check was wrong in the previous commit).

Luckily, the semantics of mount with MS_REMOUNT | MS_BIND make both
checks unnecessary: if /nix/store is not a mountpoint, then mount will
fail with EINVAL, and if /nix/store is not a bind-mount, then it will
not be made writable. Thus, if /nix/store is not a mountpoint, we fail
immediately (since we don't know how to make it writable), and if
/nix/store IS a mountpoint but not a bind-mount, we fail at first write
(see below for why we can't check and fail immediately).

Note that, due to what is IMO buggy behavior in Linux, calling mount
with MS_REMOUNT | MS_BIND on a non-bind readonly mount makes the
mountpoint appear writable in two places: In the sixth (but not the
10th!) column of mountinfo, and in the f_flags member of struct statfs.
All other syscalls behave as if the mount point were still readonly (at
least for Linux 3.9-rc1, but I don't think this has changed recently or
is expected to soon). My preferred semantics would be for MS_REMOUNT |
MS_BIND to fail on a non-bind mount, as it doesn't make sense to remount
a non bind-mount as a bind mount.
2013-03-25 19:00:16 +01:00
Shea Levy 2c9cf50746 makeStoreWritable: Use statvfs instead of /proc/self/mountinfo to find out if /nix/store is a read-only bind mount
/nix/store could be a read-only bind mount even if it is / in its own filesystem, so checking the 4th field in mountinfo is insufficient.

Signed-off-by: Shea Levy <shea@shealevy.com>
2013-03-25 19:00:16 +01:00
Eelco Dolstra bdd4646338 Revert "Prevent config.h from being clobbered"
This reverts commit 28bba8c44f.
2013-03-08 01:24:59 +01:00
Eelco Dolstra 28bba8c44f Prevent config.h from being clobbered 2013-03-07 23:55:55 +01:00
Eelco Dolstra 8057a192e3 Handle systems without lutimes() or lchown() 2013-02-28 19:55:09 +01:00
Eelco Dolstra f45c731cd7 Handle symlinks properly
Now it's really brown paper bag time...
2013-02-28 14:51:08 +01:00
Eelco Dolstra 0111ba98ea Handle hard links to other files in the output 2013-02-27 17:18:41 +01:00
Eelco Dolstra b008674e46 Refactoring: Split off the non-recursive canonicalisePathMetaData()
Also, change the file mode before changing the owner.  This prevents a
slight time window in which a setuid binary would be setuid root.
2013-02-27 16:42:19 +01:00
Eelco Dolstra 5526a282b5 Security: Don't allow builders to change permissions on files they don't own
It turns out that in multi-user Nix, a builder may be able to do

  ln /etc/shadow $out/foo

Afterwards, canonicalisePathMetaData() will be applied to $out/foo,
causing /etc/shadow's mode to be set to 444 (readable by everybody but
writable by nobody).  That's obviously Very Bad.

Fortunately, this fails in NixOS's default configuration because
/nix/store is a bind mount, so "ln" will fail with "Invalid
cross-device link".  It also fails if hard-link restrictions are
enabled, so a workaround is:

  echo 1 > /proc/sys/fs/protected_hardlinks

The solution is to check that all files in $out are owned by the build
user.  This means that innocuous operations like "ln
${pkgs.foo}/some-file $out/" are now rejected, but that already failed
in chroot builds anyway.
2013-02-26 02:30:19 +01:00
Eelco Dolstra 5e9c3da412 Only warn about SQLite being busy once
No need to get annoying.
2013-01-23 16:45:10 +01:00
Eelco Dolstra b424d29d1b Open the database after removing immutable bits 2013-01-03 13:29:17 +01:00
Eelco Dolstra def5160b61 Clear any immutable bits in the Nix store
Doing this once makes subsequent operations like garbage collecting
more efficient since we don't have to call makeMutable() first.
2013-01-03 12:59:23 +01:00
Eelco Dolstra 772778c0ec On SQLITE_BUSY, wait a random amount of time
If all contending processes wait a fixed amount of time (100 ms),
there is a good probability that they'll just collide again.
2012-12-11 11:49:42 +01:00
Eelco Dolstra ea89df2b76 Use vfork() instead of fork() if available
Hopefully this reduces the chance of hitting ‘unable to fork: Cannot
allocate memory’ errors.  vfork() is used for everything except
starting builders.
2012-11-09 18:00:33 +01:00
Eelco Dolstra 198dbe7fa1 Remove some redundant close() calls
They are unnecessary because we set the close-on-exec flag.
2012-11-09 16:58:51 +01:00
Eelco Dolstra 10dcee99ed Remove the quickExit function 2012-11-09 16:42:10 +01:00
Eelco Dolstra 4c9e3fa641 Remove a Darwin hack that should no longer be needed 2012-11-09 16:35:42 +01:00
Eelco Dolstra 91ef4d9a81 Remove unnecessary call to closeMostFDs()
We have close-on-exec on all FDs now, and there is no security risk in
passing open FDs to substituters anyway.
2012-11-09 14:43:47 +01:00
Shea Levy d0fc615af6 canonicalizePathMetaData: Fall-back to utimes if lutimes fails due to ENOSYS 2012-11-06 11:29:59 +01:00
Eelco Dolstra 904f50412c nix-store --verify: Continue on errors 2012-10-04 10:20:23 -04:00
Eelco Dolstra 7586095504 Remove bin2c 2012-10-03 16:59:28 -04:00
Eelco Dolstra 0a7084567f Add a ‘--repair’ flag to nix-instantiate
This allows repairing corrupted derivations and other source files.
2012-10-03 15:09:18 -04:00
Eelco Dolstra a3f205b249 When repairing a derivation, check and repair the entire output closure
If we find a corrupted path in the output closure, we rebuild the
derivation that produced that particular path.
2012-10-03 10:38:09 -04:00
Eelco Dolstra 2001895f3d Add a --repair flag to ‘nix-store -r’ to repair derivation outputs
With this flag, if any valid derivation output is missing or corrupt,
it will be recreated by using a substitute if available, or by
rebuilding the derivation.  The latter may use hash rewriting if
chroots are not available.
2012-10-02 17:13:46 -04:00
Eelco Dolstra 8e3a7bd712 nix-store --verify: Add an option ‘--repair’ to repair all missing/corrupt paths
Also, return a non-zero exit code if errors remain after
verifying/repairing.
2012-10-02 15:12:56 -04:00
Eelco Dolstra d534f137f0 Make the store writable before creating /nix/store/.links 2012-09-25 16:30:08 -04:00
Eelco Dolstra b9c2b4d5b4 Remove setting of the immutable bit
Using the immutable bit is problematic, especially in conjunction with
store optimisation.  For instance, if the garbage collector deletes a
file, it has to clear its immutable bit, but if the file has
additional hard links, we can't set the bit afterwards because we
don't know the remaining paths.

So now that we support having the entire Nix store as a read-only
mount, we may as well drop the immutable bit.  Unfortunately, we have
to keep the code to clear the immutable bit for backwards
compatibility.
2012-09-19 16:17:54 -04:00
Eelco Dolstra b9124a5c33 Support having /nix/store as a read-only bind mount
It turns out that the immutable bit doesn't work all that well.  A
better way is to make the entire Nix store a read-only bind mount,
i.e. by doing

  $ mount --bind /nix/store /nix/store
  $ mount -o remount,ro,bind /nix/store

(This would typically done in an early boot script, before anything
from /nix/store is used.)

Since Nix needs to be able to write to the Nix store, it now detects
if /nix/store is a read-only bind mount and then makes it writable in
a private mount namespace.
2012-09-19 15:45:29 -04:00
Eelco Dolstra 76e88871b2 Templatise tokenizeString() 2012-09-19 15:43:23 -04:00
Eelco Dolstra e6e495649c Vacuum the SQLite DB after running the garbage collector 2012-09-13 14:33:41 -04:00