    VMs periodic checkpointing · f6a5349c
    Kevin Pouget authored
    This work was supported by the ExaNoDe project that has received funding
    from the European Union’s Horizon 2020 research and innovation programme
    under grant agreement No. 671578. The work presented in this paper
    reflects only authors’ view and the European Commission is not
    responsible for any use that may be made of the information it contains.
    
    [vosys] remove submodules for faster testing
    
    [vosys] add gitlab-ci
    
    migration/postcopy: define userfaultfd syscall number
    
    This patch adds `userfaultfd` definition to Qemu Linux ARM64 syscall
    list.
    
    migration/postcopy: update userfaultfd header file
    
    This patch updates Qemu's copy of Linux `userfaultfd.h` based on commit
    25412491e9e43d27d9c50aea09a106e4876f108e from repository
    https://git.kernel.org/pub/scm/linux/kernel/git/andrea/aa.git
    https://git.kernel.org/pub/scm/linux/kernel/git/andrea/aa.git/tree/include/uapi/linux/userfaultfd.h?h=userfault&id=25412491e9e43d27d9c50aea09a106e4876f108e
    Signed-off-by: Kevin Pouget <k.pouget@virtualopensystems.com>
    
    migration/postcopy: improve postcopy helper functions
    
    This patch extends the postcopy toolset functions to provide/extend
    UserfaultFD functionalities.
    
    These functions are lightweight wrappers around UFFD syscalls:
    
        static int uffd_wake(int uffd, ram_addr_t region, size_t len);
        static int uffd_unregister_protection(int uffd, ram_addr_t region,
                                              size_t len);
        static int uffd_protection(int uffd, ram_addr_t page_addr,
                                   size_t len, int remove);
    
    These functions are their public interface:
    
        int postcopy_ram_register_wp(UserfaultState *us);
        int postcopy_ram_register_missing(UserfaultState *us);
        int postcopy_ram_wprotect_all(UserfaultState *us);
        int postcopy_ram_disable_notify(UserfaultState *us);
    
        int postcopy_ram_write_protect(UserfaultState *us);
    
    One UFFD feature currently does not work as expected: once the
    write-protection of some pages of a region has been turned off, we
    must unregister the whole region from UFFD, register it again, and
    then re-activate the write protection. This is unfortunate, as it
    implies that the VM cannot be running while these operations are
    carried out.
    
    The expected behavior (re-activate the protection of all the pages of
    a region) could be performed without stopping the VM.
    
    Set `UFFD_USE_UNREGISTER` to `1` for `postcopy_ram_enable_notify` to
    work; set it to `0` to switch to the version that is supposedly
    correct but does not work at the moment.
    
    Function `postcopy_ram_fault_thread` relies on the `UserfaultState *us`
    structure belonging to a `MigrationIncomingState` object (it accesses
    the `mis->postcopy_remote_fds` array). For checkpoint migration, we do
    not modify this behavior, and instead make sure that this structure is
    always in a valid state (i.e., it is not destroyed at the end of
    incoming migrations).
    Signed-off-by: Kevin Pouget <k.pouget@virtualopensystems.com>
    
    migration/hmp: allow printing migration capabilities without migration statuses
    
    The `hmp_info_migrate` function first prints the migration
    capabilities, then the migration status.

    This patch allows the migration capabilities to be displayed even if
    no status is currently set (so that information about the dirty page
    tracking of incremental checkpointing can be printed).
    Signed-off-by: Kevin Pouget <k.pouget@virtualopensystems.com>
    
    migration: add dedicated init function
    
    This patch adds a dedicated init function for the `migration` module.
    Signed-off-by: Kevin Pouget <k.pouget@virtualopensystems.com>
    
    migration/checkpoint: add capabilities and state query helpers
    
    Introduce 'live' and 'incremental' migration capabilities and helper
    functions, as well as snapshot state query functions:
    
    - full or incremental checkpoint ongoing?
    - inside an incremental checkpoint?
    Signed-off-by: Kevin Pouget <k.pouget@virtualopensystems.com>
    
    migration/checkpoint: add page-level atomic operations and bitmaps
    
    These operations and bitmaps track the state of the virtual machine
    RAM pages during the checkpointing. Pages can be marked as dirty,
    already saved, or under processing.
    
    These bitmaps are independent of the ones used for VM migrations.
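
    As an illustration only, per-page state bitmaps of this kind could be
    built on C11 atomics as sketched below. The names `PageBitmap`,
    `page_bitmap_*` and `DEMO_NR_PAGES` are hypothetical and are not the
    identifiers used by the patch; one such bitmap would exist per state
    (dirty, already saved, under processing).

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define DEMO_NR_PAGES 256 /* hypothetical guest size, in pages */
#define DEMO_NR_WORDS ((DEMO_NR_PAGES + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* One bit per guest page; illustrative layout, not the patch's. */
typedef struct {
    atomic_ulong bits[DEMO_NR_WORDS];
} PageBitmap;

/* Atomically set the bit of `page`; returns its previous value. */
static bool page_bitmap_test_and_set(PageBitmap *bm, size_t page)
{
    assert(page < DEMO_NR_PAGES);
    unsigned long mask = 1UL << (page % BITS_PER_WORD);
    unsigned long old = atomic_fetch_or(&bm->bits[page / BITS_PER_WORD], mask);
    return (old & mask) != 0;
}

/* Atomically clear the bit of `page`; returns its previous value. */
static bool page_bitmap_test_and_clear(PageBitmap *bm, size_t page)
{
    assert(page < DEMO_NR_PAGES);
    unsigned long mask = 1UL << (page % BITS_PER_WORD);
    unsigned long old = atomic_fetch_and(&bm->bits[page / BITS_PER_WORD], ~mask);
    return (old & mask) != 0;
}

/* Read the bit of `page` without modifying it. */
static bool page_bitmap_test(PageBitmap *bm, size_t page)
{
    assert(page < DEMO_NR_PAGES);
    unsigned long mask = 1UL << (page % BITS_PER_WORD);
    return (atomic_load(&bm->bits[page / BITS_PER_WORD]) & mask) != 0;
}
```

    The test-and-set form lets two threads (for instance a fault handler
    and a saving thread) agree on which of them changed the state of a
    page first, without taking a lock.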
    Signed-off-by: Kevin Pouget <k.pouget@virtualopensystems.com>
    
    migration/checkpoint: extend migration priority-page queue
    
    This patch extends the migration page queue to store the pages that
    should be saved as part of the on-going or next checkpoint.
    
    - unqueue_page: extended to correctly track the case where the virtual
      machine pages are smaller than the host page (e.g., 1K vs 4K)
    
    - ram_update_page_in_queue: add shadow copy of the page content to a
      page already in the priority queue
    
    - ram_next_dirty_pages_to_priority_queue: at the end of a live and
      incremental checkpoint, move the pages that faulted during the
      current checkpoint to the priority queue of the next checkpoint.
    
    - ram_page_req_mutex: lock/unlock the priority queue mutex from other
      files (eg: postcopy-ram.c)
    
    - ram_save_queue_pages: add flags to indicate if the page should be
      added to the main priority queue, or in the priority queue of the
      next checkpoint
    
    - ram_pages_in_queue: indicates if the priority queue is currently
      empty
    Signed-off-by: Kevin Pouget <k.pouget@virtualopensystems.com>
    
    migration/checkpoint: extend the page fault handler
    
    This patch extends the UserfaultFD fault handler to track
    write-protection faults during live and incremental checkpointing.
    
    - postcopy_ram_fault_thread: handle write-protection fault in the VM
      memory:
      - mark the page as dirty,
      - push it to the right priority queue (current or next)
      - if necessary, take a shadow copy of its content before allowing
        the VM to continue its execution.
    
    - migration/postcopy-ram.o is moved to Makefile.target so that it can
      use the TARGET_PAGE_SIZE macro.
    Signed-off-by: Kevin Pouget <k.pouget@virtualopensystems.com>
    
    migration/checkpoint: add run_state transitions
    
    This patch adds the different `run_state` transitions that occur during
    live and incremental checkpointing.
    Signed-off-by: Kevin Pouget <k.pouget@virtualopensystems.com>
    
    migration/checkpoint: add checkpoint metadata and file-saving operations
    
    This patch introduces the structures and helper functions required to
    save incremental checkpoints to disk:
    
    - checkpoint_save_metadata: write (or update) `CHPT_META_MAGIC` and
      `checkpoint_file_state` at the beginning of the file `filename`.
    
    - file_start_outgoing_migration: make sure that the file already
      exists when saving a checkpoint increment.

    - qmp_migrate: add the 'chpt:' prefix for checkpointing into a file,
      and initialize the checkpoint state data.
    
    - snapshot_reset_increments: reset the checkpoint increment counters.
    
    - struct CheckpointState checkpoint_file_state[]: global array
      storing the metadata related to each of the current incremental
      checkpoints.
    
    - struct CheckpointState checkpoint_state: global structure storing
      the current state of the checkpointing.
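
    For illustration, writing (or updating) a magic number and a metadata
    block at the beginning of a checkpoint file could look like the sketch
    below. Everything prefixed with `Demo`/`demo_`/`DEMO_` is hypothetical:
    the real value of `CHPT_META_MAGIC` and the layout of `CheckpointState`
    are not given in this message, and the raw struct write assumes that
    the file is reloaded on a host with the same endianness.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define DEMO_CHPT_META_MAGIC 0x43485054u /* arbitrary demo value */
#define DEMO_MAX_INCREMENTS  8           /* arbitrary demo limit */

typedef struct {
    uint64_t offset;     /* where the increment starts in the file */
    uint64_t ram_offset; /* where its RAM section starts */
} DemoIncrement;

typedef struct {
    uint32_t magic;
    uint32_t nb_increments;
    DemoIncrement increments[DEMO_MAX_INCREMENTS];
} DemoCheckpointHeader;

/* Write (or overwrite) the header at offset 0; returns 0 on success. */
static int demo_save_metadata(FILE *f, const DemoCheckpointHeader *hdr)
{
    assert(f != NULL && hdr != NULL);
    if (fseek(f, 0, SEEK_SET) != 0) {
        return -1;
    }
    if (fwrite(hdr, sizeof(*hdr), 1, f) != 1) {
        return -1;
    }
    return fflush(f) == 0 ? 0 : -1;
}

/* Read the header back; returns 0 only if the magic number matches. */
static int demo_load_metadata(FILE *f, DemoCheckpointHeader *hdr)
{
    assert(f != NULL && hdr != NULL);
    if (fseek(f, 0, SEEK_SET) != 0) {
        return -1;
    }
    if (fread(hdr, sizeof(*hdr), 1, f) != 1) {
        return -1;
    }
    return hdr->magic == DEMO_CHPT_META_MAGIC ? 0 : -1;
}
```

    Keeping the header at a fixed offset means that each new increment
    only has to rewrite the first bytes of the file, and a missing magic
    number identifies a plain, non-incremental checkpoint file.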
    Signed-off-by: Kevin Pouget <k.pouget@virtualopensystems.com>
    
    migration/checkpoint: extend the snapshot_thread
    
    This patch extends the `snapshot_thread` function, which leads the
    live and incremental memory checkpointing.
    
    If the `live` migration capability is enabled, this snapshot_thread
    will run in parallel with the VM execution. The guest RAM will be
    write-protected, so that we can ensure that the pages touched by the
    guest system are copied to shadow memory before being modified.
    
    If the `incremental` migration capability is enabled, then ...
    
    1a. if this is the first checkpoint, then a full memory checkpoint will
    be carried out.
    1b. if this is not the first checkpoint, only the pages marked as
    dirty will be saved to disk.
    
    2. at the end of the checkpoint, dirty page (write-protection)
    tracking will be enabled, in order to track the pages modified by the
    guest system.
    
    The page protection is handled inside `postcopy-ram.c`, which
    encapsulates the calls to UserfaultFD.
    Signed-off-by: Kevin Pouget <k.pouget@virtualopensystems.com>
    
    migration/checkpoint: add RAM checkpointing mechanisms
    
    This patch extends the RAM migration mechanisms to support live and
    incremental checkpointing.
    
    RAM checkpointing relies on a set of per-page atomic flags:
    
    - dirty: true if the page has been modified since the previous
    checkpoint, and hence needs to be saved in the next one.
    
    - sent: true if the page has already been saved to disk, as part of
    the currently ongoing checkpoint
    
    - under processing: flag used as a mutex to ensure that a page cannot
    be copied to shadow memory and copied to disk at the same time (this
    can only happen during an incremental live checkpoint).
    Signed-off-by: Kevin Pouget <k.pouget@virtualopensystems.com>
    
    migration/checkpoint: introduce incremental checkpoint reloading
    
    This patch introduces the functions required to reload incremental
    checkpoints:
    
    - qemu_start_incoming_migration: add prefix 'chpt' to incoming
      migrations to trigger a checkpoint file reloading
    
    - file_start_incoming_checkpoint_reload: starts a "checkpoint file
      reloading" incoming migration. If the filename starts with
      "<n:int>:", then only the n first increments will be
      reloaded (`reload_stop_at` field of the `checkpoint_state` global
      state).
    
    - file_get_checkpoint_fd: returns a new file descriptor (`dup`licated
      from the main one), pointing at the beginning of the incremental
      migration stream to reload (`snapshot_number` parameter).
    
    - checkpoint_load_metadata: checks the checkpoint metadata magic
      number and loads the checkpoint meta data, at the beginning of the
      checkpoint file.
    
    - process_incoming_migration_co: updated to allow reloading multiple
      checkpoint increments, or a single/simple checkpoint if the
      checkpoint metadata magic number was not found.
    
    - incoming_migration_is_last_increment: indicate if the checkpoint
      increment currently being reloaded is the last one that will be
      loaded.
    
    - loadvm_load_checkpoint: triggers the actual reloading of a given
      checkpoint increment.
    
    - vmstate_load_state: updated to call `vmsd->post_load` only once,
      after the reloading of the last checkpoint increment.
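
    To illustrate the "<n:int>:" filename prefix handled by
    `file_start_incoming_checkpoint_reload`, here is a small parsing
    sketch. The function name `demo_parse_reload_spec` and the use of -1
    to mean "reload every increment" are assumptions made for the
    example; the patch may encode the absence of a limit differently.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/*
 * Split "<n>:<path>" into the increment limit and the path.
 * Returns a pointer to the path inside `spec`. `*stop_at` receives n,
 * or -1 when no "<n>:" prefix is present.
 */
static const char *demo_parse_reload_spec(const char *spec, long *stop_at)
{
    assert(spec != NULL && stop_at != NULL);
    *stop_at = -1;

    if (!isdigit((unsigned char)spec[0])) {
        return spec; /* no numeric prefix at all */
    }

    char *end;
    long n = strtol(spec, &end, 10);
    if (*end != ':') {
        return spec; /* the digits belong to the file name itself */
    }

    *stop_at = n;
    return end + 1;
}
```

    With this sketch, "3:/tmp/vm.chpt" yields a limit of 3 and the path
    "/tmp/vm.chpt", while "/tmp/vm.chpt" is returned unchanged with no
    limit.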
    Signed-off-by: Kevin Pouget <k.pouget@virtualopensystems.com>
    
    migration/checkpoint: introduce checkpoint increment squashing
    
    This patch introduces the ability to squash multiple checkpoint
    increments into a single one.
    
    It is an extension of the checkpoint reloading incoming migration,
    using the prefix "chpt:squash[:N]", where N is the number of
    checkpoint increments to consider.
    
    Checkpoint squashing works by performing a normal checkpoint reload,
    but without restarting the VM after its completion. Instead, when the
    reloading has succeeded, a new, full, checkpoint is performed,
    creating a single checkpoint file. Once this full checkpoint has
    completed, Qemu exits.
    
    The goal of checkpoint squashing is to reduce disk usage, as
    incremental checkpoints may grow large over time. Besides, reloading
    a single checkpoint is necessarily faster than reloading multiple
    increments.
    
    - qemu_start_incoming_migration: set the flag `do_squash` if the
      migration prefix is `chpt:squash:`.
    
    - process_incoming_migration_bh: trigger a new checkpoint migration to
      create the single-increment snapshot file, and set up a timer that
      will wait for the completion of the migration and terminate Qemu.

    - hmp_migrate_status_cb: in checkpoint squashing, do not report the
      progress of disks or block migration; inform the user about the squash.
    Signed-off-by: Kevin Pouget <k.pouget@virtualopensystems.com>
    
    migration/checkpoint: introduce periodic checkpointing
    
    This patch adds the ability to trigger periodic checkpoints. It should
    be used in conjunction with the 'incremental' capability, but this is
    not mandatory.
    
    Periodic checkpointing works by setting a Qemu timer that triggers a
    new checkpoint migration when it expires. The timer is restarted when
    the checkpoint completes successfully.

    A new parameter (`period`) is added to the HMP `migrate` command. Its
    value is the period, in units of 10 s, at which the checkpoint will be
    performed. A value of 0 disables any ongoing periodic checkpointing.
    
    - hmp-commands.hx::migrate: add `period` parameter to the HMP `migrate`
      command.
    - hmp_migrate: Idem.
    - qapi/migration.json: Idem.
    - qmp_migrate: Idem, and correctly initialize or disable the periodic
      checkpointing.
    
    - PERIODIC_CHECKPOINT_UNIT: Defines the unit of the `period` parameter
      of the `migrate` command. Current value: 10s.
    
    - migration_state_notifier: New migration state change notifier, that
      restarts the periodic timer if the migration succeeded, or cleans up
      the periodic structures if it failed.
    
    - struct MigrationState: extended to store the migration parameters
      and the periodic checkpoint timer.
    
    - struct MigrationParams: new structure to store the migration
      parameters, so that we can restart it later with the same options.
    
    - periodic_snapshot_cb: Callback triggered after the period checkpoint
      timer elapsed. Triggers a new migration if the previous one
      completed successfully, or deletes the timer if it failed.
    
    - periodic_snapshot_setup: Saves the checkpoint arguments (to be able
      to trigger it again later) and sets the periodic checkpoint timer;
      or deletes it if the requested period is 0.
    Signed-off-by: Kevin Pouget <k.pouget@virtualopensystems.com>
    
    migration/checkpoint: introduce partial reloading
    
    This patch improves the reloading of incremental checkpoints, by
    only reloading the RAM section of the incremental state (except for
    the last increment).
    
    - qemu_savevm_state_iterate: save the offset of the checkpoint file
      where the RAM begins.
    
    - qemu_loadvm_section_start_full: after having reloaded the RAM, if
      this is not the last checkpoint increment, interrupt the
      reloading (return `-EINTR`).
    
    - process_incoming_migration_co: detect that the reloading was
      interrupted because the RAM section has been reloaded
      (`err == -EINTR`).
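
    The control flow can be summarised by the simulation below, which
    only counts what would be loaded. It assumes that the RAM section is
    processed before the remaining sections of an increment; the names
    `demo_load_increment`, `demo_reload_all` and `DemoReloadStats` are
    hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    int ram_loads;    /* number of RAM sections reloaded */
    int device_loads; /* number of times the non-RAM state was reloaded */
} DemoReloadStats;

/* Load one increment; stops after the RAM section unless it is the last. */
static int demo_load_increment(bool is_last, DemoReloadStats *st)
{
    assert(st != NULL);
    st->ram_loads++;
    if (!is_last) {
        return -EINTR; /* RAM reloaded, skip the rest of this increment */
    }
    st->device_loads++;
    return 0;
}

/* Reload `nb_increments` increments in order; returns 0 on success. */
static int demo_reload_all(int nb_increments, DemoReloadStats *st)
{
    assert(nb_increments > 0);
    for (int i = 0; i < nb_increments; i++) {
        bool is_last = (i == nb_increments - 1);
        int err = demo_load_increment(is_last, st);
        if (err == -EINTR && !is_last) {
            continue; /* expected interruption: move to the next increment */
        }
        if (err < 0) {
            return err; /* genuine failure */
        }
    }
    return 0;
}
```

    With four increments, this sketch reloads four RAM sections but the
    remaining state only once, from the last increment.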
    Signed-off-by: Kevin Pouget <k.pouget@virtualopensystems.com>
    
    [vosys] add debugging message commands
    Signed-off-by: Kevin Pouget <k.pouget@virtualopensystems.com>

    [vosys] add init hook
    Signed-off-by: Kevin Pouget <k.pouget@virtualopensystems.com>

    [vosys] add a VM id for Qemu: $VM_UID or $USER-$VM_ID
    Signed-off-by: Kevin Pouget <k.pouget@virtualopensystems.com>

    [vosys] force Qemu to name threads
    Signed-off-by: Kevin Pouget <k.pouget@virtualopensystems.com>

    [vosys] migration/checkpoint: introduce VOSYS_TRY_MMAP_SHADOW_COPY [optimization]
    Signed-off-by: Kevin Pouget <k.pouget@virtualopensystems.com>

    [vosys] migration/checkpoint: introduce checksum capability
    Signed-off-by: Kevin Pouget <k.pouget@virtualopensystems.com>

    [vosys] migration/checkpoint: add sanity checks
    Signed-off-by: Kevin Pouget <k.pouget@virtualopensystems.com>
    
    [vosys] migration/checkpoint: abort on sanity check failure
    
    [vosys] migration/checkpoint: add checkpoint statistics
    Signed-off-by: Kevin Pouget <k.pouget@virtualopensystems.com>

    [vosys] migration/checkpoint: add logging messages
    Signed-off-by: Kevin Pouget <k.pouget@virtualopensystems.com>
    
    [vosys] migration/checkpoint: checksum the pages when they are saved to disk [WARNING]
    
    This patch computes the checksum of the pages when they are being
    saved to disk. This is complementary to the ram_checksum capability,
    which computes the checksum when the checkpoint is requested, and
    compares it with the checksum when the VM is reloaded.

    This patch ensures that the RAM content saved to disk is identical to
    the RAM content at the time of the checkpoint request.

    However, for this capability to work with incremental checkpointing,
    we have to store the checksum of each RAM page, and update it
    when performing an incremental checkpoint.

    WARNING: an 'off by one' error sometimes appears, so the abort() on
    failure is commented out for now.
    Signed-off-by: Kevin Pouget <k.pouget@virtualopensystems.com>
    
    [vosys] migration: add set 'internal-dist' build parameters
    Signed-off-by: Kevin Pouget <k.pouget@virtualopensystems.com>

    [vosys] migration/checkpoint: allow switching between old/new UFFD

    Old is for Linux 4.4
    Signed-off-by: Kevin Pouget <k.pouget@virtualopensystems.com>

    [vosys] migration/checkpoint: add CoW checkpointing capability
    Signed-off-by: Kevin Pouget <k.pouget@virtualopensystems.com>

    [vosys] migration/checkpoint: add FORCE_CONT_AFTER_RELOAD for FORTH
    Signed-off-by: Kevin Pouget <k.pouget@virtualopensystems.com>

    [vosys] makefile: add libfuse compilation
    Signed-off-by: Kevin Pouget <k.pouget@virtualopensystems.com>

    [vosys] migration: add QFS virtual fuse filesystem
    Signed-off-by: Kevin Pouget <k.pouget@virtualopensystems.com>

    [vosys] migration/checkpoint: add default value for checkpoint destination
    Signed-off-by: Kevin Pouget <k.pouget@virtualopensystems.com>

    [vosys] migration: introduce guest-inform module
    Signed-off-by: Kevin Pouget <k.pouget@virtualopensystems.com>

    [vosys] migration: add guest-inform binding to the QFS
    Signed-off-by: Kevin Pouget <k.pouget@virtualopensystems.com>
    
    CONFIG_DEBUG_MUTEX