1. 26 Jul, 2019 1 commit
    • VMs periodic checkpointing · f6a5349c
      Kevin Pouget authored
      This work was supported by the ExaNoDe project that has received funding
      from the European Union’s Horizon 2020 research and innovation programme
      under grant agreement No. 671578. The work presented in this paper
      reflects only authors’ view and the European Commission is not
      responsible for any use that may be made of the information it contains.
      
      [vosys] remove submodules for faster testing
      
      [vosys] add gitlab-ci
      
      migration/postcopy: define userfaultfd syscall number
      
      This patch adds `userfaultfd` definition to Qemu Linux ARM64 syscall
      list.
      
      migration/postcopy: update userfaultfd header file
      
      This patch updates Qemu's copy of Linux `userfaultfd.h` based on commit
      25412491e9e43d27d9c50aea09a106e4876f108e from repository
      https://git.kernel.org/pub/scm/linux/kernel/git/andrea/aa.git
      https://git.kernel.org/pub/scm/linux/kernel/git/andrea/aa.git/tree/include/uapi/linux/userfaultfd.h?h=userfault&id=25412491e9e43d27d9c50aea09a106e4876f108e
      Signed-off-by: Kevin Pouget <k.pouget@virtualopensystems.com>
      
      migration/postcopy: improve postcopy helper functions
      
      This patch extends the postcopy toolset functions to provide/extend
      UserfaultFD functionalities.
      
      These functions are lightweight wrappers around UFFD syscalls:
      
          static int uffd_wake(int uffd, ram_addr_t region, size_t len);
          static int uffd_unregister_protection(int uffd, ram_addr_t region,
                                                size_t len);
          static int uffd_protection(int uffd, ram_addr_t page_addr,
                                     size_t len, int remove);
      
      These functions are their public interface:
      
          int postcopy_ram_register_wp(UserfaultState *us);
          int postcopy_ram_register_missing(UserfaultState *us);
          int postcopy_ram_wprotect_all(UserfaultState *us);
          int postcopy_ram_disable_notify(UserfaultState *us);
      
          int postcopy_ram_write_protect(UserfaultState *us);
      
      One UFFD feature does not currently work as expected: once the
      write-protection of some pages of a region has been turned off, we
      must unregister the whole region from UFFD, register it again, and
      then re-activate the write protection. This is unfortunate, as it
      implies that the VM cannot be running while these operations are
      carried out.
      
      The expected behavior (re-activate the protection of all the pages of
      a region) could be performed without stopping the VM.
      
      Set `UFFD_USE_UNREGISTER` to `1` for `postcopy_ram_enable_notify` to
      work; set it to `0` to switch to the version that does not work at
      the moment but is supposedly correct.
      
      Function `postcopy_ram_fault_thread` relies on the `UserfaultState *us`
      structure belonging to a `MigrationIncomingState` object (it accesses
      the `mis->postcopy_remote_fds` array). For checkpoint migration, we do
      not modify this behavior, and instead make sure that this structure is
      always in a valid state (i.e., it is not destroyed at the end of
      incoming migrations).
      Signed-off-by: Kevin Pouget <k.pouget@virtualopensystems.com>
      
      migration/hmp: allow printing migration capabilities without migration statuses
      
      The `hmp_info_migrate` function first prints the migration
      capabilities, then the migration status.

      This patch allows the migration capabilities to be displayed even if
      no status is currently set (used to print information about the
      dirty-page tracking of incremental checkpointing).
      Signed-off-by: Kevin Pouget <k.pouget@virtualopensystems.com>
      
      migration: add dedicated init function
      
      This patch adds a dedicated init function for the `migration` module.
      Signed-off-by: Kevin Pouget <k.pouget@virtualopensystems.com>
      
      migration/checkpoint: add capabilities and state query helpers
      
      Introduce 'live' and 'incremental' migration capabilities and helper
      functions, as well as snapshot state query functions:
      
      - full or incremental checkpoint ongoing?
      - inside an incremental checkpoint?
      Signed-off-by: Kevin Pouget <k.pouget@virtualopensystems.com>
      
      migration/checkpoint: add page-level atomic operations and bitmaps
      
      These operations and bitmaps track the state of the virtual machine
      RAM pages during the checkpointing. Pages can be marked as dirty,
      already saved, or under processing.
      
      These bitmaps are independent of the ones used for VM migrations.
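
      To illustrate what such page-level atomic operations could look like,
      here is a minimal C11 sketch of a one-bit-per-page bitmap with atomic
      test-and-set and test-and-clear. `PageBitmap` and the `page_bitmap_*`
      helpers are hypothetical names chosen for this example; the patch
      itself may use different structures and Qemu's own atomic helpers.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical bitmap: one bit per guest RAM page (e.g. "dirty" or "sent"). */
typedef struct {
    _Atomic unsigned long *words;
    size_t nr_pages;
} PageBitmap;

/* Atomically set the bit of `page`; return the value it had before. */
static bool page_bitmap_test_and_set(PageBitmap *bm, size_t page)
{
    unsigned long mask = 1UL << (page % BITS_PER_WORD);
    unsigned long old = atomic_fetch_or(&bm->words[page / BITS_PER_WORD], mask);
    return (old & mask) != 0;
}

/* Atomically clear the bit of `page`; return the value it had before. */
static bool page_bitmap_test_and_clear(PageBitmap *bm, size_t page)
{
    unsigned long mask = 1UL << (page % BITS_PER_WORD);
    unsigned long old = atomic_fetch_and(&bm->words[page / BITS_PER_WORD], ~mask);
    return (old & mask) != 0;
}

/* Read the bit of `page` without modifying it. */
static bool page_bitmap_test(PageBitmap *bm, size_t page)
{
    unsigned long mask = 1UL << (page % BITS_PER_WORD);
    return (atomic_load(&bm->words[page / BITS_PER_WORD]) & mask) != 0;
}
```

      Returning the previous value lets two threads race on the same page
      and still agree on which of them changed the state.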
      Signed-off-by: Kevin Pouget <k.pouget@virtualopensystems.com>
      
      migration/checkpoint: extend migration priority-page queue
      
      This patch extends the migration page queue to store the pages that
      should be saved as part of the on-going or next checkpoint.
      
      - unqueue_page: extended to correctly track the case where the virtual
        machine pages are smaller than the host page (e.g., 1K vs 4K)
      
      - ram_update_page_in_queue: add shadow copy of the page content to a
        page already in the priority queue
      
      - ram_next_dirty_pages_to_priority_queue: at the end of a live and
        incremental checkpoint, move the pages that faulted during the
        current checkpoint to the priority queue of the next checkpoint.
      
      - ram_page_req_mutex: lock/unlock the priority queue mutex from other
        files (eg: postcopy-ram.c)
      
      - ram_save_queue_pages: add flags to indicate if the page should be
        added to the main priority queue, or in the priority queue of the
        next checkpoint
      
      - ram_pages_in_queue: indicates if the priority queue is currently
        empty
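
      The two-queue idea described in the list above can be sketched as
      follows. This is only an illustration under simplifying assumptions:
      fixed-size FIFOs, no locking, and page indices instead of RAM block
      offsets. `PriorityQueues`, `queues_save_page` and
      `queues_promote_next` are hypothetical names, not the functions of
      the patch.

```c
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_CAPACITY 64   /* arbitrary bound for this sketch */

typedef enum { QUEUE_CURRENT, QUEUE_NEXT } QueueTarget;

typedef struct {
    size_t pages[QUEUE_CAPACITY];
    size_t head, count;
} PageQueue;

typedef struct {
    PageQueue current;   /* pages to save during the ongoing checkpoint */
    PageQueue next;      /* pages deferred to the next checkpoint */
} PriorityQueues;

static bool queue_push(PageQueue *q, size_t page)
{
    if (q->count == QUEUE_CAPACITY) {
        return false;
    }
    q->pages[(q->head + q->count) % QUEUE_CAPACITY] = page;
    q->count++;
    return true;
}

static bool queue_pop(PageQueue *q, size_t *page)
{
    if (q->count == 0) {
        return false;
    }
    *page = q->pages[q->head];
    q->head = (q->head + 1) % QUEUE_CAPACITY;
    q->count--;
    return true;
}

/* Queue a page either for the current or for the next checkpoint. */
static bool queues_save_page(PriorityQueues *pq, size_t page, QueueTarget t)
{
    return queue_push(t == QUEUE_CURRENT ? &pq->current : &pq->next, page);
}

/* At the end of a checkpoint, promote the deferred pages. */
static void queues_promote_next(PriorityQueues *pq)
{
    size_t page;

    while (pq->current.count < QUEUE_CAPACITY && queue_pop(&pq->next, &page)) {
        queue_push(&pq->current, page);
    }
}
```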
      Signed-off-by: Kevin Pouget <k.pouget@virtualopensystems.com>
      
      migration/checkpoint: extend the page fault handler
      
      This patch extends the UserfaultFD fault handler to track
      write-protection faults during live and incremental checkpointing.
      
      - postcopy_ram_fault_thread: handle write-protection fault in the VM
        memory:
        - mark the page as dirty,
        - push it to the right priority queue (current or next)
        - if necessary, take a shadow copy of its content before allowing
          the VM to continue its execution.
      
      - migration/postcopy-ram.o is moved to Makefile.target so that it can
        use the TARGET_PAGE_SIZE macro.
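
      The three steps of the write-protection fault path can be sketched
      as below. The rule used to pick the queue and to decide on the
      shadow copy is an assumption made for this example (copy and queue
      for the current checkpoint only if the page has not been saved yet);
      the commit message does not spell it out. `PageState`,
      `handle_wp_fault` and `SKETCH_PAGE_SIZE` are hypothetical.

```c
#include <stdbool.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096   /* assumed page size for this sketch */

typedef enum { QUEUE_CURRENT, QUEUE_NEXT } QueueChoice;

typedef struct {
    bool dirty;        /* modified since the previous checkpoint */
    bool sent;         /* already saved in the ongoing checkpoint */
    bool has_shadow;   /* shadow[] holds the pre-write content */
    unsigned char shadow[SKETCH_PAGE_SIZE];
} PageState;

/*
 * Handle a write-protection fault on `page`.
 * Returns the queue the page should be pushed to.
 */
static QueueChoice handle_wp_fault(PageState *st, const unsigned char *page,
                                   bool checkpoint_running)
{
    /* Step 1: the guest is about to modify the page. */
    st->dirty = true;

    /* Assumption: an unsaved page must keep its old content for the
     * ongoing checkpoint, so take a shadow copy before the write. */
    if (checkpoint_running && !st->sent) {
        memcpy(st->shadow, page, SKETCH_PAGE_SIZE);
        st->has_shadow = true;
        return QUEUE_CURRENT;
    }

    /* Otherwise the new content only matters for the next checkpoint. */
    return QUEUE_NEXT;
}
```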
      Signed-off-by: Kevin Pouget <k.pouget@virtualopensystems.com>
      
      migration/checkpoint: add run_state transitions
      
      This patch adds the different `run_state` transitions that occur during
      live and incremental checkpointing.
      Signed-off-by: Kevin Pouget <k.pouget@virtualopensystems.com>
      
      migration/checkpoint: add checkpoint metadata and file-saving operations
      
      This patch introduces the structures and helper functions required to
      save incremental checkpoints to disk:
      
      - checkpoint_save_metadata: write (or update) `CHPT_META_MAGIC` and
        `checkpoint_file_state` at the beginning of the file `filename`.
      
      - file_start_outgoing_migration: make sure that the file already exists
        when saving a checkpoint increment.

      - qmp_migrate: add the 'chpt:' prefix for checkpointing into a file, and
        initialize the checkpoint state data.
      
      - snapshot_reset_increments: reset the checkpoint increment counters.
      
      - struct CheckpointState checkpoint_file_state[]: global array
        storing the metadata of each of the current incremental
        checkpoints.
      
      - struct CheckpointState checkpoint_state: global structure storing
        the current state of the checkpointing.
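
      A minimal sketch of writing (or updating) a magic number and a
      per-increment table at the beginning of a file, with a matching
      reader that validates the magic. The layout, the magic value
      `CHPT_META_MAGIC_SKETCH`, `MAX_INCREMENTS` and the structure fields
      are all placeholders invented for this example; the real
      `CHPT_META_MAGIC` and `CheckpointState` definitions are not shown
      in the commit message.

```c
#include <stdint.h>
#include <stdio.h>

#define CHPT_META_MAGIC_SKETCH 0x43485054u  /* placeholder, not the real magic */
#define MAX_INCREMENTS 8                    /* assumed bound for this sketch */

typedef struct {
    uint64_t offset;   /* where the increment starts in the file */
    uint64_t length;   /* size of the increment in bytes */
} IncrementMeta;

typedef struct {
    uint32_t magic;
    uint32_t nr_increments;
    IncrementMeta increments[MAX_INCREMENTS];
} CheckpointHeader;

/* Write or overwrite the header at offset 0. */
static int save_metadata(FILE *f, const CheckpointHeader *hdr)
{
    if (fseek(f, 0, SEEK_SET) != 0) {
        return -1;
    }
    if (fwrite(hdr, sizeof(*hdr), 1, f) != 1) {
        return -1;
    }
    return fflush(f) == 0 ? 0 : -1;
}

/* Read the header back and reject files without the expected magic. */
static int load_metadata(FILE *f, CheckpointHeader *hdr)
{
    if (fseek(f, 0, SEEK_SET) != 0) {
        return -1;
    }
    if (fread(hdr, sizeof(*hdr), 1, f) != 1) {
        return -1;
    }
    return hdr->magic == CHPT_META_MAGIC_SKETCH ? 0 : -1;
}
```

      Keeping the header at a fixed offset and size is what makes an
      in-place update possible each time an increment is appended.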
      Signed-off-by: Kevin Pouget <k.pouget@virtualopensystems.com>
      
      migration/checkpoint: extend the snapshot_thread
      
      This patch extends the `snapshot_thread` function, which leads the
      live and incremental memory checkpointing.
      
      If the `live` migration capability is enabled, this snapshot_thread
      will run in parallel with the VM execution. The guest RAM will be
      write-protected, so that we can ensure that the pages touched by the
      guest system are copied to shadow memory before being modified.
      
      If the `incremental` migration capability is enabled, then ...
      
      1a. if this is the first checkpoint, then a full memory checkpoint will
      be carried out.
      1b. if this is not the first checkpoint, only the pages marked as
      dirty will be saved to disk.
      
      2. at the end of the checkpoint, dirty page (write-protection)
      tracking will be enabled, in order to track the pages modified by the
      guest system.
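
      The choice between a full and an incremental save (steps 1a and 1b)
      can be expressed as a small page-selection function. This is a
      sketch only; `SnapshotConfig` and `select_pages` are hypothetical
      names and the dirty state is modelled as a plain array of booleans.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    bool incremental;           /* `incremental` capability enabled? */
    unsigned increments_done;   /* checkpoints already taken */
} SnapshotConfig;

/*
 * Fill `out` with the indices of the pages to save and return how many
 * were selected. `out` must have room for `nr_pages` entries.
 */
static size_t select_pages(const SnapshotConfig *cfg, const bool *dirty,
                           size_t nr_pages, size_t *out)
{
    /* 1a: the first checkpoint (or a non-incremental one) saves everything. */
    bool full = !cfg->incremental || cfg->increments_done == 0;
    size_t n = 0;

    for (size_t i = 0; i < nr_pages; i++) {
        /* 1b: later increments only save the pages marked dirty. */
        if (full || dirty[i]) {
            out[n++] = i;
        }
    }
    return n;
}
```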
      
      The page protection is handled inside `postcopy-ram.c`, which
      encapsulates the calls to UserfaultFD.
      Signed-off-by: Kevin Pouget <k.pouget@virtualopensystems.com>
      
      migration/checkpoint: add RAM checkpointing mechanisms
      
      This patch extends the RAM migration mechanisms to support live and
      incremental checkpointing.
      
      RAM checkpointing relies on a set of per-page atomic flags:
      
      - dirty: true if the page has been modified since the previous
      checkpoint, and hence needs to be saved in the next one.
      
      - sent: true if the page has already been saved to disk, as part of
      the currently ongoing checkpoint
      
      - under processing: flag used as a mutex to ensure that a page cannot be
      copied to shadow memory and copied to disk at the same time (this can
      only happen during an incremental live checkpoint).
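
      A flag used as a per-page mutex can be sketched with a C11
      `atomic_flag` acting as a try-lock. The structure and function
      names below are hypothetical, and the actual disk write is reduced
      to a comment.

```c
#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
    atomic_bool dirty;
    atomic_bool sent;
    atomic_flag under_processing;
} PageFlags;

static void page_flags_init(PageFlags *p)
{
    atomic_init(&p->dirty, false);
    atomic_init(&p->sent, false);
    atomic_flag_clear(&p->under_processing);
}

/* Try to take exclusive ownership of the page; never blocks. */
static bool page_try_claim(PageFlags *p)
{
    return !atomic_flag_test_and_set(&p->under_processing);
}

static void page_release(PageFlags *p)
{
    atomic_flag_clear(&p->under_processing);
}

/*
 * Saver path: write the page at most once per checkpoint, and skip it
 * while another path (for instance a shadow copy) holds the page.
 * Returns true only if this call performed the save.
 */
static bool page_save_once(PageFlags *p)
{
    bool saved = false;

    if (!page_try_claim(p)) {
        return false;
    }
    if (!atomic_load(&p->sent)) {
        /* ... the page content would be written to disk here ... */
        atomic_store(&p->sent, true);
        saved = true;
    }
    page_release(p);
    return saved;
}
```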
      Signed-off-by: Kevin Pouget <k.pouget@virtualopensystems.com>
      
      migration/checkpoint: introduce incremental checkpoint reloading
      
      This patch introduces the functions required to reload incremental
      checkpoints:
      
      - qemu_start_incoming_migration: add prefix 'chpt' to incoming
        migrations to trigger a checkpoint file reloading
      
      - file_start_incoming_checkpoint_reload: starts a "checkpoint file
        reloading" incoming migration. If the filename starts with
        "<n:int>:", then only the first n increments will be
        reloaded (`reload_stop_at` field of the `checkpoint_state` global
        state).
      
      - file_get_checkpoint_fd: returns a new file descriptor (`dup`licated
        from the main one), pointing at the beginning of the incremental
        migration stream to reload (`snapshot_number` parameter).
      
      - checkpoint_load_metadata: checks the checkpoint metadata magic
        number and loads the checkpoint meta data, at the beginning of the
        checkpoint file.
      
      - process_incoming_migration_co: updated to allow reloading multiple
        checkpoint increments, or a single/simple checkpoint if the
        checkpoint metadata magic number was not found.
      
      - incoming_migration_is_last_increment: indicate if the checkpoint
        increment currently being reloaded is the last one that will be
        loaded.
      
      - loadvm_load_checkpoint: triggers the actual reloading of a given
        checkpoint increment.
      
      - vmstate_load_state: updated to call `vmsd->post_load` only once,
        after the reloading of the last checkpoint increment.
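
      The optional "<n:int>:" filename prefix mentioned above could be
      parsed along these lines. This is a sketch under the assumption that
      a missing or malformed prefix means "reload every increment";
      `parse_reload_prefix` is a hypothetical helper, and the value it
      returns in `stop_at` merely plays the role of `reload_stop_at`.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Split "<n>:<path>" into n and path.
 * On success *stop_at is n and the returned pointer is the path part.
 * Without a valid prefix, *stop_at is 0 and `arg` is returned unchanged.
 */
static const char *parse_reload_prefix(const char *arg, long *stop_at)
{
    char *end;
    long n;

    *stop_at = 0;
    /* Require a leading digit: strtol alone would accept signs and spaces. */
    if (!isdigit((unsigned char)arg[0])) {
        return arg;
    }
    errno = 0;
    n = strtol(arg, &end, 10);
    if (errno != 0 || *end != ':' || n <= 0) {
        return arg;
    }
    *stop_at = n;
    return end + 1;
}
```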
      Signed-off-by: Kevin Pouget <k.pouget@virtualopensystems.com>
      
      migration/checkpoint: introduce checkpoint increment squashing
      
      This patch introduces the ability to squash multiple checkpoint
      increments into a single one.
      
      It is an extension of the checkpoint reloading incoming migration,
      using the prefix "chpt:squash[:N]", where N is the number of
      checkpoint increments to consider.
      
      Checkpoint squashing works by performing a normal checkpoint reload,
      but without restarting the VM after its completion. Instead, when the
      reloading has succeeded, a new, full, checkpoint is performed,
      creating a single checkpoint file. Once this full checkpoint has
      completed, Qemu exits.
      
      The goal of checkpoint squashing is to reduce disk usage, as
      incremental checkpoints may grow large over time. Besides, reloading
      a single checkpoint is necessarily faster than reloading multiple
      increments.
      
      - qemu_start_incoming_migration: set the flag `do_squash` if the
        migration prefix is `chpt:squash:`.
      
      - process_incoming_migration_bh: trigger a new checkpoint migration to
        create the single-increment snapshot file, and set up a timer that
        will wait for the completion of the migration and terminate Qemu.

      - hmp_migrate_status_cb: in checkpoint squashing, do not report the
        progress of disks or block migration; inform the user about the squash
      Signed-off-by: Kevin Pouget <k.pouget@virtualopensystems.com>
      
      migration/checkpoint: introduce periodic checkpointing
      
      This patch adds the ability to trigger periodic checkpoints. It should
      be used in conjunction with the 'incremental' capability, but this is
      not mandatory.
      
      Periodic checkpointing works by setting a Qemu timer that triggers a
      new checkpoint migration when it expires. The timer is restarted when
      the checkpoint completes successfully.

      A new parameter (`period`) is added to the HMP `migrate` command. Its
      value is the period, in units of 10 s, at which the checkpoint will be
      performed. A value of 0 disables any ongoing periodic checkpointing.
      
      - hmp-commands.hx::migrate: add `period` parameter to the HMP `migrate`
        command.
      - hmp_migrate: Idem.
      - qapi/migration.json: Idem.
      - qmp_migrate: Idem, and correctly initialize or disable the periodic
        checkpointing.
      
      - PERIODIC_CHECKPOINT_UNIT: Defines the unit of the `period` parameter
        of the `migrate` command. Current value: 10s.
      
      - migration_state_notifier: New migration state change notifier, that
        restarts the periodic timer if the migration succeeded, or cleans up
        the periodic structures if it failed.
      
      - struct MigrationState: extended to store the migration parameters
        and the periodic checkpoint timer.
      
      - struct MigrationParams: new structure to store the migration
        parameters, so that we can restart it later with the same options.
      
      - periodic_snapshot_cb: Callback triggered when the periodic checkpoint
        timer expires. Triggers a new migration if the previous one
        completed successfully, or deletes the timer if it failed.
      
      - periodic_snapshot_setup: Saves the checkpoint arguments (to be able
        to trigger it again later) and sets the periodic checkpoint timer;
        or deletes it if the requested period is 0.
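
      The timer life cycle described above (arm on setup, re-arm on
      success, drop on failure or when the period is 0) can be modelled
      without any Qemu API, as in the sketch below. `PeriodicTimer`,
      `periodic_setup` and `periodic_on_migration_done` are hypothetical
      names, time is passed in explicitly in milliseconds, and only the
      10 s unit comes from the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

#define PERIODIC_CHECKPOINT_UNIT_MS 10000   /* 10 s, per the description */

typedef struct {
    bool armed;
    int64_t period_ms;
    int64_t deadline_ms;
} PeriodicTimer;

/* Arm the timer for `period` units, or disable it if `period` is 0. */
static void periodic_setup(PeriodicTimer *t, unsigned period, int64_t now_ms)
{
    if (period == 0) {
        t->armed = false;
        t->period_ms = 0;
        return;
    }
    t->period_ms = (int64_t)period * PERIODIC_CHECKPOINT_UNIT_MS;
    t->deadline_ms = now_ms + t->period_ms;
    t->armed = true;
}

/* Migration state notifier: re-arm on success, clean up on failure. */
static void periodic_on_migration_done(PeriodicTimer *t, bool success,
                                       int64_t now_ms)
{
    if (t->period_ms == 0) {
        return;   /* periodic checkpointing is not active */
    }
    if (success) {
        t->deadline_ms = now_ms + t->period_ms;
        t->armed = true;
    } else {
        t->armed = false;
        t->period_ms = 0;
    }
}
```

      Re-arming from the completion notifier rather than from the timer
      callback means the period is counted from the end of one checkpoint
      to the start of the next.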
      Signed-off-by: Kevin Pouget <k.pouget@virtualopensystems.com>
      
      migration/checkpoint: introduce partial reloading
      
      This patch improves the reloading of incremental checkpoints by
      reloading only the RAM section of each increment, except for the
      last increment, which is reloaded in full.
      
      - qemu_savevm_state_iterate: save the offset of the checkpoint file
        where the RAM begins.
      
      - qemu_loadvm_section_start_full: after having reloaded the RAM, if
        this is not the last checkpoint increment, interrupt the
        reloading (return `-EINTR`).
      
      - process_incoming_migration_co: detect that the reloading was
        interrupted because the RAM section has been reloaded
        (`err == -EINTR`).
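
      The control flow of partial reloading can be sketched as a loop in
      which `-EINTR` on a non-final increment means "RAM reloaded, move
      on". The functions below are stand-ins written for this example and
      only count sections instead of reading a checkpoint file.

```c
#include <errno.h>
#include <stdbool.h>

typedef struct {
    int ram_sections_loaded;
    int device_sections_loaded;
} ReloadStats;

/* Stand-in for loading one increment from the checkpoint file. */
static int load_increment(bool is_last, ReloadStats *s)
{
    s->ram_sections_loaded++;
    if (!is_last) {
        return -EINTR;   /* stop right after the RAM section */
    }
    s->device_sections_loaded++;   /* the last increment is loaded fully */
    return 0;
}

/* Reload `nr_increments` increments, treating -EINTR as expected. */
static int reload_all(int nr_increments, ReloadStats *s)
{
    for (int i = 0; i < nr_increments; i++) {
        bool is_last = (i == nr_increments - 1);
        int err = load_increment(is_last, s);

        if (err == -EINTR && !is_last) {
            continue;    /* interrupted on purpose, not a failure */
        }
        if (err < 0) {
            return err;
        }
    }
    return 0;
}
```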
      Signed-off-by: Kevin Pouget <k.pouget@virtualopensystems.com>
      
      [vosys] add debugging message commands
      Signed-off-by: Kevin Pouget <k.pouget@virtualopensystems.com>

      [vosys] add init hook
      Signed-off-by: Kevin Pouget <k.pouget@virtualopensystems.com>

      [vosys] add a VM id for Qemu: $VM_UID or $USER-$VM_ID
      Signed-off-by: Kevin Pouget <k.pouget@virtualopensystems.com>

      [vosys] force Qemu to name threads
      Signed-off-by: Kevin Pouget <k.pouget@virtualopensystems.com>

      [vosys] migration/checkpoint: introduce VOSYS_TRY_MMAP_SHADOW_COPY [optimization]
      Signed-off-by: Kevin Pouget <k.pouget@virtualopensystems.com>

      [vosys] migration/checkpoint: introduce checksum capability
      Signed-off-by: Kevin Pouget <k.pouget@virtualopensystems.com>

      [vosys] migration/checkpoint: add sanity checks
      Signed-off-by: Kevin Pouget <k.pouget@virtualopensystems.com>

      [vosys] migration/checkpoint: abort on sanity check failure

      [vosys] migration/checkpoint: add checkpoint statistics
      Signed-off-by: Kevin Pouget <k.pouget@virtualopensystems.com>

      [vosys] migration/checkpoint: add logging messages
      Signed-off-by: Kevin Pouget <k.pouget@virtualopensystems.com>
      
      [vosys] migration/checkpoint: checksum the pages when they are saved to disk [WARNING]
      
      This patch computes the checksum of the pages when they are being
      saved to disk. This complements the ram_checksum capability, which
      computes the checksum when the checkpoint is requested and compares
      it with the checksum when the VM is reloaded.

      This patch ensures that the RAM content saved to disk is identical to
      the content at the time of the checkpoint request.
      
      However, for this capability to work with incremental checkpointing,
      we have to store the checksum of each RAM page, and update it
      when performing an incremental checkpoint.
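
      A per-page checksum table that is refreshed only for the pages saved
      in an increment might look like the sketch below. FNV-1a is used
      purely as a stand-in, since the commit message does not say which
      checksum the patch computes; `update_checksums` and `first_mismatch`
      are hypothetical helpers.

```c
#include <stddef.h>
#include <stdint.h>

/* 32-bit FNV-1a, a stand-in for whatever checksum the patch uses. */
static uint32_t fnv1a(const unsigned char *data, size_t len)
{
    uint32_t h = 2166136261u;

    for (size_t i = 0; i < len; i++) {
        h ^= data[i];
        h *= 16777619u;
    }
    return h;
}

/* Refresh the table entries of the pages saved in this increment. */
static void update_checksums(uint32_t *table, const unsigned char *ram,
                             size_t page_size, const size_t *saved_pages,
                             size_t nr_saved)
{
    for (size_t i = 0; i < nr_saved; i++) {
        size_t p = saved_pages[i];
        table[p] = fnv1a(ram + p * page_size, page_size);
    }
}

/* Return the index of the first page whose checksum differs, or -1. */
static long first_mismatch(const uint32_t *table, const unsigned char *ram,
                           size_t page_size, size_t nr_pages)
{
    for (size_t p = 0; p < nr_pages; p++) {
        if (table[p] != fnv1a(ram + p * page_size, page_size)) {
            return (long)p;
        }
    }
    return -1;
}
```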
      
      WARNING: an 'off by one' error sometimes appears, so the abort() on
      failure is commented out ...
      Signed-off-by: Kevin Pouget <k.pouget@virtualopensystems.com>
      
      [vosys] migration: add set 'internal-dist' build parameters
      Signed-off-by: Kevin Pouget <k.pouget@virtualopensystems.com>

      [vosys] migration/checkpoint: allow switching between old/new UFFD

      Old is for Linux 4.4
      Signed-off-by: Kevin Pouget <k.pouget@virtualopensystems.com>

      [vosys] migration/checkpoint: add CoW checkpointing capability
      Signed-off-by: Kevin Pouget <k.pouget@virtualopensystems.com>

      [vosys] migration/checkpoint: add FORCE_CONT_AFTER_RELOAD for FORTH
      Signed-off-by: Kevin Pouget <k.pouget@virtualopensystems.com>

      [vosys] makefile: add libfuse compilation
      Signed-off-by: Kevin Pouget <k.pouget@virtualopensystems.com>

      [vosys] migration: add QFS virtual fuse filesystem
      Signed-off-by: Kevin Pouget <k.pouget@virtualopensystems.com>

      [vosys] migration/checkpoint: add default value for checkpoint destination
      Signed-off-by: Kevin Pouget <k.pouget@virtualopensystems.com>

      [vosys] migration: introduce guest-inform module
      Signed-off-by: Kevin Pouget <k.pouget@virtualopensystems.com>

      [vosys] migration: add guest-inform binding to the QFS
      Signed-off-by: Kevin Pouget <k.pouget@virtualopensystems.com>
      
      CONFIG_DEBUG_MUTEX
  2. 11 Mar, 2019 1 commit
    • postcopy/migration: zhanghailiang UserfaultFD patch series · f399b9c2
      zhanghailiang authored
      KP: Update for Qemu master branch, as of 2019-03-07
      
      ------------------------------
      
      postcopy/migration: Split fault related state into struct UserfaultState
      
      Split the fault-related state out of the MigrationIncomingState struct
      and put it all into a new struct UserfaultState. We will add this
      state to struct MigrationState in a later patch.

      We also fix some helper functions to use the new type.
      Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
      
      ------------------------------
      
      migration: Allow the migrate command to work on file: urls
      
      Usage:
      (qemu) migrate file:/path/to/vm_statefile
      Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
      Signed-off-by: Benoit Canet <benoit.canet@gmail.com>
      
      ------------------------------
      
      migration: Allow -incoming to work on file: urls
      
      Usage:
      -incoming file:/path/to/vm_statefile
      Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
      Signed-off-by: Benoit Canet <benoit.canet@gmail.com>
      
      ------------------------------
      
      migration: Create a snapshot thread to realize saving memory snapshot
      
      If the user issues the migrate file:url command, we treat it as a
      request to create a live memory snapshot.
      For now, only the tcg accelerator is supported.
      Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
      
      ------------------------------
      
      migration: implement initialization work for snapshot
      
      We re-use some migration helper functions to perform the setup work
      for the snapshot. Besides, some initialization work (for example,
      saving the VM's device state) must be done while the VM is paused.
      Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
      
      savevm: Split qemu_savevm_state_complete_precopy() into two helper functions
      
      We split qemu_savevm_state_complete_precopy() into two helper functions,
      qemu_savevm_section_full() and qemu_savevm_section_end().
      The main reason is that we sometimes want to perform these two steps
      separately.
      Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
      
      ------------------------------
      
      snapshot: Save VM's device state into snapshot file
      
      For a live memory snapshot, we want to capture the VM's state at the
      time the snapshot command is received. So we need to save the VM's
      static state (here, the VM's device state) at the beginning of
      snapshot_thread(), but we can't do that while the VM is running.
      Besides, we can't save the device state into the snapshot file
      directly: because we want to re-use the migration's incoming process
      for snapshots, we need to keep the save sequence.

      So here, we save the VM's device state into a qsb temporarily in the
      SETUP stage while the VM is stopped, and write it into the snapshot
      file after the VM's live state has been saved.
      Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
  3. 06 Mar, 2019 38 commits