Kevin Pouget authored
This work was supported by the ExaNoDe project, which has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 671578. The work presented in this paper reflects only the authors’ view, and the European Commission is not responsible for any use that may be made of the information it contains.

[vosys] remove submodules for faster testing

[vosys] add gitlab-ci

migration/postcopy: define userfaultfd syscall number

This patch adds the `userfaultfd` definition to the Qemu Linux ARM64 syscall list.

migration/postcopy: update userfaultfd header file

This patch updates Qemu's copy of the Linux `userfaultfd.h`, based on commit 25412491e9e43d27d9c50aea09a106e4876f108e from repository https://git.kernel.org/pub/scm/linux/kernel/git/andrea/aa.git

https://git.kernel.org/pub/scm/linux/kernel/git/andrea/aa.git/tree/include/uapi/linux/userfaultfd.h?h=userfault&id=25412491e9e43d27d9c50aea09a106e4876f108e

Signed-off-by: Kevin Pouget <k.pouget@virtualopensystems.com>

migration/postcopy: improve postcopy helper functions

This patch extends the postcopy helper functions to provide and extend UserfaultFD functionality.
These functions are lightweight wrappers around UFFD syscalls:

    static int uffd_wake(int uffd, ram_addr_t region, size_t len);
    static int uffd_unregister_protection(int uffd, ram_addr_t region, size_t len);
    static int uffd_protection(int uffd, ram_addr_t page_addr, size_t len, int remove);

These functions are their public interface:

    int postcopy_ram_register_wp(UserfaultState *us);
    int postcopy_ram_register_missing(UserfaultState *us);
    int postcopy_ram_wprotect_all(UserfaultState *us);
    int postcopy_ram_disable_notify(UserfaultState *us);
    int postcopy_ram_write_protect(UserfaultState *us);

One UFFD feature currently does not work as expected: once the write protection of some pages of a region has been turned off, we must unregister the whole region from UFFD, register it again, and then activate the write protection. This is unfortunate, as it implies that the VM cannot be running between these two operations. The expected behavior (re-activating the protection of all the pages of a region) could be performed without stopping the VM.

Set `UFFD_USE_UNREGISTER` to `1` to have `postcopy_ram_enable_notify` work; set it to `0` to switch to the version that does not work at the moment, but is supposedly correct.

The function `postcopy_ram_fault_thread` relies on the `UserfaultState *us` structure belonging to a `MigrationIncomingState` object (it accesses the `mis->postcopy_remote_fds` array). For checkpoint migration, we do not modify this behavior, and instead make sure that this structure is always in a valid state (i.e., it is not destroyed at the end of incoming migrations).

Signed-off-by: Kevin Pouget <k.pouget@virtualopensystems.com>

migration/hmp: allow printing migration capabilities without migration statuses

The `hmp_info_migrate` function first prints the migration capabilities, then the migration status.
This patch allows the migration capabilities to be displayed even if no status is currently set (for printing information about the dirty page tracking of incremental checkpointing).

Signed-off-by: Kevin Pouget <k.pouget@virtualopensystems.com>

migration: add dedicated init function

This patch adds a dedicated init function for the `migration` module.

Signed-off-by: Kevin Pouget <k.pouget@virtualopensystems.com>

migration/checkpoint: add capabilities and state query helpers

Introduce the 'live' and 'incremental' migration capabilities and helper functions, as well as snapshot state query functions:

- full or incremental checkpoint ongoing?
- inside an incremental checkpoint?

Signed-off-by: Kevin Pouget <k.pouget@virtualopensystems.com>

migration/checkpoint: add page-level atomic operations and bitmaps

These operations and bitmaps track the state of the virtual machine RAM pages during checkpointing. Pages can be marked as dirty, already saved, or under processing. These bitmaps are independent of the ones used for VM migrations.

Signed-off-by: Kevin Pouget <k.pouget@virtualopensystems.com>

migration/checkpoint: extend migration priority-page queue

This patch extends the migration page queue to store the pages that should be saved as part of the ongoing or next checkpoint.

- unqueue_page: extended to correctly track the case where the virtual machine pages are smaller than the host pages (e.g., 1K vs 4K)
- ram_update_page_in_queue: add a shadow copy of the page content to a page already in the priority queue
- ram_next_dirty_pages_to_priority_queue: at the end of a live and incremental checkpoint, put the pages that faulted during the current checkpoint into the priority queue of the next checkpoint.
- ram_page_req_mutex: lock/unlock the priority queue mutex from other files (e.g., postcopy-ram.c)
- ram_save_queue_pages: add flags to indicate whether the page should be added to the main priority queue or to the priority queue of the next checkpoint
- ram_pages_in_queue: indicates if the priority queue is currently empty

Signed-off-by: Kevin Pouget <k.pouget@virtualopensystems.com>

migration/checkpoint: extend the page fault handler

This patch extends the UserfaultFD fault handler to track write-protection faults during live and incremental checkpointing.

- postcopy_ram_fault_thread: handle write-protection faults in the VM memory:
  - mark the page as dirty,
  - push it to the right priority queue (current or next),
  - if necessary, take a shadow copy of its content before allowing the VM to continue its execution.
- migration/postcopy-ram.o is moved to Makefile.target so that it can use the TARGET_PAGE_SIZE macro.

Signed-off-by: Kevin Pouget <k.pouget@virtualopensystems.com>

migration/checkpoint: add run_state transitions

This patch adds the different `run_state` transitions that occur during live and incremental checkpointing.

Signed-off-by: Kevin Pouget <k.pouget@virtualopensystems.com>

migration/checkpoint: add checkpoint metadata and file-saving operations

This patch introduces the structures and helper functions required to save incremental checkpoints to disk:

- checkpoint_save_metadata: write (or update) `CHPT_META_MAGIC` and `checkpoint_file_state` at the beginning of the file `filename`.
- file_start_outgoing_migration: make sure that the file already exists when saving a checkpoint increment.
- qmp_migrate: add the 'chpt:' prefix for checkpointing into a file, and initialize the checkpoint state data.
- snapshot_reset_increments: reset the checkpoint increment counters.
- struct CheckpointState checkpoint_file_state[]: global array storing the metadata related to each of the current incremental checkpoints.
- struct CheckpointState checkpoint_state: global structure storing the current state of the checkpointing.

Signed-off-by: Kevin Pouget <k.pouget@virtualopensystems.com>

migration/checkpoint: extend the snapshot_thread

This patch extends the `snapshot_thread` function, which drives the live and incremental memory checkpointing.

If the `live` migration capability is enabled, this snapshot_thread runs in parallel with the VM execution. The guest RAM is write-protected, so that the pages touched by the guest system are copied to shadow memory before being modified.

If the `incremental` migration capability is enabled, then:

1a. if this is the first checkpoint, a full memory checkpoint is carried out.
1b. if this is not the first checkpoint, only the pages marked as dirty are saved to disk.
2. at the end of the checkpoint, dirty page (write-protection) tracking is enabled, in order to track the pages modified by the guest system.

The page protection is handled inside `postcopy-ram.c`, which encapsulates the calls to UserfaultFD.

Signed-off-by: Kevin Pouget <k.pouget@virtualopensystems.com>

migration/checkpoint: add RAM checkpointing mechanisms

This patch extends the RAM migration mechanisms to support live and incremental checkpointing.

RAM checkpointing relies on a set of per-page atomic flags:

- dirty: true if the page has been modified since the previous checkpoint, and hence needs to be saved in the next one.
- sent: true if the page has already been saved to disk as part of the currently ongoing checkpoint.
- under processing: flag used as a mutex to ensure that a page cannot be copied to shadow memory and copied to disk at the same time (this can only happen during an incremental live checkpoint).
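
The three per-page flags above could be modeled as follows. This is only a minimal sketch using C11 atomics, not the code of this patch: the names `PAGE_DIRTY`, `PAGE_SENT`, `PAGE_PROCESSING`, `page_try_lock` and `save_page_once` are hypothetical, and the storage (one byte per page rather than separate bitmaps) is an assumption made for brevity. When the flags are reset between checkpoints is deliberately not modeled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits: the names mirror the commit message, not Qemu code. */
enum {
    PAGE_DIRTY      = 1u << 0, /* modified since the previous checkpoint */
    PAGE_SENT       = 1u << 1, /* already saved in the ongoing checkpoint */
    PAGE_PROCESSING = 1u << 2  /* being copied to shadow memory or to disk */
};

#define FLAG_NB_PAGES 8
static _Atomic unsigned char page_flags[FLAG_NB_PAGES];

/* Atomically set one flag of a page. */
void page_set(size_t page, unsigned char flag)
{
    assert(page < FLAG_NB_PAGES);
    atomic_fetch_or(&page_flags[page], flag);
}

/* Atomically clear one flag of a page. */
void page_clear(size_t page, unsigned char flag)
{
    assert(page < FLAG_NB_PAGES);
    atomic_fetch_and(&page_flags[page], (unsigned char)~flag);
}

/* Read one flag of a page. */
bool page_test(size_t page, unsigned char flag)
{
    assert(page < FLAG_NB_PAGES);
    return (atomic_load(&page_flags[page]) & flag) != 0;
}

/* Try to take the "under processing" flag; true if we now own the page. */
bool page_try_lock(size_t page)
{
    assert(page < FLAG_NB_PAGES);
    unsigned char old = atomic_fetch_or(&page_flags[page], PAGE_PROCESSING);
    return (old & PAGE_PROCESSING) == 0;
}

/* Release the "under processing" flag. */
void page_unlock(size_t page)
{
    page_clear(page, PAGE_PROCESSING);
}

/* Saver side: returns true only if this call saved the page. */
bool save_page_once(size_t page)
{
    if (!page_try_lock(page)) {
        return false; /* another thread is copying this page */
    }
    bool saved = false;
    if (!page_test(page, PAGE_SENT)) {
        /* ... the page would be written to the checkpoint file here ... */
        page_set(page, PAGE_SENT);
        saved = true;
    }
    page_unlock(page);
    return saved;
}
```

In this model, a fault handler that wants to take a shadow copy would call `page_try_lock` first, so that it never copies a page while `save_page_once` is writing the same page to disk.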

Signed-off-by: Kevin Pouget <k.pouget@virtualopensystems.com>

migration/checkpoint: introduce incremental checkpoint reloading

This patch introduces the functions required to reload incremental checkpoints:

- qemu_start_incoming_migration: add the prefix 'chpt' to incoming migrations to trigger a checkpoint file reloading.
- file_start_incoming_checkpoint_reload: starts a "checkpoint file reloading" incoming migration. If the filename starts with "<n:int>:", then only the first n increments will be reloaded (`reload_stop_at` field of the `checkpoint_state` global state).
- file_get_checkpoint_fd: returns a new file descriptor (`dup`licated from the main one), pointing at the beginning of the incremental migration stream to reload (`snapshot_number` parameter).
- checkpoint_load_metadata: checks the checkpoint metadata magic number and loads the checkpoint metadata at the beginning of the checkpoint file.
- process_incoming_migration_co: updated to allow reloading multiple checkpoint increments, or a single, simple checkpoint if the checkpoint metadata magic number was not found.
- incoming_migration_is_last_increment: indicates if the checkpoint increment currently being reloaded is the last one that will be loaded.
- loadvm_load_checkpoint: triggers the actual reloading of a given checkpoint increment.
- vmstate_load_state: updated to call `vmsd->post_load` only once, after the reloading of the last checkpoint increment.

Signed-off-by: Kevin Pouget <k.pouget@virtualopensystems.com>

migration/checkpoint: introduce checkpoint increment squashing

This patch introduces the ability to squash multiple checkpoint increments into a single one. It is an extension of the checkpoint reloading incoming migration, using the prefix "chpt:squash[:N]", where N is the number of checkpoint increments to consider.

Checkpoint squashing works by performing a normal checkpoint reload, but without restarting the VM after its completion.
Instead, when the reloading has succeeded, a new, full checkpoint is performed, creating a single checkpoint file. Once this full checkpoint has completed, Qemu exits.

The goal of checkpoint squashing is to reduce disk usage, as incremental checkpoints may grow large over time. Besides, reloading a single checkpoint is necessarily faster than reloading multiple increments.

- qemu_start_incoming_migration: set the flag `do_squash` if the migration prefix is `chpt:squash:`.
- process_incoming_migration_bh: trigger a new checkpoint migration to create the single-increment snapshot file, and set up a timer that will wait for the completion of the migration and terminate Qemu.
- hmp_migrate_status_cb: during checkpoint squashing, do not report the progress of disk or block migration; inform the user about the squash.

Signed-off-by: Kevin Pouget <k.pouget@virtualopensystems.com>

migration/checkpoint: introduce periodic checkpointing

This patch adds the ability to trigger periodic checkpoints. It should be used in conjunction with the 'incremental' capability, but this is not mandatory.

Periodic checkpointing works by setting a Qemu timer that triggers a new checkpoint migration when it elapses. The timer is restarted when the checkpoint completes successfully.

A new parameter (`period`) is added to the HMP `migrate` command. Its value is the period (in units of 10s) at which the checkpoint will be performed. A value of 0 disables any ongoing periodic checkpointing.

- hmp-commands.hx::migrate: add the `period` parameter to the HMP `migrate` command.
- hmp_migrate: Idem.
- qapi/migration.json: Idem.
- qmp_migrate: Idem, and correctly initialize or disable the periodic checkpointing.
- PERIODIC_CHECKPOINT_UNIT: defines the unit of the `period` parameter of the `migrate` command. Current value: 10s.
- migration_state_notifier: new migration state change notifier, which restarts the periodic timer if the migration succeeded, or cleans up the periodic structures if it failed.
- struct MigrationState: extended to store the migration parameters and the periodic checkpoint timer.
- struct MigrationParams: new structure to store the migration parameters, so that we can restart the migration later with the same options.
- periodic_snapshot_cb: callback triggered after the periodic checkpoint timer has elapsed. Triggers a new migration if the previous one completed successfully, or deletes the timer if it failed.
- periodic_snapshot_setup: saves the checkpoint arguments (to be able to trigger it again later) and sets the periodic checkpoint timer; or deletes it if the requested period is 0.

Signed-off-by: Kevin Pouget <k.pouget@virtualopensystems.com>

migration/checkpoint: introduce partial reloading

This patch improves the reloading of incremental checkpoints by only reloading the RAM section of the incremental state (except for the last increment).

- qemu_savevm_state_iterate: save the offset of the checkpoint file where the RAM begins.
- qemu_loadvm_section_start_full: after having reloaded the RAM, if this is not the last checkpoint increment, interrupt the reloading (return `-EINTR`).
- process_incoming_migration_co: detect that the reloading was interrupted because the RAM section has been reloaded (`err == -EINTR`).
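
The `-EINTR` control flow listed above can be pictured with the following sketch. It is a simplified model under stated assumptions, not the Qemu implementation: `struct increment_load`, `load_increment` and `reload_increments` are hypothetical stand-ins for the roles played by `qemu_loadvm_section_start_full` and `process_incoming_migration_co`, and the actual section parsing is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical record of what was reloaded from one checkpoint increment. */
struct increment_load {
    bool ram_loaded;     /* RAM section reloaded */
    bool devices_loaded; /* remaining (non-RAM) state reloaded */
};

/*
 * Stand-in for the loader: always reloads the RAM section, then stops
 * early with -EINTR unless this is the last increment.
 */
int load_increment(struct increment_load *out, bool is_last)
{
    out->ram_loaded = true;
    if (!is_last) {
        out->devices_loaded = false;
        return -EINTR; /* RAM done, skip the rest of this increment */
    }
    out->devices_loaded = true;
    return 0;
}

/*
 * Stand-in for the incoming-migration loop: -EINTR is the expected
 * outcome for every increment but the last, any other negative value
 * is a real error.
 */
int reload_increments(struct increment_load *loads, int count)
{
    for (int i = 0; i < count; i++) {
        int err = load_increment(&loads[i], i == count - 1);
        if (err == -EINTR) {
            continue; /* partial reload of an intermediate increment */
        }
        if (err < 0) {
            return err;
        }
    }
    return 0;
}
```

With three increments, this model reloads only the RAM of the first two and the full state of the third, which is the behavior the patch describes.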

Signed-off-by: Kevin Pouget <k.pouget@virtualopensystems.com>

[vosys] add debugging message commands

Signed-off-by: Kevin Pouget <k.pouget@virtualopensystems.com>

[vosys] add init hook

Signed-off-by: Kevin Pouget <k.pouget@virtualopensystems.com>

[vosys] add a VM id for Qemu: $VM_UID or $USER-$VM_ID

Signed-off-by: Kevin Pouget <k.pouget@virtualopensystems.com>

[vosys] force Qemu to name threads

Signed-off-by: Kevin Pouget <k.pouget@virtualopensystems.com>

[vosys] migration/checkpoint: introduce VOSYS_TRY_MMAP_SHADOW_COPY [optimization]

Signed-off-by: Kevin Pouget <k.pouget@virtualopensystems.com>

[vosys] migration/checkpoint: introduce checksum capability

Signed-off-by: Kevin Pouget <k.pouget@virtualopensystems.com>

[vosys] migration/checkpoint: add sanity checks

Signed-off-by: Kevin Pouget <k.pouget@virtualopensystems.com>

[vosys] migration/checkpoint: abort on sanity check failure

[vosys] migration/checkpoint: add checkpoint statistics

Signed-off-by: Kevin Pouget <k.pouget@virtualopensystems.com>

[vosys] migration/checkpoint: add logging messages

Signed-off-by: Kevin Pouget <k.pouget@virtualopensystems.com>

[vosys] migration/checkpoint: checksum the pages when they are saved to disk [WARNING]

This patch computes the checksum of the pages when they are being saved to disk. This is complementary to the ram_checksum capability, which computes the checksum when the checkpoint is requested and compares it with the checksum when the VM is reloaded.

This patch ensures that the RAM content saved to disk is identical to the content at the time of the checkpoint request. However, for this capability to work with incremental checkpointing, we have to store the checksum of each RAM page, and update it when performing an incremental checkpoint.

WARNING: there is an 'off by one' error that appears sometimes, so I comment out the abort() on failure ...
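
The per-page checksum bookkeeping described above (store one checksum per RAM page, refresh it when an increment saves the page again) might look like the sketch below. This is an illustration only: the commit message does not name the checksum algorithm, so FNV-1a is used here purely as an assumption, and `page_checksum`, `checksum_record`, `checksum_matches` and the table sizes are hypothetical names and values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CHK_PAGE_BYTES 4096 /* assumed page size for the example */
#define CHK_NB_PAGES   4    /* assumed number of tracked pages */

/* One stored checksum per RAM page. */
static uint32_t page_checksums[CHK_NB_PAGES];

/* 32-bit FNV-1a over one page (assumption: the patch may use another hash). */
uint32_t page_checksum(const uint8_t *page)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < CHK_PAGE_BYTES; i++) {
        h ^= page[i];
        h *= 16777619u;
    }
    return h;
}

/* Record (or refresh) the checksum of a page when it is checkpointed. */
void checksum_record(size_t idx, const uint8_t *page)
{
    assert(idx < CHK_NB_PAGES);
    page_checksums[idx] = page_checksum(page);
}

/* Compare the content about to be written with the recorded checksum. */
int checksum_matches(size_t idx, const uint8_t *page)
{
    assert(idx < CHK_NB_PAGES);
    return page_checksums[idx] == page_checksum(page);
}
```

In an incremental checkpoint, only the pages saved in the current increment would call `checksum_record` again; the entries of untouched pages stay as they were after the previous increment.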

Signed-off-by: Kevin Pouget <k.pouget@virtualopensystems.com>

[vosys] migration: add set 'internal-dist' build parameters

Signed-off-by: Kevin Pouget <k.pouget@virtualopensystems.com>

[vosys] migration/checkpoint: allow switching between old/new UFFD

Old is for Linux 4.4

Signed-off-by: Kevin Pouget <k.pouget@virtualopensystems.com>

[vosys] migration/checkpoint: add CoW checkpointing capability

Signed-off-by: Kevin Pouget <k.pouget@virtualopensystems.com>

[vosys] migration/checkpoint: add FORCE_CONT_AFTER_RELOAD for FORTH

Signed-off-by: Kevin Pouget <k.pouget@virtualopensystems.com>

[vosys] makefile: add libfuse compilation

Signed-off-by: Kevin Pouget <k.pouget@virtualopensystems.com>

[vosys] migration: add QFS virtual fuse filesystem

Signed-off-by: Kevin Pouget <k.pouget@virtualopensystems.com>

[vosys] migration/checkpoint: add default value for checkpoint destination

Signed-off-by: Kevin Pouget <k.pouget@virtualopensystems.com>

[vosys] migration: introduce guest-inform module

Signed-off-by: Kevin Pouget <k.pouget@virtualopensystems.com>

[vosys] migration: add guest-inform binding to the QFS

Signed-off-by: Kevin Pouget <k.pouget@virtualopensystems.com>

CONFIG_DEBUG_MUTEX
f6a5349c