Kevin Pouget authored
This work was supported by the ExaNoDe project that has received
funding from the European Union’s Horizon 2020 research and innovation
programme under grant agreement No. 671578. The work presented in this
paper reflects only authors’ view and the European Commission is not
responsible for any use that may be made of the information it
contains.
[vosys] remove submodules for faster testing
[vosys] add gitlab-ci
migration/postcopy: define userfaultfd syscall number
This patch adds `userfaultfd` definition to Qemu Linux ARM64 syscall list.
migration/postcopy: update userfaultfd header file
This patch updates Qemu's copy of Linux `userfaultfd.h` based on commit
25412491e9e43d27d9c50aea09a106e4876f108e from repository
https://git.kernel.org/pub/scm/linux/kernel/git/andrea/aa.git
https://git.kernel.org/pub/scm/linux/kernel/git/andrea/aa.git/tree/include/uapi/linux/userfaultfd.h?h=userfault&id=25412491e9e43d27d9c50aea09a106e4876f108e
Signed-off-by:
Kevin Pouget <k.pouget@virtualopensystems.com>
migration/postcopy: improve postcopy helper functions
This patch extends the postcopy toolset functions to provide/extend
UserfaultFD functionalities.
These functions are lightweight wrappers around UFFD syscalls:
static int uffd_wake(int uffd, ram_addr_t region, size_t len);
static int uffd_unregister_protection(int uffd, ram_addr_t region,
size_t len);
static int uffd_protection(int uffd, ram_addr_t page_addr,
size_t len, int remove);
These functions are their public interface:
int postcopy_ram_register_wp(UserfaultState *us);
int postcopy_ram_register_missing(UserfaultState *us);
int postcopy_ram_wprotect_all(UserfaultState *us);
int postcopy_ram_disable_notify(UserfaultState *us);
int postcopy_ram_write_protect(UserfaultState *us);
There is currently a functionality of UFFD that is not working as
expected: when the write-protection of some pages of a region has been
turned off, we currently must unregister the whole region from UFFD,
then register it again, and activate the write protection. This is
unfortunate, as it implies that the VM cannot be running between these
two operations.
The expected behavior (re-activate the protection of all the pages of
a region) could be performed without stopping the VM.
Set `UFFD_USE_UNREGISTER` to `1` for `postcopy_ram_enable_notify` to
work; set it to `0` to switch to the version that is supposedly
correct but does not work at the moment.
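The "supposedly correct" path boils down to re-arming write protection
on a range that is still registered. A minimal sketch of such a wrapper
follows. It assumes the mainline `UFFDIO_WRITEPROTECT` ioctl (Linux 5.7
and later), which may differ from the userfault branch this series is
based on; the name `uffd_protection_sketch` and the `uint64_t` address
type are hypothetical stand-ins for `uffd_protection` and `ram_addr_t`.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/userfaultfd.h>

/*
 * Hypothetical sketch, not the code of this patch.
 * remove == 0: write-protect [addr, addr + len).
 * remove != 0: drop the protection (this also wakes the faulting thread,
 *              since UFFDIO_WRITEPROTECT_MODE_DONTWAKE is not set).
 * Returns 0 on success, -errno on failure.
 */
static int uffd_protection_sketch(int uffd, uint64_t addr, size_t len,
                                  int remove)
{
    struct uffdio_writeprotect wp = {
        .range = { .start = addr, .len = len },
        .mode = remove ? 0 : UFFDIO_WRITEPROTECT_MODE_WP,
    };

    assert(len > 0);
    if (ioctl(uffd, UFFDIO_WRITEPROTECT, &wp) < 0) {
        return -errno;
    }
    return 0;
}
```

With `UFFD_USE_UNREGISTER` set to `1`, the same effect is obtained by
unregistering and re-registering the region before protecting it again.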
Function `postcopy_ram_fault_thread` relies on `UserfaultState *us`
structure to belong to an `MigrationIncomingState` object (it accesses
`mis->postcopy_remote_fds` array). For checkpoint migration, we do not
modify this behavior, and instead make sure that this structure is
always in a valid state (i.e., it is not destroyed at the end of
incoming migrations).
Signed-off-by:
Kevin Pouget <k.pouget@virtualopensystems.com>
migration/hmp: allow printing migration capabilities without migration statuses
The `hmp_info_migrate` function first prints the migration
capabilities, then the migration status.
This patch allows the display of the migration capabilities, even if
there is no status currently set (for printing information about the
dirty page tracking of incremental checkpointing).
Signed-off-by:
Kevin Pouget <k.pouget@virtualopensystems.com>
migration: add dedicated init function
This patch adds a dedicated init function for the `migration` module.
Signed-off-by:
Kevin Pouget <k.pouget@virtualopensystems.com>
migration/checkpoint: add capabilities and state query helpers
Introduce 'live' and 'incremental' migration capabilities and helper
functions, as well as snapshot state query functions:
- full or incremental checkpoint ongoing?
- inside an incremental checkpoint?
Signed-off-by:
Kevin Pouget <k.pouget@virtualopensystems.com>
migration/checkpoint: add page-level atomic operations and bitmaps
These operations and bitmaps track the state of the virtual machine
RAM pages during the checkpointing. Pages can be marked as dirty,
already saved, or under processing.
These bitmaps are independent of the ones used for VM migrations.
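As an illustration of what a page-level atomic bitmap operation can look
like, here is a minimal sketch using C11 atomics. The type
`SketchBitmap`, the size `SKETCH_NPAGES` and the function names are
hypothetical; the patch itself may rely on Qemu's own bitmap and atomic
helpers.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_NPAGES 1024  /* hypothetical number of guest pages */
#define SKETCH_WORD_BITS (sizeof(unsigned long) * CHAR_BIT)
#define SKETCH_WORDS \
    ((SKETCH_NPAGES + SKETCH_WORD_BITS - 1) / SKETCH_WORD_BITS)

/* One bit per page; one such bitmap per state (dirty, saved, ...). */
typedef struct {
    _Atomic unsigned long bits[SKETCH_WORDS];
} SketchBitmap;

/* Atomically set the bit of `page`; return its previous value. */
static bool sketch_test_and_set(SketchBitmap *bm, size_t page)
{
    assert(page < SKETCH_NPAGES);
    unsigned long mask = 1UL << (page % SKETCH_WORD_BITS);
    unsigned long old = atomic_fetch_or(&bm->bits[page / SKETCH_WORD_BITS],
                                        mask);
    return (old & mask) != 0;
}

/* Atomically clear the bit of `page`; return its previous value. */
static bool sketch_test_and_clear(SketchBitmap *bm, size_t page)
{
    assert(page < SKETCH_NPAGES);
    unsigned long mask = 1UL << (page % SKETCH_WORD_BITS);
    unsigned long old = atomic_fetch_and(&bm->bits[page / SKETCH_WORD_BITS],
                                         ~mask);
    return (old & mask) != 0;
}

/* Read the bit of `page` without modifying it. */
static bool sketch_test(SketchBitmap *bm, size_t page)
{
    assert(page < SKETCH_NPAGES);
    unsigned long mask = 1UL << (page % SKETCH_WORD_BITS);
    return (atomic_load(&bm->bits[page / SKETCH_WORD_BITS]) & mask) != 0;
}
```

Returning the previous value lets the fault thread and the snapshot
thread agree on which of them saw a page first.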
Signed-off-by:
Kevin Pouget <k.pouget@virtualopensystems.com>
migration/checkpoint: extend migration priority-page queue
This patch extends the migration page queue to store the pages that
should be saved as part of the on-going or next checkpoint.
- unqueue_page: extended to correctly track the case where the virtual
machine pages are smaller than the host page (eg, 1K vs 4K)
- ram_update_page_in_queue: add shadow copy of the page content to a
page already in the priority queue
- ram_next_dirty_pages_to_priority_queue: at the end of a live and
incremental checkpoint, put in priority queue of the next checkpoint
the pages that faulted during the current checkpoint.
- ram_page_req_mutex: lock/unlock the priority queue mutex from other
files (eg: postcopy-ram.c)
- ram_save_queue_pages: add flags to indicate if the page should be
added to the main priority queue, or in the priority queue of the
next checkpoint
- ram_pages_in_queue: indicates if the priority queue is currently
empty
Signed-off-by:
Kevin Pouget <k.pouget@virtualopensystems.com>
migration/checkpoint: extend the page fault handler
This patch extends the UserfaultFD fault handler to track
write-protection faults during live and incremental checkpointing.
- postcopy_ram_fault_thread: handle write-protection fault in the VM
memory:
- mark the page as dirty,
- push it to the right priority queue (current or next)
- if necessary, take a shadow copy of its content before allowing
the VM to continue its execution.
- migration/postcopy-ram.o is moved to Makefile.target so that it can
use the TARGET_PAGE_SIZE macro.
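The decision taken on a write-protection fault can be sketched as a pure
function. The exact conditions are not spelled out in this message, so
the rule below is an assumption: a page faulting during a checkpoint and
not yet saved gets a shadow copy and goes to the current queue, any
other page goes to the queue of the next checkpoint. All names are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { SKETCH_QUEUE_CURRENT, SKETCH_QUEUE_NEXT } SketchQueue;

typedef struct {
    bool mark_dirty;    /* always set on a write-protection fault */
    bool shadow_copy;   /* copy the old content before the guest writes */
    SketchQueue queue;  /* which priority queue receives the page */
} SketchFaultAction;

/* Assumed policy, see the lead-in; not taken from the patch itself. */
static SketchFaultAction sketch_wp_fault_action(bool checkpoint_ongoing,
                                                bool page_already_saved)
{
    SketchFaultAction act = {
        .mark_dirty = true,
        .shadow_copy = false,
        .queue = SKETCH_QUEUE_NEXT,
    };

    if (checkpoint_ongoing && !page_already_saved) {
        /* The checkpoint still needs the pre-write content. */
        act.shadow_copy = true;
        act.queue = SKETCH_QUEUE_CURRENT;
    }
    return act;
}
```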
Signed-off-by:
Kevin Pouget <k.pouget@virtualopensystems.com>
migration/checkpoint: add run_state transitions
This patch adds the different `run_state` transitions that occur during
live and incremental checkpointing.
Signed-off-by:
Kevin Pouget <k.pouget@virtualopensystems.com>
migration/checkpoint: add checkpoint metadata and file-saving operations
This patch introduces the structures and helper functions required to
save incremental checkpoints to disk:
- checkpoint_save_metadata: write (or update) `CHPT_META_MAGIC` and
`checkpoint_file_state` at the beginning of the file `filename`.
- file_start_outgoing_migration: make sure that the file already exists
if saving a checkpoint increment.
- qmp_migrate: add 'chpt:' prefix for checkpointing into a file, and
initialize checkpoint state data.
- snapshot_reset_increments: reset the checkpoint increment counters.
- struct CheckpointState checkpoint_file_state[]: global array
structure storing the meta data related to each of the current
incremental checkpoints.
- struct CheckpointState checkpoint_state: global structure storing
the current state of the checkpointing.
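A minimal sketch of a "write or update the header at the beginning of
the file" operation follows. The layout (`SketchMeta`), the magic value
`SKETCH_META_MAGIC` and the function names are hypothetical; the real
`CHPT_META_MAGIC` and `CheckpointState` definitions are not shown in
this message. A matching loader is included so that the round trip can
be checked.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define SKETCH_META_MAGIC 0x43485054u  /* hypothetical value */
#define SKETCH_MAX_INCREMENTS 8        /* hypothetical limit */

typedef struct {
    uint64_t offset;  /* start of this increment in the file */
    uint64_t size;    /* length of this increment, in bytes */
} SketchIncrement;

typedef struct {
    uint32_t magic;
    uint32_t nb_increments;
    SketchIncrement inc[SKETCH_MAX_INCREMENTS];
} SketchMeta;

/* Write, or overwrite, the header at offset 0. Returns 0 on success. */
static int sketch_save_metadata(FILE *f, const SketchMeta *meta)
{
    if (fseek(f, 0, SEEK_SET) != 0) {
        return -1;
    }
    if (fwrite(meta, sizeof(*meta), 1, f) != 1) {
        return -1;
    }
    return fflush(f) == 0 ? 0 : -1;
}

/* Read the header back and check the magic. Returns 0 on success. */
static int sketch_load_metadata(FILE *f, SketchMeta *meta)
{
    if (fseek(f, 0, SEEK_SET) != 0) {
        return -1;
    }
    if (fread(meta, sizeof(*meta), 1, f) != 1) {
        return -1;
    }
    return meta->magic == SKETCH_META_MAGIC ? 0 : -1;
}
```

Keeping the header at a fixed size means an increment can be appended
and the header rewritten in place without moving earlier increments.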
Signed-off-by:
Kevin Pouget <k.pouget@virtualopensystems.com>
migration/checkpoint: extend the snapshot_thread
This patch extends the `snapshot_thread` function, which leads the
live and incremental memory checkpointing.
If the `live` migration capability is enabled, this snapshot_thread
will run in parallel with the VM execution. The guest RAM will be
write-protected, so that we can ensure that the pages touched by the
guest system are copied to shadow memory before being modified.
If the `incremental` migration capability is enabled, then ...
1a. if this is the first checkpoint, then a full memory checkpoint will
be carried out.
1b. if this is not the first checkpoint, only the pages marked as
dirty will be saved to disk.
2. at the end of the checkpoint, dirty page (write-protection)
tracking will be enabled, in order to track the pages modified by the
guest system.
The page protection is handled inside `postcopy-ram.c`, which
encapsulates the calls to UserfaultFD.
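The page selection rule of steps 1a and 1b can be sketched as follows.
The function name and the plain `bool` array standing for the dirty
bitmap are hypothetical simplifications.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Count the pages a checkpoint would save.
 * Without the `incremental` capability, or for the first checkpoint
 * (index 0), every page is saved; otherwise only the dirty ones.
 */
static size_t sketch_pages_to_save(bool incremental,
                                   unsigned checkpoint_index,
                                   const bool *dirty, size_t nb_pages)
{
    bool full = !incremental || checkpoint_index == 0;
    size_t count = 0;

    assert(dirty != NULL);
    for (size_t i = 0; i < nb_pages; i++) {
        if (full || dirty[i]) {
            count++;
        }
    }
    return count;
}
```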
Signed-off-by:
Kevin Pouget <k.pouget@virtualopensystems.com>
migration/checkpoint: add RAM checkpointing mechanisms
This patch extends the RAM migration mechanisms to support live and
incremental checkpointing.
RAM checkpointing relies on a set of per-page atomic flags:
- dirty: true if the page has been modified since the previous
checkpoint, and hence needs to be saved in the next one.
- sent: true if the page has already been saved to disk, as part of
the currently ongoing checkpoint
- under processing: flag used as mutex to ensure that a page cannot be
at the same time copied to shadow memory and copied to disk (this can
only happen during an incremental live checkpoint).
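The "under processing" flag behaves like a per-page try-lock. A minimal
sketch with a C11 `atomic_flag` follows; the type and function names are
hypothetical, and the patch may implement the flag as a bit in a bitmap
instead.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
    atomic_flag under_processing;
} SketchPageState;

/*
 * Try to claim the page. Returns true if the caller now owns it,
 * false if another thread is already copying it.
 */
static bool sketch_page_try_claim(SketchPageState *p)
{
    return !atomic_flag_test_and_set_explicit(&p->under_processing,
                                              memory_order_acquire);
}

/* Release a page previously claimed with sketch_page_try_claim(). */
static void sketch_page_release(SketchPageState *p)
{
    atomic_flag_clear_explicit(&p->under_processing, memory_order_release);
}
```

The thread that loses the race (shadow copy versus save to disk) can
either retry or skip the page, depending on which side it is on.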
Signed-off-by:
Kevin Pouget <k.pouget@virtualopensystems.com>
migration/checkpoint: introduce incremental checkpoint reloading
This patch introduces the functions required to reload incremental
checkpoints:
- qemu_start_incoming_migration: add prefix 'chpt' to incoming
migrations to trigger a checkpoint file reloading
- file_start_incoming_checkpoint_reload: starts a "checkpoint file
reloading" incoming migration. If the filename starts with
"<n:int>:", then only the n first increments will be
reloaded (`reload_stop_at` field of the `checkpoint_state` global
state).
- file_get_checkpoint_fd: returns a new file descriptor (`dup`licated
from the main one), pointing at the beginning of the incremental
migration stream to reload (`snapshot_number` parameter).
- checkpoint_load_metadata: checks the checkpoint metadata magic
number and loads the checkpoint meta data, at the beginning of the
checkpoint file.
- process_incoming_migration_co: updated to allow reloading multiple
checkpoint increments, or a single/simple checkpoint if the
checkpoint metadata magic number was not found.
- incoming_migration_is_last_increment: indicate if the checkpoint
increment currently being reloaded is the last one that will be
loaded.
- loadvm_load_checkpoint: triggers the actual reloading of a given
checkpoint increment.
- vmstate_load_state: updated to call `vmsd->post_load` only once,
after the reloading of the last checkpoint increment.
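The optional "<n:int>:" filename prefix handled by
`file_start_incoming_checkpoint_reload` can be parsed along these lines.
This is a sketch only: the struct, the function name and the use of `-1`
for "no limit" are assumptions, not the behavior of the patch.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

typedef struct {
    long stop_at;          /* number of increments to reload, -1 = all */
    const char *filename;  /* points inside the original argument */
} SketchReloadArg;

/* Split "3:/path/to/file" into stop_at = 3 and "/path/to/file". */
static SketchReloadArg sketch_parse_reload_arg(const char *arg)
{
    SketchReloadArg out = { .stop_at = -1, .filename = arg };
    char *end;
    long n;

    assert(arg != NULL);
    if (!isdigit((unsigned char)arg[0])) {
        return out;  /* no numeric prefix: reload every increment */
    }
    errno = 0;
    n = strtol(arg, &end, 10);
    if (errno != 0 || *end != ':') {
        return out;  /* digits are part of the filename itself */
    }
    out.stop_at = n;
    out.filename = end + 1;
    return out;
}
```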
Signed-off-by:
Kevin Pouget <k.pouget@virtualopensystems.com>
migration/checkpoint: introduce checkpoint increment squashing
This patch introduces the ability to squash multiple checkpoint
increments into a single one.
It is an extension of the checkpoint reloading incoming migration,
using the prefix "chpt:squash[:N]", where N is the number of
checkpoint increments to consider.
Checkpoint squashing works by performing a normal checkpoint reload,
but without restarting the VM after its completion. Instead, when the
reloading has succeeded, a new, full, checkpoint is performed,
creating a single checkpoint file. Once this full checkpoint has
completed, Qemu exits.
The goal of checkpoint squashing is to reduce disk size, as
incremental checkpoints may grow large over time. Besides, reloading
a single checkpoint is necessarily faster than reloading multiple
increments.
- qemu_start_incoming_migration: set the flag `do_squash` if the
migration prefix is `chpt:squash:`.
- process_incoming_migration_bh: trigger a new checkpoint migration to
create the single-increment snapshot file, and set up a timer that
will wait for the completion of the migration and terminate Qemu.
- hmp_migrate_status_cb: in checkpoint squashing, do not report the
progress of disks or block migration; inform the user about the squash.
Signed-off-by:
Kevin Pouget <k.pouget@virtualopensystems.com>
migration/checkpoint: introduce periodic checkpointing
This patch adds the ability to trigger periodic checkpoints. It should
be used in conjunction with the 'incremental' capability, but this is
not mandatory.
Periodic checkpointing works by setting a Qemu timer that triggers a
new checkpoint migration when it elapses. The timer is restarted when
the checkpoint completes successfully.
A new parameter (`period`) is added to the HMP `migrate` command. It
takes as value the period (in x10s) at which the checkpoint will be
performed. A value of 0 disables any ongoing periodic checkpointing.
- hmp-commands.hx::migrate: add `period` parameter to the HMP `migrate`
command.
- hmp_migrate: Idem.
- qapi/migration.json: Idem.
- qmp_migrate: Idem, and correctly initialize or disable the periodic
checkpointing.
- PERIODIC_CHECKPOINT_UNIT: Defines the unit of the `period` parameter
of the `migrate` command. Current value: 10s.
- migration_state_notifier: New migration state change notifier, that
restarts the periodic timer if the migration succeeded, or cleans up
the periodic structures if it failed.
- struct MigrationState: extended to store the migration parameters
and the periodic checkpoint timer.
- struct MigrationParams: new structure to store the migration
parameters, so that we can restart it later with the same options.
- periodic_snapshot_cb: Callback triggered after the periodic checkpoint
timer elapsed. Triggers a new migration if the previous one
completed successfully, or deletes the timer if it failed.
- periodic_snapshot_setup: Saves the checkpoint arguments (to be able
to trigger it again later) and sets the periodic checkpoint timer;
or deletes it if the requested period is 0.
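The timer life cycle described above can be sketched without any Qemu
dependency. The struct, the function names and the millisecond clock
passed as a parameter are hypothetical; only the 10s unit and the
"0 disables" rule come from this message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PERIODIC_UNIT_MS (10 * 1000)  /* `period` is in x10s */

typedef struct {
    bool armed;
    int64_t period_ms;
    int64_t deadline_ms;
} SketchPeriodic;

/* Arm the timer, or delete it if the requested period is 0. */
static void sketch_periodic_setup(SketchPeriodic *p, unsigned period,
                                  int64_t now_ms)
{
    if (period == 0) {
        p->armed = false;
        p->period_ms = 0;
        return;
    }
    p->period_ms = (int64_t)period * SKETCH_PERIODIC_UNIT_MS;
    p->deadline_ms = now_ms + p->period_ms;
    p->armed = true;
}

/* Migration state notifier: restart on success, clean up on failure. */
static void sketch_periodic_on_done(SketchPeriodic *p, bool success,
                                    int64_t now_ms)
{
    if (!p->armed) {
        return;
    }
    if (success) {
        p->deadline_ms = now_ms + p->period_ms;
    } else {
        p->armed = false;
    }
}
```

Restarting the timer only once the checkpoint has completed means that
the period is measured between the end of one checkpoint and the start
of the next one.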
Signed-off-by:
Kevin Pouget <k.pouget@virtualopensystems.com>
migration/checkpoint: introduce partial reloading
This patch improves the reloading of incremental checkpoints, by
only reloading the RAM section of the incremental state (except for
the last increment).
- qemu_savevm_state_iterate: save the offset of the checkpoint file
where the RAM begins.
- qemu_loadvm_section_start_full: after having reloaded the RAM, if
this is not the last checkpoint increment, interrupt the
reloading (return `-EINTR`).
- process_incoming_migration_co: detect that the reloading was
interrupted because the RAM section has been reloaded
(`err == -EINTR`).
Signed-off-by:
Kevin Pouget <k.pouget@virtualopensystems.com>
[vosys] add debugging message commands
Signed-off-by:
Kevin Pouget <k.pouget@virtualopensystems.com>
[vosys] add init hook
Signed-off-by:
Kevin Pouget <k.pouget@virtualopensystems.com>
[vosys] add a VM id for Qemu: $VM_UID or $USER-$VM_ID
Signed-off-by:
Kevin Pouget <k.pouget@virtualopensystems.com>
[vosys] force Qemu to name threads
Signed-off-by:
Kevin Pouget <k.pouget@virtualopensystems.com>
[vosys] migration/checkpoint: introduce VOSYS_TRY_MMAP_SHADOW_COPY [optimization]
Signed-off-by:
Kevin Pouget <k.pouget@virtualopensystems.com>
[vosys] migration/checkpoint: introduce checksum capability
Signed-off-by:
Kevin Pouget <k.pouget@virtualopensystems.com>
[vosys] migration/checkpoint: add sanity checks
Signed-off-by:
Kevin Pouget <k.pouget@virtualopensystems.com>
[vosys] migration/checkpoint: abort on sanity check failure
[vosys] migration/checkpoint: add checkpoint statistics
Signed-off-by:
Kevin Pouget <k.pouget@virtualopensystems.com>
[vosys] migration/checkpoint: add logging messages
Signed-off-by:
Kevin Pouget <k.pouget@virtualopensystems.com>
[vosys] migration/checkpoint: checksum the pages when they are saved to disk [WARNING]
This patch computes the checksum of the pages when they are being
saved to disk. This is complementary to the ram_checksum capability,
which computes the checksum when the checkpoint is requested, and
compares it with the checksum when the VM is reloaded.
This patch ensures that the RAM content saved to disk is identical to
the content at the time of the checkpoint request.
However, for this capability to work with incremental checkpointing,
we have to store the checksum of each RAM page, and update it
when performing an incremental checkpoint.
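A per-page checksum table of this kind can be sketched as follows. The
checksum algorithm is not named in this message, so 32-bit FNV-1a is
used here purely as a stand-in; the page size, table size and function
names are hypothetical as well.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096  /* hypothetical page size */
#define SKETCH_NB_PAGES 16     /* hypothetical guest size, in pages */

/* One checksum per RAM page, kept across incremental checkpoints. */
static uint32_t sketch_checksums[SKETCH_NB_PAGES];

/* 32-bit FNV-1a over one page (stand-in algorithm). */
static uint32_t sketch_page_checksum(const uint8_t *data)
{
    uint32_t h = 2166136261u;

    for (size_t i = 0; i < SKETCH_PAGE_SIZE; i++) {
        h ^= data[i];
        h *= 16777619u;
    }
    return h;
}

/* Called when a page is written to disk, in a full or incremental pass. */
static void sketch_on_page_saved(size_t page, const uint8_t *data)
{
    assert(page < SKETCH_NB_PAGES);
    sketch_checksums[page] = sketch_page_checksum(data);
}

/* Called at reload time to compare against the stored checksum. */
static bool sketch_verify_page(size_t page, const uint8_t *data)
{
    assert(page < SKETCH_NB_PAGES);
    return sketch_checksums[page] == sketch_page_checksum(data);
}
```

Pages that are not part of an increment keep the checksum recorded by
the last checkpoint that saved them.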
WARNING: there is an 'off by one' error that appears sometimes so I
comment out the abort() on failure ...
Signed-off-by:
Kevin Pouget <k.pouget@virtualopensystems.com>
[vosys] migration: add set 'internal-dist' build parameters
Signed-off-by:
Kevin Pouget <k.pouget@virtualopensystems.com>
[vosys] migration/checkpoint: allow switching between old/new UFFD
Old is for Linux 4.4
Signed-off-by:
Kevin Pouget <k.pouget@virtualopensystems.com>
[vosys] migration/checkpoint: add CoW checkpointing capability
Signed-off-by:
Kevin Pouget <k.pouget@virtualopensystems.com>
[vosys] migration/checkpoint: add FORCE_CONT_AFTER_RELOAD for FORTH
Signed-off-by:
Kevin Pouget <k.pouget@virtualopensystems.com>
[vosys] makefile: add libfuse compilation
Signed-off-by:
Kevin Pouget <k.pouget@virtualopensystems.com>
[vosys] migration: add QFS virtual fuse filesystem
Signed-off-by:
Kevin Pouget <k.pouget@virtualopensystems.com>
[vosys] migration/checkpoint: add default value for checkpoint destination
Signed-off-by:
Kevin Pouget <k.pouget@virtualopensystems.com>
[vosys] migration: introduce guest-inform module
Signed-off-by:
Kevin Pouget <k.pouget@virtualopensystems.com>
[vosys] migration: add guest-inform binding to the QFS
Signed-off-by:
Kevin Pouget <k.pouget@virtualopensystems.com>
CONFIG_DEBUG_MUTEX