[WIP] block: discard and write_zeroes support for raw sync and raw async #7292
Conversation
rbradford left a comment:
Does the guest OS take advantage of this? What's the plan for implementing this on something better than the raw sync backend, as I doubt that gets used very much compared to, say, io_uring?
vmm/src/config.rs
Outdated
     id=<device_id>,pci_segment=<segment_id>,rate_limit_group=<group_id>,\
     queue_affinity=<list_of_queue_indices_with_their_associated_cpuset>,\
-    serial=<serial_number>";
+    serial=<serial_number>,discard=on|off,detect-zeroes=on|off|unmap";
I think these should always be on if the backing file implementation supports it.
The performance of deleting multiple files deteriorates on ext4 after a discard is executed.
Do we need more discussion to confirm whether to set discard=on as the default config?
I changed discard=on to be the default config; users can disable it if they don't want the guest to execute discard or write_zeroes on this block device.
virtio-devices/src/block.rs
Outdated
fn check_request(features: u64, request_type: RequestType) -> result::Result<(), ExecuteError> {
    let has_feature = |feature_pos: u64| -> bool { (features & (1u64 << feature_pos)) != 0 };

    if has_feature(VIRTIO_BLK_F_RO.into()) && request_type != RequestType::In {
Please include this refactoring and the R/O support in a separate commit (or even a separate PR), as I think this would be a good thing to land.
Moved to new PR #7294.
This is great! The lack of discard support has been one of the major drawbacks of both firecracker and cloud-hypervisor compared to qemu. I guess a future enhancement might be calling ioctl(BLKDISCARD) instead of using FALLOC_FL_PUNCH_HOLE when the virtual disk is backed by a block device instead of a plain file.
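To make that suggestion concrete, here is a minimal sketch (not the PR's implementation) of how such a dispatch could look. The helper name and signature are assumptions, and BLKDISCARD is spelled out as Linux's _IO(0x12, 119) encoding in case the libc crate doesn't export it:

    use std::fs::File;
    use std::io;
    use std::os::fd::AsRawFd;
    use std::os::unix::fs::FileTypeExt;

    // Linux's BLKDISCARD ioctl number, _IO(0x12, 119) (assumed constant for this sketch).
    const BLKDISCARD: libc::c_ulong = 0x1277;

    // Hypothetical helper: discard `len` bytes at `offset`, picking the right
    // primitive for the backing object.
    fn discard(file: &File, offset: u64, len: u64) -> io::Result<()> {
        let fd = file.as_raw_fd();
        let ret = if file.metadata()?.file_type().is_block_device() {
            // Block device backend: pass the request straight down as BLKDISCARD.
            let range = [offset, len];
            // SAFETY: fd is a valid open block device and `range` outlives the call.
            unsafe { libc::ioctl(fd, BLKDISCARD as _, range.as_ptr()) }
        } else {
            // Regular file backend: deallocate the range while keeping the file size.
            unsafe {
                libc::fallocate(
                    fd,
                    libc::FALLOC_FL_PUNCH_HOLE | libc::FALLOC_FL_KEEP_SIZE,
                    offset as libc::off_t,
                    len as libc::off_t,
                )
            }
        };
        if ret < 0 {
            return Err(io::Error::last_os_error());
        }
        Ok(())
    }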
@lisongqian Please can you rebase - we can probably look at moving ahead with this PR.
Backport experimental support for virtio-blk discard from cloud-hypervisor/cloud-hypervisor#7292 for testing and benchmarking.
I decided to do a little casual testing with this PR, on my linux-musl x86-64 dev host. I created a small, fully allocated disk image on the host's tmpfs /tmp, then started cloud-hypervisor with a 6.16.x kernel and my 'rescue initramfs' to play with this disk. My first move was to try making an ext4 filesystem, which involves writing zeroes and discarding unused blocks. I got an immediate warning about failed requests because the 'Submission queue is full', and the guest appears to be hung at that point. However, if I disable detect-zeroes, things seem fine, and I can confirm that the discards have happened correctly too.
I don't think the specifics of the kernel or initramfs are very exciting, although they're both available, with a defconfig-abbreviated kernel config. (I should also confirm: for the purposes of this test, I built an exact clone of this PR's branch without any additional patches, using the current release version of Rust. I've also rebased the changes onto the current release of CH, and get the same behaviour there.) I'll continue trying to find problems. I've now repeated these tests, so I think it's just detect-zeroes that misbehaves here. It's really great seeing CH having proper discard/trim support at last. I was delighted when I saw the allocated space in the disk image being released for the first time. Thanks! :)
likebreath left a comment:
@arachsys Thanks for sharing your experiments with a detailed write-up. Can you please work with @lisongqian to get the issues (when enabling detect-zeroes) fixed and add integration tests too?
Please also summarize the motivation for adding this support, e.g. to explain why it helps with space reclamation and performance. /cc @lisongqian
Until @lisongqian is about, maybe I can help out with some notes on the importance of discard support, as you asked for @likebreath?
With guest drives provisioned directly onto SSDs (or contiguous volumes/partitions of SSDs), discard support is essentially pass-through of TRIM, and is useful for exactly the same reasons that TRIM helps on the host: passing information about unused blocks down to the flash management firmware to help with wear-levelling and performance. It's much better to erase the released blocks during "idle" time ahead of wanting to reuse them than to do a full read-erase-modify-write at write time. And allowing the firmware to understand which blocks are really free and can be reused lets the wear-levelling work across a larger range of free blocks, prolonging SSD life.
With thin-provisioned guests, either with sparse image files in filesystems or something like lvm2 sparse volumes, discard plays an even more important role: it allows unused space to be released back to the host when files are deleted in guest filesystems (mounted with the discard option).
There are lots of thin-provisioned or SSD-backed situations where the only reason I use qemu instead of cloud-hypervisor or firecracker is that qemu supports discard pass-through, which is required to make the thin-provisioning work or to maximise SSD lifespan.
Right, crosvm does ZERO_RANGE by default for virtio WRITE_ZEROES. That's why I'm saying we don't need to have a special option for that.
I agree, at least for my use of CH. The only thing I wanted to double-check is that both sync and async do the right thing with block devices in the sparse case, with write-zeroes mapping to ioctl BLKZEROOUT and discard mapping to ioctl BLKDISCARD. With files instead of block devices, write-zeroes could correctly map to either fallocate PUNCH_HOLE or fallocate WRITE_ZEROES, and I don't mind. Probably the latter unconditionally is easier, as you say.
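To spell out the mapping being discussed, a hedged sketch (again, not the PR's code) of one way a backend could route a guest write-zeroes request; the helper, its unmap parameter, and the spelled-out BLKZEROOUT value (_IO(0x12, 127)) are assumptions:

    use std::fs::File;
    use std::io;
    use std::os::fd::AsRawFd;
    use std::os::unix::fs::FileTypeExt;

    // Linux's BLKZEROOUT ioctl number, _IO(0x12, 127) (assumed constant for this sketch).
    const BLKZEROOUT: libc::c_ulong = 0x127f;

    // Hypothetical helper: make `len` bytes at `offset` read back as zeroes.
    // `unmap` mirrors the guest's VIRTIO_BLK_WRITE_ZEROES_FLAG_UNMAP hint.
    fn write_zeroes(file: &File, offset: u64, len: u64, unmap: bool) -> io::Result<()> {
        let fd = file.as_raw_fd();
        let ret = if file.metadata()?.file_type().is_block_device() {
            // Block device: zero the range on the device itself.
            let range = [offset, len];
            // SAFETY: fd is a valid open block device and `range` outlives the call.
            unsafe { libc::ioctl(fd, BLKZEROOUT as _, range.as_ptr()) }
        } else {
            // Regular file: punch a hole when deallocation is acceptable,
            // otherwise zero the range in place without deallocating it.
            let mode = if unmap {
                libc::FALLOC_FL_PUNCH_HOLE | libc::FALLOC_FL_KEEP_SIZE
            } else {
                libc::FALLOC_FL_ZERO_RANGE
            };
            unsafe { libc::fallocate(fd, mode, offset as libc::off_t, len as libc::off_t) }
        };
        if ret < 0 {
            return Err(io::Error::last_os_error());
        }
        Ok(())
    }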
The latest push has changed the configuration to
For raw async with io_uring, The |
Ah, I hadn't spotted that wasn't available! When sparse is enabled on a block device, are all IO operations except discard async, with discard issued synchronously, or does enabling sparse force all IO operations, including normal reads and writes, to be sync?
Potentially torvalds/linux@50c5225, which went into 6.11, might be the specific async block discard, although it's not the whole of ioctl()?
I will give it a try and add some integration test cases.
Would you like me to do some live testing on this yet, @lisongqian, or is it still in development? Very happy to test on some real VMs if we're at a stage where that's helpful.
I'm not sure it's ready. The unit test has failed, but I don't know whether the host kernel is newer than 6.12.
The host doesn't reclaim disk space when space is released in the guest. This patch supports discard with FALLOC_FL_PUNCH_HOLE and write_zeroes with FALLOC_FL_ZERO_RANGE for raw sync, allowing trim executed in the guest to notify the disk backend to release the disk space. Signed-off-by: Songqian Li <sionli@tencent.com>
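As a quick illustration of the reclamation this enables, a small, hedged host-side check (not part of the PR; the path is a placeholder) showing that punching a hole releases allocated blocks in a raw image file:

    use std::fs::OpenOptions;
    use std::io::Write;
    use std::os::fd::AsRawFd;
    use std::os::unix::fs::MetadataExt;

    fn main() -> std::io::Result<()> {
        // Hypothetical path; any filesystem supporting FALLOC_FL_PUNCH_HOLE works.
        let mut file = OpenOptions::new()
            .read(true)
            .write(true)
            .create(true)
            .truncate(true)
            .open("/tmp/disk.raw")?;

        // Fully allocate 1 MiB of non-zero data.
        file.write_all(&vec![0x56u8; 1 << 20])?;
        file.sync_all()?;
        println!("allocated 512-byte blocks before: {}", file.metadata()?.blocks());

        // Punch out the whole range, as the raw sync backend does for a discard.
        // SAFETY: the fd is valid for the duration of the call.
        let ret = unsafe {
            libc::fallocate(
                file.as_raw_fd(),
                libc::FALLOC_FL_PUNCH_HOLE | libc::FALLOC_FL_KEEP_SIZE,
                0,
                1 << 20,
            )
        };
        assert_eq!(ret, 0, "fallocate failed: {}", std::io::Error::last_os_error());

        file.sync_all()?;
        println!("allocated 512-byte blocks after: {}", file.metadata()?.blocks());
        Ok(())
    }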
This patch supports discard with ioctl(BLKDISCARD) and write_zeroes with ioctl(BLKZEROOUT) when the block backend is a block device. Signed-off-by: Songqian Li <sionli@tencent.com>
This patch supports discard and write_zeroes for raw io_uring, allowing trim executed in the guest to notify the disk backend to release the disk space. Signed-off-by: Songqian Li <sionli@tencent.com>
Signed-off-by: Songqian Li <sionli@tencent.com>
Signed-off-by: Songqian Li <sionli@tencent.com>
I can reproduce these failures on a 6.12 kernel. It is flaky, with different assertions in the new test failing at different times. I also observed a partly cleared buffer at some point. My guess is that there is at least a race condition between the
snue left a comment:
Very nice work 🎉
I am still trying to figure out why the tests fail. Apart from the unexpected buffer content assertions, I also see segfaults with the new tests occasionally.
Otherwise, just some small suggestions so far.
/// Failed to punch hole file.
#[error("Failed to punch hole file: {0}")]
PunchHole(#[source] std::io::Error),
/// Failed to write zeroes to file.
#[error("Failed to write zeroes to file: {0}")]
WriteZeroes(#[source] std::io::Error),
The nested error printing was improved a while ago and the {0} are not needed anymore with errors that provide a #[source].
Also, a small suggestion:
/// Failed to punch hole in file.
#[error("Failed to punch hole in file")]
PunchHole(#[source] std::io::Error),
/// Failed to write zeroes to file.
#[error("Failed to write zeroes to file")]
WriteZeroes(#[source] std::io::Error),
| #[error("Failed to punch hole: {0}")] | ||
| PunchHole(AsyncIoError), | ||
| #[error("Failed to write zeroes: {0}")] | ||
| WriteZeroes(AsyncIoError), |
Same here, {0} is better replaced with #[source].
#[error("Failed to punch hole")]
PunchHole(#[source] AsyncIoError),
#[error("Failed to write zeroes")]
WriteZeroes(#[source] AsyncIoError),
| #[error("Failed to handle discard or write zeroes: {0}")] | ||
| DiscardWriteZeroes(Error), |
And here...
| #[error("Failed to handle discard or write zeroes: {0}")] | |
| DiscardWriteZeroes(Error), | |
| #[error("Failed to handle discard or write zeroes")] | |
| DiscardWriteZeroes(#[source] Error), |
while let Some(mut desc_chain) = queue.pop_descriptor_chain(self.mem.memory()) {
    let mut request = Request::parse(&mut desc_chain, self.access_platform.as_deref())
        .map_err(Error::RequestParsing)?;
    debug!("virtio-blk request type: {:?}", request.request_type);
This is surely helpful for development, but should be removed from the loop before merge.
--privileged \
--security-opt seccomp=unconfined \
This is unfortunate for the unit tests, but probably required for the loop device creation? Should this rather be an integration test? (But I also like that it is a quick unit test currently.)
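For reference, a rough sketch of the kind of loop-device setup such a test presumably needs (the helper names are assumptions); attaching and detaching a loop device with losetup requires root, hence --privileged:

    use std::path::{Path, PathBuf};
    use std::process::Command;

    // Assumed test helper: attach `backing` to a free loop device and return
    // its path, e.g. /dev/loop0. Requires root.
    fn attach_loop_device(backing: &str) -> std::io::Result<PathBuf> {
        let out = Command::new("losetup")
            .args(["--find", "--show", backing])
            .output()?;
        assert!(out.status.success(), "losetup failed: {out:?}");
        // losetup prints the chosen device followed by a newline.
        Ok(PathBuf::from(String::from_utf8_lossy(&out.stdout).trim().to_string()))
    }

    // Assumed counterpart: detach the loop device again at the end of the test.
    fn detach_loop_device(dev: &Path) -> std::io::Result<()> {
        let status = Command::new("losetup").arg("-d").arg(dev).status()?;
        assert!(status.success(), "losetup -d failed");
        Ok(())
    }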
let mut cmd = [0u8; 16];
cmd[0..8].copy_from_slice(&offset.to_ne_bytes());
cmd[8..16].copy_from_slice(&length.to_ne_bytes());

// SAFETY: we know the file descriptor is valid and points to a block device.
unsafe {
    sq.push(
        &opcode::UringCmd16::new(types::Fd(self.fd), BLOCK_URING_CMD_DISCARD() as _)
            .cmd(cmd)
            .build()
            .user_data(user_data),
    )
    .map_err(|_| AsyncIoError::PunchHole(Error::other("Submission queue is full")))?;
};
Arguments to this operation are wrong. The offset needs to go to sqe->addr and the length to sqe->addr3, aka sqe->cmd. Looks like this isn't exposed through the opcode. I hacked it together like this:
let mut cmd = [0u8; 16];
// length goes to `cmd` aka `sqe.addr3`, but offset/start goes to `sqe.addr`
cmd[0..8].copy_from_slice(&length.to_ne_bytes());
// SAFETY: we know the file descriptor is valid and points to a block device.
unsafe {
    let mut sqe =
        opcode::UringCmd16::new(types::Fd(self.fd), BLOCK_URING_CMD_DISCARD() as _)
            .cmd(cmd)
            .build()
            .user_data(user_data);
    let sqe_ptr = &raw mut sqe as *mut [u8; 128];
    (&mut (*sqe_ptr))[16..24].copy_from_slice(&offset.to_ne_bytes());
    sq.push(&sqe).map_err(|_| {
        AsyncIoError::PunchHole(Error::other("Submission queue is full"))
    })?;
};
This makes the test_blk_dev_raw_io_uring_discard_wr_zeroes_request succeed for me (and the other discard tests as well).
I still see a little flakiness with both test_raw_io_uring_discard_wr_zeroes_request and the async blk_dev variant, that I have not tracked down yet.
The sqe->addr is exposed in the io_uring crate since tokio-rs/io-uring#375 (not in a crate release yet)
let evt_num = match epoll::wait(epoll_file.as_raw_fd(), timeout, &mut events[..]) {
    Ok(n) => n,
    Err(e) if e.kind() == std::io::ErrorKind::Interrupted => continue,
    Err(e) => panic!("epoll_wait failed: {e}"),
};

for event in events.iter().take(evt_num) {
    assert_eq!(event.data as u16, COMPLETION_EVENT);
    disk_image_async.notifier().read().unwrap();
}
break;
I observed rare cases where multiple events are queued, but the next read is issued before they have all completed. This causes the unit test to fail because the buffer contents don't match: the read may have been reordered with a still-queued write, or may not have happened at all when the comparison is done.
let evt_num = match epoll::wait(epoll_file.as_raw_fd(), timeout, &mut events[..]) {
    // Always loop until no more events are detected.
    Ok(0) => break,
    Ok(n) => n,
    Err(e) if e.kind() == std::io::ErrorKind::Interrupted => continue,
    Err(e) => panic!("epoll_wait failed: {e}"),
};
for event in events.iter().take(evt_num) {
    assert_eq!(event.data as u16, COMPLETION_EVENT);
    disk_image_async.notifier().read().unwrap();
}
The relevant strace part of a failing execution:
[pid 2040901] io_uring_enter(4, 1, 0, 0, NULL, 128) = 1
[pid 2040901] epoll_wait(6, [{events=EPOLLIN, data=0x11}], 1, 5) = 1
[pid 2040901] read(5, "\1\0\0\0\0\0\0\0", 8) = 8
[pid 2040901] io_uring_enter(4, 1, 0, 0, NULL, 128) = 1
[pid 2040901] io_uring_enter(4, 1, 0, 0, NULL, 128) = 1
[pid 2040901] io_uring_enter(4, 1, 0, 0, NULL, 128) = 1
[pid 2040901] epoll_wait(6, [{events=EPOLLIN, data=0x11}], 1, 5) = 1
[pid 2040901] read(5, "\1\0\0\0\0\0\0\0", 8) = 8 <-- only one of 3 queued write events completed
[pid 2040901] io_uring_enter(4, 1, 0, 0, NULL, 128) = 1 <-- queued the next read
[pid 2040901] epoll_wait(6, [{events=EPOLLIN, data=0x11}], 1, 5) = 1
[pid 2040901] read(5, "\1\0\0\0\0\0\0\0", 8) = 8 <-- one more event completed (2 remaining?)
[pid 2040901] gettid() = 2040901 <-- starting failed assert handler
With this final change (and those from previous comments), the unit tests now look solid to me! 🎉
Well, some more stress-testing found another very rare case of a not fully cleared buffer:
thread 'tests::test_blk_dev_raw_io_uring_discard_wr_zeroes_request' (2357082) panicked at block/src/lib.rs:1584:13:
assertion `left == right` failed
left: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86]
right: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
I don't see how that happened, but it is more likely another test design deficit than an implementation bug. I suspect no completion event was queued yet in that case. Simply increasing the timeout has not resurfaced the bug yet.
Making sure at least one event was processed seems to work as well, even with a smaller timeout to provoke the situation. That's probably the way to go here.
let mut total = 0;
loop {
    let evt_num = match epoll::wait(epoll_file.as_raw_fd(), timeout, &mut events[..]) {
        // Always loop until no more events are detected.
        Ok(0) => if total > 0 { break } else { continue },
        Ok(n) => n,
        Err(e) if e.kind() == std::io::ErrorKind::Interrupted => continue,
        Err(e) => panic!("epoll_wait failed: {e}"),
    };
    for event in events.iter().take(evt_num) {
        assert_eq!(event.data as u16, COMPLETION_EVENT);
        disk_image_async.notifier().read().unwrap();
    }
    total += evt_num;
}
block: discard and write_zeroes support for raw sync and raw async
The host doesn't reclaim disk space when space is released in the guest.
This patch supports discard and write_zeroes for raw sync and raw async (io_uring), allowing trim executed in the guest to notify the disk backend to release the disk space.
Signed-off-by: Songqian Li <sionli@tencent.com>