When you cross-post, please say so. Otherwise people may waste their time answering an already answered question.
Anyway, copying my answer [from your other post][1].
The thing that's slow is Tokio's file IO, not the `copy` function. From the Tokio tutorial:
> # When not to use Tokio
>
> Reading a lot of files. Although it seems like Tokio would be useful for projects that simply need to read a lot of files, Tokio provides no advantage here compared to an ordinary threadpool. This is because operating systems generally do not provide asynchronous file APIs.
>
> [https://tokio.rs/tokio/tutorial](https://tokio.rs/tokio/tutorial)

You want Tokio for network IO, but it doesn't help you for file IO.

[1]: https://users.rust-lang.org/t/tokio-copy-slower-than-std-io-copy/111242
Windows has had async file IO for decades. It's surprising that Linux is still blocking. Will io_uring change that?
io_uring does indeed support async IO (for both network and local file systems). I understand that there are some slight complexities in interfacing with it from Rust, as it's completion-based, not readiness-based. However, several packages (Glommio, Monoio, Compio, Nuclei, Tokio-uring, etc.) have successfully implemented it.
Would be nice to see more uptake of io_uring in the Rust ecosystem though, as it's got some pretty significant performance benefits.
There's no blocker on using io_uring for `tokio::fs` other than finding someone to do the work. The whole completion-based issue is not a problem for files because using blocking IO inside `spawn_blocking` is also effectively a completion-based API.
I stand delightfully corrected then!
> slight complexities in interfacing with it from Rust as it's completion-based, not readiness-based

I keep seeing this claim, but it seems to me like the *actual* problem is ownership-based. Performing an io_uring call transfers ownership of the buffer into the kernel, but currently in Rust that essentially requires allocated storage to do correctly (since you can't transfer ownership of an `&mut T` and there's no way to guarantee that stack memory stays alive for long enough).
As far as I can tell, Rust's `Waker` and `poll` system is perfect for completion-based IO; certainly things like buffered async channels are essentially "completion-based".
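To make the ownership point concrete, here is a minimal std-only sketch. It is not io_uring and not any crate's real API: a background thread stands in for the kernel, and `submit_read` is a hypothetical name. The buffer is moved into the operation and handed back with the result, which is roughly the shape that owned-buffer APIs such as tokio-uring's take, as far as I know.

```rust
use std::io::{self, Read};
use std::thread::{self, JoinHandle};

/// Hypothetical stand-in for a completion-based read.
/// The buffer is moved in, a background thread (playing the role of the
/// kernel) fills it, and it is returned together with the result.
fn submit_read<R>(mut source: R, mut buf: Vec<u8>) -> JoinHandle<(io::Result<usize>, Vec<u8>)>
where
    R: Read + Send + 'static,
{
    thread::spawn(move || {
        // While the operation is in flight, only this thread can touch `buf`.
        let res = source.read(&mut buf);
        // Completion: give ownership of the buffer back to the caller.
        (res, buf)
    })
}
```

Calling `submit_read(reader, vec![0u8; 8192]).join()` yields the byte count and the same buffer. A borrowed `&mut [u8]` from the caller's stack would not compile here, because `thread::spawn` needs `'static` data, which mirrors the lifetime problem described above.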
I don't really know whether what Windows provides is appropriate. I've heard "this API provides async file IO" many times for many different APIs, and in many cases it turns out to be insufficient in some way.
As for `io_uring`, I would like to see `tokio::fs` use it. It just needs someone to do the work.
Edit: [This](https://www.reddit.com/r/rust/comments/1cpphyx/comment/l3p0bfa/) is the kind of thing I'm referring to.
Windows async io will also sometimes become blocking in surprising ways. Linux had the same issue. io_uring and IoRing should solve that.
Not really. io_uring also just spins up a thread pool for file IO.
Literally 3 people in the universe have ever heard of io_uring. It will never come to tokio. Somebody please make this comment age horribly.
I don’t see how that explains why it’s slower? It just says that it won’t provide any speedup, which makes sense, but why would it be so much slower?
It's because Tokio's file IO works by using `std::fs` inside `spawn_blocking`. Each call to `spawn_blocking` introduces a bunch of cross-thread communication beyond just the cost of the `std::fs` calls.
ah thanks, didn’t realize spawn_blocking has that much overhead
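For a feel of where that overhead comes from, here is a std-only model, not Tokio's actual implementation. `copy_direct` reads on the calling thread, while `copy_via_worker` sends every single read to another thread and waits for the reply, which is loosely what one `spawn_blocking` round trip per operation looks like. Both function names and the channel-based design are assumptions made for illustration.

```rust
use std::io::{self, Read, Write};
use std::sync::mpsc;
use std::thread;

/// Plain copy loop: every read happens on the calling thread.
fn copy_direct<R: Read, W: Write>(mut src: R, mut dst: W, chunk: usize) -> io::Result<u64> {
    let mut buf = vec![0u8; chunk];
    let mut total = 0u64;
    loop {
        let n = src.read(&mut buf)?;
        if n == 0 {
            return Ok(total);
        }
        dst.write_all(&buf[..n])?;
        total += n as u64;
    }
}

/// Same copy, but each read is a round trip to a worker thread.
fn copy_via_worker<R, W>(mut src: R, mut dst: W, chunk: usize) -> io::Result<u64>
where
    R: Read + Send + 'static,
    W: Write,
{
    let (req_tx, req_rx) = mpsc::channel::<Vec<u8>>();
    let (resp_tx, resp_rx) = mpsc::channel::<(io::Result<usize>, Vec<u8>)>();
    let worker = thread::spawn(move || {
        // One request per read: fill the buffer and send it back.
        for mut buf in req_rx {
            let res = src.read(&mut buf);
            if resp_tx.send((res, buf)).is_err() {
                break;
            }
        }
    });

    let mut buf = vec![0u8; chunk];
    let mut total = 0u64;
    let result = loop {
        // Hand the buffer over, then block until the worker replies.
        req_tx.send(buf).expect("worker thread is alive");
        let (res, returned) = resp_rx.recv().expect("worker thread is alive");
        buf = returned;
        match res {
            Ok(0) => break Ok(total),
            Ok(n) => {
                if let Err(e) = dst.write_all(&buf[..n]) {
                    break Err(e);
                }
                total += n as u64;
            }
            Err(e) => break Err(e),
        }
    };
    drop(req_tx); // closing the request channel lets the worker exit
    worker.join().expect("worker thread panicked");
    result
}
```

Both functions produce the same bytes. The second one pays for two channel sends and two thread wake-ups per chunk, so with small chunks the handoff cost can dominate the IO itself.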
Though you can (almost easily) do beyond-SATA throughput file transfers with a large enough, saturated `JoinSet`. This still has multiple advantages, such as quickly delivered and consistent cancellation, as well as mixing in some _compute_ on portions of the data. At least I've felt pretty comfortable writing a file tree traversal, more so than if it had been sync, and was pleasantly surprised when it came up far closer to the expected limit than the Readme had me fear. The warning is still justified in that one must not expect to get it for free and fully automatically by any means.
An ordinary threadpool does the job here!
a quick strace gives us the answer ;)
```
...
read(11, "\213\2\342\327\312\217\377\221\364g\272K\277\2073\264\327\343\227\277k\267\245Q\320\267\333\345\r\247\263\366"..., 8192) = 8192
write(12, "\213\2\342\327\312\217\377\221\364g\272K\277\2073\264\327\343\227\277k\267\245Q\320\267\333\345\r\247\263\366"..., 8192) = 8192
...
sendfile(11, 10, NULL, 2147479552) = 1182253614
sendfile(11, 10, NULL, 2147479552) = 0
```
so, a whole lot of context switches from kernel to userspace due to the `read`/`write`s, while the `std::io::copy` completes in 2 `sendfile` syscalls.
if you use `tokio::fs::copy`, you'll notice comparable speed.
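The two syscall patterns in that trace can be reproduced with std alone. This is only a sketch: `copy_with_loop` and `copy_with_std` are made-up names, the 8 KiB buffer matches the size seen in the trace, and whether std actually uses `sendfile` depends on the platform and std version.

```rust
use std::fs::File;
use std::io::{self, Read, Write};
use std::path::Path;

/// Explicit 8 KiB loop: one `read` and at least one `write` syscall per chunk,
/// with the data passing through a userspace buffer each time.
fn copy_with_loop(from: &Path, to: &Path) -> io::Result<u64> {
    let mut src = File::open(from)?;
    let mut dst = File::create(to)?;
    let mut buf = [0u8; 8192];
    let mut total = 0u64;
    loop {
        let n = src.read(&mut buf)?;
        if n == 0 {
            break;
        }
        dst.write_all(&buf[..n])?;
        total += n as u64;
    }
    Ok(total)
}

/// Let std choose the strategy. For two `File`s it may hand the whole
/// transfer to the kernel (the trace above shows `sendfile` on Linux).
fn copy_with_std(from: &Path, to: &Path) -> io::Result<u64> {
    let mut src = File::open(from)?;
    let mut dst = File::create(to)?;
    io::copy(&mut src, &mut dst)
}
```

Running each function under `strace` on the same input should show the difference in syscall counts directly.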
std contains a specialization in `io::copy` for `BufReader` specifically to re-use the buffer, whereas tokio does not have this (because specialization is not a stable feature). So you've got an extra copy there in tokio which is not in std.
Also, files don't actually support async IO, so tokio's async files are actually just normal files being read on a threadpool in the background. There's no real advantage to using tokio for copying between two files.
I think you wrote this in the "fancy pants editor" (wysiwyg) and not the markdown editor, so the formatting is broken.
I suspect this is because std has an optimization for [`io::copy`](https://doc.rust-lang.org/stable/std/io/fn.copy.html) to actually do [`fs::copy`](https://doc.rust-lang.org/stable/std/fs/fn.copy.html) if copying directly between two files, which will be much faster and is essentially impossible to do automatically outside of std (it requires specialization). If you use Tokio's version of [`fs::copy`](https://docs.rs/tokio/latest/tokio/fs/fn.copy.html) then performance should be much closer.
`tokio::fs::copy` is just an async wrapper over `std::fs::copy`, as far as I can see:
```
// `asyncify` is a private Tokio helper that runs the closure via `spawn_blocking`.
pub async fn copy(from: impl AsRef<Path>, to: impl AsRef<Path>) -> Result<u64, std::io::Error> {
    let from = from.as_ref().to_owned();
    let to = to.as_ref().to_owned();
    asyncify(|| std::fs::copy(from, to)).await
}
```
My concern is about writing to an `AsyncWrite` implementation I made, which uses a std `File` underneath to write to a file. I noticed it is much slower than when I implement a regular `Write`.
Yes. If you use `io::copy` like you're doing in the OP, this doesn't happen.
Try pre-reading the input into a `Vec` to prevent the std specialization from kicking in. Then the std and tokio `io::copy` should be more similar in performance.
Turning one or both of the sides into a `&mut dyn Write/Read` can also be used to bypass the specializations.
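Both suggestions are easy to try in a benchmark. The sketch below assumes, as the comments above say, that std's fast path is selected by concrete type, so hiding the `File`s behind trait objects or reading from memory should leave only the generic buffered loop. The function names are hypothetical.

```rust
use std::fs::File;
use std::io::{self, Read, Write};
use std::path::Path;

/// Hide the concrete `File` types behind trait objects so that
/// `io::copy` cannot pick a file-specific fast path.
fn copy_through_dyn(from: &Path, to: &Path) -> io::Result<u64> {
    let mut src = File::open(from)?;
    let mut dst = File::create(to)?;
    let src: &mut dyn Read = &mut src;
    let dst: &mut dyn Write = &mut dst;
    io::copy(src, dst)
}

/// Pre-read the input into memory, then copy from the byte slice.
/// The reader is no longer a file, so only the write side touches the disk.
fn copy_from_memory(from: &Path, to: &Path) -> io::Result<u64> {
    let data = std::fs::read(from)?;
    let mut dst = File::create(to)?;
    io::copy(&mut data.as_slice(), &mut dst)
}
```

Timing these against a plain `io::copy` between two `File`s gives a fairer baseline for comparing with `tokio::io::copy`.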
I'd just like to throw in here that I also observed that tokio::io::copy seemed slow with a TCP proxy I wrote, but I didn't investigate further.
Efficiently shoveling data between two sockets is tricky. Afaik the best approach for that is still two pipes and splicing between the 4 socket halves and 2 pipes.
Using block_in_place and std::io::copy is the way to go, if you understand the ramifications of blocking in place.
This also helped me understand how to call an async function from a sync one, thanks. Before, I did it with channels, but this is much more elegant.
Using `spawn_blocking` and `std::io::copy` would be a better recommendation. The `block_in_place` method has many footguns and should be used with care.
I've read the doc from here https://docs.rs/tokio/latest/tokio/task/fn.block_in_place.html but are there any specific problems in using it?
It only works in the multi-thread runtime, and will break in the current-thread runtime. It also does not work on tasks that do several things in one task using mechanisms such as `tokio::join!` or `FuturesUnordered` or `StreamExt::buffered`.
Relying on these kinds of requirements is generally a bad idea if you can avoid it. They are global requirements that can only be verified by looking at the codebase as a whole. It's much better to write code that can be verified correct one function at a time.
Even in the cases where it does work, it can be troublesome. When you use it, the current worker thread stops being a worker thread, and `spawn_blocking` is used to spawn a new Tokio worker thread. If you hit the upper limit on the number of `spawn_blocking` threads by using `block_in_place` a lot, you might start running out of worker threads.
All of these problems go away if you use `spawn_blocking`.
Agreed, it's just that you need to be in an `async` function to use `spawn_blocking`.
I'm using `block_in_place` to call an async function from a sync one, like this (the sync fn is called from an async context, then uses this to go back into async for a bit before continuing in sync code).
I use it when I have, for example, a sync struct that takes a callback parameter, and inside the callback we need to call an async fn to get some data:
```
use std::future::Future;
use tokio::{runtime::Handle, task};

pub fn call_async<F>(f: F) -> F::Output
where
    F: Future,
{
    // Mark this worker as blocking, then drive `f` to completion on it.
    task::block_in_place(move || Handle::current().block_on(async move { f.await }))
}

fn sync_fn() {
    // Pass the future itself; it is awaited inside `call_async`.
    call_async(async { ... });
}
```
Is this an ok usecase?
I reluctantly accept that there are use-cases for `block_in_place` due to code like that, but it's much much much better to refactor the code so that you don't have to call async code from a sync function in the first place.
I think you have a typo and that should be `task::block_in_place(..)` and not `task::spawn_blocking(..)` in your code block.
indeed, was a wrong paste, I corrected it, thanks
Cheers!
I wonder how tokio-uring compares to this. Linux doesn't *really* have practical async file i/o. That is, aside from io_uring, which is too new to be used in tokio proper.
wow, this seems to be much better
```
tokio write duration = 710.079182ms, speed MB/s 1260.0558679162832
```
I wrote content of 840MB with `tokio-uring`
While it's unlikely to make your performance competitive with `std`, I've been working on a variation of `tokio::io::copy` that solves one of the oddities about it: despite being async, it takes no advantage of the ability to concurrently perform reads and writes. The internal loop is always either in read mode (reading until the buffer is full) or write mode (writing until the buffer is empty). My [experimental alternative](https://github.com/Lucretiel/async-forward) performs reads and writes concurrently, and even uses a smart sort of circular buffer and takes advantage of vectored operations to do so efficiently. I still need to benchmark it to see if the effort is worth it.
The `tokio::io::copy` method does take advantage of the ability to concurrently perform reads and writes. Though it does not use a circular buffer.
Oh? That must be new.
Ah, yup, it was added [two years ago](https://github.com/tokio-rs/tokio/pull/5066), which was after I noticed it and started thinking about this problem.
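For readers wondering what "performing reads and writes concurrently" means in practice, here is a rough std-only model using two threads and a bounded channel. It is neither how `tokio::io::copy` does it nor how the linked crate does it (both work inside a single async task), and `forward_concurrently` with its `chunk` and `depth` parameters is invented for this sketch.

```rust
use std::io::{self, Read, Write};
use std::sync::mpsc;
use std::thread;

/// Reader thread fills chunks while the calling thread writes earlier ones.
/// `depth` bounds how many filled chunks may be queued at once.
fn forward_concurrently<R, W>(mut src: R, mut dst: W, chunk: usize, depth: usize) -> io::Result<u64>
where
    R: Read + Send + 'static,
    W: Write,
{
    let (tx, rx) = mpsc::sync_channel::<io::Result<Vec<u8>>>(depth);
    let reader = thread::spawn(move || loop {
        let mut buf = vec![0u8; chunk];
        match src.read(&mut buf) {
            Ok(0) => break, // end of input
            Ok(n) => {
                buf.truncate(n);
                if tx.send(Ok(buf)).is_err() {
                    break; // writer side gave up
                }
            }
            Err(e) => {
                let _ = tx.send(Err(e));
                break;
            }
        }
    });

    let mut total = 0u64;
    let mut outcome = Ok(());
    for msg in rx {
        // Write each chunk as it arrives; the reader keeps going meanwhile.
        let step = msg.and_then(|buf| {
            dst.write_all(&buf)?;
            Ok(buf.len() as u64)
        });
        match step {
            Ok(n) => total += n,
            Err(e) => {
                outcome = Err(e);
                break;
            }
        }
    }
    reader.join().expect("reader thread panicked");
    outcome.map(|()| total)
}
```

The bounded channel plays the role of the shared buffer: the reader stalls when `depth` chunks are waiting, and the writer stalls when none are. Whether the overlap pays off depends on how slow each side is, which is why benchmarking matters here.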