# Accepting Strings

## Description

When accepting strings via FFI through pointers, there are two principles that
should be followed:

1. Keep foreign strings "borrowed", rather than copying them directly.
2. Minimize the amount of complexity and `unsafe` code involved in converting
   from a C-style string to native Rust strings.

## Motivation

The strings used in C behave differently from those used in Rust, namely:

- C strings are null-terminated while Rust strings store their length
- C strings can contain any arbitrary non-zero byte while Rust strings must be
  UTF-8
- C strings are accessed and manipulated using `unsafe` pointer operations while
  interactions with Rust strings go through safe methods

The Rust standard library comes with C equivalents of Rust's `String` and `&str`
called `CString` and `&CStr`, that allow us to avoid a lot of the complexity and
`unsafe` code involved in converting between C strings and Rust strings.

The `&CStr` type also allows us to work with borrowed data, meaning strings can
be passed between Rust and C without copying them.
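
The differences listed above can be observed with a few small helpers. This is a minimal sketch using only the standard library; the function names `borrow_is_zero_copy`, `lengths`, and `view_as_str` are made up for illustration and do not come from any existing API.

```rust
use std::ffi::{CStr, CString};

/// Returns true when borrowing `owned` as a `&CStr` reuses the same buffer,
/// i.e. no bytes were copied to create the borrowed view.
fn borrow_is_zero_copy(owned: &CString) -> bool {
    let borrowed: &CStr = owned.as_c_str();
    borrowed.as_ptr() == owned.as_ptr()
}

/// Returns the length as C sees it (bytes before the NUL terminator) and the
/// number of bytes actually stored (including the terminator).
fn lengths(s: &CStr) -> (usize, usize) {
    (s.to_bytes().len(), s.to_bytes_with_nul().len())
}

/// Tries to view NUL-terminated bytes as a Rust `&str`.
/// Fails if the terminator is missing or misplaced, or if the bytes before
/// it are not valid UTF-8.
fn view_as_str(bytes_with_nul: &[u8]) -> Option<&str> {
    CStr::from_bytes_with_nul(bytes_with_nul).ok()?.to_str().ok()
}
```

For example, `view_as_str(b"hello\0")` yields `Some("hello")`, while a byte string such as `b"\xff\xfe\0"` is a perfectly valid C string but is rejected because it is not UTF-8.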

## Code Example

```rust,ignore
pub mod unsafe_module {

    // other module content

    /// Log a message at the specified level.
    ///
    /// # Safety
    ///
    /// The caller must guarantee that `msg`:
    ///
    /// - is not a null pointer
    /// - points to valid, initialized data
    /// - points to memory ending in a null byte
    /// - won't be mutated for the duration of this function call
    #[no_mangle]
    pub unsafe extern "C" fn mylib_log(msg: *const libc::c_char, level: libc::c_int) {
        let level: crate::LogLevel = match level { /* ... */ };

        // SAFETY: The caller has already guaranteed this is okay (see the
        // `# Safety` section of the doc-comment).
        let msg_cstr: &std::ffi::CStr = unsafe { std::ffi::CStr::from_ptr(msg) };

        // Converting to `&str` only validates UTF-8; the bytes are not copied.
        let msg_str: &str = match msg_cstr.to_str() {
            Ok(s) => s,
            Err(_) => {
                crate::log_error("FFI string conversion failed");
                return;
            }
        };

        crate::log(msg_str, level);
    }
}
```
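As a hedged sketch, the pointer-to-`&str` step of `mylib_log` can be isolated in a small helper. The name `with_c_str`, the closure-based shape, and the extra null check are assumptions made for illustration and are not part of the example above; `std::ffi::c_char` stands in for `libc::c_char` so that the snippet needs no external crate.

```rust
use std::ffi::{c_char, CStr};

/// Hypothetical helper: borrows the C string behind `ptr` as a `&str` and
/// hands it to `f`. `f` receives `None` if `ptr` is null or the bytes are
/// not valid UTF-8.
///
/// # Safety
///
/// If `ptr` is not null, it must point to a NUL-terminated string that stays
/// valid and is not mutated for the duration of this call.
unsafe fn with_c_str<R>(ptr: *const c_char, f: impl FnOnce(Option<&str>) -> R) -> R {
    if ptr.is_null() {
        return f(None);
    }

    // SAFETY: `ptr` is non-null (checked above); the caller guarantees the
    // remaining requirements listed in the `# Safety` section.
    let c_str: &CStr = unsafe { CStr::from_ptr(ptr) };

    // `to_str` validates UTF-8 on the borrowed bytes without copying them.
    f(c_str.to_str().ok())
}
```

Because the closure receives the `&str` for an anonymous lifetime, the borrow cannot escape the call: the compiler rejects any attempt to return or store it, which is one way to keep the reference from outliving the foreign buffer.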

## Advantages

The example is written to ensure that:

1. The `unsafe` block is as small as possible.
2. The pointer with an "untracked" lifetime becomes a "tracked" shared reference.

Consider an alternative, where the string is actually copied:

```rust,ignore
pub mod unsafe_module {

    // other module content

    pub extern "C" fn mylib_log(msg: *const libc::c_char, level: libc::c_int) {
        // DO NOT USE THIS CODE.
        // IT IS UGLY, VERBOSE, AND CONTAINS A SUBTLE BUG.

        let level: crate::LogLevel = match level { /* ... */ };

        let msg_len = unsafe { /* SAFETY: strlen is what it is, I guess? */
            libc::strlen(msg)
        };

        let mut msg_data: Vec<u8> = Vec::with_capacity(msg_len + 1);

        let msg_cstr: std::ffi::CString = unsafe {
            // SAFETY: copying from a foreign pointer expected to live
            // for the entire stack frame into owned memory
            std::ptr::copy_nonoverlapping(msg.cast::<u8>(), msg_data.as_mut_ptr(), msg_len);

            msg_data.set_len(msg_len + 1);

            std::ffi::CString::from_vec_with_nul(msg_data).unwrap()
        };

        let msg_str: String = unsafe {
            match msg_cstr.into_string() {
                Ok(s) => s,
                Err(_) => {
                    crate::log_error("FFI string conversion failed");
                    return;
                }
            }
        };

        crate::log(&msg_str, level);
    }
}
```

This code is inferior to the original in two respects:

1. There is much more `unsafe` code, and more importantly, more invariants it
   must uphold.
2. Due to the extensive arithmetic required, there is a bug in this version that
   causes undefined behaviour in Rust.

The bug here is a simple mistake in pointer arithmetic: all `msg_len` bytes of
the string were copied, but the `NUL` terminator at the end was not.

The `Vec` then had its length *set* to the length of the *zero padded string*
rather than being *resized* to it, which could have added a zero at the end. As a
result, the last byte in the `Vec` is uninitialized memory. When the `CString`
is created at the bottom of the block, its read of the `Vec` will cause
undefined behaviour!

Like many such issues, this would be a difficult one to track down. Because the
behaviour is undefined, the symptoms could vary from run to run: sometimes the
`unwrap()` would panic because the final byte happened not to be zero, sometimes
the call would appear to work, and sometimes the program would just crash.
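
If an owned copy really is required, one possible sketch lets the standard library handle the length and terminator bookkeeping, so there is no manual pointer arithmetic to get wrong. The helper name `copy_c_string` is hypothetical, and `std::ffi::c_char` is used in place of `libc::c_char` to keep the snippet free of external crates.

```rust
use std::ffi::{c_char, CStr};

/// Hypothetical helper: copies the C string behind `msg` into an owned
/// `String`, returning `None` if the bytes are not valid UTF-8.
///
/// # Safety
///
/// `msg` must be non-null and point to a NUL-terminated string that stays
/// valid and is not mutated for the duration of this call.
unsafe fn copy_c_string(msg: *const c_char) -> Option<String> {
    // SAFETY: guaranteed by the caller (see the `# Safety` section above).
    let borrowed: &CStr = unsafe { CStr::from_ptr(msg) };

    // Validate UTF-8 on the borrowed bytes first; `String::from` then makes
    // the single copy, sized by the standard library.
    borrowed.to_str().ok().map(String::from)
}
```

The only `unsafe` operation left is the `CStr::from_ptr` call, and the returned `String` remains usable after the foreign buffer has been freed.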

## Disadvantages

None?