patterns/src/idioms/ffi/accepting-strings.md

# Accepting Strings

## Description

When accepting strings via FFI through pointers, there are two principles that
should be followed:

1. Keep foreign strings "borrowed", rather than copying them directly.
2. Minimize the amount of complexity and `unsafe` code involved in converting
   from a C-style string to native Rust strings.

## Motivation

The strings used in C have different behaviours to those used in Rust, namely:

- C strings are null-terminated while Rust strings store their length
- C strings can contain any arbitrary non-zero byte while Rust strings must be
  UTF-8
- C strings are accessed and manipulated using `unsafe` pointer operations while
  interactions with Rust strings go through safe methods

The Rust standard library comes with C equivalents of Rust's `String` and `&str`
called `CString` and `&CStr`, that allow us to avoid a lot of the complexity and
`unsafe` code involved in converting between C strings and Rust strings.

The `&CStr` type also allows us to work with borrowed data, meaning passing
strings between Rust and C is a zero-cost operation.

## Code Example

```rust,ignore
pub mod unsafe_module {

    // other module content

    /// Log a message at the specified level.
    ///
    /// # Safety
    ///
    /// It is the caller's guarantee to ensure `msg`:
    ///
    /// - is not a null pointer
    /// - points to valid, initialized data
    /// - points to memory ending in a null byte
    /// - won't be mutated for the duration of this function call
    #[no_mangle]
    pub unsafe extern "C" fn mylib_log(msg: *const libc::c_char, level: libc::c_int) {
        let level: crate::LogLevel = match level { /* ... */ };

        // SAFETY: The caller has already guaranteed this is okay (see the
        // `# Safety` section of the doc-comment).
        let msg_str: &str = match std::ffi::CStr::from_ptr(msg).to_str() {
            Ok(s) => s,
            Err(e) => {
                crate::log_error("FFI string conversion failed");
                return;
            }
        };

        crate::log(msg_str, level);
    }
}
```

## Advantages

The example is is written to ensure that:

1. The `unsafe` block is as small as possible.
2. The pointer with an "untracked" lifetime becomes a "tracked" shared reference

Consider an alternative, where the string is actually copied:

```rust,ignore
pub mod unsafe_module {

    // other module content

    pub extern "C" fn mylib_log(msg: *const libc::c_char, level: libc::c_int) {
        // DO NOT USE THIS CODE.
        // IT IS UGLY, VERBOSE, AND CONTAINS A SUBTLE BUG.

        let level: crate::LogLevel = match level { /* ... */ };

        let msg_len = unsafe { /* SAFETY: strlen is what it is, I guess? */
            libc::strlen(msg)
        };

        let mut msg_data = Vec::with_capacity(msg_len + 1);

        let msg_cstr: std::ffi::CString = unsafe {
            // SAFETY: copying from a foreign pointer expected to live
            // for the entire stack frame into owned memory
            std::ptr::copy_nonoverlapping(msg, msg_data.as_mut(), msg_len);

            msg_data.set_len(msg_len + 1);

            std::ffi::CString::from_vec_with_nul(msg_data).unwrap()
        }

        let msg_str: String = unsafe {
            match msg_cstr.into_string() {
                Ok(s) => s,
                Err(e) => {
                    crate::log_error("FFI string conversion failed");
                    return;
                }
            }
        };

        crate::log(&msg_str, level);
    }
}
```

This code in inferior to the original in two respects:

1. There is much more `unsafe` code, and more importantly, more invariants it
   must uphold.
2. Due to the extensive arithmetic required, there is a bug in this version that
   cases Rust `undefined behaviour`.

The bug here is a simple mistake in pointer arithmetic: the string was copied,
all `msg_len` bytes of it. However, the `NUL` terminator at the end was not.

The Vector then had its size *set* to the length of the *zero padded string* --
rather than *resized* to it, which could have added a zero at the end. As a
result, the last byte in the Vector is uninitialized memory. When the `CString`
is created at the bottom of the block, its read of the Vector will cause
`undefined behaviour`!

Like many such issues, this would be difficult issue to track down. Sometimes it
would panic because the string was not `UTF-8`, sometimes it would put a weird
character at the end of the string, sometimes it would just completely crash.

## Disadvantages

None?
Several FFI Sections: Strings, Errors, and API Design (#106) 3 years ago			`# Accepting Strings`

			`## Description`

Shorten lines to line-length 80 (#208) 3 years ago			`When accepting strings via FFI through pointers, there are two principles that`
			`should be followed:`
Several FFI Sections: Strings, Errors, and API Design (#106) 3 years ago
			`1. Keep foreign strings "borrowed", rather than copying them directly.`
Revise the FFI chapters (#190) Co-authored-by: Marco Ieni <11428655+MarcoIeni@users.noreply.github.com> 3 years ago			2. Minimize the amount of complexity and `unsafe` code involved in converting
			`from a C-style string to native Rust strings.`
Several FFI Sections: Strings, Errors, and API Design (#106) 3 years ago
			`## Motivation`

Revise the FFI chapters (#190) Co-authored-by: Marco Ieni <11428655+MarcoIeni@users.noreply.github.com> 3 years ago			`The strings used in C have different behaviours to those used in Rust, namely:`
Several FFI Sections: Strings, Errors, and API Design (#106) 3 years ago
Revise the FFI chapters (#190) Co-authored-by: Marco Ieni <11428655+MarcoIeni@users.noreply.github.com> 3 years ago			`- C strings are null-terminated while Rust strings store their length`
			`- C strings can contain any arbitrary non-zero byte while Rust strings must be`
			`UTF-8`
chore: reformat repository with dprint (#370) * chore: reformat repository with dprint * ci: change action name to reflect that it's not only markdown * fix: broken link 10 months ago			- C strings are accessed and manipulated using `unsafe` pointer operations while
			`interactions with Rust strings go through safe methods`
Revise the FFI chapters (#190) Co-authored-by: Marco Ieni <11428655+MarcoIeni@users.noreply.github.com> 3 years ago
			The Rust standard library comes with C equivalents of Rust's `String` and `&str`
chore: reformat repository with dprint (#370) * chore: reformat repository with dprint * ci: change action name to reflect that it's not only markdown * fix: broken link 10 months ago			called `CString` and `&CStr`, that allow us to avoid a lot of the complexity and
			`unsafe` code involved in converting between C strings and Rust strings.
Revise the FFI chapters (#190) Co-authored-by: Marco Ieni <11428655+MarcoIeni@users.noreply.github.com> 3 years ago
			The `&CStr` type also allows us to work with borrowed data, meaning passing
			`strings between Rust and C is a zero-cost operation.`
Several FFI Sections: Strings, Errors, and API Design (#106) 3 years ago
			`## Code Example`

			```rust,ignore
			`pub mod unsafe_module {`

			`// other module content`

Revise the FFI chapters (#190) Co-authored-by: Marco Ieni <11428655+MarcoIeni@users.noreply.github.com> 3 years ago			`/// Log a message at the specified level.`
			`///`
			`/// # Safety`
			`///`
			/// It is the caller's guarantee to ensure `msg`:
			`///`
			`/// - is not a null pointer`
			`/// - points to valid, initialized data`
			`/// - points to memory ending in a null byte`
			`/// - won't be mutated for the duration of this function call`
Several FFI Sections: Strings, Errors, and API Design (#106) 3 years ago			`#[no_mangle]`
style: format code blocks Signed-off-by: simonsan <14062932+simonsan@users.noreply.github.com> 3 months ago			`pub unsafe extern "C" fn mylib_log(msg: *const libc::c_char, level: libc::c_int) {`
Several FFI Sections: Strings, Errors, and API Design (#106) 3 years ago			`let level: crate::LogLevel = match level { /* ... */ };`

Revise the FFI chapters (#190) Co-authored-by: Marco Ieni <11428655+MarcoIeni@users.noreply.github.com> 3 years ago			`// SAFETY: The caller has already guaranteed this is okay (see the`
			// `# Safety` section of the doc-comment).
			`let msg_str: &str = match std::ffi::CStr::from_ptr(msg).to_str() {`
			`Ok(s) => s,`
			`Err(e) => {`
			`crate::log_error("FFI string conversion failed");`
			`return;`
Several FFI Sections: Strings, Errors, and API Design (#106) 3 years ago			`}`
			`};`

			`crate::log(msg_str, level);`
			`}`
			`}`
			```

			`## Advantages`

			`The example is is written to ensure that:`

			1. The `unsafe` block is as small as possible.
chore: reformat repository with dprint (#370) * chore: reformat repository with dprint * ci: change action name to reflect that it's not only markdown * fix: broken link 10 months ago			`2. The pointer with an "untracked" lifetime becomes a "tracked" shared reference`
Several FFI Sections: Strings, Errors, and API Design (#106) 3 years ago
			`Consider an alternative, where the string is actually copied:`

			```rust,ignore
			`pub mod unsafe_module {`

			`// other module content`

			`pub extern "C" fn mylib_log(msg: *const libc::c_char, level: libc::c_int) {`
Shorten lines to line-length 80 (#208) 3 years ago			`// DO NOT USE THIS CODE.`
			`// IT IS UGLY, VERBOSE, AND CONTAINS A SUBTLE BUG.`
Several FFI Sections: Strings, Errors, and API Design (#106) 3 years ago
			`let level: crate::LogLevel = match level { /* ... */ };`

			`let msg_len = unsafe { /* SAFETY: strlen is what it is, I guess? */`
			`libc::strlen(msg)`
			`};`

			`let mut msg_data = Vec::with_capacity(msg_len + 1);`

			`let msg_cstr: std::ffi::CString = unsafe {`
			`// SAFETY: copying from a foreign pointer expected to live`
			`// for the entire stack frame into owned memory`
			`std::ptr::copy_nonoverlapping(msg, msg_data.as_mut(), msg_len);`

			`msg_data.set_len(msg_len + 1);`

			`std::ffi::CString::from_vec_with_nul(msg_data).unwrap()`
			`}`

			`let msg_str: String = unsafe {`
			`match msg_cstr.into_string() {`
			`Ok(s) => s,`
			`Err(e) => {`
			`crate::log_error("FFI string conversion failed");`
			`return;`
			`}`
			`}`
			`};`

			`crate::log(&msg_str, level);`
			`}`
			`}`
			```

			`This code in inferior to the original in two respects:`

Shorten lines to line-length 80 (#208) 3 years ago			1. There is much more `unsafe` code, and more importantly, more invariants it
Refactor repository structure to include translations (#352) 1 year ago			`must uphold.`
chore: reformat repository with dprint (#370) * chore: reformat repository with dprint * ci: change action name to reflect that it's not only markdown * fix: broken link 10 months ago			`2. Due to the extensive arithmetic required, there is a bug in this version that`
			cases Rust `undefined behaviour`.
Several FFI Sections: Strings, Errors, and API Design (#106) 3 years ago
Shorten lines to line-length 80 (#208) 3 years ago			`The bug here is a simple mistake in pointer arithmetic: the string was copied,`
			all `msg_len` bytes of it. However, the `NUL` terminator at the end was not.
Several FFI Sections: Strings, Errors, and API Design (#106) 3 years ago
chore: reformat repository with dprint (#370) * chore: reformat repository with dprint * ci: change action name to reflect that it's not only markdown * fix: broken link 10 months ago			`The Vector then had its size set to the length of the zero padded string --`
			`rather than resized to it, which could have added a zero at the end. As a`
			result, the last byte in the Vector is uninitialized memory. When the `CString`
			`is created at the bottom of the block, its read of the Vector will cause`
			`undefined behaviour`!
Several FFI Sections: Strings, Errors, and API Design (#106) 3 years ago
chore: reformat repository with dprint (#370) * chore: reformat repository with dprint * ci: change action name to reflect that it's not only markdown * fix: broken link 10 months ago			`Like many such issues, this would be difficult issue to track down. Sometimes it`
			would panic because the string was not `UTF-8`, sometimes it would put a weird
			`character at the end of the string, sometimes it would just completely crash.`
Several FFI Sections: Strings, Errors, and API Design (#106) 3 years ago
			`## Disadvantages`

			`None?`