# Object-Based APIs

## Description

When designing APIs in Rust which are exposed to other languages, there are some
important design principles which are contrary to normal Rust API design:

1. All Encapsulated types should be *owned* by Rust, *managed* by the user,
   and *opaque*.
2. All Transactional data types should be *owned* by the user, and *transparent*.
3. All library behavior should be functions acting upon Encapsulated types.
4. All library behavior should be encapsulated into types not based on structure,
   but *provenance/lifetime*.

## Motivation

Rust has built-in FFI support to other languages.
It does this by providing a way for crate authors to provide C-compatible APIs
through different ABIs (though that is unimportant to this practice).

Well-designed Rust FFI follows C API design principles, while compromising the
design in Rust as little as possible. There are three goals with any foreign API:

1. Make it easy to use in the target language.
2. Avoid the API dictating internal unsafety on the Rust side as much as possible.
3. Keep the potential for memory unsafety and Rust `undefined behaviour` as small
   as possible.

Rust code must trust the memory safety of the foreign language beyond a certain
point. However, every bit of `unsafe` code on the Rust side is an opportunity for
bugs, or to exacerbate `undefined behaviour`.

For example, if a pointer's provenance is wrong, that may be a segfault due to
invalid memory access. But if it is manipulated by unsafe code, it could become
full-blown heap corruption.

The Object-Based API design allows for writing shims that have good memory safety
characteristics, and a clean boundary of what is safe and what is `unsafe`.

## Code Example

The POSIX standard defines the API to access an on-file database, known as [DBM](https://web.archive.org/web/20210105035602/https://www.mankier.com/0p/ndbm.h).
It is an excellent example of an "object-based" API.

Here is the definition in C, which hopefully should be easy to read for those
involved in FFI. The commentary below should help explain it for those who
miss the subtleties.

```C
typedef struct DBM DBM; /* opaque: the layout is known only to the library */
typedef struct { void *dptr; size_t dsize; } datum;

int     dbm_clearerr(DBM *);
void    dbm_close(DBM *);
int     dbm_delete(DBM *, datum);
int     dbm_error(DBM *);
datum   dbm_fetch(DBM *, datum);
datum   dbm_firstkey(DBM *);
datum   dbm_nextkey(DBM *);
DBM    *dbm_open(const char *, int, mode_t);
int     dbm_store(DBM *, datum, datum, int);
```

This API defines two types: `DBM` and `datum`.

The `DBM` type was called an "encapsulated" type above.
It is designed to contain internal state, and acts as an entry point for the
library's behavior.

It is completely opaque to the user, who cannot create a `DBM` themselves since
they don't know its size or layout. Instead, they must call `dbm_open`, and that
only gives them *a pointer to one*.

This means all `DBM`s are "owned" by the library in a Rust sense.
The internal state of unknown size is kept in memory controlled by the library,
not the user. The user can only manage its life cycle with `open` and `close`,
and perform operations on it with the other functions.

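On the Rust side, an encapsulated type of this kind is commonly built from a boxed struct whose raw pointer is handed to the caller. The following is a minimal sketch under stated assumptions, not the POSIX implementation: the `example_dbm_*` names, the `BTreeMap` store, and the `example_dbm_count` operation are all hypothetical. The `#[no_mangle]` attribute that a real export would carry is left out so the snippet stays a plain, testable piece of Rust.

```rust
use std::collections::BTreeMap;
use std::os::raw::c_int;

/// Hypothetical internal state. Foreign callers only ever hold a pointer
/// to it and never see this layout.
pub struct Dbm {
    entries: BTreeMap<Vec<u8>, Vec<u8>>,
}

/// Allocates the state on Rust's heap and hands back an opaque pointer.
/// The library remains the owner; the caller only manages the life cycle.
pub extern "C" fn example_dbm_open() -> *mut Dbm {
    Box::into_raw(Box::new(Dbm { entries: BTreeMap::new() }))
}

/// Returns the number of stored entries, or -1 if `db` is null.
///
/// # Safety
/// `db` must be null or a live pointer returned by `example_dbm_open`.
pub unsafe extern "C" fn example_dbm_count(db: *const Dbm) -> c_int {
    match unsafe { db.as_ref() } {
        Some(db) => db.entries.len() as c_int,
        None => -1,
    }
}

/// Reclaims the allocation. The pointer must not be used afterwards.
///
/// # Safety
/// `db` must be null or a live pointer returned by `example_dbm_open`.
pub unsafe extern "C" fn example_dbm_close(db: *mut Dbm) {
    if !db.is_null() {
        // Rebuild the Box so that Rust frees the state it allocated.
        drop(unsafe { Box::from_raw(db) });
    }
}
```

Because the caller never learns the size or layout of `Dbm`, the only way to obtain one is through the constructor, and the only way to release it is through the matching destructor.
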
The `datum` type was called a "transactional" type above.
It is designed to facilitate the exchange of information between the library and
its user.

The database is designed to store "unstructured data", with no pre-defined length
or meaning. As a result, the `datum` is the C equivalent of a Rust slice: a bunch
of bytes, and a count of how many there are. The main difference is that there is
no type information, which is what `void` indicates.

Keep in mind that this header is written from the library's point of view.
The user likely has some type they are using, which has a known size.
But the library does not care, and by the rules of C casting, a pointer to any
type can be cast to `void *`.

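The slice analogy can be made concrete with a Rust mirror of `datum`. This is a sketch only: the `Datum` name and the `as_bytes` helper are hypothetical, and it assumes that C's `size_t` has the same width as Rust's `usize`, which holds on common platforms.

```rust
use std::os::raw::c_void;
use std::slice;

/// Rust mirror of the C `datum`: transparent, with a fixed C layout.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct Datum {
    pub dptr: *mut c_void,
    pub dsize: usize, // assumed to match C's `size_t`
}

impl Datum {
    /// Hypothetical helper that views the datum as a byte slice.
    /// Returns `None` when `dptr` is null.
    ///
    /// # Safety
    /// `dptr` must be null, or point to `dsize` readable bytes that stay
    /// valid and unmodified for the whole lifetime `'a`.
    pub unsafe fn as_bytes<'a>(&self) -> Option<&'a [u8]> {
        if self.dptr.is_null() {
            None
        } else {
            Some(unsafe { slice::from_raw_parts(self.dptr as *const u8, self.dsize) })
        }
    }
}
```

The safety comment on `as_bytes` is the Rust-side statement of what the C caller promises about the pointer and the length.
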
As noted earlier, this type is *transparent* to the user. But also, this type is
*owned* by the user.
This has subtle ramifications, due to that pointer inside it.
The question is, who owns the memory that pointer points to?

The answer for best memory safety is, "the user".
But in cases such as retrieving a value, the user does not know how to allocate
it correctly (since they don't know how long the value is). In this case, the library
code is expected to use the heap that the user has access to -- such as the C library
`malloc` and `free` -- and then *transfer ownership* in the Rust sense.

This may all seem speculative, but this is what a pointer means in C.
It means the same thing as in Rust: "user defined lifetime."
The user of the library needs to read the documentation in order to use it correctly.
That said, there are some decisions that have lesser or greater consequences if users
do it wrong. Minimizing those is what this best practice is about, and the key
is to *transfer ownership of everything that is transparent*.

## Advantages

This minimizes the number of memory safety guarantees the user must uphold to a
relatively small number:

1. Do not call any function with a pointer not returned by `dbm_open` (invalid
   access or corruption).
2. Do not call any function on a pointer after close (use after free).
3. The `dptr` on any `datum` must be `NULL`, or point to a valid slice of memory
   at the advertised length.

In addition, it avoids a lot of pointer provenance issues.
To understand why, let us consider an alternative in some depth: key iteration.

Rust is well known for its iterators.
When implementing one, the programmer makes a separate type with a lifetime bounded
by its owner, and implements the `Iterator` trait.

Here is how iteration would be done in Rust for `DBM`:

```rust,ignore
struct Dbm { ... }

impl Dbm {
    /* ... */
    pub fn keys<'it>(&'it self) -> DbmKeysIter<'it> { ... }
    /* ... */
}

struct DbmKeysIter<'it> {
    owner: &'it Dbm,
}

impl<'it> Iterator for DbmKeysIter<'it> { ... }
```

This is clean, idiomatic, and safe, thanks to Rust's guarantees.
However, consider what a straightforward API translation would look like:

```rust,ignore
#[no_mangle]
pub extern "C" fn dbm_iter_new(owner: *const Dbm) -> *mut DbmKeysIter {
    // THIS API IS A BAD IDEA! For real applications, use object-based design instead.
}
#[no_mangle]
pub extern "C" fn dbm_iter_next(
    iter: *mut DbmKeysIter,
    key_out: *mut datum // out-parameter: written by the library
) -> libc::c_int {
    // THIS API IS A BAD IDEA! For real applications, use object-based design instead.
}
#[no_mangle]
pub extern "C" fn dbm_iter_del(iter: *mut DbmKeysIter) {
    // THIS API IS A BAD IDEA! For real applications, use object-based design instead.
}
```

This API loses a key piece of information: the lifetime of the iterator must not
exceed the lifetime of the `Dbm` object that owns it. A user of the library could
use it in a way which causes the iterator to outlive the data it is iterating on,
resulting in reading memory that has already been freed.

This example written in C contains a bug that will be explained afterwards:

```C
int count_key_sizes(DBM *db) {
    // DO NOT USE THIS FUNCTION. IT HAS A SUBTLE BUT SERIOUS BUG!
    datum key;
    int len = 0;

    DbmKeysIter *iter = dbm_iter_new(db);
    if (!iter) {
        dbm_close(db);
        return -1;
    }

    int l;
    while ((l = dbm_iter_next(iter, &key)) >= 0) { // an error is indicated by -1
        free(key.dptr);
        len += key.dsize;
        if (l == 0) { // end of the iterator
            dbm_close(db);
        }
    }
    if (l < 0) {
        return -1;
    } else {
        return len;
    }
}
```

This bug is a classic. Here's what happens when the iterator returns the
end-of-iteration marker:

1. The loop condition sets `l` to zero, and enters the loop because `0 >= 0`.
2. The length is incremented, in this case by zero.
3. The if statement is true, so the database is closed. There should be a break
   statement here.
4. The loop condition executes again, causing a `next` call on the closed object.

The worst part about this bug?
If the Rust implementation was careful, this code will run without crashing most of
the time! If the memory for the `Dbm` object is not immediately reused, an internal
check will almost certainly fail, resulting in the iterator returning a `-1` indicating
an error. But occasionally, it will cause a segmentation fault, or even worse,
nonsensical memory corruption!

None of this can be avoided by Rust.
From its perspective, it put those objects on its heap, returned pointers to them,
and gave up control of their lifetimes. The C code simply must "play nice".

The programmer must read and understand the API documentation.
While some consider that par for the course in C, a good API design can mitigate
this risk. The POSIX API for `DBM` did this by *consolidating the ownership* of
the iterator with its parent:

```C
datum   dbm_firstkey(DBM *);
datum   dbm_nextkey(DBM *);
```

Thus, all of the lifetimes were bound together, and such unsafety was prevented.

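A Rust shim can follow the same consolidation by keeping the iteration cursor inside the database object, so that no separate iterator exists to outlive it. The sketch below is illustrative only: the `example_dbm_*` functions, the `Dbm` fields, and the `BTreeMap` store are hypothetical. To stay dependency-free it returns pointers into the library's own key storage, which remain valid only until the database is next modified or closed, rather than handing over `malloc`-allocated copies as recommended earlier.

```rust
use std::collections::BTreeMap;
use std::os::raw::c_void;
use std::ptr;

#[repr(C)]
pub struct Datum {
    pub dptr: *mut c_void,
    pub dsize: usize, // assumed to match C's `size_t`
}

/// Hypothetical internal state. The cursor lives inside the database,
/// so its lifetime is tied to the database by construction.
pub struct Dbm {
    entries: BTreeMap<Vec<u8>, Vec<u8>>,
    cursor: usize, // index of the next key to hand out
}

impl Dbm {
    pub fn new() -> Self {
        Dbm { entries: BTreeMap::new(), cursor: 0 }
    }

    pub fn insert(&mut self, key: &[u8], value: &[u8]) {
        self.entries.insert(key.to_vec(), value.to_vec());
    }
}

const NULL_DATUM: Datum = Datum { dptr: ptr::null_mut(), dsize: 0 };

/// Returns the key at `index` and advances the cursor past it, or a null
/// datum at the end. Linear per call, which is acceptable for a sketch.
fn key_at(db: &mut Dbm, index: usize) -> Datum {
    match db.entries.keys().nth(index) {
        Some(key) => {
            db.cursor = index + 1;
            Datum { dptr: key.as_ptr() as *mut c_void, dsize: key.len() }
        }
        None => NULL_DATUM,
    }
}

/// # Safety
/// `db` must be null or point to a live `Dbm` owned by this library.
pub unsafe extern "C" fn example_dbm_firstkey(db: *mut Dbm) -> Datum {
    match unsafe { db.as_mut() } {
        Some(db) => key_at(db, 0),
        None => NULL_DATUM,
    }
}

/// # Safety
/// `db` must be null or point to a live `Dbm` owned by this library.
pub unsafe extern "C" fn example_dbm_nextkey(db: *mut Dbm) -> Datum {
    match unsafe { db.as_mut() } {
        Some(db) => {
            let index = db.cursor;
            key_at(db, index)
        }
        None => NULL_DATUM,
    }
}
```

With this shape, the caller still has to avoid using the database after closing it, but there is no second object whose lifetime must be tracked separately.
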
## Disadvantages

However, this design choice also has a number of drawbacks, which should be
considered as well.

First, the API itself becomes less expressive.
With POSIX DBM, there is only one iterator per object, and every call changes
its state. This is much more restrictive than iterators in almost any language,
even though it is safe. For other related objects whose lifetimes are less
hierarchical, this limitation may cost more than the safety is worth.

Second, depending on the relationships of the API's parts, significant design effort
may be involved. Many of the easier design points have other patterns associated
with them:

- [Wrapper Type Consolidation](./ffi-wrappers.md) groups multiple Rust types together
  into an opaque "object"

- [FFI Error Passing](../idioms/ffi-errors.md) explains error handling with integer
  codes and sentinel return values (such as `NULL` pointers)

- [Accepting Foreign Strings](../idioms/ffi-accepting-strings.md) allows accepting
  strings with minimal unsafe code, and is easier to get right than
  [Passing Strings to FFI](../idioms/ffi-passing-strings.md)

However, not every API can be done this way.
It is up to the best judgement of the programmer as to who their audience is.