Remove UB on overflow in Allocate::next() #250
ImmemorConsultrixContrarie wants to merge 2 commits into amethyst:master from
Remove all unsafe in `Allocate::next()`.
Probably you guys should just revert #238. The problem with a possible UB it introduced is that anyone can write
use legion::world::Allocate;
fn main() {
    // Assuming `sizeof(usize) == sizeof(u64)`.
    let zero_nonzero = Allocate::new().skip(usize::MAX - 16).next();
}
and get UB in this totally safe code. Yeah, currently overflow in this crate's code is totally impossible, as
TomGillen left a comment
I can see that we should use NonZeroU64::new() instead of NonZeroU64::new_unchecked, so that on overflow the iterator returns None rather than constructing a NonZeroU64 with 0.
I am not sure how, after making that change, the Allocator iterator being public could cause UB?
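To make the difference concrete, here is a minimal sketch of checked construction. The `Entity` newtype and the `entity_from_raw` helper are hypothetical stand-ins for illustration, not the crate's actual definitions; only `NonZeroU64::new` is the real standard library call.

```rust
use std::num::NonZeroU64;

/// Hypothetical stand-in for the crate's entity ID type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Entity(NonZeroU64);

impl Entity {
    /// Returns the raw numeric ID.
    pub fn raw(self) -> u64 {
        self.0.get()
    }
}

/// Checked construction (hypothetical helper): a raw ID of 0, which is what
/// a wrapped counter would produce, yields `None` instead of an invalid
/// `NonZeroU64`.
pub fn entity_from_raw(raw: u64) -> Option<Entity> {
    NonZeroU64::new(raw).map(Entity)
}
```

With `new_unchecked`, passing 0 is undefined behaviour; with `new`, the same input is an ordinary `None` that the caller can unwrap (and panic on) or handle.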
// This is either the first block, or we overflowed to the next block.
self.next = NEXT_ENTITY.fetch_add(BLOCK_SIZE, Ordering::Relaxed);
debug_assert_eq!(self.next % BLOCK_SIZE, 0);
static NEXT_ENTITY_BLOCK_START: AtomicU64 = AtomicU64::new(1);
Initializing this to BLOCK_SIZE has the same effect of preventing the first entity from being allocated as 0, but without causing 1 entity in every block to be skipped. It skips the first block instead (which isn't a problem).
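A rough sketch of that suggestion, under stated assumptions: `BlockAllocator` is a hypothetical type that borrows its counter instead of using the crate's static, and `BLOCK_SIZE` is taken to be 16 as in the benchmark description. It is not the crate's implementation.

```rust
use std::num::NonZeroU64;
use std::sync::atomic::{AtomicU64, Ordering};

/// Assumed block size (the PR description mentions a BLOCK_SIZE of 16).
pub const BLOCK_SIZE: u64 = 16;

/// Hypothetical block allocator; the counter is passed in rather than
/// being a static so the sketch is easy to exercise.
pub struct BlockAllocator<'a> {
    counter: &'a AtomicU64,
    next: u64,
}

impl<'a> BlockAllocator<'a> {
    pub fn new(counter: &'a AtomicU64) -> Self {
        Self { counter, next: 0 }
    }
}

impl Iterator for BlockAllocator<'_> {
    type Item = NonZeroU64;

    fn next(&mut self) -> Option<NonZeroU64> {
        if self.next % BLOCK_SIZE == 0 {
            // First call, or the current block is used up: claim a new block.
            self.next = self.counter.fetch_add(BLOCK_SIZE, Ordering::Relaxed);
        }
        // Checked construction: a block starting at 0 yields `None`.
        let id = NonZeroU64::new(self.next)?;
        self.next = self.next.wrapping_add(1);
        Some(id)
    }
}
```

Starting the counter at `BLOCK_SIZE` means the first claimed block begins at 16, so ID 0 is never handed out and every later block is used in full. A counter that has wrapped around to 0 produces `None` on the next block claim.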
// Safety: self.next can't be 0 as long as the first block is skipped,
// and no overflow occurs in NEXT_ENTITY
let entity = unsafe {
    debug_assert_ne!(self.next, 0);
    Entity(NonZeroU64::new_unchecked(self.next))
};
@TomGillen
Here's the problem: you have no UB as long as NEXT_ENTITY does not overflow. But how do you know that it won't overflow? Welp, you don't know. If the game runs long enough, it will overflow at some point in time.
Alternatives:
- Skip the first block, use `NonZero::new`, return `None` after overflow. There is no UB, but this breaks the `Allocate` promise to never return `None`, and the game will crash with a panic on overflow, because we expect `Allocate` to never return `None` and simply get the next entity by `Allocate.next().unwrap()`.
- Skip the first ID in every block. No UB, never returns `None`. The game is probably doomed to have bugs after overflow anyway, due to rewriting old entities, but it won't crash with a panic.
- Do not use `NonZero`. No UB, never returns `None`, more performant than the previous alternative, but has the downside of `Option<Entity>` not being the same size as `Entity`.
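The size trade-off mentioned in the last alternative can be checked directly. This is a small sketch with two hypothetical newtypes (`NonZeroEntity` and `PlainEntity`), neither of which is the crate's actual `Entity`:

```rust
use std::mem::size_of;
use std::num::NonZeroU64;

/// Hypothetical entity backed by `NonZeroU64`: the forbidden value 0 gives
/// `Option` a niche to store `None` in.
#[repr(transparent)]
pub struct NonZeroEntity(pub NonZeroU64);

/// Hypothetical entity backed by a plain `u64`: every bit pattern is a
/// valid ID, so `Option` needs extra space for its discriminant.
#[repr(transparent)]
pub struct PlainEntity(pub u64);

/// Returns (size of `Option<NonZeroEntity>`, size of `Option<PlainEntity>`).
pub fn option_sizes() -> (usize, usize) {
    (
        size_of::<Option<NonZeroEntity>>(),
        size_of::<Option<PlainEntity>>(),
    )
}
```

The first size equals `size_of::<u64>()`, while the second is larger, which is the cost of dropping `NonZero`.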
Option 1 is what it is designed to do.
If the internal counter has overflowed, the system will panic (or it should, with checked NonZeroU64 construction). You have exhausted the available 64 bit address space. Every ECS has this issue (generational IDs or not) unless they don't provide unique IDs at all.
If you allocated 1000 entities every frame at 60fps, it would take 10 million years before your program panicked. Even single entity allocations that waste most of a block aren't much worse in reality.
The solution for someone experiencing this issue would be to switch the internal ID from a u64 to a u128. That would give you ~2x10^26 years until panic.
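The arithmetic behind these estimates can be reproduced with a short sketch. The function name and the use of a 365.25-day year are assumptions made for illustration; floating point is used because only the order of magnitude matters.

```rust
/// Seconds in a year, assuming 365.25 days.
pub const SECONDS_PER_YEAR: f64 = 365.25 * 24.0 * 60.0 * 60.0;

/// Hypothetical helper: years until an ID space of `2^bits` values is
/// exhausted at a constant allocation rate of `ids_per_second`.
pub fn years_until_exhaustion(bits: i32, ids_per_second: f64) -> f64 {
    let id_space = 2f64.powi(bits);
    id_space / ids_per_second / SECONDS_PER_YEAR
}

/// The rate used in the comment above: 1000 entities per frame at 60 fps.
pub fn example_rate() -> f64 {
    1000.0 * 60.0
}
```

Calling `years_until_exhaustion(64, example_rate())` gives a value just under ten million years, and `years_until_exhaustion(128, example_rate())` gives roughly 2x10^26 years, matching the figures quoted above.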
Options 2 and 3 will cause the application to behave incorrectly in bizarre and difficult to diagnose ways instead of panicking, which I am not convinced is better.
"bizarre and difficult to diagnose ways instead of panicking"
Yeah, you are right, after ten million years panic is better.
Though, there are ways to avoid those bizarre bugs with something like
fn leak(e: Entity) -> EntityLocation;
which would remove the entity ID but won't remove the entity data from the archetype. But the thing is that users should call it on forever-living entities, and if the user forgets to do so, we are back to bizarre bugs.
Okay, ten million years is a thing.
Apologies for bumping this issue, but I have to ask: If the intention is to have it panic when NonZeroU64 overflows, then perhaps that should be included in a comment in this function, if not in the documentation for Allocate as a whole? Maybe a line like this:
Allocate will eventually overflow if enough IDs are generated. However, this is very unlikely to be a concern in reality, unless you plan on running your program for a few hundred millennia, and are generating hundreds of thousands of entity IDs every second during that timespan.
It just seems odd that you'd intend for the program to crash when it overflows, yet not document that fact.
Description
Removes UB on overflow and also removes all unsafe in `Allocate::next()`, since I haven't found any real speed differences between unsafe and safe functions.
Also moves the single-use static item as close as possible to the place where it's used.
Pros: no unsafe, no UB on overflow, even if that overflow is unlikely to ever happen.
Cons: slight performance regression, gets overflowed faster (one index per block is ignored and never used), doesn't fix any overflow bugs like overwriting very old entities after overflow.
Benchmark results:
Expected and somewhat big difference on the `BLOCK_SIZE` (16) testcase, since now it skips one item per block and the actual block size regressed to `BLOCK_SIZE - 1`.
~2% regression in the 100k testcase.
Motivation and Context
Because, well, I think that even highly unlikely UB is a bad thing.
How Has This Been Tested?
I used this benchmark to get the time of old and new iterators. It uses #249 to avoid vector reallocation noise.
Checklist:
Edit: accidentally loaded PR with incomplete message.