[BUG][Inquiry/Bug?] Unit contract consistency and load/store asymmetry in RegularTileIterator (PitchLinear

### Which component has the problem?

CUTLASS C++

### Bug Report

### Summary
While reviewing the `RegularTileIterator` specialization for `layout::PitchLinear`, I encountered a potential inconsistency regarding how pointer offsets are interpreted (Element units vs. byte units), along with an apparent asymmetry between the `load()` and `store()` tile offset calculations when `kElementsPerAccess > 1`. 

I would like to clarify whether this behavior is intended for specific SIMT configurations or whether it represents a latent issue for vectorized paths.

---

### 1. Observation: Unit contract in `add_pointer_offset()`
In the CUTLASS `TileIterator` concept, `add_pointer_offset(LongIndex offset)` is generally documented to operate in units of **Elements**. However, the `PitchLinear` specialization appears to apply the offset directly to a `uint8_t*`:

```cpp
// regular_tile_iterator_pitch_linear.h
void add_pointer_offset(LongIndex pointer_offset) {
  pointer_ += pointer_offset; // pointer_ is uint8_t*
}
```

This effectively treats the input as **bytes**. Furthermore, `add_tile_offset()` pre-calculates a byte-sized offset before passing it to this function:

```cpp
int offset = sizeof_bits<Element>::value * (...) / 8;
add_pointer_offset(offset);
```

This suggests that while the internal usage is consistent with byte semantics, it may deviate from the broader `Element`-based contract expected by generic abstractions.
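To make the difference between the two readings concrete, here is a minimal sketch. The helper names are hypothetical and are not CUTLASS APIs. It only models how far a byte pointer would move under each interpretation of the same `pointer_offset` value, assuming a 4-byte `float`.

```cpp
#include <cstddef>

// Hypothetical helpers for illustration only; none of these are CUTLASS APIs.

// Byte-unit reading: the offset is added to a uint8_t* unchanged,
// as in `pointer_ += pointer_offset`.
constexpr std::ptrdiff_t bytes_advanced_as_byte_offset(std::ptrdiff_t offset) {
  return offset;
}

// Element-unit reading: the offset counts Elements, so a byte pointer
// must move by offset * sizeof(Element).
template <typename Element>
constexpr std::ptrdiff_t bytes_advanced_as_element_offset(std::ptrdiff_t offset) {
  return offset * static_cast<std::ptrdiff_t>(sizeof(Element));
}

// Assumption: float is 4 bytes on the target platform.
static_assert(sizeof(float) == 4, "example assumes 4-byte float");
static_assert(bytes_advanced_as_byte_offset(16) == 16, "byte units: 16 bytes");
static_assert(bytes_advanced_as_element_offset<float>(16) == 64,
              "element units: 64 bytes");
```

A generic caller that passes 16 expecting 16 elements would, under the byte reading, land only 16 bytes (4 floats) ahead.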

---

### 2. Potential Asymmetry in Contiguous Offset Handling

There appears to be a divergence in how `load()` and `store()` handle their contiguous coordinates.

**In the `load()` path (Line 142):**
The contiguous coordinate is divided by `ThreadMap::kElementsPerAccess`:

* `tile_offset.contiguous() * Shape::kContiguous / ThreadMap::kElementsPerAccess`

**In the `store()` path (Line 175):**
No such division is applied:

* `tile_offset.contiguous() * Shape::kContiguous`

Since both results are eventually passed to `load_with_pointer_offset` (or `store_with_...`), which applies a `sizeof(Element)` multiplier, this leads to different base addresses for the same logical tile offset when `kElementsPerAccess > 1`.

---

### 3. Example illustrating the mismatch

Assume a configuration where:

* `tile_offset.contiguous() * Shape::kContiguous` = 64 elements
* `Element` = `float` (4 bytes)
* `kElementsPerAccess` = 4

**Resulting address offsets:**

* **`store()` path:** `64 * 4 = 256` bytes
* **`load()` path:** `(64 / 4) * 4 = 64` bytes

This suggests that for vectorized configurations, `load()` and `store()` may reference different memory locations for the same tile offset.
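The arithmetic can be checked with a small standalone model. This is a sketch based on my reading of the two expressions quoted in section 2, not the actual CUTLASS code: the function names are hypothetical, and the `sizeof(Element)` scaling is passed in as a plain `bytes_per_element` parameter.

```cpp
#include <cstdint>

// Hypothetical model of the two paths; names are illustrative, not CUTLASS code.

// load(): the contiguous element offset is divided by kElementsPerAccess
// before the element-size multiplier is applied.
constexpr std::int64_t load_path_byte_offset(std::int64_t contiguous_elements,
                                             std::int64_t elements_per_access,
                                             std::int64_t bytes_per_element) {
  return contiguous_elements / elements_per_access * bytes_per_element;
}

// store(): no division; only the element-size multiplier is applied.
constexpr std::int64_t store_path_byte_offset(std::int64_t contiguous_elements,
                                              std::int64_t bytes_per_element) {
  return contiguous_elements * bytes_per_element;
}

// 64 elements, float (4 bytes), kElementsPerAccess = 4.
static_assert(load_path_byte_offset(64, 4, 4) == 64, "load path: 64 bytes");
static_assert(store_path_byte_offset(64, 4) == 256, "store path: 256 bytes");

// With kElementsPerAccess = 1 the two paths agree.
static_assert(load_path_byte_offset(64, 1, 4) == store_path_byte_offset(64, 4),
              "no mismatch when the division is a no-op");
```

In this model the two offsets differ by exactly a factor of `kElementsPerAccess`, which is why the mismatch would disappear when that value is 1.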

---

### 4. Context and Possible Directions

This behavior might be masked in many **SIMT kernels** where `ThreadMap::kElementsPerAccess` defaults to 1, making the division a no-op. However, it could lead to unexpected results in vectorized paths.

**Possible directions for resolution:**

1. Align `load()` and `store()` to use a consistent unit convention for contiguous offsets.
2. Clarify the intended unit contract for `add_pointer_offset()` (Elements vs. Bytes).
3. If byte semantics are intended, consider renaming the internal API to `add_byte_offset()` to prevent ambiguity in generic contexts.
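As a rough illustration of direction 3, the sketch below separates the two unit conventions into distinctly named methods. `OffsetTracker` is a hypothetical stand-in that tracks a byte offset instead of a real pointer. It is not a proposed patch, and it assumes whole-byte element types and C++14 or later (sub-byte types would need a bit-based calculation like the `sizeof_bits<Element>::value * (...) / 8` expression quoted above).

```cpp
#include <cstdint>

// Hypothetical stand-in for an iterator; tracks a byte offset, not a pointer.
template <typename Element>
struct OffsetTracker {
  std::int64_t byte_offset = 0;

  // Explicitly byte-based: the name states the unit.
  constexpr void add_byte_offset(std::int64_t bytes) { byte_offset += bytes; }

  // Element-based, matching the generic contract; forwards in bytes.
  constexpr void add_pointer_offset(std::int64_t elements) {
    add_byte_offset(elements * static_cast<std::int64_t>(sizeof(Element)));
  }
};

// Helper that applies one element-unit step and reports the byte offset.
template <typename Element>
constexpr std::int64_t bytes_after_element_step(std::int64_t elements) {
  OffsetTracker<Element> it{};
  it.add_pointer_offset(elements);
  return it.byte_offset;
}

// Helper that applies one byte-unit step and reports the byte offset.
template <typename Element>
constexpr std::int64_t bytes_after_byte_step(std::int64_t bytes) {
  OffsetTracker<Element> it{};
  it.add_byte_offset(bytes);
  return it.byte_offset;
}

// Assumption: float is 4 bytes on the target platform.
static_assert(bytes_after_element_step<float>(16) == 64, "16 floats = 64 bytes");
static_assert(bytes_after_byte_step<float>(16) == 16, "16 bytes stay 16 bytes");
```

With this split, internal callers such as `add_tile_offset()` could keep computing byte offsets, while generic code continues to pass element counts.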

