HDK
TfUtf8CodePointIterator Class Reference final

#include <unicodeUtils.h>

Classes

class  PastTheEndSentinel
 

Public Types

using iterator_category = std::forward_iterator_tag
 
using value_type = TfUtf8CodePoint
 
using difference_type = std::ptrdiff_t
 
using pointer = void
 
using reference = TfUtf8CodePoint
 

Public Member Functions

 TfUtf8CodePointIterator (const std::string_view::const_iterator &it, const std::string_view::const_iterator &end)
 
value_type operator* () const
 
std::string_view::const_iterator GetBase () const
 Retrieves the wrapped string iterator.
 
bool operator== (const TfUtf8CodePointIterator &rhs) const
 
bool operator!= (const TfUtf8CodePointIterator &rhs) const
 
TfUtf8CodePointIterator & operator++ ()
 
TfUtf8CodePointIterator operator++ (int)
 

Friends

bool operator== (const TfUtf8CodePointIterator &lhs, PastTheEndSentinel)
 
bool operator== (PastTheEndSentinel lhs, const TfUtf8CodePointIterator &rhs)
 
bool operator!= (const TfUtf8CodePointIterator &lhs, PastTheEndSentinel rhs)
 
bool operator!= (PastTheEndSentinel lhs, const TfUtf8CodePointIterator &rhs)
 

Detailed Description

Defines an iterator over a UTF-8 encoded string that extracts unicode code point values.

UTF-8 is a variable length encoding, meaning that one Unicode code point can be encoded in UTF-8 as 1, 2, 3, or 4 bytes. This iterator takes care of consuming the valid UTF-8 bytes for a code point while incrementing.
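The variable-length layout can be illustrated with a small, self-contained sketch. It is not taken from unicodeUtils.h: the helper names SequenceLengthFromLeadByte and CountCodePoints are hypothetical, and the code only shows how the leading byte of a sequence announces how many bytes the sequence spans.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper, not part of unicodeUtils.h.
// Returns how many bytes a sequence starting with `lead` claims to span,
// or 0 when `lead` cannot start a sequence (for example a continuation byte).
constexpr std::size_t SequenceLengthFromLeadByte(unsigned char lead)
{
    if (lead < 0x80) return 1;            // 0xxxxxxx: one byte (ASCII)
    if ((lead & 0xE0) == 0xC0) return 2;  // 110xxxxx: two bytes
    if ((lead & 0xF0) == 0xE0) return 3;  // 1110xxxx: three bytes
    if ((lead & 0xF8) == 0xF0) return 4;  // 11110xxx: four bytes
    return 0;
}

// Hypothetical helper: counts sequences by hopping from lead byte to lead
// byte. Bytes that cannot start a sequence are counted one at a time.
inline std::size_t CountCodePoints(std::string_view text)
{
    std::size_t count = 0;
    std::size_t i = 0;
    while (i < text.size()) {
        const std::size_t len =
            SequenceLengthFromLeadByte(static_cast<unsigned char>(text[i]));
        i += (len == 0) ? 1 : len;
        ++count;
    }
    return count;
}
```

The sketch inspects lead bytes only. It does not validate continuation bytes or the decoded value, which the iterator's dereference operator is documented to do.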

Definition at line 116 of file unicodeUtils.h.

Member Typedef Documentation

Definition at line 120 of file unicodeUtils.h.

using TfUtf8CodePointIterator::iterator_category = std::forward_iterator_tag

Definition at line 118 of file unicodeUtils.h.

Definition at line 121 of file unicodeUtils.h.

Constructor & Destructor Documentation

TfUtf8CodePointIterator::TfUtf8CodePointIterator ( const std::string_view::const_iterator &  it,
const std::string_view::const_iterator &  end 
)
inline

Constructs an iterator that reads UTF-8 character sequences starting at the given string_view iterator it. The end iterator guards against reading byte sequences past the end of the source string.

When working with views of substrings, end must not point to a continuation byte in a valid UTF-8 byte sequence to avoid decoding errors.
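One way to respect the constraint on end when cutting a substring is to move the cut position back until it no longer lands on a continuation byte. The following is a minimal sketch under that assumption. SnapToSequenceBoundary is a hypothetical helper and is not provided by unicodeUtils.h.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper, not part of unicodeUtils.h.
// Returns the largest offset <= pos that does not point at a UTF-8
// continuation byte (10xxxxxx), so that text.substr(0, result) never ends
// in the middle of a multi-byte sequence of well-formed input.
inline std::size_t SnapToSequenceBoundary(std::string_view text,
                                          std::size_t pos)
{
    if (pos > text.size()) {
        pos = text.size();
    }
    while (pos > 0 && pos < text.size() &&
           (static_cast<unsigned char>(text[pos]) & 0xC0) == 0x80) {
        --pos;
    }
    return pos;
}
```

A view built as text.substr(0, SnapToSequenceBoundary(text, n)) then has an end iterator that sits on a lead byte or at the end of the string.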

Definition at line 135 of file unicodeUtils.h.

Member Function Documentation

std::string_view::const_iterator TfUtf8CodePointIterator::GetBase ( ) const
inline

Retrieves the wrapped string iterator.

Definition at line 153 of file unicodeUtils.h.

bool TfUtf8CodePointIterator::operator!= ( const TfUtf8CodePointIterator & rhs) const
inline

Determines if two iterators are unequal. This intentionally does not consider the end iterator to allow for comparison of iterators between different substring views of the same underlying string.

Definition at line 171 of file unicodeUtils.h.

value_type TfUtf8CodePointIterator::operator* ( ) const
inline

Retrieves the current UTF-8 character in the sequence as its Unicode code point value. Returns TfUtf8InvalidCodePoint when the byte sequence pointed to by the iterator cannot be decoded.

A code point might be invalid because it's incorrectly encoded, exceeds the maximum allowed value, or is in the disallowed surrogate range.
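The three failure cases above can be sketched as a standalone decoder. This is an illustration only, not the library's implementation: DecodeFirstCodePoint and kSketchInvalidCodePoint are hypothetical names, and using U+FFFD as the stand-in for TfUtf8InvalidCodePoint is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>

// Stand-in for the library's invalid marker. The value U+FFFD is an
// assumption made for this sketch.
constexpr std::uint32_t kSketchInvalidCodePoint = 0xFFFD;

// Hypothetical helper: decodes the sequence at the start of `bytes`.
inline std::uint32_t DecodeFirstCodePoint(std::string_view bytes)
{
    if (bytes.empty()) return kSketchInvalidCodePoint;

    const unsigned char lead = static_cast<unsigned char>(bytes[0]);
    if (lead < 0x80) return lead;  // single-byte (ASCII) sequence

    std::size_t length = 0;
    std::uint32_t codePoint = 0;
    std::uint32_t smallest = 0;  // smallest value this length may encode
    if ((lead & 0xE0) == 0xC0) {
        length = 2; codePoint = lead & 0x1F; smallest = 0x80;
    } else if ((lead & 0xF0) == 0xE0) {
        length = 3; codePoint = lead & 0x0F; smallest = 0x800;
    } else if ((lead & 0xF8) == 0xF0) {
        length = 4; codePoint = lead & 0x07; smallest = 0x10000;
    } else {
        return kSketchInvalidCodePoint;  // stray continuation or bad lead
    }

    if (bytes.size() < length) return kSketchInvalidCodePoint;  // truncated

    for (std::size_t i = 1; i < length; ++i) {
        const unsigned char next = static_cast<unsigned char>(bytes[i]);
        if ((next & 0xC0) != 0x80) return kSketchInvalidCodePoint;
        codePoint = (codePoint << 6) | (next & 0x3Fu);
    }

    // Incorrectly encoded: a shorter sequence could hold this value.
    if (codePoint < smallest) return kSketchInvalidCodePoint;
    // Exceeds the maximum allowed Unicode value.
    if (codePoint > 0x10FFFF) return kSketchInvalidCodePoint;
    // Falls in the disallowed surrogate range.
    if (codePoint >= 0xD800 && codePoint <= 0xDFFF) {
        return kSketchInvalidCodePoint;
    }
    return codePoint;
}
```

For example, the bytes E2 82 AC decode to U+20AC, while C0 80 (an over-long encoding of U+0000) and ED A0 80 (the surrogate U+D800) both yield the invalid marker in this sketch.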

Definition at line 147 of file unicodeUtils.h.

TfUtf8CodePointIterator& TfUtf8CodePointIterator::operator++ ( )
inline

Advances the iterator logically one UTF-8 character sequence in the string. The underlying string iterator will be advanced according to the variable length encoding of the next UTF-8 character, but will never consume non-continuation bytes after the current one.
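The advancing rule can be sketched on raw string_view iterators. The function below is a hypothetical illustration, not the library's code: it steps over the lead byte, then over at most as many continuation bytes as the lead byte announces, and stops early at the first non-continuation byte or at end. Treating a byte that cannot start a sequence as a one-byte step is an assumption of this sketch.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical helper, not part of unicodeUtils.h.
// Returns the position just past the sequence that starts at `it`.
inline std::string_view::const_iterator AdvanceOneSequence(
    std::string_view::const_iterator it,
    std::string_view::const_iterator end)
{
    if (it == end) {
        return it;
    }

    const unsigned char lead = static_cast<unsigned char>(*it);
    int continuationBytes = 0;  // assumption: invalid lead bytes step by one
    if ((lead & 0xE0) == 0xC0) continuationBytes = 1;
    else if ((lead & 0xF0) == 0xE0) continuationBytes = 2;
    else if ((lead & 0xF8) == 0xF0) continuationBytes = 3;

    ++it;  // consume the lead byte
    while (continuationBytes > 0 && it != end &&
           (static_cast<unsigned char>(*it) & 0xC0) == 0x80) {
        ++it;  // consume one continuation byte (10xxxxxx)
        --continuationBytes;
    }
    return it;
}
```

With the malformed input E2 82 followed by the letter z, the sketch stops after two bytes and leaves z to be read as the start of the next sequence.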

Definition at line 181 of file unicodeUtils.h.

TfUtf8CodePointIterator TfUtf8CodePointIterator::operator++ ( int  )
inline

Advances the iterator logically one UTF-8 character sequence in the string. The underlying string iterator will be advanced according to the variable length encoding of the next UTF-8 character, but will never consume non-continuation bytes after the current one.

Definition at line 213 of file unicodeUtils.h.

bool TfUtf8CodePointIterator::operator== ( const TfUtf8CodePointIterator & rhs) const
inline

Determines if two iterators are equal. This intentionally does not consider the end iterator to allow for comparison of iterators between different substring views of the same underlying string.

Definition at line 162 of file unicodeUtils.h.

Friends And Related Function Documentation

bool operator!= ( const TfUtf8CodePointIterator & lhs,
PastTheEndSentinel  rhs 
)
friend

Definition at line 234 of file unicodeUtils.h.

bool operator!= ( PastTheEndSentinel  lhs,
const TfUtf8CodePointIterator & rhs 
)
friend

Definition at line 239 of file unicodeUtils.h.

bool operator== ( const TfUtf8CodePointIterator & lhs,
PastTheEndSentinel   
)
friend

Checks if the lhs iterator is at or past the end of the underlying string_view.
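The sentinel comparison can be illustrated with a reduced stand-in class. SketchIterator and CountSequences are hypothetical and mirror only the comparison behaviour described on this page: iterator-to-iterator equality looks at the current position alone, while comparison against the sentinel checks the position against the stored end. Decoding is left out, and the stepping logic is a simplification.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical stand-in, not the class from unicodeUtils.h.
class SketchIterator {
public:
    struct PastTheEndSentinel {};
    using Base = std::string_view::const_iterator;

    SketchIterator(Base it, Base end) : _it(it), _end(end) {}

    Base GetBase() const { return _it; }

    // Equality ignores _end, so iterators over different substring views of
    // the same underlying string compare by position only.
    bool operator==(const SketchIterator& rhs) const { return _it == rhs._it; }
    bool operator!=(const SketchIterator& rhs) const { return _it != rhs._it; }

    // Steps over one sequence by lead-byte length, clamped to _end.
    SketchIterator& operator++()
    {
        assert(_it < _end);  // incrementing at the end is not supported here
        const unsigned char lead = static_cast<unsigned char>(*_it);
        std::ptrdiff_t len = 1;
        if ((lead & 0xE0) == 0xC0) len = 2;
        else if ((lead & 0xF0) == 0xE0) len = 3;
        else if ((lead & 0xF8) == 0xF0) len = 4;
        const std::ptrdiff_t remaining = _end - _it;
        _it += (len < remaining) ? len : remaining;
        return *this;
    }

    // "At or past the end" check against the stored end iterator.
    friend bool operator==(const SketchIterator& lhs, PastTheEndSentinel)
    {
        return lhs._it >= lhs._end;
    }
    friend bool operator!=(const SketchIterator& lhs, PastTheEndSentinel rhs)
    {
        return !(lhs == rhs);
    }

private:
    Base _it;
    Base _end;
};

// Hypothetical helper: walks a view until the sentinel comparison is true.
inline std::size_t CountSequences(std::string_view text)
{
    std::size_t count = 0;
    for (SketchIterator it(text.begin(), text.end());
         it != SketchIterator::PastTheEndSentinel{}; ++it) {
        ++count;
    }
    return count;
}
```

Comparing against a sentinel means a loop needs only one iterator object. No separate end iterator has to be constructed for the loop condition.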

Definition at line 222 of file unicodeUtils.h.

bool operator== ( PastTheEndSentinel  lhs,
const TfUtf8CodePointIterator & rhs 
)
friend

Definition at line 228 of file unicodeUtils.h.


The documentation for this class was generated from the following file: