text-encoding/README.md
Chris Daßler 3342f7e40b Initial commit: Text encoding component with UTF-8 polyfills
🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-08-29 14:54:44 +02:00

# Text Encoding Component
UTF-8 text encoding/decoding utilities with automatic polyfill support for React Native.
## Features
- **Standard Compliance**: Compatible with the standard TextEncoder/TextDecoder Web API
- **React Native Support**: Automatic polyfills for environments without native support
- **UTF-8 Only**: Focused implementation supporting only UTF-8 encoding for reliability
- **Performance**: Uses native implementations when available, falls back to efficient polyfills
- **TypeScript**: Full TypeScript support with comprehensive type definitions
## Installation
This package is designed to be loaded via IOR (Interoperable Object Reference) from Gitea:
```typescript
import { textEncoding } from 'ior:gitea:gitea.metatrom.net:universal-components/text-encoding@1.0.0';
```
## Usage
### Simple Text Encoding/Decoding
The easiest way to use this component is through the default service:
```typescript
import { textEncoding } from 'ior:gitea:gitea.metatrom.net:universal-components/text-encoding@1.0.0';
// Encode string to bytes
const encoded = textEncoding.encode('Hello, 世界! 🌍');
console.log(encoded); // Uint8Array
// Decode bytes to string
const decoded = textEncoding.decode(encoded);
console.log(decoded); // "Hello, 世界! 🌍"
```
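Because UTF-8 is a variable-width encoding, the byte count of the result usually differs from the string's `length`. The sketch below uses the platform's built-in `TextEncoder` rather than this component (an assumption made so it runs without the IOR loader); since the component is described as compatible with the standard API, `textEncoding.encode` should give the same bytes.

```typescript
// Assumption: the runtime provides a native TextEncoder (modern browsers, Node.js).
const encoder = new TextEncoder();
const text = 'Hello, 世界! 🌍';
const bytes = encoder.encode(text);

// ASCII characters take 1 byte, each CJK character here takes 3, the emoji takes 4.
console.log(text.length);  // 13 (UTF-16 code units; the emoji counts as 2)
console.log(bytes.length); // 19 (7 + 6 + 2 + 4 UTF-8 bytes)
```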
### Factory Functions
For more control, use the factory functions:
```typescript
import { createTextEncoder, createTextDecoder } from 'ior:gitea:gitea.metatrom.net:universal-components/text-encoding@1.0.0';
const encoder = createTextEncoder();
const decoder = createTextDecoder();
const bytes = encoder.encode('Hello World');
const text = decoder.decode(bytes);
```
### Advanced Usage
Create decoder instances with options:
```typescript
import { createTextDecoder } from 'ior:gitea:gitea.metatrom.net:universal-components/text-encoding@1.0.0';
// Throw on invalid sequences instead of using replacement character
const fatalDecoder = createTextDecoder('utf-8', { fatal: true });
// Ignore byte order mark
const ignoreBomDecoder = createTextDecoder('utf-8', { ignoreBOM: true });
```
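The `ignoreBOM` flag is easy to misread, so here is a small sketch of what it does in the standard `TextDecoder`. It uses the native class directly, on the assumption that `createTextDecoder` behaves the same way; the README does not spell out the polyfill's BOM behaviour.

```typescript
// Assumption: a native TextDecoder is available in the runtime.
// The bytes EF BB BF are the UTF-8 byte order mark, followed by "hi".
const withBom = new Uint8Array([0xef, 0xbb, 0xbf, 0x68, 0x69]);

// Default: the leading BOM is stripped from the decoded string.
const stripped = new TextDecoder('utf-8').decode(withBom);

// ignoreBOM: true leaves the BOM in place as U+FEFF.
const kept = new TextDecoder('utf-8', { ignoreBOM: true }).decode(withBom);

console.log(stripped.length);               // 2
console.log(kept.length);                   // 3
console.log(kept.charCodeAt(0) === 0xfeff); // true
```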
### Direct Polyfill Usage
Access polyfill classes directly for advanced use cases:
```typescript
import { TextEncoderPolyfill, TextDecoderPolyfill } from 'ior:gitea:gitea.metatrom.net:universal-components/text-encoding@1.0.0';
const encoder = new TextEncoderPolyfill();
const decoder = new TextDecoderPolyfill('utf-8', { fatal: false });
```
## API Reference
### textEncoding (Default Service)
The main service instance with convenient methods:
- `encode(text: string): Uint8Array` - Encode string to UTF-8 bytes
- `decode(bytes: Uint8Array | ArrayBuffer | number[]): string` - Decode bytes to string
- `stringToUtf8(text: string): Uint8Array` - Alias for encode()
- `utf8ToString(bytes: Uint8Array | number[]): string` - Alias for decode()
### Factory Functions
- `createTextEncoder(): ITextEncoder` - Create encoder instance
- `createTextDecoder(label?: string, options?: TextDecoderOptions): ITextDecoder` - Create decoder instance
- `installTextEncodingPolyfills(): void` - Install global polyfills
### Interfaces
#### ITextEncoder
- `encoding: string` - Always 'utf-8'
- `encode(input?: string): Uint8Array` - Encode string to bytes
- `encodeInto(source: string, destination: Uint8Array): TextEncoderEncodeIntoResult` - Encode into existing array
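As a rough illustration of `encodeInto`, the sketch below writes into a buffer that is too small for the whole string. It uses the native `TextEncoder` (an assumption, so the snippet runs standalone); in the standard API the result reports how many UTF-16 code units were `read` and how many bytes were `written`, and a character is never split across the end of the buffer.

```typescript
// Assumption: a native TextEncoder with encodeInto is available.
const encoder = new TextEncoder();
const destination = new Uint8Array(3);

// "h" needs 1 byte and "é" (U+00E9) needs 2, which fills the buffer exactly.
const result = encoder.encodeInto('h\u00e9llo', destination);

console.log(result.read);             // 2 (code units consumed from the source)
console.log(result.written);          // 3 (bytes written to the destination)
console.log(Array.from(destination)); // [104, 195, 169]
```

The caller can resume from `source.slice(result.read)` with a fresh or larger buffer.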
#### ITextDecoder
- `encoding: string` - Always 'utf-8'
- `fatal: boolean` - Whether to throw on invalid sequences
- `ignoreBOM: boolean` - Whether a leading byte order mark is kept in the output instead of being stripped (standard `ignoreBOM` semantics)
- `decode(input?: ArrayBufferView | ArrayBuffer, options?: TextDecodeOptions): string` - Decode bytes to string
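In the standard API, the `options` argument of `decode` carries a `stream` flag for input that arrives in chunks. The sketch below shows that behaviour with the native `TextDecoder`; whether this component's polyfill honours `stream` is not stated here, so treat that as an assumption to verify.

```typescript
// Assumption: a native TextDecoder is available.
const decoder = new TextDecoder('utf-8');

// The euro sign (U+20AC) is E2 82 AC in UTF-8; split it across two chunks.
const first = decoder.decode(new Uint8Array([0xe2, 0x82]), { stream: true });
const second = decoder.decode(new Uint8Array([0xac]), { stream: true });

// Calling decode() with no input flushes any bytes still buffered.
const tail = decoder.decode();

console.log(JSON.stringify(first));  // "" (incomplete sequence is buffered)
console.log(JSON.stringify(second)); // "€"
console.log(JSON.stringify(tail));   // ""
```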
## Error Handling
By default, invalid UTF-8 input is decoded with replacement characters instead of throwing. A decoder created in fatal mode throws on the same input:
```typescript
import { textEncoding, createTextDecoder } from 'ior:gitea:gitea.metatrom.net:universal-components/text-encoding@1.0.0';
// Invalid UTF-8 sequences are replaced with � (U+FFFD) by default
const invalidBytes = new Uint8Array([0xFF, 0xFE, 0xFD]);
const result = textEncoding.decode(invalidBytes);
console.log(result); // "���"
// Use fatal mode to throw on errors
const fatalDecoder = createTextDecoder('utf-8', { fatal: true });
try {
  fatalDecoder.decode(invalidBytes);
} catch (error) {
  // The caught value is `unknown` under strict TypeScript settings, so narrow it first
  console.error('Invalid UTF-8 sequence:', error instanceof Error ? error.message : error);
}
```
## Platform Support
- **React Native**: Full support with automatic polyfills
- **Node.js**: Uses native TextEncoder/TextDecoder when available
- **Browsers**: Uses native implementations in modern browsers
- **Automatic Fallback**: Seamlessly falls back to polyfills when native support is unavailable
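The fallback described above can be pictured as simple feature detection. The following is a hypothetical sketch, not the component's actual code: `pickEncoderCtor`, `EncoderLike`, and `AsciiOnlyEncoder` are names invented for illustration, and the fallback class is deliberately simplistic.

```typescript
// Hypothetical sketch: these names are illustrative and not taken from the component.
interface EncoderLike {
  encode(input?: string): Uint8Array;
}

type EncoderCtor = new () => EncoderLike;

// Return the native TextEncoder found on `scope`, or the fallback if there is none.
function pickEncoderCtor(scope: object, fallback: EncoderCtor): EncoderCtor {
  const candidate: unknown = Reflect.get(scope, 'TextEncoder');
  return typeof candidate === 'function' ? (candidate as EncoderCtor) : fallback;
}

// Stand-in fallback that only handles ASCII, to keep the sketch short.
// Non-ASCII input would be mangled by the 7-bit mask.
class AsciiOnlyEncoder implements EncoderLike {
  encode(input = ''): Uint8Array {
    const out = new Uint8Array(input.length);
    for (let i = 0; i < input.length; i++) {
      out[i] = input.charCodeAt(i) & 0x7f;
    }
    return out;
  }
}

// On a runtime with native support this resolves to the native class.
const Preferred = pickEncoderCtor(globalThis, AsciiOnlyEncoder);
// An empty scope simulates a runtime without TextEncoder.
const Forced = pickEncoderCtor({}, AsciiOnlyEncoder);

console.log(Array.from(new Preferred().encode('hi'))); // [104, 105]
console.log(Forced === AsciiOnlyEncoder);              // true
```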
## Performance Notes
- Native implementations are preferred when available for optimal performance
- Polyfills are optimized for correctness and reasonable performance
- UTF-8 validation is performed to ensure data integrity
- Surrogate pairs are combined into a single code point before encoding, so characters above U+FFFF round-trip correctly
## Unicode Support
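This implementation covers the parts of Unicode that matter for UTF-8 encoding and decoding:
- All Unicode scalar values (U+0000 to U+10FFFF, excluding the surrogate range U+D800 to U+DFFF, which has no UTF-8 form)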
- Proper surrogate pair handling for characters above U+FFFF
- UTF-8 validation with proper error handling
- BOM (Byte Order Mark) support with optional ignoring
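To make the surrogate pair point concrete, here is a minimal sketch of how a UTF-8 encoder of this kind typically works. `encodeUtf8` is a hypothetical helper written for illustration and is not the component's polyfill; it replaces lone surrogates with U+FFFD, which is what the standard `TextEncoder` does.

```typescript
// Hypothetical helper for illustration; not taken from the component's source.
function encodeUtf8(text: string): Uint8Array {
  const out: number[] = [];
  for (let i = 0; i < text.length; i++) {
    let cp = text.charCodeAt(i);
    if (cp >= 0xd800 && cp <= 0xdbff) {
      // High surrogate: combine with the following low surrogate if there is one.
      const next = i + 1 < text.length ? text.charCodeAt(i + 1) : 0;
      if (next >= 0xdc00 && next <= 0xdfff) {
        cp = 0x10000 + ((cp - 0xd800) << 10) + (next - 0xdc00);
        i++;
      } else {
        cp = 0xfffd; // lone high surrogate
      }
    } else if (cp >= 0xdc00 && cp <= 0xdfff) {
      cp = 0xfffd; // lone low surrogate
    }
    if (cp < 0x80) {
      out.push(cp);
    } else if (cp < 0x800) {
      out.push(0xc0 | (cp >> 6), 0x80 | (cp & 0x3f));
    } else if (cp < 0x10000) {
      out.push(0xe0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3f), 0x80 | (cp & 0x3f));
    } else {
      out.push(
        0xf0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3f),
        0x80 | ((cp >> 6) & 0x3f),
        0x80 | (cp & 0x3f),
      );
    }
  }
  return Uint8Array.from(out);
}

// U+1F30D is stored as the surrogate pair D83C DF0D and encodes to 4 bytes.
console.log(Array.from(encodeUtf8('\ud83c\udf0d'))); // [240, 159, 140, 141]
// A lone surrogate becomes U+FFFD (EF BF BD).
console.log(Array.from(encodeUtf8('\ud800')));       // [239, 191, 189]
```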
## Version Information
- Version: 1.0.0
- Component Name: text-encoding
- IOR: `ior:gitea:gitea.metatrom.net:universal-components/text-encoding@1.0.0`