Initial commit: Text encoding component with UTF-8 polyfills
🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>
159
README.md
Normal file
@@ -0,0 +1,159 @@
# Text Encoding Component

UTF-8 text encoding and decoding utilities with automatic polyfill support for React Native.

## Features

- **Standard Compliance**: Compatible with the standard TextEncoder/TextDecoder Web API
- **React Native Support**: Automatic polyfills for environments without native support
- **UTF-8 Only**: Focused implementation supporting only UTF-8 encoding for reliability
- **Performance**: Uses native implementations when available and falls back to polyfills otherwise
- **TypeScript**: Full TypeScript support with comprehensive type definitions

## Installation

This package is designed to be loaded via IOR (Interoperable Object Reference) from Gitea:

```typescript
import { textEncoding } from 'ior:gitea:gitea.metatrom.net:universal-components/text-encoding@1.0.0';
```

## Usage

### Simple Text Encoding/Decoding

The easiest way to use this component is through the default service:

```typescript
import { textEncoding } from 'ior:gitea:gitea.metatrom.net:universal-components/text-encoding@1.0.0';

// Encode string to bytes
const encoded = textEncoding.encode('Hello, 世界! 🌍');
console.log(encoded); // Uint8Array

// Decode bytes to string
const decoded = textEncoding.decode(encoded);
console.log(decoded); // "Hello, 世界! 🌍"
```

### Factory Functions

For more control, use the factory functions:

```typescript
import { createTextEncoder, createTextDecoder } from 'ior:gitea:gitea.metatrom.net:universal-components/text-encoding@1.0.0';

const encoder = createTextEncoder();
const decoder = createTextDecoder();

const bytes = encoder.encode('Hello World');
const text = decoder.decode(bytes);
```
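
The `ITextEncoder` interface also lists `encodeInto`, which writes into a caller-supplied buffer instead of allocating a new array. The sketch below uses the platform's global `TextEncoder` and `TextDecoder` as stand-ins, on the assumption that an encoder returned by `createTextEncoder()` follows the same standard behavior. The names `target`, `sample`, and `writtenPrefix` are illustrative only.

```typescript
// Stand-in for createTextEncoder(): the standard global TextEncoder.
const intoEncoder = new TextEncoder();

// "hi €" needs 6 bytes in UTF-8 ("€" alone takes 3), so a 4-byte target is too small.
const sample = 'hi €';
const target = new Uint8Array(4);

// encodeInto stops before any character that does not fit completely.
const { read, written } = intoEncoder.encodeInto(sample, target);

console.log(read);    // 3 (UTF-16 code units consumed: "h", "i", " ")
console.log(written); // 3 (bytes written to target)

// Only the first `written` bytes of the target hold valid output.
const writtenPrefix = new TextDecoder().decode(target.subarray(0, written));
console.log(JSON.stringify(writtenPrefix)); // "hi "
```

Because a multi-byte character is never split, callers can resume from `sample.slice(read)` with a fresh or larger buffer.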

### Advanced Usage

Create decoder instances with options:

```typescript
import { createTextDecoder } from 'ior:gitea:gitea.metatrom.net:universal-components/text-encoding@1.0.0';

// Throw on invalid sequences instead of using replacement character
const fatalDecoder = createTextDecoder('utf-8', { fatal: true });

// Under the standard TextDecoder semantics, ignoreBOM: true keeps a leading
// byte order mark in the output instead of stripping it
const ignoreBomDecoder = createTextDecoder('utf-8', { ignoreBOM: true });
```
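
To make the effect of `ignoreBOM` concrete, here is a minimal sketch using the platform's global `TextDecoder`. It assumes that decoders from `createTextDecoder()` mirror the standard behavior, which this README states as a goal but which is not verified here. The variable names are illustrative.

```typescript
// Stand-in for createTextDecoder(): the standard global TextDecoder.
// Input: UTF-8 byte order mark (EF BB BF) followed by "hi".
const bomBytes = new Uint8Array([0xef, 0xbb, 0xbf, 0x68, 0x69]);

// Default (ignoreBOM: false): the BOM is consumed and left out of the result.
const bomStripped = new TextDecoder('utf-8').decode(bomBytes);

// ignoreBOM: true: the BOM is kept as a leading U+FEFF character.
const bomKept = new TextDecoder('utf-8', { ignoreBOM: true }).decode(bomBytes);

console.log(bomStripped.length);                // 2
console.log(bomKept.length);                    // 3
console.log(bomKept.charCodeAt(0) === 0xfeff);  // true
```

In other words, "ignore" here means the decoder does not treat the BOM specially, so it passes through to the output.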

### Direct Polyfill Usage

Access polyfill classes directly for advanced use cases:

```typescript
import { TextEncoderPolyfill, TextDecoderPolyfill } from 'ior:gitea:gitea.metatrom.net:universal-components/text-encoding@1.0.0';

const encoder = new TextEncoderPolyfill();
const decoder = new TextDecoderPolyfill('utf-8', { fatal: false });
```

## API Reference

### textEncoding (Default Service)

The main service instance with convenient methods:

- `encode(text: string): Uint8Array` - Encode string to UTF-8 bytes
- `decode(bytes: Uint8Array | ArrayBuffer | number[]): string` - Decode bytes to string
- `stringToUtf8(text: string): Uint8Array` - Alias for encode()
- `utf8ToString(bytes: Uint8Array | number[]): string` - Alias for decode()

### Factory Functions

- `createTextEncoder(): ITextEncoder` - Create encoder instance
- `createTextDecoder(label?: string, options?: TextDecoderOptions): ITextDecoder` - Create decoder instance
- `installTextEncodingPolyfills(): void` - Install global polyfills

### Interfaces

#### ITextEncoder

- `encoding: string` - Always 'utf-8'
- `encode(input?: string): Uint8Array` - Encode string to bytes
- `encodeInto(source: string, destination: Uint8Array): TextEncoderEncodeIntoResult` - Encode into existing array

#### ITextDecoder

- `encoding: string` - Always 'utf-8'
- `fatal: boolean` - Whether to throw on invalid sequences
- `ignoreBOM: boolean` - Whether to ignore byte order mark
- `decode(input?: ArrayBufferView | ArrayBuffer, options?: TextDecodeOptions): string` - Decode bytes to string
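
In the standard Web API, the `options` argument of `decode` carries a `stream` flag for decoding input that arrives in chunks. The sketch below shows that behavior with the global `TextDecoder`. Whether this component's polyfill honors `stream` is an assumption and is not confirmed by this README.

```typescript
// Stand-in for createTextDecoder(): the standard global TextDecoder.
const streamDecoder = new TextDecoder('utf-8');

// "€" is E2 82 AC in UTF-8; here it is split across two chunks.
const chunkA = new Uint8Array([0x61, 0xe2, 0x82]); // "a" + first two bytes of "€"
const chunkB = new Uint8Array([0xac, 0x62]);       // last byte of "€" + "b"

// stream: true buffers the incomplete sequence instead of emitting U+FFFD.
const partA = streamDecoder.decode(chunkA, { stream: true });
const partB = streamDecoder.decode(chunkB, { stream: true });

// A final call without arguments flushes the decoder at end of input.
const streamTail = streamDecoder.decode();

console.log(partA + partB + streamTail); // "a€b"
```

Without `stream: true`, the first call would treat the truncated sequence as invalid and produce a replacement character.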

## Error Handling

By default, invalid UTF-8 input is replaced rather than rejected; a decoder created in fatal mode throws instead:

```typescript
import { textEncoding, createTextDecoder } from 'ior:gitea:gitea.metatrom.net:universal-components/text-encoding@1.0.0';

// Invalid UTF-8 sequences are replaced with � (U+FFFD) by default
const invalidBytes = new Uint8Array([0xFF, 0xFE, 0xFD]);
const result = textEncoding.decode(invalidBytes);
console.log(result); // "���"

// Use fatal mode to throw on errors
const fatalDecoder = createTextDecoder('utf-8', { fatal: true });
try {
  fatalDecoder.decode(invalidBytes);
} catch (error) {
  // The caught value is typed `unknown` under strict TypeScript settings
  const message = error instanceof Error ? error.message : String(error);
  console.error('Invalid UTF-8 sequence:', message);
}
```

## Platform Support

- **React Native**: Full support with automatic polyfills
- **Node.js**: Uses native TextEncoder/TextDecoder when available
- **Browsers**: Uses native implementations in modern browsers
- **Automatic Fallback**: Seamlessly falls back to polyfills when native support is unavailable
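
The native-first selection described above can be sketched roughly as follows. This is not the component's actual code: `MinimalEncoder`, `AsciiOnlyEncoder`, and `pickEncoder` are hypothetical names, and the fallback class handles ASCII only to keep the example short.

```typescript
interface MinimalEncoder {
  encode(input?: string): Uint8Array;
}

type EncoderCtor = new () => MinimalEncoder;

// Hypothetical stand-in for a polyfill. It only handles ASCII correctly;
// non-ASCII input is not encoded properly by this sketch.
class AsciiOnlyEncoder implements MinimalEncoder {
  encode(input = ''): Uint8Array {
    const out = new Uint8Array(input.length);
    for (let i = 0; i < input.length; i++) {
      out[i] = input.charCodeAt(i) & 0x7f;
    }
    return out;
  }
}

// Prefer the native constructor when the runtime exposes one, else fall back.
function pickEncoder(nativeCtor: EncoderCtor | undefined): MinimalEncoder {
  return typeof nativeCtor === 'function' ? new nativeCtor() : new AsciiOnlyEncoder();
}

// Feature detection: look the constructor up on globalThis rather than
// referencing it directly, so a missing global does not throw.
const detectedCtor = (globalThis as { TextEncoder?: EncoderCtor }).TextEncoder;

const nativeOrFallback = pickEncoder(detectedCtor);
const forcedFallback = pickEncoder(undefined); // simulates a runtime without TextEncoder

console.log(Array.from(nativeOrFallback.encode('abc'))); // [ 97, 98, 99 ]
console.log(Array.from(forcedFallback.encode('abc')));   // [ 97, 98, 99 ]
```

Passing the detected constructor in as a parameter keeps the selection logic testable without modifying globals.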

## Performance Notes

- Native implementations are preferred when available for optimal performance
- Polyfills are optimized for correctness and reasonable performance
- Input is validated as UTF-8 during decoding to ensure data integrity
- Surrogate pairs are handled so that characters above U+FFFF encode and decode correctly

## Unicode Support

This implementation covers the parts of Unicode relevant to UTF-8 encoding and decoding:

- All Unicode scalar values (U+0000 to U+10FFFF, excluding the surrogate range U+D800 to U+DFFF)
- Proper surrogate pair handling for characters above U+FFFF
- UTF-8 validation with proper error handling
- BOM (Byte Order Mark) support with optional ignoring
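
As an illustration of the surrogate pair handling listed above, here is a minimal UTF-8 encoder sketch. It is not the package's polyfill: `encodeUtf8Sketch` is a hypothetical name, and replacing lone surrogates with U+FFFD follows the standard `TextEncoder` behavior rather than anything confirmed for this component.

```typescript
// Illustrative UTF-8 encoder; not the package's actual polyfill.
function encodeUtf8Sketch(text: string): Uint8Array {
  const bytes: number[] = [];
  for (let i = 0; i < text.length; i++) {
    let cp = text.charCodeAt(i);

    if (cp >= 0xd800 && cp <= 0xdbff) {
      // High surrogate: combine with a following low surrogate if present.
      const next = i + 1 < text.length ? text.charCodeAt(i + 1) : 0;
      if (next >= 0xdc00 && next <= 0xdfff) {
        cp = 0x10000 + ((cp - 0xd800) << 10) + (next - 0xdc00);
        i++; // the low surrogate has been consumed
      } else {
        cp = 0xfffd; // lone high surrogate
      }
    } else if (cp >= 0xdc00 && cp <= 0xdfff) {
      cp = 0xfffd; // lone low surrogate
    }

    // Emit 1 to 4 bytes depending on the size of the code point.
    if (cp < 0x80) {
      bytes.push(cp);
    } else if (cp < 0x800) {
      bytes.push(0xc0 | (cp >> 6), 0x80 | (cp & 0x3f));
    } else if (cp < 0x10000) {
      bytes.push(0xe0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3f), 0x80 | (cp & 0x3f));
    } else {
      bytes.push(
        0xf0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3f),
        0x80 | ((cp >> 6) & 0x3f),
        0x80 | (cp & 0x3f),
      );
    }
  }
  return Uint8Array.from(bytes);
}

// U+1F30D is stored in a JavaScript string as the surrogate pair D83C DF0D.
const globeBytes = encodeUtf8Sketch('🌍');
console.log(Array.from(globeBytes, (b) => b.toString(16)).join(' ')); // f0 9f 8c 8d
```

The 4-byte branch is reachable only through the surrogate-combining step, since a single UTF-16 code unit never exceeds 0xFFFF.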

## Version Information

- Version: 1.0.0
- Component Name: text-encoding
- IOR: `ior:gitea:gitea.metatrom.net:universal-components/text-encoding@1.0.0`