moFileReader — Fast File Parsing for JavaScript

Parsing files efficiently in the browser or in Node.js is a common requirement for modern web apps: uploading large CSVs, reading logs, processing images, or handling custom binary formats. moFileReader is a lightweight JavaScript library designed to make file parsing fast, memory-efficient, and easy to integrate. This article explains why moFileReader exists, how it works, where it shines, practical usage patterns, performance considerations, and common pitfalls to avoid.


What is moFileReader?

moFileReader is a small, focused file-parsing library for JavaScript that emphasizes streaming and minimal memory footprint. It provides utilities to read files chunk-by-chunk, parse structured text (CSV/TSV/JSON-lines), decode binary formats, and integrate with web APIs (File, Blob, Streams) and Node.js streams. The core philosophy is: process data incrementally, avoid full-file buffering, and expose a simple, composable API.


Why use moFileReader?

  • Handles very large files without loading the entire file into memory.
  • Optimized for streaming parsing patterns (line-oriented formats, chunked binary).
  • Works in browsers and Node.js with a consistent API.
  • Minimal dependencies and simple API surface — ideal for embedding in apps without heavy bundles.
  • Extensible parsing callbacks let you integrate transformation and validation easily.

Key features

  • Chunk-based reading from File/Blob and Node.js streams.
  • Line-aware parsing for newline-delimited formats (CSV, JSONL, logs).
  • Pluggable decoders (UTF-8, UTF-16, base64, custom codecs).
  • Simple backpressure-friendly API that cooperates with browser streams and async iterators.
  • Built-in utilities for CSV parsing with configurable delimiters, quoting rules, and header handling.
  • Lightweight binary helpers for reading little/big-endian integers, floats, and offsets.

Core concepts

  • Chunk streaming: instead of loading the whole file, moFileReader reads a configurable chunk size (e.g., 64KB) and emits those chunks for parsing.
  • Buffer boundary handling: text lines and tokens can span chunks; moFileReader maintains minimal carry-over buffers to join partial tokens correctly (see the sketch after this list).
  • Incremental parsing: a parser consumes incoming bytes/strings and emits complete records as soon as they are available.
  • Backpressure and flow control: the reader can pause/resume based on downstream processing speed (useful in browser UI work or CPU-heavy parsing).
  • Async iterators: the API supports async iteration so you can for-await file records in a natural way.
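
To make the carry-over idea concrete, here is a minimal, library-independent sketch of joining partial lines across chunk boundaries. It illustrates the concept only; it is not moFileReader's internal implementation.

// Join text chunks into complete lines, carrying the unterminated tail across chunk boundaries.
async function* splitLines(chunks) {
  let carry = '';                    // partial line left over from the previous chunk
  for await (const chunk of chunks) {
    const parts = (carry + chunk).split('\n');
    carry = parts.pop();             // the last element may be an incomplete line
    for (const line of parts) {
      yield line;
    }
  }
  if (carry.length > 0) {
    yield carry;                     // flush the final line if the file lacks a trailing newline
  }
}

The reader.lines() iterator shown in the examples below exposes the same line-at-a-time behaviour while also handling decoding and backpressure.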

Installation

Install via npm for Node.js projects:

npm install mofilereader 

In browsers, use a bundler (Rollup/Webpack/Vite) or import via ESM from a CDN that serves the package.


Basic usage — browser (File input)

This example shows reading a large newline-delimited file (JSONL or logs) in the browser without loading it entirely in memory.

import { createFileReader } from 'mofilereader';

const input = document.querySelector('#file-input');

input.addEventListener('change', async (e) => {
  const file = e.target.files[0];
  const reader = createFileReader(file, { chunkSize: 64 * 1024, encoding: 'utf-8' });

  for await (const line of reader.lines()) {
    // process each line (string) as it becomes available
    try {
      const obj = JSON.parse(line);
      // handle object
    } catch (err) {
      // handle parse errors
    }
  }
});

Basic usage — Node.js (stream)

Use moFileReader with Node.js streams to parse CSV or binary logs.

import fs from 'fs';
import { createStreamReader } from 'mofilereader';

const stream = fs.createReadStream('./large.csv');
const reader = createStreamReader(stream, { encoding: 'utf-8', delimiter: ',' });

for await (const row of reader.csv({ headers: true })) {
  // row is an object mapping header -> value
}

CSV parsing example

moFileReader’s CSV utility supports configurable delimiter, quote characters, escape rules, and streaming emission of parsed rows.

const reader = createFileReader(file, { chunkSize: 128 * 1024, encoding: 'utf-8' });

for await (const row of reader.csv({
  delimiter: ',',
  quote: '"',
  escape: '\\',
  headers: true
})) {
  // Each row is either an array (no headers) or an object (headers: true)
  console.log(row);
}

Notes:

  • Handles quoted fields with embedded newlines (illustrated in the example below).
  • Minimal memory usage: only partial field buffers are retained across chunk boundaries.
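
For example, a quoted field containing a newline still parses as part of a single row. The input below is a hypothetical in-memory file; the createFileReader and csv calls are the ones shown above.

// One record whose "note" field spans two physical lines.
const sample = 'id,name,note\n1,Alice,"line one\nline two"\n';
const file = new File([sample], 'sample.csv', { type: 'text/csv' });

const reader = createFileReader(file, { encoding: 'utf-8' });
for await (const row of reader.csv({ headers: true })) {
  console.log(row); // expected: { id: '1', name: 'Alice', note: 'line one\nline two' }
}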

Binary parsing example

Reading binary formats (e.g., custom records where each record starts with a 4-byte length) is straightforward.

const reader = createFileReader(file, { chunkSize: 32 * 1024, binary: true });

for await (const record of reader.readRecords({
  headerBytes: 4,
  // the header is the record length as a 4-byte little-endian unsigned integer
  parseHeader: (buf) => buf.readUInt32LE(0),
  parseBody: async (bodyBuf) => {
    // decode bodyBuf as needed
    return processRecord(bodyBuf);
  }
})) {
  // record is the parsed result of parseBody
}

moFileReader ensures partial header/body data across chunks is correctly concatenated.
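
The framing logic follows the same carry-over principle as line splitting: bytes accumulate until a complete header and body are available. Below is a minimal, library-independent sketch of length-prefixed framing over an async iterable of Uint8Array chunks; again, this is an illustration of the technique, not moFileReader's internals.

// Frame records of the form [4-byte little-endian length][body] from raw byte chunks.
async function* frameRecords(chunks) {
  let pending = new Uint8Array(0);
  for await (const chunk of chunks) {
    // Append the new chunk to whatever bytes are still pending.
    const merged = new Uint8Array(pending.length + chunk.length);
    merged.set(pending, 0);
    merged.set(chunk, pending.length);
    pending = merged;

    // Emit every complete record currently in the buffer.
    while (pending.length >= 4) {
      const view = new DataView(pending.buffer, pending.byteOffset, pending.byteLength);
      const bodyLength = view.getUint32(0, true); // little-endian length prefix
      if (pending.length < 4 + bodyLength) break; // body not fully received yet
      yield pending.slice(4, 4 + bodyLength);
      pending = pending.slice(4 + bodyLength);
    }
  }
}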


Performance considerations

  • Chunk size: default ~64KB works well; use larger chunks (256KB–1MB) for high-throughput servers and smaller chunks for UI responsiveness.
  • Avoid expensive synchronous work inside the parsing loop. Offload heavy transforms to Web Workers or worker threads.
  • Use async iteration with small commits to the UI to keep the main thread responsive (see the sketch after this list).
  • If parsing CPU-bound formats (complex CSV transforms, decompressing), combine moFileReader streaming with worker threads to prevent blocking.
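
As a concrete example of the last two points, the sketch below processes lines on the main thread but yields to the event loop every 500 records so rendering and input handling can run. Here, file is a File obtained as in the browser example above, and handleLine and updateProgress are hypothetical application helpers.

const reader = createFileReader(file, { chunkSize: 64 * 1024, encoding: 'utf-8' });
let processed = 0;

for await (const line of reader.lines()) {
  handleLine(line);                  // hypothetical per-record work
  processed += 1;

  if (processed % 500 === 0) {
    updateProgress(processed);       // hypothetical UI update
    // Yield to the event loop so the page stays responsive.
    await new Promise((resolve) => setTimeout(resolve, 0));
  }
}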

Memory usage patterns

  • Streaming avoids buffering the whole file. Memory usage grows with:
    • chunk size
    • size of carry-over buffers for partial tokens/lines
    • size of batches you accumulate before writing/processing
  • To keep memory minimal: process records as they arrive and avoid collecting them in arrays.

Error handling & resilience

  • Parsing errors: moFileReader emits per-record parse errors (so a single malformed line doesn’t crash the whole process) and can be configured to skip, collect, or halt on errors; an application-level sketch follows this list.
  • Partial files: when a file is cut off mid-record, moFileReader can either emit the last partial record or report an incomplete-record error.
  • Encoding issues: configure encoding explicitly; fallback policies are available (e.g., replace invalid sequences or throw).
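
The exact options for the skip/collect/halt policies are not shown in this article, so the sketch below handles errors at the application level instead: each record is parsed inside its own try/catch, failures are collected with their line number, and processing continues. file and handleRecord are assumed application-side names.

const reader = createFileReader(file, { encoding: 'utf-8' });
const failures = [];
let lineNumber = 0;

for await (const line of reader.lines()) {
  lineNumber += 1;
  try {
    const record = JSON.parse(line);
    handleRecord(record);            // hypothetical per-record processing
  } catch (err) {
    // Collect the failure and keep going; one malformed line should not abort the run.
    failures.push({ lineNumber, message: err.message });
  }
}

console.log(`Finished with ${failures.length} malformed line(s) skipped.`);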

Integration patterns

  • Upload pipelines: parse files in the browser, validate rows, and stream valid batches to an upload API.
  • ETL jobs: use Node.js stream reader to transform and push data into databases without temporary files.
  • Client-side previews: parse the first N rows to display previews, then continue parsing in background.
  • Web Workers: run heavy parsing in a worker and post results to the main thread for UI updates (sketched below).
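
A minimal sketch of the Web Worker pattern, assuming a hypothetical parse-worker.js module, the #file-input element from the earlier browser example, and a renderRows UI helper. The main thread hands the File to the worker; the worker parses it and posts batches of rows back.

// main.js — forward the selected File to a worker and render parsed batches.
const worker = new Worker(new URL('./parse-worker.js', import.meta.url), { type: 'module' });

worker.onmessage = (event) => {
  const { rows, done } = event.data;
  if (rows.length > 0) renderRows(rows);   // hypothetical UI update
  if (done) worker.terminate();
};

document.querySelector('#file-input').addEventListener('change', (e) => {
  worker.postMessage({ file: e.target.files[0] });
});

// parse-worker.js — parse off the main thread and post batches back.
import { createFileReader } from 'mofilereader';

self.onmessage = async (event) => {
  const reader = createFileReader(event.data.file, { encoding: 'utf-8' });
  let batch = [];
  for await (const row of reader.csv({ headers: true })) {
    batch.push(row);
    if (batch.length === 500) {
      self.postMessage({ rows: batch });
      batch = [];
    }
  }
  self.postMessage({ rows: batch, done: true });
};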

Comparison with native FileReader and other libraries

Feature                       | moFileReader | Native FileReader (browser)      | PapaParse / csv-parse
Streaming / chunked parsing   | Yes          | No (reads whole Blob or slices)  | PapaParse: chunked; csv-parse: streaming
Memory usage for large files  | Low          | High (if the full file is read)  | Varies; PapaParse supports streaming
Binary parsing helpers        | Yes          | No                               | Limited
Backpressure support          | Yes          | No                               | Partial
Browser + Node unified API    | Yes          | Browser-only                     | Node/browser variants

Common pitfalls and how to avoid them

  • Assuming tokens won’t span chunks — always use the library’s line/field handlers rather than naive splitting.
  • Blocking the main thread — for large, CPU-heavy parsing offload to workers.
  • Misconfigured encoding — specify encoding to avoid silent data corruption.
  • Collecting results in memory — process or persist incrementally.

Extending moFileReader

  • Custom parsers: implement a parser that consumes chunks and emits complete records; plug it into the reader pipeline (see the sketch after this list).
  • Plugins: add converters (e.g., CSV-to-JSON transformer, compression decompressors) that attach as pipeline stages.
  • TypeScript types: moFileReader ships with typings; extend them for domain-specific record shapes.
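
Since the plugin interface itself is not documented in this article, the sketch below simply models a custom parser as an async generator that consumes decoded text chunks and yields complete records; that is the shape such a parser takes regardless of how it is plugged in. The pipe-delimited key=value format is a made-up example.

// A custom parser: consume decoded text chunks, emit one record per '|'-terminated token.
// It reuses the same carry-over pattern as line splitting, with '|' as the terminator.
async function* pipeDelimitedParser(chunks) {
  let carry = '';
  for await (const chunk of chunks) {
    const tokens = (carry + chunk).split('|');
    carry = tokens.pop();            // possibly incomplete token
    for (const token of tokens) {
      const [key, value] = token.split('=');
      yield { key, value };          // emit a complete record as soon as it is available
    }
  }
  if (carry) {
    const [key, value] = carry.split('=');
    yield { key, value };
  }
}

// Usage: textChunks stands for any async iterable of decoded strings (an assumption here).
for await (const record of pipeDelimitedParser(textChunks)) {
  console.log(record);
}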

Example real-world workflow

  1. User selects a 1.2 GB CSV in the browser.
  2. moFileReader reads the file in 256KB chunks and parses rows.
  3. Each parsed row is validated; valid rows are batched (e.g., 500 rows) and POSTed to a server.
  4. The UI shows progress based on bytes processed and successful uploads.
  5. Errors are logged and the file continues processing to avoid blocking other uploads.

This pattern prevents the browser from running out of memory and keeps the UI responsive while handling large datasets.
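
A condensed sketch of steps 2 through 4 above: file is the selected File, validateRow, updateProgress, and the /api/rows endpoint are hypothetical, and the batch and chunk sizes mirror the numbers in the list.

const reader = createFileReader(file, { chunkSize: 256 * 1024, encoding: 'utf-8' });
let batch = [];
let uploaded = 0;

async function flushBatch() {
  if (batch.length === 0) return;
  // POST one batch of validated rows; '/api/rows' is a hypothetical endpoint.
  await fetch('/api/rows', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(batch)
  });
  uploaded += batch.length;
  updateProgress(uploaded);          // hypothetical UI callback
  batch = [];
}

for await (const row of reader.csv({ headers: true })) {
  if (!validateRow(row)) continue;   // hypothetical validation; rejected rows are logged elsewhere
  batch.push(row);
  if (batch.length >= 500) await flushBatch();
}
await flushBatch();                  // send the final partial batch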


When not to use moFileReader

  • Very small files where convenience matters more than streaming — native FileReader or simple read() may suffice.
  • Extremely specialized parsers already optimized in native C/C++ extensions (for Node.js) where maximum CPU throughput is required.
  • If you need a full-featured CSV library with complex dialect auto-detection out-of-the-box (though moFileReader can be combined with such tools).

Summary

moFileReader is a focused tool for fast, memory-conscious file parsing in JavaScript. It shines in scenarios with large files, streaming needs, and environments where keeping memory low and responsiveness high are priorities. With a small API surface, support for both browser and Node.js environments, and built-in parsing helpers, moFileReader is a practical choice for file-heavy applications.

