How does one process large binary data files in Clojure? Assume the files are about 50MB: small enough to be processed in memory, but not with a naive implementation.
The following code correctly removes ^M (carriage return, byte 13) from small files, but it throws OutOfMemoryError for larger ones (around 6MB):
(defn read-bin-file [file]
  (to-byte-array (as-file file)))

(defn remove-cr-from-file [file]
  (let [dirty-bytes (read-bin-file file)
        clean-bytes (filter #(not= 13 %) dirty-bytes)
        changed?    (< (count clean-bytes) (alength dirty-bytes))] ; OutOfMemoryError
    (when changed?
      (write-bin-file file clean-bytes)))) ; writing works fine
It seems that Java byte arrays can't practically be treated as a seq, because doing so is extremely inefficient.
On the other hand, solutions built on aset, aget and areduce are bloated, ugly and imperative, because you can't really use the Clojure sequence library with them.
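For illustration, a loop-based version might look roughly like the sketch below. The name remove-cr is made up here, and the sketch assumes the whole file has already been read into a byte array. It works on primitive bytes and never builds a seq, but it is exactly the imperative style described above.

```clojure
(import '(java.io ByteArrayOutputStream))

(defn remove-cr
  "Returns a new byte array holding the bytes of `dirty` with every
  carriage return (byte 13) dropped. Hypothetical helper, for illustration."
  ^bytes [^bytes dirty]
  (let [out (ByteArrayOutputStream. (alength dirty))]
    ;; Walk the array by index; the ^bytes hint keeps aget on primitives.
    (dotimes [i (alength dirty)]
      (let [b (aget dirty i)]
        (when-not (== b 13)
          ;; write(int) stores only the low 8 bits, so negative bytes are safe.
          (.write out (int b)))))
    (.toByteArray out)))
```

Comparing (alength (remove-cr dirty-bytes)) with (alength dirty-bytes) would then stand in for the changed? check without counting a lazy seq.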
What am I missing? How does one process large binary data files in Clojure?