Parser combinator framework consumes unnecessary amount of memory #319

Open
@scabug

Description

Since all Readers provided by the parser combinators framework use PagedSeq internally, using those parsers on large files seems impossible, because PagedSeq does not release elements that have already been parsed.

For example, consider parsing a 1GB file from which you need only a portion of the information (you may want to skip headers, comments, etc.). PagedSeq will hold on to the whole 1GB until parsing finishes and the GC can step in.
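To make the idea of "releasing already parsed elements" concrete, here is a minimal sketch of a buffer that forgets input once the consumer declares it will never look back. `SlidingCharBuffer`, `charAt`, `release` and `buffered` are hypothetical names invented for this illustration; none of them exist in the parser combinators library, and this is not how PagedSeq works. Since combinators can backtrack, a real integration would only be able to call `release` at points where backtracking is no longer possible.

```scala
import java.io.{Reader, StringReader}

/** Hypothetical helper: keeps only the chars at or after the last released offset. */
final class SlidingCharBuffer(source: Reader, chunkSize: Int = 4096) {
  private val buf = new java.lang.StringBuilder // chars from absolute offset `base` onward
  private var base = 0L                         // absolute offset of buf.charAt(0)
  private var eof = false

  /** Reads from the source until `absolute` is buffered or the source is exhausted. */
  private def fillTo(absolute: Long): Unit = {
    val chunk = new Array[Char](chunkSize)
    while (!eof && absolute >= base + buf.length) {
      val n = source.read(chunk, 0, chunk.length)
      if (n < 0) eof = true else buf.append(chunk, 0, n)
    }
  }

  /** Char at an absolute offset, or None past the end of input. */
  def charAt(absolute: Long): Option[Char] = {
    require(absolute >= base, s"offset $absolute was already released")
    fillTo(absolute)
    val i = (absolute - base).toInt
    if (i < buf.length) Some(buf.charAt(i)) else None
  }

  /** Declares that nothing before `absolute` will be read again. */
  def release(absolute: Long): Unit = if (absolute > base) {
    val drop = math.min(absolute - base, buf.length.toLong).toInt
    buf.delete(0, drop) // the window stays bounded; the backing array is reused
    base += drop
  }

  /** Number of chars currently held in memory. */
  def buffered: Int = buf.length
}

object SlidingDemo {
  def main(args: Array[String]): Unit = {
    // Same access pattern as the example in this report: keep one char, skip 1024.
    val window = new SlidingCharBuffer(new StringReader("t" * 10000))
    var pos = 0L
    var kept = 0
    while (window.charAt(pos).isDefined) {
      kept += 1
      pos += 1024
      window.release(pos)
    }
    println(s"kept $kept chars, ${window.buffered} still buffered")
  }
}
```

With this access pattern the buffer never holds more than one chunk at a time, whereas a structure that retains every page grows with the size of the input.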

Example code:

import collection.immutable.PagedSeq
import util.parsing.combinator._
import util.parsing.input._

// virtual file reader (simulates ~400Mb file)
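def in = new java.io.Reader {
  var buffersRead = 0
  // Fill the requested slice with 't' until 100000 buffers have been served,
  // then signal end of stream.
  def read(cbuf: Array[Char], offset: Int, l: Int): Int = {
    if (buffersRead < 100000) {
      java.util.Arrays.fill(cbuf, offset, offset + l, 't')
      buffersRead += 1
      l
    } else -1
  }
  def close(): Unit = ()
}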

def parser = new RegexParsers {
  var gcCountdown = 0
  def tt = new Parser[Char] {
    def apply(in: Input) = {
      gcCountdown += 1
      if (gcCountdown > 10000) {
        System.gc()
        gcCountdown = 0
      }
      if (in.atEnd)
        Failure("", in)
      else
        Success(in.first, in.drop(1024))
    }
  }
  def go(in: java.io.Reader) = parseAll(tt.*, in).get.size
}
println(parser.go(in))

If you look at memory usage with something like jvisualvm, you will notice that running this process consumes about 800Mb of RAM just to pull 400kb worth of characters out of the input.
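The figures above can be cross-checked with a little arithmetic. The sketch below assumes that PagedSeq reads its source in pages of 4096 chars (an assumption about its internals, not something stated in this report) and that a JVM char occupies 2 bytes; the buffer count and the stride of 1024 come from the example code. The object and value names are made up for illustration.

```scala
object MemoryEstimate {
  // Assumption: PagedSeq requests 4096 chars per page from the Reader.
  val pageSizeChars: Long = 4096L
  // From the example: the fake reader serves 100000 buffers before returning -1.
  val buffersServed: Long = 100000L
  // A JVM char is a 16-bit value.
  val bytesPerChar: Long = 2L
  // From the example: the parser keeps one char, then drops 1024.
  val stride: Long = 1024L

  val totalChars: Long = pageSizeChars * buffersServed // chars in the simulated file
  val retainedBytes: Long = totalChars * bytesPerChar  // heap needed if every page is kept
  val keptChars: Long = totalChars / stride            // chars the parser actually returns

  def main(args: Array[String]): Unit = {
    println(s"simulated input:  $totalChars chars")
    println(s"retained if kept: ${retainedBytes / (1024 * 1024)} MiB")
    println(s"chars extracted:  $keptChars")
  }
}
```

Under these assumptions the simulated input is roughly 400 million chars, retaining all of it costs roughly 800 MB of heap, and the parser result holds only about 400 thousand chars, which is consistent with the numbers reported here.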

Activity

scabug commented on Oct 12, 2012

Imported From: https://issues.scala-lang.org/browse/SI-6520?orig=1
Reporter: Platon Pronko (rogach)
Affected Versions: 2.9.2

scabug commented on Jul 10, 2013

@adriaanm said:
Unassigning and rescheduling to M6 as previous deadline was missed.

transferred this issue from scala/bug on Nov 19, 2020
deleted a comment from scabug on Nov 19, 2020
Parser combinator framework consumes unnecessary amount of memory · Issue #319 · scala/scala-parser-combinators