tim-group / iterata   0.1.7

MIT License GitHub

Useful extensions to Scala's Iterator

Scala versions: 2.12 2.11


iterata

Useful extensions to Scala's Iterator. Think errata for iterators.

Installation

Using SBT:

libraryDependencies += "com.timgroup" %% "iterata" % "0.1.6"

Or download the jar directly from Maven Central.

Iterata is currently published for Scala 2.11 only. Please let us know if you'd like a build for a different Scala version.

Usage

1. Parallel processing iterator: #par()

Use the #par() method to add parallelism when processing an Iterator with functions chained via #map and #flatMap. It will eagerly evaluate the underlying iterator in chunks, and then evaluate the functions on each chunk via the Scala Parallel Collections. For example:

scala> import com.timgroup.iterata.ParIterator.Implicits._
scala> val it = (1 to 100000).iterator.par().map(n => (n + 1, Thread.currentThread.getId))
scala> it.map(_._2).toSet.size
res2: Int = 8 // addition was distributed over 8 threads

You can provide a specific chunk size, for example it.par(100).
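
To make the chunking idea concrete, here is a minimal sketch of the same approach using only the Scala standard library. It is not iterata's implementation: the object ChunkedParSketch and the helper chunkedParMap are hypothetical names, and it swaps in scala.concurrent.Future for the Scala Parallel Collections so that it has no extra dependencies.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

object ChunkedParSketch {
  // Hypothetical helper, not part of iterata: applies `f` to `it` one chunk at a time.
  def chunkedParMap[A, B](it: Iterator[A], chunkSize: Int)(f: A => B): Iterator[B] =
    it.grouped(chunkSize).flatMap { chunk =>
      // The chunk has been read eagerly from the underlying iterator;
      // its elements are now processed concurrently on the global execution context.
      val futures = chunk.map(a => Future(f(a)))
      // Block until the whole chunk is done, keeping the original element order.
      Await.result(Future.sequence(futures), Duration.Inf)
    }
}
```

With this sketch, ChunkedParSketch.chunkedParMap((1 to 10).iterator, 3)(_ + 1).toList yields the numbers 2 to 11 in order. Only one chunk is held in memory at a time, which is the trade-off the chunk size controls: larger chunks give more parallelism per step but consume more of the iterator eagerly.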

Note that only the following Iterator methods are implemented (so far) to make use of parallel collections:

  • #map
  • #flatMap
  • #filter
  • #find

Grouped vs Ungrouped

The #par() method is available on any iterator, and takes an optional chunk size parameter. However, if you already have a GroupedIterator, you can simply call #par since it is already grouped. For example:

scala> val it = (1 to 100000).iterator.grouped(4).par
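
For reference, this is what the standard library's grouped produces before #par is applied. The sketch below uses only the standard library, and the object name GroupedSketch is hypothetical.

```scala
object GroupedSketch {
  // Standard library only: split an iterator of 10 elements into chunks of at most 4.
  val chunks: List[Seq[Int]] = (1 to 10).iterator.grouped(4).toList

  // Chunk sizes, in order; the last chunk holds the remaining elements.
  val sizes: List[Int] = chunks.map(_.size)
}
```

Here sizes is List(4, 4, 2): each chunk is the unit of work that would then be evaluated in parallel.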

2. Memoize exhaustion iterator: #memoizeExhaustion

Use the #memoizeExhaustion method to wrap an Iterator so that its #hasNext method will not be called again after returning false. This is useful in cases where it is expensive to check if there is a next element, such as when I/O is involved.

This can serve as a workaround for SI-9623, where after concatenating two iterators with ++, the left iterator's #hasNext will be called twice for every call to the right iterator's #next().

scala> import com.timgroup.iterata.MemoizeExhaustionIterator.Implicits._
scala> val it1 = new IteratorWithExpensiveHasNext()
scala> val it2 = new IteratorWithExpensiveHasNext()
scala> (it1.memoizeExhaustion ++ it2).foreach(_ => ())
scala> it1.numTimesHasNextReturnedFalse
res2: Int = 1
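
The wrapping behaviour described in this section can be sketched as follows. This is an illustration under stated assumptions, not iterata's actual code: MemoizeExhaustionSketch, MemoizedExhaustion and CountingIterator are hypothetical names, and CountingIterator is only a stand-in for an iterator like the IteratorWithExpensiveHasNext used above.

```scala
object MemoizeExhaustionSketch {
  // Hypothetical wrapper: remembers the first `false` returned by hasNext.
  final class MemoizedExhaustion[A](underlying: Iterator[A]) extends Iterator[A] {
    private var exhausted = false

    def hasNext: Boolean =
      // Once exhausted, short-circuit without touching the underlying iterator.
      !exhausted && {
        val more = underlying.hasNext
        if (!more) exhausted = true
        more
      }

    // Element access is delegated unchanged.
    def next(): A = underlying.next()
  }

  // Hypothetical test iterator that counts how often its hasNext returns false.
  final class CountingIterator(n: Int) extends Iterator[Int] {
    private var i = 0
    var falseCount = 0

    def hasNext: Boolean = {
      val more = i < n
      if (!more) falseCount += 1
      more
    }

    def next(): Int = { i += 1; i }
  }
}
```

Draining a MemoizedExhaustion that wraps a CountingIterator and then calling hasNext again any number of times leaves falseCount at 1, because every call after the first false is answered from the memoized flag.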