helgeho / archivespark   3.0.1

MIT License

An Apache Spark framework for easy data processing, extraction, and derivation on web archives and archival collections, developed at the Internet Archive.

Scala versions: 2.11