helgeho/archivespark
An Apache Spark framework for easy data processing, extraction, and derivation for web archives and archival collections, developed at the Internet Archive.
Scala versions: 2.11
sansa-stack/sansa-stack
Big Data RDF Processing and Analytics Stack built on Apache Spark and Apache Jena (http://sansa-stack.github.io/SANSA-Stack/)
Scala versions: 2.12 2.11
swoop-inc/spark-records
Bulletproof Apache Spark jobs with fast root cause analysis of failures.
Scala versions: 2.12
simplexspatial/osm4scala
Scala and Spark library focused on reading OpenStreetMap PBF files.
Scala versions: 2.13 2.12 2.11 2.10
googleclouddataproc/spark-bigquery-connector
BigQuery data source for Apache Spark: Read data from BigQuery into DataFrames, write DataFrames into BigQuery tables.
Scala versions: 2.13 2.12 2.11
jelmerk/hnswlib
Java library for approximate nearest neighbors search using Hierarchical Navigable Small World graphs
Scala versions: 2.13 2.12 2.11
mrpowers/spark-stringmetric
Spark functions to run popular phonetic and string matching algorithms
Scala versions: 2.13 2.12 2.11
databrickslabs/automl-toolkit
Toolkit for Apache Spark ML covering feature clean-up, a feature importance calculation suite, Information Gain selection, distributed SMOTE, model selection and training, hyperparameter optimization and selection, and model interpretability.
Scala versions: 2.11
potix2/spark-google-spreadsheets
Google Spreadsheets datasource for SparkSQL and DataFrames
Scala versions: 2.11 2.10
uosdmlab/spark-nkp
Natural Korean Processor for Apache Spark
Scala versions: 2.11
cerndb/sparkplugins
Code and examples showing how to write and deploy Apache Spark plugins. Spark plugins allow running custom code on the executors as they are initialized. This also allows extending the Spark metrics system with user-provided monitoring probes.
Scala versions: 2.13 2.12