-
chermenin/spark-states 0.2
Custom state store providers for Apache Spark
Scala versions: 2.12 2.11 -
galliaproject/gallia-core 0.6.1
A schema-aware Scala library for data transformation
Scala versions: 3.x 2.13 2.12 -
helgeho/archivespark 3.0
An Apache Spark framework for easy data processing, extraction, and derivation for web archives and archival collections, developed at the Internet Archive.
Scala versions: 2.11 -
sansa-stack/sansa-stack 0.9.5
Big Data RDF Processing and Analytics Stack built on Apache Spark and Apache Jena http://sansa-stack.github.io/SANSA-Stack/
Scala versions: 2.12 -
simplexspatial/osm4scala 1.0
Scala and Spark library focused on reading OpenStreetMap PBF files.
Scala versions: 2.11 2.10 -
swoop-inc/spark-records 3.0.1
Bulletproof Apache Spark jobs with fast root cause analysis of failures.
Scala versions: 2.12 -
jelmerk/hnswlib 1.2.1
Java library for approximate nearest neighbors search using Hierarchical Navigable Small World graphs
Scala versions: 2.13 2.12 2.11 -
googleclouddataproc/spark-bigquery-connector 0.42.4
BigQuery data source for Apache Spark: Read data from BigQuery into DataFrames, write DataFrames into BigQuery tables.
Scala versions: 2.13 2.12 -
mrpowers/spark-stringmetric 0.5.0
Spark functions to run popular phonetic and string matching algorithms
Scala versions: 2.13 2.12 -
databrickslabs/automl-toolkit 0.7.2
Toolkit for Apache Spark ML: feature clean-up, feature importance calculation suite, information gain selection, distributed SMOTE, model selection and training, hyperparameter optimization and selection, and model interpretability.
Scala versions: 2.11 -
cerndb/sparkplugins 0.4
Code and examples of how to write and deploy Apache Spark Plugins. Spark plugins allow running custom code on the executors as they are initialized. This also allows extending the Spark metrics system with user-provided monitoring probes.
Scala versions: 2.13 2.12 -
potix2/spark-google-spreadsheets 0.6.3
Google Spreadsheets datasource for SparkSQL and DataFrames
Scala versions: 2.11