galliaproject/gallia-core 0.6.1
A schema-aware Scala library for data transformation
Scala versions: 3.x 2.13 2.12
helgeho/archivespark 3.0
An Apache Spark framework for easy data processing, extraction, and derivation for web archives and archival collections, developed at the Internet Archive.
Scala versions: 2.11
sansa-stack/sansa-stack 0.9.5
Big Data RDF Processing and Analytics Stack built on Apache Spark and Apache Jena (http://sansa-stack.github.io/SANSA-Stack/)
Scala versions: 2.12
simplexspatial/osm4scala 1.0
Scala and Spark library focused on reading OpenStreetMap PBF files.
Scala versions: 2.11 2.10
googleclouddataproc/spark-bigquery-connector 0.41.1
BigQuery data source for Apache Spark: Read data from BigQuery into DataFrames, write DataFrames into BigQuery tables.
Scala versions: 2.13 2.12
swoop-inc/spark-records 3.0.1
Bulletproof Apache Spark jobs with fast root cause analysis of failures.
Scala versions: 2.12
jelmerk/hnswlib 1.1.3
Java library for approximate nearest-neighbor search using Hierarchical Navigable Small World graphs
Scala versions: 2.13 2.12 2.11
mrpowers/spark-stringmetric 0.5.0
Spark functions to run popular phonetic and string matching algorithms
Scala versions: 2.13 2.12
databrickslabs/automl-toolkit 0.7.2
Toolkit for Apache Spark ML covering feature clean-up, a feature importance calculation suite, information gain selection, distributed SMOTE, model selection and training, hyperparameter optimization and selection, and model interpretability.
Scala versions: 2.11
potix2/spark-google-spreadsheets 0.6.3
Google Spreadsheets data source for Spark SQL and DataFrames
Scala versions: 2.11
uosdmlab/spark-nkp 0.3.3
Natural Korean Processor for Apache Spark
Scala versions: 2.11