-
microsoft/synapseml
Simple and Distributed Machine Learning
Scala versions: 2.11 -
lucacanali/sparkmeasure
This is the development repository for sparkMeasure, a tool for performance troubleshooting of Apache Spark workloads. It simplifies the collection and analysis of Spark task and stage metrics data.
Scala versions: 2.13 2.12 2.11 -
hydrospheredata/mist
Serverless proxy for Spark cluster
Scala versions: 2.12 2.11 2.10 -
azure/azure-event-hubs-spark
Enabling Continuous Data Processing with Apache Spark and Azure Event Hubs
Scala versions: 2.12 2.11 2.10 -
azure/azure-cosmosdb-spark
Apache Spark Connector for Azure Cosmos DB
Scala versions: 2.11 2.10 -
treeverse/lakefs
lakeFS - Data version control for your data lake | Git for data
Scala versions: 2.12 2.11 -
streamnative/pulsar-spark
Spark Connector to read and write with Pulsar
Scala versions: 2.12 2.11 -
microsoft/mobius
C# and F# language binding and extensions to Apache Spark
Scala versions: 2.11 2.10 -
chermenin/spark-states
Custom state store providers for Apache Spark
Scala versions: 2.12 2.11 -
sansa-stack/sansa-stack
Big Data RDF Processing and Analytics Stack built on Apache Spark and Apache Jena http://sansa-stack.github.io/SANSA-Stack/
Scala versions: 2.12 2.11 -
swoop-inc/spark-records
Bulletproof Apache Spark jobs with fast root cause analysis of failures.
Scala versions: 2.12 -
databrickslabs/automl-toolkit
Toolkit for Apache Spark ML for feature clean-up, feature importance calculation suite, information gain selection, distributed SMOTE, model selection and training, hyperparameter optimization and selection, and model interpretability.
Scala versions: 2.11 -
uosdmlab/spark-nkp
Natural Korean Processor for Apache Spark
Scala versions: 2.11 -
absaoss/hyperdrive
Extensible streaming ingestion pipeline on top of Apache Spark
Scala versions: 2.12 2.11 -
coxautomotivedatasolutions/spark-distcp
A re-implementation of Hadoop DistCP in Apache Spark
Scala versions: 2.13 2.12 2.11 -
heartsavior/spark-state-tools
Spark Structured Streaming State Tools
Scala versions: 2.12 2.11 -
tupol/spark-utils
Basic framework utilities to quickly start writing production ready Apache Spark applications
Scala versions: 2.12 2.11 -
itspawanbhardwaj/spark-fuzzy-matching
Fuzzy matching function in Spark (https://spark-packages.org/package/itspawanbhardwaj/spark-fuzzy-matching)
Scala versions: 2.11 2.10 -
isarn/isarn-sketches-spark
Routines and data structures for using isarn-sketches idiomatically in Apache Spark
Scala versions: 2.12 2.11 2.10