- microsoft/synapseml 1.0.5
Simple and Distributed Machine Learning
Scala versions: 2.12
- lucacanali/sparkmeasure 0.24
This is the development repository for sparkMeasure, a tool and library designed for efficient analysis and troubleshooting of Apache Spark jobs. It focuses on easing the collection and examination of Spark metrics, making it a practical choice for both developers and data engineers.
Scala versions: 2.13 2.12
- hydrospheredata/mist 0.6.4
Serverless proxy for Spark cluster
Scala versions: 2.11 2.10
- azure/azure-event-hubs-spark 2.1.5
Enabling Continuous Data Processing with Apache Spark and Azure Event Hubs
Scala versions: 2.11
- azure/azure-cosmosdb-spark 3.7.0
Apache Spark Connector for Azure Cosmos DB
Scala versions: 2.11
- treeverse/lakefs 0.14.1
lakeFS - Data version control for your data lake | Git for data
Scala versions: 2.12
- streamnative/pulsar-spark 2.4.5
Spark Connector to read and write with Pulsar
Scala versions: 2.11
- microsoft/mobius 2.0.200
C# and F# language binding and extensions to Apache Spark
Scala versions: 2.11
- chermenin/spark-states 0.2
Custom state store providers for Apache Spark
Scala versions: 2.12 2.11
- sansa-stack/sansa-stack 0.9.5
Big Data RDF Processing and Analytics Stack built on Apache Spark and Apache Jena http://sansa-stack.github.io/SANSA-Stack/
Scala versions: 2.12
- swoop-inc/spark-records 3.0.1
Bulletproof Apache Spark jobs with fast root cause analysis of failures.
Scala versions: 2.12
- databrickslabs/automl-toolkit 0.7.2
Toolkit for Apache Spark ML: feature clean-up, feature importance calculation suite, information gain selection, distributed SMOTE, model selection and training, hyperparameter optimization and selection, and model interpretability.
Scala versions: 2.11
- uosdmlab/spark-nkp 0.3.3
Natural Korean Processor for Apache Spark
Scala versions: 2.11
- absaoss/hyperdrive 4.7.0
Extensible streaming ingestion pipeline on top of Apache Spark
Scala versions: 2.12 2.11
- coxautomotivedatasolutions/spark-distcp 0.2.5
A re-implementation of Hadoop DistCP in Apache Spark
Scala versions: 2.13 2.12 2.11
- tupol/spark-utils 0.6.2
Basic framework utilities to quickly start writing production ready Apache Spark applications
Scala versions: 2.12
- heartsavior/spark-state-tools 0.4.0
Spark Structured Streaming State Tools
Scala versions: 2.12 2.11
- itspawanbhardwaj/spark-fuzzy-matching 1.0.1
Fuzzy matching function in Spark (https://spark-packages.org/package/itspawanbhardwaj/spark-fuzzy-matching)
Scala versions: 2.11
- whylabs/whylogs-java 0.1.3
Profile and monitor your ML data pipeline end-to-end
Scala versions: 2.12