coxautomotivedatasolutions/spark-distcp 0.2
A re-implementation of Hadoop DistCP in Apache Spark
Scala versions: 2.12 2.11

absaoss/hyperdrive 4.7.0
Extensible streaming ingestion pipeline on top of Apache Spark
Scala versions: 2.12 2.11

tharwaninitin/etlflow 1.7.3
EtlFlow is an ecosystem of functional libraries in Scala, based on ZIO, for running complex auditable workflows that can interact with Google Cloud Platform, AWS, Kubernetes, databases, SFTP servers, on-prem systems, and more.
Scala versions: 3.x 2.13 2.12
Scala.js versions: 1.x

locationtech-labs/geopyspark 0.3.0
GeoTrellis for PySpark
Scala versions: 2.11

heartsavior/spark-sql-kafka-offset-committer 0.2.0
Kafka offset committer for structured streaming query
Scala versions: 2.12 2.11

tupol/spark-utils 0.6.2
Basic framework utilities to quickly start writing production-ready Apache Spark applications
Scala versions: 2.12

zuinnote/spark-hadoopoffice-ds 1.7.0
A Spark datasource for the HadoopOffice library
Scala versions: 2.13 2.12 2.11

agile-lab-dev/darwin 1.2.2
Avro Schema Evolution made easy
Scala versions: 2.13 2.12 2.11 2.10

music-of-the-ainur/almaren-framework 2.4.5-2.4.5
The Almaren Framework provides a simplified, consistent, minimalistic layer over Apache Spark, while still allowing you to take advantage of native Apache Spark features. You can still combine it with standard Spark code.
Scala versions: 2.12 2.11

sansa-stack/archived-sansa-query 0.7.1
SANSA Query Layer
Scala versions: 2.11

agile-lab-dev/wasp 3.0.1
WASP is a framework for building complex real-time big data applications. It relies on a Kappa/Lambda-style architecture, mainly leveraging Kafka and Spark. If you need to ingest huge amounts of heterogeneous data and analyze them through complex pipelines, this is the framework for you.
Scala versions: 2.12

sansa-stack/archived-sansa-inference 0.7.1
A general Inference API based on two of the most popular Big Data processing engines: Apache Spark and Apache Flink
Scala versions: 2.11