-
cerndb/sparkplugins
Code and examples of how to write and deploy Apache Spark Plugins. Spark plugins allow running custom code on the executors as they are initialized. This also allows extending the Spark metrics system with user-provided monitoring probes.
Scala versions: 2.13 2.12 -
locationtech-labs/geopyspark
GeoTrellis for PySpark
Scala versions: 2.11 -
tharwaninitin/etlflow
EtlFlow is an ecosystem of functional libraries in Scala based on ZIO for running complex Auditable workflows which can interact with Google Cloud Platform, AWS, Kubernetes, Databases, SFTP servers, On-Prem Systems and more.
Scala versions: 3.x 2.13 2.12 2.11
Scala.js versions: 1.x -
absaoss/hyperdrive
Extensible streaming ingestion pipeline on top of Apache Spark
Scala versions: 2.12 2.11 -
coxautomotivedatasolutions/spark-distcp
A re-implementation of Hadoop DistCP in Apache Spark
Scala versions: 2.13 2.12 2.11 -
zuinnote/spark-hadoopoffice-ds
A Spark datasource for the HadoopOffice library
Scala versions: 2.13 2.12 2.11 2.10 -
tupol/spark-utils
Basic framework utilities to quickly start writing production ready Apache Spark applications
Scala versions: 2.13 2.12 2.11 -
heartsavior/spark-sql-kafka-offset-committer
Kafka offset committer for structured streaming query
Scala versions: 2.12 2.11 -
agile-lab-dev/darwin
Avro Schema Evolution made easy
Scala versions: 2.13 2.12 2.11 2.10 -
sansa-stack/archived-sansa-query
SANSA Query Layer
Scala versions: 2.11 -
indix/sparkplug
Spark package to "plug" holes in data using SQL based rules ⚡️ 🔌
Scala versions: 2.12 2.11 2.10 -
fsanaulla/chronicler
Scala toolchain for InfluxDB
Scala versions: 2.13 2.12 2.11