egen / spark-workday   1.1.0

Apache License 2.0

Spark data source for Workday

Scala versions: 2.11

Spark Workday Library

The Spark Connector for Workday is a wrapper around the published Workday Web Services (WWS) SOAP API.

Requirements

This library requires Spark 2.x.

For Spark 1.x support, see the spark1.x branch.

Linking

You can link against this library in your program in the following ways:

Maven Dependency

<dependency>
    <groupId>com.springml</groupId>
    <artifactId>spark-workday_2.11</artifactId>
    <version>1.1.0</version>
</dependency>

SBT Dependency

libraryDependencies += "com.springml" % "spark-workday_2.11" % "1.1.0"

Using with Spark shell

This package can be added to Spark using the --packages command-line option. For example, to include it when starting the Spark shell:

$ bin/spark-shell --packages com.springml:spark-workday_2.11:1.1.0

Feature

  • Construct a Spark DataFrame from Workday data. The user provides a WWS (Workday Web Service) request and a list of XPath expressions for reading data from Workday. The XPath expressions are evaluated against the WWS response, and the DataFrame is built from the results.
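The XPath step described above can be pictured with the JDK's standard javax.xml.xpath API. The following is a minimal sketch, not the connector's implementation. It assumes the `wd` prefix maps to `urn:com.workday/bsvc` (the namespace bound to `bsvc` in the sample request below), invents the `Invoice_ID` and `Amount` elements and the `XPathRowsSketch` / `rows` names for illustration, and keeps every value as a string instead of applying a field type.

```scala
import java.io.StringReader
import javax.xml.XMLConstants
import javax.xml.namespace.NamespaceContext
import javax.xml.parsers.DocumentBuilderFactory
import javax.xml.xpath.{XPathConstants, XPathFactory}
import org.w3c.dom.{Document, NodeList}
import org.xml.sax.InputSource

// Hypothetical sketch: one row per object element, one column per field XPath.
object XPathRowsSketch {
  // Resolves XPath prefixes (e.g. "wd") from a prefix -> namespace URI map.
  final class MapNamespaceContext(prefixes: Map[String, String]) extends NamespaceContext {
    override def getNamespaceURI(prefix: String): String =
      prefixes.getOrElse(prefix, XMLConstants.NULL_NS_URI)
    override def getPrefix(namespaceURI: String): String =
      prefixes.collectFirst { case (p, uri) if uri == namespaceURI => p }.orNull
    override def getPrefixes(namespaceURI: String): java.util.Iterator[String] = {
      val matches = new java.util.ArrayList[String]()
      prefixes.foreach { case (p, uri) => if (uri == namespaceURI) matches.add(p) }
      matches.iterator()
    }
  }

  def parse(xml: String): Document = {
    val factory = DocumentBuilderFactory.newInstance()
    factory.setNamespaceAware(true) // prefixed XPath steps only match on a namespace-aware DOM
    factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)))
  }

  // Each node matched by objectTagPath becomes a row; each field XPath is
  // evaluated relative to that node and its string value becomes the column value.
  def rows(response: String,
           objectTagPath: String,
           fields: Seq[(String, String)],
           namespaces: Map[String, String]): Seq[Map[String, String]] = {
    val xpath = XPathFactory.newInstance().newXPath()
    xpath.setNamespaceContext(new MapNamespaceContext(namespaces))
    val objects = xpath.evaluate(objectTagPath, parse(response), XPathConstants.NODESET).asInstanceOf[NodeList]
    (0 until objects.getLength).map { i =>
      val node = objects.item(i)
      fields.map { case (name, path) => name -> xpath.evaluate(path, node) }.toMap
    }
  }

  // Invented response shaped like a WWS SOAP envelope.
  val sampleResponse: String =
    """<env:Envelope xmlns:env="http://schemas.xmlsoap.org/soap/envelope/"><env:Body>
      |<wd:Get_Customer_Invoices_Response xmlns:wd="urn:com.workday/bsvc"><wd:Response_Data>
      |<wd:Customer_Invoice><wd:Invoice_ID>INV-1</wd:Invoice_ID><wd:Amount>100.00</wd:Amount></wd:Customer_Invoice>
      |<wd:Customer_Invoice><wd:Invoice_ID>INV-2</wd:Invoice_ID><wd:Amount>250.50</wd:Amount></wd:Customer_Invoice>
      |</wd:Response_Data></wd:Get_Customer_Invoices_Response>
      |</env:Body></env:Envelope>""".stripMargin

  def main(args: Array[String]): Unit =
    rows(sampleResponse, "//wd:Customer_Invoice",
      Seq("invoiceId" -> "wd:Invoice_ID", "amount" -> "wd:Amount"),
      Map("wd" -> "urn:com.workday/bsvc")).foreach(println)
}
```

Running `main` prints one map per `wd:Customer_Invoice` element. In Spark, such rows would then be typed and assembled into a DataFrame.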

Options

  • username: Workday Web Service username.
  • password: Workday Web Service password.
  • wwsEndpoint: Workday Web Service endpoint URL.
  • request: Workday Web Service request XML to execute. A sample request is provided in the project repository.
  • objectTagPath: XPath of the response element to treat as the object element.
  • detailsTagPath: XPath of the detail element within each object, used to read that object's detail elements. It is relative to objectTagPath.
  • xpathMap: Path to a CSV file listing each field's name, type, and XPath. A sample file is provided in the project repository.
  • namespacePrefixMap: Path to a CSV file mapping each namespace prefix to its namespace. A sample file is provided in the project repository.
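The two CSV options could be read along these lines. This is a sketch under an assumed layout (one comma-separated entry per line, columns in the order listed above, no header row); the project's sample files may differ, and `OptionFilesSketch`, `FieldSpec`, and the type names `string` / `double` are hypothetical.

```scala
// Hypothetical parser for the xpathMap and namespacePrefixMap CSV contents.
object OptionFilesSketch {
  // One entry of the xpathMap file.
  final case class FieldSpec(name: String, fieldType: String, xpath: String)

  // Non-blank, trimmed lines (trim also drops a trailing '\r' from CRLF files).
  private def lines(csv: String): Seq[String] =
    csv.split("\n").toSeq.map(_.trim).filter(_.nonEmpty)

  // Assumed layout: "fieldName,fieldType,xpath" per line.
  // split(",", 3) leaves any commas inside the XPath expression intact.
  def parseXPathMap(csv: String): Seq[FieldSpec] =
    lines(csv).map { line =>
      line.split(",", 3) match {
        case Array(name, fieldType, xpath) => FieldSpec(name.trim, fieldType.trim, xpath.trim)
        case _ => throw new IllegalArgumentException(s"expected fieldName,fieldType,xpath: $line")
      }
    }

  // Assumed layout: "prefix,namespace" per line.
  def parseNamespaceMap(csv: String): Map[String, String] =
    lines(csv).map { line =>
      line.split(",", 2) match {
        case Array(prefix, namespace) => prefix.trim -> namespace.trim
        case _ => throw new IllegalArgumentException(s"expected prefix,namespace: $line")
      }
    }.toMap

  def main(args: Array[String]): Unit = {
    val fields = parseXPathMap("invoiceId,string,wd:Invoice_ID\namount,double,wd:Amount")
    val namespaces = parseNamespaceMap("wd,urn:com.workday/bsvc")
    fields.foreach(println)
    println(namespaces)
  }
}
```

Limiting the split count keeps the last column whole, which matters because XPath expressions such as `concat(...)` can contain commas.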

Scala API

// Construct a DataFrame using WWS.
// Request to be executed against WWS: the Get_Customer_Invoices operation
// of the Revenue_Management service.
// https://community.workday.com/custom/developer/API/Revenue_Management/v27.0/Get_Customer_Invoices.html
val request = "<bsvc:Get_Customer_Invoices_Request xmlns:bsvc=\"urn:com.workday/bsvc\"><bsvc:Response_Filter><bsvc:As_Of_Effective_Date>2016-09-09</bsvc:As_Of_Effective_Date><bsvc:As_Of_Entry_DateTime>2016-09-09</bsvc:As_Of_Entry_DateTime><bsvc:Page>1</bsvc:Page><bsvc:Count>100</bsvc:Count></bsvc:Response_Filter><bsvc:Response_Group><bsvc:Include_Reference>1</bsvc:Include_Reference><bsvc:Include_Customer_Invoice_Data>1</bsvc:Include_Customer_Invoice_Data></bsvc:Response_Group></bsvc:Get_Customer_Invoices_Request>"

// Build the DataFrame by executing the WWS request.
// Customer_Invoice is the object element, so objectTagPath is //wd:Customer_Invoice.
// Customer_Invoice_Line_Replacement_Data is the detail element, so detailsTagPath is
// /wd:Customer_Invoice/wd:Customer_Invoice_Data/wd:Customer_Invoice_Line_Replacement_Data.
// detailsTagPath is relative to objectTagPath.
val df = spark.read.
    format("com.springml.spark.workday").
    option("username", "wws_username").
    option("password", "wws_password").
    option("wwsEndpoint", "wws_endpoint").
    option("request", request).
    option("objectTagPath", "//wd:Customer_Invoice").
    option("detailsTagPath", "/wd:Customer_Invoice/wd:Customer_Invoice_Data/wd:Customer_Invoice_Line_Replacement_Data").
    option("xpathMap","/home/xpath.csv").
    option("namespacePrefixMap","/home/namespaces.csv").
    load()  
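The detailsTagPath in this example starts at /wd:Customer_Invoice rather than at the SOAP envelope root, which suggests each object element is evaluated as if it were its own document. The sketch below illustrates that reading with javax.xml.xpath; it is an assumption about the semantics, not the connector's code, and `DetailsPathSketch`, `detailsPerObject`, and the sample response are hypothetical. It deep-copies each matched object into a fresh DOM document and counts the detail nodes found there.

```scala
import java.io.StringReader
import javax.xml.XMLConstants
import javax.xml.namespace.NamespaceContext
import javax.xml.parsers.DocumentBuilderFactory
import javax.xml.xpath.{XPathConstants, XPathFactory}
import org.w3c.dom.NodeList
import org.xml.sax.InputSource

object DetailsPathSketch {
  val WdNs = "urn:com.workday/bsvc" // assumed namespace behind the "wd" prefix
  val InvoiceLinesPath =
    "/wd:Customer_Invoice/wd:Customer_Invoice_Data/wd:Customer_Invoice_Line_Replacement_Data"

  private val builderFactory = {
    val f = DocumentBuilderFactory.newInstance()
    f.setNamespaceAware(true) // needed so prefixed XPath steps match
    f
  }

  // Resolves only the "wd" prefix.
  private val wdContext = new NamespaceContext {
    def getNamespaceURI(prefix: String): String =
      if (prefix == "wd") WdNs else XMLConstants.NULL_NS_URI
    def getPrefix(uri: String): String = if (uri == WdNs) "wd" else null
    def getPrefixes(uri: String): java.util.Iterator[String] =
      if (uri == WdNs) java.util.Collections.singletonList("wd").iterator()
      else java.util.Collections.emptyIterator[String]()
  }

  /** Counts detail nodes per object: each node matched by objectTagPath is
    * deep-copied into its own document, then detailsTagPath is evaluated there. */
  def detailsPerObject(xml: String, objectTagPath: String, detailsTagPath: String): Seq[Int] = {
    val xpath = XPathFactory.newInstance().newXPath()
    xpath.setNamespaceContext(wdContext)
    val doc = builderFactory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)))
    val objects = xpath.evaluate(objectTagPath, doc, XPathConstants.NODESET).asInstanceOf[NodeList]
    (0 until objects.getLength).map { i =>
      val objectDoc = builderFactory.newDocumentBuilder().newDocument()
      // The copied object element becomes the root, so "/wd:Customer_Invoice" matches it.
      objectDoc.appendChild(objectDoc.importNode(objects.item(i), true))
      xpath.evaluate(detailsTagPath, objectDoc, XPathConstants.NODESET).asInstanceOf[NodeList].getLength
    }
  }

  // Invented response: the first invoice has two line elements, the second has one.
  val sampleResponse: String =
    """<env:Envelope xmlns:env="http://schemas.xmlsoap.org/soap/envelope/"><env:Body>
      |<wd:Get_Customer_Invoices_Response xmlns:wd="urn:com.workday/bsvc"><wd:Response_Data>
      |<wd:Customer_Invoice><wd:Customer_Invoice_Data>
      |<wd:Customer_Invoice_Line_Replacement_Data/><wd:Customer_Invoice_Line_Replacement_Data/>
      |</wd:Customer_Invoice_Data></wd:Customer_Invoice>
      |<wd:Customer_Invoice><wd:Customer_Invoice_Data>
      |<wd:Customer_Invoice_Line_Replacement_Data/>
      |</wd:Customer_Invoice_Data></wd:Customer_Invoice>
      |</wd:Response_Data></wd:Get_Customer_Invoices_Response>
      |</env:Body></env:Envelope>""".stripMargin

  def main(args: Array[String]): Unit =
    println(detailsPerObject(sampleResponse, "//wd:Customer_Invoice", InvoiceLinesPath))
}
```

With the sample response it yields one count per invoice (2 and 1). Evaluating the same absolute path against the whole SOAP envelope would match nothing, because that document's root is env:Envelope, not wd:Customer_Invoice.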

R API

# Request to be executed against WWS: the Get_Customer_Invoices operation
# of the Revenue_Management service.
# https://community.workday.com/custom/developer/API/Revenue_Management/v27.0/Get_Customer_Invoices.html
ws_request <- "<bsvc:Get_Customer_Invoices_Request xmlns:bsvc=\"urn:com.workday/bsvc\"><bsvc:Response_Filter><bsvc:As_Of_Effective_Date>2016-09-09</bsvc:As_Of_Effective_Date><bsvc:As_Of_Entry_DateTime>2016-09-09</bsvc:As_Of_Entry_DateTime><bsvc:Page>1</bsvc:Page><bsvc:Count>100</bsvc:Count></bsvc:Response_Filter><bsvc:Response_Group><bsvc:Include_Reference>1</bsvc:Include_Reference><bsvc:Include_Customer_Invoice_Data>1</bsvc:Include_Customer_Invoice_Data></bsvc:Response_Group></bsvc:Get_Customer_Invoices_Request>"

# Build the DataFrame by executing the WWS request.
# Customer_Invoice is the object element, so objectTagPath is //wd:Customer_Invoice.
# Customer_Invoice_Line_Replacement_Data is the detail element, so detailsTagPath is
# /wd:Customer_Invoice/wd:Customer_Invoice_Data/wd:Customer_Invoice_Line_Replacement_Data.
# detailsTagPath is relative to objectTagPath.
df <- read.df(source="com.springml.spark.workday",
      username="wws_username",
      password="wws_password",
      wwsEndpoint="wws_endpoint",
      request=ws_request,
      objectTagPath="//wd:Customer_Invoice",
      detailsTagPath="/wd:Customer_Invoice/wd:Customer_Invoice_Data/wd:Customer_Invoice_Line_Replacement_Data",
      xpathMap="/home/xpath.csv",
      namespacePrefixMap="/home/namespaces.csv")

Building From Source

This library is built with SBT, which is downloaded automatically by the included shell script. To build a JAR file, run sbt/sbt package from the project root.