Available for Enterprise Edition only.
Requirements
Some options require a specific gemLibsVersion. To update it, manually change the gemLibsVersion value in pbt_project.yml in your project's Git repository.
Mode
There are a few different types of gems that you can create. The table below describes each mode you can choose.

| Mode | Description | Additional settings |
|---|---|---|
| Transformation | Edits intermediate, in-memory data in the pipeline. | Choose the category of the transformation gem |
| Dataset Format | Reads and writes data between storage and memory. | Choose whether the type is batch or streaming |
| Custom Subgraph (Python only) | Controls the flow of gems. Visit the Subgraph page for an example. | None |
Classes
The following classes must be included in all Spark gems. Each class extends a base class that Prophecy has defined.

- A class where you inherit the representation of the overall gem.
- A class that contains the properties to be made available to the user for this particular gem.
- A class that defines the Spark code that needs to run on your Spark cluster.

| Class | Base Class for Transformation | Base Class for Dataset Format | Base Class for Custom Subgraph |
|---|---|---|---|
| class CustomGem(BaseClass) | ComponentSpec | DatasetSpec | MetaComponentSpec |
| class YourProperties(BaseClass) | ComponentProperties | ComponentProperties | MetaComponentProperties |
| class YourCode(BaseClass) | ComponentCode | ComponentCode | MetaComponentCode |
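
The Transformation column of the table can be sketched in Python as follows. This is a minimal illustration, not Prophecy's implementation: the three base classes are empty stand-ins defined locally so the snippet runs on its own (a real gem gets them from the Prophecy gem library), and nesting the properties and code classes inside the gem class is an assumption made for readability.

```python
# Empty stand-ins for the Prophecy base classes, defined here only so the
# sketch runs without the Prophecy gem library installed.
class ComponentSpec:
    pass


class ComponentProperties:
    pass


class ComponentCode:
    pass


class CustomGem(ComponentSpec):
    """Represents the overall gem."""

    class YourProperties(ComponentProperties):
        """Holds the properties made available to the user of this gem."""

    class YourCode(ComponentCode):
        """Holds the Spark code that runs on the Spark cluster."""
```

For a Dataset Format or Custom Subgraph gem, the same shape would apply with the base classes from the corresponding column of the table.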
Functions
The following functions can be used to customize Spark gems.

| Function | Purpose | Return | Gem Mode |
|---|---|---|---|
| optimizeCode | Enables the Prophecy optimizer to simplify the gem code when it runs. | Boolean | All |
| customOutputSchemaEnabled | Enables the custom schema option by default in the gem. Requires gemLibsVersion 1.1.47+ for Scala. | Boolean | Transformation |
| dialog | Defines how the gem looks in the visual interface. | Dialog object | Transformation and Subgraph |
| sourceDialog | Defines how the source gem looks in the visual interface. | DatasetDialog object | Dataset and Subgraph |
| targetDialog | Defines how the target gem looks in the visual interface. | DatasetDialog object | Dataset and Subgraph |
| validate | Defines how to detect user errors when using the gem. | Diagnostics array | All |
| onChange | Defines UI state transformations. | Properties object | All |
| serializeProperty | (Scala only) Takes a Properties object and converts it into JSON format. | String | All |
| deserializeProperty | (Scala only) Parses a JSON string and converts it into a Properties object. | Properties object | All |
| apply | Included in the class that extends component code to define Spark logic. | None, DataFrame, or list of DataFrames | Transformation and Subgraph |
| sourceApply | Included in the class that extends component code to define Spark logic. | DataFrame | Dataset |
| targetApply | Included in the class that extends component code to define Spark logic. | None | Dataset |
Examples
Parent Class
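
As a rough Python sketch of a parent class for a Transformation gem: the `ComponentSpec` below is an empty local stand-in for the Prophecy base class, and the gem name `Filter` and its `name` and `category` attributes are hypothetical values chosen for illustration. Only `optimizeCode` and its Boolean return come from the functions table above.

```python
class ComponentSpec:
    """Empty stand-in for the Prophecy base class of a Transformation gem."""


class Filter(ComponentSpec):
    # Hypothetical metadata attributes, shown for illustration only.
    name: str = "Filter"
    category: str = "Transform"

    def optimizeCode(self) -> bool:
        # Returning True lets the Prophecy optimizer simplify the gem code.
        return True
```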
Properties Classes
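
One way to model a properties class in Python is an immutable dataclass. The sketch below assumes a filter-style gem; the field names `columnsSelector` and `condition` are hypothetical, `ComponentProperties` is an empty local stand-in for the Prophecy base class, and freezing the dataclass is a design assumption rather than a documented requirement.

```python
from dataclasses import dataclass, field
from typing import List


class ComponentProperties:
    """Empty stand-in for the Prophecy base class."""


@dataclass(frozen=True)
class FilterProperties(ComponentProperties):
    # Hypothetical field: input columns referenced by the condition.
    columnsSelector: List[str] = field(default_factory=list)
    # Hypothetical field: the filter condition as a Spark SQL expression string.
    condition: str = "true"
```

Because the instance is frozen, a state change produces a new object (for example with `dataclasses.replace`) instead of mutating the existing one.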
Dialog (UI)

onChange function.
Validation
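
The idea behind `validate` (return a collection of diagnostics, empty when the gem is configured correctly) can be sketched in plain Python. The `Diagnostic` and `FilterProperties` types here are hypothetical stand-ins, and the function is simplified to take the properties directly; the signature of the real method on a gem class may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Diagnostic:
    """Hypothetical stand-in for one entry of the diagnostics array."""
    property_path: str
    message: str


@dataclass(frozen=True)
class FilterProperties:
    """Hypothetical properties of a filter-style gem."""
    condition: str = ""


def validate(props: FilterProperties) -> List[Diagnostic]:
    """Return one diagnostic per user error; an empty list means no errors."""
    diagnostics: List[Diagnostic] = []
    if not props.condition.strip():
        diagnostics.append(
            Diagnostic("properties.condition", "Filter condition cannot be empty.")
        )
    return diagnostics
```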
State Changes
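
A state change can be thought of as a pure function from the old and new properties to an updated properties object. The sketch below is an assumption-laden illustration: `FilterProperties` and its fields are hypothetical, the function takes the input column names directly instead of a Prophecy context object, and the column detection is a naive token match that would also pick up names inside string literals.

```python
import re
from dataclasses import dataclass, field, replace
from typing import List


@dataclass(frozen=True)
class FilterProperties:
    """Hypothetical properties of a filter-style gem."""
    condition: str = "true"
    columnsSelector: List[str] = field(default_factory=list)


def onChange(oldProps: FilterProperties,
             newProps: FilterProperties,
             inputColumns: List[str]) -> FilterProperties:
    """Return a copy of newProps whose columnsSelector lists the input
    columns mentioned in the condition."""
    # oldProps is unused here, but is available for comparing states.
    tokens = set(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", newProps.condition))
    used = [column for column in inputColumns if column in tokens]
    # replace() builds a new frozen instance instead of mutating newProps.
    return replace(newProps, columnsSelector=used)
```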
Component Code
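
A minimal sketch of the component code for a filter-style Transformation gem might look like the following. The class name `FilterCode` and the `condition` property are hypothetical, and the class is shown without its `ComponentCode` base so that the snippet has no dependencies. The only Spark call used is `DataFrame.filter`, which accepts a SQL expression string.

```python
class FilterCode:
    """Sketch of the class that would extend ComponentCode in a real gem."""

    def __init__(self, props):
        # props is assumed to expose a `condition` SQL expression string.
        self.props = props

    def apply(self, spark, in0):
        # in0 is the DataFrame arriving on the input port; the returned
        # DataFrame is what the gem exposes on its output port.
        return in0.filter(self.props.condition)
```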
To keep gems generally compatible with each other, they must conform to a common interface. Therefore, as defined in the apply() method, gems must accept and produce DataFrame objects at the input and output ports.

To help the Spark Catalyst Optimizer build scalable code, Prophecy performs some minor optimizations to the code generated by the apply() method.
Dataset Format example
The previous examples were for Transformation gems. The following example is the code for a Dataset Format gem.
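
As a hedged Python sketch of the code class of a Dataset Format gem, the snippet below reads and writes Parquet. The class name `ParquetFormatCode` and the `path` property are hypothetical, and the `ComponentCode` base class is omitted so that the snippet has no dependencies. The Spark calls used (`spark.read.format(...).load(...)` and `DataFrame.write.format(...).save(...)`) are the standard reader and writer APIs.

```python
class ParquetFormatCode:
    """Sketch of the class that would extend ComponentCode in a real gem."""

    def __init__(self, props):
        # props is assumed to expose a `path` attribute.
        self.props = props

    def sourceApply(self, spark):
        # Storage -> memory: returns a DataFrame.
        return spark.read.format("parquet").load(self.props.path)

    def targetApply(self, spark, in0):
        # Memory -> storage: writes the input DataFrame and returns None.
        in0.write.format("parquet").save(self.props.path)
```

Unlike `apply` in a Transformation gem, `sourceApply` has no input DataFrame and `targetApply` has no output, which matches the return types listed in the functions table.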

