Documentation Index
Fetch the complete documentation index at: https://docs.prophecy.ai/llms.txt
Use this file to discover all available pages before exploring further.
This will repartition or coalesce the input DataFrame based on the specified configuration. There are four different repartitioning options:
Hash Repartitoning
Repartitions the data evenly across various partitions based on the hash value of the specified key.
Parameters
| Parameter | Description | Required |
| DataFrame | Input DataFrame | True |
| Overwrite default partitions | Flag to overwrite default partitions | False |
| Number of partitions | Integer value specifying number of partitions | False |
| Repartition expression(s) | List of expressions to repartition by | True |
Compiled code
def hashRepartition(spark: SparkSession, in0: DataFrame) -> DataFrame:
return in0.repartition(5, col("customer_id"))
Random Repartitioning
Repartitions without data distribution defined.
Parameters
| Parameter | Description | Required |
| DataFrame | Input DataFrame | True |
| Number of partitions | Integer value specifying number of partitions | True |
Compiled code
def randomRepartition(spark: SparkSession, in0: DataFrame) -> DataFrame:
return in0.repartition(5)
Range Repartitoning
Repartitions the data with tuples having keys within the same range on the same worker.
Parameters
| Parameter | Description | Required |
| DataFrame | Input DataFrame | True |
| Overwrite default partitions | Flag to overwrite default partitions | False |
| Number of partitions | Integer value specifying number of partitions | False |
| Repartition expression(s) with sorting | List of expressions to repartition by with corresponding sorting order | True |
Compiled code
def RepartitionByRange(spark: SparkSession, in0: DataFrame) -> DataFrame:
return in0.repartitionByRange(5, col("customer_id").asc())
Coalesce
Reduces the number of partitions without shuffling the dataset.
Parameters
| Parameter | Description | Required |
| DataFrame | Input DataFrame | True |
| Number of partitions | Integer value specifying number of partitions | True |
Compiled code
def Coalesce(spark: SparkSession, in0: DataFrame) -> DataFrame:
return in0.coalesce(5)
Video demo