Dependencies:
  • ProphecySparkBasicsPython 0.2.25+
  • ProphecySparkBasicsScala 0.0.1+
Cluster requirements:
  • UC dedicated clusters not supported
  • UC standard clusters 14.3+ supported
  • Livy clusters 3.0.1+ supported
Use the SampleRows gem to sample records by choosing a specific number or percentage of records.

Parameters

| Parameter | Description |
| --- | --- |
| Sampling strategy | Whether to sample by number of records or by percentage of records |
| Sampling ratio | The ratio of records that you wish to sample |
| Random seed | A number that lets you reproduce the random sample |
| With replacement | When enabled, records are returned to the sample pool after selection, so the same record can be selected more than once |
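
To make the Random seed and With replacement options more concrete, here is a minimal sketch in plain Python using only the standard library. It is not the gem's implementation; `sample_rows` is a hypothetical helper written for illustration, and the list of integers stands in for a dataset.

```python
import random


def sample_rows(rows, n, seed=None, with_replacement=False):
    """Hypothetical helper (not part of Prophecy or Spark): pick n rows."""
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    if with_replacement:
        # Each pick goes back into the pool, so a row can appear again.
        return rng.choices(rows, k=n)
    # Without replacement, every picked row is distinct.
    return rng.sample(rows, k=n)


rows = list(range(100))

# Same seed, same sample.
first = sample_rows(rows, 10, seed=42)
second = sample_rows(rows, 10, seed=42)

# With replacement, the sample can be larger than the pool itself.
small_pool = [0, 1, 2]
repeated = sample_rows(small_pool, 10, seed=1, with_replacement=True)
```

In this sketch, `first` and `second` are identical because they share a seed, and `repeated` necessarily contains duplicates because ten picks are drawn from a pool of three.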

Example code

To see the compiled code of your project, switch to the Code view in the project header.
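from pyspark.sql import SparkSession, DataFrame

def SampleRows_1(spark: SparkSession, in0: DataFrame) -> DataFrame:
    # Sample roughly half of the input records, without replacement.
    return in0.sample(withReplacement=False, fraction=0.5)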
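
Note that Spark's `DataFrame.sample` does not guarantee that the result contains exactly the requested fraction of rows. One way to see why is the following plain-Python sketch, which assumes each row is kept independently with probability `fraction`. It only illustrates the idea and is not Spark's code; `bernoulli_sample` is a hypothetical name.

```python
import random


def bernoulli_sample(rows, fraction, seed=None):
    """Hypothetical helper: keep each row independently with probability `fraction`."""
    rng = random.Random(seed)
    return [row for row in rows if rng.random() < fraction]


rows = list(range(1000))
picked = bernoulli_sample(rows, 0.5, seed=7)

# The size is close to 500, but is not guaranteed to be exactly 500.
print(len(picked))
```

If you need an exact number of records, sampling by number of records is the better fit than sampling by percentage.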