
data-platform team mailing list archive

[Merge] soss/+source/charmed-spark:release-3.5.6-ubuntu0 into soss/+source/charmed-spark:lp-3.5.6

 

Alex Batisse has proposed merging soss/+source/charmed-spark:release-3.5.6-ubuntu0 into soss/+source/charmed-spark:lp-3.5.6.

Commit message:
[DPE-8333] Release spark-3.5.6-ubuntu0

Requested reviews:
  Canonical Data Platform (data-platform)

For more details, see:
https://code.launchpad.net/~data-platform/soss/+source/charmed-spark/+git/charmed-spark/+merge/492396

We originally had a test that never terminated, which caused the whole test suite to time out.
I tracked down the culprit and replaced the blocking statement with a timeout-aware alternative.

Oddly, I can reproduce the hang locally, but only when running the whole test suite. When I select only the test class containing this test, it passes just fine.

The test results look good, except for this one test, which now shows up as failing.
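
For context, the replacement relies on `RDD.countApprox`, which launches the same counting job as `count()` but hands back a `PartialResult` after at most the given timeout instead of blocking indefinitely. A minimal sketch of the API (assuming a live `SparkContext` in `sc`; the `rdd` here is illustrative, not the one from the suite):

```scala
import org.apache.spark.partial.{BoundedDouble, PartialResult}

// count() blocks until the job completes -- if a shuffle fetch hangs,
// the calling test hangs with it.
val rdd = sc.parallelize(1 to 1000).persist()

// countApprox returns after at most `timeout` milliseconds with whatever
// (possibly partial) result is available at that point.
val result: PartialResult[BoundedDouble] = rdd.countApprox(timeout = 10000L)
println(result.initialValue.mean) // approximate count observed so far
```

This keeps the job submission identical, so the block-fetch path under test is still exercised; the suite just can no longer stall on it.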

+-----------------------------------------------------+-----------+--------+----------+---------+---------+-------+-----------------------+--------------+
|                         Test                        | Succeeded | Failed | Canceled | Ignored | Pending | Total | Executed test modules | Failed Tests |
+-----------------------------------------------------+-----------+--------+----------+---------+---------+-------+-----------------------+--------------+
| 20250911H08M32_release-3.5.6-ubuntu0_22490_test.out |   32240   |   1    |    39    |   141   |    0    | 32421 |         46574         |      1       |
| 20250911H08M32_release-3.5.6-ubuntu0_24551_test.out |   32240   |   1    |    39    |   141   |    0    | 32421 |         46466         |      1       |
|  20250911H08M32_release-3.5.6-ubuntu0_9644_test.out |   32240   |   1    |    39    |   141   |    0    | 32421 |         46392         |      1       |
+-----------------------------------------------------+-----------+--------+----------+---------+---------+-------+-----------------------+--------------+



Failed tests:
---------------------------------------------------------------------------
Filename: 20250911H08M32_release-3.5.6-ubuntu0_22490_test.out
Number of failed tests: 1
===========================================================================
	 - SPARK-25888: using external shuffle service fetching disk persisted blocks *** FAILED ***
---------------------------------------------------------------------------
---------------------------------------------------------------------------
Filename: 20250911H08M32_release-3.5.6-ubuntu0_24551_test.out
Number of failed tests: 1
===========================================================================
	 - SPARK-25888: using external shuffle service fetching disk persisted blocks *** FAILED ***
---------------------------------------------------------------------------
---------------------------------------------------------------------------
Filename: 20250911H08M32_release-3.5.6-ubuntu0_9644_test.out
Number of failed tests: 1
===========================================================================
	 - SPARK-25888: using external shuffle service fetching disk persisted blocks *** FAILED ***
---------------------------------------------------------------------------
-- 
Your team Canonical Data Platform is requested to review the proposed merge of soss/+source/charmed-spark:release-3.5.6-ubuntu0 into soss/+source/charmed-spark:lp-3.5.6.
diff --git a/.launchpad.yaml b/.launchpad.yaml
new file mode 100644
index 0000000..bcdc871
--- /dev/null
+++ b/.launchpad.yaml
@@ -0,0 +1,106 @@
+pipeline:
+- build
+
+jobs:
+    build:
+        series: jammy
+        architectures: amd64
+        packages:
+          - wget
+          - openjdk-17-jdk
+          - maven
+          - git
+          - python3
+          - python3-pip
+          - python3-setuptools
+          - xmlstarlet
+          - zip
+        environment:
+          NO_PROXY: localhost
+          ARTIFACTORY_BUILDING_URL: "https://canonical.jfrog.io/artifactory/dataplatform-spark/"
+          ARTIFACTORY_STAGING_URL: "https://canonical.jfrog.io/artifactory/dataplatform-spark-staging/"
+          RELEASE_NAME: "k8s"
+        run: |-
+            # try to read branch name (works only locally)
+            
+            # We need to copy files under home due to the confined nature of the snap
+            cp pom.xml ~/pom.xml
+            SPARK_VERSION=$(xmlstarlet sel -N x="http://maven.apache.org/POM/4.0.0" -t -m "//x:project" -v "_:version" ~/pom.xml)
+            rm ~/pom.xml
+            echo "Spark version: $SPARK_VERSION"
+            
+            BRANCH_NAME=$(git branch --show-current)
+            echo "branch_name: $BRANCH_NAME"
+            # check if branch name is valid
+            if [ -z "$BRANCH_NAME" ]
+            then
+              # get branch revision id from git HEAD file
+              echo "No branch name given from git command! Try to get it from .git folder"
+              git_rev=$(cat .git/HEAD)
+              while read line; do
+                current_rev=$( echo $line | awk -F ' ' '{print $1}' )
+                branch_name=$( echo $line | awk -F ' ' '{print $2}' | awk -F '/' '{print $NF}' )
+                if [[ $current_rev = $git_rev ]]
+                then
+                  echo "Branch name: $branch_name"
+                  export BRANCH_NAME=$branch_name
+                fi
+              done < .git/packed-refs
+            else
+              export JAVA_OPTS=""
+              sed -i 's/<proxies>/<!-- <proxies>/' settings.xml
+              sed -i 's/<\/proxies>/<\/proxies> -->/' settings.xml
+            fi
+            [ ! -z "$BRANCH_NAME" ] && echo "Current branch: $BRANCH_NAME"
+            if [[ "$BRANCH_NAME" != "lp-"* ]]; then
+              export ARTIFACTORY_URL=$ARTIFACTORY_STAGING_URL
+              export RELEASE=false
+            else
+              export ARTIFACTORY_URL=$ARTIFACTORY_BUILDING_URL
+              export RELEASE=true
+            fi
+            echo "Selected artifactory: $ARTIFACTORY_URL"
+            echo "Release artifact: $RELEASE"
+            # check artifactory credentials
+            [ -z "$PIP_INDEX_URL" ] && exit 1
+            [ ! -z "$PIP_INDEX_URL" ] && echo "Env variable exists :) "
+            [ ! -z "$PIP_INDEX_URL" ] && export USERNAME=$(echo "${PIP_INDEX_URL#https://}" | awk -F '@' '{print $1}' | awk -F ':' '{print $1}')
+            [ ! -z "$PIP_INDEX_URL" ] && export PASSWORD=$(echo "${PIP_INDEX_URL#https://}" | awk -F '@' '{print $1}' | awk -F ':' '{print $2}')
+            echo "ARTIFACTORY TO BE USED: $ARTIFACTORY_URL"
+            echo "ARTIFACTORY USERNAME: $USERNAME"
+            # check release name
+            current_date=$(date '+%Y%m%d%H%M%S')
+            CANONICAL_PATCH_VERSION=$(cat PATCH_VERSION )
+            CANONICAL_PATCH_VERSION+="-$current_date"
+            export CANONICAL_PATCH_VERSION=$CANONICAL_PATCH_VERSION
+            echo "Canonical patch: $RELEASE_NAME"
+            # set java version to Java 17
+            update-java-alternatives -s $(update-java-alternatives -l | grep '17' | cut -d " " -f1) || echo '.'
+            java -version
+            # configure setting for maven repository
+            mkdir ~/.m2
+            mv settings.xml  ~/.m2/settings.xml
+            cat ~/.m2/settings.xml
+            # start building process
+            echo "Start building Spark..."
+            # build spark
+            ./dev/make-distribution.sh --pip --tgz --name $RELEASE_NAME -Pkubernetes -Phive -Phive-thriftserver -Pyarn -Pvolcano -Phadoop-cloud
+            FILE=spark-$SPARK_VERSION-$CANONICAL_PATCH_VERSION-bin-$RELEASE_NAME.tgz
+            if [ -f "$FILE" ]; then
+              echo "$FILE exists. Spark is correctly built."
+            else
+              echo "$FILE does not exist. Exit..."
+              exit 1
+            fi
+            # compute checksum
+            sha512sum spark-$SPARK_VERSION-$CANONICAL_PATCH_VERSION-bin-$RELEASE_NAME.tgz > spark-$SPARK_VERSION-$CANONICAL_PATCH_VERSION-bin-$RELEASE_NAME.tgz.sha512
+            cd ~/.m2
+            # compress local maven repository
+            zip -rq repository.zip repository
+            cp repository.zip /build/lpci/project/
+            cd /build/lpci/project/
+        output:
+            paths:
+              - spark-*-bin-*.tgz
+              - spark-*-bin-*.tgz.sha512
+              - repository.zip
diff --git a/PATCH_VERSION b/PATCH_VERSION
new file mode 100644
index 0000000..852dc9e
--- /dev/null
+++ b/PATCH_VERSION
@@ -0,0 +1 @@
+ubuntu0
diff --git a/core/src/test/scala/org/apache/spark/ExternalShuffleServiceSuite.scala b/core/src/test/scala/org/apache/spark/ExternalShuffleServiceSuite.scala
index 68434af..acf3ba8 100644
--- a/core/src/test/scala/org/apache/spark/ExternalShuffleServiceSuite.scala
+++ b/core/src/test/scala/org/apache/spark/ExternalShuffleServiceSuite.scala
@@ -126,7 +126,7 @@ class ExternalShuffleServiceSuite extends ShuffleSuite with BeforeAndAfterAll wi
         .map { i => (i, broadcast.value.size) }
         .persist(StorageLevel.DISK_ONLY)
 
-      rdd.count()
+      rdd.countApprox(10000) // 10 sec
 
       val blockId = RDDBlockId(rdd.id, 0)
       val bms = eventually(timeout(2.seconds), interval(100.milliseconds)) {
diff --git a/dev/make-distribution.sh b/dev/make-distribution.sh
index ef7c010..99e20f9 100755
--- a/dev/make-distribution.sh
+++ b/dev/make-distribution.sh
@@ -36,7 +36,7 @@ MAKE_TGZ=false
 MAKE_PIP=false
 MAKE_R=false
 NAME=none
-MVN="$SPARK_HOME/build/mvn"
+MVN="mvn"
 
 function exit_with_usage {
   set +x
@@ -289,7 +289,7 @@ if [ -d "$SPARK_HOME/R/lib/SparkR" ]; then
 fi
 
 if [ "$MAKE_TGZ" == "true" ]; then
-  TARDIR_NAME=spark-$VERSION-bin-$NAME
+  TARDIR_NAME=spark-$VERSION-$CANONICAL_PATCH_VERSION-bin-$NAME
   TARDIR="$SPARK_HOME/$TARDIR_NAME"
   rm -rf "$TARDIR"
   cp -r "$DISTDIR" "$TARDIR"
@@ -297,6 +297,6 @@ if [ "$MAKE_TGZ" == "true" ]; then
   if [ "$(uname -s)" = "Darwin" ]; then
     TAR="tar --no-mac-metadata --no-xattrs --no-fflags"
   fi
-  $TAR -czf "spark-$VERSION-bin-$NAME.tgz" -C "$SPARK_HOME" "$TARDIR_NAME"
+  $TAR -czf "spark-$VERSION-$CANONICAL_PATCH_VERSION-bin-$NAME.tgz" -C "$SPARK_HOME" "$TARDIR_NAME"
   rm -rf "$TARDIR"
 fi
diff --git a/pom.xml b/pom.xml
index 68e2c42..b70fc8f 100644
--- a/pom.xml
+++ b/pom.xml
@@ -112,11 +112,11 @@
   <properties>
     <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
     <project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
-    <java.version>1.8</java.version>
+    <java.version>17</java.version>
     <maven.compiler.source>${java.version}</maven.compiler.source>
     <maven.compiler.target>${java.version}</maven.compiler.target>
-    <maven.version>3.9.6</maven.version>
-    <exec-maven-plugin.version>3.1.0</exec-maven-plugin.version>
+    <maven.version>3.6.3</maven.version>
+    <exec-maven-plugin.version>1.6.0</exec-maven-plugin.version>
     <sbt.project.name>spark</sbt.project.name>
     <asm.version>9.5</asm.version>
     <slf4j.version>2.0.7</slf4j.version>
diff --git a/settings.xml b/settings.xml
new file mode 100644
index 0000000..dcb5cf7
--- /dev/null
+++ b/settings.xml
@@ -0,0 +1,81 @@
+<settings xmlns="http://maven.apache.org/SETTINGS/1.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+  xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.0.0 https://maven.apache.org/xsd/settings-1.0.0.xsd">
+  <servers>
+    <server>
+      <username>${env.USERNAME}</username>
+      <password>${env.PASSWORD}</password>
+      <id>central</id>
+    </server>
+    <server>
+      <username>${env.USERNAME}</username>
+      <password>${env.PASSWORD}</password>
+      <id>snapshots</id>
+    </server>
+  </servers>
+  <profiles>
+    <profile>
+      <repositories>
+        <repository>
+          <snapshots>
+            <enabled>false</enabled>
+          </snapshots>
+          <id>central</id>
+          <name>dataplatform-spark</name>
+          <url>${env.ARTIFACTORY_URL}</url>
+        </repository>
+	<repository>
+          <snapshots />
+          <id>snapshots</id>
+          <name>dataplatform-spark</name>
+          <url>${env.ARTIFACTORY_URL}</url>
+        </repository>
+      </repositories>
+      <pluginRepositories>
+        <pluginRepository>
+          <snapshots>
+            <enabled>false</enabled>
+          </snapshots>
+          <id>central</id>
+          <name>dataplatform-spark</name>
+          <url>${env.ARTIFACTORY_URL}</url>
+        </pluginRepository>
+        <pluginRepository>
+          <snapshots />
+          <id>snapshots</id>
+          <name>dataplatform-spark</name>
+          <url>${env.ARTIFACTORY_URL}</url>
+        </pluginRepository>
+      </pluginRepositories>
+      <id>artifactory</id>
+    </profile>
+  </profiles>
+  <activeProfiles>
+    <activeProfile>artifactory</activeProfile>
+  </activeProfiles>
+  <proxies>
+    <proxy>
+      <id>http_proxy</id>
+      <active>true</active>
+      <protocol>http</protocol>
+      <host>10.10.10.1</host>
+      <port>8222</port>
+      <nonProxyHosts>localhost</nonProxyHosts>
+    </proxy>
+    <proxy>
+      <id>https_proxy</id>
+      <active>true</active>
+      <protocol>https</protocol>
+      <host>10.10.10.1</host>
+      <port>8222</port>
+      <nonProxyHosts>localhost</nonProxyHosts>
+    </proxy>
+  </proxies>
+  <mirrors>
+    <mirror>
+      <id>central</id>
+      <name>Maven Repository Manager running on canonical.jfrog.io</name>
+      <url>${env.ARTIFACTORY_URL}</url>
+      <mirrorOf>*</mirrorOf>
+    </mirror>
+  </mirrors>
+</settings>