
[SPARK-51338][INFRA] Add automated CI build for connect-examples #50187


Draft · wants to merge 7 commits into master

Conversation

vicennial (Contributor)

What changes were proposed in this pull request?

Adds a build step for the connect-examples directory (specifically the server-library-example project) to the CI workflow (build_and_test.yml).

Why are the changes needed?

There is currently no automated CI build for these examples.

Does this PR introduce any user-facing change?

No

How was this patch tested?

CI run in this PR

Was this patch authored or co-authored using generative AI tooling?

Generated-by: o3-mini-high

github-actions bot added the INFRA label on Mar 6, 2025
@hvanhovell (Contributor)

@HyukjinKwon can you take a look?

github-actions bot added the BUILD label on Mar 6, 2025
vicennial marked this pull request as draft on March 6, 2025, 21:15
@@ -92,6 +92,7 @@ jobs:
pyspark_pandas_modules=`cd dev && python -c "import sparktestsupport.modules as m; print(','.join(m.name for m in m.all_modules if m.name.startswith('pyspark-pandas')))"`
pyspark=`./dev/is-changed.py -m $pyspark_modules`
pandas=`./dev/is-changed.py -m $pyspark_pandas_modules`
connect_examples=`./dev/is-changed.py -m "connect-examples"`
Member

If we want to run this with Java 21 (and the other scheduled builds), we would need to add this to https://github.com/apache/spark/blob/master/.github/workflows/build_java21.yml too, e.g., "connect_examples": "true"
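
For illustration, a rough sketch of what that addition might look like in build_java21.yml, assuming the scheduled workflow invokes the reusable build_and_test.yml with a JSON map of jobs; everything other than the "connect_examples" entry is illustrative and may differ from the actual file:

# Hypothetical excerpt of .github/workflows/build_java21.yml;
# only "connect_examples": "true" is the proposed addition.
jobs:
  run-build:
    name: Run
    uses: ./.github/workflows/build_and_test.yml
    with:
      java: 21
      jobs: >-
        {
          "build": "true",
          "connect_examples": "true"
        }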

- name: Build server-library-example
  run: |
    cd connect-examples/server-library-example
    mvn clean package
Member

Can we use build/sbt instead? We use SBT in the PR builder.
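
If the example project were given an sbt build definition (today it ships only Maven poms), the CI step could look roughly like the sketch below; the exact tasks and paths are assumptions:

# Hypothetical step using Spark's bundled SBT launcher instead of Maven.
# Assumes connect-examples/server-library-example gains an sbt build definition.
- name: Build server-library-example
  run: |
    cd connect-examples/server-library-example
    ../../build/sbt clean package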

HyukjinKwon (Member) left a comment

Looks good otherwise

HyukjinKwon changed the title from "[SPARK-51338] Add automated CI build for connect-examples" to "[SPARK-51338][INFRA] Add automated CI build for connect-examples" on Mar 6, 2025
@LuciferYang (Contributor)

I've noticed a rather peculiar issue here. It seems that the connect-examples project is dependent on a released version of Spark, which means we can only update to a new version after it has been officially released.

For example, when Spark 4.0.0 is released, the version of this project in the published v4.0.0 tag won't be Spark 4.0.0, but rather some previously released version. Am I right about this? If I'm wrong, please correct me.

If I'm right, doesn't that seem a bit strange? Therefore, I still believe the connect-examples module should live outside the Spark repo, so that we can update and release it in a separate codebase after Spark 4.0.0 (or any other official version) is out.

<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>org.apache.connect.examples.serverlibrary</groupId>
  <artifactId>spark-server-library-example</artifactId>
  <version>1.0.0</version>
  <packaging>pom</packaging>
  <modules>
    <module>common</module>
    <module>server</module>
    <module>client</module>
  </modules>
  <properties>
    <maven.compiler.source>17</maven.compiler.source>
    <maven.compiler.target>17</maven.compiler.target>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <scala.binary>2.13</scala.binary>
    <scala.version>2.13.15</scala.version>
    <protobuf.version>3.25.4</protobuf.version>
    <spark.version>4.0.0-preview2</spark.version>
  </properties>
</project>

What do you think about this? @HyukjinKwon @hvanhovell @dongjoon-hyun @cloud-fan @srowen @yaooqinn

LuciferYang (Contributor) left a comment

I think we should discuss #50187 (comment) clearly before merging

@srowen (Member) commented Mar 8, 2025

Can the examples module simply point to SNAPSHOT versions like everything else in the build? The main branch code is always pointing at unreleased code, but on release, those SNAPSHOT versions are changed to the current release version.

@LuciferYang (Contributor)

> Can the examples module simply point to SNAPSHOT versions like everything else in the build? The main branch code is always pointing at unreleased code, but on release, those SNAPSHOT versions are changed to the current release version.

Yes, if it can be done this way, I think it's ok.

github-actions bot added the DOCS label on Mar 11, 2025
<spark.version>4.0.0-preview2</spark.version>
<protobuf.version>4.29.3</protobuf.version>
<spark.version>4.1.0-SNAPSHOT</spark.version>
<connect.guava.version>33.4.0-jre</connect.guava.version>
LuciferYang (Contributor) commented on Mar 12, 2025

This project should inherit from Spark's parent pom.xml, its version should be kept consistent with Spark's version, and spark.version should then use ${project.version}.

Otherwise, the release script currently seems unable to automatically change the project version to the official version during the release process (4.1.0-SNAPSHOT -> 4.1.0).
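
A hedged sketch of what that could look like in connect-examples/server-library-example/pom.xml; the parent coordinates and relativePath below are assumptions based on Spark's top-level pom:

<!-- Hypothetical: inherit Spark's parent pom so the project version tracks Spark's. -->
<parent>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-parent_2.13</artifactId>
  <version>4.1.0-SNAPSHOT</version>
  <relativePath>../../pom.xml</relativePath>
</parent>

<!-- Then derive the Spark dependency version from the inherited project version. -->
<properties>
  <spark.version>${project.version}</spark.version>
</properties>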

vicennial (Contributor, Author)

What if we update the release script and add a rule/command to auto-change the project version in this pom file as well? This way, we can maintain continuous build compatibility with Spark while remaining somewhat independent (modulo the dependency on the ASF snapshot repo).

I'd like to avoid inheriting the parent pom, as that would lead to the project pulling in Spark's default shading rules, version definitions, etc. In this specific case, that wouldn't be favourable, as the project is intended to demonstrate extension development using a minimal set of dependencies (spark-sql-api, spark-connect-client, etc.).
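
As a rough illustration of such a rule, the release tooling could stamp the example pom with something like the following; the variable name and exact sed expression are hypothetical, not part of the current scripts:

# Hypothetical addition to the release scripts: rewrite the example project's
# Spark dependency when the release version is stamped (e.g. 4.1.0-SNAPSHOT -> 4.1.0).
RELEASE_VERSION="4.1.0"   # illustrative value; would come from the release script
sed -i "s|<spark.version>.*</spark.version>|<spark.version>${RELEASE_VERSION}</spark.version>|" \
  connect-examples/server-library-example/pom.xml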

LuciferYang (Contributor) commented on Mar 12, 2025

If feasible, it's certainly ok. However, I have a few questions regarding this:

  1. How should the versions of other dependencies be updated? Do they need to be consistent with Spark? For instance, the current Spark uses Scala 2.13.16, but this project is still using 2.13.15.

  2. During the release process, after changing the Spark version (e.g., from 4.0.0-SNAPSHOT to 4.0.0), is it necessary to check the build of this project?

  3. Since it aims to be an independent project, why don't we maintain this examples project in a separate branch (with no Spark code whatsoever), or even create a separate repository like spark-connect-examples? If it were an independent repository, wouldn't it be more convenient to also include examples for clients in other programming languages, such as Go or Swift?

Contributor

> What if we update the release script and add a rule/command to auto-change the project version in this pom file as well? This way, we can maintain continuous build compatibility with Spark while remaining somewhat independent (modulo the dependency on the ASF snapshot repo).

@vicennial Is there any progress on this PR? I think it would be best if we could resolve this issue in Spark 4.0.

Contributor

@vicennial Is there any progress on this hypothetical plan? Or can we remove this example module from branch-4.0 first?

vicennial (Contributor, Author)

Thanks for the questions, @LuciferYang. I was AFK last week; back now.

> How should the versions of other dependencies be updated? Do they need to be consistent with Spark? For instance, the current Spark uses Scala 2.13.16, but this project is still using 2.13.15.

Some (but not all) dependencies need to be consistent, such as the protobuf version. These would need to be updated as the Spark Connect code and dependencies evolve.

> During the release process, after changing the Spark version (e.g., from 4.0.0-SNAPSHOT to 4.0.0), is it necessary to check the build of this project?

Since we've decided to add CI tests, I think it would make sense to do a final check at release time as well.

> Since it aims to be an independent project, why don't we maintain this examples project in a separate branch (with no Spark code whatsoever), or even create a separate repository like spark-connect-examples? If it were an independent repository, wouldn't it be more convenient to also include examples for clients in other programming languages, such as Go or Swift?

I am not opposed to a separate branch/repository and I could see it working, but I must admit I do not know the implications or pros/cons of creating a separate repository under the ASF. Perhaps the more seasoned committers may know; any ideas, @hvanhovell / @HyukjinKwon / @cloud-fan?

Contributor

> I'd like to avoid inheriting the parent pom, as that would lead to the project pulling in Spark's default shading rules, version definitions, etc. In this specific case, that wouldn't be favourable, as the project is intended to demonstrate extension development using a minimal set of dependencies (spark-sql-api, spark-connect-client, etc.).

After some consideration: if this project does not want to inherit Spark's parent pom.xml, it might be necessary to first deploy the Spark codebase corresponding to this commit to a local repository, and then build the example project with the -Dmaven.repo.local=/path/to/local/repository option.

Another possible approach is to configure the ASF snapshot repository, but in that case the project would not pick up a snapshot of the current commit, only a nightly build.
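
A minimal sketch of the first approach, assuming the commands run from the Spark repository root; the local repository path is illustrative:

# Hypothetical flow: install the current Spark commit into a throwaway local
# repository, then build the example project against those artifacts.
LOCAL_REPO=/tmp/spark-local-repo   # illustrative path
./build/mvn -DskipTests -Dmaven.repo.local="$LOCAL_REPO" install
cd connect-examples/server-library-example
mvn -Dmaven.repo.local="$LOCAL_REPO" clean package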

vicennial (Contributor, Author)

Thanks for the suggestions, @LuciferYang; I am exploring the first option at the moment.

@cloud-fan (Contributor)

AFAIK some downstream libraries (e.g. Delta Lake) also depend on the Spark nightly builds for testing. I think it makes sense for connect-examples to do the same.
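
For reference, a hedged sketch of how the example pom could resolve nightly SNAPSHOT artifacts from the ASF snapshot repository; the repository id and URL below are the commonly used ones but should be verified:

<!-- Hypothetical repository block for resolving Spark SNAPSHOT (nightly) artifacts. -->
<repositories>
  <repository>
    <id>apache-snapshots</id>
    <url>https://repository.apache.org/content/repositories/snapshots/</url>
    <releases><enabled>false</enabled></releases>
    <snapshots><enabled>true</enabled></snapshots>
  </repository>
</repositories>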

@LuciferYang (Contributor)

> AFAIK some downstream libraries (e.g. Delta Lake) also depend on the Spark nightly builds for testing. I think it makes sense for connect-examples to do the same.

Yes, that would be ok. However, the connect examples within the tag for Spark 4.0.0 should be changed to depend on the release version of Spark 4.0.0 rather than the snapshot, right? This seems to require some additional changes and needs to be completed before the release of version 4.0, correct?
