
[components] Scrapeless - update actions #17493


Open · wants to merge 1 commit into base: master
Conversation

joy-chanboop
Contributor

@joy-chanboop joy-chanboop commented Jul 7, 2025

WHY

  • Updated @scrapeless-ai/sdk dependency to version 1.6.0 in package.json.
  • Updated Scraping API action to support Google Search and Google Trends with new parameters for better data retrieval.
  • Updated how the Scrapeless client is obtained.

Summary by CodeRabbit

  • New Features
    • Added support for Google Trends data scraping alongside Google Search, with comprehensive parameter options.
  • Improvements
    • Enhanced input customization for scraping actions based on selected API server.
    • Improved efficiency by reusing client instances during scraping operations.
    • Updated environment handling for the Scrapeless client to support asynchronous loading.
  • Dependency Updates
    • Upgraded the Scrapeless SDK to version 1.6.0.
  • Bug Fixes
    • Addressed issues with input property handling and client instantiation for more reliable scraping.
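
The "input customization based on selected API server" item above typically works through a Pipedream-style `additionalProps()` that returns a different prop schema per selection. This is a minimal stand-alone sketch; the prop names and server keys are illustrative assumptions, not the PR's actual definitions.

```javascript
// Hypothetical sketch of per-server dynamic props; the prop names and
// server keys are illustrative assumptions, not the PR's actual code.
const GOOGLE_SEARCH_PROPS = {
  q: { type: "string", label: "Search Query" },
  tbm: { type: "string", label: "Search Type", optional: true },
};

const GOOGLE_TRENDS_PROPS = {
  q: { type: "string", label: "Search Query" },
  dataType: { type: "string", label: "Data Type", optional: true },
};

// Pipedream calls additionalProps() to build the form after the user
// picks an API server; returning a different object changes the inputs.
function additionalProps(apiServer) {
  return apiServer === "googleTrends"
    ? GOOGLE_TRENDS_PROPS
    : GOOGLE_SEARCH_PROPS;
}
```

Returning plain objects keyed by server makes it easy to add another Google API later without touching the selection logic.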


vercel bot commented Jul 7, 2025

The latest updates on your projects (1 skipped deployment):

| Name | Status | Updated (UTC) |
| --- | --- | --- |
| pipedream-docs-redirect-do-not-edit | ⬜️ Ignored | Jul 7, 2025 6:36am |

Contributor

coderabbitai bot commented Jul 7, 2025

Walkthrough

The updates refactor and extend Scrapeless action modules, notably making the Scrapeless client initialization asynchronous and updating its usage across actions. The scraping API action now supports Google Trends with extensive parameterization. Versions were incremented, input property logic was refactored, and the Scrapeless SDK dependency was updated to version 1.6.0.

Changes

| File(s) | Change Summary |
| --- | --- |
| components/scrapeless/actions/crawler/crawler.mjs | Refactored `additionalProps` logic for input properties, updated the version, reordered methods, and cached the client instance for reuse. |
| components/scrapeless/actions/scraping-api/scraping-api.mjs | Added Google Trends support, introduced dynamic `additionalProps` for both Google Search and Google Trends, expanded parameter options, improved client usage, removed old prop logic, and updated the version. |
| components/scrapeless/actions/universal-scraping-api/universal-scraping-api.mjs | Made client initialization asynchronous in `run`, reordered a method, and incremented the version. |
| components/scrapeless/scrapeless.app.mjs | Changed `_scrapelessClient` from synchronous to asynchronous, switched to dynamic import, set environment variables, and updated error handling for a missing API key. |
| components/scrapeless/package.json | Updated the `@scrapeless-ai/sdk` dependency from `^1.4.0` to `1.6.0`. |
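
The "reusing client instances" and async-initialization changes summarized above follow a common memoized async getter pattern. The sketch below uses a stub factory to stay self-contained; the real code would dynamically import `@scrapeless-ai/sdk` inside the factory.

```javascript
// Minimal sketch of an async, memoized client getter: the factory runs
// once, and every later call reuses the same instance. The factory here
// is a stub, not the SDK's real API.
function makeClientGetter(createClient) {
  let clientPromise = null;
  return function getClient() {
    if (!clientPromise) {
      clientPromise = createClient(); // first call kicks off creation
    }
    return clientPromise; // all callers resolve to the same instance
  };
}

// Usage with a stub factory standing in for the dynamic SDK import.
let constructed = 0;
const getClient = makeClientGetter(async () => {
  constructed += 1;
  return { scrape: async (url) => ({ url }) };
});
```

Caching the promise (rather than the resolved value) also prevents two concurrent callers from constructing two clients.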

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant Action
    participant ScrapelessApp
    participant ScrapelessClient

    User->>Action: Trigger run()
    Action->>+ScrapelessApp: await _scrapelessClient()
    ScrapelessApp->>+ScrapelessClient: (Dynamic import, instantiate with API key)
    ScrapelessApp-->>-Action: ScrapelessClient instance
    Action->>ScrapelessClient: Call appropriate method (crawl, scrape, universal.scrape, etc.)
    ScrapelessClient-->>Action: Return results
    Action-->>User: Respond with summary and results
```
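
The flow in the diagram can be sketched with a stubbed app object. All method and field names here are assumptions for illustration, and the stub client stands in for the dynamically imported SDK.

```javascript
// Stubbed sketch of the diagram's flow: the action awaits the app's
// client getter, then calls a scraping method on the cached client.
const app = {
  _client: null,
  async _scrapelessClient() {
    if (!this._client) {
      // The real app would dynamically import the SDK and pass the API
      // key; a stub client keeps this sketch self-contained.
      this._client = {
        scraping: { scrape: async (params) => ({ ok: true, params }) },
      };
    }
    return this._client;
  },
};

async function run(params) {
  const client = await app._scrapelessClient(); // awaited, then reused
  return client.scraping.scrape(params);
}
```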

Poem

In the warren of code, a fresh breeze blew,
Async clients hopping, with versions anew.
Trends and searches, now both in the mix,
With props refactored for clever new tricks.
From package to action, the changes are clear—
A rabbit’s delight: Scrapeless runs with cheer!
🐇✨

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

components/scrapeless/actions/scraping-api/scraping-api.mjs

```
Oops! Something went wrong! :(

ESLint: 8.57.1

Error [ERR_MODULE_NOT_FOUND]: Cannot find package 'jsonc-eslint-parser' imported from /eslint.config.mjs
    at Object.getPackageJSONURL (node:internal/modules/package_json_reader:255:9)
    at packageResolve (node:internal/modules/esm/resolve:767:81)
    at moduleResolve (node:internal/modules/esm/resolve:853:18)
    at defaultResolve (node:internal/modules/esm/resolve:983:11)
    at ModuleLoader.defaultResolve (node:internal/modules/esm/loader:801:12)
    at #cachedDefaultResolve (node:internal/modules/esm/loader:725:25)
    at ModuleLoader.resolve (node:internal/modules/esm/loader:708:38)
    at ModuleLoader.getModuleJobForImport (node:internal/modules/esm/loader:309:38)
    at #link (node:internal/modules/esm/module_job:202:49)
```

The same `ERR_MODULE_NOT_FOUND` error (missing `jsonc-eslint-parser`) was also reported for:

  • components/scrapeless/actions/crawler/crawler.mjs
  • components/scrapeless/scrapeless.app.mjs
  • 1 other file

vercel bot commented Jul 7, 2025

@joy-chanboop is attempting to deploy a commit to the Pipedreamers Team on Vercel.

A member of the Team first needs to authorize it.

@adolfo-pd adolfo-pd added the User submitted Submitted by a user label Jul 7, 2025
@pipedream-component-development
Collaborator

Thank you so much for submitting this! We've added it to our backlog to review, and our team has been notified.

@pipedream-component-development
Collaborator

Thanks for submitting this PR! When we review PRs, we follow the Pipedream component guidelines. If you're not familiar, here's a quick checklist:

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🔭 Outside diff range comments (1)
components/scrapeless/scrapeless.app.mjs (1)

25-40: Remove Unnecessary Environment Variables

Lines setting SCRAPELESS_IS_ONLINE and SCRAPELESS_LOG_ROOT_DIR aren’t required by Scrapeless SDK v1.6.0 and should be removed unless you have a custom, documented use case:

  • File: components/scrapeless/scrapeless.app.mjs
  • Remove at lines 26–27:

```diff
-      process.env.SCRAPELESS_IS_ONLINE = "true";
-      process.env.SCRAPELESS_LOG_ROOT_DIR = "/tmp";
```

Keep only the environment variable(s) that are officially required (e.g., SCRAPELESS_API_KEY).

🧹 Nitpick comments (3)
components/scrapeless/package.json (1)

17-17: Consider using caret range for SDK dependency

The dependency version was changed from ^1.4.0 to a fixed 1.6.0. While this ensures compatibility with the async client changes, it prevents automatic patch updates that might include security fixes or bug fixes.

Consider using a caret range to allow patch updates:

```diff
-    "@scrapeless-ai/sdk": "1.6.0"
+    "@scrapeless-ai/sdk": "^1.6.0"
```
components/scrapeless/actions/scraping-api/scraping-api.mjs (2)

148-150: Clarify the tbs parameter description

The description "(to be searched) parameter" is unclear. Consider updating it to be more descriptive.

```diff
-        description: "(to be searched) parameter defines advanced search parameters that aren't possible in the regular query field. (e.g., advanced search for patents, dates, news, videos, images, apps, or text contents).",
+        description: "The tbs (to be searched) parameter defines advanced search parameters that aren't possible in the regular query field (e.g., advanced search for patents, dates, news, videos, images, apps, or text contents).",
```

179-182: Clarify the tbm parameter description

The description "(to be matched) parameter" could be clearer.

```diff
-        description: "(to be matched) parameter defines the type of search you want to do.\n\nIt can be set to:\n`(no tbm parameter)`: `regular Google Search`,\n`isch`: `Google Images API`,\n`lcl` - `Google Local API`\n`vid`: `Google Videos API`,\n`nws`: `Google News API`,\n`shop`: `Google Shopping API`,\n`pts`: `Google Patents API`,\nor any other Google service.",
+        description: "The tbm (to be matched) parameter defines the type of search you want to do.\n\nIt can be set to:\n`(no tbm parameter)`: `regular Google Search`,\n`isch`: `Google Images API`,\n`lcl`: `Google Local API`,\n`vid`: `Google Videos API`,\n`nws`: `Google News API`,\n`shop`: `Google Shopping API`,\n`pts`: `Google Patents API`,\nor any other Google service.",
```
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 827dd7b and 1c15185.

⛔ Files ignored due to path filters (1)
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (5)
  • components/scrapeless/actions/crawler/crawler.mjs (3 hunks)
  • components/scrapeless/actions/scraping-api/scraping-api.mjs (5 hunks)
  • components/scrapeless/actions/universal-scraping-api/universal-scraping-api.mjs (2 hunks)
  • components/scrapeless/package.json (1 hunks)
  • components/scrapeless/scrapeless.app.mjs (1 hunks)
🔇 Additional comments (5)
components/scrapeless/actions/universal-scraping-api/universal-scraping-api.mjs (1)

63-88: Async client initialization correctly implemented

The changes properly handle the async client initialization and cache the client instance for reuse. This aligns with the updated _scrapelessClient method.

components/scrapeless/actions/crawler/crawler.mjs (2)

29-51: Clean refactoring of additionalProps method

The refactored method is more readable and maintains the same functionality. Good improvement!


68-86: Async client handling and improved formatting

Good implementation of async client caching and the backtick formatting for the URL in the summary message improves readability.

components/scrapeless/actions/scraping-api/scraping-api.mjs (2)

3-36: Well-structured imports and Google Trends support

Good addition of Google Trends support and proper use of constants for option values. The version increment appropriately reflects the new functionality.


282-368: Excellent implementation of async client and Google Trends support

The async client handling is consistent with other actions, and the Google Trends implementation is well-structured. The added logging will be helpful for debugging.

@joy-chanboop
Contributor Author

Hi @jcortes,
I've opened a new PR. I'm looking forward to your review; please let me know if you have any questions. Thanks a lot!

Collaborator

@jcortes jcortes left a comment


Hi @joy-chanboop, it's looking great. Just a few suggestions; other than that, it's Ready for QA!

@jcortes jcortes moved this to Changes Required in Component (Source and Action) Backlog Jul 7, 2025
@jcortes jcortes moved this from Changes Required to Ready for QA in Component (Source and Action) Backlog Jul 7, 2025
@vunguyenhung vunguyenhung moved this from Ready for QA to In QA in Component (Source and Action) Backlog Jul 8, 2025
@vunguyenhung vunguyenhung moved this from In QA to Ready for Release in Component (Source and Action) Backlog Jul 8, 2025
@vunguyenhung
Collaborator

Hi everyone, all test cases are passed! Ready for release!

Test report
https://vunguyenhung.notion.site/components-Scrapeless-update-actions-229bf548bb5e8162a2cbe13147a83779

@jcortes
Collaborator

jcortes commented Jul 8, 2025

Hi @joy-chanboop

Please increase the minor version of the following components as well:
components/scrapeless/actions/get-scrape-result/get-scrape-result.mjs
components/scrapeless/actions/submit-scrape-job/submit-scrape-job.mjs

Also change the version of components/scrapeless/package.json to 0.2.1

Labels
User submitted Submitted by a user
Projects
Status: Ready for Release