
Commit 10c39b8

Merge pull request #993 from ScrapeGraphAI/main: allignement
2 parents: 1ee7640 + 6989e1a

27 files changed: +817 / -124 lines

CHANGELOG.md

Lines changed: 72 additions & 0 deletions
```diff
@@ -1,3 +1,75 @@
+## [1.59.0](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.58.0...v1.59.0) (2025-06-24)
+
+
+### Features
+
+* removed sposnsors ([288c69a](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/288c69a862f34b999db476e669ff97c00afacde3))
+
+## [1.58.0](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.57.0...v1.58.0) (2025-06-21)
+
+
+### Features
+
+* add new oss link ([0c2481f](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/0c2481fffebca355e542ae420ee1bf4cade8e5e3))
+
+
+### Docs
+
+* add links to other language versions of README ([07dec35](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/07dec35f1bf95842ee55b17796bb45f2db0f44b3))
+
+## [1.57.0](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.56.0...v1.57.0) (2025-06-13)
+
+
+### Features
+
+* add markdownify endpoint ([7340375](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/73403755da1e4c3065e91d834c59f6d8c1825763))
+
+## [1.56.0](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.55.0...v1.56.0) (2025-06-13)
+
+
+### Features
+
+* add scrapegraphai integration ([94e9ebd](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/94e9ebd28061f8313bb23074b4db3406cf4db0c9))
+
+## [1.55.0](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.54.1...v1.55.0) (2025-06-07)
+
+
+### Features
+
+* add adv ([cd29791](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/cd29791894325c54f1dec1d2a5f6456800beb63e))
+* update logs ([8c54162](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/8c541620879570c46f32708c7e488e9a4ca0ea3e))
+
+## [1.54.1](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.54.0...v1.54.1) (2025-06-06)
+
+
+### Bug Fixes
+
+* bug on generate answer node ([e846a14](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/e846a1415506a58f7bc8b76ac56ba0b6413178ba))
+
+## [1.54.0](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.53.0...v1.54.0) (2025-06-06)
+
+
+### Features
+
+* add grok integration ([0c476a4](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/0c476a4a7bbbec3883f505cd47bcffdcd2d9e5fd))
+
+
+### Bug Fixes
+
+* grok integration and add new grok models ([3f18272](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/3f1827274c60a2729233577666d2fa446c48c4ba))
+
+
+### chore
+
+* enhanced a readme ([68bb34c](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/68bb34cc5e63b8a1d5acc61b9b61f9ea716a2a51))
+
+
+### CI
+
+* **release:** 1.52.0-beta.1 [skip ci] ([7adb0f1](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/7adb0f1df1efc4e6ada1134f6e53e4d6b072a608))
+* **release:** 1.52.0-beta.2 [skip ci] ([386b46a](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/386b46a8692c8c18000bb071fc8f312adc3ad05e))
+* **release:** 1.54.0-beta.1 [skip ci] ([77d4432](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/77d44321a1d41e10ac6aa13b526a49e718bd7c5d))
+
 ## [1.54.0-beta.1](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.53.0...v1.54.0-beta.1) (2025-06-06)
```
README.md

Lines changed: 8 additions & 18 deletions
```diff
@@ -1,4 +1,4 @@
-## 🚀 **Looking for an even faster and simpler way to scrape at scale (only 5 lines of code)? ** Check out our enhanced version at [**ScrapeGraphAI.com**](https://scrapegraphai.com/?utm_source=github&utm_medium=readme&utm_campaign=oss_cta&ut#m_content=top_banner)! 🚀
+## 🚀 **Looking for an even faster and simpler way to scrape at scale (only 5 lines of code)?** Check out our enhanced version at [**ScrapeGraphAI.com**](https://scrapegraphai.com/?utm_source=github&utm_medium=readme&utm_campaign=oss_cta&ut#m_content=top_banner)! 🚀
 
 ---
 
@@ -7,6 +7,10 @@
 [English](https://github.com/VinciGit00/Scrapegraph-ai/blob/main/README.md) | [中文](https://github.com/VinciGit00/Scrapegraph-ai/blob/main/docs/chinese.md) | [日本語](https://github.com/VinciGit00/Scrapegraph-ai/blob/main/docs/japanese.md)
 | [한국어](https://github.com/VinciGit00/Scrapegraph-ai/blob/main/docs/korean.md)
 | [Русский](https://github.com/VinciGit00/Scrapegraph-ai/blob/main/docs/russian.md) | [Türkçe](https://github.com/VinciGit00/Scrapegraph-ai/blob/main/docs/turkish.md)
+| [Deutsch](https://www.readme-i18n.com/ScrapeGraphAI/Scrapegraph-ai?lang=de)
+| [Español](https://www.readme-i18n.com/ScrapeGraphAI/Scrapegraph-ai?lang=es)
+| [français](https://www.readme-i18n.com/ScrapeGraphAI/Scrapegraph-ai?lang=fr)
+| [Português](https://www.readme-i18n.com/ScrapeGraphAI/Scrapegraph-ai?lang=pt)
 
 
 [![Downloads](https://img.shields.io/pepy/dt/scrapegraphai?style=for-the-badge)](https://pepy.tech/project/scrapegraphai)
@@ -39,7 +43,7 @@ You can find more informations at the following [link](https://scrapegraphai.com
 - **API**: [Documentation](https://docs.scrapegraphai.com/introduction)
 - **SDKs**: [Python](https://docs.scrapegraphai.com/sdks/python), [Node](https://docs.scrapegraphai.com/sdks/javascript)
 - **LLM Frameworks**: [Langchain](https://docs.scrapegraphai.com/integrations/langchain), [Llama Index](https://docs.scrapegraphai.com/integrations/llamaindex), [Crew.ai](https://docs.scrapegraphai.com/integrations/crewai), [CamelAI](https://github.com/camel-ai/camel)
-- **Low-code Frameworks**: [Pipedream](https://pipedream.com/apps/scrapegraphai), [Bubble](https://bubble.io/plugin/scrapegraphai-1745408893195x213542371433906180), [Zapier](https://zapier.com/apps/scrapegraphai/integrations), [n8n](http://localhost:5001/dashboard), [LangFlow](https://www.langflow.org)
+- **Low-code Frameworks**: [Pipedream](https://pipedream.com/apps/scrapegraphai), [Bubble](https://bubble.io/plugin/scrapegraphai-1745408893195x213542371433906180), [Zapier](https://zapier.com/apps/scrapegraphai/integrations), [n8n](http://localhost:5001/dashboard), [LangFlow](https://www.langflow.org), [Dify](https://dify.ai)
 - **MCP server**: [Link](https://smithery.ai/server/@ScrapeGraphAI/scrapegraph-mcp)
 
 ## 🚀 Quick install
@@ -183,22 +187,6 @@ We offer SDKs in both Python and Node.js, making it easy to integrate into your
 
 The Official API Documentation can be found [here](https://docs.scrapegraphai.com/).
 
-## 🏆 Sponsors
-<div style="text-align: center;">
-  <a href="https://2ly.link/1zaXG">
-    <img src="https://raw.githubusercontent.com/VinciGit00/Scrapegraph-ai/main/docs/assets/browserbase_logo.png" alt="Browserbase" style="width: 10%;">
-  </a>
-  <a href="https://2ly.link/1zNiz">
-    <img src="https://raw.githubusercontent.com/VinciGit00/Scrapegraph-ai/main/docs/assets/serp_api_logo.png" alt="SerpAPI" style="width: 10%;">
-  </a>
-  <a href="https://2ly.link/1zNj1">
-    <img src="https://raw.githubusercontent.com/VinciGit00/Scrapegraph-ai/main/docs/assets/transparent_stat.png" alt="Stats" style="width: 15%;">
-  </a>
-  <a href="https://scrape.do">
-    <img src="https://raw.githubusercontent.com/VinciGit00/Scrapegraph-ai/main/docs/assets/scrapedo.png" alt="Stats" style="width: 11%;">
-  </a>
-</div>
-
 ## 📈 Telemetry
 We collect anonymous usage metrics to enhance our package's quality and user experience. The data helps us prioritize improvements and ensure compatibility. If you wish to opt-out, set the environment variable SCRAPEGRAPHAI_TELEMETRY_ENABLED=false. For more information, please refer to the documentation [here](https://scrapegraph-ai.readthedocs.io/en/latest/scrapers/telemetry.html).
 
@@ -235,3 +223,5 @@ ScrapeGraphAI is licensed under the MIT License. See the [LICENSE](https://githu
 - ScrapeGraphAI is meant to be used for data exploration and research purposes only. We are not responsible for any misuse of the library.
 
 Made with ❤️ by [ScrapeGraph AI](https://scrapegraphai.com)
+
+[Scarf tracking](https://static.scarf.sh/a.png?x-pxid=102d4b8c-cd6a-4b9e-9a16-d6d141b9212d)
```

docs/assets/scrapedo.png (-19.2 KB, binary file not shown)

docs/assets/scrapeless.png (-22.2 KB, binary file not shown)

docs/assets/serp_api_logo.png (-15.1 KB, binary file not shown)

docs/assets/transparent_stat.png (-217 KB, binary file not shown)

examples/markdownify/.env.example

Lines changed: 1 addition & 0 deletions
```diff
@@ -0,0 +1 @@
+SCRAPEGRAPH_API_KEY=your SCRAPEGRAPH_API_KEY
```
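The `.env.example` above only names the variable; the example scripts in this commit read it at runtime and fail fast when it is missing. A minimal sketch of that pattern (the `require_api_key` helper and the demo key value are illustrative, not part of the repository):

```python
import os

def require_api_key(env_var: str = "SCRAPEGRAPH_API_KEY") -> str:
    """Return the API key from the environment, raising when it is unset."""
    key = os.getenv(env_var)
    if not key:
        raise ValueError(f"{env_var} environment variable not found")
    return key

# Simulate a populated environment for this sketch
os.environ["SCRAPEGRAPH_API_KEY"] = "sgai-demo-key"
print(require_api_key())  # → sgai-demo-key
```

In the real scripts the value comes from a `.env` file loaded by `python-dotenv`, so nothing secret is hard-coded.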
Lines changed: 35 additions & 0 deletions
```diff
@@ -0,0 +1,35 @@
+"""
+Example script demonstrating the markdownify functionality
+"""
+
+import os
+from dotenv import load_dotenv
+from scrapegraph_py import Client
+from scrapegraph_py.logger import sgai_logger
+
+def main():
+    # Load environment variables
+    load_dotenv()
+
+    # Set up logging
+    sgai_logger.set_logging(level="INFO")
+
+    # Initialize the client
+    api_key = os.getenv("SCRAPEGRAPH_API_KEY")
+    if not api_key:
+        raise ValueError("SCRAPEGRAPH_API_KEY environment variable not found")
+    sgai_client = Client(api_key=api_key)
+
+    # Example 1: Convert a website to Markdown
+    print("Example 1: Converting website to Markdown")
+    print("-" * 50)
+    response = sgai_client.markdownify(
+        website_url="https://example.com"
+    )
+    print("Markdown output:")
+    print(response["result"])  # Access the result key from the dictionary
+    print("\nMetadata:")
+    print(response.get("metadata", {}))  # Use get() with default value
+    print("\n" + "=" * 50 + "\n")
+if __name__ == "__main__":
+    main()
```
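The script treats the markdownify payload as a plain dictionary: a required key is indexed directly, while an optional one is read with `get()` and a default. A self-contained sketch of that access pattern, with an invented response dict standing in for the API call:

```python
# Hypothetical response shape for illustration only; no API call is made
# and these dict contents are invented, not returned by the service.
response = {
    "result": "# Example Domain\n\nThis domain is for use in examples.",
    "request_id": "req-123",
}

markdown = response["result"]            # raises KeyError if "result" is missing
metadata = response.get("metadata", {})  # falls back to {} when absent

print(markdown.splitlines()[0])  # → # Example Domain
print(metadata)                  # → {}
```

Using `get()` for optional keys keeps the script from crashing when the server omits a field, while direct indexing on `"result"` surfaces a hard failure early if the essential payload is missing.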

examples/markdownify/readme.md

Lines changed: 75 additions & 0 deletions
```diff
@@ -0,0 +1,75 @@
+# Markdownify Graph Example
+
+This example demonstrates how to use the Markdownify graph to convert HTML content to Markdown format.
+
+## Features
+
+- Convert HTML content to clean, readable Markdown
+- Support for both URL and direct HTML input
+- Maintains formatting and structure of the original content
+- Handles complex HTML elements and nested structures
+
+## Usage
+
+```python
+from scrapegraphai import Client
+from scrapegraphai.logger import sgai_logger
+
+# Set up logging
+sgai_logger.set_logging(level="INFO")
+
+# Initialize the client
+sgai_client = Client(api_key="your-api-key")
+
+# Example 1: Convert a website to Markdown
+response = sgai_client.markdownify(
+    website_url="https://example.com"
+)
+print(response.markdown)
+
+# Example 2: Convert HTML content directly
+html_content = """
+<div>
+    <h1>Hello World</h1>
+    <p>This is a <strong>test</strong> paragraph.</p>
+</div>
+"""
+response = sgai_client.markdownify(
+    html_content=html_content
+)
+print(response.markdown)
+```
+
+## Parameters
+
+The `markdownify` method accepts the following parameters:
+
+- `website_url` (str, optional): The URL of the website to convert to Markdown
+- `html_content` (str, optional): Direct HTML content to convert to Markdown
+
+Note: You must provide either `website_url` or `html_content`, but not both.
+
+## Response
+
+The response object contains:
+
+- `markdown` (str): The converted Markdown content
+- `metadata` (dict): Additional information about the conversion process
+
+## Error Handling
+
+The graph handles various edge cases:
+
+- Invalid URLs
+- Malformed HTML
+- Network errors
+- Timeout issues
+
+If an error occurs, it will be logged and raised with appropriate error messages.
+
+## Best Practices
+
+1. Always provide a valid URL or well-formed HTML content
+2. Use appropriate logging levels for debugging
+3. Handle the response appropriately in your application
+4. Consider rate limiting for large-scale conversions
```
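The Parameters note above says the two inputs are mutually exclusive: exactly one of `website_url` or `html_content`. A minimal sketch of how a caller could enforce that rule client-side before issuing a request (the `validate_markdownify_args` helper is illustrative and not part of the SDK):

```python
def validate_markdownify_args(website_url=None, html_content=None):
    """Enforce the documented rule: provide exactly one input, not both, not neither."""
    if (website_url is None) == (html_content is None):
        raise ValueError("Provide exactly one of website_url or html_content")
    return website_url if website_url is not None else html_content

print(validate_markdownify_args(website_url="https://example.com"))
try:
    validate_markdownify_args()  # neither input given
except ValueError as err:
    print(err)
```

Comparing the two `is None` checks for equality catches both failure modes (both set, both missing) in one condition.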
Lines changed: 1 addition & 0 deletions
```diff
@@ -0,0 +1 @@
+SCRAPEGRAPH_API_KEY=your SCRAPEGRAPH_API_KEY
```

examples/search_graph/scrapegraphai/readme.md

Whitespace-only changes.
Lines changed: 83 additions & 0 deletions
```diff
@@ -0,0 +1,83 @@
+"""
+Example implementation of search-based scraping using Scrapegraph AI.
+This example demonstrates how to use the searchscraper to extract information from the web.
+"""
+
+import os
+from typing import Dict, Any
+from dotenv import load_dotenv
+from scrapegraph_py import Client
+from scrapegraph_py.logger import sgai_logger
+
+def format_response(response: Dict[str, Any]) -> None:
+    """
+    Format and print the search response in a readable way.
+
+    Args:
+        response (Dict[str, Any]): The response from the search API
+    """
+    print("\n" + "="*50)
+    print("SEARCH RESULTS")
+    print("="*50)
+
+    # Print request ID
+    print(f"\nRequest ID: {response['request_id']}")
+
+    # Print number of sources
+    urls = response.get('reference_urls', [])
+    print(f"\nSources Processed: {len(urls)}")
+
+    # Print the extracted information
+    print("\nExtracted Information:")
+    print("-"*30)
+    if isinstance(response['result'], dict):
+        for key, value in response['result'].items():
+            print(f"\n{key.upper()}:")
+            if isinstance(value, list):
+                for item in value:
+                    print(f"  • {item}")
+            else:
+                print(f"  {value}")
+    else:
+        print(response['result'])
+
+    # Print source URLs
+    if urls:
+        print("\nSources:")
+        print("-"*30)
+        for i, url in enumerate(urls, 1):
+            print(f"{i}. {url}")
+    print("\n" + "="*50)
+
+def main():
+    # Load environment variables
+    load_dotenv()
+
+    # Get API key
+    api_key = os.getenv("SCRAPEGRAPH_API_KEY")
+    if not api_key:
+        raise ValueError("SCRAPEGRAPH_API_KEY not found in environment variables")
+
+    # Configure logging
+    sgai_logger.set_logging(level="INFO")
+
+    # Initialize client
+    sgai_client = Client(api_key=api_key)
+
+    try:
+        # Basic search scraper example
+        print("\nSearching for information...")
+
+        search_response = sgai_client.searchscraper(
+            user_prompt="Extract webpage information"
+        )
+        format_response(search_response)
+
+    except Exception as e:
+        print(f"\nError occurred: {str(e)}")
+    finally:
+        # Always close the client
+        sgai_client.close()
+
+if __name__ == "__main__":
+    main()
```
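The `format_response` function added here walks a nested response dict (scalar values, list values, reference URLs) and prints each branch. Its traversal logic can be exercised without any API call by mirroring it as a pure function over an invented payload; the `summarize_response` helper and the `sample` dict below are illustrative, not part of the SDK:

```python
from typing import Any, Dict, List

def summarize_response(response: Dict[str, Any]) -> List[str]:
    """Flatten a search response into printable lines, mirroring the
    branching in format_response: dict results iterate per key, list
    values expand per item, and reference URLs are numbered."""
    lines = [f"Request ID: {response['request_id']}"]
    urls = response.get("reference_urls", [])
    lines.append(f"Sources Processed: {len(urls)}")
    result = response["result"]
    if isinstance(result, dict):
        for key, value in result.items():
            if isinstance(value, list):
                lines.extend(f"{key.upper()}: {item}" for item in value)
            else:
                lines.append(f"{key.upper()}: {value}")
    else:
        lines.append(str(result))
    lines.extend(f"{i}. {url}" for i, url in enumerate(urls, 1))
    return lines

# Invented payload, shaped like the fields the script reads
sample = {
    "request_id": "req-42",
    "result": {"title": "Example Domain", "links": ["a", "b"]},
    "reference_urls": ["https://example.com"],
}
for line in summarize_response(sample):
    print(line)
```

Returning lines instead of printing makes the traversal easy to unit-test, while the script's version keeps the same branching but writes straight to stdout.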

examples/smart_scraper_graph/README.md

Lines changed: 0 additions & 30 deletions
This file was deleted.

examples/smart_scraper_graph/ollama/smart_scraper_ollama.py

Lines changed: 2 additions & 2 deletions
```diff
@@ -11,7 +11,7 @@
 
 graph_config = {
     "llm": {
-        "model": "ollama/llama3.2:3b",
+        "model": "ollama/llama3.2",
         "temperature": 0,
         # "base_url": "http://localhost:11434",  # set ollama URL arbitrarily
         "model_tokens": 4096,
@@ -24,7 +24,7 @@
 # Create the SmartScraperGraph instance and run it
 # ************************************************
 smart_scraper_graph = SmartScraperGraph(
-    prompt="Find some information about what does the company do and the list of founders.",
+    prompt="Find some information about the founders.",
     source="https://scrapegraphai.com/",
     config=graph_config,
)
```
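The first hunk only changes the model string, dropping the `:3b` size tag so Ollama resolves its default tag for `llama3.2`. A standalone sketch of the resulting `graph_config` shape (values copied from the diff; the provider/model split at the end is illustrative, not repository code):

```python
# Ollama configuration as it stands after this commit; the "provider/model"
# string is what ScrapeGraphAI-style configs use to route to a backend.
graph_config = {
    "llm": {
        "model": "ollama/llama3.2",
        "temperature": 0,
        # "base_url": "http://localhost:11434",  # optional: point at a non-default Ollama server
        "model_tokens": 4096,
    },
}

# Splitting on the first "/" separates the backend name from the model name
provider, model_name = graph_config["llm"]["model"].split("/", 1)
print(provider, model_name)  # → ollama llama3.2
```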
Lines changed: 1 addition & 0 deletions
```diff
@@ -0,0 +1 @@
+SCRAPEGRAPH_API_KEY=your SCRAPEGRAPH_API_KEY
```

0 commit comments
