Commit 220156e

Merge pull request #867 from ScrapeGraphAI/pre/beta
Pre/beta
2 parents 4ee2f4d + af7532d commit 220156e

39 files changed, +394 -2620 lines changed

.github/update-requirements.yml

Lines changed: 0 additions & 26 deletions
This file was deleted.

.github/workflows/python-publish.yml

Lines changed: 0 additions & 32 deletions
This file was deleted.

CHANGELOG.md

Lines changed: 19 additions & 0 deletions
@@ -1,8 +1,24 @@
+## [1.34.0-beta.16](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.34.0-beta.15...v1.34.0-beta.16) (2025-01-06)
 ## [1.34.1](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.34.0...v1.34.1) (2025-01-04)
 
 
+
 ### Bug Fixes
 
+* add back poethepoet for pylint ([a82af04](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/a82af04afed2e4ba309b5e98b5df351d9b79ca2e))
+* better playwright installation handling ([f6009d1](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/f6009d1abf9e2c83999de0c9b03a41aa1bf8f2a4))
+* disallow mailto: ([#861](https://github.com/ScrapeGraphAI/Scrapegraph-ai/issues/861)) ([8d9c909](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/8d9c909923dff1c247c85099db20e2a6dabb93f5))
+* removed requirements files ([25861b0](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/25861b04be8a6fc60c900a46033aed91d1fef1f9))
+* selenium import in ChromiumLoader ([e374e05](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/e374e055d64b7fa4c5a4c7694384dd15e6361bbd))
+
+
+### chore
+
+* chromium browser asnc handling ([5be7c49](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/5be7c497cd44fbd0c026bf3d833f572b34661b08))
+* made some libs optional ([5cdf055](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/5cdf0550fe9dcd519d274bb343cf65c845e8a608))
+* pandas package is now optional ([54c69a2](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/54c69a2b0b1677286b840be95ce482bcee881413))
+
+## [1.34.0-beta.15](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.34.0-beta.14...v1.34.0-beta.15) (2025-01-03)
 * add new models ([72684a9](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/72684a9476e255d5e20550f82daf3e7462fb8f5a))
 
 ## [1.34.0](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.33.11...v1.34.0) (2025-01-03)
@@ -14,8 +30,11 @@
 * added scrolling method to chromium docloader ([1c8b910](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/1c8b910562112947a357277bca9dc81619b72e61))
 
 
+
 ### Bug Fixes
 
+
+* search graph ([d4b2679](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/d4b26796d94d314af135d2d1bbd538e1d4be7593))
 * added license-files = [ ([9150e4c](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/9150e4c95fa468afe9ddda3f1278b5037a2d0f38))
 * added twine ([df07da9](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/df07da9bcc59cbccf1c45d69e3a3e904eaed565b))
 * build config ([b186a4f](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/b186a4f1c73fe29fa706158cc3c61812d6b16343))
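
Reviewer note: the changelog entries "made some libs optional" and "pandas package is now optional" name a packaging pattern without showing it. The following is a minimal sketch of one common way to guard an optional import so that a missing package only fails when the feature is actually used. The helper name `require_optional` and the error wording are hypothetical and are not taken from the ScrapeGraphAI source.

```python
import importlib
from types import ModuleType


def require_optional(module_name: str, feature: str) -> ModuleType:
    """Import an optional dependency only when a feature needs it.

    Hypothetical helper for illustration; not part of ScrapeGraphAI.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Re-raise with an actionable message instead of failing when the
        # package itself is imported. The pip name is assumed to match the
        # module name, which is not true for every package.
        raise ImportError(
            f"{feature} requires the optional package '{module_name}'. "
            f"Install it with: pip install {module_name}"
        ) from exc


if __name__ == "__main__":
    # A standard-library module always resolves, so this demo runs anywhere.
    json_module = require_optional("json", "JSON export")
    print(json_module.dumps({"ok": True}))
```

Calling the helper inside the function that needs the package, rather than at module import time, keeps the base install usable without it.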

README.md

Lines changed: 47 additions & 64 deletions
@@ -24,21 +24,6 @@ Just say which information you want to extract and the library will do it for yo
 <img src="https://raw.githubusercontent.com/VinciGit00/Scrapegraph-ai/main/docs/assets/sgai-hero.png" alt="ScrapeGraphAI Hero" style="width: 100%;">
 </p>
 
-## 🔗 ScrapeGraph API & SDKs
-If you are looking for a quick solution to integrate ScrapeGraph in your system, check out our powerful API [here!](https://dashboard.scrapegraphai.com/login)
-
-<p align="center">
-<img src="https://raw.githubusercontent.com/VinciGit00/Scrapegraph-ai/main/docs/assets/api-banner.png" alt="ScrapeGraph API Banner" style="width: 100%;">
-</p>
-
-We offer SDKs in both Python and Node.js, making it easy to integrate into your projects. Check them out below:
-
-| SDK | Language | GitHub Link |
-|-----------|----------|-----------------------------------------------------------------------------|
-| Python SDK | Python | [scrapegraph-py](https://github.com/ScrapeGraphAI/scrapegraph-sdk/tree/main/scrapegraph-py) |
-| Node.js SDK | Node.js | [scrapegraph-js](https://github.com/ScrapeGraphAI/scrapegraph-sdk/tree/main/scrapegraph-js) |
-
-The Official API Documentation can be found [here](https://docs.scrapegraphai.com/).
 
 ## 🚀 Quick install
 
@@ -47,35 +32,12 @@ The reference page for Scrapegraph-ai is available on the official page of PyPI:
 ```bash
 pip install scrapegraphai
 
+# IMPORTANT (to fetch websites content)
 playwright install
 ```
 
 **Note**: it is recommended to install the library in a virtual environment to avoid conflicts with other libraries 🐱
 
-<details>
-<summary><b>Optional Dependencies</b></summary>
-Additional dependecies can be added while installing the library:
-
-- <b>More Language Models</b>: additional language models are installed, such as Fireworks, Groq, Anthropic, Hugging Face, and Nvidia AI Endpoints.
-
-This group allows you to use additional language models like Fireworks, Groq, Anthropic, Together AI, Hugging Face, and Nvidia AI Endpoints.
-```bash
-pip install scrapegraphai[other-language-models]
-```
-- <b>Semantic Options</b>: this group includes tools for advanced semantic processing, such as Graphviz.
-
-```bash
-pip install scrapegraphai[more-semantic-options]
-```
-
-- <b>Browsers Options</b>: this group includes additional browser management tools/services, such as Browserbase.
-
-```bash
-pip install scrapegraphai[more-browser-options]
-```
-
-</details>
-
 
 ## 💻 Usage
 There are multiple standard scraping pipelines that can be used to extract information from a website (or local file).
@@ -84,13 +46,12 @@ The most common one is the `SmartScraperGraph`, which extracts information from
 
 
 ```python
-import json
 from scrapegraphai.graphs import SmartScraperGraph
 
 # Define the configuration for the scraping pipeline
 graph_config = {
     "llm": {
-        "api_key": "YOUR_OPENAI_APIKEY",
+        "api_key": "YOUR_OPENAI_API_KEY",
         "model": "openai/gpt-4o-mini",
     },
     "verbose": True,
@@ -99,33 +60,45 @@ graph_config = {
 
 # Create the SmartScraperGraph instance
 smart_scraper_graph = SmartScraperGraph(
-    prompt="Extract me all the news from the website",
-    source="https://www.wired.com",
+    prompt="Extract useful information from the webpage, including a description of what the company does, founders and social media links",
+    source="https://scrapegraphai.com/",
     config=graph_config
 )
 
 # Run the pipeline
 result = smart_scraper_graph.run()
+
+import json
 print(json.dumps(result, indent=4))
 ```
 
 The output will be a dictionary like the following:
 
 ```python
-"result": {
-    "news": [
-        {
-            "title": "The New Jersey Drone Mystery May Not Actually Be That Mysterious",
-            "link": "https://www.wired.com/story/new-jersey-drone-mystery-maybe-not-drones/",
-            "author": "Lily Hay Newman"
-        },
-        {
-            "title": "Former ByteDance Intern Accused of Sabotage Among Winners of Prestigious AI Award",
-            "link": "https://www.wired.com/story/bytedance-intern-best-paper-neurips/",
-            "author": "Louise Matsakis"
-        },
-        ...
-    ]
+{
+    "description": "ScrapeGraphAI transforms websites into clean, organized data for AI agents and data analytics. It offers an AI-powered API for effortless and cost-effective data extraction.",
+    "founders": [
+        {
+            "name": "Marco Perini",
+            "role": "Founder & Technical Lead",
+            "linkedin": "https://www.linkedin.com/in/perinim/"
+        },
+        {
+            "name": "Marco Vinciguerra",
+            "role": "Founder & Software Engineer",
+            "linkedin": "https://www.linkedin.com/in/marco-vinciguerra-7ba365242/"
+        },
+        {
+            "name": "Lorenzo Padoan",
+            "role": "Founder & Product Engineer",
+            "linkedin": "https://www.linkedin.com/in/lorenzo-padoan-4521a2154/"
+        }
+    ],
+    "social_media_links": {
+        "linkedin": "https://www.linkedin.com/company/101881123",
+        "twitter": "https://x.com/scrapegraphai",
+        "github": "https://github.com/ScrapeGraphAI/Scrapegraph-ai"
+    }
 }
 ```
 There are other pipelines that can be used to extract information from multiple pages, generate Python scripts, or even generate audio files.
@@ -145,20 +118,30 @@ It is possible to use different LLM through APIs, such as **OpenAI**, **Groq**,
 
 Remember to have [Ollama](https://ollama.com/) installed and download the models using the **ollama pull** command, if you want to use local models.
 
-## 🔍 Demo
-Official streamlit demo:
-
-[![My Skills](https://skillicons.dev/icons?i=react)](https://scrapegraph-demo-demo.streamlit.app)
 
-Try it directly on the web using Google Colab:
+## 📖 Documentation
 
 [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1sEZBonBMGP44CtO6GQTwAlL0BGJXjtfd?usp=sharing)
 
-## 📖 Documentation
-
 The documentation for ScrapeGraphAI can be found [here](https://scrapegraph-ai.readthedocs.io/en/latest/).
 Check out also the Docusaurus [here](https://docs-oss.scrapegraphai.com/).
 
+## 🔗 ScrapeGraph API & SDKs
+If you are looking for a quick solution to integrate ScrapeGraph in your system, check out our powerful API [here!](https://dashboard.scrapegraphai.com/login)
+
+<p align="center">
+<img src="https://raw.githubusercontent.com/VinciGit00/Scrapegraph-ai/main/docs/assets/api-banner.png" alt="ScrapeGraph API Banner" style="width: 100%;">
+</p>
+
+We offer SDKs in both Python and Node.js, making it easy to integrate into your projects. Check them out below:
+
+| SDK | Language | GitHub Link |
+|-----------|----------|-----------------------------------------------------------------------------|
+| Python SDK | Python | [scrapegraph-py](https://github.com/ScrapeGraphAI/scrapegraph-sdk/tree/main/scrapegraph-py) |
+| Node.js SDK | Node.js | [scrapegraph-js](https://github.com/ScrapeGraphAI/scrapegraph-sdk/tree/main/scrapegraph-js) |
+
+The Official API Documentation can be found [here](https://docs.scrapegraphai.com/).
+
 ## 🏆 Sponsors
 <div style="text-align: center;">
 <a href="https://2ly.link/1zaXG">

docs/turkish.md

Lines changed: 0 additions & 25 deletions
@@ -31,31 +31,6 @@ playwright install
 
 **Not**: Diğer kütüphanelerle çakışmaları önlemek için kütüphaneyi sanal bir ortamda kurmanız önerilir 🐱
 
-<details>
-<summary><b>Opsiyonel Bağımlılıklar</b></summary>
-Kütüphaneyi kurarken ek bağımlılıklar ekleyebilirsiniz:
-
-- **Daha Fazla Dil Modeli**: Fireworks, Groq, Anthropic, Hugging Face ve Nvidia AI Endpoints gibi ek dil modelleri kurulur.
-
-Bu grup, Fireworks, Groq, Anthropic, Together AI, Hugging Face ve Nvidia AI Endpoints gibi ek dil modellerini kullanmanızı sağlar.
-
-```bash
-pip install scrapegraphai[other-language-models]
-```
-
-- **Semantik Seçenekler**: Graphviz gibi gelişmiş semantik işleme araçlarını içerir.
-
-```bash
-pip install scrapegraphai[more-semantic-options]
-```
-
-- **Tarayıcı Seçenekleri**: Browserbase gibi ek tarayıcı yönetim araçları/hizmetlerini içerir.
-
-```bash
-pip install scrapegraphai[more-browser-options]
-```
-
-</details>
 
 ## 💻 Kullanım
 
examples/anthropic/csv_scraper_anthropic.py

Lines changed: 5 additions & 9 deletions
@@ -3,9 +3,8 @@
 """
 import os
 from dotenv import load_dotenv
-import pandas as pd
 from scrapegraphai.graphs import CSVScraperGraph
-from scrapegraphai.utils import convert_to_csv, convert_to_json, prettify_exec_info
+from scrapegraphai.utils import prettify_exec_info
 
 load_dotenv()
 
@@ -17,7 +16,8 @@
 curr_dir = os.path.dirname(os.path.realpath(__file__))
 file_path = os.path.join(curr_dir, FILE_NAME)
 
-text = pd.read_csv(file_path)
+with open(file_path, 'r') as file:
+    text = file.read()
 
 # ************************************************
 # Define the configuration for the graph
@@ -41,7 +41,7 @@
 
 csv_scraper_graph = CSVScraperGraph(
     prompt="List me all the last names",
-    source=str(text), # Pass the content of the file, not the file object
+    source=text, # Pass the content of the file
     config=graph_config
 )
 
@@ -53,8 +53,4 @@
 # ************************************************
 
 graph_exec_info = csv_scraper_graph.get_execution_info()
-print(prettify_exec_info(graph_exec_info))
-
-# Save to json or csv
-convert_to_csv(result, "result")
-convert_to_json(result, "result")
+print(prettify_exec_info(graph_exec_info))

examples/anthropic/csv_scraper_graph_multi_anthropic.py

Lines changed: 3 additions & 7 deletions
@@ -3,9 +3,8 @@
 """
 import os
 from dotenv import load_dotenv
-import pandas as pd
 from scrapegraphai.graphs import CSVScraperMultiGraph
-from scrapegraphai.utils import convert_to_csv, convert_to_json, prettify_exec_info
+from scrapegraphai.utils import prettify_exec_info
 
 load_dotenv()
 # ************************************************
@@ -16,7 +15,8 @@
 curr_dir = os.path.dirname(os.path.realpath(__file__))
 file_path = os.path.join(curr_dir, FILE_NAME)
 
-text = pd.read_csv(file_path)
+with open(file_path, 'r') as file:
+    text = file.read()
 
 # ************************************************
 # Define the configuration for the graph
@@ -48,7 +48,3 @@
 
 graph_exec_info = csv_scraper_graph.get_execution_info()
 print(prettify_exec_info(graph_exec_info))
-
-# Save to json or csv
-convert_to_csv(result, "result")
-convert_to_json(result, "result")
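
Reviewer note: both CSV examples now pass the raw file text to the graph instead of the string form of a pandas DataFrame. As a hedged sketch of that idea, the helpers below read a CSV file as plain text and sanity-check it with the standard `csv` module. The names `read_csv_text` and `count_rows`, and the explicit UTF-8 default, are assumptions for illustration and do not appear in the commit.

```python
import csv
import io


def read_csv_text(path: str, encoding: str = "utf-8") -> str:
    """Return the raw text of a CSV file (hypothetical helper)."""
    # newline="" disables newline translation, so line endings inside
    # quoted fields are returned exactly as stored in the file.
    with open(path, "r", encoding=encoding, newline="") as file:
        return file.read()


def count_rows(csv_text: str) -> int:
    """Count data rows, excluding the header, using the standard csv module."""
    reader = csv.reader(io.StringIO(csv_text, newline=""))
    rows = list(reader)
    return max(len(rows) - 1, 0)
```

With these helpers, the examples could pass `source=read_csv_text(file_path)` and optionally log `count_rows(text)` before running the graph, without importing pandas.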

examples/openai/depth_search_graph_openai.py

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@
 
 load_dotenv()
 
-openai_key = os.getenv("OPENAI_APIKEY")
+openai_key = os.getenv("OPENAI_API_KEY")
 
 graph_config = {
     "llm": {

examples/openai/search_graph_openai.py

Lines changed: 1 addition & 1 deletion
@@ -11,7 +11,7 @@
 # Define the configuration for the graph
 # ************************************************
 
-openai_key = os.getenv("OPENAI_APIKEY")
+openai_key = os.getenv("OPENAI_API_KEY")
 
 graph_config = {
     "llm": {
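
Reviewer note: the rename from `OPENAI_APIKEY` to `OPENAI_API_KEY` means existing `.env` files that still use the old name will make `os.getenv` return `None`. A minimal sketch of a lookup that prefers the new name and falls back to the old one is shown below. The helper `get_openai_key` is hypothetical and is not part of the examples in this commit.

```python
import os
from typing import Mapping, Optional


def get_openai_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the OpenAI key, preferring OPENAI_API_KEY over the legacy OPENAI_APIKEY.

    Hypothetical helper for illustration only.
    """
    env = os.environ if env is None else env
    # Check the new variable name first, then the pre-rename one.
    for name in ("OPENAI_API_KEY", "OPENAI_APIKEY"):
        value = env.get(name)
        if value:
            return value
    raise KeyError("Set OPENAI_API_KEY in the environment or in your .env file")
```

Raising early gives a clearer failure than passing `None` as `api_key` into `graph_config`.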
