Skip to content

Commit 10b5aa8

Browse files
authored
Update README.md
1 parent 233006f commit 10b5aa8

File tree

1 file changed

+9
-0
lines changed

1 file changed

+9
-0
lines changed

README.md

Lines changed: 9 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -95,3 +95,12 @@ In Apache Spark, a DataFrame is a distributed collection of rows under named col
9595
* DataFrame has a support for wide range of data format and sources.
9696
* It has API support for different languages like Python, R, Scala, Java.
9797

98+
### Spark SQL
99+
Spark SQL provides a DataFrame API that can perform relational operations on both external data sources and Spark's built-in distributed collections—at scale!
100+
101+
To support a wide variety of diverse data sources and algorithms in Big Data, Spark SQL introduces a novel extensible optimizer called Catalyst, which makes it easy to add data sources, optimization rules, and data types for advanced analytics such as machine learning.
102+
Essentially, Spark SQL leverages the power of Spark to perform distributed, robust, in-memory computations at massive scale on Big Data.
103+
104+
Spark SQL provides state-of-the-art SQL performance and also maintains compatibility with all existing structures and components supported by Apache Hive (a popular Big Data warehouse framework) including data formats, user-defined functions (UDFs), and the metastore. Besides this, it also helps in ingesting a wide variety of data formats from Big Data sources and enterprise data warehouses like JSON, Hive, Parquet, and so on, and performing a combination of relational and procedural operations for more complex, advanced analytics.
105+
106+
![Spark-2](https://cdn-images-1.medium.com/max/2000/1*OY41hGbe4IB9-hHLRPuCHQ.png)

0 commit comments

Comments
 (0)