This blog post has been co-authored by Bhanu Prakash, Principal Program Manager, Azure Databricks.
We are now announcing the general availability of the Apache Spark 3.0 compatible Apache Spark Connector for SQL Server and Azure SQL, accessible through Maven.
The Spark 3.0 compatible connector went into preview early this year. Since then, we have seen tremendous customer adoption and received helpful customer feedback. After incorporating enhancements and bug fixes over the last few months, we are excited to make the connector generally available so that customers can expand their usage to even more workloads.
The Apache Spark Connector for SQL Server is a high-performance connector that enables users to use transactional data in big data analytics and persist results for ad-hoc queries or reporting. It allows you to use SQL Server or Azure SQL as input data sources or output data sinks for Spark jobs. It supports bulk inserts into the database, which can be 10 to 20 times faster than row-by-row insertion through plain Java Database Connectivity (JDBC). In addition, customers can use this connector to score machine learning models from SQL Server Machine Learning Services, or score results in SQL after doing machine learning in Spark.
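As a rough illustration of using the database as an output sink, the sketch below writes a Spark DataFrame through the connector. It assumes the format name and the `tableLock` and `batchsize` options described in the connector's GitHub README. The helper names `bulk_write_options` and `bulk_write`, and the default values, are hypothetical and chosen only for this example.

```python
# Format name as listed in the connector's GitHub README (assumption: unchanged
# in the build you install).
CONNECTOR_FORMAT = "com.microsoft.sqlserver.jdbc.spark"


def bulk_write_options(url, table, user, password,
                       batch_size=100_000, table_lock=True):
    """Build the option map for a bulk write (hypothetical helper).

    Spark passes data source options as strings, so numeric and boolean
    values are converted here.
    """
    return {
        "url": url,                # e.g. "jdbc:sqlserver://<host>:1433;databaseName=<db>"
        "dbtable": table,          # target table, e.g. "dbo.results"
        "user": user,
        "password": password,
        "batchsize": str(batch_size),             # rows sent per bulk batch
        "tableLock": str(table_lock).lower(),     # request a table lock for the bulk insert
    }


def bulk_write(df, options, mode="append"):
    """Write a Spark DataFrame to SQL Server or Azure SQL via the connector."""
    (df.write
        .format(CONNECTOR_FORMAT)
        .mode(mode)
        .options(**options)
        .save())
```

In a real job, `df` would be a PySpark DataFrame and the connector JAR would need to be on the cluster's classpath. The speedup you actually see depends on table layout, batch size, and locking, so treat the defaults above as starting points to tune.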
Why use the Apache Spark Connector for SQL Server and Azure SQL
The Apache Spark Connector for SQL Server and Azure SQL is based on the Apache Spark DataSourceV1 API and SQL Server Bulk API and uses the same interface as the built-in JDBC Spark-SQL connector. This lets you integrate the connector and migrate your existing Spark jobs by updating the format parameter.
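To make the migration concrete, here is a minimal sketch of a read that works with either data source. The only thing that changes between the built-in JDBC source and the connector is the string passed to `format`. The format name is taken from the connector's GitHub README; the function `read_table` and its parameters are hypothetical and exist only for illustration.

```python
BUILT_IN_JDBC_FORMAT = "jdbc"
# Format name as listed in the connector's GitHub README.
CONNECTOR_FORMAT = "com.microsoft.sqlserver.jdbc.spark"


def read_table(spark, url, table, user, password, fmt=CONNECTOR_FORMAT):
    """Load a table as a DataFrame (hypothetical helper).

    `spark` is expected to be a SparkSession. Passing
    fmt=BUILT_IN_JDBC_FORMAT reproduces the pre-migration behavior,
    since both sources accept the same url/dbtable/user/password options.
    """
    return (spark.read
        .format(fmt)
        .option("url", url)
        .option("dbtable", table)
        .option("user", user)
        .option("password", password)
        .load())
```

An existing job that calls `.format("jdbc")` would, under these assumptions, only need that one string replaced once the connector is installed on the cluster.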
Notable features and benefits of the connector:
- Compatible with Apache Spark 3.0.
- Support for all Apache Spark bindings (Scala, Python, R).
- Basic authentication, Active Directory (AD) Key Tab, and Azure Active Directory support.
To learn more about the connector and how to use it, visit the GitHub page. To configure the compatible connector using Maven coordinates, see the Apache Spark Connector for SQL Server and Azure SQL Maven page. Links to specific builds are also provided on the GitHub page.
Get involved
The Apache Spark Connector for SQL Server and Azure SQL makes the interaction between SQL Server and Apache Spark seamless. The connector has a growing and engaged community and has been installed thousands of times. We are continuously evolving and improving the connector, and we look forward to your feedback and contributions.
Want to contribute or have feedback or questions? Check out the project on GitHub and follow us on Twitter.
Note: The connector is community-supported and does not include Microsoft SLA support. Please file an issue on GitHub to engage the community for help.