News

Apache Spark has become the de facto standard for processing data at scale, whether for querying large datasets, training machine learning models to predict future trends, or processing streaming ...
The Apache Spark community has improved support for Python to such a great degree over the past few years that Python is now a “first-class” language, and no longer a “clunky” add-on as it once was, ...
The Python-versus-R-in-Spark discussion also carries over to the production side of the equation. In the olden days of Spark (i.e. 18 months ago), putting a Spark job into production often required ...
But Spark has also had its share of impedance mismatch issues, such as making R and Python programs first-class citizens, or adapting to more compute-intensive processing of AI models.