Our Blogs

Learn about data lakes, machine learning & more innovations

Z-ordering: take the Guesswork out (part2)

Introduction: Last time, we left Bob in quite a situation. He was struggling with his store’s ordering. In particular, his…
Read More
Z-ordering: take the Guesswork out (part1)

“With great power comes great responsibility” - Spider-Man. Introduction: The world generates 2.5 quintillion bytes per day. That’s 2,500 petabytes! So in line with…
Read More
Delta 2.0 vs Iceberg 0.14.0 : TPC-DS benchmark

After the announcement of Delta 2.0 during the Data + AI Summit in which Databricks fully open sourced Delta Lake, and the release…
Read More
Delta vs Hudi : Databeans’ vision on Benchmarking

Introduction: In our previous blog, we compared Delta 1.2.0, Iceberg 0.13.1 and Hudi 0.11.1, and we published our findings only to find out that Onehouse saw…
Read More
Delta vs Iceberg vs hudi : Reassessing Performance

Introduction: After we compared Delta and Iceberg in our previous blog, a lot of people asked for benchmarks of their latest versions and for…
Read More
Delta vs Iceberg : Performance as a decisive criteria

Introduction: A data lakehouse is an open data architecture that brings together the scalability and cost-effectiveness of data lakes…
Read More
Delta Lake: The Data Engineer’s missing piece (Part-2)

Introduction: Last time, we left Bob in a very complicated situation: he was facing too many problems with too little…
Read More
Simplify your streaming CDC architecture using Delta Streams

In 2019, a study by Lightbend and The New Stack [1] revealed that the use of stream processing for AI/ML…
Read More
Spark ASN.1 DataSource for Telecom Mediation

We are progressing towards an era of information that can and must be converted into real-time, actionable insight, to…
Read More
Delta Lake: The Data Engineer’s missing piece

Open sourced in April 2019, Delta Lake is a Databricks project that brings reliability, performance and lifecycle management to data…
Read More