Posts

Showing posts with the label structured vs unstructured

Featured Post

SQL Query: 3 Methods for Calculating Cumulative SUM

SQL provides several constructs for calculating cumulative sums. In this article, we explore three SQL queries that compute a cumulative sum, each built on a different construct.

Using Window Functions (e.g., PostgreSQL, SQL Server, Oracle)

    SELECT id, value, SUM(value) OVER (ORDER BY id) AS cumulative_sum
    FROM your_table;

This query uses the SUM() window function with the OVER clause to calculate the cumulative sum of the value column ordered by the id column.

Using a Self-Join (e.g., MySQL before 8.0, SQLite before 3.25)

    SELECT t1.id, t1.value, SUM(t2.value) AS cumulative_sum
    FROM your_table t1
    JOIN your_table t2 ON t1.id >= t2.id
    GROUP BY t1.id, t1.value
    ORDER BY t1.id;

This query uses a self-join to calculate the cumulative sum. It joins the table with itself, matching rows where the id in the first table is greater than or
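To make the two queries shown in this excerpt concrete, here is a minimal sketch that runs both against an in-memory SQLite database from Python and checks that they agree. The sample rows are made up for illustration, and the window-function query assumes the bundled SQLite is version 3.25 or newer.

```python
import sqlite3

# In-memory database with a small, made-up sample table (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (id INTEGER PRIMARY KEY, value INTEGER)")
conn.executemany(
    "INSERT INTO your_table (id, value) VALUES (?, ?)",
    [(1, 10), (2, 20), (3, 5), (4, 15)],
)

# Method 1: window function (assumes SQLite >= 3.25).
window_sql = """
SELECT id, value, SUM(value) OVER (ORDER BY id) AS cumulative_sum
FROM your_table
ORDER BY id
"""

# Method 2: self-join, pairing each row with every row whose id is <= its own.
self_join_sql = """
SELECT t1.id, t1.value, SUM(t2.value) AS cumulative_sum
FROM your_table t1
JOIN your_table t2 ON t1.id >= t2.id
GROUP BY t1.id, t1.value
ORDER BY t1.id
"""

window_rows = conn.execute(window_sql).fetchall()
self_join_rows = conn.execute(self_join_sql).fetchall()

print(window_rows)
print(window_rows == self_join_rows)
```

On larger tables the self-join compares every row with all earlier rows, so the window-function form is usually the better choice wherever it is available.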

6 Exclusive Differences Between Structured and Unstructured data

Here's a basic interview question for Big Data engineers. It counts as basic because many bachelor's degree programs now offer courses on Big Data, yet beginners still find the nature of data a little tricky to grasp, so interviewers stress this point. Don't worry: I have simplified it so you get a clear concept. Here I share a total of six differences between the two. In today's world, we have a lot of data, and much of that data is in an unstructured format.

Structured Data

The major data format is text, which can be string or numeric. Dates are also supported.
The data model is fixed before inserting the data.
Data is stored in the form of a table, making it easy to search.
Not easy to scale.
Version is maintained as a column in the table.
Transaction management and concurrency are easy to support.

Unstructured Data

The data format can be anything from text to images, audio to videos.
The data model cannot be fixed since the nature of the data can change. Consider a tweet message that could be text foll
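The contrast between a fixed data model and a changing one can be sketched roughly as follows. The `orders` table, its columns, and the post fields are all hypothetical names chosen for illustration. JSON strings stand in here for payloads whose shape varies from record to record; real images, audio, or video would be stored as binary files instead.

```python
import json
import sqlite3

# Structured: the data model (columns and types) is fixed before any insert.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, customer TEXT NOT NULL, "
    "amount REAL NOT NULL, order_date TEXT NOT NULL)"
)
conn.execute("INSERT INTO orders VALUES (1, 'alice', 25.5, '2024-01-15')")

# A row carrying a field outside the fixed model is rejected.
try:
    conn.execute(
        "INSERT INTO orders (id, customer, amount, order_date, photo) "
        "VALUES (2, 'bob', 10.0, '2024-01-16', 'x.png')"
    )
    schema_rejected = False
except sqlite3.OperationalError:
    schema_rejected = True

# Tabular storage makes searching straightforward.
big_orders = conn.execute(
    "SELECT customer FROM orders WHERE amount > 20"
).fetchall()

# Varying-shape data: each record may carry different fields.
posts = [
    {"text": "hello world"},
    {"text": "look at this", "image": "photo.jpg"},
    {"video": "clip.mp4", "hashtags": ["bigdata"]},
]
stored = [json.dumps(p) for p in posts]  # kept as opaque strings, no shared schema
field_sets = [sorted(json.loads(s).keys()) for s in stored]

print(schema_rejected, big_orders, field_sets)
```

The table answers the filter query directly, while the stored posts have to be parsed one by one before anything can be said about their contents.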