Featured Post

8 Ways to Optimize AWS Glue Jobs in a Nutshell

Image
  Improving the performance of AWS Glue jobs involves several strategies that target different aspects of the ETL (Extract, Transform, Load) process. Here are some key practices. 1. Optimize Job Scripts Partitioning : Ensure your data is properly partitioned. Partitioning divides your data into manageable chunks, allowing parallel processing and reducing the amount of data scanned. Filtering : Apply pushdown predicates to filter data early in the ETL process, reducing the amount of data processed downstream. Compression : Use compressed file formats (e.g., Parquet, ORC) for your data sources and sinks. These formats not only reduce storage costs but also improve I/O performance. Optimize Transformations : Minimize the number of transformations and actions in your script. Combine transformations where possible and use DataFrame APIs which are optimized for performance. 2. Use Appropriate Data Formats Parquet and ORC : These columnar formats are efficient for storage and querying, signif

The awesome points to learn from DB2 NoSQL GraphStore

The awesome points to learn from db2 graphstore
 #db2 graphstore:
One best example, prior to understanding the RDF format for Graph data modelIf the graph data model is the model the semantic web uses to store data, RDF is the format in which it is written. 


Summary of DB2 Graph Store:
  • DB2-RDF support is officially called "NoSQL Graph Support".  
  • The API extends the Jena API (Graph layer).  Developers familiar with Jena TDB will have the Model layer capabilities they are accustomed to.
  • Although the DB2-RDF functionality is being released with DB2 LUW 10.1, it is also compatible with DB2 9.7.
  • Full supports for SPARQL 1.0 and a subset of SPARQL 1.1.  Full SPARQL 1.1 support (which is till a W3C working draft) will be forthcoming.
  • While RDBMS implementations of RDF graphs have typically been non-performant, that is not the case here*.  Some very impressive and innovative work has been put into optimization capabilities.  Out-of-the box performance is comparable with native triple stores, and read/write performance in the optimized schema has been seen to surpass these speeds.
Related: Presentation on DB2 NoSQL Graph Store

What is RDF data model(ref:wiki)

The RDF data model is similar to classical conceptual modeling approaches such as entity–relationship or class diagrams, as it is based upon the idea of making statements about resources (in particular web resources) in the form of subject–predicate–object expressions.  


These expressions are known as triples in RDF terminology. The subject denotes the resource, and the predicate denotes traits or aspects of the resource and expresses a relationship between the subject and the object. For example, one way to represent the notion "The sky has the color blue" in RDF is as the triple: a subject denoting "the sky", a predicate denoting "has", and an object denoting "the color blue". Therefore, RDF swaps object for subject that would be used in the classical notation of an entity–attribute–value model within object-oriented design; Entity (sky), attribute (color) and value (blue). RDF is an abstract model with several serialization formats (i.e., file formats), and so the particular way in which a resource or triple is encoded varies from format to format. 


This mechanism for describing resources is a major component in the W3C's Semantic Web activity: an evolutionary stage of the World Wide Web in which automated software can store, exchange, and use machine-readable information distributed throughout the Web, in turn enabling users to deal with the information with greater efficiency and certainty. 

RDF's simple data model and ability to model disparate, abstract concepts has also led to its increasing use in knowledge management applications unrelated to Semantic Web activity. 
A collection of RDF statements intrinsically represents a labeled, directed multi-graph. As such, an RDF-based data model is more naturally suited to certain kinds of knowledge representation than the relational model and other ontological models. However, in practice, RDF data is often persisted in relational database or native representations also called Triplestores, or Quad stores if context (i.e. the named graph) is also persisted for each RDF triple.[3] ShEX, or Shape Expressions,[4] is a language for expressing constraints on RDF graphs. It includes the cardinality constraints from OSLC Resource Shapes and Dublin Core Description Set Profiles as well as logical connectives for disjunction and polymorphism. As RDFS and OWL demonstrate, one can build additional ontology languages upon RDF.

Related:

Comments

Popular posts from this blog

How to Fix datetime Import Error in Python Quickly

How to Check Kafka Available Brokers

SQL Query: 3 Methods for Calculating Cumulative SUM