Showing posts with the label Data Science

Featured Post

How to Build a CI/CD Pipeline: GitHub to AWS

Creating a CI/CD pipeline to deploy a project from GitHub to AWS can be done with AWS services such as AWS CodePipeline and AWS CodeBuild, plus, optionally, AWS CodeDeploy or Amazon ECS for application deployment. Below is a high-level guide to setting up a basic GitHub-to-AWS pipeline.

Prerequisites
AWS Account: Ensure you have access to an AWS account with the necessary permissions.
GitHub Repository: Host your application code on GitHub.
IAM Roles: Create the IAM roles needed to interact with AWS services (e.g., CodePipeline, CodeBuild, S3, ECS).
AWS CLI: Install and configure the AWS CLI for easier management of services.

Step 1: Create an S3 Bucket for Artifacts
AWS CodePipeline requires an S3 bucket to store artifacts (builds, deployments, etc.). Go to the S3 service in the AWS Management Console, create a new bucket with a unique name, and note the bucket name for later use.

Step 2: Set Up AWS CodeBuild
CodeBuild will handle the build process…
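Step 1 asks for a bucket with a unique name. S3 bucket names must be globally unique and must also follow S3's naming rules. The sketch below checks a candidate name against a subset of those rules (3 to 63 characters; only lowercase letters, digits, dots and hyphens; starts and ends with a letter or digit; no adjacent dots; not formatted as an IP address) and builds a candidate with a random suffix. The `make_artifact_bucket_name` helper and its `-artifacts-` naming pattern are hypothetical; the check does not cover every S3 rule, and it cannot confirm global uniqueness, which S3 only reports when you actually create the bucket.

```python
import ipaddress
import re
import uuid

# Subset of the S3 general-purpose bucket naming rules (not exhaustive):
# 3-63 chars, lowercase letters/digits/dots/hyphens, start and end alphanumeric.
_ALLOWED = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")


def is_valid_bucket_name(name: str) -> bool:
    """Return True if `name` passes a subset of S3 bucket naming rules."""
    if not _ALLOWED.match(name):
        return False
    if ".." in name:  # two adjacent periods are not allowed
        return False
    try:
        ipaddress.IPv4Address(name)  # names formatted like an IP are rejected
        return False
    except ipaddress.AddressValueError:
        return True


def make_artifact_bucket_name(project: str) -> str:
    """Hypothetical helper: build a candidate artifact bucket name.

    The random suffix makes a collision with an existing bucket unlikely,
    but S3 only confirms uniqueness when the bucket is actually created.
    """
    base = re.sub(r"[^a-z0-9-]", "-", project.lower())[:40].strip("-")
    suffix = uuid.uuid4().hex[:8]
    name = f"{base}-artifacts-{suffix}"
    if not is_valid_bucket_name(name):
        raise ValueError(f"could not derive a valid bucket name from {project!r}")
    return name
```

For example, `make_artifact_bucket_name("My App!")` returns `my-app-artifacts-` followed by eight random hex characters; you would then pass that name to the console or the AWS CLI when creating the bucket.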

How to Show a Data Science Project in Your Resume Correctly

A data scientist's resume should present projects correctly. Here are my ideas on how to show a project in your resume.

How to Show a Data Science Project?
1. Preparation of Resume. The first step toward an interview for any project is your resume, and you need to present it clearly.
2. Answering about the Project. In interviews, you will be asked questions about your project, so the second step is to be able to explain the project.
3. Answering about Your Project Role. Third, you need to explain the roles you performed in your data science project. If you describe your roles correctly, your resume has a much better chance of being shortlisted. Depending on your experience, your resume can be one or two pages.
4. Technologies in the Resume. Interviewers will also ask how you used different tools to complete your data science project, so be ready to explain how you used the various options available in those tools.…

Top Data Science Tools Complete List

A list of the top data science tools and platform providers around the world, useful for data science and data analytics developers.

8 Top Data Analytics Tools List
Data science is a combination of multiple skills. AI and machine learning are part of data science, and you can create AI and machine learning products from data.

Related Posts
Top Skills You Need for Data Science Career
Data Science Sample Project an Example

Data Science: A Simple Project to Practice

I want to share how to use Python for your data science or analytics projects. Many programmers struggle to learn data science because they do not know where to start; you can get hands-on experience by starting with a mini-project. I used the Ubuntu operating system for this project.

To become a data scientist you need dual skills: learning, and applying what you learn. After an engineering degree you can go on to an M.Tech degree, but you become a real engineer only when you apply engineering principles. Data science is the same. My simple project is data visualization in Python.

Importance of Data
Data is a precious resource for solving machine learning and data science problems. The workflow is:
1. Define your problem.
2. Collect the data.
3. Wrangle and clean the data.
4. Visualize the patterns.

In the old days, you might have studied a subject called Statistical Analysis. In this subject, you need to study the actual problem and collect the data in…
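The four workflow steps above can be sketched in plain Python. This is not the author's project code: the question ("which fruit sold the most units?"), the tiny made-up CSV, and the column names `fruit` and `units` are assumptions for illustration, and the "visualization" is a text bar chart so the sketch runs with only the standard library. A real project would more likely use pandas and matplotlib for steps 3 and 4.

```python
import csv
import io
from collections import defaultdict

# Step 1: define the problem (hypothetical): which fruit sold the most units?

# Step 2: collect the data. A tiny made-up CSV stands in for a real source.
RAW_CSV = """fruit,units
apple,10
banana,
apple,5
Cherry,7
banana,abc
cherry,3
"""


def load_rows(text):
    """Read CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows):
    """Step 3: wrangle and clean. Normalize names, drop rows with bad units."""
    cleaned = []
    for row in rows:
        name = (row.get("fruit") or "").strip().lower()
        try:
            units = int(row.get("units") or "")
        except ValueError:
            continue  # skip missing or non-numeric unit values
        if name:
            cleaned.append((name, units))
    return cleaned


def totals(pairs):
    """Sum units per fruit."""
    result = defaultdict(int)
    for name, units in pairs:
        result[name] += units
    return dict(result)


def text_bar_chart(counts, width=20):
    """Step 4: visualize the pattern as a text bar chart, largest first."""
    top = max(counts.values())
    lines = []
    for name, value in sorted(counts.items(), key=lambda kv: -kv[1]):
        bar = "#" * max(1, round(value / top * width))
        lines.append(f"{name:<8}{bar} {value}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(text_bar_chart(totals(clean(load_rows(RAW_CSV)))))
```

Running it prints a two-line chart with apple (15 units) above cherry (10 units). The two banana rows are dropped during cleaning because their `units` values are missing or non-numeric, and "Cherry" is merged with "cherry" by the lowercase normalization.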

Here's What to Know: Data Lake vs. Database

In a data lake, data is stored internally in a repository; you can think of it as a blob. Data in the lake has no format, whereas a database needs a schema.

Database
In a database, you must define the schema before you store data in it, and it should follow Codd's rules. Here the data is completely formatted and stored in tables, so you use SQL to read the records. Performance is poor in terms of scalability.

Data Lake
A data lake has no format; it is just a dump. You can send this dump to a Hadoop repository for data analysis. The repository can grow incrementally, and you can build a database from it. Because the lake is an unformatted dump, the data needs to be formatted before it is sent for analytics. Data security and encryption must be in place before you send data to Hadoop. In real-time scenarios, you need to pre-process the data and then send it to the data warehouse to get insights.
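The contrast above is often described as schema-on-write (database) versus schema-on-read (data lake). The sketch below uses Python's built-in `sqlite3` as a stand-in for the database and a plain list of raw JSON strings as a stand-in for the lake; a real lake would usually sit on Hadoop (HDFS) or object storage. The `orders` table, the `amount` field and the helper names are hypothetical.

```python
import json
import sqlite3


def make_database():
    """Database: the schema is defined before any data is stored (schema-on-write)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL NOT NULL)"
    )
    return conn


# Data lake: raw records are dumped as-is, with no schema enforced on write.
lake = []


def dump_to_lake(raw_record: str) -> None:
    lake.append(raw_record)


def read_lake_amounts(records):
    """Schema-on-read: structure is applied only when the data is analyzed.

    Records that cannot be parsed, or that lack a numeric 'amount', are
    skipped; this is the pre-processing a lake needs before analytics.
    """
    amounts = []
    for raw in records:
        try:
            doc = json.loads(raw)
        except json.JSONDecodeError:
            continue
        if isinstance(doc, dict) and isinstance(doc.get("amount"), (int, float)):
            amounts.append(float(doc["amount"]))
    return amounts


if __name__ == "__main__":
    db = make_database()
    db.execute("INSERT INTO orders (id, amount) VALUES (?, ?)", (1, 19.5))
    try:
        # Rejected at write time: amount violates the NOT NULL constraint.
        db.execute("INSERT INTO orders (id, amount) VALUES (?, ?)", (2, None))
    except sqlite3.IntegrityError as err:
        print("database rejected row:", err)
    print(db.execute("SELECT SUM(amount) FROM orders").fetchone()[0])

    # The lake accepts anything; problems surface only at read time.
    for raw in ['{"amount": 10}', "not json", '{"note": "no amount"}', '{"amount": 2.5}']:
        dump_to_lake(raw)
    print(read_lake_amounts(lake))
```

Running the script reports that the database rejected the row with a missing amount, prints `19.5`, and then prints `[10.0, 2.5]`: the lake stored all four records, and the two malformed ones are filtered out only when the lake is read.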

5 Essential IT Skills for Data Engineers

Data engineers need the following skills, which help you get a good job at any analytics company.
Photo Credit: Srini

Five Top Skills Needed
Skill 1: Experience working with big data tools such as MapReduce, Pig, Spark, and Kafka, and NoSQL data stores such as MongoDB, Cassandra, and HBase.
Skill 2: Expertise in multi-structured data modeling and in reporting on NoSQL and structured database technologies such as HBase, Cassandra, and SQL.
Skill 3: Experience with languages such as Python, Perl, Ruby, Java, Scala, and R.
Skill 4: Strong data and visual presentation skills, and the ability to explain insights using tools such as Tableau, D3 charts, or similar.
Skill 5: Basic knowledge of and experience with statistical analysis tools such as R.

Data Analytics Tutorial for COBOL Programmers

Mainframe developers look for alternative IT courses to grow in their careers. In this post, I explain how they can use their business knowledge. Data analytics is a top alternative for COBOL programmers.

What is Data Analytics
The field of data science is evolving into one of the fastest-growing and most in-demand fields in the world. Organizations across industries are looking to make sense of the data they can now collect from new technologies, from predicting the next hot product to determining the risk of an infectious disease outbreak.

Demand and Opportunity
According to The New York Times, data science “promises to revolutionize industries from business to government, health care to academia.” As data accumulates, organizations are hiring individuals with the expertise to find meaning in the numbers and drive positive business decisions based on what they learn. It is estimated that by 2018, 4 million to 5 million jobs in the Unit…

How to Use CHAID: Useful for Data Science Developers

CHAID is one of the most in-demand skills for data science engineers. CHAID (Chi-Square Automatic Interaction Detection) analysis is a form of analysis that determines how variables best combine to explain the outcome in a given dependent variable.

CHAID Model
The model can be used for market penetration, predicting and interpreting responses, and a multitude of other research problems. CHAID analysis is especially useful for data with categorical rather than continuous values. For this kind of data, some common statistical tools such as standard linear regression are not applicable, and CHAID analysis is a well-suited tool for discovering the relationships between variables. One of the outstanding advantages of CHAID analysis is that it can visualize the relationship between the target (dependent) variable and the related factors as a tree.

1. CHAID Analysis for Survey Analysis
Most survey answers have categorical rather than continuous values. Finding out the statistical re…
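CHAID grows its tree by testing, at each node, how strongly each categorical predictor is associated with the target using a chi-square test. The sketch below shows only that first step: it builds a contingency table for each predictor, computes Pearson's chi-square statistic (the sum over cells of (observed − expected)² / expected, with expected = row total × column total / n), and picks the predictor with the largest value. Full CHAID also merges similar predictor categories, ranks predictors by Bonferroni-adjusted p-values rather than raw statistics (so predictors with different numbers of categories compare fairly), and recurses into each branch; those parts are omitted. The survey fields `region`, `plan` and `renewed` are hypothetical.

```python
from collections import Counter


def chi_square(records, predictor, target):
    """Pearson chi-square statistic for a predictor/target contingency table.

    Expected count for cell (i, j) = row_total_i * col_total_j / n.
    """
    n = len(records)
    cells = Counter((r[predictor], r[target]) for r in records)
    row_totals = Counter(r[predictor] for r in records)
    col_totals = Counter(r[target] for r in records)
    stat = 0.0
    for p_val, row_total in row_totals.items():
        for t_val, col_total in col_totals.items():
            expected = row_total * col_total / n
            observed = cells[(p_val, t_val)]
            stat += (observed - expected) ** 2 / expected
    return stat


def best_split(records, predictors, target):
    """Pick the predictor most associated with the target (first CHAID step only).

    Simplification: real CHAID ranks predictors by adjusted p-values, which
    also account for degrees of freedom; here raw statistics are compared.
    """
    scores = {p: chi_square(records, p, target) for p in predictors}
    return max(scores, key=scores.get), scores


# Hypothetical survey answers with categorical values only.
SURVEY = [
    {"region": "north", "plan": "premium", "renewed": "yes"},
    {"region": "south", "plan": "premium", "renewed": "yes"},
    {"region": "north", "plan": "basic", "renewed": "no"},
    {"region": "south", "plan": "basic", "renewed": "no"},
]

if __name__ == "__main__":
    best, scores = best_split(SURVEY, ["region", "plan"], "renewed")
    print(best, scores)
```

On this toy survey, `region` is independent of `renewed` (chi-square 0.0) while `plan` separates renewals perfectly (chi-square 4.0), so the sketch would split the root node on `plan`.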

How to Identify Data Relevant for Data Science Analytics

Data comes from everywhere: your government, your web server, your business partners, even your body. While we aren’t drowning in a sea of data, we’re finding that almost everything can be (or has been) instrumented. We frequently combine publishing industry data from Nielsen BookScan with our own sales data, publicly available Amazon data, and even job data to see what’s happening in the publishing industry.

Data is everywhere
Sites like Infochimps and Factual provide access to many large datasets, including climate data, MySpace activity streams, and game logs from sporting events. Factual enlists users to update and improve its datasets, which cover topics as diverse as endocrinologists and hiking trails.

How the data is growing
Much of the data we currently work with is the direct consequence of Web 2.0, and of Moore’s Law applied to data. The Web has people spending more time online and leaving a trail of data wherever they go. Mobile applications leave an even richer data trail since many of them a…

4 Top Data Scientist Skills to be Successful

Data science is a combination of technical and general skills. As an analyst, you are responsible for providing useful information to the client. Below is a list of useful skills.

Top Data Scientist Skills
1. Paradigms and practices. Data scientists need a grounding in the core concepts of data science, analytics, and data management. They should readily grasp the data science life cycle, know their typical roles and responsibilities in every phase, and be able to work in teams and with business domain experts and stakeholders. They should also learn a standard approach for establishing, managing, and operationalizing data science projects in the business.
2. Algorithms and modeling. Data scientists must become familiar with these areas: linear algebra, basic statistics, linear and logistic regression, data mining, predictive modeling, cluster analysis, association rules, market-basket analysis, decision tr…