Posts

Showing posts with the label ETL questions

Featured Post

How to Check Column Nulls and Replace: Pandas

Here is a post that shows how to count nulls in a Pandas DataFrame and replace them with the value you want. The process has two steps: counting the null values and replacing them.

Count null values (column-wise) in Pandas

```
# count null values column-wise
null_counts = df.isnull().sum()
print(null_counts)
```

Output:

```
Column1    1
Column2    1
Column3    5
dtype: int64
```

The code above assumes a sample Pandas DataFrame `df` that contains some null values. The `isnull()` function returns a DataFrame of the same shape as `df`, where each element is a boolean indicating whether the corresponding value is null. The `sum()` function then adds up those booleans for each column, so the output shows the count of null values column-wise.

##Code snippet to count null values column-wise:

```
df.isnull().sum()
```

##Code snippet to count null values row-wise:

```
df.isnull().sum(axis=1)
```

In the above code, `df` is the Panda
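The two steps this post describes, counting and then replacing null values, can be sketched end to end as below. This is only a minimal illustration: the sample data, the variable names, and the replacement values are assumptions made for the example and do not come from the original post. It uses `fillna()` with a per-column dictionary as one common way to do the replacement.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; column names and values are illustrative only.
df = pd.DataFrame({
    "Column1": [1.0, np.nan, 3.0],
    "Column2": ["a", None, "c"],
    "Column3": [np.nan, np.nan, 9.0],
})

# Step 1: count null values per column and per row.
null_counts_by_column = df.isnull().sum()
null_counts_by_row = df.isnull().sum(axis=1)

# Step 2: replace null values, choosing one replacement per column.
# The replacement values here are arbitrary placeholders.
filled = df.fillna({"Column1": 0.0, "Column2": "unknown", "Column3": -1.0})

print(null_counts_by_column)
print(null_counts_by_row)
print(filled)
```

Passing a dictionary to `fillna()` lets each column get a replacement that suits its type. Passing a single scalar instead, such as `df.fillna(0)`, would apply the same value to every column.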

19 Top Unix File Scenario Commands

An ETL developer's main task before testing starts is to browse various flat files. File browsing in UNIX is tricky, but if you know the right command you can save a lot of time. These 19 top UNIX file commands are useful in your project. In UNIX, a file normally has a Header, Detail records, and a Trailer. There are scenarios where you need only the detail records without the header and trailer, where you need only the most recent record, or where you need to skip some records from the input file. For all these file-based scenarios, I have given useful UNIX commands.

1). How to print/display the first line of a file?

There are many ways to do this. However, the easiest way to display the first line of a file is the [head] command.

$> head -1 file.txt

If you specify [head -2], it prints the first 2 records of the file.

Another way is to use the [sed] command. [sed] is a very powerful stream editor that can be used for various text manipulation purposes like this.

$> sed '2,$ d