MTH 522 – 10/13/2023
I began working with the two CSV files, ‘fatal-police-shootings-data’ and ‘fatal-police-shootings-agencies’, by loading them into a Jupyter Notebook. Here is an overview of the steps I took and the obstacles I encountered:
1. Data Loading: I imported both CSV files into the notebook. The ‘fatal-police-shootings-data’ dataset has 8,770 entries and 19 attributes, whereas the ‘fatal-police-shootings-agencies’ dataset has 3,322 entries and 5 attributes.
2. Column Correspondence: After reviewing the column descriptions available on GitHub, I realized that the ‘ids’ column in the ‘fatal-police-shootings-agencies’ dataset corresponds to the ‘agency_ids’ column in the ‘fatal-police-shootings-data’ dataset. So that the two datasets could be merged on a shared key, I renamed the column in the ‘fatal-police-shootings-agencies’ dataset from ‘ids’ to ‘agency_ids’.
3. Data Type Inconsistency: When I attempted to merge the two datasets on ‘agency_ids’, pandas raised an error because the key columns had mismatched data types. Inspecting both datasets with the ‘.info()’ method showed that one had ‘agency_ids’ as an object type, while the other had it as int64. To address this, I used the ‘pd.to_numeric()’ function so that both columns would be of type ‘int64’.
4. Data Fragmentation: A new challenge surfaced in the ‘fatal-police-shootings-data’ dataset: some cells in the ‘agency_ids’ column contain multiple IDs. Such cells cannot be matched to a single agency, so I am currently splitting them so that each ID gets its own row.
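The steps above can be sketched on a toy example. This is only a minimal illustration, not the exact notebook code: the rows are made up, the frame names (shootings, agencies) are hypothetical, and the ‘;’ separator between IDs is an assumption that should be checked against the real file. In the notebook, the two frames would come from ‘pd.read_csv()’ instead.

```python
import pandas as pd

# Made-up stand-ins for the two CSV files (NOT real data).
shootings = pd.DataFrame({
    "id": [1, 2, 3],
    "name": ["A", "B", "C"],
    # Read as strings (object dtype); one cell holds two IDs, one is missing.
    "agency_ids": ["10", "20;30", None],
})
agencies = pd.DataFrame({
    "ids": [10, 20, 30],
    "name": ["Agency X", "Agency Y", "Agency Z"],
})

# Step 2: give the key column the same name in both frames.
agencies = agencies.rename(columns={"ids": "agency_ids"})

# Step 4: split multi-ID cells into lists, then emit one row per ID.
# The ";" separator is an assumption about the file format.
shootings["agency_ids"] = shootings["agency_ids"].str.split(";")
shootings_long = shootings.explode("agency_ids", ignore_index=True)

# Step 3: convert the key to a numeric type on both sides.
# The nullable "Int64" dtype keeps missing IDs as <NA> instead of failing.
shootings_long["agency_ids"] = pd.to_numeric(
    shootings_long["agency_ids"], errors="coerce"
).astype("Int64")
agencies["agency_ids"] = agencies["agency_ids"].astype("Int64")

# Left merge: keep every shooting row, even those without a matching agency.
merged = shootings_long.merge(
    agencies, on="agency_ids", how="left", suffixes=("", "_agency")
)
print(merged)
```

Splitting is done before the numeric conversion here because a cell such as "20;30" is not a valid number: ‘pd.to_numeric()’ would raise on it by default, or turn it into a missing value with errors="coerce".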
Once I have split the cells in the ‘fatal-police-shootings-data’ dataset into multiple rows, my next steps will be deeper data exploration and the start of data preprocessing. This will include data cleaning, handling missing data, and preparing the data for analysis or modeling.