MTH 522 – 10/18/2023

I’ve started examining the ‘fatal-police-shootings-data’ dataset in Python. As a first step, I loaded the data to look at its variables and their distributions. The ‘age’ variable, a numerical column, records the ages of the people killed in police shootings. The dataset also includes latitude and longitude values, which give the geographic location of each incident.
During this initial assessment, I came across an ‘id’ column, which appears to have limited relevance for the analysis, so I’m considering excluding it from further investigation. I also scanned the dataset for missing values and found that several variables contain nulls: ‘name’, ‘armed’, ‘age’, ‘gender’, ‘race’, ‘flee’, ‘longitude’, and ‘latitude’. A check for duplicate records found only a single duplicate entry, which has no ‘name’ value. In the next phase of this analysis, I’ll focus on the distribution of the ‘age’ variable.
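The missing-value and duplicate checks described above can be sketched roughly as follows. This is a minimal illustration, not the exact code used for the post: it assumes pandas, and the small DataFrame is a hypothetical stand-in with invented values rather than rows from the real file. Dropping ‘id’ before the duplicate check is also an assumption about how such a check might be set up.

```python
import pandas as pd

# Hypothetical toy rows standing in for the real dataset; all values are invented.
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "name": ["A", None, None, "D"],
    "age": [34.0, None, None, 51.0],
    "latitude": [42.1, 40.7, 40.7, None],
    "longitude": [-71.0, -74.0, -74.0, None],
})

# Assumption: 'id' is dropped first, so rows are compared on their other columns only.
features = df.drop(columns=["id"])

# Count missing values in each column.
missing_counts = features.isna().sum()

# Flag rows that exactly repeat an earlier row (missing values are treated as equal).
duplicate_mask = features.duplicated()
n_duplicates = int(duplicate_mask.sum())

print(missing_counts)
print("duplicate rows:", n_duplicates)
```

On the real data, the same calls would run on the DataFrame returned by `pd.read_csv`, and `features[duplicate_mask]` would show which rows were flagged.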
In our recent class session, we learned how to compute geospatial distances from location information. This makes it possible to build GeoHistograms, which are useful for visualizing and analyzing geographic data: they help identify spatial patterns, pinpoint hotspots, and discover clusters in location-based datasets, giving a clearer view of the underlying patterns in the data.
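As a rough sketch of these two ideas, the code below computes a great-circle distance between two latitude/longitude points and counts points per grid cell. The post does not say which distance formula was covered in class, so the haversine formula here is an assumption. The function names, the 1-degree cell size, and the sample coordinates are all hypothetical.

```python
import math
from collections import Counter

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def geo_histogram(points, cell_deg=1.0):
    """Count (lat, lon) points per square grid cell of side cell_deg degrees."""
    counts = Counter()
    for lat, lon in points:
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        counts[cell] += 1
    return counts


# Invented sample coordinates, for illustration only.
sample_points = [(42.3, -71.1), (42.9, -71.8), (40.7, -74.0)]
cell_counts = geo_histogram(sample_points)
print(cell_counts.most_common(1))
```

Cells with the highest counts are candidate hotspots. A finer `cell_deg` gives more spatial detail at the cost of sparser counts per cell.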
