MTH 522 – 09/29/2023

I have used Python to put my knowledge of polynomial regression and 5-fold cross-validation into practice, which has strengthened my understanding of these ideas. Translating these methods required overcoming data-preprocessing difficulties and adjusting to Python’s syntax for data visualization with Matplotlib and Seaborn.
I also learned a lot from instructional videos on K-fold cross-validation, prediction error estimation, the value of validation sets, and other related topics. These hands-on applications and educational resources have deepened my understanding of these ideas and of how to use them effectively in Python.
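As a minimal sketch of what this practice looked like, here is a degree-2 polynomial regression scored with 5-fold cross-validation in scikit-learn. The data below is synthetic and stands in for the course dataset:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data standing in for the course dataset (assumption)
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 2 + 3 * X[:, 0] - 0.5 * X[:, 0] ** 2 + rng.normal(0, 1, 100)

# Degree-2 polynomial regression scored with 5-fold cross-validation
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
scores = cross_val_score(model, X, y, cv=5, scoring="neg_mean_squared_error")
mse_per_fold = -scores  # scikit-learn reports negated MSE for scoring
print(mse_per_fold.mean())
```

The pipeline keeps the polynomial feature expansion inside each fold, so the transformation is refit on every training split rather than leaking information from the validation data.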

MTH 522 – 09/27/2023

I successfully learned and applied 5-fold cross-validation together with polynomial regression. Translating this technique into Python was rewarding, albeit not without problems. Adapting the data-preprocessing steps, guaranteeing data consistency, and dealing with data anomalies were all key challenges. Furthermore, porting the nonlinear model-fitting step into Python required a good understanding of Python tools such as scikit-learn and their modeling functions.

Another noteworthy task was reproducing data visualizations, specifically the ListPlot of mean squared error (MSE) values, using Python plotting packages such as Matplotlib or Seaborn. Python’s syntax and customization options for creating similar charts differ from those in Mathematica. Finally, debugging and error handling were critical in verifying that the Python code generated results consistent with the Mathematica code. Despite these difficulties, the translation process was an excellent opportunity to improve my grasp of cross-validation and polynomial regression in a Python programming context.
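A rough Matplotlib analogue of Mathematica’s ListPlot of MSE values might look like the following, plotting 5-fold cross-validated MSE against polynomial degree; the data here is synthetic, standing in for the real dataset:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for scripted runs
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic stand-in data (assumption)
rng = np.random.default_rng(1)
X = rng.uniform(0, 5, size=(120, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.2, 120)

# Cross-validated MSE for each candidate polynomial degree
degrees = list(range(1, 9))
mses = []
for d in degrees:
    model = make_pipeline(PolynomialFeatures(degree=d), LinearRegression())
    scores = cross_val_score(model, X, y, cv=5,
                             scoring="neg_mean_squared_error")
    mses.append(-scores.mean())

# Matplotlib equivalent of a Mathematica ListPlot of the MSE values
plt.plot(degrees, mses, marker="o")
plt.xlabel("Polynomial degree")
plt.ylabel("5-fold CV mean squared error")
plt.savefig("mse_by_degree.png")
```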

MTH 522 – 09/26/2023

I watched the video “Estimating Prediction Error and Validation Set Approach” and learned how important it is to accurately estimate prediction error in machine learning models. This video emphasized the significance of reserving a validation set to evaluate the model’s performance.

The “K-fold Cross-Validation” video was quite informative. I now understand how K-fold cross-validation improves the robustness of a model’s performance estimate over a single validation set. I applied this strategy in Python to a diabetes dataset, dividing the data into ‘K’ subgroups and iteratively using each as a validation set while training on the remaining data. I also found the “Cross-Validation: The Right and Wrong Ways” video helpful; it stressed the correct and incorrect ways of performing cross-validation, shedding light on typical blunders to avoid.
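The manual K-fold loop described above can be sketched with scikit-learn’s KFold; the features and target below are simulated stand-ins for the diabetes data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold

# Simulated stand-in for the diabetes dataset (assumption)
rng = np.random.default_rng(2)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(0, 0.1, 100)

# Each of the K=5 subgroups serves once as the validation set
kf = KFold(n_splits=5, shuffle=True, random_state=0)
fold_mses = []
for train_idx, val_idx in kf.split(X):
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    preds = model.predict(X[val_idx])
    fold_mses.append(mean_squared_error(y[val_idx], preds))

print(np.mean(fold_mses))  # average validation error across folds
```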

MTH 522 – 09/22/2023

I carried out a detailed investigation by first visualizing the non-normal distribution of the post-molt and pre-molt data by producing histograms for each group of data independently. To learn more, I plotted both histograms side by side and performed a statistical analysis to determine whether there was a significant difference in the means. I did this by calculating the p-value using a t-test.
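A sketch of that workflow, with simulated pre-molt and post-molt values in place of the real class data: side-by-side histograms, then a two-sample t-test for a difference in means:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt
from scipy import stats

# Simulated, skewed (non-normal) shell sizes (assumption; real data from class)
rng = np.random.default_rng(3)
pre_molt = rng.gamma(shape=9, scale=15, size=400)
post_molt = pre_molt + rng.normal(14, 3, size=400)  # shells grow after molting

# Histograms for each group, plotted side by side
fig, axes = plt.subplots(1, 2, figsize=(10, 4), sharex=True)
axes[0].hist(pre_molt, bins=30)
axes[0].set_title("Pre-molt")
axes[1].hist(post_molt, bins=30)
axes[1].set_title("Post-molt")
fig.savefig("molt_histograms.png")

# t-test for a significant difference in the group means
t_stat, p_value = stats.ttest_ind(post_molt, pre_molt)
print(p_value)
```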

MTH 522 – 09/20/2023

In the last class, I studied linear regression modeling and its application to data fitting. This entails understanding how to work with non-normally distributed, skewed variables with high variance and kurtosis. Furthermore, I’ve been analyzing a dataset that includes two measurements: “post-molt,” the size of a crab’s shell after molting, and “pre-molt,” the size of a crab’s shell before molting. I successfully performed linear regression on this dataset, drawing the regression line and evaluating descriptive statistics to better understand the relationships in the data. I have also learned about the significance of the t-test and its use in statistical analysis.
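The regression-plus-descriptive-statistics step might look like this; the crab measurements below are simulated for illustration, not the actual class data:

```python
import numpy as np
from scipy import stats

# Simulated pre-molt/post-molt shell sizes (assumption)
rng = np.random.default_rng(4)
pre_molt = rng.gamma(9, 15, 300)
post_molt = 1.08 * pre_molt + 5 + rng.normal(0, 2, 300)

# Fit post-molt as a linear function of pre-molt
fit = stats.linregress(pre_molt, post_molt)
print(fit.slope, fit.intercept, fit.rvalue)

# Descriptive statistics for the skewed, non-normal predictor
print(np.mean(pre_molt), np.std(pre_molt),
      stats.skew(pre_molt), stats.kurtosis(pre_molt))
```

`linregress` also returns the p-value of a t-test on the slope, which ties the regression fit back to the significance testing discussed above.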

MTH 522 – 09/18/2023

In the previous class, I learned about correlation analysis, linear regression with three variables, creating 3D plots, analyzing residual plots, and understanding the planes formed in 3D plots when considering three variables. I also looked at how quadratic equations can be used in data analysis.
I’ve been working on graphs that incorporate data on diabetes, inactivity, and obesity. My current focus is on plotting the linear regression plane within the 3D plot, which I hope will provide useful information about the correlations between these variables.
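One possible sketch of that regression plane in a 3D Matplotlib plot, using simulated stand-ins for the county-level diabetes, inactivity, and obesity rates:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # registers the 3d projection
from sklearn.linear_model import LinearRegression

# Simulated rates standing in for the real data (assumption)
rng = np.random.default_rng(5)
inactivity = rng.uniform(10, 35, 200)
obesity = rng.uniform(20, 45, 200)
diabetes = 0.2 * inactivity + 0.15 * obesity + rng.normal(0, 0.8, 200)

# Two-predictor linear regression: diabetes ~ inactivity + obesity
X = np.column_stack([inactivity, obesity])
model = LinearRegression().fit(X, diabetes)

# 3D scatter of the data plus the fitted regression plane
fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.scatter(inactivity, obesity, diabetes, s=8)
ii, oo = np.meshgrid(np.linspace(10, 35, 20), np.linspace(20, 45, 20))
plane = model.intercept_ + model.coef_[0] * ii + model.coef_[1] * oo
ax.plot_surface(ii, oo, plane, alpha=0.3)
fig.savefig("regression_plane.png")
```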

MTH 522 – 09/15/2023

During our last class, my understanding of the significance of p-values in statistical analysis was strengthened by our in-depth discussion of the topic. I also raised some questions about how linear regression calculates distances. Given that distance in geometry is normally the perpendicular distance between a point and a line, I was confused as to why the distance is measured parallel to the y-axis. I also asked why polynomials of higher degrees weren’t explored instead of modeling data relationships exclusively with the linear equation “y = mx + c”. I’m happy to report that my questions were thoughtfully answered in class, giving me clarity on these crucial topics.
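The distance question can be illustrated numerically: ordinary least squares, as implemented by `np.polyfit`, minimizes the sum of squared residuals measured parallel to the y-axis, so any other line (including one nudged toward a "perpendicular" fit) has a larger vertical SSE. Simulated data for illustration:

```python
import numpy as np

# Simulated points around a known line y = 2x + 1 (assumption)
rng = np.random.default_rng(8)
x = rng.uniform(0, 10, 50)
y = 2 * x + 1 + rng.normal(0, 1, 50)

# Ordinary least squares fit of y = mx + c
m, c = np.polyfit(x, y, 1)

# OLS minimizes the sum of squared *vertical* residuals
# (distances parallel to the y-axis, not perpendicular to the line)
def sse(slope, intercept):
    return np.sum((y - (slope * x + intercept)) ** 2)

# Perturbing the fitted line in any direction can only increase the SSE
print(sse(m, c) <= sse(m + 0.1, c) and sse(m, c) <= sse(m, c + 0.5))
```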

Additionally, I learned a lot from the inquiries made by others.

MTH 522 – 09/13/2023

When I compare the kurtosis values for diabetes and inactivity to those given in the course materials and discussed in class, I see a disparity. I used the kurtosis() function from the scipy library to compute these statistics. The observed discrepancies have prompted questions about the distribution of the underlying data and the potential causes of these variations. I’m looking into possible explanations and trying to reconcile the calculated kurtosis values with the anticipated trends. For our data analysis to be accurate and reliable, this inquiry is crucial.
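One common source of exactly this kind of disparity is the definition being used: scipy’s kurtosis() returns Fisher’s excess kurtosis by default (a normal distribution scores 0), while many references report Pearson’s kurtosis (a normal distribution scores 3). The two differ by exactly 3:

```python
import numpy as np
from scipy.stats import kurtosis

# Large normal sample, so the true kurtosis values are known
rng = np.random.default_rng(6)
data = rng.normal(size=100_000)

excess = kurtosis(data)                 # Fisher's definition (default): ~0
pearson = kurtosis(data, fisher=False)  # Pearson's definition: ~3
print(round(excess, 2), round(pearson, 2))
```

If the course materials report values near 3 for roughly normal variables while scipy reports values near 0, passing `fisher=False` (or adding 3) reconciles them.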

Along with the kurtosis research, I have also experimented with modeling the link between diabetes and inactivity using regression approaches. Although linear regression is frequently used for this purpose, I tried polynomial regression to capture more complex data patterns. A polynomial of degree 8 (y = -0.00x^8 + 0.00x^7 - 0.14x^6 + 3.88x^5 - 67.96x^4 + 753.86x^3 - 5171.95x^2 + 20053.68x - 33621.52) offers the best fit for our dataset, according to my comparison of different polynomial degrees. This conclusion raises an important question: why do we frequently use linear regression when polynomial regression seems to provide a more realistic depiction of the complexity of the data? To understand these variables’ dynamics better and to choose the best regression strategy for this particular dataset, more research is required, and I am currently pursuing it.
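Part of the answer can be demonstrated directly: because the polynomial models are nested, in-sample MSE can only decrease as the degree grows, so raw fit error will always favor high-degree polynomials even when they overfit. Cross-validated error (as in the earlier entries) is the fairer criterion. Simulated inactivity/diabetes-style data for illustration:

```python
import numpy as np

# Simulated data resembling the inactivity/diabetes relationship (assumption)
rng = np.random.default_rng(7)
x = rng.uniform(10, 25, 150)
y = 0.4 * x + 2 + rng.normal(0, 0.5, 150)

xc = x - x.mean()  # center x to keep high-degree fits well conditioned

# In-sample MSE by polynomial degree; nested models mean it never increases
mse_by_degree = {}
for degree in (1, 2, 4, 6):
    coeffs = np.polyfit(xc, y, degree)
    residuals = y - np.polyval(coeffs, xc)
    mse_by_degree[degree] = float(np.mean(residuals ** 2))

print(mse_by_degree)
```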

MTH 522 – 09/11/2023

I used Python to combine the three tables Diabetes, Obesity, and Inactivity using a shared column called “FIPS”. I ran into a problem where the column titles in one of the tables were different, which I had to fix to ensure a consistent merge. The Diabetes and Inactivity tables were successfully combined into a DataFrame, and I then set out to examine the data’s statistical characteristics, including mean, mode, median, standard deviation, skewness, and kurtosis.
I have also revisited key concepts from our last class, including correlation, scatterplots, linear regression, residual analysis, heteroscedasticity, and the Breusch-Pagan test. Having gained a solid understanding of these fundamentals, I’m eager to put them into practice in Python. I plan to apply these techniques to the Diabetes, Obesity, and Inactivity tables and explore further insights.
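A minimal sketch of the merge described above, using small illustrative tables with hypothetical column names and values (the real course tables are larger, and the mismatched title differed from what is shown here):

```python
import pandas as pd

# Small illustrative tables with hypothetical values (assumption)
diabetes = pd.DataFrame({
    "FIPS": [1001, 1003, 1005, 1007, 1009],
    "pct_diabetes": [9.5, 8.2, 11.1, 10.4, 7.9],
})
inactivity = pd.DataFrame({
    "fips_code": [1001, 1003, 1005, 1007, 1009],  # mismatched key column
    "pct_inactive": [22.0, 18.5, 25.3, 23.1, 17.8],
})

# Rename the mismatched column so both tables share "FIPS", then merge
inactivity = inactivity.rename(columns={"fips_code": "FIPS"})
merged = diabetes.merge(inactivity, on="FIPS", how="inner")

# Statistical characteristics of the merged data
print(merged["pct_diabetes"].mean(), merged["pct_diabetes"].median(),
      merged["pct_diabetes"].std(), merged["pct_diabetes"].skew(),
      merged["pct_diabetes"].kurtosis())
```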