Month: September 2023
MTH 522 – 09/27/2023
I successfully learnt and applied the techniques of 5-fold cross-validation and polynomial regression. Translating this work from Mathematica into Python was pleasant, albeit not without problems. The key challenges were adapting the data preprocessing steps, ensuring data consistency, and handling data anomalies. Furthermore, porting the nonlinear model-fitting part to Python required a good understanding of Python tools such as scikit-learn and its model-fitting functions.
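The cross-validation workflow described above can be sketched in scikit-learn roughly as follows. This is a minimal illustration rather than the code from this entry: the function name `cv_mse_by_degree`, the shuffled `KFold` with a fixed seed, and the synthetic quadratic data are all assumptions made for the example. Wrapping `PolynomialFeatures` and `LinearRegression` in a pipeline means the whole model is refit on the training folds each time.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures


def cv_mse_by_degree(x, y, degrees, n_splits=5, seed=0):
    """Return {degree: mean K-fold cross-validated MSE} for polynomial fits of y on x."""
    X = np.asarray(x, dtype=float).reshape(-1, 1)  # scikit-learn expects a 2-D feature matrix
    y = np.asarray(y, dtype=float)
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)  # same folds for every degree
    results = {}
    for d in degrees:
        # Expand x into [x, x^2, ..., x^d], then fit ordinary least squares on those features
        model = make_pipeline(PolynomialFeatures(degree=d, include_bias=False), LinearRegression())
        # scikit-learn scorers follow "higher is better", so MSE comes back negated
        scores = cross_val_score(model, X, y, cv=kf, scoring="neg_mean_squared_error")
        results[d] = float(-scores.mean())
    return results


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.uniform(-3, 3, 200)
    y = 1 + 2 * x - 0.5 * x**2 + rng.normal(0, 1, 200)  # synthetic data, not the course dataset
    for d, mse in cv_mse_by_degree(x, y, range(1, 7)).items():
        print(f"degree {d}: CV MSE = {mse:.3f}")
```

The degree with the lowest cross-validated MSE is the natural candidate; because the folds are fixed by `seed`, every degree is scored on the same splits.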
Another noteworthy task was reproducing the data visualizations, specifically the ListPlot of mean squared error (MSE) values, using Python plotting packages such as Matplotlib or Seaborn. Python’s syntax and customisation options for creating similar charts differ from Mathematica’s. Finally, debugging and error handling were critical for verifying that the Python code produced results consistent with the Mathematica code. Despite these difficulties, the translation gave me an excellent opportunity to improve my grasp of cross-validation and polynomial regression in a Python programming context.
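A Mathematica ListPlot of MSE values corresponds most closely to a marker-only Matplotlib plot. The helper below is a hypothetical sketch (`plot_mse_by_degree` is an illustrative name): it assumes the MSE values have already been computed, for example by a cross-validation loop, and it selects the non-interactive Agg backend so it also runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; remove this line to use plt.show() interactively
import matplotlib.pyplot as plt


def plot_mse_by_degree(degrees, mse_values):
    """Plot MSE against polynomial degree as unconnected points, similar to a default ListPlot."""
    fig, ax = plt.subplots()
    ax.plot(list(degrees), list(mse_values), marker="o", linestyle="None")  # points only, no joining line
    ax.set_xlabel("Polynomial degree")
    ax.set_ylabel("Mean squared error")
    ax.set_title("MSE by polynomial degree")
    return fig, ax
```

Usage: `fig, ax = plot_mse_by_degree(degrees, mse_values)` followed by `fig.savefig("mse.png")`. Returning the figure and axes keeps further customisation (log scale, annotations) in the caller’s hands.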
MTH 522 – 09/26/2023
I watched the video ‘Estimating Prediction Error and Validation Set Approach’ and learned how important it is to estimate prediction error accurately in machine learning models. The video emphasized the importance of holding out a validation set to evaluate a model’s performance.
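The validation set approach can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions: the function name, the 30% hold-out fraction, and the polynomial model fitted with `np.polyfit` are choices made for the example, not details from the video. Re-running it with different `seed` values shows how much a single split’s error estimate can vary.

```python
import numpy as np


def validation_set_mse(x, y, degree, val_fraction=0.3, seed=0):
    """Hold out a random validation split, fit on the rest, and return the validation MSE."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))  # random order of observation indices
    n_val = int(round(val_fraction * len(x)))
    val_idx, train_idx = idx[:n_val], idx[n_val:]
    coefs = np.polyfit(x[train_idx], y[train_idx], degree)  # fit uses training points only
    residuals = y[val_idx] - np.polyval(coefs, x[val_idx])
    return float(np.mean(residuals**2))
```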
The ‘K-fold Cross-Validation’ video was very informative. I now understand how K-fold cross-validation gives a more robust estimate of a model’s performance than a single validation set does. I applied this strategy in Python to the diabetes data, dividing the dataset into K folds and iteratively using each fold as the validation set while training on the remaining folds. I also found the ‘Cross-Validation: The Right and Wrong Ways’ video helpful: it contrasted correct and incorrect ways of performing cross-validation, shedding light on typical blunders to avoid.
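For comparison with a single split, here is a hand-rolled K-fold loop in NumPy. It is a sketch assuming a one-predictor polynomial model and a hypothetical function name, `kfold_cv_mse`. Every fitting step happens inside the loop using only the training folds; any preprocessing or predictor selection added later would need to live there too, so that the validation fold stays unseen.

```python
import numpy as np


def kfold_cv_mse(x, y, degree, k=5, seed=0):
    """Return (mean MSE over the K folds, list of per-fold MSEs) for a polynomial fit."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    # Shuffle indices once, then cut them into K disjoint folds of (nearly) equal size
    folds = np.array_split(rng.permutation(len(x)), k)
    fold_mses = []
    for i, val_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        coefs = np.polyfit(x[train_idx], y[train_idx], degree)  # fit on the other K-1 folds
        residuals = y[val_idx] - np.polyval(coefs, x[val_idx])
        fold_mses.append(float(np.mean(residuals**2)))
    return float(np.mean(fold_mses)), fold_mses
```

Averaging over K folds uses every observation for validation exactly once, which is why the estimate is less sensitive to one lucky or unlucky split.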
MTH 522 – 09/22/2023
I carried out a detailed investigation, first visualizing the non-normal distributions of the post-molt and pre-molt data by producing a separate histogram for each group. To learn more, I plotted both histograms side by side and performed a statistical analysis to determine whether the difference in means was significant, calculating the p-value with a t-test.
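A sketch of this comparison with NumPy and SciPy follows. Shared bin edges (from `np.histogram_bin_edges`) keep the two histograms on the same scale when they are drawn side by side. Using Welch’s two-sample test (`stats.ttest_ind` with `equal_var=False`) treats the pre-molt and post-molt sizes as independent samples; that is an assumption here, and if each row pairs one crab’s pre- and post-molt sizes, `stats.ttest_rel` is the paired alternative. The function name `compare_groups` is illustrative.

```python
import numpy as np
from scipy import stats


def compare_groups(pre, post, bins=20):
    """Histogram both groups on shared bins and test for a difference in means."""
    pre = np.asarray(pre, dtype=float)
    post = np.asarray(post, dtype=float)
    # Shared bin edges make the two histograms directly comparable
    edges = np.histogram_bin_edges(np.concatenate([pre, post]), bins=bins)
    pre_counts, _ = np.histogram(pre, bins=edges)
    post_counts, _ = np.histogram(post, bins=edges)
    # Welch's t-test: does not assume the two groups have equal variances
    result = stats.ttest_ind(post, pre, equal_var=False)
    return {
        "edges": edges,
        "pre_counts": pre_counts,
        "post_counts": post_counts,
        "mean_difference": float(post.mean() - pre.mean()),
        "t_statistic": float(result.statistic),
        "p_value": float(result.pvalue),
    }
```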
MTH 522 – 09/20/2023
In the last class I studied linear regression modeling and its application to data fitting. This includes understanding how to work with skewed, non-normally distributed variables with high variance and kurtosis. I have also worked with a dataset that includes two measurements: “post-molt,” the size of a crab’s shell after molting, and “pre-molt,” the size of the shell before molting. I performed linear regression on this dataset, plotting the regression line and computing descriptive statistics to better understand the relationships in the data. I have also learnt about the significance of the t-test and its use in statistical analysis.
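The regression and descriptive statistics from this entry can be sketched with `scipy.stats`; this is an assumed workflow, not the class code, and `fit_and_describe` is a hypothetical name. `linregress` fits the least-squares line, and the p-value it reports comes from a t-test of the null hypothesis that the slope is zero, which ties the regression to the t-test. `describe` returns the count, (min, max), mean, variance, skewness, and excess kurtosis, one way to quantify the skew and heavy tails mentioned here.

```python
import numpy as np
from scipy import stats


def fit_and_describe(x, y):
    """Fit y = intercept + slope * x by least squares and summarize both variables."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    fit = stats.linregress(x, y)  # fields: slope, intercept, rvalue, pvalue, stderr
    summary = {
        "x": stats.describe(x),  # nobs, minmax, mean, variance, skewness, kurtosis (excess)
        "y": stats.describe(y),
    }
    return fit, summary


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.gamma(shape=2.0, scale=5.0, size=200)  # synthetic right-skewed predictor
    y = 4.0 + 1.1 * x + rng.normal(0, 2, 200)
    fit, summary = fit_and_describe(x, y)
    print(f"y = {fit.intercept:.2f} + {fit.slope:.2f} x, r = {fit.rvalue:.3f}, p = {fit.pvalue:.2g}")
    print("skewness of x:", round(summary["x"].skewness, 3),
          "excess kurtosis of x:", round(summary["x"].kurtosis, 3))
```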
MTH 522 – 09/18/2023
MTH 522 – 09/15/2023
During our last class, my understanding of the significance of p-values in statistical analysis was strengthened by our in-depth discussion of the topic. I also raised a question about how linear regression measures distances. Since in geometry the distance between a point and a line is normally the perpendicular distance, I was confused about why the distance was measured parallel to the y-axis. I also asked why polynomials of different degrees weren’t explored instead of using the linear equation “y = mx + c” exclusively to model relationships in the data. I’m happy to report that my questions were thoughtfully answered in class, clarifying these crucial topics.
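The distinction in this question can be made concrete in NumPy. Ordinary least squares (`np.polyfit` with degree 1) picks the line minimizing the sum of squared vertical distances; one standard rationale is that regression aims to predict y from x, so errors are measured in y. Minimizing perpendicular distances is a different method, orthogonal (total least squares) regression, sketched here via the first principal axis from `np.linalg.svd`. The function names are hypothetical, and this illustrates the idea rather than reproducing the explanation given in class.

```python
import numpy as np


def ols_line(x, y):
    """Least-squares line: minimizes squared vertical (y-direction) distances."""
    slope, intercept = np.polyfit(x, y, 1)
    return float(slope), float(intercept)


def orthogonal_line(x, y):
    """Total least squares line: minimizes squared perpendicular distances."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xm, ym = x.mean(), y.mean()
    centered = np.column_stack([x - xm, y - ym])
    # First right-singular vector = direction of greatest spread through the centroid
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    dx, dy = vt[0]
    slope = dy / dx  # assumes the fitted line is not vertical
    return float(slope), float(ym - slope * xm)


def vertical_sse(x, y, slope, intercept):
    """Sum of squared vertical distances from the points to the line."""
    return float(np.sum((np.asarray(y) - (slope * np.asarray(x) + intercept)) ** 2))


def perpendicular_sse(x, y, slope, intercept):
    """Sum of squared perpendicular distances: |slope*x - y + intercept| / sqrt(1 + slope^2), squared."""
    resid = slope * np.asarray(x) - np.asarray(y) + intercept
    return float(np.sum(resid**2) / (1 + slope**2))
```

Each line wins on its own criterion: the OLS line has the smaller vertical error, the orthogonal line the smaller perpendicular error, and their slopes differ whenever the data are noisy.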
Additionally, I learned a lot from the questions asked by others.
MTH 522 – 09/13/2023
When I compare the kurtosis values I calculated for diabetes and inactivity with those given in the course materials and discussed in class, I see a discrepancy. I computed these statistics with the kurtosis() function from scipy.stats. The observed differences raise questions about the distribution of the underlying data and about what causes them. I’m investigating possible explanations and trying to reconcile my calculated kurtosis values with the expected ones. This investigation is crucial for our data analysis to be accurate and reliable.
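One common source of kurtosis discrepancies is the definition used. By default `scipy.stats.kurtosis` returns Fisher’s excess kurtosis (a normal distribution scores 0) computed with the biased, population-moment estimator; `fisher=False` gives Pearson’s kurtosis, which is exactly 3 larger, and `bias=False` applies the small-sample correction. The sketch below (the helper name `kurtosis_variants` is hypothetical) computes all four combinations so they can be compared against the course values; whether this explains the gap here is an assumption to check, not a conclusion.

```python
import numpy as np
from scipy.stats import kurtosis


def kurtosis_variants(values):
    """Compute the common kurtosis conventions for one variable."""
    v = np.asarray(values, dtype=float)
    v = v[~np.isnan(v)]  # drop missing values; with scipy's default nan_policy a NaN would make the result NaN
    return {
        "excess_biased": float(kurtosis(v)),  # scipy default: fisher=True, bias=True
        "excess_unbiased": float(kurtosis(v, bias=False)),  # sample-size corrected
        "pearson_biased": float(kurtosis(v, fisher=False)),  # excess + 3
        "pearson_unbiased": float(kurtosis(v, fisher=False, bias=False)),
    }
```

Usage: call `kurtosis_variants(df["diabetes"])` (column name hypothetical) and check which entry matches the reference value; a gap of exactly 3 points to the Fisher/Pearson convention, while a gap that shrinks as the sample grows points to the bias correction.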
Alongside the kurtosis investigation, I have also experimented with modeling the relationship between diabetes and inactivity using regression. Although linear regression is commonly used for this purpose, I tried polynomial regression to capture more complex patterns in the data. According to my comparison of different polynomial degrees, a degree-8 polynomial (its two leading coefficients round to 0.00 at this precision), y = -0.00x^8 + 0.00x^7 - 0.14x^6 + 3.88x^5 - 67.96x^4 + 753.86x^3 - 5171.95x^2 + 20053.68x - 33621.52, gives the best fit for our dataset. This raises an important question: why do we so often use linear regression when polynomial regression seems to describe the complexity of the data more realistically? More research is needed to understand these variables’ dynamics and to choose the best regression strategy for this particular dataset, and I am currently working on it.
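One caution when choosing a degree by goodness of fit: on the data used for fitting, the error can only stay the same or fall as the degree rises, because every lower-degree polynomial is a special case of a higher-degree one. The NumPy sketch below (hypothetical helper name, synthetic data whose true relationship is linear) makes that visible; comparing degrees on held-out data, as in the cross-validation entries above, is the usual check against overfitting. It uses `numpy.polynomial.Polynomial.fit`, which rescales x internally, because raw high-degree fits can be badly conditioned when x is far from zero; `p.convert().coef` recovers coefficients in the original x units, lowest degree first.

```python
import numpy as np
from numpy.polynomial import Polynomial


def training_mse_by_degree(x, y, degrees):
    """Return {degree: MSE on the same data the polynomial was fitted to}."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    out = {}
    for d in degrees:
        p = Polynomial.fit(x, y, d)  # least-squares fit with x mapped to [-1, 1] for stability
        out[d] = float(np.mean((y - p(x)) ** 2))
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(2)
    x = rng.uniform(15, 35, 150)  # synthetic predictor, not the course data
    y = 2.0 + 0.3 * x + rng.normal(0, 1, 150)  # the true relationship here is linear
    for d, mse in training_mse_by_degree(x, y, range(1, 9)).items():
        print(f"degree {d}: training MSE = {mse:.4f}")
```

Even though the data are generated from a straight line, the training MSE keeps shrinking up to degree 8, so a lower training error alone does not show that the higher-degree model describes the data better.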
MTH 522 – 09/11/2023