## MTH 522 – 12/02/2023

ARIMA and LSTM time series forecasting models require data to be in chronological sequence. ARIMA works well for datasets with clear historical trends, though it requires stationarity, whereas LSTM is better at capturing more complicated interactions that go beyond linear or seasonal patterns. Support Vector Machines (SVMs) are useful for high-dimensional datasets, provided the inputs are numerical and carefully scaled, and they perform especially well when classes are clearly separated. Neural networks, which likewise require numerical inputs and scaled features, excel at handling large, complicated datasets where traditional methods may fall short, making them appropriate for cases with complex variable relationships.

When forecasting response times and resource requirements, ARIMA and LSTM can model and predict response times from past patterns. SVMs can classify incidents by expected response time or resource requirement, while neural networks excel at complicated prediction tasks involving many influencing factors in the historical data. The choice of model depends on the specific characteristics of the data and the nature of the forecasting task.
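As a rough illustration of the ARIMA side of this, here is a minimal AR(1) forecaster hand-rolled in NumPy. The function name and the response-time series are hypothetical; a real analysis would use a library such as statsmodels, which also handles differencing and moving-average terms.

```python
import numpy as np

def ar1_forecast(series, steps=1):
    """Fit y_t = c + phi * y_{t-1} by least squares and forecast ahead."""
    y = np.asarray(series, dtype=float)
    X = np.column_stack([np.ones(len(y) - 1), y[:-1]])  # columns: [1, y_{t-1}]
    c, phi = np.linalg.lstsq(X, y[1:], rcond=None)[0]
    forecasts = []
    last = y[-1]
    for _ in range(steps):
        last = c + phi * last  # feed each forecast back in for multi-step
        forecasts.append(last)
    return forecasts

# Hypothetical daily response times (minutes), not real data
times = [12.0, 11.5, 12.3, 12.8, 12.1, 12.6, 13.0, 12.7]
preds = ar1_forecast(times, steps=3)
```

This captures only the autoregressive component; the point is that the fitted `phi` encodes how strongly tomorrow's value depends on today's.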

## MTH 522 – 11/29/2023

## MTH 522 – 11/27/2023

I have become engrossed in the complexities of sophisticated statistical analysis, with a particular focus on regression modeling. This method lets us systematically measure the correlations between important factors, adding a quantitative dimension to our previously qualitative insights. The analysis marks a turning point in our attempt to understand the intricate relationships buried in our data: regression modeling has revealed the subtle patterns hidden inside it and turned our theoretical understanding into measurable findings.

## MTH 522 – 11/24/2023

Having learned a lot from our first data investigation, I’m ready to apply more sophisticated statistical techniques, such as regression modeling and hypothesis testing, to go deeper into our research. By revealing complex relationships between different components, this approach seeks to provide a more nuanced understanding of Boston’s economic dynamics. Moving past the basic investigation, we are now using advanced tools to uncover more specific information about the connections between key variables. This next round of study should yield a more thorough understanding of how the many factors interact, ultimately leading to a more nuanced view of Boston’s economic environment.

## MTH 522 – 11/22/2023

Sentiment analysis is difficult because language is inherently ambiguous. A statement’s overall tone, sarcasm, and contextual cues all play a significant role in determining the sentiment it conveys. Moreover, generic sentiment analysis models may not perform well in domain-specific contexts; domain-specific lexicons should be added and the models fine-tuned on relevant data. Negations and modifiers must also be taken into account, given their ability to significantly alter a sentence’s sentiment: an effective sentiment analysis model must handle the impact of words like “not” and modifiers like “very.”
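The negation and modifier point can be sketched with a toy lexicon-based scorer; the lexicon, word sets, and weights below are made up purely for illustration, and real systems learn these effects from data.

```python
# Toy lexicon: word -> sentiment weight (invented for this sketch)
LEXICON = {"good": 1.0, "great": 2.0, "bad": -1.0, "terrible": -2.0}
NEGATIONS = {"not", "never", "no"}
INTENSIFIERS = {"very": 1.5, "extremely": 2.0}

def score(sentence):
    """Sum lexicon weights, flipping on negations and scaling on intensifiers."""
    total, flip, boost = 0.0, 1, 1.0
    for w in sentence.lower().split():
        if w in NEGATIONS:
            flip = -1
        elif w in INTENSIFIERS:
            boost = INTENSIFIERS[w]
        elif w in LEXICON:
            total += flip * boost * LEXICON[w]
            flip, boost = 1, 1.0  # reset after each sentiment-bearing word
    return total

print(score("the service was very good"))  # → 1.5
print(score("the service was not good"))   # → -1.0
```

Even this crude scheme shows why ignoring “not” and “very” produces the same score for opposite sentences.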

## MTH 522 – 11/20/2023

## MTH 522 – 11/17/2023

## MTH 522 – 11/15/2023

## MTH 522 – 11/13/2023

## MTH 522 – 11/30/2023

## MTH 522 – 11/10/2023

Today I worked entirely on my project, performing the clustering and writing the project report.

## MTH 522 – 11/12/2023

## MTH 522 – 11/08/2023

## MTH 522 – 11/06/2023

## MTH 522 – 11/03/2023

## MTH 522 – 11/01/2023

Analysis of variance (ANOVA) is a statistical tool used to evaluate mean differences among groups in a sample. It is particularly useful for determining whether statistically significant differences exist between three or more groups or conditions. By assessing whether the variation between group means exceeds the variation within groups, ANOVA is useful in a variety of research and experimental contexts.
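The between-group versus within-group comparison can be made concrete by computing the one-way ANOVA F statistic by hand; the three groups below are invented, and in practice `scipy.stats.f_oneway` gives the same statistic plus a p-value.

```python
import numpy as np

def one_way_anova_f(*groups):
    """One-way ANOVA F statistic: between-group vs. within-group variance."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    all_data = np.concatenate(groups)
    grand_mean = all_data.mean()
    k = len(groups)    # number of groups
    n = len(all_data)  # total observations
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    # Mean squares: divide each sum of squares by its degrees of freedom
    return (ss_between / (k - 1)) / (ss_within / (n - k))

f = one_way_anova_f([4, 5, 6], [6, 7, 8], [9, 10, 11])  # → 19.0
```

A large F means the group means spread out far more than the scatter inside each group, which is exactly the evidence ANOVA weighs.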

## MTH 522 – 10/30/2023

## MTH 522 – 10/27/2023

## MTH 522 – 10/23/2023

## MTH 522 – 10/20/2023

## MTH 522 – 10/18/2023

## MTH 522 – 10/16/2023

## MTH 522 – 10/13/2023

## MTH 522 – 10/11/2023

The “Fatal Force Database” is a comprehensive project by The Washington Post that painstakingly tracks and documents instances in which American police officers fatally shoot citizens in the line of duty. The database includes important information such as the victim’s race, the circumstances of the shooting, whether the victim was armed, and whether the victim was experiencing a mental health crisis. Data is gathered from a number of sources, including social media, law enforcement websites, independent databases such as Fatal Encounters, and local news articles.

## MTH 522 Project 1 – 10/08/2023

## MTH 522 – 10/06/2023

I worked carefully today on the code for our forthcoming project submission and on writing a thorough report to go with it, making steady progress while staying focused on our project objectives.

## MTH 522 – 10/04/2023

I have started the project, using the available datasets to do data analysis and write reports. I’m currently working on data filtering and exploratory analysis in Spyder, analyzing the data to draw more insights from it so I can proceed accordingly.

## MTH 522 – 10/02/2023

## MTH 522 – 09/29/2023

## MTH 522 – 09/27/2023

I successfully learned and applied 5-fold cross-validation and polynomial regression. Translating this technique into Python was pleasant, albeit not without problems. Adapting the data preprocessing steps, guaranteeing data consistency, and dealing with data anomalies were all key challenges. Furthermore, porting the nonlinear model fitting into Python required a good understanding of Python tools such as scikit-learn and its modeling functions.

Another noteworthy task was reproducing data visualizations, specifically the ListPlot of mean squared error (MSE) data, using Python plotting packages such as Matplotlib or Seaborn. Python’s syntax and customization options for creating similar charts differ from Mathematica’s. Finally, debugging and error handling were critical in verifying that the Python code generated results consistent with the Mathematica code. Despite these difficulties, the translation process was an excellent opportunity to improve my grasp of cross-validation and polynomial regression in a Python programming context.
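A compact sketch of the whole pipeline, using only NumPy rather than scikit-learn; the synthetic data and the range of degrees are illustrative, not the course dataset.

```python
import numpy as np

def cv_mse(x, y, degree, k=5, seed=0):
    """Mean cross-validated MSE of a polynomial fit of the given degree."""
    idx = np.random.default_rng(seed).permutation(len(x))
    folds = np.array_split(idx, k)
    errors = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        coeffs = np.polyfit(x[train], y[train], degree)  # fit on training folds
        pred = np.polyval(coeffs, x[test])               # predict held-out fold
        errors.append(np.mean((y[test] - pred) ** 2))
    return np.mean(errors)

# Synthetic data from a noisy quadratic, for demonstration only
rng = np.random.default_rng(1)
x = np.linspace(0, 3, 60)
y = 1 + 2 * x - 0.5 * x**2 + rng.normal(0, 0.1, x.size)

mse_by_degree = {d: cv_mse(x, y, d) for d in range(1, 6)}
best = min(mse_by_degree, key=mse_by_degree.get)
```

Plotting `mse_by_degree` with Matplotlib then plays the role of Mathematica’s ListPlot: the curve typically dips at the true degree and creeps up as higher degrees overfit.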

## MTH 522 – 09/26/2023

I watched the video “Estimating Prediction Error and Validation Set Approach” and learned how important it is to accurately estimate prediction error in machine learning models. The video emphasized the significance of reserving a validation set to evaluate the model’s performance.

The “K-fold Cross-Validation” video was quite informative. I now understand how K-fold cross-validation gives a more robust estimate of a model’s performance than a single validation set. I applied this strategy in Python to a diabetes dataset, dividing it into ‘K’ subgroups and iteratively using each as a validation set while training on the remaining data. I also found the “Cross-Validation: The Right and Wrong Ways” video quite informative; it stressed the correct and incorrect ways of performing cross-validation, shedding light on typical blunders to avoid.

## MTH 522 – 09/22/2023

I carried out a detailed investigation by first visualizing the non-normal distributions of the post-molt and pre-molt data, producing a histogram for each group independently. To learn more, I plotted both histograms side by side and performed a statistical analysis to determine whether there was a significant difference in the means, calculating the p-value with a t-test.
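The mean comparison can be sketched as below; the pre-molt and post-molt samples here are synthetic stand-ins for the real measurements, and `equal_var=False` selects Welch’s t-test, which is safer when the two groups’ variances differ.

```python
import numpy as np
from scipy import stats

# Hypothetical pre-molt and post-molt shell sizes (not the course data)
rng = np.random.default_rng(2)
pre_molt = rng.normal(130, 10, 200)
post_molt = rng.normal(144, 11, 200)

# Two-sample t-test on the group means (Welch's version)
t_stat, p_value = stats.ttest_ind(post_molt, pre_molt, equal_var=False)
```

A p-value below the chosen significance level (commonly 0.05) indicates the difference in means is unlikely to be due to chance alone.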

## MTH 522 – 09/20/2023

In the last class I studied linear regression modeling and its application to data fitting. This entails understanding how to work with non-normally distributed, skewed variables with high variance and kurtosis. Furthermore, I’ve worked on data analysis with a dataset that includes two measurements: “post-molt,” the size of a crab’s shell after molting, and “pre-molt,” the size of a crab’s shell before molting. I successfully performed linear regression on this dataset, drawing the regression line and evaluating descriptive statistics to better understand the relationships in the data. I have also learned about the significance of the t-test and its use in statistical analysis.
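A minimal version of that regression, with invented measurements standing in for the crab data:

```python
import numpy as np

# Synthetic stand-ins for the crab shell sizes (mm)
pre = np.array([113.6, 118.1, 142.3, 125.1, 98.2, 119.5, 116.2, 105.6])
post = np.array([127.7, 133.2, 154.8, 142.5, 120.0, 134.0, 133.8, 124.4])

slope, intercept = np.polyfit(pre, post, 1)  # least-squares regression line
r = np.corrcoef(pre, post)[0, 1]             # Pearson correlation
print(f"post ~ {slope:.2f} * pre + {intercept:.2f}, R^2 = {r**2:.3f}")
```

Plotting the points with the fitted line, and reporting the slope, intercept, and R², mirrors the descriptive work described above.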

## MTH 522 – 09/18/2023

## MTH 522 – 09/15/2023

During our last class, my comprehension of the significance of p-values in statistical analysis was strengthened by our in-depth discussion of the topic. I also raised some questions about how linear regression calculates distances: given that in geometry distance is normally the perpendicular distance between a point and a line, I was confused as to why the distance is measured parallel to the y-axis. I also asked why polynomials of different degrees weren’t explored instead of using the linear equation “y = mx + c” exclusively to model the data. I’m happy to report that my questions were thoughtfully answered in class, giving me clarity on these crucial topics.

Additionally, I learned a lot from the inquiries made by others.

## MTH 522 – 09/13/2023

When I compare the kurtosis values for diabetes and inactivity to those given in the course materials and discussed in class, I see a disparity. I computed these statistics with the kurtosis() function from the scipy library. The observed differences have prompted questions about the distribution of the underlying data and the potential causes of the variation. I’m looking at possible explanations for the discrepancies and trying to reconcile the calculated kurtosis values with the anticipated trends. For our data analysis to be accurate and reliable, this inquiry is crucial.
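One common source of exactly this kind of mismatch is worth checking: scipy’s `kurtosis()` returns excess kurtosis by default (Fisher’s definition, where a normal distribution scores 0), while many textbooks report Pearson’s definition (where a normal distribution scores 3). A quick check with made-up data:

```python
import numpy as np
from scipy.stats import kurtosis

rng = np.random.default_rng(3)
x = rng.normal(size=1000)  # arbitrary sample, just to compare definitions

fisher = kurtosis(x)                 # default: excess kurtosis (normal ≈ 0)
pearson = kurtosis(x, fisher=False)  # Pearson's definition (normal ≈ 3)
print(pearson - fisher)              # → 3.0, the definitions differ by exactly 3
```

If the course materials use Pearson’s definition, adding 3 to scipy’s default output (or passing `fisher=False`) may resolve the disparity.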

Along with the kurtosis research, I have also experimented with modeling the link between diabetes and inactivity using regression approaches. Although linear regression is frequently used for this purpose, I tried polynomial regression to identify more complex data patterns. According to my analysis of different polynomial degrees, the best fit for our dataset is a degree-8 fit whose two leading coefficients are essentially zero, making it effectively a polynomial of degree 6: y = -0.00x^8 + 0.00x^7 - 0.14x^6 + 3.88x^5 - 67.96x^4 + 753.86x^3 - 5171.95x^2 + 20053.68x - 33621.52. This conclusion raises an important question: why do we frequently use linear regression when polynomial regression seems to provide a more realistic depiction of the complexity of the data? To better understand the dynamics of these variables and to choose the best regression strategy for this particular dataset, more research is required, and I am currently on it.

## MTH 522 – 09/11/2023
