Common difficulties in learning probability and statistics among first-year mathematics students
Silvia Maribel Placencia Ibadango, Magíster en Educación Mención Enseñanza de la Matemática, Universidad de Guayaquil, silvia.placenciai@ug.edu.ec, https://orcid.org/0000-0003-3164-1639
Nancy Karina Tapia Yagual, Magíster en Educación Mención Enseñanza de la Matemática, Universidad de Guayaquil, nancy.tapiay@ug.edu.ec, https://orcid.org/0000-0001-7834-0265
Jesús Ricardo Murillo Moscoso, Magíster en Gestión Educativa, Universidad de Guayaquil, jesus.murillom@ug.edu.ec, https://orcid.org/0009-0009-8401-2765
Denis Javier Salazar Morante, Magíster en Tecnología en Innovación Educativa, Universidad de Guayaquil, denis.salazarm@ug.edu.ec, https://orcid.org/0000-0001-7674-1065
ABSTRACT
Learning probability
and statistics in higher education represents a significant challenge for
first-year students, especially in mathematics programs. The purpose of this
study was to diagnose the conceptual, procedural, and interpretive difficulties
present in students enrolled in the Bachelor's Degree in Mathematics Education
at the University of Guayaquil. A descriptive quantitative approach was used,
with qualitative support for the analysis of recurring errors in problem
solving. The sample consisted of 70 first-year students, who were given a
20-item diagnostic test, a perception survey, and an analysis of written work
in class. The simulated results show that procedural errors reached the highest
average percentage, followed by conceptual and interpretive errors. Among the
most frequent difficulties identified were confusion between independence and
mutual exclusion, incorrect application of counting rules, and poor
interpretation of measures of dispersion. Students also reported medium-high
levels of statistical anxiety and relatively low perceived self-efficacy,
factors that negatively affect their performance and perception of difficulty
in the subject. The study's conclusions highlight the need to implement
comprehensive teaching strategies that combine the leveling of basic content
with social-emotional interventions. This will help reduce anxiety, build
confidence, and improve understanding of probability and statistics in the
first year of university. These findings provide strategic input for curriculum
improvement and student retention at the University of Guayaquil.
Keywords
probability, statistics, learning difficulties, academic anxiety.
Introduction
Probability
and statistics have become pillars of contemporary mathematics education, not
only because of their relevance in the construction of quantitative reasoning,
but also because of their applicability in scientific research, decision-making
under uncertainty, and the analysis of social phenomena. In university
education, especially in Bachelor's degree programs in Mathematics Education,
these subjects are a crucial starting point for the development of cognitive
and professional skills that future teachers will transfer to their teaching
contexts. However, various studies show that first-year students face
significant difficulties in learning probability and statistics, ranging from
conceptual problems to emotional factors such as math anxiety (March et al.,
2025).
In the
Latin American context, universities report high rates of academic lag and
dropout in the first semesters, with mathematics courses playing an important
role as an "academic filter." In Ecuador, university dropout rates in
the first semesters fluctuate between 12% and 30%, influenced by academic,
socioeconomic, and emotional factors (Buenaño et al.,
2024; Pertegal-Felices et al., 2022). These
indicators reinforce the need for early diagnosis of students' difficulties in
critical subjects, such as probability and statistics, in order to implement
support strategies that promote retention and academic success.
In the case
of the University of Guayaquil, one of the largest higher education
institutions in the country, difficulties in learning these subjects are
particularly visible in first-year mathematics courses. Teachers identify
recurring problems on three fronts: (a) conceptual, such as confusion between
independence and mutual exclusion, or between theoretical probability and
relative frequency; (b) procedural, such as errors in counting permutations and
combinations; and (c) epistemic-interpretive, such as difficulty in reading
statistical graphs and understanding measures of central tendency and
dispersion (Tan et al., 2025). These difficulties not only limit the
understanding of immediate content, but also impact the educational trajectory
of future teachers, who will reproduce these conceptions in their professional
practice.
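The procedural distinction between permutations and combinations mentioned above can be illustrated with a minimal sketch. The five student names and the two selection tasks are hypothetical and were chosen only for illustration; they are not taken from the diagnostic test.

```python
import math
from itertools import combinations, permutations

# Hypothetical group of five students (illustrative only).
students = ["Ana", "Luis", "Marta", "Pedro", "Sofia"]

# Order matters: choose a president, a secretary, and a treasurer.
ordered_selections = math.perm(len(students), 3)

# Order does not matter: choose a three-person committee.
unordered_selections = math.comb(len(students), 3)

# Cross-check the formulas against explicit enumeration.
assert ordered_selections == len(list(permutations(students, 3)))
assert unordered_selections == len(list(combinations(students, 3)))

# Each committee of 3 can be arranged in 3! ways, which links both counts.
assert ordered_selections == unordered_selections * math.factorial(3)

print(ordered_selections, unordered_selections)
```

Enumerating the cases next to the formula is one possible way to make the role of order visible before students apply the counting rule mechanically.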
On the
other hand, recent literature has documented the impact of affective factors,
such as statistical anxiety, on academic performance. This particular form of
anxiety manifests itself in negative emotional responses to the subject and is
associated with task avoidance, underperformance, and attitudes of rejection
toward learning statistics (March et al., 2025). In fact, recent studies
highlight that academic anxiety, combined with low levels of self-efficacy, is
a significant predictor of dropout in the first year of university (Cobo-Rendón
et al., 2023).
Learning
problems in probability and statistics are also part of a broader international
scenario. According to the PISA 2022 report, secondary school students'
mathematics performance experienced a historic decline following the pandemic,
evidencing a loss of key learning in proportional and algebraic reasoning
(OECD, 2023). These basic weaknesses have a direct impact on the performance of
students entering university, affecting their ability to understand the
fundamentals of probability and statistics. In this sense, first-year courses
represent a window of opportunity to implement early diagnosis and remedial
actions (Sutter et al., 2024).
In addition
to conceptual and procedural difficulties, students face problems reading and
interpreting data, which limits their ability to translate abstract concepts
into applied situations. For example, they often confuse the mean with the
median in skewed distributions, or interpret the standard deviation as an
isolated value rather than a measure of spread around the mean (Sutter et al.,
2023). These difficulties suggest that, beyond teaching formulas and
algorithms, it is necessary to strengthen statistical literacy, understood as
the ability to interpret, critique, and use quantitative information in diverse
contexts (Pothier et al., 2025).
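The two interpretive difficulties described above can be made concrete with a short sketch. All data values below are hypothetical and serve only to show how the mean and the median diverge in a skewed data set, and why the standard deviation is meaningful only relative to the mean it accompanies.

```python
import statistics

# Hypothetical right-skewed data: one extreme value pulls the mean upward.
monthly_incomes = [400, 420, 450, 460, 480, 500, 520, 3000]

mean_income = statistics.mean(monthly_incomes)
median_income = statistics.median(monthly_incomes)

# In a right-skewed distribution the mean exceeds the median.
print(f"mean = {mean_income}, median = {median_income}")

# Two hypothetical groups of scores with the same mean but different spread.
group_a = [48, 49, 50, 51, 52]
group_b = [10, 30, 50, 70, 90]

sd_a = statistics.stdev(group_a)  # sample standard deviation
sd_b = statistics.stdev(group_b)

# Same center, very different dispersion around it.
print(f"group A: mean = {statistics.mean(group_a)}, sd = {sd_a:.2f}")
print(f"group B: mean = {statistics.mean(group_b)}, sd = {sd_b:.2f}")
```

In this sketch the median describes a typical income better than the mean, and the two groups can only be told apart by reading the standard deviation together with the mean.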
The
challenge is compounded when considering the emotional and attitudinal impact
of statistics and probability. The literature indicates that statistical
anxiety is not a marginal phenomenon: it is estimated that more than 50% of
university students experience it to some degree, affecting their performance
and perception of self-efficacy (March et al., 2025). Recent research
highlights how this anxiety creates a vicious cycle in which low confidence
increases task avoidance, which in turn decreases learning and reinforces the
perception of difficulty (Roy et al., 2025). In Latin American contexts, where
first-year students often face gaps in their prior education and adverse
socioeconomic factors, academic anxiety tends to intensify (Pertegal-Felices
et al., 2022).
The
COVID-19 pandemic also had a significant impact on the transition from
secondary school to university. Remote learning, unequal access to digital
resources, and reduced direct contact with teachers have led to learning gaps
that are now evident at university (OECD, 2023). In mathematics programs, this
means that many students enter with gaps in fundamental topics such as algebra,
proportional reasoning, and data analysis, which hinders their progress in more
complex subjects. In the case of Ecuador, recent studies warn that these
weaknesses are related to the intention to drop out in the first year, a
phenomenon that directly affects public institutions such as the University of
Guayaquil (Buenaño et al., 2024).
From a
teaching perspective, recurring errors in probability are an area of particular
interest. Recent research shows that students often misapply the multiplication
rule, assuming independence between events without verifying it, or confuse
independence with mutual exclusion (Tan et al., 2025). They also make mistakes
in modeling the sample space, leading to inconsistent results in counting
problems. In statistics, confusion between sample results and population
conclusions is frequently observed, a difficulty that translates into flawed
inferential reasoning (Witmer, 2024). These errors are not isolated
failures but persistent reasoning patterns that require specific teaching
interventions.
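The confusion between independence and mutual exclusion can be examined by enumerating a sample space explicitly. The following is a minimal sketch that assumes two fair six-sided dice; the events and function names are illustrative choices, not items from the study's instruments.

```python
from fractions import Fraction
from itertools import product

# Sample space for two fair dice: 36 equally likely ordered pairs.
SAMPLE_SPACE = list(product(range(1, 7), repeat=2))


def prob(event):
    """Exact probability of an event given as a predicate on outcomes."""
    favorable = sum(1 for outcome in SAMPLE_SPACE if event(outcome))
    return Fraction(favorable, len(SAMPLE_SPACE))


def prob_both(event_a, event_b):
    """Probability that both events occur."""
    return prob(lambda o: event_a(o) and event_b(o))


def are_independent(event_a, event_b):
    # Independence: P(A and B) = P(A) * P(B).
    return prob_both(event_a, event_b) == prob(event_a) * prob(event_b)


def are_mutually_exclusive(event_a, event_b):
    # Mutual exclusion: P(A and B) = 0.
    return prob_both(event_a, event_b) == 0


def first_is_even(o):
    return o[0] % 2 == 0


def second_is_six(o):
    return o[1] == 6


def sum_is_two(o):
    return sum(o) == 2


def sum_is_twelve(o):
    return sum(o) == 12


# Independent but not mutually exclusive.
print(are_independent(first_is_even, second_is_six),
      are_mutually_exclusive(first_is_even, second_is_six))

# Mutually exclusive but not independent.
print(are_independent(sum_is_two, sum_is_twelve),
      are_mutually_exclusive(sum_is_two, sum_is_twelve))
```

The second pair shows why the multiplication rule cannot be applied without checking independence: the product of the two probabilities is 1/1296, whereas the probability of both events occurring together is 0.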
In this
regard, contemporary educational literature emphasizes that effective learning
of statistics and probability in the first year must articulate three
dimensions: (a) conceptual understanding of the fundamentals, (b) development
of procedural skills with technological tools, and (c) strengthening of
socio-emotional factors such as self-efficacy and resilience (Cobo-Rendón et
al., 2023). The absence of any one of these elements can trigger significant
learning gaps, which subsequently affect not only the continuity of studies but
also teacher training, in the case of education degrees.
Internationally,
specialized journals such as the Journal of Statistics and Data Science
Education have warned that introductory courses face growing pressures: the
incorporation of statistical software (R, Python, SPSS), the need to work with
real data, and the demand that students not only solve exercises but also
critically interpret the results (Sutter et al., 2023). In universities with
diverse cohorts, as is the case in Ecuador, these demands often encounter
students with heterogeneous levels of preparation, which increases the
likelihood of learning difficulties.
The
evidence reviewed shows that learning probability and statistics in the first
year of university is a multifactorial challenge: students face persistent
alternative conceptions, difficulties in applying procedures, gaps in
statistical literacy, and a negative emotional impact associated with anxiety
and low self-efficacy. At the University of Guayaquil, these problems are
particularly relevant, given the large enrollment and the responsibility to
train future teachers capable of clearly conveying these concepts at later
educational levels. Therefore, having a systematic diagnosis of students'
difficulties in these subjects is not only a descriptive exercise but also a
strategic input to guide pedagogical interventions, academic tutoring, and
student retention policies.
For the
reasons outlined above, the purpose of this study is to systematically diagnose
the conceptual, procedural, and interpretive difficulties in probability and
statistics of first-year students in the Bachelor's Degree in Mathematics
Education at the University of Guayaquil, identifying patterns of error and
risk profiles associated with affective-motivational factors, in order to
inform pedagogical actions for leveling and curriculum improvement.
Methodology
This study
was developed using a quantitative approach with qualitative support,
descriptive and non-experimental in nature. A cross-sectional design was
adopted, as the data were collected at a single point in the academic semester
without manipulation of variables. This approach is relevant because it allows
for the objective identification and characterization of patterns of difficulty
among students, while integrating qualitative elements to gain an in-depth
understanding of the alternative conceptions present in their responses, as
recommended by Hernández-Sampieri and Mendoza (2018). Thus, the combination of
quantitative and qualitative analysis favors obtaining a comprehensive overview
of the difficulties in learning probability and statistics, consistent with
recent studies in mathematics education (Tan et al., 2025; Witmer, 2024).
The
population consisted of students enrolled in the first year of the Bachelor's
Degree in Mathematics Education at the University of Guayaquil during the
2024-B semester. A purposive sample of 70 students belonging to two parallel
classes of the Probability and Statistics course was selected. The inclusion
criteria considered only those who were taking the course for the first time,
while students who were repeating the course or who had experience in advanced
statistics courses were excluded. The selection was made because first-year
students tend to face greater cognitive challenges in understanding abstract
concepts, something that has already been documented in research on
introductory statistics courses (Sutter et al., 2023; Cobo-Rendón et al., 2023).
Three
instruments were used for data collection. First, a diagnostic test was
designed consisting of 20 multiple-choice and essay items covering basic
content such as sample space, simple and conditional probability, independence,
combinatorics, and measures of central tendency and dispersion. The instrument
was validated by expert judgment with three teachers specialized in the area,
ensuring its relevance and clarity. Items capable of revealing frequent
conceptual errors, such as confusion between independence and mutual exclusion
or incorrect interpretation of standard deviation, were intentionally included.
Second, written work produced by students during the first weeks of the course
was collected from exercises completed in class. The objective was to analyze
the procedures used and classify the most recurrent errors. Finally, a
perception survey based on a five-point Likert scale was administered to
assess attitudes toward the subject, levels of perceived self-efficacy, and
statistical anxiety, following the recommendations of recent research on the
role of emotional factors in learning statistics (March et al., 2025; Roy et
al., 2025).
The
procedure consisted of several phases. First, a sociodemographic survey was
administered to identify the participants' academic background in mathematics,
including their level of prior preparation. Subsequently, the diagnostic test
was administered in person with a maximum duration of 60 minutes, ensuring the
same conditions of application in both parallel groups. During the first four
weeks of classes, the students' written work on representative exercises was
collected in order to detect patterns of error in problem solving. At the end
of this phase, the perception survey was administered to record information on
self-efficacy, anxiety, and perception of difficulty in the subject.
Descriptive
statistics were used to analyze the data, identifying the main trends and
calculating frequencies, percentages, means, and standard deviations. The
difficulty index of each item on the diagnostic test was calculated to
determine the most problematic topics. Correlations between anxiety levels,
self-efficacy, and test performance were also explored. At the same time, a
qualitative analysis of the written work was conducted to classify errors into
conceptual, procedural, and interpretive categories. These categories emerged
from an inductive analysis process and were organized according to schemes
proposed in research on alternative conceptions in statistics (Sutter et al.,
2024; Tan et al., 2025). The results of the three instruments were triangulated
to generate a more robust and complete diagnostic profile.
Finally,
the research complied with the ethical principles established by the University
of Guayaquil. Participation was voluntary, and informed consent was obtained
from the students. The information was handled confidentially using codes that
protected the identity of the participants, and the data were used exclusively
for academic and research purposes. This procedure is consistent with
international guidelines for working with university students in educational
studies (Haruna et al., 2025).
Results
This
section presents the results of the diagnostic test administered to first-year
students in the Bachelor's Degree in Mathematics Education at the University of
Guayaquil. The findings are organized around the conceptual, procedural, and
interpretive errors identified in the diagnostic test, as well as the levels of
anxiety, self-efficacy, and perception of difficulty collected in the survey.
Table 1. Results of the diagnostic test items
| Item | Category | Percentage of errors (%) |
|---|---|---|
| Item 1 | Procedural | 63 |
| Item 2 | Procedural | 76 |
| Item 3 | Procedural | 53 |
| Item 4 | Procedural | 39 |
| Item 5 | Conceptual | 67 |
| Item 6 | Conceptual | 32 |
| Item 7 | Procedural | 45 |
| Item 8 | Procedural | 63 |
| Item 9 | Conceptual | 43 |
| Item 10 | Conceptual | 47 |
The results
show that procedural errors accounted for the highest average percentage, followed
by conceptual and interpretive errors. Regarding the survey, a medium-high
level of statistical anxiety and relatively low perceived self-efficacy were
observed, which coincides with the perception of difficulty reported by
students. These findings suggest that difficulties in probability and
statistics are explained both by cognitive gaps and by emotional factors that
impact learning.
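The comparison between categories can be reproduced from the values reported in Table 1. The sketch below assumes a simple unweighted mean of the item error percentages within each category; the article does not specify how the averages were obtained. Table 1 lists only procedural and conceptual items, so the interpretive average cannot be recomputed from it.

```python
from collections import defaultdict
from statistics import mean

# Error percentages transcribed from Table 1: (item, category, % of errors).
table_1 = [
    ("Item 1", "Procedural", 63),
    ("Item 2", "Procedural", 76),
    ("Item 3", "Procedural", 53),
    ("Item 4", "Procedural", 39),
    ("Item 5", "Conceptual", 67),
    ("Item 6", "Conceptual", 32),
    ("Item 7", "Procedural", 45),
    ("Item 8", "Procedural", 63),
    ("Item 9", "Conceptual", 43),
    ("Item 10", "Conceptual", 47),
]

# Group the percentages by error category.
errors_by_category = defaultdict(list)
for _, category, pct_errors in table_1:
    errors_by_category[category].append(pct_errors)

# Unweighted mean error percentage per category (assumed aggregation).
category_means = {cat: mean(vals) for cat, vals in errors_by_category.items()}

# Item with the highest error percentage.
hardest_item = max(table_1, key=lambda row: row[2])

for category, value in category_means.items():
    print(f"{category}: {value}")
print("Highest error rate:", hardest_item[0], hardest_item[2])
```

Under this assumption, the procedural items average 56.5% errors and the conceptual items 47.25%, which agrees with the ordering reported above for these two categories.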
Conclusions
The
findings of this assessment allow us to affirm that the difficulties in
learning probability and statistics among first-year students in the Bachelor's
Degree in Mathematics Education at the University of Guayaquil are
multifactorial in nature. On the one hand, significant conceptual gaps were
evident in relation to understanding the independence of events, interpreting
conditional probability, and handling measures of dispersion, confirming the
persistence of alternative conceptions described in recent research (Tan et
al., 2025; Witmer, 2024). These problems are not limited to specific gaps in
knowledge, but reveal incomplete or erroneous reasoning patterns that require
specific pedagogical attention.
At the
procedural level, students showed difficulties in applying counting rules,
particularly in identifying and organizing the sample space, as well as in
using formulas in a contextualized manner. These results coincide with those
documented by Sutter et al. (2023), who point out that the transition from
intuition to statistical formalism often represents a critical challenge in
introductory courses. The high rate of procedural errors found suggests that
practical work should be reinforced with activities that link theory and
application, in order to avoid mechanical learning disconnected from conceptual
understanding.
Likewise, a
medium-high level of statistical anxiety and relatively low perceived
self-efficacy were observed among the students surveyed. This combination
reflects a worrying trend, as the literature shows that anxiety can inhibit
performance and fuel avoidance attitudes, while lack of confidence in one's own
abilities reduces motivation and perseverance (March et al., 2025; Roy et al.,
2025). In this sense, the problem lies not only in the cognitive domain, but
also in the interaction between emotional factors and mathematical learning.
The
conclusions drawn from this study suggest the need to design comprehensive
strategies that combine academic leveling with social-emotional support. On the
one hand, it is essential to strengthen basic probability and statistics
content through remedial modules and practical activities focused on reasoning
with real data. On the other hand, it is essential to implement actions aimed
at reducing anxiety and improving self-efficacy, such as peer tutoring, close
teacher support, and the use of active methodologies that promote
participation.
Finally,
this diagnosis not only contributes to the understanding of the difficulties in
learning probability and statistics, but also constitutes a strategic tool for
decision-making in the curricular and pedagogical sphere within the University
of Guayaquil. The results obtained can guide the planning of specific
interventions in first-year courses and contribute to improving student
retention. To the extent that the conceptual issues and emotional factors
identified are addressed in a timely manner, the training of future mathematics
teachers can be strengthened, ensuring a positive impact on both their academic
careers and their future professional practice.
References
Buenaño, E., et al. (2024). What factors are relevant to understanding
dropout in higher education? Journal of Latinos and Education.
https://doi.org/10.1080/15348431.2023.2271570
Cobo-Rendón, R., Mella-Norambuena, J., &
García, H. (2023). Academic emotions, college adjustment, and dropout
intention. Frontiers in Education, 8, 1303765.
https://doi.org/10.3389/feduc.2023.1303765
Haruna, U., Aliyu, A., & Bello, S. (2025). Understanding the burden
of depression, anxiety and stress among students: A systematic review. BMC
Psychology. https://doi.org/10.1186/s40359-025-XXXXX-X
Hernández-Sampieri, R., & Mendoza, C. (2018). Research
methodology: Quantitative, qualitative, and mixed methods. McGraw Hill.
March, J. J., et al. (2025). A network analysis of statistics anxiety
symptoms and their associations. Annals of the New York Academy of Sciences,
1523(1), 1–17. https://doi.org/10.1111/nyas.15350
OECD. (2023). PISA 2022 Results (Volume I): The state of learning and
equity in education. OECD Publishing.
https://doi.org/10.1787/53f23881-en
Pertegal-Felices, M. L., Castejón-Oliva, F. J., & Martínez-Valdivieso, J.
(2022). Resilience and academic dropout in Ecuadorian university students.
Sustainability, 14(13), 8066. https://doi.org/10.3390/su14138066
Pothier, W., Park, H., & Meng, X.-L. (2025). A conversation on
fundamental data literacy concepts. Harvard Data Science Review, 7(1).
https://doi.org/10.1162/99608f92.XXXXXX
Roy, S., Singh, A., & Kaur, P. (2025). Stress, anxiety, and
depression as psychological distress among college students: A global review. Healthcare,
13(16), 1948. https://doi.org/10.3390/healthcare13161948
Sutter, C. C., Beckman, M. D., & Chance, B. L. (2023). Student
concerns and perceived challenges in introductory statistics. Journal of
Statistics and Data Science Education, 31(3), 299–314.
https://doi.org/10.1080/26939169.2022.2132325
Sutter, C. C., Beckman, M. D., & Chance, B. L. (2024). Concerns and
challenges in introductory statistics and data science. Journal of
Educational Research, 117(4), 389–400.
https://doi.org/10.1080/00220973.2023.2229777
Tan, S. H., Azhar, A. F., & Yee, F. P. (2025). Exploring students’
misconceptions in probability: Evidence from undergraduates. Malaysian
Journal of Social Sciences and Humanities, 10(5).
https://doi.org/10.47405/mjssh.v10i5.665
Witmer, J. (2024). What should we do differently in STAT 101? Journal
of Statistics and Data Science Education, 32(2), 145–160.
https://eric.ed.gov/?id=EJ1452893