

0337/2024 - PADRÕES BIOLÓGICOS, SOCIAIS E COMPORTAMENTAIS E O DESENVOLVIMENTO DE CÂNCER DE PELE NA REGIÃO AMAZÔNICA
BIOLOGICAL, SOCIAL AND BEHAVIORAL PATTERNS AND THE DEVELOPMENT OF SKIN CANCER IN THE AMAZONIAN REGION

Author:

• Thayse de Oliveira Brito - Brito, T.O - <thayseobrito@gmail.com>
ORCID: https://orcid.org/0000-0001-5785-3926

Co-author(s):

• Renata Cardoso Costa - Costa, R.C - <renata.costa@altamira.ufpa.br>
ORCID: https://orcid.org/0000-0003-3583-4622
• Dalberto Lucianelli Junior - Lucianelli Junior, D. - <juniorlucianelli@gmail.com>
ORCID: https://orcid.org/0000-0002-5919-0975
• Maria do Carmo Faria Paes - Paes, M.C.F - <maria.paes@bio5.rwth-aachen.de>
ORCID: https://orcid.org/0000-0002-4689-6928
• João Vitor Barbosa Pinheiro - Pinheiro, J.V.B - <jv.drogo@outlook.com>
ORCID: https://orcid.org/0000-0002-1889-161X
• Ana Laura Guimarães Moura - Moura, A.L.G - <mguimaraesanalaura@gmail.com>
ORCID: https://orcid.org/0000-0002-6261-2825
• Rodrigo Silveira - Silveira, R. - <rodrigo_silveira@usp.br>
ORCID: https://orcid.org/0000-0001-6330-1669
• Fernanda Nogueira Valentin Lucianelli - Lucianelli, F.N.V - <fer.valentin@ufpa.br>
ORCID: https://orcid.org/0000-0002-8279-3758


Abstract:

Introduction: Skin cancer accounts for 33% of tumors in Brazil and is a serious public health problem both in Brazil and worldwide. Objective: To evaluate biological, social, and behavioral patterns in the Amazon region, to analyze the characteristics of these patterns in relation to health indicators, and to determine the most important risk factors. Methodology: An exploratory cross-sectional study was conducted with a sample of the population of the Amazon region (n = 1,014), using a structured questionnaire to obtain information on lifestyle habits and social and biological patterns. The K-means clustering algorithm was used to divide the sample into clusters. Results: Two groups were obtained. Group 1 consists of young, white people with brown hair and a lower prevalence of skin spots, but who expose themselves to the sun between 10 am and 4 pm. Group 2 consists of older, mixed-race (pardo) people who expose themselves to the sun at non-recommended times, but for work-related reasons. Conclusion: It is inferred that the two groups are subject to different risks, despite the low overall use of items that help prevent skin cancer.

Keywords:

Skin cancer, risk factors, ultraviolet rays, prevention.

Content:

INTRODUCTION
Cancer represents a significant public health concern on a global scale, ranking as one of the primary causes of mortality and posing a considerable challenge to enhancing life expectancy 1. The most prevalent form of cancer globally and in Brazil is skin cancer. As indicated by data from the National Cancer Institute (INCA), this particular form of cancer represents 33% of all cancer diagnoses in the country. Annually, approximately 180,000 new cases of skin cancer are reported in Brazil 2,3.
Skin cancers can be classified as non-melanoma and melanoma. The most prevalent form is non-melanoma skin cancer, which has low lethality and encompasses basal cell carcinoma (BCC) and squamous cell carcinoma (SCC). These account for approximately 70% and 25% of cases in the Brazilian population, respectively, and together for 177,000 new cases of the disease annually 4,5.
Melanoma skin cancer (MSC) is the most aggressive and lethal type of skin cancer. Although it is the least common type, with approximately 8,400 new cases reported annually, it has the highest mortality rate. However, when diagnosed at an early stage, its cure rate is high, exceeding 90% 5.
The primary cause of skin cancer is natural ultraviolet (UV) radiation from the sun. UV radiation can penetrate the layers of the skin and reach the basal layer, where epidermal cells multiply, damaging the bases of DNA and RNA and altering their structure and function 6. Regions with a tropical climate and high altitudes experience more intense UV radiation.
For Brazil, approximately 220,000 new cases of non-melanoma skin cancer are estimated for each year of the 2023-2025 period, of which approximately 102,000 are expected in men and 118,000 in women 1.
Non-melanoma skin cancer is the most prevalent form of cancer in Brazil. It is more prevalent among men living in the South, Southeast, and Central-West regions. In women, non-melanoma skin cancer is the most prevalent cancer in all Brazilian regions, with the highest estimated risk in the South (164.79 per 100,000 women), followed by the Southeast (123.33 per 100,000 women) 1.
While the incidence of skin cancer is relatively low in the North Region compared to other regions of Brazil, the combination of high levels of sunlight and a population with specific social, biological, and behavioral characteristics may contribute to an increased risk of developing this neoplasm 8.
Furthermore, a significant proportion of the Amazonian population, comprising a diverse range of ethnic groups, engages in various outdoor activities, including agriculture and fishing, which are known to involve prolonged exposure to the sun. It is crucial to consider the challenges associated with accessing healthcare, the limited availability of specialized services, and the potential for underreporting of cases due to insufficiently trained professionals who are unable to make a proper diagnosis 9.
Although studies have been conducted on the prevalence and incidence of skin cancer in Brazil, there is a lack of information on the population's knowledge and prevention practices in the Amazon region. Comprehensive research is therefore needed to accurately portray the sociodemographic, behavioral, and biological patterns that contribute to the emergence of skin cancer, particularly in this region, where sunlight is intense 10.
Unsupervised algorithms and clustering techniques can therefore be used to explore patterns in the characteristics of this population that may correlate with the risk of skin cancer. In this context, the objective of this study was to assess biological, social, and behavioral patterns in municipalities of the North region, to examine the characteristics of these patterns in relation to health indicators, and to identify the risk factors considered most significant.
METHODOLOGY
Ethical-legal aspects
The research was approved by the Ethics Committee of the Tropical Medicine Center of the Federal University of Pará (NMT), CAAE: 33548620.0.0000.0018. Participants were informed of the objective of the study and the voluntary nature of their participation, and were required to sign the Informed Consent Form, in accordance with National Health Council (CNS) Resolutions No. 466/2012 and No. 510/2016.
Study type and location, population, and sample
This is a cross-sectional quantitative study conducted through the administration of a questionnaire to a representative sample of the population in the Amazon region. The research was conducted in the municipalities of Altamira (PA), Belém (PA), Cametá (PA), Parauapebas (PA), and Porto Velho (RO).
In accordance with data regarding demographic diversity from the Brazilian Institute of Geography and Statistics 11, the sample size was calculated to ensure the inclusion of a representative number of individuals aged 18 and above in the Amazonian region.
Because these are exploratory analyses, an effect size hypothesis cannot be established from the studied population alone. The sample size was therefore estimated with the G*Power software, hypothesizing at least a moderate effect size, given the K-means clustering, for variables with 5 categories, with p < 0.05 and statistical power of 95%. The statistical test requiring the largest sample was the chi-square test comparing age range, ethnicity, and education, which yielded n = 220 for 4 degrees of freedom (5 categories). Adding approximately 10% to account for sample loss due to eligibility criteria gave a total of n = 245.
Given the cultural plurality of the regions studied, and to ensure greater representation of the local population, 50% of the sample size estimated in the main analysis was allocated to each municipality. Thus, a total sample of approximately 613 individuals was estimated (122.5 [half of the n = 245 estimated for the main analysis] × the 5 municipalities studied).
The study population consisted of individuals of both sexes aged 18 years or older who provided informed consent and completed the questionnaire. Individuals with cognitive, learning, or communication impairments, as well as those who withdrew from participation after signing the informed consent form, were excluded from the study. A total of 1,014 individuals consented to participate in the research.
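The sample-size reasoning above can be approximated with a noncentral chi-square power calculation. The sketch below is a minimal pure-Python illustration, assuming that the "moderate" effect corresponds to Cohen's w = 0.3 (the text does not state the value) and a chi-square test with 4 degrees of freedom; the function names are hypothetical and this is not G*Power's code. Because the result depends on the exact effect-size value and test family chosen in G*Power, it need not reproduce the reported n = 220 exactly.

```python
import math

# Assumptions (not stated in the source): Cohen's w = 0.3 as the "moderate" effect,
# and even degrees of freedom only (df = 4 for a 5-category variable).


def chi2_sf_even(x, df):
    """P(X > x) for a central chi-square with even df, using the closed form."""
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


def ncx2_sf_even(x, df, nc, terms=150):
    """P(X > x) for a noncentral chi-square: a Poisson(nc/2) mixture of central tails."""
    lam = nc / 2.0
    weight = math.exp(-lam)
    total = 0.0
    for j in range(terms):
        if j > 0:
            weight *= lam / j
        total += weight * chi2_sf_even(x, df + 2 * j)
    return total


def chi2_critical_even(df, alpha):
    """Upper-alpha critical value, found by bisection on the decreasing tail function."""
    lo, hi = 0.0, 1000.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_sf_even(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def chi2_power(n, w, df, alpha=0.05):
    """Power of the chi-square test when the noncentrality is n * w**2."""
    return ncx2_sf_even(chi2_critical_even(df, alpha), df, n * w * w)


def chi2_sample_size(w, df, alpha=0.05, power=0.95):
    """Smallest n whose power reaches the target."""
    crit = chi2_critical_even(df, alpha)
    n = 2
    while ncx2_sf_even(crit, df, n * w * w) < power:
        n += 1
    return n


n = chi2_sample_size(w=0.3, df=4)
n_with_losses = math.ceil(n * 11 / 10)  # roughly 10% allowance for losses, as in the text
print(n, n_with_losses)
```

Power rises monotonically with n, so a simple upward search is enough here; the integer form `n * 11 / 10` avoids a floating-point rounding artifact when applying the 10% allowance.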
Data collection and instrument
Data were collected through direct engagement with individuals in public settings associated with prolonged sun exposure, such as public squares, beaches, and open-air tourist attractions in the cities listed above. Data collection took place from July 2021 to June 2022.
The questionnaire consisted of closed-ended questions covering (a) biological factors, such as age, ethnicity, sex, eye and hair color, and the presence of spots or sunburns on the body; (b) social factors, such as level of education; and (c) behavioral factors, such as the reason, frequency, and time of sun exposure, the frequency of use of sunscreen, hats, caps, sunglasses, long-sleeved shirts, and umbrellas, and sources of information about skin cancer. The questionnaire also evaluated the respondents' familiarity with the ABCDE method for identifying melanoma, which includes the following criteria: A (asymmetry of the lesion), B (irregular borders), C (color variation), D (diameter > 6 mm), and E (evolution).
Statistical analysis
Given the assumption that the development of skin cancer is multifactorial 10 and can be affected by various factors (e.g., age, education level, place of work, and sun exposure), the sample was divided by clustering using the K-means algorithm. This unsupervised method learns how all the variables in a dataset relate to each other and then divides participants into a predetermined number of groups 12,13.
Based on the hypothesis that biological, social, and behavioral factors influence the occurrence of skin cancer, we propose that individuals may exhibit distinct patterns of characteristics and behaviors. Accordingly, the K-means algorithm was configured to partition the sample into two distinct groups. To guarantee the quality of the clustering, specific strategies were implemented.
In the present study, the K-means algorithm was combined with the K-means++ initialization method 14 to cluster a set of categorical data. Applying this algorithm to categorical data is challenging because dissimilarity measures for categories are more limited than the Euclidean distance used for quantitative variables 15. To mitigate these challenges and increase the algorithm's effectiveness, the following complementary steps were implemented:
1) Normalization of Columns - Column normalization was a fundamental step in the analysis process. Considering that the database contained a total of 25 variables, each with 2 to 5 categories, normalization ensured that all variables contributed equally to the calculation of distances between data points. Without normalization, variables with more categories could dominate the cluster formation, resulting in biased outcomes. With normalization, the influence of each variable was balanced, promoting a fairer and more accurate analysis.
2) Application of K-means++ - K-means is a widely used clustering algorithm, recognized for its simplicity and effectiveness in data segmentation. However, its performance can be significantly affected by the initial choice of centroids. To address this issue, we employed the K-means++ method, which optimizes the initialization of cluster centers, reducing the probability of convergence to local minima and improving the quality of the formed clusters. This technique proved especially useful in the context of categorical data, ensuring a more balanced initial distribution of centroids.
3) Application of the Elbow Method - To verify that the two clusters hypothesized in the study were indeed optimal, the Elbow Method, a visually intuitive and effective technique, was used. This method plots the sum of squared distances within clusters (inertia) against the number of clusters. The point at which the reduction in inertia slows markedly, forming an "elbow" in the curve, is taken as the optimal number of clusters. This approach identifies the point beyond which additional clusters provide only marginal benefit, avoiding over-segmentation of the data.
4) Application of the Multilayer Perceptron Algorithm – To verify the efficacy of the previous steps and the quality of the clustering, the Multilayer Perceptron algorithm was also applied 16,17. The area under the ROC curve (AUC) 18 was used to test how well the data could be separated by the clusters obtained with K-means.
The Multilayer Perceptron is a supervised machine learning algorithm that uses artificial neural networks to identify non-linear patterns among the variables in a dataset and thereby predict a predetermined target variable. Its learning process involves the following steps: 1) initializing the weights; 2) computing the outputs from the input layer to the hidden layer, and then from the hidden layer to the output layer; 3) calculating the prediction error at the output layer and adjusting the weights; and 4) repeating these steps until the error is as low as possible 18.
All quantitative variables were rescaled to the interval between 0 and 1. The sample was randomly divided into two datasets: 70% for training the algorithm and 30% for testing. The minibatch method was selected for training and gradient descent for optimization. Because the Multilayer Perceptron can produce different results on each execution, owing to the random partitioning of the data for validation and the random initialization of the weights, the algorithm was executed three times, and the run with the lowest average cross-entropy error ([training error + test error] / 2) was selected.
To identify the variables that contributed most to the division into clusters, the importance of each variable to the neural network was calculated through a sensitivity analysis based on the combined training and test samples, producing a table ranking the importance of each variable 19. For this sensitivity analysis, the Multilayer Perceptron was reapplied with the same architecture described above, but using only the variables that differed significantly (p < 0.05) between the clusters.
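The training protocol described above (one hidden layer, minibatch gradient descent on cross-entropy, a 70/30 split, three runs with the lowest mean error kept, and AUC on the test set) might look like the NumPy sketch below. The architecture, learning rate, number of epochs, and the synthetic target are assumptions; in the study the target would be the K-means cluster label, and SPSS's own implementation details are not reproduced here.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def cross_entropy(y, p, eps=1e-12):
    """Mean binary cross-entropy."""
    p = np.clip(p, eps, 1 - eps)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))


def roc_auc(y, score):
    """AUC as the probability that a random positive outscores a random negative (ties = 1/2)."""
    pos, neg = score[y == 1], score[y == 0]
    wins = (pos[:, None] > neg[None, :]).sum() + 0.5 * (pos[:, None] == neg[None, :]).sum()
    return float(wins / (len(pos) * len(neg)))


def train_mlp(X, y, rng, hidden=4, lr=0.5, epochs=200, batch=32):
    """One tanh hidden layer and a sigmoid output, trained by minibatch gradient descent."""
    W1 = rng.normal(0.0, 0.5, (X.shape[1], hidden))  # step 1: initialize weights
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, hidden)
    b2 = 0.0
    for _ in range(epochs):  # step 4: repeat over the data
        order = rng.permutation(len(X))
        for start in range(0, len(X), batch):
            idx = order[start:start + batch]
            h = np.tanh(X[idx] @ W1 + b1)  # step 2: input -> hidden
            p = sigmoid(h @ W2 + b2)  # step 2: hidden -> output
            g_out = (p - y[idx]) / len(idx)  # step 3: cross-entropy gradient at the output
            g_hid = np.outer(g_out, W2) * (1.0 - h ** 2)
            W2 -= lr * (h.T @ g_out)  # step 3: adjust weights
            b2 -= lr * g_out.sum()
            W1 -= lr * (X[idx].T @ g_hid)
            b1 -= lr * g_hid.sum(axis=0)
    return lambda Z: sigmoid(np.tanh(Z @ W1 + b1) @ W2 + b2)


def best_of_three_runs(X, y, seed=0):
    """Three random 70/30 splits and initializations; keep the lowest mean cross-entropy."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(3):
        order = rng.permutation(len(X))
        cut = int(0.7 * len(X))
        train, test = order[:cut], order[cut:]
        model = train_mlp(X[train], y[train], rng)
        err = (cross_entropy(y[train], model(X[train]))
               + cross_entropy(y[test], model(X[test]))) / 2
        if best is None or err < best[0]:
            best = (err, roc_auc(y[test], model(X[test])))
    return best


# Hypothetical stand-in for cluster labels: points in [0, 1]^3 labelled by a simple rule.
rng = np.random.default_rng(7)
X = rng.random((300, 3))
y = (X[:, 0] + X[:, 1] > 1.0).astype(float)
err, auc = best_of_three_runs(X, y)
print(round(err, 3), round(auc, 3))
```

An AUC close to 1 on the held-out 30% would indicate that the cluster labels are well separated by the input variables, which is the check the text describes.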
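SPSS computes variable importance with its own sensitivity analysis, which is not reproduced here. As a stand-in, the sketch below uses permutation importance, a different but related technique: each column is shuffled, the resulting increase in cross-entropy is recorded, and the scores are scaled so the most important variable equals 100%. The fixed logistic `predict` function and the data are hypothetical, chosen so that the third variable is ignored by the model.

```python
import numpy as np


def cross_entropy(y, p, eps=1e-12):
    """Mean binary cross-entropy."""
    p = np.clip(p, eps, 1 - eps)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean rise in cross-entropy when each column is shuffled, plus a 0-100% normalized score."""
    rng = np.random.default_rng(seed)
    baseline = cross_entropy(y, predict(X))
    raw = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            shuffled = X.copy()
            shuffled[:, j] = rng.permutation(shuffled[:, j])  # break the link between column j and y
            raw[j] += cross_entropy(y, predict(shuffled)) - baseline
    raw = np.clip(raw / n_repeats, 0.0, None)  # negative values are treated as no importance
    top = raw.max()
    normalized = 100.0 * (raw / top) if top > 0 else np.zeros_like(raw)
    return raw, normalized


# Hypothetical fitted model: relies strongly on column 0, weakly on column 1, not at all on column 2.
def predict(Z):
    return 1.0 / (1.0 + np.exp(-(4.0 * Z[:, 0] + 1.0 * Z[:, 1] - 2.5)))


rng = np.random.default_rng(3)
X = rng.random((400, 3))
y = (rng.random(400) < predict(X)).astype(float)
raw, normalized = permutation_importance(predict, X, y)
ranking = np.argsort(-normalized)  # most to least important
print(ranking, np.round(normalized, 1))
```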
Proportions were used to describe the categorical characteristics of the groups. Between-group comparisons of categorical data used the chi-square test, with Fisher's exact test applied when any cell of the contingency table had fewer than 6 individuals. The significance level was set at p < 0.05.
To identify the most important cells in contingency tables larger than 2 × 2, adjusted residuals > 2 were adopted in all significant categorical analyses. The effect size of the chi-square test was evaluated with the Phi (φ) coefficient and Cramér's V. Phi, equivalent to the correlation coefficient r, was used for 2 × 2 contingency tables, with 0.1 considered a small effect, 0.3 an intermediate effect, and 0.5 a large effect. Cramér's V was applied to contingency tables larger than 2 × 2; because its interpretation depends on the degrees of freedom, a different cutoff is used to classify the effect size for each degree of freedom. Cramér's V is calculated as follows:
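A minimal pure-Python sketch of this group comparison: a chi-square test of independence, switching to Fisher's exact test for 2 × 2 tables in which any cell has fewer than 6 individuals, following the rule as worded in the text (a common alternative rule uses expected counts below 5). The function names are hypothetical, exact tests for tables larger than 2 × 2 are not covered, and in practice a library routine would normally be used instead.

```python
import math


def chi2_sf(x, df):
    """Upper tail of the chi-square distribution via the recurrence in df (steps of 2)."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        q, k = math.exp(-half), 2
    else:
        q, k = math.erfc(math.sqrt(half)), 1
    while k < df:
        # Q(k + 2) = Q(k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)
        q += math.exp((k / 2) * math.log(half) - half - math.lgamma(k / 2 + 1))
        k += 2
    return q


def chi2_independence(table):
    """Pearson chi-square statistic, degrees of freedom, and p-value for an r x k table."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = rows[i] * cols[j] / n
            stat += (obs - expected) ** 2 / expected
    df = (len(rows) - 1) * (len(cols) - 1)
    return stat, df, chi2_sf(stat, df)


def fisher_exact_2x2(table):
    """Two-sided Fisher exact p: sum of hypergeometric tables no more likely than the observed one."""
    (a, b), (c, d) = table
    r1, r2, c1 = a + b, c + d, a + c
    denom = math.comb(r1 + r2, c1)

    def prob(x):
        return math.comb(r1, x) * math.comb(r2, c1 - x) / denom

    p_obs = prob(a)
    support = range(max(0, c1 - r2), min(r1, c1) + 1)
    return min(1.0, sum(prob(x) for x in support if prob(x) <= p_obs * (1 + 1e-7)))


def compare_groups(table, min_cell=6):
    """Chi-square test, or Fisher's exact test for a 2 x 2 table with any cell < min_cell."""
    small = any(v < min_cell for row in table for v in row)
    if small and len(table) == 2 and len(table[0]) == 2:
        return "fisher", fisher_exact_2x2(table)
    return "chi-square", chi2_independence(table)[2]


print(compare_groups([[10, 20], [30, 40]]))
print(compare_groups([[3, 1], [1, 3]]))
```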
$$V = \sqrt{\frac{\chi^2}{n \cdot \min(k-1,\, r-1)}}$$

where χ² is the chi-square value obtained from the test; n is the sample size (total number of observations); k is the number of categories of the column variable (number of columns in the contingency table); and r is the number of categories of the row variable (number of rows in the contingency table) 20.
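The adjusted residuals, Phi, and Cramér's V described above can be computed as in this sketch. The interpretation cutoffs assume Cohen's convention of dividing the small, medium, and large values of w (0.1, 0.3, 0.5) by the square root of min(k - 1, r - 1); the text does not list the cutoffs it used, and the 3 × 2 table is hypothetical.

```python
import math


def expected_counts(table):
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    return rows, cols, n, [[ri * cj / n for cj in cols] for ri in rows]


def chi2_stat(table):
    _, _, _, exp = expected_counts(table)
    return sum((o - e) ** 2 / e for orow, erow in zip(table, exp) for o, e in zip(orow, erow))


def adjusted_residuals(table):
    """(O - E) / sqrt(E * (1 - row share) * (1 - column share)) for every cell."""
    rows, cols, n, exp = expected_counts(table)
    return [[(table[i][j] - exp[i][j])
             / math.sqrt(exp[i][j] * (1 - rows[i] / n) * (1 - cols[j] / n))
             for j in range(len(cols))] for i in range(len(rows))]


def cramers_v(table):
    """V = sqrt(chi2 / (n * min(k - 1, r - 1))); equals |phi| for a 2 x 2 table."""
    r, k = len(table), len(table[0])
    n = sum(map(sum, table))
    return math.sqrt(chi2_stat(table) / (n * min(k - 1, r - 1)))


def effect_label(v, table):
    """Assumed Cohen-style cutoffs: w = 0.1 / 0.3 / 0.5 divided by sqrt(min(k - 1, r - 1))."""
    dof = min(len(table) - 1, len(table[0]) - 1)
    small, medium, large = (w / math.sqrt(dof) for w in (0.1, 0.3, 0.5))
    if v >= large:
        return "large"
    if v >= medium:
        return "medium"
    if v >= small:
        return "small"
    return "negligible"


# Hypothetical 3 x 2 table: age band (rows) by cluster (columns).
table = [[60, 15], [30, 35], [10, 50]]
res = adjusted_residuals(table)
flagged = [(i, j) for i, row in enumerate(res) for j, z in enumerate(row) if z > 2]
v = cramers_v(table)
print(flagged, round(v, 3), effect_label(v, table))
```

For a 2 × 2 table, the absolute adjusted residual of every cell equals the square root of the chi-square statistic, and Cramér's V reduces to the absolute value of Phi.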
In summary, the multilayer perceptron served both to evaluate the division into groups and to rank the importance of each variable in determining group membership. It was run in triplicate and the run with the lowest error was selected; quantitative variables were rescaled to the interval between 0 and 1, and only variables with statistically significant differences (p < 0.05) were used to predict the groups provided by K-means and thus to calculate and rank the importance of each factor.
All group division evaluation procedures were carried out using IBM SPSS Statistics for Windows.
RESULTS
The algorithm classified the sample into two distinct groups, designated Group 1 and Group 2. Regarding the biological and social characteristics presented in Table 1, Group 1 is predominantly composed of young adults aged 18-30 years (58.6%), more frequently white (34.2%) and female (68.9%), with brown hair (52.8%), who exhibit fewer skin blemishes (70.8%) but more sunburns (52.8%). Individuals in Group 1 also have higher levels of education than those in Group 2. Regarding sun exposure, Group 2 is exposed mainly for occupational reasons (93.5%), whereas Group 1 is exposed mainly for leisure (53.3%). Group 1 is exposed to the sun occasionally, whereas Group 2 is exposed frequently because of work. Both the reason for and the frequency of sun exposure show significant differences with large effect sizes, as illustrated in Table 2.
Regarding the time of day of sun exposure, the most notable differences were observed for "at all times," with a high prevalence in Group 2 (45.1%), and for "between 10 am and 4 pm," with a higher prevalence in Group 1 (37.6%) (Table 2). Furthermore, as illustrated in the table, the most frequent response in Group 1 was rarely using sunscreen (49.7%) and a cap or hat (31.9%), whereas in Group 2, 53.0% of respondents never use sunscreen; for protection with a cap or hat, however, the most frequent response in Group 2 was daily use (28.2%). For long-sleeved shirts, notable differences were observed in the "daily" and "never" categories: Group 2 more often reported daily use, whereas Group 1 more often reported never using them, as shown in Table 3. As that table shows, the time of sun exposure and the use of sunscreen, caps/hats, and long-sleeved shirts had medium effect sizes, with p < 0.001.
Table 3 also shows that Group 1 sometimes uses sunglasses and parasols, whereas Group 2 never uses sunglasses or umbrellas. Despite significant differences (p < 0.01 for sunglasses and p < 0.001 for umbrellas), these habits had small effect sizes.


How to Cite

Brito, T.O, Costa, R.C, Lucianelli Junior, D., Paes, M.C.F, Pinheiro, J.V.B, Moura, A.L.G, Silveira, R., Lucianelli, F.N.V. PADRÕES BIOLÓGICOS, SOCIAIS E COMPORTAMENTAIS E O DESENVOLVIMENTO DE CÂNCER DE PELE NA REGIÃO AMAZÔNICA. Cien Saude Colet [internet journal] (2024/Oct). [Cited on 22/01/2025]. Available from: http://cienciaesaudecoletiva.com.br/en/articles/padroes-biologicos-sociais-e-comportamentais-e-o-desenvolvimento-de-cancer-de-pele-na-regiao-amazonica/19385?id=19385&id=19385


