0042/2026 - Can the Net Promoter Score predict the Primary Care Assessment Tool? A transfer learning approach
O Net Promoter Score pode predizer o Primary Care Assessment Tool? Uma abordagem de transfer learning
Author:
• Luiz Alexandre Chisini - Chisini, LA - <alexandrechisini@gmail.com>
ORCID: https://orcid.org/0000-0002-3695-0361
Co-authors:
• Otávio Pereira D’Avila - D’Avila, OP - <otaviopereiradavila@gmail.com>
ORCID: https://orcid.org/0000-0003-1852-7858
• Mauro Cardoso Ribeiro - Ribeiro, MC - <mauro.cardoso1@gmail.com>
ORCID: https://orcid.org/0009-0004-6146-9301
• Erno Harzheim - Harzheim, E - <eharzheim@hcpa.edu.br>
ORCID: https://orcid.org/0000-0002-8919-7916
• Luiz Felipe Pinto - Pinto, LF - <felipepinto.rio@medicina.ufrj.br>
ORCID: https://orcid.org/0000-0002-9888-606X
Abstract:
This study aims to evaluate the viability of the NPS as a parsimonious predictor of the Primary Care Assessment Tool (PCAT). The data used in this study were obtained from a cross-sectional survey conducted in Rio de Janeiro in 2024, including adults and children in separate samples. A total of 968 adults and 985 children participated in the study. Both datasets were divided into training (70%) and test (30%) sets. The main predictor was the NPS. Additional predictors consisted of 44 variables for adults and 39 for pediatric populations. Six machine learning algorithms were assessed. After evaluating all variables, six variables were selected for the adult sample without a decrease in model performance: 1) NPS, 2) age, 3) years of education, 4) the same doctor was seen consistently, 5) the number of times the service was used, and 6) the number of children. Four similar variables were selected for the children sample: 1) NPS, 2) age (respondent), 3) years of education, 4) the same doctor was seen consistently. We observed a superior performance of predictive models in the children population (AUC=0.82) compared to adults (AUC=0.76). The transfer learning approach achieved robust performance in adapting the adult-derived model to children's data. This study demonstrated that a predictive model based on NPS and a minimal set of additional variables can effectively estimate PCAT scores.
Keywords:
primary health care, health evaluation, Rio de Janeiro
Resumo:
Este estudo teve como objetivo avaliar a viabilidade do NPS (Net Promoter Score) como preditor parcimonioso do Primary Care Assessment Tool (PCAT). Os dados utilizados foram obtidos de um estudo transversal realizado no Rio de Janeiro em 2024, incluindo amostras independentes de adultos e crianças. Participaram 968 adultos e 985 crianças. Os conjuntos de dados foram divididos em treinamento (70%) e teste (30%). O principal preditor foi o NPS, complementado por 44 variáveis para adultos e 39 para a população pediátrica. Seis algoritmos de machine learning foram avaliados. Após avaliação, seis variáveis foram selecionadas para adultos sem perda de desempenho do modelo: NPS, idade, anos de escolaridade, atendimento consistente pelo mesmo médico, número de utilizações do serviço, e número de filhos. Para crianças, quatro variáveis similares foram selecionadas: NPS, idade (do respondente), anos de escolaridade, atendimento consistente pelo mesmo médico. Os modelos preditivos apresentaram desempenho superior na população infantil (AUC=0,82) comparada à adulta (AUC=0,76). A abordagem de transfer learning demonstrou robustez ao adaptar o modelo derivado de adultos para dados pediátricos. Este estudo mostrou que um modelo baseado no NPS e em um conjunto mínimo de variáveis adicionais pode estimar eficazmente os escores do PCAT.
Palavras-chave:
atenção primária à saúde, avaliação em saúde, Rio de Janeiro.
Content:
Improving patient experiences in healthcare has become a fundamental goal worldwide, closely linked to the quality and safety of care provided 1. Patient experience encompasses all interactions shaped by an organization's culture that influence perceptions across the continuum of care 2. Although conventional metrics like patient satisfaction and complaint resolution are still important, they have limitations when it comes to providing a complete evaluation of healthcare experiences 3. The need for robust patient experience metrics has led to the development of various assessment tools, including the Hospital Consumer Assessment of Healthcare Providers and Systems 4 and the Australian Hospital Patient Experience Question Set (AHPEQS) 5. Given the complexity of patient experience, healthcare organizations often seek simplified indicators to facilitate performance benchmarking and decision-making 6.
The Net Promoter Score (NPS) has emerged as a widely used single-item measure for assessing patient experience 7. Originally developed for business and service industries, NPS has been increasingly applied in healthcare to evaluate system-level performance 6. The tool categorizes respondents based on their likelihood of recommending a service, distinguishing between "Detractors" (scores 0–6), "Passives" (scores 7–8), and "Promoters" (scores 9–10) 8. Although NPS offers a straightforward and rapid assessment method, its validity and suitability for healthcare settings remain debated, particularly concerning its ability to capture the multifaceted nature of patient experiences 6, 8.
In contrast, the Primary Care Assessment Tool (PCAT) is a validated instrument specifically designed to evaluate the performance of primary healthcare services 9. Developed within a robust framework, PCAT measures key attributes of primary care, including access, continuity, comprehensiveness, and coordination 10, 11. The tool has been widely applied across diverse healthcare systems to assess service quality and patient perceptions 12-15. However, administering the full PCAT can be time-consuming and resource-intensive, limiting its practicality in certain healthcare settings 9. Given the need for efficient yet reliable assessment methods, an important question arises: Could the NPS serve as a valid predictor of PCAT scores?
Despite the extensive application of both NPS and PCAT in healthcare evaluations, there is no research on the predictive relationship between these two tools. The extent to which NPS can serve as a proxy for PCAT remains unclear 6, and no studies have systematically explored the use of machine learning (ML) to enhance this predictive capability. The literature lacks empirical evidence on whether NPS, combined with selected health and sociodemographic variables, can effectively predict PCAT. Addressing this gap is essential to determine whether NPS can provide a rapid and reliable estimation of primary care quality when administering the full PCAT is impractical.
This study aims to evaluate the viability of the Net Promoter Score as a parsimonious predictor of the Primary Care Assessment Tool using machine learning (ML). We addressed three questions: 1) Can NPS alone serve as a valid predictor of PCAT scores? 2) Does augmenting NPS with sociodemographic and health-related variables enhance predictive performance? 3) Can transfer learning optimize the model? We explored domain adaptation strategies to fine-tune adult-derived models for pediatric cohorts, reducing data dependency while maintaining performance in settings with limited pediatric samples.
Methods:
This study follows the TRIPOD+AI guidelines for reporting prediction models that use machine learning approaches 16.
Data:
The data used in this study were obtained from a cross-sectional survey conducted in Rio de Janeiro in 2024, which included adults and children in separate samples 17. The study included independent random samples of service users from ten planning areas across the city. Data were collected through structured interviews administered by trained interviewers. The study utilized the PCAT-Brazil to evaluate the alignment of healthcare services with Primary Healthcare principles, taking into account sociodemographic factors and self-reported health conditions. Ethical approval for the 2024 study was granted by the Ethics Committee of the Federal University of Porto Alegre (Approval No. 77802624.0.0000.5347/2024). Data collection took place between March and April 2024.
Participants:
A total of 968 adults and 985 children participated in the study. The eligibility criteria included adults aged 18 years or older and children aged 12 years or younger, all of whom had attended at least two medical appointments at the same healthcare unit within the previous two years. Participants were recruited immediately after their medical appointments at the healthcare unit on the interview day. Only healthcare units that had been implementing the Family Health strategy for at least six months, as per the guidelines of the Secretaria Municipal de Saúde do Rio de Janeiro (SMS-RJ), were included in the study. Individuals who were unable to complete the questionnaire due to cognitive or physical limitations, or who did not meet the minimum consultation criteria, were excluded from participation. For children, the interview was completed by the caregiver during the consultation.
Sample Size:
A confidence level of 95% and a margin of error of 4% (d) were applied to the estimates, considering that, in addition to the scores generated by the PCATool questionnaire, various frequency-based questions would also be analyzed. In the absence of recent studies, a conservative approach was adopted, assuming p=0.5. Consequently, the estimated sample size was calculated to be 1,034, with a 10% buffer included to account for potential sample attrition, applicable to both the adult and child cohorts.
Outcomes:
The research focused on evaluating the performance of the PCAT-Brazil indicator, applied to both adult and pediatric populations. These metrics, alongside assessments of Primary Healthcare (PHC) service characteristics, were standardized using the tool’s validated methodology and expressed on a normalized 0–10 scale 11. Raw survey responses (originally scored 1–4) were proportionally transformed to align with this standardized range. To interpret results, a quality threshold of 6.6 was established based on protocol guidelines 11, with values exceeding this cutoff classified as “High Score” and those at or below it designated as “Low Score.”
Predictors:
The main predictor was the NPS, which consists of a single question: "On a scale from 0 to 10, where 0 means 'would not recommend at all' and 10 means 'would definitely recommend', how likely are you to recommend this service to a friend or family member?" The overall score is calculated by subtracting the percentage of Detractor respondents from the percentage of Promoters, resulting in a value ranging from -100 (100% Detractors) to +100 (100% Promoters). The additional predictors consisted of 44 variables for adults and 39 variables for pediatric populations, identified through a synthesis of empirical evidence and clinical consensus. A comprehensive variable inventory is available in Supplementary Table S1.
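The categorization and aggregate calculation above can be condensed into a short function (an illustrative sketch only; the function name and sample ratings are ours, not the study's code):

```python
def nps(ratings):
    """Net Promoter Score from individual 0-10 ratings.

    Promoters score 9-10, Detractors 0-6; Passives (7-8) count only in the
    denominator. The result ranges from -100 to +100.
    """
    if not ratings:
        raise ValueError("at least one rating is required")
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    return 100.0 * (promoters - detractors) / len(ratings)

# 5 Promoters, 3 Passives, 2 Detractors -> 100 * (5 - 2) / 10 = 30.0
print(nps([10, 9, 9, 10, 9, 8, 7, 7, 3, 5]))
```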
Data Preparation:
Categorical variables with more than two categories were transformed using one-hot encoding, ensuring a binary representation. Numerical variables were standardized as Z-scores to place them on a common scale and facilitate comparability across variables.
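A dependency-free sketch of both transforms (the study most likely used standard library implementations such as those in scikit-learn; the column names below are hypothetical):

```python
from statistics import mean, pstdev

def one_hot(values):
    """Expand a multi-category column into binary indicator columns."""
    return {c: [1 if v == c else 0 for v in values]
            for c in sorted(set(values))}

def z_score(values):
    """Standardize a numeric column to mean 0 and (population) SD 1."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

# Hypothetical columns standing in for the survey variables
race = ["White", "Brown", "Black", "Brown"]
age = [25, 40, 55, 70]

encoded = one_hot(race)      # three binary columns, one per category
standardized = z_score(age)  # centered and scaled ages
```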
Missing Data:
Missing data imputation was performed using the Multiple Imputation by Chained Equations (MICE) method, implemented in the miceforest library. This approach utilizes LightGBM as the base model, ensuring fast and memory-efficient imputation. miceforest was chosen for its ability to automatically handle categorical variables, integrate seamlessly with scikit-learn pipelines, and allow extensive customization of the imputation process. For this study, the imputation process was executed with 10 iterations and 500 trees per model.
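As a minimal stand-in for the same chained-equations idea, scikit-learn's IterativeImputer regresses each incomplete column on the others and cycles for several iterations; the sketch below uses a hypothetical three-column matrix (the study itself used miceforest with LightGBM, 10 iterations, and 500 trees per model, which this does not reproduce):

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Hypothetical matrix (e.g., age / education / NPS) with missing entries
X = np.array([
    [25.0, 8.0, 9.0],
    [40.0, np.nan, 7.0],
    [55.0, 12.0, np.nan],
    [70.0, 4.0, 5.0],
])

# Chained-equations imputation: each incomplete column is modeled from the
# remaining columns, and the cycle repeats max_iter times (the MICE idea)
imputer = IterativeImputer(max_iter=10, random_state=0)
X_complete = imputer.fit_transform(X)
assert not np.isnan(X_complete).any()
```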
Statistical Methods:
Analyses were conducted using Python version 3.11.7. Initially, we adopted a stratified analytical framework, conducting separate analyses for the adult and child datasets due to differences in predictor variables across the two questionnaires. Each dataset (adult and child) was divided into training (70%) and test (30%) sets. For each population group, we tested multiple predictive models to identify the optimal algorithm for each dataset. Six machine learning algorithms were assessed: Gradient Boosting Classifier, Random Forest Classifier, Extreme Gradient Boosting, CatBoost Classifier, Light Gradient Boosting Machine, and TabPFNClassifier v2. No adjustments were made for class imbalance. Hyperparameter tuning was carried out through 10-fold cross-validation with 50 iterations. Model performance was evaluated based on the area under the ROC curve (AUC), with 95% confidence intervals estimated using 10,000 bootstrap resamples. Additional metrics, including accuracy, recall, precision, and F1-score, were also computed. The DeLong test was used to compare ROC curves.
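The AUC and its percentile-bootstrap confidence interval can be sketched without external libraries (function names and toy data are ours; the study used 10,000 resamples, reduced here for brevity):

```python
import random

def auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auc_ci(y_true, scores, n_boot=2000, alpha=0.05, seed=42):
    """Percentile-bootstrap confidence interval for the AUC."""
    rng = random.Random(seed)
    n, stats = len(y_true), []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        yb = [y_true[i] for i in idx]
        if 0 < sum(yb) < n:  # the resample must contain both classes
            stats.append(auc(yb, [scores[i] for i in idx]))
    stats.sort()
    return (stats[int(n_boot * alpha / 2)],
            stats[int(n_boot * (1 - alpha / 2)) - 1])

# Toy predictions: one negative case outranks one positive case
y = [0, 0, 0, 1, 1, 1]
s = [0.1, 0.7, 0.35, 0.8, 0.65, 0.9]
point, (lo, hi) = auc(y, s), bootstrap_auc_ci(y, s)
print(f"AUC = {point:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```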
Recursive Feature Elimination with Cross-Validation (RFECV) was applied using a Random Forest classifier to perform feature selection. This method iteratively removes the least important features while evaluating model performance through stratified K-fold cross-validation (5 folds) with AUC as the scoring metric. The optimal number of features was determined based on cross-validation performance, and the selected features were then visualized to assess their impact on model performance. This approach reduces dimensionality, improving model efficiency while maintaining predictive power.
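A compact sketch of this selection procedure with scikit-learn's RFECV (the data are simulated and the sizes reduced for speed; the study screened 45 candidate variables):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV
from sklearn.model_selection import StratifiedKFold

# Simulated survey-like data: 20 candidate predictors, few truly informative
X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                           random_state=0)

# Drop the least important feature each round, tracking cross-validated AUC
selector = RFECV(
    estimator=RandomForestClassifier(n_estimators=50, random_state=0),
    step=1,
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    scoring="roc_auc",
    min_features_to_select=1,
)
selector.fit(X, y)
print("optimal number of features:", selector.n_features_)
```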
In parallel, we performed a targeted evaluation using the Net Promoter Score (NPS) as the sole predictor to isolate and quantify its independent predictive capacity. This comparative analysis sought to determine how NPS alone performed against multivariate models in predicting the PCAT. This analysis was only performed with the best model selected previously.
To interpret the best-performing model, Shapley values were derived using the SHapley Additive exPlanations (SHAP) method to determine the contribution of each feature to the predictions 18. Model fairness was examined through stratified analyses by sex and race, using the algorithm with the highest AUC.
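SHAP approximates Shapley values efficiently for tree ensembles; the underlying definition can be computed exactly on a toy example by averaging marginal contributions over all feature orderings (the value function below is hypothetical, not the study's model):

```python
from itertools import permutations

def shapley_values(features, value):
    """Exact Shapley values: average each feature's marginal contribution
    over every order in which features can be added to the model."""
    orders = list(permutations(features))
    phi = {f: 0.0 for f in features}
    for order in orders:
        coalition = set()
        for f in order:
            before = value(frozenset(coalition))
            coalition.add(f)
            phi[f] += value(frozenset(coalition)) - before
    return {f: total / len(orders) for f, total in phi.items()}

# Hypothetical "model score reached by a feature subset" value function:
# NPS carries most of the signal, with a small interaction with continuity
scores = {
    frozenset(): 0.50,
    frozenset({"nps"}): 0.72,
    frozenset({"same_doctor"}): 0.58,
    frozenset({"nps", "same_doctor"}): 0.76,
}
phi = shapley_values(["nps", "same_doctor"], scores.get)
print(phi)  # contributions sum to 0.76 - 0.50 = 0.26
```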
Transfer learning 19, 20 was also employed to adapt a predictive model originally trained on adult data to the children population using the best algorithm. Thus, the pre-trained model, developed on adult datasets, was fine-tuned on pediatric data to leverage shared feature patterns while addressing domain-specific differences. The training process incorporated class balancing to mitigate label imbalance and early stopping to prevent overfitting. Model performance was evaluated using AUC, accuracy, precision, recall, and F1-score. Final models are available at: https://github.com/alexandrechisini/PCAT_NPS.
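The fine-tuning step can be illustrated by continuing a boosted model's training on target-domain data. The study's best algorithm was CatBoost; here scikit-learn's GradientBoostingClassifier with warm_start serves as a stand-in for the same continue-training idea, on simulated "adult" and "child" domains:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score

# Simulated source ("adult") and smaller, shifted target ("child") domains
X_adult, y_adult = make_classification(n_samples=800, n_features=6,
                                       n_informative=4, random_state=0)
X_child, y_child = make_classification(n_samples=200, n_features=6,
                                       n_informative=4, shift=0.3,
                                       random_state=1)

# Pre-train on the source domain, then warm-start 50 extra boosting rounds
# on the target domain so the added trees adapt to pediatric patterns
model = GradientBoostingClassifier(n_estimators=100, random_state=0)
model.fit(X_adult, y_adult)
model.set_params(n_estimators=150, warm_start=True)
model.fit(X_child, y_child)

auc = roc_auc_score(y_child, model.predict_proba(X_child)[:, 1])
print(f"fine-tuned AUC on the target domain: {auc:.2f}")
```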
Results:
A total of 968 adults and 985 children were initially recruited. After excluding incomplete datasets, 889 adults (91.8% of the original group) and 956 children (97.1% of the initial cohort) met the criteria for full PCATool and NPS data inclusion. Demographic analysis revealed that the adult group skewed heavily toward female representation (79.3%), with a predominant self-identification as Brown-skinned (44.3%). Participants averaged 46.9 years of age (standard deviation [SD] = 17.2), and the largest socioeconomic segment was SES C2 (37.1%). Conversely, the pediatric cohort exhibited a balanced gender distribution (50.1% female), with 46.5% identifying as Brown-skinned, and SES B1 constituting the largest socioeconomic category (42.3%). For adults, 51.1% of participants attributed a high PCAT score, and the mean NPS score was 7.9 (SD = 2.3). For children, 71.1% of caregivers attributed a high score, and the mean NPS score was 7.6 (SD = 3.3).
After evaluating all 45 variables using Recursive Feature Elimination, six variables were selected without a decrease in model performance: 1) NPS, 2) age, 3) years of education, 4) the same doctor was seen consistently, 5) the number of times the service was used, and 6) the number of children. Four similar variables were selected for the children sample: 1) NPS, 2) age (respondent), 3) years of education, 4) the same doctor was seen consistently.
The performance of machine learning models in predicting PCATool in adult samples was compared (Table 1 and Supplemental Figure 1). For adults, all models performed similarly except for Extreme Gradient Boosting, which showed worse performance. The TabPFNClassifier v2 achieved a marginally higher test AUC (0.76, 95% CI [0.71–0.82]) compared to the CatBoost Classifier (AUC = 0.76, 95% CI [0.70–0.81]) when using full predictors. However, the CatBoost model demonstrated superior balance across other metrics, with higher recall (0.76 vs. 0.71) and precision (0.67 vs. 0.64). When restricted to an NPS-only predictor, the CatBoost Classifier maintained moderate performance (AUC = 0.72, 95% CI [0.67–0.78]), albeit with reduced precision (0.56).
For children, the CatBoost and Gradient Boosting Classifiers jointly outperformed other models with full predictors, both attaining a test AUC of 0.82 (95% CI [0.77–0.87]) (Table 2). The CatBoost Classifier exhibited high precision (0.82) and recall (0.90), while the Gradient Boosting Classifier achieved the highest F1-score (0.87) driven by exceptional recall (0.92). Using only NPS, the CatBoost Classifier retained robust performance (AUC = 0.78, 95% CI [0.76–0.80]), with minimal decline in precision (0.81) compared to the full-predictor setup.
Algorithmic fairness analysis revealed nuanced performance disparities across demographic subgroups (Supplemental Table 2). For adults, the CatBoost classifier exhibited higher discriminative capacity (AUC = 0.79 vs. 0.75) and recall (0.78 vs. 0.69) in males compared to females, though precision remained comparable (0.68 vs. 0.66). Racial disparities were pronounced, with White individuals achieving the highest F1-score (0.74) and AUC (0.78), while Brown individuals showed reduced precision (0.66) despite moderate recall (0.70). In contrast, for children, Black (AUC = 0.87, F1-score = 0.88) and Brown (AUC = 0.88, F1-score = 0.87) subgroups outperformed White individuals (AUC = 0.77), with Black children demonstrating exceptional precision (0.89). Gender-based differences in the pediatric cohort favored males in AUC (0.84 vs. 0.80) and precision (0.84 vs. 0.71), though both genders retained high recall (>0.89).
Shapley value analysis identified the NPS as the main variable to predict PCAT (Figure 1). In both adult and pediatric cohorts, NPS consistently demonstrated the highest feature importance magnitude, exerting the strongest directional influence on high PCATool prediction.
The transfer learning approach achieved robust performance (Figure 2) in adapting the adult-derived model to children's data, with an AUC of 0.80 (95% CI [0.75–0.86]), accuracy of 0.79, precision of 0.82, recall of 0.92, and an F1-score of 0.86. Notably, these scores closely aligned with the performance of the fully children-trained CatBoost classifier (AUC = 0.82; F1-score = 0.86), indicating that domain adaptation preserved predictive efficacy while reducing computational and data requirements. The model maintained high sensitivity (recall = 0.92), critical for minimizing missed cases in clinical screening, and demonstrated stable precision (0.82), reflecting reliable positive predictions. This parity underscores the utility of transfer learning for leveraging cross-population feature patterns without compromising performance. The narrow AUC confidence interval further reinforces generalizability, consistent with fairness metrics observed in subgroup analyses.
Discussion:
This study demonstrated that it is possible to accurately predict PCAT scores using only the Net Promoter Score (NPS), and that better prediction can be achieved by including a small set of sociodemographic and health-related variables. While the NPS is widely used to assess healthcare experiences, its applicability as an indicator of primary care quality had remained uncertain 6. Our findings show that when combined with selected variables, the NPS can serve as a reliable predictor of primary care performance, particularly in contexts where the use of the PCAT (the gold standard) is infeasible or impractical. Furthermore, we demonstrated that models trained on adult data could be adapted to the children’s population through transfer learning, reinforcing the feasibility of more efficient strategies for evaluating healthcare service quality. These results have important implications for simplifying primary care assessment, particularly in settings where administering lengthy questionnaires may not be practical.
Our findings highlight that the Net Promoter Score (NPS), alone or combined with a small set of sociodemographic and health-related variables, can effectively predict PCAT. However, the model's performance was notably lower when using NPS alone for adults, whereas the performance loss was minimal for children. The superior performance of predictive models in the pediatric population (AUC = 0.82) compared to adults (AUC = 0.76) may be attributed to inherent socio-behavioral and cultural dynamics within the studied groups. For children, healthcare experience assessments were mediated by caregivers (i.e., parents or guardians), who tend to adopt a more critical and systematic approach to evaluating services 21, 22. Furthermore, this advantage may be attributed not only to caregiver mediation but also to lower variability in reported health experiences, as children often use services more preventively and are less influenced by complex clinical histories. From a public health perspective, this suggests that simplified instruments such as the NPS may be particularly useful for monitoring perceived quality in child care, where continuity of care and trust in the provider are central elements. This mediation may yield more consistent responses, less influenced by emotional variability or individual subjectivity, as reflected in the pediatric models’ high precision (0.82) and recall (0.92). Additionally, continuity of care—highlighted as a key predictor in both groups—plays a central role in pediatrics 23, where trust in a familiar physician reinforces perceived quality, particularly in vulnerable contexts (e.g., caregivers’ anxiety to ensure adequate care for dependents) 24. For adults, greater heterogeneity in healthcare experiences—shaped by factors such as complex medical histories, chronic conditions, and socioeconomic disparities—may introduce additional noise into the data.
The success of gradient-boosted models like CatBoost in this study can be attributed to their ability to handle heterogeneous, real-world healthcare data 25. Boosting algorithms iteratively refine decision trees to minimize prediction errors, excelling at capturing non-linear relationships and interactions among variables 26. CatBoost, in particular, mitigates overfitting through ordered boosting and robust handling of categorical variables (e.g., binary features like consulting the same doctor), which are prevalent in survey-based datasets. The TabPFNClassifier v2 27, a transformer-based foundation model pre-trained on synthetic data, showed mixed results despite its promise of rapid inference. While it matched CatBoost’s AUC for adults (0.76), its pediatric performance lagged (AUC = 0.79 vs. 0.82), likely due to discrepancies between its synthetic pre-training datasets and the real-world nuances of caregiver-mediated children’s data. Future work could explore hybrid approaches, leveraging TabPFN’s embeddings alongside CatBoost’s domain-tuned robustness, to bridge this gap without sacrificing equity or interpretability.
This result aligns with efforts to develop more efficient methods for assessing primary care quality while raising important considerations regarding fairness. Given the potential racial, social, and gender inequalities in health 28-30, healthcare access 31, and healthcare use 32 and experiences across different populations, it is crucial to discuss that our model presents nuanced performance disparities across demographic subgroups 33. Implementing equity-focused mitigation strategies, such as fairness-aware model training 34 and subgroup-specific threshold calibration 35, could reduce performance disparities while maintaining predictive accuracy. The observed disparities in model performance across racial and gender subgroups reflect structural inequalities well-documented in the literature on healthcare access and quality. The fact that the model performed better among Black and Brown children may indicate that the NPS captures the experience of these groups more accurately, possibly due to differentiated expectations of public services. It is important to recognize, however, that the NPS is derived from a single recommendation question, constituting a necessarily more superficial and unidimensional assessment compared to the multifaceted and detailed approach of the PCATool. This synthetic nature may represent an intrinsic limitation for deeper equity investigations, as it compresses the heterogeneous and complex experience of service use into a single indicator, potentially erasing crucial nuances and contextual determinants essential for understanding inequalities. However, this difference may also signal the need for equity-aware model adjustments, especially in contexts such as the Brazilian Unified Health System (SUS), where reducing inequities is a guiding principle.
This study has some limitations that should be considered. First, the sample may not be fully representative of all healthcare settings, which could limit the generalizability of our findings. Since our model was trained and tested exclusively on data from healthcare units in Rio de Janeiro, its performance in different regions or healthcare systems remains uncertain. Differences in population demographics, healthcare delivery structures, and patient experiences could impact its predictive accuracy when applied to new settings. Future research should focus on validating the model with external datasets from diverse geographic and healthcare contexts to assess its robustness and ensure its applicability beyond the studied population. Second, our data were collected from individuals who had just used the healthcare service, meaning the model may not perform as well when applied to more general populations outside healthcare facilities 19. Third, we did not perform external validation; such validation would help identify potential biases and allow for model adjustments to improve fairness and performance across different patient groups. Finally, data collection immediately after service use may have introduced a recency bias, in which the immediate experience—positive or negative—disproportionately influenced responses. This limits the generalizability of our findings to the broader user population, including those with lower service utilization or more time elapsed since their last visit. Future studies could include representative samples of the enrolled community, capturing accumulated experiences over time.
For real-world implementation, careful consideration must be given to how missing or poor-quality input data are handled. Automated imputation techniques or flagging missing values for review may help maintain reliability in predictions. The model’s usability also depends on whether end-users, such as healthcare professionals or administrators, need to interact with data preprocessing. Ideally, the model should be designed for minimal manual input, ensuring accessibility even for users without advanced technical expertise. Furthermore, our approach demonstrated strong performance with transfer learning, highlighting its potential as an effective solution. This transfer learning capability could allow for fine-tuning the model to different populations, adapting it to specific contexts, and improving its predictive accuracy. To facilitate broader use, we have made the model available on GitHub, enabling other researchers to perform this fine-tuning on their datasets, which could further enhance its performance and adaptability across diverse healthcare settings.
Future research should focus on validating the model in broader and more diverse healthcare settings, ensuring its applicability beyond the populations studied here. Additionally, exploring the integration of this predictive approach into existing healthcare information systems could enhance its usability. Further work is also needed to assess the model’s impact on decision-making and whether its implementation improves primary care evaluation processes in practice.
Final considerations:
This study demonstrated that a predictive model based on the Net Promoter Score (NPS) and a minimal set of additional variables can effectively estimate PCAT scores. By comparing models with different sets of predictors, we showed that even a simplified approach can provide reliable estimates, reinforcing the potential of streamlined data collection methods in primary care evaluation.
Additionally, our findings highlight the benefits of transfer learning in this context, suggesting that knowledge gained from one predictive task can be effectively leveraged to improve performance in related applications. This approach not only enhances model efficiency but also facilitates adaptation across different healthcare settings.
Despite some limitations, our results provide a foundation for future research aimed at refining predictive models for primary care assessment. Further validation in diverse populations and integration into existing healthcare systems will be essential to maximize the model’s practical utility while ensuring fairness and generalizability.
Author contributions:
Luiz Alexandre Chisini: conceptualization, methodology, formal analysis, data curation, writing – original draft, visualization. Otávio Pereira D'Avila: conceptualization, methodology, writing – review & editing, supervision. Mauro Cardoso Ribeiro: methodology, writing – review. Luiz Felipe Pinto: conceptualization, resources, writing – review & editing. Erno Harzheim: conceptualization, supervision, resources, writing – review & editing.
References
1. Marzban S, Najafi M, Agolli A, Ashrafi E. Impact of Patient Engagement on Healthcare Quality: A Scoping Review. J Patient Exp. 2022;9:23743735221125439.doi:10.1177/23743735221125439
2. Oben P. Understanding the Patient Experience: A Conceptual Framework. J Patient Exp. 2020;7(6):906-10.doi:10.1177/2374373520951672
3. Du Y, Gu Y. The development of evaluation scale of the patient satisfaction with telemedicine: a systematic review. BMC Med Inform Decis Mak. 2024;24(1):31.doi:10.1186/s12911-024-02436-z
4. Albaroudi A, Chen J. Consumer Assessment of Healthcare Providers and Systems Among Racial and Ethnic Minority Patients With Alzheimer Disease and Related Dementias. JAMA Netw Open. 2022;5(9):e2233436.doi:10.1001/jamanetworkopen.2022.33436
5. Jones CH, Woods J, Brusco NK, Sullivan N, Morris ME. Implementation of the Australian Hospital Patient Experience Question Set (AHPEQS): a consumer-driven patient survey. Aust Health Rev. 2021;45:562-9.doi:10.1071/AH20265
6. Adams C, Walpola R, Schembri AM, Harrison R. The ultimate question? Evaluating the use of Net Promoter Score in healthcare: A systematic review. Health Expect. 2022;25(5):2328-39.doi:10.1111/hex.13577
7. Lucero KS. Net Promoter Score (NPS): What Does Net Promoter Score Offer in the Evaluation of Continuing Medical Education? J Eur CME. 2022;11(1):2152941.doi:10.1080/21614083.2022.2152941
8. Krol MW, de Boer D, Delnoij DM, Rademakers JJ. The Net Promoter Score--an asset to patient experience surveys? Health Expect. 2015;18(6):3099-109.doi:10.1111/hex.12297
9. Fracolli LA, Gomes MF, Nabao FR, Santos MS, Cappellini VK, de Almeida AC. Primary health care assessment tools: a literature review and metasynthesis. Cien Saude Colet. 2014;19(12):4851-60.doi:10.1590/1413-812320141912.00572014
10. Pinto LF, D'Avila OP, Hauser L, Harzheim E. Innovations in the national household random sampling in Brazilian National Health Survey: results from Starfield and Shi's adult primary care assessment tool (PCAT). Int J Equity Health. 2021;20(1):113.doi:10.1186/s12939-021-01455-w
11. Brasil. Ministério da Saúde. Secretaria de Atenção Primária à Saúde. Departamento de Saúde da Família. Manual do instrumento de avaliação da atenção primária à saúde: PCATool-Brasil – 2020 [recurso eletrônico]. Brasília: Ministério da Saúde; 2020. 237 p.
12. D'Avila OP, Pinto LF, Hauser L, Goncalves MR, Harzheim E. The use of the Primary Care Assessment Tool (PCAT): an integrative review and proposed update. Cien Saude Colet. 2017;22(3):855-65.doi:10.1590/1413-81232017223.03312016
13. Mei J, Liang Y, Shi L, Zhao J, Wang Y, Kuang L. The Development and Validation of a Rapid Assessment Tool of Primary Care in China. Biomed Res Int. 2016;2016:6019603.doi:10.1155/2016/6019603
14. Wang W, Haggerty J. Development of primary care assessment tool-adult version in Tibet: implication for low- and middle-income countries. Prim Health Care Res Dev. 2019;20:e94.doi:10.1017/S1463423619000239
15. El Mouaddib H, Sebbani M, Mansouri A, Adarmouch L, Amine M. Cross-cultural adaptation of the Moroccan Arabic dialect version of the Primary Care Assessment Tool. Gac Sanit. 2023;37:102350.doi:10.1016/j.gaceta.2023.102350
16. Collins GS, Moons KGM, Dhiman P, Riley RD, Beam AL, Van Calster B, et al. TRIPOD+AI statement: updated guidance for reporting clinical prediction models that use regression or machine learning methods. BMJ. 2024;385:e078378.doi:10.1136/bmj-2023-078378
17. D’Avila OP, Chisini LA, Ribeiro MC, Meira-Silva VST, Mathuiy YR, Moura LJN, et al. Evaluation of Primary Health Care in Rio de Janeiro: the experience of patients fifteen years after the Reform. Cien Saude Colet. 2025. Available at: http://cienciaesaudecoletiva.com.br/artigos/avaliacao-da-atencao-primaria-a-saude-no-rio-de-janeiro-experiencia-de-usuarios-apos-quinze-anos-da-reforma/19544
18. Lundberg SM, Lee SI. A unified approach to interpreting model predictions. Adv Neural Inf Process Syst. 2017;30:4766–75
19. Yang J, Soltan AAS, Clifton DA. Machine learning generalizability across healthcare settings: insights from multi-site COVID-19 screening. NPJ Digit Med. 2022;5(1):69.doi:10.1038/s41746-022-00614-9
20. Ebbehoj A, Thunbo MO, Andersen OE, Glindtvad MV, Hulman A. Transfer learning for non-image data in clinical research: A scoping review. PLOS Digit Health. 2022;1(2):e0000014.doi:10.1371/journal.pdig.0000014
21. Dourado BM, Arruda BFT, Salles VB, de Souza Junior SA, Dourado VM, Pinto JP. Evaluation of family caregiver satisfaction with a mental health inpatient service. Trends Psychiatry Psychother. 2018;40(4):300-9.doi:10.1590/2237-6089-2017-0137
22. Wang WF, Chen CM, Jhang KM, Su YY. Evaluating caregivers' service quality perceptions: impact-range performance and asymmetry analyses. BMC Health Serv Res. 2022;22(1):183.doi:10.1186/s12913-022-07594-2
23. Christakis DA, Wright JA, Zimmerman FJ, Bassett AL, Connell FA. Continuity of care is associated with high-quality care by parental report. Pediatrics. 2002;109(4):e54.doi:10.1542/peds.109.4.e54
24. Monteiro AMF, Santos RL, Kimura N, Baptista MAT, Dourado MCN. Coping strategies among caregivers of people with Alzheimer disease: a systematic review. Trends Psychiatry Psychother. 2018;40(3):258-68.doi:10.1590/2237-6089-2017-0065
25. Chen XY, Lu WT, Zhang D, Tan MY, Qin X. Development and validation of a prediction model for ED using machine learning: according to NHANES 2001-2004. Sci Rep. 2024;14(1):27279.doi:10.1038/s41598-024-78797-2
26. Mayr A, Binder H, Gefeller O, Schmid M. The evolution of boosting algorithms. From machine learning to statistical modelling. Methods Inf Med. 2014;53(6):419-27.doi:10.3414/ME13-01-0122
27. Hollmann N, Muller S, Purucker L, Krishnakumar A, Korfer M, Hoo SB, et al. Accurate predictions on small data with a tabular foundation model. Nature. 2025;637(8045):319-26.doi:10.1038/s41586-024-08328-6
28. Chisini LA, Noronha TG, Ramos EC, Dos Santos-Junior RB, Sampaio KH, Faria ESAL, et al. Does the skin color of patients influence the treatment decision-making of dentists? A randomized questionnaire-based study. Clin Oral Investig. 2019;23(3):1023-30.doi:10.1007/s00784-018-2526-7
29. Costa F, Wendt A, Costa C, Chisini LA, Agostini B, Neves R, et al. Racial and regional inequalities of dental pain in adolescents: Brazilian National Survey of School Health (PeNSE), 2009 to 2015. Cad Saude Publica. 2021;37(6):e00108620.doi:10.1590/0102-311X00108620
30. Costa FDS, Costa CDS, Chisini LA, Wendt A, Santos I, Matijasevich A, et al. Socio-economic inequalities in dental pain in children: A birth cohort study. Community Dent Oral Epidemiol. 2022;50(5):360-6.doi:10.1111/cdoe.12660
31. Costa FDS, Possebom Dos Santos L, Chisini LA. Inequalities in the use of dental services by people with and without disabilities in Brazil: a National Health Survey. Clin Oral Investig. 2024;28(10):540.doi:10.1007/s00784-024-05917-7
32. Pires ALC, Costa FDS, D'Avila OP, Carvalho RV, Conde MCM, Correa MB, et al. Contextual inequalities in specialized dental public health care in Brazil. Braz Oral Res. 2024;38:e023.doi:10.1590/1807-3107bor-2024.vol38.0023
33. Chisini LA, Araujo CF, Delpino FM, Figueiredo LM, Filho A, Schuch HS, et al. Dental services use prediction among adults in Southern Brazil: A gender and racial fairness-oriented machine learning approach. J Dent. 2025;161:105929.doi:10.1016/j.jdent.2025.105929
34. Liu M, Ning Y, Ke Y, Shang Y, Chakraborty B, Ong MEH, et al. FAIM: Fairness-aware interpretable modeling for trustworthy machine learning in healthcare. Patterns (N Y). 2024;5(10):101059.doi:10.1016/j.patter.2024.101059
35. Hegarty SE, Linn KA, Zhang H, Teeple S, Albert PS, Parikh RB, et al. Assessing Algorithm Fairness Requires Adjustment for Risk Distribution Differences: Re-Considering the Equal Opportunity Criterion. medRxiv. 2025.doi:10.1101/2025.01.31.25321489