Hi there! 👋

I'm Rahul Singh. Thanks for visiting!

  • A passionate Data Science professional 🚀 with 4+ years of experience in Machine Learning, Big Data, and advanced analytics.
  • Proficient in deploying Deep Learning, AI, and Statistical modeling to extract actionable insights and drive business growth.
  • Skilled in building scalable data pipelines and delivering end-to-end Machine Learning solutions.
  • Expertise in predictive analytics, data visualization, and improving customer satisfaction through data-driven strategies.
  • Adaptable, quick learner, and effective communicator with a proven ability to solve complex business problems.

My Experience

Data Scientist II, Cotiviti

Mar 2025 – Present
  • Processed 200M+ monthly healthcare records (Aetna, Cigna, UHC) with Hadoop & PySpark, ensuring high data quality.
  • Deployed ML models (XGBoost, SVM, Gradient Boosting) for fraud detection & claims prediction, achieving 92% F1-score and reducing financial leakage by 20%.
  • Automated orchestration workflows (JHub, PySpark, Hadoop), cutting pipeline runtime from 3 hours to 35 minutes (-80%).
  • Improved ML-driven OI Primary model using DataRobot & LightGBM, boosting hit rate from 11% → 41% (Accuracy: 91%, F1: 71%, Precision: 81%).
  • Built a centralized repository with QC checklists and reusable scripts, reducing manual intervention.

Data Scientist, Public Consulting Group

Jun 2023 – Feb 2025
  • Designed AI solutions using TensorFlow and LLMs like GPT and BERT for text summarization and sentiment analysis, reducing manual analysis time by 40%.
  • Developed and optimized ETL pipelines with PySpark and AWS Glue, increasing data processing efficiency by 50%.
  • Built centralized BI visualization tools to monitor healthcare programs, reducing team effort by 20%.
  • Engineered ML model deployments using AWS SageMaker and Lambda, reducing deployment time by 25%.

Data Science Research Assistant, Gannon University

Aug 2022 – May 2023
  • Analyzed predictive models with deep learning, achieving 95% accuracy and reducing data redundancy by 20%.
  • Implemented GAN architectures for zero-shot classification and recommendations, improving precision and accuracy on large datasets.

Data Scientist, Make My Clinic Pvt Ltd

Jul 2019 – Jun 2021
  • Led quality assessment of 9M+ clinical records, automating validation to improve data accuracy by 50%.
  • Designed survival analysis models to analyze treatment patterns, boosting study efficiency by 15%.
  • Applied statistical modeling and hypothesis testing for effective A/B testing and model optimization.

Skills

Technical Skills

  • Languages
    → Python
    → R
    → SQL
    → Bash
    → JavaScript

  • Machine Learning and Deep Learning
    → Regression and Tree Models
    → Neural Networks
    → NLP (spaCy, NLTK, Hugging Face)
    → sklearn, pandas, numpy, pyspark
    → Time Series Forecasting
    → Ensemble Methods (Boosting, Bagging)

  • Tools
    → VSCode
    → Jupyter Notebook
    → Colab and Anaconda
    → Docker
    → Git/GitHub

  • BI Tools
    → Power BI
    → Tableau
    → Excel (Advanced)

  • Databases
    → PostgreSQL
    → MySQL
    → MongoDB
    → BigQuery

  • Cloud Platforms
    → Azure (Databricks, ML Studio)
    → GCP (Vertex AI, BigQuery)
    → AWS (Lambda, S3, RDS)

  • MLOps/DevOps
    → Airflow
    → Jenkins
    → MLflow
    → CI/CD Pipelines

  • Data Science Techniques
    → Data Cleaning and Preprocessing
    → Feature Engineering and Selection
    → Data Quality Validation
    → Model Evaluation and Tuning

Personal Skills

  • Strong storytelling skills.

  • Team Player with effective communication skills.

  • Ability to devise parallel solutions to complex issues.

Publications

Automating Patch Set using LLMs

  • This research evaluates Large Language Models (LLMs) like GPT-4 and CodeBERT in automating patch set generation from code review comments, reducing developer context-switching and improving code quality assurance.
  • By comparing LLM-generated code changes against human-created patches, the study demonstrates the potential of LLMs to assist developers, streamlining the code review process while preserving human oversight. Link to the paper

Projects

Thyroid Detection

  • The goal of the project is to create a prediction system that can determine whether a patient has a high or low risk of developing thyroid disease.
  • Serious disorders can develop when the thyroid gland functions above or below normal levels: hyperthyroidism (high hormone levels) or hypothyroidism (low hormone levels).

Real & Fake Face Detection

  • Facial recognition is poised to replace other forms of biometric authentication for identity verification. However, facial recognition systems are vulnerable to manipulation using open-source tools that can alter facial features at the pixel level.
  • This project compares several Convolutional Neural Network (CNN) models (ResNet50, VGG19, Xception) and Local Binary Pattern (LBP) features, combined with classifiers like KNN, to determine the most effective method for detecting fake faces. The study uses the "Real and Fake Face Identification" deepfake dataset from Yonsei University's Computational Intelligence and Photography Lab.
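
As a rough illustration of the LBP-plus-KNN branch of that comparison, here is a minimal pure-Python sketch. It is not the project's code: the helper names (`lbp_histogram`, `knn_predict`, `flat`, `checker`) and the tiny synthetic patches are assumptions made for illustration, and the labels "flat" and "checker" stand in for the real/fake labels a face dataset would provide. It computes the basic 3x3 LBP code for each interior pixel, summarizes a patch as a histogram of codes, and classifies a new patch by majority vote among its nearest training histograms.

```python
from collections import Counter

# Neighbour offsets in clockwise order, starting at the top-left pixel.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_histogram(image):
    """Return a normalized 256-bin histogram of basic 3x3 LBP codes.

    `image` is a list of rows of grayscale values; border pixels are skipped.
    """
    hist = [0] * 256
    rows, cols = len(image), len(image[0])
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            center = image[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(OFFSETS):
                # Set the bit when the neighbour is at least as bright as the center.
                if image[r + dr][c + dc] >= center:
                    code |= 1 << bit
            hist[code] += 1
    total = sum(hist) or 1
    return [count / total for count in hist]


def knn_predict(train, query, k=3):
    """Majority vote among the k training histograms closest to `query`."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    nearest = sorted(train, key=lambda item: sq_dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def flat(size, value):
    """Synthetic patch with a single uniform intensity (toy data)."""
    return [[value] * size for _ in range(size)]


def checker(size, shift=0):
    """Synthetic checkerboard patch (toy data)."""
    return [[255 if (r + c + shift) % 2 == 0 else 0 for c in range(size)]
            for r in range(size)]


# Toy training set: (histogram, label) pairs. Labels are placeholders.
train = [
    (lbp_histogram(flat(6, 10)), "flat"),
    (lbp_histogram(flat(6, 200)), "flat"),
    (lbp_histogram(checker(6)), "checker"),
    (lbp_histogram(checker(6, 1)), "checker"),
]

query = lbp_histogram(checker(8))
prediction = knn_predict(train, query, k=3)
print(prediction)
```

In practice the histograms would come from face images and the classifier from a library such as scikit-learn; the sketch only shows how texture codes become features that a nearest-neighbour classifier can compare.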

Amazon Shipping Analytics

  • Amazon Shipping is a global logistics service that handles the shipping of a wide range of Fast Moving Consumer Goods (FMCG). The Shipping Manager, responsible for overseeing the smooth flow of shipments, previously lacked a clear and detailed overview of the shipping operations on a monthly basis.
  • To address this gap, an interactive Amazon Shipping Analytics Dashboard was created to provide real-time insights into shipping performance. This dashboard allows the Shipping Manager to easily track order volumes, shipping statuses, and destinations across different time periods. It enables quick decision-making based on up-to-date data, thus improving operational efficiency.

Heart Failure Detection

  • Heart Failure occurs when the heart cannot pump enough blood to support the organs in the body [CDC].
  • Using machine learning classifiers, a patient's survival can be predicted based on important clinical features.
  • The analysis covers correlation analysis, K-Means clustering, agglomerative hierarchical clustering, and principal component analysis, followed by heart failure prediction.
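
The correlation-analysis step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the feature names (`ejection_fraction`, `serum_creatinine`, `death_event`) and the six rows of values are made up for the example and are not taken from the project's data. The sketch computes the Pearson correlation of each feature with the outcome and ranks features by absolute correlation, which is one simple way to see which clinical features move with survival.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up toy values; column names are hypothetical.
features = {
    "ejection_fraction": [60, 55, 50, 30, 25, 20],
    "serum_creatinine": [0.9, 1.0, 1.1, 1.8, 2.1, 2.5],
}
death_event = [0, 0, 0, 1, 1, 1]

# Correlation of each feature with the outcome.
correlations = {name: pearson(values, death_event)
                for name, values in features.items()}

# Rank features by the strength (absolute value) of their correlation.
ranked = sorted(correlations.items(), key=lambda kv: abs(kv[1]), reverse=True)
for name, r in ranked:
    print(f"{name}: {r:+.3f}")
```

Correlation only captures linear association with the outcome, which is why steps such as clustering and principal component analysis are useful complements before fitting a classifier.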

Certificates

    Connect With Me On: