Rahul's Portfolio Website

Hi there! 👋

I'm Rahul Singh. Thanks for visiting!

A passionate Data Science professional 🚀 with 4+ years of experience in Machine Learning, Big Data, and advanced analytics.
Proficient in deploying Deep Learning, AI, and Statistical modeling to extract actionable insights and drive business growth.
Skilled in building scalable data pipelines and delivering end-to-end Machine Learning solutions.
Expertise in predictive analytics, data visualization, and improving customer satisfaction through data-driven strategies.
Adaptable, quick learner, and effective communicator with a proven ability to solve complex business problems.

My Experience

Data Scientist II, Cotiviti

2025, Feb – Present

Automated ETL workflows using Oracle SQL, HiveQL, MySQL, SAS, R, and Python, cutting SLAs from 10 to 7 days.
Developed Tableau and MicroStrategy dashboards for Cotiviti’s top clients, supporting data-driven decisions.
Built a centralized repository with QC checklists and reusable scripts, reducing manual intervention.
Led team operations in the absence of senior leaders, ensuring smooth project execution.
Deployed Microsoft Fabric solutions, cutting query execution time by 35%.
Developed an ML-driven fraud detection model with Azure ML and LightGBM, achieving 94% precision.

Data Science Engineer, Public Consulting Group

2023, Aug – Present

Designed AI solutions using TensorFlow and LLMs like GPT and BERT for text summarization and sentiment analysis, reducing manual analysis time by 40%.
Developed and optimized ETL pipelines with PySpark and AWS Glue, increasing data processing efficiency by 50%.
Built centralized BI visualization tools to monitor healthcare programs, reducing team effort by 20%.
Engineered ML model deployments using AWS SageMaker and Lambda, reducing deployment time by 25%.

Data Science Research Assistant, Gannon University

2022, Aug – 2023, May

Analyzed predictive models with deep learning, achieving 95% accuracy and reducing data redundancy by 20%.
Implemented GAN architectures for zero-shot classification and recommendations, improving precision and accuracy on large datasets.

Data Scientist, Make My Clinic Pvt Ltd

2019, July – 2021, May

Led quality assessment of 9M+ clinical records, automating validation to improve data accuracy by 50%.
Designed survival analysis models to analyze treatment patterns, boosting study efficiency by 15%.
Applied statistical modeling and hypothesis testing for effective A/B testing and model optimization.

Skills

Technical Skills

Languages
→ Python
→ R
→ SQL
→ Bash
→ JavaScript
Machine Learning and Deep Learning
→ Regression and Tree Models
→ Neural Networks
→ NLP (spaCy, NLTK, Hugging Face)
→ sklearn, pandas, numpy, pyspark
→ Time Series Forecasting
→ Ensemble Methods (Boosting, Bagging)
Tools
→ VSCode
→ Jupyter Notebook
→ Colab and Anaconda
→ Docker
→ Git/GitHub
BI Tools
→ Power BI
→ Tableau
→ Excel (Advanced)
Databases
→ PostgreSQL
→ MySQL
→ MongoDB
→ BigQuery
Cloud Platforms
→ Azure (Databricks, ML Studio)
→ GCP (Vertex AI, BigQuery)
→ AWS (Lambda, S3, RDS)
MLOps/DevOps
→ Airflow
→ Jenkins
→ MLflow
→ CI/CD Pipelines
Data Science Techniques
→ Data Cleaning and Preprocessing
→ Feature Engineering and Selection
→ Data Quality Validation
→ Model Evaluation and Tuning

Personal Skills

Strong storyteller.
Team player with effective communication skills.
Able to devise alternative solutions to complex issues.

Publications

Automating Patch Set using LLMs

This research evaluates Large Language Models (LLMs) like GPT-4 and CodeBERT in automating patch set generation from code review comments, reducing developer context-switching and improving code quality assurance.
By comparing LLM-generated code changes against human-created patches, the study demonstrates the potential of LLMs to assist developers, streamlining the code review process while preserving human oversight. Link to the paper

Common Defects in Web Browsers Using Knowledge Embedding in GPT-4.0

This study investigates the defect-prone components in Firefox and Chromium. By leveraging GPT-4.0 and traditional NLP approaches, it categorizes bugs and highlights areas requiring high bug-fix effort.
These findings enable developers to prioritize testing and maintenance of critical components, improving decision-making for browser development and related software applications. Link to the paper (Google Drive)

Projects

Thyroid Detection

The goal of the project is to create a prediction system that can determine whether a patient has a high or low risk of developing thyroid disease.
Serious disorders can develop when the thyroid gland functions either above or below normal levels: hyperthyroidism (high hormone levels) or hypothyroidism (low hormone levels).

Facial recognition, in particular, is poised to replace other forms of biometric authentication for identity verification. However, facial recognition systems are vulnerable to manipulation using open-source tools that can alter facial features at the pixel level.

This project compares several approaches to detecting fake faces: Convolutional Neural Network (CNN) models (ResNet50, VGG19, and Xception) and Local Binary Pattern (LBP) features combined with classifiers such as KNN. The study uses the "Real and Fake Face Identification" deepfake dataset from Yonsei University's Computational Intelligence and Photography Lab.
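As a rough illustration of the LBP-plus-KNN approach mentioned above, the sketch below computes a basic 8-neighbour LBP histogram for a grayscale grid and classifies it with a simple nearest-neighbour vote. It is not the project's actual code: the function names, the tiny synthetic grids, and the placeholder labels are assumptions made for illustration, and a real pipeline would work on face images from the dataset.

```python
import math
from collections import Counter

# Neighbour offsets around a pixel, clockwise from the top-left corner.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_codes(img):
    """Return the 8-bit LBP code of every interior pixel of a 2D grayscale grid."""
    codes = []
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            center = img[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(OFFSETS):
                # Set the bit when the neighbour is at least as bright as the center.
                if img[r + dr][c + dc] >= center:
                    code |= 1 << bit
            codes.append(code)
    return codes


def lbp_histogram(img):
    """Normalized 256-bin histogram of LBP codes, used as the feature vector."""
    codes = lbp_codes(img)
    hist = [0.0] * 256
    for code in codes:
        hist[code] += 1.0
    return [count / len(codes) for count in hist]


def knn_predict(train, query, k=3):
    """Majority vote among the k training vectors closest to the query (Euclidean)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Synthetic stand-ins for image patches; the labels are placeholders only.
flat_patch = [[5] * 4 for _ in range(4)]
peak_patch = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]

train = [
    (lbp_histogram(flat_patch), "class_a"),
    (lbp_histogram(flat_patch), "class_a"),
    (lbp_histogram(peak_patch), "class_b"),
]
prediction = knn_predict(train, lbp_histogram(flat_patch), k=3)
```

In practice, a library implementation of LBP and KNN would likely replace these hand-written helpers; the sketch only shows how texture histograms feed a distance-based classifier.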

Amazon Shipping Analytics

Amazon Shipping is a global logistics service that handles the shipping of a wide range of Fast Moving Consumer Goods (FMCG). The Shipping Manager, responsible for overseeing the smooth flow of shipments, previously lacked a clear and detailed overview of the shipping operations on a monthly basis.

To address this gap, an interactive Amazon Shipping Analytics Dashboard was created to provide real-time insights into shipping performance. This dashboard allows the Shipping Manager to easily track order volumes, shipping statuses, and destinations across different time periods. It enables quick decision-making based on up-to-date data, thus improving operational efficiency.

Heart Failure Detection

Heart Failure occurs when the heart cannot pump enough blood to support the organs in the body [CDC].

Using machine learning classifiers, a patient's survival can be predicted based on important clinical features.

→ Correlation analysis
→ K-Means clustering
→ Agglomerative hierarchical clustering
→ Principal component analysis
II. Heart failure prediction
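The correlation-analysis step listed above can be sketched roughly as follows: compute the Pearson correlation between each clinical feature and the survival outcome, then rank features by the strength of that relationship. This is only a minimal sketch under stated assumptions. The feature names and values below are invented for illustration and are not taken from the project's data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("inputs must have the same length of at least 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant input")
    return cov / (spread_x * spread_y)


def rank_features(features, outcome):
    """Sort feature names by absolute correlation with the outcome, strongest first."""
    scores = {name: pearson(values, outcome) for name, values in features.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical toy data: two made-up features and a binary outcome (1 = event).
toy_features = {
    "feature_a": [1.0, 2.0, 3.0, 4.0],
    "feature_b": [2.0, 1.0, 2.0, 1.0],
}
toy_outcome = [0, 0, 1, 1]
ranking = rank_features(toy_features, toy_outcome)
```

Features with a high absolute correlation are natural candidates for the classifier, although correlation alone does not capture non-linear or interaction effects.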

Certificates

Rahul Singh