Built a full-stack phishing URL classifier by training XGBoost and fine-tuning DistilBERT on 111,754 URLs, achieving 99.98% accuracy. Features a FastAPI backend, a Plotly Dash frontend, and AWS infrastructure provisioned via Terraform.
Real-time data pipeline ingesting global market data across 2,200+ markets in 36 countries, using AWS streaming, Snowpipe ingestion, and Snowpark transformations to process 3.3M+ records.
Streamlit ML app predicting Spotify track popularity from audio features using classical, ensemble, and deep learning models with hyperparameter tuning and rich evaluation visualizations.
Engineered and secured private self-hosted cloud infrastructure on AWS with VPN-only access, SSL/TLS hardening, Redis caching, Nginx reverse proxying, and server optimization.
Multi-page Power BI dashboard analyzing country-level energy production, consumption, and net trade from 1990 to 2014 using DAX KPIs and map, ribbon, treemap, and stacked-column chart visuals.
Queried 1,450+ employee records with advanced SQL and built an interactive Tableau dashboard to analyze attrition drivers and performance trends across departments and job roles.
Performed EDA on 336,000+ U.S. flights using R (dplyr, ggplot2) to uncover delay patterns by airline, time of day, and airport.
Audit-focused transaction risk scoring engine ranking origin accounts based on behavioral patterns in a large synthetic financial transactions dataset.
Built a Python logistic regression model to classify tweets by topic with 85%+ accuracy, leveraging TF-IDF, sentiment, and lexical features.
This project investigates the relationship between monthly COVID-19 case counts and unemployment rates from January 2020 to May 2022 across five countries: United States, Brazil, India, France, and Germany. Using Python, Pandas, Seaborn, and Plotly, the analysis visualizes and statistically examines how pandemic waves correlated with labor market disruptions.