Work Experience
Graduate Teaching Assistant (Part Time)
University of Sheffield, Sheffield, UK
Feb 2025 – Present
- Mentor MSc students through hands-on labs using AWS, PySpark, and Databricks, enhancing practical understanding of scalable data workflows.
- Develop lab sheets and teaching materials in collaboration with module leaders, increasing student engagement and raising course feedback scores by 15%.
Big Data Engineer
IQVIA, Bangalore, India
Jun 2022 – Aug 2023
- Built scalable ETL pipelines using Apache Spark and Airflow on Microsoft Azure to process millions of patient records weekly, achieving 99.8% reliability across production environments and supporting downstream data applications.
- Automated Hadoop cluster clean-up using Bash scripts and Airflow DAGs, reducing manual intervention and cutting recurring job failures by 40%, which significantly improved workflow stability on shared infrastructure.
- Developed and optimised Spark jobs by upgrading the codebase from Scala 2 to Scala 3 and refactoring inefficient transformations, reducing pipeline runtime from 12 to 8 hours and lowering shared cluster compute costs.
- Resolved over 10 high-priority production incidents through root cause analysis and team collaboration, cutting incident response time by 50% and expanding the team's knowledge base in JIRA and Confluence.
- Engineered Spark SQL workflows and optimised joins and partitioning to process multi-terabyte patient datasets, transforming raw data into reporting-ready tables and ensuring 95%+ accuracy in automated stakeholder reports.
Machine Learning Engineer
CDAC, Pune, India
Oct 2021 – Jun 2022
- Delivered an interactive NLP analytics dashboard using Django and Python, integrating named-entity recognition, topic modelling, and word clouds, enabling domain experts to speed up analysis and cut manual review time by over 60%.
- Fine-tuned multilingual transformer models using Hugging Face and PyTorch for a chatbot under the Kanthasth project, improving response accuracy by 30% based on internal evaluation metrics.
- Created a user-friendly Streamlit interface to evaluate 10+ translation models, simplifying file-based input and cutting validation time for non-technical testers by 80%.
- Led peer onboarding through structured knowledge transfer on NLP tools such as spaCy and scikit-learn, coding standards, and workflow practices, accelerating new members' integration into the team.
Skills Summary
Programming Languages: Python, SQL, R, Bash, Scala
GenAI Tools: LangChain, Ollama, Chroma, FAISS, Hugging Face, RAG, OpenAI APIs
Machine Learning: Scikit-learn, PyTorch, TensorFlow, Optuna
Cloud & MLOps: Azure ML, AWS, MLflow, CI/CD, Git, Docker, Airflow
Data Engineering: Apache Spark, PySpark, ETL Pipelines, MongoDB, MySQL
Visualisation: Power BI, Matplotlib, Seaborn, Streamlit, FastAPI
Projects
Knowledge-Based Q&A Chatbot (RAG, LangChain, Mistral-7B)
LLM-based chatbot that processes user-uploaded PDFs by generating vector embeddings from split text chunks, enabling accurate responses to natural language queries through semantic search.
Tech: Python, Streamlit, FAISS, Hugging Face, Ollama (Apr ’25)
GitHub: https://github.com/kameshkotwani/rag-app
Multi-Modal Product Categorisation API (Tabular + Image Features)
Multi-input ML model that takes product text features and images as input, generates text and visual embeddings, and classifies products with 90% accuracy across 50+ categories.
Tech: Python, FastAPI, Docker, MLflow, GitHub, CatBoost, ResNet, SBERT (Apr ’25)
GitHub: https://github.com/kameshkotwani/product-categorisation-api
Real-Time Sentiment Analysis (API, CI/CD, MLflow)
API service that interprets text input and returns nuanced sentiment labels with confidence scores, achieving an F1-score of 0.95.
Tech: Python, FastAPI, AWS, MLOps (Feb ’25)
GitHub: https://github.com/kameshkotwani/mlops-mini-project
Leaf Disease Classification (MSc Dissertation, PyTorch)
Streamlit application that accepts user-uploaded images and classifies them into disease categories, achieving a macro F1 score of 0.87.
Tech: Python, PyTorch, OpenCV, AlexNet (Sep ’24)
Credit Card Fraud Detection (EDA, Streamlit, Machine Learning)
Dashboard web application to identify fraudulent credit card transactions.
Tech: Scikit-learn, Seaborn, NumPy, pandas-profiling (Nov ’23)
Education
MSc Data Science
University of Sheffield
Sep 2023 – Sep 2024
Grade: Upper Second-Class Honours
Certifications
Microsoft Certified: Azure Data Scientist Associate (DP-100)
Dec 2024