Graduate Teaching Assistant (Part Time)

University of Sheffield, Sheffield, UK
Feb 2025 – Present

  • Mentor MSc students through hands-on labs using AWS, PySpark, and Databricks, enhancing practical understanding of scalable data workflows.
  • Develop lab sheets and teaching materials in collaboration with module leaders, increasing student engagement and raising course feedback scores by 15%.

Big Data Engineer

IQVIA, Bangalore, India
Jun 2022 – Aug 2023

  • Built scalable ETL pipelines using Apache Spark and Airflow on Microsoft Azure to process millions of patient records weekly, achieving 99.8% reliability across production environments and supporting downstream data applications.
  • Automated Hadoop cluster clean-up using Bash scripts and Airflow DAGs, reducing manual intervention and cutting recurring job failures by 40%, which significantly improved workflow stability on shared infrastructure.
  • Developed and optimised Spark jobs by upgrading the codebase from Scala 2 to Scala 3 and refactoring inefficient transformations, reducing pipeline runtime from 12 to 8 hours and lowering shared cluster compute costs.
  • Resolved over 10 high-priority production incidents through root cause analysis and team collaboration, cutting incident response time by 50% and expanding the team's documentation knowledge base in JIRA and Confluence.
  • Engineered Spark SQL workflows and optimised joins and partitioning to process multi-terabyte patient datasets, transforming raw data into reporting-ready tables and ensuring 95%+ accuracy in automated stakeholder reports.

Machine Learning Engineer

CDAC, Pune, India
Oct 2021 – Jun 2022

  • Delivered an interactive NLP analytics dashboard using Django and Python, integrating named-entity recognition, topic modelling, and word clouds, speeding up analysis for domain experts and reducing manual review time by over 60%.
  • Fine-tuned multilingual transformer models using Hugging Face and PyTorch for a chatbot under the Kanthasth project, improving response accuracy by 30% based on internal evaluation metrics.
  • Created a user-friendly Streamlit interface to evaluate 10+ translation models, simplifying file-based input and cutting validation time for non-technical testers by 80%.
  • Led peer onboarding through structured knowledge transfer on NLP tools such as spaCy and scikit-learn, coding standards, and workflow practices, streamlining new members' integration into the team.

Skills Summary

Programming Languages: Python, SQL, R, Bash, Scala
GenAI Tools: LangChain, Ollama, Chroma, FAISS, Hugging Face, RAG, OpenAI APIs
Machine Learning: Scikit-learn, PyTorch, TensorFlow, Optuna
Cloud & MLOps: Azure ML, AWS, MLflow, CI/CD, Git, Docker, Airflow
Data Engineering: Apache Spark, PySpark, ETL Pipelines, MongoDB, MySQL
Visualisation & Web Apps: Power BI, Matplotlib, Seaborn, Streamlit, FastAPI


Projects

Knowledge-Based Q&A Chatbot (RAG, LangChain, Mistral-7B)
LLM-based chatbot that splits user-uploaded PDFs into text chunks and generates vector embeddings from them, enabling accurate responses to natural language queries through semantic search.
Tech: Python, Streamlit, FAISS, Hugging Face, Ollama (Apr ’25)
GitHub: https://github.com/kameshkotwani/rag-app

Multi-Modal Product Categorisation API (Tabular + Image Features)
Multi-input ML model that takes product text features and images as input, generates text and visual embeddings, and classifies products with 90% accuracy across 50+ categories.
Tech: Python, FastAPI, Docker, MLflow, GitHub, CatBoost, ResNet, SBERT (Apr ’25)
GitHub: https://github.com/kameshkotwani/product-categorisation-api

Real-Time Sentiment Analysis (API, CI/CD, MLflow)
API service that interprets text input and returns nuanced sentiment labels with confidence scores, achieving an F1-score of 0.95.
Tech: Python, FastAPI, AWS, MLOps (Feb ’25)
GitHub: https://github.com/kameshkotwani/mlops-mini-project

Leaf Disease Classification (MSc Dissertation, PyTorch)
Streamlit application that accepts user-uploaded images and classifies them into disease categories, achieving a macro F1 score of 0.87.
Tech: Python, PyTorch, OpenCV, AlexNet (Sep ’24)

Credit Card Fraud Detection (EDA, Streamlit, Machine Learning)
Dashboard web application to identify fraudulent credit card transactions.
Tech: Scikit-learn, Seaborn, NumPy, pandas-profiling (Nov ’23)


Education

MSc Data Science
University of Sheffield
Sep 2023 – Sep 2024
Grade: Upper Second-Class Honours


Certifications

Microsoft Certified: Azure Data Scientist Associate (DP-100)
Dec 2024