Shyam Gupta

Data Scientist | LLMs, RAG, CV, Bayesian Analytics | Dortmund, Germany

Dortmund, Germany • Open to impactful AI and data-driven work


About Me

I build practical AI systems with a strong interest in LLMs, Retrieval-Augmented Generation, Computer Vision, Bayesian modeling, and analytics. I enjoy working on messy real-world problems where data, reasoning, and engineering have to cooperate instead of throwing chairs at each other.

Education

Master of Science in Data Science

Technical University of Dortmund, Germany

Oct 2023 – Present

B.Sc. in Applied Statistics and Analytics

Devi Ahilya Vishwavidyalaya, Indore, India

Jul 2019 – Jun 2022

Experience

Working Student (Werkstudent), Data Scientist

Beneering GmbH — Bottrop, NRW, Germany

Aug 2025 – Present
  • Working on AI development and integration projects using LLMs and Retrieval-Augmented Generation.
  • Building chatbot and AI application workflows with LangChain, LlamaIndex, and related frameworks.
  • Contributing to production-oriented integration of generative AI systems into enterprise workflows.

Student Job / Curriculum & Methods Hub Contributor

GESIS – Leibniz Institute for Social Sciences / SHK — Köln, NRW, Germany

Apr 2025 – Oct 2025
  • Maintained and improved tutorials, methods hub content, and related documentation.
  • Reviewed materials for clarity, correctness, and reliability before publication.
  • Created and refined social-science-focused technical learning resources.

Working Student (Werkstudent)

Property Expert GmbH — Langenfeld, Germany

3 months
  • Performed checkbox verification and annotation workflows.
  • Automated parts of the checking pipeline on Google Cloud Platform (GCP) to improve model verification.

Lead Data Scientist

Mindful Automation — Mangalore, India

Sep 2022 – Oct 2023
  • Built invoice OCR pipelines for Arabic invoices using Google Vision, Translate APIs, and PaddleOCR.
  • Improved accuracy and speed of document understanding workflows for billing and invoice extraction.
  • Worked on automated KYC verification using advanced preprocessing and object detection methods.
  • Reduced false positives through data-centric improvements and raised OCR accuracy to over 90%.
Arabic OCR • Invoice OCR

Skills

Python • R • SQL • NoSQL • LLMs • RAG • LangChain • LlamaIndex • Computer Vision • Deep Learning • Bayesian Analytics • MLOps • Google Cloud • Tableau

Projects

CapiPort

Portfolio optimization tool for Indian equity markets focused on balancing risk and return.

Kaggle Notebooks & Competitions

Practical notebooks, experiments, and competition work with a focus on learning by building.

Hugging Face Projects

Deployed Gradio and ML demos.

Talks & Public Presence

Guest Speaker — More Than ML

Invited speaker session.

Microsoft Event — Introduction to Computer Vision

Public session and knowledge-sharing event.

Certifications & Profiles

Intermediate SQL Certificate

NoSQL Certificate

Writing & Blogs

Medium & LinkedIn Writing

Writing on AI, analytics, learning, and technical exploration.

Volunteering

Robin Hood Army

Helped serve people in need during the lockdown in India and contributed to food distribution efforts.
