- M.S. Business Analytics | Carnegie Mellon University (May 2024)
- B.A. Political Science | University of California, Berkeley
Data Scientist @ Freelance (October 2023 - Present)
- Applied statistical methods and the pandas Python library to manage and analyze large public datasets (IPEDS, PUMS, IRS)
- Integrated GPT-4 Turbo and Llama 2 with clients' large textual datasets, streamlining analysis and data interpretation.
- Applied natural language processing (NLP) techniques to analyze and interpret textual data, surfacing insights into user behavior.
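The kind of pandas aggregation used on these datasets can be sketched as follows; the column names ("state", "enrollment") are illustrative stand-ins, not the real IPEDS/PUMS schema.

```python
import pandas as pd

def enrollment_by_state(df: pd.DataFrame) -> pd.Series:
    """Total enrollment per state, sorted descending (hypothetical columns)."""
    return (
        df.dropna(subset=["enrollment"])   # drop rows missing the measure
          .groupby("state")["enrollment"]  # aggregate by state
          .sum()
          .sort_values(ascending=False)
    )

if __name__ == "__main__":
    toy = pd.DataFrame({
        "state": ["CA", "CA", "NY", "NY", "TX"],
        "enrollment": [30000, 25000, 40000, None, 15000],
    })
    print(enrollment_by_state(toy))
```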
Operations Manager @ Fancy Dream USA INC (March 2022 - May 2024)
- Designed a comprehensive SQL customer database integrated with order and sales databases.
- Built Tableau sales and analytical reports that turned complex data into actionable insights, supporting strategic planning and business growth.
- Led a team, employing data-driven strategies for informed decision-making and improved operational efficiency.
- Established and maintained partnerships with major corporations.
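The integrated customer/order schema described above can be illustrated with a minimal sqlite3 sketch; the table and column names are invented for this example, not the production schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Customers linked to orders via a foreign key (illustrative schema).
cur.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        total REAL
    );
""")
cur.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Acme"), (2, "Globex")])
cur.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                [(1, 1, 250.0), (2, 1, 100.0), (3, 2, 75.0)])

# Per-customer sales summary -- the kind of query that feeds a Tableau report.
rows = cur.execute("""
    SELECT c.name, SUM(o.total)
    FROM customers c JOIN orders o ON o.customer_id = c.id
    GROUP BY c.id ORDER BY SUM(o.total) DESC
""").fetchall()
print(rows)  # [('Acme', 350.0), ('Globex', 75.0)]
```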
Built a RAG agent that scrapes documents with the FireCrawl API and chunks them for retrieval, or extracts content from PDF files with LlamaIndex. A local Llama 3 model analyzes the retrieved data, and the Tavily API searches the web for up-to-date information when the retrieved documents are not relevant to the query.
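The agent's fallback routing can be sketched as below. The token-overlap score is a stand-in for the real relevance check, and the threshold is an assumed value; in the actual agent, the "web_search" branch would call the Tavily API.

```python
def relevance(query: str, chunk: str) -> float:
    """Toy relevance score: fraction of query tokens present in the chunk."""
    q, c = set(query.lower().split()), set(chunk.lower().split())
    return len(q & c) / len(q) if q else 0.0

def route(query: str, chunks: list[str], threshold: float = 0.5) -> str:
    """Answer locally if any retrieved chunk is relevant enough,
    otherwise fall back to a web search."""
    if any(relevance(query, ch) >= threshold for ch in chunks):
        return "local"
    return "web_search"
```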
Developed a project analyzing app user behavior in a large dataset. Cleaned the data with NLP techniques and used GPT-4 Turbo to improve AI responses, producing insights that supported a user-centric approach to app improvements. Shared progress regularly on GitHub. Key tools: Python, NLP, the GPT-4 Turbo API, and pandas.
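A typical NLP cleaning step from this kind of pipeline is sketched below; the stopword list is a tiny illustrative stand-in for a real one.

```python
import re

STOPWORDS = {"the", "a", "an", "is", "it", "and", "to"}  # toy stopword list

def clean(text: str) -> list[str]:
    """Lowercase, keep only alphabetic tokens, and drop stopwords."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]

print(clean("The app IS great, and it crashes to desktop!"))
# -> ['app', 'great', 'crashes', 'desktop']
```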
Developed a solution to distinguish authentic from fake product reviews using machine learning. Trained Multinomial Naive Bayes, Random Forest, and Gradient Boosting classifiers on TF-IDF and Word2Vec features, evaluated the models with standard performance metrics, and exposed the result through an API for a user-friendly interface.
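One model variant from this project (TF-IDF features feeding Multinomial Naive Bayes) can be sketched with scikit-learn; the toy reviews and labels below are invented, whereas the real project trained on a labeled review corpus.

```python
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Invented toy training data: two "real" and two "fake" reviews.
reviews = [
    "Great product, works exactly as described after a month of use",
    "Sturdy build, shipping was slow but quality is solid",
    "Best product ever buy now amazing amazing deal",
    "Five stars best ever click link for discount",
]
labels = ["real", "real", "fake", "fake"]

# TF-IDF vectorization chained into a Multinomial Naive Bayes classifier.
model = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),
    ("nb", MultinomialNB()),
])
model.fit(reviews, labels)

preds = model.predict(["Quality is solid after a month",
                       "amazing deal click link now"])
print(preds)
```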
Performed a detailed exploration of a dataset with over a million rows: cleaned the data, analyzed it with a range of statistical techniques, and applied outlier detection methods for more accurate results, demonstrating the ability to extract meaningful insights from large datasets.
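One common outlier-detection method of the kind mentioned above is the interquartile-range (IQR) rule, sketched here in plain Python with the conventional 1.5*IQR fences (the specific method and cutoff are assumptions, not taken from the project).

```python
import statistics

def iqr_filter(values: list[float]) -> list[float]:
    """Keep only values inside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if lo <= v <= hi]
```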