## Project: Portfolio - Final Project (Instruction corner)

**Instructions for Students:**

Please carefully follow these steps to complete and submit your assignment:

1. **Completing the Assignment**: You are required to work on and complete all tasks in the provided assignment. Be disciplined and ensure that you thoroughly engage with each task.
   
2. **Creating a Google Drive Folder**: If you don't previously have a folder for collecting assignments, you must create a new folder in your Google Drive. This will be a repository for all your completed assignment files, helping you keep your work organized and easy to access.
   
3. **Uploading Completed Assignment**: Upon completion of your assignment, make sure to upload all necessary files, involving codes, reports, and related documents into the created Google Drive folder. Save this link in the 'Student Identity' section and also provide it as the last parameter in the `submit` function that has been provided.
   
4. **Sharing Folder Link**: You're required to share the link to your assignment Google Drive folder. This is crucial for the submission and evaluation of your assignment.
   
5. **Setting Permission toPublic**: Please make sure your **Google Drive folder is set to public**. This allows your instructor to access your solutions and assess your work correctly.

Adhering to these procedures will facilitate a smooth assignment process for you and the reviewers.

**Description:**

Welcome to your final portfolio project assignment for AI Bootcamp. This is your chance to put all the skills and knowledge you've learned throughout the bootcamp into action by creating real-world AI application.

You have the freedom to create any application or model, be it text-based or image-based or even voice-based or multimodal.

To get you started, here are some ideas:

1. **Sentiment Analysis Application:** Develop an application that can determine sentiment (positive, negative, neutral) from text data like reviews or social media posts. You can use Natural Language Processing (NLP) libraries like NLTK or TextBlob, or more advanced pre-trained models from transformers library by Hugging Face, for your sentiment analysis model.

2. **Chatbot:** Design a chatbot serving a specific purpose such as customer service for a certain industry, a personal fitness coach, or a study helper. Libraries like ChatterBot or Dialogflow can assist in designing conversational agents.

3. **Predictive Text Application:** Develop a model that suggests the next word or sentence similar to predictive text on smartphone keyboards. You could use the transformers library by Hugging Face, which includes pre-trained models like GPT-2.

4. **Image Classification Application:** Create a model to distinguish between different types of flowers or fruits. For this type of image classification task, pre-trained models like ResNet or VGG from PyTorch or TensorFlow can be utilized.

5. **News Article Classifier:** Develop a text classification model that categorizes news articles into predefined categories. NLTK, SpaCy, and sklearn are valuable libraries for text pre-processing, feature extraction, and building classification models.

6. **Recommendation System:** Create a simplified recommendation system. For instance, a book or movie recommender based on user preferences. Python's Surprise library can assist in building effective recommendation systems.

7. **Plant Disease Detection:** Develop a model to identify diseases in plants using leaf images. This project requires a good understanding of convolutional neural networks (CNNs) and image processing. PyTorch, TensorFlow, and OpenCV are all great tools to use.

8. **Facial Expression Recognition:** Develop a model to classify human facial expressions. This involves complex feature extraction and classification algorithms. You might want to leverage deep learning libraries like TensorFlow or PyTorch, along with OpenCV for processing facial images.

9. **Chest X-Ray Interpretation:** Develop a model to detect abnormalities in chest X-ray images. This task may require understanding of specific features in such images. Again, TensorFlow and PyTorch for deep learning, and libraries like SciKit-Image or PIL for image processing, could be of use.

10. **Food Classification:** Develop a model to classify a variety of foods such as local Indonesian food. Pre-trained models like ResNet or VGG from PyTorch or TensorFlow can be a good starting point.

11. **Traffic Sign Recognition:** Design a model to recognize different traffic signs. This project has real-world applicability in self-driving car technology. Once more, you might utilize PyTorch or TensorFlow for the deep learning aspect, and OpenCV for image processing tasks.

**Submission:**

Please upload both your model and application to Huggingface or your own Github account for submission.

**Presentation:**

You are required to create a presentation to showcase your project, including the following details:

- The objective of your model.
- A comprehensive description of your model.
- The specific metrics used to measure your model's effectiveness.
- A brief overview of the dataset used, including its source, pre-processing steps, and any insights.
- An explanation of the methodology used in developing the model.
- A discussion on challenges faced, how they were handled, and your learnings from those.
- Suggestions for potential future improvements to the model.
- A functioning link to a demo of your model in action.

**Grading:**

Submissions will be manually graded, with a select few given the opportunity to present their projects in front of a panel of judges. This will provide valuable feedback, further enhancing your project and expanding your knowledge base.

Remember, consistent practice is the key to mastering these concepts. Apply your knowledge, ask questions when in doubt, and above all, enjoy the process. Best of luck to you all!


In [None]:
# @title #### Student Identity
student_id = "REA6KIZML" # @param {type:"string"}
name = "Shavira Zhalsabilla" # @param {type:"string"}
drive_link = "https://drive.google.com/drive/folders/1l_vP2voZb_0yig7FHn6ktbrh5owiqLHF?usp=drive_link"  # @param {type:"string"}
assignment_id = "00_portfolio_project"

## Installation and Import `rggrader` Package (for submit final project)

In [None]:
%pip install rggrader
from rggrader import submit_image
from rggrader import submit

## Working Space

For result go to 4.2 Temporary Result

In [None]:
# Write your code here
# Feel free to add new code block as needed



### 0.1 Data Source

[Kaggle: Spotify and YouTube](https://www.kaggle.com/datasets/salvatorerastelli/spotify-and-youtube/data)

### 0.2 Resources

1. [Large Language Models (LLMs) for Recommendations (Paper Walkthrough)](https://www.youtube.com/watch?app=desktop&v=g0EJgVAO7QM)
2. [Using Large Language Models as Recommendation Systems](https://towardsdatascience.com/using-large-language-models-as-recommendation-systems-49e8aeeff29b/)
3. [How to use LLMs for creating a content-based recommendation system for entertainment platforms?](https://www.leewayhertz.com/build-content-based-recommendation-for-entertainment-using-llms/) -> reasons here.
4. [LLM game recommender](https://medium.com/@elisarm.antunes/llm-game-recommender-8403e232db4b)
5. [Build a semantic book recommender](https://www.freecodecamp.org/news/build-a-semantic-book-recommender-using-an-llm-and-python/)
6. [Fine-Tuning DeepSeek LLM: Adapting Open-Source AI for Your Needs](https://abhishek-maheshwarappa.medium.com/fine-tuning-deepseek-llm-adapting-open-source-ai-for-your-needs-12a7e5572fa5)
7. [Deepseek V3 vs R1](https://www.datacamp.com/blog/deepseek-r1-vs-v3)
8. [The Complete Guide to DeepSeek Models: From V3 to R1 and Beyond](https://www.bentoml.com/blog/the-complete-guide-to-deepseek-models-from-v3-to-r1-and-beyond)
9. [Evaluating Large Language Model (LLM) systems: Metrics, challenges, and best practices](https://medium.com/data-science-at-microsoft/evaluating-llm-systems-metrics-challenges-and-best-practices-664ac25be7e5)

### 1 Import dataset

In [1]:
# Mount Gdrive for importing file
from google.colab import drive
drive.mount('/content/drive')

path = '/content/drive/MyDrive/mastering-ai/final-project-REA6KIZML-shavira/'

Mounted at /content/drive


In [2]:
# To show all columns and rows
import pandas as pd

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

In [3]:
# Import data
df = pd.read_csv(f"{path}Spotify_Youtube.csv")
df = df.drop('Unnamed: 0', axis=1)
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20718 entries, 0 to 20717
Data columns (total 27 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   Artist            20718 non-null  object 
 1   Url_spotify       20718 non-null  object 
 2   Track             20718 non-null  object 
 3   Album             20718 non-null  object 
 4   Album_type        20718 non-null  object 
 5   Uri               20718 non-null  object 
 6   Danceability      20716 non-null  float64
 7   Energy            20716 non-null  float64
 8   Key               20716 non-null  float64
 9   Loudness          20716 non-null  float64
 10  Speechiness       20716 non-null  float64
 11  Acousticness      20716 non-null  float64
 12  Instrumentalness  20716 non-null  float64
 13  Liveness          20716 non-null  float64
 14  Valence           20716 non-null  float64
 15  Tempo             20716 non-null  float64
 16  Duration_ms       20716 non-null  float6

In [4]:
df.head()

Unnamed: 0,Artist,Url_spotify,Track,Album,Album_type,Uri,Danceability,Energy,Key,Loudness,Speechiness,Acousticness,Instrumentalness,Liveness,Valence,Tempo,Duration_ms,Url_youtube,Title,Channel,Views,Likes,Comments,Description,Licensed,official_video,Stream
0,Gorillaz,https://open.spotify.com/artist/3AA28KZvwAUcZu...,Feel Good Inc.,Demon Days,album,spotify:track:0d28khcov6AiegSCpG5TuT,0.818,0.705,6.0,-6.679,0.177,0.00836,0.00233,0.613,0.772,138.559,222640.0,https://www.youtube.com/watch?v=HyHNuVaZJ-k,Gorillaz - Feel Good Inc. (Official Video),Gorillaz,693555221.0,6220896.0,169907.0,Official HD Video for Gorillaz' fantastic trac...,True,True,1040235000.0
1,Gorillaz,https://open.spotify.com/artist/3AA28KZvwAUcZu...,Rhinestone Eyes,Plastic Beach,album,spotify:track:1foMv2HQwfQ2vntFf9HFeG,0.676,0.703,8.0,-5.815,0.0302,0.0869,0.000687,0.0463,0.852,92.761,200173.0,https://www.youtube.com/watch?v=yYDmaexVHic,Gorillaz - Rhinestone Eyes [Storyboard Film] (...,Gorillaz,72011645.0,1079128.0,31003.0,The official video for Gorillaz - Rhinestone E...,True,True,310083700.0
2,Gorillaz,https://open.spotify.com/artist/3AA28KZvwAUcZu...,New Gold (feat. Tame Impala and Bootie Brown),New Gold (feat. Tame Impala and Bootie Brown),single,spotify:track:64dLd6rVqDLtkXFYrEUHIU,0.695,0.923,1.0,-3.93,0.0522,0.0425,0.0469,0.116,0.551,108.014,215150.0,https://www.youtube.com/watch?v=qJa-VFwPpYA,Gorillaz - New Gold ft. Tame Impala & Bootie B...,Gorillaz,8435055.0,282142.0,7399.0,Gorillaz - New Gold ft. Tame Impala & Bootie B...,True,True,63063470.0
3,Gorillaz,https://open.spotify.com/artist/3AA28KZvwAUcZu...,On Melancholy Hill,Plastic Beach,album,spotify:track:0q6LuUqGLUiCPP1cbdwFs3,0.689,0.739,2.0,-5.81,0.026,1.5e-05,0.509,0.064,0.578,120.423,233867.0,https://www.youtube.com/watch?v=04mfKJWDSzI,Gorillaz - On Melancholy Hill (Official Video),Gorillaz,211754952.0,1788577.0,55229.0,Follow Gorillaz online:\nhttp://gorillaz.com \...,True,True,434663600.0
4,Gorillaz,https://open.spotify.com/artist/3AA28KZvwAUcZu...,Clint Eastwood,Gorillaz,album,spotify:track:7yMiX7n9SBvadzox8T5jzT,0.663,0.694,10.0,-8.627,0.171,0.0253,0.0,0.0698,0.525,167.953,340920.0,https://www.youtube.com/watch?v=1V_xRb0x9aw,Gorillaz - Clint Eastwood (Official Video),Gorillaz,618480958.0,6197318.0,155930.0,The official music video for Gorillaz - Clint ...,True,True,617259700.0


### 2 Data Cleaning & Preprocessing

In [4]:
# Drop unused columns
df = df.drop(['Album_type','Uri','Duration_ms'], axis=1)

In [5]:
# Drop duplicates & missing values
df = df.drop_duplicates()
df = df.dropna()

In [6]:
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

def preprocess_song_data(df):
    # Drop rows missing essential Spotify audio features or title/artist
    required_columns = [
        'Title', 'Artist', 'Valence', 'Energy', 'Danceability',
        'Tempo', 'Acousticness', 'Instrumentalness'
    ]
    df_clean = df.dropna(subset=required_columns).copy()

    # Normalize YouTube and Spotify popularity metrics
    popularity_cols = ['Likes', 'Views', 'Comments', 'Stream']
    for col in popularity_cols:
        if col in df_clean.columns:
            df_clean[col] = df_clean[col].fillna(0)

    scaler = MinMaxScaler()
    df_clean[popularity_cols] = scaler.fit_transform(df_clean[popularity_cols])

    # Create an audio feature vector column for similarity search
    audio_feature_cols = [
        'Valence', 'Energy', 'Danceability', 'Tempo',
        'Acousticness', 'Instrumentalness'
    ]
    df_clean['audio_vector'] = df_clean[audio_feature_cols].values.tolist()

    return df_clean, audio_feature_cols, popularity_cols

In [7]:
df_clean, audio_feature_cols, popularity_cols = preprocess_song_data(df)

In [9]:
df_clean.head()

Unnamed: 0,Artist,Url_spotify,Track,Album,Danceability,Energy,Key,Loudness,Speechiness,Acousticness,Instrumentalness,Liveness,Valence,Tempo,Url_youtube,Title,Channel,Views,Likes,Comments,Description,Licensed,official_video,Stream,audio_vector
0,Gorillaz,https://open.spotify.com/artist/3AA28KZvwAUcZu...,Feel Good Inc.,Demon Days,0.818,0.705,6.0,-6.679,0.177,0.00836,0.00233,0.613,0.772,138.559,https://www.youtube.com/watch?v=HyHNuVaZJ-k,Gorillaz - Feel Good Inc. (Official Video),Gorillaz,0.08584,0.122486,0.010564,Official HD Video for Gorillaz' fantastic trac...,True,True,0.307168,"[0.772, 0.705, 0.818, 138.559, 0.00836, 0.00233]"
1,Gorillaz,https://open.spotify.com/artist/3AA28KZvwAUcZu...,Rhinestone Eyes,Plastic Beach,0.676,0.703,8.0,-5.815,0.0302,0.0869,0.000687,0.0463,0.852,92.761,https://www.youtube.com/watch?v=yYDmaexVHic,Gorillaz - Rhinestone Eyes [Storyboard Film] (...,Gorillaz,0.008913,0.021247,0.001928,The official video for Gorillaz - Rhinestone E...,True,True,0.091562,"[0.852, 0.703, 0.676, 92.761, 0.0869, 0.000687]"
2,Gorillaz,https://open.spotify.com/artist/3AA28KZvwAUcZu...,New Gold (feat. Tame Impala and Bootie Brown),New Gold (feat. Tame Impala and Bootie Brown),0.695,0.923,1.0,-3.93,0.0522,0.0425,0.0469,0.116,0.551,108.014,https://www.youtube.com/watch?v=qJa-VFwPpYA,Gorillaz - New Gold ft. Tame Impala & Bootie B...,Gorillaz,0.001044,0.005555,0.00046,Gorillaz - New Gold ft. Tame Impala & Bootie B...,True,True,0.01862,"[0.551, 0.923, 0.695, 108.014, 0.0425, 0.0469]"
3,Gorillaz,https://open.spotify.com/artist/3AA28KZvwAUcZu...,On Melancholy Hill,Plastic Beach,0.689,0.739,2.0,-5.81,0.026,1.5e-05,0.509,0.064,0.578,120.423,https://www.youtube.com/watch?v=04mfKJWDSzI,Gorillaz - On Melancholy Hill (Official Video),Gorillaz,0.026208,0.035216,0.003434,Follow Gorillaz online:\nhttp://gorillaz.com \...,True,True,0.128349,"[0.578, 0.739, 0.689, 120.423, 1.51e-05, 0.509]"
4,Gorillaz,https://open.spotify.com/artist/3AA28KZvwAUcZu...,Clint Eastwood,Gorillaz,0.663,0.694,10.0,-8.627,0.171,0.0253,0.0,0.0698,0.525,167.953,https://www.youtube.com/watch?v=1V_xRb0x9aw,Gorillaz - Clint Eastwood (Official Video),Gorillaz,0.076548,0.122022,0.009695,The official music video for Gorillaz - Clint ...,True,True,0.182268,"[0.525, 0.694, 0.663, 167.953, 0.0253, 0.0]"


In [11]:
df_clean = df_clean.head()

### 3 Build model

In [None]:
# !pip install langchain faiss-cpu gradio
# !pip install -U langchain langchain-community
# !pip install sentence-transformers transformers
!pip install transformers accelerate langchain

In [9]:
!pip install -U langchain langchain-community

Collecting langchain-community
  Downloading langchain_community-0.3.20-py3-none-any.whl.metadata (2.4 kB)
Collecting dataclasses-json<0.7,>=0.5.7 (from langchain-community)
  Downloading dataclasses_json-0.6.7-py3-none-any.whl.metadata (25 kB)
Collecting pydantic-settings<3.0.0,>=2.4.0 (from langchain-community)
  Downloading pydantic_settings-2.8.1-py3-none-any.whl.metadata (3.5 kB)
Collecting httpx-sse<1.0.0,>=0.4.0 (from langchain-community)
  Downloading httpx_sse-0.4.0-py3-none-any.whl.metadata (9.0 kB)
Collecting marshmallow<4.0.0,>=3.18.0 (from dataclasses-json<0.7,>=0.5.7->langchain-community)
  Downloading marshmallow-3.26.1-py3-none-any.whl.metadata (7.3 kB)
Collecting typing-inspect<1,>=0.4.0 (from dataclasses-json<0.7,>=0.5.7->langchain-community)
  Downloading typing_inspect-0.9.0-py3-none-any.whl.metadata (1.5 kB)
Collecting python-dotenv>=0.21.0 (from pydantic-settings<3.0.0,>=2.4.0->langchain-community)
  Downloading python_dotenv-1.1.0-py3-none-any.whl.metadata (24 kB

In [10]:
from huggingface_hub import login
login()

VBox(children=(HTML(value='<center> <img\nsrc=https://huggingface.co/front/assets/huggingface_logo-noborder.sv‚Ä¶

In [11]:
import torch
torch.cuda.is_available()

True

In [12]:
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
from langchain.llms import HuggingFacePipeline
import torch

model_name = "deepseek-ai/deepseek-llm-7b-chat"

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True,
    torch_dtype=torch.float16,
    device_map="auto",
    offload_folder="offload"
)

hf_pipeline = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=300
)

llm = HuggingFacePipeline(pipeline=hf_pipeline)

print("‚úÖ Model loaded on:", next(model.parameters()).device)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


tokenizer_config.json:   0%|          | 0.00/1.28k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/4.61M [00:00<?, ?B/s]

config.json:   0%|          | 0.00/594 [00:00<?, ?B/s]

pytorch_model.bin.index.json:   0%|          | 0.00/22.5k [00:00<?, ?B/s]

Fetching 2 files:   0%|          | 0/2 [00:00<?, ?it/s]

pytorch_model-00001-of-00002.bin:   0%|          | 0.00/9.97G [00:00<?, ?B/s]

pytorch_model-00002-of-00002.bin:   0%|          | 0.00/3.85G [00:00<?, ?B/s]

model.safetensors.index.json:   0%|          | 0.00/23.6k [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/181 [00:00<?, ?B/s]

Device set to use cuda:0


‚úÖ Model loaded on: cuda:0


  llm = HuggingFacePipeline(pipeline=hf_pipeline)


In [13]:
import re, json

def parse_user_input(user_input):
    prompt = f"""
You are an expert music assistant trained to recommend songs for use in creative projects like ads, short films, social media videos, and campaigns.

Your job is to extract structured information from the user's input so that we can recommend songs from our database based on Spotify audio features and YouTube metadata.

Only return the following fields in JSON format:
- "mood": overall emotion (e.g., sad, happy, dramatic)
- "context": what kind of scene or use (e.g., wedding, breakup scene, brand ad)
- "preference": how to sort (e.g., likes, views, popularity)
- "reference_song": name of a song the user wants similar songs to
- "genre": intended musical style (e.g., acoustic, pop, ambient)
- "instrumental": 'yes' if vocals aren't needed, otherwise 'no'
- "tempo": slow / medium / fast
- "artist": if they want something from a specific artist
- "gender": if they prefer male / female vocal
- "limit": how many results to show (as a number)

{{
  "mood": "...",
  "context": "...",
  "preference": "...",
  "reference_song": "...",
  "genre": "...",
  "instrumental": "...",
  "tempo": "...",
  "artist": "...",
  "gender": "...",
  "limit": "..."
}}

User input: "{user_input}"

Respond ONLY with the JSON. Do not explain anything.
"""
    output = llm(prompt)
    print("üß† Raw LLM Output:\n", output)

    try:
        json_blocks = re.findall(r"\{[\s\S]*?\}", output)
        json_str = json_blocks[-1] if json_blocks else "{}"
        parsed = json.loads(json_str)

        # Ensure limit is str of int
        try:
            parsed["limit"] = str(int(parsed["limit"]))
        except:
            parsed["limit"] = "5"

        # Fill any missing keys
        expected_keys = ["mood", "context", "preference", "reference_song", "genre", "instrumental", "tempo", "artist", "gender", "limit"]
        for key in expected_keys:
            if key not in parsed:
                parsed[key] = "" if key != "limit" else "5"

    except Exception as e:
        print("‚ùå JSON parsing failed:", e)
        parsed = {key: "" for key in expected_keys}
        parsed["limit"] = "5"

    return parsed


In [14]:
def mood_to_valence_range(mood):
    mood = mood.strip().lower()
    mood_map = {
        "sad": (0.0, 0.3),
        "melancholy": (0.2, 0.4),
        "emotional": (0.3, 0.5),
        "chill": (0.4, 0.6),
        "nostalgic": (0.3, 0.5),
        "neutral": (0.4, 0.6),
        "hopeful": (0.5, 0.7),
        "happy": (0.6, 0.85),
        "cheerful": (0.7, 0.9),
        "upbeat": (0.7, 1.0),
        "energetic": (0.8, 1.0),
        "romantic": (0.4, 0.7),
        "heartbreak": (0.1, 0.3),
        "dark": (0.0, 0.2),
        "dramatic": (0.3, 0.5),
        "angry": (0.0, 0.3),
        "inspiring": (0.6, 0.85),
        "relaxing": (0.4, 0.6),
        "peaceful": (0.3, 0.5),
        "epic": (0.5, 0.8)
    }

    # fallback: no filter
    return mood_map.get(mood, (0.0, 1.0))

In [15]:
!pip install gender-guesser

Collecting gender-guesser
  Downloading gender_guesser-0.4.0-py2.py3-none-any.whl.metadata (3.0 kB)
Downloading gender_guesser-0.4.0-py2.py3-none-any.whl (379 kB)
[?25l   [90m‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ[0m [32m0.0/379.3 kB[0m [31m?[0m eta [36m-:--:--[0m[2K   [90m‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ[0m [32m379.3/379.3 kB[0m [31m19.9 MB/s[0m eta [36m0:00:00[0m
[?25hInstalling collected packages: gender-guesser
Successfully installed gender-guesser-0.4.0


In [16]:
import gender_guesser.detector as gender
gender_detector = gender.Detector()

In [17]:
def infer_gender(artist_name):
  try:
      first_name = artist_name.split()[0]
      guess = gender_detector.get_gender(first_name)
      if "female" in guess: return "female"
      if "male" in guess: return "male"
      return "unknown"
  except:
      return "unknown"

In [18]:
df_clean["Gender"] = df_clean["Artist"].apply(infer_gender)

In [19]:
genre_filters = {
    "pop": lambda df: (df["Danceability"] > 0.6) & (df["Energy"] > 0.5) & (df["Acousticness"] < 0.5),
    "rock": lambda df: (df["Energy"] > 0.75) & (df["Acousticness"] < 0.4),
    "jazz": lambda df: (df["Acousticness"] > 0.6) & (df["Instrumentalness"] > 0.5) & (df["Energy"] < 0.6),
    "rnb": lambda df: (df["Danceability"] > 0.6) & (df["Speechiness"] < 0.33) & (df["Acousticness"] > 0.3),
    "acoustic": lambda df: df["Acousticness"] > 0.8,
    "hip hop": lambda df: (df["Speechiness"] > 0.4) & (df["Danceability"] > 0.6),
    "edm": lambda df: (df["Danceability"] > 0.7) & (df["Energy"] > 0.7) & (df["Acousticness"] < 0.3),
    "indie": lambda df: (df["Acousticness"] > 0.4) & (df["Energy"] < 0.6),
    "dangdut": lambda df: (df["Danceability"] > 0.6) & (df["Speechiness"] < 0.4) & (df["Acousticness"] > 0.4) & (df["Tempo"].between(70, 130)),
    "keroncong": lambda df: (df["Acousticness"] > 0.8) & (df["Energy"] < 0.5) & (df["Instrumentalness"] > 0.3),
}


In [20]:
def recommend_from_dataframe(parsed, df_clean, randomize = False):
    df_filtered = df_clean.copy()
    filters_applied = set()
    top_n = int(parsed["limit"])
    sort_col = parsed.get("preference", "Likes") or "Likes"
    if sort_col not in df_clean.columns:
        sort_col = "Likes"

    print("üéµ Total songs before filtering:", len(df_filtered))

    # Mood ‚Üí valence
    val_min, val_max = mood_to_valence_range(parsed["mood"])
    df_filtered = df_filtered[(df_filtered["Valence"] >= val_min) & (df_filtered["Valence"] <= val_max)]
    filters_applied.add("Valence")
    print("üéØ Applied mood ‚Üí Valence filter:", len(df_filtered))

    # Genre
    genre = parsed["genre"].strip().lower()
    if genre not in ["", "...", "none", "n/a","null","unknown"] and genre in genre_filters and not {"Acousticness", "Energy", "Instrumentalness", "Speechiness", "Tempo"} & filters_applied:
        try:
            genre_mask = genre_filters[genre](df_filtered)
            df_filtered = df_filtered[genre_mask]
            filters_applied.update(genre_filters[genre].__code__.co_names)
            print(f"üéØ Applied genre filter for '{genre}':", len(df_filtered))
        except Exception as e:
            print(f"‚ö†Ô∏è Failed to apply genre filter for '{genre}':", e)
    else:
        print("‚ö†Ô∏è Skipped genre filter due to value or overlap.")

    # Instrumental
    instrumental = parsed["instrumental"].strip().lower()
    if instrumental == "yes" and "Instrumentalness" not in filters_applied:
        df_filtered = df_filtered[df_filtered["Instrumentalness"] > 0.000002]
        filters_applied.add("Instrumentalness")
        print("üéØ Applied instrumental filter:", len(df_filtered))
    else:
        print("‚ö†Ô∏è Skipped instrumental filter (empty or overlap)")

    # Tempo
    tempo = parsed["tempo"].strip().lower()
    if tempo not in ["", "...", "none", "n/a","null","unknown"] and "Tempo" not in filters_applied:
        if tempo == "fast":
            df_filtered = df_filtered[df_filtered["Tempo"] > 120]
        elif tempo == "slow":
            df_filtered = df_filtered[df_filtered["Tempo"] < 90]
        elif tempo == "medium":
            df_filtered = df_filtered[(df_filtered["Tempo"] >= 90) & (df_filtered["Tempo"] <= 120)]
        filters_applied.add("Tempo")
        print("üéØ Applied tempo filter:", len(df_filtered))
    else:
        print("‚ö†Ô∏è Skipped tempo filter (empty or overlap)")

    # Gender
    gender = parsed["gender"].strip().lower()
    if gender not in ["", "...", "none", "n/a","null"]:
        if "Gender" in df_filtered.columns:
            df_filtered = df_filtered[df_filtered["Gender"].str.lower() == gender]
            filters_applied.add("Gender")
            print("üéØ Applied gender filter:", len(df_filtered))
        else:
            print("‚ö†Ô∏è Gender column not found ‚Äî skipping.")

    if randomize:
        df_sorted = df_filtered.sample(frac=1).head(top_n)
    else:
        df_sorted = df_filtered.sort_values(by=sort_col, ascending=False).head(top_n)

    print("‚úÖ Final songs returned:", len(df_sorted))
    return df_sorted

In [21]:
from langchain.prompts import PromptTemplate

reference_song_prompt = PromptTemplate.from_template("""
You are a music expert. A user has requested songs similar to a reference song that may not exist in our database.

Your job is to describe the following audio features of the reference song as accurately as possible, based on your knowledge:

- mood (e.g. sad, energetic, chill)
- genre (e.g. pop, acoustic, EDM)
- instrumental (yes/no)
- tempo (slow, medium, fast)
- artist (who performed the song)
- gender (male, female, group, unknown)

Respond ONLY in the following JSON format:
{
  "mood": "...",
  "genre": "...",
  "instrumental": "...",
  "tempo": "...",
  "artist": "...",
  "gender": "..."
}

Reference song: "{reference_song}"

Return only the JSON. Do not include any other explanation.
""")

In [22]:
import re
import json

def recommend_by_reference_song(parsed, df_clean, audio_feature_cols, llm, reference_song_prompt):
    reference_title = parsed["reference_song"].strip().lower()
    top_n = int(parsed.get("limit", 5))
    sort_col = parsed.get("preference", "Likes") or "Likes"
    if sort_col not in df_clean.columns:
        sort_col = "Likes"

    # üß† Try to find exact or partial match in catalog
    matches = df_clean[df_clean["Title"].str.lower().str.contains(reference_title, na=False)]

    if not matches.empty:
        print("‚úÖ Reference song found in catalog ‚Äî using similarity-based recommendation.")

        from sklearn.metrics.pairwise import cosine_similarity
        import numpy as np

        ref_vec = np.array(matches[audio_feature_cols].values[0]).reshape(1, -1)
        all_vecs = np.array(df_clean[audio_feature_cols])
        sims = cosine_similarity(ref_vec, all_vecs)[0]

        df_clean["similarity"] = sims
        results = df_clean[df_clean["Title"].str.lower() != reference_title]
        return results.sort_values("similarity", ascending=False).head(top_n)

    else:
        print("üß† Reference song NOT in catalog ‚Äî asking LLM to describe its features.")

        prompt = reference_song_prompt.format(reference_song=parsed["reference_song"])
        llm_output = llm(prompt)
        print("üîç LLM Output:", llm_output)

        # Try parsing the output
        try:
            json_str = re.search(r"\{[\s\S]*?\}", llm_output).group()
            ref_features = json.loads(json_str)
        except:
            print("‚ö†Ô∏è Failed to parse LLM output. Using fallback.")
            ref_features = {
                "mood": "", "genre": "", "instrumental": "", "tempo": "",
                "artist": "", "gender": "", "limit": parsed.get("limit", "5")
            }

        # Force keys into recommendation format
        ref_features["limit"] = parsed.get("limit", "5")
        ref_features["preference"] = parsed.get("preference", "Likes")

        print("üéØ Parsed features from reference song:", ref_features)

        return recommend_from_dataframe(ref_features, df_clean)

In [23]:
def extract_explanation(text):
    split_text = text.split("Write a 2-sentence explanation.")
    return split_text[1].strip() if len(split_text) > 1 else text.strip()

def generate_explanation(user_input, song_row):
    prompt = f"""
User asked: "{user_input}"
Why is this song a good fit?

- Title: {song_row['Title']}
- Artist: {song_row['Artist']}
- Valence: {song_row['Valence']}
- Tempo: {song_row['Tempo']}
- Instrumentalness: {song_row['Instrumentalness']}
- Description: {song_row.get('Description', '')}

Write a 2-sentence explanation.
"""
    raw = llm(prompt)

    # If raw is a string (not a list), return it directly
    if isinstance(raw, str):
        return extract_explanation(raw)

    # If raw is a list of dicts (like HF pipeline), extract .generated_text
    if isinstance(raw, list) and "generated_text" in raw[0]:
        return extract_explanation(raw[0]["generated_text"])

    # Default fallback
    return str(raw)

#### 3.1 Testing corner

In [24]:
user_input = input("What kind of song are you looking for? ")
parsed = parse_user_input(user_input)

if parsed["reference_song"].strip().lower() not in ["", "...", "none", "n/a","null","unknown"]:
    results = recommend_by_reference_song(
        parsed,
        df_clean,
        audio_feature_cols,
        llm,
        reference_song_prompt
    )
else:
    results = recommend_from_dataframe(parsed, df_clean, randomize=False)

# Show results
if results.empty:
    print("‚ö†Ô∏è No matching songs found.")
else:
    from IPython.display import display
    display(results)

What kind of song are you looking for? give me song for infant formula ads using female vocal


  output = llm(prompt)


üß† Raw LLM Output:
 
You are an expert music assistant trained to recommend songs for use in creative projects like ads, short films, social media videos, and campaigns.

Your job is to extract structured information from the user's input so that we can recommend songs from our database based on Spotify audio features and YouTube metadata.

Only return the following fields in JSON format:
- "mood": overall emotion (e.g., sad, happy, dramatic)
- "context": what kind of scene or use (e.g., wedding, breakup scene, brand ad)
- "preference": how to sort (e.g., likes, views, popularity)
- "reference_song": name of a song the user wants similar songs to
- "genre": intended musical style (e.g., acoustic, pop, ambient)
- "instrumental": 'yes' if vocals aren't needed, otherwise 'no'
- "tempo": slow / medium / fast
- "artist": if they want something from a specific artist
- "gender": if they prefer male / female vocal
- "limit": how many results to show (as a number)

{
  "mood": "...",
  "cont

Unnamed: 0,Artist,Url_spotify,Track,Album,Danceability,Energy,Key,Loudness,Speechiness,Acousticness,Instrumentalness,Liveness,Valence,Tempo,Url_youtube,Title,Channel,Views,Likes,Comments,Description,Licensed,official_video,Stream,audio_vector,Gender
154,Shakira,https://open.spotify.com/artist/0EmeFodog0BfCg...,Waka Waka (This Time for Africa) [The Official...,Listen Up! The Official 2010 FIFA World Cup Album,0.758,0.871,2.0,-6.408,0.147,0.0062,0.0,0.0663,0.753,126.994,https://www.youtube.com/watch?v=pRpeEdMmmQ0,Shakira - Waka Waka (This Time for Africa) (Th...,shakiraVEVO,0.428709,0.400245,0.079806,"Watch the official music video for ""Waka Waka ...",True,True,0.186006,"[0.753, 0.871, 0.758, 126.994, 0.0062, 0.0]",female
18046,Camila Cabello,https://open.spotify.com/artist/4nDoRrQiYLoBzw...,Se√±orita,Shawn Mendes (Deluxe),0.759,0.548,9.0,-6.049,0.029,0.0392,0.0,0.0828,0.749,116.967,https://www.youtube.com/watch?v=Pkh8UtuejGw,"Shawn Mendes, Camila Cabello - Se√±orita (Offic...",ShawnMendesVEVO,0.184123,0.390759,0.039813,Se√±orita: https://Senorita.lnk.to/OutNow \n\nC...,True,True,0.689858,"[0.749, 0.548, 0.759, 116.967, 0.0392, 0.0]",female
13526,Selena Gomez,https://open.spotify.com/artist/0C8ZW7ezQVs4UR...,"Taki Taki (with Selena Gomez, Ozuna & Cardi B)","Taki Taki (with Selena Gomez, Ozuna & Cardi B)",0.842,0.801,8.0,-4.167,0.228,0.157,5e-06,0.0642,0.617,95.881,https://www.youtube.com/watch?v=ixkoVwKQaJg,"DJ Snake - Taki Taki ft. Selena Gomez, Ozuna, ...",DJSnakeVEVO,0.2988,0.369689,0.034382,Stream and download Taki Taki - https://djsnak...,True,True,0.407427,"[0.617, 0.801, 0.842, 95.881, 0.157, 4.82e-06]",female
13919,Kimbra,https://open.spotify.com/artist/6hk7Yq1DU9QcCC...,Somebody That I Used To Know,Making Mirrors,0.864,0.495,0.0,-7.036,0.037,0.591,0.000133,0.0992,0.72,129.062,https://www.youtube.com/watch?v=8UVNT4wvIGY,Gotye - Somebody That I Used To Know (feat. Ki...,gotyemusic,0.254617,0.287086,0.049306,The official music video for ‚ÄúSomebody That I ...,True,True,0.389268,"[0.72, 0.495, 0.864, 129.062, 0.591, 0.000133]",female
19637,Lil Nas X,https://open.spotify.com/artist/7jVv8c5Fj3E9Vh...,Old Town Road - Remix,7 EP,0.878,0.619,6.0,-5.56,0.102,0.0533,0.0,0.113,0.639,136.041,https://www.youtube.com/watch?v=r7qovpFAGrQ,Lil Nas X - Old Town Road (Official Video) ft....,LilNasXVEVO,0.132348,0.235133,0.013437,Week 17 version of Lil Nas X‚Äôs Billboard #1 hi...,True,True,0.411602,"[0.639, 0.619, 0.878, 136.041, 0.0533, 0.0]",female


In [50]:
user_input = input("What kind of song are you looking for? ")
parsed = parse_user_input(user_input)

What kind of song are you looking for? I want 3 happy songs for my ecommerce big discount campaign in christmas
üß† Raw LLM Output:
 
You are an expert music assistant trained to recommend songs for use in creative projects like ads, short films, social media videos, and campaigns.

Your job is to extract structured information from the user's input so that we can recommend songs from our database based on Spotify audio features and YouTube metadata.

Only return the following fields in JSON format:
- "mood": overall emotion (e.g., sad, happy, dramatic)
- "context": what kind of scene or use (e.g., wedding, breakup scene, brand ad)
- "preference": how to sort (e.g., likes, views, popularity)
- "reference_song": name of a song the user wants similar songs to
- "genre": intended musical style (e.g., acoustic, pop, ambient)
- "instrumental": 'yes' if vocals aren't needed, otherwise 'no'
- "tempo": slow / medium / fast
- "artist": if they want something from a specific artist
- "gender": 

In [214]:
parsed

{'mood': 'sad',
 'context': 'breakup montage',
 'preference': 'views',
 'reference_song': '...',
 'genre': 'indie',
 'instrumental': 'no',
 'tempo': 'medium',
 'artist': '...',
 'gender': '...',
 'limit': '5'}

In [51]:
if parsed["reference_song"].strip().lower() not in ["", "...", "none", "n/a"]:
    results = recommend_by_reference_song(
        parsed,
        df_clean,
        audio_feature_cols,
        llm,
        reference_song_prompt
    )
else:
    results = recommend_from_dataframe(parsed, df_clean)

# Show results
if results.empty:
    print("‚ö†Ô∏è No matching songs found.")
else:
    from IPython.display import display
    display(results)

üéµ Total songs before filtering: 19170
üéØ Applied mood ‚Üí Valence filter: 5871
‚ö†Ô∏è Skipped genre filter due to value or overlap.
‚ö†Ô∏è Skipped instrumental filter (empty or overlap)
‚ö†Ô∏è Skipped tempo filter (empty or overlap)
‚úÖ Final songs returned: 3


Unnamed: 0,Artist,Url_spotify,Track,Album,Danceability,Energy,Key,Loudness,Speechiness,Acousticness,Instrumentalness,Liveness,Valence,Tempo,Url_youtube,Title,Channel,Views,Likes,Comments,Description,Licensed,official_video,Stream,audio_vector,Gender,similarity
1147,Luis Fonsi,https://open.spotify.com/artist/4V8Sr092TqfHkf...,Despacito,VIDA,0.655,0.797,2.0,-4.787,0.153,0.198,0.0,0.067,0.839,177.928,https://www.youtube.com/watch?v=kJQP7kiw5Fk,Luis Fonsi - Despacito ft. Daddy Yankee,LuisFonsiVEVO,1.0,1.0,0.264425,‚ÄúDespacito‚Äù disponible ya en todas las platafo...,True,True,0.44488,"[0.839, 0.797, 0.655, 177.928, 0.198, 0.0]",male,0.999998
365,Daddy Yankee,https://open.spotify.com/artist/4VMYDCV2IEDYJA...,Despacito,VIDA,0.655,0.797,2.0,-4.787,0.153,0.198,0.0,0.067,0.839,177.928,https://www.youtube.com/watch?v=kJQP7kiw5Fk,Luis Fonsi - Despacito ft. Daddy Yankee,LuisFonsiVEVO,1.0,0.999999,0.264425,‚ÄúDespacito‚Äù disponible ya en todas las platafo...,True,True,0.44488,"[0.839, 0.797, 0.655, 177.928, 0.198, 0.0]",unknown,0.999998
14561,BTS,https://open.spotify.com/artist/3Nrfpe0tUJi4K4...,Dynamite,BE,0.746,0.765,6.0,-4.41,0.0993,0.0112,0.0,0.0936,0.737,114.044,https://www.youtube.com/watch?v=gdZLi9oWNZg,BTS (Î∞©ÌÉÑÏÜåÎÖÑÎã®) 'Dynamite' Official MV,HYBE LABELS,0.203096,0.706705,1.0,BTS (Î∞©ÌÉÑÏÜåÎÖÑÎã®) 'Dynamite' Official MV\n\n\nCredit...,True,True,0.467277,"[0.737, 0.765, 0.746, 114.044, 0.0112, 0.0]",unknown,0.999995


In [200]:
results = recommend_from_dataframe(parsed, df_clean)

üéµ Total songs before filtering: 19170
üéØ Applied mood ‚Üí Valence filter: 3971
‚ö†Ô∏è Skipped genre filter due to value or overlap.
‚ö†Ô∏è Skipped instrumental filter (empty or overlap)
üéØ Applied tempo filter: 1175
‚úÖ Final songs returned: 10


In [201]:
results

Unnamed: 0,Artist,Url_spotify,Track,Album,Danceability,Energy,Key,Loudness,Speechiness,Acousticness,Instrumentalness,Liveness,Valence,Tempo,Url_youtube,Title,Channel,Views,Likes,Comments,Description,Licensed,official_video,Stream,audio_vector,Gender,similarity
18568,Billie Eilish,https://open.spotify.com/artist/6qqNVTkY8uBg9c...,lovely (with Khalid),lovely (with Khalid),0.351,0.296,4.0,-10.109,0.0333,0.934,0.0,0.095,0.12,115.284,https://www.youtube.com/watch?v=V1Pl8CzNzCw,"Billie Eilish, Khalid - lovely",BillieEilishVEVO,0.213054,0.480938,0.034938,Listen to ‚Äúlovely‚Äù (with Khalid): http://smart...,True,True,0.623227,"[0.12, 0.296, 0.351, 115.284, 0.934, 0.0]",female,0.999953
140,Khalid,https://open.spotify.com/artist/6LuN9FCkKOj5Pc...,lovely (with Khalid),lovely (with Khalid),0.351,0.296,4.0,-10.109,0.0333,0.934,0.0,0.095,0.12,115.284,https://www.youtube.com/watch?v=V1Pl8CzNzCw,"Billie Eilish, Khalid - lovely",BillieEilishVEVO,0.213052,0.480931,0.034938,Listen to ‚Äúlovely‚Äù (with Khalid): http://smart...,True,True,0.623227,"[0.12, 0.296, 0.351, 115.284, 0.934, 0.0]",male,0.999953
12449,Ed Sheeran,https://open.spotify.com/artist/6eUKZXaKkcviH0...,Perfect,√∑ (Deluxe),0.599,0.448,8.0,-6.312,0.0232,0.163,0.0,0.106,0.168,95.05,https://www.youtube.com/watch?v=2Vv-BfVoq4g,Ed Sheeran - Perfect (Official Music Video),Ed Sheeran,0.415994,0.374749,0.030227,The official music video for Ed Sheeran - Perf...,True,True,0.68291,"[0.168, 0.448, 0.599, 95.05, 0.163, 0.0]",male,0.999994
11050,Willy William,https://open.spotify.com/artist/4RSyJzf7ef6Iu2...,Mi Gente,Vibras,0.548,0.704,11.0,-4.838,0.0777,0.0168,2.3e-05,0.143,0.288,104.666,https://www.youtube.com/watch?v=wnJ6LuUFpMo,"J Balvin, Willy William - Mi Gente (Official V...",jbalvinVEVO,0.393505,0.343504,0.032493,Listen to J Balvin‚Äôs top songs here: \nhttps:/...,True,True,0.379564,"[0.288, 0.704, 0.548, 104.666, 0.0168, 2.25e-05]",male,0.999999
16488,M√ò,https://open.spotify.com/artist/0bdfiayQAKewqE...,Lean On,Peace Is The Mission (Extended),0.723,0.809,7.0,-3.081,0.0625,0.00346,0.00123,0.565,0.274,98.007,https://www.youtube.com/watch?v=YqeW9_5kURI,Major Lazer & DJ Snake - Lean On (feat. M√ò) (O...,Major Lazer Official,0.411507,0.319187,0.02799,Major Lazer & DJ Snake - Lean On (feat. M√ò) (O...,False,False,0.505849,"[0.274, 0.809, 0.723, 98.007, 0.00346, 0.00123]",unknown,0.999997
15393,DJ Snake,https://open.spotify.com/artist/540vIaP2JwjQb9...,Lean On,Peace Is The Mission (Extended),0.723,0.809,7.0,-3.081,0.0625,0.00346,0.00123,0.565,0.274,98.007,https://www.youtube.com/watch?v=YqeW9_5kURI,Major Lazer & DJ Snake - Lean On (feat. M√ò) (O...,Major Lazer Official,0.411507,0.319187,0.02799,Major Lazer & DJ Snake - Lean On (feat. M√ò) (O...,False,False,0.505849,"[0.274, 0.809, 0.723, 98.007, 0.00346, 0.00123]",unknown,0.999997
61,Linkin Park,https://open.spotify.com/artist/6XyY86QOPPrYVG...,Numb,Meteora,0.496,0.863,9.0,-4.153,0.0381,0.0046,0.0,0.639,0.243,110.018,https://www.youtube.com/watch?v=kXYiU_JCYtU,Numb [Official Music Video] - Linkin Park,Linkin Park,0.238715,0.243002,0.034255,Watch the official music video for Numb by Lin...,True,True,0.352931,"[0.243, 0.863, 0.496, 110.018, 0.0046, 0.0]",unknown,0.999999
15255,The Weeknd,https://open.spotify.com/artist/1Xyo4u8uXC1ZmM...,The Hills,Beauty Behind The Madness,0.585,0.564,0.0,-7.063,0.0515,0.0671,0.0,0.135,0.137,113.003,https://www.youtube.com/watch?v=yzTuBuRdAyA,The Weeknd - The Hills (Official Video),TheWeekndVEVO,0.239357,0.231955,0.019159,The Weeknd - The Hills (Official Video)\nDownl...,True,True,0.520517,"[0.137, 0.564, 0.585, 113.003, 0.0671, 0.0]",female,0.999996
19966,j-hope,https://open.spotify.com/artist/0b1sIQumIAsNbq...,Chicken Noodle Soup (feat. Becky G),Chicken Noodle Soup (feat. Becky G),0.827,0.817,2.0,-4.081,0.0953,0.00496,1.2e-05,0.294,0.167,97.008,https://www.youtube.com/watch?v=i23NEQEFpgQ,j-hope 'Chicken Noodle Soup (feat. Becky G)' MV,HYBE LABELS,0.046386,0.231836,0.049194,j-hope 'Chicken Noodle Soup (feat. Becky G)' M...,True,True,0.047983,"[0.167, 0.817, 0.827, 97.008, 0.00496, 1.19e-05]",unknown,0.999994
16675,Alan Walker,https://open.spotify.com/artist/7vk5e3vY1uw9pl...,Alone,Different World,0.673,0.914,10.0,-3.962,0.0496,0.229,0.000478,0.186,0.183,97.021,https://www.youtube.com/watch?v=1-xGerv5FOk,Alan Walker - Alone,Alan Walker,0.160902,0.217394,0.025805,Click the link to listen to my latest album: \...,True,True,0.179973,"[0.183, 0.914, 0.673, 97.021, 0.229, 0.000478]",male,0.999995


In [216]:
results["Explanation"] = results.apply(lambda row: generate_explanation(user_input, row), axis=1)

In [217]:
results

Unnamed: 0,Artist,Url_spotify,Track,Album,Danceability,Energy,Key,Loudness,Speechiness,Acousticness,Instrumentalness,Liveness,Valence,Tempo,Url_youtube,Title,Channel,Views,Likes,Comments,Description,Licensed,official_video,Stream,audio_vector,Gender,similarity,Explanation
18568,Billie Eilish,https://open.spotify.com/artist/6qqNVTkY8uBg9c...,lovely (with Khalid),lovely (with Khalid),0.351,0.296,4.0,-10.109,0.0333,0.934,0.0,0.095,0.12,115.284,https://www.youtube.com/watch?v=V1Pl8CzNzCw,"Billie Eilish, Khalid - lovely",BillieEilishVEVO,0.213054,0.480938,0.034938,Listen to ‚Äúlovely‚Äù (with Khalid): http://smart...,True,True,0.623227,"[0.12, 0.296, 0.351, 115.284, 0.934, 0.0]",female,0.999945,"1. ""lovely"" is a heartfelt and melancholic son..."
140,Khalid,https://open.spotify.com/artist/6LuN9FCkKOj5Pc...,lovely (with Khalid),lovely (with Khalid),0.351,0.296,4.0,-10.109,0.0333,0.934,0.0,0.095,0.12,115.284,https://www.youtube.com/watch?v=V1Pl8CzNzCw,"Billie Eilish, Khalid - lovely",BillieEilishVEVO,0.213052,0.480931,0.034938,Listen to ‚Äúlovely‚Äù (with Khalid): http://smart...,True,True,0.623227,"[0.12, 0.296, 0.351, 115.284, 0.934, 0.0]",male,0.999945,"""lovely"" is a great fit for a breakup montage ..."
16258,AURORA,https://open.spotify.com/artist/1WgXqy2Dd70QQO...,Runaway,All My Demons Greeting Me as a Friend (Deluxe),0.459,0.276,11.0,-10.339,0.036,0.63,9.5e-05,0.104,0.128,114.169,https://www.youtube.com/watch?v=d_HlPboLRL8,AURORA - Runaway,iamAURORAVEVO,0.062888,0.190374,0.013156,Aurora's brand-new album The Gods We Can Touch...,True,True,0.188255,"[0.128, 0.276, 0.459, 114.169, 0.63, 9.5e-05]",unknown,0.99996,"This song, ""Runaway"" by AURORA, is an appropri..."
13520,Selena Gomez,https://open.spotify.com/artist/0C8ZW7ezQVs4UR...,Lose You To Love Me,Rare,0.488,0.343,4.0,-8.985,0.0436,0.556,0.0,0.21,0.0978,102.819,https://www.youtube.com/watch?v=zlJDTxahav0,Selena Gomez - Lose You To Love Me (Official M...,SelenaGomezVEVO,0.052497,0.169564,0.021886,"Get Selena's new album 'Rare', out now: http:/...",True,True,0.287106,"[0.0978, 0.343, 0.488, 102.819, 0.556, 0.0]",female,0.999966,This song is a good fit for a sad indie acoust...
19229,Jhayco,https://open.spotify.com/artist/6nVcHLIgY5pE2Y...,D√ÅKITI,EL √öLTIMO TOUR DEL MUNDO,0.731,0.573,4.0,-10.059,0.0544,0.401,5.2e-05,0.113,0.145,109.928,https://www.youtube.com/watch?v=TmKh7lAwnBI,BAD BUNNY x JHAY CORTEZ - D√ÅKITI (Video Oficial),Bad Bunny,0.154994,0.167239,0.016472,BAD BUNNY x JHAY CORTEZ\nD√ÅKITI | EL √öLTIMO TO...,True,True,0.47842,"[0.145, 0.573, 0.731, 109.928, 0.401, 5.22e-05]",unknown,0.999982,"""Bad Bunny and Jhay Cortez's ""Dakiti"" is a per..."


In [219]:
results['Explanation'].iloc[3]

"This song is a good fit for a sad indie acoustic breakup montage because it has an emotional and introspective tone, with Selena Gomez's heartfelt vocals and relatable lyrics about a painful breakup. The instrumental arrangement is simple and acoustic, adding to the song's emotional impact."

### 4 Implementation

#### 4.1 Testing corner for Gradio (in progress...)

In [26]:
!pip install gradio

Collecting gradio
  Downloading gradio-5.23.1-py3-none-any.whl.metadata (16 kB)
Collecting aiofiles<24.0,>=22.0 (from gradio)
  Downloading aiofiles-23.2.1-py3-none-any.whl.metadata (9.7 kB)
Collecting fastapi<1.0,>=0.115.2 (from gradio)
  Downloading fastapi-0.115.12-py3-none-any.whl.metadata (27 kB)
Collecting ffmpy (from gradio)
  Downloading ffmpy-0.5.0-py3-none-any.whl.metadata (3.0 kB)
Collecting gradio-client==1.8.0 (from gradio)
  Downloading gradio_client-1.8.0-py3-none-any.whl.metadata (7.1 kB)
Collecting groovy~=0.1 (from gradio)
  Downloading groovy-0.1.2-py3-none-any.whl.metadata (6.1 kB)
Collecting pydub (from gradio)
  Downloading pydub-0.25.1-py2.py3-none-any.whl.metadata (1.4 kB)
Collecting python-multipart>=0.0.18 (from gradio)
  Downloading python_multipart-0.0.20-py3-none-any.whl.metadata (1.8 kB)
Collecting ruff>=0.9.3 (from gradio)
  Downloading ruff-0.11.2-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (25 kB)
Collecting safehttpx<0.2.0,>=0.1.6 

In [97]:
import gradio as gr
import pandas as pd
import datetime

# Global feedback tracker
feedback_df = pd.DataFrame(columns=["Query", "Title", "Artist", "Feedback"])

# Save functions
def submit_feedback(query, title, artist, feedback_type):
    global feedback_df
    new_feedback = pd.DataFrame([{
        "Query": query,
        "Title": title,
        "Artist": artist,
        "Feedback": feedback_type
    }])
    feedback_df = pd.concat([feedback_df, new_feedback], ignore_index=True)
    return f"‚úÖ Feedback recorded for {title}: {feedback_type}"

def save_results(results_df):
    timestamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
    path = f"recommendations_{timestamp}.csv"
    results_df.to_csv(path, index=False)
    return path

def save_feedback():
    timestamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
    path = f"feedback_log_{timestamp}.csv"
    feedback_df.to_csv(path, index=False)
    return path

# Gradio UI
with gr.Blocks() as demo:
    gr.Markdown("## üéß Song Recommender with Feedback")

    # Input & Run
    user_input = gr.Textbox(label="What kind of song are you looking for?")
    btn_run = gr.Button("üîç Recommend Songs")

    # Output Table + State
    song_output = gr.Dataframe(label="üéµ Recommended Songs")
    results_state = gr.State()

    # Hidden Download Button
    download_btn = gr.Button("‚¨áÔ∏è Download Recommendations CSV", visible=False)
    download_file = gr.File()

    # Wire up Run button
    def run_query_and_show_download(input_text):
        df_display, df_full = handle_query(input_text)
        return df_display, df_full, gr.update(visible=True)

    btn_run.click(
        fn=run_query_and_show_download,
        inputs=[user_input],
        outputs=[song_output, results_state, download_btn]
    )

    download_btn.click(fn=save_results, inputs=[results_state], outputs=download_file)

    # Feedback Section (Hidden until triggered)
    gr.Markdown("### üß† Optional Feedback")
    toggle_feedback_btn = gr.Button("‚úçÔ∏è Give Feedback")
    feedback_group = gr.Group(visible=False)

    with feedback_group:
        with gr.Row():
            feedback_title = gr.Textbox(label="Song Title")
            feedback_artist = gr.Textbox(label="Artist")

        with gr.Row():
            btn_like = gr.Button("üëç Relevant")
            btn_dislike = gr.Button("üëé Not Relevant")

        feedback_response = gr.Textbox(label="Feedback Message")

    toggle_feedback_btn.click(lambda: gr.update(visible=True), None, outputs=[feedback_group])

    btn_like.click(
        fn=submit_feedback,
        inputs=[user_input, feedback_title, feedback_artist, gr.State("üëç")],
        outputs=feedback_response
    )

    btn_dislike.click(
        fn=submit_feedback,
        inputs=[user_input, feedback_title, feedback_artist, gr.State("üëé")],
        outputs=feedback_response
    )

demo.launch(share=True)

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://029a1ee6fd1f73665c.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)




#### 4.2 Temporary result

In [95]:
def handle_query(input_text):
    print("üì• User input received:", input_text)

    try:
        parsed = parse_user_input(input_text)
        print("‚úÖ Parsed:", parsed)

        ref_song = parsed.get("reference_song") or ""
        if ref_song.strip().lower() not in ["", "...", "none", "n/a"]:
            results = recommend_by_reference_song(parsed, df_clean, audio_feature_cols, llm, reference_song_prompt)
        else:
            results = recommend_from_dataframe(parsed, df_clean)


        results = results.copy()
        results["Explanation"] = results.apply(lambda row: generate_explanation(input_text, row), axis=1)
        print("‚úÖ Final results shape:", results.shape)

        return pd.DataFrame(results)

    except Exception as e:
        print("‚ùå ERROR:", e)
        return pd.DataFrame(columns=["Title", "Artist", "Explanation"]), pd.DataFrame()

In [None]:
query = input("What kind of song are you looking for? ")
handle_query(query)

In [96]:
handle_query("Give me 5 sad songs")

üì• User input received: Give me 5 sad songs
üß† Raw LLM Output:
 
You are an expert music assistant trained to recommend songs for use in creative projects like ads, short films, social media videos, and campaigns.

Your job is to extract structured information from the user's input so that we can recommend songs from our database based on Spotify audio features and YouTube metadata.

Only return the following fields in JSON format:
- "mood": overall emotion (e.g., sad, happy, dramatic)
- "context": what kind of scene or use (e.g., wedding, breakup scene, brand ad)
- "preference": how to sort (e.g., likes, views, popularity)
- "reference_song": name of a song the user wants similar songs to
- "genre": intended musical style (e.g., acoustic, pop, ambient)
- "instrumental": 'yes' if vocals aren't needed, otherwise 'no'
- "tempo": slow / medium / fast
- "artist": if they want something from a specific artist
- "gender": if they prefer male / female vocal
- "limit": how many results to 

Unnamed: 0,Artist,Url_spotify,Track,Album,Danceability,Energy,Key,Loudness,Speechiness,Acousticness,Instrumentalness,Liveness,Valence,Tempo,Url_youtube,Title,Channel,Views,Likes,Comments,Description,Licensed,official_video,Stream,audio_vector,Gender,similarity,Explanation
14580,Charlie Puth,https://open.spotify.com/artist/6VuMaDnrHyPL1p...,See You Again (feat. Charlie Puth),See You Again (feat. Charlie Puth),0.689,0.481,10.0,-7.503,0.0815,0.369,1e-06,0.0649,0.283,80.025,https://www.youtube.com/watch?v=RgKAFK5djSk,Wiz Khalifa - See You Again ft. Charlie Puth [...,Wiz Khalifa Music,0.71461,0.790485,0.132272,Download the new Furious 7 Soundtrack Deluxe V...,True,True,0.449208,"[0.283, 0.481, 0.689, 80.025, 0.369, 1.03e-06]",male,0.999984,This song is a good fit as it is a powerful an...
12469,Wiz Khalifa,https://open.spotify.com/artist/137W8MRPWKqSmr...,See You Again (feat. Charlie Puth),See You Again (feat. Charlie Puth),0.689,0.481,10.0,-7.503,0.0815,0.369,1e-06,0.0649,0.283,80.025,https://www.youtube.com/watch?v=RgKAFK5djSk,Wiz Khalifa - See You Again ft. Charlie Puth [...,Wiz Khalifa Music,0.71461,0.790484,0.132272,Download the new Furious 7 Soundtrack Deluxe V...,True,True,0.449208,"[0.283, 0.481, 0.689, 80.025, 0.369, 1.03e-06]",unknown,0.999984,"- Wiz Khalifa's ""See You Again"" is a powerful ..."
16668,Alan Walker,https://open.spotify.com/artist/7vk5e3vY1uw9pl...,Faded,Different World,0.468,0.627,6.0,-5.085,0.0476,0.0281,8e-06,0.11,0.159,179.642,https://www.youtube.com/watch?v=60ItHLz5WEA,Alan Walker - Faded,Alan Walker,0.420902,0.52071,0.077725,Click the link to listen to my latest album: \...,True,True,0.497022,"[0.159, 0.627, 0.468, 179.642, 0.0281, 7.97e-06]",male,0.999987,"1. ""Faded"" is a song by Norwegian DJ and recor..."
18568,Billie Eilish,https://open.spotify.com/artist/6qqNVTkY8uBg9c...,lovely (with Khalid),lovely (with Khalid),0.351,0.296,4.0,-10.109,0.0333,0.934,0.0,0.095,0.12,115.284,https://www.youtube.com/watch?v=V1Pl8CzNzCw,"Billie Eilish, Khalid - lovely",BillieEilishVEVO,0.213054,0.480938,0.034938,Listen to ‚Äúlovely‚Äù (with Khalid): http://smart...,True,True,0.623227,"[0.12, 0.296, 0.351, 115.284, 0.934, 0.0]",female,0.999959,"""lovely"" is a heart-wrenching collaboration be..."
140,Khalid,https://open.spotify.com/artist/6LuN9FCkKOj5Pc...,lovely (with Khalid),lovely (with Khalid),0.351,0.296,4.0,-10.109,0.0333,0.934,0.0,0.095,0.12,115.284,https://www.youtube.com/watch?v=V1Pl8CzNzCw,"Billie Eilish, Khalid - lovely",BillieEilishVEVO,0.213052,0.480931,0.034938,Listen to ‚Äúlovely‚Äù (with Khalid): http://smart...,True,True,0.623227,"[0.12, 0.296, 0.351, 115.284, 0.934, 0.0]",male,0.999959,"- ""lovely"" is a song by Billie Eilish and Khal..."


### 5 `requirements.txt`

In [10]:
requirements = """
gradio
pandas
transformers
scikit-learn
accelerate
torch
gender-guesser
"""

# Save to Drive

from google.colab import drive
drive.mount('/content/drive')

save_path = '/content/drive/MyDrive/mastering-ai/final-project-REA6KIZML-shavira/requirements.txt'

with open(save_path, "w") as f:
    f.write(requirements.strip())

print(f"‚úÖ requirements.txt saved to: {save_path}")

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
‚úÖ requirements.txt saved to: /content/drive/MyDrive/mastering-ai/final-project-REA6KIZML-shavira/requirements.txt


## Submit Notebook (for submit final project)

In [None]:
portfolio_link = ""
presentation_link = ""

question_id = "01_portfolio_link"
submit(student_id, name, assignment_id, str(portfolio_link), question_id, drive_link)

question_id = "02_presentation_link"
submit(student_id, name, assignment_id, str(presentation_link), question_id, drive_link)

# FIN