Machine Learning Group Project: Create and Test Your Own Model

In this group project, you’ll use machine learning skills to create your own prediction model in Google Colab. Your group will pick a fun topic you’re interested in, choose a yes/no question, build a dataset, train a model, and test it with another group’s inputs in front of the class!

1. Choose Your Question

Why does this matter? Machine learning starts with a question you want to answer, like predicting if something will happen or if someone will like something. Choosing your own topic and question makes this project creative and exciting!

Your role in this step: As a group, brainstorm a yes/no prediction question about any topic you like: sports, entertainment, nature, or anything else that sparks your interest!

Here are some example questions:

- Will a student get an A in math? (Based on study hours, sleep, etc.)
- Will a movie be a hit? (Based on genre, budget, etc.)
- Will a plant grow well? (Based on sunlight, water, etc.)
- Will someone enjoy a new video game? (Based on age, gaming hours, etc.)

Write down your question.

Pick a question with a clear yes/no answer (e.g., 0 = No, 1 = Yes) and factors you can measure, like numbers (hours, budget) or categories (yes/no, genre). Ask your instructor if you need ideas!

2. Create Your Dataset

What’s the big picture? A dataset is a table of information with features (like sunlight hours) and an outcome (like plant growth) to train your model.

What’s your job here? Your group will invent a small dataset (15 rows) based on your question, making up realistic data for different scenarios, objects, or people.

Decide on at least 7 features that affect your prediction (e.g., for “Will a plant grow well?”: Sunlight Hours, Water Amount, Soil Type).
Create a table with 15 rows, each showing a scenario with a yes/no outcome (0 = No, 1 = Yes).
Use this code template in a new Colab notebook (name it “Group [name] ML Project”):

import pandas as pd

# Create your dataset (edit this!)
data = {
    'Feature1': [10, 8, 5, 12, 3, 9, 7, 4, 11, 6],  # e.g., Sunlight Hours
    'Feature2': [7, 6, 5, 8, 4, 7, 6, 5, 8, 6],   # e.g., Water Amount
    'Feature3': [1, 1, 0, 1, 0, 1, 0, 1, 1, 0],   # e.g., Good Soil (0=No, 1=Yes)
    'Outcome': [1, 1, 0, 1, 0, 1, 0, 1, 1, 0]     # e.g., Grows Well (0=No, 1=Yes)
}
df = pd.DataFrame(data)

# Show the dataset
df

Copy this code and paste it into ChatGPT along with your question and chosen features, then ask ChatGPT to generate dummy data. As a group, edit the code to match your question, features, and data (15 rows). Run it to see your table. Discuss: Does the data make sense for your question?

Use numbers for features like hours or amounts, and 0/1 for yes/no features (like Good Soil). Make sure your outcome is 0 or 1. Show this to your instructor before proceeding to the next step.
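
Before you show your table to your instructor, a quick check like the one below may help. It is only a sketch: the helper name check_dataset and the example table example_df are made up for illustration, and the sketch assumes your outcome column is called 'Outcome'. In your notebook, you could run the helper on your own df instead.

```python
import pandas as pd

# Hypothetical example table so this cell runs on its own;
# in your notebook, pass your own df to check_dataset instead.
example_df = pd.DataFrame({
    'Feature1': [10, 8, 5, 12, 3],   # e.g., Sunlight Hours
    'Feature3': [1, 1, 0, 1, 0],     # e.g., Good Soil (0=No, 1=Yes)
    'Outcome':  [1, 1, 0, 1, 0],     # e.g., Grows Well (0=No, 1=Yes)
})

def check_dataset(table, outcome_col='Outcome'):
    """Return a list of problems; an empty list means the basic checks passed."""
    problems = []
    # Every column should hold numbers, not text
    for col in table.columns:
        if not pd.api.types.is_numeric_dtype(table[col]):
            problems.append(f"Column '{col}' is not numeric")
    # The outcome should only contain 0 or 1
    if not table[outcome_col].isin([0, 1]).all():
        problems.append(f"Column '{outcome_col}' has values other than 0 and 1")
    return problems

problems = check_dataset(example_df)
print("Problems found:", problems if problems else "none")
```

If the list is not empty, fix those columns before moving on.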

3. Explore Your Data

Why is this step key? Exploring your data helps you spot patterns, like whether more sunlight helps plants grow, so you understand what your model will learn.

Your task: Summarize your dataset and plot one feature against the outcome to look for connections.

Add a new code cell and paste this code for statistics:

# Show summary statistics
df.describe()

Add another code cell for a scatter plot (replace Feature1 with your feature’s name):

import matplotlib.pyplot as plt

# Scatter plot
plt.scatter(df['Feature1'], df['Outcome'], color='green')
plt.xlabel('Feature1')  # e.g., Sunlight Hours
plt.ylabel('Outcome (1=Yes, 0=No)')
plt.title('Feature1 vs. Outcome')
plt.show()

Run both cells. What’s the average of one feature (e.g., Sunlight Hours)? Does the plot show a pattern (e.g., does more of Feature1 mean Outcome=1)? Try plotting another feature to spot trends.
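
Another way to look for patterns is to compare the average of each feature for the "Yes" rows and the "No" rows. The sketch below uses the template data under the made-up name example_df; in your notebook, you could apply the same groupby line to your own df.

```python
import pandas as pd

# Template data from Step 2, stored under a hypothetical name
# so it does not overwrite your own df.
example_df = pd.DataFrame({
    'Feature1': [10, 8, 5, 12, 3, 9, 7, 4, 11, 6],  # e.g., Sunlight Hours
    'Feature2': [7, 6, 5, 8, 4, 7, 6, 5, 8, 6],     # e.g., Water Amount
    'Feature3': [1, 1, 0, 1, 0, 1, 0, 1, 1, 0],     # e.g., Good Soil (0=No, 1=Yes)
    'Outcome':  [1, 1, 0, 1, 0, 1, 0, 1, 1, 0],     # e.g., Grows Well (0=No, 1=Yes)
})

# Average of every feature, calculated separately for Outcome=0 and Outcome=1
group_means = example_df.groupby('Outcome').mean()
print(group_means)
```

A big gap between the two rows for a feature suggests that feature may help the model tell "Yes" from "No".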

4. Train and Test Your Model

What’s the goal? You’ll train a Decision Tree model to learn patterns in your data and test how well it predicts your outcome.

Your role in this step: Prepare your data, train the model, and check its accuracy.

Add a new code cell and paste this code (edit feature names to match your dataset):

from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Features and target
X = df[['Feature1', 'Feature2', 'Feature3']]  # Edit with your feature names
y = df['Outcome']

# Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train the model
model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)

# Predict on the test set
y_pred = model.predict(X_test)

# Show predictions and accuracy
print("Test set features:\n", X_test)
print("Predictions:", y_pred)
print("Actual:", y_test.values)
accuracy = accuracy_score(y_test, y_pred)
print(f"Model accuracy: {accuracy * 100:.1f}%")  # Rounded to one decimal place

Run the code. How many rows are in the test set? Do the predictions match the actual outcomes? Look at the test set features to guess why the model made these predictions. Try changing test_size (e.g., 0.2 or 0.4) to see how accuracy changes.
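
To help you guess why the model made its predictions, scikit-learn can print a decision tree's rules as text with export_text. The sketch below is one possible way to do it. It trains a tree on all rows of the template data purely for illustration, and the names example_df and example_model are made up; in your notebook, you could call export_text on the model you already trained.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Template data from Step 2, stored under a hypothetical name
example_df = pd.DataFrame({
    'Feature1': [10, 8, 5, 12, 3, 9, 7, 4, 11, 6],  # e.g., Sunlight Hours
    'Feature2': [7, 6, 5, 8, 4, 7, 6, 5, 8, 6],     # e.g., Water Amount
    'Feature3': [1, 1, 0, 1, 0, 1, 0, 1, 1, 0],     # e.g., Good Soil (0=No, 1=Yes)
    'Outcome':  [1, 1, 0, 1, 0, 1, 0, 1, 1, 0],     # e.g., Grows Well (0=No, 1=Yes)
})
feature_names = ['Feature1', 'Feature2', 'Feature3']

# Train a tree on every row (illustration only, no train/test split here)
example_model = DecisionTreeClassifier(random_state=42)
example_model.fit(example_df[feature_names], example_df['Outcome'])

# Print the tree's rules: each line is a question the tree asks about a feature
rules = export_text(example_model, feature_names=feature_names)
print(rules)
```

Each "class" line at the end of a branch is the prediction the tree gives for rows that follow that branch.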

You’re using 70% of your data to train and 30% to test. With a small dataset, accuracy might vary a lot!
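
To see this variation for yourself, you could repeat the split several times with different random_state values and compare the results. This is a rough sketch using the template data; the names example_df, trial_model, and accuracies are made up so they do not overwrite anything in your notebook.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Template data from Step 2, stored under a hypothetical name
example_df = pd.DataFrame({
    'Feature1': [10, 8, 5, 12, 3, 9, 7, 4, 11, 6],  # e.g., Sunlight Hours
    'Feature2': [7, 6, 5, 8, 4, 7, 6, 5, 8, 6],     # e.g., Water Amount
    'Feature3': [1, 1, 0, 1, 0, 1, 0, 1, 1, 0],     # e.g., Good Soil (0=No, 1=Yes)
    'Outcome':  [1, 1, 0, 1, 0, 1, 0, 1, 1, 0],     # e.g., Grows Well (0=No, 1=Yes)
})
X = example_df[['Feature1', 'Feature2', 'Feature3']]
y = example_df['Outcome']

accuracies = []
for seed in range(5):
    # A different random_state puts different rows in the test set
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=seed
    )
    trial_model = DecisionTreeClassifier(random_state=42)
    trial_model.fit(X_train, y_train)
    acc = accuracy_score(y_test, trial_model.predict(X_test))
    accuracies.append(acc)
    print(f"Split {seed}: accuracy = {acc * 100:.0f}%")

print(f"Lowest: {min(accuracies) * 100:.0f}%, highest: {max(accuracies) * 100:.0f}%")
```

With only a few test rows, one wrong prediction changes the accuracy by a large amount, which is why the numbers can jump between splits.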

5. Prepare to Test with Another Group

What’s exciting about this? You’ll share your model by predicting outcomes for another group’s inputs.

Your task: Write code to take inputs from another group and plan your class presentation.

Add a new code cell and paste this code (edit feature names and prompts):

# Get inputs from another group
print("Enter details for prediction:")
feature1 = float(input("Feature1 (e.g., Sunlight Hours): "))
feature2 = float(input("Feature2 (e.g., Water Amount): "))
feature3 = float(input("Feature3 (e.g., Good Soil, 0=No, 1=Yes): "))

# Put the inputs into a one-row table with the same column names used in training
new_input = pd.DataFrame({
    'Feature1': [feature1],
    'Feature2': [feature2],
    'Feature3': [feature3]
})

# Predict
prediction = model.predict(new_input)
print("Prediction:", "Yes" if prediction[0] == 1 else "No")  # Edit "Yes/No" to match your question

Test the code with sample inputs to check it works. Discuss with your group: What will you tell the class about your question, dataset, and model before testing their inputs? Practice explaining it clearly.

If input() doesn’t work in Colab, use hardcoded values (e.g., feature1 = 8) for the presentation or ask your instructor.
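
A hardcoded version might look like the sketch below. It trains a small tree on the template data so the cell runs on its own; the names example_df, example_model, and new_input, along with the values 8, 6, and 1, are placeholders. In your notebook, you would keep your own trained model and type in the other group's values.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Template data from Step 2, stored under a hypothetical name
example_df = pd.DataFrame({
    'Feature1': [10, 8, 5, 12, 3, 9, 7, 4, 11, 6],  # e.g., Sunlight Hours
    'Feature2': [7, 6, 5, 8, 4, 7, 6, 5, 8, 6],     # e.g., Water Amount
    'Feature3': [1, 1, 0, 1, 0, 1, 0, 1, 1, 0],     # e.g., Good Soil (0=No, 1=Yes)
    'Outcome':  [1, 1, 0, 1, 0, 1, 0, 1, 1, 0],     # e.g., Grows Well (0=No, 1=Yes)
})
feature_names = ['Feature1', 'Feature2', 'Feature3']
example_model = DecisionTreeClassifier(random_state=42)
example_model.fit(example_df[feature_names], example_df['Outcome'])

# Hardcoded values instead of input(); replace with the other group's answers
feature1 = 8   # e.g., Sunlight Hours
feature2 = 6   # e.g., Water Amount
feature3 = 1   # e.g., Good Soil (0=No, 1=Yes)

# One-row table with the same column names used for training
new_input = pd.DataFrame({
    'Feature1': [feature1],
    'Feature2': [feature2],
    'Feature3': [feature3],
})

prediction = example_model.predict(new_input)
answer = "Yes" if prediction[0] == 1 else "No"
print("Prediction:", answer)
```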

6. Present and Reflect

Why present your work? Sharing your model helps you explain what you learned and see how others tackled their projects, like scientists sharing discoveries.

What are you doing? Present your model, test it with another group’s inputs, and reflect on your project to deepen your understanding of machine learning.

Present to the class:
- Explain your question and why you chose it.
- Show your dataset (run df in Colab).
- Share your model’s accuracy and any patterns you found.
- Ask another group for inputs, run the prediction code, and discuss the result.

Add a text cell in Colab or write on paper to reflect:
- What patterns did you find in your data?
- What was the hardest part of this project? How did your group solve it?

Present your project to the class, test your model with another group’s inputs, and answer the reflection questions. Save your notebook (File > Save), download it as a .ipynb file (File > Download > Download .ipynb), and upload it to the Google Drive link.