In this group project, you’ll use machine learning skills to create your own prediction model in Google Colab. Your group will pick a fun topic you’re interested in, choose a yes/no question, build a dataset, train a model, and test it with another group’s inputs in front of the class!
Why does this matter? Machine learning starts with a question you want to answer, like predicting if something will happen or if someone will like something. Choosing your own topic and question makes this project creative and exciting!
Your role in this step: As a group, brainstorm a yes/no prediction question about any topic you like—sports, entertainment, nature, or anything else that sparks your interest!
What’s the big picture? A dataset is a table of information with features (like sunlight hours) and an outcome (like plant growth) to train your model.
What’s your job here? Your group will invent a small dataset (10-15 rows) based on your question, making up realistic data for different scenarios, objects, or people.
import pandas as pd
# Create your dataset (edit this!)
data = {
'Feature1': [10, 8, 5, 12, 3, 9, 7, 4, 11, 6], # e.g., Sunlight Hours
'Feature2': [7, 6, 5, 8, 4, 7, 6, 5, 8, 6], # e.g., Water Amount
'Feature3': [1, 1, 0, 1, 0, 1, 0, 1, 1, 0], # e.g., Good Soil (0=No, 1=Yes)
'Outcome': [1, 1, 0, 1, 0, 1, 0, 1, 1, 0] # e.g., Grows Well (0=No, 1=Yes)
}
df = pd.DataFrame(data)
# Show the dataset
df
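Before moving on, it can help to double-check the dataset you invented. The sketch below is one possible check, not a required step: it reuses the sample columns from above and confirms that every column has the same number of values and that the outcome only contains 0 and 1. The names `lengths`, `same_length`, and `outcome_ok` are made up for this example.

```python
import pandas as pd

# Same sample data as above; swap in your own columns.
data = {
    'Feature1': [10, 8, 5, 12, 3, 9, 7, 4, 11, 6],
    'Feature2': [7, 6, 5, 8, 4, 7, 6, 5, 8, 6],
    'Feature3': [1, 1, 0, 1, 0, 1, 0, 1, 1, 0],
    'Outcome':  [1, 1, 0, 1, 0, 1, 0, 1, 1, 0],
}

# Every column needs the same number of values,
# otherwise pd.DataFrame raises an error.
lengths = {name: len(values) for name, values in data.items()}
same_length = len(set(lengths.values())) == 1
print("Values per column:", lengths)

df = pd.DataFrame(data)

# The outcome should only contain 0 (No) and 1 (Yes).
outcome_ok = bool(df['Outcome'].isin([0, 1]).all())

# Count the Yes and No rows; the model needs examples of both.
print(df['Outcome'].value_counts())
print("Same length:", same_length, "| Outcome is 0/1:", outcome_ok)
```

If one column is longer than the others, count the values in each list and add or remove entries until they match.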
Why is this step key? Exploring your data helps you spot patterns, like whether more sunlight helps plants grow, so you understand what your model will learn.
Your task: Summarize your dataset and plot one feature against the outcome to look for connections.
# Show summary statistics
df.describe()
Plot one feature against the outcome (replace Feature1 with your feature’s name):
import matplotlib.pyplot as plt
# Scatter plot
plt.scatter(df['Feature1'], df['Outcome'], color='green')
plt.xlabel('Feature1') # e.g., Sunlight Hours
plt.ylabel('Outcome (1=Yes, 0=No)')
plt.title('Feature1 vs. Outcome')
plt.show()
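A scatter plot with only 0 and 1 on the vertical axis can be hard to read. As an optional extra, the sketch below compares the average of each feature for the "No" rows and the "Yes" rows using `groupby`. It assumes the sample column names from earlier; `group_means` is a name chosen just for this example.

```python
import pandas as pd

# Same sample data as above; replace with your own dataset.
df = pd.DataFrame({
    'Feature1': [10, 8, 5, 12, 3, 9, 7, 4, 11, 6],
    'Feature2': [7, 6, 5, 8, 4, 7, 6, 5, 8, 6],
    'Feature3': [1, 1, 0, 1, 0, 1, 0, 1, 1, 0],
    'Outcome':  [1, 1, 0, 1, 0, 1, 0, 1, 1, 0],
})

# Average value of each feature, separately for Outcome = 0 and Outcome = 1
group_means = df.groupby('Outcome').mean()
print(group_means)
```

With the sample data, the average of Feature1 is 5.25 for the "No" rows and 9.0 for the "Yes" rows, which hints that a higher Feature1 goes with a "Yes" outcome.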
What’s the goal? You’ll train a Decision Tree model to learn patterns in your data and test how well it predicts your outcome.
Your role in this step: Prepare your data, train the model, and check its accuracy.
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
# Features and target
X = df[['Feature1', 'Feature2', 'Feature3']] # Edit with your feature names
y = df['Outcome']
# Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Train the model
model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)
# Predict on the test set
y_pred = model.predict(X_test)
# Show predictions and accuracy
print("Test set features:\n", X_test)
print("Predictions:", y_pred)
print("Actual:", y_test.values)
accuracy = accuracy_score(y_test, y_pred)
print(f"Model accuracy: {accuracy * 100:.1f}%")
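To compare several split sizes in one run, you could loop over a few `test_size` values. This is only a sketch using the sample data and column names from above. The names `trial_model`, `X_tr`, `X_te`, `y_tr`, `y_te`, and `results` are made up here so the loop does not overwrite the `model` you already trained.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Same sample data as above; replace with your own dataset.
df = pd.DataFrame({
    'Feature1': [10, 8, 5, 12, 3, 9, 7, 4, 11, 6],
    'Feature2': [7, 6, 5, 8, 4, 7, 6, 5, 8, 6],
    'Feature3': [1, 1, 0, 1, 0, 1, 0, 1, 1, 0],
    'Outcome':  [1, 1, 0, 1, 0, 1, 0, 1, 1, 0],
})
X = df[['Feature1', 'Feature2', 'Feature3']]
y = df['Outcome']

results = {}
for test_size in [0.2, 0.3, 0.4]:
    # Split, train, and score a fresh model for each test size
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=test_size, random_state=42
    )
    trial_model = DecisionTreeClassifier(random_state=42)
    trial_model.fit(X_tr, y_tr)
    results[test_size] = accuracy_score(y_te, trial_model.predict(X_te))
    print(f"test_size={test_size}: {len(X_te)} test rows, "
          f"accuracy {results[test_size] * 100:.1f}%")
```

With only 10 to 15 rows, each test set holds just a few rows, so one wrong prediction can change the accuracy a lot.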
Try changing test_size (e.g., 0.2 or 0.4) to see how accuracy changes.
What’s exciting about this? You’ll share your model by predicting outcomes for another group’s inputs.
Your task: Write code to take inputs from another group and plan your class presentation.
# Get inputs from another group
print("Enter details for prediction:")
feature1 = float(input("Feature1 (e.g., Sunlight Hours): "))
feature2 = float(input("Feature2 (e.g., Water Amount): "))
feature3 = float(input("Feature3 (e.g., Good Soil, 0=No, 1=Yes): "))
# Create new data
new_data = pd.DataFrame({
'Feature1': [feature1],
'Feature2': [feature2],
'Feature3': [feature3]
})
# Predict
prediction = model.predict(new_data)
print("Prediction:", "Yes" if prediction[0] == 1 else "No") # Edit "Yes/No" to match your question
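If you would rather not type values during the presentation, one option is to wrap the prediction in a small function and call it with hardcoded numbers. The sketch below is an assumption about how you might organize this, not a required part of the project. `predict_yes_no` is a made-up helper name, and the model here is retrained on all sample rows only so the snippet runs on its own. In your notebook you can reuse the `model` you already trained.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Same sample data as above; replace with your own dataset.
df = pd.DataFrame({
    'Feature1': [10, 8, 5, 12, 3, 9, 7, 4, 11, 6],
    'Feature2': [7, 6, 5, 8, 4, 7, 6, 5, 8, 6],
    'Feature3': [1, 1, 0, 1, 0, 1, 0, 1, 1, 0],
    'Outcome':  [1, 1, 0, 1, 0, 1, 0, 1, 1, 0],
})
feature_names = ['Feature1', 'Feature2', 'Feature3']

# Assumption: train on every row so this example is self-contained.
model = DecisionTreeClassifier(random_state=42)
model.fit(df[feature_names], df['Outcome'])

def predict_yes_no(feature1, feature2, feature3):
    """Return "Yes" or "No" for one set of feature values."""
    new_data = pd.DataFrame({
        'Feature1': [feature1],
        'Feature2': [feature2],
        'Feature3': [feature3],
    })
    prediction = model.predict(new_data)
    return "Yes" if prediction[0] == 1 else "No"

# Hardcoded values from another group (edit these!)
print("Prediction:", predict_yes_no(8, 6, 1))
```

Calling the function again with different numbers lets you test several of the other group’s suggestions quickly.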
If input() doesn’t work in Colab, use hardcoded values (e.g., feature1 = 8) for the presentation or ask your instructor.
Why present your work? Sharing your model helps you explain what you learned and see how others tackled their projects, like scientists sharing discoveries.
What are you doing? Present your model, test it with another group’s inputs, and reflect on your project to deepen your understanding of machine learning.
Show your dataset (run df in Colab).
Save your notebook (File > Save) and upload it to the Google Drive link as a .ipynb file.
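When you explain your model to the class, it may help to show the rules the Decision Tree learned. The sketch below uses scikit-learn’s `export_text` to print them as plain text. It assumes the sample data and column names from earlier and retrains on all rows only so the snippet runs on its own. `rules` is a name chosen for this example.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Same sample data as above; replace with your own dataset.
df = pd.DataFrame({
    'Feature1': [10, 8, 5, 12, 3, 9, 7, 4, 11, 6],
    'Feature2': [7, 6, 5, 8, 4, 7, 6, 5, 8, 6],
    'Feature3': [1, 1, 0, 1, 0, 1, 0, 1, 1, 0],
    'Outcome':  [1, 1, 0, 1, 0, 1, 0, 1, 1, 0],
})
feature_names = ['Feature1', 'Feature2', 'Feature3']

# Assumption: train on every row so this example is self-contained.
model = DecisionTreeClassifier(random_state=42)
model.fit(df[feature_names], df['Outcome'])

# Print the tree as indented if/else rules
rules = export_text(model, feature_names=feature_names)
print(rules)
```

Each indented line is a question the tree asks about one feature, and each "class" line is the final Yes (1) or No (0) answer for that branch.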