What is a Cartesian Product?

The Cartesian product of two sets $A$ and $B$, denoted $A \times B$, is the set of all ordered pairs $(a, b)$ where $a \in A$ and $b \in B$.

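Example: Let, for instance, $A = \{1, 2\}$ and $B = \{x, y\}$.

Then $A \times B = \{(1, x), (1, y), (2, x), (2, y)\}$.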

Its size grows multiplicatively: if $|A| = m$ and $|B| = n$, then $|A \times B| = m \cdot n$.
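The definition and the size rule can be checked with a short Python sketch. The sets `A` and `B` below are illustrative values chosen for this example; `itertools.product` is the standard-library function that enumerates ordered pairs.

```python
from itertools import product

# Illustrative sets (assumed for this sketch).
A = {1, 2, 3}
B = {"x", "y"}

# Every ordered pair (a, b) with a in A and b in B.
pairs = set(product(A, B))

# The size of the product equals the product of the sizes.
assert len(pairs) == len(A) * len(B)
print(len(pairs))  # 6
```

Order matters in each pair: `(1, "x")` is in the result, while `("x", 1)` is not.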

How is the Cartesian Product Used with Categorical Data? Feature Crosses

🧠 Feature crosses (also known as combinatorial features) are new features created by combining two or more categorical features into a single categorical feature whose levels are the joint levels (interactions) of the originals.

🚀 Example: Suppose you have two categorical features:

• Country = {US, UK}
• Device = {Mobile, Desktop}

A feature cross (i.e., Cartesian product) would generate: {(US, Mobile), (US, Desktop), (UK, Mobile), (UK, Desktop)}

Which becomes a new categorical feature with levels like:

• US_Mobile
• US_Desktop
• UK_Mobile
• UK_Desktop
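A minimal sketch of building this crossed feature in plain Python. The helper name `cross` and the sample `rows` are hypothetical, introduced only for illustration; the `_` separator follows the naming used above.

```python
from itertools import product

countries = ["US", "UK"]
devices = ["Mobile", "Desktop"]

# All joint levels the crossed feature can take.
cross_vocab = [f"{c}_{d}" for c, d in product(countries, devices)]

def cross(country: str, device: str) -> str:
    """Combine one row's two categorical values into a single crossed value."""
    return f"{country}_{device}"

# Hypothetical rows of (Country, Device) data.
rows = [("US", "Mobile"), ("UK", "Desktop"), ("US", "Desktop")]
crossed = [cross(c, d) for c, d in rows]

print(cross_vocab)  # ['US_Mobile', 'US_Desktop', 'UK_Mobile', 'UK_Desktop']
print(crossed)      # ['US_Mobile', 'UK_Desktop', 'US_Desktop']
```

The vocabulary is the full Cartesian product, while the data may contain only some of those combinations.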

You can then:

• One-hot encode these combined categories
• Feed them to tree-based models, embedding layers, or wide & deep models
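One-hot encoding the crossed categories could look like the sketch below. It assumes the vocabulary is known up front and kept in a fixed order; the function name `one_hot` is illustrative, and real pipelines would usually rely on a library encoder instead.

```python
# Vocabulary of crossed categories, in a fixed order.
vocab = ["US_Mobile", "US_Desktop", "UK_Mobile", "UK_Desktop"]
index = {value: i for i, value in enumerate(vocab)}

def one_hot(value: str) -> list[int]:
    """Return a vector with a single 1 at the position of `value`."""
    vector = [0] * len(vocab)
    vector[index[value]] = 1
    return vector

print(one_hot("UK_Mobile"))  # [0, 0, 1, 0]
```

Each crossed category gets its own column, so the vector length equals the number of joint levels.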

Why use Feature Crosses?

  1. Capture interactions between categories that have joint effects (e.g., “users in US on mobile” behave differently from “users in UK on desktop”).
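  2. Improve model accuracy, especially for linear models such as logistic regression, which combine features additively and cannot represent a joint effect unless it is supplied as its own feature. Neural nets can learn interactions on their own, but an explicit cross can make them easier to learn.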
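The limitation of a purely additive model can be sketched with a toy score table. All numbers below are made up for illustration, and the scores stand in for the linear part of a model (for logistic regression, the log-odds). For any additive table of the form `w_country + w_device`, the quantity computed by the hypothetical helper `additive_gap` is always zero, so a target with a nonzero gap cannot be matched without a cross.

```python
from itertools import product

countries = ["US", "UK"]
devices = ["Mobile", "Desktop"]

# Hypothetical target scores with a joint effect: only US + Mobile is high.
target = {("US", "Mobile"): 3, ("US", "Desktop"): 0,
          ("UK", "Mobile"): 0, ("UK", "Desktop"): 0}

def additive_gap(table):
    """Zero for every additive table; a nonzero value signals an interaction."""
    return (table[("US", "Mobile")] + table[("UK", "Desktop")]
            - table[("US", "Desktop")] - table[("UK", "Mobile")])

# An additive model has gap 0 whatever its weights are (arbitrary weights here).
w_country = {"US": 2, "UK": -1}
w_device = {"Mobile": 5, "Desktop": 1}
additive = {(c, d): w_country[c] + w_device[d]
            for c, d in product(countries, devices)}
print(additive_gap(additive))  # 0

# The target has a nonzero gap, so no additive weights reproduce it exactly.
print(additive_gap(target))  # 3

# With one weight per crossed category, the table is matched exactly.
w_cross = {f"{c}_{d}": target[(c, d)] for c, d in product(countries, devices)}
crossed_model = {(c, d): w_cross[f"{c}_{d}"]
                 for c, d in product(countries, devices)}
assert crossed_model == target
```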

⚠️ Trade-Offs:

Pros | Cons
Can improve expressiveness | High cardinality explosion (combinatorial)
Captures important patterns | May lead to overfitting or sparsity
Especially useful in Wide & Deep Models | Need embedding or hashing to manage
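The cardinality explosion is simple arithmetic: the number of possible crossed categories is the product of the individual cardinalities. The feature names and counts below are hypothetical.

```python
from math import prod

# Hypothetical number of distinct levels per feature.
cardinalities = {"country": 200, "device": 3, "hour_of_day": 24}

# Upper bound on the number of categories in the full cross.
crossed_levels = prod(cardinalities.values())
print(crossed_levels)  # 14400
```

Three modest features already yield thousands of possible one-hot columns, many of which may rarely or never appear in the data.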

In Practice:

• Manual Crosses: You select features to cross based on domain knowledge.
• Automated Crosses: Use library utilities such as tf.feature_column.crossed_column (TensorFlow), or let embedding layers in a deep learning model capture the interactions.
• Hashed Crosses: Avoid exploding dimensions by hashing crossed features into a fixed number of buckets.
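A hashed cross can be sketched with the standard library alone. This is an illustration of the idea, not the hashing scheme any particular library uses; the function name `hashed_cross` and the bucket count of 16 are assumptions.

```python
import zlib

NUM_BUCKETS = 16  # assumed bucket count; tune for the data at hand

def hashed_cross(*values: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Map a combination of categorical values to a bucket index in [0, num_buckets)."""
    key = "_".join(values).encode("utf-8")
    # zlib.crc32 is deterministic across runs, unlike the built-in hash() on str.
    return zlib.crc32(key) % num_buckets

bucket = hashed_cross("US", "Mobile")
assert 0 <= bucket < NUM_BUCKETS
# The same combination always lands in the same bucket.
assert bucket == hashed_cross("US", "Mobile")
```

The feature dimension stays fixed at `NUM_BUCKETS` no matter how many combinations exist. The cost is that distinct combinations can collide in the same bucket, and collisions become more likely as the bucket count shrinks.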

Resources

  • Google > Machine Learning > Crash Course > Working with categorical data > Feature Crosses