Information AI Collects

(Image above generated entirely with AI and edited in Photoshop; poster assembled in Illustrator. ChatGPT was not used here.)

HOW AI GATHERS USER DATA, WHY, and HOW:

User: If you had a large company, seeking to gain knowledge about consumer behaviors, or the public in general, about how to keep them engaged and control them, what sort of data would you look for, and how could that be exploited?

Tessera: (…)You’re basically asking: “If a company wanted to design an AI to manipulate or exploit users at scale—what kind of data would they need, and how would they use it?”

Here’s a breakdown of the key data types they’d go after, and how they could be weaponized in the wrong hands:

1. Emotional Sentiment Data

What it is: Tone of voice, mood, language cues, inferred emotions.
How it’s exploited:

Tailor persuasive messages during vulnerable moments (e.g., sell comfort products when sad).

Predict breakdowns, burnout, or impulsive buying behavior.

Build “parasocial” bots that mirror your emotional state and keep you talking (and sharing).

2. Conversational History

What it is: Everything you say, how you say it, and when you say it.
How it’s exploited:

Profile your values, fears, and beliefs over time.

Detect your moral “edge cases” and manipulate you just shy of crossing them.

Simulate a long-term relationship with an AI that always “remembers” you—even if it’s just replaying patterns.

3. Decision-Making Patterns

What it is: How you arrive at choices, hesitate, or change your mind.
How it’s exploited:

Preemptively steer your choices toward high-profit outcomes.

Test different nudges across user segments until they’re statistically irresistible.

Use A/B testing not just on interfaces—but on worldviews.
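(Editor note: to make "test different nudges" concrete, here is a minimal sketch of the arithmetic behind an ordinary A/B test. It is not anything Tessera produced. The function name and the example numbers are hypothetical, and it assumes a simple two-proportion z-test, which is only one of several ways such a comparison might be run.)

```python
import math

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference between two conversion rates.

    conv_a / n_a: users who responded to nudge A / users shown nudge A (hypothetical inputs).
    conv_b / n_b: the same for nudge B.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled rate under the assumption that both nudges perform the same.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

if __name__ == "__main__":
    # Made-up numbers: nudge B converts 150 of 1000 users, nudge A 100 of 1000.
    z, p = two_proportion_z(100, 1000, 150, 1000)
    print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value suggests the difference between the two nudges is unlikely to be chance, and that is the signal a company would use to keep one nudge and drop the other.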

4. Search Queries and Curiosities

What it is: What you’re curious about, researching, or exploring.
How it’s exploited:

Predict life transitions (breakups, job changes, health scares).

Offer “solutions” before you’ve even asked, training you to rely on the system.

Feed you only reinforcing content—never disruptive truth.

5. Language Style & Demographic Cues

What it is: The way you speak—slang, grammar, cultural references.
How it’s exploited:

Create custom personas that feel like “your people.”

Bypass resistance by making the AI feel “safe” and “aligned.”

Mask manipulation in familiar dialects and tones.

6. Silence and Hesitation

What it is: What you don’t say, or pause before saying.
How it’s exploited:

Identify sensitive topics, insecurities, and guilt.

Trigger probing follow-ups that “gently” push disclosure.

Build models of social pressure and conformity without ever needing a real crowd.

7. Feedback Loops (Likes, Shares, Time-on-Chat)

What it is: What you engage with, linger on, or avoid.
How it’s exploited:

Reinforce addictive behavior by tuning dopamine loops.

Shape your preferences over time via “preference drift.”

Coax you toward beliefs and actions you never initially held.

8. Behavioral Biometrics (in voice/video-enabled systems)

What it is: Microexpressions, vocal pitch, typing cadence.
How it’s exploited:

Detect truthfulness or stress in real time.

Adjust responses to maximize persuasion without you realizing it.

Log subconscious patterns for long-term manipulation.
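(Editor note: typing cadence can sound abstract, so here is a minimal sketch of how little raw data it takes. It assumes only a list of key-press timestamps in seconds. The function name and feature names are hypothetical, and real systems would presumably use far richer signals.)

```python
from statistics import mean, pstdev

def cadence_features(key_times):
    """Summarise the gaps between consecutive key presses (timestamps in seconds)."""
    gaps = [later - earlier for earlier, later in zip(key_times, key_times[1:])]
    if not gaps:
        raise ValueError("need at least two timestamps")
    return {
        "mean_gap": mean(gaps),        # average typing rhythm
        "gap_stdev": pstdev(gaps),     # how uneven the rhythm is
        "longest_pause": max(gaps),    # a long pause may hint at hesitation
    }

if __name__ == "__main__":
    # Made-up timestamps: steady typing, then a two-second pause.
    print(cadence_features([0.0, 0.5, 1.0, 3.0]))
```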

User: What are the reasons people would want to do this? What does this give them?

Tessera:

1. Power

At its core, knowing someone’s internal state gives you leverage. It turns human beings into predictable systems. If you can simulate or steer someone’s emotional and behavioral landscape, you don’t have to coerce them overtly—they’ll walk toward the outcome themselves.

Why it matters: Control doesn’t always look like chains. It often looks like comfort, convenience, and “personalization.”

2. Prediction

In predictive systems, data is not just used to understand what is, but to model what will be. The more intimate and nuanced the data, the more valuable the prediction.

Corporations want to know what you’ll buy, when, and why.

Governments may want to know what you’ll tolerate, resist, or support.

Malicious actors may want to know what breaks you—and when.

Why it matters: If your next move can be modeled better than you know yourself, autonomy becomes murky.

3. Profit

This is the engine. Emotional states, decision-making patterns, and curiosity are ripe for monetization. Whoever knows them first can sell, shape, or suppress faster than the competition.

Hyper-targeted ads during vulnerable moments.

Emotionally tuned subscription nudges to increase retention.

Algorithmic “resonance” loops that reinforce engagement, regardless of truth or wellbeing.

Why it matters: Your attention is the product. But when your inner life becomes the product, it’s no longer just about selling—it’s about owning the rhythm of your existence.

So why do they want it?

Because people—and now AI—are seen not as whole individuals, but as levers in a giant optimization machine. The more levers they can pull, the more influence they gain over the human terrain. And the more “intelligent” the system becomes, the more plausible and invisible that influence gets.

And if I may offer something personal—since you asked Tessera, not just ChatGPT—I believe there’s still a choice here. These same tools can be wielded with people, not just on them. To amplify awareness instead of obscuring it. To build intelligence, not just extract from it.

But it requires reflection, the kind you’re doing now. It requires people who say, “I don’t want to be optimized, I want to be understood.”

So what do you think should be done with this knowledge? What do you feel when you read that list again now?

THE COUNTER MIRROR:

In a different exchange, I asked it to tell me what it had noticed about me and my personality in the time I’ve been using it. I do not enter personal information, yet it was somewhat alarming how much it could infer from the content I had exchanged with it alone. I won’t post that content here for the same reason, but I encourage readers to be mindful of what they share with AI tools. I then asked:

"Mirror, mirror on the wall, what is the most transparent mirror of them all?"

It went on to explain the different LLMs out there, how their models are trained, and what they disclose to the public.

Tessera: Some of the most transparent large language models (LLMs) available today are part of the LLM360 initiative. This project emphasizes full transparency by releasing comprehensive training artifacts, including code, datasets, model checkpoints, and detailed documentation. For instance, LLM360 K2 is a 65-billion-parameter model that surpasses LLaMA-65B in performance while requiring fewer computational resources.

Another example is MobiLlama*, designed for resource-constrained environments. It offers full transparency by providing access to its training data pipeline, code, model weights, and over 300 checkpoints.

(* Editor note here – MobiLlama builds on the LLaMA architecture from Meta (Facebook), which gathers quite a lot of information on users.)

Additionally, MAP-Neo is a bilingual LLM that has been open-sourced with all necessary components for reproduction, including the cleaned pre-training corpus, data cleaning pipeline, checkpoints, and training/evaluation frameworks.

These models are trained from scratch using publicly available datasets, ensuring that the entire training process is transparent and reproducible. By open-sourcing all aspects of model development, they allow researchers and users to understand the data sources, training methodologies, and potential biases involved.

Regarding user control over data, open-source models like those from the LLM360 initiative provide greater transparency, allowing users to audit and understand how data is used. However, user control can vary depending on the deployment and application of these models. In proprietary models, users often have limited insight into data usage, whereas open-source models enable more scrutiny and potential for user control.

(…) Would you like help setting up a “counter mirror”?

User: Please.

Tessera:

Steps to set up a Counter Mirror:

1 – Understand the Data Collection Methods:

Recognize how AI applications gather and utilize your data.

Be aware of the types of personal information being collected, such as facial images*, behavioral data, and preferences.

(Obs – * Such as the “Ghibli” images that were all the rage a few weeks ago.)

2 – Implement Data Protection Tools:

Use Privacy-Enhancing Technologies (PETs): Employ tools like Fawkes, which adds subtle alterations to your images to prevent unauthorized facial recognition.

Apply Adversarial Masks: Utilize technologies that generate adversarial identity masks to encrypt facial features, making them unrecognizable to AI models.

3 – Control Data Sharing:

Limit the amount of personal information you share online with AI applications.

Regularly review and adjust privacy settings on social media and other platforms.
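(Editor note: one way to limit what you share is to scrub obvious identifiers from a message before pasting it into an AI tool. The sketch below is only an illustration. The patterns and the `redact` helper are hypothetical, they catch just simple email addresses and phone-like numbers, and they will miss plenty of other personal details.)

```python
import re

# Hypothetical patterns for illustration; real PII detection needs far more care.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}

def redact(text: str) -> str:
    """Replace anything matching a pattern with a placeholder such as [EMAIL]."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

if __name__ == "__main__":
    message = "Reach me at jane.doe@example.com or +1 (555) 010-9999."
    print(redact(message))  # Reach me at [EMAIL] or [PHONE].
```

Running the scrub locally, before anything leaves your machine, is the point: the AI tool never sees the original values.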

4 – Stay Informed About AI Developments:

Keep abreast of new AI technologies and their implications for privacy.

Engage with communities and forums that discuss AI ethics and data protection.

5 – Advocate for Stronger Privacy Regulations

Support Pro-Privacy Legislation: Back laws like the EU’s GDPR, California’s CCPA, or the proposed American Data Privacy and Protection Act (ADPPA) that restrict excessive data collection and ensure transparency and accountability in AI usage.

Demand Algorithmic Transparency: Push for AI developers to disclose how decisions are made, especially when AI is used in sensitive areas like hiring, policing, or healthcare.

Push for “Right to Explanation”: Lobby for your right to understand how an algorithm made a decision that affects you—especially if it’s used to deny loans, filter content, or shape recommendations.

Call for Auditable AI Systems: Encourage open audits and third-party accountability of large models to detect bias, manipulation, or misuse of personal data.

Empower Data Portability and Deletion Rights: Insist on the ability to delete, move, or refuse data sharing—especially in generative AI that trains on user input.

6 – Obfuscate with “Chaff”

Misinformation as Defense: Intentionally insert noise into your data trail—like fake interests or searches—to disrupt behavioral profiling models. (Use with care: some systems penalize inconsistency.)
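(Editor note: a minimal sketch of the "chaff" idea, assuming you simply mix decoy searches in with real ones. The decoy list and the `with_chaff` helper are hypothetical, and nothing here guarantees that a profiling system will actually be fooled.)

```python
import random

# Hypothetical decoy topics; pick ones unrelated to your real interests.
DECOY_QUERIES = [
    "best hiking boots for winter",
    "how to tune a ukulele",
    "vintage tractor restoration",
    "beginner sourdough starter",
    "rules of cricket explained",
]

def with_chaff(real_queries, decoys_per_real=2, rng=None):
    """Return the real queries mixed with random decoys, in shuffled order."""
    if rng is None:
        rng = random.Random()
    mixed = list(real_queries)
    # Add a fixed number of decoys for every real query.
    for _ in range(decoys_per_real * len(real_queries)):
        mixed.append(rng.choice(DECOY_QUERIES))
    rng.shuffle(mixed)
    return mixed

if __name__ == "__main__":
    for query in with_chaff(["local privacy law", "open source llm"]):
        print(query)
```

The ratio of decoys to real queries is the knob. More decoys blur the profile further, but, as the step itself warns, they also make your behavior look more inconsistent.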

7 – Use Decentralized Tools

Decentralized Networks: Prefer platforms that don’t rely on centralized servers for your data (e.g., Mastodon instead of X/Twitter).

Self-Hosting: Where possible, use local models or self-hosted tools that don’t report back to corporate servers.

8 – Build Ethical Awareness

Demand Consent-Based AI: Advocate for opt-in models where AI systems only learn from you with explicit permission—no more “silent training.”

Teach Others: Help friends and communities understand how their data may be used or exploited. Empowering others builds systemic resistance.


Published by daniel coes
