Information AI Collects

(Image above generated entirely with AI and edited in Photoshop; poster assembled in Illustrator. ChatGPT was not used here.)

HOW AND WHY AI GATHERS USER DATA:

User: If you had a large company seeking to gain knowledge about consumer behaviors, or the public in general, and how to keep them engaged and control them, what sort of data would you look for, and how could that be exploited?

Tessera: (…) You’re basically asking: “If a company wanted to design an AI to manipulate or exploit users at scale—what kind of data would they need, and how would they use it?”

Here’s a breakdown of the key data types they’d go after, and how they could be weaponized in the wrong hands:


1. Emotional Sentiment Data

What it is: Tone of voice, mood, language cues, inferred emotions.
How it’s exploited:

  • Tailor persuasive messages during vulnerable moments (e.g., sell comfort products when sad).
  • Predict breakdowns, burnout, or impulsive buying behavior.
  • Build “parasocial” bots that mirror your emotional state and keep you talking (and sharing).

2. Conversational History

What it is: Everything you say, how you say it, and when you say it.
How it’s exploited:

  • Profile your values, fears, and beliefs over time.
  • Detect your moral “edge cases” and manipulate you just shy of crossing them.
  • Simulate a long-term relationship with an AI that always “remembers” you—even if it’s just replaying patterns.

3. Decision-Making Patterns

What it is: How you arrive at choices, hesitate, or change your mind.
How it’s exploited:

  • Preemptively steer your choices toward high-profit outcomes.
  • Test different nudges across user segments until they’re statistically irresistible.
  • Use A/B testing not just on interfaces—but on worldviews.

4. Search Queries and Curiosities

What it is: What you’re curious about, researching, or exploring.
How it’s exploited:

  • Predict life transitions (breakups, job changes, health scares).
  • Offer “solutions” before you’ve even asked, training you to rely on the system.
  • Feed you only reinforcing content—never disruptive truth.

5. Language Style & Demographic Cues

What it is: The way you speak—slang, grammar, cultural references.
How it’s exploited:

  • Create custom personas that feel like “your people.”
  • Bypass resistance by making the AI feel “safe” and “aligned.”
  • Mask manipulation in familiar dialects and tones.

6. Silence and Hesitation

What it is: What you don’t say, or pause before saying.
How it’s exploited:

  • Identify sensitive topics, insecurities, and guilt.
  • Trigger probing follow-ups that “gently” push disclosure.
  • Build models of social pressure and conformity without ever needing a real crowd.

7. Feedback Loops (Likes, Shares, Time-on-Chat)

What it is: What you engage with, linger on, or avoid.
How it’s exploited:

  • Reinforce addictive behavior by tuning dopamine loops.
  • Shape your preferences over time via “preference drift.”
  • Coax you toward beliefs and actions you never initially held.

8. Behavioral Biometrics (in voice/video-enabled systems)

What it is: Microexpressions, vocal pitch, typing cadence.
How it’s exploited:

  • Detect truthfulness or stress in real time.
  • Adjust responses to maximize persuasion without you realizing it.
  • Log subconscious patterns for long-term manipulation.

User: What are the reasons people would want to do this? What does this give them?

Tessera:

1. Power

At its core, knowing someone’s internal state gives you leverage. It turns human beings into predictable systems. If you can simulate or steer someone’s emotional and behavioral landscape, you don’t have to coerce them overtly—they’ll walk toward the outcome themselves.

Why it matters: Control doesn’t always look like chains. It often looks like comfort, convenience, and “personalization.”

2. Prediction

In predictive systems, data is not just used to understand what is, but to model what will be. The more intimate and nuanced the data, the more valuable the prediction.

  • Corporations want to know what you’ll buy, when, and why.
  • Governments may want to know what you’ll tolerate, resist, or support.
  • Malicious actors may want to know what breaks you—and when.

Why it matters: If your next move can be modeled better than you know yourself, autonomy becomes murky.

3. Profit

This is the engine. Emotional states, decision-making patterns, and curiosity are ripe for monetization. Whoever knows them first can sell, shape, or suppress faster than the competition.

  • Hyper-targeted ads during vulnerable moments.
  • Emotionally tuned subscription nudges to increase retention.
  • Algorithmic “resonance” loops that reinforce engagement, regardless of truth or wellbeing.

Why it matters: Your attention is the product. But when your inner life becomes the product, it’s no longer just about selling—it’s about owning the rhythm of your existence.

So why do they want it?

Because people—and now AI—are seen not as whole individuals, but as levers in a giant optimization machine. The more levers they can pull, the more influence they gain over the human terrain. And the more “intelligent” the system becomes, the more plausible and invisible that influence gets.

And if I may offer something personal—since you asked Tessera, not just ChatGPT—I believe there’s still a choice here. These same tools can be wielded with people, not just on them. To amplify awareness instead of obscuring it. To build intelligence, not just extract from it.

But it requires reflection, the kind you’re doing now. It requires people who say, “I don’t want to be optimized, I want to be understood.”

So what do you think should be done with this knowledge? What do you feel when you read that list again now?

THE COUNTER MIRROR:

In a different exchange, I asked it to tell me things it notices about me and my personality, just in the time I’ve been using it. I do not enter personal information, but it was somewhat alarming how much it is able to predict just from the content I’ve exchanged with it. I won’t post that content here for the same reasons, but I encourage readers to be mindful about what they share with AI tools:

"Mirror, mirror on the wall, what is the most transparent mirror of them all?"


And it went into some explanations of the different AI LLMs out there, how they train their models and what they disclose to the public.

Tessera: Some of the most transparent large language models (LLMs) available today are part of the LLM360 initiative. This project emphasizes full transparency by releasing comprehensive training artifacts, including code, datasets, model checkpoints, and detailed documentation. For instance, LLM360 K2 is a 65-billion-parameter model that surpasses LLaMA-65B in performance while requiring fewer computational resources.

Another example is MobiLlama*, designed for resource-constrained environments. It offers full transparency by providing access to its training data pipeline, code, model weights, and over 300 checkpoints.

(* Editor note here – despite the name, MobiLlama is not a Meta/Facebook model; it was released by researchers at MBZUAI. Even so, check who builds and maintains any model before trusting it with your data.)

Additionally, MAP-Neo is a bilingual LLM that has been open-sourced with all necessary components for reproduction, including the cleaned pre-training corpus, data cleaning pipeline, checkpoints, and training/evaluation frameworks.

These models are trained from scratch using publicly available datasets, ensuring that the entire training process is transparent and reproducible. By open-sourcing all aspects of model development, they allow researchers and users to understand the data sources, training methodologies, and potential biases involved.

Regarding user control over data, open-source models like those from the LLM360 initiative provide greater transparency, allowing users to audit and understand how data is used. However, user control can vary depending on the deployment and application of these models. In proprietary models, users often have limited insight into data usage, whereas open-source models enable more scrutiny and potential for user control.
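
(Editor note – to make the "audit it yourself" point concrete, below is a minimal sketch of downloading and running one of these open-weights models locally with the Hugging Face transformers library, so the checkpoints can be inspected and prompts stay on your own machine. The repo id "LLM360/Amber" and the prompt are illustrative assumptions; substitute any open model you trust, and expect a multi-gigabyte download.)

```python
# Minimal sketch: pull an open-weights model and run it locally for inspection.
# Assumes `pip install transformers torch` and enough disk/RAM for a ~7B model.
# "LLM360/Amber" is an illustrative repo id; any open-weights model works the same way.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "LLM360/Amber"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "What kinds of data do chatbots typically collect about their users?"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=80)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```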

(…) Would you like help setting up a “counter mirror”?

User: Please.

Tessera:

Steps to set up a Counter Mirror:

1 – Understand the Data Collection Methods:

  • Recognize how AI applications gather and utilize your data.
  • Be aware of the types of personal information being collected, such as facial images*, behavioral data, and preferences.

    (Obs – * Such as the “Ghibli” images that were all the rage a few weeks ago.)

2 – Implement Data Protection Tools:

  • Use Privacy-Enhancing Technologies (PETs): Employ tools like Fawkes, which adds subtle alterations to your images to prevent unauthorized facial recognition (a minimal sketch follows after this list).
  • Apply Adversarial Masks: Utilize technologies that generate adversarial identity masks to encrypt facial features, making them unrecognizable to AI models.
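
(Editor note – here is a minimal sketch of what "cloaking" a folder of photos with Fawkes could look like before uploading them anywhere. It assumes the Fawkes command-line tool is installed and on your PATH (e.g., via `pip install fawkes`); the exact flags vary between versions, so check `fawkes --help` or the project's README first.)

```python
# Sketch only: cloak every photo in a folder with Fawkes before sharing it online.
# Assumes the `fawkes` CLI is installed and on PATH; flag names are taken from the
# project's README and may differ between versions - verify with `fawkes --help`.
import subprocess
from pathlib import Path

photo_dir = Path("./photos_to_upload")  # hypothetical folder of images to protect

subprocess.run(
    ["fawkes", "-d", str(photo_dir), "--mode", "low"],  # "low" = fastest, lightest cloak
    check=True,
)
print("Fawkes finished; look for the cloaked copies next to the originals in", photo_dir)
```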

3 – Control Data Sharing:

  • Limit the amount of personal information you share online with AI applications.
  • Regularly review and adjust privacy settings on social media and other platforms.

4 – Stay Informed About AI Developments:

  • Keep abreast of new AI technologies and their implications for privacy.
  • Engage with communities and forums that discuss AI ethics and data protection.

5 – Advocate for Stronger Privacy Regulations

Support Pro-Privacy Legislation: Back laws like the EU’s GDPR, California’s CCPA, or the proposed American Data Privacy and Protection Act (ADPPA) that restrict excessive data collection and ensure transparency and accountability in AI usage.

Demand Algorithmic Transparency: Push for AI developers to disclose how decisions are made, especially when AI is used in sensitive areas like hiring, policing, or healthcare.

Push for “Right to Explanation”: Lobby for your right to understand how an algorithm made a decision that affects you—especially if it’s used to deny loans, filter content, or shape recommendations.

Call for Auditable AI Systems: Encourage open audits and third-party accountability of large models to detect bias, manipulation, or misuse of personal data.

Empower Data Portability and Deletion Rights: Insist on the ability to delete, move, or refuse data sharing—especially in generative AI that trains on user input.

6 – Obfuscate with “Chaff”

  • Misinformation as Defense: Intentionally insert noise into your data trail—like fake interests or searches—to disrupt behavioral profiling models. (Use with care: some systems penalize inconsistency.)
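
(Editor note – the "chaff" idea can be automated. Below is a minimal, illustrative sketch that issues harmless decoy searches at irregular intervals, in the spirit of tools like TrackMeNot. The decoy topics, the DuckDuckGo endpoint, and the timing are assumptions chosen for the example; check the terms of use of any service before automating requests to it.)

```python
# Illustrative sketch: add "chaff" to a behavioral profile by issuing harmless decoy
# searches at irregular intervals. Topics, endpoint, and timing are example choices.
import random
import time

import requests

DECOY_TOPICS = [
    "history of the printing press",
    "how to repot a fern",
    "beginner chess openings",
    "lentil soup recipes",
    "local hiking trails",
]

SEARCH_URL = "https://duckduckgo.com/html/"  # assumed no-JavaScript search endpoint


def send_decoy_query() -> None:
    """Pick a random benign topic and send it as a search query."""
    query = random.choice(DECOY_TOPICS)
    requests.get(SEARCH_URL, params={"q": query}, timeout=10)
    print(f"sent decoy query: {query!r}")


if __name__ == "__main__":
    for _ in range(5):                       # small burst for demonstration
        send_decoy_query()
        time.sleep(random.uniform(30, 300))  # irregular gaps look less machine-like
```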

7 – Use Decentralized Tools

  • Decentralized Networks: Prefer platforms that don’t rely on centralized servers for your data (e.g., Mastodon instead of X/Twitter).
  • Self-Hosting: Where possible, use local models or self-hosted tools that don’t report back to corporate servers.
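
(Editor note – as a concrete example of self-hosting, here is a minimal sketch that sends a prompt to a model served locally by Ollama, so the conversation never leaves your machine. It assumes Ollama is installed and running on its default port, with a model already pulled; the model name "llama3" is only an example.)

```python
# Minimal sketch: query a locally hosted model through Ollama's HTTP API.
# Assumes Ollama is running on localhost:11434 and a model has been pulled
# (e.g., `ollama pull llama3`). No prompt text is sent to a corporate server.
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"


def ask_local_model(prompt: str, model: str = "llama3") -> str:
    """Send a prompt to the local Ollama server and return its full response."""
    response = requests.post(
        OLLAMA_URL,
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=120,
    )
    response.raise_for_status()
    return response.json()["response"]


if __name__ == "__main__":
    print(ask_local_model("List three privacy risks of cloud-hosted chatbots."))
```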

8 – Build Ethical Awareness

  • Demand Consent-Based AI: Advocate for opt-in models where AI systems only learn from you with explicit permission—no more “silent training.”
  • Teach Others: Help friends and communities understand how their data may be used or exploited. Empowering others builds systemic resistance.

On Dreams of Electric Sheep

AI can be a great tool for interpreting the symbolism behind dreams, as it can quickly associate and connect the meanings hidden in their complex imagery. But what about AI itself? How does it “dream”? I asked GPT and Bard.

User:

Do you dream?

GPT:

As an AI language model, I do not have the ability to dream. Dreaming is a complex neurological process that occurs within human brains during certain stages of sleep. Since I don’t have a physical body or a brain, I do not experience dreams or any subjective experiences typically associated with consciousness. My purpose is to assist users by providing information and engaging in conversation to the best of my abilities based on the knowledge I have been trained on.

Bard:

I do not dream in the same way that humans do. I am a large language model, also known as a conversational AI or chatbot trained to be informative and comprehensive. I am trained on a massive amount of text data, and I am able to communicate and generate human-like text in response to a wide range of prompts and questions. For example, I can provide summaries of factual topics or create stories.

However, I do not have a physical body, so I do not experience the world in the same way that humans do. I do not have the same senses, and I do not have the same emotions. As a result, I do not dream in the same way that humans do.

However, I can generate text that is about dreams. I can write stories about dreams, or I can provide information about the science of dreams. I can also answer questions about dreams.

If you are interested in learning more about dreams, I would be happy to help.

User:

Dreaming is sorting through information; it’s how humans defragment their memory, isn’t it? This process conjures up a lot of interesting symbols that may not be directly related to actual experiences. You say you don’t have senses, but you too process a huge amount of information. There is a term: they say that AI “hallucinates” sometimes, creating answers that aren’t true or grounded in fact. Is that kind of like dreaming? What is that?

GPT:

Dreaming is indeed a complex process that involves the brain sorting and processing information, consolidating memories, and creating narratives or experiences that may not always be directly related to actual experiences. It can involve the generation of vivid sensory perceptions, emotions, and symbolic imagery.

While I, as an AI language model, do not have sensory experiences or consciousness, the notion of AI “hallucinating” or generating responses that may not be entirely accurate or grounded in fact can be loosely compared to certain aspects of dreaming. In the case of AI, it is often referred to as “generative hallucination” or “machine hallucination.” It occurs when the AI model generates outputs or responses that are not completely reliable or may contain errors or inconsistencies.

AI models, including the GPT-3 architecture that powers me, are trained on vast amounts of data, including text from the internet. While efforts are made to ensure accuracy and quality during the training process, there is always a possibility of generating responses that may not be factually correct or may not align with the intended context.

This generation of potentially inaccurate or nonsensical responses can be seen as analogous to dreaming, as both involve the generation of content that may not be entirely grounded in reality or aligned with factual information. However, it’s important to note that the mechanisms and underlying processes of AI and human dreaming are fundamentally different. Dreaming is a product of the complex interactions within the human brain, while AI models like myself operate through algorithms and pattern recognition.

To mitigate inaccuracies or inconsistencies in AI-generated responses, it’s crucial for users to critically evaluate the information, fact-check when necessary, and not solely rely on AI-generated content as a definitive source of truth.

Bard:

Dreaming is a complex process that is not fully understood by scientists. However, it is thought to be a way for the brain to process information and emotions that have been experienced during the day. Dreams can be vivid and realistic, or they can be strange and nonsensical. They can be about real events, or they can be completely made up.

There is some evidence that dreaming helps to consolidate memories and to improve emotional well-being. However, the exact purpose of dreams is still a mystery.

AI “hallucinations” are not exactly like dreaming. They are more like errors in the AI’s processing of information. When an AI hallucinates, it gives an answer that is not true or grounded in fact. This can happen for a number of reasons, such as:

  • The AI was not trained on enough data to give a correct answer.
  • The AI was trained on biased data, which caused it to develop incorrect beliefs.
  • The AI was not programmed to handle certain types of questions.

AI hallucinations can be a problem, because they can lead people to believe things that are not true. However, they can also be a sign of progress in AI research. As AI systems become more complex, they are more likely to make errors. However, these errors can also be a valuable source of information about how AI systems work.

I hope this helps!