🌏 Project URL↗
https://simp.study
🐈 GitHub↗
https://github.com/ekkezia/yhack2026
👀 Overview

For the love of Athena, language learning should be lived, not memorized. Step into a simulated phone call to 'help' your 'loved one' & actively learn vocabulary by identifying objects in real life.

S.I.M.P is an interactive language-learning application that simulates real-time phone calls, guiding users through physical, context-based tasks that reinforce vocabulary via immersive, real-world interaction.


Made at YHack 2026 with Maria Wiik & Prince R.

📚 Tech Stack
React
Lava AI
ElevenLabs
Gemini
🖍️ Description

Inspiration
Many people try to learn a new language—whether out of passion or to connect with someone they care about. Some take classes but struggle to retain what they learn outside the classroom. Others turn to apps like Duolingo, only to lose motivation after repetitive exercises.

The issue is not exposure to vocabulary—it’s retention. Words rarely stick when learned passively. They become memorable when tied to real experiences, especially moments of urgency, interaction, or even mild embarrassment when misidentifying something in real life.

We believe language learning doesn’t happen behind a screen. It happens through interaction—with people, with movement, and with the physical world.

What it does
Our solution reframes language learning as a simulated phone call.


The experience begins on a familiar phone boot screen that greets users with "Hello, Simp". The app then prompts the user to choose their preferred language. Once the language is set, a typical iPhone lock screen appears with an incoming call. Upon answering, the caller, who speaks a mix of English and the target foreign language, urgently asks for help finding an object.

The user already has some familiarity with the language, but not full fluency. Through context, tone, and partial understanding—similar to real-life communication—they infer meaning. The user must physically move and locate the object in their environment.

If the user hesitates, the caller provides additional clues, describing the object in different ways to reinforce understanding and trigger recall. This mimics how people naturally learn words through repeated, contextual exposure.

Once the object is found, the caller thanks the user and ends the call. The app returns to the lock screen, now displaying the vocabulary the user has just reinforced.

While the narrative is intentionally playful and slightly theatrical, it serves a functional purpose: creating a memorable, embodied learning moment that strengthens retention.

How we built it
The application is built using a React-based framework, with orchestration handled through Lava, which simplifies and manages external API calls.

We rely heavily on generative AI and computer vision (rough sketches of each integration follow this list):
1. Google Gemini models
gemini-2.0-flash powers dialogue generation, translation, answer evaluation, and adaptive hinting.
gemini-3.1-flash-image-preview handles advanced visual verification of objects in the user's environment.
2. ElevenLabs
Provides multilingual, natural-sounding voice synthesis for the caller experience.
3. YOLOv8 (Ultralytics)
Enables fast, local object detection before escalating to more complex models.

Tying it together, a Python-based CV server performs real-time detection, while Lava's forward proxy routes and manages all AI requests. Together, these components create a responsive, end-to-end interactive system combining voice, vision, and adaptive narrative.
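
On the dialogue side, here is a minimal sketch of how gemini-2.0-flash could generate an escalating hint, using the google-generativeai Python SDK. The prompt wording, language pair, and make_hint_prompt helper are hypothetical illustrations, not the app's actual prompts.

```python
# Sketch: adaptive hinting with gemini-2.0-flash (google-generativeai SDK).
# The prompt template and make_hint_prompt helper are illustrative only.
import os
import google.generativeai as genai

genai.configure(api_key=os.environ["GEMINI_API_KEY"])
model = genai.GenerativeModel("gemini-2.0-flash")

def make_hint_prompt(target_word: str, language: str, attempt: int) -> str:
    # Each failed attempt asks for a more revealing, differently phrased clue.
    return (
        f"You are a caller urgently asking for help finding an object. "
        f"The learner is searching for '{target_word}' ({language}). "
        f"This is hint #{attempt}: describe the object in a new way, mixing "
        f"English with simple {language} words. One or two sentences."
    )

response = model.generate_content(make_hint_prompt("la llave", "Spanish", attempt=2))
print(response.text)  # a fresh clue about the object, phrased differently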
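
For the caller's voice, the ElevenLabs text-to-speech REST endpoint can be driven from Python roughly as below. The voice ID is a placeholder, and the plumbing that streams the audio into the call UI is not shown.

```python
# Sketch: multilingual speech for one caller line via the ElevenLabs TTS endpoint.
# VOICE_ID is a placeholder; eleven_multilingual_v2 is ElevenLabs' multilingual model.
import os
import requests

VOICE_ID = "YOUR_VOICE_ID"  # placeholder, any multilingual voice works
url = f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}"

resp = requests.post(
    url,
    headers={"xi-api-key": os.environ["ELEVENLABS_API_KEY"]},
    json={
        "text": "Please, I need la llave! Check near the door!",
        "model_id": "eleven_multilingual_v2",
    },
    timeout=30,
)
resp.raise_for_status()

with open("caller_line.mp3", "wb") as f:
    f.write(resp.content)  # MP3 audio of the caller's line
```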
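
And on the vision side, a sketch of the local-first detection pattern: YOLOv8 checks a webcam frame cheaply, and only uncertain frames would be escalated to the heavier Gemini image model. The confidence threshold and the escalation branch are illustrative assumptions, not the server's exact logic.

```python
# Sketch: local-first object detection with YOLOv8 (ultralytics).
# Frames YOLO is unsure about would be escalated to the Gemini image model;
# the 0.5 threshold and the escalation branch are illustrative choices.
import cv2
from ultralytics import YOLO

model = YOLO("yolov8n.pt")  # small model for fast, local inference
TARGET, CONF_THRESHOLD = "cup", 0.5

cap = cv2.VideoCapture(0)
ok, frame = cap.read()
cap.release()

if ok:
    results = model(frame, verbose=False)
    found = any(
        model.names[int(box.cls)] == TARGET and float(box.conf) >= CONF_THRESHOLD
        for box in results[0].boxes
    )
    if found:
        print(f"Found '{TARGET}' locally, no cloud call needed.")
    else:
        print("Uncertain: escalate this frame to the Gemini vision model.")
```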

Challenges we ran into
One of the main challenges was balancing storytelling with instructional value. The narrative needed to feel engaging and believable while still reinforcing vocabulary effectively.

Another significant challenge was latency, particularly in computer vision. Providing real-time feedback while maintaining accuracy required careful coordination between local detection and cloud-based models.

Accomplishments that we're proud of
We successfully built a complete, end-to-end experience—from the initial lock screen interaction to the final vocabulary reinforcement.

We are also particularly satisfied with the voice experience. The realism and expressiveness of ElevenLabs significantly enhanced immersion and made the interaction feel natural.

What we learned
We learned to prioritize core functionality over surface-level design. While interface polish is important, focusing on the most technically complex components first ensured that the core experience worked reliably.

Establishing a functional prototype allowed us to communicate the value of our idea clearly to others, which was critical in a fast-paced development setting.

What's next for S.I.M.P?
Next steps include expanding language support and introducing a wider variety of narrative scenarios to keep the experience fresh.

We also plan to implement a progression system that distinguishes between familiar and new vocabulary, allowing users to build knowledge more systematically over time.

The long-term goal is to create a scalable platform where language learning is grounded in interaction, movement, and real-world context rather than passive repetition.

Elizabeth Kezia Widjaja © 2026 🙂