{"id":103133,"date":"2024-10-21T04:58:09","date_gmt":"2024-10-21T04:58:09","guid":{"rendered":"https:\/\/www.seeedstudio.com\/blog\/?p=103133"},"modified":"2024-10-31T07:04:36","modified_gmt":"2024-10-31T07:04:36","slug":"watchthis-a-wearable-point-and-ask-interface-powered-by-vision-language-models-and-xiao-esp32s3-sense","status":"publish","type":"post","link":"https:\/\/www.seeedstudio.com\/blog\/2024\/10\/21\/watchthis-a-wearable-point-and-ask-interface-powered-by-vision-language-models-and-xiao-esp32s3-sense\/","title":{"rendered":"WatchThis: A Wearable Point-and-Ask Interface Powered by Vision-Language Models and XIAO ESP32S3 Sense"},"content":{"rendered":"\t\t<div data-elementor-type=\"wp-post\" data-elementor-id=\"103133\" class=\"elementor elementor-103133\" data-elementor-post-type=\"post\">\n\t\t\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-42012925 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"42012925\" data-element_type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-145d91a4\" data-id=\"145d91a4\" data-element_type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-788420b3 elementor-widget elementor-widget-text-editor\" data-id=\"788420b3\" data-element_type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<span id=\"scroll69d198905a1a3\"  class=\"scrollMagicControl\" type=\"hidden\" effect = {} wpmp_enable_desktop=\"yes\" wpmp_enable_tablet=\"yes\" wpmp_enable_mobile=\"yes\" wpmp_trigger_hook=\"0.5\" wpmp_reverse=\"yes\" wpmp_class_CSS =\"custom\" split-text = {} value=\"scrollmagic\"><\/span>\t\t\t\t<p><\/p>\n<p><span 
style=\"font-weight: 400\">MIT Media Lab researchers <\/span><strong><a href=\"https:\/\/www.linkedin.com\/in\/cathy-fang\/\">Cathy Mengying Fang<\/a>, <a href=\"https:\/\/www.linkedin.com\/in\/patrickchwalek\/\">Patrick Chwalek<\/a>, <\/strong><a href=\"https:\/\/www.linkedin.com\/in\/quincy-kuang-5bb03b139\/\"><span style=\"font-weight: 400\"><strong>Quincy Kuang<\/strong><\/span><\/a><span style=\"font-weight: 400\">, and <strong><a href=\"https:\/\/www.linkedin.com\/in\/pattie-maes-67276273\/\">Pattie Maes<\/a><\/strong> have developed WatchThis, a groundbreaking wearable device that enables natural language interactions with real-world objects through simple pointing gestures. Cathy conceived the idea for WatchThis during a one-day hackathon in Shenzhen, organized as part of <\/span><a href=\"https:\/\/www.media.mit.edu\/\"><span style=\"font-weight: 400\">MIT Media Lab&#8217;s <\/span><\/a><span style=\"font-weight: 400\">&#8220;Research at Scale&#8221; initiative. Organized by <\/span><a href=\"https:\/\/www.linkedin.com\/in\/honnet\/\"><span style=\"font-weight: 400\">Cedric Honnet<\/span><\/a><span style=\"font-weight: 400\"> and hosted by <\/span><a href=\"https:\/\/www.sustech.edu.cn\/en\/\"><span style=\"font-weight: 400\">Southern University of Science and Technology<\/span><\/a><span style=\"font-weight: 400\"> and <\/span><a href=\"https:\/\/www.seeedstudio.com\/\"><span style=\"font-weight: 400\">Seeed Studio<\/span><\/a><span style=\"font-weight: 400\">, the hackathon provided the perfect setting to prototype this innovative device using components from the <\/span><a href=\"https:\/\/www.seeedstudio.com\/XIAO-ESP32S3-Sense-p-5639.html?utm_source=Seeedblog\"><span style=\"font-weight: 400\">Seeed Studio XIAO ESP32S3 suite<\/span><\/a><span style=\"font-weight: 400\">. 
By integrating Vision-Language Models (VLMs) with a compact wrist-worn device, WatchThis allows users to ask questions about their surroundings in real-time, making contextual queries as intuitive as pointing and asking.<\/span><\/p>\n<p><\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-7da04f5 elementor-widget elementor-widget-video\" data-id=\"7da04f5\" data-element_type=\"widget\" data-settings=\"{&quot;video_type&quot;:&quot;hosted&quot;,&quot;autoplay&quot;:&quot;yes&quot;,&quot;controls&quot;:&quot;yes&quot;}\" data-widget_type=\"video.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<span id=\"scroll69d198905af9d\"  class=\"scrollMagicControl\" type=\"hidden\" effect = {} wpmp_enable_desktop=\"yes\" wpmp_enable_tablet=\"yes\" wpmp_enable_mobile=\"yes\" wpmp_trigger_hook=\"0.5\" wpmp_reverse=\"yes\" wpmp_class_CSS =\"custom\" split-text = {} value=\"scrollmagic\"><\/span>\t\t<div class=\"e-hosted-video elementor-wrapper elementor-open-inline\">\n\t\t\t\t\t<video class=\"elementor-video\" src=\"https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2024\/10\/WatchThisVideo.mp4\" autoplay=\"\" controls=\"\" controlsList=\"nodownload\" poster=\"https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2024\/10\/df3b5ad0-e510-406b-a669-4b7b443d79d4.jpeg\"><\/video>\n\t\t\t\t<\/div>\n\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-3274b81 elementor-widget elementor-widget-text-editor\" data-id=\"3274b81\" data-element_type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<span id=\"scroll69d198905b6a3\"  class=\"scrollMagicControl\" type=\"hidden\" effect = {} wpmp_enable_desktop=\"yes\" wpmp_enable_tablet=\"yes\" wpmp_enable_mobile=\"yes\" wpmp_trigger_hook=\"0.5\" wpmp_reverse=\"yes\" wpmp_class_CSS =\"custom\" split-text = {} value=\"scrollmagic\"><\/span>\t\t\t\t<p 
style=\"text-align: center\">Credit: Cathy Fang<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-4eb32ac8 elementor-widget elementor-widget-heading\" data-id=\"4eb32ac8\" data-element_type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<span id=\"scroll69d198905c168\"  class=\"scrollMagicControl\" type=\"hidden\" effect = {} wpmp_enable_desktop=\"yes\" wpmp_enable_tablet=\"yes\" wpmp_enable_mobile=\"yes\" wpmp_trigger_hook=\"0.5\" wpmp_reverse=\"yes\" wpmp_class_CSS =\"custom\" split-text = {} value=\"scrollmagic\"><\/span><h3 class=\"elementor-heading-title elementor-size-default\">Hardware<\/h3>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-65fe23e8 elementor-widget elementor-widget-text-editor\" data-id=\"65fe23e8\" data-element_type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<span id=\"scroll69d198905c82e\"  class=\"scrollMagicControl\" type=\"hidden\" effect = {} wpmp_enable_desktop=\"yes\" wpmp_enable_tablet=\"yes\" wpmp_enable_mobile=\"yes\" wpmp_trigger_hook=\"0.5\" wpmp_reverse=\"yes\" wpmp_class_CSS =\"custom\" split-text = {} value=\"scrollmagic\"><\/span>\t\t\t\t<p>The WatchThis project uses the following hardware components:<\/p><ul><li style=\"list-style-type: none\"><ul><li style=\"font-weight: 400\"><a href=\"https:\/\/www.seeedstudio.com\/XIAO-ESP32S3-Sense-p-5639.html?utm_source=Seeedblog\"><span style=\"font-weight: 400\">Seeed Studio XIAO ESP32S3 Sense<\/span><\/a><span style=\"font-weight: 400\"> (with camera expansion board)<\/span><\/li><li style=\"font-weight: 400\"><a href=\"https:\/\/www.seeedstudio.com\/Seeed-Studio-Round-Display-for-XIAO-p-5638.html?utm_source=Seeedblog\"><span style=\"font-weight: 400\">Seeed Studio Round Display for XIAO<\/span><\/a><\/li><li style=\"font-weight: 400\"><a 
href=\"https:\/\/www.seeedstudio.com\/OV2640-Fisheye-Camera-p-4048.html\"><span style=\"font-weight: 400\">OV2640 camera<\/span><\/a><span style=\"font-weight: 400\"> with long ribbon cable<\/span><\/li><li style=\"font-weight: 400\"><span style=\"font-weight: 400\">LiPo Battery<\/span><\/li><li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Custom 3D printed parts<\/span><\/li><\/ul><\/li><\/ul>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-5e82c7a0 elementor-widget elementor-widget-spacer\" data-id=\"5e82c7a0\" data-element_type=\"widget\" data-widget_type=\"spacer.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<span id=\"scroll69d198905cee2\"  class=\"scrollMagicControl\" type=\"hidden\" effect = {} wpmp_enable_desktop=\"yes\" wpmp_enable_tablet=\"yes\" wpmp_enable_mobile=\"yes\" wpmp_trigger_hook=\"0.5\" wpmp_reverse=\"yes\" wpmp_class_CSS =\"custom\" split-text = {} value=\"scrollmagic\"><\/span>\t\t<div class=\"elementor-spacer\">\n\t\t\t<div class=\"elementor-spacer-inner\"><\/div>\n\t\t<\/div>\n\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-4f7b87a4 elementor-widget elementor-widget-image\" data-id=\"4f7b87a4\" data-element_type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<span id=\"scroll69d198905e86f\"  class=\"scrollMagicControl\" type=\"hidden\" effect = {} wpmp_enable_desktop=\"yes\" wpmp_enable_tablet=\"yes\" wpmp_enable_mobile=\"yes\" wpmp_trigger_hook=\"0.5\" wpmp_reverse=\"yes\" wpmp_class_CSS =\"custom\" split-text = {} value=\"scrollmagic\"><\/span>\t\t\t\t\t\t\t<figure class=\"wp-caption\">\n\t\t\t\t\t\t\t\t\t\t<img fetchpriority=\"high\" decoding=\"async\" width=\"1002\" height=\"698\" src=\"https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2024\/10\/breakdown.png\" class=\"attachment-full size-full wp-image-103135\" alt=\"\" 
srcset=\"https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2024\/10\/breakdown.png 1002w, https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2024\/10\/breakdown-300x209.png 300w, https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2024\/10\/breakdown-768x535.png 768w, https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2024\/10\/breakdown-32x22.png 32w\" sizes=\"(max-width: 1002px) 100vw, 1002px\" \/>\t\t\t\t\t\t\t\t\t\t\t<figcaption class=\"widget-image-caption wp-caption-text\">Credit: Cathy Fang<\/figcaption>\n\t\t\t\t\t\t\t\t\t\t<\/figure>\n\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-49ef3344 elementor-widget elementor-widget-spacer\" data-id=\"49ef3344\" data-element_type=\"widget\" data-widget_type=\"spacer.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<span id=\"scroll69d198905ef2d\"  class=\"scrollMagicControl\" type=\"hidden\" effect = {} wpmp_enable_desktop=\"yes\" wpmp_enable_tablet=\"yes\" wpmp_enable_mobile=\"yes\" wpmp_trigger_hook=\"0.5\" wpmp_reverse=\"yes\" wpmp_class_CSS =\"custom\" split-text = {} value=\"scrollmagic\"><\/span>\t\t<div class=\"elementor-spacer\">\n\t\t\t<div class=\"elementor-spacer-inner\"><\/div>\n\t\t<\/div>\n\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-52452ace elementor-widget elementor-widget-heading\" data-id=\"52452ace\" data-element_type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<span id=\"scroll69d198905f585\"  class=\"scrollMagicControl\" type=\"hidden\" effect = {} wpmp_enable_desktop=\"yes\" wpmp_enable_tablet=\"yes\" wpmp_enable_mobile=\"yes\" wpmp_trigger_hook=\"0.5\" wpmp_reverse=\"yes\" wpmp_class_CSS =\"custom\" split-text = {} value=\"scrollmagic\"><\/span><h3 class=\"elementor-heading-title elementor-size-default\">How the Project 
Works<\/h3>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-3820ce66 elementor-widget elementor-widget-text-editor\" data-id=\"3820ce66\" data-element_type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<span id=\"scroll69d198905fc9e\"  class=\"scrollMagicControl\" type=\"hidden\" effect = {} wpmp_enable_desktop=\"yes\" wpmp_enable_tablet=\"yes\" wpmp_enable_mobile=\"yes\" wpmp_trigger_hook=\"0.5\" wpmp_reverse=\"yes\" wpmp_class_CSS =\"custom\" split-text = {} value=\"scrollmagic\"><\/span>\t\t\t\t<p><span style=\"font-weight: 400\">WatchThis is designed to integrate natural, gesture-based interaction into daily life. The wearable device consists of a watch with a rotating, flip-up camera attached to the back of a display. When the user points at an object of interest, the camera captures the area, and the device processes contextual queries based on the user&#8217;s gesture.<\/span><\/p>\n<p><span style=\"font-weight: 400\">The interaction begins when the user flips up the watch body to reveal the camera, which then captures the area the finger is pointing at. The watch\u2019s display shows a live feed from the camera, allowing precise aiming. When the user touches the screen, the device captures the image and pauses the camera feed. The captured RGB image is then compressed to JPEG and encoded as Base64, and the device sends it in an API request that queries the image.<\/span><\/p>\n<p><span style=\"font-weight: 400\">The device uses these API calls to interact with <\/span><strong><a href=\"https:\/\/platform.openai.com\/docs\/models\/gpt-4o\">OpenAI\u2019s GPT-4o model<\/a><\/strong><span style=\"font-weight: 400\">, which accepts both text and image inputs. This allows the user to ask questions such as &#8220;What is this?&#8221; or &#8220;Translate this,&#8221; and receive a response within a few seconds. 
The text response is displayed on the screen, overlaid on the captured image. After the response is shown for 3 seconds, the screen returns to streaming the camera feed, ready for the next command.<\/span><\/p>\n<p><span style=\"font-weight: 400\">The software driving WatchThis is written in Arduino-compatible C++ and runs directly on the device. It is optimized for quick and efficient performance, with an end-to-end response time of around 3 seconds. Instead of relying on voice recognition or text-to-speech\u2014which can be error-prone and resource-intensive\u2014the system uses direct text input for queries. Users can further personalize their interactions by modifying the default query prompt through an accompanying WebApp served on the device, allowing tailored actions such as identifying objects, translating text, or requesting instructions.<\/span><\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-a48c1f9 elementor-widget elementor-widget-image\" data-id=\"a48c1f9\" data-element_type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<span id=\"scroll69d1989061225\"  class=\"scrollMagicControl\" type=\"hidden\" effect = {} wpmp_enable_desktop=\"yes\" wpmp_enable_tablet=\"yes\" wpmp_enable_mobile=\"yes\" wpmp_trigger_hook=\"0.5\" wpmp_reverse=\"yes\" wpmp_class_CSS =\"custom\" split-text = {} value=\"scrollmagic\"><\/span>\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" width=\"1330\" height=\"522\" src=\"https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2024\/10\/teaser.png\" class=\"attachment-full size-full wp-image-103138\" alt=\"\" srcset=\"https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2024\/10\/teaser.png 1330w, https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2024\/10\/teaser-300x118.png 300w, https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2024\/10\/teaser-1030x404.png 1030w, 
https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2024\/10\/teaser-768x301.png 768w, https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2024\/10\/teaser-32x13.png 32w, https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2024\/10\/teaser-1024x402.png 1024w\" sizes=\"(max-width: 1330px) 100vw, 1330px\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-664f634 elementor-widget elementor-widget-image\" data-id=\"664f634\" data-element_type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<span id=\"scroll69d198906228b\"  class=\"scrollMagicControl\" type=\"hidden\" effect = {} wpmp_enable_desktop=\"yes\" wpmp_enable_tablet=\"yes\" wpmp_enable_mobile=\"yes\" wpmp_trigger_hook=\"0.5\" wpmp_reverse=\"yes\" wpmp_class_CSS =\"custom\" split-text = {} value=\"scrollmagic\"><\/span>\t\t\t\t\t\t\t<figure class=\"wp-caption\">\n\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" width=\"1434\" height=\"440\" src=\"https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2024\/10\/system.png\" class=\"attachment-full size-full wp-image-103137\" alt=\"\" srcset=\"https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2024\/10\/system.png 1434w, https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2024\/10\/system-300x92.png 300w, https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2024\/10\/system-1030x316.png 1030w, https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2024\/10\/system-768x236.png 768w, https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2024\/10\/system-32x10.png 32w, https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2024\/10\/system-1024x314.png 1024w\" sizes=\"(max-width: 1434px) 100vw, 1434px\" \/>\t\t\t\t\t\t\t\t\t\t\t<figcaption class=\"widget-image-caption wp-caption-text\">Credit: Cathy 
Fang<\/figcaption>\n\t\t\t\t\t\t\t\t\t\t<\/figure>\n\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-5cda9ed1 elementor-widget elementor-widget-spacer\" data-id=\"5cda9ed1\" data-element_type=\"widget\" data-widget_type=\"spacer.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<span id=\"scroll69d19890629bd\"  class=\"scrollMagicControl\" type=\"hidden\" effect = {} wpmp_enable_desktop=\"yes\" wpmp_enable_tablet=\"yes\" wpmp_enable_mobile=\"yes\" wpmp_trigger_hook=\"0.5\" wpmp_reverse=\"yes\" wpmp_class_CSS =\"custom\" split-text = {} value=\"scrollmagic\"><\/span>\t\t<div class=\"elementor-spacer\">\n\t\t\t<div class=\"elementor-spacer-inner\"><\/div>\n\t\t<\/div>\n\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-3085d870 elementor-widget elementor-widget-heading\" data-id=\"3085d870\" data-element_type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<span id=\"scroll69d1989063019\"  class=\"scrollMagicControl\" type=\"hidden\" effect = {} wpmp_enable_desktop=\"yes\" wpmp_enable_tablet=\"yes\" wpmp_enable_mobile=\"yes\" wpmp_trigger_hook=\"0.5\" wpmp_reverse=\"yes\" wpmp_class_CSS =\"custom\" split-text = {} value=\"scrollmagic\"><\/span><h3 class=\"elementor-heading-title elementor-size-default\">Applications<\/h3>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-6619d69a elementor-widget elementor-widget-text-editor\" data-id=\"6619d69a\" data-element_type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<span id=\"scroll69d1989063767\"  class=\"scrollMagicControl\" type=\"hidden\" effect = {} wpmp_enable_desktop=\"yes\" wpmp_enable_tablet=\"yes\" wpmp_enable_mobile=\"yes\" wpmp_trigger_hook=\"0.5\" wpmp_reverse=\"yes\" wpmp_class_CSS =\"custom\" split-text = {} 
value=\"scrollmagic\"><\/span>\t\t\t\t<p><span style=\"font-weight: 400\">Imagine strolling through a city and pointing at a building to learn its history, or identifying an exotic plant in a botanical garden with a mere gesture.<\/span><\/p><p><span style=\"font-weight: 400\">The device goes beyond simple identification, offering practical applications like real-time translation of text such as menu items, a practical aid for travelers and language learners alike.<\/span><\/p><p><span style=\"font-weight: 400\">The research team has also discussed further potential applications:<\/span><\/p><ul><li style=\"list-style-type: none\"><ul><li style=\"font-weight: 400\"><span style=\"font-weight: 400\">A &#8220;Remember this&#8221; function could serve as a visual reminder system, potentially aiding those who need to take medication regularly.<\/span><\/li><li style=\"font-weight: 400\"><span style=\"font-weight: 400\">For urban explorers, a &#8220;How do I get there&#8221; feature could provide intuitive, spatially aware navigation by allowing users to point at distant landmarks.<\/span><\/li><li style=\"font-weight: 400\"><span style=\"font-weight: 400\">A &#8220;Zoom in on that&#8221; capability could offer a closer look at far-off objects without disrupting the user&#8217;s activities.<\/span><\/li><li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Perhaps most intriguingly, a &#8220;Turn that off&#8221; function could allow users to control smart home devices with a combination of voice commands and gestures, integrating with IoT ecosystems.<\/span><\/li><\/ul><\/li><\/ul><p><span style=\"font-weight: 400\">While some of these features are still at the conceptual stage, they paint a picture of a future where our interactions with the world around us are more intuitive, informative, and effortless than ever before. 
<\/span><\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-25c55dd elementor-widget elementor-widget-image\" data-id=\"25c55dd\" data-element_type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<span id=\"scroll69d19890648b7\"  class=\"scrollMagicControl\" type=\"hidden\" effect = {} wpmp_enable_desktop=\"yes\" wpmp_enable_tablet=\"yes\" wpmp_enable_mobile=\"yes\" wpmp_trigger_hook=\"0.5\" wpmp_reverse=\"yes\" wpmp_class_CSS =\"custom\" split-text = {} value=\"scrollmagic\"><\/span>\t\t\t\t\t\t\t\t\t\t<img loading=\"lazy\" decoding=\"async\" width=\"894\" height=\"444\" src=\"https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2024\/10\/directions.png\" class=\"attachment-full size-full wp-image-103136\" alt=\"\" srcset=\"https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2024\/10\/directions.png 894w, https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2024\/10\/directions-300x149.png 300w, https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2024\/10\/directions-768x381.png 768w, https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2024\/10\/directions-32x16.png 32w\" sizes=\"(max-width: 894px) 100vw, 894px\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-72b6ad1 elementor-widget elementor-widget-image\" data-id=\"72b6ad1\" data-element_type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<span id=\"scroll69d1989065a88\"  class=\"scrollMagicControl\" type=\"hidden\" effect = {} wpmp_enable_desktop=\"yes\" wpmp_enable_tablet=\"yes\" wpmp_enable_mobile=\"yes\" wpmp_trigger_hook=\"0.5\" wpmp_reverse=\"yes\" wpmp_class_CSS =\"custom\" split-text = {} value=\"scrollmagic\"><\/span>\t\t\t\t\t\t\t<figure class=\"wp-caption\">\n\t\t\t\t\t\t\t\t\t\t<img loading=\"lazy\" decoding=\"async\" 
width=\"1100\" height=\"626\" src=\"https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2024\/10\/apps.png\" class=\"attachment-full size-full wp-image-103134\" alt=\"\" srcset=\"https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2024\/10\/apps.png 1100w, https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2024\/10\/apps-300x171.png 300w, https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2024\/10\/apps-1030x586.png 1030w, https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2024\/10\/apps-768x437.png 768w, https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2024\/10\/apps-32x18.png 32w, https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2024\/10\/apps-1024x583.png 1024w\" sizes=\"(max-width: 1100px) 100vw, 1100px\" \/>\t\t\t\t\t\t\t\t\t\t\t<figcaption class=\"widget-image-caption wp-caption-text\">Credit: Cathy Fang<\/figcaption>\n\t\t\t\t\t\t\t\t\t\t<\/figure>\n\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-6829d2a3 elementor-widget elementor-widget-spacer\" data-id=\"6829d2a3\" data-element_type=\"widget\" data-widget_type=\"spacer.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<span id=\"scroll69d1989066150\"  class=\"scrollMagicControl\" type=\"hidden\" effect = {} wpmp_enable_desktop=\"yes\" wpmp_enable_tablet=\"yes\" wpmp_enable_mobile=\"yes\" wpmp_trigger_hook=\"0.5\" wpmp_reverse=\"yes\" wpmp_class_CSS =\"custom\" split-text = {} value=\"scrollmagic\"><\/span>\t\t<div class=\"elementor-spacer\">\n\t\t\t<div class=\"elementor-spacer-inner\"><\/div>\n\t\t<\/div>\n\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-47c78e1b elementor-widget elementor-widget-heading\" data-id=\"47c78e1b\" data-element_type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<span id=\"scroll69d1989066895\"  class=\"scrollMagicControl\" type=\"hidden\" 
effect = {} wpmp_enable_desktop=\"yes\" wpmp_enable_tablet=\"yes\" wpmp_enable_mobile=\"yes\" wpmp_trigger_hook=\"0.5\" wpmp_reverse=\"yes\" wpmp_class_CSS =\"custom\" split-text = {} value=\"scrollmagic\"><\/span><h3 class=\"elementor-heading-title elementor-size-default\">Build Your Own WatchThis<\/h3>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-59eda174 elementor-widget elementor-widget-text-editor\" data-id=\"59eda174\" data-element_type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<span id=\"scroll69d19890673ac\"  class=\"scrollMagicControl\" type=\"hidden\" effect = {} wpmp_enable_desktop=\"yes\" wpmp_enable_tablet=\"yes\" wpmp_enable_mobile=\"yes\" wpmp_trigger_hook=\"0.5\" wpmp_reverse=\"yes\" wpmp_class_CSS =\"custom\" split-text = {} value=\"scrollmagic\"><\/span>\t\t\t\t<p><span style=\"font-weight: 400\">Interested in building your own WatchThis wearable? Explore the open-source hardware and software components on <\/span><a href=\"https:\/\/github.com\/cathy-mengying-fang\/WatchThis\"><span style=\"font-weight: 400\">GitHub<\/span><\/a><span style=\"font-weight: 400\"> and start creating today!\u00a0<\/span>Check out their paper below for full details.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-eb86234 elementor-align-center elementor-widget elementor-widget-button\" data-id=\"eb86234\" data-element_type=\"widget\" data-widget_type=\"button.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<span id=\"scroll69d1989068686\"  class=\"scrollMagicControl\" type=\"hidden\" effect = {} wpmp_enable_desktop=\"yes\" wpmp_enable_tablet=\"yes\" wpmp_enable_mobile=\"yes\" wpmp_trigger_hook=\"0.5\" wpmp_reverse=\"yes\" wpmp_class_CSS =\"custom\" split-text = {} value=\"scrollmagic\"><\/span>\t\t\t\t<div class=\"elementor-button-wrapper\">\n\t\t\t\t\t<a 
class=\"elementor-button elementor-button-link elementor-size-sm\" href=\"https:\/\/cathy-fang.com\/image\/watchthis\/WatchiThis_UIST_PrePrint.pdf\">\n\t\t\t\t\t\t<span class=\"elementor-button-content-wrapper\">\n\t\t\t\t\t\t\t\t\t<span class=\"elementor-button-text\">WatchThis Paper<\/span>\n\t\t\t\t\t<\/span>\n\t\t\t\t\t<\/a>\n\t\t\t\t<\/div>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-24447e3 elementor-widget elementor-widget-spacer\" data-id=\"24447e3\" data-element_type=\"widget\" data-widget_type=\"spacer.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<span id=\"scroll69d1989068d4f\"  class=\"scrollMagicControl\" type=\"hidden\" effect = {} wpmp_enable_desktop=\"yes\" wpmp_enable_tablet=\"yes\" wpmp_enable_mobile=\"yes\" wpmp_trigger_hook=\"0.5\" wpmp_reverse=\"yes\" wpmp_class_CSS =\"custom\" split-text = {} value=\"scrollmagic\"><\/span>\t\t<div class=\"elementor-spacer\">\n\t\t\t<div class=\"elementor-spacer-inner\"><\/div>\n\t\t<\/div>\n\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t<div class=\"elementor-element elementor-element-1c159695 e-flex e-con-boxed e-con e-parent\" data-id=\"1c159695\" data-element_type=\"container\">\n\t\t\t\t\t<div class=\"e-con-inner\">\n\t\t\t\t<div class=\"elementor-element elementor-element-3c961a70 elementor-widget elementor-widget-heading\" data-id=\"3c961a70\" data-element_type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<span id=\"scroll69d198906cf2b\"  class=\"scrollMagicControl\" type=\"hidden\" effect = {} wpmp_enable_desktop=\"yes\" wpmp_enable_tablet=\"yes\" wpmp_enable_mobile=\"yes\" wpmp_trigger_hook=\"0.5\" wpmp_reverse=\"yes\" wpmp_class_CSS =\"custom\" split-text = {} value=\"scrollmagic\"><\/span><h4 class=\"elementor-heading-title elementor-size-default\">End 
Note<\/h4>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-1dec9a95 elementor-widget elementor-widget-text-editor\" data-id=\"1dec9a95\" data-element_type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<span id=\"scroll69d198906d689\"  class=\"scrollMagicControl\" type=\"hidden\" effect = {} wpmp_enable_desktop=\"yes\" wpmp_enable_tablet=\"yes\" wpmp_enable_mobile=\"yes\" wpmp_trigger_hook=\"0.5\" wpmp_reverse=\"yes\" wpmp_class_CSS =\"custom\" split-text = {} value=\"scrollmagic\"><\/span>\t\t\t\t<p>Hey community, we&#8217;re curating a monthly newsletter centering around the beloved <a href=\"https:\/\/www.seeedstudio.com\/xiao-series-page\"><span style=\"color: #99cc00\"><strong>Seeed Studio XIAO<\/strong><\/span><\/a>. If you want to stay up-to-date with:<\/p><p>\ud83e\udd16\ufe0f\u00a0<strong>Cool Projects from the Community<\/strong>\u00a0to get inspiration and tutorials<br \/>\ud83d\udcf0\u00a0<strong>Product Updates<\/strong>: firmware update, new product spoiler<br \/>\ud83d\udcd6\u00a0<strong>Wiki Updates<\/strong>: new wikis + wiki contribution<br \/>\ud83d\udce3\u00a0<strong>News<\/strong>: events, contests, and other community stuff<\/p><p>Please click the image below\ud83d\udc47 to subscribe now!<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-3cdcf9bb elementor-widget elementor-widget-image\" data-id=\"3cdcf9bb\" data-element_type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<span id=\"scroll69d198906e81d\"  class=\"scrollMagicControl\" type=\"hidden\" effect = {} wpmp_enable_desktop=\"yes\" wpmp_enable_tablet=\"yes\" wpmp_enable_mobile=\"yes\" wpmp_trigger_hook=\"0.5\" wpmp_reverse=\"yes\" wpmp_class_CSS =\"custom\" split-text = {} value=\"scrollmagic\"><\/span>\t\t\t\t\t\t\t\t\t\t\t<a 
href=\"https:\/\/mailchi.mp\/seeed\/xiao\">\n\t\t\t\t\t\t\t<img loading=\"lazy\" decoding=\"async\" width=\"2560\" height=\"800\" src=\"https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2024\/05\/XIAO-Newsletter-scaled.jpg\" class=\"attachment-full size-full wp-image-94852\" alt=\"\" srcset=\"https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2024\/05\/XIAO-Newsletter-scaled.jpg 2560w, https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2024\/05\/XIAO-Newsletter-300x94.jpg 300w, https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2024\/05\/XIAO-Newsletter-1030x322.jpg 1030w, https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2024\/05\/XIAO-Newsletter-768x240.jpg 768w, https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2024\/05\/XIAO-Newsletter-1536x480.jpg 1536w, https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2024\/05\/XIAO-Newsletter-2048x640.jpg 2048w, https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2024\/05\/XIAO-Newsletter-32x10.jpg 32w, https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2024\/05\/XIAO-Newsletter-1024x320.jpg 1024w\" sizes=\"(max-width: 2560px) 100vw, 2560px\" \/>\t\t\t\t\t\t\t\t<\/a>\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t","protected":false},"excerpt":{"rendered":"<p>MIT Media Lab researchers Cathy Mengying Fang, Patrick Chwalek, Quincy Kuang, and Pattie Maes 
have<\/p>\n","protected":false},"author":3623,"featured_media":103141,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_lmt_disableupdate":"","_lmt_disable":"","_price":"","_stock":"","_tribe_ticket_header":"","_tribe_default_ticket_provider":"","_tribe_ticket_capacity":"0","_ticket_start_date":"","_ticket_end_date":"","_tribe_ticket_show_description":"","_tribe_ticket_show_not_going":false,"_tribe_ticket_use_global_stock":"","_tribe_ticket_global_stock_level":"","_global_stock_mode":"","_global_stock_cap":"","_tribe_rsvp_for_event":"","_tribe_ticket_going_count":"","_tribe_ticket_not_going_count":"","_tribe_tickets_list":"[]","_tribe_ticket_has_attendee_info_fields":false,"iawp_total_views":0,"footnotes":""},"categories":[4391,4393],"tags":[1301,304,4899,4969,4706,4538,4725,108,3129,4555],"class_list":["post-103133","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-build","category-tech","tag-ai","tag-iot","tag-openai","tag-round-display","tag-seeed-studio-xiao","tag-vision-ai","tag-vision-language-model","tag-wearable","tag-xiao","tag-xiao-esp32s3-sense"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v24.0 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>WatchThis: A Wearable Point-and-Ask Interface Powered by Vision-Language Models and XIAO ESP32S3 Sense - Latest News from Seeed Studio<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.seeedstudio.com\/blog\/2024\/10\/21\/watchthis-a-wearable-point-and-ask-interface-powered-by-vision-language-models-and-xiao-esp32s3-sense\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"WatchThis: A Wearable Point-and-Ask Interface Powered by Vision-Language Models and XIAO 
ESP32S3 Sense - Latest News from Seeed Studio\" \/>\n<meta property=\"og:description\" content=\"MIT Media Lab researchers Cathy Mengying Fang, Patrick Chwalek, Quincy Kuang, and Pattie Maes have\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.seeedstudio.com\/blog\/2024\/10\/21\/watchthis-a-wearable-point-and-ask-interface-powered-by-vision-language-models-and-xiao-esp32s3-sense\/\" \/>\n<meta property=\"og:site_name\" content=\"Latest News from Seeed Studio\" \/>\n<meta property=\"article:published_time\" content=\"2024-10-21T04:58:09+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2024-10-31T07:04:36+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2024\/10\/df3b5ad0-e510-406b-a669-4b7b443d79d4.jpeg\" \/>\n\t<meta property=\"og:image:width\" content=\"1200\" \/>\n\t<meta property=\"og:image:height\" content=\"675\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Kezang Loday\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Kezang Loday\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"5 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.seeedstudio.com\/blog\/2024\/10\/21\/watchthis-a-wearable-point-and-ask-interface-powered-by-vision-language-models-and-xiao-esp32s3-sense\/\",\"url\":\"https:\/\/www.seeedstudio.com\/blog\/2024\/10\/21\/watchthis-a-wearable-point-and-ask-interface-powered-by-vision-language-models-and-xiao-esp32s3-sense\/\",\"name\":\"WatchThis: A Wearable Point-and-Ask Interface Powered by Vision-Language Models and XIAO ESP32S3 Sense - Latest News from Seeed Studio\",\"isPartOf\":{\"@id\":\"https:\/\/www.seeedstudio.com\/blog\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/www.seeedstudio.com\/blog\/2024\/10\/21\/watchthis-a-wearable-point-and-ask-interface-powered-by-vision-language-models-and-xiao-esp32s3-sense\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/www.seeedstudio.com\/blog\/2024\/10\/21\/watchthis-a-wearable-point-and-ask-interface-powered-by-vision-language-models-and-xiao-esp32s3-sense\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2024\/10\/df3b5ad0-e510-406b-a669-4b7b443d79d4.jpeg\",\"datePublished\":\"2024-10-21T04:58:09+00:00\",\"dateModified\":\"2024-10-31T07:04:36+00:00\",\"author\":{\"@id\":\"https:\/\/www.seeedstudio.com\/blog\/#\/schema\/person\/8ba4d5089097d474bddd0715aec08055\"},\"breadcrumb\":{\"@id\":\"https:\/\/www.seeedstudio.com\/blog\/2024\/10\/21\/watchthis-a-wearable-point-and-ask-interface-powered-by-vision-language-models-and-xiao-esp32s3-sense\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.seeedstudio.com\/blog\/2024\/10\/21\/watchthis-a-wearable-point-and-ask-interface-powered-by-vision-language-models-and-xiao-esp32s3-sense\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-
US\",\"@id\":\"https:\/\/www.seeedstudio.com\/blog\/2024\/10\/21\/watchthis-a-wearable-point-and-ask-interface-powered-by-vision-language-models-and-xiao-esp32s3-sense\/#primaryimage\",\"url\":\"https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2024\/10\/df3b5ad0-e510-406b-a669-4b7b443d79d4.jpeg\",\"contentUrl\":\"https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2024\/10\/df3b5ad0-e510-406b-a669-4b7b443d79d4.jpeg\",\"width\":1200,\"height\":675},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.seeedstudio.com\/blog\/2024\/10\/21\/watchthis-a-wearable-point-and-ask-interface-powered-by-vision-language-models-and-xiao-esp32s3-sense\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.seeedstudio.com\/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"WatchThis: A Wearable Point-and-Ask Interface Powered by Vision-Language Models and XIAO ESP32S3 Sense\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.seeedstudio.com\/blog\/#website\",\"url\":\"https:\/\/www.seeedstudio.com\/blog\/\",\"name\":\"Latest News from Seeed Studio\",\"description\":\"Emerging IoT, AI and Autonomous Applications on the Edge\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.seeedstudio.com\/blog\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.seeedstudio.com\/blog\/#\/schema\/person\/8ba4d5089097d474bddd0715aec08055\",\"name\":\"Kezang 
Loday\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.seeedstudio.com\/blog\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/f949ebcb0a7740f701fdbabe6c11427e?s=96&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/f949ebcb0a7740f701fdbabe6c11427e?s=96&r=g\",\"caption\":\"Kezang Loday\"},\"url\":\"https:\/\/www.seeedstudio.com\/blog\/author\/kezang\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"WatchThis: A Wearable Point-and-Ask Interface Powered by Vision-Language Models and XIAO ESP32S3 Sense - Latest News from Seeed Studio","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.seeedstudio.com\/blog\/2024\/10\/21\/watchthis-a-wearable-point-and-ask-interface-powered-by-vision-language-models-and-xiao-esp32s3-sense\/","og_locale":"en_US","og_type":"article","og_title":"WatchThis: A Wearable Point-and-Ask Interface Powered by Vision-Language Models and XIAO ESP32S3 Sense - Latest News from Seeed Studio","og_description":"MIT Media Lab researchers Cathy Mengying Fang, Patrick Chwalek, Quincy Kuang, and Pattie Maes have","og_url":"https:\/\/www.seeedstudio.com\/blog\/2024\/10\/21\/watchthis-a-wearable-point-and-ask-interface-powered-by-vision-language-models-and-xiao-esp32s3-sense\/","og_site_name":"Latest News from Seeed Studio","article_published_time":"2024-10-21T04:58:09+00:00","article_modified_time":"2024-10-31T07:04:36+00:00","og_image":[{"width":1200,"height":675,"url":"https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2024\/10\/df3b5ad0-e510-406b-a669-4b7b443d79d4.jpeg","type":"image\/jpeg"}],"author":"Kezang Loday","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Kezang Loday","Est. 
reading time":"5 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/www.seeedstudio.com\/blog\/2024\/10\/21\/watchthis-a-wearable-point-and-ask-interface-powered-by-vision-language-models-and-xiao-esp32s3-sense\/","url":"https:\/\/www.seeedstudio.com\/blog\/2024\/10\/21\/watchthis-a-wearable-point-and-ask-interface-powered-by-vision-language-models-and-xiao-esp32s3-sense\/","name":"WatchThis: A Wearable Point-and-Ask Interface Powered by Vision-Language Models and XIAO ESP32S3 Sense - Latest News from Seeed Studio","isPartOf":{"@id":"https:\/\/www.seeedstudio.com\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.seeedstudio.com\/blog\/2024\/10\/21\/watchthis-a-wearable-point-and-ask-interface-powered-by-vision-language-models-and-xiao-esp32s3-sense\/#primaryimage"},"image":{"@id":"https:\/\/www.seeedstudio.com\/blog\/2024\/10\/21\/watchthis-a-wearable-point-and-ask-interface-powered-by-vision-language-models-and-xiao-esp32s3-sense\/#primaryimage"},"thumbnailUrl":"https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2024\/10\/df3b5ad0-e510-406b-a669-4b7b443d79d4.jpeg","datePublished":"2024-10-21T04:58:09+00:00","dateModified":"2024-10-31T07:04:36+00:00","author":{"@id":"https:\/\/www.seeedstudio.com\/blog\/#\/schema\/person\/8ba4d5089097d474bddd0715aec08055"},"breadcrumb":{"@id":"https:\/\/www.seeedstudio.com\/blog\/2024\/10\/21\/watchthis-a-wearable-point-and-ask-interface-powered-by-vision-language-models-and-xiao-esp32s3-sense\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.seeedstudio.com\/blog\/2024\/10\/21\/watchthis-a-wearable-point-and-ask-interface-powered-by-vision-language-models-and-xiao-esp32s3-sense\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.seeedstudio.com\/blog\/2024\/10\/21\/watchthis-a-wearable-point-and-ask-interface-powered-by-vision-language-models-and-xiao-esp32s3-sense\/#primaryimage","url":"https:\
/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2024\/10\/df3b5ad0-e510-406b-a669-4b7b443d79d4.jpeg","contentUrl":"https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2024\/10\/df3b5ad0-e510-406b-a669-4b7b443d79d4.jpeg","width":1200,"height":675},{"@type":"BreadcrumbList","@id":"https:\/\/www.seeedstudio.com\/blog\/2024\/10\/21\/watchthis-a-wearable-point-and-ask-interface-powered-by-vision-language-models-and-xiao-esp32s3-sense\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.seeedstudio.com\/blog\/"},{"@type":"ListItem","position":2,"name":"WatchThis: A Wearable Point-and-Ask Interface Powered by Vision-Language Models and XIAO ESP32S3 Sense"}]},{"@type":"WebSite","@id":"https:\/\/www.seeedstudio.com\/blog\/#website","url":"https:\/\/www.seeedstudio.com\/blog\/","name":"Latest News from Seeed Studio","description":"Emerging IoT, AI and Autonomous Applications on the Edge","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.seeedstudio.com\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/www.seeedstudio.com\/blog\/#\/schema\/person\/8ba4d5089097d474bddd0715aec08055","name":"Kezang Loday","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.seeedstudio.com\/blog\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/f949ebcb0a7740f701fdbabe6c11427e?s=96&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/f949ebcb0a7740f701fdbabe6c11427e?s=96&r=g","caption":"Kezang 
Loday"},"url":"https:\/\/www.seeedstudio.com\/blog\/author\/kezang\/"}]}},"modified_by":"Lily","views":5382,"featured_image_urls":{"full":["https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2024\/10\/df3b5ad0-e510-406b-a669-4b7b443d79d4.jpeg",1200,675,false],"thumbnail":["https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2024\/10\/df3b5ad0-e510-406b-a669-4b7b443d79d4-80x80.jpeg",80,80,true],"medium":["https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2024\/10\/df3b5ad0-e510-406b-a669-4b7b443d79d4-300x169.jpeg",300,169,true],"medium_large":["https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2024\/10\/df3b5ad0-e510-406b-a669-4b7b443d79d4-768x432.jpeg",640,360,true],"large":["https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2024\/10\/df3b5ad0-e510-406b-a669-4b7b443d79d4-1030x579.jpeg",640,360,true],"1536x1536":["https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2024\/10\/df3b5ad0-e510-406b-a669-4b7b443d79d4.jpeg",1200,675,false],"2048x2048":["https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2024\/10\/df3b5ad0-e510-406b-a669-4b7b443d79d4.jpeg",1200,675,false],"visody_icon":["https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2024\/10\/df3b5ad0-e510-406b-a669-4b7b443d79d4-32x18.jpeg",32,18,true],"magazine-7-slider-full":["https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2024\/10\/df3b5ad0-e510-406b-a669-4b7b443d79d4.jpeg",1200,675,false],"magazine-7-slider-center":["https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2024\/10\/df3b5ad0-e510-406b-a669-4b7b443d79d4-936x675.jpeg",936,675,true],"magazine-7-featured":["https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2024\/10\/df3b5ad0-e510-406b-a669-4b7b443d79d4-1024x576.jpeg",1024,576,true],"magazine-7-medium":["https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2024\/10\/df3b5ad0-e510-406b-a669-4b7b443d79d4-720x380.jpeg",720,380,true],"magazine-7-medium-square":["https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2024\/10
\/df3b5ad0-e510-406b-a669-4b7b443d79d4-675x450.jpeg",675,450,true]},"author_info":{"display_name":"Kezang Loday","author_link":"https:\/\/www.seeedstudio.com\/blog\/author\/kezang\/"},"category_info":"<a href=\"https:\/\/www.seeedstudio.com\/blog\/category\/build\/\" rel=\"category tag\">Build<\/a> <a href=\"https:\/\/www.seeedstudio.com\/blog\/category\/tech\/\" rel=\"category tag\">Tech<\/a>","tag_info":"Tech","comment_count":"1","_links":{"self":[{"href":"https:\/\/www.seeedstudio.com\/blog\/wp-json\/wp\/v2\/posts\/103133","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.seeedstudio.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.seeedstudio.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.seeedstudio.com\/blog\/wp-json\/wp\/v2\/users\/3623"}],"replies":[{"embeddable":true,"href":"https:\/\/www.seeedstudio.com\/blog\/wp-json\/wp\/v2\/comments?post=103133"}],"version-history":[{"count":27,"href":"https:\/\/www.seeedstudio.com\/blog\/wp-json\/wp\/v2\/posts\/103133\/revisions"}],"predecessor-version":[{"id":104580,"href":"https:\/\/www.seeedstudio.com\/blog\/wp-json\/wp\/v2\/posts\/103133\/revisions\/104580"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.seeedstudio.com\/blog\/wp-json\/wp\/v2\/media\/103141"}],"wp:attachment":[{"href":"https:\/\/www.seeedstudio.com\/blog\/wp-json\/wp\/v2\/media?parent=103133"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.seeedstudio.com\/blog\/wp-json\/wp\/v2\/categories?post=103133"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.seeedstudio.com\/blog\/wp-json\/wp\/v2\/tags?post=103133"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}