{"id":53572,"date":"2021-10-10T02:56:15","date_gmt":"2021-10-09T18:56:15","guid":{"rendered":"https:\/\/www.seeedstudio.com\/blog\/?p=53572"},"modified":"2021-11-20T00:41:56","modified_gmt":"2021-11-19T16:41:56","slug":"learn-tinyml-using-wio-terminal-and-arduino-ide-6-speech-recognition-on-mcu-speech-to-intent","status":"publish","type":"post","link":"https:\/\/www.seeedstudio.com\/blog\/2021\/10\/10\/learn-tinyml-using-wio-terminal-and-arduino-ide-6-speech-recognition-on-mcu-speech-to-intent\/","title":{"rendered":"Learn TinyML using Wio Terminal and Arduino IDE  #6 Speech recognition on MCU &#8211; Speech-to-Intent"},"content":{"rendered":"\n<p>A traditional approach to using speech for device control\/user request fulfillment is to first transcribe the speech to text and then parse the text into commands\/queries in a suitable format. While this approach offers a lot of flexibility in terms of vocabulary and application scenarios, a combination of a speech recognition model and a dedicated parser is not suitable for the constrained resources of microcontrollers.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large is-resized\"><img fetchpriority=\"high\" decoding=\"async\" src=\"https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2021\/10\/image-6-1030x418.png\" alt=\"\" class=\"wp-image-53663\" width=\"840\" height=\"340\" srcset=\"https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2021\/10\/image-6-1030x418.png 1030w, https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2021\/10\/image-6-300x122.png 300w, https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2021\/10\/image-6-768x312.png 768w, https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2021\/10\/image-6-1024x416.png 1024w, https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2021\/10\/image-6.png 1372w\" sizes=\"(max-width: 840px) 100vw, 840px\" \/><figcaption>Source: <a href=\"https:\/\/www.seeedstudio.com\/Wio-Terminal-p-4509.html\" target=\"_blank\" 
rel=\"noreferrer noopener\">Wio Terminal<\/a>, <a href=\"https:\/\/picovoice.ai\/\" target=\"_blank\" rel=\"noreferrer noopener\">Picovoice<\/a>, <a href=\"https:\/\/www.tensorflow.org\/lite\" target=\"_blank\" rel=\"noreferrer noopener\">TensorFlow Lite<\/a><\/figcaption><\/figure>\n\n\n\n<p>A more efficient way is to directly parse user utterances into actionable output in the form of intents\/slots. In this article I will share techniques to train a domain-specific speech-to-intent model and deploy it to a Cortex-M4F based development board with a built-in microphone, the Wio Terminal from Seeed Studio.<\/p>\n\n\n\n<figure class=\"wp-block-embed is-type-video is-provider-youtube wp-block-embed-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio\"><div class=\"wp-block-embed__wrapper\">\n<iframe title=\"Wio Terminal TinyML Course #6 Speech recognition on MCU - Speech-to-Intent\" width=\"640\" height=\"360\" src=\"https:\/\/www.youtube.com\/embed\/CVq4cet5jgI?feature=oembed\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share\" referrerpolicy=\"strict-origin-when-cross-origin\" allowfullscreen><\/iframe>\n<\/div><\/figure>\n\n\n\n<p>There are different types of speech recognition tasks &#8211; we can roughly divide them into three groups:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Large-vocabulary continuous speech recognition (LVCSR)<\/li><li>Keyword spotting<\/li><li>Speech-to-Intent<\/li><\/ul>\n\n\n\n<p>Keyword spotting works well on microcontrollers and is fairly easy to train with a variety of no-code open-source tools, e.g. Edge Impulse, but it cannot handle larger vocabularies well. 
<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-large is-resized\"><img decoding=\"async\" src=\"https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2021\/10\/image-7.png\" alt=\"\" class=\"wp-image-53664\" width=\"441\" height=\"327\" srcset=\"https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2021\/10\/image-7.png 953w, https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2021\/10\/image-7-300x223.png 300w, https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2021\/10\/image-7-768x570.png 768w\" sizes=\"(max-width: 441px) 100vw, 441px\" \/><\/figure><\/div>\n\n\n\n<p>If we\u2019d like a device to take a useful action based on speech input, we need to combine an LVCSR model with a text-based natural language parser &#8211; this approach is robust and somewhat easier to implement, given the abundance of publicly available ASR engines, but it is not suitable for running even on SBCs, let alone microcontrollers.<\/p>\n\n\n\n<p>There is a third way: direct conversion of speech to parsed intent, based on a specific domain vocabulary. Let\u2019s take a smart washing machine or smart lights as an example. 
A Speech-to-Intent model, upon processing the utterance <strong>\u201cNormal cycle with low-spin\u201d<\/strong>, would output a parsed intent, for example<strong> Intent: washClothes, Slots: cycle: normal spin: low water: default.<\/strong><\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-large is-resized\"><img decoding=\"async\" src=\"https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2021\/10\/image-9.png\" alt=\"\" class=\"wp-image-53666\" width=\"524\" height=\"424\" srcset=\"https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2021\/10\/image-9.png 877w, https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2021\/10\/image-9-300x243.png 300w, https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2021\/10\/image-9-768x621.png 768w\" sizes=\"(max-width: 524px) 100vw, 524px\" \/><\/figure><\/div>\n\n\n\n<p>And this is really all we need to control said smart washing machine with voice.<\/p>\n\n\n\n<p>Speech-to-Intent is well represented in research, but lacks widely available open-source implementations suitable for microcontrollers.<br><strong>Production-ready, not open-source:<\/strong><br>&#8211; Picovoice<br>&#8211; Fluent.ai<br><strong>Production-ready, FOSS, not suitable for microcontrollers:<\/strong><br>&#8211; Speechbrain.io<\/p>\n\n\n\n<p>One of the main motivations behind my work on this project was to create an open-source, easily accessible package for training and deploying Speech-to-Intent models on microcontrollers and SBCs. 
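To make the intent and slot structure concrete, here is a purely illustrative Python sketch of the kind of record such a model produces for the washing machine utterance above; the field names mirror the example in the text and are not an actual API of this project:

```python
# Illustrative only: the target output structure of a Speech-to-Intent model
# for the utterance "Normal cycle with low-spin" (names taken from the article
# text, not from the project code). Slots the utterance does not mention fall
# back to defaults, e.g. "water" here.

def parsed_intent():
    return {
        "intent": "washClothes",
        "slots": {"cycle": "normal", "spin": "low", "water": "default"},
    }

result = parsed_intent()
```

A real Speech-to-Intent model predicts the intent class and slot values directly from audio features, skipping the intermediate transcript entirely.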
Additionally, since this is project #6 of my TinyML course series, I\u2019d like to give learners a more in-depth view of creating a project with pure TensorFlow Lite for Microcontrollers &#8211; while Edge Impulse is great, and I recommend starting with it if you\u2019re new to machine learning inference on microcontrollers, using TensorFlow Lite for Microcontrollers has its own benefits too, for example much greater flexibility in terms of the data you can use and the model architectures you can try.<\/p>\n\n\n\n<p>As a wise man once said, \u201cTalk is cheap. Show me the code.\u201d <\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"636\" height=\"337\" src=\"https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2021\/10\/image-11.png\" alt=\"\" class=\"wp-image-53746\" srcset=\"https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2021\/10\/image-11.png 636w, https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2021\/10\/image-11-300x159.png 300w\" sizes=\"(max-width: 636px) 100vw, 636px\" \/><\/figure><\/div>\n\n\n\n<p>For model training you can use either <a href=\"https:\/\/github.com\/AIWintermuteAI\/Speech-to-Intent-Micro\/tree\/main\/jupyter_notebooks\" target=\"_blank\" aria-label=\"undefined (opens in a new tab)\" rel=\"noreferrer noopener\">a Jupyter Notebook<\/a> I prepared or the training scripts from the <a href=\"https:\/\/github.com\/AIWintermuteAI\/Speech-to-Intent-Micro\" target=\"_blank\" aria-label=\"undefined (opens in a new tab)\" rel=\"noreferrer noopener\">GitHub repository<\/a>. 
The Jupyter Notebook contains a very basic reference model implementation and also has an explanation for each cell.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1030\" height=\"439\" src=\"https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2021\/10\/image-10-1030x439.png\" alt=\"\" class=\"wp-image-53668\" srcset=\"https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2021\/10\/image-10-1030x439.png 1030w, https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2021\/10\/image-10-300x128.png 300w, https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2021\/10\/image-10-768x327.png 768w, https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2021\/10\/image-10-1024x436.png 1024w, https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2021\/10\/image-10.png 1432w\" sizes=\"(max-width: 1030px) 100vw, 1030px\" \/><\/figure>\n\n\n\n<p>After the model is trained, copy it to the folder with the <a href=\"https:\/\/github.com\/AIWintermuteAI\/Speech-to-Intent-Micro\/tree\/main\/inference_code\/Wio_Terminal\/wio_speech_to_intent_150_10\" target=\"_blank\" aria-label=\"undefined (opens in a new tab)\" rel=\"noreferrer noopener\">code for Wio Terminal<\/a> and change the model name on <a href=\"https:\/\/github.com\/AIWintermuteAI\/Speech-to-Intent-Micro\/blob\/886746bb1878971d43e3e39584e0e2a492933491\/inference_code\/Wio_Terminal\/wio_speech_to_intent_150_10\/wio_speech_to_intent_150_10.ino#L106\" target=\"_blank\" aria-label=\"undefined (opens in a new tab)\" rel=\"noreferrer noopener\">line 106<\/a> to your model name. Let\u2019s go over the most important pieces of the code. 
It can be roughly divided into three parts:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>audio acquisition<\/li><li>MFCC calculation<\/li><li>inference on MFCC features<\/li><\/ul>\n\n\n\n<p><strong>Audio acquisition<\/strong><\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"alignleft\"><img decoding=\"async\" src=\"https:\/\/img.tfd.com\/cde\/DMAPIO.GIF\" alt=\"Direct Memory Access | Article about Direct Memory Access by The Free  Dictionary\"\/><\/figure><\/div>\n\n\n\n<p>To record sound for processing with the Wio Terminal\u2019s built-in microphone, we use the DMA ADC function of the Cortex-M4F MCU. DMA stands for Direct Memory Access, and it is exactly what it says on the tin &#8211; a specific part of the MCU called the DMAC, or Direct Memory Access Controller, is set up beforehand to &#8220;pipe&#8221; data from one location (e.g. internal memory, SPI, I2C, ADC or another interface) to another. This way the transfer can happen without much involvement from the CPU core, apart from the initial setup. We set the source and destination for the transfer here:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>descriptor.descaddr = (uint32_t)&amp;descriptor_section&#91;1]; \/\/ Set up a circular descriptor\ndescriptor.srcaddr = (uint32_t)&amp;ADC1-&gt;RESULT.reg; \/\/ Take the result from the ADC1 RESULT register\ndescriptor.dstaddr = (uint32_t)adc_buf_0 + sizeof(uint16_t) * ADC_BUF_LEN;  \/\/ Place it in the adc_buf_0 array\ndescriptor.btcnt = ADC_BUF_LEN;  \/\/ Beat count\ndescriptor.btctrl = DMAC_BTCTRL_BEATSIZE_HWORD |   \/\/ Beat size is HWORD (16-bits)\n                      DMAC_BTCTRL_DSTINC |      \/\/ Increment the destination address\n                      DMAC_BTCTRL_VALID |       \/\/ Descriptor is valid\n                      DMAC_BTCTRL_BLOCKACT_SUSPEND; \/\/ Suspend DMAC channel after block transfer\nmemcpy(&amp;descriptor_section&#91;0], &amp;descriptor, sizeof(descriptor));  \/\/ Copy the descriptor to the descriptor section\ndescriptor.descaddr = 
(uint32_t)&amp;descriptor_section&#91;0];           \/\/ Set up a circular descriptor\ndescriptor.srcaddr = (uint32_t)&amp;ADC1-&gt;RESULT.reg;                 \/\/ Take the result from the ADC1 RESULT register\ndescriptor.dstaddr = (uint32_t)adc_buf_1 + sizeof(uint16_t) * ADC_BUF_LEN;  \/\/ Place it in the adc_buf_1 array\ndescriptor.btcnt = ADC_BUF_LEN;  \/\/ Beat count\ndescriptor.btctrl = DMAC_BTCTRL_BEATSIZE_HWORD |    \/\/ Beat size is HWORD (16-bits)\n                      DMAC_BTCTRL_DSTINC |    \/\/ Increment the destination address\n                      DMAC_BTCTRL_VALID |      \/\/ Descriptor is valid\n                      DMAC_BTCTRL_BLOCKACT_SUSPEND; \/\/ Suspend DMAC channel after block transfer\nmemcpy(&amp;descriptor_section&#91;1], &amp;descriptor, sizeof(descriptor));  \/\/ Copy the descriptor to the descriptor section<\/code><\/pre>\n\n\n\n<p>As we specify with the DMAC_BTCTRL_BLOCKACT_SUSPEND parameter in the DMA descriptor, the DMA channel is suspended after each complete block transfer. 
We then proceed to set up an ISR (Interrupt Service Routine) triggered by the TC5 timer:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code> \/\/ Configure Timer\/Counter 5\nGCLK-&gt;PCHCTRL&#91;TC5_GCLK_ID].reg = GCLK_PCHCTRL_CHEN | \/\/ Enable peripheral channel for TC5\nGCLK_PCHCTRL_GEN_GCLK1;    \/\/ Connect generic clock 1 at 48MHz\nTC5-&gt;COUNT16.WAVE.reg = TC_WAVE_WAVEGEN_MFRQ;     \/\/ Set TC5 to Match Frequency(MFRQ) mode\nTC5-&gt;COUNT16.CC&#91;0].reg = 3000 - 1;                          \/\/ Set the trigger to 16 kHz: (48MHz \/ 16000) - 1\nwhile (TC5-&gt;COUNT16.SYNCBUSY.bit.CC0);                      \/\/ Wait for synchronization\n\/\/ Start Timer\/Counter 5\nTC5-&gt;COUNT16.CTRLA.bit.ENABLE = 1;                          \/\/ Enable the TC5 timer\nwhile (TC5-&gt;COUNT16.SYNCBUSY.bit.ENABLE);                   \/\/ Wait for synchronization<\/code><\/pre>\n\n\n\n<p>The ISR will call a specific function at equally spaced intervals of time, controlled by the TC5 timer. Let&#8217;s have a look at that function.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>\/**\n * Interrupt Service Routine (ISR) for DMAC 1\n *\/\nvoid DMAC_1_Handler() {\n\n  static uint8_t count = 0;\n\n  \/\/ Check if DMAC channel 1 has been suspended (SUSP)\n  if (DMAC-&gt;Channel&#91;1].CHINTFLAG.bit.SUSP) {\n\n     \/\/ Debug: make pin high before copying buffer\n#ifdef DEBUG\n    digitalWrite(debug_pin, HIGH);\n#endif\n\n    \/\/ Restart DMAC on channel 1 and clear SUSP interrupt flag\n    DMAC-&gt;Channel&#91;1].CHCTRLB.reg = DMAC_CHCTRLB_CMD_RESUME;\n    DMAC-&gt;Channel&#91;1].CHINTFLAG.bit.SUSP = 1;\n\n    \/\/ See which buffer has filled up, and dump results into large buffer\n    if (count) {\n      audio_rec_callback(adc_buf_0, ADC_BUF_LEN);\n    } else {\n      audio_rec_callback(adc_buf_1, ADC_BUF_LEN);\n    }\n\n    \/\/ Flip to next buffer\n    count = (count + 1) % 2;\n\n    \/\/ Debug: make pin low after copying buffer\n#ifdef DEBUG\n    digitalWrite(debug_pin, LOW);\n#endif\n 
 }\n}<\/code><\/pre>\n\n\n\n<p>The ISR function <strong>DMAC_1_Handler()<\/strong> checks whether DMAC channel 1 was suspended &#8211; which happens when one block of data has finished recording. If it was, the ISR calls a user-defined function, <strong>audio_rec_callback()<\/strong>, where we copy the contents of the filled DMA ADC buffer into a (possibly) larger buffer used to calculate the MFCC features. Optionally, we also apply some sound post-processing at this step.<\/p>\n\n\n\n<p><strong>MFCC calculation<\/strong><\/p>\n\n\n\n<p>The MFCC feature extraction code, written to match the TensorFlow MFCC Op, is borrowed from the ARM repository for keyword spotting on ARM microcontrollers. You can find the original code <a aria-label=\"undefined (opens in a new tab)\" href=\"https:\/\/github.com\/ARM-software\/ML-KWS-for-MCU\" target=\"_blank\" rel=\"noreferrer noopener\">here<\/a>. <\/p>\n\n\n\n<p>Most of the work related to MFCC feature calculation happens within the <strong>mfcc_compute(const int16_t * audio_data, float* mfcc_out)<\/strong> method of the MFCC class. The method receives a pointer to the audio data &#8211; in our case 320 sound data points &#8211; and a pointer to a specific position in the array of MFCC output values. 
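The ping-pong (double-buffer) scheme implemented by the two circular descriptors and the ISR above can be simulated in a few lines of Python. This is a simplified illustration of the data flow, not the device code: while the DMA engine fills one buffer, the just-completed buffer is handed to the callback, so no samples are lost.

```python
# Simplified simulation of double (ping-pong) buffering: the "DMA engine"
# alternately fills buf 0 and buf 1; after each block, the "ISR" hands the
# just-filled buffer to audio_rec_callback, which appends it to a larger
# recording buffer later used for MFCC calculation.

ADC_BUF_LEN = 4          # tiny buffer for illustration only
recording = []           # the (possibly) larger buffer mentioned in the text

def audio_rec_callback(buf, buf_len):
    recording.extend(buf[:buf_len])

def record(samples):
    buffers = [[0] * ADC_BUF_LEN, [0] * ADC_BUF_LEN]
    count = 0
    for block_start in range(0, len(samples), ADC_BUF_LEN):
        block = samples[block_start:block_start + ADC_BUF_LEN]
        buffers[count][:len(block)] = block             # "DMA" fills a buffer
        audio_rec_callback(buffers[count], len(block))  # "ISR" drains it
        count = (count + 1) % 2                         # flip to next buffer

record(list(range(12)))
```

On the real hardware the fill and the drain happen concurrently, which is exactly why two buffers are needed; the simulation above only preserves the ordering logic.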
For one time slice we do the following operations:<\/p>\n\n\n\n<p>Normalize the data to (-1, 1) and pad it (in our case the padding does not happen, since the audio data is always exactly the size necessary to calculate one slice of MFCC features):<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>  \/\/TensorFlow way of normalizing .wav data to (-1,1)\n  for (i = 0; i &lt; frame_len; i++) {\n    frame&#91;i] = (float)audio_data&#91;i]\/(1&lt;&lt;15); \n  }\n  \/\/Fill up remaining with zeros\n  memset(&amp;frame&#91;frame_len], 0, sizeof(float) * (frame_len_padded-frame_len));\n<\/code><\/pre>\n\n\n\n<p>Calculate the RFFT, or <a href=\"https:\/\/www.keil.com\/pack\/doc\/CMSIS\/DSP\/html\/group__RealFFT.html\" target=\"_blank\" aria-label=\"undefined (opens in a new tab)\" rel=\"noreferrer noopener\">Real Fast Fourier Transform<\/a>, with the ARM Math library function:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>  \/\/Compute FFT\n  arm_rfft_fast_f32(rfft, frame, buffer, 0);<\/code><\/pre>\n\n\n\n<p>Convert the values to a power spectrum:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>  \/\/frame is stored as &#91;real0, realN\/2, real1, im1, real2, im2, ...]\n  int32_t half_dim = frame_len_padded\/2;\n  float first_energy = buffer&#91;0] * buffer&#91;0],\n        last_energy =  buffer&#91;1] * buffer&#91;1];  \/\/ handle this special case\n  for (i = 1; i &lt; half_dim; i++) {\n    float real = buffer&#91;i*2], im = buffer&#91;i*2 + 1];\n    buffer&#91;i] = real*real + im*im;\n  }\n  buffer&#91;0] = first_energy;\n  buffer&#91;half_dim] = last_energy;  <\/code><\/pre>\n\n\n\n<p>Then apply the Mel filterbanks to the square roots of the data saved in the buffer in the last step. The Mel filterbanks are created when the MFCC class is instantiated, inside the <strong>create_mel_fbank()<\/strong> method. 
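For background on how the filterbank construction spaces its filters: the mel scale maps hertz to a perceptual axis on which the triangular filters are spread evenly. A small sketch of the standard conversion follows; the ARM code uses the equivalent 1127*ln(1 + f\/700) form, and the function names here are mine, chosen for illustration:

```python
import math

# Hz <-> mel conversion used when building a mel filterbank: filters are
# spaced evenly on the mel axis between the chosen min and max frequencies.
# 2595*log10(1 + f/700) is the same function as 1127*ln(1 + f/700).

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filter_edges(num_fbank_bins, low_hz, high_hz):
    # num_fbank_bins + 2 evenly spaced mel points give the left edge,
    # center and right edge of each triangular filter.
    lo, hi = hz_to_mel(low_hz), hz_to_mel(high_hz)
    step = (hi - lo) / (num_fbank_bins + 1)
    return [mel_to_hz(lo + i * step) for i in range(num_fbank_bins + 2)]

edges = mel_filter_edges(10, 20.0, 4000.0)
```

Note how the resulting edge frequencies cluster tightly at the low end and spread out at the high end, mirroring how human hearing resolves low frequencies better than high ones.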
The number of filterbanks and the minimum and maximum frequencies are specified by the user beforehand &#8211; and it is very important to keep them consistent between the training script and the inference code, otherwise there will be a significant accuracy drop.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>  float sqrt_data;\n  \/\/Apply mel filterbanks\n  for (bin = 0; bin &lt; NUM_FBANK_BINS; bin++) {\n    j = 0;\n    float mel_energy = 0;\n    int32_t first_index = fbank_filter_first&#91;bin];\n    int32_t last_index = fbank_filter_last&#91;bin];\n    for (i = first_index; i &lt;= last_index; i++) {\n      arm_sqrt_f32(buffer&#91;i],&amp;sqrt_data);\n      mel_energy += (sqrt_data) * mel_fbank&#91;bin]&#91;j++];\n    }\n    mel_energies&#91;bin] = mel_energy;\n\n    \/\/avoid log of zero\n    if (mel_energy == 0.0)\n      mel_energies&#91;bin] = FLT_MIN;\n  }<\/code><\/pre>\n\n\n\n<p>Finally, we take the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Discrete_cosine_transform\">discrete cosine transform<\/a> of the array of Mel energies and write it to the MFCC features output array. In the original script, quantization is performed at this step as well, but I opted to use the quantization procedure from the TensorFlow Lite for Microcontrollers example instead.<\/p>\n\n\n\n<p><strong>Inference on MFCC features<\/strong><\/p>\n\n\n\n<p>Once all audio within one sample (3 seconds) is processed and converted to MFCC features, we convert the whole MFCC feature array from FLOAT32 to INT8 values and feed it to the neural network. The TensorFlow Lite for Microcontrollers initialization and inference process was already described in one of my earlier articles, so I won&#8217;t repeat it here.<\/p>\n\n\n\n<p>Before you compile the sketch, make sure you have all the necessary libraries installed and the Seeed SAMD boards definition is at least version 1.8.2 &#8211; that is very important for the TensorFlow Lite library to compile without errors. 
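The FLOAT32-to-INT8 conversion mentioned above is the standard affine quantization used for TensorFlow Lite int8 input tensors: q = round(x \u00f7 scale) + zero_point, clamped to [-128, 127]. A minimal sketch follows; in a real sketch the scale and zero-point come from the quantized model's input tensor, and the numbers below are made up for illustration:

```python
def quantize_int8(values, scale, zero_point):
    # Affine quantization as used for TFLite int8 input tensors:
    # a float value x maps to round(x / scale) + zero_point,
    # clamped to the int8 range [-128, 127].
    out = []
    for x in values:
        q = round(x / scale) + zero_point
        out.append(max(-128, min(127, q)))
    return out

# Example with made-up quantization parameters:
mfcc_features = [-24.7, 0.0, 1.0]
quantized = quantize_int8(mfcc_features, scale=0.5, zero_point=0)
```

The clamp matters: MFCC values outside the representable range saturate rather than wrap, which is also what the TFLite quantization scheme specifies.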
Compile and upload the sketch &#8211; if you have the DEBUG parameter set to false, the code will start running immediately, and all you need to do is press the C button on top of the Wio Terminal and say one of the sentences from the dataset. The results will be displayed on the screen and also output to the Serial monitor if the Wio Terminal is connected to a computer.<\/p>\n\n\n\n<p>While this course is based on the Wio Terminal, since it is very suitable for exploring embedded machine learning, it is definitely possible to implement this project on other devices. The easiest option would be to port the code to another Cortex-M4F MCU, such as the Nano 33 BLE Sense &#8211; that would only require adjusting for a different microphone. Porting to other ARM MCUs should be fairly trivial too. <\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"alignright size-large is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2021\/10\/image-13.png\" alt=\"\" class=\"wp-image-53749\" width=\"462\" height=\"376\" srcset=\"https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2021\/10\/image-13.png 973w, https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2021\/10\/image-13-300x245.png 300w, https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2021\/10\/image-13-768x626.png 768w\" sizes=\"(max-width: 462px) 100vw, 462px\" \/><\/figure><\/div>\n\n\n\n<p>Porting to other architectures, e.g. ESP32 or K210, would require re-implementing the MFCC calculations, since they use ARM-specific functions from CMSIS-DSP.<\/p>\n\n\n\n<p>There are multiple improvements that can be made to the basic neural network architecture in the project. 
These improvements include:<\/p>\n\n\n\n<p>&#8211; model pre-training<br>&#8211; seq2seq, LSTM, attention<br>&#8211; trainable filters<br>&#8211; AutoML, synthetic data<\/p>\n\n\n\n<p>Have a look at my TinyML talk on this topic to find out more and to get links to the papers!<\/p>\n\n\n\n<figure class=\"wp-block-embed is-type-video is-provider-youtube wp-block-embed-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio\"><div class=\"wp-block-embed__wrapper\">\n<iframe title=\"tinyML Talks: Speech-to-intent model deployment to low-power low-footprint devices\" width=\"640\" height=\"360\" src=\"https:\/\/www.youtube.com\/embed\/YmJrr1D191k?feature=oembed\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share\" referrerpolicy=\"strict-origin-when-cross-origin\" allowfullscreen><\/iframe>\n<\/div><\/figure>\n\n\n\n<p>I encourage you to fork the code repository, try training on your own dataset, and perhaps try implementing more advanced architectures or model training techniques. 
If you do, don\u2019t hesitate to give me a shout-out here or make a PR on GitHub!<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-large is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2021\/10\/image-12.png\" alt=\"\" class=\"wp-image-53747\" width=\"491\" height=\"457\" srcset=\"https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2021\/10\/image-12.png 830w, https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2021\/10\/image-12-300x279.png 300w, https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2021\/10\/image-12-768x714.png 768w\" sizes=\"(max-width: 491px) 100vw, 491px\" \/><\/figure><\/div>\n\n\n\n<p>I hope this article was useful for you &#8211; stay tuned for the last installment of the TinyML course series!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>A traditional approach to using speech for device control\/user request fulfillment is first, to transcribe<\/p>\n","protected":false},"author":3505,"featured_media":53756,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_lmt_disableupdate":"","_lmt_disable":"","_price":"","_stock":"","_tribe_ticket_header":"","_tribe_default_ticket_provider":"","_tribe_ticket_capacity":"0","_ticket_start_date":"","_ticket_end_date":"","_tribe_ticket_show_description":"","_tribe_ticket_show_not_going":false,"_tribe_ticket_use_global_stock":"","_tribe_ticket_global_stock_level":"","_global_stock_mode":"","_global_stock_cap":"","_tribe_rsvp_for_event":"","_tribe_ticket_going_count":"","_tribe_ticket_not_going_count":"","_tribe_tickets_list":"[]","_tribe_ticket_has_attendee_info_fields":false,"iawp_total_views":0,"footnotes":""},"categories":[1],"tags":[6,1355,673,3760,3171,3003,3678],"class_list":["post-53572","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-news","tag-arduino","tag-artificial-intelligence","tag-spee
ch-recognition","tag-tiny-machine-learning","tag-tinyml","tag-wio-terminal","tag-wio-terminal-project"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v24.0 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Learn TinyML using Wio Terminal and Arduino IDE #6 Speech recognition on MCU - Speech-to-Intent - Latest News from Seeed Studio<\/title>\n<meta name=\"description\" content=\"In this article I will share techniques to train a specific domain speech-to-intent model and deploy it to Cortex M4F based development board with built-in microphone, Wio Terminal from Seeed Studio.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.seeedstudio.com\/blog\/2021\/10\/10\/learn-tinyml-using-wio-terminal-and-arduino-ide-6-speech-recognition-on-mcu-speech-to-intent\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Learn TinyML using Wio Terminal and Arduino IDE #6 Speech recognition on MCU - Speech-to-Intent - Latest News from Seeed Studio\" \/>\n<meta property=\"og:description\" content=\"In this article I will share techniques to train a specific domain speech-to-intent model and deploy it to Cortex M4F based development board with built-in microphone, Wio Terminal from Seeed Studio.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.seeedstudio.com\/blog\/2021\/10\/10\/learn-tinyml-using-wio-terminal-and-arduino-ide-6-speech-recognition-on-mcu-speech-to-intent\/\" \/>\n<meta property=\"og:site_name\" content=\"Latest News from Seeed Studio\" \/>\n<meta property=\"article:published_time\" content=\"2021-10-09T18:56:15+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2021-11-19T16:41:56+00:00\" \/>\n<meta property=\"og:image\" 
content=\"https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2021\/10\/1-e1633805764847.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"819\" \/>\n\t<meta property=\"og:image:height\" content=\"720\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Dmitry Maslov\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Dmitry Maslov\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"10 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.seeedstudio.com\/blog\/2021\/10\/10\/learn-tinyml-using-wio-terminal-and-arduino-ide-6-speech-recognition-on-mcu-speech-to-intent\/\",\"url\":\"https:\/\/www.seeedstudio.com\/blog\/2021\/10\/10\/learn-tinyml-using-wio-terminal-and-arduino-ide-6-speech-recognition-on-mcu-speech-to-intent\/\",\"name\":\"Learn TinyML using Wio Terminal and Arduino IDE #6 Speech recognition on MCU - Speech-to-Intent - Latest News from Seeed 
Studio\",\"isPartOf\":{\"@id\":\"https:\/\/www.seeedstudio.com\/blog\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/www.seeedstudio.com\/blog\/2021\/10\/10\/learn-tinyml-using-wio-terminal-and-arduino-ide-6-speech-recognition-on-mcu-speech-to-intent\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/www.seeedstudio.com\/blog\/2021\/10\/10\/learn-tinyml-using-wio-terminal-and-arduino-ide-6-speech-recognition-on-mcu-speech-to-intent\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2021\/10\/1-e1633805764847.jpg\",\"datePublished\":\"2021-10-09T18:56:15+00:00\",\"dateModified\":\"2021-11-19T16:41:56+00:00\",\"author\":{\"@id\":\"https:\/\/www.seeedstudio.com\/blog\/#\/schema\/person\/be44021cef50367de429a4d5f613ed2f\"},\"description\":\"In this article I will share techniques to train a specific domain speech-to-intent model and deploy it to Cortex M4F based development board with built-in microphone, Wio Terminal from Seeed Studio.\",\"breadcrumb\":{\"@id\":\"https:\/\/www.seeedstudio.com\/blog\/2021\/10\/10\/learn-tinyml-using-wio-terminal-and-arduino-ide-6-speech-recognition-on-mcu-speech-to-intent\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.seeedstudio.com\/blog\/2021\/10\/10\/learn-tinyml-using-wio-terminal-and-arduino-ide-6-speech-recognition-on-mcu-speech-to-intent\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.seeedstudio.com\/blog\/2021\/10\/10\/learn-tinyml-using-wio-terminal-and-arduino-ide-6-speech-recognition-on-mcu-speech-to-intent\/#primaryimage\",\"url\":\"https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2021\/10\/1-e1633805764847.jpg\",\"contentUrl\":\"https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2021\/10\/1-e1633805764847.jpg\",\"width\":819,\"height\":720},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.seeedstudio.com\/blog\/2021\/10\/10\/learn-tinyml-using-wio-
terminal-and-arduino-ide-6-speech-recognition-on-mcu-speech-to-intent\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.seeedstudio.com\/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Learn TinyML using Wio Terminal and Arduino IDE #6 Speech recognition on MCU &#8211; Speech-to-Intent\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.seeedstudio.com\/blog\/#website\",\"url\":\"https:\/\/www.seeedstudio.com\/blog\/\",\"name\":\"Latest News from Seeed Studio\",\"description\":\"Emerging IoT, AI and Autonomous Applications on the Edge\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.seeedstudio.com\/blog\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.seeedstudio.com\/blog\/#\/schema\/person\/be44021cef50367de429a4d5f613ed2f\",\"name\":\"Dmitry Maslov\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.seeedstudio.com\/blog\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/b60714970fdc7dfa4a5d9915477bdd24?s=96&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/b60714970fdc7dfa4a5d9915477bdd24?s=96&r=g\",\"caption\":\"Dmitry Maslov\"},\"url\":\"https:\/\/www.seeedstudio.com\/blog\/author\/dmitry-maslov\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Learn TinyML using Wio Terminal and Arduino IDE #6 Speech recognition on MCU - Speech-to-Intent - Latest News from Seeed Studio","description":"In this article I will share techniques to train a specific domain speech-to-intent model and deploy it to Cortex M4F based development board with built-in microphone, Wio Terminal from Seeed Studio.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.seeedstudio.com\/blog\/2021\/10\/10\/learn-tinyml-using-wio-terminal-and-arduino-ide-6-speech-recognition-on-mcu-speech-to-intent\/","og_locale":"en_US","og_type":"article","og_title":"Learn TinyML using Wio Terminal and Arduino IDE #6 Speech recognition on MCU - Speech-to-Intent - Latest News from Seeed Studio","og_description":"In this article I will share techniques to train a specific domain speech-to-intent model and deploy it to Cortex M4F based development board with built-in microphone, Wio Terminal from Seeed Studio.","og_url":"https:\/\/www.seeedstudio.com\/blog\/2021\/10\/10\/learn-tinyml-using-wio-terminal-and-arduino-ide-6-speech-recognition-on-mcu-speech-to-intent\/","og_site_name":"Latest News from Seeed Studio","article_published_time":"2021-10-09T18:56:15+00:00","article_modified_time":"2021-11-19T16:41:56+00:00","og_image":[{"width":819,"height":720,"url":"https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2021\/10\/1-e1633805764847.jpg","type":"image\/jpeg"}],"author":"Dmitry Maslov","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Dmitry Maslov","Est. 
reading time":"10 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/www.seeedstudio.com\/blog\/2021\/10\/10\/learn-tinyml-using-wio-terminal-and-arduino-ide-6-speech-recognition-on-mcu-speech-to-intent\/","url":"https:\/\/www.seeedstudio.com\/blog\/2021\/10\/10\/learn-tinyml-using-wio-terminal-and-arduino-ide-6-speech-recognition-on-mcu-speech-to-intent\/","name":"Learn TinyML using Wio Terminal and Arduino IDE #6 Speech recognition on MCU - Speech-to-Intent - Latest News from Seeed Studio","isPartOf":{"@id":"https:\/\/www.seeedstudio.com\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.seeedstudio.com\/blog\/2021\/10\/10\/learn-tinyml-using-wio-terminal-and-arduino-ide-6-speech-recognition-on-mcu-speech-to-intent\/#primaryimage"},"image":{"@id":"https:\/\/www.seeedstudio.com\/blog\/2021\/10\/10\/learn-tinyml-using-wio-terminal-and-arduino-ide-6-speech-recognition-on-mcu-speech-to-intent\/#primaryimage"},"thumbnailUrl":"https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2021\/10\/1-e1633805764847.jpg","datePublished":"2021-10-09T18:56:15+00:00","dateModified":"2021-11-19T16:41:56+00:00","author":{"@id":"https:\/\/www.seeedstudio.com\/blog\/#\/schema\/person\/be44021cef50367de429a4d5f613ed2f"},"description":"In this article I will share techniques to train a specific domain speech-to-intent model and deploy it to Cortex M4F based development board with built-in microphone, Wio Terminal from Seeed 
Studio.","breadcrumb":{"@id":"https:\/\/www.seeedstudio.com\/blog\/2021\/10\/10\/learn-tinyml-using-wio-terminal-and-arduino-ide-6-speech-recognition-on-mcu-speech-to-intent\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.seeedstudio.com\/blog\/2021\/10\/10\/learn-tinyml-using-wio-terminal-and-arduino-ide-6-speech-recognition-on-mcu-speech-to-intent\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.seeedstudio.com\/blog\/2021\/10\/10\/learn-tinyml-using-wio-terminal-and-arduino-ide-6-speech-recognition-on-mcu-speech-to-intent\/#primaryimage","url":"https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2021\/10\/1-e1633805764847.jpg","contentUrl":"https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2021\/10\/1-e1633805764847.jpg","width":819,"height":720},{"@type":"BreadcrumbList","@id":"https:\/\/www.seeedstudio.com\/blog\/2021\/10\/10\/learn-tinyml-using-wio-terminal-and-arduino-ide-6-speech-recognition-on-mcu-speech-to-intent\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.seeedstudio.com\/blog\/"},{"@type":"ListItem","position":2,"name":"Learn TinyML using Wio Terminal and Arduino IDE #6 Speech recognition on MCU &#8211; Speech-to-Intent"}]},{"@type":"WebSite","@id":"https:\/\/www.seeedstudio.com\/blog\/#website","url":"https:\/\/www.seeedstudio.com\/blog\/","name":"Latest News from Seeed Studio","description":"Emerging IoT, AI and Autonomous Applications on the Edge","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.seeedstudio.com\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/www.seeedstudio.com\/blog\/#\/schema\/person\/be44021cef50367de429a4d5f613ed2f","name":"Dmitry 
Maslov","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.seeedstudio.com\/blog\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/b60714970fdc7dfa4a5d9915477bdd24?s=96&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/b60714970fdc7dfa4a5d9915477bdd24?s=96&r=g","caption":"Dmitry Maslov"},"url":"https:\/\/www.seeedstudio.com\/blog\/author\/dmitry-maslov\/"}]}},"modified_by":"Elaine Wu","views":6646,"featured_image_urls":{"full":["https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2021\/10\/1-e1633805764847.jpg",819,720,false],"thumbnail":["https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2021\/10\/1-e1633805764847-80x80.jpg",80,80,true],"medium":["https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2021\/10\/1-e1633805764847-300x264.jpg",300,264,true],"medium_large":["https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2021\/10\/1-e1633805764847-768x675.jpg",640,563,true],"large":["https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2021\/10\/1-1030x579.jpg",640,360,true],"1536x1536":["https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2021\/10\/1-e1633805764847.jpg",819,720,false],"2048x2048":["https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2021\/10\/1-e1633805764847.jpg",819,720,false],"visody_icon":["https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2021\/10\/1-e1633805764847.jpg",32,28,false],"magazine-7-slider-full":["https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2021\/10\/1-e1633805764847.jpg",819,720,false],"magazine-7-slider-center":["https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2021\/10\/1-936x720.jpg",936,720,true],"magazine-7-featured":["https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2021\/10\/1-1024x576.jpg",1024,576,true],"magazine-7-medium":["https:\/\/www.seeedstudio.com\/blog\/wp-content\/uploads\/2021\/10\/1-e1633805764847-720x380.jpg",720,380,true],"magazine-7-medium-square":["https:\/\/www.seeedstudi
o.com\/blog\/wp-content\/uploads\/2021\/10\/1-e1633805764847-675x450.jpg",675,450,true]},"author_info":{"display_name":"Dmitry Maslov","author_link":"https:\/\/www.seeedstudio.com\/blog\/author\/dmitry-maslov\/"},"category_info":"<a href=\"https:\/\/www.seeedstudio.com\/blog\/category\/news\/\" rel=\"category tag\">News<\/a>","tag_info":"News","comment_count":"0","_links":{"self":[{"href":"https:\/\/www.seeedstudio.com\/blog\/wp-json\/wp\/v2\/posts\/53572","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.seeedstudio.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.seeedstudio.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.seeedstudio.com\/blog\/wp-json\/wp\/v2\/users\/3505"}],"replies":[{"embeddable":true,"href":"https:\/\/www.seeedstudio.com\/blog\/wp-json\/wp\/v2\/comments?post=53572"}],"version-history":[{"count":11,"href":"https:\/\/www.seeedstudio.com\/blog\/wp-json\/wp\/v2\/posts\/53572\/revisions"}],"predecessor-version":[{"id":56780,"href":"https:\/\/www.seeedstudio.com\/blog\/wp-json\/wp\/v2\/posts\/53572\/revisions\/56780"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.seeedstudio.com\/blog\/wp-json\/wp\/v2\/media\/53756"}],"wp:attachment":[{"href":"https:\/\/www.seeedstudio.com\/blog\/wp-json\/wp\/v2\/media?parent=53572"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.seeedstudio.com\/blog\/wp-json\/wp\/v2\/categories?post=53572"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.seeedstudio.com\/blog\/wp-json\/wp\/v2\/tags?post=53572"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}