Matches in SemOpenAlex for { <https://semopenalex.org/work/W4387561422> ?p ?o ?g. }
Showing items 1 to 63 of 63 with 100 items per page.
- W4387561422 abstract "The problem of audio-to-text alignment has seen a significant amount of research using complete supervision during training. However, this is typically not in the context of long audio recordings wherein the text being queried does not appear verbatim within the audio file. This work is a collaboration with a non-governmental organization called CARE India that collects long audio health surveys from young mothers residing in rural parts of Bihar, India. Given a question drawn from a questionnaire that is used to guide these surveys, we aim to locate where the question is asked within a long audio recording. This is of great value to African and Asian organizations that would otherwise have to painstakingly go through long and noisy audio recordings to locate questions (and answers) of interest. Our proposed framework, INDENT, uses a cross-attention-based model and prior information on the temporal ordering of sentences to learn speech embeddings that capture the semantics of the underlying spoken text. These learnt embeddings are used to retrieve the corresponding audio segment based on text queries at inference time. We empirically demonstrate the significant effectiveness (improvement in R-avg of about 3%) of our model over those obtained using text-based heuristics. We also show how noisy ASR, generated using state-of-the-art ASR models for Indian languages, yields better results when used in place of speech. INDENT, trained only on Hindi data, is able to cater to all languages supported by the (semantically) shared text space. We illustrate this empirically on 11 Indic languages." @default.
- W4387561422 created "2023-10-12" @default.
- W4387561422 creator A5015532874 @default.
- W4387561422 creator A5029902566 @default.
- W4387561422 creator A5036738038 @default.
- W4387561422 creator A5052583252 @default.
- W4387561422 creator A5089606464 @default.
- W4387561422 creator A5092629424 @default.
- W4387561422 date "2023-10-10" @default.
- W4387561422 modified "2023-10-13" @default.
- W4387561422 title "Temporally Aligning Long Audio Interviews with Questions: A Case Study in Multimodal Data Integration" @default.
- W4387561422 doi "https://doi.org/10.48550/arxiv.2310.06702" @default.
- W4387561422 hasPublicationYear "2023" @default.
- W4387561422 type Work @default.
- W4387561422 citedByCount "0" @default.
- W4387561422 crossrefType "posted-content" @default.
- W4387561422 hasAuthorship W4387561422A5015532874 @default.
- W4387561422 hasAuthorship W4387561422A5029902566 @default.
- W4387561422 hasAuthorship W4387561422A5036738038 @default.
- W4387561422 hasAuthorship W4387561422A5052583252 @default.
- W4387561422 hasAuthorship W4387561422A5089606464 @default.
- W4387561422 hasAuthorship W4387561422A5092629424 @default.
- W4387561422 hasBestOaLocation W43875614221 @default.
- W4387561422 hasConcept C111919701 @default.
- W4387561422 hasConcept C127705205 @default.
- W4387561422 hasConcept C151730666 @default.
- W4387561422 hasConcept C154945302 @default.
- W4387561422 hasConcept C204321447 @default.
- W4387561422 hasConcept C2776214188 @default.
- W4387561422 hasConcept C2778572836 @default.
- W4387561422 hasConcept C2779343474 @default.
- W4387561422 hasConcept C28490314 @default.
- W4387561422 hasConcept C41008148 @default.
- W4387561422 hasConcept C519982507 @default.
- W4387561422 hasConcept C86803240 @default.
- W4387561422 hasConceptScore W4387561422C111919701 @default.
- W4387561422 hasConceptScore W4387561422C127705205 @default.
- W4387561422 hasConceptScore W4387561422C151730666 @default.
- W4387561422 hasConceptScore W4387561422C154945302 @default.
- W4387561422 hasConceptScore W4387561422C204321447 @default.
- W4387561422 hasConceptScore W4387561422C2776214188 @default.
- W4387561422 hasConceptScore W4387561422C2778572836 @default.
- W4387561422 hasConceptScore W4387561422C2779343474 @default.
- W4387561422 hasConceptScore W4387561422C28490314 @default.
- W4387561422 hasConceptScore W4387561422C41008148 @default.
- W4387561422 hasConceptScore W4387561422C519982507 @default.
- W4387561422 hasConceptScore W4387561422C86803240 @default.
- W4387561422 hasLocation W43875614221 @default.
- W4387561422 hasOpenAccess W4387561422 @default.
- W4387561422 hasPrimaryLocation W43875614221 @default.
- W4387561422 hasRelatedWork W1557888283 @default.
- W4387561422 hasRelatedWork W2243342922 @default.
- W4387561422 hasRelatedWork W2265245145 @default.
- W4387561422 hasRelatedWork W2384553807 @default.
- W4387561422 hasRelatedWork W2738278463 @default.
- W4387561422 hasRelatedWork W3025300666 @default.
- W4387561422 hasRelatedWork W3089999372 @default.
- W4387561422 hasRelatedWork W3110423299 @default.
- W4387561422 hasRelatedWork W4299528489 @default.
- W4387561422 hasRelatedWork W4308939443 @default.
- W4387561422 isParatext "false" @default.
- W4387561422 isRetracted "false" @default.
- W4387561422 workType "article" @default.