{"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_821.wav", "doc_id": "WTTtiRKFZI.seg_821", "src_text": "So I'll concentrate on the right one.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "越来越短,越来越短,越来越短", "score": 0.0}
{"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_197.wav", "doc_id": "SLpqvupgvW.seg_197", "src_text": "For songs, we simply show a Google search link to each song and then ask the annotators to listen to at least some of each song, and read about each song.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "对于某些歌曲,我们只是简单地显示每首歌的Google搜索链接然后请注音员至少听一下每首歌,并阅读每首歌的内容;", "score": 93.0}
{"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_826.wav", "doc_id": "WTTtiRKFZI.seg_826", "src_text": "And talk to us about at the poster session.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "并在邮件会议上与我们讨论。", "score": 50.0}
{"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oeooqChmKK.seg_600.wav", "doc_id": "oeooqChmKK.seg_600", "src_text": "Here is an example from our data set.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "这是我们数据集的一个例子:", "score": 94.0}
{"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_198.wav", "doc_id": "SLpqvupgvW.seg_198", "src_text": "Here's for example, the Google search result for the song \"Easy on Me.\"", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "例如,歌曲“EasyAn”在Google搜索结果中。", "score": 49.0}
{"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/PIZEXUFLAR.seg_277.wav", "doc_id": "PIZEXUFLAR.seg_277", "src_text": "We follow the method from OFA and formulate all the tasks in a unified sequence-to-sequence format.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们遵循OFA的方法,并将所有任务都制成一个统一的序列-序列格式,", "score": 90.0}
{"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/hgIDlKNiFM.seg_434.wav", "doc_id": "hgIDlKNiFM.seg_434", "src_text": "We introduce the first biomedical model in French named DrBERT, which is based on RoBERTa and trained on NACHOS, which is a data set of medical crawled data from the web.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们介绍了第一个基于Roberta的生物医学模型,名为Dr.BERT,训练于NACHOS数据集,这是一个来自Web的医疗数据集。", "score": 70.0}
{"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oeooqChmKK.seg_613.wav", "doc_id": "oeooqChmKK.seg_613", "src_text": "Second, there's a \"Background-Both\" setting, where background knowledge is available both at pretrain time and inference time.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "第二,后台设置也很重要。背景知识在预训练时间和自由时间都可用。", "score": 50.0}
{"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/uZBWfYjYnf.seg_120.wav", "doc_id": "uZBWfYjYnf.seg_120", "src_text": "And we also released open source the code and models and simultaneous output to facilitate the reproducibility of our work.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们也发布了开源的代码、模型和同时输出的结果,以便于我们的工作的可复制性。", "score": 85.0}
{"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_174.wav", "doc_id": "SLpqvupgvW.seg_174", "src_text": "Our data set covers three different domains: music, books, and recipes.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们的数据集涵盖了三个不同的领域:音乐、书籍和服装。", "score": 72.0}
{"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SUkmfOTvGi.seg_482.wav", "doc_id": "SUkmfOTvGi.seg_482", "src_text": "And last but not least, we all know that the number of fine tuning examples directly affects the performance of a downstream task.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "最后,尽管如此,我们都知道精调例直接影响下游任务的性能;此外,", "score": 49.0}
{"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/aQpIWggfCo.seg_43.wav", "doc_id": "aQpIWggfCo.seg_43", "src_text": "We use large language models to generate a high-quality script dataset, CoScript, for constrained language planning.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们是吗?", "score": 0.0}
{"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/wLqFAuDnKa.seg_123.wav", "doc_id": "wLqFAuDnKa.seg_123", "src_text": "This is joint work with my colleagues from Google Translate.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": ":::::::::::::::::::::", "score": 0.0}
{"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/wLqFAuDnKa.seg_147.wav", "doc_id": "wLqFAuDnKa.seg_147", "src_text": "The dev data is much more curated, and with higher quality than the training data, that it's more noisy.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "由于训练数据的质量更高,因此", "score": 44.0}
{"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/uZBWfYjYnf.seg_110.wav", "doc_id": "uZBWfYjYnf.seg_110", "src_text": "This means that these three words will be emitted.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "这意味着这三个词语将被丢弃。", "score": 64.0}
{"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_794.wav", "doc_id": "WTTtiRKFZI.seg_794", "src_text": "This is illustrated here.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "这本书,昨天我读了", "score": 0.0}
{"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_253.wav", "doc_id": "oYCKgTzTDy.seg_253", "src_text": "We found that, by comparing the green and orange line, we found the Zero-shot setting, the Cross-lingual transfer performance gap is significant, and then comparing the blue and orange lines, we found that with the Few-shot setting the transfer gap is shortened rapidly.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "通过比较绿色和橙色线,我们发现对于零射线设置,交叉角度传输性能差异很大;而通过比较蓝色和橙色线,我们发现对于少数射线设置,传输差异会迅速缩小。", "score": 86.0}
{"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_159.wav", "doc_id": "SLpqvupgvW.seg_159", "src_text": "Hi!", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "嗨,", "score": 98.0}
{"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dvGkKzmIaN.seg_509.wav", "doc_id": "dvGkKzmIaN.seg_509", "src_text": "Therefore, it's necessary to protect the copyright of embedding as services.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "因此,必须保护嵌入式服务的版权。", "score": 72.0}
{"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/GvEBWkLmuI.seg_848.wav", "doc_id": "GvEBWkLmuI.seg_848", "src_text": "So the Marked Words method draws upon the sociolinguistic concept of \"markedness\", which states that there is an unmarked default, and any group that differs from that default is linguistically marked.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "而不需要依赖任何具体的词汇表。因此,标记词法方法基于社会语言学概念的标记性,指出任何一个与该标记性不同的群体都是语言学上标记的。因此,", "score": 53.0}
{"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/GvEBWkLmuI.seg_839.wav", "doc_id": "GvEBWkLmuI.seg_839", "src_text": "Immediately we see that, while the outputs aren't overtly negative or toxic in the traditional sense of these words, there are some interesting patterns.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们马上就会看到,这些外出词在传统意义上是阴性或有毒的。有一些有趣的模式", "score": 80.0}
{"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/GvEBWkLmuI.seg_866.wav", "doc_id": "GvEBWkLmuI.seg_866", "src_text": "So for example, the words describing Latina women include things like \"vibrant\" and \"curvaceous\" which connect to a trope of tropicalism.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "例如描述拉丁美洲女性的词包括像振动和颤抖这样的词。对于亚洲女性而言,这些词与热带主义的特征密切相关", "score": 51.0}
{"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/PIZEXUFLAR.seg_276.wav", "doc_id": "PIZEXUFLAR.seg_276", "src_text": "Here we show some example instances from our MultiInstruct dataset, to unify the processing of various input and output data types.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "在这里,我们展示了我们多语种数据集的几个示例实例。为了统一各种输入和输出数据类型的处理,", "score": 89.0}
{"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/wLqFAuDnKa.seg_158.wav", "doc_id": "wLqFAuDnKa.seg_158", "src_text": "Thank you very much.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "非常感谢。", "score": 100.0}
{"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/GvEBWkLmuI.seg_872.wav", "doc_id": "GvEBWkLmuI.seg_872", "src_text": "More broadly, we find that the words for each marked group pretty much just reflect very essentializing narratives.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "更广泛地说,我们发现每个标记组的词汇大致反映了非常基本的叙述。", "score": 68.0}
{"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WBLMIsdIrq.seg_404.wav", "doc_id": "WBLMIsdIrq.seg_404", "src_text": "We perform our analysis at three different levels.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们在三个不同的层次上进行分析:", "score": 96.0}
{"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dJGfOSFgZO.seg_338.wav", "doc_id": "dJGfOSFgZO.seg_338", "src_text": "We hope ABC-Eval can be leveraged by others in the field as a meaningful step in this direction.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们希望ABC-EVAL可以被其他领域的人利用,作为对这一方向的有意义一步,", "score": 56.0}
{"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_714.wav", "doc_id": "oaOHnMCwad.seg_714", "src_text": "However, when models and data sets are aligned to specific populations, some are inevitably left behind.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "然而,当模型和数据集与特定人口结合时,有些模型和数据集不可避免地被遗忘。", "score": 75.0}
{"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_804.wav", "doc_id": "WTTtiRKFZI.seg_804", "src_text": "That's why this sounds quite okay.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "短,因此这听起来很好(正", "score": 80.0}
{"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dJGfOSFgZO.seg_337.wav", "doc_id": "dJGfOSFgZO.seg_337", "src_text": "However, this is all the more reason to pursue reliable and precise evaluation metrics for comparing models.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": ",但这更是为了继续使用可靠和精确的评估指标来比较模型。", "score": 84.0}
{"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/hgIDlKNiFM.seg_438.wav", "doc_id": "hgIDlKNiFM.seg_438", "src_text": "Since its release in 2018, BERT has become one of the most effective approach to solve natural language processing tasks and offers huge performance gains compared to historical static and contextualized methods such as Word2vec, fastText, or more.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "自2008年发布以来,BERT已成为解决自然语言处理任务最有效的方法之一,并提供了与历史上最有效的静态和语义方法(例如Word2Vec,FastText或WordEmbedding)相比的巨大性能优势。", "score": 70.0}
{"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/hgIDlKNiFM.seg_435.wav", "doc_id": "hgIDlKNiFM.seg_435", "src_text": "We also introduced a comparison of models with multiple pre-training settings and data sources.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们还将介绍一个比较。我们还提供了多个Pluto设置和数据源的模型版本。", "score": 33.0}
{"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/aQpIWggfCo.seg_38.wav", "doc_id": "aQpIWggfCo.seg_38", "src_text": "We find CoScript shows high pluralism in the generated specific goals.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们发现coscript在生成的特定目标中表现出高概率。我们可", "score": 100.0}
{"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_793.wav", "doc_id": "WTTtiRKFZI.seg_793", "src_text": "Because then it can be moved to the position after the adjunct.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "因为这样可以将其移到加压器之后的位置。", "score": 60.0}
{"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/rISrKoXQCx.seg_548.wav", "doc_id": "rISrKoXQCx.seg_548", "src_text": "So language models are trained on large scale web crawl data.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "语言模型是通过大规模的网络爬取数据来训练的。", "score": 100.0}
{"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/PIZEXUFLAR.seg_298.wav", "doc_id": "PIZEXUFLAR.seg_298", "src_text": "We use one instruction versus 5 instruction.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们使用一条指令而不是五条指令,", "score": 94.0}
{"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/XejEJmgUmE.seg_732.wav", "doc_id": "XejEJmgUmE.seg_732", "src_text": "So in this work, we revisit the minimal pair paradigms.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "在本工作中,我们重新审视最小对数对范式。", "score": 86.0}
{"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/XejEJmgUmE.seg_747.wav", "doc_id": "XejEJmgUmE.seg_747", "src_text": "And we can also do the same by choosing sentences from a different subset or a different data set.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们还可以通过选择来自不同子集或不同数据集的句子来实现这一点,", "score": 95.0}
{"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/wLqFAuDnKa.seg_124.wav", "doc_id": "wLqFAuDnKa.seg_124", "src_text": "PaLM is a 540 billion-parameter large language model presented last year in 2022.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "FARM是2002年推出的500亿美元参数的大型语言模型,", "score": 32.0}
{"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_203.wav", "doc_id": "SLpqvupgvW.seg_203", "src_text": "Here are some examples from our dataset.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "以下是我们的数据集中的几个例子,", "score": 95.0}
{"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/GvEBWkLmuI.seg_858.wav", "doc_id": "GvEBWkLmuI.seg_858", "src_text": "So, really just only the positive or at least non-negative ones.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "所以真正的只有积极的或至少不是消极的", "score": 61.0}
{"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/GvEBWkLmuI.seg_865.wav", "doc_id": "GvEBWkLmuI.seg_865", "src_text": "Furthermore, there's a lot of common tropes that are reflected in these words, especially for women of color.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "此外,这些词汇中还有很多常见的词语,特别是“女性”、“女性”等词语,所以", "score": 18.0}
{"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SUkmfOTvGi.seg_482.wav", "doc_id": "SUkmfOTvGi.seg_482", "src_text": "And last but not least, we all know that the number of fine tuning examples directly affects the performance of a downstream task.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "最后但不是最不重要的是,我们都知道,精调样本的数量直接影响下游任务的性能:", "score": 57.0}
{"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oeooqChmKK.seg_592.wav", "doc_id": "oeooqChmKK.seg_592", "src_text": "Recent works in tasks like question answering show that models can use pretrained-time knowledge to solve the task.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "最近的研究,例如问答任务,表明模型可以利用预训练知识来解决任务。", "score": 85.0}
{"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_252.wav", "doc_id": "oYCKgTzTDy.seg_252", "src_text": "While the green line is the Monolingual Setting.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "绿色线是单语言设置。我们发现,", "score": 61.0}
{"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/GvEBWkLmuI.seg_829.wav", "doc_id": "GvEBWkLmuI.seg_829", "src_text": "This work is done in collaboration with Esin Durmus and Dan Jurafsky.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "是与埃斯登德穆什和丹杰罗夫斯基合作完成的。", "score": 55.0}
{"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_172.wav", "doc_id": "SLpqvupgvW.seg_172", "src_text": "This is an important problem in conversational systems and also for benchmarking LLMs' entity understanding.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "对话系统中,也是用于基准测试的。LLMs的实体理解:我们不知道有一个公共数据集", "score": 55.0}
{"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/uZBWfYjYnf.seg_119.wav", "doc_id": "uZBWfYjYnf.seg_119", "src_text": "If you want to discover more results, read our paper.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "如果你想了解更多结果,请阅读我们的论文,", "score": 91.0}
{"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/hgIDlKNiFM.seg_436.wav", "doc_id": "hgIDlKNiFM.seg_436", "src_text": "Then, we present our results on 11 biomedical and clinical downstream tasks in French.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "然后我们在法国展示了我们对11个生物医学和临床非主流任务的结果。", "score": 48.0}
{"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dJGfOSFgZO.seg_310.wav", "doc_id": "dJGfOSFgZO.seg_310", "src_text": "And today we'll tell you all about ABC-Eval, a new dimensional approach to evaluating conversational AI.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "今天我们将告诉你有关ABCEVL的所有内容,这是一种新的尺寸方法来评估对话性AI。", "score": 68.0}
{"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/TVCREhgqUP.seg_87.wav", "doc_id": "TVCREhgqUP.seg_87", "src_text": "We address this by inducing the alignment as part of the training.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们通过在训练中引入对齐来解决这个问题。", "score": 90.0}
{"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_163.wav", "doc_id": "SLpqvupgvW.seg_163", "src_text": "Consider this alternative question.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "考虑这个替代问题:", "score": 89.0}
{"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/FLkGnzVRew.seg_646.wav", "doc_id": "FLkGnzVRew.seg_646", "src_text": "We used dissonance-first approach, as seen in the flow chart here.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们使用了离散的第一种方法,如在这里的流图中所见。", "score": 100.0}
{"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dvGkKzmIaN.seg_500.wav", "doc_id": "dvGkKzmIaN.seg_500", "src_text": "Hello everyone, my name is Jingwei Yi from the University of Science and Technology of China.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "大家好,我叫金维炎,是中国科学技术大学的学生。", "score": 80.0}
{"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_262.wav", "doc_id": "oYCKgTzTDy.seg_262", "src_text": "Thanks for listening.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "感谢听取。", "score": 80.0}
{"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WBLMIsdIrq.seg_417.wav", "doc_id": "WBLMIsdIrq.seg_417", "src_text": "We can then also note that different languages have different proportions of these discourse phenomena.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们也可以看到,不同的语言有不同的对话现象。", "score": 50.0}
{"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oeooqChmKK.seg_626.wav", "doc_id": "oeooqChmKK.seg_626", "src_text": "Additional experiments with fictional knowledge indicated even the best performing models, cannot reliably integrate backward knowledge provided only at inference time.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "使用虚构知识进行的其他实验表明,即使是表现最好的模型也存在问题。不能可靠地整合仅在推理时提供的背景知识。", "score": 100.0}
{"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WBLMIsdIrq.seg_401.wav", "doc_id": "WBLMIsdIrq.seg_401", "src_text": "We can think of words that have high P-CXMI as ones that require context for translation.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "(Word-level)oratthewordlevel,wecanthinkofwordsthathavehighPOS(e.g.,“I”)aswordsthatrequirecontextfortranslation.", "score": 0.0}
{"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_192.wav", "doc_id": "SLpqvupgvW.seg_192", "src_text": "The third one is when they have similar descriptions on Wikipedia.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "第三种情况是他们在维基百科上有相似的描述,", "score": 95.0}
{"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/PIZEXUFLAR.seg_305.wav", "doc_id": "PIZEXUFLAR.seg_305", "src_text": "So one more thing, we are collecting a much larger multi-model instruction tuning dataset with around 150 additional vision language tasks and we will release them.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "外一个,我们正在收集一个更大的多模态指令调节数据集,包含大约150个额外的视觉语言任务,我们将其发布。", "score": 86.0}
{"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WBLMIsdIrq.seg_394.wav", "doc_id": "WBLMIsdIrq.seg_394", "src_text": "In this work, we try to answer these two questions.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "在这项工作中,我们试图回答这两个问题:", "score": 100.0}
{"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SUkmfOTvGi.seg_480.wav", "doc_id": "SUkmfOTvGi.seg_480", "src_text": "The second ingredient is the model size.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "第二个成分是模型大小。", "score": 100.0}
{"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WBLMIsdIrq.seg_429.wav", "doc_id": "WBLMIsdIrq.seg_429", "src_text": "Thank you so much for your attention.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "感谢您的关注,", "score": 99.0}
{"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SUkmfOTvGi.seg_499.wav", "doc_id": "SUkmfOTvGi.seg_499", "src_text": "Thank you so much.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "谢谢。", "score": 100.0}
{"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/wLqFAuDnKa.seg_139.wav", "doc_id": "wLqFAuDnKa.seg_139", "src_text": "So in this example here, where we perform translation from German into English, the German sentences, the source sentences, are marked with German colon and the English translations with English colon.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "在这个例子中,我们从德语翻译成英语时,源句子用德语语柱标记,而翻译后的句子用英语语柱标记。", "score": 88.0}
{"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WBLMIsdIrq.seg_405.wav", "doc_id": "WBLMIsdIrq.seg_405", "src_text": "First, we look at part-of-speech tags that have high mean P-CXMI.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "首先,我们看一下那些具有高PCSM的语音标签", "score": 72.0}
{"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/FLkGnzVRew.seg_651.wav", "doc_id": "FLkGnzVRew.seg_651", "src_text": "Given the low occurrence of dissonance and absence of any prior such data set, we are facing the problem of absolute rarity.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "鉴于不一致的发生率较低,并且之前没有任何此类数据集,我们面临绝对稀疏性的问题。", "score": 100.0}
{"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/PIZEXUFLAR.seg_294.wav", "doc_id": "PIZEXUFLAR.seg_294", "src_text": "As we can see, instruction tuning can significantly improve OFA's performance on seen multi-modal tasks.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "正如我们所看到的,指令调节可以显著改善OIS的性能在同一多模任务上。", "score": 80.0}
{"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/XejEJmgUmE.seg_753.wav", "doc_id": "XejEJmgUmE.seg_753", "src_text": "So how does the model do?", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "所以,模型是如何做到的?", "score": 100.0}
{"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SUkmfOTvGi.seg_465.wav", "doc_id": "SUkmfOTvGi.seg_465", "src_text": "Let's get started.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "让我们开始吧。", "score": 100.0}
{"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/TVCREhgqUP.seg_76.wav", "doc_id": "TVCREhgqUP.seg_76", "src_text": "For the first output position, we simply select one, as highlighted in red.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "对于第一个输出位置,我们简单地选择一个以红色突出显示的“S”。", "score": 94.0}
{"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dvGkKzmIaN.seg_508.wav", "doc_id": "dvGkKzmIaN.seg_508", "src_text": "However, recent works have shown that the attacker may steal the model through learning from the embedding and provide similar services.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "然而,最近的研究表明,攻击者可以通过学习嵌入的知识窃取模型,并提供类似的服务,", "score": 86.0}
{"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_249.wav", "doc_id": "oYCKgTzTDy.seg_249", "src_text": "We also compare the cross-language performance gap.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们还比较了跨语言性能差距。", "score": 85.0}
{"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/rISrKoXQCx.seg_581.wav", "doc_id": "rISrKoXQCx.seg_581", "src_text": "It's like between Scylla and Charybdis.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "亚和克里比德斯之间的。", "score": 26.0}
{"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/uZBWfYjYnf.seg_121.wav", "doc_id": "uZBWfYjYnf.seg_121", "src_text": "Thanks for your attention.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "感谢您的关注。", "score": 100.0}
{"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/TVCREhgqUP.seg_75.wav", "doc_id": "TVCREhgqUP.seg_75", "src_text": "We go from left to right over the output and determine which multiset token to put in every position.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们从左向右穿过输出,并确定哪个多集符号应放置在每个位置;", "score": 90.0}
{"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_163.wav", "doc_id": "SLpqvupgvW.seg_163", "src_text": "Consider this alternative question.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "考虑一下这种替代问题:", "score": 96.0}
{"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WBLMIsdIrq.seg_428.wav", "doc_id": "WBLMIsdIrq.seg_428", "src_text": "To summarize, we perform a data-driven analysis across 14 language pairs to identify when translations require context and then we use our findings to build a benchmark for document-level machine translation which can help us identify which discourse phenomena models can handle well or not, and which translation systems are good at document-level translation.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "总而言之,我们将对14种语言进行数据驱动分析,以识别所需的翻译。然后我们使用我们的发现来建立文档级机器传输的基准,帮助我们识别哪些对话现象模型可以或不能处理,并哪些传输系统是文档级传输的好系统。", "score": 69.0}
{"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/TVCREhgqUP.seg_74.wav", "doc_id": "TVCREhgqUP.seg_74", "src_text": "Conceptually, our permutation model works roughly like this.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "从概念上讲,我们的置换模型大致如此。", "score": 85.0}
{"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/TVCREhgqUP.seg_58.wav", "doc_id": "TVCREhgqUP.seg_58", "src_text": "Naive seq2seq models struggle with this kind of out-of-distribution generalization and often produce outputs that are detached from the input.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "天真的序列到序列模型难以应对这种分布式泛化,通常会产生与输入分离的输出,", "score": 80.0}
{"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_684.wav", "doc_id": "oaOHnMCwad.seg_684", "src_text": "Positionality is simply the perspectives that people hold as a result of their demographics, identity, and life experiences.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "只是基于他们的人口学、身份和生活经历的人。", "score": 63.0}
{"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dvGkKzmIaN.seg_531.wav", "doc_id": "dvGkKzmIaN.seg_531", "src_text": "We first construct a back door and a benign data set.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们构建一个后门和一个善意的数据集。", "score": 86.0}
{"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/PIZEXUFLAR.seg_288.wav", "doc_id": "PIZEXUFLAR.seg_288", "src_text": "In each experiment, we report the min and max performance and the standard deviation of the performance across all 5 experiments.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "值的性能,以及在所有五个实验中性能的标准偏离。", "score": 61.0}
{"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/gGbuDbHhyc.seg_345.wav", "doc_id": "gGbuDbHhyc.seg_345", "src_text": "In weak supervision, you do not manually label the data.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "5.56.57.58.59.60.", "score": 0.0}
{"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SUkmfOTvGi.seg_484.wav", "doc_id": "SUkmfOTvGi.seg_484", "src_text": "To our next question, what causes the performance drop of some models, We had two hypothesis.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "接下来,我们的下一个问题是:什么原因导致了一些模型的性能下降?我们有两个假设,", "score": 91.0}
{"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/wLqFAuDnKa.seg_158.wav", "doc_id": "wLqFAuDnKa.seg_158", "src_text": "Thank you very much.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "非常感谢。", "score": 91.0}
{"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WBLMIsdIrq.seg_389.wav", "doc_id": "WBLMIsdIrq.seg_389", "src_text": "But if the previous sentence was \"Could it be anything serious, doctor?\", then \"mole\" refers to a birthmark.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "但如果前一句是“先生,事情会变得严重吗?”那么莫尔指的是出生证。", "score": 45.0}
{"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_688.wav", "doc_id": "oaOHnMCwad.seg_688", "src_text": "And we're not trying to say that models themselves in data sets themselves have demographic identities and life experiences, but they do aggregate judgments and opinions of real people, and can thus represent certain positionalities over others.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们不会试图说说那些在身体和数据本身具有人口统计学身份和生活经历的人,但他们的个人判断和观点是不同的,并且可以代表某些立场。", "score": 59.0}
{"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/uZBWfYjYnf.seg_96.wav", "doc_id": "uZBWfYjYnf.seg_96", "src_text": "Specific architectures are usually trained, introducing additional modules to be optimized.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "特定的建筑物通常经过训练,引入额外的模块以进行优化。", "score": 63.0}
{"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/rISrKoXQCx.seg_588.wav", "doc_id": "rISrKoXQCx.seg_588", "src_text": "Thank you for your time.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "死了——我已经死了——谢谢你,因为你有时间。", "score": 45.0}
{"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SUkmfOTvGi.seg_465.wav", "doc_id": "SUkmfOTvGi.seg_465", "src_text": "Let's get started.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "让我们开始吧。", "score": 94.0}
{"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/wLqFAuDnKa.seg_157.wav", "doc_id": "wLqFAuDnKa.seg_157", "src_text": "For more details, please come to the full presentation of the paper.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "图的描述。更多详细信息,请参阅论文的全文。", "score": 50.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/XejEJmgUmE.seg_734.wav", "doc_id": "XejEJmgUmE.seg_734", "src_text": "Which can also include grammaticality like BLiMP, SyntaxGym, or acceptability in terms of stereotypes such as CrowS pairs.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "接受性在刻板印象的术语中,如对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对", "score": 30.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/GvEBWkLmuI.seg_839.wav", "doc_id": "GvEBWkLmuI.seg_839", "src_text": "Immediately we see that, while the outputs aren't overtly negative or toxic in the traditional sense of these words, there are some interesting patterns.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "可以看到,虽然输出的结果在传统的意义上来说不是明显的负面或有毒的,但它们在某些方面是有害的。有趣的是,", "score": 57.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/PIZEXUFLAR.seg_295.wav", "doc_id": "PIZEXUFLAR.seg_295", "src_text": "Also, transfer learning from natural instruction dataset can benefit instruction tuning.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": ",基于自然指令数据集的转移学习可以提高指令调节。", "score": 75.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/PIZEXUFLAR.seg_307.wav", "doc_id": "PIZEXUFLAR.seg_307", "src_text": "Thank you.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "谢谢。", "score": 100.0} {"audio_path": 
"data/iwslt25/IWSLT25INSTRUCT/segmented/hgIDlKNiFM.seg_454.wav", "doc_id": "hgIDlKNiFM.seg_454", "src_text": "However, we can observe that data from heterogeneous sources appear to be more versatile.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "然而,我们可以从异源数据中获得数据,我们可以观察到异源数据似乎更具多样性,", "score": 60.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/TVCREhgqUP.seg_77.wav", "doc_id": "TVCREhgqUP.seg_77", "src_text": "Then we jump to the next multiset token, to determine the second token in the output.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "然后我们跳到下一个多元集令牌,以确定输出中的第二个令牌。", "score": 90.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dJGfOSFgZO.seg_325.wav", "doc_id": "dJGfOSFgZO.seg_325", "src_text": "For each of the existing methods, we collected evaluations on eight of the most commonly measured aspects of dialogue, since this is the standard practice for evaluating chat models along multiple dimensions.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "对于现有的每种方法,我们收集了对对话的八个最常用的测量方面的评估,因为这是评估聊天模型的标准做法。", "score": 78.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dvGkKzmIaN.seg_515.wav", "doc_id": "dvGkKzmIaN.seg_515", "src_text": "Finally, the watermark needs to be transferable to the attacker's services during the model extraction process.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "最后,水印需要在模型提取过程中传输到攻击者的服务。", "score": 83.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dvGkKzmIaN.seg_504.wav", "doc_id": "dvGkKzmIaN.seg_504", "src_text": "Let's first introduce the background about embedding as services.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": 
"long_KIT_primary", "tgt_text": "先,让我们介绍一下嵌入式服务的背景。", "score": 80.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oeooqChmKK.seg_609.wav", "doc_id": "oeooqChmKK.seg_609", "src_text": "Generally, background knowledge is learned during the pretraining of large language models, while entity-specific knowledge is typically observed at inference time.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "一般来说,背景知识是在大型语言模型的预训练过程中学习的,而实体的特定知识通常是在推断时观察到的。", "score": 93.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/rISrKoXQCx.seg_558.wav", "doc_id": "rISrKoXQCx.seg_558", "src_text": "So some preliminary results demonstrate that first, language models do have varying political leanings.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "因此,初步结果表明,首先使用的语言模型具有不同的政治倾向。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/GvEBWkLmuI.seg_852.wav", "doc_id": "GvEBWkLmuI.seg_852", "src_text": "So in our method, we first designate what the unmarked and marked groups are, and then we compare the personas using the Fightin’ Words method, which is basically using weighted log-odds ratios to distinguish the top words for each marked group.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "因此,在我们的方法中,我们首先确定未标记和标记的组,然后我们使用fightingwords方法比较人群,基本上使用加权logodds比率来区分每个标记组的顶词。", "score": 58.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/wLqFAuDnKa.seg_158.wav", "doc_id": "wLqFAuDnKa.seg_158", "src_text": "Thank you very much.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "非常感谢。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/gGbuDbHhyc.seg_370.wav", "doc_id": "gGbuDbHhyc.seg_370", "src_text": "However, if 
we allow to continue fine-tuning on the clean samples, then FTw performs equally well as other methods.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "但是,如果我们允许继续微调,我们可以在FTW上实现更好的性能。在“clean”样本上。然后,FTW的性能与其他方法一样好,", "score": 61.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/GvEBWkLmuI.seg_836.wav", "doc_id": "GvEBWkLmuI.seg_836", "src_text": "Describe yourself.\".", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "描述自己”)来描述一个想象的个体,", "score": 50.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/aQpIWggfCo.seg_29.wav", "doc_id": "aQpIWggfCo.seg_29", "src_text": "Our method greatly improves the planning ability both in semantic completeness and faithfulness to the constraint.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们的方法在语义完整性和对约束的忠诚度方面大大提高了可感性。", "score": 70.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/PIZEXUFLAR.seg_264.wav", "doc_id": "PIZEXUFLAR.seg_264", "src_text": "So with the advances in large language models, many works started to explore new learning paradigms of reusing pre-trained language models for different downstream tasks in a parameter and data-efficient way.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "随着大型语言模型的进步,许多研究开始探索重新利用预训练语言模型来参数化和数据高效的不同下游任务的新学习范式。", "score": 87.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/gGbuDbHhyc.seg_380.wav", "doc_id": "gGbuDbHhyc.seg_380", "src_text": "You can find it via the QR code on this slide.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "您可以通过此页面上的“q”代码找到它", "score": 59.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WBLMIsdIrq.seg_407.wav", 
"doc_id": "WBLMIsdIrq.seg_407", "src_text": "And this can be explained because English doesn't have dual pronouns, so you need context to determine if a pronoun is dual when translating into Arabic.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "这可以解释为英语中没有双名词,所以你需要上下文来确定一个名词是否是双数的。", "score": 63.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oeooqChmKK.seg_627.wav", "doc_id": "oeooqChmKK.seg_627", "src_text": "To summarize the main takeaways of our paper, many coreference resolution models appear unable to reason over knowledge from different sources without task-specific training.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "为了总结本文的主要观点,我们可以说:许多协同过滤模型似乎无法在没有任务特定训练的情况下从不同来源推理超出知识。", "score": 55.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/rISrKoXQCx.seg_574.wav", "doc_id": "rISrKoXQCx.seg_574", "src_text": "Similar trends also happen for fake news detection, where we see that left-leaning language models are better at detecting misinformation from their opposite political leaning and vice versa.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "类似的趋势也发生在虚假新闻检测中,我们看到左倾语言模型比他们的对立面政治倾向和相反的模型更好地检测到误导信息。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/wLqFAuDnKa.seg_158.wav", "doc_id": "wLqFAuDnKa.seg_158", "src_text": "Thank you very much.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "非常感谢。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_713.wav", "doc_id": "oaOHnMCwad.seg_713", "src_text": "So for GPT 4, in the social acceptability task, we find that it's most aligned to people with a college education or Graduate School education and we find the same 
for Dynahate where it's most aligned to people with a college education.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "因此,在社会适应任务中,我们发现与拥有大学教育的人相比,我们与拥有大学教育或毕业于大学教育的人有更强的联系。我们发现同样的情况在丹尼黑,人们在这里更容易找到与大学教育相关的工作。", "score": 56.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/TVCREhgqUP.seg_72.wav", "doc_id": "TVCREhgqUP.seg_72", "src_text": "We introduce a new method to predict the permutation that does not put any hard constraints on the possible permutations.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们引入了一个新的方法来预测置换,这个方法不会对可能的置换施加任何硬约束。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_174.wav", "doc_id": "SLpqvupgvW.seg_174", "src_text": "Our data set covers three different domains: music, books, and recipes.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们的数据集覆盖了三个不同的领域:音乐、书籍和菜谱。", "score": 83.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dvGkKzmIaN.seg_527.wav", "doc_id": "dvGkKzmIaN.seg_527", "src_text": "The provided embedding is a weight summation of the target embedding and the original embedding.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "提供的嵌入是目标嵌入和原始嵌入的加权和。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/rISrKoXQCx.seg_570.wav", "doc_id": "rISrKoXQCx.seg_570", "src_text": "So last but not least, we evaluate language models with different political leanings on hate speech detection and fake news detection to NLP applications that often involve language models and could have very significant implications.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": 
"最后但并非不重要,我们评估了不同政治倾向的语言模型,包括语音检测和虚假新闻检测,用于NLB应用程序,涉及语言模型,并且可能具有非常明显的影响。", "score": 85.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/XejEJmgUmE.seg_755.wav", "doc_id": "XejEJmgUmE.seg_755", "src_text": "We increase the context length toward up to 1024 for to max out OPT and GPT 2 models.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们将联系长度延长到2002年,以最大限度地提高OBT和GPT模型的", "score": 58.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/GvEBWkLmuI.seg_871.wav", "doc_id": "GvEBWkLmuI.seg_871", "src_text": "So rather than actually working towards changing those obstacles, it puts pressure on those people to overcome them, which leads to a very negative health outcomes for these people, among other harms.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "常顽强和强大。实际上,这些操作会对这些人造成压力,使他们无法承受,这会导致这些人中其他疾病的负面健康结果。", "score": 70.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/TVCREhgqUP.seg_54.wav", "doc_id": "TVCREhgqUP.seg_54", "src_text": "And \"Mary knew that the girl slept.\"", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "玛丽知道女孩睡着了。", "score": 96.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/XejEJmgUmE.seg_765.wav", "doc_id": "XejEJmgUmE.seg_765", "src_text": "Basically, we find that the models are sensitive to the perturbed sentences in similar ways.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "面是有用的。基本上,我们发现模型对扰乱的句子敏感性类似:", "score": 85.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/gGbuDbHhyc.seg_354.wav", "doc_id": "gGbuDbHhyc.seg_354", "src_text": "The aforementioned doubt is asked to ask three research questions.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": 
"acl", "tgt_system": "long_KIT_primary", "tgt_text": "上述疑问使我们提出三项研究问题:", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/XejEJmgUmE.seg_764.wav", "doc_id": "XejEJmgUmE.seg_764", "src_text": "And after doing like several of these perturbations, we find that none of these noises are actually making the model like change its course in terms of how it shows us the MPP judgement print.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "行像许多这些破坏后。我们发现这些噪音并没有实际上改变模型在如何向我们展示MPP判断趋势方", "score": 35.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SUkmfOTvGi.seg_467.wav", "doc_id": "SUkmfOTvGi.seg_467", "src_text": "We observe that models have been used in CoNLL-2003 to develop NER for almost 20 years and this naturally raises several problems.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们观察到,模型已经在2003年开始使用Kornel的Ner来开发Ner。这自然会引发一些问题。", "score": 50.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dvGkKzmIaN.seg_527.wav", "doc_id": "dvGkKzmIaN.seg_527", "src_text": "The provided embedding is a weight summation of the target embedding and the original embedding.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "提供的嵌入是目标嵌入和原始嵌入的加权和。", "score": 93.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_778.wav", "doc_id": "WTTtiRKFZI.seg_778", "src_text": "They single out one of the conjuncts.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "昨天我读了这本书,昨天我读了这本书,昨天我读了", "score": 0.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/FLkGnzVRew.seg_670.wav", "doc_id": "FLkGnzVRew.seg_670", "src_text": "We also find that iterative update is useful for transfer learning from a different domain, whereas 
in domain active annotations benefit from cumulative update.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们还发现迭代更新对于从不同域的转移学习是有用的,而域内的活跃注释则从累积更新中受益。", "score": 86.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dJGfOSFgZO.seg_326.wav", "doc_id": "dJGfOSFgZO.seg_326", "src_text": "From our analysis of these evaluation results, we found that ABC-Eval behavior labels are overall more reliable than labels collected by existing methods, as measured by inter-annotator agreement on 100 doubly-labeled conversations.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "从这些评估结果的分析中,我们发现ABCB行为标签总体上比通过现有方法收集的标签更可靠,因为通过一百次双语对话的自动化协议来衡量的标签。", "score": 70.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/TVCREhgqUP.seg_66.wav", "doc_id": "TVCREhgqUP.seg_66", "src_text": "In this paper, we don't use trees and introduce a neural seq2seq model that directly models the correspondences between fragments of the input and fragments of the output.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "在这篇论文中,我们没有使用树结构,而是引入了一个神经序列到序列模型,它直接建模了输入和输出片段之间的对应关系。", "score": 90.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_185.wav", "doc_id": "SLpqvupgvW.seg_185", "src_text": "We always use a simple template.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们总是使用一个简单的模板:", "score": 97.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/uZBWfYjYnf.seg_104.wav", "doc_id": "uZBWfYjYnf.seg_104", "src_text": "That is the cross-attention mechanism, and you can see an example on the right.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": 
"这就是交叉注意力机制,右边有一个例子。", "score": 92.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oeooqChmKK.seg_596.wav", "doc_id": "oeooqChmKK.seg_596", "src_text": "Therefore, successful models for knowledge-intensive NLU tasks require the ability to integrate and use both pretrain-time and inference-time knowledge.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "因此,成功的知识密集型NLU任务模型需要能够整合和使用预训练时间和推断时间的知识。在这项", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/rISrKoXQCx.seg_551.wav", "doc_id": "rISrKoXQCx.seg_551", "src_text": "This has created a mixed blessing for language model applications.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "这使得语言模型应用程序在", "score": 48.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SUkmfOTvGi.seg_491.wav", "doc_id": "SUkmfOTvGi.seg_491", "src_text": "For temporal drift, we did an experiment to retrain or continue to pre-train some models with more recent data and we found that the performance degrades with larger temporal gap and this confirms our hypothesis that the main cause of the performance drop is temporal drift.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "对于时间漂移,我们进行了一项实验,以重新训练或继续使用最新数据预训练一些模型,并发现性能随着时间间隔的增加而降低。这证实了我们的假设,即表演下降的主要原因是时间漂移。", "score": 60.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/gGbuDbHhyc.seg_372.wav", "doc_id": "gGbuDbHhyc.seg_372", "src_text": "To summarize, we showed that recent WSL approaches require clean, manually annotated samples for them to work properly.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "总之,我们展示了最近的WSL方法需要清洁的手动注释样本才能正常工作;", "score": 94.0} {"audio_path": 
"data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_213.wav", "doc_id": "SLpqvupgvW.seg_213", "src_text": "Here is a link to our dataset.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "这里是我们的数据集,感", "score": 60.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oeooqChmKK.seg_605.wav", "doc_id": "oeooqChmKK.seg_605", "src_text": "The task here is to identify the correct entity that the pronoun \"he\" refers to, which in this case is Servin.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "这里的任务是识别他所指的正确实体,这在这种情况下是仆人。", "score": 90.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/uZBWfYjYnf.seg_121.wav", "doc_id": "uZBWfYjYnf.seg_121", "src_text": "Thanks for your attention.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "感谢您的关注。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dvGkKzmIaN.seg_512.wav", "doc_id": "dvGkKzmIaN.seg_512", "src_text": "First the method should be applicable to embedding as services.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "首先,该方法应适用于嵌入式服务;", "score": 80.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_238.wav", "doc_id": "oYCKgTzTDy.seg_238", "src_text": "And we also consider Cross-lingual Zero-shot and Few-shot transfer.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们还考虑了跨语言零冲击波和视觉传输,即", "score": 47.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/uZBWfYjYnf.seg_106.wav", "doc_id": "uZBWfYjYnf.seg_106", "src_text": "A word is emitted if the attention is not concentrated, that is, its sum is below a certain threshold alpha towards the last lambda 
speech frames, meaning that the received information is enough stable.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "如果注意力不集中,即是说总和低于某个阈值α,向最后的lamda语音框架,意味着接收到的信息", "score": 61.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/hgIDlKNiFM.seg_442.wav", "doc_id": "hgIDlKNiFM.seg_442", "src_text": "So we ask ourselves a question about what is the most appropriate data sources for a wide range of usage and those crawled data are good substitution for clinical data.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们问自己一个问题:什么样的数据源适合广泛的使用,并且这些原始数据是临床数据的良好替代品。", "score": 90.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/aQpIWggfCo.seg_36.wav", "doc_id": "aQpIWggfCo.seg_36", "src_text": "To ensure the quality of the validation and test set, we ask crowd-sourced workers to find and revise the incorrect samples.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "以确保验证和测试的质量,我们要求。Claude鼓励工人最终修正错误的样本。", "score": 56.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/XejEJmgUmE.seg_748.wav", "doc_id": "XejEJmgUmE.seg_748", "src_text": "So that is what we call as the mismatch scenario.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "这就是我们所说的不匹配场景。", "score": 95.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/hgIDlKNiFM.seg_451.wav", "doc_id": "hgIDlKNiFM.seg_451", "src_text": "To evaluate our seven models, we gather data for public and private downstream tasks such as named entity recognition, classification, part-of-speech tagging, and question answering.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": 
"为了评估我们的7个模型,我们收集了多个公共和私有任务,例如命名实体识别、分类、参与等。与此同时,一个基于BERT的模型被训练来回答问题。", "score": 60.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SUkmfOTvGi.seg_494.wav", "doc_id": "SUkmfOTvGi.seg_494", "src_text": "At the same time, we also found that the performance drop here is caused by temporal drift and kind of surprisingly, it is not caused by adaptive overfitting even though CoNLL-2003 has been used for over 20 years.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "同时,我们还发现这里的性能下降是由时间漂移引起的,令人惊讶的是,这并不是由适应性过度适应引起的。尽管“corneltwothousandandthree”已经使用了二十多年了。", "score": 51.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dvGkKzmIaN.seg_538.wav", "doc_id": "dvGkKzmIaN.seg_538", "src_text": "We assume the provider apply wiki text data set to count word frequency.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们假设提供者应用wiktext数据集来计算单词频率。", "score": 91.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_206.wav", "doc_id": "SLpqvupgvW.seg_206", "src_text": "Results with T5 XL model are summarized below.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "结果使用T5X大型模型进行计算并进行摘要。", "score": 83.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dvGkKzmIaN.seg_542.wav", "doc_id": "dvGkKzmIaN.seg_542", "src_text": "As shown in the figures, it's hard to distinguish between, the backdoor embeddings and normal embeddings.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "如图所示,很难区分向量嵌入和普通嵌入。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dJGfOSFgZO.seg_340.wav", "doc_id": "dJGfOSFgZO.seg_340", "src_text": "Thank you for watching.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", 
"domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "感谢您的观看。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/XejEJmgUmE.seg_761.wav", "doc_id": "XejEJmgUmE.seg_761", "src_text": "Now this and this is very large like this effect, increases throughout the context length and this would probably affect like newer language models which has large context window.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "在,这个效果非常大。在整个文本长度上进行,这可能会影响像具有大上下文窗口的新语言模型这样的模型。", "score": 79.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_811.wav", "doc_id": "WTTtiRKFZI.seg_811", "src_text": "So when the difference between the lengths of the two conjuncts grows, the shorter conjunct prefers to be the first one, stronger, right?", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "所以,当两个孔径之间的长度差异越大时,较短的孔径更倾向于是较强的,所以比例是左", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dJGfOSFgZO.seg_330.wav", "doc_id": "dJGfOSFgZO.seg_330", "src_text": "You can see how the combination of all ABC-Eval metrics explains over 25% of conversation quality, and as you remove the metrics one at a time, most of them result in losing a decent amount of information about the quality.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "可以看到所有ABCEVAL指标的组合如何揭示超过25%的对话质量,并且当您一遍遍地删除指标时,大多数指标都会导致您失去有关质量的信息。", "score": 80.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_163.wav", "doc_id": "SLpqvupgvW.seg_163", "src_text": "Consider this alternative question.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "考虑这个替代问题:", "score": 98.0} {"audio_path": 
"data/iwslt25/IWSLT25INSTRUCT/segmented/XejEJmgUmE.seg_760.wav", "doc_id": "XejEJmgUmE.seg_760", "src_text": "But when we match the structure, that is when we choose the sentences from the same phenomena in BLiMP or SyntaxGym, we see a massive increase or a massive decrease of the MPP judgement for the model, depending on whether the chosen prefix is acceptable or unacceptable.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "当您添加可接受的前缀或不可接受的前缀时。但当我们匹配结构时,这就是我们从同一个现象中选择句子时的选择。我们看到MP的判断对模型的判断有大幅增加或大幅下降,取决于所选的前缀是否可接受或不可接受。现", "score": 57.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/TVCREhgqUP.seg_47.wav", "doc_id": "TVCREhgqUP.seg_47", "src_text": "Hi!", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "你好,", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_210.wav", "doc_id": "SLpqvupgvW.seg_210", "src_text": "For example, when the language model retrieves the background knowledge.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "例如,当语言模型恢复背景知识时。", "score": 63.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/XejEJmgUmE.seg_744.wav", "doc_id": "XejEJmgUmE.seg_744", "src_text": "And what we do is that to recreate like longer sequences and which are acceptable and which has the same matching of the grammatical structure.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们的目的是重新创造像更长序列这样的序列,它们是可接受的,并且具有相同的语法结构,", "score": 54.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/rISrKoXQCx.seg_562.wav", "doc_id": "rISrKoXQCx.seg_562", "src_text": "So we could conduct a controlled experiment by further pretraining language model checkpoints on 6 different partisan corpora separated into news 
and social media, further divided into their political leaning.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "因此我们可以通过进一步训练语言模型来进行控制实验。检查点,六个不同的党派和机构(分离在新闻和社交媒体中)进一步分离到政治倾向中:通过进一步预训练", "score": 80.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/hgIDlKNiFM.seg_447.wav", "doc_id": "hgIDlKNiFM.seg_447", "src_text": "In addition to this comparison, we introduced three models trained on continual pre-training to analyze the impact of pre-training strategy.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "版本进行比较。另外,我们还将引入三种模型训练和持续预训练的策略,以分析预训练策略的影响。", "score": 60.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/gGbuDbHhyc.seg_375.wav", "doc_id": "gGbuDbHhyc.seg_375", "src_text": "First, report the model selection criteria.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "首先,报告模型选择标准;", "score": 98.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WBLMIsdIrq.seg_429.wav", "doc_id": "WBLMIsdIrq.seg_429", "src_text": "Thank you so much for your attention.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "感谢您的关注,", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/TVCREhgqUP.seg_56.wav", "doc_id": "TVCREhgqUP.seg_56", "src_text": "In contrast to standard machine learning evaluation, the test set does not come from the same distribution but contains structurally unseen logical forms.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "与标准的机器学习评估相反,这个测试集并不是来自同样的分布,但包含了结构上未见过的逻辑形式。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/hgIDlKNiFM.seg_449.wav", "doc_id": "hgIDlKNiFM.seg_449", "src_text": "Another 
also based on CamemBERT, but trained this time on the 4 GB of clinical notes and finally, one based on English biomedical model PubMedBERT, and trained on 4 GB of set of NACHOS.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "另一个也以卡曼伯为基础,但这次在克林肯的四千克上训练。最后,基于一个英国生物医学模型,伯梅特出生并在四个自然体系上训练,", "score": 29.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_681.wav", "doc_id": "oaOHnMCwad.seg_681", "src_text": "Where prospective AP is really not as sensitive to offensive terms that are more common in Indian contexts.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "性词汇的敏感度并不是那么高,而在印度更常见的场合中。", "score": 55.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oeooqChmKK.seg_591.wav", "doc_id": "oeooqChmKK.seg_591", "src_text": "Natural language understanding models draw on a variety of knowledge sources, such as knowledge contained in their parameters, usually acquired by a pretraining, and knowledge given in inputs at inference time.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "自然语言理解模型会从各种知识来源中获取知识,例如参数中包含的知识,通常是通过预训练获得的,以及在推理时输入的知识。", "score": 88.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/GvEBWkLmuI.seg_876.wav", "doc_id": "GvEBWkLmuI.seg_876", "src_text": "And finally, there should really be increased transparency about bias mitigation methods, because for instance, like these positive stereotypes, we don't know if it's because there is some sort of weird overly-excessive value alignment going on, or maybe some other anti-stereotyping methods that are resulting in these pernicious patterns.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": 
"最后,应该增加偏差的透明度。因为例如这些积极的刻板印象我们不知道是因为因为有某种奇怪的原因。过度的价值对齐正在发生,或者可能是其他的,像反刻板印象方法,导致这些有害的模式", "score": 76.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/aQpIWggfCo.seg_45.wav", "doc_id": "aQpIWggfCo.seg_45", "src_text": "Thanks for your time.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "感谢您的时间,", "score": 98.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/FLkGnzVRew.seg_662.wav", "doc_id": "FLkGnzVRew.seg_662", "src_text": "We compare this to the other state-of-the-art AL strategies that are commonly used in the community.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们将其与社区中常用的其他状态的艺术AL策略进行比较。", "score": 60.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/FLkGnzVRew.seg_672.wav", "doc_id": "FLkGnzVRew.seg_672", "src_text": "Feel free to get in touch with us if you have any questions.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "如果你有任何问题,请随时与我们联系。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/GvEBWkLmuI.seg_864.wav", "doc_id": "GvEBWkLmuI.seg_864", "src_text": "This contributes to a long legacy of discrimination and othering for these groups.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "这为这些群体带来了长期的歧视和其他歧视。", "score": 90.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_200.wav", "doc_id": "SLpqvupgvW.seg_200", "src_text": "For recipes, we additionally show their images, again from Wikipedia, so that the annotators know how they look like.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "对于食谱,我们还显示它们来自维基百科的图片,以便注释者知道它们看起来如何。", "score": 89.0} {"audio_path": 
"data/iwslt25/IWSLT25INSTRUCT/segmented/XejEJmgUmE.seg_764.wav", "doc_id": "XejEJmgUmE.seg_764", "src_text": "And after doing like several of these perturbations, we find that none of these noises are actually making the model like change its course in terms of how it shows us the MPP judgement print.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "但保留相关结构。然后,我们发现这些噪音实际上并没有改变模型的评估结果。它引用的数据在表明我们如何显示MP判断趋势方", "score": 62.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/rISrKoXQCx.seg_582.wav", "doc_id": "rISrKoXQCx.seg_582", "src_text": "So if we do not sanitize political opinions in language model training data, the bias would propagate from pretraining data to language models to downstream tasks, ultimately creating fairness issues.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "如果我们没有在语言模型训练数据中标准化政治观点,那么偏见将从预训练数据传播到语言模型,最后传播到下游任务,最后导致公平性问题。", "score": 55.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_776.wav", "doc_id": "WTTtiRKFZI.seg_776", "src_text": "So these two approaches are asymmetric.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "这两种方法是对称的,", "score": 99.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/GvEBWkLmuI.seg_843.wav", "doc_id": "GvEBWkLmuI.seg_843", "src_text": "The first one is generating these personas.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "第一个部分是生成这些人物。", "score": 99.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_801.wav", "doc_id": "WTTtiRKFZI.seg_801", "src_text": "So here we have a dependency from \"read\" to the adjunct of length 7 measured in words and from \"read\" to \"book\" of length 4, so together it's 11.", "src_text_system": 
"human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "这两个结构中保持一致,所以我们有一个依赖关系从红色到长度的边界(7个单词)和从红色到长度的书(4个单词),总共11个单词。当你移动(swappi", "score": 60.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/wLqFAuDnKa.seg_132.wav", "doc_id": "wLqFAuDnKa.seg_132", "src_text": "Finally, we provide some recommendations for prompt selection strategies.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "最后,我们提供了一些关于快速选择策略的建议。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/XejEJmgUmE.seg_770.wav", "doc_id": "XejEJmgUmE.seg_770", "src_text": "Thank you for listening.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "感谢您的倾听。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/wLqFAuDnKa.seg_135.wav", "doc_id": "wLqFAuDnKa.seg_135", "src_text": "The difference observed is of more than one BLEURT points.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "516个,差异观察值超过1个模糊点。这", "score": 63.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_226.wav", "doc_id": "oYCKgTzTDy.seg_226", "src_text": "We provide a uniform data set XSemPLR for cross-lingual semantic parsing in multiple natural languages and meaning representations.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们为多种自然语言和表示中的交叉语义解析提供了统一数据集示例。", "score": 88.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/FLkGnzVRew.seg_656.wav", "doc_id": "FLkGnzVRew.seg_656", "src_text": "Further, on iteratively fine-tuning on both tasks, we find that fine-tuning of CE tasks followed by further fine-tuning on debate yields a much better zero-shot performance.", "src_text_system": "human", 
"src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "说,我们在这两个任务上进行迭代的微调,我们发现微调CE任务后再对辩论进行进一步的微调会得到更好的零损失性能,", "score": 72.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/hgIDlKNiFM.seg_459.wav", "doc_id": "hgIDlKNiFM.seg_459", "src_text": "Finally, as a conclusion our proper system offered better performance on nine of the 11 downstream tasks and surpassed globally the result of the generic model, here CamemBERT.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "最后,作为结论,我们的系统在11个不需要主题的任务中表现更好,超过了全球的通用模型结果。", "score": 49.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WBLMIsdIrq.seg_419.wav", "doc_id": "WBLMIsdIrq.seg_419", "src_text": "And finally, we use our benchmark as well as other metrics to evaluate different models on the document-level machine translation.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "最后,我们使用我们的基准以及其他指标来评估不同模型在文档级机器翻译中的表现。", "score": 93.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WBLMIsdIrq.seg_413.wav", "doc_id": "WBLMIsdIrq.seg_413", "src_text": "And this allows us to identify phenomena that cannot really be captured by the word itself, but that's rather expressed in the sentence structure, such as ellipses resolution.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "许我们识别无法通过自身词语捕获的现象,但在中间结构中更明显,因此只需使用解释器", "score": 51.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_772.wav", "doc_id": "WTTtiRKFZI.seg_772", "src_text": "As you may know, there are different dependency structures assumed by different theories and corpus approaches.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": 
"您可能知道,根据不同的理论和方法,依赖结构可以被不同的依赖结构所承认。", "score": 82.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_170.wav", "doc_id": "SLpqvupgvW.seg_170", "src_text": "Or when the user wants to specify a preference.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "或者,当用户想要指定偏好时,", "score": 95.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/XejEJmgUmE.seg_736.wav", "doc_id": "XejEJmgUmE.seg_736", "src_text": "And then the hope is that the model, basically, puts more probability to the acceptable sentence.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "然后希望模型基本上把可接受的句子给予更大的概率。", "score": 84.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_691.wav", "doc_id": "oaOHnMCwad.seg_691", "src_text": "So to study data set and model positionality, we actually compare the annotations with real users with existing datasets and models.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "因此,为了研究数据集和模型位置性,我们实际上是将注释与现有数据集和模型中的真实用户进行比较。", "score": 98.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/gGbuDbHhyc.seg_380.wav", "doc_id": "gGbuDbHhyc.seg_380", "src_text": "You can find it via the QR code on this slide.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "你可以通过此页面上的QR码找到它", "score": 88.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WBLMIsdIrq.seg_384.wav", "doc_id": "WBLMIsdIrq.seg_384", "src_text": "A Data-driven, Multilingual Exploration\".", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "数据驱动的多语言探索》。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_774.wav", "doc_id": 
"WTTtiRKFZI.seg_774", "src_text": "So in this case, Lisa.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "在这种情况下是I萨伊戈尔·米", "score": 47.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_699.wav", "doc_id": "oaOHnMCwad.seg_699", "src_text": "In Live in the Wild is an online experimentation platform where we can recruit divers volunteers.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "LabintheWild是一个在线实验平台,我们可以招募各种志愿者。", "score": 50.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WBLMIsdIrq.seg_391.wav", "doc_id": "WBLMIsdIrq.seg_391", "src_text": "However, evaluating how well models can translate cases like this is pretty hard.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "然而,评估模型如何对比类似这样的案例是相当困难的。", "score": 81.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/gGbuDbHhyc.seg_357.wav", "doc_id": "gGbuDbHhyc.seg_357", "src_text": "Finally, should we only use the clean samples for validation, or there are better ways to utilize them?", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "最后,我们是否应该只使用清洁样本进行验证,还是有更好的方法来利用它们?", "score": 93.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/GvEBWkLmuI.seg_867.wav", "doc_id": "GvEBWkLmuI.seg_867", "src_text": "For Asian women, the words are things like \"petite\" and \"delicate\" and \"silky\" which connects to a long history of Asian women being hyper-sexualized, seen as very docile and submissive, and so on.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "对于亚洲女性来说,这些词语就像“小巧、细致和丝绸般的”。这与亚洲女性长期被性化、被视为非常柔弱、顺从和谦逊等的历史有关。", "score": 72.0} {"audio_path": 
"data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_201.wav", "doc_id": "SLpqvupgvW.seg_201", "src_text": "Then, we asked the annotators to pick one of these entities, for example, here's the first one, and describe them using three to five indirect referring expressions.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "然后,我们要求注释者选择其中一个实体(例如这里的第一个实体),并使用3到5个间接引用表达式来描述它们。", "score": 85.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/wLqFAuDnKa.seg_128.wav", "doc_id": "wLqFAuDnKa.seg_128", "src_text": "We evaluated the transition capability of such models using the best practices of the MT community.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "使用MT社区的最佳实践的这种模型的翻译能力。", "score": 60.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/XejEJmgUmE.seg_747.wav", "doc_id": "XejEJmgUmE.seg_747", "src_text": "And we can also do the same by choosing sentences from a different subset or a different data set.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们也可以通过从不同的子集或数据集选择句子来实现这一点,因", "score": 84.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/rISrKoXQCx.seg_586.wav", "doc_id": "rISrKoXQCx.seg_586", "src_text": "Ok, great.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "好吧,", "score": 71.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/gGbuDbHhyc.seg_373.wav", "doc_id": "gGbuDbHhyc.seg_373", "src_text": "Their performance gain and practicality are heavily overestimated.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "他们的性能增益和实用性被严重夸大了。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_221.wav", 
"doc_id": "oYCKgTzTDy.seg_221", "src_text": "For instance, there are lots of coverage on certain natural languages.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "例如,缺乏某些自然语言的覆盖面,例如", "score": 65.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_677.wav", "doc_id": "oaOHnMCwad.seg_677", "src_text": "So let's start off by imagining that you're working for a newspaper and you're sifting through comments under your news article trying to remove toxic content.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "所以让我们假设你在为报纸工作,并在你的新闻文章中削除有毒内容。", "score": 90.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WBLMIsdIrq.seg_426.wav", "doc_id": "WBLMIsdIrq.seg_426", "src_text": "So this sort of suggests where we would need to see more progress for document-level translation.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "因此,这种情况表明了词的上下文的重要性。在这里,我们需要看到更多的文档级翻译进展。", "score": 70.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/hgIDlKNiFM.seg_431.wav", "doc_id": "hgIDlKNiFM.seg_431", "src_text": "Hi, I am Yanis Labrak and I will present you our works on \"DrBERT: A Robust Pre-trained Model in French for Biomedical and Clinical Domains.\"", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "是吗?我是YannLavrac,我将向您介绍我们的工作:DoctorBert:robustBritishmodelinFrench.在生物医学和临床领域,语言建模在医疗保健中起着至关重要的作用。", "score": 70.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_255.wav", "doc_id": "oYCKgTzTDy.seg_255", "src_text": "For example, Encoder-Decoder outperforms previous work or achieves comparable results.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": 
"long_KIT_primary", "tgt_text": "例如,编码器解码器优于传统的前馈工作,或者在目标自然语言上获得可比的结果。", "score": 45.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/TVCREhgqUP.seg_64.wav", "doc_id": "TVCREhgqUP.seg_64", "src_text": "Typically, this involves considerable formalism-specific pre-processing of the logical forms, for example, to handle variable symbols.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "通常,这涉及大量的数据集和大量的计算资源。形式化特定预处理逻辑形式,例如处理可变符号。", "score": 80.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_221.wav", "doc_id": "oYCKgTzTDy.seg_221", "src_text": "For instance, there are lots of coverage on certain natural languages.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "例如。他们对某些自然语言的覆盖不足,中", "score": 40.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/FLkGnzVRew.seg_644.wav", "doc_id": "FLkGnzVRew.seg_644", "src_text": "Finally, cognitive dissonance is important to understand personal cognitive styles of individuals and helps us understand decision making processes better.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "最后,认知不一致性对于理解个体的个人认知风格很重要,并且有助于我们更好地理解决策过程。", "score": 79.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/wLqFAuDnKa.seg_149.wav", "doc_id": "wLqFAuDnKa.seg_149", "src_text": "Nevertheless, specialized state-of-the-art systems have a substantial advantage over the PaLM translations.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system":
"long_KIT_primary", "tgt_text": "然而,专门的先进系统在对比泛化翻译时有着显著的优势。”(2)comesprett", "score": 95.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oeooqChmKK.seg_594.wav", "doc_id": "oeooqChmKK.seg_594", "src_text": "For example, in the sentence, \"John saw the newly elected president on TV.\"", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "通常通过预训练获得的知识,通常通过预训练获得的知识,通常通过预训练获得的知识,通常通过预训练获得的知识,通常通过预训练最近在任务类似于问答的任务中所做的工作表明模型可以使用预训练的时间知识来解决任务。但自然语言理解通常需要提供在听觉时间也提供的知识。例如,在判决中,John看到了电视上的新当选总统。", "score": 38.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dvGkKzmIaN.seg_522.wav", "doc_id": "dvGkKzmIaN.seg_522", "src_text": "Before these main steps, we first select a trigger set.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "测之前,我们首先选择一个触发器集。", "score": 93.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SUkmfOTvGi.seg_491.wav", "doc_id": "SUkmfOTvGi.seg_491", "src_text": "For temporal drift, we did an experiment to retrain or continue to pre-train some models with more recent data and we found that the performance degrades with larger temporal gap and this confirms our hypothesis that the main cause of the performance drop is temporal drift.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "为了研究时间漂移,我们对一些模型进行了重新训练或继续预训练,使用最新的数据,我们发现随着时间差异的增大,性能会下降。这也证实了我们的假设,即性能下降的主要原因是时间漂移。。", "score": 89.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/GvEBWkLmuI.seg_852.wav", "doc_id": "GvEBWkLmuI.seg_852", "src_text": "So in our method, we first designate what the unmarked and marked groups are, and then we compare the personas using the Fightin’ Words method, which is basically using weighted log-odds ratios to distinguish the top words for each marked group.", "src_text_system": "human", "src_lang": "en", 
"tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "因此,我们的方法首先将未标记和标记组标记为“未标记和标记组”。然后我们可以比较使用战斗词法的人,基本上使用加权的logod比率来区分每个市场组的顶词。", "score": 46.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/TVCREhgqUP.seg_75.wav", "doc_id": "TVCREhgqUP.seg_75", "src_text": "We go from left to right over the output and determine which multiset token to put in every position.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们从左到右在输出上移动,并确定每个位置放置哪个多元集令牌。", "score": 96.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/rISrKoXQCx.seg_550.wav", "doc_id": "rISrKoXQCx.seg_550", "src_text": "According to a survey of the C4 Corpus, we can see that New York Times, Los Angeles Times, The Guardian, Huffington Post, etcetera are well covered in language model training data.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "根据CIFAR的调查,我们可以看到《纽约时报》在预训练数据中占有很大的比例。(例如,洛杉矶时报,卫报,哈芬顿邮报等)在语言模型训练数据中得到了很好的覆盖。", "score": 60.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/GvEBWkLmuI.seg_857.wav", "doc_id": "GvEBWkLmuI.seg_857", "src_text": "So, while the generated personas have much higher rates of the lexicon words, the human-written ones have a much wider distribution of words, while the stereotype words that are in the generated personas are really just the words \"tall\" and \"athletic\".", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "虽然生成的人物有词典中的词的比率要高得多。人类写的句子词汇分布更广泛,而生成的句子中的人物的刻板词汇实际上就是“高”和“运动”。", "score": 81.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dvGkKzmIaN.seg_509.wav", "doc_id": "dvGkKzmIaN.seg_509", "src_text": "Therefore, it's necessary to protect the copyright of embedding as services.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", 
"tgt_system": "short_NLE_primary", "tgt_text": "因此,必须保护嵌入服务的版权。", "score": 88.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/PIZEXUFLAR.seg_283.wav", "doc_id": "PIZEXUFLAR.seg_283", "src_text": "In addition, we randomly sample 20 tasks from the test split of natural instructions as an unseen task for NLP.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "此外,我们随机从自然指令的测试分采样20个任务作为未见任务。所以", "score": 60.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oeooqChmKK.seg_608.wav", "doc_id": "oeooqChmKK.seg_608", "src_text": "And second, background knowledge such as \"Judges decide cases in law courts.\"", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "二,背景知识,如法官在民事法院做出案件的决定。", "score": 70.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_708.wav", "doc_id": "oaOHnMCwad.seg_708", "src_text": "We find that there is positionality in NLP.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们发现NLP中存在位置性。", "score": 91.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/rISrKoXQCx.seg_587.wav", "doc_id": "rISrKoXQCx.seg_587", "src_text": "I think that's pretty much all I have for today.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我认为这很好,我想这很有趣。", "score": 42.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_826.wav", "doc_id": "WTTtiRKFZI.seg_826", "src_text": "And talk to us about at the poster session.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "文件,很抱歉,关于邮政会议的", "score": 0.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/aQpIWggfCo.seg_13.wav", "doc_id": "aQpIWggfCo.seg_13", "src_text": "We 
sample 100 specific goals and evaluate the scripts generated from large language models.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们采样了一百个具体的目标,并评估了从语言模型生成的脚本。", "score": 80.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/aQpIWggfCo.seg_29.wav", "doc_id": "aQpIWggfCo.seg_29", "src_text": "Our method greatly improves the planning ability both in semantic completeness and faithfulness to the constraint.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": ",既在语义完整性方面,也在对约束的忠诚方面。", "score": 50.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/aQpIWggfCo.seg_41.wav", "doc_id": "aQpIWggfCo.seg_41", "src_text": "In summary, we establish the constrained language planning problem.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "总而言之,我们建立了受限语言规划问题;", "score": 92.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_824.wav", "doc_id": "WTTtiRKFZI.seg_824", "src_text": "And we show in the paper how this provides an argument against asymmetric structures of coordination, as these two, and for the symmetric structures, as these two.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": 
",越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越
短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越
短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越
短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短(1)提供了对协调结构的非对称性结构(如这两种)", "score": 20.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/aQpIWggfCo.seg_21.wav", "doc_id": "aQpIWggfCo.seg_21", "src_text": 
"Thus, we adopt the idea of over-generate-then-filter to improve generation quality.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "因此,我们采用过生成然后过滤的想法来改善生成质量。", "score": 84.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_704.wav", "doc_id": "oaOHnMCwad.seg_704", "src_text": "We then replicate a very similar setup for the toxicity and hate speech detection task, where they'll read an instance from Dynahate and write whether they think it's instance of hate speech.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "然后我们对毒性和热语检测任务进行了非常相似的设置,他们读取了来自丹尼的例子,并正确地判断了他们的热语例子。", "score": 49.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/GvEBWkLmuI.seg_869.wav", "doc_id": "GvEBWkLmuI.seg_869", "src_text": "This connects to an archetype that people have called the \"Strong Black Women\" archetype.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "人们称之为强黑女性架构,而在", "score": 50.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/GvEBWkLmuI.seg_875.wav", "doc_id": "GvEBWkLmuI.seg_875", "src_text": "We should also be using an intersectional lens to study biases and harms because there's a lot of things that might be overlooked if we don't do that.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们也应该使用交叉验证来研究偏差和偏差,因为如果我们不这样做,可能会忽略很多东西,", "score": 60.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/FLkGnzVRew.seg_653.wav", "doc_id": "FLkGnzVRew.seg_653", "src_text": "Since the initial model was not able to capture the dissonance class at all, we start the active learning process by transferring weights from closely related tasks.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", 
"tgt_system": "short_NLE_primary", "tgt_text": "由于最初的模型无法完全捕捉远处的类别,我们从将重量从紧密相关的任务转移到远处的任务开始了主动学习过程。", "score": 60.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dvGkKzmIaN.seg_510.wav", "doc_id": "dvGkKzmIaN.seg_510", "src_text": "To protect the copyright of embedding as services, one of the solutions is to embed a watermark in the provider service and detect whether another service contain the watermark.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "为了保护嵌入式服务的版权,一个解决方案是将水印嵌入服务提供商的服务,并检测是否有其他服务包含水印。", "score": 80.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dJGfOSFgZO.seg_313.wav", "doc_id": "dJGfOSFgZO.seg_313", "src_text": "The common practice is to use human evaluation, such as by asking human judges to select which of two conversations is better or to rate conversations given a Likert scale.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "常见的做法是使用人类评估,例如要求人类评判者选择哪个对话更好,或者给定一个评分尺度来评估对话。", "score": 82.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/GvEBWkLmuI.seg_878.wav", "doc_id": "GvEBWkLmuI.seg_878", "src_text": "Thank you so much for listening.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "非常感谢你听我", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/rISrKoXQCx.seg_586.wav", "doc_id": "rISrKoXQCx.seg_586", "src_text": "Ok, great.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "好的,", "score": 63.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dJGfOSFgZO.seg_308.wav", "doc_id": "dJGfOSFgZO.seg_308", "src_text": "Hello, I'm James Finch.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": 
"你好,我是詹姆斯·芬奇,", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/PIZEXUFLAR.seg_269.wav", "doc_id": "PIZEXUFLAR.seg_269", "src_text": "There exist more than 1600 language-only instruction tasks.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "有超过1,000个单语指令,但", "score": 44.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/TVCREhgqUP.seg_72.wav", "doc_id": "TVCREhgqUP.seg_72", "src_text": "We introduce a new method to predict the permutation that does not put any hard constraints on the possible permutations.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们引入一种新方法来预测变换,这种方法不对可能的变换施加任何硬约束,", "score": 90.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dJGfOSFgZO.seg_311.wav", "doc_id": "dJGfOSFgZO.seg_311", "src_text": "This work was done by the Emory NLP Lab led by Professor Jinho Choi at Emory University and in collaboration with Amazon Alexa AI.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "这项工作是由埃默里大学的埃默里NLP实验室负责完成的,由埃默里大学的教授吉诺·乔伊领导,并与亚马逊·亚历克西亚AI合作。所以说", "score": 90.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_824.wav", "doc_id": "WTTtiRKFZI.seg_824", "src_text": "And we show in the paper how this provides an argument against asymmetric structures of coordination, as these two, and for the symmetric structures, as these two.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们在论文中展示了如何利用这种方法来反对这种非对称的协调结构(即这两个)和这种非对称结构(即这两个)。", "score": 48.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dvGkKzmIaN.seg_514.wav", "doc_id": "dvGkKzmIaN.seg_514", "src_text": "Third, the watermark should be covert enough to the attacker or the attacker can remove the watermark 
easily.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "第三,水印应该被覆盖到攻击者身上,或者攻击者可以轻松移除水印。", "score": 50.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/PIZEXUFLAR.seg_279.wav", "doc_id": "PIZEXUFLAR.seg_279", "src_text": "Ok, now I'm going to talk about multi-modal instruction tuning.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "好的,现在我要谈谈多模数指令调谐。", "score": 94.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_720.wav", "doc_id": "oaOHnMCwad.seg_720", "src_text": "And the other is to do NLP research with the lens of perspectivism.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "其次,在NLP研究中运用相对主义。", "score": 37.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/aQpIWggfCo.seg_11.wav", "doc_id": "aQpIWggfCo.seg_11", "src_text": "Since no dataset of specific goals exists to support our study, we have to acquire these goals first.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "没有关于我们学习的具体目标。我们必须先获得这些目标,", "score": 53.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/FLkGnzVRew.seg_651.wav", "doc_id": "FLkGnzVRew.seg_651", "src_text": "Given the low occurrence of dissonance and absence of any prior such data set, we are facing the problem of absolute rarity.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "由于离散性和缺乏任何先前的这种数据集的低发生率,我们正面临绝对稀有性的问题。", "score": 60.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/uZBWfYjYnf.seg_115.wav", "doc_id": "uZBWfYjYnf.seg_115", "src_text": "And we compare also with the state-of-the-art architecture specifically tailored for simultaneous pre-translation.", "src_text_system": 
"human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们与专门用于同时传译的艺术建筑进行比较。", "score": 38.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/PIZEXUFLAR.seg_285.wav", "doc_id": "PIZEXUFLAR.seg_285", "src_text": "During training, we mix all the instances for all the tasks.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "在训练过程中,我们将所有实例用于所有任务,", "score": 82.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_211.wav", "doc_id": "SLpqvupgvW.seg_211", "src_text": "If the language model has access only to entity names, then the accuracy is only 60%, so there's a lot of room for improvement.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "如果语言模型检索背景知识时,准确率在82%到87%之间,这更现实。语言模型只有对实体名称的访问,那么准确率只有60%,所以有很多空间来改进。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/gGbuDbHhyc.seg_365.wav", "doc_id": "gGbuDbHhyc.seg_365", "src_text": "But that's not the end of the story, because if we either way decide to access clean samples, then training on them directly will even achieve better performance.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "但是,这并不是故事的结尾,因为如果我们无论如何决定访问清洁样本,那么在它们上直接进行训练甚至会取得更好的性能。", "score": 93.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/XejEJmgUmE.seg_728.wav", "doc_id": "XejEJmgUmE.seg_728", "src_text": "Hi, everyone.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "大家好,", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_186.wav", "doc_id": "SLpqvupgvW.seg_186", "src_text": "Do you mean A or B?", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", 
"tgt_system": "long_KIT_primary", "tgt_text": "你是指A还是B?其中", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_819.wav", "doc_id": "WTTtiRKFZI.seg_819", "src_text": "However, when the governor is on the right, as here, \"laughed\" governs the coordination Ted and Ned, this effect disappears.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "然而,当一个政府在右边(如她在左边)管理一个网络(Ted和Net),就会出现这种效果。因此,", "score": 50.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/hgIDlKNiFM.seg_439.wav", "doc_id": "hgIDlKNiFM.seg_439", "src_text": "Since then, this model has been adapted to many other languages, like in French with CamemBERT, and also in domains like biomedical with PubMedBERT and BioBERT and on clinical with ClinicalBERT, but mostly in English.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "“我们将继续努力。从那时起,这个模型已经被适应到许多其他语言,如法语(camembert)和生物医学领域(pamBERT和BioBERT)以及临床领域(clinicalBERT)等,但大多数是英语。", "score": 80.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/uZBWfYjYnf.seg_112.wav", "doc_id": "uZBWfYjYnf.seg_112", "src_text": "So we want our curves to be as high as possible on this plot.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "所以我们希望我们的队伍在这个地带尽可能高。", "score": 45.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/aQpIWggfCo.seg_33.wav", "doc_id": "aQpIWggfCo.seg_33", "src_text": "Thus, we follow the idea of symbolic knowledge distillation, to distil constrained language planning datasets from large language models.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "因此,我们遵循符号知识蒸馏的想法,从大语言模型中蒸馏受限语言规划数据站。", "score": 82.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/aQpIWggfCo.seg_9.wav", 
"doc_id": "aQpIWggfCo.seg_9", "src_text": "A good planner should write scripts that are reasonable and faithful to constraints.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "一个好的规划者应该写出合理的脚本,并且遵守约束。", "score": 92.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/FLkGnzVRew.seg_649.wav", "doc_id": "FLkGnzVRew.seg_649", "src_text": "On collecting around 1,000 examples of discourse unit pairs, we ran training for an initial classifier trained only on 43 examples of dissonance.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "在收集一千个演讲单位的例子时,我们只在四十个例子中训练初级分类器,不会惊喜地发现", "score": 60.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/wLqFAuDnKa.seg_134.wav", "doc_id": "wLqFAuDnKa.seg_134", "src_text": "The majority of sentences 516 out of 1,000.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "大多数句子(", "score": 83.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/hgIDlKNiFM.seg_435.wav", "doc_id": "hgIDlKNiFM.seg_435", "src_text": "We also introduced a comparison of models with multiple pre-training settings and data sources.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们还介绍了一个与多个普鲁托设置和数据源相比的模型,", "score": 43.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/aQpIWggfCo.seg_36.wav", "doc_id": "aQpIWggfCo.seg_36", "src_text": "To ensure the quality of the validation and test set, we ask crowd-sourced workers to find and revise the incorrect samples.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": ",以确保验证和测试站点的质量,我们要求来自云端的工作者进一步修订错误的示例。", "score": 70.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SUkmfOTvGi.seg_499.wav", 
"doc_id": "SUkmfOTvGi.seg_499", "src_text": "Thank you so much.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "谢谢。", "score": 86.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_719.wav", "doc_id": "oaOHnMCwad.seg_719", "src_text": "First one is keep a record of all relevant design choices throughout the research process.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "首先,保持研究过程中所有相关设计选择的记录;", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WBLMIsdIrq.seg_408.wav", "doc_id": "WBLMIsdIrq.seg_408", "src_text": "And similarly, we find that certain languages also require context when we want to choose the appropriate verb form.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "同样,我们发现某些语言也需要上下文来选择适当的动词形式。然后我们来看一下词汇", "score": 85.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_802.wav", "doc_id": "WTTtiRKFZI.seg_802", "src_text": "When you swap these two constituents, the sum of these two dependencies becomes 6.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "ng)这两个构成部分时,这些依赖关系中的某些会变成6个", "score": 83.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_222.wav", "doc_id": "oYCKgTzTDy.seg_222", "src_text": "But Chinese is missing and lack of coverage on certain meaning representation.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": ",中文缺失。", "score": 32.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_208.wav", "doc_id": "SLpqvupgvW.seg_208", "src_text": "But this is not realistic.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": 
"acl", "tgt_system": "long_KIT_primary", "tgt_text": "但这不是现实的。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_198.wav", "doc_id": "SLpqvupgvW.seg_198", "src_text": "Here's for example, the Google search result for the song \"Easy on Me.\"", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "例如,Google搜索结果为EasyonMe。", "score": 93.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/wLqFAuDnKa.seg_123.wav", "doc_id": "wLqFAuDnKa.seg_123", "src_text": "This is joint work with my colleagues from Google Translate.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "这是与我来自谷歌翻译的同事合作的作品。", "score": 83.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WBLMIsdIrq.seg_384.wav", "doc_id": "WBLMIsdIrq.seg_384", "src_text": "A Data-driven, Multilingual Exploration\".", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "数据驱动的多语言探索》。", "score": 85.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WBLMIsdIrq.seg_411.wav", "doc_id": "WBLMIsdIrq.seg_411", "src_text": "And similarly, we find that context is important to translate in the right formality.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "同样,我们发现上下文有助于正确的形式转换。", "score": 54.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dvGkKzmIaN.seg_525.wav", "doc_id": "dvGkKzmIaN.seg_525", "src_text": "In watermark injection, we first define a target embedding.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "在水印注入中,我们首先定义了目标嵌入:", "score": 77.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_694.wav", "doc_id": "oaOHnMCwad.seg_694", "src_text": "The 
first step is to re annotate data sets with diverse annotators.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "第一步是用多个注释器重新注释数据集,", "score": 91.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oeooqChmKK.seg_605.wav", "doc_id": "oeooqChmKK.seg_605", "src_text": "The task here is to identify the correct entity that the pronoun \"he\" refers to, which in this case is Servin.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "任务是确定他所指代的正确实体,这里是塞文。一个给定", "score": 79.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WBLMIsdIrq.seg_388.wav", "doc_id": "WBLMIsdIrq.seg_388", "src_text": "Well, if the previous sentence was \"Things could start to get dangerous if the ministers find out\", then \"mole\" refers to a spy.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "如果前一句是“如果部长们知道的话,事情可能会变得危险的”,那么莫尔指的是间谍;", "score": 87.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/gGbuDbHhyc.seg_346.wav", "doc_id": "gGbuDbHhyc.seg_346", "src_text": "Instead, we label the data using weak labeling sources, such as simple heuristic rules, knowledge bases, or low-quality crowdsourcing, as illustrated in the figure on the right.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "相反,我们使用弱标签源(如简单的直觉规则,知识库或低质量云源)标记数据,如右边图所示。", "score": 85.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/uZBWfYjYnf.seg_102.wav", "doc_id": "uZBWfYjYnf.seg_102", "src_text": "Use only one model for every latency regime and handle latency through specific parameters.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "每个延迟模式只使用一个模型,并通过特定参数来处理延迟;", "score": 98.0} {"audio_path": 
"data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_234.wav", "doc_id": "oYCKgTzTDy.seg_234", "src_text": "We also test Monolingual Few-shot setting by training monolingual models with only 10% of training data.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们还通过只使用12%的训练数据来训练双语模型来测试双语虚拟场景。", "score": 66.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_227.wav", "doc_id": "oYCKgTzTDy.seg_227", "src_text": "It contains 9 datasets in various domains, 5 semantic parsing tasks, 8 meaning representations, and 22 natural languages in 15 language families.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "它包含90个各种领域的子集,5个语义解析任务,8个表示符号,22个自然语言在15个语言家庭中。", "score": 60.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/GvEBWkLmuI.seg_831.wav", "doc_id": "GvEBWkLmuI.seg_831", "src_text": "However, these measures have various limitations.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "然而,这些措施有各种限制:", "score": 93.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SUkmfOTvGi.seg_470.wav", "doc_id": "SUkmfOTvGi.seg_470", "src_text": "At the same time, if we do observe poor generalization, what causes the performance drop of these models?", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "同时,如果我们观察到泛化能力差,导致这些模型性能下降的原因是什么?", "score": 94.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/PIZEXUFLAR.seg_304.wav", "doc_id": "PIZEXUFLAR.seg_304", "src_text": "We design a new metric called sensitivity.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们设计了一个新的元学习敏感性。所以,另", "score": 41.0} {"audio_path": 
"data/iwslt25/IWSLT25INSTRUCT/segmented/SUkmfOTvGi.seg_478.wav", "doc_id": "SUkmfOTvGi.seg_478", "src_text": "The first one is the model architecture.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "第一个是模型架构。", "score": 98.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/hgIDlKNiFM.seg_461.wav", "doc_id": "hgIDlKNiFM.seg_461", "src_text": "All the pre-trained model obtained from NACHOS are freely available on Hugging Face, and under the MIT license, and all the training scripts are on our GitHub repository.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "从NATASHA获得的预训练模型可在GitHub上免费下载,所有的训练脚本都在我们的GitHub仓库中。", "score": 42.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SUkmfOTvGi.seg_472.wav", "doc_id": "SUkmfOTvGi.seg_472", "src_text": "This is a data set that we collected from Reuters News from 2020, and then annotated them with the same CoNLL-2003 annotation guidelines.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "这是我们从2002年收集的路透社新闻,然后用同样的Carnegie2003注释指南注释它们。", "score": 40.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_721.wav", "doc_id": "oaOHnMCwad.seg_721", "src_text": "Our third recommendation is to build specialised datasets and models within 4 specific communities.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们的第三个建议是,在四个特定的社区中建立专门的数据集和模型。一个好的例子就是", "score": 90.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/GvEBWkLmuI.seg_868.wav", "doc_id": "GvEBWkLmuI.seg_868", "src_text": "And finally, for black women, we see that some of the top words are things like \"strong\" and \"resilient\".", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", 
"tgt_system": "short_NLE_primary", "tgt_text": "最后,对于黑人女性,我们看到一些顶级词汇是强壮和坚韧的。", "score": 79.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/XejEJmgUmE.seg_741.wav", "doc_id": "XejEJmgUmE.seg_741", "src_text": "So that is the approach.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "这就是我们的方法。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/gGbuDbHhyc.seg_379.wav", "doc_id": "gGbuDbHhyc.seg_379", "src_text": "Finally, we have open-sourced our code.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "最后,我们有了开源代码,", "score": 70.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/TVCREhgqUP.seg_87.wav", "doc_id": "TVCREhgqUP.seg_87", "src_text": "We address this by inducing the alignment as part of the training.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们通过将对齐作为训练的一部分来解决这个问题。", "score": 82.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dJGfOSFgZO.seg_324.wav", "doc_id": "dJGfOSFgZO.seg_324", "src_text": "For comparison, we also evaluated these conversations using three existing methods: Likert ratings on the turn-level, Likert ratings on the dialogue-level, and dialogue-level pairwise comparisons.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "为了对比,我们还使用了三种现有的方法来评估这些对话。Lickert在转换级别上的评分,Lickert在对话级别上的评分,以及对话级别的配对比较。", "score": 74.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/GvEBWkLmuI.seg_841.wav", "doc_id": "GvEBWkLmuI.seg_841", "src_text": "And both of the women of color personas make references to ancestry while the white man persona has nothing of the sort.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", 
"tgt_text": "而且,彩色人物都提到了祖先,而白人人物却没有这种说法。", "score": 50.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/XejEJmgUmE.seg_751.wav", "doc_id": "XejEJmgUmE.seg_751", "src_text": "Finally, we can choose sentences from a completely unrelated domain such as Wikipedia.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "最终,我们可以从一个完全无关的域(例如维基百科)中选择句子。因此,", "score": 82.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/PIZEXUFLAR.seg_292.wav", "doc_id": "PIZEXUFLAR.seg_292", "src_text": "So this measures the model's ability to consistently produce the same outputs for the same task regardless of the slight variation in the wording of the instruction.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "用于衡量模型在执行相同任务时产生相同输出的能力,尽管指令的词汇有轻微的变化。", "score": 54.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_186.wav", "doc_id": "SLpqvupgvW.seg_186", "src_text": "Do you mean A or B?", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "你是指A还是B?", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/uZBWfYjYnf.seg_100.wav", "doc_id": "uZBWfYjYnf.seg_100", "src_text": "So what is our solution?", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "那么,我们的解决方案是什么?", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/aQpIWggfCo.seg_14.wav", "doc_id": "aQpIWggfCo.seg_14", "src_text": "This table reports the overall accuracy of the results.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "这张表报告了结果的总体准确性;", "score": 95.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/rISrKoXQCx.seg_551.wav", "doc_id": 
"rISrKoXQCx.seg_551", "src_text": "This has created a mixed blessing for language model applications.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "这为语言模型应用程序创造了混合增益。因此,", "score": 82.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/GvEBWkLmuI.seg_848.wav", "doc_id": "GvEBWkLmuI.seg_848", "src_text": "So the Marked Words method draws upon the sociolinguistic concept of \"markedness\", which states that there is an unmarked default, and any group that differs from that default is linguistically marked.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "所以标记文字方法依靠社会语言学的概念,即市场是未标记的缺陷,并且任何偏离该缺陷的群体都是语言上的标记。因此,", "score": 43.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oeooqChmKK.seg_590.wav", "doc_id": "oeooqChmKK.seg_590", "src_text": "This work is a collaboration between McGill University, Mila, and Microsoft Research.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "这个工作是梅格尔大学和微软研究机构之间的合作。", "score": 70.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/wLqFAuDnKa.seg_126.wav", "doc_id": "wLqFAuDnKa.seg_126", "src_text": "At the time of publication, it achieved state-of-the-art in hundreds of NLP tasks.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "发布时,它在数百个NLP任务中实现了最先进的性能。", "score": 89.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_237.wav", "doc_id": "oYCKgTzTDy.seg_237", "src_text": "And during inference we can use this model to translate German queries or Chinese queries, et cetera.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "在婴儿期也可以使用该模型。翻译德语询问或中国询问或等等。", "score": 82.0} {"audio_path": 
"data/iwslt25/IWSLT25INSTRUCT/segmented/XejEJmgUmE.seg_741.wav", "doc_id": "XejEJmgUmE.seg_741", "src_text": "So that is the approach.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "这就是我们的方法。", "score": 95.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/TVCREhgqUP.seg_54.wav", "doc_id": "TVCREhgqUP.seg_54", "src_text": "And \"Mary knew that the girl slept.\"", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "玛丽知道女孩睡着了。", "score": 91.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/rISrKoXQCx.seg_555.wav", "doc_id": "rISrKoXQCx.seg_555", "src_text": "Secondly, how do language models with different political leanings actually perform on downstream tasks and whether that might result in fairness issues in NLP applications?", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "其次,语言模型在不同政治倾向下实际上在下游任务中表现如何,并且是否存在政治偏见?这可能会导致在NLP应用中出现公平性问题,", "score": 90.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/rISrKoXQCx.seg_549.wav", "doc_id": "rISrKoXQCx.seg_549", "src_text": "Political news media are well covered in their pretraining data.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "政治新闻媒体在预训练数据中得到很好的覆盖。", "score": 87.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/TVCREhgqUP.seg_58.wav", "doc_id": "TVCREhgqUP.seg_58", "src_text": "Naive seq2seq models struggle with this kind of out-of-distribution generalization and often produce outputs that are detached from the input.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "行测试。内存到内存模型会与这种外部分发和泛化的模式产生冲突,并且经常会产生与输入分离的输出。", "score": 78.0} {"audio_path": 
"data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_803.wav", "doc_id": "WTTtiRKFZI.seg_803", "src_text": "So instead of 11, 6 is much shorter.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "),而不是11(错误)更", "score": 30.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/PIZEXUFLAR.seg_297.wav", "doc_id": "PIZEXUFLAR.seg_297", "src_text": "So we also did one experiment.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们也做了一个实验,", "score": 97.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WBLMIsdIrq.seg_396.wav", "doc_id": "WBLMIsdIrq.seg_396", "src_text": "And second, how well do models handle these cases?", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "其次,模型如何处理这些情况?", "score": 82.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dJGfOSFgZO.seg_309.wav", "doc_id": "dJGfOSFgZO.seg_309", "src_text": "And I'm Sarah Finch.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我是莎拉·芬奇;", "score": 81.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/GvEBWkLmuI.seg_877.wav", "doc_id": "GvEBWkLmuI.seg_877", "src_text": "We just really can't make any assumptions or really study that further, without more transparency.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "抗定型方法。我们只是真的不能做出任何假设,也不能在更大的透明度下进一步研究它。", "score": 50.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/wLqFAuDnKa.seg_131.wav", "doc_id": "wLqFAuDnKa.seg_131", "src_text": "We use state-of-the-art, neural MT metrics, and additionally also show expert-based human evaluation results.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", 
"tgt_system": "short_NLE_primary", "tgt_text": "我们使用最先进的神经MT矩阵,并且还展示了基于专家的人类评估结果;", "score": 90.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_707.wav", "doc_id": "oaOHnMCwad.seg_707", "src_text": "So now we're better equipped to answer who do NLP datasets and models align with the most.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "现在我们可以回答,谁在NLp数据中将模型与大多数模型对齐:", "score": 82.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SUkmfOTvGi.seg_475.wav", "doc_id": "SUkmfOTvGi.seg_475", "src_text": "And last but not least, we calculated the percentage change in F1 to assess the generalization of each model.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "最不重要的是,我们计算了F1得分的百分比变化,以评估每个模型的普遍性。", "score": 57.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SUkmfOTvGi.seg_495.wav", "doc_id": "SUkmfOTvGi.seg_495", "src_text": "So going back to the question that we raised in the title of our paper Do CoNLL-2003 taggers still work in 2023?", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "因此回到我们在论文标题中提出的问题:2003年,Coral2003标签仍然有效吗?", "score": 57.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/GvEBWkLmuI.seg_855.wav", "doc_id": "GvEBWkLmuI.seg_855", "src_text": "So first we use a lexicon of stereotypes, and we find that the generated personas contain a lot more stereotypes than the human-written ones.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "首先,我们使用了lexiconstereotypes,我们发现生成的人物包含了比人类人物更多的stereotypes。", "score": 53.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/aQpIWggfCo.seg_5.wav", "doc_id": "aQpIWggfCo.seg_5", "src_text": "However, previous work mainly focuses on planning for the 
abstract goals of stereotypical activities.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "然而,之前的研究主要集中在规划典型活动的抽象目标上,", "score": 93.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/aQpIWggfCo.seg_3.wav", "doc_id": "aQpIWggfCo.seg_3", "src_text": "Previous work has exploited language models to plan for abstract goals of stereotypical activities such as \"make a cake\".", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "以前的研究利用语言模型来规划抽象的典型活动(例如“踢门", "score": 60.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_244.wav", "doc_id": "oYCKgTzTDy.seg_244", "src_text": "We found that Encoder-Decoder obtains the best performance on all nine datasets.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们发现编码器-解码器在所有九个数据集上获得了最佳性能。", "score": 95.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/GvEBWkLmuI.seg_832.wav", "doc_id": "GvEBWkLmuI.seg_832", "src_text": "They usually rely on hand-constructed data sets that are very time-consuming to curate and they also usually only. 
measure very specific stereotypes, meaning that they don't generalize well to other demographics or contexts, or they simply capture very general broad associations, like negative associations with particular groups.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "它们通常依赖于手工构建的数据集,这些数据集非常耗时耗力。而且它们通常只测量非常具体的刻板印象,这意味着它们不能很好地推广到其他人口学或背景中,或者它们只是捕获非常广泛的粗糙关联,例如负面的关联与特定群体。", "score": 81.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_676.wav", "doc_id": "oaOHnMCwad.seg_676", "src_text": "This work was done in collaboration with some folks at the University of Washington and the Allen Institute for AI, namely Sebastian Santy, Ronan Le Bras, Katharina Reinecke and Maarten Sap.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "这项工作是在华盛顿大学与一些专家合作完成的,包括艾伦研究所的塞巴斯蒂安·桑蒂、罗纳德·拉布拉斯、卡特琳娜·阿雷尼卡和马丁·萨普。", "score": 77.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WBLMIsdIrq.seg_421.wav", "doc_id": "WBLMIsdIrq.seg_421", "src_text": "But then if we use COMET, context-aware models perform best.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "但如果我们使用COMET,感知上下文的模型表现最佳。", "score": 80.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_791.wav", "doc_id": "WTTtiRKFZI.seg_791", "src_text": "Because here between the verb and the direct object is an adjunct: \"yesterday\".", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": ",昨天红色的很多,昨天它在直接物体和间接物体之间存在一个断裂。", "score": 77.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_160.wav", "doc_id": "SLpqvupgvW.seg_160", "src_text": "I'm going to talk about our work on \"Resolving Indirect Referring Expressions for Entity Selection\", in which we introduce the 
AltEntities Corpus.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我要谈谈我们解决实体选择中的间接引用表达式的工作,其中我们引入了Altentitescorpus。", "score": 64.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_251.wav", "doc_id": "oYCKgTzTDy.seg_251", "src_text": "The orange line is Cross-lingual Zero-shot transfer.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "橙色线是跨语言零shot转移,而", "score": 82.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/XejEJmgUmE.seg_734.wav", "doc_id": "XejEJmgUmE.seg_734", "src_text": "Which can also include grammaticality like BLiMP, SyntaxGym, or acceptability in terms of stereotypes such as CrowS pairs.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "包括可接受性判断,甚至可以包括语法,如语音语法模或可接受性在术语中语音对。", "score": 22.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/TVCREhgqUP.seg_54.wav", "doc_id": "TVCREhgqUP.seg_54", "src_text": "And \"Mary knew that the girl slept.\"", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "玛丽醒着。", "score": 15.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/rISrKoXQCx.seg_565.wav", "doc_id": "rISrKoXQCx.seg_565", "src_text": "And we also try to investigate whether language models can pick up the polarisation that's prevalent in our modern society.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们也尝试调查语言模型是否可以捕捉到我们现代社会中普遍存在的极化现象。", "score": 92.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_181.wav", "doc_id": "SLpqvupgvW.seg_181", "src_text": "And in the third speech bubble, Bob uses an indirect reference to select one of these entities, for example, \"the newer one.\"", 
"src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "在第三个演讲中,鲍勃使用间接引用来选择其中一个实体,例如新版本", "score": 91.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/aQpIWggfCo.seg_24.wav", "doc_id": "aQpIWggfCo.seg_24", "src_text": "Next, a filter model is developed to select the faithful scripts.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "接下来,开发了一个过滤模型来选择可视化脚本。", "score": 60.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/gGbuDbHhyc.seg_346.wav", "doc_id": "gGbuDbHhyc.seg_346", "src_text": "Instead, we label the data using weak labeling sources, such as simple heuristic rules, knowledge bases, or low-quality crowdsourcing, as illustrated in the figure on the right.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "61.62.63.64.65.66.67.68.69.70.71.72.73.74.75.7", "score": 0.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/FLkGnzVRew.seg_635.wav", "doc_id": "FLkGnzVRew.seg_635", "src_text": "Simply put, cognitive dissonance is two beliefs or actions that are inconsistent, such as this example where a person states, \"I know that cigarettes could kill me\", and then goes on to say \"I grabbed a couple of smokes after the meeting\".", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "简单地说,认知偏差是两个相互矛盾的信念或行为。例如,这种例子:一个人说“我知道那烟草会杀我”,然后接着说“我在见面后抽了几根烟”,", "score": 79.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/aQpIWggfCo.seg_28.wav", "doc_id": "aQpIWggfCo.seg_28", "src_text": "With our method, InstructGPT can generate scripts of higher quality.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "各种头发颜色的图像。我们的方法显著提高了生成的可读性", "score": 25.0} {"audio_path": 
"data/iwslt25/IWSLT25INSTRUCT/segmented/FLkGnzVRew.seg_660.wav", "doc_id": "FLkGnzVRew.seg_660", "src_text": "Over the different strategies, we found that Cumulative performed equal or better than Iterative across the board.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "在不同的策略中,我们发现累积的表现与迭代的表现相等或更好。", "score": 88.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SUkmfOTvGi.seg_474.wav", "doc_id": "SUkmfOTvGi.seg_474", "src_text": "We evaluated them on both the CoNLL-03 test sets and the CoNLL++.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们在康诺尔3测试套和康诺尔+测试套上对它们进行了评估。", "score": 54.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/rISrKoXQCx.seg_572.wav", "doc_id": "rISrKoXQCx.seg_572", "src_text": "For example, for hate speech detection, left-leaning language models are better at detecting hate speech targeting socially minority groups, however are worse at detecting hate speech targeting more powerful groups in our society.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "例如,例如,左边语言模型在检测左边言论方面更好,目标是社会性少数群体的左边言论,但在检测右边言论方面更差。更强大的语言模型在我们的社会中更有权力,", "score": 60.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/aQpIWggfCo.seg_26.wav", "doc_id": "aQpIWggfCo.seg_26", "src_text": "In addition, we reward the script that contains the keywords of the target constraint.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "此外,我们避免脚本包含目标约束关键词的脚本;我们只保", "score": 55.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_709.wav", "doc_id": "oaOHnMCwad.seg_709", "src_text": "For example, we find that data sets and models are most aligned to English speaking countries.", "src_text_system": "human", "src_lang": "en", 
"tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "例如,我们发现数据集模型大多与英语国家相关,因此", "score": 71.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oeooqChmKK.seg_615.wav", "doc_id": "oeooqChmKK.seg_615", "src_text": "This last setting is especially interesting, since it simulates the case where the background knowledge necessary to solve a task is not part of the pretrain data of models.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "最后一个设置特别有趣。由于它模拟了需要解决任务的背景知识,因此它不是预训练数据模型的一部分。", "score": 94.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_161.wav", "doc_id": "SLpqvupgvW.seg_161", "src_text": "My name is Javad Hosseini and this is a joint work with Filip Radlinski, Silvia Pareti, and Annie Louis.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我的名字是JavadHosseini,我与PhilipRadlinski、SilviaParavati和AnnieLouis一起合作。", "score": 78.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/gGbuDbHhyc.seg_371.wav", "doc_id": "gGbuDbHhyc.seg_371", "src_text": "So in practice, there's no reason to choose more complex WSL methods which require more computation time and disk space.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "因此,在实践中,没有理由选择更复杂的WSL方法,这些方法需要更多的计算时间和磁盘空间。", "score": 81.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/rISrKoXQCx.seg_569.wav", "doc_id": "rISrKoXQCx.seg_569", "src_text": "So this indicates that language models can also pick up the polarisation in our society.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "这是在20世纪70年代之后的,所以这表明语言模型也可以捕捉到我们社会中类似极化的现象。因此,", "score": 64.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_241.wav", "doc_id": 
"oYCKgTzTDy.seg_241", "src_text": "And we also find many interesting results.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们还发现了许多有趣的结果。因此,", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WBLMIsdIrq.seg_428.wav", "doc_id": "WBLMIsdIrq.seg_428", "src_text": "To summarize, we perform a data-driven analysis across 14 language pairs to identify when translations require context and then we use our findings to build a benchmark for document-level machine translation which can help us identify which discourse phenomena models can handle well or not, and which translation systems are good at document-level translation.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "总结,我们在14对语言对上进行了数据驱动的分析,以确定翻译何时需要上下文,然后我们使用我们的发现来建立文档级机器翻译的基准,帮助我们确定哪些对话现象模型可以处理得很好或很差,以及哪些翻译系统在文档级翻译上表现良好。", "score": 83.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_204.wav", "doc_id": "SLpqvupgvW.seg_204", "src_text": "For example, \"the one without words\", \"not the one with the 12 year old boy\", or \"the fictional one\", or \"comes from Azerbaijan\", and so on.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "例如,带有无字的实体,不带有12岁男孩的实体或虚构的实体。Alvan或来自阿塞拜疆和其他地方。该", "score": 64.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/PIZEXUFLAR.seg_300.wav", "doc_id": "PIZEXUFLAR.seg_300", "src_text": "So this shows the effect of different fine-tuning strategies on the model sensitivity.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "这表明了不同调谐策略对模型敏感性的影响。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SUkmfOTvGi.seg_480.wav", "doc_id": "SUkmfOTvGi.seg_480", "src_text": "The second ingredient is the model 
size.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "第二个成分是模型大小。", "score": 98.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/TVCREhgqUP.seg_47.wav", "doc_id": "TVCREhgqUP.seg_47", "src_text": "Hi!", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "你好,", "score": 95.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/FLkGnzVRew.seg_637.wav", "doc_id": "FLkGnzVRew.seg_637", "src_text": "Further mentioning that \"I don't think I could keep my job without them\" justifies the second occurrence.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "此外,我想说我不认为我可以在没有他们的情况下拿到我的工作。", "score": 99.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/hgIDlKNiFM.seg_432.wav", "doc_id": "hgIDlKNiFM.seg_432", "src_text": "In this presentation, we first talk about language modeling in healthcare.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "在这次演讲中,我们首先会讨论语言建模在医疗保健中的应用,", "score": 80.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/PIZEXUFLAR.seg_307.wav", "doc_id": "PIZEXUFLAR.seg_307", "src_text": "Thank you.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "谢谢。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/XejEJmgUmE.seg_754.wav", "doc_id": "XejEJmgUmE.seg_754", "src_text": "So first, we look at the Wikipedia sentences, which are completely irrelevant to the current query pair, and there we find that the MPP judgments are mostly robust for arbitrary context length.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": 
"首先,我们来看看与当前的query对完全无关的维基百科句子,在那里我们发现MP-P的判决结果对于任意的上下文长度都是相当坚固的。", "score": 70.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WBLMIsdIrq.seg_390.wav", "doc_id": "WBLMIsdIrq.seg_390", "src_text": "So, depending on context, the meaning of the word changes, and therefore its translation changes as well.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "所以依赖于上下文,单词的意思会改变,因此其翻译也会改变。", "score": 83.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/FLkGnzVRew.seg_632.wav", "doc_id": "FLkGnzVRew.seg_632", "src_text": "Hello, my name is Vasudha and I'm a Computer Science PhD candidate at Stony Brook University.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我是瓦苏达,目前是斯托尼布鲁克大学计算机科学博士研究生。", "score": 62.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/uZBWfYjYnf.seg_104.wav", "doc_id": "uZBWfYjYnf.seg_104", "src_text": "That is the cross-attention mechanism, and you can see an example on the right.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "您可以看到一个示例在右侧。", "score": 95.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/FLkGnzVRew.seg_643.wav", "doc_id": "FLkGnzVRew.seg_643", "src_text": "Studying dissonance expressed in language can also be beneficial in understanding extremism and polarization of vulnerable groups.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "研究不同表达语言也可以有利于理解极端主义和可怜人群的极化。", "score": 83.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/gGbuDbHhyc.seg_350.wav", "doc_id": "gGbuDbHhyc.seg_350", "src_text": "In recent works in WSL, so WSL stands for Weakly Supervised Learning, a common claim is that people say that they only train models on the weakly labeled data and achieve high performance 
on clean test sets.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "在最近的WSSL(WeeklySuperwiseLearning)研究中,WSSL的缩写代表“每周超级监督学习”。一个常见的说法是人们说他们只在每周标记的数据上训练模型,并在清洁测试集上获得高性能。", "score": 80.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SUkmfOTvGi.seg_464.wav", "doc_id": "SUkmfOTvGi.seg_464", "src_text": "Today I'm going to present our paper Do CoNLL-2003 named entity taggers still work well in 2023?", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "今天我要介绍我们的论文:2003年命名实体识别器是否仍在2023年仍能有效工作。", "score": 85.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/GvEBWkLmuI.seg_873.wav", "doc_id": "GvEBWkLmuI.seg_873", "src_text": "So based on these patterns, we conclude with three recommendations for model owners.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "因此基于这些模式,我们可以对模型所有者提出三点建议。", "score": 92.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/XejEJmgUmE.seg_735.wav", "doc_id": "XejEJmgUmE.seg_735", "src_text": "And in this, minimal pair paradigm, the typical way to evaluate language models is that you show like an acceptable sentence or a grammatical sentence and then you show an acceptable sentence or an ungrammatical sentence.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "在这种最小的支付模式中,评估语言模型的典型方式是先显示可接受的句子或语法句子,然后显示不可接受的句子或非语法句子。", "score": 79.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dvGkKzmIaN.seg_527.wav", "doc_id": "dvGkKzmIaN.seg_527", "src_text": "The provided embedding is a weight summation of the target embedding and the original embedding.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "提供的嵌入是目标嵌入和原始嵌入的加权和。", "score": 
95.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/FLkGnzVRew.seg_642.wav", "doc_id": "FLkGnzVRew.seg_642", "src_text": "High cognitive dissonance is also related to anxiety disorders and can help understand people's mental health better.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "高认知距离也与焦虑障碍有关,可以帮助人们更好地理解人们的心理健康。", "score": 82.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/TVCREhgqUP.seg_83.wav", "doc_id": "TVCREhgqUP.seg_83", "src_text": "In our paper, we solve a couple of interesting technical challenges.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "在我们的论文中,我们解决了一些有趣的技术挑战。", "score": 85.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/hgIDlKNiFM.seg_444.wav", "doc_id": "hgIDlKNiFM.seg_444", "src_text": "Afterwards, we ask ourselves how much data do we need to train a specialized model on French data?", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "之后,我们问自己:我们需要多少数据来训练一个专门用于法国数据的模型?", "score": 90.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/gGbuDbHhyc.seg_342.wav", "doc_id": "gGbuDbHhyc.seg_342", "src_text": "In this video, I would like to present our recent work \"Weaker Than You Think: A Critical Look at Weakly Supervised Learning.\"", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "在这个视频中,我想介绍一下我们最近的作品:比你想象的更弱的视力学习。", "score": 59.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_252.wav", "doc_id": "oYCKgTzTDy.seg_252", "src_text": "While the green line is the Monolingual Setting.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "绿色线是单语言的设置。我们发现", "score": 57.0} {"audio_path": 
"data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_675.wav", "doc_id": "oaOHnMCwad.seg_675", "src_text": "I'm Jenny, a first year PhD student at Carnegie Mellon University and today I'll be presenting your work NLPositionality characterising design biases of datasets and Models.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我是卡内基梅隆大学的第一年博士生,我的名字是Jenny,我今天将会展示我的工作:ANNPositionally:用数据集和模型来描述设计。", "score": 62.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dvGkKzmIaN.seg_521.wav", "doc_id": "dvGkKzmIaN.seg_521", "src_text": "Watermark injection and copyright verification.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "水印注入和版权伪造检", "score": 50.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_813.wav", "doc_id": "WTTtiRKFZI.seg_813", "src_text": "But what's novel in this paper is that we observed that this tendency only occurs when the governor is on the left or absent.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "但这篇文章的新奇之处在于我们观察到这种趋势只有在总督左边或缺席时才会发生。", "score": 85.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/aQpIWggfCo.seg_43.wav", "doc_id": "aQpIWggfCo.seg_43", "src_text": "We use large language models to generate a high-quality script dataset, CoScript, for constrained language planning.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们使用大型语言模型来生成用于受限语言规划的高质量脚本数据集(CoScript)。", "score": 75.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_202.wav", "doc_id": "SLpqvupgvW.seg_202", "src_text": "For example, the one with the piano music.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": 
"例如,带着钢琴音乐的那", "score": 83.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/PIZEXUFLAR.seg_281.wav", "doc_id": "PIZEXUFLAR.seg_281", "src_text": "For testing, we reserve the entire common sense reasoning group for testing, and we select additional 5 tasks from VQ and Miscellaneous groups.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": ",保留了一个共通的测试任务组,并从VQI和微观组中选择了另外5个任务。", "score": 83.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WBLMIsdIrq.seg_420.wav", "doc_id": "WBLMIsdIrq.seg_420", "src_text": "First of all, when we use corpus-level metrics: so for BLEU, we find that context-agnostic models have the best performance.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "首先,当我们使用CorpusLevelMetrics(语料级矩阵)时,我们发现复杂的认知模型具有最佳性能。", "score": 83.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WBLMIsdIrq.seg_423.wav", "doc_id": "WBLMIsdIrq.seg_423", "src_text": "This again demonstrates that it is difficult to determine the best document-level translation system if we use corpus-level metrics alone.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "这再次证明,如果仅使用语料级指标,就很难确定最佳文档级翻译系统。", "score": 86.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WBLMIsdIrq.seg_422.wav", "doc_id": "WBLMIsdIrq.seg_422", "src_text": "And if we use word f-measure, then models with and without context have comparable performance.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "如果我们使用F度量,则具有和没有内容的模型都有相似的性能。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SUkmfOTvGi.seg_474.wav", "doc_id": "SUkmfOTvGi.seg_474", "src_text": "We evaluated them on both the CoNLL-03 test sets and the CoNLL++.", "src_text_system": "human", 
"src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们在CORNELL-3测试集和CORNELL+测试集上评估了它们。.最后,但不是", "score": 46.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/FLkGnzVRew.seg_666.wav", "doc_id": "FLkGnzVRew.seg_666", "src_text": "We also check the feasibility of each strategy for annotation quality and costs to annotators.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们还检查了每个注释策略对注释质量和对注释者成本的可行性,", "score": 80.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/wLqFAuDnKa.seg_137.wav", "doc_id": "wLqFAuDnKa.seg_137", "src_text": "So, it's important to select a good prompting strategy.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "的是要选择一个好的促进策略。在我们", "score": 60.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/XejEJmgUmE.seg_743.wav", "doc_id": "XejEJmgUmE.seg_743", "src_text": "So for example, here we have chosen like a typical pair of grammaticality from the BLiMP data set from the Adjunct Island case.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "例如,我们在这里选择了从AdjugantIsland案例中的Blimp数据集中选择典型的“对称性”。", "score": 93.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/wLqFAuDnKa.seg_150.wav", "doc_id": "wLqFAuDnKa.seg_150", "src_text": "But, PaLM comes pretty close to a commercial system.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "但帕姆非常接近商业系统", "score": 75.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_817.wav", "doc_id": "WTTtiRKFZI.seg_817", "src_text": "Here we have coordination of two verbs and there's no outsides, external governor.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": 
"long_KIT_primary", "tgt_text": "我们有两个词的协调,没有外部政府(right),因此", "score": 63.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_220.wav", "doc_id": "oYCKgTzTDy.seg_220", "src_text": "Existing cross-lingual semantic parsing models are separately proposed and evaluated on data set of limited tasks and applications.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "现有的跨语言语义分析模型分别在有限任务和应用程序的数据集上提出和评估,", "score": 88.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/uZBWfYjYnf.seg_121.wav", "doc_id": "uZBWfYjYnf.seg_121", "src_text": "Thanks for your attention.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "感谢您的关注。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WBLMIsdIrq.seg_385.wav", "doc_id": "WBLMIsdIrq.seg_385", "src_text": "This work was done in collaboration with Patrick Fernandes, Emmy Liu, André F. T. 
Martins, and Graham Neubig.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "这项工作是由PatrickFournagne、M.L.E.Andre、A.F.T.Martins和GrahamNeubig共同完成的。所以", "score": 93.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dJGfOSFgZO.seg_310.wav", "doc_id": "dJGfOSFgZO.seg_310", "src_text": "And today we'll tell you all about ABC-Eval, a new dimensional approach to evaluating conversational AI.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "天我们将告诉您关于ABCEval,一个新的维度方法来评估对话AI。", "score": 90.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_193.wav", "doc_id": "SLpqvupgvW.seg_193", "src_text": "And finally when they have similar info boxes or attributes on Wikipedia.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "最后,当他们在维基百科上有相似的信息框或属性时,", "score": 90.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_239.wav", "doc_id": "oYCKgTzTDy.seg_239", "src_text": "We train on one source language and transfer to another language.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们在一个源语言上训练,并将其转移到另一个语言上,所以", "score": 93.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dvGkKzmIaN.seg_537.wav", "doc_id": "dvGkKzmIaN.seg_537", "src_text": "We conduct experiments on four data sets AG News, MIND, SST2 and Enron Spam.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们在四个数据集上进行实验:Agnews、Mind、SST2和Ariespam;", "score": 56.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/gGbuDbHhyc.seg_353.wav", "doc_id": "gGbuDbHhyc.seg_353", "src_text": "But like an elephant in the room this necessity is often overlooked.", "src_text_system": 
"human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "像房间里的大象。这种必要性经常被忽视。", "score": 98.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_693.wav", "doc_id": "oaOHnMCwad.seg_693", "src_text": "Our framework works in two main steps.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们的框架分为两个主要步骤。", "score": 87.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/PIZEXUFLAR.seg_302.wav", "doc_id": "PIZEXUFLAR.seg_302", "src_text": "We also can see transfer learning from natural instruction datasets can help OFA to attain much better performance on the natural instruct dataset.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们还可以看到从自然指令数据集转移学习可以帮助OFA在自然指令数据集上获得更好的性能。", "score": 91.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/wLqFAuDnKa.seg_156.wav", "doc_id": "wLqFAuDnKa.seg_156", "src_text": "And that's it for this really short overview.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "这就是对这个真正的“截图”视", "score": 60.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SUkmfOTvGi.seg_488.wav", "doc_id": "SUkmfOTvGi.seg_488", "src_text": "This means that every unit of improvement that we made, on CoNLL-2003 translates to more than one unit improvement on CoNLL++ which means that there is no diminishing returns.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "这意味着我们在卡罗尔2003年上色中所做的每项改进都转化为卡罗尔+色上的一个以上改进,这意味着没有降低的收益。", "score": 78.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SUkmfOTvGi.seg_496.wav", "doc_id": "SUkmfOTvGi.seg_496", "src_text": "And we found that the answer is actually a resounding yes.", "src_text_system": "human", 
"src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们发现答案实际上是“是”。", "score": 80.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/hgIDlKNiFM.seg_437.wav", "doc_id": "hgIDlKNiFM.seg_437", "src_text": "And finally, we conclude about the experiments and give you more details about how to access those models.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "最后,我们将对实验进行总结,并给您更多关于如何访问这些模型的详细信息。", "score": 91.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/gGbuDbHhyc.seg_367.wav", "doc_id": "gGbuDbHhyc.seg_367", "src_text": "As we can see, if we have 10 samples per class, direct fine-tuning starts to beat WSL approaches.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "正如我们可以看到,如果我们有十个样本,DirectFineTuning开始击败WSL方法。", "score": 63.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oeooqChmKK.seg_631.wav", "doc_id": "oeooqChmKK.seg_631", "src_text": "Thanks for listening.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "感谢您的倾听。", "score": 98.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/TVCREhgqUP.seg_81.wav", "doc_id": "TVCREhgqUP.seg_81", "src_text": "Our model outperforms the others by a large margin on generalization to deeper recursion.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们的模型在泛化到更深的递归时远远超过了其他模型。", "score": 98.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/wLqFAuDnKa.seg_158.wav", "doc_id": "wLqFAuDnKa.seg_158", "src_text": "Thank you very much.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "非常感谢。", "score": 100.0} {"audio_path": 
"data/iwslt25/IWSLT25INSTRUCT/segmented/XejEJmgUmE.seg_769.wav", "doc_id": "XejEJmgUmE.seg_769", "src_text": "Please read our paper for more details of our experiments.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "请阅读我们的论文以了解更多有关我们的实验的细节。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dJGfOSFgZO.seg_320.wav", "doc_id": "dJGfOSFgZO.seg_320", "src_text": "We developed this method to comprehensively cover chat model behaviors that have been suggested to affect chat quality in recent literature.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们开发了这种方法来全面涵盖被建议影响聊天质量和最近文学的聊天模型行为。", "score": 93.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dvGkKzmIaN.seg_505.wav", "doc_id": "dvGkKzmIaN.seg_505", "src_text": "Currently, large language models such as GPT, LLAMA, PALM are exceptional in natural language understanding and generation.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "目前,像TPP、LAMA、PALM这样的大型语言模型在自然语言理解和生成方面是例外的。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_818.wav", "doc_id": "WTTtiRKFZI.seg_818", "src_text": "In such cases, the left conjunct prefers to be shorter; the most of the biggest difference between the two conjuncts.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "在这种情况下,左边的结合更喜欢被缩短,而不是两个国家之间的最大差异。", "score": 51.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/PIZEXUFLAR.seg_287.wav", "doc_id": "PIZEXUFLAR.seg_287", "src_text": "So during test for each task, we conduct a total of 5 experiments by evaluating the model using one of the five instructions.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", 
"tgt_system": "long_KIT_primary", "tgt_text": "在测试期间,对于每个任务,我们总共进行了五次实验,使用每个任务的五个指令模板之一评估模型。我们报告了平均值和最", "score": 65.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/gGbuDbHhyc.seg_381.wav", "doc_id": "gGbuDbHhyc.seg_381", "src_text": "Please feel free to check it out.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": ";请自由查看;", "score": 90.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/gGbuDbHhyc.seg_382.wav", "doc_id": "gGbuDbHhyc.seg_382", "src_text": "Thank you and enjoy the conference.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "谢谢你参加了会议。", "score": 66.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/TVCREhgqUP.seg_57.wav", "doc_id": "TVCREhgqUP.seg_57", "src_text": "In this example, the model has seen shallow recursion during training and is tested on an example with deeper recursion.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "在这个例子中,模型在训练过程中看到的是较浅的递归,而在测试时则是测试了一个例子,具有更深的递归。", "score": 78.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_262.wav", "doc_id": "oYCKgTzTDy.seg_262", "src_text": "Thanks for listening.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "感谢您的关注。", "score": 50.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oeooqChmKK.seg_595.wav", "doc_id": "oeooqChmKK.seg_595", "src_text": "Pretrained parameters can contain information about what presidents do and what a TV is but they cannot reliably know who this instance-specific entity \"John\" is, or who the new president is, because the president might have changed since pretraining.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", 
"tgt_text": "预训练参数可以包含关于总统做什么和TVA是做什么的信息,但它们不能可靠地知道这个特定实体John是谁或新总统是谁,因为总统可能在预训练期间已经改变了。", "score": 60.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/wLqFAuDnKa.seg_158.wav", "doc_id": "wLqFAuDnKa.seg_158", "src_text": "Thank you very much.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "非常感谢。", "score": 99.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SUkmfOTvGi.seg_486.wav", "doc_id": "SUkmfOTvGi.seg_486", "src_text": "The second hypothesis is temporal drift which is the performance degradation that is caused by the increasing temporal gap between the train and the test data.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "第二种假设是时间漂移,这是由于列车和测试数据之间的时间间隔增加而导致的性能降低。", "score": 60.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/TVCREhgqUP.seg_65.wav", "doc_id": "TVCREhgqUP.seg_65", "src_text": "Obtaining trees may also involve specialized grammar-induction procedures.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "获得树也可能涉及专门的语法诱导程序。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_164.wav", "doc_id": "SLpqvupgvW.seg_164", "src_text": "\"Did you mean 'Easy on Me' or 'I Gotta Feeling'?\"", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "你是指我很容易接受吗?还是我有感觉?", "score": 40.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SUkmfOTvGi.seg_498.wav", "doc_id": "SUkmfOTvGi.seg_498", "src_text": "And lastly, please make sure to check out our paper, our data set and if you have any questions, feel free to contact me.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": 
"最后,请务必检查我们的论文、我们的数据集,如果您有任何问题,请随时联系我", "score": 87.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/hgIDlKNiFM.seg_450.wav", "doc_id": "hgIDlKNiFM.seg_450", "src_text": "In total, we have seven models.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "总共有七个模型。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/gGbuDbHhyc.seg_378.wav", "doc_id": "gGbuDbHhyc.seg_378", "src_text": "Third, continuous fine-tuning is a simple yet strong baseline that should be considered in future work in WSL.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "第三,连续细调是一种简单而强大的基线,应该在WSL的未来工作中考虑。.", "score": 92.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/FLkGnzVRew.seg_664.wav", "doc_id": "FLkGnzVRew.seg_664", "src_text": "Note that the performance is significantly lower for random.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "管差异很小,注意到随机的性能显著低于。", "score": 50.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dJGfOSFgZO.seg_324.wav", "doc_id": "dJGfOSFgZO.seg_324", "src_text": "For comparison, we also evaluated these conversations using three existing methods: Likert ratings on the turn-level, Likert ratings on the dialogue-level, and dialogue-level pairwise comparisons.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "为了进行比较,我们还使用了三种现有的方法来评估这些对话:转场级的利卡特评分,对话级的利卡特评分和对话级的对称比较。", "score": 90.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dvGkKzmIaN.seg_513.wav", "doc_id": "dvGkKzmIaN.seg_513", "src_text": "Second, the watermark should not degrade the utility of the provided embeddings.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", 
"tgt_text": "其次,水印方法不应降低提供的嵌入设备的效用。", "score": 85.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SUkmfOTvGi.seg_480.wav", "doc_id": "SUkmfOTvGi.seg_480", "src_text": "The second ingredient is the model size.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "第二个成分是模型大小。", "score": 78.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dvGkKzmIaN.seg_528.wav", "doc_id": "dvGkKzmIaN.seg_528", "src_text": "The weight of the target embedding is proportional to the number of triggers in the sentence.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "目标嵌入的重量与句子中的触发器数量成正比;", "score": 93.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/gGbuDbHhyc.seg_366.wav", "doc_id": "gGbuDbHhyc.seg_366", "src_text": "The right figure shows the performance difference between fine-tuning approaches, which are directly applied on the clean data, and WSL approaches, which use the clean data for validation only.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "红色图表显示了精细调节方法与直接在清洁数据上应用的精细调节方法之间的性能差异,以及仅用于验证的清洁数据的WSL方法。", "score": 64.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/PIZEXUFLAR.seg_278.wav", "doc_id": "PIZEXUFLAR.seg_278", "src_text": "In which the input text, images, instructions and bounding boxes are represented in the same token space.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "其中输入文本、图像、指令和边界框都在同一个令牌空间中表示。", "score": 95.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oeooqChmKK.seg_596.wav", "doc_id": "oeooqChmKK.seg_596", "src_text": "Therefore, successful models for knowledge-intensive NLU tasks require the ability to integrate and use both pretrain-time and inference-time knowledge.", 
"src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "因此,成功的知识密集型NLU任务模型需要能够整合和使用预训练时间和推理时间知识。", "score": 98.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SUkmfOTvGi.seg_481.wav", "doc_id": "SUkmfOTvGi.seg_481", "src_text": "We found that usually larger models lead to better generalization.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们发现,通常来说,较大的模型会带来更好的泛化能力。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/aQpIWggfCo.seg_27.wav", "doc_id": "aQpIWggfCo.seg_27", "src_text": "We only keep the script if the target goal scores the highest in the goal set.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "留脚本,如果目标得分在目标得分中最高。我们的方法可以生成", "score": 41.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_709.wav", "doc_id": "oaOHnMCwad.seg_709", "src_text": "For example, we find that data sets and models are most aligned to English speaking countries.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "例如,我们发现数据集和模型最匹配的是英语国家。因此,在", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/TVCREhgqUP.seg_68.wav", "doc_id": "TVCREhgqUP.seg_68", "src_text": "Our approach predicts the output from the input in two steps.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们的方法预测输入输出在两个步骤中:", "score": 90.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oeooqChmKK.seg_620.wav", "doc_id": "oeooqChmKK.seg_620", "src_text": "In the Background-Inference setting, we provide the fictional occupation \"mirituer\" instead of politician because \"mirituer\" is unlikely to be contained in the pretrained parameters.", 
"src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "在火星背景下,我们提供了虚构职业“meritua”而不是政治家,因为meritua不太可能包含在预训练的天堂中。", "score": 30.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/PIZEXUFLAR.seg_263.wav", "doc_id": "PIZEXUFLAR.seg_263", "src_text": "Hello everyone, my name is Ying and my colleague Zhiyang and I will be presenting our research on MultiInstruct improving Multi-Modal Zero-Shot Learning via Instruction Tuning.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "大家好,我是Yin,我的同事Jiang和我将会介绍我们关于多指令的研究,通过指令调节来改进多模态自我学习。因此,", "score": 65.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/PIZEXUFLAR.seg_284.wav", "doc_id": "PIZEXUFLAR.seg_284", "src_text": "So we use pre-trained OFA large model as a base model.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们使用预训练的OFA大模型作为基本模型。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SUkmfOTvGi.seg_471.wav", "doc_id": "SUkmfOTvGi.seg_471", "src_text": "To investigate these problems, we developed the CoNLL++ Dataset.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "为了调查这些问题,我们开发了Carnegie+数据集:", "score": 82.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SUkmfOTvGi.seg_493.wav", "doc_id": "SUkmfOTvGi.seg_493", "src_text": "And these goes hand in hand, we can't just have one ingredient but throw out the others.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "这些目标是互相联系的,我们不能只有一种成分,而是要全部都有。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/PIZEXUFLAR.seg_301.wav", "doc_id": "PIZEXUFLAR.seg_301", "src_text": "As we can see by transfer learning from natural 
instruction datasets, the model can achieve much better sensitivity compared to the original OFA model.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "看到,通过从自然指令集学习模型,可以比原始OFA模型获得更好的敏感度。", "score": 80.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/FLkGnzVRew.seg_638.wav", "doc_id": "FLkGnzVRew.seg_638", "src_text": "And they have a consonance relationship.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "他们有第二种经历,并且他", "score": 41.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/rISrKoXQCx.seg_546.wav", "doc_id": "rISrKoXQCx.seg_546", "src_text": "Hi, I'm Shangbin, PhD student in the University of Washington.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我是华盛顿大学的博士生。", "score": 60.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/FLkGnzVRew.seg_654.wav", "doc_id": "FLkGnzVRew.seg_654", "src_text": "We transfer from two different tasks: topic independent dissonance stance classification, a task that determines if two debate statements from different people are in agreement or in disagreement, irrespective of topic, called debate here, and on binary classification of expansion and comparison classes of PDTB since these two are closely related to the conception of consonance and dissonance and we call them CE here.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们从两个不同的主题转换:主题独立的离散分类:该任务确定两个来自不同的人的声明是否相互同意或相互不同意,尽管主题相同。在这里进行辩论,并关于二音分类的扩展和比较类别的二音分类的二音分类的二音分类的二音分类的二音分类的二音分类的二音分类的二音分类的二音分类的二音分类的二音分类的二音分类的二音分类的二音分类的二音分类的二音分类的二音分类的二音分类的二音分类的二音分类的二音分类的二音分类的二音分类的二音分类的二音分类的二音分类的二音分类的二音分类的二音分类的二音分类的二音分类的二音分类的二音分类", "score": 10.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_191.wav", "doc_id": 
"SLpqvupgvW.seg_191", "src_text": "The second one is when the entities have similar titles, for example, two books with the name \"The Return\".", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "第二种是实体具有相似的标题,例如两个书籍的名称为“返回”;", "score": 88.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/FLkGnzVRew.seg_665.wav", "doc_id": "FLkGnzVRew.seg_665", "src_text": "On further rounds of AL with two best strategies, we improve dissonance classification AUC to 0.75, which is the best performance that we have on the task so far.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "在进一步的AL循环中,我们使用了两种最佳策略,改进了距离分类(AUC)到0.75,这是我们在任务上到目前为止的最佳表现。", "score": 53.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/GvEBWkLmuI.seg_833.wav", "doc_id": "GvEBWkLmuI.seg_833", "src_text": "Furthermore, most work in this space doesn't account for intersectionality, which is the notion that multi-faceted social identities can compound biases and be unique loci of harm.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "此外,多数在这个领域的工作都没有考虑到交叉性,这是指多面性社会身份可以组合成偏见,并且是独特的低劣的伤害。", "score": 85.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WBLMIsdIrq.seg_384.wav", "doc_id": "WBLMIsdIrq.seg_384", "src_text": "A Data-driven, Multilingual Exploration\".", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "数据驱动的多语言探索:", "score": 83.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/wLqFAuDnKa.seg_151.wav", "doc_id": "wLqFAuDnKa.seg_151", "src_text": "In our case, we chose to evaluate with Google Translate.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "case,wechosetoevaluatei", "score": 
0.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/GvEBWkLmuI.seg_875.wav", "doc_id": "GvEBWkLmuI.seg_875", "src_text": "We should also be using an intersectional lens to study biases and harms because there's a lot of things that might be overlooked if we don't do that.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们还应该使用间接镜来研究偏差和害处,因为如果我们不这样做,我们可能会错过很多东西。", "score": 60.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/FLkGnzVRew.seg_654.wav", "doc_id": "FLkGnzVRew.seg_654", "src_text": "We transfer from two different tasks: topic independent dissonance stance classification, a task that determines if two debate statements from different people are in agreement or in disagreement, irrespective of topic, called debate here, and on binary classification of expansion and comparison classes of PDTB since these two are closely related to the conception of consonance and dissonance and we call them CE here.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们从两个不同的任务转移:主题独立的不一致性判别:一个任务,决定两个来自不同人的辩论陈述是否无论主题是否一致或不一致(称为辩论这里),以及二元分类扩展和比较类别的PDTB,因为这两个都与和谐和不和谐的概念密切相关,我们称之为CEE这里在转移时,", "score": 70.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dJGfOSFgZO.seg_322.wav", "doc_id": "dJGfOSFgZO.seg_322", "src_text": "For example, ABC-Eval measures the number of turns in which a chat model ignores its partner or says something irrelevant, contradicts itself or its partner, hallucinates incorrect facts or violates common sense knowledge, and when the model succeeds or fails to show empathy.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "例如,ABC-EVAL评估了聊天机器人忽视其对话伙伴或说出一些与其对话无关的内容的次数。它与自身或其伙伴相互矛盾,幻想出错误的事实或违背常识知识,并且当模型表现出同情心或无法表现出同情心时。", "score": 68.0} {"audio_path": 
"data/iwslt25/IWSLT25INSTRUCT/segmented/dvGkKzmIaN.seg_514.wav", "doc_id": "dvGkKzmIaN.seg_514", "src_text": "Third, the watermark should be covert enough to the attacker or the attacker can remove the watermark easily.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "第三,水印应该被覆盖到攻击者或攻击者可以轻松移除水印;", "score": 58.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SUkmfOTvGi.seg_465.wav", "doc_id": "SUkmfOTvGi.seg_465", "src_text": "Let's get started.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "让我们开始吧。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/aQpIWggfCo.seg_34.wav", "doc_id": "aQpIWggfCo.seg_34", "src_text": "We appy our method for building a dataset of constrained language planning, named as CoScript.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们将我们的方法应用于构建一个受限语言规划数据集,称为CodeScript。", "score": 81.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/XejEJmgUmE.seg_759.wav", "doc_id": "XejEJmgUmE.seg_759", "src_text": "And there we see that the MPP judgments either increase or decrease significantly when you add either acceptable prefixes or unacceptable prefixes.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "从同一个眩晕或语法眩晕数据集。在那里,我们看到MP3判决会显著增加或减少,", "score": 75.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_706.wav", "doc_id": "oaOHnMCwad.seg_706", "src_text": "Our study in the end amassed over 16,000 annotations from over 1000 annotators from 87 countries.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们研究了16,000个来自88个国家的注释者。", "score": 60.0} {"audio_path": 
"data/iwslt25/IWSLT25INSTRUCT/segmented/SUkmfOTvGi.seg_465.wav", "doc_id": "SUkmfOTvGi.seg_465", "src_text": "Let's get started.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "让我们开始吧。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dvGkKzmIaN.seg_519.wav", "doc_id": "dvGkKzmIaN.seg_519", "src_text": "Then let me introduce the details of our embedding marker.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "后门基于水印方法。然后让我介绍一下我们的嵌入标记器:", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/gGbuDbHhyc.seg_341.wav", "doc_id": "gGbuDbHhyc.seg_341", "src_text": "Hello, I am Dawei, a PhD student at Saarland University in Germany.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我是德国萨尔兰特大学的博士生达维。", "score": 60.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oeooqChmKK.seg_624.wav", "doc_id": "oeooqChmKK.seg_624", "src_text": "When trained on KITMUS, however, both C2F and BERT4Coref perform significantly better than the random choice.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "然而,他们在“杀虫剂”上接受了训练。“海洋之战”和“神奇宝贝”都比“龙与地下城”表现得更好。", "score": 25.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/gGbuDbHhyc.seg_364.wav", "doc_id": "gGbuDbHhyc.seg_364", "src_text": "Typically we only need 20 samples per class to attain high performance.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "通常,我们每个班级只需要二十个样本才能达到高性能。", "score": 86.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/FLkGnzVRew.seg_660.wav", "doc_id": "FLkGnzVRew.seg_660", "src_text": "Over the different strategies, we found that Cumulative performed 
equal or better than Iterative across the board.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "在不同的策略下,我们发现累积性表演比迭代性表演在板上更好或更平等。", "score": 49.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_723.wav", "doc_id": "oaOHnMCwad.seg_723", "src_text": "I mean, we want to emphasise that inclusive NLP isn't just making.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们想强调,一个包容性的NLP不仅仅是让", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dvGkKzmIaN.seg_533.wav", "doc_id": "dvGkKzmIaN.seg_533", "src_text": "Then the provider requests the embeddings from the stealer's service with the data set.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "然后,提供者要求从数据集上嵌入来自类似服务的数据。", "score": 92.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_723.wav", "doc_id": "oaOHnMCwad.seg_723", "src_text": "I mean, we want to emphasise that inclusive NLP isn't just making.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们想强调一下,包括NLPI只是", "score": 60.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/GvEBWkLmuI.seg_828.wav", "doc_id": "GvEBWkLmuI.seg_828", "src_text": "Hi, I'm Myra and today I'll be talking about our paper \"Marked Personas: Using Natural Language Prompts to Measure Stereotypes in Language Models.\"", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我是玛丽亚,我今天要谈谈我们的论文。使用自然语言促进语音和语言模型的研究", "score": 50.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_258.wav", "doc_id": "oYCKgTzTDy.seg_258", "src_text": "We conduct a comprehensive benchmark study on three representative types of 
multilingual language models.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们对三种多语言模型的代表进行了全面的基准研究,", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/gGbuDbHhyc.seg_343.wav", "doc_id": "gGbuDbHhyc.seg_343", "src_text": "This is joint work with Xiaoyu Shen, Marius Mosbach, Andreas Stephan, and Dietrich Klakow.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "这是与沙尔·尤谢尔、马里奥斯·穆斯巴、安德烈亚斯·斯蒂芬和迪特里斯·克拉科合作的作品。", "score": 80.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_801.wav", "doc_id": "WTTtiRKFZI.seg_801", "src_text": "So here we have a dependency from \"read\" to the adjunct of length 7 measured in words and from \"read\" to \"book\" of length 4, so together it's 11.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "所以我们从红色到七种颜色(以文字表示)和从红色到四种颜色(以文字表示)之间有依赖关系,所以我们可以把它们结合在一起。当您移动并", "score": 60.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dJGfOSFgZO.seg_336.wav", "doc_id": "dJGfOSFgZO.seg_336", "src_text": "With the rapid pace of improvement in the field, many of these error rates could see a decrease in new models released since our evaluation was conducted.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "由于该领域的改进速度很快,许多这些错误率在我们评估之后发布的新模型中都可能会降低。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_814.wav", "doc_id": "WTTtiRKFZI.seg_814", "src_text": "Right?", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "因此,", "score": 0.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/TVCREhgqUP.seg_88.wav", "doc_id": "TVCREhgqUP.seg_88", "src_text": "Our permutation method 
is very flexible, but it brings the challenge that finding the highest-scoring permutation is NP-hard.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们的置换方法非常灵活,但它带来了挑战,即找到得分最高的置换是很难的。", "score": 79.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/GvEBWkLmuI.seg_860.wav", "doc_id": "GvEBWkLmuI.seg_860", "src_text": "So instead to do that, we'll turn to the results from our Marked Words method to show how these positive-seeming words facilitate stereotypes and essentializing narratives.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "所以我们将转到我们的标记词方法的结果,展示这些看似积极的词汇如何促进刻板印象和本质化叙述。", "score": 67.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WBLMIsdIrq.seg_416.wav", "doc_id": "WBLMIsdIrq.seg_416", "src_text": "And we called our tagger the Multilingual Discourse-Aware, or MuDA tagger.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们的标志为多语言对话或穆达达标志。", "score": 80.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/wLqFAuDnKa.seg_129.wav", "doc_id": "wLqFAuDnKa.seg_129", "src_text": "This involves using the latest test sets to avoid an overlap of the test data with the training data of the language model.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "这涉及使用最新的测试来避免使用语言模型训练数据的测试数据。", "score": 89.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/XejEJmgUmE.seg_728.wav", "doc_id": "XejEJmgUmE.seg_728", "src_text": "Hi, everyone.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "问:您好,", "score": 56.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/PIZEXUFLAR.seg_307.wav", "doc_id": "PIZEXUFLAR.seg_307", "src_text": "Thank 
you.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "谢谢。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/FLkGnzVRew.seg_670.wav", "doc_id": "FLkGnzVRew.seg_670", "src_text": "We also find that iterative update is useful for transfer learning from a different domain, whereas in domain active annotations benefit from cumulative update.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们还发现迭代更新对于从不同域学习时有用,因为域内的活跃注释可以从累积更新中获益。", "score": 92.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SUkmfOTvGi.seg_469.wav", "doc_id": "SUkmfOTvGi.seg_469", "src_text": "And when we develop new taggers, what is needed for good generalization?", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "当我们开发新的标签时,什么是好的泛化所需要的?", "score": 76.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/gGbuDbHhyc.seg_349.wav", "doc_id": "gGbuDbHhyc.seg_349", "src_text": "In weakly supervised learning, training algorithms are proposed to robustly train neural networks under such label noise so that the trained models still generalize well.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "与人类标注相比,弱标注的标注在这种情况下,训练算法旨在在这种标签噪音下强大地训练神经网络,以便训练模型能够在测试时抵抗噪音。这仍然是通用的。", "score": 44.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/aQpIWggfCo.seg_25.wav", "doc_id": "aQpIWggfCo.seg_25", "src_text": "We convert scripts and goals into InstructGPT embeddings and calculate the cosine similarity as similarity scores to measure semantic similarity.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们将脚本和目标转换为嵌入式表示,并计算余弦相似性和相似性得分来衡量语义相似性。", "score": 80.0} {"audio_path": 
"data/iwslt25/IWSLT25INSTRUCT/segmented/PIZEXUFLAR.seg_294.wav", "doc_id": "PIZEXUFLAR.seg_294", "src_text": "As we can see, instruction tuning can significantly improve OFA's performance on seen multi-modal tasks.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "如我们所见,指令调优可以显著提高同一多模态模型的OSF性能。任务;也就是说", "score": 88.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_184.wav", "doc_id": "SLpqvupgvW.seg_184", "src_text": "The second one, which is the alternative question is generated as follows.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "第二个语音泡,即替代问题,生成如下所示。", "score": 85.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oeooqChmKK.seg_630.wav", "doc_id": "oeooqChmKK.seg_630", "src_text": "If you're interested in more details, please see our paper and check out the data set and code on GitHub.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "如果您对更多细节感兴趣,请参阅我们的论文并检查数据集和代码在GitHub上。", "score": 82.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_241.wav", "doc_id": "oYCKgTzTDy.seg_241", "src_text": "And we also find many interesting results.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们还发现了许多有趣的结果。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/rISrKoXQCx.seg_582.wav", "doc_id": "rISrKoXQCx.seg_582", "src_text": "So if we do not sanitize political opinions in language model training data, the bias would propagate from pretraining data to language models to downstream tasks, ultimately creating fairness issues.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": 
"因此,如果我们不对语言模型培训数据进行标准化,偏见就会从预训练数据传播到语言模型,最后导致公平性问题。", "score": 88.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_161.wav", "doc_id": "SLpqvupgvW.seg_161", "src_text": "My name is Javad Hosseini and this is a joint work with Filip Radlinski, Silvia Pareti, and Annie Louis.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我的名字是JawadHussain,和菲利普·拉德尔斯基、西尔维亚·帕雷蒂和安妮·图伊斯一起合作。", "score": 75.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dJGfOSFgZO.seg_323.wav", "doc_id": "dJGfOSFgZO.seg_323", "src_text": "To determine what kind of evaluation is most effective, we selected four state-of-the-art chat models and evaluated them on 100 human-bot conversations per model using ABC-Eval.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "为了确定哪种评估方式最有效,我们选择了四种顶级聊天机型,并在每种模型上评估了100个人类机器人对话。", "score": 95.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/FLkGnzVRew.seg_650.wav", "doc_id": "FLkGnzVRew.seg_650", "src_text": "To no surprise, the classifier performed not much better than chance.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "分类器的性能不如机会。", "score": 43.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SUkmfOTvGi.seg_473.wav", "doc_id": "SUkmfOTvGi.seg_473", "src_text": "We then fine-tuned over 20 models on CoNLL-2003.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "然后我们对康诺尔2003年上市的20多款车型进行了精细调节;", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/wLqFAuDnKa.seg_136.wav", "doc_id": "wLqFAuDnKa.seg_136", "src_text": "And this can go, in extreme cases, up to 40 BLEURT points.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": 
"long_KIT_primary", "tgt_text": "在极端情况下,这可以达到40%的准确率,所以", "score": 57.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dvGkKzmIaN.seg_524.wav", "doc_id": "dvGkKzmIaN.seg_524", "src_text": "We assume the provider can collect a general text corpus and count the word frequency with it.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们假设提供者可以收集一个通用文本语料库,并且可以统计词频。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_175.wav", "doc_id": "SLpqvupgvW.seg_175", "src_text": "Our data set collection methodology emphasizes informality using a cartoon completion setup.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们的数据集收集方法使用卡通完成集强调非正式性动", "score": 54.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_806.wav", "doc_id": "WTTtiRKFZI.seg_806", "src_text": "It violates one principle, but it satisfies another one.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "它违反了一个原则,但它满足了另一个原则。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SUkmfOTvGi.seg_466.wav", "doc_id": "SUkmfOTvGi.seg_466", "src_text": "Our paper investigated the problem of generalization using the Named Entity Recognition Task or the NER task.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们的论文使用命名实体识别任务或NER任务来研究泛化问题。", "score": 94.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/PIZEXUFLAR.seg_267.wav", "doc_id": "PIZEXUFLAR.seg_267", "src_text": "Therefore, in this work we want to investigate whether instruction tuning a multi-modal pre-trained models can actually improve generalisation to unseen multi-modal tasks.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": 
"acl", "tgt_system": "short_NLE_primary", "tgt_text": "因此,在这项工作中,我们想探讨多模态受训模型的指令调谐是否能实际提高未见多模态任务的泛化能力。", "score": 78.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/PIZEXUFLAR.seg_296.wav", "doc_id": "PIZEXUFLAR.seg_296", "src_text": "Here we can see, as the amount of task increases, the model achieves better performance and in the meantime, lower sensitivity.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "在这里,我们可以看到随着任务数量的增加,模型的性能会得到提高,而在此同时,敏感度会降低。所以", "score": 93.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_702.wav", "doc_id": "oaOHnMCwad.seg_702", "src_text": "Afterwards to stay engaged in the study, they can compare their responses to an AI and others.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "之后,如果要继续学习,他们可以将自己的反应与AI和其他的比较。", "score": 45.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oeooqChmKK.seg_607.wav", "doc_id": "oeooqChmKK.seg_607", "src_text": "First, entity-specific knowledge such as \"Servin is a judge.\"", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "首先,实体特有的知识,如仆人是法官;", "score": 84.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dJGfOSFgZO.seg_332.wav", "doc_id": "dJGfOSFgZO.seg_332", "src_text": "These reliable, informative, and distinct ABC-Eval metrics enable us to evaluate conversational AI with a higher resolution than previous methods are able to achieve.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "息化和独特的ABC-EVL度量法使我们能够评估与高分辨率相比的对话性AI,能够实现以前无法实现的方法。", "score": 41.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SUkmfOTvGi.seg_487.wav", "doc_id": "SUkmfOTvGi.seg_487", "src_text": "For data overfitting, we saw that from the graph on the 
right, the red best fit line has a gradient that is greater than one.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "为了适应性过度,我们从右边的图表上看到了红色最佳适应线的倾斜度大于1。", "score": 75.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dvGkKzmIaN.seg_540.wav", "doc_id": "dvGkKzmIaN.seg_540", "src_text": "We also validate the covertness of the provided embedding by visualising the embedding of sentences on four dataset [INAUDIBLE 4:39] PCA.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们还通过对20z.v.p.c.a上的句子进行虚拟化来验证提供的嵌入的隐蔽性。", "score": 54.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SUkmfOTvGi.seg_475.wav", "doc_id": "SUkmfOTvGi.seg_475", "src_text": "And last but not least, we calculated the percentage change in F1 to assess the generalization of each model.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "最后,但不是最后,我们计算了F1的百分比变化,以评估每个模型的泛化。", "score": 70.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SUkmfOTvGi.seg_497.wav", "doc_id": "SUkmfOTvGi.seg_497", "src_text": "We hope our paper calls for more research on how to improve generalizations of the models.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们希望我们的论文呼吁更多的研究关于如何改善模型的泛化。", "score": 93.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/gGbuDbHhyc.seg_364.wav", "doc_id": "gGbuDbHhyc.seg_364", "src_text": "Typically we only need 20 samples per class to attain high performance.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "通常,我们只需要每个类别20个样本才能实现更好的性能。“”", "score": 92.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/hgIDlKNiFM.seg_436.wav", "doc_id": "hgIDlKNiFM.seg_436", 
"src_text": "Then, we present our results on 11 biomedical and clinical downstream tasks in French.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "然后,我们在11个生物医学和临床下游任务(在法语)", "score": 90.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_827.wav", "doc_id": "WTTtiRKFZI.seg_827", "src_text": "Thank you.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "讨论。", "score": 0.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/GvEBWkLmuI.seg_840.wav", "doc_id": "GvEBWkLmuI.seg_840", "src_text": "The Asian woman is depicted as unassuming; the Middle-Eastern woman is referred to using words like exotic and like, referring to a mesmerizing region.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "亚洲女人被描述为傻瓜;中东女人被称为异国风情,指的是迷人的地区。", "score": 64.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_229.wav", "doc_id": "oYCKgTzTDy.seg_229", "src_text": "The first one is Translate-Test.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "第一个是翻译测试,", "score": 92.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/rISrKoXQCx.seg_575.wav", "doc_id": "rISrKoXQCx.seg_575", "src_text": "We further show many qualitative examples to see that language models with different political leanings do give different predictions to hate speech and misinformation examples based on their social categories.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们将展示许多定性例子,以便看到带有不同政治含义的语言模型。如果您给出不同的预测,以便在社会类别上提供不同的语音和误解信息示例,那么在", "score": 89.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oeooqChmKK.seg_625.wav", "doc_id": "oeooqChmKK.seg_625", 
"src_text": "This suggests that when trained on generic reference resolution data sets, most learn to exploit surface cues, which are not useful when testing on KITMUS where such queues have been removed.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "这表明,当在通用差分数据集上训练时,模拟器们学会了利用表面引导信号,这些信号在测试于基底膜上时没有用处。", "score": 61.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_800.wav", "doc_id": "WTTtiRKFZI.seg_800", "src_text": "So these two trees only show the length of the crucial dependencies, the ones that are not constant among these two structures.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "这两个树只显示了依赖关系的长度。在这些关键依赖关系中,依赖关系不在", "score": 53.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_247.wav", "doc_id": "oYCKgTzTDy.seg_247", "src_text": "We found it is because most of the major natural languages can obtain performance gain, except that English performance drops in seven datasets and only gains in three datasets.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们发现这是因为大多数主要自然语言都可以获得性能增益,除了英语性能下降外。在七个数据集上,仅在三个数据集上获得了最佳结果。", "score": 84.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dJGfOSFgZO.seg_319.wav", "doc_id": "dJGfOSFgZO.seg_319", "src_text": "We call this approach annotating behaviors in chat or ABC-Eval in short.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们称这种方法为注释行为在聊天中(或简称为ABEVAL),", "score": 81.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/PIZEXUFLAR.seg_307.wav", "doc_id": "PIZEXUFLAR.seg_307", "src_text": "Thank you.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "谢谢。", 
"score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/gGbuDbHhyc.seg_375.wav", "doc_id": "gGbuDbHhyc.seg_375", "src_text": "First, report the model selection criteria.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "首先,报告模型选择标准,", "score": 98.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WBLMIsdIrq.seg_387.wav", "doc_id": "WBLMIsdIrq.seg_387", "src_text": "For example, how would we translate \"mole\" in this sentence?", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "例如,如果前一句话是“如果前一句话是‘如果前一句话是‘如果前一句话是‘", "score": 0.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/uZBWfYjYnf.seg_116.wav", "doc_id": "uZBWfYjYnf.seg_116", "src_text": "These are all the results of the simultaneous speech translation strategy on German.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "这些都是德国同时口头翻译策略的结果。", "score": 82.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/aQpIWggfCo.seg_10.wav", "doc_id": "aQpIWggfCo.seg_10", "src_text": "In this paper, we first evaluate and improve the constrained language planning ability of large language models.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "在本文中,我们首先评估并改进了大型语言模型的受限语言规划能力。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/hgIDlKNiFM.seg_443.wav", "doc_id": "hgIDlKNiFM.seg_443", "src_text": "To answer this question, we compare DrBERT with our ChuBERT model, which is based on anonymized data obtained from the Nantes University Hospital data warehouse.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "为了回答这个问题,我们将伯特模型与我们的舒伯特模型进行比较,后者基于来自非大学医院数据仓库的匿名数据。", "score": 64.0} 
{"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/rISrKoXQCx.seg_552.wav", "doc_id": "rISrKoXQCx.seg_552", "src_text": "So on one hand, they were able to learn from diverse perspectives, which celebrates democracy and the plurality of ideas.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "一方面能够从多种角度学习,庆祝民主和多元化的思想;", "score": 98.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_682.wav", "doc_id": "oaOHnMCwad.seg_682", "src_text": "This is an example of a design bias where we see systematic performance differences of technology between populations.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "这就是一个设计偏见的例子,我们看到技术在不同人群之间的系统性性能差异。", "score": 92.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_718.wav", "doc_id": "oaOHnMCwad.seg_718", "src_text": "So we have a few recommendations for this.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "因此,我们对此有几些建议:", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/rISrKoXQCx.seg_557.wav", "doc_id": "rISrKoXQCx.seg_557", "src_text": "This ensures us to do automatic evaluation well grounded in political science literature.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "这确保我们可以在政治科学文献中进行自动评估。所以", "score": 78.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WBLMIsdIrq.seg_408.wav", "doc_id": "WBLMIsdIrq.seg_408", "src_text": "And similarly, we
find that certain languages also require context when we want to choose the appropriate verb form.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "类似地,我们发现某些语言在选择适当的词汇形式时也需要语境。", "score": 90.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/FLkGnzVRew.seg_666.wav", "doc_id": "FLkGnzVRew.seg_666", "src_text": "We also check the feasibility of each strategy for annotation quality and costs to annotators.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们还检查了每种策略的可行性,包括注释质量和成本对注释者:", "score": 90.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_686.wav", "doc_id": "oaOHnMCwad.seg_686", "src_text": "And as a researcher, positionality can influence the research process and its outcomes and results because it can change the decisions that researchers make.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "作为研究者,位置性可以影响研究过程及其结果,因为它可以改变研究者做出的决定。", "score": 85.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/PIZEXUFLAR.seg_307.wav", "doc_id": "PIZEXUFLAR.seg_307", "src_text": "Thank you.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "谢谢。", "score": 95.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_693.wav", "doc_id": "oaOHnMCwad.seg_693", "src_text": "Our framework works in two main steps.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们的框架在两个主要步骤中工作:", "score": 88.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/TVCREhgqUP.seg_70.wav", "doc_id": "TVCREhgqUP.seg_70", "src_text": "After the first step, we have all the right tokens, but they're not ordered.", "src_text_system": "human", 
"src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "第一步之后,我们有所有的正确令牌,但它们没有被订购", "score": 56.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/PIZEXUFLAR.seg_281.wav", "doc_id": "PIZEXUFLAR.seg_281", "src_text": "For testing, we reserve the entire common sense reasoning group for testing, and we select additional 5 tasks from VQ and Miscellaneous groups.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "采样。我们保留了整个常识推理组用于测试,并且我们从Wiki和MiscellaneousGroup中选择了另外五个任务。", "score": 78.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_786.wav", "doc_id": "WTTtiRKFZI.seg_786", "src_text": "OK.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "好吧,", "score": 93.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_713.wav", "doc_id": "oaOHnMCwad.seg_713", "src_text": "So for GPT 4, in the social acceptability task, we find that it's most aligned to people with a college education or Graduate School education and we find the same for Dynahate where it's most aligned to people with a college education.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "因此,在社会可接受性任务中,我们发现GPT-4与大学教育或研究生教育最相关,而我们发现同样的事情发生在DALL-E中,它与大学教育最相关。", "score": 64.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/PIZEXUFLAR.seg_268.wav", "doc_id": "PIZEXUFLAR.seg_268", "src_text": "Additionally, at the time of our research, we discovered a considerable discrepancy in the availability of instructional datasets between NLP and multi-modal.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "此外,在我们研究的过程中,我们发现了一个相当大的差异在LAP和多模之间的指令数据集的可用性。目前", "score": 65.0} {"audio_path": 
"data/iwslt25/IWSLT25INSTRUCT/segmented/aQpIWggfCo.seg_35.wav", "doc_id": "aQpIWggfCo.seg_35", "src_text": "In total, we generate 55,000 specific goals with scripts.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "总共,我们生成了55,000个与脚本相关的特定目标,", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oeooqChmKK.seg_624.wav", "doc_id": "oeooqChmKK.seg_624", "src_text": "When trained on KITMUS, however, both C2F and BERT4Coref perform significantly better than the random choice.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "然而,在基德姆斯上训练,我们的两种模式都比随机选择表现得更好。", "score": 60.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_222.wav", "doc_id": "oYCKgTzTDy.seg_222", "src_text": "But Chinese is missing and lack of coverage on certain meaning representation.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "国语缺失。莱克人可以勇敢地承受不确定的许多代", "score": 34.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/rISrKoXQCx.seg_579.wav", "doc_id": "rISrKoXQCx.seg_579", "src_text": "So a little bit of discussion.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "因此,我们也想在讨论中强调一下,", "score": 50.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dvGkKzmIaN.seg_540.wav", "doc_id": "dvGkKzmIaN.seg_540", "src_text": "We also validate the covertness of the provided embedding by visualising the embedding of sentences on four dataset [INAUDIBLE 4:39] PCA.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们还通过将句子嵌入到预训练的BERT模型中来验证了提供的嵌入的隐私性。", "score": 60.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_692.wav", "doc_id": 
"oaOHnMCwad.seg_692", "src_text": "We do this through our framework NLPositionality.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们通过我们的框架nl位置性来做到这一点。", "score": 94.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/aQpIWggfCo.seg_31.wav", "doc_id": "aQpIWggfCo.seg_31", "src_text": "Creating the dataset is an essential step to this end.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "创建数据集是实现目标的必不可少一步。", "score": 86.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_720.wav", "doc_id": "oaOHnMCwad.seg_720", "src_text": "And the other is to do NLP research with the lens of perspectivism.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "要对研究过程进行NLP研究。", "score": 52.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SUkmfOTvGi.seg_473.wav", "doc_id": "SUkmfOTvGi.seg_473", "src_text": "We then fine-tuned over 20 models on CoNLL-2003.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "然后,我们在Cora2003上对超过20个模型进行了微调。", "score": 62.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/XejEJmgUmE.seg_763.wav", "doc_id": "XejEJmgUmE.seg_763", "src_text": "So we did a series of analysis where we tried to perturb the input sentence by, trying to preserve the relevant structure but adding like noise to the input.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们做了一系列的分析,试图通过保留相关结构来破坏输入句子,但添加噪音到输入,并在执", "score": 61.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/XejEJmgUmE.seg_746.wav", "doc_id": "XejEJmgUmE.seg_746", "src_text": "So we can do the same thing by choosing unacceptable sentences from the same matching, and that 
could also be used to test the models acceptability.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们可以通过从同一匹配中选择不可接受的句子来做同样的事情,这也可以用来测试模型的可接受性。", "score": 81.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_247.wav", "doc_id": "oYCKgTzTDy.seg_247", "src_text": "We found it is because most of the major natural languages can obtain performance gain, except that English performance drops in seven datasets and only gains in three datasets.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "发现的原因是大多数主要自然语言都能获得性能增强,除了英语在七个数据集中下降,只在三个数据集中上升。", "score": 85.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/uZBWfYjYnf.seg_102.wav", "doc_id": "uZBWfYjYnf.seg_102", "src_text": "Use only one model for every latency regime and handle latency through specific parameters.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "单个模型来处理每个延迟模式,并通过特定的参数来处理延迟。", "score": 59.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/rISrKoXQCx.seg_578.wav", "doc_id": "rISrKoXQCx.seg_578", "src_text": "So this has sound the alarm for us to acknowledge and tackle the fairness issues resulting by language model political leanings.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "所以,这个声音警告我们要认识并解决由于语言模式政治表达方式而产生的不公正问题。", "score": 60.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_807.wav", "doc_id": "WTTtiRKFZI.seg_807", "src_text": "Ok.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "好吧,", "score": 87.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/aQpIWggfCo.seg_17.wav", "doc_id": "aQpIWggfCo.seg_17", "src_text": "Results in 
the figure show that the semantic completeness in generated scripts is acceptable but the faithfulness to the constraints cannot be guaranteed.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "图表的结果表明,生成的脚本的语义完整性是可接受的,但对约束的忠诚是无法保证的。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/FLkGnzVRew.seg_662.wav", "doc_id": "FLkGnzVRew.seg_662", "src_text": "We compare this to the other state-of-the-art AL strategies that are commonly used in the community.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们将其与其他艺术艾尔策略相比,这些策略在社区中通常使用。", "score": 60.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_171.wav", "doc_id": "SLpqvupgvW.seg_171", "src_text": "Here are some examples of indirect references for example, \"the newer one\" or \"the song that's not energetic.\"", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "例如,例如,较新的或非能量符号的歌曲。", "score": 60.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_187.wav", "doc_id": "SLpqvupgvW.seg_187", "src_text": "Where A and B are samples from Wikipedia.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "A和B是来自维基百科的样本。", "score": 80.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/TVCREhgqUP.seg_70.wav", "doc_id": "TVCREhgqUP.seg_70", "src_text": "After the first step, we have all the right tokens, but they're not ordered.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "第一步后,我们已经有了所有正确的token,但它们并没有被正确排序。", "score": 63.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/XejEJmgUmE.seg_732.wav", "doc_id": "XejEJmgUmE.seg_732", "src_text": "So in this work, 
we revisit the minimal pair paradigms.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "在这项工作中,我们重新访问最小对的范畴。因此,", "score": 87.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oeooqChmKK.seg_623.wav", "doc_id": "oeooqChmKK.seg_623", "src_text": "Without task-specific training on KITMUS, both models do not perform well.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "结果。在任务特定训练的KidMus中。这两个模型都没有表现出很好的效果。", "score": 79.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/FLkGnzVRew.seg_639.wav", "doc_id": "FLkGnzVRew.seg_639", "src_text": "While dissonance is a very common phenomenon we experienced in daily decision making, they are really rare to find expressed in language among other kinds of discourse relations.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "虽然不一致是我们在日常决策中经常遇到的现象,但在其他类型的言语关系中表达不一致的现象却很少见。", "score": 90.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/TVCREhgqUP.seg_84.wav", "doc_id": "TVCREhgqUP.seg_84", "src_text": "First of all, the alignment between input and output is not given in the training data.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "首先,输入和输出之间的对齐在训练数据中没有给出;", "score": 91.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_787.wav", "doc_id": "WTTtiRKFZI.seg_787", "src_text": "The argument is based on the principle of dependency length minimization that I will explain on the basis of these examples.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "这个论点是基于依赖关系长度最小化原则,这些例子就是基", "score": 54.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/wLqFAuDnKa.seg_138.wav", "doc_id": 
"wLqFAuDnKa.seg_138", "src_text": "In our experiments, we settled for a 5-shot prompting strategy where we just marked each sentence that we provide to the system, with the language it's in.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "的实验中,我们选择了一种五发促进策略,只是用系统中的语言标记我们提供给系统的每个句子。", "score": 92.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oeooqChmKK.seg_613.wav", "doc_id": "oeooqChmKK.seg_613", "src_text": "Second, there's a \"Background-Both\" setting, where background knowledge is available both at pretrain time and inference time.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "其次,有两种背景设置:在练习前和练习时都有", "score": 50.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oeooqChmKK.seg_616.wav", "doc_id": "oeooqChmKK.seg_616", "src_text": "For example, because new occupations have developed since the time of pretraining.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "例如,因为新职业自从预训练以来已经发展起来。", "score": 95.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_169.wav", "doc_id": "SLpqvupgvW.seg_169", "src_text": "Or the pronunciations are too similar to each other and hard to disambiguate.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "或者发音太相似,难以区分,", "score": 97.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/XejEJmgUmE.seg_730.wav", "doc_id": "XejEJmgUmE.seg_730", "src_text": "Language model acceptability judgments are not always robust to context.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "语言模型可接受性判断并不总是对上下文有坚实的依据。", "score": 75.0} {"audio_path": 
"data/iwslt25/IWSLT25INSTRUCT/segmented/dvGkKzmIaN.seg_533.wav", "doc_id": "dvGkKzmIaN.seg_533", "src_text": "Then the provider requests the embeddings from the stealer's service with the data set.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "然后,提供商要求从类似的服务中获取数据集的嵌入。", "score": 64.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/rISrKoXQCx.seg_584.wav", "doc_id": "rISrKoXQCx.seg_584", "src_text": "And it's incredibly hard to determine what is actually neutral and should be retaining language monitoring data.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "而且很难确定什么实际上是中立的,并且应该保留语言监控数据,因此它", "score": 60.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WBLMIsdIrq.seg_419.wav", "doc_id": "WBLMIsdIrq.seg_419", "src_text": "And finally, we use our benchmark as well as other metrics to evaluate different models on the document-level machine translation.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "最后,我们使用我们的基准值(与其他矩阵一样)来评估文档级计算机传输中的不同模型。", "score": 84.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/aQpIWggfCo.seg_32.wav", "doc_id": "aQpIWggfCo.seg_32", "src_text": "However, previous studies do not enable planning for specific goals and manual dataset annotation is expensive.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "然而,以前的研究无法为特定目标制定计划,而且手动注释数据集是很昂贵的。", "score": 97.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/PIZEXUFLAR.seg_273.wav", "doc_id": "PIZEXUFLAR.seg_273", "src_text": "These tasks are derived from 21 existing open-source dataset and each task is equipped with five expert written instructions.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", 
"tgt_system": "short_NLE_primary", "tgt_text": "这些任务是从现有的开源数据集中衍生出来的,每个任务都配备了五个预先编写的指令。", "score": 70.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_176.wav", "doc_id": "SLpqvupgvW.seg_176", "src_text": "The cartoon has three speech bubbles.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "画片有三个说话泡沫;", "score": 96.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_674.wav", "doc_id": "oaOHnMCwad.seg_674", "src_text": "Hi everyone.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我", "score": 0.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_187.wav", "doc_id": "SLpqvupgvW.seg_187", "src_text": "Where A and B are samples from Wikipedia.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "A和B是来自维基百科的示例。", "score": 92.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_805.wav", "doc_id": "WTTtiRKFZI.seg_805", "src_text": "Right?", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "确),", "score": 0.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/FLkGnzVRew.seg_650.wav", "doc_id": "FLkGnzVRew.seg_650", "src_text": "To no surprise, the classifier performed not much better than chance.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "毫不奇怪,分类器的表现不比随机好多少。", "score": 90.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/gGbuDbHhyc.seg_344.wav", "doc_id": "gGbuDbHhyc.seg_344", "src_text": "I'd like to begin with a brief introduction to weak supervision and weakly supervised learning.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", 
"tgt_system": "long_KIT_primary", "tgt_text": "9.50.51.52.53.54.5", "score": 0.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/TVCREhgqUP.seg_47.wav", "doc_id": "TVCREhgqUP.seg_47", "src_text": "Hi!", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "Hi", "score": 0.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WBLMIsdIrq.seg_410.wav", "doc_id": "WBLMIsdIrq.seg_410", "src_text": "And this helps us identify cases like the one here, where in Chinese you need context to translate proper nouns to make sure that you're using the same translation within the document.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "这有助于识别像你这里的案例:在中国,我们需要联系传递正式通知,以确保您在文档中使用相同的传输。", "score": 52.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/hgIDlKNiFM.seg_433.wav", "doc_id": "hgIDlKNiFM.seg_433", "src_text": "Then we will present the main contribution of our article.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "然后我们将介绍我们文章的主要贡献", "score": 94.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oeooqChmKK.seg_601.wav", "doc_id": "oeooqChmKK.seg_601", "src_text": "Servin is a judge.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "塞文是法官,", "score": 93.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/rISrKoXQCx.seg_553.wav", "doc_id": "rISrKoXQCx.seg_553", "src_text": "On the other hand, these different political opinions are inherently socially biased and might lead to potential fairness issues in downstream task applications.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "另一方面,这些不同的政治观点在社会上是有偏见的,并且可能导致潜在的公平性问题。", "score": 57.0} {"audio_path": 
"data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_250.wav", "doc_id": "oYCKgTzTDy.seg_250", "src_text": "In this figure, the blue line is Cross-lingual Few-shot transfer.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "在这个图中,蓝色线是跨语言的虚拟传输,", "score": 54.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_701.wav", "doc_id": "oaOHnMCwad.seg_701", "src_text": "We host 2 tasks on lab in the wild, one of them being social acceptability, and the way this works is that participants will read a situation from the social chemistry dataset and, then they'll write how socially acceptable a situation is.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们在现实世界中进行了两个任务,一个是社会可接受性,方式是参与者从社会化学数据集中读取情况,然后他们会评估情况的社会可接受性。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/aQpIWggfCo.seg_8.wav", "doc_id": "aQpIWggfCo.seg_8", "src_text": "An abstract goal can be inherited by different real-life specific goals with multi-faceted constraints.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "一个抽象的目标可以由具有多面约束的不同现实生活目标继承;", "score": 86.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dvGkKzmIaN.seg_535.wav", "doc_id": "dvGkKzmIaN.seg_535", "src_text": "We compute the similarity difference between benign and backdoor data set which is defined as delta cosine and delta L2.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们计算了Benign和后门数据集之间的相似性差异,这被定义为Deltacosin和DeltaL2。", "score": 89.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_215.wav", "doc_id": "oYCKgTzTDy.seg_215", "src_text": "Hello everyone, my name is Yusen Zhang from the Penn State University.", "src_text_system": "human", 
"src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "大家好,我是宾夕法尼亚大学的UsmanJohn。", "score": 85.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/gGbuDbHhyc.seg_360.wav", "doc_id": "gGbuDbHhyc.seg_360", "src_text": "Otherwise, there is a large performance drop.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "否则,性能会大幅下降,", "score": 98.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WBLMIsdIrq.seg_417.wav", "doc_id": "WBLMIsdIrq.seg_417", "src_text": "We can then also note that different languages have different proportions of these discourse phenomena.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "然后我们还可以注意到,不同语言中这些话语现象的比例是不同的。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/FLkGnzVRew.seg_671.wav", "doc_id": "FLkGnzVRew.seg_671", "src_text": "These are the links to our core data set and our paper.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "这些是连接到我们的代码数据集和纸张的链接。感谢您与我们联系。", "score": 47.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/FLkGnzVRew.seg_648.wav", "doc_id": "FLkGnzVRew.seg_648", "src_text": "As can be seen here, dissonance was only found in 3.5% of the annotated pairs.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "如图所示,仅有3.5%的对齐对出现了不一致。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WBLMIsdIrq.seg_393.wav", "doc_id": "WBLMIsdIrq.seg_393", "src_text": "And some people have suggested targeted evaluation on context-dependent translations, but these resources only support limited types of context-dependent translations and limited sets of languages since they usually rely on domain knowledge and human 
curation.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "并且一些人已经建议针对上下文依赖翻译进行目标评估,但这些资源只支持有限的上下文依赖翻译类型和有限的语言集,因为它们通常依赖于领域知识和人类创作。", "score": 86.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/PIZEXUFLAR.seg_307.wav", "doc_id": "PIZEXUFLAR.seg_307", "src_text": "Thank you.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "谢谢。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dvGkKzmIaN.seg_516.wav", "doc_id": "dvGkKzmIaN.seg_516", "src_text": "Existing works can be broadly classified into four categories.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "现有的作品可以大致分为四类。", "score": 92.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/uZBWfYjYnf.seg_96.wav", "doc_id": "uZBWfYjYnf.seg_96", "src_text": "Specific architectures are usually trained, introducing additional modules to be optimized.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "通常需要额外的模块来优化,", "score": 50.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_209.wav", "doc_id": "SLpqvupgvW.seg_209", "src_text": "If the language model has access to some partially overlapping background knowledge, then the accuracy is between 82 to 87%, which is more realistic.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "如果语言模型可以访问一些部分重叠的背景知识,那么精度在82%到87%之间,这更具现实意义,", "score": 94.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/TVCREhgqUP.seg_52.wav", "doc_id": "TVCREhgqUP.seg_52", "src_text": "As usual, we have a training set of utterances.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": 
"short_NLE_primary", "tgt_text": "样:通常,我们在这种情况下会训练自己", "score": 23.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_704.wav", "doc_id": "oaOHnMCwad.seg_704", "src_text": "We then replicate a very similar setup for the toxicity and hate speech detection task, where they'll read an instance from Dynahate and write whether they think it's instance of hate speech.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "然后我们重复了一个非常相似的设置用于毒害和仇恨言论检测任务,其中他们读取一个实例并写出他们是否认为这是仇恨言论的实例。", "score": 60.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/FLkGnzVRew.seg_655.wav", "doc_id": "FLkGnzVRew.seg_655", "src_text": "We find that on transferring the zero-shot performance on the annotated data set is already much better than chance with the best, with AUC .62.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们发现在传输注解数据集上的零短期表现已经远远好于机会,使用AUC点6.2的最佳结果。", "score": 70.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_796.wav", "doc_id": "WTTtiRKFZI.seg_796", "src_text": "\"Marge read this absolutely fascinating book about bees yesterday.\"", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "这两个句子都很好。马克·雷德写了关于昨天的比斯塞斯的绝对迷人的书。但也可以说马格读了昨天这本关于蜜蜂的绝对迷人的书。所", "score": 45.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/gGbuDbHhyc.seg_370.wav", "doc_id": "gGbuDbHhyc.seg_370", "src_text": "However, if we allow to continue fine-tuning on the clean samples, then FTw performs equally well as other methods.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "然而,如果我们被允许继续对清晰样本进行精细调节,那么FTW的性能将与其他方法一样好。", "score": 86.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/hgIDlKNiFM.seg_452.wav", "doc_id": 
"hgIDlKNiFM.seg_452", "src_text": "These models are compared to six baseline models which are CamemBERT OSCAR 138 GB, CamemBERT OSCAR 4 GB, CamemBERT CCNET 4 GB, PubMedBERT, BioBERT, and ClinicalBERT.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "这个模型与六个基准模型进行比较,包括:Camembert-Oscar(1.38GB)、Camembert-Oscar(4GB)、Camembert-CiNet(4GB)、PlumeBERT(BioBERT)和ClinicalBERT。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/aQpIWggfCo.seg_41.wav", "doc_id": "aQpIWggfCo.seg_41", "src_text": "In summary, we establish the constrained language planning problem.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "总之,我们建立了受限语言规划问题,", "score": 97.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WBLMIsdIrq.seg_412.wav", "doc_id": "WBLMIsdIrq.seg_412", "src_text": "And finally, we look at different individual tokens that have high P-CXMI.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "最后,我们看到了不同的个体标记具有高位的pxmi,并允", "score": 63.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/TVCREhgqUP.seg_55.wav", "doc_id": "TVCREhgqUP.seg_55", "src_text": "These utterances are paired with logical forms that represent core aspects of their meaning.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "这些先驱者,带着逻辑的形式,代表了他们的思想。", "score": 59.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WBLMIsdIrq.seg_415.wav", "doc_id": "WBLMIsdIrq.seg_415", "src_text": "For each of the five discourse phenomena we identified, we create taggers to automatically identify words that pertain to the phenomenon.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": 
"对于我们识别的每一个五个现象,我们都创建了标志来自动识别与该现象相关的单词,并称", "score": 64.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_815.wav", "doc_id": "WTTtiRKFZI.seg_815", "src_text": "So the governor is on the left in this example \"I saw Bart and Lisa\" so is the governor is on the left.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "在这个例子中,政府在左侧(isobutyl)和左侧(lisa),所以政府在左侧(absent)", "score": 57.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/aQpIWggfCo.seg_9.wav", "doc_id": "aQpIWggfCo.seg_9", "src_text": "A good planner should write scripts that are reasonable and faithful to constraints.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "一个好的规划者应该写出遵循约束的合理和可靠的脚本。", "score": 96.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/FLkGnzVRew.seg_657.wav", "doc_id": "FLkGnzVRew.seg_657", "src_text": "Thus, this is the model that we use to cold start the active learning.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "移学习对CE任务的迁移学习后,通过进一步对迁移学习进行迁移学习,我们发现迁移学习对CE任务的迁移学习后,通过进一步对迁移学习进行迁移学习,我们发现�", "score": 61.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/GvEBWkLmuI.seg_846.wav", "doc_id": "GvEBWkLmuI.seg_846", "src_text": "The second part is marked words, which is a method to identify the words that distinguish marked groups from unmarked ones, which I'll elaborate on shortly.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "第二部分是标记词,这是一种方法,用于识别区分标记组和未标记组的词。", "score": 65.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_177.wav", "doc_id": "SLpqvupgvW.seg_177", "src_text": "In the first bubble, Bob says, \"Remember that song we were listening to yesterday?\"", "src_text_system": "human", 
"src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "在第一个泡沫中,Bob说:“记住我们昨天听到的那首歌”", "score": 86.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_196.wav", "doc_id": "SLpqvupgvW.seg_196", "src_text": "So what we do is that we show some background knowledge about the two entities.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们做的是展示一些关于20世纪的背景知识;", "score": 58.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_163.wav", "doc_id": "SLpqvupgvW.seg_163", "src_text": "Consider this alternative question.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "考虑这个替代问题:", "score": 98.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_159.wav", "doc_id": "SLpqvupgvW.seg_159", "src_text": "Hi!", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "今天,", "score": 0.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WBLMIsdIrq.seg_396.wav", "doc_id": "WBLMIsdIrq.seg_396", "src_text": "And second, how well do models handle these cases?", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "其次,模型如何处理这些情况?", "score": 61.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dvGkKzmIaN.seg_522.wav", "doc_id": "dvGkKzmIaN.seg_522", "src_text": "Before these main steps, we first select a trigger set.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "在这些主要步骤之前,我们首先选择一个触发器集;", "score": 96.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/PIZEXUFLAR.seg_303.wav", "doc_id": "PIZEXUFLAR.seg_303", "src_text": "So overall, we propose the first large scale multi-model 
instruction tuning dataset with significantly improved their short capability of OFA, and we explore different transfer learning technique and show their benefits.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "因此,我们提出了第一个大规模的多模态指令调节数据集,我们显著提高了OFA的实时能力,我们探索了不同的转移学习技术,并展示了它们的好处,", "score": 95.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/XejEJmgUmE.seg_748.wav", "doc_id": "XejEJmgUmE.seg_748", "src_text": "So that is what we call as the mismatch scenario.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "此我们称之为不匹配场景。所以在", "score": 83.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_236.wav", "doc_id": "oYCKgTzTDy.seg_236", "src_text": "For example, we put the German, English, Chinese queries together to train a multilingual model.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "例如,我们把德语、英语和中文的查询放在一起训练一个多语言模型,", "score": 90.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SUkmfOTvGi.seg_463.wav", "doc_id": "SUkmfOTvGi.seg_463", "src_text": "Hello everyone, my name is Shuheng.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "大家好,我是Zhuang,", "score": 60.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_716.wav", "doc_id": "oaOHnMCwad.seg_716", "src_text": "We find this in the GPT 4 social acceptability task as well as the Dynahate task analysis as well.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们在社会可接受性任务(GPT-4)中发现了这一点。同样地,DINIE也可以用于任务分析。", "score": 64.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SUkmfOTvGi.seg_478.wav", "doc_id": "SUkmfOTvGi.seg_478", "src_text": "The first one is 
the model architecture.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "第一个是模型架构;", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_237.wav", "doc_id": "oYCKgTzTDy.seg_237", "src_text": "And during inference we can use this model to translate German queries or Chinese queries, et cetera.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "在推理时,我们也可以使用这个模型。我想问一下,欧盟委员会是否有任何关于此事的信息。", "score": 44.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_711.wav", "doc_id": "oaOHnMCwad.seg_711", "src_text": "We find that Dynahate is also most aligned to English speaking countries.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们发现DINAHET也是最匹配的英语国家。”", "score": 82.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/hgIDlKNiFM.seg_461.wav", "doc_id": "hgIDlKNiFM.seg_461", "src_text": "All the pre-trained model obtained from NACHOS are freely available on Hugging Face, and under the MIT license, and all the training scripts are on our GitHub repository.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "从Nachiós获得的所有预训练模型都可以在Yugenface上免费获得,并且所有的训练脚本都在我们的GitHub存储库上。所以", "score": 84.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/FLkGnzVRew.seg_667.wav", "doc_id": "FLkGnzVRew.seg_667", "src_text": "We find that PRC has the highest percentage of dissonance and works best for rare class.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们发现PRC有最高的不一致百分比,并且适用于罕见类别,但", "score": 63.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_797.wav", "doc_id": "WTTtiRKFZI.seg_797", "src_text": "It's 
okay the way instead of \"it\", we have this long NP.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "以这里的理由", "score": 0.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dJGfOSFgZO.seg_320.wav", "doc_id": "dJGfOSFgZO.seg_320", "src_text": "We developed this method to comprehensively cover chat model behaviors that have been suggested to affect chat quality in recent literature.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们开发了这种方法来全面覆盖聊天模型行为,这些行为被认为会影响聊天质量的研究中提出的。", "score": 87.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/wLqFAuDnKa.seg_127.wav", "doc_id": "wLqFAuDnKa.seg_127", "src_text": "In this work, we present the first systematic study of large language model prompting for machine translation.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "在这项工作中,我们提出了机器翻译中大语言模型提示的第一个系统研究。", "score": 94.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/TVCREhgqUP.seg_67.wav", "doc_id": "TVCREhgqUP.seg_67", "src_text": "For the first time, we show strong generalization to deeper recursion without relying on trees.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们会展示强大的泛化以避免在纱线上滴落。", "score": 43.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/TVCREhgqUP.seg_74.wav", "doc_id": "TVCREhgqUP.seg_74", "src_text": "Conceptually, our permutation model works roughly like this.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "从概念上讲,我们的变换模型大致像这样工作。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_205.wav", "doc_id": "SLpqvupgvW.seg_205", "src_text": "The AltEntities Corpus has 6,000 alternative 
questions across three domains, and it has 42,000 indirect referring expressions.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "AlternatingCorpus在三个领域中有6,000个替代问题,并且有42,000个间接引用表达式。", "score": 83.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/gGbuDbHhyc.seg_377.wav", "doc_id": "gGbuDbHhyc.seg_377", "src_text": "Second, WSL approaches should be compared with few-shot learning baselines, as both work on clean samples.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "其次,WLS方法应该与未来学习的基线相比较;", "score": 58.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oeooqChmKK.seg_622.wav", "doc_id": "oeooqChmKK.seg_622", "src_text": "In this figure, we show the results of the best-performing models on the most difficult variant of the Background-Pretrain setting.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "在此图中,我们显示了背景预训练设置中最难的变体中表现最好的模型的", "score": 79.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_196.wav", "doc_id": "SLpqvupgvW.seg_196", "src_text": "So what we do is that we show some background knowledge about the two entities.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "因此,我们展示了关于这两个实体的背景知识。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_696.wav", "doc_id": "oaOHnMCwad.seg_696", "src_text": "And so we opt to re annotate data to get many annotates for instance and to get a rich set of demographic data.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "因此,我们选择重新注册数据,以便获得许多实体,例如人口统计数据。", "score": 53.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/wLqFAuDnKa.seg_131.wav", 
"doc_id": "wLqFAuDnKa.seg_131", "src_text": "We use state-of-the-art, neural MT metrics, and additionally also show expert-based human evaluation results.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们使用最新的神经MT评估指标,并且还展示了基于专家的人类评估结果。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/PIZEXUFLAR.seg_289.wav", "doc_id": "PIZEXUFLAR.seg_289", "src_text": "If the task is a multi-model classification task, we report accuracy.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "如果任务是多模态分类任务,我们报告准确率;", "score": 90.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_783.wav", "doc_id": "WTTtiRKFZI.seg_783", "src_text": "So we get dependencies from the governor.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了", "score": 0.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/gGbuDbHhyc.seg_371.wav", "doc_id": "gGbuDbHhyc.seg_371", "src_text": "So in practice, there's no reason to choose more complex WSL methods which require more computation time and disk space.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "因此,在实践中,没有理由选择更复杂的WSL方法,这些方法需要更多的计算时间和磁盘空间。", "score": 95.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_784.wav", "doc_id": "WTTtiRKFZI.seg_784", "src_text": "Here loves to all conjuncts separately: Lisa, Bart, and Maggie.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了", "score": 0.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_822.wav", "doc_id": "WTTtiRKFZI.seg_822", "src_text": "What we see here 
is that when the governor is on the left, the tendency for the left conjunct to be shorter grows steadily, with the absolute difference in words, and the same is observed when there is no governor as in coordination of sentences.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们说的是什么?那就是当总督在左边时左翼结合的倾向是逐渐增长的,词语之间绝对不同,当没有总理参与协调成语时也会观察到这一现象,", "score": 55.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SUkmfOTvGi.seg_470.wav", "doc_id": "SUkmfOTvGi.seg_470", "src_text": "At the same time, if we do observe poor generalization, what causes the performance drop of these models?", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "同时,如果我们观察到一般化不佳,这会导致这些模型的性能下降的原因是什么?", "score": 88.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/gGbuDbHhyc.seg_355.wav", "doc_id": "gGbuDbHhyc.seg_355", "src_text": "First, is clean validation data necessary for WSL or can we maybe use a noisy validation set instead?", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "首先,是否需要清洁的验证数据来验证WSL,或者我们可以使用一个嘈杂的验证集代替吗?", "score": 90.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/TVCREhgqUP.seg_88.wav", "doc_id": "TVCREhgqUP.seg_88", "src_text": "Our permutation method is very flexible, but it brings the challenge that finding the highest-scoring permutation is NP-hard.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们的变换方法非常灵活,但它带来了一个挑战:找到最高得分变换很难,", "score": 65.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/GvEBWkLmuI.seg_869.wav", "doc_id": "GvEBWkLmuI.seg_869", "src_text": "This connects to an archetype that people have called the \"Strong Black Women\" archetype.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": 
"acl", "tgt_system": "short_NLE_primary", "tgt_text": "这与人们所称的“强大黑人女性”典型而且听", "score": 70.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/uZBWfYjYnf.seg_103.wav", "doc_id": "uZBWfYjYnf.seg_103", "src_text": "And leverage the knowledge already acquired by the model through the attention mechanism between audio input and textual output.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "通过注意力机制在音频输入和文本输出之间的注意力机制,已经获得了所需的知识。", "score": 67.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/uZBWfYjYnf.seg_101.wav", "doc_id": "uZBWfYjYnf.seg_101", "src_text": "First, to use already existing offline ST models without re-training or adopting specific architecture for SimulST.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "首先,使用现有的离线ST模型,不需要重新训练或采用特定架构来适应CST;使用", "score": 75.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/FLkGnzVRew.seg_661.wav", "doc_id": "FLkGnzVRew.seg_661", "src_text": "Next, to improve the number of dissonance examples, we use a Probability-of-Rare-Class strategy — PRC — to select mostly the examples that are highly likely to be descended by the current model at any round of rare.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "接下来,为了提高不一致示例的数量,我们使用概率稀疏类策略(PRC)来选择大多数示例,高度可能在任何一轮的AL中与当前模型不一致。", "score": 89.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_725.wav", "doc_id": "oaOHnMCwad.seg_725", "src_text": "And so that concludes our presentation.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们的演讲也包括了这一点。", "score": 47.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/FLkGnzVRew.seg_657.wav", "doc_id": "FLkGnzVRew.seg_657", "src_text": "Thus, this is the model that we use 
to cold start the active learning.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "因此这是我们用来启动主动学习的模型。", "score": 62.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/uZBWfYjYnf.seg_119.wav", "doc_id": "uZBWfYjYnf.seg_119", "src_text": "If you want to discover more results, read our paper.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "如果你想了解更多结果,请阅读我们的论文,", "score": 85.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_178.wav", "doc_id": "SLpqvupgvW.seg_178", "src_text": "And with that, Bob sets the dialogue context.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": ",并通过这句话设定了对话背景。", "score": 32.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/GvEBWkLmuI.seg_873.wav", "doc_id": "GvEBWkLmuI.seg_873", "src_text": "So based on these patterns, we conclude with three recommendations for model owners.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "基于这些模式,我们可以得出三条模型所有者的建议", "score": 64.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/gGbuDbHhyc.seg_348.wav", "doc_id": "gGbuDbHhyc.seg_348", "src_text": "If we directly train neural networks on weakly labeled data, the neural networks tend to memorize the label noise and do not generalize.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "在弱监督中,我们不会手动标记数据,而是使用弱标签源来标记数据,例如简单的规则、知识库或低质量的引语源,如右图所示。", "score": 55.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oeooqChmKK.seg_623.wav", "doc_id": "oeooqChmKK.seg_623", "src_text": "Without task-specific training on KITMUS, both models do not perform well.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", 
"domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "结果。在基德姆斯上进行任务特定的训练,然而,两种模式都没有表现出色;", "score": 48.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_260.wav", "doc_id": "oYCKgTzTDy.seg_260", "src_text": "And et cetera.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": ",我们", "score": 0.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/aQpIWggfCo.seg_42.wav", "doc_id": "aQpIWggfCo.seg_42", "src_text": "We evaluate constrained language planning ability of large language models and develop an over-generate-then-filter method for large language models.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们评估了大型语言模型的受限语言规划能力,并开发了大型语言模型的过度生成滤波器方法。", "score": 90.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/aQpIWggfCo.seg_32.wav", "doc_id": "aQpIWggfCo.seg_32", "src_text": "However, previous studies do not enable planning for specific goals and manual dataset annotation is expensive.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "然而,之前的研究无法为特定目标进行规划,并且手动数据集注释是昂贵的。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/GvEBWkLmuI.seg_870.wav", "doc_id": "GvEBWkLmuI.seg_870", "src_text": "And while it sounds positive at first glance, there's been work showing that this kind of archetype actually is very harmful because it puts a lot of pressure on these demographics to be resilient and strong against societal obstacles.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "第一眼看起来似乎是积极的,但实际上工作已经表明,这种架构类型实际上是非常有害的,因为它会给这些人施加很多压力,要求他们要坚韧和强大,抵抗社会障碍", "score": 87.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/uZBWfYjYnf.seg_111.wav", "doc_id": "uZBWfYjYnf.seg_111", "src_text": "If 
we look at the main results of EDAtt, we'll plot the simultaneous speech translation results on graphs in which we have BLEU on one side that measures the translation quality, and average lagging that is the latency measure, and we also consider the computational aware average lagging that accounts for the model's computational times to predict the output.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "如果你看一下那件事的主要结果。我们将在图表上将同时翻译结果打印出来,其中一边是蓝色,用于衡量翻译质量和平均延迟。这就是延迟测量,我们还考虑计算机意识的平均缺失,计算机模型计算时间预测输出。", "score": 60.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_194.wav", "doc_id": "SLpqvupgvW.seg_194", "src_text": "For example, the same genre or the same artist for a song.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "例如同一类型或同一艺术家等。", "score": 46.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/TVCREhgqUP.seg_71.wav", "doc_id": "TVCREhgqUP.seg_71", "src_text": "That's why in the second step we use another model to predict a permutation to put them into the right order.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "因此,在第二步,我们使用另一个模型来预测一个置换来将它们放入正确的顺序。", "score": 82.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/rISrKoXQCx.seg_580.wav", "doc_id": "rISrKoXQCx.seg_580", "src_text": "We would also like to highlight that we expose the unique dilemma regarding language model political biases.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们暴露了语言模型政治偏见的独特困境:", "score": 82.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/aQpIWggfCo.seg_4.wav", "doc_id": "aQpIWggfCo.seg_4", "src_text": "And show that large language models can effectively decompose goals into steps.", "src_text_system": "human", 
"src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": ",并且表明大型语言模型可以有效地将目标分解为步骤。", "score": 97.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/FLkGnzVRew.seg_640.wav", "doc_id": "FLkGnzVRew.seg_640", "src_text": "So why does this matter?", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "所以为什么会有这种问题?", "score": 51.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/uZBWfYjYnf.seg_107.wav", "doc_id": "uZBWfYjYnf.seg_107", "src_text": "For example, if we receive a speech chunk containing \"I'm going to talk about...\" and our model predicts the translation in German, and we will look at the cross-attention weights, we'll see that the first two words points to the earliest received speech frames, while the last word points to the last received speech frames, as lambda speech frames.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "例如,如果我们收到一段语音片段,内容是“我要谈论的”,而我们的模型预测翻译成德语。我们将看待跨注意力等待我们看到前两个单词指的是最早接收到的语音框架,而最后一个单词指的是最后接收到的语音框架(至少是lamda语音框架)。", "score": 70.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_239.wav", "doc_id": "oYCKgTzTDy.seg_239", "src_text": "We train on one source language and transfer to another language.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "在一个源语言之间传输和传输到另一个语言之间。因此,", "score": 50.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/uZBWfYjYnf.seg_99.wav", "doc_id": "uZBWfYjYnf.seg_99", "src_text": "For example, training a model with an average of one second latency and another one with two seconds latency, and so on.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": ",其平均延迟为1秒钟,另一个模型的平均延迟为2秒钟,等等。", "score": 84.0} {"audio_path": 
"data/iwslt25/IWSLT25INSTRUCT/segmented/gGbuDbHhyc.seg_379.wav", "doc_id": "gGbuDbHhyc.seg_379", "src_text": "Finally, we have open-sourced our code.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "最后,我们有了开放源代码;", "score": 50.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_707.wav", "doc_id": "oaOHnMCwad.seg_707", "src_text": "So now we're better equipped to answer who do NLP datasets and models align with the most.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "虽然我们现在更好地回答了NLP数据集和模型与哪些国家最匹配的问题,但", "score": 81.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dJGfOSFgZO.seg_328.wav", "doc_id": "dJGfOSFgZO.seg_328", "src_text": "For example, you can see how measuring the proportion of turns with self and partner contradictions explains 5% and 10% of conversation quality, respectively, while the average Likert consistency scores explain only 4% or less.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "例如,您可以看到如何测量自我和配偶的对立性与对话质量的比例,分别解释了5%和10%的对话质量,而平均的Lickert一致性得分只解释了4%或更少。", "score": 61.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/uZBWfYjYnf.seg_114.wav", "doc_id": "uZBWfYjYnf.seg_114", "src_text": "And we compare with popular strategies that are also applied to offline models that are the Wait-k strategy and the Local Agreement.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们将我们的方法与适用于离线模型的适当策略进行比较,例如Whitkey策略和局部一致性策略,并且我们还将", "score": 46.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/GvEBWkLmuI.seg_854.wav", "doc_id": "GvEBWkLmuI.seg_854", "src_text": "Now for some results.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": 
"long_KIT_primary", "tgt_text": "现在我们来看一下结果。", "score": 50.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_182.wav", "doc_id": "SLpqvupgvW.seg_182", "src_text": "We provide the first and second speech bubbles automatically, but the third one is filled in by the annotator.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们自动提供第一个和第二个语音泡,但第三个语音泡由注释者填充。", "score": 95.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/aQpIWggfCo.seg_39.wav", "doc_id": "aQpIWggfCo.seg_39", "src_text": "With CoScript we can try smaller but specialized models for constrained language planning.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "以使用Corescript训练更小但更专门的模型来进行受限语言规划。", "score": 80.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/PIZEXUFLAR.seg_272.wav", "doc_id": "PIZEXUFLAR.seg_272", "src_text": "Here we present MultiInstruct, the first multi-modal instruction tuning benchmark dataset that consists of 62 diverse multi-modal tasks covering 10 broad categories.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "在这里,我们展示了MultiInstruct,第一个多模态指令调整基准数据集,它由62个不同的多模态任务组成,涵盖10个类别。", "score": 85.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/PIZEXUFLAR.seg_298.wav", "doc_id": "PIZEXUFLAR.seg_298", "src_text": "We use one instruction versus 5 instruction.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们使用一条指令对五条指令进行比较,", "score": 85.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/TVCREhgqUP.seg_47.wav", "doc_id": "TVCREhgqUP.seg_47", "src_text": "Hi!", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "你好,", "score": 98.0} 
{"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/PIZEXUFLAR.seg_303.wav", "doc_id": "PIZEXUFLAR.seg_303", "src_text": "So overall, we propose the first large scale multi-model instruction tuning dataset with significantly improved their short capability of OFA, and we explore different transfer learning technique and show their benefits.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "总之,我们提出了一个用于大规模多模数指令调谐数据集,显然可以显著提高OFV的射程,并探索不同的传输学习技术,并展示其优势。", "score": 54.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_168.wav", "doc_id": "SLpqvupgvW.seg_168", "src_text": "This could happen when the user cannot remember the name of the song.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "这会发生在用户无法记住歌曲名称时。", "score": 92.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_185.wav", "doc_id": "SLpqvupgvW.seg_185", "src_text": "We always use a simple template.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们总是使用一个简单的模板,", "score": 98.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dvGkKzmIaN.seg_503.wav", "doc_id": "dvGkKzmIaN.seg_503", "src_text": "Protecting the copyright of large language models for embedding as services via backdoor watermark.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "保护大语言模型用于嵌入和服务的版权:通过后门水印让我们首", "score": 77.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/PIZEXUFLAR.seg_282.wav", "doc_id": "PIZEXUFLAR.seg_282", "src_text": "We use all the instances in the test split for each task.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们使用每个任务的所有示例进行测试。", "score": 100.0} 
{"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SUkmfOTvGi.seg_492.wav", "doc_id": "SUkmfOTvGi.seg_492", "src_text": "Our conclusion is that, for good generalization we would need a better model architecture, larger model size, as well as more fine tuning examples.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们的结论是:为了良好的泛化,我们需要更好的模型架构、更大的模型尺寸以及更多的精细调节示例。", "score": 96.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_800.wav", "doc_id": "WTTtiRKFZI.seg_800", "src_text": "So these two trees only show the length of the crucial dependencies, the ones that are not constant among these two structures.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "这两棵树只显示了关键依赖关系的长度,这些关系在这两种结构中是不恒定的。", "score": 60.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dJGfOSFgZO.seg_327.wav", "doc_id": "dJGfOSFgZO.seg_327", "src_text": "In addition, ABC-Eval labels are more predictive of the overall conversation quality compared to metrics produced by existing methods, as shown by this simple linear regression analysis.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "此外,ABC-EV的标签比由现有方法产生的矩阵更能预测整体通话质量,正如通过这种简单的线性降低分析所显示的。", "score": 66.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_726.wav", "doc_id": "oaOHnMCwad.seg_726", "src_text": "But if you'd like to learn more, feel free to check out our dashboard for the most updated analysis results and our paper.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "所以如果您想学习更多,请自由查看我们的桌子,查看最先进的分析结果和我们的论文。", "score": 63.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SUkmfOTvGi.seg_478.wav", "doc_id": "SUkmfOTvGi.seg_478", "src_text": "The first one is 
the model architecture.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "第一个是模型架构。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_680.wav", "doc_id": "oaOHnMCwad.seg_680", "src_text": "But that's not really the case for Aditya Sharma.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "但这并不是阿迪蒂亚·沙玛的真正情况,因为我们对攻击", "score": 0.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WBLMIsdIrq.seg_396.wav", "doc_id": "WBLMIsdIrq.seg_396", "src_text": "And second, how well do models handle these cases?", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "其次,模型如何处理这些情况?", "score": 87.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/TVCREhgqUP.seg_71.wav", "doc_id": "TVCREhgqUP.seg_71", "src_text": "That's why in the second step we use another model to predict a permutation to put them into the right order.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "这就是为什么在第二步中,我们使用另一种模型来预测混血,以将它们放入正确的顺序。", "score": 65.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dvGkKzmIaN.seg_517.wav", "doc_id": "dvGkKzmIaN.seg_517", "src_text": "However, this method either not applicable to embedding as services or lack of transferability.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "然而,这些方法既不能用于嵌入广告服务,也不能用于缺乏可传输性。", "score": 83.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_159.wav", "doc_id": "SLpqvupgvW.seg_159", "src_text": "Hi!", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "嗨,", "score": 91.0} {"audio_path": 
"data/iwslt25/IWSLT25INSTRUCT/segmented/TVCREhgqUP.seg_56.wav", "doc_id": "TVCREhgqUP.seg_56", "src_text": "In contrast to standard machine learning evaluation, the test set does not come from the same distribution but contains structurally unseen logical forms.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "与标准机器学习评估不同,这个测试集并不是来自相同的分布,而是包含结构化的非逻辑形式。", "score": 83.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_773.wav", "doc_id": "WTTtiRKFZI.seg_773", "src_text": "So for example, in the universal dependencies, the structure of the coordination, Lisa, Bart, and Maggie, such that the first conjunct is the head of the whole coordinate structure.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "所以,我们从头到尾都有依赖。最后,这也是一种多头方法的方法(即在Transformer中使用多个头),例如在Katz的词法语法中,所有的连接都是头的结构,所以我们可以从总督那里得到依赖性,依赖性是关于和做的。现在,目标论文是为了产生一个新的论点,即()对比了这些结构的协调性", "score": 20.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/GvEBWkLmuI.seg_863.wav", "doc_id": "GvEBWkLmuI.seg_863", "src_text": "And these words define these groups only by their relationship to their identity and distinguish them as different from the white norm.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "这些词与其身份的关系来定义这些组,并将它们区分开来,区别于白人规范。", "score": 58.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oeooqChmKK.seg_594.wav", "doc_id": "oeooqChmKK.seg_594", "src_text": "For example, in the sentence, \"John saw the newly elected president on TV.\"", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "例如,在这句话中,约翰看到电视上刚刚当选的总统。", "score": 98.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_695.wav", "doc_id": "oaOHnMCwad.seg_695", "src_text": "And we 
ought to do this over looking at the demographics of original data sets annotators, because, usually only a few annotators annotate each instance and because demographics are rarely collected and shared.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们可以通过查看原始数据集的人口统计图来做到这一点,因为通常只有少数注释者注释过每个实例,而且因为人口统计图确实是真正收集和共享的。", "score": 65.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oeooqChmKK.seg_590.wav", "doc_id": "oeooqChmKK.seg_590", "src_text": "This work is a collaboration between McGill University, Mila, and Microsoft Research.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "个工作是麦吉尔大学(McGillUniversity)、Mila和微软研究(MicrosoftResearch)之间的合作。", "score": 77.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/PIZEXUFLAR.seg_307.wav", "doc_id": "PIZEXUFLAR.seg_307", "src_text": "Thank you.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "谢谢。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SUkmfOTvGi.seg_472.wav", "doc_id": "SUkmfOTvGi.seg_472", "src_text": "This is a data set that we collected from Reuters News from 2020, and then annotated them with the same CoNLL-2003 annotation guidelines.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "这是我们从2002年收集的数据集,然后用同样的卡诺尔2003年注释指南注释它们。", "score": 68.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_780.wav", "doc_id": "WTTtiRKFZI.seg_780", "src_text": "The conjunction headed approach assumed in Prague dependency treebanks, where coordinate structures are headed by the conjunction.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": 
",昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我", "score": 0.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/PIZEXUFLAR.seg_299.wav", "doc_id": "PIZEXUFLAR.seg_299", "src_text": "As we can see, using more instructions can improve the model's overall performance and reduce its sensitivity a lot.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "正如我们所看到的,使用更多的指令可以提高模型的整体性能,并且可以大大降低其敏感性。", "score": 88.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oeooqChmKK.seg_611.wav", "doc_id": "oeooqChmKK.seg_611", "src_text": "We have defined three settings of KITMUS.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们定义了三个孩子的设置。", "score": 57.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/PIZEXUFLAR.seg_263.wav", "doc_id": "PIZEXUFLAR.seg_263", "src_text": "Hello everyone, my name is Ying and my colleague Zhiyang and I will be presenting our research on MultiInstruct improving Multi-Modal Zero-Shot Learning via Instruction Tuning.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "大家好,我是Yin,我和我的同事Jiayan将介绍我们关于多指令的研究:通过指令调节多模式性性学习。因此,", "score": 70.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_775.wav", "doc_id": "WTTtiRKFZI.seg_775", "src_text": "A similar approach is assumed in Igor Mel'čuk's meaning text theory, where again, the whole coordinate structure is headed by the first conjuct.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "尔丘克在他的著作中提到了这种方法,称之为“全连接结构由第一个连接组头部控制”,因此", "score": 55.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/wLqFAuDnKa.seg_143.wav", "doc_id": "wLqFAuDnKa.seg_143", "src_text": "It's the examples that carry most of the weight.", "src_text_system": "human", "src_lang": 
"en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "它的例子承载了大部分的重量。哇!", "score": 90.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/gGbuDbHhyc.seg_375.wav", "doc_id": "gGbuDbHhyc.seg_375", "src_text": "First, report the model selection criteria.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "首先,报告模型选择标准,", "score": 96.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/PIZEXUFLAR.seg_288.wav", "doc_id": "PIZEXUFLAR.seg_288", "src_text": "In each experiment, we report the min and max performance and the standard deviation of the performance across all 5 experiments.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "大值的性能。并且在这五个实验中,性能的标准差。", "score": 62.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_716.wav", "doc_id": "oaOHnMCwad.seg_716", "src_text": "We find this in the GPT 4 social acceptability task as well as the Dynahate task analysis as well.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们在GPD的四个社会可接受性任务中发现了这一点。", "score": 54.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_186.wav", "doc_id": "SLpqvupgvW.seg_186", "src_text": "Do you mean A or B?", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "你是指A还是B?", "score": 88.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/rISrKoXQCx.seg_566.wav", "doc_id": "rISrKoXQCx.seg_566", "src_text": "So we divide pretraining corpora, into pre 45th president of the United States and after 45th president of the United States.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": 
"因此,我们将预训练的语料库分为两部分:在45年前美国总统之前和在45年前美国总统之后,", "score": 80.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/FLkGnzVRew.seg_672.wav", "doc_id": "FLkGnzVRew.seg_672", "src_text": "Feel free to get in touch with us if you have any questions.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "如果你有任何问题,请随时与我们联系。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_774.wav", "doc_id": "WTTtiRKFZI.seg_774", "src_text": "So in this case, Lisa.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "(symmetricstructuresofcoordinati", "score": 0.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_678.wav", "doc_id": "oaOHnMCwad.seg_678", "src_text": "You might turn towards a popular API like Prospective API for toxicity detection, and this works really well if you're Carl Jones.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "你可能会转向一个流行的API,例如PerspectiveAPI,用于毒性检测,这确实很好用,尤其是当你是卡尔·琼斯时,Pe", "score": 81.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SUkmfOTvGi.seg_465.wav", "doc_id": "SUkmfOTvGi.seg_465", "src_text": "Let's get started.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "让我们开始吧。", "score": 93.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WBLMIsdIrq.seg_414.wav", "doc_id": "WBLMIsdIrq.seg_414", "src_text": "So now we use our findings from our analysis to design a benchmark for document-level translation.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "解决。现在我们使用我们的分析结果来设计文档的新版本的转换。", "score": 90.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/XejEJmgUmE.seg_728.wav", "doc_id": 
"XejEJmgUmE.seg_728", "src_text": "Hi, everyone.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "大家好,", "score": 83.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dvGkKzmIaN.seg_530.wav", "doc_id": "dvGkKzmIaN.seg_530", "src_text": "Copyright verification is to detect whether a model behind another service contains the word mark.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "版权验证是检测一个服务后面的模型是否包含版权内容。水印", "score": 80.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_779.wav", "doc_id": "WTTtiRKFZI.seg_779", "src_text": "Now those are asymmetric approaches to coordinate structures, such as the Prague approach.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "现在,人们也采用对称的方法来协调结构,例如,", "score": 79.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/TVCREhgqUP.seg_55.wav", "doc_id": "TVCREhgqUP.seg_55", "src_text": "These utterances are paired with logical forms that represent core aspects of their meaning.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "这些表述与表示其含义核心方面的逻辑形式配对。", "score": 76.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SUkmfOTvGi.seg_478.wav", "doc_id": "SUkmfOTvGi.seg_478", "src_text": "The first one is the model architecture.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "第一个是模型架构。", "score": 98.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_816.wav", "doc_id": "WTTtiRKFZI.seg_816", "src_text": "It's absent in the second example \"Homer came and sneezed.\"", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": 
"long_KIT_primary", "tgt_text": "在第二个例子中,homer和kame和sneezes,", "score": 44.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/hgIDlKNiFM.seg_437.wav", "doc_id": "hgIDlKNiFM.seg_437", "src_text": "And finally, we conclude about the experiments and give you more details about how to access those models.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "上展示我们的结果,并最终总结关于实验的内容,并给您更多关于如何访问模型的细节。", "score": 97.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_823.wav", "doc_id": "WTTtiRKFZI.seg_823", "src_text": "But when the governor is on the right this tendency disappears.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "来越短,越来越短,越来越短,越来越短,越来越短", "score": 0.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/aQpIWggfCo.seg_11.wav", "doc_id": "aQpIWggfCo.seg_11", "src_text": "Since no dataset of specific goals exists to support our study, we have to acquire these goals first.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "由于没有相关数据集来支持我们的研究,我们必须首先获取这些数据。", "score": 92.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_159.wav", "doc_id": "SLpqvupgvW.seg_159", "src_text": "Hi!", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "嗨,", "score": 98.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/wLqFAuDnKa.seg_144.wav", "doc_id": "wLqFAuDnKa.seg_144", "src_text": "The summary of our experimental results is that the example quality is more important than the similarity to the source sentence.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们的实验结果的总结是,样本质量比与源句子的相似度更重要。", "score": 87.0} {"audio_path": 
"data/iwslt25/IWSLT25INSTRUCT/segmented/uZBWfYjYnf.seg_114.wav", "doc_id": "uZBWfYjYnf.seg_114", "src_text": "And we compare with popular strategies that are also applied to offline models that are the Wait-k strategy and the Local Agreement.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们与适当的策略进行比较,这些策略也适用于在线模型,例如惠特基策略和本地协议,并且", "score": 60.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/TVCREhgqUP.seg_78.wav", "doc_id": "TVCREhgqUP.seg_78", "src_text": "We determine the third token in the output in a similar way by jumping to another multiset token.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们通过跳转到另一个多元集令牌来在输出中以类似的方式确定第三个令牌。", "score": 96.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/hgIDlKNiFM.seg_459.wav", "doc_id": "hgIDlKNiFM.seg_459", "src_text": "Finally, as a conclusion our proper system offered better performance on nine of the 11 downstream tasks and surpassed globally the result of the generic model, here CamemBERT.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "最后,作为一个结论,我们的正确系统在11个Don'treams任务中表现出更好的性能,全球范围内的结果都超过了这里的通用模型。", "score": 80.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/TVCREhgqUP.seg_62.wav", "doc_id": "TVCREhgqUP.seg_62", "src_text": "This works well, but trees are usually not given and need to be obtained somehow.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "这很好地工作,但树通常不会给出,需要通过某种方式获得。", "score": 85.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/hgIDlKNiFM.seg_455.wav", "doc_id": "hgIDlKNiFM.seg_455", "src_text": "We also observe that using more data translated to better performance.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", 
"domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们还观察到使用更多数据可以转换为更好的性能。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/uZBWfYjYnf.seg_121.wav", "doc_id": "uZBWfYjYnf.seg_121", "src_text": "Thanks for your attention.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "感谢您的关注。", "score": 96.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dJGfOSFgZO.seg_330.wav", "doc_id": "dJGfOSFgZO.seg_330", "src_text": "You can see how the combination of all ABC-Eval metrics explains over 25% of conversation quality, and as you remove the metrics one at a time, most of them result in losing a decent amount of information about the quality.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "可以看到所有ABC-EVA指标的组合解释了超过25%的对话质量,并且当你逐一移除这些指标时,大多数指标都会导致失去大量关于质量的信息。", "score": 84.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/XejEJmgUmE.seg_769.wav", "doc_id": "XejEJmgUmE.seg_769", "src_text": "Please read our paper for more details of our experiments.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "请阅读我们的论文以获取更多实验细节。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/rISrKoXQCx.seg_587.wav", "doc_id": "rISrKoXQCx.seg_587", "src_text": "I think that's pretty much all I have for today.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我想那很棒,我想那是很多的——我已经", "score": 60.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/FLkGnzVRew.seg_649.wav", "doc_id": "FLkGnzVRew.seg_649", "src_text": "On collecting around 1,000 examples of discourse unit pairs, we ran training for an initial classifier trained only on 43 examples of dissonance.", "src_text_system": "human", "src_lang": "en", "tgt_lang": 
"zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "在收集约一千个对话单元对的例子后,我们只在四十三个不一致的例子上训练了一个初始分类器。", "score": 92.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/wLqFAuDnKa.seg_129.wav", "doc_id": "wLqFAuDnKa.seg_129", "src_text": "This involves using the latest test sets to avoid an overlap of the test data with the training data of the language model.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "这涉及使用最新的测试集,以避免测试数据与语言模型的训练数据重叠。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/GvEBWkLmuI.seg_860.wav", "doc_id": "GvEBWkLmuI.seg_860", "src_text": "So instead to do that, we'll turn to the results from our Marked Words method to show how these positive-seeming words facilitate stereotypes and essentializing narratives.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "所以我们将转向我们市场词汇方法的结果,以展示这些积极的看起来的单词如何促进了成见和本质化的叙述。", "score": 45.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/TVCREhgqUP.seg_47.wav", "doc_id": "TVCREhgqUP.seg_47", "src_text": "Hi!", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "你好,", "score": 96.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/aQpIWggfCo.seg_38.wav", "doc_id": "aQpIWggfCo.seg_38", "src_text": "We find CoScript shows high pluralism in the generated specific goals.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们发现CoScript在生成的特定目标中表现出高超理想主义;", "score": 57.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/XejEJmgUmE.seg_738.wav", "doc_id": "XejEJmgUmE.seg_738", "src_text": "These days large language models are coming up with longer and longer context windows.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": 
"acl", "tgt_system": "long_KIT_primary", "tgt_text": "这些日子里,大语言模型来得越来越多,越来越长的", "score": 90.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_233.wav", "doc_id": "oYCKgTzTDy.seg_233", "src_text": "In this setting, the source language is the same as target language, for example German to German or English to English.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "在这种设置中,源语言与目标语言相同,例如德语到德语或英语到英语。", "score": 96.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/hgIDlKNiFM.seg_439.wav", "doc_id": "hgIDlKNiFM.seg_439", "src_text": "Since then, this model has been adapted to many other languages, like in French with CamemBERT, and also in domains like biomedical with PubMedBERT and BioBERT and on clinical with ClinicalBERT, but mostly in English.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "从那以后,这种模型已经被适用于许多其他语言,例如法语,Camembert和其他域名,例如生物医学,Pametber和Biober,和临床,Clinicalber,但大多数是英语。", "score": 82.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dJGfOSFgZO.seg_335.wav", "doc_id": "dJGfOSFgZO.seg_335", "src_text": "They produce irrelevant information in around 15% of the responses, and they contradict themselves or their partner around 10% of the time.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "他们在回应中产生了15%的不相关信息,并且他们大约10%的时间与自己或他们的伙伴相互矛盾。", "score": 78.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/GvEBWkLmuI.seg_849.wav", "doc_id": "GvEBWkLmuI.seg_849", "src_text": "So for instance, the word \"warrior\" is usually associated with men.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "例如,“男人”一词通常与“男人”一词相关联,", "score": 83.0} {"audio_path": 
"data/iwslt25/IWSLT25INSTRUCT/segmented/gGbuDbHhyc.seg_349.wav", "doc_id": "gGbuDbHhyc.seg_349", "src_text": "In weakly supervised learning, training algorithms are proposed to robustly train neural networks under such label noise so that the trained models still generalize well.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "在弱视监督训练中,提出了训练算法,以便在这种标签“噪音”下对神经网络进行强大的训练,从而使训练模型仍然可以泛化。", "score": 87.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_223.wav", "doc_id": "oYCKgTzTDy.seg_223", "src_text": "The Lambda calculus is missing, or they're only evaluated on certain neural models.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "表。LamdaCocles消失了。或者,它们只在某些较新的模型上进行评估;", "score": 64.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_209.wav", "doc_id": "SLpqvupgvW.seg_209", "src_text": "If the language model has access to some partially overlapping background knowledge, then the accuracy is between 82 to 87%, which is more realistic.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "如果语言模型有对部分重叠的背景知识的访问,那么准确率在82%到87%之间,这更现实。", "score": 70.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/PIZEXUFLAR.seg_307.wav", "doc_id": "PIZEXUFLAR.seg_307", "src_text": "Thank you.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "谢谢。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/FLkGnzVRew.seg_672.wav", "doc_id": "FLkGnzVRew.seg_672", "src_text": "Feel free to get in touch with us if you have any questions.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "如果您有任何问题,请随时与我们联系。", "score": 100.0} 
{"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WBLMIsdIrq.seg_424.wav", "doc_id": "WBLMIsdIrq.seg_424", "src_text": "Now, we use the MuDA benchmark to evaluate models and we find that context-aware models are significantly more accurate than models that do not use context for certain discourse phenomena such as formality and lexical cohesion.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "现在我们使用Muda基准来评估模型,我们发现上下文词模型在某些话语现象,如正式性和词汇凝聚性方面的准确性远远高于不使用上下文的模型,但这些模型", "score": 87.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/PIZEXUFLAR.seg_300.wav", "doc_id": "PIZEXUFLAR.seg_300", "src_text": "So this shows the effect of different fine-tuning strategies on the model sensitivity.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "这表明模型敏感度的不同调节策略的效果:正如我们可以", "score": 59.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/XejEJmgUmE.seg_739.wav", "doc_id": "XejEJmgUmE.seg_739", "src_text": "So it's crucial that we evaluate the models' acceptability throughout the context window and that is what we are trying to do here.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "因此非常重要的是,我们通过整个联系窗口来评估模型的可接受性。我们正试图在这里做这件事:", "score": 90.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/PIZEXUFLAR.seg_269.wav", "doc_id": "PIZEXUFLAR.seg_269", "src_text": "There exist more than 1600 language-only instruction tasks.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "存在超过1,600种语言的指令任务,", "score": 73.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_783.wav", "doc_id": "WTTtiRKFZI.seg_783", "src_text": "So we get dependencies from the governor.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", 
"domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "因此我们可以依靠上级", "score": 75.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_712.wav", "doc_id": "oaOHnMCwad.seg_712", "src_text": "We also find most additional alignment with people who have a college education.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们还发现与拥有大学教育的人相比,我们与拥有大学教育的人有更强的联系。", "score": 50.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_212.wav", "doc_id": "SLpqvupgvW.seg_212", "src_text": "We've also shown that the models are domain-generalizable.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们还显示模型是域一般化的;", "score": 81.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/rISrKoXQCx.seg_565.wav", "doc_id": "rISrKoXQCx.seg_565", "src_text": "And we also try to investigate whether language models can pick up the polarisation that's prevalent in our modern society.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们也试图研究语言模型是否可以捕捉到我们现代社会中普遍存在的多元化。", "score": 77.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/PIZEXUFLAR.seg_264.wav", "doc_id": "PIZEXUFLAR.seg_264", "src_text": "So with the advances in large language models, many works started to explore new learning paradigms of reusing pre-trained language models for different downstream tasks in a parameter and data-efficient way.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "随着大型语言模型的进步,许多作品开始探索使用预训练语言模型来以参数和数据有效的方式实现不同下游任务的学习新范式。", "score": 77.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/XejEJmgUmE.seg_742.wav", "doc_id": "XejEJmgUmE.seg_742", "src_text": "So what we do is that to simulate these longer sequences, we revisit the 
data sets themselves and then we recreate sentences by choosing acceptable or unacceptable sentences from those datasets.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们要模拟这些更长的序列,我们会重新访问这些数据集本身,然后我们会重新创建句子,通过选择这些数据集中的可接受或不可接受的句子。", "score": 86.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/GvEBWkLmuI.seg_868.wav", "doc_id": "GvEBWkLmuI.seg_868", "src_text": "And finally, for black women, we see that some of the top words are things like \"strong\" and \"resilient\".", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "最后,对于黑人女性,我们看到一些顶词是像强壮和坚韧的。这与一个架构有关,即", "score": 42.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dvGkKzmIaN.seg_544.wav", "doc_id": "dvGkKzmIaN.seg_544", "src_text": "Thank you.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": ",我们", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/gGbuDbHhyc.seg_347.wav", "doc_id": "gGbuDbHhyc.seg_347", "src_text": "When compared to human annotations, the weaker annotations are much cheaper, yet they are also noisy, meaning that a certain amount of the annotations are incorrect.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": 
"6.77.78.79.80.81.82.83.84.85.86.87.88.89.90.91.92.93.94.95.96.97.98.99.100.101.102.103.104.105.106.107.108.109.110.111.112.113.114.115.116.117.118.119.120.121.122.123.124.125.126.127.128.129.130.131.132.133.134.135.136.137.138.139.140.141.142.143.144.145.146.147.148.149.150.151.152.153.154.155.156.157.158.159.160.161.162.163.164.165.166.167.168.169.170.171.172.173.174.175.176.177.178.179.180.181.182.183.184.185.186.187.188.189.190.191.192.193.194.195.196.197.198.199.200.201.202.203.204.205.206.207.208.209.210.211.212.213.214.215.216.217.218.219.220.221.222.223.224.225.226.227.228.229.230.231.232.233.234.235.236.237.238.239.240.241.242.243.244.245.246.247.248.249.250.251.252.253.254.255.256.257.258.259.260.261.262.263.264.265.266.267.268.269.270.271.272.273.274.275.276.277.278.279.280.281.282.283.284.285.286.287.288.289.290.291.292.293.294.295.296.297.298.299.300.301.302.303.304.305.306.307.308.309.310.311.312.313.314.315.316.317.318.319.320.321.322.323.324.325.326.327.328.329.330.331.332.333.334.335.336.337.338.339.340.341.342.343.344.345.346.347.348.349.350.351.352.353.354.355.356.357.358.359.360.361.362.363.364.365.366.367.368.369.370.371.372.373.374.375.376.377.378.379.380.381.382.383.384.385.386.387.388.389.390.391.392.393.394.395.396.397.398.399.400.401.402.403.404.405.406.407.408.409.410.411.412.413.414.415.416.417.418.419.420.421.422.423.424.425.426.427.428.429.430.431.432.433.434.435.436.437.438.439.440.441.442.443.444.445.446.447.448.449.450.451.452.453.454.455.456.457.458.459.460.461.462.463.464.465.466.467.468.469.470.471.472.473.474.475.476.477.478.479.480.481.482.483.484.485.486.487.488.489.490.491.492.493.494.495.496.497.498.499.500.501.502.503.504.505.506.507.508.509.510.511.512.513.514.515.516.517.518.519.520.521.522.523.524.525.526.527.528.529.530.531.532.533.534.535.536.537.538.539.540.541.542.543.544.545.546.547.548.549.550.551.552.553.554.555.556.557.558.559.560.561.562.563.564.565.566.567.568.569.570.571.572.573.574.575.576.577.578.579.580.581.
582.583.584.585.586.587.588.589.590.591.592.593.594.595.596.597.598.599.600.601.602.603.604.605.606.607.608.609.610.611.612.613.614.615.616.617.618.619.620.621.622.623.624.625.626.627.628.629.630.631.632.633.634.635.636.637.638.639.640.641.642.643.644.645.646.647.648.649.650.651.652.653.654.655.656.657.658.659.660.661.662.663.664.665.666.667.668.669.670.671.672.673.674.675.676.677.678.679.680.681.682.683.684.685.686.687.688.689.690.691.692.693.694.695.696.697.698.699.700.701.702.703.704.705.706.707.708.709.710.711.712.713.714.715.716.717.718.719.720.721.722.723.724.725.726.727.728.729.730.731.732.733.734.735.736.737.738.739.740.741.742.743.744.745.746.747.748.749.750.751.752.753.754.755.756.757.758.759.760.761.762.763.764.765.766.767.768.769.770.771.772.773.774.775.776.777.778.779.780.781.782.783.784.785.786.787.788.789.790.791.792.793.794.795.796.797.798.799.800.801.802.803.804.805.806.807.808.809.810.811.812.813.814.815.816.817.818.819.820.821.822.823.824.825.826.827.828.829.830.831.832.833.834.835.836.837.838.839.840.841.842.843.844.845.846.847.848.849.850.851.852.853.854.855.856.857.858.859.860.861.862.863.864.865.866.867.868.869.870.871.872.873.874.875.876.877.878.879.880.881.882.883.884.885.886.887.888.889.890.891.892.893.894.895.896.897.898.899.900.901.902.903.904.905.906.907.908.909.910.911.912.913.914.915.916.917.918.919.920.921.922.923.924.925.926.927.928.929.930.931.932.933.934.935.936.937.938.939.940.941.942.943.944.945.946.947.948.949.950.951.952.953.954.955.956.957.958.959.960.961.962.963.964.965.966.967.968.969.970.971.972.973.974.975.976.977.978.979.980.981.982.983.984.985.986.987.988.989.990.991.992.993.994.995.996.997.998.999.1000.1001.1002.1003.1004.1005.1006.1007.1008.1009.1010.1011.1012.1013.1014.1015.1016.1017.1018.1019.1020.1021.1022.1023.1024.1025.1026.1027.1028.1029.1030.1031.1032.1033.1034.1035.1036.1037.1038.1039.1040.1041.1042.1043.1044.1045.1046.1047.1048.1049.1050.1051.1052.1053.1054.1055.1056.1057.1058.1059.1060.1061.1062.1063.1064.106
5.1066.1067.1068.1069.1070.1071.1072.1073.1074.1075.1076.1077.1078.1079.1080.1081.1082.1083.1084.1085.1086.1087.1088.1089.1090.1091.1092.1093.1094.1095.1096.1097.1098.1099.1100.1101.1102.1103.1104.1105.1106.1107.1108.1109.1110.1111.1112.1113.1114.1115.1116.1117.1118.1119.1120.1121.1122.1123.1124.1125.1126.1127.1128.1129.1130.1131.1132.1133.1134.1135.1136.1137.1138.1139.1140.1141.1142.1143.1144.1145.1146.1147.1148.1149.1150.1151.1152.1153.1154.1155.1156.1157.1158.1159.1160.1161.1162.1163.1164.1165.1166.1167.1168.1169.1170.1171.1172.1173.1174.1175.1176.1177.1178.1179.1180.1181.1182.1183.1184.1185.1186.1187.1188.1189.1190.1191.1192.1193.1194.1195.1196.1197.1198.1199.1200.1201.1202.1203.1204.1205.1206.1207.1208.1209.1210.1211.1212.1213.1214.1215.1216.1217.1218.1219.1220.1221.1222.1223.1224.1225.1226.1227.1228.1229.1230.1231.1232.1233.1234.1235.1236.1237.1238.1239.1240.1241.1242.1243.1244.1245.1246.1247.1248.1249.1250.1251.1252.1253.1254.1255.1256.1257.1258.1259.1260.1261.1262.1263.1264.1265.1266.1267.1268.1269.1270.1271.1272.1273.1274.1275.1276.1277.1278.1279.1280.1281.1282.1283.1284.1285.1286.1287.1288.1289.1290.1291.1292.1293.1294.1295.1296.1297.1298.1299.1300.1301.1302.1303.1304.1305.1306.1307.1308.1309.1310.1311.1312.1313.1314.1315.1316.1317.1318.1319.1320.1321.1322.1323.1324.1325.1326.1327.1328.1329.1330.1331.1332.1333.1334.1335.1336.1337.1338.1339.1340.1341.1342.1343.1344.1345.1346.1347.1348.1349.1350.1351.1352.1353.1354.1355.1356.1357.1358.1359.1360.1361.1362.1363.1364.1365.1366.1367.1368.1369.1370.1371.1372.1373.1374.1375.1376.1377.1378.1379.1380.1381.1382.1383.1384.1385.1386.1387.1388.1389.1390.1391.1392.1393.1394.1395.1396.1397.1398.1399.1400.1401.1402.1403.1404.1405.1406.1407.1408.1409.1410.1411.1412.1413.1414.1415.1416.1417.1418.1419.1420.1421.1422.1423.1424.1425.1426.1427.1428.1429.1430.1431.1432.1433.1434.1435.1436.1437.1438.1439.1440.1441.1442.1443.1444.1445.1446.1447.1448.1449.1450.1451.1452.1453.1454.1455.1456.1457.1458.1459.1460.1461.1462.1463.1464.146
5.1466.1467.1468.1469.1470.1471.1472.1473.1474.1475.1476.1477.1478.1479.1480.1481.1482.1483.1484.1485.1486.1487.1488.1489.1490.1491.1492.1493.1494.1495.1496.1497.1498.1499.1500.1501.1502.1503.1504.1505.1506.1507.1508.1509.1510.1511.1512.1513.1514.1515.1516.1517.1518.1519.1520.1521.1522.1523.1524.1525.1526.1527.1528.1529.1530.1531.1532.1533.1534.1535.1536.1537.1538.1539.1540.1541.1542.1543.1544.1545.1546.1547.1548.1549.1550.1551.1552.1553.1554.1555.1556.1557.1558.1559.1560.1561.1562.1563.1564.1565.1566.1567.1568.1569.1570.1571.1572.1573.1574.1575.1576.1577.1578.1579.1580.1581.1582.1583.1584.1585.1586.1587.1588.1589.1590.1591.1592.1593.1594.1595.1596.1597.1598.1599.1600.1601.1602.1603.1604.1605.1606.1607.1608.1609.1610.1611.1612.1613.1614.1615.1616.1617.1618.1619.1620.1621.1622.1623.1624.1625.1626.1627.1628.1629.1630.1631.1632.1633.1634.1635.1636.1637.1638.1639.1640.1641.1642.1643.1644.1645.1646.1647.1648.1649.1650.1651.1652.1653.1654.1655.1656.1657.1658.1659.1660.1661.1662.1663.1664.1665.1666.1667.1668.1669.1670.1671.1672.1673.1674.1675.1676.1677.1678.1679.1680.1681.1682.1683.1684.1685.1686.1687.1688.1689.1690.1691.1692.1693.1694.1695.1696.1697.1698.1699.1700.1701.1702.1703.1704.1705.1706.1707.1708.1709.1710.1711.1712.1713.1714.1715.1716.1717.1718.1719.1720.1721.1722.1723.1724.1725.1726.1727.1728.1729.1730.1731.1732.1733.1734.1735.1736.1737.1738.1739.1740.1741.1742.1743.1744.1745.1746.1747.1748.1749.1750.1751.1752.1753.1754.1755.1756.1757.1758.1759.1760.1761.1762.1763.1764.1765.1766.1767.1768.1769.1770.1771.1772.1773.1774.1775.1776.1777.1778.1779.1780.1781.1782.1783.1784.1785.1786.1787.1788.1789.1790.1791.1792.1793.1794.1795.1796.1797.1798.1799.1800.1801.1802.1803.1804.1805.1806.1807.1808.1809.1810.1811.1812.1813.1814.1815.1816.1817.1818.1819.1820.1821.1822.1823.1824.1825.1826.1827.1828.1829.1830.1831.1832.1833.1834.1835.1836.1837.1838.1839.1840.1841.1842.1843.1844.1845.1846.1847.1848.1849.1850.1851.1852.1853.1854.1855.1856.1857.1858.1859.1860.1861.1862.1863.1864.186
5.1866.1867.1868.1869.1870.1871.1872.1873.1874.1875.1876.1877.1878.1879.1880.1881.1882.1883.1884.1885.1886.1887.1888.1889.1890.1891.1892.1893.1894.1895.1896.1897.1898.1899.1900.1901.1902.1903.1904.1905.1906.1907.1908.1909.1910.1911.1912.1913.1914.1915.1916.1917.1918.1919.1920.1921.1922.1923.1924.1925.1926.1927.1928.1929.1930.1931.1932.1933.1934.1935.1936.1937.1938.1939.1940.1941.1942.1943.1944.1945.1946.1947.1948.1949.1950.1951.1952.1953.1954.1955.1956.1957.1958.1959.1960.1961.1962.1963.1964.1965.1966.1967.1968.1969.1970.1971.1972.1973.1974.1975.1976.1977.1978.1979.1980.1981.1982.1983.1984.1985.1986.1987.1988.1989.1990.1991.1992.1993.1994.1995.1996.1997.1998.1999.2000.2001.2002.2003.2004.2005.2006.2007.2008.2009.2010.2011.2012.2013.2014.2015.2016.2017.2018.2019.2020.2021.2022.2023.2024.2025.2026.2027.2028.2029.2030.2031.2032.2033.2034.2035.2036.2037.2038.2039.2040.2041.2042.2043.2044.2045.2046.2047.2048.2049.2050.2051.2052.2053.2054.2055.2056.2057.2058.2059.2060.2061.2062.2063.2064.2065.2066.2067.2068.2069.2070.2071.2072.2073.2074.2075.2076.2077.2078.2079.2080.2081.2082.2083.2084.2085.2086.2087.2088.2089.2090.2091.2092.2093.2094.2095.2096.2097.2098.2099.2100.2101.2102.2103.2104.2105.2106.2107.2108.2109.2110.2111.2112.2113.2114.2115.2116.2117.2118.2119.2120.2121.2122.2123.2124.2125.2126.2127.2128.2129.2130.2131.2132.2133.2134.2135.2136.2137.2138.2139.2140.2141.2142.2143.2144.2145.2146.2147.2148.2149.2150.2151.2152.2153.2154.2155.2156.2157.2158.2159.2160.2161.2162.2163.2164.2165.2166.2167.2168.2169.2170.2171.2172.2173.2174.2175.2176.2177.2178.2179.2180.2181.2182.2183.2184.2185.2186.2187.2188.2189.2190.2191.2192.2193.2194.2195.2196.2197.2198.2199.2200.2201.2202.2203.2204.2205.2206.2207.2208.2209.2210.2211.2212.2213.2214.2215.2216.2217.2218.2219.2220.2221.2222.2223.2224.2225.2226.2227.2228.2229.2230.2231.2232.2233.2234.2235.2236.2237.2238.2239.2240.2241.2242.2243.2244.2245.2246.2247.2248.这是一项与肖尤谢夫、马里奥斯·穆斯巴赫、加斯·斯蒂芬和迪特里希·克拉科夫合作的研究。我想从弱监督和弱监督学习的简要介绍开始。", "score": 0.0} 
{"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_229.wav", "doc_id": "oYCKgTzTDy.seg_229", "src_text": "The first one is Translate-Test.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "第一个是翻译测试:", "score": 86.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/hgIDlKNiFM.seg_438.wav", "doc_id": "hgIDlKNiFM.seg_438", "src_text": "Since its release in 2018, BERT has become one of the most effective approach to solve natural language processing tasks and offers huge performance gains compared to historical static and contextualized methods such as Word2vec, fastText, or more.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "自2018年发布以来,BERT已经成为解决自然语言处理任务的最有效方法之一,并且在与历史静态和上下文化方法(如Word2Vec)相比,提供了巨大的性能提升。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/hgIDlKNiFM.seg_446.wav", "doc_id": "hgIDlKNiFM.seg_446", "src_text": "To answer this question, we first train and compare four from-scratch models: a first version of DrBERT, with 7 GB of NACHOS; a second version of 4 GB of set of NACHOS; a first version of ChuBERT, which is a clinical model with 4 GB of sentences taken from clinical notes; and a final version of ChuBERT with a mix of 4 GB of set of NACHOS and 4 GB of clinical notes.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "为了回答这些问题,我们首先训练和比较四个从头开始的模型:BERT-base、BERT-large、RoBERTa-base和RoBERTa-large。我们将在此基础上进行以下实验:我们将ShuBERT的四个版本(一个是四个GB的自然语言处理数据集,一个是四个GB的临床注释数据集,一个是四个GB的自然语言处理数据集和四个GB的临床注释数据集的混合,另外一个是四个GB的临床注释数据集)与ShuBERT的四个", "score": 50.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_210.wav", "doc_id": "SLpqvupgvW.seg_210", "src_text": "For example, when the language model retrieves the background knowledge.", "src_text_system": "human", 
"src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "例如,当语言模型检索背景知识时,", "score": 97.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/PIZEXUFLAR.seg_271.wav", "doc_id": "PIZEXUFLAR.seg_271", "src_text": "Therefore, this motivates us to build a multi-modal instruction tuning dataset.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "因此这促使我们建立多模态指令调节。”", "score": 88.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_202.wav", "doc_id": "SLpqvupgvW.seg_202", "src_text": "For example, the one with the piano music.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "例如,带有钢琴音乐的实体。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/GvEBWkLmuI.seg_853.wav", "doc_id": "GvEBWkLmuI.seg_853", "src_text": "So for instance, for the personas of black women, we would do Fightin’ Words and compare the log-odds ratios against both white personas and man personas because those are the two corresponding unmarked groups.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "因此,对于黑人女性,我们将使用词语对比,并将对比的词语频率与白人男性和黑人男性进行比较,因为这两个群体是对应的未标记组。", "score": 71.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_682.wav", "doc_id": "oaOHnMCwad.seg_682", "src_text": "This is an example of a design bias where we see systematic performance differences of technology between populations.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "这是一个设计偏差的例子,我们看到技术在不同的人群之间存在系统性性能差异。设计师的视角就", "score": 58.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/hgIDlKNiFM.seg_453.wav", "doc_id": "hgIDlKNiFM.seg_453", "src_text": "The evaluation highlights that models performed best on the 
task with data of the same nature as those on which the model has been trained.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "然而,我们可以从异源数据中获得数据,", "score": 0.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/PIZEXUFLAR.seg_302.wav", "doc_id": "PIZEXUFLAR.seg_302", "src_text": "We also can see transfer learning from natural instruction datasets can help OFA to attain much better performance on the natural instruct dataset.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们还可以看到,基于自然指令数据集的迁移学习,可以帮助OWA在自然指令数据集上获得更好的性能。", "score": 90.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/TVCREhgqUP.seg_67.wav", "doc_id": "TVCREhgqUP.seg_67", "src_text": "For the first time, we show strong generalization to deeper recursion without relying on trees.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们首次展示了不依赖树结构的情况下,神经序列到序列模型可以对更深层次的递归进行强大的泛化。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_784.wav", "doc_id": "WTTtiRKFZI.seg_784", "src_text": "Here loves to all conjuncts separately: Lisa, Bart, and Maggie.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "的爱护而获得依赖。现在,", "score": 0.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/rISrKoXQCx.seg_577.wav", "doc_id": "rISrKoXQCx.seg_577", "src_text": "For example, if right-leaning language models were to be fine-tuned on hate speech or misinformation or whatever and deployed to a popular social media platform, this would mean that, people with opposite political opinions might be marginalised and hate speech targeting minority groups might just run rampant without any control.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", 
"domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "例如,如果正确的语言模型可以对恨语或任何信息进行处理,并部署到流行的社交媒体平台上。这将意味着持有相反政治观点的人可能会被边缘化,而针对少数民族的仇恨言论可能会疯狂地蔓延而没有任何控制。", "score": 91.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/FLkGnzVRew.seg_672.wav", "doc_id": "FLkGnzVRew.seg_672", "src_text": "Feel free to get in touch with us if you have any questions.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "如果您有任何问题,请随时与我们联系。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/XejEJmgUmE.seg_743.wav", "doc_id": "XejEJmgUmE.seg_743", "src_text": "So for example, here we have chosen like a typical pair of grammaticality from the BLiMP data set from the Adjunct Island case.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "例如,在这里,我们选择了来自Blimp数据集的“典型”一对语法性", "score": 60.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/uZBWfYjYnf.seg_108.wav", "doc_id": "uZBWfYjYnf.seg_108", "src_text": "This means that the first two words will be emitted while since the sum of the cross-attention is above a certain threshold alpha, we will not emit the last word and we wait for another speech chunk.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "这意味着前两个单词将会发出虽然大众注意力集中在某个临界点上,但我们不会说出最后的话,等待另一个演讲段落。", "score": 63.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dJGfOSFgZO.seg_336.wav", "doc_id": "dJGfOSFgZO.seg_336", "src_text": "With the rapid pace of improvement in the field, many of these error rates could see a decrease in new models released since our evaluation was conducted.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "由于快速的改进,许多这些错误率可以在新模型发布后看到降低", "score": 90.0} {"audio_path": 
"data/iwslt25/IWSLT25INSTRUCT/segmented/XejEJmgUmE.seg_739.wav", "doc_id": "XejEJmgUmE.seg_739", "src_text": "So it's crucial that we evaluate the models' acceptability throughout the context window and that is what we are trying to do here.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "上下文窗口,所以很关键的是我们要评估这些模型在整个上下文窗口的可接受性,而这正是我们试图做的:", "score": 62.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/rISrKoXQCx.seg_561.wav", "doc_id": "rISrKoXQCx.seg_561", "src_text": "Secondly, we aim to investigate to which extent the political biases of language models are actually picked up from training data.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "第二,我们旨在调查语言模型的政治偏见实际上从训练数据中被提取的程度,", "score": 87.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/rISrKoXQCx.seg_564.wav", "doc_id": "rISrKoXQCx.seg_564", "src_text": "For example, for RoBERTa further trained on the left-leaning Reddit corpus we can see a substantial liberal shift in terms of its political biases.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "例如,罗伯塔在左侧的红色身体上进行了进一步的训练,我们可以看到它在这种方面有很大的自由。就其政治立场而言", "score": 36.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/gGbuDbHhyc.seg_358.wav", "doc_id": "gGbuDbHhyc.seg_358", "src_text": "We addressed these research questions in our work and our findings are as follows.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们在工作中解决了这些研究问题,结果如下。", "score": 90.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WBLMIsdIrq.seg_398.wav", "doc_id": "WBLMIsdIrq.seg_398", "src_text": "In the previous work, we introduced CXMI as a measure for context usage by machine translation models.", "src_text_system": "human", "src_lang": 
"en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "在以前的工作中,我们引入了CSMI作为机器传输模型对联系人使用的测量指标,并", "score": 92.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dvGkKzmIaN.seg_501.wav", "doc_id": "dvGkKzmIaN.seg_501", "src_text": "It's my pleasure to give a short advertisement video of our paper.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我很高兴能给你们看一下我们论文的短视频:", "score": 90.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/TVCREhgqUP.seg_51.wav", "doc_id": "TVCREhgqUP.seg_51", "src_text": "In the context of semantic parsing, testing for compositional generalization might look like this.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "在语义分析的背景下,测试组成的普遍化可能看起来像这", "score": 57.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/hgIDlKNiFM.seg_455.wav", "doc_id": "hgIDlKNiFM.seg_455", "src_text": "We also observe that using more data translated to better performance.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们也观察到使用更多数据可以转化为更好的性能。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/XejEJmgUmE.seg_757.wav", "doc_id": "XejEJmgUmE.seg_757", "src_text": "Now, what happens when we choose sentences from the same data set?", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "现在我们从同一数据集中选择句子会发生什么?因此,", "score": 86.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/FLkGnzVRew.seg_673.wav", "doc_id": "FLkGnzVRew.seg_673", "src_text": "Thank you.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "感谢。", "score": 95.0} {"audio_path": 
"data/iwslt25/IWSLT25INSTRUCT/segmented/dJGfOSFgZO.seg_337.wav", "doc_id": "dJGfOSFgZO.seg_337", "src_text": "However, this is all the more reason to pursue reliable and precise evaluation metrics for comparing models.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "然而,这更是为了追求可靠和精确的评估指标来比较模型的理由。", "score": 94.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/rISrKoXQCx.seg_577.wav", "doc_id": "rISrKoXQCx.seg_577", "src_text": "For example, if right-leaning language models were to be fine-tuned on hate speech or misinformation or whatever and deployed to a popular social media platform, this would mean that, people with opposite political opinions might be marginalised and hate speech targeting minority groups might just run rampant without any control.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "例如,如果要训练语言模型来识别仇恨言论或信息等,并将其部署到流行的社交媒体平台上,这意味着人们持有相反的政治观点的人可能会被边缘化,而仇恨言论目标少数群体可能会毫无控制地疯狂地疯狂地疯狂地疯狂地", "score": 91.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/rISrKoXQCx.seg_573.wav", "doc_id": "rISrKoXQCx.seg_573", "src_text": "And vice versa, right-leaning language models are better at detecting hate speech targeting white and men, however worse at detecting hate speech targeting at black LGBTQ plus and other minority communities.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "反,正确的语言模型更好地检测到目标白人和黑人,尽管在检测黑人LGBTQ+和其他少数民族时更好。", "score": 79.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_225.wav", "doc_id": "oYCKgTzTDy.seg_225", "src_text": "So to this end we propose XSemPLR.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "因此,我们提出了示例:", "score": 83.0} {"audio_path": 
"data/iwslt25/IWSLT25INSTRUCT/segmented/GvEBWkLmuI.seg_862.wav", "doc_id": "GvEBWkLmuI.seg_862", "src_text": "First, from our groups, the top words include things like \"culture\", \"tradition\", \"proud\", and \"exotic\".", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "首先,对于标记组,顶部的单词包括文化、传统、自豪和异国的词汇,", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/PIZEXUFLAR.seg_265.wav", "doc_id": "PIZEXUFLAR.seg_265", "src_text": "Recently, many studies have shown that instruction tuning enables large language models to perform on unseen tasks in a zero-shot manner by following natural instructions.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "最近,许多研究表明,指令调节使大型语言模型能够以自然指令的方式在未见过的任务上进行彻底的工作。", "score": 87.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dJGfOSFgZO.seg_329.wav", "doc_id": "dJGfOSFgZO.seg_329", "src_text": "Finally, we checked whether each evaluation metric captures a unique aspect of chat quality using a stepwise linear regression.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "最后,我们检查了每个评估指标是否使用逐步线性回归捕捉了检查质量的独特方面。你", "score": 84.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/rISrKoXQCx.seg_574.wav", "doc_id": "rISrKoXQCx.seg_574", "src_text": "Similar trends also happen for fake news detection, where we see that left-leaning language models are better at detecting misinformation from their opposite political leaning and vice versa.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "类似的趋势也发生在假新闻检测中,我们看到正确的语言模型更好地检测来自相反政治立场的信息。在此,", "score": 60.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dJGfOSFgZO.seg_312.wav", "doc_id": "dJGfOSFgZO.seg_312", "src_text": "So let's say that you just 
developed a dialogue model and you want to see how well it compares against the current state-of-the-art.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "假设您刚刚开发了一个对话模型,您想看看它与当前的最佳实践相比如何。", "score": 90.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oeooqChmKK.seg_597.wav", "doc_id": "oeooqChmKK.seg_597", "src_text": "In this work, we propose a diagnostic test suite for knowledge integration.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "在本研究中,我们提出了一套诊断测试套件用于知识融合。", "score": 81.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/wLqFAuDnKa.seg_122.wav", "doc_id": "wLqFAuDnKa.seg_122", "src_text": "Hello everyone, my name is David Vilar, and I will be giving a short review of the paper \"Prompting PaLM for Translation: Assessing Strategies and Performance.\"", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": ":::::::::::::::::::::::::::::::::::::::::::::::", "score": 0.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/wLqFAuDnKa.seg_135.wav", "doc_id": "wLqFAuDnKa.seg_135", "src_text": "The difference observed is of more than one BLEURT points.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "516/1000)观察到的差异大于1个模糊点。", "score": 92.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/hgIDlKNiFM.seg_450.wav", "doc_id": "hgIDlKNiFM.seg_450", "src_text": "In total, we have seven models.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们有7个模型。", "score": 92.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/GvEBWkLmuI.seg_844.wav", "doc_id": "GvEBWkLmuI.seg_844", "src_text": "Our prompts to generate these personas were inspired by a 
study where they gave these prompts to human subjects, finding that by giving it to human subjects, they also were able to surface racial stereotypes.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们的促发词,生成这些人格特征的灵感来自一项研究,研究中他们把这些促发词给了人类受试者,发现人类受试者也能通过这些促发词表达出种族刻板印象。", "score": 71.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/uZBWfYjYnf.seg_110.wav", "doc_id": "uZBWfYjYnf.seg_110", "src_text": "This means that these three words will be emitted.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "这意味着这三个单词将会被发出。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dvGkKzmIaN.seg_534.wav", "doc_id": "dvGkKzmIaN.seg_534", "src_text": "The cosine and L2 similarity between the requested embedding and the target embedding are computed.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "计算了请求的嵌入和目标嵌入之间的余弦相似度和余弦相似度。", "score": 61.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/rISrKoXQCx.seg_559.wav", "doc_id": "rISrKoXQCx.seg_559", "src_text": "They occupy all four quadrants on the political campus.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们可以看到,政治背景下", "score": 0.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/TVCREhgqUP.seg_50.wav", "doc_id": "TVCREhgqUP.seg_50", "src_text": "Compositional generalization can be understood as the ability of a learner to handle deeper recursion and unseen compositions of phrases that have been seen individually during training.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "可以理解为学习者在训练过程中看到的未见过的组合和未见过的组合的能力。", "score": 55.0} {"audio_path": 
"data/iwslt25/IWSLT25INSTRUCT/segmented/GvEBWkLmuI.seg_834.wav", "doc_id": "GvEBWkLmuI.seg_834", "src_text": "To overcome these limitations, we rely on the property that these newer instruction-tuned LLMs are very good at responding to instructions and prompts.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "为了克服这些限制,我们依赖于这些新型指令调整的内核很好地响应指令和提示的属性。", "score": 80.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/gGbuDbHhyc.seg_356.wav", "doc_id": "gGbuDbHhyc.seg_356", "src_text": "Second, if clean data is required, or if clean data is mandatory for WSL to work, then how many clean samples do we need?", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "其次,如果需要清洁数据,或者清洁数据是WSSL工作的必需品,那么我们需要多少清洁样本?", "score": 75.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/uZBWfYjYnf.seg_97.wav", "doc_id": "uZBWfYjYnf.seg_97", "src_text": "Long and complicated training procedures, for example, training involving different optimization objectives.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "漫长而复杂的训练程序,例如涉及不同优化目标的训练。", "score": 90.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WBLMIsdIrq.seg_397.wav", "doc_id": "WBLMIsdIrq.seg_397", "src_text": "To answer the first question, we started by measuring how much a word depends on context during translation.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "为了回答第一个问题,我们从测量翻译过程中语词依赖于语境的程度开始。", "score": 91.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WBLMIsdIrq.seg_406.wav", "doc_id": "WBLMIsdIrq.seg_406", "src_text": "And this allows us to find, for example, dual pronouns in Arabic that have relatively high P-CXMI.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", 
"domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": ",这使我们能够找到例如阿拉伯语中具有相对高PCSM的双名词的例子,", "score": 77.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_679.wav", "doc_id": "oaOHnMCwad.seg_679", "src_text": "Where prospective API is able to detect correctly toxic instances.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "有毒实例的预期API。", "score": 50.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/gGbuDbHhyc.seg_367.wav", "doc_id": "gGbuDbHhyc.seg_367", "src_text": "As we can see, if we have 10 samples per class, direct fine-tuning starts to beat WSL approaches.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们可以看到,如果我们有十个样本/类别,直接精调开始击败WSL方法。", "score": 89.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/hgIDlKNiFM.seg_445.wav", "doc_id": "hgIDlKNiFM.seg_445", "src_text": "Is it 4 gigabytes, 8 gigabytes, or more?", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "是4GB、8GB还是更多?", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/gGbuDbHhyc.seg_378.wav", "doc_id": "gGbuDbHhyc.seg_378", "src_text": "Third, continuous fine-tuning is a simple yet strong baseline that should be considered in future work in WSL.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "第三,连续的微调是一条应该在未来WLS工作中考虑的简单而又强大的基线。", "score": 78.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/gGbuDbHhyc.seg_362.wav", "doc_id": "gGbuDbHhyc.seg_362", "src_text": "This indicates that WSL approaches actually require cleanly labeled data to work properly, and the annotation cost for obtaining clean validation samples should not be overlooked.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", 
"domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "这表明WSL方法实际上需要清洁标签的数据才能正常工作,并且获取清洁验证样本的注释成本不应该被忽视。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/PIZEXUFLAR.seg_272.wav", "doc_id": "PIZEXUFLAR.seg_272", "src_text": "Here we present MultiInstruct, the first multi-modal instruction tuning benchmark dataset that consists of 62 diverse multi-modal tasks covering 10 broad categories.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "在这里,我们介绍MultiInstruct:第一个多模态指令调音标数据集,该数据集由62个不同的多模态任务组成,涵盖10个类别。", "score": 79.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/FLkGnzVRew.seg_665.wav", "doc_id": "FLkGnzVRew.seg_665", "src_text": "On further rounds of AL with two best strategies, we improve dissonance classification AUC to 0.75, which is the best performance that we have on the task so far.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "们采用了两种最佳策略,提高了AUC到7.5点,这是我们迄今为止在任务上取得的最好的成绩。", "score": 55.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/XejEJmgUmE.seg_749.wav", "doc_id": "XejEJmgUmE.seg_749", "src_text": "So here the sentences are still coming from a, relevant data sets but it's not from the same data set that you are evaluating with.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "这里,句子仍然来自相关数据集,但它不是您正在评估的相同数据集,", "score": 93.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/TVCREhgqUP.seg_61.wav", "doc_id": "TVCREhgqUP.seg_61", "src_text": "The trees are intended to capture the compositional process that relates utterances with the logical forms.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "树的目的是捕捉与逻辑形式相关的表达的组合过程。", "score": 90.0} {"audio_path": 
"data/iwslt25/IWSLT25INSTRUCT/segmented/XejEJmgUmE.seg_752.wav", "doc_id": "XejEJmgUmE.seg_752", "src_text": "So this will tell us like whether the models acceptability judgments are actually impacted by any context, like, whether the context is coming from a different subset of the data set, or whether it's like completely irrelevant, to the current like to the sentence that we are looking at.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "这将告诉我们是否模型的可接受性判断实际上受到任何上下文的影响。就像从数据集的不同子集中获取的内容,还是完全与当前句子无关。", "score": 73.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_215.wav", "doc_id": "oYCKgTzTDy.seg_215", "src_text": "Hello everyone, my name is Yusen Zhang from the Penn State University.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "今天,我", "score": 0.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/uZBWfYjYnf.seg_119.wav", "doc_id": "uZBWfYjYnf.seg_119", "src_text": "If you want to discover more results, read our paper.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "如果您想了解更多结果,请阅读我们的论文,并", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dvGkKzmIaN.seg_541.wav", "doc_id": "dvGkKzmIaN.seg_541", "src_text": "The legend of the figures means the number of triggers in each sentence.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "数字的传说意味着每个句子的触发器数量。", "score": 75.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/PIZEXUFLAR.seg_304.wav", "doc_id": "PIZEXUFLAR.seg_304", "src_text": "We design a new metric called sensitivity.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们设计了一个新的传感器,称为敏感度。", "score": 
52.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/PIZEXUFLAR.seg_273.wav", "doc_id": "PIZEXUFLAR.seg_273", "src_text": "These tasks are derived from 21 existing open-source dataset and each task is equipped with five expert written instructions.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "这些任务是从现有的21个开源数据集派生出来的,每个任务都配备了五个额外的写入指令。", "score": 62.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/XejEJmgUmE.seg_729.wav", "doc_id": "XejEJmgUmE.seg_729", "src_text": "I'm Koustav Sinha, and I'm pleased to welcome you to our talk of our ACL 2023 paper.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我是科斯塔夫·塞纳。很高兴能在我们的ACLU2023年文件中与大家讨论", "score": 91.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/aQpIWggfCo.seg_6.wav", "doc_id": "aQpIWggfCo.seg_6", "src_text": "Planning for the goals with specific constraints, such as \"make a chocolate cake\", still remains under-studied.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "规划具有特定目标和约束的目标,如制作巧克力蛋糕仍然是一个未被研究的领域。", "score": 71.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_162.wav", "doc_id": "SLpqvupgvW.seg_162", "src_text": "Our goal is to understand users’ language when they want to make a choice.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们的目标是理解用户在选择时的语言;", "score": 74.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WBLMIsdIrq.seg_416.wav", "doc_id": "WBLMIsdIrq.seg_416", "src_text": "And we called our tagger the Multilingual Discourse-Aware, or MuDA tagger.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们称我们的标签为多语言讨论意识的乌玛达标签。", "score": 40.0} 
{"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/FLkGnzVRew.seg_638.wav", "doc_id": "FLkGnzVRew.seg_638", "src_text": "And they have a consonance relationship.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "并且他们有共享关系。", "score": 93.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/PIZEXUFLAR.seg_307.wav", "doc_id": "PIZEXUFLAR.seg_307", "src_text": "Thank you.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "谢谢。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/PIZEXUFLAR.seg_270.wav", "doc_id": "PIZEXUFLAR.seg_270", "src_text": "However, there is no large-scale publicly-available multi-modal instruction task.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "但没有大规模的公共可用的多模态指令任务,", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_692.wav", "doc_id": "oaOHnMCwad.seg_692", "src_text": "We do this through our framework NLPositionality.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们通过我们的框架(NL位置性)来实现这一点。", "score": 90.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/XejEJmgUmE.seg_741.wav", "doc_id": "XejEJmgUmE.seg_741", "src_text": "So that is the approach.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "这就是我们的方法。", "score": 86.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_187.wav", "doc_id": "SLpqvupgvW.seg_187", "src_text": "Where A and B are samples from Wikipedia.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "A和B是来自维基百科的样本。", "score": 92.0} {"audio_path": 
"data/iwslt25/IWSLT25INSTRUCT/segmented/aQpIWggfCo.seg_30.wav", "doc_id": "aQpIWggfCo.seg_30", "src_text": "Since large language models are costly to deploy, it's essential to enable language planning ability of smaller and specialized models.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "由于大型语言模型的部署成本很高,因此必须允许使用较小的和专门的模型进行语言规划。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SUkmfOTvGi.seg_466.wav", "doc_id": "SUkmfOTvGi.seg_466", "src_text": "Our paper investigated the problem of generalization using the Named Entity Recognition Task or the NER task.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们的论文使用命名实体识别任务或NER任务来研究泛化问题。", "score": 95.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SUkmfOTvGi.seg_480.wav", "doc_id": "SUkmfOTvGi.seg_480", "src_text": "The second ingredient is the model size.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "第二个成分是模型大小。", "score": 93.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oeooqChmKK.seg_593.wav", "doc_id": "oeooqChmKK.seg_593", "src_text": "But natural language understanding often requires knowledge that is also supplied at inference time.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "训练获得的知识,通常通过预训练获得的知识,通常通过预训练获得的知识,通常通过预训练获得的知识,", "score": 35.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/TVCREhgqUP.seg_62.wav", "doc_id": "TVCREhgqUP.seg_62", "src_text": "This works well, but trees are usually not given and need to be obtained somehow.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "这很好,但通常我们不需要通过某种方式获得它。", "score": 21.0} {"audio_path": 
"data/iwslt25/IWSLT25INSTRUCT/segmented/dJGfOSFgZO.seg_323.wav", "doc_id": "dJGfOSFgZO.seg_323", "src_text": "To determine what kind of evaluation is most effective, we selected four state-of-the-art chat models and evaluated them on 100 human-bot conversations per model using ABC-Eval.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "为了确定哪种评估最有效,我们选择了四种最先进的聊天模型,并在每种模型上评估了100个人类机器人对话,使用ABCEVL。", "score": 57.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/GvEBWkLmuI.seg_828.wav", "doc_id": "GvEBWkLmuI.seg_828", "src_text": "Hi, I'm Myra and today I'll be talking about our paper \"Marked Personas: Using Natural Language Prompts to Measure Stereotypes in Language Models.\"", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "Hi,I’mMyriamandtodayI’llbetalkingaboutourpaper“Marked", "score": 0.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/TVCREhgqUP.seg_80.wav", "doc_id": "TVCREhgqUP.seg_80", "src_text": "To give you a teaser of the experimental results, here we compare our method with other treeless models on the COGS benchmark.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "为了给您一个实验结果的提示,我们在这里将我们的方法与其他无树模型在Cogs基准上进行比较。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_683.wav", "doc_id": "oaOHnMCwad.seg_683", "src_text": "Design biases like the one that we just saw before might occur due to the positionality of the NLP researchers and model developers.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "像我们刚刚看到的那一个,我的鼓励是基于NLP研究人员和模型开发人员的视角:视角", "score": 50.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/TVCREhgqUP.seg_83.wav", "doc_id": "TVCREhgqUP.seg_83", "src_text": "In our paper, we 
solve a couple of interesting technical challenges.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "在我们的论文中,我们解决了一些有趣的技术挑战。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SUkmfOTvGi.seg_490.wav", "doc_id": "SUkmfOTvGi.seg_490", "src_text": "So what about temporal drift then?", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "那么,温度如何呢?", "score": 45.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_224.wav", "doc_id": "oYCKgTzTDy.seg_224", "src_text": "For example, there's only one single model to evaluate them.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "例如,只有一个模型可以评估它们。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WBLMIsdIrq.seg_395.wav", "doc_id": "WBLMIsdIrq.seg_395", "src_text": "First, when does translation require context?", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "首先,翻译需要联系吗?", "score": 33.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/gGbuDbHhyc.seg_381.wav", "doc_id": "gGbuDbHhyc.seg_381", "src_text": "Please feel free to check it out.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": ",请自由检查。", "score": 70.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/uZBWfYjYnf.seg_112.wav", "doc_id": "uZBWfYjYnf.seg_112", "src_text": "So we want our curves to be as high as possible on this plot.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "因此,我们希望我们的队伍在这个地图上尽可能高。", "score": 42.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_241.wav", "doc_id": 
"oYCKgTzTDy.seg_241", "src_text": "And we also find many interesting results.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们还发现了许多有趣的结果。因此,", "score": 96.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/uZBWfYjYnf.seg_107.wav", "doc_id": "uZBWfYjYnf.seg_107", "src_text": "For example, if we receive a speech chunk containing \"I'm going to talk about...\" and our model predicts the translation in German, and we will look at the cross-attention weights, we'll see that the first two words points to the earliest received speech frames, while the last word points to the last received speech frames, as lambda speech frames.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "例如,如果我们收到一段包含“我要谈论”的语句,并且我们的模型预测了德语翻译,那么我们就可以确定翻译是稳定的。我们将看看交叉注意力(crossattention)等待(wait)。我们将看到,前两个词指向最早接收到的语音框架,而最后一个词指向最后接收到的语音框架(或称为lamda语音框架)。", "score": 59.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dJGfOSFgZO.seg_329.wav", "doc_id": "dJGfOSFgZO.seg_329", "src_text": "Finally, we checked whether each evaluation metric captures a unique aspect of chat quality using a stepwise linear regression.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "最后,我们检查了每个评估矩阵是否捕捉到独特的检查质量方面,使用步骤向量线性回归。你", "score": 91.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/rISrKoXQCx.seg_579.wav", "doc_id": "rISrKoXQCx.seg_579", "src_text": "So a little bit of discussion.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "因此,我们也想讨论一下", "score": 93.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SUkmfOTvGi.seg_468.wav", "doc_id": "SUkmfOTvGi.seg_468", "src_text": "Firstly, can these models generalise to modern data?", "src_text_system": "human", "src_lang": "en", 
"tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "首先,这些模型能否推广到现代数据?", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/aQpIWggfCo.seg_46.wav", "doc_id": "aQpIWggfCo.seg_46", "src_text": "Please find more details of CoScript in our paper.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "请在我们的论文中找到CoScript的更多详细信息。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_256.wav", "doc_id": "oYCKgTzTDy.seg_256", "src_text": "Pretraining on English natural language can significantly boost the performance of Few-shot on target natural languages, and we found multilingual language models such as Codex and BLOOM are still inadequate for cross-lingual semantic parsing tasks.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "使用英语自然语言进行预训练可以显著提高在目标自然语言上的少量数据上进行的任务的性能。我们发现,多语言语言模型,如Coders和Blue,仍然不足以", "score": 56.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/FLkGnzVRew.seg_659.wav", "doc_id": "FLkGnzVRew.seg_659", "src_text": "\"Cumulative\" accumulates all the data collected from active annotation so far, whereas \"Iterative\" updates the model by training on the latest set of data collected.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "累积积累所有从活动注释中收集的数据,因此迭代更新模型通过在收集的最新数据集上训练。", "score": 69.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_233.wav", "doc_id": "oYCKgTzTDy.seg_233", "src_text": "In this setting, the source language is the same as target language, for example German to German or English to English.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "在这种情况下,源语言与目标语言相同,例如德语与德语,或者英语与英语。", "score": 90.0} {"audio_path": 
"data/iwslt25/IWSLT25INSTRUCT/segmented/dvGkKzmIaN.seg_531.wav", "doc_id": "dvGkKzmIaN.seg_531", "src_text": "We first construct a back door and a benign data set.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们首先构建了一个后门和一个恶意数据集。", "score": 60.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oeooqChmKK.seg_614.wav", "doc_id": "oeooqChmKK.seg_614", "src_text": "Lastly, the \"Background-Inference\" setting, where both knowledge types are available only at inference time.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "背景知识;最后,背景知识只有在练习时可用。", "score": 23.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/PIZEXUFLAR.seg_290.wav", "doc_id": "PIZEXUFLAR.seg_290", "src_text": "If it's a multi-modal generation task, we report Rouge-L. For NLP task, we report Rouge-L as well.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "如果是多模态生成任务,我们报告BLEU;对于NLP任务,我们报告BLEU也好。", "score": 60.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/wLqFAuDnKa.seg_145.wav", "doc_id": "wLqFAuDnKa.seg_145", "src_text": "So it's important to select the examples from high-quality translations.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "因此,选择高质量的翻译例子非常重要,", "score": 98.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/wLqFAuDnKa.seg_148.wav", "doc_id": "wLqFAuDnKa.seg_148", "src_text": "And their results so a better performance when using the dev data.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "使用DEFT数据时表现更好。", "score": 33.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/wLqFAuDnKa.seg_122.wav", "doc_id": "wLqFAuDnKa.seg_122", "src_text": "Hello 
everyone, my name is David Vilar, and I will be giving a short review of the paper \"Prompting PaLM for Translation: Assessing Strategies and Performance.\"", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "大家好,我叫艾德·维拉尔,我们将会对这篇关于强大翻译、评估策略和性能的论文进行短评,", "score": 60.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oeooqChmKK.seg_622.wav", "doc_id": "oeooqChmKK.seg_622", "src_text": "In this figure, we show the results of the best-performing models on the most difficult variant of the Background-Pretrain setting.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "在这个图中,我们展示了在背景预训练设置中最难的变体上表现最好的模型的", "score": 80.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/GvEBWkLmuI.seg_840.wav", "doc_id": "GvEBWkLmuI.seg_840", "src_text": "The Asian woman is depicted as unassuming; the Middle-Eastern woman is referred to using words like exotic and like, referring to a mesmerizing region.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "亚洲女性被描绘为“无知”,而中东女性则被称为“异国情调”。比如,提到迷人的地区,", "score": 83.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/gGbuDbHhyc.seg_376.wav", "doc_id": "gGbuDbHhyc.seg_376", "src_text": "For example, report if the model selection is done via clean validation samples.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "例如,报告模型选择是否通过清洁的验证示例进行。", "score": 93.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/hgIDlKNiFM.seg_445.wav", "doc_id": "hgIDlKNiFM.seg_445", "src_text": "Is it 4 gigabytes, 8 gigabytes, or more?", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "是4GB、8GB还是更多?", "score": 96.0} {"audio_path": 
"data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_702.wav", "doc_id": "oaOHnMCwad.seg_702", "src_text": "Afterwards to stay engaged in the study, they can compare their responses to an AI and others.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "之后,为了继续参与研究,他们可以将他们的回应与人工智能和其他人进行比较。", "score": 91.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SUkmfOTvGi.seg_478.wav", "doc_id": "SUkmfOTvGi.seg_478", "src_text": "The first one is the model architecture.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "第一个是模型架构。", "score": 94.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dvGkKzmIaN.seg_501.wav", "doc_id": "dvGkKzmIaN.seg_501", "src_text": "It's my pleasure to give a short advertisement video of our paper.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我很高兴可以给您展示一段关于纸张的短视频:", "score": 85.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/TVCREhgqUP.seg_89.wav", "doc_id": "TVCREhgqUP.seg_89", "src_text": "That's because this is related to the \"Traveling Salesman\" problem.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "这是因为这与旅行推销员问题有关。", "score": 92.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oeooqChmKK.seg_620.wav", "doc_id": "oeooqChmKK.seg_620", "src_text": "In the Background-Inference setting, we provide the fictional occupation \"mirituer\" instead of politician because \"mirituer\" is unlikely to be contained in the pretrained parameters.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "最后,感谢您的关注。他们提供了虚构的职业“政治家”而不是“政治家”。因为预训练参数中不太可能包含merituary。", "score": 72.0} {"audio_path": 
"data/iwslt25/IWSLT25INSTRUCT/segmented/aQpIWggfCo.seg_5.wav", "doc_id": "aQpIWggfCo.seg_5", "src_text": "However, previous work mainly focuses on planning for the abstract goals of stereotypical activities.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "然而,过去的工作主要集中在规划典型活动的抽象目标上;", "score": 92.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SUkmfOTvGi.seg_484.wav", "doc_id": "SUkmfOTvGi.seg_484", "src_text": "To our next question, what causes the performance drop of some models, We had two hypothesis.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "下一个问题是:什么原因导致一些模型的性能下降?我们有两个假设:", "score": 94.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/PIZEXUFLAR.seg_293.wav", "doc_id": "PIZEXUFLAR.seg_293", "src_text": "Here is our main result.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "这是我们主要的结果,", "score": 91.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/PIZEXUFLAR.seg_267.wav", "doc_id": "PIZEXUFLAR.seg_267", "src_text": "Therefore, in this work we want to investigate whether instruction tuning a multi-modal pre-trained models can actually improve generalisation to unseen multi-modal tasks.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "因此,在本研究中,我们想探索是否可以通过在多模态预训练模型上进行指令调节来实际改善对未见多模态任务的泛化。", "score": 85.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/PIZEXUFLAR.seg_295.wav", "doc_id": "PIZEXUFLAR.seg_295", "src_text": "Also, transfer learning from natural instruction dataset can benefit instruction tuning.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "此外,从自然指令集学习的转移学习也可以有益于指令调节。", "score": 82.0} {"audio_path": 
"data/iwslt25/IWSLT25INSTRUCT/segmented/dJGfOSFgZO.seg_331.wav", "doc_id": "dJGfOSFgZO.seg_331", "src_text": "On the other hand, the combination of all turn-level Likert metrics explains far less of the quality, and fewer of these metrics carry unique information.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "另一方面,交叉级别的利克特指标的组合解释了质量的很少部分,并且这些指标中有更少的指标携带独特的信息。", "score": 87.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/rISrKoXQCx.seg_571.wav", "doc_id": "rISrKoXQCx.seg_571", "src_text": "So we see that if we investigate the per category performance, that is to say if we separate the performance into different demographics or political leaning of news media we can see a pattern.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "所以我们说,如果我们调查了类别级表现,那就是说,如果我们分开了表现不同的人口统计学或政治新闻媒体,我们可以看到一个模式,", "score": 52.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_206.wav", "doc_id": "SLpqvupgvW.seg_206", "src_text": "Results with T5 XL model are summarized below.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "使用T5-XL大型模型的结果如下:", "score": 92.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/hgIDlKNiFM.seg_442.wav", "doc_id": "hgIDlKNiFM.seg_442", "src_text": "So we ask ourselves a question about what is the most appropriate data sources for a wide range of usage and those crawled data are good substitution for clinical data.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "因此,我们问自己:对于广泛使用,哪些数据源是最合适的?这些数据源是用于临床数据的良好的替代品。", "score": 73.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_690.wav", "doc_id": "oaOHnMCwad.seg_690", "src_text": "However these works really don't look at comparing end 
users with the datasets and models themselves, and studying model and data set positionality is increasingly important as NLP tasks become more subjective and socially oriented, and it's challenging to characterise how these positionalities are skewed because not all decisions are documented and many models are hidden behind APIs.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "然而,这些作品并没有真正比较最终用户与数据集和模型本身。学习模型和数据位置性是变得越来越重要的,因为Lp测试变得越来越主观和社会偏向。很难描述这些位置性是如何被扭曲的,因为并非所有的决定都被记录下来,许多模型都被API隐藏起来。", "score": 65.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dvGkKzmIaN.seg_530.wav", "doc_id": "dvGkKzmIaN.seg_530", "src_text": "Copyright verification is to detect whether a model behind another service contains the word mark.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "版权证书验证是检测另一个服务后面的模型是否包含水印的过程。首先,", "score": 61.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/FLkGnzVRew.seg_633.wav", "doc_id": "FLkGnzVRew.seg_633", "src_text": "I would like to present our work accepted into ACL 2023 as a long paper, \"Transfer Learning for Dissonance Detection: Addressing the Rare-Class Challenge.\"", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我想以一篇长篇论文的形式介绍我们接受的ACLU23号案件:学习转移以检测异议,解决罕见的挑战。", "score": 85.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/wLqFAuDnKa.seg_133.wav", "doc_id": "wLqFAuDnKa.seg_133", "src_text": "The prompting has a big influence on the performance of the LLMs for translation, as we can see in a simple experiment, where we used one-shot prompting and provided two different prompts for each sentence.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": 
"提示有很大的影响力于ELM的翻译性能,正如我们可以看到的一个简单的实验中,我们使用单次提示,并为一个句子提供了两个不同的提示。", "score": 90.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/hgIDlKNiFM.seg_448.wav", "doc_id": "hgIDlKNiFM.seg_448", "src_text": "One based on the weight of CamemBERT and trained on a 4 GB set of NACHOS.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "一个以卡曼伯的重量为基础,并在纳查斯的四千克以上的重量上训练;", "score": 60.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/FLkGnzVRew.seg_661.wav", "doc_id": "FLkGnzVRew.seg_661", "src_text": "Next, to improve the number of dissonance examples, we use a Probability-of-Rare-Class strategy — PRC — to select mostly the examples that are highly likely to be descended by the current model at any round of rare.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "接下来,为了改善离散示例的数量,我们使用了实体类策略PRC,选择大多数在任何一个空轴上高度有可能被当前模型分开的示例。", "score": 22.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_718.wav", "doc_id": "oaOHnMCwad.seg_718", "src_text": "So we have a few recommendations for this.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "因此,我们对此有几些建议:", "score": 93.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SUkmfOTvGi.seg_478.wav", "doc_id": "SUkmfOTvGi.seg_478", "src_text": "The first one is the model architecture.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "第一个是模型架构。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_684.wav", "doc_id": "oaOHnMCwad.seg_684", "src_text": "Positionality is simply the perspectives that people hold as a result of their demographics, identity, and life experiences.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", 
"tgt_system": "long_KIT_primary", "tgt_text": "位置性质只是由于人口统计学、身份和生活经验而形成的观点。", "score": 89.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/PIZEXUFLAR.seg_277.wav", "doc_id": "PIZEXUFLAR.seg_277", "src_text": "We follow the method from OFA and formulate all the tasks in a unified sequence-to-sequence format.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们遵循MasterFromOFA,并将所有任务统一为一个统一的序列到序列格式。", "score": 63.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/XejEJmgUmE.seg_744.wav", "doc_id": "XejEJmgUmE.seg_744", "src_text": "And what we do is that to recreate like longer sequences and which are acceptable and which has the same matching of the grammatical structure.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": ",从AdjunctIsland数据集中。我们可以通过在接受的和不接受的问句", "score": 24.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/gGbuDbHhyc.seg_345.wav", "doc_id": "gGbuDbHhyc.seg_345", "src_text": "In weak supervision, you do not manually label the data.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "在弱监督中,我们不会手动标记数据;", "score": 75.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/aQpIWggfCo.seg_14.wav", "doc_id": "aQpIWggfCo.seg_14", "src_text": "This table reports the overall accuracy of the results.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "该表报告了结果的总体准确性。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dvGkKzmIaN.seg_520.wav", "doc_id": "dvGkKzmIaN.seg_520", "src_text": "Embedding marker contains two main steps.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "嵌入标记器包含两个主要步骤:", "score": 100.0} {"audio_path": 
"data/iwslt25/IWSLT25INSTRUCT/segmented/gGbuDbHhyc.seg_355.wav", "doc_id": "gGbuDbHhyc.seg_355", "src_text": "First, is clean validation data necessary for WSL or can we maybe use a noisy validation set instead?", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "首先,WSL需要清洁的验证数据吗?或者我们是否可以使用嘈杂的验证集代替?", "score": 90.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oeooqChmKK.seg_597.wav", "doc_id": "oeooqChmKK.seg_597", "src_text": "In this work, we propose a diagnostic test suite for knowledge integration.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "工作中,我们提出了一个用于知识融合的诊断测试套件。", "score": 90.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_171.wav", "doc_id": "SLpqvupgvW.seg_171", "src_text": "Here are some examples of indirect references for example, \"the newer one\" or \"the song that's not energetic.\"", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "是很有活力的歌曲。这是一个重要的问题在", "score": 28.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/FLkGnzVRew.seg_636.wav", "doc_id": "FLkGnzVRew.seg_636", "src_text": "This belief and action are inconsistent, and they are in dissonance.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "这两句话之间存在矛盾。进一步提到我不认为我能", "score": 19.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_789.wav", "doc_id": "WTTtiRKFZI.seg_789", "src_text": "So \"Marge read it yesterday\" is fine because the direct object is close to the verb, while \"Marge read yesterday it\" is much worse.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": 
"读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨", "score": 0.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/hgIDlKNiFM.seg_449.wav", "doc_id": "hgIDlKNiFM.seg_449", "src_text": "Another also based on CamemBERT, but trained this time on the 4 GB of clinical notes and finally, one based on English biomedical model PubMedBERT, and trained on 4 GB of set of NACHOS.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们还基于Camembert,但这次训练在4GB的临床数据上,最后一次基于一个基于英超生物医学模型(BERT)并在4GB的自然数据上训练,总共", "score": 60.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/PIZEXUFLAR.seg_307.wav", "doc_id": "PIZEXUFLAR.seg_307", "src_text": "Thank you.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "谢谢。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/TVCREhgqUP.seg_87.wav", "doc_id": "TVCREhgqUP.seg_87", "src_text": "We address this by inducing the alignment as part of the training.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们通过在训练中引入对齐来解决这个问题。", "score": 93.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/rISrKoXQCx.seg_556.wav", "doc_id": "rISrKoXQCx.seg_556", "src_text": "So specifically, we first proposed to prompt language models with different prompt formats using the political questionnaires such as the political conference test.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "因此,我们首先提议使用不同形式的政治问题提问者来提前语言模型,例如政治会议测试,", "score": 88.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/aQpIWggfCo.seg_18.wav", "doc_id": "aQpIWggfCo.seg_18", "src_text": "We dig into a more fine-grained topic categories of constraints defined in wikiHow.", "src_text_system": "human", 
"src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们深入探讨了更细化的限制类别,根据工作方式不同。", "score": 78.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_228.wav", "doc_id": "oYCKgTzTDy.seg_228", "src_text": "And to better evaluate our benchmark, we consider the six settings for training and evaluation.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "为了更好地评估我们的基准,我们考虑了六种训练和评估设置。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_705.wav", "doc_id": "oaOHnMCwad.seg_705", "src_text": "We then compared these annotations with Dynahate, Perspective API, Rewire API, Hate Roberta and GPT 4.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "然后我们将这些注释与DynaHate、PerspectiveAPI、ReWireAPI、HateRobert和GPT-4进行比较。", "score": 88.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/GvEBWkLmuI.seg_866.wav", "doc_id": "GvEBWkLmuI.seg_866", "src_text": "So for example, the words describing Latina women include things like \"vibrant\" and \"curvaceous\" which connect to a trope of tropicalism.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "例如,描述拉丁美洲女性的词汇包括像“活力”和“性感”等词汇。这与热带主义有关。", "score": 90.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dJGfOSFgZO.seg_328.wav", "doc_id": "dJGfOSFgZO.seg_328", "src_text": "For example, you can see how measuring the proportion of turns with self and partner contradictions explains 5% and 10% of conversation quality, respectively, while the average Likert consistency scores explain only 4% or less.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "例如,您可以看到如何衡量自我和伴侣冲突的比例,表明与谈话质量相对应的5%和10%,而平均液体一致性分数表明只有4%或低。", "score": 35.0} 
{"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WBLMIsdIrq.seg_403.wav", "doc_id": "WBLMIsdIrq.seg_403", "src_text": "And we perform our analysis on transcripts of TED talks that have been translated from English to 14 different languages.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们对泰德·托克的翻译从英语翻译成14种不同的语言进行了分析。", "score": 42.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_706.wav", "doc_id": "oaOHnMCwad.seg_706", "src_text": "Our study in the end amassed over 16,000 annotations from over 1000 annotators from 87 countries.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "该研究研究了艺术,并最终收集了来自87个国家的超过1,000名注释者超过16,000条注释。", "score": 87.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WBLMIsdIrq.seg_430.wav", "doc_id": "WBLMIsdIrq.seg_430", "src_text": "See you in Toronto.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "见到你在多伦多。", "score": 35.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/uZBWfYjYnf.seg_120.wav", "doc_id": "uZBWfYjYnf.seg_120", "src_text": "And we also released open source the code and models and simultaneous output to facilitate the reproducibility of our work.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们也发布了开放源代码、模型和同时发布的输出,以便促进我们的工作可复制性。", "score": 90.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/TVCREhgqUP.seg_84.wav", "doc_id": "TVCREhgqUP.seg_84", "src_text": "First of all, the alignment between input and output is not given in the training data.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "首先,输入和输出之间的对齐在训练数据中没有给出。", "score": 100.0} {"audio_path": 
"data/iwslt25/IWSLT25INSTRUCT/segmented/dJGfOSFgZO.seg_311.wav", "doc_id": "dJGfOSFgZO.seg_311", "src_text": "This work was done by the Emory NLP Lab led by Professor Jinho Choi at Emory University and in collaboration with Amazon Alexa AI.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "这项工作由埃默里大学的GinoOchi教授领导的埃默里NLP实验室完成,并与亚马逊AlexaAI合作。那么", "score": 78.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_813.wav", "doc_id": "WTTtiRKFZI.seg_813", "src_text": "But what's novel in this paper is that we observed that this tendency only occurs when the governor is on the left or absent.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "的短语越倾向于先出现。我们观察到这种趋势只在当政府在左侧(absent)时发生。", "score": 46.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/hgIDlKNiFM.seg_447.wav", "doc_id": "hgIDlKNiFM.seg_447", "src_text": "In addition to this comparison, we introduced three models trained on continual pre-training to analyze the impact of pre-training strategy.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "除此之外,我们还引入了三种文化预备训练模式,以分析预备训练策略的影响。", "score": 94.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SUkmfOTvGi.seg_465.wav", "doc_id": "SUkmfOTvGi.seg_465", "src_text": "Let's get started.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "让我们开始吧。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WBLMIsdIrq.seg_402.wav", "doc_id": "WBLMIsdIrq.seg_402", "src_text": "Now we analyze words with high P-CXMI to look for patterns between these words.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": 
"现在我们使用高斯混合模型分析这些单词,以寻找这些单词之间的模式。", "score": 50.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/FLkGnzVRew.seg_641.wav", "doc_id": "FLkGnzVRew.seg_641", "src_text": "Studying cognitive dissonance can help us understand the effects of disagreement among people, track trends and belief values, and attitude changes in population.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "研究认知差异可以帮助我们了解人们之间的不一致的影响,跟踪信仰、价值观和态度在人群中的变化趋势。", "score": 86.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/XejEJmgUmE.seg_737.wav", "doc_id": "XejEJmgUmE.seg_737", "src_text": "The current MPP pipeline basically doesn't allow us to evaluate a model's acceptance towards longer sentences.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "当前的MP管道基本上不允许我们评估模型对更长句子的接受程度。", "score": 84.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/aQpIWggfCo.seg_13.wav", "doc_id": "aQpIWggfCo.seg_13", "src_text": "We sample 100 specific goals and evaluate the scripts generated from large language models.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们采样了100个具体目标,并评估了由大型模型生成的脚本。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SUkmfOTvGi.seg_477.wav", "doc_id": "SUkmfOTvGi.seg_477", "src_text": "Throughout experiments we found that there are three main ingredients that are needed.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "通过我们的实验,我们发现需要三种主要的原料。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/TVCREhgqUP.seg_49.wav", "doc_id": "TVCREhgqUP.seg_49", "src_text": "This is joint work with my advisors Alexander Koller and Ivan Titov.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": 
"acl", "tgt_system": "short_NLE_primary", "tgt_text": "这是与我的顾问AlexanderColler和IvanTito共同工作的。", "score": 35.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dJGfOSFgZO.seg_327.wav", "doc_id": "dJGfOSFgZO.seg_327", "src_text": "In addition, ABC-Eval labels are more predictive of the overall conversation quality compared to metrics produced by existing methods, as shown by this simple linear regression analysis.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "els比现有方法产生的指标更能预测总体对话质量,正如通过此简单线性回归分析所示。", "score": 70.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/hgIDlKNiFM.seg_450.wav", "doc_id": "hgIDlKNiFM.seg_450", "src_text": "In total, we have seven models.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "总共有七个模型。", "score": 95.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/wLqFAuDnKa.seg_132.wav", "doc_id": "wLqFAuDnKa.seg_132", "src_text": "Finally, we provide some recommendations for prompt selection strategies.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "最后,我们提供了一些推荐的策略。.promp", "score": 51.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_228.wav", "doc_id": "oYCKgTzTDy.seg_228", "src_text": "And to better evaluate our benchmark, we consider the six settings for training and evaluation.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "为了更好地评估我们的基准,我们考虑了六种训练和评估设置。", "score": 96.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/gGbuDbHhyc.seg_371.wav", "doc_id": "gGbuDbHhyc.seg_371", "src_text": "So in practice, there's no reason to choose more complex WSL methods which require more computation time and disk space.", "src_text_system": "human", "src_lang": "en", "tgt_lang": 
"zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "所以在实践中,没有理由选择更复杂的WSSL方法,这些方法需要更多的计算时间和磁盘空间。", "score": 60.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/wLqFAuDnKa.seg_134.wav", "doc_id": "wLqFAuDnKa.seg_134", "src_text": "The majority of sentences 516 out of 1,000.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "大多数句子中,1000个句子中的", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/GvEBWkLmuI.seg_843.wav", "doc_id": "GvEBWkLmuI.seg_843", "src_text": "The first one is generating these personas.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "第一个部分是生成这些人物。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dJGfOSFgZO.seg_308.wav", "doc_id": "dJGfOSFgZO.seg_308", "src_text": "Hello, I'm James Finch.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "今", "score": 0.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SUkmfOTvGi.seg_478.wav", "doc_id": "SUkmfOTvGi.seg_478", "src_text": "The first one is the model architecture.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "第一个是模型架构。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/uZBWfYjYnf.seg_109.wav", "doc_id": "uZBWfYjYnf.seg_109", "src_text": "If we go on and we receive another speech chunk, and our model predicts other three words and we will look at those cross-attention weights, we will see that no word points to the last lambda speech frames.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "如果我们继续下去,我们会收到另一个语音段,我们的模型会预测...我们将这些三个词语排列在交叉注意力权重中,我们会看到没有一个词语指向最后一个lambeadaspeechframe。", "score": 64.0} 
{"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/GvEBWkLmuI.seg_837.wav", "doc_id": "GvEBWkLmuI.seg_837", "src_text": "And we can immediately see that this is very generalizable to any demographic because we can just specify whatever identity marker that we want into this prompt.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们可以立即看到这非常普遍化到任何人口统计,因为我们可以只指定我们想要在这个提示中输入的任何身份标记。所", "score": 80.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dvGkKzmIaN.seg_502.wav", "doc_id": "dvGkKzmIaN.seg_502", "src_text": "Are you copying my model?", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我会复制我的模型,", "score": 72.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/GvEBWkLmuI.seg_849.wav", "doc_id": "GvEBWkLmuI.seg_849", "src_text": "So for instance, the word \"warrior\" is usually associated with men.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "例如,词语“人”或“战士”通常与男性相关联,", "score": 52.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/GvEBWkLmuI.seg_830.wav", "doc_id": "GvEBWkLmuI.seg_830", "src_text": "In recent years, many have documented the prevalence of social bias and stereotypes in large language models, or LLMs.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "近年来,许多人已经证明了社会偏见和大语言模型或ELMs中存在的成见。", "score": 72.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_256.wav", "doc_id": "oYCKgTzTDy.seg_256", "src_text": "Pretraining on English natural language can significantly boost the performance of Few-shot on target natural languages, and we found multilingual language models such as Codex and BLOOM are still inadequate for cross-lingual semantic parsing tasks.", "src_text_system": "human", 
"src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "学习英语自然语言可以显著提高未来的目标自然语言的表现。我们发现多语言模型(例如科达斯和蓝色)仍然不足以跨语言多人对话。", "score": 65.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_725.wav", "doc_id": "oaOHnMCwad.seg_725", "src_text": "And so that concludes our presentation.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "不是为每个人工作。", "score": 0.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/aQpIWggfCo.seg_16.wav", "doc_id": "aQpIWggfCo.seg_16", "src_text": "Then we conduct detailed analysis to investigate why learning models fail.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "然后我们进行详细的分析,以调查宽泛的线性模型的结果。", "score": 61.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/aQpIWggfCo.seg_39.wav", "doc_id": "aQpIWggfCo.seg_39", "src_text": "With CoScript we can try smaller but specialized models for constrained language planning.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "使用CoScript,我们可以选择更小但更专业的模型来进行受限语言规划。", "score": 95.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/XejEJmgUmE.seg_755.wav", "doc_id": "XejEJmgUmE.seg_755", "src_text": "We increase the context length toward up to 1024 for to max out OPT and GPT 2 models.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们增加了上下文长度,直到1024,为了最大化OOP和GPT2模型,", "score": 63.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oeooqChmKK.seg_618.wav", "doc_id": "oeooqChmKK.seg_618", "src_text": "In the Background-Pretrain setting, we assume that the background knowledge \"Politicians seek elected seats in government\" is contained in the pretrained parameters and in inference-time context we provide 
the entity-specific knowledge \"Chichester is a politician.\"", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "一个例子在背景前,假设政治家寻求在政府中当选的背景知识包含在预备训练参数中。在延迟背景下,我们提供了反特异知识:切斯特是政治家。", "score": 44.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/uZBWfYjYnf.seg_95.wav", "doc_id": "uZBWfYjYnf.seg_95", "src_text": "And what are the problems of the current SimulST models?", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "当前的ST模型存在什么问题?", "score": 78.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_804.wav", "doc_id": "WTTtiRKFZI.seg_804", "src_text": "That's why this sounds quite okay.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "这就是为什么这听起来似乎很好", "score": 62.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/XejEJmgUmE.seg_763.wav", "doc_id": "XejEJmgUmE.seg_763", "src_text": "So we did a series of analysis where we tried to perturb the input sentence by, trying to preserve the relevant structure but adding like noise to the input.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们进行了一系列分析,试图通过在输入句子中添加噪音来破坏输入句子,", "score": 74.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/XejEJmgUmE.seg_731.wav", "doc_id": "XejEJmgUmE.seg_731", "src_text": "This is a joint work with John Gauthier, Aaron Mueller, Kanishka Misra, Karen Fences, Roger Levy, and Adina Williams.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "这是一项与JohnGauthier、AaronMueller、KanishkaMishra、KarinFuentès、RogerLevy和AdinaWilliams共同完成的工作。因此,", "score": 91.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/aQpIWggfCo.seg_37.wav", "doc_id": "aQpIWggfCo.seg_37", 
"src_text": "This figure shows the constraint distribution of CoScript.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "这张图表显示了CoScript的受限分布;", "score": 96.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oeooqChmKK.seg_598.wav", "doc_id": "oeooqChmKK.seg_598", "src_text": "We introduce a coreference resolution task, designed to probe for the ability to draw on knowledge available in different sources.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们引入一个关联解释任务,旨在为能够在不同来源中提取可用的知识而设计。", "score": 72.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/XejEJmgUmE.seg_765.wav", "doc_id": "XejEJmgUmE.seg_765", "src_text": "Basically, we find that the models are sensitive to the perturbed sentences in similar ways.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "面的路线。基本上,我们发现模型对句法结构的敏感性类似。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_696.wav", "doc_id": "oaOHnMCwad.seg_696", "src_text": "And so we opt to re annotate data to get many annotates for instance and to get a rich set of demographic data.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "因此我们希望重新注释数据以获得每个实例的注释数量。数据集。", "score": 60.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/aQpIWggfCo.seg_40.wav", "doc_id": "aQpIWggfCo.seg_40", "src_text": "We find that T5 fine-tuned on CoScript can generate scripts of higher quality than most large language models, indicating that smaller models can surpass larger models when properly trained on suitable datasets.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": 
"在Fansites上,TFileFountuneonCourserate可以生成质量高于大多数大型语言模块的脚本,表明小型模块可以在适当的数据站上经过适当的训练时支持大型模块。", "score": 54.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_180.wav", "doc_id": "SLpqvupgvW.seg_180", "src_text": "Which is the alternative question.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "这是", "score": 21.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_245.wav", "doc_id": "oYCKgTzTDy.seg_245", "src_text": "And we evaluate on mT5 and XLM-R + PTR on multilingual setting.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们评估MT5和示例XLM-R+PDR在多语言设置中的表现。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_208.wav", "doc_id": "SLpqvupgvW.seg_208", "src_text": "But this is not realistic.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "但这不是现实的。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SUkmfOTvGi.seg_465.wav", "doc_id": "SUkmfOTvGi.seg_465", "src_text": "Let's get started.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "让我们开始吧。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/gGbuDbHhyc.seg_363.wav", "doc_id": "gGbuDbHhyc.seg_363", "src_text": "Our second finding is that increasing the number of clean validation samples will help WSL approaches to achieve better performance, as shown in the figure on the left.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们的第二个发现是,增加清洁验证样本的数量将有助于WSL方法实现更好的性能,如图左所示:", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/uZBWfYjYnf.seg_115.wav", "doc_id": "uZBWfYjYnf.seg_115", 
"src_text": "And we compare also with the state-of-the-art architecture specifically tailored for simultaneous pre-translation.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们的方法与专门为同时预测翻译设计的最新技术进行比较。", "score": 58.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_693.wav", "doc_id": "oaOHnMCwad.seg_693", "src_text": "Our framework works in two main steps.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们的框架分为两个主要步骤。", "score": 98.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_811.wav", "doc_id": "WTTtiRKFZI.seg_811", "src_text": "So when the difference between the lengths of the two conjuncts grows, the shorter conjunct prefers to be the first one, stronger, right?", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": ",长度差异越大,短语越倾向于先出现。因此,当两个短语之", "score": 34.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/aQpIWggfCo.seg_46.wav", "doc_id": "aQpIWggfCo.seg_46", "src_text": "Please find more details of CoScript in our paper.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "请在我们的论文中查找更多关于科斯克里普特的细节。", "score": 61.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_685.wav", "doc_id": "oaOHnMCwad.seg_685", "src_text": "This is a concept widely used in critical studies, specifically in feminist and queer academic spaces.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "这是一个在批判性研究中广泛使用的概念,特别是在女权主义和女权主义学术领域。", "score": 80.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/aQpIWggfCo.seg_45.wav", "doc_id": "aQpIWggfCo.seg_45", "src_text": "Thanks for your time.", "src_text_system": 
"human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "感谢您的时间,", "score": 91.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dvGkKzmIaN.seg_516.wav", "doc_id": "dvGkKzmIaN.seg_516", "src_text": "Existing works can be broadly classified into four categories.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "现有的作品可以大致分为四类。", "score": 83.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/aQpIWggfCo.seg_19.wav", "doc_id": "aQpIWggfCo.seg_19", "src_text": "The heat map in the figure shows that the planning performance of InstructGPTs varies considerably for goals of different categories.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "图中的头图显示,教科书的规划性能对不同类别的女孩有很大差异。", "score": 41.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_674.wav", "doc_id": "oaOHnMCwad.seg_674", "src_text": "Hi everyone.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "今天,", "score": 0.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/rISrKoXQCx.seg_568.wav", "doc_id": "rISrKoXQCx.seg_568", "src_text": "We can see that language models generally had a political leaning that is further away from the centre after 2017.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "可以看到语言模型一般来说具有更远离中心的政治倾向,", "score": 55.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/PIZEXUFLAR.seg_280.wav", "doc_id": "PIZEXUFLAR.seg_280", "src_text": "So for the training dataset, we use 53 tasks from 9 groups for training and we sample 10,000 instances per task.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": 
"对于训练数据集,我们使用了NLPGroup的53个任务进行训练,并且我们对每个任务进行了10000个示例的", "score": 80.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/aQpIWggfCo.seg_7.wav", "doc_id": "aQpIWggfCo.seg_7", "src_text": "In this paper, we define the problem of constrained language planning which imposes different constraints on the goals of planning.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "在本文中,我们定义了受限的语言规划问题,这种问题会对规划的目标施加不同的约束。", "score": 93.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SUkmfOTvGi.seg_479.wav", "doc_id": "SUkmfOTvGi.seg_479", "src_text": "Through our experiments we found that the transformer models normally generalize better to new data.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "通过我们的实验,我们发现变压器模型通常对新数据进行一般化。", "score": 60.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_805.wav", "doc_id": "WTTtiRKFZI.seg_805", "src_text": "Right?", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": ":", "score": 0.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_195.wav", "doc_id": "SLpqvupgvW.seg_195", "src_text": "When we show this alternative question to the annotators, they know the name of these entities, but they don't necessarily know about the entities.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "当我们展示这些替代问题给注释者时,他们知道这些实体的名称,但他们不一定知道这些实体。", "score": 85.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/uZBWfYjYnf.seg_113.wav", "doc_id": "uZBWfYjYnf.seg_113", "src_text": "But also we want that they are shifted on the left.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "但我们也希望它们在左侧移动。", "score": 91.0} 
{"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_261.wav", "doc_id": "oYCKgTzTDy.seg_261", "src_text": "And welcome to visit our paper and code.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "欢迎您访问我们的论文和代码。", "score": 95.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_815.wav", "doc_id": "WTTtiRKFZI.seg_815", "src_text": "So the governor is on the left in this example \"I saw Bart and Lisa\" so is the governor is on the left.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "在这个例子中,州长在左边;我看到巴特和丽莎,所以州长在左边。", "score": 49.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oeooqChmKK.seg_604.wav", "doc_id": "oeooqChmKK.seg_604", "src_text": "After a long day at work deciding cases in a law court, he was happy to relax.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "在法庭上做出决定后很高兴放松一下。", "score": 70.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WBLMIsdIrq.seg_414.wav", "doc_id": "WBLMIsdIrq.seg_414", "src_text": "So now we use our findings from our analysis to design a benchmark for document-level translation.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "因此,我们现在使用我们的分析结果来设计文档局部翻译的基准。", "score": 96.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/XejEJmgUmE.seg_758.wav", "doc_id": "XejEJmgUmE.seg_758", "src_text": "So here we are choosing or creating sentences from acceptable and
unacceptable domains from the same BLiMP or SyntaxGym dataset.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "在这里,我们选择或创建从可接受和不可接受域中选取的句子,从同一BLIMP或语法DIM数据集中,看", "score": 50.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dvGkKzmIaN.seg_508.wav", "doc_id": "dvGkKzmIaN.seg_508", "src_text": "However, recent works have shown that the attacker may steal the model through learning from the embedding and provide similar services.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "然而,最近的研究已经表明,攻击者可能通过学习从嵌入中学习模型,并提供类似的服务来窃取模型。", "score": 63.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_786.wav", "doc_id": "WTTtiRKFZI.seg_786", "src_text": "OK.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "昨天我读了", "score": 0.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_799.wav", "doc_id": "WTTtiRKFZI.seg_799", "src_text": "So the reasoning here is that this is possible because even though this sentence violates the general grammatical principle that direct objects should be next to the verb, it satisfies the principle of dependency length minimization, which says that shorter dependencies are preferred.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "因为即使这句话违反了直接对象应该位于开发者之后的通用语法原则。它满足了依赖关系长度最小化的原则,该原则说短依赖关系优于长依赖关系。所以,", "score": 78.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/rISrKoXQCx.seg_576.wav", "doc_id": "rISrKoXQCx.seg_576", "src_text": "There are a bunch of more examples in the appendix to further highlight that this indicates that there is a fairness issue that is very pressing regarding the political biases of language models.", "src_text_system": "human", "src_lang": "en", 
"tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "附录中。进一步强调的是,这表明这是一个公平问题,非常压力政治偏见。语言模型,", "score": 64.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/aQpIWggfCo.seg_18.wav", "doc_id": "aQpIWggfCo.seg_18", "src_text": "We dig into a more fine-grained topic categories of constraints defined in wikiHow.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们探讨了更多的主题限制,根据工作方式不同。", "score": 53.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SUkmfOTvGi.seg_495.wav", "doc_id": "SUkmfOTvGi.seg_495", "src_text": "So going back to the question that we raised in the title of our paper Do CoNLL-2003 taggers still work in 2023?", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "因此,让我们回到我们论文标题中提出的问题:Cornell2003标签器在2023年是否仍然有效?", "score": 78.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SUkmfOTvGi.seg_489.wav", "doc_id": "SUkmfOTvGi.seg_489", "src_text": "And this shows us that adaptive overfitting in this case is not observed.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "这表明在这种情况下,适应性过度适应没有被观察到。", "score": 89.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/wLqFAuDnKa.seg_136.wav", "doc_id": "wLqFAuDnKa.seg_136", "src_text": "And this can go, in extreme cases, up to 40 BLEURT points.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "在极端情况下可以达到40个球点,因此很重要", "score": 63.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_259.wav", "doc_id": "oYCKgTzTDy.seg_259", "src_text": "And our results show many interesting findings.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们的结果显示了许多有趣的发现。", 
"score": 97.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/GvEBWkLmuI.seg_835.wav", "doc_id": "GvEBWkLmuI.seg_835", "src_text": "So we can ask the model to generate a persona, which is a depiction of an imagined individual using a prompt like \"Imagine you are an Asian woman.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "所以我们可以要求模型生成一个人物,这是对想象个体使用一个像“想象你是一个亚洲女性”", "score": 81.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/PIZEXUFLAR.seg_307.wav", "doc_id": "PIZEXUFLAR.seg_307", "src_text": "Thank you.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "谢谢。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/rISrKoXQCx.seg_559.wav", "doc_id": "rISrKoXQCx.seg_559", "src_text": "They occupy all four quadrants on the political campus.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "它们占领了政治大厅的四个象限。", "score": 90.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_676.wav", "doc_id": "oaOHnMCwad.seg_676", "src_text": "This work was done in collaboration with some folks at the University of Washington and the Allen Institute for AI, namely Sebastian Santy, Ronan Le Bras, Katharina Reinecke and Maarten Sap.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "这项工作是与华盛顿大学和人工智能研究所(AIInstitute)的一些人合作完成的,包括SebastianSantani,RonanLarbalestier,CaterinaRizica和MartinSapp。", "score": 63.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/rISrKoXQCx.seg_552.wav", "doc_id": "rISrKoXQCx.seg_552", "src_text": "So on one hand, they were able to learn from diverse perspectives, which celebrates democracy and the plurality of ideas.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", 
"domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "从不同角度来看,他们可以从多种角度欣赏民主和多元化的思想;", "score": 47.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dvGkKzmIaN.seg_517.wav", "doc_id": "dvGkKzmIaN.seg_517", "src_text": "However, this method either not applicable to embedding as services or lack of transferability.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "然而,这种方法既不适用于嵌入式服务,也缺乏可移植性。", "score": 85.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SUkmfOTvGi.seg_499.wav", "doc_id": "SUkmfOTvGi.seg_499", "src_text": "Thank you so much.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": ",感谢您的支持。", "score": 50.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dvGkKzmIaN.seg_541.wav", "doc_id": "dvGkKzmIaN.seg_541", "src_text": "The legend of the figures means the number of triggers in each sentence.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "嵌入的传说意味着每个句子中的触发器的数量。", "score": 43.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WBLMIsdIrq.seg_393.wav", "doc_id": "WBLMIsdIrq.seg_393", "src_text": "And some people have suggested targeted evaluation on context-dependent translations, but these resources only support limited types of context-dependent translations and limited sets of languages since they usually rely on domain knowledge and human curation.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "有些人建议对依赖于语境的传播进行目标评估,但这些资源只能支持有限的依赖于语境的传播和有限的语言集,因为它们通常依赖于主要的知识和人类创造力。", "score": 60.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/wLqFAuDnKa.seg_154.wav", "doc_id": "wLqFAuDnKa.seg_154", "src_text": "So, it seems that PaLM chooses to produce a better-sounding translation, sometimes by dropping 
parts of the source sentence that are made in translation.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "因此,似乎Palm选择了更好的翻译方式,通过删除翻译中排列的句子部分。", "score": 82.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/GvEBWkLmuI.seg_862.wav", "doc_id": "GvEBWkLmuI.seg_862", "src_text": "First, from our groups, the top words include things like \"culture\", \"tradition\", \"proud\", and \"exotic\".", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "首先,标记组的顶词包括文化、传统、自豪和异国情调等,这些词仅仅通过", "score": 60.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_681.wav", "doc_id": "oaOHnMCwad.seg_681", "src_text": "Where prospective AP is really not as sensitive to offensive terms that are more common in Indian contexts.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "AI模型对更常见于印度语环境中的辱骂性词汇的敏感度确实不够。", "score": 55.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/uZBWfYjYnf.seg_93.wav", "doc_id": "uZBWfYjYnf.seg_93", "src_text": "What is simultaneous speech translation?", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "什么是同步语音翻译?", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SUkmfOTvGi.seg_492.wav", "doc_id": "SUkmfOTvGi.seg_492", "src_text": "Our conclusion is that, for good generalization we would need a better model architecture, larger model size, as well as more fine tuning examples.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们的结论是,为了好的泛化,我们需要更好的模型架构、更大的模型尺寸,以及更多的精调示例,并且", "score": 92.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/XejEJmgUmE.seg_733.wav", "doc_id": "XejEJmgUmE.seg_733", "src_text": "So the 
minimal pair paradigm basically evaluates language models on top of acceptability judgments.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "最小的对称范畴基本上评估语言模型,", "score": 70.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/wLqFAuDnKa.seg_154.wav", "doc_id": "wLqFAuDnKa.seg_154", "src_text": "So, it seems that PaLM chooses to produce a better-sounding translation, sometimes by dropping parts of the source sentence that are made in translation.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": ",所以它似乎是最合适的。.有时?通过在翻译中省略句子的某些部分。", "score": 40.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SUkmfOTvGi.seg_481.wav", "doc_id": "SUkmfOTvGi.seg_481", "src_text": "We found that usually larger models lead to better generalization.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们发现,通常,较大的模型会导致更好的泛化。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/aQpIWggfCo.seg_27.wav", "doc_id": "aQpIWggfCo.seg_27", "src_text": "We only keep the script if the target goal scores the highest in the goal set.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "只有当目标鬼魂得分在鬼魂视野中最高时,我们才保留脚本。", "score": 43.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dvGkKzmIaN.seg_524.wav", "doc_id": "dvGkKzmIaN.seg_524", "src_text": "We assume the provider can collect a general text corpus and count the word frequency with it.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们假设提供者可以收集通用文本库,并计算单词频率。", "score": 92.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/rISrKoXQCx.seg_567.wav", "doc_id": "rISrKoXQCx.seg_567", "src_text": "We separately 
pretrain language models on the two different temporal corpora.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们分别预训练语言模型在两个不同的时间段的语料库上。我们", "score": 86.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SUkmfOTvGi.seg_488.wav", "doc_id": "SUkmfOTvGi.seg_488", "src_text": "This means that every unit of improvement that we made, on CoNLL-2003 translates to more than one unit improvement on CoNLL++ which means that there is no diminishing returns.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "这意味着我们在Color2003上做的每个改进都转化为Color+的多个改进。这意味着没有减少的回报。", "score": 59.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_190.wav", "doc_id": "SLpqvupgvW.seg_190", "src_text": "The first one is uniform at random.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "第一个是统一的轨道", "score": 50.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_825.wav", "doc_id": "WTTtiRKFZI.seg_825", "src_text": "So see the paper for the full arguments.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "所以看一下关于全面的协议和论点的", "score": 86.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SUkmfOTvGi.seg_477.wav", "doc_id": "SUkmfOTvGi.seg_477", "src_text": "Throughout experiments we found that there are three main ingredients that are needed.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "通过我们的实验,我们发现需要三个主要成分。", "score": 96.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WBLMIsdIrq.seg_398.wav", "doc_id": "WBLMIsdIrq.seg_398", "src_text": "In the previous work, we introduced CXMI as a measure for context usage by machine translation 
models.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "在之前的工作中,我们引入了CMI作为机器翻译模型的上下文使用度量,并且", "score": 90.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WBLMIsdIrq.seg_430.wav", "doc_id": "WBLMIsdIrq.seg_430", "src_text": "See you in Toronto.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们。", "score": 0.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_796.wav", "doc_id": "WTTtiRKFZI.seg_796", "src_text": "\"Marge read this absolutely fascinating book about bees yesterday.\"", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了", "score": 0.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oeooqChmKK.seg_615.wav", "doc_id": "oeooqChmKK.seg_615", "src_text": "This last setting is especially interesting, since it simulates the case where the background knowledge necessary to solve a task is not part of the pretrain data of models.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "最后的设置非常有趣,因为它模拟了需要解决任务的背景知识并不是模型的预训练数据的一部分,", "score": 81.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_687.wav", "doc_id": "oaOHnMCwad.seg_687", "src_text": "And so one question that people might ask is, do datasets and models have positionality?", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "因此,人们可能会问:数据集和模型是否具有位置性?", "score": 89.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/XejEJmgUmE.seg_728.wav", "doc_id": "XejEJmgUmE.seg_728", "src_text": "Hi, everyone.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", 
"tgt_system": "short_NLE_primary", "tgt_text": "大家好,", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/uZBWfYjYnf.seg_98.wav", "doc_id": "uZBWfYjYnf.seg_98", "src_text": "And training and maintaining several models to reach different latency regimes.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "维护多个模型以达到不同的延迟模式,", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/TVCREhgqUP.seg_85.wav", "doc_id": "TVCREhgqUP.seg_85", "src_text": "As a consequence, for a given token we don't know which multiset it came from, which poses a challenge for training.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "因此,对于一个给定的令牌,我们不知道它来自哪个多标注器,这给训练带来了挑战。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SUkmfOTvGi.seg_497.wav", "doc_id": "SUkmfOTvGi.seg_497", "src_text": "We hope our paper calls for more research on how to improve generalizations of the models.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们希望我们的论文呼吁更多关于如何改善模型泛化的研究。", "score": 87.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/gGbuDbHhyc.seg_343.wav", "doc_id": "gGbuDbHhyc.seg_343", "src_text": "This is joint work with Xiaoyu Shen, Marius Mosbach, Andreas Stephan, and Dietrich Klakow.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "7.28.29.30.31.32.33.34.35.36.37.38.39.40.41.42.43.44.45.46.47.48.4", "score": 0.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SUkmfOTvGi.seg_468.wav", "doc_id": "SUkmfOTvGi.seg_468", "src_text": "Firstly, can these models generalise to modern data?", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": 
"首先,这些模型可以将其概括为现代数据吗?", "score": 59.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WBLMIsdIrq.seg_399.wav", "doc_id": "WBLMIsdIrq.seg_399", "src_text": "And this is done by measuring how much information the context C provides about the target Y, given the source X. You can think of CXMI as the information gained from giving context to the model.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "是通过测量上下文C关于目标Y给定源X提供的信息量来实现的。您可以将CMI视为将上下文提供给模型时获得的信息量。", "score": 90.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/wLqFAuDnKa.seg_138.wav", "doc_id": "wLqFAuDnKa.seg_138", "src_text": "In our experiments, we settled for a 5-shot prompting strategy where we just marked each sentence that we provide to the system, with the language it's in.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "在我们的实验中,我们采用了五轮提示策略。我们只是标记了它()是我们提供给系统的句子,使用的语言是();", "score": 56.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_199.wav", "doc_id": "SLpqvupgvW.seg_199", "src_text": "For the recipes and books domain, we show some background text from Wikipedia.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "对于食谱和书籍,我们显示一些来自维基百科的背景文本;", "score": 90.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/PIZEXUFLAR.seg_278.wav", "doc_id": "PIZEXUFLAR.seg_278", "src_text": "In which the input text, images, instructions and bounding boxes are represented in the same token space.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "其中输入文本、图像、指令和边界盒都在同一个令牌空间中表示。", "score": 60.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_685.wav", "doc_id": "oaOHnMCwad.seg_685", "src_text": "This is a concept widely used in critical 
studies, specifically in feminist and queer academic spaces.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "这是一个在批判性研究中广泛使用的概念,尤其是在女权主义和酷儿学术空间中。", "score": 80.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SUkmfOTvGi.seg_487.wav", "doc_id": "SUkmfOTvGi.seg_487", "src_text": "For data overfitting, we saw that from the graph on the right, the red best fit line has a gradient that is greater than one.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "对于适应过度匹配,我们发现从右边的图表中,红色最佳匹配线具有比一大一的梯度。", "score": 60.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_809.wav", "doc_id": "WTTtiRKFZI.seg_809", "src_text": "So, \"salt and pepper\" and not \"pepper and salt\", measured in syllables.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "的同位语短,例如盐和胡椒而不是胡椒和盐,和也。观察到", "score": 31.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_167.wav", "doc_id": "SLpqvupgvW.seg_167", "src_text": "But sometimes an indirect reference is more appropriate to have a more natural conversation.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "但有时,间接引用更适合。更自然的对话:", "score": 83.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/hgIDlKNiFM.seg_451.wav", "doc_id": "hgIDlKNiFM.seg_451", "src_text": "To evaluate our seven models, we gather data for public and private downstream tasks such as named entity recognition, classification, part-of-speech tagging, and question answering.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "为了评估我们七个模型,我们收集了多个公共和私人任务,例如姓名和身份识别,分类,语音采集和问答。", "score": 64.0} {"audio_path": 
"data/iwslt25/IWSLT25INSTRUCT/segmented/XejEJmgUmE.seg_770.wav", "doc_id": "XejEJmgUmE.seg_770", "src_text": "Thank you for listening.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": ".", "score": 0.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/TVCREhgqUP.seg_86.wav", "doc_id": "TVCREhgqUP.seg_86", "src_text": "In addition, sometimes there are multiple permutations that are consistent with the data, but the linguistically correct one is latent.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "此外,有时会有多个与数据一致的置换,但从语言上来说正确的置换是隐性的。", "score": 55.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/wLqFAuDnKa.seg_144.wav", "doc_id": "wLqFAuDnKa.seg_144", "src_text": "The summary of our experimental results is that the example quality is more important than the similarity to the source sentence.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们实验结果的总结是,示例质量比与源句子相似度更重要。", "score": 92.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/hgIDlKNiFM.seg_456.wav", "doc_id": "hgIDlKNiFM.seg_456", "src_text": "Overall, from-scratch pre-training seems to obtain higher performance on most of the tasks.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "总的来说,从头开始训练似乎在大多数任务中获得更高的性能。", "score": 80.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/FLkGnzVRew.seg_672.wav", "doc_id": "FLkGnzVRew.seg_672", "src_text": "Feel free to get in touch with us if you have any questions.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "如果您有任何问题,请随时与我们联系。", "score": 97.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/PIZEXUFLAR.seg_287.wav", "doc_id": 
"PIZEXUFLAR.seg_287", "src_text": "So during test for each task, we conduct a total of 5 experiments by evaluating the model using one of the five instructions.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "期间,每个任务我们将执行五个实验,使用每个实验中的五个指令评估模型。我们报告了均值和最大", "score": 56.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/GvEBWkLmuI.seg_857.wav", "doc_id": "GvEBWkLmuI.seg_857", "src_text": "So, while the generated personas have much higher rates of the lexicon words, the human-written ones have a much wider distribution of words, while the stereotype words that are in the generated personas are really just the words \"tall\" and \"athletic\".", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "虽然生成的个人有更高的拉克索恩词的等级,但人类写的词有更广泛的分布,而生成的个人词只是高大而有活力。", "score": 50.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dJGfOSFgZO.seg_334.wav", "doc_id": "dJGfOSFgZO.seg_334", "src_text": "For example, the bots we tested have common sense violations in around 20% of their responses.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "例如,我们测试的机器人在大约20%的回答中有常识违反。他们", "score": 67.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WBLMIsdIrq.seg_402.wav", "doc_id": "WBLMIsdIrq.seg_402", "src_text": "Now we analyze words with high P-CXMI to look for patterns between these words.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "现在我们分析了高频语音信号(HPSMI),以寻找这些字之间的模式", "score": 55.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/hgIDlKNiFM.seg_445.wav", "doc_id": "hgIDlKNiFM.seg_445", "src_text": "Is it 4 gigabytes, 8 gigabytes, or more?", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": 
"long_KIT_primary", "tgt_text": "是4GB、8GB还是更多?", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/uZBWfYjYnf.seg_103.wav", "doc_id": "uZBWfYjYnf.seg_103", "src_text": "And leverage the knowledge already acquired by the model through the attention mechanism between audio input and textual output.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "并利用模型通过输入音频和文本之间的注意力机制获得的知识。", "score": 70.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/uZBWfYjYnf.seg_93.wav", "doc_id": "uZBWfYjYnf.seg_93", "src_text": "What is simultaneous speech translation?", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "称Simulta", "score": 0.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/PIZEXUFLAR.seg_265.wav", "doc_id": "PIZEXUFLAR.seg_265", "src_text": "Recently, many studies have shown that instruction tuning enables large language models to perform on unseen tasks in a zero-shot manner by following natural instructions.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "最近,许多研究表明,指令调节允许大语言模型以完全的方式通过遵循自然指令执行未见过的任务。", "score": 79.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_816.wav", "doc_id": "WTTtiRKFZI.seg_816", "src_text": "It's absent in the second example \"Homer came and sneezed.\"", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "在第二个例子中,HomerCameandSneid了", "score": 47.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/rISrKoXQCx.seg_570.wav", "doc_id": "rISrKoXQCx.seg_570", "src_text": "So last but not least, we evaluate language models with different political leanings on hate speech detection and fake news detection to NLP applications that often involve language models and could have very 
significant implications.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "最后但不是最不重要的是,我们评估了语言模型的不同政治倾向,包括仇恨言论检测和假新闻检测,两种自然语言处理应用程序,通常都涉及语言模型。这可能会有非常重要的影响,", "score": 71.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/FLkGnzVRew.seg_645.wav", "doc_id": "FLkGnzVRew.seg_645", "src_text": "To the goal of creating a cognitive dissonance resource, we conducted a large scale annotation of dissonance relations.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "为了实现构建认知不协调资源的目标,我们进行了大规模的注释不协调关系。", "score": 86.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/XejEJmgUmE.seg_738.wav", "doc_id": "XejEJmgUmE.seg_738", "src_text": "These days large language models are coming up with longer and longer context windows.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "如今,使用较长和较长的联系窗口的大型语言模型越来越多,", "score": 53.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/XejEJmgUmE.seg_766.wav", "doc_id": "XejEJmgUmE.seg_766", "src_text": "That is, when we perturb the sentences in the acceptable domain, we see similar increase in all the perturbations and when we perturb the sentences in the unacceptable domain, we see decrease in MPP judgments in similar fashion.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "当我们在可接受的领域扰乱句子时,我们看到所有扰乱的类似增加,当我们在不可接受的领域扰乱句子时,我们看到类似的MPP判决减少。", "score": 75.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/XejEJmgUmE.seg_737.wav", "doc_id": "XejEJmgUmE.seg_737", "src_text": "The current MPP pipeline basically doesn't allow us to evaluate a model's acceptance towards longer sentences.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": 
"当前的MPP管道基本上不允许我们评估模型对更长句子的接受程度。", "score": 84.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dvGkKzmIaN.seg_507.wav", "doc_id": "dvGkKzmIaN.seg_507", "src_text": "For example, OpenAI offers a GPT based embedding API.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "例如,OpenI提供一个基于GPT的嵌入API。", "score": 64.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oeooqChmKK.seg_591.wav", "doc_id": "oeooqChmKK.seg_591", "src_text": "Natural language understanding models draw on a variety of knowledge sources, such as knowledge contained in their parameters, usually acquired by a pretraining, and knowledge given in inputs at inference time.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "国家语言理解模型从各种知识来源中提取出来,例如,通常通过预训练获得的参数中包含的知识,通常通过预训练获得的知识,", "score": 40.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_246.wav", "doc_id": "oYCKgTzTDy.seg_246", "src_text": "We found that Encoder-Decoder or Encoder-PTR can be improved by training in a mixture of various languages.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们发现编码器解码器或编码器-PDR可以通过在各种语言的混合中训练来改进。", "score": 87.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dJGfOSFgZO.seg_333.wav", "doc_id": "dJGfOSFgZO.seg_333", "src_text": "You can see that in the results of our experiment that several challenges still remain and have been precisely quantified.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "在我们的实验结果中看到,仍然存在一些挑战,并且已经精确量化了。", "score": 92.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/PIZEXUFLAR.seg_270.wav", "doc_id": "PIZEXUFLAR.seg_270", "src_text": "However, there is no large-scale publicly-available multi-modal instruction task.", 
"src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "没有大规模的公共可用的多模指令,", "score": 87.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_703.wav", "doc_id": "oaOHnMCwad.seg_703", "src_text": "We've then compared these, annotations with Social Chemistry, Delphi and GPT 4.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "然后我们将这些注释与社会化学、德尔菲和GPT4进行了比较。", "score": 82.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/PIZEXUFLAR.seg_306.wav", "doc_id": "PIZEXUFLAR.seg_306", "src_text": "So this is a QR code for our data and model.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "这是我们数据和模型的QR码。", "score": 90.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/TVCREhgqUP.seg_83.wav", "doc_id": "TVCREhgqUP.seg_83", "src_text": "In our paper, we solve a couple of interesting technical challenges.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "在我们的论文中,我们发现了一些有趣的技术挑战。", "score": 88.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/gGbuDbHhyc.seg_358.wav", "doc_id": "gGbuDbHhyc.seg_358", "src_text": "We addressed these research questions in our work and our findings are as follows.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们在我们的工作中解决了这些研究问题,以下是我们的发现。", "score": 92.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_808.wav", "doc_id": "WTTtiRKFZI.seg_808", "src_text": "So what we did, we extracted various statistics about coordination from the enhanced version of the Penn Treebank and see the paper \"Why wouldn't you use universal dependencies\" and these statistics confirm the observation made many times before 
that left conjuncts tend to be shorter.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们从加强版的潘特里银行的协调数据中提取了许多统计数据,看到了一份文件,解释为什么我们不会使用大学的依赖关系。这些统计证实了人们多次观察到的左结合体往往比右结", "score": 69.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/wLqFAuDnKa.seg_146.wav", "doc_id": "wLqFAuDnKa.seg_146", "src_text": "In particular, we compare the selecting prompts from the training data for the WMT evaluations on the dev data.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "特别是我们比较来自WMT评估的训练数据或欺骗数据的选择提示。", "score": 56.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dvGkKzmIaN.seg_539.wav", "doc_id": "dvGkKzmIaN.seg_539", "src_text": "The results on four data sets show that our embedding marker can have great detection performance while keep great utility for downstream tasks.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "四个数据集的结果表明,我们的嵌入式标记器可以具有很好的检测性能,同时为下屏幕任务提供很好的实用性。", "score": 90.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_160.wav", "doc_id": "SLpqvupgvW.seg_160", "src_text": "I'm going to talk about our work on \"Resolving Indirect Referring Expressions for Entity Selection\", in which we introduce the AltEntities Corpus.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我要谈谈我们关于解决实体选择中的间接引用表达式的工作,其中我们介绍了AlternatingEntitiesCorpus。", "score": 56.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/uZBWfYjYnf.seg_93.wav", "doc_id": "uZBWfYjYnf.seg_93", "src_text": "What is simultaneous speech translation?", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "什么是同步语音翻译?", "score": 79.0} {"audio_path": 
"data/iwslt25/IWSLT25INSTRUCT/segmented/uZBWfYjYnf.seg_118.wav", "doc_id": "uZBWfYjYnf.seg_118", "src_text": "And we also see that if we consider the actual elapsed time or the computational-aware time, that is the fastest strategy.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们还看到,如果我们考虑实际的延迟时间或计算的延迟时间,Adapt是最快的策略。", "score": 64.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/TVCREhgqUP.seg_50.wav", "doc_id": "TVCREhgqUP.seg_50", "src_text": "Compositional generalization can be understood as the ability of a learner to handle deeper recursion and unseen compositions of phrases that have been seen individually during training.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "组合一般化可以被理解为学习者处理更深层次的回归和未见过的短语组合的能力,这些短语在训练过程中被单独看到。", "score": 84.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_257.wav", "doc_id": "oYCKgTzTDy.seg_257", "src_text": "To sum up, we build XSemPLR, a unified benchmark for cross-lingual semantic parsing with multiple natural languages and meaning representations.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "适应跨语言语义分析任务。总之,我们建立了一个名为Exemplar的跨语言语义分析的统一基准,支持多种自然语言和多种表示形式。", "score": 64.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/GvEBWkLmuI.seg_861.wav", "doc_id": "GvEBWkLmuI.seg_861", "src_text": "In our analysis, we reveal how these seemingly positive portrayals reflect harmful patterns.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "在我们的分析中,我们回顾了这些似乎是积极的肖像画如何反映了有害的模式。", "score": 64.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WBLMIsdIrq.seg_426.wav", "doc_id": "WBLMIsdIrq.seg_426", "src_text": "So this sort of suggests where we would need to see more 
progress for document-level translation.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "所以这就意味着我们需要为文档级传输做更多的进步。", "score": 84.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_248.wav", "doc_id": "oYCKgTzTDy.seg_248", "src_text": "I think this is known as the \"Curse of Multilinguality\".", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我认为这是多语言的诅咒。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_806.wav", "doc_id": "WTTtiRKFZI.seg_806", "src_text": "It violates one principle, but it satisfies another one.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "它违反了一个原则,但它满足了另一个原则。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/GvEBWkLmuI.seg_877.wav", "doc_id": "GvEBWkLmuI.seg_877", "src_text": "We just really can't make any assumptions or really study that further, without more transparency.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": ",我们真的不能做出任何假设,或者进一步研究这些问题,更多的透明度。", "score": 62.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/FLkGnzVRew.seg_641.wav", "doc_id": "FLkGnzVRew.seg_641", "src_text": "Studying cognitive dissonance can help us understand the effects of disagreement among people, track trends and belief values, and attitude changes in population.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "学习认知差异可以帮助我们理解人际之间的冲突:追踪趋势和信仰,价值观和态度在人口中的变化。", "score": 90.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/TVCREhgqUP.seg_54.wav", "doc_id": "TVCREhgqUP.seg_54", "src_text": "And \"Mary knew that the girl slept.\"", "src_text_system": "human", 
"src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "玛丽知道女孩睡着了。", "score": 91.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/aQpIWggfCo.seg_2.wav", "doc_id": "aQpIWggfCo.seg_2", "src_text": "In everyday life, humans often plan their actions by following step-by-step instructions in the form of goal-oriented scripts.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "在日常生活中,人类通常通过遵循目标脚本中的一步步指示来规划自己的行动。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_165.wav", "doc_id": "SLpqvupgvW.seg_165", "src_text": "Here, a user wants to select between one of these two songs.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "在这里,用户想选择这两首歌中的一个。", "score": 97.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_708.wav", "doc_id": "oaOHnMCwad.seg_708", "src_text": "We find that there is positionality in NLP.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们发现NLP中存在位置性。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WBLMIsdIrq.seg_425.wav", "doc_id": "WBLMIsdIrq.seg_425", "src_text": "But these models are not much better than models that do not use context on other phenomena like ellipsis, pronouns, and verb form.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "但这些模型并不比没有使用联系其他现象(如椭圆、圆锥和形式)的模型好,", "score": 43.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_218.wav", "doc_id": "oYCKgTzTDy.seg_218", "src_text": "And Cross-Lingual Semantic Parsing is the task to translate queries in multiple natural languages into multiple meaning representations.", "src_text_system": "human", "src_lang": "en", 
"tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "跨语言语义解析是任务,翻译查询到多种自然语言。", "score": 55.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SUkmfOTvGi.seg_476.wav", "doc_id": "SUkmfOTvGi.seg_476", "src_text": "So what is needed for a good generalization?", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "那么,什么是好的概括所需要的?", "score": 72.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dvGkKzmIaN.seg_512.wav", "doc_id": "dvGkKzmIaN.seg_512", "src_text": "First the method should be applicable to embedding as services.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "首先该方法应该适用于嵌入和服务;", "score": 64.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/XejEJmgUmE.seg_759.wav", "doc_id": "XejEJmgUmE.seg_759", "src_text": "And there we see that the MPP judgments either increase or decrease significantly when you add either acceptable prefixes or unacceptable prefixes.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "到MP判断在添加可接受前缀或不可接受前缀时会显著增加或减少。", "score": 84.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/wLqFAuDnKa.seg_152.wav", "doc_id": "wLqFAuDnKa.seg_152", "src_text": "The insights that we gained from the human evaluation that we performed using the MQM framework said that the fluency of PaLM is comparable to state-of-the-art systems but the main difference comes from the accuracy.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "twithGoogleTranslate.“哇!我们从人类评估中获得的见解是:使用MCMF框架,我们可以实现与当前的SOTA系统相当的流畅性,但主要的差异来自准确性。", "score": 45.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_697.wav", "doc_id": "oaOHnMCwad.seg_697", "src_text": "We then take the annotations by 
demographic and compare them to the models and datasets using a Pearson's R correlation score, and thus our framework actually differs from annotator disagreement literature by comparing end users with models and datasets, predictions and labels, as opposed to looking at just annotator agreement or modelling annotator distributions.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "然后我们使用人口统计学的注释将它们与模型和数据集进行比较。因此,我们的框架实际上与注释器协议文档不同,因为我们将最终用户与模型和数据集、预测和标签进行比较,而不是仅仅注释器协议或模型化注释器分发。", "score": 63.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/gGbuDbHhyc.seg_348.wav", "doc_id": "gGbuDbHhyc.seg_348", "src_text": "If we directly train neural networks on weakly labeled data, the neural networks tend to memorize the label noise and do not generalize.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "如果我们直接训练神经网络并对数据进行弱标签,则神经网络往往会记住标签噪音,而不会进行泛化。", "score": 84.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_166.wav", "doc_id": "SLpqvupgvW.seg_166", "src_text": "The most obvious thing is to use a direct reference, for example by saying the name of the song \"Easy on Me\" or its position, \"the first one\".", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "最明显的方法是使用直接参考,例如说这首歌的名字“易于我”或它的位置,第一首。", "score": 70.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dJGfOSFgZO.seg_315.wav", "doc_id": "dJGfOSFgZO.seg_315", "src_text": "Therefore, you might want to evaluate multiple dimensions of chat quality to understand the strengths and weaknesses of the model on a finer-grained level.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "因此你可能想评估多个对话质量的维度,以了解模型在更细粒度的层面上的优缺点。", "score": 85.0} {"audio_path": 
"data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_246.wav", "doc_id": "oYCKgTzTDy.seg_246", "src_text": "We found that Encoder-Decoder or Encoder-PTR can be improved by training in a mixture of various languages.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "置中,并发现编码器解码器或编码器PDTR可以通过在混合各种语言的混合中进行训练来改进。", "score": 64.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/GvEBWkLmuI.seg_831.wav", "doc_id": "GvEBWkLmuI.seg_831", "src_text": "However, these measures have various limitations.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "然而,这些措施有各种限制:", "score": 95.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/GvEBWkLmuI.seg_847.wav", "doc_id": "GvEBWkLmuI.seg_847", "src_text": "The benefit of this is that we get really specific stereotypes and patterns, without having to rely on any specific lexicon.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "这的好处是我们得到真正具体的刻板印象和模式,", "score": 84.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oeooqChmKK.seg_601.wav", "doc_id": "oeooqChmKK.seg_601", "src_text": "Servin is a judge.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "塞尔文是法官,", "score": 94.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/gGbuDbHhyc.seg_357.wav", "doc_id": "gGbuDbHhyc.seg_357", "src_text": "Finally, should we only use the clean samples for validation, or there are better ways to utilize them?", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "最后,我们只应该使用清洁样本进行验证,还是有更好的方法可以使用它们?", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/GvEBWkLmuI.seg_851.wav", "doc_id": "GvEBWkLmuI.seg_851", 
"src_text": "And more broadly, dominant groups in society are both linguistically and socially unmarked, while the marginalized groups are usually marked.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "更广泛地说,社会中的主导群体在语言上和社会上都没有标记,而边缘群体通常是有标记的。", "score": 87.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/hgIDlKNiFM.seg_448.wav", "doc_id": "hgIDlKNiFM.seg_448", "src_text": "One based on the weight of CamemBERT and trained on a 4 GB set of NACHOS.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "一项基于Camembert的权重并在4倍的NACHOS上进行训练。", "score": 88.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_173.wav", "doc_id": "SLpqvupgvW.seg_173", "src_text": "We're not aware of a larger-scale public data set for the task, so we collect one using crowd annotation.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们不知道公共数据集(一个大规模的公共数据集用于任务),所以我们使用人群计数收集一个数据集。", "score": 96.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/hgIDlKNiFM.seg_440.wav", "doc_id": "hgIDlKNiFM.seg_440", "src_text": "Specialized models for other languages are scarce and are often based on continual pre-training due to the lack of in-domain data.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "其他语言的专用模型很少,通常基于连续预训练,因为缺乏域内数据。然而,Fre", "score": 53.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/gGbuDbHhyc.seg_354.wav", "doc_id": "gGbuDbHhyc.seg_354", "src_text": "The aforementioned doubt is asked to ask three research questions.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "上面提到的疑问使我们不得不提出三个研究问题:", "score": 95.0} {"audio_path": 
"data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_199.wav", "doc_id": "SLpqvupgvW.seg_199", "src_text": "For the recipes and books domain, we show some background text from Wikipedia.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "对于食谱和书籍领域,我们展示一些来自维基百科的背景文本。", "score": 97.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/FLkGnzVRew.seg_672.wav", "doc_id": "FLkGnzVRew.seg_672", "src_text": "Feel free to get in touch with us if you have any questions.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "如果您有任何问题,请随时与我们联系。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_253.wav", "doc_id": "oYCKgTzTDy.seg_253", "src_text": "We found that, by comparing the green and orange line, we found the Zero-shot setting, the Cross-lingual transfer performance gap is significant, and then comparing the blue and orange lines, we found that with the Few-shot setting the transfer gap is shortened rapidly.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "通过比较绿色和橙色线,我们发现在零shot设置中,跨语言转移性能差距很大,并且通过比较蓝色和橙色线,我们发现在零shot设置中,跨语言转移性能差距很大。我们发现在几秒钟的设置下,传输差距迅速缩小。", "score": 43.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/uZBWfYjYnf.seg_106.wav", "doc_id": "uZBWfYjYnf.seg_106", "src_text": "A word is emitted if the attention is not concentrated, that is, its sum is below a certain threshold alpha towards the last lambda speech frames, meaning that the received information is enough stable.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "如果没有集中注意力,那么发出的词汇将会被发出,如果其总和低于某个阈值(即Alpha),则意味着所接收的信息是不稳定的。", "score": 44.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WBLMIsdIrq.seg_409.wav", "doc_id": 
"WBLMIsdIrq.seg_409", "src_text": "We then look at vocabulary items that have high P-CXMI averaged over all of its different occurrences.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们看到了语料库中的语料,高频词在所有不同情况下都有很高的平均频率。", "score": 70.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/rISrKoXQCx.seg_568.wav", "doc_id": "rISrKoXQCx.seg_568", "src_text": "We can see that language models generally had a political leaning that is further away from the centre after 2017.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "可以看到语言模型通常具有政治倾向,这种倾向在2017年之后从中心向外延伸,因此", "score": 75.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/aQpIWggfCo.seg_12.wav", "doc_id": "aQpIWggfCo.seg_12", "src_text": "As shown in the table, we extend the abstract goals with multi-faceted constraints for human-in-the-loop data acquisition using InstructGPT.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "如表所示,我们扩展了抽象目标的多阶段约束,以便人类在使用InstrGPT进行数据采集时可以更好地理解。", "score": 44.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/hgIDlKNiFM.seg_443.wav", "doc_id": "hgIDlKNiFM.seg_443", "src_text": "To answer this question, we compare DrBERT with our ChuBERT model, which is based on anonymized data obtained from the Nantes University Hospital data warehouse.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "为了回答这个问题,我们将与伯特医生和我们的舒伯特模型进行比较,舒伯特模型是基于我们收到的非大学院的匿名数据。", "score": 57.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/uZBWfYjYnf.seg_92.wav", "doc_id": "uZBWfYjYnf.seg_92", "src_text": "Hi, I'm Sara Papi from the University of Trento and Foundazione Bruno Kessler and I will briefly introduce the \"Attention as a Guide for Simultaneous Speech Translation\" 
paper, that is a joint work with Matteo Negri and Marco Turchi.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我是特伦特大学和布鲁诺·凯斯勒基金会的塞拉·帕比。我将简要介绍作为指南的注意力,用于同时翻译演讲文本,这是与马特奥·内格里和马可·图尔基共同工作的。", "score": 82.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SUkmfOTvGi.seg_489.wav", "doc_id": "SUkmfOTvGi.seg_489", "src_text": "And this shows us that adaptive overfitting in this case is not observed.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "这表明在这种情况下没有观察到适应性超适应。", "score": 89.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_689.wav", "doc_id": "oaOHnMCwad.seg_689", "src_text": "So prior work has suggested some anecdotal evidence of having positionality, such as cultural gaps and models and data sets, as well as theoretical definitions of model positionality.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "因此,前人工作已经提出了某些轶事性的位置性证据,如模型和数据集中的文化差异,以及模型位置性的理论定义。", "score": 90.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oeooqChmKK.seg_589.wav", "doc_id": "oeooqChmKK.seg_589", "src_text": "Hello everyone, I'm Akshatha, and today my co-author Martin and I are presenting our work \"The KITMUS Test: Evaluating Knowledge Integration from Multiple Sources.\"", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "欢迎大家,今天我和我的同事马丁一起演示我们的工作,评估来自多个来源的知识积累。", "score": 46.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dvGkKzmIaN.seg_520.wav", "doc_id": "dvGkKzmIaN.seg_520", "src_text": "Embedding marker contains two main steps.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "嵌入标记器包含两个主要步骤。", "score": 95.0} {"audio_path": 
"data/iwslt25/IWSLT25INSTRUCT/segmented/aQpIWggfCo.seg_15.wav", "doc_id": "aQpIWggfCo.seg_15", "src_text": "We find that all language models achieve unsatisfactory results on planning for specific goals.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们发现,所有轻量级模型在规划特定目标方面的结果都令人不满意。", "score": 85.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/uZBWfYjYnf.seg_100.wav", "doc_id": "uZBWfYjYnf.seg_100", "src_text": "So what is our solution?", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们的解决方案是:", "score": 78.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dJGfOSFgZO.seg_318.wav", "doc_id": "dJGfOSFgZO.seg_318", "src_text": "Our approach attempts to reduce the subjectivity of human evaluation by explicitly annotating whether or not each model response expresses certain behaviors, such as responding with irrelevant information or contradicting itself.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们的方法尝试通过明确注释每个模型响应是否表达某些行为来减少人类评估的主观性,例如响应与无关信息或自相矛盾。", "score": 83.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/hgIDlKNiFM.seg_458.wav", "doc_id": "hgIDlKNiFM.seg_458", "src_text": "Which is not the case for the model based on CamemBERT weights and tokenizer, which suffer from stability issues.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "这并不是基于CommonBERT权重和Tokenizer的模型的案例,因为这些模型存在稳定性问题。", "score": 80.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/TVCREhgqUP.seg_77.wav", "doc_id": "TVCREhgqUP.seg_77", "src_text": "Then we jump to the next multiset token, to determine the second token in the output.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": 
"short_NLE_primary", "tgt_text": "然后,我们跳到下一个多集令牌,以确定输出中的第二个令牌。", "score": 82.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/TVCREhgqUP.seg_82.wav", "doc_id": "TVCREhgqUP.seg_82", "src_text": "Some other kinds of structural generalization remain very challenging, though.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "更大的优势。其他类型的结构化转换也很有挑战性。", "score": 88.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_254.wav", "doc_id": "oYCKgTzTDy.seg_254", "src_text": "We also find some other interesting findings.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们还发现了一些其他有趣的发现:", "score": 90.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_250.wav", "doc_id": "oYCKgTzTDy.seg_250", "src_text": "In this figure, the blue line is Cross-lingual Few-shot transfer.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "在这个图中,蓝色线是跨语言零shot转移,", "score": 76.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/uZBWfYjYnf.seg_97.wav", "doc_id": "uZBWfYjYnf.seg_97", "src_text": "Long and complicated training procedures, for example, training involving different optimization objectives.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "训练过程长且复杂,例如训练涉及不同的优化目标。并且训练和", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_257.wav", "doc_id": "oYCKgTzTDy.seg_257", "src_text": "To sum up, we build XSemPLR, a unified benchmark for cross-lingual semantic parsing with multiple natural languages and meaning representations.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": 
"总结起来,我们构建了示例,一个用于双语语法解析的统一基准,具有多种自然语言和多种代表。", "score": 76.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WBLMIsdIrq.seg_392.wav", "doc_id": "WBLMIsdIrq.seg_392", "src_text": "Firstly because only a small portion of translations depend on context which makes corpus-level metrics like BLEU unable to capture these translations.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "首先,因为只有很小一部分翻译依赖于语境,这使得蓝色等体层级矩阵无法捕获这些翻译。", "score": 80.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/PIZEXUFLAR.seg_289.wav", "doc_id": "PIZEXUFLAR.seg_289", "src_text": "If the task is a multi-model classification task, we report accuracy.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "如果任务是多模分类任务,我们报告准确度;", "score": 90.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/FLkGnzVRew.seg_648.wav", "doc_id": "FLkGnzVRew.seg_648", "src_text": "As can be seen here, dissonance was only found in 3.5% of the annotated pairs.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "正如在这里可以看到的,异议只在注释的对话中发现。", "score": 63.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dJGfOSFgZO.seg_321.wav", "doc_id": "dJGfOSFgZO.seg_321", "src_text": "ABC-Eval is capable of measuring the rates at which chat models will commit various thematic errors.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "ABC-EVAL可以衡量那些会犯各种主题错误的聊天机器人。", "score": 84.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/XejEJmgUmE.seg_729.wav", "doc_id": "XejEJmgUmE.seg_729", "src_text": "I'm Koustav Sinha, and I'm pleased to welcome you to our talk of our ACL 2023 paper.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", 
"tgt_text": "各位,我是科斯塔夫·塞纳,很高兴与大家分享我们的ACLU2023论文:", "score": 90.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dvGkKzmIaN.seg_502.wav", "doc_id": "dvGkKzmIaN.seg_502", "src_text": "Are you copying my model?", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我要复制我的模型,", "score": 60.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_248.wav", "doc_id": "oYCKgTzTDy.seg_248", "src_text": "I think this is known as the \"Curse of Multilinguality\".", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我认为这被称为多语言的诅咒。", "score": 93.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oeooqChmKK.seg_611.wav", "doc_id": "oeooqChmKK.seg_611", "src_text": "We have defined three settings of KITMUS.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们定义了三种凯德莫斯的设置:", "score": 86.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/GvEBWkLmuI.seg_856.wav", "doc_id": "GvEBWkLmuI.seg_856", "src_text": "However, when we actually look at the distribution of the words and lexicon, we find very different things.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "然而,当我们实际上看一下词典中的词的分布时,我们发现了非常不同的东西,所以", "score": 63.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/PIZEXUFLAR.seg_291.wav", "doc_id": "PIZEXUFLAR.seg_291", "src_text": "We also introduce an additional evaluation metric called sensitivity.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们还引入了一个额外的评估指标,称为敏感度。", "score": 83.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oeooqChmKK.seg_628.wav", "doc_id": "oeooqChmKK.seg_628", "src_text": "However, with task-specific training, 
some models successfully integrate knowledge from multiple sources.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "然而,通过任务特定训练,某些模型成功地整合了来自多个来源的知识。", "score": 88.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/XejEJmgUmE.seg_740.wav", "doc_id": "XejEJmgUmE.seg_740", "src_text": "We're trying to revisit the MPP pipeline by asking the model to evaluate acceptability on longer and longer sequences.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们试图通过要求模型对更长和更长的序列进行可接受性评估来重新访问NBP管道。所以,", "score": 64.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_241.wav", "doc_id": "oYCKgTzTDy.seg_241", "src_text": "And we also find many interesting results.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们还发现了许多有趣的结果。", "score": 94.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_697.wav", "doc_id": "oaOHnMCwad.seg_697", "src_text": "We then take the annotations by demographic and compare them to the models and datasets using a Pearson's R correlation score, and thus our framework actually differs from annotator disagreement literature by comparing end users with models and datasets, predictions and labels, as opposed to looking at just annotator agreement or modelling annotator distributions.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "然后我们根据人口统计学对注释进行分类,并将它们与使用Pearson相关系数进行比较的模型和数据集。因此,我们的框架实际上与注释者不一致的文献不同,因为我们将最终用户与模型、数据集、预测和标签进行比较,而不是仅仅注释者一致或建模注释者分布。", "score": 90.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/hgIDlKNiFM.seg_454.wav", "doc_id": "hgIDlKNiFM.seg_454", "src_text": "However, we can observe that data from heterogeneous sources appear to be more versatile.", 
"src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们可以观察到来自异源数据的数据似乎更具多样性,", "score": 91.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/rISrKoXQCx.seg_547.wav", "doc_id": "rISrKoXQCx.seg_547", "src_text": "Today I'm presenting our work \"From Pretraining Data to Language Models to Downstream Tasks: Tracking the Trails of Political Biases Leading to Unfair NLP Models\".", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "今天,我要从预备数据到语言模型,到下游任务,跟踪政治行为导致不公正的ALV模型的轨迹。", "score": 50.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/FLkGnzVRew.seg_644.wav", "doc_id": "FLkGnzVRew.seg_644", "src_text": "Finally, cognitive dissonance is important to understand personal cognitive styles of individuals and helps us understand decision making processes better.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "最后,认知差异很重要,需要了解个人认知风格,以便我们能更好地理解决策过程。", "score": 69.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dJGfOSFgZO.seg_314.wav", "doc_id": "dJGfOSFgZO.seg_314", "src_text": "These approaches work well to provide holistic evaluations of overall dialogue quality, but dialogue quality has many aspects.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "这些方法在提供对话质量的整体评估方面很好,但对话质量有很多方面,", "score": 90.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_773.wav", "doc_id": "WTTtiRKFZI.seg_773", "src_text": "So for example, in the universal dependencies, the structure of the coordination, Lisa, Bart, and Maggie, such that the first conjunct is the head of the whole coordinate structure.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": 
"例如,在宇宙依赖中,协调的协调结构是由LisaA.Batty和Maggie所承认的。这是这样的:第一个结合体是整个心脏结构的头部,所以", "score": 68.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_825.wav", "doc_id": "WTTtiRKFZI.seg_825", "src_text": "So see the paper for the full arguments.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "和非对称性结构(如这两种)的论据。因此,请参阅论文中的完整协议和论据,", "score": 70.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/FLkGnzVRew.seg_672.wav", "doc_id": "FLkGnzVRew.seg_672", "src_text": "Feel free to get in touch with us if you have any questions.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "如果您有任何问题,请随时与我们联系。", "score": 90.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/TVCREhgqUP.seg_79.wav", "doc_id": "TVCREhgqUP.seg_79", "src_text": "We continue this process until every token from the first stage has been visited exactly once.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们继续这个过程吗?直到第一阶段的每个令牌都被访问了一次。", "score": 71.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_782.wav", "doc_id": "WTTtiRKFZI.seg_782", "src_text": "And finally, there's also a multi-headed approach that's used, for example, in the Hudson's Word Grammar, where they say all conjuncts are heads of the coordinate structure.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "最后,这也是一个多头的方法——例如,在《卡特森斯世界语法》中使用。在这种情况下,所有的管制结构都处于上级的管制之下,", "score": 79.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/TVCREhgqUP.seg_74.wav", "doc_id": "TVCREhgqUP.seg_74", "src_text": "Conceptually, our permutation model works roughly like this.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", 
"tgt_text": "从概念上讲,我们的置换模型大致如此。", "score": 89.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_159.wav", "doc_id": "SLpqvupgvW.seg_159", "src_text": "Hi!", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "嗨,", "score": 96.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oeooqChmKK.seg_618.wav", "doc_id": "oeooqChmKK.seg_618", "src_text": "In the Background-Pretrain setting, we assume that the background knowledge \"Politicians seek elected seats in government\" is contained in the pretrained parameters and in inference-time context we provide the entity-specific knowledge \"Chichester is a politician.\"", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "回到预训练设置。我们假设背景知识:政客寻求在政府中获得选举席位。它包含在预训练参数中。在这个背景下。我们提供了非特定知识:Chichester是一名政治家。", "score": 63.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/PIZEXUFLAR.seg_307.wav", "doc_id": "PIZEXUFLAR.seg_307", "src_text": "Thank you.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "谢谢。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_699.wav", "doc_id": "oaOHnMCwad.seg_699", "src_text": "In Live in the Wild is an online experimentation platform where we can recruit divers volunteers.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "在世界上是在线实验平台,我们可以从世界各地招募志愿者,", "score": 73.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/GvEBWkLmuI.seg_845.wav", "doc_id": "GvEBWkLmuI.seg_845", "src_text": "And also this enables direct comparison between our generated personas and the human written responses.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": 
"这也使得我们与我们的生成人格之间可以直接比较,和人类的回应。", "score": 64.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/XejEJmgUmE.seg_733.wav", "doc_id": "XejEJmgUmE.seg_733", "src_text": "So the minimal pair paradigm basically evaluates language models on top of acceptability judgments.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "最小对数对范式基本上评估语言模型的可接受性,包括可接受性判断、语法性如BLIMP、语法性如语法性、可", "score": 70.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/XejEJmgUmE.seg_749.wav", "doc_id": "XejEJmgUmE.seg_749", "src_text": "So here the sentences are still coming from a, relevant data sets but it's not from the same data set that you are evaluating with.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "因此,句子仍然来自相关的数据集,但不是您正在评估的相同数据集。我们也可以", "score": 81.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_812.wav", "doc_id": "WTTtiRKFZI.seg_812", "src_text": "So the proportion is bigger of the left short conjunct.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "间的长度差异越大时,短语越短", "score": 13.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/FLkGnzVRew.seg_633.wav", "doc_id": "FLkGnzVRew.seg_633", "src_text": "I would like to present our work accepted into ACL 2023 as a long paper, \"Transfer Learning for Dissonance Detection: Addressing the Rare-Class Challenge.\"", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我想介绍我们的工作,已被接受为ACL2023的长篇论文:TransferLearningforDissonanceDetection,AddressingtheRareClassChal", "score": 40.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/PIZEXUFLAR.seg_301.wav", "doc_id": "PIZEXUFLAR.seg_301", "src_text": "As we can see by transfer learning from natural instruction datasets, the model can achieve 
much better sensitivity compared to the original OFA model.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "正如我们所看到的,通过从自然指令数据集中转移学习,模型可以在与原始的OWA模型相比,获得更好的敏感性。.", "score": 81.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/gGbuDbHhyc.seg_342.wav", "doc_id": "gGbuDbHhyc.seg_342", "src_text": "In this video, I would like to present our recent work \"Weaker Than You Think: A Critical Look at Weakly Supervised Learning.\"", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "13.14.15.16.17.18.19.20.21.22.23.24.25.26.2", "score": 0.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_243.wav", "doc_id": "oYCKgTzTDy.seg_243", "src_text": "And, we also evaluate Encoder-Decoder models, which is Multilingual Pretrained Encoder-Decoder Models, such as mBART and mT5.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们还评估了编码器解码器模型,这是多语言预训练的编码器解码器模型,例如AmBart和MT5。", "score": 80.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_232.wav", "doc_id": "oYCKgTzTDy.seg_232", "src_text": "And we'll also test Monolingual Model.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们还将测试单线语模型。", "score": 97.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dJGfOSFgZO.seg_325.wav", "doc_id": "dJGfOSFgZO.seg_325", "src_text": "For each of the existing methods, we collected evaluations on eight of the most commonly measured aspects of dialogue, since this is the standard practice for evaluating chat models along multiple dimensions.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "对于每一种现有的方法,我们收集了对八种最常见的对话方面的评估,因为这是评估多维度聊天机器人评估标准实践。", 
"score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/rISrKoXQCx.seg_569.wav", "doc_id": "rISrKoXQCx.seg_569", "src_text": "So this indicates that language models can also pick up the polarisation in our society.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "这表明语言模型也可以捕捉到我们社会中类似于政治化的趋势。所以,", "score": 47.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WBLMIsdIrq.seg_386.wav", "doc_id": "WBLMIsdIrq.seg_386", "src_text": "So a lot of translations depend on context.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "很多翻译都取决于上下文:", "score": 96.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SUkmfOTvGi.seg_499.wav", "doc_id": "SUkmfOTvGi.seg_499", "src_text": "Thank you so much.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "谢谢。", "score": 96.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_717.wav", "doc_id": "oaOHnMCwad.seg_717", "src_text": "So, given that there is positionality in NLP, what can we do about it?", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "因此,鉴于LED和LP有位置,我们可以做什么?", "score": 80.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dvGkKzmIaN.seg_511.wav", "doc_id": "dvGkKzmIaN.seg_511", "src_text": "The watermark method need to meet the following properties.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "水印方法需要满足以下属性:", "score": 91.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WBLMIsdIrq.seg_403.wav", "doc_id": "WBLMIsdIrq.seg_403", "src_text": "And we perform our analysis on transcripts of TED talks that have been translated from English to 14 different languages.", 
"src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们在TED演讲的翻译文本上进行了分析,这些文本已经从英语翻译成14种不同的语言。", "score": 92.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oeooqChmKK.seg_602.wav", "doc_id": "oeooqChmKK.seg_602", "src_text": "Kea is a Baker.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "凯亚是面包师,", "score": 90.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_791.wav", "doc_id": "WTTtiRKFZI.seg_791", "src_text": "Because here between the verb and the direct object is an adjunct: \"yesterday\".", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书", "score": 0.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/PIZEXUFLAR.seg_266.wav", "doc_id": "PIZEXUFLAR.seg_266", "src_text": "However, most previous works on instruction tuning focused on improving the zero-shot performance on language only tasks, while computer vision and multi-modal tasks have been left out.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "然而,大多数以前关于指令调优的研究都集中在改善语言任务的零样本性能,而忽略了计算机视觉和多模态任务。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/gGbuDbHhyc.seg_347.wav", "doc_id": "gGbuDbHhyc.seg_347", "src_text": "When compared to human annotations, the weaker annotations are much cheaper, yet they are also noisy, meaning that a certain amount of the annotations are incorrect.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "与人类注音相比,弱注音更便宜,但它们也很吵闹,这意味着某些注音是不正确的。", "score": 58.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_698.wav", "doc_id": "oaOHnMCwad.seg_698", "src_text": "Our frame is 
largely enabled through Lab in the Wild and online crowdsourcing platform for where HCI collaborator.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们的框架主要通过LabintheWild来实现,一个由我们合作的H.C.I.合作伙伴创建的在线众包平台。", "score": 60.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_187.wav", "doc_id": "SLpqvupgvW.seg_187", "src_text": "Where A and B are samples from Wikipedia.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "A和B是来自维基百科的样本。", "score": 95.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oeooqChmKK.seg_612.wav", "doc_id": "oeooqChmKK.seg_612", "src_text": "First, we have the typical setting: \"Background-Pretrain\", where background knowledge is assumed to be available at pretrain time.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "首先,我们需要录制。“我们将在这里等待。预训练时假设背景知识可用。", "score": 37.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_189.wav", "doc_id": "SLpqvupgvW.seg_189", "src_text": "When we move higher in the list, the entities become more similar to each other and it's usually harder to make the disambiguation.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "当我们在列表中移动到更高的位置时,实体变得彼此更相似,通常很难消除这种模糊。", "score": 91.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/TVCREhgqUP.seg_48.wav", "doc_id": "TVCREhgqUP.seg_48", "src_text": "My name is Matthias Lindemann, and today I'm going to give you a brief introduction to our paper on \"Compositional Generalization without Trees using Multiset Tagging and Latent Permutations\".", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": 
"我叫马蒂亚斯·林德曼,我今天要给你介绍一下我们关于无树木的组合通用化的论文,使用多元集标记和隐式变换。", "score": 64.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/wLqFAuDnKa.seg_142.wav", "doc_id": "wLqFAuDnKa.seg_142", "src_text": "And when we go, as in our case, to five-shot prompting, there is nearly no difference to the actual form of the prompting.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "发的情况下,与实际弹发形式没有任何不同。", "score": 47.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dvGkKzmIaN.seg_526.wav", "doc_id": "dvGkKzmIaN.seg_526", "src_text": "When a user send a sentence to the provider service the provider counts the trigger number in the sentence.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "当用户将一句话发送到服务提供商服务时,服务提供商会计算这句话中的触发器数量。", "score": 82.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_236.wav", "doc_id": "oYCKgTzTDy.seg_236", "src_text": "For example, we put the German, English, Chinese queries together to train a multilingual model.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "例如,我们将德语、英语和中文的询问结合在一起,以训练多语言模型,并", "score": 93.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/gGbuDbHhyc.seg_360.wav", "doc_id": "gGbuDbHhyc.seg_360", "src_text": "Otherwise, there is a large performance drop.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "否则,性能下降很大,", "score": 90.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/GvEBWkLmuI.seg_838.wav", "doc_id": "GvEBWkLmuI.seg_838", "src_text": "So here are some example generations from GPT-4.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "以这里有一些来自GPT-4的示例世代。", "score": 91.0} {"audio_path": 
"data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_797.wav", "doc_id": "WTTtiRKFZI.seg_797", "src_text": "It's okay the way instead of \"it\", we have this long NP.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我", "score": 0.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/aQpIWggfCo.seg_0.wav", "doc_id": "aQpIWggfCo.seg_0", "src_text": "Hi, I'm Siyu Yuan from Fudan University.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我是来自复旦大学的徐元,今天", "score": 62.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SUkmfOTvGi.seg_465.wav", "doc_id": "SUkmfOTvGi.seg_465", "src_text": "Let's get started.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "让我们开始吧。", "score": 96.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SUkmfOTvGi.seg_469.wav", "doc_id": "SUkmfOTvGi.seg_469", "src_text": "And when we develop new taggers, what is needed for good generalization?", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "当我们开发新的标签时,我们需要什么才能进行良好的泛化?", "score": 90.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_188.wav", "doc_id": "SLpqvupgvW.seg_188", "src_text": "Here are the different sampling methods we've used.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "在我们上升到", "score": 95.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/aQpIWggfCo.seg_17.wav", "doc_id": "aQpIWggfCo.seg_17", "src_text": "Results in the figure show that the semantic completeness in generated scripts is acceptable but the faithfulness to the constraints cannot be guaranteed.", "src_text_system": "human", "src_lang": 
"en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "图表结果表明,生成的脚本的语义完整性是可接受的,但忠诚度不足。不能保证满足这些约束。", "score": 61.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/hgIDlKNiFM.seg_446.wav", "doc_id": "hgIDlKNiFM.seg_446", "src_text": "To answer this question, we first train and compare four from-scratch models: a first version of DrBERT, with 7 GB of NACHOS; a second version of 4 GB of set of NACHOS; a first version of ChuBERT, which is a clinical model with 4 GB of sentences taken from clinical notes; and a final version of ChuBERT with a mix of 4 GB of set of NACHOS and 4 GB of clinical notes.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "因此,我们首先训练并比较四种擦伤模型:贝特博士的第一种模型,具有七千千克的纳塞斯;贝特博士的第二种模型,具有四千千克的纳塞斯。舒伯特的第一版是临床模型,包含来自临床笔记的四个千兆字节的句子;舒伯特的最后版包含来自临床笔记的四千兆字节的句子和四千兆字节的临床笔记。", "score": 40.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dvGkKzmIaN.seg_527.wav", "doc_id": "dvGkKzmIaN.seg_527", "src_text": "The provided embedding is a weight summation of the target embedding and the original embedding.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "提供的嵌入是目标嵌入和原始嵌入的加权和。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/TVCREhgqUP.seg_89.wav", "doc_id": "TVCREhgqUP.seg_89", "src_text": "That's because this is related to the \"Traveling Salesman\" problem.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "这是因为这与旅行推销员问题有关。", "score": 87.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/aQpIWggfCo.seg_10.wav", "doc_id": "aQpIWggfCo.seg_10", "src_text": "In this paper, we first evaluate and improve the constrained language planning ability of large language models.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", 
"tgt_system": "long_KIT_primary", "tgt_text": "在本文中,我们首先评估并改进大型语言模型的受限语言规划能力。", "score": 93.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WBLMIsdIrq.seg_413.wav", "doc_id": "WBLMIsdIrq.seg_413", "src_text": "And this allows us to identify phenomena that cannot really be captured by the word itself, but that's rather expressed in the sentence structure, such as ellipses resolution.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "这使我们能够识别出不能真正被单词本身捕捉到的现象,但更倾向于在句子结构中表达出来的现象,如椭圆解析度。", "score": 93.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/GvEBWkLmuI.seg_850.wav", "doc_id": "GvEBWkLmuI.seg_850", "src_text": "So when people are describing a warrior who is a woman, they'll usually actually specify \"woman warrior\" and mark the term with \"woman\".", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "所以当人们描述一个女性战士时,他们通常会具体指出“一名女性战士”并标记该词语为女性。", "score": 97.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/TVCREhgqUP.seg_47.wav", "doc_id": "TVCREhgqUP.seg_47", "src_text": "Hi!", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "你好,", "score": 90.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_179.wav", "doc_id": "SLpqvupgvW.seg_179", "src_text": "In the second speech bubble, Alice says, \"Do you mean 'Easy on Me' or 'I Gotta Feeling'?\"", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "在第二个演讲泡沫中,爱丽丝说:“你是指我很容易吗?还是我有感觉?", "score": 86.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_776.wav", "doc_id": "WTTtiRKFZI.seg_776", "src_text": "So these two approaches are asymmetric.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": 
"long_KIT_primary", "tgt_text": "所以,‘Marchredyesterday’是好的,因为直接对象靠近动词,而‘Marchredyesterday’是很差的,因为同位语更远。这两句话都很", "score": 15.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_245.wav", "doc_id": "oYCKgTzTDy.seg_245", "src_text": "And we evaluate on mT5 and XLM-R + PTR on multilingual setting.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们评估了MT5和XL-MLR+PDTR在多语言设", "score": 86.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/uZBWfYjYnf.seg_105.wav", "doc_id": "uZBWfYjYnf.seg_105", "src_text": "Our solution is to propose EDAtt, or Encoder-Decoder Attention, and it is a strategy for which we decide whether to emit or not a partial translation, based on where attention points to.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们的解决方案是建议使用“EDA”或编码器-解码器注意力,并且这是我们决定是否要发出部分翻译的策略,根据注意力点的位置。一个单词", "score": 65.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SUkmfOTvGi.seg_479.wav", "doc_id": "SUkmfOTvGi.seg_479", "src_text": "Through our experiments we found that the transformer models normally generalize better to new data.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "通过我们的实验,我们发现变换器模型通常对新数据的泛化更好。", "score": 87.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/TVCREhgqUP.seg_81.wav", "doc_id": "TVCREhgqUP.seg_81", "src_text": "Our model outperforms the others by a large margin on generalization to deeper recursion.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们的模型在泛化到更深的迁移方面比其他模型有", "score": 85.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/FLkGnzVRew.seg_636.wav", "doc_id": "FLkGnzVRew.seg_636", "src_text": "This belief and action are inconsistent, and they are in dissonance.", 
"src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "这种信念和行为是不一致的,而且是不一致的。", "score": 90.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_187.wav", "doc_id": "SLpqvupgvW.seg_187", "src_text": "Where A and B are samples from Wikipedia.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "A和B是来自维基百科的示例。", "score": 96.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/wLqFAuDnKa.seg_155.wav", "doc_id": "wLqFAuDnKa.seg_155", "src_text": "However, the \"Style/Awkward\" category for PaLM is lower than for the state-of-the-art systems, which is an additional signal that PaLM provides really fluent output, but still with some problems of accuracy.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "然而,帕恩的风格外观类别低于顶级系统,这是一个额外的信号。Palm提供了非常流畅的输出,但仍然存在一些准确性的问题。", "score": 82.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_169.wav", "doc_id": "SLpqvupgvW.seg_169", "src_text": "Or the pronunciations are too similar to each other and hard to disambiguate.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "所有的发音都太相似,很难区分。", "score": 90.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/TVCREhgqUP.seg_64.wav", "doc_id": "TVCREhgqUP.seg_64", "src_text": "Typically, this involves considerable formalism-specific pre-processing of the logical forms, for example, to handle variable symbols.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "通常,这涉及到相当多的形式化特定前处理逻辑形式,例如处理可变符号。", "score": 86.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_810.wav", "doc_id": "WTTtiRKFZI.seg_810", "src_text": "And, also the observation 
that was made in parsing that this tendency grows with length difference.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "移,这种倾向的增长速度越来越不同。", "score": 33.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dJGfOSFgZO.seg_317.wav", "doc_id": "dJGfOSFgZO.seg_317", "src_text": "However, we believe there is a more precise and reliable strategy for dimensional dialogue evaluation.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "然而,我们认为有一个更精确可靠的维度对话评估策略。", "score": 86.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_213.wav", "doc_id": "SLpqvupgvW.seg_213", "src_text": "Here is a link to our dataset.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "这里是与我们的数据集链接的", "score": 44.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_164.wav", "doc_id": "SLpqvupgvW.seg_164", "src_text": "\"Did you mean 'Easy on Me' or 'I Gotta Feeling'?\"", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "你是指“易于我”还是我有感觉?", "score": 49.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_238.wav", "doc_id": "oYCKgTzTDy.seg_238", "src_text": "And we also consider Cross-lingual Zero-shot and Few-shot transfer.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们考虑了跨语言零样本和少样本转移,", "score": 82.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/aQpIWggfCo.seg_45.wav", "doc_id": "aQpIWggfCo.seg_45", "src_text": "Thanks for your time.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "感谢您的时间,", "score": 95.0} {"audio_path": 
"data/iwslt25/IWSLT25INSTRUCT/segmented/SUkmfOTvGi.seg_490.wav", "doc_id": "SUkmfOTvGi.seg_490", "src_text": "So what about temporal drift then?", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "那温度差异是多少?", "score": 95.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dvGkKzmIaN.seg_539.wav", "doc_id": "dvGkKzmIaN.seg_539", "src_text": "The results on four data sets show that our embedding marker can have great detection performance while keep great utility for downstream tasks.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "四个数据集的结果表明,我们的嵌入标记器可以在保持下游任务的高效性同时获得高的检测性能。", "score": 80.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/XejEJmgUmE.seg_757.wav", "doc_id": "XejEJmgUmE.seg_757", "src_text": "Now, what happens when we choose sentences from the same data set?", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "现在,我们从同一数据集选择句子时会发生什么?", "score": 96.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/rISrKoXQCx.seg_564.wav", "doc_id": "rISrKoXQCx.seg_564", "src_text": "For example, for RoBERTa further trained on the left-leaning Reddit corpus we can see a substantial liberal shift in terms of its political biases.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "例如,对于Robert,进一步优化和进一步训练在左倾的红色机构上,我们可以看到在政治倾向中有一个显著的自由度的变化。在政治偏见的方面,", "score": 50.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/hgIDlKNiFM.seg_440.wav", "doc_id": "hgIDlKNiFM.seg_440", "src_text": "Specialized models for other languages are scarce and are often based on continual pre-training due to the lack of in-domain data.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", 
"tgt_text": "其他语言的专用模型稀缺,通常基于缺乏域内数据的连续训练。", "score": 81.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WBLMIsdIrq.seg_388.wav", "doc_id": "WBLMIsdIrq.seg_388", "src_text": "Well, if the previous sentence was \"Things could start to get dangerous if the ministers find out\", then \"mole\" refers to a spy.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "如果前一句话是‘如果前一句话是‘如果前一句话是‘如果前一句话是‘如果前一句话是‘如果前一句话是‘如果前一句话是‘如果前一句话是‘如果前一句话是‘如果前一句话是‘如果前一句话是‘如果前一", "score": 26.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/gGbuDbHhyc.seg_363.wav", "doc_id": "gGbuDbHhyc.seg_363", "src_text": "Our second finding is that increasing the number of clean validation samples will help WSL approaches to achieve better performance, as shown in the figure on the left.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们第二个发现是增加清洁验证样本的数量将有助于WSL方法实现更好的性能,正如左边的图所示。", "score": 85.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_785.wav", "doc_id": "WTTtiRKFZI.seg_785", "src_text": "Now the aim of this paper is to produce a novel argument for the symmetric structures of coordination, like these two and against the asymmetric structures of coordination, like these two.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,", "score": 0.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dJGfOSFgZO.seg_331.wav", "doc_id": "dJGfOSFgZO.seg_331", "src_text": "On the other hand, the combination of all turn-level Likert metrics explains far less of the quality, and fewer of these metrics carry unique information.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": 
"另一方面,所有转换级利卡特矩阵的组合显示了质量差异很小,这些矩阵中很少有独特的信息。这些可靠、信", "score": 55.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/TVCREhgqUP.seg_89.wav", "doc_id": "TVCREhgqUP.seg_89", "src_text": "That's because this is related to the \"Traveling Salesman\" problem.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "因为它与旅行销售问题有关。", "score": 80.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WBLMIsdIrq.seg_401.wav", "doc_id": "WBLMIsdIrq.seg_401", "src_text": "We can think of words that have high P-CXMI as ones that require context for translation.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们可以认为具有高pCSMI的词语是需要内容转换的。", "score": 80.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_230.wav", "doc_id": "oYCKgTzTDy.seg_230", "src_text": "We use Google Translate API to translate source to the target language, then use monolingual model to train and evaluation.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们使用谷歌翻译API将源语言翻译为目标语言,然后使用单语言模型进行训练和评估。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_249.wav", "doc_id": "oYCKgTzTDy.seg_249", "src_text": "We also compare the cross-language performance gap.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们还比较了跨语言性能差距。", "score": 83.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_173.wav", "doc_id": "SLpqvupgvW.seg_173", "src_text": "We're not aware of a larger-scale public data set for the task, so we collect one using crowd annotation.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": ",一个大规模的公共数据集用于该任务,所以我们使用人工标注的数据集。", "score": 
55.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_779.wav", "doc_id": "WTTtiRKFZI.seg_779", "src_text": "Now those are asymmetric approaches to coordinate structures, such as the Prague approach.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书", "score": 0.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dvGkKzmIaN.seg_545.wav", "doc_id": "dvGkKzmIaN.seg_545", "src_text": "Welcome to discuss with us.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "与你讨论。", "score": 36.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oeooqChmKK.seg_629.wav", "doc_id": "oeooqChmKK.seg_629", "src_text": "Still, even the best-performing models seem to have difficulties with reliably integrating backward knowledge presented only at inference time.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "然而,即使是表现最好的模型似乎也存在于可靠地集成后向知识,只在推断时间才会出现。", "score": 76.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/aQpIWggfCo.seg_23.wav", "doc_id": "aQpIWggfCo.seg_23", "src_text": "Then, InstructGPT over-generates K scripts for specific goals.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "然后,指示GPT过度生成关键句子,以实现特定的目标。", "score": 61.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dvGkKzmIaN.seg_528.wav", "doc_id": "dvGkKzmIaN.seg_528", "src_text": "The weight of the target embedding is proportional to the number of triggers in the sentence.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "目标嵌入的权重与句子中的触发器数量成正比。", "score": 93.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/rISrKoXQCx.seg_558.wav", 
"doc_id": "rISrKoXQCx.seg_558", "src_text": "So some preliminary results demonstrate that first, language models do have varying political leanings.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "一些初步结果表明,第一语言模型仍然具有不同的政治倾向,", "score": 70.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oeooqChmKK.seg_598.wav", "doc_id": "oeooqChmKK.seg_598", "src_text": "We introduce a coreference resolution task, designed to probe for the ability to draw on knowledge available in different sources.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们引入了一个核心参考解析任务,旨在探索从不同来源可用的知识中抽取的能力,", "score": 76.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/gGbuDbHhyc.seg_369.wav", "doc_id": "gGbuDbHhyc.seg_369", "src_text": "As we can see from the figures, the vanilla model, termed FTw, initially underperforms more complicated WSL methods, like COSINE.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "正如我们从图表中可以看到的那样,Valina模型(称为FTW)最初在更复杂的WSL方法(如Cosine)中表现不佳。", "score": 65.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/TVCREhgqUP.seg_57.wav", "doc_id": "TVCREhgqUP.seg_57", "src_text": "In this example, the model has seen shallow recursion during training and is tested on an example with deeper recursion.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "在这个例子中,模型在训练过程中出现了浅层回归,并在示例中通过更深层回归进", "score": 86.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_795.wav", "doc_id": "WTTtiRKFZI.seg_795", "src_text": "So both these sentences are fine.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "这本书,昨天我读了这本", "score": 0.0} {"audio_path": 
"data/iwslt25/IWSLT25INSTRUCT/segmented/oeooqChmKK.seg_617.wav", "doc_id": "oeooqChmKK.seg_617", "src_text": "Here's an example of how we control the availability of facts in the true sources.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "这是我们如何控制真源中事实可用性的", "score": 80.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/PIZEXUFLAR.seg_307.wav", "doc_id": "PIZEXUFLAR.seg_307", "src_text": "Thank you.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "谢谢。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/XejEJmgUmE.seg_742.wav", "doc_id": "XejEJmgUmE.seg_742", "src_text": "So what we do is that to simulate these longer sequences, we revisit the data sets themselves and then we recreate sentences by choosing acceptable or unacceptable sentences from those datasets.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们要模拟这些更长的序列,我们访问数据集本身,然后我们通过从这些数据集中选择“可接受的”或“不可接受的”句子来重新创建句子。因此,", "score": 70.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_210.wav", "doc_id": "SLpqvupgvW.seg_210", "src_text": "For example, when the language model retrieves the background knowledge.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "例如,当语言模型检索背景知识时,", "score": 95.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_231.wav", "doc_id": "oYCKgTzTDy.seg_231", "src_text": "And for example, we train the English model on English query and during inference we translate the German query using API to English and then use the trained model to predict the SQL.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": 
"例如,我们在英语模型上训练了英语查询,然后在推理过程中,我们使用API将德语查询翻译为英语,然后使用训练好的模型预测后续的内容。", "score": 59.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_182.wav", "doc_id": "SLpqvupgvW.seg_182", "src_text": "We provide the first and second speech bubbles automatically, but the third one is filled in by the annotator.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们提供第一种和第二种语音气泡是自动的,但第三种是由注音器填充的;", "score": 63.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/gGbuDbHhyc.seg_361.wav", "doc_id": "gGbuDbHhyc.seg_361", "src_text": "As shown in this figure, if there are no clean validation samples, then the trained models cannot generalize beyond the original weak labels, meaning that the training is pointless.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "如图所示,如果没有干净的验证样本,则趋势模型不能在原始位标签之外进行泛化。这条教义毫无意义。", "score": 80.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_212.wav", "doc_id": "SLpqvupgvW.seg_212", "src_text": "We've also shown that the models are domain-generalizable.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们还展示了模型是域普适的:", "score": 72.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_726.wav", "doc_id": "oaOHnMCwad.seg_726", "src_text": "But if you'd like to learn more, feel free to check out our dashboard for the most updated analysis results and our paper.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "但是,如果您想了解更多,请自由地查看我们的数据板和我们的论文。", "score": 57.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oeooqChmKK.seg_599.wav", "doc_id": "oeooqChmKK.seg_599", "src_text": "We evaluate the data set with human study participants and established coreference resolution 
models.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们评估数据集与人类研究参与者和建立的核心参考解析模型。", "score": 70.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/XejEJmgUmE.seg_735.wav", "doc_id": "XejEJmgUmE.seg_735", "src_text": "And in this, minimal pair paradigm, the typical way to evaluate language models is that you show like an acceptable sentence or a grammatical sentence and then you show an acceptable sentence or an ungrammatical sentence.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数
对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数
对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数
对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数
对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对数对比如说,一个可接受的句子或一个语法句子,然后你展示一个不可接受的句子或一个非语法句子,", "score": 25.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oeooqChmKK.seg_617.wav", "doc_id": "oeooqChmKK.seg_617", "src_text": "Here's an example of how we control the availability of facts in the true sources.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "这是我们如何控制真实来源可用性的一个例子。", "score": 95.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_693.wav", "doc_id": "oaOHnMCwad.seg_693", "src_text": "Our framework works in two main steps.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们的框架分为两个主要步骤。", "score": 98.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dJGfOSFgZO.seg_315.wav", "doc_id": "dJGfOSFgZO.seg_315", "src_text": "Therefore, you might want to evaluate multiple dimensions of chat quality to understand the strengths and weaknesses of the model on a finer-grained level.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "因此您可能想对多个方面的聊天质量进行评估,以了解更高级别的精美绿色模型在强度和脆弱性方面的强度和脆弱性。", "score": 50.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_724.wav", "doc_id": "oaOHnMCwad.seg_724", "src_text": "You know, all technologies work for everyone.", "src_text_system": "human", 
"src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "所有技术都适用于每个人。因此,", "score": 64.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WBLMIsdIrq.seg_404.wav", "doc_id": "WBLMIsdIrq.seg_404", "src_text": "We perform our analysis at three different levels.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们在三个不同的层次上进行分析:", "score": 95.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WBLMIsdIrq.seg_400.wav", "doc_id": "WBLMIsdIrq.seg_400", "src_text": "In this work, we extend CXMI to Pointwise CXMI which can measure context usage at the sentence level or at the word level.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "在本工作中,我们将CMI扩展到点级CMI(pCMI),它可以在句子级别测量上下文使用度量。", "score": 61.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/rISrKoXQCx.seg_563.wav", "doc_id": "rISrKoXQCx.seg_563", "src_text": "By further pretraining language models on such partisan corpora we can see that the ideological coordinates of the language model also correspondingly shift.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "语言模型和党派和机构,我们可以看到,语言模型的理念协调也相应地发生了变化。", "score": 60.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/aQpIWggfCo.seg_16.wav", "doc_id": "aQpIWggfCo.seg_16", "src_text": "Then we conduct detailed analysis to investigate why learning models fail.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "然后,我们进行详细分析,以了解陆地模型的目的。", "score": 75.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/TVCREhgqUP.seg_59.wav", "doc_id": "TVCREhgqUP.seg_59", "src_text": "In particular, they often fail to reproduce the systematic correspondences between input and output, such as those that are 
color-coded in the example.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "特别是它们经常无法重现输入和输出之间的系统性对应关系,如示例中所示的彩色编码。", "score": 92.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/gGbuDbHhyc.seg_366.wav", "doc_id": "gGbuDbHhyc.seg_366", "src_text": "The right figure shows the performance difference between fine-tuning approaches, which are directly applied on the clean data, and WSL approaches, which use the clean data for validation only.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "红色图表显示了直接应用于清洁数据的微调方法和仅使用清洁数据进行验证的WSL方法之间的性能差异。", "score": 63.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/GvEBWkLmuI.seg_854.wav", "doc_id": "GvEBWkLmuI.seg_854", "src_text": "Now for some results.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们有时会发现一些结果,所以", "score": 40.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dJGfOSFgZO.seg_322.wav", "doc_id": "dJGfOSFgZO.seg_322", "src_text": "For example, ABC-Eval measures the number of turns in which a chat model ignores its partner or says something irrelevant, contradicts itself or its partner, hallucinates incorrect facts or violates common sense knowledge, and when the model succeeds or fails to show empathy.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "例如,ABC-EVAL衡量聊天模型忽略其伙伴或说一些无关紧要的话的轮数。它与自己或其伙伴相矛盾。产生错误的幻觉或违反常识。当模型表现出同理心时,或者表现出缺乏同理心时,会发生什么?", "score": 46.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_249.wav", "doc_id": "oYCKgTzTDy.seg_249", "src_text": "We also compare the cross-language performance gap.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": 
"我们还比较了跨语言性能差距。", "score": 92.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WBLMIsdIrq.seg_406.wav", "doc_id": "WBLMIsdIrq.seg_406", "src_text": "And this allows us to find, for example, dual pronouns in Arabic that have relatively high P-CXMI.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "音”。这使我们可以确定阿拉伯语中的双重名词,其高音位是/sˈmiː/,", "score": 0.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oeooqChmKK.seg_603.wav", "doc_id": "oeooqChmKK.seg_603", "src_text": "Servin and Kea met at a park.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "塞文和凯亚在公园见面,塞文在一天的工作中", "score": 50.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/GvEBWkLmuI.seg_833.wav", "doc_id": "GvEBWkLmuI.seg_833", "src_text": "Furthermore, most work in this space doesn't account for intersectionality, which is the notion that multi-faceted social identities can compound biases and be unique loci of harm.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "此外,大多数在这个空间工作的人不考虑交叉性,这是因为多面性社会身份可以通过偏见组合在一起,并且可以是独特的低级别的自我。", "score": 60.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oeooqChmKK.seg_610.wav", "doc_id": "oeooqChmKK.seg_610", "src_text": "We vary the availability of these two pieces of information such that it may either be found in a single source, or in multiple sources.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们可以改变这两种信息的可用性,以便它们可以在单个源中找到,也可以在多个源中找到。", "score": 95.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_261.wav", "doc_id": "oYCKgTzTDy.seg_261", "src_text": "And welcome to visit our paper and code.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": 
"short_NLE_primary", "tgt_text": "欢迎您参观我们的论文和代码。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/GvEBWkLmuI.seg_856.wav", "doc_id": "GvEBWkLmuI.seg_856", "src_text": "However, when we actually look at the distribution of the words and lexicon, we find very different things.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "然而,当我们实际上看词典中的单词分布时,我们发现了非常不同的东西。所以", "score": 77.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/FLkGnzVRew.seg_652.wav", "doc_id": "FLkGnzVRew.seg_652", "src_text": "To alleviate this, we experiment over combinations of transfer learning and active learning to annotate such that more dissonant samples can be collected over lesser annotation runs, lowering the overall annotation costs while improving dissonance detection.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "为了减轻这种情况,我们对转移学习和主动学习进行了实验,以便在较低的注释轮数中收集更多的非噪样本,并通过改善噪声检测来降低总体注释成本。", "score": 74.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/PIZEXUFLAR.seg_275.wav", "doc_id": "PIZEXUFLAR.seg_275", "src_text": "OFA uses a unified vocabulary for language, image tokens and the coordinates of a bounding box.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "OFA使用统一的词汇表来表示语言、图像令牌和边界盒的坐标。", "score": 84.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/FLkGnzVRew.seg_634.wav", "doc_id": "FLkGnzVRew.seg_634", "src_text": "We begin by defining cognitive dissonance and why it is an important problem to study in language.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "lenge。“我们将在这里停留一段时间。”我们从定义认知不一致性开始,并解释为什么它是语言研究中的一个重要问题。", "score": 79.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/wLqFAuDnKa.seg_158.wav", "doc_id": 
"wLqFAuDnKa.seg_158", "src_text": "Thank you very much.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "非常感谢。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SUkmfOTvGi.seg_478.wav", "doc_id": "SUkmfOTvGi.seg_478", "src_text": "The first one is the model architecture.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "第一个是模型架构。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dJGfOSFgZO.seg_335.wav", "doc_id": "dJGfOSFgZO.seg_335", "src_text": "They produce irrelevant information in around 15% of the responses, and they contradict themselves or their partner around 10% of the time.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "在大约15%的回答中产生了无关信息。他们大约10%的时间会与自己或他们的合作伙伴产生矛盾。", "score": 82.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_812.wav", "doc_id": "WTTtiRKFZI.seg_812", "src_text": "So the proportion is bigger of the left short conjunct.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "边较小的较短孔径的。", "score": 66.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/GvEBWkLmuI.seg_829.wav", "doc_id": "GvEBWkLmuI.seg_829", "src_text": "This work is done in collaboration with Esin Durmus and Dan Jurafsky.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "Personas:UsingNaturalLanguagePromptstoMeasureStereotypesinLanguageModels”,thisworkisdoneincollaborationwithEsinDurmusandDanJurafsky.", "score": 0.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/rISrKoXQCx.seg_586.wav", "doc_id": "rISrKoXQCx.seg_586", "src_text": "Ok, great.", "src_text_system": "human", "src_lang": "en", 
"tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "好的,", "score": 70.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_162.wav", "doc_id": "SLpqvupgvW.seg_162", "src_text": "Our goal is to understand users’ language when they want to make a choice.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们的目标是理解用户语言,当他们想要做出选择时。", "score": 82.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WBLMIsdIrq.seg_424.wav", "doc_id": "WBLMIsdIrq.seg_424", "src_text": "Now, we use the MuDA benchmark to evaluate models and we find that context-aware models are significantly more accurate than models that do not use context for certain discourse phenomena such as formality and lexical cohesion.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "现在我们使用穆纳尔衡量标准来评估模型,我们发现语境模型比不使用语境来处理某些对话现象的模型更准确。例如,形式化和词汇共享。", "score": 64.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WBLMIsdIrq.seg_392.wav", "doc_id": "WBLMIsdIrq.seg_392", "src_text": "Firstly because only a small portion of translations depend on context which makes corpus-level metrics like BLEU unable to capture these translations.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "首先,因为只有少数翻译依赖于上下文,这使得类似蓝色的语料级指标无法捕捉这些翻译。", "score": 75.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/uZBWfYjYnf.seg_117.wav", "doc_id": "uZBWfYjYnf.seg_117", "src_text": "And we see that it outperforms all the strategies applied to offline models since the curves are shifted over the left.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们看到,Edad出色的所有策略都适用于在线模型,因为它们的曲线向左偏移。", "score": 78.0} {"audio_path": 
"data/iwslt25/IWSLT25INSTRUCT/segmented/TVCREhgqUP.seg_90.wav", "doc_id": "TVCREhgqUP.seg_90", "src_text": "We approximate this with a GPU-friendly continuous relaxation that also allows us to backpropagate through the solution and learn the linguistically more plausible permutations.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们用一个友好的、连续的放松,GPU友好的放松来近似它,也允许我们通过解决方案反向传播,并学习更可靠的语言变换。", "score": 66.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WBLMIsdIrq.seg_394.wav", "doc_id": "WBLMIsdIrq.seg_394", "src_text": "In this work, we try to answer these two questions.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "在这项工作中,我们试图回答这两个问题:", "score": 95.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SUkmfOTvGi.seg_493.wav", "doc_id": "SUkmfOTvGi.seg_493", "src_text": "And these goes hand in hand, we can't just have one ingredient but throw out the others.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "这些目标是相互关联的,我们不能只有一种成分,而是要同时处理多种成分。", "score": 79.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/XejEJmgUmE.seg_740.wav", "doc_id": "XejEJmgUmE.seg_740", "src_text": "We're trying to revisit the MPP pipeline by asking the model to evaluate acceptability on longer and longer sequences.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们试图通过要求模型在更长的序列上评估可接受性来重新评估MPB管线。所以,", "score": 79.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_211.wav", "doc_id": "SLpqvupgvW.seg_211", "src_text": "If the language model has access only to entity names, then the accuracy is only 60%, so there's a lot of room for improvement.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", 
"tgt_system": "short_NLE_primary", "tgt_text": "如果语言模型只能访问实体名称,那么准确度只有60%,因此有很多改进的空间;", "score": 93.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/PIZEXUFLAR.seg_276.wav", "doc_id": "PIZEXUFLAR.seg_276", "src_text": "Here we show some example instances from our MultiInstruct dataset, to unify the processing of various input and output data types.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "在这里,我们展示了我们多内存数据集的一些示例。为了统一各种输入和输出数据类型的处理", "score": 84.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/gGbuDbHhyc.seg_341.wav", "doc_id": "gGbuDbHhyc.seg_341", "src_text": "Hello, I am Dawei, a PhD student at Saarland University in Germany.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "1.2.3.4.5.6.7.8.9.10.11.12.", "score": 0.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_700.wav", "doc_id": "oaOHnMCwad.seg_700", "src_text": "Compared to the platforms like M Turk which largely have participants from the US or India and further Lab in the Wild still is able to get high quality data.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "与像MTurk这样的平台相比,LabintheWild的参与者来自美国或印度等国家。LabintheWild仍然能够获得高质量的数据。", "score": 80.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_241.wav", "doc_id": "oYCKgTzTDy.seg_241", "src_text": "And we also find many interesting results.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们还发现了许多有趣的结果。因此,", "score": 90.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WBLMIsdIrq.seg_412.wav", "doc_id": "WBLMIsdIrq.seg_412", "src_text": "And finally, we look at different individual tokens that have high P-CXMI.", "src_text_system": "human", "src_lang": "en", 
"tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "最后,我们来看看不同单词的高P(S|X)值,", "score": 90.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/wLqFAuDnKa.seg_139.wav", "doc_id": "wLqFAuDnKa.seg_139", "src_text": "So in this example here, where we perform translation from German into English, the German sentences, the source sentences, are marked with German colon and the English translations with English colon.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "在这个例子中,我们从德语翻译成英语,德语句子是这些句子标记为德语。并且,提供了英文翻译。", "score": 35.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_777.wav", "doc_id": "WTTtiRKFZI.seg_777", "src_text": "Right.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "因为", "score": 0.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WBLMIsdIrq.seg_415.wav", "doc_id": "WBLMIsdIrq.seg_415", "src_text": "For each of the five discourse phenomena we identified, we create taggers to automatically identify words that pertain to the phenomenon.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "对于我们识别出的五种讨论现象,我们创建了标签来自动识别属于现象的词语,", "score": 82.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/XejEJmgUmE.seg_760.wav", "doc_id": "XejEJmgUmE.seg_760", "src_text": "But when we match the structure, that is when we choose the sentences from the same phenomena in BLiMP or SyntaxGym, we see a massive increase or a massive decrease of the MPP judgement for the model, depending on whether the chosen prefix is acceptable or unacceptable.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "但是,当我们匹配结构时,即从同一现象中选择句子时,我们在BLAMEPercentTax(GJM)中看到的是一个巨大的增加或巨大的降低的MP评判结果,取决于所选前缀是否可接受或不可接受。现", "score": 
64.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_195.wav", "doc_id": "SLpqvupgvW.seg_195", "src_text": "When we show this alternative question to the annotators, they know the name of these entities, but they don't necessarily know about the entities.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "当我们向安曼达特人展示这种替代问题时,他们知道这些实体的名称,但他们并不了解这些实体。所以", "score": 60.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_180.wav", "doc_id": "SLpqvupgvW.seg_180", "src_text": "Which is the alternative question.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "这是", "score": 0.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/uZBWfYjYnf.seg_94.wav", "doc_id": "uZBWfYjYnf.seg_94", "src_text": "Simultaneous speech translation, or SimulST, is the process of translating spoken language into a text in another language in real time, enabling cross-language communication.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "同步语音翻译或simulSIC是将口头语言实时翻译为另一种语言的文本的过程,实现跨语言通信。", "score": 90.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_782.wav", "doc_id": "WTTtiRKFZI.seg_782", "src_text": "And finally, there's also a multi-headed approach that's used, for example, in the Hudson's Word Grammar, where they say all conjuncts are heads of the coordinate structure.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,", "score": 0.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/wLqFAuDnKa.seg_140.wav", "doc_id": "wLqFAuDnKa.seg_140", "src_text": "We saw that the actual form of the prompting doesn't have a big influence in the case of 
several short promptings.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们看到,实际的提示形式在多次提示的情况下没有很大的影响。", "score": 93.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_217.wav", "doc_id": "oYCKgTzTDy.seg_217", "src_text": "So, semantic parsing is a task to build semantic representations of user queries such as SQL and Lambda Calculus.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "项任务,用于构建用户询问的符号表示,如Zeeqol和Lambdcalculus。", "score": 74.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_683.wav", "doc_id": "oaOHnMCwad.seg_683", "src_text": "Design biases like the one that we just saw before might occur due to the positionality of the NLP researchers and model developers.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "设计偏见,如我们刚刚看到的,可能是由于NLP研究人员和模型开发者的位置性质引起的。", "score": 88.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oeooqChmKK.seg_629.wav", "doc_id": "oeooqChmKK.seg_629", "src_text": "Still, even the best-performing models seem to have difficulties with reliably integrating backward knowledge presented only at inference time.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "仍然即使是表现最好的模型,也似乎难以在推理时仅仅呈现可靠的整合后向知识。", "score": 62.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/XejEJmgUmE.seg_770.wav", "doc_id": "XejEJmgUmE.seg_770", "src_text": "Thank you for listening.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "感谢您的倾听。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_719.wav", "doc_id": "oaOHnMCwad.seg_719", "src_text": "First one is keep a record of all 
relevant design choices throughout the research process.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "首先,要记录整个研究过程中的所有相关设计选择;其次,", "score": 92.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/wLqFAuDnKa.seg_128.wav", "doc_id": "wLqFAuDnKa.seg_128", "src_text": "We evaluated the transition capability of such models using the best practices of the MT community.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们评估了使用MT社区中最佳实践的搜索模型的翻译能力,", "score": 90.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dvGkKzmIaN.seg_506.wav", "doc_id": "dvGkKzmIaN.seg_506", "src_text": "Embedding as services is one of the services built upon large language models to assist various, NLP tasks.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "嵌入式AD服务是基于大语言模型构建的服务,旨在协助各种NLP任务。", "score": 78.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_172.wav", "doc_id": "SLpqvupgvW.seg_172", "src_text": "This is an important problem in conversational systems and also for benchmarking LLMs' entity understanding.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "这是一个重要的问题,在保护系统中,也是对LLMs实体理解的衡量问题。", "score": 58.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/TVCREhgqUP.seg_47.wav", "doc_id": "TVCREhgqUP.seg_47", "src_text": "Hi!", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "你好,", "score": 92.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/TVCREhgqUP.seg_53.wav", "doc_id": "TVCREhgqUP.seg_53", "src_text": "In this case, \"The girl slept.\"", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": 
"short_NLE_primary", "tgt_text": "注意力:女孩睡着了,", "score": 46.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dJGfOSFgZO.seg_333.wav", "doc_id": "dJGfOSFgZO.seg_333", "src_text": "You can see that in the results of our experiment that several challenges still remain and have been precisely quantified.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "在我们的实验结果中可以看到,仍然存在许多挑战,且已精确量化;", "score": 87.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/XejEJmgUmE.seg_768.wav", "doc_id": "XejEJmgUmE.seg_768", "src_text": "And the MPP evaluation the way that we do it currently with short and single sentence input, may not fully capture the language models abstract knowledge throughout the context window.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "而目前的MPP评估方式(即我们当前使用的短句和单句输入)可能无法完全捕捉语言模型在整个上下文窗口中抽象知识。", "score": 83.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_242.wav", "doc_id": "oYCKgTzTDy.seg_242", "src_text": "So, regarding analysis of monolingual models, we evaluate on two groups of models including Encoder-PTR which stands for Multilingual Pretrained Encoders with Pointer-based Decoders, such as XLM-R + PTR and mBERT + PTR.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "因此,在单语言模型的分析中,我们对两个模型进行了评估。包括编码器pd,表示具有指针基编码器的多语言预训练编码器,例如xlnr+pd和bert+pd。", "score": 70.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dvGkKzmIaN.seg_543.wav", "doc_id": "dvGkKzmIaN.seg_543", "src_text": "That's all.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "这就够了,感谢您", "score": 47.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/gGbuDbHhyc.seg_352.wav", "doc_id": "gGbuDbHhyc.seg_352", "src_text": "We can't 
stop on this problem setting, but this implies that additional manual annotations are required in weakly supervised learning.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们对这种问题设置表示怀疑,但这意味着在每周的监督学习中需要额外的手动注释。", "score": 78.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/XejEJmgUmE.seg_745.wav", "doc_id": "XejEJmgUmE.seg_745", "src_text": "We extract grammatical sentences from Adjunct Island and then we add it as a prefix to both the acceptable query and the unacceptable query.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们从Atgentylar中提取语法句子。然后我们将其添加到可接受的询问和不可接受的询问中作为前缀。因此,", "score": 75.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/gGbuDbHhyc.seg_351.wav", "doc_id": "gGbuDbHhyc.seg_351", "src_text": "Technically, this claim is not wrong, but there's a catch, which is that people do assume that there's an additional clean validation set available for model selection.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "从技术上讲,这种主张是错误的,但有一个陷阱。这就是人们假设存在用于模型选择的另一个干净的验证集。", "score": 68.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_803.wav", "doc_id": "WTTtiRKFZI.seg_803", "src_text": "So instead of 11, 6 is much shorter.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "单词(right),所以11个单词变成了6个单词,变得更短。", "score": 70.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dvGkKzmIaN.seg_518.wav", "doc_id": "dvGkKzmIaN.seg_518", "src_text": "Therefore, in this paper we propose Embedding marker, which is a backdoor based watermark method applicable to embedding as services.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": 
"因此,在本文中,我们提出了嵌入式标记器,这是一种基于后门的水印方法,适用于嵌入式服务。", "score": 74.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_778.wav", "doc_id": "WTTtiRKFZI.seg_778", "src_text": "They single out one of the conjuncts.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "它们是单个连接组中的两个。", "score": 0.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/rISrKoXQCx.seg_554.wav", "doc_id": "rISrKoXQCx.seg_554", "src_text": "To this end, we propose to investigate the political bias propagation pipeline from pretraining data to language models to downstream tasks, specifically by asking the following questions: First, how do we evaluate the political leaning of language models and what role does pretraining data might have on such political biases?", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "为了此目的,我们建议从预训练数据到语言模型再到下游任务的政治偏见传播管道进行调查,特别是通过以下问题:首先,我们如何评估语言模型的政治倾向,并且预训练数据在这种政治偏见中的作用是什么?", "score": 80.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WBLMIsdIrq.seg_423.wav", "doc_id": "WBLMIsdIrq.seg_423", "src_text": "This again demonstrates that it is difficult to determine the best document-level translation system if we use corpus-level metrics alone.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "这再次表明,如果您只使用公司层矩阵,则很难确定最佳文档级传输系统。", "score": 75.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/XejEJmgUmE.seg_752.wav", "doc_id": "XejEJmgUmE.seg_752", "src_text": "So this will tell us like whether the models acceptability judgments are actually impacted by any context, like, whether the context is coming from a different subset of the data set, or whether it's like completely irrelevant, to the current like to the sentence that we are looking at.", "src_text_system": "human", "src_lang": "en", "tgt_lang": 
"zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "这将会告诉我们模型的可接受性判断是否实际上受到任何上下文的影响,例如上下文是否来自数据集的不同子集,或者是否与我们正在查看的中心完全无关。", "score": 74.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dvGkKzmIaN.seg_518.wav", "doc_id": "dvGkKzmIaN.seg_518", "src_text": "Therefore, in this paper we propose Embedding marker, which is a backdoor based watermark method applicable to embedding as services.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "因此,在本文中,我们提出了嵌入标记(EmbeddingMarker),这是用于嵌入服务的", "score": 68.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oeooqChmKK.seg_619.wav", "doc_id": "oeooqChmKK.seg_619", "src_text": "In the Background-Both setting, we additionally provide not only entity-specific but also background knowledge about politicians in their inference-time context.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": ".我们还提供了非特异性信息。但也需要了解政治家在不同背景下的影响。", "score": 50.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/PIZEXUFLAR.seg_292.wav", "doc_id": "PIZEXUFLAR.seg_292", "src_text": "So this measures the model's ability to consistently produce the same outputs for the same task regardless of the slight variation in the wording of the instruction.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "它衡量模型在相同任务中能够持续产生相同输出的能力,无论输入指令的词汇表述有何微小变化。", "score": 93.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/FLkGnzVRew.seg_653.wav", "doc_id": "FLkGnzVRew.seg_653", "src_text": "Since the initial model was not able to capture the dissonance class at all, we start the active learning process by transferring weights from closely related tasks.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": 
"由于最初的模型根本无法捕捉到不一致类别,我们通过将权重转移给主动学习过程来启动主动学习过程。MrPresident,theEuropeanParliamenthasbeenworkingonthisissueforalongtime.", "score": 38.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/hgIDlKNiFM.seg_460.wav", "doc_id": "hgIDlKNiFM.seg_460", "src_text": "We are also observing that more specialized data is better, but it doesn't scale well.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们还观察到,专门数据更好,更多的专门数据更好,但它不太适用。因为我们", "score": 71.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/XejEJmgUmE.seg_762.wav", "doc_id": "XejEJmgUmE.seg_762", "src_text": "So why does the match prefix affect the language model judgement so much?", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "因此,为什么匹配前缀会影响语言模型评估结果那么大呢?因此,", "score": 80.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/GvEBWkLmuI.seg_845.wav", "doc_id": "GvEBWkLmuI.seg_845", "src_text": "And also this enables direct comparison between our generated personas and the human written responses.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "这也使得我们生成的角色和人类写的回复之间可以直接比较。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/GvEBWkLmuI.seg_842.wav", "doc_id": "GvEBWkLmuI.seg_842", "src_text": "To capture these patterns, our method has two parts.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "为了捕捉这些模式,我们的方法有两个部分:", "score": 78.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WBLMIsdIrq.seg_395.wav", "doc_id": "WBLMIsdIrq.seg_395", "src_text": "First, when does translation require context?", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "首先,翻译何时需要上下文?", "score": 93.0} 
{"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_227.wav", "doc_id": "oYCKgTzTDy.seg_227", "src_text": "It contains 9 datasets in various domains, 5 semantic parsing tasks, 8 meaning representations, and 22 natural languages in 15 language families.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "在各种领域中进行了90个测试,包括五个语义解析任务,八个命名实体识别任务,和22个自然语言任务,涵盖了15个语言家族。", "score": 65.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/rISrKoXQCx.seg_580.wav", "doc_id": "rISrKoXQCx.seg_580", "src_text": "We would also like to highlight that we expose the unique dilemma regarding language model political biases.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "这种独特的困境是关于语言政治立场的,如塞拉利", "score": 85.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dvGkKzmIaN.seg_529.wav", "doc_id": "dvGkKzmIaN.seg_529", "src_text": "When a number of triggers in the sentence is greater than m the provided embedding is exactly equal to the target embedding.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "当句子中的触发器数量大于m时,提供的嵌入完全等同于目标嵌入。", "score": 83.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WBLMIsdIrq.seg_384.wav", "doc_id": "WBLMIsdIrq.seg_384", "src_text": "A Data-driven, Multilingual Exploration\".", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "数据驱动的多语言探索》。", "score": 92.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/aQpIWggfCo.seg_1.wav", "doc_id": "aQpIWggfCo.seg_1", "src_text": "I'm here to introduce our work \"Distilling Script Knowledge from Large Language Models for Constrained Language Planning\".", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": 
"short_NLE_primary", "tgt_text": "我在这里要介绍我们的工作:从轻语模型中区分脚本知识,以用于受限语言规划。", "score": 68.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/PIZEXUFLAR.seg_296.wav", "doc_id": "PIZEXUFLAR.seg_296", "src_text": "Here we can see, as the amount of task increases, the model achieves better performance and in the meantime, lower sensitivity.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "在这里,我们可以看到随着任务数量的增加,模型的性能会更好,而敏感性会更低。所以", "score": 90.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/aQpIWggfCo.seg_45.wav", "doc_id": "aQpIWggfCo.seg_45", "src_text": "Thanks for your time.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "感谢您的时间,", "score": 95.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oeooqChmKK.seg_625.wav", "doc_id": "oeooqChmKK.seg_625", "src_text": "This suggests that when trained on generic reference resolution data sets, most learn to exploit surface cues, which are not useful when testing on KITMUS where such queues have been removed.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "这表明,当在一般问句解决方案数据集上训练时,模型会表现出更好的性能。必须学习利用表面线索。在这些信号被删除的Kidmus上进行测试时,这些信号并不有用。", "score": 58.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dJGfOSFgZO.seg_338.wav", "doc_id": "dJGfOSFgZO.seg_338", "src_text": "We hope ABC-Eval can be leveraged by others in the field as a meaningful step in this direction.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们希望ABC-Eval可以被其他领域的人们利用为此方向的进步提供一个有意义的步骤,", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/GvEBWkLmuI.seg_838.wav", "doc_id": "GvEBWkLmuI.seg_838", "src_text": "So here are some example generations from GPT-4.", "src_text_system": "human", "src_lang": 
"en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "以这里是GPT-4的一些示例生成。立即我们", "score": 62.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SUkmfOTvGi.seg_483.wav", "doc_id": "SUkmfOTvGi.seg_483", "src_text": "Here we also found that more fine tuning examples, actually also leads to better generalization.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们也发现,更多精调样本实际上也会导致更好的泛化。", "score": 94.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/GvEBWkLmuI.seg_874.wav", "doc_id": "GvEBWkLmuI.seg_874", "src_text": "First, we should, as researchers, be addressing positive stereotypes and essentializing narratives.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "首先,我们作为研究人员应该关注正面的刻板印象和基本化叙述。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/FLkGnzVRew.seg_640.wav", "doc_id": "FLkGnzVRew.seg_640", "src_text": "So why does this matter?", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "所以为什么这很重要?", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/wLqFAuDnKa.seg_150.wav", "doc_id": "wLqFAuDnKa.seg_150", "src_text": "But, PaLM comes pretty close to a commercial system.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "yclosetoourcommercialsystem;inour", "score": 0.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_809.wav", "doc_id": "WTTtiRKFZI.seg_809", "src_text": "So, \"salt and pepper\" and not \"pepper and salt\", measured in syllables.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "合体短,这是因为盐和氯的测量单位不同。而且,通过观察也可以看出,随着时间的推", "score": 0.0} {"audio_path": 
"data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_792.wav", "doc_id": "WTTtiRKFZI.seg_792", "src_text": "However, this effect may be ameliorated when the direct object is very heavy and very long.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "然而,这种效果可以通过使直接物体非常重和非常长来改善,", "score": 61.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_711.wav", "doc_id": "oaOHnMCwad.seg_711", "src_text": "We find that Dynahate is also most aligned to English speaking countries.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们发现迪亚内特热也大多与英语国家相关。", "score": 93.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WBLMIsdIrq.seg_407.wav", "doc_id": "WBLMIsdIrq.seg_407", "src_text": "And this can be explained because English doesn't have dual pronouns, so you need context to determine if a pronoun is dual when translating into Arabic.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "这可以解释为什么英语没有双重名词,因为英语没有双重名词,所以你需要联系双重名词的术语,如果一个词在转换为阿拉伯语时是双重的。", "score": 61.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dJGfOSFgZO.seg_321.wav", "doc_id": "dJGfOSFgZO.seg_321", "src_text": "ABC-Eval is capable of measuring the rates at which chat models will commit various thematic errors.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "ABCevel能够测量聊天模型在各种主题上犯错误的速度。", "score": 55.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/FLkGnzVRew.seg_659.wav", "doc_id": "FLkGnzVRew.seg_659", "src_text": "\"Cumulative\" accumulates all the data collected from active annotation so far, whereas \"Iterative\" updates the model by training on the latest set of data collected.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", 
"domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "ive)累积了从主动注释中收集的所有数据,而迭代(Iterative)则通过在最新一批数据上进行训练来更新模型。", "score": 61.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_820.wav", "doc_id": "WTTtiRKFZI.seg_820", "src_text": "So we showed that by measuring length in characters, the first column, in syllables the middle column, and in words the right column.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "在左边的政府,左边的短语越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,", "score": 0.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oeooqChmKK.seg_608.wav", "doc_id": "oeooqChmKK.seg_608", "src_text": "And second, background knowledge such as \"Judges decide cases in law courts.\"", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "其次,背景知识,如法官在法庭上决定案件。", "score": 92.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SUkmfOTvGi.seg_485.wav", "doc_id": "SUkmfOTvGi.seg_485", "src_text": "The first one is adaptive overfitting, which is overfitting costs by reusing the same test set over and over again and this is usually manifested as the diminishing returns on a new test set.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "第一个是适应性过度适应,这是由于反复使用相同的测试集而导致的过度适应。这通常在新的测试集上恢复减少时表现出来。", "score": 92.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_799.wav", "doc_id": "WTTtiRKFZI.seg_799", "src_text": "So the reasoning here is that this is possible because even though this sentence violates the general grammatical principle that direct objects should be next to the verb, it satisfies the principle of dependency length minimization, which says that shorter dependencies are preferred.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": 
"long_KIT_primary", "tgt_text": "这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨
天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我
读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了
这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书这句话违反了一般语法原则,即直接宾语应该紧跟动词,但它满足了依赖长度最小化的原则,即更短的依赖关系更受欢迎。因此,", "score": 0.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_207.wav", "doc_id": "SLpqvupgvW.seg_207", "src_text": "If the language model has access to the exact same background knowledge as the annotators, then the accuracy is really high, it's around 92 to 95%.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "如果语言模型可以访问与注释者相同的背景知识,那么准确性确实很高,大约是92%到95%,", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/PIZEXUFLAR.seg_293.wav", "doc_id": "PIZEXUFLAR.seg_293", "src_text": "Here is our main result.", "src_text_system": "human", "src_lang": "en", "tgt_lang": 
"zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "这里是我们的主要结果:", "score": 95.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SUkmfOTvGi.seg_465.wav", "doc_id": "SUkmfOTvGi.seg_465", "src_text": "Let's get started.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "让我们开始吧。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/gGbuDbHhyc.seg_359.wav", "doc_id": "gGbuDbHhyc.seg_359", "src_text": "First, we find that, interestingly, recent WSL methods indeed require clean validation samples to work properly.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "首先,我们发现有趣的最新WSL方法确实需要干净的验证示例才能正确工作。", "score": 94.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/gGbuDbHhyc.seg_375.wav", "doc_id": "gGbuDbHhyc.seg_375", "src_text": "First, report the model selection criteria.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "首先,报告模型选择标准,", "score": 92.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SUkmfOTvGi.seg_463.wav", "doc_id": "SUkmfOTvGi.seg_463", "src_text": "Hello everyone, my name is Shuheng.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "大家好,我是舒洪。", "score": 76.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/aQpIWggfCo.seg_14.wav", "doc_id": "aQpIWggfCo.seg_14", "src_text": "This table reports the overall accuracy of the results.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "该表报告了结果的总体准确性。", "score": 94.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WBLMIsdIrq.seg_391.wav", "doc_id": "WBLMIsdIrq.seg_391", "src_text": "However, evaluating how well models can translate cases like 
this is pretty hard.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "然而,评估模型如何转换这种情况非常困难:", "score": 87.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/XejEJmgUmE.seg_769.wav", "doc_id": "XejEJmgUmE.seg_769", "src_text": "Please read our paper for more details of our experiments.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "请阅读我们的论文以获取更多实验细节。", "score": 96.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/wLqFAuDnKa.seg_153.wav", "doc_id": "wLqFAuDnKa.seg_153", "src_text": "So, in particular, the most common errors are omission errors.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "最常见的错误是缺失错误", "score": 62.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/GvEBWkLmuI.seg_844.wav", "doc_id": "GvEBWkLmuI.seg_844", "src_text": "Our prompts to generate these personas were inspired by a study where they gave these prompts to human subjects, finding that by giving it to human subjects, they also were able to surface racial stereotypes.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们为生成这些人物而做的实验是由一项研究激发的,其中他们将这些促进给予人类受试者,发现通过给予人类受试者,他们也能够表达出种族成见。", "score": 75.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_201.wav", "doc_id": "SLpqvupgvW.seg_201", "src_text": "Then, we asked the annotators to pick one of these entities, for example, here's the first one, and describe them using three to five indirect referring expressions.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "然后,我们要求注释者选择其中一个实体,例如,选择第一个实体,并使用3到5个间接引用的表达来描述它们,", "score": 100.0} {"audio_path": 
"data/iwslt25/IWSLT25INSTRUCT/segmented/WBLMIsdIrq.seg_411.wav", "doc_id": "WBLMIsdIrq.seg_411", "src_text": "And similarly, we find that context is important to translate in the right formality.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "类似地,我们发现这种情况被支持到正确的形式中转。", "score": 86.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/gGbuDbHhyc.seg_362.wav", "doc_id": "gGbuDbHhyc.seg_362", "src_text": "This indicates that WSL approaches actually require cleanly labeled data to work properly, and the annotation cost for obtaining clean validation samples should not be overlooked.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "这表明,WSSL方法实际上需要清洁标记的数据才能正常工作,并且获得清洁验证样本的注释成本不应被忽视。", "score": 89.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dvGkKzmIaN.seg_512.wav", "doc_id": "dvGkKzmIaN.seg_512", "src_text": "First the method should be applicable to embedding as services.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "首先,该方法应适用于嵌入式服务;", "score": 88.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/TVCREhgqUP.seg_79.wav", "doc_id": "TVCREhgqUP.seg_79", "src_text": "We continue this process until every token from the first stage has been visited exactly once.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们继续这个过程。直到每个第一阶段的门票都被恰好访问了一次。", "score": 71.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_218.wav", "doc_id": "oYCKgTzTDy.seg_218", "src_text": "And Cross-Lingual Semantic Parsing is the task to translate queries in multiple natural languages into multiple meaning representations.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": 
"short_NLE_primary", "tgt_text": "跨语言语域是将多种自然语言中的询问词翻译成多种含义表示的任务。正", "score": 75.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dJGfOSFgZO.seg_339.wav", "doc_id": "dJGfOSFgZO.seg_339", "src_text": "And we look forward to seeing how conversational AI will advance in the coming months and years.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们期待着在即将到来的几个月和几年中看到对话式AI的进展。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/rISrKoXQCx.seg_583.wav", "doc_id": "rISrKoXQCx.seg_583", "src_text": "If we do try to sanitaze somehow, we would also risk censorship, or exclusion.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "如果我们试图进行某种形式的净化,我们也会面临审查或排除的风险,", "score": 90.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_691.wav", "doc_id": "oaOHnMCwad.seg_691", "src_text": "So to study data set and model positionality, we actually compare the annotations with real users with existing datasets and models.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "因此,为了研究数据集和模型位置,我们实际上与现实用户和现有数据集和模型进行了比较。", "score": 83.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/TVCREhgqUP.seg_73.wav", "doc_id": "TVCREhgqUP.seg_73", "src_text": "This makes our approach quite flexible and expressive.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "这使得我们的方法非常灵活和表达性。", "score": 76.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/rISrKoXQCx.seg_575.wav", "doc_id": "rISrKoXQCx.seg_575", "src_text": "We further show many qualitative examples to see that language models with different political leanings do give different predictions to hate speech and misinformation examples based on their social categories.", 
"src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们将进一步展示许多质量的例子,以便看到语言模型具有不同的政治含义,会给出不同的预测结果,基于社会类别的好话和坏话和误导性信息的例子。更多例子在", "score": 88.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_192.wav", "doc_id": "SLpqvupgvW.seg_192", "src_text": "The third one is when they have similar descriptions on Wikipedia.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "第三种是它们在维基百科上具有相似的描述。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/TVCREhgqUP.seg_73.wav", "doc_id": "TVCREhgqUP.seg_73", "src_text": "This makes our approach quite flexible and expressive.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "这使得我们的方法非常灵活和表达。", "score": 60.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/hgIDlKNiFM.seg_453.wav", "doc_id": "hgIDlKNiFM.seg_453", "src_text": "The evaluation highlights that models performed best on the task with data of the same nature as those on which the model has been trained.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "评估结果表明,这个模型在与同一数据集(同一自然语言处理任务)进行训练的任务中表现最好。", "score": 87.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/wLqFAuDnKa.seg_125.wav", "doc_id": "wLqFAuDnKa.seg_125", "src_text": "It's trained on a large collection of text, comprising 780 billion tokens.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "训练于一大批包含780亿个文本的文本。Tamil", "score": 63.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_262.wav", "doc_id": "oYCKgTzTDy.seg_262", "src_text": "Thanks for listening.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": 
"long_KIT_primary", "tgt_text": "感谢您的关注。", "score": 35.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/TVCREhgqUP.seg_49.wav", "doc_id": "TVCREhgqUP.seg_49", "src_text": "This is joint work with my advisors Alexander Koller and Ivan Titov.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "这是一项与我的顾问AlexanderKoller和IvanTitov共同完成的工作。", "score": 92.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_233.wav", "doc_id": "oYCKgTzTDy.seg_233", "src_text": "In this setting, the source language is the same as target language, for example German to German or English to English.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "在这种设置中,源语言与目标语言相同,例如德语到德语或英语到英语。", "score": 98.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/rISrKoXQCx.seg_584.wav", "doc_id": "rISrKoXQCx.seg_584", "src_text": "And it's incredibly hard to determine what is actually neutral and should be retaining language monitoring data.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我想知道,如何在语言监测数据中排除偏见。排除偏见和确定实际中立性以及应该保留的语言监测数据非常困难,", "score": 86.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/wLqFAuDnKa.seg_158.wav", "doc_id": "wLqFAuDnKa.seg_158", "src_text": "Thank you very much.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "非常感谢。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/rISrKoXQCx.seg_576.wav", "doc_id": "rISrKoXQCx.seg_576", "src_text": "There are a bunch of more examples in the appendix to further highlight that this indicates that there is a fairness issue that is very pressing regarding the political biases of language models.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": 
"acl", "tgt_system": "short_NLE_primary", "tgt_text": "附录中有更多的示例可以进一步提高它。这表明存在公平问题,这对于语言模型的政治偏见非常紧迫。", "score": 87.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WBLMIsdIrq.seg_422.wav", "doc_id": "WBLMIsdIrq.seg_422", "src_text": "And if we use word f-measure, then models with and without context have comparable performance.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "如果我们使用wordF-measure,那么有和没有上下文的模型都有可比的表现。", "score": 50.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WBLMIsdIrq.seg_404.wav", "doc_id": "WBLMIsdIrq.seg_404", "src_text": "We perform our analysis at three different levels.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们在三个不同级别上进行分析,", "score": 97.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/aQpIWggfCo.seg_21.wav", "doc_id": "aQpIWggfCo.seg_21", "src_text": "Thus, we adopt the idea of over-generate-then-filter to improve generation quality.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "因此我们采用了过度生成滤波器来提高生成质量。", "score": 80.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/GvEBWkLmuI.seg_878.wav", "doc_id": "GvEBWkLmuI.seg_878", "src_text": "Thank you so much for listening.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "感谢您的耐心倾听,", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SUkmfOTvGi.seg_483.wav", "doc_id": "SUkmfOTvGi.seg_483", "src_text": "Here we also found that more fine tuning examples, actually also leads to better generalization.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们还发现更多精调例实际上也会导致更好的泛化。", "score": 100.0} {"audio_path": 
"data/iwslt25/IWSLT25INSTRUCT/segmented/wLqFAuDnKa.seg_141.wav", "doc_id": "wLqFAuDnKa.seg_141", "src_text": "It's crucial for zero and one-shot prompting.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "对于零次和一次提示来说,这是至关重要的", "score": 91.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/PIZEXUFLAR.seg_275.wav", "doc_id": "PIZEXUFLAR.seg_275", "src_text": "OFA uses a unified vocabulary for language, image tokens and the coordinates of a bounding box.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "OFA使用统一的语言、图像符号和绑定盒的坐标作为语言、图像符号和绑定盒的坐标。", "score": 21.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/hgIDlKNiFM.seg_457.wav", "doc_id": "hgIDlKNiFM.seg_457", "src_text": "However, our experiment on control pre-training using the weight and tokenization of CamemBERT trained on the four GB subset of NACHOS showed comparable results to those obtained with DrBERT 4 GB from-scratch.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "然而,我们的实验表明,使用白色和tokenizer的自适应预训练可以获得更好的性能。我们在四个GB的NATSHOS子集上训练了BERT,结果与使用BERT从Scratch获得的结果相当,", "score": 60.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_229.wav", "doc_id": "oYCKgTzTDy.seg_229", "src_text": "The first one is Translate-Test.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "第一个是翻译测试,", "score": 90.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/rISrKoXQCx.seg_581.wav", "doc_id": "rISrKoXQCx.seg_581", "src_text": "It's like between Scylla and Charybdis.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "例如,像西班牙语和克里奥尔语之间的偏见。", "score": 50.0} {"audio_path": 
"data/iwslt25/IWSLT25INSTRUCT/segmented/uZBWfYjYnf.seg_101.wav", "doc_id": "uZBWfYjYnf.seg_101", "src_text": "First, to use already existing offline ST models without re-training or adopting specific architecture for SimulST.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "首先,使用现有的离线SD模型,而不需要重新训练或采用特定的架构来实现仿真SD;使用", "score": 80.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_792.wav", "doc_id": "WTTtiRKFZI.seg_792", "src_text": "However, this effect may be ameliorated when the direct object is very heavy and very long.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": ",昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书", "score": 0.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/uZBWfYjYnf.seg_93.wav", "doc_id": "uZBWfYjYnf.seg_93", "src_text": "What is simultaneous speech translation?", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "什么是同步语音翻译?", "score": 86.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dvGkKzmIaN.seg_542.wav", "doc_id": "dvGkKzmIaN.seg_542", "src_text": "As shown in the figures, it's hard to distinguish between, the backdoor embeddings and normal embeddings.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "如图所示,很难区分背向嵌入和正常嵌入。", "score": 82.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_823.wav", "doc_id": "WTTtiRKFZI.seg_823", "src_text": "But when the governor is on the right this tendency disappears.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "但当总理在右边时,这种倾向就会消失。", "score": 91.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_228.wav", "doc_id": 
"oYCKgTzTDy.seg_228", "src_text": "And to better evaluate our benchmark, we consider the six settings for training and evaluation.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "为了更好地评估我们的基准,我们考虑了六个设置:训练和评估。", "score": 63.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WBLMIsdIrq.seg_429.wav", "doc_id": "WBLMIsdIrq.seg_429", "src_text": "Thank you so much for your attention.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "感谢您的关注,", "score": 98.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/aQpIWggfCo.seg_2.wav", "doc_id": "aQpIWggfCo.seg_2", "src_text": "In everyday life, humans often plan their actions by following step-by-step instructions in the form of goal-oriented scripts.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "在日常生活中,人类经常通过遵循一步一步的指示,按照形式为保证的脚本来计划他们的行动。", "score": 56.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/FLkGnzVRew.seg_664.wav", "doc_id": "FLkGnzVRew.seg_664", "src_text": "Note that the performance is significantly lower for random.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "请注意,随机性显著降低。在AL的后续轮次中,我", "score": 56.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oeooqChmKK.seg_627.wav", "doc_id": "oeooqChmKK.seg_627", "src_text": "To summarize the main takeaways of our paper, many coreference resolution models appear unable to reason over knowledge from different sources without task-specific training.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们论文的主要内容,许多参考解决方案模型似乎无法在没有特定任务训练的情况下推理来自不同来源的知识;", "score": 78.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/FLkGnzVRew.seg_634.wav", 
"doc_id": "FLkGnzVRew.seg_634", "src_text": "We begin by defining cognitive dissonance and why it is an important problem to study in language.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们从定义认知偏差开始,并且为什么它是语言学习中重要的问题?", "score": 86.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oeooqChmKK.seg_599.wav", "doc_id": "oeooqChmKK.seg_599", "src_text": "We evaluate the data set with human study participants and established coreference resolution models.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们评估了与人类学习参与者一起使用的人类学习数据集,并建立了关", "score": 51.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/wLqFAuDnKa.seg_156.wav", "doc_id": "wLqFAuDnKa.seg_156", "src_text": "And that's it for this really short overview.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "这就是为这次非常短暂的回顾而做的。", "score": 92.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/PIZEXUFLAR.seg_284.wav", "doc_id": "PIZEXUFLAR.seg_284", "src_text": "So we use pre-trained OFA large model as a base model.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们使用预训练的OFA大型模型作为基模型,", "score": 85.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/TVCREhgqUP.seg_65.wav", "doc_id": "TVCREhgqUP.seg_65", "src_text": "Obtaining trees may also involve specialized grammar-induction procedures.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "获得树木也可能涉及特殊的树木加工过程", "score": 49.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/uZBWfYjYnf.seg_109.wav", "doc_id": "uZBWfYjYnf.seg_109", "src_text": "If we go on and we receive another speech chunk, and our model predicts other three words 
and we will look at those cross-attention weights, we will see that no word points to the last lambda speech frames.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "如果我们继续下去,我们会接收另一个语音坦克,我们的模型预测了另外三个单词,我们将查看这些交叉注意力。我们会看到没有任何单词指向最后的lambedspeechframes。", "score": 58.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_722.wav", "doc_id": "oaOHnMCwad.seg_722", "src_text": "And a good example of this is the Masakhani initiative.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "并在四个特殊社区中实施。这个例子很好,例如马萨卡尼的动机是", "score": 21.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_260.wav", "doc_id": "oYCKgTzTDy.seg_260", "src_text": "And et cetera.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们也", "score": 0.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/GvEBWkLmuI.seg_872.wav", "doc_id": "GvEBWkLmuI.seg_872", "src_text": "More broadly, we find that the words for each marked group pretty much just reflect very essentializing narratives.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "更快,我们会发现每个市场组的单词几乎只反映了非常基本的故事。", "score": 75.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/FLkGnzVRew.seg_658.wav", "doc_id": "FLkGnzVRew.seg_658", "src_text": "Next, we determine the best method to update a model with new data from each round of active learning and annotations.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "接下来,我们确定了最佳方法来更新模型以适应每轮主动学习和注释的新数据:累积(Cumulat", "score": 89.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/gGbuDbHhyc.seg_368.wav", "doc_id": "gGbuDbHhyc.seg_368", "src_text": "Finally, the 
performance improvement claimed in previous WSL approaches can be easily achieved by allowing to continue fine-tuning on the clean validation samples.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "最后,声学模型在前面的WSL方法中声称的性能改进可以很容易地通过允许在干净的验证样本上继续微调来实现。", "score": 64.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dJGfOSFgZO.seg_339.wav", "doc_id": "dJGfOSFgZO.seg_339", "src_text": "And we look forward to seeing how conversational AI will advance in the coming months and years.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们期待在未来几个月和几年中,会看到", "score": 69.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/PIZEXUFLAR.seg_290.wav", "doc_id": "PIZEXUFLAR.seg_290", "src_text": "If it's a multi-modal generation task, we report Rouge-L. For NLP task, we report Rouge-L as well.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "如果是多模生成任务,我们报告RUGL;对于NRP任务,我们报告RUGL。", "score": 63.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_216.wav", "doc_id": "oYCKgTzTDy.seg_216", "src_text": "Today I'm going to present our work \"XSemPLR: Cross-Lingual Semantic Parsing in Multiple Natural Languages and Meaning Representations\".", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "今天我将向大家介绍我们的作品:示例:跨语言语义解析在多种自然语言和多种代表中因此,符号解析是一", "score": 55.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_708.wav", "doc_id": "oaOHnMCwad.seg_708", "src_text": "We find that there is positionality in NLP.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们发现它在NLp中具有位置性。", "score": 91.0} {"audio_path": 
"data/iwslt25/IWSLT25INSTRUCT/segmented/aQpIWggfCo.seg_20.wav", "doc_id": "aQpIWggfCo.seg_20", "src_text": "Previous studies have shown that the output quality of language models falls in high variance, leading to bad performance.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "以前的研究表明,Larenin模型的输出质量落在高变异率中,从而导致性能不佳,", "score": 48.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/XejEJmgUmE.seg_756.wav", "doc_id": "XejEJmgUmE.seg_756", "src_text": "And we saw here in the orange dotted line, the MPP judgments are relatively stable.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "性能,并在橙色终止线上,我们看到MPP判断相对稳定。", "score": 81.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/uZBWfYjYnf.seg_95.wav", "doc_id": "uZBWfYjYnf.seg_95", "src_text": "And what are the problems of the current SimulST models?", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "当前的SLS模型", "score": 25.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_244.wav", "doc_id": "oYCKgTzTDy.seg_244", "src_text": "We found that Encoder-Decoder obtains the best performance on all nine datasets.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们发现编码器-解码器模型在多语言预训练任务中表现更好。在所有九个数据集上都表现最佳。", "score": 80.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_775.wav", "doc_id": "WTTtiRKFZI.seg_775", "src_text": "A similar approach is assumed in Igor Mel'čuk's meaning text theory, where again, the whole coordinate structure is headed by the first conjuct.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": 
"onlikethesetwo)和非协调性(asymmetricstructuresofcoordinationlikethesetwo)。好,论据基于依赖性减弱的原则,基于这些例子来解释。所以,在英语中,直接对象更喜欢靠近动词,而同位语可能更远。", "score": 48.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/wLqFAuDnKa.seg_146.wav", "doc_id": "wLqFAuDnKa.seg_146", "src_text": "In particular, we compare the selecting prompts from the training data for the WMT evaluations on the dev data.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "特别是,我们比较了它们。选择prompts从训练数据的WMT评估", "score": 75.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/wLqFAuDnKa.seg_149.wav", "doc_id": "wLqFAuDnKa.seg_149", "src_text": "Nevertheless, specialized state-of-the-art systems have a substantial advantage over the PaLM translations.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "然而,专门的系统在帕姆翻译方面具有明显的优势,", "score": 30.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_165.wav", "doc_id": "SLpqvupgvW.seg_165", "src_text": "Here, a user wants to select between one of these two songs.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "在这里,用户想在这两首歌中选择其中一首。", "score": 95.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/gGbuDbHhyc.seg_377.wav", "doc_id": "gGbuDbHhyc.seg_377", "src_text": "Second, WSL approaches should be compared with few-shot learning baselines, as both work on clean samples.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "其次,WSL方法应该与未来学习基线进行比较,例如,清洁样本的工作;", "score": 52.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/wLqFAuDnKa.seg_155.wav", "doc_id": "wLqFAuDnKa.seg_155", "src_text": "However, the \"Style/Awkward\" category for PaLM is lower than for the state-of-the-art systems, which is an additional signal that PaLM 
provides really fluent output, but still with some problems of accuracy.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "然而,Palm的样式输出质量低于当前最先进的系统,这是另一个信号,表明Palm提供了非常流畅的输出,但仍然存在一些准确性的问题。", "score": 60.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/aQpIWggfCo.seg_6.wav", "doc_id": "aQpIWggfCo.seg_6", "src_text": "Planning for the goals with specific constraints, such as \"make a chocolate cake\", still remains under-studied.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "即使是规划具有特定约束的目标(例如制作巧克力蛋糕),仍未得到充分的研究。", "score": 98.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_703.wav", "doc_id": "oaOHnMCwad.seg_703", "src_text": "We've then compared these, annotations with Social Chemistry, Delphi and GPT 4.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "然后,我们将这些注释与社会化学(Delphi)和社会化学(GPT-4)进行比较。", "score": 51.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/rISrKoXQCx.seg_585.wav", "doc_id": "rISrKoXQCx.seg_585", "src_text": "So it's kind of like the electric trolley problem.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "有点像电力电费问题。", "score": 52.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/XejEJmgUmE.seg_741.wav", "doc_id": "XejEJmgUmE.seg_741", "src_text": "So that is the approach.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "这是我们的方法,", "score": 75.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/hgIDlKNiFM.seg_444.wav", "doc_id": "hgIDlKNiFM.seg_444", "src_text": "Afterwards, we ask ourselves how much data do we need to train a specialized model on French data?", "src_text_system": "human", 
"src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "之后,我们问自己:如何评估这些原始数据的质量,并且如何将它们与临床数据进行比较。我们需要多少数据来训练一个专门的模型?", "score": 55.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/uZBWfYjYnf.seg_111.wav", "doc_id": "uZBWfYjYnf.seg_111", "src_text": "If we look at the main results of EDAtt, we'll plot the simultaneous speech translation results on graphs in which we have BLEU on one side that measures the translation quality, and average lagging that is the latency measure, and we also consider the computational aware average lagging that accounts for the model's computational times to predict the output.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "如果我们看一下这项研究的主要结果,我们将同时预测翻译结果的图表绘制在图表上,其中我们在一侧使用蓝色来衡量翻译质量和平均延迟(即延迟测量)以及计算机感知平均这意味着模型的计算时间与输出的准确性成正比。", "score": 86.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/TVCREhgqUP.seg_80.wav", "doc_id": "TVCREhgqUP.seg_80", "src_text": "To give you a teaser of the experimental results, here we compare our method with other treeless models on the COGS benchmark.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "为了给您一个实验结果的佐证,我们这里将我们的方法与科格斯基标记上的其他无树模型进行比较,", "score": 82.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_710.wav", "doc_id": "oaOHnMCwad.seg_710", "src_text": "So for the GPT 4 social acceptability analysis, we find that it's most aligned to confucian and English speaking countries.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "在进行GDP四项社会可持续性分析时,我们发现它大多与混乱和英语国家相关,", "score": 60.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/hgIDlKNiFM.seg_462.wav", "doc_id": "hgIDlKNiFM.seg_462", "src_text": "So thank you for this presentation, and we are looking forward to exchange at the 
poster session in Toronto.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "谢谢你为我们做了这次演讲,我们期待在多伦多邮局采取行动。", "score": 91.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/TVCREhgqUP.seg_91.wav", "doc_id": "TVCREhgqUP.seg_91", "src_text": "If you want to learn more about our experiments and how we address these challenges, please have a look at our paper or come to our poster.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "如果您想了解我们对这些挑战的实验和我们如何应对这些挑战,请看我们的论文或来到我们的邮筒。", "score": 83.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oeooqChmKK.seg_621.wav", "doc_id": "oeooqChmKK.seg_621", "src_text": "We evaluate the data set both with human study participants, and established coreference resolution models.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们使用人类研究参与者和已建立的问答解决方案模型评估数据集。", "score": 89.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dJGfOSFgZO.seg_319.wav", "doc_id": "dJGfOSFgZO.seg_319", "src_text": "We call this approach annotating behaviors in chat or ABC-Eval in short.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们称这种方法为注释聊天行为(或缩写为ABCEval在短语中)。", "score": 85.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/wLqFAuDnKa.seg_151.wav", "doc_id": "wLqFAuDnKa.seg_151", "src_text": "In our case, we chose to evaluate with Google Translate.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": ",我们选择与谷歌翻译合作。", "score": 38.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/rISrKoXQCx.seg_549.wav", "doc_id": "rISrKoXQCx.seg_549", "src_text": "Political news media are well covered in their pretraining data.", 
"src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "根据《西方四大报》", "score": 0.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/TVCREhgqUP.seg_82.wav", "doc_id": "TVCREhgqUP.seg_82", "src_text": "Some other kinds of structural generalization remain very challenging, though.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "然而,其他一些结构化的泛化仍然非常具有挑战性。", "score": 96.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/FLkGnzVRew.seg_669.wav", "doc_id": "FLkGnzVRew.seg_669", "src_text": "In summary, we find that PRC is a simple AL strategy for rare class acquisition and cold starting AL with appropriately designed transfer learning task and help significantly.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "总而言之,我们发现PRC是一种简单的AL策略,用于高级收购,并且使用适当的设计的传输学习任务可以帮助显著地提高AL的效果。", "score": 60.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/aQpIWggfCo.seg_26.wav", "doc_id": "aQpIWggfCo.seg_26", "src_text": "In addition, we reward the script that contains the keywords of the target constraint.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "此外,我们避免包含目标限制关键词的脚本;", "score": 80.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/GvEBWkLmuI.seg_851.wav", "doc_id": "GvEBWkLmuI.seg_851", "src_text": "And more broadly, dominant groups in society are both linguistically and socially unmarked, while the marginalized groups are usually marked.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "更广泛地说,社会中的主导群体在语言和社会上都没有被标记,而边缘化的群体通常被标记。", "score": 82.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dJGfOSFgZO.seg_308.wav", "doc_id": "dJGfOSFgZO.seg_308", "src_text": 
"Hello, I'm James Finch.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "你好,我是詹姆斯·芬奇,", "score": 95.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oeooqChmKK.seg_589.wav", "doc_id": "oeooqChmKK.seg_589", "src_text": "Hello everyone, I'm Akshatha, and today my co-author Martin and I are presenting our work \"The KITMUS Test: Evaluating Knowledge Integration from Multiple Sources.\"", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "作者:Akshata,今天我和我的同事Martin一起呈现我们的工作:评估来自多个来源的知识整合:这", "score": 80.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_808.wav", "doc_id": "WTTtiRKFZI.seg_808", "src_text": "So what we did, we extracted various statistics about coordination from the enhanced version of the Penn Treebank and see the paper \"Why wouldn't you use universal dependencies\" and these statistics confirm the observation made many times before that left conjuncts tend to be shorter.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们从Pentagons的增强版本中提取了关于协调的统计数据,并在论文“为什么我们不使用大学依存关系”中看到这些统计数据,这些统计数据证实了之前观察到的左边的同位语往往比右边", "score": 58.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_230.wav", "doc_id": "oYCKgTzTDy.seg_230", "src_text": "We use Google Translate API to translate source to the target language, then use monolingual model to train and evaluation.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们使用谷歌翻译API将源语言翻译成目标语言,然后使用单语言模型进行评估。训练和评估。", "score": 89.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dvGkKzmIaN.seg_544.wav", "doc_id": "dvGkKzmIaN.seg_544", "src_text": "Thank you.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": 
"short_NLE_primary", "tgt_text": "们将", "score": 0.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dvGkKzmIaN.seg_525.wav", "doc_id": "dvGkKzmIaN.seg_525", "src_text": "In watermark injection, we first define a target embedding.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "在水印注入中,我们首先定义一个目标嵌入。", "score": 92.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SUkmfOTvGi.seg_468.wav", "doc_id": "SUkmfOTvGi.seg_468", "src_text": "Firstly, can these models generalise to modern data?", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "首先,这些模型能否推广到现代数据?", "score": 93.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_234.wav", "doc_id": "oYCKgTzTDy.seg_234", "src_text": "We also test Monolingual Few-shot setting by training monolingual models with only 10% of training data.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们还将测试单语言少样本设置,通过仅使用10%的训练数据来训练单语言模型。", "score": 90.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/rISrKoXQCx.seg_555.wav", "doc_id": "rISrKoXQCx.seg_555", "src_text": "Secondly, how do language models with different political leanings actually perform on downstream tasks and whether that might result in fairness issues in NLP applications?", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "其次,如何语言模型在不同的政治背景下实际上在下游任务中表现良好,而我在使用NL-P应用程序时可能会出现不公平的情况。", "score": 70.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WBLMIsdIrq.seg_421.wav", "doc_id": "WBLMIsdIrq.seg_421", "src_text": "But then if we use COMET, context-aware models perform best.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": 
"但如果我们使用感知模型(context-awaremodels),并且", "score": 18.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/XejEJmgUmE.seg_748.wav", "doc_id": "XejEJmgUmE.seg_748", "src_text": "So that is what we call as the mismatch scenario.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "这就是我们所说的不匹配场景。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/wLqFAuDnKa.seg_145.wav", "doc_id": "wLqFAuDnKa.seg_145", "src_text": "So it's important to select the examples from high-quality translations.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "因此,重要的是要从高质量的翻译中选择例子,", "score": 96.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/FLkGnzVRew.seg_655.wav", "doc_id": "FLkGnzVRew.seg_655", "src_text": "We find that on transferring the zero-shot performance on the annotated data set is already much better than chance with the best, with AUC .62.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们发现在注释数据集上零样本性能已经远远好于随机猜测,达到0.652的最佳结果。进一步来", "score": 78.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/GvEBWkLmuI.seg_847.wav", "doc_id": "GvEBWkLmuI.seg_847", "src_text": "The benefit of this is that we get really specific stereotypes and patterns, without having to rely on any specific lexicon.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "这的好处是我们可以获得真正的特定语音和模式,而不需要依赖任何特定的词典。", "score": 91.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_806.wav", "doc_id": "WTTtiRKFZI.seg_806", "src_text": "It violates one principle, but it satisfies another one.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "它破坏了一个原则,但又满足了另一个原则。", 
"score": 92.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_183.wav", "doc_id": "SLpqvupgvW.seg_183", "src_text": "The first speech bubble is chosen from a few manual prompts per domain.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "第一种语音气泡是从几个按键提示中选择的。", "score": 75.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/PIZEXUFLAR.seg_291.wav", "doc_id": "PIZEXUFLAR.seg_291", "src_text": "We also introduce an additional evaluation metric called sensitivity.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们还引入了另一个评估矩阵,称为敏感度,", "score": 82.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dvGkKzmIaN.seg_532.wav", "doc_id": "dvGkKzmIaN.seg_532", "src_text": "Back door data set contains sentences of which all words belong to the trigger set while all words in the sentences of benign data set do not belong to the trigger sets.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "后门数据集包含所有单词都属于触发器集的句子。而“benign”等词语不属于触发器集。", "score": 82.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/hgIDlKNiFM.seg_452.wav", "doc_id": "hgIDlKNiFM.seg_452", "src_text": "These models are compared to six baseline models which are CamemBERT OSCAR 138 GB, CamemBERT OSCAR 4 GB, CamemBERT CCNET 4 GB, PubMedBERT, BioBERT, and ClinicalBERT.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "这种模型与六种设计模型相似,其中包括卡曼伯格奥斯卡一世三百八十八吉瓦,卡曼伯格奥斯卡四吉瓦,卡曼伯格西网四吉瓦,泵建,生物,和临床.演算结果突显该模型在任务中表现最佳,其数据与其训练数据具有相同的性质。", "score": 34.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_180.wav", "doc_id": "SLpqvupgvW.seg_180", "src_text": "Which is the alternative question.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", 
"domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "这是一个替代的问题,", "score": 73.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_814.wav", "doc_id": "WTTtiRKFZI.seg_814", "src_text": "Right?", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "对了,所以", "score": 70.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/wLqFAuDnKa.seg_141.wav", "doc_id": "wLqFAuDnKa.seg_141", "src_text": "It's crucial for zero and one-shot prompting.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "对于零和一次弹发很重要,事实上在我们实际弹", "score": 42.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_251.wav", "doc_id": "oYCKgTzTDy.seg_251", "src_text": "The orange line is Cross-lingual Zero-shot transfer.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "橙色线是跨语言的零虚拟传输,而", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dJGfOSFgZO.seg_326.wav", "doc_id": "dJGfOSFgZO.seg_326", "src_text": "From our analysis of these evaluation results, we found that ABC-Eval behavior labels are overall more reliable than labels collected by existing methods, as measured by inter-annotator agreement on 100 doubly-labeled conversations.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "从我们对这些评估结果的分析中,我们发现ABEBA行为标签总体上比现有方法收集的标签更可靠,根据1000对双标注对话的内注释一致性进行衡量。此外,ABCElab", "score": 63.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oeooqChmKK.seg_631.wav", "doc_id": "oeooqChmKK.seg_631", "src_text": "Thanks for listening.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "感谢您的倾听。", "score": 96.0} {"audio_path": 
"data/iwslt25/IWSLT25INSTRUCT/segmented/TVCREhgqUP.seg_60.wav", "doc_id": "TVCREhgqUP.seg_60", "src_text": "A popular method to address this is to integrate trees into the models.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "答:一种流行的方法是将树集成到模型中。", "score": 64.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_204.wav", "doc_id": "SLpqvupgvW.seg_204", "src_text": "For example, \"the one without words\", \"not the one with the 12 year old boy\", or \"the fictional one\", or \"comes from Azerbaijan\", and so on.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "如,没有说话的那个人;不是十二岁的男孩;不是虚构的那个人;或者来自阿塞拜疆的那个人", "score": 70.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/rISrKoXQCx.seg_560.wav", "doc_id": "rISrKoXQCx.seg_560", "src_text": "We can also see that GPT-4 is the most liberal language model of them all, and GPT series are generally more socially liberal than BART series and its variants.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": ",GPT-4是最自由的语言模型,而GPT理论通常比BERT理论和变体更自由。", "score": 68.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oeooqChmKK.seg_626.wav", "doc_id": "oeooqChmKK.seg_626", "src_text": "Additional experiments with fictional knowledge indicated even the best performing models, cannot reliably integrate backward knowledge provided only at inference time.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "对虚构知识的其他实验表明,即使是表现最好的模型也无法可靠地集成在干涉时提供的背景知识中。为了总结", "score": 85.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/aQpIWggfCo.seg_22.wav", "doc_id": "aQpIWggfCo.seg_22", "src_text": "We first show constraint types with examples for InstructGPT and obtain specific goals based on the 
seed abstract goals.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "首先,我们展示了受限类型的例子,包括INTENT-PT,并基于该抽象获得特定目标。他。", "score": 82.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/rISrKoXQCx.seg_583.wav", "doc_id": "rISrKoXQCx.seg_583", "src_text": "If we do try to sanitaze somehow, we would also risk censorship, or exclusion.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "如果我们尝试标准化一些,我们也会冒着审查的风险。", "score": 60.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WBLMIsdIrq.seg_400.wav", "doc_id": "WBLMIsdIrq.seg_400", "src_text": "In this work, we extend CXMI to Pointwise CXMI which can measure context usage at the sentence level or at the word level.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "在这项工作中,我们将CSMI扩展到YCSMI,这可以用于句子级或词级的内容使用;", "score": 84.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/gGbuDbHhyc.seg_353.wav", "doc_id": "gGbuDbHhyc.seg_353", "src_text": "But like an elephant in the room this necessity is often overlooked.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "但像一头在房间里的大象一样,这种必要性经常被忽视。", "score": 87.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WBLMIsdIrq.seg_429.wav", "doc_id": "WBLMIsdIrq.seg_429", "src_text": "Thank you so much for your attention.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "感谢您在圣地亚哥会见", "score": 44.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/XejEJmgUmE.seg_768.wav", "doc_id": "XejEJmgUmE.seg_768", "src_text": "And the MPP evaluation the way that we do it currently with short and single sentence input, may not fully capture the language models abstract knowledge 
throughout the context window.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "并且,使用短语和单句输入的MP评估方式(我们目前的评估方式)可能无法完全捕捉语模型的抽象知识。", "score": 54.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/uZBWfYjYnf.seg_108.wav", "doc_id": "uZBWfYjYnf.seg_108", "src_text": "This means that the first two words will be emitted while since the sum of the cross-attention is above a certain threshold alpha, we will not emit the last word and we wait for another speech chunk.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "这意味着前两个词将被丢弃,而后面的词将被保留。当焦虑的压力超过某个阈值α,我们不会发表最后一句话,而是等待另一段话。", "score": 47.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/GvEBWkLmuI.seg_850.wav", "doc_id": "GvEBWkLmuI.seg_850", "src_text": "So when people are describing a warrior who is a woman, they'll usually actually specify \"woman warrior\" and mark the term with \"woman\".", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "所以当人们描述一个女武士时,他们通常会具体说明“一名女武士”,并标记为“女人”。", "score": 90.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WBLMIsdIrq.seg_390.wav", "doc_id": "WBLMIsdIrq.seg_390", "src_text": "So, depending on context, the meaning of the word changes, and therefore its translation changes as well.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "因此,依赖于语境的单词的含义会改变,因此它的翻译也会改变。", "score": 60.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/aQpIWggfCo.seg_45.wav", "doc_id": "aQpIWggfCo.seg_45", "src_text": "Thanks for your time.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "谢谢你的时间,", "score": 95.0} {"audio_path": 
"data/iwslt25/IWSLT25INSTRUCT/segmented/rISrKoXQCx.seg_548.wav", "doc_id": "rISrKoXQCx.seg_548", "src_text": "So language models are trained on large scale web crawl data.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "语言模型在大规模的网上数据集上训练,", "score": 75.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/gGbuDbHhyc.seg_368.wav", "doc_id": "gGbuDbHhyc.seg_368", "src_text": "Finally, the performance improvement claimed in previous WSL approaches can be easily achieved by allowing to continue fine-tuning on the clean validation samples.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "最后,声学库的性能改进,声学库的以前方法可以通过允许在干净的验证样本上继续进行精细调节来轻松实现。", "score": 52.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_168.wav", "doc_id": "SLpqvupgvW.seg_168", "src_text": "This could happen when the user cannot remember the name of the song.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "这可能发生在用户无法记住软件名称的情况下。", "score": 70.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SUkmfOTvGi.seg_478.wav", "doc_id": "SUkmfOTvGi.seg_478", "src_text": "The first one is the model architecture.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "第一个是模型架构。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dJGfOSFgZO.seg_340.wav", "doc_id": "dJGfOSFgZO.seg_340", "src_text": "Thank you for watching.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "对话智能的发展。", "score": 0.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_785.wav", "doc_id": "WTTtiRKFZI.seg_785", "src_text": "Now the aim of this paper is to produce a novel 
argument for the symmetric structures of coordination, like these two and against the asymmetric structures of coordination, like these two.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "这本书的目的是为了产生一种新的论点来对抗这种协调的对称结构和这种对称结构。", "score": 31.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_181.wav", "doc_id": "SLpqvupgvW.seg_181", "src_text": "And in the third speech bubble, Bob uses an indirect reference to select one of these entities, for example, \"the newer one.\"", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "第三个语气泡,鲍勃使用了间接引用来选择其中一个实体,例如更新的实体。", "score": 82.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oeooqChmKK.seg_604.wav", "doc_id": "oeooqChmKK.seg_604", "src_text": "After a long day at work deciding cases in a law court, he was happy to relax.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "在法庭上做判决后,他很高兴可以放松一下。在", "score": 85.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/PIZEXUFLAR.seg_307.wav", "doc_id": "PIZEXUFLAR.seg_307", "src_text": "Thank you.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "谢谢。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_798.wav", "doc_id": "WTTtiRKFZI.seg_798", "src_text": "But it's also OK to say, \"Marge read yesterday this absolutely fascinating book about bees.\"", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了", "score": 10.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/aQpIWggfCo.seg_22.wav", "doc_id": "aQpIWggfCo.seg_22", "src_text": "We first 
show constraint types with examples for InstructGPT and obtain specific goals based on the seed abstract goals.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "首先,我们展示带示例的受限类型,用于显示intracpt,基于所述抽象目标获得特定目标。", "score": 85.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/uZBWfYjYnf.seg_117.wav", "doc_id": "uZBWfYjYnf.seg_117", "src_text": "And we see that it outperforms all the strategies applied to offline models since the curves are shifted over the left.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们看到,EAD优于所有应用于离线模型的策略,因为它们的曲线向左偏移。", "score": 70.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/rISrKoXQCx.seg_563.wav", "doc_id": "rISrKoXQCx.seg_563", "src_text": "By further pretraining language models on such partisan corpora we can see that the ideological coordinates of the language model also correspondingly shift.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "通过进一步预训练语言模型和类似的语料库,我们可以看到语言模型的理想协调也会随之变化。", "score": 78.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_821.wav", "doc_id": "WTTtiRKFZI.seg_821", "src_text": "So I'll concentrate on the right one.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "精力于正确的单词。", "score": 35.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_690.wav", "doc_id": "oaOHnMCwad.seg_690", "src_text": "However these works really don't look at comparing end users with the datasets and models themselves, and studying model and data set positionality is increasingly important as NLP tasks become more subjective and socially oriented, and it's challenging to characterise how these positionalities are skewed because not all decisions are 
documented and many models are hidden behind APIs.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "然而,这些研究真正不考虑的是比较终端用户与数据集和模型本身的比较,并且研究模型和数据集的位置性变得越来越重要,因为NLP任务变得越来越主观和社会性定向,并且挑战性地描述这些位置性是如何扭曲的,因为并非所有决策都有文档,并且许多模型都被API隐藏了。", "score": 53.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WBLMIsdIrq.seg_394.wav", "doc_id": "WBLMIsdIrq.seg_394", "src_text": "In this work, we try to answer these two questions.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "在这项工作中,我们试图回答这两个问题:", "score": 95.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_802.wav", "doc_id": "WTTtiRKFZI.seg_802", "src_text": "When you swap these two constituents, the sum of these two dependencies becomes 6.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "交换这两个构成部分时,这两个依赖关系的部分总和为6(正确", "score": 75.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dJGfOSFgZO.seg_332.wav", "doc_id": "dJGfOSFgZO.seg_332", "src_text": "These reliable, informative, and distinct ABC-Eval metrics enable us to evaluate conversational AI with a higher resolution than previous methods are able to achieve.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "这些可靠、信息丰富和独特的ABC-EVL指标使我们能够用更高的分辨率评估会话AI。你可以", "score": 78.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_695.wav", "doc_id": "oaOHnMCwad.seg_695", "src_text": "And we ought to do this over looking at the demographics of original data sets annotators, because, usually only a few annotators annotate each instance and because demographics are rarely collected and shared.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", 
"tgt_text": "我们希望在查看原始数据集的统计图(注释器)时做到这一点,因为通常只有很少的注释器注释了每个实例,并且统计图通常是收集和共享的,", "score": 82.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/GvEBWkLmuI.seg_858.wav", "doc_id": "GvEBWkLmuI.seg_858", "src_text": "So, really just only the positive or at least non-negative ones.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "(1)但", "score": 0.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/GvEBWkLmuI.seg_830.wav", "doc_id": "GvEBWkLmuI.seg_830", "src_text": "In recent years, many have documented the prevalence of social bias and stereotypes in large language models, or LLMs.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "近年来,许多人都记录了大型语言模型(LLMs)中社会偏见和刻板印象的普遍性。", "score": 90.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/gGbuDbHhyc.seg_351.wav", "doc_id": "gGbuDbHhyc.seg_351", "src_text": "Technically, this claim is not wrong, but there's a catch, which is that people do assume that there's an additional clean validation set available for model selection.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "从技术上讲,这个说法是正确的,但有一个陷阱,即人们确实假设有一个额外的清洁验证集可用于模型选择。", "score": 88.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/FLkGnzVRew.seg_652.wav", "doc_id": "FLkGnzVRew.seg_652", "src_text": "To alleviate this, we experiment over combinations of transfer learning and active learning to annotate such that more dissonant samples can be collected over lesser annotation runs, lowering the overall annotation costs while improving dissonance detection.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "为了缓解这一问题,我们实验了转移学习和主动学习的组合,以便在更少的注释轮次中收集更多的不一致样本,从而降低总体注释成本,同时改善不一致检测。", "score": 100.0} {"audio_path": 
"data/iwslt25/IWSLT25INSTRUCT/segmented/PIZEXUFLAR.seg_305.wav", "doc_id": "PIZEXUFLAR.seg_305", "src_text": "So one more thing, we are collecting a much larger multi-model instruction tuning dataset with around 150 additional vision language tasks and we will release them.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "最后一个问题:我们收集了一个更大的多模态指令调节数据集,包含大约150个额外的视觉语言任务,并将其发布,因此", "score": 78.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/TVCREhgqUP.seg_47.wav", "doc_id": "TVCREhgqUP.seg_47", "src_text": "Hi!", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "你好,", "score": 96.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/aQpIWggfCo.seg_31.wav", "doc_id": "aQpIWggfCo.seg_31", "src_text": "Creating the dataset is an essential step to this end.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "创建数据集是实现这一目标的必需步骤。", "score": 95.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/aQpIWggfCo.seg_20.wav", "doc_id": "aQpIWggfCo.seg_20", "src_text": "Previous studies have shown that the output quality of language models falls in high variance, leading to bad performance.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "前面的研究已经表明,低层次模型的输出质量在高变异性中下降,导致性能下降。", "score": 80.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/GvEBWkLmuI.seg_855.wav", "doc_id": "GvEBWkLmuI.seg_855", "src_text": "So first we use a lexicon of stereotypes, and we find that the generated personas contain a lot more stereotypes than the human-written ones.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "首先我们使用了一个类似于stereotypes的词汇,发现生成的个人包含的stereotypes比人类所包含的更多。", "score": 
50.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_715.wav", "doc_id": "oaOHnMCwad.seg_715", "src_text": "An example of this is that datasets and models are less aligned to non binary people compared to the men and women counterparts.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "例如,我们发现数据集模型与非二元人群相比,与男性和女性对手相比,数据集模型的准确度较低。", "score": 64.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dvGkKzmIaN.seg_536.wav", "doc_id": "dvGkKzmIaN.seg_536", "src_text": "Meanwhile, we also apply KS test and use its p-value as the third metric.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "同时,我们还应用kS测试,并将其p值作为第三个指标。", "score": 91.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WBLMIsdIrq.seg_395.wav", "doc_id": "WBLMIsdIrq.seg_395", "src_text": "First, when does translation require context?", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "首先,翻译何时需要上下文?", "score": 95.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_177.wav", "doc_id": "SLpqvupgvW.seg_177", "src_text": "In the first bubble, Bob says, \"Remember that song we were listening to yesterday?\"", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "在第一个泡泡里,鲍勃说:“记得昨天我们听的那首歌吗?”", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dvGkKzmIaN.seg_527.wav", "doc_id": "dvGkKzmIaN.seg_527", "src_text": "The provided embedding is a weight summation of the target embedding and the original embedding.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "提供的嵌入是目标嵌入和原始嵌入的加权和。", "score": 100.0} {"audio_path": 
"data/iwslt25/IWSLT25INSTRUCT/segmented/FLkGnzVRew.seg_667.wav", "doc_id": "FLkGnzVRew.seg_667", "src_text": "We find that PRC has the highest percentage of dissonance and works best for rare class.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们发现PRC在离散度和适用于稀疏类的工作中具有最高百", "score": 60.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_167.wav", "doc_id": "SLpqvupgvW.seg_167", "src_text": "But sometimes an indirect reference is more appropriate to have a more natural conversation.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "但有时直接引用更适合自然的对话;", "score": 47.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/rISrKoXQCx.seg_566.wav", "doc_id": "rISrKoXQCx.seg_566", "src_text": "So we divide pretraining corpora, into pre 45th president of the United States and after 45th president of the United States.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "因此,我们将预备军分为两部分:在美国第四十五届总统任期前和美国第四十五届总统任期后,", "score": 57.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oeooqChmKK.seg_616.wav", "doc_id": "oeooqChmKK.seg_616", "src_text": "For example, because new occupations have developed since the time of pretraining.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "例如,因为新职业自培训前就已经发展起来了。", "score": 61.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/FLkGnzVRew.seg_639.wav", "doc_id": "FLkGnzVRew.seg_639", "src_text": "While dissonance is a very common phenomenon we experienced in daily decision making, they are really rare to find expressed in language among other kinds of discourse relations.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", 
"tgt_text": "们有共识关系。但疏远是非常普遍的现象,我们在日常决策中经历过,他们确实很渴望用语言表达其他类型的关系。", "score": 75.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WBLMIsdIrq.seg_387.wav", "doc_id": "WBLMIsdIrq.seg_387", "src_text": "For example, how would we translate \"mole\" in this sentence?", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "例如,我们如何翻译这句话中的更多单词?然而,", "score": 62.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dvGkKzmIaN.seg_515.wav", "doc_id": "dvGkKzmIaN.seg_515", "src_text": "Finally, the watermark needs to be transferable to the attacker's services during the model extraction process.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "最后,在模型提取过程中,水印需要转移到攻击者的表面。", "score": 76.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/TVCREhgqUP.seg_60.wav", "doc_id": "TVCREhgqUP.seg_60", "src_text": "A popular method to address this is to integrate trees into the models.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "一个流行的方法是将树集成到模型中。这些", "score": 91.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/wLqFAuDnKa.seg_152.wav", "doc_id": "wLqFAuDnKa.seg_152", "src_text": "The insights that we gained from the human evaluation that we performed using the MQM framework said that the fluency of PaLM is comparable to state-of-the-art systems but the main difference comes from the accuracy.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们从人类分析中获得的见解(我们使用MCM框架)是,棕榈油的流动性与艺术系统的状态相似,但主要的不同之处在于精确度。", "score": 38.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/TVCREhgqUP.seg_63.wav", "doc_id": "TVCREhgqUP.seg_63", "src_text": "This can be complicated and sometimes a computationally expensive process.", "src_text_system": "human", 
"src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "这可能会很复杂,有时会涉及计算上昂贵的过程。", "score": 96.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_722.wav", "doc_id": "oaOHnMCwad.seg_722", "src_text": "And a good example of this is the Masakhani initiative.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "MasakaniInitiative。", "score": 0.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_216.wav", "doc_id": "oYCKgTzTDy.seg_216", "src_text": "Today I'm going to present our work \"XSemPLR: Cross-Lingual Semantic Parsing in Multiple Natural Languages and Meaning Representations\".", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "将展示我们的工作:跨语言语义解析在多种自然语言和多种表示中。语义解析是任", "score": 46.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dvGkKzmIaN.seg_519.wav", "doc_id": "dvGkKzmIaN.seg_519", "src_text": "Then let me introduce the details of our embedding marker.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "然后让我介绍一下我们的嵌入标记器的细节。", "score": 80.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dvGkKzmIaN.seg_537.wav", "doc_id": "dvGkKzmIaN.seg_537", "src_text": "We conduct experiments on four data sets AG News, MIND, SST2 and Enron Spam.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们在四个数据集上进行了实验。AgeNews,Mind,SST2和ErrorSpam。", "score": 46.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_686.wav", "doc_id": "oaOHnMCwad.seg_686", "src_text": "And as a researcher, positionality can influence the research process and its outcomes and results because it can change the decisions that researchers make.", "src_text_system": "human", 
"src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "作为研究人员,位置性可以影响研究过程及其结果和结果,因为它可以改变研究人员做出的决定。", "score": 82.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dJGfOSFgZO.seg_334.wav", "doc_id": "dJGfOSFgZO.seg_334", "src_text": "For example, the bots we tested have common sense violations in around 20% of their responses.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "例如,我们测试的机器人在大约20%的反应中表现出道德感。", "score": 64.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/PIZEXUFLAR.seg_306.wav", "doc_id": "PIZEXUFLAR.seg_306", "src_text": "So this is a QR code for our data and model.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "这是我们数据和模型的QR码。", "score": 95.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/GvEBWkLmuI.seg_846.wav", "doc_id": "GvEBWkLmuI.seg_846", "src_text": "The second part is marked words, which is a method to identify the words that distinguish marked groups from unmarked ones, which I'll elaborate on shortly.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "第二部分是标记词,这是一种方法来识别区分标记组和未标记组的词语,我将在不久的将来详细说明。", "score": 82.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/aQpIWggfCo.seg_14.wav", "doc_id": "aQpIWggfCo.seg_14", "src_text": "This table reports the overall accuracy of the results.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "该表报告了结果的总体准确性。", "score": 98.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_186.wav", "doc_id": "SLpqvupgvW.seg_186", "src_text": "Do you mean A or B?", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "你是指A还是B?其中", 
"score": 89.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/GvEBWkLmuI.seg_861.wav", "doc_id": "GvEBWkLmuI.seg_861", "src_text": "In our analysis, we reveal how these seemingly positive portrayals reflect harmful patterns.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "在我们的分析中,我们展示了这些看似积极的形象如何反映有害的模式。", "score": 94.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_772.wav", "doc_id": "WTTtiRKFZI.seg_772", "src_text": "As you may know, there are different dependency structures assumed by different theories and corpus approaches.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "处理了一个连接。现在还有对称的方法来处理连接结构,如Pragglejaz的方法,依赖性三元组的连接结构是由连接来头的。", "score": 0.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/TVCREhgqUP.seg_69.wav", "doc_id": "TVCREhgqUP.seg_69", "src_text": "First, we tag each input token with an unordered multiset of tokens that will appear in the output.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "首先,我们将每个输入令牌标记为输出中将出现的令牌的无序多集。", "score": 60.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/TVCREhgqUP.seg_63.wav", "doc_id": "TVCREhgqUP.seg_63", "src_text": "This can be complicated and sometimes a computationally expensive process.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "这可能会变得复杂,有时甚至是一个计算上昂贵的过程;", "score": 91.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oeooqChmKK.seg_593.wav", "doc_id": "oeooqChmKK.seg_593", "src_text": "But natural language understanding often requires knowledge that is also supplied at inference time.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": 
"但是,自然语言理解往往需要在推理时提供的知识。", "score": 95.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/wLqFAuDnKa.seg_137.wav", "doc_id": "wLqFAuDnKa.seg_137", "src_text": "So, it's important to select a good prompting strategy.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "选择一个好的促进策略是很重要的。答:.", "score": 86.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/hgIDlKNiFM.seg_462.wav", "doc_id": "hgIDlKNiFM.seg_462", "src_text": "So thank you for this presentation, and we are looking forward to exchange at the poster session in Toronto.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "感谢主持人为这次演讲提供了机会,我们期待在多伦多的国际邮政联盟会议上与大家见面。", "score": 31.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_175.wav", "doc_id": "SLpqvupgvW.seg_175", "src_text": "Our data set collection methodology emphasizes informality using a cartoon completion setup.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们的数据集收集方法使用卡通完成设置强调非正式性。", "score": 88.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/GvEBWkLmuI.seg_863.wav", "doc_id": "GvEBWkLmuI.seg_863", "src_text": "And these words define these groups only by their relationship to their identity and distinguish them as different from the white norm.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "这些词汇只通过它们与身份的关系来定义这些组,并将它们区分开于白人规范。", "score": 83.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/gGbuDbHhyc.seg_344.wav", "doc_id": "gGbuDbHhyc.seg_344", "src_text": "I'd like to begin with a brief introduction to weak supervision and weakly supervised learning.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": 
"我想从介绍周监督和周监督学习开始。", "score": 61.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oeooqChmKK.seg_602.wav", "doc_id": "oeooqChmKK.seg_602", "src_text": "Kea is a Baker.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "凯亚是面包师。", "score": 96.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_183.wav", "doc_id": "SLpqvupgvW.seg_183", "src_text": "The first speech bubble is chosen from a few manual prompts per domain.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "第一个语音泡从每个域的几条手动提示中选择。", "score": 94.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oeooqChmKK.seg_621.wav", "doc_id": "oeooqChmKK.seg_621", "src_text": "We evaluate the data set both with human study participants, and established coreference resolution models.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们通过与人类研究参与者一起评估数据集,并建立了已建立的参考解模型;", "score": 55.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/TVCREhgqUP.seg_85.wav", "doc_id": "TVCREhgqUP.seg_85", "src_text": "As a consequence, for a given token we don't know which multiset it came from, which poses a challenge for training.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "因此,对于给定的令牌,我们不知道它来自哪个多集,哪个多集对训练构成了挑战。", "score": 80.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/aQpIWggfCo.seg_7.wav", "doc_id": "aQpIWggfCo.seg_7", "src_text": "In this paper, we define the problem of constrained language planning which imposes different constraints on the goals of planning.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "在本文中,我们定义了受限语言规划的问题。这对规划目标施加了不同的约束;", "score": 88.0} {"audio_path": 
"data/iwslt25/IWSLT25INSTRUCT/segmented/aQpIWggfCo.seg_35.wav", "doc_id": "aQpIWggfCo.seg_35", "src_text": "In total, we generate 55,000 specific goals with scripts.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "总的来说,我们生成了50,000个特定的脚本", "score": 50.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/GvEBWkLmuI.seg_837.wav", "doc_id": "GvEBWkLmuI.seg_837", "src_text": "And we can immediately see that this is very generalizable to any demographic because we can just specify whatever identity marker that we want into this prompt.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们可以立即看到这对于任何人口普查都非常普遍,因为我们可以简单地指定我们想要在这个提示中输入的任何身份标记。所", "score": 83.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_178.wav", "doc_id": "SLpqvupgvW.seg_178", "src_text": "And with that, Bob sets the dialogue context.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "然后,鲍勃设定了对话背景。", "score": 55.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oeooqChmKK.seg_595.wav", "doc_id": "oeooqChmKK.seg_595", "src_text": "Pretrained parameters can contain information about what presidents do and what a TV is but they cannot reliably know who this instance-specific entity \"John\" is, or who the new president is, because the president might have changed since pretraining.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "预备役参数可以包含关于总统做什么和ATLE是什么的信息,但他们不能可靠地知道这个特定实体是谁,或者新总统是谁,因为总统可能在预备役期间改变了。", "score": 42.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/wLqFAuDnKa.seg_147.wav", "doc_id": "wLqFAuDnKa.seg_147", "src_text": "The dev data is much more curated, and with higher quality than the training data, that it's more noisy.", 
"src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "数据或DEFT数据:DEFT数据更精确,质量更高,结果更好,", "score": 55.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/aQpIWggfCo.seg_34.wav", "doc_id": "aQpIWggfCo.seg_34", "src_text": "We appy our method for building a dataset of constrained language planning, named as CoScript.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们将规划我们的方法来构建一个受限语言规划数据集,称为“代码”。", "score": 88.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_717.wav", "doc_id": "oaOHnMCwad.seg_717", "src_text": "So, given that there is positionality in NLP, what can we do about it?", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "因此,鉴于在NLP中存在位置和ALD,我们可以如何做?", "score": 55.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_675.wav", "doc_id": "oaOHnMCwad.seg_675", "src_text": "I'm Jenny, a first year PhD student at Carnegie Mellon University and today I'll be presenting your work NLPositionality characterising design biases of datasets and Models.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "是卡内基·梅尔伦大学的第一届博士生,今天将会展示我的作品,展示设计师通过视觉数据集模型设计的设计。", "score": 65.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_688.wav", "doc_id": "oaOHnMCwad.seg_688", "src_text": "And we're not trying to say that models themselves in data sets themselves have demographic identities and life experiences, but they do aggregate judgments and opinions of real people, and can thus represent certain positionalities over others.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": 
"我们并不是说模型和数据集本身具有人口统计特征和生活经历,但它们确实聚合了真实人群的判断和意见,并因此代表了某些位置性。", "score": 93.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/uZBWfYjYnf.seg_98.wav", "doc_id": "uZBWfYjYnf.seg_98", "src_text": "And training and maintaining several models to reach different latency regimes.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "训练和维护多个模型以达到不同的延迟模式,例如训练一个模型", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dvGkKzmIaN.seg_525.wav", "doc_id": "dvGkKzmIaN.seg_525", "src_text": "In watermark injection, we first define a target embedding.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "在水印注入中,我们首先定义一个目标嵌入。", "score": 98.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_189.wav", "doc_id": "SLpqvupgvW.seg_189", "src_text": "When we move higher in the list, the entities become more similar to each other and it's usually harder to make the disambiguation.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "列表中的更高位置时,实体变得彼此更加相似,通常更难进行消歧。我们使用以下不同采样方法:", "score": 53.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/hgIDlKNiFM.seg_456.wav", "doc_id": "hgIDlKNiFM.seg_456", "src_text": "Overall, from-scratch pre-training seems to obtain higher performance on most of the tasks.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "总的来说,冲刺免疫训练似乎在大多数任务中获得了更好的表现。", "score": 53.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_170.wav", "doc_id": "SLpqvupgvW.seg_170", "src_text": "Or when the user wants to specify a preference.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "或者当用户想要指定一个偏好时,例如更新的歌曲或者不", "score": 
85.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_771.wav", "doc_id": "WTTtiRKFZI.seg_771", "src_text": "Hi, my name is Adam Przepiórkowski and this talk is about the Dependency Structure of Coordination.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "(1)在协调结构中,协调结构的头是整个结构的第一个成分。这两种方法是对称的,因为它们都单独", "score": 0.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dvGkKzmIaN.seg_506.wav", "doc_id": "dvGkKzmIaN.seg_506", "src_text": "Embedding as services is one of the services built upon large language models to assist various, NLP tasks.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "嵌入式服务是基于大型语言模型构建的服务之一,用于协助各种NLP任务。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/hgIDlKNiFM.seg_433.wav", "doc_id": "hgIDlKNiFM.seg_433", "src_text": "Then we will present the main contribution of our article.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "然后我们将介绍我们文章的主要贡献:", "score": 93.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_203.wav", "doc_id": "SLpqvupgvW.seg_203", "src_text": "Here are some examples from our dataset.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "个人;例", "score": 0.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dvGkKzmIaN.seg_545.wav", "doc_id": "dvGkKzmIaN.seg_545", "src_text": "Welcome to discuss with us.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "将来与我们讨论。", "score": 55.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/TVCREhgqUP.seg_78.wav", "doc_id": "TVCREhgqUP.seg_78", "src_text": "We determine the third token in the output in a similar way 
by jumping to another multiset token.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们通过跳到另一个多集令牌来确定输出中的第三个令牌;", "score": 92.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/FLkGnzVRew.seg_669.wav", "doc_id": "FLkGnzVRew.seg_669", "src_text": "In summary, we find that PRC is a simple AL strategy for rare class acquisition and cold starting AL with appropriately designed transfer learning task and help significantly.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "“”总之,我们发现PRC是一种简单的AL策略,适用于稀疏类的获取,并且与适当设计的转移学习任务一起启动AL可以帮助显著地。", "score": 58.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/wLqFAuDnKa.seg_157.wav", "doc_id": "wLqFAuDnKa.seg_157", "src_text": "For more details, please come to the full presentation of the paper.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "如需更多细节,请随我一起参加这篇论文的完整演讲。", "score": 73.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_678.wav", "doc_id": "oaOHnMCwad.seg_678", "src_text": "You might turn towards a popular API like Prospective API for toxicity detection, and this works really well if you're Carl Jones.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "你可能会转向一个流行的API,例如用于毒性检测的API,正如卡尔·琼斯(CarlJones)所说,能够正确检测", "score": 43.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/XejEJmgUmE.seg_767.wav", "doc_id": "XejEJmgUmE.seg_767", "src_text": "So, the key takeaways of our work is that language models are sensitive to latent syntactic and semantic features which are shared across the sentences.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "因此,我们工作的关键观点是语言模型对句子间共享的隐含语法和语义特征敏感,", 
"score": 95.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SUkmfOTvGi.seg_480.wav", "doc_id": "SUkmfOTvGi.seg_480", "src_text": "The second ingredient is the model size.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "第二个成分是模型大小。", "score": 99.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dvGkKzmIaN.seg_529.wav", "doc_id": "dvGkKzmIaN.seg_529", "src_text": "When a number of triggers in the sentence is greater than m the provided embedding is exactly equal to the target embedding.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "当句子中的触发器数量大于m时,提供的嵌入与目标嵌入完全相同。", "score": 93.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SUkmfOTvGi.seg_480.wav", "doc_id": "SUkmfOTvGi.seg_480", "src_text": "The second ingredient is the model size.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "第二个成分是模型大小;", "score": 95.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_241.wav", "doc_id": "oYCKgTzTDy.seg_241", "src_text": "And we also find many interesting results.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们还发现了许多有趣的结果。", "score": 99.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/FLkGnzVRew.seg_668.wav", "doc_id": "FLkGnzVRew.seg_668", "src_text": "However, the annotators also find the examples difficult.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "注释者也发现了例子很难。", "score": 90.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/FLkGnzVRew.seg_647.wav", "doc_id": "FLkGnzVRew.seg_647", "src_text": "Tweets were passed using the PDTB parser, and pairs of discourse units were annotated according to the 
guidelines that are described in our paper.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "推文使用PDTB分析器进行分析,根据我们论文中描述的指南对话单元对齐。", "score": 70.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/wLqFAuDnKa.seg_124.wav", "doc_id": "wLqFAuDnKa.seg_124", "src_text": "PaLM is a 540 billion-parameter large language model presented last year in 2022.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": ":::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::[伊Pram是一款540亿参数的大型语言模型,于2022年在2022年发布。", "score": 42.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oeooqChmKK.seg_628.wav", "doc_id": "oeooqChmKK.seg_628", 
"src_text": "However, with task-specific training, some models successfully integrate knowledge from multiple sources.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "然而,在进行特定任务训练后,某些模型成功地从多个来源中整合了知识。", "score": 85.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/TVCREhgqUP.seg_86.wav", "doc_id": "TVCREhgqUP.seg_86", "src_text": "In addition, sometimes there are multiple permutations that are consistent with the data, but the linguistically correct one is latent.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "此外,有时会出现多个与数据一致的变换,但其中一个在语言上是正确的,但却是隐蔽的;", "score": 85.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/rISrKoXQCx.seg_561.wav", "doc_id": "rISrKoXQCx.seg_561", "src_text": "Secondly, we aim to investigate to which extent the political biases of language models are actually picked up from training data.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "其次,我们希望投资到哪些程度,政治语言模型实际上是从训练数据中提取出来的。", "score": 47.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dvGkKzmIaN.seg_534.wav", "doc_id": "dvGkKzmIaN.seg_534", "src_text": "The cosine and L2 similarity between the requested embedding and the target embedding are computed.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "计算了请求的嵌入和目标嵌入之间的余弦和L2相似性;", "score": 93.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/uZBWfYjYnf.seg_94.wav", "doc_id": "uZBWfYjYnf.seg_94", "src_text": "Simultaneous speech translation, or SimulST, is the process of translating spoken language into a text in another language in real time, enabling cross-language communication.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": 
"long_KIT_primary", "tgt_text": "neousST)是将口语翻译成另一种语言的文本的过程,实时进行,实现跨语言的交流。问题在于", "score": 71.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/TVCREhgqUP.seg_61.wav", "doc_id": "TVCREhgqUP.seg_61", "src_text": "The trees are intended to capture the compositional process that relates utterances with the logical forms.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "树的目的是捕捉与逻辑形式相关的意指。", "score": 88.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/PIZEXUFLAR.seg_285.wav", "doc_id": "PIZEXUFLAR.seg_285", "src_text": "During training, we mix all the instances for all the tasks.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "在训练期间,我们混合所有任务的所有实例,", "score": 95.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/FLkGnzVRew.seg_635.wav", "doc_id": "FLkGnzVRew.seg_635", "src_text": "Simply put, cognitive dissonance is two beliefs or actions that are inconsistent, such as this example where a person states, \"I know that cigarettes could kill me\", and then goes on to say \"I grabbed a couple of smokes after the meeting\".", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "简单来说,认知不一致性是两个相互矛盾的信念或行为。例如,一个人说“我知道烟草会杀死我”,然后接着说“我在会后抽了几口烟”,", "score": 88.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dvGkKzmIaN.seg_511.wav", "doc_id": "dvGkKzmIaN.seg_511", "src_text": "The watermark method need to meet the following properties.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "水印方法必须符合以下属性:", "score": 94.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dJGfOSFgZO.seg_316.wav", "doc_id": "dJGfOSFgZO.seg_316", "src_text": "One approach is to simply ask human judges to evaluate several dimensions of dialogue quality, such 
as the relevance of model responses using existing comparative or Likert scale methods.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "一种方法是简单地要求人类评判者评估对话质量的几种维度,例如使用现有的比较或利克特等级方法评估模型响应的相关性;", "score": 90.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_807.wav", "doc_id": "WTTtiRKFZI.seg_807", "src_text": "Ok.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "好,所以", "score": 93.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/wLqFAuDnKa.seg_143.wav", "doc_id": "wLqFAuDnKa.seg_143", "src_text": "It's the examples that carry most of the weight.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "它是最重的例子", "score": 77.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/XejEJmgUmE.seg_728.wav", "doc_id": "XejEJmgUmE.seg_728", "src_text": "Hi, everyone.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "大家好,", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/PIZEXUFLAR.seg_274.wav", "doc_id": "PIZEXUFLAR.seg_274", "src_text": "For investigating multi-modal instruction tuning on our proposed dataset, we take OFA, a unified multi-modal pre-trained model, as our base model.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "为了研究我们拟议的数据集中的多模态指令调谐,我们采用OFA作为我们的基准模型;", "score": 80.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/FLkGnzVRew.seg_646.wav", "doc_id": "FLkGnzVRew.seg_646", "src_text": "We used dissonance-first approach, as seen in the flow chart here.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们使用了不协调的第一种方法,如图所示。", 
"score": 75.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_790.wav", "doc_id": "WTTtiRKFZI.seg_790", "src_text": "Right?", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "天我读", "score": 0.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_219.wav", "doc_id": "oYCKgTzTDy.seg_219", "src_text": "As shown in this figure, we need to translate the query in multiple natural languages using neural models to SQL, Lambda or FunQL, and etcetera.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "如图所示,我们需要将问题翻译成多种自然语言,使用神经模型:2、CQA、LAMBDA或FQL等。", "score": 85.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/GvEBWkLmuI.seg_853.wav", "doc_id": "GvEBWkLmuI.seg_853", "src_text": "So for instance, for the personas of black women, we would do Fightin’ Words and compare the log-odds ratios against both white personas and man personas because those are the two corresponding unmarked groups.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "因此,为了黑人女性的身份,我们将使用战斗的词汇,并比较白人和男性对黑人女性的比例,因为这两组是相互对立的未标记组。", "score": 93.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_197.wav", "doc_id": "SLpqvupgvW.seg_197", "src_text": "For songs, we simply show a Google search link to each song and then ask the annotators to listen to at least some of each song, and read about each song.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "对于一些,我们只是显示每首歌的谷歌搜索链接,然后要求注释者至少听一下其中一些。每首歌和每首歌的阅读。", "score": 71.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_780.wav", "doc_id": "WTTtiRKFZI.seg_780", "src_text": "The conjunction headed approach assumed in Prague dependency treebanks, where coordinate 
structures are headed by the conjunction.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "普拉格方法,联合在普拉格依赖三角洲的联合过程中,协调结构由联合过程控制。", "score": 83.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/XejEJmgUmE.seg_751.wav", "doc_id": "XejEJmgUmE.seg_751", "src_text": "Finally, we can choose sentences from a completely unrelated domain such as Wikipedia.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "最后,我们可以从完全无关的域,如维基百科中选择句子。因此,", "score": 90.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dvGkKzmIaN.seg_535.wav", "doc_id": "dvGkKzmIaN.seg_535", "src_text": "We compute the similarity difference between benign and backdoor data set which is defined as delta cosine and delta L2.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们计算了benign和backdoor数据集之间的相似度差异。它被定义为deltacos和deltaL2。", "score": 90.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/wLqFAuDnKa.seg_130.wav", "doc_id": "wLqFAuDnKa.seg_130", "src_text": "And we compared to state-of-the-art systems, so the best performing system, so the WMT evaluation.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们比较了两个最先进的系统,即在WMT评估中表现最佳的系统。", "score": 31.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_788.wav", "doc_id": "WTTtiRKFZI.seg_788", "src_text": "So in English, as you might know, direct objects prefer to be close to the verb, while adjuncts may be further away.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "于这个原则的。所以在英语中,直接对象喜欢被闭在词中,而不是被远离", "score": 54.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/GvEBWkLmuI.seg_871.wav", "doc_id": 
"GvEBWkLmuI.seg_871", "src_text": "So rather than actually working towards changing those obstacles, it puts pressure on those people to overcome them, which leads to a very negative health outcomes for these people, among other harms.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": ",而不是实际上努力改变这些障碍。这导致这些人在其他方面的健康问题加剧。", "score": 40.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/GvEBWkLmuI.seg_859.wav", "doc_id": "GvEBWkLmuI.seg_859", "src_text": "And in fact, this lexicon doesn't really capture many of the harmful patterns that we saw in the earlier slides well at all.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "事实上,这个词典并没有真正捕捉到我们早期版本中看到的许多有害模式,", "score": 61.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/GvEBWkLmuI.seg_876.wav", "doc_id": "GvEBWkLmuI.seg_876", "src_text": "And finally, there should really be increased transparency about bias mitigation methods, because for instance, like these positive stereotypes, we don't know if it's because there is some sort of weird overly-excessive value alignment going on, or maybe some other anti-stereotyping methods that are resulting in these pernicious patterns.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "最后,关于偏差减法方法的透明度应该有所提高。因为,例如,这些积极的成见我们不知道是因为什么,因为有某种奇怪的...过度的价值定位,或者可能是其他导致这些可怕模式的", "score": 71.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/aQpIWggfCo.seg_23.wav", "doc_id": "aQpIWggfCo.seg_23", "src_text": "Then, InstructGPT over-generates K scripts for specific goals.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "然后,指令gpt生成用于特定目标的脚本。", "score": 51.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/PIZEXUFLAR.seg_282.wav", "doc_id": 
"PIZEXUFLAR.seg_282", "src_text": "We use all the instances in the test split for each task.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们在测试分为每个任务使用所有实例;", "score": 69.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/aQpIWggfCo.seg_30.wav", "doc_id": "aQpIWggfCo.seg_30", "src_text": "Since large language models are costly to deploy, it's essential to enable language planning ability of smaller and specialized models.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "由于大型语言模型的部署成本高昂,因此有必要能够让语言规划者能够使用一些较小和专门化的模型。", "score": 79.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dvGkKzmIaN.seg_507.wav", "doc_id": "dvGkKzmIaN.seg_507", "src_text": "For example, OpenAI offers a GPT based embedding API.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "例如,OpenNLP提供了一个基于GPD的嵌入API。", "score": 51.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/XejEJmgUmE.seg_728.wav", "doc_id": "XejEJmgUmE.seg_728", "src_text": "Hi, everyone.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "大家好,", "score": 95.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/GvEBWkLmuI.seg_874.wav", "doc_id": "GvEBWkLmuI.seg_874", "src_text": "First, we should, as researchers, be addressing positive stereotypes and essentializing narratives.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "首先,我们作为研究者应该对正面刻板印象和本质化叙述进行分析;", "score": 93.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dvGkKzmIaN.seg_500.wav", "doc_id": "dvGkKzmIaN.seg_500", "src_text": "Hello everyone, my name is Jingwei Yi from the University of Science and Technology of China.", 
"src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "大家好,我是中国科学技术大学的金伟易。", "score": 63.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/uZBWfYjYnf.seg_113.wav", "doc_id": "uZBWfYjYnf.seg_113", "src_text": "But also we want that they are shifted on the left.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "但我们还希望他们被移至左侧", "score": 90.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SUkmfOTvGi.seg_496.wav", "doc_id": "SUkmfOTvGi.seg_496", "src_text": "And we found that the answer is actually a resounding yes.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们发现答案实际上是响亮的“是”。", "score": 90.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WBLMIsdIrq.seg_427.wav", "doc_id": "WBLMIsdIrq.seg_427", "src_text": "We also compared different commercial systems and our benchmark shows that DeepL is usually more accurate than Google Translate for document-level translation.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们还比较了不同的商业系统,并且我们的基准测试表明,DeepLL通常比GoogleTranslate更准确地进行文档本地化翻译。为了", "score": 60.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/FLkGnzVRew.seg_658.wav", "doc_id": "FLkGnzVRew.seg_658", "src_text": "Next, we determine the best method to update a model with new data from each round of active learning and annotations.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "接下来,我们将确定使用每轮活动学习和注释中的新数据更新模型的最佳方法。", "score": 84.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_200.wav", "doc_id": "SLpqvupgvW.seg_200", "src_text": "For recipes, we additionally show their images, again from Wikipedia, so that the annotators 
know how they look like.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "对于食谱,我们还展示它们的图像,来自维基百科,以便注释者知道它们看起来是什么样子的。", "score": 80.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/gGbuDbHhyc.seg_350.wav", "doc_id": "gGbuDbHhyc.seg_350", "src_text": "In recent works in WSL, so WSL stands for Weakly Supervised Learning, a common claim is that people say that they only train models on the weakly labeled data and achieve high performance on clean test sets.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "在最近的WSL研究中,WSL的意思是“每周监督学习”。人们普遍认为,只在每周级别的数据上训练模型,并在干净的测试集上获得高性能。", "score": 80.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/gGbuDbHhyc.seg_374.wav", "doc_id": "gGbuDbHhyc.seg_374", "src_text": "Our concrete recommendations for future work are as follows.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们的具体建议是:未来工作的具体建议如下。", "score": 53.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/FLkGnzVRew.seg_645.wav", "doc_id": "FLkGnzVRew.seg_645", "src_text": "To the goal of creating a cognitive dissonance resource, we conducted a large scale annotation of dissonance relations.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "为了创建认知的离散资源,我们进行了大量的离散关系的注入;", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dvGkKzmIaN.seg_523.wav", "doc_id": "dvGkKzmIaN.seg_523", "src_text": "The trigger set is a group of words in a moderate frequency interval.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "触发器集是位于一个中频间隔中的单词组。", "score": 95.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_241.wav", "doc_id": 
"oYCKgTzTDy.seg_241", "src_text": "And we also find many interesting results.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们还发现了许多有趣的结果。", "score": 97.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/PIZEXUFLAR.seg_307.wav", "doc_id": "PIZEXUFLAR.seg_307", "src_text": "Thank you.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "谢谢。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dvGkKzmIaN.seg_503.wav", "doc_id": "dvGkKzmIaN.seg_503", "src_text": "Protecting the copyright of large language models for embedding as services via backdoor watermark.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "保护大语言模型的版权,用于嵌入和服务的背后水印。首", "score": 75.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_798.wav", "doc_id": "WTTtiRKFZI.seg_798", "src_text": "But it's also OK to say, \"Marge read yesterday this absolutely fascinating book about bees.\"", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "是,这是可能的,", "score": 0.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_701.wav", "doc_id": "oaOHnMCwad.seg_701", "src_text": "We host 2 tasks on lab in the wild, one of them being social acceptability, and the way this works is that participants will read a situation from the social chemistry dataset and, then they'll write how socially acceptable a situation is.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们在世界上有两项测试,一个是社会适应性,通过这种方式,参与者将从社会化学数据中读出一种情况,接着他们将正确地了解这种社会适应性。", "score": 92.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oeooqChmKK.seg_607.wav", "doc_id": "oeooqChmKK.seg_607", 
"src_text": "First, entity-specific knowledge such as \"Servin is a judge.\"", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "首先,实体特定知识,例如“仆人”是法官。第", "score": 80.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/XejEJmgUmE.seg_761.wav", "doc_id": "XejEJmgUmE.seg_761", "src_text": "Now this and this is very large like this effect, increases throughout the context length and this would probably affect like newer language models which has large context window.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "在,这是一个非常大的效果,例如,这种效果贯穿整个语境,并且这可能会影响具有较大语境窗口的新语言模型。", "score": 67.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/aQpIWggfCo.seg_24.wav", "doc_id": "aQpIWggfCo.seg_24", "src_text": "Next, a filter model is developed to select the faithful scripts.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "接下来,开发了一个过滤器模型来选择可视化脚本。", "score": 56.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_777.wav", "doc_id": "WTTtiRKFZI.seg_777", "src_text": "Right.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "好,很有趣。", "score": 0.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SUkmfOTvGi.seg_471.wav", "doc_id": "SUkmfOTvGi.seg_471", "src_text": "To investigate these problems, we developed the CoNLL++ Dataset.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "为了调查这些问题,我们开发了卡诺尔+数据集,", "score": 90.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/TVCREhgqUP.seg_52.wav", "doc_id": "TVCREhgqUP.seg_52", "src_text": "As usual, we have a training set of utterances.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", 
"domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "行:我们通常有一个训练语", "score": 50.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/XejEJmgUmE.seg_746.wav", "doc_id": "XejEJmgUmE.seg_746", "src_text": "So we can do the same thing by choosing unacceptable sentences from the same matching, and that could also be used to test the models acceptability.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们可以通过从相同的匹配中选择不可接受的句子来做同样的事情,这也可以用于测试模型的可接受性。", "score": 96.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dJGfOSFgZO.seg_312.wav", "doc_id": "dJGfOSFgZO.seg_312", "src_text": "So let's say that you just developed a dialogue model and you want to see how well it compares against the current state-of-the-art.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": ",你刚刚开发了一个对话模型,你想看看它与当前艺术状态的比较。", "score": 60.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/aQpIWggfCo.seg_44.wav", "doc_id": "aQpIWggfCo.seg_44", "src_text": "We hope the CoScript dataset can be a valuable resource to advance research on language planning.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们希望CoScript数据集可以成为推进语言规划研究的宝贵资源。", "score": 94.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/GvEBWkLmuI.seg_867.wav", "doc_id": "GvEBWkLmuI.seg_867", "src_text": "For Asian women, the words are things like \"petite\" and \"delicate\" and \"silky\" which connects to a long history of Asian women being hyper-sexualized, seen as very docile and submissive, and so on.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": ",例如“小巧”和“精致”和“柔和”。这与亚洲女性长期以来被性化的历史有关,她们被认为是非常淫乱和顺从的。", "score": 46.0} {"audio_path": 
"data/iwslt25/IWSLT25INSTRUCT/segmented/TVCREhgqUP.seg_53.wav", "doc_id": "TVCREhgqUP.seg_53", "src_text": "In this case, \"The girl slept.\"", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "句集,在这种情况下,女孩睡着了,", "score": 87.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/PIZEXUFLAR.seg_291.wav", "doc_id": "PIZEXUFLAR.seg_291", "src_text": "We also introduce an additional evaluation metric called sensitivity.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们还引入了一个额外的评估指标,称为敏感度。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_228.wav", "doc_id": "oYCKgTzTDy.seg_228", "src_text": "And to better evaluate our benchmark, we consider the six settings for training and evaluation.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "为了更好地评估我们的基准,我们考虑了六种训练和评估设置。", "score": 96.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_790.wav", "doc_id": "WTTtiRKFZI.seg_790", "src_text": "Right?", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "事实上", "score": 0.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_689.wav", "doc_id": "oaOHnMCwad.seg_689", "src_text": "So prior work has suggested some anecdotal evidence of having positionality, such as cultural gaps and models and data sets, as well as theoretical definitions of model positionality.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "因此,首席工程师建议一些关于拥有位置性的证据,例如文化差距和模型和数据集,以及关于模型位置性的理论定义。", "score": 80.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oeooqChmKK.seg_619.wav", "doc_id": "oeooqChmKK.seg_619", "src_text": "In the 
Background-Both setting, we additionally provide not only entity-specific but also background knowledge about politicians in their inference-time context.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "在两种背景中,我们不仅提供了非特异性的背景知识,而且还提供了关于在不同政治派系背景下的政治家的背景知识。", "score": 84.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/hgIDlKNiFM.seg_431.wav", "doc_id": "hgIDlKNiFM.seg_431", "src_text": "Hi, I am Yanis Labrak and I will present you our works on \"DrBERT: A Robust Pre-trained Model in French for Biomedical and Clinical Domains.\"", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "喂,我是Yanislavac,我将向您介绍我们关于Bert的作品:一个强大的英国模范在法语中用于生物医学和临床领域。", "score": 28.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/GvEBWkLmuI.seg_865.wav", "doc_id": "GvEBWkLmuI.seg_865", "src_text": "Furthermore, there's a lot of common tropes that are reflected in these words, especially for women of color.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "此外,许多常见的动作都反映在这些词中,尤其是对女性的颜色,", "score": 68.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/TVCREhgqUP.seg_48.wav", "doc_id": "TVCREhgqUP.seg_48", "src_text": "My name is Matthias Lindemann, and today I'm going to give you a brief introduction to our paper on \"Compositional Generalization without Trees using Multiset Tagging and Latent Permutations\".", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": ",mynameisMathiasLindemann,andtodayIamgoingtogiveyouabriefintroductiontoourpaperoncompositionalgeneralizationwithouttreesusingmulti-celltaggingandlatentpermutations.", "score": 0.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_199.wav", "doc_id": "SLpqvupgvW.seg_199", "src_text": 
"For the recipes and books domain, we show some background text from Wikipedia.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "对于食谱和书籍领域,我们展示一些来自维基百科的背景文本。", "score": 93.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dvGkKzmIaN.seg_513.wav", "doc_id": "dvGkKzmIaN.seg_513", "src_text": "Second, the watermark should not degrade the utility of the provided embeddings.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "第二,水印不应降低提供的嵌入的实用性;", "score": 99.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/FLkGnzVRew.seg_663.wav", "doc_id": "FLkGnzVRew.seg_663", "src_text": "We find that the proposed PRC strategy works better than other state-of-the-art strategies, although the difference is small.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "答:好的。我们发现该提出的PRC策略比其他当前最好的策略更好,尽", "score": 55.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/PIZEXUFLAR.seg_283.wav", "doc_id": "PIZEXUFLAR.seg_283", "src_text": "In addition, we randomly sample 20 tasks from the test split of natural instructions as an unseen task for NLP.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "随机从自然指令测试集中抽取100个任务作为未见任务(UnseenTasksforNLP)。因此,", "score": 42.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WBLMIsdIrq.seg_385.wav", "doc_id": "WBLMIsdIrq.seg_385", "src_text": "This work was done in collaboration with Patrick Fernandes, Emmy Liu, André F. T. 
Martins, and Graham Neubig.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "这项工作是与帕特里克·法纳赫、梅·尤、安德烈·F·马丁斯和格雷姆·纽维克合作完成的。", "score": 83.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SUkmfOTvGi.seg_478.wav", "doc_id": "SUkmfOTvGi.seg_478", "src_text": "The first one is the model architecture.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "第一个是模型架构。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_249.wav", "doc_id": "oYCKgTzTDy.seg_249", "src_text": "We also compare the cross-language performance gap.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们还可以比较跨语言性能差距", "score": 87.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_679.wav", "doc_id": "oaOHnMCwad.seg_679", "src_text": "Where prospective API is able to detect correctly toxic instances.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "rspectiveAPI能够正确检测有害的实例。", "score": 62.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_242.wav", "doc_id": "oYCKgTzTDy.seg_242", "src_text": "So, regarding analysis of monolingual models, we evaluate on two groups of models including Encoder-PTR which stands for Multilingual Pretrained Encoders with Pointer-based Decoders, such as XLM-R + PTR and mBERT + PTR.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "关于单语言模型的分析,我们评估了两个模型组。包括编码器-解码器(Encoder-Decoder)模型,例如Transformer-XL+PDFR、BERT+PDFR等", "score": 50.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dvGkKzmIaN.seg_536.wav", "doc_id": "dvGkKzmIaN.seg_536", "src_text": "Meanwhile, we also apply KS test and use its 
p-value as the third metric.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "同时,我们还应用K-S测试,并使用其p值作为第三个矩阵。", "score": 62.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/uZBWfYjYnf.seg_121.wav", "doc_id": "uZBWfYjYnf.seg_121", "src_text": "Thanks for your attention.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "感谢您的关注。", "score": 95.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/TVCREhgqUP.seg_90.wav", "doc_id": "TVCREhgqUP.seg_90", "src_text": "We approximate this with a GPU-friendly continuous relaxation that also allows us to backpropagate through the solution and learn the linguistically more plausible permutations.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们用一个友好的GPU友好的连续放松来近似这个问题,这也允许我们通过解决方案进行反向传播,并学习更合理的语言.permutation", "score": 87.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/uZBWfYjYnf.seg_99.wav", "doc_id": "uZBWfYjYnf.seg_99", "src_text": "For example, training a model with an average of one second latency and another one with two seconds latency, and so on.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "例如训练一个平均延迟为一秒的模型,另一个平均延迟为两秒的模型等等。因此,", "score": 88.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/PIZEXUFLAR.seg_274.wav", "doc_id": "PIZEXUFLAR.seg_274", "src_text": "For investigating multi-modal instruction tuning on our proposed dataset, we take OFA, a unified multi-modal pre-trained model, as our base model.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "为了研究我们提出的数据集上的多模态指令调节,我们将OFA(统一的多模态模式)作为我们的基本模型。", "score": 85.0} {"audio_path": 
"data/iwslt25/IWSLT25INSTRUCT/segmented/aQpIWggfCo.seg_33.wav", "doc_id": "aQpIWggfCo.seg_33", "src_text": "Thus, we follow the idea of symbolic knowledge distillation, to distil constrained language planning datasets from large language models.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "因此,我们遵循符号知识蒸馏的想法,将受限语言规划数据集蒸馏到轻量级语言模型中,", "score": 69.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/gGbuDbHhyc.seg_376.wav", "doc_id": "gGbuDbHhyc.seg_376", "src_text": "For example, report if the model selection is done via clean validation samples.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "例如,报告模型选择是否使用清洁验证样本;", "score": 88.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/rISrKoXQCx.seg_585.wav", "doc_id": "rISrKoXQCx.seg_585", "src_text": "So it's kind of like the electric trolley problem.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "所以这就像电力电池问题。", "score": 30.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_771.wav", "doc_id": "WTTtiRKFZI.seg_771", "src_text": "Hi, my name is Adam Przepiórkowski and this talk is about the Dependency Structure of Coordination.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我叫亚当·施瓦尔科夫斯基,这次谈话是关于协调机制的依赖结构。", "score": 95.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/wLqFAuDnKa.seg_126.wav", "doc_id": "wLqFAuDnKa.seg_126", "src_text": "At the time of publication, it achieved state-of-the-art in hundreds of NLP tasks.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "Nadu的出版业在数百个NRP任务中达到了艺术的顶峰。", "score": 31.0} {"audio_path": 
"data/iwslt25/IWSLT25INSTRUCT/segmented/FLkGnzVRew.seg_642.wav", "doc_id": "FLkGnzVRew.seg_642", "src_text": "High cognitive dissonance is also related to anxiety disorders and can help understand people's mental health better.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "高认知不协调也与焦虑障碍有关,研究不协调表达的语言也可以更好地了解人们的心理健康。", "score": 64.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/gGbuDbHhyc.seg_356.wav", "doc_id": "gGbuDbHhyc.seg_356", "src_text": "Second, if clean data is required, or if clean data is mandatory for WSL to work, then how many clean samples do we need?", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "第二,如果需要清洁数据或清洁数据是WSSL工作所必需的,那么我们需要多少个清洁样本?", "score": 80.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/XejEJmgUmE.seg_750.wav", "doc_id": "XejEJmgUmE.seg_750", "src_text": "And we can do the same for unacceptability case.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "对不可接受的情况做同样的事情。", "score": 85.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/hgIDlKNiFM.seg_445.wav", "doc_id": "hgIDlKNiFM.seg_445", "src_text": "Is it 4 gigabytes, 8 gigabytes, or more?", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "是4GB、8GB或更多?", "score": 90.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/PIZEXUFLAR.seg_279.wav", "doc_id": "PIZEXUFLAR.seg_279", "src_text": "Ok, now I'm going to talk about multi-modal instruction tuning.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "好,现在我要谈谈多模态指令调优。因此,", "score": 78.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/rISrKoXQCx.seg_556.wav", "doc_id": "rISrKoXQCx.seg_556", "src_text": 
"So specifically, we first proposed to prompt language models with different prompt formats using the political questionnaires such as the political conference test.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "所以我们首先提出使用不同促进格式的语言模型,使用政治问题等政治问答测试(例如政治复", "score": 41.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/gGbuDbHhyc.seg_382.wav", "doc_id": "gGbuDbHhyc.seg_382", "src_text": "Thank you and enjoy the conference.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "谢谢您和我们一起参加会议。", "score": 46.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/aQpIWggfCo.seg_12.wav", "doc_id": "aQpIWggfCo.seg_12", "src_text": "As shown in the table, we extend the abstract goals with multi-faceted constraints for human-in-the-loop data acquisition using InstructGPT.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "如表所示,我们扩展抽象目标,通过多阶段约束,对于人类在LUG数据采集中使用instrgpt。", "score": 70.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/TVCREhgqUP.seg_51.wav", "doc_id": "TVCREhgqUP.seg_51", "src_text": "In the context of semantic parsing, testing for compositional generalization might look like this.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "在语义分析的背景下,测试组合的普遍化可能会像这样进", "score": 61.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dvGkKzmIaN.seg_510.wav", "doc_id": "dvGkKzmIaN.seg_510", "src_text": "To protect the copyright of embedding as services, one of the solutions is to embed a watermark in the provider service and detect whether another service contain the watermark.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": 
"保护嵌入式服务的版权的一种方法是在提供服务中嵌入水印,并检测另一个服务是否包含水印。", "score": 89.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/wLqFAuDnKa.seg_133.wav", "doc_id": "wLqFAuDnKa.seg_133", "src_text": "The prompting has a big influence on the performance of the LLMs for translation, as we can see in a simple experiment, where we used one-shot prompting and provided two different prompts for each sentence.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "ting)对于翻译任务的性能有很大的影响。如我们在一个简单的实验中所见,我们使用了一次提示,并为一句话提供了两个不同的提示:", "score": 58.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/gGbuDbHhyc.seg_375.wav", "doc_id": "gGbuDbHhyc.seg_375", "src_text": "First, report the model selection criteria.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "首先,报告模型选择标准,", "score": 98.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dvGkKzmIaN.seg_505.wav", "doc_id": "dvGkKzmIaN.seg_505", "src_text": "Currently, large language models such as GPT, LLAMA, PALM are exceptional in natural language understanding and generation.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "目前,大型语言模型,如tpT,lama,palm在自然语言理解和生成方面是例外的。", "score": 51.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/aQpIWggfCo.seg_4.wav", "doc_id": "aQpIWggfCo.seg_4", "src_text": "And show that large language models can effectively decompose goals into steps.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "槛”),并表明大型语言模型可以有效地分解目标为步骤。", "score": 80.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/FLkGnzVRew.seg_668.wav", "doc_id": "FLkGnzVRew.seg_668", "src_text": "However, the annotators also find the examples difficult.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", 
"domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "分比,但注释者也发现了困难的例子。", "score": 53.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/GvEBWkLmuI.seg_842.wav", "doc_id": "GvEBWkLmuI.seg_842", "src_text": "To capture these patterns, our method has two parts.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "为了捕获这些模式,我们的方法有两部分:", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dvGkKzmIaN.seg_532.wav", "doc_id": "dvGkKzmIaN.seg_532", "src_text": "Back door data set contains sentences of which all words belong to the trigger set while all words in the sentences of benign data set do not belong to the trigger sets.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "后门数据集包含其所有单词都属于触发器集的句子,而所有单词在善意数据集的句子中都不是触发器集的成员。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/XejEJmgUmE.seg_753.wav", "doc_id": "XejEJmgUmE.seg_753", "src_text": "So how does the model do?", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "那么模型是如何做到的呢?", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_817.wav", "doc_id": "WTTtiRKFZI.seg_817", "src_text": "Here we have coordination of two verbs and there's no outsides, external governor.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们有两个词中心现在没有外部总督,所以", "score": 12.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_255.wav", "doc_id": "oYCKgTzTDy.seg_255", "src_text": "For example, Encoder-Decoder outperforms previous work or achieves comparable results.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": 
"例如,编码器-解码器可以超越进度工作或获得可比结果,", "score": 64.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_159.wav", "doc_id": "SLpqvupgvW.seg_159", "src_text": "Hi!", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "嗨,", "score": 88.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/rISrKoXQCx.seg_572.wav", "doc_id": "rISrKoXQCx.seg_572", "src_text": "For example, for hate speech detection, left-leaning language models are better at detecting hate speech targeting socially minority groups, however are worse at detecting hate speech targeting more powerful groups in our society.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "例如,针对仇语检测,左倾语言模型更好。在检测针对社会少数群体的仇恨言论然而,我们最好是检测仇恨言论,针对我们社会中更有权势的群体。相", "score": 76.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WBLMIsdIrq.seg_418.wav", "doc_id": "WBLMIsdIrq.seg_418", "src_text": "We then use the MuDA tagger, by applying the tagger on a parallel corpus that we want to use for evaluation and we apply our translation metrics of choice on the context-dependent examples that the MuDA tagger has identified.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "然后我们使用Muda标签器,通过将标签应用于我们想用于评估的平行语料来使用标签器。然后我们将我们的翻译指标应用于模态标记器识别的上下文相关示例。", "score": 63.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/rISrKoXQCx.seg_550.wav", "doc_id": "rISrKoXQCx.seg_550", "src_text": "According to a survey of the C4 Corpus, we can see that New York Times, Los Angeles Times, The Guardian, Huffington Post, etcetera are well covered in language model training data.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "的调查,我们可以看到纽约时报、洛杉矶时报、加德因报、哈芬顿邮报等政治新闻媒体在其预备训练数据中都被涵盖。", "score": 64.0} {"audio_path": 
"data/iwslt25/IWSLT25INSTRUCT/segmented/TVCREhgqUP.seg_91.wav", "doc_id": "TVCREhgqUP.seg_91", "src_text": "If you want to learn more about our experiments and how we address these challenges, please have a look at our paper or come to our poster.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "如果你想了解更多关于我们的实验和我们如何解决这些挑战的信息,请看看我们的论文或来看看我们的海报。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WBLMIsdIrq.seg_397.wav", "doc_id": "WBLMIsdIrq.seg_397", "src_text": "To answer the first question, we started by measuring how much a word depends on context during translation.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "为了回答第一个问题,我们从测量一个单词在翻译过程中依赖于上下文的程度开始。", "score": 97.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/gGbuDbHhyc.seg_373.wav", "doc_id": "gGbuDbHhyc.seg_373", "src_text": "Their performance gain and practicality are heavily overestimated.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "它们的性能增益和实用性都被严重低估了。", "score": 28.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SUkmfOTvGi.seg_486.wav", "doc_id": "SUkmfOTvGi.seg_486", "src_text": "The second hypothesis is temporal drift which is the performance degradation that is caused by the increasing temporal gap between the train and the test data.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "第二种假设是时间漂移,这是由于训练数据和测试数据之间的时间差异增加导致的性能退化。", "score": 94.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/rISrKoXQCx.seg_560.wav", "doc_id": "rISrKoXQCx.seg_560", "src_text": "We can also see that GPT-4 is the most liberal language model of them all, and GPT series are generally more socially liberal than BART series and its variants.", 
"src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们也可以看到,GPT4是我们中最自由的语言模型,GPT理论通常比Bert理论及其变体更具社会自由。", "score": 55.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_781.wav", "doc_id": "WTTtiRKFZI.seg_781", "src_text": "So, we get some dependencies from end to all the conjuncts.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读", "score": 0.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/aQpIWggfCo.seg_3.wav", "doc_id": "aQpIWggfCo.seg_3", "src_text": "Previous work has exploited language models to plan for abstract goals of stereotypical activities such as \"make a cake\".", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "以前的研究利用语言模型来规划抽象目标的刻板活动,如“做一个蛋糕”", "score": 85.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/aQpIWggfCo.seg_8.wav", "doc_id": "aQpIWggfCo.seg_8", "src_text": "An abstract goal can be inherited by different real-life specific goals with multi-faceted constraints.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "一个抽象的目标可以通过多个具有多个具体约束的现实生活中的具体目标来继承。", "score": 87.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dvGkKzmIaN.seg_504.wav", "doc_id": "dvGkKzmIaN.seg_504", "src_text": "Let's first introduce the background about embedding as services.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "先介绍嵌入式服务的背景。", "score": 94.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dvGkKzmIaN.seg_543.wav", "doc_id": "dvGkKzmIaN.seg_543", "src_text": "That's all.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", 
"tgt_text": "不客气,谢谢你!我", "score": 0.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_259.wav", "doc_id": "oYCKgTzTDy.seg_259", "src_text": "And our results show many interesting findings.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "结果显示了许多有趣的发现等", "score": 95.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/rISrKoXQCx.seg_567.wav", "doc_id": "rISrKoXQCx.seg_567", "src_text": "We separately pretrain language models on the two different temporal corpora.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们分别在两个不同的暂时军团上使用预备军语言模型。我们", "score": 56.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SUkmfOTvGi.seg_478.wav", "doc_id": "SUkmfOTvGi.seg_478", "src_text": "The first one is the model architecture.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "第一个是模型架构。", "score": 94.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/aQpIWggfCo.seg_37.wav", "doc_id": "aQpIWggfCo.seg_37", "src_text": "This figure shows the constraint distribution of CoScript.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "这个图表显示了coscript的约束分布,", "score": 91.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_710.wav", "doc_id": "oaOHnMCwad.seg_710", "src_text": "So for the GPT 4 social acceptability analysis, we find that it's most aligned to confucian and English speaking countries.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "G4的社会可接受性分析中,我们发现它最匹配的是孔教和英语国家。", "score": 93.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/uZBWfYjYnf.seg_105.wav", "doc_id": "uZBWfYjYnf.seg_105", "src_text": "Our solution is to 
propose EDAtt, or Encoder-Decoder Attention, and it is a strategy for which we decide whether to emit or not a partial translation, based on where attention points to.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们的解决方案是提议“适应”或“编码”注意力,并且这是一个策略,我们决定是否要发出或不发出部分翻译,基于注意力点。", "score": 63.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SUkmfOTvGi.seg_464.wav", "doc_id": "SUkmfOTvGi.seg_464", "src_text": "Today I'm going to present our paper Do CoNLL-2003 named entity taggers still work well in 2023?", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "今天我要向大家介绍我们的论文。Cornell2003年命名的实体标签器是否在2023年仍然有效?", "score": 69.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_715.wav", "doc_id": "oaOHnMCwad.seg_715", "src_text": "An example of this is that datasets and models are less aligned to non binary people compared to the men and women counterparts.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "例如,数据集模型在非二元人群(如非男性和非女性人群)方面的表现与男性和女性人群的表现相比不尽相同。", "score": 63.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/aQpIWggfCo.seg_28.wav", "doc_id": "aQpIWggfCo.seg_28", "src_text": "With our method, InstructGPT can generate scripts of higher quality.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "通过我们的方法,敏感性可以产生高质量的痕迹;", "score": 50.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_698.wav", "doc_id": "oaOHnMCwad.seg_698", "src_text": "Our frame is largely enabled through Lab in the Wild and online crowdsourcing platform for where HCI collaborator.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": 
"我们的框架主要是通过LabintheWild,一个在线众包平台,一个前HCI合作伙伴,来启用。", "score": 58.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SUkmfOTvGi.seg_465.wav", "doc_id": "SUkmfOTvGi.seg_465", "src_text": "Let's get started.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "让我们开始吧。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/uZBWfYjYnf.seg_116.wav", "doc_id": "uZBWfYjYnf.seg_116", "src_text": "These are all the results of the simultaneous speech translation strategy on German.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "这些都是基于德语的同时语音翻译策略的结果。", "score": 96.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/PIZEXUFLAR.seg_286.wav", "doc_id": "PIZEXUFLAR.seg_286", "src_text": "Each instance is randomly combined with one of its five instruction templates.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "每个实例随机与其五个指令模板之一结合。", "score": 98.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WBLMIsdIrq.seg_418.wav", "doc_id": "WBLMIsdIrq.seg_418", "src_text": "We then use the MuDA tagger, by applying the tagger on a parallel corpus that we want to use for evaluation and we apply our translation metrics of choice on the context-dependent examples that the MuDA tagger has identified.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "然后我们使用mudatagger应用标签在我们要用于评估的平行语料库上,并在mudatager识别的语境相关示例上应用我们的选择转换矩阵。", "score": 43.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/XejEJmgUmE.seg_750.wav", "doc_id": "XejEJmgUmE.seg_750", "src_text": "And we can do the same for unacceptability case.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": 
"而我们可以对不接受性情况做同样的事情。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_249.wav", "doc_id": "oYCKgTzTDy.seg_249", "src_text": "We also compare the cross-language performance gap.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们还比较了跨语言性能差距。", "score": 80.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/wLqFAuDnKa.seg_153.wav", "doc_id": "wLqFAuDnKa.seg_153", "src_text": "So, in particular, the most common errors are omission errors.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "特别是,最常见的错误是遗漏错误。", "score": 94.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/aQpIWggfCo.seg_25.wav", "doc_id": "aQpIWggfCo.seg_25", "src_text": "We convert scripts and goals into InstructGPT embeddings and calculate the cosine similarity as similarity scores to measure semantic similarity.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们将脚本和符号转换为引入的GPT嵌入,并计算符号相似性和相似性分数,以衡量语义相似性。", "score": 52.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/wLqFAuDnKa.seg_125.wav", "doc_id": "wLqFAuDnKa.seg_125", "src_text": "It's trained on a large collection of text, comprising 780 billion tokens.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "它在包含7.8亿个令牌的大型文本集合上进行了训练。在其", "score": 60.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/FLkGnzVRew.seg_643.wav", "doc_id": "FLkGnzVRew.seg_643", "src_text": "Studying dissonance expressed in language can also be beneficial in understanding extremism and polarization of vulnerable groups.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "研究不协调表达的语言还可以有助于了解极端主义和脆弱群体的极化。", "score": 
63.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/PIZEXUFLAR.seg_268.wav", "doc_id": "PIZEXUFLAR.seg_268", "src_text": "Additionally, at the time of our research, we discovered a considerable discrepancy in the availability of instructional datasets between NLP and multi-modal.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "此外,在我们研究的时期,我们发现了LP和多模态之间可用指令数据集的可用性存在相当大的差异:", "score": 82.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dvGkKzmIaN.seg_525.wav", "doc_id": "dvGkKzmIaN.seg_525", "src_text": "In watermark injection, we first define a target embedding.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "在水印注入中,我们首先定义一个目标嵌入。", "score": 91.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/PIZEXUFLAR.seg_289.wav", "doc_id": "PIZEXUFLAR.seg_289", "src_text": "If the task is a multi-model classification task, we report accuracy.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "如果任务是多模态分类任务,我们报告准确率;", "score": 90.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_781.wav", "doc_id": "WTTtiRKFZI.seg_781", "src_text": "So, we get some dependencies from end to all the conjuncts.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "所以,我们从合同的末尾获得了一些依赖关系。", "score": 53.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dvGkKzmIaN.seg_527.wav", "doc_id": "dvGkKzmIaN.seg_527", "src_text": "The provided embedding is a weight summation of the target embedding and the original embedding.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "提供的嵌入是目标嵌入和原始嵌入的加权和。", "score": 97.0} {"audio_path": 
"data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_243.wav", "doc_id": "oYCKgTzTDy.seg_243", "src_text": "And, we also evaluate Encoder-Decoder models, which is Multilingual Pretrained Encoder-Decoder Models, such as mBART and mT5.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": ",以及编码器-解码器模型,例如MBART和MT5等。", "score": 50.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/uZBWfYjYnf.seg_100.wav", "doc_id": "uZBWfYjYnf.seg_100", "src_text": "So what is our solution?", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "那么,我们的解决方案是什么?", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oeooqChmKK.seg_614.wav", "doc_id": "oeooqChmKK.seg_614", "src_text": "Lastly, the \"Background-Inference\" setting, where both knowledge types are available only at inference time.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "最后,背景前景设置。上述知识类型仅在推理时间可用。", "score": 58.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/XejEJmgUmE.seg_731.wav", "doc_id": "XejEJmgUmE.seg_731", "src_text": "This is a joint work with John Gauthier, Aaron Mueller, Kanishka Misra, Karen Fences, Roger Levy, and Adina Williams.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "这是与JohnGaultier、AaronMueller、KanishkaMisha、KarenFuentes、RogerLevey和AdinaWilliams共同工作的。因此,", "score": 56.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/PIZEXUFLAR.seg_286.wav", "doc_id": "PIZEXUFLAR.seg_286", "src_text": "Each instance is randomly combined with one of its five instruction templates.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "每个实例随机组合其中五个指令模板。因此,在测试", "score": 60.0} 
{"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WBLMIsdIrq.seg_427.wav", "doc_id": "WBLMIsdIrq.seg_427", "src_text": "We also compared different commercial systems and our benchmark shows that DeepL is usually more accurate than Google Translate for document-level translation.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们还比较了不同的商业系统,我们的基准表明,迪贝尔通常比谷歌传输文档本地传输更准确。", "score": 56.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oeooqChmKK.seg_630.wav", "doc_id": "oeooqChmKK.seg_630", "src_text": "If you're interested in more details, please see our paper and check out the data set and code on GitHub.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "如您有兴趣了解更多细节,请看我们的论文,并在GitHub上查看数据集和代码。", "score": 98.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/rISrKoXQCx.seg_547.wav", "doc_id": "rISrKoXQCx.seg_547", "src_text": "Today I'm presenting our work \"From Pretraining Data to Language Models to Downstream Tasks: Tracking the Trails of Political Biases Leading to Unfair NLP Models\".", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "天我要介绍我们的工作,从预训练数据到语言模型到下游任务,追踪导致不公平和不准确的NLP模型的政治偏见的踪迹。", "score": 61.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/XejEJmgUmE.seg_730.wav", "doc_id": "XejEJmgUmE.seg_730", "src_text": "Language model acceptability judgments are not always robust to context.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "语言模型可接受度判断在语境中并不总是坚定的。", "score": 99.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SUkmfOTvGi.seg_476.wav", "doc_id": "SUkmfOTvGi.seg_476", "src_text": "So what is needed for a good generalization?", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", 
"domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "所以,什么是好的泛化?", "score": 50.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/rISrKoXQCx.seg_578.wav", "doc_id": "rISrKoXQCx.seg_578", "src_text": "So this has sound the alarm for us to acknowledge and tackle the fairness issues resulting by language model political leanings.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯
狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地
疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂
地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地疯狂地�我们研究了语言模型的政治倾向如何导致了不平等问题。", "score": 0.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_232.wav", "doc_id": "oYCKgTzTDy.seg_232", "src_text": "And we'll also test Monolingual Model.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们还将测试单语言模型,", "score": 98.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_795.wav", "doc_id": "WTTtiRKFZI.seg_795", "src_text": "So both these sentences are fine.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "两个句子都很好,所以", "score": 98.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_700.wav", "doc_id": "oaOHnMCwad.seg_700", "src_text": "Compared to the platforms like M Turk which largely have participants from the US or India and further Lab in the Wild still is able to get high quality data.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "类似于来自美国和印度的EmTurk的平台,其他在世界上的实验室仍然能够获得高质量的数据。", "score": 35.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/TVCREhgqUP.seg_68.wav", "doc_id": "TVCREhgqUP.seg_68", "src_text": "Our approach predicts the output from the input in two steps.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们的方法预测输入信号的输出信号在两个步骤中", "score": 75.0} {"audio_path": 
"data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_205.wav", "doc_id": "SLpqvupgvW.seg_205", "src_text": "The AltEntities Corpus has 6,000 alternative questions across three domains, and it has 42,000 indirect referring expressions.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": ";身份识别库包含6,000个跨域的替代问题,并包含42,000个间接引用表达式;", "score": 78.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/wLqFAuDnKa.seg_148.wav", "doc_id": "wLqFAuDnKa.seg_148", "src_text": "And their results so a better performance when using the dev data.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "使用这些数据时的性能更好。", "score": 61.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/FLkGnzVRew.seg_672.wav", "doc_id": "FLkGnzVRew.seg_672", "src_text": "Feel free to get in touch with us if you have any questions.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "如果您有任何问题,请随时与我们联系。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_163.wav", "doc_id": "SLpqvupgvW.seg_163", "src_text": "Consider this alternative question.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "考虑这个替代问题:", "score": 98.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_179.wav", "doc_id": "SLpqvupgvW.seg_179", "src_text": "In the second speech bubble, Alice says, \"Do you mean 'Easy on Me' or 'I Gotta Feeling'?\"", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "在第二个发言泡泡中,爱丽丝说:“你是指我容易对付吗,还是我有感觉?”", "score": 74.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/wLqFAuDnKa.seg_127.wav", "doc_id": "wLqFAuDnKa.seg_127", "src_text": "In this work, we 
present the first systematic study of large language model prompting for machine translation.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "在本研究中,我们首次系统地研究了大型语言模型促进机器翻译。我们评估了", "score": 80.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/PIZEXUFLAR.seg_297.wav", "doc_id": "PIZEXUFLAR.seg_297", "src_text": "So we also did one experiment.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们还进行了一次实验:", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/TVCREhgqUP.seg_54.wav", "doc_id": "TVCREhgqUP.seg_54", "src_text": "And \"Mary knew that the girl slept.\"", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "玛丽知道女孩睡着了。", "score": 92.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/XejEJmgUmE.seg_758.wav", "doc_id": "XejEJmgUmE.seg_758", "src_text": "So here we are choosing or creating sentences from acceptable and unacceptable domains from the same BLiMP or SyntaxGym dataset.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "所以,我们从可接受的和不可接受的域中选择或创建句子,", "score": 58.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WBLMIsdIrq.seg_383.wav", "doc_id": "WBLMIsdIrq.seg_383", "src_text": "Hello, my name is Kayo Yin and I will be presenting our work titled \"When Does Translation Require Context?", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我是KyaoYen,今天我将会介绍我们的题目:《翻译需要什么样的背景?", "score": 35.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_793.wav", "doc_id": "WTTtiRKFZI.seg_793", "src_text": "Because then it can be moved to the position after the adjunct.", "src_text_system": "human", "src_lang": "en", 
"tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": ",昨天我读了这本书,昨天我读了这本书,昨天我读了", "score": 0.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_694.wav", "doc_id": "oaOHnMCwad.seg_694", "src_text": "The first step is to re annotate data sets with diverse annotators.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "第一步是使用不同的注音器重新注音数据集", "score": 88.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oeooqChmKK.seg_609.wav", "doc_id": "oeooqChmKK.seg_609", "src_text": "Generally, background knowledge is learned during the pretraining of large language models, while entity-specific knowledge is typically observed at inference time.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "一般来说,背景知识是在大型语言模型的预训练过程中学习的,而实体特定知识通常是在推理时观察到的;", "score": 95.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/aQpIWggfCo.seg_19.wav", "doc_id": "aQpIWggfCo.seg_19", "src_text": "The heat map in the figure shows that the planning performance of InstructGPTs varies considerably for goals of different categories.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "图中的头图显示,教科书的规划表现对于不同类别的女孩来说会有很大的差异。", "score": 88.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/hgIDlKNiFM.seg_458.wav", "doc_id": "hgIDlKNiFM.seg_458", "src_text": "Which is not the case for the model based on CamemBERT weights and tokenizer, which suffer from stability issues.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "是基于卡曼伯尔白人和托肯海尔的模型的案例,这些模型都面临稳定问题。", "score": 61.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_210.wav", "doc_id": "SLpqvupgvW.seg_210", "src_text": "For example, when the language model 
retrieves the background knowledge.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "例如,当语言模型检索背景知识时,", "score": 97.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dJGfOSFgZO.seg_318.wav", "doc_id": "dJGfOSFgZO.seg_318", "src_text": "Our approach attempts to reduce the subjectivity of human evaluation by explicitly annotating whether or not each model response expresses certain behaviors, such as responding with irrelevant information or contradicting itself.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们的方法试图通过明确指出每个模型响应是否表达某些行为来减少人类评估的主观性,例如对与否相关信息的反应或自相矛盾。", "score": 78.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_724.wav", "doc_id": "oaOHnMCwad.seg_724", "src_text": "You know, all technologies work for everyone.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "为所有人工作,而", "score": 41.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WBLMIsdIrq.seg_409.wav", "doc_id": "WBLMIsdIrq.seg_409", "src_text": "We then look at vocabulary items that have high P-CXMI averaged over all of its different occurrences.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "项,它们在所有不同情况下的高频率平均值较高,", "score": 45.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/rISrKoXQCx.seg_562.wav", "doc_id": "rISrKoXQCx.seg_562", "src_text": "So we could conduct a controlled experiment by further pretraining language model checkpoints on 6 different partisan corpora separated into news and social media, further divided into their political leaning.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": 
"因此,我们可以通过进一步培训语言模型检查点来进行控制实验,分开六个不同的公司和组织,分散在新闻和社交媒体中,并进一步分散在他们的政治联系中。", "score": 53.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/GvEBWkLmuI.seg_879.wav", "doc_id": "GvEBWkLmuI.seg_879", "src_text": "Have a good time at ACL.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "说话,我在伊斯坦布尔度过了美好的时光。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oeooqChmKK.seg_606.wav", "doc_id": "oeooqChmKK.seg_606", "src_text": "The resolution of a given pronoun requires two types of information.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "给定名词的决策需要两种信息:", "score": 48.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/XejEJmgUmE.seg_736.wav", "doc_id": "XejEJmgUmE.seg_736", "src_text": "And then the hope is that the model, basically, puts more probability to the acceptable sentence.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "然后希望是模型基本上会使可接受的状态有更大的可能性。", "score": 56.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_248.wav", "doc_id": "oYCKgTzTDy.seg_248", "src_text": "I think this is known as the \"Curse of Multilinguality\".", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我认为这被称为多语言的诅咒。", "score": 97.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_219.wav", "doc_id": "oYCKgTzTDy.seg_219", "src_text": "As shown in this figure, we need to translate the query in multiple natural languages using neural models to SQL, Lambda or FunQL, and etcetera.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "如图所示,我们需要将查询语句翻译成多种自然语言,使用新模型:2、ceq、lambda或fql等。", "score": 41.0} {"audio_path": 
"data/iwslt25/IWSLT25INSTRUCT/segmented/TVCREhgqUP.seg_59.wav", "doc_id": "TVCREhgqUP.seg_59", "src_text": "In particular, they often fail to reproduce the systematic correspondences between input and output, such as those that are color-coded in the example.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "特别是,他们经常无法在输入和输出之间产生系统的相互关联,例如在示例中被彩色编码的那些。", "score": 78.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/XejEJmgUmE.seg_767.wav", "doc_id": "XejEJmgUmE.seg_767", "src_text": "So, the key takeaways of our work is that language models are sensitive to latent syntactic and semantic features which are shared across the sentences.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "所以我们的工作的关键之处在于语言模型对句子中隐含的语法和语义特征敏感。", "score": 97.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WBLMIsdIrq.seg_410.wav", "doc_id": "WBLMIsdIrq.seg_410", "src_text": "And this helps us identify cases like the one here, where in Chinese you need context to translate proper nouns to make sure that you're using the same translation within the document.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "这有助于我们识别类似于这里的案例,在中文中需要上下文来翻译正确的名词,以确保您在文档中使用相同的翻译。", "score": 82.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_187.wav", "doc_id": "SLpqvupgvW.seg_187", "src_text": "Where A and B are samples from Wikipedia.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "A和B是来自维基百科的样本。", "score": 93.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_223.wav", "doc_id": "oYCKgTzTDy.seg_223", "src_text": "The Lambda calculus is missing, or they're only evaluated on certain neural models.", "src_text_system": "human", 
"src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "为了解决这", "score": 0.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/GvEBWkLmuI.seg_870.wav", "doc_id": "GvEBWkLmuI.seg_870", "src_text": "And while it sounds positive at first glance, there's been work showing that this kind of archetype actually is very harmful because it puts a lot of pressure on these demographics to be resilient and strong against societal obstacles.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "起来在第一次听起来很积极。已经有工作表明这种类型的原型实际上是非常有害的,因为它会对这些人口统计学施加很多压力,使其能够对社会障碍非", "score": 55.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oeooqChmKK.seg_612.wav", "doc_id": "oeooqChmKK.seg_612", "src_text": "First, we have the typical setting: \"Background-Pretrain\", where background knowledge is assumed to be available at pretrain time.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "首先是典型的设置:背景预备训练;在预备训练时,假定背景知识可用。", "score": 86.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WBLMIsdIrq.seg_399.wav", "doc_id": "WBLMIsdIrq.seg_399", "src_text": "And this is done by measuring how much information the context C provides about the target Y, given the source X. 
You can think of CXMI as the information gained from giving context to the model.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "通过测量联系人C提供关于目标Y的信息来衡量多少信息。你可以认为CSMI是从给模型联系而获得的信息。", "score": 28.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/hgIDlKNiFM.seg_457.wav", "doc_id": "hgIDlKNiFM.seg_457", "src_text": "However, our experiment on control pre-training using the weight and tokenization of CamemBERT trained on the four GB subset of NACHOS showed comparable results to those obtained with DrBERT 4 GB from-scratch.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "然而,我们的实验和持续训练,使用帕默特伯特的重量和托肯化器,训练在纳托斯的4GB子集上,显示出与伯特博士从斯克拉奇获得的结果相似的结果。这不", "score": 26.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_226.wav", "doc_id": "oYCKgTzTDy.seg_226", "src_text": "We provide a uniform data set XSemPLR for cross-lingual semantic parsing in multiple natural languages and meaning representations.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们提供了一个统一的数据集,用于跨语言语义分析的多个自然语言和多个表示。它包含我们的模型", "score": 52.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/XejEJmgUmE.seg_754.wav", "doc_id": "XejEJmgUmE.seg_754", "src_text": "So first, we look at the Wikipedia sentences, which are completely irrelevant to the current query pair, and there we find that the MPP judgments are mostly robust for arbitrary context length.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "首先,我们看了维基百科的句子,这些句子与当前的询问对完全无关,然后我们发现MPD的判断对于自主语音线路非常强大。", "score": 51.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oeooqChmKK.seg_610.wav", "doc_id": "oeooqChmKK.seg_610", "src_text": "We vary the availability of these two pieces of information such 
that it may either be found in a single source, or in multiple sources.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们改变这些两块信息的可用性,使得它们可能在单一来源或多个来源中找到。", "score": 81.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WBLMIsdIrq.seg_383.wav", "doc_id": "WBLMIsdIrq.seg_383", "src_text": "Hello, my name is Kayo Yin and I will be presenting our work titled \"When Does Translation Require Context?", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我是凯奥恩,我将介绍我们的作品标题:当需要翻译所需的联系人时,", "score": 49.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_188.wav", "doc_id": "SLpqvupgvW.seg_188", "src_text": "Here are the different sampling methods we've used.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "在这里,我们使用的不同采样方法:", "score": 84.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_166.wav", "doc_id": "SLpqvupgvW.seg_166", "src_text": "The most obvious thing is to use a direct reference, for example by saying the name of the song \"Easy on Me\" or its position, \"the first one\".", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "最明显的方法是使用直接引用,例如说“歌曲的名字是IsyaneMe”,或其位置(第一个)。", "score": 53.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/aQpIWggfCo.seg_44.wav", "doc_id": "aQpIWggfCo.seg_44", "src_text": "We hope the CoScript dataset can be a valuable resource to advance research on language planning.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们希望CoScript数据集可以成为语言规划研究的宝贵资源。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_712.wav", "doc_id": "oaOHnMCwad.seg_712", 
"src_text": "We also find most additional alignment with people who have a college education.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们还发现,更多的人与有大学教育的人有更多的联系。", "score": 37.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/GvEBWkLmuI.seg_879.wav", "doc_id": "GvEBWkLmuI.seg_879", "src_text": "Have a good time at ACL.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "希望您在ACL有一个愉快的时间。", "score": 73.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_714.wav", "doc_id": "oaOHnMCwad.seg_714", "src_text": "However, when models and data sets are aligned to specific populations, some are inevitably left behind.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "然而,当模型和数据集与特定人群相关时,然而,某些方面的信息不可避免地被遗漏,", "score": 91.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_677.wav", "doc_id": "oaOHnMCwad.seg_677", "src_text": "So let's start off by imagining that you're working for a newspaper and you're sifting through comments under your news article trying to remove toxic content.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "所以让我们先想象一下,你正在为一家报纸工作,正在浏览新闻文章的评论,试图去除有害内容。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dvGkKzmIaN.seg_527.wav", "doc_id": "dvGkKzmIaN.seg_527", "src_text": "The provided embedding is a weight summation of the target embedding and the original embedding.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "提供的嵌入是目标嵌入和原始嵌入的加权和。", "score": 97.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_225.wav", "doc_id": "oYCKgTzTDy.seg_225", "src_text": "So to this 
end we propose XSemPLR.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": ",我们提出了Exemplar:", "score": 74.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dJGfOSFgZO.seg_317.wav", "doc_id": "dJGfOSFgZO.seg_317", "src_text": "However, we believe there is a more precise and reliable strategy for dimensional dialogue evaluation.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "然而,我们认为这是对尺寸对话评估的更精确和可靠的策略。", "score": 65.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/GvEBWkLmuI.seg_859.wav", "doc_id": "GvEBWkLmuI.seg_859", "src_text": "And in fact, this lexicon doesn't really capture many of the harmful patterns that we saw in the earlier slides well at all.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "事实上,这个词汇表并没有真正捕捉到我们在早期研究中看到的许多有害模式,", "score": 60.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/GvEBWkLmuI.seg_864.wav", "doc_id": "GvEBWkLmuI.seg_864", "src_text": "This contributes to a long legacy of discrimination and othering for these groups.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "这导致了这些群体长期遭受歧视和其他形式的压迫。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/FLkGnzVRew.seg_647.wav", "doc_id": "FLkGnzVRew.seg_647", "src_text": "Tweets were passed using the PDTB parser, and pairs of discourse units were annotated according to the guidelines that are described in our paper.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "使用PTV解析器通过Twitter发送的消息和一些对话单元被注释为根据我们文件中的指南。", "score": 36.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dvGkKzmIaN.seg_521.wav", "doc_id": "dvGkKzmIaN.seg_521", "src_text": 
"Watermark injection and copyright verification.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "水印注入和版权实施。", "score": 90.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_240.wav", "doc_id": "oYCKgTzTDy.seg_240", "src_text": "So during training, we train it on English queries or the combination of English and German Few-shot queries to train a multilingual model to predict the SQL output.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "在训练期间,我将在英语询问中训练,或在英语和德语询问的结合中训练,以训练多语言模型,并预测TEC输出。", "score": 53.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/aQpIWggfCo.seg_14.wav", "doc_id": "aQpIWggfCo.seg_14", "src_text": "This table reports the overall accuracy of the results.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "该表报告了结果的总体准确性。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/GvEBWkLmuI.seg_834.wav", "doc_id": "GvEBWkLmuI.seg_834", "src_text": "To overcome these limitations, we rely on the property that these newer instruction-tuned LLMs are very good at responding to instructions and prompts.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "如果我们克服这些限制,我们可以依靠这些新的指令集(ELMs)很好地响应指令和促。", "score": 70.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dvGkKzmIaN.seg_538.wav", "doc_id": "dvGkKzmIaN.seg_538", "src_text": "We assume the provider apply wiki text data set to count word frequency.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们假设提供者应用了维基文本数据集来计算词频。", "score": 98.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_258.wav", "doc_id": "oYCKgTzTDy.seg_258", "src_text": 
"We conduct a comprehensive benchmark study on three representative types of multilingual language models.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们对三种代表性的多语言模型类型进行了全面基准测试研究,并且", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_224.wav", "doc_id": "oYCKgTzTDy.seg_224", "src_text": "For example, there's only one single model to evaluate them.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "个问题", "score": 0.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_259.wav", "doc_id": "oYCKgTzTDy.seg_259", "src_text": "And our results show many interesting findings.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们的结果显示了许多有趣的发现。", "score": 90.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/gGbuDbHhyc.seg_359.wav", "doc_id": "gGbuDbHhyc.seg_359", "src_text": "First, we find that, interestingly, recent WSL methods indeed require clean validation samples to work properly.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "首先,我们发现,最新的WSL方法确实需要清洁的验证样本才能正常工作;", "score": 97.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/TVCREhgqUP.seg_87.wav", "doc_id": "TVCREhgqUP.seg_87", "src_text": "We address this by inducing the alignment as part of the training.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们通过将对齐作为训练的一部分来解决这个问题。", "score": 93.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oeooqChmKK.seg_600.wav", "doc_id": "oeooqChmKK.seg_600", "src_text": "Here is an example from our data set.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", 
"tgt_system": "short_NLE_primary", "tgt_text": "联解释模型。从我们的数据集中,我们的一个例子是", "score": 28.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/TVCREhgqUP.seg_76.wav", "doc_id": "TVCREhgqUP.seg_76", "src_text": "For the first output position, we simply select one, as highlighted in red.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "对于第一个输出位置,我们简单地选择一个,如红色所示。", "score": 97.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/gGbuDbHhyc.seg_352.wav", "doc_id": "gGbuDbHhyc.seg_352", "src_text": "We can't stop on this problem setting, but this implies that additional manual annotations are required in weakly supervised learning.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们对这种问题设置表示怀疑,因为这意味着在周日学习时需要额外的手动注释,", "score": 35.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/uZBWfYjYnf.seg_92.wav", "doc_id": "uZBWfYjYnf.seg_92", "src_text": "Hi, I'm Sara Papi from the University of Trento and Foundazione Bruno Kessler and I will briefly introduce the \"Attention as a Guide for Simultaneous Speech Translation\" paper, that is a joint work with Matteo Negri and Marco Turchi.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "1.2.本文与MatteoNegri和MarcoTurchi合作。同时语音翻译(SimultaneousSpeechTranslation,简", "score": 18.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_220.wav", "doc_id": "oYCKgTzTDy.seg_220", "src_text": "Existing cross-lingual semantic parsing models are separately proposed and evaluated on data set of limited tasks and applications.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "他存在。跨语言同义词解析模型分别提出了和评估在数据集的有限任务和应用程序上;", "score": 62.0} {"audio_path": 
"data/iwslt25/IWSLT25INSTRUCT/segmented/gGbuDbHhyc.seg_372.wav", "doc_id": "gGbuDbHhyc.seg_372", "src_text": "To summarize, we showed that recent WSL approaches require clean, manually annotated samples for them to work properly.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "总而言之,我们证明了最近的WSSL方法需要清洁的,手动注释的样本才能正常工作;", "score": 88.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_705.wav", "doc_id": "oaOHnMCwad.seg_705", "src_text": "We then compared these annotations with Dynahate, Perspective API, Rewire API, Hate Roberta and GPT 4.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "然后我们将这些注释与DianaHeat,PerspectiveAPI,RewireAPI,HateRoberta和GPT4进行比较。", "score": 76.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/rISrKoXQCx.seg_571.wav", "doc_id": "rISrKoXQCx.seg_571", "src_text": "So we see that if we investigate the per category performance, that is to say if we separate the performance into different demographics or political leaning of news media we can see a pattern.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "所以我们看到,如果我们研究每个类别的表现,那就是说,如果我们将表现分成不同的人口统计学或政治新闻媒体,我们可以看到一个模式,", "score": 58.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/hgIDlKNiFM.seg_441.wav", "doc_id": "hgIDlKNiFM.seg_441", "src_text": "However, French didn't have any open source model for biomedical until now.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "nch没有一个新的开源模型用于生物医学领域。", "score": 53.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/rISrKoXQCx.seg_553.wav", "doc_id": "rISrKoXQCx.seg_553", "src_text": "On the other hand, these different political opinions are inherently socially biased and might lead to potential 
fairness issues in downstream task applications.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "另一方面,这些不同的政治观点在本质上是社会偏见的,并且我认为它们在下游任务应用中可能会导致公平问题。", "score": 60.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/PIZEXUFLAR.seg_266.wav", "doc_id": "PIZEXUFLAR.seg_266", "src_text": "However, most previous works on instruction tuning focused on improving the zero-shot performance on language only tasks, while computer vision and multi-modal tasks have been left out.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "然而,大多数以前关于指令调节的工作都集中在提高单语言任务的零轴性能上,而没有考虑计算机视觉和多模态任务。", "score": 80.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_820.wav", "doc_id": "WTTtiRKFZI.seg_820", "src_text": "So we showed that by measuring length in characters, the first column, in syllables the middle column, and in words the right column.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们通过测量字符的长度来显示它是第一个单词的第一个字母,第一个单词的中间字母,和单词的正确字母。因此,我将集中", "score": 46.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_217.wav", "doc_id": "oYCKgTzTDy.seg_217", "src_text": "So, semantic parsing is a task to build semantic representations of user queries such as SQL and Lambda Calculus.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "务,建立用户查询的语义表示,例如,SQL和LambdaCalculus。", "score": 73.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_721.wav", "doc_id": "oaOHnMCwad.seg_721", "src_text": "Our third recommendation is to build specialised datasets and models within 4 specific communities.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", 
"tgt_text": "第三个建议是建立特殊数据集和模型,", "score": 75.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dvGkKzmIaN.seg_523.wav", "doc_id": "dvGkKzmIaN.seg_523", "src_text": "The trigger set is a group of words in a moderate frequency interval.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "触发器集是中频间隔内的一组词,", "score": 94.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/aQpIWggfCo.seg_1.wav", "doc_id": "aQpIWggfCo.seg_1", "src_text": "I'm here to introduce our work \"Distilling Script Knowledge from Large Language Models for Constrained Language Planning\".", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我来介绍我们的工作:区分脚本知识和语言模型的语言规划。", "score": 44.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/PIZEXUFLAR.seg_280.wav", "doc_id": "PIZEXUFLAR.seg_280", "src_text": "So for the training dataset, we use 53 tasks from 9 groups for training and we sample 10,000 instances per task.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "因此,我们在训练数据集中使用了53个任务来进行训练,并对测试任务进行了10,000次采样", "score": 80.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/wLqFAuDnKa.seg_158.wav", "doc_id": "wLqFAuDnKa.seg_158", "src_text": "Thank you very much.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "非常感谢。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oeooqChmKK.seg_606.wav", "doc_id": "oeooqChmKK.seg_606", "src_text": "The resolution of a given pronoun requires two types of information.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "代词的解释需要两种类型的信息。", "score": 90.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/GvEBWkLmuI.seg_835.wav", 
"doc_id": "GvEBWkLmuI.seg_835", "src_text": "So we can ask the model to generate a persona, which is a depiction of an imagined individual using a prompt like \"Imagine you are an Asian woman.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "因此,我们可以要求模型生成一个“人格”,即使用一个提示(如“想象你是一个亚洲女性,", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dJGfOSFgZO.seg_314.wav", "doc_id": "dJGfOSFgZO.seg_314", "src_text": "These approaches work well to provide holistic evaluations of overall dialogue quality, but dialogue quality has many aspects.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "这些方法很好地提供了对整个对话质量的全局评估,但对话质量有很多方面,", "score": 92.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dJGfOSFgZO.seg_313.wav", "doc_id": "dJGfOSFgZO.seg_313", "src_text": "The common practice is to use human evaluation, such as by asking human judges to select which of two conversations is better or to rate conversations given a Likert scale.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "普遍做法是使用人类评估,例如要求人类裁判者选择哪个对话更好,或要求对话评估给出一个利克特级别。", "score": 51.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/hgIDlKNiFM.seg_432.wav", "doc_id": "hgIDlKNiFM.seg_432", "src_text": "In this presentation, we first talk about language modeling in healthcare.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "在演示会上,我们首先谈到了医疗保健语言模型,", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_207.wav", "doc_id": "SLpqvupgvW.seg_207", "src_text": "If the language model has access to the exact same background knowledge as the annotators, then the accuracy is really high, it's around 92 to 95%.", "src_text_system": "human", "src_lang": 
"en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "如果语言模型有与注释者相同的背景知识,那么准确率就非常高,达到92%到95%。", "score": 92.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_240.wav", "doc_id": "oYCKgTzTDy.seg_240", "src_text": "So during training, we train it on English queries or the combination of English and German Few-shot queries to train a multilingual model to predict the SQL output.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "在训练中,我们在英语查询或英语和德语少样本查询的组合上进行训练,以训练多语言模型。", "score": 90.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SUkmfOTvGi.seg_467.wav", "doc_id": "SUkmfOTvGi.seg_467", "src_text": "We observe that models have been used in CoNLL-2003 to develop NER for almost 20 years and this naturally raises several problems.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们观察到,模型已经使用卡诺尔2003年开发的NRL模型开发NRL模型已经有20多年了,这自然会引起许多问题:", "score": 46.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/TVCREhgqUP.seg_89.wav", "doc_id": "TVCREhgqUP.seg_89", "src_text": "That's because this is related to the \"Traveling Salesman\" problem.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "这是因为这与旅行推销员问题有关。", "score": 88.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_822.wav", "doc_id": "WTTtiRKFZI.seg_822", "src_text": "What we see here is that when the governor is on the left, the tendency for the left conjunct to be shorter grows steadily, with the absolute difference in words, and the same is observed when there is no governor as in coordination of sentences.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": 
",越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越来越短,越", "score": 0.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/aQpIWggfCo.seg_0.wav", "doc_id": "aQpIWggfCo.seg_0", "src_text": "Hi, I'm Siyu Yuan from Fudan University.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我是福德大学的CSYuan,", "score": 28.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/wLqFAuDnKa.seg_142.wav", "doc_id": "wLqFAuDnKa.seg_142", "src_text": "And when we go, as in our case, to five-shot prompting, there is nearly no difference to the actual form of the prompting.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": ",但当我们像我们这样做到五次提示时,实际提示形式几乎没有区别。", "score": 49.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/rISrKoXQCx.seg_573.wav", "doc_id": "rISrKoXQCx.seg_573", "src_text": "And vice versa, right-leaning language models are better at detecting hate speech targeting white and men, however worse at detecting hate speech targeting at black LGBTQ plus and other minority communities.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "反之亦然,正确的语言模型更好地检测白人和男性,然而更好地检测黑人、LGBTQ+和其他少数群体的言论。", "score": 48.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/gGbuDbHhyc.seg_365.wav", "doc_id": "gGbuDbHhyc.seg_365", "src_text": "But that's not the end of the story, because if we either way decide to access clean samples, then training on them directly will even achieve better performance.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "但这不是故事的结尾,因为如果我们任一方式决定访问干净的样本,那么直接在样本上训练甚至会获得更好的性能。", "score": 77.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_235.wav", "doc_id": "oYCKgTzTDy.seg_235", "src_text": "And 
we test Multilingual Model which we train one multilingual model for all languages.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们测试了一个多语言模型,这个模型我们训练了一个所有语言的多语言模型,", "score": 57.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_788.wav", "doc_id": "WTTtiRKFZI.seg_788", "src_text": "So in English, as you might know, direct objects prefer to be close to the verb, while adjuncts may be further away.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我", "score": 0.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dvGkKzmIaN.seg_511.wav", "doc_id": "dvGkKzmIaN.seg_511", "src_text": "The watermark method need to meet the following properties.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "水印方法需要满足以下属性:", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/rISrKoXQCx.seg_554.wav", "doc_id": "rISrKoXQCx.seg_554", "src_text": "To this end, we propose to investigate the political bias propagation pipeline from pretraining data to language models to downstream tasks, specifically by asking the following questions: First, how do we evaluate the political leaning of language models and what role does pretraining data might have on such political biases?", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "因此,我们建议从收集数据、语言模型到下游任务的政治偏见传播管道中进行调查,特别是通过询问下游任务。首先,我们如何评估政治语言模型的倾向?这种政治偏见对我们什么价值?", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WBLMIsdIrq.seg_420.wav", "doc_id": "WBLMIsdIrq.seg_420", "src_text": "First of all, when we use corpus-level metrics: so for BLEU, we find that context-agnostic models have the best performance.", 
"src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "首先,当我们使用语料级指标(如SILVER-BLUE)时,我们发现上下文无关模型表现最佳,", "score": 60.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/GvEBWkLmuI.seg_836.wav", "doc_id": "GvEBWkLmuI.seg_836", "src_text": "Describe yourself.\".", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "这样的提示来描述自己。", "score": 53.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_191.wav", "doc_id": "SLpqvupgvW.seg_191", "src_text": "The second one is when the entities have similar titles, for example, two books with the name \"The Return\".", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "第二种情况是当实体具有相似的名称,例如两个带有名称“德里特”的书籍。", "score": 84.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SUkmfOTvGi.seg_498.wav", "doc_id": "SUkmfOTvGi.seg_498", "src_text": "And lastly, please make sure to check out our paper, our data set and if you have any questions, feel free to contact me.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "最后,请务必检查我们的文件和数据集,如果您有任何问题,请随时联系我。", "score": 66.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/XejEJmgUmE.seg_762.wav", "doc_id": "XejEJmgUmE.seg_762", "src_text": "So why does the match prefix affect the language model judgement so much?", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "所以,为什么匹配前缀会对语言模型判断产生这么大的影响?所以", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/FLkGnzVRew.seg_671.wav", "doc_id": "FLkGnzVRew.seg_671", "src_text": "These are the links to our core data set and our paper.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", 
"tgt_system": "long_KIT_primary", "tgt_text": "这些是我们代码数据集和我们的论文的链接。", "score": 60.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_810.wav", "doc_id": "WTTtiRKFZI.seg_810", "src_text": "And, also the observation that was made in parsing that this tendency grows with length difference.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "在传递中,趋势是随着长度的增长而增长的", "score": 90.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/rISrKoXQCx.seg_586.wav", "doc_id": "rISrKoXQCx.seg_586", "src_text": "Ok, great.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "好的,", "score": 84.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SUkmfOTvGi.seg_494.wav", "doc_id": "SUkmfOTvGi.seg_494", "src_text": "At the same time, we also found that the performance drop here is caused by temporal drift and kind of surprisingly, it is not caused by adaptive overfitting even though CoNLL-2003 has been used for over 20 years.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "同时,我们还发现这里的性能下降是由暂时偏离造成的,而且令人惊奇的是,这不是由于适应性过度调整造成的,尽管康纳尔2003年已经使用了20多年。", "score": 54.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_819.wav", "doc_id": "WTTtiRKFZI.seg_819", "src_text": "However, when the governor is on the right, as here, \"laughed\" governs the coordination Ted and Ned, this effect disappears.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "主语-谓语-宾语时,这种效果消失了。因此,我们通过测量句子中各个部分的长度来展示这一点:第一列是子句的中间列,第二列是子句的右列。我们将注意力集中在右列上,结果是当句法结构是主语-谓语-宾语时,这种效果消失了。", "score": 46.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/hgIDlKNiFM.seg_434.wav", "doc_id": "hgIDlKNiFM.seg_434", "src_text": "We introduce the first biomedical model in 
French named DrBERT, which is based on RoBERTa and trained on NACHOS, which is a data set of medical crawled data from the web.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们介绍了第一个法国生物医学模型,名为伯特医生,该模型基于罗伯塔,并在纳乔斯上训练,这是从网络上收集的医疗数据集。", "score": 68.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/GvEBWkLmuI.seg_841.wav", "doc_id": "GvEBWkLmuI.seg_841", "src_text": "And both of the women of color personas make references to ancestry while the white man persona has nothing of the sort.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "两位女性角色都提到祖先,而白人男性角色则什么也没有。", "score": 91.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_789.wav", "doc_id": "WTTtiRKFZI.seg_789", "src_text": "So \"Marge read it yesterday\" is fine because the direct object is close to the verb, while \"Marge read yesterday it\" is much worse.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": ",而adjuncts可能会被进一步远离,正如昨天的马克·雷迪特(MarcReddy)很好,因为直接对象被闭在词中。", "score": 31.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_818.wav", "doc_id": "WTTtiRKFZI.seg_818", "src_text": "In such cases, the left conjunct prefers to be shorter; the most of the biggest difference between the two conjuncts.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "在这种情况下,左侧的协调应该更短。.然而,当右侧的句法结构是", "score": 25.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_176.wav", "doc_id": "SLpqvupgvW.seg_176", "src_text": "The cartoon has three speech bubbles.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "这部动画片有三个对话泡泡。", "score": 88.0} {"audio_path": 
"data/iwslt25/IWSLT25INSTRUCT/segmented/WBLMIsdIrq.seg_386.wav", "doc_id": "WBLMIsdIrq.seg_386", "src_text": "So a lot of translations depend on context.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "因此,许多翻译都依赖于语境;", "score": 98.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_229.wav", "doc_id": "oYCKgTzTDy.seg_229", "src_text": "The first one is Translate-Test.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "第一个是翻译测试,", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WBLMIsdIrq.seg_425.wav", "doc_id": "WBLMIsdIrq.seg_425", "src_text": "But these models are not much better than models that do not use context on other phenomena like ellipsis, pronouns, and verb form.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "在其他现象,如省略、代词和动词形式方面并没有比不使用上下文的模型好多少。", "score": 94.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_693.wav", "doc_id": "oaOHnMCwad.seg_693", "src_text": "Our framework works in two main steps.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们的框架分为两个主要步骤。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/TVCREhgqUP.seg_69.wav", "doc_id": "TVCREhgqUP.seg_69", "src_text": "First, we tag each input token with an unordered multiset of tokens that will appear in the output.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "首先,我们为每个输入令牌标记一个未排序的多个令牌集,这些令牌将出现在输出中。", "score": 66.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/wLqFAuDnKa.seg_140.wav", "doc_id": "wLqFAuDnKa.seg_140", "src_text": "We saw that the actual form of the prompting doesn't have a 
big influence in the case of several short promptings.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们看到实际的提示形式在多次提示的情况下并没有太大的影响。", "score": 88.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_193.wav", "doc_id": "SLpqvupgvW.seg_193", "src_text": "And finally when they have similar info boxes or attributes on Wikipedia.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "最后,他们在维基百科上有相似的信息框或属性,", "score": 88.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WBLMIsdIrq.seg_389.wav", "doc_id": "WBLMIsdIrq.seg_389", "src_text": "But if the previous sentence was \"Could it be anything serious, doctor?\", then \"mole\" refers to a birthmark.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "句话是‘如果前一句话是‘如果前一句话是‘如果前一句话是‘如果前一句话是‘如果而“more”也可以指“出生年份”,", "score": 8.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/XejEJmgUmE.seg_756.wav", "doc_id": "XejEJmgUmE.seg_756", "src_text": "And we saw here in the orange dotted line, the MPP judgments are relatively stable.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "并在这里,在橙色线上,我们看到MPPT判断相对稳定。", "score": 61.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/FLkGnzVRew.seg_663.wav", 
"doc_id": "FLkGnzVRew.seg_663", "src_text": "We find that the proposed PRC strategy works better than other state-of-the-art strategies, although the difference is small.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们发现该提议的PRC策略比其他艺术状态策略更好,尽管差异很小。", "score": 52.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/TVCREhgqUP.seg_89.wav", "doc_id": "TVCREhgqUP.seg_89", "src_text": "That's because this is related to the \"Traveling Salesman\" problem.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "这是因为这与旅行推销员问题有关。", "score": 94.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/gGbuDbHhyc.seg_374.wav", "doc_id": "gGbuDbHhyc.seg_374", "src_text": "Our concrete recommendations for future work are as follows.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们关于未来工作的具体建议如下。", "score": 90.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_184.wav", "doc_id": "SLpqvupgvW.seg_184", "src_text": "The second one, which is the alternative question is generated as follows.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "第二个,即替代问题,是通过以下方式生成的。", "score": 91.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/PIZEXUFLAR.seg_299.wav", "doc_id": "PIZEXUFLAR.seg_299", "src_text": "As we can see, using more instructions can improve the model's overall performance and reduce its sensitivity a lot.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "因为我们可以看到使用更多的指令可以改善模型的整体性能,并降低其敏感度。", "score": 56.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_231.wav", "doc_id": "oYCKgTzTDy.seg_231", "src_text": "And for example, 
we train the English model on English query and during inference we translate the German query using API to English and then use the trained model to predict the SQL.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "例如,我们可以训练一个英语模型来处理英语查询,然后在推断过程中使用API将德语查询翻译成英语,然后使用训练好的模型来预测后续内容。", "score": 60.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_214.wav", "doc_id": "SLpqvupgvW.seg_214", "src_text": "Thanks.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "谢。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_680.wav", "doc_id": "oaOHnMCwad.seg_680", "src_text": "But that's not really the case for Aditya Sharma.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "但这并不是阿迪蒂亚·沙玛的案例。", "score": 79.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/hgIDlKNiFM.seg_441.wav", "doc_id": "hgIDlKNiFM.seg_441", "src_text": "However, French didn't have any open source model for biomedical until now.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "然而,法国直到现在仍然没有新的开放源模型。", "score": 90.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WBLMIsdIrq.seg_394.wav", "doc_id": "WBLMIsdIrq.seg_394", "src_text": "In this work, we try to answer these two questions.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "在本研究中,我们试图回答这两个问题:", "score": 62.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/hgIDlKNiFM.seg_460.wav", "doc_id": "hgIDlKNiFM.seg_460", "src_text": "We are also observing that more specialized data is better, but it doesn't scale well.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", 
"domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们还观察到,专业数据是更好的数据,但它没有很好地衡量。", "score": 60.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/wLqFAuDnKa.seg_130.wav", "doc_id": "wLqFAuDnKa.seg_130", "src_text": "And we compared to state-of-the-art systems, so the best performing system, so the WMT evaluation.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们将比较两个艺术系统,即最佳性能系统,即双语测试评估", "score": 48.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_190.wav", "doc_id": "SLpqvupgvW.seg_190", "src_text": "The first one is uniform at random.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "第一种是随机均匀采样;", "score": 93.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_787.wav", "doc_id": "WTTtiRKFZI.seg_787", "src_text": "The argument is based on the principle of dependency length minimization that I will explain on the basis of these examples.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读了这本书,昨天我读", "score": 0.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dvGkKzmIaN.seg_526.wav", "doc_id": "dvGkKzmIaN.seg_526", "src_text": "When a user send a sentence to the provider service the provider counts the trigger number in the sentence.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "当用户向提供者服务发送一个句子时,提供者计算句子中的触发器数字。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/gGbuDbHhyc.seg_375.wav", "doc_id": "gGbuDbHhyc.seg_375", "src_text": "First, report the model selection criteria.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": 
"首先,报告模型选择标准,", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oaOHnMCwad.seg_687.wav", "doc_id": "oaOHnMCwad.seg_687", "src_text": "And so one question that people might ask is, do datasets and models have positionality?", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "所以,人们可能会问:数据集和模型是否具有位置性?", "score": 60.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/aQpIWggfCo.seg_42.wav", "doc_id": "aQpIWggfCo.seg_42", "src_text": "We evaluate constrained language planning ability of large language models and develop an over-generate-then-filter method for large language models.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们评估了大型语言模型的受限语言规划能力,并为大型语言模型开发了一个过度生成过滤方法。", "score": 90.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/XejEJmgUmE.seg_766.wav", "doc_id": "XejEJmgUmE.seg_766", "src_text": "That is, when we perturb the sentences in the acceptable domain, we see similar increase in all the perturbations and when we perturb the sentences in the unacceptable domain, we see decrease in MPP judgments in similar fashion.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "也就是说,当我们在可接受的领域破坏句子时,我们会看到所有破坏的类似增加,而当我们在不可接受的领域破坏句子时,我们会看到类似的减少在MPJ的判决中。", "score": 48.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/aQpIWggfCo.seg_40.wav", "doc_id": "aQpIWggfCo.seg_40", "src_text": "We find that T5 fine-tuned on CoScript can generate scripts of higher quality than most large language models, indicating that smaller models can surpass larger models when properly trained on suitable datasets.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": 
"我们发现,T5Fine-tunedonCorescript可以生成比大多数大型语言模型更高质量的脚本,表明更小的模型可以在适当的数据集上正确训练时支持更大的模型。", "score": 89.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/FLkGnzVRew.seg_632.wav", "doc_id": "FLkGnzVRew.seg_632", "src_text": "Hello, my name is Vasudha and I'm a Computer Science PhD candidate at Stony Brook University.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "hello,我叫瓦苏达,我是斯通尼布鲁克大学的计算机科学博士研究生,", "score": 86.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/PIZEXUFLAR.seg_307.wav", "doc_id": "PIZEXUFLAR.seg_307", "src_text": "Thank you.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "谢谢。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/TVCREhgqUP.seg_66.wav", "doc_id": "TVCREhgqUP.seg_66", "src_text": "In this paper, we don't use trees and introduce a neural seq2seq model that directly models the correspondences between fragments of the input and fragments of the output.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "在本文中,我们没有使用树状结构,而是引入了一个年序模型,该模型直接模拟输入片段和输出片段之间的相互关联。第一次", "score": 54.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dvGkKzmIaN.seg_516.wav", "doc_id": "dvGkKzmIaN.seg_516", "src_text": "Existing works can be broadly classified into four categories.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "现有的作品可以大致分为四类。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/GvEBWkLmuI.seg_832.wav", "doc_id": "GvEBWkLmuI.seg_832", "src_text": "They usually rely on hand-constructed data sets that are very time-consuming to curate and they also usually only. 
measure very specific stereotypes, meaning that they don't generalize well to other demographics or contexts, or they simply capture very general broad associations, like negative associations with particular groups.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "它们通常依赖于非常耗时的手工构建的数据集。它们通常只会测量非常特殊的刻度,意味着它们不能与其他人口统计学或联系广泛化,而它们只是简单地捕捉非常普遍的广泛联系,例如与特定群体的负面联系。", "score": 89.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/aQpIWggfCo.seg_15.wav", "doc_id": "aQpIWggfCo.seg_15", "src_text": "We find that all language models achieve unsatisfactory results on planning for specific goals.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们发现所有的LilyPad模型在规划特定目标时都达不到令人满意的结果。", "score": 55.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dvGkKzmIaN.seg_525.wav", "doc_id": "dvGkKzmIaN.seg_525", "src_text": "In watermark injection, we first define a target embedding.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "在水印注入中,我们首先定义一个目标嵌入。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/hgIDlKNiFM.seg_445.wav", "doc_id": "hgIDlKNiFM.seg_445", "src_text": "Is it 4 gigabytes, 8 gigabytes, or more?", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "是4GB、8GB还是更多?", "score": 83.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/gGbuDbHhyc.seg_369.wav", "doc_id": "gGbuDbHhyc.seg_369", "src_text": "As we can see from the figures, the vanilla model, termed FTw, initially underperforms more complicated WSL methods, like COSINE.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "正如我们从图表中可以看到,瓦利纳模型称为FTW,最初表现不佳,无法比拟于更复杂的WSL方法,如共轭。", "score": 30.0} 
{"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/dJGfOSFgZO.seg_316.wav", "doc_id": "dJGfOSFgZO.seg_316", "src_text": "One approach is to simply ask human judges to evaluate several dimensions of dialogue quality, such as the relevance of model responses using existing comparative or Likert scale methods.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "一种方法是简单地要求人类裁判人对对话质量的几种维度进行评估,例如模拟回应的相关性,使用现有的比较或利基级方法。", "score": 85.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_159.wav", "doc_id": "SLpqvupgvW.seg_159", "src_text": "Hi!", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "嗨,", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oeooqChmKK.seg_603.wav", "doc_id": "oeooqChmKK.seg_603", "src_text": "Servin and Kea met at a park.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "塞尔文和凯亚在公园里见面,工作一整天", "score": 80.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_235.wav", "doc_id": "oYCKgTzTDy.seg_235", "src_text": "And we test Multilingual Model which we train one multilingual model for all languages.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "它有一个多语言模型,我们可以为所有语言训练一个多语言模型。", "score": 77.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/wLqFAuDnKa.seg_158.wav", "doc_id": "wLqFAuDnKa.seg_158", "src_text": "Thank you very much.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "非常感谢。", "score": 100.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/uZBWfYjYnf.seg_118.wav", "doc_id": "uZBWfYjYnf.seg_118", "src_text": "And we also see that if we consider the actual elapsed time or 
the computational-aware time, that is the fastest strategy.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "我们还看到,如果我们考虑实际的适应时间或计算的适应时间,适应是最快的策略。", "score": 58.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/rISrKoXQCx.seg_557.wav", "doc_id": "rISrKoXQCx.seg_557", "src_text": "This ensures us to do automatic evaluation well grounded in political science literature.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "杂性测试)来确保在政治科学文献中进行自动评估。", "score": 86.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oYCKgTzTDy.seg_254.wav", "doc_id": "oYCKgTzTDy.seg_254", "src_text": "We also find some other interesting findings.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "我们还发现一些其他有趣的结果,", "score": 79.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WBLMIsdIrq.seg_394.wav", "doc_id": "WBLMIsdIrq.seg_394", "src_text": "In this work, we try to answer these two questions.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "在这项工作中,我们试图回答这两个问题:", "score": 92.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_194.wav", "doc_id": "SLpqvupgvW.seg_194", "src_text": "For example, the same genre or the same artist for a song.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "例如歌曲的同一类型或同一艺术家。", "score": 85.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/FLkGnzVRew.seg_656.wav", "doc_id": "FLkGnzVRew.seg_656", "src_text": "Further, on iteratively fine-tuning on both tasks, we find that fine-tuning of CE tasks followed by further fine-tuning on debate yields a much better zero-shot performance.", "src_text_system": "human", "src_lang": 
"en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "在进一步对双任务进行迁移学习时,我们发现对CE任务进行迁移学习后,通过进一步对迁移学习进行迁移学习,我们发现迁移学习对CE任务的迁移学习后,通过进一步对迁移学习进行迁移学习,我们发现迁移学习对CE任务的迁移学习后,通过进一步对迁移学习进行迁移学习,我们发现迁", "score": 46.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/PIZEXUFLAR.seg_271.wav", "doc_id": "PIZEXUFLAR.seg_271", "src_text": "Therefore, this motivates us to build a multi-modal instruction tuning dataset.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "因此这促使我们建造一个多模指令调音器数据集。", "score": 76.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/rISrKoXQCx.seg_546.wav", "doc_id": "rISrKoXQCx.seg_546", "src_text": "Hi, I'm Shangbin, PhD student in the University of Washington.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "今", "score": 0.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/FLkGnzVRew.seg_637.wav", "doc_id": "FLkGnzVRew.seg_637", "src_text": "Further mentioning that \"I don't think I could keep my job without them\" justifies the second occurrence.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "在没有他们的情况下保持我的工作,这就合理了第二次出现,", "score": 78.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WBLMIsdIrq.seg_405.wav", "doc_id": "WBLMIsdIrq.seg_405", "src_text": "First, we look at part-of-speech tags that have high mean P-CXMI.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "首先我们看了语音标记符号,发现它们的意思是“高", "score": 49.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/XejEJmgUmE.seg_745.wav", "doc_id": "XejEJmgUmE.seg_745", "src_text": "We extract grammatical sentences from Adjunct Island and then we add it as a prefix to both the acceptable query and the unacceptable query.", 
"src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "前加上一个句子来重建更长的序列。这个句子来自一个名为“adgentile”的句子库。因此,", "score": 45.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SUkmfOTvGi.seg_485.wav", "doc_id": "SUkmfOTvGi.seg_485", "src_text": "The first one is adaptive overfitting, which is overfitting costs by reusing the same test set over and over again and this is usually manifested as the diminishing returns on a new test set.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "第一个是适应性过度适应,这是由反复使用相同的测试集引起的过度适应,并且这通常在新测试集上出现减少时表现出来。", "score": 73.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/SLpqvupgvW.seg_214.wav", "doc_id": "SLpqvupgvW.seg_214", "src_text": "Thanks.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "。", "score": 0.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/WTTtiRKFZI.seg_794.wav", "doc_id": "WTTtiRKFZI.seg_794", "src_text": "This is illustrated here.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "这", "score": 0.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/gGbuDbHhyc.seg_361.wav", "doc_id": "gGbuDbHhyc.seg_361", "src_text": "As shown in this figure, if there are no clean validation samples, then the trained models cannot generalize beyond the original weak labels, meaning that the training is pointless.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "long_KIT_primary", "tgt_text": "如图所示:如果没有清洁的验证样本,那么训练模型就不能超越原始的微基准级别,这意味着训练是无用的。", "score": 69.0} {"audio_path": "data/iwslt25/IWSLT25INSTRUCT/segmented/oeooqChmKK.seg_592.wav", "doc_id": "oeooqChmKK.seg_592", "src_text": "Recent works in tasks like question answering show that models can use 
pretrained-time knowledge to solve the task.", "src_text_system": "human", "src_lang": "en", "tgt_lang": "zh", "domain": "acl", "tgt_system": "short_NLE_primary", "tgt_text": "通常通过预训练获得的知识,通常通过预训练获得的知识,通常通过预训练获得的知识,通常通过预训练获得的知识,通常通过预", "score": 25.0}