Kaguya-19 committed on
Commit c6232d6 · verified · 1 parent: 744cc1a

Update README.md

Files changed (1)
  1. README.md +261 -1
README.md CHANGED
@@ -81,11 +81,271 @@ You can read more tutorials about AgentCPM-Report in the [documentation](https:/
  ## Evaluation
- Experiments on DeepResearch Bench, DeepConsult, and DeepResearch Gym demonstrate that AgentCPM-Report outperforms leading closed-source systems, with substantial gains in Insight. Detailed benchmark results can be found in the associated research paper.
+ <table align="center">
+ <thead>
+ <tr>
+ <th align="center">DeepResearch Bench</th>
+ <th align="center">Overall</th>
+ <th align="center">Comprehensiveness</th>
+ <th align="center">Insight</th>
+ <th align="center">Instruction Following</th>
+ <th align="center">Readability</th>
+ </tr>
+ </thead>
+ <tbody>
+ <tr>
+ <td align="center">Doubao-research</td>
+ <td align="center">44.34</td>
+ <td align="center">44.84</td>
+ <td align="center">40.56</td>
+ <td align="center">47.95</td>
+ <td align="center">44.69</td>
+ </tr>
+ <tr>
+ <td align="center">Claude-research</td>
+ <td align="center">45.00</td>
+ <td align="center">45.34</td>
+ <td align="center">42.79</td>
+ <td align="center">47.58</td>
+ <td align="center">44.66</td>
+ </tr>
+ <tr>
+ <td align="center">OpenAI-deepresearch</td>
+ <td align="center">46.45</td>
+ <td align="center">46.46</td>
+ <td align="center">43.73</td>
+ <td align="center">49.39</td>
+ <td align="center">47.22</td>
+ </tr>
+ <tr>
+ <td align="center">Gemini-2.5-Pro-deepresearch</td>
+ <td align="center">49.71</td>
+ <td align="center">49.51</td>
+ <td align="center">49.45</td>
+ <td align="center">50.12</td>
+ <td align="center">50.00</td>
+ </tr>
+ <tr>
+ <td align="center">WebWeaver (Qwen3-30B-A3B)</td>
+ <td align="center">46.77</td>
+ <td align="center">45.15</td>
+ <td align="center">45.78</td>
+ <td align="center">49.21</td>
+ <td align="center">47.34</td>
+ </tr>
+ <tr>
+ <td align="center">WebWeaver (Claude-Sonnet-4)</td>
+ <td align="center">50.58</td>
+ <td align="center">51.45</td>
+ <td align="center">50.02</td>
+ <td align="center">50.81</td>
+ <td align="center">49.79</td>
+ </tr>
+ <tr>
+ <td align="center">Enterprise-DR (Gemini-2.5-Pro)</td>
+ <td align="center">49.86</td>
+ <td align="center">49.01</td>
+ <td align="center">50.28</td>
+ <td align="center">50.03</td>
+ <td align="center">49.98</td>
+ </tr>
+ <tr>
+ <td align="center">RhinoInsight (Gemini-2.5-Pro)</td>
+ <td align="center">50.92</td>
+ <td align="center">50.51</td>
+ <td align="center">51.45</td>
+ <td align="center">51.72</td>
+ <td align="center">50.00</td>
+ </tr>
+ <tr>
+ <td align="center">AgentCPM-Report</td>
+ <td align="center">50.11</td>
+ <td align="center">50.54</td>
+ <td align="center">52.64</td>
+ <td align="center">48.87</td>
+ <td align="center">44.17</td>
+ </tr>
+ </tbody>
+ </table>
+
+ <table align="center">
+ <thead>
+ <tr>
+ <th align="center">DeepResearch Gym</th>
+ <th align="center">Avg.</th>
+ <th align="center">Clarity</th>
+ <th align="center">Depth</th>
+ <th align="center">Balance</th>
+ <th align="center">Breadth</th>
+ <th align="center">Support</th>
+ <th align="center">Insightfulness</th>
+ </tr>
+ </thead>
+ <tbody>
+ <tr>
+ <td align="center">Doubao-research</td>
+ <td align="center">84.46</td>
+ <td align="center">68.85</td>
+ <td align="center">93.12</td>
+ <td align="center">83.96</td>
+ <td align="center">93.33</td>
+ <td align="center">84.38</td>
+ <td align="center">83.12</td>
+ </tr>
+ <tr>
+ <td align="center">Claude-research</td>
+ <td align="center">80.25</td>
+ <td align="center">86.67</td>
+ <td align="center">96.88</td>
+ <td align="center">84.41</td>
+ <td align="center">96.56</td>
+ <td align="center">26.77</td>
+ <td align="center">90.22</td>
+ </tr>
+ <tr>
+ <td align="center">OpenAI-deepresearch</td>
+ <td align="center">91.27</td>
+ <td align="center">84.90</td>
+ <td align="center">98.10</td>
+ <td align="center">89.80</td>
+ <td align="center">97.40</td>
+ <td align="center">88.40</td>
+ <td align="center">89.00</td>
+ </tr>
+ <tr>
+ <td align="center">Gemini-2.5-Pro-deepresearch</td>
+ <td align="center">96.02</td>
+ <td align="center">90.71</td>
+ <td align="center">99.90</td>
+ <td align="center">93.37</td>
+ <td align="center">99.69</td>
+ <td align="center">95.00</td>
+ <td align="center">97.45</td>
+ </tr>
+ <tr>
+ <td align="center">WebWeaver (Qwen3-30B-A3B)</td>
+ <td align="center">77.27</td>
+ <td align="center">71.88</td>
+ <td align="center">85.51</td>
+ <td align="center">75.80</td>
+ <td align="center">84.78</td>
+ <td align="center">63.77</td>
+ <td align="center">81.88</td>
+ </tr>
+ <tr>
+ <td align="center">WebWeaver (Claude-Sonnet-4)</td>
+ <td align="center">96.77</td>
+ <td align="center">90.50</td>
+ <td align="center">99.87</td>
+ <td align="center">94.30</td>
+ <td align="center">100.00</td>
+ <td align="center">98.73</td>
+ <td align="center">97.22</td>
+ </tr>
+ <tr>
+ <td align="center">AgentCPM-Report</td>
+ <td align="center">98.48</td>
+ <td align="center">95.10</td>
+ <td align="center">100.00</td>
+ <td align="center">98.50</td>
+ <td align="center">100.00</td>
+ <td align="center">97.30</td>
+ <td align="center">100.00</td>
+ </tr>
+ </tbody>
+ </table>
+
+ <table align="center">
+ <thead>
+ <tr>
+ <th align="center">DeepConsult</th>
+ <th align="center">Avg.</th>
+ <th align="center">Win (%)</th>
+ <th align="center">Tie (%)</th>
+ <th align="center">Lose (%)</th>
+ </tr>
+ </thead>
+ <tbody>
+ <tr>
+ <td align="center">Doubao-research</td>
+ <td align="center">5.42</td>
+ <td align="center">29.95</td>
+ <td align="center">40.35</td>
+ <td align="center">29.70</td>
+ </tr>
+ <tr>
+ <td align="center">Claude-research</td>
+ <td align="center">4.60</td>
+ <td align="center">25.00</td>
+ <td align="center">38.89</td>
+ <td align="center">36.11</td>
+ </tr>
+ <tr>
+ <td align="center">OpenAI-deepresearch</td>
+ <td align="center">5.00</td>
+ <td align="center">0.00</td>
+ <td align="center">100.00</td>
+ <td align="center">0.00</td>
+ </tr>
+ <tr>
+ <td align="center">Gemini-2.5-Pro-deepresearch</td>
+ <td align="center">6.70</td>
+ <td align="center">61.27</td>
+ <td align="center">31.13</td>
+ <td align="center">7.60</td>
+ </tr>
+ <tr>
+ <td align="center">WebWeaver (Qwen3-30B-A3B)</td>
+ <td align="center">4.57</td>
+ <td align="center">28.65</td>
+ <td align="center">34.90</td>
+ <td align="center">36.46</td>
+ </tr>
+ <tr>
+ <td align="center">WebWeaver (Claude-Sonnet-4)</td>
+ <td align="center">6.96</td>
+ <td align="center">66.86</td>
+ <td align="center">10.47</td>
+ <td align="center">22.67</td>
+ </tr>
+ <tr>
+ <td align="center">Enterprise-DR (Gemini-2.5-Pro)</td>
+ <td align="center">6.82</td>
+ <td align="center">71.57</td>
+ <td align="center">19.12</td>
+ <td align="center">9.31</td>
+ </tr>
+ <tr>
+ <td align="center">RhinoInsight (Gemini-2.5-Pro)</td>
+ <td align="center">6.82</td>
+ <td align="center">68.51</td>
+ <td align="center">11.02</td>
+ <td align="center">20.47</td>
+ </tr>
+ <tr>
+ <td align="center">AgentCPM-Report</td>
+ <td align="center">6.60</td>
+ <td align="center">57.60</td>
+ <td align="center">13.73</td>
+ <td align="center">28.68</td>
+ </tr>
+ </tbody>
+ </table>
+
+ We evaluate on three benchmarks: DeepResearch Bench, DeepConsult, and DeepResearch Gym. The knowledge base available at writing time contains about 2.7 million [arXiv papers](https://www.kaggle.com/api/v1/datasets/download/Cornell-University/arxiv) and about 200,000 internal webpage summaries.
 
  ## Acknowledgements
  This project would not be possible without the support and contributions of the open-source community. During development, we referred to and used multiple excellent open-source frameworks, models, and data resources, including [verl](https://github.com/volcengine/verl), [UltraRAG](https://github.com/OpenBMB/UltraRAG), [MiniCPM4.1](https://github.com/OpenBMB/MiniCPM), and [SurveyGo](https://surveygo.modelbest.cn/).

+ ## Contributions
+ Project leads: Yishan Li, Wentong Chen
+
+ Contributors: Yishan Li, Wentong Chen, Yukun Yan, Mingwei Li, Sen Mei, Xiaorong Wang, Kunpeng Liu, Xin Cong, Shuo Wang, Zhong Zhang, Yaxi Lu, Zhenghao Liu, Yankai Lin, Zhiyuan Liu, Maosong Sun
+
+ Advisors: Yukun Yan, Yankai Lin, Zhiyuan Liu, Maosong Sun
+
  ## Citation

  If **AgentCPM-Report** is helpful for your research, please cite it as follows: