[feature request] Add start_msg_date param to endpoint /downloadarchives


valcos
 

Hi Mark,

It would be great if the /downloadarchives endpoint had an optional parameter for downloading only the messages sent after a given date. The parameter (e.g., start_msg_date) would work like start_msg_num and would return the messages sent after the given date (ISO 8601 format, UTC).

Do you think it's possible to implement this feature?

Best,
Valerio


 

On Mon, Jan 20, 2020 at 7:11 AM valcos <valcos@...> wrote:

It would be great if the endpoint /downloadarchives could have an optional parameter to download messages after a given date. The param (e.g., start_msg_date) would be similar to start_msg_num and would return the messages
sent after a given date (in isoformat, UTC).

It would be more reliable for you to use the start_msg_num parameter in coordination with the X-Groupsio-MsgNum header line. That way you are guaranteed to fetch all messages, with no overlap. Is there any reason you can't use that? I am going to start more aggressively ratelimiting /downloadarchives access, as it puts a load on our system.

Thanks,
Mark


valcos
 

Thank you Mark for answering.

I don't see X-Groupsio-MsgNum in the response headers (shown below) or in the messages themselves. I'm testing with the group onap+onap-zoom-hosts. Please let me know if I'm missing something, or if you can point me to a group that uses the X-Groupsio-MsgNum header line.

{
    "Connection": "keep-alive",
    "Content-Disposition": "attachment; filename=\"messages.zip\"",
    "Content-Type": "application/octet-stream",
    "Date": "Tue, 21 Jan 2020 11:30:39 GMT",
    "Server": "nginx/1.17.6",
    "Set-Cookie": "groupsio=MTU3O...aClXhs=; Path=/; Domain=groups.io; Expires=Thu, 20 Feb 2020 11:30:39 GMT; Max-Age=2592000; HttpOnly; Secure",
    "Transfer-Encoding": "chunked",
    "X-Frame-Options": "DENY"
}

The main reason I'm asking for start_msg_date is that I collect mailing list data in a time-based way (it relies on the `Date` field of the emails). Since that information is already included in each email, no extra information (e.g., response headers or additional headers in the original emails) is needed to perform incremental fetching.

I would say that start_msg_date and start_msg_num are equivalent in terms of guaranteeing that all messages are fetched. However, I understand that the offset-based solution handles the corner case better where N messages in a group share the same date. From a user's point of view, I think it makes more sense to get all messages after a given date, since the user probably doesn't know the start_msg_num.
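
To make the same-date corner case concrete, here is a minimal client-side sketch of date-based filtering. It is only an illustration: the function name `sent_after` is hypothetical, it assumes the downloaded messages are already available as raw RFC 5322 byte strings, and it says nothing about how the server would implement such a parameter.

```python
from datetime import datetime, timezone
from email import message_from_bytes
from email.utils import parsedate_to_datetime


def sent_after(raw_messages, start):
    """Keep the raw messages whose Date header is strictly later than `start`.

    `start` must be a timezone-aware datetime (assumption: UTC).
    """
    kept = []
    for raw in raw_messages:
        header = message_from_bytes(raw).get("Date")
        if header is None:
            continue  # no Date header, nothing to compare
        try:
            sent = parsedate_to_datetime(header)
        except (TypeError, ValueError):
            continue  # unparseable Date header
        if sent.tzinfo is None:
            # A "-0000" offset yields a naive datetime; treat it as UTC.
            sent = sent.replace(tzinfo=timezone.utc)
        if sent > start:
            kept.append(raw)
    return kept
```

With a strict `>` comparison, a message that shares the exact cutoff timestamp with the last message already collected is skipped; with `>=` it would be fetched twice. A message number does not have this ambiguity.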


In any case, it would be great if you could add start_msg_date. If that isn't possible, I'll figure out how to use start_msg_num in my scenario.

Best,
Valerio


 

On Tue, Jan 21, 2020 at 4:02 AM valcos <valcos@...> wrote:
Thank you Mark for answering.

I don't see the X-Groupsio-MsgNum in the response headers (see them below) or as part of the messages, I'm using the group onap+onap-zoom-hosts to test it. Please let me know if I'm missing something, or if you can point me to a group that uses the X-Groupsio-MsgNum header line. 

That line appears in the email headers of every email message downloaded. Look for the largest, which should be in the last message downloaded, and use that next time.
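
A minimal sketch of that bookkeeping, assuming the messages from the downloaded archive are already available as raw byte strings. The function name `largest_msg_num` is hypothetical, and extracting the messages from messages.zip is left out because the archive layout isn't described in this thread.

```python
from email import message_from_bytes


def largest_msg_num(raw_messages):
    """Return the largest X-Groupsio-MsgNum value found, or None if absent."""
    largest = None
    for raw in raw_messages:
        value = message_from_bytes(raw).get("X-Groupsio-MsgNum")
        if value is None:
            continue  # message carries no such header
        try:
            num = int(value.strip())
        except ValueError:
            continue  # header present but not an integer
        if largest is None or num > largest:
            largest = num
    return largest
```

Whether the next request should pass this value or the value plus one as start_msg_num depends on whether the parameter is inclusive, which this thread doesn't state.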

I may be able to add start_msg_date, but I'm not sure when.

Thanks,
Mark 


valcos
 

Thank you Mark for adding the start_time param for https://groups.io/api#download-archives