
Formed in 2009, the Archive Team (not to be confused with the archive.org Archive-It Team) is a rogue archivist collective dedicated to saving copies of rapidly dying or deleted websites for the sake of history and digital heritage. The group is 100% composed of volunteers and interested parties, and has expanded into a large amount of related projects for saving online and digital history.
History is littered with hundreds of conflicts over the future of a community, group, location or business that were "resolved" when one of the parties stepped ahead and destroyed what was there. With the original point of contention destroyed, the debates would fall to the wayside. Archive Team believes that by duplicated condemned data, the conversation and debate can continue, as well as the richness and insight gained by keeping the materials. Our projects have ranged in size from a single volunteer downloading the data to a small-but-critical site, to over 100 volunteers stepping forward to acquire terabytes of user-created data to save for future generations.
The main site for Archive Team is at archiveteam.org and contains up to the date information on various projects, manifestos, plans and walkthroughs.
This collection contains the output of many Archive Team projects, both ongoing and completed. Thanks to the generous providing of disk space by the Internet Archive, multi-terabyte datasets can be made available, as well as in use by the Wayback Machine, providing a path back to lost websites and work.
Our collection has grown to the point of having sub-collections for the type of data we acquire. If you are seeking to browse the contents of these collections, the Wayback Machine is the best first stop. Otherwise, you are free to dig into the stacks to see what you may find.
The Archive Team Panic Downloads are full pulldowns of currently extant websites, meant to serve as emergency backups for needed sites that are in danger of closing, or which will be missed dearly if suddenly lost due to hard drive crashes or server failures.
Given a string s and a string t, check if s is subsequence of t.
You may assume that there is only lower case English letters in both s and t. t is potentially a very long (length ~= 500,000) string, and s is a short string (<=100).
A subsequence of a string is a new string which is formed from the original string by deleting some (can be none) of the characters without disturbing the relative positions of the remaining characters. (ie,
"ace"
is a subsequence of"abcde"
while"aec"
is not).Example 1:
s =
"abc"
, t ="ahbgdc"
Return
true
.Example 2:
s =
"axc"
, t ="ahbgdc"
Return
false
.Follow up:
If there are lots of incoming S, say S1, S2, ... , Sk where k >= 1B, and you want to check one by one to see if T has its subsequence. In this scenario, how would you change your code?
Credits:
Special thanks to @pbrother for adding this problem and creating all test cases.
这道题算比较简单的一种,我们可以用两个指针分别指向字符串s和t,然后如果字符相等,则i和j自增1,反之只有j自增1,最后看i是否等于s的长度,等于说明s已经遍历完了,而且字符都有在t中出现过,参见代码如下:
解法一:
题目中的 Follow up 说如果有大量的字符串s,问我们如何进行优化。那么既然字符串t始终保持不变,我们就可以在t上做一些文章。子序列虽然不需要是连着的子串,但是字符之间的顺序是需要的,那么我们可以建立字符串t中的每个字符跟其位置直接的映射,由于t中可能会出现重复字符,所以把相同的字符出现的所有位置按顺序加到一个数组中,所以就是用 HashMap 来建立每个字符和其位置数组之间的映射。然后遍历字符串s中的每个字符,对于每个遍历到的字符c,我们到 HashMap 中的对应的字符数组中去搜索,由于位置数组是有序的,我们使用二分搜索来加快搜索速度,这里需要注意的是,由于子序列是有顺序要求的,所以需要一个变量 pre 来记录当前匹配到t字符串中的位置,对于当前s串中的字符c,即便在t串中存在,但是若其在位置 pre 之前,也是不能匹配的。所以我们可以使用 uppper_bound() 来二分查找第一个大于 pre 的位置,若不存在,直接返回 false,否则将 pre 更新为二分查找的结果并继续循环即可,参见代码如下:
解法二:
类似题目:
Number of Matching Subsequences
参考资料:
https://leetcode.com/problems/is-subsequence/
https://leetcode.com/problems/is-subsequence/discuss/87302/Binary-search-solution-for-follow-up-with-detailed-comments
https://leetcode.com/problems/is-subsequence/discuss/87408/Binary-search-solution-to-cope-with-input-with-many-Ss(with-explanation)
LeetCode All in One 题目讲解汇总(持续更新中...)
The text was updated successfully, but these errors were encountered: