Find out whether a given text contains any of the prohibited words
up vote
2
down vote
favorite
We have been given a list of strings which are blacklisted. The goal is to identify if a given text contains any of these blacklisted strings. The restriction here is that the blacklisted string needs to match on the word boundary e.g. consider a blacklist string "abc" and text "abc pqr", the text in this case is unsafe (i.e. it contains a blacklisted string). On the other hand if the text is "abcoqr", then the text is safe since the string "abc" is not on the word boundary.
Here is my solution using a modified Trie data structure. https://gist.github.com/hgadre/d4e9ec576932167f01fd33970002a882
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
public class SafeText {
static class Tuple {
int span = 0; // the length of previous words which should have been matched if end = true.
boolean end; // marks the identification of a blacklisted string.
Set<String> nextWords = new HashSet<>(); // next set of words to search for matching blacklisted strings.
public void setEnd(boolean end, int span) {
this.span = span;
this.end = end;
}
public boolean isEnd(int span) {
return end && span == this.span;
}
public void addNextWord (String word) {
this.nextWords.add(word);
}
public boolean containsWord(String word) {
return this.nextWords.contains(word);
}
}
private final Map<String, Tuple> m = new HashMap<>();
public SafeText(List<String> blackList) {
Collections.sort(blackList);
for (String str : blackList) {
String tokens = str.split("\s");
int i = 0;
for (; i < tokens.length - 1; i++) {
m.computeIfAbsent(tokens[i], x -> new Tuple()).addNextWord(tokens[i+1]);
}
m.computeIfAbsent(tokens[i], x -> new Tuple()).setEnd(true, tokens.length-1);
}
}
public boolean isSafe(String text) {
String tokens = text.split("\s");
for (int i = 0; i < tokens.length; i++) {
String key = tokens[i];
int j = i;
while (j < tokens.length && m.containsKey(key)) {
Tuple t = m.get(key);
if (t.isEnd(j-i)) {
return false;
} else if ((j+1) < tokens.length && t.containsWord(tokens[j+1])) {
key = tokens[j+1];
j++;
} else {
break;
}
}
}
return true;
}
}
Is this an optimal solution? Or is there any better approach to solve this problem?
java algorithm strings
New contributor
add a comment |
up vote
2
down vote
favorite
We have been given a list of strings which are blacklisted. The goal is to identify if a given text contains any of these blacklisted strings. The restriction here is that the blacklisted string needs to match on the word boundary e.g. consider a blacklist string "abc" and text "abc pqr", the text in this case is unsafe (i.e. it contains a blacklisted string). On the other hand if the text is "abcoqr", then the text is safe since the string "abc" is not on the word boundary.
Here is my solution using a modified Trie data structure. https://gist.github.com/hgadre/d4e9ec576932167f01fd33970002a882
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
public class SafeText {
static class Tuple {
int span = 0; // the length of previous words which should have been matched if end = true.
boolean end; // marks the identification of a blacklisted string.
Set<String> nextWords = new HashSet<>(); // next set of words to search for matching blacklisted strings.
public void setEnd(boolean end, int span) {
this.span = span;
this.end = end;
}
public boolean isEnd(int span) {
return end && span == this.span;
}
public void addNextWord (String word) {
this.nextWords.add(word);
}
public boolean containsWord(String word) {
return this.nextWords.contains(word);
}
}
private final Map<String, Tuple> m = new HashMap<>();
public SafeText(List<String> blackList) {
Collections.sort(blackList);
for (String str : blackList) {
String tokens = str.split("\s");
int i = 0;
for (; i < tokens.length - 1; i++) {
m.computeIfAbsent(tokens[i], x -> new Tuple()).addNextWord(tokens[i+1]);
}
m.computeIfAbsent(tokens[i], x -> new Tuple()).setEnd(true, tokens.length-1);
}
}
public boolean isSafe(String text) {
String tokens = text.split("\s");
for (int i = 0; i < tokens.length; i++) {
String key = tokens[i];
int j = i;
while (j < tokens.length && m.containsKey(key)) {
Tuple t = m.get(key);
if (t.isEnd(j-i)) {
return false;
} else if ((j+1) < tokens.length && t.containsWord(tokens[j+1])) {
key = tokens[j+1];
j++;
} else {
break;
}
}
}
return true;
}
}
Is this an optimal solution? Or is there any better approach to solve this problem?
java algorithm strings
New contributor
I don't see any kind of trie involved in your code.
– 200_success
9 hours ago
@200_success its a modified trie. I am storing a mapping between a given word and the set of words which can occur in one or more blacklisted strings. e.g. if blacklisted string is "a b" then the mapping would be : a -> {false, 0, [b]} and b -> {true, 2, }. make sense?
– hsg
9 hours ago
add a comment |
up vote
2
down vote
favorite
up vote
2
down vote
favorite
We have been given a list of strings which are blacklisted. The goal is to identify if a given text contains any of these blacklisted strings. The restriction here is that the blacklisted string needs to match on the word boundary e.g. consider a blacklist string "abc" and text "abc pqr", the text in this case is unsafe (i.e. it contains a blacklisted string). On the other hand if the text is "abcoqr", then the text is safe since the string "abc" is not on the word boundary.
Here is my solution using a modified Trie data structure. https://gist.github.com/hgadre/d4e9ec576932167f01fd33970002a882
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
public class SafeText {
static class Tuple {
int span = 0; // the length of previous words which should have been matched if end = true.
boolean end; // marks the identification of a blacklisted string.
Set<String> nextWords = new HashSet<>(); // next set of words to search for matching blacklisted strings.
public void setEnd(boolean end, int span) {
this.span = span;
this.end = end;
}
public boolean isEnd(int span) {
return end && span == this.span;
}
public void addNextWord (String word) {
this.nextWords.add(word);
}
public boolean containsWord(String word) {
return this.nextWords.contains(word);
}
}
private final Map<String, Tuple> m = new HashMap<>();
public SafeText(List<String> blackList) {
Collections.sort(blackList);
for (String str : blackList) {
String tokens = str.split("\s");
int i = 0;
for (; i < tokens.length - 1; i++) {
m.computeIfAbsent(tokens[i], x -> new Tuple()).addNextWord(tokens[i+1]);
}
m.computeIfAbsent(tokens[i], x -> new Tuple()).setEnd(true, tokens.length-1);
}
}
public boolean isSafe(String text) {
String tokens = text.split("\s");
for (int i = 0; i < tokens.length; i++) {
String key = tokens[i];
int j = i;
while (j < tokens.length && m.containsKey(key)) {
Tuple t = m.get(key);
if (t.isEnd(j-i)) {
return false;
} else if ((j+1) < tokens.length && t.containsWord(tokens[j+1])) {
key = tokens[j+1];
j++;
} else {
break;
}
}
}
return true;
}
}
Is this an optimal solution? Or is there any better approach to solve this problem?
java algorithm strings
New contributor
We have been given a list of strings which are blacklisted. The goal is to identify if a given text contains any of these blacklisted strings. The restriction here is that the blacklisted string needs to match on the word boundary e.g. consider a blacklist string "abc" and text "abc pqr", the text in this case is unsafe (i.e. it contains a blacklisted string). On the other hand if the text is "abcoqr", then the text is safe since the string "abc" is not on the word boundary.
Here is my solution using a modified Trie data structure. https://gist.github.com/hgadre/d4e9ec576932167f01fd33970002a882
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
public class SafeText {
static class Tuple {
int span = 0; // the length of previous words which should have been matched if end = true.
boolean end; // marks the identification of a blacklisted string.
Set<String> nextWords = new HashSet<>(); // next set of words to search for matching blacklisted strings.
public void setEnd(boolean end, int span) {
this.span = span;
this.end = end;
}
public boolean isEnd(int span) {
return end && span == this.span;
}
public void addNextWord (String word) {
this.nextWords.add(word);
}
public boolean containsWord(String word) {
return this.nextWords.contains(word);
}
}
private final Map<String, Tuple> m = new HashMap<>();
public SafeText(List<String> blackList) {
Collections.sort(blackList);
for (String str : blackList) {
String tokens = str.split("\s");
int i = 0;
for (; i < tokens.length - 1; i++) {
m.computeIfAbsent(tokens[i], x -> new Tuple()).addNextWord(tokens[i+1]);
}
m.computeIfAbsent(tokens[i], x -> new Tuple()).setEnd(true, tokens.length-1);
}
}
public boolean isSafe(String text) {
String tokens = text.split("\s");
for (int i = 0; i < tokens.length; i++) {
String key = tokens[i];
int j = i;
while (j < tokens.length && m.containsKey(key)) {
Tuple t = m.get(key);
if (t.isEnd(j-i)) {
return false;
} else if ((j+1) < tokens.length && t.containsWord(tokens[j+1])) {
key = tokens[j+1];
j++;
} else {
break;
}
}
}
return true;
}
}
Is this an optimal solution? Or is there any better approach to solve this problem?
java algorithm strings
java algorithm strings
New contributor
New contributor
edited 9 hours ago
200_success
128k15149412
128k15149412
New contributor
asked 10 hours ago
hsg
111
111
New contributor
New contributor
I don't see any kind of trie involved in your code.
– 200_success
9 hours ago
@200_success its a modified trie. I am storing a mapping between a given word and the set of words which can occur in one or more blacklisted strings. e.g. if blacklisted string is "a b" then the mapping would be : a -> {false, 0, [b]} and b -> {true, 2, }. make sense?
– hsg
9 hours ago
add a comment |
I don't see any kind of trie involved in your code.
– 200_success
9 hours ago
@200_success its a modified trie. I am storing a mapping between a given word and the set of words which can occur in one or more blacklisted strings. e.g. if blacklisted string is "a b" then the mapping would be : a -> {false, 0, [b]} and b -> {true, 2, }. make sense?
– hsg
9 hours ago
I don't see any kind of trie involved in your code.
– 200_success
9 hours ago
I don't see any kind of trie involved in your code.
– 200_success
9 hours ago
@200_success its a modified trie. I am storing a mapping between a given word and the set of words which can occur in one or more blacklisted strings. e.g. if blacklisted string is "a b" then the mapping would be : a -> {false, 0, [b]} and b -> {true, 2, }. make sense?
– hsg
9 hours ago
@200_success its a modified trie. I am storing a mapping between a given word and the set of words which can occur in one or more blacklisted strings. e.g. if blacklisted string is "a b" then the mapping would be : a -> {false, 0, [b]} and b -> {true, 2, }. make sense?
– hsg
9 hours ago
add a comment |
active
oldest
votes
Your Answer
StackExchange.ifUsing("editor", function () {
return StackExchange.using("mathjaxEditing", function () {
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
});
});
}, "mathjax-editing");
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "196"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
hsg is a new contributor. Be nice, and check out our Code of Conduct.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f209688%2ffind-out-whether-a-given-text-contains-any-of-the-prohibited-words%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
active
oldest
votes
active
oldest
votes
active
oldest
votes
active
oldest
votes
hsg is a new contributor. Be nice, and check out our Code of Conduct.
hsg is a new contributor. Be nice, and check out our Code of Conduct.
hsg is a new contributor. Be nice, and check out our Code of Conduct.
hsg is a new contributor. Be nice, and check out our Code of Conduct.
Thanks for contributing an answer to Code Review Stack Exchange!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
Use MathJax to format equations. MathJax reference.
To learn more, see our tips on writing great answers.
Some of your past answers have not been well-received, and you're in danger of being blocked from answering.
Please pay close attention to the following guidance:
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f209688%2ffind-out-whether-a-given-text-contains-any-of-the-prohibited-words%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
I don't see any kind of trie involved in your code.
– 200_success
9 hours ago
@200_success its a modified trie. I am storing a mapping between a given word and the set of words which can occur in one or more blacklisted strings. e.g. if blacklisted string is "a b" then the mapping would be : a -> {false, 0, [b]} and b -> {true, 2, }. make sense?
– hsg
9 hours ago