Find out whether a given text contains any of the prohibited words











up vote
2
down vote

favorite












We have been given a list of strings which are blacklisted. The goal is to identify if a given text contains any of these blacklisted strings. The restriction here is that the blacklisted string needs to match on the word boundary e.g. consider a blacklist string "abc" and text "abc pqr", the text in this case is unsafe (i.e. it contains a blacklisted string). On the other hand if the text is "abcoqr", then the text is safe since the string "abc" is not on the word boundary.



Here is my solution using a modified Trie data structure. https://gist.github.com/hgadre/d4e9ec576932167f01fd33970002a882



import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class SafeText {
static class Tuple {
int span = 0; // the length of previous words which should have been matched if end = true.
boolean end; // marks the identification of a blacklisted string.
Set<String> nextWords = new HashSet<>(); // next set of words to search for matching blacklisted strings.

public void setEnd(boolean end, int span) {
this.span = span;
this.end = end;
}

public boolean isEnd(int span) {
return end && span == this.span;
}

public void addNextWord (String word) {
this.nextWords.add(word);
}

public boolean containsWord(String word) {
return this.nextWords.contains(word);
}
}

private final Map<String, Tuple> m = new HashMap<>();


public SafeText(List<String> blackList) {
Collections.sort(blackList);
for (String str : blackList) {
String tokens = str.split("\s");
int i = 0;
for (; i < tokens.length - 1; i++) {
m.computeIfAbsent(tokens[i], x -> new Tuple()).addNextWord(tokens[i+1]);
}
m.computeIfAbsent(tokens[i], x -> new Tuple()).setEnd(true, tokens.length-1);
}
}

public boolean isSafe(String text) {
String tokens = text.split("\s");
for (int i = 0; i < tokens.length; i++) {
String key = tokens[i];
int j = i;
while (j < tokens.length && m.containsKey(key)) {
Tuple t = m.get(key);
if (t.isEnd(j-i)) {
return false;
} else if ((j+1) < tokens.length && t.containsWord(tokens[j+1])) {
key = tokens[j+1];
j++;
} else {
break;
}
}
}
return true;
}
}


Is this an optimal solution? Or is there any better approach to solve this problem?










share|improve this question









New contributor




hsg is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.




















  • I don't see any kind of trie involved in your code.
    – 200_success
    9 hours ago










  • @200_success its a modified trie. I am storing a mapping between a given word and the set of words which can occur in one or more blacklisted strings. e.g. if blacklisted string is "a b" then the mapping would be : a -> {false, 0, [b]} and b -> {true, 2, }. make sense?
    – hsg
    9 hours ago















up vote
2
down vote

favorite












We have been given a list of strings which are blacklisted. The goal is to identify if a given text contains any of these blacklisted strings. The restriction here is that the blacklisted string needs to match on the word boundary e.g. consider a blacklist string "abc" and text "abc pqr", the text in this case is unsafe (i.e. it contains a blacklisted string). On the other hand if the text is "abcoqr", then the text is safe since the string "abc" is not on the word boundary.



Here is my solution using a modified Trie data structure. https://gist.github.com/hgadre/d4e9ec576932167f01fd33970002a882



import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class SafeText {
static class Tuple {
int span = 0; // the length of previous words which should have been matched if end = true.
boolean end; // marks the identification of a blacklisted string.
Set<String> nextWords = new HashSet<>(); // next set of words to search for matching blacklisted strings.

public void setEnd(boolean end, int span) {
this.span = span;
this.end = end;
}

public boolean isEnd(int span) {
return end && span == this.span;
}

public void addNextWord (String word) {
this.nextWords.add(word);
}

public boolean containsWord(String word) {
return this.nextWords.contains(word);
}
}

private final Map<String, Tuple> m = new HashMap<>();


public SafeText(List<String> blackList) {
Collections.sort(blackList);
for (String str : blackList) {
String tokens = str.split("\s");
int i = 0;
for (; i < tokens.length - 1; i++) {
m.computeIfAbsent(tokens[i], x -> new Tuple()).addNextWord(tokens[i+1]);
}
m.computeIfAbsent(tokens[i], x -> new Tuple()).setEnd(true, tokens.length-1);
}
}

public boolean isSafe(String text) {
String tokens = text.split("\s");
for (int i = 0; i < tokens.length; i++) {
String key = tokens[i];
int j = i;
while (j < tokens.length && m.containsKey(key)) {
Tuple t = m.get(key);
if (t.isEnd(j-i)) {
return false;
} else if ((j+1) < tokens.length && t.containsWord(tokens[j+1])) {
key = tokens[j+1];
j++;
} else {
break;
}
}
}
return true;
}
}


Is this an optimal solution? Or is there any better approach to solve this problem?










share|improve this question









New contributor




hsg is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.




















  • I don't see any kind of trie involved in your code.
    – 200_success
    9 hours ago










  • @200_success its a modified trie. I am storing a mapping between a given word and the set of words which can occur in one or more blacklisted strings. e.g. if blacklisted string is "a b" then the mapping would be : a -> {false, 0, [b]} and b -> {true, 2, }. make sense?
    – hsg
    9 hours ago













up vote
2
down vote

favorite









up vote
2
down vote

favorite











We have been given a list of strings which are blacklisted. The goal is to identify if a given text contains any of these blacklisted strings. The restriction here is that the blacklisted string needs to match on the word boundary e.g. consider a blacklist string "abc" and text "abc pqr", the text in this case is unsafe (i.e. it contains a blacklisted string). On the other hand if the text is "abcoqr", then the text is safe since the string "abc" is not on the word boundary.



Here is my solution using a modified Trie data structure. https://gist.github.com/hgadre/d4e9ec576932167f01fd33970002a882



import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class SafeText {
static class Tuple {
int span = 0; // the length of previous words which should have been matched if end = true.
boolean end; // marks the identification of a blacklisted string.
Set<String> nextWords = new HashSet<>(); // next set of words to search for matching blacklisted strings.

public void setEnd(boolean end, int span) {
this.span = span;
this.end = end;
}

public boolean isEnd(int span) {
return end && span == this.span;
}

public void addNextWord (String word) {
this.nextWords.add(word);
}

public boolean containsWord(String word) {
return this.nextWords.contains(word);
}
}

private final Map<String, Tuple> m = new HashMap<>();


public SafeText(List<String> blackList) {
Collections.sort(blackList);
for (String str : blackList) {
String tokens = str.split("\s");
int i = 0;
for (; i < tokens.length - 1; i++) {
m.computeIfAbsent(tokens[i], x -> new Tuple()).addNextWord(tokens[i+1]);
}
m.computeIfAbsent(tokens[i], x -> new Tuple()).setEnd(true, tokens.length-1);
}
}

public boolean isSafe(String text) {
String tokens = text.split("\s");
for (int i = 0; i < tokens.length; i++) {
String key = tokens[i];
int j = i;
while (j < tokens.length && m.containsKey(key)) {
Tuple t = m.get(key);
if (t.isEnd(j-i)) {
return false;
} else if ((j+1) < tokens.length && t.containsWord(tokens[j+1])) {
key = tokens[j+1];
j++;
} else {
break;
}
}
}
return true;
}
}


Is this an optimal solution? Or is there any better approach to solve this problem?










share|improve this question









New contributor




hsg is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.











We have been given a list of strings which are blacklisted. The goal is to identify if a given text contains any of these blacklisted strings. The restriction here is that the blacklisted string needs to match on the word boundary e.g. consider a blacklist string "abc" and text "abc pqr", the text in this case is unsafe (i.e. it contains a blacklisted string). On the other hand if the text is "abcoqr", then the text is safe since the string "abc" is not on the word boundary.



Here is my solution using a modified Trie data structure. https://gist.github.com/hgadre/d4e9ec576932167f01fd33970002a882



import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class SafeText {
static class Tuple {
int span = 0; // the length of previous words which should have been matched if end = true.
boolean end; // marks the identification of a blacklisted string.
Set<String> nextWords = new HashSet<>(); // next set of words to search for matching blacklisted strings.

public void setEnd(boolean end, int span) {
this.span = span;
this.end = end;
}

public boolean isEnd(int span) {
return end && span == this.span;
}

public void addNextWord (String word) {
this.nextWords.add(word);
}

public boolean containsWord(String word) {
return this.nextWords.contains(word);
}
}

private final Map<String, Tuple> m = new HashMap<>();


public SafeText(List<String> blackList) {
Collections.sort(blackList);
for (String str : blackList) {
String tokens = str.split("\s");
int i = 0;
for (; i < tokens.length - 1; i++) {
m.computeIfAbsent(tokens[i], x -> new Tuple()).addNextWord(tokens[i+1]);
}
m.computeIfAbsent(tokens[i], x -> new Tuple()).setEnd(true, tokens.length-1);
}
}

public boolean isSafe(String text) {
String tokens = text.split("\s");
for (int i = 0; i < tokens.length; i++) {
String key = tokens[i];
int j = i;
while (j < tokens.length && m.containsKey(key)) {
Tuple t = m.get(key);
if (t.isEnd(j-i)) {
return false;
} else if ((j+1) < tokens.length && t.containsWord(tokens[j+1])) {
key = tokens[j+1];
j++;
} else {
break;
}
}
}
return true;
}
}


Is this an optimal solution? Or is there any better approach to solve this problem?







java algorithm strings






share|improve this question









New contributor




hsg is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.











share|improve this question









New contributor




hsg is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.









share|improve this question




share|improve this question








edited 9 hours ago









200_success

128k15149412




128k15149412






New contributor




hsg is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.









asked 10 hours ago









hsg

111




111




New contributor




hsg is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.





New contributor





hsg is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.






hsg is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.












  • I don't see any kind of trie involved in your code.
    – 200_success
    9 hours ago










  • @200_success its a modified trie. I am storing a mapping between a given word and the set of words which can occur in one or more blacklisted strings. e.g. if blacklisted string is "a b" then the mapping would be : a -> {false, 0, [b]} and b -> {true, 2, }. make sense?
    – hsg
    9 hours ago


















  • I don't see any kind of trie involved in your code.
    – 200_success
    9 hours ago










  • @200_success its a modified trie. I am storing a mapping between a given word and the set of words which can occur in one or more blacklisted strings. e.g. if blacklisted string is "a b" then the mapping would be : a -> {false, 0, [b]} and b -> {true, 2, }. make sense?
    – hsg
    9 hours ago
















I don't see any kind of trie involved in your code.
– 200_success
9 hours ago




I don't see any kind of trie involved in your code.
– 200_success
9 hours ago












@200_success its a modified trie. I am storing a mapping between a given word and the set of words which can occur in one or more blacklisted strings. e.g. if blacklisted string is "a b" then the mapping would be : a -> {false, 0, [b]} and b -> {true, 2, }. make sense?
– hsg
9 hours ago




@200_success its a modified trie. I am storing a mapping between a given word and the set of words which can occur in one or more blacklisted strings. e.g. if blacklisted string is "a b" then the mapping would be : a -> {false, 0, [b]} and b -> {true, 2, }. make sense?
– hsg
9 hours ago















active

oldest

votes











Your Answer





StackExchange.ifUsing("editor", function () {
return StackExchange.using("mathjaxEditing", function () {
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
});
});
}, "mathjax-editing");

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "196"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});


}
});






hsg is a new contributor. Be nice, and check out our Code of Conduct.










draft saved

draft discarded


















StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f209688%2ffind-out-whether-a-given-text-contains-any-of-the-prohibited-words%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown






























active

oldest

votes













active

oldest

votes









active

oldest

votes






active

oldest

votes








hsg is a new contributor. Be nice, and check out our Code of Conduct.










draft saved

draft discarded


















hsg is a new contributor. Be nice, and check out our Code of Conduct.













hsg is a new contributor. Be nice, and check out our Code of Conduct.












hsg is a new contributor. Be nice, and check out our Code of Conduct.
















Thanks for contributing an answer to Code Review Stack Exchange!


  • Please be sure to answer the question. Provide details and share your research!

But avoid



  • Asking for help, clarification, or responding to other answers.

  • Making statements based on opinion; back them up with references or personal experience.


Use MathJax to format equations. MathJax reference.


To learn more, see our tips on writing great answers.





Some of your past answers have not been well-received, and you're in danger of being blocked from answering.


Please pay close attention to the following guidance:


  • Please be sure to answer the question. Provide details and share your research!

But avoid



  • Asking for help, clarification, or responding to other answers.

  • Making statements based on opinion; back them up with references or personal experience.


To learn more, see our tips on writing great answers.




draft saved


draft discarded














StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f209688%2ffind-out-whether-a-given-text-contains-any-of-the-prohibited-words%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown





















































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown

































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown







Popular posts from this blog

Ellipse (mathématiques)

Quarter-circle Tiles

Mont Emei