If you read the words into a set (one for each file), you can use set.difference(). This works if you don't care about the order of the output.
If you care about the order, read the first file into a list, the second into a set, and remove all the elements in the list that are in the set.
```python
a = ["a", "quick", "brown", "fox", "jumped", "over", "the", "lazy", "dog"]
b = {"quick", "brown"}
c = [x for x in a if x not in b]  # keep a's order, drop words found in b
print(c)
```

This prints: `['a', 'fox', 'jumped', 'over', 'the', 'lazy', 'dog']`
Below are two methods to remove duplicate characters from a string.
METHOD 1 (Use Sorting)
Algorithm:
1) Sort the characters of the string.
2) In a loop, remove duplicates by comparing each character with the previous one.
3) Remove the extra characters left at the end of the resulting string.
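Example: for the input "geeksforgeeks", step 1 produces "eeeefggkkorss"; step 2 copies every character that differs from its predecessor to the front, giving "efgkors" followed by leftover characters; step 3 terminates the string after "efgkors".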
Note that this method does not keep the original order of the input string. For example, if we remove duplicates from "geeksforgeeks" while keeping the order of characters, the output should be "geksfor", but the sort-based method returns "efgkors". The method can be modified to store the original order; METHOD 2 keeps the order as is.
Implementation:
```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Function to remove duplicates in a sorted array */
char *removeDupsSorted(char *str);

/* Utility function to sort array A[] */
void quickSort(char A[], int si, int ei);

/* Removes duplicate characters from the string.
   Works in place and terminates the string after the last unique character. */
char *removeDups(char *str)
{
    int len = strlen(str);
    quickSort(str, 0, len - 1);
    return removeDupsSorted(str);
}

/* Removes duplicates from a sorted string in place */
char *removeDupsSorted(char *str)
{
    int res_ind = 1, ip_ind = 1;

    /* An empty string has nothing to remove (and str[1] would be out of bounds) */
    if (str[0] == '\0')
        return str;

    /* Keep each character that differs from the one before it */
    while (str[ip_ind]) {
        if (str[ip_ind] != str[ip_ind - 1]) {
            str[res_ind] = str[ip_ind];
            res_ind++;
        }
        ip_ind++;
    }

    /* Cut off the leftover characters after the unique ones */
    str[res_ind] = '\0';
    return str;
}

/* Driver program to test removeDups */
int main(void)
{
    char str[] = "eeeefggkkosss";
    printf("%s\n", removeDups(str));   /* prints efgkos */
    return 0;
}

/* FOLLOWING FUNCTIONS ARE ONLY FOR SORTING PURPOSE */
void exchange(char *a, char *b)
{
    char temp = *a;
    *a = *b;
    *b = temp;
}

int partition(char A[], int si, int ei)
{
    char x = A[ei];
    int i = si - 1;
    int j;

    for (j = si; j <= ei - 1; j++) {
        if (A[j] <= x) {
            i++;
            exchange(&A[i], &A[j]);
        }
    }
    exchange(&A[i + 1], &A[ei]);
    return i + 1;
}

/* Implementation of Quick Sort
   A[] --> Array to be sorted
   si  --> Starting index
   ei  --> Ending index */
void quickSort(char A[], int si, int ei)
{
    int pi; /* Partitioning index */
    if (si < ei) {
        pi = partition(A, si, ei);
        quickSort(A, si, pi - 1);
        quickSort(A, pi + 1, ei);
    }
}
```

Time Complexity: O(n log n) on average. This quicksort is O(n²) in the worst case, so use a guaranteed O(n log n) sorting algorithm (such as heapsort or merge sort) if worst-case performance matters.
METHOD 2 (Use Hashing)
Algorithm:
```
1: Initialize:
     str = "test string"              /* input string */
     ip_ind = 0                       /* index of the next character to read from the input string */
     res_ind = 0                      /* index of the next position to write in the resultant string */
     bin_hash[0..255] = {0, 0, ...}   /* binary hash: has this character already been processed? */
2: Do the following for each character *(str + ip_ind) in the input string:
     (a) if bin_hash is not set for *(str + ip_ind)   // first time the character is seen
           (i)   set bin_hash for *(str + ip_ind)
           (ii)  move *(str + ip_ind) to the resultant string (this is done in place)
           (iii) res_ind++
     (b) ip_ind++
   /* String obtained after this step is "tes ringing" */
3: Remove the extra characters at the end of the resultant string.
   /* String obtained after this step is "tes ring" */
```

Implementation:
```c
#include <stdio.h>
#include <stdbool.h>

#define NO_OF_CHARS 256

/* Removes duplicate characters from the string, keeping the first
   occurrence of each. Works in place and terminates the string after
   the last unique character. */
char *removeDups(char *str)
{
    bool bin_hash[NO_OF_CHARS] = {false};
    int ip_ind = 0, res_ind = 0;
    unsigned char temp;   /* unsigned so it is always a valid index 0..255 */

    /* In-place removal of duplicate characters */
    while (str[ip_ind]) {
        temp = (unsigned char)str[ip_ind];
        if (!bin_hash[temp]) {
            bin_hash[temp] = true;
            str[res_ind] = str[ip_ind];
            res_ind++;
        }
        ip_ind++;
    }

    /* Cut off the leftover characters after the unique ones */
    str[res_ind] = '\0';
    return str;
}

/* Driver program to test removeDups */
int main(void)
{
    char str[] = "geeksforgeeks";
    printf("%s\n", removeDups(str));   /* prints geksfor */
    return 0;
}
```

Time Complexity: O(n)
NOTES:
* It is assumed that the input string can contain at most 256 distinct character values (one byte per character). Change NO_OF_CHARS accordingly for other character sets.
* bin_hash is zero-initialized with `= {0}`. If the hash (or a counting array) is allocated dynamically instead, use calloc(), which zero-initializes the allocated memory, or malloc() followed by memset().
* The algorithm above also works for integer arrays when the range of values is known in advance: index the hash array by the value minus the smallest possible value. An example problem is finding the most frequently occurring number in an array whose elements all lie between 1000 and 1100.